Avant d’utiliser PostgreSQL, assurons-nous de consolider nos fondations en abordant la théorie générale de bases de données. Vous n’aurez besoin d’entrer aucun des codes en exemple; ils ne servent qu’à des fins d’illustration.
Objectif de cette leçon: Comprendre les concepts fondamentaux des Bases de Données
Une base de données consiste en un ensemble organisé de données destinées à une ou plusieurs utilisations, généralement sous format numérique. - Wikipedia
Un Système de Gestion de Base de Données (SGBD) se compose d’un logiciel qui exploite des bases de données, gère le stockage, l’accès, la sécurité, la sauvegarde/restauration et autres fonctionnalités. - Wikipedia
In relational databases and flat file databases, a table is a set of data elements (values) that is organized using a model of vertical columns (which are identified by their name) and horizontal rows. A table has a specified number of columns, but can have any number of rows. Each row is identified by the values appearing in a particular column subset which has been identified as a candidate key. - Wikipedia
id | name | age
----+-------+-----
1 | Tim | 20
2 | Horst | 88
(2 rows)
In SQL databases a table is also known as a relation.
A column is a set of data values of a particular simple type, one for each row of the table. The columns provide the structure according to which the rows are composed. The term field is often used interchangeably with column, although many consider it more correct to use field (or field value) to refer specifically to the single item that exists at the intersection between one row and one column. - Wikipedia
Une colonne:
| name |
+-------+
| Tim |
| Horst |
Un champ:
| Horst |
Un enregistrement est une information stockée dans une ligne d’une table. Chaque enregistrement aura un champ pour chacune des colonnes de la table.
2 | Horst | 88 <-- one record
Les Types de données restreignent le genre d’informations pouvant être stockées dans une colonne. - Tim and Horst
Il existe différents type de données. Voyons les plus fréquents:
Texte ou String - pour stocker des données de texte libre
Entier ou Integer - pour stocker des nombres entiers
Réel ou Real - pour stocker les nombres décimaux
Date - pour stocker l’anniversaire de Horst afin que personne n’oublie
Booléen ou Boolean - pour stocker des valeurs vrai/faux
Dans une base de données, vous pouvez autoriser de ne rien stocker dans un champ. S’il n’y a rien dans un champ, alors le contenu de ce champ est référencé comme une valeur ‘null’:
insert into person (age) values (40);
select * from person;
Résultat:
id | name | age
----+-------+-----
1 | Tim | 20
2 | Horst | 88
4 | | 40 <-- null for name
(3 rows)
Il existe beaucoup plus de types de données utilisables - Lisez le manuel de PostgreSQL!
Let’s use a simple case study to see how a database is constructed. We want to create an address database.
Write down the properties which make up a simple address and which we would want to store in our database.
The properties that describe an address are the columns. The type of information stored in each column is its datatype. In the next section we will analyse our conceptual address table to see how we can make it better!
The process of creating a database involves creating a model of the real world; taking real world concepts and representing them in the database as entities.
Une des idées principales dans une base de données consiste à éviter la duplication/redondance de données. Le processus de suppression de la redondance dans une base de données s’appelle Normalisation.
Normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics - insertion, update, and deletion anomalies - that could lead to a loss of data integrity. - Wikipedia
There are different kinds of normalisation ‘forms’.
Examinons cet exemple tout simple:
Table "public.people"
Column | Type | Modifiers
----------+------------------------+-----------------------------------------
id | integer | not null default
| | nextval('people_id_seq'::regclass)
| |
name | character varying(50) |
address | character varying(200) | not null
phone_no | character varying |
Indexes:
"people_pkey" PRIMARY KEY, btree (id)
select * from people;
id | name | address | phone_no
--+---------------+-----------------------------+-------------
1 | Tim Sutton | 3 Buirski Plein, Swellendam | 071 123 123
2 | Horst Duester | 4 Avenue du Roix, Geneva | 072 121 122
(2 rows)
Imaginez que vous avez beaucoup d’amis avec le même nom de rue ou de ville. Chaque fois que ces données sont dupliquées, elles consomment de l’espace. Pire encore, si un nom de ville change, vous devrez effectuer beaucoup de travail pour mettre à jour votre base de données.
Redesign the theoretical people table above to reduce duplication and to normalise the data structure.
You can read more about database normalisation here
Un index de base de données est un lot de données destiné à accélérer les opérations de recherche de données. - Wikipedia
Imaginez que vous lisez un bouquin et cherchez la définition d’un concept - et que le bouquin n’a pas d’index! Vous devrez lire le livre depuis le début et page après page jusqu’à ce que vous trouviez l’information recherchée. L’index à la fin du livre vous permet d’aller directement à la page contenant l’information pertinente.
create index person_name_idx on people (name);
Maintenant, les recherches sur le nom seront plus rapides:
Table "public.people"
Column | Type | Modifiers
----------+------------------------+-----------------------------------------
id | integer | not null default
| | nextval('people_id_seq'::regclass)
| |
name | character varying(50) |
address | character varying(200) | not null
phone_no | character varying |
Indexes:
"people_pkey" PRIMARY KEY, btree (id)
"person_name_idx" btree (name)
Une séquence est un générateur de nombre unique. Il est normalement utilisé pour créer un identifiant unique pour une colonne d’une table.
Dans cet exemple, id est une séquence - le nombre est incrémenté chaque fois qu’un nouvel enregistrement est ajouté à la table:
id | name | address | phone_no
---+--------------+-----------------------------+-------------
1 | Tim Sutton | 3 Buirski Plein, Swellendam | 071 123 123
2 | Horst Duster | 4 Avenue du Roix, Geneva | 072 121 122
In a normalised database, you typically have many relations (tables). The entity-relationship diagram (ER Diagram) is used to design the logical dependencies between the relations. Consider our non-normalised people table from earlier in the lesson:
select * from people;
id | name | address | phone_no
----+--------------+-----------------------------+-------------
1 | Tim Sutton | 3 Buirski Plein, Swellendam | 071 123 123
2 | Horst Duster | 4 Avenue du Roix, Geneva | 072 121 122
(2 rows)
With a little work we can split it into two tables, removing the need to repeat the street name for individuals who live in the same street:
select * from streets;
id | name
----+--------------
1 | Plein Street
(1 row)
et:
select * from people;
id | name | house_no | street_id | phone_no
----+--------------+----------+-----------+-------------
1 | Horst Duster | 4 | 1 | 072 121 122
(1 row)
We can then link the two tables using the ‘keys’ streets.id
and
people.streets_id
.
If we draw an ER Diagram for these two tables it would look something like this:
The ER Diagram helps us to express ‘one to many’ relationships. In this case the arrow symbol show that one street can have many people living on it.
Our people model still has some normalisation issues - try to see if you can normalise it further and show your thoughts by means of an ER Diagram.
A database constraint is used to ensure that data in a relation matches the
modeller’s view of how that data should be stored. For example a constraint on
your postal code could ensure that the number falls between 1000
and
9999
.
A Primary key is one or more field values that make a record unique. Usually the primary key is called id and is a sequence.
A Foreign key is used to refer to a unique record on another table (using that other table’s primary key).
In ER Diagramming, the linkage between tables is normally based on Foreign keys linking to Primary keys.
If we look at our people example, the table definition shows that the street column is a foreign key that references the primary key on the streets table:
Table "public.people"
Column | Type | Modifiers
-----------+-----------------------+--------------------------------------
id | integer | not null default
| | nextval('people_id_seq'::regclass)
name | character varying(50) |
house_no | integer | not null
street_id | integer | not null
phone_no | character varying |
Indexes:
"people_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"people_street_id_fkey" FOREIGN KEY (street_id) REFERENCES streets(id)
When adding, changing, or deleting data in a database, it is always important that the database is left in a good state if something goes wrong. Most databases provide a feature called transaction support. Transactions allow you to create a rollback position that you can return to if your modifications to the database did not run as planned.
Take a scenario where you have an accounting system. You need to transfer funds from one account and add them to another. The sequence of steps would go like this:
- remove R20 from Joe
- add R20 to Anne
If something goes wrong during the process (e.g. power failure), the transaction will be rolled back.
Databases allow you to manage data in a structured way using simple code structures.
Now that we’ve looked at how databases work in theory, let’s create a new database to implement the theory we’ve covered.