Primary Key

In a relational database, each row of data in a table is called a record. A record is composed of multiple fields. For example, two records from the students table might look like this:

id	class_id	name	gender	score
1	1	Xiao Ming	M	90
2	1	Xiao Hong	F	95

Each record contains several defined fields, and all records in the same table have the same field definitions.

An important constraint for relational tables is that no two records can be identical. "Not identical" means that there must be a field that can uniquely distinguish different records; this field is known as the primary key.

For example, if we choose the name field as the primary key, we can uniquely identify a record by the name "Xiao Ming" or "Xiao Hong." However, this setup would prevent us from storing students with the same name, as inserting two records with the same primary key is not allowed.

The most critical requirement for a primary key is that once a record is inserted into a table, the primary key should ideally not be modified. The primary key is used to uniquely locate records, and changing it can cause a series of issues.

Because the role of the primary key is crucial, how it is selected significantly impacts business development. For instance, using a student’s ID number as the primary key may seem unique. However, since the ID number is a business-related field, any changes to it would necessitate modifying the primary key, potentially leading to serious business ramifications.

Thus, a fundamental principle in selecting a primary key is: do not use any business-related fields as the primary key.

As a result, fields that appear unique, such as ID numbers, phone numbers, and email addresses, should not be used as primary keys.

The best choice for a primary key is a field that is entirely unrelated to business logic, typically named id. Common types that can serve as the id field include:

Auto-incrementing integer: The database automatically assigns an incrementing integer to each record upon insertion, alleviating concerns about primary key duplication and eliminating the need to pre-generate keys.
Globally unique GUID: Also known as UUID, this uses a globally unique string as the primary key, such as 8f55d96b-8acc-4636-8cb8-76bf8abc2f57. The GUID algorithm ensures that any string generated is unique across computers and times by using the MAC address, timestamp, and random numbers. Most programming languages include built-in GUID algorithms.

For most applications, an auto-incrementing primary key will suffice. In the students table, we define the primary key as BIGINT NOT NULL AUTO_INCREMENT.

Note

If you use an INT auto-incrementing type, you will encounter a limit once the number of records exceeds 2,147,483,647 (approximately 2.1 billion). Using BIGINT allows for up to about 92 quintillion records.

Composite Primary Key

Relational databases also allow multiple fields to uniquely identify a record, known as a composite primary key.

With a composite primary key, one column may have duplicate values as long as the combination of all primary key columns is unique:

id_num	id_type	other columns...
1	A	...
2	A	...
2	B	...

If we use the id_num and id_type columns as a composite primary key, the above three records are valid because no combination of the two key columns is identical.

Unless necessary, we generally avoid using composite primary keys, as they increase the complexity of relational tables.

Summary

The primary key is the unique identifier for records in a relational table. Selecting the primary key is very important: it should not carry business meaning and should typically be of type BIGINT auto-increment or GUID. The primary key should also not allow NULL values.

Multiple columns can be used as a composite primary key, but composite primary keys are not commonly used.

Primary Key ​

Composite Primary Key ​

Summary ​

Primary Key

Composite Primary Key

Summary