Appearance
Overview of Relational Databases
Why Do We Need Databases?
Applications need to save user data. For instance, Word needs to save user documents for future editing or copying to another computer.
A simple way to save user data is to write it to a file. For example, to save information for all students in a class, one could create a CSV file:
id,name,gender,score
1, Xiaoming, M, 90
2, Xiaohong, F, 95
3, Xiaojun, M, 88
4, Xiaoli, F, 88
To save information for all classes in a school, another CSV file could be written.
However, as applications become more complex and data volume increases, managing this data becomes a significant issue:
- Reading and writing files and parsing data require a lot of repetitive code.
- Quickly querying specific data from thousands of records involves complex logic.
If each application writes its own data reading/writing code, it is inefficient and prone to errors. Furthermore, the data access interfaces differ between applications, making data reuse difficult.
Thus, databases emerged as software specifically designed to manage data. Applications no longer need to manage data themselves but instead read and write data through interfaces provided by database software. The details of how data is stored are handled by the database software, freeing applications from those concerns:
┌───────────┐
│application│
└───────────┘
▲ │
│ │
read│ │write
│ │
│ ▼
┌───────────┐
│ database │
└───────────┘
This greatly simplifies data reading and writing in application development.
Data Models
Databases organize, store, and manage data based on data structures, and there are three main types of models:
Hierarchical Model: Organizes data in a “parent-child” hierarchy, resembling a tree structure.
Network Model: Connects each data node to many others, forming a structure like a road network between cities.
Relational Model: Represents data as a two-dimensional table. Each data entry can be uniquely identified by its row and column numbers, similar to an Excel spreadsheet.
Over time, relational databases based on the relational model have gained a dominant market share.
Why Do Relational Databases Have the Widest Application?
The relational model is the simplest to understand and use compared to hierarchical and network models. It is based on mathematical theory. While the theory can be complex, understanding it through everyday relationships, such as a class of students represented in a table, is straightforward:
ID | Name | Class ID | Gender | Age |
---|---|---|---|---|
1 | Xiaoming | 201 | M | 9 |
2 | Xiaohong | 202 | F | 8 |
3 | Xiaojun | 202 | M | 8 |
4 | Xiaobai | 201 | F | 9 |
The Class ID corresponds to another class table:
ID | Name | Teacher |
---|---|---|
201 | Class 1 | Wang |
202 | Class 2 | Li |
Given a class name, one can look up a class record, and using the Class ID, multiple student records can be retrieved, establishing a "one-to-many" relationship between the two tables.
Data Types
For a relational table, it is essential to define both the name and data type of each column. Standard data types supported by relational databases include:
Name | Type | Description |
---|---|---|
INT | Integer | 4-byte integer type, range about +/-2.1 billion |
BIGINT | Long Integer | 8-byte integer type, range about +/-922 trillion |
REAL | Floating | 4-byte floating-point number, range about +/-10^38 |
DOUBLE | Floating | 8-byte floating-point number, range about +/-10^308 |
DECIMAL(M,N) | High Precision | User-specified precision decimal, e.g., DECIMAL(20,10) for 20 digits total with 10 after the decimal |
CHAR(N) | Fixed Length | Stores fixed-length strings, e.g., CHAR(100) always stores 100 characters |
VARCHAR(N) | Variable Length | Stores variable-length strings, e.g., VARCHAR(100) can store 0-100 characters |
BOOLEAN | Boolean | Stores True or False |
DATE | Date | Stores dates, e.g., 2018-06-22 |
TIME | Time | Stores time, e.g., 12:20:59 |
DATETIME | Date and Time | Stores date and time, e.g., 2018-06-22 12:20:59 |
The table above lists the most commonly used data types. Many data types also have aliases, for example, REAL can also be written as FLOAT(24). Some less common data types include TINYINT (range 0-255). Each database vendor may also support specific data types like JSON.
When choosing data types, select the appropriate type based on business rules. Generally, BIGINT meets integer storage needs, while VARCHAR(N) meets string storage needs; these two types are the most widely used.
Major Relational Databases
Currently, major relational databases can be categorized into:
- Commercial Databases: e.g., Oracle, SQL Server, DB2.
- Open Source Databases: e.g., MySQL, PostgreSQL.
- Desktop Databases: e.g., Microsoft Access, suitable for desktop applications.
- Embedded Databases: e.g., SQLite, suitable for mobile apps and desktop programs.
What is SQL?
SQL stands for Structured Query Language, used to access and manipulate database systems. SQL statements can query data from the database, as well as add, update, and delete data, in addition to managing and maintaining the database. Different databases support SQL, so by learning SQL, we can interact with various databases.
Although SQL has been defined as a standard by ANSI, unfortunately, different databases do not consistently support standard SQL. Most databases extend standard SQL, meaning that while standard SQL theoretically works across databases, using a specific database's extended SQL may not be executable on another. For example, Oracle refers to its extended SQL as PL/SQL, and Microsoft calls it T-SQL.
In practice, if we only use the core functionalities of standard SQL, most databases can execute those. Less common SQL functionalities may have varying levels of support among databases. The specific features supported by each database's extensions are often referred to as “dialects.”
Overall, SQL defines several capabilities for interacting with databases:
DDL: Data Definition Language allows users to define data by creating, deleting, and modifying table structures. Typically executed by database administrators.
DML: Data Manipulation Language provides users the ability to add, delete, and update data—essentially the daily operations applications perform on the database.
DQL: Data Query Language allows users to query data, which is often the most frequent operation in database interactions.
Syntax Characteristics
SQL keywords are not case-sensitive! However, different databases may distinguish case for table names and column names. In the same database, case sensitivity may differ on Linux and Windows.
Thus, in this tutorial, we will adopt the convention of writing SQL keywords in uppercase for emphasis, while using lowercase for table names and column names.