Persistence

Redis stores all its data in memory, which means that if it suddenly crashes, all data will be lost. Therefore, a mechanism is required to ensure that Redis data is not lost due to failures. This mechanism is Redis's persistence strategy.

Redis offers two types of persistence: snapshots and AOF (Append Only File) logs. Snapshots provide a full backup at a specific point in time, while AOF logs record continuous incremental backups. Snapshots are binary serialized forms of the in-memory data, making them very compact, while AOF logs contain textual records of commands that modify the in-memory data. Over time, AOF logs can become exceedingly large, and during a database restart, loading the AOF log for command replay can take a long time. Therefore, periodic AOF rewriting is necessary to reduce its size.

Snapshot Principle

Redis executes commands on a single thread. That one thread is responsible both for the concurrent socket reads and writes of many clients and for the logical reads and writes of the in-memory data structures.

While handling requests, Redis must also take memory snapshots, which require file I/O. However, I/O on regular files cannot be made non-blocking through the multiplexing API that Redis uses for sockets.

If the single thread had to manage both client requests and file I/O, server performance would suffer badly. In addition, to avoid blocking ongoing business operations, Redis needs to persist data while still responding to client requests. Suppose a large hash is being persisted and a request arrives to delete it before the snapshot is complete: the snapshot would then have to cope with data changing underneath it.

So, how does Redis handle this?

Redis uses the operating system's multi-process COW (Copy On Write) mechanism to achieve snapshot persistence. This mechanism is interesting and not widely known, and it serves as an important indicator of a programmer's knowledge breadth.

Forking (Multi-Process)

During persistence, Redis calls the fork function from glibc to create a child process, which handles the snapshot persistence while the parent process continues to process client requests. When the child process is created, it shares the code and data segments in memory with the parent process. At this moment, you can think of the parent and child processes as conjoined twins sharing a body. This mechanism is designed to save memory resources by maximizing sharing. The memory footprint remains nearly unchanged at the moment of forking.

The logic of process separation can be described in Python as follows. The fork function returns in both processes: it returns the child process's PID in the parent process and zero in the child process. If the fork fails, for example because the operating system is running low on memory, the C function returns -1, while Python's os.fork raises OSError instead. The handle_* functions below are placeholders.

python

import os

try:
    pid = os.fork()
except OSError:
    handle_fork_error()  # Fork failed, e.g. the OS is out of memory
else:
    if pid > 0:
        handle_client_requests()  # Parent continues to handle client requests
    else:
        handle_snapshot_write()  # Child handles snapshot write to disk

The child process handles data persistence without modifying the existing memory data structures; it simply traverses the data structures, reads them, and serializes them to disk. Meanwhile, the parent process continues serving client requests and modifying the in-memory data structures.

At this point, the operating system's COW mechanism is used to separate the data segment pages. The data segment is made up of many pages managed by the operating system. When the parent process writes to one of these shared pages, the kernel first makes a private copy of that page for the parent, and the modification happens on the copy. The child's page is left untouched, so the child process continues to see the data as it was at the moment of the fork.

As the parent process continues to modify data, more and more shared pages are separated, and memory usage gradually increases. However, total usage will not exceed twice the original data size, because each page is copied at most once. In many Redis instances, cold data occupies a significant portion, so it is rare for all pages to be separated; typically, only some pages are affected. Each page is usually 4 KB, so even a modest Redis instance spans many thousands of pages.
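To make the bound concrete, here is a small back-of-the-envelope calculation. It assumes 4 KB pages and a hypothetical fraction of pages that the parent writes to while the snapshot is running; the function name and the example numbers are illustrative only.

```python
import math

PAGE_SIZE = 4 * 1024  # assumed page size in bytes


def cow_extra_bytes(dataset_bytes, dirty_fraction):
    """Extra memory needed if `dirty_fraction` of the pages get copied."""
    pages = math.ceil(dataset_bytes / PAGE_SIZE)
    dirty_pages = math.ceil(pages * dirty_fraction)
    return dirty_pages * PAGE_SIZE


one_gib = 2 ** 30
print(math.ceil(one_gib / PAGE_SIZE))        # number of pages in 1 GiB
print(cow_extra_bytes(one_gib, 0.25))        # a quarter of the pages modified
print(cow_extra_bytes(one_gib, 1.0))         # worst case: every page modified
```

In the worst case the extra memory equals the dataset size, which is where the "at most twice" figure comes from.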

Because the child process sees no changes in its data, it can safely traverse the data and serialize it to disk. This is why this form of Redis persistence is called a "snapshot."
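The snapshot behavior described above can be observed directly with a small experiment. This is only a sketch for Unix-like systems, not how Redis itself is written: the function name and the two pipes used for coordination are made up for the demonstration. The parent changes a value after forking, and the child still reports the value from the moment of the fork.

```python
import os


def snapshot_while_parent_mutates():
    """Return (value the child saw, value the parent holds) after a fork."""
    data = {"counter": 1}
    result_r, result_w = os.pipe()  # child -> parent: what the child saw
    go_r, go_w = os.pipe()          # parent -> child: "I have modified data"

    pid = os.fork()
    if pid == 0:
        # Child: wait until the parent has changed its copy, then report.
        os.close(result_r)
        os.close(go_w)
        os.read(go_r, 1)
        os.write(result_w, str(data["counter"]).encode())
        os._exit(0)

    # Parent: modify the data after the fork, then let the child look.
    os.close(result_w)
    os.close(go_r)
    data["counter"] = 2
    os.write(go_w, b"x")
    child_saw = int(os.read(result_r, 16))
    os.waitpid(pid, 0)
    os.close(result_r)
    os.close(go_w)
    return child_saw, data["counter"]


print(snapshot_while_parent_mutates())  # (1, 2)
```

The child reads the value only after the parent has already written 2, yet it still sees 1, because the parent's write landed on the parent's own copy of the page.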

AOF Principle

AOF logs store a sequential record of commands executed by the Redis server that modify the in-memory data. If an AOF log records all modifying commands since the Redis instance was created, it can restore the current state of the in-memory data structure by replaying all commands on an empty Redis instance.
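The replay idea can be sketched with a toy key-value store. This is a simplified model, not Redis's actual file format or command set: the log is a Python list of commands, and only three commands are supported.

```python
def apply(store, command):
    """Apply one write command (a list of strings) to a dict-backed store."""
    name, *args = command
    if name == "SET":
        store[args[0]] = args[1]
    elif name == "DEL":
        store.pop(args[0], None)
    elif name == "INCR":
        store[args[0]] = str(int(store.get(args[0], "0")) + 1)
    else:
        raise ValueError(f"unsupported command: {name}")


def replay(log):
    """Rebuild the state by replaying every command on an empty store."""
    store = {}
    for command in log:
        apply(store, command)
    return store


log = [["SET", "a", "1"], ["INCR", "a"], ["SET", "b", "x"], ["DEL", "b"]]
print(replay(log))  # {'a': '2'}
```

Note that four logged commands end up describing a single key, which hints at why the log grows faster than the data it represents.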

When Redis receives a client modification command, it performs parameter validation and logical processing. If everything checks out, it executes the command and then stores the command text in the AOF log. This is different from storage engines like LevelDB and HBase, which typically store the log before performing logical processing.

Over time, the AOF log can grow significantly. If the instance crashes and restarts, replaying the entire AOF log can be time-consuming, making Redis unavailable for an extended period. Therefore, AOF logs need to be reduced in size.

AOF Rewriting

Redis provides the bgrewriteaof command to reduce the size of the AOF log. The principle involves creating a child process that traverses memory to convert the data into a series of Redis operation commands, serializing them into a new AOF log file. After serialization, any incremental AOF log generated during this process is appended to the new AOF log file. Once complete, the new file replaces the old AOF log file, finishing the reduction process.
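The rewrite can be sketched in the same toy model: the state at the moment of the fork is turned into the shortest list of commands that rebuilds it, and the commands that arrived while the rewrite was running are appended afterwards. The function names and the dict-based store are assumptions made for illustration; the real rewrite handles every Redis data type.

```python
def rewrite(store):
    """Emit one SET per live key: a minimal log that rebuilds `store`."""
    return [["SET", key, value] for key, value in sorted(store.items())]


def rewrite_with_increment(store_at_fork, commands_during_rewrite):
    """New log = rewritten snapshot + commands received during the rewrite."""
    return rewrite(store_at_fork) + list(commands_during_rewrite)


# State at fork time; in the old log it took four commands to reach it.
store_at_fork = {"a": "2"}
# A write handled by the parent while the child was rewriting.
during = [["SET", "c", "3"]]

new_log = rewrite_with_increment(store_at_fork, during)
print(new_log)  # [['SET', 'a', '2'], ['SET', 'c', '3']]
```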

fsync

AOF logs exist as files, and when the program writes to the AOF log file, the content is actually written to a memory buffer allocated by the kernel for the file descriptor. The kernel then asynchronously flushes the dirty data back to disk.

This means that if the machine suddenly crashes, the AOF log content may not have been fully written to disk, leading to potential data loss. What can be done?

Linux's glibc provides the fsync(int fd) function, which forces the specified file's contents to be flushed from the kernel cache to disk. If Redis called fsync after every write, the AOF log would be safeguarded against loss. However, fsync is a slow disk I/O operation! If Redis were to execute a command and then call fsync each time, its high-performance status would be compromised.

In production environments, Redis typically executes fsync about once per second. The policy is configurable through the appendfsync directive, and the once-per-second setting represents a trade-off between data safety and performance: it keeps performance high while limiting the potential loss to roughly the last second of writes.

Redis also offers two other strategies: one that never calls fsync—leaving the operating system to decide when to sync to disk (not safe)—and one that calls fsync after every command (very slow). However, these options are generally not used in production.
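The three strategies can be sketched with a small append-only writer. This is an illustration under simplifying assumptions, not Redis's implementation: the class and its sync_count attribute are hypothetical, the policy names merely mirror the values of the appendfsync directive (always, everysec, no), and the once-per-second check here runs inline on each append rather than in the background.

```python
import os
import time


class AppendLog:
    """Toy append-only log with a configurable fsync policy."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.file = open(path, "ab")
        self.last_sync = time.monotonic()
        self.sync_count = 0  # how many times fsync was called

    def append(self, line):
        self.file.write(line + b"\n")
        self.file.flush()  # user-space buffer -> kernel page cache
        now = time.monotonic()
        due = self.policy == "everysec" and now - self.last_sync >= 1.0
        if self.policy == "always" or due:
            os.fsync(self.file.fileno())  # kernel page cache -> disk
            self.last_sync = now
            self.sync_count += 1

    def close(self):
        self.file.close()
```

With "always", every append pays for a disk flush; with "no", the data sits in the kernel cache until the operating system decides to write it out.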

Operations and Maintenance

Snapshots involve resource-intensive operations, including traversing the entire memory and writing large amounts of data to disk, which can burden system performance. AOF's fsync is a time-consuming I/O operation that lowers Redis's performance and increases system I/O load.

Typically, Redis's primary nodes do not perform persistence operations; instead, these operations are primarily conducted on replica nodes. Replica nodes are backup nodes and do not face client request pressure, leaving their operating system resources more available.

However, if a network partition occurs and the replica node is disconnected from the primary node for a long time, data inconsistency can arise, especially if the primary node crashes during the partition. To mitigate this risk, it is crucial to monitor the deployment in real time so that network problems are detected and repaired quickly. Adding more replica nodes also reduces the likelihood of data loss: as long as one replica node is fully in sync, the data can be recovered.

Redis 4.0 Hybrid Persistence

When restarting Redis, we rarely use RDB to restore the memory state since it can lead to significant data loss. Instead, we typically rely on AOF log replay. However, AOF log replay is significantly slower than RDB restoration, leading to lengthy startup times in large Redis instances.

To address this, Redis 4.0 introduced a new persistence option—hybrid persistence. This method combines RDB file contents with incremental AOF logs. Here, the AOF log only captures the incremental changes during the time between when persistence starts and ends, which is usually a small amount of data.

Therefore, during a Redis restart, it can first load the RDB content and then replay the incremental AOF log, significantly enhancing restart efficiency compared to relying solely on the full AOF file.
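The load order can be sketched as a check on the first bytes of the file. This sketch assumes that an RDB payload begins with the ASCII signature "REDIS" and that a hybrid file places the RDB content first, followed by the incremental AOF commands; the function names and the returned step descriptions are hypothetical. In Redis 4.0 and later, the behavior is controlled by the aof-use-rdb-preamble configuration directive.

```python
RDB_MAGIC = b"REDIS"  # assumption: RDB payloads begin with this signature


def has_rdb_preamble(path):
    """Return True if the file at `path` starts with the RDB signature."""
    with open(path, "rb") as f:
        return f.read(len(RDB_MAGIC)) == RDB_MAGIC


def load_plan(path):
    """Describe, in order, the steps a loader would take for this file."""
    if has_rdb_preamble(path):
        return ["load RDB preamble", "replay incremental AOF tail"]
    return ["replay full AOF"]
```

A file that starts with plain command text falls back to a full replay, so the same loader can handle both old-style and hybrid files.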

Thought Questions

Some say Redis is only suitable as a cache and not as a database. What are your thoughts on this?
Why does Redis execute commands before logging them in the AOF instead of the reverse, as seen in other storage engines?
