Appearance
Bitmap Data Structure {#bitmap-data-structure
In our daily development process, we often need to store boolean data, such as a user's check-in record for a year, where checked in is represented by 1 and not checked in by 0, requiring a record for 365 days. If we use a regular key/value storage, each user would need to record 365 entries, which would require an astonishing amount of storage space when the number of users exceeds one hundred million.
To solve this problem, Redis provides the bitmap data structure, which only occupies a single bit for each day's check-in record. Thus, 365 days can be stored in just 365 bits, requiring only 46 bytes (approximately the size of a slightly longer string), significantly saving storage space.
A bitmap is not a special data structure; its content is essentially a regular string, or a byte array. We can use standard get
/set
commands to retrieve and set the entire bitmap's content, or use bitmap operations like getbit
/setbit
to treat the byte array as a "bit array."
When we need to count monthly active users, we need to deduplicate and use a set to record all active user IDs, which consumes a lot of memory. At this point, we can consider using a bitmap to mark user activity status. Each user will occupy a specific position in this bitmap, where 0 indicates inactivity and 1 indicates activity. At the end of the month, we can simply traverse the bitmap to get the monthly active user count. However, this method has conditions: the user IDs must be consecutive integers, and the activity rate should be relatively high; otherwise, it may not be worth the effort.
This section may seem a bit tedious. If readers feel a bit confused, that is normal, and they can skip ahead to the next section. Based on my experience, few candidates have experience using Redis bitmaps in interviews, so if you have some understanding of Redis bitmaps, it will be a bonus point in your interview.
Basic Usage
The bit array in Redis expands automatically. If a certain offset position exceeds the current content range, it will automatically zero-fill the bit array.
Next, we will use bit operations to set the string to "hello" (instead of using the set
command directly). First, we need to obtain the ASCII codes of "hello," which can easily be done using the Python command line to get the binary values of each character's ASCII code.
python
>>> bin(ord('h'))
'0b1101000' # High -> Low
>>> bin(ord('e'))
'0b1100101'
>>> bin(ord('l'))
'0b1101100'
>>> bin(ord('l'))
'0b1101100'
>>> bin(ord('o'))
'0b1101111'
Next, we use redis-cli
to set the first character, which is the first 8 bits of the bit array. We only need to set the bits with a value of 1, as shown above. The character 'h' requires bits 1, 2, and 4 to be set, while 'e' requires bits 9, 10, 13, and 15. It is important to note that the order of the bit array is the opposite of the bit order of the characters.
bash
127.0.0.1:6379> setbit s 1 1
(integer) 0
127.0.0.1:6379> setbit s 2 1
(integer) 0
127.0.0.1:6379> setbit s 4 1
(integer) 0
127.0.0.1:6379> setbit s 9 1
(integer) 0
127.0.0.1:6379> setbit s 10 1
(integer) 0
127.0.0.1:6379> setbit s 13 1
(integer) 0
127.0.0.1:6379> setbit s 15 1
(integer) 0
127.0.0.1:6379> get s
"he"
This example can be understood as "zero deposit, whole withdraw," and similarly, we can also have "zero deposit, zero withdraw" and "whole deposit, zero withdraw." "Zero deposit" means using setbit
to set the bit values one by one, while "whole deposit" means filling the entire bit array at once with a string, overwriting the old value.
Zero Deposit, Zero Withdraw
bash
127.0.0.1:6379> setbit w 1 1
(integer) 0
127.0.0.1:6379> setbit w 2 1
(integer) 0
127.0.0.1:6379> setbit w 4 1
(integer) 0
127.0.0.1:6379> getbit w 1 # Get the value at a specific position 0/1
(integer) 1
127.0.0.1:6379> getbit w 2
(integer) 1
127.0.0.1:6379> getbit w 4
(integer) 1
127.0.0.1:6379> getbit w 5
(integer) 0
Whole Deposit, Zero Withdraw
bash
127.0.0.1:6379> set w h # Whole deposit
(integer) 0
127.0.0.1:6379> getbit w 1
(integer) 1
127.0.0.1:6379> getbit w 2
(integer) 1
127.0.0.1:6379> getbit w 4
(integer) 1
127.0.0.1:6379> getbit w 5
(integer) 0
If the byte corresponding to a bit is a non-printable character, redis-cli
will display the hexadecimal representation of that character.
bash
127.0.0.1:6379> setbit x 0 1
(integer) 0
127.0.0.1:6379> setbit x 1 1
(integer) 0
127.0.0.1:6379> get x
"\xc0"
Statistics and Search
Redis provides bitmap statistics command bitcount
and bitmap search command bitpos
. bitcount
is used to count the number of 1s in a specified range, while bitpos
is used to find the first occurrence of 0 or 1 in a specified range.
For example, we can use bitcount
to count how many days a user has checked in, and we can use the bitpos
command to find out the first day the user checked in. If we specify the range parameters [start, end]
, we can count how many days the user checked in during a certain time period, or from a specific day onward.
Unfortunately, the start
and end
parameters are byte indexes, which means the specified bit range must be a multiple of 8 and cannot be arbitrary. This is puzzling, and I don't quite understand why Antirez designed it this way. Because of this design, we cannot directly calculate how many days a user checked in within a certain month; instead, we must retrieve the entire byte content covering that month (using getrange
to extract a substring) and then perform the counting in memory, which is quite cumbersome.
Next, let's try using the bitcount
and bitpos
commands:
bash
127.0.0.1:6379> set w hello
OK
127.0.0.1:6379> bitcount w
(integer) 21
127.0.0.1:6379> bitcount w 0 0 # Number of 1s in the first character
(integer) 3
127.0.0.1:6379> bitcount w 0 1 # Number of 1s in the first two characters
(integer) 7
127.0.0.1:6379> bitpos w 0 # First position of 0
(integer) 0
127.0.0.1:6379> bitpos w 1 # First position of 1
(integer) 1
127.0.0.1:6379> bitpos w 1 1 1 # First position of 1 starting from the second character
(integer) 9
127.0.0.1:6379> bitpos w 1 2 2 # First position of 1 starting from the third character
(integer) 17
Magic Command: Bitfield
In the previous sections, we set (using setbit
) and got (using getbit
) values for individual bits. To operate on multiple bits at once, we previously had to use pipelining. However, starting from version 3.2, Redis introduced a powerful command that allows multiple bit operations without the need for pipelining.
The bitfield
command has three subcommands: get
, set
, and incrby
, which can read and write to specified bit segments but can handle a maximum of 64 consecutive bits. If you exceed this limit, multiple subcommands must be used. This command allows you to execute several subcommands in one go.
Let's look at a simple example:
bash
127.0.0.1:6379> set w hello
OK
127.0.0.1:6379> bitfield w get u4 0 # Get 4 bits starting from the first; result is unsigned (u)
(integer) 6
127.0.0.1:6379> bitfield w get u3 2 # Get 3 bits starting from the third; result is unsigned (u)
(integer) 5
127.0.0.1:6379> bitfield w get i4 0 # Get 4 bits starting from the first; result is signed (i)
1) (integer) 6
127.0.0.1:6379> bitfield w get i3 2 # Get 3 bits starting from the third; result is signed (i)
1) (integer) -3
A signed number means that the first bit is the sign bit; if it's 1, the number is negative. An unsigned number has no sign bit and represents only non-negative values. Redis can retrieve up to 64 bits for signed numbers and 63 bits for unsigned numbers.
Next, we can execute multiple subcommands at once:
bash
127.0.0.1:6379> bitfield w get u4 0 get u3 2 get i4 0 get i3 2
1) (integer) 6
2) (integer) 5
3) (integer) 6
4) (integer) -3
Next, let's use the set
subcommand to change the second character 'e' to 'a' (ASCII code 97), which returns the old value:
bash
127.0.0.1:6379> bitfield w set u8 8 97 # From the 9th bit, replace the next 8 bits with unsigned number 97
1) (integer) 101
127.0.0.1:6379> get w
"hallo"
The incrby
subcommand allows for incrementing bits in a specified range. Overflow may occur with positive increments (overflow up) or negative increments (underflow). Redis defaults to wrapping; if overflow occurs, it discards the sign bit. For example, if an 8-bit unsigned number 255 is incremented by 1, it will wrap around to 0.
Let's practice the incrby
subcommand:
bash
127.0.0.1:6379> set w hello
OK
127.0.0.1:6379> bitfield w incrby u4 2 1 # Increment the next 4 bits from the third by 1
1) (integer) 11
Continuing this, we can see how overflow affects the values:
bash
127.0.0.1:6379> bitfield w incrby u4 2 1
1) (integer) 12
127.0.0.1:6379> bitfield w incrby u4 2 1
1) (integer) 13
127.0.0.1:6379> bitfield w incrby u4 2 1
1) (integer) 14
127.0.0.1:6379> bitfield w incrby u4 2 1
1) (integer) 15
127.0.0.1:6379> bitfield w incrby u4 2 1 # Overflow wraps
1) (integer) 0
The bitfield
command provides an overflow
subcommand, allowing users to choose overflow behavior. The default is wrap; you can also choose fail (report error without executing) or saturate (stop at maximum or minimum). This only affects the first subsequent command.
Let's see both behaviors:
Saturation (SAT)
bash
127.0.0.1:6379> set w hello
OK
127.0.0.1:6379> bitfield w overflow sat incrby u4 2 1
1) (integer) 11
127.0.0.1:6379> bitfield w overflow sat incrby u4 2 1
1) (integer) 12
127.0.0.1:6379> bitfield w overflow sat incrby u4 2 1
1) (integer) 13
127.0.0.1:6379> bitfield w overflow sat incrby u4 2 1
1) (integer) 14
127.0.0.1:6379> bitfield w overflow sat incrby u4 2 1
1) (integer) 15
127.0.0.1:6379> bitfield w overflow sat incrby u4 2 1 # Keeps at maximum
1) (integer) 15
Fail (FAIL)
bash
127.0.0.1:6379> set w hello
OK
127.0.0.1:6379> bitfield w overflow fail incrby u4 2 1
1) (integer) 11
127.0.0.1:6379> bitfield w overflow fail incrby u4 2 1
1) (integer) 12
127.0.0.1:6379> bitfield w overflow fail incrby u4 2 1
1) (integer) 13
127.0.0.1:6379> bitfield w overflow fail incrby u4 2 1
1) (integer) 14
127.0.0.1:6379> bitfield w overflow fail incrby u4 2 1
1) (integer) 15
127.0.0.1:6379> bitfield w overflow fail incrby u4 2 1 # Does not execute
1) (nil)
Thoughts & Homework
In this article, we used bit operations to set two characters "he." Please use bit operations to set all five characters of the word "hello." The bitfield
command can execute multiple set/get/incrby
subcommands simultaneously; please try to complete this task.