Appearance
GeoHash
Redis introduced the GEO module starting from version 3.2, which means we can use Redis to implement features like "nearby Mobikes" for Mobike and "nearby restaurants" for Meituan and Ele.me.
Using a Database to Calculate Nearby People
Geographic location data is represented using two-dimensional latitude and longitude coordinates, with longitude ranging from (-180, 180] and latitude from (-90, 90]. Positive latitude is north of the equator and negative is south; positive longitude is east of the Prime Meridian (Greenwich Observatory in the UK) and negative is west. For example, the coordinates for the Juejin office in Wangjing SOHO are (116.48105, 39.996794), both positive, as China is located in the northeastern hemisphere.
When the distance between two elements is not very far, we can directly use the Pythagorean theorem to calculate the distance between the elements. The "nearby people" feature we commonly use involves small distances, so the Pythagorean theorem is sufficient for distance calculations. However, it's important to note that the density of latitude and longitude coordinates is not uniform (the Earth is an ellipse). When calculating the square difference with the Pythagorean theorem, a weighted sum based on certain coefficients is needed. If exact precision is not required, weighting can be skipped.
Question: The longitude covers 360 degrees, while the latitude covers only 180 degrees. Why is the distance density not 2:1?
Now, if we want to calculate "nearby people," meaning given a coordinate of an element, we need to calculate other elements nearby and sort them by distance. How should we approach this?
If the elements' latitude and longitude coordinates are stored in a relational database (element id, longitude x, latitude y), how would you calculate it?
First, you cannot calculate the distance between all elements and the target element by brute force, as the computation load would be too high, and performance would certainly suffer. The typical approach is to limit the number of elements by defining a rectangular area, and then calculate the distances for elements within that area and sort them. This significantly reduces the computational load. How do we define the rectangular area? We can specify a radius r and use a SQL query to select elements within that radius. If users are not satisfied with the filtered results, they can expand the radius for further filtering.
sql
SELECT id FROM positions WHERE x0-r < x < x0+r AND y0-r < y < y0+r
To achieve high performance with the rectangular area algorithm, the data table should have a bi-directional composite index on the latitude and longitude coordinates (x, y) to optimize query performance.
However, database query performance is limited. If "nearby people" queries are very frequent, especially in high-concurrency scenarios, this may not be the best solution.
GeoHash Algorithm
A widely used geographic distance sorting algorithm in the industry is the GeoHash algorithm, which Redis also uses. The GeoHash algorithm maps two-dimensional latitude and longitude data to a one-dimensional integer, meaning all elements are mapped onto a single line, and the distances between points that are close in two-dimensional coordinates will also be close when mapped to one dimension. When calculating "nearby people," we first map the target location onto this line, then retrieve nearby points from this one-dimensional space.
So, how does this mapping algorithm work? It treats the entire Earth as a two-dimensional plane, dividing it into a series of square grids, similar to a Go board. All map element coordinates are placed within a unique grid. The smaller the grid, the more precise the coordinates. These grids are then integer-encoded, with closer grids receiving closer codes. A simple method for encoding is the "cake cutting" method. Imagine a square cake in front of you; a cut divides it into four smaller squares, labeled as 00, 01, 10, and 11 in binary. Each small square can be further cut, with each further division allowing the squares to be represented by increasingly longer binary integers, improving precision.
In the above example, a two-cut method is used, but in real algorithms, there are many other cutting methods, resulting in different encoded integers.
After encoding, each map element's coordinates become an integer, from which the coordinates can be restored; the longer the integer, the less loss there is in the restored coordinate values. For the "nearby people" feature, minor precision loss can be ignored.
The GeoHash algorithm then performs a base32 encoding (0-9, a-z excluding the letters a, i, l, o) to convert the integer into a string. In Redis, latitude and longitude are encoded as a 52-bit integer, placed in a sorted set (zset), where the value is the element's key and the score is the 52-bit GeoHash integer. Although the zset score is a floating-point number, it can store 52-bit integers without loss.
When using Redis for Geo queries, we must remember that its internal structure is essentially a zset (skiplist). By sorting based on zset scores, we can find other elements near the coordinates (though actual situations are more complex, this understanding is sufficient). By converting the scores back to coordinate values, we can obtain the original coordinates of the elements.
Basic Usage of Redis Geo Commands
Redis provides only six Geo commands, which are easy to master. Users should remember that it is simply a regular zset structure.
Adding Elements
The geoadd
command takes the set name and multiple longitude-latitude-name triples. Note that multiple triples can be added.
plaintext
127.0.0.1:6379> geoadd company 116.48105 39.996794 juejin
(integer) 1
127.0.0.1:6379> geoadd company 116.514203 39.905409 ireader
(integer) 1
127.0.0.1:6379> geoadd company 116.489033 40.007669 meituan
(integer) 1
127.0.0.1:6379> geoadd company 116.562108 39.787602 jd 116.334255 40.027400 xiaomi
(integer) 2
You may wonder why Redis does not provide a geo delete command. As mentioned earlier, the geo storage structure uses zset, which means we can use zset-related commands to manipulate geo data, so the delete command can simply use zrem
.
Calculating Distances
The geodist
command calculates the distance between two elements, taking the set name, two names, and the distance unit.
plaintext
127.0.0.1:6379> geodist company juejin ireader km
"10.5501"
127.0.0.1:6379> geodist company juejin meituan km
"1.3878"
127.0.0.1:6379> geodist company juejin jd km
"24.2739"
127.0.0.1:6379> geodist company juejin xiaomi km
"12.9606"
127.0.0.1:6379> geodist company juejin juejin km
"0.0000"
We can see that Juejin is closest to Meituan since they are both in Wangjing. The distance unit can be m, km, ml, or ft, representing meters, kilometers, miles, and feet, respectively.
Getting Element Positions
The geopos
command retrieves the latitude and longitude coordinates of any element in the set and can fetch multiple at once.
plaintext
127.0.0.1:6379> geopos company juejin
1) 1) "116.48104995489120483"
2) "39.99679348858259686"
127.0.0.1:6379> geopos company ireader
1) 1) "116.5142020583152771"
2) "39.90540918662494363"
127.0.0.1:6379> geopos company juejin ireader
1) 1) "116.48104995489120483"
2) "39.99679348858259686"
2) 1) "116.5142020583152771"
2) "39.90540918662494363"
We observe a slight error in the retrieved latitude and longitude coordinates compared to the coordinates added with geoadd
. This is because the geohash's one-dimensional mapping of two-dimensional coordinates is lossy, resulting in small discrepancies when restoring the values. For the "nearby people" feature, this error is negligible.
Getting Element's Hash Value
The geohash
command retrieves the base32 encoded string of the element's latitude and longitude. You can use this encoded value at geohash.org/${hash} for standard geohash encoding.
plaintext
127.0.0.1:6379> geohash company ireader
1) "wx4g52e1ce0"
127.0.0.1:6379> geohash company juejin
1) "wx4gd94yjn0"
Let’s visit the link geohash.org/wx4g52e1ce0...
It’s an accurate location.
Nearby Companies
The georadiusbymember
command is the most crucial command, allowing us to query other elements near a specified element. Its parameters are quite complex.
- **Within a 20 km radius, up to 3 elements sorted by distance (
including itself)😗*
plaintext
127.0.0.1:6379> georadiusbymember company ireader 20 km count 3 asc
1) "ireader"
2) "juejin"
3) "meituan"
- Within a 20 km radius, up to 3 elements sorted in reverse order:
plaintext
127.0.0.1:6379> georadiusbymember company ireader 20 km count 3 desc
1) "jd"
2) "meituan"
3) "juejin"
- Three optional parameters:
withcoord
,withdist
,withhash
for additional information.withdist
is very useful as it displays distances.
plaintext
127.0.0.1:6379> georadiusbymember company ireader 20 km withcoord withdist withhash count 3 asc
1) 1) "ireader"
2) "0.0000"
3) (integer) 4069886008361398
4) 1) "116.5142020583152771"
2) "39.90540918662494363"
2) 1) "juejin"
2) "10.5501"
3) (integer) 4069887154388167
4) 1) "116.48104995489120483"
2) "39.99679348858259686"
3) 1) "meituan"
2) "11.5748"
3) (integer) 4069887179083478
4) 1) "116.48903220891952515"
2) "40.00766997707732031"
In addition to georadiusbymember
, Redis also provides a command to query nearby elements based on coordinate values, which is even more useful for calculating "nearby cars" or "nearby restaurants" based on user location. The parameters are similar to georadiusbymember
, except the target element is now latitude and longitude coordinates.
plaintext
127.0.0.1:6379> georadius company 116.514202 39.905409 20 km withdist count 3 asc
1) 1) "ireader"
2) "0.0000"
2) 1) "juejin"
2) "10.5501"
3) 1) "meituan"
2) "11.5748"
Summary & Notes
In a mapping application, the data for vehicles, restaurants, and people may reach millions or tens of millions of records. If using Redis's Geo data structure, they can all be placed in a single zset collection. In a Redis cluster environment, the collection may migrate from one node to another. If the data for a single key is too large, it can significantly impact the migration process, so the data volume per single key should not exceed 1M; otherwise, it can lead to delays in cluster migration and affect normal online services.
Thus, it is recommended to deploy Geo data using a separate Redis instance instead of in a clustered environment.
If the data volume exceeds hundreds of millions or more, Geo data needs to be split, by country, province, or city, and even by district in very populous cities. This significantly reduces the size of individual zset collections.