Distributed Lock

When distributed applications perform logical processing, they often encounter concurrency issues.

For instance, an operation that modifies a user's status must first read the current status, modify it in memory, and then write it back. If two such operations run concurrently, a race arises because the read-modify-write sequence is not atomic. (Wikipedia's definition: an atomic operation is one that cannot be interrupted by the thread scheduler; once started, it runs to completion without any context switch.)
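
A minimal sketch of the race, assuming redis-py and a hypothetical per-user status key (names are illustrative):

python
import redis

client = redis.StrictRedis()

def set_status_flag(user_id):
    key = "status:{}".format(user_id)
    # Read the current status, modify it in memory, then write it back.
    # Two concurrent callers can both read the same old value, and one
    # update is silently lost -- the three steps are not atomic.
    status = int(client.get(key) or 0)  # read
    status |= 0x01                      # modify in memory (set one flag bit)
    client.set(key, status)             # write back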

At this point, distributed locks are needed to restrict concurrent execution of the program. Redis distributed locks are widely used and are an important interview topic. Many people are aware of this concept and generally understand the principle of distributed locks, but the specifics of implementation are often not entirely correct.

The ultimate goal of a distributed lock is to occupy a "pit" in Redis: when another process tries to take the spot and finds it already occupied, it must either give up or retry later.

The pit is typically occupied with the setnx (set if not exists) command, so that only one client can claim it. First come, first served; when the work is done, the del command is called to release the pit.

bash
> setnx lock:codehole true
(integer) 1
... do something critical ...
> del lock:codehole
(integer) 1

However, if an exception occurs during the operation, it might prevent the del command from being called, resulting in a deadlock where the lock is never released.

Thus, after obtaining the lock, we should set an expiration time for the lock, such as 5 seconds, ensuring that even if an exception occurs, the lock will be automatically released after that duration.

bash
> setnx lock:codehole true
(integer) 1
> expire lock:codehole 5
... do something critical ...
> del lock:codehole
(integer) 1

But this logic has another issue. If the server process crashes between the setnx and expire commands—perhaps due to a power failure or being manually terminated—then expire won't execute, leading to a deadlock.

The root cause is that setnx and expire are two separate commands rather than a single atomic instruction. If they could be executed together as one, the problem would disappear. You might think of using a Redis transaction, but that does not help here: expire should only run if setnx actually acquires the lock, and transactions have no if-else branching. All commands queued under MULTI are executed together by EXEC, regardless of one another's results.
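
To see why, here is a sketch assuming redis-py (not the author's code): in a MULTI/EXEC transaction both commands are queued and then executed unconditionally.

python
import redis

client = redis.StrictRedis()

# MULTI ... EXEC queues both commands and runs them together, but there is
# no way to make expire conditional on setnx having succeeded.
pipe = client.pipeline(transaction=True)
pipe.setnx("lock:codehole", "true")
pipe.expire("lock:codehole", 5)
print(pipe.execute())  # e.g. [False, True]: the lock was not acquired,
                       # yet the current holder's TTL was still reset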

To address this dilemma, the Redis open-source community has produced various libraries for distributed locks specifically to tackle this issue. Implementing them can be complex, often requiring considerable effort for newcomers to understand. If you need to use a distributed lock, it means you cannot just use Jedis or redis-py; you also need to introduce a distributed lock library.

To end this confusion, Redis 2.6.12 extended the set command with additional options, allowing setnx and expire to be executed as a single command and completely resolving the chaos around distributed locks. Since then, all third-party distributed lock libraries can take a break.

bash
> set lock:codehole true ex 5 nx
OK
... do something critical ...
> del lock:codehole
(integer) 1

This command combines setnx and expire into an atomic instruction, which is the essence of distributed locks.
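
The same atomic acquire can be issued from client code in a single call; for example, with redis-py (a sketch reusing the key name above):

python
import redis

client = redis.StrictRedis()

# Equivalent of: set lock:codehole true ex 5 nx
if client.set("lock:codehole", "true", nx=True, ex=5):
    try:
        pass  # ... do something critical ...
    finally:
        client.delete("lock:codehole")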

Timeout Issues

A Redis distributed lock cannot by itself solve the timeout problem. If the logic between acquiring and releasing the lock runs longer than the lock's expiration time, trouble follows: the first thread's lock may expire before its critical section finishes, letting a second thread acquire the lock early, so the critical section is no longer strictly serialized.

To avoid this, Redis distributed locks should not be used for long-running tasks. If a timeout does occasionally slip through, the resulting small data inconsistency may need manual intervention to fix.

python
tag = random.randint(0, 2 ** 31 - 1)  # random token
if redis.set(key, tag, nx=True, ex=5):
    do_something()
    redis.delifequals(key, tag)  # Imaginary delifequals command

A somewhat safer approach is to set a random value as the value of the set command. When releasing the lock, the thread first checks that the random value still matches before deleting the key. This ensures the current thread's lock cannot be released by another thread, unless the lock has already expired and been released automatically by the server. However, checking the value and deleting the key are not atomic, and Redis provides no delifequals command, so a Lua script is needed: Redis executes a Lua script atomically, making the get-compare-delete sequence behave as a single operation.

lua
-- delifequals
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
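
As a usage sketch (assuming redis-py; the key name and tag are illustrative), the script can be registered once and then invoked atomically when releasing the lock:

python
import random
import redis

client = redis.StrictRedis()

# register_script wraps the Lua script; Redis executes it atomically.
delifequals = client.register_script("""
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
""")

tag = str(random.randint(0, 2 ** 31 - 1))
if client.set("lock:codehole", tag, nx=True, ex=5):
    try:
        pass  # ... do something critical ...
    finally:
        delifequals(keys=["lock:codehole"], args=[tag])  # delete only if the tag still matches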

However, this is not a perfect solution; it is just relatively safer. If a timeout occurs and the current thread's logic has not completed, other threads may still take advantage of the situation.

Reentrancy

Reentrancy means that a thread may request a lock it already holds. If a lock supports multiple acquisitions by the same thread, it is reentrant; Java's ReentrantLock is one example. To make a Redis distributed lock reentrant, the client's set method must be wrapped so that thread-local storage keeps a per-thread count of the locks held.

python
# -*- coding: utf-8 -*-
import redis
import threading

# Per-thread storage for lock reference counts; subclassing threading.local
# ensures each thread gets its own independent dict.
class _Locks(threading.local):
    def __init__(self):
        self.redis = {}

locks = _Locks()

def key_for(user_id):
    return "account_{}".format(user_id)

def _lock(client, key):
    return bool(client.set(key, "true", nx=True, ex=5))  # store a string value; redis-py does not accept bool

def _unlock(client, key):
    client.delete(key)

def lock(client, user_id):
    key = key_for(user_id)
    if key in locks.redis:
        locks.redis[key] += 1
        return True
    ok = _lock(client, key)
    if not ok:
        return False
    locks.redis[key] = 1
    return True

def unlock(client, user_id):
    key = key_for(user_id)
    if key in locks.redis:
        locks.redis[key] -= 1
        if locks.redis[key] <= 0:
            del locks.redis[key]
            _unlock(client, key)
        return True
    return False

client = redis.StrictRedis()
print("lock", lock(client, "codehole"))
print("lock", lock(client, "codehole"))
print("unlock", unlock(client, "codehole"))
print("unlock", unlock(client, "codehole"))

This is still not a complete reentrant lock. A rigorous implementation would also have to account for the lock's expiration in the in-memory count (if the key expires in Redis, the local count becomes stale), which adds further complexity. Reentrant locks are generally not recommended because they complicate the client; by restructuring the business logic, you can usually avoid needing one. Here is the equivalent Java version.

java
public class RedisWithReentrantLock {
  private ThreadLocal<Map<String, Integer>> lockers = new ThreadLocal<>();
  private Jedis jedis;

  public RedisWithReentrantLock(Jedis jedis) {
    this.jedis = jedis;
  }

  private boolean _lock(String key) {
    return jedis.set(key, "", "nx", "ex", 5L) != null;
  }

  private void _unlock(String key) {
    jedis.del(key);
  }

  private Map<String, Integer> currentLockers() {
    Map<String, Integer> refs = lockers.get();
    if (refs != null) {
      return refs;
    }
    lockers.set(new HashMap<>());
    return lockers.get();
  }

  public boolean lock(String key) {
    Map<String, Integer> refs = currentLockers();
    Integer refCnt = refs.get(key);
    if (refCnt != null) {
      refs.put(key, refCnt + 1);
      return true;
    }
    boolean ok = this._lock(key);
    if (!ok) {
      return false;
    }
    refs.put(key, 1);
    return true;
  }

  public boolean unlock(String key) {
    Map<String, Integer> refs = currentLockers();
    Integer refCnt = refs.get(key);
    if (refCnt == null) {
      return false;
    }
    refCnt -= 1;
    if (refCnt > 0) {
      refs.put(key, refCnt);
    } else {
      refs.remove(key);
      this._unlock(key);
    }
    return true;
  }

  public static void main(String[] args) {
    Jedis jedis = new Jedis();
    RedisWithReentrantLock redis = new RedisWithReentrantLock(jedis);
    System.out.println(redis.lock("codehole"));
    System.out.println(redis.lock("codehole"));
    System.out.println(redis.unlock("codehole"));
    System.out.println(redis.unlock("codehole"));
  }
}

The difference from the Python version is minimal; it is also based on ThreadLocal and reference counting.
