Distributed Locks
Coordinate exclusive access to a shared resource across multiple processes or nodes.
1. Concept Overview
A distributed lock allows only one holder at a time across a cluster. It is used for leader election, preventing duplicate processing, or guarding a critical section (e.g. “only one worker should process this shard”).
Why it exists: In a single process, a mutex is enough. Across many nodes, you need a shared store (e.g. Redis, ZooKeeper, etcd) that guarantees mutual exclusion.
2. Core Principles
Requirements
Mutual exclusion: Only one holder at a time.
Liveness: If holder crashes, lock is eventually released (TTL or ephemeral node).
Safety: No two nodes believe they hold the lock simultaneously (consistency of the lock store).
Implementation patterns
1. Redis (single or Redlock)
SET key unique_value NX PX ttl to acquire; delete the key to release.
Redlock: Use multiple Redis instances and acquire on a majority to tolerate single-node failure.
Caveat: Clock skew and network delays can break safety in edge cases.
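A minimal sketch of the acquire/release pattern above. InMemoryRedis is a hypothetical stand-in for a Redis server so the example is self-contained; real code would use a client such as redis-py (`r.set(key, token, nx=True, px=ttl_ms)`), and the compare-and-delete on release must run as a Lua script so the GET and DEL are atomic:

```python
import time
import uuid

class InMemoryRedis:
    """Hypothetical in-memory stand-in for a Redis server (illustration only)."""
    def __init__(self):
        self.store = {}  # key -> (value, expires_at)

    def set_nx_px(self, key, value, ttl_ms):
        # SET key value NX PX ttl: succeed only if the key is absent or expired.
        now = time.monotonic()
        entry = self.store.get(key)
        if entry and entry[1] > now:
            return False
        self.store[key] = (value, now + ttl_ms / 1000.0)
        return True

    def compare_and_delete(self, key, value):
        # In real Redis this must be a Lua script so GET + DEL is atomic.
        entry = self.store.get(key)
        if entry and entry[0] == value:
            del self.store[key]
            return True
        return False

def acquire(r, resource, ttl_ms=10_000):
    token = str(uuid.uuid4())  # unique per holder; proves ownership at release
    if r.set_nx_px(f"lock:{resource}", token, ttl_ms):
        return token
    return None

def release(r, resource, token):
    # Delete only if we still hold the lock; deleting blindly could remove
    # another holder's lock acquired after our TTL expired.
    return r.compare_and_delete(f"lock:{resource}", token)
```

The unique token is what makes release safe: without it, a holder whose TTL already expired could delete a lock now owned by someone else.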
2. ZooKeeper / etcd
Create an ephemeral sequential node (e.g. /lock/resource-123); the lowest sequence number wins.
If the holder dies, its session ends and the node is removed; the next waiter gets the lock.
Strong consistency (CP) so no two nodes see themselves as leader.
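The "lowest sequence number wins" recipe can be simulated in a few lines. LockQueue below is a hypothetical in-memory stand-in for the children of the /lock parent znode; real code would use a ZooKeeper client such as kazoo and its Lock recipe:

```python
class LockQueue:
    """Simulates the children of a /lock znode as (sequence, session) pairs."""
    def __init__(self):
        self.next_seq = 0
        self.children = {}  # sequence number -> session id

    def create_ephemeral_sequential(self, session_id):
        # Each contender creates an ephemeral sequential node; ZooKeeper
        # assigns a monotonically increasing sequence number.
        seq = self.next_seq
        self.next_seq += 1
        self.children[seq] = session_id
        return seq

    def holder(self):
        # The session owning the lowest-numbered child holds the lock.
        return self.children[min(self.children)] if self.children else None

    def session_expired(self, session_id):
        # Ephemeral nodes vanish when their session ends; the next lowest
        # sequence number then becomes the holder.
        self.children = {s: o for s, o in self.children.items() if o != session_id}
```

In the real recipe each waiter watches only the node immediately before its own, so a release wakes exactly one waiter instead of the whole herd.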
3. Database
SELECT ... FOR UPDATE or "insert a unique row" with retry.
Works, but adds DB load and depends on DB availability.
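The "insert unique row" variant can be sketched with SQLite standing in for the shared database (table and function names are illustrative). A UNIQUE constraint on the resource column means only one owner's INSERT can succeed; everyone else gets a constraint violation and retries later:

```python
import sqlite3

def try_acquire(conn, resource, owner):
    # Only one INSERT per resource can succeed thanks to the PRIMARY KEY;
    # a losing contender gets IntegrityError and should retry with backoff.
    try:
        conn.execute("INSERT INTO locks(resource, owner) VALUES (?, ?)",
                     (resource, owner))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        return False

def release(conn, resource, owner):
    # Delete only our own row, never another owner's lock.
    conn.execute("DELETE FROM locks WHERE resource = ? AND owner = ?",
                 (resource, owner))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locks(resource TEXT PRIMARY KEY, owner TEXT)")
```

A production version would also store an expiry timestamp in the row, since a crashed holder otherwise leaves the lock stuck until manual cleanup.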
Architecture (ZooKeeper-style)
3. Real-World Usage
Leader election: One active instance (e.g. cron runner, consumer group coordinator); ZooKeeper used by Kafka, HBase.
Scheduled jobs: Only one node runs the job at a time (e.g. Redis lock with TTL).
Resource guard: Only one worker processes a given partition or queue.
Critical section: E.g. “single writer to this file” across multiple servers.
4. Trade-offs
Redis (single): Pros — fast, simple. Cons — single point of failure; no strong consistency guarantee.
Redlock: Pros — tolerates a Redis node failure. Cons — complex; safety under clock/network issues is still debated.
ZooKeeper / etcd: Pros — strong consistency; ephemeral nodes. Cons — heavier; more latency; operational complexity.
DB: Pros — uses existing infra. Cons — DB becomes a bottleneck; not ideal for high contention.
When to use: Leader election, a single-active job, or guarding a shared resource across nodes.
When not to: Prefer stateless design or message ordering (e.g. a single consumer per partition) to avoid lock contention entirely.
5. Failure Scenarios
Holder crashes before release: TTL (Redis) or ephemeral node (ZooKeeper) ensures the lock is eventually released.
Clock skew (Redis): Prefer ZooKeeper/etcd for critical locks, or use Redlock with caution.
Lock store down: No new acquisitions; existing holders may not know. Design for "at most one" and idempotent work.
Split brain: Use a CP store (ZooKeeper, etcd) so only one partition can grant locks.
6. Performance Considerations
Latency: Acquire/release involves network round-trips; use short critical sections.
Contention: High contention increases latency and load on lock store; shard resources or use per-resource locks.
TTL: Too short → premature release; too long → slow recovery after crash. Match to max expected hold time.
7. Implementation Patterns
Try-lock with backoff: Retry with exponential backoff if lock is busy; avoid thundering herd.
Renewal (lease): If work can outlast TTL, renew lock periodically in a background thread; stop renewing when done.
Fencing tokens: Store (e.g. in etcd) an increasing token with the lock; resource checks token and rejects stale holders (prevents two “leaders” after split-brain).
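A minimal sketch of the backoff and fencing-token patterns above, with hypothetical names. In a real system the fencing token would come from the lock service itself (e.g. etcd's revision number), monotonically increasing with each successful acquisition:

```python
import random
import time

def acquire_with_backoff(try_lock, attempts=5, base=0.05, cap=1.0):
    """Retry a non-blocking try_lock with exponential backoff plus jitter,
    so contending workers don't all retry at the same instant."""
    for i in range(attempts):
        token = try_lock()
        if token is not None:
            return token
        # Full jitter: sleep a random time up to the capped backoff.
        time.sleep(random.uniform(0, min(cap, base * 2 ** i)))
    return None

class FencedResource:
    """A resource that rejects writes carrying a stale fencing token,
    so a paused or partitioned ex-holder cannot corrupt state."""
    def __init__(self):
        self.highest_seen = -1
        self.writes = []

    def write(self, fencing_token, data):
        if fencing_token <= self.highest_seen:
            return False  # stale holder (e.g. GC pause past its TTL): reject
        self.highest_seen = fencing_token
        self.writes.append(data)
        return True
```

The fencing check lives in the resource, not the lock: even if two nodes briefly both believe they are the leader, only the one with the newest token can write.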
Quick Revision
Use cases: Leader election, single-active job, guarding shared resource.
Redis: Simple, TTL for liveness; Redlock for multi-node. ZooKeeper/etcd: Ephemeral nodes, strong consistency.
Failure: TTL or ephemeral to release on crash; prefer CP store for safety; fencing token for critical resources.
Interview: “We use ZooKeeper for leader election so only one instance runs the scheduler; if the leader dies, the ephemeral node goes away and another instance takes over. For less critical cases we use Redis with a short TTL.”