
Distributed Locks

Coordinate exclusive access to a shared resource across multiple processes or nodes.


1. Concept Overview

A distributed lock allows only one holder at a time across a cluster. It is used for leader election, preventing duplicate processing, or guarding a critical section (e.g. “only one worker should process this shard”).

Why it exists: In a single process, a mutex is enough. Across many nodes, you need a shared store (e.g. Redis, ZooKeeper, etcd) that guarantees mutual exclusion.


2. Core Principles

Requirements

  • Mutual exclusion: Only one holder at a time.

  • Liveness: If holder crashes, lock is eventually released (TTL or ephemeral node).

  • Safety: No two nodes believe they hold the lock simultaneously (consistency of the lock store).

Implementation patterns

1. Redis (single or Redlock)

  • SET key unique_value NX PX ttl to acquire; to release, delete the key only if it still holds your unique value (an unconditional delete can remove a lock that someone else re-acquired after your TTL expired).

  • Redlock: Use multiple Redis instances and acquire on majority to tolerate single-node failure.

  • Caveat: Clock skew and network delays can break safety in edge cases.
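The SET NX PX / check-then-delete semantics above can be sketched in Python. This is a minimal in-memory stand-in for the two Redis commands (`FakeRedis` and the key name are made up for illustration); real code would talk to an actual Redis server and use a Lua script so the compare-and-delete release is atomic.

```python
import time
import uuid

class FakeRedis:
    """In-memory stand-in for the two Redis commands the lock needs."""
    def __init__(self):
        self.store = {}  # key -> (value, expires_at)

    def set_nx_px(self, key, value, ttl_ms):
        # SET key value NX PX ttl: succeed only if key is absent or expired.
        entry = self.store.get(key)
        if entry and entry[1] > time.monotonic():
            return False
        self.store[key] = (value, time.monotonic() + ttl_ms / 1000)
        return True

    def delete_if_value(self, key, value):
        # Release compares the unique value before deleting, so a holder
        # whose TTL already expired cannot delete someone else's lock.
        # Against real Redis this needs a Lua script to be atomic.
        entry = self.store.get(key)
        if entry and entry[0] == value:
            del self.store[key]
            return True
        return False

def acquire(r, key, ttl_ms=5000):
    token = str(uuid.uuid4())  # unique per holder
    return token if r.set_nx_px(key, token, ttl_ms) else None

r = FakeRedis()
a = acquire(r, "lock:shard-7")
b = acquire(r, "lock:shard-7")          # second caller is rejected
print(a is not None, b is None)         # True True
r.delete_if_value("lock:shard-7", a)    # only the holder's token releases
c = acquire(r, "lock:shard-7")
print(c is not None)                    # True
```

The unique value per holder is what makes release safe: without it, a slow holder could delete a lock that the TTL had already handed to another node.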

2. ZooKeeper / etcd

  • Create ephemeral node (e.g. /lock/resource-123). Lowest sequence number wins.

  • If holder dies, session ends and node is removed; next waiter gets the lock.

  • Strong consistency (CP) so no two nodes see themselves as leader.
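The "lowest sequence number wins" rule can be sketched with a toy in-memory lock directory (class and session names are invented for illustration; production code would use the real ZooKeeper lock recipe, e.g. via kazoo, rather than this simulation):

```python
class FakeLockDir:
    """In-memory sketch of a ZooKeeper-style lock directory: each
    contender creates an ephemeral sequential node; the node with
    the lowest sequence number holds the lock."""
    def __init__(self):
        self.next_seq = 0
        self.nodes = {}  # session -> sequence number

    def create_ephemeral_sequential(self, session):
        self.nodes[session] = self.next_seq
        self.next_seq += 1

    def holder(self):
        # Lowest sequence number wins.
        return min(self.nodes, key=self.nodes.get) if self.nodes else None

    def session_expired(self, session):
        # Holder dies -> its ephemeral node is removed -> next waiter wins.
        self.nodes.pop(session, None)

d = FakeLockDir()
for s in ("worker-a", "worker-b", "worker-c"):
    d.create_ephemeral_sequential(s)
print(d.holder())             # worker-a
d.session_expired("worker-a")
print(d.holder())             # worker-b
```

Because the node is ephemeral, no explicit release path is needed for crash recovery: the session ending is the release.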

3. Database

  • SELECT ... FOR UPDATE or “insert unique row” with retry.

  • Works but adds DB load and depends on DB availability.
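The "insert unique row" variant can be shown end to end with SQLite (table and lock names are illustrative; any database with a UNIQUE constraint works the same way): the primary-key constraint guarantees that only one inserter succeeds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locks (name TEXT PRIMARY KEY, owner TEXT)")

def try_lock(name, owner):
    try:
        conn.execute("INSERT INTO locks (name, owner) VALUES (?, ?)",
                     (name, owner))
        conn.commit()
        return True
    except sqlite3.IntegrityError:  # row already exists: lock is held
        return False

def unlock(name, owner):
    # Delete only our own row, mirroring the compare-before-delete
    # release used with Redis.
    conn.execute("DELETE FROM locks WHERE name = ? AND owner = ?",
                 (name, owner))
    conn.commit()

print(try_lock("job:nightly", "node-1"))  # True
print(try_lock("job:nightly", "node-2"))  # False: already held
unlock("job:nightly", "node-1")
print(try_lock("job:nightly", "node-2"))  # True
```

Note there is no TTL here: unlike Redis or ZooKeeper, a crashed holder leaves the row behind, so this pattern usually needs an expiry column and a cleanup sweep.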

Architecture (ZooKeeper-style)

Each contender creates a sequential ephemeral node under the lock path (e.g. /lock/resource-123/lock-0000000001). The node with the lowest sequence number holds the lock; every other contender watches the node immediately before its own, so a release wakes only one waiter instead of the whole herd.


3. Real-World Usage

  • Leader election: One active instance (e.g. cron runner, consumer group coordinator); ZooKeeper used by Kafka, HBase.

  • Scheduled jobs: Only one node runs the job at a time (e.g. Redis lock with TTL).

  • Resource guard: Only one worker processes a given partition or queue.

  • Critical section: E.g. “single writer to this file” across multiple servers.


4. Trade-offs

| Approach | Pros | Cons |
| --- | --- | --- |
| Redis (single) | Fast, simple | Single point of failure; no strong consistency guarantee |
| Redlock | Tolerates Redis node failure | Complex; safety under clock/network issues is still debated |
| ZooKeeper / etcd | Strong consistency; ephemeral nodes | Heavier; more latency; operational complexity |
| DB | Uses existing infra | DB becomes a bottleneck; not ideal for high contention |

When to use: leader election, a single-active job, or guarding a shared resource across nodes. When not to: prefer stateless design or message ordering (e.g. a single consumer per partition) so the lock, and its contention, are unnecessary in the first place.


5. Failure Scenarios

| Scenario | Mitigation |
| --- | --- |
| Holder crashes before release | TTL (Redis) or ephemeral node (ZooKeeper) so the lock is released |
| Clock skew (Redis) | Prefer ZooKeeper/etcd for critical locks; use Redlock with caution |
| Lock store down | No new acquisitions; existing holders may not know; design for "at most one" and idempotent work |
| Split brain | Use a CP store (ZooKeeper, etcd) so only one partition can grant locks |


6. Performance Considerations

  • Latency: Acquire/release involves network round-trips; use short critical sections.

  • Contention: High contention increases latency and load on lock store; shard resources or use per-resource locks.

  • TTL: Too short → premature release; too long → slow recovery after crash. Match to max expected hold time.


7. Implementation Patterns

  • Try-lock with backoff: Retry with exponential backoff if lock is busy; avoid thundering herd.

  • Renewal (lease): If work can outlast TTL, renew lock periodically in a background thread; stop renewing when done.

  • Fencing tokens: Store (e.g. in etcd) an increasing token with the lock; resource checks token and rejects stale holders (prevents two “leaders” after split-brain).
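The fencing-token pattern is worth a concrete sketch, since it is the subtle one: the lock service hands out a monotonically increasing token, and the protected resource rejects any write carrying a token older than one it has already seen. Class names here are invented for illustration; a real deployment would get the token from the lock store (e.g. an etcd revision).

```python
class LockService:
    """Toy lock that hands out a monotonically increasing fencing token."""
    def __init__(self):
        self.held = False
        self.token = 0

    def try_acquire(self):
        if self.held:
            return None
        self.held = True
        self.token += 1
        return self.token

    def release(self):
        self.held = False

class Resource:
    """The resource itself enforces fencing: writes carrying a token
    older than one already seen are rejected as stale."""
    def __init__(self):
        self.highest_seen = 0
        self.writes = []

    def write(self, token, data):
        if token < self.highest_seen:
            return False  # stale holder, e.g. paused past its TTL
        self.highest_seen = token
        self.writes.append(data)
        return True

svc, res = LockService(), Resource()
t1 = svc.try_acquire()        # first holder gets token 1
svc.release()                 # e.g. its TTL expired while it was paused
t2 = svc.try_acquire()        # second holder gets token 2
print(res.write(t2, "new"))   # True
print(res.write(t1, "late"))  # False: token 1 < 2, stale holder fenced off
```

This is exactly the "two leaders after split brain" case: both holders believe they own the lock, but the resource only accepts the newer token.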


Quick Revision

  • Use cases: Leader election, single-active job, guarding shared resource.

  • Redis: Simple, TTL for liveness; Redlock for multi-node. ZooKeeper/etcd: Ephemeral nodes, strong consistency.

  • Failure: TTL or ephemeral to release on crash; prefer CP store for safety; fencing token for critical resources.

  • Interview: “We use ZooKeeper for leader election so only one instance runs the scheduler; if the leader dies, the ephemeral node goes away and another instance takes over. For less critical cases we use Redis with a short TTL.”
