
2. Advanced Topics

For SDE-3 interview preparation: a deep dive into the distributed-systems concepts crucial for senior-level system design interviews.


Introduction to Distributed Systems

A distributed system is a collection of independent computers that appears to its users as a single coherent system.

Why Distributed Systems?

Reasons to distribute:

  • Scalability: Handle more load by adding machines

  • Availability: Continue operating despite failures

  • Geographic Distribution: Serve users from nearby locations

  • Fault Tolerance: No single point of failure

Challenges:

  • Network Failures: Partial failures, network partitions

  • Concurrency: Coordination across nodes

  • Consistency: Keeping data synchronized

  • Complexity: Harder to reason about and debug


Consistency Models

Strong Consistency (Linearizability)

Definition: All operations appear to execute atomically and in order, as if on a single machine.

Characteristics:

  • Total Order: All operations have a global order

  • Real-time Guarantee: If operation A completes before B starts, A < B in order

  • Atomic Visibility: Operations take effect instantaneously

Examples:

  • Google Spanner: Uses TrueTime (atomic clocks) for external consistency

  • Etcd: Uses Raft consensus for strong consistency

  • ZooKeeper: Linearizable writes, potentially stale reads

Trade-offs:

  • ✅ Easy to reason about (behaves like single-node database)

  • ✅ No surprises for application developers

  • ❌ Higher latency (coordination overhead)

  • ❌ Lower availability during partitions (CP in CAP)

When to Use:

  • Financial transactions (bank account balance)

  • Inventory management (prevent overselling)

  • Leader election

  • Distributed locking

Sequential Consistency

Definition: All operations appear to execute in some sequential order, but not necessarily real-time order.

Weaker than linearizability (no real-time guarantee). Stronger than eventual consistency (all clients see same order).

Causal Consistency

Definition: Operations that are causally related are seen in the same order by all processes.

Techniques:

  • Vector Clocks: Track causality across nodes

  • Version Vectors: Similar but optimized for dynamo-style systems

Examples:

  • MongoDB: Causal consistency within client sessions

  • Cosmos DB: Session consistency (reads reflect writes in same session)

Use Cases:

  • Social media feeds (see post before comments)

  • Collaborative editing

  • Chat applications

Eventual Consistency

Definition: If no new updates are made, eventually all replicas will converge to the same value.

Characteristics:

  • No Ordering Guarantees: Reads may return stale data

  • Convergence: Eventually consistent (given time)

  • High Availability: AP in CAP theorem

Conflict Resolution Strategies:

  1. Last-Write-Wins (LWW): Use timestamp, discard older writes

  2. Version Vectors: Track causality, detect conflicts

  3. Application-Level: Let application resolve (e.g., shopping cart merge)

Examples:

  • Cassandra: Tunable consistency (can be eventual)

  • DynamoDB: Default eventual consistency

  • DNS: Record propagation is eventually consistent

Trade-offs:

  • ✅ High availability

  • ✅ Low latency (no coordination)

  • ✅ Partition tolerant

  • ❌ Application complexity (handle stale reads)

  • ❌ Conflict resolution needed

When to Use:

  • Social media feeds

  • Product catalogs (staleness acceptable)

  • Analytics dashboards

  • Caching layers

Read-Your-Writes Consistency

Definition: After a client writes a value, their subsequent reads will always see that value or a newer one.

Implementation:

  • Route user's reads to same replica

  • Use session tokens with version numbers

  • Read from leader for user's own data
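The implementation options above can be sketched with a version-based session token; `Leader` and `Replica` here are illustrative stand-ins, not a specific database's API:

```python
# Read-your-writes sketch: the write returns a version token; a read is served
# from a replica only if that replica has caught up to the token.

class Leader:
    def __init__(self):
        self.version = 0
        self.data = {}

    def write(self, key, value):
        self.version += 1
        self.data[key] = value
        return self.version          # handed to the client as a session token

class Replica:
    def __init__(self):
        self.version = 0
        self.data = {}

    def sync(self, leader):          # replication step (may lag in reality)
        self.version = leader.version
        self.data = dict(leader.data)

def read(key, token, replica, leader):
    """Serve from the replica only if it has caught up to the client's write."""
    if replica.version >= token:
        return replica.data.get(key)
    return leader.data.get(key)      # otherwise fall back to the leader

leader, replica = Leader(), Replica()
token = leader.write("name", "alice")
value = read("name", token, replica, leader)   # replica stale: leader serves it
```

The same comparison works if the token is carried in a cookie or request header across stateless frontends.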

Examples:

  • User profile updates

  • Comment systems (see your own comments)


Consensus Protocols

Consensus protocols allow distributed nodes to agree on a single value despite failures.

Raft Consensus

Problem Solved: Leader election and log replication in distributed systems

Key Concepts:

1. Leader Election: Nodes run randomized election timeouts; a candidate that gathers votes from a majority of nodes for the current term becomes leader.

2. Log Replication: The leader appends client commands to its log and replicates them to followers; an entry is committed once a majority have stored it.

Properties:

  • Safety: Never return incorrect results

  • Availability: Available as long as majority operational

  • No clock dependency: Doesn't depend on timing

Examples:

  • Etcd: Kubernetes uses Etcd for cluster coordination

  • Consul: Service discovery and configuration

  • CockroachDB: Distributed SQL database

When to Use:

  • Distributed configuration storage

  • Leader election

  • Service discovery

  • Distributed locking

Paxos

Problem: Achieve consensus in a distributed environment with an unreliable network

Phases:

  1. Prepare: Proposer sends proposal number, asks for promises

  2. Promise: Acceptors promise not to accept lower-numbered proposals

  3. Accept: Proposer sends value with proposal number

  4. Accepted: Acceptors accept if they haven't promised to higher number
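The four phases can be sketched from the acceptor's side (single-decree Paxos; class and variable names are illustrative, and a real proposer must also adopt the highest-numbered accepted value returned in the promises rather than always proposing its own):

```python
# Minimal single-decree Paxos acceptor sketch (illustrative, not production).

class Acceptor:
    def __init__(self):
        self.promised = 0        # highest proposal number promised so far
        self.accepted_n = 0      # proposal number of the accepted value
        self.accepted_v = None   # accepted value, if any

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            # Return any previously accepted value so the proposer can adopt it.
            return True, self.accepted_n, self.accepted_v
        return False, None, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered prepare was promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, value
            return True
        return False

acceptors = [Acceptor() for _ in range(3)]
# Proposer with proposal number 1 gathers promises from a majority...
promises = [a.prepare(1) for a in acceptors]
assert sum(ok for ok, _, _ in promises) >= 2   # majority promised
# ...then asks the acceptors to accept a value.
accepted = [a.accept(1, "v1") for a in acceptors]
```

Once a majority has accepted, the value is chosen; any later proposer learns it through the promises it collects.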

Variants:

  • Multi-Paxos: Optimize by having stable leader (skip prepare phase)

  • Fast Paxos: Reduce latency by allowing clients to propose directly

Examples:

  • Google Chubby: Distributed lock service

  • Apache ZooKeeper: Inspired by Paxos (uses Zab protocol)

Raft vs Paxos:

  • Raft: Easier to understand, popular in industry

  • Paxos: More complex, theoretically elegant, less common


Distributed Transactions

Two-Phase Commit (2PC)

Goal: Ensure all-or-nothing execution across multiple databases

Protocol:

Phase 1: Prepare. The coordinator asks every participant to vote; each participant durably records its vote and replies yes or no.

Phase 2: Commit. If all participants voted yes, the coordinator tells everyone to commit; otherwise it tells everyone to abort.

Problems:

  • Blocking: If coordinator crashes, participants wait indefinitely

  • Single Point of Failure: Coordinator failure blocks transaction

  • Not Partition Tolerant: Network partition can cause inconsistency

Example:
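A minimal coordinator sketch of the protocol (illustrative classes, not a real transaction-manager API):

```python
# 2PC sketch: commit only on a unanimous "yes" vote, otherwise abort everyone.

class Participant:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):               # Phase 1: vote and hold locks
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):                # Phase 2: make the change durable
        self.state = "committed"

    def abort(self):                 # Phase 2 alternative: roll back
        self.state = "aborted"

def two_phase_commit(participants):
    if all(p.prepare() for p in participants):   # Phase 1: collect votes
        for p in participants:                   # Phase 2: unanimous yes
            p.commit()
        return "committed"
    for p in participants:                       # any "no" aborts everyone
        p.abort()
    return "aborted"
```

The blocking problem is visible here: a participant in state "prepared" cannot safely commit or abort on its own; if the coordinator crashes between the phases, it is stuck holding locks.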

Use Cases:

  • Traditional RDBMS distributed transactions

  • Microservices with strong consistency needs

  • Financial systems (rare, prefer compensation)

Three-Phase Commit (3PC)

Improvement over 2PC: Adds a pre-commit phase and timeouts so participants can eventually decide without waiting indefinitely for the coordinator

Not widely used because:

  • Network partitions still cause issues

  • Adds latency

  • Saga pattern preferred

Saga Pattern

Alternative to distributed transactions: Long-running business processes with compensation

Two Approaches:

1. Choreography (Event-Driven): Each service reacts to the previous service's events; no central controller.

2. Orchestration (Centralized): A dedicated orchestrator tells each service what to do and tracks progress.

Compensating Transactions: Each step has a semantic undo (e.g., a refund undoes a payment); on failure, the already-completed steps are compensated in reverse order.
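The reverse-order compensation flow can be sketched as follows (step and event names are hypothetical):

```python
# Saga orchestration sketch: run steps in order; on failure, run the
# compensations of the completed steps in reverse.

def run_saga(steps):
    """steps: list of (action, compensation) pairs of zero-argument callables."""
    done = []
    for action, compensation in steps:
        try:
            action()
            done.append(compensation)
        except Exception:
            for comp in reversed(done):   # undo completed steps in reverse
                comp()                    # compensations should be idempotent
            return "compensated"
    return "completed"

log = []

def ship():                               # third step fails: no inventory
    raise RuntimeError("out of stock")

steps = [
    (lambda: log.append("order"),   lambda: log.append("cancel_order")),
    (lambda: log.append("payment"), lambda: log.append("refund")),
    (ship,                          lambda: None),
]
result = run_saga(steps)   # payment is refunded, then the order is cancelled
```

In a real system the orchestrator would persist its progress so a crash mid-saga can resume, and compensations would be retried until they succeed.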

Trade-offs:

  • ✅ No distributed locks (better availability)

  • ✅ Works across services, even HTTP APIs

  • ❌ Eventual consistency (intermediate states visible)

  • ❌ Compensations must be idempotent

Examples:

  • E-commerce checkout (order, payment, shipping)

  • Travel booking (flight, hotel, car rental)

  • Food delivery (order, restaurant, driver assignment)

Outbox Pattern

Problem: Ensure database write and message publish happen atomically

Solution: Write the outgoing message to an "outbox" table in the same local transaction as the data change; a separate relay process polls the outbox and publishes unsent rows to the message broker.
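A sketch of the pattern using SQLite as the local database (table and event names are illustrative):

```python
# Outbox sketch: the order row and its event are written in ONE local
# transaction; a relay later publishes unsent events (at-least-once).
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT,"
           " sent INTEGER DEFAULT 0)")

def create_order(item):
    with db:  # single atomic transaction: data change + outbox row
        db.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (json.dumps({"event": "OrderCreated", "item": item}),))

def relay(publish):
    """Poll unsent outbox rows, publish them, then mark them sent."""
    rows = db.execute("SELECT id, payload FROM outbox WHERE sent = 0").fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))   # if we crash here, the row is re-sent
        db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    db.commit()

published = []
create_order("book")
relay(published.append)
```

Because the relay can crash after publishing but before marking the row sent, delivery is at-least-once, which is why the consumers must be idempotent.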

Guarantees:

  • At-least-once delivery (may duplicate, make consumers idempotent)

  • No lost messages (atomically written with data)

Examples:

  • Order service writes order + publishes "OrderCreated" event

  • Payment service writes payment + publishes "PaymentProcessed" event


Time and Ordering

The Problem with Time

In distributed systems, there is no global clock. Each node has its own clock, which may drift.

Physical Clocks

NTP (Network Time Protocol): Synchronize clocks across nodes

  • Accuracy: ~1ms on LAN, ~50ms on WAN

  • Problem: Clock skew, clock drift

Consequences: Timestamps from different nodes cannot reliably order events; a "later" timestamp may belong to an earlier event, and last-write-wins based on physical clocks can silently discard newer writes.

Lamport Timestamps (Logical Clocks)

Idea: Use logical counter instead of physical time

Rules:

  1. Each node has counter (initially 0)

  2. Before event: increment counter

  3. Send message: include counter

  4. Receive message: counter = max(local, received) + 1

Property: If event A happens-before event B, then timestamp(A) < timestamp(B)

Limitation: Converse not true (can't determine causality from timestamps alone)
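The four rules above, as a minimal sketch:

```python
# Lamport clock sketch: a logical counter per node, merged on message receipt.

class LamportClock:
    def __init__(self):
        self.time = 0                    # rule 1: counter starts at 0

    def tick(self):                      # rule 2: increment before each event
        self.time += 1
        return self.time

    def send(self):                      # rule 3: stamp the outgoing message
        return self.tick()

    def receive(self, msg_time):         # rule 4: max(local, received) + 1
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
a.tick()                 # local event on A: a.time becomes 1
t = a.send()             # A sends; the message carries timestamp 2
b.receive(t)             # B merges: max(0, 2) + 1 == 3
```

Because A's send happens-before B's receive, the timestamps 2 < 3 respect that order; but two unrelated events can also end up with ordered timestamps, which is the limitation noted above.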

Vector Clocks

Improvement: Capture causality between events

Structure: Each node maintains vector of counters (one per node)

Causality Detection:

Examples:

  • Dynamo-style stores (Riak, Voldemort): Use version vectors (optimized vector clocks); Cassandra instead resolves conflicts with last-write-wins timestamps

  • Distributed version control (Git): Track causality of commits
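A minimal vector-clock sketch with the dominance check described above (the dict-per-node representation is an implementation choice):

```python
# Vector clock sketch: one counter per node; compare elementwise to detect
# causality vs. concurrency.

def increment(vc, node):
    """Local event on `node`: bump only that node's slot."""
    vc = dict(vc)
    vc[node] = vc.get(node, 0) + 1
    return vc

def merge(vc1, vc2):
    """Applied on message receipt: elementwise max of the two vectors."""
    keys = set(vc1) | set(vc2)
    return {k: max(vc1.get(k, 0), vc2.get(k, 0)) for k in keys}

def happened_before(vc1, vc2):
    """True if vc1 <= vc2 elementwise and the vectors are not equal."""
    keys = set(vc1) | set(vc2)
    return (all(vc1.get(k, 0) <= vc2.get(k, 0) for k in keys)
            and any(vc1.get(k, 0) < vc2.get(k, 0) for k in keys))

a = increment({}, "A")            # {'A': 1}
b = increment({}, "B")            # {'B': 1}, concurrent with a
c = increment(merge(a, b), "B")   # causally after both a and b
```

Here `a` and `b` are concurrent (neither dominates), so a store like Riak would surface both as siblings for the application to merge.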

Google Spanner's TrueTime

Idea: Use GPS + atomic clocks for global time with bounded uncertainty

Commit Wait: After choosing a commit timestamp t, wait until TrueTime guarantees t is in the past (TT.now().earliest > t) before making the write visible.

Result: External consistency (linearizability across datacenters)
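A sketch of commit wait against a fake TrueTime interface; `tt_now` and the 5 ms uncertainty are assumptions for illustration, not Spanner's real API:

```python
# Commit-wait sketch: TrueTime exposes an uncertainty interval rather than a
# single timestamp; a write becomes visible only once its commit timestamp is
# guaranteed to be in the past on every node.
import time

EPSILON = 0.005   # pretend clock uncertainty: +/- 5 ms

def tt_now():
    """Fake TrueTime: returns (earliest, latest) possible true time."""
    t = time.monotonic()
    return t - EPSILON, t + EPSILON

def commit(write):
    _, latest = tt_now()
    commit_ts = latest                # a timestamp >= the true current time
    while tt_now()[0] < commit_ts:    # commit wait: until earliest > ts,
        time.sleep(0.001)             # the timestamp could still be "future"
    write["ts"] = commit_ts           # now safe to make the write visible
    return write

row = commit({"key": "x"})
```

The wait costs roughly twice the clock uncertainty per commit, which is why Spanner invests in GPS and atomic clocks to keep that uncertainty small.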


Conflict Resolution

Last-Write-Wins (LWW)

Strategy: Most recent write (by timestamp) wins

Problems:

  • Requires synchronized clocks (or use logical timestamps)

  • Data loss (earlier write discarded)

Use Cases:

  • Caches and metrics (an occasional lost update is acceptable)

  • User profile updates (latest value should win)

Multi-Value Resolution

Strategy: Keep all conflicting values, let application decide

Examples:

  • Amazon Dynamo (the original paper): Returns conflicting versions to the client for merging

  • Riak: Siblings (multiple values for same key)

CRDTs (Conflict-Free Replicated Data Types)

Idea: Data structures that automatically merge without conflicts

Types:

1. G-Counter (Grow-only Counter)

2. PN-Counter (Positive-Negative Counter)

3. LWW-Register

4. OR-Set (Observed-Remove Set)
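A G-Counter, the simplest of these, can be sketched as follows: each replica increments only its own slot, and merge is an elementwise max, so replicas converge no matter how merges are ordered or repeated.

```python
# G-Counter CRDT sketch: per-replica slots, merge by elementwise max.

class GCounter:
    def __init__(self, node):
        self.node = node
        self.counts = {}

    def increment(self, n=1):
        """A replica may only increment its own slot."""
        self.counts[self.node] = self.counts.get(self.node, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Elementwise max: commutative, associative, idempotent."""
        for node, c in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), c)

a, b = GCounter("A"), GCounter("B")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)      # state exchanged in any order: both converge to 5
```

A PN-Counter is just two G-Counters (increments and decrements) whose difference is the value; the "limited operations" trade-off shows up in what merge functions with these algebraic properties exist.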

Examples:

  • Redis: CRDT support in Redis Enterprise

  • Riak: CRDT data types (counters, sets, maps)

  • Cosmos DB: Supports CRDTs


Distributed Coordination

Apache ZooKeeper

Purpose: Centralized coordination service for distributed systems

Features:

  • Configuration Management: Store configuration, notify on changes

  • Leader Election: Elect leader among distributed nodes

  • Distributed Locks: Coordinate access to shared resources

  • Group Membership: Track which nodes are alive

Data Model: A hierarchical namespace of znodes, similar to a filesystem (e.g., /app/config, /app/leader), each holding a small amount of data.

ZNodes Types:

  • Persistent: Remain until explicitly deleted

  • Ephemeral: Deleted when session ends (for leader election)

  • Sequential: Auto-incrementing suffix (for locks)

Example: Leader Election
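The standard recipe uses ephemeral sequential znodes: each candidate creates one under an election path, and the one with the lowest sequence number is the leader; when the leader's session dies its znode vanishes and the next candidate takes over. A self-contained simulation of that logic (no real ZooKeeper client; `FakeZk` is illustrative):

```python
# Simulated ZooKeeper leader election with ephemeral sequential znodes.
# A real implementation would use a client library and watch the predecessor
# znode instead of polling.

class FakeZk:
    def __init__(self):
        self.seq = 0
        self.znodes = {}               # path -> owning session

    def create_sequential(self, prefix, session):
        """Ephemeral + sequential create: zero-padded auto-increment suffix."""
        path = f"{prefix}{self.seq:010d}"
        self.seq += 1
        self.znodes[path] = session    # ephemeral: removed when session dies
        return path

    def session_died(self, session):
        """Session expiry deletes all of that session's ephemeral znodes."""
        self.znodes = {p: s for p, s in self.znodes.items() if s != session}

    def leader(self):
        """The candidate holding the lowest sequence number is the leader."""
        return self.znodes[min(self.znodes)] if self.znodes else None

zk = FakeZk()
for node in ("n1", "n2", "n3"):
    zk.create_sequential("/election/node-", node)

assert zk.leader() == "n1"     # lowest sequence number wins
zk.session_died("n1")          # leader crashes: its ephemeral znode vanishes
assert zk.leader() == "n2"     # next-lowest candidate takes over
```

Watching only the immediate predecessor (rather than the whole election path) avoids a "herd effect" where every candidate wakes up on each leader change.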

Used By:

  • Kafka: Broker coordination, topic metadata

  • HBase: Master election, region server coordination

  • Solr: Cluster state management


Summary: Decision Matrix

Need | Solution | Trade-off
Strong consistency | Spanner, Etcd (Raft) | Lower availability, higher latency
High availability | Cassandra, DynamoDB | Eventual consistency
Leader election | ZooKeeper, Etcd | Centralized coordinator
Distributed transactions | 2PC (rare), Saga pattern | Saga = eventual consistency
Ordering without consensus | Lamport/Vector clocks | Logical time, not real-time
Conflict-free updates | CRDTs | Limited operations


For SDE-3 Interviews: Be ready to discuss trade-offs between consistency, availability, and latency. Know when to use consensus (Raft), when to use eventual consistency, and how to handle conflicts.


Quick Revision

  • Consistency models: Strong (linearizability) → sequential → causal → eventual. Strong = single-node semantics; eventual = high availability, stale reads possible.

  • Consensus: Raft (leader election + log replication); Paxos (equivalent, less intuitive). Used by etcd, Consul, ZooKeeper for coordination.

  • Distributed transactions: 2PC (blocking, not partition-tolerant); Saga (compensation, eventual consistency); Outbox (DB + message atomically).

  • Time: Lamport clocks (happens-before); vector clocks (causality); TrueTime (Spanner, bounded uncertainty).

  • Conflict resolution: LWW (timestamp); version vectors; CRDTs (merge without conflict).

  • Interview talking points: "For strong consistency we'd use a CP store (etcd/Raft); for scale and availability we'd use eventual consistency and handle conflicts with LWW or application merge. Cross-service we'd use Saga, not 2PC."

  • Common mistakes: Assuming 2PC is always the answer; ignoring replication lag when reading from replicas; using physical time for ordering across nodes without TrueTime-like guarantees.
