2. Advanced Topics
For SDE-3 interview preparation: a deep dive into distributed systems concepts crucial for senior-level system design interviews.
Introduction to Distributed Systems
A distributed system is a collection of independent computers that appears to its users as a single coherent system.
Why Distributed Systems?
Reasons to distribute:
Scalability: Handle more load by adding machines
Availability: Continue operating despite failures
Geographic Distribution: Serve users from nearby locations
Fault Tolerance: No single point of failure
Challenges:
Network Failures: Partial failures, network partitions
Concurrency: Coordination across nodes
Consistency: Keeping data synchronized
Complexity: Harder to reason about and debug
Consistency Models
Strong Consistency (Linearizability)
Definition: All operations appear to execute atomically and in order, as if on a single machine.
Characteristics:
Total Order: All operations have a global order
Real-time Guarantee: If operation A completes before B starts, A < B in order
Atomic Visibility: Operations take effect instantaneously
Examples:
Google Spanner: Uses TrueTime (atomic clocks) for external consistency
Etcd: Uses Raft consensus for strong consistency
ZooKeeper: Linearizable writes, potentially stale reads
Trade-offs:
✅ Easy to reason about (behaves like single-node database)
✅ No surprises for application developers
❌ Higher latency (coordination overhead)
❌ Lower availability during partitions (CP in CAP)
When to Use:
Financial transactions (bank account balance)
Inventory management (prevent overselling)
Leader election
Distributed locking
Sequential Consistency
Definition: All operations appear to execute in some sequential order, but not necessarily real-time order.
Weaker than linearizability (no real-time guarantee). Stronger than eventual consistency (all clients see same order).
Causal Consistency
Definition: Operations that are causally related are seen in the same order by all processes.
Techniques:
Vector Clocks: Track causality across nodes
Version Vectors: Similar but optimized for dynamo-style systems
Examples:
MongoDB: Causal consistency via causally consistent client sessions
Cosmos DB: Session consistency (reads reflect writes in same session)
Use Cases:
Social media feeds (see post before comments)
Collaborative editing
Chat applications
Eventual Consistency
Definition: If no new updates are made, eventually all replicas will converge to the same value.
Characteristics:
No Ordering Guarantees: Reads may return stale data
Convergence: Eventually consistent (given time)
High Availability: AP in CAP theorem
Conflict Resolution Strategies:
Last-Write-Wins (LWW): Use timestamp, discard older writes
Version Vectors: Track causality, detect conflicts
Application-Level: Let application resolve (e.g., shopping cart merge)
Examples:
Cassandra: Tunable consistency (can be eventual)
DynamoDB: Default eventual consistency
DNS: DNS propagation is eventually consistent
Trade-offs:
✅ High availability
✅ Low latency (no coordination)
✅ Partition tolerant
❌ Application complexity (handle stale reads)
❌ Conflict resolution needed
When to Use:
Social media feeds
Product catalogs (staleness acceptable)
Analytics dashboards
Caching layers
Read-Your-Writes Consistency
Definition: After a client writes a value, their subsequent reads will always see that value or a newer one.
Implementation:
Route user's reads to same replica
Use session tokens with version numbers
Read from leader for user's own data
Examples:
User profile updates
Comment systems (see your own comments)
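One way to sketch the session-token approach above: the client keeps the version number returned by its last write, and a read is served from a replica only if that replica has caught up to the token; otherwise it falls back to the leader. Names here (`Replica`, `Leader`, `read_your_writes`) are illustrative, not a real API.

```python
# Sketch: read-your-writes via version tokens (hypothetical names).
class Replica:
    def __init__(self):
        self.version = 0
        self.data = {}

    def apply(self, key, value, version):
        # Asynchronous replication eventually delivers the leader's write.
        self.data[key] = value
        self.version = version

class Leader(Replica):
    def write(self, key, value):
        self.version += 1
        self.data[key] = value
        return self.version  # session token handed back to the client

def read_your_writes(key, token, replica, leader):
    # Serve from the replica only if it has caught up to the client's token;
    # otherwise fall back to the leader, which is always current.
    source = replica if replica.version >= token else leader
    return source.data.get(key)

leader, replica = Leader(), Replica()
token = leader.write("name", "alice")  # replica has not replicated yet
print(read_your_writes("name", token, replica, leader))  # served by leader
replica.apply("name", "alice", token)
print(read_your_writes("name", token, replica, leader))  # now served by replica
```

The same token scheme works across load-balanced reads: any replica that cannot satisfy the token either waits or forwards the request.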
Consensus Protocols
Consensus protocols allow distributed nodes to agree on a single value despite failures.
Raft Consensus
Problem Solved: Leader election and log replication in distributed systems
Key Concepts:
1. Leader Election
2. Log Replication
Properties:
Safety: Never return incorrect results
Availability: Available as long as majority operational
No clock dependency for safety: correctness doesn't depend on timing (liveness does rely on randomized election timeouts)
Examples:
Etcd: Kubernetes uses Etcd for cluster coordination
Consul: Service discovery and configuration
CockroachDB: Distributed SQL database
When to Use:
Distributed configuration storage
Leader election
Service discovery
Distributed locking
Paxos
Problem: Achieve consensus in a distributed environment with an unreliable network
Phases:
Prepare: Proposer sends proposal number, asks for promises
Promise: Acceptors promise not to accept lower-numbered proposals
Accept: Proposer sends value with proposal number
Accepted: Acceptors accept if they haven't promised to higher number
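The acceptor side of these phases can be sketched as a small state machine; this is a single-decree, single-acceptor illustration of the promise rules only, not a complete Paxos implementation (a real deployment needs a quorum of acceptors and proposer logic).

```python
# Single-decree Paxos acceptor sketch: prepare/promise and accept/accepted.
class Acceptor:
    def __init__(self):
        self.promised = -1    # highest proposal number promised so far
        self.accepted = None  # (number, value) accepted, if any

    def prepare(self, n):
        # Promise not to accept any proposal numbered lower than n,
        # and report any value already accepted.
        if n > self.promised:
            self.promised = n
            return ("promise", self.accepted)
        return ("reject", None)

    def accept(self, n, value):
        # Honour the promise: only accept proposals >= the promised number.
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return "accepted"
        return "reject"

acc = Acceptor()
print(acc.prepare(1))      # ('promise', None)
print(acc.accept(1, "x"))  # accepted
print(acc.prepare(0))      # ('reject', None) -- lower-numbered proposal
print(acc.accepted)        # (1, 'x')
```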
Variants:
Multi-Paxos: Optimize by having stable leader (skip prepare phase)
Fast Paxos: Reduce latency by allowing clients to propose directly
Examples:
Google Chubby: Distributed lock service
Apache ZooKeeper: Inspired by Paxos (uses Zab protocol)
Raft vs Paxos:
Raft: Easier to understand, popular in industry
Paxos: More complex, theoretically elegant, less common
Distributed Transactions
Two-Phase Commit (2PC)
Goal: Ensure all-or-nothing execution across multiple databases
Protocol:
Phase 1: Prepare — the coordinator asks every participant to prepare; each participant votes yes (holding its locks) or no
Phase 2: Commit — if all voted yes, the coordinator tells everyone to commit; if any voted no or timed out, it tells everyone to abort
Problems:
Blocking: If coordinator crashes, participants wait indefinitely
Single Point of Failure: Coordinator failure blocks transaction
Not Partition Tolerant: Network partition can cause inconsistency
Example: a money transfer that debits account A on one database shard and credits account B on another; both shards must vote yes in the prepare phase before either commits.
Use Cases:
Traditional RDBMS distributed transactions
Microservices with strong consistency needs
Financial systems (rare, prefer compensation)
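The two phases can be sketched with in-memory participants; this is illustrative only (real 2PC persists votes and decisions to logs so crashed nodes can recover), and the class names are hypothetical.

```python
# Minimal 2PC coordinator sketch with in-memory participants.
class Participant:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes/no; a yes vote implies holding locks until phase 2.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    if all(p.prepare() for p in participants):  # Phase 1: all must vote yes
        for p in participants:                  # Phase 2: commit everywhere
            p.commit()
        return "committed"
    for p in participants:                      # Any "no" vote aborts all
        p.abort()
    return "aborted"

print(two_phase_commit([Participant(), Participant()]))       # committed
print(two_phase_commit([Participant(), Participant(False)]))  # aborted
```

The blocking problem is visible in the gap between the phases: a participant that voted yes cannot unilaterally decide if the coordinator dies before sending the phase-2 message.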
Three-Phase Commit (3PC)
Improvement over 2PC: Adds timeout to avoid indefinite blocking
Not widely used because:
Network partitions still cause issues
Adds latency
Saga pattern preferred
Saga Pattern
Alternative to distributed transactions: Long-running business processes with compensation
Two Approaches:
1. Choreography (Event-Driven)
2. Orchestration (Centralized)
Compensating Transactions: each forward step has a compensating action that semantically undoes it (e.g., refund the payment, cancel the reservation, release the inventory hold).
Trade-offs:
✅ No distributed locks (better availability)
✅ Works across services, even HTTP APIs
❌ Eventual consistency (intermediate states visible)
❌ Compensations must be idempotent
Examples:
E-commerce checkout (order, payment, shipping)
Travel booking (flight, hotel, car rental)
Food delivery (order, restaurant, driver assignment)
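An orchestrated saga can be sketched as a loop that runs steps in order and, on failure, runs the compensations of the completed steps in reverse. Step names below are hypothetical; a production orchestrator would also persist saga state so it survives crashes.

```python
# Orchestrated saga sketch: forward steps with reverse-order compensation.
def fail(msg):
    raise RuntimeError(msg)

def run_saga(steps):
    done = []
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception:
            for undo in reversed(done):  # compensate in reverse order
                undo()                   # compensations must be idempotent
            return "rolled back"
    return "completed"

log = []
steps = [
    (lambda: log.append("order created"), lambda: log.append("order cancelled")),
    (lambda: fail("payment declined"),    lambda: log.append("payment refunded")),
]
print(run_saga(steps))  # -> rolled back
print(log)              # -> ['order created', 'order cancelled']
```

Note the intermediate state: between "order created" and the compensation, other services can observe an order that will later be cancelled — this is the eventual-consistency trade-off listed above.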
Outbox Pattern
Problem: Ensure database write and message publish happen atomically
Solution: write the business change and the outgoing event to an "outbox" table in the same local database transaction; a separate relay process polls the outbox and publishes the events to the message broker, marking them published only after the broker accepts them.
Guarantees:
At-least-once delivery (may duplicate, make consumers idempotent)
No lost messages (atomically written with data)
Examples:
Order service writes order + publishes "OrderCreated" event
Payment service writes payment + publishes "PaymentProcessed" event
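A minimal sketch of the pattern using SQLite as the local database: the order row and the event row land in one atomic transaction, and a relay function stands in for the background publisher. Table and event names are illustrative.

```python
import sqlite3

# Outbox sketch: business row and event row written in ONE local transaction,
# so neither can exist without the other.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, event TEXT,
                         published INTEGER DEFAULT 0);
""")

def create_order(item):
    with db:  # a single atomic transaction covers both inserts
        db.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        db.execute("INSERT INTO outbox (event) VALUES (?)",
                   (f"OrderCreated:{item}",))

def relay_once(publish):
    # Poll unpublished events; mark published only after the broker accepts.
    rows = db.execute(
        "SELECT id, event FROM outbox WHERE published = 0").fetchall()
    for event_id, event in rows:
        publish(event)
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    db.commit()

sent = []
create_order("book")
relay_once(sent.append)
print(sent)  # -> ['OrderCreated:book']
```

If the relay crashes between `publish` and the `UPDATE`, the event is re-delivered on the next poll — hence the at-least-once guarantee and the need for idempotent consumers.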
Time and Ordering
The Problem with Time
In distributed systems, there is no global clock. Each node has its own clock, which may drift.
Physical Clocks
NTP (Network Time Protocol): Synchronize clocks across nodes
Accuracy: ~1ms on LAN, ~50ms on WAN
Problem: Clock skew, clock drift
Consequences: last-write-wins may discard a genuinely newer write, logs from different machines cannot be reliably ordered by timestamp, and leases or timeouts may fire early or late.
Lamport Timestamps (Logical Clocks)
Idea: Use logical counter instead of physical time
Rules:
Each node has counter (initially 0)
Before event: increment counter
Send message: include counter
Receive message: counter = max(local, received) + 1
Property: If event A happens-before event B, then timestamp(A) < timestamp(B)
Limitation: Converse not true (can't determine causality from timestamps alone)
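The rules above fit in a few lines; this is a straightforward sketch of a Lamport clock with two nodes exchanging one message.

```python
# Lamport clock sketch implementing the rules above.
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        # Rule: increment before every local event.
        self.time += 1
        return self.time

    def send(self):
        # Rule: stamp an outgoing message with the incremented counter.
        return self.tick()

    def recv(self, msg_time):
        # Rule: counter = max(local, received) + 1 on receive.
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
a.tick()               # local event on a: a.time == 1
t = a.send()           # a.time == 2; message carries timestamp 2
b.recv(t)              # b.time == max(0, 2) + 1 == 3
print(a.time, b.time)  # -> 2 3
```

The send/receive pair guarantees timestamp(send) < timestamp(receive), which is exactly the happens-before property; two events with timestamps 2 and 3 on unrelated nodes, however, may still be concurrent.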
Vector Clocks
Improvement: Capture causality between events
Structure: Each node maintains vector of counters (one per node)
Causality Detection: A happened-before B if every component of A's vector is ≤ the corresponding component of B's and at least one is strictly less; if neither vector dominates the other, the events are concurrent (a conflict).
Examples:
Dynamo-style stores (Riak, Voldemort): Use version vectors (optimized vector clocks); Cassandra instead resolves with LWW timestamps
Distributed version control (Git): Track causality of commits
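The dominance check described above can be sketched as a pointwise comparison of two vectors (represented here as node-to-counter dicts, with missing nodes treated as 0):

```python
# Vector clock sketch: classify two events as ordered or concurrent.
def vc_leq(a, b):
    # a <= b iff every component of a is <= the matching component of b.
    nodes = set(a) | set(b)
    return all(a.get(n, 0) <= b.get(n, 0) for n in nodes)

def compare(a, b):
    if a == b:
        return "equal"
    if vc_leq(a, b):
        return "a happened-before b"
    if vc_leq(b, a):
        return "b happened-before a"
    return "concurrent (conflict)"

print(compare({"n1": 1}, {"n1": 2}))                    # a happened-before b
print(compare({"n1": 2, "n2": 0}, {"n1": 1, "n2": 3}))  # concurrent (conflict)
```

The "concurrent" result is what a Dynamo-style store surfaces as siblings for the application to merge.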
Google Spanner's TrueTime
Idea: Use GPS + atomic clocks for global time with bounded uncertainty
Commit Wait: a transaction takes its timestamp as s = TT.now().latest, then waits until TT.after(s) is true before committing, guaranteeing that s is in the past on every node before the commit is visible.
Result: External consistency (linearizability across datacenters)
Conflict Resolution
Last-Write-Wins (LWW)
Strategy: Most recent write (by timestamp) wins
Problems:
Requires synchronized clocks (or use logical timestamps)
Data loss (earlier write discarded)
Use Cases:
Shopping cart (last user action matters)
User profile updates
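An LWW merge is just a comparison on (timestamp, tiebreaker); the sketch below uses the node id to break timestamp ties deterministically, and shows how the older value is silently discarded — the data-loss risk noted above.

```python
# Last-write-wins merge sketch: newer timestamp wins, older value discarded.
def lww_merge(a, b):
    # a and b are (timestamp, node_id, value) tuples; node_id breaks ties
    # so all replicas pick the same winner.
    return a if (a[0], a[1]) >= (b[0], b[1]) else b

w1 = (1000, "n1", "alice@old.example")
w2 = (1005, "n2", "alice@new.example")
winner = lww_merge(w1, w2)
print(winner[2])  # -> alice@new.example (w1 is lost)
```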
Multi-Value Resolution
Strategy: Keep all conflicting values, let application decide
Examples:
DynamoDB: Returns all versions on conflict
Riak: Siblings (multiple values for same key)
CRDTs (Conflict-Free Replicated Data Types)
Idea: Data structures that automatically merge without conflicts
Types:
1. G-Counter (Grow-only Counter)
2. PN-Counter (Positive-Negative Counter)
3. LWW-Register
4. OR-Set (Observed-Remove Set)
Examples:
Redis: CRDT support in Redis Enterprise
Riak: CRDT data types (counters, sets, maps)
Cosmos DB: Supports CRDTs
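The simplest of the types above, the G-Counter, illustrates the core CRDT idea: each node increments only its own slot, and merge is a pointwise max, so replicas converge no matter how many times or in what order they merge.

```python
# G-Counter sketch: per-node slots, merge by pointwise max.
class GCounter:
    def __init__(self, node):
        self.node = node
        self.counts = {}  # node id -> that node's local increment total

    def increment(self, n=1):
        # Each node only ever increments its own slot.
        self.counts[self.node] = self.counts.get(self.node, 0) + n

    def merge(self, other):
        # Pointwise max is commutative, associative, and idempotent,
        # which is what makes the merge conflict-free.
        for node, c in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), c)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # -> 5 5 (converged)
```

A PN-Counter is two G-Counters (one for increments, one for decrements); the value is their difference.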
Distributed Coordination
Apache ZooKeeper
Purpose: Centralized coordination service for distributed systems
Features:
Configuration Management: Store configuration, notify on changes
Leader Election: Elect leader among distributed nodes
Distributed Locks: Coordinate access to shared resources
Group Membership: Track which nodes are alive
Data Model: a hierarchical namespace of znodes, like a filesystem (e.g., /app/config, /app/leader); each znode can hold a small amount of data and have children.
ZNodes Types:
Persistent: Remain until explicitly deleted
Ephemeral: Deleted when session ends (for leader election)
Sequential: Auto-incrementing suffix (for locks)
Example: Leader Election
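The standard recipe: each candidate creates an ephemeral sequential znode under an election path, and whoever holds the lowest sequence number is the leader; when the leader's session dies, its znode vanishes and the next-lowest takes over. The sketch below simulates that logic in memory (real code would use a ZooKeeper client library against a live ensemble; `FakeZooKeeper` is an illustrative stand-in).

```python
# In-memory sketch of ZooKeeper's leader-election recipe.
class FakeZooKeeper:
    def __init__(self):
        self.seq = 0
        self.znodes = {}  # path -> owning session (ephemeral)

    def create_sequential(self, prefix, owner):
        # Ephemeral sequential znode: auto-incrementing, zero-padded suffix.
        path = f"{prefix}{self.seq:010d}"
        self.seq += 1
        self.znodes[path] = owner
        return path

    def session_closed(self, owner):
        # Ephemeral znodes disappear when their session ends.
        self.znodes = {p: o for p, o in self.znodes.items() if o != owner}

    def leader(self):
        # Lowest sequence number (lexicographically smallest path) leads.
        return self.znodes[min(self.znodes)] if self.znodes else None

zk = FakeZooKeeper()
for node in ["node-a", "node-b", "node-c"]:
    zk.create_sequential("/election/member-", node)
print(zk.leader())           # -> node-a (lowest sequence number)
zk.session_closed("node-a")  # leader crashes; its ephemeral znode vanishes
print(zk.leader())           # -> node-b takes over
```

In practice each candidate watches only the znode immediately below its own, avoiding a "herd effect" where every node wakes on every change.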
Used By:
Kafka: Broker coordination, topic metadata
HBase: Master election, region server coordination
Solr: Cluster state management
Summary: Decision Matrix
| Need | Solution | Trade-off |
| --- | --- | --- |
| Strong consistency | Spanner, Etcd (Raft) | Lower availability, higher latency |
| High availability | Cassandra, DynamoDB | Eventual consistency |
| Leader election | ZooKeeper, Etcd | Centralized coordinator |
| Distributed transactions | 2PC (rare), Saga pattern | Saga = eventual consistency |
| Ordering without consensus | Lamport/Vector clocks | Logical time, not real-time |
| Conflict-free updates | CRDTs | Limited operations |
For SDE-3 Interviews: Be ready to discuss trade-offs between consistency, availability, and latency. Know when to use consensus (Raft), when to use eventual consistency, and how to handle conflicts.
Quick Revision
Consistency models: Strong (linearizability) → sequential → causal → eventual. Strong = single-node semantics; eventual = high availability, stale reads possible.
Consensus: Raft (leader election + log replication); Paxos (equivalent, less intuitive). Used by etcd, Consul, ZooKeeper for coordination.
Distributed transactions: 2PC (blocking, not partition-tolerant); Saga (compensation, eventual consistency); Outbox (DB + message atomically).
Time: Lamport clocks (happens-before); vector clocks (causality); TrueTime (Spanner, bounded uncertainty).
Conflict resolution: LWW (timestamp); version vectors; CRDTs (merge without conflict).
Interview talking points: "For strong consistency we'd use a CP store (etcd/Raft); for scale and availability we'd use eventual consistency and handle conflicts with LWW or application merge. Cross-service we'd use Saga, not 2PC."
Common mistakes: Assuming 2PC is always the answer; ignoring replication lag when reading from replicas; using physical time for ordering across nodes without TrueTime-like guarantees.