Maintain disjoint sets with near O(1) amortized union and find using path compression and union by rank. Essential for connectivity, MST (Kruskal), and dynamic connectivity.
1. Concept Overview
Problem space: Connected components (undirected graph), cycle detection in graph, Kruskal's MST, dynamic connectivity ("add edge", "are u and v connected?"), grouping (accounts merge, etc.).
When to use: "Merge sets" / "are two elements in same set?" with many such operations. Prefer over DFS when you need to repeatedly add edges and query connectivity.
2. Core Algorithms
Find (with path compression)
Follow parent until root; optionally set parent[x] = root for all on path (path compression). Amortized O(α(n)) ≈ O(1).
Union (by rank)
Find roots of x and y. If same, return. Else attach smaller rank under larger; if equal, increment one rank. Amortized O(α(n)).
Full DSU
classDSU:def__init__(self,n):self.parent =list(range(n))self.rank =[0]* ndeffind(self,x):ifself.parent[x]!= x:self.parent[x]=self.find(self.parent[x])returnself.parent[x]defunion(self,x,y): rx, ry =self.find(x),self.find(y)if rx == ry:returnFalseifself.rank[rx]<self.rank[ry]: rx, ry = ry, rxself.parent[ry]= rxifself.rank[rx]==self.rank[ry]:self.rank[rx]+=1returnTrue
3. Advanced Variations
Size of component: Maintain size[root]; on union, add smaller size to larger.
Count components: Initially n; decrement on successful union.
DSU with rollback: For "offline" queries (e.g., process in order and undo); keep history of parent changes to revert.
Kruskal: Sort edges by weight; add edge if union(u,v) is true; repeat until n-1 edges. MST weight = sum of added edge weights.
Edge Cases
Empty graph; single node; multiple edges same weight (order can affect which MST); disconnected graph (MST not possible, count components).
4. Common Interview Problems
Medium: Number of Connected Components, Redundant Connection, Accounts Merge, Number of Islands (can use DSU instead of DFS).
Hard: Longest Consecutive Sequence (DSU over indices), Critical Connections (bridges — use DFS/Tarjan, not DSU), Minimize Malware Spread.
5. Pattern Recognition
Connectivity: "Same component" / "merge groups" → DSU. "Count components after adding/removing edges" → DSU or DFS.
MST: Kruskal = sort edges + DSU to avoid cycles.
Cycle in undirected graph: Add edge (u,v); if find(u)==find(v) before union, cycle.
6. SDE-3 Level Thinking
Trade-offs: DSU vs DFS for connectivity — DSU supports incremental edges and many queries; DFS is simpler for one-shot "count components". Complexity: α(n) ≈ 4 for practical n.
Memory: O(N) for parent and rank. With rollback, store history (stack of (node, old_parent)).