Hashing

Map keys to indices via hash function for O(1) average lookup/insert/delete. SDE-3: collision strategies, load factor, consistent hashing (distributed), and when to use set vs map.


1. Concept Overview

Problem space: Frequency count, existence (set), two-sum (complement map), grouping (anagrams, by key), subarray sum = K (prefix sum → count map), LRU (map + DLL).

When to use: O(1) lookup by key; counting; deduplication; "find pair with sum" → complement in map.


2. Core Algorithms

Hash Table Operations

  • Insert: hash(key) → bucket; handle collision (chain or open addressing). Amortized O(1) with resizing.

  • Lookup/Delete: Same hash; in chain traverse list; in open addressing probe. O(1) average.

  • Load factor α = n/m; resize when α > threshold (e.g., 0.75); rehash all. Amortized O(1).

Collision Handling

  • Chaining: List (or tree) per bucket. Simple; cache-unfriendly for long chains.

  • Open addressing: Linear/quadratic/double hashing. Better cache; clustering risk (linear).

Subarray Sum Equals K

  • Prefix sum P; for each P[j], count of indices i < j with P[i] = P[j] - K. Single pass with count map. O(N).


3. Advanced Variations

  • Consistent hashing: Ring; keys and nodes hashed; assign key to first node clockwise. Minimal reassignment when nodes join/leave. Used in distributed caches.

  • Bloom filter: Multiple hashes; set bits; no false negatives, possible false positives; no deletion (or counting bloom filter). O(k) space per element, k = hash count.

  • Rabin-Karp: Rolling hash for substring; hash(s[i:j+1]) from hash(s[i:j]) in O(1). Collision check with equality.

Edge Cases

  • Empty array; single element; all same; negative K; prefix sum 0 (count as one: one subarray from start); integer overflow in sum.


4. Common Interview Problems

Easy: Two Sum, Valid Anagram, First Unique Character. Medium: Group Anagrams, Subarray Sum Equals K, Longest Consecutive Sequence, LRU Cache. Hard: Minimum Window Substring (map for count), Max Points on a Line (slope map), Substring with Concatenation of All Words.


5. Pattern Recognition

  • Complement map: Two sum, 3-sum (fix one, two-sum on rest with set/map).

  • Frequency map: Anagrams, majority, "at most K distinct" in window.

  • Prefix map: Subarray sum = K (count of prefix sums); longest subarray with sum 0 (first occurrence of prefix).

  • Group by key: Anagrams (sorted word or count tuple as key); design key for grouping.


6. Code Implementations


7. SDE-3 Level Thinking

  • Trade-offs: Chaining vs open addressing — chaining simpler, open addressing better cache and one allocation. Load factor vs space vs collision rate.

  • Scalability: Distributed systems → consistent hashing. Huge key space with small set → bloom filter. LRU at scale → sharded maps + TTL.

  • Concurrency: ConcurrentHashMap-style locking or lock-free per bucket.


8. Interview Strategy

  • Identify: "Pair with sum" → complement map. "Count subarray sum K" → prefix + count map. "Group by" → design key, map to list.

  • Common mistakes: Forgetting prefix 0 (subarray from start); off-by-one in window problems; treating anagram key as string (tuple of counts or sorted).


9. Quick Revision

  • Tricks: Prefix sum + map for subarray sum; sorted string or count tuple for anagram key; complement = target - x.

  • Edge cases: K=0; prefix 0; empty; duplicates.

  • Pattern tip: "Find pair" / "subarray sum" → map; "group" → key design.

Last updated