Elasticsearch
Overview
Core Concepts
Cluster, Nodes, and Shards
Cluster: "my-cluster"
├─ Node 1 (Master-eligible, Data)
│ ├─ Index "products" - Shard 0 (Primary)
│ └─ Index "products" - Shard 2 (Replica)
├─ Node 2 (Data)
│ ├─ Index "products" - Shard 1 (Primary)
│ └─ Index "products" - Shard 0 (Replica)
└─ Node 3 (Data)
  ├─ Index "products" - Shard 2 (Primary)
  └─ Index "products" - Shard 1 (Replica)
Key Terms
Inverted Index (Core of Search)
What is an Inverted Index?
Token Analysis
Why It's Fast
Document Indexing Flow
1. Client Sends Document
2. Routing (Choosing Target Shard)
3. Write to Primary Shard
4. Refresh (Make Doc Searchable)
5. Replicate to Replica Shards
6. Persist to Disk (Flush)
Segments & Merging
Problem: Too Many Segments
Solution: Segment Merging
Merge Policies
Search Query Flow
1. Query Phase (Scatter)
2. Fetch Phase (Gather)
Example Query
Scoring & Relevance
TF-IDF (Classic Scoring)
BM25 (Default Scoring)
Sharding Strategy
Number of Shards
Replicas
Cluster State & Master Node
Master Node Responsibilities
Split-Brain Prevention
Querying Optimizations
1. Filter vs Query Context
2. Field Data Cache
3. Index Sorting
Handling Deletes & Updates
Documents are Immutable
Deleted Docs
Translog & Durability
Purpose
Flush
Configurable Durability
Aggregations
Bucket Aggregations
Metric Aggregations
Pipeline Aggregations
Common Pitfalls
❌ Using Text Fields for Aggregations
❌ Deep Pagination
❌ Wildcard Queries (*laptop*)
❌ Too Many Shards
Indexing Performance
Search Performance
Hardware
Interview Questions