#7 Hotel reservation system
Below is a complete, time-boxed 1-hour interview answer for designing a Hotel Reservation System (think Booking.com / Expedia core features). It follows the pattern: clarify → FR/NFR → APIs → high-level design → deep dives (availability, booking, payments, scaling) → BoE sizing → trade-offs & wrap up. Use it as a script in an interview and pace yourself by the minute marks on each section.
0 – 5 min — Problem recap, scope & key assumptions (set the stage)
Quickly restate and confirm scope so interviewer and you are aligned.
Goal: A system that lets users search hotels, view availability & prices, hold/confirm bookings, manage cancellations/changes, and supports hosts (inventory) and payments. Provide real-time availability, prevent double-booking, support promotions/taxes, and provide dashboards + billing.
Primary needs:
Real-time availability & booking correctness (no double-booking).
High search throughput and low latency for price/availability.
Integration with external property systems (PMS) for large hotels.
Support OTA features: cancellation policy, refunds, holds, payment capture.
Example assumptions (adjustable):
1M MAU, peak 20k searches/sec, peak 500 bookings/sec.
100k properties, average 50 rooms/property.
Booking conversion ~0.5%; raw events retained for 30 days.
Latency targets: search P95 < 200 ms; booking end-to-end < 1s for availability verification, < 3s including payment.
5 – 15 min — Functional & Non-Functional Requirements
Functional Requirements (Must / Should / Nice)
Must
Search hotels by location/date/guests/filters; returns available properties with prices and room types.
Check availability for specific room(s) & date range.
Create booking: hold inventory, collect guest info, accept payment (or authorize), confirm booking.
Cancel / modify booking per policy.
Host management: inventory CRUD, rates, restrictions, sync with Property Management Systems (PMS) via APIs/feeds.
Pricing rules: support seasonal rates, dynamic pricing, taxes, fees, promotions, and currency conversion.
Notifications: emails/SMS for confirmations, reminders, cancellations.
Reporting & reconciliation: payouts to hosts, billing, chargebacks handling.
Should
Prepaid & pay-at-hotel flows; partial payments; refunds.
Waitlist & soft-hold (short-lived holds).
Loyalty points & coupon codes.
Nice-to-have
Room map visualization (inventory by room number), multi-property corporate booking tools, recommendations.
Non-Functional Requirements (NFR)
Performance: Search P95 < 200 ms; booking path fast & reliable.
Consistency: Strong correctness for booking inventory (no double-booking).
Scalability: scale search horizontally; booking scale with partitions.
Availability: 99.95% for reads; 99.9% for booking operations.
Durability: booking data persisted and replicated; audit logs.
Security: PCI-DSS for card handling, TLS, RBAC for host dashboards.
Privacy: PII protection, GDPR delete support.
Observability: metrics for search latency, booking success, inventory conflicts, payments.
Cost/operability: choose managed services where practical.
15 – 25 min — External APIs & Data model (contract)
Key APIs (REST/gRPC)
Simplified data model (high-level)
Hotel: id, name, location, timezone, policies, pms_integration_info
RoomType: id, hotel_id, capacity, amenities
RatePlan: id, room_type_id, price_rules, cancellation_policy, currency
Inventory: room_type_id, date, available_count, holds_count
Booking: id, hotel_id, room_type_id, checkin, checkout, guest_info, status (held/confirmed/cancelled), price_breakdown, payment_info, created_at
Hold: hold_id, room_type_id, start_date, end_date, expires_at, reserved_count, booking_id (if consumed)
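To make the Inventory and Hold entities above concrete, here is a minimal sketch as Python dataclasses. The field names come from the model above; the types, the exclusive `end_date` convention, and the helper methods `nights` and `is_active` are assumptions added for illustration.

```python
import datetime as dt
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BookingStatus(Enum):
    HELD = "held"
    CONFIRMED = "confirmed"
    CANCELLED = "cancelled"


@dataclass
class Inventory:
    """One row per room type per calendar date."""
    room_type_id: str
    date: dt.date
    available_count: int
    holds_count: int = 0


@dataclass
class Hold:
    hold_id: str
    room_type_id: str
    start_date: dt.date          # check-in night (inclusive)
    end_date: dt.date            # check-out date (assumed exclusive)
    expires_at: dt.datetime
    reserved_count: int
    booking_id: Optional[str] = None  # set once the hold is consumed

    def nights(self) -> list[dt.date]:
        """Dates whose Inventory rows this hold touches."""
        span = (self.end_date - self.start_date).days
        return [self.start_date + dt.timedelta(days=i) for i in range(span)]

    def is_active(self, now: dt.datetime) -> bool:
        """A hold is active until it expires or is converted to a booking."""
        return self.booking_id is None and now < self.expires_at
```

A three-night stay therefore maps to three Inventory rows, which is the figure the sizing section relies on.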
25 – 35 min — High-Level Architecture
Key components explained
Search Service: serves search queries from precomputed indexes and caches (Elasticsearch/OpenSearch or custom search over materialized availability). It returns candidate hotels + price info (cached per room/date snapshot).
Inventory Service: authoritative store of per-room_type per-date availability. Must support transactional updates and atomic holds/commits. Implemented on a strong-consistency DB (e.g., RDBMS) or a partitioned service with per-roomtype sharding.
Booking Service: orchestrates hold → payment authorization → convert hold to booking (confirm) or release on failure. Uses distributed transactions or compensation patterns.
Hold mechanism: short-lived holds (e.g., 5–15 minutes) to avoid double-booking during payment; holds write into inventory atomically.
Payment Gateway: integrate with PCI compliant processors; prefer tokenization & external vault.
Event bus / Kafka: async flows: notify host, update analytics, process payouts.
PMS adapters: for large hotels, sync availability and receive reservations/cancellations.
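The hold mechanism with a TTL can be sketched in memory as below. This is an illustrative sketch only: `PendingHold`, `RoomNightInventory`, and `reclaim_expired` are hypothetical names, the 10-minute default is just one value in the 5–15 minute range mentioned above, and a real service would run the reclaimer as a background job against the inventory store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PendingHold:
    hold_id: str
    rooms: int
    expires_at: datetime


class RoomNightInventory:
    """Availability for one room type on one date (single-threaded sketch)."""

    def __init__(self, available_count: int):
        self.available_count = available_count
        self.holds_count = 0
        self.pending: dict[str, PendingHold] = {}

    def hold(self, hold_id: str, rooms: int, now: datetime,
             ttl: timedelta = timedelta(minutes=10)) -> bool:
        # Refuse the hold if not enough rooms remain.
        if rooms > self.available_count:
            return False
        self.available_count -= rooms
        self.holds_count += rooms
        self.pending[hold_id] = PendingHold(hold_id, rooms, now + ttl)
        return True

    def reclaim_expired(self, now: datetime) -> int:
        """Return expired holds to the available pool; report how many."""
        expired = [h for h in self.pending.values() if h.expires_at <= now]
        for h in expired:
            del self.pending[h.hold_id]
            self.holds_count -= h.rooms
            self.available_count += h.rooms
        return len(expired)
```

Passing `now` explicitly keeps the expiry logic deterministic and easy to test.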
35 – 45 min — Booking flow & correctness (deep dive)
Flow: Search → Book (step by step)
User searches → Search Service queries availability index (Elasticsearch/Redis) and returns options.
User selects room & clicks book → Booking Service requests a Hold from Inventory Service for the requested room_type + date range (atomic).
Hold operation: decrement available_count by requested rooms and increment holds_count, create Hold record with expiry.
Inventory writes must be atomic per room_type/date (use DB transaction or compare-and-swap).
Booking Service returns hold_id to frontend; client proceeds to payment.
Payment: Payment Gateway authorizes card (auth-only) or charges depending on rateplan.
On success: Booking Service commits hold → creates Booking record with status=CONFIRMED; reduces holds_count, persists payment info (token), emits booking event.
On payment failure or hold expiry: Booking Service releases hold (increment available_count, decrement holds_count) and notifies user.
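The hold and release steps in the flow above can be sketched with a conditional UPDATE inside one transaction. This is a minimal sketch using SQLite from the Python standard library as a stand-in for the inventory database; the table layout follows the Inventory model, while `SCHEMA`, `stay_nights`, `place_hold`, and `release_hold` are hypothetical names chosen for illustration.

```python
import sqlite3
from datetime import date, timedelta

SCHEMA = """
CREATE TABLE inventory (
    room_type_id    TEXT NOT NULL,
    date            TEXT NOT NULL,
    available_count INTEGER NOT NULL,
    holds_count     INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (room_type_id, date)
)
"""


def stay_nights(checkin: date, checkout: date) -> list[str]:
    """ISO dates for each night of the stay (checkout is exclusive)."""
    return [(checkin + timedelta(days=i)).isoformat()
            for i in range((checkout - checkin).days)]


def place_hold(conn: sqlite3.Connection, room_type_id: str,
               checkin: date, checkout: date, rooms: int = 1) -> bool:
    """Hold `rooms` on every night, or change nothing at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for night in stay_nights(checkin, checkout):
                cur = conn.execute(
                    "UPDATE inventory "
                    "SET available_count = available_count - ?, "
                    "    holds_count = holds_count + ? "
                    "WHERE room_type_id = ? AND date = ? "
                    "  AND available_count >= ?",
                    (rooms, rooms, room_type_id, night, rooms),
                )
                if cur.rowcount != 1:  # night missing or sold out
                    raise LookupError(night)
        return True
    except LookupError:
        return False


def release_hold(conn: sqlite3.Connection, room_type_id: str,
                 checkin: date, checkout: date, rooms: int = 1) -> None:
    """Undo a hold after payment failure or expiry."""
    with conn:
        for night in stay_nights(checkin, checkout):
            conn.execute(
                "UPDATE inventory "
                "SET available_count = available_count + ?, "
                "    holds_count = holds_count - ? "
                "WHERE room_type_id = ? AND date = ?",
                (rooms, rooms, room_type_id, night),
            )
```

Putting the availability check in the WHERE clause means the check and the decrement happen in one statement, so a sold-out night anywhere in the range rolls back the whole hold.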
Concurrency & correctness strategies
Strongly consistent inventory updates:
Option A: Centralized DB per partition (room_type/date) with row-level locking and transactions (ACID). Suitable for moderate scale.
Option B: Serialize updates per (hotel_id, room_type_id) using either a distributed lock or optimistic concurrency (compare-and-swap on version numbers), combined with lightweight DB transactions.
Avoid long locks: use short holds with TTL and background reclaimers.
Idempotency: Booking API includes client-generated idempotency keys so retrying a booking doesn’t double-charge or double-book.
Eventual reconciliation: periodic batch job compares bookings vs inventory and repairs mismatches (audit log + manual alerts).
PMS sync: for hotels with external PMS, reconcile inbound reservations and pushes to keep inventory authoritative.
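The idempotency strategy above can be illustrated with a small in-memory sketch. `BookingConfirmer`, `confirm`, and `charge_count` are hypothetical names; the lock stands in for what would typically be a unique constraint on the idempotency key in the booking store, and the counter stands in for a real payment capture.

```python
import threading
import uuid


class BookingConfirmer:
    """Returns the same booking for repeated calls with the same key."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._results: dict[str, str] = {}
        self.charge_count = 0  # stand-in for calls to the payment gateway

    def confirm(self, idempotency_key: str, hold_id: str) -> str:
        with self._lock:
            # A retry with a known key replays the stored result.
            if idempotency_key in self._results:
                return self._results[idempotency_key]
            # First time this key is seen: charge once and record the booking.
            self.charge_count += 1
            booking_id = "bk_" + uuid.uuid4().hex[:12]
            self._results[idempotency_key] = booking_id
            return booking_id
```

A client that times out and retries with the same key gets the original booking id back, and the card is charged once.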
Handling edge cases
Late payment: if payment is delayed past hold expiry, inform the user and offer to re-check availability.
Overbooking: allow controlled overbooking for hotels that tolerate it; treat as business rule.
Race on last room: only one hold should succeed. Make the decrement atomic and conditional on available_count >= requested rooms.
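The last-room race can be sketched with optimistic concurrency: read the row and its version, then write only if the version is unchanged. This is an in-memory illustration; `InventoryRow`, `compare_and_swap`, and `try_hold` are hypothetical names, and the internal lock merely simulates the atomic compare-and-swap a real store would provide.

```python
import threading


class InventoryRow:
    """One room_type/date row with a version number for CAS."""

    def __init__(self, available_count: int):
        self.available_count = available_count
        self.version = 0
        self._lock = threading.Lock()  # simulates the store's atomic CAS

    def read(self) -> tuple[int, int]:
        with self._lock:
            return self.available_count, self.version

    def compare_and_swap(self, expected_version: int, new_available: int) -> bool:
        with self._lock:
            if self.version != expected_version:
                return False  # someone else wrote first
            self.available_count = new_available
            self.version += 1
            return True


def try_hold(row: InventoryRow, rooms: int = 1, max_retries: int = 5) -> bool:
    """Attempt a hold; retry on version conflicts, fail when sold out."""
    for _ in range(max_retries):
        available, version = row.read()
        if available < rooms:
            return False
        if row.compare_and_swap(version, available - rooms):
            return True
    return False
```

With one room left and many concurrent callers, a failed swap implies another caller already succeeded, so the retry observes zero availability and gives up.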
45 – 50 min — Pricing, promotions & rate engines
Rate engine: evaluate pricing rules (base price, seasonal multipliers, occupancy-driven dynamic pricing, promo codes, taxes). Compute price breakdown on search and lock price at hold time (or accept variable price with reprice rules).
Price guarantee / reprice: show price at hold time; if price changes pre-confirmation, apply business rule: honor displayed price for hold window or reprice with user consent.
Coupons & inventory constraints: coupon validation is done at booking time; constraints (limited redemption) must be atomic (use separate counters with CAS).
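A price breakdown of the kind the rate engine produces might look like the sketch below. The function name `price_breakdown`, its parameters, and the order of operations (seasonal multiplier, then promo discount, then tax on the discounted amount) are assumptions for illustration; real tax and promo rules vary by jurisdiction and rate plan.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def _round(amount: Decimal) -> Decimal:
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def price_breakdown(nightly_base: list[Decimal],
                    seasonal_multiplier: Decimal,
                    promo_pct: Decimal,
                    tax_pct: Decimal) -> dict[str, Decimal]:
    """Compute subtotal, discount, tax, and total for a stay."""
    subtotal = _round(sum((rate * seasonal_multiplier for rate in nightly_base),
                          Decimal("0")))
    discount = _round(subtotal * promo_pct / 100)
    taxable = subtotal - discount
    tax = _round(taxable * tax_pct / 100)  # assumed: tax applies after discount
    return {
        "subtotal": subtotal,
        "discount": discount,
        "tax": tax,
        "total": taxable + tax,
    }
```

For example, three nights at 100, 100, and 120 with a 1.10 seasonal multiplier, a 10% promo, and 12% tax give a subtotal of 352.00, a discount of 35.20, tax of 38.02, and a total of 354.82. Storing this breakdown on the hold is one way to honor the displayed price for the hold window.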
50 – 55 min — Scaling, caching & BoE estimates
Sizing (example)
Assume peak 20k searches/sec and 500 bookings/sec.
Search
Use Elasticsearch cluster sharded by geo + hotel index. Cache hot queries & tile results in Redis.
Cache search results per (loc, dates) for ~30s–2min TTL; use CDN for static assets.
Inventory & Booking
Partition inventory by hotel_id % N across Inventory DB instances to scale writes/holds. Choose N to keep partition load manageable.
For 500 bookings/sec, if each booking touches ~3 date rows (avg nights), expect ~1.5k inventory writes/sec. A few DB instances, each sustaining on the order of thousands of transactions per second, suffice.
Storage
Booking records: 500 bookings/sec × 3600 × 24 = 43,200,000 bookings/day if the peak rate were sustained all day. That is an upper bound; realistic daily volumes are likely much lower. Adjust to interviewer numbers.
Use compression & archive older bookings to cold storage after retention period.
Example quick math (sustained peak, upper bound):
43.2M bookings/day at 1 KB/booking = ~43.2 GB/day raw. With replication factor 3 => ~129.6 GB/day.
(Always adapt numbers per interviewer prompts.)
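The sizing arithmetic above is easy to redo live when the interviewer changes the inputs. A small sketch, where the function name `back_of_envelope` and its parameter names are hypothetical, and 1 KB is taken as 1,000 bytes and 1 GB as 10^9 bytes:

```python
def back_of_envelope(bookings_per_sec: float,
                     avg_nights: float,
                     bytes_per_booking: int,
                     replication_factor: int) -> dict[str, float]:
    """Upper-bound sizing that assumes the peak rate holds all day."""
    seconds_per_day = 3600 * 24
    bookings_per_day = bookings_per_sec * seconds_per_day
    raw_gb_per_day = bookings_per_day * bytes_per_booking / 1e9
    return {
        # Each booking touches one inventory row per night of the stay.
        "inventory_writes_per_sec": bookings_per_sec * avg_nights,
        "bookings_per_day": bookings_per_day,
        "raw_gb_per_day": raw_gb_per_day,
        "replicated_gb_per_day": raw_gb_per_day * replication_factor,
    }


estimate = back_of_envelope(bookings_per_sec=500, avg_nights=3,
                            bytes_per_booking=1000, replication_factor=3)
```

With the example inputs this reproduces the figures above: 1,500 inventory writes/sec, 43.2M bookings/day, about 43.2 GB/day raw, and about 129.6 GB/day replicated.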
55 – 58 min — Observability, testing, security & ops
Monitoring
Metrics: search latency, cache hit ratio, holds/sec, confirm/sec, booking failure rates, payment success, inventory mismatches, reconciliation errors.
Tracing: end-to-end tracing from search → hold → payment → confirm.
Testing
Load tests for search and booking flows (simulate race on last room).
Chaos tests: fail inventory DB; validate reconciliation.
Security / Compliance
PCI compliance: do not store raw card numbers; use tokenization/payments provider. TLS everywhere, audit logs, RBAC, GDPR data deletion for bookings upon request (subject to business constraints).
58 – 60 min — Trade-offs, evolution & summary (wrap-up)
Key trade-offs
Strong consistency (no double-booking) vs latency & scale: prefer strong consistency for booking path; search can be eventually consistent with caches.
Centralized DB (simplicity) vs sharded inventory service (scale + complexity).
Holds prevent double-booking during payment, but the TTL is a balance: too short and users time out mid-payment (hurting conversion); too long and abandoned holds block inventory from other buyers.
Evolution path
MVP: single DB inventory (ACID), simple search + cache, payment gateway integration.
Scale: shard inventory, introduce search index (Elasticsearch), caching layer, CDN.
Add PMS integrations, advanced pricing engine, ML for recommendations.
Geo-distributed active/active for global low-latency.
One-line summary
Build search as a highly cacheable read path; implement inventory & booking as the authoritative, strongly-consistent transactional core using short atomic holds + idempotent booking confirms; use async streams for notifications, reconciliation, and analytics — balancing correctness for bookings with scalability for search.