#7 Hotel reservation system

This is a complete, time-boxed one-hour interview answer for designing a Hotel Reservation System (think the core features of Booking.com or Expedia). It follows the pattern: clarify → FR/NFR → APIs → high-level design → deep dives (availability, booking, payments, scaling) → BoE sizing → trade-offs & wrap-up. Use it as a script in an interview, pacing yourself by the minute ranges in each heading.

0 – 5 min — Problem recap, scope & key assumptions (set the stage)

Quickly restate and confirm scope so interviewer and you are aligned.

Goal: A system that lets users search hotels, view availability & prices, hold/confirm bookings, manage cancellations/changes, and supports hosts (inventory) and payments. Provide real-time availability, prevent double-booking, support promotions/taxes, and provide dashboards + billing.

Primary needs:

Real-time availability & booking correctness (no double-booking).
High search throughput and low latency for price/availability.
Integration with external property systems (PMS) for large hotels.
Support OTA features: cancellation policy, refunds, holds, payment capture.

Example assumptions (adjustable):

1M MAU, peak 20k searches/sec, peak 500 bookings/sec.
100k properties, average 50 rooms/property.
Booking conversion ~0.5%; raw events retained for 30 days.
Latency targets: search P95 < 200 ms; booking end-to-end < 1s for availability verification, < 3s including payment.

5 – 15 min — Functional & Non-Functional Requirements

Functional Requirements (Must / Should / Nice)

Must

Search hotels by location/date/guests/filters; returns available properties with prices and room types.
Check availability for specific room(s) & date range.
Create booking: hold inventory, collect guest info, accept payment (or authorize), confirm booking.
Cancel / modify booking per policy.
Host management: inventory CRUD, rates, restrictions, sync with Property Management Systems (PMS) via APIs/feeds.
Pricing rules: support seasonal rates, dynamic pricing, taxes, fees, promotions, and currency conversion.
Notifications: emails/SMS for confirmations, reminders, cancellations.
Reporting & reconciliation: payouts to hosts, billing, chargebacks handling.

Should

Prepaid & pay-at-hotel flows; partial payments; refunds.
Waitlist & soft-hold (short-lived holds).
Loyalty points & coupon codes.

Nice-to-have

Room map visualization (inventory by room number), multi-property corporate booking tools, recommendations.

Non-Functional Requirements (NFR)

Performance: Search P95 < 200 ms; booking < 1s for availability verification and < 3s including payment.
Consistency: Strong correctness for booking inventory (no double-booking).
Scalability: scale search horizontally; scale booking by partitioning inventory.
Availability: 99.95% for reads; 99.9% for booking operations.
Durability: booking data persisted and replicated; audit logs.
Security: PCI-DSS for card handling, TLS, RBAC for host dashboards.
Privacy: PII protection, GDPR delete support.
Observability: metrics for search latency, booking success, inventory conflicts, payments.
Cost/operability: choose managed services where practical.

15 – 25 min — External APIs & Data model (contract)

Key APIs (REST/gRPC)

GET  /search?loc=Paris&checkin=2025-10-10&checkout=2025-10-13&guests=2
GET  /hotel/{hotel_id}?checkin=&checkout=&rooms=
POST /booking/create
    {hotel_id, room_type_id, checkin, checkout, guests, guest_info, payment_method, hold_id?}
POST /booking/confirm {booking_id, payment_method}
POST /booking/cancel {booking_id, reason}
POST /inventory/update (host) {hotel_id, room_type_id, rates, availability}
GET  /booking/{id}
POST /payouts/process (admin)

Simplified data model (high-level)

Hotel: id, name, location, timezone, policies, pms_integration_info
RoomType: id, hotel_id, capacity, amenities
RatePlan: id, room_type_id, price_rules, cancellation_policy, currency
Inventory: room_type_id, date, available_count, holds_count
Booking: id, hotel_id, room_type_id, checkin, checkout, guest_info, status (held/confirmed/cancelled), price_breakdown, payment_info, created_at
Hold: hold_id, room_type_id, start_date, end_date, expires_at, reserved_count, booking_id (if consumed)
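
As a rough illustration, the Inventory and Hold entities above could be written as Python dataclasses. The field types, the exclusive checkout date, and the `nights` and `is_expired` helpers are assumptions added for this sketch; the model above only names the fields.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Optional


@dataclass
class Inventory:
    """One row per (room_type_id, date); holds and bookings update these rows."""
    room_type_id: str
    date: dt.date
    available_count: int
    holds_count: int = 0


@dataclass
class Hold:
    hold_id: str
    room_type_id: str
    start_date: dt.date            # check-in date
    end_date: dt.date              # check-out date, treated as exclusive (assumption)
    expires_at: dt.datetime
    reserved_count: int
    booking_id: Optional[str] = None   # set once the hold is consumed by a booking

    def nights(self) -> list[dt.date]:
        """Dates whose Inventory rows this hold touches (checkout day excluded)."""
        days = (self.end_date - self.start_date).days
        return [self.start_date + dt.timedelta(days=i) for i in range(days)]

    def is_expired(self, now: dt.datetime) -> bool:
        return now >= self.expires_at
```

Keeping inventory at per-date granularity means a three-night stay touches three Inventory rows, which is the unit used in the sizing estimates later.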

25 – 35 min — High-Level Architecture

Clients (web/mobile) 
   |
 CDN/API Gateway (auth, rate-limit, geo)
   |
 Frontend Services (stateless) —> Cache (Redis)
   |
 +-------------------------------+
 | Search Service (read-optimized)|
 +-------------------------------+
   |
 Booking Service (transactional) <-- Payment Gateway
   |
 Inventory Service (authoritative) <--> Host Portal / PMS Integrations
   |
 Event Bus (Kafka)  -> downstream: Notification, Billing, Reconciliation, Analytics
   |
 Data Lake / OLAP (for reports)

Key components explained

Search Service: serves search queries from precomputed indexes and caches (Elasticsearch/OpenSearch or custom search over materialized availability). It returns candidate hotels + price info (cached per room/date snapshot).
Inventory Service: authoritative store of per-room_type per-date availability. Must support transactional updates and atomic holds/commits. Implemented on a strong-consistency DB (e.g., RDBMS) or a partitioned service with per-roomtype sharding.
Booking Service: orchestrates hold → payment authorization → convert hold to booking (confirm) or release on failure. Uses distributed transactions or compensation patterns.
Hold mechanism: short-lived holds (e.g., 5–15 minutes) to avoid double-booking during payment; holds write into inventory atomically.
Payment Gateway: integrate with PCI compliant processors; prefer tokenization & external vault.
Event bus / Kafka: async flows: notify host, update analytics, process payouts.
PMS adapters: for large hotels, sync availability and receive reservations/cancellations.

35 – 45 min — Booking flow & correctness (deep dive)

Flow: Search → Book (step by step)

User searches → Search Service queries availability index (Elasticsearch/Redis) and returns options.
User selects room & clicks book → Booking Service requests a Hold from Inventory Service for the requested room_type + date range (atomic).
- Hold operation: decrement available_count by requested rooms and increment holds_count, create Hold record with expiry.
- Inventory writes must be atomic per room_type/date (use DB transaction or compare-and-swap).
Booking Service returns hold_id to frontend; client proceeds to payment.
Payment: Payment Gateway authorizes card (auth-only) or charges depending on rateplan.
On success: Booking Service commits hold → creates Booking record with status=CONFIRMED; reduces holds_count, persists payment info (token), emits booking event.
On payment failure or hold expiry: Booking Service releases hold (increment available_count, decrement holds_count) and notifies user.
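
The flow above can be sketched end to end with a toy in-memory inventory. `InMemoryInventory`, `book`, and the `authorize_payment` callback are hypothetical names for this illustration; a real Inventory Service would do the same bookkeeping inside database transactions, and hold expiry is left out here for brevity.

```python
import datetime as dt
import uuid
from typing import Callable, Dict, List, Optional, Tuple


class InMemoryInventory:
    """Toy stand-in for the Inventory Service (single-threaded, no expiry)."""

    def __init__(self) -> None:
        # (room_type_id, date) -> [available_count, holds_count]
        self.rows: Dict[Tuple[str, dt.date], List[int]] = {}
        # hold_id -> (room_type_id, nights, rooms)
        self.holds: Dict[str, Tuple[str, List[dt.date], int]] = {}

    def place_hold(self, room_type_id: str, nights: List[dt.date], rooms: int) -> Optional[str]:
        keys = [(room_type_id, d) for d in nights]
        # Check every night first so the hold is all-or-nothing.
        if any(self.rows.get(k, [0, 0])[0] < rooms for k in keys):
            return None
        for k in keys:
            self.rows[k][0] -= rooms   # available_count
            self.rows[k][1] += rooms   # holds_count
        hold_id = str(uuid.uuid4())
        self.holds[hold_id] = (room_type_id, nights, rooms)
        return hold_id

    def commit_hold(self, hold_id: str) -> None:
        room_type_id, nights, rooms = self.holds.pop(hold_id)
        for d in nights:
            self.rows[(room_type_id, d)][1] -= rooms   # held rooms are now sold

    def release_hold(self, hold_id: str) -> None:
        room_type_id, nights, rooms = self.holds.pop(hold_id)
        for d in nights:
            row = self.rows[(room_type_id, d)]
            row[0] += rooms   # give the rooms back
            row[1] -= rooms


def book(inventory: InMemoryInventory, room_type_id: str, nights: List[dt.date],
         rooms: int, authorize_payment: Callable[[], bool]) -> str:
    """Hold, then authorize payment, then commit on success or release on failure."""
    hold_id = inventory.place_hold(room_type_id, nights, rooms)
    if hold_id is None:
        return "SOLD_OUT"
    if authorize_payment():
        inventory.commit_hold(hold_id)
        return "CONFIRMED"
    inventory.release_hold(hold_id)
    return "PAYMENT_FAILED"
```

Note that a failed payment leaves the counters exactly as they were before the hold, which is the property the reconciliation job later checks for.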

Concurrency & correctness strategies

Strongly consistent inventory updates:
- Option A: Centralized DB per partition (room_type/date) with row-level locking and transactions (ACID). Suitable for moderate scale.
- Option B: Distributed lock per (hotel_id, room_type_id), or optimistic concurrency (compare-and-swap using version numbers) with lightweight DB transactions.
- Avoid long locks: use short holds with TTL and background reclaimers.
Idempotency: Booking API includes client-generated idempotency keys so retrying a booking doesn’t double-charge or double-book.
Eventual reconciliation: periodic batch job compares bookings vs inventory and repairs mismatches (audit log + manual alerts).
PMS sync: for hotels with an external PMS, reconcile inbound reservations against outbound availability pushes so that the Inventory Service stays authoritative.
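
The idempotency-key idea from the list above can be sketched as a thin wrapper around booking creation. `IdempotentBookingAPI` and the injected `create_booking` callable are hypothetical; this version keeps responses in a dictionary, whereas a production system would more likely store the key alongside the booking under a unique constraint so that concurrent retries are also covered.

```python
from typing import Callable, Dict


class IdempotentBookingAPI:
    """Replays the first response for any repeated idempotency key."""

    def __init__(self, create_booking: Callable[[dict], dict]) -> None:
        self._create_booking = create_booking
        self._responses: Dict[str, dict] = {}   # idempotency_key -> first response

    def create(self, idempotency_key: str, request: dict) -> dict:
        if idempotency_key in self._responses:
            # Retry of a request we already handled: no second charge or booking.
            return self._responses[idempotency_key]
        response = self._create_booking(request)
        self._responses[idempotency_key] = response
        return response
```

The client generates the key once per booking attempt and reuses it on every retry of that attempt, so a timeout followed by a resend maps to the same booking.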

Handling edge cases

Late payment: if payment completes after the hold has expired, inform the user and offer to re-check availability.
Overbooking: allow controlled overbooking for hotels that tolerate it; treat as business rule.
Race on last room: only one hold should succeed—ensure atomic decrement and check >0 before success.
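
One way to get the atomic "check > 0 and decrement" behaviour is a conditional UPDATE inside a single transaction. The sketch below uses Python's built-in `sqlite3` purely for illustration; the table layout, `make_demo_db`, `try_hold`, and the `SoldOut` exception are assumptions, and a production system would run the same statement shape on its own database.

```python
import sqlite3


class SoldOut(Exception):
    """Raised inside the transaction to force a rollback."""


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory ("
        " room_type_id TEXT, date TEXT,"
        " available_count INTEGER, holds_count INTEGER,"
        " PRIMARY KEY (room_type_id, date))"
    )
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?, ?)",
        [("rt1", "2025-10-10", 1, 0), ("rt1", "2025-10-11", 2, 0)],
    )
    conn.commit()
    return conn


def try_hold(conn: sqlite3.Connection, room_type_id: str, dates: list, rooms: int = 1) -> bool:
    """Hold `rooms` on every date, or change nothing. Returns True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for d in dates:
                cur = conn.execute(
                    "UPDATE inventory"
                    " SET available_count = available_count - ?,"
                    "     holds_count = holds_count + ?"
                    " WHERE room_type_id = ? AND date = ? AND available_count >= ?",
                    (rooms, rooms, room_type_id, d, rooms),
                )
                if cur.rowcount != 1:
                    # The WHERE clause did not match: not enough rooms on this date.
                    raise SoldOut(d)
        return True
    except SoldOut:
        return False
```

Because the availability check lives in the WHERE clause, the database evaluates it and applies the decrement as one step, so two requests for the last room cannot both see a positive count and both succeed.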

45 – 50 min — Pricing, promotions & rate engines

Rate engine: evaluate pricing rules (base price, seasonal multipliers, occupancy-driven dynamic pricing, promo codes, taxes). Compute price breakdown on search and lock price at hold time (or accept variable price with reprice rules).
Price guarantee / reprice: show price at hold time; if price changes pre-confirmation, apply business rule: honor displayed price for hold window or reprice with user consent.
Coupons & inventory constraints: coupon validation is done at booking time; constraints (limited redemption) must be atomic (use separate counters with CAS).
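
A minimal sketch of the price breakdown a rate engine might compute at hold time. It assumes the per-night rates have already had seasonal and dynamic rules applied, that the promotion is a percentage off the room subtotal, that tax applies after the discount, and that fees are untaxed; all of these are illustrative business rules rather than rules stated above, and `price_breakdown` is a hypothetical name.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, List

CENT = Decimal("0.01")


def price_breakdown(nightly_rates: List[Decimal], promo_percent: Decimal,
                    tax_rate: Decimal, fees: Decimal) -> Dict[str, Decimal]:
    """Return the line items that would be locked in when the hold is placed."""
    subtotal = sum(nightly_rates, Decimal("0"))
    discount = (subtotal * promo_percent / 100).quantize(CENT, rounding=ROUND_HALF_UP)
    taxable = subtotal - discount
    taxes = (taxable * tax_rate).quantize(CENT, rounding=ROUND_HALF_UP)
    return {
        "subtotal": subtotal,
        "discount": discount,
        "taxes": taxes,
        "fees": fees,
        "total": taxable + taxes + fees,
    }
```

For three nights at 100.00, 100.00 and 120.00 with a 10% promotion, a 12.5% tax rate and 15.00 in fees, the subtotal is 320.00, the discount 32.00, the taxes 36.00 and the total 339.00. Using `Decimal` rather than floats keeps the stored breakdown exact to the cent.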

50 – 55 min — Scaling, caching & BoE estimates

Sizing (example)

Assume peak 20k searches/sec and 500 bookings/sec.

Use an Elasticsearch cluster sharded by geo + hotel index. Cache hot queries & tile results in Redis.
Cache search results per (loc, dates) for ~30s–2min TTL; use CDN for static assets.
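
The short-TTL caching of search results can be sketched with a small in-process cache keyed by (loc, dates). `TTLCache` and its injectable `clock` are hypothetical and exist only to show the expiry logic; in the design above the shared store would be Redis rather than a local dictionary.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Caches computed values for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}   # key -> (expires_at, value)

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]            # still fresh: skip the search backend
        value = compute()            # miss or expired: query the backend
        self._entries[key] = (now + self._ttl, value)
        return value
```

A short TTL bounds how stale a cached price or availability snapshot can be. The booking path still re-verifies against the Inventory Service, so a stale search result costs a failed hold rather than a double-booking.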

Inventory & Booking

Partition inventory by hotel_id % N across Inventory DB instances to scale writes/holds. Choose N to keep partition load manageable.
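For 500 bookings/sec, if each booking touches ~3 date rows (avg nights), expect ~1.5k inventory row writes/sec for holds, and roughly twice that once the matching commit or release is counted. A few DB instances, each sustaining a few thousand transactions/sec, suffice.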

Storage

Booking records: 500 bookings/sec is a peak rate. Sustained for a full day it would give 500 × 3600 × 24 = 43,200,000 bookings/day, which is an upper bound; realistic daily averages are likely far lower. Adjust to the interviewer's numbers.
Use compression & archive older bookings to cold storage after retention period.

Example quick math (upper bound, peak rate sustained all day):

500 bookings/sec = 43.2M/day; at 1 KB/booking = ~43.2 GB/day raw. With replication factor 3 => ~129.6 GB/day.
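
The same back-of-envelope arithmetic as a small script, so the inputs can be swapped for the interviewer's numbers. The function names are hypothetical, and 1 KB is taken as 1,000 bytes and 1 GB as 10^9 bytes.

```python
SECONDS_PER_DAY = 24 * 3600


def inventory_writes_per_sec(bookings_per_sec: int, avg_nights: int) -> int:
    """Inventory rows touched per second when each booking holds one row per night."""
    return bookings_per_sec * avg_nights


def daily_storage_gb(bookings_per_sec: int, bytes_per_booking: int, replication_factor: int):
    """Return (bookings/day, raw GB/day, replicated GB/day) at a sustained rate."""
    bookings_per_day = bookings_per_sec * SECONDS_PER_DAY
    raw_gb = bookings_per_day * bytes_per_booking / 1e9
    return bookings_per_day, raw_gb, raw_gb * replication_factor


if __name__ == "__main__":
    print(inventory_writes_per_sec(500, 3))
    print(daily_storage_gb(500, 1000, 3))
```

With the example inputs this reproduces the figures above: 1,500 row writes/sec, 43,200,000 bookings/day, about 43.2 GB/day raw and about 129.6 GB/day with three replicas.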

(Always adapt numbers per interviewer prompts.)

55 – 58 min — Observability, testing, security & ops

Monitoring

Metrics: search latency, cache hit ratio, holds/sec, confirm/sec, booking failure rates, payment success, inventory mismatches, reconciliation errors.
Tracing: end-to-end tracing from search → hold → payment → confirm.

Testing

Load tests for search and booking flows (simulate race on last room).
Chaos tests: fail inventory DB; validate reconciliation.
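
A race-on-last-room test can be sketched with threads released at the same instant against a lock-protected counter. `LastRoomInventory` and `race_for_rooms` are hypothetical test scaffolding; a real load test would aim the same pattern at the actual hold endpoint instead of an in-process object.

```python
import threading
from typing import List


class LastRoomInventory:
    """In-process inventory whose check-and-decrement is guarded by a lock."""

    def __init__(self, available: int) -> None:
        self._available = available
        self._lock = threading.Lock()

    def try_hold(self) -> bool:
        with self._lock:
            if self._available > 0:
                self._available -= 1
                return True
            return False


def race_for_rooms(inventory: LastRoomInventory, contenders: int) -> int:
    """Start `contenders` threads together and return how many holds succeeded."""
    results: List[bool] = []
    barrier = threading.Barrier(contenders)

    def worker() -> None:
        barrier.wait()                      # line everyone up, then race
        results.append(inventory.try_hold())

    threads = [threading.Thread(target=worker) for _ in range(contenders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

The assertion to make is that the number of successful holds never exceeds the number of rooms available, however many contenders are started.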

Security / Compliance

PCI compliance: do not store raw card numbers; use tokenization/payments provider. TLS everywhere, audit logs, RBAC, GDPR data deletion for bookings upon request (subject to business constraints).

58 – 60 min — Trade-offs, evolution & summary (wrap-up)

Key trade-offs

Strong consistency (no double-booking) vs latency & scale: prefer strong consistency for booking path; search can be eventually consistent with caches.
Centralized DB (simplicity) vs sharded inventory service (scale + complexity).
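Hold TTL: short holds return abandoned inventory to the pool quickly but can expire while a user is still paying, which hurts conversion; long holds keep rooms locked for sessions that never complete. Balance the hold TTL.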

Evolution path

MVP: single DB inventory (ACID), simple search + cache, payment gateway integration.
Scale: shard inventory, introduce search index (Elasticsearch), caching layer, CDN.
Add PMS integrations, advanced pricing engine, ML for recommendations.
Geo-distributed active/active for global low-latency.

One-line summary

Build search as a highly cacheable read path; implement inventory & booking as the authoritative, strongly-consistent transactional core using short atomic holds + idempotent booking confirms; use async streams for notifications, reconciliation, and analytics — balancing correctness for bookings with scalability for search.


Last updated 4 months ago
