#11 Payment system
Below is a complete, time-boxed 1-hour interview answer for designing a Payment System (a payments platform that handles cards, wallets, bank transfers, authorizations, captures, refunds, and payouts, similar to the core flows of Stripe, Adyen, or PayPal). It follows a standard interview pattern: clarify → FR/NFR → APIs & data model → high-level architecture → deep dives (consistency, transactions, reconciliation, fraud) → capacity/BoE → ops/security → trade-offs & wrap-up. Use the minute markers to pace your delivery.
0 – 5 min — Problem recap, scope & assumptions (set the stage)
Start by restating the problem and aligning on scope.
Goal: Build a payment processing platform that accepts payment requests (cards, bank debits, wallets), performs authorization/capture/refund flows, routes to payment processors/acquirers, handles payouts to merchants, provides dashboards and webhooks, meets PCI DSS requirements, and supports fraud detection, settlement/reconciliation, and strong auditability.
Key capabilities
Authorize and capture payments (separate flows).
Support multiple payment methods & processors (pluggable).
Handle refunds, partial refunds, chargebacks.
Maintain idempotency, strong durability, and reconciliation for financial correctness.
Expose APIs for merchants + webhooks for event notifications.
Strong security, compliance (PCI, KYC), monitoring, and dispute flows.
Example assumptions (tunable)
10M transactions/day (≈116 tps avg), peak 5k tps during campaigns.
Average transaction payload ~1 KB.
Daily settlement; funds move through payment rails (acquirers/issuers), and merchants are paid out via ACH/SEPA/wire.
SLA: API acceptance P95 < 200 ms; orchestration flows complete within seconds to minutes depending on payment rails.
5 – 15 min — Functional & Non-Functional Requirements
Functional Requirements (Must / Should / Nice)
Must
Accept payment requests via REST/gRPC and SDKs: support create payment (authorize, capture), one-click charges, tokenization.
Multiple payment methods: cards (tokenized), bank transfers (ACH/SEPA), wallets (Apple/Google Pay), BNPL (optional).
Authorization / Capture: support auth-only, later capture, voids.
Refunds & chargebacks: partial/full refunds, handle inbound disputes and evidence submission.
Payouts: pay out merchant balances to bank accounts, support scheduling & currency conversion.
Routing & fallbacks: acquirer routing with retry/failover and dynamic route selection (by cost, latency, or success rate).
Webhooks & notifications: push payment status changes to merchants.
Reporting & settlement: detailed transaction logs, settlement files, fees, reconciliations.
Idempotency & retries: client idempotency keys, safe retries without double-charging.
Fraud detection: real-time scoring, blocking, manual review flows.
Should
Tokenization / vault for card data (PCI scope reduction).
Support multi-currency and FX.
Rule engine for custom routing/pricing per merchant.
Nice-to-have
Smart retry logic using historical success rates per acquirer/currency.
Instant payouts to debit cards (real-time rails).
Non-Functional Requirements
Performance: API accept P95 < 200 ms; auth latency to acquirers typically < 1s but can be higher.
Throughput: support peak tps (example 5k tps) with auto-scale.
Durability & correctness: no lost transactions, idempotent behavior, persisted audit trail (immutable ledger).
Availability: 99.95% for acceptance path; background settlement can be less strict.
Consistency: strong consistency for financial state (balances, ledger) — serializable or linearizable operations for money movement.
Security & compliance: PCI-DSS, encryption, KYC for merchants, auditability.
Observability: per-transaction tracing, reconciliation metrics, fraud metrics.
Latency vs accuracy trade-off: prefer correctness for money flows; allow async for long-running external rails.
15 – 25 min — APIs & Data model (external contract)
Key APIs
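As a hedged illustration of the external contract, the sketch below models the body of a hypothetical create-payment call (something like `POST /v1/payments`) and its validation. The field names, the `capture_method` values, and the `tok_` token prefix are assumptions for illustration; the one design point worth stating in an interview is that amounts travel as integer minor units (cents), never floats.

```python
from dataclasses import dataclass
from typing import List

ALLOWED_CAPTURE_METHODS = {"automatic", "manual"}  # manual = auth-only, capture later


@dataclass(frozen=True)
class CreatePaymentRequest:
    """Hypothetical body of a create-payment call such as POST /v1/payments."""
    merchant_id: str
    idempotency_key: str       # chosen by the client and resent unchanged on every retry
    amount_minor: int          # integer minor units (e.g. cents), never a float
    currency: str              # ISO 4217 code, e.g. "USD"
    payment_method_token: str  # vault token; raw card numbers never reach this service
    capture_method: str = "automatic"


def validate(req: CreatePaymentRequest) -> List[str]:
    """Return validation errors; an empty list means the request can be accepted."""
    errors = []
    if not req.idempotency_key:
        errors.append("idempotency_key is required")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(req.amount_minor, bool) or not isinstance(req.amount_minor, int) or req.amount_minor <= 0:
        errors.append("amount_minor must be a positive integer")
    if not (len(req.currency) == 3 and req.currency.isalpha() and req.currency.isupper()):
        errors.append("currency must be a 3-letter uppercase ISO 4217 code")
    if not req.payment_method_token.startswith("tok_"):
        errors.append("payment_method_token must be a vault token (tok_...)")
    if req.capture_method not in ALLOWED_CAPTURE_METHODS:
        errors.append("capture_method must be 'automatic' or 'manual'")
    return errors
```

For example, `validate(CreatePaymentRequest("m_1", "key-123", 1999, "USD", "tok_abc", "manual"))` returns an empty list, which corresponds to an auth-only request for $19.99.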
Minimal transaction/event model
Payment
LedgerEntry (immutable)
Accounts
Platform account (holds fees)
Merchant ledger/account (available balance, pending settlements) — must be updated transactionally
PaymentMethod / Token
token_id pointing to vault entry (no raw PAN in system)
Routing & Acquirer config
acquirer_id, regions/currencies, fees, retry policies, limits
Audit log: append-only immutable stream of events for each payment.
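A minimal sketch of the core records, assuming Python dataclasses as the schema notation. The status values, account-ID naming (`merchant:m_1:pending`), and field names are hypothetical; the point is that `Payment` is mutable state while `LedgerEntry` is frozen and only ever appended.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class PaymentStatus(Enum):
    PENDING = "pending"                  # created, not yet sent to an acquirer
    REQUIRES_ACTION = "requires_action"  # e.g. waiting on a 3DS challenge
    AUTHORIZED = "authorized"
    CAPTURED = "captured"
    VOIDED = "voided"
    FAILED = "failed"


@dataclass
class Payment:
    payment_id: str
    merchant_id: str
    amount_minor: int
    currency: str
    token_id: str                        # reference to the vault entry; no raw PAN
    acquirer_id: Optional[str] = None    # set once routing picks an acquirer
    status: PaymentStatus = PaymentStatus.PENDING
    captured_minor: int = 0
    refunded_minor: int = 0


@dataclass(frozen=True)  # frozen: entries are appended, never updated in place
class LedgerEntry:
    entry_id: str
    txn_id: str          # groups the entries of one balanced posting
    account_id: str      # e.g. "merchant:m_1:pending" or "platform:fees"
    direction: str       # "debit" or "credit"
    amount_minor: int
    currency: str
    payment_id: Optional[str] = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```

Corrections to a ledger entry are made by appending a reversing entry, which keeps the audit trail intact.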
25 – 40 min — High-level architecture & data flow
Key components explained
API Gateway: authenticates merchant, enforces quotas, dedupe via idempotency keys (reject duplicates or return existing payment).
Payments Orchestrator: core service that handles the happy path: validate request, reserve ledger entries (pending debit to card + hold merchant pending balance), route to connector.
Connector Layer: adapters per acquirer/PSP that handle protocol differences, retries, and error mapping.
Event Bus: reliable durable messaging for async work, retries, webhooks, and reconciliation.
Ledger / Accounting Service: strong-consistency store that records all Debits/Credits as immutable ledger entries; updates merchant balances transactionally. This must be ACID and support concurrency controls.
Vault / Tokenization: PCI-scope vault for card data (separate service), tokens returned to merchants.
Fraud / Risk Service: synchronous scoring during auth, and async risk workflows for manual review.
Reconciliation & Settlement: batch jobs that reconcile acquirer settlement files to ledger, compute payouts, and generate settlement reports.
Payout Service: issues ACH/SEPA/Wire payouts, manages payout schedules and limits.
Monitoring & Audit: trace and audit log accessible for disputes & investigations.
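The connector layer and routing can be sketched as a small adapter interface plus a router. This is an illustration under stated assumptions: `FakeAcquirerConnector`, its raw response codes, and the success-rate map are hypothetical stand-ins for real acquirer integrations and the routing rule engine. Note the failover rule: the router moves to the next acquirer only on errors where the first acquirer definitively never processed the request; a timeout after sending could mean an authorization already exists.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, Tuple


class RetriableError(Exception):
    """The acquirer definitively did not process the request, so failing over is safe."""


@dataclass
class AuthResult:
    approved: bool
    external_id: str
    decline_code: Optional[str] = None


class Connector(Protocol):
    acquirer_id: str

    def authorize(self, payment_id: str, amount_minor: int, currency: str) -> AuthResult: ...


class FakeAcquirerConnector:
    """Hypothetical adapter: translates one acquirer's raw response codes into AuthResult."""
    RAW_DECLINES = {"05": "do_not_honor", "51": "insufficient_funds"}

    def __init__(self, acquirer_id: str, unreachable: bool = False):
        self.acquirer_id = acquirer_id
        self.unreachable = unreachable

    def authorize(self, payment_id: str, amount_minor: int, currency: str) -> AuthResult:
        if self.unreachable:
            raise RetriableError(f"{self.acquirer_id}: connection refused")
        raw_code = "00" if amount_minor < 100_000 else "51"  # stand-in for the real network call
        external_id = f"{self.acquirer_id}-{payment_id}"
        if raw_code == "00":
            return AuthResult(True, external_id)
        return AuthResult(False, external_id, self.RAW_DECLINES.get(raw_code, "generic_decline"))


def route_and_authorize(connectors: List[Connector], success_rates: Dict[str, float],
                        payment_id: str, amount_minor: int, currency: str) -> Tuple[str, AuthResult]:
    """Try connectors best-first by historical success rate; fail over only on RetriableError."""
    ranked = sorted(connectors, key=lambda c: success_rates.get(c.acquirer_id, 0.0), reverse=True)
    last_error = None
    for connector in ranked:
        try:
            # A decline is a definitive answer and is returned, not retried elsewhere.
            return connector.acquirer_id, connector.authorize(payment_id, amount_minor, currency)
        except RetriableError as exc:
            last_error = exc
    raise RuntimeError("no connector could process the payment") from last_error
```

Ranking by success rate is only one option; the same loop works with cost- or latency-based ordering supplied by the per-merchant rule engine.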
40 – 50 min — Deep dives: correctness, transactions, reconciliation & fraud
Money movement & consistency
Two-layer model:
External movement: interaction with acquirers/rails (auth, capture, settlement) — external rails are eventually consistent and can have delays/failures.
Internal ledger: we maintain an internal single source of truth (immutable ledger) and merchant balances derived from ledger (available balance, pending). All business decisions use ledger state.
Atomicity:
On payment authorization: create ledger entries for the pending hold (e.g., a receivable against the payment source, since the actual debit of the customer happens externally on the card rails) and a pending credit/reservation in the merchant ledger (not available until capture/settlement). These ledger writes are committed in a single transaction.
Use database transactions or a transactional ledger service backed by a strongly consistent, serializable store (e.g., a single-leader RDBMS or a NewSQL database) to enforce money-conservation invariants: every posting's debits must equal its credits.
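A minimal sketch of a balanced, atomic posting, using SQLite purely as a stand-in for the strongly consistent production store. The table layout, the signed-amount convention (debits positive, credits negative, so a balanced posting sums to zero), and the account names are assumptions. A real ledger would additionally need serializable isolation or row locks whenever a posting depends on a balance check (e.g., a refund against available balance).

```python
import sqlite3
import uuid


def open_ledger(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS ledger_entries (
            entry_id   TEXT PRIMARY KEY,
            txn_id     TEXT NOT NULL,
            account_id TEXT NOT NULL,
            amount     INTEGER NOT NULL,  -- minor units; debit > 0, credit < 0
            currency   TEXT NOT NULL
        )""")
    return conn


def post_transaction(conn, lines):
    """Atomically append (account_id, signed_amount, currency) lines that must net to zero per currency."""
    totals = {}
    for _, amount, currency in lines:
        totals[currency] = totals.get(currency, 0) + amount
    if any(total != 0 for total in totals.values()):
        raise ValueError(f"unbalanced posting: {totals}")
    txn_id = str(uuid.uuid4())
    with conn:  # commits if every insert succeeds, rolls back otherwise
        conn.executemany(
            "INSERT INTO ledger_entries VALUES (?, ?, ?, ?, ?)",
            [(str(uuid.uuid4()), txn_id, acct, amt, cur) for acct, amt, cur in lines],
        )
    return txn_id


def balance(conn, account_id, currency):
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM ledger_entries WHERE account_id = ? AND currency = ?",
        (account_id, currency),
    ).fetchone()
    return row[0]
```

An authorization hold might then be posted as `post_transaction(conn, [("receivable:acq_a", 1999, "USD"), ("merchant:m_1:pending", -1999, "USD")])`. Because every posting nets to zero, `SUM(amount)` over the whole table per currency is also a cheap global invariant to monitor.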
Idempotency:
The client provides an idempotency_key. The orchestrator checks an idempotency store: if a request with the same key already exists, it returns the stored response instead of re-issuing network calls that could double-charge.
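A minimal sketch of the idempotency store, assuming a SQLite table keyed by `(merchant_id, idempotency_key)` and a hypothetical `handler` callable that performs the real work. The primary-key insert acts as the claim, so two concurrent retries cannot both reach the acquirer. Rejecting a reused key with a different request body is an added safeguard commonly used in practice, not something the design above prescribes.

```python
import hashlib
import json
import sqlite3


class IdempotencyStore:
    """Sketch of an idempotency table keyed by (merchant_id, idempotency_key)."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute("""
            CREATE TABLE IF NOT EXISTS idempotency (
                merchant_id  TEXT NOT NULL,
                key          TEXT NOT NULL,
                request_hash TEXT NOT NULL,
                response     TEXT,            -- NULL while the first attempt is in flight
                PRIMARY KEY (merchant_id, key)
            )""")

    def execute(self, merchant_id, key, request: dict, handler):
        request_hash = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
        try:
            with self.conn:
                # The primary key turns this insert into a claim that only one caller can win.
                self.conn.execute(
                    "INSERT INTO idempotency (merchant_id, key, request_hash) VALUES (?, ?, ?)",
                    (merchant_id, key, request_hash),
                )
        except sqlite3.IntegrityError:
            stored_hash, response = self.conn.execute(
                "SELECT request_hash, response FROM idempotency WHERE merchant_id = ? AND key = ?",
                (merchant_id, key),
            ).fetchone()
            if stored_hash != request_hash:
                raise ValueError("idempotency key reused with a different request body")
            if response is None:
                raise RuntimeError("original request still in progress; retry later")
            return json.loads(response)  # replay the stored result; no second network call
        # First attempt only. A real system also needs recovery for claims whose
        # handler crashed, since their response would otherwise stay NULL forever.
        response = handler(request)
        with self.conn:
            self.conn.execute(
                "UPDATE idempotency SET response = ? WHERE merchant_id = ? AND key = ?",
                (json.dumps(response), merchant_id, key),
            )
        return response
```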
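Effectively-once calls to external connectors:
Use connector-level dedupe (store external request IDs) and the acquirer's idempotency keys when supported; otherwise, after an ambiguous failure such as a timeout, query the acquirer for the status of the original request before retrying, so a retry never produces a duplicate capture.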
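The key detail when an acquirer supports idempotency keys is to derive the key from the business operation rather than from the attempt, so every retry is recognized as the same capture. A minimal sketch, where `FakeAcquirer` is a hypothetical acquirer API that processes a request but loses the first response:

```python
import uuid


class AmbiguousTimeout(Exception):
    """The request may or may not have reached the acquirer."""


class FakeAcquirer:
    """Hypothetical acquirer API that deduplicates captures on a client-supplied key."""

    def __init__(self, drop_first_response=False):
        self.captures = {}  # idempotency_key -> capture_id
        self.drop_first_response = drop_first_response

    def capture(self, idempotency_key, amount_minor):
        if idempotency_key not in self.captures:
            self.captures[idempotency_key] = f"cap_{uuid.uuid4().hex[:8]}"
        if self.drop_first_response:
            self.drop_first_response = False
            raise AmbiguousTimeout()  # processed, but the response was lost in transit
        return self.captures[idempotency_key]


def capture_with_retries(acquirer, payment_id, amount_minor, max_attempts=3):
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    # Same key on every attempt. For several partial captures, add a capture sequence number.
    key = f"{payment_id}:capture"
    for attempt in range(max_attempts):
        try:
            return acquirer.capture(key, amount_minor)
        except AmbiguousTimeout:
            if attempt == max_attempts - 1:
                raise  # hand off to status query / reconciliation; never mint a fresh key
```

If each attempt used a fresh random key instead, the lost response in this example would have produced two captures.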
Handling async confirmations:
Auth may be synchronous (instant) or async (3DS, bank delays). Payment status transitions are driven by incoming connector events (webhooks or polling) that update the ledger.
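Because connector events can arrive late, duplicated, or out of order, it helps to drive status through an explicit state machine that rejects illegal transitions and ignores repeated event IDs. A minimal sketch; the status set and transition table are assumptions, and refunds are assumed to be separate records so `CAPTURED` is terminal here.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    REQUIRES_ACTION = "requires_action"  # e.g. 3DS challenge outstanding
    AUTHORIZED = "authorized"
    CAPTURED = "captured"
    VOIDED = "voided"
    FAILED = "failed"


# Allowed transitions; anything else is rejected (e.g. a void after capture).
TRANSITIONS = {
    Status.PENDING: {Status.REQUIRES_ACTION, Status.AUTHORIZED, Status.FAILED},
    Status.REQUIRES_ACTION: {Status.AUTHORIZED, Status.FAILED},
    Status.AUTHORIZED: {Status.CAPTURED, Status.VOIDED},
    Status.CAPTURED: set(),  # refunds are tracked as separate refund records
    Status.VOIDED: set(),
    Status.FAILED: set(),
}


class PaymentStateMachine:
    def __init__(self):
        self.status = Status.PENDING
        self.seen_event_ids = set()  # duplicate webhooks / repeated polls are ignored

    def apply(self, event_id: str, new_status: Status) -> bool:
        """Apply a connector event; return True if the status changed."""
        if event_id in self.seen_event_ids:
            return False
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition {self.status.value} -> {new_status.value}")
        self.seen_event_ids.add(event_id)
        self.status = new_status
        # A real orchestrator would write the matching ledger entries in the same DB transaction.
        return True
```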
Reconciliation & Settlement
Settlement flow:
After capture, the acquirer settles to the platform in batches (with a settlement report/statement). The platform reconciles the settlement file against ledger entries, matching transaction IDs, amounts, and fees, and flags discrepancies.
Reconciliation process:
Compare acquirer settlement lines with ledger entries; produce adjustments (fees, chargebacks). Implement tolerances and workflows for unmatched items.
Persist reconciliation metadata; produce settlement files for merchant payouts.
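The matching step can be sketched as a join on the acquirer's external transaction ID that sorts every item into one of four buckets. The settlement-line layout and the zero default tolerance are assumptions; a production job would also persist each bucket and open review workflows for the non-matched ones.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SettlementLine:
    """One row of an acquirer settlement file (hypothetical layout)."""
    external_id: str
    gross_minor: int
    fee_minor: int


def reconcile(ledger_captures: Dict[str, int], settlement: List[SettlementLine],
              tolerance_minor: int = 0):
    """Match settlement lines to captured amounts in the ledger by external transaction ID.

    Returns (matched, mismatched, missing_in_settlement, unknown_to_ledger).
    """
    matched: List[Tuple[str, int]] = []
    mismatched: List[Tuple[str, int, int]] = []
    unknown: List[str] = []
    seen = set()
    for line in settlement:
        expected = ledger_captures.get(line.external_id)
        if expected is None:
            unknown.append(line.external_id)  # settled by the acquirer but never captured by us
            continue
        if line.external_id in seen:
            mismatched.append((line.external_id, expected, line.gross_minor))  # duplicate line
            continue
        seen.add(line.external_id)
        if abs(line.gross_minor - expected) <= tolerance_minor:
            matched.append((line.external_id, line.fee_minor))  # fee becomes a ledger adjustment
        else:
            mismatched.append((line.external_id, expected, line.gross_minor))
    missing = sorted(set(ledger_captures) - seen)  # captured but not settled (yet)
    return matched, mismatched, missing, unknown
```

"Missing" items are often just timing (settled in the next file), so they typically age through a grace period before being escalated, while "unknown" items need investigation immediately.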
Chargebacks:
On a chargeback, the platform debits the disputed amount from the merchant's balance (or from a reserve held for this purpose), records chargeback ledger entries, and starts the dispute process (evidence submission). Maintain dispute lifecycle state.
Fraud Prevention & Risk
Synchronous fraud checks:
Before auth: score payment with real-time models (device fingerprint, velocity checks, BIN checks, user behavior, 3DS flows). If high-risk, block or route for additional verification (3DS, manual review).
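Of the signals listed, velocity checks are the easiest to show concretely. The sketch below is a toy rule-based scorer, assuming a sliding-window counter per card token, a hypothetical risky-BIN set, and arbitrary point values and thresholds; a production system would feed such features into a trained model rather than hand-tuned points.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class VelocityRule:
    """Sliding-window count of attempts per key (e.g. card token, device, IP)."""

    def __init__(self, max_attempts: int, window_seconds: float):
        self.max_attempts = max_attempts
        self.window = window_seconds
        self.events = defaultdict(deque)  # key -> timestamps of recent attempts

    def hit(self, key: str, now: float) -> bool:
        """Record an attempt; return True if the key exceeded the limit within the window."""
        q = self.events[key]
        while q and now - q[0] >= self.window:
            q.popleft()
        q.append(now)
        return len(q) > self.max_attempts


def score(payment: dict, card_velocity: VelocityRule, risky_bins: set,
          now: Optional[float] = None) -> str:
    """Toy scorer returning 'allow', 'challenge' (e.g. step up to 3DS), or 'block'."""
    now = time.time() if now is None else now
    points = 0
    if card_velocity.hit(payment["token_id"], now):
        points += 50
    if payment["bin"] in risky_bins:
        points += 30
    if payment["amount_minor"] > 100_000:
        points += 20
    if points >= 70:
        return "block"
    if points >= 30:
        return "challenge"
    return "allow"
```

With a limit of 3 attempts per 60 seconds, the fourth attempt on the same token inside the window is stepped up to a challenge rather than silently allowed.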
Asynchronous analytics:
Post-transaction scoring for patterns, cluster-based detection for collusion, repeat offenders, or synthetic identities.
Manual review:
Provide UI/workflow to approve/reject flagged transactions; actions should be replayable and produce audit trail.
Fail-open vs fail-closed:
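If the fraud service errors or times out, decide whether to let the payment proceed unscored (fail open) or hold/decline it (fail closed). For small merchants, choose conservative defaults (fail closed), and make the policy configurable per merchant risk tolerance.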
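A minimal sketch of enforcing the fail mode, assuming the scorer is a plain callable (`score_fn` is hypothetical) invoked under a deadline so a slow fraud service cannot blow the auth latency budget. The 150 ms default and the `"review"` outcome for fail-closed are illustrative choices.

```python
from concurrent.futures import ThreadPoolExecutor

_EXECUTOR = ThreadPoolExecutor(max_workers=8)


def risk_decision(score_fn, payment: dict, fail_mode: str, timeout_s: float = 0.15) -> str:
    """Call the fraud scorer with a deadline; apply the merchant's fail mode if no score arrives.

    fail_mode: "closed" -> hold the payment for review when no score is available,
               "open"   -> let the payment proceed to the acquirer unscored.
    """
    if fail_mode not in ("open", "closed"):
        raise ValueError("fail_mode must be 'open' or 'closed'")
    future = _EXECUTOR.submit(score_fn, payment)
    try:
        return future.result(timeout=timeout_s)
    except Exception:  # includes the futures TimeoutError and any error raised by score_fn
        return "allow" if fail_mode == "open" else "review"
```

Payments that fail open can still be re-scored asynchronously afterwards, which limits the exposure of the permissive setting.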
Fault-tolerance & idempotency for connectors
Connector design:
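Each connector persists outbound request state and external identifiers; on retry, it consults the stored external ID to avoid duplicates. Combine this with the transactional outbox pattern, which guarantees at-least-once delivery; the stored IDs ensure the acquirer applies each request only once.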
Outbox & inbox:
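Use the transactional outbox pattern: the orchestrator writes to the DB (ledger entries + outbox row) in a single transaction; a separate delivery worker reads the outbox, sends to the connector, and on ack marks the row delivered. Symmetrically, an inbox table records the IDs of processed inbound connector events so a duplicated webhook is applied only once.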
50 – 55 min — Back-of-the-envelope capacity & sizing
(Sample math — adapt to interviewer numbers.)
Assumptions: 10M tx/day average, peak 5k tps, avg payload 1 KB.
Ingress: 10M/day → ≈ 116 tps avg; provision for peak = 5k tps.
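Event bus: at peak, 5k tps × 1 KB = 5 MB/s (an upper bound of 432 GB/day if sustained around the clock). Daily volume follows the average: 10M tx/day × 1 KB ≈ 10 GB/day per event emitted per transaction, or ~30 GB/day on Kafka with replication factor 3; multiply by the number of lifecycle events each payment emits.
Ledger DB: multiple entries per payment (authorization, fees, settlement), say 3 entries/tx → 30M ledger writes/day (≈ 350 writes/s average, 15k writes/s at the 5k tps peak). Choose a database (NewSQL or partitioned RDBMS) sized for sustained writes, with a single leader per partition so each money movement stays strongly consistent.
Connector throughput: many acquirers rate-limit QPS per merchant or per IP, so size the connector worker pool accordingly and use pooled connections with per-acquirer rate limiting.
Storage: raw events & audit logs archived to S3: ≈ 10 GB/day per event type ≈ 300 GB/month before compression (scale by the number of events stored per payment).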
Workers: scale orchestrator instances and connector workers horizontally; use autoscaling based on queue depth and latencies.
Stateful services: ledger nodes and payout schedulers need high availability, backups, and consistent replication.
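The sample math can be reproduced in a few lines so it is easy to rerun with the interviewer's numbers. This assumes decimal units (1 KB = 1,000 bytes), one bus event per transaction, 30-day months, and 3 ledger entries per transaction, all exposed as parameters.

```python
SECONDS_PER_DAY = 86_400


def capacity(tx_per_day=10_000_000, peak_tps=5_000, payload_bytes=1_000,
             replication=3, ledger_entries_per_tx=3):
    """Back-of-the-envelope sizing; every input is an adjustable assumption."""
    daily_bytes = tx_per_day * payload_bytes  # one event per transaction
    return {
        "avg_tps": round(tx_per_day / SECONDS_PER_DAY, 1),
        "peak_MB_per_s": peak_tps * payload_bytes / 1e6,
        "peak_GB_per_day_upper_bound": peak_tps * payload_bytes * SECONDS_PER_DAY / 1e9,
        "daily_GB": daily_bytes / 1e9,
        "daily_GB_replicated": daily_bytes * replication / 1e9,
        "monthly_GB_archive": daily_bytes * 30 / 1e9,
        "ledger_writes_per_day": tx_per_day * ledger_entries_per_tx,
        "ledger_avg_writes_per_s": round(tx_per_day * ledger_entries_per_tx / SECONDS_PER_DAY),
        "ledger_peak_writes_per_s": peak_tps * ledger_entries_per_tx,
    }
```

With the defaults this gives about 115.7 tps average, 5 MB/s at peak, 10 GB/day of payload (30 GB/day replicated, 300 GB/month archived), and roughly 347 ledger writes/s on average versus 15,000/s at peak. The peak figure, not the average, is what drives the ledger database choice.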
55 – 58 min — Operations, security & compliance
Security & PCI
PCI scope reduction: use tokenization & vault externalization (e.g., use a PCI-compliant vault service). Avoid storing PAN in app DBs.
Encryption: TLS in transit; encrypt sensitive fields at rest (keys in KMS).
Access control & audit: strict IAM, RBAC for admin ops; immutable audit logs for all financial actions.
Key management: KMS for keys, HSM for signing if needed (e.g., for settlement files).
Compliance & KYC
Merchant onboarding (KYC): identity checks, business validation, risk profiling, limits per merchant.
Regulatory reporting: transaction logs, AML/KYC, suspicious activity reporting.
Monitoring & alerts
Metrics: API latency, success/failure rates, queue lag, ledger invariants (sum of debits = sum of credits), reconciliation mismatch rate, fraud flags.
Tracing: distributed tracing for each payment lifecycle.
Alerts: escalations for ledger imbalance, replication lag, connector outage, spike in declines/chargebacks.
Disaster Recovery
Backup ledger DBs, replicate critical state to multiple regions, have failover procedures and reconcile using immutable event archives.
58 – 60 min — Trade-offs, evolution & summary (wrap-up)
Key trade-offs
Synchronous vs asynchronous: sync auth provides immediate UX but relies on external rails; async flows needed for slow rails (ACH). Prefer sync for auth, async for settlement.
Consistency vs availability: ledger should be strongly consistent (ACID) for money correctness; design other parts (analytics, reporting) eventually consistent.
Complexity vs correctness: transactional outbox + strong ledger increases complexity but is essential for financial correctness.
Performance vs fraud prevention: heavier fraud checks increase latency; allow configurable risk profiles per merchant.
Evolution path
MVP: accept card payments via one acquirer, basic ledger, tokenization via vault, webhooks, basic refunds.
Phase 2: add connector layer with multiple acquirers, routing & smart retry, payouts, reconciliation engine.
Phase 3: advanced fraud ML, multi-currency FX, dynamic routing, instant payouts, global multi-region failover and regulatory features.
One-line summary
Implement a payments platform where a strong, transactional internal ledger is the single source of truth; orchestrate external authorizations/captures through a resilient connector layer with idempotency and outbox delivery; perform settlement and reconciliation with immutable audit trails; and surround it with robust fraud, KYC, and PCI-compliant security to guarantee correctness and trust for money flows.