Difficulty: Hard
Topics: Distributed Systems, Blob Storage, Metadata Management
Key Concepts: Decoupling Metadata from Data, Immutable Objects, Flat Namespace.
Phase 1: Requirements Gathering
Goals
Design a simplified Object Storage Service like AWS S3.
Support manipulating "Buckets" and "Objects" (Files).
Ensure high durability and availability (conceptually).
1. Who are the actors?
User/Service: Uploads or downloads files via API.
Storage System: Manages physical bytes.
Metadata System: Manages indexing and attributes.
2. What are the must-have features? (Core)
Bucket Operations: Create, Delete, List.
Object Operations: Put, Get, Delete.
Immutability: Objects are immutable (an overwrite creates a new version/file).
3. What are the constraints?
Consistency: Eventual consistency is permissible for metadata, but strong consistency is preferred for new objects (S3 standard).
Blob Size: Support small (KB) to large (GB) files.
Phase 2: Use Cases
UC1: Create Bucket
Actor: User
Flow:
User requests CreateBucket("my-photos").
System checks if name is globally unique.
System records new Bucket in Metadata Store.
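The uniqueness check and the metadata write in this flow need to happen as one atomic step, otherwise two concurrent requests could both claim the same name. A minimal in-memory sketch of that idea follows; BucketRegistry, BucketRecord, and the owner field are hypothetical names used only for illustration, and a real Metadata Store would rely on a conditional write in its database instead of a local map.

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical metadata record for a bucket; the fields are illustrative.
final class BucketRecord {
    final String name;
    final String owner;
    final Instant createdAt;

    BucketRecord(String name, String owner, Instant createdAt) {
        this.name = name;
        this.owner = owner;
        this.createdAt = createdAt;
    }
}

// Hypothetical in-memory stand-in for the bucket table of the Metadata Store.
class BucketRegistry {
    private final Map<String, BucketRecord> buckets = new ConcurrentHashMap<>();

    // Returns true if the bucket was created, false if the name is already taken.
    boolean createBucket(String name, String owner) {
        BucketRecord candidate = new BucketRecord(name, owner, Instant.now());
        // putIfAbsent performs the uniqueness check and the insert atomically,
        // so two concurrent requests for the same name cannot both succeed.
        return buckets.putIfAbsent(name, candidate) == null;
    }

    boolean exists(String name) {
        return buckets.containsKey(name);
    }
}
```

Calling createBucket("my-photos", ...) twice on the same registry succeeds the first time and is rejected the second time, regardless of which owner asks.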
UC2: Put Object
Actor: User
Flow:
User uploads data via PutObject("my-photos", "vacation.jpg").
System (Storage Node) streams bytes to disk/SSD.
System generates a unique content address/path.
System updates Metadata Store with {Key: "vacation.jpg", Path: "/disk1/xyz", Size: ...}.
System returns Success.
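The ordering in this flow matters: the bytes are stored first and the metadata entry is published only afterwards, so a reader never finds a key that points at missing data. The sketch below walks through the same steps on a local directory. PutObjectFlow, ObjectMeta, and the use of a random UUID as the unique path are assumptions made for illustration, not part of the original design.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical metadata entry mirroring {Key, Path, Size} from the flow above.
final class ObjectMeta {
    final String key;
    final Path path;
    final long size;

    ObjectMeta(String key, Path path, long size) {
        this.key = key;
        this.path = path;
        this.size = size;
    }
}

// Hypothetical single-node version of the Put Object flow.
class PutObjectFlow {
    private final Path dataDir;
    private final Map<String, ObjectMeta> metadata = new ConcurrentHashMap<>();

    PutObjectFlow(Path dataDir) {
        this.dataDir = dataDir;
    }

    ObjectMeta putObject(String bucket, String key, InputStream data) throws IOException {
        // Generate a unique physical path (assumption: a random UUID file name).
        Path target = dataDir.resolve(UUID.randomUUID().toString());
        // Stream the bytes to disk; Files.copy returns the number of bytes written.
        long size = Files.copy(data, target);
        // Publish the metadata entry only after the bytes are safely stored.
        ObjectMeta meta = new ObjectMeta(key, target, size);
        metadata.put(bucket + "/" + key, meta);
        return meta;
    }

    ObjectMeta lookup(String bucket, String key) {
        return metadata.get(bucket + "/" + key);
    }
}
```

Because every put writes to a fresh path, an overwrite of the same key leaves the old file untouched and simply repoints the metadata, which fits the immutability requirement.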
Phase 3: Class Diagram
Step 1: Core Entities
S3Service: Facade.
Bucket: Logical container.
S3Object: Metadata Wrapper.
StorageBackend: Interface for physical storage (Local Disk, DFS).
UML Diagram
Phase 4: Design Patterns
1. Strategy Pattern
Description: Defines a family of algorithms, encapsulates each one, and makes them interchangeable.
Why used: The StorageBackend implementation can vary (Local Disk, HDFS, S3 Glacier, In-Memory). Strategy allows the storage engine to be swapped based on environment or cost requirements without changing the core S3 logic.
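As a rough illustration of the Strategy pattern here, the sketch below gives StorageBackend two interchangeable implementations and a small caller that only knows the interface. The method names (write, read, delete) and the classes InMemoryBackend, LocalDiskBackend, and ObjectWriter are assumptions, since the original interface is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// The strategy interface (assumed shape): how blobs are physically stored.
interface StorageBackend {
    void write(String blobId, byte[] data);
    byte[] read(String blobId);
    void delete(String blobId);
}

// Strategy 1: keeps blobs in memory, useful for tests.
class InMemoryBackend implements StorageBackend {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    public void write(String blobId, byte[] data) { blobs.put(blobId, data.clone()); }
    public byte[] read(String blobId) {
        byte[] data = blobs.get(blobId);
        return data == null ? null : data.clone();
    }
    public void delete(String blobId) { blobs.remove(blobId); }
}

// Strategy 2: stores each blob as a file under a root directory.
class LocalDiskBackend implements StorageBackend {
    private final Path root;

    LocalDiskBackend(Path root) { this.root = root; }

    public void write(String blobId, byte[] data) {
        try { Files.write(root.resolve(blobId), data); }
        catch (IOException e) { throw new UncheckedIOException(e); }
    }
    public byte[] read(String blobId) {
        try { return Files.readAllBytes(root.resolve(blobId)); }
        catch (IOException e) { throw new UncheckedIOException(e); }
    }
    public void delete(String blobId) {
        try { Files.deleteIfExists(root.resolve(blobId)); }
        catch (IOException e) { throw new UncheckedIOException(e); }
    }
}

// The context: depends only on the interface, so the backend can be swapped.
class ObjectWriter {
    private final StorageBackend backend;

    ObjectWriter(StorageBackend backend) { this.backend = backend; }

    void store(String blobId, byte[] data) { backend.write(blobId, data); }
    byte[] load(String blobId) { return backend.read(blobId); }
}
```

Switching from new ObjectWriter(new InMemoryBackend()) to new ObjectWriter(new LocalDiskBackend(dir)) changes where the bytes live without touching ObjectWriter itself.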
2. Facade Pattern
Description: Provides a unified interface to a set of interfaces in a subsystem. Facade defines a higher-level interface that makes the subsystem easier to use.
Why used: S3Service acts as a Facade, hiding the complexity of coordinating the Metadata Store (BucketManager) and the Blob Store (StorageBackend). Clients just call simple methods like putObject.
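A minimal sketch of how such a facade might coordinate the two subsystems is shown below. The method signatures, the UUID blob IDs, and the in-memory BucketManager are assumptions for illustration; the StorageBackend here is a cut-down version so the snippet compiles on its own.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Cut-down blob store interface (assumed shape).
interface StorageBackend {
    void write(String blobId, byte[] data);
    byte[] read(String blobId);
}

class InMemoryBackend implements StorageBackend {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();
    public void write(String blobId, byte[] data) { blobs.put(blobId, data.clone()); }
    public byte[] read(String blobId) { return blobs.get(blobId).clone(); }
}

// Hypothetical metadata store: bucket name -> (object key -> blob ID).
class BucketManager {
    private final Map<String, Map<String, String>> buckets = new ConcurrentHashMap<>();

    boolean createBucket(String name) {
        return buckets.putIfAbsent(name, new ConcurrentHashMap<>()) == null;
    }
    void recordObject(String bucket, String key, String blobId) {
        index(bucket).put(key, blobId);
    }
    String blobIdFor(String bucket, String key) {
        return index(bucket).get(key);
    }
    private Map<String, String> index(String bucket) {
        Map<String, String> index = buckets.get(bucket);
        if (index == null) throw new IllegalArgumentException("No such bucket: " + bucket);
        return index;
    }
}

// The facade: clients see three simple methods and nothing of the wiring.
class S3Service {
    private final BucketManager bucketManager;
    private final StorageBackend backend;

    S3Service(BucketManager bucketManager, StorageBackend backend) {
        this.bucketManager = bucketManager;
        this.backend = backend;
    }

    boolean createBucket(String name) { return bucketManager.createBucket(name); }

    void putObject(String bucket, String key, byte[] data) {
        bucketManager.blobIdFor(bucket, key);         // fails fast if the bucket is missing
        String blobId = UUID.randomUUID().toString(); // every put writes a new immutable blob
        backend.write(blobId, data);
        bucketManager.recordObject(bucket, key, blobId);
    }

    byte[] getObject(String bucket, String key) {
        String blobId = bucketManager.blobIdFor(bucket, key);
        if (blobId == null) throw new NoSuchElementException("No such key: " + key);
        return backend.read(blobId);
    }
}
```

In this sketch an overwrite never mutates stored bytes; it writes a new blob and repoints the key, leaving cleanup of the old blob to a process that is not shown.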
Phase 5: Code Key Methods
Java Implementation
Phase 6: Discussion
Scalability
Q: How to handle 1 Exabyte of data?
A: "The StorageBackend must be sharded. Use Consistent Hashing to distribute blobs across distinct storage nodes. Metadata DB (e.g., DynamoDB) is also partitioned by Bucket/Key."
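To make the Consistent Hashing idea concrete, here is a small sketch of a hash ring with virtual nodes. ConsistentHashRing, the node names, and the choice of SHA-256 truncated to 64 bits as the hash are assumptions for illustration; a production system would also need replication and rebalancing, which are not shown.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical ring that maps blob keys to storage nodes.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Each physical node is placed on the ring many times to even out the load.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // A key belongs to the first node at or after its hash, wrapping around.
    String nodeFor(String blobKey) {
        if (ring.isEmpty()) throw new IllegalStateException("No storage nodes registered");
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(blobKey));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // First 8 bytes of SHA-256, interpreted as a signed 64-bit ring position.
    private static long hash(String value) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(value.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(bytes).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

The property that matters for scaling is that removing a node only moves the keys that node owned; keys on the remaining nodes keep their placement, so adding or removing capacity does not reshuffle the whole cluster.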
Large Files
Q: How to upload a 5GB file?
A: "Multipart Upload. The client splits the file into 100MB chunks and uploads them in parallel. The service then assembles the parts, in order, into a single object."
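The multipart idea can be sketched in a few lines: split the file, upload parts concurrently, and reassemble them by part number. MultipartUpload and MultipartClient are hypothetical names, the "upload" is just an in-memory call, and the part size is a parameter so the example can run with tiny inputs instead of 100MB chunks.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical server-side state for one multipart upload.
class MultipartUpload {
    // Sorted by part number, so parts may arrive in any order.
    private final ConcurrentSkipListMap<Integer, byte[]> parts = new ConcurrentSkipListMap<>();

    void uploadPart(int partNumber, byte[] data) {
        parts.put(partNumber, data.clone());
    }

    // Concatenates the parts in ascending part-number order.
    byte[] complete() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts.values()) {
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }
}

// Hypothetical client that splits a file and uploads the parts in parallel.
class MultipartClient {
    static byte[] upload(byte[] file, int partSize, int parallelism) {
        MultipartUpload upload = new MultipartUpload();
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<?>> pending = new ArrayList<>();
            int partNumber = 1;
            for (int offset = 0; offset < file.length; offset += partSize) {
                byte[] part = Arrays.copyOfRange(file, offset, Math.min(file.length, offset + partSize));
                int n = partNumber++;
                pending.add(pool.submit(() -> upload.uploadPart(n, part)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every part before completing
            }
            return upload.complete();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Part upload failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A side benefit of this structure is that a failed part can be retried on its own without resending the rest of the file, although retry logic is not shown here.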
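Q: Does the store have real folders (a directory hierarchy)?
A: "No. It is a flat keyspace. 'Folders' are just prefixes: photos/2023/jan.jpg is the key. Checking whether a 'folder' exists means scanning for keys with that prefix, an O(N) operation, and 'renaming a folder' is expensive because every object under the prefix must be copied to its new key and then deleted (Copy+Delete)."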
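The sketch below shows what "folders are just prefixes" looks like against a flat key index. FlatNamespace and its methods are hypothetical, and it assumes the index is kept sorted (a TreeMap here) so that keys sharing a prefix sit next to each other; only the index is modeled, so the "copy" moves a blob ID, not real bytes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical flat key index: object key -> blob ID, with no directory entries.
class FlatNamespace {
    private final TreeMap<String, String> objects = new TreeMap<>();

    void put(String key, String blobId) {
        objects.put(key, blobId);
    }

    // Lists every key that starts with the prefix by scanning the sorted index.
    List<String> listByPrefix(String prefix) {
        List<String> result = new ArrayList<>();
        for (String key : objects.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) break; // past the last matching key
            result.add(key);
        }
        return result;
    }

    // A "folder" exists only if at least one key carries its prefix.
    boolean folderExists(String prefix) {
        return !listByPrefix(prefix).isEmpty();
    }

    // "Rename" is copy + delete for every object under the prefix.
    int renameFolder(String oldPrefix, String newPrefix) {
        List<String> keys = listByPrefix(oldPrefix);
        for (String oldKey : keys) {
            String newKey = newPrefix + oldKey.substring(oldPrefix.length());
            objects.put(newKey, objects.get(oldKey)); // copy
            objects.remove(oldKey);                   // delete
        }
        return keys.size();
    }
}
```

The cost of renameFolder grows with the number of objects under the prefix, which is the reason the operation is expensive compared with a rename on a hierarchical file system.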
SOLID Principles Checklist
S (Single Responsibility): StorageBackend handles bytes, Bucket handles metadata.
O (Open/Closed): Add GlacierBackend without changing S3Service logic.
L (Liskov Substitution): FileSystemStorage can be replaced with NetworkStorage.
I (Interface Segregation): StorageBackend is a simple Read/Write interface.
D (Dependency Inversion): S3Service depends on StorageBackend interface.