This directory serves as the theoretical bedrock for Large Language Models (LLMs). It covers the fundamental research, architectural innovations, and training methodologies that power models like GPT-4, Llama 3, and Claude.
From trillions of tokens to a helpful assistant:
- Pre-training (Objectives, Data Mixture)
- SFT (Instruction Tuning)
- Alignment (RLHF vs. DPO)
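To make the RLHF-vs-DPO contrast concrete: DPO skips the reward model and PPO loop and optimizes preferences directly. A minimal sketch of the DPO loss for one preference pair, assuming the four inputs are summed token log-probabilities from the policy and a frozen reference model (values and β below are illustrative, not from any real run):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    loss = -log sigmoid( beta * [ (log pi/ref for chosen)
                                - (log pi/ref for rejected) ] )
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)), written in a numerically stable form
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# Policy prefers the chosen answer more than the reference does,
# so the margin is positive and the loss drops below log 2 ≈ 0.693.
print(dpo_loss(-10.0, -14.0, -11.0, -13.0))
```

The loss only depends on how much the policy's preference margin exceeds the reference's, which is why DPO needs no separate reward model.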
The physics of LLM performance:
- Compute Optimality (Chinchilla Laws)
- Data Quality & Deduplication
- Synthetic Data Generation
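The compute-optimality idea above reduces to two Chinchilla rules of thumb: training FLOPs C ≈ 6·N·D, and a compute-optimal token count of roughly 20 tokens per parameter. A back-of-the-envelope sketch (this is the rounded heuristic, not the paper's fitted power laws):

```python
def chinchilla_optimal(flops_budget):
    """Approximate compute-optimal model/data sizing.

    Assumes C ≈ 6*N*D and D ≈ 20*N (Chinchilla rules of thumb).
    Substituting: C ≈ 120*N**2, so N = sqrt(C / 120).
    """
    n_params = (flops_budget / 120) ** 0.5
    n_tokens = 20 * n_params
    return n_params, n_tokens

# Chinchilla's own budget (~5.9e23 FLOPs) recovers its published
# shape: ~70B parameters trained on ~1.4T tokens.
print(chinchilla_optimal(5.88e23))
```

The practical takeaway: for a fixed FLOPs budget, a smaller model trained on more tokens usually beats a larger under-trained one.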
State-of-the-art model internals:
- Mixture of Experts (MoE)
- KV-Cache & Attention Optimization (GQA, MQA)
- FlashAttention & RoPE
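Why GQA/MQA matter for the KV cache: cache size scales with the number of key/value heads, not query heads. A minimal sizing sketch using an assumed Llama-3-70B-like configuration (80 layers, 64 query heads, 8 KV heads, head dim 128; treat these numbers as illustrative):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    # Two cached tensors per layer (K and V), fp16 by default
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * dtype_bytes

# Same model, 8k context, batch 1:
mha = kv_cache_bytes(80, 64, 128, 8192, 1)  # hypothetical full multi-head KV
gqa = kv_cache_bytes(80, 8, 128, 8192, 1)   # 8-way grouped-query attention
print(mha / gqa)  # → 8.0: cache shrinks by n_query_heads / n_kv_heads
```

MQA is the limit case (1 KV head); GQA trades a little quality for most of MQA's memory and bandwidth savings.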
Measuring intelligence objectively:
- MMLU, GSM8K, and HumanEval
- Model-as-a-Judge
- LMSYS Chatbot Arena
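Chatbot Arena ranks models from pairwise human votes. Current leaderboards fit a Bradley-Terry model over all battles, but the classic online Elo update below illustrates the core idea (K-factor and ratings are illustrative):

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One Elo update after a battle; score_a is 1 (A wins), 0, or 0.5 (tie)."""
    # Expected score for A from the rating gap
    e_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    # Winner gains what the loser gives up, scaled by surprise
    return r_a + k * (score_a - e_a), r_b + k * ((1 - score_a) - (1 - e_a))

# Evenly rated models, A wins: ratings move symmetrically
print(elo_update(1000, 1000, 1))  # → (1016.0, 984.0)
```

Upsets (a low-rated model beating a high-rated one) move ratings more than expected wins, which is what makes the leaderboard converge.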
Direct answers to high-signal LLM interview questions:
- Architecture deep-dives
- Training strategies
- Production trade-offs
Deep-dive books and resources on LLM development.