LLMs

This directory lays the theoretical groundwork for understanding Large Language Models (LLMs). It covers the foundational research, architectural innovations, and training methodologies that power models like GPT-4, Llama 3, and Claude.

Theoretical Roadmap

From trillions of tokens to a helpful assistant.

  • Pre-training (Objectives, Data Mixture).

  • SFT (Instruction Tuning).

  • Alignment (RLHF vs. DPO; see the loss sketched after this list).
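
As a concrete anchor for the alignment bullet above, here is a minimal sketch of the DPO loss over a batch of preference pairs. It assumes per-sequence log-probabilities have already been computed under the trainable policy and a frozen reference model; the function and argument names are illustrative, and beta = 0.1 is just a commonly used default.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of preference pairs.

    Each tensor holds per-sequence log-probabilities (summed over tokens)
    for the preferred ("chosen") and dispreferred ("rejected") completions,
    under the trainable policy and the frozen reference model.
    """
    # Implicit reward of each completion is beta * log(pi_theta / pi_ref).
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy call with random per-sequence log-probs for a batch of 4 pairs.
torch.manual_seed(0)
print(dpo_loss(*[-torch.rand(4) * 5 for _ in range(4)]))
```

Unlike RLHF, this needs no separate reward model or on-policy sampling loop, which is the main reason DPO is attractive as a simpler alignment recipe.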

The physics of LLM performance.

  • Compute Optimality (Chinchilla scaling laws; see the back-of-the-envelope sketch after this list).

  • Data Quality & Deduplication.

  • Synthetic Data Generation.
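
To make the compute-optimality bullet concrete, here is a back-of-the-envelope sketch using two widely quoted approximations from the Chinchilla line of work: roughly 20 training tokens per parameter at the compute-optimal point, and about 6 FLOPs per parameter per training token. The helper names are illustrative and the coefficients are rules of thumb, not the exact fitted scaling law.

```python
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Rule-of-thumb compute-optimal token budget: scale data in proportion
    to parameters, at roughly 20 tokens per parameter."""
    return n_params * tokens_per_param

def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard approximation: ~6 FLOPs per parameter per token
    for a forward-plus-backward pass."""
    return 6.0 * n_params * n_tokens

# Example: a 70B-parameter model trained compute-optimally.
n = 70e9
d = chinchilla_optimal_tokens(n)   # ~1.4 trillion tokens
print(f"tokens: {d:.2e}, training FLOPs: {training_flops(n, d):.2e}")
```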

State-of-the-art model internals.

  • Mixture of Experts (MoE).

  • KV-Cache & Attention Optimization (GQA, MQA; sized concretely in the sketch after this list).

  • FlashAttention & RoPE.
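
A quick way to see why GQA and MQA matter in production is to size the KV-cache directly; the number of KV heads is exactly the term they shrink. The sketch below assumes a hypothetical Llama-style configuration (32 layers, 128-dim heads, fp16 cache) purely for illustration; the function and numbers are not tied to any specific model.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_value: int = 2) -> int:
    """Memory held by the KV-cache during decoding: two tensors (K and V) are
    cached per layer, each of shape [batch, n_kv_heads, seq_len, head_dim]."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_value

# Full multi-head attention (32 KV heads) vs. GQA with 8 KV heads, at 8k context.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=8192, batch_size=1)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=8192, batch_size=1)
print(f"MHA: {mha / 1e9:.1f} GB, GQA: {gqa / 1e9:.1f} GB")
```

Cutting KV heads from 32 to 8 cuts cache memory 4x per sequence, which directly raises the batch size a serving GPU can hold.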

Measuring model capability objectively.

  • MMLU, GSM8K, and HumanEval (see the pass@k estimator after this list).

  • Model-as-a-Judge.

  • LMSYS Chatbot Arena.
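
For HumanEval-style coding benchmarks, scores are usually reported with the unbiased pass@k estimator introduced alongside the benchmark, where n samples are generated per problem and c of them pass the unit tests. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn
    from the n generated samples (c of which are correct) passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 200 samples per problem, 37 passing, reported as pass@10.
print(f"pass@10 = {pass_at_k(n=200, c=37, k=10):.3f}")
```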


Direct answers to high-signal LLM interview questions.

  • Architecture deep-dives.

  • Training strategies.

  • Production trade-offs.


Deep-dive books and resources on LLM development.
