Study plan & timeline
The full curriculum is 362–599 hours of study across 42 core phases (00–41), plus the optional Extension Track (X1–X5, another 58–96 hours). Pick a pace, and the timeline below recomputes how many weeks that implies and the resulting finish date, alongside the per-phase detail: cognitive effort, theory/lab split, concepts, and the milestone each phase unlocks.
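The timeline's arithmetic is straightforward. Below is a minimal Python sketch, not the page's actual widget code. It assumes pace is measured in study hours per week and that a partial week rounds up to a whole one; `finish_estimate` and the example start date are hypothetical names and values chosen for illustration.

```python
import math
from datetime import date, timedelta

# Core-track totals stated above, in hours.
CORE_HOURS_LOW = 362
CORE_HOURS_HIGH = 599


def finish_estimate(hours: int, hours_per_week: float, start: date) -> tuple[int, date]:
    """Return (weeks, finish_date) for a workload at a given weekly pace.

    Hypothetical helper: assumes a partial week counts as a full week.
    """
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    weeks = math.ceil(hours / hours_per_week)
    return weeks, start + timedelta(weeks=weeks)


if __name__ == "__main__":
    start = date(2025, 1, 6)  # example start date (assumption)
    for pace in (5, 10, 20):
        lo_weeks, lo_end = finish_estimate(CORE_HOURS_LOW, pace, start)
        hi_weeks, hi_end = finish_estimate(CORE_HOURS_HIGH, pace, start)
        print(f"{pace:>2} h/week: {lo_weeks}-{hi_weeks} weeks, finishing {lo_end} to {hi_end}")
```

At 10 hours a week, for example, the core track comes to 37–60 weeks (362/10 and 599/10, rounded up). Extension Track hours can be added to `hours` before calling the function.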
Pick your pace
Timeline
Per-phase detail
Table (no-JavaScript fallback)
| Phase | Chapter | Hours | Effort (1–5) | Theory/Lab (%) | Milestone |
|---|---|---|---|---|---|
| 00 | Project Foundations & Learning Methodology | 5–8 | ●●○○○ | 45/55 | Bootstrap a reproducible, gated Python project from a clean clone. |
| 01 | Hardware & Computing Substrate | 6–10 | ●●●○○ | 60/40 | Predict whether a kernel is memory- or compute-bound on your own machine. |
| 02 | Numerical Representation | 6–10 | ●●●●○ | 60/40 | Make a softmax numerically stable and explain exactly why it was not. |
| 03 | Linear Algebra from First Principles | 8–14 | ●●●●○ | 65/35 | Express any linear operation in the curriculum as an einsum and reason about its shapes. |
| 04 | Calculus & Optimization for AI | 8–14 | ●●●●○ | 60/40 | Derive backprop and the Adam update from scratch and visualize them on Rosenbrock. |
| 05 | Probability & Information Theory | 7–12 | ●●●●○ | 65/35 | Derive the cross-entropy loss as maximum likelihood and connect it to perplexity. |
| 06 | Python for AI Engineering | 6–10 | ●●●○○ | 40/60 | Predict the memory cost of a tensor op from its strides before running it. |
| 07 | Scalar Autograd from Scratch (minigrad) | 12–20 | ●●●●● | 40/60 | Train a 2-layer MLP with an autograd engine you wrote from nothing. |
| 08 | Tensor Autograd from Scratch | 12–20 | ●●●●● | 35/65 | Pass gradcheck on 20+ tensor ops including matmul and softmax. |
| 09 | Tiny MLP & Module Abstraction (minitorch) | 8–14 | ●●●○○ | 30/70 | Wrap your autograd in a PyTorch-style API you can compose modules with. |
| 10 | Initialization, Normalization, Residuals | 7–12 | ●●●●○ | 50/50 | Make a deep stack trainable with correct init, norm, and residuals. |
| 11 | Tokenization Theory + BPE Implementation | 8–14 | ●●●○○ | 40/60 | Build a BPE tokenizer from scratch and round-trip the verb corpus exactly. |
| 12 | The Corpus: Designing the Microscopic Dataset | 7–12 | ●●○○○ | 45/55 | Enumerate the full bilingual verb corpus with controlled mis-conjugations. |
| 13 | Embeddings & Representation Spaces | 6–10 | ●●●○○ | 45/55 | Train embeddings where tenses and verbs visibly cluster in the geometry. |
| 14 | Pre-Transformer Sequence Models | 8–12 | ●●●○○ | 50/50 | Show empirically why recurrence loses to attention on longer context. |
| 15 | Attention from Scratch | 15–25 | ●●●●● | 55/45 | Implement multi-head causal attention in NumPy and match a hand-derived reference. |
| 16 | Positional Encodings | 6–10 | ●●●●○ | 55/45 | Add RoPE to attention and explain why order suddenly matters. |
| 17 | Tiny Transformer Block & Mini-GPT | 10–16 | ●●●●○ | 40/60 | Build a Mini-GPT whose parameter count matches a closed-form formula to the digit. |
| 18 | Training Loop, Mixed Precision Preview, Checkpointing | 10–16 | ●●●●○ | 35/65 | Train the Mini-GPT past the n-gram baseline on verb-grammar perplexity. |
| 19 | Training Dynamics & Debugging | 8–14 | ●●●●○ | 40/60 | Diagnose three engineered training failures from dashboards alone. |
| 20 | Evaluation Harness | 7–12 | ●●●○○ | 45/55 | Build a domain eval harness with calibration and confidence intervals. |
| 21 | Inference Internals & Sampling | 7–12 | ●●●○○ | 40/60 | Implement the full sampling menu from raw logits and feel each knob. |
| 22 | KV Cache: From Math to Memory | 8–14 | ●●●●○ | 45/55 | Cut the per-token cost of autoregressive generation from quadratic to linear in context length with a KV cache. |
| 23 | GPU Architecture Fundamentals | 10–16 | ●●●●○ | 50/50 | Measure a rented GPU empirically and place your kernels on its roofline. |
| 24 | CUDA & Triton Hands-On | 14–22 | ●●●●● | 35/65 | Write a kernel in CUDA and Triton reaching 30%+ of peak — your first PyTorch import. |
| 25 | PyTorch Internals | 10–16 | ●●●●○ | 45/55 | Register a custom op with backward and read what torch.compile emits. |
| 26 | Quantization Deep Dive | 9–14 | ●●●●○ | 45/55 | Quantize the model and measure the quality-vs-bandwidth curve end-to-end. |
| 27 | Modern Attention Optimizations | 9–14 | ●●●●● | 55/45 | Derive FlashAttention as a roofline win that moves fewer bytes, not fewer FLOPs. |
| 28 | Fine-Tuning, LoRA, QLoRA | 9–14 | ●●●●○ | 45/55 | Fine-tune with LoRA without storing full-weight gradients, then quantize the base. |
| 29 | Retrieval-Augmented Generation (RAG) | 10–16 | ●●●●○ | 40/60 | Ground answers in retrieved text with hybrid search and citations. |
| 30 | Structured Generation & Constrained Decoding | 7–12 | ●●●●○ | 45/55 | Constrain output to valid JSON with logit masking — no post-hoc parsing. |
| 31 | Tool Use & the Model Context Protocol (MCP) | 8–12 | ●●●○○ | 40/60 | Expose Python functions as MCP tools with schema validation and error handling. |
| 32 | Agents: Planning, Memory, Sandboxing (Grammar Tutor) | 12–20 | ●●●●● | 40/60 | Build the grammar-tutor agent: plans, remembers, and runs tools safely. |
| 33 | Inference Serving: From FastAPI to Continuous Batching | 8–14 | ●●●●○ | 40/60 | Serve the tutor with a continuous-batching scheduler that beats static batching on p95. |
| 34 | Observability, Cost & Capacity | 7–12 | ●●●○○ | 40/60 | Instrument the stack with metrics, traces, and dollar-per-request accounting. |
| 35 | Distributed Training & Inference | 10–16 | ●●●●● | 55/45 | Design a multi-GPU sharding strategy and reason about its collectives. |
| 36 | Frontier Architectures | 8–14 | ●●●●○ | 60/40 | Map each frontier architecture to the specific bottleneck it solves. |
| 37 | Security & Safety of AI Systems | 9–14 | ●●●●○ | 45/55 | Turn every successful attack on the agent into a regression test. |
| 38 | Cost, Capacity, Operations, MLOps | 7–12 | ●●●○○ | 45/55 | Operate a registry with lineage and cost-per-quality deployment gates. |
| 39 | Capstone: The Miniature Production System | 12–20 | ●●●●○ | 25/75 | Ship one `just demo` command that cold-starts, corrects verbs, traces, and tears down. |
| 40 | Hardening, Postmortem, "What's Next" | 6–10 | ●●●○○ | 55/45 | Write the postmortem and map the remaining frontier with a reading list. |
| 41 | Learner Portal: Delivering the Curriculum | 12–20 | ●●●●○ | 30/70 | Deliver the curriculum to many students with passwordless auth and spaced review. |
| X1 | Pretraining at Scale | 12–20 | ●●●●○ | 50/50 | Run a one-day cloud pretraining job and feel the MFU and cost dynamics. |
| X2 | Multi-Modal Models | 14–22 | ●●●●○ | 50/50 | Load and reason about ViT/CLIP/Whisper end-to-end. |
| X3 | RLHF / DPO / RLAIF | 14–24 | ●●●●● | 60/40 | Derive DPO and reward modeling from first principles and align the tutor. |
| X4 | Hardware Deep-Dive | 10–16 | ●●●●○ | 65/35 | Speak to H100/NVLink/AllReduce and datacenter economics at interview depth. |
| X5 | Interview Prep | 8–14 | ●●●○○ | 40/60 | Convert the whole curriculum into interview signal you can deliver on demand. |
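The totals quoted at the top can be re-derived from this fallback table. The following is a minimal sketch under stated assumptions: rows keep the six-column `| Phase | Chapter | Hours | ... |` layout, and hour ranges use an en dash. `parse_hours` and `total_hours` are hypothetical helper names. It is shown on the five Extension Track rows; applying it to the 42 core rows reproduces the 362–599 figure.

```python
import re

# Extension Track rows from the table above. The Milestone cell is left
# empty here only to keep the example short.
EXTENSION_ROWS = [
    "| X1 | Pretraining at Scale | 12–20 | ●●●●○ | 50/50 | |",
    "| X2 | Multi-Modal Models | 14–22 | ●●●●○ | 50/50 | |",
    "| X3 | RLHF / DPO / RLAIF | 14–24 | ●●●●● | 60/40 | |",
    "| X4 | Hardware Deep-Dive | 10–16 | ●●●●○ | 65/35 | |",
    "| X5 | Interview Prep | 8–14 | ●●●○○ | 40/60 | |",
]


def parse_hours(row: str) -> tuple[str, int, int]:
    """Return (phase, low_hours, high_hours) for one pipe-delimited table row."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    phase, hours = cells[0], cells[2]
    # Hour ranges use an en dash (e.g. "12–20"); a plain hyphen is accepted too.
    low, high = (int(part) for part in re.split(r"[–-]", hours))
    return phase, low, high


def total_hours(rows: list[str]) -> tuple[int, int]:
    """Sum the low and high ends of every row's hour range."""
    parsed = [parse_hours(row) for row in rows]
    return sum(lo for _, lo, _ in parsed), sum(hi for _, _, hi in parsed)


if __name__ == "__main__":
    low, high = total_hours(EXTENSION_ROWS)
    print(f"Extension Track: {low}–{high} hours")
```

For the rows above this prints `Extension Track: 58–96 hours`, which is the optional load on top of the core track.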