Study plan & timeline

The full curriculum is an estimated 362–599 hours of study across 42 core phases (00–41); the optional Extension Track (X1–X5) adds another 58–96 hours. Pick a pace, and the timeline below recomputes how many weeks that pace implies and the resulting finish date. The per-phase detail then shows, for each phase, the cognitive effort, theory/lab split, concepts, and the milestone you unlock.
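The pace-to-timeline arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual calculator: the function name `estimate_timeline`, the constant weekly pace, rounding up to whole weeks, and the example start date are all assumptions. Only the 362 and 599 hour totals come from the plan itself.

```python
import math
from datetime import date, timedelta

# Core-track totals quoted in the study plan (phases 00-41).
CORE_HOURS_LOW = 362
CORE_HOURS_HIGH = 599


def estimate_timeline(hours_per_week, start,
                      total_low=CORE_HOURS_LOW, total_high=CORE_HOURS_HIGH):
    """Return {"low": (weeks, finish), "high": (weeks, finish)}.

    Assumptions (hypothetical, for illustration): the learner studies a
    constant number of hours every week, and partial weeks round up.
    """
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    result = {}
    for label, total in (("low", total_low), ("high", total_high)):
        weeks = math.ceil(total / hours_per_week)
        result[label] = (weeks, start + timedelta(weeks=weeks))
    return result


# Example: 10 hours per week, starting Monday 6 January 2025.
plan = estimate_timeline(hours_per_week=10, start=date(2025, 1, 6))
for label, (weeks, finish) in plan.items():
    print(f"{label}: {weeks} weeks, finishing {finish.isoformat()}")
# low: 37 weeks, finishing 2025-09-22
# high: 60 weeks, finishing 2026-03-02
```

To include the Extension Track under the same assumptions, pass `total_low=362 + 58` and `total_high=599 + 96`.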

Pick your pace

Timeline

Per-phase detail

Table (no-JavaScript fallback)

| Phase | Chapter | Hours | Effort (out of 5) | Theory/Lab (%) | Milestone |
| --- | --- | --- | --- | --- | --- |
| 00 | Project Foundations & Learning Methodology | 5–8 | ●●○○○ | 45/55 | Bootstrap a reproducible, gated Python project from a clean clone. |
| 01 | Hardware & Computing Substrate | 6–10 | ●●●○○ | 60/40 | Predict whether a kernel is memory- or compute-bound on your own machine. |
| 02 | Numerical Representation | 6–10 | ●●●●○ | 60/40 | Make a softmax numerically stable and explain exactly why it was not. |
| 03 | Linear Algebra from First Principles | 8–14 | ●●●●○ | 65/35 | Express any linear operation in the curriculum as an einsum and reason about its shapes. |
| 04 | Calculus & Optimization for AI | 8–14 | ●●●●○ | 60/40 | Derive backprop and the Adam update from scratch and visualize them on Rosenbrock. |
| 05 | Probability & Information Theory | 7–12 | ●●●●○ | 65/35 | Derive the cross-entropy loss as maximum likelihood and connect it to perplexity. |
| 06 | Python for AI Engineering | 6–10 | ●●●○○ | 40/60 | Predict the memory cost of a tensor op from its strides before running it. |
| 07 | Scalar Autograd from Scratch (minigrad) | 12–20 | ●●●●● | 40/60 | Train a 2-layer MLP with an autograd engine you wrote from nothing. |
| 08 | Tensor Autograd from Scratch | 12–20 | ●●●●● | 35/65 | Pass gradcheck on 20+ tensor ops including matmul and softmax. |
| 09 | Tiny MLP & Module Abstraction (minitorch) | 8–14 | ●●●○○ | 30/70 | Wrap your autograd in a PyTorch-style API you can compose modules with. |
| 10 | Initialization, Normalization, Residuals | 7–12 | ●●●●○ | 50/50 | Make a deep stack trainable with correct init, norm, and residuals. |
| 11 | Tokenization Theory + BPE Implementation | 8–14 | ●●●○○ | 40/60 | Build a BPE tokenizer from scratch and round-trip the verb corpus exactly. |
| 12 | The Corpus: Designing the Microscopic Dataset | 7–12 | ●●○○○ | 45/55 | Enumerate the full bilingual verb corpus with controlled mis-conjugations. |
| 13 | Embeddings & Representation Spaces | 6–10 | ●●●○○ | 45/55 | Train embeddings where tenses and verbs visibly cluster in the geometry. |
| 14 | Pre-Transformer Sequence Models | 8–12 | ●●●○○ | 50/50 | Show empirically why recurrence loses to attention on longer context. |
| 15 | Attention from Scratch | 15–25 | ●●●●● | 55/45 | Implement multi-head causal attention in NumPy and match a hand-derived reference. |
| 16 | Positional Encodings | 6–10 | ●●●●○ | 55/45 | Add RoPE to attention and explain why order suddenly matters. |
| 17 | Tiny Transformer Block & Mini-GPT | 10–16 | ●●●●○ | 40/60 | Build a Mini-GPT whose parameter count matches a closed-form formula to the digit. |
| 18 | Training Loop, Mixed Precision Preview, Checkpointing | 10–16 | ●●●●○ | 35/65 | Train the Mini-GPT past the n-gram baseline on verb-grammar perplexity. |
| 19 | Training Dynamics & Debugging | 8–14 | ●●●●○ | 40/60 | Diagnose three engineered training failures from dashboards alone. |
| 20 | Evaluation Harness | 7–12 | ●●●○○ | 45/55 | Build a domain eval harness with calibration and confidence intervals. |
| 21 | Inference Internals & Sampling | 7–12 | ●●●○○ | 40/60 | Implement the full sampling menu from raw logits and feel each knob. |
| 22 | KV Cache: From Math to Memory | 8–14 | ●●●●○ | 45/55 | Cut autoregressive generation from quadratic to linear with a KV cache. |
| 23 | GPU Architecture Fundamentals | 10–16 | ●●●●○ | 50/50 | Measure a rented GPU empirically and place your kernels on its roofline. |
| 24 | CUDA & Triton Hands-On | 14–22 | ●●●●● | 35/65 | Write a kernel in CUDA and Triton reaching 30%+ of peak (your first PyTorch import). |
| 25 | PyTorch Internals | 10–16 | ●●●●○ | 45/55 | Register a custom op with backward and read what torch.compile emits. |
| 26 | Quantization Deep Dive | 9–14 | ●●●●○ | 45/55 | Quantize the model and measure the quality-vs-bandwidth curve end-to-end. |
| 27 | Modern Attention Optimizations | 9–14 | ●●●●● | 55/45 | Derive FlashAttention as a roofline win that moves fewer bytes, not fewer FLOPs. |
| 28 | Fine-Tuning, LoRA, QLoRA | 9–14 | ●●●●○ | 45/55 | Fine-tune with LoRA without storing full-weight gradients, then quantize the base. |
| 29 | Retrieval-Augmented Generation (RAG) | 10–16 | ●●●●○ | 40/60 | Ground answers in retrieved text with hybrid search and citations. |
| 30 | Structured Generation & Constrained Decoding | 7–12 | ●●●●○ | 45/55 | Constrain output to valid JSON with logit masking, with no post-hoc parsing. |
| 31 | Tool Use & the Model Context Protocol (MCP) | 8–12 | ●●●○○ | 40/60 | Expose Python functions as MCP tools with schema validation and error handling. |
| 32 | Agents: Planning, Memory, Sandboxing (Grammar Tutor) | 12–20 | ●●●●● | 40/60 | Build the grammar-tutor agent: plans, remembers, and runs tools safely. |
| 33 | Inference Serving: From FastAPI to Continuous Batching | 8–14 | ●●●●○ | 40/60 | Serve the tutor with a continuous-batching scheduler that beats static batching on p95. |
| 34 | Observability, Cost & Capacity | 7–12 | ●●●○○ | 40/60 | Instrument the stack with metrics, traces, and dollar-per-request accounting. |
| 35 | Distributed Training & Inference | 10–16 | ●●●●● | 55/45 | Design a multi-GPU sharding strategy and reason about its collectives. |
| 36 | Frontier Architectures | 8–14 | ●●●●○ | 60/40 | Map each frontier architecture to the specific bottleneck it solves. |
| 37 | Security & Safety of AI Systems | 9–14 | ●●●●○ | 45/55 | Turn every successful attack on the agent into a regression test. |
| 38 | Cost, Capacity, Operations, MLOps | 7–12 | ●●●○○ | 45/55 | Operate a registry with lineage and cost-per-quality deployment gates. |
| 39 | Capstone: The Miniature Production System | 12–20 | ●●●●○ | 25/75 | Ship one just demo that cold-starts, corrects verbs, traces, and tears down. |
| 40 | Hardening, Postmortem, "What's Next" | 6–10 | ●●●○○ | 55/45 | Write the postmortem and map the remaining frontier with a reading list. |
| 41 | Learner Portal: Delivering the Curriculum | 12–20 | ●●●●○ | 30/70 | Deliver the curriculum to many students with passwordless auth and spaced review. |
| X1 | Pretraining at Scale | 12–20 | ●●●●○ | 50/50 | Run a one-day cloud pretraining job and feel the MFU and cost dynamics. |
| X2 | Multi-Modal Models | 14–22 | ●●●●○ | 50/50 | Load and reason about ViT/CLIP/Whisper end-to-end. |
| X3 | RLHF / DPO / RLAIF | 14–24 | ●●●●● | 60/40 | Derive DPO and reward modeling from first principles and align the tutor. |
| X4 | Hardware Deep-Dive | 10–16 | ●●●●○ | 65/35 | Speak to H100/NVLink/AllReduce and datacenter economics at interview depth. |
| X5 | Interview Prep | 8–14 | ●●●○○ | 40/60 | Convert the whole curriculum into interview signal you can deliver on demand. |