
Study plan & timeline

The full curriculum takes 362–599 hours of study across 42 core phases (00–41); the optional Extension Track (X1–X5) is counted separately. Pick a pace, and the timeline below recomputes the number of weeks and the finish date that pace implies. The per-phase detail lists each phase's cognitive effort, theory/lab split, concepts, and the milestone you unlock.

Pick your pace

Timeline
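The recomputation described above is simple arithmetic, and a rough sketch may help if the interactive timeline is unavailable. The version below assumes a steady weekly pace and rounds up to whole weeks; the function names, the start date, and the 10 hours/week pace are illustrative assumptions, not taken from the page's actual widget. Only the 362 and 599 hour totals come from this plan.

```python
from datetime import date, timedelta
import math

# Totals quoted in this study plan for the 42 core phases.
CORE_HOURS_LOW = 362
CORE_HOURS_HIGH = 599


def weeks_needed(total_hours: float, hours_per_week: float) -> int:
    """Whole weeks required to cover total_hours at a steady weekly pace."""
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    # Round up: a partially used week still occupies a calendar week.
    return math.ceil(total_hours / hours_per_week)


def finish_date(start: date, total_hours: float, hours_per_week: float) -> date:
    """Calendar date reached after the required number of whole weeks."""
    return start + timedelta(weeks=weeks_needed(total_hours, hours_per_week))


if __name__ == "__main__":
    start = date(2025, 1, 6)  # hypothetical start date
    pace = 10                 # hypothetical pace, in hours per week
    for label, hours in (("low", CORE_HOURS_LOW), ("high", CORE_HOURS_HIGH)):
        weeks = weeks_needed(hours, pace)
        print(f"{label}: {weeks} weeks, finishing {finish_date(start, hours, pace)}")
```

Running both ends of the hour range gives a finish window rather than a single date, which is probably the more honest way to read any estimate like this.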

Per-phase detail

Table (no-JavaScript fallback)

Phase	Chapter	Hours	Effort (1–5)	Theory/Lab (%)	Milestone
00	Project Foundations & Learning Methodology	5–8	●●○○○	45/55	Bootstrap a reproducible, gated Python project from a clean clone.
01	Hardware & Computing Substrate	6–10	●●●○○	60/40	Predict whether a kernel is memory- or compute-bound on your own machine.
02	Numerical Representation	6–10	●●●●○	60/40	Make a softmax numerically stable and explain exactly why it was not.
03	Linear Algebra from First Principles	8–14	●●●●○	65/35	Express any linear operation in the curriculum as an einsum and reason about its shapes.
04	Calculus & Optimization for AI	8–14	●●●●○	60/40	Derive backprop and the Adam update from scratch and visualize them on Rosenbrock.
05	Probability & Information Theory	7–12	●●●●○	65/35	Derive the cross-entropy loss as maximum likelihood and connect it to perplexity.
06	Python for AI Engineering	6–10	●●●○○	40/60	Predict the memory cost of a tensor op from its strides before running it.
07	Scalar Autograd from Scratch (`minigrad`)	12–20	●●●●●	40/60	Train a 2-layer MLP with an autograd engine you wrote from nothing.
08	Tensor Autograd from Scratch	12–20	●●●●●	35/65	Pass gradcheck on 20+ tensor ops including matmul and softmax.
09	Tiny MLP & Module Abstraction (`minitorch`)	8–14	●●●○○	30/70	Wrap your autograd in a PyTorch-style API you can compose modules with.
10	Initialization, Normalization, Residuals	7–12	●●●●○	50/50	Make a deep stack trainable with correct init, norm, and residuals.
11	Tokenization Theory + BPE Implementation	8–14	●●●○○	40/60	Build a BPE tokenizer from scratch and round-trip the verb corpus exactly.
12	The Corpus: Designing the Microscopic Dataset	7–12	●●○○○	45/55	Enumerate the full bilingual verb corpus with controlled mis-conjugations.
13	Embeddings & Representation Spaces	6–10	●●●○○	45/55	Train embeddings where tenses and verbs visibly cluster in the geometry.
14	Pre-Transformer Sequence Models	8–12	●●●○○	50/50	Show empirically why recurrence loses to attention on longer context.
15	Attention from Scratch	15–25	●●●●●	55/45	Implement multi-head causal attention in NumPy and match a hand-derived reference.
16	Positional Encodings	6–10	●●●●○	55/45	Add RoPE to attention and explain why order suddenly matters.
17	Tiny Transformer Block & Mini-GPT	10–16	●●●●○	40/60	Build a Mini-GPT whose parameter count matches a closed-form formula to the digit.
18	Training Loop, Mixed Precision Preview, Checkpointing	10–16	●●●●○	35/65	Train the Mini-GPT past the n-gram baseline on verb-grammar perplexity.
19	Training Dynamics & Debugging	8–14	●●●●○	40/60	Diagnose three engineered training failures from dashboards alone.
20	Evaluation Harness	7–12	●●●○○	45/55	Build a domain eval harness with calibration and confidence intervals.
21	Inference Internals & Sampling	7–12	●●●○○	40/60	Implement the full sampling menu from raw logits and feel each knob.
22	KV Cache: From Math to Memory	8–14	●●●●○	45/55	Cut autoregressive generation from quadratic to linear with a KV cache.
23	GPU Architecture Fundamentals	10–16	●●●●○	50/50	Measure a rented GPU empirically and place your kernels on its roofline.
24	CUDA & Triton Hands-On	14–22	●●●●●	35/65	Write a kernel in CUDA and Triton reaching 30%+ of peak — your first PyTorch import.
25	PyTorch Internals	10–16	●●●●○	45/55	Register a custom op with backward and read what torch.compile emits.
26	Quantization Deep Dive	9–14	●●●●○	45/55	Quantize the model and measure the quality-vs-bandwidth curve end-to-end.
27	Modern Attention Optimizations	9–14	●●●●●	55/45	Derive FlashAttention as a roofline win that moves fewer bytes, not fewer FLOPs.
28	Fine-Tuning, LoRA, QLoRA	9–14	●●●●○	45/55	Fine-tune with LoRA without storing full-weight gradients, then quantize the base.
29	Retrieval-Augmented Generation (RAG)	10–16	●●●●○	40/60	Ground answers in retrieved text with hybrid search and citations.
30	Structured Generation & Constrained Decoding	7–12	●●●●○	45/55	Constrain output to valid JSON with logit masking — no post-hoc parsing.
31	Tool Use & the Model Context Protocol (MCP)	8–12	●●●○○	40/60	Expose Python functions as MCP tools with schema validation and error handling.
32	Agents: Planning, Memory, Sandboxing (Grammar Tutor)	12–20	●●●●●	40/60	Build the grammar-tutor agent: plans, remembers, and runs tools safely.
33	Inference Serving: From FastAPI to Continuous Batching	8–14	●●●●○	40/60	Serve the tutor with a continuous-batching scheduler that beats static batching on p95.
34	Observability, Cost & Capacity	7–12	●●●○○	40/60	Instrument the stack with metrics, traces, and dollar-per-request accounting.
35	Distributed Training & Inference	10–16	●●●●●	55/45	Design a multi-GPU sharding strategy and reason about its collectives.
36	Frontier Architectures	8–14	●●●●○	60/40	Map each frontier architecture to the specific bottleneck it solves.
37	Security & Safety of AI Systems	9–14	●●●●○	45/55	Turn every successful attack on the agent into a regression test.
38	Cost, Capacity, Operations, MLOps	7–12	●●●○○	45/55	Operate a registry with lineage and cost-per-quality deployment gates.
39	Capstone: The Miniature Production System	12–20	●●●●○	25/75	Ship one `just demo` that cold-starts, corrects verbs, traces, and tears down.
40	Hardening, Postmortem, "What's Next"	6–10	●●●○○	55/45	Write the postmortem and map the remaining frontier with a reading list.
41	Learner Portal: Delivering the Curriculum	12–20	●●●●○	30/70	Deliver the curriculum to many students with passwordless auth and spaced review.
X1	Pretraining at Scale	12–20	●●●●○	50/50	Run a one-day cloud pretraining job and feel the MFU and cost dynamics.
X2	Multi-Modal Models	14–22	●●●●○	50/50	Load and reason about ViT/CLIP/Whisper end-to-end.
X3	RLHF / DPO / RLAIF	14–24	●●●●●	60/40	Derive DPO and reward modeling from first principles and align the tutor.
X4	Hardware Deep-Dive	10–16	●●●●○	65/35	Speak to H100/NVLink/AllReduce and datacenter economics at interview depth.
X5	Interview Prep	8–14	●●●○○	40/60	Convert the whole curriculum into interview signal you can deliver on demand.