Phase 07 — Scalar Autograd from Scratch (minigrad.scalar)¶
Requires: 04 — Calculus & Optimization for AI · 06 — Python for AI Engineering
Teaches: autograd · computation-graph · reverse-mode · topological-sort · dag
Jump to any chapter from the phase reference index.
Chapter map¶
Pre-written per `LYNX_CORTEX_ADDENDUM.md` §A12. Theory and lab statements are pre-written; solutions are populated just-in-time at phase open.

The idea of backprop, in its smallest form: every operation between two numbers creates a node in a graph, each node stores a small function that says "how do I pass the gradient to my parents", and at the end we walk the graph in reverse topological order. Once you build it by hand for a scalar, PyTorch stops being magic.
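The three ingredients named above (a node per operation, a small per-node function that passes the gradient to its parents, and a reverse topological walk) can be sketched in a few lines. This is an illustrative sketch, not the phase solution: it covers only `+` and `*`, and the attribute names `_parents` and `_backward` are assumptions made for this example.

```python
from __future__ import annotations


class Value:
    """A float plus the bookkeeping needed for reverse-mode autodiff (sketch)."""

    def __init__(self, data: float, parents: tuple[Value, ...] = ()) -> None:
        self.data = data
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._backward = lambda: None    # leaves have nothing to propagate

    def __add__(self, other: Value) -> Value:
        out = Value(self.data + other.data, (self, other))

        def _backward() -> None:
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other: Value) -> Value:
        out = Value(self.data * other.data, (self, other))

        def _backward() -> None:
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self) -> None:
        # Depth-first search yields a topological order (parents before children).
        order: list[Value] = []
        visited: set[int] = set()

        def visit(node: Value) -> None:
            if id(node) in visited:
                return
            visited.add(id(node))
            for parent in node._parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0                  # seed: d(out)/d(out) = 1
        for node in reversed(order):     # children before parents
            node._backward()


a, b, c = Value(2.0), Value(3.0), Value(4.0)
out = a * b + c
out.backward()
print(a.grad, b.grad, c.grad)  # 3.0 2.0 1.0
```

Each closure only knows its own local derivative; the chain rule emerges from calling the closures in reverse topological order.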
Goal¶
Build, by hand, the smallest possible automatic differentiation engine: a Value class wrapping a single Python float that supports + - * / ** exp log relu tanh, accumulates gradients through reverse topological traversal, and trains a 2-layer MLP on a microscopic verb-tense identity task using only this engine.
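Before wiring the unary ops into `Value`, it can help to check each hand-derived local derivative against a central finite difference. The sketch below does this for `exp`, `log`, `tanh`, and `relu` on plain floats; the table name `OPS`, the helper `central_diff`, and the test point `x = 0.7` are choices made for this illustration only.

```python
import math

# (forward function, hand-derived local derivative) for each unary op.
OPS = {
    "exp": (math.exp, math.exp),
    "log": (math.log, lambda x: 1.0 / x),
    "tanh": (math.tanh, lambda x: 1.0 - math.tanh(x) ** 2),
    "relu": (lambda x: max(0.0, x), lambda x: 1.0 if x > 0 else 0.0),
}


def central_diff(f, x: float, h: float = 1e-6) -> float:
    """Numerical derivative of f at x using a symmetric difference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


x = 0.7  # arbitrary point, positive so that log is defined and relu is smooth
errors = {
    name: abs(derivative(x) - central_diff(forward, x))
    for name, (forward, derivative) in OPS.items()
}
for name, err in errors.items():
    print(f"{name:5s} |analytic - numeric| = {err:.2e}")
```

Note that `relu` is not differentiable at 0, so a finite-difference check is only meaningful away from that point.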
The pedagogical claim: if you understand minigrad.scalar, you understand backprop forever. Every framework — PyTorch, JAX, TensorFlow — does fundamentally the same thing at tensor grain.
Topic anchor (§A13). Every worked example uses the English verb grammar grid: a loss like L = sum_i (logit_i - target_i)^2 over the 5 tenses of one verb (e.g., work), hand-derive dL/dlogit_i, see it match what the autograd computes. The autograd code is grammar-agnostic — Value does not know about verbs — but the demonstrations are anchored in the §A13 corpus so the topic is rehearsed continuously.
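For the loss above, the hand derivation gives dL/dlogit_i = 2 (logit_i - target_i). A minimal sketch of the check follows, comparing that formula against central finite differences. The logit and target values are made up for illustration and do not come from the §A13 corpus.

```python
# Hypothetical logits and one-hot targets for the 5 tenses of one verb.
logits = [0.5, -1.2, 2.0, 0.1, -0.3]
targets = [0.0, 0.0, 1.0, 0.0, 0.0]


def loss(z: list[float]) -> float:
    """L = sum_i (logit_i - target_i)^2"""
    return sum((zi - ti) ** 2 for zi, ti in zip(z, targets))


# Hand-derived gradient: dL/dlogit_i = 2 * (logit_i - target_i)
analytic = [2.0 * (zi - ti) for zi, ti in zip(logits, targets)]


def numeric_grad(z: list[float], i: int, h: float = 1e-6) -> float:
    """Central finite difference of the loss with respect to z[i]."""
    up, down = z.copy(), z.copy()
    up[i] += h
    down[i] -= h
    return (loss(up) - loss(down)) / (2.0 * h)


numeric = [numeric_grad(logits, i) for i in range(len(logits))]
for i, (g_a, g_n) in enumerate(zip(analytic, numeric)):
    print(f"i={i}  analytic={g_a:+.6f}  numeric={g_n:+.6f}")
```

The same comparison, with the autograd result in place of `analytic`, is the shape of check the worked examples rely on.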
By phase close, Borja owns:
- `src/minigrad/scalar.py`, ~150 LOC, his own implementation.
- A test suite cross-checking every op against PyTorch FP64.
- A graphviz rendering of a small grammar-loss forward+backward DAG with values and gradients annotated.
- A working tense-identity MLP trained end-to-end without numpy in the autograd core.
Read order¶
- `theory/00-motivation.md`: why scalar autograd is the right place to start.
- `theory/01-computation-graphs.md`: what a DAG is, how the forward pass builds it.
- `theory/02-op-derivatives.md`: hand-derive the local Jacobian for every op we'll implement.
- `theory/03-worked-backprop.md`: a complete worked example, forward + backward by hand for a small expression, then verified by finite differences.
- `theory/04-reverse-mode-vs-forward-mode.md`: why we pick reverse-mode for ML (n_outputs << n_inputs).
- `lab/00-value-skeleton.md`: write the `Value` class skeleton, no ops yet.
- `lab/01-implement-ops.md`: fill in `+ - * / ** exp log relu tanh` with their backwards.
- `lab/02-train-xor.md` (filename retained for git history; task is now 5-way tense identity for one verb, see lab header): build a 2-layer MLP from `Value` neurons and train it on the 5-input tense-identity task.
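To make the reverse-vs-forward trade-off concrete, here is a hedged sketch of forward mode using dual numbers. It is not part of `minigrad.scalar`; the `Dual` class and the sample values are assumptions for illustration. Forward mode carries one derivative alongside each value, so a full gradient over n inputs costs n forward passes, whereas reverse mode gets all n partials from one backward pass.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Forward-mode number: a value and its derivative w.r.t. ONE seeded input."""

    val: float
    dot: float

    def __add__(self, other: Dual) -> Dual:
        return Dual(self.val + other.val, self.dot + other.dot)

    def __sub__(self, other: Dual) -> Dual:
        return Dual(self.val - other.val, self.dot - other.dot)

    def __mul__(self, other: Dual) -> Dual:
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)


def loss(z: list[Dual], t: list[float]) -> Dual:
    total = Dual(0.0, 0.0)
    for zi, ti in zip(z, t):
        diff = zi - Dual(ti, 0.0)  # targets are constants: derivative 0
        total = total + diff * diff
    return total


# Hypothetical values, for illustration only.
logits = [0.5, -1.2, 2.0, 0.1, -0.3]
targets = [0.0, 0.0, 1.0, 0.0, 0.0]

# One full forward pass per input: seed input i with dot = 1, all others with 0.
forward_grad = []
for i in range(len(logits)):
    seeded = [Dual(z, 1.0 if j == i else 0.0) for j, z in enumerate(logits)]
    forward_grad.append(loss(seeded, targets).dot)

print(forward_grad)
```

With 5 logits the 5 passes are harmless; with millions of parameters and a single scalar loss, the same loop is why ML frameworks default to reverse mode.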
solutions/ populated at phase open.
Definition of Done¶
See PHASE_07_PLAN.md §6. Briefly:
- `src/minigrad/scalar.py` passes `mypy --strict`, `ruff`, all tests; cross-checked vs PyTorch FP64 to 1e-9.
- `tests/test_scalar_graph.py` covers diamond dependencies (a node used in multiple downstream computations).
- `experiments/07-train-tense-logits/` shows loss < 0.5 within 300 epochs.
- `experiments/07-visualize-graph/graph.svg` committed; nodes labeled with both forward value and backward gradient.
- `/quiz 07` ≥ 70%.
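The diamond-dependency requirement exists because a node that feeds several downstream computations receives one gradient contribution per path, and those contributions must be summed. The sketch below spells out the bookkeeping by hand for y = (x + 1) * (x * 2), without any `Value` class, and shows what goes wrong if a backward step overwrites instead of accumulating. The function name and the `accumulate` flag are hypothetical and exist only for this demonstration.

```python
def grad_of_x(x: float, accumulate: bool) -> float:
    """Backprop through y = (x + 1) * (x * 2) with explicit gradient slots."""
    # Forward pass: x feeds two branches that rejoin at y (a diamond).
    u = x + 1.0
    v = x * 2.0
    # y = u * v is the output; its value is not needed for the gradient.

    grads = {"x": 0.0, "u": 0.0, "v": 0.0, "y": 1.0}  # seed dy/dy = 1

    def send(name: str, contribution: float) -> None:
        if accumulate:
            grads[name] += contribution   # correct: sum over all paths
        else:
            grads[name] = contribution    # bug: the last path wins

    # Reverse topological order: y first, then v and u, which both reach x.
    send("u", v * grads["y"])       # dy/du = v
    send("v", u * grads["y"])       # dy/dv = u
    send("x", 2.0 * grads["v"])     # path through v: dv/dx = 2
    send("x", 1.0 * grads["u"])     # path through u: du/dx = 1
    return grads["x"]


# Analytically y = 2x^2 + 2x, so dy/dx = 4x + 2 = 14 at x = 3.
correct = grad_of_x(3.0, accumulate=True)
buggy = grad_of_x(3.0, accumulate=False)
print(correct, buggy)  # 14.0 6.0
```

A test in the spirit of `tests/test_scalar_graph.py` would assert the accumulated result against the analytic derivative.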
What this phase intentionally does NOT cover¶
- Tensors / NumPy. Phase 8 lifts everything to NumPy arrays. Phase 7 is deliberately float-only — adding broadcasting at the same time as autograd is too much complexity at once.
- GPU. Phase 23+.
- Higher-order derivatives. Doable in this framework (build a graph over a graph), but not in scope; survey-only mention in `theory/04`.
- `jit` / graph rewriting. No optimization passes. Pure eager.
- Optimizers as classes. Phase 9. Here we hand-roll `p.data -= lr * p.grad` inline in the `lab/02-train-xor.md` training loop.
- Production safety. No `nan` clamping, no gradient clipping, no `detach`. Add these in Phase 8 / Phase 18 when they earn their keep.
Phase 7's scope is exactly: DAG + reverse traversal + chain rule + ~10 ops, all at float grain. Resist scope creep.
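As a rough picture of what "optimizer inline" means, the sketch below runs plain gradient descent with the update `p.data -= lr * p.grad` written directly in the loop. To stay self-contained it uses a hypothetical `Param` holder and hand-derived gradients for a toy line fit, rather than the `Value` engine or the tense-identity task; the data, learning rate, and epoch count are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for a Value leaf: just .data and .grad."""

    data: float
    grad: float = 0.0


w, b = Param(0.0), Param(0.0)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2x + 1
lr = 0.02

for epoch in range(500):
    # Zero the gradients: they accumulate with +=, so stale values must go.
    for p in (w, b):
        p.grad = 0.0

    loss = 0.0
    for x, y in zip(xs, ys):
        err = w.data * x + b.data - y
        loss += err * err
        # Hand-derived here; in the lab these come from loss.backward().
        w.grad += 2.0 * err * x
        b.grad += 2.0 * err

    # The whole "optimizer": one inline SGD step per parameter.
    for p in (w, b):
        p.data -= lr * p.grad

print(f"w={w.data:.4f}  b={b.data:.4f}  loss={loss:.2e}")
```

Forgetting the zeroing step is a common first bug: gradients from earlier epochs keep piling up and the updates stop corresponding to the current loss.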
Further reading¶
Optional — enrichment, not required to pass the phase.
- 💻 micrograd — Karpathy · 2020. 100 lines of scalar autograd — the same thing you build.
- 📄 Automatic Differentiation in Machine Learning: a Survey — Baydin et al. · 2018. forward vs reverse mode, rigorously.