Phase 07 — Scalar Autograd from Scratch (minigrad.scalar)

Requires: 04 — Calculus & Optimization for AI · 06 — Python for AI Engineering

Teaches: autograd · computation-graph · reverse-mode · topological-sort · dag

Jump to any chapter from the phase reference index.

Chapter map

Pre-written per LYNX_CORTEX_ADDENDUM.md §A12. Theory and lab statements pre-written; solutions populated just-in-time at phase open.

The idea of backprop, in its smallest form: every operation between two numbers creates a node in a graph, every node stores a small function that says "how do I pass the gradient to my parents", and at the end we traverse the graph in reverse topological order. Once you build it by hand for a scalar, PyTorch stops being magic.


Goal

Build, by hand, the smallest possible automatic differentiation engine: a Value class wrapping a single Python float that supports + - * / ** exp log relu tanh, accumulates gradients through reverse topological traversal, and trains a 2-layer MLP on a microscopic verb-tense identity task using only this engine.
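To make the shape of that engine concrete, here is a minimal sketch covering only + and *. It is an illustration under assumptions, not the phase solution: the attribute names `_parents` and `_backward` and the recursive `visit` helper are hypothetical choices, and your own scalar.py may be organized differently.

```python
from __future__ import annotations


class Value:
    """Hypothetical minimal scalar autograd node (only + and * shown)."""

    def __init__(self, data: float, parents: tuple[Value, ...] = ()) -> None:
        self.data = data
        self.grad = 0.0
        self._parents = parents
        # Closure that pushes this node's gradient to its parents.
        self._backward = lambda: None

    def __add__(self, other: Value) -> Value:
        out = Value(self.data + other.data, (self, other))

        def _backward() -> None:
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other: Value) -> Value:
        out = Value(self.data * other.data, (self, other))

        def _backward() -> None:
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self) -> None:
        order: list[Value] = []
        visited: set[int] = set()

        def visit(v: Value) -> None:
            # Depth-first post-order: parents are appended before children.
            if id(v) not in visited:
                visited.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0  # seed: d(self)/d(self) = 1
        for v in reversed(order):  # reverse topological order
            v._backward()


a, b = Value(2.0), Value(3.0)
y = a * b + a  # dy/da = b + 1 = 4, dy/db = a = 2
y.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

Note that `a` is used twice in `y`, so its gradient only comes out right because each `_backward` accumulates with `+=`.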

The pedagogical claim: if you understand minigrad.scalar, you understand backprop forever. Every framework — PyTorch, JAX, TensorFlow — does fundamentally the same thing at tensor grain.

Topic anchor (§A13). Every worked example uses the English verb grammar grid: take a loss such as L = sum_i (logit_i - target_i)^2 over the 5 tenses of one verb (e.g., work), hand-derive dL/dlogit_i, and check that it matches what the autograd computes. The autograd code is grammar-agnostic (Value does not know about verbs), but the demonstrations are anchored in the §A13 corpus so the topic is rehearsed continuously.
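For the squared-error loss above, the hand derivation gives dL/dlogit_i = 2 * (logit_i - target_i), because only the i-th term of the sum depends on logit_i. The sketch below checks that formula against central finite differences using plain floats. The logit and target numbers are made up for illustration and are not taken from the §A13 corpus.

```python
# Hypothetical logits for the 5 tenses of one verb, with a one-hot target.
logits = [0.5, -1.0, 2.0, 0.0, 1.5]
targets = [1.0, 0.0, 0.0, 0.0, 0.0]


def loss(zs: list[float]) -> float:
    """L = sum_i (logit_i - target_i)^2."""
    return sum((z - t) ** 2 for z, t in zip(zs, targets))


# Hand-derived gradient: dL/dlogit_i = 2 * (logit_i - target_i).
analytic = [2.0 * (z - t) for z, t in zip(logits, targets)]


def numeric_grad(zs: list[float], i: int, h: float = 1e-6) -> float:
    """Central difference estimate of dL/dlogit_i."""
    up = zs[:]
    up[i] += h
    down = zs[:]
    down[i] -= h
    return (loss(up) - loss(down)) / (2.0 * h)


numeric = [numeric_grad(logits, i) for i in range(len(logits))]

for i, (g_a, g_n) in enumerate(zip(analytic, numeric)):
    print(f"tense {i}: analytic={g_a:+.6f} numeric={g_n:+.6f}")
```

The same comparison, with the analytic column replaced by the `.grad` values from your engine, is one way to sanity-check an op before comparing against PyTorch.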

By phase close, Borja owns:

  • src/minigrad/scalar.py, ~150 LOC, his own implementation.
  • A test suite cross-checking every op against PyTorch FP64.
  • A graphviz rendering of a small grammar-loss forward+backward DAG with values and gradients annotated.
  • A working tense-identity MLP trained end-to-end without numpy in the autograd core.

Read order

  1. theory/00-motivation.md — why scalar autograd is the right place to start.
  2. theory/01-computation-graphs.md — what a DAG is, how the forward pass builds it.
  3. theory/02-op-derivatives.md — hand-derive the local Jacobian for every op we'll implement.
  4. theory/03-worked-backprop.md — a complete worked example: forward + backward by hand for a small expression, then verified by finite differences.
  5. theory/04-reverse-mode-vs-forward-mode.md — why we pick reverse-mode for ML (n_outputs << n_inputs).
  6. lab/00-value-skeleton.md — write the Value class skeleton, no ops yet.
  7. lab/01-implement-ops.md — fill in + - * / ** exp log relu tanh with their backwards.
  8. lab/02-train-xor.md (filename retained for git history; task is now 5-way tense identity for one verb — see lab header) — build a 2-layer MLP from Value neurons and train it on the 5-input tense-identity task.
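As a preview of the reverse-mode vs forward-mode comparison in step 5, here is a minimal forward-mode sketch using dual numbers. The `Dual` class and the function `f` are hypothetical illustrations, not part of minigrad. The point it shows: forward mode yields the derivative with respect to one seeded input per pass, so a function with n inputs needs n passes, whereas reverse mode gets all n partials of a single output in one backward pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Hypothetical dual number: a value plus its derivative w.r.t. one seeded input."""

    val: float
    dot: float

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other: "Dual") -> "Dual":
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)


def f(a: Dual, b: Dual) -> Dual:
    return a * b + a


# One forward pass per input: seed dot=1.0 on the input being differentiated.
df_da = f(Dual(2.0, 1.0), Dual(3.0, 0.0)).dot  # b + 1 = 4.0
df_db = f(Dual(2.0, 0.0), Dual(3.0, 1.0)).dot  # a = 2.0
print(df_da, df_db)  # 4.0 2.0
```

With thousands of weights and a single scalar loss, that one-pass-per-input cost is why ML engines use reverse mode.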

solutions/ populated at phase open.

Definition of Done

See PHASE_07_PLAN.md §6. Briefly:

  • src/minigrad/scalar.py passes mypy --strict, ruff, all tests; cross-checked vs PyTorch FP64 to 1e-9.
  • tests/test_scalar_graph.py covers diamond dependencies (a node used in multiple downstream computations).
  • experiments/07-train-tense-logits/ shows loss < 0.5 within 300 epochs.
  • experiments/07-visualize-graph/graph.svg committed; nodes labeled with both forward value and backward gradient.
  • /quiz 07 ≥ 70%.
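The diamond-dependency requirement exists because a node that feeds several downstream computations receives one gradient contribution per path, and those contributions must be summed. The sketch below walks through a small diamond by hand with plain floats (no Value class, and the variable names are illustrative) to show what a backward that overwrites instead of accumulating would get wrong.

```python
# Diamond: a feeds both b and c, which rejoin at d.
a = 3.0
b = a * 2.0  # path 1
c = a * a    # path 2
d = b + c    # d = 2a + a^2, so dd/da = 2 + 2a = 8 at a = 3

# Backward by hand, starting from dd/dd = 1.
grad_d = 1.0
grad_b = grad_d * 1.0  # d(b + c)/db = 1
grad_c = grad_d * 1.0  # d(b + c)/dc = 1

contrib_via_b = grad_b * 2.0      # db/da = 2
contrib_via_c = grad_c * 2.0 * a  # dc/da = 2a

grad_a_accumulated = contrib_via_b + contrib_via_c  # correct: "+=" semantics
grad_a_overwritten = contrib_via_c                  # wrong: "=" keeps only the last path

print(grad_a_accumulated, grad_a_overwritten)  # 8.0 6.0
```

A test in tests/test_scalar_graph.py could build the same expression from Value nodes and assert that the gradient on `a` equals 8.0.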

What this phase intentionally does NOT cover

  • Tensors / NumPy. Phase 8 lifts everything to NumPy arrays. Phase 7 is deliberately float-only — adding broadcasting at the same time as autograd is too much complexity at once.
  • GPU. Phase 23+.
  • Higher-order derivatives. Doable in this framework (build a graph over a graph), but not in scope; survey-only mention in theory/04.
  • jit / graph rewriting. No optimization passes. Pure eager.
  • Optimizers as classes. Phase 9. Here we hand-roll p.data -= lr * p.grad inline in the lab/02 training loop (the tense-identity task that replaced XOR).
  • Production safety. No nan clamping, no gradient clipping, no detach. Add these in Phase 8 / Phase 18 when they earn their keep.
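The inline update mentioned in the optimizer bullet can be sketched as follows. `Param` is a hypothetical stand-in exposing only `.data` and `.grad`, and the gradient is written out by hand where the real loop would call backward on the loss. The one-parameter loss (w - 3)^2 and the learning rate are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for Value: just .data and .grad."""

    data: float
    grad: float = 0.0


params = [Param(0.0)]
lr = 0.1
target = 3.0

for epoch in range(100):
    # Reset gradients first, because backward accumulates with +=.
    for p in params:
        p.grad = 0.0

    w = params[0]
    loss = (w.data - target) ** 2
    # Stands in for loss.backward(): dL/dw = 2 * (w - target).
    w.grad += 2.0 * (w.data - target)

    # The hand-rolled SGD step, with no optimizer class.
    for p in params:
        p.data -= lr * p.grad

print(params[0].data)
```

Forgetting the reset step is a common bug in this style of loop: gradients from earlier epochs leak into the current update.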

Phase 7's scope is exactly: DAG + reverse traversal + chain rule + ~10 ops, all at float grain. Resist scope creep.

Further reading

Optional — enrichment, not required to pass the phase.