Phase 02 — Numerical Representation¶
Requires: 01 — Hardware & Computing Substrate
Teaches: ieee-754 · floating-point · softmax-stability · log-sum-exp · precision · denormals
Jump to any chapter from the phase reference index.
Chapter map¶
Pre-written per A12; re-anchored to A13 (English verb grammar). Theory + lab statements are stable drafts; solutions land just-in-time at phase open.
Floating-point arithmetic is an approximation system with odd rules. Phase 2 teaches you those rules until you can predict, without running code, where precision loss, overflow, or NaN will occur. The examples live in the model's universe: English verb conjugations (§A13).
Goal¶
Internalize IEEE-754 deeply enough that Borja can:
- Decode by hand a 32-bit float bit pattern into sign, exponent, mantissa, and value, including small probabilities like 1/600 (the probability of each form under a uniform distribution over the verb-form vocabulary of §A13).
- Predict, without running code, which expressions overflow, underflow, lose precision, or NaN.
- Re-derive the -max softmax trick and Kahan summation from scratch, illustrated on a length-5 logit vector that classifies a token as one of the five English tenses (infinitive / present / past / past-participle / future).
- Quantify the rounding error of fp32-to-int8 round-trips, foreshadowing Phase 26's quantization deep dive.
This is the foundation for Phase 7 (scalar autograd), Phase 18 (training loop), and Phase 26 (quantization). Skip it and every later phase carries a leak.
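The first goal can be checked mechanically. The sketch below is one way to split an fp32 bit pattern into its fields using only Python's standard struct module. The name decode_fp32 is a hypothetical helper for illustration, not part of the phase's labs or solutions.

```python
import math
import struct


def decode_fp32(x: float) -> dict:
    """Round x to fp32 and split the 32-bit pattern into IEEE-754 fields."""
    # Pack as a big-endian fp32, then reinterpret the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF        # 23 stored fraction bits

    if exponent == 0xFF:
        # All-ones exponent: infinity (zero fraction) or NaN (nonzero fraction).
        value = math.inf if mantissa == 0 else math.nan
    elif exponent == 0:
        # Zero or subnormal: no implicit leading 1, fixed exponent of -126.
        value = (mantissa / 2**23) * 2.0**-126
    else:
        # Normal number: implicit leading 1 plus the stored fraction.
        value = (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)
    if sign:
        value = -value

    return {
        "bits": f"{bits:032b}",
        "sign": sign,
        "exponent": exponent,
        "mantissa": mantissa,
        "value": value,
    }
```

For 1/600 the exponent field should come out as 117, since 2⁻¹⁰ ≤ 1/600 < 2⁻⁹ and the bias is 127. Comparing the returned value with 1/600 in fp64 shows how much the fp32 rounding moved it.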
Read order¶
- theory/00-motivation.md: why a precision phase before any linear algebra.
- theory/01-ieee754-anatomy.md: bit layout of fp64/fp32/fp16/bf16, denormals, special values, rounding.
- theory/02-softmax-stability.md: the -max trick, log-sum-exp, stable cross-entropy. The most important theory page in this phase.
- theory/03-summation-and-cancellation.md: catastrophic cancellation, Kahan and Neumaier summation, error bounds.
- theory/04-precision-zoo.md: BF16, TF32, FP8 E4M3/E5M2, INT8/INT4. Bit layouts and trade-offs. Foreshadows Phase 26.
- lab/00-bit-anatomy.md: decode floats by hand and by code, including the bit pattern of 1/600.
- lab/01-softmax-stability.md: break naive softmax on tense logits, then fix it.
- lab/02-summation-experiments.md: measure Kahan vs naive when summing N = 10⁶ per-form probabilities.
- lab/03-quantization-preview.md: round fp32 ↔ int8 on a logit array and quantify the loss.
solutions/ is empty during pre-write — populated at phase open after Borja's prior-phase API decisions are visible.
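As a taste of what lab/02 measures, here is a minimal sketch of naive versus Kahan (compensated) summation. It is not the lab solution. It assumes fp32 accumulation, emulated in pure Python by rounding every intermediate result through struct. The names f32, naive_sum_f32, and kahan_sum_f32 are hypothetical.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def naive_sum_f32(values):
    """Left-to-right sum, rounding the accumulator to fp32 after every add."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total


def kahan_sum_f32(values):
    """Kahan summation with every intermediate rounded to fp32."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = f32(v - comp)                # re-inject the previously lost bits
        t = f32(total + y)               # big + small: low bits of y are dropped
        comp = f32(f32(t - total) - y)   # recover what was dropped (negated)
        total = t
    return total
```

Summing many copies of f32(1 / 600) and comparing both results against the fp64 product count * f32(1 / 600) makes the difference visible: the naive accumulator drifts once the running total is much larger than each addend, while the compensated sum stays close to the reference.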
Definition of Done¶
See PHASE_02_PLAN.md §6. Briefly:
- Four experiment directories with manifests + output artefacts committed.
- Borja can hand-derive the -max trick and explain why naive softmax overflows in fp32 when one logit is ≳ 89 (exp(x) exceeds the largest finite fp32 value, about 3.4 × 10³⁸, once x > ln(3.4 × 10³⁸) ≈ 88.7).
- Borja can identify, without running code, which arithmetic expressions over the verb-vocabulary distribution will lose precision or NaN.
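The overflow threshold and the -max trick can be sketched in a few lines. Softmax is unchanged when the same constant is subtracted from every logit, so subtracting the maximum makes every exponent at most 0 and every exponential at most 1. The code below is an illustration under stated assumptions: it runs in fp64 and only models fp32 overflow, not fp32 rounding, by mapping anything above the largest finite fp32 value to infinity. The helper names exp_f32, softmax_naive, softmax_stable, and log_sum_exp are hypothetical.

```python
import math

# Largest finite fp32 value, about 3.4e38; ln(FLT_MAX) is about 88.72.
FLT_MAX = (2 - 2**-23) * 2.0**127


def exp_f32(x: float) -> float:
    """exp(x) with fp32-style overflow (assumption: overflow only, no fp32 rounding)."""
    y = math.exp(x)  # fine in fp64 for the logits used here (x below ~709)
    return math.inf if y > FLT_MAX else y


def softmax_naive(logits):
    """Textbook softmax: exponentiate, then normalize. Breaks on large logits."""
    exps = [exp_f32(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]  # inf / inf yields nan


def softmax_stable(logits):
    """Softmax with the -max trick: shift by the largest logit first."""
    m = max(logits)
    exps = [exp_f32(z - m) for z in logits]  # every exponent <= 0, so exp <= 1
    total = sum(exps)                        # >= 1, the max term is exp(0) = 1
    return [e / total for e in exps]


def log_sum_exp(logits):
    """log(sum(exp(z))) computed as m + log(sum(exp(z - m)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))
```

With a five-entry tense logit vector such as [90.0, 1.0, 0.5, -2.0, 0.0], the naive version returns NaN for the first entry because exp(90) overflows and inf / inf is NaN, while the stable version returns a proper distribution dominated by that entry.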
What this phase intentionally does NOT cover¶
- Gradients through these ops. That's Phase 4 (calculus) and Phase 7 (autograd).
- A src/minigrad/ module. Phase 2 stays in experiments/; the stable primitives are scaffolded in Phase 7 when an autograd Value first consumes them.
- Real quantization (calibration, error-bounded rounding, per-channel scales). Foreshadowed in lab/03 but the deep dive is Phase 26.
- GPU floating-point (tensor cores, FP8 native ops). Phase 23+.
- Interval arithmetic, posit numbers, unum, fixed-point DSP formats. Out of scope of the curriculum entirely.
- Multi-precision libraries (mpmath, gmpy2). We use them as fp64 oracles for tests only.
Phase 2's scope is understanding the floating-point substrate well enough to predict numerical failure modes. Nothing more.
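To make the boundary with Phase 26 concrete, the plain round-trip that lab/03 previews could look like the sketch below. It assumes symmetric quantization with a single per-tensor scale and round-to-nearest, which is one common choice and not necessarily the lab's. There is no calibration and there are no per-channel scales. The names quantize_int8, dequantize_int8, and round_trip_error are hypothetical, and Python floats (fp64) stand in for fp32 values.

```python
def quantize_int8(xs):
    """Map floats to integers in [-127, 127] using one shared scale (assumed scheme)."""
    max_abs = max(abs(x) for x in xs)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Python's round() rounds exact ties to the nearest even integer.
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale


def dequantize_int8(q, scale):
    """Map the integers back to floats."""
    return [qi * scale for qi in q]


def round_trip_error(xs):
    """Return (largest absolute round-trip error, scale used)."""
    q, scale = quantize_int8(xs)
    back = dequantize_int8(q, scale)
    return max(abs(a - b) for a, b in zip(xs, back)), scale
```

Under this scheme no value is clamped, so the worst-case round-trip error is about half the scale. A single outlier logit enlarges the scale and therefore the error on every other entry, which is the kind of loss lab/03 asks you to quantify.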
Further reading¶
Optional — enrichment, not required to pass the phase.
- 📄 What Every Computer Scientist Should Know About Floating-Point Arithmetic · Goldberg · 1991. The classic floating-point reference, and the background for the rounding and overflow behavior behind softmax stability.