

Phase 02 — Numerical Representation

Requires: 01 — Hardware & Computing Substrate
Teaches: ieee-754 · floating-point · softmax-stability · log-sum-exp · precision · denormals
Jump to any chapter from the phase reference index.

Chapter map

Pre-written per A12; re-anchored to A13 (English verb grammar). Theory + lab statements are stable drafts; solutions land just-in-time at phase open.

Floating-point arithmetic is a system of approximation with odd rules. Phase 2 teaches you those rules until you can predict, without running code, where precision loss, overflow, or NaN will occur. The examples live in the model's universe: English verb conjugations (§A13).


Goal

Internalize IEEE-754 deeply enough that Borja can:

  • Decode a 32-bit float bit pattern by hand into sign, exponent, mantissa, and value, including small probabilities like 1/600 (the uniform distribution over the verb-form vocabulary of §A13).
  • Predict, without running code, which expressions overflow, underflow, lose precision, or produce NaN.
  • Re-derive the -max softmax trick and Kahan summation from scratch — illustrated on a length-5 logit vector that classifies a token as one of the five English tenses (infinitive / present / past / past-participle / future).
  • Quantify the rounding error of fp32-to-int8 round-trips, foreshadowing Phase 26's quantization deep dive.
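
As a cross-check for the hand decoding described above, here is a minimal sketch that splits a value into its binary32 fields using only the standard library. The function name `decode_fp32` and the returned dictionary layout are illustrative assumptions, not part of the phase's labs.

```python
import struct


def decode_fp32(x: float) -> dict:
    """Round x to fp32 and split the result into IEEE-754 binary32 fields.

    Hypothetical helper for checking hand decodes; not a lab API.
    """
    # Pack as big-endian fp32, then reinterpret the same 4 bytes as a uint32.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF        # 23 stored fraction bits

    if exponent == 0xFF:              # all ones: infinity or NaN
        magnitude = float("nan") if mantissa else float("inf")
    elif exponent == 0:               # all zeros: zero or subnormal, no implicit 1
        magnitude = (mantissa / 2**23) * 2.0**-126
    else:                             # normal number: implicit leading 1
        magnitude = (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)

    return {
        "bits": f"{bits:032b}",
        "sign": sign,
        "exponent": exponent,
        "mantissa": mantissa,
        "value": -magnitude if sign else magnitude,
    }


fields = decode_fp32(1 / 600)
print(fields)
```

Because 2⁻¹⁰ ≤ 1/600 < 2⁻⁹, the biased exponent field for 1/600 is 127 − 10 = 117, which gives a quick sanity check on a hand decode before comparing the mantissa bits.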

This is the foundation for Phase 7 (scalar autograd), Phase 18 (training loop), and Phase 26 (quantization). Skip it and every later phase carries a leak.
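
The Kahan summation named in the goals can be sketched as follows. This is an illustrative version under stated assumptions: it runs on Python's native fp64 floats rather than fp32, the input (one million copies of 1/600) is a stand-in for the per-form probabilities, and `math.fsum` serves as the reference sum.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation; each addition can round away low-order bits."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Compensated summation: carry the rounding error of each add forward."""
    total = 0.0
    compensation = 0.0                  # estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # re-inject the previously lost bits
        t = total + y                   # this addition may round
        compensation = (t - total) - y  # recover what that rounding discarded
        total = t
    return total


# Stand-in data: 10**6 copies of the uniform probability 1/600.
values = [1 / 600] * 1_000_000
reference = math.fsum(values)           # correctly rounded sum of the fp64 inputs

naive_err = abs(naive_sum(values) - reference)
kahan_err = abs(kahan_sum(values) - reference)
print(f"naive error: {naive_err:.3e}")
print(f"kahan error: {kahan_err:.3e}")
```

The design point is that the naive error bound grows with the number of terms, while the Kahan bound stays at roughly two units of roundoff times the sum of magnitudes, independent of N to first order.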

Read order

  1. theory/00-motivation.md — why a precision phase before any linear algebra.
  2. theory/01-ieee754-anatomy.md — bit layout of fp64/fp32/fp16/bf16, denormals, special values, rounding.
  3. theory/02-softmax-stability.md — the -max trick, log-sum-exp, stable cross-entropy. The most important theory page in this phase.
  4. theory/03-summation-and-cancellation.md — catastrophic cancellation, Kahan and Neumaier summation, error bounds.
  5. theory/04-precision-zoo.md — BF16, TF32, FP8 E4M3/E5M2, INT8/INT4. Bit layouts and trade-offs. Foreshadows Phase 26.
  6. lab/00-bit-anatomy.md — decode floats by hand and by code, including the bit pattern of 1/600.
  7. lab/01-softmax-stability.md — break naive softmax on tense logits, then fix it.
  8. lab/02-summation-experiments.md — measure Kahan vs naive when summing N = 10⁶ per-form probabilities.
  9. lab/03-quantization-preview.md — round fp32 ↔ int8 on a logit array and quantify the loss.
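
The fp32 ↔ int8 round-trip in the last lab can be previewed with a small sketch. It assumes the simplest possible scheme (one symmetric scale for the whole array, codes clamped to [-127, 127]), uses hypothetical logit values, and computes in Python's fp64 floats, so it illustrates the rounding bound rather than reproducing the lab's exact numbers.

```python
def quantize_int8(values):
    """Symmetric per-array quantization: one scale, integer codes in [-127, 127].

    Illustrative scheme only; calibration and per-channel scales are not modeled.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Python's round() rounds halves to even; the error is still at most 0.5 codes.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map integer codes back to real values."""
    return [c * scale for c in codes]


logits = [2.5, -1.0, 0.3, 4.2, -3.7]    # hypothetical five-class logits
codes, scale = quantize_int8(logits)
restored = dequantize_int8(codes, scale)
max_error = max(abs(a - b) for a, b in zip(logits, restored))

print("codes:    ", codes)
print("scale:    ", scale)
print("max error:", max_error, "(bound: scale / 2 =", scale / 2, ")")
```

Rounding to the nearest code means each element moves by at most half a quantization step, so the worst-case absolute error is scale / 2; a larger dynamic range therefore costs precision on every element.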

solutions/ is empty during pre-write — populated at phase open after Borja's prior-phase API decisions are visible.

Definition of Done

See PHASE_02_PLAN.md §6. Briefly:

  • Four experiment directories with manifests + output artefacts committed.
  • Borja can hand-derive the -max trick and explain why naive softmax overflows in fp32 when one logit is ≳ 89: e⁸⁹ ≈ 4.5 × 10³⁸ exceeds the largest finite fp32 value, about 3.4 × 10³⁸.
  • Borja can identify, without running code, which arithmetic expressions over the verb-vocabulary distribution will lose precision or produce NaN.
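
The overflow threshold and the -max trick mentioned above can be checked with a short sketch. It derives the threshold from the largest finite fp32 bit pattern and then applies the shift on hypothetical five-class logits; the arithmetic itself runs in Python's fp64 floats, where the analogous overflow happens near 709.8 instead of 88.7. The trick works because subtracting m = max(x) multiplies numerator and denominator by the same factor e^(-m), which cancels.

```python
import math
import struct

# Largest finite fp32 value, built from its bit pattern 0x7F7FFFFF.
(FP32_MAX,) = struct.unpack(">f", bytes.fromhex("7f7fffff"))

# exp(x) exceeds FP32_MAX once x passes this value (about 88.72).
OVERFLOW_LOGIT = math.log(FP32_MAX)


def stable_softmax(logits):
    """Softmax with the -max shift: every exponent is <= 0, so exp cannot overflow."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # largest term is exactly exp(0) = 1
    total = sum(exps)                         # total >= 1, so the division is safe
    return [e / total for e in exps]


logits = [1000.0, 2.0, 1.0, 0.5, -3.0]        # hypothetical five-class logits
probs = stable_softmax(logits)

print("fp32 overflow threshold for exp:", OVERFLOW_LOGIT)
print("stable softmax:", probs)
```

A naive version that calls `math.exp(1000.0)` directly raises `OverflowError` in Python; the shifted version only ever underflows small terms to zero, which is harmless for the normalized result.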

What this phase intentionally does NOT cover

  • Gradients through these ops. That's Phase 4 (calculus) and Phase 7 (autograd).
  • A src/minigrad/ module. Phase 2 stays in experiments/; the stable primitives are scaffolded in Phase 7 when an autograd Value first consumes them.
  • Real quantization (calibration, error-bounded rounding, per-channel scales). Foreshadowed in lab/03 but the deep dive is Phase 26.
  • GPU floating-point (tensor cores, FP8 native ops). Phase 23+.
  • Interval arithmetic, posit numbers, unum, fixed-point DSP formats. Out of scope of the curriculum entirely.
  • Multi-precision libraries (mpmath, gmpy2). We use them as fp64 oracles for tests only.

Phase 2's scope is understanding the floating-point substrate well enough to predict numerical failure modes. Nothing more.

Further reading

Optional — enrichment, not required to pass the phase.