Phase 05 — Probability & Information Theory

Requires: Phase 04, Calculus & Optimization for AI

Teaches: probability · entropy · kl-divergence · cross-entropy · mle · perplexity

Jump to any chapter from the phase reference index.

Every neural network trained with cross_entropy_loss is optimizing information. This phase is where we learn which information, why that loss function, and how to handle it without wrecking numerical precision.

The mini-GPT we will build outputs a probability distribution over ≈600 verb forms (the §A13 corpus). Training it means pushing that distribution toward the ground truth, and cross-entropy loss is how we measure the remaining gap. Phase 05 derives that loss from first principles so that nothing about Phase 07's autograd or Phase 18's training loop is mysterious.
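To make the quantities named above concrete, here is a minimal NumPy sketch of entropy, cross-entropy, KL divergence, and perplexity on a categorical distribution. It is an illustration under stated assumptions, not the phase's reference code: the vocabulary size `V = 600` stands in for the "≈600 verb forms", the `target` index is hypothetical, and the function names `entropy`, `cross_entropy`, and `kl_divergence` are chosen here for readability.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) in nats; terms with p_i = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log q_i in nats. Assumes q_i > 0 wherever p_i > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(q[nz])))

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i log(p_i / q_i) in nats. Same assumption on q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))

V = 600       # assumed vocabulary size (the "≈600 verb forms")
target = 42   # hypothetical index of the correct verb form

p = np.zeros(V)
p[target] = 1.0                    # one-hot ground truth
q_uniform = np.full(V, 1.0 / V)    # an untrained model guessing uniformly

loss = cross_entropy(p, q_uniform)  # equals log(V), about 6.397 nats
perplexity = np.exp(loss)           # equals V up to rounding, about 600

# The decomposition H(p, q) = H(p) + KL(p || q) on a small soft example.
p_soft = np.array([0.7, 0.2, 0.1])
q_soft = np.array([0.5, 0.3, 0.2])
gap = cross_entropy(p_soft, q_soft) - entropy(p_soft) - kl_divergence(p_soft, q_soft)

print(f"loss={loss:.4f} nats, perplexity={perplexity:.1f}, identity gap={gap:.2e}")
```

With a one-hot target the entropy term is zero, so the cross-entropy equals the KL divergence and reduces to the negative log-probability the model assigns to the correct form. A uniform guess over 600 forms therefore starts at a perplexity of about 600, and training should push it down from there.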

What you build here

  • A working understanding of discrete probability over the verb-form vocabulary.
  • Hand derivations of entropy, KL divergence, cross-entropy, and maximum likelihood estimation (MLE).
  • A numerically stable log_softmax and cross_entropy_logits in NumPy.
  • A calibration analysis on a toy classifier over the 5 tenses.
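As a preview of the third item in the list above, the following is one possible shape for a numerically stable `log_softmax` and `cross_entropy_logits` in NumPy. The signatures, argument names, and the choice of mean reduction over the batch are assumptions made for this sketch; the versions built in the phase may differ. The stability trick is the standard one: subtract the row maximum before exponentiating, so the largest exponent is always `exp(0) = 1`.

```python
import numpy as np

def log_softmax(logits, axis=-1):
    """Log-softmax computed with the max-shift (log-sum-exp) trick."""
    logits = np.asarray(logits, dtype=float)
    # Shifting by the max leaves softmax unchanged but prevents overflow in exp.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    log_norm = np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))
    return shifted - log_norm

def cross_entropy_logits(logits, targets):
    """Mean negative log-likelihood for integer class targets.

    logits:  array of shape (batch, num_classes), unnormalized scores
    targets: array of shape (batch,), integer class indices
    """
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=int)
    log_probs = log_softmax(logits, axis=-1)
    # Pick the log-probability of the correct class in each row.
    picked = log_probs[np.arange(targets.shape[0]), targets]
    return float(-np.mean(picked))

# Extreme logits: a naive exp(1000) would overflow to inf and yield nan.
logits = np.array([[1000.0, 0.0, -1000.0],
                   [0.0, 0.0, 0.0]])
targets = np.array([0, 2])

log_probs = log_softmax(logits)
loss = cross_entropy_logits(logits, targets)
print(log_probs)
print(f"loss={loss:.4f}")
```

Working in log space end to end, rather than computing `softmax` and then taking `log`, avoids `log(0)` when a probability underflows. In the example, the first row is classified with near-certainty (loss contribution 0) and the second row is a uniform guess over three classes (loss contribution log 3), so the mean is log(3)/2.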

What this phase does NOT cover

  • Continuous distributions (deferred; the categorical distribution is all we need for the microscopic §A13 scope).
  • MCMC, variational inference, Bayesian deep learning (out of scope; ML systems uses point estimates).
  • Mutual information labs (kept theory-only here; paired with probing in Phase 20).

Files

See PHASE_05_PLAN.md at the repo root for the full plan.

Next: theory/00-motivation.md

Further reading

Optional — enrichment, not required to pass the phase.