Phase 05 — Probability & Information Theory
Requires: 04 — Calculus & Optimization for AI
Teaches: probability · entropy · kl-divergence · cross-entropy · mle · perplexity
Jump to any chapter from the phase reference index.
Chapter map
Every neural network trained with `cross_entropy_loss` is optimizing information. This phase is where we learn which information, why that loss function, and how to handle it without wrecking numerical precision.
The mini-GPT we will build outputs a probability distribution over ≈600 verb forms (the §A13 corpus). Training it means making that distribution match the ground truth: cross-entropy loss is the language we'll use. Phase 05 derives that language from first principles so that nothing about Phase 07's autograd or Phase 18's training loop is mysterious.
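As a preview of where the derivation lands, the standard identity tying the three quantities together can be written as below. The notation is an assumption made for illustration and may differ from the theory chapters: $p$ is the ground-truth distribution, $q$ is the model's distribution, and $V$ is the verb-form vocabulary.

```latex
H(p, q) \;=\; -\sum_{x \in V} p(x)\,\log q(x) \;=\; H(p) + D_{\mathrm{KL}}(p \,\|\, q),
\qquad
H(p) = -\sum_{x \in V} p(x)\,\log p(x),
\qquad
D_{\mathrm{KL}}(p \,\|\, q) = \sum_{x \in V} p(x)\,\log \frac{p(x)}{q(x)}
```

Because $H(p)$ does not depend on $q$, minimizing cross-entropy over the model is the same as minimizing the KL divergence from the ground truth. When $p$ is one-hot on the correct form $x^\ast$, $H(p) = 0$ and the per-example loss reduces to $-\log q(x^\ast)$.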
What you build here
- A working understanding of discrete probability over the verb-form vocabulary.
- Hand-derivations of entropy, KL divergence, cross-entropy, MLE.
- A numerically stable `log_softmax` and `cross_entropy_logits` in NumPy.
- A calibration analysis on a toy classifier over the 5 tenses.
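To give a feel for the third deliverable, here is a minimal sketch of what a stable `log_softmax` and `cross_entropy_logits` might look like. Only the two function names come from this page; the signatures, the integer-target convention, and the mean reduction are assumptions for illustration, and the phase's labs may specify something different.

```python
import numpy as np

def log_softmax(logits, axis=-1):
    """Log of softmax, computed with the log-sum-exp trick (assumed signature)."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtracting the row max leaves softmax unchanged but keeps exp() from overflowing.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    log_norm = np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))
    return shifted - log_norm

def cross_entropy_logits(logits, targets):
    """Mean negative log-likelihood, in nats, of integer class targets (assumed convention)."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets)
    log_probs = log_softmax(logits, axis=-1)
    # Pick the log-probability assigned to the correct class in each row.
    picked = log_probs[np.arange(logits.shape[0]), targets]
    return -np.mean(picked)

# Logits this extreme would overflow a naive exp()-then-normalize softmax.
logits = np.array([[1000.0, 0.0, -1000.0],
                   [0.0, 0.0, 0.0]])
targets = np.array([0, 2])
loss = cross_entropy_logits(logits, targets)
print(loss)
```

The first row is a confident, correct prediction and contributes a loss of zero; the second is uniform over three classes and contributes log 3, so the mean is (log 3) / 2. Working in log space throughout means the loss never takes the log of a probability that has underflowed to zero.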
What this phase does NOT cover
- Continuous distributions (deferred — the categorical is all we need for the microscopic §A13 scope).
- MCMC, variational inference, Bayesian deep learning (out of scope; ML systems uses point estimates).
- Mutual information labs (kept theory-only here; paired with probing in Phase 20).
Files
- Theory:
  - `00-motivation.md` — why this phase exists
  - `01-discrete-distributions.md` — distributions over conjugations
  - `02-entropy-and-kl.md` — uncertainty and distance
  - `03-cross-entropy-and-mle.md` — the training-loss derivation
  - `04-log-sum-exp-and-stability.md` — numerical safety
- Lab statements:
  - `00-entropy-by-hand.md`
  - `01-kl-and-cross-entropy.md`
  - `02-log-sum-exp.md`
  - `03-calibration.md`
See PHASE_05_PLAN.md at the repo root for the full plan.
Next: theory/00-motivation.md
Further reading
Optional — enrichment, not required to pass the phase.
- 📄 A Mathematical Theory of Communication — Shannon · 1948. Where entropy and bits come from.
- 📕 Information Theory, Inference, and Learning Algorithms — MacKay · 2003. Free, deep, and exactly on-topic.