Phase 5 — Quizzes
Readable mirror of `data/quizzes/phase-05-probability-information.yaml`, which is the source of truth. Includes the identity `H(p, q) = H(p) + KL(p || q)` and the calibration explanation.
q-05-01 — Entropy of uniform 5-tense distribution (free)
p = [0.2, 0.2, 0.2, 0.2, 0.2]. Compute H(p) in nats and bits.
Answer
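The value can be checked numerically. This is a minimal sketch using only the standard library; the helper name `entropy_nats` is an assumption for illustration and does not come from the quiz file.

```python
import math

def entropy_nats(p):
    """Shannon entropy in nats; terms with p_i == 0 contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p = [0.2] * 5
h_nats = entropy_nats(p)
h_bits = h_nats / math.log(2)  # 1 bit = ln(2) nats

print(f"{h_nats:.3f} nats, {h_bits:.3f} bits")
```

Dividing by `ln(2)` converts nats to bits, so the two figures describe the same quantity in different units.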
`H = log(5) ≈ 1.609 nats ≈ 2.322 bits`. This is the **maximum** possible entropy for 5 classes: total uncertainty.
q-05-02 — KL asymmetry (multi-choice)
- KL(p || q) = expected extra nats encoding p with a q-code.
- KL(p || q) = mode-covering: penalizes q when small where p is large.
- KL(q || p) = mode-seeking: allows q to concentrate on a single mode.
- KL is a metric.
- Cross-entropy training minimizes KL(p_data || p_model) = mode-covering.
Answer
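The mode-covering versus mode-seeking behaviour can be illustrated on a toy example. The sketch below assumes a made-up three-bin bimodal target `p` and two hypothetical candidates, `q_cover` and `q_seek`; none of these values come from the quiz file.

```python
import math

def kl(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical toy distributions, chosen only for illustration.
p = [0.495, 0.01, 0.495]          # bimodal target
q_cover = [1 / 3, 1 / 3, 1 / 3]   # spreads mass over both modes
q_seek = [0.98, 0.01, 0.01]       # concentrates on a single mode

forward = {"cover": kl(p, q_cover), "seek": kl(p, q_seek)}   # KL(p || q)
reverse = {"cover": kl(q_cover, p), "seek": kl(q_seek, p)}   # KL(q || p)

print("forward KL(p || q):", forward)
print("reverse KL(q || p):", reverse)
```

On these numbers the forward direction prefers `q_cover` while the reverse direction prefers `q_seek`, and swapping the arguments changes the value, which is the asymmetry the question is about.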
**Choices 1, 2, 3, 5.** KL is not a metric (asymmetric, no triangle inequality).
q-05-03 — Cross-entropy identity (free)
Show H(p, q) = H(p) + KL(p || q) and use it to explain why CE training ≡ KL minimization.
Answer
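A numerical sanity check of the identity may help. This sketch assumes an arbitrary fixed `p` and a few hypothetical model distributions in `candidates`; the helper names are illustrative only.

```python
import math

def entropy(p):
    """H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q) in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    """KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical values: p plays the role of p_data, each q a different p_model.
p = [0.7, 0.2, 0.1]
candidates = [[0.5, 0.3, 0.2], [0.6, 0.3, 0.1], [0.7, 0.2, 0.1]]

for q in candidates:
    ce, gap = cross_entropy(p, q), kl(p, q)
    # CE - KL stays equal to H(p) no matter which q is used.
    print(f"CE={ce:.4f}  KL={gap:.4f}  CE-KL={ce - gap:.4f}")
```

Because the difference `CE - KL` does not move as `q` changes, whichever `q` lowers the cross-entropy lowers the KL term by the same amount.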
`H(p, q) = -Σ p_i log q_i = -Σ p_i log p_i + Σ p_i log(p_i / q_i) = H(p) + KL(p || q)`. Since `H(p_data)` is constant in θ, minimizing `H(p_data, p_model)` w.r.t. θ is identical to minimizing `KL(p_data || p_model)`.
q-05-04 — Why is log_softmax more stable than log(softmax)?
- Because logsumexp uses fp64 internally.
- Because softmax can underflow to 0; `log(0) = -inf`. `log_softmax` never materializes small probabilities.
- Because logsumexp is differentiable and softmax is not.
- Because PyTorch fuses them on CUDA.
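The underflow behaviour named in the choices can be reproduced in a few lines. This is a pure-Python sketch, not PyTorch's implementation; the function names `naive_log_softmax` and `stable_log_softmax` and the example logits are assumptions for illustration.

```python
import math

def naive_log_softmax(logits):
    """log(softmax(x)): materializes probabilities, then takes their log."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # tiny terms underflow to 0.0
    total = sum(exps)
    probs = [e / total for e in exps]
    # math.log(0.0) raises ValueError in pure Python; array libraries
    # return -inf instead, so that behaviour is mimicked here.
    return [math.log(pr) if pr > 0 else -math.inf for pr in probs]

def stable_log_softmax(logits):
    """x - logsumexp(x): never forms the individual probabilities."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

logits = [0.0, -1000.0]  # hypothetical logits with a very large gap
naive = naive_log_softmax(logits)
stable = stable_log_softmax(logits)

print("naive :", naive)
print("stable:", stable)
```

With these logits the naive route loses the second entry to `-inf`, while the logsumexp route keeps a finite log-probability, since it only subtracts a scalar from each logit.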