Lab 00 — Entropy by hand

Read theory/02-entropy-and-kl.md before starting. Do not consult solutions/.

Objective

Hand-compute the entropy of small categorical distributions; verify the upper bound \(H(p) \le \log V\) both numerically and by proof; implement entropy(p) in NumPy with correct handling of \(p_i = 0\).

Setup

Use the §A13 verb-tense alphabet: {infinitive, present, past, past_participle, future} (\(V = 5\)).

Tasks

Task 1 — entropy on paper

For each of the following distributions over the 5 tenses, compute \(H(p)\) in nats by hand. Show your arithmetic.

| Distribution | \(p\) |
|---|---|
| A | \((1, 0, 0, 0, 0)\) |
| B | \((0.5, 0.5, 0, 0, 0)\) |
| C | \((0.25, 0.25, 0.25, 0.25, 0)\) |
| D | \((0.2, 0.2, 0.2, 0.2, 0.2)\) (uniform) |
| E | \((0.6, 0.1, 0.1, 0.1, 0.1)\) |

Predict which has the largest entropy before you compute. Then check.
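
As a template for the level of arithmetic expected, here is distribution B worked through, using the \(0 \log 0 = 0\) convention for its three zero entries:

\[H(p_B) = -\sum_{i=1}^{5} p_i \log p_i = -2 \cdot \tfrac{1}{2} \log \tfrac{1}{2} - 3 \cdot 0 = \log 2 \approx 0.6931 \text{ nats}.\]

The same pattern applies to any distribution that is uniform on a subset of the 5 tenses.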

Task 2 — implement entropy(p) in NumPy

Constraints:

  • Pure NumPy; no scipy.stats.entropy, no PyTorch.
  • Must handle \(p_i = 0\) correctly (convention: \(0 \log 0 = 0\)). Hint: there is a one-line idiom involving np.where or xlogy.
  • Must validate that \(p\) is a valid probability vector: shape, non-negative, sums to 1 within tolerance.
  • Module: src/phase05/probability.py (this is a Phase 05 scratch module — does NOT graduate into src/utils/; that's a later phase).

Signature suggestion:

import numpy as np
from numpy.typing import NDArray

def entropy(p: NDArray[np.float64]) -> float:
    """Return H(p) in nats. Raises ValueError if p is not a valid distribution."""

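One possible shape for the function, offered as a minimal sketch rather than the reference solution. The exact error messages, the rejection of empty vectors, and the use of the ufunc `where=` argument (instead of np.where) are choices made here for illustration; the tolerance is np.isclose's default.

```python
import numpy as np
from numpy.typing import NDArray


def entropy(p: NDArray[np.float64]) -> float:
    """Return H(p) in nats. Raises ValueError if p is not a valid distribution."""
    p = np.asarray(p, dtype=np.float64)
    # Shape: a single distribution is a non-empty 1-D vector (assumption).
    if p.ndim != 1 or p.size == 0:
        raise ValueError("p must be a non-empty 1-D vector")
    if not np.all(np.isfinite(p)) or np.any(p < 0):
        raise ValueError("p must contain finite, non-negative entries")
    if not np.isclose(p.sum(), 1.0):
        raise ValueError("p must sum to 1 within tolerance")
    # Take the log only where p > 0; the other slots keep their initial 0,
    # so zero-probability outcomes contribute 0 (the 0 log 0 = 0 convention).
    logp = np.zeros_like(p)
    np.log(p, out=logp, where=p > 0)
    return float(-np.sum(p * logp))


# Distribution B from Task 1: uniform on two of the five tenses.
h_b = entropy(np.array([0.5, 0.5, 0.0, 0.0, 0.0]))
print(h_b)  # log 2, about 0.6931
```

Computing the log only on the positive entries avoids both the NaN and the RuntimeWarning that a plain np.log(p) would raise on zeros.
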
Task 3 — verify the upper bound numerically

For \(V \in \{2, 5, 10, 100, 600\}\):

  1. Sample 1000 random distributions on \(V\) outcomes (e.g., from a Dirichlet(1, ..., 1)).
  2. For each, compute \(H\).
  3. Verify \(H \le \log V\) for every sample.
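  4. Plot the histogram of \(H / \log V\). The mass should sit below 1 and move closer to it as \(V\) grows: under Dirichlet(1, ..., 1) the expected entropy is about \(\log V - 0.42\) nats for large \(V\), so the ratio averages roughly 0.93 at \(V = 600\) but only about 0.72 at \(V = 2\).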
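
The four steps above can be sketched as follows. The helper names `batch_entropy` and `bound_ratios`, the seed, and the small floating-point slack in the bound check are assumptions made for this illustration; plotting is left out so the sketch depends only on NumPy (the ratios can be passed to any histogram routine).

```python
import numpy as np


def batch_entropy(P):
    """Row-wise entropy in nats for a 2-D array whose rows are distributions."""
    logP = np.zeros_like(P)
    np.log(P, out=logP, where=P > 0)  # log only where P > 0; zeros contribute 0
    return -np.sum(P * logP, axis=1)


def bound_ratios(V, n_samples=1000, seed=0):
    """Sample Dirichlet(1, ..., 1) distributions on V outcomes; return H / log V."""
    rng = np.random.default_rng(seed)
    P = rng.dirichlet(np.ones(V), size=n_samples)
    return batch_entropy(P) / np.log(V)


ratios_by_V = {V: bound_ratios(V) for V in (2, 5, 10, 100, 600)}
for V, ratios in ratios_by_V.items():
    # Allow a tiny slack for floating-point rounding in the bound check.
    assert np.all(ratios <= 1.0 + 1e-12), f"bound violated for V={V}"
    print(f"V={V:4d}  mean ratio={ratios.mean():.3f}  max ratio={ratios.max():.6f}")
```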

Task 4 — reproduce the Jensen proof

In your lab notes (or a notebook cell), write out the proof:

\[H(p) = \mathbb{E}_p[\log(1/p_i)] \le \log \mathbb{E}_p[1/p_i] = \log V.\]

Justify each step. Identify exactly where Jensen's inequality is used, verify that \(\log\) is concave on \((0, \infty)\), and say what the final equality becomes when some \(p_i = 0\).
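
Two facts the justification leans on, written out as a sketch rather than a finished proof. First, concavity of the logarithm:

\[\frac{d^2}{dx^2} \log x = -\frac{1}{x^2} < 0 \quad \text{for } x > 0,\]

so Jensen's inequality gives \(\mathbb{E}[\log X] \le \log \mathbb{E}[X]\) for the random variable \(X = 1/p_i\) with \(i \sim p\). Second, the expectation only runs over outcomes that can actually occur:

\[\mathbb{E}_p\!\left[\frac{1}{p_i}\right] = \sum_{i:\, p_i > 0} p_i \cdot \frac{1}{p_i} = \left|\{i : p_i > 0\}\right| \le V,\]

with equality to \(V\) exactly when every \(p_i > 0\).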

Measurements to capture

  • Wall-clock to compute entropy(p) on \(V = 600\), 100k samples (should be ≲ 10 ms — it's a tiny op).
  • Sample manifest under experiments/<date>-phase-05-entropy/manifest.json per src/utils/seeding.py.
  • The histogram from Task 3 saved as experiments/<date>-phase-05-entropy/histogram.png.
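
A minimal timing sketch for the first measurement, using time.perf_counter. `entropy_nats` is a stand-in for your own entropy(p) with validation omitted, `time_entropy` is a hypothetical helper, and the default of 1,000 samples is an assumption chosen to keep the example light; pass n_samples=100_000 for the figure the lab asks for. It does not write the manifest, since that goes through src/utils/seeding.py.

```python
import time

import numpy as np


def entropy_nats(p):
    # Stand-in for the lab's entropy(p); validation is omitted to time the core op.
    logp = np.zeros_like(p)
    np.log(p, out=logp, where=p > 0)
    return float(-np.sum(p * logp))


def time_entropy(V=600, n_samples=1_000, seed=0):
    """Return (total seconds, seconds per call) over n_samples Dirichlet draws."""
    rng = np.random.default_rng(seed)
    P = rng.dirichlet(np.ones(V), size=n_samples)  # sampling is not timed
    start = time.perf_counter()
    for p in P:
        entropy_nats(p)
    elapsed = time.perf_counter() - start
    return elapsed, elapsed / n_samples


total_s, per_call_s = time_entropy()
print(f"total {total_s * 1e3:.2f} ms, per call {per_call_s * 1e6:.2f} us")
```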

Acceptance

  • All 5 distributions A-E have correct entropies computed on paper.
  • entropy(p) handles \(p_i = 0\) without NaN.
  • entropy(p) raises ValueError on invalid inputs (non-normalised, negative values, wrong shape).
  • Property tests pass: entropy(uniform_V) ≈ log(V), entropy(point_mass) == 0.
  • Histogram plot exists; visual check that all samples respect the bound.
  • Jensen proof reproduced in your notes.
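
The property and error-handling criteria above could be checked with something like the following. `check_entropy_properties` is a hypothetical helper, not part of the project; it takes the function under test as an argument so it does not assume any import path.

```python
import numpy as np


def check_entropy_properties(entropy_fn, sizes=(2, 5, 10, 100, 600)):
    """Assert the acceptance properties for a candidate entropy function."""
    for V in sizes:
        uniform = np.full(V, 1.0 / V)
        assert np.isclose(entropy_fn(uniform), np.log(V)), f"uniform, V={V}"
        point_mass = np.zeros(V)
        point_mass[0] = 1.0
        assert entropy_fn(point_mass) == 0, f"point mass, V={V}"
    invalid_inputs = (
        np.array([0.5, 0.6]),    # does not sum to 1
        np.array([1.5, -0.5]),   # negative entry
        np.array([[0.5, 0.5]]),  # wrong shape (2-D)
    )
    for bad in invalid_inputs:
        try:
            entropy_fn(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")
```

Call it as check_entropy_properties(entropy) once your implementation is importable.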

Pitfalls to expect

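  • np.log(p) on a vector with zeros returns -inf (NumPy raises only a RuntimeWarning, which is easy to miss); multiplying by 0 then gives NaN (the IEEE-754 0 * inf rule). Use np.where(p > 0, p * np.log(p), 0.0) or scipy.special.xlogy(p, p), and negate the sum. Note that np.where evaluates both branches, so the warnings still fire unless you suppress them with np.errstate or take the log only where p > 0.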
  • Dirichlet samples may not be exactly normalised (rounding); your validator should allow np.isclose(p.sum(), 1.0) with default tolerance.
  • Confusing nats and bits: \(H\) in nats uses np.log; in bits use np.log2. The convention for this project is nats (matches PyTorch's F.cross_entropy).
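
A short demonstration of the first and third pitfalls on a three-outcome example; the vector and variable names are made up for this illustration.

```python
import numpy as np

p = np.array([0.5, 0.5, 0.0])

# np.errstate silences the divide-by-zero and invalid-value warnings for the demo.
with np.errstate(divide="ignore", invalid="ignore"):
    naive_nats = -np.sum(p * np.log(p))                        # 0 * -inf poisons the sum
    safe_nats = -np.sum(np.where(p > 0, p * np.log(p), 0.0))   # zero entries masked to 0

safe_bits = safe_nats / np.log(2)  # nats -> bits: divide by log 2

print(naive_nats)  # nan
print(safe_nats)   # log 2, about 0.6931
print(safe_bits)   # 1.0 (one fair coin flip)
```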

Next: 01-kl-and-cross-entropy.md