

Phase 22 — Quizzes (mirror)

The canonical questions live in data/quizzes/phase-22-kv-cache.yaml.


q-22-01 — Memory math (mini-GPT)

Prompt (EN): For the Phase-17 mini-GPT (\(L=2, H=4, d_h=16\)) in fp16, what is the KV-cache size for \(B=4, S=64\)?

  • A. 8 KiB
  • B. 32 KiB
  • C. 128 KiB
  • D. 512 KiB

Correct: C. \(2 \cdot L \cdot H \cdot d_h \cdot S \cdot B \cdot s = 2 \cdot 2 \cdot 4 \cdot 16 \cdot 64 \cdot 4 \cdot 2 = 131{,}072\) bytes = 128 KiB, where the leading 2 counts K and V, and \(s = 2\) bytes per fp16 element.
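As a quick check of the arithmetic above, here is a minimal Python sketch of the same formula. The function name `kv_cache_bytes` and its parameter names are illustrative assumptions, not part of the quiz source.

```python
def kv_cache_bytes(n_layers, n_heads, head_dim, seq_len, batch, bytes_per_elem):
    """KV-cache size in bytes: 2 (K and V) * L * H * d_h * S * B * s."""
    return 2 * n_layers * n_heads * head_dim * seq_len * batch * bytes_per_elem

# Phase-17 mini-GPT (L=2, H=4, d_h=16) in fp16 (s = 2 bytes), with B=4, S=64.
size = kv_cache_bytes(n_layers=2, n_heads=4, head_dim=16,
                      seq_len=64, batch=4, bytes_per_elem=2)
print(size, size / 1024)  # 131072 128.0
```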


q-22-02 — Off-by-one detection

Prompt (EN): A KV-cache implementation passes all tests at sequence length 8 but accuracy collapses on length-80 probes. What is the most likely bug?

  • A. fp16 overflow.
  • B. Off-by-one in position index (or similar silent stateful indexing bug).
  • C. Wrong attention head count.
  • D. Model checkpoint loaded with mismatched architecture.

Correct: B. The signature "fine at short length, collapses at long length" is the classic fingerprint of an off-by-one indexing bug. The corruption accumulates with sequence length, so short sequences hide it.
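One way to catch this class of bug is to compare token-by-token cached decode against a no-cache full recompute, at lengths well beyond the short test cases. The sketch below is a toy under stated assumptions: single-head attention with random weights, a made-up sinusoidal position signal, and a hypothetical wraparound bug (a fixed 64-slot cache indexed with `pos % 64`) standing in for the "similar silent stateful indexing bug" of option B, rather than a literal off-by-one. All names (`full_forward`, `cached_forward`, `buggy_capacity`, `check`) are illustrative.

```python
import math
import random

D = 4        # toy head dimension (assumption)
VOCAB = 16   # toy vocabulary size (assumption)
random.seed(0)

def rand_matrix(rows, cols, scale=0.5):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

EMB = rand_matrix(VOCAB, D, scale=1.0)
WQ, WK, WV = rand_matrix(D, D), rand_matrix(D, D), rand_matrix(D, D)

def matvec(w, x):
    return [sum(w_rc * x_c for w_rc, x_c in zip(row, x)) for row in w]

def embed(token, pos):
    # Token embedding plus a toy sinusoidal position signal (hypothetical).
    return [e + math.sin(pos * (i + 1) * 0.1) for i, e in enumerate(EMB[token])]

def attend(q, keys, values):
    # Single-head scaled dot-product attention over the given keys/values.
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(D)]

def full_forward(tokens):
    """No cache: recompute K and V for every position (causal)."""
    xs = [embed(t, pos) for pos, t in enumerate(tokens)]
    ks = [matvec(WK, x) for x in xs]
    vs = [matvec(WV, x) for x in xs]
    return [attend(matvec(WQ, x), ks[: i + 1], vs[: i + 1]) for i, x in enumerate(xs)]

def cached_forward(tokens, buggy_capacity=None):
    """Token-by-token decode with a KV cache.

    If buggy_capacity is set, the cache silently wraps (slot = pos % capacity)
    once full: a hypothetical stateful indexing bug.
    """
    ks, vs, outs = [], [], []
    for pos, t in enumerate(tokens):
        x = embed(t, pos)
        k, v = matvec(WK, x), matvec(WV, x)
        if buggy_capacity is not None and len(ks) >= buggy_capacity:
            slot = pos % buggy_capacity  # BUG: overwrites an old entry
            ks[slot], vs[slot] = k, v
        else:
            ks.append(k)
            vs.append(v)
        outs.append(attend(matvec(WQ, x), ks, vs))
    return outs

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def check(length, buggy_capacity=None):
    # Largest deviation between cached decode and full recompute on random tokens.
    tokens = [random.randrange(VOCAB) for _ in range(length)]
    return max_abs_diff(full_forward(tokens), cached_forward(tokens, buggy_capacity))

if __name__ == "__main__":
    for length in (8, 80):
        print(length, check(length), check(length, buggy_capacity=64))
```

At length 8 both variants agree with the full recompute; only the length-80 probe exposes the buggy cache, mirroring the "fine short, broken long" signature.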


q-22-03 — Prefill vs decode distinction

Prompt (EN): In one or two sentences, explain the difference between prefill and decode and why the KV cache helps the decode phase but not the prefill phase.

Free response. Expected mentions: prefill processes the prompt in parallel (all positions at once, no cache reuse possible); decode generates one token at a time and reuses past K, V.
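To make the distinction concrete, the sketch below counts how many token positions must be pushed through the K/V projections (per layer) to generate `new_tokens` tokens after a prompt of `prompt_len` tokens. It assumes a simple greedy loop where the last generated token is never fed back; the function names are illustrative. With exactly one generated token (prefill only), both counts are equal, which is why the cache does not speed up prefill.

```python
def positions_processed_without_cache(prompt_len, new_tokens):
    # Step t re-feeds the whole prefix (prompt + t generated tokens).
    assert new_tokens >= 1
    return sum(prompt_len + t for t in range(new_tokens))

def positions_processed_with_cache(prompt_len, new_tokens):
    # Prefill: one parallel pass over the prompt writes its K, V into the cache.
    # Decode: each later step feeds only the newest token; past K, V are read back.
    assert new_tokens >= 1
    return prompt_len + (new_tokens - 1)

for n in (1, 5, 100):
    print(n, positions_processed_without_cache(10, n),
          positions_processed_with_cache(10, n))
# 1 10 10
# 5 60 14
# 100 5950 109
```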


q-22-04 — Cache size scaling levers

Prompt (EN): Select every change that reduces KV-cache memory.

  • A. Switching from fp32 to fp16.
  • B. Reducing batch size \(B\).
  • C. Using Grouped-Query Attention (GQA) with fewer KV heads than Q heads.
  • D. Increasing \(d_\text{ff}\) (the FFN inner dimension).

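Correct: A, B, C. Switching to fp16 halves \(s\), \(B\) is a direct multiplicative factor, and GQA replaces \(H\) in the formula with the smaller number of KV heads. The FFN inner dimension does not appear in the KV-cache formula, so changing it does not affect cache size.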
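The levers can be checked with the cache formula written in terms of KV heads. This sketch assumes an fp32 baseline of the Phase-17 mini-GPT shape (L=2, 4 KV heads, d_h=16, S=64, B=4) and, for GQA, a hypothetical reduction to 2 KV heads; note that there is no parameter for \(d_\text{ff}\) at all.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem):
    # Only K/V heads are cached, so GQA shrinks n_kv_heads; d_ff never appears.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical fp32 baseline with the Phase-17 mini-GPT shape.
base = dict(n_layers=2, n_kv_heads=4, head_dim=16, seq_len=64, batch=4, bytes_per_elem=4)

levers = {
    "A: fp32 -> fp16": {**base, "bytes_per_elem": 2},
    "B: batch 4 -> 2": {**base, "batch": 2},
    "C: GQA, 4 -> 2 KV heads": {**base, "n_kv_heads": 2},
}

print("baseline", kv_cache_bytes(**base))  # 262144
for name, cfg in levers.items():
    print(name, kv_cache_bytes(**cfg))     # 131072 each
```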


q-22-05 — Per-token marginal cost

Prompt (EN): For the Phase-17 mini-GPT in fp16, what is the per-generated-token marginal increase in cache memory (for \(B = 1\))?

  • A. 64 bytes
  • B. 256 bytes
  • C. 512 bytes
  • D. 1 KiB

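Correct: C. \(2 \cdot L \cdot H \cdot d_h \cdot s = 2 \cdot 2 \cdot 4 \cdot 16 \cdot 2 = 512\) bytes per token (with \(B = 1\) and \(s = 2\) bytes for fp16), i.e. the q-22-01 formula with \(S = 1\). This is consistent with q-22-01: \(128\ \text{KiB} / (64 \cdot 4) = 512\) bytes.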
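A minimal sketch of the per-token computation, with a cross-check against the full-cache size from q-22-01. The function name `per_token_kv_bytes` is illustrative, not from the source.

```python
def per_token_kv_bytes(n_layers, n_heads, head_dim, bytes_per_elem, batch=1):
    # One new position adds one K and one V vector per head, per layer.
    return 2 * n_layers * n_heads * head_dim * bytes_per_elem * batch

per_token = per_token_kv_bytes(n_layers=2, n_heads=4, head_dim=16, bytes_per_elem=2)
print(per_token)           # 512

# Cross-check with q-22-01: 512 bytes * S=64 * B=4 = 131072 bytes = 128 KiB.
print(per_token * 64 * 4)  # 131072
```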