
Phase 22 — Quizzes (mirror)

The canonical questions live in data/quizzes/phase-22-kv-cache.yaml.

q-22-01 — Memory math (mini-GPT)

Prompt (EN): For the Phase-17 mini-GPT (\(L=2, H=4, d_h=16\)) in fp16, what is the KV-cache size for \(B=4, S=64\)?

A. 8 KiB
B. 32 KiB
C. 128 KiB
D. 512 KiB

Correct: C. \(2 \cdot L \cdot H \cdot d_h \cdot S \cdot B \cdot s = 2 \cdot 2 \cdot 4 \cdot 16 \cdot 64 \cdot 4 \cdot 2 = 131{,}072\) bytes = 128 KiB, where the leading 2 counts both K and V and \(s = 2\) is the number of bytes per fp16 element.
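The arithmetic can be checked with a small helper. This is a minimal sketch; `kv_cache_bytes` and its parameter names are illustrative and not part of the course code.

```python
def kv_cache_bytes(n_layers: int, n_heads: int, d_head: int,
                   seq_len: int, batch: int, bytes_per_elem: int) -> int:
    """KV-cache size in bytes: 2 (K and V) * L * H * d_h * S * B * s."""
    return 2 * n_layers * n_heads * d_head * seq_len * batch * bytes_per_elem

# Phase-17 mini-GPT in fp16 (2 bytes per element), with B = 4 and S = 64.
size = kv_cache_bytes(n_layers=2, n_heads=4, d_head=16,
                      seq_len=64, batch=4, bytes_per_elem=2)
print(size, size / 1024)  # 131072 128.0
```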

q-22-02 — Detecting an off-by-one

Prompt (EN): A KV-cache implementation passes all tests at sequence length 8 but accuracy collapses on length-80 probes. What is the most likely bug?

A. fp16 overflow.
B. Off-by-one in position index (or similar silent stateful indexing bug).
C. Wrong attention head count.
D. Model checkpoint loaded with mismatched architecture.

Correct: B. The signature "fine at short lengths, collapses at long lengths" is the fingerprint of an off-by-one bug. The corruption accumulates with sequence length, so short sequences hide it.
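A common guard against this class of bug is an equivalence test that compares cached decoding with full recomputation at several sequence lengths, including lengths well beyond those used in quick unit tests. The sketch below does this for a single toy attention head in pure Python; the weights are random, there is no positional encoding, and every function name is hypothetical. In a real model the same comparison would run the full forward pass (positional encoding included) with and without the cache.

```python
import math
import random

def matvec(W, x):
    # W is a list of rows (d_out x d_in); x is a vector of length d_in.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    # Scaled dot-product attention of one query over the given keys/values.
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / z
            for j in range(len(values[0]))]

def full_recompute(xs, Wq, Wk, Wv):
    # Reference: causal attention that recomputes K and V for the whole prefix.
    out = []
    for t in range(len(xs)):
        prefix = xs[: t + 1]
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        out.append(attend(matvec(Wq, xs[t]), keys, values))
    return out

def cached_decode(xs, Wq, Wk, Wv):
    # KV-cached version: compute K, V once per position and append to the cache.
    k_cache, v_cache, out = [], [], []
    for x in xs:
        k_cache.append(matvec(Wk, x))
        v_cache.append(matvec(Wv, x))
        out.append(attend(matvec(Wq, x), k_cache, v_cache))
    return out

def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

random.seed(0)
d = 4
Wq, Wk, Wv = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
              for _ in range(3))

diffs = {}
for seq_len in (8, 80):  # test a short and a long length, not just the short one
    xs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(seq_len)]
    diffs[seq_len] = max_abs_diff(full_recompute(xs, Wq, Wk, Wv),
                                  cached_decode(xs, Wq, Wk, Wv))
print(diffs)
```

A correct cache gives a difference at floating-point noise level at every length; an indexing bug in `cached_decode` would show up as a large difference, and a bug that only corrupts late positions would show up only in the length-80 run.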

q-22-03 — Prefill vs. decode

Prompt (EN): In one or two sentences, explain the difference between prefill and decode and why the KV cache helps the decode phase but not the prefill phase.

Free response. Expected points: prefill processes the prompt in parallel (all positions at once, so there are no earlier K, V to reuse; prefill fills the cache rather than reading from it); decode generates one token at a time and reuses the cached K, V of past positions instead of recomputing them.
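The argument can be made concrete with a toy tally of how many positions have their K and V projections computed while generating `new_tokens` tokens from a prompt of `prompt_len` tokens. This is a simplified sketch (one layer, no batching; `kv_projections` is a hypothetical helper) that counts only K/V projection work, not attention cost.

```python
def kv_projections(prompt_len: int, new_tokens: int, use_cache: bool) -> int:
    """Count positions whose K/V projections are computed (toy model).

    Prefill runs one forward pass over the prompt and yields the first new
    token. Each later decode step feeds back the token just generated.
    """
    total = prompt_len  # prefill: every prompt position, cache or no cache
    for i in range(1, new_tokens):
        seq_len = prompt_len + i  # sequence length seen by decode step i
        # With a cache only the newest position is projected; without one,
        # the whole sequence is reprocessed.
        total += 1 if use_cache else seq_len
    return total

print(kv_projections(4, 3, use_cache=False))  # 15
print(kv_projections(4, 3, use_cache=True))   # 6
```

The prefill term (`prompt_len`) is identical in both modes; only the decode term shrinks, from a sum that grows with sequence length to one projection per step.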

q-22-04 — Cache-size scaling levers

Prompt (EN): Select every change that reduces KV-cache memory.

A. Switching from fp32 to fp16.
B. Reducing batch size \(B\).
C. Using Grouped-Query Attention (GQA) with fewer KV heads than Q heads.
D. Increasing \(d_\text{ff}\) (the FFN inner dimension).

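Correct: A, B, C. fp16 halves \(s\), a smaller \(B\) shrinks the cache linearly, and GQA replaces \(H\) in the formula with the smaller number of KV heads. The FFN inner dimension does not appear in the KV-cache formula, so changing it does not affect the cache size.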
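Writing the formula with an explicit number of cached KV heads makes each lever visible. In the minimal sketch below, the helper name, the fp32 baseline, and the single-KV-head variant (the extreme case of GQA, also known as multi-query attention) are illustrative choices applied to the Phase-17 dimensions. Options A, B and C each change one argument, while \(d_\text{ff}\) has no argument to change.

```python
def kv_cache_bytes(n_layers, n_kv_heads, d_head, seq_len, batch, bytes_per_elem):
    # Only KV heads are cached; under GQA there are fewer of them than Q heads.
    return 2 * n_layers * n_kv_heads * d_head * seq_len * batch * bytes_per_elem

# Illustrative fp32 baseline using the Phase-17 mini-GPT dimensions.
base = dict(n_layers=2, n_kv_heads=4, d_head=16, seq_len=64, batch=4,
            bytes_per_elem=4)

variants = {
    "baseline (fp32)": base,
    "A: fp16": {**base, "bytes_per_elem": 2},
    "B: batch 4 -> 2": {**base, "batch": 2},
    "C: GQA, 1 KV head": {**base, "n_kv_heads": 1},
}
sizes = {name: kv_cache_bytes(**cfg) for name, cfg in variants.items()}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```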

q-22-05 — Marginal cost per token

Prompt (EN): For the Phase-17 mini-GPT in fp16, what is the per-generated-token marginal increase in cache memory (for \(B = 1\))?

A. 64 bytes
B. 256 bytes
C. 512 bytes
D. 1 KiB

Correct: C. \(2 \cdot L \cdot H \cdot d_h \cdot s = 2 \cdot 2 \cdot 4 \cdot 16 \cdot 2 = 512\) bytes per token (with \(B = 1\)).
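As a consistency check with q-22-01, the same per-token figure falls out of dividing that total cache by the number of cached positions across the batch:

\[
\frac{2 \cdot L \cdot H \cdot d_h \cdot S \cdot B \cdot s}{S \cdot B} = 2 \cdot L \cdot H \cdot d_h \cdot s,
\qquad
\frac{131{,}072\ \text{bytes}}{64 \cdot 4} = 512\ \text{bytes}.
\]

For a fixed model, the cache therefore grows linearly in both \(S\) and \(B\).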