Phase 21 — Quizzes (mirror)

The canonical questions live in data/quizzes/phase-21-inference-sampling.yaml.

q-21-01 — Top-p cutoff arithmetic

Prompt (EN): Given probabilities (sorted descending): [0.50, 0.20, 0.15, 0.10, 0.05], what is the smallest top-p set for p = 0.8?

A. {first 1 token}
B. {first 2 tokens}
C. {first 3 tokens}
D. {first 4 tokens}

Correct: C. Cumulative: 0.50 → 0.70 → 0.85 → 0.95. First prefix with cumulative ≥ 0.8 is {first 3}.
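The cumulative-sum rule can be checked with a few lines of Python. This is a minimal sketch, not a reference implementation; the function name `smallest_top_p_set` is hypothetical and chosen only for this illustration.

```python
from itertools import accumulate


def smallest_top_p_set(probs, p):
    """Return how many tokens are in the smallest top-p set.

    Tokens are taken in descending probability order until the
    cumulative probability first reaches or exceeds p.
    """
    ordered = sorted(probs, reverse=True)
    for count, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative >= p:
            return count
    # Fallback if rounding keeps the total just below p: keep everything.
    return len(ordered)


probs = [0.50, 0.20, 0.15, 0.10, 0.05]
print(smallest_top_p_set(probs, 0.8))  # prints 3, matching answer C
```

The first two tokens only reach 0.70, so the third token (cumulative 0.85) is the one that crosses the 0.8 threshold.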

q-21-02 — Why p=1.0 admits garbage

Prompt (EN): Why does top_p = 1.0 produce occasional garbage outputs even when the model assigns very low probability to those tokens?

A. The model's softmax is broken.
B. No truncation; the tail of the distribution is occasionally sampled despite low probability.
C. Temperature is implicitly 0.
D. The tokenizer is producing invalid tokens.

Correct: B. With no truncation, sampling occasionally lands on tail tokens. A token with probability \(p = 10^{-4}\) is rare at any single step, but across the many sampling steps in 1000 generations such tokens become visible.
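A rough calculation shows why rare tokens surface at scale. The sketch below assumes independent sampling steps, the same tail probability at every step, and a hypothetical length of 100 tokens per generation. None of these figures come from the quiz; they are only there to make the arithmetic concrete.

```python
def expected_tail_hits(p_token, n_steps):
    """Expected number of times a token with probability p_token is drawn."""
    return p_token * n_steps


def prob_at_least_one_hit(p_token, n_steps):
    """Probability the token is drawn at least once in n_steps independent draws."""
    return 1.0 - (1.0 - p_token) ** n_steps


generations = 1000
tokens_per_generation = 100  # hypothetical length, assumed for illustration
n_steps = generations * tokens_per_generation

print(expected_tail_hits(1e-4, n_steps))      # about 10 expected occurrences
print(prob_at_least_one_hit(1e-4, n_steps))   # very close to 1
```

Under these assumptions a single \(10^{-4}\) token is expected roughly ten times. A real vocabulary has many such tail tokens, so the combined effect is larger.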

q-21-03 — Temperature vs top-p

Prompt (EN): In one or two sentences, explain when temperature scaling and top-p sampling produce different outputs, and which is more appropriate when you want to admit "creative but plausible" continuations.

Free response. Expected mentions: temperature softens or sharpens the distribution but keeps full support; top-p truncates the tail and renormalizes what remains. Top-p is generally preferred when you want plausibility, because temperature alone (at any finite T > 0) never zeroes out tail tokens.
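The difference in support can be seen on a small example. The following is a sketch in plain Python; the logits and the helper names `softmax_with_temperature` and `top_p_filter` are made up for illustration and do not come from any particular library.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, p):
    """Zero out everything outside the smallest top-p set, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


logits = [4.0, 3.0, 2.0, -2.0, -4.0]  # hypothetical logits

for t in (0.5, 1.0, 2.0):
    # Every entry stays strictly positive: temperature reshapes, never truncates.
    print(t, [round(q, 4) for q in softmax_with_temperature(logits, t)])

# The two tail tokens get exactly zero probability after top-p filtering.
print([round(q, 4) for q in top_p_filter(softmax_with_temperature(logits, 1.0), 0.95)])
```

With these logits, top-p at 0.95 keeps the first three tokens and removes the last two entirely, while every temperature setting leaves all five tokens sampleable.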

q-21-04 — Adaptive vs fixed-size truncation

Prompt (EN): Select every statement that correctly characterizes top-p sampling versus top-k sampling.

A. Top-p keeps an adaptive number of tokens based on the distribution's entropy.
B. Top-k keeps a fixed number of tokens regardless of entropy.
C. On a confident (peaked) distribution, top-p with p = 0.95 keeps fewer tokens than top-k with k = 50.
D. Top-p is always faster than top-k.

Correct: A, B, C. D is false: top-p adds a sort step over the vocabulary, so it is not inherently faster than top-k, although in practice the cost is comparable.
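Statements A through C can be illustrated by counting how many tokens each method keeps. This sketch uses two hypothetical distributions over an assumed vocabulary of 1024 tokens (one peaked, one uniform); the helper names are illustrative only.

```python
def top_p_count(probs, p):
    """Number of tokens in the smallest set with cumulative probability >= p."""
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)


def top_k_count(probs, k):
    """Top-k keeps k tokens (or the whole vocabulary if it is smaller)."""
    return min(k, len(probs))


vocab_size = 1024  # hypothetical vocabulary size

# Peaked: one token holds 0.96 of the mass, the rest share 0.04 evenly.
peaked = [0.96] + [0.04 / (vocab_size - 1)] * (vocab_size - 1)
# Flat: uniform over the whole vocabulary (maximum entropy).
flat = [1.0 / vocab_size] * vocab_size

print(top_p_count(peaked, 0.95), top_k_count(peaked, 50))  # 1 vs 50
print(top_p_count(flat, 0.95), top_k_count(flat, 50))      # 973 vs 50
```

On the peaked distribution top-p keeps a single token while top-k still keeps 50. On the flat one top-p widens to 973 tokens and top-k stays at 50.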

q-21-05 — Beam search vs sampling

Prompt (EN): For the §A13 grammar tutor (which proposes corrections to a learner's sentence and benefits from showing multiple plausible alternatives), is beam search or top-p sampling the better choice?

A. Beam search — it produces the highest-likelihood outputs.
B. Top-p sampling — it produces diverse outputs whose distribution matches the model's assigned plausibility.
C. Greedy — it is deterministic.
D. Either works equally well.

Correct: B. Beam search gives the top-N highest-likelihood beams, which tend to be near-duplicates of each other (small variations of the same completion). Top-p draws diverse samples whose composition reflects the model's uncertainty.
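The near-duplicate effect can be reproduced on a toy example. The sketch below is a strong simplification: the "model" is a hypothetical table in which each position has a fixed word distribution that ignores earlier words, which a real language model would not do. The words, probabilities, and function names are all invented for illustration.

```python
import random

# Hypothetical toy "model": one fixed word distribution per position.
STEPS = [
    {"I": 0.6, "We": 0.3, "They": 0.1},
    {"went": 0.7, "go": 0.2, "walked": 0.1},
    {"home": 0.5, "there": 0.3, "away": 0.2},
]


def beam_search(steps, width):
    """Keep the `width` highest-probability partial sequences at each step."""
    beams = [((), 1.0)]
    for dist in steps:
        candidates = [
            (seq + (word,), prob * p)
            for seq, prob in beams
            for word, p in dist.items()
        ]
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:width]
    return beams


def top_p_sample(steps, p, rng):
    """Sample one sequence, applying top-p truncation at every position."""
    sentence = []
    for dist in steps:
        ranked = sorted(dist.items(), key=lambda item: item[1], reverse=True)
        kept, cumulative = [], 0.0
        for word, prob in ranked:
            kept.append((word, prob))
            cumulative += prob
            if cumulative >= p:
                break
        words, weights = zip(*kept)
        # random.choices normalizes the weights, which is the renormalization step.
        sentence.append(rng.choices(words, weights=weights, k=1)[0])
    return tuple(sentence)


for seq, prob in beam_search(STEPS, width=3):
    print(" ".join(seq), round(prob, 3))

rng = random.Random(0)  # fixed seed so reruns are repeatable
samples = {top_p_sample(STEPS, p=0.85, rng=rng) for _ in range(20)}
print(len(samples), "distinct sampled sentences")
```

In this toy setup the three beams are "I went home", "I went there", and "We went home", each one word away from the best beam. The sampler can reach any combination of the words that survive truncation at each position, so repeated draws tend to cover more varied sentences.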