Phase 21 — Quizzes (mirror)
The canonical questions live in
data/quizzes/phase-21-inference-sampling.yaml.
q-21-01 — Top-p cutoff arithmetic
Prompt (EN): Given probabilities (sorted descending): [0.50, 0.20, 0.15, 0.10, 0.05], what is the smallest top-p set for p = 0.8?
- A. {first 1 token}
- B. {first 2 tokens}
- C. {first 3 tokens}
- D. {first 4 tokens}
Correct: C. Cumulative: 0.50 → 0.70 → 0.85 → 0.95. First prefix with cumulative ≥ 0.8 is {first 3}.
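The cumulative-sum rule used in this answer can be sketched in a few lines of Python. This is a minimal illustration, not the canonical implementation: the helper name `top_p_prefix_size` is hypothetical, and the function assumes the probabilities are already sorted in descending order.

```python
from itertools import accumulate

def top_p_prefix_size(sorted_probs, p):
    """Return the size of the smallest prefix whose cumulative mass is >= p.

    Assumes sorted_probs is sorted in descending order and sums to about 1.
    """
    eps = 1e-12  # tolerance for floating-point rounding in the running sum
    for size, cumulative in enumerate(accumulate(sorted_probs), start=1):
        if cumulative >= p - eps:
            return size
    return len(sorted_probs)  # p exceeds the total mass: keep everything

probs = [0.50, 0.20, 0.15, 0.10, 0.05]
print(top_p_prefix_size(probs, 0.8))  # 3
```

The first two tokens reach only 0.70, so the third token is needed to cross 0.8.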
q-21-02 — Why p=1.0 admits garbage
Prompt (EN): Why does top_p = 1.0 produce occasional garbage outputs even when the model assigns very low probability to those tokens?
- A. The model's softmax is broken.
- B. No truncation; the tail of the distribution is occasionally sampled despite low probability.
- C. Temperature is implicitly 0.
- D. The tokenizer is producing invalid tokens.
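Correct: B. With no filter, sampling occasionally lands on tail tokens. Even a per-step probability of \(p = 10^{-4}\) becomes visible at scale: 1000 generations of, for example, 100 tokens each amount to \(10^5\) draws, so about 10 such tail samples are expected.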
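A quick back-of-the-envelope check shows how a rare token becomes visible over many draws. The sketch below assumes independent draws with the same tail probability at every step, and a hypothetical workload of 1000 generations of 100 tokens each; both are simplifications chosen for illustration, and the function names are made up.

```python
def expected_tail_samples(tail_prob, num_draws):
    """Expected number of times the tail token is sampled."""
    return tail_prob * num_draws

def prob_tail_seen(tail_prob, num_draws):
    """Probability that the tail token is sampled at least once,
    assuming independent draws with a constant per-step probability."""
    return 1.0 - (1.0 - tail_prob) ** num_draws

# Hypothetical workload: 1000 generations of 100 tokens each.
draws = 1000 * 100
print(round(expected_tail_samples(1e-4, draws), 3))  # about 10 tail samples
print(prob_tail_seen(1e-4, draws) > 0.999)           # almost certain to appear
```

With a nucleus cutoff below 1.0, such a token would typically fall outside the kept set and its probability would be zeroed before sampling.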
q-21-03 — Temperature vs top-p
Prompt (EN): In one or two sentences, explain when temperature scaling and top-p sampling produce different outputs, and which is more appropriate when you want to admit "creative but plausible" continuations.
Free response. Expected mentions: temperature softens or sharpens the distribution but keeps full support; top-p truncates the tail and renormalizes what remains. Top-p is generally preferred when you want plausibility, because temperature alone never zeroes out tail tokens.
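The difference in support can be made concrete with a small sketch. The logits below are made up for illustration, and both helper functions are hypothetical names rather than any library's API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose mass reaches p,
    zero out the rest, and renormalize the kept tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / cumulative
    return out

logits = [4.0, 3.0, 2.0, -2.0, -4.0]  # made-up logits for illustration
hot = softmax_with_temperature(logits, 1.5)
nucleus = top_p_filter(softmax_with_temperature(logits, 1.0), 0.9)
print(sum(1 for q in hot if q > 0))      # 5: every token keeps nonzero mass
print(sum(1 for q in nucleus if q > 0))  # 2: the tail is zeroed out
```

Raising the temperature flattens the distribution yet leaves every token reachable, while the nucleus filter removes the tail entirely.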
q-21-04 — Adaptive vs fixed-size truncation
Prompt (EN): Select every statement that correctly characterizes top-p sampling versus top-k sampling.
- A. Top-p keeps an adaptive number of tokens based on the distribution's entropy.
- B. Top-k keeps a fixed number of tokens regardless of entropy.
- C. On a confident (peaked) distribution, top-p with p = 0.95 keeps fewer tokens than top-k with k = 50.
- D. Top-p is always faster than top-k.
Correct: A, B, C. Top-p does add a sort step but the cost is comparable; D is false.
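Statements A through C can be checked on two synthetic distributions. The sketch below is illustrative only: the vocabulary size of 1000 and both distributions are made-up assumptions, and the helper names are hypothetical.

```python
def top_p_count(probs, p):
    """Number of tokens kept by top-p: smallest prefix of the
    descending-sorted distribution whose mass reaches p."""
    cumulative = 0.0
    for count, q in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += q
        if cumulative >= p - 1e-12:  # tolerance for float rounding
            return count
    return len(probs)

def top_k_count(probs, k):
    """Number of tokens kept by top-k: always k (capped by vocabulary size)."""
    return min(k, len(probs))

vocab = 1000  # hypothetical vocabulary size
# Peaked: one token holds 0.96 of the mass, the rest share 0.04.
peaked = [0.96] + [0.04 / (vocab - 1)] * (vocab - 1)
# Flat: uniform over the vocabulary.
flat = [1.0 / vocab] * vocab

print(top_p_count(peaked, 0.95), top_k_count(peaked, 50))  # 1 50
print(top_p_count(flat, 0.95), top_k_count(flat, 50))      # 950 50
```

On the peaked distribution top-p keeps a single token while top-k still keeps 50; on the flat one top-p widens to 950 tokens and top-k stays at 50.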
q-21-05 — Beam search vs sampling
Prompt (EN): For the §A13 grammar tutor (which proposes corrections to a learner's sentence and benefits from showing multiple plausible alternatives), is beam search or top-p sampling the better choice?
- A. Beam search — it produces the highest-likelihood outputs.
- B. Top-p sampling — it produces diverse outputs whose distribution matches the model's assigned plausibility.
- C. Greedy — it is deterministic.
- D. Either works equally well.
Correct: B. Beam search gives the top-N highest-likelihood beams, which tend to be near-duplicates of each other (small variations of the same completion). Top-p draws diverse samples whose composition reflects the model's uncertainty.
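The near-duplicate effect can be reproduced on a toy model. Everything in this sketch is hypothetical: the `MODEL` table is a made-up bigram distribution, not the tutor's model, and the sampler draws from the full distribution. A top-p filter with p = 0.9 would keep both tokens at every step of this toy, so it is omitted for brevity.

```python
import random

# Hypothetical toy model: next-token probabilities keyed by the previous token.
MODEL = {
    "<s>":   {"She": 0.7, "He": 0.3},
    "She":   {"goes": 0.6, "walks": 0.4},
    "He":    {"goes": 0.6, "walks": 0.4},
    "goes":  {"home": 0.55, "there": 0.45},
    "walks": {"home": 0.55, "there": 0.45},
}
STEPS = 3

def beam_search(width):
    """Return the `width` highest-probability sequences found by beam search."""
    beams = [(1.0, ["<s>"])]
    for _ in range(STEPS):
        # Expand every beam by every possible next token, then prune.
        candidates = [
            (prob * q, tokens + [tok])
            for prob, tokens in beams
            for tok, q in MODEL[tokens[-1]].items()
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:width]
    return [" ".join(tokens[1:]) for _, tokens in beams]

def sample_sequence(rng):
    """Draw one sequence token by token from the model's distribution."""
    tokens = ["<s>"]
    for _ in range(STEPS):
        dist = MODEL[tokens[-1]]
        tokens.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return " ".join(tokens[1:])

print(beam_search(3))
# ['She goes home', 'She goes there', 'She walks home']
rng = random.Random(0)  # fixed seed so the run is repeatable
print(sorted({sample_sequence(rng) for _ in range(200)}))
```

All three beams start with "She" and differ only in later tokens, even though the toy model gives sentences starting with "He" a combined probability of 0.3. Repeated sampling surfaces those alternatives roughly in proportion to that probability.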