Phase 30 — Quizzes
Readable mirror of `data/quizzes/phase-30-structured-generation.yaml`.
Source of truth: `data/quizzes/phase-30-structured-generation.yaml`.
q-30-01 — When a regex mask is incorrect
You want the model to emit a balanced-parenthesis expression of arbitrary nesting depth. Why is a regex mask the wrong tool, even with carefully crafted patterns?
- Regex DFA compilation is too slow.
- Balanced parentheses are not a regular language — the pumping lemma rules it out — so no DFA accepts the full language.
- Regex masks consume too much VRAM during decoding.
- Regex cannot encode the start/end-of-string anchors needed.
Answer
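As a concrete illustration of the question, here is a minimal Python sketch that contrasts a depth-bounded regex with a counter-based check. The helper names `bounded_depth_pattern` and `is_balanced` are hypothetical and used only for this example; the counter stands in for the stack a real parser would keep.

```python
import re


def bounded_depth_pattern(max_depth: int) -> str:
    """Build a regex for balanced parentheses nested at most max_depth deep."""
    pattern = ""  # depth 0 accepts only the empty string
    for _ in range(max_depth):
        # Wrap the previous level: zero or more "(" <previous level> ")" groups.
        pattern = rf"(?:\({pattern}\))*"
    return pattern


def is_balanced(text: str) -> bool:
    """Counter-based check; the counter plays the role of a parser stack."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" with no matching "("
                return False
        else:
            return False  # only parentheses are allowed in this toy language
    return depth == 0


depth_two = re.compile(bounded_depth_pattern(2))

print(bool(depth_two.fullmatch("(())")))    # within the regex's fixed depth
print(bool(depth_two.fullmatch("((()))")))  # one level deeper than the regex allows
print(is_balanced("((()))"))                # the counter handles any depth
```

Raising `max_depth` only moves the cutoff; for any fixed pattern there is a valid input nested one level deeper that it rejects.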
**Choice 2.** Balanced parentheses are the canonical example of a language that is context-free but not regular. A DFA has finitely many states, so it cannot count arbitrary nesting depth; only a stack-based parser can. A bounded-depth regex accepts only nesting up to its fixed depth, which is the wrong language.
q-30-02 — Cost of re-deriving the JSON mask each token
A naive implementation re-derives the JSON-Schema allowed-token set at every decoding step. Roughly how much does this slow decoding compared to a precompiled JSON-Schema automaton?
- ≈ 1% slower
- ≈ 10% slower
- ≈ 200% slower (~3× total decode latency)
- ≈ 10× slower
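The two implementations being compared can be sketched on a toy problem. This is an illustration under stated assumptions, not a JSON-Schema compiler: `TARGETS`, `VOCAB`, and every function name below are hypothetical, the "automaton" is a prefix tree over two literal strings, and no timing is measured. The naive path walks the whole vocabulary at each step, while the precompiled path does that work once per state and then only looks up a table.

```python
from typing import Dict, FrozenSet, Optional

# Toy language: exactly these two strings. States are their prefixes.
TARGETS = ['{"ok":true}', '{"ok":false}']
# Toy vocabulary of multi-character tokens (hypothetical).
VOCAB = ['{"', 'ok', '":', 'true', 'false', '}', 'tr', 'ue', 'no', '"']


def step(state: str, ch: str) -> Optional[str]:
    """Advance one character, or return None if no target has this prefix."""
    nxt = state + ch
    return nxt if any(t.startswith(nxt) for t in TARGETS) else None


def walk(state: str, token: str) -> Optional[str]:
    """Advance over every character of a token; None if any step is illegal."""
    current: Optional[str] = state
    for ch in token:
        current = step(current, ch)
        if current is None:
            return None
    return current


def naive_mask(state: str) -> FrozenSet[int]:
    """Re-derive the allowed token ids by scanning the whole vocabulary."""
    return frozenset(i for i, tok in enumerate(VOCAB) if walk(state, tok) is not None)


def precompile() -> Dict[str, FrozenSet[int]]:
    """Compute the mask for every reachable state once, ahead of decoding."""
    states = {t[:k] for t in TARGETS for k in range(len(t) + 1)}
    return {s: naive_mask(s) for s in states}


MASK_TABLE = precompile()


def precompiled_mask(state: str) -> FrozenSet[int]:
    """Per-step cost is a single dictionary lookup."""
    return MASK_TABLE[state]
```

Both functions return the same mask for every state; the difference is only where the vocabulary scan happens, which is the cost the question asks about.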
Answer
**Choice 3 (~200%).** Per-token mask cost is ~5 ms naive vs ~0.05 ms precompiled. At Mini-GPT's ~250 tok/s pace, that's roughly +200% overhead, bigger than the decoder itself. Precompile.
q-30-03 — Mask family per use case
Match each output specification to the minimal correct mask family: (a) 5-digit zip code; (b) recursive S-expressions; (c) JSON object with typed fields; (d) one of 600 enumerated verb forms (flat list).
- (a) → regex
- (b) → CFG
- (c) → JSON-Schema
- (d) → regex (large alternation), or JSON-Schema enum if the verb sits inside JSON
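For the two regex cases, (a) and (d), the patterns might look like the following sketch. `VERB_FORMS` is a hypothetical three-entry stand-in for the 600-entry list, and the pattern names are made up for this example. The point is only that a flat alternation, however long, is still an ordinary regular expression.

```python
import re

# (a) Exactly five ASCII digits.
ZIP_CODE = re.compile(r"[0-9]{5}")

# (d) A flat list becomes one alternation; re.escape guards any special characters.
VERB_FORMS = ["hablo", "hablas", "habla"]  # hypothetical stand-in for the 600 forms
VERB_FORM = re.compile("|".join(re.escape(form) for form in VERB_FORMS))

print(bool(ZIP_CODE.fullmatch("90210")))
print(bool(VERB_FORM.fullmatch("hablas")))
```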
Answer
**All four pairings.** Each is the least expressive mask family that is still correct. Note (d): even 600 alternations are still regular, so a regex DFA is minimal, unless the verb is wrapped in JSON.
q-30-04 — Why §A13 scope benefits from enum constraints (free)
The §A13 grammar tutor returns JSON with a verb field. Why is using a JSON-Schema enum of the 20 allowed verbs strictly better than letting the model emit a free-form string for the verb?
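For reference, such a schema might look like the sketch below. The verb list is a hypothetical three-entry stand-in for the 20 allowed verbs, and `verb_in_scope` checks only the `enum` constraint by hand; it is not a full JSON-Schema validator, and the tutor's real schema is not shown here.

```python
import json

ALLOWED_VERBS = ["hablar", "comer", "vivir"]  # hypothetical; the real list has 20 entries

# A JSON-Schema fragment restricting `verb` to the allowed list.
SCHEMA = {
    "type": "object",
    "properties": {"verb": {"type": "string", "enum": ALLOWED_VERBS}},
    "required": ["verb"],
    "additionalProperties": False,
}


def verb_in_scope(raw: str) -> bool:
    """Check only the enum constraint on `verb` (illustrative, not a validator)."""
    payload = json.loads(raw)
    allowed = SCHEMA["properties"]["verb"]["enum"]
    return isinstance(payload, dict) and payload.get("verb") in allowed


print(verb_in_scope('{"verb": "comer"}'))
print(verb_in_scope('{"verb": "drink"}'))
```

A post-hoc check like this can only reject a bad output after the fact; used as a decoding mask, the same `enum` prevents the out-of-list tokens from being sampled at all.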
Answer
The `enum` mechanically enforces the §A13 microscopic-**scope** invariant: the model cannot hallucinate a verb outside the 20 (e.g., "drink"). Without the enum, the scope guarantee depends on the model's training distribution, which makes it empirical rather than structural.
q-30-05 — Failure mode without the mask
You disable the JSON-Schema mask on the conjugate CLI and run 50 samples. Which of the following are likely failure categories you'd observe?
- Valid JSON with the wrong key names (e.g., `español` instead of `spanish`)
- Trailing-comma JSON (rejected by `json.loads`)
- Prose / markdown explanations instead of JSON
- All outputs are perfect because Mini-GPT was trained on JSON
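One way to bucket the samples from such a run is sketched below. `EXPECTED_KEYS` and `classify` are hypothetical: the CLI's real schema is not shown here, so a single `spanish` key is assumed for illustration, and the prose-versus-broken-JSON split is a rough heuristic based on the first non-space character.

```python
import json

EXPECTED_KEYS = {"spanish"}  # hypothetical; the real schema's keys are assumed


def classify(raw: str) -> str:
    """Assign one unmasked model output to a coarse failure category."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        # Heuristic: text that starts like JSON but fails to parse is malformed
        # JSON (e.g. a trailing comma); anything else is treated as prose.
        looks_like_json = raw.lstrip().startswith(("{", "["))
        return "invalid_json" if looks_like_json else "prose"
    if not isinstance(payload, dict):
        return "not_an_object"
    if set(payload) != EXPECTED_KEYS:
        return "wrong_keys"
    return "ok"


samples = [
    '{"spanish": "hablo"}',
    '{"español": "hablo"}',
    '{"spanish": "hablo",}',
    "Sure! The conjugation is hablo.",
]
for sample in samples:
    print(classify(sample), "<-", sample)
```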