Phase 30 — Quizzes

Readable mirror of data/quizzes/phase-30-structured-generation.yaml, which remains the source of truth.

q-30-01 — When a regex mask is incorrect

You want the model to emit a balanced-parenthesis expression of arbitrary nesting depth. Why is a regex mask the wrong tool, even with carefully crafted patterns?

1. Regex DFA compilation is too slow.
2. Balanced parentheses are not a regular language (the pumping lemma rules it out), so no DFA accepts the full language.
3. Regex masks consume too much VRAM during decoding.
4. Regex cannot encode the start/end-of-string anchors needed.

Answer

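**Choice 2.** Balanced parentheses are the canonical example of a language that is context-free but not regular. A DFA has finitely many states, so it cannot count arbitrary nesting depth; a stack-based (pushdown) parser can. A bounded-depth regex accepts only nesting up to its fixed depth, so it rejects valid deeper expressions.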
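
To make the depth limit concrete, here is a minimal sketch that builds a bounded-depth regex and compares it with a counter-based checker. The helper names `bounded_paren_regex` and `is_balanced` are illustrative and do not come from the course code.

```python
import re


def bounded_paren_regex(max_depth: int) -> re.Pattern:
    """Build a regex that accepts balanced parens nested at most max_depth deep."""
    pattern = ""  # depth 0 accepts only the empty string
    for _ in range(max_depth):
        # Each level allows zero or more groups wrapping the previous level.
        pattern = rf"(?:\({pattern}\))*"
    return re.compile(pattern)


def is_balanced(text: str) -> bool:
    """Counter-based check (a one-symbol stack) that works for any depth."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a close with no matching open
                return False
        else:
            return False  # only parentheses are in the alphabet
    return depth == 0


depth3 = bounded_paren_regex(3)
shallow = "(()(()))"  # maximum nesting depth 3
deep = "(((())))"     # nesting depth 4

print(bool(depth3.fullmatch(shallow)), is_balanced(shallow))
print(bool(depth3.fullmatch(deep)), is_balanced(deep))
```

Raising `max_depth` only moves the failure point: some valid input one level deeper is always rejected, while the counter needs no fixed bound.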

q-30-02 — Cost of re-deriving the JSON mask each token

A naive implementation re-derives the JSON-Schema allowed-token set at every decoding step. Roughly how much does this slow decoding compared to a precompiled JSON-Schema automaton?

1. ≈ 1% slower
2. ≈ 10% slower
3. ≈ 200% slower (~3× total decode latency)
4. ≈ 10× slower

Answer

**Choice 3 (~200%).** Per-token mask cost ~5 ms naive vs ~0.05 ms precompiled. At Mini-GPT's ~250 tok/s pace, that's roughly +200% overhead — bigger than the decoder itself. Precompile.
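
The difference between the two strategies can be sketched on a toy automaton. This is not the Mini-GPT implementation: the DFA (two digits), the vocabulary, and every function name below are assumptions chosen for illustration. The naive path rescans the whole vocabulary at every step; the precompiled path does that scan once per automaton state and then only performs a dictionary lookup while decoding.

```python
from typing import Dict, FrozenSet, List, Optional

# Toy DFA for the regex [0-9]{2}: states 0 -> 1 -> 2, where 2 is accepting.
ACCEPTING_STATE = 2


def step(state: int, ch: str) -> Optional[int]:
    """Advance the toy DFA by one character; None means the DFA rejects."""
    if ch in "0123456789" and state < ACCEPTING_STATE:
        return state + 1
    return None


def walk(state: int, token: str) -> Optional[int]:
    """Feed a whole (multi-character) token through the DFA."""
    for ch in token:
        nxt = step(state, ch)
        if nxt is None:
            return None
        state = nxt
    return state


def allowed_tokens(state: int, vocab: List[str]) -> FrozenSet[int]:
    """Naive derivation: scan the full vocabulary for one state."""
    return frozenset(i for i, tok in enumerate(vocab) if walk(state, tok) is not None)


def precompile(vocab: List[str]) -> Dict[int, FrozenSet[int]]:
    """Run the vocabulary scan once per state, ahead of decoding."""
    return {s: allowed_tokens(s, vocab) for s in range(ACCEPTING_STATE + 1)}


VOCAB = ["1", "7", "42", "a", "{", "123"]  # hypothetical token strings
MASKS = precompile(VOCAB)

# During decoding, the mask for the current state is a constant-time lookup.
print(sorted(MASKS[0]), sorted(MASKS[1]), sorted(MASKS[2]))
```

With a real tokenizer the scan covers tens of thousands of tokens, which is why repeating it at every decoding step dominates latency.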

q-30-03 — Mask family per use case

Match each output specification to the minimal correct mask family: (a) 5-digit zip code; (b) recursive S-expressions; (c) JSON object with typed fields; (d) one of 600 enumerated verb forms (flat list).

(a) → regex
(b) → CFG
(c) → JSON-Schema
(d) → regex (large alternation), or JSON-Schema enum if the verb sits inside JSON

Answer

**All four pairings.** Each is the smallest-expressivity correct choice. Note (d): even 600 alternations are still regular, so a regex DFA is minimal — unless the verb is wrapped in JSON.
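
Cases (a) and (d) can be sketched with Python's `re` module. The verb forms below are a hypothetical four-item stand-in for the 600-item list, and `enum_regex` is an illustrative helper name.

```python
import re
from typing import Iterable

# (a) A 5-digit zip code is a fixed-length regular language.
ZIP_RE = re.compile(r"[0-9]{5}")


def enum_regex(words: Iterable[str]) -> re.Pattern:
    """(d) Build one flat alternation from an enumerated word list."""
    # Escape each word so punctuation is matched literally; longest first
    # keeps behavior predictable if the pattern is later used unanchored.
    alternatives = sorted(set(words), key=len, reverse=True)
    return re.compile("|".join(re.escape(w) for w in alternatives))


VERB_FORMS = ["hablo", "hablas", "habla", "hablamos"]  # hypothetical sample
VERB_RE = enum_regex(VERB_FORMS)

print(bool(ZIP_RE.fullmatch("02139")), bool(ZIP_RE.fullmatch("2139")))
print(bool(VERB_RE.fullmatch("hablamos")), bool(VERB_RE.fullmatch("hablan")))
```

A list of 600 alternatives only makes the pattern longer; the language stays finite and therefore regular, which is why no CFG is needed for (d).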

q-30-04 — Why §A13 scope benefits from enum constraints (free)

The §A13 grammar tutor returns JSON with a verb field. Why is using a JSON-Schema enum of the 20 allowed verbs strictly better than letting the model emit a free-form string for the verb?

Answer

The `enum` mechanically enforces the §A13 microscopic-**scope** invariant — the model cannot hallucinate a verb outside the 20 (e.g., "drink"). Without the enum, the scope guarantee depends on the model's training distribution, which is empirical, not structural.
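
As a rough illustration, the schema fragment might look like the sketch below. The three verbs are hypothetical placeholders for the real list of 20, and `verb_in_scope` is a hand-rolled check of the enum constraint only, not a full JSON-Schema validator. It also validates after the fact, whereas the mask prevents an out-of-scope verb from being generated at all.

```python
import json

# Hypothetical stand-in for the 20 allowed verbs.
ALLOWED_VERBS = ["eat", "speak", "live"]

VERB_SCHEMA = {
    "type": "object",
    "properties": {"verb": {"type": "string", "enum": ALLOWED_VERBS}},
    "required": ["verb"],
    "additionalProperties": False,
}


def verb_in_scope(raw: str) -> bool:
    """Return True only if raw is a JSON object whose verb is in the enum."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return data.get("verb") in VERB_SCHEMA["properties"]["verb"]["enum"]


print(verb_in_scope('{"verb": "eat"}'))    # inside the enum
print(verb_in_scope('{"verb": "drink"}'))  # outside the enum
```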

q-30-05 — Failure mode without the mask

You disable the JSON-Schema mask on the conjugate CLI and run 50 samples. Which of the following are likely failure categories you'd observe?

1. Valid JSON with the wrong key names (e.g., español instead of spanish)
2. Trailing-comma JSON (invalid by json.loads)
3. Prose / markdown explanations instead of JSON
4. All outputs are perfect because Mini-GPT was trained on JSON

Answer

**Choices 1, 2, 3.** Mini-GPT was trained on §A13 grammar text, not JSON-heavy data; free-form output produces a mix of off-schema JSON, broken JSON, and prose. The mask makes all three impossible.
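
One way to tally these categories over the 50 unmasked samples is a small classifier like the sketch below. The expected key set is an assumption (only `spanish` appears in the quiz text), and the category labels are illustrative.

```python
import json

# Assumed key set for the conjugate CLI output; only "spanish" is from the quiz.
EXPECTED_KEYS = {"verb", "spanish"}


def classify(output: str) -> str:
    """Bucket one raw model output into a failure category."""
    text = output.strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Looks like an attempt at JSON but does not parse (e.g. trailing comma).
        if text.startswith(("{", "[")):
            return "broken_json"
        return "prose"
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        return "off_schema"  # parses, but keys or shape are wrong
    return "ok"


samples = [
    '{"verb": "eat", "spanish": "como"}',
    '{"verb": "eat", "español": "como"}',
    '{"verb": "eat", "spanish": "como",}',
    "Sure! The conjugation is como.",
]
print([classify(s) for s in samples])
```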