Phase 30 — Structured Generation & Constrained Decoding
Requires: 21 — Inference Internals & Sampling · 29 — Retrieval-Augmented Generation (RAG)
Teaches: structured-generation · json-mode · logit-masking · grammar-constrained-decoding
Jump to any chapter from the phase reference index.
Chapter map
Pre-written per A12. This phase entry exists before Borja begins study. Theory and lab problem statements are stable drafts; solutions are written just-in-time at phase open.
The words "JSON mode" are marketing; the underlying mechanism is logit masking. At each step, we set the tokens that would break the schema to -inf and let the softmax distribute the probability mass among the legal ones. That is all.
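The masking step in the note above (set illegal logits to -inf, then apply softmax) can be sketched in a few lines. This is a minimal illustration, not the `src/ministruct/` implementation: the function name `masked_softmax`, the five-token vocabulary, and the logit values are all hypothetical.

```python
import math

NEG_INF = float("-inf")


def masked_softmax(logits, legal_ids):
    """Softmax over `logits` after forcing every illegal token's logit to -inf."""
    legal = set(legal_ids)
    masked = [x if i in legal else NEG_INF for i, x in enumerate(logits)]
    peak = max(masked)
    if peak == NEG_INF:
        # Nothing survived the mask: the constraint and the vocabulary disagree.
        raise ValueError("mask leaves no legal token")
    # Subtract the peak for numerical stability; exp(-inf) evaluates to 0.0.
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy vocabulary of 5 tokens; only ids 1 and 3 keep the schema valid.
logits = [2.0, 1.0, 0.5, 1.0, -1.0]
probs = masked_softmax(logits, legal_ids=[1, 3])
print(probs)  # -> [0.0, 0.5, 0.0, 0.5, 0.0]
```

Token 0 has the highest raw logit but receives zero probability, and the two legal tokens split the mass according to their relative logits. Any sampler applied afterwards can only ever pick a legal token.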
Goal
Make MiniGPT emit valid, schema-conformant JSON every single time, by constraining the per-step token distribution to a precomputed set of legal next tokens. The phase ends with a CLI that takes an English sentence and emits the canonical conjugation triple {verb, tense, person} — the contract that Phases 31 (tools), 32 (agent), and 33 (serving) will consume.
This phase introduces the new module src/ministruct/ — the structured-generation primitive used by every later phase.
Read order
- theory/00-motivation.md: why free-form output is fundamentally unreliable for downstream tooling, and what masking buys us.
- theory/01-jsonmode-vs-grammar.md: the spectrum from "ask nicely" to "JSON mode" to "GBNF grammar"; what each gives up.
- theory/02-logit-masks.md: derives constrained sampling as unconstrained sampling with a -inf mask. Math, KL, composition with sampling knobs.
- theory/03-grammar-as-dfa.md: production implementations precompile the grammar into an automaton; we derive why and describe the structure for the conjugation schema.
- lab/00-regex-mask.md: warm-up that masks a single regex (digits-only, fixed-length).
- lab/01-json-schema-mask.md: escalate to the conjugation JSON schema (verb ∈ enum-of-20, tense ∈ enum-of-5, person ∈ enum-of-3).
- lab/02-end-to-end-conjugate.md: wire mask + MiniGPT + sampler into scripts/conjugate_structured.py. (The file lab/02-end-to-end-audit.md is the superseded stub from §A1; see its content for the pointer.)
- lab/03-mask-overhead.md: measure the cost of masking per step.
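To make the automaton idea from theory/03 and the warm-up from lab/00 concrete, here is a rough sketch of a fixed-length, digits-only constraint compiled into a state-to-legal-tokens table. Everything in it is an assumption for illustration: the toy vocabulary, the fixed logits standing in for a model, and the function names. Taking the argmax over legal ids only is equivalent to masking the rest to -inf and then taking the argmax.

```python
# Hypothetical toy vocabulary; note that "42" is a multi-character token.
VOCAB = ["0", "1", "7", "42", "a", "{", "<eos>"]
EOS = VOCAB.index("<eos>")
LENGTH = 3  # accept exactly three digits, i.e. the regex [0-9]{3}


def build_mask_table(vocab, length):
    """Precompute, for each DFA state, the token ids that keep the output legal.

    The state is simply the number of digits emitted so far.
    """
    table = {}
    for state in range(length + 1):
        if state == length:
            table[state] = [vocab.index("<eos>")]  # only stopping is legal
        else:
            table[state] = [
                i for i, tok in enumerate(vocab)
                if tok.isascii() and tok.isdigit() and state + len(tok) <= length
            ]
    return table


def constrained_greedy(logits_fn, vocab, length):
    """Greedy decoding restricted to the legal tokens of the current state."""
    table = build_mask_table(vocab, length)
    state, out = 0, []
    while True:
        logits = logits_fn(out)
        best = max(table[state], key=lambda i: logits[i])
        if best == vocab.index("<eos>"):
            return "".join(out)
        out.append(vocab[best])
        state += len(vocab[best])


# Stand-in for a model: fixed logits that prefer the illegal tokens "{" and "a".
def fake_logits(_prefix):
    return [0.1, 0.2, 0.5, 0.9, 2.0, 3.0, 1.0]


result = constrained_greedy(fake_logits, VOCAB, LENGTH)
print(result)  # -> 427
```

The table is built once, so the per-step work is a dictionary lookup plus an argmax over a short list. That is the reason for precompiling the constraint instead of re-checking the regex against every candidate token at every step.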
solutions/ is empty during pre-write — populated at phase open after Borja's Phase 21 sampler API is visible.
Definition of Done
See PHASE_30_PLAN.md §6. Briefly:
- 100% parse rate on the §A13 conjugation eval probe set.
- scripts/conjugate_structured.py runs end-to-end.
- Per-step overhead documented (target: < 2× unconstrained).
- src/ministruct/{mask,dfa,schemas}.py implemented, typed, tested.
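One way the parse-rate criterion could be checked is sketched below. The enum values are placeholders (the real schema has 20 verbs, 5 tenses, and 3 persons, which are not listed here), and `conforms` and `parse_rate` are hypothetical helper names, not part of `src/ministruct/`.

```python
import json

# Placeholder enums for illustration only; the real schema's values are not shown here.
SCHEMA = {
    "verb": {"go", "be", "have"},
    "tense": {"past", "present", "future"},
    "person": {"first", "second", "third"},
}


def conforms(text, schema=SCHEMA):
    """True if `text` is a JSON object with exactly the schema's keys and enum values."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict) or set(obj) != set(schema):
        return False
    return all(
        isinstance(obj[key], str) and obj[key] in allowed
        for key, allowed in schema.items()
    )


def parse_rate(outputs, schema=SCHEMA):
    """Fraction of outputs that parse and conform to the schema."""
    if not outputs:
        raise ValueError("no outputs to score")
    return sum(conforms(o, schema) for o in outputs) / len(outputs)


outputs = [
    '{"verb": "go", "tense": "past", "person": "third"}',        # conforms
    '{"verb": "go", "tense": "past"}',                           # missing key
    'Sure! Here is the JSON: {"verb": "go"}',                    # not JSON
    '{"verb": "go", "tense": "pluperfect", "person": "third"}',  # value outside enum
]
print(parse_rate(outputs))  # -> 0.25
```

The three failing samples are the kinds of output that unconstrained generation can produce. With the mask in place, the same check run over the probe set is expected to return 1.0.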
What this phase intentionally does NOT cover
- Constrained beam search. Beam × mask interaction is mentioned, not implemented.
- GBNF grammar parser. We describe the format; we do not write a parser. (llama.cpp has one; reading its source is a stretch goal, not a DoD requirement.)
- Tokenizer-aware grammar precompilation across arbitrary tokenizers. Production Outlines does this; we explain the algorithm in theory/03-grammar-as-dfa.md without coding the general case. Our DFA targets only the §A13 BPE vocabulary.
- Partial-token (mid-BPE) masking. Per-token mask only.
- Streaming masks under SSE / WebSocket. That's a Phase 33 (serving) concern. The Phase 30 mask is per-completion.
- Constraints on tool-call argument schemas. That's Phase 31 territory; the mechanism is identical, so we point at it and move on.
- Correctness of the conjugation itself. Phase 30 guarantees the shape of the output. Whether "verb": "goed" is the right answer for "He went home" is a Phase 32 (agent) and Phase 20 (eval) concern, not Phase 30's.
Phase 30's scope is: logit masking, schema-constrained JSON output, end-to-end conjugation CLI, the src/ministruct/ module. Nothing more.
Further reading
Optional — enrichment, not required to pass the phase.
- 📄 Efficient Guided Generation for LLMs (Outlines) — Willard & Louf · 2023. Frames constrained decoding as a finite-state machine.