
Phase 30 — Structured Generation & Constrained Decoding

Requires: 21 — Inference Internals & Sampling · 29 — Retrieval-Augmented Generation (RAG)

Teaches: structured-generation · json-mode · logit-masking · grammar-constrained-decoding

Jump to any chapter from the phase reference index.

Chapter map

Pre-written per A12. This phase entry exists before Borja begins study. Theory and lab problem statements are stable drafts; solutions are written just-in-time at phase open.

The words "JSON mode" are marketing; the underlying mechanism is logit masking. At each step, we set the logits of the tokens that would break the schema to -inf and let the softmax redistribute the probability mass among the legal ones. That is all.
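Written out, the claim is a short identity. The block below is a sketch under one assumption: at least one token is legal at every step. The symbols $z$, $m$, $A$ and $Z$ are notation introduced here for illustration, not taken from the theory chapters.

```latex
% z \in \mathbb{R}^{V}: logits at the current step; A \subseteq \{1,\dots,V\}: legal token ids, A \neq \emptyset
\[
m_i =
\begin{cases}
0 & i \in A \\
-\infty & i \notin A
\end{cases}
\qquad
p = \operatorname{softmax}(z),
\qquad
\tilde{p} = \operatorname{softmax}(z + m)
\]
\[
\tilde{p}_i
= \frac{e^{z_i + m_i}}{\sum_{j} e^{z_j + m_j}}
= \frac{p_i \,\mathbf{1}[i \in A]}{Z},
\qquad
Z = \sum_{j \in A} p_j
\]
\[
\mathrm{KL}(\tilde{p} \,\|\, p)
= \sum_{i \in A} \tilde{p}_i \log \frac{\tilde{p}_i}{p_i}
= \sum_{i \in A} \tilde{p}_i \log \frac{1}{Z}
= -\log Z
\]
```

So masking is the same as conditioning the unconstrained distribution on "the next token is legal", and the divergence from the unconstrained model at that step depends only on how much mass $Z$ the model already placed on legal tokens.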

Goal

Make MiniGPT emit valid, schema-conformant JSON every single time, by constraining the per-step token distribution to a precomputed set of legal next tokens. The phase ends with a CLI that takes an English sentence and emits the canonical conjugation triple {verb, tense, person} — the contract that Phases 31 (tools), 32 (agent), and 33 (serving) will consume.
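As a minimal sketch of what "constraining the per-step token distribution" can look like for a single step, the snippet below masks a toy logit vector and renormalizes. The function name `masked_softmax`, the logits, and the legal set are all hypothetical; this is not the `src/ministruct/` API, which is written in the labs.

```python
import math


def masked_softmax(logits, legal_ids):
    """Softmax over `logits` with every token outside `legal_ids` set to -inf."""
    if not any(0 <= i < len(logits) for i in legal_ids):
        raise ValueError("at least one in-range token id must be legal")
    masked = [z if i in legal_ids else -math.inf for i, z in enumerate(logits)]
    peak = max(masked)  # finite, because at least one token is legal
    exps = [math.exp(z - peak) for z in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Toy step: four-token vocabulary, only ids 1 and 3 keep the output legal.
logits = [2.0, 1.0, 0.5, -1.0]
legal_ids = {1, 3}
probs = masked_softmax(logits, legal_ids)
```

Illegal tokens end up with probability exactly zero, and the legal tokens keep the same ratio to each other that they had before masking, which is why temperature and similar sampling knobs can still be applied to the masked logits.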

This phase introduces the new module src/ministruct/ — the structured-generation primitive used by every later phase.

Read order

theory/00-motivation.md — why free-form output is fundamentally unreliable for downstream tooling, and what masking buys us.
theory/01-jsonmode-vs-grammar.md — the spectrum from "ask nicely" to "JSON mode" to "GBNF grammar"; what each gives up.
theory/02-logit-masks.md — the derivation: constrained sampling = unconstrained sampling with a -inf mask. Math, KL, composition with sampling knobs.
theory/03-grammar-as-dfa.md — production implementations precompile the grammar into an automaton; we derive why and describe the structure for the conjugation schema.
lab/00-regex-mask.md — warm-up: mask a single regex (digits-only, fixed-length).
lab/01-json-schema-mask.md — escalate to the conjugation JSON schema (verb ∈ enum-of-20, tense ∈ enum-of-5, person ∈ enum-of-3).
lab/02-end-to-end-conjugate.md — wire mask + MiniGPT + sampler into scripts/conjugate_structured.py. (The file lab/02-end-to-end-audit.md is the superseded stub from §A1; see its content for the pointer.)
lab/03-mask-overhead.md — measure the cost of masking per step.
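To make the automaton idea from the reading list concrete, here is a small sketch in the spirit of the digits-only, fixed-length warm-up. It assumes a toy vocabulary and a DFA whose state is simply "digits emitted so far"; `build_digit_mask_table`, `advance`, and the vocabulary are hypothetical names for illustration, not the lab's reference solution.

```python
def build_digit_mask_table(vocab, length, eos="<eos>"):
    """Precompute, for each DFA state, the set of legal token ids.

    State k means "k digits emitted so far". A digit token is legal if it
    does not overshoot `length`; `eos` is legal only once `length` is reached.
    """
    table = {}
    for state in range(length + 1):
        legal = set()
        for token_id, token in enumerate(vocab):
            if token == eos:
                if state == length:
                    legal.add(token_id)
            elif token.isascii() and token.isdigit() and state + len(token) <= length:
                legal.add(token_id)
        table[state] = legal
    return table


def advance(state, token):
    """Next DFA state after emitting a digit token."""
    return state + len(token)


# Toy vocabulary (hypothetical); multi-character tokens mimic BPE merges.
vocab = ["0", "1", "7", "42", "007", "a", "4b", "<eos>"]
table = build_digit_mask_table(vocab, length=3)

# Walk one legal path: "42" then "7", after which only <eos> remains legal.
state = 0
for token in ["42", "7"]:
    assert vocab.index(token) in table[state]
    state = advance(state, token)
only_eos_left = table[state] == {vocab.index("<eos>")}
```

The table is built once per schema, so the per-step work at decode time reduces to a dictionary lookup plus the mask application. Whole tokens are either legal or not, which matches the per-token (not mid-BPE) masking this phase uses.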

solutions/ is empty during pre-write — populated at phase open after Borja's Phase 21 sampler API is visible.

Definition of Done

See PHASE_30_PLAN.md §6. Briefly:

100% parse rate on the §A13 conjugation eval probe set.
scripts/conjugate_structured.py runs end-to-end.
Per-step overhead documented (target: masked step time < 2× the unconstrained step time).
src/ministruct/{mask,dfa,schemas}.py implemented, typed, tested.
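One way the per-step overhead figure could be measured is sketched below. It is a rough harness under strong assumptions: it times only a pure-Python softmax with and without a mask on random logits, with no model forward pass, so the ratio it reports is not comparable to the DoD target. The vocabulary size, legal set, and helper names are hypothetical.

```python
import math
import random
import time


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_softmax(logits, legal_ids):
    # Apply the -inf mask, then reuse the unconstrained softmax.
    masked = [z if i in legal_ids else -math.inf for i, z in enumerate(logits)]
    return softmax(masked)


def mean_step_seconds(step, repeats):
    """Average wall-clock time of calling `step()` `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        step()
    return (time.perf_counter() - start) / repeats


rng = random.Random(0)  # fixed seed so the inputs are reproducible
logits = [rng.gauss(0.0, 1.0) for _ in range(5000)]
legal_ids = set(range(0, 5000, 50))  # 100 legal tokens out of 5000

baseline = mean_step_seconds(lambda: softmax(logits), repeats=200)
constrained = mean_step_seconds(lambda: masked_softmax(logits, legal_ids), repeats=200)
overhead = constrained / baseline
print(f"unconstrained {baseline:.2e}s  masked {constrained:.2e}s  ratio {overhead:.2f}x")
```

In the actual lab the baseline step would include the MiniGPT forward pass, which usually dominates, so the same harness structure applies but the ratio will differ.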

What this phase intentionally does NOT cover

Constrained beam search. Beam × mask interaction is mentioned, not implemented.
GBNF grammar parser. We describe the format; we do not write a parser. (llama.cpp has one; reading its source is a stretch goal, not a DoD requirement.)
Tokenizer-aware grammar precompilation across arbitrary tokenizers. The Outlines library does this in production; we explain the algorithm in theory/03-grammar-as-dfa.md without coding the general case. Our DFA targets only the §A13 BPE vocabulary.
Partial-token (mid-BPE) masking. Per-token mask only.
Streaming masks under SSE / WebSocket. That's a Phase 33 (serving) concern. The Phase 30 mask is per-completion.
Constraints on tool-call argument schemas. That's Phase 31 territory; the mechanism is identical, so we point at it and move on.
Correctness of the conjugation itself. Phase 30 guarantees the shape of the output. Whether "verb": "goed" is the right answer for "He went home" is a Phase 32 (agent) and Phase 20 (eval) concern, not Phase 30's.
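The split between shape and correctness in the last item can be illustrated with a small checker. This is a sketch only: the enum values below are hypothetical placeholders (the real schema has 20 verbs, 5 tenses, and 3 persons defined in the lab), and `has_valid_shape` is not part of `src/ministruct/`.

```python
import json

# Hypothetical placeholder enums; the real lists come from the lab's schema.
ALLOWED = {
    "verb": {"go", "eat", "be"},
    "tense": {"past", "present", "future"},
    "person": {"first", "second", "third"},
}


def has_valid_shape(text):
    """True if `text` parses as JSON with exactly the three enum-valued keys."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict) or set(obj) != set(ALLOWED):
        return False
    return all(
        isinstance(obj[key], str) and obj[key] in values
        for key, values in ALLOWED.items()
    )


# For the sentence "He went home", this output is well-shaped but wrong:
wrong_but_valid = '{"verb": "eat", "tense": "future", "person": "first"}'
# This one is simply malformed (truncated JSON):
malformed = '{"verb": "go"'

shape_ok = has_valid_shape(wrong_but_valid)
shape_bad = has_valid_shape(malformed)
```

A checker of this kind is all the masking guarantee amounts to: it accepts the wrong answer above because nothing in the shape ties the triple back to the input sentence.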

Phase 30's scope is: logit masking, schema-constrained JSON output, end-to-end conjugation CLI, the src/ministruct/ module. Nothing more.