Lab 02: End-to-End Structured Conjugation CLI
Goal: wire mask + model + sampler into scripts/conjugate_structured.py; produce the contract artifact for Phase 31's tools and Phase 32's grammar-tutor agent.
Estimated time: 2–3 hours (assumes lab 01 finished).
Prereq: lab 01 (JSON-schema mask). MiniGPT checkpoint from Phase 18.
What you produce
- scripts/conjugate_structured.py: the CLI. Running python scripts/conjugate_structured.py "He ate pizza yesterday" prints conjugation JSON on stdout.
- experiments/30-end-to-end/ with:
  - sample_sentences/: five English sentences (one per tense bucket: infinitive context, present_simple, past_simple, past_participle, simple_future).
  - outputs/: captured CLI outputs for each.
  - README.md: for each sentence, the input, the output, and your one-paragraph commentary on whether the model got it right semantically.
TODOs
Block A: CLI shape
- scripts/conjugate_structured.py takes a single positional sentence argument OR stdin if no arg.
- Loads MiniGPT checkpoint from models/minigpt-phase-18.npz (or wherever Phase 18 saves it).
- Constructs JSONSchemaMask(tokenizer, conjugation_schema) once.
- Builds a prompt.
- Calls generate(prompt, mask=mask, max_new_tokens=64).
- Prints the output. Pipe-friendly: no extra prose, just the JSON.
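The input handling in Block A could be sketched as follows, assuming `argparse` for flag parsing. The names `build_parser` and `read_sentence` are hypothetical, and the `--seed` flag is the one listed under Constraints. Checkpoint loading, `JSONSchemaMask` construction, prompt building, and the `generate` call are project code and are deliberately left out.

```python
import argparse
import io
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Structured conjugation CLI (sketch).")
    # Optional positional: when it is omitted, the sentence is read from stdin.
    parser.add_argument("sentence", nargs="?", default=None)
    parser.add_argument("--seed", type=int, default=0)
    return parser


def read_sentence(argv, stdin=sys.stdin):
    """Return (sentence, seed); exit non-zero with a message if no input was given."""
    args = build_parser().parse_args(argv)
    sentence = args.sentence if args.sentence is not None else stdin.read()
    sentence = sentence.strip()
    if not sentence:
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit("error: no sentence given (pass an argument or pipe one on stdin)")
    return sentence, args.seed


# Usage with an in-memory stand-in for stdin:
sentence, seed = read_sentence(["--seed", "7"], stdin=io.StringIO("She wrote a long letter.\n"))
```

Keeping the error message on stderr leaves stdout free for the JSON, which is what the pipe-friendly requirement asks for.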
Block B: terminate-on-done
- When mask.is_done() is True, stop generation immediately (don't wait for max_new_tokens). This shortens average decode time meaningfully.
- Test: an output of {"verb":"eat","tense":"past_simple","person":"3sg"} should NOT have trailing tokens.
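The early-stop rule can be illustrated with a toy decode loop. Everything here except `is_done()` and `max_new_tokens` is an assumption made for the sketch: `ToyMask` stands in for `JSONSchemaMask` and permits one fixed string character by character, and `advance()` and `allowed()` are hypothetical method names, not the lab's actual mask API.

```python
class ToyMask:
    """Stand-in for JSONSchemaMask: permits exactly one fixed string, one character per step."""

    def __init__(self, target):
        self.target = target
        self.pos = 0

    def allowed(self):
        return self.target[self.pos]

    def advance(self, token):
        assert token == self.allowed(), "token rejected by mask"
        self.pos += 1

    def is_done(self):
        return self.pos == len(self.target)


def decode_until_done(sample_next, mask, max_new_tokens=64):
    """Generate tokens, stopping as soon as the mask reports the schema is complete."""
    tokens = []
    for _ in range(max_new_tokens):
        if mask.is_done():
            break  # schema complete: stop here instead of running to max_new_tokens
        token = sample_next(mask)
        mask.advance(token)
        tokens.append(token)
    return tokens


TARGET = '{"verb":"eat","tense":"past_simple","person":"3sg"}'
# The "sampler" here simply picks the only character the toy mask allows.
output = "".join(decode_until_done(lambda m: m.allowed(), ToyMask(TARGET)))
```

Checking `is_done()` at the top of each iteration means the loop never asks the sampler for a token once the JSON object is closed, so nothing can trail the final brace.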
Block C: validate-and-report
- After generation, run json.loads and jsonschema.validate (lib OK here, post-hoc). If either fails, print the malformed output to stderr with a clear error and exit non-zero. (This should never happen in practice; the safety net catches mask bugs.)
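One possible shape for the post-hoc check, as a sketch. The function name `validate_or_report` is hypothetical, and `STAND_IN_SCHEMA` is a placeholder that only requires the three field names; the real `conjugation_schema` from lab 01 should be passed instead. The third-party `jsonschema` package is assumed to be installed.

```python
import json
import sys

# Placeholder only: substitute the conjugation_schema built in lab 01.
STAND_IN_SCHEMA = {"type": "object", "required": ["verb", "tense", "person"]}


def validate_or_report(raw, schema, err=sys.stderr):
    """Return 0 if raw parses as JSON and validates against schema; otherwise report and return 1."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        print(f"error: output is not valid JSON ({exc}): {raw!r}", file=err)
        return 1

    # Imported here so the parse check above has no third-party requirement.
    import jsonschema

    try:
        jsonschema.validate(instance=obj, schema=schema)
    except jsonschema.ValidationError as exc:
        print(f"error: output violates schema ({exc.message}): {raw!r}", file=err)
        return 1
    return 0


# A truncated object fails at the parse step and yields a non-zero status.
status = validate_or_report('{"verb": "eat"', STAND_IN_SCHEMA)
```

The return value is meant to be handed to `sys.exit`, so a mask bug surfaces as a non-zero exit code with the offending text on stderr.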
Block D: sample sentences
Pick five English sentences covering the §A13 tense space. Recommended set:
- Infinitive context. "I want to work tomorrow." Main verb want, present_simple, 1sg. (The infinitive work is also present; the lab specifies "main verb", so document the disambiguation rule in README.)
- Present simple, 3sg. "She plays the piano every Sunday." Verb play, present_simple, 3sg.
- Past simple, regular. "They listened to the radio." Here they is plural; §A13 plurals are deferred, so this is out of scope. Skip it and use: "He listened to the radio." Verb listen, past_simple, 3sg.
- Past simple, irregular. "She wrote a long letter." Verb write, past_simple, 3sg.
- Simple future. "I will study tomorrow." Verb study, simple_future, 1sg.
Run each through conjugate_structured.py. Save outputs. Comment in README on whether the content matches expectation. (Recall: Phase 30 guarantees parse, NOT correctness. If the model emits tense: "present_simple" for "She wrote a long letter", the mask did its job; the model is undertrained.)
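To make the README commentary easier to write, the captured outputs can be compared field by field against the labels given for the recommended set. This is a sketch: the helper name `diff_fields` is hypothetical, and the field names `verb`, `tense`, and `person` are taken from the example output in Block B.

```python
import json

# Expected labels for the recommended sentences, as listed in Block D.
EXPECTED = {
    "I want to work tomorrow.": {"verb": "want", "tense": "present_simple", "person": "1sg"},
    "She plays the piano every Sunday.": {"verb": "play", "tense": "present_simple", "person": "3sg"},
    "He listened to the radio.": {"verb": "listen", "tense": "past_simple", "person": "3sg"},
    "She wrote a long letter.": {"verb": "write", "tense": "past_simple", "person": "3sg"},
    "I will study tomorrow.": {"verb": "study", "tense": "simple_future", "person": "1sg"},
}


def diff_fields(sentence, cli_output):
    """Return {field: (got, expected)} for every field that differs from the expected label."""
    got = json.loads(cli_output)
    want = EXPECTED[sentence]
    return {key: (got.get(key), value) for key, value in want.items() if got.get(key) != value}


# The undertrained-model case described above: parseable, but the tense is wrong.
mismatch = diff_fields(
    "She wrote a long letter.",
    '{"verb":"write","tense":"present_simple","person":"3sg"}',
)
```

An empty dict means the content matched; a non-empty one is a semantic miss to record in the README, not a mask failure.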
Block E: pipeline smoke test
In tests/test_conjugate_cli.py:
- test_conjugate_cli_returns_valid_json: run the CLI on a fixed sentence, assert the output parses and validates.
- test_conjugate_cli_verb_in_scope: assert the verb field is one of the 20 enums.
- test_conjugate_cli_tense_in_scope: assert the tense field is one of the 5 enums.
- test_conjugate_cli_person_in_scope: assert the person field is one of the 3 enums.
- test_conjugate_cli_exits_zero_on_valid: exit code is 0 on success.
- test_conjugate_cli_exits_nonzero_on_no_input: empty stdin → non-zero exit + helpful error.
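The tests above all need to launch the CLI as a subprocess and inspect its exit code and streams. A minimal sketch of such a helper follows; `run_cli` is a hypothetical name, and `CLI_CMD` here is a stand-in command that just prints a fixed JSON object so the example runs without the real script. In the lab it would be something like `[sys.executable, "scripts/conjugate_structured.py", "--seed", "0", "He ate pizza yesterday"]`.

```python
import json
import subprocess
import sys


def run_cli(cmd, stdin_text=""):
    """Run cmd, feeding stdin_text on stdin; return (exit_code, stdout, stderr)."""
    proc = subprocess.run(cmd, input=stdin_text, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in for the real CLI: a child Python process that prints one JSON object.
_payload = json.dumps({"verb": "eat", "tense": "past_simple", "person": "3sg"})
CLI_CMD = [sys.executable, "-c", f"print({_payload!r})"]


def test_conjugate_cli_returns_valid_json():
    code, out, _err = run_cli(CLI_CMD)
    assert code == 0
    obj = json.loads(out)  # raises if stdout carries anything besides the JSON
    assert set(obj) == {"verb", "tense", "person"}
```

Because `json.loads` rejects extra non-whitespace text, this one call also catches stray logging on stdout.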
Constraints
- No retries. If generation fails (parse error), the CLI exits with a non-zero status. Phase 30's contract is that this never happens.
- No prose in output. The CLI outputs only the JSON. No header, no logging on stdout (logging goes to stderr).
- No transformers import. Use the hand-built MiniGPT.
- Deterministic seed flag. --seed N for reproducibility. Default seed = 0 for the smoke tests.
Stop conditions
Done when:
- CLI runs on all five sample sentences and produces parseable, schema-valid JSON.
- The Block E tests pass.
- README documents which sentence the model got right semantically and which it got wrong. (This is data for Phase 28 fine-tuning later, not a Phase 30 failure.)
Pitfalls
- MiniGPT is undertrained. The model from Phase 18 may produce semantically wrong but parseable output. That's fine for Phase 30's DoD. Don't get distracted trying to improve content quality here.
- Context truncation. If the sentence + prompt exceeds the model's context window, you have to truncate. Hard-clip to the last N tokens that fit; document the choice in README.
- EOS in the middle of a string. The model might want to emit EOS early. Your mask should forbid EOS until the schema's DONE state is reached. Test this.
- Trailing whitespace in CLI output. Strip before printing — downstream pipes are sensitive.
- Disambiguating "main verb". "I want to work tomorrow" contains two verbs; the schema asks for one. The prompt must instruct the model to pick the finite verb (the one carrying the tense). Document the rule.
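Two of the pitfalls above, context truncation and early EOS, can be sketched in a few lines of plain Python. Both helper names are hypothetical, token ids and logits are shown as plain lists rather than whatever array type MiniGPT uses, and `context_len` is assumed to be the number of prompt tokens you allow (which may need to leave room for `max_new_tokens`).

```python
import math


def clip_to_context(token_ids, context_len):
    """Hard-clip to the last context_len tokens."""
    if context_len <= 0:
        return []  # guard: token_ids[-0:] would return the whole list
    return token_ids[-context_len:]


def forbid_eos_until_done(logits, eos_id, done):
    """Return a copy of logits with EOS set to -inf unless the schema has reached DONE."""
    masked = list(logits)
    if not done:
        masked[eos_id] = -math.inf
    return masked


clipped = clip_to_context([11, 12, 13, 14, 15], context_len=3)

# EOS (id 1) has the highest raw logit, but it cannot win while the schema is incomplete.
masked = forbid_eos_until_done([0.1, 2.0, 0.3], eos_id=1, done=False)
best = max(range(len(masked)), key=masked.__getitem__)
```

A test for the EOS pitfall can follow the same pattern: give EOS the largest logit mid-string and assert that the selected token is something else.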
When to consult solutions/
After the CLI works end-to-end. The solution's interest is mainly in how the prompt is structured (a minor art form that doesn't change correctness but affects KL per step).
Next lab: lab/03-mask-overhead.md.