
Lab 02 — End-to-End Structured Conjugation CLI

Goal: wire mask + model + sampler into scripts/conjugate_structured.py; produce the contract artifact for Phase 31's tools and Phase 32's grammar-tutor agent.

Estimated time: 2–3 hours (assumes lab 01 finished).

Prereq: lab 01 (JSON-schema mask). MiniGPT checkpoint from Phase 18.

What you produce

scripts/conjugate_structured.py — CLI: python scripts/conjugate_structured.py "He ate pizza yesterday" → conjugation JSON on stdout.
experiments/30-end-to-end/ with:
sample_sentences/ — five English sentences (one per tense bucket: infinitive context, present_simple, past_simple, past_participle, simple_future).
outputs/ — captured CLI outputs for each.
README.md — for each sentence: the input, the output, and a one-paragraph commentary on whether the model got it right semantically.

TODOs

Block A — CLI shape

scripts/conjugate_structured.py takes a single positional sentence argument, or reads the sentence from stdin if no argument is given.
Loads MiniGPT checkpoint from models/minigpt-phase-18.npz (or wherever Phase 18 saves it).
Constructs JSONSchemaMask(tokenizer, conjugation_schema) once.

Builds a prompt:

Identify the main verb conjugation in this English sentence.
Return JSON with keys: verb (lemma), tense, person.

Sentence: <sentence>

Conjugation:

Calls generate(prompt, mask=mask, max_new_tokens=64).
Prints the output. Pipe-friendly: no extra prose, just the JSON.

Block B — terminate-on-done

When mask.is_done() is True, stop generation immediately (don't wait for max_new_tokens). This shortens average decode time meaningfully.
Test: an output of {"verb":"eat","tense":"past_simple","person":"3sg"} should NOT have trailing tokens.
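
A minimal sketch of a decode loop that stops as soon as the mask reports completion. Only `is_done()` is named in this lab; the `allowed_tokens()` and `advance()` methods, the `ToyMask` class, and the `next_token` callable (standing in for model plus sampler) are assumptions made for illustration, so adapt them to whatever interface your lab 01 mask exposes.

```python
from typing import Callable, List, Sequence, Set


class ToyMask:
    """Stand-in for JSONSchemaMask: accepts exactly one fixed token sequence."""

    def __init__(self, target: Sequence[str]):
        self.target = list(target)
        self.pos = 0

    def allowed_tokens(self) -> Set[str]:
        # Hypothetical method: the set of tokens legal at this step.
        return set() if self.is_done() else {self.target[self.pos]}

    def advance(self, token: str) -> None:
        # Hypothetical method: consume one emitted token.
        self.pos += 1

    def is_done(self) -> bool:
        return self.pos >= len(self.target)


def generate(next_token: Callable[[List[str], Set[str]], str],
             mask, max_new_tokens: int = 64) -> List[str]:
    """Emit tokens until the mask is done or the token budget runs out."""
    out: List[str] = []
    for _ in range(max_new_tokens):
        if mask.is_done():
            break  # schema complete: stop before the budget is spent
        token = next_token(out, mask.allowed_tokens())
        mask.advance(token)
        out.append(token)
    return out
```

Because the done check happens before each model call, a completed object never receives trailing tokens, and the model is not invoked again once the schema is satisfied.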

Block C — validate-and-report

After generation, run json.loads and jsonschema.validate (using the jsonschema library is fine here because this check is post hoc, not part of decoding). If either fails, print the malformed output to stderr with a clear error and exit non-zero. (This should never happen in practice; it is a safety net that catches mask bugs.)
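
One way to sketch this safety net is shown below. The function name `check_output` and the exact error wording are illustrative; `schema` is assumed to be the same conjugation schema that the mask was built from.

```python
import json
import sys


def check_output(raw: str, schema: dict) -> int:
    """Return a process exit code: 0 if raw parses and validates, else 1."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        print(f"error: output is not valid JSON ({exc}): {raw!r}",
              file=sys.stderr)
        return 1

    # Third-party dependency, imported here so the parse check above
    # still works where jsonschema is not installed.
    import jsonschema

    try:
        jsonschema.validate(instance=obj, schema=schema)
    except jsonschema.ValidationError as exc:
        print(f"error: output violates schema ({exc.message}): {raw!r}",
              file=sys.stderr)
        return 1
    return 0
```

The CLI can then end with something like `sys.exit(check_output(text, conjugation_schema))`, printing the JSON to stdout only when the code is 0.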

Block D — sample sentences

Pick five English sentences covering the §A13 tense space. Recommended set:

Infinitive context. "I want to work tomorrow." — main verb want, present_simple, 1sg. (The infinitive work is also present; the lab specifies "main verb" — document the disambiguation rule in README.)
Present simple, 3sg. "She plays the piano every Sunday." — verb play, present_simple, 3sg.
Past simple, regular. "He listened to the radio." — verb listen, past_simple, 3sg. (Do not use "They listened to the radio.": they is plural, and §A13 defers plurals, so that sentence is out of scope.)
Past simple, irregular. "She wrote a long letter." — verb write, past_simple, 3sg.
Simple future. "I will study tomorrow." — verb study, simple_future, 1sg.

Run each through conjugate_structured.py. Save outputs. Comment in README on whether the content matches expectation. (Recall: Phase 30 guarantees parse, NOT correctness. If the model emits tense: "present_simple" for "She wrote a long letter", the mask did its job; the model is undertrained.)
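
To make the README commentary systematic, you could record the expected labels from the recommended set and diff them against each CLI output. The sketch below assumes you use the five recommended sentences verbatim; the names `EXPECTED` and `mismatches` are illustrative.

```python
# Expected labels for the recommended sentences in Block D.
EXPECTED = {
    "I want to work tomorrow.":
        {"verb": "want", "tense": "present_simple", "person": "1sg"},
    "She plays the piano every Sunday.":
        {"verb": "play", "tense": "present_simple", "person": "3sg"},
    "He listened to the radio.":
        {"verb": "listen", "tense": "past_simple", "person": "3sg"},
    "She wrote a long letter.":
        {"verb": "write", "tense": "past_simple", "person": "3sg"},
    "I will study tomorrow.":
        {"verb": "study", "tense": "simple_future", "person": "1sg"},
}


def mismatches(sentence: str, actual: dict) -> dict:
    """Map each differing field to (expected, actual); empty means a match."""
    expected = EXPECTED[sentence]
    return {key: (want, actual.get(key))
            for key, want in expected.items()
            if actual.get(key) != want}
```

A non-empty result is a semantic miss by the model, not a Phase 30 failure, since the output still parsed and validated.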

Block E — pipeline smoke test

In tests/test_conjugate_cli.py:

test_conjugate_cli_returns_valid_json — run the CLI on a fixed sentence, assert the output parses and validates.
test_conjugate_cli_verb_in_scope — assert the verb field is one of the 20 enums.
test_conjugate_cli_tense_in_scope — assert the tense field is one of the 5 enums.
test_conjugate_cli_person_in_scope — assert the person field is one of the 3 enums.
test_conjugate_cli_exits_zero_on_valid — exit code is 0 on success.
test_conjugate_cli_exits_nonzero_on_no_input — empty stdin → non-zero exit + helpful error.

Constraints

No retries. If generation fails (parse error), the CLI exits non-zero. Phase 30's contract is that this never happens.
No prose in output. The CLI outputs only the JSON. No header, no logging on stdout (logging goes to stderr).
No transformers import. Use the hand-built MiniGPT.
Deterministic seed flag. --seed N for reproducibility. Default seed = 0 for the smoke tests.
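
For the seed flag to give reproducible output, every random draw has to come from a generator built from `--seed`. The sketch below illustrates this with the standard library's `random.Random`; your MiniGPT sampler may well use a different generator. The dict-of-logits representation and the name `sample_masked` are assumptions made to keep the example self-contained.

```python
import math
import random
from typing import Dict, Set


def sample_masked(logits: Dict[str, float], allowed: Set[str],
                  rng: random.Random) -> str:
    """Sample one token from softmax(logits) restricted to the allowed set."""
    # Sort so the draw depends only on the seed, not on set iteration order.
    tokens = sorted(allowed)
    peak = max(logits[t] for t in tokens)  # subtract the max for stability
    weights = [math.exp(logits[t] - peak) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

Build the generator once, for example `rng = random.Random(args.seed)`, and pass it to every sampling call rather than reseeding per token.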

Stop conditions

Done when:

CLI runs on all five sample sentences and produces parseable, schema-valid JSON.
The Block E tests pass.
README documents which sentences the model got right semantically and which it got wrong. (This is data for Phase 28 fine-tuning later, not a Phase 30 failure.)

Pitfalls

MiniGPT is undertrained. The model from Phase 18 may produce semantically wrong but parseable output. That's fine for Phase 30's DoD. Don't get distracted trying to improve content quality here.
Context truncation. If the prompt plus sentence exceeds the model's context window, the input must be truncated. Hard-clip to the last N tokens that fit, and document this in the README.
EOS in the middle of a string. The model might want to emit EOS early. Your mask should forbid EOS until the schema's DONE state is reached. Test this.
Trailing whitespace in CLI output. Strip before printing — downstream pipes are sensitive.
Disambiguating "main verb". "I want to work tomorrow" contains two verbs; the schema asks for one. The prompt must instruct the model to pick the finite verb (the one carrying the tense). Document the rule.
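
The hard clip from the context-truncation pitfall can be sketched as below. The function name is illustrative, and how you choose `n` (for instance, leaving room for generated tokens) is up to your implementation.

```python
from typing import List


def clip_to_last(token_ids: List[int], n: int) -> List[int]:
    """Keep only the last n token ids, i.e. the tail of the prompt."""
    if n <= 0:
        # token_ids[-0:] would silently return the whole list.
        raise ValueError("n must be positive")
    return token_ids[-n:]
```

Clipping from the left keeps the final "Conjugation:" cue in view but can cut off the instruction lines at the start of the prompt, which is worth noting in the README when it happens.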

When to consult `solutions/`

After the CLI works end-to-end. The solution's interest is mainly in how the prompt is structured (a minor art form that doesn't change correctness but affects KL per step).

Next lab: lab/03-mask-overhead.md.