Lab 02 — End-to-End Structured Conjugation CLI

Goal: wire mask + model + sampler into scripts/conjugate_structured.py; produce the contract artifact for Phase 31's tools and Phase 32's grammar-tutor agent.

Estimated time: 2–3 hours (assumes lab 01 finished).

Prereq: lab 01 (JSON-schema mask). MiniGPT checkpoint from Phase 18.


What you produce

  • scripts/conjugate_structured.py — CLI: python scripts/conjugate_structured.py "He ate pizza yesterday" → conjugation JSON on stdout.
  • experiments/30-end-to-end/ with:
      • sample_sentences/ — five English sentences (one per tense bucket: infinitive context, present_simple, past_simple, past_participle, simple_future).
      • outputs/ — the captured CLI output for each sentence.
      • README.md — for each sentence: the input, the output, and a one-paragraph commentary on whether the model got it right semantically.

TODOs

Block A — CLI shape

  • scripts/conjugate_structured.py takes a single positional sentence argument OR stdin if no arg.
  • Loads MiniGPT checkpoint from models/minigpt-phase-18.npz (or wherever Phase 18 saves it).
  • Constructs JSONSchemaMask(tokenizer, conjugation_schema) once.
  • Builds a prompt:
    Identify the main verb conjugation in this English sentence.
    Return JSON with keys: verb (lemma), tense, person.
    
    Sentence: <sentence>
    
    Conjugation:
    
  • Calls generate(prompt, mask=mask, max_new_tokens=64).
  • Prints the output. Pipe-friendly: no extra prose, just the JSON.
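A minimal sketch of the argument handling and prompt assembly described in Block A. The helper names `read_sentence` and `build_prompt` are illustrative, not part of the lab, and whether the prompt should end with a newline after "Conjugation:" is left open here. Loading the checkpoint, building the mask, and calling generate are omitted.

```python
import argparse

PROMPT_TEMPLATE = (
    "Identify the main verb conjugation in this English sentence.\n"
    "Return JSON with keys: verb (lemma), tense, person.\n"
    "\n"
    "Sentence: {sentence}\n"
    "\n"
    "Conjugation:"
)


def build_prompt(sentence):
    """Fill the lab's prompt template with one sentence."""
    return PROMPT_TEMPLATE.format(sentence=sentence.strip())


def read_sentence(argv, stdin):
    """Return the positional sentence, or fall back to stdin when it is absent."""
    parser = argparse.ArgumentParser(prog="conjugate_structured.py")
    parser.add_argument("sentence", nargs="?",
                        help="English sentence (default: read from stdin)")
    args = parser.parse_args(argv)
    sentence = args.sentence if args.sentence is not None else stdin.read()
    sentence = sentence.strip()
    if not sentence:
        # parser.error prints usage plus the message to stderr and exits with status 2.
        parser.error("no sentence given: pass one as an argument or on stdin")
    return sentence
```

In the real script this would be called as `read_sentence(sys.argv[1:], sys.stdin)`, which keeps both input paths testable without spawning a process.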

Block B — terminate-on-done

  • When mask.is_done() is True, stop generation immediately (don't wait for max_new_tokens). An object that completes before the 64-token budget then costs only as many decode steps as it has tokens, which shortens average decode time.
  • Test: an output of {"verb":"eat","tense":"past_simple","person":"3sg"} should NOT have trailing tokens.
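One way the early stop could look in a decode loop. Only `is_done()` comes from the lab text; `allowed_ids()` and `advance()` are assumed method names for the lab 01 mask, `next_logits` stands in for the MiniGPT forward pass, and greedy selection replaces the real sampler to keep the sketch deterministic. `ScriptedMask` is a test double, not the real `JSONSchemaMask`.

```python
def generate_ids(next_logits, mask, prompt_ids, max_new_tokens=64):
    """Decode until the mask reports a complete object or the budget runs out."""
    out = []
    for _ in range(max_new_tokens):
        if mask.is_done():
            break  # schema complete: emit nothing further
        logits = next_logits(prompt_ids + out)
        # A correct mask always allows at least one token until it is done.
        allowed = mask.allowed_ids()
        token_id = max(allowed, key=lambda i: logits[i])  # greedy over legal tokens
        mask.advance(token_id)
        out.append(token_id)
    return out


class ScriptedMask:
    """Stand-in mask that allows exactly one token per step, then reports done."""

    def __init__(self, script):
        self._script = list(script)
        self._pos = 0

    def allowed_ids(self):
        return [self._script[self._pos]]

    def advance(self, token_id):
        assert token_id == self._script[self._pos]
        self._pos += 1

    def is_done(self):
        return self._pos == len(self._script)
```

Checking `is_done()` at the top of the loop means the model is never called once the object is complete, so no trailing tokens can appear.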

Block C — validate-and-report

  • After generation, run json.loads and jsonschema.validate (using the jsonschema library is fine here because the check is post-hoc and not part of decoding). If either fails, print the malformed output to stderr with a clear error and exit non-zero. (This should never happen in practice; it is a safety net that catches mask bugs.)
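A sketch of the post-hoc check, under two assumptions. First, `CONJUGATION_SCHEMA` below is a trimmed placeholder (the real schema has 20 verbs, 5 tenses, and 3 persons). Second, the `_fallback_ok` branch exists only so the sketch runs where jsonschema is not installed; the lab itself expects the library. The function name `check_output` is illustrative.

```python
import json
import sys

try:
    import jsonschema
except ImportError:  # keeps the sketch runnable without the dependency
    jsonschema = None

# Placeholder schema: enum lists are shortened for illustration.
CONJUGATION_SCHEMA = {
    "type": "object",
    "properties": {
        "verb": {"enum": ["eat", "listen", "write"]},
        "tense": {"enum": ["present_simple", "past_simple", "simple_future"]},
        "person": {"enum": ["1sg", "3sg"]},
    },
    "required": ["verb", "tense", "person"],
    "additionalProperties": False,
}


def _fallback_ok(obj, schema):
    """Hand-rolled check used only when jsonschema is unavailable."""
    props = schema["properties"]
    return (isinstance(obj, dict)
            and set(obj) == set(schema["required"])
            and all(obj[k] in props[k]["enum"] for k in props))


def check_output(text, schema=CONJUGATION_SCHEMA, err=sys.stderr):
    """Return 0 if text parses and validates; otherwise report on err and return 1."""
    try:
        obj = json.loads(text)
        if jsonschema is not None:
            jsonschema.validate(instance=obj, schema=schema)
        elif not _fallback_ok(obj, schema):
            raise ValueError("output does not match the schema")
    except Exception as exc:  # JSONDecodeError, ValidationError, or the fallback error
        print(f"error: invalid model output ({type(exc).__name__}): {text!r}", file=err)
        return 1
    return 0
```

The return value is meant to be passed to `sys.exit`, so a mask bug surfaces as a non-zero exit code rather than as silent bad JSON.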

Block D — sample sentences

Pick five English sentences covering the §A13 tense space. Recommended set:

  1. Infinitive context. "I want to work tomorrow." — main verb want, present_simple, 1sg. (The infinitive work is also present; the lab specifies "main verb" — document the disambiguation rule in README.)
  2. Present simple, 3sg. "She plays the piano every Sunday." — verb play, present_simple, 3sg.
  3. Past simple, regular. "He listened to the radio." — verb listen, past_simple, 3sg. (Do not use "They listened to the radio.": they is plural, and §A13 defers plurals, so that sentence is out of scope.)
  4. Past simple, irregular. "She wrote a long letter." — verb write, past_simple, 3sg.
  5. Simple future. "I will study tomorrow." — verb study, simple_future, 1sg.

Run each through conjugate_structured.py. Save outputs. Comment in README on whether the content matches expectation. (Recall: Phase 30 guarantees parse, NOT correctness. If the model emits tense: "present_simple" for "She wrote a long letter", the mask did its job; the model is undertrained.)
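To support the README commentary, a small helper can compare each saved output against the expectations listed in Block D. This is a sketch: `semantic_diff` is an illustrative name, and the expected triples are copied from the recommended set above.

```python
import json

# Expected (verb, tense, person) for the recommended sentences in Block D.
EXPECTED = {
    "I want to work tomorrow.": ("want", "present_simple", "1sg"),
    "She plays the piano every Sunday.": ("play", "present_simple", "3sg"),
    "He listened to the radio.": ("listen", "past_simple", "3sg"),
    "She wrote a long letter.": ("write", "past_simple", "3sg"),
    "I will study tomorrow.": ("study", "simple_future", "1sg"),
}


def semantic_diff(sentence, cli_output):
    """Map each mismatching field to (expected, actual); empty dict means a match."""
    got = json.loads(cli_output)
    want = dict(zip(("verb", "tense", "person"), EXPECTED[sentence]))
    return {k: (want[k], got.get(k)) for k in want if got.get(k) != want[k]}
```

A non-empty result is a note for the README, not a test failure: the output still parsed and validated.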

Block E — pipeline smoke test

In tests/test_conjugate_cli.py:

  • test_conjugate_cli_returns_valid_json — run the CLI on a fixed sentence, assert the output parses and validates.
  • test_conjugate_cli_verb_in_scope — assert the verb field is one of the 20 enums.
  • test_conjugate_cli_tense_in_scope — assert the tense field is one of the 5 enums.
  • test_conjugate_cli_person_in_scope — assert the person field is one of the 3 enums.
  • test_conjugate_cli_exits_zero_on_valid — exit code is 0 on success.
  • test_conjugate_cli_exits_nonzero_on_no_input — empty stdin → non-zero exit + helpful error.
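The tests above all need to run the CLI as a subprocess and inspect stdout, stderr, and the exit code. A possible shared helper pair is sketched below; `run_cli` and `parse_success` are illustrative names, and the 120-second timeout is an arbitrary assumption.

```python
import json
import subprocess
import sys

CLI = [sys.executable, "scripts/conjugate_structured.py"]


def run_cli(args, stdin_text="", cmd=CLI):
    """Run the CLI with the given arguments and stdin; capture both streams as text."""
    return subprocess.run(
        [*cmd, *args],
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=120,
    )


def parse_success(proc):
    """Assert a clean exit and return the parsed JSON object from stdout."""
    assert proc.returncode == 0, proc.stderr
    obj = json.loads(proc.stdout)  # fails if stdout carries anything but JSON
    assert set(obj) == {"verb", "tense", "person"}
    return obj
```

For example, test_conjugate_cli_returns_valid_json could call `parse_success(run_cli(["He ate pizza yesterday", "--seed", "0"]))`, and test_conjugate_cli_exits_nonzero_on_no_input could assert that `run_cli([]).returncode != 0` and that stderr is non-empty.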

Constraints

  • No retries. If generation fails (parse error), the CLI exits non-zero. Phase 30's contract is that this never happens.
  • No prose in output. The CLI outputs only the JSON. No header, no logging on stdout (logging goes to stderr).
  • No transformers import. Use the hand-built MiniGPT.
  • Deterministic seed flag. --seed N for reproducibility. Default seed = 0 for the smoke tests.
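The seed and logging constraints might be wired up as follows. This is a sketch that uses the standard library's `random.Random` as the sampler's random source. If the MiniGPT sampler is NumPy-based, a seeded `numpy.random.default_rng(seed)` would play the same role. The function names are illustrative.

```python
import argparse
import logging
import random
import sys


def parse_args(argv):
    """Parse the optional sentence and the --seed flag (default 0)."""
    parser = argparse.ArgumentParser(prog="conjugate_structured.py")
    parser.add_argument("sentence", nargs="?")
    parser.add_argument("--seed", type=int, default=0,
                        help="sampler seed for reproducible output (default: 0)")
    return parser.parse_args(argv)


def make_rng(seed):
    """Private generator for the sampler, so no global random state is touched."""
    return random.Random(seed)


def configure_logging():
    """Send diagnostics to stderr; stdout stays reserved for the JSON result."""
    logging.basicConfig(stream=sys.stderr, level=logging.INFO)
```

Passing the generator into the sampler explicitly, rather than seeding a global, keeps two runs with the same `--seed` identical even if other code draws random numbers.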

Stop conditions

Done when:

  1. CLI runs on all five sample sentences and produces parseable, schema-valid JSON.
  2. The Block E tests pass.
  3. README documents which sentences the model got right semantically and which it got wrong. (This is data for Phase 28 fine-tuning later, not a Phase 30 failure.)

Pitfalls

  • MiniGPT is undertrained. The model from Phase 18 may produce semantically wrong but parseable output. That's fine for Phase 30's DoD. Don't get distracted trying to improve content quality here.
  • Context truncation. If the sentence + prompt is larger than the model's context window, you must truncate. Hard-clip to the last N tokens that fit, and document the behavior in README.
  • EOS in the middle of a string. The model might want to emit EOS early. Your mask should forbid EOS until the schema's DONE state is reached. Test this.
  • Trailing whitespace in CLI output. Strip before printing — downstream pipes are sensitive.
  • Disambiguating "main verb". "I want to work tomorrow" contains two verbs; the schema asks for one. The prompt must instruct the model to pick the finite verb (the one carrying the tense). Document the rule.
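Two of the pitfalls above reduce to a few lines each. The sketch below makes assumptions: `clip_prompt` reserves room for `max_new_tokens` inside the window, which is one reading of "tokens that fit", and `mask_eos` assumes logits arrive as a plain sequence indexed by token id, with `eos_id` supplied by the tokenizer. Both names are illustrative.

```python
def clip_prompt(prompt_ids, context_len, max_new_tokens=64):
    """Keep the last prompt tokens so that prompt plus generation fits the window."""
    budget = context_len - max_new_tokens
    if budget <= 0:
        raise ValueError("context window is too small for max_new_tokens")
    # Slicing from the end preserves the trailing "Conjugation:" cue.
    return prompt_ids[-budget:]


def mask_eos(logits, eos_id, done):
    """Return a copy of logits with EOS set to -inf unless the schema is complete."""
    out = list(logits)
    if not done:
        out[eos_id] = float("-inf")
    return out
```

Comparing `len(clip_prompt(ids, ...))` with `len(ids)` tells the caller whether clipping happened, which is worth a log line on stderr.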

When to consult solutions/

After the CLI works end-to-end. The solution is mainly of interest for how the prompt is structured (a minor art form that doesn't change correctness but affects KL per step).


Next lab: lab/03-mask-overhead.md.