Lab 00 — A planner under JSONSchemaMask

Read theory/01-react-and-planning.md. Do not consult solutions/.

Objective

Build a Planner that, given a state (sentence + trace), generates the next step as valid JSON conforming to the planner schema — using Phase 30's JSONSchemaMask to constrain decoding. The planner should be unable to emit invalid output by construction.

Setup

A new file: src/miniagent/planner.py. It imports:

  • MiniGPT (or a fine-tuned variant) from src/minimodel/mini_gpt.py.
  • JSONSchemaMask from src/ministruct/mask.py (Phase 30).
  • ToolCall, FinalAnswer, Step from src/miniagent/types.py.

Tasks

Task 1 — define the planner schema

In src/miniagent/schemas.py, define the JSON Schema for the planner's output:

{
  "oneOf": [
    {
      "type": "object",
      "properties": {
        "next": {"const": "tool_call"},
        "tool": {"enum": ["conjugate", "lookup_irregular_verb", "check_subject_verb_agreement", "lookup_spanish"]},
        "args": {"type": "object"}
      },
      "required": ["next", "tool", "args"],
      "additionalProperties": false
    },
    {
      "type": "object",
      "properties": {
        "next": {"const": "final_answer"},
        "answer": {
          "type": "object",
          "properties": {
            "corrected": {"type": ["string", "null"]},
            "rationale": {"type": "array", "items": {"type": "string"}},
            "spanish_gloss": {"type": ["string", "null"]},
            "in_scope": {"type": "boolean"}
          },
          "required": ["corrected", "rationale", "in_scope"],
          "additionalProperties": false
        }
      },
      "required": ["next", "answer"],
      "additionalProperties": false
    }
  ]
}

Validate this schema on paper before coding: write a few example outputs and confirm that each one matches exactly one branch of the oneOf (a tool_call with tool="conjugate" and an args object; a final_answer with the full payload).
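The paper check can also be written down as a small hand-rolled matcher. This is only a sketch to make the two branches concrete, not a replacement for jsonschema.validate. The sentence, the args keys (verb, person), and the helper name branch_of are made up for illustration; the schema itself only requires args to be an object.

```python
import json

TOOLS = ("conjugate", "lookup_irregular_verb",
         "check_subject_verb_agreement", "lookup_spanish")

def branch_of(obj: dict) -> str:
    """Return the oneOf branch that obj matches, or raise ValueError."""
    if obj.get("next") == "tool_call":
        # additionalProperties: false means the key set must match exactly.
        if set(obj) != {"next", "tool", "args"}:
            raise ValueError("tool_call needs exactly next, tool, args")
        if obj["tool"] not in TOOLS or not isinstance(obj["args"], dict):
            raise ValueError("bad tool or args")
        return "tool_call"
    if obj.get("next") == "final_answer":
        if set(obj) != {"next", "answer"} or not isinstance(obj["answer"], dict):
            raise ValueError("final_answer needs exactly next, answer")
        answer = obj["answer"]
        required = {"corrected", "rationale", "in_scope"}
        if not required <= set(answer) <= required | {"spanish_gloss"}:
            raise ValueError("bad answer keys")
        nullable_str = (str, type(None))
        ok = (isinstance(answer["corrected"], nullable_str)
              and isinstance(answer.get("spanish_gloss"), nullable_str)
              and isinstance(answer["in_scope"], bool)
              and isinstance(answer["rationale"], list)
              and all(isinstance(r, str) for r in answer["rationale"]))
        if not ok:
            raise ValueError("bad answer value types")
        return "final_answer"
    raise ValueError("next must be tool_call or final_answer")

# Hypothetical example outputs (the args keys are illustrative only).
TOOL_CALL_EXAMPLE = '{"next": "tool_call", "tool": "conjugate", "args": {"verb": "go", "person": "3sg"}}'
FINAL_ANSWER_EXAMPLE = json.dumps({
    "next": "final_answer",
    "answer": {
        "corrected": "She goes to school.",
        "rationale": ["Third-person singular subject takes 'goes'."],
        "spanish_gloss": None,
        "in_scope": True,
    },
})

print(branch_of(json.loads(TOOL_CALL_EXAMPLE)))
print(branch_of(json.loads(FINAL_ANSWER_EXAMPLE)))
```

An object that mixes the two branches, such as a tool_call that also carries an answer key, is rejected by the exact key-set comparison.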

Task 2 — implement the Planner class

import json

class Planner:
    def __init__(self, model: MiniGPT, mask: JSONSchemaMask, tokenizer):
        self.model = model
        self.mask = mask
        self.tokenizer = tokenizer

    def next_step(self, state: PlannerState) -> Step:
        """Generate the next step. Output is guaranteed to be a valid Step by construction."""
        prompt = self._format_prompt(state)
        tokens = self.tokenizer.encode(prompt)
        # Run masked generation until the JSON object closes
        json_str = self._masked_generate(tokens)
        parsed = json.loads(json_str)
        return self._parse_step(parsed)

    def _masked_generate(self, prompt_tokens) -> str:
        """Generate tokens one at a time, applying the mask at each step."""
        # generated = []
        # Loop (bounded by the generation budget):
        #   logits = self.model(prompt_tokens + generated)
        #   masked_logits = self.mask.apply(logits[-1], partial_json=...)
        #   next_tok = sample_or_argmax(masked_logits)
        #   generated.append(next_tok)
        #   if json_complete(generated): break
        # Return the decoded text of `generated`.
        ...

Constraints:

  • The mask must be applied at every token, not just at structural points. The schema constrains every character of the output.
  • The decode loop should support both greedy (argmax) and temperature-sampling decode. Default: greedy.
  • A maximum generation budget (e.g., 256 tokens) prevents infinite loops if the mask logic has a bug.
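The loop shape and the three constraints above could be sketched as follows. This is a toy, stdlib-only sketch and not the Phase 30 API: logits_fn, allowed_fn, and is_complete are hypothetical callables standing in for the model forward pass, the JSONSchemaMask, and the completion check, and the three-token vocabulary is invented for the demo.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

def pick_token(logits: Sequence[float], allowed: Sequence[int],
               temperature: float = 0.0,
               rng: Optional[random.Random] = None) -> int:
    """Greedy when temperature <= 0, else softmax sampling over allowed ids."""
    if not allowed:
        raise ValueError("mask allowed no tokens")
    if temperature <= 0.0:
        return max(allowed, key=lambda i: logits[i])
    rng = rng or random.Random()
    peak = max(logits[i] for i in allowed)  # subtract the max for stability
    weights = [math.exp((logits[i] - peak) / temperature) for i in allowed]
    return rng.choices(list(allowed), weights=weights, k=1)[0]

def masked_decode(logits_fn: Callable[[List[int]], Sequence[float]],
                  allowed_fn: Callable[[List[int]], Sequence[int]],
                  is_complete: Callable[[List[int]], bool],
                  max_new_tokens: int = 256,
                  temperature: float = 0.0) -> List[int]:
    """Apply the mask at every token; stop on completion or on budget."""
    generated: List[int] = []
    for _ in range(max_new_tokens):
        allowed = allowed_fn(generated)          # mask consulted every step
        tok = pick_token(logits_fn(generated), allowed, temperature)
        generated.append(tok)
        if is_complete(generated):
            return generated
    raise RuntimeError("generation budget exhausted before the object closed")

# Toy demo: the "model" always prefers "x", but the mask forces "{" then "}".
VOCAB = ["{", "}", "x"]

def toy_logits(generated: List[int]) -> List[float]:
    return [0.0, 0.0, 5.0]

def toy_allowed(generated: List[int]) -> List[int]:
    return [0] if not generated else [1]

def toy_complete(generated: List[int]) -> bool:
    return bool(generated) and generated[-1] == 1

tokens = masked_decode(toy_logits, toy_allowed, toy_complete)
text = "".join(VOCAB[t] for t in tokens)
print(text)
```

Raising on an exhausted budget, rather than returning a partial string, makes a mask bug fail loudly instead of handing json.loads a truncated object.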

Task 3 — validate the schema enforcement

Add to tests/test_planner.py:

  1. Schema compliance test. Generate 100 planner outputs (with varying prompts). Validate each against the schema using jsonschema.validate. Zero failures expected.
  2. Tool enum test. Confirm the generated tool value is always in the allowed set.
  3. No-trailing-garbage test. Confirm the generated output ends exactly after the closing } — no trailing tokens.
  4. Mask-disabled comparison. Run the same prompts without the mask. Most outputs should be invalid JSON or off-schema. This proves the mask is doing real work.
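Tests 2 and 3 reduce to small string-level checks, sketched here with only the standard library. The helper names are hypothetical; in tests/test_planner.py they would be called on each generated output string.

```python
import json

ALLOWED_TOOLS = ("conjugate", "lookup_irregular_verb",
                 "check_subject_verb_agreement", "lookup_spanish")

def ends_exactly_after_object(text: str) -> bool:
    """True if text is one JSON object with nothing before or after it."""
    try:
        # raw_decode returns the parsed value and the index where it ended.
        obj, end = json.JSONDecoder().raw_decode(text)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and end == len(text)

def tool_is_allowed(text: str) -> bool:
    """True unless the output is a tool_call naming a tool outside the enum."""
    obj = json.loads(text)
    if obj.get("next") != "tool_call":
        return True
    return obj.get("tool") in ALLOWED_TOOLS

good = '{"next": "tool_call", "tool": "conjugate", "args": {}}'
print(ends_exactly_after_object(good))
print(ends_exactly_after_object(good + " and then"))
print(tool_is_allowed(good))
```

Because the end index is compared against the full length, trailing whitespace also counts as garbage, which matches the "ends exactly after the closing }" wording.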

Task 4 — handle the model's untrained state gracefully

Phase 17's MiniGPT is untrained, so its outputs are random even under masked decoding; the mask only guarantees that they parse and conform to the schema. The chosen step (the tool and its args) will be arbitrary, and that is fine for this lab. Phase 32 expects a trained or fine-tuned model from Phase 28 to plug in here.

In the lab, write a MockPlanner that produces correct steps for a set of canonical test sentences. This lets you exercise the agent loop in Lab 01 even before a trained model is wired in:

class MockPlanner:
    """For testing only. Returns scripted steps for known sentences."""
    def __init__(self, scripts: dict[str, list[Step]]):
        # Maps each known sentence to its scripted sequence of steps.
        self.scripts = scripts
    def next_step(self, state: PlannerState) -> Step:
        return self.scripts[state.original][state.step_index]

Phase 32's lab 01 will use MockPlanner for the canonical test set; in production (Phase 33), the real Planner plugs in.
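A usage sketch for the mock, under stated assumptions: PlannerState here is a stand-in dataclass with only the two fields the mock reads (original, step_index), plain dicts stand in for the real Step objects from src/miniagent/types.py, and the sentence and args are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class PlannerState:
    """Stand-in for the real state type; only the fields the mock needs."""
    original: str
    step_index: int = 0

class MockPlanner:
    """For testing only. Returns scripted steps for known sentences."""
    def __init__(self, scripts: Dict[str, List[Any]]):
        self.scripts = scripts

    def next_step(self, state: PlannerState) -> Any:
        return self.scripts[state.original][state.step_index]

# Hypothetical script: one tool call, then the final answer.
scripts = {
    "She go to school.": [
        {"next": "tool_call", "tool": "check_subject_verb_agreement",
         "args": {"sentence": "She go to school."}},
        {"next": "final_answer",
         "answer": {"corrected": "She goes to school.",
                    "rationale": ["Subject-verb agreement."],
                    "in_scope": True}},
    ]
}

planner = MockPlanner(scripts)
state = PlannerState(original="She go to school.")
first = planner.next_step(state)
state.step_index += 1          # the agent loop would advance this
second = planner.next_step(state)
print(first["next"], second["next"])
```

An unknown sentence raises KeyError and a step_index past the end of the script raises IndexError, which is usually the behavior you want from a test double.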

Task 5 — measure: tokens to emit a step

For a fixed test sentence and trace, measure:

  1. Unconstrained generation (the bare model forward pass): how many tokens until something resembling a JSON object appears?
  2. The masked planner: how many tokens until a complete valid object?
  3. The mask's per-token overhead (wall-clock time).

Expected: masked decoding produces a complete object in fewer tokens (no "thinking out loud" preamble), and the mask's per-token overhead stays below 10× the cost of a bare forward pass (the mask is doing a schema-tracking computation per token).

Save to experiments/<date>-phase-32-planner/timing.csv.
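One way to collect the timing numbers and write the CSV is sketched below. The column names (variant, tokens_to_object, seconds_per_token) and helper names are assumptions, not a format the lab prescribes, and the sketch writes to an in-memory buffer instead of the experiments/ path.

```python
import csv
import io
import time
from typing import Callable, Dict, Iterable, TextIO

FIELDS = ["variant", "tokens_to_object", "seconds_per_token"]

def mean_seconds_per_call(fn: Callable[[], object], n_calls: int = 100) -> float:
    """Average wall-clock time of fn over n_calls invocations."""
    start = time.perf_counter()
    for _ in range(n_calls):
        fn()
    return (time.perf_counter() - start) / n_calls

def overhead_ratio(masked_seconds: float, bare_seconds: float) -> float:
    """Masked per-token time as a multiple of the bare forward time."""
    if bare_seconds <= 0:
        raise ValueError("bare_seconds must be positive")
    return masked_seconds / bare_seconds

def write_timing_csv(handle: TextIO, rows: Iterable[Dict[str, object]]) -> None:
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

# Placeholder workloads; swap in one bare forward step and one masked step.
bare = mean_seconds_per_call(lambda: sum(range(100)), n_calls=50)
masked = mean_seconds_per_call(lambda: sum(range(200)), n_calls=50)

buffer = io.StringIO()
write_timing_csv(buffer, [
    {"variant": "unmasked", "tokens_to_object": "", "seconds_per_token": bare},
    {"variant": "masked", "tokens_to_object": "", "seconds_per_token": masked},
])
header = buffer.getvalue().splitlines()[0]
print(header)
```

The tokens_to_object cells are left empty here because they come from your own runs; the sketch only shows the shape of the file.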

Measurements to capture

  • Schema compliance: 100/100 generated outputs validate.
  • Mask-disabled: count of invalid outputs (should be high).
  • Tokens per step: masked vs unmasked.
  • Mask per-token overhead.

Acceptance

  • src/miniagent/planner.py and src/miniagent/schemas.py exist.
  • All generated outputs pass jsonschema.validate.
  • MockPlanner available for use by Lab 01.
  • Test tests/test_planner.py is green.
  • Mask-vs-no-mask comparison documented.

Pitfalls to expect

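  • Decoding doesn't know when to stop. The mask defines valid next tokens, but you also need to detect when the object is complete. Strategy: stop generating when the brace depth returns to zero after the opening {, counting only braces and brackets outside string literals (a } inside a string value must not close the object).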
  • additionalProperties: false. Easy to omit; without it, the model can emit extra fields. Test this explicitly.
  • oneOf over two schemas. The mask must track which branch the output has committed to and collapse to it at once (e.g., the moment "next": "tool_call" is emitted, only the tool_call branch remains valid). If your mask doesn't handle oneOf properly, outputs may straddle schemas.
  • Tokenizer alignment. JSON-mask logic typically operates at the character level, but generation is at the token level. You need a JSONSchemaMask that knows which tokens correspond to which characters (or you decode/re-encode per step). This is the central implementation detail of mask-constrained decoding; Phase 30 should have given you a working pattern.
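For the stopping pitfall, a character-level depth tracker that ignores braces inside strings might look like the sketch below. The function name is hypothetical, and it assumes the decoded text of the tokens generated so far is passed in after every step.

```python
def json_object_closed(text: str) -> bool:
    """True once the first top-level object or array in text has closed.

    Braces and brackets inside string literals are ignored, and a
    backslash escapes the next character inside a string.
    """
    depth = 0
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False        # this character was escaped
            elif ch == "\\":
                escaped = True         # next character is escaped
            elif ch == '"':
                in_string = False      # closing quote
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
            if depth == 0:
                return True
    return False

print(json_object_closed('{"a": {"b": 1}}'))
print(json_object_closed('{"a": {"b": 1}'))
print(json_object_closed('{"rationale": ["use } carefully"'))
```

Since one token can decode to several characters, a token may carry characters past the closing brace; the mask is what should rule such tokens out, and this check only tells the loop when to stop.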

Next: 01-tutor-end-to-end.md