
00: Why Tools Exist

Without tools, the model knows only what it learned during training. With tools, it can look up facts at inference time (what lookup_irregular_verb('go') returns, what lookup_spanish('worked') returns) and compose its answer from those facts. It is the difference between guessing and knowing.

The closed-knowledge ceiling

A pure language model encodes whatever it learned during training. At inference time it cannot:

Look anything up. If the training corpus said "the past simple of go is went", the model probably knows this. If the corpus had a typo and said "goed", the model probably propagates the typo.
Reason precisely about edge cases. Conjugation has many small rules (study → studied, where -y becomes -ied; play → played, where the y stays because it follows a vowel; e doubling on regular -ed is rare in our 20-verb set). The model has associative knowledge of these rules, not algorithmic knowledge. Small rules with many exceptions are exactly where neural models fail at the long tail.
Cite a source. If the user asks why "He goed" is wrong, the model will say something fluent. It will not point to a table where go is marked irregular with past_simple = "went".

Tools fix all three. A lookup_irregular_verb('go') call returns the canonical truth table entry — the one we hand-curated in Phase 12's corpus. The model is no longer guessing; it's consulting.
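To make "consulting" concrete, here is a minimal sketch of such a lookup, assuming the curated table is a plain dict keyed by lemma. The two sample entries and the field names (`irregular`, `past_simple`, `past_participle`) are illustrative assumptions, not the actual Phase 12 schema.

```python
from typing import Any

# Hypothetical excerpt of the curated table; field names are assumptions.
IRREGULAR_VERBS: dict[str, dict[str, Any]] = {
    "go": {"lemma": "go", "irregular": True, "past_simple": "went", "past_participle": "gone"},
    "eat": {"lemma": "eat", "irregular": True, "past_simple": "ate", "past_participle": "eaten"},
}


def lookup_irregular_verb(lemma: str) -> dict[str, Any]:
    """Return the curated entry for `lemma`, or a structured error if the table has none."""
    entry = IRREGULAR_VERBS.get(lemma.strip().lower())
    if entry is None:
        return {"error": "not_found", "lemma": lemma}
    return dict(entry)  # shallow copy so callers cannot mutate the table


# The answer to "why is 'He goed' wrong?" is now a table row, not a recollection.
entry = lookup_irregular_verb("go")
print(entry["irregular"], entry["past_simple"])  # True went
```

The lookup either returns the row verbatim or says it has nothing; it never interpolates a plausible-looking form.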

What "a tool" is, mechanically

A tool, in the function-calling sense, is exactly four things:

A Python function. Pure, typed, side-effect-free (in our scope). Example:

def conjugate(verb: str, tense: str, person: str) -> str:
    """Return the conjugated form."""
    ...

A JSON-Schema descriptor of the function's arguments. Example:

{
  "name": "conjugate",
  "description": "Return the conjugated form of an English verb.",
  "input_schema": {
    "type": "object",
    "properties": {
      "verb":   {"type": "string", "enum": [...20 verbs...]},
      "tense":  {"type": "string", "enum": [...5 tenses...]},
      "person": {"type": "string", "enum": [...3 persons...]}
    },
    "required": ["verb", "tense", "person"]
  }
}

A return value contract. What shape comes back. For conjugate, a string. For lookup_irregular_verb, a dict.
An error contract. What happens when arguments are out of scope, the underlying data is missing, etc. Returned (not raised) as a structured error object.

That's it. There is no "AI" in any of those four things. The AI is what picks which tool to call and constructs the arguments. Phase 31 builds all four pieces in isolation; Phase 32 wires them to the model.
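Putting the four pieces together, a minimal sketch of a conjugate tool might look like the following. The truth-table entries, the truncated enums, the person labels other than 3sg, and the `missing_data` error code are hypothetical; a real project would more likely validate with a full JSON-Schema library than with the hand-rolled subset checker shown here.

```python
from typing import Any, Union

# Data behind the tool: hypothetical excerpt (the real table is 20 verbs x 5 tenses x 3 persons).
FORMS = {
    ("go", "past_simple", "3sg"): "went",
    ("go", "present_simple", "3sg"): "goes",
    ("eat", "past_simple", "3sg"): "ate",
}

# Piece 2: the JSON-Schema descriptor, with enums truncated to what the sketch table covers.
CONJUGATE_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "verb": {"type": "string", "enum": ["go", "eat"]},
        "tense": {"type": "string", "enum": ["present_simple", "past_simple"]},
        "person": {"type": "string", "enum": ["1sg", "2sg", "3sg"]},
    },
    "required": ["verb", "tense", "person"],
}


def validate_args(args: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Check only the JSON-Schema subset used here: required, type=string, enum."""
    errors = [f"missing: {n}" for n in schema.get("required", []) if n not in args]
    for name, spec in schema["properties"].items():
        if name not in args:
            continue
        value = args[name]
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{name}: expected string")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: {value!r} not in enum")
    return errors


# Piece 1: the pure function, honoring piece 3 (return contract) and piece 4 (error contract).
def conjugate(verb: str, tense: str, person: str) -> Union[str, dict[str, Any]]:
    """Return the conjugated form, or a structured error object (returned, never raised)."""
    problems = validate_args({"verb": verb, "tense": tense, "person": person}, CONJUGATE_SCHEMA)
    if problems:
        return {"error": "out_of_scope", "details": problems}
    form = FORMS.get((verb, tense, person))
    if form is None:
        return {"error": "missing_data", "verb": verb, "tense": tense, "person": person}
    return form
```

For example, `conjugate("eat", "past_simple", "3sg")` returns `"ate"`, while `conjugate("run", "past_simple", "3sg")` returns an `out_of_scope` error object. Honoring the error contract widens the return type from `str` to `str | dict`, so every caller, including the agent later on, has to check which one it got.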

Why this matters now

The Phase 32 grammar-tutor agent has a tight loop:

1. Receive English sentence from user.
2. Parse the sentence to identify the main verb and subject.
3. Decide whether the verb form matches the inferred tense/person.
4. If a mismatch: propose the correct form (and optional Spanish gloss).
5. Return a structured response.

Steps 2-4 are information-rich. The model could try to do them by pure association, and on common cases it will succeed, but on edge cases (have, be, do: the most irregular and most frequent verbs in English) it will sometimes confuse forms. The tools layer turns step 4 from "model produces an answer" into "model decides which tool to call, then reads the tool's answer". As long as the model picks the right tool and arguments, the answer is correct by construction, because the hand-curated truth table is correct by construction.

This is the retrieval-augmented-generation pattern restricted to a tiny closed domain. It's the same pattern as Phase 29 RAG (lookup → ground reply in lookup), just with a typed schema instead of an open-text knowledge base.
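The step-4 handoff can be sketched in a few lines. This assumes step 2 has already produced the lemma, tense, and person (the parser is out of scope), and uses a hypothetical three-row truth table as a stand-in for the real conjugate tool; `check_verb` is an illustrative name, not part of the Phase 32 agent.

```python
from typing import Any, Union

# Hypothetical stand-in for the conjugate tool: a tiny truth table keyed by (lemma, tense, person).
TRUTH_TABLE = {
    ("go", "past_simple", "3sg"): "went",
    ("have", "present_simple", "3sg"): "has",
    ("be", "present_simple", "3sg"): "is",
}


def conjugate(verb: str, tense: str, person: str) -> Union[str, dict[str, str]]:
    return TRUTH_TABLE.get((verb, tense, person), {"error": "out_of_scope"})


def check_verb(lemma: str, observed: str, tense: str, person: str) -> dict[str, Any]:
    """Steps 3-4: compare the observed form with the tool's answer, not with model recall."""
    expected = conjugate(lemma, tense, person)
    if isinstance(expected, dict):  # tool error: report it instead of guessing
        return {"status": "unknown", "tool_error": expected["error"]}
    if observed.lower() == expected:
        return {"status": "ok"}
    return {"status": "mismatch", "observed": observed, "correct": expected}


print(check_verb("go", "goed", "past_simple", "3sg"))
# {'status': 'mismatch', 'observed': 'goed', 'correct': 'went'}
```

The model's remaining job is to fill in the arguments; the correct form comes from the table.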

What this phase does not solve

Tool selection. Phase 31 implements the tools; Phase 32 decides which tool to call. The agent's planner is what does selection.
Tool result composition. If the agent calls two tools, how does it combine their outputs? Phase 32's planner. We don't even try here.
Tool error recovery. If conjugate(verb="run", ...) returns {"error": "out_of_scope"}, what does the agent do? Phase 32.

The Model Context Protocol (MCP) preview

A tool, as defined above, lives inside the agent's process. The agent imports the function, calls it, gets the result.

MCP introduces a level of indirection: the tool lives in a separate process, exposed over a transport (stdio, or HTTP via the older HTTP+SSE transport or its successor, Streamable HTTP). The agent's process is a client; the tool host is a server. The wire format is JSON-RPC 2.0, with a small set of standard method names such as tools/list and tools/call.
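The request/response cycle on that wire is small enough to sketch in-process. The following assumes one hypothetical tool backed by a one-entry table; the result shapes (`tools` entries with `inputSchema`, `content` plus `isError` for calls) follow the MCP tools/list and tools/call results as we understand them, and -32601/-32602 are the standard JSON-RPC 2.0 "method not found" and "invalid params" codes. It deliberately omits the initialize handshake, notifications, and parse-error replies that a conforming MCP server needs, so treat it as an outline of the protocol, not a server.

```python
import json
from typing import Any, Iterable, Iterator

FORMS = {("eat", "past_simple", "3sg"): "ate"}  # hypothetical one-entry truth table


def conjugate(verb: str, tense: str, person: str) -> dict[str, str]:
    form = FORMS.get((verb, tense, person))
    return {"form": form} if form else {"error": "out_of_scope"}


# Registry: descriptor fields follow the tools/list shape (name, description, inputSchema).
TOOLS: dict[str, dict[str, Any]] = {
    "conjugate": {
        "description": "Return the conjugated form of an English verb.",
        "inputSchema": {
            "type": "object",
            "properties": {k: {"type": "string"} for k in ("verb", "tense", "person")},
            "required": ["verb", "tense", "person"],
        },
        "fn": conjugate,
    }
}


def _error(req: dict[str, Any], code: int, message: str) -> dict[str, Any]:
    return {"jsonrpc": "2.0", "id": req.get("id"), "error": {"code": code, "message": message}}


def handle(req: dict[str, Any]) -> dict[str, Any]:
    """Dispatch one JSON-RPC 2.0 request to tools/list or tools/call."""
    method, params = req.get("method"), req.get("params") or {}
    if method == "tools/list":
        result: dict[str, Any] = {"tools": [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]}
    elif method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(req, -32602, f"unknown tool: {params.get('name')}")
        try:
            output = tool["fn"](**params.get("arguments", {}))
        except TypeError as exc:  # typically wrong or missing argument names
            return _error(req, -32602, str(exc))
        # Tool-level failures travel inside a successful result, flagged with isError.
        result = {"content": [{"type": "text", "text": json.dumps(output)}], "isError": "error" in output}
    else:
        return _error(req, -32601, f"method not found: {method}")
    return {"jsonrpc": "2.0", "id": req.get("id"), "result": result}


def serve(lines: Iterable[str]) -> Iterator[str]:
    """stdio-style framing: one JSON message per input line, one JSON reply per output line."""
    for line in lines:
        if line.strip():
            yield json.dumps(handle(json.loads(line)))
```

A server process would run `for reply in serve(sys.stdin): print(reply, flush=True)`; the client writes one request per line to the server's stdin and reads one reply per line from its stdout. Note that a tool-level failure such as `out_of_scope` comes back as a normal result with `isError: true`, while protocol-level failures use JSON-RPC error objects.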

Why bother with the indirection?

Process isolation. A misbehaving tool crashes the tool server, not the agent. Phase 32's sandbox uses this.
Language independence. A Python agent can call a Rust tool. The protocol doesn't care.
Discoverability. The agent can ask tools/list and learn what's available, instead of needing prior knowledge of the tool catalog.
Re-usability. The same tool server can serve multiple agent clients.

Phase 31 builds the minimal MCP — enough to demonstrate the protocol over stdio with one client and one server. Phase 33's serving layer will expose the agent over HTTP, and the tool server stays on stdio behind it.

The Phase 30 → Phase 31 bridge

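On the MCP wire, a tool call is a JSON-RPC request like this (Anthropic's and OpenAI's APIs carry the same name-plus-arguments payload inside a different envelope):

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "conjugate",
    "arguments": {"verb": "eat", "tense": "past_simple", "person": "3sg"}
  }
}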

The arguments object must conform to the tool's input schema. Phase 30's JSONSchemaMask is exactly the mechanism that guarantees the model emits a valid arguments object. Without the mask, the model occasionally emits {"verb": "ate", ...} (a conjugated form instead of the lemma) or {"tense": "past"} (a string outside the enum). With the mask, the model is constrained to the enum at decode time and cannot produce an invalid argument blob.

This is the first concrete payoff of Phase 30. Phase 31's lab 03 demonstrates it explicitly.
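To see why the mask rules out {"verb": "ate"} during decoding rather than catching it afterwards, here is a toy, character-level version of enum masking. It illustrates the idea only: the function names and enum slices are hypothetical, a real mask such as JSONSchemaMask works over the tokenizer's vocabulary, and its actual interface is not reproduced here.

```python
def allowed_next_chars(prefix: str, enum: list[str]) -> set[str]:
    """Characters that keep `prefix` a prefix of some enum value.

    The closing quote '"' is allowed only once `prefix` is exactly an enum value.
    """
    allowed: set[str] = set()
    for value in enum:
        if value.startswith(prefix):
            allowed.add(value[len(prefix)] if len(value) > len(prefix) else '"')
    return allowed


def constrained_pick(prefix: str, enum: list[str], scores: dict[str, float]) -> str:
    """One greedy decoding step: the best-scoring character among those the mask allows."""
    allowed = allowed_next_chars(prefix, enum)
    # sorted() makes tie-breaking deterministic; max() raises if the prefix is already off-schema.
    return max(sorted(allowed), key=lambda c: scores.get(c, float("-inf")))


VERBS = ["be", "do", "eat", "go", "have"]   # hypothetical slice of the 20-verb enum
TENSES = ["past_simple", "present_simple"]  # hypothetical slice of the 5-tense enum

# The model "wants" to start the verb with 'a' (as in "ate"), but 'a' is masked out.
print(constrained_pick("", VERBS, {"a": 5.0, "e": 1.0}))  # e
# After "past" the only legal continuation is "_", so {"tense": "past"} can never close.
print(allowed_next_chars("past", TENSES))  # {'_'}
```

The model's preferences still matter, but only among legal continuations: once it has emitted "e", the only path left in this enum is "eat".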

The bigger picture

The arc from Phase 30 to Phase 33 is:

Phase 30: model produces structured output (mask-constrained).
Phase 31: structured output dispatches to tools (this phase).
Phase 32: agent loop chains tool calls into multi-step plans.
Phase 33: the whole stack is exposed over HTTP.

Each phase is a thin layer over the previous. Phase 31 is the connective tissue — without it, the model emits JSON that nothing acts on; without it, Phase 32 has nothing to plan with.

What this phase does NOT cover

Tool selection / planning. Phase 32.
Sandboxed execution. Phase 32.
HTTP / streaming transports. Phase 33.
Tool caching. A repeated conjugate(verb=eat, tense=past_simple, person=3sg) re-runs from scratch. Phase 33 may add an LRU cache; out of scope here.
Tool versioning. A tool's schema can evolve; we don't track this until Phase 38 (MLOps).
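Caching stays out of scope, but one line of code shows why it will be cheap to add later: because the tools are pure, memoizing them cannot change their answers. A minimal sketch using the standard library's `functools.lru_cache`, with a hypothetical one-entry table and an execution counter added purely for illustration; it is not a description of what Phase 33 will ship.

```python
from functools import lru_cache

EXECUTIONS = {"count": 0}  # counts real (uncached) executions, for illustration only


@lru_cache(maxsize=1024)
def conjugate(verb: str, tense: str, person: str) -> str:
    """Hypothetical pure tool: same arguments, same answer, so memoizing is safe."""
    EXECUTIONS["count"] += 1
    table = {("eat", "past_simple", "3sg"): "ate"}
    return table.get((verb, tense, person), "out_of_scope")


conjugate("eat", "past_simple", "3sg")
conjugate("eat", "past_simple", "3sg")  # served from the cache
print(EXECUTIONS["count"], conjugate.cache_info().hits)  # 1 1
```

A cached function that returned dicts would hand every caller the same mutable object, which is one reason string (or otherwise immutable) results pair well with `lru_cache`.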

Next: theory/01-function-calling-formats.md — the survey of major tool-call formats and why they all converge on the same shape.