00 — What an agent is, and what it isn't
An agent here means one precise thing: a loop around the model that maintains state, invokes tools, and terminates. No more, no less. The word "agent" carries a lot of marketing hype; this chapter cuts it down to the bone.
A working definition
An agent is a loop that, on each step, decides whether to call a tool or to produce a final answer, executes that decision, updates its state, and repeats until termination.
That's it. Three primitives:
- A decision function (the planner).
- An effect mechanism (tool calls — possibly producing observations).
- A termination condition (a final-answer step, a step budget, or an error).
Plus state: the conversation so far, intermediate tool results, persistent facts. We'll call this memory — split into a transient scratchpad and a persistent store.
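One way to picture the split is as two containers with different lifetimes. The sketch below is illustrative only: the names `Scratchpad`, `PersistentStore`, `AgentState`, and `new_run` are hypothetical and are not taken from the project code.

```python
from dataclasses import dataclass, field


@dataclass
class Scratchpad:
    """Transient memory: lives for a single run of the loop."""
    messages: list[str] = field(default_factory=list)       # conversation so far
    tool_results: list[dict] = field(default_factory=list)  # intermediate tool results


@dataclass
class PersistentStore:
    """Persistent memory: facts that survive across runs."""
    facts: dict[str, str] = field(default_factory=dict)


@dataclass
class AgentState:
    scratchpad: Scratchpad = field(default_factory=Scratchpad)
    store: PersistentStore = field(default_factory=PersistentStore)

    def new_run(self) -> None:
        """Reset the transient memory; persistent facts are kept."""
        self.scratchpad = Scratchpad()


state = AgentState()
state.scratchpad.messages.append("She go to school.")
state.store.facts["recurring_mistake"] = "third-person -s"
state.new_run()  # scratchpad is emptied, the stored fact remains
```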
That definition fits LangChain agents, AutoGen, GPT-4's function-calling loop, and the one we're about to build. The differences are in implementation, not in concept.
What an agent isn't (in this project)
- A reasoner. The model still doesn't reason in any deep philosophical sense — it samples from a distribution. The "agent" wrapper exposes that sampling in a structured way (tool calls, observations) but doesn't grant new cognitive abilities.
- An autonomous entity. Our agent runs when invoked and terminates. It has no goals it sets itself, no schedule, no awareness. Saying "the agent decided to call tool X" is a useful figure of speech, not a metaphysical claim.
- A multi-step planner that compiles before acting. Some literature uses "agent" for systems that first build a plan (a graph of intended actions) and then execute. Phase 32 uses ReAct (Yao et al. 2022): decide one action, do it, observe, decide the next. Plan-and-execute is mentioned in lab 01 for contrast.
- A general assistant. The grammar tutor does one thing: corrects English verb conjugations per §A13. Out-of-scope requests get refused, not generously interpreted.
Why the loop matters
A single forward pass through a transformer is a stateless function: tokens in, tokens out. Real applications need state — "what did I just look up?" "what did the user already correct?" "is this a recurring mistake?". The loop is where state lives:
state = init_state(user_input)
while not done:
    decision = planner(state)              # the model picks the next move
    if is_final_answer(decision):
        return decision.answer
    observation = call_tool(decision)      # the world produces a result
    state = update(state, decision, observation)
Six lines. Every component beyond this — memory, sandboxing, planner constraints, tool selection — is an enhancement of one of those six lines.
Two emphases that shape the rest of this phase
Emphasis 1 — the planner emits structured decisions, not free text
A naive "agent" prompts the model with "Should I call a tool or give a final answer?" and parses the resulting prose. This is unreliable for any model under 70B parameters (and is questionable above that). We use a different approach:
The planner emits JSON that matches a strict schema, enforced by JSONSchemaMask (from Phase 30). The model cannot produce invalid output by construction.
Mechanism: at each token-generation step, we compute which tokens are valid continuations under the schema and set the logit of every other token to \(-\infty\), so that its sampling probability is exactly zero. The result: the planner's output is always either {"next": "tool_call", "tool": <enum>, "args": {...}} or {"next": "final_answer", "answer": ...}. Never anything else. This eliminates an entire category of failure mode.
We covered the technique in Phase 30. Phase 32 uses it.
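The effect of the mask can be illustrated with a toy vocabulary. This is not the `JSONSchemaMask` implementation from Phase 30, whose interface is not shown in this file; it only demonstrates what setting disallowed logits to \(-\infty\) does to the sampling distribution. The vocabulary, the logits, the allowed set, and the helper name `masked_softmax` are made up for the example, and the helper assumes at least one token is allowed.

```python
import math


def masked_softmax(logits: list[float], allowed: set[int]) -> list[float]:
    """Softmax over logits after setting every disallowed index to -inf."""
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    peak = max(masked)                            # subtract the max for stability
    exps = [math.exp(x - peak) for x in masked]   # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


vocab = ['{"next": "tool_call"', '{"next": "final_answer"', "Sure, I can help"]
logits = [1.0, 0.5, 3.0]   # unconstrained, the free-text token would win
allowed = {0, 1}           # the schema permits only the two JSON openings

probs = masked_softmax(logits, allowed)
best = vocab[max(range(len(vocab)), key=probs.__getitem__)]
```

After masking, the free-text continuation has probability zero and the remaining mass is renormalized over the two schema-valid openings, so `best` is the tool-call opening even though its raw logit was not the largest.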
Emphasis 2 — the agent terminates under bounded resources
The agent has a hard step budget (typical: 8). If it hits the budget without producing a final answer, it returns a structured "could not converge" result. Together, the step budget and deduplication of repeated tool calls prevent infinite loops. This is the difference between a system and a demo.
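A sketch of how the budget and deduplication might fit together, assuming a planner callable that returns a dict in the schema from Emphasis 1. The names `run_bounded` and `MAX_STEPS`, the shape of the "could not converge" result, and the two test planners are assumptions. This version treats any exact repeat of a tool call as a loop and stops immediately, which is only one possible deduplication policy.

```python
import json

MAX_STEPS = 8  # the "typical" budget; an assumed default here


def run_bounded(planner, call_tool, user_input: str,
                max_steps: int = MAX_STEPS) -> dict:
    state = {"input": user_input, "observations": []}
    seen: set[str] = set()   # canonical form of every tool call made so far
    for step in range(1, max_steps + 1):
        decision = planner(state)
        if decision["next"] == "final_answer":
            return {"status": "ok", "answer": decision["answer"], "steps": step}
        key = json.dumps([decision["tool"], decision["args"]], sort_keys=True)
        if key in seen:      # same tool, same args: the planner is looping
            return {"status": "could_not_converge",
                    "reason": "repeated_call", "steps": step}
        seen.add(key)
        state["observations"].append(call_tool(decision["tool"], decision["args"]))
    return {"status": "could_not_converge",
            "reason": "step_budget", "steps": max_steps}


def stuck_planner(state):    # always asks for the same thing
    return {"next": "tool_call", "tool": "conjugate", "args": {"verb": "go"}}


def fresh_planner(state):    # never repeats a call, never finishes
    n = len(state["observations"])
    return {"next": "tool_call", "tool": "conjugate", "args": {"verb": f"verb{n}"}}


def echo_tool(tool, args):   # trivial stand-in for a real tool
    return args["verb"]


looping = run_bounded(stuck_planner, echo_tool, "She go to school.")
exhausted = run_bounded(fresh_planner, echo_tool, "She go to school.")
```

The first run stops at step 2 on the repeated call; the second never repeats itself, so only the step budget ends it.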
We'll measure mean steps per correction across the 30-sentence canonical test set. For §A13's verb corrections, \(\mu \approx 3\) steps (parse → look up form → check agreement → answer). If you measure \(\mu > 5\) in lab 01, your planner is looping; debug.
The §A13 grammar tutor — concrete shape
Input: an English sentence.
Output: a CorrectionResult (dataclass):
from dataclasses import dataclass

@dataclass
class CorrectionResult:
    original: str               # input verbatim
    corrected: str | None       # corrected sentence, or None if no correction
    rationale: list[str]        # human-readable explanations (1-3 items)
    spanish_gloss: str | None   # Spanish translation, only if corrected ≠ original
    in_scope: bool              # False if the sentence is outside §A13's verb set
    tool_trace: list[ToolCall]  # the path through tool space (for debugging)
Tools (from Phase 31, MCP-served; canonical names per docs/phase-31-tools-mcp/lab/00-typed-tools.md):
- conjugate(verb: str, tense: str, person: str) → str: returns the expected conjugated form.
- lookup_irregular_verb(verb: str) → dict: principal parts plus an is_irregular flag; used to drive the rationale.
- check_subject_verb_agreement(subject: str, verb_form: str) → dict: does the form match the subject? Returns {agrees, expected_form}.
- lookup_spanish(english_form: str) → str: paired Spanish form for the conjugated English form.
The agent's job: decide which of these to call, in what order, with what arguments, and when to stop. That's the agent loop.
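The planner's `tool` and `args` fields map naturally onto a name-to-function table. The sketch below uses local stubs with the four signatures listed above; in the project these tools are served over MCP from Phase 31, so the stub bodies, the tiny lookup tables, the `"3sg"` person label, the keys of the principal-parts dict, and the `dispatch` helper are all assumptions for illustration.

```python
def conjugate(verb: str, tense: str, person: str) -> str:
    forms = {("go", "present", "3sg"): "goes", ("go", "past", "3sg"): "went"}
    return forms[(verb, tense, person)]


def lookup_irregular_verb(verb: str) -> dict:
    parts = {"go": {"base": "go", "past": "went", "past_participle": "gone"}}
    return {"is_irregular": verb in parts, **parts.get(verb, {})}


def check_subject_verb_agreement(subject: str, verb_form: str) -> dict:
    expected = {"she": "goes", "they": "go"}[subject.lower()]
    return {"agrees": verb_form == expected, "expected_form": expected}


def lookup_spanish(english_form: str) -> str:
    return {"goes": "va", "went": "fue"}[english_form]


# Registry keyed by the canonical tool names.
TOOLS = {f.__name__: f for f in (conjugate, lookup_irregular_verb,
                                 check_subject_verb_agreement, lookup_spanish)}


def dispatch(decision: dict):
    """Run the tool named in a planner decision with its keyword arguments."""
    if decision["tool"] not in TOOLS:
        raise KeyError(f"unknown tool: {decision['tool']}")
    return TOOLS[decision["tool"]](**decision["args"])


observation = dispatch({
    "next": "tool_call",
    "tool": "check_subject_verb_agreement",
    "args": {"subject": "She", "verb_form": "go"},
})
```

Because the schema restricts `tool` to an enum, the unknown-tool branch should be unreachable when the mask is active; it is kept here as a cheap guard for unmasked testing.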
What you should be able to do after this phase
- Write down the agent loop from memory in 6–10 lines.
- Explain when ReAct beats plan-and-execute and when it doesn't. (Roughly: ReAct for short interaction depths and uncertain tool calls; plan-and-execute when the plan can be cheaply computed up front.)
- Identify three failure modes of agent loops and the mitigations: looping (step budget + dedup), planner hallucination (JSON schema mask), tool misbehavior (sandbox).
- Read any agent framework's source — LangChain, AutoGen, Anthropic's tool-use cookbook — and locate the corresponding 6 lines.
What this file does NOT cover
- The math of the planner's masked decoding. Phase 30.
- The MCP wire format. Phase 31.
- The training of an agent. Phase 28's fine-tuning could be applied to agent traces, but Phase 32 does not do so. Real-world agent training (e.g., DPO on tool trajectories) is one of several pieces left out.
- Multi-agent debate, swarms, etc. Out of scope.
Next: 01-react-and-planning.md