Phase 32 — Agents: Planning, Memory, Sandboxing (Capstone Application)

Requires: 29 — Retrieval-Augmented Generation (RAG) · 31 — Tool Use & the Model Context Protocol (MCP)

Teaches: agents · react-loop · planning · agent-memory · sandboxing

Jump to any chapter from the phase reference index.

Chapter map

The capstone phase. Here the model (Phases 17–22), the tools (Phase 31), constrained decoding (Phase 30), and RAG (Phase 29) come together to form a grammar tutor that corrects English sentences and, where appropriate, glosses the corrected form in Spanish. This is not a toy demo: it is the book's product.

Anchors: LYNX_CORTEX.md §4 / PHASE 32, PHASE_32_PLAN.md, LYNX_CORTEX_ADDENDUM.md §A13 (the grammar-tutor framing).

Why this phase exists

The thirty-one previous phases all converge here. Phase 17 gave us the model architecture; Phases 18–22 trained and served it; Phase 28 fine-tuned it for instruction following; Phase 30 made it emit structured output; Phase 31 wired it to tools via MCP. Phase 32 puts these in a closed loop with a planner, scratchpad, persistent memory, and a sandbox — that is, it turns a model into an agent.

The capstone product is a grammar tutor: it reads an English sentence, identifies the verb and inferred subject, calls a small set of tools (verb-form lookup, agreement check, regularity classifier), produces a structured correction, and — when the corrected form differs from the original — provides a Spanish gloss. Per LYNX_CORTEX_ADDENDUM.md §A13, this is the project's defining application.
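The "gloss only when the correction changed something" rule can be sketched as a small data type. This is a minimal illustration, not the project's actual types.py: the `Correction` class, its field names, and the `build_correction` helper are all hypothetical, and the gloss function is passed in as a plain callable so the sketch makes no assumption about how glossing is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Correction:
    """Hypothetical structured result of one tutor run."""
    original: str
    corrected: str
    gloss_es: Optional[str] = None  # Spanish gloss, present only if the text changed

    @property
    def changed(self) -> bool:
        return self.corrected != self.original


def build_correction(
    original: str,
    corrected: str,
    gloss_fn: Callable[[str], str],
) -> Correction:
    """Attach a Spanish gloss only when the corrected form differs."""
    gloss = gloss_fn(corrected) if corrected != original else None
    return Correction(original=original, corrected=corrected, gloss_es=gloss)
```

An already-correct sentence therefore yields a `Correction` whose `gloss_es` is `None`, which lets callers distinguish "nothing to fix" from "fixed and glossed" without comparing strings again.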

What you'll build

                                                  ┌─────────┐
                                                  │ Memory  │
                                                  │ (long)  │
                                                  └────▲────┘
                                                       │
   User sentence                                       │
        │                                              │
        ▼                                              │
   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌───┴─────┐
   │  Agent   │───▶│ Planner  │───▶│ Tool via │───▶│ Scratch │
   │  (loop)  │◀───│ (masked) │◀───│   MCP    │◀───│   pad   │
   └──────────┘    └──────────┘    └──────────┘    └─────────┘
        │                                              │
        ▼                                              │
   Correction                                          │
   (structured)                                        ▼
                                                  ┌─────────┐
                                                  │ Sandbox │
                                                  │ policy  │
                                                  └─────────┘

Five components, all in src/miniagent/:

agent.py — GrammarTutorAgent, the orchestrator. Holds the loop.
planner.py — masked-generation planner. Emits ToolCall or FinalAnswer.
memory.py — ScratchpadMemory (per correction) + LongTermMemory (across corrections).
sandbox.py — SandboxPolicy enum + a run_under_sandbox wrapper.
types.py — the dataclasses that flow through the loop.
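To make "emits ToolCall or FinalAnswer" concrete, here is one possible shape for those two types and a parser for the planner's JSON output. The names `ToolCall` and `FinalAnswer` come from the list above; the field names, the `kind` discriminator, and `parse_planner_step` are assumptions made for illustration and may not match the real types.py.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass(frozen=True)
class ToolCall:
    tool: str                                        # assumed field: tool name
    arguments: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class FinalAnswer:
    corrected: str                                   # assumed field: final sentence


PlannerStep = Union[ToolCall, FinalAnswer]


def parse_planner_step(raw: str) -> PlannerStep:
    """Turn one JSON object from the planner into a typed step.

    Assumes a hypothetical schema with a "kind" discriminator; a masked
    planner would guarantee the JSON is well-formed, but the parser still
    rejects unknown kinds explicitly.
    """
    obj = json.loads(raw)
    kind = obj.get("kind")
    if kind == "tool_call":
        return ToolCall(tool=obj["tool"], arguments=obj.get("arguments", {}))
    if kind == "final_answer":
        return FinalAnswer(corrected=obj["corrected"])
    raise ValueError(f"unknown planner step kind: {kind!r}")
```

Keeping the planner's output to exactly two variants means the loop needs only one `isinstance` check per step to decide whether to call a tool or stop.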

The deliverable: the agent correctly handles at least 90% (27 of 30) of the canonical test sentences.
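A pass/fail check for that threshold could look like the sketch below. It assumes each canonical case is a (sentence, expected correction) pair and that a case passes on exact string match; both are assumptions, since the real lab may score differently. The function names are hypothetical.

```python
from typing import Callable, Iterable, Tuple


def count_passes(
    correct_fn: Callable[[str], str],
    cases: Iterable[Tuple[str, str]],
) -> Tuple[int, int]:
    """Return (passed, total) for a corrector over (sentence, expected) pairs."""
    passed = 0
    total = 0
    for sentence, expected in cases:
        total += 1
        if correct_fn(sentence) == expected:  # assumed criterion: exact match
            passed += 1
    return passed, total


def meets_deliverable(passed: int, total: int, threshold_pct: int = 90) -> bool:
    """Integer comparison avoids floating-point edge cases at exactly 90%."""
    if total <= 0:
        raise ValueError("need at least one test case")
    return passed * 100 >= threshold_pct * total
```

With 30 cases, 27 passes meets the bar and 26 does not.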

Files

phase-32-agents/
├── README.md                          # this file
├── theory/
│   ├── 00-motivation.md              # what an agent is, what it isn't
│   ├── 01-react-and-planning.md      # ReAct, plan-and-execute, constrained planners
│   ├── 02-memory.md                  # scratchpad vs long-term; what to persist
│   └── 03-sandboxing.md              # subprocess, rlimits, network policy
├── lab/
│   ├── 00-planner-by-mask.md         # implement planner under JSONSchemaMask
│   ├── 01-tutor-end-to-end.md        # run on 30 canonical sentences
│   ├── 02-sandbox-an-evil-tool.md    # prove containment
│   └── 03-failure-mode-tour.md       # induce + diagnose 4 classic agent failure modes
├── solutions/                         # populated at phase-open; do NOT read first
├── notebooks/
└── diagrams/                          # ReAct loop, scratchpad lifecycle, steps histogram
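The tree lists theory/03-sandboxing.md as covering subprocess and rlimits. As a rough preview of that idea, the sketch below runs a command in a child process with a CPU-time limit. `SandboxPolicy` and `run_under_sandbox` are names from this README, but the enum members, the signature, and the limits shown here are assumptions. The sketch is POSIX-only (the standard `resource` module is not available on Windows) and does not restrict network or filesystem access.

```python
import resource
import subprocess
import sys
from enum import Enum
from typing import List


class SandboxPolicy(Enum):
    # Hypothetical members; the real enum may define different policies.
    STRICT = "strict"
    RELAXED = "relaxed"


# Assumed CPU-time budgets in seconds, chosen only for illustration.
_CPU_SECONDS = {SandboxPolicy.STRICT: 1, SandboxPolicy.RELAXED: 5}


def run_under_sandbox(
    argv: List[str],
    policy: SandboxPolicy = SandboxPolicy.STRICT,
    wall_timeout: float = 10.0,
) -> subprocess.CompletedProcess:
    """Run argv in a child process under a CPU rlimit plus a wall-clock timeout."""
    cpu = _CPU_SECONDS[policy]

    def limit_child() -> None:
        # Runs in the child just before exec; the kernel terminates the
        # process with a signal once it exceeds the CPU budget.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu, cpu))

    return subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=wall_timeout,   # backstop for children that sleep instead of spin
        preexec_fn=limit_child,
        check=False,
    )


if __name__ == "__main__":
    result = run_under_sandbox([sys.executable, "-c", "while True: pass"])
    print("child return code:", result.returncode)  # negative: killed by a signal
```

The wall-clock timeout complements the CPU limit: a tool that blocks without burning CPU never trips `RLIMIT_CPU`, so `subprocess.run` raises `TimeoutExpired` instead.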

What this phase does NOT cover

Multi-turn dialog. Phase 32's tutor is single-turn (correct(sentence) → correction). Multi-turn dialog would belong to a separate "conversation" phase, which is not on the 40-phase roadmap.
User authentication. Phase 33's HTTP layer adds sessions. Phase 32's LongTermMemory is keyed by a hard-coded learner name (borja).
Tool training / tool-use fine-tuning. Tools are wired by structured generation (Phase 30) and MCP (Phase 31), not learned. A real production agent might fine-tune on tool-call traces; Phase 32 does not.
Multi-agent systems. Single agent, one model, one planner. Multi-agent orchestration (debate, supervisor-worker) is interesting but out of scope.
Vector-DB retrieval inside the agent loop. Phase 29 (RAG) is a batch tool the agent can call, not the agent's core memory. The agent's own memory is structured and small.
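To show what "structured and small" memory keyed by a learner name might mean in practice, here is a sketch of the two memory classes. `ScratchpadMemory` and `LongTermMemory` are names from this README; everything else (the JSON file layout, the per-verb error counts, the method names) is an assumption for illustration, and the real memory.py may persist something different.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class ScratchpadMemory:
    """Per-correction notes; discarded when the correction finishes."""
    steps: List[str] = field(default_factory=list)

    def note(self, entry: str) -> None:
        self.steps.append(entry)


class LongTermMemory:
    """Across-correction store, keyed by learner name (assumed JSON file)."""

    def __init__(self, path: Path, learner: str = "borja") -> None:
        self.path = path
        self.learner = learner

    def _load(self) -> Dict[str, Dict[str, int]]:
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return {}

    def record_error(self, verb: str) -> None:
        """Increment the learner's error count for one verb (hypothetical schema)."""
        data = self._load()
        counts = data.setdefault(self.learner, {})
        counts[verb] = counts.get(verb, 0) + 1
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    def error_counts(self) -> Dict[str, int]:
        return self._load().get(self.learner, {})
```

A plain dictionary on disk is enough at this scale, which is the contrast with a vector store: lookups are by exact key, not by similarity.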

Phase-open checklist (per `CLAUDE.md` §1)

Re-read PHASE_32_PLAN.md §§0–8.
Re-read LYNX_CORTEX_ADDENDUM.md §A13 — the §A13 verb set and the tutor specification.
Re-read Phase 30 (structured generation) and Phase 31 (MCP) — Phase 32 builds directly on both.
Run the Phase 31 tools to confirm they still work end-to-end (sanity check).
Open src/miniagent/BLUEPRINT.md. The Phase 32 additions section should already be drafted; review it before writing code.

A note on "agent" hype

The word "agent" carries a lot of marketing weight. For this project, an agent is precisely: a loop around a model that holds state, calls tools, and terminates. Nothing more. There's no implicit claim of "intelligence" or "autonomy" beyond what the loop and the tools provide. The §A13 grammar tutor is a perfectly respectable agent that does exactly one thing well.

This framing matters because the anti-goal §10 in LYNX_CORTEX.md (at repo root) excludes langchain / llama-index — those libraries package the loop into an opaque abstraction. Phase 32 builds the loop by hand, so Borja can read any agent framework's source and recognise the moving parts.
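The definition above (a loop around a model that holds state, calls tools, and terminates) fits in a few lines when written by hand. The sketch below is an illustration under stated assumptions, not the GrammarTutorAgent itself: the planner is any callable that maps the sentence and scratchpad to a step, the field names on `ToolCall` and `FinalAnswer` are hypothetical, and `max_steps` is an assumed guard that forces termination.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Union


@dataclass(frozen=True)
class ToolCall:
    tool: str
    arguments: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class FinalAnswer:
    corrected: str


Step = Union[ToolCall, FinalAnswer]
Planner = Callable[[str, List[str]], Step]


def run_loop(
    sentence: str,
    planner: Planner,
    tools: Dict[str, Callable[..., Any]],
    max_steps: int = 8,
) -> str:
    """Hold state (scratchpad), call tools, and terminate."""
    scratchpad: List[str] = []
    for _ in range(max_steps):
        step = planner(sentence, scratchpad)
        if isinstance(step, FinalAnswer):
            return step.corrected                      # terminates: answer found
        if step.tool not in tools:
            scratchpad.append(f"error: unknown tool {step.tool!r}")
            continue
        result = tools[step.tool](**step.arguments)    # calls a tool
        scratchpad.append(f"{step.tool}({step.arguments}) -> {result!r}")
    raise RuntimeError(f"no final answer within {max_steps} steps")  # terminates: budget spent
```

Every exit path is explicit: the loop returns on a `FinalAnswer` or raises when the step budget runs out, so it cannot spin forever on a planner that keeps requesting tools.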

Next: theory/00-motivation.md