03 — Sandboxing untrusted tools

The Phase 31 tools are purely read-only, so they do not need a sandbox. But if we ever add a tool that executes user code, browses the web, or calls an external API, it has to run in isolation. This phase builds the pattern now, while the cost is low, so that it is ready when it matters.

The threat model

A tool the agent calls is, in general, untrusted code:

  • It can hang (infinite loop), exhausting the agent's step budget for no good reason.
  • It can OOM (allocate gigabytes), crashing the agent process.
  • It can shell out, read sensitive files, write to disk, send network requests.
  • It can crash, leaving the agent process in an undefined state.

For Phase 32, the registered tools (conjugate, lookup_irregular_verb, check_subject_verb_agreement, lookup_spanish) are pure-data lookups against the §A13 verb table, so none of the failure modes above applies to them. But the sandbox infrastructure should be built now, while the cost is low, so we don't add it under fire when a misbehaving tool ships in Phase 33+.

This is exactly the principle from Phase 22 (KV cache): build the discipline before you need it.

Three levels of containment

PERMISSIVE — no sandbox

def call(tool, args): return tool(**args)

In-process, no boundary. Use for pure-data tools that you wrote and trust.

For Phase 32's tools, this is the default. They're all PERMISSIVE.
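As a slightly fuller sketch of the in-process path: the only protection PERMISSIVE can offer is turning ordinary exceptions into error results. The ToolResult shape and the lookup tool below are hypothetical stand-ins, not the course's actual definitions. A hang or an OOM inside the tool would still take the agent down with it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToolResult:  # hypothetical shape; the real ToolResult may differ
    ok: bool
    value: Any = None
    error: Optional[str] = None


def call_in_process(tool: Callable[..., Any], args: dict) -> ToolResult:
    """PERMISSIVE: run the tool in the agent's own process."""
    try:
        return ToolResult(ok=True, value=tool(**args))
    except Exception as exc:  # ordinary failures become error results
        return ToolResult(ok=False, error=f"{type(exc).__name__}: {exc}")


def lookup(verb: str) -> str:  # stand-in for a pure-data lookup tool
    table = {"ser": "irregular", "hablar": "regular"}
    return table[verb]


print(call_in_process(lookup, {"verb": "ser"}))
print(call_in_process(lookup, {"verb": "xyz"}))
```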

SUBPROCESS — separate process, bounded resources

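import json, subprocess, sys

def call(tool, args):
    # Pass args as JSON on stdin instead of interpolating repr(args) into source.
    code = (f"import json, sys; from {tool.module} import {tool.name}; "
            f"print(json.dumps({tool.name}(**json.load(sys.stdin))))")
    proc = subprocess.run(
        [sys.executable, "-c", code], input=json.dumps(args),
        capture_output=True, timeout=tool.timeout_s, text=True,
        preexec_fn=child_setup,  # Unix only; rlimits, defined in the recipe below
    )
    proc.check_returncode()  # a crashed child raises CalledProcessError
    return json.loads(proc.stdout)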

The tool runs in a child process. The wall-clock timeout comes from subprocess.run(timeout=...), which kills the child and raises subprocess.TimeoutExpired in the parent. CPU and memory limits come from resource.setrlimit, called in the child's preexec_fn hook. A crash in the child does not take the parent down.

Use for tools that should be safe but might misbehave (e.g., a verb conjugator that parses unstructured input, where malformed input might trigger a slow path).
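One way the parent might turn child failures into ordinary error values is sketched below. The run_snippet helper, the dict result shape, and the "sandbox: ..." error strings are illustrative assumptions; the sketch runs inline code strings rather than real registered tools so that it stays self-contained.

```python
import json
import subprocess
import sys


def run_snippet(code: str, args: dict, timeout_s: float) -> dict:
    """Run `code` in a child interpreter; args go in as JSON on stdin."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=json.dumps(args),
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return {"ok": False, "error": "sandbox: timeout"}
    if proc.returncode != 0:
        return {"ok": False, "error": f"sandbox: exit {proc.returncode}"}
    try:
        return {"ok": True, "value": json.loads(proc.stdout)}
    except json.JSONDecodeError:
        return {"ok": False, "error": "sandbox: bad output"}


DOUBLE = "import json, sys; a = json.load(sys.stdin); print(json.dumps(a['n'] * 2))"
SLEEP = "import time; time.sleep(30)"

print(run_snippet(DOUBLE, {"n": 21}, timeout_s=10))
print(run_snippet(SLEEP, {}, timeout_s=0.5))
```

Every failure path returns a value instead of raising, so the agent loop can treat a sandbox failure like any other tool error.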

CONTAINER — full isolation

For tools that execute arbitrary user code or shell out, run the tool inside a Docker / Firejail / nsjail container with no network, read-only filesystem, dropped capabilities.

Out of scope for Phase 32 (we have no such tools yet). Mentioned for completeness — and so Borja sees the right abstraction (a policy enum that selects the right strategy per tool) even though we only implement two levels.
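Purely to illustrate what a CONTAINER runner could eventually look like, here is a sketch that builds (but does not execute) a locked-down docker run command line. The docker_argv helper, the image name, and the memory and pid limits are assumptions chosen for the example, not part of this phase.

```python
from typing import List


def docker_argv(image: str, command: List[str], memory: str = "256m") -> List[str]:
    """Build (but do not run) a locked-down `docker run` command line."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--memory", memory,                     # memory cap (assumed value)
        "--pids-limit", "64",                   # process cap (assumed value)
        image, *command,
    ]


# The image name is only an example.
argv = docker_argv("python:3.12-slim", ["python", "-c", "print('hi')"])
print(" ".join(argv))
```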

The SandboxPolicy enum and the dispatcher

from enum import Enum

class SandboxPolicy(Enum):
    PERMISSIVE = "permissive"   # in-process
    SUBPROCESS = "subprocess"   # separate process, timeout + rlimits
    CONTAINER  = "container"    # full isolation (Phase 32: NotImplementedError)


def run_under_sandbox(
    tool: Tool,
    args: dict,
    policy: SandboxPolicy = SandboxPolicy.PERMISSIVE,
) -> ToolResult:
    match policy:
        case SandboxPolicy.PERMISSIVE:
            return _call_in_process(tool, args)
        case SandboxPolicy.SUBPROCESS:
            return _call_in_subprocess(tool, args)
        case SandboxPolicy.CONTAINER:
            raise NotImplementedError("CONTAINER policy planned for Phase 33+")

The agent calls run_under_sandbox with the policy declared in the tool's registration metadata. This decouples the policy from the loop — adding a new tool requires only its declaration.
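A minimal sketch of what "policy declared in the tool's registration metadata" could mean in practice. ToolSpec, REGISTRY, register, and dispatch are hypothetical names introduced for illustration; the course's real registry may look different.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict


class SandboxPolicy(Enum):
    PERMISSIVE = "permissive"
    SUBPROCESS = "subprocess"
    CONTAINER = "container"


@dataclass(frozen=True)
class ToolSpec:  # hypothetical registration record
    name: str
    fn: Callable[..., Any]
    policy: SandboxPolicy = SandboxPolicy.PERMISSIVE
    timeout_s: float = 5.0


REGISTRY: Dict[str, ToolSpec] = {}


def register(spec: ToolSpec) -> None:
    REGISTRY[spec.name] = spec


def dispatch(name: str, args: dict) -> Any:
    """The loop only knows tool names; the policy comes from the registry."""
    spec = REGISTRY[name]
    if spec.policy is SandboxPolicy.PERMISSIVE:
        return spec.fn(**args)
    # SUBPROCESS / CONTAINER would be routed to their runners here.
    raise NotImplementedError(f"{spec.policy.value} runner not shown in this sketch")


register(ToolSpec("lookup_spanish", lambda word: {"hola": "hello"}.get(word)))
register(ToolSpec("run_user_code", lambda src: None, policy=SandboxPolicy.CONTAINER))

print(dispatch("lookup_spanish", {"word": "hola"}))
```

Changing a tool's containment level is then a one-line change to its registration, with no edits to the agent loop.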

Resource limits — the SUBPROCESS recipe

In the child process's preexec_fn (Unix):

import resource

def child_setup():
    # CPU time: 5 seconds.
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))
    # Address space (virtual memory): 256 MB.
    resource.setrlimit(resource.RLIMIT_AS, (256 * 1024 * 1024, 256 * 1024 * 1024))
    # No core dumps.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
    # File size: 10 MB.
    resource.setrlimit(resource.RLIMIT_FSIZE, (10 * 1024 * 1024, 10 * 1024 * 1024))

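Note: RLIMIT_AS is enforced on Linux but not reliably on macOS; on macOS use RLIMIT_DATA or skip the memory limit. RLIMIT_AS does not kill the process: the kernel denies allocations beyond the limit, so the tool raises MemoryError, which is cleaner than killing. RLIMIT_CPU works differently: the kernel sends SIGXCPU at the soft limit and SIGKILL at the hard limit, so a CPU-bound tool is terminated by signal rather than by a Python exception.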

The wall-clock timeout (subprocess.run(timeout=...)) is the hard backstop: if the rlimits don't bite for some reason, the timeout will.
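One case where the rlimits do not bite is a tool that waits instead of computing: a sleeping process uses almost no CPU time, so RLIMIT_CPU never fires. The sketch below (Unix only) contrasts the two cases. The limit_cpu and outcome helpers are hypothetical, and the 1-second CPU limit is chosen only to keep the demo short.

```python
import resource
import subprocess
import sys


def limit_cpu(seconds: int):
    """Return a preexec hook that caps the child's CPU time (Unix only)."""
    def hook():
        resource.setrlimit(resource.RLIMIT_CPU, (seconds, seconds))
    return hook


def outcome(code: str, cpu_s: int, wall_s: float) -> str:
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            preexec_fn=limit_cpu(cpu_s),
            capture_output=True,
            timeout=wall_s,
        )
    except subprocess.TimeoutExpired:
        return "wall-clock timeout"
    # A negative return code means the child was killed by a signal.
    return "killed by signal" if proc.returncode < 0 else f"exit {proc.returncode}"


# Busy loop: burns CPU, so the CPU rlimit is what stops it.
print(outcome("while True: pass", cpu_s=1, wall_s=30))
# Sleep: burns no CPU, so only the wall-clock timeout can stop it.
print(outcome("import time; time.sleep(60)", cpu_s=1, wall_s=2))
```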

Network policy

By default, the SUBPROCESS policy does not restrict network access — the child inherits the parent's network namespace. To enforce a network policy, use one of:

  • firejail --net=none python -c ... — denies network access entirely.
  • unshare -n — Linux namespace isolation; child has its own (empty) network namespace.
  • Docker / nsjail — for full isolation (CONTAINER policy).

For Phase 32, all tools are local data lookups; no network needed. We default to "no network restriction" in SUBPROCESS and document that tools requiring network and sandboxing must use CONTAINER.
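If a later phase did want network denial without a full container, the first two options above amount to prefixing the child's command line. The without_network helper below is a hypothetical sketch that only builds the argument list. It assumes a Linux host with unshare (util-linux) or firejail installed, and the unshare variant additionally assumes unprivileged user namespaces are enabled.

```python
from typing import List


def without_network(command: List[str], method: str = "unshare") -> List[str]:
    """Prefix `command` so it starts with no usable network (Linux only)."""
    if method == "unshare":
        # -r maps the caller to root inside a new user namespace so that -n
        # (new, empty network namespace) works without real root.
        return ["unshare", "-r", "-n", "--", *command]
    if method == "firejail":
        return ["firejail", "--quiet", "--net=none", "--", *command]
    raise ValueError(f"unknown method: {method}")


print(without_network(["python", "-c", "pass"]))
print(without_network(["python", "-c", "pass"], method="firejail"))
```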

Testing the sandbox — adversarial tools

To prove the sandbox works, we need misbehaving tools to test against. Lab 02 introduces three:

  • evil_infinite_loop() runs while True: pass. Should be terminated by RLIMIT_CPU or by the subprocess.run timeout.
  • evil_memory_eater() runs x = bytearray(10**10). Should raise MemoryError under RLIMIT_AS.
  • evil_fork_bomb() runs the equivalent of :(){ :|:& };:. Should be limited by RLIMIT_NPROC, which the recipe above does not set and would need to add.

For each: confirm the parent agent survives, the tool returns a ToolError("sandbox: timeout") (or similar), and the loop continues to the next step (or terminates with a budget-exhausted error).
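A rough idea of what such a check might look like for the first two tools (Unix only; the fork bomb is deliberately left out of this sketch). The run_evil helper and the outcome labels are hypothetical, and the 1 GiB address-space limit is an assumption chosen to leave the interpreter headroom, rather than the 256 MB used in the recipe.

```python
import resource
import subprocess
import sys

MEM_LIMIT = 1024 * 1024 * 1024  # 1 GiB (assumed; the recipe uses 256 MB)


def cap_memory():
    # Runs in the child just before exec (Unix only).
    resource.setrlimit(resource.RLIMIT_AS, (MEM_LIMIT, MEM_LIMIT))


def run_evil(code: str, timeout_s: float = 10.0) -> str:
    """Run an adversarial snippet in a child and classify what happened."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            preexec_fn=cap_memory,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "sandbox: timeout"
    if "MemoryError" in proc.stderr:
        return "sandbox: memory"
    return "ok" if proc.returncode == 0 else f"sandbox: exit {proc.returncode}"


EVIL_MEMORY_EATER = "x = bytearray(10**10)"
EVIL_INFINITE_LOOP = "while True: pass"

results = [run_evil(EVIL_MEMORY_EATER), run_evil(EVIL_INFINITE_LOOP, timeout_s=1.0)]
print(results)  # the parent is still alive to print this
```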

This is a real test, not a hypothetical. Borja will write and run these tools. The point is to feel the sandbox catch a misbehaving program — building confidence that the abstraction works.

Sandbox vs trust — a security principle

The sandbox is defence in depth, not a substitute for trust:

  • Don't run sandboxed code on production data.
  • Don't assume sandboxing makes arbitrary user code safe — kernel CVEs exist.
  • Don't fall into "I sandboxed it, so it's fine" — sandboxes have escape vectors.

For a §A13 grammar tutor, the threat model is low (no untrusted code). The sandbox is infrastructure for the future: when Phase 33+ adds tools that might be untrusted, the pattern is ready.

A note on "the sandbox doesn't matter" objection

A reader might say: "Phase 32's tools are pure data; this is over-engineering."

Reply: building the sandbox now costs ~200 lines of code. Adding it under fire later (when a tool starts misbehaving in production) is much more expensive: you'll be debugging a hung agent in the middle of a corrupted session, with users complaining. The cheaper side of that trade-off is to build the lightweight version now.

This is the same argument as "why bother with the SBOM in Phase 0" or "why bother with the manifest.json for every experiment." Disciplines that are cheap to maintain and expensive to retrofit should be installed at the earliest opportunity.

What this file does NOT cover

  • Network policy in CONTAINER mode. Docker / firejail are mentioned but not implemented. Phase 33+.
  • Capability dropping and privilege restriction (e.g., prctl(PR_SET_NO_NEW_PRIVS)). Useful for deeper sandboxing; out of scope.
  • gVisor, Firecracker, lightweight VMs. Production-grade isolation. Out of scope.

Next: ../lab/00-planner-by-mask.md