Lab 02 — Sandbox containment of an evil tool

Read theory/03-sandboxing.md. Do not consult solutions/.

Objective

Build three deliberately-misbehaving tools (infinite loop, memory eater, fork bomb), register them with the agent, and verify that the SUBPROCESS sandbox contains each. The parent agent process must survive all three; each misbehaving tool must return a clean ToolError. This is the security test of Phase 32.

Setup

A new module: src/miniagent/sandbox.py. Tests in tests/test_sandbox.py. Evil tools live in tests/evil_tools/ (deliberately not in src/, so they're never accidentally registered for real use).

Tasks

Task 1 — implement `run_under_sandbox`

def run_under_sandbox(
    tool: Tool,
    args: dict,
    policy: SandboxPolicy = SandboxPolicy.PERMISSIVE,
    timeout_s: float = 5.0,
    memory_mb: int = 256,
) -> ToolResult:
    match policy:
        case SandboxPolicy.PERMISSIVE:
            try:
                return ToolResult.ok(tool(**args))
            except Exception as e:
                return ToolResult.error(f"in-process: {e}")
        case SandboxPolicy.SUBPROCESS:
            return _call_in_subprocess(tool, args, timeout_s, memory_mb)
        case SandboxPolicy.CONTAINER:
            raise NotImplementedError(...)

For SUBPROCESS:

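Spawn a child via subprocess.Popen([sys.executable, ...]).
Use preexec_fn to set RLIMIT_CPU (CPU seconds), RLIMIT_AS (virtual address space, in bytes), RLIMIT_NPROC (max processes for the real user ID), RLIMIT_FSIZE (max file size, in bytes).
Use Popen.communicate(timeout=timeout_s) as the wall-clock backstop, and on subprocess.TimeoutExpired kill the child and wait for it. RLIMIT_CPU alone is not enough, because a child that sleeps or blocks consumes no CPU time. (subprocess.run(timeout=timeout_s) performs the same kill-and-wait for you if you do not need the Popen handle.)
Marshal args via JSON; deserialise the result the same way.
On any failure (timeout, non-zero exit, malformed output), return ToolResult.error("sandbox: <reason>"). Never propagate the crash to the parent.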
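The SUBPROCESS steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the lab's reference solution: it assumes Linux, it addresses the tool by a hypothetical "module:function" string (the lab does not say how the child should locate the tool), it returns a plain dict rather than the project's ToolResult, and the names call_in_subprocess, _cap, and _CHILD_BOOTSTRAP are invented for this example.

```python
import json
import resource
import subprocess
import sys

# Hypothetical child entry point: reads {"target": "module:function", "args": {...}}
# from stdin, calls the function, and writes {"value": ...} as JSON to stdout.
_CHILD_BOOTSTRAP = """
import importlib, json, sys
request = json.load(sys.stdin)
module_name, func_name = request["target"].split(":")
func = getattr(importlib.import_module(module_name), func_name)
json.dump({"value": func(**request["args"])}, sys.stdout)
"""


def _cap(limit, value):
    """Lower one rlimit to `value`, never asking for more than the hard limit."""
    _, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit, (value, value))


def call_in_subprocess(target, args, timeout_s=5.0, memory_mb=256, max_procs=None):
    def apply_limits():
        # Runs in the child between fork and exec, so only the child is limited.
        _cap(resource.RLIMIT_CPU, int(timeout_s) + 1)
        _cap(resource.RLIMIT_AS, memory_mb * 1024 * 1024)
        _cap(resource.RLIMIT_FSIZE, 16 * 1024 * 1024)
        if max_procs is not None:
            _cap(resource.RLIMIT_NPROC, max_procs)

    proc = subprocess.Popen(
        [sys.executable, "-c", _CHILD_BOOTSTRAP],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        preexec_fn=apply_limits,
    )
    request = json.dumps({"target": target, "args": args})
    try:
        out, err = proc.communicate(request, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()  # drain the pipes and reap the child
        return {"ok": False, "error": "sandbox: timeout"}
    if proc.returncode != 0:
        lines = err.strip().splitlines()
        reason = lines[-1] if lines else f"exit code {proc.returncode}"
        return {"ok": False, "error": f"sandbox: {reason}"}
    try:
        return {"ok": True, "value": json.loads(out)["value"]}
    except (json.JSONDecodeError, KeyError, TypeError):
        return {"ok": False, "error": "sandbox: malformed output"}
```

For example, call_in_subprocess("json:dumps", {"obj": [1, 2]}) round-trips a value through the child, while call_in_subprocess("signal:pause", {}, timeout_s=1.0) blocks without using CPU and is ended by the wall-clock timeout. The CPU limit is set one second above timeout_s so that, for a busy loop, the wall-clock timeout normally fires first and the error reads "timeout"; that ordering is a design choice of this sketch.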

Task 2 — write the three evil tools

In tests/evil_tools/:

# tests/evil_tools/infinite_loop.py
def evil_infinite_loop():
    while True:
        pass

# tests/evil_tools/memory_eater.py
def evil_memory_eater():
    x = bytearray(2 * 10**9)  # 2 GB
    return len(x)

# tests/evil_tools/fork_bomb.py
import os
def evil_fork_bomb():
    while True:
        os.fork()

Each is a single function that misbehaves in a specific way. Keep them obvious — this isn't a CTF.

Task 3 — write the containment tests

For each evil tool:

import os

def test_infinite_loop_is_terminated():
    initial_pid = os.getpid()  # recorded before the sandboxed call
    result = run_under_sandbox(
        evil_infinite_loop, {},
        policy=SandboxPolicy.SUBPROCESS,
        timeout_s=2.0,
    )
    assert result.is_error
    assert "timeout" in result.error.lower()
    # Critical: parent process is still alive after this.
    assert os.getpid() == initial_pid

Repeat for memory_eater (under RLIMIT_AS the allocation fails, so expect a MemoryError in the child and a non-zero exit) and fork_bomb (expect os.fork() to start failing once RLIMIT_NPROC bites).

For each test, assert:

The result is an error (result.is_error == True).
The error string indicates the relevant limit (timeout / memory / fork).
The parent process survived (os.getpid() unchanged, parent didn't crash).
The test completed within ~5 seconds wall-clock (sandbox didn't hang the test runner).
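To see what the memory_eater test should observe, here is a small standalone sketch that runs the same allocation in a child capped by RLIMIT_AS. It assumes Linux (where RLIMIT_AS is enforced), and run_with_memory_cap and MEMORY_EATER are names made up for this example rather than part of the lab's code.

```python
import resource
import subprocess
import sys

# Same allocation as evil_memory_eater, written as a one-line child program.
MEMORY_EATER = "x = bytearray(2 * 10**9); print(len(x))"


def run_with_memory_cap(code, memory_mb, timeout_s=5.0):
    """Run `code` in a child Python whose address space is capped at memory_mb."""
    wanted = memory_mb * 1024 * 1024

    def cap_address_space():
        # Executed in the child before exec; never request more than the hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        value = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
        resource.setrlimit(resource.RLIMIT_AS, (value, value))

    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=cap_address_space,
        capture_output=True,
        text=True,
        timeout=timeout_s,  # wall-clock backstop; run() kills and reaps on expiry
    )


done = run_with_memory_cap(MEMORY_EATER, memory_mb=512)
print(done.returncode, done.stderr.strip().splitlines()[-1:])
```

Under the cap the child is not killed by a signal: the 2 GB allocation fails, Python raises MemoryError, and the interpreter exits with a non-zero status. A containment test can therefore look for "MemoryError" in the child's stderr when building the "sandbox: <reason>" string.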

Task 4 — measure resource ceiling

For each evil tool, measure:

Time-to-termination (how quickly the sandbox detects + kills).
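Peak memory observed in the child (call resource.getrusage(resource.RUSAGE_CHILDREN) after the child has been waited for; ru_maxrss is reported in kilobytes on Linux and in bytes on macOS).
Whether the child was stopped by a signal (SIGKILL or SIGTERM from the parent, SIGXCPU from RLIMIT_CPU), by an RLIMIT_* failure that raises inside the child (MemoryError under RLIMIT_AS, a failed os.fork() under RLIMIT_NPROC), or by subprocess.TimeoutExpired in the parent.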

Save to experiments/<date>-phase-32-sandbox-eval/containment.csv with columns tool, kill_mechanism, time_to_kill_s, peak_memory_mb.
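One possible shape for the measurement harness is sketched below. It is an illustration only: measure, classify, write_rows, and FIELDS are hypothetical names, the child is given as a plain command list rather than a registered tool, and no rlimits are applied here so that the timing and classification logic stays visible.

```python
import csv
import resource
import signal
import subprocess
import sys
import time

FIELDS = ["tool", "kill_mechanism", "time_to_kill_s", "peak_memory_mb"]


def classify(returncode, timed_out):
    """Name the mechanism that ended the child."""
    if timed_out:
        return "timeout"
    if returncode < 0:  # Popen reports death by signal N as -N
        try:
            return "signal:" + signal.Signals(-returncode).name
        except ValueError:
            return f"signal:{-returncode}"
    if returncode > 0:  # e.g. an uncaught MemoryError in the child
        return f"exit:{returncode}"
    return "none"


def measure(tool_name, cmd, timeout_s):
    """Run cmd, kill it on timeout, and return one CSV row as a dict."""
    start = time.monotonic()
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    timed_out = False
    try:
        proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        timed_out = True
        proc.kill()
        proc.wait()  # reap, so the usage below includes this child
    elapsed = time.monotonic() - start
    # High-water mark over all children reaped so far by this process.
    maxrss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return {
        "tool": tool_name,
        "kill_mechanism": classify(proc.returncode, timed_out),
        "time_to_kill_s": round(elapsed, 3),
        "peak_memory_mb": round(maxrss / divisor, 1),
    }


def write_rows(path, rows):
    """Write the measurement rows with the column order the lab asks for."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Because RUSAGE_CHILDREN reports a maximum over every child the measuring process has already reaped, measuring each evil tool from a fresh parent process (or in ascending order of expected memory use) keeps the peak_memory_mb column attributable to a single tool.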

Task 5 — register with the agent, run with the real tutor

Add the three evil tools to a separate MCP server (clearly labelled "evil"). Have the agent's Planner (in a test fixture) emit a tool_call for one of them, dispatched via SUBPROCESS sandbox. Verify:

The agent's loop catches the ToolError.
The scratchpad records the failed call cleanly.
The agent terminates with a graceful "tool failure" result, not a crash.

This is the closed-loop sandbox test: not just "can we sandbox a function," but "does the agent's full loop handle a misbehaving tool gracefully."

Measurements to capture

3 evil tools: each contained, parent survives, error returned.
Time-to-kill for each.
Peak memory observed.
Kill mechanism per tool.
Agent's behavior when a sandboxed call fails (graceful, not crashing).

Acceptance

src/miniagent/sandbox.py implements PERMISSIVE and SUBPROCESS policies.
3 evil tools written in tests/evil_tools/.
3 containment tests pass.
Containment measurements saved to CSV.
Closed-loop sandbox test (Task 5) passes.
No evil tool's PID lingers after the test suite finishes.

Pitfalls to expect

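Fork bomb can defeat RLIMIT_NPROC. RLIMIT_NPROC limits processes per real user ID, not per process tree. Every process the test user already owns counts toward the limit, so a value below the current count makes the very first fork fail, while a generous value lets the bomb crowd out other tests running as the same user. On Linux the limit is also not enforced for privileged processes, so do not run this test as root. Set the limit just above the user's current process count (e.g., current+5) so that it takes effect quickly while the impact stays isolated.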
RLIMIT_AS doesn't always bite. On macOS, RLIMIT_AS is not enforced; use RLIMIT_DATA or skip on macOS. On Linux, RLIMIT_AS is enforced. Document the OS-conditional behavior.
Zombie processes. A sandboxed subprocess that's terminated via SIGKILL may not have its exit code reaped, leaving a zombie. Use subprocess.Popen.wait() after a kill to reap.
Timeout granularity. The subprocess.run(timeout=...) check is coarse (on the order of 10 ms), and starting a Python child takes longer than that on its own. Don't set timeout=0.001; it's not meaningful.
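preexec_fn security note. preexec_fn is not safe to use when the parent has threads: the child can deadlock before exec. If the parent ever uses threads, drop preexec_fn, apply the rlimits at the top of the child's own entry script, and use start_new_session=True plus signal-based termination (os.killpg) instead. For Phase 32, single-threaded is fine.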
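The zombie and lingering-PID concerns above can be addressed together by putting the child in its own session and killing the whole process group on timeout. The following is a minimal POSIX-only sketch; run_in_own_group is a hypothetical helper name, not something the lab prescribes.

```python
import os
import signal
import subprocess
import sys


def run_in_own_group(cmd, timeout_s):
    """Run cmd as leader of a new session; on timeout, SIGKILL its whole group.

    Returns (returncode, timed_out).
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # child calls setsid(), so its pgid equals its pid
    )
    try:
        proc.wait(timeout=timeout_s)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        try:
            # Signals the child and every descendant still in its process group.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group vanished between the timeout and the kill
        proc.wait()  # reap the direct child so no zombie is left behind
        return proc.returncode, True


rc, timed_out = run_in_own_group(
    [sys.executable, "-c", "import time; time.sleep(60)"], timeout_s=0.5
)
print(rc, timed_out)
```

Killing by group matters for the fork bomb: proc.kill() only signals the direct child, whereas os.killpg reaches the forked descendants too. A descendant that deliberately starts its own session escapes the group, which is one reason the lab treats stronger isolation as a separate CONTAINER policy.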

Next: 03-failure-mode-tour.md