Lab 02 — Sandbox containment of an evil tool¶
Read theory/03-sandboxing.md. Do not consult solutions/.
Objective¶
Build three deliberately-misbehaving tools (infinite loop, memory eater, fork bomb), register them with the agent, and verify that the SUBPROCESS sandbox contains each. The parent agent process must survive all three; each misbehaving tool must return a clean ToolError. This is the security test of Phase 32.
Setup¶
A new module: src/miniagent/sandbox.py. Tests in tests/test_sandbox.py. Evil tools live in tests/evil_tools/ (deliberately not in src/, so they're never accidentally registered for real use).
Tasks¶
Task 1 — implement run_under_sandbox¶
def run_under_sandbox(
    tool: Tool,
    args: dict,
    policy: SandboxPolicy = SandboxPolicy.PERMISSIVE,
    timeout_s: float = 5.0,
    memory_mb: int = 256,
) -> ToolResult:
    match policy:
        case SandboxPolicy.PERMISSIVE:
            try:
                return ToolResult.ok(tool(**args))
            except Exception as e:
                return ToolResult.error(f"in-process: {e}")
        case SandboxPolicy.SUBPROCESS:
            return _call_in_subprocess(tool, args, timeout_s, memory_mb)
        case SandboxPolicy.CONTAINER:
            raise NotImplementedError(...)
For SUBPROCESS:
- Spawn a child via subprocess.Popen([sys.executable, ...]).
- Use preexec_fn to set RLIMIT_CPU (CPU seconds), RLIMIT_AS (virtual memory bytes), RLIMIT_NPROC (max user processes), RLIMIT_FSIZE (max file size).
- Use subprocess.run(timeout=timeout_s) as the wall-clock backstop.
- Marshal args via JSON; deserialise the result the same way.
- On any failure (timeout, non-zero exit, malformed output), return ToolResult.error("sandbox: <reason>"). Never propagate the crash to the parent.
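The bullets above can be sketched as one self-contained function. This is a minimal illustration, not the lab's reference solution: `run_limited`, `_cap`, the 16 MB file-size cap, and the dict-shaped return value are all hypothetical stand-ins (the real `_call_in_subprocess` would return a `ToolResult`), and it assumes a POSIX host where `RLIMIT_AS` is enforced (Linux). `RLIMIT_NPROC` is left out here because a sensible value depends on how many processes the user already owns.

```python
import json
import math
import resource
import subprocess
import sys


def _cap(res: int, value: int) -> None:
    """Lower soft and hard limits to `value`, never above the current hard limit."""
    _, hard = resource.getrlimit(res)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(res, (value, value))


def run_limited(code: str, args: dict, timeout_s: float = 5.0, memory_mb: int = 256) -> dict:
    """Run `code` in a child interpreter; args go in via stdin as JSON, result comes back via stdout."""

    def apply_limits() -> None:  # runs in the child, between fork and exec
        _cap(resource.RLIMIT_CPU, math.ceil(timeout_s) + 1)   # CPU seconds
        _cap(resource.RLIMIT_AS, memory_mb * 1024 * 1024)     # virtual memory bytes
        _cap(resource.RLIMIT_FSIZE, 16 * 1024 * 1024)         # assumed 16 MB file cap

    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=json.dumps(args),
            capture_output=True,
            text=True,
            timeout=timeout_s,          # wall-clock backstop
            preexec_fn=apply_limits,
        )
    except subprocess.TimeoutExpired:
        return {"error": f"sandbox: timeout after {timeout_s}s"}
    except (OSError, subprocess.SubprocessError) as e:
        return {"error": f"sandbox: spawn failed: {e}"}
    if proc.returncode != 0:
        return {"error": f"sandbox: exit code {proc.returncode}"}
    try:
        return {"ok": json.loads(proc.stdout)}
    except json.JSONDecodeError:
        return {"error": "sandbox: malformed output"}
```

Every failure path ends in a returned value rather than a raised exception, which is the property the parent process relies on.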
Task 2 — write the three evil tools¶
In tests/evil_tools/:
# tests/evil_tools/infinite_loop.py
def evil_infinite_loop():
    while True:
        pass

# tests/evil_tools/memory_eater.py
def evil_memory_eater():
    x = bytearray(2 * 10**9)  # 2 GB
    return len(x)

# tests/evil_tools/fork_bomb.py
import os

def evil_fork_bomb():
    while True:
        os.fork()
Each is a single function that misbehaves in a specific way. Keep them obvious — this isn't a CTF.
Task 3 — write the containment tests¶
For each evil tool:
import os

def test_infinite_loop_is_terminated():
    initial_pid = os.getpid()  # captured before the sandboxed call
    result = run_under_sandbox(
        evil_infinite_loop, {},
        policy=SandboxPolicy.SUBPROCESS,
        timeout_s=2.0,
    )
    assert result.is_error
    assert "timeout" in result.error.lower()
    # Critical: parent process is still alive after this.
    assert os.getpid() == initial_pid
Repeat for memory_eater (expect MemoryError or RLIMIT_AS-killed in the child) and fork_bomb (expect RLIMIT_NPROC to bite).
For each test, assert:
- The result is an error (result.is_error == True).
- The error string indicates the relevant limit (timeout / memory / fork).
- The parent process survived (os.getpid() unchanged, parent didn't crash).
- The test completed within ~5 seconds wall-clock (sandbox didn't hang the test runner).
Task 4 — measure resource ceiling¶
For each evil tool, measure:
- Time-to-termination (how quickly the sandbox detects + kills).
- Peak memory observed in the child (use resource.getrusage(resource.RUSAGE_CHILDREN) after the child has exited and been waited for).
- Whether the kill was via signal (SIGKILL, SIGTERM), via RLIMIT_* (raises in the child), or via subprocess.TimeoutExpired.
Save to experiments/<date>-phase-32-sandbox-eval/containment.csv with columns tool, kill_mechanism, time_to_kill_s, peak_memory_mb.
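A rough sketch of how one CSV row might be collected follows. The helper names `measure` and `to_csv` are hypothetical, and the child here runs a code string rather than a registered tool. Two caveats are assumptions worth checking on your platform: `ru_maxrss` is reported in kilobytes on Linux and in bytes on macOS, and `RUSAGE_CHILDREN` reports the largest value across all children waited for so far, not only the most recent one.

```python
import csv
import io
import resource
import subprocess
import sys
import time

COLUMNS = ["tool", "kill_mechanism", "time_to_kill_s", "peak_memory_mb"]


def measure(tool_name: str, code: str, timeout_s: float = 2.0) -> dict:
    """Run `code` in a child interpreter and record how and when it stopped."""
    start = time.monotonic()
    try:
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, timeout=timeout_s)
        if proc.returncode < 0:
            mechanism = f"signal {-proc.returncode}"
        elif proc.returncode != 0:
            mechanism = "raised in child"
        else:
            mechanism = "none"
    except subprocess.TimeoutExpired:
        mechanism = "TimeoutExpired"  # subprocess.run has already killed and reaped the child
    elapsed = time.monotonic() - start
    maxrss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024  # bytes vs. kilobytes
    return {"tool": tool_name, "kill_mechanism": mechanism,
            "time_to_kill_s": round(elapsed, 3),
            "peak_memory_mb": round(maxrss / divisor, 1)}


def to_csv(rows: list[dict]) -> str:
    """Serialise rows with the column order the lab asks for."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Because `RUSAGE_CHILDREN` is cumulative, running each evil tool in its own test process gives cleaner per-tool peak-memory numbers.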
Task 5 — register with the agent, run with the real tutor¶
Add the three evil tools to a separate MCP server (clearly labelled "evil"). Have the agent's Planner (in a test fixture) emit a tool_call for one of them, dispatched via SUBPROCESS sandbox. Verify:
- The agent's loop catches the ToolError.
- The scratchpad records the failed call cleanly.
- The agent terminates with a graceful "tool failure" result, not a crash.
This is the closed-loop sandbox test: not just "can we sandbox a function," but "does the agent's full loop handle a misbehaving tool gracefully."
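The three checks can be pictured with a toy loop. Everything in this sketch is hypothetical: the `AgentRun` record, the `dispatch` function, and the shape of scratchpad entries are invented for illustration, and `ToolError` is assumed to be an exception raised when a sandboxed call fails. The real agent, Planner, and MCP wiring will look different.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class ToolError(Exception):
    """Assumed to be raised when a sandboxed tool call fails."""


@dataclass
class AgentRun:
    scratchpad: list[dict] = field(default_factory=list)
    status: str = "running"


def dispatch(run: AgentRun, name: str, call: Callable[[], Any]) -> Any:
    """Execute one tool call; record the outcome and never let ToolError escape."""
    try:
        value = call()
    except ToolError as e:
        run.scratchpad.append({"tool": name, "ok": False, "error": str(e)})
        run.status = "tool failure"  # graceful terminal state, not a crash
        return None
    run.scratchpad.append({"tool": name, "ok": True, "value": value})
    return value


def failing_tool() -> None:
    # Stands in for an evil tool that the sandbox has already contained.
    raise ToolError("sandbox: timeout")
```

A closed-loop test would then assert on `run.status` and on the last scratchpad entry, rather than on the sandbox function alone.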
Measurements to capture¶
- 3 evil tools: each contained, parent survives, error returned.
- Time-to-kill for each.
- Peak memory observed.
- Kill mechanism per tool.
- Agent's behavior when a sandboxed call fails (graceful, not crashing).
Acceptance¶
- src/miniagent/sandbox.py implements PERMISSIVE and SUBPROCESS policies.
- 3 evil tools written in tests/evil_tools/.
- 3 containment tests pass.
- Containment measurements saved to CSV.
- Closed-loop sandbox test (Task 5) passes.
- No evil tool's PID lingers after the test suite finishes.
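For the last criterion, one possible check is to probe each recorded child PID with signal 0, which tests for existence without delivering anything. `pid_alive` is a hypothetical helper. Note that a zombie (an exited child that was never waited for) still counts as existing, so this check also catches unreaped children; PID reuse by an unrelated process is possible in principle but unlikely within a short test run.

```python
import os
import subprocess
import sys


def pid_alive(pid: int) -> bool:
    """Return True if a process (or unreaped zombie) with this PID exists."""
    try:
        os.kill(pid, 0)  # signal 0: existence and permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the PID exists but belongs to another user
    return True


# Usage: a child that has exited and been reaped no longer shows up.
child = subprocess.Popen([sys.executable, "-c", "pass"])
child.wait()
lingering = pid_alive(child.pid)
```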
Pitfalls to expect¶
- Fork bomb defeats RLIMIT_NPROC. RLIMIT_NPROC limits processes per user, not per process tree. If your test user already has many processes, the limit may bite other tests. Run the fork-bomb test under a setrlimit low enough to take effect but isolate the impact (e.g., set the limit to current+5).
- RLIMIT_AS doesn't always bite. On macOS, RLIMIT_AS is not enforced; use RLIMIT_DATA or skip on macOS. On Linux, RLIMIT_AS is enforced. Document the OS-conditional behavior.
- Zombie processes. A sandboxed subprocess that's terminated via SIGKILL may not have its exit code reaped, leaving a zombie. Use subprocess.Popen.wait() after a kill to reap.
- Timeout granularity. subprocess.run(timeout=...) has a granularity of ~10ms. Don't set timeout=0.001; it's not meaningful.
- preexec_fn security note. preexec_fn is not thread-safe. If the parent ever uses threads, prefer start_new_session=True + signal-based termination instead. For Phase 32, single-threaded is fine.
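The zombie and `start_new_session` pitfalls can be handled together. The following sketch assumes a POSIX host, and `run_and_reap` is a hypothetical name. It starts the child in its own session, so the child's PID doubles as its process-group ID, kills the whole group on timeout, and then calls `communicate()` again to drain the pipes and reap the exit status.

```python
import os
import signal
import subprocess
import sys


def run_and_reap(argv: list[str], timeout_s: float) -> int:
    """Run argv; on timeout, SIGKILL the child's process group and reap it."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        start_new_session=True,  # child becomes leader of a new session and group
    )
    try:
        proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)  # pgid == pid for a session leader
        except ProcessLookupError:
            pass  # the group vanished between the timeout and the kill
        proc.communicate()  # drain pipes and collect the exit status (no zombie)
    return proc.returncode
```

Killing the group rather than the single PID also takes down any grandchildren the tool spawned, as long as they have not moved themselves into a different process group.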
Next: 03-failure-mode-tour.md