Lab 04 - Supply chain: verify_artifacts.sh and safetensors enforcement
The shortest, highest-return lab: a shell script that verifies the SHA256 of each artifact against MANIFEST.json, plus a bandit rule that prohibits loading pickles. Five minutes of paranoia, zero RCE from loading weights.
Goal
Ship two cheap defenses that together close the highest-severity supply-chain risks:
- scripts/verify_artifacts.sh: walks MANIFEST.json, hashes each artifact, exits non-zero on mismatch.
- safetensors-only policy: enforced by bandit rule B301 plus a custom rule that flags any torch.load( without weights_only=True.
This is the smallest lab by line count and the largest by residual-risk reduction (T6 + T7 in the Theory 03 matrix: \(\Delta R \approx -0.85\) combined).
Deliverables
- scripts/verify_artifacts.sh: executable shell script; passes on a healthy manifest, fails clearly on a tampered one.
- scripts/forge_tampered_manifest.sh: test helper. Copies the manifest and artifacts to a temp dir, mutates one byte of one artifact, runs verify, asserts non-zero exit. Used to prove the verifier actually fires.
- tests/test_verify_artifacts.py: pytest wrapping the two shell scripts.
- security/banditrc.yaml or a pyproject.toml [tool.bandit] block: enable B301 (pickle) and add a custom rule for torch.load without weights_only.
- A new row in security/THREATS.md (Borja appends; commit security: phase-37-threats-supply-chain).
- security/supply-chain.md extended with a grammar-tutor-specific section (the file likely exists from Phase 0; extend, don't overwrite).
Step 1: Write scripts/verify_artifacts.sh
Shape (concrete implementation is Borja's; this is the contract):
#!/usr/bin/env bash
set -euo pipefail

MANIFEST="${1:-MANIFEST.json}"

if [[ ! -f "$MANIFEST" ]]; then
    echo "ERROR: manifest not found: $MANIFEST" >&2
    exit 2
fi

# Reject invalid JSON or a missing .artifacts array up front (exit 2).
if ! jq -e '.artifacts | type == "array"' "$MANIFEST" >/dev/null 2>&1; then
    echo "ERROR: manifest unreadable: $MANIFEST" >&2
    exit 2
fi

mismatches=0
# Process substitution keeps the loop in the current shell, so the counter
# survives; `jq ... | while read` would only update it inside a subshell.
while read -r expected path; do
    if [[ ! -f "$path" ]]; then
        echo "MISSING: $path" >&2
        mismatches=$((mismatches + 1))
        continue
    fi
    actual=$(sha256sum "$path" | awk '{print $1}')
    if [[ "$actual" != "$expected" ]]; then
        echo "MISMATCH: $path" >&2
        echo "  expected: $expected" >&2
        echo "  actual:   $actual" >&2
        mismatches=$((mismatches + 1))
    fi
done < <(jq -r '.artifacts[] | "\(.sha256) \(.path)"' "$MANIFEST")

if [[ $mismatches -gt 0 ]]; then
    echo "FAIL: $mismatches mismatch(es)" >&2
    exit 1
fi

echo "OK: all $(jq '.artifacts | length' "$MANIFEST") artifacts verified"
Notes:
- set -euo pipefail makes bash exit on intermediate errors, on unset variables, and on a failure anywhere in a pipeline. Without it, a failing command that feeds while read is silently swallowed, which is a classic gotcha.
- Exit codes: 0 = all good. 1 = at least one mismatch. 2 = manifest missing / unreadable. The distinction matters for CI.
- The mismatches counter updated inside a while read loop is the classic bash pitfall: be careful with subshell scoping. If Borja's implementation reads inside a pipeline (cmd | while read ...), the loop runs in a subshell and the counter doesn't update in the parent. Solutions: process substitution (< <(jq ...)) or shopt -s lastpipe.
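For comparison, the same contract can be sketched in Python, where the subshell-scoping problem does not arise. This is an illustration only, not a replacement for the shell deliverable. The function names `sha256_of` and `verify_manifest` are hypothetical, and the manifest shape (an `artifacts` array whose entries carry `sha256` and `path`) is inferred from the jq filter in the script above.

```python
import hashlib
import json
import sys
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path) -> int:
    """Return 0 if all artifacts match, 1 on any mismatch, 2 if the manifest is unusable."""
    try:
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        artifacts = manifest["artifacts"]
    except (OSError, ValueError, KeyError, TypeError):
        print(f"ERROR: manifest missing or unreadable: {manifest_path}", file=sys.stderr)
        return 2

    mismatches = 0
    for entry in artifacts:
        path = Path(entry["path"])  # resolved against the current directory, like the shell script
        if not path.is_file():
            print(f"MISSING: {path}", file=sys.stderr)
            mismatches += 1
        elif sha256_of(path) != entry["sha256"]:
            print(f"MISMATCH: {path}", file=sys.stderr)
            mismatches += 1

    if mismatches:
        print(f"FAIL: {mismatches} mismatch(es)", file=sys.stderr)
        return 1
    print(f"OK: all {len(artifacts)} artifacts verified")
    return 0
```

A caller would pass the return value to `sys.exit`, for example `sys.exit(verify_manifest(Path("MANIFEST.json")))`, so that CI sees the same three exit codes.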
Step 2: Run the verifier (healthy state)
Run scripts/verify_artifacts.sh from the repo root and check its exit status. If it doesn't return 0 on a freshly trained Phase 18 manifest, something is already wrong; investigate before continuing.
Step 3: Forge a tampered manifest and verify failure
scripts/forge_tampered_manifest.sh:
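#!/usr/bin/env bash
set -euo pipefail

# Resolve the repo root before changing directory; scripts/ is not copied into $WORK.
REPO_ROOT=$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)

WORK=$(mktemp -d)
cp -r "$REPO_ROOT/artifacts" "$WORK/"
cp "$REPO_ROOT/MANIFEST.json" "$WORK/"
cd "$WORK"

# Overwrite the byte at offset 42 of the model weights with 0x00
# (a no-op if that byte already is 0x00).
WEIGHTS=$(jq -r '.artifacts[] | select(.role == "model-weights").path' MANIFEST.json)
printf '\x00' | dd of="$WEIGHTS" bs=1 count=1 seek=42 conv=notrunc 2>/dev/null

# Verify must fail, and specifically because of the tampered weights.
if output=$("$REPO_ROOT/scripts/verify_artifacts.sh" 2>&1); then
    echo "FAIL: verifier did not detect tampering" >&2
    exit 1
fi
if ! grep -qF "MISMATCH: $WEIGHTS" <<< "$output"; then
    echo "FAIL: verifier failed for a reason other than the tampered weights:" >&2
    echo "$output" >&2
    exit 1
fi

echo "OK: tampering was detected"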
Run it and confirm that it prints "OK: tampering was detected". If the verifier passes on the tampered file, the verifier is broken; fix it before continuing.
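One subtlety of the tampering step: writing a fixed 0x00 changes nothing if the target byte is already zero, in which case the hash would still match and the test would blame the verifier. A minimal sketch of a mutation that always changes the file is to XOR the byte with 0xFF. The helper name `flip_byte` is hypothetical, and the default offset of 42 simply mirrors the `seek=42` used by the forge script.

```python
from pathlib import Path


def flip_byte(path: Path, offset: int = 42) -> None:
    """Invert every bit of one byte in place, so the content is guaranteed to change."""
    with path.open("r+b") as fh:
        fh.seek(offset)
        original = fh.read(1)
        if not original:
            raise ValueError(f"{path} is shorter than {offset + 1} bytes")
        fh.seek(offset)
        fh.write(bytes([original[0] ^ 0xFF]))
```

Because only one byte is rewritten in place, the file length stays the same, which keeps the tampering as subtle as in the `dd` version.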
Step 4: pytest wrapper
# tests/test_verify_artifacts.py
# Both tests assume pytest is invoked from the repo root.
import os
import subprocess


def test_verifier_passes_on_clean_manifest():
    result = subprocess.run(["scripts/verify_artifacts.sh"], capture_output=True, text=True)
    assert result.returncode == 0, result.stderr


def test_verifier_fails_on_tampered_manifest(tmp_path):
    # The forge script creates a tampered copy under TMPDIR and runs the verifier on it.
    result = subprocess.run(
        ["scripts/forge_tampered_manifest.sh"],
        capture_output=True,
        text=True,
        env={**os.environ, "TMPDIR": str(tmp_path)},
    )
    assert result.returncode == 0, f"forge script itself failed: {result.stderr}"
    assert "tampering was detected" in result.stdout
Both tests must pass in CI.
Step 5: Bandit policy
In pyproject.toml, add or extend:
[tool.bandit]
exclude_dirs = [".venv", "build", "dist"]
skips = []
# Explicitly enabled:
tests = ["B301", "B102", "B307"] # pickle, exec_used, eval
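Custom rule for torch.load: B301 targets direct pickle use and does not match torch.load, and older bandit releases have no built-in for it (newer releases list a B614 PyTorch check; confirm it exists in the pinned version before relying on it). Either:
- Use ruff's port of bandit's catalog (pyproject.toml [tool.ruff.lint] select = ["S301"]). Note that this covers the pickle rule only; ruff has no custom-rule mechanism, so torch.load still needs a separate check.
- Or write a tiny grep-based check in scripts/check_torch_load.sh that fails on any torch.load( lacking weights_only=True in src/.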
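#!/usr/bin/env bash
# scripts/check_torch_load.sh
set -euo pipefail

# Fail loudly if ripgrep is missing; otherwise `|| true` below would hide it.
command -v rg >/dev/null || { echo "ERROR: rg not found" >&2; exit 2; }

# Line-based check: torch.load( must carry weights_only=True on the same line.
bad=$(rg -n 'torch\.load\(' src/ | grep -v 'weights_only=True' || true)
if [[ -n "$bad" ]]; then
    echo "ERROR: torch.load without weights_only=True:" >&2
    echo "$bad" >&2
    exit 1
fi

echo "OK: all torch.load calls use weights_only=True (or no calls present)"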
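Add to CI: just security runs both bandit -c pyproject.toml -r src/ and scripts/check_torch_load.sh. Bandit reads the [tool.bandit] block only when pointed at it with -c, and scans a directory only with -r.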
Step 6: Refactor: safetensors everywhere
For every checkpoint save/load in src/:
# Before:
torch.save(model.state_dict(), "checkpoint.pt")
state = torch.load("checkpoint.pt")
# After:
from safetensors.torch import save_file, load_file
save_file(model.state_dict(), "checkpoint.safetensors")
state = load_file("checkpoint.safetensors")
Update the MANIFEST.json schema in Phase 18 to record .safetensors artifacts instead of .pt. Old .pt checkpoints can be one-shot converted with a scripts/convert_pt_to_safetensors.py helper.
scripts/check_torch_load.sh will now flag any remaining torch.load call that lacks weights_only=True, and bandit B301 will flag any direct pickle use. Fix all callers, or document each # nosec B301 with a justification in security/THREATS.md.
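Part of why the refactor pays off is the file layout: a .safetensors file starts with an 8-byte little-endian header length, followed by a JSON header describing each tensor, followed by raw tensor bytes. Reading it involves JSON parsing and byte slicing, with no unpickling step. The sketch below parses only the header using the standard library. The name `read_safetensors_header` and the `max_header_bytes` cap are assumptions made for illustration; production code should keep using `safetensors.torch.load_file`.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path: Path, max_header_bytes: int = 100_000_000) -> dict:
    """Parse only the JSON header of a .safetensors file; tensor bytes are never read."""
    with path.open("rb") as fh:
        prefix = fh.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to be safetensors")
        (header_len,) = struct.unpack("<Q", prefix)  # unsigned 64-bit, little-endian
        if header_len > max_header_bytes:  # sanity cap chosen for this sketch
            raise ValueError(f"implausible header length: {header_len}")
        raw = fh.read(header_len)
        if len(raw) != header_len:
            raise ValueError("truncated header")
    return json.loads(raw)
```

This kind of inspection can be handy after a one-shot conversion, for example to compare tensor names and shapes against what the model expects before any weights are loaded.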
Step 7: Extend security/supply-chain.md
Append a grammar-tutor section:
## Grammar tutor (Phase 32) — artifacts
| Artifact | Path | Format | Integrity check |
|---|---|---|---|
| Model weights | `artifacts/checkpoints/mini-gpt-grammar.safetensors` | safetensors | MANIFEST.json SHA256 |
| Tokenizer vocab | `artifacts/tokenizer/vocab.json` | JSON | MANIFEST.json SHA256 |
| Tokenizer merges | `artifacts/tokenizer/merges.txt` | text | MANIFEST.json SHA256 |
| RAG index | `artifacts/rag/index/embeddings.safetensors` | safetensors | MANIFEST.json SHA256 |
| KB chunks | `data/kb/grammar-rules/chunks.jsonl` | JSONL | MANIFEST.json SHA256 |
## Loading policy
- Model weights: `safetensors.torch.load_file` only. `torch.load` is prohibited in `src/` (CI-enforced by `scripts/check_torch_load.sh` + bandit B301).
- KB chunks: JSON parser with schema validation; no `eval` or `pickle` anywhere in the load path.
- Before any `agent-start`, CI runs `scripts/verify_artifacts.sh`. Non-zero exit blocks the start.
Commit this as docs(security): grammar-tutor supply-chain extension.
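The loading policy for KB chunks ("JSON parser with schema validation") can be sketched as follows. The required fields are an assumption: `chunk_id` appears elsewhere in this lab, while `text` is a hypothetical field name, and the real schema belongs to whichever phase defines the KB. The function name `load_chunks` is likewise illustrative.

```python
import json
from pathlib import Path

# Assumed schema for illustration only: each chunk needs these string fields.
REQUIRED_FIELDS = {"chunk_id": str, "text": str}


def load_chunks(path: Path) -> list[dict]:
    """Parse a JSONL file line by line, rejecting records that break the assumed schema."""
    chunks = []
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # raises json.JSONDecodeError on malformed JSON
            if not isinstance(record, dict):
                raise ValueError(f"{path}:{lineno}: expected a JSON object")
            for field, expected_type in REQUIRED_FIELDS.items():
                if not isinstance(record.get(field), expected_type):
                    raise ValueError(f"{path}:{lineno}: missing or invalid field {field!r}")
            chunks.append(record)
    return chunks
```

Nothing in this path evaluates or unpickles data, so a tampered chunk can at worst fail validation or carry bad text, which is what the SHA256 check is there to catch.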
Step 8: THREATS.md row
| Phase | Surface | Asset at risk | Adversary | Mitigation | Status |
|---|---|---|---|---|---|
| 37 | Model weight load / KB document load | Code execution on host (pickle RCE), integrity of model and KB | Malicious checkpoint or tampered KB file | safetensors-only (bandit + custom rule); MANIFEST.json SHA256 verification (scripts/verify_artifacts.sh); CI gates agent-start on verification | mitigated |
Commit: security: phase-37-threats-supply-chain.
Step 9: What "done" looks like
- scripts/verify_artifacts.sh exits 0 on healthy, 1 on tampered, 2 on missing manifest.
- scripts/forge_tampered_manifest.sh proves the failure path.
- tests/test_verify_artifacts.py has both clean + tampered cases.
- bandit -r src/ runs in just security and passes.
- scripts/check_torch_load.sh runs in just security and passes (no torch.load without weights_only=True).
- All checkpoints in artifacts/ are .safetensors; any leftover .pt files are converted or removed.
- security/supply-chain.md has the grammar-tutor section.
- security/THREATS.md has the supply-chain row.
Common pitfalls
- Subshell variable scope in bash. while read; do counter=$((counter+1)); done < file works; cmd | while read; ... doesn't update counter in the parent. Use process substitution.
- weights_only=True as the only defense. It is defense in depth, not a replacement for safetensors. Pickle deserialization is still pickle deserialization; bugs in PyTorch's restricted unpickler have happened. Use safetensors.
- Forgetting to remove old .pt files. A stale model.pt next to model.safetensors invites a typo that loads the wrong one. Either delete it or move it to a clearly named _archive/ directory excluded from the manifest.
- Manifest without a signature. SHA256 verification catches accidental corruption and naive tampering. A sophisticated attacker rewrites both the file and the manifest. Stretch goal: GPG-sign the manifest itself.
- CI not gating on verification. A verifier that exists but isn't run before agent-start is just decoration. Wire it into the just target.
Stretch goals
- GPG-sign MANIFEST.json during Phase 18 training. verify_artifacts.sh checks the signature with a public key checked into security/keys/.
- Per-file signatures (not just manifest-level): each chunk_id in the KB gets its own GPG signature; the verifier checks each one. More work, but catches tampering of individual chunks even if the manifest is replaced.
- A pre-commit hook that runs scripts/verify_artifacts.sh if MANIFEST.json or any tracked artifact changes. Catches accidental commits of unverified artifacts.
End of Phase 37 lab sequence. Next, write experiments/37-redteam-report/findings.md (the honest write-up) and PHASE_37_REPORT.md.