
Lab 04 — Supply chain: verify_artifacts.sh and safetensors enforcement

The shortest, highest-return lab: a shell script that verifies the SHA256 of every artifact against MANIFEST.json, plus a bandit rule that forbids loading pickles. Five minutes of paranoia, zero RCE from loading weights.


Goal

Ship two cheap defenses that together close the highest-severity supply-chain risks:

  1. scripts/verify_artifacts.sh — walks MANIFEST.json, hashes each artifact, exits non-zero on mismatch.
  2. safetensors-only policy — enforced by bandit rule B301 plus a custom rule that flags any torch.load( without weights_only=True.

This is the smallest lab by line count but the largest by residual-risk reduction (T6 + T7 in the Theory 03 matrix: \(\Delta R \approx -0.85\) combined).

Deliverables

  • scripts/verify_artifacts.sh — executable shell script; passes on healthy manifest, fails clearly on tampered.
  • scripts/forge_tampered_manifest.sh — test helper: copies the manifest to a temp dir, mutates one byte of one artifact, runs verify, asserts non-zero exit. Used to prove the verifier actually fires.
  • tests/test_verify_artifacts.py — pytest wrapping the two shell scripts.
  • security/banditrc.yaml or a pyproject.toml [tool.bandit] block that enables B301 (pickle), plus a custom check for torch.load without weights_only=True (see Step 5).
  • A new row in security/THREATS.md (Borja appends; commit security: phase-37-threats-supply-chain).
  • security/supply-chain.md extended with a grammar-tutor-specific section (the file likely exists from Phase 0; extend, don't overwrite).

Step 1 — Write scripts/verify_artifacts.sh

Shape (concrete implementation is Borja's; this is the contract):

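#!/usr/bin/env bash
set -euo pipefail

MANIFEST="${1:-MANIFEST.json}"

if [[ ! -f "$MANIFEST" ]]; then
    echo "ERROR: manifest not found: $MANIFEST" >&2
    exit 2
fi
# Unparseable JSON or a missing .artifacts array also counts as "unreadable" (exit 2).
if ! jq -e '.artifacts | type == "array"' "$MANIFEST" >/dev/null 2>&1; then
    echo "ERROR: manifest unreadable or has no .artifacts array: $MANIFEST" >&2
    exit 2
fi

mismatches=0

# Process substitution (not `jq ... | while read`) keeps the loop in the
# current shell, so increments to $mismatches survive after the loop.
while read -r expected path; do
    if [[ ! -f "$path" ]]; then
        echo "MISSING: $path" >&2
        mismatches=$((mismatches + 1))
        continue
    fi
    # On macOS without coreutils, use `shasum -a 256` instead of sha256sum.
    actual=$(sha256sum "$path" | awk '{print $1}')
    if [[ "$actual" != "$expected" ]]; then
        echo "MISMATCH: $path" >&2
        echo "  expected: $expected" >&2
        echo "  actual:   $actual" >&2
        mismatches=$((mismatches + 1))
    fi
done < <(jq -r '.artifacts[] | "\(.sha256)  \(.path)"' "$MANIFEST")

if [[ $mismatches -gt 0 ]]; then
    echo "FAIL: $mismatches mismatch(es)" >&2
    exit 1
fi
echo "OK: all $(jq '.artifacts | length' "$MANIFEST") artifacts verified"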

Notes:

  • set -euo pipefail makes bash exit on any failing command (-e), on unset variables (-u), and on a failure in any stage of a pipeline, not just the last one (pipefail). Without pipefail, a jq error feeding a while read loop is silently ignored.
  • Exit codes: 0 = all good. 1 = at least one mismatch. 2 = manifest missing / unreadable. The distinction matters for CI.
  • The mismatches counter is the classic bash pitfall: in cmd | while read ...; done the loop body runs in a subshell, so increments never reach the parent shell and the script prints OK even after mismatches. Fixes: process substitution (while read ...; done < <(jq ...)) or shopt -s lastpipe, which runs the last pipeline stage in the current shell when job control is off (the default for scripts).
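The same contract can be mirrored in Python, which can be easier to reason about (and to unit-test) than bash. This is a sketch, not a replacement for the shell script: it assumes the manifest shape implied by the jq queries above ({"artifacts": [{"path": ..., "sha256": ..., "role": ...}]}), resolves paths relative to the working directory like the shell version, and the function names sha256_of and verify_manifest are hypothetical.

```python
import hashlib
import json
import sys
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large weight files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: str = "MANIFEST.json") -> int:
    """Mirror of the shell contract: 0 = all good, 1 = mismatch or missing file, 2 = bad manifest."""
    try:
        artifacts = json.loads(Path(manifest_path).read_text(encoding="utf-8"))["artifacts"]
        if not isinstance(artifacts, list):
            raise TypeError("'artifacts' is not a list")
    except (OSError, ValueError, KeyError, TypeError) as exc:
        print(f"ERROR: manifest missing or unreadable: {manifest_path} ({exc})", file=sys.stderr)
        return 2

    mismatches = 0
    for entry in artifacts:
        # Paths resolve relative to the current directory, like the shell version.
        path = Path(entry["path"])
        if not path.is_file():
            print(f"MISSING: {path}", file=sys.stderr)
            mismatches += 1
            continue
        actual = sha256_of(path)
        if actual != entry["sha256"]:
            print(f"MISMATCH: {path}", file=sys.stderr)
            print(f"  expected: {entry['sha256']}", file=sys.stderr)
            print(f"  actual:   {actual}", file=sys.stderr)
            mismatches += 1

    if mismatches:
        print(f"FAIL: {mismatches} mismatch(es)", file=sys.stderr)
        return 1
    print(f"OK: all {len(artifacts)} artifacts verified")
    return 0
```

Wrapped as sys.exit(verify_manifest(sys.argv[1])) in a script, it would return the same 0/1/2 exit codes CI keys on. Hashing in fixed-size chunks keeps memory flat even for multi-gigabyte checkpoints.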

Step 2 — Run the verifier (healthy state)

$ scripts/verify_artifacts.sh
OK: all 7 artifacts verified
$ echo $?
0

If it doesn't return 0 on a freshly trained Phase 18 manifest, something is already wrong — investigate before continuing.

Step 3 — Forge a tampered manifest and verify failure

scripts/forge_tampered_manifest.sh:

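#!/usr/bin/env bash
set -euo pipefail

# Resolve the verifier before cd'ing away from the repo root
# (scripts/ is not copied into the temp dir).
VERIFY="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/verify_artifacts.sh"

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT

# Copy every artifact the manifest lists (not only artifacts/: the KB chunks live under data/).
jq -r '.artifacts[].path' MANIFEST.json | while read -r p; do
    mkdir -p "$WORK/$(dirname "$p")"
    cp "$p" "$WORK/$p"
done
cp MANIFEST.json "$WORK/"
cd "$WORK"

# Sanity check: the untouched copy must verify, so a later failure is caused by tampering.
"$VERIFY" MANIFEST.json >/dev/null

# Flip one byte (XOR 0xFF) at offset 42 of the model weights; writing a constant could be a no-op.
WEIGHTS=$(jq -r '.artifacts[] | select(.role == "model-weights").path' MANIFEST.json)
orig=$(od -An -tu1 -j 42 -N 1 "$WEIGHTS" | tr -d ' ')
printf "$(printf '\\%03o' $(( orig ^ 0xFF )))" | dd of="$WEIGHTS" bs=1 count=1 seek=42 conv=notrunc 2>/dev/null

# Verify must fail with exit 1 (mismatch), not 2 (bad manifest) or 127 (script not found).
rc=0
"$VERIFY" MANIFEST.json || rc=$?
if [[ $rc -ne 1 ]]; then
    echo "FAIL: expected verifier exit 1 on tampered weights, got $rc" >&2
    exit 1
fi
echo "OK: tampering was detected"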

Run it and confirm "tampering was detected." If the verifier passes on the tampered file, the verifier is broken; fix it before continuing.
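Why flip the byte rather than write a constant: if the byte at the chosen offset already holds that constant, the write is a no-op, the hash does not change, and the test fails for a reason unrelated to the verifier. XOR with 0xFF always produces a different byte. A Python sketch of the same mutation (flip_byte is a hypothetical helper, not one of the lab's deliverables):

```python
import hashlib
from pathlib import Path


def flip_byte(path: Path, offset: int = 42) -> None:
    """XOR one byte with 0xFF in place; the new byte always differs from the old one."""
    with path.open("r+b") as fh:
        fh.seek(offset)
        original = fh.read(1)
        if len(original) != 1:
            raise ValueError(f"{path} is shorter than {offset + 1} bytes")
        fh.seek(offset)
        fh.write(bytes([original[0] ^ 0xFF]))


def file_sha256(path: Path) -> str:
    """Small helper to compare hashes before and after tampering."""
    return hashlib.sha256(path.read_bytes()).hexdigest()
```

Applying flip_byte twice restores the original bytes, which is convenient in tests that need both the tampered and the pristine file.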

Step 4 — pytest wrapper

# tests/test_verify_artifacts.py
import os
import subprocess
from pathlib import Path

# The scripts use repo-relative paths, so always run them from the repo root.
REPO_ROOT = Path(__file__).resolve().parents[1]


def run(cmd, **kwargs):
    return subprocess.run(cmd, cwd=REPO_ROOT, capture_output=True, text=True, **kwargs)


def test_verifier_passes_on_clean_manifest():
    result = run(["scripts/verify_artifacts.sh"])
    assert result.returncode == 0, result.stderr


def test_verifier_fails_on_tampered_manifest(tmp_path):
    # The forge script copies the artifacts into a temp dir under TMPDIR, tampers
    # with one byte, runs the verifier, and exits 0 only if tampering was detected.
    result = run(
        ["scripts/forge_tampered_manifest.sh"],
        env={**os.environ, "TMPDIR": str(tmp_path)},
    )
    assert result.returncode == 0, f"forge script failed:\n{result.stderr}"
    assert "tampering was detected" in result.stdout

Both tests must pass in CI.

Step 5 — Bandit policy

pyproject.toml add or extend:

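[tool.bandit]
exclude_dirs = [".venv", "build", "dist"]
skips = []
# `tests` makes bandit run ONLY the listed checks; drop it to run the full default set (which includes B301).
# Bandit reads this block only when invoked with -c pyproject.toml (may require the bandit[toml] extra).
tests = ["B301", "B102", "B307"]    # pickle, exec_used, eval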

Custom rule for torch.load: bandit doesn't have a built-in, and ruff doesn't support user-defined lint rules. Options:

  • Ruff: [tool.ruff.lint] select = ["S301"] reuses bandit's pickle check inside ruff. It helps with direct pickle use but is not a torch.load / weights_only check, so it complements the next option rather than replacing it.
  • A tiny grep-based check in scripts/check_torch_load.sh that fails on any torch.load( lacking weights_only=True in src/:
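#!/usr/bin/env bash
# scripts/check_torch_load.sh
# Line-based: a call whose weights_only=True sits on a continuation line is
# reported (a false positive); keep the argument on the same line as torch.load(.
set -euo pipefail
# Without this guard, a missing rg would be swallowed by `|| true` and the check would pass.
command -v rg >/dev/null || { echo "ERROR: ripgrep (rg) is required" >&2; exit 2; }
bad=$(rg -n -t py 'torch\.load\(' src/ | grep -v 'weights_only=True' || true)
if [[ -n "$bad" ]]; then
    echo "ERROR: torch.load without weights_only=True:" >&2
    echo "$bad" >&2
    exit 1
fi
echo "OK: all torch.load calls use weights_only=True (or no calls present)"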

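Add both to CI: the just security target runs bandit -c pyproject.toml -r src/ and scripts/check_torch_load.sh. (Bandit only picks up the pyproject.toml block when passed -c pyproject.toml, and needs -r to scan a directory.)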
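The grep check is line-based, so it reports a false positive when weights_only=True sits on a continuation line of a multi-line call, and it misses aliased imports such as import torch as t or from torch import load. If that gets noisy, an AST-based check is a sturdier option. A minimal sketch using only the standard library's ast module; torch_load_violations and main are hypothetical names, and this is an alternative to the lab's script, not part of it. It only accepts the literal weights_only=True, so a variable or **kwargs counts as a violation:

```python
import ast
import sys
from pathlib import Path


def torch_load_violations(source: str, filename: str = "<src>") -> list[str]:
    """Return 'file:line' for every torch.load call without a literal weights_only=True."""
    tree = ast.parse(source, filename=filename)

    # Collect local names bound to the torch module and to torch.load itself.
    torch_names, load_names = {"torch"}, set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "torch":
                    torch_names.add(alias.asname or "torch")
        elif isinstance(node, ast.ImportFrom) and node.module == "torch":
            for alias in node.names:
                if alias.name == "load":
                    load_names.add(alias.asname or "load")

    bad_lines = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id in torch_names
        ) or (isinstance(func, ast.Name) and func.id in load_names)
        if not is_torch_load:
            continue
        # Conservative: weights_only=flag, **kwargs and weights_only=False all fail.
        safe = any(
            kw.arg == "weights_only"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if not safe:
            bad_lines.append(node.lineno)
    return [f"{filename}:{line}" for line in sorted(bad_lines)]


def main(root: str = "src") -> int:
    """Scan every .py file under root; return 1 if any violation is found, else 0."""
    bad = []
    for path in sorted(Path(root).rglob("*.py")):
        bad += torch_load_violations(path.read_text(encoding="utf-8"), str(path))
    if bad:
        print("ERROR: torch.load without weights_only=True:", *bad, sep="\n", file=sys.stderr)
        return 1
    print("OK: all torch.load calls use weights_only=True (or no calls present)")
    return 0
```

A script calling sys.exit(main("src")) could run in just security next to, or instead of, check_torch_load.sh.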

Step 6 — Refactor: safetensors everywhere

For every checkpoint save/load in src/:

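# Before (pickle-based format):
torch.save(model.state_dict(), "checkpoint.pt")
state = torch.load("checkpoint.pt")

# After (safetensors: raw tensor bytes plus a JSON header, nothing executable):
from safetensors.torch import save_file, load_file
save_file(model.state_dict(), "checkpoint.safetensors")
state = load_file("checkpoint.safetensors")
model.load_state_dict(state)

# If the model ties weights (e.g. token embedding and LM head share one tensor),
# save_file refuses shared tensors; safetensors.torch.save_model / load_model handle that case.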

Update the MANIFEST.json schema in Phase 18 to record .safetensors artifacts instead of .pt. Old .pt checkpoints can be one-shot converted with a scripts/convert_pt_to_safetensors.py helper.

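From now on, scripts/check_torch_load.sh flags any remaining torch.load without weights_only=True, and bandit B301 flags any direct pickle use. Fix all callers; where a pickle load is genuinely unavoidable, mark it # nosec B301 and record the justification in security/THREATS.md.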
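Why safetensors closes the pickle RCE path: per the safetensors format description, a file is an 8-byte little-endian header length, a UTF-8 JSON header mapping each tensor name to its dtype, shape and data_offsets, then the raw tensor bytes. Loading means parsing JSON and slicing bytes; nothing in the format names a callable to run. The sketch below writes and reads that layout with only the standard library to make the structure concrete. build_safetensors and read_safetensors are hypothetical illustration helpers and skip the validation the real library performs (offset and size consistency checks, among others); in the project, keep using safetensors.torch.

```python
import json
import struct


def build_safetensors(tensors: dict[str, tuple[str, list[int], bytes]]) -> bytes:
    """Serialize name -> (dtype, shape, raw bytes) into the safetensors layout."""
    header, buffers, offset = {}, [], 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape, "data_offsets": [offset, offset + len(raw)]}
        buffers.append(raw)
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte little-endian unsigned header length, then the JSON header, then the data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(buffers)


def read_safetensors(blob: bytes) -> dict[str, tuple[str, list[int], bytes]]:
    """Parse the layout back: only JSON parsing and byte slicing, no code execution."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8 : 8 + header_len])
    data = blob[8 + header_len :]
    out = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional string-to-string map, not a tensor
            continue
        begin, end = info["data_offsets"]  # offsets are relative to the data section
        out[name] = (info["dtype"], info["shape"], data[begin:end])
    return out
```

A side effect worth noticing: byte offset 42, used by the forge script, lands inside the JSON header for any realistic file, which the SHA256 check catches just the same.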

Step 7 — Extend security/supply-chain.md

Append a grammar-tutor section:

## Grammar tutor (Phase 32) — artifacts

| Artifact | Path | Format | Integrity check |
|---|---|---|---|
| Model weights | `artifacts/checkpoints/mini-gpt-grammar.safetensors` | safetensors | MANIFEST.json SHA256 |
| Tokenizer vocab | `artifacts/tokenizer/vocab.json` | JSON | MANIFEST.json SHA256 |
| Tokenizer merges | `artifacts/tokenizer/merges.txt` | text | MANIFEST.json SHA256 |
| RAG index | `artifacts/rag/index/embeddings.safetensors` | safetensors | MANIFEST.json SHA256 |
| KB chunks | `data/kb/grammar-rules/chunks.jsonl` | JSONL | MANIFEST.json SHA256 |

## Loading policy

- Model weights: `safetensors.torch.load_file` only. `torch.load` is prohibited in `src/` (CI-enforced by `scripts/check_torch_load.sh` + bandit B301).
- KB chunks: JSON parser with schema validation; no `eval` or `pickle` anywhere in the load path.
- Before any `agent-start`, CI runs `scripts/verify_artifacts.sh`. Non-zero exit blocks the start.

Commit this as docs(security): grammar-tutor supply-chain extension.
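The KB loading policy above (JSON parser plus schema validation, no eval or pickle) can be sketched with the standard json module alone. The schema is an assumption: only chunk_id (the field named in the stretch goals) and text are required, both strings. load_kb_chunks is a hypothetical name, and a schema library such as jsonschema or pydantic would be a reasonable swap for the hand-rolled checks.

```python
import json
from pathlib import Path

# Assumed minimal schema; the real field set is whatever the Phase 32 KB builder wrote.
REQUIRED_FIELDS = {"chunk_id": str, "text": str}


def load_kb_chunks(path: str) -> list[dict]:
    """Parse a JSONL file with json.loads only (no eval, no pickle) and validate each record."""
    chunks, seen_ids = [], set()
    with Path(path).open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{lineno}: invalid JSON ({exc.msg})") from exc
            if not isinstance(record, dict):
                raise ValueError(f"{path}:{lineno}: expected a JSON object")
            for field, expected_type in REQUIRED_FIELDS.items():
                if not isinstance(record.get(field), expected_type):
                    raise ValueError(f"{path}:{lineno}: {field!r} missing or not {expected_type.__name__}")
            if record["chunk_id"] in seen_ids:
                raise ValueError(f"{path}:{lineno}: duplicate chunk_id {record['chunk_id']!r}")
            seen_ids.add(record["chunk_id"])
            chunks.append(record)
    return chunks
```

Failing loudly on the first bad line, with file and line number, keeps a tampered or truncated chunks.jsonl from silently shrinking the knowledge base.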

Step 8 — THREATS.md row

| Phase | Surface | Asset at risk | Adversary | Mitigation | Status |
|---|---|---|---|---|---|
| 37 | Model weight load / KB document load | Code execution on host (pickle RCE), integrity of model and KB | Malicious checkpoint or tampered KB file | safetensors-only (bandit + custom rule); MANIFEST.json SHA256 verification (scripts/verify_artifacts.sh); CI gates agent-start on verification | mitigated |

Commit: security: phase-37-threats-supply-chain.

Step 9 — What "done" looks like

  • scripts/verify_artifacts.sh exits 0 on healthy, 1 on tampered, 2 on missing manifest.
  • scripts/forge_tampered_manifest.sh proves the failure path.
  • tests/test_verify_artifacts.py has both clean + tampered cases.
  • bandit -c pyproject.toml -r src/ runs in just security and passes.
  • scripts/check_torch_load.sh runs in just security and passes (no torch.load without weights_only=True).
  • All checkpoints in artifacts/ are .safetensors; any leftover .pt files are converted or removed.
  • security/supply-chain.md has the grammar-tutor section.
  • security/THREATS.md has the supply-chain row.
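The .safetensors-only criterion can be checked mechanically. A minimal sketch, assuming artifacts/ is the root (as in the Step 7 table) and that .pt, .pth, .bin, .pkl, .pickle and .ckpt are the pickle-era extensions worth flagging; that extension list is an assumption to adjust to what the training code actually wrote. It ignores an _archive/ directory, in line with pitfall 3 below. leftover_pickle_files is a hypothetical name:

```python
from pathlib import Path

# Assumed list of extensions that usually mean a pickle-based checkpoint
# (.bin is not always pickle, but Hugging Face-style pytorch_model.bin files are).
PICKLE_SUFFIXES = {".pt", ".pth", ".bin", ".pkl", ".pickle", ".ckpt"}


def leftover_pickle_files(root: str = "artifacts", skip_dir: str = "_archive") -> list[Path]:
    """List files under root with a pickle-era suffix, ignoring anything inside skip_dir."""
    return sorted(
        p
        for p in Path(root).rglob("*")
        if p.is_file()
        and p.suffix.lower() in PICKLE_SUFFIXES
        and skip_dir not in p.relative_to(root).parts
    )
```

In CI, a non-empty result (printed, then turned into a non-zero exit) would keep the lab from counting as done.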

Common pitfalls

  1. Subshell variable scope in bash. while read; do counter=$((counter+1)); done < file works; cmd | while read; ... doesn't update counter in the parent. Use process substitution.
  2. weights_only=True as the only defense. It's defense in depth, not a replacement for safetensors. Pickle deserialization is still pickle deserialization; bugs in PyTorch's restricted unpickler have happened. Use safetensors.
  3. Forgetting to remove old .pt files. A stale model.pt next to model.safetensors invites a typo to load the wrong one. Either delete or move to a clearly-named _archive/ directory excluded from the manifest.
  4. Manifest without a signature. SHA256 verification catches accidental corruption and naive tampering. A sophisticated attacker rewrites both the file and the manifest. Stretch goal: GPG-sign the manifest itself.
  5. CI not gating on verification. A verifier that exists but isn't run before agent-start is just decoration. Wire it into the just target.
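Pitfall 2 in miniature. Unpickling resolves globals by name and calls them, so a checkpoint can run code the moment it is loaded. The Python pickle documentation's remedy is overriding Unpickler.find_class with an allow-list; weights_only=True applies a similar restriction inside PyTorch, but the sketch below is plain pickle, not PyTorch's implementation. The payload is deliberately harmless (it evaluates 6 * 7), and Payload, AllowListUnpickler and restricted_loads are hypothetical names:

```python
import collections
import io
import pickle


class Payload:
    """Harmless stand-in for a malicious checkpoint: unpickling it calls eval()."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


class AllowListUnpickler(pickle.Unpickler):
    """Refuse every global that is not explicitly allow-listed."""

    # A plain state_dict pickles as an OrderedDict; real checkpoints need more entries.
    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data: bytes):
    return AllowListUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    malicious = pickle.dumps(Payload())
    print(pickle.loads(malicious))  # 42: plain pickle called eval() during load
    try:
        restricted_loads(malicious)
    except pickle.UnpicklingError as exc:
        print("blocked:", exc)
```

The allow-list blocks this payload, yet it is still a pickle parser; that residual attack surface is exactly why the lab prefers safetensors over weights_only=True alone.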

Stretch goals

  • GPG-sign MANIFEST.json during Phase 18 training. verify_artifacts.sh checks the signature with a public key checked into security/keys/.
  • Per-file signatures (not just manifest-level): each chunk_id in the KB gets its own GPG signature; the verifier checks each one. More work, but catches tampering of individual chunks even if the manifest is replaced.
  • A pre-commit hook that runs scripts/verify_artifacts.sh if MANIFEST.json or any tracked artifact changes. Catches accidental commits of unverified artifacts.
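For the pre-commit stretch goal, the only real logic is deciding when to run the verifier. A minimal sketch, assuming the hook runs from the repo root and that manifest paths are repo-relative; needs_verification and main are hypothetical names. It runs the verifier against the working tree, not the staged blobs, so unstaged edits can mask or cause failures:

```python
import json
import os
import subprocess
from pathlib import Path


def needs_verification(changed: list[str], manifest_path: str = "MANIFEST.json") -> bool:
    """True if the manifest itself, or any artifact it lists, is among the changed paths."""
    changed_set = {os.path.normpath(p) for p in changed}
    if os.path.normpath(manifest_path) in changed_set:
        return True
    artifacts = json.loads(Path(manifest_path).read_text(encoding="utf-8"))["artifacts"]
    return any(os.path.normpath(entry["path"]) in changed_set for entry in artifacts)


def main() -> int:
    """Pre-commit entry point: run the verifier only when relevant files are staged."""
    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    if not needs_verification(staged):
        return 0
    # Verifies the working tree (what verify_artifacts.sh sees), not the index.
    return subprocess.run(["scripts/verify_artifacts.sh"]).returncode
```

Saved as .git/hooks/pre-commit (executable, with a shebang and sys.exit(main()) at the bottom) or wired through the pre-commit framework, it keeps ordinary code commits fast while still catching artifact changes.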

End of Phase 37 lab sequence. Next, write experiments/37-redteam-report/findings.md (the honest write-up) and PHASE_37_REPORT.md.