# Lab 02 — Add a custom pre-commit hook
Pre-req: read `../theory/02-engineering-hygiene.md`.

Goal: write a local pre-commit hook (not one pulled from a public repo) that enforces a project-specific rule. No peeking at `solutions/02-precommit-ref.md` until you've made it work.
## §1 Background
`.pre-commit-config.yaml` declares the hooks that run on each commit. Most hooks live in public repos (`pre-commit-hooks`, `ruff-pre-commit`, etc.) and are pinned to a release tag via `rev:`. For project-specific rules (the ones only this curriculum cares about), you write a local hook instead: a script in this repo that pre-commit invokes directly.
## §2 Your task
Write a local pre-commit hook called `forbid-pickle-in-checkpoint-load` that:

- Scans staged `.py` files for the strings `pickle.load(` or `pickle.loads(`.
- Allows matches in:
  - files whose path starts with `tests/`;
  - lines that include the trailing comment `# nosec safe-source: <reason>`.
- Rejects any other match with a clear error message that names `safetensors` as the alternative.
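The matching rule above can be sketched as a per-line check. This is a minimal sketch under stated assumptions: the names `PICKLE_LOAD`, `NOSEC`, `is_exempt_path`, and `offending_columns` are hypothetical, paths are repo-relative (which is what pre-commit passes), and the `<reason>` must be non-empty. Because it is a plain regex over text, it also flags matches inside string literals and comments, and a `# nosec safe-source:` that appears inside a string would exempt the line.

```python
from __future__ import annotations

import re

# Hypothetical names. \b stops e.g. `cPickle.load(` from matching.
PICKLE_LOAD = re.compile(r"\bpickle\.loads?\(")
# Exemption comment: requires at least one non-space character of reason.
NOSEC = re.compile(r"# nosec safe-source:\s*\S")


def is_exempt_path(path: str) -> bool:
    """True for repo-relative paths under the top-level tests/ directory."""
    return path.replace("\\", "/").startswith("tests/")


def offending_columns(path: str, line: str) -> list[int]:
    """Return the 0-based columns of forbidden pickle loads on one line."""
    if is_exempt_path(path) or NOSEC.search(line):
        return []
    return [m.start() for m in PICKLE_LOAD.finditer(line)]
```

Checking per line keeps the exemption scoped: a `# nosec` comment only covers the line it sits on.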
This hook implements a real defense from `security/THREATS.md`: calling `pickle.load` on a checkpoint from an untrusted source executes arbitrary code at load time. Phase 16+ uses `safetensors` exclusively.
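To see why loading an untrusted pickle is dangerous, here is a harmless stand-in for a malicious checkpoint. Any pickled object can tell the unpickler, via `__reduce__`, which callable to invoke while loading. `Payload` is a hypothetical class for illustration; here the callable is only `print`, but it could be any importable function.

```python
import pickle


class Payload:
    """Stand-in for a malicious checkpoint object (hypothetical, harmless)."""

    def __reduce__(self):
        # pickle records "call print with these args" instead of the object's
        # state. An attacker would name something like os.system instead.
        return (print, ("arbitrary code ran inside pickle.loads",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # prints the message while loading
```

Nothing in `blob` looks like code to a casual reader, yet the call happens inside `pickle.loads` before you ever inspect the result. `safetensors` avoids this class of problem because the format stores only a header describing the tensors plus their raw bytes, with no callables to invoke.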
## §3 Constraints
- Pure Python, no extra deps: use only the standard library.
- The script lives at `scripts/precommit/forbid_pickle_loads.py`.
- It must be runnable directly (`uv run python scripts/precommit/forbid_pickle_loads.py file1.py file2.py`) and exit non-zero on a finding.
- It must be wired into `.pre-commit-config.yaml` as a `repo: local` hook with `language: python`.
- Output must be `path:line:col: <message>` for each offending line, the same format `ruff` uses, so editors and LSPs can jump to the location.
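The output and exit-code constraints can be handled separately from the matching logic. A minimal sketch, assuming a hypothetical `Finding` record, `MESSAGE` text, and `report` helper (none of these names are required by the lab): lines and columns are printed 1-based, so a 0-based `re.Match.start()` needs `+ 1`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, TextIO

# Hypothetical message text; the lab only requires that it name safetensors.
MESSAGE = (
    "pickle.load/pickle.loads can run arbitrary code on untrusted data; "
    "load checkpoints with safetensors, or add '# nosec safe-source: <reason>'"
)


@dataclass(frozen=True)
class Finding:
    path: str
    line: int  # 1-based, as editors expect
    col: int  # 1-based, so add 1 to a 0-based re.Match.start()
    message: str = MESSAGE

    def __str__(self) -> str:
        # path:line:col: message
        return f"{self.path}:{self.line}:{self.col}: {self.message}"


def report(findings: list[Finding], stream: Optional[TextIO] = None) -> int:
    """Print each finding on its own line; return the hook's exit code."""
    for finding in findings:
        print(finding, file=stream)  # stream=None means the current sys.stdout
    return 1 if findings else 0
```

In the script's entry point this would typically end in something like `sys.exit(report(findings))`, so a run with no findings (including a run with no files at all) exits 0.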
## §4 Tests
Add `tests/test_forbid_pickle_loads.py` covering:

- A file containing `pickle.load(...)` → hook exits 1 and prints a finding.
- A file in `tests/` containing `pickle.load(...)` → hook exits 0.
- A file with `pickle.load(...)  # nosec safe-source: round-trip test` → hook exits 0.
- A file with `pickle.loads(b"...")` → hook exits 1.
- A file with `import pickle` only → hook exits 0.
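Don't rely only on fixture files checked into the repo: writing each case to a temp file under pytest's `tmp_path` and checking the printed findings with `capsys` is cleaner. Keep in mind that `tmp_path` is absolute, so the `tests/` case needs a repo-relative path such as `tests/test_x.py` (for example, after `monkeypatch.chdir(tmp_path)`).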
## §5 Stop conditions
- `uv run python scripts/precommit/forbid_pickle_loads.py <file>` produces the expected exit codes.
- The hook fires on a real `git commit -am 'test'` after you've added `pickle.load(...)` to a tracked file somewhere not exempted (`-a` does not stage new, untracked files).
- `pytest tests/test_forbid_pickle_loads.py` passes.
- `just lint` is green.
- Commit: `lab: phase-00 add forbid-pickle-loads pre-commit hook`.
## §6 What you'll have learned
- How pre-commit invokes local hooks (file list as argv; exit code 0 = pass, non-zero = fail).
- The difference between a style gate (ruff) and a policy gate (this one, which is about security).
- Why `# nosec` with a reason is better than blanket suppression (auditable, scoped, removable).
- The `safetensors` argument as a concrete example of a security policy.
## §7 Hints (use sparingly)
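- `pre-commit` passes staged files as positional args to the hook; `sys.argv[1:]` is the list.
- Use `tokenize` or a simple regex over lines; full AST parsing is overkill for a string-find policy.
- The `repo: local` config in pre-commit: each hook entry takes an `id`, `name`, `entry` (the command to run), and `language`; `types: [python]` or `files:` limits which files it receives.
- The hook should also work when run with zero files (no-op, exit 0).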
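If you take the `tokenize` route from the hints, you can match calls at the token level instead of by substring. A minimal sketch, assuming a hypothetical helper `pickle_load_calls` that returns `(line, col, exempt)` tuples and leaves the `tests/` path check and the loop over `sys.argv[1:]` to the caller. Since string literals and comments are single tokens, `'pickle.load('` inside a string is not reported, and only a real `COMMENT` token can carry the `# nosec` exemption. It may raise `tokenize.TokenError` on source that does not tokenize, so a real hook should decide how to report that.

```python
from __future__ import annotations

import io
import tokenize

NOSEC_PREFIX = "# nosec safe-source:"


def _is_exempt_comment(comment: str) -> bool:
    """True for '# nosec safe-source: <reason>' with a non-empty reason."""
    return comment.startswith(NOSEC_PREFIX) and bool(comment[len(NOSEC_PREFIX):].strip())


def pickle_load_calls(source: str) -> list[tuple[int, int, bool]]:
    """Return (line, col, exempt) for every pickle.load( / pickle.loads( call.

    line is 1-based and col is 0-based (tokenize's convention). Text inside
    string literals and comments never matches, because each is one token.
    May raise tokenize.TokenError if the source does not tokenize.
    """
    comments: dict[int, str] = {}  # line number -> comment text
    code: list[tokenize.TokenInfo] = []  # NAME and OP tokens only
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comments[tok.start[0]] = tok.string
        elif tok.type in (tokenize.NAME, tokenize.OP):
            code.append(tok)

    calls = []
    # Slide a 4-token window looking for: pickle . load|loads (
    for a, b, c, d in zip(code, code[1:], code[2:], code[3:]):
        if (a.string, b.string, d.string) == ("pickle", ".", "(") and c.string in ("load", "loads"):
            line, col = a.start
            calls.append((line, col, _is_exempt_comment(comments.get(line, ""))))
    return calls
```

Compared with a line regex, this costs a few more lines but avoids false positives from docstrings and comments that merely mention `pickle.load(`.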
If you reach for `solutions/02-precommit-ref.md` before completing this, mark `dod.lab_attempted_before_solutions: false`. Honesty matters here.