Lab 02 — Add a custom pre-commit hook

Pre-req: read ../theory/02-engineering-hygiene.md. Goal: write a local pre-commit hook (not from a public repo) that enforces a project-specific rule. No peeking at solutions/02-precommit-ref.md until you've made it work.

§1 Background

.pre-commit-config.yaml declares hooks. Most hooks live in public repos (pre-commit-hooks, ruff-pre-commit, etc.) and are versioned by tag. But for project-specific rules — things only this curriculum cares about — you write a local hook: a script in this repo that pre-commit invokes.

§2 Your task

Write a local pre-commit hook with the id forbid-pickle-loads that:

  1. Scans staged .py files for the strings pickle.load( or pickle.loads(.
  2. Allows matches in:
     • Files whose path starts with tests/.
     • Lines that include the trailing comment # nosec safe-source: <reason>.
  3. Rejects any other match with a clear error message that mentions safetensors as the alternative.

This is a real defense from security/THREATS.md: calling pickle.load on a checkpoint from an untrusted source can execute arbitrary code at load time. Phase 16+ uses safetensors exclusively.
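
To make the threat concrete, here is a minimal, harmless sketch of how unpickling can run code chosen by whoever produced the bytes. The class name Payload and the expression passed to eval are illustrative assumptions, not taken from security/THREATS.md; a real attack would substitute something far less friendly.

```python
import pickle


class Payload:
    """Hypothetical stand-in for a malicious object hidden in a checkpoint."""

    def __reduce__(self):
        # pickle records "call eval with this argument"; the call happens
        # when the bytes are loaded, not when they are created.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())  # what an attacker would ship as a "checkpoint"
result = pickle.loads(blob)     # loading runs eval("6 * 7")
print(result)                   # prints 42, not a Payload instance
```

Nothing in the loading code asked for eval to run; the serialized data did. That is the behaviour the hook is meant to keep out of checkpoint-loading paths.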

§3 Constraints

  • Pure Python, no extra deps. Use only the standard library.
  • The script lives at scripts/precommit/forbid_pickle_loads.py.
  • It must be runnable directly (uv run python scripts/precommit/forbid_pickle_loads.py file1.py file2.py) and exit non-zero on a finding.
  • It must be wired into .pre-commit-config.yaml as a repo: local hook. Use language: system when the entry is uv run python ..., as in the §7 hint.
  • Output must be path:line:col: <message> on offending lines — same format ruff uses, so editor LSPs can navigate.

§4 Tests

Add tests/test_forbid_pickle_loads.py covering:

  1. A file containing pickle.load(...) → hook exits 1, prints a finding.
  2. A file in tests/ containing pickle.load(...) → hook exits 0.
  3. A file with pickle.load(...) # nosec safe-source: round-trip test → hook exits 0.
  4. A file with pickle.loads(b"...") → hook exits 1.
  5. A file with import pickle only → hook exits 0.

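Avoid depending on checked-in fixture files: writing temporary files under pytest's tmp_path and capturing output with capsys is cleaner. For case 2, pass a path relative to the working directory (for example after monkeypatch.chdir(tmp_path)), because an absolute temp path never starts with tests/.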

§5 Stop conditions

  • uv run python scripts/precommit/forbid_pickle_loads.py <file> produces the expected exit codes.
  • The hook fires on a real git commit -am 'test' after you've added pickle.load(...) to a tracked file that is not exempt.
  • pytest tests/test_forbid_pickle_loads.py passes.
  • just lint is green.
  • Commit: lab: phase-00 add forbid-pickle-loads pre-commit hook.

§6 What you'll have learned

  • How pre-commit invokes local hooks (file list as argv, exit-code = pass/fail).
  • The difference between a style gate (ruff) and a policy gate (this one — about security).
  • Why # nosec with a reason is better than blanket suppression (auditable, scoped, removable).
  • The case for safetensors over pickle, as a concrete example of a policy worth enforcing.

§7 Hints (use sparingly)

  1. pre-commit passes staged files as positional args to the hook. sys.argv[1:] is the list.
  2. Use tokenize or a simple regex over lines — full AST parsing is overkill for a string-find policy.
  3. The repo: local entry in .pre-commit-config.yaml looks like this:
    - repo: local
      hooks:
        - id: forbid-pickle-loads
          name: Forbid pickle.load(s) outside tests/
          entry: uv run python scripts/precommit/forbid_pickle_loads.py
          language: system
          types: [python]
    
  4. The hook should also work when run with zero files (no-op, exit 0).
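
Hints 1 and 4 describe the contract between pre-commit and a hook: file paths arrive as arguments, and the exit code decides pass or fail. The skeleton below sketches only that contract. It uses a deliberately different placeholder rule (flagging the string FIXME) so the pickle policy itself stays your work. The names MARKER, check_file and main are hypothetical, and the 1-based column is an assumption about what editors expect.

```python
from pathlib import Path

MARKER = "FIXME"  # placeholder rule, NOT the lab's pickle policy


def check_file(path: str) -> list[str]:
    """Return one 'path:line:col: message' string per offending line."""
    findings = []
    text = Path(path).read_text(encoding="utf-8")
    for lineno, line in enumerate(text.splitlines(), start=1):
        col = line.find(MARKER)
        if col != -1:
            # Line and column are both reported 1-based.
            findings.append(f"{path}:{lineno}:{col + 1}: found {MARKER}")
    return findings


def main(argv: list[str]) -> int:
    """argv is the list of file paths, e.g. sys.argv[1:]."""
    findings = [f for path in argv for f in check_file(path)]
    for finding in findings:
        print(finding)
    # With zero files there are zero findings, so the hook is a no-op.
    return 1 if findings else 0


# In the real script, finish with:
#     if __name__ == "__main__":
#         sys.exit(main(sys.argv[1:]))
```

Returning the exit code from main instead of calling sys.exit inside it keeps the function easy to call from tests, which can then assert on the return value and on captured output separately.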

If you reach for solutions/02-precommit-ref.md before completing this, mark dod.lab_attempted_before_solutions: false. Honesty matters here.