Engineering hygiene: pre-commit, ruff, mypy, bandit, pip-audit, nbstripout
Summary. "Hygiene" is not style: it is the safety net that stops invisible bugs (broken types, dead imports, secrets in notebooks, CVEs in dependencies) before they reach the commit. pre-commit runs the rules; CI runs them again for the repo.
§0 The principle
Catch defects as close to their introduction as possible. The cost of a defect grows roughly geometrically with how far it travels:
| Defect typed wrong in the editor, then... | Cost |
|---|---|
| caught by mypy | ~5 seconds wasted |
| caught by pre-commit | ~30 seconds |
| caught by CI | ~5 minutes |
| caught by a teammate in review | ~hours |
| caught in production | hours to days |
Phase 0 wires every gate so the first row is the common case.
§1 The gates
| Gate | What it catches | When it runs |
|---|---|---|
| `ruff check` | unused imports, undefined names, mutable defaults, comprehension misuse, deprecated patterns | pre-commit + CI + editor LSP |
| `ruff format` | style drift (blocks 80% of "style" PR comments) | pre-commit + CI |
| `mypy --strict` | tensor type/dtype bugs (via type hints), Optional mishandling, Any propagation | pre-commit (`src/` only) + CI |
| `pytest` | functional regressions; the autouse seed fixture catches non-determinism | manual + CI |
| `bandit` | `pickle.load` on untrusted input, `subprocess(shell=True, user_input)`, `assert` in prod, hardcoded passwords, weak crypto | pre-commit + CI |
| `pip-audit` | known CVEs in any locked dep | `just audit-deps` + CI weekly |
| `nbstripout` | committed notebook output (secrets, GBs of arrays, diff noise) | pre-commit |
| `nbqa-ruff` + `nbqa-mypy` | the same checks above applied to notebooks | pre-commit |
| `check-added-large-files` | accidentally committed model checkpoint / dataset | pre-commit |
| `detect-private-key` | accidentally committed SSH keys / PEM | pre-commit |
§2 ruff: the linter + formatter
We use the rules in pyproject.toml:
- E / F: pycodestyle / pyflakes (basics).
- I: import order (replaces isort).
- B: bugbear (mutable defaults, `except:` without re-raise, etc.).
- UP: pyupgrade (modern syntax: `X | Y` unions, built-in generics such as `list[int]`, etc.). Preferring `Path` over `os.path` belongs to a separate rule family, PTH (flake8-use-pathlib).
- N: pep8-naming.
- C4: comprehension correctness (avoid `list(map(...))` when a comprehension is clearer).
- RET: return-statement issues (redundant `return None`, etc.).
- SIM: code simplifications.
`ignore = ["E501"]` because the formatter already wraps lines to the configured length; E501 would only add noise on the lines it cannot wrap (long strings, URLs).
ruff format is opinionated — we accept its choices to delete the debate. Two-space indent? Single-quote strings? The formatter wins. Time saved is real.
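To make the `B` family concrete, here is a minimal sketch of the mutable-default bug that bugbear-style rules flag (B006 in ruff's numbering), next to the usual fix. The function names are hypothetical and not taken from this repo.

```python
from __future__ import annotations


def append_buggy(item: int, bucket: list[int] = []) -> list[int]:
    # ruff reports B006 on the signature above: the default list is created
    # once, at definition time, so calls without `bucket` share one object.
    bucket.append(item)
    return bucket


def append_fixed(item: int, bucket: list[int] | None = None) -> list[int]:
    # Use None as the sentinel and build a fresh list on every call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling `append_buggy` repeatedly without a `bucket` keeps growing the same list, while `append_fixed` returns a one-element list each time. The linter catches the first form before any test has to.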
§3 mypy --strict
strict mode bundles, among other flags:
- --disallow-untyped-defs: every function has type hints.
- --disallow-any-generics: no bare list, dict, tuple; specify the parameter.
- --disallow-untyped-decorators: no @some_untyped_decorator quietly poisoning types.
- --warn-return-any: flag if a function annotated as int returns Any.
Two related checks are not switched on by --strict itself in current mypy:
- --no-implicit-optional: def f(x: int = None) is rejected; must be Optional[int]. This has been mypy's default behavior since 0.990.
- --warn-unreachable: dead branches caught. It is not part of --strict, so it has to be enabled separately.
We type-check only src/ (production code). Tests and experiments are intentionally untyped (they're throwaway / exploratory). The pyproject.toml config reflects this.
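As a small illustration of the Optional handling that strict typing forces, consider the sketch below. `find_index` and `next_item` are hypothetical helpers, not code from this repo.

```python
from __future__ import annotations

from typing import Optional


def find_index(items: list[str], target: str) -> Optional[int]:
    """Return the position of `target`, or None when it is absent."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return None


def next_item(items: list[str], target: str) -> str:
    """Return the element after `target`, wrapping around at the end."""
    idx = find_index(items, target)
    # Without this check, `idx + 1` is a mypy error because idx may be None.
    if idx is None:
        raise KeyError(target)
    return items[(idx + 1) % len(items)]
```

The explicit `None` branch is the part mypy insists on. Deleting it turns a silent runtime `TypeError` into an error reported at lint time.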
§3.1 Why this catches ML bugs
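Tensor types first, shapes later:
    import numpy as np
    from numpy.typing import NDArray
    def normalize(x: NDArray[np.float32], axis: int = -1) -> NDArray[np.float32]:
        return x - x.mean(axis, keepdims=True)
A caller that accidentally passes a scalar instead of an array for x (e.g., the result of a misplaced reduction) is caught by mypy before the test even runs. Note what is and is not checked here: mypy verifies that x is a float32 array and that axis is an int, but the annotation says nothing about the array's shape. As we add jaxtyping / tensorly shape annotations in later phases, coverage will extend to shapes; jaxtyping annotations are enforced by a runtime type checker rather than by mypy, so those checks fire when the code runs rather than at lint time.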
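For contrast, here is a sketch of what the same kind of mistake looks like without a type checker: the call only fails once it executes. To stay dependency-free it uses a plain sequence instead of an array, so `center` is a hypothetical stand-in for the function above. The `# type: ignore[arg-type]` marks the line mypy would be expected to reject.

```python
from __future__ import annotations

from collections.abc import Sequence


def center(xs: Sequence[float]) -> list[float]:
    """Subtract the mean from every element (stdlib stand-in for normalize)."""
    mu = sum(xs) / len(xs)
    return [x - mu for x in xs]


def call_with_scalar() -> str:
    """Pass an int where a sequence is expected and report the runtime result."""
    try:
        center(3)  # type: ignore[arg-type]  # rejected statically by mypy
    except TypeError as exc:
        # An int is not iterable, so the bug surfaces only when this line runs.
        return f"runtime failure: {exc}"
    return "no failure"


centered = center([1.0, 2.0, 3.0])
```

With the type checker in the loop, the bad call never reaches this `try` block; the suppression comment is there only so the sketch itself stays lint-clean.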
§4 bandit
Static analyzer for common Python security smells (not style). The ones that matter for us:
- B301: `pickle.load`. Phase 16+ checkpoint loading must use `safetensors`, not `pickle`, exactly because `pickle.load` can execute arbitrary code on load.
- B602: `subprocess(shell=True)` with user input, i.e. command injection.
- B105 / B106: hardcoded password strings.
- B324: `hashlib.md5` / `sha1`, weak hashes, flagged for security uses.
- B101: `assert` statements. They're stripped in `python -O`, so they're worthless for security checks (and we use real validation where it matters).
In short: `bandit` is not a style linter. It looks for security patterns such as `pickle.load` (which can execute arbitrary code) or `subprocess(shell=True)` with user input.
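The B301 concern can be shown with a harmless sketch. `Payload` is a hypothetical class, and the callable it smuggles in is the builtin `str`, chosen so nothing dangerous runs. The point is only that the pickle author, not the loader, decides what gets called.

```python
from __future__ import annotations

import pickle


class Payload:
    """Stands in for an attacker-controlled object inside a pickle file."""

    def __reduce__(self) -> tuple[type[str], tuple[str]]:
        # pickle stores "call this callable with these args" and replays it
        # on load. Here the callable is harmless; an attacker would pick
        # something like os.system instead.
        return (str, ("code chosen by the pickle author ran on load",))


blob = pickle.dumps(Payload())
loaded = pickle.loads(blob)  # the kind of call that B301 flags
```

After loading, `loaded` is not a `Payload` at all: it is whatever the embedded call returned. That is why untrusted checkpoints should go through a data-only format.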
§5 pip-audit
Reads the lockfile, checks every locked package against the PyPA Advisory Database. Output is a list of findings with: package, installed version, advisory ID, and the versions that contain a fix.
Policy in this repo: just audit-deps is enforced from Phase 0. Any new CVE blocks the next commit until either the dep is upgraded or the CVE is marked as not applicable with a written justification in security/THREATS.md.
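The policy above can be sketched as a small gate over an audit report. The report shape below is an assumption modeled on `pip-audit --format json` output as we understand it, and the package names and advisory ID are made up for illustration. `blocking_findings` is a hypothetical helper, not part of pip-audit.

```python
from __future__ import annotations

from typing import Any

# Hypothetical sample; the exact keys are an assumption about the JSON layout.
SAMPLE_REPORT: dict[str, Any] = {
    "dependencies": [
        {
            "name": "examplelib",
            "version": "1.0.0",
            "vulns": [{"id": "EXAMPLE-0001", "fix_versions": ["1.0.1"]}],
        },
        {"name": "otherlib", "version": "2.3.0", "vulns": []},
    ]
}


def blocking_findings(report: dict[str, Any], waived_ids: set[str]) -> list[str]:
    """Return one message per finding that has no written waiver."""
    messages: list[str] = []
    for dep in report.get("dependencies", []):
        for vuln in dep.get("vulns", []):
            if vuln["id"] in waived_ids:
                continue  # justified as not applicable in security/THREATS.md
            fixes = ", ".join(vuln.get("fix_versions", [])) or "none listed"
            messages.append(
                f"{dep['name']} {dep['version']}: {vuln['id']} (fixed in: {fixes})"
            )
    return messages
```

An empty result means the gate passes. Anything else should either trigger an upgrade or a justified waiver entry.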
§6 nbstripout + nbqa
Notebooks (*.ipynb) are JSON. Their cell outputs contain rendered images, dataframe HTML, and computation results, frequently MBs each. Committed, they:
- Make diffs unreadable.
- Bloat the repo to GBs over a year.
- Leak secrets (cell output of os.environ, print(api_key), etc.).
nbstripout runs in pre-commit and strips outputs + execution_count from any .ipynb before it lands in a commit. The notebook still runs identically; only the committed copy is stripped.
nbqa-ruff + nbqa-mypy apply our ruff / mypy rules to notebook code cells. The same standards as src/. Notebooks are not write-only sketchpads — when they're committed, they're documentation.
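What "stripping" means at the JSON level can be sketched in a few lines. This is an illustration of the idea, not nbstripout's actual implementation, and the sample notebook (including the fake key) is hypothetical.

```python
from __future__ import annotations

import json
from typing import Any


def strip_outputs(notebook: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a notebook dict with code-cell outputs removed."""
    stripped: dict[str, Any] = json.loads(json.dumps(notebook))  # deep copy
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# Hypothetical two-cell notebook whose output leaked a secret-looking value.
nb: dict[str, Any] = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo"]},
        {
            "cell_type": "code",
            "execution_count": 7,
            "source": ["print(api_key)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["sk-not-a-real-key\n"]}
            ],
        },
    ],
    "nbformat": 4,
    "nbformat_minor": 5,
}
clean = strip_outputs(nb)
```

The source of every cell survives, so the notebook reruns the same way; only the rendered results and counters are gone from the committed copy.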
§7 The pre-commit framework
.pre-commit-config.yaml declares the hooks; pre-commit install wires them as .git/hooks/pre-commit. On every git commit, the hooks run on the staged files only (fast — typical run is < 2 s on a 100-file diff).
Anti-pattern: git commit --no-verify to skip hooks. We don't do that. If a hook fails, fix the underlying issue. (CLAUDE.md §0 calls this out explicitly.)
§8 What this looks like at the commit level
A typical successful pre-commit run:
end-of-file-fixer....................Passed
trailing-whitespace..................Passed
check-yaml...........................Passed
check-toml...........................Passed
check-added-large-files..............Passed
detect-private-key...................Passed
ruff.................................Passed
ruff-format..........................Passed
mypy.................................Passed
bandit...............................Passed
nbstripout...........................Passed
nbqa-ruff............................Passed
nbqa-mypy............................Passed
A failing run halts the commit. Fix → re-stage → retry.
§9 Conventional commits (a small extra layer)
commitizen is installed and we adopt Conventional Commits:
phase: open Phase 1 — linear algebra
theory: derive softmax with max-shift
lab: write the Justfile exercise
feat(utils): add log_versions
fix(seeding): cover np.random.default_rng generator
chore: bump uv 0.4.18 → 0.4.19
docs: rewrite reproducibility theory
test(utils): add seed determinism property test
security: pin transitive cryptography>=43
ci: split lint and test jobs
Why: git log --grep '^phase:' gives a phase history. git log --grep '^security:' gives a security history. The grouping isn't tooling-driven; it's documentation that survives.
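The subject-line convention in the examples above can be expressed as a small parser. The type list is copied from those examples; the regular expression and the `parse_subject` helper are illustrative assumptions, not commitizen's own validation logic.

```python
from __future__ import annotations

import re

# Commit types as they appear in the examples above.
COMMIT_TYPES = (
    "phase", "theory", "lab", "feat", "fix",
    "chore", "docs", "test", "security", "ci",
)
SUBJECT_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(?:\((?P<scope>[a-z0-9_-]+)\))?"  # optional "(scope)"
    r": (?P<summary>\S.*)$"
)


def parse_subject(subject: str) -> dict[str, str | None] | None:
    """Split a commit subject into type, scope, and summary, or return None."""
    match = SUBJECT_RE.match(subject)
    return match.groupdict() if match else None
```

A subject that returns `None` would not show up in any `git log --grep '^type:'` history, which is the practical reason to keep the prefix strict.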
§10 Exercises (solutions in solutions/)
- Add a single pre-commit hook that rejects any commit that adds a file > 1 MiB. (Hint: this exists in `pre-commit-hooks` already; find it.)
- Without running mypy, predict whether the following will pass `--strict`: If it fails, why? Write the minimal fix.
- Write a `bandit` config that allows `pickle.load` only in files named `test_pickle_*.py`. (Real use case: round-trip tests for legacy formats.)
§11 Pitfalls
- Auto-fixing during a review. `ruff check --fix` rewrites your code. Commit the un-fixed version, run `--fix`, and review the diff before squashing. Otherwise you commit code you haven't read.
- Suppressing mypy errors with `# type: ignore` without a reason code. Always write `# type: ignore[error-code]` so the suppression is auditable and gets removed when the underlying bug is fixed.
- Letting `bandit` warnings accumulate. If you `# nosec` something, comment why. Mass-`# nosec`-ing is how a real CVE slips through.
- Disabling `nbstripout` "just for this commit." It's how a `print(API_KEY)` cell output ends up on GitHub.
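For the suppression pitfall, a brief sketch of a scoped ignore next to the fix that makes it unnecessary. Both functions are hypothetical, and `no-any-return` is the error code we would expect mypy to report for this line under `--warn-return-any`.

```python
from __future__ import annotations

import json
from typing import Any


def load_port_suppressed(raw: str) -> int:
    data: Any = json.loads(raw)
    # Scoped suppression: silences only the "returning Any" error on this line.
    return data["port"]  # type: ignore[no-any-return]


def load_port_validated(raw: str) -> int:
    data: Any = json.loads(raw)
    port = data["port"]
    if not isinstance(port, int):
        raise ValueError(f"port must be an int, got {type(port).__name__}")
    return port  # narrowed to int, so no suppression is needed
```

The validated version is usually the better end state: the ignore comment documents a gap, while the `isinstance` check closes it.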
§12 Read next
→ 03-dev-environment.md — IDE, plugins, CLI, Claude Code customization.