Engineering hygiene — pre-commit, ruff, mypy, bandit, pip-audit, nbstripout

Summary. "Hygiene" is not about style: it is the safety net that stops invisible bugs (broken types, dead imports, secrets in notebooks, CVEs in dependencies) before they reach a commit. pre-commit runs the rules; CI runs them again.

§0 The principle

Catch defects as close to their introduction as possible. The cost of a defect grows roughly geometrically with how far it travels:

typed wrong in editor → caught by mypy        ~5 seconds wasted
caught by pre-commit                          ~30 seconds
caught by CI                                  ~5 minutes
caught by a teammate in review                ~hours
caught in production                          hours to days

Phase 0 wires every gate so the first row is the common case.

§1 The gates

| Gate | What it catches | When it runs |
| --- | --- | --- |
| ruff check | unused imports, undefined names, mutable defaults, comprehension misuse, deprecated patterns | pre-commit + CI + editor LSP |
| ruff format | style drift (blocks 80% of "style" PR comments) | pre-commit + CI |
| mypy --strict | wrong array / dtype arguments (via type hints), Optional mishandling, Any propagation | pre-commit (src/ only) + CI |
| pytest | functional regressions; an autouse seed fixture catches non-determinism | manual + CI |
| bandit | pickle.load on untrusted input, subprocess with shell=True on user input, assert in prod, hardcoded passwords, weak crypto | pre-commit + CI |
| pip-audit | known CVEs in any locked dep | just audit-deps + weekly CI |
| nbstripout | committed notebook output (secrets, GBs of arrays, diff noise) | pre-commit |
| nbqa-ruff + nbqa-mypy | the same checks above applied to notebooks | pre-commit |
| check-added-large-files | accidentally committed model checkpoint / dataset | pre-commit |
| detect-private-key | accidentally committed SSH keys / PEM | pre-commit |

§2 ruff — the linter + formatter

We use the rules in pyproject.toml:

  • E / F: pycodestyle / pyflakes (basics).
  • I: import order (replaces isort).
  • B: bugbear (mutable defaults, raise inside except without from, etc.).
  • UP: pyupgrade (modern syntax: | unions, built-in generics such as list[int], etc.).
  • N: pep8-naming.
  • C4: comprehension correctness (avoid list(map(...)) when a comprehension is clearer).
  • RET: return-statement issues (redundant return None, etc.).
  • SIM: code simplifications.

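ignore = ["E501"] because the formatter already wraps lines to the configured length. Left on, E501 would only fire on lines the formatter cannot split (long strings, URLs), which is noise rather than signal.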

ruff format is opinionated — we accept its choices to delete the debate. Two-space indent? Single-quote strings? The formatter wins. Time saved is real.
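
As an illustration of the B rules above, here is a minimal sketch of the mutable-default bug that bugbear's B006 flags, next to the usual fix. The function names are hypothetical and chosen only for this example.

```python
from __future__ import annotations


def append_bad(item: int, bucket: list[int] = []) -> list[int]:  # noqa: B006
    # The default list is created once, at definition time,
    # and is shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_good(item: int, bucket: list[int] | None = None) -> list[int]:
    # A None sentinel gives each call its own fresh list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first_bad = list(append_bad(1))   # snapshot of the shared default: [1]
second_bad = list(append_bad(2))  # state leaked from the first call: [1, 2]
first_good = append_good(1)
second_good = append_good(2)

print(second_bad, second_good)  # [1, 2] [2]
```

The bug is invisible in a single call, which is why it is worth catching statically rather than waiting for a test to stumble on it.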

§3 mypy --strict

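--strict bundles, among others:

  • --disallow-untyped-defs: every function has type hints.
  • --disallow-any-generics: no bare list, dict, tuple; specify the type parameters.
  • --disallow-untyped-decorators: no @some_untyped_decorator quietly poisoning types.
  • --warn-return-any: flags a function annotated as returning int that actually returns Any.

Two related checks sit outside the bundle in recent mypy releases:

  • --no-implicit-optional: def f(x: int = None) is rejected; the annotation must be Optional[int] (or int | None). This is now mypy's default behaviour, with or without --strict.
  • --warn-unreachable: catches dead branches. It is not part of --strict and has to be enabled separately in the config.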
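
A minimal sketch of what these flags ask for in practice. The scale function is hypothetical; the comments name the flag each annotation is there to satisfy.

```python
from __future__ import annotations


def scale(values: list[float], factor: float | None = None) -> list[float]:
    """Multiply every value by factor; None means 'leave unchanged'."""
    # list[float] instead of a bare list satisfies --disallow-any-generics.
    # float | None instead of `factor: float = None` satisfies
    # --no-implicit-optional.
    # Annotating every parameter and the return satisfies
    # --disallow-untyped-defs.
    if factor is None:
        return list(values)
    return [value * factor for value in values]


print(scale([1.0, 2.0], 2.5))  # [2.5, 5.0]
```

The annotations also force the None case to be handled explicitly: without the `if factor is None` branch, mypy would reject `value * factor`.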

We type only src/ (production code). Tests and experiments are intentionally untyped (they're throwaway / exploratory). The pyproject.toml config reflects this.

§3.1 Why this catches ML bugs

Tensor shapes:

import numpy as np
from numpy.typing import NDArray
def normalize(x: NDArray[np.float32], axis: int = -1) -> NDArray[np.float32]:
    return x - x.mean(axis, keepdims=True)

A caller that accidentally passes an int for x (e.g., the result of a misplaced reduction) is caught by mypy before the test even runs, and so is a non-integer axis. Note that NDArray[np.float32] constrains the dtype, not the shape: mypy will not flag a wrong axis value or mismatched dimensions on its own. As we add jaxtyping / tensorly shape annotations in later phases, coverage of shape errors will grow.

§4 bandit

Static analyzer for common Python security smells (not style). The ones that matter for us:

  • B301: pickle.load. Phase 16+ checkpoint loading must use safetensors, not pickle, precisely because pickle.load can execute arbitrary code on load.
  • B602: subprocess(shell=True) with user input — command injection.
  • B105 / B106: hardcoded password strings.
  • B324: hashlib.md5 / sha1. Weak hashes, flagged when used for security purposes.
  • B101: assert statements — they're stripped in python -O, so they're worthless for security checks (and we use real validation where it matters).
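
To make the B301 concern concrete, here is a harmless sketch of why loading a pickle amounts to running code. The Payload class is hypothetical; it asks pickle to call int("42") on load, where an attacker would substitute something like os.system.

```python
import pickle


class Payload:
    """Stand-in for a malicious object; for illustration only."""

    def __reduce__(self):
        # pickle stores "call this callable with these args"
        # and replays that call during loading.
        return (int, ("42",))


blob = pickle.dumps(Payload())
loaded = pickle.loads(blob)  # runs int("42") while deserializing

print(type(loaded).__name__, loaded)  # int 42
```

The loaded value is not a Payload at all: it is whatever the embedded call returned. A file format that stores only tensors and metadata has no equivalent hook.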

Summary. bandit is not a style linter: it looks for security patterns such as pickle.load (which can execute arbitrary code) or subprocess(shell=True) with user input.

§5 pip-audit

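Reads the lockfile and checks every locked package against the PyPA Advisory Database. Output is a list of advisories with: package, installed version, advisory ID, and fix versions. pip-audit does not print a severity score, so judge severity from the advisory itself.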

Policy in this repo: just audit-deps is enforced from Phase 0. Any new CVE blocks the next commit until either the dep is upgraded or the CVE is marked as not applicable with a written justification in security/THREATS.md.
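
The policy above can be sketched as a small filter over an audit report. This is an illustration, not the repo's actual just audit-deps recipe: the report shape (a "dependencies" list whose entries carry "name", "version", and "vulns") is an assumption modeled on pip-audit's JSON output, and the allowlist and advisory IDs are made up.

```python
from __future__ import annotations

from typing import Any

# Hypothetical allowlist: advisory IDs judged not applicable, each of which
# would need a written justification in security/THREATS.md.
ACCEPTED_IDS = {"EXAMPLE-0002"}


def blocking_advisories(
    report: dict[str, Any], accepted: set[str]
) -> list[tuple[str, str, str]]:
    """Return (package, version, advisory_id) for each advisory not accepted."""
    blocking = []
    for dep in report.get("dependencies", []):
        for vuln in dep.get("vulns", []):
            if vuln["id"] not in accepted:
                blocking.append((dep["name"], dep["version"], vuln["id"]))
    return blocking


# Made-up report in the assumed shape; these are not real advisories.
sample_report = {
    "dependencies": [
        {"name": "pkg-a", "version": "1.0.0", "vulns": [
            {"id": "EXAMPLE-0001", "fix_versions": ["1.0.1"]},
        ]},
        {"name": "pkg-b", "version": "2.3.0", "vulns": [
            {"id": "EXAMPLE-0002", "fix_versions": []},
        ]},
        {"name": "pkg-c", "version": "0.9.0", "vulns": []},
    ]
}

remaining = blocking_advisories(sample_report, ACCEPTED_IDS)
print(remaining)  # [('pkg-a', '1.0.0', 'EXAMPLE-0001')]
```

A gate built this way fails whenever the returned list is non-empty, so the only ways to pass are the two the policy names: upgrade the dependency or record a justification.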

§6 nbstripout + nbqa

Notebooks (*.ipynb) are JSON. Their output cells contain rendered images, dataframe HTML, computation results, frequently MBs each. Committed, they:

  • Make diffs unreadable.
  • Bloat the repo to GBs over a year.
  • Leak secrets (cell output of os.environ, print(api_key), etc.).

nbstripout runs in pre-commit and strips outputs + execution_count from any .ipynb before it lands in a commit. The notebook still runs identically; only the committed artifact is the stripped version.
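
Conceptually, the stripping step amounts to the following sketch. It is not nbstripout's actual implementation (which handles more metadata and options); strip_outputs and the sample notebook are hypothetical.

```python
import json


def strip_outputs(notebook_json: str) -> str:
    """Return the notebook with code-cell outputs and execution counts cleared."""
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(notebook, indent=1)


# Made-up notebook whose output leaked a (fake) key.
raw = json.dumps({
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 7,
            "source": ["print(api_key)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout",
                 "text": ["FAKE-KEY-123\n"]},
            ],
            "metadata": {},
        },
        {"cell_type": "markdown", "source": ["# Notes"], "metadata": {}},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})

stripped = strip_outputs(raw)
print("FAKE-KEY-123" in raw, "FAKE-KEY-123" in stripped)  # True False
```

The source of each cell is untouched, which is why the stripped notebook still runs identically.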

nbqa-ruff + nbqa-mypy apply our ruff / mypy rules to notebook code cells. The same standards as src/. Notebooks are not write-only sketchpads — when they're committed, they're documentation.

§7 The pre-commit framework

.pre-commit-config.yaml declares the hooks; pre-commit install wires them as .git/hooks/pre-commit. On every git commit, the hooks run on the staged files only (fast — typical run is < 2 s on a 100-file diff).

Anti-pattern: git commit --no-verify to skip hooks. We don't do that. If a hook fails, fix the underlying issue. (CLAUDE.md §0 calls this out explicitly.)

§8 What this looks like at the commit level

A typical successful pre-commit run:

end-of-file-fixer....................Passed
trailing-whitespace..................Passed
check-yaml...........................Passed
check-toml...........................Passed
check-added-large-files..............Passed
detect-private-key...................Passed
ruff.................................Passed
ruff-format..........................Passed
mypy.................................Passed
bandit...............................Passed
nbstripout...........................Passed
nbqa-ruff............................Passed
nbqa-mypy............................Passed

A failing run halts the commit. Fix → re-stage → retry.

§9 Conventional commits (a small extra layer)

commitizen is installed and we adopt Conventional Commits:

phase: open Phase 1 — linear algebra
theory: derive softmax with max-shift
lab: write the Justfile exercise
feat(utils): add log_versions
fix(seeding): cover np.random.default_rng generator
chore: bump uv 0.4.18 → 0.4.19
docs: rewrite reproducibility theory
test(utils): add seed determinism property test
security: pin transitive cryptography>=43
ci: split lint and test jobs

Why: git log --grep '^phase:' gives a phase history. git log --grep '^security:' gives a security history. The grouping isn't tooling-driven; it's documentation that survives.
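
For illustration, the subject-line shape used above can be checked with a short regular expression. This is a sketch, not how commitizen validates messages: the type list is copied from the examples in this section, and the scope pattern is an assumption.

```python
import re

# Types taken from the example subjects in this section.
TYPES = (
    "phase", "theory", "lab", "feat", "fix",
    "chore", "docs", "test", "security", "ci",
)

# type, optional (scope), colon, single space, non-empty description.
SUBJECT_RE = re.compile(
    r"^(?:" + "|".join(TYPES) + r")(?:\([a-z0-9_-]+\))?: \S.*$"
)


def is_valid_subject(subject: str) -> bool:
    """Return True if the subject matches the assumed type(scope): text shape."""
    return SUBJECT_RE.match(subject) is not None


for subject in (
    "feat(utils): add log_versions",
    "security: pin transitive cryptography>=43",
    "Fixed stuff",
    "fix:missing space",
):
    print(is_valid_subject(subject), subject)
```

Because every valid subject starts with a known type, the git log --grep queries above stay reliable as the history grows.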

§10 Exercises (solutions in solutions/)

  1. Add a single pre-commit hook that rejects any commit that adds a file > 1 MiB. (Hint: this exists in pre-commit-hooks already — find it.)
  2. Without running mypy, predict whether the following will pass --strict:
    def f(x):
        return x + 1
    
    If it fails, why? Write the minimal fix.
  3. Write a bandit config that allows pickle.load only in files named test_pickle_*.py. (Real use case: round-trip tests for legacy formats.)

§11 Pitfalls

  • Auto-fixing during a review. ruff check --fix rewrites your code. Commit the un-fixed version, run ruff check --fix, and review the diff before squashing. Otherwise you commit code you haven't read.
  • Suppressing mypy errors with # type: ignore without a reason code. Always # type: ignore[error-code] so the suppression is auditable and gets removed when the underlying bug is fixed.
  • Letting bandit warnings accumulate. If you # nosec something, comment why. Mass-# nosec-ing is how a real vulnerability slips through.
  • Disabling nbstripout "just for this commit." It's how a print(API_KEY) cell output ends up on GitHub.
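
A minimal sketch of the # type: ignore pitfall, assuming --strict (which includes --warn-return-any and --warn-unused-ignores). Both function names are hypothetical.

```python
from __future__ import annotations

import json


def load_thresholds(text: str) -> dict[str, float]:
    # json.loads returns Any, so --strict reports [no-any-return] here.
    # The bracketed code limits the suppression to that one error.
    return json.loads(text)  # type: ignore[no-any-return]


def load_thresholds_checked(text: str) -> dict[str, float]:
    # Validating at runtime removes the need for any suppression.
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object")
    return {str(key): float(value) for key, value in raw.items()}


print(load_thresholds_checked('{"lr": 0.1, "dropout": 0.5}'))
```

With a bare # type: ignore, any new error on that line would be silenced too. With the error code, a different error still surfaces, and --warn-unused-ignores reports the comment once it is no longer needed.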

03-dev-environment.md — IDE, plugins, CLI, Claude Code customization.