Reproducibility: seeds, lockfiles, manifests
Summary. Three mechanisms: (1) seed every source of randomness, (2) freeze exact versions with a lockfile, (3) persist a manifest per experiment. Without all three, results are not reproducible; they are anecdotes.
§0 The three pillars
A result is reproducible when someone else, six months from now, on different hardware, can re-run your script and get bit-identical (or, on GPU, "within documented tolerance") numerical results. That requires three things:
- All randomness is seeded — every RNG that touches the computation graph.
- All code dependencies are pinned — exact versions + hashes, in a lockfile that's committed.
- Per-experiment provenance is recorded — the seed, the lockfile sha, the git sha, the hardware, the config. Lose any one and the run is unreproducible by definition.
§1 Sources of randomness
In a NumPy + (eventually) PyTorch + CUDA stack, the sources of randomness and nondeterminism are:

| Source | Where it bites you |
|---|---|
| `random` (stdlib) | `random.shuffle`, `random.choice`, and the rest of the module (but not `secrets`, which is intentionally non-seedable) |
| `numpy.random` (legacy + `Generator`) | Most pre-ML phases |
| torch CPU RNG | `torch.randn`, `torch.randperm`, dropout, layer init |
| `torch.cuda` per-device RNG | GPU dropout, GPU init |
| cuDNN nondeterministic algorithms | `torch.backends.cudnn.deterministic`, `cudnn.benchmark` |
| `PYTHONHASHSEED` | `hash(str)` and the iteration order of sets of strings (dicts keep insertion order since Python 3.7); affects dataloaders that index by hash |
| OS scheduling / multi-threading | `OMP_NUM_THREADS`, BLAS threading; with more than two threads, sum-reduction order can vary between runs |
| Hardware nondeterminism (TF32, FP16 atomic add) | TF32 on Ampere+; `atomicAdd` in FP16 |

`seed_everything` in `src/utils/seeding.py` covers the first five rows. The remaining three are handled at run boundaries: `PYTHONHASHSEED` is set in the launcher (see §1.1), `OMP_NUM_THREADS=1` is used for fully deterministic CPU runs, and TF32 is disabled where determinism matters.
§1.1 Why each seeding call?

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    os.environ["PYTHONHASHSEED"] = str(seed)  # hash(str), set order (child processes only)
    random.seed(seed)                         # stdlib RNG
    np.random.seed(seed)                      # NumPy legacy global RNG
    torch.manual_seed(seed)                   # CPU + default-device RNG
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)      # every CUDA device
    torch.backends.cudnn.deterministic = True   # only deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False      # no per-run algorithm autotuning
```

Subtleties:
- `PYTHONHASHSEED` must be set before Python starts to affect `hash(str)`. The hash secret is fixed at interpreter startup, so assigning it via `os.environ` at runtime has no effect on the current process; it only reaches child processes started afterwards. In practice we set it in the launcher (a `just` recipe or a shell wrapper). Code-level setting is hygienic but not sufficient.
- np.random.seed only seeds the legacy global generator. Code that uses rng = np.random.default_rng(seed) is independent — and better — but the legacy seed is still set for libraries that haven't migrated.
- cudnn.benchmark = True lets cuDNN pick the fastest algorithm at runtime; the choice depends on input shapes and can flip between runs. Disabling it costs throughput but is the price of determinism.
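The first subtlety is easy to check. This sketch (the helper name `child_hash` is hypothetical) shows that assigning `os.environ["PYTHONHASHSEED"]` inside a running interpreter leaves that interpreter's `hash(str)` unchanged, while separate interpreters launched with the variable already set agree with each other.

```python
import os
import subprocess
import sys


def child_hash(text: str, hashseed: int) -> int:
    """Hash `text` in a fresh interpreter started with PYTHONHASHSEED already set."""
    env = {**os.environ, "PYTHONHASHSEED": str(hashseed)}
    out = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(out.stdout)


before = hash("reproducible")
os.environ["PYTHONHASHSEED"] = "0"   # too late for this interpreter
after = hash("reproducible")
print(before == after)               # True: the hash secret was fixed at startup

# Two separate processes started with the same seed agree.
print(child_hash("reproducible", 0) == child_hash("reproducible", 0))  # True
```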
§1.2 What `seed_everything` does not do
- It doesn't make `O(n)` parallel reductions deterministic on multi-threaded BLAS. For that, set `OMP_NUM_THREADS=1` or use deterministic reduction implementations.
- It doesn't make non-deterministic algorithms (some `scatter_add` variants, for instance) deterministic. `torch.use_deterministic_algorithms(True)` is the way to enforce that: it raises a `RuntimeError` if you call an op that only has a non-deterministic implementation.
- It doesn't seed any threads spawned before the call. Call it first.
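Why reduction order matters at all: floating-point addition is not associative, so splitting the same sum across a different number of workers can change the result. This pure-Python sketch mimics a parallel reduction (each hypothetical "thread" sums a contiguous chunk, then the partial sums are combined). It uses explicit left-to-right loops because the built-in `sum` switched to compensated summation for floats in Python 3.12, which would hide the effect.

```python
import math


def left_to_right_sum(xs: list[float]) -> float:
    """Plain sequential accumulation: one rounding step per addition."""
    total = 0.0
    for x in xs:
        total += x
    return total


def chunked_sum(xs: list[float], n_chunks: int) -> float:
    """Mimic a parallel reduction: sum contiguous chunks, then combine the partials."""
    size = math.ceil(len(xs) / n_chunks)
    partials = [left_to_right_sum(xs[i:i + size]) for i in range(0, len(xs), size)]
    return left_to_right_sum(partials)


values = [1e16, 1.0, -1e16, 1.0]
print(chunked_sum(values, 1))  # 1.0
print(chunked_sum(values, 2))  # 0.0
print(math.fsum(values))       # 2.0 (correctly rounded reference)
```

At `1e16` the spacing between adjacent doubles is 2, so adding `1.0` rounds away; which `1.0`s survive depends entirely on how the work was split.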
§2 Lockfiles: the difference between "I installed numpy 2.x" and "we both installed numpy 2.0.1 from the same wheel hash"
A requirements.txt with version specifiers like numpy>=2,<3 is not a lockfile — it's a constraint. The same constraint resolves to different exact versions on different days, depending on what's been released since.
A lockfile records:
- Every direct dependency's exact version.
- Every transitive dependency's exact version.
- The wheel/sdist hash for each.
- The resolver decisions (which conflicts were broken which way).
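`uv.lock` is uv's lockfile format, and it's checked in. `uv sync` installs what it records and does not upgrade pinned versions unless you ask; `uv sync --locked` goes further and exits with an error if the lockfile would need updating, instead of updating it. The hash check means a compromised PyPI index can't quietly swap a package: installation would fail.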
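Conceptually, the hash check is a digest comparison: recompute the SHA-256 of the downloaded file and refuse it if it differs from the recorded value. A minimal sketch of that comparison, with hypothetical helper names and an expected value in the `sha256:<hex>` form used by pip's `--hash` option; this is an illustration, not uv's actual implementation.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected: str) -> None:
    """Raise ValueError unless `path` matches `expected` ("sha256:<hex>")."""
    algo, _, want = expected.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo!r}")
    got = sha256_file(path)
    if got != want:
        raise ValueError(f"hash mismatch for {path.name}: expected {want}, got {got}")
```

A swapped package produces a different digest, so `verify_artifact` raises before anything is installed.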
Summary. A `requirements.txt` with `>=` and `<` is a constraint, not a lockfile: it resolves to different versions from one day to the next. The lockfile freezes exact versions and hashes, so if someone tampers with the index, installation fails.
§2.1 When the lockfile changes
- A direct dep is added, removed, or upgraded in `pyproject.toml` → re-run `uv lock`.
- A transitive dep's constraint moves (because a direct dep changed its constraints) → `uv lock` regenerates it.
- The lockfile is committed. PRs that change it must justify the change.
§3 The experiment manifest
For every run that produces a numeric artifact (a loss, a metric, a checkpoint, a plot), persist:

```json
{
  "id": "2026-05-22-softmax-stability",
  "git_sha": "a1b2c3d4...",
  "git_dirty": false,
  "seed": 42,
  "config": { "/* the actual hyperparameters */": null },
  "versions": {
    "python": "3.11.9",
    "numpy": "2.0.1",
    "torch": "2.3.1",
    "uv": "0.4.18"
  },
  "hardware": {
    "cpu": "Intel i5-8250U",
    "ram_gb": 62,
    "gpu": "Intel UHD 620 (no CUDA)",
    "os": "Fedora 43"
  },
  "started_at": "2026-05-22T19:14:02Z",
  "finished_at": "2026-05-22T19:14:08Z",
  "wall_seconds": 6.31,
  "artifacts": ["plot.svg", "loss.npy"]
}
```

src/utils/seeding.py has log_versions() for the versions block; the rest is composed in the experiment script.
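A minimal sketch of the timing part of that composition, assuming the manifest is built as a plain dict before being serialized to JSON. The `timed` context manager is hypothetical (it is not part of `src/utils/seeding.py`); it fills `started_at`, `finished_at`, and `wall_seconds` in the same formats as the example above.

```python
import json
import time
from collections.abc import Iterator
from contextlib import contextmanager
from datetime import datetime, timezone
from typing import Any


def utc_now() -> str:
    """UTC timestamp in the manifest's 2026-05-22T19:14:02Z format."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@contextmanager
def timed(manifest: dict[str, Any]) -> Iterator[dict[str, Any]]:
    manifest["started_at"] = utc_now()
    start = time.perf_counter()  # high-resolution clock intended for durations
    try:
        yield manifest
    finally:
        # Runs even if the experiment raises, so failed runs still get timings.
        manifest["finished_at"] = utc_now()
        manifest["wall_seconds"] = round(time.perf_counter() - start, 2)


manifest: dict[str, Any] = {"id": "2026-05-22-softmax-stability", "seed": 42}
with timed(manifest):
    time.sleep(0.01)  # stand-in for the experiment body
print(json.dumps(manifest, indent=2))
```

Measuring the duration with `perf_counter` rather than subtracting the two timestamps keeps `wall_seconds` sub-second precise even though the timestamps are truncated to whole seconds.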
§3.1 Why hardware is in the manifest
CPU vs. GPU paths are bit-different. Within GPU, sm_70 vs sm_80 differ on TF32 default. Within CPU, the BLAS reduction tree can vary by core count. If you don't record hardware, you can't tell whether a numerical discrepancy is your bug or the machine's.
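A sketch of collecting the `hardware` block with the standard library only, under stated assumptions: `hardware_info` is a hypothetical name, `cpu_count` is an extra field not in the example manifest (added because core count affects the BLAS reduction tree), and the GPU entry is omitted because it needs a framework such as PyTorch. On Linux, `platform.processor()` often returns only the architecture (e.g. `x86_64`) rather than a model string like the one in the example.

```python
import os
import platform
from typing import Any


def hardware_info() -> dict[str, Any]:
    """Hypothetical helper: best-effort hardware block for the manifest."""
    info: dict[str, Any] = {
        "cpu": platform.processor() or platform.machine(),
        "cpu_count": os.cpu_count(),  # extra field; reduction trees can depend on it
        "os": f"{platform.system()} {platform.release()}",
    }
    try:
        # Python 3.10+: parses the os-release file on Linux distributions.
        info["os"] = platform.freedesktop_os_release().get("PRETTY_NAME", info["os"])
    except (AttributeError, OSError):
        pass  # older Python, or no os-release file (e.g. macOS, Windows)
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
        info["ram_gb"] = round(pages * page_size / 1024**3)
    except (AttributeError, ValueError, OSError):
        info["ram_gb"] = None  # e.g. Windows has no os.sysconf
    return info


print(hardware_info())
```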
§3.2 Why "git_dirty" matters
If the working tree is dirty (uncommitted changes), the git sha is a lie. Manifests with git_dirty: true are quarantine artifacts — useful for fast iteration, useless for the report. The phase-gatekeeper flags any DoD-relevant manifest with git_dirty: true.
§4 What the lab will test
- §lab/03: re-implement `seed_everything` from scratch without peeking. Confirm with `pytest` that 10 invocations of `random.random()` after `seed_everything(0)` produce the same sequence as 10 invocations after `random.seed(0)`.
- §lab/00 checklist: confirm `uv.lock` is present, `pip-audit` is clean, and `bandit` is clean.
§5 Pitfalls
- Forgetting to seed inside a worker process. Depending on the start method and the library, a worker either starts from fresh, unseeded RNG state or inherits a copy of the parent's state (NumPy's legacy global RNG is copied on fork, so every worker draws the same numbers). Either way, seed inside the worker.
- Setting `PYTHONHASHSEED` in code instead of in the launcher. It affects `hash(str)` only for processes started afterwards.
- Relying on `bool(torch.cuda.is_available())` at module import time. It can return `True` on systems where CUDA is broken at runtime. Wrap CUDA-only paths in try/except.
- Trusting `numpy.random.seed` to seed `np.random.default_rng(...)`. It doesn't: `default_rng` has its own state.
- Persisting the manifest without `wall_seconds` and `finished_at`. You'll want the timing data when you're debugging "why did this run take 8× longer this time?"
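The NumPy-related pitfalls are easy to demonstrate. This sketch assumes NumPy 1.17 or later (for `default_rng` and `SeedSequence.spawn`); the worker loop is a sequential stand-in for real worker processes, `worker_draws` is a hypothetical helper, and the seed values are arbitrary.

```python
import numpy as np

# Pitfall: the legacy global seed does not reach Generator instances.
np.random.seed(0)
legacy_a = np.random.random(3)
np.random.seed(0)
legacy_b = np.random.random(3)
print(np.array_equal(legacy_a, legacy_b))  # True: the legacy global RNG was reseeded

np.random.seed(0)
fresh_a = np.random.default_rng().random(3)  # no seed: initialized from OS entropy
np.random.seed(0)
fresh_b = np.random.default_rng().random(3)
# Unaffected by np.random.seed, so these almost surely differ.
print(np.array_equal(fresh_a, fresh_b))

# Pass the seed to the Generator explicitly instead.
print(np.array_equal(np.random.default_rng(0).random(3), np.random.default_rng(0).random(3)))  # True


def worker_draws(base_seed: int, n_workers: int) -> list[np.ndarray]:
    """Derive one independent child seed per worker from a single base seed."""
    children = np.random.SeedSequence(base_seed).spawn(n_workers)
    # In real code, each worker would build its Generator from its child seed
    # inside the worker process.
    return [np.random.default_rng(child).random(2) for child in children]


run1 = worker_draws(42, n_workers=4)
run2 = worker_draws(42, n_workers=4)
print(all(np.array_equal(a, b) for a, b in zip(run1, run2)))  # True: reproducible per worker
```

Spawning children from one `SeedSequence` gives every worker a distinct stream while keeping the whole run reproducible from the single seed recorded in the manifest.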
§6 Exercises (solutions in `solutions/`)
- Without looking at `src/utils/seeding.py`, write a `seed_everything(seed: int) -> None` that covers `random`, `numpy`, and `torch` (if importable). Test that calling it twice with the same seed gives the same first ten `random.random()` outputs.
- Add a `log_versions() -> dict[str, str]` that returns Python + NumPy + Torch + uv versions. Handle the case where any of them isn't available.
- Write a `record_manifest(experiment_id: str, config: dict, seed: int, artifacts: list[str]) -> Path` that captures the schema in §3, writes it to `experiments/<date>-<id>/manifest.json`, and returns the path. Include `git_sha`, `git_dirty`, and wall time.
§7 References
- Reproducibility in ML: Pineau et al., 2020 (NeurIPS reproducibility checklist).
- PyTorch determinism docs: `torch.use_deterministic_algorithms` semantics.
- `uv` docs: lockfile format, `uv sync` vs. `uv pip install`.
§8 Read next
→ 02-engineering-hygiene.md: pre-commit, ruff, mypy, bandit, pip-audit as policy.