Solution 03: seed_everything reference
Read only after completing ../lab/03-seed-by-hand.md and committing your attempt.
Reference implementation
src/utils/seeding.py:

"""Deterministic seeding for every RNG in scope.

What this covers:
- PYTHONHASHSEED: hash(str) and set iteration order, for child processes only
- random: stdlib RNG
- numpy.random: legacy global generator
- torch: CPU + per-CUDA-device + cuDNN deterministic
- log_versions: capture installed library versions for manifests

What it does NOT cover:
- Multi-threaded BLAS reduction order (set OMP_NUM_THREADS=1 explicitly)
- TF32 / FP16 hardware nondeterminism on Ampere+ GPUs
- numpy.random.default_rng(seed): those generators have their own state
  (prefer passing rng around explicitly in your own code; seed_everything
  only sets the legacy global state for libraries that haven't migrated)
"""
from __future__ import annotations

import os
import random
import sys


def seed_everything(seed: int) -> None:
    """Seed every reachable RNG with the given integer.

    Call this **before** any RNG use. Setting PYTHONHASHSEED here does not
    change hash(str) in the current interpreter, which reads the variable
    once at startup; it only reaches child processes that inherit
    os.environ. To make hash(str) deterministic in this process too, also
    set PYTHONHASHSEED in the launcher shell.
    """
    # Inherited by child processes; has no effect on the running interpreter.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)

    # numpy is optional: seed the legacy global generator only if it is installed.
    try:
        import numpy as np
    except ImportError:
        pass
    else:
        np.random.seed(seed)

    # torch is optional: seed the CPU generator first, then every CUDA device.
    try:
        import torch
    except ImportError:
        pass
    else:
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
            torch.backends.cudnn.deterministic = True
            torch.backends.cudnn.benchmark = False


def log_versions() -> dict[str, str]:
    """Return a dict {name: version-or-'not installed'} for the libs we care about."""
    versions: dict[str, str] = {"python": sys.version.split()[0]}
    for name in ("numpy", "torch", "scipy"):
        try:
            mod = __import__(name)
        except ImportError:
            versions[name] = "not installed"
        else:
            # Fall back to "unknown" for libraries that do not expose __version__.
            versions[name] = getattr(mod, "__version__", "unknown")
    return versions
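The docstring's point about PYTHONHASHSEED scope can be checked directly. The sketch below is illustrative and not part of the reference: `child_hash` is a hypothetical helper, and it assumes `sys.executable` can be launched as a subprocess. It shows that writing to `os.environ` leaves `hash(str)` in the running interpreter unchanged, while two fresh interpreters started with the same PYTHONHASHSEED agree with each other.

```python
import os
import subprocess
import sys


def child_hash(text: str, hash_seed: str) -> int:
    """Return hash(text) as computed by a fresh interpreter
    started with PYTHONHASHSEED=hash_seed (hypothetical helper)."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# The running interpreter fixed its hash seed at startup, so assigning
# the variable afterwards (as seed_everything does) changes nothing here.
before = hash("reproducibility")
os.environ["PYTHONHASHSEED"] = "123"
after = hash("reproducibility")
hash_unchanged_in_parent = before == after

# Fresh interpreters read the variable at startup, so they agree.
first = child_hash("reproducibility", "123")
second = child_hash("reproducibility", "123")
children_agree = first == second
```

This is why the launcher shell, not the Python code, is the place to pin PYTHONHASHSEED for the main process.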
Decisions made in this reference
- try/except ImportError per library: lets seed_everything work in an empty venv, on a CPU-only machine, etc. Tests verify this in tests/test_seeding.py.
- else clause after try: clearer than nested ifs. Reads as "if the import succeeded, do the seeding."
- from __future__ import annotations: defers annotation evaluation, so hints like dict[str, str] (PEP 585) or X | Y (PEP 604) do not break on interpreters older than our Python 3.11, even before we set target-version. Cheap insurance.
- np.random.seed only: we deliberately do not try to enumerate all np.random.default_rng(...) instances; they have independent state by design. Prefer passing explicit generators in your own code.
- torch.manual_seed before cuda.is_available: manual_seed is cheap; doing it unconditionally is fine and reduces ordering concerns.
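The advice to pass explicit generators can be illustrated with a stdlib-only analogue, so it runs without numpy installed. This is a sketch under that assumption: it uses `random.Random` in place of `np.random.default_rng`, and `sample_noise` is a hypothetical function, not something from the reference. The point it shows is the same one the list makes: a generator object owns its state, so reseeding the global RNG does not rewind it.

```python
from __future__ import annotations

import random


def sample_noise(rng: random.Random, n: int) -> list[float]:
    """Draw n floats from the generator the caller passed in."""
    return [rng.random() for _ in range(n)]


rng = random.Random(1234)        # independent state, owned by the caller
first = sample_noise(rng, 3)

random.seed(0)                   # reseeding the global RNG...
second = sample_noise(rng, 3)    # ...does not rewind the explicit generator

# A fresh generator with the same seed replays both draws in order.
replay = sample_noise(random.Random(1234), 6)
continues_where_it_left_off = first + second == replay
```

The same reasoning is why seed_everything cannot reach generators created elsewhere: whoever constructs the generator is responsible for seeding it.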
Subtleties to compare
- Did you write from numpy.random import seed as np_seed? That works but pulls numpy at import time of seeding.py, which raises ImportError in a venv without numpy and is wasteful if an experiment never wants numpy loaded.
- Did you return __version__ directly without getattr? Some libs don't expose it (rare for the ones we care about, but getattr(mod, "__version__", "unknown") is defensive without being noisy).
- Did you set cudnn.benchmark = False without cudnn.deterministic = True? Both are needed for deterministic cuDNN convolutions: benchmark = False stops the autotuner from picking a different algorithm between runs, and deterministic = True restricts cuDNN to deterministic algorithms.
- Did you call torch.cuda.manual_seed (singular, only the current device) instead of manual_seed_all (every device)?
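The `getattr` point is easy to see with a stand-in module. In the sketch below, `fake_lib` is a hypothetical module object built with `types.ModuleType` to imitate a library that ships without `__version__`; it is not a real package.

```python
import types

# Hypothetical stand-in for a library that does not define __version__.
fake_lib = types.ModuleType("fake_lib")

# Direct attribute access raises AttributeError, which would abort
# a log_versions implementation that does not guard against it.
try:
    direct = fake_lib.__version__
except AttributeError:
    direct = None

# The defensive form used in the reference degrades to a placeholder.
defensive = getattr(fake_lib, "__version__", "unknown")
```

With the defensive form, a manifest still gets written, and the "unknown" entry flags the library for a manual check.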
Property test (companion)
Append to tests/test_seeding.py:
import random

from hypothesis import given, strategies as st

from utils.seeding import seed_everything


@given(st.integers(min_value=0, max_value=2**31 - 1))
def test_seed_determinism_stdlib(seed: int) -> None:
    seed_everything(seed)
    a = [random.random() for _ in range(5)]
    seed_everything(seed)
    b = [random.random() for _ in range(5)]
    assert a == b


@given(st.integers(min_value=0, max_value=2**31 - 1))
def test_seed_determinism_numpy(seed: int) -> None:
    np = __import__("numpy")
    seed_everything(seed)
    a = np.random.rand(5).tolist()
    seed_everything(seed)
    b = np.random.rand(5).tolist()
    assert a == b
The torch determinism test would be similar but gated by pytest.importorskip("torch").
What this lab actually teaches
You will almost certainly write something close to this implementation on the first try. The lesson isn't "did you get the code right." It's:
- Did you understand PYTHONHASHSEED's scope? (Process-level and read once at interpreter startup; setting it in code does not change hash(str) in the running process, only in child processes that inherit the environment.)
- Did you understand that np.random.seed ≠ np.random.default_rng(seed)? They produce different sequences from the same seed.
- Did you write a test that fails on a buggy version? If your test is seed_everything(0); assert random.random() != 0, that's not a determinism test; it's a smoke test. The property test above is correct because it compares two seeded runs.
- Did you list a source of non-determinism that seed_everything cannot cover? (Multi-threaded BLAS reduction order, TF32, atomic-add on FP16, multi-process forking without re-seeding.)
If your learners/borja/phase-00/notes/seeding-notes.md covers those four points honestly, this lab is done.
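The third point, a test that fails on a buggy version, can be made concrete with a deliberately broken seeder. The sketch below is illustrative only: `seed_correct`, `seed_buggy`, `smoke_check`, and `determinism_check` are hypothetical names, and the seeders cover just the stdlib RNG to keep the example dependency-free.

```python
import random
from typing import Callable

Seeder = Callable[[int], None]


def seed_correct(seed: int) -> None:
    """Minimal working seeder (stdlib RNG only)."""
    random.seed(seed)


def seed_buggy(seed: int) -> None:
    """Hypothetical broken seeder: silently ignores the seed."""
    return None


def smoke_check(seed_fn: Seeder) -> bool:
    """The weak test: only confirms that some number came out."""
    seed_fn(0)
    return random.random() != 0


def determinism_check(seed_fn: Seeder) -> bool:
    """The real test: two runs from the same seed must match."""
    seed_fn(0)
    a = [random.random() for _ in range(5)]
    seed_fn(0)
    b = [random.random() for _ in range(5)]
    return a == b


smoke_passes_on_buggy = smoke_check(seed_buggy)
determinism_passes_on_correct = determinism_check(seed_correct)
determinism_passes_on_buggy = determinism_check(seed_buggy)
```

The smoke check accepts the broken seeder, while the determinism check rejects it and accepts the working one. Running your own test against a no-op seeder like this is a quick way to find out which kind of test you wrote.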
When you compare and decide "keep mine"
Sometimes the answer to the diff is "my version is fine for this curriculum, even if the reference is slightly more conservative." That's a valid call — note it in seeding-notes.md and move on. The lab is not "make your file identical to the reference." It's "be able to defend every line you wrote."