Solution 03: `seed_everything` reference

Read only after completing ../lab/03-seed-by-hand.md and committing your attempt.

Reference implementation

src/utils/seeding.py:

"""Deterministic seeding for every RNG in scope.

What this covers:
- PYTHONHASHSEED: hash(str) and set iteration order, in child processes only
- random: stdlib RNG
- numpy.random: legacy global generator
- torch: CPU + per-CUDA-device + cuDNN deterministic
- log_versions: capture installed library versions for manifests

What it does NOT cover:
- Multi-threaded BLAS reduction order (set OMP_NUM_THREADS=1 explicitly)
- TF32 / FP16 hardware nondeterminism on Ampere+ GPUs
- numpy.random.default_rng(seed) — those generators have their own state
  (prefer passing rng around explicitly in your own code; seed_everything
  only sets the legacy global state for libraries that haven't migrated)
"""

from __future__ import annotations

import os
import random
import sys


def seed_everything(seed: int) -> None:
    """Seed every reachable RNG with the given integer.

    Call this **before** any RNG use. Setting PYTHONHASHSEED here does not
    change hash(str) in the running interpreter, which fixes its hash seed
    at startup; it only reaches child processes that inherit os.environ.
    To make hash(str) deterministic in this process too, set PYTHONHASHSEED
    in the launcher shell.
    """
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)

    try:
        import numpy as np
    except ImportError:
        pass
    else:
        np.random.seed(seed)

    try:
        import torch
    except ImportError:
        pass
    else:
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
            torch.backends.cudnn.deterministic = True
            torch.backends.cudnn.benchmark = False


def log_versions() -> dict[str, str]:
    """Return a dict {name: version-or-'not installed'} for the libs we care about."""
    versions: dict[str, str] = {"python": sys.version.split()[0]}
    for name in ("numpy", "torch", "scipy"):
        try:
            mod = __import__(name)
        except ImportError:
            versions[name] = "not installed"
        else:
            versions[name] = getattr(mod, "__version__", "unknown")
    return versions
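
The PYTHONHASHSEED caveat in the docstring can be checked with the standard library alone. The sketch below is a minimal illustration, not part of seeding.py: `child_hash` is a hypothetical helper that starts a fresh interpreter with a given PYTHONHASHSEED and reports `hash()` of a string.

```python
import os
import subprocess
import sys


def child_hash(text: str, hash_seed: str) -> int:
    """Return hash(text) as computed by a fresh interpreter started with
    PYTHONHASHSEED=hash_seed. (Hypothetical helper, not part of seeding.py.)"""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Two fresh interpreters launched with the same PYTHONHASHSEED agree.
assert child_hash("abc", "123") == child_hash("abc", "123")

# The parent fixed its own hash seed at startup, so unless it was launched
# with PYTHONHASHSEED=123 as well, this will usually print False.
print(hash("abc") == child_hash("abc", "123"))
```

This is why the variable belongs in the launcher shell: by the time seed_everything runs, the current interpreter has already chosen its hash seed.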

Decisions made in this reference

try/except ImportError per library: lets seed_everything work in an empty venv, on a CPU-only machine, etc. Tests verify this in tests/test_seeding.py.
else: clause after try:: clearer than nested ifs. Reads as "if the import succeeded, do the seeding."
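from __future__ import annotations: defers evaluation of annotations (PEP 563), so hints such as dict[str, str] (PEP 585) and X | Y (PEP 604) are not evaluated at definition time and also work on interpreters older than 3.9 and 3.10. Both are native on Python 3.11, so this is cheap insurance for older interpreters until we set target-version.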
np.random.seed only: we deliberately do not try to enumerate all np.random.default_rng(...) instances — they have independent state by design. Prefer passing explicit generators in your own code.
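torch.manual_seed before cuda.is_available: manual_seed is cheap and needs no GPU, so calling it unconditionally is fine and removes ordering concerns. Recent PyTorch releases document manual_seed as seeding all devices, which makes the explicit manual_seed_all call a belt-and-braces step.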
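
The "pass explicit generators" advice can be sketched as follows. This is an illustration under assumptions: `sample_noise` is a hypothetical function, and numpy is imported defensively in the same spirit as the reference, so the block does nothing if numpy is missing.

```python
try:
    import numpy as np
except ImportError:  # numpy is optional, as in the reference implementation
    np = None


def sample_noise(rng: "np.random.Generator", n: int):
    """Draw n standard-normal values from the generator the caller passes in.

    Hypothetical example function: it never touches numpy's global state.
    """
    return rng.standard_normal(n)


if np is not None:
    # Two generators built from the same seed replay the same stream.
    a = sample_noise(np.random.default_rng(7), 3)
    b = sample_noise(np.random.default_rng(7), 3)
    assert (a == b).all()

    # Reseeding the legacy global state does not disturb an explicit generator.
    rng = np.random.default_rng(7)
    np.random.seed(0)
    c = sample_noise(rng, 3)
    assert (a == c).all()

    # Same integer seed, different streams: the legacy global state and
    # default_rng use different bit generators (MT19937 vs. PCG64).
    np.random.seed(7)
    legacy = np.random.random(3)
    modern = np.random.default_rng(7).random(3)
    assert not (legacy == modern).all()
```

The design point is that a function taking `rng` as an argument is reproducible on its own terms, whether or not anyone remembered to call seed_everything.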

Subtleties to compare

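Did you write from numpy.random import seed as np_seed at module level? That works when numpy is installed, but it loads numpy as soon as seeding.py is imported: the module then fails with ImportError in an environment without numpy, and experiments that never touch numpy pay the import cost anyway.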
Did you return __version__ directly without getattr? Some libs don't expose it (rare for the ones we care about, but getattr(mod, "__version__", "unknown") is defensive without being noisy).
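Did you set cudnn.benchmark = False without cudnn.deterministic = True? You need both: benchmark = False stops the cuDNN autotuner from choosing a different algorithm from run to run, and deterministic = True restricts cuDNN to deterministic algorithms. Even with both, operations outside cuDNN can remain nondeterministic.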
Did you call torch.cuda.manual_seed (singular — only current device) instead of manual_seed_all (every device)?

Property test (companion)

Append to tests/test_seeding.py:

import random

import pytest
from hypothesis import given, strategies as st

from utils.seeding import seed_everything


@given(st.integers(min_value=0, max_value=2**31 - 1))
def test_seed_determinism_stdlib(seed: int) -> None:
    seed_everything(seed)
    a = [random.random() for _ in range(5)]
    seed_everything(seed)
    b = [random.random() for _ in range(5)]
    assert a == b


@given(st.integers(min_value=0, max_value=2**31 - 1))
def test_seed_determinism_numpy(seed: int) -> None:
    # Skip (rather than error) when numpy is absent, matching the optional import in seeding.py.
    np = pytest.importorskip("numpy")
    seed_everything(seed)
    a = np.random.rand(5).tolist()
    seed_everything(seed)
    b = np.random.rand(5).tolist()
    assert a == b

The torch determinism test would be similar but gated by pytest.importorskip("torch").
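
One way to convince yourself that the stdlib property above can actually fail is to run the same comparison against a deliberately broken seeder. The sketch below uses only the standard library; `good_seeder`, `buggy_seeder`, `smoke_check`, and `determinism_check` are hypothetical names for illustration, not part of the lab code.

```python
import random
from typing import Callable

Seeder = Callable[[int], None]


def good_seeder(seed: int) -> None:
    """Minimal stand-in for seed_everything: resets the stdlib RNG."""
    random.seed(seed)


def buggy_seeder(seed: int) -> None:
    """Deliberate bug: ignores its argument, so the stdlib RNG is never reset."""


def smoke_check(seeder: Seeder) -> bool:
    """Only checks that a draw happens; says nothing about reproducibility."""
    seeder(0)
    return random.random() != 0


def determinism_check(seeder: Seeder, seed: int = 0) -> bool:
    """Same shape as the property test: seed, draw, seed again, draw, compare."""
    seeder(seed)
    a = [random.random() for _ in range(5)]
    seeder(seed)
    b = [random.random() for _ in range(5)]
    return a == b


# The smoke check cannot tell the two seeders apart.
assert smoke_check(good_seeder)
assert smoke_check(buggy_seeder)

# Comparing two seeded runs does.
assert determinism_check(good_seeder)
assert not determinism_check(buggy_seeder)
```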

What this lab actually teaches

You will almost certainly write something close to this implementation on the first try. The lesson isn't "did you get the code right." It's:

Did you understand PYTHONHASHSEED's scope? (The interpreter reads it once at startup; setting it in code leaves hash(str) in the current process unchanged and only affects child processes that inherit the environment.)
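Did you understand that np.random.seed(seed) and np.random.default_rng(seed) are not interchangeable? They produce different sequences from the same seed, because the legacy global state uses MT19937 while default_rng uses PCG64 by default.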
Did you write a test that fails on a buggy version? If your test is seed_everything(0); assert random.random() != 0, that's not a determinism test — it's a smoke test. The property test above is correct because it compares two seeded runs.
Did you list a source of non-determinism that seed_everything cannot cover? (Multi-threaded BLAS reduction order, TF32, atomic-add on FP16, multi-process forking without re-seeding.)

If your learners/borja/phase-00/notes/seeding-notes.md covers those four points honestly, this lab is done.
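
The reduction-order point is easy to state and easy to underestimate. As a toy illustration (plain Python, not BLAS, and `naive_sum` is a hypothetical helper), the same four floats summed in two different orders give two different answers, which is what happens when a multi-threaded reduction partitions its work differently from run to run.

```python
def naive_sum(xs: list[float]) -> float:
    """Left-to-right float accumulation, with no compensation for rounding."""
    total = 0.0
    for x in xs:
        total += x
    return total


order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1e16, -1e16, 1.0, 1.0]  # same values, different order

# In order_a the first 1.0 is absorbed: 1e16 + 1.0 rounds back to 1e16.
print(naive_sum(order_a))  # 1.0
# In order_b the large values cancel first, so both 1.0s survive.
print(naive_sum(order_b))  # 2.0

assert sorted(order_a) == sorted(order_b)
assert naive_sum(order_a) != naive_sum(order_b)
```

No seed controls this: the inputs are identical and only the order of additions changes.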

When you compare and decide "keep mine"

Sometimes the answer to the diff is "my version is fine for this curriculum, even if the reference is slightly more conservative." That's a valid call — note it in seeding-notes.md and move on. The lab is not "make your file identical to the reference." It's "be able to defend every line you wrote."
