Lab 03 — Implement seed_everything by hand

Pre-req: read ../theory/01-reproducibility.md. Goal: re-implement src/utils/seeding.py from scratch without looking at the existing file or at solutions/03-seeding-ref.md. You will move the existing implementation aside in §1.

§1 Setup

  1. Move the existing implementation: git mv src/utils/seeding.py src/utils/seeding.py.bak.
  2. Create a fresh src/utils/seeding.py with only the docstring header — empty body.
  3. Verify pytest tests/test_seeding.py now fails (this is expected: you haven't implemented anything yet).

Do not look at seeding.py.bak until the diff step in §4.

§2 Your task

Implement two functions:

§2.1 seed_everything(seed: int) -> None

Seeds every RNG source described in ../theory/01-reproducibility.md §1.

Cover at minimum:

  • PYTHONHASHSEED (via os.environ, with the caveat from theory §1.1).
  • random (stdlib).
  • numpy.random: both the legacy global state and the question of default_rng (answer this in a code comment).
  • torch if importable (CPU + per-CUDA-device + cuDNN deterministic + cuDNN benchmark off).

Constraints:

  • Must work when NumPy and/or PyTorch are not installed: wrap imports in try/except.
  • Must be type-annotated, passing mypy --strict.
  • Must have no side effects beyond seeding the RNGs, setting environment variables, and setting the cuDNN flags listed above.

§2.2 log_versions() -> dict[str, str]

Returns a dict mapping library name → version string for python, numpy, torch, scipy. For each library:

  • If importable → use __version__.
  • If not importable → value is "not installed".

Must work without raising even when no optional libs are present.

§3 Tests

tests/test_seeding.py already exists. Re-run it. It should pass once your implementation is correct.

Add at least one property-based test using hypothesis to tests/test_seeding.py:

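import random

from hypothesis import given, strategies as st

# seed_everything comes from your seeding module, imported as in the existing tests.

@given(st.integers(min_value=0, max_value=2**31 - 1))
def test_seed_determinism(seed: int) -> None:
    # Seeding twice with the same value must replay the same stdlib draws.
    seed_everything(seed)
    a = [random.random() for _ in range(5)]
    seed_everything(seed)
    b = [random.random() for _ in range(5)]
    assert a == b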

(Yes, this is given to you — it's the shape of the test you should also write for numpy.random, and, if installed, torch.randn.)

§4 Stop conditions

  • pytest tests/test_seeding.py passes (including any new tests you added).
  • mypy --strict src/utils/seeding.py passes.
  • ruff check src/utils/seeding.py is clean.
  • bandit src/utils/seeding.py is clean.
  • You wrote a paragraph in learners/borja/phase-00/notes/seeding-notes.md explaining:
      • Why PYTHONHASHSEED must be set in the launcher to affect hash(str) across processes.
      • Why np.random.seed doesn't seed np.random.default_rng(...).
      • One source of non-determinism that seed_everything cannot cover (e.g., multi-threaded BLAS reduction order).
  • Diff your implementation against seeding.py.bak. List the differences in learners/borja/phase-00/notes/seeding-notes.md §"Comparison" — for each: which version is better, why.
  • Decide: keep yours, or restore seeding.py.bak? Defend the decision.
  • Delete seeding.py.bak (or git rm it).
  • Commit: lab: phase-00 reimplement seed_everything from scratch.
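
The PYTHONHASHSEED note above can be checked empirically before you write it up. The following is a minimal sketch, not part of the lab's reference code: child_hash and in_process_hash_unchanged are hypothetical helper names. The first starts a fresh interpreter with a chosen PYTHONHASHSEED and reports hash() of a string; the second shows that setting the variable inside an already-running interpreter leaves hash(str) unchanged there.

```python
from __future__ import annotations

import os
import subprocess
import sys


def child_hash(text: str, hash_seed: str | None) -> int:
    """Return hash(text) as computed by a fresh Python process.

    hash_seed is the PYTHONHASHSEED value given to the child; None leaves
    the variable unset, so the child uses randomized string hashing.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if hash_seed is not None:
        env["PYTHONHASHSEED"] = hash_seed
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout)


def in_process_hash_unchanged(text: str) -> bool:
    """Check that setting PYTHONHASHSEED at runtime leaves hash(text) alone."""
    previous = os.environ.get("PYTHONHASHSEED")
    before = hash(text)
    os.environ["PYTHONHASHSEED"] = "0"
    try:
        after = hash(text)
    finally:
        # Restore the environment so the demo has no lasting side effect.
        if previous is None:
            del os.environ["PYTHONHASHSEED"]
        else:
            os.environ["PYTHONHASHSEED"] = previous
    return before == after
```

Two calls to child_hash("lab-03", "0") should agree, two calls with hash_seed=None will almost always disagree, and in_process_hash_unchanged("lab-03") should return True because the hash seed is fixed at interpreter startup.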

§5 What you'll have learned

  • The mechanics of every RNG in a Python+NumPy+PyTorch stack.
  • The difference between "the RNG is seeded" and "the program is deterministic" (multi-threaded BLAS, TF32, etc.).
  • How to design a function that gracefully degrades when optional deps are missing.
  • That comparing your own implementation against a reference is the most useful exercise in this phase — not because yours is necessarily worse, but because the delta is the lesson.

§6 Hints (use sparingly)

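  1. try: import x except ImportError: x = None is the usual pattern, but mypy --strict generally rejects the x = None assignment because x is already typed as a module. A helper that returns ModuleType | None (for example, one wrapping importlib.import_module) avoids that. Either way, you'll need to narrow before use.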
  2. torch.cuda.manual_seed_all (plural) seeds every device; manual_seed (singular) seeds only the current.
  3. torch.backends.cudnn.deterministic = True and torch.backends.cudnn.benchmark = False: set both. The first restricts cuDNN to deterministic algorithms; the second disables the benchmark autotuner, which can select different algorithms from run to run.
  4. numpy.random.default_rng(seed) is the modern API. seed_everything should seed the legacy state for libraries that haven't migrated, but in your own code prefer rng = np.random.default_rng(seed) and pass rng around.
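
Hint 4 can be checked with a short experiment. This is a hedged sketch that works whether or not NumPy is installed; both function names are hypothetical. Each returns None without NumPy. Otherwise, the first reports whether reseeding the legacy global state made two unseeded default_rng() generators agree on their first draw, and the second reports whether two generators built from the same explicit seed agree.

```python
from __future__ import annotations


def legacy_seed_reaches_default_rng(seed: int = 0) -> bool | None:
    """Return True if np.random.seed influenced default_rng(), None without NumPy."""
    try:
        import numpy as np
    except ImportError:
        return None

    # Note: this reseeds NumPy's legacy global state as part of the experiment.
    np.random.seed(seed)
    first = float(np.random.default_rng().random())
    np.random.seed(seed)
    second = float(np.random.default_rng().random())
    return first == second


def seeded_generator_is_reproducible(seed: int = 0) -> bool | None:
    """Return True if default_rng(seed) replays the same draws, None without NumPy."""
    try:
        import numpy as np
    except ImportError:
        return None

    a = np.random.default_rng(seed).random(5)
    b = np.random.default_rng(seed).random(5)
    return bool((a == b).all())
```

With NumPy installed, the first function should return False (an unseeded default_rng() draws fresh entropy from the OS, so a match would be an astronomically unlikely coincidence) and the second should return True. That difference is why passing an explicitly seeded rng around is the more reliable habit in your own code.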

If you reach for solutions/03-seeding-ref.md before completing this, mark dod.lab_attempted_before_solutions: false. Honesty matters here.