Lab 03: Implement seed_everything by hand
Pre-req: read ../theory/01-reproducibility.md. Goal: re-implement src/utils/seeding.py from scratch without looking at the existing file or at solutions/03-seeding-ref.md. The existing implementation is moved aside in §1.
§1 Setup
- Move the existing implementation: git mv src/utils/seeding.py src/utils/seeding.py.bak.
- Create a fresh src/utils/seeding.py with only the docstring header and an empty body.
- Verify that pytest tests/test_seeding.py now fails (this is expected: you haven't implemented anything yet).
Do not look at seeding.py.bak until step 6 of §4 (the diff against your own implementation).
§2 Your task
Implement two functions:
§2.1 seed_everything(seed: int) -> None
Seeds every RNG source described in ../theory/01-reproducibility.md §1.
Cover at minimum:
- PYTHONHASHSEED (via os.environ, with the caveat from theory §1.1).
- random (stdlib).
- numpy.random — both legacy and the question of default_rng (answer this in a code comment).
- torch if importable (CPU + per-CUDA-device + cuDNN deterministic + cuDNN benchmark off).
Constraints:
- Must work when NumPy and/or PyTorch are not installed — wrap imports in try/except.
- Must be type-annotated, passing mypy --strict.
- Must have no side effects beyond seeding the RNGs and setting environment variables.
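The PYTHONHASHSEED item behaves differently from the other RNG sources: the interpreter reads the variable once at startup, so writing it to os.environ does not change string hashing in the running process and only reaches child processes launched afterwards. A minimal standard-library sketch of that behaviour follows; child_hash is a hypothetical helper written for this illustration, not part of the lab code.

```python
import os
import subprocess
import sys


def child_hash(text: str, hash_seed: str) -> int:
    """Return hash(text) as computed by a fresh interpreter started with PYTHONHASHSEED=hash_seed."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Setting the variable inside a running interpreter does not re-seed its str hashing.
before = hash("lab-03")
os.environ["PYTHONHASHSEED"] = "42"
after = hash("lab-03")
same_in_parent = before == after

# Two fresh interpreters launched with the same value agree with each other.
first = child_hash("lab-03", "42")
second = child_hash("lab-03", "42")
same_across_children = first == second
```

This is why setting the variable inside seed_everything is still worth doing (subprocesses and workers inherit it) even though it cannot make hash(str) reproducible in the process that calls it.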
§2.2 log_versions() -> dict[str, str]
Returns a dict mapping library name → version string for python, numpy, torch, scipy. For each library:
- If importable → use __version__.
- If not importable → value is "not installed".
Must work without raising even when no optional libs are present.
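The lookup rule above can be sketched for a single library as follows. This is one possible shape, not the reference solution. The helper name version_or_missing, the "unknown" fallback for modules that lack __version__, and the use of platform.python_version() for the python entry are all assumptions, since the lab text does not specify them.

```python
import importlib
import platform

NOT_INSTALLED = "not installed"


def version_or_missing(module_name: str) -> str:
    """Return the module's __version__, or "not installed" if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return NOT_INSTALLED
    # Assumption: a module without __version__ is reported as "unknown".
    return str(getattr(module, "__version__", "unknown"))


versions = {
    # Assumption: the interpreter's own version comes from the platform module.
    "python": platform.python_version(),
    # A deliberately nonexistent name, to exercise the fallback path.
    "hypothetical_missing_lib": version_or_missing("hypothetical_missing_lib"),
}
```

Returning a plain string in every branch keeps the result compatible with dict[str, str] and means callers never have to handle None.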
§3 Tests
tests/test_seeding.py already exists. Re-run it. It should pass once your implementation is correct.
Add at least one property-based test using hypothesis to tests/test_seeding.py:
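    import random

    from hypothesis import given, strategies as st

    # seed_everything is assumed to be imported already at the top of tests/test_seeding.py.


    @given(st.integers(min_value=0, max_value=2**31 - 1))
    def test_seed_determinism(seed: int) -> None:
        # Seeding twice with the same value must replay the same stream.
        seed_everything(seed)
        a = [random.random() for _ in range(5)]
        seed_everything(seed)
        b = [random.random() for _ in range(5)]
        assert a == b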
(Yes, this is given to you — it's the shape of the test you should also write for numpy.random, and, if installed, torch.randn.)
§4 Stop conditions
- pytest tests/test_seeding.py passes (including any new tests you added).
- mypy --strict src/utils/seeding.py passes.
- ruff check src/utils/seeding.py is clean.
- bandit src/utils/seeding.py is clean.
- You wrote a paragraph in learners/borja/phase-00/notes/seeding-notes.md explaining:
  - Why PYTHONHASHSEED must be set in the launcher to affect hash(str) across processes.
  - Why np.random.seed doesn't seed np.random.default_rng(...).
  - One source of non-determinism that seed_everything cannot cover (e.g., multi-threaded BLAS reduction order).
- Diff your implementation against seeding.py.bak. List the differences in learners/borja/phase-00/notes/seeding-notes.md §"Comparison"; for each, say which version is better and why.
- Decide: keep yours, or restore seeding.py.bak? Defend the decision.
- Delete seeding.py.bak (or git rm it).
- Commit: lab: phase-00 reimplement seed_everything from scratch.
§5 What you'll have learned
- The mechanics of every RNG in a Python+NumPy+PyTorch stack.
- The difference between "the RNG is seeded" and "the program is deterministic" (multi-threaded BLAS, TF32, etc.).
- How to design a function that gracefully degrades when optional deps are missing.
- That comparing your own implementation against a reference is the most useful exercise in this phase — not because yours is necessarily worse, but because the delta is the lesson.
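The gap between "seeded" and "deterministic" can be seen without a GPU or a BLAS library. Floating-point addition is not associative, so a reduction that adds the same numbers in a different order (as a multi-threaded reduction may do from run to run) can return a different result even though no RNG is involved. A small standard-library illustration follows; sum_in_order is a hypothetical helper written for this example.

```python
from collections.abc import Iterable


def sum_in_order(values: Iterable[float]) -> float:
    """Add values strictly left to right, with no error compensation."""
    total = 0.0
    for value in values:
        total += value
    return total


values = [0.1, 0.2, 0.3]
forward = sum_in_order(values)             # (0.1 + 0.2) + 0.3
backward = sum_in_order(reversed(values))  # (0.3 + 0.2) + 0.1

print(forward == backward)              # False: the two orders round differently
print(abs(forward - backward) < 1e-12)  # True: the gap is tiny, but not zero
```

A difference this small is invisible in a single sum, but it can compound over many training steps, which is why seeding alone does not guarantee bit-identical runs.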
§6 Hints (use sparingly)
- try: import x except ImportError: x = None lets mypy --strict infer x: ModuleType | None. You'll need to narrow before use.
- torch.cuda.manual_seed_all (plural) seeds every device; manual_seed (singular) seeds only the current device.
- Set both cudnn.deterministic = True and cudnn.benchmark = False. The first picks deterministic algorithms; the second prevents algorithm-shape autotuning that varies between runs.
- numpy.random.default_rng(seed) is the modern API. seed_everything should seed the legacy state for libraries that haven't migrated, but in your own code prefer rng = np.random.default_rng(seed) and pass rng around.
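Narrowing a value of type ModuleType | None, as the first hint asks, can look like the sketch below. It uses a small loader function so that the optional type is spelled out in a signature. The names load_optional and describe are hypothetical, and the example sticks to standard-library modules so it runs without NumPy or PyTorch.

```python
from __future__ import annotations

import importlib
from types import ModuleType


def load_optional(name: str) -> ModuleType | None:
    """Import a module by name, returning None instead of raising if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def describe(name: str) -> str:
    module = load_optional(name)
    if module is None:
        # Early return: below this line the type checker treats module as ModuleType.
        return f"{name}: skipped"
    return f"{name}: available as {module.__name__}"


print(describe("math"))
print(describe("hypothetical_missing_lib"))
```

The same early-return shape applies inside seed_everything: check for None once, then use the module freely in the rest of the branch.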
If you reach for solutions/03-seeding-ref.md before completing this, mark dod.lab_attempted_before_solutions: false. Honesty matters here.