Lab 00 — Environment and utilities

Goal: ship src/utils/seeding.py and src/utils/logging.py — two tiny modules every later phase imports.

Estimated time: 60–90 minutes.

Prereqs: Phase 0 environment is set up (uv, just, ruff, mypy, pytest).


What you produce

  1. src/utils/seeding.py: seed_everything(seed: int) -> int. Seeds Python's random module and sets PYTHONHASHSEED. NumPy's global state is deliberately left alone. Returns the seed so callers can log it and pass it to np.random.default_rng.
  2. src/utils/logging.py: get_logger(name: str). Returns a structlog logger configured to emit JSON to stdout, with a phase context field that callers can set (via bind_phase).
  3. tests/test_seeding.py — verifies determinism: two calls to np.random.default_rng(seed_everything(42)).random(5) produce identical arrays.
  4. tests/test_logging.py — verifies the logger emits a JSON line with the expected fields.
  5. experiments/06-environment-check/ with check.py, manifest.json, and README.md — a tiny smoke test that imports both utilities, seeds an RNG, and logs one message.

TODOs

Block A — seed_everything

  • Function signature: def seed_everything(seed: int) -> int. Type-annotated. Docstring.
  • Seed Python's random module with random.seed(seed).
  • Set os.environ["PYTHONHASHSEED"] = str(seed) — note this only affects child processes; document the caveat.
  • Do NOT call np.random.seed(seed) (global state — see theory/01). Instead, return the seed and let callers do rng = np.random.default_rng(seed_everything(42)).
  • Decide and document: should torch be seeded too? Phase 6 doesn't import torch in src/, so defer torch seeding to the Phase 8 tests that need it. Keep seeding.py torch-free.
  • Log a structlog event when called: log.info("seed_set", seed=seed).

Block B — get_logger

  • Function signature: def get_logger(name: str), plus an explicit return annotation (mypy --strict rejects functions without one).
  • Configure structlog once at module import time (idempotent).
  • Processors: timestamp (ISO 8601, UTC), log level, then JSONRenderer last. The message itself needs no processor; structlog already stores it in the event dict under "event".
  • Provide a bind_phase(phase: str) helper that returns a logger pre-bound with a phase field.
  • Test that calling get_logger("foo").info("bar", x=1) emits valid JSON on stdout containing event="bar", x=1, a timestamp, and logger="foo".

Block C — tests

  • tests/test_seeding.py:
      • Test 1: seed_everything(42) returns 42.
      • Test 2: After two calls to seed_everything(42), two np.random.default_rng(42).random(5) calls produce identical arrays.
      • Test 3: After seed_everything(42), random.random() is deterministic.
  • tests/test_logging.py:
      • Test 1: get_logger returns an object with info, warning, error, and debug methods.
      • Test 2: With stdout captured, calling log.info("event_name", key="value") emits JSON containing "event": "event_name" and "key": "value". Use the capsys fixture.
      • Test 3: bind_phase("phase-06").info("foo") includes "phase": "phase-06" in the JSON.
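For the logging tests, a small helper that turns captured stdout into dicts keeps each assertion short. json_lines is a hypothetical name; it could live in the test module or in a conftest.py.

```python
"""Helper for tests/test_logging.py: parse captured JSON-lines output."""

import json
from typing import Any


def json_lines(out: str) -> list[dict[str, Any]]:
    """Parse one JSON object per non-empty line of captured output."""
    # splitlines() also copes with a trailing newline or "\r\n" endings.
    return [json.loads(line) for line in out.splitlines() if line.strip()]
```

A test then reads rows = json_lines(capsys.readouterr().out) and asserts on rows[-1]["event"] and rows[-1]["key"]. Taking the last row keeps the assertion valid even if an earlier call in the same test also logged.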

Block D — smoke test

  • Create experiments/06-environment-check/ with:
      • check.py — imports both utilities, calls seed_everything(42), logs one info message with the seed.
      • manifest.json: {seed, versions, config, hardware} per LYNX_CORTEX.md §5.
      • README.md (1 paragraph) — what this experiment verifies.
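A hypothetical helper for filling in manifest.json. The four top-level keys come from the Block D bullet; what goes inside versions and hardware is an assumption here, so check LYNX_CORTEX.md §5 for the real schema. Only the standard library is used (importlib.metadata, platform).

```python
"""Hypothetical manifest builder for experiments/06-environment-check/."""

import json
import os
import platform
from importlib import metadata
from pathlib import Path
from typing import Any

# Assumption: which packages to record. Adjust to match LYNX_CORTEX.md §5.
PACKAGES = ("numpy", "structlog", "pytest")


def package_versions(names: tuple[str, ...]) -> dict[str, str]:
    """Map each package to its installed version, plus the Python version."""
    versions = {"python": platform.python_version()}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def build_manifest(seed: int, config: dict[str, Any]) -> dict[str, Any]:
    """Assemble the {seed, versions, config, hardware} record."""
    return {
        "seed": seed,
        "versions": package_versions(PACKAGES),
        "config": config,
        "hardware": {
            "machine": platform.machine(),
            "processor": platform.processor(),
            "cpu_count": os.cpu_count(),
        },
    }


def write_manifest(path: Path, manifest: dict[str, Any]) -> None:
    """Write the manifest as stable, diff-friendly JSON."""
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")
```

check.py could call build_manifest(seed_everything(42), config) and log the resulting path with log.info rather than printing it, keeping to the no-print rule.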

Constraints

  • mypy --strict must pass. All functions typed, no implicit Any.
  • ruff must pass. Line length 100 (the repo default — confirm in pyproject.toml).
  • pytest must pass. All tests green.
  • No print. Use log.info even in the smoke test.
  • No np.random.seed. Use np.random.default_rng(seed) only.
  • Idempotency. seed_everything(42); seed_everything(42) must produce the same downstream behavior as a single call. Same for get_logger("foo"); get_logger("foo").

Stop conditions

Done when:

  1. Both utility files exist, both pass mypy --strict and ruff.
  2. Both test files exist, all tests pass.
  3. The smoke experiment runs without error and emits a JSON log line.
  4. git diff shows no print calls anywhere in your new files.
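Checking git diff by eye works, but a parse-based check is harder to fool. This sketch uses the stdlib ast module to list every call to the bare name print; print_call_lines is a hypothetical helper, not part of the lab. If you prefer tooling, ruff's flake8-print rule T201 flags the same calls once the T20 rules are selected in pyproject.toml.

```python
"""List print(...) calls in Python source using the stdlib ast module."""

import ast


def print_call_lines(source: str) -> list[int]:
    """Return sorted line numbers of every call to the bare name ``print``."""
    lines: list[int] = []
    for node in ast.walk(ast.parse(source)):
        # Only plain print(...) counts; obj.print(...) or "print" in a string do not.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"
        ):
            lines.append(node.lineno)
    return sorted(lines)
```

Feed it the text of each new file; an empty list means the file is clean.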

Pitfalls

  • structlog not configured before first use. If you call log.info(...) before structlog.configure(...) runs, you get the default processor chain, not yours. Configure at module-import time of logging.py, and make sure tests import logging.py before exercising loggers.
  • PYTHONHASHSEED set after Python startup. Setting os.environ["PYTHONHASHSEED"] from within Python does not retroactively change the current process's hash seed (it was fixed at interpreter startup). It only affects child processes that inherit the environment, such as those spawned via subprocess. Document this in the docstring; do not pretend it makes the parent process deterministic.
  • Test stdout capture. capsys.readouterr().out returns a string. For JSON lines, you may need to out.strip().split("\n") and json.loads each line.
  • Forgetting __init__.py. src/utils/ should already have an __init__.py from Phase 0. If it doesn't, create an empty one.
  • Circular imports. seeding.py calls get_logger; logging.py doesn't import seeding. Keep it one-directional.
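A quick way to see the PYTHONHASHSEED caveat for yourself is to launch fresh interpreters with the same PYTHONHASHSEED and compare the hash of a string. child_str_hash is a hypothetical helper name. The child writes its result with sys.stdout.write rather than print, in keeping with the no-print rule.

```python
"""Show that PYTHONHASHSEED only takes effect in newly started interpreters."""

import os
import subprocess
import sys


def child_str_hash(text: str, hash_seed: str) -> int:
    """Return hash(text) as computed by a fresh interpreter with PYTHONHASHSEED set."""
    env = {**os.environ, "PYTHONHASHSEED": hash_seed}
    code = f"import sys; sys.stdout.write(str(hash({text!r})))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())
```

Two children started with the same seed agree on the hash, while hash() in the parent keeps whatever seed the parent started with, no matter what you assign to os.environ afterwards.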

When to consult solutions/

After you have:

  1. Committed both utility files.
  2. Both test files green.
  3. The smoke experiment ran and you can paste the JSON output into your README.md.

Then read solutions/00-environment-and-utilities-ref.md (written at phase open) to compare structure choices.


Next lab: lab/01-strides-and-views.md.