

Phase 06 — Python for AI Engineering

Requires: Phase 05 (Probability & Information Theory)

Teaches: numpy · strides · broadcasting · views · vectorization · profiling


Pre-written per LYNX_CORTEX_ADDENDUM.md §A12. This phase entry exists before Borja begins study. Theory and lab problem statements are stable drafts; solutions are written just-in-time at phase open.

The Python+NumPy layer is where 80% of future bugs live. This is not a Python refresher; it is numerical engineering: strides, broadcasting, vectorization, profiling, seeding. Once this becomes reflex, the code in the rest of the curriculum will be clean.


Goal

Internalize Python's reference semantics and NumPy's memory model deeply enough that Borja can predict, by hand, the shape, dtype, and rough wall-clock cost of any tensor expression he will write in Phases 7–22. This phase is the engineering substrate for everything numeric that follows: light on math, heavy on measurement and surprise.

By phase close, Borja owns:

  • A seed_everything(seed) utility that every later experiment uses.
  • A get_logger(name) utility that emits structured JSON logs.
  • Three small experiments that demonstrate (not merely state) the three traps every AI engineer eventually hits: shared views, broadcasting shape bugs, and Python-loop overhead.
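One possible shape for the two utilities, as a minimal sketch rather than the lab's required API. It assumes only Python's `random` module and NumPy need seeding at this phase (any framework added later would need its own seeding call). The `JsonFormatter` class, the `fields` convention for structured extras, and returning an `np.random.Generator` are illustrative choices, not something the phase plan prescribes.

```python
import json
import logging
import random
import sys
from datetime import datetime, timezone

import numpy as np


def seed_everything(seed: int) -> np.random.Generator:
    """Seed Python's and NumPy's global RNGs, and return a fresh seeded Generator."""
    random.seed(seed)
    np.random.seed(seed)  # legacy global state, for code that still calls np.random.*
    return np.random.default_rng(seed)  # preferred: pass this Generator explicitly


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one per line)."""

    def format(self, record: logging.LogRecord) -> str:
        payload: dict[str, object] = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Hypothetical convention: structured fields arrive via extra={"fields": {...}}.
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            payload.update(fields)
        return json.dumps(payload)


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a named logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # repeated calls must not stack duplicate handlers
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # keep records from also reaching the root logger
    return logger


if __name__ == "__main__":
    rng = seed_everything(0)
    first = rng.normal(size=3)
    assert np.array_equal(first, seed_everything(0).normal(size=3))
    get_logger("phase06").info("run started", extra={"fields": {"seed": 0}})
```

Returning a seeded `Generator` lets new code thread RNG state through function arguments instead of depending on the legacy global `np.random` state, which is easier to reason about in experiments.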

Read order

  1. theory/00-motivation.md — why this phase is non-negotiable.
  2. theory/01-references-mutation-gil.md — Python's reference semantics, GIL myths, and how they bite numeric code.
  3. theory/02-strides-and-broadcasting.md — the NumPy memory model formalized: shape, strides, views vs copies, broadcasting rules.
  4. theory/03-vectorization-and-profiling.md — the cost model of Python+NumPy, with the four profilers (cProfile, line_profiler, memory_profiler, py-spy) and when each one earns its keep.
  5. lab/00-environment-and-utilities.md — write seed_everything and get_logger.
  6. lab/01-strides-and-views.md — produce intentional view-aliasing bugs; measure transpose vs contiguous-transpose cost.
  7. lab/02-broadcasting-trap.md — reproduce the (N,) * (N,1) → (N,N) bug and fix it.
  8. lab/03-vectorization-budget.md — measure Python-loop vs NumPy speedup ratio across sizes.
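The three labs can be previewed in a few lines. This is a hedged sketch, not the lab solutions: array sizes, the hypothetical `dot_loop` helper, and the `timeit` settings are illustrative assumptions, and the measured speedup varies by machine, so no figure is claimed here.

```python
import timeit

import numpy as np

# Lab 01: strides describe the memory layout; views share the same buffer.
a = np.arange(12, dtype=np.float64).reshape(3, 4)
print(a.strides)                            # (32, 8): 8-byte floats, 4 per row
t = a.T                                     # transpose is a view: strides swap, no data moves
print(t.strides, t.flags["C_CONTIGUOUS"])   # (8, 32) False
row = a[0]                                  # basic slicing also returns a view...
row[:] = -1                                 # ...so this write lands in `a`
assert np.shares_memory(row, a) and a[0, 0] == -1
tc = np.ascontiguousarray(t)                # explicit copy into C order
assert tc.flags["C_CONTIGUOUS"] and not np.shares_memory(tc, a)

# Lab 02: the (N,) * (N,1) -> (N,N) broadcasting trap.
n = 5
x = np.arange(n, dtype=np.float64)          # shape (5,)
w = np.ones((n, 1))                         # shape (5, 1), e.g. a column vector
bad = x * w                                 # silently broadcasts to (5, 5)
good = x * w[:, 0]                          # flatten first: shape (5,)
print(bad.shape, good.shape)                # (5, 5) (5,)


# Lab 03: Python-loop vs vectorized cost for the same reduction.
def dot_loop(u: np.ndarray, v: np.ndarray) -> float:
    total = 0.0
    for i in range(len(u)):                 # one interpreter round-trip per element
        total += u[i] * v[i]
    return float(total)


u = np.random.default_rng(0).standard_normal(100_000)
t_loop = min(timeit.repeat(lambda: dot_loop(u, u), number=1, repeat=3))
t_vec = min(timeit.repeat(lambda: np.dot(u, u), number=1, repeat=3))
print(f"speedup: about {t_loop / t_vec:.0f}x")
```

Taking the minimum over repeats is a common way to reduce noise from other processes; the lab's sweep across sizes would wrap the last block in a loop over `n`.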

solutions/ is empty during pre-write — populated at phase open after Borja's prior-phase API decisions are visible.

Definition of Done

See PHASE_06_PLAN.md §6. Briefly:

  • src/utils/seeding.py and src/utils/logging.py exist, pass mypy --strict, and have tests.
  • Three experiment directories with manifest.json and the plot/notes called out in the lab.
  • /quiz 06 ≥ 70%: predict broadcast shapes, explain OWNDATA, identify when np.asarray copies.
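The quiz topics reduce to small, checkable facts. A minimal sketch, assuming NumPy's default behavior where `np.array` copies unless told otherwise and `np.asarray` copies only when a conversion is needed; variable names are illustrative.

```python
import numpy as np

base = np.arange(6, dtype=np.float64)
view = base.reshape(2, 3)                  # reshaping a contiguous array gives a view
print(base.flags["OWNDATA"], view.flags["OWNDATA"])   # True False
assert view.base is base                   # the view points back at its owner

# np.asarray hands back the very same object when no conversion is needed...
assert np.asarray(base) is base
assert np.asarray(base, dtype=np.float64) is base

# ...but copies when the dtype changes, the memory order must change,
# or the input is not an ndarray at all.
as_f32 = np.asarray(base, dtype=np.float32)
assert not np.shares_memory(as_f32, base)
as_fortran = np.asarray(view, order="F")   # C-ordered (2, 3) -> F order needs a copy
assert not np.shares_memory(as_fortran, base)
from_list = np.asarray([1.0, 2.0, 3.0])    # a Python list always yields a fresh buffer
print(from_list.flags["OWNDATA"])          # True

# np.array copies by default, even when given an ndarray.
assert not np.shares_memory(np.array(base), base)
```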

What this phase intentionally does NOT cover

  • Autograd. Phase 7 builds it from scratch on top of this engineering substrate.
  • Tensors-as-graph-nodes. Phase 7/8 territory.
  • Pandas, polars, scikit-learn, scipy. None of these are imported in the curriculum's core path; if a corner case needs them, they enter at the relevant phase.
  • C extensions, Cython, Numba. Not in scope. NumPy is enough. (Triton in Phase 24, much later.)
  • Asyncio. Not relevant until Phase 33's serving stack.
  • Type checking deep dive. mypy --strict is on; explaining TypeVar and Protocol is for when we actually need them (probably Phase 17+ with config schemas).

Phase 6's scope is the memory-and-cost model of Python+NumPy, plus the two cross-cutting utilities (seeding, logging) that every later phase imports. Nothing more.

Further reading

Optional — enrichment, not required to pass the phase.