Phase 06 — Python for AI Engineering
Requires: 05 — Probability & Information Theory
Teaches: numpy · strides · broadcasting · views · vectorization · profiling
Jump to any chapter from the phase reference index.
Chapter map
Pre-written per `LYNX_CORTEX_ADDENDUM.md` §A12. This phase entry exists before Borja begins study. Theory and lab problem statements are stable drafts; solutions are written just-in-time at phase open.
The Python+NumPy layer is where 80% of future bugs live. This is not a Python refresher; it is numerical engineering: strides, broadcasting, vectorization, profiling, seeding. Once this is reflex, the code in the rest of the curriculum will be clean.
Goal
Internalize Python's reference semantics and NumPy's memory model deeply enough that Borja can predict, by hand, the shape, dtype, and rough wall-clock time of any tensor expression he will write in Phases 7–22. This phase is the engineering substrate for everything numeric that follows. It is light on math, heavy on measurement and surprise.
By phase close, Borja owns:
- A `seed_everything(seed)` utility that every later experiment uses.
- A `get_logger(name)` utility that emits structured JSON logs.
- Three small experiments that demonstrate (not merely state) the three traps every AI engineer eventually hits: shared views, broadcasting shape bugs, Python-loop overhead.
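As a rough picture of what the two utilities might look like, here is a minimal sketch. Everything beyond the names `seed_everything(seed)` and `get_logger(name)` is an assumption made for illustration: the returned `Generator`, the optional `stream` parameter, the `JsonFormatter` class, and the JSON field names are hypothetical, and the lab may settle on a different shape.

```python
import json
import logging
import random
import sys
from typing import Optional, TextIO

import numpy as np


def seed_everything(seed: int) -> np.random.Generator:
    """Seed the stdlib and NumPy global RNGs and return a fresh Generator.

    Returning a Generator is an assumption of this sketch, not a stated
    requirement of the phase.
    """
    random.seed(seed)
    np.random.seed(seed)  # legacy global RNG, still used by older code paths
    return np.random.default_rng(seed)


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: one JSON object per log line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def get_logger(name: str, stream: Optional[TextIO] = None) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (default: stderr)."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep the root logger from double-printing
    return logger


if __name__ == "__main__":
    rng = seed_everything(1234)
    log = get_logger("phase06.demo")
    log.info("first draw: %s", rng.random())
```

Calling `seed_everything` twice with the same seed should give identical draws from the returned generator, which is the property later experiments would rely on.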
Read order
1. `theory/00-motivation.md`: why this phase is non-negotiable.
2. `theory/01-references-mutation-gil.md`: Python's reference semantics, GIL myths, and how they bite numeric code.
3. `theory/02-strides-and-broadcasting.md`: the NumPy memory model formalized: shape, strides, views vs copies, broadcasting rules.
4. `theory/03-vectorization-and-profiling.md`: the cost model of Python+NumPy, with the four profilers (`cProfile`, `line_profiler`, `memory_profiler`, `py-spy`) and when each one earns its keep.
5. `lab/00-environment-and-utilities.md`: write `seed_everything` and `get_logger`.
6. `lab/01-strides-and-views.md`: produce intentional view-aliasing bugs; measure transpose vs contiguous-transpose cost.
7. `lab/02-broadcasting-trap.md`: reproduce the `(N,) * (N,1) → (N,N)` bug and fix it.
8. `lab/03-vectorization-budget.md`: measure Python-loop vs NumPy speedup ratio across sizes.
`solutions/` is empty during pre-write; it is populated at phase open, after Borja's prior-phase API decisions are visible.
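To make the view-aliasing and broadcasting labs concrete ahead of time, the sketch below reproduces both traps on tiny arrays. It is an illustration rather than a lab solution; the variable names (`base`, `window`, `pred`, `target`) are invented for the example, and the stride values assume 8-byte `float64` elements.

```python
import numpy as np

# Trap 1: basic slicing returns a view, so writing through it mutates the base.
base = np.arange(6)
window = base[1:4]            # view: shares base's buffer
window[:] = 0                 # silently overwrites base[1:4]
assert np.shares_memory(base, window)

# Fancy (integer-array) indexing returns a copy, so this write does not leak.
picked = base[[4, 5]]
picked[:] = -1
assert not np.shares_memory(base, picked)

# Transpose is also a view: same buffer, strides swapped, no longer C-contiguous.
m = np.zeros((3, 4), dtype=np.float64)
assert m.strides == (32, 8)
assert m.T.strides == (8, 32)
assert not m.T.flags["C_CONTIGUOUS"]
mt = np.ascontiguousarray(m.T)   # real copy with fresh row-major strides
assert mt.strides == (24, 8)

# Trap 2: (N,) * (N, 1) broadcasts to (N, N) instead of failing.
N = 4
pred = np.ones(N)              # shape (N,)
target = np.ones((N, 1))       # shape (N, 1), e.g. a column read from a file
bad = pred * target            # shapes align as (1, N) and (N, 1)
assert bad.shape == (N, N)

# One possible fix: make both operands the same rank before combining them.
good = pred * target[:, 0]
assert good.shape == (N,)
```

The broadcasting case is the more dangerous of the two because nothing raises: a later `.mean()` or `.sum()` over `bad` still returns a scalar, only computed over N² values instead of N.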
Definition of Done
See `PHASE_06_PLAN.md` §6. Briefly:
- `src/utils/seeding.py` and `src/utils/logging.py` exist, pass `mypy --strict`, and have tests.
- Three experiment directories with `manifest.json` and the plot/notes called out in the lab.
- `/quiz 06` ≥ 70%: predict broadcast shapes, explain `OWNDATA`, identify when `np.asarray` copies.
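The three quiz skills can be rehearsed with a few assertions. The sketch below is not taken from the quiz itself; it only shows the kind of prediction involved, and it assumes NumPy 1.20 or newer for `np.broadcast_shapes`.

```python
import numpy as np

# Predicting broadcast shapes: align from the right, size-1 axes stretch.
assert np.broadcast_shapes((8, 1, 6), (7, 1)) == (8, 7, 6)

# OWNDATA: True when the array owns its buffer, False for views onto another.
a = np.arange(12, dtype=np.int64)
assert a.flags["OWNDATA"]

v = a.reshape(3, 4)             # contiguous reshape is a view
assert not v.flags["OWNDATA"]
assert v.base is a              # the view remembers whose memory it borrows

c = v.copy()
assert c.flags["OWNDATA"] and c.base is None

# np.asarray passes an ndarray through untouched when no conversion is needed...
same = np.asarray(a)
assert same is a

# ...but has to allocate when the dtype changes,
as_float = np.asarray(a, dtype=np.float64)
assert not np.shares_memory(as_float, a)

# ...or when the input is not an ndarray at all (here, a Python list).
from_list = np.asarray([1, 2, 3])
assert from_list.flags["OWNDATA"]
```

A useful habit when in doubt is to check `np.shares_memory(x, y)` rather than reason from the function name, since whether a call copies depends on the input's type, dtype, and layout.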
What this phase intentionally does NOT cover
- Autograd. Phase 7 builds it from scratch on top of this engineering substrate.
- Tensors-as-graph-nodes. Phase 7/8 territory.
- Pandas, polars, scikit-learn, scipy. None of these are imported in the curriculum's core path; if a corner case needs them, they enter at the relevant phase.
- C extensions, Cython, Numba. Not in scope. NumPy is enough. (Triton in Phase 24, much later.)
- Asyncio. Not relevant until Phase 33's serving stack.
- Type checking deep dive. `mypy --strict` is on; explaining `TypeVar` and `Protocol` is for when we actually need them (probably Phase 17+ with config schemas).
Phase 6's scope is the memory-and-cost model of Python+NumPy, plus the two cross-cutting utilities (seeding, logging) that every later phase imports. Nothing more.
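The cost half of that model is easiest to believe once measured. Below is a minimal timing sketch in the spirit of the vectorization lab: it compares a Python loop against a single NumPy call for the same sum of squares. The function names, the sizes, and the choice of `timeit` are assumptions for illustration; the lab may use a different workload or profiler, and the ratios depend entirely on the machine.

```python
import timeit

import numpy as np


def loop_sum_of_squares(xs: np.ndarray) -> float:
    """Pure-Python loop: one interpreter round trip per element."""
    total = 0.0
    for x in xs:
        total += x * x
    return float(total)


def numpy_sum_of_squares(xs: np.ndarray) -> float:
    """Vectorized: one call, the loop runs inside compiled code."""
    return float(np.dot(xs, xs))


def speedup(n: int, repeats: int = 5, number: int = 20) -> float:
    """Return loop time divided by NumPy time for an array of length n."""
    xs = np.random.default_rng(0).random(n)
    # Take the minimum over repeats to reduce noise from other processes.
    t_loop = min(timeit.repeat(lambda: loop_sum_of_squares(xs),
                               number=number, repeat=repeats))
    t_np = min(timeit.repeat(lambda: numpy_sum_of_squares(xs),
                             number=number, repeat=repeats))
    return t_loop / t_np


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(f"n={n:>7}  loop/numpy ratio: {speedup(n):.1f}")
```

Running it across several sizes matters more than any single number: for very small arrays, NumPy's fixed per-call overhead can eat most of the advantage, which is the kind of surprise the phase is designed to surface.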
Further reading
Optional — enrichment, not required to pass the phase.
- 📄 Array Programming with NumPy (Harris et al., 2020): the design of the array you live in for 30 phases.