Phase 01 — Hardware & Computing Substrate
Requires: 00 — Project Foundations & Learning Methodology
Teaches: memory-hierarchy · roofline · arithmetic-intensity · cache · latency · bandwidth
Jump to any chapter from the phase reference index.
Chapter map
Pre-written per A12. This phase entry exists before Borja begins study. Theory and lab problem statements are stable drafts; solutions are written just-in-time at phase open.
Before the neural networks, the silicon. Here we build the mental model that explains why naive matmul is slow even on a modern CPU: it is not the multiplications, it is the memory.
Goal
A mechanical understanding of the machine Borja is running on — enough that "this kernel is bandwidth-bound" is a statement Borja can prove with measurements on his own laptop, not a phrase he repeats from textbooks.
Read order
1. theory/00-motivation.md — why this phase exists at all.
2. theory/01-from-transistor-to-cpu.md — bottom-up: transistor → gate → ALU → pipeline → branch prediction.
3. theory/02-memory-hierarchy.md — caches, DRAM, NUMA, PCIe, SSD; latency vs bandwidth.
4. theory/03-roofline-model.md — the unified visual model that ties compute and memory together. The most important theory page in this phase.
5. lab/00-machine-profile.md — collect ground-truth specs of your machine (lscpu, lstopo, dmidecode, etc.). One-shot.
6. lab/01-memcpy-bandwidth.md — measure RAM bandwidth empirically.
7. lab/02-cache-walks.md — see the cache hierarchy via timing.
8. lab/03-roofline-plot.md — final integration: plot your machine's roofline and place naive matmul on it.
solutions/ is empty during pre-write — populated at phase open after Borja's prior-phase API decisions are visible.
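As a rough illustration of what "measure RAM bandwidth empirically" can look like, here is a minimal sketch in pure Python. It is not the lab's solution: the function name `copy_bandwidth_gbs` and its default sizes are assumptions made for this example, and the figure it returns is only approximate, since it includes interpreter and OS overhead and uses a single thread.

```python
import time


def copy_bandwidth_gbs(n_bytes: int = 64 * 1024 * 1024, repeats: int = 5) -> float:
    """Estimate copy bandwidth in GB/s from the fastest of several bulk copies.

    Hypothetical helper for illustration; n_bytes should be well above the
    last-level cache size so the copy actually streams through RAM.
    """
    src = bytearray(n_bytes)  # zero-filled source buffer
    dst = bytearray(n_bytes)  # pre-allocated destination buffer

    dst[:] = src  # warm-up copy so page faults do not pollute the timing

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # same-size slice assignment: a bulk byte copy in CPython
        best = min(best, time.perf_counter() - start)

    # Each copy reads n_bytes and writes n_bytes, so count 2 * n_bytes of traffic.
    return 2 * n_bytes / best / 1e9


if __name__ == "__main__":
    print(f"approx. copy bandwidth: {copy_bandwidth_gbs():.1f} GB/s")
```

Taking the fastest run rather than the mean is a deliberate choice here: slower runs mostly reflect interference from other processes, not the memory system itself.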
Definition of Done
See PHASE_01_PLAN.md §6. Briefly:
- Roofline plot of your machine in experiments/01-roofline/.
- Each experiment has a manifest.json (per LYNX_CORTEX.md §5).
- You can argue, with measurements, why naive fp32 matmul on an N×N matrix is bandwidth-bound for some N and compute-bound for others.
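The bandwidth-bound versus compute-bound argument rests on one comparison: a kernel's arithmetic intensity (FLOPs per byte moved) against the machine's ridge point (peak FLOP/s divided by memory bandwidth). The sketch below works that arithmetic for fp32 matmul under two idealized traffic assumptions. The function names are invented for this example, and `PEAK_GFLOPS` and `BANDWIDTH_GBS` are placeholder values, not measurements of any real machine. Cache-line granularity is ignored, so a real naive kernel can do worse than either estimate.

```python
def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def matmul_intensity_best_case(n: int, bytes_per_elem: int = 4) -> float:
    """FLOP/byte if A, B and C each cross the memory bus exactly once."""
    flops = 2 * n**3                      # one multiply and one add per inner step
    traffic = 3 * n * n * bytes_per_elem  # read A, read B, write C
    return flops / traffic                # n / 6 for fp32


def matmul_intensity_no_reuse(bytes_per_elem: int = 4) -> float:
    """FLOP/byte if both operands of every inner step come from memory."""
    return 2 / (2 * bytes_per_elem)       # 2 FLOPs per 8 bytes: 0.25 for fp32


# Placeholder machine numbers (assumptions for illustration only).
PEAK_GFLOPS = 100.0
BANDWIDTH_GBS = 20.0
RIDGE = PEAK_GFLOPS / BANDWIDTH_GBS  # intensity where the two roofs meet

if __name__ == "__main__":
    for n in (12, 60):
        ai = matmul_intensity_best_case(n)
        bound = attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
        regime = "bandwidth-bound" if ai < RIDGE else "compute-bound"
        print(f"N={n}: {ai:.2f} FLOP/byte, ceiling {bound:.0f} GFLOP/s, {regime}")
```

With these placeholder numbers the ridge point is 5 FLOP/byte. In the best-case model, N = 12 gives 2 FLOP/byte and a 40 GFLOP/s ceiling set by memory, while N = 60 gives 10 FLOP/byte and reaches the 100 GFLOP/s compute roof. The no-reuse estimate stays at 0.25 FLOP/byte for every N. Which estimate a real naive kernel tracks at a given N depends on whether its working set fits in cache, which is why the argument needs measurements.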
What this phase intentionally does NOT cover
- GPUs. Deferred to Phase 23.
- Distributed memory. Phase 35.
- Numerical accuracy of floating-point operations. Phase 2.
- SIMD instructions in detail. Touched here, deepened in Phase 24 (Triton/CUDA).
Phase 1's scope is the single-CPU memory + compute pipeline. Nothing more.
Further reading
Optional — enrichment, not required to pass the phase.
- 📄 Roofline: An Insightful Visual Performance Model — Williams, Waterman, Patterson · 2009. The model you measure your machine against.
- 📕 Computer Architecture: A Quantitative Approach — Hennessy & Patterson · 2017. The canonical memory-hierarchy and pipelining reference.