
Phase 01 — Hardware & Computing Substrate

Requires: 00 — Project Foundations & Learning Methodology
Teaches: memory-hierarchy · roofline · arithmetic-intensity · cache · latency · bandwidth
Jump to any chapter from the phase reference index.

Chapter map

Pre-written per A12: this phase entry exists before Borja begins studying it. Theory and lab problem statements are stable drafts; solutions are written just-in-time, when the phase opens.

Before neural networks, silicon. Here we build the mental model that explains why naive matmul is slow even on a modern CPU: it is not the multiplications, it is the memory.

Goal

A mechanical understanding of the machine Borja is running on, deep enough that "this kernel is bandwidth-bound" is a claim he can prove with measurements on his own laptop rather than a phrase repeated from textbooks.

Read order

theory/00-motivation.md — why this phase exists at all.
theory/01-from-transistor-to-cpu.md — bottom-up: transistor → gate → ALU → pipeline → branch prediction.
theory/02-memory-hierarchy.md — caches, DRAM, NUMA, PCIe, SSD; latency vs bandwidth.
theory/03-roofline-model.md — the unified visual model that ties compute and memory together. The most important theory page in this phase.
lab/00-machine-profile.md — collect ground-truth specs of your machine (lscpu, lstopo, dmidecode, etc.). One-shot.
lab/01-memcpy-bandwidth.md — measure RAM bandwidth empirically.
lab/02-cache-walks.md — see the cache hierarchy via timing.
lab/03-roofline-plot.md — final integration: plot your machine's roofline and place naive matmul on it.
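The roofline model at the center of this phase reduces to one formula: attainable performance is the lower of the compute peak and arithmetic intensity (FLOP/byte) times memory bandwidth. A minimal Python sketch, assuming placeholder machine numbers: `PEAK_GFLOPS` and `BW_GBPS` are hypothetical values, not measurements of any real CPU, and the function names are illustrative only.

```python
# Hypothetical machine numbers: replace with your own measured values.
PEAK_GFLOPS = 200.0   # assumed fp32 compute peak, GFLOP/s
BW_GBPS = 25.0        # assumed sustained DRAM bandwidth, GB/s


def attainable_gflops(ai: float, peak: float = PEAK_GFLOPS, bw: float = BW_GBPS) -> float:
    """Roofline: performance is capped by the compute roof or by ai * bandwidth.

    ai is arithmetic intensity in FLOP/byte; (GB/s) * (FLOP/byte) = GFLOP/s.
    """
    return min(peak, ai * bw)


def ridge_point(peak: float = PEAK_GFLOPS, bw: float = BW_GBPS) -> float:
    """Arithmetic intensity (FLOP/byte) at which the sloped and flat roofs meet."""
    return peak / bw


def regime(ai: float, peak: float = PEAK_GFLOPS, bw: float = BW_GBPS) -> str:
    """Classify a kernel by which side of the ridge point its intensity falls on."""
    return "compute-bound" if ai >= ridge_point(peak, bw) else "bandwidth-bound"


# fp32 vector add c[i] = a[i] + b[i]: 1 FLOP per 12 bytes
# (two 4-byte loads, one 4-byte store; write-allocate traffic ignored).
vadd_ai = 1 / 12
print(f"ridge point: {ridge_point():.1f} FLOP/byte")
print(f"vector add:  {attainable_gflops(vadd_ai):.2f} GFLOP/s ({regime(vadd_ai)})")
```

With these placeholder numbers the ridge sits at 8 FLOP/byte, so a vector add can reach only about 2 GFLOP/s regardless of how fast the ALUs are. Plotting `attainable_gflops` over a log-scaled range of intensities draws the two roofs.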

solutions/ stays empty during the pre-write; it is populated at phase open, once Borja's API decisions from prior phases are visible.

Definition of Done

See PHASE_01_PLAN.md §6. Briefly:

Roofline plot of your machine in experiments/01-roofline/.
Each experiment has a manifest.json (per LYNX_CORTEX.md §5).
You can argue, with measurements, why naive fp32 matmul of two N×N matrices is bandwidth-bound for some N and compute-bound for others.
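The N-dependence in the last item comes from arithmetic intensity: an N×N fp32 matmul performs 2N³ FLOPs, but the bytes it moves depend on cache reuse. A crude Python sketch, assuming two extreme traffic models (every byte moved exactly once vs. no reuse at all in the i-j-k loop), a hypothetical 8 MiB last-level cache, and a hypothetical ridge point of 8 FLOP/byte (for example 200 GFLOP/s peak over 25 GB/s). None of these numbers are measurements, and the helper names are illustrative.

```python
FP32_BYTES = 4
LLC_BYTES = 8 * 1024 * 1024   # hypothetical last-level cache size
RIDGE = 200.0 / 25.0          # hypothetical peak GFLOP/s / bandwidth GB/s = 8 FLOP/byte


def matmul_flops(n: int) -> int:
    # n*n outputs, each a length-n dot product: n multiplies + n adds.
    return 2 * n ** 3


def ai_compulsory(n: int) -> float:
    # Best case: A and B read once, C written once (12 * n^2 bytes in total).
    return matmul_flops(n) / (3 * n * n * FP32_BYTES)   # simplifies to n / 6


def ai_no_reuse() -> float:
    # Worst case: every multiply-add reloads A[i][k] and B[k][j] from memory,
    # i.e. 2 FLOPs per 8 bytes.
    return 2 / (2 * FP32_BYTES)                         # = 0.25


def naive_matmul_regime(n: int) -> str:
    # Crude model: full reuse while all three matrices fit in the LLC, none otherwise.
    fits = 3 * n * n * FP32_BYTES <= LLC_BYTES
    ai = ai_compulsory(n) if fits else ai_no_reuse()
    return "compute-bound" if ai >= RIDGE else "bandwidth-bound"


for n in (16, 64, 512, 2048):
    print(n, naive_matmul_regime(n))
```

Under these assumptions the model calls N=16 bandwidth-bound (intensity about 2.7), N=64 and N=512 compute-bound, and N=2048 bandwidth-bound again once the 48 MiB working set spills out of the cache. The exact crossover points are artifacts of the placeholder numbers; the lab measurements are what turn this into an argument about a real machine.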

What this phase intentionally does NOT cover

GPUs. Deferred to Phase 23.
Distributed memory. Phase 35.
Numerical accuracy of the floating-point operations measured here. Phase 2.
SIMD instructions in detail. Touched here, deepened in Phase 24 (Triton/CUDA).

Phase 1's scope is the single-CPU memory + compute pipeline. Nothing more.