

Phase 01 — Hardware & Computing Substrate

Requires: 00 — Project Foundations & Learning Methodology

Teaches: memory-hierarchy · roofline · arithmetic-intensity · cache · latency · bandwidth

Jump to any chapter from the phase reference index.


Pre-written per A12. This phase entry exists before Borja begins study. Theory and lab problem statements are stable drafts; solutions are written just-in-time at phase open.

Before neural networks, silicon. Here we build the mental model that explains why naive matmul is slow even on a modern CPU: it is not the multiplications, it is the memory.


Goal

A mechanical understanding of the machine Borja is running on — enough that "this kernel is bandwidth-bound" is a statement Borja can prove with measurements on his own laptop, not a phrase he repeats from textbooks.

Read order

  1. theory/00-motivation.md — why this phase exists at all.
  2. theory/01-from-transistor-to-cpu.md — bottom-up: transistor → gate → ALU → pipeline → branch prediction.
  3. theory/02-memory-hierarchy.md — caches, DRAM, NUMA, PCIe, SSD; latency vs bandwidth.
  4. theory/03-roofline-model.md — the unified visual model that ties compute and memory together. The most important theory page in this phase.
  5. lab/00-machine-profile.md — collect ground-truth specs of your machine (lscpu, lstopo, dmidecode, etc.). One-shot.
  6. lab/01-memcpy-bandwidth.md — measure RAM bandwidth empirically.
  7. lab/02-cache-walks.md — see the cache hierarchy via timing.
  8. lab/03-roofline-plot.md — final integration: plot your machine's roofline and place naive matmul on it.
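
As a rough preview of the kind of measurement lab/01-memcpy-bandwidth.md asks for, the sketch below times a bulk buffer copy using only the Python standard library. It is not the lab's reference solution: the function name `measure_copy_bandwidth`, the 256 MiB default buffer, and the best-of-five policy are all assumptions chosen for illustration. The buffer should be much larger than your last-level cache, otherwise the number reflects cache bandwidth rather than RAM bandwidth.

```python
import time


def measure_copy_bandwidth(size_bytes: int = 256 * 1024 * 1024, repeats: int = 5) -> float:
    """Return the best observed copy bandwidth in GB/s (hypothetical helper)."""
    # Fill the source with non-zero bytes so every page is really allocated;
    # untouched zero pages could be served without going to DRAM.
    src = bytearray(b"\x01") * size_bytes
    dst = bytearray(size_bytes)
    src_view = memoryview(src)

    best_seconds = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src_view  # one bulk copy, no Python-level loop
        elapsed = time.perf_counter() - start
        best_seconds = min(best_seconds, elapsed)

    # Count both the bytes read from src and the bytes written to dst.
    bytes_moved = 2 * size_bytes
    return bytes_moved / max(best_seconds, 1e-12) / 1e9
```

Calling `measure_copy_bandwidth()` returns a single GB/s figure. Taking the best of several repeats is one common way to reduce the effect of page faults on the first pass and of scheduler noise; a compiled benchmark will usually get closer to the hardware limit than this interpreter-level sketch.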

solutions/ is empty during pre-write — populated at phase open after Borja's prior-phase API decisions are visible.

Definition of Done

See PHASE_01_PLAN.md §6. Briefly:

  • Roofline plot of your machine in experiments/01-roofline/.
  • Each experiment has a manifest.json (per LYNX_CORTEX.md §5).
  • You can argue, with measurements, why naive fp32 matmul on an N×N matrix is bandwidth-bound for some N and compute-bound for others.
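
The last bullet can be made concrete with the roofline bound: attainable throughput is the smaller of peak compute and arithmetic intensity times memory bandwidth. The sketch below is a minimal illustration under stated assumptions, not the method prescribed by PHASE_01_PLAN.md. The function names are hypothetical, and the two traffic models are simplified bounds: "no reuse" assumes every multiply-add loads one element of A and one of B from memory, while "compulsory" assumes A and B are each read once and C is written once.

```python
def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: min(peak compute, arithmetic intensity * memory bandwidth)."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def matmul_intensity_bounds(n: int, bytes_per_elem: int = 4) -> tuple[float, float]:
    """Return (no-reuse, compulsory-traffic) arithmetic intensity in flops/byte.

    Both traffic models are assumptions made for illustration.
    """
    flops = 2 * n**3  # one multiply and one add per inner-loop iteration
    no_reuse_bytes = 2 * n**3 * bytes_per_elem  # load A[i][k] and B[k][j] every iteration
    compulsory_bytes = 3 * n * n * bytes_per_elem  # read A and B once, write C once
    return flops / no_reuse_bytes, flops / compulsory_bytes


if __name__ == "__main__":
    # Placeholder machine numbers, NOT measurements: replace with your own lab results.
    peak_gflops, bandwidth_gbs = 100.0, 20.0
    for n in (16, 64, 1024):
        low, high = matmul_intensity_bounds(n)
        print(
            f"N={n}: intensity {low:.2f} to {high:.2f} flops/byte, "
            f"bound {attainable_gflops(low, peak_gflops, bandwidth_gbs):.1f} to "
            f"{attainable_gflops(high, peak_gflops, bandwidth_gbs):.1f} GFLOP/s"
        )
```

For fp32 the no-reuse intensity is 0.25 flops/byte regardless of N, while the compulsory-traffic intensity grows as N/6. Where a real naive matmul lands between the two depends on how much of the working set stays in cache, which is what the measurements are meant to show.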

What this phase intentionally does NOT cover

  • GPUs. Deferred to Phase 23.
  • Distributed memory. Deferred to Phase 35.
  • Numerical accuracy of the operations benchmarked here. Deferred to Phase 2.
  • SIMD instructions in detail. Touched here, deepened in Phase 24 (Triton/CUDA).

Phase 1's scope is the single-CPU memory + compute pipeline. Nothing more.

Further reading

Optional — enrichment, not required to pass the phase.