
Phase 36: Frontier Architectures

Requires: Phase 35 (Distributed Training & Inference)
Teaches: mixture-of-experts · mamba · state-space-models · speculative-decoding
Jump to any chapter from the phase reference index.

Chapter map

Pre-written per A12. Theory and lab problem statements are stable drafts; solutions are written just-in-time at phase open.

A tour of modern architectures: MoE, MLA, RWKV/Mamba, speculative decoding, "reasoning models". The central question of the whole phase is not "how does it work?" but "is it useful for this?". Spoiler: for the grammar tutor, almost never.

Goal

Survey the four families of frontier architectures (MoE, MLA, state-space models, speculative decoding + reasoning) deeply enough that Borja can read any current paper or codebase, write the FLOP and memory math on a napkin, and — for each technique — judge whether it would help the grammar tutor.

The judgement is the load-bearing part. Each technique solves a specific bottleneck. The grammar tutor has none of those bottlenecks. Phase 36 turns that mismatch into the lesson: smaller scope is sometimes the right answer, and recognizing that is harder than copying a flashy architecture.

The phase is deliberately concept-heavy and implementation-light (§4 spec): one tiny MoE experiment on local CPU, one pencil-and-paper MLA derivation, one Mamba reading walkthrough, and one speculative-decoding survey. Zero cloud cost. Zero production code.
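As a rough sketch of the napkin math the goal refers to, the helpers below estimate KV-cache bytes for standard multi-head attention and for an MLA-style latent cache, plus total versus active FFN weights in an MoE layer. All function names and the example dimensions (4 layers, 4 heads, head size 32, 256-token context, fp16) are hypothetical placeholders, not the grammar tutor's actual configuration. The MLA latent and RoPE sizes assume the 4x and 0.5x head-dimension ratios reported for DeepSeek-V2.

```python
def dense_kv_cache_bytes(n_layers, n_heads, head_dim, seq_len, bytes_per_elem=2):
    """Standard multi-head attention: one K and one V vector per head, per layer, per token."""
    return 2 * n_layers * n_heads * head_dim * seq_len * bytes_per_elem


def mla_kv_cache_bytes(n_layers, latent_dim, rope_dim, seq_len, bytes_per_elem=2):
    """MLA-style cache: one compressed latent plus one shared RoPE key per layer, per token."""
    return n_layers * (latent_dim + rope_dim) * seq_len * bytes_per_elem


def moe_ffn_params(d_model, d_ff, n_experts, top_k):
    """(total, active) FFN weights for one MoE layer; each expert is a two-matrix FFN.

    Biases and the router matrix are ignored for simplicity.
    """
    per_expert = 2 * d_model * d_ff
    return n_experts * per_expert, top_k * per_expert


def forward_flops_per_token(active_params):
    """Rule of thumb: about 2 FLOPs (multiply + add) per active weight per token."""
    return 2 * active_params


# Hypothetical tiny-model dimensions (assumptions, not the real tutor config).
dense = dense_kv_cache_bytes(n_layers=4, n_heads=4, head_dim=32, seq_len=256)
latent = mla_kv_cache_bytes(n_layers=4, latent_dim=4 * 32, rope_dim=32 // 2, seq_len=256)
total, active = moe_ffn_params(d_model=128, d_ff=512, n_experts=2, top_k=1)

print(dense)                            # 524288 bytes (512 KiB)
print(latent)                           # 294912 bytes
print(total, active)                    # 262144 131072
print(forward_flops_per_token(active))  # 262144
```

With only four heads, the latent cache comes out less than 2x smaller than the dense one, and both fit in well under a megabyte. That is consistent with the phase's expectation that these techniques target bottlenecks a model this small does not have.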

Read order

theory/00-motivation.md: why "is this the right tool for my task" is the only question that matters in architecture surveys.
theory/01-moe.md: Mixture of Experts (routing, load balancing, expert parallelism). Bottleneck addressed: per-token compute growing in step with parameter count.
theory/02-mla.md: Multi-head Latent Attention (DeepSeek), a low-rank latent KV cache. Bottleneck addressed: KV-cache memory at long context.
theory/03-state-space-models.md: RWKV, Mamba, S4. Selective scan. Hybrids (Jamba). Bottleneck addressed: attention's quadratic time and growing KV cache at very long context.
theory/04-speculative-and-reasoning.md: Speculative decoding family (vanilla / Medusa / EAGLE / Lookahead) plus "reasoning models" and test-time-compute scaling. Bottleneck addressed: decode latency (speculative decoding). Reasoning models go the other way and spend extra decode compute to buy answer quality.
lab/00-moe-on-grammar-tutor.md: train a 2-expert MoE variant locally. Confirm that it doesn't help.
lab/01-mla-math-exercise.md: derive MLA's KV-cache reduction. Confirm that it is irrelevant at our scale.
lab/02-mamba-walkthrough.md: annotated reading of mamba-minimal's selective_scan.
lab/03-speculative-survey.md: one-page survey plus a recommendation for the grammar tutor.

solutions/ is empty during pre-write — populated at phase open after Borja's Phase 17 MiniGPT and Phase 18 training loop are in.
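To make the routing and load-balancing vocabulary from the MoE chapter concrete, here is a minimal pure-Python sketch of top-k routing and a Switch-Transformer-style auxiliary loss. It is an illustration under stated assumptions, not the lab solution: the names `route_top_k` and `load_balance_loss` are hypothetical, the router logits are hand-written rather than produced by a learned layer, and the loss omits the usual small scaling coefficient.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=1):
    """For each token, pick the k most probable experts.

    Returns one list per token of (expert_index, gate) pairs, with gates
    renormalised so they sum to 1 over the chosen experts.
    """
    assignments = []
    for logits in router_logits:
        probs = softmax(logits)
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
        norm = sum(probs[i] for i in top)
        assignments.append([(i, probs[i] / norm) for i in top])
    return assignments


def load_balance_loss(router_logits):
    """Auxiliary loss N * sum_i f_i * P_i.

    f_i is the fraction of tokens whose top-1 expert is i, and P_i is the
    mean router probability for expert i. The value is 1.0 when routing is
    perfectly balanced and grows as tokens collapse onto one expert.
    """
    n_tokens = len(router_logits)
    n_experts = len(router_logits[0])
    probs = [softmax(logits) for logits in router_logits]
    f = [0.0] * n_experts
    for p in probs:
        f[max(range(n_experts), key=lambda i: p[i])] += 1.0 / n_tokens
    mean_p = [sum(p[i] for p in probs) / n_tokens for i in range(n_experts)]
    return n_experts * sum(fi * pi for fi, pi in zip(f, mean_p))


# Two tokens, two experts (hand-written logits, purely illustrative).
balanced = [[1.0, 0.0], [0.0, 1.0]]    # each expert wins one token
collapsed = [[3.0, 0.0], [3.0, 0.0]]   # expert 0 wins everything

print(route_top_k(balanced, k=1))
print(load_balance_loss(balanced))
print(load_balance_loss(collapsed))
```

The collapsed case scores noticeably higher than the balanced one, which is the signal the auxiliary loss uses to push the router back toward an even split.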

Definition of Done

See PHASE_36_PLAN.md §6. Briefly:

2-expert MoE locally runs and converges; honest negative-result note committed.
MLA KV-cache math derived for the grammar tutor's dimensions.
Mamba selective-scan walkthrough committed (~1 page + line citations).
Speculative decoding survey committed.
Architecture decision-tree diagram committed under diagrams/.
/quiz 36 ≥ 70%.
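For the speculative decoding survey, a small calculator can make the recommendation quantitative. The sketch below uses the standard expected-value analysis for vanilla speculative decoding: with per-token acceptance rate `alpha` (assumed independent across positions) and `gamma` drafted tokens, each target-model pass yields `(1 - alpha**(gamma + 1)) / (1 - alpha)` tokens on average. The function names and the example numbers are hypothetical, and `cost_ratio` (draft cost divided by target cost per forward pass) is an assumed input that would have to be measured.

```python
def expected_tokens_per_step(alpha, gamma):
    """Expected tokens emitted per target-model verification pass."""
    if alpha >= 1.0:
        # Every drafted token is accepted, plus one token from the target.
        return float(gamma + 1)
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def expected_speedup(alpha, gamma, cost_ratio):
    """Expected wall-clock speedup over plain autoregressive decoding.

    Each step costs gamma draft passes plus one target pass, measured in
    units of one target pass.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * cost_ratio + 1)


# Illustrative values only, not measurements.
print(expected_tokens_per_step(0.5, 1))             # 1.5
print(expected_speedup(0.8, 4, 0.05))               # large target, cheap draft
print(expected_speedup(0.5, 1, 1.0))                # 0.75: draft as costly as target
```

The last line shows the case most relevant to a very small model: when no draft model is meaningfully cheaper than the target, the expected speedup drops below 1.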

What this phase intentionally does NOT cover

Implementing Mamba or MLA in PyTorch. Read-only. The implementations require GPU-kernel context (Phase 24) and aren't pedagogical at our scale.
Training a real MoE. Production MoEs run from tens to hundreds of billions of parameters; next to them the grammar tutor is a calculator. The 2-expert local experiment is a stub.
Multi-modal architectures (vision encoders, audio encoders, fusion). §4 mentions them for completeness; they are out of scope here.
RLHF / DPO / "reasoning RL". Phase 28 already covered these as concept-only; they are not re-introduced here.
MoE serving infrastructure (expert parallelism at scale, all-to-all comm patterns, dropless MoE). Phase 35 territory if Borja revisits.
Speculative decoding implementation. Survey only. Implementing it would be a fun side project, but it would distract from the phase's purpose.
3D parallelism for MoE training. Phase 35 territory.

Phase 36's scope is vocabulary, math, and judgement on frontier architectures, applied to the microscopic grammar tutor. Nothing more.