Lab 02: Mamba selective-scan walkthrough (reading lab)
Goal: read mamba-minimal end-to-end. Annotate the selective-scan logic. Write a 1-page summary connecting the math (theory/03) to the code.
Estimated time: 2–3 hours.
Prereq: theory/03-state-space-models.md understood. Phase 25 (PyTorch internals) done. Borja can read a transformer reference implementation comfortably.
What you produce
A directory experiments/36-mamba-walkthrough/ containing:
- mamba-sha.txt: the SHA of the mamba-minimal repo read.
- walkthrough.md: ~1-page annotated reading of the selective-scan core (selective_scan_ref or equivalent).
- state-update-diagram.mmd: mermaid diagram of one step of selective scan, annotated with shapes.
- grammar-tutor-applicability.md: a short verdict on whether Mamba would help the grammar tutor.
TODOs
Block A: clone mamba-minimal
mkdir -p /home/overdrive/claude/lynx-cortex/experiments/36-mamba-walkthrough
git clone https://github.com/johnma2006/mamba-minimal /tmp/mamba-minimal
cd /tmp/mamba-minimal
git rev-parse HEAD > /home/overdrive/claude/lynx-cortex/experiments/36-mamba-walkthrough/mamba-sha.txt
mamba-minimal is a deliberately pedagogical reimplementation (~300 LOC), much easier to read than the official Mamba repo, which relies on CUDA kernels. Read the educational version.
Block B: walk the file
The interesting file is model.py. Focus on these functions / classes:
- MambaBlock: the building block (analogous to a transformer block).
- selective_scan (or selective_scan_ref, depending on version): the core recurrence.
In walkthrough.md, write annotations covering:
- The discretization step. Where in the code is the continuous-to-discrete transition (computing \(\bar{A}, \bar{B}\) from \(A, B, \Delta\))? Cite line numbers.
- The selectivity. Which lines make \(B, C, \Delta\) input-dependent? (As opposed to S4, where these are fixed parameters.)
- The state update. Trace \(h_t = \bar{A}_t h_{t-1} + \bar{B}_t x_t\) — find the corresponding line(s).
- The output projection. Where is \(y_t = C_t h_t\)?
- The convolution. Mamba uses a 1D conv as a pre-processor. Why? (Hint: the conv is causal and per-channel, so each position mixes in a few preceding tokens, giving short-range context before the SSM.)
Each annotation: 2-3 sentences + line citation. ~5-8 annotations total.
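The first four annotation targets can be previewed in a toy version before opening model.py. The sketch below is a NumPy stand-in written for this lab, not a transcription of mamba-minimal: the function name, the weight names (`W_delta`, `W_B`, `W_C`), and the use of plain matrix products for the projections are all assumptions. It marks where selectivity, discretization, the state update, and the readout happen, so you know what to look for in the real file.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return np.logaddexp(0.0, z)

def selective_scan_sketch(x, A, W_delta, W_B, W_C):
    """Toy selective scan (hypothetical names, not the repo's code).

    x: (batch, L, D), A: (D, N), W_delta: (D, D), W_B and W_C: (D, N).
    Returns y: (batch, L, D).
    """
    batch, L, D = x.shape
    N = A.shape[1]

    # Selectivity: Delta, B, C are computed from the input at every position.
    delta = softplus(x @ W_delta)            # (batch, L, D)
    B = x @ W_B                              # (batch, L, N)
    C = x @ W_C                              # (batch, L, N)

    # Discretization: A_bar = exp(Delta * A); B_bar * x uses the simplified Delta * B form.
    A_bar = np.exp(delta[..., None] * A)                              # (batch, L, D, N)
    B_bar_x = delta[..., None] * B[:, :, None, :] * x[..., None]      # (batch, L, D, N)

    # State update and readout, one position at a time.
    h = np.zeros((batch, D, N))
    ys = []
    for t in range(L):
        h = A_bar[:, t] * h + B_bar_x[:, t]                 # h_t = A_bar_t h_{t-1} + B_bar_t x_t
        ys.append(np.einsum("bdn,bn->bd", h, C[:, t]))      # y_t = C_t h_t
    return np.stack(ys, axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 3))
A = -np.exp(rng.normal(size=(3, 4)))   # negative entries so the state decays
W_delta = rng.normal(size=(3, 3))
W_B = rng.normal(size=(3, 4))
W_C = rng.normal(size=(3, 4))
y = selective_scan_sketch(x, A, W_delta, W_B, W_C)
print(y.shape)  # (2, 5, 3)
```

The Python loop over `t` is the part the official repo replaces with a fused CUDA kernel; in a pedagogical implementation it is reasonable to expect an explicit loop, but confirm that against the pinned SHA.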
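Block C: the shape diagram
Draw the mermaid diagram for one selective-scan step. In the shape tuples, B is the batch size, D the number of channels, and N the state size (B_t, by contrast, is the input-dependent SSM matrix):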
flowchart LR
x["x_t : (B, D)"] --> deltaP["Linear -> Δ_t : (B, D)"]
x --> Bproj["Linear -> B_t : (B, N)"]
x --> Cproj["Linear -> C_t : (B, N)"]
A_param["A : (D, N) param"] -.discretize.-> Abar["Ā_t : (B, D, N)"]
deltaP --> Abar
Bproj --> Bbar["B̄_t : (B, D, N)"]
deltaP --> Bbar
h_prev["h_{t-1} : (B, D, N)"] --> update["h_t = Ā_t · h_{t-1} + B̄_t · x_t"]
Abar --> update
Bbar --> update
update --> h_t["h_t : (B, D, N)"]
h_t --> output["y_t = C_t · h_t : (B, D)"]
Cproj --> output
Commit as state-update-diagram.mmd. Annotate the diagram by adding a "where in the code" reference next to each box (e.g., "Δ projection: line 142").
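Before annotating the boxes, it can help to confirm that the shapes in the diagram compose under broadcasting. The following is a minimal NumPy sketch of a single step with every intermediate shape spelled out; `scan_step` and the `Bsz` name for the batch dimension are hypothetical and chosen only to avoid clashing with the SSM matrix B_t.

```python
import numpy as np

def scan_step(x_t, delta_t, B_t, C_t, A, h_prev):
    """One selective-scan step using the diagram's shapes (hypothetical helper).

    x_t, delta_t: (Bsz, D); B_t, C_t: (Bsz, N); A: (D, N); h_prev: (Bsz, D, N).
    """
    A_bar = np.exp(delta_t[:, :, None] * A)           # (Bsz, D, N)
    B_bar = delta_t[:, :, None] * B_t[:, None, :]     # (Bsz, D, N)
    h_t = A_bar * h_prev + B_bar * x_t[:, :, None]    # (Bsz, D, N)
    y_t = (h_t * C_t[:, None, :]).sum(axis=-1)        # (Bsz, D), contracts over N
    return h_t, y_t

Bsz, D, N = 2, 3, 4
h_t, y_t = scan_step(
    x_t=np.full((Bsz, D), 2.0),
    delta_t=np.ones((Bsz, D)),
    B_t=np.ones((Bsz, N)),
    C_t=np.ones((Bsz, N)),
    A=-np.ones((D, N)),
    h_prev=np.zeros((Bsz, D, N)),
)
print(h_t.shape, y_t.shape)  # (2, 3, 4) (2, 3)
```

With a zero previous state and all-ones Δ, B, and C, the new state is just the input broadcast across N, so each output entry equals N times the input entry. That makes it a convenient hand-checkable case when comparing against the real code.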
Block D: the grammar-tutor applicability
Write grammar-tutor-applicability.md (~200 words):
- Would Mamba help the grammar tutor? (Spoiler from theory/03: no.)
- What specifically about the grammar-tutor's task makes attention strictly better than Mamba? (Answer: subject-verb-tense agreement requires precise lookup of a specific past token; Mamba compresses the past into a state, attention reads it directly.)
- When would you reach for Mamba? (Answer: very long context, where the KV cache becomes infeasible.)
- What about a hybrid (Jamba-like) approach? Could a single attention layer + multiple Mamba layers help? (Hint: probably not, at our 32-token max context. Attention's compute at this scale is negligible.)
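The memory side of the argument can be made concrete with back-of-envelope arithmetic. The sketch below compares KV-cache size with SSM state size for a made-up configuration; the layer count and dimensions are illustrative assumptions, not the grammar tutor's real numbers. It also assumes full multi-head attention (keys and values each of width `d_model`, no grouped-query sharing) and ignores Mamba's small convolution buffer.

```python
def kv_cache_bytes(n_layers, seq_len, d_model, bytes_per_elem=2):
    # Keys and values: two (seq_len, d_model) tensors per layer.
    return 2 * n_layers * seq_len * d_model * bytes_per_elem

def ssm_state_bytes(n_layers, d_inner, d_state, bytes_per_elem=2):
    # One (d_inner, d_state) recurrent state per layer, independent of seq_len.
    return n_layers * d_inner * d_state * bytes_per_elem

# Hypothetical small model, fp16 storage.
cfg = dict(n_layers=4, d_model=256, d_inner=512, d_state=16)

for seq_len in (32, 100_000):
    kv = kv_cache_bytes(cfg["n_layers"], seq_len, cfg["d_model"])
    ssm = ssm_state_bytes(cfg["n_layers"], cfg["d_inner"], cfg["d_state"])
    print(f"seq_len={seq_len}: KV cache {kv} bytes, SSM state {ssm} bytes")
```

At 32 tokens both figures are in the kilobyte range for this configuration, so neither is a constraint. The KV cache grows linearly with `seq_len` while the SSM state stays fixed, which is why the case for Mamba only appears at very long contexts.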
Constraints
- No Mamba training. This is a reading lab. Spinning up Mamba inference is fine if you want to feel the speed; training is out of scope (and slow on CPU).
- No copying source. Cite line ranges, not full code. The walkthrough should be a summary, not a transcription.
- Mermaid diagrams only. No PNGs from external tools — keep things diff-able.
- CPU-only, zero cloud spend.
Stop conditions
You're done when:
- experiments/36-mamba-walkthrough/{mamba-sha.txt, walkthrough.md, state-update-diagram.mmd, grammar-tutor-applicability.md} all exist.
- walkthrough.md has ≥5 annotated points with line citations.
- The mermaid diagram is committed and renders correctly.
- The grammar-tutor applicability note answers all four questions.
- You can explain, from memory, "what makes Mamba 'selective'" in one sentence.
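The mechanical parts of this checklist can be verified with a short script. This is a hedged sketch: `check_lab` is a hypothetical helper, it assumes the SHA file holds a 40-character SHA-1 hash, and it assumes citations are written as "line 142" or "lines 10-20". It does not check that the mermaid diagram renders or that the applicability note answers all four questions.

```python
import re
from pathlib import Path

REQUIRED = [
    "mamba-sha.txt",
    "walkthrough.md",
    "state-update-diagram.mmd",
    "grammar-tutor-applicability.md",
]

def check_lab(directory):
    """Return a list of problems found; an empty list means the checks pass."""
    d = Path(directory)
    problems = [f"missing {name}" for name in REQUIRED if not (d / name).is_file()]

    sha_file = d / "mamba-sha.txt"
    if sha_file.is_file():
        sha = sha_file.read_text().strip()
        if not re.fullmatch(r"[0-9a-f]{40}", sha):
            problems.append("mamba-sha.txt is not a 40-character hex SHA")

    walkthrough = d / "walkthrough.md"
    if walkthrough.is_file():
        # Assumed citation style: "line 142" or "lines 10-20".
        citations = re.findall(r"\blines? \d+", walkthrough.read_text(), flags=re.IGNORECASE)
        if len(citations) < 5:
            problems.append(
                f"walkthrough.md has {len(citations)} line citations, expected at least 5"
            )
    return problems

print(check_lab("experiments/36-mamba-walkthrough"))
```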
Hint of last resort
If mamba-minimal has drifted from the version this lab was written against: pin the version with the SHA at clone time. If the function names changed (selective_scan vs selective_scan_ref etc.), use whichever matches the current file.
If the discretization step (Block B point 1) is opaque: the formula is in theory/03-state-space-models.md §"Discretization". Compare the math to the code one line at a time. The code is doing exactly the math, in PyTorch ops.
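One thing that can make the comparison confusing is that there are two common forms for \(\bar{B}\): the exact zero-order-hold expression and the simplified first-order form \(\Delta B\). The sketch below compares them numerically for a single scalar (diagonal) entry; the function names are hypothetical. Which form theory/03 and the pinned mamba-minimal use is something to confirm while reading, not something this sketch asserts.

```python
import math

def discretize_zoh(a, b, delta):
    """Zero-order hold for one diagonal entry (requires a != 0)."""
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b   # (delta*a)^-1 (exp(delta*a) - 1) * delta*b
    return a_bar, b_bar

def discretize_simplified(a, b, delta):
    """Same A_bar, but B_bar replaced by its first-order approximation."""
    return math.exp(delta * a), delta * b

a, b = -1.0, 1.0
for delta in (1e-3, 1e-1, 1.0):
    _, b_zoh = discretize_zoh(a, b, delta)
    _, b_simple = discretize_simplified(a, b, delta)
    print(f"delta={delta}: ZOH B_bar={b_zoh:.6f}, simplified B_bar={b_simple:.6f}")
```

The two forms agree closely for small step sizes and diverge as Δ grows, so if the code's \(\bar{B}\) looks simpler than the formula in the theory notes, the first-order form is a likely explanation.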
When to consult solutions/
After committing. Solution lives in solutions/02-mamba-walkthrough-ref.md — written at phase open with the current mamba-minimal version pinned. The reference is a set of annotation picks with line ranges; Borja's picks may differ — the comparison is "what did I miss?", not "did I match exactly?".
Next lab: lab/03-speculative-survey.md.