Lab 02: RoPE Implementation and the Relative-Position Property
Goal: implement RoPE in NumPy; verify numerically that \(\langle R_m q, R_n k \rangle = \langle q, R_{n-m} k \rangle\) holds to within 1e-5.
Estimated time: 90–120 minutes.
Prereq: lab 01 committed; theory/03-rope.md read.
What you produce
A directory experiments/16-rope/ containing:
- rope.py: your implementation, in src/minimodel/positional/rope.py.
- verify.py: verification script.
- verify_output.txt: captured printout.
- relative_position_demo.png: plot showing that the relative-position property holds across \(n - m\).
- manifest.json.
- README.md.
TODOs
Block A: implementation
Per src/minimodel/positional/BLUEPRINT.md:
- rope_frequencies(d_head: int, base: float = 10000.0) -> np.ndarray. Shape (d_head/2,).
- precompute_rope(T: int, d_head: int) -> tuple[np.ndarray, np.ndarray]. Returns (cos_pe, sin_pe), each of shape (T, d_head/2).
- apply_rope(q, k, cos_pe, sin_pe) -> tuple[np.ndarray, np.ndarray]. Returns the rotated (q, k), each of shape (T, d_head).
Implement using the "interleaved pairs" convention (see theory/03-rope.md): pair \((2k, 2k+1)\) rotated together.
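```python
import numpy as np


def apply_rope(q, k, cos_pe, sin_pe):
    # q, k: (T, d_head)
    # cos_pe, sin_pe: (T, d_head // 2)
    # Split q into the even and odd members of each interleaved pair.
    q_even = q[:, 0::2]
    q_odd = q[:, 1::2]
    # Rotate each pair by its position-dependent angle.
    q_rope_even = q_even * cos_pe - q_odd * sin_pe
    q_rope_odd = q_even * sin_pe + q_odd * cos_pe
    q_rope = np.empty_like(q)
    q_rope[:, 0::2] = q_rope_even
    q_rope[:, 1::2] = q_rope_odd
    # Same rotation for k.
    k_even = k[:, 0::2]
    k_odd = k[:, 1::2]
    k_rope = np.empty_like(k)
    k_rope[:, 0::2] = k_even * cos_pe - k_odd * sin_pe
    k_rope[:, 1::2] = k_even * sin_pe + k_odd * cos_pe
    return q_rope, k_rope
```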
About 15 LOC total for apply_rope. Hand-built; no shortcuts.
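The two table-building functions are only given as signatures above. A minimal sketch, assuming the common RoPE frequency schedule \(\theta_i = \text{base}^{-2i/d_{head}}\); BLUEPRINT.md is the authority, so follow it if it specifies something different:

```python
import numpy as np


def rope_frequencies(d_head: int, base: float = 10000.0) -> np.ndarray:
    """Per-pair rotation frequencies, shape (d_head // 2,).

    Assumes the schedule theta_i = base ** (-2 * i / d_head).
    """
    if d_head % 2 != 0:
        raise ValueError("d_head must be even so dimensions can be paired")
    pair_index = np.arange(d_head // 2, dtype=np.float64)
    return base ** (-2.0 * pair_index / d_head)


def precompute_rope(T: int, d_head: int) -> tuple[np.ndarray, np.ndarray]:
    """Cos/sin tables, each of shape (T, d_head // 2)."""
    positions = np.arange(T, dtype=np.float64)
    # angles[p, i] = p * theta_i
    angles = np.outer(positions, rope_frequencies(d_head))
    return np.cos(angles), np.sin(angles)


cos_pe, sin_pe = precompute_rope(T=32, d_head=16)
print(cos_pe.shape, sin_pe.shape)
```

Row 0 of the tables corresponds to a zero angle, so cos_pe[0] is all ones and sin_pe[0] is all zeros; that makes a quick first check before moving on to Block B.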
Block B: verify the relative-position property
This is the key test. The property: \(\langle R_m q, R_n k \rangle = \langle q, R_{n-m} k \rangle\) for any \(q, k, m, n\).
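A short derivation of why the property should hold, written for a single \(2 \times 2\) block and assuming the counter-clockwise convention, with \(R_p\) denoting rotation by the angle \(p\theta\):

\[
\begin{aligned}
\langle R_m q, R_n k \rangle
  &= (R_m q)^\top (R_n k) \\
  &= q^\top R_m^\top R_n k \\
  &= q^\top R_{-m} R_n k \\
  &= q^\top R_{n-m} k
   = \langle q, R_{n-m} k \rangle .
\end{aligned}
\]

The two facts used are \(R_m^\top = R_{-m}\) (the transpose of a rotation is its inverse) and \(R_a R_b = R_{a+b}\) (angles add). The full RoPE matrix is block-diagonal with one such block per pair of dimensions, so the identity holds block by block.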
- Pick d_head = 16, \(T = 32\).
- Generate random q, k of shape (d_head,) with seed 0.
- Pre-compute cos_pe, sin_pe = precompute_rope(T=32, d_head=16).
- For each pair \((m, n) \in \{(0, 1), (3, 7), (5, 5), (10, 12), (20, 8)\}\):
  - Compute q_m = apply_rope_single(q, cos_pe[m], sin_pe[m]) and k_n = apply_rope_single(k, cos_pe[n], sin_pe[n]).
  - Compute lhs = q_m @ k_n.
  - Compute k_diff = apply_rope_single(k, cos_pe[n - m], sin_pe[n - m]) if \(n \geq m\); otherwise use cos_pe[m - n] and -sin_pe[m - n], since rotating by a negative angle flips the sign of \(\sin\) only. Do not index with a negative number: NumPy would wrap around to the end of the table.
  - Compute rhs = q @ k_diff.
  - Assert abs(lhs - rhs) < 1e-5.
- Print all five pairs in a table.
(apply_rope_single is a single-position version of apply_rope, for convenience. Alternatively, implement the full apply_rope and slice.)
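The steps above can be sketched as a single script. This is an illustration under assumptions, not the reference solution: the frequency schedule \(\text{base}^{-2i/d_{head}}\), the use of np.random.default_rng(0) for "seed 0", and the body of apply_rope_single are all guesses at what the lab intends.

```python
import numpy as np


def precompute_rope(T, d_head, base=10000.0):
    # Assumed schedule: theta_i = base ** (-2 * i / d_head).
    freqs = base ** (-2.0 * np.arange(d_head // 2) / d_head)
    angles = np.outer(np.arange(T), freqs)
    return np.cos(angles), np.sin(angles)


def apply_rope_single(x, cos_row, sin_row):
    # Hypothetical single-position helper: x is (d_head,), rows are (d_head // 2,).
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos_row - x[1::2] * sin_row
    out[1::2] = x[0::2] * sin_row + x[1::2] * cos_row
    return out


rng = np.random.default_rng(0)
d_head, T = 16, 32
q = rng.standard_normal(d_head)
k = rng.standard_normal(d_head)
cos_pe, sin_pe = precompute_rope(T, d_head)

max_diff = 0.0
print(f"{'m':>3} {'n':>3} {'lhs':>12} {'rhs':>12} {'diff':>10}")
for m, n in [(0, 1), (3, 7), (5, 5), (10, 12), (20, 8)]:
    q_m = apply_rope_single(q, cos_pe[m], sin_pe[m])
    k_n = apply_rope_single(k, cos_pe[n], sin_pe[n])
    lhs = q_m @ k_n

    delta = abs(n - m)
    # Rotation by a negative angle keeps cos and negates sin.
    sign = 1.0 if n >= m else -1.0
    k_diff = apply_rope_single(k, cos_pe[delta], sign * sin_pe[delta])
    rhs = q @ k_diff

    diff = abs(lhs - rhs)
    assert diff < 1e-5, (m, n, diff)
    max_diff = max(max_diff, diff)
    print(f"{m:>3} {n:>3} {lhs:>12.6f} {rhs:>12.6f} {diff:>10.2e}")

print(f"max diff: {max_diff:.2e}")
```

Indexing the table with abs(n - m) and carrying the sign separately keeps every index non-negative, which is what the (20, 8) case needs.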
Block C: plot the relative-position property
- For fixed \(m = 0\), vary \(n\) from 0 to 31.
- At each \(n\), compute \(\langle R_m q, R_n k \rangle\).
- Plot as a function of \(n - m\).
- Overlay \(\langle q, R_{n-m} k \rangle\) on the same axes.
- The two curves must coincide to within 1e-5.
- Save as relative_position_demo.png.
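One possible shape for the plotting step is sketched below. The lab does not name a plotting library, so matplotlib is an assumption here, as are the frequency schedule and the helper names rotate and save_plot. Angles are computed directly rather than read from a table, so negative offsets need no special handling.

```python
import numpy as np


def rotate(x, angles):
    # Interleaved-pair rotation of x (d_head,) by per-pair angles (d_head // 2,).
    out = np.empty_like(x)
    c, s = np.cos(angles), np.sin(angles)
    out[0::2] = x[0::2] * c - x[1::2] * s
    out[1::2] = x[0::2] * s + x[1::2] * c
    return out


d_head, T, m = 16, 32, 0
freqs = 10000.0 ** (-2.0 * np.arange(d_head // 2) / d_head)  # assumed schedule
rng = np.random.default_rng(0)
q, k = rng.standard_normal(d_head), rng.standard_normal(d_head)

offsets = np.arange(T) - m  # n - m for n = 0..T-1
lhs = np.array([rotate(q, m * freqs) @ rotate(k, n * freqs) for n in range(T)])
rhs = np.array([q @ rotate(k, d * freqs) for d in offsets])


def save_plot(path="relative_position_demo.png"):
    # matplotlib is an assumption; it is imported lazily so the curves above
    # can be computed and checked without it.
    import matplotlib
    matplotlib.use("Agg")  # render to file without a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(offsets, lhs, linewidth=3, label="<R_m q, R_n k>")
    ax.plot(offsets, rhs, "--", label="<q, R_(n-m) k>")
    ax.set_xlabel("n - m")
    ax.set_ylabel("inner product")
    ax.legend()
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

Calling save_plot() writes the PNG. Note that with \(m = 0\), \(R_0\) is the identity, so the two curves agree by construction; setting m to a nonzero value in the sketch exercises the property more strongly.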
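Block D: sanity check that RoPE preserves norm
Rotations are orthogonal: \(\|R q\| = \|q\|\).
- Random q of shape (d_head,).
- For each position \(p \in \{0, 1, 5, 100\}\), compute the rotated version. Position 100 lies outside the \(T = 32\) table from Block B, so precompute with \(T \geq 101\) or compute the angles directly.
- Assert np.allclose(np.linalg.norm(q_rotated), np.linalg.norm(q), atol=1e-6).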
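A minimal sketch of the norm check, assuming the same frequency schedule \(\text{base}^{-2i/d_{head}}\) and np.random.default_rng(0) as the seed source. The angles are computed directly from the position, so position 100 does not depend on the size of any precomputed table.

```python
import numpy as np

d_head = 16
freqs = 10000.0 ** (-2.0 * np.arange(d_head // 2) / d_head)  # assumed schedule
rng = np.random.default_rng(0)
q = rng.standard_normal(d_head)

norm_diffs = []
for p in (0, 1, 5, 100):
    # Per-pair angle at position p.
    c, s = np.cos(p * freqs), np.sin(p * freqs)
    q_rotated = np.empty_like(q)
    q_rotated[0::2] = q[0::2] * c - q[1::2] * s
    q_rotated[1::2] = q[0::2] * s + q[1::2] * c
    assert np.allclose(np.linalg.norm(q_rotated), np.linalg.norm(q), atol=1e-6)
    norm_diffs.append(abs(np.linalg.norm(q_rotated) - np.linalg.norm(q)))

print(f"max norm diff: {max(norm_diffs):.2e}")
```

The largest entry of norm_diffs is the number that goes into norm_preservation_max_diff in the manifest.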
Block E: write up
In README.md:
- State the relative-position property in one sentence.
- Report the max diff across the five test cases in Block B. It should be < 1e-5.
- Confirm norm preservation (Block D).
- Why does V not get rotated? Answer from theory/03-rope.md, in your own words.
Block F: manifest
```json
{
  "experiment": "16-rope",
  "date": "YYYY-MM-DD",
  "seed": 0,
  "versions": { "python": "3.11.x", "numpy": "X.Y.Z" },
  "config": {
    "d_head": 16,
    "T": 32,
    "rope_base": 10000.0
  },
  "results_summary": {
    "relative_position_max_diff": null,
    "norm_preservation_max_diff": null
  }
}
```
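Filling the template by hand invites stale version strings. One way to generate it from the running environment is sketched below; build_manifest is a hypothetical helper, and the two 0.0 arguments are placeholders for illustration, not results.

```python
import json
import platform
from datetime import date

import numpy as np


def build_manifest(rel_max_diff, norm_max_diff):
    # Field names mirror the template; values come from the running environment.
    return {
        "experiment": "16-rope",
        "date": date.today().isoformat(),
        "seed": 0,
        "versions": {
            "python": platform.python_version(),
            "numpy": np.__version__,
        },
        "config": {"d_head": 16, "T": 32, "rope_base": 10000.0},
        "results_summary": {
            "relative_position_max_diff": float(rel_max_diff),
            "norm_preservation_max_diff": float(norm_max_diff),
        },
    }


# Placeholder values only; pass the maxima measured in Blocks B and D instead.
manifest = build_manifest(rel_max_diff=0.0, norm_max_diff=0.0)
manifest_text = json.dumps(manifest, indent=2)
print(manifest_text)
```

Writing manifest_text to experiments/16-rope/manifest.json at the end of verify.py keeps the recorded numbers tied to the run that produced them.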
Constraints
- No PyTorch.
- Interleaved pairs convention. Document this clearly in README.md. Other implementations use the "split-half" convention; that is also valid, but the two are not bit-equivalent.
- No fancy einsums. Use explicit indexing and concatenation. Vectorized but readable.
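To make the difference between the two conventions concrete, here is a small comparison. It assumes "split-half" means pairing dimension \(i\) with dimension \(i + d_{head}/2\); the function names and the frequency schedule are illustrative, not taken from the lab's codebase.

```python
import numpy as np


def rope_interleaved(x, c, s):
    # Pairs (0, 1), (2, 3), ... rotate together.
    out = np.empty_like(x)
    out[0::2] = x[0::2] * c - x[1::2] * s
    out[1::2] = x[0::2] * s + x[1::2] * c
    return out


def rope_split_half(x, c, s):
    # Pairs (i, i + d_head // 2) rotate together.
    half = x.shape[0] // 2
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * c - x2 * s, x1 * s + x2 * c])


d_head, p = 16, 7
freqs = 10000.0 ** (-2.0 * np.arange(d_head // 2) / d_head)  # assumed schedule
c, s = np.cos(p * freqs), np.sin(p * freqs)
x = np.random.default_rng(0).standard_normal(d_head)

# Reordering dimensions as [even..., odd...] maps one layout onto the other.
perm = np.concatenate([np.arange(0, d_head, 2), np.arange(1, d_head, 2)])

same_layout = np.allclose(rope_interleaved(x, c, s), rope_split_half(x, c, s))
after_perm = np.allclose(rope_interleaved(x, c, s)[perm], rope_split_half(x[perm], c, s))
print(same_layout, after_perm)
```

Applied to the same vector the two conventions give different outputs, but they agree once the dimensions are permuted consistently. This is why weights or tests written for one convention cannot be reused with the other without that permutation.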
Stop conditions
Done when:
- All six files committed.
- Block B assertions all pass (max diff < 1e-5).
- Block D norm-preservation passes.
- relative_position_demo.png shows two coinciding curves.
- README.md answers all four Block E questions.
Pitfalls
- Sign convention. The rotation matrix in theory/03-rope.md is \(\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\). This is "counter-clockwise". Some references use clockwise (signs of \(\sin\) flipped). Stick with the theory file's convention.
- Negative relative positions in apply_rope_single. When m > n, the relative position \(n - m\) is negative. Rotation by \(-\theta\) has \(\sin(-\theta) = -\sin(\theta)\) and \(\cos(-\theta) = \cos(\theta)\). Handle the sign correctly when computing the rhs.
- cos_pe[n - m] requires \(n - m \geq 0\). A negative index does not raise an error in NumPy; it silently wraps around to the end of the table. For the test cases with \(n < m\), compute the rotation manually rather than indexing.
- Float precision. With float32 and large positions, the trigonometric values lose precision. For Phase 16's small T this is fine; flag for Phase 22+ if Borja extends to long contexts.
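The float-precision pitfall can be observed directly. The sketch below compares float32 and float64 sines at the lab's largest position (31) and at a hypothetical long-context position (100000, chosen only for illustration), assuming the same frequency schedule \(\text{base}^{-2i/d_{head}}\).

```python
import numpy as np

d_head = 16
freqs64 = 10000.0 ** (-2.0 * np.arange(d_head // 2, dtype=np.float64) / d_head)
freqs32 = freqs64.astype(np.float32)


def max_sin_error(position):
    # Largest gap between sin computed entirely in float32 and in float64.
    angles64 = np.float64(position) * freqs64
    angles32 = np.float32(position) * freqs32
    sin32 = np.sin(angles32).astype(np.float64)
    return float(np.max(np.abs(sin32 - np.sin(angles64))))


err_near = max_sin_error(31)        # largest position used in this lab
err_far = max_sin_error(100_000)    # hypothetical long-context position
print(f"position 31:     {err_near:.2e}")
print(f"position 100000: {err_far:.2e}")
```

The error grows with position because the rounding error in each float32 frequency is multiplied by the position before the sine is taken. Building the tables in float64 and casting afterwards is one common way to limit this.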
When to consult solutions/
After all six files are committed and the assertions pass. Solution at solutions/02-rope-ref.md.
Next lab: 03-extrapolation-compare.md.