Lab 00 — Permutation Equivariance, in Numbers

Goal: demonstrate empirically that attention without positional information is permutation-equivariant, then show that adding a sinusoidal positional encoding (PE) breaks this property.

Estimated time: 30–45 minutes.

Prereqs: you have read theory/00-motivation.md, and Phase 15's src/minimodel/attention/ exists.

What you produce

A directory experiments/16-permutation-equivariance/ containing:

demo.py — script that runs attention with and without PE on a 3-token sequence and shows what happens under permutation.
demo_output.txt — captured printout.
manifest.json.
README.md.

TODOs

Block A — set up

Use MultiHeadAttention(d_model=8, n_heads=1, seed=0) from Phase 15.
Input: 3 tokens forming a verb-grammar fragment — he, work, I (token IDs from your Phase 14 tokenizer; embed via the Phase 13 embedding). Stack as \(X \in \mathbb{R}^{3 \times 8}\).
Linguistic motivation: he work I and I work he are both ungrammatical, but they are different sequences. Only with positional information can the model tell one ordering from another (and, later in Phase 18, prefer he works over works he). Without PE, every ordering yields the same output vectors, just reordered.

Block B — without PE

Compute Y = mha.forward(X, mask=None).
Permute the input: X_perm = X[[2, 0, 1]] (a cyclic reordering, not a pairwise swap: rows 2, 0, 1 of X, in that order).
Compute Y_perm = mha.forward(X_perm, mask=None).
Assert: np.allclose(Y_perm, Y[[2, 0, 1]], atol=1e-6).

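This demonstrates, for this input and this permutation, that the attention output on the permuted input equals the same permutation applied to the attention output on the original input. That is equivariance: each token's output vector is unchanged and simply moves with its token, so nothing in the output reveals which ordering the model received.

Print Y, Y[[2, 0, 1]], and Y_perm side by side, and check visually that the last two match.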
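Before wiring in the Phase 15 class, it can help to see the same check on a stand-in. The sketch below is not MultiHeadAttention: `toy_attention` is a hypothetical single-head, unmasked scaled dot-product attention with random weights, and `X` is random placeholder data rather than real token embeddings. It only illustrates the shape of the Block B test.

```python
import numpy as np

def toy_attention(X, Wq, Wk, Wv):
    """Hypothetical stand-in: single-head self-attention, no mask, no PE."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 8))  # placeholder for the 3 embedded tokens
Wq, Wk, Wv = (rng.normal(size=(8, 8)) / np.sqrt(8) for _ in range(3))

perm = [2, 0, 1]
Y = toy_attention(X, Wq, Wk, Wv)
Y_perm = toy_attention(X[perm], Wq, Wk, Wv)

# Equivariance: permuting the input rows permutes the output rows the same way.
max_diff_no_pe = np.abs(Y_perm - Y[perm]).max()
assert np.allclose(Y_perm, Y[perm], atol=1e-6)
print("max |Y_perm - Y[perm]| =", max_diff_no_pe)
```

In demo.py the same three lines (forward, permuted forward, assert) apply, with `mha.forward(..., mask=None)` in place of `toy_attention`.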

Block C — with PE

Use sinusoidal PE: pe = sinusoidal_pe(3, 8) (from Phase 16's src/minimodel/positional/sinusoidal.py).
Compute Y_pe = mha.forward(X + pe, mask=None).
Compute Y_perm_pe = mha.forward(X_perm + pe, mask=None).
Assert: not np.allclose(Y_perm_pe, Y_pe[[2, 0, 1]], atol=1e-3).

This shows that, with PE, the model does distinguish the permutation: the output is no longer just a reordering of the original output.

Print the difference matrix Y_perm_pe - Y_pe[[2, 0, 1]]. Its entries should be clearly non-zero, because the PE has broken equivariance.
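The with-PE check can be sketched the same way. Again, this is an illustration under assumptions: `toy_attention` and `toy_sinusoidal_pe` are hypothetical stand-ins (the latter uses the common sin-on-even, cos-on-odd layout with base 10000, which may or may not match the repo's sinusoidal_pe), and the inputs and weights are random placeholders.

```python
import numpy as np

def toy_attention(X, Wq, Wk, Wv):
    """Hypothetical stand-in: single-head self-attention, no mask."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

def toy_sinusoidal_pe(T, d_model):
    """Hypothetical stand-in: sin on even columns, cos on odd columns."""
    pos = np.arange(T)[:, None]               # shape (T, 1)
    i = np.arange(0, d_model, 2)[None, :]     # even column indices, shape (1, d_model/2)
    angles = pos / (10000.0 ** (i / d_model))
    pe = np.zeros((T, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 8))  # placeholder for the 3 embedded tokens
Wq, Wk, Wv = (rng.normal(size=(8, 8)) / np.sqrt(8) for _ in range(3))
pe = toy_sinusoidal_pe(3, 8)

perm = [2, 0, 1]
Y_pe = toy_attention(X + pe, Wq, Wk, Wv)
# PE is added after permuting, so each token meets the PE row of its new position.
Y_perm_pe = toy_attention(X[perm] + pe, Wq, Wk, Wv)

diff = Y_perm_pe - Y_pe[perm]
max_diff_pe = np.abs(diff).max()
assert not np.allclose(Y_perm_pe, Y_pe[perm], atol=1e-3)
print(np.round(diff, 3))
```

The key line is `X[perm] + pe`: the PE rows stay in position order while the tokens move, which is exactly what makes the two sides of the comparison differ.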

Block D — interpret

In README.md (1–2 paragraphs), answer:

Why does the without-PE test pass? State the permutation-equivariance theorem in your own words and reference the 3-token example.
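Why does the with-PE test pass, that is, why do the two outputs differ? The PE rows differ by position and are added after the permutation, so each token is paired with the PE row of its new position. X_perm + pe is therefore not a row permutation of X + pe, and each token carries a position-specific signature it did not have in the original ordering.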
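As a starting point for the README, the two answers can be sketched algebraically for a single unmasked head. The symbols \(P\), \(W_Q\), \(W_K\), \(W_V\), \(S\), \(d_k\), and \(E\) are notation introduced here for illustration, not taken from the theory notes. Let \(P\) be the permutation matrix for [2, 0, 1], so that X_perm is \(PX\), and write \(Q = XW_Q\), \(K = XW_K\), \(V = XW_V\), \(S = QK^\top / \sqrt{d_k}\), \(Y = \operatorname{softmax}(S)\,V\). Then:

\[
\begin{aligned}
Q' &= PXW_Q = PQ, \qquad K' = PK, \qquad V' = PV, \\
S' &= \frac{Q'K'^\top}{\sqrt{d_k}} = P\,S\,P^\top, \\
\operatorname{softmax}(S') &= P\,\operatorname{softmax}(S)\,P^\top \quad \text{(softmax is applied row by row)}, \\
Y' &= P\,\operatorname{softmax}(S)\,P^\top P\,V = P\,\operatorname{softmax}(S)\,V = PY,
\end{aligned}
\]

using \(P^\top P = I\). Any output projection acts on each row separately, so it preserves the result. With a PE matrix \(E\) the permuted input is \(PX + E\), whereas equivariance would need \(P(X + E) = PX + PE\). The two agree only if \(PE = E\), which fails when the rows of \(E\) differ across positions and \(P\) is not the identity.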

Block E — manifest

{
  "experiment": "16-permutation-equivariance",
  "date": "YYYY-MM-DD",
  "seed": 0,
  "versions": { "python": "3.11.x", "numpy": "X.Y.Z" },
  "config": {
    "d_model": 8,
    "n_heads": 1,
    "T": 3,
    "pe_scheme_compared": "sinusoidal"
  },
  "results_summary": {
    "without_PE_equivariance_max_diff": null,
    "with_PE_equivariance_max_diff": null
  }
}

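The without-PE diff (the maximum absolute difference between Y_perm and Y[[2, 0, 1]]) should be < 1e-6. The with-PE diff (the maximum absolute difference between Y_perm_pe and Y_pe[[2, 0, 1]]) should be > 1e-3.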
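The two null fields can be filled from the arrays computed in Blocks B and C. A minimal sketch, assuming the max absolute difference is the intended metric; `build_manifest` and `write_manifest` are hypothetical helper names that would live in demo.py, not in src/.

```python
import json
import platform
from datetime import date
from pathlib import Path

import numpy as np

def build_manifest(max_diff_no_pe, max_diff_pe):
    """Assemble the manifest dict; the two arguments are the measured max |diff| values."""
    return {
        "experiment": "16-permutation-equivariance",
        "date": date.today().isoformat(),  # YYYY-MM-DD
        "seed": 0,
        "versions": {"python": platform.python_version(), "numpy": np.__version__},
        "config": {"d_model": 8, "n_heads": 1, "T": 3, "pe_scheme_compared": "sinusoidal"},
        "results_summary": {
            # float() converts NumPy scalars so json can serialise them
            "without_PE_equivariance_max_diff": float(max_diff_no_pe),
            "with_PE_equivariance_max_diff": float(max_diff_pe),
        },
    }

def write_manifest(manifest, out_dir):
    """Write manifest.json into out_dir and return its path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2) + "\n")
    return path

# Usage inside demo.py (Y, Y_perm, Y_pe, Y_perm_pe come from Blocks B and C):
#   perm = [2, 0, 1]
#   m = build_manifest(np.abs(Y_perm - Y[perm]).max(),
#                      np.abs(Y_perm_pe - Y_pe[perm]).max())
#   write_manifest(m, "experiments/16-permutation-equivariance")
```

Recording the measured values, rather than just pass/fail, makes it easy to see later how far each check was from its threshold.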

Constraints

No new code in src/. Use existing MultiHeadAttention and sinusoidal_pe. This lab is a demonstration, not an implementation lab.
Seeded. Reproducible.

Stop conditions

Done when:

All four files committed.
Both assertions pass (one for equivariance without PE, one for non-equivariance with PE).
README.md explains the result.

Pitfalls

Permutation index confusion. X[[2, 0, 1]] means "take row 2, row 0, row 1 in that order". Confirm by printing X and X[[2, 0, 1]] to make sure you understand.
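Tolerance. Use atol=1e-6 for the without-PE check and atol=1e-3 for the with-PE check. The PE values are O(1), so the difference after attention should be well above 1e-3. Note that np.allclose also applies a relative tolerance (rtol, default 1e-5) on top of atol.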
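The indexing pitfall is easy to check on a small array whose rows are labelled by their own index. This is a plain NumPy illustration with made-up data, independent of the lab's embeddings:

```python
import numpy as np

# Row i is filled with the value i, so the row order is readable at a glance.
X = np.arange(3)[:, None] * np.ones((1, 4))
perm = [2, 0, 1]

X_perm = X[perm]                    # new row 0 is old row 2, new row 1 is old row 0, ...
first_col = X_perm[:, 0].tolist()   # [2.0, 0.0, 1.0]

# The inverse permutation undoes the reordering.
inv = np.argsort(perm)              # array([1, 2, 0])
restored = X_perm[inv]

print(first_col)
print(np.array_equal(restored, X))  # True
```

Note that [2, 0, 1] moves all three rows (a 3-cycle), so its inverse is [1, 2, 0] rather than [2, 0, 1] itself.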

When to consult `solutions/`

After all four files are committed. The solution is at solutions/00-permutation-equivariance-ref.md.

Next lab: 01-sinusoidal-pe.md.