
Theory 06 — Bilingual policy in the portal

🇪🇸 La política bilingüe del repo (§0.6 de CLAUDE.md) tiene cuatro capas en el portal: contenido (teoría), interfaz (chrome), entrada del usuario (notas, diario), y assessment (quizzes y exámenes). Cada capa elige independientemente: contenido es bilingüe siempre, chrome es localizable, entrada es libre, assessment puede ser monolingüe o pareado. La regla central: el inglés es el canónico y el español es el complemento; ninguna cadena del sistema obliga a leer en un idioma que no controlas.

The four layers

The portal's bilingual surface decomposes into four independent layers:

Theory content. What the curriculum teaches (docs/phase-NN-*/theory/*.md).
UI chrome. Labels, buttons, navigation, error messages.
User input. Notes, journal entries, exam responses.
Assessment content. Quiz prompts, exam questions, answer keys.

Each layer has its own policy. Routing two layers through the same mechanism, for example treating quiz prompts and UI chrome under one i18n pipeline, couples decisions that should stay independent: a chrome button labeled "Next" should not shift to "Siguiente" just because the quiz content is in Spanish, and vice versa.

Layer 1: Theory content

Per CLAUDE.md §0.6 (codifying addendum §A2), theory pages are bilingual-by-source:

The entire document is written in English (canonical).
A > 🇪🇸 blockquote summary of 1–3 sentences appears at the top, in Spanish.
Inline Spanish glosses may appear next to dense conceptual terms, sparingly.
Section headers, code identifiers, file names, commit messages remain English-only.

The portal renders these pages as-is, with no translation pipeline. The 🇪🇸 block is part of the prose, not a UI artifact. The locale toggle (theory 04) does not affect it: switching locale from English to Spanish neither hides the 🇪🇸 block nor surfaces a Spanish-only variant, because no Spanish-only theory variant exists.

This is a deliberate choice. Maintaining two parallel translations of the curriculum is the anti-goal of §A2: it doubles the curriculum's maintenance cost without doubling its pedagogical value. The Spanish summary at the top of each page covers roughly the highest-value 80% (orienting the Spanish-native reader to the dense parts) at roughly 5% of the cost.

Why the canonical is English

Three reasons, in priority order:

Reusability. Other learners (per addendum §A3) may not be Spanish-speaking. A curriculum written canonically in Spanish would force every non-Spanish-speaking fork to translate first; canonical English keeps the barrier minimal.
Technical vocabulary. ML and engineering vocabulary is canonically English. "Backpropagation," "gradient descent," "attention head" — Spanish translations exist but adoption is uneven and the cognitive load of a translated technical term is non-trivial.
Code identifiers. Source code identifiers must be ASCII English by repo policy. Theory that uses identifier names (e.g., softmax_with_temperature(logits)) reads more naturally when the surrounding prose matches.

The Spanish summary contract

The summary block at the top of each theory page has a tight contract:

Length: 1 to 3 sentences. Not a paragraph. Not a single fragment.
Voice: active, declarative, no hedging.
Content: the page's central insight, in everyday Spanish (tú form OK, vosotros not appropriate for Latin American readers).
Format: a markdown blockquote prefixed > 🇪🇸.

The contract is enforced by a docs-linting hook: each docs/phase-NN-*/theory/*.md file is checked for a ^> 🇪🇸 line within the first 5 non-blank lines. Missing summaries fail the docs build.
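The check described above can be sketched in a few lines. This is an illustration under stated assumptions, not the repo's actual hook: the function names `has_spanish_summary` and `find_missing_summaries` are hypothetical, and the glob assumes it is run against the `docs/` directory.

```python
import re
from pathlib import Path

# A blockquote line that opens with the Spanish-flag marker.
SUMMARY_RE = re.compile(r"^> 🇪🇸")


def has_spanish_summary(markdown_text: str, window: int = 5) -> bool:
    """True if a '> 🇪🇸' line appears within the first `window` non-blank lines."""
    non_blank = [line for line in markdown_text.splitlines() if line.strip()]
    return any(SUMMARY_RE.match(line) for line in non_blank[:window])


def find_missing_summaries(docs_root: Path) -> list[Path]:
    """Theory pages under docs_root that lack the summary block (hypothetical helper)."""
    pages = sorted(docs_root.glob("phase-*/theory/*.md"))
    return [p for p in pages if not has_spanish_summary(p.read_text(encoding="utf-8"))]
```

A docs build could call `find_missing_summaries(Path("docs"))` and fail when the returned list is non-empty.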

Layer 2: UI chrome

The portal's UI chrome is localized: English and Spanish strings live in src/miniportal/locale/{en,es}.yaml, and a Jinja2 {% trans %} tag resolves them against students.locale.

The localized surface is small: ~50 strings covering navigation labels (Curriculum, Journal, Notes, Quizzes, Progress, Admin, Logout), button labels (Save entry, Submit quiz, Reveal answer, Reset), table headers (username, display_name, last login), and error / success notices (Saved., Password updated., Login required.).

A locale fallback chain handles missing strings: if es.yaml lacks a key, the renderer falls back to en.yaml. Missing-in-both is a startup error, caught in CI.
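A minimal sketch of that fallback chain, assuming the two YAML files have already been parsed into plain dictionaries. The names `resolve` and `validate_catalogs`, the key names such as `nav.journal`, and the idea of a `required_keys` set are all hypothetical; the source does not say how the portal knows which keys are required.

```python
def resolve(key: str, locale: str, catalogs: dict[str, dict[str, str]]) -> str:
    """Look the key up in the active locale first, then fall back to English."""
    for lang in (locale, "en"):
        value = catalogs.get(lang, {}).get(key)
        if value is not None:
            return value
    raise KeyError(f"UI string {key!r} is missing from every catalog")


def validate_catalogs(required_keys: set[str], catalogs: dict[str, dict[str, str]]) -> None:
    """Startup check: fail if any required key is missing from both catalogs."""
    en, es = catalogs.get("en", {}), catalogs.get("es", {})
    missing = sorted(k for k in required_keys if k not in en and k not in es)
    if missing:
        raise RuntimeError(f"UI strings missing from both catalogs: {missing}")


# Stand-ins for the parsed en.yaml and es.yaml (assumed contents).
catalogs = {
    "en": {"nav.journal": "Journal", "btn.reset": "Reset"},
    "es": {"nav.journal": "Diario"},
}
validate_catalogs({"nav.journal", "btn.reset"}, catalogs)
```

With these catalogs, `resolve("nav.journal", "es", catalogs)` finds the Spanish string, while `resolve("btn.reset", "es", catalogs)` falls through to the English one.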

The locale toggle is a one-click UI affordance (top-nav 🇪🇸/🇬🇧 button) that:

Writes the new value to students.locale.
Sets a cookie lc=<new locale> for the rendering layer to read on the next request.
Refreshes the page.

The setting is per-student, not session-scoped. The same browser logged into two student accounts will respect each account's preferred locale independently.
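The toggle steps above can be sketched without committing to a web framework. This is an assumption-laden illustration: the `students` schema shown in the docstring, the function name `toggle_locale`, and the shape of the returned dictionary are hypothetical; only the `students.locale` column and the `lc` cookie name come from the text.

```python
import sqlite3

SUPPORTED_LOCALES = ("en", "es")


def toggle_locale(conn: sqlite3.Connection, student_id: int, new_locale: str) -> dict:
    """Persist the locale for one student and describe the response to send.

    Assumes a table like: students(id INTEGER PRIMARY KEY, username TEXT, locale TEXT).
    """
    if new_locale not in SUPPORTED_LOCALES:
        raise ValueError(f"unsupported locale: {new_locale!r}")
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE students SET locale = ? WHERE id = ?",
            (new_locale, student_id),
        )
    if cur.rowcount != 1:
        raise LookupError(f"no student with id {student_id}")
    # Step 2 (cookie) and step 3 (refresh) are left to the web layer.
    return {"set_cookie": f"lc={new_locale}", "refresh": True}
```

Because the write targets a single row by `id`, a second account in the same browser keeps its own stored locale.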

Why chrome localization, given canonical-English content

A reasonable question: if the theory content is English with a Spanish summary, why localize the buttons?

Answer: chrome localization is cheap (~50 strings) and removes a friction tax. A Spanish-speaking student who reads theory in English (with the Spanish summary) still benefits from buttons labeled Guardar and Siguiente. The cognitive cost of re-translating chrome on each click is non-zero and accumulates.

The asymmetry is deliberate: localizing 50 chrome strings has low maintenance cost; translating 40 phases of theory has high maintenance cost. The portal picks the cheap one.

Layer 3: User input

User-generated content — notes, journal entries, exam responses — is completely unconstrained. Borja can write notes in English, Spanish, mixed code-switched paragraphs, or any other language. The portal does not detect, validate, or transform input language.

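Storage: body_markdown columns are UTF-8 TEXT and use SQLite's default BINARY collation, which compares text bytewise and does not depend on the language of the content. Full-text indexing via FTS5 is deferred to a future phase; if it is added, FTS5's default unicode61 tokenizer classifies characters by Unicode letter and digit categories, so it handles mixed English/Spanish content without per-language configuration.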

The Phase 37 injection filter (theory 03) is applied uniformly regardless of input language. The filter operates on character classes and patterns, not on language tokens.

The journal file convention (learners/<name>/journal/YYYY-MM-DD.md) is similarly unconstrained. The filename is ASCII (the date), the body is anything Borja wants to write.

Why no input validation

Three reasons:

The user knows best. Forcing a language on personal notes is paternalistic and counterproductive. A Spanish-native learner may take notes in Spanish because that's how they think.
Code-switching is the norm. Borja routinely writes // Esto evita un buffer overflow — half Spanish, half English. Validation that rejected this would frustrate without benefit.
No downstream consumer cares. The notes table is read only by the learner who wrote it. There is no automated pipeline that needs a normalized language tag.

Layer 4: Assessment content

Quiz prompts and exam questions get special treatment: they can be authored bilingually.

The quizzes.body_yaml and exam_questions.prompt schemas support paired-language fields:

slug: past-simple-irregular-eat
phase_n: 32
prompt_en: |
  Conjugate "eat" in the past simple, 3rd person singular.
prompt_es: |
  Conjuga "eat" en pasado simple, 3ª persona del singular.
answer_key: ate
rubric:
  match: exact_ci
  normalize: trim

The rendering layer picks prompt_en or prompt_es based on the active locale, and falls back to prompt_en if prompt_es is missing.
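The selection rule is small enough to show directly. The sketch below assumes the quiz YAML has been parsed into a dictionary; the function name `pick_prompt` is hypothetical, and treating a blank `prompt_es` the same as a missing one is an added assumption.

```python
def pick_prompt(quiz: dict, locale: str) -> str:
    """Return the prompt for the active locale, falling back to prompt_en."""
    localized = quiz.get(f"prompt_{locale}")
    if localized and localized.strip():
        return localized
    return quiz["prompt_en"]  # prompt_en is the canonical field


quiz = {
    "slug": "past-simple-irregular-eat",
    "prompt_en": 'Conjugate "eat" in the past simple, 3rd person singular.\n',
    "prompt_es": 'Conjuga "eat" en pasado simple, 3ª persona del singular.\n',
    "answer_key": "ate",
}
english_only = {"prompt_en": quiz["prompt_en"], "answer_key": "ate"}
```

Here `pick_prompt(quiz, "es")` returns the Spanish prompt, while `pick_prompt(english_only, "es")` returns the English one without raising.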

The answer key is not translated. For the grammar tutor curriculum, the answer is by definition an English form (e.g., ate). Translating the prompt while keeping the answer key in the target language preserves the learning intent — "I want the English past simple of eat" — regardless of which language the prompt is read in.

For the Phase 30 structured-answer machinery, the rubric specification is language-agnostic (regex, exact-match, JSON-shape match); no translation is needed.

Exam responses

exam_responses.response_text is the learner's free-text answer. For grammar tutor questions, the answer is a single English word (ate, went, was). The matcher is a case-insensitive, trim-normalized comparison; nothing in it is language-specific.
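One plausible reading of the `match: exact_ci` and `normalize: trim` rubric fields is sketched below. The function name `matches_exact_ci` is hypothetical, and using `str.casefold` for the case-insensitive step is an assumption; the source does not say how the portal normalizes case.

```python
def matches_exact_ci(response_text: str, answer_key: str) -> bool:
    """Trim surrounding whitespace, then compare case-insensitively."""
    return response_text.strip().casefold() == answer_key.strip().casefold()
```

The comparison never inspects the locale, so the same response scores identically whether the prompt was shown in English or Spanish.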

For longer-form Phase 30 structured answers, the learner submits a JSON structure following the rubric. The JSON keys are canonical English (e.g., {"verb": "...", "tense": "...", "person": "..."}); the values may be in either language if the rubric allows. This is documented per-question in the rubric.
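A JSON-shape check of this kind might look like the following. It is only a sketch: the function name `has_expected_shape` is hypothetical, the required keys are taken from the example above, and the real Phase 30 rubric may be stricter about value types or contents.

```python
import json


def has_expected_shape(response_text: str, required_keys: set[str]) -> bool:
    """True if the response parses as a JSON object with exactly the required keys."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and set(payload) == required_keys


REQUIRED = {"verb", "tense", "person"}
spanish_values = '{"verb": "eat", "tense": "pasado simple", "person": "3ª singular"}'
spanish_keys = '{"verbo": "eat", "tiempo": "past simple", "persona": "3rd singular"}'
```

`spanish_values` passes because only the keys are checked against the canonical English names; `spanish_keys` fails because the keys themselves were translated.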

The bilingual quiz lab

The Phase 41 lab includes a smoke test for bilingual quiz rendering:

Set students.locale = 'en'. Render the quiz. Assert prompt matches prompt_en.
Set students.locale = 'es'. Re-render. Assert prompt matches prompt_es.
Submit the same answer in both locales. Assert both score equally.

The lab also catches the "missing prompt_es" case: a quiz with only prompt_en defined renders correctly in either locale (fallback to English), with no error.

What this policy does NOT do

No machine translation. The portal never calls an MT service. Translations are author-supplied and version-controlled.
No locale autodetection. No Accept-Language header parsing. The default is English; switching is explicit.
No RTL support. Hebrew, Arabic, etc. are out of scope. Adding RTL involves CSS, layout, and tokenization work the curriculum does not budget for.
No content language detection. A note's language is not tagged. Search treats all input as a single bag-of-tokens.
No translation memory. The Phase 41 quizzes' EN/ES pairs are hand-authored each time; no shared glossary across quizzes (the corpus is too small).

Forward path

Two natural extensions are documented but deferred:

A third language. If a future learner is Portuguese-speaking, the same paired-field machinery (prompt_pt) extends without schema changes. The locale enum and the locale toggle UI would need updates; both are bounded changes.
A language-tagged note model. A future learner might want to filter "show me only my Spanish notes." A notes.language column (nullable TEXT, learner-set) would suffice; not in MVP.

The bilingual workflow for Claude Code itself

A note about Claude's own outputs in this repo: per §0.6, Claude responds to Borja in English with optional Spanish summaries for dense conceptual material. The portal does not change this contract — Claude's behavior is governed by CLAUDE.md and the addendum, not by the portal's locale setting. The portal localizes UI; it does not localize the underlying authoring workflow.

The pre-write pipeline that produced this very file is bilingual-by-source: the file's content is English, the Spanish summary at top is Spanish. The portal serves this file unchanged regardless of who reads it.

One-paragraph recap

The portal's bilingual policy decomposes into four layers, each with its own rule: theory is canonical English with a Spanish summary, UI chrome is locale-toggled between English and Spanish, user input is free-form and unconstrained, and assessment content can be authored bilingually with shared answer keys. The locale toggle affects only chrome and assessment prompts; the theory remains as written. The design treats English as the load-bearing canonical and Spanish as a complement, never the inverse, because the curriculum is meant to be forkable by non-Spanish-speaking learners while still serving Borja's learning style.

End of theory pre-write for Phase 41. Labs and BLUEPRINTs are covered by parallel agents.