Lab 04: Quizzes, exams, and the SM-2 review loop
The portal learns to ask questions. Quizzes are defined in YAML per phase, with answer types scoped to the §A13 domain (verb form, MCQ, short code, short essay). Every mistake generates a review card, and the SM-2 algorithm decides when it comes back. What you get wrong is not forgotten.
Goal
Implement the quiz/exam pipeline: a YAML loader for data/quizzes/phase-NN-name.yaml, a grader for the four answer types (mcq, short, code, conjugation-form), a quiz_attempts persistence layer, automatic review_card creation on every wrong answer, and the daily "Today's reviews" page driven by SM-2 ease/interval updates. Exams are longer-form rubric-graded essays — the rubric is run by the model in LYNX_TEST_MODE=1 and by the teacher in production.
Why this lab exists
The portal's pedagogical loop is not "take a quiz, see your score, forget it." It is "take a quiz, mark the failures, surface them again tomorrow." Without the SM-2 surface, the quiz is decorative. With it, a wrong answer becomes a contract the system will honor on the learner's behalf until they get it right twice in a row.
The §A13 scope (20 verbs × 5 tenses × 3 persons) is small enough that the entire reviewable universe is bounded. SM-2 here is overkill in capacity but right in shape — Borja will use it through Phase 41 and reuse it for any later phase that adopts the same pattern.
Prerequisites
- Labs 00–03 done.
- Phase 20 evaluation harness exists and exposes `grade_conjugation(prompt, given) -> GradeResult`.
- A13 verb data exists in `data/verbs/` from Phase 12.
- `markdown-it-py` plus the sanitizer from lab 03.
Deliverables
- `data/quizzes/phase-00-onboarding.yaml` (example).
- `src/miniportal/quizzes/__init__.py`: loader, models.
- `src/miniportal/quizzes/grader.py`: dispatch by `answer_type`.
- `src/miniportal/quizzes/sm2.py`: SM-2 ease/interval update.
- Migrations: `quiz_attempts`, `review_cards`, `exam_attempts`.
- `src/miniportal/routes/quizzes.py`: `GET /quizzes`, `GET /quizzes/{phase}/{slug}`, `POST /quizzes/{...}/submit`.
- `src/miniportal/routes/review.py`: `GET /review` (today's due cards), `POST /review/{card_id}/grade` (SM-2 feedback button).
- `src/miniportal/routes/exams.py`: exam routes (rubric-graded).
- Templates: `quiz_view.html.jinja`, `quiz_result.html.jinja`, `review_today.html.jinja`, `exam_view.html.jinja`.
- `tests/portal/test_quiz_grading.py`.
- `tests/portal/test_review_card_creation.py`.
- `tests/portal/test_sm2_update.py`.
Step 1: Quiz YAML format
data/quizzes/phase-00-onboarding.yaml:
```yaml
phase: 0
slug: onboarding
title_en: "Phase 0 onboarding check"
title_es: "Comprobación de incorporación a la fase 0"
items:
  - id: q1
    answer_type: mcq
    prompt_en: "Which command syncs the locked dependencies?"
    prompt_es: "¿Qué comando sincroniza las dependencias bloqueadas?"
    choices: ["pip install -r requirements.txt", "uv sync --frozen", "uv add fastapi", "poetry lock"]
    correct: 1
    rubric: "uv is mandatory per CLAUDE.md §2."
    tags: [tooling, uv]
  - id: q2
    answer_type: conjugation-form
    prompt_en: "Past simple of 'eat', 3rd singular?"
    prompt_es: "Pasado simple de 'eat', 3ª persona del singular."
    correct: "ate"
    rubric: "Irregular verb; same form for all persons."
    tags: [a13, irregular, past-simple]
  - id: q3
    answer_type: short
    prompt_en: "Name the four answer types supported by the grader."
    correct: ["mcq", "short", "code", "conjugation-form"]
    rubric: "Order independent; case insensitive."
    tags: [meta]
```
Constraint: prompt_en is required; prompt_es is optional but strongly encouraged (A2 bilingual policy). Every item must carry at least one tag — the review queue uses tags to balance daily load.
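A minimal sketch of these per-item checks, assuming the YAML has already been parsed into plain dicts (for example by PyYAML's `yaml.safe_load`). The function name `item_errors`, the message wording, and the extra MCQ range check are illustrative assumptions, not part of the lab's specified API.

```python
from collections.abc import Mapping
from typing import Any

ANSWER_TYPES = {"mcq", "short", "code", "conjugation-form"}


def item_errors(raw: Mapping[str, Any]) -> list[str]:
    """Return the problems found in one raw quiz item (empty list means valid)."""
    item_id = raw.get("id", "<missing id>")
    errors: list[str] = []
    if raw.get("answer_type") not in ANSWER_TYPES:
        errors.append(f"{item_id}: unknown answer_type {raw.get('answer_type')!r}")
    prompt_en = raw.get("prompt_en")
    if not isinstance(prompt_en, str) or not prompt_en.strip():
        errors.append(f"{item_id}: prompt_en is required")
    tags = raw.get("tags")
    if not isinstance(tags, list) or not tags:
        errors.append(f"{item_id}: at least one tag is required")
    if raw.get("answer_type") == "mcq":
        choices = raw.get("choices")
        correct = raw.get("correct")
        # For MCQ, `correct` is an index, so it must point inside `choices`.
        if not isinstance(choices, list) or not isinstance(correct, int) or not 0 <= correct < len(choices):
            errors.append(f"{item_id}: mcq correct must index into choices")
    return errors


good = {"id": "q1", "answer_type": "mcq", "prompt_en": "Which command?",
        "choices": ["a", "b"], "correct": 1, "tags": ["uv"]}
print(item_errors(good))  # []
print(len(item_errors({"id": "q9", "answer_type": "essay", "tags": []})))  # 3
```

Collecting every problem instead of raising on the first one means a single load reports all broken items in a file. `prompt_es` is deliberately not checked, since it is optional.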
Step 2: Loader and models
```python
# src/miniportal/quizzes/__init__.py
from dataclasses import dataclass
from pathlib import Path

import yaml  # PyYAML


@dataclass(frozen=True)
class QuizItem:
    id: str
    answer_type: str  # "mcq" | "short" | "code" | "conjugation-form"
    prompt_en: str
    prompt_es: str | None
    correct: object  # int (mcq) | str | list[str]
    rubric: str
    tags: tuple[str, ...]
    choices: tuple[str, ...] = ()  # mcq only; the quiz view needs these to render the options


@dataclass(frozen=True)
class Quiz:
    phase: int
    slug: str
    title_en: str
    title_es: str | None
    items: tuple[QuizItem, ...]


def load_quiz(path: Path) -> Quiz:
    raise NotImplementedError("Lab 04 step 2: parse YAML, validate against schema, return frozen Quiz.")
```
Schema validation happens at load time, not at grading time. A malformed YAML file must fail loudly at app start, not on the first quiz attempt.
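To make "fail loudly at app start" concrete, here is a sketch of a startup pass that parses every quiz file once and refuses to continue if any of them is broken. It assumes the parser is injected (in the app it would be `load_quiz`) and that quiz files match `phase-*.yaml`; the name `load_all_or_die` is hypothetical. The usage below swaps in a toy parser so the example runs without PyYAML.

```python
import tempfile
from collections.abc import Callable
from pathlib import Path
from typing import TypeVar

T = TypeVar("T")


def load_all_or_die(directory: Path, parse: Callable[[Path], T]) -> dict[str, T]:
    """Parse every quiz file once; raise a single error listing every bad file."""
    loaded: dict[str, T] = {}
    failures: list[str] = []
    for path in sorted(directory.glob("phase-*.yaml")):
        try:
            loaded[path.stem] = parse(path)
        except Exception as exc:  # keep going so one restart reports all broken files
            failures.append(f"{path.name}: {exc}")
    if failures:
        raise RuntimeError("malformed quiz files:\n" + "\n".join(failures))
    return loaded


def toy_parse(path: Path) -> str:
    # Stand-in for load_quiz: rejects any file without a slug.
    text = path.read_text(encoding="utf-8")
    if "slug:" not in text:
        raise ValueError("missing slug")
    return text


with tempfile.TemporaryDirectory() as tmp:
    quiz_dir = Path(tmp)
    (quiz_dir / "phase-00-onboarding.yaml").write_text("slug: onboarding\n", encoding="utf-8")
    print(sorted(load_all_or_die(quiz_dir, toy_parse)))  # ['phase-00-onboarding']
```

Calling this from the app's startup hook turns a typo in any quiz file into a failed boot with every offending filename listed, rather than a 500 on whichever quiz a learner opens first.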
Step 3: The grader
```python
# src/miniportal/quizzes/grader.py
from dataclasses import dataclass

from miniportal.quizzes import QuizItem


@dataclass(frozen=True)
class GradeResult:
    correct: bool
    given: object
    expected: object
    feedback: str  # short, learner-facing


def grade(item: QuizItem, given: object) -> GradeResult:
    """Grade one answer.

    Case rules: mcq compares indices; short and conjugation-form are
    case-insensitive after stripping whitespace; code compares stdout
    case-sensitively.
    """
    if item.answer_type == "mcq":
        raise NotImplementedError("Lab 04 step 3a: given is an int index; compare to item.correct.")
    if item.answer_type == "short":
        raise NotImplementedError(
            "Lab 04 step 3b: given is str; item.correct is list[str]; case-insensitive set membership."
        )
    if item.answer_type == "conjugation-form":
        raise NotImplementedError(
            "Lab 04 step 3c: delegate to Phase 20's grade_conjugation; case-insensitive, strip whitespace."
        )
    if item.answer_type == "code":
        raise NotImplementedError(
            "Lab 04 step 3d: execute given snippet in a sandboxed subprocess (Phase 31 sandbox); "
            "compare stdout to item.correct. Time limit 2 s, memory limit 64 MB."
        )
    raise ValueError(f"unknown answer_type: {item.answer_type}")
```
The `code` branch is the only one that touches a subprocess. It reuses Phase 31's sandbox helper; do not roll a new sandbox here.
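The three non-sandboxed branches reduce to small comparison rules. The sketch below follows the step messages above; the helper names are hypothetical. `grade_form` only illustrates the normalization the `conjugation-form` branch is expected to apply, because the real branch delegates to Phase 20's `grade_conjugation`. `code` is left out since it goes through the Phase 31 sandbox and compares stdout case-sensitively.

```python
from collections.abc import Iterable


def normalize(text: str) -> str:
    """Strip surrounding whitespace and casefold for caseless comparison."""
    return text.strip().casefold()


def grade_mcq(correct_index: int, given: object) -> bool:
    # HTML forms submit strings, so accept "1" as well as 1; booleans are not indices.
    if isinstance(given, str) and given.strip().isdecimal():
        given = int(given)
    return isinstance(given, int) and not isinstance(given, bool) and given == correct_index


def grade_short(accepted: Iterable[str], given: str) -> bool:
    # Case-insensitive set membership, per step 3b.
    return normalize(given) in {normalize(a) for a in accepted}


def grade_form(expected: str, given: str) -> bool:
    # Case-insensitive, whitespace-stripped equality, per step 3c.
    return normalize(given) == normalize(expected)


print(grade_mcq(1, "1"))                        # True
print(grade_short(["mcq", "short"], "  MCQ "))  # True
print(grade_form("ate", "Ate"))                 # True
```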
Step 4: quiz_attempts and review_cards
```python
# Migration sketch (Lab 04)
# quiz_attempts:
#   id PK, student_id, phase, slug, started_at, submitted_at,
#   score_correct INT, score_total INT, payload_json TEXT (item id -> given)
# review_cards:
#   id PK, student_id, quiz_phase, quiz_slug, item_id,
#   ease_factor REAL DEFAULT 2.5,
#   interval_days INT DEFAULT 0,
#   repetitions INT DEFAULT 0,
#   due_on DATE,
#   created_at, last_reviewed_at
#   UNIQUE(student_id, quiz_phase, quiz_slug, item_id)
# exam_attempts:
#   id PK, student_id, exam_id, body_md, rubric_grade JSON,
#   graded_by ('model' | 'teacher'), status ('pending' until graded), graded_at
```
On every wrong item in a quiz submission:
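- `INSERT OR IGNORE` into `review_cards` with the default SM-2 state and `due_on` set to tomorrow, so the card surfaces in the next day's review (a card with no `due_on` never matches `due_on <= today`).
- If the card is already present, leave it: SM-2 owns the schedule from here on.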
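A runnable sketch of the submission write path using the standard-library `sqlite3` module. Assumptions: the schema is a trimmed subset of the migration sketch, `due_on` is stored as ISO-8601 text and set to tomorrow, and `record_submission` is a hypothetical name. The `with conn:` block puts the attempt row and the review cards in one transaction, so a failure halfway through leaves neither.

```python
import json
import sqlite3
from datetime import date, datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE quiz_attempts (
    id INTEGER PRIMARY KEY, student_id INTEGER, phase INTEGER, slug TEXT,
    submitted_at TEXT, score_correct INTEGER, score_total INTEGER, payload_json TEXT
);
CREATE TABLE review_cards (
    id INTEGER PRIMARY KEY, student_id INTEGER, quiz_phase INTEGER, quiz_slug TEXT,
    item_id TEXT, ease_factor REAL DEFAULT 2.5, interval_days INTEGER DEFAULT 0,
    repetitions INTEGER DEFAULT 0, due_on TEXT NOT NULL,
    UNIQUE (student_id, quiz_phase, quiz_slug, item_id)
);
"""


def record_submission(
    conn: sqlite3.Connection,
    student_id: int,
    phase: int,
    slug: str,
    given: dict[str, object],
    wrong_ids: list[str],
    today: date,
) -> None:
    due = (today + timedelta(days=1)).isoformat()  # resurface tomorrow
    with conn:  # commits both writes together, or rolls both back on error
        conn.execute(
            "INSERT INTO quiz_attempts (student_id, phase, slug, submitted_at,"
            " score_correct, score_total, payload_json) VALUES (?, ?, ?, ?, ?, ?, ?)",
            (student_id, phase, slug, datetime.now(timezone.utc).isoformat(),
             len(given) - len(wrong_ids), len(given), json.dumps(given)),
        )
        # The UNIQUE constraint makes repeat failures a no-op: SM-2 keeps its schedule.
        conn.executemany(
            "INSERT OR IGNORE INTO review_cards (student_id, quiz_phase, quiz_slug, item_id, due_on)"
            " VALUES (?, ?, ?, ?, ?)",
            [(student_id, phase, slug, item_id, due) for item_id in wrong_ids],
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for _ in range(2):  # the same wrong answer submitted twice
    record_submission(conn, 7, 0, "onboarding", {"q1": 0, "q2": "ate"}, ["q1"], date(2024, 1, 1))
print(conn.execute("SELECT COUNT(*) FROM quiz_attempts").fetchone())    # (2,)
print(conn.execute("SELECT item_id, due_on FROM review_cards").fetchall())  # [('q1', '2024-01-02')]
```

Both attempts are kept for the learner's history, but only one card exists per (student, quiz, item).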
Step 5: SM-2 algorithm
```python
# src/miniportal/quizzes/sm2.py
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewState:
    ease_factor: float
    interval_days: int
    repetitions: int


def update(state: ReviewState, quality: int) -> ReviewState:
    """Standard SM-2 (Wozniak 1990).

    quality ∈ {0..5} (0 = forgot completely, 5 = perfect recall).

        if quality < 3:
            repetitions = 0
            interval_days = 1
        else:
            repetitions += 1
            if repetitions == 1:
                interval_days = 1
            elif repetitions == 2:
                interval_days = 6
            else:
                interval_days = round(interval_days * ease_factor)
        ease_factor = max(1.3, ease_factor + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))

    Returns a new ReviewState; pure function.
    """
    raise NotImplementedError("Lab 04 step 5: implement the formula above; clamp ease_factor at 1.3.")
```
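The portal's grade buttons are mapped as Again=0, Hard=2, Good=4, Easy=5. Quality 1 and 3 are deliberately unreachable to keep the UI to four buttons. Because both Again and Hard are below 3, either one resets `repetitions` to 0 and sends the card back to a 1-day interval.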
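One way to fill in the stub, transcribing the docstring's formula directly. The interval is computed with the ease factor from before this review, matching the docstring's order of operations. `BUTTON_QUALITY` and the range check on `quality` are illustrative additions, not part of the lab's specified module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewState:
    ease_factor: float
    interval_days: int
    repetitions: int


# Hypothetical mapping from the four UI buttons to SM-2 quality scores.
BUTTON_QUALITY = {"again": 0, "hard": 2, "good": 4, "easy": 5}


def update(state: ReviewState, quality: int) -> ReviewState:
    if quality not in range(6):
        raise ValueError(f"quality must be in 0..5, got {quality}")
    if quality < 3:
        # Failed recall: restart the repetition sequence.
        repetitions, interval = 0, 1
    else:
        repetitions = state.repetitions + 1
        if repetitions == 1:
            interval = 1
        elif repetitions == 2:
            interval = 6
        else:
            # Uses the pre-update ease factor.
            interval = round(state.interval_days * state.ease_factor)
    penalty = 5 - quality
    ease = max(1.3, state.ease_factor + 0.1 - penalty * (0.08 + penalty * 0.02))
    return ReviewState(ease, interval, repetitions)


s = ReviewState(2.5, 0, 0)
s = update(s, BUTTON_QUALITY["good"])   # interval 1, repetitions 1, ease stays ≈ 2.5
s = update(s, BUTTON_QUALITY["good"])   # interval 6, repetitions 2
s = update(s, BUTTON_QUALITY["again"])  # interval 1, repetitions 0, ease ≈ 1.7
print(s.interval_days, s.repetitions)   # 1 0
```

A second Again from ease ≈ 1.7 would compute ≈ 0.9, which the `max` clamps to 1.3: that is the floor the Step 8 tests check.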
Step 6: Routes
```python
# src/miniportal/routes/quizzes.py
from fastapi import APIRouter, Depends

# current_student is the auth dependency from the earlier labs (import not shown).
router = APIRouter(prefix="/quizzes")


@router.get("/{phase}/{slug}")
async def show_quiz(phase: int, slug: str, student=Depends(current_student)):
    raise NotImplementedError("Lab 04 step 6: load YAML, render quiz_view with prompts (locale-aware).")


@router.post("/{phase}/{slug}/submit")
async def submit_quiz(phase: int, slug: str, ...):
    """Side effects:
    1. INSERT quiz_attempt row.
    2. For each wrong item, INSERT OR IGNORE review_card.
    3. Update progress.status to 'in_progress' if not done.
    4. Render quiz_result.html with per-item feedback.
    """
    raise NotImplementedError("Lab 04 step 6: implement the four side effects in a single transaction.")


# src/miniportal/routes/review.py
from fastapi import APIRouter, Depends, Form

router = APIRouter(prefix="/review")


@router.get("")
async def review_today(student=Depends(current_student)):
    """Cards with due_on <= today, ordered by oldest due first.

    Cap at 30 cards/day (configurable) to avoid burnout.
    """
    raise NotImplementedError("Lab 04 step 6: query due cards, render review_today.html.")


@router.post("/{card_id}/grade")
async def grade_review(card_id: int, quality: int = Form(...), student=Depends(current_student)):
    """quality ∈ {0,2,4,5} from the UI buttons. Computes SM-2 update, persists, returns next card."""
    raise NotImplementedError("Lab 04 step 6: owner check, validate quality, sm2.update, persist, return JSON.")
```
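The data-access half of the two review routes can be sketched with `sqlite3`, assuming `due_on` is stored as ISO-8601 text (which sorts and compares correctly as a string). `due_cards`, `owns_card`, and `check_quality` are hypothetical helpers. Filtering on `student_id` in every query is what implements the owner check: a card belonging to someone else simply does not exist for the caller.

```python
import sqlite3
from datetime import date

ALLOWED_QUALITIES = frozenset({0, 2, 4, 5})  # Again, Hard, Good, Easy


def check_quality(quality: int) -> int:
    """Reject anything the four UI buttons cannot send."""
    if quality not in ALLOWED_QUALITIES:
        raise ValueError(f"quality must be one of {sorted(ALLOWED_QUALITIES)}, got {quality}")
    return quality


def due_cards(conn: sqlite3.Connection, student_id: int, today: date, limit: int = 30) -> list[tuple[int, str, str]]:
    # Oldest due first; the LIMIT is the daily cap.
    return conn.execute(
        "SELECT id, item_id, due_on FROM review_cards"
        " WHERE student_id = ? AND due_on <= ?"
        " ORDER BY due_on ASC, id ASC LIMIT ?",
        (student_id, today.isoformat(), limit),
    ).fetchall()


def owns_card(conn: sqlite3.Connection, card_id: int, student_id: int) -> bool:
    row = conn.execute(
        "SELECT 1 FROM review_cards WHERE id = ? AND student_id = ?", (card_id, student_id)
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE review_cards (id INTEGER PRIMARY KEY, student_id INTEGER, item_id TEXT, due_on TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO review_cards (student_id, item_id, due_on) VALUES (?, ?, ?)",
    [
        (7, "q2", "2024-01-03"),
        (7, "q1", "2024-01-01"),
        (7, "q3", "2024-02-01"),  # not due yet
        (8, "q1", "2024-01-01"),  # another student's card
    ],
)
print(due_cards(conn, 7, date(2024, 1, 5)))  # [(2, 'q1', '2024-01-01'), (1, 'q2', '2024-01-03')]
print(owns_card(conn, 4, 7))                 # False
```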
Step 7: Exams
```python
# src/miniportal/routes/exams.py
from fastapi import APIRouter, Depends, Form

# current_student is the auth dependency from the earlier labs (import not shown).
router = APIRouter()  # the mount prefix is not specified in this lab


@router.post("/{exam_id}/submit")
async def submit_exam(exam_id: int, body_md: str = Form(...), student=Depends(current_student)):
    """Rubric grading:
    - LYNX_TEST_MODE=1: invoke the model with the rubric as a prompt, store JSON grade.
    - Production: graded_by='teacher', status='pending'; teacher fills the rubric in lab 05's view.
    """
    raise NotImplementedError("Lab 04 step 7: branch on test mode; persist exam_attempts row.")
```
The model-graded path is restricted to LYNX_TEST_MODE=1 to keep production grading human-anchored. The rubric prompt template lives at data/exams/rubric-template.md.
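A sketch of the test-mode gate as a pure function, so the branch can be unit-tested without touching the environment. `GradingPlan` and `plan_exam_grading` are hypothetical names. The lab names only `'pending'` as a status, so the `"graded"` label for the model path is an assumption.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class GradingPlan:
    graded_by: str    # "model" | "teacher", matching exam_attempts.graded_by
    status: str       # "pending" waits for the teacher; "graded" is an assumed label
    call_model: bool


def plan_exam_grading(env: Mapping[str, str] = os.environ) -> GradingPlan:
    # Fail closed: only the exact value "1" enables model grading.
    if env.get("LYNX_TEST_MODE") == "1":
        return GradingPlan(graded_by="model", status="graded", call_model=True)
    return GradingPlan(graded_by="teacher", status="pending", call_model=False)


print(plan_exam_grading({"LYNX_TEST_MODE": "1"}).graded_by)     # model
print(plan_exam_grading({"LYNX_TEST_MODE": "true"}).graded_by)  # teacher
```

Matching only the literal `"1"` means values such as `"true"`, `"yes"`, or an unset variable all fall back to human grading, which keeps production on the safe side.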
Step 8: Tests
```python
# tests/portal/test_quiz_grading.py
def test_mcq_correct(): raise NotImplementedError("MCQ index match.")
def test_mcq_wrong(): raise NotImplementedError("MCQ index mismatch.")
def test_short_case_insensitive(): raise NotImplementedError("'ate' == 'Ate'.")
def test_conjugation_form_delegates(): raise NotImplementedError("Mock Phase 20 grader; assert it's called.")
def test_code_sandbox_used(): raise NotImplementedError("Mock Phase 31 sandbox; assert the snippet is dispatched through it.")

# tests/portal/test_review_card_creation.py
def test_wrong_answer_creates_card(): raise NotImplementedError("Submit a quiz with one wrong item; assert one review_card row.")
def test_idempotent_resubmission(): raise NotImplementedError("Submit twice; only one review_card per (student, quiz, item).")
def test_correct_answer_no_card(): raise NotImplementedError("Submit all-correct; zero review_cards inserted.")

# tests/portal/test_sm2_update.py
def test_first_pass_interval_1():
    raise NotImplementedError("State(2.5, 0, 0) + quality 4 -> (≈2.5, 1, 1).")
def test_second_pass_interval_6():
    raise NotImplementedError("State(2.5, 1, 1) + quality 4 -> (≈2.5, 6, 2).")
def test_failure_resets_repetitions():
    raise NotImplementedError("State(2.5, 6, 2) + quality 0 -> (≈1.7, 1, 0).")
def test_ease_floor_1_3():
    raise NotImplementedError("Repeated quality 0 inputs: ease_factor never goes below 1.3.")
```
What "done" looks like
- At least one YAML quiz committed (`phase-00-onboarding.yaml`).
- Loader rejects malformed YAML at app start.
- All four answer types grade correctly; `code` runs through the Phase 31 sandbox.
- Wrong answers create review cards; correct answers do not.
- `/review` lists due cards, capped at 30/day.
- SM-2 formula matches the reference implementation; ease floor 1.3 enforced.
- Exam route persists; model grading gated behind `LYNX_TEST_MODE=1`.
- Templates render bilingual prompts when `student.locale != 'en'` and `prompt_es` exists.
- `mypy --strict` and `bandit` clean.
Common pitfalls
- Rolling your own sandbox for `code`. Phase 31 already did the work, and reusing it keeps a single hardening surface.
- Loading YAML at request time. That is a 50 ms parse on every quiz hit. Cache parsed quizzes at app start; reload only on file mtime change (or never reload in production).
- Letting case sensitivity vary silently by answer type. `conjugation-form` is case-insensitive; `code` stdout is case-sensitive. Document this explicitly in the grader's docstring.
- Storing `payload_json` with raw user input. Sanitize it, or store it and never render it. The `code` snippets are particularly tempting to dump into a debug page.
- A button for every SM-2 quality level. Use four buttons (0/2/4/5). Quality 1 and 3 add cognitive load: 1 resets the card just like 0 and 2, and 3 advances it just like 4, so they differ only in the size of the ease adjustment.
- Recreating a review card on every wrong answer. Use `INSERT OR IGNORE`. SM-2 owns the schedule once the card exists.
- Ungated model-graded exams in production. Costs spiral and the teacher loses oversight. `LYNX_TEST_MODE=1` is the gate.
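For the YAML-caching pitfall, a small cache keyed on each file's modification time is one option. This is a sketch under assumptions: `MtimeCache` is a hypothetical name, `parse` would be the lab's `load_quiz`, and `reload=False` models the "never reload in production" choice by serving the first parse forever.

```python
from collections.abc import Callable
from pathlib import Path
from typing import Generic, TypeVar

T = TypeVar("T")


class MtimeCache(Generic[T]):
    """Parse each file once; re-parse only when its mtime changes."""

    def __init__(self, parse: Callable[[Path], T], reload: bool = True) -> None:
        self._parse = parse
        self._reload = reload  # False: never re-read after the first parse
        self._entries: dict[Path, tuple[int, T]] = {}

    def get(self, path: Path) -> T:
        cached = self._entries.get(path)
        if cached is not None and not self._reload:
            return cached[1]
        mtime = path.stat().st_mtime_ns  # one stat() call instead of a full parse
        if cached is not None and cached[0] == mtime:
            return cached[1]
        value = self._parse(path)
        self._entries[path] = (mtime, value)
        return value


# Intended use (load_quiz is the Step 2 loader):
#     quizzes = MtimeCache(load_quiz, reload=False)
#     quiz = quizzes.get(Path("data/quizzes/phase-00-onboarding.yaml"))
```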
Next: lab/05-admin-teacher-view.md — the admin dashboard.