Lab 04 — Quizzes, exams, and the SM-2 review loop

The portal learns to ask questions. Quizzes are defined in YAML per phase, with answer types limited to the §A13 domain (verb form, MCQ, short code, short essay). Every mistake generates a review card; the SM-2 algorithm decides when it reappears. What you get wrong is not forgotten.

Goal

Implement the quiz/exam pipeline: a YAML loader for data/quizzes/phase-NN-name.yaml, a grader for the four answer types (mcq, short, code, conjugation-form), a quiz_attempts persistence layer, automatic review_card creation on every wrong answer, and the daily "Today's reviews" page driven by SM-2 ease/interval updates. Exams are longer-form rubric-graded essays — the rubric is run by the model in LYNX_TEST_MODE=1 and by the teacher in production.

Why this lab exists

The portal's pedagogical loop is not "take a quiz, see your score, forget it." It is "take a quiz, mark the failures, surface them again tomorrow." Without the SM-2 surface, the quiz is decorative. With it, a wrong answer becomes a contract the system will honor on the learner's behalf until they get it right twice in a row.

The §A13 scope (20 verbs × 5 tenses × 3 persons) is small enough that the entire reviewable universe is bounded. SM-2 here is overkill in capacity but right in shape — Borja will use it through Phase 41 and reuse it for any later phase that adopts the same pattern.

Prerequisites

  • Labs 00–03 done.
  • Phase 20 evaluation harness exists and exposes grade_conjugation(prompt, given) -> GradeResult.
  • A13 verb data exists in data/verbs/ from Phase 12.
  • markdown-it-py + sanitizer from lab 03.

Deliverables

  • data/quizzes/phase-00-onboarding.yaml (example).
  • src/miniportal/quizzes/__init__.py — loader, models.
  • src/miniportal/quizzes/grader.py — dispatch by answer_type.
  • src/miniportal/quizzes/sm2.py — SM-2 ease/interval update.
  • Migrations: quiz_attempts, review_cards, exam_attempts.
  • src/miniportal/routes/quizzes.py — GET /quizzes, GET /quizzes/{phase}/{slug}, POST /quizzes/{...}/submit.
  • src/miniportal/routes/review.py — GET /review (today's due cards), POST /review/{card_id}/grade (SM-2 feedback button).
  • src/miniportal/routes/exams.py — exam routes (rubric-graded).
  • Templates: quiz_view.html.jinja, quiz_result.html.jinja, review_today.html.jinja, exam_view.html.jinja.
  • tests/portal/test_quiz_grading.py.
  • tests/portal/test_review_card_creation.py.
  • tests/portal/test_sm2_update.py.

Step 1 — Quiz YAML format

data/quizzes/phase-00-onboarding.yaml:

phase: 0
slug: onboarding
title_en: "Phase 0 onboarding check"
title_es: "Comprobación de incorporación a la fase 0"
items:
  - id: q1
    answer_type: mcq
    prompt_en: "Which command syncs the locked dependencies?"
    prompt_es: "¿Qué comando sincroniza las dependencias bloqueadas?"
    choices: ["pip install -r requirements.txt", "uv sync --frozen", "uv add fastapi", "poetry lock"]
    correct: 1
    rubric: "uv is mandatory per CLAUDE.md §2."
    tags: [tooling, uv]
  - id: q2
    answer_type: conjugation-form
    prompt_en: "Past simple of 'eat', 3rd singular?"
    prompt_es: "Pasado simple de 'eat', 3.ª persona del singular."
    correct: "ate"
    rubric: "Irregular verb; same form for all persons."
    tags: [a13, irregular, past-simple]
  - id: q3
    answer_type: short
    prompt_en: "Name the four answer types supported by the grader."
    correct: ["mcq", "short", "code", "conjugation-form"]
    rubric: "Order independent; case insensitive."
    tags: [meta]

Constraint: prompt_en is required; prompt_es is optional but strongly encouraged (A2 bilingual policy). Every item must carry at least one tag — the review queue uses tags to balance daily load.

Step 2 — Loader + models

# src/miniportal/quizzes/__init__.py
from dataclasses import dataclass
from pathlib import Path
import yaml


@dataclass(frozen=True)
class QuizItem:
    id: str
    answer_type: str   # "mcq" | "short" | "code" | "conjugation-form"
    prompt_en: str
    prompt_es: str | None
    correct: object    # int (mcq) | str | list[str]
    rubric: str
    tags: tuple[str, ...]


@dataclass(frozen=True)
class Quiz:
    phase: int
    slug: str
    title_en: str
    title_es: str | None
    items: tuple[QuizItem, ...]


def load_quiz(path: Path) -> Quiz:
    raise NotImplementedError("Lab 04 step 2 — parse YAML, validate against schema, return frozen Quiz.")

Schema validation happens at load time, not at grading time. A malformed YAML file must fail loudly at app start, not on the first quiz attempt.

Step 3 — The grader

# src/miniportal/quizzes/grader.py
from dataclasses import dataclass

from miniportal.quizzes import QuizItem


@dataclass(frozen=True)
class GradeResult:
    correct: bool
    given: object
    expected: object
    feedback: str  # short, learner-facing


def grade(item: QuizItem, given: object) -> GradeResult:
    if item.answer_type == "mcq":
        raise NotImplementedError("Lab 04 step 3a — given is an int index; compare to item.correct.")
    if item.answer_type == "short":
        raise NotImplementedError(
            "Lab 04 step 3b — given is str; item.correct is list[str]; case-insensitive set membership."
        )
    if item.answer_type == "conjugation-form":
        raise NotImplementedError(
            "Lab 04 step 3c — delegate to Phase 20's grade_conjugation; case-insensitive, strip whitespace."
        )
    if item.answer_type == "code":
        raise NotImplementedError(
            "Lab 04 step 3d — execute given snippet in a sandboxed subprocess (Phase 31 sandbox); "
            "compare stdout to item.correct. Time limit 2 s, memory limit 64 MB."
        )
    raise ValueError(f"unknown answer_type: {item.answer_type}")

The code branch is the only one that touches a subprocess. It reuses Phase 31's sandbox helper — do not roll a new sandbox here.
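
For the two branches that need neither Phase 20 nor Phase 31, a minimal sketch might look like this. The helper names (_normalize, grade_mcq, grade_short) and the feedback strings are assumptions made for illustration, and short is read here as "the given string matches any accepted string", which is one interpretation of the step 3b note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradeResult:
    correct: bool
    given: object
    expected: object
    feedback: str


def _normalize(text: str) -> str:
    # Hypothetical helper: trim and case-fold so " MCQ " matches "mcq".
    return text.strip().casefold()


def grade_mcq(expected_index: int, given: object) -> GradeResult:
    # bool is a subclass of int in Python, so reject it explicitly.
    ok = isinstance(given, int) and not isinstance(given, bool) and given == expected_index
    return GradeResult(ok, given, expected_index, "Correct." if ok else "Not quite.")


def grade_short(accepted: list[str], given: object) -> GradeResult:
    # Case-insensitive membership in the set of accepted answers.
    ok = isinstance(given, str) and _normalize(given) in {_normalize(a) for a in accepted}
    return GradeResult(ok, given, accepted, "Correct." if ok else "Not quite.")
```

In the real grader these would sit behind the answer_type dispatch in grade(); they are split out here only to keep the example self-contained.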

Step 4 — quiz_attempts and review_cards

# Migration sketch (Lab 04)
# quiz_attempts:
#   id PK, student_id, phase, slug, started_at, submitted_at,
#   score_correct INT, score_total INT, payload_json TEXT (item id -> given)
# review_cards:
#   id PK, student_id, quiz_phase, quiz_slug, item_id,
#   ease_factor REAL DEFAULT 2.5,
#   interval_days INT DEFAULT 0,
#   repetitions INT DEFAULT 0,
#   due_on DATE,
#   created_at, last_reviewed_at
#   UNIQUE(student_id, quiz_phase, quiz_slug, item_id)
# exam_attempts:
#   id PK, student_id, exam_id, body_md, rubric_grade JSON, graded_by ('model' | 'teacher'), graded_at

On every wrong item in a quiz submission:

  • INSERT OR IGNORE into review_cards with defaults.
  • If already present, leave it — SM-2 owns the schedule from here on.
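
The idempotent insert can be sketched with the standard-library sqlite3 module. The trimmed table definition, the ensure_review_card name, and passing due_on in as an ISO date string are assumptions for illustration; the source does not say what the initial due date should be.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE review_cards (
        id INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL,
        quiz_phase INTEGER NOT NULL,
        quiz_slug TEXT NOT NULL,
        item_id TEXT NOT NULL,
        ease_factor REAL DEFAULT 2.5,
        interval_days INTEGER DEFAULT 0,
        repetitions INTEGER DEFAULT 0,
        due_on DATE,
        UNIQUE (student_id, quiz_phase, quiz_slug, item_id)
    )
    """
)


def ensure_review_card(db, student_id, phase, slug, item_id, due_on) -> bool:
    """Insert a card with default SM-2 state; return True only if a row was created."""
    cur = db.execute(
        "INSERT OR IGNORE INTO review_cards "
        "(student_id, quiz_phase, quiz_slug, item_id, due_on) VALUES (?, ?, ?, ?, ?)",
        (student_id, phase, slug, item_id, due_on),
    )
    # rowcount is 0 when the UNIQUE constraint caused the insert to be ignored.
    return cur.rowcount == 1
```

A second wrong answer on the same item leaves the existing row, and therefore its SM-2 schedule, untouched.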

Step 5 — SM-2 algorithm

# src/miniportal/quizzes/sm2.py
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewState:
    ease_factor: float
    interval_days: int
    repetitions: int


def update(state: ReviewState, quality: int) -> ReviewState:
    """Standard SM-2 (Wozniak 1990).

    quality ∈ {0..5} (0 = forgot completely, 5 = perfect recall).

    if quality < 3:
        repetitions = 0
        interval_days = 1
    else:
        repetitions += 1
        if repetitions == 1: interval_days = 1
        elif repetitions == 2: interval_days = 6
        else:                   interval_days = round(interval_days * ease_factor)

    ease_factor = max(1.3, ease_factor + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))

    Returns new ReviewState; pure function.
    """
    raise NotImplementedError("Lab 04 step 5 — implement the formula above; clamp ease_factor at 1.3.")

The portal's grade buttons are mapped: Again=0, Hard=2, Good=4, Easy=5. Quality 1 and 3 are deliberately unreachable to keep the UI to four buttons.
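
One possible implementation of the docstring's formula is sketched below. It follows the docstring literally, so the ease update is applied after failures as well as passes, and the interval uses the ease factor from before the update. The BUTTON_QUALITY mapping and the explicit range check are additions for illustration.

```python
from dataclasses import dataclass

EASE_FLOOR = 1.3
# Button-to-quality mapping as described in the text above.
BUTTON_QUALITY = {"again": 0, "hard": 2, "good": 4, "easy": 5}


@dataclass(frozen=True)
class ReviewState:
    ease_factor: float
    interval_days: int
    repetitions: int


def update(state: ReviewState, quality: int) -> ReviewState:
    if quality not in range(6):
        raise ValueError(f"quality must be in 0..5, got {quality}")
    if quality < 3:
        # Failure: restart the repetition count and review again tomorrow.
        repetitions, interval_days = 0, 1
    else:
        repetitions = state.repetitions + 1
        if repetitions == 1:
            interval_days = 1
        elif repetitions == 2:
            interval_days = 6
        else:
            interval_days = round(state.interval_days * state.ease_factor)
    miss = 5 - quality
    ease = max(EASE_FLOOR, state.ease_factor + 0.1 - miss * (0.08 + miss * 0.02))
    return ReviewState(ease, interval_days, repetitions)
```

Working the ease term by hand: Good (4) gives 0.1 - 1 × (0.08 + 0.02) = 0, so ease is unchanged; Easy (5) adds 0.1; Hard (2) subtracts 0.32; Again (0) subtracts 0.8, so a card at 2.5 drops to 1.7 in one step.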

Step 6 — Routes

# src/miniportal/routes/quizzes.py
@router.get("/{phase}/{slug}")
async def show_quiz(phase: int, slug: str, student = Depends(current_student)):
    raise NotImplementedError("Lab 04 step 6 — load YAML, render quiz_view with prompts (locale-aware).")


@router.post("/{phase}/{slug}/submit")
async def submit_quiz(phase: int, slug: str, ...):
    """Side effects:
      1. INSERT quiz_attempt row.
      2. For each wrong item, INSERT OR IGNORE review_card.
      3. Update progress.status to 'in_progress' if not done.
      4. Render quiz_result.html.jinja with per-item feedback.
    """
    raise NotImplementedError("Lab 04 step 6 — implement the four side effects in a single transaction.")


# src/miniportal/routes/review.py
@router.get("")
async def review_today(student = Depends(current_student)):
    """Cards with due_on <= today, ordered by oldest due first.
    Cap at 30 cards/day (configurable) to avoid burnout."""
    raise NotImplementedError("Lab 04 step 6 — query due cards, render review_today.html.jinja.")


@router.post("/{card_id}/grade")
async def grade_review(card_id: int, quality: int = Form(...), student = Depends(current_student)):
    """quality ∈ {0,2,4,5} from the UI buttons. Computes SM-2 update, persists, returns next card."""
    raise NotImplementedError("Lab 04 step 6 — owner check, validate quality, sm2.update, persist, return JSON.")
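
The due-card query behind review_today could be sketched like this with sqlite3. It assumes due_on is stored as ISO-8601 text (YYYY-MM-DD), so string comparison matches date order; the function name due_card_ids and the id tie-breaker are illustrative choices.

```python
import sqlite3
from datetime import date

DAILY_CAP = 30  # default cap from the route docstring; assumed to be configurable


def due_card_ids(
    conn: sqlite3.Connection, student_id: int, today: date, cap: int = DAILY_CAP
) -> list[int]:
    """Return ids of this student's cards due on or before today, oldest due first."""
    rows = conn.execute(
        "SELECT id FROM review_cards "
        "WHERE student_id = ? AND due_on <= ? "
        "ORDER BY due_on ASC, id ASC LIMIT ?",
        (student_id, today.isoformat(), cap),
    ).fetchall()
    return [row[0] for row in rows]
```

Filtering by student_id in the query covers the listing side; the grade route still needs its own owner check on card_id.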

Step 7 — Exams

# src/miniportal/routes/exams.py
@router.post("/{exam_id}/submit")
async def submit_exam(exam_id: int, body_md: str = Form(...), student = Depends(current_student)):
    """Rubric grading:
      - LYNX_TEST_MODE=1: invoke the model with the rubric as a prompt, store JSON grade.
      - Production: graded_by='teacher', status='pending'; teacher fills the rubric in lab 05's view.
    """
    raise NotImplementedError("Lab 04 step 7 — branch on test mode; persist exam_attempts row.")

The model-graded path is restricted to LYNX_TEST_MODE=1 to keep production grading human-anchored. The rubric prompt template lives at data/exams/rubric-template.md.
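
A small sketch of the gate itself, assuming the flag is read from the process environment and that only the exact value "1" enables model grading. The function name exam_grader is hypothetical.

```python
import os
from collections.abc import Mapping


def exam_grader(env: Mapping[str, str] = os.environ) -> str:
    """Return 'model' only when LYNX_TEST_MODE is exactly '1'; otherwise 'teacher'."""
    return "model" if env.get("LYNX_TEST_MODE") == "1" else "teacher"
```

Defaulting to 'teacher' for anything other than an exact "1" keeps a typo in the environment from silently enabling model grading.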

Step 8 — Tests

# tests/portal/test_quiz_grading.py
def test_mcq_correct(): raise NotImplementedError("MCQ index match.")
def test_mcq_wrong(): raise NotImplementedError("MCQ index mismatch.")
def test_short_case_insensitive(): raise NotImplementedError("'ate' == 'Ate'.")
def test_conjugation_form_delegates(): raise NotImplementedError("Mock Phase 20 grader; assert it's called.")
def test_code_sandbox_used(): raise NotImplementedError("Mock Phase 31 sandbox; assert the snippet is dispatched through it.")

# tests/portal/test_review_card_creation.py
def test_wrong_answer_creates_card(): raise NotImplementedError("Submit a quiz with one wrong item; assert one review_card row.")
def test_idempotent_resubmission(): raise NotImplementedError("Submit twice; only one review_card per (student, quiz, item).")
def test_correct_answer_no_card(): raise NotImplementedError("Submit all-correct; zero review_cards inserted.")

# tests/portal/test_sm2_update.py
def test_first_pass_interval_1():
    raise NotImplementedError("State(2.5, 0, 0) + quality 4 -> (≈2.5, 1, 1).")


def test_second_pass_interval_6():
    raise NotImplementedError("State(2.5, 1, 1) + quality 4 -> (≈2.5, 6, 2).")


def test_failure_resets_repetitions():
    raise NotImplementedError("State(2.5, 6, 2) + quality 0 -> (≈1.7, 1, 0).")


def test_ease_floor_1_3():
    raise NotImplementedError("Repeated quality 0 inputs: ease_factor never goes below 1.3.")

What "done" looks like

  • At least one YAML quiz committed (phase-00-onboarding.yaml).
  • Loader rejects malformed YAML at app start.
  • All four answer types grade correctly; code runs through the Phase 31 sandbox.
  • Wrong answers create review cards; correct answers do not.
  • /review lists due cards capped at 30/day.
  • SM-2 formula matches the reference implementation; ease floor 1.3 enforced.
  • Exam route persists; model-grading gated behind LYNX_TEST_MODE=1.
  • Templates render bilingual prompts when student.locale != 'en' and prompt_es exists.
  • mypy --strict and bandit clean.

Common pitfalls

  1. Rolling your own sandbox for code. Phase 31 already did the work. Reusing it keeps a single hardening surface.
  2. Loading YAML at request time. A 50 ms parse on every quiz hit. Cache parsed quizzes at app start; reload only on file mtime change (or never reload in production).
  3. Letting case sensitivity vary by answer type. conjugation-form is case-insensitive; code stdout is case-sensitive. Document explicitly in the grader's docstring.
  4. Storing the payload_json with raw user input. Sanitize, or store and never render. The code snippets are particularly tempting to dump into a debug page.
  5. A 5-button SM-2 UI. Use four buttons (0/2/4/5). Quality 1 and 3 add cognitive load without behavioral difference.
  6. Recreating a review card on every wrong answer. Use INSERT OR IGNORE. SM-2 owns the schedule once the card exists.
  7. Ungated model-graded exam in production. Costs spiral; teacher loses oversight. LYNX_TEST_MODE=1 is the gate.
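
For pitfall 2, a reload-on-mtime-change cache could look roughly like this. MtimeCache is a hypothetical helper, and the loader argument stands in for load_quiz; a production build that never reloads could simply populate a dict once at startup instead.

```python
from pathlib import Path
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class MtimeCache(Generic[T]):
    """Cache loader(path) results, reloading only when the file's mtime changes."""

    def __init__(self, loader: Callable[[Path], T]) -> None:
        self._loader = loader
        self._entries: dict[Path, tuple[int, T]] = {}

    def get(self, path: Path) -> T:
        mtime = path.stat().st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]
        # First access, or the file changed on disk: parse again and remember it.
        value = self._loader(path)
        self._entries[path] = (mtime, value)
        return value
```

This trades the per-request parse for a single stat call; the parse still has to happen once at app start so that malformed files fail before the first request.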

Next: lab/05-admin-teacher-view.md — the admin dashboard.