Skip to content

English · Español

Theory 02 — Data model

🇪🇸 Doce tablas SQLite, ni una más. Cada una con una responsabilidad clara y una clave foránea explícita. Las migraciones son código versionado en src/miniportal/migrations/ mediante alembic; no se "modifica el schema a mano". El principio rector: el portal sirve un solo aprendiz local hoy, pero el schema admite N sin reescritura — la diferencia entre 1 y N vive en students.id, no en el código.

Why SQLite, why now

Phase 41 builds a learner portal. The data is small (single learner, ~40 phases, hundreds of journal entries, thousands of exam responses over the curriculum's lifetime). The deployment is local — Borja's laptop, served on localhost. Three properties drop out:

  • Concurrent writers: one (the learner).
  • Total row count after 2 years of study: estimated at < 50 k across all tables.
  • Backup story: copy the file.

SQLite is the only correct choice. Postgres would be over-engineering — a server process, a connection pool, a credential store — for a workload SQLite handles in microseconds. The portal pins sqlite3 (stdlib) for transport and alembic for migrations.

The migration policy is non-negotiable. Every schema change is a versioned file under src/miniportal/migrations/versions/<rev>_<slug>.py. Hand-edits to instance/portal.db are forbidden — they break reproducibility, the §0.5 hard rule.

The twelve tables

The data model has twelve tables. They partition into four concerns:

  1. Identitystudents, credentials, sessions.
  2. Curriculum stateprogress, journal_entries, notes.
  3. Assessmentquizzes, quiz_attempts, exam_questions, exam_responses.
  4. Memoryreview_cards, audit_log.

Each table is presented as a CREATE TABLE statement in SQLite dialect, followed by the index list and the foreign-key cascade choice.

students

CREATE TABLE students (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    username        TEXT NOT NULL UNIQUE,
    display_name    TEXT NOT NULL,
    locale          TEXT NOT NULL CHECK (locale IN ('en', 'es')),
    role            TEXT NOT NULL CHECK (role IN ('student', 'admin', 'teacher')),
    created_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    last_login_at   TIMESTAMP
);

CREATE UNIQUE INDEX ix_students_username ON students(username);

The locale column persists per-student UI language. The role column distinguishes the lone admin (Borja himself, in the MVP) from future students. A teacher role exists in the schema but the MVP does not exercise it — the column is forward-looking, not aspirational; deleting it later is harder than adding a CHECK constraint now.

credentials

CREATE TABLE credentials (
    student_id                  INTEGER NOT NULL PRIMARY KEY,
    argon2_hash                 BLOB,
    salt                        BLOB NOT NULL,
    must_set_on_next_login      BOOLEAN NOT NULL DEFAULT 1,
    password_set_at             TIMESTAMP,
    last_changed_at             TIMESTAMP,
    FOREIGN KEY (student_id) REFERENCES students(id) ON DELETE CASCADE
);

argon2_hash is nullable — that nullability is the schema-level expression of the "no password by default" flow (theory 03). salt is per-credential, not global; the pepper (server-wide) is environmental and never lands here. The cascade is ON DELETE CASCADE: deleting a student deletes the credential row. Anything more permissive would leak hashes.

sessions

CREATE TABLE sessions (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    student_id      INTEGER NOT NULL,
    token_hash      BLOB NOT NULL,
    ip              TEXT,
    user_agent      TEXT,
    created_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at      TIMESTAMP NOT NULL,
    FOREIGN KEY (student_id) REFERENCES students(id) ON DELETE CASCADE
);

CREATE INDEX ix_sessions_token_hash ON sessions(token_hash);
CREATE INDEX ix_sessions_expires_at ON sessions(expires_at);

The session table stores hashes of session tokens, not the tokens themselves. The cookie carries the bearer token; the server hashes on receipt and compares. Bigger schemas use a separate token vault; for the MVP the hash-in-row pattern is enough — itsdangerous handles signing, and the row only stores the hash for revocation purposes.

expires_at has its own index because the cleanup job (DELETE FROM sessions WHERE expires_at < CURRENT_TIMESTAMP) is the single most frequent admin-pane operation.

progress

CREATE TABLE progress (
    student_id      INTEGER NOT NULL,
    phase_n         SMALLINT NOT NULL CHECK (phase_n BETWEEN 0 AND 41),
    status          TEXT NOT NULL CHECK (status IN ('not_started', 'in_progress', 'done')),
    opened_at       TIMESTAMP,
    closed_at       TIMESTAMP,
    checkpoint_json TEXT,
    PRIMARY KEY (student_id, phase_n),
    FOREIGN KEY (student_id) REFERENCES students(id) ON DELETE CASCADE
);

Composite primary key — the row is uniquely identified by the (student, phase) pair. checkpoint_json mirrors the contents of learners/<name>/phase-NN/checkpoint.json (Phase 0 file convention) into the database, so the portal can render progress without parsing the filesystem on every page load. The filesystem version is canonical; the table is a denormalized cache.

journal_entries

CREATE TABLE journal_entries (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    student_id      INTEGER NOT NULL,
    day             DATE NOT NULL,
    body_markdown   TEXT NOT NULL,
    created_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (student_id) REFERENCES students(id) ON DELETE CASCADE
);

CREATE INDEX ix_journal_student_day ON journal_entries(student_id, day);

The convention from learners/<name>/journal/YYYY-MM-DD.md is one file per day, append-only within the day. The table mirrors that: there can be multiple journal_entries rows per (student, day), each one an append. Reconstructing the daily file means SELECT body_markdown FROM journal_entries WHERE student_id = ? AND day = ? ORDER BY created_at, joined by \n\n---\n\n.

notes

CREATE TABLE notes (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    student_id      INTEGER NOT NULL,
    phase_n         SMALLINT,
    page_path       TEXT,
    body_markdown   TEXT NOT NULL,
    tags            TEXT,
    created_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (student_id) REFERENCES students(id) ON DELETE CASCADE
);

CREATE INDEX ix_notes_student_phase ON notes(student_id, phase_n);
CREATE INDEX ix_notes_student_page ON notes(student_id, page_path);

phase_n and page_path are both nullable. A note can be "about phase 11" (phase scoped), "about docs/phase-15/theory/02-attention.md" (page scoped), or freestanding. tags is a comma-separated string — denormalized on purpose; the volume is too small to justify a note_tags join table.

quizzes

CREATE TABLE quizzes (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    slug            TEXT NOT NULL UNIQUE,
    phase_n         SMALLINT NOT NULL,
    body_yaml       TEXT NOT NULL
);

CREATE UNIQUE INDEX ix_quizzes_slug ON quizzes(slug);

The quiz definitions live in data/quizzes/*.yaml, version-controlled in the repo. A boot-time loader reads them and upserts into the table. The YAML is the canonical source; the row is a cache. This means changing a quiz means editing YAML + running migrations + restarting — the same flow as the rest of the curriculum. Quiz content lives in git history; you can git blame a quiz question.

quiz_attempts

CREATE TABLE quiz_attempts (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    student_id      INTEGER NOT NULL,
    quiz_id         INTEGER NOT NULL,
    started_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    completed_at    TIMESTAMP,
    score_pct       REAL,
    answers_json    TEXT,
    FOREIGN KEY (student_id) REFERENCES students(id) ON DELETE CASCADE,
    FOREIGN KEY (quiz_id) REFERENCES quizzes(id) ON DELETE RESTRICT
);

The cascade choice here is deliberate. ON DELETE CASCADE for the student (deleting Borja removes his attempts) but ON DELETE RESTRICT for the quiz (you cannot delete a quiz that has attempts — soft-delete via a deprecated_at column on quizzes is the future fix when needed).

exam_questions

CREATE TABLE exam_questions (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    slug            TEXT NOT NULL UNIQUE,
    phase_n         SMALLINT NOT NULL,
    prompt          TEXT NOT NULL,
    answer_key      TEXT NOT NULL,
    rubric          TEXT
);

CREATE UNIQUE INDEX ix_exam_questions_slug ON exam_questions(slug);

Exam questions are the high-stakes assessment, distinct from quizzes. Phase 30 (structured exam answers) defines the rubric format. The rubric column is a JSON-encoded structured grader specification — see Phase 30 theory for the schema. The grammar tutor capstone (Phase 32 under §A13) uses exam questions of the form "conjugate the verb eat in past simple, 3rd person singular" with answer_key = "ate".

exam_responses

CREATE TABLE exam_responses (
    id                          INTEGER PRIMARY KEY AUTOINCREMENT,
    student_id                  INTEGER NOT NULL,
    exam_question_id            INTEGER NOT NULL,
    response_text               TEXT NOT NULL,
    correct                     BOOLEAN NOT NULL,
    correct_at_first_try        BOOLEAN NOT NULL,
    attempted_at                TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    feedback                    TEXT,
    FOREIGN KEY (student_id) REFERENCES students(id) ON DELETE CASCADE,
    FOREIGN KEY (exam_question_id) REFERENCES exam_questions(id) ON DELETE RESTRICT
);

CREATE INDEX ix_exam_responses_student ON exam_responses(student_id);
CREATE INDEX ix_exam_responses_question ON exam_responses(exam_question_id);

correct_at_first_try is the gating column for spaced repetition (theory 05). A wrong answer at any retry attempt seeds a review_cards row. A correct-first-try answer never enters spaced repetition — it's already known.

review_cards

CREATE TABLE review_cards (
    id                  INTEGER PRIMARY KEY AUTOINCREMENT,
    student_id          INTEGER NOT NULL,
    exam_question_id    INTEGER NOT NULL,
    ease                REAL NOT NULL DEFAULT 2.5,
    interval_days       INTEGER NOT NULL DEFAULT 1,
    due_on              DATE NOT NULL,
    last_reviewed_at    TIMESTAMP,
    FOREIGN KEY (student_id) REFERENCES students(id) ON DELETE CASCADE,
    FOREIGN KEY (exam_question_id) REFERENCES exam_questions(id) ON DELETE RESTRICT
);

CREATE UNIQUE INDEX ix_review_cards_student_question
    ON review_cards(student_id, exam_question_id);
CREATE INDEX ix_review_cards_due ON review_cards(student_id, due_on);

The SM-2 state is captured by three columns: ease (the easiness factor, starting at 2.5), interval_days (the number of days until next review), and due_on (today + interval). Theory 05 derives the update rule. The unique index (student_id, exam_question_id) enforces the invariant that a failed question produces one card per student, not one per failure.

audit_log

CREATE TABLE audit_log (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    student_id      INTEGER,
    action          TEXT NOT NULL,
    details_json    TEXT NOT NULL,
    at              TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (student_id) REFERENCES students(id) ON DELETE SET NULL
);

CREATE INDEX ix_audit_log_student ON audit_log(student_id);
CREATE INDEX ix_audit_log_at ON audit_log(at);

The audit log records security-relevant actions: login_success, login_failure, password_set, password_reset_requested, quiz_submitted, exam_submitted, admin_create_student. student_id is nullable and uses ON DELETE SET NULL — the audit row survives the student deletion (forensically critical) but loses the FK link. Phase 37 threat T8 (trace storage as injection vector) governs the contents of details_json; injection filter applied before insert.

Migration policy via alembic

Schema changes follow this discipline:

  1. Edit src/miniportal/models.py (the SQLAlchemy declarative models).
  2. Run alembic revision --autogenerate -m "add review_cards.ease_floor" — produces a versioned migration file.
  3. Inspect the generated SQL. Hand-correct anything autogenerate got wrong (autogenerate misses CHECK constraint changes, for example).
  4. Run alembic upgrade head against a copy of production data. Verify.
  5. Commit the migration file and the model change in one commit.

Downgrades are written but rarely used; the policy is "fix forward, not roll back." A migration that turns out wrong is corrected by a new migration, not by alembic downgrade. The DB is small enough that even destructive migrations can be tested against a copy in seconds.

Cascade choice summary

Parent Child On parent delete Rationale
students credentials CASCADE Credential without owner is meaningless.
students sessions CASCADE Same.
students progress CASCADE Same.
students journal_entries CASCADE Same.
students notes CASCADE Same.
students quiz_attempts CASCADE Same.
students exam_responses CASCADE Same.
students review_cards CASCADE Same.
students audit_log SET NULL Forensic survival.
quizzes quiz_attempts RESTRICT Quizzes don't get deleted; deprecate via column.
exam_questions exam_responses RESTRICT Same; preserve history.
exam_questions review_cards RESTRICT Same.

What this schema does NOT model

  • Multi-tenant orgs. No organizations table. The portal is single-org (the curriculum is the org). Adding orgs is a future migration if/when the portal is hosted.
  • Roles beyond three. student/admin/teacher is enough. RBAC tables are deferred.
  • Soft deletes. Everything is a hard delete, gated by FK cascades. Soft-delete becomes necessary only at scale Phase 41 does not anticipate.
  • Versioned content. Quizzes and exam questions are not versioned in-DB; git handles versioning. The columns are the current version.

One-paragraph recap

Twelve tables, SQLite, alembic migrations, explicit cascade choices, FK-typed primary keys, JSON columns for genuinely-nested data (details_json, answers_json, checkpoint_json) and only those. The schema admits N learners because students.id is the foreign key everywhere; the deployment is single-learner only by convention. The next theory file covers the credentials lifecycle and the minivault encryption-at-rest contract.

Next: theory/03-auth-and-vault.md — Argon2id parameters, no-password-by-default, AES-GCM minivault, session cookies.