How it all fits together

Lynx Cortex looks like several products — a documentation site, an interactive learner portal, and a set of offline books — but it is built from one content tree with three renderers and shipped down three independent deploy lanes. This page is the map: where the single sources of truth live, how the docs pipeline turns them into a static site, how the FastAPI portal reuses the same content, how credentials work, and how the three lanes deploy without stepping on each other.

Where this page sits

This is the published, bilingual version of the engineering reference kept at ARCHITECTURE.md in the repository root. The root file is the terse source of truth for contributors; this page is the readable tour. When the two disagree, the root file wins — but they are meant to stay in sync.

The big picture

One tree of Markdown and YAML feeds every output. The generators are deterministic and idempotent: the same inputs always produce the same site, portal data and books.

flowchart TB
  subgraph SOT["Single sources of truth"]
    DOCS["docs/phase-NN-*/<br/>README · theory · lab · break · quizzes<br/>(EN + ES mirrors)"]
    META["data/curriculum/*.yaml<br/>phase_meta · phase_study_meta · phase_references"]
    QYAML["data/quizzes/*.yaml<br/>data/exams/*.yaml"]
    GLOSS["GLOSSARY.md / .es.md"]
  end

  subgraph GEN["Generators — just docs-gen"]
    G["build_phase_meta · build_phase_extras<br/>build_study_plan · build_glossary_data<br/>build_lang_pairs"]
    GB["build_books.py"]
  end

  DOCS --> G & GB
  META --> G
  GLOSS --> G

  G --> SITE["mkdocs build --strict → site/"]
  GB --> BOOKS["dist/books/*.{pdf,epub}"]

  SITE -->|deploy-docs.yml| CF["Cloudflare Pages (static)"]
  BOOKS -->|release-books.yml| GHREL["GitHub Release · tag=books"]
  GHREL -->|gh release download| SITE

  subgraph PORTAL["Portal — FastAPI"]
    CURR["curriculum.py reads docs/ + phase_meta.yaml"]
    DB["portal.db"]
    VAULT["vault.db"]
    SRS["*_review.sqlite"]
  end

  DOCS --> CURR
  META --> CURR
  QYAML -->|portal_seed_quizzes.py| DB
  PORTAL -->|deploy-portal.yml| FLY["Fly.io + persistent volume<br/>+ Cloudflare front"]
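
The determinism claim can be checked mechanically by hashing a generated tree before and after a second run. The sketch below is only an illustration: `tree_digest` is a hypothetical helper, not something the repository is known to contain, and which output directory to hash is left to the caller.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Hash every file under root (relative path plus bytes) in sorted order."""
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Include the relative path so renames change the digest too.
        h.update(path.relative_to(root).as_posix().encode("utf-8"))
        h.update(b"\0")
        h.update(path.read_bytes())
        h.update(b"\0")
    return h.hexdigest()
```

If the generators are idempotent, running them twice and comparing the two digests of the output directory should give the same value.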

Seven facts hold this whole system together:

  1. One content tree, three renderers — site, portal, books.
  2. data/curriculum/phase_meta.yaml is the shared nav spine.
  3. Quizzes flow YAML → seeder → DB; marks are computed, never authored.
  4. Three separate SQLite databases, deliberately kept apart.
  5. Books are built once, consumed everywhere.
  6. Credentials are stateless signed tokens — nothing stored server-side.
  7. Three independent, path-filtered deploy lanes.

1. The single sources of truth

Everything downstream is a projection of a small set of authored inputs. Edit these; never edit the generated artifacts.

| Source | What it owns | Consumed by |
| --- | --- | --- |
| docs/phase-NN-*/ | All phase prose: README, theory, lab, break, quizzes (EN + ES mirrors) | Site, portal, books |
| data/curriculum/phase_meta.yaml | Cross-phase nav spine: slug, titles, summary, requires (informational), teaches | Site generators + portal |
| data/curriculum/phase_study_meta.yaml | Effort model for the study planner | build_study_plan.py |
| data/curriculum/phase_references.yaml | "Further reading" per phase | build_phase_extras.py |
| data/quizzes/*.yaml, data/exams/*.yaml | Quiz and exam content | portal_seed_quizzes.py → portal DB |
| GLOSSARY.md / .es.md | Concept terms and definitions | build_glossary_data.py |

Bilingual policy (§A17)

Every document is mirrored as X.md (English) and X.es.md (Spanish), and both are equally authoritative. When an English source is edited, its Spanish mirror is updated in the same commit. The runtime EN/ES toggle in the header is pure client-side JavaScript driven by a generated URL map (window.LYNX_LANG_PAIRS) — there is no server round-trip. Code identifiers, file paths, shell commands and commit messages stay English.
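
The mirroring rule lends itself to a simple pairing pass. The following is a minimal sketch of how an EN↔ES map could be derived from file names. It works on source paths rather than the published URLs, and the function body is an assumption, not the actual build_lang_pairs script.

```python
def build_lang_pairs(md_paths: list[str]) -> dict[str, str]:
    """Map each English page to its Spanish mirror and back (illustrative only)."""
    present = {p for p in md_paths if p.endswith(".md")}
    pairs: dict[str, str] = {}
    missing: list[str] = []
    for path in sorted(present):
        if path.endswith(".es.md"):
            continue  # Spanish files are reached through their English twin
        mirror = path[: -len(".md")] + ".es.md"
        if mirror in present:
            pairs[path] = mirror
            pairs[mirror] = path
        else:
            missing.append(path)
    if missing:
        # The policy says every document has a mirror, so treat a gap as an error.
        raise ValueError(f"missing Spanish mirrors: {missing}")
    return pairs
```

Serialised as JSON, a map of this shape is the kind of data a client-side toggle can consult without any server round-trip.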

2. The docs-site pipeline

The site is plain mkdocs-material: docs_dir: docs, site_dir: site, theme overrides in overrides/, and only the built-in search plugin enabled. All interactivity is generated static JavaScript — there is no application server behind the docs.

just docs runs the whole pipeline:

just docs   # docs-gen (5 generators) → ensure books exist → mkdocs build --strict

flowchart LR
  subgraph INPUTS["Authored inputs"]
    D["docs/*.md (EN + ES)"]
    M["phase_meta.yaml"]
    GL["GLOSSARY.md"]
  end

  subgraph DOCSGEN["just docs-gen — deterministic"]
    LP["build_lang_pairs<br/>→ lang-pairs.js"]
    PM["build_phase_meta<br/>→ Requires/Teaches + reference.md<br/>VALIDATES slugs · coverage · acyclicity"]
    PX["build_phase_extras<br/>→ concept map + further reading"]
    SP["build_study_plan<br/>→ window.LYNX_STUDY + planner"]
    GD["build_glossary_data<br/>→ window.LYNX_GLOSSARY tooltips"]
  end

  D --> LP & PM & PX & SP & GD
  M --> PM & SP
  GL --> GD

  LP & PM & PX & SP & GD --> BUILD["mkdocs build --strict"]
  BUILD --> S["site/ (static)"]

The five generators (all in scripts/, all deterministic) inject data bundles and projected Markdown blocks into the tree before MkDocs runs:

  • build_phase_meta — projects the Requires / Teaches blocks into each README and builds the reference index. It is also the curriculum integrity gate: it validates slugs, coverage and acyclicity of the prerequisite graph and fails the build if they break.
  • build_phase_extras — the per-phase concept-map widget and the "further reading" blocks.
  • build_study_plan — the window.LYNX_STUDY bundle behind the interactive study planner (pace selector + Gantt + cards).
  • build_glossary_data — the window.LYNX_GLOSSARY bundle that powers the hover-to-explain concept tooltips.
  • build_lang_pairs — the window.LYNX_LANG_PAIRS EN↔ES URL map used by the header language toggle.

Strict by design

The build runs with --strict, so a broken internal link or an orphaned nav entry is a hard failure, not a warning. That is why the prerequisite graph is validated up front: a typo in a slug fails fast, locally, before it can reach the deploy.
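
As an illustration of what such an up-front gate involves, here is a minimal sketch of a slug and acyclicity check over a prerequisite mapping. The function name, the input shape (slug → list of required slugs) and the error messages are assumptions; the real build_phase_meta also checks coverage, which is not shown.

```python
def validate_prerequisites(requires: dict[str, list[str]]) -> list[str]:
    """Return slugs in a valid study order, or raise ValueError."""
    # 1. Every prerequisite must be a known slug (catches typos).
    for slug, deps in requires.items():
        for dep in deps:
            if dep not in requires:
                raise ValueError(f"{slug}: unknown prerequisite slug {dep!r}")

    # 2. Depth-first search; meeting an in-progress node again means a cycle.
    order: list[str] = []
    state: dict[str, int] = {}  # 1 = in progress, 2 = done

    def visit(slug: str, trail: tuple[str, ...]) -> None:
        if state.get(slug) == 2:
            return
        if state.get(slug) == 1:
            raise ValueError("prerequisite cycle: " + " -> ".join(trail + (slug,)))
        state[slug] = 1
        for dep in requires[slug]:
            visit(dep, trail + (slug,))
        state[slug] = 2
        order.append(slug)

    for slug in sorted(requires):
        visit(slug, ())
    return order
```

Raising on the first problem mirrors the fail-fast behaviour described above: the generator exits non-zero and the build stops before MkDocs runs.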

3. The offline books

The same Markdown becomes downloadable PDF and EPUB books — one per language. The generator (scripts/build_books.py) deliberately uses no pandoc, no LaTeX, no Node:

  • WeasyPrint renders HTML → PDF with a cover, table of contents and page breaks.
  • ebooklib writes the EPUB.
  • matplotlib mathtext turns every $…$ expression into a cached SVG (in dist/books/_mathcache) so equations render identically on any reader with no fonts to install.
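
The caching idea in the last bullet can be sketched without the rendering step. The snippet below only finds inline `$…$` expressions and derives a content-addressed cache path for each. The regex, the SHA-256 file naming and both function names are assumptions for illustration; the actual layout of dist/books/_mathcache is not documented here.

```python
import hashlib
import re
from pathlib import Path

# Naive inline-math pattern; it would also match prose such as "$5 and $6".
MATH_RE = re.compile(r"\$([^$\n]+)\$")
CACHE_DIR = Path("dist/books/_mathcache")


def find_math(markdown: str) -> list[str]:
    """Return the distinct inline expressions in first-seen order."""
    seen: dict[str, None] = {}
    for match in MATH_RE.finditer(markdown):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)


def math_cache_path(expr: str) -> Path:
    """Same expression, same file: the SVG only needs rendering once."""
    digest = hashlib.sha256(expr.strip().encode("utf-8")).hexdigest()
    return CACHE_DIR / f"{digest}.svg"
```

A renderer would then skip any expression whose cache path already exists and render only the rest.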

Each phase is a chapter; theory, lab, break and quiz files are sections. Crucially, the books are built once by the release-books.yml workflow and published to the books GitHub Release — then downloaded by the docs deploy. They are never rebuilt in the fast docs lane, which keeps that lane quick.

Why books are a release artifact, not committed

Books are regenerable from the content tree, so committing them would bloat git history with binaries that drift out of sync. Instead CI builds them in their own slow lane and the docs lane pulls the finished files via gh release download. Locally, just docs builds them once if absent.

4. The learner portal

The portal (src/miniportal/) is a server-rendered FastAPI + SQLModel + Jinja2 + HTMX application — no single-page app. create_app(config) is the sole entrypoint, usable both in production (env-driven config) and in tests (an explicit config against an in-memory database).

It reuses the curriculum rather than copying it: curriculum.py reads the same docs/ tree and phase_meta.yaml the site is built from, read-only — the portal never writes back into docs/. Prerequisites surface as informational badges, never as locks.

Request path and middleware

Every request passes through a fixed middleware stack. Starlette makes the most recently added middleware the outermost layer, so the order in which create_app adds them is the inverse of the execution order shown here:

flowchart TB
  REQ([Incoming request]) --> SEC["SecurityHeaders<br/>(outermost)"]
  SEC --> OBS["RequestObservability<br/>(/metrics)"]
  OBS --> RL["RateLimiter"]
  RL --> BSL["BodySizeLimit"]
  BSL --> INJ["InjectionFilter<br/>(innermost)"]
  INJ --> ROUTERS

  subgraph ROUTERS["Routers — each a build_router factory"]
    direction LR
    A["auth · dashboard · academic · locale"]
    B["notes · quiz (+SRS) · downloads"]
    C["admin · admin_overrides · exam_engine"]
    D["lab_tracker · capstone_tracker · grading · obs_extended"]
  end

  ROUTERS --> DBS

  subgraph DBS["Three separate SQLite stores"]
    direction LR
    P[("portal.db<br/>SQLModel main store")]
    V[("vault.db<br/>AES-256-GCM · minivault")]
    R[("*_review.sqlite<br/>SM-2 SRS · minireview")]
  end
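
The add-order versus execution-order relationship is easy to get backwards, so here is a small framework-free model of it. This is not Starlette code: `make_middleware` and `build_stack` are hypothetical stand-ins that only mimic the wrapping behaviour, with each layer recording its name before calling the next one.

```python
from typing import Callable

Handler = Callable[[str], list[str]]


def make_middleware(name: str) -> Callable[[Handler], Handler]:
    """Build a layer that records its name, then delegates inward."""
    def wrap(inner: Handler) -> Handler:
        def handler(request: str) -> list[str]:
            return [name] + inner(request)
        return handler
    return wrap


def build_stack(added_in_order: list[str], endpoint: Handler) -> Handler:
    """Each newly added layer wraps everything that was added before it."""
    app = endpoint
    for name in added_in_order:
        app = make_middleware(name)(app)
    return app


# Added innermost-first, so the last one added ends up outermost.
ADDED = ["InjectionFilter", "BodySizeLimit", "RateLimiter",
         "RequestObservability", "SecurityHeaders"]
stack = build_stack(ADDED, lambda request: ["router"])
trace = stack("GET /")
```

Here `trace` begins with SecurityHeaders and ends with the router, which matches the diagram even though SecurityHeaders was added last.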

Three databases, kept apart

The three SQLite stores are intentionally separate so a compromise or corruption in one does not reach the others:

  • portal.db — the main SQLModel store (users, attempts, marks, notes).
  • vault.db — an AES-256-GCM encrypted vault (minivault).
  • *_review.sqlite — the SM-2 spaced-repetition store (minireview), raw SQLite.

Authentication and i18n

Passwords are hashed with Argon2id (t=3, 64 MiB, p=2) plus a server-side pepper. Sessions, CSRF and invite tokens are signed with itsdangerous using cfg.session_secret, and cookies are set HttpOnly / Secure / SameSite. Authorization climbs a dependency ladder (current_student → require_teacher_or_admin → require_admin), and mutating routes carry a double-submit CSRF check. Interface translation is a plain Python t() dictionary (EN + ES), not gettext.
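
The dependency ladder can be pictured as functions that each call the rung below and then tighten the check. The sketch below keeps the three names from the text but is otherwise an assumption: the `User` shape, the role strings and the `Forbidden` exception are hypothetical, and the real dependencies are FastAPI dependencies rather than plain functions.

```python
from dataclasses import dataclass
from typing import Optional


class Forbidden(Exception):
    """Raised when the caller lacks the required role (hypothetical)."""


@dataclass(frozen=True)
class User:
    name: str
    role: str  # assumed values: "student", "teacher", "admin"


def current_student(session_user: Optional[User]) -> User:
    # Bottom rung: any signed-in user.
    if session_user is None:
        raise Forbidden("not signed in")
    return session_user


def require_teacher_or_admin(session_user: Optional[User]) -> User:
    user = current_student(session_user)
    if user.role not in {"teacher", "admin"}:
        raise Forbidden("teacher or admin required")
    return user


def require_admin(session_user: Optional[User]) -> User:
    user = require_teacher_or_admin(session_user)
    if user.role != "admin":
        raise Forbidden("admin required")
    return user
```

Because each rung reuses the one below, a route guarded by require_admin also inherits the signed-in check without repeating it.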

5. Content flow and where marks come from

The cardinal rule: marks are computed, never authored. No one types a grade into a file; grading/service.py derives the report from the learner's actual attempt rows.

flowchart LR
  PROSE["Phase prose / theory / labs"] --> DOCS["docs/ (one SoT)"]
  DOCS --> SITE2["Site"]
  DOCS --> PORTAL2["Portal"]
  DOCS --> BOOKS2["Books"]

  QY["data/quizzes · data/exams (*.yaml)"] -->|portal_seed_quizzes.py| PDB[("portal.db")]
  LEARN([Learner attempts]) --> ATT["attempt rows in portal.db"]
  ATT -->|grading/service.py| REPORT["compute_report → 5-tier band<br/>PASS_MARK = 50"]
  REPORT --> CRED["Credentials"]

  • Phase prose, theory and labs live once in docs/ and render to all three surfaces.
  • Cross-phase metadata lives once in phase_meta.yaml, shared by the site generators and the portal.
  • Quizzes and exams are authored as YAML, seeded into portal.db by portal_seed_quizzes.py.
  • Grading reads attempt rows and produces a 5-tier band with PASS_MARK = 50.
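
To make "computed, never authored" concrete, here is a minimal sketch of deriving a report from attempt rows. Only PASS_MARK = 50 and the existence of five bands come from the text. The band labels and thresholds, the `Attempt` fields, and the best-attempt-per-quiz averaging rule are all assumptions, not the logic of grading/service.py.

```python
from dataclasses import dataclass

PASS_MARK = 50  # stated in the text

# Hypothetical band floors and labels; the real five tiers are not listed here.
BANDS = [(85, "distinction"), (70, "merit"), (60, "credit"),
         (PASS_MARK, "pass"), (0, "fail")]


@dataclass(frozen=True)
class Attempt:
    quiz_id: str
    score: float
    max_score: float  # assumed to be positive


def compute_report(attempts: list[Attempt]) -> dict:
    # Assumption: keep the best percentage per quiz, then average across quizzes.
    best: dict[str, float] = {}
    for a in attempts:
        pct = 100.0 * a.score / a.max_score
        best[a.quiz_id] = max(best.get(a.quiz_id, 0.0), pct)
    overall = sum(best.values()) / len(best) if best else 0.0
    band = next(label for floor, label in BANDS if overall >= floor)
    return {"overall": round(overall, 1), "band": band,
            "passed": overall >= PASS_MARK}
```

The report is a pure function of the rows, so re-running it after a new attempt is the only way a mark can change.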

6. Credentials — stateless and verifiable

Certificates, transcripts and ID cards carry stateless verifiable tokens. Nothing is stored server-side; verification is a pure HMAC check.

flowchart LR
  G["grading.compute_report"] --> B["band + payload"]
  B --> T["credentials.make_token<br/>base64url(json) + HMAC-SHA256(session secret)"]
  T --> DOC["Certificate / transcript / id-card HTML<br/>+ fingerprint code + /verify?token= URL"]
  DOC --> PUB["Public /verify"]
  PUB --> CHK{"HMAC valid?<br/>(constant-time)"}
  CHK -->|yes| OK["Authentic — render payload"]
  CHK -->|no| NO["Reject"]

The token is base64url-encoded JSON plus an HMAC-SHA256 tag computed over that payload and keyed with the session secret. It is embedded in the credential HTML alongside a human-readable fingerprint code and a /verify?token= URL. The public /verify endpoint checks the HMAC in constant time; because nothing is persisted, there is no database to tamper with. A certificate is gated on legal name + accepted terms version + overall mark ≥ 50.
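
A stateless token of this kind can be sketched with the standard library alone. The `body.tag` layout with a dot separator, the canonical JSON settings and the `verify_token` name are assumptions for illustration; only the base64url JSON, HMAC-SHA256 and constant-time comparison come from the description.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_token(payload: dict, secret: bytes) -> str:
    # Canonical JSON so the same payload always yields the same token.
    raw = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    body = _b64(raw.encode("utf-8"))
    tag = hmac.new(secret, body.encode("utf-8"), hashlib.sha256).digest()
    return f"{body}.{_b64(tag)}"


def verify_token(token: str, secret: bytes) -> Optional[dict]:
    body, sep, tag = token.partition(".")
    if not sep:
        return None
    expected = _b64(hmac.new(secret, body.encode("utf-8"), hashlib.sha256).digest())
    # compare_digest runs in constant time with respect to the contents.
    if not hmac.compare_digest(expected.encode("utf-8"), tag.encode("utf-8")):
        return None
    return json.loads(_unb64(body))
```

Verification needs only the token and the secret, which is why nothing has to be looked up server-side. The flip side is that rotating the secret invalidates every credential already issued.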

7. Deployment — three lanes, five workflows

Five GitHub Actions workflows live in .github/workflows/, split into three independent, path-filtered deploy lanes plus two pure gates. A change to the docs never triggers a portal deploy, and vice versa.

flowchart TB
  subgraph GATES["Gates — no deploy"]
    CI["ci.yml<br/>ruff + mypy(src) + pytest"]
    PT["portal-tests.yml<br/>portal pytest + Dockerfile/compose checks"]
  end

  subgraph LANES["Three deploy lanes"]
    DD["deploy-docs.yml<br/>generators → gh release download books<br/>→ mkdocs build --strict → wrangler pages deploy"]
    RB["release-books.yml<br/>WeasyPrint + mathcache → books Release"]
    DP["deploy-portal.yml<br/>flyctl deploy --remote-only"]
  end

  DD --> CFP["Cloudflare Pages (static site)"]
  RB --> REL["GitHub Release · tag=books"]
  REL -.consumed by.-> DD
  DP --> FLYIO["Fly.io + persistent volume + Cloudflare front"]

| Workflow | Role |
| --- | --- |
| ci.yml | Lint (ruff) + types (mypy src) + tests (pytest). A gate, no deploy. |
| deploy-docs.yml | Generators → gh release download books → mkdocs build --strict → wrangler pages deploy site. "GitHub builds, Cloudflare publishes." |
| release-books.yml | Builds the four books (WeasyPrint, mathcache) and publishes the books Release. |
| deploy-portal.yml | flyctl deploy --remote-only to Fly.io (needs FLY_API_TOKEN). |
| portal-tests.yml | Portal pytest plus Dockerfile / compose checks. A gate, no deploy. |
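
Path filtering is what keeps the lanes independent. The sketch below models the idea with glob patterns. The patterns in `LANE_PATHS` are hypothetical, since the actual filters in the workflow files are not listed on this page, and Python's fnmatch is only a stand-in for GitHub's own path-matching rules.

```python
from fnmatch import fnmatchcase

# Hypothetical path filters, chosen only to illustrate lane independence.
LANE_PATHS = {
    "deploy-docs.yml": ["docs/*", "data/curriculum/*", "overrides/*", "mkdocs.yml"],
    "release-books.yml": ["docs/*", "scripts/build_books.py"],
    "deploy-portal.yml": ["src/miniportal/*", "docker/*", "fly.toml"],
}


def lanes_for(changed: list[str]) -> list[str]:
    """Return the workflows whose filters match at least one changed path."""
    return sorted(
        workflow
        for workflow, patterns in LANE_PATHS.items()
        if any(fnmatchcase(path, pat) for path in changed for pat in patterns)
    )
```

With filters shaped like these, a docs-only change selects no portal workflow, and a change under src/miniportal/ selects only the portal lane.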

The portal container and the single-writer invariant

The portal ships from docker/portal.Dockerfile: a two-stage build running as a non-root portal user (uid 10001), venv at /opt/venv, started with python scripts/portal_run.py. fly.toml deploys app lynx-cortex-portal in region cdg (Paris) with a persistent volume lynx_data mounted at /var/lib/lynx-cortex, scale-to-zero when idle, and secrets injected via scripts/fly-secrets.example.sh.

Why a single machine is a feature, not a limit

A Fly volume binds to exactly one machine at a time, which physically enforces the single-SQLite-writer invariant. There can never be two processes writing the same database, because there can never be two machines mounting the same volume. Backups are handled by portal_backup.py (SQLite online-backup API), with _snapshot_rotate and a guarded _restore. Do not raise the machine count past one without first moving off SQLite.
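
For readers unfamiliar with the SQLite online-backup API, here is a minimal sketch using Python's sqlite3 module. The function names, the `*.sqlite` snapshot naming and the keep-newest rotation rule are assumptions; they are not the contents of portal_backup.py.

```python
import sqlite3
from pathlib import Path


def online_backup(src_path: Path, dest_path: Path) -> None:
    """Copy a live database page by page without stopping the writer."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # sqlite3's wrapper around the online-backup API
    finally:
        dest.close()
        src.close()


def rotate_snapshots(directory: Path, keep: int) -> list[Path]:
    """Delete all but the newest `keep` snapshots; names are assumed to sort by age."""
    snapshots = sorted(directory.glob("*.sqlite"))
    stale = snapshots[:-keep] if keep > 0 else snapshots
    for path in stale:
        path.unlink()
    return stale
```

Unlike copying the file with cp, the backup API yields a consistent snapshot even if the portal commits a transaction mid-copy.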


Where to go next