00 — The 2026 AI-Lab Interview Landscape

Map of the interview loop at AI labs in 2026: phone screen, ML systems, coding, paper read, design, behavioral. Each lynx-cortex lab maps to one or more round types.

The standard loop (Anthropic / OpenAI / Google DeepMind / xAI / Cohere / Mistral)

A 2026 AI-engineer / research-engineer loop is 5 to 7 hours, typically split across two days:

| Round | Duration | What is graded |
|---|---|---|
| 0. Recruiter screen | 30 min | Resume signal, role/team fit, comp expectations |
| 1. Phone / video screen | 60 min | One ML concept question + one coding question (LeetCode-medium or implement-a-primitive) |
| 2. ML systems | 60 min | Open-ended: "design a chatbot for 10M DAU", capacity math, failure modes |
| 3. Coding (implementation) | 60 min | Implement attention / BPE / LoRA / DPO loss in NumPy or PyTorch from scratch |
| 4. Paper / depth | 45-60 min | Read a recent paper (sometimes pre-shared, sometimes cold), discuss strengths/weaknesses |
| 5. Behavioral / values | 45 min | STAR-format anecdotes; for Anthropic, expect "what is your view on AI safety" |
| 6. Hiring manager / bar raiser | 30-45 min | Calibration, sells you on the team, surfaces concerns |

For research scientist roles, swap round 3 for a research-project deep dive (45 minutes of you presenting prior work, 45 minutes of them probing it).

What each round actually screens for

Phone screen — baseline competence

  • Not a hard filter on cleverness; it filters out candidates who don't know what attention is.
  • Coding portion is "can you write code that compiles" not "can you golf this in 5 lines".
  • Failure mode: spending 45 minutes on the coding half and 5 on the concept. They are equal-weight.

ML systems — can you reason under partial information

  • They will not give you full requirements. You must elicit them.
  • The graded skill: capacity math (Little's law, tokens/s, GPU memory budget), failure-mode enumeration (OOM, NCCL hang, thermal throttle), cost discipline (CpQU — cost per quality unit).
  • See theory/02-systems-design-for-llms.md for the 5 canonical prompts.
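The capacity math named above can be sketched in a few lines. This is a back-of-envelope illustration for the "chatbot for 10M DAU" prompt, not a sizing method taken from the theory files: the requests per user, peak factor, latency, context length, and model shape are all assumed numbers, and `concurrent_requests` and `kv_cache_bytes_per_token` are hypothetical helper names.

```python
# Back-of-envelope capacity math for an LLM serving prompt.
# Every input below is an illustrative assumption, not a measurement.

def concurrent_requests(arrival_rate_per_s: float, latency_s: float) -> float:
    """Little's law: average in-flight requests L = lambda * W."""
    return arrival_rate_per_s * latency_s

def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV-cache footprint of one token: keys and values (factor 2) per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

dau = 10_000_000                 # from the prompt
requests_per_user_per_day = 10   # assumed
peak_to_average = 3.0            # assumed peak factor
latency_s = 8.0                  # assumed end-to-end time per request

avg_rps = dau * requests_per_user_per_day / 86_400
peak_rps = avg_rps * peak_to_average
in_flight = concurrent_requests(peak_rps, latency_s)

# Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128, fp16 cache.
per_token = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)
context_tokens = 2048            # assumed average live context per request
kv_gib = in_flight * context_tokens * per_token / 2**30

print(f"peak requests/s:    {peak_rps:,.0f}")
print(f"in-flight requests: {in_flight:,.0f}")
print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"total KV cache:     {kv_gib:,.0f} GiB")
```

Under these assumed inputs the sketch gives roughly 27,800 in-flight requests at peak and about 6,900 GiB of KV cache, which is the kind of number that leads into a GPU count and a cost discussion. In an interview, stating each assumption aloud matters as much as the arithmetic.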

Coding — implementation muscle

  • The "depth filter". Many candidates know attention conceptually but cannot write softmax(Q @ K.T / sqrt(d_k)) @ V from a blank file in 20 minutes.
  • Whether they let you use PyTorch varies. Anthropic phone screens often demand NumPy; on-sites allow PyTorch.
  • See theory/04-coding-drills.md for the 12 staple drills.
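For reference, the expression softmax(Q @ K.T / sqrt(d_k)) @ V can be written out in NumPy roughly as follows. This is a minimal single-head sketch, not the reference solution from the drills file: the function names, the optional causal mask, and the choice to return the attention weights are all assumptions made for illustration.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax: subtract the max before exponentiating."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray, causal: bool = False):
    """Scaled dot-product attention for arrays shaped (..., seq, d)."""
    d_k = Q.shape[-1]
    # Swap the last two axes of K so batched inputs also work.
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d_k)
    if causal:
        t_q, t_k = scores.shape[-2], scores.shape[-1]
        # True above the diagonal marks future positions to hide.
        future = np.triu(np.ones((t_q, t_k), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, weights = attention(Q, K, V, causal=True)
print(out.shape, weights.sum(axis=-1))
```

Two details are worth being able to explain when asked: the max subtraction in `softmax` prevents overflow in `exp`, and with the causal mask the first query position can only attend to itself, so its output row equals `V[0]`.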

Paper read — can you read like a researcher

  • The graded skill: in 20 minutes, identify (a) the claim, (b) the method, (c) what is not tested.
  • They want to hear "the ablation on §5.2 doesn't control for tokenizer differences", not "the paper is impressive".
  • See theory/03-paper-read-drill.md for the 20-minute protocol on 4 canonical papers.

Behavioral — taste, judgment, and ownership

  • At Anthropic specifically: expect "tell me about a time a model behaved badly and you investigated".
  • STAR is the format, but the substance is the tradeoffs you made. "I shipped X" is half the answer; "I shipped X instead of Y because Z" is the full one.
  • See theory/05-behavioral-and-storytelling.md for 10 pre-written anecdotes from the lynx-cortex journey.

How lynx-cortex labs map to interview rounds

| Lab / Phase artifact | Interview round it prepares |
|---|---|
| Phase 04 — Calculus & Optimization | Phone screen (gradient questions); paper round (DPO derivation) |
| Phase 07 — Scalar autograd | Coding round (implement backward) |
| Phase 08 — Tensor autograd | Coding round (broadcasting, shape bugs) |
| Phase 11 — Tokenization BPE | Coding round (drill 02) |
| Phase 15 — Attention | Coding round (drill 01), ML-systems (attention cost math) |
| Phase 16 — Positional encodings | Coding round (RoPE drill 10) |
| Phase 17 — Mini-GPT | Paper round (Vaswani 2017) |
| Phase 19 — Training dynamics | Behavioral ("hard debug") |
| Phase 20 — Evaluation harness | ML-systems (offline vs online eval) |
| Phase 21 — Inference / sampling | Coding round (top-p drill 07) |
| Phase 22 — KV cache | Coding round (drill 04), ML-systems (memory budget) |
| Phase 26 — Quantization | ML-systems (cost per token) |
| Phase 27 — Modern attention | Coding round (FlashAttention conceptual) |
| Phase 28 — LoRA / QLoRA | Coding round (drill 05) |
| Phase 29 — RAG | ML-systems prompt 3 |
| Phase 32 — Agents | ML-systems prompt 2 |
| Phase 33 — Inference serving | ML-systems prompt 1, coding drill 12 (continuous batcher) |
| Phase 34 — Observability & cost | ML-systems (CpQU) |
| Phase 35 — Distributed | ML-systems (NCCL deadlock) |
| Phase 37 — Security & safety | Anthropic behavioral round |
| Phase 38 — MLOps | ML-systems (multi-tenant fine-tuning) |
| X3 — RLHF / DPO | Coding drill 06 (DPO loss), paper round (Rafailov 2023) |
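To make one mapped drill concrete, here is a minimal NumPy sketch of the DPO loss from Rafailov et al. (2023). It is not the content of drill 06, which is not reproduced here: the function name `dpo_loss`, the argument names, and the default `beta=0.1` are assumptions, and the inputs are assumed to be per-example summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model.

```python
import numpy as np

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1) -> float:
    """Mean DPO loss over a batch of preference pairs."""
    policy_chosen_logp = np.asarray(policy_chosen_logp, dtype=float)
    policy_rejected_logp = np.asarray(policy_rejected_logp, dtype=float)
    ref_chosen_logp = np.asarray(ref_chosen_logp, dtype=float)
    ref_rejected_logp = np.asarray(ref_rejected_logp, dtype=float)

    # Log-ratio of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)

    # -log(sigmoid(x)) == log(1 + exp(-x)) == logaddexp(0, -x), which is stable.
    return float(np.mean(np.logaddexp(0.0, -logits)))

# Policy identical to the reference: the loss is log(2).
baseline = dpo_loss([-5.0], [-6.0], [-5.0], [-6.0])
# Policy raises the chosen response and lowers the rejected one: loss drops.
improved = dpo_loss([-4.0], [-7.0], [-5.0], [-6.0])
print(baseline, improved)
```

A common follow-up is why `logaddexp` is used instead of `-np.log(sigmoid(x))`: the direct form underflows to `log(0)` for large negative margins, while `logaddexp` stays finite.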

What this module does not cover

  • LeetCode grinding. Use NeetCode 150 separately if your role demands DSA.
  • Comp negotiation. Use levels.fyi and a recruiter friend.
  • Visa / immigration. Out of scope.

Next file

01-whiteboard-ml-questions.md: 25 questions, 3-paragraph answers, 3-level follow-up trees.