

06 — Company-Specific Prep: Signals by Lab

What each lab prioritizes in interviews: Anthropic (alignment, Constitutional AI, "explain why models fail"); OpenAI (scale + product judgment); DeepMind (research depth + math); Google Brain (papers + production); xAI (engineering pragmatism); Cohere / Mistral (multilingual + retrieval).

How to read this file

Each section has three blocks:

  1. What they value: culture and research priorities, distilled from public statements, papers, and engineering blog posts (2024-2026).
  2. What to expect in the loop: the specific questions and formats their interviewers tend to favor.
  3. The lynx-cortex prep: which phase, module, or drill gives the best leverage for that company.

Each section also lists specific topics to be fluent on, with their sources.

These are heuristics from public information, not contracts. Calibrate against the recruiter's brief.


Anthropic

What they value

  • Alignment-first. The published research output is dominated by alignment work: Constitutional AI, RLHF, scalable oversight, interpretability (circuits, sparse autoencoders), red-teaming, model behavior reasoning. If you do not have an opinion on alignment, you will struggle.
  • Honest uncertainty. Anthropic's writing style is deliberately measured — "we don't know yet"-style epistemic humility. Engineers / researchers who project false confidence get filtered.
  • Long-context, careful reasoning. Their published model releases (Claude 3.5, Claude 4, Claude Opus) emphasize reasoning quality over benchmark-chasing. Interviewers care about your reasoning process, not just the answer.
  • Safety as engineering, not theatre. They expect you to think about failure modes concretely — not "AI might be unsafe" but "this specific prompt class causes this specific failure".

What to expect in the loop

  • Phone screen. Standard coding (attention, BPE, or similar) + one ML concept.
  • ML systems round. Often includes a constraint like "design a system that won't accidentally output harmful content" — capacity plus safety.
  • Paper round. Likely a recent Anthropic paper (CAI, Constitutional Classifiers, Influence Functions, Scaling Monosemanticity, Sleeper Agents). Read these.
  • Behavioral round (the Anthropic-distinctive one). Expect questions like:
      • "Tell me about a time a model behaved badly. What did you investigate?"
      • "What is your view on AI safety?"
      • "When have you been wrong about something important?"
      • "Explain why language models hallucinate."
  • Bar raiser / hiring manager. Often probes calibration: "How confident are you in this answer? What would change your mind?"

Specific topics to be fluent on

| Topic | Source |
| --- | --- |
| Constitutional AI | Bai et al. 2022 |
| RLHF / RLAIF | Christiano 2017, Lee 2023 |
| Scalable oversight (debate, weak-to-strong) | Bowman 2022, Burns 2023 |
| Mechanistic interpretability | Olah / Conmy / Bricken (circuits, SAEs) |
| Sleeper Agents / deceptive alignment | Hubinger 2024 |
| Sycophancy and reward hacking | Sharma 2023 |
| Model Spec / behavior policies | Anthropic's published behaviors |

lynx-cortex prep

  • X3 RLHF/DPO module is the highest-leverage prep. Be able to derive DPO from RLHF on a whiteboard (see whiteboard Q14, drill 06).
  • Phase 37 (Security & Safety) + X3 theory/05 (Constitutional AI) for the behavioral round.
  • Behavioral anecdote 8 ("model behaved badly") is the highest-leverage story.
  • Whiteboard Q24 (Constitutional AI) — be able to explain the SL-CAI / RL-CAI distinction.
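
For the DPO whiteboard derivation mentioned above, one compact route from the KL-regularized RLHF objective to the DPO loss is sketched below. The notation (policy π_θ, reference policy π_ref, temperature β, reward r, preferred and rejected responses y_w and y_l, partition function Z) follows the usual presentation of the DPO paper (Rafailov et al. 2023) and is an assumption here, not something this file defines.

```latex
% 1. KL-regularized RLHF objective
\max_{\pi_\theta} \;
  \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)} \big[ r(x, y) \big]
  - \beta \, \mathrm{KL}\big( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big)

% 2. Its closed-form optimum
\pi^*(y \mid x) = \frac{1}{Z(x)} \, \pi_{\mathrm{ref}}(y \mid x) \, \exp\!\big( r(x, y) / \beta \big)

% 3. Solve for the reward in terms of the optimal policy
r(x, y) = \beta \log \frac{\pi^*(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} + \beta \log Z(x)

% 4. Bradley-Terry preference model; the \beta \log Z(x) terms cancel in the difference
p(y_w \succ y_l \mid x) = \sigma\big( r(x, y_w) - r(x, y_l) \big)

% 5. Substitute step 3 into step 4 and take the negative log-likelihood
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  - \mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \Big[ \log \sigma \Big(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \Big) \Big]
```

The step worth saying out loud is the cancellation of Z(x): it is what removes the need for an explicit reward model and for sampling during training.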

OpenAI

What they value

  • Scale and engineering velocity. OpenAI ships fast and at scale; engineers are expected to be product-savvy as well as technically deep.
  • Product judgment. "What should we ship?" is a real interview question. They want builders who think about user value, not just elegant code.
  • Practical scaling laws intuition. Compute, data, parameter budgets, inference economics. OpenAI published the scaling-law work that preceded Chinchilla (Kaplan 2020) and operates at the frontier of inference scaling.
  • More capabilities-focused in public than Anthropic. Their public research mix leans toward capabilities and product (GPT-4 system card, o1 reasoning, Sora, Realtime API) rather than alignment.

What to expect in the loop

  • Phone screen. Coding-heavy. Implement something. Speed matters.
  • ML systems round. Often "design an inference service at our scale" — capacity math from prompt 1 of 02-systems-design-for-llms.md is the bullseye.
  • Coding round. Implement attention / sampling / a primitive in PyTorch. Speed is graded.
  • Product / judgment round. "If you were the PM for ChatGPT, what would you prioritize?" — they want to see opinions backed by data.
  • Research / paper round. May ask about scaling laws, mixture-of-experts, multimodal training.
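
The capacity math behind an "inference service at scale" question usually starts from KV-cache size. The sketch below is a rough first-pass estimate, not a sizing tool: the function names are hypothetical, the model shape in the usage note is an illustrative assumption, and activations, fragmentation, and framework overhead are ignored.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache for one token: a key and a value vector
    per layer per KV head."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def max_concurrent_sequences(gpu_mem_gb, n_params_b, n_layers, n_kv_heads,
                             head_dim, context_len, bytes_per_value=2):
    """Upper bound on full-length sequences that fit beside the weights
    on one device (ignores activations and allocator overhead)."""
    weight_bytes = n_params_b * 1e9 * bytes_per_value
    free_bytes = gpu_mem_gb * 1e9 - weight_bytes
    if free_bytes <= 0:
        return 0
    per_sequence = kv_cache_bytes_per_token(
        n_layers, n_kv_heads, head_dim, bytes_per_value) * context_len
    return int(free_bytes // per_sequence)
```

For a hypothetical 7B-parameter model in fp16 with 32 layers, 8 KV heads, and head dimension 128, `kv_cache_bytes_per_token(32, 8, 128)` gives 131072 bytes (128 KiB) per token, and `max_concurrent_sequences(80, 7, 32, 8, 128, 4096)` gives 122 sequences on an 80 GB device. Stating the assumptions before the number is most of what the interviewer is listening for.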

Specific topics

| Topic | Source |
| --- | --- |
| Scaling laws | Kaplan 2020, Hoffmann 2022, Hoffmann 2024 update |
| RLHF (InstructGPT) | Ouyang 2022 |
| Mixture of experts | Switch Transformer, GShard, GPT-4 rumored architecture |
| Multimodal | GPT-4V system card, Sora technical report |
| Reasoning (o1, o3) | OpenAI o1 / o3 system cards |
| Inference / batching | Continuous batching, speculative decoding |

lynx-cortex prep

  • Phase 33 (inference serving) + drill 12 (continuous batcher) for the systems round.
  • Phase 38 (MLOps) for cost-discipline answers (CpQU framing).
  • Paper-pitch cards 4 (GPT-3), 10 (Chinchilla), 13 (InstructGPT) memorized.
  • Have a product opinion: pick one feature of an OpenAI product, argue why it works or doesn't.
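
To make the continuous-batching idea from the systems prep concrete, here is a toy simulation that counts decode steps under static and continuous batching. It is a sketch under strong assumptions: each request is reduced to a number of output tokens (at least 1), prefill and memory limits are ignored, and both function names are hypothetical.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Each batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """A freed slot is refilled from the queue before the next decode step."""
    queue = deque(lengths)
    active = []          # remaining tokens for each in-flight request
    steps = 0
    while queue or active:
        # Admit waiting requests into any free slots.
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        steps += 1       # one decode step emits one token per active request
        # Requests with one token left finish on this step and leave the batch.
        active = [remaining - 1 for remaining in active if remaining > 1]
    return steps
```

With `lengths = [8, 2, 2, 2]` and `batch_size = 2`, static batching takes 10 steps and continuous batching takes 8, because the short requests no longer wait behind the long one.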

Google DeepMind

What they value

  • Research depth. DeepMind has the strongest research-publication culture of any major lab. Even engineering roles are expected to read papers fluently.
  • Math. RL, theory, optimization, information theory. The math-fluency bar is the highest among the labs.
  • Long-horizon problems. AlphaFold, AlphaProof, AlphaGeometry — DeepMind chases hard, long-running scientific goals.
  • Production rigor. Now merged with Google Brain — production scale is also part of the bar.

What to expect in the loop

  • Phone screen. Coding + research-paper short discussion.
  • Research round (for research scientist and research engineer roles). Present your prior work; expect deep probing on methodology, ablations, and "what would you do differently".
  • Math / theory round. Optimization, information theory, RL fundamentals. Be ready to derive policy gradient on a whiteboard.
  • Coding round. Standard.
  • Paper round. Likely a DeepMind paper — Chinchilla, Flamingo, Gemini, Gato, AlphaFold, AlphaProof, Gemma.
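
For the math round, the policy-gradient derivation fits in four lines. The sketch below uses the standard trajectory notation (trajectory τ, return R(τ), policy π_θ), which is an assumption here rather than notation from this file.

```latex
% Objective: expected return over trajectories sampled from the policy
J(\theta) = \mathbb{E}_{\tau \sim p_\theta} \big[ R(\tau) \big]
          = \int p_\theta(\tau) \, R(\tau) \, d\tau

% Log-derivative trick: \nabla p = p \, \nabla \log p
\nabla_\theta J(\theta)
  = \int p_\theta(\tau) \, \nabla_\theta \log p_\theta(\tau) \, R(\tau) \, d\tau
  = \mathbb{E}_{\tau \sim p_\theta} \big[ \nabla_\theta \log p_\theta(\tau) \, R(\tau) \big]

% Trajectory likelihood: only the policy terms depend on \theta
p_\theta(\tau) = p(s_0) \prod_{t=0}^{T-1} \pi_\theta(a_t \mid s_t) \, p(s_{t+1} \mid s_t, a_t)
\;\Rightarrow\;
\nabla_\theta \log p_\theta(\tau) = \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)

% REINFORCE estimator
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim p_\theta} \Big[ \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, R(\tau) \Big]
```

A natural follow-up is variance reduction: subtracting a baseline from R(τ) leaves the expectation unchanged because the expected score function is zero.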

Specific topics

| Topic | Source |
| --- | --- |
| Chinchilla scaling | Hoffmann 2022 |
| RL fundamentals | Sutton & Barto; PPO; SAC; DQN |
| Distributed training | Megatron, JAX/Flax, DeepSpeed |
| AlphaGo lineage | Silver 2016, AlphaZero, MuZero |
| Gemini architecture | Gemini technical report |

lynx-cortex prep

  • Phase 04 (calculus & optimization) is the highest leverage. Derive everything.
  • Phase 19 (training dynamics) — DeepMind cares about training stability.
  • X3 module — RL fundamentals (theory/01).
  • Paper-pitch card 12 (Chinchilla) memorized; be able to derive the scaling law fit.
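
For the Chinchilla fit, it helps to have the parametric loss and its compute-optimal allocation in runnable form. This is a minimal sketch, assuming the parametric form L(N, D) = E + A/N^α + B/D^β and the approximation C ≈ 6ND. The default constants are the parametric-fit values commonly quoted from Hoffmann et al. 2022; check them against the paper before relying on them. Function names are hypothetical.

```python
def chinchilla_loss(n_params, n_tokens,
                    E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Parametric loss L(N, D) = E + A / N**alpha + B / D**beta."""
    return E + A / n_params ** alpha + B / n_tokens ** beta


def compute_optimal_allocation(flops, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Minimize L(N, D) subject to flops = 6 * N * D.

    Setting dL/dN = 0 with D = flops / (6 N) gives
    N_opt = G * (flops / 6) ** (beta / (alpha + beta)), with
    G = (alpha * A / (beta * B)) ** (1 / (alpha + beta)).
    """
    G = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    n_exponent = beta / (alpha + beta)
    d_exponent = alpha / (alpha + beta)
    n_opt = G * (flops / 6.0) ** n_exponent
    d_opt = (flops / 6.0) ** d_exponent / G
    return n_opt, d_opt
```

Calling `compute_optimal_allocation(1e21)` returns a parameter count and token count whose product times 6 recovers the budget; moving parameters up or down at fixed compute raises `chinchilla_loss`, which is the whole argument of the paper in one check.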

Google Brain (within Google DeepMind)

What they value

  • Paper-prolific: historically one of the most prolific publishing teams in AI.
  • Production scale — they ship to Google Search, Workspace, Pixel.
  • Eclectic — covers vision, NLP, robotics, healthcare AI, hardware (TPU). Bring your specialty.
  • Merged organizationally with DeepMind in 2023, so some signals overlap.

What to expect in the loop

  • Similar to DeepMind but with more emphasis on production infrastructure (TPU, JAX, GShard).
  • More likely than DeepMind to ask about MLOps, multi-tenant serving, latency tuning.

Specific topics

| Topic | Source |
| --- | --- |
| Transformer (original) | Vaswani 2017 |
| BERT / T5 / PaLM | Devlin 2018, Raffel 2019, Chowdhery 2022 |
| Mixture of experts | Shazeer 2017, Switch Transformer |
| Pathways / JAX | Pathways paper, JAX docs |
| TPU architecture | TPU papers, MLPerf submissions |

lynx-cortex prep

  • Same as DeepMind, plus Phase 33 (inference serving) and Phase 35 (distributed).
  • Paper-pitch card 1 (Attention) memorized.

xAI

What they value

  • Engineering pragmatism. Less paper-prolific than DeepMind, more focused on training & deploying at scale fast.
  • Hardware sense. Grok-3 was trained on Colossus, xAI's Memphis cluster (100k+ H100s). They want engineers who understand GPU networking, NCCL, RDMA.
  • Iteration speed. xAI shipped Grok, Grok-1.5, Grok-2, Grok-3 quickly. They reward people who ship.
  • Pragmatic alignment. Less focused on theoretical alignment than Anthropic; more "we shipped a product, here are the guardrails".

What to expect in the loop

  • Phone screen. Coding-heavy.
  • Systems round. Likely "how would you train a frontier model on N GPUs" — capacity math, distributed strategies (FSDP, ZeRO, tensor / pipeline / sequence parallelism), failure handling.
  • Coding round. Implementation under time pressure.
  • Culture round. Less STAR, more "tell us what excites you about Grok".
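
For the "train on N GPUs" question, the first number to produce is model-state memory per GPU. The sketch below follows the accounting in the ZeRO paper for mixed-precision Adam (2 bytes of fp16 weights, 2 bytes of fp16 gradients, and 12 bytes of fp32 optimizer state per parameter). The function name is hypothetical, and activations, buffers, and fragmentation are left out.

```python
def zero_model_state_bytes_per_gpu(n_params, n_gpus, stage):
    """Model-state memory per GPU under data parallelism with ZeRO.

    stage 0: plain data parallelism, everything replicated
    stage 1: optimizer state sharded
    stage 2: optimizer state and gradients sharded
    stage 3: optimizer state, gradients, and weights sharded
    """
    weights = 2 * n_params    # fp16 parameters
    grads = 2 * n_params      # fp16 gradients
    optim = 12 * n_params     # fp32 master weights, Adam momentum and variance
    if stage == 0:
        return weights + grads + optim
    if stage == 1:
        return weights + grads + optim / n_gpus
    if stage == 2:
        return weights + (grads + optim) / n_gpus
    if stage == 3:
        return (weights + grads + optim) / n_gpus
    raise ValueError("stage must be 0, 1, 2, or 3")
```

For a 7.5B-parameter model on 64 GPUs this gives roughly 120 GB, 31.4 GB, 16.6 GB, and 1.9 GB for stages 0 through 3, which matches the worked example in the ZeRO paper. Tensor, pipeline, and sequence parallelism change the picture further and are not modeled here.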

Specific topics

| Topic | Source |
| --- | --- |
| Distributed training | Megatron, FSDP, ZeRO, DeepSpeed |
| NCCL / RDMA / InfiniBand | NVIDIA networking docs |
| FlashAttention | Dao 2022, FA-2, FA-3 |
| Llama-style architectures | Llama 2, Llama 3 technical reports |
| Grok system cards | xAI published model docs |

lynx-cortex prep

  • Phase 35 (distributed) is the highest leverage.
  • Phase 23 + 24 (GPU + CUDA / Triton) — be able to write a Triton kernel sketch.
  • Drill 03 (gradient checkpointing) and drill 12 (continuous batcher).
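
Before attempting a Triton kernel sketch, it is worth being able to write the streaming softmax that FlashAttention-style kernels build on. The version below is plain Python for clarity, not a kernel: it keeps a running maximum and a rescaled running sum so the input can be consumed in one pass. The function name is hypothetical.

```python
import math


def online_softmax(xs):
    """Numerically stable softmax using a single streaming pass
    for the normalizer (running max plus rescaled running sum)."""
    running_max = float("-inf")
    running_sum = 0.0
    for x in xs:
        new_max = max(running_max, x)
        # Rescale the old sum to the new maximum, then add the new term.
        running_sum = (running_sum * math.exp(running_max - new_max)
                       + math.exp(x - new_max))
        running_max = new_max
    return [math.exp(x - running_max) / running_sum for x in xs]
```

In a tiled attention kernel the same rescaling is applied to the partial output accumulator whenever a new tile raises the running maximum, which is what lets the kernel avoid materializing the full score matrix.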

Cohere

What they value

  • Enterprise multilingual. Cohere's positioning is API-first, business-facing, multilingual-strong.
  • Retrieval excellence. Their Embed and Rerank models are widely regarded as strong choices for production RAG use cases.
  • Practical / deployable. Less "frontier model race", more "make production retrieval work".

What to expect in the loop

  • Strong emphasis on RAG / retrieval systems design.
  • Multilingual tokenization and evaluation questions.
  • Enterprise integration — authn/authz, multi-tenancy, data residency.

Specific topics

| Topic | Source |
| --- | --- |
| Dense retrieval | DPR, BGE, E5, Cohere Embed v3 |
| Reranking | Cross-encoder, Cohere Rerank |
| Hybrid retrieval | BM25 + dense fusion |
| Multilingual tokenization | XLM-R, Aya |
| Long-context retrieval | Recursive retrieval, hierarchical RAG |
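
The hybrid retrieval entry (BM25 + dense fusion) can be illustrated with reciprocal rank fusion, one common way to merge ranked lists without calibrating scores. This is a sketch of that one technique, not a claim about how any particular vendor fuses results. The function name is hypothetical, and k = 60 is the constant suggested in the original RRF paper.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document id for determinism.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

With a BM25 ranking `["a", "b", "c"]` and a dense ranking `["c", "a", "d"]`, the fused order is `["a", "c", "b", "d"]`: documents found by both retrievers rise above those found by only one.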

lynx-cortex prep

  • Phase 29 (RAG) is the highest leverage.
  • Phase 11 (tokenization) with attention to multilingual ratios.
  • Whiteboard Q19 (RAG) and Q20 (vocab) memorized.
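
One quick way to build intuition for multilingual tokenization cost, assuming "multilingual ratios" refers to how many units a tokenizer spends per character or word in each language, is to look at UTF-8 bytes per character. It is only a proxy for byte-level tokenizers before any merges are learned; real fertility numbers require running the actual tokenizer. The function name is hypothetical.

```python
def utf8_bytes_per_char(text):
    """Average number of UTF-8 bytes per character in a non-empty string."""
    if not text:
        raise ValueError("text must be non-empty")
    return len(text.encode("utf-8")) / len(text)


# Illustrative samples: the same greeting in three scripts.
samples = {
    "English": "hello",
    "Russian": "привет",
    "Japanese": "こんにちは",
}
ratios = {language: utf8_bytes_per_char(text) for language, text in samples.items()}
```

Here `ratios` comes out as 1.0 for English, 2.0 for Russian, and 3.0 for Japanese, which is a useful starting point when explaining why a byte-level vocabulary trained mostly on English can make other scripts more expensive per sentence.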

Mistral

What they value

  • Open-weight pragmatism. Mistral has shipped strong open weights (Mistral 7B, Mixtral, Mistral Large). They blend French research culture with European startup intensity.
  • Efficient architectures. Sliding-window attention, mixture of experts, grouped-query attention. Mistral's reputation is "smart architecture choices, not just bigger".
  • Multilingual. Especially European languages.

What to expect in the loop

  • Strong focus on architecture choices and ablations.
  • "Why GQA vs MHA vs MQA" is a typical question.
  • Open-weight ecosystem fluency (Hugging Face, llama.cpp, vLLM).
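
The "GQA vs MHA vs MQA" question comes down to how many key/value heads the query heads share. The sketch below shows the head mapping and the resulting KV-cache size relative to multi-head attention. Both function names are hypothetical, and the head counts in the usage note are illustrative.

```python
def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Index of the KV head that a given query head reads from.

    n_kv_heads == n_q_heads  -> multi-head attention (MHA)
    n_kv_heads == 1          -> multi-query attention (MQA)
    anything in between      -> grouped-query attention (GQA)
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be divisible by n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def kv_cache_ratio_vs_mha(n_q_heads, n_kv_heads):
    """KV-cache size as a fraction of the MHA cache for the same model shape."""
    return n_kv_heads / n_q_heads
```

With 8 query heads and 2 KV heads, the mapping is `[0, 0, 0, 0, 1, 1, 1, 1]` and the cache is 0.25 of the MHA size. The trade-off to articulate is memory and bandwidth saved at decode time against the quality lost by sharing keys and values, with GQA sitting between the two extremes.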

Specific topics

| Topic | Source |
| --- | --- |
| Sliding-window attention | Mistral 7B paper |
| Mixture of experts | Mixtral paper |
| Grouped-query attention | Ainslie 2023 |
| Mistral Large | Mistral technical reports |
| Open-weight tooling | Hugging Face, vLLM, llama.cpp |

lynx-cortex prep

  • Phase 27 (modern attention) for sliding-window and FlashAttention.
  • Phase 36 (frontier architectures) for MoE.
  • Paper-pitch card 15 (Mistral 7B) memorized.

Cross-company quick-reference matrix

Topic Anthropic OpenAI DeepMind Brain xAI Cohere Mistral
Alignment depth ★★★ ★★ ★★ ★★
Math depth ★★ ★★ ★★★ ★★ ★★ ★★ ★★
Distributed training ★★ ★★★ ★★★ ★★★ ★★★ ★★
Production scale ★★ ★★★ ★★ ★★★ ★★★ ★★★ ★★
Retrieval ★★★
Open-weights / ecosystem ★★ ★★★
Constitutional AI / Safety ★★★ ★★ ★★ ★★

A note on tone

Each lab has a tone you should match. Read 3-5 of their published blog posts before your interview. Anthropic: measured and uncertainty-aware. OpenAI: confident and product-forward. DeepMind: scholarly. xAI: irreverent. Cohere/Mistral: pragmatic. Mirroring the tone signals fit.


→ Move on to the lab files: ../lab/00-mock-interview-checklist.md, ../lab/01-paper-pitch-cards.md.