06 - Company-Specific Prep: Signals by Lab

Summary: what each lab prioritizes in interviews. Anthropic (alignment, Constitutional AI, "explain why models fail"); OpenAI (scale + product judgment); DeepMind (research depth + math); Google Brain (papers + production); xAI (engineering pragmatism); Cohere / Mistral (multilingual + retrieval).

How to read this file

Each section has three blocks:

1. What they value: culture and research priorities, distilled from public statements, papers, and engineering blog posts (2024-2026).
2. What to expect in the loop: the specific questions and formats their interviewers tend to favor.
3. The lynx-cortex prep: which phase, module, or drill gives the best leverage for that company.

These are heuristics from public information, not contracts. Calibrate against the recruiter's brief.

Anthropic

What they value

Alignment-first. The published research output is dominated by alignment work: Constitutional AI, RLHF, scalable oversight, interpretability (circuits, sparse autoencoders), red-teaming, model behavior reasoning. If you do not have an opinion on alignment, you will struggle.
Honest uncertainty. Anthropic's writing style is deliberately measured — "we don't know yet"-style epistemic humility. Engineers / researchers who project false confidence get filtered.
Long-context, careful reasoning. Their published model releases (the Claude 3.5 and Claude 4 model families) emphasize reasoning quality over benchmark-chasing. Interviewers care about your reasoning process, not just the answer.
Safety as engineering, not theatre. They expect you to think about failure modes concretely — not "AI might be unsafe" but "this specific prompt class causes this specific failure".

What to expect in the loop

Phone screen. Standard coding (attention, BPE, or similar) + one ML concept.
ML systems round. Often includes a constraint like "design a system that won't accidentally output harmful content" — capacity plus safety.
Paper round. Likely a recent Anthropic paper (CAI, Constitutional Classifiers, Influence Functions, Scaling Monosemanticity, Sleeper Agents). Read these.
Behavioral round — the Anthropic-distinctive one. Expect questions like:
"Tell me about a time a model behaved badly. What did you investigate?"
"What is your view on AI safety?"
"When have you been wrong about something important?"
"Explain why language models hallucinate."
Bar raiser / hiring manager. Often probes calibration: "How confident are you in this answer? What would change your mind?"

Specific topics to be fluent on

| Topic | Source |
| --- | --- |
| Constitutional AI | Bai et al. 2022 |
| RLHF / RLAIF | Christiano 2017, Lee 2023 |
| Scalable oversight (debate, weak-to-strong) | Bowman 2022, Burns 2023 |
| Mechanistic interpretability | Olah / Conmy / Bricken (circuits, SAEs) |
| Sleeper Agents / deceptive alignment | Hubinger 2024 |
| Sycophancy and reward hacking | Sharma 2023 |
| Behavior policies | Claude's Constitution (Anthropic's published behavior guidance); OpenAI's Model Spec for contrast |

lynx-cortex prep

X3 RLHF/DPO module is the highest-leverage prep. Be able to derive DPO from RLHF on a whiteboard (see whiteboard Q14, drill 06).
Phase 37 (Security & Safety) + X3 theory/05 (Constitutional AI) for the behavioral round.
Behavioral anecdote 8 ("model behaved badly") is the highest-leverage story.
Whiteboard Q24 (Constitutional AI) — be able to explain the SL-CAI / RL-CAI distinction.
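
The DPO-from-RLHF derivation mentioned above can be sketched in five lines. This is a compressed outline of the standard argument (KL-regularized objective, closed-form optimum, Bradley-Terry preference model), not a substitute for the full proof. The symbols $\beta$, $\pi_{\mathrm{ref}}$, $Z(x)$, $y_w$, and $y_l$ follow common usage in the DPO literature and are not defined elsewhere in this file.

```latex
\begin{align}
% 1. KL-regularized RLHF objective
\max_{\pi}\;\; & \mathbb{E}_{x \sim \mathcal{D}}\Big[\,
    \mathbb{E}_{y \sim \pi(\cdot \mid x)}\big[r(x, y)\big]
    - \beta\, \mathrm{KL}\big(\pi(\cdot \mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot \mid x)\big)\Big] \\
% 2. Its closed-form optimum, with Z(x) the partition function
\pi^{*}(y \mid x) &= \frac{1}{Z(x)}\, \pi_{\mathrm{ref}}(y \mid x)\,
    \exp\!\Big(\tfrac{1}{\beta}\, r(x, y)\Big) \\
% 3. Invert to express the reward through the policy
r(x, y) &= \beta \log \frac{\pi^{*}(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} + \beta \log Z(x) \\
% 4. Bradley-Terry preferences depend only on reward differences, so Z(x) cancels
p(y_w \succ y_l \mid x) &= \sigma\big(r(x, y_w) - r(x, y_l)\big) \\
% 5. Maximum likelihood on preference pairs gives the DPO loss
\mathcal{L}_{\mathrm{DPO}}(\theta) &= -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\Big[\log \sigma\Big(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\Big)\Big]
\end{align}
```

The step interviewers tend to probe is the cancellation of $Z(x)$ in line 4: it is what removes the need for an explicit reward model or sampling from the policy during training.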

OpenAI

What they value

Scale and engineering velocity. OpenAI ships fast and at scale; engineers are expected to be product-savvy as well as technically deep.
Product judgment. "What should we ship?" is a real interview question. They want builders who think about user value, not just elegant code.
Practical scaling-law intuition. Compute, data, parameter budgets, inference economics. OpenAI published the original neural scaling laws (Kaplan 2020), which Chinchilla (Hoffmann 2022) later revised, and it operates at the frontier of inference scaling.
More capabilities-facing than Anthropic in public. Their published output leans toward capabilities and product (GPT-4 system card, o1 reasoning, Sora, Realtime API) rather than alignment research.

What to expect in the loop

Phone screen. Coding-heavy. Implement something. Speed matters.
ML systems round. Often "design an inference service at our scale" — capacity math from prompt 1 of 02-systems-design-for-llms.md is the bullseye.
Coding round. Implement attention / sampling / a primitive in PyTorch. Speed is graded.
Product / judgment round. "If you were the PM for ChatGPT, what would you prioritize?" — they want to see opinions backed by data.
Research / paper round. May ask about scaling laws, mixture-of-experts, multimodal training.
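
For the coding round, a sampling primitive is a plausible ask. Below is a minimal sketch of temperature plus top-p (nucleus) sampling in plain Python, so that it runs without PyTorch; in an interview you would likely write the tensor version. The function names `softmax`, `top_p_filter`, and `sample_token` are hypothetical, chosen for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp to avoid overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose mass reaches top_p,
    zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token id from logits after temperature and top-p filtering."""
    probs = top_p_filter(softmax(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_token([2.0, 1.0, 0.1], temperature=0.8, top_p=0.9, rng=rng))
```

Points worth saying out loud while coding: the max-subtraction for numerical stability, that the token crossing the `top_p` threshold is kept rather than dropped, and that a very small `top_p` degenerates to greedy decoding.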

Specific topics

| Topic | Source |
| --- | --- |
| Scaling laws | Kaplan 2020, Hoffmann 2022, Hoffmann 2024 update |
| RLHF (InstructGPT) | Ouyang 2022 |
| Mixture of experts | Switch Transformer, GShard, GPT-4 rumored architecture |
| Multimodal | GPT-4V system card, Sora technical report |
| Reasoning (o1, o3) | OpenAI o1 / o3 system cards |
| Inference / batching | Continuous batching, speculative decoding |

lynx-cortex prep

Phase 33 (inference serving) + drill 12 (continuous batcher) for the systems round.
Phase 38 (MLOps) for cost-discipline answers (CpQU framing).
Paper-pitch cards 4 (GPT-3), 10 (Chinchilla), 13 (InstructGPT) memorized.
Have a product opinion: pick one feature of an OpenAI product, argue why it works or doesn't.
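
To make the continuous-batcher drill concrete, here is a toy step-count comparison between static and continuous batching. It is a sketch under strong simplifying assumptions: every decode step costs the same, prefill is ignored, and each request is described only by the number of tokens it still has to generate (at least 1). Both function names are hypothetical.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each fixed batch runs until its longest request ends."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Decode steps when freed slots are refilled from the queue every step."""
    queue = deque(lengths)
    active = []  # tokens remaining for each in-flight request
    steps = 0
    while queue or active:
        # Admit waiting requests into any free slots before the next step.
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        steps += 1
        # Each active request emits one token; finished requests leave the batch.
        active = [r - 1 for r in active if r > 1]
    return steps


if __name__ == "__main__":
    lengths = [2, 8, 2, 8]
    print(static_batching_steps(lengths, batch_size=2))      # 16
    print(continuous_batching_steps(lengths, batch_size=2))  # 12
```

The gap comes from short requests no longer waiting on the longest request in their batch. A fuller answer would add prefill cost, a KV-cache memory limit on admission, and preemption.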

Google DeepMind

What they value

Research depth. DeepMind has the strongest research-publication culture of any major lab. Even engineering roles are expected to read papers fluently.
Math. RL, theory, optimization, information theory. Of the labs covered here, DeepMind sets the highest bar for math fluency.
Long-horizon problems. AlphaFold, AlphaProof, AlphaGeometry: DeepMind chases hard, long-running scientific goals.
Production rigor. Since the 2023 merger with Google Brain, production scale is also part of the bar.

What to expect in the loop

Phone screen. Coding + research-paper short discussion.
Research round (for research scientist and research engineer roles). Present your prior work; expect deep probing on methodology, ablations, and "what would you do differently".
Math / theory round. Optimization, information theory, RL fundamentals. Be ready to derive policy gradient on a whiteboard.
Coding round. Standard.
Paper round. Likely a DeepMind paper — Chinchilla, Flamingo, Gemini, Gato, AlphaFold, AlphaProof, Gemma.
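
For the math round, the policy-gradient derivation fits in four lines. This is a sketch of the standard REINFORCE argument using the log-derivative trick; the notation ($\tau$ for a trajectory, $\rho_0$ for the initial-state distribution, $P$ for the environment dynamics, $R(\tau)$ for the return) is assumed here and not defined elsewhere in this file.

```latex
\begin{align}
% 1. Objective: expected return under the policy's trajectory distribution
J(\theta) &= \mathbb{E}_{\tau \sim p_\theta}\big[R(\tau)\big]
           = \int p_\theta(\tau)\, R(\tau)\, d\tau \\
% 2. Log-derivative trick: \nabla p = p \nabla \log p
\nabla_\theta J(\theta) &= \int p_\theta(\tau)\, \nabla_\theta \log p_\theta(\tau)\, R(\tau)\, d\tau
           = \mathbb{E}_{\tau \sim p_\theta}\big[\nabla_\theta \log p_\theta(\tau)\, R(\tau)\big] \\
% 3. Factor the trajectory; only the policy terms depend on \theta
\log p_\theta(\tau) &= \log \rho_0(s_0)
           + \sum_{t=0}^{T-1} \Big[\log \pi_\theta(a_t \mid s_t) + \log P(s_{t+1} \mid s_t, a_t)\Big] \\
% 4. Result: the dynamics drop out of the gradient
\nabla_\theta J(\theta) &= \mathbb{E}_{\tau \sim p_\theta}\Big[
           \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)\Big]
\end{align}
```

Natural follow-ups are variance reduction (subtracting a baseline, using reward-to-go instead of $R(\tau)$) and why the estimator stays unbiased when a state-dependent baseline is subtracted.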

Specific topics

| Topic | Source |
| --- | --- |
| Chinchilla scaling | Hoffmann 2022 |
| RL fundamentals | Sutton & Barto; PPO; SAC; DQN |
| Distributed training | Megatron, JAX/Flax, DeepSpeed |
| AlphaGo lineage | Silver 2016, AlphaZero, MuZero |
| Gemini architecture | Gemini technical report |

lynx-cortex prep

Phase 04 (calculus & optimization) is the highest leverage. Derive everything.
Phase 19 (training dynamics) — DeepMind cares about training stability.
X3 module — RL fundamentals (theory/01).
Paper-pitch card 12 (Chinchilla) memorized; be able to derive the scaling law fit.
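
One way to present the scaling-law derivation is through the parametric loss form used in Hoffmann 2022, minimized under a fixed compute budget. The sketch below assumes the common approximation $C \approx 6ND$ for training FLOPs; $E$, $A$, $B$, $\alpha$, and $\beta$ are fitted constants whose values are not reproduced here.

```latex
\begin{align}
% 1. Parametric loss in parameters N and training tokens D
L(N, D) &= E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}} \\
% 2. Substitute the compute constraint D = C / (6N)
L(N) &= E + A\, N^{-\alpha} + B \left(\frac{6N}{C}\right)^{\beta} \\
% 3. Set the derivative with respect to N to zero
0 &= -\alpha A\, N^{-\alpha - 1} + \beta B \left(\frac{6}{C}\right)^{\beta} N^{\beta - 1} \\
% 4. Solve for the compute-optimal model size and token count
N_{\mathrm{opt}}(C) &= G \left(\frac{C}{6}\right)^{\frac{\beta}{\alpha + \beta}}, \qquad
D_{\mathrm{opt}}(C) = G^{-1} \left(\frac{C}{6}\right)^{\frac{\alpha}{\alpha + \beta}}, \qquad
G = \left(\frac{\alpha A}{\beta B}\right)^{\frac{1}{\alpha + \beta}}
\end{align}
```

When $\alpha \approx \beta$, both exponents are close to $1/2$, which is the "scale parameters and tokens together" conclusion usually quoted from the paper.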

Google Brain (within Google DeepMind)

What they value

Paper-prolific. Historically the most-publishing single team in AI.
Production scale. They ship to Google Search, Workspace, Pixel.
Eclectic. Covers vision, NLP, robotics, healthcare AI, hardware (TPU). Bring your specialty.
Merged organizationally with DeepMind in April 2023, so some signals overlap.

What to expect in the loop

Similar to DeepMind but with more emphasis on production infrastructure (TPU, JAX, GShard).
More likely than DeepMind to ask about MLOps, multi-tenant serving, latency tuning.

Specific topics

| Topic | Source |
| --- | --- |
| Transformer (original) | Vaswani 2017 |
| BERT / T5 / PaLM | Devlin 2018, Raffel 2019, Chowdhery 2022 |
| Mixture of experts | Shazeer 2017, Switch Transformer |
| Pathways / JAX | Pathways paper, JAX docs |
| TPU architecture | TPU papers, MLPerf submissions |

lynx-cortex prep

Same as DeepMind, plus Phase 33 (inference serving) and Phase 35 (distributed).
Paper-pitch card 1 (Attention) memorized.

xAI

What they value

Engineering pragmatism. Less paper-prolific than DeepMind, more focused on training & deploying at scale fast.
Hardware sense. Grok-3 was trained on Colossus, xAI's Memphis cluster (100k+ H100s). They want engineers who understand GPU networking, NCCL, RDMA.
Iteration speed. xAI shipped Grok-1, Grok-1.5, Grok-2, and Grok-3 in quick succession. They reward people who ship.
Pragmatic alignment. Less focused on theoretical alignment than Anthropic; more "we shipped a product, here are the guardrails".

What to expect in the loop

Phone screen. Coding-heavy.
Systems round. Likely "how would you train a frontier model on N GPUs" — capacity math, distributed strategies (FSDP, ZeRO, tensor / pipeline / sequence parallelism), failure handling.
Coding round. Implementation under time pressure.
Culture round. Less STAR, more "tell us what excites you about Grok".
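
For the "train on N GPUs" systems question, the first piece of capacity math is usually model-state memory per GPU. The sketch below uses the accounting from the ZeRO paper for mixed-precision Adam: 2 bytes of fp16 weights, 2 bytes of fp16 gradients, and 12 bytes of fp32 optimizer state per parameter. It ignores activations, buffers, and fragmentation, and the function name is hypothetical.

```python
def per_gpu_model_state_gb(n_params, n_gpus, zero_stage):
    """Model-state memory per GPU in GB (1e9 bytes) for mixed-precision Adam.

    zero_stage 0 is plain data parallelism (everything replicated).
    Stage 1 shards optimizer state, stage 2 also shards gradients,
    stage 3 also shards the weights themselves.
    """
    weights, grads, optim = 2.0, 2.0, 12.0  # bytes per parameter
    if zero_stage >= 1:
        optim /= n_gpus
    if zero_stage >= 2:
        grads /= n_gpus
    if zero_stage >= 3:
        weights /= n_gpus
    return n_params * (weights + grads + optim) / 1e9


if __name__ == "__main__":
    # Illustrative numbers only: a 7B-parameter model on 64 GPUs.
    for stage in range(4):
        gb = per_gpu_model_state_gb(7e9, 64, stage)
        print(f"ZeRO stage {stage}: {gb:.2f} GB per GPU")
```

From here the conversation typically moves to activation memory (where gradient checkpointing and sequence parallelism matter), then to the communication cost each stage adds.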

Specific topics

| Topic | Source |
| --- | --- |
| Distributed training | Megatron, FSDP, ZeRO, DeepSpeed |
| NCCL / RDMA / Infiniband | NVIDIA networking docs |
| FlashAttention | Dao 2022, FA-2, FA-3 |
| Llama-style architectures | Llama 2, Llama 3 technical reports |
| Grok system cards | xAI published model docs |

lynx-cortex prep

Phase 35 (distributed) is the highest leverage.
Phase 23 + 24 (GPU + CUDA / Triton) — be able to write a Triton kernel sketch.
Drill 03 (gradient checkpointing) and drill 12 (continuous batcher).

Cohere

What they value

Enterprise multilingual. Cohere's positioning is API-first, business-facing, multilingual-strong.
Retrieval excellence. Their Embed and Rerank models are widely used in production RAG pipelines.
Practical / deployable. Less "frontier model race", more "make production retrieval work".

What to expect in the loop

Strong emphasis on RAG / retrieval systems design.
Multilingual tokenization and evaluation questions.
Enterprise integration — authn/authz, multi-tenancy, data residency.
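
In a retrieval design round, you may be asked how to combine a lexical ranker with a dense one. Reciprocal rank fusion (RRF) is one common answer, because it needs only ranks and not comparable scores. The sketch below assumes each retriever returns an ordered list of document ids; `k=60` is the constant from the original RRF paper, and the function name is hypothetical. It is one fusion option among several, not a description of Cohere's stack.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by doc id for determinism.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]   # hypothetical lexical results
    dense_hits = ["b", "c", "d"]  # hypothetical embedding results
    print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
```

Documents found by both retrievers rise to the top even when neither ranked them first. A reranking stage (for example a cross-encoder) would typically run on the fused top results.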

Specific topics

| Topic | Source |
| --- | --- |
| Dense retrieval | DPR, BGE, E5, Cohere Embed v3 |
| Reranking | Cross-encoder, Cohere Rerank |
| Hybrid retrieval | BM25 + dense fusion |
| Multilingual tokenization | XLM-R, Aya |
| Long-context retrieval | Recursive retrieval, hierarchical RAG |

lynx-cortex prep

Phase 29 (RAG) is the highest leverage.
Phase 11 (tokenization) with attention to multilingual ratios.
Whiteboard Q19 (RAG) and Q20 (vocab) memorized.
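
A quick way to build intuition for multilingual ratios without loading a tokenizer is to compare UTF-8 bytes per character across scripts. For a pure byte-level BPE tokenizer the byte count is an upper bound on the token count, so scripts with more bytes per character start from a worse position before any merges are learned. This is only a proxy under that assumption; real ratios depend on the tokenizer's training data, and the helper name is hypothetical.

```python
def utf8_bytes_per_char(text):
    """Average number of UTF-8 bytes per Unicode code point in text."""
    if not text:
        raise ValueError("text must be non-empty")
    return len(text.encode("utf-8")) / len(text)


if __name__ == "__main__":
    samples = {
        "English": "hello",
        "Russian": "привет",
        "Chinese": "你好",
    }
    for language, text in samples.items():
        print(f"{language}: {utf8_bytes_per_char(text):.1f} bytes per character")
```

To measure the real thing, replace the byte count with the length of an actual tokenizer's output on parallel text and report tokens per word or per character for each language.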

Mistral

What they value

Open-weight pragmatism. Mistral has shipped strong open weights (Mistral 7B, Mixtral, Mistral Large). They blend French research culture with European startup intensity.
Efficient architectures. Sliding-window attention, mixture of experts, grouped-query attention. Mistral's reputation is "smart architecture choices, not just bigger".
Multilingual. Especially European languages.

What to expect in the loop

Strong focus on architecture choices and ablations.
"Why GQA vs MHA vs MQA" is a typical question.
Open-weight ecosystem fluency (Hugging Face, llama.cpp, vLLM).
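
A concrete way to answer "why GQA vs MHA vs MQA" is to compute the KV-cache footprint for each. The sketch below counts keys and values for one sequence, assuming fp16 storage and an optional sliding window that caps how many positions are cached. The configuration in the usage example (32 layers, 32 query heads, 8 KV heads, head dimension 128, window 4096) resembles the published Mistral 7B setup but should be read as illustrative; the function name is hypothetical.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, window=None):
    """KV-cache size in bytes for a single sequence.

    The factor 2 covers keys and values. With a sliding window, at most
    `window` positions need to stay cached per layer.
    """
    cached_positions = seq_len if window is None else min(seq_len, window)
    return 2 * n_layers * n_kv_heads * head_dim * cached_positions * bytes_per_elem


if __name__ == "__main__":
    cfg = dict(n_layers=32, head_dim=128, seq_len=8192)
    variants = {
        "MHA (32 KV heads)": kv_cache_bytes(n_kv_heads=32, **cfg),
        "GQA (8 KV heads)": kv_cache_bytes(n_kv_heads=8, **cfg),
        "MQA (1 KV head)": kv_cache_bytes(n_kv_heads=1, **cfg),
        "GQA + 4096 window": kv_cache_bytes(n_kv_heads=8, window=4096, **cfg),
    }
    for name, size in variants.items():
        print(f"{name}: {size / 2**20:.0f} MiB")
```

The cache scales linearly with the number of KV heads, so GQA with 8 of 32 heads cuts it by 4x relative to MHA, and MQA cuts it by 32x at some cost in quality. The trade-off to articulate is memory and decode bandwidth against model quality.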

Specific topics

| Topic | Source |
| --- | --- |
| Sliding-window attention | Mistral 7B paper |
| Mixture of experts | Mixtral paper |
| Grouped-query attention | Ainslie 2023 |
| Mistral Large | Mistral technical reports |
| Open-weight tooling | Hugging Face, vLLM, llama.cpp |

lynx-cortex prep

Phase 27 (modern attention) for sliding-window and FlashAttention.
Phase 36 (frontier architectures) for MoE.
Paper-pitch card 15 (Mistral 7B) memorized.

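Cross-company quick-reference matrix

| Topic | Anthropic | OpenAI | DeepMind | Brain | xAI | Cohere | Mistral |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Alignment depth | ★★★ | ★★ | ★★ | ★★ | ★ | ★ | ★ |
| Math depth | ★★ | ★★ | ★★★ | ★★ | ★★ | ★★ | ★★ |
| Distributed training | ★★ | ★★★ | ★★★ | ★★★ | ★★★ | ★ | ★★ |
| Production scale | ★★ | ★★★ | ★★ | ★★★ | ★★★ | ★★★ | ★★ |
| Retrieval | ★ | ★ | ★ | ★ | ★ | ★★★ | ★ |
| Open-weights / ecosystem | ★ | ★ | ★ | ★★ | ★ | ★ | ★★★ |
| Constitutional AI / Safety | ★★★ | ★★ | ★★ | ★★ | ★ | ★ | ★ |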

A note on tone

Each lab has a tone you should match. Read 3-5 of their published blog posts before your interview. Anthropic: measured and uncertainty-aware. OpenAI: confident and product-forward. DeepMind: scholarly. xAI: irreverent. Cohere/Mistral: pragmatic. Mirroring the tone signals fit.

→ Move on to the lab files: ../lab/00-mock-interview-checklist.md, ../lab/01-paper-pitch-cards.md.