Phase 25 — Quizzes (mirror)
The canonical questions live in `data/quizzes/phase-25-pytorch-internals.yaml`.
q-25-01 — In-place op hazard
Prompt (EN): Calling `h.add_(self.attn(self.ln1(h)))` instead of `h = h + self.attn(self.ln1(h))` in a training loop most often results in:
- A. No change at all; the two are functionally equivalent.
- B. A `RuntimeError` from autograd, or silently wrong gradients.
- C. A type error.
- D. A 2× memory speedup with no other change.
Correct: B. The in-place op overwrites storage that an upstream node may have saved for its backward pass. Autograd's version-counter check normally detects the change and raises a `RuntimeError` during `.backward()`; in rare cases the gradients are silently wrong instead.
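The hazard can be reproduced in a few lines. This is a minimal sketch, not the course's model code: `ln1` and `attn` are hypothetical stand-ins (a `LayerNorm` and a `Linear` layer), and `h` is made a non-leaf tensor to mimic an activation in the middle of a network.

```python
import torch
from torch import nn

torch.manual_seed(0)
ln1 = nn.LayerNorm(8)
attn = nn.Linear(8, 8)  # hypothetical stand-in for an attention block


def residual_step(in_place: bool) -> str:
    x = torch.randn(4, 8, requires_grad=True)
    h = x * 1.0  # non-leaf activation, as in the middle of a network
    if in_place:
        # ln1 saved h for its backward; add_ then overwrites that same tensor.
        h.add_(attn(ln1(h)))
    else:
        # Allocates a new tensor, so the tensor saved by ln1 stays untouched.
        h = h + attn(ln1(h))
    try:
        h.sum().backward()
    except RuntimeError:
        return "RuntimeError"
    return "ok"


results = {
    "out_of_place": residual_step(in_place=False),
    "in_place": residual_step(in_place=True),
}
print(results)
```

The error surfaces in `.backward()`, not at the `add_` call itself, which is why the bug can look unrelated to the line that caused it.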
q-25-02 — View vs copy
Prompt (EN): Which of the following operations returns a view (no data copy) of the input tensor?
- A. `x.reshape(-1)` when `x` is already contiguous.
- B. `x.contiguous()` when `x` is non-contiguous.
- C. `x.clone()`.
- D. `x.to(dtype=torch.float16)`.
Correct: A. On contiguous input, `reshape` returns a view. `contiguous()` on non-contiguous data copies. `clone()` always copies. `to(dtype=...)` copies whenever the dtype actually changes, because the new dtype requires new storage.
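One way to check each option is to compare data pointers. The sketch below assumes a small float32 tensor on CPU; `shares_memory` is a helper introduced here for illustration, and comparing `data_ptr()` is only a sufficient test because every tensor involved starts at storage offset 0.

```python
import torch


def shares_memory(a: torch.Tensor, b: torch.Tensor) -> bool:
    # Same starting address means b is a view of a's data (no copy was made).
    return a.data_ptr() == b.data_ptr()


x = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # contiguous
xt = x.t()  # transposed view of x, non-contiguous

results = {
    "reshape": shares_memory(x, x.reshape(-1)),
    "contiguous": shares_memory(xt, xt.contiguous()),
    "clone": shares_memory(x, x.clone()),
    "to_fp16": shares_memory(x, x.to(dtype=torch.float16)),
}
print(results)
```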
q-25-03 — Autograd tape
Prompt (EN): In one or two sentences, describe what PyTorch's autograd "tape" is and what .backward() does with it.
Free response. Expected mentions: dynamic graph; nodes are Functions; .backward() traverses in reverse topological order.
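The expected points can be seen by walking the graph that hangs off `grad_fn`. This is an illustrative sketch on a two-variable expression; `walk` is a helper introduced here, not a PyTorch API.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
y = a * b + a  # the graph is recorded while this line executes


def walk(fn, depth=0, out=None):
    """Collect (depth, node name) pairs by following next_functions."""
    out = [] if out is None else out
    if fn is None:
        return out
    out.append((depth, type(fn).__name__))
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, out)
    return out


nodes = walk(y.grad_fn)
for depth, name in nodes:
    print("  " * depth + name)

# backward() visits the nodes from y toward the leaves and
# accumulates dy/da = b + 1 and dy/db = a into .grad.
y.backward()
print(a.grad, b.grad)
```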
q-25-04 — Saved tensors
Prompt (EN): Select every backward node that saves a tensor needed for its gradient computation.
- A. `AddBackward` (for `a + b`).
- B. `MulBackward` (for `a * b`).
- C. `SoftmaxBackward` (for `softmax(x)`).
- D. `GeluBackward` (for `gelu(x)`).
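Correct: B, C, D. `AddBackward` saves nothing because the incoming gradient is passed straight through to both inputs. The other three each need a saved tensor to compute their gradient: multiplication needs the other operand, softmax needs its output, and GELU needs its input.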
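The answer can be checked empirically by counting how many tensors each op hands to autograd for saving. The sketch below uses `torch.autograd.graph.saved_tensors_hooks`; `count_saved` is a helper introduced here, and the exact counts for the last three ops may vary across PyTorch versions, so only "zero" versus "at least one" is relied on.

```python
import torch
import torch.nn.functional as F


def count_saved(op) -> int:
    """Run op() and count the tensors autograd saves for backward."""
    saved = []

    def pack(tensor):
        saved.append(tensor)  # called once per tensor saved for backward
        return tensor

    def unpack(tensor):
        return tensor

    with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
        op()
    return len(saved)


a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

counts = {
    "add": count_saved(lambda: a + b),
    "mul": count_saved(lambda: a * b),
    "softmax": count_saved(lambda: F.softmax(a, dim=-1)),
    "gelu": count_saved(lambda: F.gelu(a)),
}
print(counts)
```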
q-25-05 — Dispatcher
Prompt (EN): PyTorch's dispatcher dispatches a single operator call (e.g., torch.add) based on several keys. Which keys are part of the dispatch decision?
- A. Tensor device (CPU, CUDA, MPS, ...).
- B. Tensor dtype.
- C. Tensor layout (strided, sparse).
- D. Whether autograd is enabled.
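Correct: A, B, C, D. All four influence which code ultimately runs for the call. Device, layout, and autograd are reflected in the tensor's dispatch key set, from which the dispatcher selects the appropriate kernel (or autograd wrapper). Dtype is usually resolved inside the selected kernel (for example through the `AT_DISPATCH_*` macros) rather than through a dispatch key of its own.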
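The effect of these properties on a single operator can be observed from Python without touching dispatcher internals. The sketch below calls `torch.add` with inputs that differ in layout, dtype, and autograd state; it runs on CPU only, so the device axis is left out, and the variable names are illustrative.

```python
import torch

dense = torch.ones(2, 2)
sparse = torch.eye(2).to_sparse()  # sparse COO layout
ints = torch.ones(2, 2, dtype=torch.int64)
tracked = torch.ones(2, 2, requires_grad=True)

# Same operator name, different code paths behind it.
out_dense = torch.add(dense, dense)      # strided float32 path
out_sparse = torch.add(sparse, sparse)   # sparse path; result stays sparse
out_int = torch.add(ints, ints)          # integer arithmetic
out_tracked = torch.add(tracked, dense)  # autograd records a backward node
with torch.no_grad():
    out_untracked = torch.add(tracked, dense)  # no backward node recorded

print(out_dense.layout, out_sparse.layout, out_int.dtype)
print(out_tracked.grad_fn, out_untracked.grad_fn)
```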