Phase 25 — Quizzes (mirror)

The canonical questions live in data/quizzes/phase-25-pytorch-internals.yaml.

q-25-01 — In-place op hazard

Prompt (EN): Calling h.add_(self.attn(self.ln1(h))) instead of h = h + self.attn(self.ln1(h)) in a training loop most often results in:

A. No change at all; the two are functionally equivalent.
B. A RuntimeError from autograd, or silent wrong gradients.
C. A type error.
D. A 2× memory saving with no other change.

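Correct: B. The in-place op overwrites storage that an earlier op (for example, the input that self.ln1 saved) may need for its backward pass. Autograd's version-counter check normally detects this and raises a RuntimeError during .backward(); in rare cases the check does not fire and the gradients are silently wrong.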
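
The version check can be reproduced on a small scale. The sketch below is illustrative only: `backward_after_update` is a hypothetical helper, and `h.pow(2)` stands in for any op that saves its input (the quiz uses a transformer block instead).

```python
import torch

def backward_after_update(in_place: bool) -> str:
    """Run a tiny forward/backward and report whether autograd complains."""
    x = torch.randn(4, requires_grad=True)
    h = x * 2.0      # non-leaf tensor, standing in for the residual stream
    y = h.pow(2)     # the pow node saves h, because d(h^2)/dh = 2*h
    if in_place:
        h.add_(1.0)  # mutates h's storage and bumps its version counter
    else:
        h = h + 1.0  # allocates a new tensor; the saved h is untouched
    try:
        (y.sum() + h.sum()).backward()
        return "ok"
    except RuntimeError:
        # Autograd noticed that a saved tensor changed after it was recorded.
        return "RuntimeError"

result_out_of_place = backward_after_update(in_place=False)
result_in_place = backward_after_update(in_place=True)
print(result_out_of_place, result_in_place)
```

The out-of-place update rebinds the name `h` to a fresh tensor, so the tensor saved for backward keeps the version it had when it was recorded.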

q-25-02 — View vs copy

Prompt (EN): Which of the following operations returns a view (no data copy) of the input tensor?

A. x.reshape(-1) when x is already contiguous.
B. x.contiguous() when x is non-contiguous.
C. x.clone().
D. x.to(dtype=torch.float16).

Correct: A. On contiguous input, reshape returns a view. contiguous() on non-contiguous data copies. clone() always copies. to(dtype=...) copies whenever the dtype actually changes, because the new dtype requires new storage.
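
One way to check these claims is to compare data pointers. This is a minimal sketch; `shares_memory` is a hypothetical helper, and comparing `data_ptr()` only detects views that start at the same offset, which is enough for the four options here.

```python
import torch

def shares_memory(a: torch.Tensor, b: torch.Tensor) -> bool:
    """True if both tensors start at the same memory address."""
    return a.data_ptr() == b.data_ptr()

x = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # contiguous
xt = x.t()                                              # non-contiguous view of x

reshape_is_view = shares_memory(x, x.reshape(-1))             # option A
contiguous_is_view = shares_memory(xt, xt.contiguous())       # option B
clone_is_view = shares_memory(x, x.clone())                   # option C
to_half_is_view = shares_memory(x, x.to(dtype=torch.float16)) # option D

print(reshape_is_view, contiguous_is_view, clone_is_view, to_half_is_view)
```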

q-25-03 — Autograd tape

Prompt (EN): In one or two sentences, describe what PyTorch's autograd "tape" is and what .backward() does with it.

Free response. Expected mentions: dynamic graph; nodes are Functions; .backward() traverses in reverse topological order.
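
For graders who want a concrete picture of the expected answer, the recorded graph can be walked from the output's `grad_fn`. This is a sketch under simple assumptions: `collect_nodes` is a hypothetical helper, and the exact node class names (with suffixes such as `0`) can vary between PyTorch versions.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
out = torch.sin(a * b + a)  # the graph is built dynamically as this line runs

def collect_nodes(node):
    """Depth-first walk from a backward node toward the leaves."""
    if node is None:
        return []
    names = [type(node).__name__]
    for next_node, _input_index in node.next_functions:
        names.extend(collect_nodes(next_node))
    return names

names = collect_nodes(out.grad_fn)
print(names)  # starts at the sin node and ends at gradient accumulators

# backward() visits these nodes from the output toward the leaves,
# applying the chain rule and accumulating into .grad on a and b.
out.backward()
print(a.grad, b.grad)
```

Analytically, d(out)/da = cos(a*b + a) * (b + 1) and d(out)/db = cos(a*b + a) * a, which is what ends up in `a.grad` and `b.grad`.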

q-25-04 — Saved tensors

Prompt (EN): Select every backward node that saves a tensor needed for its gradient computation.

A. AddBackward (for a + b).
B. MulBackward (for a * b).
C. SoftmaxBackward (for softmax(x)).
D. GeluBackward (for gelu(x)).

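Correct: B, C, D. Add saves no tensors, because its backward passes the incoming gradient straight through. The other three need a saved tensor: Mul needs its inputs, Softmax needs its output, and GELU needs its input to compute the gradient.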
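
The saved tensors can be inspected on each `grad_fn`. The sketch below assumes a PyTorch version that exposes saved values as `_saved_*` attributes on backward nodes; `saved_tensor_names` is a hypothetical helper, and non-tensor entries (scalars, dims, strings) are filtered out on purpose.

```python
import torch
import torch.nn.functional as F

def saved_tensor_names(out: torch.Tensor) -> list:
    """List the _saved_* attributes of out.grad_fn that hold tensors."""
    fn = out.grad_fn
    return [
        name
        for name in dir(fn)
        if name.startswith("_saved_") and isinstance(getattr(fn, name), torch.Tensor)
    ]

a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

saved = {
    "add": saved_tensor_names(a + b),
    "mul": saved_tensor_names(a * b),
    "softmax": saved_tensor_names(F.softmax(a, dim=-1)),
    "gelu": saved_tensor_names(F.gelu(a)),
}
for op, names in saved.items():
    print(op, names)
```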

q-25-05 — Dispatcher

Prompt (EN): PyTorch's dispatcher routes a single operator call (e.g., torch.add) to a kernel based on several keys. Which of the following are part of the dispatch decision?

A. Tensor device (CPU, CUDA, MPS, ...).
B. Tensor dtype.
C. Tensor layout (strided, sparse).
D. Whether autograd is enabled.

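Correct: A, C, D. Device and layout together determine the backend key (e.g., CPU, CUDA, SparseCPU), and autograd is handled by its own dispatch keys layered on top of the backend. Dtype is generally not a dispatch key: the selected kernel branches on dtype internally. The dispatcher selects the appropriate kernel (or autograd wrapper) based on the combination of keys.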
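
The mechanism can be pictured with a toy model. This is not PyTorch's implementation: `FakeTensor`, `PRIORITY`, `key_set`, and `dispatch` are hypothetical names, only a few keys are modeled, and tying the autograd key to `requires_grad` is a simplification made for the sketch.

```python
from dataclasses import dataclass

# Keys listed first are handled first (an assumed ordering for this toy).
PRIORITY = ["Autograd", "SparseCUDA", "SparseCPU", "CUDA", "CPU"]

@dataclass
class FakeTensor:
    device: str = "cpu"       # "cpu" or "cuda"
    layout: str = "strided"   # "strided" or "sparse"
    requires_grad: bool = False

def key_set(t: FakeTensor) -> set:
    """Derive the toy dispatch keys from the tensor's properties."""
    backend = t.device.upper()
    if t.layout == "sparse":
        backend = "Sparse" + backend
    keys = {backend}
    if t.requires_grad:
        keys.add("Autograd")
    return keys

def dispatch(op: str, t: FakeTensor, exclude: frozenset = frozenset()) -> list:
    """Return the list of handlers visited for one operator call."""
    keys = key_set(t) - exclude
    for key in PRIORITY:
        if key in keys:
            if key == "Autograd":
                # The wrapper records the op, then redispatches to the backend.
                return [f"{op}:Autograd"] + dispatch(op, t, exclude | {"Autograd"})
            return [f"{op}:{key}"]
    raise RuntimeError(f"no kernel registered for {op} with keys {keys}")

trace_plain = dispatch("add", FakeTensor(device="cpu"))
trace_grad = dispatch("add", FakeTensor(device="cuda", requires_grad=True))
trace_sparse = dispatch("add", FakeTensor(device="cpu", layout="sparse"))
print(trace_plain, trace_grad, trace_sparse)
```

The point of the sketch is the layering: a wrapper key runs first, masks itself out, and the call falls through to the backend kernel chosen by device and layout.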