Phase 25 — Quizzes (mirror)

The canonical questions live in data/quizzes/phase-25-pytorch-internals.yaml.


q-25-01 — In-place op hazard

Prompt (EN): Calling h.add_(self.attn(self.ln1(h))) instead of h = h + self.attn(self.ln1(h)) in a training loop most often results in:

  • A. No change at all; the two are functionally equivalent.
  • B. A RuntimeError from autograd, or silent wrong gradients.
  • C. A type error.
  • D. A 2× memory saving with no other change.

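Correct: B. The in-place op overwrites storage that an earlier op (typically self.ln1, which consumed h) may have saved for its backward pass. Autograd's version-counter check normally detects the mismatch and raises a RuntimeError during backward; in the rare cases where the check is bypassed (for example, when the tensor is mutated through .data), gradients are silently wrong.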
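The hazard can be reproduced in a few lines. This is a minimal sketch, not the course's model code: `ln` and `attn` are hypothetical stand-ins for self.ln1 and self.attn (a LayerNorm and a Linear layer), and `run` is an illustrative helper.

```python
import torch

torch.manual_seed(0)
ln = torch.nn.LayerNorm(4)      # hypothetical stand-in for self.ln1
attn = torch.nn.Linear(4, 4)    # hypothetical stand-in for self.attn

def run(in_place: bool) -> str:
    x = torch.randn(2, 4, requires_grad=True)
    h = x * 1.0                 # non-leaf tensor, so in-place ops are allowed on it
    if in_place:
        # LayerNorm saved h for its backward; add_ bumps h's version counter.
        h.add_(attn(ln(h)))
    else:
        # Out-of-place: a new tensor is created and the saved h is untouched.
        h = h + attn(ln(h))
    try:
        h.sum().backward()
        return "ok"
    except RuntimeError:
        # Raised when backward unpacks a saved tensor whose version changed.
        return "RuntimeError"

print(run(in_place=False))
print(run(in_place=True))
```

The error surfaces at backward time rather than at the add_ call, which is why the bug can appear far from the line that caused it.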


q-25-02 — View vs copy

Prompt (EN): Which of the following operations returns a view (no data copy) of the input tensor?

  • A. x.reshape(-1) when x is already contiguous.
  • B. x.contiguous() when x is non-contiguous.
  • C. x.clone().
  • D. x.to(dtype=torch.float16).

Correct: A. On contiguous input, reshape returns a view. contiguous() copies when the data is non-contiguous. clone() always copies. to(dtype=...) copies whenever the dtype actually changes, because the new dtype requires new storage.
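One way to check these claims is to compare data pointers before and after each call. The sketch below assumes a small float32 tensor; `shares_memory` is a hypothetical helper, not a PyTorch function.

```python
import torch

x = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # contiguous
t = x.t()                                               # non-contiguous view of x

def shares_memory(a: torch.Tensor, b: torch.Tensor) -> bool:
    """True when both tensors start at the same memory address."""
    return a.data_ptr() == b.data_ptr()

results = {
    "reshape": shares_memory(x, x.reshape(-1)),        # view of contiguous input
    "contiguous": shares_memory(t, t.contiguous()),    # copy of non-contiguous input
    "clone": shares_memory(x, x.clone()),              # always a copy
    "to_fp16": shares_memory(x, x.to(dtype=torch.float16)),  # dtype change copies
}
print(results)
```

Comparing start addresses is enough here because every tensor begins at offset zero; for slices with a nonzero storage offset, a stricter check would be needed.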


q-25-03 — Autograd tape

Prompt (EN): In one or two sentences, describe what PyTorch's autograd "tape" is and what .backward() does with it.

Free response. Expected mentions: dynamic graph; nodes are Functions; .backward() traverses in reverse topological order.
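A small sketch of what the expected answer describes: the forward pass records a graph of backward nodes, reachable through grad_fn and next_functions, and .backward() walks it from the output toward the leaves. The `collect_nodes` helper is hypothetical, and exact node class names can vary between PyTorch versions.

```python
import math
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
y = torch.sin(a * b + a)        # the forward pass records the graph

def collect_nodes(fn, names=None):
    """Depth-first walk from a grad_fn back toward the leaf tensors."""
    if names is None:
        names = []
    if fn is None:              # inputs that do not require grad have no node
        return names
    names.append(type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        collect_nodes(next_fn, names)
    return names

nodes = collect_nodes(y.grad_fn)
print(nodes)

y.backward()                    # traverses the same graph, output to leaves
print(a.grad, b.grad)           # analytically: 4*cos(8) and 2*cos(8)
```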


q-25-04 — Saved tensors

Prompt (EN): Select every backward node that saves a tensor needed for its gradient computation.

  • A. AddBackward (for a + b).
  • B. MulBackward (for a * b).
  • C. SoftmaxBackward (for softmax(x)).
  • D. GeluBackward (for gelu(x)).

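Correct: B, C, D. Add saves no tensors because the incoming gradient passes straight through to both inputs. The other three each need a tensor from the forward pass: mul saves its operands, softmax saves its output, and GELU saves its input.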
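The answer can be checked empirically by counting how many tensors each op hands to autograd for saving. This sketch uses torch.autograd.graph.saved_tensors_hooks; `count_saved` is a hypothetical helper, and the inputs are same-shaped tensors so no broadcasting is involved.

```python
import torch
import torch.nn.functional as F

def count_saved(op, *inputs) -> int:
    """Count tensors that autograd saves while running op(*inputs)."""
    saved = []

    def pack(t):
        saved.append(t)         # called once per tensor saved for backward
        return t

    def unpack(t):
        return t

    with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
        op(*inputs)
    return len(saved)

a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

counts = {
    "add": count_saved(torch.add, a, b),
    "mul": count_saved(torch.mul, a, b),
    "softmax": count_saved(lambda t: torch.softmax(t, dim=0), a),
    "gelu": count_saved(F.gelu, a),
}
print(counts)
```

Only the add entry should be zero; the exact nonzero counts depend on which inputs require gradients.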


q-25-05 — Dispatcher

Prompt (EN): PyTorch's dispatcher dispatches a single operator call (e.g., torch.add) based on several keys. Which keys are part of the dispatch decision?

  • A. Tensor device (CPU, CUDA, MPS, ...).
  • B. Tensor dtype.
  • C. Tensor layout (strided, sparse).
  • D. Whether autograd is enabled.

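Correct: A, C, D. Device and layout are folded into backend dispatch keys (for example CPU, CUDA, SparseCPU), and autograd is a functionality key layered on top, so the dispatcher selects the kernel (or autograd wrapper) from that combination. Dtype is generally not a dispatch key: ordinary dtype selection happens inside the chosen kernel (for example via the AT_DISPATCH_* macros), with quantized types as the main exception.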