English · Español
Lab 00 — Provision a Cloud GPU End-to-End¶
Goal: rent a GPU, SSH in, run
nvidia-smi, run a one-line cuBLAS benchmark, terminate, all in under 30 minutes. Document the workflow that all subsequent labs use.Estimated time: 30–60 minutes (first time), 5 minutes (thereafter).
Prereq: cloud platform decision resolved per
PHASE_23_PLAN.md§7.a. Account created on chosen provider. Payment method on file. Budget cap configured.
What you produce¶
A directory experiments/23-provisioning/ containing:
provisioning_log.md— step-by-step what you did, with timestamps. Future-you's recipe.nvidia-smi.txt— capturednvidia-smioutput from your rented instance.cublas_smoke_test.json— output of a one-line "this thing works" benchmark.cost_log.md— start time, stop time, hourly rate, total cost.manifest.json.
TODOs¶
Block A — pre-flight (5 min)¶
- Confirm
PHASE_23_PLAN.md§7.a resolved. Open the chosen provider's console. - If using SSH-based provider (RunPod / Vast / Lambda): generate an SSH key locally if you don't have one (
ssh-keygen -t ed25519 -C "lynx-cortex-phase23"). Upload public key to provider. Do not commit private key. - Set a billing alert / spend cap at your monthly budget per
learners/borja/profile.md(€50–80). Most providers support this.
Block B — launch (5 min)¶
- Pick the smallest reasonable GPU. For Phase 23, an RTX 3090 (24 GB) or A10 (24 GB) is plenty. Spot pricing if available.
- Pick the smallest reasonable region (lowest latency to you; Borja is in Spain — prefer EU).
- Use a "CUDA pre-installed Ubuntu 22.04" template. Most providers have this. Do not install drivers yourself.
- Launch. Note the start time in
cost_log.md.
Block C — connect (5 min)¶
- SSH in. Confirm:
lsb_release -a(Ubuntu 22.04),nvidia-smi(driver + GPU detected). - Pipe
nvidia-smioutput tonvidia-smi.txt, scp back to your machine. -
git clone git@github.com:borjatarraso/lynx-cortex.giton the instance. The repo is private per A16, so authenticate with a deploy key (read-only is enough for Phase 23) or a short-lived PAT — no anonymous HTTPS clone. -
cd lynx-cortex && uv sync— install the project's deps in the instance. (uvmust be installed:curl -LsSf https://astral.sh/uv/install.sh | sh.)
Block D — smoke test (5 min)¶
- Run a one-line cuBLAS GEMM via
cupy(or PyTorch — see Plan §7.g; for Phase 23 we usecupy):import cupy as cp, time A = cp.random.randn(4096, 4096, dtype=cp.float16) B = cp.random.randn(4096, 4096, dtype=cp.float16) cp.cuda.runtime.deviceSynchronize() t0 = time.perf_counter() for _ in range(100): C = A @ B cp.cuda.runtime.deviceSynchronize() t1 = time.perf_counter() flops = 2 * 4096**3 * 100 / (t1 - t0) print(f"{flops / 1e12:.1f} TFLOPS fp16") - Save the result to
cublas_smoke_test.jsonwith{tflops_measured, gpu_name, dtype, matrix_size, iters}. - Confirm: number should be 30–80% of vendor peak for fp16 on the rented GPU. If <10%, something is wrong — check
nvidia-smifor "default" mode (vs MIG-partitioned), check CUDA version vs yourcupybuild.
Block E — terminate (2 min — don't skip)¶
-
git add -A && git commit -m "phase: 23 provisioning smoke test"from the instance. Push. - Note the stop time in
cost_log.md. Compute total cost = (stop - start) × hourly_rate. Write it down. - Terminate the instance from the provider console. Confirm in the console it's gone (not "stopped" — terminated; stopped instances on some providers still bill for storage).
- Verify your spend on the provider's dashboard.
Block F — write the recipe (10 min)¶
In provisioning_log.md, write a future-you recipe with:
- Exact commands you ran (copy-pasteable).
- Anything that surprised you (e.g., "the spot instance was reclaimed once, had to relaunch").
- The exact hourly rate you got (spot vs on-demand).
- An estimate of "time-from-decision-to-running-benchmark" — this is what you'll save in future sessions.
Block G — manifest¶
manifest.json:
{
"experiment": "23-provisioning",
"date": "YYYY-MM-DD",
"cloud_provider": "<runpod | lambda | vast | colab | other>",
"region": "<eu-west | us-east | ...>",
"gpu_name": "<RTX 3090 | A10 | A100 | ...>",
"gpu_vram_gib": null,
"pricing_model": "<spot | on-demand>",
"hourly_rate_usd": null,
"session_minutes": null,
"total_cost_usd": null,
"cuda_version": null,
"driver_version": null,
"ubuntu_version": "22.04",
"smoke_test_tflops_fp16": null,
"smoke_test_fraction_of_peak": null
}
Constraints¶
- Do not install drivers yourself. Use the provider's CUDA-pre-installed image.
- Do not commit SSH private keys, API tokens, or any provider credentials.
.gitignoreshould already cover~/.ssh,.env, etc. - Do not skip termination. A forgotten instance is a $50–500/day mistake.
- Do not run training jobs in Phase 23. This phase is measurement only. Anything that takes >5 minutes is suspicious — review and stop.
Stop conditions¶
Done when:
- All six output files committed to
experiments/23-provisioning/. - The instance is terminated (confirmed on dashboard).
- The cost log shows the actual spend for this session.
- The smoke test TFLOPS is in the 30–80% of peak range.
manifest.jsoncomplete.
Pitfalls¶
- "Stopped" vs "terminated". Stopping a RunPod or Vast instance still bills you for storage. Terminate.
- CUDA /
cupyversion mismatch. Ifcupy-cuda12xis installed but the image has CUDA 11.8, imports fail. Install the matchingcupybuild:pip install cupy-cuda11xorcupy-cuda12xmatching the image's CUDA. nvidia-smishows GPU but kernel launches fail. Permissions issue; ensure your user can access/dev/nvidia*. The Ubuntu CUDA template handles this but custom AMIs may not.- Spot instance reclaimed mid-session. Lose your shell. Re-launch and re-run. Document if it happens.
- First
cudaMemcpyis 100× slower than steady-state. Context init overhead. Don't time the first iteration.
When to consult solutions/¶
After all stop conditions are met. The reference at solutions/00-provision-cloud-gpu-ref.md (written at phase open) shows the exact commands the reference implementation used on RunPod with an A10, plus the cost breakdown.
Next lab: lab/01-device-query.md.