
Lab 00 — Provision a Cloud GPU End-to-End

Goal: rent a GPU, SSH in, run nvidia-smi, run a one-line cuBLAS benchmark, and terminate the instance, all in under 30 minutes. Document the workflow, because all subsequent labs use it.

Estimated time: 30–60 minutes (first time), 5 minutes (thereafter).

Prereq: cloud platform decision resolved per PHASE_23_PLAN.md §7.a. Account created on chosen provider. Payment method on file. Budget cap configured.

What you produce

A directory experiments/23-provisioning/ containing:

provisioning_log.md — step-by-step what you did, with timestamps. Future-you's recipe.
nvidia-smi.txt — captured nvidia-smi output from your rented instance.
cublas_smoke_test.json — output of a one-line "this thing works" benchmark.
cost_log.md — start time, stop time, hourly rate, total cost.
manifest.json.

TODOs

Block A — pre-flight (5 min)

Confirm PHASE_23_PLAN.md §7.a resolved. Open the chosen provider's console.
If using SSH-based provider (RunPod / Vast / Lambda): generate an SSH key locally if you don't have one (ssh-keygen -t ed25519 -C "lynx-cortex-phase23"). Upload public key to provider. Do not commit private key.
Set a billing alert / spend cap at your monthly budget per learners/borja/profile.md (€50–80). Most providers support this.

Block B — launch (5 min)

Pick the smallest reasonable GPU. For Phase 23, an RTX 3090 (24 GB) or A10 (24 GB) is plenty. Spot pricing if available.
Pick the closest region (lowest latency to you; Borja is in Spain, so prefer EU).
Use a "CUDA pre-installed Ubuntu 22.04" template. Most providers have this. Do not install drivers yourself.
Launch. Note the start time in cost_log.md.

Block C — connect (5 min)

SSH in. Confirm: lsb_release -a (Ubuntu 22.04), nvidia-smi (driver + GPU detected).
Pipe nvidia-smi output to nvidia-smi.txt, scp back to your machine.
git clone git@github.com:borjatarraso/lynx-cortex.git on the instance. The repo is private per A16, so authenticate with a deploy key (read-only is enough for Phase 23) or a short-lived PAT — no anonymous HTTPS clone.
cd lynx-cortex && uv sync — install the project's deps in the instance. (uv must be installed: curl -LsSf https://astral.sh/uv/install.sh | sh.)
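
The driver and CUDA versions that manifest.json asks for can be read straight from the captured nvidia-smi.txt. The following is a minimal sketch, assuming the usual banner line with "Driver Version:" and "CUDA Version:" fields; the function name and the sample banner are illustrative, not taken from a real session.

```python
import re


def parse_nvidia_smi_header(text: str) -> dict:
    """Extract driver and CUDA versions from nvidia-smi's banner text."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return {
        "driver_version": driver.group(1) if driver else None,
        "cuda_version": cuda.group(1) if cuda else None,
    }


# Illustrative banner line only (hypothetical values, not real output).
SAMPLE = "| NVIDIA-SMI 535.104.05    Driver Version: 535.104.05    CUDA Version: 12.2 |"

if __name__ == "__main__":
    # In practice, pass the contents of nvidia-smi.txt instead of SAMPLE.
    print(parse_nvidia_smi_header(SAMPLE))
```

Note that the CUDA version in the nvidia-smi banner is the highest CUDA version the driver supports, which is not necessarily the toolkit version installed in the image.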

Block D — smoke test (5 min)

Run a one-line cuBLAS GEMM via cupy (or PyTorch — see Plan §7.g; for Phase 23 we use cupy):

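import time
import cupy as cp

n, iters = 4096, 100
# cupy.random.randn only generates float32/float64, so cast to fp16 afterwards.
A = cp.random.randn(n, n, dtype=cp.float32).astype(cp.float16)
B = cp.random.randn(n, n, dtype=cp.float32).astype(cp.float16)
C = A @ B  # warm-up: keep context and cuBLAS initialization out of the timed loop
cp.cuda.runtime.deviceSynchronize()
t0 = time.perf_counter()
for _ in range(iters):
    C = A @ B
cp.cuda.runtime.deviceSynchronize()  # wait for all queued GEMMs before stopping the clock
t1 = time.perf_counter()
flops = 2 * n**3 * iters / (t1 - t0)  # one n x n GEMM costs 2*n^3 FLOPs
print(f"{flops / 1e12:.1f} TFLOPS fp16")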

Save the result to cublas_smoke_test.json with {tflops_measured, gpu_name, dtype, matrix_size, iters}.
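Confirm: the number should be 30–80% of the vendor's peak fp16 throughput for the rented GPU (use the dense figure, not the "with sparsity" one). If it is below 10%, something is wrong: check in nvidia-smi that the GPU is in default mode rather than MIG-partitioned, and check that the image's CUDA version matches your cupy build.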
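
The arithmetic and the 30–80% check can be sketched as below. This is one possible shape for cublas_smoke_test.json, not a prescribed one: the five required keys come from the step above, while fraction_of_peak, verdict, and both function names are additions for illustration. The peak value is an input you look up in the vendor datasheet; the numbers in the example call are placeholders.

```python
import json


def gemm_tflops(n: int, iters: int, seconds: float) -> float:
    """TFLOPS for `iters` square n x n GEMMs, at 2*n^3 FLOPs each."""
    return 2 * n**3 * iters / seconds / 1e12


def smoke_test_record(tflops, gpu_name, peak_tflops,
                      n=4096, iters=100, dtype="float16"):
    """Build the JSON record and classify the measured fraction of peak."""
    fraction = tflops / peak_tflops
    if fraction < 0.10:
        verdict = "investigate"            # lab text: <10% means something is wrong
    elif 0.30 <= fraction <= 0.80:
        verdict = "expected"               # lab text: 30-80% of vendor peak
    else:
        verdict = "outside expected range"
    return {
        "tflops_measured": round(tflops, 1),
        "gpu_name": gpu_name,
        "dtype": dtype,
        "matrix_size": n,
        "iters": iters,
        "fraction_of_peak": round(fraction, 3),  # extra field (assumption)
        "verdict": verdict,                      # extra field (assumption)
    }


if __name__ == "__main__":
    # Placeholder numbers: 100 GEMMs of size 4096 finishing in 0.25 s,
    # against a hypothetical 100 TFLOPS peak.
    measured = gemm_tflops(4096, 100, 0.25)
    print(json.dumps(smoke_test_record(measured, "hypothetical-gpu", 100.0), indent=2))
```

As a sanity check on the formula: 100 iterations at n = 4096 are about 1.37e13 FLOPs in total, so a one-second run corresponds to roughly 13.7 TFLOPS.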

Block E — terminate (2 min — don't skip)

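git add -A && git commit -m "phase: 23 provisioning smoke test" from the instance. Push. Pushing needs write access: a read-only deploy key can clone but not push, so in that case scp the output files back and commit from your machine.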
Note the stop time in cost_log.md. Compute total cost = (stop - start) × hourly_rate. Write it down.
Terminate the instance from the provider console. Confirm in the console it's gone (not "stopped" — terminated; stopped instances on some providers still bill for storage).
Verify your spend on the provider's dashboard.
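
The cost formula above, (stop - start) × hourly_rate, can be sketched as follows. It assumes the timestamps in cost_log.md are ISO 8601 strings and that billing is linear in time; some providers round up to a billing increment, so treat the result as an estimate and trust the dashboard. The function name and the example values are illustrative.

```python
from datetime import datetime


def session_cost(start_iso: str, stop_iso: str, hourly_rate: float) -> dict:
    """Linear cost estimate: elapsed hours times the hourly rate."""
    start = datetime.fromisoformat(start_iso)
    stop = datetime.fromisoformat(stop_iso)
    if stop < start:
        raise ValueError("stop time is before start time")
    minutes = (stop - start).total_seconds() / 60
    return {
        "session_minutes": round(minutes, 1),
        "total_cost": round(minutes / 60 * hourly_rate, 4),
    }


if __name__ == "__main__":
    # Placeholder values: a 30-minute session at a rate of 0.40 per hour.
    print(session_cost("2025-01-01T10:00:00+00:00",
                       "2025-01-01T10:30:00+00:00", 0.40))
```

The two values map onto the session_minutes and total_cost_usd fields of manifest.json, provided the hourly rate is expressed in USD.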

Block F — write the recipe (10 min)

In provisioning_log.md, write a future-you recipe with:

Exact commands you ran (copy-pasteable).
Anything that surprised you (e.g., "the spot instance was reclaimed once, had to relaunch").
The exact hourly rate you got (spot vs on-demand).
An estimate of "time-from-decision-to-running-benchmark" — this is what you'll save in future sessions.

Block G — manifest

manifest.json:

{
  "experiment": "23-provisioning",
  "date": "YYYY-MM-DD",
  "cloud_provider": "<runpod | lambda | vast | colab | other>",
  "region": "<eu-west | us-east | ...>",
  "gpu_name": "<RTX 3090 | A10 | A100 | ...>",
  "gpu_vram_gib": null,
  "pricing_model": "<spot | on-demand>",
  "hourly_rate_usd": null,
  "session_minutes": null,
  "total_cost_usd": null,
  "cuda_version": null,
  "driver_version": null,
  "ubuntu_version": "22.04",
  "smoke_test_tflops_fp16": null,
  "smoke_test_fraction_of_peak": null
}
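
Before committing, it can help to check that no template placeholder is left in the manifest. A minimal sketch, assuming that null values, the literal "YYYY-MM-DD", and strings wrapped in angle brackets all count as unfilled; the function name is hypothetical.

```python
import json


def unfilled_fields(manifest: dict) -> list:
    """Return the sorted keys whose values are still template placeholders."""
    def is_placeholder(value) -> bool:
        if value is None:
            return True
        if isinstance(value, str):
            # Template strings look like "<spot | on-demand>" or "YYYY-MM-DD".
            return value == "YYYY-MM-DD" or (
                value.startswith("<") and value.endswith(">")
            )
        return False

    return sorted(key for key, value in manifest.items() if is_placeholder(value))


if __name__ == "__main__":
    # A fragment of the template above, parsed from JSON text.
    template = json.loads(
        '{"experiment": "23-provisioning", "date": "YYYY-MM-DD",'
        ' "pricing_model": "<spot | on-demand>", "hourly_rate_usd": null,'
        ' "ubuntu_version": "22.04"}'
    )
    print(unfilled_fields(template))
```

An empty list means every field has been filled in.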

Constraints

Do not install drivers yourself. Use the provider's CUDA-pre-installed image.
Do not commit SSH private keys, API tokens, or any provider credentials. The repo's .gitignore should already cover .env and similar files; private keys belong in ~/.ssh, outside the repo, and should stay there.
Do not skip termination. A forgotten instance is a $50–500/day mistake.
Do not run training jobs in Phase 23. This phase is measurement only. Anything that takes >5 minutes is suspicious — review and stop.

Stop conditions

Done when:

All five output files committed to experiments/23-provisioning/.
The instance is terminated (confirmed on dashboard).
The cost log shows the actual spend for this session.
The smoke test TFLOPS is in the 30–80% of peak range.
manifest.json complete.
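
The file-related stop condition can be checked mechanically. A minimal sketch, assuming the file names listed under "What you produce" and a path relative to the repo root; the function name is hypothetical.

```python
from pathlib import Path

# File names taken from the "What you produce" list.
EXPECTED = (
    "provisioning_log.md",
    "nvidia-smi.txt",
    "cublas_smoke_test.json",
    "cost_log.md",
    "manifest.json",
)


def missing_outputs(directory) -> list:
    """Return the expected output files that do not exist in `directory`."""
    root = Path(directory)
    return [name for name in EXPECTED if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs("experiments/23-provisioning")
    print("all outputs present" if not missing else f"missing: {missing}")
```

This only checks that the files exist, not that they are committed or that their contents are complete.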

Pitfalls

"Stopped" vs "terminated". Stopping a RunPod or Vast instance still bills you for storage. Terminate.
CUDA / cupy version mismatch. If cupy-cuda12x is installed but the image has CUDA 11.8, the import or the first GPU call fails. Install the build that matches the image's CUDA major version: pip install cupy-cuda11x or pip install cupy-cuda12x.
nvidia-smi shows the GPU but kernel launches fail. Usually a permissions issue: ensure your user can access /dev/nvidia*. The provider's CUDA template handles this, but custom images may not.
Spot instance reclaimed mid-session. You lose your shell. Re-launch and re-run. Document it if it happens.
The first CUDA call is far slower than steady state. The first cudaMemcpy, for example, can be around 100× slower because it pays context-initialization overhead. Don't time the first iteration.
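
The last pitfall is usually handled with a warm-up phase before the timed loop. The helper below is a generic sketch of that pattern, not part of the lab's reference code: the function name and its parameters are hypothetical, and the default sync is a no-op so the example also runs on a machine without a GPU.

```python
import time


def time_steady_state(fn, warmup=3, iters=10, sync=lambda: None):
    """Average seconds per call of `fn`, excluding `warmup` untimed calls."""
    for _ in range(warmup):
        fn()                      # untimed: absorbs one-off initialization cost
    sync()                        # make sure warm-up work has finished
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()                        # wait for queued work before stopping the clock
    return (time.perf_counter() - t0) / iters


if __name__ == "__main__":
    # CPU-only stand-in workload; replace with a GPU operation on the instance.
    per_call = time_steady_state(lambda: sum(range(10_000)))
    print(f"{per_call * 1e6:.1f} us per call")
```

On the instance, passing sync=cp.cuda.runtime.deviceSynchronize (the call used in the smoke test) makes the timer wait for asynchronous GPU work.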

When to consult `solutions/`

After all stop conditions are met. The reference at solutions/00-provision-cloud-gpu-ref.md (written at phase open) shows the exact commands the reference implementation used on RunPod with an A10, plus the cost breakdown.

Next lab: lab/01-device-query.md.