Key points
The cheapest viable cloud GPU is the one that fits your model with the least spare VRAM. Renting a $4/hr H100 to run a 13B model leaves 60+ GB of VRAM idle that you are still paying for.
For 7B-13B inference: an RTX 4090 24 GB on a marketplace (Vast.ai, RunPod community pool) at $0.30-0.80/hr. The 24 GB of VRAM is plenty, and consumer-card marketplaces undercut managed clouds.
For 70B inference or mid-scale fine-tuning: an A100 80 GB at $0.90-1.90/hr. The 80 GB of VRAM fits a 70B model at Q4 with context room.
For frontier-model training or production serving with strict latency targets: an H100 80 GB at $2.20-4.00/hr, only worth it when sustained throughput is the constraint.
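The sizing claims above can be checked with rough arithmetic. The sketch below is a minimal estimate, not a measurement: it assumes weights dominate memory, treats Q4 as exactly 4 bits per weight (real quantization formats usually carry some extra overhead), and ignores KV cache and activations. The function names are hypothetical.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width, with no
    allowance for quantization scales, KV cache, or activations.
    """
    return params_billion * bits_per_weight / 8


def spare_vram_gb(vram_gb: float, params_billion: float, bits_per_weight: float) -> float:
    """VRAM left over after loading the weights (negative means it does not fit)."""
    return vram_gb - weight_memory_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # 70B model at an idealized 4 bits per weight on an 80 GB card
    print(weight_memory_gb(70, 4))      # 35.0
    # 13B model at 4 bits on an 80 GB card: most of the card sits idle
    print(spare_vram_gb(80, 13, 4))     # 73.5
    # 13B model at 16 bits does not fit on a 24 GB card
    print(spare_vram_gb(24, 13, 16))    # -2.0
```

Under these assumptions a 70B model at Q4 needs about 35 GB for weights, which leaves room for context on an 80 GB card, while a quantized 13B model uses only a small fraction of it.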
Ranges below are approximate May 2026 figures across major providers (RunPod, Vast.ai, Lambda Labs, and others). The low end is typically interruptible or marketplace pricing; the high end is on-demand managed cloud.
| GPU | VRAM | Hourly rate (approx) | Best for |
|---|---|---|---|
| RTX 4090 | 24 GB | $0.30-0.80/hr | 7B-30B inference, light fine-tuning |
| A100 80 GB | 80 GB | $0.90-1.90/hr | 70B inference, most fine-tuning |
| H100 80 GB | 80 GB | $2.20-4.00/hr | Large-scale training, latency-critical serving |
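As an illustration of how the table might be used, the sketch below picks the lowest-priced GPU whose VRAM covers a given requirement and estimates the cost range for a job. The prices and VRAM figures are copied from the table; the `GpuOption` class and the function names are hypothetical, and ranking by the low end of each range is an assumption that ignores throughput entirely.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class GpuOption:
    name: str
    vram_gb: int
    low_usd_hr: float   # interruptible or marketplace end of the range
    high_usd_hr: float  # on-demand managed-cloud end of the range


# Approximate figures taken from the table above.
OPTIONS = (
    GpuOption("RTX 4090", 24, 0.30, 0.80),
    GpuOption("A100 80 GB", 80, 0.90, 1.90),
    GpuOption("H100 80 GB", 80, 2.20, 4.00),
)


def cheapest_fit(required_vram_gb: float,
                 options: Sequence[GpuOption] = OPTIONS) -> Optional[GpuOption]:
    """Return the lowest-priced option with enough VRAM, or None if nothing fits.

    Assumption: options are compared by the low end of their price range only.
    """
    candidates = [g for g in options if g.vram_gb >= required_vram_gb]
    if not candidates:
        return None
    return min(candidates, key=lambda g: g.low_usd_hr)


def cost_range_usd(gpu: GpuOption, hours: float) -> Tuple[float, float]:
    """Low and high cost estimates for running one GPU for the given hours."""
    return gpu.low_usd_hr * hours, gpu.high_usd_hr * hours


if __name__ == "__main__":
    for need_gb in (16, 40, 100):
        pick = cheapest_fit(need_gb)
        print(need_gb, "GB ->", pick.name if pick else "no single GPU fits")
```

Because the comparison is by price alone, the H100 is never selected here; it only becomes the right choice when sustained throughput, which this sketch does not model, is the constraint.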