Six steps from workload classification to a signed contract. The sequence matters; jumping to step 6 without doing steps 1 and 2 is the most common reason GPU procurements miss budget by 2x.
1. Classify the workload
Distinguish four workload types: steady-state inference, bursty inference, short-burst fine-tuning, and long-running pretraining. Each maps to a different pricing model: steady-state inference to reserved capacity, bursty inference to per-second serverless, short-burst fine-tuning to on-demand or marketplace, and long-running pretraining to a multi-month reservation.
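The mapping above can be written down as a small lookup so that every later step starts from an explicit classification. This is a minimal sketch: the `Workload` enum and `pricing_model_for` helper are illustrative names, not part of any tool on this site, and the mapping itself is copied from the paragraph above.

```python
from enum import Enum


class Workload(Enum):
    """The four workload types distinguished in step 1."""
    STEADY_INFERENCE = "steady-state inference"
    BURSTY_INFERENCE = "bursty inference"
    SHORT_FINE_TUNING = "short-burst fine-tuning"
    LONG_PRETRAINING = "long-running pretraining"


# Workload type -> pricing model, as stated in the text.
PRICING_MODEL = {
    Workload.STEADY_INFERENCE: "reserved capacity",
    Workload.BURSTY_INFERENCE: "per-second serverless",
    Workload.SHORT_FINE_TUNING: "on-demand or marketplace",
    Workload.LONG_PRETRAINING: "multi-month reservation",
}


def pricing_model_for(workload: Workload) -> str:
    """Return the pricing model that step 1 pairs with this workload."""
    return PRICING_MODEL[workload]


if __name__ == "__main__":
    for w in Workload:
        print(f"{w.value}: {pricing_model_for(w)}")
```

Recording the classification this way forces a single answer per workload; a team running both steady and bursty inference would classify each separately rather than averaging them.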
2. Lock the GPU class
Decide whether the workload needs H100 / H200 / B200 SXM with InfiniBand (foundation-model training), A100 (most fine-tuning), L40S (inference and modest training), or A10G / T4 (light inference). Picking the wrong class is the single most expensive mistake at this stage.
3. Build the vendor shortlist
Use the vendor matrix on the homepage. For training clusters, the shortlist is usually two specialist clouds (chosen from providers such as CoreWeave, Lambda, Together, and Crusoe) and one hyperscaler. For inference, the shortlist is per-second platforms (such as Modal, Replicate, or RunPod Serverless) plus a reserved fallback.
4. Model TCO honestly
Use the calculator on the homepage. Add 25 percent on top for storage, egress, and MLOps, then add an SRE allocation. Compare year-1 and year-2 TCO separately, because reservation discounts skew the picture toward year 1.
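The arithmetic in this step can be sketched as follows. The sketch assumes the 25 percent uplift applies to the compute figure from the calculator and that the SRE allocation is a flat annual amount; the text does not spell out either point. The function name and all dollar figures are hypothetical placeholders, not vendor quotes.

```python
def yearly_tco(compute_cost: float,
               sre_allocation: float,
               overhead_rate: float = 0.25) -> float:
    """Compute cost plus an uplift for storage, egress, and MLOps,
    plus a flat SRE allocation.

    Assumption: the uplift is applied to compute cost only.
    """
    return compute_cost * (1 + overhead_rate) + sre_allocation


if __name__ == "__main__":
    # Hypothetical inputs: year-1 compute is lower because a
    # reservation discount applies; year 2 is priced without it.
    year1 = yearly_tco(compute_cost=1_000_000, sre_allocation=150_000)
    year2 = yearly_tco(compute_cost=1_200_000, sre_allocation=150_000)
    print(f"Year 1 TCO: {year1:,.0f}")
    print(f"Year 2 TCO: {year2:,.0f}")
    print(f"Two-year TCO: {year1 + year2:,.0f}")
```

Keeping the two years as separate calls makes any gap between them visible, which is the point of comparing year-1 and year-2 TCO rather than quoting a single blended number.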
5. Run the RFP
Send a structured RFP to the shortlist. Use the RFP template page on this site. Ask explicitly for hidden line items (egress, storage, support tier, exit terms).
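One way to keep vendor responses comparable is to check each one against the four hidden line items named above before scoring it. The sketch below is illustrative only: the dictionary layout and the `missing_line_items` helper are assumptions, not the format of the RFP template on this site.

```python
# The four hidden line items the RFP should ask for explicitly.
REQUIRED_LINE_ITEMS = ("egress", "storage", "support tier", "exit terms")


def missing_line_items(response: dict) -> list:
    """Return the required line items a vendor response left blank or omitted."""
    return [item for item in REQUIRED_LINE_ITEMS if not response.get(item)]


if __name__ == "__main__":
    # Hypothetical response: two items answered, one blank, one absent.
    response = {
        "egress": "quoted",
        "storage": "quoted",
        "support tier": "",
    }
    print(missing_line_items(response))
```

A response with any missing items goes back to the vendor for completion before it enters the TCO comparison.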
6. Negotiate the contract
Reservation pricing is the main lever. Multi-year commitments unlock a further 20 to 40 percent off list price. SRE coverage, capacity SLAs, and exit terms can be negotiated separately.
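To see what the 20 to 40 percent range means for a target price, the bounds can be computed before the negotiation starts. This is a rough sketch: it assumes the discount is taken as a simple percentage off a single base price, and the `PriceRange` type, function name, and base price of 100 are hypothetical.

```python
from typing import NamedTuple


class PriceRange(NamedTuple):
    best_case: float   # price after the largest discount
    worst_case: float  # price after the smallest discount


def multi_year_price_range(base_price: float,
                           min_discount_pct: int = 20,
                           max_discount_pct: int = 40) -> PriceRange:
    """Price bounds after a multi-year commitment discount.

    Assumption: the discount is a simple percentage off base_price.
    """
    best = base_price * (100 - max_discount_pct) / 100
    worst = base_price * (100 - min_discount_pct) / 100
    return PriceRange(best_case=best, worst_case=worst)


if __name__ == "__main__":
    # Hypothetical base price of 100 (any currency unit).
    rng = multi_year_price_range(100)
    print(f"Expect between {rng.best_case} and {rng.worst_case}")
```

An offer above the worst-case bound suggests the multi-year lever has not been applied yet.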
Last verified June 2026.