Pricing model
On-demand per-hour
The default for hyperscalers and most specialist clouds. The customer pays the published rate from the moment the instance is in a Running state until it is Stopped or Terminated. Rates vary by GPU class, region, and instance topology.
Upside
No commitment. Capacity provisioned on demand. Easy to model.
Trade-off
The headline list rate is the most expensive tier the vendor offers. No discount for utilisation. Some H100 SKUs are intermittently unavailable on pure on-demand.
Used by: CoreWeave, Lambda, Crusoe, AWS, Azure, GCP, Hyperstack, Oracle, DigitalOcean, Paperspace
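The on-demand bill described above reduces to running time multiplied by the published rate. The following is a minimal sketch of that arithmetic, not any vendor's billing logic. The function name, the 2.00 rate, and the assumption that fractional hours are billed proportionally are all illustrative; actual billing granularity varies by vendor.

```python
from datetime import datetime


def on_demand_cost(started, stopped, hourly_rate, gpu_count=1):
    """Estimate the cost of an instance billed from Running until Stopped.

    Assumption (not from any vendor's docs): fractional hours are billed
    proportionally, with no rounding up to a whole hour or minute.
    """
    if stopped < started:
        raise ValueError("stopped must not be earlier than started")
    hours = (stopped - started).total_seconds() / 3600
    return hours * hourly_rate * gpu_count


# Hypothetical example: an 8-GPU node running for 12 hours
# at an illustrative rate of 2.00 per GPU-hour.
cost = on_demand_cost(
    started=datetime(2025, 1, 1, 0, 0),
    stopped=datetime(2025, 1, 1, 12, 0),
    hourly_rate=2.00,
    gpu_count=8,
)
print(cost)  # 192.0
```

Because there is no utilisation discount, the result scales linearly with running time, which is why this model is easy to forecast.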
Pricing model
Per-second (serverless)
Billing accrues per second of active GPU time. Optimised for inference and short-burst training where instance lifetime is measured in seconds or minutes. Depending on the vendor, container cold-start and warm-pool time are either billed separately or rolled into the per-second rate.
Upside
Pay nothing when no requests are in flight. Strong fit for spiky inference. No reservation needed.
Trade-off
Converted to an hourly equivalent, per-second rates typically carry a premium over the same GPU on-demand. Cold-start billing can surprise low-traffic deployments.
Used by: Modal, Replicate, RunPod Serverless
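To make the per-second premium concrete, a rough comparison can convert the per-second rate to an hourly equivalent and find the active-time fraction at which serverless stops being cheaper than an always-on on-demand instance. The sketch below rests on stated assumptions: the function names are hypothetical, and billing cold-start seconds at the same per-second rate is one possible policy, not a description of any listed vendor.

```python
def hourly_equivalent(per_second_rate):
    """Convert a per-second rate to what one fully busy hour would cost."""
    return per_second_rate * 3600


def break_even_utilisation(per_second_rate, on_demand_hourly):
    """Fraction of each hour the GPU must be active before an always-on
    on-demand instance becomes cheaper than per-second billing."""
    return on_demand_hourly / hourly_equivalent(per_second_rate)


def serverless_cost(active_seconds, per_second_rate,
                    cold_starts=0, cold_start_seconds=0.0):
    """Estimate a serverless bill.

    Assumption: cold-start time is billed at the same per-second rate.
    Some vendors may bill it differently or not at all.
    """
    billed_seconds = active_seconds + cold_starts * cold_start_seconds
    return billed_seconds * per_second_rate


# Hypothetical rates: 0.001 per GPU-second versus 2.70 per GPU-hour on-demand.
threshold = break_even_utilisation(per_second_rate=0.001, on_demand_hourly=2.70)
print(f"Serverless is cheaper below {threshold:.0%} active time")
```

Below the threshold, paying nothing while idle outweighs the premium; above it, a continuously running instance is the cheaper shape. Passing a non-zero `cold_starts` to `serverless_cost` shows how a deployment with few requests but many cold starts can cost more than its request time alone suggests.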
Pricing model
Reserved capacity
The customer commits to a defined cluster (often 8x to 1,024x H100 or H200) for 1 to 36 months in exchange for a meaningful discount on the on-demand rate. The most common procurement shape for production training clusters.
Upside
Headline rate is 30 to 60 percent below the same vendor's on-demand. Capacity is guaranteed for the term.
Trade-off
You pay the reservation whether you use it or not. Cluster size is fixed; scaling up means a new contract. Exit fees can apply mid-term.
Used by: CoreWeave, Lambda Reserved Cloud, Together AI, Crusoe, DigitalOcean (12-mo), AWS Capacity Blocks for ML, Azure Reserved Instances, Hyperstack
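Because a reservation is paid whether or not it is used, the discount translates directly into a break-even utilisation: a reservation at discount d costs (1 - d) of the on-demand rate for every hour of the term, while on-demand costs the full rate only for hours actually used. The sketch below illustrates that comparison with hypothetical function names. It ignores exit fees, contract minimums, and on-demand availability, all of which can shift the real decision.

```python
def reserved_break_even_utilisation(discount):
    """Utilisation above which a reservation beats on-demand.

    `discount` is the fractional discount off the on-demand rate,
    for example 0.30 for 30 percent.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in the range [0, 1)")
    return 1 - discount


def cheaper_option(on_demand_hourly, discount, utilisation):
    """Compare cost per wall-clock hour for the same GPU.

    Reserved: the discounted rate is paid for every hour of the term.
    On-demand: the full rate is paid only for the fraction of hours used.
    """
    reserved = on_demand_hourly * (1 - discount)
    on_demand = on_demand_hourly * utilisation
    return "reserved" if reserved < on_demand else "on-demand"


# The 30 to 60 percent discount range from the text above.
for discount in (0.30, 0.60):
    threshold = reserved_break_even_utilisation(discount)
    print(f"{discount:.0%} discount: break-even at {threshold:.0%} utilisation")
```

Under these assumptions, the 30 to 60 percent discount range implies the cluster must be busy roughly 70 to 40 percent of the term before the reservation pays for itself.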