Hidden costs - Last verified July 2026

Hidden GPU cloud costs in 2026

Headline GPU-hour spend is roughly 60 to 75 percent of year-one total cost of ownership (TCO) for a realistic AI training or inference workload. The remaining 25 to 40 percent splits across six line items that almost never appear on a vendor pricing page. Percentage bands below are derived from the vendor pricing pages on this site and are illustrative ranges for planning purposes, not a published industry benchmark.
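The 60 to 75 percent share can be restated as a multiplier on the headline line, which is often easier to use when budgeting. The sketch below is only that arithmetic; the function name `uplift_multiplier` is a hypothetical helper, not something from a vendor tool.

```python
def uplift_multiplier(headline_share: float) -> float:
    """Return total TCO as a multiple of headline GPU-hour spend.

    If headline spend is `headline_share` of TCO, then TCO = headline / share.
    """
    if not 0 < headline_share <= 1:
        raise ValueError("headline_share must be in (0, 1]")
    return 1.0 / headline_share


# The two ends of the illustrative 60-75 percent band from the text.
for share in (0.60, 0.75):
    print(f"headline share {share:.0%} -> TCO is about {uplift_multiplier(share):.2f}x headline spend")
```

Under these planning ranges, a budget built only on GPU-hours would need to be scaled by roughly 1.33x to 1.67x.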

5-12% of TCO

Egress and inter-region transfer

AWS, Azure, and GCP meter bandwidth out of a region by the GB. Training datasets that move between buckets, model checkpoints copied to a different region for inference, and customer downloads from a hosted-model endpoint all add up. Specialist clouds (CoreWeave, Lambda, Crusoe) bill less aggressively for egress, but egress is still not free.
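A rough way to size this line item is to list each recurring transfer and multiply by its per-GB rate. The sketch below does that; every volume and rate in it is a made-up placeholder, and the `Transfer` class is a hypothetical helper, so substitute the figures from your own provider's pricing page.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    label: str
    gb_per_month: float
    rate_per_gb: float  # USD per GB; placeholder value, not a quoted vendor rate


def monthly_egress_cost(transfers):
    """Sum the metered cost of all transfers for one month."""
    return sum(t.gb_per_month * t.rate_per_gb for t in transfers)


# Hypothetical workload covering the three transfer types named in the text.
transfers = [
    Transfer("dataset sync between buckets", 20_000, 0.02),
    Transfer("checkpoint copies to inference region", 5_000, 0.02),
    Transfer("customer downloads from endpoint", 1_000, 0.09),
]

for t in transfers:
    print(f"{t.label:<40} ${t.gb_per_month * t.rate_per_gb:,.2f}")
print(f"{'total per month':<40} ${monthly_egress_cost(transfers):,.2f}")
```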

3-8% of TCO

Persistent storage and snapshots

Block storage, distributed file system, and object storage volumes attached to a GPU instance are billed per GB-month and continue to bill when the GPU is detached. Snapshots stack quickly during fine-tuning runs. Lifecycle policy is a real cost lever, not an afterthought.
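To see why lifecycle policy matters, the sketch below compares snapshot storage cost with and without a retention limit. It is a simplified model under stated assumptions: snapshots are all the same size, billing is on the count held at the end of each month, and the rate and sizes are placeholders rather than any vendor's prices.

```python
def snapshot_storage_cost(months, snapshots_per_month, snapshot_gb,
                          rate_per_gb_month, keep_last=None):
    """Total snapshot storage cost in USD over `months`.

    keep_last=None means snapshots are never deleted; otherwise only the
    newest `keep_last` snapshots are retained (a simple lifecycle policy).
    """
    total = 0.0
    stored = 0
    for _ in range(months):
        stored += snapshots_per_month
        if keep_last is not None:
            stored = min(stored, keep_last)
        total += stored * snapshot_gb * rate_per_gb_month
    return total


# Hypothetical fine-tuning run: 10 snapshots a month, 500 GB each, $0.05 per GB-month.
no_policy = snapshot_storage_cost(6, 10, 500, 0.05)
with_policy = snapshot_storage_cost(6, 10, 500, 0.05, keep_last=10)
print(f"no lifecycle policy:    ${no_policy:,.2f}")
print(f"keep last 10 snapshots: ${with_policy:,.2f}")
```

In this toy case the unmanaged snapshots cost several times more than the retained set, because each month bills for everything accumulated so far.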

5-15% of TCO

Idle and warm-pool time

Serverless GPU platforms (Modal, Replicate, RunPod Serverless) bill per second of GPU time and of container-warm time, which means an inference endpoint with low traffic but a long keepalive can spend more on idle than on serving. Reserved-capacity clusters bill every hour of the reservation whether the GPU is doing useful work or not.
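The idle-versus-serving effect is easy to check with a few lines of arithmetic. The sketch below assumes sparse traffic where every request is followed by a full keepalive window, and assumes warm time is billed at the same per-second rate as active GPU time; platforms differ on both points, and the rate used here is a placeholder.

```python
def daily_cost_split(requests_per_day, seconds_per_request,
                     keepalive_seconds, rate_per_second):
    """Return (serving_cost, idle_cost) in USD for one day.

    Assumes requests arrive farther apart than the keepalive window, so each
    request pays for one full window of warm-but-idle container time.
    """
    serving_seconds = requests_per_day * seconds_per_request
    idle_seconds = requests_per_day * keepalive_seconds
    return serving_seconds * rate_per_second, idle_seconds * rate_per_second


# Hypothetical low-traffic endpoint: 200 requests a day, 2 s each, 5-minute keepalive.
serving, idle = daily_cost_split(
    requests_per_day=200,
    seconds_per_request=2,
    keepalive_seconds=300,
    rate_per_second=0.001,  # placeholder rate in USD per second
)
print(f"serving: ${serving:.2f} per day")
print(f"idle:    ${idle:.2f} per day")
```

With these inputs the keepalive, not the inference work, dominates the bill, which is the case for shortening the window or batching traffic.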

8-18% of TCO

MLOps platform and orchestration

Weights & Biases, Determined, MosaicML, Anyscale, Comet, or a self-built Argo / Kubeflow setup each add platform cost. Vendor pricing is typically per-user, per-seat, or per-experiment-hour and is a meaningful share of any production AI workload that goes beyond a single notebook.

2-8% of TCO

Support tier and account engineering

Hyperscaler support plans (AWS Enterprise Support, Azure Premier, GCP Premium) are billed as a percentage of cloud spend. Specialist clouds bundle different levels of support into the reservation price. Premium support is often the only route to an SLA on capacity for a production cluster.

Variable, often understated

On-call and SRE coverage

Even on managed clouds, somebody has to babysit a multi-day training run, react to a checkpoint failure, and triage when a node falls out of an HGX node group. Honest TCO models include an SRE allocation, not just GPU-hours.

Worked example

Acme Foundation Co. (illustrative example, not a real company) is running an 8x H100 SXM cluster on CoreWeave for 6 months at 20 hours per day. Assuming 30-day months, that is 8 GPUs x 20 hours x 30 days x 6 months = 28,800 GPU-hours. At the CoreWeave H100 SXM published list rate of $6.155 per GPU-hour (HGX H100 8x node, per-GPU equivalent from the CoreWeave pricing page), raw GPU compute is 28,800 x $6.155 = $177,264.

Line item	Basis	Cost
Raw GPU compute	28,800 GPU-hr x $6.155	$177,264
Egress / inter-region (8%)	0.08 x $177,264	$14,181
Persistent storage (5%)	0.05 x $177,264	$8,863
Idle / warm-pool (8%)	0.08 x $177,264	$14,181
MLOps platform (12%)	0.12 x $177,264	$21,272
Support tier (5%)	0.05 x $177,264	$8,863
SRE allocation (6 months)	Flat allocation	$48,000
Total TCO	Sum of line items	$292,624

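Total TCO of $292,624 is a 1.65x uplift on the $177,264 headline GPU-hour line. Note that the table applies each percentage to the raw compute line rather than to total TCO, so each item's share of the total is lower than its stated rate: egress at $14,181, for example, is about 4.8% of the $292,624 total. Reserved-capacity contracts at CoreWeave typically price below the on-demand list rate; a 1- to 3-year reservation would reduce the headline compute line and the percentage uplifts would apply to that reduced base. Percentage bands above are illustrative planning ranges, not a published industry benchmark.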
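The worked example can be reproduced with a short script, which also makes it easy to swap in a different rate or duty cycle. This is a minimal sketch using only the figures stated in the example; the variable names are illustrative, and amounts are rounded to whole dollars as in the table.

```python
# Inputs taken from the worked example.
GPUS = 8
HOURS_PER_DAY = 20
DAYS_PER_MONTH = 30        # the example uses 30-day months
MONTHS = 6
RATE_PER_GPU_HOUR = 6.155  # USD, list rate quoted in the example

# Uplift rates are applied to the raw compute line, as in the table.
UPLIFT_RATES = {
    "Egress / inter-region": 0.08,
    "Persistent storage": 0.05,
    "Idle / warm-pool": 0.08,
    "MLOps platform": 0.12,
    "Support tier": 0.05,
}
SRE_ALLOCATION = 48_000    # USD, flat allocation for the 6 months

gpu_hours = GPUS * HOURS_PER_DAY * DAYS_PER_MONTH * MONTHS
compute = round(gpu_hours * RATE_PER_GPU_HOUR)

line_items = {"Raw GPU compute": compute}
for name, rate in UPLIFT_RATES.items():
    line_items[name] = round(rate * compute)  # whole dollars, as in the table
line_items["SRE allocation"] = SRE_ALLOCATION

total_tco = sum(line_items.values())

# Print each line item with its share of total TCO.
for name, cost in line_items.items():
    print(f"{name:<24} ${cost:>9,}  {cost / total_tco:6.1%} of TCO")
print(f"{'Total TCO':<24} ${total_tco:>9,}  {total_tco / compute:.2f}x headline compute")
```

Changing `RATE_PER_GPU_HOUR` to a reserved-capacity rate shows how the uplift lines shrink along with the compute base, while the flat SRE allocation does not.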
