Independent reference. We are independent of every vendor listed. No affiliate links. No sponsored placements.
Guide - Last verified June 2026

GPU pricing 2026: the ultimate guide to GPU cloud cost

A single reference for how GPU cloud is priced in 2026, what the real H100 cost-per-hour bands look like across 17 vendors, and the cost levers (reservation term, region, GPU class, duty cycle) that decide what you actually pay.

Section 01

Why GPU pricing in 2026 looks the way it does

The H100 supply crunch of 2024-2025 has loosened. Public list rates on H100 SXM have fallen roughly 30 to 50 percent from their peak across the specialist clouds (CoreWeave, Lambda Labs, Together AI, Crusoe) as capacity from H200, B200, and GB200 deployments has come online. The hyperscalers (AWS, Azure, GCP) still publish materially higher list rates; their value proposition is integration, not headline price.

Section 02

The four pricing models that dominate GPU cloud

On-demand per-hour billing is the default at the hyperscalers and most specialist clouds. Per-second billing is the model used by Modal, Replicate, and RunPod, and it suits inference and short bursts. Marketplace models (Vast.ai) clear capacity from independent hosts at the cost of interruptibility and price variance. Reserved capacity (1-, 12-, or 36-month terms) is how every serious training cluster is procured, and it unlocks the headline low rates. See How vendors price GPU compute for the full breakdown.
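To illustrate why billing granularity matters for short bursts, here is a minimal sketch that rounds a job's runtime up to the billing unit. The 90-second job and the $3.00 per GPU-hour rate are hypothetical, and the same rate is assumed for both models purely for comparison; real per-second platforms may charge a different hourly-equivalent rate.

```python
import math


def billed_cost(runtime_seconds: float, hourly_rate: float,
                granularity_seconds: int) -> float:
    """Cost of one job when usage is rounded up to the billing granularity."""
    billed_units = math.ceil(runtime_seconds / granularity_seconds)
    billed_seconds = billed_units * granularity_seconds
    return billed_seconds * hourly_rate / 3600


# Hypothetical workload: a 90-second inference burst at an assumed $3.00/GPU-hour.
per_hour_cost = billed_cost(90, 3.00, granularity_seconds=3600)  # full hour billed
per_second_cost = billed_cost(90, 3.00, granularity_seconds=1)   # 90 seconds billed

if __name__ == "__main__":
    print(f"per-hour billing:   ${per_hour_cost:.3f}")
    print(f"per-second billing: ${per_second_cost:.3f}")
```

Under these assumptions the hourly-billed job pays for a full hour while the per-second job pays only for the 90 seconds used, which is why the gap narrows as jobs get longer.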

Section 03

H100 cost per hour, normalised across 17 clouds

At the cheapest end, the Vast.ai marketplace H100 floor is $1.49 per GPU-hour for interruptible capacity. RunPod Community Cloud is $1.99, DigitalOcean 12-month reserved is $1.99, Hyperstack on-demand is $2.40, and the specialist-cloud reserved-cluster floor (CoreWeave, Lambda, Together) is $3.29 to $3.95 per GPU-hour for H100 SXM. Hyperscaler on-demand list rates are $11 to $13 per GPU-hour before any negotiated discount. See the H100 cost per hour page for the full table.
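The per-GPU-hour figures above can be turned into a rough monthly bill. The sketch below uses only the rates quoted in this section; the 8-GPU node size, the 730-hour month, and the `duty_cycle` parameter are illustrative assumptions, and the rows mix interruptible, reserved, and on-demand terms, so they are not like-for-like offers.

```python
HOURS_PER_MONTH = 730  # assumption: average month length (8760 hours / 12)

# Rates in USD per GPU-hour, taken from the figures quoted in this section.
H100_RATES = {
    "Vast.ai marketplace (interruptible)": 1.49,
    "RunPod Community Cloud": 1.99,
    "DigitalOcean 12-month reserved": 1.99,
    "Hyperstack on-demand": 2.40,
    "Specialist reserved cluster (low)": 3.29,
    "Specialist reserved cluster (high)": 3.95,
    "Hyperscaler on-demand (low)": 11.00,
    "Hyperscaler on-demand (high)": 13.00,
}


def monthly_cost(rate: float, gpus: int = 8, duty_cycle: float = 1.0) -> float:
    """Monthly GPU spend for `gpus` GPUs billed for `duty_cycle` of the month."""
    return rate * gpus * HOURS_PER_MONTH * duty_cycle


if __name__ == "__main__":
    for name, rate in H100_RATES.items():
        print(f"{name:40s} ${monthly_cost(rate):>12,.2f} per month (8 GPUs)")
```

Setting `duty_cycle` below 1.0 only makes sense for rows that bill by usage; reserved capacity is typically paid for whether or not it is used.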

Section 04

The hidden cost line items

Spend at the headline GPU-hour rate accounts for roughly 60 to 75 percent of year-one TCO for a realistic training or inference workload. Storage, egress, networking, MLOps tooling, on-call coverage, and idle warm-pool time make up the remaining 25 to 40 percent. The Hidden GPU cloud costs page itemises the categories with realistic ranges.
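As a worked example of that 60 to 75 percent share, the sketch below backs out an implied year-one TCO range from a known GPU bill. The $300,000 GPU spend is a hypothetical input, and the function names are illustrative.

```python
def implied_tco(gpu_spend: float, gpu_share: float) -> float:
    """Total cost implied when GPU spend is `gpu_share` of the total."""
    if not 0 < gpu_share <= 1:
        raise ValueError("gpu_share must be in (0, 1]")
    return gpu_spend / gpu_share


def tco_range(gpu_spend: float, low_share: float = 0.60,
              high_share: float = 0.75) -> tuple[float, float]:
    """(low, high) TCO estimate; a smaller GPU share implies a larger total."""
    return implied_tco(gpu_spend, high_share), implied_tco(gpu_spend, low_share)


if __name__ == "__main__":
    low, high = tco_range(300_000)  # hypothetical year-one GPU spend in USD
    print(f"implied year-one TCO: ${low:,.0f} to ${high:,.0f}")
```

On these assumptions, a $300,000 GPU bill implies roughly $400,000 to $500,000 in total, so $100,000 to $200,000 sits outside the GPU-hour line.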

Section 05

Reservation strategy

Reserved capacity is the single biggest lever. CoreWeave, Lambda, Together, Crusoe, and DigitalOcean all publish reservation-tier rates that are 30 to 60 percent below the same vendor's on-demand rate. The hyperscalers offer Savings Plans, Reserved Instances, and AWS-specific Capacity Blocks for ML that reserve dated H100 windows in advance. A multi-month commitment is the default procurement shape for any production training cluster.
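The interaction between reservation discount and duty cycle can be sketched as a break-even calculation. This is a simplified model with assumptions labeled in the comments: reserved capacity is billed for every wall-clock hour, on-demand capacity is billed only for hours actually used, and commitment risk and capacity availability are ignored.

```python
def break_even_utilisation(discount: float) -> float:
    """Utilisation above which reserved beats on-demand under this model."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def cheaper_option(on_demand_rate: float, discount: float,
                   utilisation: float) -> str:
    """Compare effective cost per wall-clock hour of the two options."""
    # Assumption: reserved is paid for every hour, used or not.
    reserved_cost = on_demand_rate * (1.0 - discount)
    # Assumption: on-demand is paid only for the fraction of hours used.
    on_demand_cost = on_demand_rate * utilisation
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


if __name__ == "__main__":
    for discount in (0.30, 0.60):  # the discount range quoted above
        pct = break_even_utilisation(discount) * 100
        print(f"{discount:.0%} discount: reserve if utilisation exceeds {pct:.0f}%")
```

With the 30 to 60 percent discounts quoted above, the break-even duty cycle falls between roughly 70 and 40 percent, which is consistent with reservation being the default for clusters that run most of the time.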
