Vendor pricing - Last verified July 2026

Replicate GPU pricing 2026

Inference-as-a-service for open-source ML models, packaged in Replicate's native Cog container format. Per-second billing and the simplest deployment surface in the category come at a price premium versus raw GPU rental.

Pricing model: Per-second
Cheapest published rate: $0.81 per GPU-hour (Nvidia T4)
Public source: https://replicate.com/pricing
Last verified: July 2026

Published rate card

Per-GPU, per-hour rates are taken from the vendor pricing page linked above (last verified July 2026). Rates exclude storage, egress, and any managed-service uplift. Reserved-capacity contracts typically improve on these rates.

GPU	Billing basis	Rate per GPU-hour (USD)
Nvidia H100	Per-second, shown as hourly equivalent	$5.490
Nvidia A100 80GB	Per-second, shown as hourly equivalent	$5.040
Nvidia L40S	Per-second, shown as hourly equivalent	$3.510
Nvidia T4	Per-second, shown as hourly equivalent	$0.810
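The table quotes hourly equivalents, but billing is per second, so dividing each rate by 3,600 recovers the per-second price. The sketch below does that conversion and prices a short burst of GPU time. It assumes strictly linear billing with no minimum charge or rounding, which the rate card does not spell out; the function names are illustrative only.

```python
# Hourly-equivalent rates copied from the rate card above (USD per GPU-hour).
HOURLY_RATES = {
    "Nvidia H100": 5.490,
    "Nvidia A100 80GB": 5.040,
    "Nvidia L40S": 3.510,
    "Nvidia T4": 0.810,
}

SECONDS_PER_HOUR = 3600


def per_second_rate(hourly_rate: float) -> float:
    """Recover the per-second rate from an hourly-equivalent rate."""
    return hourly_rate / SECONDS_PER_HOUR


def cost_for_seconds(gpu: str, seconds: float) -> float:
    """Cost of `seconds` of billed time on one GPU of the given class.

    Assumes linear billing with no minimum charge or rounding (an assumption).
    """
    return per_second_rate(HOURLY_RATES[gpu]) * seconds


if __name__ == "__main__":
    for gpu, hourly in HOURLY_RATES.items():
        print(f"{gpu}: ${per_second_rate(hourly):.6f} per second")
    # A hypothetical 20-second prediction on one L40S.
    print(f"20 s on L40S: ${cost_for_seconds('Nvidia L40S', 20):.4f}")
```

On these assumptions the H100 works out to $0.001525 per second and the T4 to $0.000225 per second, so a 20-second L40S prediction costs about two cents.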

Hidden costs to watch

Per-second billing covers container setup time on a cold start as well as GPU time.
Custom model deployments (Cog containers) bill at the same per-second rates whether the model is serving predictions or sitting idle and warm.
There is no SLA on the free tier; volume customers negotiate a private rate.
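The first two points mean a custom deployment's bill depends on total billed seconds, not just prediction time. The sketch below models that under the assumption, taken from the points above, that setup and idle-warm seconds are charged at the same per-second rate as active seconds. The usage figures (cold-start count, setup duration, idle time) are hypothetical inputs, and the exact billing rules for each model type should be confirmed with Replicate.

```python
from dataclasses import dataclass

SECONDS_PER_HOUR = 3600


@dataclass
class DeploymentUsage:
    """Hypothetical usage profile for one custom (Cog) deployment."""

    hourly_rate: float        # hourly-equivalent rate from the rate card, USD
    cold_starts: int          # number of cold starts in the billing window
    setup_seconds: float      # assumed container setup time per cold start
    active_seconds: float     # time spent running predictions
    idle_warm_seconds: float  # time the instance stayed warm with no traffic


def billed_cost(u: DeploymentUsage) -> float:
    """Setup, active, and idle-warm time all billed at the same per-second rate."""
    billed_seconds = (
        u.cold_starts * u.setup_seconds + u.active_seconds + u.idle_warm_seconds
    )
    return billed_seconds * u.hourly_rate / SECONDS_PER_HOUR


def active_only_cost(u: DeploymentUsage) -> float:
    """What the same workload would cost if only prediction time were billed."""
    return u.active_seconds * u.hourly_rate / SECONDS_PER_HOUR


if __name__ == "__main__":
    # Hypothetical day on one L40S: 10 cold starts of 60 s, 1 h busy, 2 h idle-warm.
    usage = DeploymentUsage(
        hourly_rate=3.51,
        cold_starts=10,
        setup_seconds=60,
        active_seconds=3600,
        idle_warm_seconds=7200,
    )
    print(f"billed: ${billed_cost(usage):.2f}, active only: ${active_only_cost(usage):.2f}")
```

With those inputs the bill is roughly $11.1 against $3.51 of actual prediction time: at low utilisation, idle-warm and setup seconds dominate the cost.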

What Replicate is best for

Inference-only deployments of open-source models (image, video, audio) with zero infrastructure overhead.


Worked example

Acme Vision Co. (an illustrative example, not a real company) needs to train a 10-billion-parameter vision-language model on a fixed 8-GPU cluster for 30 days at an 18-hour-per-day duty cycle.

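At Replicate's cheapest published rate of $0.81 per GPU-hour (Nvidia T4), the run needs 8 GPUs × 30 days × 18 hours = 4,320 GPU-hours and costs $3,499 for raw GPU compute, before storage, egress, and MLOps overhead. Adding the typical 25 percent year-one uplift for those overheads brings the modelled spend to $4,374. Because Replicate is positioned for inference and the T4 is its entry-level GPU, treat this figure as a floor rather than a realistic training budget; pricing the same run at a higher-tier rate from the table scales the cost proportionally. Use the calculator on the homepage to model your own GPU class, cluster size, and duty cycle.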
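The worked example is plain arithmetic: GPU-hours times the hourly rate, then the 25 percent uplift. The sketch below reproduces it and applies the same parameters to every GPU in the rate card. It assumes each GPU class could be provisioned as a fixed 8-GPU cluster at the listed per-GPU rate, which the page does not state; `run_cost` and `YEAR_ONE_UPLIFT` are illustrative names.

```python
# Rates from the rate card above (USD per GPU-hour, hourly equivalent).
HOURLY_RATES = {
    "Nvidia H100": 5.490,
    "Nvidia A100 80GB": 5.040,
    "Nvidia L40S": 3.510,
    "Nvidia T4": 0.810,
}

# The page's 25 percent year-one allowance for storage, egress, and MLOps.
YEAR_ONE_UPLIFT = 0.25


def run_cost(hourly_rate: float, gpus: int, days: int, hours_per_day: float,
             uplift: float = YEAR_ONE_UPLIFT) -> tuple[float, float, float]:
    """Return (GPU-hours, raw compute cost, cost including uplift)."""
    gpu_hours = gpus * days * hours_per_day
    compute = gpu_hours * hourly_rate
    return gpu_hours, compute, compute * (1 + uplift)


if __name__ == "__main__":
    # Acme Vision Co. parameters: 8 GPUs, 30 days, 18 hours per day.
    for gpu, rate in HOURLY_RATES.items():
        hours, compute, total = run_cost(rate, gpus=8, days=30, hours_per_day=18)
        print(f"{gpu}: {hours:,.0f} GPU-h, compute ${compute:,.0f}, "
              f"with uplift ${total:,.0f}")
```

At the T4 rate this reproduces the $3,499 and $4,374 figures; at the H100 rate the same run comes to about $23,717 before the uplift and $29,646 after it.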

Replicate rates change frequently. Always obtain a vendor quote before purchase.
