Replicate GPU pricing 2026
Inference-as-a-service for open-source ML models, built on the native Cog container format. Billing is per second. It is the simplest deployment surface in the category, at a price premium over raw GPU rental.
Published rate card
Per-GPU per-hour rates pulled from the vendor pricing page linked above. Last verified June 2026. Rates exclude storage, egress, and any managed-service uplift. Reserved-capacity contracts typically improve on these rates.
| GPU | Configuration | Per GPU per hour |
|---|---|---|
| Nvidia H100 | Per-second, hourly equiv | $5.040 |
| Nvidia A100 80GB | Per-second, hourly equiv | $5.040 |
| Nvidia A100 40GB | Per-second, hourly equiv | $4.140 |
| Nvidia A40 | Per-second, hourly equiv | $2.070 |
| Nvidia L40S | Per-second, hourly equiv | $3.510 |
| Nvidia T4 | Per-second, hourly equiv | $0.810 |
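Because billing is per second, it can help to convert the hourly equivalents back into per-second rates. The sketch below does that by dividing each table value by 3,600. The dictionary keys and function names are illustrative, and the derived per-second figures are an assumption based on the table, not values quoted from the vendor.

```python
from decimal import Decimal

# Hourly-equivalent rates (USD per GPU-hour) copied from the rate card above.
HOURLY_RATES = {
    "H100": Decimal("5.040"),
    "A100 80GB": Decimal("5.040"),
    "A100 40GB": Decimal("4.140"),
    "A40": Decimal("2.070"),
    "L40S": Decimal("3.510"),
    "T4": Decimal("0.810"),
}

SECONDS_PER_HOUR = 3600


def per_second_rate(gpu: str) -> Decimal:
    """Per-second rate implied by the hourly equivalent in the table."""
    return HOURLY_RATES[gpu] / SECONDS_PER_HOUR


def cost_for_seconds(gpu: str, seconds: int) -> Decimal:
    """Cost of a run billed for the given number of seconds."""
    return per_second_rate(gpu) * seconds


if __name__ == "__main__":
    for gpu in HOURLY_RATES:
        print(f"{gpu}: ${per_second_rate(gpu):.6f} per second")
```

Decimal arithmetic is used so that small per-second amounts do not pick up floating-point rounding noise.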
Hidden costs to watch
- Per-second billing applies to GPU-time AND to setup-time when the container cold-starts.
- Custom model deployments (Cog containers) bill the same per-second rates whether the model is in flight or idle and warm.
- No SLA on free tier; volume customers negotiate a private rate.
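The first two items above mean billed time can exceed the time spent serving requests. A minimal sketch of that effect follows, assuming billed seconds are simply active inference time plus cold-start setup time plus idle-but-warm time (the last applies to custom deployments). The function names and the workload numbers in the example are hypothetical, not vendor figures.

```python
def billed_seconds(active_s: float, cold_starts: int,
                   setup_s_per_cold_start: float,
                   idle_warm_s: float = 0.0) -> float:
    """Total billed seconds under the assumption stated above.

    active_s: seconds spent actually running predictions
    cold_starts: number of container cold starts
    setup_s_per_cold_start: setup seconds billed per cold start
    idle_warm_s: seconds a custom deployment sits idle but warm
    """
    return active_s + cold_starts * setup_s_per_cold_start + idle_warm_s


def estimated_cost(hourly_rate_usd: float, seconds: float) -> float:
    """Convert an hourly-equivalent rate into a cost for billed seconds."""
    return hourly_rate_usd / 3600 * seconds


if __name__ == "__main__":
    # Hypothetical workload on a T4 at the table rate of $0.81 per hour.
    total_s = billed_seconds(active_s=1800, cold_starts=10,
                             setup_s_per_cold_start=30, idle_warm_s=1500)
    print(f"billed seconds: {total_s}")
    print(f"estimated cost: ${estimated_cost(0.81, total_s):.2f}")
```

In this hypothetical case only half of the billed hour is active inference, which is why cold-start frequency and warm idle time are worth measuring before comparing against raw GPU rental.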
What Replicate is best for
Inference-only deployments of open-source models (image, video, audio) with zero infrastructure overhead.
See the GPU cloud buying guide
Worked example
Acme Vision Co. (illustrative example, not a real company) needs to train a 10 billion-parameter vision-language model on a fixed 8-GPU cluster for 30 days at 18 hours per day duty cycle.
At Replicate's cheapest published rate of $0.81 per GPU-hour (the Nvidia T4), the run consumes 8 × 30 × 18 = 4,320 GPU-hours and costs $3,499 for raw GPU compute, before storage, egress, and MLOps overhead. Adding the typical 25 percent year-one uplift brings the modelled spend to $4,374. Use the calculator on the homepage to model your own GPU class, cluster size, and duty cycle.
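The arithmetic in this example can be reproduced with a short script. This is a sketch under the example's own assumptions (8 GPUs, 30 days, 18 hours per day, $0.81 per GPU-hour, 25 percent uplift). The function name `run_cost` is illustrative and is not the homepage calculator.

```python
from decimal import Decimal


def run_cost(gpus: int, days: int, hours_per_day: int,
             rate_per_gpu_hour: Decimal,
             uplift: Decimal = Decimal("0.25")):
    """Return (GPU-hours, raw compute cost, cost including uplift)."""
    gpu_hours = gpus * days * hours_per_day
    raw = gpu_hours * rate_per_gpu_hour
    modelled = raw * (1 + uplift)
    return gpu_hours, raw, modelled


if __name__ == "__main__":
    hours, raw, modelled = run_cost(8, 30, 18, Decimal("0.81"))
    print(f"GPU-hours: {hours}")
    print(f"Raw compute: ${raw:,.2f}")
    print(f"With 25% uplift: ${modelled:,.2f}")
```

Changing the rate argument to another value from the rate card shows how sensitive the total is to GPU class.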
Last verified June 2026. Replicate rates change frequently. Always obtain a vendor quote before purchase.
Visit Replicate pricing page