Modal GPU pricing 2026
Serverless GPU platform built on per-second billing and Python-native deployment primitives. Designed for inference and batch jobs rather than long-lived training clusters.
Published rate card
Per-GPU per-hour rates pulled from the vendor pricing page linked above. Last verified June 2026. Rates exclude storage, egress, and any managed-service uplift. Reserved-capacity contracts typically improve on these rates.
| GPU | Configuration | Per GPU per hour |
|---|---|---|
| H100 | Per-second, hourly equiv | $3.950 |
| A100 80GB | Per-second, hourly equiv | $2.780 |
| A100 40GB | Per-second, hourly equiv | $2.100 |
| L40S | Per-second, hourly equiv | $1.950 |
| A10G | Per-second, hourly equiv | $1.100 |
| T4 | Per-second, hourly equiv | $0.590 |
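Because billing is per second, the hourly-equivalent figures above are easier to apply once converted to a per-second rate. The snippet below is a minimal sketch of that conversion, using the rates from the table; the helper names `per_second_rate` and `job_cost` are illustrative and are not part of any Modal API, and the result covers GPU time only.

```python
# Hourly-equivalent rates in USD per GPU-hour, copied from the rate card above.
HOURLY_RATES = {
    "H100": 3.95,
    "A100 80GB": 2.78,
    "A100 40GB": 2.10,
    "L40S": 1.95,
    "A10G": 1.10,
    "T4": 0.59,
}

SECONDS_PER_HOUR = 3600


def per_second_rate(gpu: str) -> float:
    """Convert the hourly-equivalent rate for a GPU class to USD per second."""
    return HOURLY_RATES[gpu] / SECONDS_PER_HOUR


def job_cost(gpu: str, seconds: float, gpu_count: int = 1) -> float:
    """Estimate GPU-time cost in USD for a job of the given duration.

    Excludes storage, egress, and any CPU-billed time.
    """
    return per_second_rate(gpu) * seconds * gpu_count


if __name__ == "__main__":
    # Example: a 90-second inference burst on a single H100.
    print(f"${job_cost('H100', 90):.4f}")
```

Treat the output as a lower bound on the bill, since the extras listed under hidden costs are metered separately.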
Hidden costs to watch
- Per-second billing is GPU-time only; cold-start container build time is billed on CPU at a separate rate.
- Outbound egress, persistent volumes, and dictionaries are metered on top of GPU time.
- A Team plan is required to remove the free-tier rate limits.
What Modal is best for
Serverless inference, batch jobs, and ephemeral training, where workloads start cold and per-second billing matters.
See the GPU cloud buying guide
Worked example
Acme Vision Co. (illustrative example, not a real company) needs to train a 10 billion-parameter vision-language model on a fixed 8-GPU cluster for 30 days at 18 hours per day duty cycle.
The run consumes 8 GPUs × 30 days × 18 hours per day = 4,320 GPU-hours. At Modal's cheapest published rate of $0.59 per GPU-hour (the T4 rate), that comes to $2,549 for raw GPU compute, before storage, egress, and MLOps overhead. Adding the typical 25 percent year-one uplift brings the modelled spend to $3,186. Use the calculator on the homepage to model your own GPU class, cluster size, and duty cycle.
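The arithmetic behind these figures can be reproduced with a short script. This is a sketch under the example's own assumptions (8 GPUs, 30 days, 18 hours per day, $0.59 per GPU-hour, 25 percent uplift); the function name `modelled_spend` is illustrative and does not describe how the homepage calculator is implemented.

```python
def modelled_spend(rate_per_gpu_hour: float, gpus: int, days: int,
                   hours_per_day: float, uplift: float = 0.25):
    """Return (gpu_hours, raw_compute_usd, spend_with_uplift_usd)."""
    gpu_hours = gpus * days * hours_per_day
    raw_compute = gpu_hours * rate_per_gpu_hour
    with_uplift = raw_compute * (1 + uplift)
    return gpu_hours, raw_compute, with_uplift


if __name__ == "__main__":
    # Inputs taken from the worked example above.
    gpu_hours, raw, total = modelled_spend(0.59, gpus=8, days=30, hours_per_day=18)
    print(f"GPU-hours:          {gpu_hours:,}")
    print(f"Raw GPU compute:    ${raw:,.0f}")
    print(f"With 25% uplift:    ${total:,.0f}")
```

Swapping in another rate from the rate card shows how sensitive the total is to GPU class: the same cluster and duty cycle scale linearly with the per-GPU-hour rate.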
Last verified June 2026. Modal rates change frequently. Always obtain a vendor quote before purchase.
Visit Modal pricing page