Together AI GPU pricing 2026
AI cloud that pairs reserved H100 / H200 clusters with a Together Inference API for serving open-source models. Founded by the team behind RedPajama and the Open-LLaMA training runs.
Published rate card
Per-GPU, per-hour rates pulled from the vendor pricing page linked on this page. Last verified June 2026. Rates exclude storage, egress, and any managed-service uplift. Reserved-capacity contracts typically improve on these rates.
| GPU | Configuration | Per GPU per hour |
|---|---|---|
| H100 SXM | Reserved cluster, per-GPU | $3.290 |
| H200 SXM | Reserved cluster, per-GPU | $4.990 |
| A100 80GB | Reserved cluster, per-GPU | $2.400 |
| L40S | On-demand | $1.690 |
Hidden costs to watch
- Reserved-cluster rates (including the headline H100 number) require a minimum term of 1 to 12 months.
- The Inference API is billed per million tokens; serverless model hosting is billed per second.
- Network storage, dedicated endpoints, and fine-tuning runs are all metered separately.
What Together AI is best for
Teams that want H100 / H200 / B200 clusters with managed networking and a high-throughput inference API alongside.
See the GPU cloud buying guide
Worked example
Acme Vision Co. (illustrative example, not a real company) needs to train a 10-billion-parameter vision-language model on a fixed 8-GPU cluster for 30 days at a duty cycle of 18 hours per day.
At Together AI's published H100 SXM rate of $3.29 per GPU-hour, the run uses 8 × 30 × 18 = 4,320 GPU-hours and costs $14,213 for raw GPU compute, before storage, egress, and MLOps overhead. Adding the typical 25 percent year-one uplift brings the modelled spend to $17,766. Use the calculator on the homepage to model your own GPU class, cluster size, and duty cycle.
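The arithmetic in this worked example can be reproduced with a short script. This is a minimal sketch rather than any Together AI tooling: the function name `modelled_spend` and the `RATES_USD_PER_GPU_HOUR` dictionary are hypothetical names chosen for illustration, the rates are copied from the rate card on this page, and the 25 percent uplift is this page's own modelling assumption.

```python
# Rates copied from the rate card on this page (USD per GPU per hour).
# They exclude storage, egress, and any managed-service uplift.
RATES_USD_PER_GPU_HOUR = {
    "H100 SXM": 3.29,
    "H200 SXM": 4.99,
    "A100 80GB": 2.40,
    "L40S": 1.69,
}


def modelled_spend(gpu, gpus, days, hours_per_day, uplift=0.25):
    """Return (gpu_hours, raw_compute_usd, spend_with_uplift_usd).

    `uplift` is the page's assumed year-one overhead, not a vendor fee.
    """
    gpu_hours = gpus * days * hours_per_day
    raw = gpu_hours * RATES_USD_PER_GPU_HOUR[gpu]
    return gpu_hours, raw, raw * (1 + uplift)


# The Acme Vision Co. scenario: 8 GPUs, 30 days, 18 hours per day.
gpu_hours, raw, total = modelled_spend("H100 SXM", gpus=8, days=30, hours_per_day=18)
print(f"{gpu_hours} GPU-hours, raw ${raw:,.0f}, with uplift ${total:,.0f}")
```

Swapping the `gpu` argument for another key in the dictionary, or changing `gpus`, `days`, and `hours_per_day`, gives a rough comparison across GPU classes. It is not a substitute for a vendor quote.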
Last verified June 2026. Together AI rates change frequently. Always obtain a vendor quote before purchase.
Visit Together AI pricing page