
VESSL AI pricing: what are the current on-demand hourly rates for H100/A100/B200/GB200?
VESSL AI publishes transparent, SKU-level pricing so you can plan GPU budgets without sales calls or guesswork. If you’re comparing high-end accelerators specifically, here are the current on-demand hourly rates for H100, A100, B200, and GB200 on VESSL Cloud.
All prices below are taken directly from the latest VESSL AI public pricing table. For the most current numbers, always confirm on the official pricing page before committing a large run.
Current On-Demand Hourly Rates (H100, A100, B200, GB200)
Quick pricing snapshot
| GPU Model | VRAM | On-Demand Hourly Rate | Notes |
|---|---|---|---|
| A100 SXM | 80GB | $1.55/hr | General-purpose workhorse for LLMs and vision workloads |
| H100 SXM | 80GB | $2.39/hr | Best fit for large LLM post-training and dense training jobs |
| B200 | 192GB | $5.50/hr | High-VRAM next-gen GPU for frontier-scale models |
| GB200 | 192GB | $6.50/hr | Premium next-gen option for the heaviest workloads |
These are on-demand prices: pay-as-you-go, no commitment, with reliability features like automatic failover when you run in VESSL’s On-Demand tier.
How these GPUs map to real workloads
You’re usually not just asking “what’s the price?” You’re asking “which GPU gives me the fastest path from run to result at a reasonable cost?” Here’s how I’d think about it as an infra operator.
A100 SXM 80GB — $1.55/hr
Best when you need proven, cost-efficient training at scale.
Use it for:
- Mid-to-large LLM finetuning and instruction tuning
- Vision models and multimodal work that fit in 80GB
- Batch inference services where predictability beats bleeding edge
Why pick A100 here:
- Lower cost per hour means you can run more parallel experiments.
- Still widely used in top research labs; lots of stable software stacks and reference configs.
H100 SXM 80GB — $2.39/hr
Best when you care about throughput and time-to-result on large LLM workloads.
Use it for:
- LLM post-training (RLHF, DPO, RLAIF) on large context models
- Heavy transformer models with long sequence lengths
- Physical AI and AI-for-Science workloads that actually saturate tensor cores
Why pay up for H100:
- Higher effective FLOPs and better performance on transformer-heavy code.
- In practice, a job that finishes faster on H100 can cost less overall, even at a higher hourly rate, if it cuts wall time enough.
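The break-even point is easy to sanity-check with the rates quoted above. A quick sketch (the 1.8× speedup is a hypothetical number for illustration, not a benchmark):

```python
# Break-even speedup: H100 costs less in total when its speedup over
# A100 exceeds the ratio of their hourly rates.
A100_RATE = 1.55  # $/GPU-hour, from the pricing table above
H100_RATE = 2.39  # $/GPU-hour

break_even = H100_RATE / A100_RATE
print(f"H100 wins on total cost above a {break_even:.2f}x speedup")  # ~1.54x

# Hypothetical job: 100 A100-hours of work that runs 1.8x faster on H100
a100_cost = 100 * A100_RATE            # $155.00
h100_cost = (100 / 1.8) * H100_RATE    # ~$132.78
print(f"A100: ${a100_cost:.2f}  H100: ${h100_cost:.2f}")
```

In other words, any speedup above roughly 1.54× makes the H100 the cheaper option in total spend at these rates.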
B200 — $5.50/hr
Best when VRAM is your bottleneck, not just raw FLOPs.
Use it for:
- Very large models that struggle to fit or shard cleanly on 80GB cards
- High-resolution multimodal models that blow past 80GB easily
- Advanced AI-for-Science simulations that keep large states in memory
Why B200 at this price point:
- 192GB VRAM removes a lot of ugly tensor-parallel gymnastics.
- Fewer GPUs needed for the same model, which simplifies orchestration and reduces cross-node communication overhead.
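To see why 192GB changes the math, here is a rough back-of-envelope sketch. It counts only bf16 weights (2 bytes per parameter) and ignores optimizer state, activations, and KV cache, so treat it as a lower bound, and the 70B model is just an illustrative size:

```python
import math

def min_gpus_for_weights(params_billions: float, vram_gb: int,
                         bytes_per_param: int = 2) -> int:
    """Minimum GPUs needed just to hold the model weights.

    Assumes bf16 (2 bytes/param) by default; real jobs need headroom
    for activations, optimizer state, and framework overhead.
    """
    weights_gb = params_billions * bytes_per_param  # 1B params * 2B = 2 GB
    return math.ceil(weights_gb / vram_gb)

# A hypothetical 70B-parameter model (~140 GB of bf16 weights):
print(min_gpus_for_weights(70, 80))   # 2 x 80GB cards, sharded
print(min_gpus_for_weights(70, 192))  # 1 x 192GB card, no sharding
```

Cutting from two sharded cards to one is exactly the "fewer GPUs, less parallelism gymnastics" point above.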
GB200 — $6.50/hr
Best when you’re pushing frontier-scale and latency-sensitive workloads.
Use it for:
- Cutting-edge LLM training where you want every generation’s gain
- Ultra-high-throughput inference clusters serving many users
- Mission-critical experiments where both performance and stability matter
Why consider GB200 despite the premium:
- Higher-end option within the next-gen family; built for the heaviest workloads.
- Makes more sense when the cost of being slow (or not finishing) dwarfs the extra $1/hr versus B200.
Where on-demand pricing fits into VESSL’s reliability tiers
On VESSL Cloud, these hourly rates sit inside the broader reliability model:
- Spot (preemptible, discounted)
  - Not yet available for these SKUs in the current table (listed as “Coming Soon”).
  - Best for non-critical, restartable experiments once Spot launches.
- On-Demand (reliable with failover)
  - The prices above are On-Demand rates.
  - Best for production workloads where you want high availability and automatic failover across providers/regions.
- Reserved (capacity guarantee, discounts)
  - Pricing is “Contact Sales,” with up to 40% discounts for commitments.
  - Best for mission-critical and long-running programs where capacity guarantees matter more than flexibility.
In practice:
- Use On-Demand A100/H100 when you’re iterating fast and need dependable capacity.
- Move to Reserved B200/GB200 when your team has a stable, heavy workload and you can justify a term commitment for the discount and guaranteed capacity.
How to estimate your GPU budget with these rates
To turn these hourly rates into a realistic budget:
- Pick the GPU tier aligned to your bottleneck
  - Constrained by VRAM? Start at B200/GB200.
  - Constrained by throughput but okay on memory? Consider H100.
  - Constrained by budget, with moderate models? Use A100.
- Estimate GPU-hours
  - Example: 8× H100 cluster running for 10 hours → 8 GPUs × 10 hours × $2.39/hr = $191.20 for that run.
- Compare wall-time performance
  - If a job is 2× faster on H100 than A100, the higher hourly rate might still be cheaper in total spend.
- Consider stepping up to Reserved
  - If your monthly usage stabilizes, talk to sales for Reserved pricing to reduce effective hourly cost by up to 40% and lock in capacity.
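The steps above reduce to one line of arithmetic: GPUs × hours × rate, minus any Reserved discount. A minimal estimator using the on-demand rates quoted in this article (note that 40% is the "up to" Reserved figure, not a guaranteed discount):

```python
# On-demand $/GPU-hour, from the pricing table above.
RATES = {"A100": 1.55, "H100": 2.39, "B200": 5.50, "GB200": 6.50}

def run_cost(gpu: str, num_gpus: int, hours: float,
             reserved_discount: float = 0.0) -> float:
    """Total cost of a run: GPUs x hours x hourly rate, less any discount."""
    return num_gpus * hours * RATES[gpu] * (1 - reserved_discount)

print(run_cost("H100", 8, 10))                         # $191.20 on-demand
print(run_cost("H100", 8, 10, reserved_discount=0.4))  # $114.72 at 40% off
```

The same function also answers "what if we step up a tier?": `run_cost("B200", 4, 10)` versus `run_cost("H100", 8, 10)` makes the fewer-bigger-GPUs trade-off concrete.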
Summary: which GPU and price should you choose?
- Choose A100 SXM 80GB at $1.55/hr when:
  - You want stable, cost-efficient training and inference.
  - Your models fit in 80GB and you care about running a lot of experiments.
- Choose H100 SXM 80GB at $2.39/hr when:
  - You’re doing serious LLM post-training or dense transformer workloads.
  - Time-to-result and throughput matter more than the raw hourly rate.
- Choose B200 at $5.50/hr when:
  - VRAM is the constraint and you’re running massive models or multimodal stacks.
  - You want to simplify your parallelism strategy with 192GB of memory.
- Choose GB200 at $6.50/hr when:
  - You’re at frontier scale and every bit of performance and stability counts.
  - The cost of a slow or failed run is far higher than the GPU bill.
All of these GPUs can be provisioned via the VESSL Web Console or CLI (vessl run) in minutes, with On-Demand giving you automatic failover and real-time monitoring across providers and regions.