
VESSL AI: estimate cost to fine-tune an LLM on 8×H100 for 72 hours (on-demand vs reserved)
Fine-tuning an LLM on 8×H100 for 72 hours is a serious run: you’re locking up 8 top-tier GPUs for three straight days. With VESSL AI, you can get a clean estimate ahead of time and decide whether On-Demand or Reserved makes more sense for this workload.
Below, I’ll walk through the math, compare On-Demand vs Reserved at a realistic discount, and give some guidance on when to step up to Reserved capacity for repeated LLM post-training jobs.
Quick cost estimate: 8×H100 for 72 hours on VESSL AI
VESSL AI’s published On-Demand rate for H100 SXM 80GB is:
- $2.39 per GPU-hour (On-Demand)
For 8 GPUs over 72 hours:
- Total GPU-hours = 8 GPUs × 72 hours = 576 GPU-hours
- On-Demand cost = 576 × $2.39 = $1,376.64
So if you spin up an 8×H100 cluster and leave it running for a 72-hour fine-tune, expect roughly:
- ~$1,380 On-Demand (before any Reserved discounts)
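The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch, not a VESSL tool: the function name `run_cost` is hypothetical, and the only input taken from the article is the $2.39 per GPU-hour On-Demand rate.

```python
from decimal import Decimal

def run_cost(gpus: int, hours: int, rate_per_gpu_hour: str) -> Decimal:
    """Total cost in dollars for a fixed-size, fixed-duration run."""
    gpu_hours = gpus * hours
    # Decimal avoids binary floating-point noise in currency arithmetic.
    return (Decimal(rate_per_gpu_hour) * gpu_hours).quantize(Decimal("0.01"))

ON_DEMAND_H100 = "2.39"  # $/GPU-hour, the On-Demand rate quoted above

cost = run_cost(gpus=8, hours=72, rate_per_gpu_hour=ON_DEMAND_H100)
print(f"8xH100 for 72h: {8 * 72} GPU-hours, ${cost:,}")
```

Changing `gpus` or `hours` gives the same estimate for other cluster shapes or run lengths, as long as the per-GPU rate stays flat.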
Reserved pricing is “Contact Sales” because it depends on commitment and volume, but VESSL explicitly offers “volume discounts” and Reserved discounts “up to 40%” in general. That’s our basis for a realistic comparison.
At-a-glance comparison: On-Demand vs Reserved for this run
| Rank | Option | Best For | Primary Strength | Watch Out For |
|---|---|---|---|---|
| 1 | Reserved | Repeated LLM fine-tuning & production | Lower effective $/GPU-hr with guarantees | Requires commitment + sales conversation |
| 2 | On-Demand | One-off or exploratory fine-tunes | No commitment, automatic failover | Higher cost per hour over time |
| 3 | Spot | Non-critical experiments, ablations | Lowest possible cost if/when available | Preemptions; not ideal for 72h uninterrupted |
For this specific question (a continuous 72-hour fine-tune on 8×H100), Spot is not appropriate because a preemption can interrupt the run and force a restart from the last checkpoint. The real decision is On-Demand vs Reserved.
Comparison criteria
To keep the comparison grounded in how teams actually run LLM post-training, we’ll use three criteria:
1. Effective cost per GPU-hour: How much you really pay over the full 72-hour window, factoring in discounts.
2. Reliability & continuity: Whether the run can survive provider or regional issues without manual “job wrangling.”
3. Commitment & flexibility: Whether you’re locked into capacity (and for how long) versus spinning up as needed.
1. On-Demand (Best for one-off or exploratory 72-hour runs)
On-Demand on VESSL AI is designed for production workloads that need reliability without long-term commitment.
For 8×H100 over 72 hours:
- Hourly rate (per GPU): $2.39
- Total GPU-hours: 576
- Estimated cost: 576 × $2.39 = $1,376.64
You can think of this as:
- ~$458.88 per day for an 8×H100 node
(8 GPUs × 24 hours × $2.39 ≈ $458.88)
What On-Demand does well
- High availability with failover: On-Demand is explicitly “Reliable with failover.” If a provider or region has issues, VESSL’s Auto Failover can move the workload to another provider/region without you rewriting everything.
- No commitment required: You request 8×H100, run your 72 hours of fine-tuning, then shut down. No multi-month contract, no capacity planning meeting.
- Operational simplicity:
  - Use the Web Console for visual cluster management
  - Or the CLI (`vessl run`) for scripted jobs
  - Real-time monitoring is built in, helping you “fire-and-forget” instead of staring at dashboards all weekend.
Tradeoffs & limitations
- Higher per-hour cost vs Reserved: Over a single 72-hour run, $1.3k–$1.4k is fine. But if you do this repeatedly (weekly or multiple times per month), the delta vs Reserved accumulates fast.
- No explicit capacity guarantee: On-Demand is reliable with failover, but you’re still drawing from shared capacity. During peak demand or for large clusters (e.g., 32–64 H100s), you may still want the certainty of a Reserved commitment.
On-Demand decision trigger
Choose On-Demand for this workload if:
- This is a one-off or occasional 72-hour fine-tune on 8×H100 (e.g., a single model variant, a POC, or a quarterly retrain).
- You need automatic failover and real-time monitoring, but you’re not yet sure how often you’ll repeat the run.
- You want to avoid up-front commitments, especially if the architecture or GPU profile you’ll use long term isn’t finalized.
For the exact run described, your ballpark is ~$1,380 On-Demand.
2. Reserved (Best for repeated fine-tunes and production extensions)
Reserved on VESSL AI is for mission-critical AI with:
- Capacity guarantee
- Volume discounts
- Dedicated support
Reserved plans:
- Include guaranteed capacity for the GPU SKU you care about (here, H100 SXM),
- Offer discounts up to ~40% relative to On-Demand (per VESSL’s general guidance),
- Start at 3-month terms and scale with your needs.
To model cost, we’ll use two realistic discount scenarios off the $2.39/hr On-Demand rate:
- Moderate discount: 20% off On-Demand. Effective $/GPU-hr ≈ $1.91
- Aggressive discount: 40% off On-Demand. Effective $/GPU-hr ≈ $1.43
Note: These are illustrative to show scale; actual discounts for H100 will be set during a sales conversation based on volume and term length.
Scenario A: 20% Reserved discount
- Effective rate: $2.39 × 0.8 = $1.912/hr per H100
- Total cost for 8×H100 × 72 hours: 576 GPU-hours × $1.912 ≈ $1,101.31
Versus On-Demand:
- On-Demand: ≈ $1,376.64
- Reserved (20% off): ≈ $1,101.31
- Savings for one 72-hour run: ≈ $275
Scenario B: 40% Reserved discount
- Effective rate: $2.39 × 0.6 = $1.434/hr per H100
- Total cost for 8×H100 × 72 hours: 576 GPU-hours × $1.434 ≈ $825.98
Versus On-Demand:
- On-Demand: ≈ $1,376.64
- Reserved (40% off): ≈ $825.98
- Savings for one 72-hour run: ≈ $551
So if you’re running this exact workload repeatedly, say weekly (roughly four runs per month):
- Weekly On-Demand: ~$1,377
- Weekly Reserved (20%): ~$1,101 (save ~$1.1k/month)
- Weekly Reserved (40%): ~$826 (save ~$2.2k/month)
And that’s just one 8-GPU job; many teams do this across multiple models or branches.
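To check the per-run and monthly figures for other discount levels, the scenario math can be sketched as below. The discount percentages are illustrative, as noted above, and `RUNS_PER_MONTH = 4` is an assumption for a weekly cadence; the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

ON_DEMAND_RATE = Decimal("2.39")  # $/GPU-hour, the On-Demand rate quoted above
GPU_HOURS = 8 * 72                # one 8xH100 run for 72 hours
RUNS_PER_MONTH = 4                # assumption: weekly cadence, four runs per month

def cost_per_run(discount_pct: int) -> Decimal:
    """Cost of one run at a given percentage discount off On-Demand."""
    rate = ON_DEMAND_RATE * (100 - discount_pct) / 100
    return (rate * GPU_HOURS).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def monthly_savings(discount_pct: int) -> Decimal:
    """Savings versus On-Demand across RUNS_PER_MONTH runs."""
    return (cost_per_run(0) - cost_per_run(discount_pct)) * RUNS_PER_MONTH

for pct in (20, 40):
    print(f"{pct}% off: ${cost_per_run(pct):,} per run, "
          f"${monthly_savings(pct):,} saved per month")
```

This counts only the hours the job actually runs. Whether a Reserved plan also bills for idle reserved hours is not stated here, so treat the monthly savings as an upper bound until the billing terms are confirmed.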
What Reserved does well
- Capacity guarantee for H100 clusters: You get a guaranteed pool of H100 SXM capacity. For LLM post-training, that means:
  - No scrambling for GPUs when a new dataset or architecture lands
  - No “quota ceiling” surprises mid-quarter
- Lower effective $/GPU-hour at scale: Once you’re running multi-day jobs repeatedly, the discounts more than pay for the commitment, especially when you’re in the H100/B200 class.
- Dedicated support + enterprise posture:
  - Dedicated support for debugging and scheduling
  - SOC 2 Type II and ISO 27001 for compliance comfort
  - SLA-style conversations, custom integrations, and multi-cloud setup with your infra/security teams
Tradeoffs & limitations
- Requires commitment (terms from 3 months): Reserved only makes sense if:
  - You know you’ll run enough 8×H100 workloads (or larger) to justify it.
  - You’re ready to lock in capacity for at least a quarter.
- Sales conversation required: You’ll need to talk to VESSL’s team to:
  - Finalize exact discount percentages
  - Nail down cluster shape (8, 16, 32 H100s?), providers, and regions
  - Align on SLAs and support expectations
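How many runs count as “enough” depends on how Reserved capacity is billed, which this article does not specify. As a rough sketch, suppose the reserved GPUs are billed for every hour of the term whether or not a job is running (an assumption to confirm with sales). Under that assumption the hourly rate cancels out, and the break-even point depends only on the discount and the hours used. The 13-week term and the name `break_even_runs` are hypothetical.

```python
import math

def break_even_runs(discount_pct: int, term_hours: int, run_hours: int) -> int:
    """Smallest number of runs at which always-billed Reserved costs
    no more than paying On-Demand for the same runs.

    Reserved bill per GPU:  term_hours * rate * (100 - discount_pct) / 100
    On-Demand bill per GPU: runs * run_hours * rate
    The hourly rate cancels, so it is not a parameter.
    """
    return math.ceil(term_hours * (100 - discount_pct) / (100 * run_hours))

TERM_HOURS = 13 * 7 * 24  # hypothetical 13-week (about 3-month) term

for pct in (20, 40):
    runs = break_even_runs(pct, TERM_HOURS, run_hours=72)
    print(f"{pct}% off: at least {runs} runs of 72 h in the term to break even")
```

Put differently, under this billing assumption the cluster has to be busy for at least (100 − discount)% of the term: 80% utilization at a 20% discount, 60% at a 40% discount. If Reserved hours are metered differently, the threshold changes accordingly.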
Reserved decision trigger
Move to Reserved for this workload if:
- You’re running 8×H100 fine-tunes regularly (e.g., weekly, biweekly, or across multiple models).
- You care about capacity guarantees during launches, eval cycles, or academic deadlines.
- You want to push unit cost down while still keeping VESSL’s reliability stack:
- Auto Failover
- Multi-Cluster visibility
- Real-time monitoring
- Dedicated support
For one single 72-hour run, Reserved is optional. For ongoing LLM post-training, it’s usually the better fit.
3. Spot (Why it’s not ideal for a 72-hour continuous fine-tune)
VESSL also exposes Spot as an operational mode:
- Best for: cheap experiments, short jobs, non-critical workloads
- Key behavior: instances can be preempted
Spot pricing for H100 on VESSL is currently marked “Coming Soon” in the public table, but regardless of the eventual number, the tradeoff stands:
- Pro: Lower cost per GPU-hour than On-Demand
- Con: Your 72-hour run can be interrupted, forcing restarts/checkpoint restores
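To get a feel for what interruptions cost in wall-clock time, here is a back-of-the-envelope sketch. Every input is hypothetical (the preemption count, checkpoint interval, and restart delay are not VESSL figures), and it assumes each preemption loses on average half a checkpoint interval of progress plus a fixed restart delay.

```python
def spot_overhead_hours(preemptions: int,
                        checkpoint_interval_h: float,
                        restart_h: float) -> float:
    """Rough extra wall-clock hours caused by preemptions.

    Assumes each preemption loses, on average, half a checkpoint interval
    of progress plus a fixed restart/restore delay.
    """
    return preemptions * (checkpoint_interval_h / 2 + restart_h)

# Hypothetical inputs: 3 preemptions, checkpoints every 2 h, 0.5 h to restart.
extra = spot_overhead_hours(preemptions=3, checkpoint_interval_h=2.0, restart_h=0.5)
print(f"Extra wall-clock time: {extra:.1f} h on top of the planned 72 h")
```

The extra hours are also billed GPU-hours, so any per-hour Spot saving has to be weighed against them once Spot pricing is published.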
For a 72-hour LLM fine-tune where you care about:
- Stable throughput
- No surprise preemptions
- Minimal “job wrangling”
Spot is not the right tool. Use it for:
- Architecture search
- Ablations
- Short-batch runs
- Preprocessing
Once you’ve settled on a configuration, switch to On-Demand or Reserved for the 72-hour production-grade fine-tune.
Putting it together: cost vs reliability vs commitment
For 8×H100 over 72 hours on VESSL AI:
- On-Demand (no commitment)
  - Rate: $2.39 per H100-hour
  - 72-hour 8×H100 run: ≈ $1,376.64
  - Best for: one-off runs, POCs, irregular experiments
- Reserved (3+ month commitment, discounted)
  - Example 20% discount: ≈ $1,101 per 72-hour 8×H100 run
  - Example 40% discount: ≈ $826 per 72-hour 8×H100 run
  - Best for: regular fine-tunes, multiple model variants, production retraining
- Spot (preemptible)
  - Not priced yet in the table for H100, but expected to be cheaper per hour than On-Demand
  - Not suitable for 72-hour uninterrupted fine-tunes you can’t afford to restart
Final verdict
If you just need to fine-tune an LLM once on 8×H100 for 72 hours, assume roughly:
- ~$1,380 on VESSL AI On-Demand at the current H100 SXM rate of $2.39/hr per GPU.
If you expect to run this job (or larger) repeatedly, it’s worth moving to Reserved capacity:
- You unlock meaningful discounts on H100 hours.
- You get capacity guarantees and dedicated support layered on top of VESSL’s automatic failover and multi-cloud orchestration.
- Over a quarter or year, the savings versus pure On-Demand stack up quickly.
Next step
Get Started with VESSL AI to:
- Spin up an 8×H100 cluster via Web Console or CLI in minutes
- Run your 72-hour fine-tune on On-Demand today
- Talk to sales about Reserved H100 capacity if you’re planning recurring LLM post-training runs