VESSL AI: estimate cost to fine-tune an LLM on 8×H100 for 72 hours (on-demand vs reserved)

Most teams don’t struggle with the modeling math. They struggle with the bill. If you’re planning to fine-tune an LLM on 8×H100 for 72 hours, you want a clear, SKU-level estimate for VESSL AI—and you want to know whether On-Demand or Reserved makes more sense.

Quick Answer: For a one-off or infrequent 72-hour fine-tuning run on 8×H100, On-Demand is the best overall choice. If you run fine-tunes regularly and your priority is guaranteed capacity and long-term cost efficiency, Reserved is the stronger fit—especially for ongoing or mission-critical post-training pipelines, where its capacity guarantees and dedicated support matter most.


At-a-Glance Comparison

Below, we assume the published On-Demand rate for H100 SXM 80GB: $2.39/hr per GPU.
Reserved pricing is “Contact Sales,” so we’ll use a realistic illustrative discount range (up to ~40%) to bracket costs. Always check the VESSL pricing page or talk to sales for exact Reserved quotes.

| Rank | Option | Best For | Primary Strength | Watch Out For |
|------|--------|----------|------------------|---------------|
| 1 | On-Demand | Single or occasional 72-hour fine-tuning runs | Simple, pay-as-you-go with failover | No long-term discount; pay list price per hour |
| 2 | Reserved (12+ weeks of regular runs) | Teams running repeated or scheduled fine-tuning | Lower effective hourly cost, guaranteed capacity | Requires commitment; need to size capacity correctly |
| 3 | Hybrid (On-Demand now, move to Reserved later) | Teams still validating workload size & cadence | Start immediately, then lock in discounts once usage is clear | Two-step process; discounts only apply after you reserve |

Comparison Criteria

We evaluated On-Demand vs Reserved for this scenario against three concrete criteria:

  • Total Cost for 72 Hours on 8×H100:
    What you actually pay to run a single 72-hour fine-tuning job on 8 H100 SXM GPUs.

  • Capacity Reliability & Scheduling Risk:
    How confident you can be that 8×H100 will be available when your run needs to start, and how resilient your job is to provider or region issues.

  • Fit for Your Workload Pattern:
    Whether you’re doing a one-off experiment, periodic post-training, or a recurring production pipeline—and how that meshes with each pricing tier.


Step-by-Step Cost Estimate

Base assumption: H100 SXM 80GB pricing

From VESSL AI’s published pricing (USD):

  • H100 SXM 80GB On-Demand: $2.39/hr per GPU

We’ll estimate what it costs to run:

  • GPU count: 8 H100 SXM
  • Runtime: 72 hours continuous
  • Total GPU-hours: 8 GPUs × 72 hours = 576 GPU-hours

On-Demand cost estimate for 8×H100, 72 hours

Formula:

  • Cost = GPU-hours × On-Demand rate

Calculation:

  • 576 GPU-hours × $2.39/hr = $1,376.64
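The arithmetic above is simple enough to sketch in a few lines; the rate is VESSL's published On-Demand price, and everything else is plain multiplication:

```python
# On-Demand cost for a single fine-tuning run.
GPUS = 8                  # H100 SXM 80GB
HOURS = 72                # continuous runtime
RATE_PER_GPU_HOUR = 2.39  # published On-Demand rate, USD

gpu_hours = GPUS * HOURS              # 576 GPU-hours
cost = gpu_hours * RATE_PER_GPU_HOUR

print(f"{gpu_hours} GPU-hours x ${RATE_PER_GPU_HOUR}/hr = ${cost:,.2f}")
# -> 576 GPU-hours x $2.39/hr = $1,376.64
```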

You can treat this as:

  • Approximate On-Demand cost: $1,375–$1,400
  • Includes:
    • Access to 8×H100 SXM 80GB
    • Automatic failover across providers/regions
    • Real-time monitoring
    • No commitment; pay only for the 72 hours you actually run

Reserved cost estimate for 8×H100, 72 hours (illustrative)

Reserved pricing on VESSL is “Contact Sales,” but documentation states:

  • Reserved:
    • Guaranteed capacity
    • Volume discounts
    • Terms starting at 3 months
    • Discounts up to ~40% depending on commitment and volume

To estimate the cost of a single 72-hour run under a Reserved plan, we’ll model a few possible discount levels. These are example figures, not official quotes.

Example 1: ~20% Effective Discount

  • Approx. Reserved rate: $2.39 × 0.8 ≈ $1.91/hr per H100

Cost:

  • 576 GPU-hours × $1.91/hr ≈ $1,100.16

So at ~20% discount:

  • Estimated Reserved cost for this run: ~$1,100
  • Estimated savings vs On-Demand: ~$275–$300 for this single job

Example 2: ~30% Effective Discount

  • Approx. Reserved rate: $2.39 × 0.7 ≈ $1.67/hr per H100

Cost:

  • 576 GPU-hours × $1.67/hr ≈ $962.00

At ~30% discount:

  • Estimated Reserved cost: ~$950–$975
  • Savings vs On-Demand: ~$400–$425

Example 3: ~40% Effective Discount (upper bound case)

  • Approx. Reserved rate: $2.39 × 0.6 ≈ $1.43/hr per H100

Cost:

  • 576 GPU-hours × $1.43/hr ≈ $824.00

At ~40% discount:

  • Estimated Reserved cost: ~$825–$850
  • Savings vs On-Demand: ~$525–$550

Again: these Reserved rates are illustrative to show how commitment changes the economics. You’ll need to speak to the VESSL team for your exact rate card.
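The three scenarios can be reproduced with a short sketch. It rounds the discounted hourly rate to the cent first (as the figures above do) and then multiplies by GPU-hours; the discount levels are illustrative assumptions, not VESSL quotes:

```python
ON_DEMAND_RATE = 2.39   # USD/hr per H100 SXM, published
GPU_HOURS = 8 * 72      # 576

on_demand_cost = GPU_HOURS * ON_DEMAND_RATE

# Illustrative discount levels -- actual Reserved rates come from sales.
for discount in (0.20, 0.30, 0.40):
    rate = round(ON_DEMAND_RATE * (1 - discount), 2)  # e.g. 30% off -> $1.67/hr
    cost = GPU_HOURS * rate
    savings = on_demand_cost - cost
    print(f"~{discount:.0%} off: ${rate}/hr -> ${cost:,.2f} "
          f"(saves ~${savings:,.0f} vs On-Demand)")
```

Rounding the hourly rate before multiplying is why the 30% and 40% rows land on $961.92 and $823.68 rather than exact percentages of the On-Demand total.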


Detailed Breakdown

1. On-Demand (Best overall for one-off 72-hour fine-tuning)

On-Demand ranks as the top choice here because it gives you reliable H100 capacity with automatic failover and no commitment, which is ideal when you just need to run a 72-hour fine-tuning job and move on.

What it does well:

  • Simple, predictable pricing:
    You know the H100 SXM rate up front: $2.39/hr per GPU.
    For 8 GPUs over 72 hours, you’re looking at ~$1,375–$1,400 all-in on compute.

  • High availability with Auto Failover:

    • VESSL’s On-Demand tier is built for production-style reliability.
    • Auto Failover can switch providers or regions under the hood if one fails, keeping your 72-hour run alive.
    • You avoid wasting time re-queuing or micro-managing cloud-specific outages.
  • Zero long-term lock-in:

    • No minimum term.
    • Perfect if you’re still experimenting with model sizes, sequence lengths, or don’t yet know your exact post-training schedule.

Tradeoffs & Limitations:

  • No built-in volume discount:
    • You pay the published hourly rate for each GPU-hour.
    • As your usage grows (e.g., weekly 72-hour runs on 8–16×H100), the lack of a discount becomes obvious compared to Reserved pricing.

Decision Trigger:
Choose On-Demand if you want to fine-tune an LLM on 8×H100 for 72 hours once or occasionally, and you prioritize instant access, automatic failover, and pay-as-you-go simplicity over long-term discounts.


2. Reserved (Best for repeated or scheduled fine-tuning)

Reserved is the strongest fit when you know you’ll run multiple fine-tuning jobs—not just one—and you don’t want your experiments or pipelines to depend on “hoping 8×H100 is free.”

Reserved takes the same underlying GPU capacity and wraps it with capacity guarantees, pricing discounts, and dedicated support.

What it does well:

  • Guaranteed capacity for 8×H100 (or more):

    • You reserve a block of H100s for your team.
    • When it’s time to kick off a 72-hour post-training job, those GPUs are yours, not subject to quota surprises.
    • This matters for time-sensitive workloads—deadlines, paper submissions, or production retraining windows.
  • Lower effective hourly cost:

    • VESSL offers volume discounts under Reserved.
    • With realistic discounts in the 20–40% range, that same 72-hour job can drop from ~$1,375–$1,400 to somewhere in the $825–$1,100 range.
    • Over a quarter or year, the savings stack up.
  • Better fit for recurring pipelines:

    • If you retrain every week or every release cycle, Reserved aligns with how you actually operate.
    • You minimize “job wrangling” around capacity and run scheduling; you get more “fire-and-forget” execution.

Tradeoffs & Limitations:

  • Requires a term commitment:

    • Reserved terms start at 3 months.
    • If you only need one 72-hour run, ever, it’s usually not worth negotiating a Reserved block just for that.
  • You must estimate your capacity needs:

    • Under-reserve and you still hit limits.
    • Over-reserve and you pay for unused headroom.
    • This is fine once you know your steady-state workloads, less ideal in the earliest experimentation phase.

Decision Trigger:
Choose Reserved if you want to run 8×H100 fine-tuning jobs regularly (weekly/monthly) or as part of a mission-critical pipeline, and you prioritize capacity guarantees, lower effective hourly cost, and dedicated support over full flexibility.


3. Hybrid: On-Demand now, Reserved once usage is clear (Best for evolving workloads)

A hybrid approach stands out for many teams because it matches how usage usually evolves:

  1. You don’t yet know if 8×H100 for 72 hours will be your final configuration.
  2. You don’t yet know your exact cadence—once a month, once a week, or just once.
  3. You still want to move fast and avoid grinding on procurement from day one.

What it does well:

  • Start with On-Demand:

    • Run your first 72-hour fine-tune on 8×H100 using On-Demand (~$1,375–$1,400).
    • Use this to validate:
      • Is 8 GPUs enough?
      • Do you need 16?
      • Is 72 hours realistic, or do you often need 96?
    • You gather real numbers instead of guessing.
  • Move to Reserved once patterns stabilize:

    • Once you know your typical GPU count and runtime, talk to VESSL about a Reserved block sized to that pattern.
    • At that point, discounted hourly cost and capacity guarantee directly reflect your actual usage, not assumptions.

Tradeoffs & Limitations:

  • Two-step operational path:
    • You don’t get Reserved discounts on your first few runs.
    • You’ll need a quick internal loop to approve moving that known workload into a Reserved term once it’s stable.

Decision Trigger:
Choose this Hybrid path if you want to start fine-tuning immediately with minimal friction, and you expect to scale into a recurring post-training workload that will later benefit from Reserved capacity and discounts.
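The three decision triggers above condense into a rough rule of thumb. The threshold here (roughly monthly runs or more) is our reading of the guidance in this article, not a VESSL policy:

```python
def pick_pricing_tier(runs_per_quarter: int, cadence_known: bool) -> str:
    """Rough rule of thumb from the decision triggers above.

    Threshold is illustrative: ~monthly or more frequent runs are
    where a 3-month Reserved term starts to make sense.
    """
    if not cadence_known:
        return "Hybrid: start On-Demand, reserve once usage stabilizes"
    if runs_per_quarter >= 3:  # roughly monthly or more often
        return "Reserved: guaranteed capacity + volume discounts"
    return "On-Demand: pay-as-you-go, no commitment"

print(pick_pricing_tier(runs_per_quarter=1, cadence_known=True))
# -> On-Demand: pay-as-you-go, no commitment
```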


Cost Summary Table: 8×H100 for 72 Hours

Using the H100 SXM On-Demand rate ($2.39/hr) plus illustrative Reserved discounts:

| Plan Type | Rate Assumption | Total GPU-Hours (8×H100, 72h) | Estimated Total Cost | Notes |
|-----------|-----------------|-------------------------------|----------------------|-------|
| On-Demand | $2.39/hr | 576 | $1,376.64 | Published rate; pay-as-you-go; automatic failover |
| Reserved – mild discount | ~$1.91/hr (~20% off) | 576 | ≈$1,100 | Illustrative; exact discounts via sales |
| Reserved – moderate discount | ~$1.67/hr (~30% off) | 576 | ≈$962 | Better fit for recurring workloads |
| Reserved – aggressive discount | ~$1.43/hr (~40% off) | 576 | ≈$824 | Example upper bound with strong commitment/volume |

Use this as a planning range; the real Reserved numbers will depend on:

  • Contract length (terms start at 3 months)
  • Number of GPUs and SKUs (H100 only vs mixed A100/H100/B200)
  • Expected utilization (how often you keep the GPUs busy)

How this plays out for real LLM fine-tuning

When you plot this against real workloads:

  • Single experiment, 8×H100, 72 hours:

    • On-Demand: budget ~$1.4k for compute.
    • You get high availability, monitoring, and you’re done.
  • Monthly fine-tune, 8×H100, 72–96 hours:

    • On-Demand: ~$1.4k–$1.8k per run, every month.
    • After a few months, Reserved starts to look materially cheaper.
  • Weekly retraining or multiple concurrent fine-tunes:

    • On-Demand will work, but you’re leaving money and reliability on the table.
    • Reserved with guaranteed blocks of H100s is usually the right default.
    • You get both cost efficiency and lower job-wrangling overhead—no hunting for slots or juggling different clouds.
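To see when Reserved "starts to look materially cheaper," you can annualize the per-run savings by cadence. This sketch ignores commitment minimums and utilization, and the ~30% discount is an illustrative assumption, so treat it as a planning aid rather than a quote:

```python
GPU_HOURS_PER_RUN = 8 * 72           # 576
ON_DEMAND = 2.39                     # USD/hr, published
RESERVED = round(ON_DEMAND * 0.7, 2)  # ~30% off, illustrative -> $1.67/hr

savings_per_run = GPU_HOURS_PER_RUN * (ON_DEMAND - RESERVED)  # ~$415/run

for label, runs_per_year in (("monthly", 12), ("weekly", 52)):
    annual = savings_per_run * runs_per_year
    print(f"{label} cadence: ~${annual:,.0f}/yr saved at ~30% off")
```

At a weekly cadence the annualized delta runs well into five figures, which is why Reserved becomes the default once the pipeline is recurring.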

Final Verdict

For a single 72-hour fine-tune on 8×H100, On-Demand is the practical answer on VESSL AI: you’re looking at roughly $1,375–$1,400 in GPU cost, with automatic failover and no commitment.

If you’ll be repeating this fine-tune regularly—weekly or monthly—or scaling it out to more GPUs or longer runs, Reserved capacity becomes the better strategic choice. With realistic volume discounts in the 20–40% range, the effective cost of that same 72-hour run can drop into the $825–$1,100 band while locking in guaranteed capacity and dedicated support.

The clean way to approach it:

  • First run or two: Use On-Demand, measure GPU-hours and cadence.
  • Once stable: Move to Reserved, size the block to your real workload, and treat H100s as a constant, reliable input—rather than a quota problem you fight every time you retrain.
