VESSL AI: estimate cost to fine-tune an LLM on 8×H100 for 72 hours (on-demand vs reserved)

Fine-tuning an LLM on 8×H100 for 72 hours is a serious run: you’re locking up 8 top-tier GPUs for three straight days. With VESSL AI, you can get a clean estimate ahead of time and decide whether On-Demand or Reserved makes more sense for this workload.

Below, I’ll walk through the math, compare On-Demand vs Reserved at a realistic discount, and give some guidance on when to step up to Reserved capacity for repeated LLM post-training jobs.


Quick cost estimate: 8×H100 for 72 hours on VESSL AI

VESSL AI’s published On-Demand rate for H100 SXM 80GB is:

  • $2.39 per GPU-hour (On-Demand)

For 8 GPUs over 72 hours:

  • Total GPU-hours = 8 GPUs × 72 hours = 576 GPU-hours
  • On-Demand cost = 576 × $2.39 ≈ $1,376.64

So if you spin up an 8×H100 cluster and leave it running for a 72-hour fine-tune, expect roughly:

  • ~$1,380 On-Demand (before any Reserved discounts)
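The arithmetic above is easy to script if you want to vary the GPU count or run length. Here is a minimal sketch, assuming the $2.39 On-Demand rate quoted above. `run_cost` is a hypothetical helper for illustration, not part of any VESSL tooling.

```python
# Hypothetical helper for estimating a run's GPU bill; not a VESSL API.
ON_DEMAND_RATE_USD = 2.39  # per GPU-hour, H100 SXM 80GB (rate quoted in this article)


def run_cost(gpus: int, hours: float, rate_per_gpu_hour: float) -> float:
    """Return the total cost in USD, rounded to cents."""
    gpu_hours = gpus * hours  # e.g. 8 GPUs x 72 h = 576 GPU-hours
    return round(gpu_hours * rate_per_gpu_hour, 2)


cost = run_cost(gpus=8, hours=72, rate_per_gpu_hour=ON_DEMAND_RATE_USD)
print(f"8 x H100 for 72 h: ${cost:,.2f}")
```

Swapping in a different rate, GPU count, or duration gives the same kind of estimate for other run shapes.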

Reserved pricing is listed as “Contact Sales” because it depends on commitment and volume, but VESSL advertises “volume discounts” and Reserved discounts of “up to 40%.” That range is our basis for a realistic comparison.


At-a-glance comparison: On-Demand vs Reserved for this run

| Rank | Option | Best For | Primary Strength | Watch Out For |
|------|--------|----------|------------------|---------------|
| 1 | Reserved | Repeated LLM fine-tuning & production | Lower effective $/GPU-hr with guarantees | Requires commitment + sales conversation |
| 2 | On-Demand | One-off or exploratory fine-tunes | No commitment, automatic failover | Higher cost per hour over time |
| 3 | Spot | Non-critical experiments, ablations | Lowest possible cost if/when available | Preemptions; not ideal for 72h uninterrupted |

For this specific question—a continuous 72-hour fine-tune on 8×H100—Spot is not appropriate because preemptions can trash long runs. The real decision is On-Demand vs Reserved.


Comparison criteria

To keep the comparison grounded in how teams actually run LLM post-training, we’ll use three criteria:

  1. Effective cost per GPU-hour:
    How much you really pay over the full 72-hour window, factoring in discounts.

  2. Reliability & continuity:
    Whether the run can survive provider or regional issues without manual “job wrangling.”

  3. Commitment & flexibility:
    Whether you’re locked into capacity (and for how long) versus spinning up as needed.


1. On-Demand (Best for one-off or exploratory 72-hour runs)

On-Demand on VESSL AI is designed for production workloads that need reliability without long-term commitment.

For 8×H100 over 72 hours:

  • Hourly rate (per GPU): $2.39
  • Total GPU-hours: 576
  • Estimated cost: 576 × 2.39 ≈ $1,376.64

You can think of this as:

  • ~$458.88 per day for an 8×H100 node
    (8 GPUs × 24 hours × $2.39 ≈ $458.88)

What On-Demand does well

  • High availability with failover:
    On-Demand is explicitly “Reliable with failover.” If a provider or region has issues, VESSL’s Auto Failover can move the workload to another provider/region without you rewriting everything.

  • No commitment required:
    You request 8×H100, run your 72 hours of fine-tuning, then shut down. No multi-month contract, no capacity planning meeting.

  • Operational simplicity:

    • Use the Web Console for visual cluster management
    • Or the CLI (vessl run) for scripted jobs
    • Real-time monitoring is built in, so you can “fire-and-forget” instead of staring at dashboards all weekend.

Tradeoffs & limitations

  • Higher per-hour cost vs Reserved:
    Over a single 72-hour run, $1.3k–$1.4k is fine. But if you do this repeatedly (weekly or multiple times per month), the delta vs Reserved accumulates fast.

  • No explicit capacity guarantee:
    On-Demand is reliable with failover, but you’re still drawing from shared capacity. During peak demand or for large clusters (e.g., 32–64 H100s), you may still want the certainty of a Reserved commitment.

On-Demand decision trigger

Choose On-Demand for this workload if:

  • This is a one-off or occasional 72-hour fine-tune on 8×H100 (e.g., a single model variant, a POC, or a quarterly retrain).
  • You need automatic failover and real-time monitoring, but you’re not yet sure how often you’ll repeat the run.
  • You want to avoid up-front commitments, especially if the architecture or GPU profile you’ll use long term isn’t finalized.

For the exact run described, your ballpark is ~$1,380 On-Demand.


2. Reserved (Best for repeated fine-tunes and production extensions)

Reserved on VESSL AI is for mission-critical AI with:

  • Capacity guarantee
  • Volume discounts
  • Dedicated support

Reserved plans:

  • Include guaranteed capacity for the GPU SKU you care about (here, H100 SXM),
  • Offer discounts of up to 40% relative to On-Demand (per VESSL’s general guidance),
  • Start at 3-month terms and scale with your needs.

To model cost, we’ll use two realistic discount scenarios off the $2.39/hr On-Demand rate:

  1. Moderate discount: 20% off On-Demand
    Effective $/GPU-hr ≈ $1.91

  2. Aggressive discount: 40% off On-Demand
    Effective $/GPU-hr ≈ $1.43

Note: These are illustrative to show scale; actual discounts for H100 will be set during a sales conversation based on volume and term length.

Scenario A: 20% Reserved discount

  • Effective rate: $2.39 × 0.8 = $1.912/hr per H100
  • Total cost for 8×H100 × 72 hours:
    • 576 GPU-hours × $1.912 ≈ $1,101.31

Versus On-Demand:

  • On-Demand: ≈ $1,376.64
  • Reserved (20% off): ≈ $1,101.31
  • Savings for one 72-hour run: ≈ $275

Scenario B: 40% Reserved discount

  • Effective rate: $2.39 × 0.6 = $1.434/hr per H100
  • Total cost for 8×H100 × 72 hours:
    • 576 GPU-hours × $1.434 ≈ $825.98

Versus On-Demand:

  • On-Demand: ≈ $1,376.64
  • Reserved (40% off): ≈ $825.98
  • Savings for one 72-hour run: ≈ $551
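Both scenarios can be recomputed directly from the On-Demand rate, which is handy once sales quotes you an actual discount. This is a minimal sketch. The 20% and 40% discounts are the illustrative values used in this article, not quoted prices, and `reserved_scenario` is a hypothetical helper.

```python
ON_DEMAND_RATE_USD = 2.39  # per GPU-hour, from this article
GPU_HOURS = 8 * 72         # 576 GPU-hours for the run discussed here


def reserved_scenario(discount: float) -> dict:
    """Compare one run at On-Demand vs a Reserved rate `discount` (0.0-1.0) lower."""
    on_demand_cost = GPU_HOURS * ON_DEMAND_RATE_USD
    effective_rate = ON_DEMAND_RATE_USD * (1 - discount)
    reserved_cost = GPU_HOURS * effective_rate
    return {
        "effective_rate": round(effective_rate, 3),
        "reserved_cost": round(reserved_cost, 2),
        "savings": round(on_demand_cost - reserved_cost, 2),
    }


for discount in (0.20, 0.40):  # illustrative discounts, not quoted prices
    s = reserved_scenario(discount)
    print(f"{discount:.0%} off: ${s['effective_rate']}/GPU-hr, "
          f"run ${s['reserved_cost']:,.2f}, saves ${s['savings']:,.2f}")
```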

So if you’re running this exact workload repeatedly—say weekly:

  • Weekly On-Demand: ~$1,377
  • Weekly Reserved (20%): ~$1,101 (save ~$1.1k/month at four runs per month)
  • Weekly Reserved (40%): ~$826 (save ~$2.2k/month at four runs per month)

And that’s just one 8-GPU job; many teams do this across multiple models or branches.
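To see how the per-run savings accumulate, here is a small sketch that scales them to a month and to a 3-month term. It assumes four runs per month and that the reserved rate is applied only to the GPU-hours each run actually uses. Both are assumptions for illustration, and `recurring_savings` is a hypothetical helper.

```python
PER_RUN_ON_DEMAND_USD = 8 * 72 * 2.39  # 576 GPU-hours at the article's On-Demand rate


def recurring_savings(discount: float, runs_per_month: int = 4, months: int = 3) -> dict:
    """Savings vs On-Demand for a repeated 8xH100, 72-hour run.

    ASSUMPTION: the discount applies per GPU-hour consumed; idle reserved
    hours are not modeled here.
    """
    per_run = PER_RUN_ON_DEMAND_USD * discount
    monthly = per_run * runs_per_month
    return {
        "per_run": round(per_run, 2),
        "monthly": round(monthly, 2),
        "term": round(monthly * months, 2),
    }


for discount in (0.20, 0.40):  # illustrative discounts, not quoted prices
    r = recurring_savings(discount)
    print(f"{discount:.0%} off: ${r['per_run']:,.2f}/run, "
          f"${r['monthly']:,.2f}/month, ${r['term']:,.2f} over 3 months")
```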

What Reserved does well

  • Capacity guarantee for H100 clusters:
    You get a guaranteed pool of H100 SXM capacity. For LLM post-training, that means:

    • No scrambling for GPUs when a new dataset or architecture lands
    • No “quota ceiling” surprises mid-quarter
  • Lower effective $/GPU-hour at scale:
    Once you’re running multi-day jobs repeatedly, the discounts more than pay for the commitment, especially when you’re in the H100/B200 class.

  • Dedicated support + enterprise posture:

    • Dedicated support for debugging and scheduling
    • SOC 2 Type II and ISO 27001 for compliance comfort
    • SLA-style conversations, custom integrations, and multi-cloud setup with your infra/security teams

Tradeoffs & limitations

  • Requires commitment (terms from 3 months):
    Reserved only makes sense if:

    • You know you’ll run enough 8×H100 workloads (or larger) to justify it.
    • You’re ready to lock in capacity for at least a quarter.
  • Sales conversation required:
    You’ll need to talk to VESSL’s team to:

    • Finalize exact discount percentages
    • Nail down cluster shape (8, 16, 32 H100s?), providers, and regions
    • Align on SLAs and support expectations
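One way to sanity-check “enough workloads to justify it” is a break-even estimate. The sketch below makes an assumption that VESSL’s public pricing does not state: that reserved GPUs are billed for every hour of the term at the discounted rate, whether or not a job is running. If Reserved is instead billed per GPU-hour used, the per-run comparison above applies directly. `break_even_runs` is a hypothetical helper.

```python
import math

ON_DEMAND_RATE_USD = 2.39  # per GPU-hour, from this article


def break_even_runs(discount: float, term_days: int = 90, run_hours: int = 72) -> int:
    """Minimum runs per term for On-Demand spend to reach the reservation's cost.

    ASSUMPTION: the reservation is billed for every hour of the term at the
    discounted rate, used or not. GPU count cancels out of the ratio, so the
    result holds for any cluster size that matches the reservation.
    """
    term_hours = term_days * 24
    reserved_cost_per_gpu = term_hours * ON_DEMAND_RATE_USD * (1 - discount)
    on_demand_cost_per_gpu_run = run_hours * ON_DEMAND_RATE_USD
    # Round before ceil so float noise cannot push an exact ratio up by one.
    return math.ceil(round(reserved_cost_per_gpu / on_demand_cost_per_gpu_run, 9))


for discount in (0.20, 0.40):  # illustrative discounts, not quoted prices
    runs = break_even_runs(discount)
    print(f"{discount:.0%} off: break-even at {runs} runs per 90 days "
          f"(~{1 - discount:.0%} utilization)")
```

Under this billing assumption, the break-even utilization is simply one minus the discount, so the billing model is worth confirming in the sales conversation before comparing per-run numbers.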

Reserved decision trigger

Move to Reserved for this workload if:

  • You’re running 8×H100 fine-tunes regularly—e.g., weekly/biweekly/multiple models.
  • You care about capacity guarantees during launches, eval cycles, or academic deadlines.
  • You want to push unit cost down while still keeping VESSL’s reliability stack:
    • Auto Failover
    • Multi-Cluster visibility
    • Real-time monitoring
    • Dedicated support

For a single 72-hour run, Reserved is optional. For ongoing LLM post-training, it’s usually the better fit.


3. Spot (Why it’s not ideal for a 72-hour continuous fine-tune)

VESSL also exposes Spot as an operational mode:

  • Best for: cheap experiments, short jobs, non-critical workloads
  • Key behavior: instances can be preempted

Spot pricing for H100 on VESSL is currently marked “Coming Soon” in the public table, but regardless of the eventual number, the tradeoff stands:

  • Pro: Lower cost per GPU-hour than On-Demand
  • Con: Your 72-hour run can be interrupted, forcing restarts/checkpoint restores
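To get a feel for what interruptions cost in time, here is a rough sketch of expected wall-clock duration for a preemptible run. All inputs are hypothetical. It assumes each preemption lands at a uniformly random point between two checkpoints and ignores any wait for replacement capacity, which can dominate in practice. `expected_wall_hours` is a hypothetical helper.

```python
def expected_wall_hours(run_hours: float, preemptions: int,
                        checkpoint_interval_h: float, restart_overhead_h: float) -> float:
    """Expected wall-clock hours for a preemptible run.

    ASSUMPTIONS: each preemption loses half a checkpoint interval of work on
    average, plus a fixed restart/restore overhead; time spent waiting for
    replacement capacity is not modeled.
    """
    lost_per_preemption = checkpoint_interval_h / 2 + restart_overhead_h
    return run_hours + preemptions * lost_per_preemption


# Hypothetical inputs: 3 preemptions, hourly checkpoints, 15 minutes to restore.
hours = expected_wall_hours(72, preemptions=3, checkpoint_interval_h=1.0,
                            restart_overhead_h=0.25)
print(f"Expected wall-clock time: {hours:.2f} h")
```

The output is only as good as the preemption count you plug in, which is exactly the number Spot does not let you control.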

For a 72-hour LLM fine-tune where you care about:

  • Stable throughput
  • No surprise preemptions
  • Minimal “job wrangling”

Spot is not the right tool. Use it for:

  • Architecture search
  • Ablations
  • Short-batch runs
  • Preprocessing

Once you’ve settled on a configuration, switch to On-Demand or Reserved for the 72-hour production-grade fine-tune.


Putting it together: cost vs reliability vs commitment

For 8×H100 over 72 hours on VESSL AI:

  • On-Demand (no commitment)

    • Rate: $2.39 per H100-hour
    • 72-hour 8×H100 run: ≈ $1,376.64
    • Best for: one-off runs, POCs, irregular experiments
  • Reserved (3+ month commitment, discounted)

    • Example 20% discount: ≈ $1,101 per 72-hour 8×H100 run
    • Example 40% discount: ≈ $826 per 72-hour 8×H100 run
    • Best for: regular fine-tunes, multiple model variants, production retraining
  • Spot (preemptible)

    • H100 Spot pricing is not yet listed (“Coming Soon”), but Spot is positioned as cheaper per GPU-hour than On-Demand
    • Not suitable for 72-hour uninterrupted fine-tunes you can’t afford to restart

Final verdict

If you just need to fine-tune an LLM once on 8×H100 for 72 hours, assume roughly:

  • ~$1,380 on VESSL AI On-Demand at the current H100 SXM rate of $2.39/hr per GPU.

If you expect to run this job (or larger) repeatedly, it’s worth moving to Reserved capacity:

  • You unlock meaningful discounts on H100 hours.
  • You get capacity guarantees and dedicated support layered on top of VESSL’s automatic failover and multi-cloud orchestration.
  • Over a quarter or year, the savings versus pure On-Demand stack up quickly.

Next step

Get Started with VESSL AI to:

  • Spin up an 8×H100 cluster via Web Console or CLI in minutes
  • Run your 72-hour fine-tune on On-Demand today
  • Talk to sales about Reserved H100 capacity if you’re planning recurring LLM post-training runs