VESSL AI: estimate cost to fine-tune an LLM on 8×H100 for 72 hours (on-demand vs reserved)

Most teams don’t struggle with the modeling math. They struggle with the bill. If you’re planning to fine-tune an LLM on 8×H100 for 72 hours, you want a clear, SKU-level estimate for VESSL AI—and you want to know whether On-Demand or Reserved makes more sense.

Quick Answer: For a one-off or infrequent 72-hour fine-tuning run on 8×H100, On-Demand is the best overall choice. If you run fine-tunes regularly and your priority is guaranteed capacity and long-term cost efficiency, Reserved is the stronger fit—especially for ongoing or mission-critical post-training pipelines, where its capacity guarantees and dedicated support matter most.


At-a-Glance Comparison

Below, we assume the published On-Demand rate for H100 SXM 80GB: $2.39/hr per GPU.
Reserved pricing is “Contact Sales,” so we’ll use a realistic illustrative discount range (up to ~40%) to bracket costs. Always check the VESSL pricing page or talk to sales for exact Reserved quotes.

| Rank | Option | Best For | Primary Strength | Watch Out For |
|------|--------|----------|------------------|---------------|
| 1 | On-Demand | Single or occasional 72-hour fine-tuning runs | Simple, pay-as-you-go with failover | No long-term discount; pay list price per hour |
| 2 | Reserved (12+ weeks of regular runs) | Teams running repeated or scheduled fine-tuning | Lower effective hourly cost, guaranteed capacity | Requires commitment; need to size capacity correctly |
| 3 | Hybrid (On-Demand now, move to Reserved later) | Teams still validating workload size & cadence | Start immediately, then lock in discounts once usage is clear | Two-step process; discounts only apply after you reserve |

Comparison Criteria

We evaluated On-Demand vs Reserved for this scenario against three concrete criteria:

  • Total Cost for 72 Hours on 8×H100:
    What you actually pay to run a single 72-hour fine-tuning job on 8 H100 SXM GPUs.

  • Capacity Reliability & Scheduling Risk:
    How confident you can be that 8×H100 will be available when your run needs to start, and how resilient your job is to provider or region issues.

  • Fit for Your Workload Pattern:
    Whether you’re doing a one-off experiment, periodic post-training, or a recurring production pipeline—and how that meshes with each pricing tier.


Step-by-Step Cost Estimate

Base assumption: H100 SXM 80GB pricing

From VESSL AI’s published pricing (USD):

  • H100 SXM 80GB On-Demand: $2.39/hr per GPU

We’ll estimate what it costs to run:

  • GPU count: 8 H100 SXM
  • Runtime: 72 hours continuous
  • Total GPU-hours: 8 GPUs × 72 hours = 576 GPU-hours

On-Demand cost estimate for 8×H100, 72 hours

Formula:

  • Cost = GPU-hours × On-Demand rate

Calculation:

  • 576 GPU-hours × $2.39/hr = $1,376.64
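The arithmetic above is simple enough to sketch in a few lines; the rate is VESSL's published On-Demand price, and everything else is plain multiplication:

```python
# On-Demand cost for a single fine-tuning run.
GPUS = 8                  # H100 SXM 80GB
HOURS = 72                # continuous runtime
RATE_PER_GPU_HOUR = 2.39  # published On-Demand rate, USD

gpu_hours = GPUS * HOURS              # 576 GPU-hours
cost = gpu_hours * RATE_PER_GPU_HOUR

print(f"{gpu_hours} GPU-hours x ${RATE_PER_GPU_HOUR}/hr = ${cost:,.2f}")
# -> 576 GPU-hours x $2.39/hr = $1,376.64
```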

You can treat this as:

  • Approximate On-Demand cost: $1,375–$1,400
  • Includes:
    • Access to 8×H100 SXM 80GB
    • Automatic failover across providers/regions
    • Real-time monitoring
    • No commitment; pay only for the 72 hours you actually run

Reserved cost estimate for 8×H100, 72 hours (illustrative)

Reserved pricing on VESSL is “Contact Sales,” but documentation states:

  • Reserved:
    • Guaranteed capacity
    • Volume discounts
    • Terms starting at 3 months
    • Discounts up to ~40% depending on commitment and volume

To estimate the cost of a single 72-hour run under a Reserved plan, we’ll model a few possible discount levels. These are example figures, not official quotes.

Example 1: ~20% Effective Discount

  • Approx. Reserved rate: $2.39 × 0.8 ≈ $1.91/hr per H100

Cost:

  • 576 GPU-hours × $1.91/hr ≈ $1,100.16

So at ~20% discount:

  • Estimated Reserved cost for this run: ~$1,100
  • Estimated savings vs On-Demand: ~$275–$300 for this single job

Example 2: ~30% Effective Discount

  • Approx. Reserved rate: $2.39 × 0.7 ≈ $1.67/hr per H100

Cost:

  • 576 GPU-hours × $1.67/hr ≈ $962.00

At ~30% discount:

  • Estimated Reserved cost: ~$950–$975
  • Savings vs On-Demand: ~$400–$425

Example 3: ~40% Effective Discount (upper bound case)

  • Approx. Reserved rate: $2.39 × 0.6 ≈ $1.43/hr per H100

Cost:

  • 576 GPU-hours × $1.43/hr ≈ $824.00

At ~40% discount:

  • Estimated Reserved cost: ~$825–$850
  • Savings vs On-Demand: ~$525–$550

Again: these Reserved rates are illustrative to show how commitment changes the economics. You’ll need to speak to the VESSL team for your exact rate card.
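The three scenarios can be reproduced with a short sketch. It rounds the discounted hourly rate to the cent first (as the figures above do) and then multiplies by GPU-hours; the discount levels are illustrative assumptions, not VESSL quotes:

```python
ON_DEMAND_RATE = 2.39   # USD/hr per H100 SXM, published
GPU_HOURS = 8 * 72      # 576

on_demand_cost = GPU_HOURS * ON_DEMAND_RATE

# Illustrative discount levels -- actual Reserved rates come from sales.
for discount in (0.20, 0.30, 0.40):
    rate = round(ON_DEMAND_RATE * (1 - discount), 2)  # e.g. 30% off -> $1.67/hr
    cost = GPU_HOURS * rate
    savings = on_demand_cost - cost
    print(f"~{discount:.0%} off: ${rate}/hr -> ${cost:,.2f} "
          f"(saves ~${savings:,.0f} vs On-Demand)")
```

Rounding the hourly rate before multiplying is why the 30% and 40% rows land on $961.92 and $823.68 rather than exact percentages of the On-Demand total.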


Detailed Breakdown

1. On-Demand (Best overall for one-off 72-hour fine-tuning)

On-Demand ranks as the top choice here because it gives you reliable H100 capacity with automatic failover and no commitment, which is ideal when you just need to run a 72-hour fine-tuning job and move on.

What it does well:

  • Simple, predictable pricing:
    You know the H100 SXM rate up front: $2.39/hr per GPU.
    For 8 GPUs over 72 hours, you’re looking at ~$1,375–$1,400 all-in on compute.

  • High availability with Auto Failover:

    • VESSL’s On-Demand tier is built for production-style reliability.
    • Auto Failover can switch providers or regions under the hood if one fails, keeping your 72-hour run alive.
    • You avoid wasting time re-queuing or micro-managing cloud-specific outages.
  • Zero long-term lock-in:

    • No minimum term.
    • Perfect if you’re still experimenting with model sizes, sequence lengths, or don’t yet know your exact post-training schedule.

Tradeoffs & Limitations:

  • No built-in volume discount:
    • You pay the published hourly rate for each GPU-hour.
    • As your usage grows (e.g., weekly 72-hour runs on 8–16×H100), the lack of a discount becomes obvious compared to Reserved pricing.

Decision Trigger:
Choose On-Demand if you want to fine-tune an LLM on 8×H100 for 72 hours once or occasionally, and you prioritize instant access, automatic failover, and pay-as-you-go simplicity over long-term discounts.


2. Reserved (Best for repeated or scheduled fine-tuning)

Reserved is the strongest fit when you know you’ll run multiple fine-tuning jobs—not just one—and you don’t want your experiments or pipelines to depend on “hoping 8×H100 is free.”

Reserved takes the same underlying GPU capacity and wraps it with capacity guarantees, pricing discounts, and dedicated support.

What it does well:

  • Guaranteed capacity for 8×H100 (or more):

    • You reserve a block of H100s for your team.
    • When it’s time to kick off a 72-hour post-training job, those GPUs are yours, not subject to quota surprises.
    • This matters for time-sensitive workloads—deadlines, paper submissions, or production retraining windows.
  • Lower effective hourly cost:

    • VESSL offers volume discounts under Reserved.
    • With realistic discounts in the 20–40% range, that same 72-hour job can drop from ~$1,375–$1,400 to somewhere in the $825–$1,100 range.
    • Over a quarter or year, the savings stack up.
  • Better fit for recurring pipelines:

    • If you retrain every week or every release cycle, Reserved aligns with how you actually operate.
    • You minimize “job wrangling” around capacity and run scheduling; you get more “fire-and-forget” execution.

Tradeoffs & Limitations:

  • Requires a term commitment:

    • Reserved terms start at 3 months.
    • If you only need one 72-hour run, ever, it’s usually not worth negotiating a Reserved block just for that.
  • You must estimate your capacity needs:

    • Under-reserve and you still hit limits.
    • Over-reserve and you pay for unused headroom.
    • This is fine once you know your steady-state workloads, less ideal in the earliest experimentation phase.

Decision Trigger:
Choose Reserved if you want to run 8×H100 fine-tuning jobs regularly (weekly/monthly) or as part of a mission-critical pipeline, and you prioritize capacity guarantees, lower effective hourly cost, and dedicated support over full flexibility.


3. Hybrid: On-Demand now, Reserved once usage is clear (Best for evolving workloads)

A hybrid approach stands out for many teams because it matches how usage usually evolves:

  1. You don’t yet know if 8×H100 for 72 hours will be your final configuration.
  2. You don’t yet know your exact cadence—once a month, once a week, or just once.
  3. You still want to move fast and avoid grinding on procurement from day one.

What it does well:

  • Start with On-Demand:

    • Run your first 72-hour fine-tune on 8×H100 using On-Demand (~$1,375–$1,400).
    • Use this to validate:
      • Is 8 GPUs enough?
      • Do you need 16?
      • Is 72 hours realistic, or do you often need 96?
    • You gather real numbers instead of guessing.
  • Move to Reserved once patterns stabilize:

    • Once you know your typical GPU count and runtime, talk to VESSL about a Reserved block sized to that pattern.
    • At that point, discounted hourly cost and capacity guarantee directly reflect your actual usage, not assumptions.

Tradeoffs & Limitations:

  • Two-step operational path:
    • You don’t get Reserved discounts on your first few runs.
    • You’ll need a quick internal loop to approve moving that known workload into a Reserved term once it’s stable.

Decision Trigger:
Choose this Hybrid path if you want to start fine-tuning immediately with minimal friction, and you expect to scale into a recurring post-training workload that will later benefit from Reserved capacity and discounts.
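The three decision triggers above condense into a rough rule of thumb. The threshold here (roughly monthly runs or more) is our reading of the guidance in this article, not a VESSL policy:

```python
def pick_pricing_tier(runs_per_quarter: int, cadence_known: bool) -> str:
    """Rough rule of thumb from the decision triggers above.

    Threshold is illustrative: ~monthly or more frequent runs are
    where a 3-month Reserved term starts to make sense.
    """
    if not cadence_known:
        return "Hybrid: start On-Demand, reserve once usage stabilizes"
    if runs_per_quarter >= 3:  # roughly monthly or more often
        return "Reserved: guaranteed capacity + volume discounts"
    return "On-Demand: pay-as-you-go, no commitment"

print(pick_pricing_tier(runs_per_quarter=1, cadence_known=True))
# -> On-Demand: pay-as-you-go, no commitment
```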


Cost Summary Table: 8×H100 for 72 Hours

Using the H100 SXM On-Demand rate ($2.39/hr) plus illustrative Reserved discounts:

| Plan Type | Rate Assumption | Total GPU-Hours (8×H100, 72h) | Estimated Total Cost | Notes |
|-----------|-----------------|-------------------------------|----------------------|-------|
| On-Demand | $2.39/hr | 576 | $1,376.64 | Published rate; pay-as-you-go; automatic failover |
| Reserved – mild discount | ~$1.91/hr (~20% off) | 576 | ≈$1,100 | Illustrative; exact discounts via sales |
| Reserved – moderate discount | ~$1.67/hr (~30% off) | 576 | ≈$962 | Better fit for recurring workloads |
| Reserved – aggressive discount | ~$1.43/hr (~40% off) | 576 | ≈$824 | Example upper bound with strong commitment/volume |

Use this as a planning range; the real Reserved numbers will depend on:

  • Contract length (terms start at 3 months)
  • Number of GPUs and SKUs (H100 only vs mixed A100/H100/B200)
  • Expected utilization (how often you keep the GPUs busy)

How this plays out for real LLM fine-tuning

When you plot this against real workloads:

  • Single experiment, 8×H100, 72 hours:

    • On-Demand: budget ~$1.4k for compute.
    • You get high availability, monitoring, and you’re done.
  • Monthly fine-tune, 8×H100, 72–96 hours:

    • On-Demand: ~$1.4k–$1.8k per run, every month.
    • After a few months, Reserved starts to look materially cheaper.
  • Weekly retraining or multiple concurrent fine-tunes:

    • On-Demand will work, but you’re leaving money and reliability on the table.
    • Reserved with guaranteed blocks of H100s is usually the right default.
    • You get both cost efficiency and lower job-wrangling overhead—no hunting for slots or juggling different clouds.
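To see when Reserved "starts to look materially cheaper," you can annualize the per-run savings by cadence. This sketch ignores commitment minimums and utilization, and the ~30% discount is an illustrative assumption, so treat it as a planning aid rather than a quote:

```python
GPU_HOURS_PER_RUN = 8 * 72           # 576
ON_DEMAND = 2.39                     # USD/hr, published
RESERVED = round(ON_DEMAND * 0.7, 2)  # ~30% off, illustrative -> $1.67/hr

savings_per_run = GPU_HOURS_PER_RUN * (ON_DEMAND - RESERVED)  # ~$415/run

for label, runs_per_year in (("monthly", 12), ("weekly", 52)):
    annual = savings_per_run * runs_per_year
    print(f"{label} cadence: ~${annual:,.0f}/yr saved at ~30% off")
```

At a weekly cadence the annualized delta runs well into five figures, which is why Reserved becomes the default once the pipeline is recurring.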

Final Verdict

For a single 72-hour fine-tune on 8×H100, On-Demand is the practical answer on VESSL AI: you’re looking at roughly $1,375–$1,400 in GPU cost, with automatic failover and no commitment.

If you’ll be repeating this fine-tune regularly—weekly or monthly—or scaling it out to more GPUs or longer runs, Reserved capacity becomes the better strategic choice. With realistic volume discounts in the 20–40% range, the effective cost of that same 72-hour run can drop into the $825–$1,100 band while locking in guaranteed capacity and dedicated support.

The clean way to approach it:

  • First run or two: Use On-Demand, measure GPU-hours and cadence.
  • Once stable: Move to Reserved, size the block to your real workload, and treat H100s as a constant, reliable input—rather than a quota problem you fight every time you retrain.
