
VESSL AI: estimate cost to fine-tune an LLM on 8×H100 for 72 hours (on-demand vs reserved)
Most teams don’t struggle with the modeling math. They struggle with the bill. If you’re planning to fine-tune an LLM on 8×H100 for 72 hours, you want a clear, SKU-level estimate for VESSL AI—and you want to know whether On-Demand or Reserved makes more sense.
Quick Answer: For a one-off or infrequent 72-hour fine-tuning run on 8×H100, On-Demand is the best overall choice (roughly $1,375–$1,400 at the published rate). If you run fine-tunes regularly, or the pipeline is mission-critical, Reserved is the stronger fit: guaranteed capacity, volume discounts, and dedicated support.
At-a-Glance Comparison
Below, we assume the published On-Demand rate for H100 SXM 80GB: $2.39/hr per GPU.
Reserved pricing is “Contact Sales,” so we’ll use a realistic illustrative discount range (up to ~40%) to bracket costs. Always check the VESSL pricing page or talk to sales for exact Reserved quotes.
| Rank | Option | Best For | Primary Strength | Watch Out For |
|---|---|---|---|---|
| 1 | On-Demand | Single or occasional 72-hour fine-tuning runs | Simple, pay-as-you-go with failover | No long-term discount; pay list price per hour |
| 2 | Reserved (12+ weeks of regular runs) | Teams running repeated or scheduled fine-tuning | Lower effective hourly cost, guaranteed capacity | Requires commitment; need to size capacity correctly |
| 3 | Hybrid (On-Demand now, move to Reserved later) | Teams still validating workload size & cadence | Start immediately, then lock in discounts once usage is clear | Two-step process; discounts only apply after you reserve |
Comparison Criteria
We evaluated On-Demand vs Reserved for this scenario against three concrete criteria:
- Total Cost for 72 Hours on 8×H100: What you actually pay to run a single 72-hour fine-tuning job on 8 H100 SXM GPUs.
- Capacity Reliability & Scheduling Risk: How confident you can be that 8×H100 will be available when your run needs to start, and how resilient your job is to provider or region issues.
- Fit for Your Workload Pattern: Whether you're doing a one-off experiment, periodic post-training, or a recurring production pipeline, and how that meshes with each pricing tier.
Step-by-Step Cost Estimate
Base assumption: H100 SXM 80GB pricing
From VESSL AI’s published pricing (USD):
- H100 SXM 80GB On-Demand: $2.39/hr per GPU
We’ll estimate what it costs to run:
- GPU count: 8 H100 SXM
- Runtime: 72 hours continuous
- Total GPU-hours: 8 GPUs × 72 hours = 576 GPU-hours
On-Demand cost estimate for 8×H100, 72 hours
Formula:
- Cost = GPU-hours × On-Demand rate
Calculation:
- 576 GPU-hours × $2.39/hr = $1,376.64
You can treat this as:
- Approximate On-Demand cost: $1,375–$1,400
- Includes:
- Access to 8×H100 SXM 80GB
- Automatic failover across providers/regions
- Real-time monitoring
- No commitment; pay only for the 72 hours you actually run
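The arithmetic above is simple enough to script. A minimal sketch of the same On-Demand estimate (the $2.39/hr figure is VESSL's published H100 SXM 80GB On-Demand rate):

```python
GPUS = 8
HOURS = 72
ON_DEMAND_RATE = 2.39  # USD per GPU-hour, H100 SXM 80GB On-Demand

gpu_hours = GPUS * HOURS            # 8 x 72 = 576 GPU-hours
cost = gpu_hours * ON_DEMAND_RATE   # total compute cost for the run

print(f"{gpu_hours} GPU-hours x ${ON_DEMAND_RATE}/hr = ${cost:,.2f}")
# 576 GPU-hours x $2.39/hr = $1,376.64
```

Swap in your own GPU count and runtime to re-bracket the estimate for larger or longer runs.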
Reserved cost estimate for 8×H100, 72 hours (illustrative)
Reserved pricing on VESSL is “Contact Sales,” but documentation states:
- Guaranteed capacity
- Volume discounts
- Terms starting at 3 months
- Discounts up to ~40% depending on commitment and volume
To estimate the cost of a single 72-hour run under a Reserved plan, we’ll model a few possible discount levels. These are example figures, not official quotes.
Example 1: ~20% Effective Discount
- Approx. Reserved rate: $2.39 × 0.8 ≈ $1.91/hr per H100
Cost:
- 576 GPU-hours × $1.91/hr ≈ $1,100
So at ~20% discount:
- Estimated Reserved cost for this run: ~$1,100
- Estimated savings vs On-Demand: ~$275–$300 for this single job
Example 2: ~30% Effective Discount
- Approx. Reserved rate: $2.39 × 0.7 ≈ $1.67/hr per H100
Cost:
- 576 GPU-hours × $1.67/hr ≈ $962.00
At ~30% discount:
- Estimated Reserved cost: ~$950–$975
- Savings vs On-Demand: ~$400–$425
Example 3: ~40% Effective Discount (upper bound case)
- Approx. Reserved rate: $2.39 × 0.6 ≈ $1.43/hr per H100
Cost:
- 576 GPU-hours × $1.43/hr ≈ $824.00
At ~40% discount:
- Estimated Reserved cost: ~$825–$850
- Savings vs On-Demand: ~$525–$550
Again: these Reserved rates are illustrative to show how commitment changes the economics. You’ll need to speak to the VESSL team for your exact rate card.
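The three scenarios above can be generated with one loop. The 20–40% discounts are illustrative assumptions, not VESSL quotes, and the totals differ from the examples above by a couple of dollars because the hourly rate isn't rounded before multiplying:

```python
ON_DEMAND_RATE = 2.39   # USD per GPU-hour (H100 SXM 80GB, published)
GPU_HOURS = 8 * 72      # 576 GPU-hours for the 72-hour run

on_demand_cost = GPU_HOURS * ON_DEMAND_RATE
for discount in (0.20, 0.30, 0.40):          # illustrative Reserved discounts
    rate = ON_DEMAND_RATE * (1 - discount)   # effective Reserved hourly rate
    cost = GPU_HOURS * rate
    savings = on_demand_cost - cost
    print(f"{discount:.0%} off: ${rate:.2f}/hr -> "
          f"${cost:,.0f} total (saves ${savings:,.0f} vs On-Demand)")
```

Adjust the discount tuple once you have an actual quote from sales.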
Detailed Breakdown
1. On-Demand (Best overall for one-off 72-hour fine-tuning)
On-Demand ranks as the top choice here because it gives you reliable H100 capacity with automatic failover and no commitment, which is ideal when you just need to run a 72-hour fine-tuning job and move on.
What it does well:
- Simple, predictable pricing:
  - You know the H100 SXM rate up front: $2.39/hr per GPU.
  - For 8 GPUs over 72 hours, you're looking at ~$1,375–$1,400 all-in on compute.
- High availability with Auto Failover:
  - VESSL's On-Demand tier is built for production-style reliability.
  - Auto Failover can switch providers or regions under the hood if one fails, keeping your 72-hour run alive.
  - You avoid wasting time re-queuing or micro-managing cloud-specific outages.
- Zero long-term lock-in:
  - No minimum term.
  - Perfect if you're still experimenting with model sizes, sequence lengths, or don't yet know your exact post-training schedule.
Tradeoffs & Limitations:
- No built-in volume discount:
  - You pay the published hourly rate for each GPU-hour.
  - As your usage grows (e.g., weekly 72-hour runs on 8–16×H100), the lack of a discount becomes obvious compared to Reserved pricing.
Decision Trigger:
Choose On-Demand if you want to fine-tune an LLM on 8×H100 for 72 hours once or occasionally, and you prioritize instant access, automatic failover, and pay-as-you-go simplicity over long-term discounts.
2. Reserved (Best for repeated or scheduled fine-tuning)
Reserved is the strongest fit when you know you’ll run multiple fine-tuning jobs—not just one—and you don’t want your experiments or pipelines to depend on “hoping 8×H100 is free.”
Reserved takes the same underlying GPU capacity and wraps it with capacity guarantees, pricing discounts, and dedicated support.
What it does well:
- Guaranteed capacity for 8×H100 (or more):
  - You reserve a block of H100s for your team.
  - When it's time to kick off a 72-hour post-training job, those GPUs are yours, not subject to quota surprises.
  - This matters for time-sensitive workloads: deadlines, paper submissions, or production retraining windows.
- Lower effective hourly cost:
  - VESSL offers volume discounts under Reserved.
  - With realistic discounts in the 20–40% range, that same 72-hour job can drop from ~$1,375–$1,400 to somewhere in the $825–$1,100 range.
  - Over a quarter or year, the savings stack up.
- Better fit for recurring pipelines:
  - If you retrain every week or every release cycle, Reserved aligns with how you actually operate.
  - You minimize "job wrangling" around capacity and run scheduling; you get more "fire-and-forget" execution.
Tradeoffs & Limitations:
- Requires a term commitment:
  - Reserved terms start at 3 months.
  - If you only need one 72-hour run, ever, it's usually not worth negotiating a Reserved block just for that.
- You must estimate your capacity needs:
  - Under-reserve and you still hit limits.
  - Over-reserve and you pay for unused headroom.
  - This is fine once you know your steady-state workloads, less ideal in the earliest experimentation phase.
Decision Trigger:
Choose Reserved if you want to run 8×H100 fine-tuning jobs regularly (weekly/monthly) or as part of a mission-critical pipeline, and you prioritize capacity guarantees, lower effective hourly cost, and dedicated support over full flexibility.
3. Hybrid: On-Demand now, Reserved once usage is clear (Best for evolving workloads)
A hybrid approach stands out for many teams because it matches how usage usually evolves:
- You don’t yet know if 8×H100 for 72 hours will be your final configuration.
- You don’t yet know your exact cadence—once a month, once a week, or just once.
- You still want to move fast and avoid grinding on procurement from day one.
What it does well:
- Start with On-Demand:
  - Run your first 72-hour fine-tune on 8×H100 using On-Demand (~$1,375–$1,400).
  - Use this to validate:
    - Are 8 GPUs enough, or do you need 16?
    - Is 72 hours realistic, or do you often need 96?
  - You gather real numbers instead of guessing.
- Move to Reserved once patterns stabilize:
  - Once you know your typical GPU count and runtime, talk to VESSL about a Reserved block sized to that pattern.
  - At that point, discounted hourly cost and capacity guarantees directly reflect your actual usage, not assumptions.
Tradeoffs & Limitations:
- Two-step operational path:
  - You don't get Reserved discounts on your first few runs.
  - You'll need a quick internal approval loop to move that known workload into a Reserved term once it's stable.
Decision Trigger:
Choose this Hybrid path if you want to start fine-tuning immediately with minimal friction, and you expect to scale into a recurring post-training workload that will later benefit from Reserved capacity and discounts.
Cost Summary Table: 8×H100 for 72 Hours
Using the H100 SXM On-Demand rate ($2.39/hr) plus illustrative Reserved discounts:
| Plan Type | Rate Assumption | Total GPU-Hours (8×H100, 72h) | Estimated Total Cost | Notes |
|---|---|---|---|---|
| On-Demand | $2.39/hr | 576 | $1,376.64 | Published rate; pay-as-you-go; automatic failover |
| Reserved – mild discount | ~$1.91/hr (~20% off) | 576 | ≈$1,099 | Illustrative; exact discounts via sales |
| Reserved – moderate discount | ~$1.67/hr (~30% off) | 576 | ≈$962 | Better fit for recurring workloads |
| Reserved – aggressive discount | ~$1.43/hr (~40% off) | 576 | ≈$824 | Example upper bound with strong commitment/volume |
Use this as a planning range; the real Reserved numbers will depend on:
- Contract length (terms start at 3 months)
- Number of GPUs and SKUs (H100 only vs mixed A100/H100/B200)
- Expected utilization (how often you keep the GPUs busy)
How this plays out for real LLM fine-tuning
When you plot this against real workloads:
- Single experiment, 8×H100, 72 hours:
  - On-Demand: budget ~$1.4k for compute.
  - You get high availability, monitoring, and you're done.
- Monthly fine-tune, 8×H100, 72–96 hours:
  - On-Demand: ~$1.4k–$1.8k per run, every month.
  - After a few months, Reserved starts to look materially cheaper.
- Weekly retraining or multiple concurrent fine-tunes:
  - On-Demand will work, but you're leaving money and reliability on the table.
  - Reserved with guaranteed blocks of H100s is usually the right default.
  - You get both cost efficiency and lower job-wrangling overhead: no hunting for slots or juggling different clouds.
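A rough annualized view of those cadences, under the same illustrative assumptions (compute cost only; this sketch ignores Reserved minimum terms, utilization commitments, and any costs beyond GPU-hours):

```python
# One 72-hour run on 8xH100 at the published On-Demand rate
RUN_COST_OD = 8 * 72 * 2.39  # = $1,376.64

cadences = ((1, "one-off"), (12, "monthly"), (52, "weekly"))
for runs_per_year, label in cadences:
    od = runs_per_year * RUN_COST_OD
    res_low = od * 0.60    # ~40% illustrative Reserved discount
    res_high = od * 0.80   # ~20% illustrative Reserved discount
    print(f"{label:>7}: On-Demand ${od:,.0f}/yr "
          f"vs Reserved ~${res_low:,.0f}-${res_high:,.0f}/yr")
```

The gap is negligible for a one-off but grows to tens of thousands of dollars per year at weekly cadence, which is why the Reserved conversation becomes worthwhile once the pattern is stable.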
Final Verdict
For a single 72-hour fine-tune on 8×H100, On-Demand is the practical answer on VESSL AI: you’re looking at roughly $1,375–$1,400 in GPU cost, with automatic failover and no commitment.
If you’ll be repeating this fine-tune regularly—weekly or monthly—or scaling it out to more GPUs or longer runs, Reserved capacity becomes the better strategic choice. With realistic volume discounts in the 20–40% range, the effective cost of that same 72-hour run can drop into the $825–$1,100 band while locking in guaranteed capacity and dedicated support.
The clean way to approach it:
- First run or two: Use On-Demand, measure GPU-hours and cadence.
- Once stable: Move to Reserved, size the block to your real workload, and treat H100s as a constant, reliable input—rather than a quota problem you fight every time you retrain.