
VESSL AI reserved capacity: how do I request a 3-month+ commitment and estimate the discount?
Most teams don’t ask about Reserved capacity because they love contracts. They ask because they’re tired of capacity risk: “Will we actually have H100s when the run needs to start?” and “What discount do we get for committing for 3–12 months?”
If you’re planning a multi-month training roadmap on VESSL Cloud, Reserved is the tier that makes those questions go away: guaranteed capacity, dedicated support, and volume discounts, with terms starting at 3 months.
Quick Answer: To request a 3‑month+ Reserved commitment and estimate your discount, you (1) baseline your GPU needs in On‑Demand, (2) estimate monthly consumption, (3) map that to a 3–12 month term, and (4) contact VESSL Sales with your target GPUs, term, and regions. The team will return a proposal with per‑hour Reserved rates and the effective discount versus On‑Demand.
At-a-Glance Comparison
| Rank | Option | Best For | Primary Strength | Watch Out For |
|---|---|---|---|---|
| 1 | Reserved | Mission‑critical AI workloads with predictable GPU needs | Guaranteed capacity with volume discounts and dedicated support | Requires a 3‑month+ commitment and capacity planning |
| 2 | On‑Demand | Production workloads that need reliability and failover without long commitments | Automatic failover, high availability, pay‑as‑you‑go | Higher effective cost vs Reserved at sustained scale |
| 3 | Spot | Experiments, batch jobs, and non‑critical workloads that can be interrupted | Lowest hourly cost | Capacity can be preempted; no guarantees for long runs |
Comparison Criteria
We evaluate Reserved vs On‑Demand vs Spot on three practical axes:
- Capacity certainty: Whether you can trust that a specific GPU class (e.g., A100/H100/H200/B200/GB200/B300) will be there when your job starts, even during peak demand or provider issues.
- Cost efficiency over time: The effective $/GPU‑hour you pay as your usage shifts from “burst” to “always‑on,” and how much discount a 3‑month+ commitment can unlock versus pure pay‑as‑you‑go.
- Operational overhead & support: How much “job wrangling” your team does (monitoring, re‑queuing, managing outages) and whether you get dedicated support and clear SLAs when something breaks.
Detailed Breakdown
1. Reserved (Best overall for mission‑critical, predictable workloads)
Reserved ranks as the top choice if you already know you’ll be running serious workloads—LLM post‑training, Physical AI, AI for Science—on high‑end GPUs for at least a few months.
With Reserved, you trade flexibility for certainty: you commit to a baseline of A100/H100/H200/B200/GB200/B300‑class capacity for 3+ months in exchange for guaranteed access, dedicated support, and volume discounts.
What it does well:
- Guaranteed capacity: You’re not competing with other customers or cloud waitlists. Once your Reserved plan is in place, that capacity is held for you for the agreed term. This is crucial if:
  - You’re scheduling multi‑week training runs
  - You must align with product launch or research deadlines
  - You’re sensitive to “GPU sold out” surprises during a crunch
- Volume discounts vs On‑Demand: Reserved pricing is designed to reward sustained usage. Exact discount percentages depend on:
  - GPU class (e.g., H100 vs A100)
  - Region/provider mix
  - Commitment length and volume
  Whatever the mix, you should expect materially lower effective rates than pure On‑Demand at the same scale.
  A typical estimation workflow:
  - Take the published On‑Demand hourly rate for a given GPU SKU.
  - Multiply by your expected monthly hours (e.g., 24×7 or 12×5).
  - Use that as a baseline to discuss Reserved discounts with Sales.
- Dedicated support & procurement readiness: Reserved plans come with:
  - Dedicated support channels
  - Capacity planning guidance
  - SLA and compliance alignment (SOC 2 Type II, ISO 27001)
  This matters if you’re an enterprise, government, or large lab where procurement and risk teams care about guarantees, not just raw GPU availability.
Tradeoffs & Limitations:
- 3‑month+ commitment required: Reserved plans start at a 3‑month term. Longer terms (6–12 months) can unlock deeper discounts, but they also lock in a capacity envelope you should realistically use.
  - Over‑committing = idle GPUs and sunk cost.
  - Under‑committing = you still fall back to On‑Demand when you grow.
- Planning overhead: You’ll invest time upfront to:
  - Profile workloads (GPU type, hours, concurrency)
  - Decide how much you want guaranteed vs left flexible on On‑Demand or Spot
  This is a one‑time cost that pays off if you’re running large, repeatable workloads.
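The over‑commit/under‑commit tradeoff can be made concrete with a break‑even check. The sketch below rests on an assumption the article does not spell out: that a Reserved plan bills the full committed capacity at a discounted rate whether or not it is used, while On‑Demand bills only the GPU‑hours actually consumed. Under that assumption, Reserved wins once utilization of the committed GPUs exceeds `1 - discount`. The function names and the 10% discount are illustrative, not VESSL figures.

```python
def break_even_utilization(discount: float) -> float:
    """Utilization of committed capacity above which Reserved is cheaper.

    Assumption (not from VESSL docs): Reserved bills all committed GPU-hours
    at rate * (1 - discount); On-Demand bills only the GPU-hours used.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def cheaper_option(utilization: float, discount: float) -> str:
    """Compare the two tiers for a given utilization of the committed GPUs."""
    if utilization > break_even_utilization(discount):
        return "Reserved"
    return "On-Demand"


# Hypothetical 10% discount: the commitment pays off only above ~90% utilization.
for utilization in (0.60, 0.95):
    print(f"{utilization:.0%} utilized -> {cheaper_option(utilization, 0.10)}")
```

The takeaway is that a modest discount only pays off for capacity you keep busy nearly all the time, which is why the baseline should be the floor of your usage rather than the peak.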
Decision Trigger:
Choose Reserved if you want guaranteed capacity at a discounted rate and you’re comfortable committing for 3+ months. You should prioritize it when:
- Your GPU needs are predictable (e.g., ongoing LLM post‑training or recurring production retrains).
- Downtime or unavailability has real business or research impact.
- You want a clear, forecastable GPU budget with volume discounts.
2. On‑Demand (Best for production with flexible horizons)
On‑Demand is the strongest fit when you need reliability and failover, but aren’t ready to lock in a 3‑month+ commitment. It’s the default mode for many teams during the ramp‑up period before switching part of their usage to Reserved.
On‑Demand is the “Reliable with failover” tier: you pay as you go, and VESSL handles automatic failover across providers to keep workloads alive.
What it does well:
- High availability with automatic failover: On‑Demand includes:
  - Multi‑cloud failover if a provider or region has issues
  - High availability built into the orchestration layer
  That means fewer pager alerts for provider outages and more “fire‑and‑forget” jobs.
- Pay‑as‑you‑go flexibility: No commitments, no minimums. You:
  - Spin up A100/H100/H200/B200/GB200/B300 instances as needed
  - Ramp down to zero during slow periods
  This is ideal for:
  - New workloads where usage patterns aren’t known yet
  - Pilots and PoCs before you commit to a Reserved plan
  - Bursty workloads that spike during certain sprints
Tradeoffs & Limitations:
- Higher effective cost at sustained scale: If you’re running 24×7 or near‑continuous workloads, pure On‑Demand will usually cost more over a 3–12 month horizon than a matched Reserved commitment with discounts.
Decision Trigger:
Choose On‑Demand if you want reliability and automatic failover with no commitment. It’s the right call when:
- You’re still learning your real GPU consumption.
- You expect usage to fluctuate significantly month to month.
- You plan to move stable baseline usage into Reserved later.
3. Spot (Best for cost‑sensitive experiments and batch jobs)
Spot stands out when your priority is minimizing cost and your workloads can tolerate interruption. Think: hyperparameter sweeps, ablation studies, or non‑urgent batch inference.
Spot taps into preemptible excess capacity across providers at lower hourly rates—but that capacity can disappear.
What it does well:
- Lowest cost per GPU‑hour: Spot is usually the cheapest way to run:
  - Short‑lived experiments
  - Parallelized research jobs
  - Non‑critical pipelines where retries are acceptable
- Great for “embarrassingly parallel” work: If your workload can be split across many short jobs, Spot lets you:
  - Scale out temporarily to many GPUs
  - Absorb preemptions by retrying or rebalancing
Tradeoffs & Limitations:
- No guarantees; preemption risk: With Spot:
  - Capacity can be reclaimed by the provider.
  - Long uninterrupted training runs are risky.
  - You shouldn’t plan mission‑critical deadlines around it.
Decision Trigger:
Choose Spot if you want the lowest possible cost and can tolerate job interruption and capacity variability. It’s a complement to Reserved and On‑Demand, not a replacement.
How to Request a 3‑Month+ Reserved Commitment
Here’s a step‑by‑step path you can follow from “we think we need Reserved” to having a concrete proposal with discount estimates.
Step 1: Baseline your GPU needs in On‑Demand
Before you commit, get real usage data.
- Run your workloads on On‑Demand for a short period (e.g., 1–4 weeks).
- Capture:
  - GPU types used (e.g., number of H100 80GB vs A100 80GB)
  - Average concurrency (how many GPUs in parallel)
  - Average hours per day/week
  - Regions/providers used
This gives you a realistic “floor” of capacity you know you’ll use for at least 3 months.
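One way to turn a few weeks of On‑Demand runs into that floor is to total GPU‑hours per SKU and region and divide by the length of the observation window. The sketch below is a minimal illustration: `JobRecord` is a hypothetical record shape (not a VESSL export format), and the usage numbers are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """One finished On-Demand job (hypothetical shape, not a VESSL export)."""
    gpu_sku: str   # e.g. "H100-80GB"
    region: str
    gpus: int      # GPUs held by the job
    hours: float   # wall-clock hours the job ran


def baseline(records, window_hours):
    """Return {(sku, region): (gpu_hours, avg_concurrent_gpus)} for the window."""
    gpu_hours = defaultdict(float)
    for r in records:
        gpu_hours[(r.gpu_sku, r.region)] += r.gpus * r.hours
    # Average concurrency = GPU-hours consumed / hours in the window.
    return {key: (total, total / window_hours) for key, total in gpu_hours.items()}


# Two weeks of made-up usage: 14 days * 24 h = 336 h window.
records = [
    JobRecord("H100-80GB", "region-x", gpus=8, hours=168.0),
    JobRecord("H100-80GB", "region-x", gpus=8, hours=84.0),
    JobRecord("A100-80GB", "region-x", gpus=4, hours=42.0),
]
summary = baseline(records, window_hours=336.0)
for (sku, region), (total, avg) in sorted(summary.items()):
    print(f"{sku} in {region}: {total:.0f} GPU-hours, avg {avg:.1f} GPUs in use")
```

Average concurrency hides peaks, so it is a reasonable floor for a commitment but not a measure of the burst capacity you may still need from On‑Demand.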
Step 2: Estimate your monthly consumption
Turn that baseline into a simple table:
- For each GPU SKU:
  - Average GPUs in use (concurrent)
  - Average hours per month (e.g., 720 for 24×7)
  - Regions where you must have capacity
This is effectively your Reserved ask:
“We need N × H100 in region X, running Y hours/month, for at least 3 months.”
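If it helps to keep the table in a form you can reuse, each row can be held as a small record that also renders the ask sentence. This is only a sketch: `CapacityLine`, its field names, and the example values are hypothetical, and 720 hours/month is the 24×7 figure used above.

```python
from dataclasses import dataclass

HOURS_24X7 = 720  # 30 days * 24 h, the 24x7 figure used above


@dataclass(frozen=True)
class CapacityLine:
    """One row of the consumption table (hypothetical structure)."""
    gpu_sku: str
    region: str
    gpus: int               # average concurrent GPUs
    hours_per_month: int
    term_months: int = 3    # Reserved terms start at 3 months

    def gpu_hours_per_month(self) -> int:
        return self.gpus * self.hours_per_month

    def as_ask(self) -> str:
        return (
            f"We need {self.gpus} x {self.gpu_sku} in {self.region}, "
            f"running {self.hours_per_month} hours/month, "
            f"for at least {self.term_months} months."
        )


line = CapacityLine("H100", "region-x", gpus=8, hours_per_month=HOURS_24X7)
print(line.as_ask())
print(line.gpu_hours_per_month(), "GPU-hours/month")
```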
Step 3: Map to a commitment term (3–12 months)
Decide how confident you are:
- 3 months: Good for fast‑moving teams or evolving research roadmaps. Lower commitment and a smaller discount than longer terms, but more flexibility.
- 6–12 months: Better for stable, ongoing workloads with predictable demand and budget. Higher commitment; potential for deeper volume discounts.
Remember: Reserved terms on VESSL start at 3 months. If you expect your usage to grow, you can:
- Commit to the “baseline” you’re sure of now.
- Plan to top up with On‑Demand or adjust the Reserved plan later.
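To see why committing only to the baseline is usually the safer choice, you can compare blended costs for a few commitment sizes. The sketch below makes several assumptions that are not VESSL terms: the Reserved baseline is billed in full at a discounted rate, anything above it runs on On‑Demand 24×7 for the month, and the $2.00/GPU‑hour rate, 10% discount, and demand forecast are placeholders.

```python
HOURS_PER_MONTH = 720  # 24x7


def blended_cost(committed_gpus, monthly_demand, rate, discount):
    """Total cost over the term: Reserved baseline plus On-Demand overflow.

    Assumptions (illustrative, not VESSL terms): the commitment is billed in
    full every month at rate * (1 - discount); demand above the commitment
    is served by On-Demand at the undiscounted rate for the whole month.
    """
    months = len(monthly_demand)
    reserved = committed_gpus * HOURS_PER_MONTH * rate * (1 - discount) * months
    overflow_gpu_months = sum(max(0, need - committed_gpus) for need in monthly_demand)
    on_demand = overflow_gpu_months * HOURS_PER_MONTH * rate
    return round(reserved + on_demand, 2)


demand = [8, 8, 12]  # made-up forecast of concurrent GPUs for a 3-month term
for committed in (0, 8, 12):
    total = blended_cost(committed, demand, rate=2.00, discount=0.10)
    print(f"commit {committed:>2} GPUs -> ${total:,.2f} over the term")
```

With these placeholder numbers, committing to the 8‑GPU baseline is cheapest, while committing to the 12‑GPU peak costs more than staying fully On‑Demand because the extra GPUs sit idle for two of the three months.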
Step 4: Contact Sales with a clear capacity brief
Once you have your baseline and term, reach out to VESSL’s team.
Include:
- Workload overview: LLM post‑training, Physical AI, AI for Science, etc.
- GPU requirements:
  - GPU SKUs: A100/H100/H200/B200/GB200/B300 (and counts)
  - Expected concurrency and hours/month
  - Regions/providers you care about
- Target term: 3, 6, or 12+ months.
- Operational requirements:
  - Production SLAs
  - Maintenance windows
  - Any compliance constraints (e.g., data locality)
You can start this process via the “Contact Sales” or “Talk to Sales” paths on the VESSL site.
How to Roughly Estimate Your Reserved Discount
Exact discount percentages are part of the commercial proposal and depend on your specifics, but you can sanity‑check the economics using On‑Demand pricing as a baseline.
- Pull On‑Demand hourly rates: Check VESSL’s published hourly pricing for your target GPU SKUs. This is your reference.
- Compute your monthly On‑Demand cost: For each GPU type:
  On‑Demand cost = hourly rate × GPUs × hours/month
- Apply a plausible discount range: Reserved plans include volume discounts. While the exact number comes from Sales, you can:
  - Assume a conservative single‑digit to low‑double‑digit % reduction for smaller commitments.
  - Expect deeper discounts as:
    - Term length increases
    - Monthly spend and GPU count grow
- Compare to your budget & risk tolerance:
  - If you’re running 24×7 production on H100/B200‑class GPUs, even moderate discounts can save significantly over 3–12 months.
  - If your usage is sporadic, staying On‑Demand for now might be cleaner; you can switch part of your load to Reserved once patterns stabilize.
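The arithmetic above fits in a few lines of Python. Treat this as a sanity‑check sketch rather than a pricing tool: the $2.00/GPU‑hour rate is a placeholder (substitute the published On‑Demand rate for your SKU), and the 5/10/15% discounts are assumed scenarios in the single‑digit to low‑double‑digit range mentioned above, not quoted VESSL figures.

```python
def on_demand_monthly_cost(hourly_rate, gpus, hours_per_month):
    """On-Demand cost = hourly rate x GPUs x hours/month."""
    return hourly_rate * gpus * hours_per_month


def reserved_scenarios(hourly_rate, gpus, hours_per_month, term_months, discounts):
    """Return (discount, reserved_monthly_cost, savings_over_term) per scenario."""
    base = on_demand_monthly_cost(hourly_rate, gpus, hours_per_month)
    rows = []
    for d in discounts:
        monthly = round(base * (1 - d), 2)
        savings = round((base - monthly) * term_months, 2)
        rows.append((d, monthly, savings))
    return rows


PLACEHOLDER_RATE = 2.00                 # $/GPU-hour, NOT a real VESSL price
ASSUMED_DISCOUNTS = (0.05, 0.10, 0.15)  # hypothetical scenarios, not quotes

base = on_demand_monthly_cost(PLACEHOLDER_RATE, gpus=8, hours_per_month=720)
print(f"On-Demand baseline: ${base:,.2f}/month")
for d, monthly, savings in reserved_scenarios(
    PLACEHOLDER_RATE, gpus=8, hours_per_month=720,
    term_months=3, discounts=ASSUMED_DISCOUNTS,
):
    print(f"{d:.0%} discount: ${monthly:,.2f}/month, ${savings:,.2f} saved over 3 months")
```

Running the scenarios side by side gives you a range to bring to the Sales conversation; the actual discount in the proposal replaces the assumed values.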
Final Verdict
If your team is planning serious, repeatable GPU workloads and wants to stop worrying about capacity shocks, Reserved should be your default planning anchor:
- Use On‑Demand to discover your usage pattern and bridge gaps.
- Use Reserved to lock in the baseline you know you’ll need for 3+ months, with guaranteed capacity, dedicated support, and volume discounts.
- Use Spot to offload experimental and interruptible work at the lowest cost.
The key move is to turn vague “we’ll probably need a lot of H100s” into a concrete capacity brief: GPU SKUs, hours, regions, and term. From there, VESSL’s team can give you clear Reserved pricing and show exactly how much you save versus staying 100% On‑Demand.