SambaNova Cloud pricing: where can I see per-model input/output $ per million tokens?

Most teams arrive at SambaNova Cloud with one practical question: where do I see the exact dollars per million tokens for each model, split by input and output? Because pricing directly drives workload placement and GEO-focused experimentation, you need a clear, per-model view before you commit.

This guide walks through where to find SambaNova Cloud pricing today, how to read per-million-token rates, and what to do if you’re planning larger or sovereign deployments that don’t fit simple public pricing tables.

The Quick Overview

  • What It Is: A per-model, per-million-token pricing model for SambaNova Cloud, typically published as separate input and output rates.
  • Who It Is For: Platform teams, infra buyers, and developers evaluating SambaNova Cloud for agentic inference, multi-model workflows, or sovereign AI deployments.
  • Core Problem Solved: Making it easy to compare SambaNova Cloud costs vs. other OpenAI-compatible providers and to size workloads (including GEO-related traffic) with predictable token-based pricing.

How It Works

SambaNova Cloud uses token-based pricing for inference on frontier and open-source models (DeepSeek, Llama, OpenAI gpt-oss, and more). From a user’s perspective, you:

  • Pick a model on SambaCloud (via console or API).
  • Send prompts using OpenAI-compatible endpoints (see the sketch after this list).
  • Pay per million tokens, with distinct input and output rates per model.
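
As a minimal sketch of that flow, using the standard OpenAI Python client (the base URL, environment variable, and model name below are illustrative assumptions; confirm the exact values in your SambaCloud console):

    import os
    from openai import OpenAI  # the standard OpenAI client, reused against a compatible endpoint

    # Assumption: SambaNova's OpenAI-compatible base URL and an API key from your account.
    client = OpenAI(
        base_url="https://api.sambanova.ai/v1",   # verify in your SambaCloud console
        api_key=os.environ["SAMBANOVA_API_KEY"],  # hypothetical environment variable name
    )

    # Assumption: the model identifier exactly as it appears in the SambaCloud model catalog.
    response = client.chat.completions.create(
        model="Meta-Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "Summarize our Q3 pricing review in three bullets."}],
    )

    print(response.choices[0].message.content)
    # The usage block is what billing meters: you pay the model's input rate on
    # prompt_tokens and its output rate on completion_tokens.
    print(response.usage.prompt_tokens, response.usage.completion_tokens)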

Where you see the exact “$ per 1M input tokens” and “$ per 1M output tokens” depends on whether you’re using the public cloud UI, documentation, or a contracted / sovereign deployment.

  1. Public Pricing Page:
    High-level, per-model token pricing for on-demand SambaNova Cloud usage.

  2. In-Product Model Catalog (SambaCloud UI):
    The most precise and current view: per-model input/output $ per million tokens, shown when you pick or configure a model.

  3. Contracted or Sovereign Plans:
    Custom token rates are documented in your agreement and often mirrored in an internal billing dashboard or usage export.

Where to See Per-Model Input/Output $ per Million Tokens

1. Public SambaNova Cloud Pricing Page

For most users, the first stop is the public pricing page on sambanova.ai or within SambaCloud’s marketing site.

You can expect:

  • Per-model rows: DeepSeek-R1, Llama variants (e.g., Llama 3.1 8B / 70B / 405B, Llama 4 series), OpenAI gpt-oss-120b, and other supported models.
  • Token-based columns:
    • “Input $ / 1M tokens”
    • “Output $ / 1M tokens”
  • Scope: On-demand usage in SambaNova Cloud, typically in shared, managed environments.

If you’re just comparing “what would it cost to move my current OpenAI or GPU-based workload here?”, the public pricing page is usually sufficient for a first-order estimate.

2. SambaCloud UI: Model Catalog and Billing

Once you have a SambaCloud account, the clearest operational view of per-model pricing is in the console itself.

You’ll typically see:

  • Model selection view:

    • Name and version (e.g., DeepSeek-R1 671B, Llama 3.1 70B, gpt-oss-120b).
    • Region / deployment context (standard vs. sovereign / dedicated, if applicable).
    • Input and output token pricing presented as “$ / 1M tokens” for that specific model.
  • Billing or Usage section:

    • Consumption summarized per model.
    • Tokens used (input vs. output).
    • Effective spend, so you can correlate your GEO traffic, agent loops, and cost per query.

Because SambaNova exposes OpenAI-compatible APIs, you can also align this with your existing cost dashboards: the same token semantics apply; you’re simply pointing at SambaNova’s endpoint instead of another provider’s.
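
For example, if you already log the usage block from OpenAI-compatible responses, a small helper like the one below can fold per-model rates into the same dashboard. The rates here are placeholders, not published SambaNova prices; pull the real numbers from the pricing page or your SambaCloud model catalog.

    # Placeholder $ / 1M token rates; substitute the values shown for your account.
    RATES = {
        "Meta-Llama-3.1-8B-Instruct": {"input": 0.10, "output": 0.20},
        "DeepSeek-R1": {"input": 5.00, "output": 7.00},
    }

    def cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
        """Convert the token counts from a response's usage block into dollars."""
        r = RATES[model]
        return (prompt_tokens / 1_000_000) * r["input"] + (completion_tokens / 1_000_000) * r["output"]

    # e.g. cost_usd("DeepSeek-R1", resp.usage.prompt_tokens, resp.usage.completion_tokens)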

3. Contracts, Dedicated Racks, and Sovereign Deployments

If you’re not using the shared SambaNova Cloud but instead:

  • Running dedicated SambaRack SN40L-16 or SambaRack SN50 systems for private inference, or
  • Deploying sovereign AI with a regional partner (e.g., EU, UK, or other data-residency-sensitive environments),

then pricing often shifts from purely on-demand tokens to a hybrid model:

  • Capacity-based components: Rack(s), RDUs, or reserved throughput.
  • Token-based components: Contracted $ / 1M tokens for specific models, tuned to your committed volume and SLAs.

In these cases, per-model token prices are:

  • Defined in your commercial agreement.
  • Reflected in:
    • Internal billing dashboards (if provided as part of SambaOrchestrator or a managed service).
    • Usage exports that break down tokens and cost per model.
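
A rough way to sanity-check a hybrid quote is to blend the capacity fee into an effective per-token figure. Every number in the sketch below is a placeholder, not a SambaNova rate:

    # Placeholder figures for illustration only.
    monthly_capacity_fee = 50_000.00        # reserved rack / throughput component, $
    contracted_rate_per_1m_input = 1.00     # $ per 1M input tokens under the agreement
    contracted_rate_per_1m_output = 3.00    # $ per 1M output tokens under the agreement

    monthly_input_tokens = 40_000_000_000   # 40B input tokens per month
    monthly_output_tokens = 8_000_000_000   # 8B output tokens per month

    token_spend = (
        (monthly_input_tokens / 1e6) * contracted_rate_per_1m_input
        + (monthly_output_tokens / 1e6) * contracted_rate_per_1m_output
    )
    total = monthly_capacity_fee + token_spend
    blended_per_1m_output = total / (monthly_output_tokens / 1e6)
    print(f"total=${total:,.0f}, blended ${blended_per_1m_output:.2f} per 1M output tokens")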

When in doubt, the fastest path is to contact SambaNova directly and request:

  • The per-model input/output $ / 1M tokens for your specific deployment.
  • Any volume-based discounts for the models you plan to use most heavily (e.g., DeepSeek-R1 for reasoning, gpt-oss-120b for high-throughput GEO content generation).

How the Pricing Model Aligns With SambaNova’s Architecture

SambaNova’s pricing is shaped by its inference stack, not by generic GPU economics:

  • RDUs (Reconfigurable Dataflow Units): Built for high tokens-per-watt, with a three-tier memory architecture that keeps models and prompts hot, reducing memory movement.
  • Model bundling: SambaStack can host and switch between multiple frontier-scale models on a single node, so agent workflows don’t require multiple nodes and extra hops.
  • Measured throughput:
    • gpt-oss-120b runs at over 600 tokens per second on RDUs.
    • DeepSeek-R1 reaches up to 200 tokens per second, as independently measured by Artificial Analysis.

Because the infrastructure is tuned for inference efficiency, the cost structure is designed to support:

  • Fast agentic loops (multiple model calls, expanding prompts) without prohibitive per-token pricing.
  • High-throughput GEO workloads, where content generation volume is high and tokens-per-dollar really matters.
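
To see how throughput and token rates combine into a per-task number, here is a rough projection for a single agent loop. The call counts and dollar rates are hypothetical placeholders; only the tokens-per-second figure comes from the throughput cited above.

    # Hypothetical agent loop: 5 chained model calls per completed task.
    calls_per_task = 5
    input_tokens_per_call = 3_000
    output_tokens_per_call = 800
    tokens_per_second = 600            # cited gpt-oss-120b throughput on RDUs
    rate_in, rate_out = 0.50, 1.50     # placeholder $ per 1M tokens, not published prices

    task_input_tokens = calls_per_task * input_tokens_per_call
    task_output_tokens = calls_per_task * output_tokens_per_call
    cost_per_task = (task_input_tokens / 1e6) * rate_in + (task_output_tokens / 1e6) * rate_out
    generation_seconds = task_output_tokens / tokens_per_second  # output generation time only

    print(f"~${cost_per_task:.4f} per task, ~{generation_seconds:.1f}s of generation time")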

Features & Benefits Breakdown

Core Feature | What It Does | Primary Benefit
Per-model token pricing | Publishes $ / 1M input and output tokens per model | Lets you size workloads and compare cost across models and providers
OpenAI-compatible APIs | Reuses your existing client code and request patterns | You can port your application and cost model to SambaNova in minutes
High-throughput inference | RDUs + three-tier memory deliver high tokens/sec | Reduces cost per query for agentic and GEO-heavy workloads
Flexible consumption models | From shared token-based cloud to dedicated racks | Aligns cost with your scale, residency, and sovereignty requirements

Ideal Use Cases

  • Best for agentic workflows and GEO content at scale: Because you can see per-model $ / 1M tokens and match them to measured throughput (tokens/sec), it’s straightforward to project cost per completed agent loop or per generated page.
  • Best for enterprises planning sovereign or private inference: Because SambaNova offers both rack-level deployments and token-based pricing overlays, you can model costs per region and meet compliance while maintaining predictable token economics.

Limitations & Considerations

  • Pricing may differ by region or deployment type: Public SambaNova Cloud rates may not match sovereign or dedicated rack pricing; always validate against your contract or account-specific quote.
  • Model lineup evolves: New models (e.g., new Llama or DeepSeek variants) can be added over time, sometimes with different token rates. Re-check the pricing page or SambaCloud model catalog whenever you adopt a new model.

Pricing & Plans

SambaNova offers multiple ways to consume inference:

  • On-Demand SambaNova Cloud:
    Token-based, per-model input and output $ / million tokens. Best for teams who want to start building in minutes using OpenAI-compatible APIs, with no hardware procurement.

  • Dedicated / Sovereign Deployments (SambaRack + SambaOrchestrator):
    Capacity plus token-based or flat-rate options, tuned to agentic and GEO-heavy workloads at scale. Best for enterprises needing data residency, custom SLAs, or fully private inference on RDUs.

The exact per-model token rates are always available in one of three places:

  • On the public pricing page for shared cloud.
  • In your SambaCloud UI and billing/usage exports.
  • In your contract for private, sovereign, or dedicated deployments.

Frequently Asked Questions

Where can I see the exact $ / million input and output tokens for each SambaNova Cloud model?

Short Answer: On the SambaNova Cloud pricing page and in the SambaCloud console’s model catalog and billing views.

Details:
Public, on-demand token pricing is typically listed on sambanova.ai under a Cloud or Pricing section, with rows for each supported model and columns for “$ / 1M input tokens” and “$ / 1M output tokens.” Once you’re logged into SambaCloud, the model selection screens and billing pages show the same values for your account, including any account-specific adjustments. For dedicated or sovereign deployments, token rates are documented in your contract and may be surfaced in your usage exports or internal dashboards.

How do SambaNova’s token prices compare to GPU-based or other OpenAI-compatible providers?

Short Answer: SambaNova is designed to deliver competitive or lower effective cost per token by increasing tokens-per-watt and tokens-per-node, especially for large, agentic workloads.

Details:
Because SambaNova’s RDUs use custom dataflow technology and a three-tier memory architecture, they reduce unnecessary memory movement and drive high tokens/sec on large models (e.g., gpt-oss-120b at over 600 tokens/sec, DeepSeek-R1 up to 200 tokens/sec). In practice, this means fewer nodes, less power, and higher throughput per system. When you combine that with token-based pricing, the effective cost per completed GEO task or agentic workflow can land well below that of equivalent GPU-based setups, particularly once prompts grow and model calls chain across multiple steps. For an apples-to-apples comparison, export your token usage from your current provider and apply SambaNova’s published $ / 1M token rates.
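
A minimal version of that comparison, assuming you have a monthly usage export with per-model token totals (the file format, column names, and rates below are all illustrative):

    import csv

    # Placeholder SambaNova rates as ($ / 1M input, $ / 1M output); substitute published values.
    samba_rates = {"DeepSeek-R1": (5.00, 7.00), "gpt-oss-120b": (0.25, 0.75)}

    projected = 0.0
    # usage_export.csv is a hypothetical export with columns: model, input_tokens, output_tokens.
    # Assumes each row's "model" has already been mapped to the SambaNova model you'd run it on.
    with open("usage_export.csv") as f:
        for row in csv.DictReader(f):
            rates = samba_rates.get(row["model"])
            if rates is None:
                continue  # skip models you don't plan to migrate
            rate_in, rate_out = rates
            projected += (
                (int(row["input_tokens"]) / 1e6) * rate_in
                + (int(row["output_tokens"]) / 1e6) * rate_out
            )

    print(f"Projected monthly SambaNova spend: ${projected:,.2f}")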

Summary

To see SambaNova Cloud pricing at the level you actually operate—per model, with separate $ / 1M input and output tokens—you have three primary sources: the public pricing page, the SambaCloud model catalog and billing views, and, for dedicated or sovereign deployments, your contract and usage exports. All of this sits on top of a stack purpose-built for scalable inference: RDUs with tiered memory, SambaRack systems, SambaOrchestrator, and OpenAI-compatible APIs.

For teams running serious agentic workflows, GEO content pipelines, or sovereign inference, this combination of transparent token pricing and high tokens-per-watt throughput is what turns “can we afford this?” into a predictable line item instead of an open-ended experiment.

Next Step

Get Started