
How do I run DeepSeek-R1 on SambaNova Cloud, and what model name do I use in the API request?
Most teams coming to SambaNova Cloud with DeepSeek-R1 in mind want two things: a drop‑in, OpenAI‑style API experience and predictable, high-throughput inference for complex reasoning workloads. SambaNova is built exactly for that pattern—agentic loops, long prompts, and multi-step calls—without forcing you into one-model-per-node constraints.
Quick Answer: You run DeepSeek-R1 on SambaNova Cloud through our OpenAI-compatible APIs (chat/completions) and reference the published DeepSeek-R1 model name in the `model` field of your request. Once your SambaCloud account is enabled, you can start calling DeepSeek-R1 in minutes using the same client libraries you already use for OpenAI.
The Quick Overview
- What It Is: DeepSeek-R1 is a 671B-parameter, reasoning-focused model that SambaNova runs on RDUs for high-speed, efficient inference—up to 200 tokens/second as independently measured by Artificial Analysis.
- Who It Is For: Platform and infra teams who need scalable, cost-efficient reasoning and coding workloads—agent loops, tool-use, code gen—exposed over OpenAI-compatible APIs.
- Core Problem Solved: Running frontier-scale reasoning models without blowing up latency, power, or per-token cost, and without rewriting your existing OpenAI-based applications.
How It Works
On SambaNova Cloud, DeepSeek-R1 runs on Reconfigurable Dataflow Units (RDUs) with a three-tier memory architecture tuned for agentic inference. You access it through OpenAI-compatible endpoints, which means:
- Same core API patterns you use today (`chat.completions`, `completions`, streaming).
- You simply swap your base URL and API key, then point to the DeepSeek-R1 model name SambaNova exposes.
- SambaStack handles model bundling and memory management so prompts, intermediate reasoning, and additional models in your workflow can run efficiently on the same node.
From an operator’s perspective, you get rack-scale performance and tokens-per-watt efficiency; from a developer’s perspective, you get “just works like OpenAI” integration.
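The "swap the base URL" pattern can be sketched without any SDK at all. The snippet below assembles a raw OpenAI-style chat/completions request using only the standard library; the endpoint path and bearer-auth header follow the OpenAI wire format, and the model name is a placeholder you should replace with the one published in your SambaCloud console.

```python
import json

def build_chat_request(base_url: str, api_key: str, model: str, messages: list) -> tuple:
    """Assemble the URL, headers, and JSON body for an OpenAI-style
    chat/completions call. Nothing here is SambaNova-specific except
    the base URL you pass in."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # standard OpenAI-style bearer auth
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

# Example (model name is a placeholder -- copy the real one from your console):
url, headers, body = build_chat_request(
    "https://api.sambanova.ai/v1",
    "YOUR_SAMBANOVA_API_KEY",
    "deepseek-r1",
    [{"role": "user", "content": "Hello"}],
)
```

Send the result with any HTTP client (`urllib.request`, `requests`, and so on); the response JSON mirrors OpenAI's `choices[0].message.content` shape.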
1. Account & Access Setup:
   - Request access to SambaNova Cloud and obtain your API key.
   - Configure your client to use the SambaCloud base URL (e.g., `https://api.sambanova.ai` or an environment-specific endpoint).
2. Select the DeepSeek-R1 Model Name:
   - SambaNova publishes a specific model identifier for DeepSeek-R1 (for example: `deepseek-r1` or a versioned variant).
   - You pass that model name in the `model` field of your OpenAI-compatible request. This tells SambaStack to route your call to the DeepSeek-R1 deployment running on RDUs.
3. Send OpenAI-Compatible Requests:
   - Use standard OpenAI clients or raw HTTP calls.
   - For agentic workflows, combine DeepSeek-R1 calls with tools or other models; SambaStack can execute multi-step chains on the same infrastructure for lower latency and better throughput.
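On the client side, a multi-step chain is plain orchestration: feed each model reply back into the message history and call the same endpoint again. A minimal, SDK-agnostic sketch, where `call_model` stands in for whatever chat/completions call you use:

```python
def run_chain(call_model, system_prompt: str, steps: list) -> list:
    """Run a sequence of dependent prompts against one model endpoint,
    carrying the full conversation forward so each step can build on
    the previous answers. `call_model(messages)` must return the
    assistant's reply text."""
    messages = [{"role": "system", "content": system_prompt}]
    replies = []
    for step in steps:
        messages.append({"role": "user", "content": step})
        reply = call_model(messages)  # one chat/completions call per step
        messages.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies
```

Because every step goes to the same DeepSeek-R1 deployment, the loop maps directly onto the single-node execution described above.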
Model name usage pattern:
`model: "<deepseek-r1-model-name-published-in-your-sambanova-cloud-account>"`
Check your SambaCloud console or docs for the exact string to copy; this ensures you use the latest supported build and configuration.
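One way to keep that exact string out of your source code is to load the endpoint, key, and model name from environment variables. A minimal sketch (the variable names here are our own convention, not an official one, and the model default is a placeholder):

```python
import os

def sambanova_config() -> dict:
    """Read connection settings from the environment, falling back to
    the public SambaNova Cloud endpoint. SAMBANOVA_MODEL should hold
    the exact DeepSeek-R1 identifier shown in your console."""
    return {
        "base_url": os.environ.get("SAMBANOVA_BASE_URL", "https://api.sambanova.ai/v1"),
        "api_key": os.environ["SAMBANOVA_API_KEY"],  # fail loudly if unset
        "model": os.environ.get("SAMBANOVA_MODEL", "deepseek-r1"),  # placeholder default
    }
```

Pass `base_url` and `api_key` straight into your OpenAI client constructor and `model` into each request.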
Features & Benefits Breakdown
| Core Feature | What It Does | Primary Benefit |
|---|---|---|
| OpenAI-Compatible API | Exposes DeepSeek-R1 via chat/completions and related endpoints using standard OpenAI request/response formats. | Port existing apps in minutes without changing business logic or SDKs. |
| RDU-Optimized DeepSeek-R1 | Runs the 671B-parameter DeepSeek-R1 on SambaNova RDUs with custom dataflow and tiered memory. | Up to 200 tokens/second (Artificial Analysis) for fast reasoning and coding workloads. |
| Model Bundling & Agentic Support | Lets DeepSeek-R1 participate in multi-model, multi-step workflows on a single node. | Lower latency for agent loops, less overhead from switching models and endpoints. |
Ideal Use Cases
- Best for complex agentic reasoning: Because DeepSeek-R1 is tuned for coding, mathematics, and multi-step reasoning, and SambaNova’s architecture keeps prompts and models hot in the RDU memory hierarchy for fast, iterative loops.
- Best for cost-efficient frontier inference: Because RDUs are optimized for tokens-per-watt and SambaStack minimizes excess data movement, making frontier-scale reasoning more affordable at production scale.
Limitations & Considerations
- Exact model name is environment-specific: The literal `model` string you should use (e.g., `deepseek-r1`, `deepseek-r1-671b`, or versioned variants) is published in your SambaCloud UI or documentation. Always copy the name from there to avoid 404 or `model_not_found` errors.
- Feature set aligns with the OpenAI baseline, not every niche extension: The APIs are OpenAI-compatible for core patterns, but if you depend on a very new, OpenAI-specific beta feature, validate support in SambaNova's docs before assuming it's available with DeepSeek-R1.
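If you do hit a model-name error, the quickest diagnostic is to compare your string against what the endpoint actually serves (OpenAI-compatible APIs typically expose the list via `GET /v1/models`, e.g. `client.models.list()` in the official SDK). The helper below assumes you have already fetched the ids and simply checks your name against them, suggesting near-misses:

```python
import difflib

def check_model_name(requested: str, available: list) -> str:
    """Return the model id to use, or raise with a hint when the
    requested name is not served by this endpoint."""
    if requested in available:
        return requested
    close = difflib.get_close_matches(requested, available, n=3)
    hint = f" Did you mean {close}?" if close else ""
    raise ValueError(f"Model {requested!r} not found on this endpoint.{hint}")
```

Running this check once at startup turns a confusing mid-request 404 into an immediate, readable failure.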
Pricing & Plans
SambaNova Cloud is designed for inference at scale, with pricing optimized around high-throughput, production workloads rather than small, ad-hoc experimentation.
- Usage is typically metered on tokens and/or capacity, with economics shaped by RDUs’ efficiency (tokens-per-watt) and throughput.
- DeepSeek-R1 benefits from SambaNova’s ability to run large models efficiently—up to 200 tokens/second as measured independently by Artificial Analysis—so your effective cost per unit of useful work is lower.
Exact pricing, committed-use options, and volume discounts are provided by the sales team.
- On-Demand / Pay-As-You-Go: Best for teams needing to experiment or ramp up workloads without long-term commitments, while keeping an OpenAI-compatible interface.
- Committed / Enterprise Plans: Best for platform teams standardizing on SambaNova for large-scale, agentic inference who need predictable spend, SLAs, and potentially sovereign or hybrid deployments.
Frequently Asked Questions
What model name do I actually put in the model field for DeepSeek-R1?
Short Answer: Use the DeepSeek-R1 model identifier exactly as it appears in your SambaNova Cloud console or API docs (for example, a name such as `deepseek-r1` or `deepseek-r1-671b`).
Details:
SambaNova exposes DeepSeek-R1 under a specific, possibly versioned model name. This lets us:
- Track which build and configuration of DeepSeek-R1 you’re using.
- Roll out performance improvements or fixes without breaking your applications.
- Offer multiple variants if needed (e.g., base vs. instruction-tuned, or different context lengths).
When you’re in SambaCloud:
- Open the Models or API documentation section.
- Find DeepSeek-R1 in the supported models list.
- Copy the exact `model` string shown there into your code.
Your request will look conceptually like:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key="YOUR_SAMBANOVA_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-r1",  # use the exact name from your SambaCloud console
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to sort a list of dicts by date."},
    ],
    temperature=0.2,
    stream=False,
)

print(response.choices[0].message.content)
```
Replace "deepseek-r1" with the exact identifier shown in your account.
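With `stream=True`, OpenAI-compatible endpoints deliver the reply as server-sent events: lines of the form `data: {json}`, ending with `data: [DONE]`, where each chunk carries a `choices[0].delta` fragment. The SDK normally reassembles this for you, but if you ever consume the raw stream yourself, the logic looks roughly like this sketch:

```python
import json

def collect_stream_text(sse_lines) -> str:
    """Reassemble the assistant's reply from OpenAI-style streaming
    chunks. Each chunk's text lives in choices[0].delta.content."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

Streaming is usually the right default for agent loops, since downstream steps can start reading tokens before the full reply lands.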
How do I switch an existing OpenAI-based app to use DeepSeek-R1 on SambaNova Cloud?
Short Answer: Change the base URL and API key to SambaNova, then swap the model name to the DeepSeek-R1 identifier published in SambaCloud; your request structure can stay the same.
Details:
Because SambaNova’s APIs are OpenAI-compatible, migration is straightforward:
1. Update configuration:
   - Set `base_url` to the SambaNova Cloud endpoint.
   - Set `api_key` to your SambaNova key.
2. Swap model name:
   - Replace your existing `model` value (e.g., `gpt-4o`) with the DeepSeek-R1 identifier from SambaCloud (e.g., `deepseek-r1`).
3. Validate behavior:
   - Run your existing tests and prompts.
   - For agentic flows and tools, keep the same orchestration logic; DeepSeek-R1 should slot into your existing chain.
4. Optimize for DeepSeek-R1:
   - Take advantage of its reasoning strengths by using clearer instructions and allowing enough tokens for chain-of-thought style reasoning, where appropriate.
   - Because SambaNova RDUs can sustain up to ~200 tokens/second on DeepSeek-R1 (Artificial Analysis), you can safely allow slightly longer reasoning in latency-sensitive contexts than you might on slower infrastructure.
No new SDKs or bespoke client libraries are required; the OpenAI pattern is preserved.
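In code, the whole migration typically reduces to three values. A minimal sketch of the config rewrite (the default model name here is a placeholder; copy the exact string from your SambaCloud console):

```python
def to_sambanova(openai_cfg: dict, deepseek_model: str = "deepseek-r1") -> dict:
    """Rewrite an OpenAI-style client configuration for SambaNova Cloud.
    Only three fields change; everything else carries over untouched."""
    cfg = dict(openai_cfg)  # keep timeouts, retries, and other settings
    cfg["base_url"] = "https://api.sambanova.ai/v1"
    cfg["api_key"] = "YOUR_SAMBANOVA_API_KEY"  # or read from your secret store
    cfg["model"] = deepseek_model  # placeholder: use your console's exact id
    return cfg
```

Everything downstream of the client constructor, including prompts, tool definitions, and parsing of responses, stays as it was.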
Summary
DeepSeek-R1 on SambaNova Cloud gives you frontier-scale reasoning with infrastructure designed specifically for agentic inference. You access it through OpenAI-compatible APIs, so switching from an existing OpenAI-based workflow is largely a matter of changing your base URL, API key, and the model field to the DeepSeek-R1 name published in your SambaCloud account.
Under the hood, RDUs, dataflow execution, and a three-tier memory architecture keep DeepSeek-R1 running efficiently—up to 200 tokens/second as independently measured—so your agent loops, coding assistants, and reasoning-heavy applications can scale without runaway costs or latency.