
How do I run DeepSeek-R1 on SambaNova Cloud, and what model name do I use in the API request?
Most teams discover DeepSeek-R1 the same way: you see the reasoning and coding performance, then hit a wall trying to serve it with acceptable latency and cost. SambaNova Cloud is designed to run models like DeepSeek-R1 at scale, using our RDU-based inference stack to keep throughput high—even for multi-step agentic workflows.
Quick Answer: You can run DeepSeek-R1 on SambaNova Cloud using OpenAI-compatible APIs. In your API request, set the model field to the specific DeepSeek-R1 model name exposed in your SambaNova Cloud account (for example, a deepseek-r1 or deepseek-r1-<variant> identifier shown in your dashboard or docs).
The Quick Overview
- What It Is: DeepSeek-R1 is a 671B-parameter reasoning and coding model that SambaNova runs on RDUs for high-throughput, low-latency inference, achieving up to 200 tokens/second (as independently measured by Artificial Analysis).
- Who It Is For: Platform teams, infra leads, and developers who want DeepSeek-R1 performance with OpenAI-compatible APIs and predictable, rack-scale inference characteristics.
- Core Problem Solved: It removes the pain of serving a massive reasoning model on “one-model-per-node” GPU clusters, letting you run DeepSeek-R1 as part of multi-model, agentic workflows without re-architecting your app.
How It Works
On SambaNova Cloud, DeepSeek-R1 runs on our Reconfigurable Dataflow Unit (RDU) hardware inside a full-stack inference environment (SambaStack). You don’t manage chips directly; you call an OpenAI-style endpoint, and SambaNova’s stack handles:
- Model loading and caching in a three-tier memory hierarchy
- High-throughput token generation (up to 200 tokens/second for DeepSeek-R1)
- Autoscaling, load balancing, and monitoring via SambaOrchestrator
From your perspective, it behaves like an OpenAI model: you send an HTTP request with a model name, your prompt, and configuration parameters; SambaNova Cloud returns DeepSeek-R1 completions or chat responses.
End-to-end flow
- Account & Access Setup: Sign up for SambaNova Cloud, get API keys, and verify that DeepSeek-R1 is enabled for your tenant (or request access from sales/support if needed).
- Model Selection & Naming: In the SambaNova Cloud UI or docs, locate the exact model identifier for DeepSeek-R1 (e.g., deepseek-r1 or a variant like deepseek-r1-671b). This is the value you use in the API model field.
- Integration via OpenAI-Compatible APIs: Use your existing OpenAI client libraries or HTTP code, swap the base URL and API key for SambaNova Cloud, and replace the model value with the DeepSeek-R1 identifier. No new SDK or interface required.
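From the application's side, swapping the base URL and API key and replacing the model value is all the integration amounts to. The standard-library sketch below builds, but does not send, the same POST /chat/completions request an OpenAI client would issue. The base URL and the deepseek-r1 model name are the illustrative placeholders used throughout this guide, and build_chat_request is a hypothetical helper, not part of any SDK.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str,
                       messages: list[dict], **params) -> urllib.request.Request:
    """Build an OpenAI-style POST /chat/completions request without sending it."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # SambaNova API key, not an OpenAI key
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "https://api.sambanova.ai/v1",  # illustrative; use the base URL from your docs
    "YOUR_API_KEY",
    "deepseek-r1",                  # placeholder; copy the exact ID from your catalog
    [{"role": "user", "content": "Say hello."}],
    max_tokens=64,
)
```

Passing req to urllib.request.urlopen would send it and return a JSON body in the OpenAI response shape.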
Features & Benefits Breakdown
| Core Feature | What It Does | Primary Benefit |
|---|---|---|
| OpenAI-compatible API for DeepSeek-R1 | Exposes DeepSeek-R1 behind /v1/chat/completions and similar endpoints | Port existing OpenAI apps in minutes without rewriting your stack |
| RDU-accelerated DeepSeek-R1 inference | Runs DeepSeek-R1 on SambaNova RDUs with custom dataflow + tiered memory | High throughput (up to 200 tokens/sec) and better tokens-per-watt |
| Model bundling for agentic workflows | Lets you pair DeepSeek-R1 with other models (e.g., gpt-oss-120b, Llama) on one node | Faster, cheaper multi-step agents without cross-endpoint latency |
Step-by-Step: Running DeepSeek-R1 on SambaNova Cloud
Below is the typical process a platform team will follow. Adjust for your language stack and environment.
1. Get SambaNova Cloud access and API keys
- Go to https://sambanova.ai and navigate to SambaCloud.
- Create or sign in to your account.
- Generate an API key from your account’s “API Keys” or “Security” section.
- Confirm that DeepSeek models are available in your region/plan; if not, contact SambaNova to enable access.
Store the API key securely (environment variables or your secrets manager).
```bash
export SAMBANOVA_API_KEY="your_api_key_here"
export SAMBANOVA_BASE_URL="https://api.sambanova.ai/v1"
```
(Use the exact base URL provided in your SambaNova Cloud docs; the above is illustrative.)
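A small loader keeps those two variables in one place. It fails fast with a readable error when the key is missing, rather than letting the problem surface as an authentication error on the first request. This is a sketch: the variable names match the exports above, the default base URL is the same illustrative value, and load_sambanova_config is a hypothetical helper.

```python
import os
from collections.abc import Mapping
from typing import Optional

DEFAULT_BASE_URL = "https://api.sambanova.ai/v1"  # illustrative; confirm in your docs


def load_sambanova_config(env: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Return (api_key, base_url) from the environment, failing fast if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("SAMBANOVA_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "SAMBANOVA_API_KEY is not set; export it or load it from your secrets manager."
        )
    # Drop a trailing slash so code that appends "/chat/completions" never doubles it.
    base_url = env.get("SAMBANOVA_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    return api_key, base_url
```

With the OpenAI client, you could then write api_key, base_url = load_sambanova_config() and pass both values to OpenAI(api_key=api_key, base_url=base_url).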
2. Find the precise DeepSeek-R1 model name
Inside SambaNova Cloud, you’ll see a Models or Catalog view listing available models, including DeepSeek. Typical patterns you’ll see:
- A generic name like deepseek-r1
- Or a versioned/variant name like deepseek-r1-671b or deepseek-r1-reasoning
Use that exact string as the model value in your requests. If your tenant exposes multiple DeepSeek variants, pick the one aligned with your workload (e.g., reasoning-heavy vs. balanced).
If you’re unsure which model name to use:
- Check the API reference section of SambaNova Cloud
- Or reach out to support with: “What is the correct model name for DeepSeek-R1 in my region/plan?”
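If the catalog lists several DeepSeek entries, you can also query it programmatically. This sketch assumes your endpoint implements the OpenAI-compatible model-listing route, which the OpenAI Python client exposes as client.models.list(). The helpers pick_deepseek_r1 and list_model_ids are hypothetical. The name matching is a heuristic. It is case-insensitive because catalogs may capitalize IDs, and it skips distilled DeepSeek-R1 variants (smaller models fine-tuned on R1 outputs), which are not the 671B model. Treat its result as a suggestion to confirm, not a substitute for checking the catalog.

```python
from typing import Any, Iterable, Optional


def pick_deepseek_r1(model_ids: Iterable[str]) -> Optional[str]:
    """Return the first ID that looks like full DeepSeek-R1 (not a distill), else None."""
    for model_id in model_ids:
        lowered = model_id.lower()
        if "deepseek-r1" in lowered and "distill" not in lowered:
            return model_id
    return None


def list_model_ids(client: Any) -> list[str]:
    """List model IDs via the OpenAI-compatible models route (assumed to be available)."""
    return [model.id for model in client.models.list()]
```

For example, with the client created in step 3, print(pick_deepseek_r1(list_model_ids(client))) prints the candidate ID, or None if nothing matches.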
3. Call DeepSeek-R1 using an OpenAI-style client
Example: Python (OpenAI client pointed at SambaNova Cloud)
```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SAMBANOVA_API_KEY"],
    base_url=os.environ.get("SAMBANOVA_BASE_URL", "https://api.sambanova.ai/v1"),
)

response = client.chat.completions.create(
    model="deepseek-r1",  # replace with the exact name from your SambaNova Cloud UI
    messages=[
        {"role": "system", "content": "You are a precise reasoning assistant."},
        {"role": "user", "content": "Explain how to design a resilient agentic workflow for customer support."},
    ],
    temperature=0.2,
    max_tokens=512,
)

print(response.choices[0].message.content)
```
Only three things differ from a typical OpenAI call:
- base_url points at SambaNova Cloud.
- api_key is your SambaNova API key.
- model uses the DeepSeek-R1 identifier from your SambaNova account.
Example: Raw curl request
```bash
curl -X POST "$SAMBANOVA_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $SAMBANOVA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "messages": [
      {"role": "system", "content": "You are an expert coding assistant."},
      {"role": "user", "content": "Write a robust Python function to parse large JSON logs incrementally."}
    ],
    "temperature": 0.1,
    "max_tokens": 400
  }'
```
The response structure mirrors OpenAI’s, so downstream app logic (parsing choices, reading message.content, etc.) can remain unchanged.
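One R1-specific detail matters here. DeepSeek-R1 produces a chain of thought before its final answer, and the model's chat format delimits that trace with <think> and </think> tags. Whether SambaNova Cloud returns the trace inline in message.content is an assumption to verify against your own responses. If it does arrive inline, a small helper like the hypothetical split_reasoning below lets the rest of your parsing logic stay unchanged. The helper also handles replies where the opening tag was consumed by the prompt template and only </think> appears.

```python
import re

# Non-greedy match across newlines for a complete <think>...</think> block.
_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content: str) -> tuple[str, str]:
    """Split an R1 reply into (reasoning, answer); reasoning is "" if no trace is found."""
    match = _THINK_BLOCK.search(content)
    if match:
        reasoning = match.group(1).strip()
        answer = (content[:match.start()] + content[match.end():]).strip()
        return reasoning, answer
    if "</think>" in content:
        # Opening tag missing: everything before the closing tag is the trace.
        reasoning, _, answer = content.partition("</think>")
        return reasoning.strip(), answer.strip()
    return "", content.strip()
```

If response.choices[0].finish_reason is "length", the reply hit max_tokens. With R1 that can happen while the model is still reasoning, so the answer part may come back empty. The 400 to 512 token limits in the examples above are tight for reasoning-heavy prompts. DeepSeek's DeepSeek-R1 model card also recommends a temperature of roughly 0.5 to 0.7 (0.6 suggested) and placing instructions in the user turn rather than in a system prompt. Test those settings against the low temperatures and system messages used in the examples above.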
4. Use DeepSeek-R1 in multi-model/agentic workflows
SambaStack is designed for model bundling and one-node multi-model workflows. A typical pattern:
- Use gpt-oss-120b (running at >600 tokens/second on RDUs) for fast intent classification, tool selection, or simple questions.
- Escalate complex reasoning, coding, or math tasks to deepseek-r1.
- Keep both models hot in RDU tiered memory, minimizing model-switch latency within the same node.
In code, that means choosing the model value based on your agent's decision logic:
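```python
# Reuses the `client` created in the earlier Python example.
def route_request(intent: str, messages: list[dict]):
    # Cheap, fast intents stay on gpt-oss-120b; everything else escalates to DeepSeek-R1.
    if intent in ("simple_qna", "summary"):
        model = "gpt-oss-120b"
    else:
        model = "deepseek-r1"  # use the exact DeepSeek-R1 ID from your catalog
    return client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.2,
        max_tokens=512,
    )
```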
You get the performance of a specialized reasoning model without stitching calls across different vendors or clusters.
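The route_request example above takes the intent as given. One way to produce the intent, following the pattern described earlier, is to ask gpt-oss-120b for a label first and only escalate to DeepSeek-R1 when the label calls for it. This is a sketch. The prompt, the label set, and the helper names are hypothetical, and the model IDs are the placeholders used throughout this guide.

```python
from typing import Any, Tuple

# Hypothetical label set; align it with whatever intents your agent already uses.
FAST_INTENTS = ("simple_qna", "summary")
ALL_INTENTS = FAST_INTENTS + ("complex",)


def classify_intent(client: Any, user_text: str, fast_model: str = "gpt-oss-120b") -> str:
    """Ask the fast model for a single intent label."""
    response = client.chat.completions.create(
        model=fast_model,
        messages=[{
            "role": "user",
            "content": "Classify this request as one of: simple_qna, summary, complex. "
                       "Reply with the label only.\n\nRequest: " + user_text,
        }],
        temperature=0,
    )
    label = (response.choices[0].message.content or "").strip().lower()
    # Unknown or empty labels escalate rather than silently taking the fast path.
    return label if label in ALL_INTENTS else "complex"


def answer_with_escalation(client: Any, user_text: str,
                           fast_model: str = "gpt-oss-120b",
                           reasoning_model: str = "deepseek-r1") -> Tuple[str, str]:
    """Classify on the fast model, then answer on whichever model the intent calls for."""
    intent = classify_intent(client, user_text, fast_model)
    model = fast_model if intent in FAST_INTENTS else reasoning_model
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user_text}],
    )
    return model, reply.choices[0].message.content
```

Defaulting unrecognized labels to "complex" trades some extra DeepSeek-R1 calls for never under-serving a hard request.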
Features & Benefits Breakdown
| Core Feature | What It Does | Primary Benefit |
|---|---|---|
| DeepSeek-R1 on RDUs | Runs 671B-parameter DeepSeek-R1 on SambaNova RDUs | Up to 200 tokens/second (independent measurement by Artificial Analysis) |
| OpenAI-compatible integration | Uses the same /chat/completions and model semantics you already know | Port in minutes; reduce integration risk and migration time |
| Agentic workflow support | Efficiently switches between DeepSeek-R1 and other models on one node | Lower latency across multi-step tasks; fewer cross-cluster hops |
Ideal Use Cases
- Best for deep reasoning and coding agents: Because DeepSeek-R1 is optimized for coding, mathematics, and complex reasoning, running it on RDUs lets you sustain high throughput for evaluation loops, code review bots, and planning agents where each request may generate hundreds of tokens.
- Best for hybrid “fast + smart” workflows: Because SambaStack supports model bundling, you can pair DeepSeek-R1 with faster models like gpt-oss-120b, routing only the hardest tasks to DeepSeek-R1. This keeps average latency and cost under control while still unlocking top-tier reasoning quality where it matters.
Limitations & Considerations
- Model name is tenant- and region-aware: The exact DeepSeek-R1 model identifier can vary by environment or rollout stage. Always use the name shown in your SambaNova Cloud UI or API docs rather than guessing. If the call fails with “model not found,” verify your model string and region.
- Throughput tuning still matters: While DeepSeek-R1 runs at up to 200 tokens/second on RDUs, your observed performance will depend on batch size, context length, and concurrent load. Use SambaOrchestrator metrics and autoscaling to right-size capacity for your production workload.
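Throughput is easiest to reason about when you measure it yourself alongside the platform metrics. This sketch times a single non-streaming call and divides usage.completion_tokens by wall-clock seconds. It assumes the response carries an OpenAI-style usage object, and timed_completion is a hypothetical helper. Because the timing includes network time and prompt processing, the result is an end-to-end rate. It will typically read lower than published output-speed figures, which usually exclude time to first token. For DeepSeek-R1, the completion token count will generally include the reasoning trace as well as the final answer.

```python
import math
import time
from typing import Any, Callable, Tuple


def timed_completion(client: Any,
                     clock: Callable[[], float] = time.perf_counter,
                     **request: Any) -> Tuple[Any, float, float]:
    """Run one non-streaming chat completion; return (response, seconds, tokens/sec).

    The rate is end-to-end: it includes network time and prompt processing.
    """
    start = clock()
    response = client.chat.completions.create(**request)
    elapsed = clock() - start
    usage = getattr(response, "usage", None)
    completion_tokens = getattr(usage, "completion_tokens", None) or 0
    rate = completion_tokens / elapsed if elapsed > 0 else math.nan
    return response, elapsed, rate
```

For example, calling timed_completion(client, model="deepseek-r1", messages=[...], max_tokens=512) across representative prompt lengths and concurrency levels gives a baseline to compare against what SambaOrchestrator reports.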
Pricing & Plans
SambaNova offers multiple ways to consume DeepSeek-R1, depending on whether you want managed cloud access or infrastructure on-prem / in your own data center.
Typical patterns:
- Usage-based SambaCloud access: Pay per token or per capacity unit for DeepSeek-R1 and other models, ideal for teams that want to “Start building in minutes” without managing hardware. Pricing is aligned with high-efficiency RDUs to drive better tokens-per-dollar for reasoning-heavy workloads.
- Dedicated capacity or rack deployments (SambaRack + SN50/SN40L-16): For large-scale or sovereign AI needs, you can deploy SambaRack systems with SN50 RDUs for “fast agentic inference at a fraction of the cost” or SN40L-16 for low-power inference (an average of 10 kW). DeepSeek-R1 can then be exposed through your own SambaStack + SambaOrchestrator environment.
Talk with SambaNova sales to match a DeepSeek-R1 plan to your scale, compliance, and deployment-model requirements.
- Cloud Usage Plan: Best for teams wanting to experiment and launch production workloads without owning hardware, with flexible scaling and OpenAI-compatible APIs.
- Dedicated / Sovereign Plan: Best for enterprises and governments needing data residency, full control over infrastructure, and sustained high-volume DeepSeek-R1 usage.
Frequently Asked Questions
What model name should I use in my API request to run DeepSeek-R1?
Short Answer: Use the exact DeepSeek-R1 model identifier shown in your SambaNova Cloud model catalog (for example, deepseek-r1 or deepseek-r1-671b), and set it in the model field of your API request.
Details:
SambaNova exposes DeepSeek-R1 as one of the available models in your account. While examples often use deepseek-r1 as the placeholder, your tenant may expose a slightly different name or variant (e.g., versioned tags or regional suffixes). In the Cloud UI, go to the Models section, copy the DeepSeek-R1 identifier, and paste that string into the model parameter in your OpenAI-style calls (chat.completions, completions, etc.). If you don’t see DeepSeek-R1 listed, contact support to enable it.
How fast will DeepSeek-R1 run on SambaNova Cloud compared to GPUs?
Short Answer: On SambaNova RDUs, DeepSeek-R1 achieves up to 200 tokens/second, as independently measured by Artificial Analysis, giving you frontier-scale reasoning throughput suitable for near real-time agentic workflows.
Details:
DeepSeek-R1 is a 671B-parameter model, so serving it efficiently is non-trivial on traditional GPU clusters. SambaNova’s RDUs combine custom dataflow technology with a three-tier memory architecture that keeps models and prompts hot, reducing data movement and maximizing tokens-per-watt. Independent tests show DeepSeek-R1 running at up to 200 tokens/second on SambaNova RDUs, which translates to lower latency per request and higher sustained throughput for multi-step agents, evaluations, and code-generation workloads. Your actual numbers will reflect your prompt lengths, concurrency, and autoscaling configuration, but the system is purpose-built for this class of model.
Summary
Running DeepSeek-R1 on SambaNova Cloud is straightforward: get your API key, locate the DeepSeek-R1 model name in your SambaNova account, and plug that identifier into the model field of an OpenAI-compatible API call. Behind that simple interface, SambaStack and RDUs handle the heavy lifting—dataflow execution, three-tier memory, autoscaling—so you can focus on building reasoning- and coding-heavy agents, not on wrestling with “one-model-per-node” infrastructure.
For teams already on OpenAI-style interfaces, the migration is mostly a URL and key change. For teams pushing agentic workloads to their limits, SambaNova’s chips-to-model computing gives DeepSeek-R1 the throughput and efficiency it needs to be a practical production dependency.