How do I run DeepSeek-R1 on SambaNova Cloud, and what model name do I use in the API request?

Most teams discover DeepSeek-R1 the same way: you see the reasoning and coding performance, then hit a wall trying to serve it with acceptable latency and cost. SambaNova Cloud is designed to run models like DeepSeek-R1 at scale, using our RDU-based inference stack to keep throughput high—even for multi-step agentic workflows.

Quick Answer: You can run DeepSeek-R1 on SambaNova Cloud using OpenAI-compatible APIs. In your API request, you’ll set the model field to the specific DeepSeek-R1 model name exposed in your SambaNova Cloud account (for example, a deepseek-r1 or deepseek-r1-<variant> identifier shown in your dashboard or docs).

The Quick Overview

What It Is: DeepSeek-R1 is a 671B-parameter reasoning and coding model that SambaNova runs on RDUs for high-throughput, low-latency inference, achieving up to 200 tokens/second (as independently measured by Artificial Analysis).
Who It Is For: Platform teams, infra leads, and developers who want DeepSeek-R1 performance with OpenAI-compatible APIs and predictable, rack-scale inference characteristics.
Core Problem Solved: It removes the pain of serving a massive reasoning model on “one-model-per-node” GPU clusters, letting you run DeepSeek-R1 as part of multi-model, agentic workflows without re-architecting your app.

How It Works

On SambaNova Cloud, DeepSeek-R1 runs on our Reconfigurable Dataflow Unit (RDU) hardware inside a full-stack inference environment (SambaStack). You don’t manage chips directly; you call an OpenAI-style endpoint, and SambaNova’s stack handles:

Model loading and caching in a three-tier memory hierarchy
High-throughput token generation (up to 200 tokens/second for DeepSeek-R1)
Autoscaling, load balancing, and monitoring via SambaOrchestrator

From your perspective, it behaves like an OpenAI model: you send an HTTP request with a model name, your prompt, and configuration parameters; SambaNova Cloud returns DeepSeek-R1 completions or chat responses.
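To make "behaves like an OpenAI model" concrete, here is a minimal sketch that builds the same chat-completions request with only Python's standard library and no SDK. It assumes the OpenAI-style `/chat/completions` path under the base URL used elsewhere in this guide and the placeholder model name `deepseek-r1`; the helper name `build_chat_request` is hypothetical. The request is built but not sent, so you can inspect it first.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str,
                       messages: list, **params) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions POST request.

    Hypothetical helper: the /chat/completions path mirrors the OpenAI API
    shape; confirm the exact base URL in your SambaNova Cloud docs.
    """
    # Request body: model name, conversation, and any sampling parameters.
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "https://api.sambanova.ai/v1",
    "your_api_key_here",
    "deepseek-r1",  # placeholder: use the exact name from your account
    [{"role": "user", "content": "Hello"}],
    temperature=0.2,
    max_tokens=64,
)
print(req.full_url)  # https://api.sambanova.ai/v1/chat/completions
```

To actually send it, you would pass `req` to `urllib.request.urlopen` and decode the JSON body; if the endpoint is OpenAI-compatible as described, the response should have the familiar `choices[0].message.content` shape.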

End-to-end flow

Account & Access Setup:
Sign up for SambaNova Cloud, get API keys, and verify that DeepSeek-R1 is enabled for your tenant (or request access from sales/support if needed).
Model Selection & Naming:
In the SambaNova Cloud UI or docs, locate the exact model identifier for DeepSeek-R1 (e.g., deepseek-r1 or a variant like deepseek-r1-671b). This is the value you use in the API model field.
Integration via OpenAI-Compatible APIs:
Use your existing OpenAI client libraries or HTTP code, swap the base URL and API key for SambaNova Cloud, and replace the model value with the DeepSeek-R1 identifier. No new SDK or interface required.

Features & Benefits Breakdown

| Core Feature | What It Does | Primary Benefit |
|---|---|---|
| OpenAI-compatible API for DeepSeek-R1 | Exposes DeepSeek-R1 behind `/v1/chat/completions` and similar endpoints | Port existing OpenAI apps in minutes without rewriting your stack |
| RDU-accelerated DeepSeek-R1 inference | Runs DeepSeek-R1 on SambaNova RDUs with custom dataflow + tiered memory | High throughput (up to 200 tokens/sec) and better tokens-per-watt |
| Model bundling for agentic workflows | Lets you pair DeepSeek-R1 with other models (e.g., gpt-oss-120b, Llama) on one node | Faster, cheaper multi-step agents without cross-endpoint latency |

Step-by-Step: Running DeepSeek-R1 on SambaNova Cloud

Below is the typical process a platform team will follow. Adjust for your language stack and environment.

1. Get SambaNova Cloud access and API keys

Go to https://sambanova.ai and navigate to SambaCloud.
Create or sign in to your account.
Generate an API key from your account’s “API Keys” or “Security” section.
Confirm that DeepSeek models are available in your region/plan; if not, contact SambaNova to enable access.

Store the API key securely (environment variables or your secrets manager).

```bash
export SAMBANOVA_API_KEY="your_api_key_here"
export SAMBANOVA_BASE_URL="https://api.sambanova.ai/v1"
```

(Use the exact base URL provided in your SambaNova Cloud docs; the above is illustrative.)
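A small sketch for loading those variables in Python, failing with a clear message when the key is missing instead of sending an unauthenticated request. `load_config` and `SambaNovaConfig` are hypothetical names, and the default base URL is the same illustrative value as above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Illustrative default only; use the exact base URL from your SambaNova Cloud docs.
DEFAULT_BASE_URL = "https://api.sambanova.ai/v1"


@dataclass(frozen=True)
class SambaNovaConfig:
    api_key: str
    base_url: str


def load_config(env: Optional[Mapping[str, str]] = None) -> SambaNovaConfig:
    """Read SAMBANOVA_API_KEY and SAMBANOVA_BASE_URL, failing fast if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("SAMBANOVA_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "SAMBANOVA_API_KEY is not set; export it or load it from your secrets manager"
        )
    # Trailing slashes are trimmed so paths like "/chat/completions" join cleanly.
    base_url = env.get("SAMBANOVA_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    return SambaNovaConfig(api_key=api_key, base_url=base_url)
```

With the OpenAI SDK installed, you could then write `cfg = load_config()` followed by `client = OpenAI(api_key=cfg.api_key, base_url=cfg.base_url)`.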

2. Find the precise DeepSeek-R1 model name

Inside SambaNova Cloud, you’ll see a Models or Catalog view listing available models, including DeepSeek. Typical patterns you’ll see:

A generic name like deepseek-r1
Or a versioned/variant name like deepseek-r1-671b or deepseek-r1-reasoning

Use that exact string as the model value in your requests. If your tenant exposes multiple DeepSeek variants, pick the one aligned with your workload (e.g., reasoning-heavy vs. balanced).

If you’re unsure which model name to use:

Check the API reference section of SambaNova Cloud
Or reach out to support with: “What is the correct model name for DeepSeek-R1 in my region/plan?”
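If you have the list of model IDs your tenant exposes, a small helper can pick the DeepSeek-R1 entry instead of hard-coding a guess. This is a hypothetical heuristic, not a SambaNova API: it matches the substring `deepseek-r1` case-insensitively, skips distilled variants (DeepSeek also publishes smaller models with "Distill" in the name), and prefers the shortest, most generic name unless you pass an exact `preferred` ID.

```python
from typing import Iterable, Optional


def find_deepseek_r1(model_ids: Iterable[str], preferred: Optional[str] = None) -> str:
    """Pick a DeepSeek-R1 identifier from the model IDs in your catalog (heuristic)."""
    ids = list(model_ids)
    if preferred is not None:
        # An explicit choice must exist exactly as written; IDs may be case-sensitive.
        if preferred in ids:
            return preferred
        raise ValueError(f"{preferred!r} is not in your model catalog")
    candidates = [
        m for m in ids
        if "deepseek-r1" in m.lower() and "distill" not in m.lower()
    ]
    if not candidates:
        raise ValueError("No DeepSeek-R1 model found; ask support to enable it")
    # Prefer the most generic name, e.g. "deepseek-r1" over "deepseek-r1-671b".
    return min(candidates, key=len)
```

If your endpoint supports the OpenAI-compatible model listing (an assumption worth verifying), you could feed it `[m.id for m in client.models.list()]`.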

3. Call DeepSeek-R1 using an OpenAI-style client

Example: Python (OpenAI client pointed at SambaNova Cloud)

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SAMBANOVA_API_KEY"],
    base_url=os.environ.get("SAMBANOVA_BASE_URL", "https://api.sambanova.ai/v1"),
)

response = client.chat.completions.create(
    model="deepseek-r1",  # replace with the exact name from your SambaNova Cloud UI
    messages=[
        {"role": "system", "content": "You are a precise reasoning assistant."},
        {"role": "user", "content": "Explain how to design a resilient agentic workflow for customer support."},
    ],
    temperature=0.2,
    max_tokens=512,
)

print(response.choices[0].message.content)
```

Only three things differ from a typical OpenAI call:

base_url is pointed at SambaNova Cloud.
api_key is your SambaNova API key.
model uses the DeepSeek-R1 identifier from your SambaNova account.
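Reasoning models can produce long outputs, so streaming tokens as they arrive often improves perceived latency. The OpenAI Python SDK supports `stream=True` on `chat.completions.create`; this sketch assumes SambaNova Cloud's OpenAI-compatible endpoint honors it the same way. `collect_stream` is a hypothetical helper that prints deltas as they arrive and returns the full text.

```python
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Print and concatenate text deltas from an OpenAI-style streaming response."""
    parts = []
    for chunk in chunks:
        # Some chunks (e.g. a final usage-only chunk) may carry no choices.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)  # show tokens as they arrive
            parts.append(delta)
    print()
    return "".join(parts)


# Usage with the `client` configured above (requires network access):
# stream = client.chat.completions.create(
#     model="deepseek-r1",  # replace with the exact name from your SambaNova Cloud UI
#     messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
#     max_tokens=1024,
#     stream=True,
# )
# answer = collect_stream(stream)
```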

Example: Raw `curl` request

```bash
curl -X POST "$SAMBANOVA_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $SAMBANOVA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "messages": [
      {"role": "system", "content": "You are an expert coding assistant."},
      {"role": "user", "content": "Write a robust Python function to parse large JSON logs incrementally."}
    ],
    "temperature": 0.1,
    "max_tokens": 400
  }'
```

The response structure mirrors OpenAI’s, so downstream app logic (parsing choices, reading message.content, etc.) can remain unchanged.
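One DeepSeek-R1-specific detail: the model typically writes its chain of thought inside `<think>...</think>` tags before the final answer. Whether your endpoint returns that text inline in `message.content` is an assumption to check against a real response; if it does, a small helper like the hypothetical `split_reasoning` below separates the two so downstream code only shows the answer.

```python
import re

# Assumes reasoning arrives inline, wrapped in <think>...</think>.
# If your deployment omits the opening tag, adjust this pattern.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content: str) -> tuple:
    """Return (reasoning, answer) from a DeepSeek-R1 message content string."""
    content = content or ""
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(content))
    answer = THINK_RE.sub("", content).strip()
    return reasoning, answer


print(split_reasoning("<think>2 + 2 is 4.</think>\nThe answer is 4."))
# ('2 + 2 is 4.', 'The answer is 4.')
```

With the Python SDK, pass `response.choices[0].message.content`; with `curl`, parse the body with `json.loads` and pass `body["choices"][0]["message"]["content"]`.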

4. Use DeepSeek-R1 in multi-model/agentic workflows

SambaStack is designed for model bundling and one-node multi-model workflows. A typical pattern:

Use gpt-oss-120b (running at >600 tokens/second on RDUs) for fast intent classification, tool selection, or simple questions.
Escalate complex reasoning, coding, or math tasks to deepseek-r1.
Keep both models hot in RDU tiered memory, minimizing model-switch latency within the same node.

In code, that looks like swapping model based on your agent decision logic:

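```python
from openai import OpenAI


def route_request(client: OpenAI, intent: str, messages: list):
    """Send lightweight intents to the fast model and everything else to DeepSeek-R1."""
    if intent in ("simple_qna", "summary"):
        model = "gpt-oss-120b"  # fast model for simple tasks
    else:
        model = "deepseek-r1"   # replace with your exact DeepSeek-R1 name

    # `client` is the OpenAI client pointed at SambaNova Cloud (see step 3).
    return client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.2,
        max_tokens=512,
    )
```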

You get the performance of a specialized reasoning model without stitching calls across different vendors or clusters.
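The routing function takes `intent` as given. One way to produce it, following the pattern above where gpt-oss-120b handles intent classification, is to ask the fast model for a one-word label and escalate anything unexpected to DeepSeek-R1. This is a sketch assuming both models are enabled in your tenant under the names shown; the prompt, labels, and helper names (`classify_intent`, `answer`) are hypothetical. It works with any client exposing OpenAI's `chat.completions.create` shape.

```python
ROUTER_MODEL = "gpt-oss-120b"   # fast model named in this guide; confirm the exact ID
REASONER_MODEL = "deepseek-r1"  # placeholder; use your catalog's DeepSeek-R1 name
SIMPLE_INTENTS = {"simple_qna", "summary"}

CLASSIFY_PROMPT = (
    "Classify the user's request as exactly one word: "
    "simple_qna, summary, or complex. Reply with the label only."
)


def classify_intent(client, user_text: str) -> str:
    """Ask the fast model for an intent label; default to 'complex' if unsure."""
    resp = client.chat.completions.create(
        model=ROUTER_MODEL,
        messages=[
            {"role": "system", "content": CLASSIFY_PROMPT},
            {"role": "user", "content": user_text},
        ],
        temperature=0,
    )
    label = (resp.choices[0].message.content or "").strip().lower()
    # Anything unexpected escalates to the reasoning model.
    return label if label in SIMPLE_INTENTS else "complex"


def answer(client, user_text: str):
    """Classify with the fast model, then answer with the model the intent maps to."""
    intent = classify_intent(client, user_text)
    model = ROUTER_MODEL if intent in SIMPLE_INTENTS else REASONER_MODEL
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user_text}],
        temperature=0.2,
        max_tokens=512,
    )
```

Defaulting to the reasoning model on an unrecognized label trades some cost for safety: a misclassification costs extra tokens rather than a weak answer.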

Features & Benefits Breakdown

| Core Feature | What It Does | Primary Benefit |
|---|---|---|
| DeepSeek-R1 on RDUs | Runs 671B-parameter DeepSeek-R1 on SambaNova RDUs | Up to 200 tokens/second (independent measurement by Artificial Analysis) |
| OpenAI-compatible integration | Uses the same `/chat/completions` and `model` semantics you already know | Port in minutes; reduce integration risk and migration time |
| Agentic workflow support | Efficiently switches between DeepSeek-R1 and other models on one node | Lower latency across multi-step tasks; fewer cross-cluster hops |

Ideal Use Cases

Best for deep reasoning and coding agents:
Because DeepSeek-R1 is optimized for coding, mathematics, and complex reasoning, running it on RDUs lets you sustain high throughput for evaluation loops, code review bots, and planning agents where each request may generate hundreds of tokens.
Best for hybrid “fast + smart” workflows:
Because SambaStack supports model bundling, you can pair DeepSeek-R1 with faster models like gpt-oss-120b, routing only the hardest tasks to DeepSeek-R1. This keeps average latency and cost under control while still unlocking top-tier reasoning quality where it matters.

Limitations & Considerations

Model name is tenant- and region-aware:
The exact DeepSeek-R1 model identifier can vary by environment or rollout stage. Always use the name shown in your SambaNova Cloud UI or API docs rather than guessing. If the call fails with “model not found,” verify your model string and region.
Throughput tuning still matters:
While DeepSeek-R1 runs at up to 200 tokens/second on RDUs, your observed performance will depend on batch size, context length, and concurrent load. Use SambaOrchestrator metrics and autoscaling to right-size capacity for your production workload.
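To see what you actually get for your own batch sizes, context lengths, and load, you can also measure from the client side. A minimal sketch: `timed_completion` is a hypothetical helper, and it assumes responses carry the OpenAI-style `usage.completion_tokens` field. Because it times the whole round trip, the result includes network and queueing time and will read lower than the server-side generation rate.

```python
import time


def timed_completion(client, **request):
    """Run one non-streaming chat completion and report end-to-end output tokens/second."""
    start = time.perf_counter()
    response = client.chat.completions.create(**request)
    elapsed = time.perf_counter() - start
    # `usage.completion_tokens` follows the OpenAI response schema.
    tokens = response.usage.completion_tokens if response.usage else 0
    rate = tokens / elapsed if elapsed > 0 else float("nan")
    return response, tokens, elapsed, rate


# Usage with the `client` configured in step 3 (requires network access):
# _, tokens, secs, tps = timed_completion(
#     client,
#     model="deepseek-r1",  # replace with your exact DeepSeek-R1 name
#     messages=[{"role": "user", "content": "Summarize the CAP theorem."}],
#     max_tokens=512,
# )
# print(f"{tokens} tokens in {secs:.2f}s -> {tps:.0f} tokens/s")
```

Running this across a few prompt lengths and concurrency levels gives a baseline to compare against SambaOrchestrator metrics.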

Pricing & Plans

SambaNova offers multiple ways to consume DeepSeek-R1, depending on whether you want managed cloud access or infrastructure on-prem / in your own data center.

Typical patterns:

Usage-based SambaCloud access:
Pay per token or per capacity unit for DeepSeek-R1 and other models, ideal for teams that want to “Start building in minutes” without managing hardware. Pricing is aligned with high-efficiency RDUs to drive better tokens-per-dollar for reasoning-heavy workloads.
Dedicated capacity or rack deployments (SambaRack + SN50/SN40L-16):
For large-scale or sovereign AI needs, you can deploy SambaRack systems with SN50 RDUs for “fast agentic inference at a fraction of the cost” or SN40L-16 for low-power inference (an average draw of 10 kW). DeepSeek-R1 can then be exposed through your own SambaStack + SambaOrchestrator environment.

Talk with SambaNova sales to match a DeepSeek-R1 plan to your scale, compliance, and deployment-model requirements.

Cloud Usage Plan: Best for teams wanting to experiment and launch production workloads without owning hardware, with flexible scaling and OpenAI-compatible APIs.
Dedicated / Sovereign Plan: Best for enterprises and governments needing data residency, full control over infrastructure, and sustained high-volume DeepSeek-R1 usage.

Frequently Asked Questions

What model name should I use in my API request to run DeepSeek-R1?

Short Answer: Use the exact DeepSeek-R1 model identifier shown in your SambaNova Cloud model catalog (for example, deepseek-r1 or deepseek-r1-671b), and set it in the model field of your API request.

Details:
SambaNova exposes DeepSeek-R1 as one of the available models in your account. While examples often use deepseek-r1 as the placeholder, your tenant may expose a slightly different name or variant (e.g., versioned tags or regional suffixes). In the Cloud UI, go to the Models section, copy the DeepSeek-R1 identifier, and paste that string into the model parameter in your OpenAI-style calls (chat.completions, completions, etc.). If you don’t see DeepSeek-R1 listed, contact support to enable it.

How fast will DeepSeek-R1 run on SambaNova Cloud compared to GPUs?

Short Answer: On SambaNova RDUs, DeepSeek-R1 achieves up to 200 tokens/second, as independently measured by Artificial Analysis, giving you frontier-scale reasoning throughput suitable for near real-time agentic workflows.

Details:
DeepSeek-R1 is a 671B-parameter model, so serving it efficiently is non-trivial on traditional GPU clusters. SambaNova’s RDUs combine custom dataflow technology with a three-tier memory architecture that keeps models and prompts hot, reducing data movement and maximizing tokens-per-watt. Independent tests show DeepSeek-R1 running at up to 200 tokens/second on SambaNova RDUs, which translates to lower latency per request and higher sustained throughput for multi-step agents, evaluations, and code-generation workloads. Your actual numbers will reflect your prompt lengths, concurrency, and autoscaling configuration, but the system is purpose-built for this class of model.
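As a rough back-of-the-envelope check that ignores time to first token, network, and queueing, generation time scales with output length divided by throughput. Using the up-to-200 tokens/second figure and the 512-token cap from the Python example:

$$
t_{\text{gen}} \approx \frac{N_{\text{out}}}{r} = \frac{512\ \text{tokens}}{200\ \text{tokens/s}} \approx 2.6\ \text{s}
$$

Here $N_{\text{out}}$ counts every generated token, including any reasoning text the model writes before its final answer, so long reasoning traces raise latency in direct proportion.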

Summary

Running DeepSeek-R1 on SambaNova Cloud is straightforward: get your API key, locate the DeepSeek-R1 model name in your SambaNova account, and plug that identifier into the model field of an OpenAI-compatible API call. Behind that simple interface, SambaStack and RDUs handle the heavy lifting—dataflow execution, three-tier memory, autoscaling—so you can focus on building reasoning- and coding-heavy agents, not on wrestling with “one-model-per-node” infrastructure.

For teams already on OpenAI-style interfaces, the migration is mostly a URL and key change. For teams pushing agentic workloads to their limits, SambaNova’s chips-to-model computing gives DeepSeek-R1 the throughput and efficiency it needs to be a practical production dependency.
