How do I get an API key for SambaNova Cloud and set up usage limits/budgets for my team?

Most teams come to SambaNova Cloud with two immediate needs: get an API key working in minutes, and put hard guardrails around spend so multi-model, agentic workloads don’t surprise Finance at the end of the month. This guide walks through both: how to obtain and manage SambaNova Cloud API keys, and how to structure usage limits, budgets, and team controls around them.

Quick Answer: You get a SambaNova Cloud API key by creating a SambaCloud account, enabling API access, and generating a key from your account settings. Usage limits and budgets are then enforced at the organization/project level by combining per-key scopes, rate limits, and spend thresholds, so you can keep agentic and multi-model workloads under control.


The Quick Overview

  • What It Is: SambaNova Cloud (SambaCloud) exposes high-performance inference on open-source frontier models—like DeepSeek, Llama, and gpt-oss—over OpenAI-compatible APIs. You use API keys to authenticate requests and enforce access controls.
  • Who It Is For: Platform, infra, and ML ops teams running production LLM workloads that need predictable performance, cost controls, and a fast path from OpenAI-compatible prototypes to a more efficient inference stack.
  • Core Problem Solved: Avoids the “one-key-per-developer with no guardrails” anti-pattern. Instead, you can tie SambaNova Cloud API keys to projects, environments, and budgets, and rely on the underlying chips-to-model stack for consistent throughput and tokens-per-watt efficiency.

How It Works

At a high level, SambaNova Cloud gives you OpenAI-compatible endpoints backed by RDU-powered inference. Keys authenticate calls to those endpoints, while account-level controls determine which models a key can access and how much usage it can generate over a period.

You typically:

  1. Create and configure your SambaCloud organization: Define who owns billing, who can generate keys, and how many environments (dev/stage/prod) you need.
  2. Generate API keys per environment or workload: Use project-scoped keys instead of per-person keys, and align each with its own usage policy.
  3. Set usage limits and budgets: Apply rate and spend caps across keys, tie them to monitoring and alerts, and use logs to keep multi-model agentic workflows from escaping their budgets.

The same OpenAI-compatible interface means you can port existing apps from other providers in minutes and immediately apply the same guardrails.


1. Getting a SambaNova Cloud account

Before you can generate an API key, you need access to SambaNova Cloud.

  1. Visit SambaNova Cloud:
    Go to https://sambanova.ai and navigate to SambaCloud / “Start Building” or “Get Started.”
  2. Create or join an organization:
    • If you’re the first in your company, create a new organization with your work email.
    • If your org already exists, request to be added by the current org admin.
  3. Assign roles:
    • Org Admin / Billing Owner: Can manage plans, payment methods, and global limits.
    • Project Admin: Can create projects/environments and manage API keys within them.
    • Developer: Can use existing keys; key creation may be restricted depending on governance.

Once your account is active and you can see SambaCloud dashboards, you’re ready to enable API access.


2. Enabling API access and locating the API section

SambaNova Cloud exposes all models through OpenAI-compatible APIs, so your main tasks are to enable API access and retrieve your base endpoint.

  1. Sign in to SambaCloud.
  2. Open the API / Developer settings:
    • Look for sections labeled “API,” “Developer Settings,” or “API Keys.”
    • Confirm that API access is enabled for your organization and for your user role.
  3. Identify your API base URL:
    • SambaNova Cloud provides OpenAI-compatible endpoints; you’ll see something like:
      https://api.sambanova.ai/v1/...
    • You’ll use this as the base for chat/completions, completions, and other standard routes.

Keep this page open—you’ll generate keys and configure their scope here.


3. Generating a SambaNova Cloud API key

Once in the API section, generate a key for your first workload.

  1. Create a new key:
    • Click “Create API Key” or equivalent.
    • Name it with intent—examples:
      • dev-agentic-workflows
      • prod-customer-facing-chat
      • qa-load-testing
  2. Set scope and permissions (recommended structure):
    • Model access: Limit which models this key can call (e.g., only Llama 3.1 70B, or allow gpt-oss-120b for production, DeepSeek-R1 for R&D).
    • Environment: Associate keys with dev, staging, or prod projects.
    • Data access: If available, restrict to specific datasets or tenants for sovereign / regulated workloads.
  3. Save and copy the key:
    • API keys are typically shown once. Store them in a secrets manager (Vault, AWS Secrets Manager, GCP Secret Manager, Kubernetes secrets).
    • Never commit keys to Git or share them in Slack / email.

You now have a working SambaNova Cloud API key ready to authenticate OpenAI-compatible requests.
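As a minimal sketch of the "store it in a secrets manager" advice, the snippet below assumes your secrets manager injects the key into the process as the `SAMBANOVA_API_KEY` environment variable (the name used in the request examples in this guide). The helper names `load_api_key` and `mask_key` are illustrative, not part of any SambaNova SDK.

```python
import os


class MissingApiKeyError(RuntimeError):
    """Raised when the expected environment variable is absent or empty."""


def load_api_key(env=None, var_name="SAMBANOVA_API_KEY"):
    """Return the API key from the environment, failing fast if it is missing."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise MissingApiKeyError(
            f"{var_name} is not set; inject it from your secrets manager"
        )
    return key


def mask_key(key, visible=4):
    """Return a log-safe form of the key showing only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Failing at startup when the key is missing tends to be easier to debug than a 401 on the first request, and masking keeps the full secret out of logs.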


4. Using your API key with OpenAI-compatible APIs

SambaNova Cloud is designed so you can port from another OpenAI-compatible provider with minimal changes.

Basic request structure

HTTP example:

curl https://api.sambanova.ai/v1/chat/completions \
  -H "Authorization: Bearer $SAMBANOVA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-120b",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain model bundling in SambaStack." }
    ],
    "max_tokens": 256
  }'

Python example (using OpenAI client style):

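import os

from openai import OpenAI

# Read the key from the environment rather than hard-coding it.
client = OpenAI(
    api_key=os.environ["SAMBANOVA_API_KEY"],
    base_url="https://api.sambanova.ai/v1",  # OpenAI-compatible base URL
)

# Same request schema as any other OpenAI-compatible provider.
response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize SambaNova's tiered memory architecture."},
    ],
    max_tokens=256,
)
# Print the assistant's reply from the first choice.
print(response.choices[0].message.content)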

That’s the core of integration: swap your base URL and API key, keep the same request schema.
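To make the "swap base URL and key" point concrete, here is a standard-library sketch that builds, but does not send, the same chat request for two providers. The function name `build_chat_request`, the `api.example.com` URL, and both key strings are placeholders for illustration.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, messages, max_tokens=256):
    """Build (but do not send) an OpenAI-style chat completions request."""
    payload = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


messages = [{"role": "user", "content": "Hello"}]

# Identical payload; only the base URL and key differ between providers.
sambanova_req = build_chat_request(
    "https://api.sambanova.ai/v1", "sn-key-placeholder", "gpt-oss-120b", messages
)
other_req = build_chat_request(
    "https://api.example.com/v1", "other-key-placeholder", "gpt-oss-120b", messages
)
```

Passing either object to `urllib.request.urlopen` would send it; in practice the model name usually changes between providers as well.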


5. Designing usage limits and budgets for your team

Production teams care less about single-key usage and more about predictable cost envelopes across multiple workloads and environments. The right pattern is to use projects and per-key limits as your primary control plane.

Core concepts

  • Organization limit: Hard monthly or daily cap for the entire org.
  • Project limit: Budget per project or environment (dev vs. prod).
  • API key limit: Guardrails for specific services or agents.
  • Alerting thresholds: Email/Webhook notifications at 50%, 75%, 90%, 100% of budget.
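
One way to reason about how these levels nest is a small model in which a request must fit under every limit above it. This is a sketch of the concept only, not SambaCloud's enforcement logic; the `Limit` class and the dollar figures are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Limit:
    name: str        # e.g. "org", "project:research", "key:research-agent"
    cap_usd: float   # budget for the current period
    spent_usd: float = 0.0

    def remaining(self):
        return self.cap_usd - self.spent_usd


def first_blocking_limit(limits, cost_usd):
    """Return the first limit (narrowest first) that cannot absorb cost_usd."""
    for limit in limits:
        if cost_usd > limit.remaining():
            return limit
    return None


def record_spend(limits, cost_usd):
    """Charge cost_usd against every level, or raise if any level blocks it."""
    blocker = first_blocking_limit(limits, cost_usd)
    if blocker is not None:
        raise RuntimeError(f"blocked by {blocker.name}")
    for limit in limits:
        limit.spent_usd += cost_usd
```

Listing the limits from key to project to org means the error names the narrowest guardrail that was hit, which is usually the one the owning team can act on.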

6. Setting organization-wide usage limits and budgets

  1. Go to Billing / Usage:
    • Locate “Billing,” “Usage,” or “Cost Management” in SambaCloud.
  2. Define a monthly organization budget:
    • Choose a ceiling that reflects your contractual plan or internal budget.
    • Example: $10,000/month across all projects and keys.
  3. Set hard vs. soft caps:
    • Soft cap: Alerts at thresholds but allow overflow (good for early experiments).
    • Hard cap: Stop further requests or throttle at the cap (good for production cost certainty).
  4. Configure alerts:
    • Add distribution lists (e.g., ml-platform@company.com, finops@company.com).
    • If supported, configure webhooks to your internal observability stack or Slack.

With a hard cap, no single team can push the organization past its approved budget; with a soft cap, the alerts surface an overrun early enough to act on it.
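
The difference between soft and hard caps, and the way threshold alerts fire once each, can be sketched as follows. The thresholds are the 50/75/90/100% levels mentioned earlier; the function names are hypothetical and the logic is an assumption about how you might mirror these rules in your own tooling.

```python
ALERT_THRESHOLDS = (0.50, 0.75, 0.90, 1.00)


def newly_crossed(previous_spend, current_spend, budget, thresholds=ALERT_THRESHOLDS):
    """Return thresholds crossed between two readings, so each alert fires once."""
    return [
        t for t in thresholds
        if previous_spend < t * budget <= current_spend
    ]


def allow_request(current_spend, budget, hard_cap):
    """Soft caps always allow traffic; hard caps reject once the budget is spent."""
    if hard_cap and current_spend >= budget:
        return False
    return True
```

For a $10,000 monthly budget, a jump from $4,000 to $7,600 crosses the 50% and 75% thresholds in one step, so both alerts would be sent together.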


7. Project- and environment-level limits

Next, control spend per environment—this is where real protection against runaway dev experiments comes from.

  1. Create projects for each major workload:
    • customer-support-assistant
    • internal-code-assistant
    • research-agentic-prototypes
  2. Separate environments:
    • For each project, define dev, staging, prod.
  3. Assign budgets per project/environment:
    • Example:
      • customer-support-assistant (prod): $5,000/month
      • customer-support-assistant (dev): $500/month
      • research-agentic-prototypes: $2,000/month
  4. Map API keys to projects:
    • Keys created inside a project inherit its budget/rate constraints.
    • Enforce policy: developers must use project-scoped keys, not personal keys.

With this structure, research agents can’t accidentally consume the same budget reserved for customer-facing traffic.
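
A quick sanity check on this structure is to confirm that the project envelopes fit inside the organization budget and that every key maps to a project. The sketch below uses the example figures from this guide; the key-to-project mapping and the function names are assumptions for illustration.

```python
ORG_MONTHLY_BUDGET_USD = 10_000

PROJECT_BUDGETS_USD = {
    ("customer-support-assistant", "prod"): 5_000,
    ("customer-support-assistant", "dev"): 500,
    ("research-agentic-prototypes", "all"): 2_000,
}

# Hypothetical mapping of key names to the project/environment they belong to.
KEY_TO_PROJECT = {
    "prod-customer-facing-chat": ("customer-support-assistant", "prod"),
    "dev-agentic-workflows": ("research-agentic-prototypes", "all"),
}


def unallocated_budget(org_budget, project_budgets):
    """Return headroom left after project envelopes; negative means over-allocated."""
    return org_budget - sum(project_budgets.values())


def budget_for_key(key_name):
    """Look up the envelope a key inherits; keys without a project are rejected."""
    if key_name not in KEY_TO_PROJECT:
        raise KeyError(f"{key_name} is not mapped to a project")
    return PROJECT_BUDGETS_USD[KEY_TO_PROJECT[key_name]]
```

With these numbers, $2,500 of the organization budget stays unallocated as headroom for new projects.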


8. Per-key limits: controlling specific teams and agents

Per-key limits are where you can target specific workloads—like a DeepSeek-R1 reasoning agent that calls multiple models in a loop.

Options commonly available:

  • Rate limits: Requests per second (RPS) or tokens per second per key.
  • Daily/Monthly budget per key: Hard or soft caps expressed in currency or token quotas.
  • Model-specific caps: E.g., restrict a key from using the largest, most expensive frontier models.

Example pattern

  • Prod chat API key:
    • High RPS, moderate cost cap, access to gpt-oss-120b and Llama 3.1 70B.
  • Batch analytics key:
    • Lower RPS, higher overall token allowance, running off-peak.
  • Research agent key:
    • Strict daily token cap and restricted to a subset of models (e.g., DeepSeek-R1 and one Llama variant).

This is especially important on SambaNova because agentic workflows can execute long chains of calls across multiple models—limits prevent those chains from becoming unbounded.
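
Platform-side limits can be complemented by a client-side guard, so an agent loop stops itself before it reaches the server-side cap. The following is a sketch of a token-bucket rate limiter combined with a daily token quota; the `KeyLimiter` class is hypothetical, and resetting the daily counter is left to the caller.

```python
import time


class KeyLimiter:
    """Client-side guard for one API key: request rate plus a daily token quota."""

    def __init__(self, rps, daily_token_cap, clock=time.monotonic):
        self.rps = rps
        self.permits = float(rps)   # bucket starts full: one second of burst
        self.daily_token_cap = daily_token_cap
        self.used_today = 0         # caller resets this at the start of each day
        self.clock = clock
        self.last_refill = clock()

    def try_acquire(self, estimated_tokens):
        """Return True if a request may be sent now, False otherwise."""
        now = self.clock()
        # Refill permits in proportion to elapsed time, capped at one second's worth.
        elapsed = now - self.last_refill
        self.permits = min(float(self.rps), self.permits + elapsed * self.rps)
        self.last_refill = now
        if self.used_today + estimated_tokens > self.daily_token_cap:
            return False  # daily token quota would be exceeded
        if self.permits < 1:
            return False  # request rate exceeded
        self.permits -= 1
        self.used_today += estimated_tokens
        return True
```

An agent loop would call `try_acquire` before each model call and either sleep or abort when it returns `False`. The injectable `clock` makes the limiter easy to test without real delays.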


9. Tracking usage and preventing overages in agentic workloads

Even with limits, you need visibility to see which models and agents are driving spend.

What to monitor

  • Usage by model: DeepSeek-R1 vs. Llama vs. gpt-oss-120b.
  • Usage by API key / project: Correlate with your services and teams.
  • Tokens per request / per agent loop: Watch for prompt growth and long reasoning chains.
  • Tokens per watt / tokens per dollar (internal metric): SambaNova’s RDU stack is built to maximize these; monitoring them tells you if your prompts and sampling configs are efficient.
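
If you log one record per request, these views reduce to a few aggregations. The sketch below assumes each record carries the key name, the model, and a `usage` object in the OpenAI response shape (`total_tokens`); the record layout and function names are assumptions, not a SambaCloud export format.

```python
from collections import defaultdict


def aggregate_usage(records):
    """Sum total tokens per (api_key_name, model) pair."""
    totals = defaultdict(int)
    for rec in records:
        totals[(rec["key"], rec["model"])] += rec["usage"]["total_tokens"]
    return dict(totals)


def tokens_per_dollar(total_tokens, cost_usd):
    """Internal efficiency metric; returns None when no cost has been recorded."""
    return total_tokens / cost_usd if cost_usd > 0 else None


def growing_prompts(prompt_token_history, factor=2.0):
    """Flag an agent loop whose latest prompt is `factor` times its first prompt."""
    return (
        len(prompt_token_history) >= 2
        and prompt_token_history[-1] >= factor * prompt_token_history[0]
    )
```

`growing_prompts` is a deliberately crude signal for context that keeps accumulating across loop iterations; a real check might look at the trend rather than two endpoints.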

How to respond

  • Tighten per-key limits if a project consistently hits caps early.
  • Encourage teams to:
    • Truncate or summarize context.
    • Use smaller models (e.g., Llama 3.1 8B/70B) for simpler steps.
    • Reserve the largest models (gpt-oss-120b, DeepSeek-R1) for complex reasoning steps.

SambaStack’s ability to switch between multiple frontier-scale models on one node is what makes this effective—you can route to the right model per step without changing infrastructure.
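
Per-step routing can be as simple as a lookup table plus a budget-aware fallback. In this sketch the step kinds, the `reserve` threshold, and the model strings are placeholders; check the model identifiers your account actually exposes before using anything like it.

```python
# Hypothetical routing table. Model strings are placeholders based on the
# models named in this guide, not verified API model identifiers.
ROUTES = {
    "classify": "llama-3.1-8b",
    "summarize": "llama-3.1-70b",
    "plan": "deepseek-r1",
}
DEFAULT_MODEL = "llama-3.1-8b"


def pick_model(step_kind, remaining_tokens, reserve=10_000):
    """Route a step to its model, downgrading when the token budget runs low."""
    if remaining_tokens < reserve:
        return DEFAULT_MODEL  # budget nearly exhausted: use the smallest model
    return ROUTES.get(step_kind, DEFAULT_MODEL)
```

Keeping the table in one place makes it straightforward to review which steps are allowed to reach the most expensive models.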


10. Recommended team patterns for API key and budget management

For most platform teams, these patterns work best:

  1. Per-service keys, not per-person keys:
    • Easier to rotate, audit, and terminate.
  2. Environment isolation:
    • Distinct keys and budgets for dev vs. prod.
  3. Least-privilege model access:
    • Most services don’t need every frontier model; restrict to what’s necessary.
  4. Centralized secrets management:
    • Use Vault/Secrets Manager; avoid environment sprawl where keys are copied manually.
  5. Regular reviews:
    • Monthly: review model usage and costs; right-size budgets.
    • Quarterly: re-evaluate which workloads should move to more efficient models or architectures.

This mirrors how SambaOrchestrator manages inference across data centers—explicit control planes for scaling and routing, not ad-hoc configuration.


Features & Benefits Breakdown

Core Feature | What It Does | Primary Benefit
OpenAI-compatible APIs | Exposes SambaNova Cloud models via OpenAI-style endpoints and schemas. | Port existing apps in minutes without rewriting client code.
Project-scoped API keys | Tie keys to projects/environments with model and usage constraints. | Clear ownership, easier debugging, safer multi-team usage.
Usage limits & budgets | Enforce org, project, and key-level caps and alerts. | Predictable spend and protection against runaway agentic workloads.

Ideal Use Cases

  • Best for platform teams consolidating LLM providers: SambaNova Cloud lets you migrate from other OpenAI-compatible endpoints quickly, then put stronger usage and cost guardrails around high-throughput workloads.
  • Best for organizations scaling agentic, multi-model workflows: The underlying SambaStack can switch between multiple frontier-scale models on a single node, and API budgets and limits keep those workflows within defined resource envelopes.

Limitations & Considerations

  • Key-sharing risk: If teams share keys across services, you lose per-service visibility and control. Use per-service keys and a secrets manager to keep permissions tight.
  • Overly strict limits in early rollout: Aggressive caps can look like instability to product teams; start with soft caps plus alerts, then harden as you understand real usage patterns.

Pricing & Plans

SambaNova Cloud offers usage-based access to high-performance inference, driven by the underlying RDU architecture and SambaStack.

Typical structure includes:

  • Metered usage: You pay for the tokens or requests your workloads consume, with different effective costs depending on model size and complexity.

  • Plan tiers: Higher tiers can offer better economics for large-scale usage and enable more aggressive autoscaling and model access.

  • Growth / Team Plan: Best for teams validating SambaNova Cloud for a few production services and internal tools, needing clear per-project budgets and simple OpenAI-compatible migration.

  • Enterprise Plan: Best for organizations standardizing on SambaNova for large-scale agentic inference, needing tight integration with existing observability, strict usage controls, and multi-region/sovereign deployment options.

For precise pricing, limits, and enterprise options, contact SambaNova directly.


Frequently Asked Questions

How do I rotate or revoke a SambaNova Cloud API key?

Short Answer: Delete or regenerate the key in the SambaCloud API settings, then update your services with the new key.

Details:
In the API / Developer settings, select the key you want to rotate and choose “Regenerate” or “Delete.” Regeneration provides a new secret while retaining the key's name and scope; deletion fully revokes access. Because either action can invalidate the old secret immediately, the lowest-risk rotation is to create a replacement key first, store it in your secrets manager, redeploy your services, and only then delete the old key.
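
The ordering matters more than the tooling, so here is a sketch of a rotation that never leaves services without a valid key. `InMemoryKeyAdmin` is a stand-in for whatever key-management interface you have (dashboard steps or an API), and `rotate_key`, `verify`, and `redeploy` are hypothetical names; none of this is a SambaCloud API.

```python
import secrets


class InMemoryKeyAdmin:
    """Stand-in for a real key-management interface, for illustration only."""

    def __init__(self):
        self.active = {}
        self._next_id = 1

    def create_key(self):
        key_id = f"key-{self._next_id}"
        self._next_id += 1
        self.active[key_id] = secrets.token_hex(16)
        return key_id, self.active[key_id]

    def delete_key(self, key_id):
        del self.active[key_id]


def rotate_key(key_admin, secret_store, secret_name, old_key_id, verify, redeploy):
    """Create, verify, publish, redeploy, and only then revoke the old key."""
    new_key_id, new_secret = key_admin.create_key()
    if not verify(new_secret):
        # Leave the old key untouched and discard the unusable new one.
        key_admin.delete_key(new_key_id)
        raise RuntimeError("new key failed verification; old key left in place")
    secret_store[secret_name] = new_secret  # publish to the secrets manager
    redeploy()                              # services pick up the new secret
    key_admin.delete_key(old_key_id)        # old key is revoked last
    return new_key_id
```

Here `verify` would typically send one cheap request with the new key, and `redeploy` would trigger your normal rollout.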


Can I restrict a SambaNova Cloud API key to specific models or projects?

Short Answer: Yes, keys can be scoped to projects and limited to particular models.

Details:
When creating or editing a key, you can associate it with a project/environment and configure which models it can call (for example, only DeepSeek-R1 and Llama 3.1 70B). This is the recommended pattern for enforcing internal governance—prod services get tightly scoped keys, while R&D projects get broader access but smaller budgets.
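
You can also mirror each key's scope in your own client so that a misrouted call fails before it leaves the service. This sketch assumes you maintain the scope table yourself; the key names come from the examples in this guide, while the model strings and the `check_model_allowed` helper are placeholders.

```python
# Hypothetical local copy of each key's model scope. Model strings are
# placeholders, not verified API model identifiers.
KEY_SCOPES = {
    "prod-customer-facing-chat": {"gpt-oss-120b"},
    "dev-agentic-workflows": {"deepseek-r1", "llama-3.1-70b"},
}


class ModelNotAllowedError(PermissionError):
    """Raised when a key is asked to call a model outside its scope."""


def check_model_allowed(key_name, model, scopes=KEY_SCOPES):
    """Return the model if the key's scope includes it, otherwise raise."""
    allowed = scopes.get(key_name, set())
    if model not in allowed:
        raise ModelNotAllowedError(f"{key_name} may not call {model}")
    return model
```

The server-side scope remains the source of truth; the local check only gives a faster and clearer error during development.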


Summary

Getting an API key for SambaNova Cloud is straightforward: create or join your organization, enable API access in SambaCloud, generate project-scoped keys, and plug them into your existing OpenAI-compatible clients. The real value appears when you combine those keys with structured usage limits and budgets—org-wide caps, project-level envelopes, and per-key constraints—so your agentic, multi-model workflows can take full advantage of SambaNova’s chips-to-model inference stack without risking uncontrolled cost.

You get fast, efficient inference on frontier open-source models like DeepSeek, Llama, and gpt-oss, plus the operational confidence that comes from explicit budget and access control.

