
How do I get an API key for SambaNova Cloud and set up usage limits/budgets for my team?
Most teams land on SambaNova Cloud with one immediate question: how do we get an API key into developers’ hands quickly while still keeping spend and usage under control across the org? This guide walks you through both sides—creating your SambaNova Cloud API key and configuring usage limits and budgets so you can scale safely.
Quick Answer: You get an API key for SambaNova Cloud by signing up, creating a project or workspace, and generating a key from the account or developer settings. You then manage usage limits and budgets at the team or project level to control total spend, rate limits, and who can access which keys.
The Quick Overview
- What It Is: A way to authenticate to SambaNova Cloud’s OpenAI-compatible APIs and enforce usage controls for your organization.
- Who It Is For: Platform, infra, and application teams running production LLM inference on SambaNova Cloud and needing predictable cost and access control.
- Core Problem Solved: Fast onboarding to high-throughput inference (DeepSeek, Llama, gpt-oss, and more) without losing visibility or control over how much your team consumes.
How It Works
At a high level, SambaNova Cloud gives you OpenAI-compatible APIs on top of SambaStack, running on RDUs in SambaRack systems. You create one or more API keys associated with a project, then use standard Authorization: Bearer <API_KEY> headers in your applications. For team usage control, you configure limits and budgets at the organization, project, or key level—depending on how you want to partition spend and enforce boundaries between workloads.
Each request is metered by model and token usage. Usage is then aggregated against your plan and any budgets you define. When thresholds are hit, SambaNova Cloud can restrict further calls, throttle, or notify admins, ensuring that a single runaway agent or test script doesn’t blow your monthly budget.
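To make the metering model concrete, here is a minimal sketch of how per-request token counts could be rolled up by project and model before being compared against a budget. The record fields (`project`, `model`, `prompt_tokens`, `completion_tokens`), the project names, and the sample numbers are hypothetical illustrations, not SambaNova Cloud's actual billing schema.

```python
from collections import defaultdict


def aggregate_usage(records):
    """Sum tokens per (project, model) from per-request usage records."""
    totals = defaultdict(int)
    for rec in records:
        key = (rec["project"], rec["model"])
        totals[key] += rec["prompt_tokens"] + rec["completion_tokens"]
    return dict(totals)


# Hypothetical sample records; field names and values are assumptions.
sample_records = [
    {"project": "prod-agents", "model": "gpt-oss-120b",
     "prompt_tokens": 120, "completion_tokens": 80},
    {"project": "prod-agents", "model": "gpt-oss-120b",
     "prompt_tokens": 50, "completion_tokens": 50},
    {"project": "staging", "model": "gpt-oss-120b",
     "prompt_tokens": 10, "completion_tokens": 5},
]
usage_totals = aggregate_usage(sample_records)
```

Keeping the project in the aggregation key is what later allows a budget to be enforced per project rather than only org-wide.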
Here’s how the flow typically looks from zero to production:
- Account & Organization Setup: Create your SambaNova Cloud account, define your organization or team structure, and assign admin roles.
- API Key Creation & Integration: Generate API keys, scope them to projects or environments, and wire them into your applications via OpenAI-compatible clients.
- Usage Limits, Budgets & Monitoring: Configure per-project or per-key limits, set alerts and budget caps, and monitor real-time usage as agentic workloads scale.
1. Set up your SambaNova Cloud account and organization
Before you can generate an API key, you need a SambaNova Cloud account and an organizational context for your team.
- Sign up or sign in
  - Visit https://sambanova.ai and navigate to SambaCloud / “Start Building”.
  - Sign up with your work email, or sign in if your organization already has access.
  - Verify your email and complete any required profile details.
- Create or join an organization
  - If you’re the first in your company, create a new organization or workspace.
  - If your organization already exists:
    - Ask an existing admin to invite you.
    - Accept the invitation via email and join the correct org.
- Assign roles
  - At minimum, define:
    - Org Admins: Can manage billing, budgets, and global usage limits.
    - Project Owners/Leads: Can create project-level keys and monitor usage.
    - Developers: Can use keys but may not be allowed to create or edit them.
  - Align roles with your internal controls (e.g., security or FinOps requirements).
2. Create an API key for SambaNova Cloud
Once your org is set up, you can generate keys for your workloads.
Step-by-step: Generate your first API key
The exact UI labels may vary, but the flow generally looks like:
- Navigate to API or Developer settings
  - From the SambaNova Cloud console, go to:
    - Settings → API Keys, or
    - Developer → API Access
- Choose the context (org or project)
  - If your org uses separate projects for different apps or environments, select the project you want the key to belong to (e.g., “prod-agentic-workflows”, “staging-experiments”).
- Click “Create API Key”
  - Provide:
    - A descriptive name (e.g., prod-agent-orchestrator, ci-integration-tests).
    - Optional scope or permissions if supported (e.g., read-only logs vs. full inference).
- Generate and copy the key
  - SambaNova Cloud will show the new key once.
  - Copy it immediately and store it in a secure secret manager (e.g., HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, Kubernetes Secrets).
Best practices for key management
- One key per workload and environment
  - Example: agent-service-prod, agent-service-staging, notebook-research
- Never commit keys to source control
  - Use environment variables or secret managers, not .env files in git.
- Rotate keys regularly
  - Set a key-rotation policy (e.g., every 60–90 days) or rotate immediately if you suspect exposure.
- Restrict permissions where possible
  - For shared environments, prefer keys with limited model access or restricted rate limits.
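The practices above can be sketched in a few helpers: read the key from the environment, fail fast when it is missing, keep it out of logs, and flag keys that have aged past the rotation policy. The helper names and the 90-day default are illustrative assumptions. SambaNova Cloud is not assumed to expose a key's creation date, so `created_on` is something you would record yourself.

```python
import os
from datetime import date, timedelta


def load_api_key(env=None, var="SAMBANOVA_API_KEY"):
    """Read the key from the environment and fail fast if it is missing."""
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; inject it from your secret manager")
    return key


def mask_key(key, visible=4):
    """Return a log-safe form that shows only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


def rotation_due(created_on, today, max_age_days=90):
    """True when a key is older than the rotation policy allows."""
    return today - created_on > timedelta(days=max_age_days)
```

Logging `mask_key(key)` instead of the raw value lets you confirm which key a service loaded without exposing it.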
3. Connect with OpenAI-compatible clients
SambaNova Cloud is designed to let you port your existing OpenAI-based applications in minutes.
Python example
    import os

    import requests  # third-party: pip install requests

    # Read the key from the environment so it never lands in source control.
    API_KEY = os.environ["SAMBANOVA_API_KEY"]

    response = requests.post(
        "https://api.sambanova.ai/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "model": "gpt-oss-120b",  # or DeepSeek, Llama, etc.
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Explain tiered memory for LLM inference."},
            ],
            "max_tokens": 256,  # cap the completion length to bound cost
        },
        timeout=30,
    )
    response.raise_for_status()  # raise on HTTP errors instead of printing an error body
    print(response.json())
If you’re using an OpenAI SDK, you typically just change the base URL and the API key environment variable, while keeping most of your code unchanged.
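A minimal sketch of that swap, assuming the official `openai` Python package (version 1.x) is installed. The base URL `https://api.sambanova.ai/v1` is inferred from the endpoint in the example above, not separately confirmed, and the helper names are hypothetical.

```python
import os

# Base URL inferred from the chat/completions endpoint used earlier (an assumption).
SAMBANOVA_BASE_URL = "https://api.sambanova.ai/v1"


def client_settings(env=None):
    """Collect the two values that differ from a stock OpenAI setup."""
    env = os.environ if env is None else env
    return {"base_url": SAMBANOVA_BASE_URL, "api_key": env["SAMBANOVA_API_KEY"]}


def ask(prompt, model="gpt-oss-120b"):
    """Send one chat request through the OpenAI SDK pointed at SambaNova Cloud."""
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI(**client_settings())
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return completion.choices[0].message.content
```

With `SAMBANOVA_API_KEY` exported, `ask("Explain tiered memory for LLM inference.")` should behave like the `requests` version while leaving the rest of an OpenAI-based codebase untouched.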
4. Configure usage limits and budgets for your team
Once keys are in place, the next priority is preventing surprise bills while still enabling experimentation.
Types of limits you may configure
Depending on your plan and org settings, you’ll typically manage:
- Org-level budgets: Monthly or quarterly caps on total usage or spend.
- Project-level quotas: Token or spend limits per project (e.g., “R&D”, “Customer Support Agents”).
- API-key-level caps: Hard or soft limits for specific keys used by a microservice or team.
- Rate limits: Requests-per-minute or tokens-per-second ceilings to protect services and control burst behavior.
- Alert thresholds: Notifications at specific utilization points (e.g., 50%, 75%, 90% of budget).
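Server-side rate limits are easier to live with when clients also pace themselves. The sketch below is a generic client-side token bucket, not a SambaNova Cloud feature. The class name and parameters are hypothetical, and the injectable `clock` exists only to make the behavior testable.

```python
import time


class TokenBucket:
    """Allow `rate_per_minute` requests per minute, with bursts up to `capacity`."""

    def __init__(self, rate_per_minute, capacity, clock=time.monotonic):
        self.refill_per_sec = rate_per_minute / 60.0
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Return True and spend one token if a request may be sent now."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Setting the client-side rate slightly below the configured server-side ceiling keeps a bursty service from tripping the limit in the first place.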
Step-by-step: Set budgets and limits
- Go to Billing or Usage settings
  - From your org console, open Billing & Usage or Usage Controls.
- Define an organization-wide budget
  - Set a billing period (monthly is typical).
  - Choose a budget metric:
    - Currency-based (e.g., $ budget).
    - Usage-based (e.g., total tokens or hours).
  - Create alert thresholds:
    - 50%: Notify admins.
    - 80%: Notify admins + project owners.
    - 95–100%: Consider automated restriction (e.g., block new non-prod keys).
- Create project-level quotas
  - For each project:
    - Define a monthly token or spend limit.
    - Optionally set hard caps (requests stop when exceeded) vs. soft caps (alerts only).
  - Align these with your internal budget allocations (e.g., “Agents team gets 30% of capacity, R&D gets 20%”).
- Configure per-key limits where needed
  - For keys used by:
    - Hackathons
    - External partners
    - CI/testing
  - Set conservative caps (e.g., low daily token limits) to avoid spikes.
- Enable alerts and reporting
  - Add notification channels (email, Slack, etc. if supported).
  - Assign who receives alerts:
    - Org Admins
    - Project owners
    - FinOps or platform teams
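The alert tiers and the hard/soft cap distinction above can be expressed as a small decision function. This is a sketch of the policy logic only, useful for an internal dashboard or a dry run. The action and recipient names are hypothetical, and it does not call any SambaNova Cloud API.

```python
def evaluate_budget(spent, budget, hard_cap=False):
    """Map budget utilization to the 50% / 80% / 95% tiers described above."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    utilization = spent / budget
    everyone = ["org_admins", "project_owners"]
    if utilization >= 1.0 and hard_cap:
        # Hard cap: requests stop once the budget is exhausted.
        return {"action": "block", "notify": everyone}
    if utilization >= 0.95:
        return {"action": "restrict_non_prod", "notify": everyone}
    if utilization >= 0.80:
        return {"action": "alert", "notify": everyone}
    if utilization >= 0.50:
        return {"action": "alert", "notify": ["org_admins"]}
    return {"action": "none", "notify": []}
```

For example, `evaluate_budget(96, 100)` returns the `restrict_non_prod` tier, while the same spend against a soft cap never escalates to `block`.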
5. Monitor usage and optimize spend
Usage limits are only effective if you can see how close you are to hitting them.
What to monitor
Typical dashboards in SambaNova Cloud or your own observability stack should track:
- Tokens used over time: By organization, project, and API key.
- Model breakdown: DeepSeek vs. Llama vs. gpt-oss-120b and other models.
- Latency and throughput: Tokens/sec, especially for agentic workflows with multiple model calls.
- Error rates and throttling: Signs of limits being hit or exceeded.
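If you collect these signals in your own observability stack, throughput and throttling share can be derived from per-call records. The sketch assumes you log `completion_tokens`, `latency_s`, and the HTTP `status` for each call (hypothetical field names), and that throttled calls surface as HTTP 429, which is the conventional status code but is not confirmed here for SambaNova Cloud.

```python
def summarize_calls(calls):
    """Compute output throughput and throttling share from per-call records."""
    total_tokens = sum(c["completion_tokens"] for c in calls)
    total_seconds = sum(c["latency_s"] for c in calls)
    throttled = sum(1 for c in calls if c["status"] == 429)
    return {
        "tokens_per_sec": total_tokens / total_seconds if total_seconds else 0.0,
        "throttle_rate": throttled / len(calls) if calls else 0.0,
    }
```

A rising `throttle_rate` alongside flat `tokens_per_sec` usually means a limit is binding rather than the model being slow.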
Optimization strategies for agentic workloads
Because SambaNova’s stack is designed for agentic inference and multi-model workflows, you can often optimize both performance and cost by:
- Right-sizing models per step: Use a smaller Llama variant or distilled model for simple classification steps, and a larger gpt-oss or DeepSeek model only for complex reasoning.
- Capping context growth: Set sensible max_tokens and context windows for long-running agents to prevent unbounded prompt growth.
- Batching requests where feasible: For high-throughput inference, batch similar requests to maximize tokens-per-watt and keep RDUs fully utilized.
- Measuring real usage per workflow: Attribute consumption back to specific agents or pipelines so you can enforce budgets at the correct level (e.g., “Customer Support Agent v2” vs. generic “prod”).
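Two of these strategies are simple to sketch: routing each agent step to a model tier, and trimming chat history so prompts stop growing. Both helpers are hypothetical. The routing table is supplied by the caller because the model IDs available to you depend on your plan.

```python
def pick_model(step, routing, default):
    """Choose a model for an agent step, falling back to a default."""
    return routing.get(step, default)


def trim_history(messages, max_messages=8):
    """Keep system messages plus the most recent turns to bound prompt size."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-max_messages:] if max_messages > 0 else []
    return system + recent
```

Calling `trim_history` before every request in a long-running agent keeps input tokens per call roughly constant, which makes per-workflow budgets far easier to predict.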
Features & Benefits Breakdown
| Core Feature | What It Does | Primary Benefit |
|---|---|---|
| OpenAI-compatible API keys | Authenticate to SambaNova Cloud using familiar OpenAI-style APIs | Port your applications in minutes without rewriting clients |
| Org & project-level budgets | Set spend or usage caps for teams, projects, and environments | Prevent runaway costs while enabling experimentation |
| Per-key limits and rate controls | Restrict usage and requests for individual keys or services | Contain impact of misconfigured agents or scripts |
| Usage monitoring & reporting | Track tokens, models, latency, and errors across your org | Optimize model choices and workloads based on real data |
| Full-stack inference performance | Leverage RDUs, SambaRack, and SambaStack for high-throughput inference | Achieve frontier-scale agentic workflows at efficient cost |
Ideal Use Cases
- Best for platform teams standardizing LLM access: Because it provides a single, controlled way to issue SambaNova Cloud API keys, monitor usage, and enforce budgets across multiple application teams.
- Best for production agentic workflows: Because it combines high-throughput inference on models like DeepSeek and gpt-oss with granular usage controls so you can safely run multi-step, multi-model agents without overspending.
Limitations & Considerations
- UI and controls vary by plan: Some advanced budget or per-key features may be available only on specific SambaNova Cloud plans; check with your SambaNova representative if you need fine-grained control.
- Limits vs. SLAs: Usage caps can prevent overspend but don’t replace capacity planning—coordinate rate limits and quotas with expected production workloads so agents don’t stall at critical moments.
Pricing & Plans
SambaNova Cloud is built to scale from initial experiments to high-throughput production, with pricing tied to model usage and performance characteristics.
- Team/Starter Plans: Best for smaller groups or new projects needing fast access to SambaNova Cloud models with simple org-wide limits and straightforward spending controls.
- Enterprise Plans: Best for organizations needing granular project and key-level budgets, advanced monitoring, and integration with existing observability and FinOps practices across data centers.
For specific pricing, discounts, and which usage-control features are available on each plan, contact SambaNova directly.
Frequently Asked Questions
How do I safely share a SambaNova Cloud API key with my team?
Short Answer: Don’t share raw keys; store them in a central secret manager and expose them to services via environment variables or runtime configuration.
Details:
Instead of sending API keys over chat or email, use your organization’s standard secret management solution. Assign one key per service or environment (e.g., agent-prod, agent-staging) and load it into your deployment system (Kubernetes, serverless, VMs) via encrypted secrets. Give developers access to run workloads, not to view the underlying key. Combine this with per-key usage limits and budgets so a single compromised service doesn’t impact your entire spend.
What happens when my team hits a usage limit or budget cap?
Short Answer: Depending on how you configure it, SambaNova Cloud can either send alerts, throttle traffic, or block additional usage until the budget is adjusted or the period resets.
Details:
When usage approaches your defined thresholds, alerts are triggered so admins can respond. If you’ve set soft limits, workloads continue but you gain early warning to adjust policies. With hard caps, additional requests may fail with an error indicating that the limit has been reached. This is useful for non-critical workloads like experiments but may be too aggressive for customer-facing systems; in those cases, consider higher limits plus alerting. Always test your application behavior when limits are exceeded—graceful degradation is better than hard failure for production agents.
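One way to get graceful degradation is to retry with exponential backoff and then fall back to a cheaper path. The exact error SambaNova Cloud returns at a limit is not specified here, so `LimitReached` is a hypothetical exception that your own request function would raise on detecting that response.

```python
import time


class LimitReached(Exception):
    """Raised by the caller's request function when a limit response arrives."""


def call_with_backoff(send, retries=3, base_delay=1.0, sleep=time.sleep, fallback=None):
    """Retry `send` with exponential backoff, then degrade to `fallback`."""
    for attempt in range(retries):
        try:
            return send()
        except LimitReached:
            # Wait 1x, 2x, 4x ... the base delay, but not after the final attempt.
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    return fallback
```

Backoff helps with short-lived throttling. A hard budget cap will not clear on retry, so the `fallback` value (a cached answer or a "try again later" message) is what keeps a customer-facing agent from failing outright.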
Summary
Getting an API key for SambaNova Cloud and setting up usage limits/budgets for your team is a two-step process: onboard your organization and generate scoped keys, then enforce clear quotas and monitoring around them. Because SambaNova Cloud uses OpenAI-compatible APIs, you can migrate existing code quickly, then lean on SambaNova’s inference stack—RDUs, SambaRack, SambaStack, and SambaOrchestrator—to run fast, efficient agentic workflows. With org, project, and key-level controls in place, you gain the ability to scale your LLM usage without losing financial or operational control.