
BerriAI / LiteLLM: how do we set up internal API keys for each team/project and restrict which models they can use?
If you’re rolling out BerriAI / LiteLLM across multiple teams or projects, one of the first governance questions you’ll hit is: how do we create internal API keys per team, lock those keys to specific models, and keep usage separate and auditable? The good news is that LiteLLM is designed to be a central “router” where you can control model access, rate limits, and billing at a fine‑grained level.
Below is a practical walkthrough of how to set up internal API keys for each team/project and restrict which models they can use, while keeping the configuration maintainable as you scale.
Core concepts in LiteLLM for internal access control
Before diving into steps, it helps to understand the building blocks LiteLLM gives you:
- Model aliases / proxy models
  You define named models (like gpt-4-teams, anthropic-teamA, etc.) that map to actual provider models (gpt-4o, claude-3-opus, etc.) and carry defaults such as temperature, max tokens, tags, etc.
- Proxy API key management
  LiteLLM issues its own internal API keys (often called proxy keys). Your teams use these keys to call the LiteLLM endpoint instead of provider keys.
- Key-level restrictions
  On each proxy key you can:
  - Allow / deny specific models
  - Set rate limits and budgets
  - Attach metadata like team, project, environment
- Centralized logging, metrics, and billing
  Because calls pass through LiteLLM, you can track usage per key, team, or project and enforce budgets centrally.
This design lets you safely expose a single internal endpoint (e.g. https://llm.company.com) while controlling what each team can do via their unique LiteLLM API key.
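To make these building blocks concrete, here is a minimal Python sketch of the idea, not LiteLLM's actual implementation. The names MODEL_ALIASES, ProxyKey, and resolve_model are hypothetical and exist only for illustration: an alias catalog maps internal names to provider models, and each proxy key carries its own allowlist and metadata.

```python
from dataclasses import dataclass, field

# Hypothetical alias catalog: internal alias -> upstream provider model.
MODEL_ALIASES = {
    "gpt4_standard": "openai/gpt-4o",
    "gpt4_limited": "openai/gpt-4o-mini",
}


@dataclass(frozen=True)
class ProxyKey:
    """Illustrative stand-in for an internal proxy key record."""
    token: str
    allowed_models: frozenset
    metadata: dict = field(default_factory=dict)


def resolve_model(key: ProxyKey, alias: str) -> str:
    """Return the provider model for an alias, or raise if the key may not use it."""
    if alias not in MODEL_ALIASES:
        raise KeyError(f"unknown model alias: {alias}")
    if alias not in key.allowed_models:
        raise PermissionError(f"key is not allowed to use {alias}")
    return MODEL_ALIASES[alias]


marketing = ProxyKey(
    token="sk-example-marketing",  # placeholder, not a real key
    allowed_models=frozenset({"gpt4_limited"}),
    metadata={"team": "marketing", "env": "prod"},
)
print(resolve_model(marketing, "gpt4_limited"))
```

The two checks are kept separate on purpose: an unknown alias is a configuration problem, while a disallowed alias is a policy decision, and it helps to report them differently.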
Step 1: Configure your provider and model aliases
Start by defining which upstream models you’ll make available through LiteLLM and giving them descriptive aliases. You can configure this via YAML or environment variables; here’s a YAML example (litellm_config.yaml):
model_list:
  - model_name: gpt4_standard
    litellm_params:
      model: openai/gpt-4o
      # os.environ/<NAME> is LiteLLM's config syntax for reading an environment variable
      api_key: os.environ/OPENAI_API_KEY
      rpm: 60
      tpm: 60000
  - model_name: gpt4_limited
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
      rpm: 200
      tpm: 120000
  - model_name: claude_haiku
    litellm_params:
      model: anthropic/claude-3-haiku-20240307
      api_key: os.environ/ANTHROPIC_API_KEY
      rpm: 100
  - model_name: internal-embeddings
    litellm_params:
      model: openai/text-embedding-3-small
      api_key: os.environ/OPENAI_API_KEY
These model_name values (e.g. gpt4_standard, gpt4_limited) are what your internal teams will request when they call LiteLLM, not the raw provider names.
Make sure your LiteLLM server is configured to load this file:
litellm --config ./litellm_config.yaml --port 4000
Step 2: Enable LiteLLM proxy key management
To create and manage internal API keys per team/project, you run the LiteLLM proxy with a database that stores keys and usage.
At a high level:
- Provision a Postgres database, which is what the LiteLLM proxy uses to store keys.
- Point LiteLLM at it via an environment variable or the config file.
- Set an admin ("master") key, which authorizes the key-management endpoints.
Example environment variables:
export DATABASE_URL=postgresql://user:password@host:5432/litellm
export LITELLM_MASTER_KEY=sk-super-secure-admin-key
When the server starts, it will:
- Initialize tables for keys, usage, and spend tracking
- Expose management endpoints (protected by the master key) to create, update, and delete internal API keys
Keep the master key strictly internal (e.g. only accessible by the DevOps or platform team) and use it to programmatically manage team/project keys. Variable and endpoint names can change between releases, so confirm them against the docs for the version you deploy.
Step 3: Create internal API keys for each team or project
Now you can mint a LiteLLM proxy API key for each team or project. This is not a provider key; it’s an internal key used only against your LiteLLM endpoint.
Assume your LiteLLM server runs at https://llm.company.com. You can create keys via a management API call:
curl -X POST "https://llm.company.com/key/generate" \
  -H "Authorization: Bearer sk-super-secure-admin-key" \
  -H "Content-Type: application/json" \
  -d '{
    "key_alias": "marketing-website-ai-assistant-prod",
    "metadata": {
      "team": "marketing",
      "project": "website-ai-assistant",
      "owner": "marketing-team",
      "env": "prod"
    }
  }'
The response contains the generated key in its key field (referred to in this guide as sk-litellm-marketing-...). This is the key you share with that team for their application.
You can treat team and project as logical groupings and later query usage by these metadata fields.
Repeat this for other teams/projects, such as:
- engineering / internal-dev-tools
- sales / pitch-deck-copilot
- data-science / experimentation
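When there are more than a handful of teams, it is easier to generate the key-creation requests from a list than to hand-write each call. The sketch below only builds the requests and does not send them. It assumes the LiteLLM proxy's /key/generate endpoint with key_alias and metadata fields; treat those names, the ADMIN_KEY placeholder, and the build_key_request helper as assumptions to check against your deployment.

```python
import json
import urllib.request

BASE_URL = "https://llm.company.com"  # internal endpoint used in this guide
ADMIN_KEY = "sk-admin-placeholder"    # placeholder; load the real admin key from a secret store


def build_key_request(team: str, project: str, env: str = "prod") -> urllib.request.Request:
    """Build (but do not send) a key-creation request for one team/project."""
    payload = {
        "key_alias": f"{team}-{project}-{env}",  # assumed field name
        "metadata": {"team": team, "project": project, "env": env},
    }
    return urllib.request.Request(
        f"{BASE_URL}/key/generate",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {ADMIN_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


TEAMS = [
    ("engineering", "internal-dev-tools"),
    ("sales", "pitch-deck-copilot"),
    ("data-science", "experimentation"),
]

requests_to_send = [build_key_request(team, project) for team, project in TEAMS]
for req in requests_to_send:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

To actually create the keys, each request would be passed to urllib.request.urlopen and the returned key stored somewhere the owning team can retrieve it.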
Step 4: Restrict which models each internal key can use
The central feature you want is to limit each team’s internal key to a defined set of models. LiteLLM supports a per-key allowlist of models.
When creating or updating a key, pass the allowlist in the models field (field names can vary by version). Example: restrict Marketing to cheaper models only and cap its spend at $100 per 30 days.
curl -X POST "https://llm.company.com/key/generate" \
  -H "Authorization: Bearer sk-super-secure-admin-key" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      "gpt4_limited",
      "internal-embeddings"
    ],
    "max_budget": 100,
    "budget_duration": "30d",
    "metadata": {
      "team": "marketing",
      "project": "website-ai-assistant",
      "owner": "marketing-team",
      "env": "prod"
    }
  }'
For an engineering R&D team that can use more expensive models:
curl -X POST "https://llm.company.com/key/generate" \
  -H "Authorization: Bearer sk-super-secure-admin-key" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      "gpt4_standard",
      "gpt4_limited",
      "internal-embeddings",
      "claude_haiku"
    ],
    "metadata": {
      "team": "engineering",
      "project": "internal-dev-tools",
      "owner": "platform-eng",
      "env": "prod"
    }
  }'
When a team calls LiteLLM with a model outside its key’s allowlist, the request is rejected before it reaches any provider. This enforces your internal policy centrally.
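A typo in an allowlist fails silently: the key is created, but the team can never call the misspelled model. One way to catch that early is to validate every team's policy against your alias catalog before minting keys. The sketch below is a plain-Python pre-flight check; CATALOG, TEAM_POLICIES, and both helper functions are hypothetical names, with the alias values taken from the configuration in Step 1.

```python
# Aliases defined in litellm_config.yaml (Step 1).
CATALOG = {"gpt4_standard", "gpt4_limited", "claude_haiku", "internal-embeddings"}

# Hypothetical policy table: team -> aliases its key may call.
TEAM_POLICIES = {
    "marketing": ["gpt4_limited", "internal-embeddings"],
    "engineering": [
        "gpt4_standard",
        "gpt4_limited",
        "internal-embeddings",
        "claude_haiku",
    ],
}


def find_unknown_models(policies: dict, catalog: set) -> dict:
    """Return {team: [aliases missing from the catalog]} for teams with bad entries."""
    problems = {}
    for team, models in policies.items():
        unknown = sorted(set(models) - catalog)
        if unknown:
            problems[team] = unknown
    return problems


def is_allowed(policies: dict, team: str, model: str) -> bool:
    """Mirror the allowlist decision the proxy makes for a team's key."""
    return model in policies.get(team, [])


print(find_unknown_models(TEAM_POLICIES, CATALOG))
print(is_allowed(TEAM_POLICIES, "marketing", "gpt4_standard"))
```

Running a check like this in CI against the policy file keeps the catalog and the allowlists from drifting apart.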
Step 5: How teams call BerriAI / LiteLLM with their internal API keys
From the teams’ perspective, they simply:
- Use the LiteLLM endpoint instead of provider endpoints
- Use their internal proxy API key in the Authorization header
- Use the alias model name you configured (e.g. gpt4_limited, not gpt-4o-mini)
For example, using the OpenAI-compatible chat API:
curl -X POST "https://llm.company.com/v1/chat/completions" \
  -H "Authorization: Bearer sk-litellm-marketing-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt4_limited",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Draft a homepage hero for our new product."}
    ]
  }'
If Marketing tries to call:
{
"model": "gpt4_standard",
...
}
LiteLLM will reject it because gpt4_standard is not in the allowlist attached to Marketing’s key. This is exactly how you restrict which models each team can use.
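The same call can be made from application code. The sketch below uses only the Python standard library to build the request against the OpenAI-compatible chat endpoint shown above. The key value is a placeholder and build_chat_request is a hypothetical helper; the request is constructed but not sent.

```python
import json
import urllib.request


def build_chat_request(base_url: str, proxy_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build a chat completion request addressed to the LiteLLM endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {proxy_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "https://llm.company.com",
    "sk-example-marketing-key",  # placeholder for the team's proxy key
    "gpt4_limited",              # alias from the config, not the provider model name
    [{"role": "user", "content": "Draft a homepage hero for our new product."}],
)
print(req.full_url)

# To send it:
#   with urllib.request.urlopen(req) as resp:
#       reply = json.load(resp)
# A rejected model surfaces as urllib.error.HTTPError; the exact status code
# and error body depend on the LiteLLM version.
```

Because the endpoint is OpenAI-compatible, teams that already use an OpenAI client library can usually keep it and change only the base URL, key, and model name.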
Step 6: Add rate limits, budgets, and environment separation
Beyond basic model restrictions, you’ll usually want to differentiate:
- Prod vs staging keys
  Give each project at least two keys: one for env: prod, one for env: staging. Apply stricter limits to staging.
- Rate limits per team/project
  Use per-key RPM/TPM limits or per-month cost caps.
Example management call with rate and spend limits:
curl -X POST "https://llm.company.com/key/generate" \
  -H "Authorization: Bearer sk-super-secure-admin-key" \
  -H "Content-Type: application/json" \
  -d '{
    "models": ["gpt4_limited", "internal-embeddings"],
    "metadata": {
      "team": "sales",
      "project": "pitch-deck-copilot",
      "env": "prod"
    },
    "rpm_limit": 60,
    "tpm_limit": 60000,
    "max_budget": 200,
    "budget_duration": "30d"
  }'
LiteLLM will enforce these limits per key, helping you avoid runaway usage and maintain predictable costs.
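To build intuition for how a per-key limit behaves, here is a small simulation of a fixed one-minute request window combined with a spend cap. It is an illustrative model, not LiteLLM's enforcement code: KeyLimits, KeyUsage, and check_and_record are hypothetical names, and the monthly budget reset is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class KeyLimits:
    requests_per_minute: int
    monthly_spend_usd: float


@dataclass
class KeyUsage:
    window_start: float = 0.0   # start of the current one-minute window (seconds)
    requests_in_window: int = 0
    spend_usd: float = 0.0


def check_and_record(limits: KeyLimits, usage: KeyUsage, now: float, cost_usd: float):
    """Return (allowed, reason) and record the request if it is allowed."""
    # Start a fresh window once a minute has passed.
    if now - usage.window_start >= 60:
        usage.window_start = now
        usage.requests_in_window = 0
    if usage.requests_in_window >= limits.requests_per_minute:
        return False, "rate limit exceeded"
    if usage.spend_usd + cost_usd > limits.monthly_spend_usd:
        return False, "budget exceeded"
    usage.requests_in_window += 1
    usage.spend_usd += cost_usd
    return True, "ok"


limits = KeyLimits(requests_per_minute=2, monthly_spend_usd=1.0)
usage = KeyUsage()
for t in (100.0, 101.0, 102.0, 161.0):
    print(t, check_and_record(limits, usage, now=t, cost_usd=0.4))
```

In this run the third request is refused by the rate limit, and the fourth, which arrives in a new window, is refused by the budget. That is why it helps to log the rejection reason separately for each kind of limit.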
Step 7: Monitor and audit usage by team and project
Once teams are using their internal API keys, LiteLLM’s central logging gives you:
- Total calls and tokens per key
- Usage aggregated by team or project
- Which models each team uses most
- Error rates (e.g. model not allowed, limits exceeded)
Most setups query the LiteLLM database or use built‑in dashboards (depending on your deployment) to:
- Alert on unusual spikes
- Show monthly cost breakdown by team
- Tune rate limits or allowed model sets as patterns emerge
For internal governance, this data is also useful when deciding which models to prioritize, deprecate, or upgrade.
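If you export usage records from the LiteLLM database, a per-team or per-model breakdown is a short aggregation. The sketch below works on an in-memory list with a made-up record shape (team, model, tokens, cost_usd); the real column names depend on your LiteLLM version and how you export the data, so treat USAGE_LOG and summarize_by as illustrative.

```python
from collections import defaultdict

# Made-up sample records; real exports will have different field names.
USAGE_LOG = [
    {"team": "marketing", "model": "gpt4_limited", "tokens": 1200, "cost_usd": 0.02},
    {"team": "marketing", "model": "internal-embeddings", "tokens": 800, "cost_usd": 0.01},
    {"team": "engineering", "model": "gpt4_standard", "tokens": 3000, "cost_usd": 0.30},
    {"team": "engineering", "model": "gpt4_standard", "tokens": 1000, "cost_usd": 0.10},
]


def summarize_by(records: list, field_name: str) -> dict:
    """Aggregate call count, tokens, and cost by the given record field."""
    totals = defaultdict(lambda: {"calls": 0, "tokens": 0, "cost_usd": 0.0})
    for record in records:
        bucket = totals[record[field_name]]
        bucket["calls"] += 1
        bucket["tokens"] += record["tokens"]
        bucket["cost_usd"] += record["cost_usd"]
    return dict(totals)


for team, stats in summarize_by(USAGE_LOG, "team").items():
    print(team, stats)
```

Passing "model" instead of "team" gives the per-model view that is useful when deciding which aliases to deprecate.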
Step 8: Evolving your model restrictions over time
Your initial allowlist policy won’t be perfect. You’ll likely evolve it based on:
- Budget constraints
- New model launches (e.g. upgrading from gpt4_limited to a new cheaper, better model)
- Security and compliance requirements
To update a team’s allowed models, call the management API with an update that names the key and its new allowlist:
curl -X POST "https://llm.company.com/key/update" \
  -H "Authorization: Bearer sk-super-secure-admin-key" \
  -H "Content-Type: application/json" \
  -d '{
    "key": "sk-litellm-engineering-...",
    "models": [
      "gpt4_standard",
      "gpt4_limited",
      "claude_haiku",
      "internal-embeddings"
    ]
  }'
Because the key stays the same, the team doesn’t need to rotate credentials; your change takes effect at the LiteLLM layer.
A recommended practice:
- Keep a “catalog” of internal models (your aliases) that encode costs and policies (e.g. gpt4_premium, gpt4_value, fast_qa, cheap_embeddings).
- Attach or remove these catalog models from each team key based on policy, without exposing raw provider models.
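Before pushing an allowlist change, it is worth computing exactly what will be added and removed so the change can be reviewed. A minimal sketch, where plan_allowlist_change is a hypothetical helper and the model names come from the examples above:

```python
def plan_allowlist_change(current: list, desired: list) -> dict:
    """Describe the difference between a key's current and desired allowlists."""
    current_set, desired_set = set(current), set(desired)
    return {
        "add": sorted(desired_set - current_set),
        "remove": sorted(current_set - desired_set),
        "final": sorted(desired_set),
    }


plan = plan_allowlist_change(
    current=["gpt4_standard", "gpt4_limited", "internal-embeddings"],
    desired=["gpt4_limited", "internal-embeddings", "claude_haiku"],
)
print(plan)
```

The "final" list is what would go into the update call, while "add" and "remove" are what a reviewer needs to see. Removals deserve the most attention because they can break a running application.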
Step 9: Security and best practices for internal API key management
To keep your BerriAI / LiteLLM setup robust:
- Never share provider keys with teams
  Only LiteLLM should hold provider API keys. Teams only see proxy keys.
- Rotate internal keys periodically
  Use the management API to issue new keys and retire old ones; communicate rotations to teams.
- Scope permissions tightly
  Default to the minimum set of models a team needs. Grant more only with explicit approval.
- Segment by environment
  Production keys should be separate from staging/dev, enforced via metadata or naming.
- Use RBAC around the management API
  Only platform/infra engineers should have the admin key or access to the management UI.
- Integrate with SSO / internal tooling
  Optionally wrap LiteLLM’s management endpoints behind your SSO or internal console for easier governed use.
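Periodic rotation is easier to enforce when something flags old keys automatically. The sketch below assumes you keep an inventory of keys with an alias and a creation timestamp; that record shape, the 90-day default, and the keys_due_for_rotation helper are all hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def keys_due_for_rotation(keys: list, now: datetime, max_age_days: int = 90) -> list:
    """Return aliases of keys created at or before the rotation cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    return [k["alias"] for k in keys if k["created_at"] <= cutoff]


# Made-up inventory; in practice this would come from your key store.
KEY_INVENTORY = [
    {"alias": "marketing-prod", "created_at": datetime(2025, 1, 1, tzinfo=timezone.utc)},
    {"alias": "engineering-prod", "created_at": datetime(2025, 5, 1, tzinfo=timezone.utc)},
]

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(keys_due_for_rotation(KEY_INVENTORY, now))
```

Using timezone-aware timestamps throughout avoids comparison errors between naive and aware datetimes when the inventory is assembled from different systems.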
Putting it all together
To recap how to set up internal API keys for each team/project and restrict models in BerriAI / LiteLLM:
- Define model aliases in litellm_config.yaml that map to your provider models.
- Run LiteLLM with a proxy DB and a protected admin key.
- Create a proxy API key per team/project, including descriptive metadata.
- Use a per-key model allowlist (and optional rate/budget limits) to restrict each key’s model access.
- Have teams call LiteLLM with their internal key and the allowed alias models only.
- Monitor usage per key/team, and adjust allowed models and limits as needed.
- Apply security best practices around key rotation, environment separation, and RBAC.
With this pattern, LiteLLM acts as a single, controlled gateway for all teams, while still letting them move fast with autonomous API access, and giving you central governance over which models they can use and how.