BerriAI / LiteLLM: how do we set up internal API keys for each team/project and restrict which models they can use?
LLM Gateway & Routing


For teams using BerriAI with LiteLLM in production, one of the most important governance questions is how to set up internal API keys per team or project, and how to restrict which models they can access. Done well, this lets you centralize model configuration, cost controls, and compliance rules, while still giving each team a simple, consistent endpoint to call.

This guide walks through practical patterns to:

  • Issue internal API keys per team or project
  • Enforce per-key model allowlists / blocklists
  • Apply rate limits, budgets, and logging
  • Keep your configuration maintainable as usage grows

Throughout, the focus is on using LiteLLM as a central router/proxy and layering your own access-control logic on top, whether directly in LiteLLM, in BerriAI’s orchestration, or via an API gateway.


Core architecture: central router + per-team internal keys

In a typical BerriAI / LiteLLM setup, you have:

  • LiteLLM acting as a proxy/router in front of multiple model providers (OpenAI, Anthropic, Azure, etc.)
  • BerriAI orchestrating flows, agents, and tools that call LiteLLM’s unified API
  • Your internal services / apps calling BerriAI or LiteLLM with an internal API key, not raw provider keys

The pattern you want:

  1. Central provider keys

    • Store OpenAI / Anthropic / other vendor keys centrally (e.g., in environment variables or a secrets manager).
    • These never go directly to team apps.
  2. Internal API keys per team/project

    • Each team/project gets its own key, issued by you.
    • That key is used to authenticate to your LiteLLM/BerriAI endpoint.
  3. Policy per key

    • For each internal key, define:
      • Which models are allowed
      • Limits (concurrency / TPM / RPM / total budget)
      • Optional: logging granularity, redaction, etc.
  4. Routing enforcement

    • When a request hits LiteLLM, you:
      • Authenticate the internal key
      • Look up the key’s policy
      • Reject or rewrite requests that don’t comply

This can be implemented using:

  • LiteLLM’s built-in auth and model aliasing
  • A reverse proxy / API gateway (e.g., NGINX, Kong, Envoy, API Gateway)
  • Or a lightweight custom middleware around LiteLLM

Step 1: Define internal API keys per team or project

First, decide where to manage keys:

  • Option A: Custom key management layer (recommended)
    A small internal service or database table such as:

    internal_api_keys
    ├─ id (uuid)
    ├─ key_hash
    ├─ team_name
    ├─ project_name
    ├─ allowed_models (json/array)
    ├─ rate_limit_tpm
    ├─ rate_limit_rpm
    ├─ monthly_budget_usd
    ├─ created_at
    └─ expired_at
    

    Your middleware checks the incoming key against this table, then forwards the request to LiteLLM only if allowed.

  • Option B: API gateway
    Use a gateway that:

    • Stores API keys
    • Adds headers like X-Team, X-Project, X-Allowed-Models to requests
    • Enforces rate limits, auth, and quotas upstream of LiteLLM
  • Option C: LiteLLM configs plus a simple key list
    If you want to keep it minimal, maintain a config file or environment variable mapping keys to policies and load that into your LiteLLM wrapper.
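For Option A, the `internal_api_keys` table sketched above could be created like this; SQLite is used here purely for illustration (swap in your real database and adjust column types accordingly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a real, persistent database in production
conn.execute("""
    CREATE TABLE internal_api_keys (
        id TEXT PRIMARY KEY,            -- uuid
        key_hash TEXT NOT NULL UNIQUE,  -- e.g. SHA-256 of the raw key; never store the raw key
        team_name TEXT NOT NULL,
        project_name TEXT NOT NULL,
        allowed_models TEXT NOT NULL,   -- JSON array of internal alias names
        rate_limit_tpm INTEGER,
        rate_limit_rpm INTEGER,
        monthly_budget_usd REAL,
        created_at TEXT NOT NULL,
        expired_at TEXT                 -- NULL while the key is active
    )
""")
```

The `UNIQUE` constraint on `key_hash` lets the middleware look up a policy with a single indexed query per request.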

Generating keys

Use long, random tokens, for example:

  • 32–64 bytes of cryptographically secure random data (Base64 or hex-encoded)
  • Prefix with something like ltl_int_ for clarity, e.g.:
ltl_int_5c2b2d3b2e4f4a...

Store only the hash server-side (e.g., SHA-256) and compare hashes on each request.
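A minimal sketch of generation and hash-based verification, using Python's standard library (the `ltl_int_` prefix follows the convention above; the function names are illustrative):

```python
import hashlib
import secrets

def generate_internal_key(prefix: str = "ltl_int_") -> tuple[str, str]:
    """Return (raw_key, key_hash). Store only the hash; show the raw key to the team once."""
    raw_key = prefix + secrets.token_hex(32)  # 32 bytes of CSPRNG data, hex-encoded
    key_hash = hashlib.sha256(raw_key.encode()).hexdigest()
    return raw_key, key_hash

def verify_internal_key(raw_key: str, stored_hash: str) -> bool:
    """Hash the incoming key and compare in constant time against the stored hash."""
    candidate = hashlib.sha256(raw_key.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_hash)
```

`secrets.compare_digest` avoids leaking information through comparison timing, which plain `==` can do.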


Step 2: Restrict which models each team/project can use

The core requirement is straightforward to state:
Per-team/project internal keys that can only call certain models.

There are two complementary strategies:

  1. Model aliasing + internal names
  2. Policy checks in middleware

Strategy 1: Model aliasing via LiteLLM

Instead of letting teams call vendor model names directly (e.g., gpt-4.1, claude-3-opus), define internal model aliases in LiteLLM:

# litellm_config.yaml
model_list:
  - model_name: "team_a_chat"
    litellm_params:
      model: "gpt-4.1-mini"
      api_key: "OPENAI_KEY_FROM_ENV"
  - model_name: "team_a_vision"
    litellm_params:
      model: "gpt-4.1-mini"  # or a specific vision-capable variant
      api_key: "OPENAI_KEY_FROM_ENV"

  - model_name: "team_b_chat"
    litellm_params:
      model: "claude-3-haiku-20240307"
      api_key: "ANTHROPIC_KEY_FROM_ENV"
  - model_name: "team_b_rag"
    litellm_params:
      model: "gpt-4.1-mini"
      api_key: "OPENAI_KEY_FROM_ENV"

Then:

  • Team A is told: “You may only use team_a_chat and team_a_vision.”
  • Team B is told: “You may only use team_b_chat and team_b_rag.”

Teams never see raw vendor model names. If you later change team_b_chat to gpt-4.1 or a different Anthropic model, they don’t have to change their code.

This approach:

  • Makes model control easier
  • Encodes cost/quality tiers as internal aliases
  • Lets you migrate providers without breaking client code

Strategy 2: Enforce policies via middleware

Alongside aliases, you still want a strict per-key allowlist. Example middleware logic:

  1. Read request headers/body:

    • Authorization: Bearer <internal_key>
    • JSON field: "model" (e.g., team_a_chat)
  2. Look up the key’s policy:

    {
      "key": "ltl_int_xxx",
      "team": "team_a",
      "allowed_models": ["team_a_chat", "team_a_vision"]
    }
    
  3. If the requested model is not in allowed_models, return:

    {
      "error": {
        "type": "model_not_allowed",
        "message": "Model team_b_chat is not allowed for this API key."
      }
    }
    
  4. Otherwise, forward to LiteLLM unchanged.

With this pattern:

  • You can give different teams, or different projects within the same team, their own keys with different model allowances.
  • You can rapidly adjust policies without redeploying client apps.

Step 3: Wire BerriAI orchestration through LiteLLM with internal keys

If BerriAI is orchestrating chains/agents that call models via LiteLLM, you typically:

  1. Configure BerriAI to use your internal LiteLLM endpoint, e.g.:

    LITELLM_BASE_URL=https://llm-proxy.internal.yourdomain.com
    LITELLM_API_KEY=<project_or_team_internal_key>
    
  2. In BerriAI flows, use internal aliases, not vendor-specific names, for models:

    {
      "model": "team_a_chat",
      "temperature": 0.2,
      "max_tokens": 512
    }
    
  3. Use different internal keys for different BerriAI apps or tenants:

    • LITELLM_API_KEY_TEAM_A only allows Team A models
    • LITELLM_API_KEY_TEAM_B only allows Team B models

Because BerriAI calls LiteLLM just like any other client, the same model restriction logic applies automatically. You keep governance and visibility centralized in LiteLLM while BerriAI handles workflows and retrieval.


Step 4: Apply per-key rate limits and budgets

Model restriction is only half the story; you also want to control usage per team/project.

Rate limiting

Attach limits to each internal key:

  • Requests per minute (RPM)
  • Tokens per minute (TPM)
  • Optional: concurrency or maximum prompt/response size

Implementation options:

  • API gateway level:
    Configure per-API-key limits; this is often easiest if you already run something like Kong, Envoy, or a managed API gateway.

  • LiteLLM-level:
    Use LiteLLM’s built-in rate limiting where available, keyed by the internal API key.

  • Custom middleware:
    Track counts in Redis or a fast KV store; throttle when exceeding:

    key = "rate:" + internal_key + ":" + current_minute
    if count(key) >= limit_for_key:
        return 429 Too Many Requests
    else:
        increment(key)
    

Budgets and cost caps

To prevent runaway costs:

  1. Maintain a running total cost per key (USD-equivalent).
  2. On each request:
    • Estimate cost based on tokens used + model price
    • Update the total in your store
    • Block requests above monthly budget; respond with a clear message

If you use LiteLLM’s logging and cost estimation, you can either:

  • Pull cost data from LiteLLM logs into your billing system, or
  • Implement a simple calculator in your middleware using your internal price table for each alias.
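A sketch of that middleware-side calculator; the per-alias prices below are placeholder numbers, not real vendor rates — substitute your own internal price table:

```python
# Hypothetical per-alias prices in USD per 1K tokens (illustrative values only).
PRICE_TABLE = {
    "team_a_chat": {"prompt_per_1k": 0.00015, "completion_per_1k": 0.0006},
    "team_b_chat": {"prompt_per_1k": 0.00025, "completion_per_1k": 0.00125},
}

def estimate_cost_usd(model_alias: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate one request's cost from token counts and the internal price table."""
    prices = PRICE_TABLE[model_alias]
    return (prompt_tokens / 1000) * prices["prompt_per_1k"] + \
           (completion_tokens / 1000) * prices["completion_per_1k"]

def within_budget(spent_usd: float, request_cost_usd: float, monthly_budget_usd: float) -> bool:
    """Block the request if it would push the key over its monthly cap."""
    return spent_usd + request_cost_usd <= monthly_budget_usd
```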

Step 5: Logging, monitoring, and audit trails

To make your BerriAI / LiteLLM setup auditable, you want:

  • Per-team usage metrics
    • Tokens, requests, error rates, latency per model alias
  • Security / compliance logging
    • Which internal key used which model, with timestamp
  • Redaction where needed
    • If logs might contain sensitive data, apply partial or full redaction

A typical logging payload per request:

{
  "timestamp": "2026-04-01T10:15:00Z",
  "internal_key_id": "uuid-of-key",
  "team": "team_a",
  "project": "app_xyz",
  "model_alias": "team_a_chat",
  "provider_model": "gpt-4.1-mini",
  "prompt_tokens": 500,
  "completion_tokens": 280,
  "cost_usd": 0.0123,
  "status": "success"
}

With this:

  • You can identify which team is using which models heavily.
  • You can discover if any key is repeatedly attempting disallowed models.
  • You have solid data for internal chargeback or cost allocation.
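A helper that emits the payload above as one JSON line per request; the field names follow the example payload, and the `policy` keys (`key_id`, `team`, `project`) are assumed to come from your policy store:

```python
import json
from datetime import datetime, timezone

def build_usage_record(policy: dict, model_alias: str, provider_model: str,
                       prompt_tokens: int, completion_tokens: int,
                       cost_usd: float, status: str) -> str:
    """Serialize one request's usage as a JSON log line."""
    record = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "internal_key_id": policy["key_id"],
        "team": policy["team"],
        "project": policy["project"],
        "model_alias": model_alias,
        "provider_model": provider_model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "cost_usd": round(cost_usd, 6),
        "status": status,
    }
    return json.dumps(record)
```

One JSON object per line keeps the log trivially ingestible by most log pipelines.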

Example: simple LiteLLM proxy with per-key policies

Below is a conceptual example (pseudo-Python) of how you might wrap LiteLLM with per-key policies. The concrete code will depend on your stack, but the logic is the same.

from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import JSONResponse
import httpx
import os
from my_policy_store import get_policy_for_key  # your DB or config
from my_usage_logger import log_usage           # your logging/cost sink

app = FastAPI()
LITELLM_URL = os.getenv("LITELLM_URL")

@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    # 1. Authenticate internal key
    auth_header = request.headers.get("Authorization", "")
    if not auth_header.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Missing or invalid Authorization header")
    internal_key = auth_header.split(" ", 1)[1].strip()

    policy = get_policy_for_key(internal_key)
    if not policy:
        raise HTTPException(status_code=403, detail="Invalid API key")

    # 2. Read request body
    body = await request.json()
    model = body.get("model")

    # 3. Enforce model allowlist
    if model not in policy.allowed_models:
        raise HTTPException(
            status_code=403,
            detail=f"Model '{model}' is not allowed for this API key."
        )

    # 4. Rate-limit & budget checks (simplified)
    if not policy.allow_request():
        raise HTTPException(status_code=429, detail="Rate limit or budget exceeded")

    # 5. Forward to LiteLLM
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{LITELLM_URL}/v1/chat/completions",
            json=body,
            headers={"Authorization": f"Bearer {os.getenv('LITELLM_PROXY_KEY')}"}
        )

    # 6. Log usage & cost (using resp.json(), token counts, etc.)
    log_usage(policy, model, resp)

    # Propagate LiteLLM's status code and body to the caller
    return JSONResponse(status_code=resp.status_code, content=resp.json())

You can adapt this pattern for any BerriAI-integrated application: the internal key governs models and usage; LiteLLM handles routing and provider details.


Security best practices for internal keys and model restrictions

To ensure your per-team/project setup is robust:

  • Never expose vendor keys
    Only your proxy / LiteLLM has provider-level credentials. Clients use internal keys only.

  • Use HTTPS everywhere
    All calls to BerriAI / LiteLLM should be over TLS, including internal service-to-service traffic when possible.

  • Rotate internal keys

    • Support multiple active keys per team so rotation is seamless.
    • Expire old keys after a grace period.
  • Different keys for different environments

    • Staging vs. production keys, with different allowed models and budgets.
    • You might allow cheaper models and more generous experimentation limits in staging.
  • Granular policies

    • Some internal keys can be “read-only” (e.g., only certain flows, or RAG-only).
    • Others may be allowed to use more powerful or more expensive models.

Scaling your setup as teams and models grow

As the number of teams, projects, and models grows, keeping your BerriAI / LiteLLM configuration maintainable becomes a critical part of your infrastructure.

Consider:

  • Policy-as-code
    Store model policies per team as YAML/JSON in a repo, then load into your middleware on deploy.

  • Template tiers
    Define “tiers” that bundle model access and limits:

    • tier_basic: small models only, low concurrency
    • tier_standard: mid-sized models, higher limits
    • tier_premium: access to GPT‑4 class models, higher budgets

    Each internal key references a tier; you adjust tiers instead of editing each key.
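Tier resolution can happen at policy-load time; a sketch, where the tier contents are illustrative and in practice would be loaded from the YAML/JSON in your repo:

```python
# Hypothetical tier definitions; load these from policy-as-code files in practice.
TIERS = {
    "tier_basic": {
        "allowed_models": ["team_a_chat"],
        "rate_limit_rpm": 30,
        "monthly_budget_usd": 100.0,
    },
    "tier_standard": {
        "allowed_models": ["team_a_chat", "team_a_vision"],
        "rate_limit_rpm": 120,
        "monthly_budget_usd": 500.0,
    },
}

def resolve_policy(key_record: dict) -> dict:
    """Merge a key's tier defaults with any per-key overrides."""
    policy = dict(TIERS[key_record["tier"]])
    policy.update(key_record.get("overrides", {}))
    return policy
```

Adjusting a tier then changes every key that references it, while `overrides` still allows one-off exceptions per key.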

  • Central catalog of model aliases
    Maintain a single place that maps alias → provider model → pricing → capabilities. This helps you reason about cost and performance when assigning aliases to teams.

  • Observability dashboards
    Use metrics from LiteLLM logs to build dashboards (Grafana, DataDog, etc.) by team and by model alias, guiding future restrictions and optimizations.


Summary: implementing per-team keys and model restrictions with BerriAI / LiteLLM

To implement internal API keys for each team/project and tightly control which models they can use in a BerriAI / LiteLLM environment:

  1. Create internal API keys per team/project

    • Store them in a secure store or DB with policies (allowed models, limits).
  2. Use LiteLLM model aliases

    • Present only internal model names to teams.
    • Map aliases to specific vendor models and providers in LiteLLM.
  3. Add a policy-enforcing layer

    • Middleware or gateway that checks each request’s internal key and model against the allowlist before forwarding to LiteLLM.
  4. Apply rate limits and budgets per key

    • Enforce RPM/TPM and monetary caps.
    • Block or throttle keys exceeding budget.
  5. Integrate with BerriAI using internal keys and aliases

    • BerriAI apps call LiteLLM with their assigned key and internal model names.
    • The same policies automatically apply.
  6. Monitor, log, and iterate

    • Track usage per team and alias.
    • Adjust policies as your cost constraints and model choices evolve.

This pattern gives you centralized control over model access, cost, and compliance, while keeping the developer experience simple and consistent across all teams and projects using BerriAI and LiteLLM.