BerriAI / LiteLLM: how do we set up internal API keys for each team/project and restrict which models they can use?
LLM Gateway & Routing


For teams using BerriAI with LiteLLM in production, one of the most important governance questions is how to set up internal API keys per team or project, and how to restrict which models they can access. Done well, this lets you centralize model configuration, cost controls, and compliance rules, while still giving each team a simple, consistent endpoint to call.

This guide walks through practical patterns to:

  • Issue internal API keys per team or project
  • Enforce per-key model allowlists / blocklists
  • Apply rate limits, budgets, and logging
  • Keep your configuration maintainable as usage grows

Throughout, the focus is on using LiteLLM as a central router/proxy and layering your own access-control logic on top, whether directly in LiteLLM, in BerriAI’s orchestration, or via an API gateway.


Core architecture: central router + per-team internal keys

In a typical BerriAI / LiteLLM setup, you have:

  • LiteLLM acting as a proxy/router in front of multiple model providers (OpenAI, Anthropic, Azure, etc.)
  • BerriAI orchestrating flows, agents, and tools that call LiteLLM’s unified API
  • Your internal services / apps calling BerriAI or LiteLLM with an internal API key, not raw provider keys

The pattern you want:

  1. Central provider keys

    • Store OpenAI / Anthropic / other vendor keys centrally (e.g., in environment variables or a secrets manager).
    • These never go directly to team apps.
  2. Internal API keys per team/project

    • Each team/project gets its own key, issued by you.
    • That key is used to authenticate to your LiteLLM/BerriAI endpoint.
  3. Policy per key

    • For each internal key, define:
      • Which models are allowed
      • Limits (concurrency / TPM / RPM / total budget)
      • Optional: logging granularity, redaction, etc.
  4. Routing enforcement

    • When a request hits LiteLLM, you:
      • Authenticate the internal key
      • Look up the key’s policy
      • Reject or rewrite requests that don’t comply

This can be implemented using:

  • LiteLLM’s built-in auth and model aliasing
  • A reverse proxy / API gateway (e.g., NGINX, Kong, Envoy, API Gateway)
  • Or a lightweight custom middleware around LiteLLM

Step 1: Define internal API keys per team or project

First, decide where to manage keys:

  • Option A: Custom key management layer (recommended)
    A small internal service or database table such as:

    internal_api_keys
    ├─ id (uuid)
    ├─ key_hash
    ├─ team_name
    ├─ project_name
    ├─ allowed_models (json/array)
    ├─ rate_limit_tpm
    ├─ rate_limit_rpm
    ├─ monthly_budget_usd
    ├─ created_at
    └─ expired_at
    

    Your middleware checks the incoming key against this table, then forwards the request to LiteLLM only if allowed.

  • Option B: API gateway
    Use a gateway that:

    • Stores API keys
    • Adds headers like X-Team, X-Project, X-Allowed-Models to requests
    • Enforces rate limits, auth, and quotas upstream of LiteLLM
  • Option C: LiteLLM configs plus a simple key list
    If you want to keep it minimal, maintain a config file or environment variable mapping keys to policies and load that into your LiteLLM wrapper.
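For Option A, the `internal_api_keys` table sketched above could be created like this; SQLite is used here purely for illustration (swap in your real database and adjust column types accordingly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a real, persistent database in production
conn.execute("""
    CREATE TABLE internal_api_keys (
        id TEXT PRIMARY KEY,            -- uuid
        key_hash TEXT NOT NULL UNIQUE,  -- e.g. SHA-256 of the raw key; never store the raw key
        team_name TEXT NOT NULL,
        project_name TEXT NOT NULL,
        allowed_models TEXT NOT NULL,   -- JSON array of internal alias names
        rate_limit_tpm INTEGER,
        rate_limit_rpm INTEGER,
        monthly_budget_usd REAL,
        created_at TEXT NOT NULL,
        expired_at TEXT                 -- NULL while the key is active
    )
""")
```

The `UNIQUE` constraint on `key_hash` lets the middleware look up a policy with a single indexed query per request.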

Generating keys

Use long, random tokens, for example:

  • 32–64 bytes of cryptographically secure random data (Base64 or hex-encoded)
  • Prefix with something like ltl_int_ for clarity, e.g.:
ltl_int_5c2b2d3b2e4f4a...

Store only the hash server-side (e.g., SHA-256) and compare hashes on each request.
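A minimal sketch of generation and hash-based verification, using Python's standard library (the `ltl_int_` prefix follows the convention above; the function names are illustrative):

```python
import hashlib
import secrets

def generate_internal_key(prefix: str = "ltl_int_") -> tuple[str, str]:
    """Return (raw_key, key_hash). Store only the hash; show the raw key to the team once."""
    raw_key = prefix + secrets.token_hex(32)  # 32 bytes of CSPRNG data, hex-encoded
    key_hash = hashlib.sha256(raw_key.encode()).hexdigest()
    return raw_key, key_hash

def verify_internal_key(raw_key: str, stored_hash: str) -> bool:
    """Hash the incoming key and compare in constant time against the stored hash."""
    candidate = hashlib.sha256(raw_key.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_hash)
```

`secrets.compare_digest` avoids leaking information through comparison timing, which plain `==` can do.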


Step 2: Restrict which models each team/project can use

The core requirement is straightforward to state:
Per-team/project internal keys that can only call certain models.

There are two complementary strategies:

  1. Model aliasing + internal names
  2. Policy checks in middleware

Strategy 1: Model aliasing via LiteLLM

Instead of letting teams call vendor model names directly (e.g., gpt-4.1, claude-3-opus), define internal model aliases in LiteLLM:

# litellm_config.yaml
model_list:
  - model_name: "team_a_chat"
    litellm_params:
      model: "gpt-4.1-mini"
      api_key: "OPENAI_KEY_FROM_ENV"
  - model_name: "team_a_vision"
    litellm_params:
      model: "gpt-4.1-mini"  # or a specific vision-capable variant
      api_key: "OPENAI_KEY_FROM_ENV"

  - model_name: "team_b_chat"
    litellm_params:
      model: "claude-3-haiku-20240307"
      api_key: "ANTHROPIC_KEY_FROM_ENV"
  - model_name: "team_b_rag"
    litellm_params:
      model: "gpt-4.1-mini"
      api_key: "OPENAI_KEY_FROM_ENV"

Then:

  • Team A is told: “You may only use team_a_chat and team_a_vision.”
  • Team B is told: “You may only use team_b_chat and team_b_rag.”

Teams never see raw vendor model names. If you later change team_b_chat to gpt-4.1 or a different Anthropic model, they don’t have to change their code.

This approach:

  • Makes model control easier
  • Encodes cost/quality tiers as internal aliases
  • Lets you migrate providers without breaking client code

Strategy 2: Enforce policies via middleware

Alongside aliases, you still want a strict per-key allowlist. Example middleware logic:

  1. Read request headers/body:

    • Authorization: Bearer <internal_key>
    • JSON field: "model" (e.g., team_a_chat)
  2. Look up the key’s policy:

    {
      "key": "ltl_int_xxx",
      "team": "team_a",
      "allowed_models": ["team_a_chat", "team_a_vision"]
    }
    
  3. If the requested model is not in allowed_models, return:

    {
      "error": {
        "type": "model_not_allowed",
        "message": "Model team_b_chat is not allowed for this API key."
      }
    }
    
  4. Otherwise, forward to LiteLLM unchanged.

With this pattern:

  • You can give different teams, or different projects within the same team, their own keys with different model allowances.
  • You can rapidly adjust policies without redeploying client apps.

Step 3: Wire BerriAI orchestration through LiteLLM with internal keys

If BerriAI is orchestrating chains/agents that call models via LiteLLM, you typically:

  1. Configure BerriAI to use your internal LiteLLM endpoint, e.g.:

    LITELLM_BASE_URL=https://llm-proxy.internal.yourdomain.com
    LITELLM_API_KEY=<project_or_team_internal_key>
    
  2. In BerriAI flows, use internal aliases, not vendor-specific names, for models:

    {
      "model": "team_a_chat",
      "temperature": 0.2,
      "max_tokens": 512
    }
    
  3. Use different internal keys for different BerriAI apps or tenants:

    • LITELLM_API_KEY_TEAM_A only allows Team A models
    • LITELLM_API_KEY_TEAM_B only allows Team B models

Because BerriAI calls LiteLLM just like any other client, the same model restriction logic applies automatically. You keep governance and visibility centralized in LiteLLM while BerriAI handles workflows and retrieval.


Step 4: Apply per-key rate limits and budgets

Model restriction is only half the story; you also want to control usage per team/project.

Rate limiting

Attach limits to each internal key:

  • Requests per minute (RPM)
  • Tokens per minute (TPM)
  • Optional: concurrency or maximum prompt/response size

Implementation options:

  • API gateway level:
    Configure per-API-key limits; this is often easiest if you already run something like Kong, Envoy, or a managed API gateway.

  • LiteLLM-level:
    Use LiteLLM’s built-in rate limiting where available, keyed by the internal API key.

  • Custom middleware:
    Track counts in Redis or a fast KV store; throttle when exceeding:

    key = "rate:" + internal_key + ":" + current_minute
    if count(key) >= limit_for_key:
        return 429 Too Many Requests
    else:
        increment(key)
    

Budgets and cost caps

To prevent runaway costs:

  1. Maintain a running total cost per key (USD-equivalent).
  2. On each request:
    • Estimate cost based on tokens used + model price
    • Update the total in your store
    • Block requests above monthly budget; respond with a clear message

If you use LiteLLM’s logging and cost estimation, you can either:

  • Pull cost data from LiteLLM logs into your billing system, or
  • Implement a simple calculator in your middleware using your internal price table for each alias.
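A sketch of that middleware-side calculator; the per-alias prices below are placeholder numbers, not real vendor rates — substitute your own internal price table:

```python
# Hypothetical per-alias prices in USD per 1K tokens (illustrative values only).
PRICE_TABLE = {
    "team_a_chat": {"prompt_per_1k": 0.00015, "completion_per_1k": 0.0006},
    "team_b_chat": {"prompt_per_1k": 0.00025, "completion_per_1k": 0.00125},
}

def estimate_cost_usd(model_alias: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate one request's cost from token counts and the internal price table."""
    prices = PRICE_TABLE[model_alias]
    return (prompt_tokens / 1000) * prices["prompt_per_1k"] + \
           (completion_tokens / 1000) * prices["completion_per_1k"]

def within_budget(spent_usd: float, request_cost_usd: float, monthly_budget_usd: float) -> bool:
    """Block the request if it would push the key over its monthly cap."""
    return spent_usd + request_cost_usd <= monthly_budget_usd
```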

Step 5: Logging, monitoring, and audit trails

To make your BerriAI / LiteLLM setup auditable, you want:

  • Per-team usage metrics
    • Tokens, requests, error rates, latency per model alias
  • Security / compliance logging
    • Which internal key used which model, with timestamp
  • Redaction where needed
    • If logs might contain sensitive data, apply partial or full redaction

A typical logging payload per request:

{
  "timestamp": "2026-04-01T10:15:00Z",
  "internal_key_id": "uuid-of-key",
  "team": "team_a",
  "project": "app_xyz",
  "model_alias": "team_a_chat",
  "provider_model": "gpt-4.1-mini",
  "prompt_tokens": 500,
  "completion_tokens": 280,
  "cost_usd": 0.0123,
  "status": "success"
}

With this:

  • You can identify which team is using which models heavily.
  • You can discover if any key is repeatedly attempting disallowed models.
  • You have solid data for internal chargeback or cost allocation.
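A helper that emits the payload above as one JSON line per request; the field names follow the example payload, and the `policy` keys (`key_id`, `team`, `project`) are assumed to come from your policy store:

```python
import json
from datetime import datetime, timezone

def build_usage_record(policy: dict, model_alias: str, provider_model: str,
                       prompt_tokens: int, completion_tokens: int,
                       cost_usd: float, status: str) -> str:
    """Serialize one request's usage as a JSON log line."""
    record = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "internal_key_id": policy["key_id"],
        "team": policy["team"],
        "project": policy["project"],
        "model_alias": model_alias,
        "provider_model": provider_model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "cost_usd": round(cost_usd, 6),
        "status": status,
    }
    return json.dumps(record)
```

One JSON object per line keeps the log trivially ingestible by most log pipelines.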

Example: simple LiteLLM proxy with per-key policies

Below is a conceptual example (pseudo-Python) of how you might wrap LiteLLM with per-key policies. The concrete code will depend on your stack, but the logic is the same.

from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import JSONResponse
import httpx
import os
from my_policy_store import get_policy_for_key  # your DB or config
from my_usage_logger import log_usage           # your logging/cost sink

app = FastAPI()
LITELLM_URL = os.getenv("LITELLM_URL")

@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    # 1. Authenticate internal key
    auth_header = request.headers.get("Authorization", "")
    if not auth_header.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Missing or invalid Authorization header")
    internal_key = auth_header.split(" ", 1)[1].strip()

    policy = get_policy_for_key(internal_key)
    if not policy:
        raise HTTPException(status_code=403, detail="Invalid API key")

    # 2. Read request body
    body = await request.json()
    model = body.get("model")

    # 3. Enforce model allowlist
    if model not in policy.allowed_models:
        raise HTTPException(
            status_code=403,
            detail=f"Model '{model}' is not allowed for this API key."
        )

    # 4. Rate-limit & budget checks (simplified)
    if not policy.allow_request():
        raise HTTPException(status_code=429, detail="Rate limit or budget exceeded")

    # 5. Forward to LiteLLM
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{LITELLM_URL}/v1/chat/completions",
            json=body,
            headers={"Authorization": f"Bearer {os.getenv('LITELLM_PROXY_KEY')}"}
        )

    # 6. Log usage & cost (using resp.json(), token counts, etc.)
    log_usage(policy, model, resp)

    # Propagate LiteLLM's status code and body to the caller
    return JSONResponse(status_code=resp.status_code, content=resp.json())

You can adapt this pattern for any BerriAI-integrated application: the internal key governs models and usage; LiteLLM handles routing and provider details.


Security best practices for internal keys and model restrictions

To ensure your per-team/project setup is robust:

  • Never expose vendor keys
    Only your proxy / LiteLLM has provider-level credentials. Clients use internal keys only.

  • Use HTTPS everywhere
    All calls to BerriAI / LiteLLM should be over TLS, including internal service-to-service traffic when possible.

  • Rotate internal keys

    • Support multiple active keys per team so rotation is seamless.
    • Expire old keys after a grace period.
  • Different keys for different environments

    • Staging vs. production keys, with different allowed models and budgets.
    • You might allow cheaper models and more generous experimentation limits in staging.
  • Granular policies

    • Some internal keys can be “read-only” (e.g., only certain flows, or RAG-only).
    • Others may be allowed to use more powerful or more expensive models.

Scaling your setup as teams and models grow

As the number of teams, projects, and models grows, keeping your BerriAI / LiteLLM configuration maintainable becomes a critical part of your infrastructure.

Consider:

  • Policy-as-code
    Store model policies per team as YAML/JSON in a repo, then load into your middleware on deploy.

  • Template tiers
    Define “tiers” that bundle model access and limits:

    • tier_basic: small models only, low concurrency
    • tier_standard: mid-sized models, higher limits
    • tier_premium: access to GPT‑4 class models, higher budgets

    Each internal key references a tier; you adjust tiers instead of editing each key.
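Tier resolution can happen at policy-load time; a sketch, where the tier contents are illustrative and in practice would be loaded from the YAML/JSON in your repo:

```python
# Hypothetical tier definitions; load these from policy-as-code files in practice.
TIERS = {
    "tier_basic": {
        "allowed_models": ["team_a_chat"],
        "rate_limit_rpm": 30,
        "monthly_budget_usd": 100.0,
    },
    "tier_standard": {
        "allowed_models": ["team_a_chat", "team_a_vision"],
        "rate_limit_rpm": 120,
        "monthly_budget_usd": 500.0,
    },
}

def resolve_policy(key_record: dict) -> dict:
    """Merge a key's tier defaults with any per-key overrides."""
    policy = dict(TIERS[key_record["tier"]])
    policy.update(key_record.get("overrides", {}))
    return policy
```

Adjusting a tier then changes every key that references it, while `overrides` still allows one-off exceptions per key.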

  • Central catalog of model aliases
    Maintain a single place that maps alias → provider model → pricing → capabilities. This helps you reason about cost and performance when assigning aliases to teams.

  • Observability dashboards
    Use metrics from LiteLLM logs to build dashboards (Grafana, DataDog, etc.) by team and by model alias, guiding future restrictions and optimizations.


Summary: implementing per-team keys and model restrictions with BerriAI / LiteLLM

To implement internal API keys for each team/project and tightly control which models they can use in a BerriAI / LiteLLM environment:

  1. Create internal API keys per team/project

    • Store them in a secure store or DB with policies (allowed models, limits).
  2. Use LiteLLM model aliases

    • Present only internal model names to teams.
    • Map aliases to specific vendor models and providers in LiteLLM.
  3. Add a policy-enforcing layer

    • Middleware or gateway that checks each request’s internal key and model against the allowlist before forwarding to LiteLLM.
  4. Apply rate limits and budgets per key

    • Enforce RPM/TPM and monetary caps.
    • Block or throttle keys exceeding budget.
  5. Integrate with BerriAI using internal keys and aliases

    • BerriAI apps call LiteLLM with their assigned key and internal model names.
    • The same policies automatically apply.
  6. Monitor, log, and iterate

    • Track usage per team and alias.
    • Adjust policies as your cost constraints and model choices evolve.

This pattern gives you centralized control over model access, cost, and compliance, while keeping the developer experience simple and consistent across all teams and projects using BerriAI and LiteLLM.