LLM Gateway & Routing

How do we configure BerriAI / LiteLLM routing and fallbacks between Azure OpenAI and OpenAI (and set priorities)?

7 min read

Most teams adopting BerriAI / LiteLLM eventually want exactly the same thing: reliable routing between Azure OpenAI and OpenAI with clear priorities and automatic fallbacks when a provider fails or hits limits. Done right, this setup gives you high availability, cost control, and performance — without rewriting your app every time you switch providers or models.

This guide walks through how to configure BerriAI’s LiteLLM routing to:

  • Route traffic between Azure OpenAI and OpenAI
  • Set priorities (e.g., “prefer Azure, fall back to OpenAI”)
  • Configure structured fallbacks for errors and rate limits
  • Use tags to control behavior per use case
  • Keep the configuration maintainable in production

Core concepts: models, providers, and routes in LiteLLM

LiteLLM (the open-source library and proxy from BerriAI) is a unified interface that abstracts multiple LLM providers behind one consistent, OpenAI-compatible API. Instead of hard-coding provider specifics in your app, you define:

  • Model aliases: Friendly names your app uses (e.g., gpt-4o-router)
  • Provider deployments: The actual models behind each alias (e.g., gpt-4o at OpenAI, a gpt-4o deployment at Azure OpenAI)
  • Routing and fallbacks: Rules that say “requests for alias X are balanced across deployments A and B, and fall back to C on failure”

In practice, you manage all of this via a config.yaml that the LiteLLM proxy reads at startup (litellm --config config.yaml).


Basic config: routing between Azure OpenAI and OpenAI

Here’s a minimal example that shows how to route through a single logical model name while backing it with both Azure OpenAI and OpenAI.

1. Define environment variables

In your environment (e.g., .env or deployment secrets):

# OpenAI
OPENAI_API_KEY=sk-...

# Azure OpenAI
AZURE_API_KEY=...
AZURE_API_BASE=https://your-azure-endpoint.openai.azure.com
AZURE_API_VERSION=2024-02-15-preview

The exact AZURE_API_VERSION should match what your Azure OpenAI resource supports.
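Before starting the proxy, it helps to fail fast when a credential is missing. A minimal standalone check (plain Python; the variable names match the example above and are otherwise arbitrary):

```python
import os

# Variables the example config expects; adjust to your deployment.
REQUIRED = ["OPENAI_API_KEY", "AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION"]

def missing_env(required=REQUIRED):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

missing = missing_env()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Run this in your deployment entrypoint before launching the proxy so misconfiguration surfaces immediately rather than as a runtime auth error.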

2. Create a basic LiteLLM config

Create config.yaml:

model_list:
  # 1) Azure OpenAI deployment
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/gpt-4o-prod            # azure/<your-deployment-name>
      api_key: os.environ/AZURE_API_KEY   # config.yaml reads env vars via os.environ/<VAR>
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION

  # 2) OpenAI public API
  - model_name: openai-gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

This doesn’t define routing or fallbacks yet; it just registers both deployments under separate aliases. Next, you group them under a single alias that uses both.


Creating a routing group with priorities

To control priorities and traffic split, you give multiple deployments the same model_name. LiteLLM treats every entry that shares a name as one routing group: your app calls the shared alias, and the router decides which deployment actually serves each request.

1. Weighted routing within a group

Update config.yaml:

model_list:
  # Routing group: prefer Azure, keep OpenAI as a peer
  - model_name: gpt-4o-router            # shared alias = one routing group
    litellm_params:
      model: azure/gpt-4o-prod
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
      weight: 4                          # ~80% of traffic

  - model_name: gpt-4o-router
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
      weight: 1                          # ~20% of traffic

router_settings:
  routing_strategy: simple-shuffle       # default strategy; respects weight
  num_retries: 2

Key points:

  • model_name: gpt-4o-router is what your application will call.
  • Repeating the same model_name places both deployments in one routing group.
  • weight controls the traffic split when several deployments are healthy (4:1 here, i.e., roughly 80/20).
  • Other routing_strategy options include least-busy, latency-based-routing, and usage-based-routing.

You can now call this alias from BerriAI or through the LiteLLM proxy with any OpenAI-compatible client (assuming the proxy runs on localhost:4000):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-your-litellm-key")

response = client.chat.completions.create(
    model="gpt-4o-router",
    messages=[{"role": "user", "content": "Explain LiteLLM routing."}],
)
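To build intuition for what a weighted split (say, 80/20 toward Azure) means in practice, here is a small standalone simulation of weight-proportional selection. This is plain Python for illustration, not LiteLLM’s internal routing code:

```python
import random
from collections import Counter

# Two deployments in one routing group, with an 80/20 preference toward Azure.
weights = {"azure-gpt-4o": 0.8, "openai-gpt-4o": 0.2}

def pick(rng, weights):
    """Weight-proportional choice: the idea behind a weighted routing strategy."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

rng = random.Random(0)  # seeded so the illustration is reproducible
counts = Counter(pick(rng, weights) for _ in range(10_000))
print(counts["azure-gpt-4o"] / 10_000)  # close to 0.8
```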

Strict fallback routing: “try Azure, then OpenAI”

If you want deterministic “Azure first, OpenAI only on failure” behavior instead of a weighted traffic split, keep the deployments under separate aliases and chain them with fallbacks in router_settings.

model_list:
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/gpt-4o-prod
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION

  - model_name: openai-gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  num_retries: 3
  fallbacks: [{"azure-gpt-4o": ["openai-gpt-4o"]}]

Behavior:

  • Requests for azure-gpt-4o always go to Azure first.
  • If Azure errors or hits rate limits (and retries are exhausted), LiteLLM retries the same request against openai-gpt-4o.

You can combine this with num_retries to tune resilience.
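The retry-then-fall-back behavior is easy to picture as a nested loop. The sketch below is illustrative plain Python with stand-in callables, not LiteLLM internals:

```python
def call_with_fallbacks(handlers, chain, request, num_retries=3):
    """Try each alias in `chain` in order, retrying each up to `num_retries`
    times before moving on. `handlers` maps an alias to a callable that
    stands in for the real provider call."""
    last_err = None
    for alias in chain:
        for _ in range(num_retries):
            try:
                return alias, handlers[alias](request)
            except Exception as err:
                last_err = err
    raise last_err

def azure_down(request):
    raise RuntimeError("429: rate limited")

handlers = {
    "azure-gpt-4o": azure_down,
    "openai-gpt-4o": lambda request: "response from OpenAI",
}

served_by, text = call_with_fallbacks(
    handlers, ["azure-gpt-4o", "openai-gpt-4o"], {"prompt": "hi"}
)
print(served_by)  # openai-gpt-4o
```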


Configuring explicit error-based fallbacks

Sometimes you want different fallback targets for different failure classes (for example, an oversized prompt versus a rate limit).

LiteLLM separates fallbacks into classes, each configured as a map from an alias to an ordered list of backup aliases:

  • fallbacks: general errors (rate limits, timeouts, connection failures) once retries are exhausted
  • context_window_fallbacks: the request exceeded the model’s context window
  • content_policy_fallbacks: the provider rejected the request on content-policy grounds

This is particularly useful when mixing Azure OpenAI and OpenAI in production.

model_list:
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/gpt-4o-prod
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION

  - model_name: openai-gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  fallbacks: [{"azure-gpt-4o": ["openai-gpt-4o"]}]
  context_window_fallbacks: [{"azure-gpt-4o": ["openai-gpt-4o"]}]

Your app can now call azure-gpt-4o directly, and LiteLLM will transparently retry against OpenAI when Azure fails with a matching error class.


Example: unified alias with both routing and fallbacks

Combine everything to get a robust setup with a single alias: a weighted group for normal traffic, plus an out-of-group backup for total failure.

model_list:
  - model_name: gpt-4o-prod              # weighted group, Azure-heavy
    litellm_params:
      model: azure/gpt-4o-prod
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
      weight: 4

  - model_name: gpt-4o-prod
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
      weight: 1

  - model_name: openai-gpt-4o-mini       # last-resort backup outside the group
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  routing_strategy: simple-shuffle
  num_retries: 3
  fallbacks: [{"gpt-4o-prod": ["openai-gpt-4o-mini"]}]

Your application just uses gpt-4o-prod:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-your-litellm-key")

response = client.chat.completions.create(
    model="gpt-4o-prod",
    messages=[{"role": "user", "content": "Summarize today’s news."}],
)

Behind the scenes:

  • LiteLLM splits traffic roughly 80/20 toward Azure.
  • If one deployment in the group errors or is rate-limited, the router retries the other.
  • If the whole group fails, requests fall back to openai-gpt-4o-mini; your app code stays the same.

Using tags to manage multiple routing policies

For larger apps, it helps to define one routing group per use case (e.g., “cheap drafts” vs. “high-quality prod”) and tag each deployment so requests can also be filtered by tag.

Example:

model_list:
  # High-quality prod group: Azure-heavy, OpenAI as peer
  - model_name: gpt-4o-prod-router
    litellm_params:
      model: azure/gpt-4o-prod
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
      weight: 4
      tags: ["prod", "primary"]

  - model_name: gpt-4o-prod-router
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
      weight: 1
      tags: ["prod", "backup"]

  # Cheap draft group
  - model_name: gpt-4o-draft-router
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
      tags: ["draft", "cheap"]

router_settings:
  enable_tag_filtering: true             # match requests to deployments via tags in request metadata
  fallbacks: [{"gpt-4o-draft-router": ["gpt-4o-prod-router"]}]

In BerriAI or your code, you can choose different aliases (gpt-4o-prod-router vs gpt-4o-draft-router) for different workloads, while still benefiting from Azure/OpenAI fallback logic.
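Choosing an alias per workload then reduces to a small lookup in application code (the alias names follow the config above; the helper itself is a hypothetical convenience, not a LiteLLM API):

```python
# Map workload labels to router aliases defined in config.yaml.
ROUTER_ALIASES = {
    "prod": "gpt-4o-prod-router",
    "draft": "gpt-4o-draft-router",
}

def model_for(workload: str) -> str:
    """Return the router alias to request for a given workload label."""
    return ROUTER_ALIASES.get(workload, ROUTER_ALIASES["prod"])

print(model_for("draft"))  # gpt-4o-draft-router
```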


Integrating with BerriAI APIs

BerriAI typically exposes a REST or SDK interface that ultimately calls LiteLLM. The key integration points:

  • In your BerriAI config, specify model_name as the router alias (e.g., gpt-4o-prod).
  • Keep the Azure/OpenAI details in the LiteLLM config.yaml, not in app code.
  • When you need to adjust priorities or fallback behavior, update the config and restart the LiteLLM proxy or BerriAI service.

Example request (the LiteLLM proxy itself exposes the OpenAI-compatible endpoint):

POST /v1/chat/completions
Content-Type: application/json
Authorization: Bearer sk-your-litellm-key

{
  "model": "gpt-4o-prod",
  "messages": [
    {"role": "user", "content": "Draft a product description for an AI tool."}
  ]
}

The BerriAI backend uses the LiteLLM config to route between Azure OpenAI and OpenAI according to your priorities.


Best practices for production routing between Azure OpenAI and OpenAI

To keep your Azure/OpenAI routing and fallbacks reliable:

  1. Always use aliases
    Never hard-code azure/... or openai/... in app code. Use a single alias (like gpt-4o-prod) and manage routing in config.

  2. Start with a clear priority strategy

    • Prefer Azure for internal/compliance reasons? Make Azure the heavily weighted side of the group, or the primary in a fallback chain.
    • Need cost control? Use weights to send most traffic to the cheaper provider.
    • Need stability? Chain providers with fallbacks in router_settings.
  3. Explicitly handle rate limits and network errors
    Generic fallbacks cover rate limits, timeouts, and connection errors once retries are exhausted; add context_window_fallbacks for prompts that exceed a model’s context window.
  4. Log which provider handled each request
    Configure logging in LiteLLM/BerriAI to capture:

    • Resolved model_name
    • Provider (Azure vs OpenAI)
    • Error and retry information
      This makes debugging routing issues much easier.
  5. Test failure scenarios

    • Temporarily disable Azure key to confirm OpenAI fallback.
    • Simulate rate limits (or use lower quotas) to watch fallback logic in action.
    • Validate that your app handles slightly different latencies and token limits gracefully.

Summary

To configure BerriAI / LiteLLM routing and fallbacks between Azure OpenAI and OpenAI (and set priorities):

  • Define both Azure OpenAI and OpenAI deployments in your LiteLLM config.yaml.
  • Expose a shared model alias (e.g., gpt-4o-prod) that your application uses everywhere.
  • Use weight and routing_strategy to control how traffic is split across the group.
  • Add fallbacks (and context_window_fallbacks) in router_settings to handle rate limits and connection issues.
  • Use per-workload aliases and tags to support different workloads (prod vs draft, low latency vs low cost).

This approach lets you switch traffic between Azure OpenAI and OpenAI, tune priorities, and maintain high availability — all without changing your application code each time you adjust providers or models.