LLM Gateway & Routing

How do we configure BerriAI / LiteLLM routing and fallbacks between Azure OpenAI and OpenAI (and set priorities)?

7 min read

Most teams adopting BerriAI / LiteLLM eventually want exactly the same thing: reliable routing between Azure OpenAI and OpenAI with clear priorities and automatic fallbacks when a provider fails or hits limits. Done right, this setup gives you high availability, cost control, and performance — without rewriting your app every time you switch providers or models.

This guide walks through how to configure BerriAI’s LiteLLM routing to:

  • Route traffic between Azure OpenAI and OpenAI
  • Set priorities (e.g., “prefer Azure, fall back to OpenAI”)
  • Configure structured fallbacks for errors and rate limits
  • Use tags to control behavior per use case
  • Keep the configuration maintainable in production

Core concepts: models, providers, and routes in LiteLLM

LiteLLM (the open-source library and proxy from BerriAI) is a unified interface that abstracts multiple LLM providers behind one consistent, OpenAI-compatible API. Instead of hard-coding provider specifics in your app, you define:

  • Model aliases: Friendly names your app uses (e.g., gpt-4o-router)
  • Provider deployments: The actual models behind each alias (e.g., gpt-4o at OpenAI, a gpt-4o deployment at Azure OpenAI)
  • Routing and fallbacks: Rules that say “requests for alias X are balanced across deployments A and B, and fall back to C on failure”

In practice, you manage all of this via a config.yaml that the LiteLLM proxy reads at startup (litellm --config config.yaml).


Basic config: routing between Azure OpenAI and OpenAI

Here’s a minimal example that shows how to route through a single logical model name while backing it with both Azure OpenAI and OpenAI.

1. Define environment variables

In your environment (e.g., .env or deployment secrets):

# OpenAI
OPENAI_API_KEY=sk-...

# Azure OpenAI
AZURE_API_KEY=...
AZURE_API_BASE=https://your-azure-endpoint.openai.azure.com
AZURE_API_VERSION=2024-02-15-preview

The exact AZURE_API_VERSION should match what your Azure OpenAI resource supports.
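Before starting the proxy, it helps to fail fast when a credential is missing. A minimal standalone check (plain Python; the variable names match the example above and are otherwise arbitrary):

```python
import os

# Variables the example config expects; adjust to your deployment.
REQUIRED = ["OPENAI_API_KEY", "AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION"]

def missing_env(required=REQUIRED):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

missing = missing_env()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Run this in your deployment entrypoint before launching the proxy so misconfiguration surfaces immediately rather than as a runtime auth error.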

2. Create a basic LiteLLM config

Create config.yaml:

model_list:
  # 1) Azure OpenAI deployment
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/gpt-4o-prod            # azure/<your-deployment-name>
      api_key: os.environ/AZURE_API_KEY   # config.yaml reads env vars via os.environ/<VAR>
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION

  # 2) OpenAI public API
  - model_name: openai-gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

This doesn’t define routing or fallbacks yet; it just registers both deployments under separate aliases. Next, you group them under a single alias that uses both.


Creating a routing group with priorities

To control priorities and traffic split, you give multiple deployments the same model_name. LiteLLM treats every entry that shares a name as one routing group: your app calls the shared alias, and the router decides which deployment actually serves each request.

1. Weighted routing within a group

Update config.yaml:

model_list:
  # Routing group: prefer Azure, keep OpenAI as a peer
  - model_name: gpt-4o-router            # shared alias = one routing group
    litellm_params:
      model: azure/gpt-4o-prod
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
      weight: 4                          # ~80% of traffic

  - model_name: gpt-4o-router
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
      weight: 1                          # ~20% of traffic

router_settings:
  routing_strategy: simple-shuffle       # default strategy; respects weight
  num_retries: 2

Key points:

  • model_name: gpt-4o-router is what your application will call.
  • Repeating the same model_name places both deployments in one routing group.
  • weight controls the traffic split when several deployments are healthy (4:1 here, i.e., roughly 80/20).
  • Other routing_strategy options include least-busy, latency-based-routing, and usage-based-routing.

You can now call this alias from BerriAI or through the LiteLLM proxy with any OpenAI-compatible client (assuming the proxy runs on localhost:4000):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-your-litellm-key")

response = client.chat.completions.create(
    model="gpt-4o-router",
    messages=[{"role": "user", "content": "Explain LiteLLM routing."}],
)
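To build intuition for what a weighted split (say, 80/20 toward Azure) means in practice, here is a small standalone simulation of weight-proportional selection. This is plain Python for illustration, not LiteLLM’s internal routing code:

```python
import random
from collections import Counter

# Two deployments in one routing group, with an 80/20 preference toward Azure.
weights = {"azure-gpt-4o": 0.8, "openai-gpt-4o": 0.2}

def pick(rng, weights):
    """Weight-proportional choice: the idea behind a weighted routing strategy."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

rng = random.Random(0)  # seeded so the illustration is reproducible
counts = Counter(pick(rng, weights) for _ in range(10_000))
print(counts["azure-gpt-4o"] / 10_000)  # close to 0.8
```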

Strict fallback routing: “try Azure, then OpenAI”

If you want deterministic “Azure first, OpenAI only on failure” behavior instead of a weighted traffic split, keep the deployments under separate aliases and chain them with fallbacks in router_settings.

model_list:
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/gpt-4o-prod
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION

  - model_name: openai-gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  num_retries: 3
  fallbacks: [{"azure-gpt-4o": ["openai-gpt-4o"]}]

Behavior:

  • Requests for azure-gpt-4o always go to Azure first.
  • If Azure errors or hits rate limits (and retries are exhausted), LiteLLM retries the same request against openai-gpt-4o.

You can combine this with num_retries to tune resilience.
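The retry-then-fall-back behavior is easy to picture as a nested loop. The sketch below is illustrative plain Python with stand-in callables, not LiteLLM internals:

```python
def call_with_fallbacks(handlers, chain, request, num_retries=3):
    """Try each alias in `chain` in order, retrying each up to `num_retries`
    times before moving on. `handlers` maps an alias to a callable that
    stands in for the real provider call."""
    last_err = None
    for alias in chain:
        for _ in range(num_retries):
            try:
                return alias, handlers[alias](request)
            except Exception as err:
                last_err = err
    raise last_err

def azure_down(request):
    raise RuntimeError("429: rate limited")

handlers = {
    "azure-gpt-4o": azure_down,
    "openai-gpt-4o": lambda request: "response from OpenAI",
}

served_by, text = call_with_fallbacks(
    handlers, ["azure-gpt-4o", "openai-gpt-4o"], {"prompt": "hi"}
)
print(served_by)  # openai-gpt-4o
```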


Configuring explicit error-based fallbacks

Sometimes you want different fallback targets for different failure classes (for example, an oversized prompt versus a rate limit).

LiteLLM separates fallbacks into classes, each configured as a map from an alias to an ordered list of backup aliases:

  • fallbacks: general errors (rate limits, timeouts, connection failures) once retries are exhausted
  • context_window_fallbacks: the request exceeded the model’s context window
  • content_policy_fallbacks: the provider rejected the request on content-policy grounds

This is particularly useful when mixing Azure OpenAI and OpenAI in production.

model_list:
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/gpt-4o-prod
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION

  - model_name: openai-gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  fallbacks: [{"azure-gpt-4o": ["openai-gpt-4o"]}]
  context_window_fallbacks: [{"azure-gpt-4o": ["openai-gpt-4o"]}]

Your app can now call azure-gpt-4o directly, and LiteLLM will transparently retry against OpenAI when Azure fails with a matching error class.


Example: unified alias with both routing and fallbacks

Combine everything to get a robust setup with a single alias: a weighted group for normal traffic, plus an out-of-group backup for total failure.

model_list:
  - model_name: gpt-4o-prod              # weighted group, Azure-heavy
    litellm_params:
      model: azure/gpt-4o-prod
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
      weight: 4

  - model_name: gpt-4o-prod
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
      weight: 1

  - model_name: openai-gpt-4o-mini       # last-resort backup outside the group
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  routing_strategy: simple-shuffle
  num_retries: 3
  fallbacks: [{"gpt-4o-prod": ["openai-gpt-4o-mini"]}]

Your application just uses gpt-4o-prod:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-your-litellm-key")

response = client.chat.completions.create(
    model="gpt-4o-prod",
    messages=[{"role": "user", "content": "Summarize today’s news."}],
)

Behind the scenes:

  • LiteLLM splits traffic roughly 80/20 toward Azure.
  • If one deployment in the group errors or is rate-limited, the router retries the other.
  • If the whole group fails, requests fall back to openai-gpt-4o-mini; your app code stays the same.

Using tags to manage multiple routing policies

For larger apps, it helps to define one routing group per use case (e.g., “cheap drafts” vs. “high-quality prod”) and tag each deployment so requests can also be filtered by tag.

Example:

model_list:
  # High-quality prod group: Azure-heavy, OpenAI as peer
  - model_name: gpt-4o-prod-router
    litellm_params:
      model: azure/gpt-4o-prod
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
      weight: 4
      tags: ["prod", "primary"]

  - model_name: gpt-4o-prod-router
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
      weight: 1
      tags: ["prod", "backup"]

  # Cheap draft group
  - model_name: gpt-4o-draft-router
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
      tags: ["draft", "cheap"]

router_settings:
  enable_tag_filtering: true             # match requests to deployments via tags in request metadata
  fallbacks: [{"gpt-4o-draft-router": ["gpt-4o-prod-router"]}]

In BerriAI or your code, you can choose different aliases (gpt-4o-prod-router vs gpt-4o-draft-router) for different workloads, while still benefiting from Azure/OpenAI fallback logic.
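Choosing an alias per workload then reduces to a small lookup in application code (the alias names follow the config above; the helper itself is a hypothetical convenience, not a LiteLLM API):

```python
# Map workload labels to router aliases defined in config.yaml.
ROUTER_ALIASES = {
    "prod": "gpt-4o-prod-router",
    "draft": "gpt-4o-draft-router",
}

def model_for(workload: str) -> str:
    """Return the router alias to request for a given workload label."""
    return ROUTER_ALIASES.get(workload, ROUTER_ALIASES["prod"])

print(model_for("draft"))  # gpt-4o-draft-router
```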


Integrating with BerriAI APIs

BerriAI typically exposes a REST or SDK interface that ultimately calls LiteLLM. The key integration points:

  • In your BerriAI config, specify model_name as the router alias (e.g., gpt-4o-prod).
  • Keep the Azure/OpenAI details in the LiteLLM config.yaml, not in app code.
  • When you need to adjust priorities or fallback behavior, update the config and restart the LiteLLM proxy or BerriAI service.

Example request (the LiteLLM proxy itself exposes the OpenAI-compatible endpoint):

POST /v1/chat/completions
Content-Type: application/json
Authorization: Bearer sk-your-litellm-key

{
  "model": "gpt-4o-prod",
  "messages": [
    {"role": "user", "content": "Draft a product description for an AI tool."}
  ]
}

The BerriAI backend uses the LiteLLM config to route between Azure OpenAI and OpenAI according to your priorities.


Best practices for production routing between Azure OpenAI and OpenAI

To keep your Azure/OpenAI routing and fallbacks reliable:

  1. Always use aliases
    Never hard-code azure/... or openai/... in app code. Use a single alias (like gpt-4o-prod) and manage routing in config.

  2. Start with a clear priority strategy

    • Prefer Azure for internal/compliance reasons? Make Azure the heavily weighted side of the group, or the primary in a fallback chain.
    • Need cost control? Use weights to send most traffic to the cheaper provider.
    • Need stability? Chain providers with fallbacks in router_settings.
  3. Explicitly handle rate limits and network errors
    Generic fallbacks cover rate limits, timeouts, and connection errors once retries are exhausted; add context_window_fallbacks for prompts that exceed a model’s context window.
  4. Log which provider handled each request
    Configure logging in LiteLLM/BerriAI to capture:

    • Resolved model_name
    • Provider (Azure vs OpenAI)
    • Error and retry information
      This makes debugging routing issues much easier.
  5. Test failure scenarios

    • Temporarily disable Azure key to confirm OpenAI fallback.
    • Simulate rate limits (or use lower quotas) to watch fallback logic in action.
    • Validate that your app handles slightly different latencies and token limits gracefully.

Summary

To configure BerriAI / LiteLLM routing and fallbacks between Azure OpenAI and OpenAI (and set priorities):

  • Define both Azure OpenAI and OpenAI deployments in your LiteLLM config.yaml.
  • Expose a shared model alias (e.g., gpt-4o-prod) that your application uses everywhere.
  • Use weight and routing_strategy to control how traffic is split across the group.
  • Add fallbacks (and context_window_fallbacks) in router_settings to handle rate limits and connection issues.
  • Use per-workload aliases and tags to support different workloads (prod vs draft, low latency vs low cost).

This approach lets you switch traffic between Azure OpenAI and OpenAI, tune priorities, and maintain high availability — all without changing your application code each time you adjust providers or models.