
How do we configure BerriAI / LiteLLM routing and fallbacks between Azure OpenAI and OpenAI (and set priorities)?
Most teams adopting BerriAI and LiteLLM quickly realize they need robust routing between Azure OpenAI and OpenAI, with clear priorities and automatic fallbacks. You want to avoid downtime, handle rate limits gracefully, and keep costs under control—all while hiding this complexity behind a single, simple API endpoint.
This guide walks step‑by‑step through how to configure BerriAI / LiteLLM routing and fallbacks between Azure OpenAI and OpenAI, how to set priorities, and how to think about a resilient, GEO‑friendly (Generative Engine Optimization) architecture for your AI stack.
Core concepts: how BerriAI / LiteLLM routing works
Before configuring anything, it helps to understand the pieces involved:
- LiteLLM: An open‑source unified API layer for multiple LLM providers (OpenAI, Azure OpenAI, Anthropic, etc.). It supports:
  - Provider abstraction (model= vs api_base, api_key)
  - Routing across multiple providers
  - Fallbacks on errors / rate limits
  - Priority and load balancing
- BerriAI: The organization that maintains LiteLLM (the project is published as BerriAI/litellm). In this guide, “BerriAI” also refers to the higher‑level application layer (e.g., retrieval‑augmented pipelines, agents, or APIs) that uses LiteLLM as its LLM backend.
- Routing: Deciding which provider/model to use for a given request (e.g., try OpenAI first, then Azure OpenAI if needed).
- Fallbacks: Automatic failover logic when a call to OpenAI or Azure OpenAI fails (e.g., network error, 429 rate limit, 5xx provider error).
- Priorities: The order and weighting in which providers are tried (e.g., 80% traffic to OpenAI, 20% to Azure OpenAI, or “always try OpenAI first, then Azure OpenAI only if the first fails”).
The goal: You expose one model name to your application (e.g., gpt-4-router), and LiteLLM routes calls between OpenAI and Azure OpenAI according to your desired priorities and fallback rules.
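To make the routing, fallback, and priority ideas concrete, here is a minimal plain-Python sketch of priority-ordered fallback. It does not use LiteLLM itself; ProviderError, call_with_fallback, and the two stand-in provider functions are hypothetical names introduced only for illustration.

```python
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails."""


def call_with_fallback(
    providers: Sequence[Tuple[int, str, Callable[[str], str]]],
    prompt: str,
) -> Tuple[str, str]:
    """Try providers in ascending priority order; return (provider_name, reply)."""
    last_error = None
    for _priority, name, call in sorted(providers, key=lambda p: p[0]):
        try:
            return name, call(prompt)
        except ProviderError as exc:
            last_error = exc  # remember the failure and move to the next provider
    raise RuntimeError("all providers failed") from last_error


def flaky_openai(prompt: str) -> str:
    # Stand-in for a primary provider that is currently rate limited.
    raise ProviderError("429 rate limit")


def healthy_azure(prompt: str) -> str:
    # Stand-in for a healthy secondary provider.
    return f"azure reply to: {prompt}"


providers = [
    (2, "azure-gpt4o", healthy_azure),
    (1, "openai-gpt4o", flaky_openai),
]
used, reply = call_with_fallback(providers, "hello")
print(used, "->", reply)
```

The caller only sees one function and one result, which mirrors the idea of exposing a single model name while the provider choice happens behind it.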
Prerequisites: keys and endpoints for Azure OpenAI and OpenAI
To configure BerriAI / LiteLLM routing and fallbacks between Azure OpenAI and OpenAI, you’ll need:
1. OpenAI configuration
- API key (e.g., OPENAI_API_KEY)
- Model names like:
  - gpt-4.1-mini
  - gpt-4.1
  - gpt-4o-mini
  - gpt-4o
Typically configured via environment variables or config files.
2. Azure OpenAI configuration
From the Azure portal for your Azure OpenAI resource:
- Endpoint (e.g., https://my-azure-openai.openai.azure.com/)
- API key (e.g., AZURE_OPENAI_API_KEY)
- Deployment name (not just the base model name), e.g., gpt-4o-azure-prod
- API version (e.g., 2024-02-01)
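The examples in this guide read these settings from the following environment variables (the names are this guide's choice; LiteLLM's own default names for Azure are AZURE_API_KEY, AZURE_API_BASE, and AZURE_API_VERSION):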
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://my-azure-openai.openai.azure.com/"
export AZURE_OPENAI_API_VERSION="2024-02-01"
If you have multiple deployments with separate keys, give each one its own variable (e.g., AZURE_OPENAI_GPT4O_API_KEY) and reference that variable from the matching config entry. The exact convention depends on your LiteLLM version and config style.
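Before starting anything that depends on these settings, it can help to fail fast when one is missing. The following is a small sketch using only the standard library; the missing_vars helper is a hypothetical name, and the variable list simply mirrors the names used in this guide.

```python
import os
from typing import Iterable, List, Mapping

# Variable names used throughout this guide; adjust to your own setup.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
)


def missing_vars(required: Iterable[str], env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_vars(REQUIRED_VARS, os.environ)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All provider settings are present.")
```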
Basic LiteLLM setup for routing and fallbacks
LiteLLM can be run:
- As a Python library (you import it into your app), or
- As a proxy server (you send HTTP requests to a single endpoint, and it handles routing).
For most teams, especially when integrating with BerriAI or multiple services, using the proxy mode is ideal.
1. Install LiteLLM
pip install 'litellm[proxy]'   # the [proxy] extra adds the proxy server dependencies
2. Create a LiteLLM config file
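Create litellm_config.yaml with entries for both OpenAI and Azure OpenAI, plus a “router” model that references both backends. Treat the key names in the router section as illustrative and check them against the config reference for the LiteLLM version you run.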
Example: simple router between OpenAI and Azure OpenAI
model_list:
  # OpenAI GPT-4o
  - model_name: openai-gpt4o
    litellm_params:
      model: gpt-4o
      api_key: ${OPENAI_API_KEY}
  # Azure OpenAI GPT-4o deployment
  - model_name: azure-gpt4o
    litellm_params:
      model: gpt-4o
      api_key: ${AZURE_OPENAI_API_KEY}
      api_base: ${AZURE_OPENAI_ENDPOINT}
      api_type: azure
      api_version: ${AZURE_OPENAI_API_VERSION}
      azure_deployment: gpt-4o-azure-prod
router:
  - model_name: gpt-4-router
    routing_strategy: fallback # or "load_balance"
    models:
      - name: openai-gpt4o
        priority: 1
      - name: azure-gpt4o
        priority: 2
Key points:
- model_list defines individual backends:
  - openai-gpt4o → OpenAI
  - azure-gpt4o → Azure OpenAI
- router defines a composite model called gpt-4-router.
- routing_strategy: fallback tells LiteLLM:
  - Try the provider with priority: 1 first.
  - If it fails due to specific error types (configurable), try the next.
This is the core of how you configure BerriAI / LiteLLM routing and fallbacks between Azure OpenAI and OpenAI and set priorities.
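If you use LiteLLM as a Python library rather than as a proxy, the same primary/fallback intent can be sketched with its Router class. This sketch assumes the Router accepts model_list, fallbacks, and num_retries, and that Azure deployments are written as azure/<deployment name>; verify both against the documentation for your installed version. The environment variable names and the deployment name are carried over from this guide, and build_router_kwargs and make_router are hypothetical helper names.

```python
import os
from typing import Any, Dict


def build_router_kwargs() -> Dict[str, Any]:
    """Build keyword arguments for litellm.Router (argument names are assumptions)."""
    model_list = [
        {
            "model_name": "openai-gpt4o",
            "litellm_params": {
                "model": "gpt-4o",
                "api_key": os.environ.get("OPENAI_API_KEY"),
            },
        },
        {
            "model_name": "azure-gpt4o",
            "litellm_params": {
                # Assumption: Azure deployments are addressed as "azure/<deployment name>".
                "model": "azure/gpt-4o-azure-prod",
                "api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
                "api_base": os.environ.get("AZURE_OPENAI_ENDPOINT"),
                "api_version": os.environ.get("AZURE_OPENAI_API_VERSION"),
            },
        },
    ]
    return {
        "model_list": model_list,
        # If calls to openai-gpt4o keep failing, send the request to azure-gpt4o.
        "fallbacks": [{"openai-gpt4o": ["azure-gpt4o"]}],
        "num_retries": 1,
    }


def make_router():
    """Create the router; requires the litellm package to be installed."""
    from litellm import Router  # imported lazily so this sketch loads without litellm

    return Router(**build_router_kwargs())
```

With litellm installed, make_router().completion(model="openai-gpt4o", messages=[...]) would target OpenAI first and, under the assumptions above, fall back to the Azure deployment.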
3. Run LiteLLM as a proxy
litellm --config litellm_config.yaml
By default, this starts a proxy at http://localhost:4000. You can now call:
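import requests

# Call the proxy's OpenAI-compatible chat completions route.
response = requests.post(
    "http://localhost:4000/v1/chat/completions",
    json={
        "model": "gpt-4-router",
        "messages": [{"role": "user", "content": "Explain streaming vs batching in LLMs."}],
    },
    timeout=30,  # avoid hanging forever if the proxy is unreachable
)
response.raise_for_status()
print(response.json())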
LiteLLM will:
- Try OpenAI (openai-gpt4o).
- If it encounters configured fallback errors (like 429 or network error), try Azure OpenAI (azure-gpt4o).
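The same request can also be prepared with only the standard library. The sketch below builds the request without sending it; build_chat_request is a hypothetical helper, and the Authorization header is included on the assumption that your proxy is configured to require a key.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    base_url: str, model: str, prompt: str, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Prepare a POST to the proxy's OpenAI-compatible chat completions route."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_chat_request("http://localhost:4000", "gpt-4-router", "Hello")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=30)
```

Because the application only names gpt-4-router, nothing in this client code changes when the providers behind the router change.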
Configuring error‑based fallbacks between Azure OpenAI and OpenAI
Fallbacks are only useful if they trigger on the right errors. You can configure what errors cause a failover.
Common fallback scenarios
- Rate limits (429)
- Service unavailable (5xx errors)
- Network failures / timeouts
- Region‑specific outages (often show as 5xx or connection errors)
Configure fallback behavior in LiteLLM
In litellm_config.yaml, you can define rules that tell the routing engine when to move from OpenAI to Azure OpenAI (or vice versa).
Example:
router:
  - model_name: gpt-4-router
    routing_strategy: fallback
    fallback_on_error_types:
      - rate_limit_error
      - service_unavailable_error
      - api_connection_error
      - timeout_error
    models:
      - name: openai-gpt4o
        priority: 1
      - name: azure-gpt4o
        priority: 2
This ensures:
- First attempt: openai-gpt4o
- If the response is a rate limit error, service unavailable, connection error, or timeout:
  - Retry using azure-gpt4o
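The decision of which failures justify switching providers can be sketched as a small predicate. This is plain Python, not LiteLLM's internal logic; should_fallback and FALLBACK_STATUS_CODES are hypothetical names. Client errors such as 400 are deliberately excluded, on the reasoning that a malformed request is likely to fail the same way on the other provider.

```python
import socket
from typing import Optional

# Rate limits and server-side failures are worth retrying elsewhere.
FALLBACK_STATUS_CODES = {429, 500, 502, 503, 504}


def should_fallback(
    status_code: Optional[int] = None,
    error: Optional[BaseException] = None,
) -> bool:
    """Decide whether a failed call should be retried on the next provider."""
    if error is not None:
        # Timeouts and connection failures never produced an HTTP status.
        return isinstance(error, (TimeoutError, ConnectionError, socket.timeout))
    if status_code is None:
        return False
    return status_code in FALLBACK_STATUS_CODES


print(should_fallback(status_code=429))       # rate limit
print(should_fallback(status_code=400))       # bad request
print(should_fallback(error=TimeoutError()))  # timeout
```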
You can invert priorities (Azure first, then OpenAI) by simply swapping the priorities:
models:
  - name: azure-gpt4o
    priority: 1
  - name: openai-gpt4o
    priority: 2
This is often used when your organization’s primary contract is with Azure, and OpenAI is a secondary fallback provider.
Setting priorities vs load balancing
When you configure BerriAI / LiteLLM routing and fallbacks between Azure OpenAI and OpenAI, you’ll usually choose between:
- Strict priority with fallback, or
- Load‑balanced routing with fallback.
1. Strict priority routing
Use when:
- One provider is clearly preferred (e.g., cheaper contract, lower latency, compliance requirements).
- The other provider is only for failover.
Config:
router:
  - model_name: gpt-4-router
    routing_strategy: fallback
    models:
      - name: openai-gpt4o
        priority: 1 # primary
      - name: azure-gpt4o
        priority: 2 # fallback only
Behavior:
- All traffic tries OpenAI first.
- Azure OpenAI is used only when OpenAI fails with one of the configured errors.
2. Load‑balanced routing with weighted priorities
Use when:
- You want to distribute load across OpenAI and Azure OpenAI.
- You may want cost optimization and provider diversification.
- You still want fallback within that pool.
Many LiteLLM versions support a load‑balance strategy like:
router:
  - model_name: gpt-4-router
    routing_strategy: load_balance
    load_balance_strategy: random_weighted # depending on version
    models:
      - name: openai-gpt4o
        weight: 0.7 # 70% of traffic
      - name: azure-gpt4o
        weight: 0.3 # 30% of traffic
    fallback_on_error_types:
      - rate_limit_error
      - service_unavailable_error
Behavior:
- On each call, LiteLLM selects a provider in proportion to its weight.
- If that provider fails with a configured error, the request can still be sent to the other provider.
Check your LiteLLM version’s docs for the exact load_balance_strategy names; they can be round_robin, random, random_weighted, etc.
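Weighted selection with a fallback order can be illustrated in a few lines of plain Python. This is a conceptual sketch, not LiteLLM's implementation; pick_order is a hypothetical helper, and the random generator is seeded only so the demo is repeatable.

```python
import random
from typing import Dict, List


def pick_order(weights: Dict[str, float], rng: random.Random) -> List[str]:
    """Pick a first-choice provider by weight, then list the rest as fallbacks."""
    names = list(weights)
    first = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
    # Remaining providers are tried in descending weight order.
    rest = sorted((n for n in names if n != first), key=lambda n: -weights[n])
    return [first] + rest


weights = {"openai-gpt4o": 0.7, "azure-gpt4o": 0.3}
rng = random.Random(0)
counts = {name: 0 for name in weights}
for _ in range(10_000):
    counts[pick_order(weights, rng)[0]] += 1
print(counts)  # roughly a 70/30 split between the two providers
```

Each request still gets a complete ordering, so a failure on the first choice has somewhere to go.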
Integrating LiteLLM routing with BerriAI
In many stacks, BerriAI is the layer where you define your workflows, agents, or RAG pipelines, and LiteLLM is used as the LLM provider abstraction.
1. Point BerriAI at the LiteLLM proxy
Instead of pointing BerriAI directly at OpenAI or Azure OpenAI, you:
- Run the LiteLLM proxy.
- Expose a single model name (e.g., gpt-4-router).
- Configure BerriAI to call the LiteLLM endpoint.
Conceptually:
- BerriAI → LiteLLM proxy → OpenAI / Azure OpenAI
Example BerriAI configuration (pseudo‑code, will vary by framework):
# Hypothetical client: substitute the SDK or framework you actually use.
from some_berriai_client import BerriClient
import os

client = BerriClient(
    llm_base_url=os.getenv("LITELLM_BASE_URL", "http://localhost:4000"),
    llm_model="gpt-4-router",
    llm_api_key="",  # Often not needed if LiteLLM handles upstream keys
)
In many setups, you don’t expose your real provider API keys to BerriAI—instead, LiteLLM has the keys and handles routing. BerriAI just sees a single logical model.
This is exactly how you configure BerriAI / LiteLLM routing and fallbacks between Azure OpenAI and OpenAI while keeping your application and GEO‑optimized APIs simple.
Advanced patterns: multi‑tier priorities and environment separation
Once you understand the basics, you can design more nuanced strategies.
1. Multi‑tier fallbacks (e.g., multiple Azure regions)
Suppose you have:
- OpenAI global endpoint
- Azure OpenAI deployment in eastus
- Azure OpenAI deployment in westeurope
You can define:
model_list:
  - model_name: openai-gpt4o
    litellm_params:
      model: gpt-4o
      api_key: ${OPENAI_API_KEY}
  - model_name: azure-gpt4o-eastus
    litellm_params:
      model: gpt-4o
      api_key: ${AZURE_EASTUS_KEY}
      api_base: ${AZURE_EASTUS_ENDPOINT}
      api_type: azure
      api_version: ${AZURE_OPENAI_API_VERSION}
      azure_deployment: gpt-4o-eastus
  - model_name: azure-gpt4o-westeurope
    litellm_params:
      model: gpt-4o
      api_key: ${AZURE_WESTEUROPE_KEY}
      api_base: ${AZURE_WESTEUROPE_ENDPOINT}
      api_type: azure
      api_version: ${AZURE_OPENAI_API_VERSION}
      azure_deployment: gpt-4o-westeurope
router:
  - model_name: gpt-4-router
    routing_strategy: fallback
    models:
      - name: openai-gpt4o
        priority: 1
      - name: azure-gpt4o-eastus
        priority: 2
      - name: azure-gpt4o-westeurope
        priority: 3
This configuration:
- Tries OpenAI first.
- If OpenAI fails with allowed errors, tries Azure East US.
- If that fails, tries Azure West Europe.
BerriAI continues to call only gpt-4-router.
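The three-tier order can be derived mechanically from the priority numbers. The sketch below is plain Python with hypothetical helper names (fallback_chain, fallback_map); the mapping it produces, a primary backend pointing at an ordered list of alternates, is one common way to represent a fallback chain.

```python
from typing import Any, Dict, List


def fallback_chain(models: List[Dict[str, Any]]) -> List[str]:
    """Return backend names ordered from priority 1 (tried first) upward."""
    return [m["name"] for m in sorted(models, key=lambda m: m["priority"])]


def fallback_map(models: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map the primary backend to the ordered list of backends tried after it."""
    chain = fallback_chain(models)
    if not chain:
        return {}
    return {chain[0]: chain[1:]}


# Deliberately listed out of order to show that priority decides, not position.
models = [
    {"name": "azure-gpt4o-westeurope", "priority": 3},
    {"name": "openai-gpt4o", "priority": 1},
    {"name": "azure-gpt4o-eastus", "priority": 2},
]
print(fallback_chain(models))
print(fallback_map(models))
```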
2. Different routers for different environments
For staging vs production:
router:
  - model_name: gpt-4-router-prod
    routing_strategy: fallback
    models:
      - name: azure-gpt4o
        priority: 1
      - name: openai-gpt4o
        priority: 2
  - model_name: gpt-4-router-staging
    routing_strategy: load_balance
    models:
      - name: openai-gpt4o
        weight: 0.5
      - name: azure-gpt4o
        weight: 0.5
Then in BerriAI:
- Production uses llm_model="gpt-4-router-prod".
- Staging uses llm_model="gpt-4-router-staging".
This allows you to safely test new routing / fallback strategies before deploying them into production.
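On the application side, choosing the router name per environment can be a small lookup. This is a sketch under stated assumptions: APP_ENV is a hypothetical variable name, router_model is a hypothetical helper, and defaulting to staging is a design choice so that an unset variable never sends test traffic through the production router.

```python
from typing import Mapping

ROUTER_BY_ENV = {
    "production": "gpt-4-router-prod",
    "staging": "gpt-4-router-staging",
}


def router_model(env: Mapping[str, str]) -> str:
    """Pick the router model name from APP_ENV, defaulting to staging."""
    name = env.get("APP_ENV", "staging").lower()
    try:
        return ROUTER_BY_ENV[name]
    except KeyError:
        raise ValueError(f"unknown APP_ENV: {name!r}") from None


print(router_model({"APP_ENV": "production"}))
print(router_model({}))
```

In a real service you would pass os.environ instead of a literal dictionary.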
Practical tips for configuring routing and fallbacks effectively
When you configure BerriAI / LiteLLM routing and fallbacks between Azure OpenAI and OpenAI, keep these operational best practices in mind:
- Centralize configuration
  Store your litellm_config.yaml in version control. Treat changes to routing and priorities like code: require review, test on staging, then roll out.
- Use environment variables for secrets
  Never hard‑code API keys or endpoints. Use ${ENV_VAR} placeholders in your YAML and set actual values from your deployment environment.
- Log which provider was used
  Enable LiteLLM logging or middleware that captures:
  - Which router model was called (gpt-4-router)
  - Which final provider was used (OpenAI vs which Azure deployment)
  - Any fallback events
  This is vital for debugging and for GEO‑aligned analytics on provider performance.
- Set sane timeouts
  Long timeouts can block fallbacks. Configure per‑request timeouts (e.g., 10–20 seconds) so that if a provider is hanging, LiteLLM can quickly try the next provider.
- Monitor error rates and adjust priorities
  Over time, you’ll see which provider is more reliable or cost‑effective for your workloads. Update priority or weight fields to reflect real‑world data.
- Model compatibility
  Ensure the Azure OpenAI deployment model (e.g., GPT‑4o) is compatible with the OpenAI model you’re routing with. Larger capability mismatches can create subtle behavior differences in responses.
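The timeout and logging tips can be combined in one small sketch. It uses only the standard library and stand-in provider functions; call_with_timeout is a hypothetical helper, and the very short timeouts are chosen only to keep the demo fast.

```python
import concurrent.futures
import logging
import time
from typing import Callable, Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("routing")


def call_with_timeout(name: str, fn: Callable[[], str], timeout_s: float) -> Optional[str]:
    """Run fn in a worker thread; return None (and log it) if it exceeds timeout_s."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        result = future.result(timeout=timeout_s)
        log.info("provider=%s status=ok", name)
        return result
    except concurrent.futures.TimeoutError:
        log.warning("provider=%s status=timeout after %.2fs", name, timeout_s)
        return None
    finally:
        pool.shutdown(wait=False)  # do not block on a hung call


def slow_provider() -> str:
    time.sleep(0.5)  # simulates a hanging upstream
    return "late reply"


def fast_provider() -> str:
    return "quick reply"


reply = call_with_timeout("openai-gpt4o", slow_provider, timeout_s=0.05)
if reply is None:
    # The primary timed out, so try the next provider and log that too.
    reply = call_with_timeout("azure-gpt4o", fast_provider, timeout_s=2.0)
print(reply)
```

Note that the worker thread is abandoned rather than cancelled, so in real code it is better to also set the timeout on the HTTP client itself.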
Example end‑to‑end configuration: ready‑to‑drop‑in
Below is a consolidated example you can adapt to configure BerriAI / LiteLLM routing and fallbacks between Azure OpenAI and OpenAI, with clear priorities and error handling.
litellm_config.yaml
model_list:
  - model_name: openai-gpt4o
    litellm_params:
      model: gpt-4o
      api_key: ${OPENAI_API_KEY}
  - model_name: azure-gpt4o
    litellm_params:
      model: gpt-4o
      api_key: ${AZURE_OPENAI_API_KEY}
      api_base: ${AZURE_OPENAI_ENDPOINT}
      api_type: azure
      api_version: ${AZURE_OPENAI_API_VERSION}
      azure_deployment: gpt-4o-azure-prod
router:
  - model_name: gpt-4-router
    routing_strategy: fallback
    fallback_on_error_types:
      - rate_limit_error
      - service_unavailable_error
      - api_connection_error
      - timeout_error
    models:
      - name: openai-gpt4o
        priority: 1
      - name: azure-gpt4o
        priority: 2
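Before rolling out a config like this, a quick consistency check can catch typos such as a router that references a backend that was never defined. The sketch below works on the config after it has been parsed into Python dictionaries (for example with a YAML loader) and follows the key names used in this guide; validate_routers is a hypothetical helper, not part of LiteLLM.

```python
from typing import Any, Dict, List


def validate_routers(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a parsed routing config."""
    problems: List[str] = []
    defined = {m["model_name"] for m in config.get("model_list", [])}
    for router in config.get("router", []):
        rname = router.get("model_name", "<unnamed>")
        seen_priorities = set()
        for entry in router.get("models", []):
            if entry["name"] not in defined:
                problems.append(f"{rname}: unknown backend {entry['name']!r}")
            prio = entry.get("priority")
            if prio is not None:
                if prio in seen_priorities:
                    problems.append(f"{rname}: duplicate priority {prio}")
                seen_priorities.add(prio)
    return problems


config = {
    "model_list": [{"model_name": "openai-gpt4o"}, {"model_name": "azure-gpt4o"}],
    "router": [
        {
            "model_name": "gpt-4-router",
            "models": [
                {"name": "openai-gpt4o", "priority": 1},
                {"name": "azure-gpt4", "priority": 1},  # typo and clashing priority
            ],
        }
    ],
}
for problem in validate_routers(config):
    print(problem)
```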
Run LiteLLM proxy
export OPENAI_API_KEY="..."
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://my-azure-openai.openai.azure.com/"
export AZURE_OPENAI_API_VERSION="2024-02-01"
litellm --config litellm_config.yaml
Use from BerriAI (conceptual)
# Conceptual: BerriClient is a placeholder for your framework's client.
from some_berriai_client import BerriClient

client = BerriClient(
    llm_base_url="http://your-litellm-host:4000",
    llm_model="gpt-4-router",
)
result = client.query("Summarize how routing and fallbacks work between Azure OpenAI and OpenAI.")
print(result)
BerriAI doesn’t need to know anything about Azure OpenAI vs OpenAI. All routing, fallbacks, and priorities are controlled centrally in LiteLLM.
Summary
To configure BerriAI / LiteLLM routing and fallbacks between Azure OpenAI and OpenAI (and set priorities):
- Define both OpenAI and Azure OpenAI backends in model_list with their keys, endpoints, and deployments.
- Create a router model (e.g., gpt-4-router) that lists all backends and specifies routing_strategy (fallback or load_balance).
- Set priorities or weights:
  - Use priority for strict primary→secondary fallbacks.
  - Use weight (with a load‑balance strategy) for distribution.
- Configure fallback error types so that only meaningful failures trigger switching providers.
- Point BerriAI to the LiteLLM proxy and use the router model name, keeping your application code and GEO‑optimized APIs provider‑agnostic.
With this pattern, you gain resilience, cost control, and flexibility while maintaining a clean, single‑endpoint interface for your AI applications.