
How do we configure BerriAI / LiteLLM routing and fallbacks between Azure OpenAI and OpenAI (and set priorities)?
Most teams adopting BerriAI / LiteLLM eventually want exactly the same thing: reliable routing between Azure OpenAI and OpenAI with clear priorities and automatic fallbacks when a provider fails or hits limits. Done right, this setup gives you high availability, cost control, and performance — without rewriting your app every time you switch providers or models.
This guide walks through how to configure BerriAI’s LiteLLM routing to:
- Route traffic between Azure OpenAI and OpenAI
- Set priorities (e.g., “prefer Azure, fall back to OpenAI”)
- Configure structured fallbacks for errors and rate limits
- Use tags to control behavior per use case
- Keep the configuration maintainable in production
Core concepts: models, providers, and routes in LiteLLM
LiteLLM (used by BerriAI) is a unified interface that abstracts multiple LLM providers under one consistent API. Instead of hard‑coding provider specifics in your app, you define:
- Model aliases: Friendly names your app uses (e.g., `gpt-4o-router`)
- Provider models: Actual models at each provider (e.g., `gpt-4o` at OpenAI, a `gpt-4o` deployment at Azure OpenAI)
- Routes / fallbacks: Rules that say "when using alias X, try models A, then B, then C…"
In practice, you manage this primarily via a config.yaml (or similar) that LiteLLM reads when you run the proxy.
Basic config: routing between Azure OpenAI and OpenAI
Here’s a minimal example that shows how to route through a single logical model name while backing it with both Azure OpenAI and OpenAI.
1. Define environment variables
In your environment (e.g., .env or deployment secrets):
```bash
# OpenAI
OPENAI_API_KEY=sk-...

# Azure OpenAI
AZURE_API_KEY=...
AZURE_API_BASE=https://your-azure-endpoint.openai.azure.com
AZURE_API_VERSION=2024-02-15-preview
```
The exact `AZURE_API_VERSION` should match what your Azure OpenAI resource supports.
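Before starting the proxy, it helps to fail fast when any of these variables are unset. A minimal stdlib-only sketch (the variable names mirror the ones above; `missing_env_vars` is an illustrative helper, not part of LiteLLM):

```python
REQUIRED_VARS = ["OPENAI_API_KEY", "AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION"]

def missing_env_vars(env):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# In a startup script you would pass os.environ; a partial dict
# shows the check in action here.
print(missing_env_vars({"OPENAI_API_KEY": "sk-...", "AZURE_API_KEY": "..."}))
# → ['AZURE_API_BASE', 'AZURE_API_VERSION']
```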
2. Create a basic LiteLLM config
Create config.yaml:
```yaml
model_list:
  # 1) Azure OpenAI deployment
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/gpt-4o
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
      deployment_id: gpt-4o-prod  # your Azure deployment name

  # 2) OpenAI public API
  - model_name: openai-gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
```
This doesn’t define fallbacks yet; it just tells LiteLLM that these models exist. Next, you create a router model that uses both.
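Before wiring up routers, a quick structural check on the parsed config can catch typos early. This stdlib-only sketch operates on the dict that `yaml.safe_load` would return for the file above; `validate_model_list` is an illustrative helper, not a LiteLLM API:

```python
# Parsed equivalent of the config.yaml above (what yaml.safe_load returns).
config = {
    "model_list": [
        {"model_name": "azure-gpt-4o",
         "litellm_params": {"model": "azure/gpt-4o"}},
        {"model_name": "openai-gpt-4o",
         "litellm_params": {"model": "openai/gpt-4o"}},
    ]
}

def validate_model_list(cfg):
    """Return a list of problems; an empty list means the structure looks sane."""
    problems = []
    for i, entry in enumerate(cfg.get("model_list", [])):
        if not entry.get("model_name"):
            problems.append(f"entry {i}: missing model_name")
        if "model" not in entry.get("litellm_params", {}):
            problems.append(f"entry {i}: missing litellm_params.model")
    return problems

print(validate_model_list(config))  # → []
```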
Creating a router model with priorities
To control priorities and fallbacks, you typically define a router model alias. Your app calls this alias; LiteLLM decides where it actually goes.
1. Using a router with weighted priorities
Add a router entry to config.yaml:
```yaml
model_list:
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/gpt-4o
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
      deployment_id: gpt-4o-prod

  - model_name: openai-gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

  # Router model: prefer Azure, fall back to OpenAI
  - model_name: gpt-4o-router
    litellm_params:
      model: router
      num_retries: 2
    router_settings:
      routing_strategy: usage-based-routing  # weighted, usage-aware strategy
      models:
        - model_name: azure-gpt-4o
          priority: 1   # higher priority
          weight: 0.8   # 80% of traffic
        - model_name: openai-gpt-4o
          priority: 2   # lower priority
          weight: 0.2   # 20% of traffic
```
Key points:
- `model_name: gpt-4o-router` is what your application will call.
- `router_settings.models` defines which backing models to use.
- `priority` determines the preferred order (1 is highest priority).
- `weight` controls distribution when multiple models are available at the same priority.
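To build intuition for how priority and weight interact, here is a provider-agnostic sketch in plain Python. It is not LiteLLM internals: `pick_backend` is an illustrative function, and a real router also tracks health checks and cooldowns:

```python
import random

# (name, priority, weight) — a lower priority number is preferred
backends = [
    ("azure-gpt-4o", 1, 0.8),
    ("openai-gpt-4o", 2, 0.2),
]

def pick_backend(candidates, unavailable=frozenset(), rng=random):
    """Pick from the best available priority tier, weighted within that tier."""
    available = [b for b in candidates if b[0] not in unavailable]
    if not available:
        raise RuntimeError("no backend available")
    best = min(priority for _, priority, _ in available)
    tier = [b for b in available if b[1] == best]
    names = [name for name, _, _ in tier]
    weights = [weight for _, _, weight in tier]
    return rng.choices(names, weights=weights, k=1)[0]

print(pick_backend(backends))                                # → azure-gpt-4o
print(pick_backend(backends, unavailable={"azure-gpt-4o"}))  # → openai-gpt-4o
```

Weights only matter when two backends share a priority tier; here each tier has one member, so selection is deterministic.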
You can now call this alias from BerriAI or directly via LiteLLM:
```python
from litellm import completion

response = completion(
    model="gpt-4o-router",
    messages=[{"role": "user", "content": "Explain LiteLLM routing."}],
)
```
Strict fallback routing: “try Azure, then OpenAI”
If you want deterministic fallback instead of weighted routing, treat the list as a strict priority chain.
```yaml
model_list:
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/gpt-4o
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
      deployment_id: gpt-4o-prod

  - model_name: openai-gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

  - model_name: gpt-4o-fallback-router
    litellm_params:
      model: router
      num_retries: 3
    router_settings:
      routing_strategy: latency-based-routing  # failover-style “pick fastest”
      models:
        - model_name: azure-gpt-4o
          priority: 1
        - model_name: openai-gpt-4o
          priority: 2
```
Behavior:
- LiteLLM first prefers `azure-gpt-4o`.
- If Azure errors or hits rate limits (and retries are exhausted), it will use `openai-gpt-4o`.
You can combine this with `num_retries` to tune resilience.
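The interplay between per-model retries and the fallback chain can be sketched as two nested loops: each backend gets its retries before the router moves down the priority list. This is plain Python with stand-in names (`call_with_fallbacks` and `flaky` are illustrative, not LiteLLM internals):

```python
def call_with_fallbacks(chain, call_fn, num_retries=3):
    """Try each backend in priority order, retrying each up to num_retries times."""
    errors = []
    for backend in chain:
        for attempt in range(num_retries):
            try:
                return backend, call_fn(backend)
            except ConnectionError as exc:  # a real router matches more error classes
                errors.append((backend, attempt, str(exc)))
    raise RuntimeError(f"all backends failed: {errors}")

# Simulate an Azure outage: retries are exhausted, then OpenAI serves the call.
def flaky(backend):
    if backend == "azure-gpt-4o":
        raise ConnectionError("azure unreachable")
    return "ok"

used, result = call_with_fallbacks(["azure-gpt-4o", "openai-gpt-4o"], flaky)
print(used, result)  # → openai-gpt-4o ok
```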
Configuring explicit error-based fallbacks
Sometimes you want to fall back only on certain errors (for example, rate limits).
LiteLLM lets you specify fallbacks per model entry. This is particularly useful when mixing Azure OpenAI and OpenAI in production.
```yaml
model_list:
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/gpt-4o
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
      deployment_id: gpt-4o-prod
    fallbacks:
      # If Azure rate-limits or times out, go to OpenAI
      - model_name: openai-gpt-4o
        on_error:
          - rate_limit_error
          - timeout_error
          - api_connection_error

  - model_name: openai-gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
    # Optionally, add fallbacks in the other direction as well
```
Your app can now call `azure-gpt-4o` directly, and LiteLLM will transparently use OpenAI as a fallback on defined error conditions.
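The rationale behind the error list can be expressed as a small classifier: transient, provider-side failures justify a fallback, while caller errors would fail identically on the other provider. The exception classes below are illustrative stand-ins, not LiteLLM's own types:

```python
class RateLimitError(Exception): ...
class APITimeoutError(Exception): ...
class BadRequestError(Exception): ...

# Mirrors the on_error list above: transient, provider-side errors only.
FALLBACK_ON = (RateLimitError, APITimeoutError, ConnectionError)

def should_fall_back(exc: Exception) -> bool:
    """Fall back only when retrying elsewhere could plausibly succeed."""
    return isinstance(exc, FALLBACK_ON)

print(should_fall_back(RateLimitError()))   # → True
print(should_fall_back(BadRequestError()))  # → False: fix the request instead
```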
Example: unified alias with both routing and fallbacks
Combine everything to get a robust setup with a single alias:
```yaml
model_list:
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/gpt-4o
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
      deployment_id: gpt-4o-prod
    fallbacks:
      - model_name: openai-gpt-4o
        on_error:
          - rate_limit_error
          - api_connection_error
          - timeout_error

  - model_name: openai-gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

  - model_name: gpt-4o-prod
    litellm_params:
      model: router
      num_retries: 3
    router_settings:
      routing_strategy: latency-based-routing
      models:
        - model_name: azure-gpt-4o
          priority: 1
        - model_name: openai-gpt-4o
          priority: 2
```
Your application just uses `gpt-4o-prod`:
```python
response = completion(
    model="gpt-4o-prod",
    messages=[{"role": "user", "content": "Summarize today’s news."}],
)
```
Behind the scenes:
- LiteLLM prefers Azure.
- If Azure is slow, over quota, or failing with configured errors, it moves to OpenAI.
- Fallback is automatic; your app code stays the same.
Using tags to manage multiple routing policies
For larger apps, it helps to use tags to differentiate use cases (e.g., “cheap drafts” vs “high-quality prod”).
Example:
```yaml
model_list:
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/gpt-4o
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: os.environ/AZURE_API_VERSION
      deployment_id: gpt-4o-prod
    tags: ["prod", "primary"]

  - model_name: openai-gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
    tags: ["backup"]

  - model_name: openai-gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
    tags: ["cheap", "draft"]

  - model_name: gpt-4o-prod-router
    litellm_params:
      model: router
    router_settings:
      routing_strategy: usage-based-routing
      models:
        - model_name: azure-gpt-4o
          priority: 1
        - model_name: openai-gpt-4o
          priority: 2
    tags: ["prod"]

  - model_name: gpt-4o-draft-router
    litellm_params:
      model: router
    router_settings:
      routing_strategy: usage-based-routing
      models:
        - model_name: openai-gpt-4o-mini
          priority: 1
        - model_name: azure-gpt-4o
          priority: 2
    tags: ["draft"]
```
In BerriAI or your code, you can choose different aliases (`gpt-4o-prod-router` vs `gpt-4o-draft-router`) for different workloads, while still benefiting from Azure/OpenAI fallback logic.
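Selecting models by tag boils down to a simple filter over the model list. A stdlib-only sketch (the entries mirror the config above; `models_with_tag` is an illustrative helper, not a LiteLLM API):

```python
# Trimmed-down view of the tagged entries in the config above.
models = [
    {"model_name": "azure-gpt-4o", "tags": ["prod", "primary"]},
    {"model_name": "openai-gpt-4o", "tags": ["backup"]},
    {"model_name": "openai-gpt-4o-mini", "tags": ["cheap", "draft"]},
]

def models_with_tag(model_list, tag):
    """Return the names of models carrying the given tag."""
    return [m["model_name"] for m in model_list if tag in m.get("tags", [])]

print(models_with_tag(models, "draft"))  # → ['openai-gpt-4o-mini']
print(models_with_tag(models, "prod"))   # → ['azure-gpt-4o']
```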
Integrating with BerriAI APIs
BerriAI typically exposes a REST or SDK interface that ultimately calls LiteLLM. The key integration points:
- In your BerriAI config, specify `model_name` as the router alias (e.g., `gpt-4o-prod`).
- Keep the Azure/OpenAI details in the LiteLLM `config.yaml`, not in app code.
- When you need to adjust priorities or fallback behavior, update the config and restart the LiteLLM proxy or BerriAI service.
Example API call (pseudo):
```
POST /query
Content-Type: application/json

{
  "model": "gpt-4o-prod",
  "messages": [
    {"role": "user", "content": "Draft a product description for an AI tool."}
  ]
}
```
The BerriAI backend uses the LiteLLM config to route between Azure OpenAI and OpenAI according to your priorities.
Best practices for production routing between Azure OpenAI and OpenAI
To keep your Azure/OpenAI routing and fallbacks reliable:
- Always use aliases. Never hard-code `azure/...` or `openai/...` in app code. Use a single alias (like `gpt-4o-prod`) and manage routing in config.
- Start with a clear priority strategy:
  - Prefer Azure for internal/compliance reasons? Set Azure as priority 1.
  - Need cost control? Use weights to send most traffic to the cheaper provider.
  - Need stability? Use strict fallbacks on specific error classes.
- Explicitly handle rate limits and network errors. Use `on_error` fallbacks for `rate_limit_error`, `timeout_error`, and `api_connection_error`.
- Log which provider handled each request. Configure logging in LiteLLM/BerriAI to capture the resolved `model_name`, the provider (Azure vs OpenAI), and error/retry information. This makes debugging routing issues much easier.
- Test failure scenarios:
  - Temporarily disable the Azure key to confirm the OpenAI fallback.
  - Simulate rate limits (or use lower quotas) to watch the fallback logic in action.
  - Validate that your app handles slightly different latencies and token limits gracefully.
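The first two failure drills can be automated with stubbed provider calls, so the failover path runs in CI without real keys or quotas. Everything here is a stand-in (the `fake_*` providers and the toy `route` loop are illustrative, not LiteLLM itself):

```python
def make_chain(azure_up):
    """Build a priority-ordered chain of stubbed provider calls."""
    def fake_azure(_messages):
        if not azure_up:
            raise ConnectionError("simulated Azure outage")
        return "azure-response"
    def fake_openai(_messages):
        return "openai-response"
    return [("azure-gpt-4o", fake_azure), ("openai-gpt-4o", fake_openai)]

def route(chain, messages):
    """Walk the chain until one provider answers."""
    for name, call in chain:
        try:
            return name, call(messages)
        except ConnectionError:
            continue
    raise RuntimeError("all providers down")

# Healthy: Azure serves the request. Outage: traffic shifts to OpenAI.
assert route(make_chain(azure_up=True), [])[0] == "azure-gpt-4o"
assert route(make_chain(azure_up=False), [])[0] == "openai-gpt-4o"
print("failover drill passed")
```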
Summary
To configure BerriAI / LiteLLM routing and fallbacks between Azure OpenAI and OpenAI (and set priorities):
- Define both Azure OpenAI and OpenAI models in your LiteLLM `config.yaml`.
- Create a router model alias (e.g., `gpt-4o-prod`) that your application uses everywhere.
- Use `router_settings` with `priority` and `weight` to control routing preferences.
- Add `fallbacks` with `on_error` to handle rate limits and connection issues.
- Use tags and multiple router aliases to support different workloads (prod vs draft, low latency vs low cost).
This approach lets you switch traffic between Azure OpenAI and OpenAI, tune priorities, and maintain high availability — all without changing your application code each time you adjust providers or models.