LLM gateway that can route by metadata (user tier, locale, intent) and switch models without code changes

Most teams don’t lose users because their LLM is “not smart enough.” They lose them because the wrong model is used for the wrong request, latency blows past conversational thresholds, or pricing explodes when usage grows. An LLM gateway that can route by metadata—user tier, locale, intent, emotion, session length—and switch models without code changes is how you escape that trap.

Quick Answer: The LLM gateway you’re describing is exactly what Inworld Router is built to do. It lets you route each request across OpenAI, Anthropic, Google, xAI, Groq, Mistral, and 200+ models based on metadata (tier, locale, intent, emotion, language, more) with built‑in failover, A/B testing, and analytics—so you can change routing rules and swap models without redeploying or touching application code.

Why This Matters

If you’re still hard‑coding a single LLM into your product, you’re accepting three unnecessary risks: model outages, silent quality regressions, and unpredictable costs. A metadata‑aware LLM gateway turns model choice into an operational control surface instead of a build-time decision.

That’s how you:

Give free and pro users different model stacks without branching your code.
Route multilingual traffic to the best model per locale and latency zone.
Experiment with new models or prompts safely, with rollback and failover, instead of “big bang” migrations.

Inworld Router was built for exactly this: a provider‑agnostic, no‑latency‑added gateway where you decide how to route based on user and request context—not based on what’s hard‑coded into your backend.

Key Benefits:

Fine-grained routing by metadata: Use fields like tier, country, language, intent, or emotion to pick the right model and configuration per request.
Switch models without code changes: Update routing strategies, A/B tests, and failover from the Router config surface, not from your app code or deployment pipeline.
Balance UX and cost predictably: Use high‑quality, higher‑cost models where they matter (e.g., pro users, critical flows) and cheaper or faster models elsewhere, while tracking the actual impact via Router analytics.

Core Concepts & Key Points

Concept	Definition	Why it's important
Metadata-conditioned routing	Routing each request to a model based on metadata (e.g., `tier=pro`, `locale=de-DE`, `intent=search`, `emotion=angry`) passed alongside the prompt.	Lets you tailor model selection to user tier, locale, and context instead of a one‑size‑fits‑all LLM, improving UX and cost alignment.
Provider-agnostic LLM gateway	A unified API layer (like Inworld Router) that fronts multiple providers (OpenAI, Anthropic, Google, xAI, Groq, Mistral, more) and 200+ models.	Decouples your product from any single vendor and makes failover, experimentation, and migrations possible without app code changes.
No-code routing control (A/B, failover, tiering)	Configuration-layer rules for A/B testing, tier-based routing, and automatic failover, managed outside your application binaries.	Lets product and platform teams adjust models, pricing tiers, and experiments in hours instead of release cycles, with lower risk and better observability.

How It Works (Step-by-Step)

At a high level, Inworld Router gives you a single, OpenAI‑compatible endpoint. You send messages plus metadata; Router uses those fields to select a model, apply any experiment arms or tiering, and then forward the request to the chosen provider—without adding latency on top.

1. Attach metadata to each request

You keep your usual chat/completions shape, but add metadata that describes the user and the request. Example with the inworld/saas-tiers router model:

```bash
curl https://api.inworld.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -d '{
    "model": "inworld/saas-tiers",
    "messages": [
      { "role": "user", "content": "Help me debug this billing issue." }
    ],
    "extra_body": {
      "metadata": {
        "tier": "pro",
        "country": "US",
        "locale": "en-US",
        "intent": "support_billing",
        "session_turns": 7
      }
    }
  }'
```
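For reference, the same request can be assembled in Python with only the standard library. This is a minimal sketch that mirrors the curl call above: the endpoint, the `inworld/saas-tiers` model ID, the `Basic` auth scheme, and the `extra_body.metadata` nesting are copied from that example, while the helper name `build_router_request` is hypothetical. It builds the request without sending it.

```python
import json
import os
import urllib.request

# Endpoint and model ID copied from the curl example above.
ROUTER_URL = "https://api.inworld.ai/v1/chat/completions"
ROUTER_MODEL = "inworld/saas-tiers"


def build_router_request(prompt: str, metadata: dict, api_key: str) -> urllib.request.Request:
    """Build, but do not send, a chat/completions request that carries routing metadata.

    Hypothetical helper: the JSON body mirrors the curl example field for field.
    """
    payload = {
        "model": ROUTER_MODEL,  # router model: Router picks the concrete LLM
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {"metadata": metadata},  # nesting copied from the curl example
    }
    return urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {api_key}",  # auth scheme as in the curl example
        },
        method="POST",
    )


req = build_router_request(
    "Help me debug this billing issue.",
    {
        "tier": "pro",
        "country": "US",
        "locale": "en-US",
        "intent": "support_billing",
        "session_turns": 7,
    },
    api_key=os.environ.get("INWORLD_API_KEY", "<your-key>"),
)
# Sending it needs a valid key and network access:
# with urllib.request.urlopen(req) as resp:
#     completion = json.load(resp)
```

If you use the OpenAI Python SDK instead of raw HTTP, note that its `extra_body` argument merges its keys into the top level of the request JSON rather than sending a literal `extra_body` field, so check Inworld's documentation for the nesting Router expects.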

Under the hood:

tier might push Pro users to a higher‑quality Claude Sonnet 4.6 or Qwen3 Max path.
country / locale can route to models that perform better or faster in that language/region.
intent directs “support” vs “search” vs “creative” queries to different reasoning profiles.
session_turns can trigger context-length strategies (e.g., summary vs full history).
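The `session_turns` bullet is the least obvious of the four, so here is one way such a context-length strategy could look on the application side. This is a sketch under stated assumptions: the threshold of 6 turns, the choice to keep the last 2 messages verbatim, and the placeholder `summarize` function are all hypothetical (a real summarizer would likely call a cheap, fast model).

```python
from typing import Dict, List

Message = Dict[str, str]

SUMMARY_THRESHOLD = 6  # hypothetical: sessions longer than this get summarized
KEEP_RECENT = 2        # hypothetical: number of latest messages kept verbatim


def summarize(messages: List[Message]) -> str:
    """Placeholder summarizer; a real one might call a cheap, fast model."""
    return " | ".join(m["content"][:40] for m in messages)


def build_context(history: List[Message], session_turns: int) -> List[Message]:
    """Full history for short sessions; summary plus recent messages for long ones."""
    if session_turns <= SUMMARY_THRESHOLD or len(history) <= KEEP_RECENT:
        return list(history)
    older, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    summary = {"role": "system", "content": "Conversation so far: " + summarize(older)}
    return [summary] + recent


history = [{"role": "user", "content": f"turn {i}"} for i in range(8)]
short_ctx = build_context(history[:3], session_turns=3)  # returned unchanged
long_ctx = build_context(history, session_turns=8)       # summary + last 2 messages
```

The design choice is to trade exact recall of early turns for a bounded prompt size, which keeps latency and token cost flat as sessions grow.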

2. Router selects the best model using your rules

Inworld Router is configured with routing strategies that map metadata → model choice. Examples you can encode:

Tier-based routing
- tier=free → Gemini 3 Flash
- tier=pro → Claude Sonnet 4.6
- tier=enterprise → Claude Opus 4.6 or Qwen3 Max
Locale-based routing
- language=zh → Qwen3 Max
- language=en & country=US → Grok 4.1 Fast
- language=de → Gemini 2.5 Flash
Intent-based routing
- intent=code → specialized code‑strong model
- intent=creative → more permissive, creative‑writing tuned model
- intent=search → cheaper, high‑speed model for retrieval‑style responses
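One way to picture these strategies is as an ordered list of match rules evaluated top to bottom, where the first match wins. The sketch below is a hypothetical, simplified model of that idea, not Inworld Router's actual configuration format; the model names come from the lists above, while `fast-cheap-model` and the default model are placeholder assumptions.

```python
from typing import Dict, List, Tuple

Metadata = Dict[str, object]
# Each rule: (conditions that must all match, target model). Order is priority.
Rule = Tuple[Dict[str, object], str]

RULES: List[Rule] = [
    ({"intent": "search"}, "fast-cheap-model"),         # hypothetical model label
    ({"language": "zh"}, "Qwen3 Max"),
    ({"language": "en", "country": "US"}, "Grok 4.1 Fast"),
    ({"language": "de"}, "Gemini 2.5 Flash"),
    ({"tier": "enterprise"}, "Claude Opus 4.6"),
    ({"tier": "pro"}, "Claude Sonnet 4.6"),
    ({"tier": "free"}, "Gemini 3 Flash"),
]
DEFAULT_MODEL = "Gemini 3 Flash"  # assumption: used when no rule matches


def select_model(metadata: Metadata, rules: List[Rule] = RULES) -> str:
    """Return the target of the first rule whose conditions all match the metadata."""
    for conditions, model in rules:
        if all(metadata.get(key) == value for key, value in conditions.items()):
            return model
    return DEFAULT_MODEL
```

Rule order is the priority system: here intent and locale rules sit above tier rules, so a pro user writing in Chinese gets Qwen3 Max. Putting the tier rules first would make the locale rules unreachable for any request that carries a tier, which is the kind of trade-off worth deciding explicitly.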

Router applies these strategies with:

A/B testing: Split traffic between, say, Gemini 3 Flash and Qwen3 Max for tier=free users to see which performs better.
Failover: If one provider is degraded or returns errors, automatically fail over to an alternate without your app needing to know.
Tiering: Configure primary and secondary models by tier, intent, or other metadata, so you can mix cost and quality intentionally.
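The A/B and failover behaviors above can be approximated in a few lines, which helps clarify what the gateway is doing on your behalf. This is a sketch, not Router's implementation: A/B assignment hashes a hypothetical `user_id` (with a hypothetical experiment salt) so each user lands in a stable arm, and failover walks an ordered model list, moving on whenever the stand-in provider call raises the hypothetical `ProviderError`.

```python
import hashlib
from typing import Callable, List, Sequence, Tuple


def ab_arm(user_id: str, arms: Sequence[Tuple[str, float]], salt: str = "exp-1") -> str:
    """Deterministically assign a user to an arm; weights should sum to 1.0."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    cumulative = 0.0
    for model, weight in arms:
        cumulative += weight
        if point < cumulative:
            return model
    return arms[-1][0]  # guard against floating-point rounding in the weights


class ProviderError(Exception):
    """Stand-in for a provider outage, timeout, or 5xx response."""


def call_with_failover(models: List[str], call_model: Callable[[str], str]) -> Tuple[str, str]:
    """Try models in order; return (model_used, response) from the first success."""
    errors = []
    for model in models:
        try:
            return model, call_model(model)
        except ProviderError as exc:
            errors.append(f"{model}: {exc}")
    raise ProviderError("all models failed: " + "; ".join(errors))


# Usage: a 50/50 split for tier=free, and a simulated Sonnet outage.
arms = [("Gemini 3 Flash", 0.5), ("Qwen3 Max", 0.5)]
arm = ab_arm("user-42", arms)  # the same user always gets the same arm


def flaky_call(model: str) -> str:
    if model == "Claude Sonnet 4.6":
        raise ProviderError("503 Service Unavailable")
    return f"answer from {model}"


used, reply = call_with_failover(["Claude Sonnet 4.6", "Claude Opus 4.6"], flaky_call)
```

Hashing the user ID, rather than drawing a random number per request, keeps each user's experience consistent across a session, which makes quality comparisons between arms cleaner.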

Router is designed to add no meaningful latency beyond what the target model itself introduces, and it is streaming‑native.

3. Change routing, not code

When performance, prices, or model quality change, you update Router configuration—not your application code:

Swap Claude Sonnet → Gemini 3 Flash for tier=free traffic.
Increase the proportion of traffic going to a new model in an A/B test.
Add a new rule for intent=“safety_review” that routes to a conservative model.
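Concretely, "change routing, not code" means the routing table lives in data that is loaded at runtime rather than in `if` statements. The sketch below assumes a hypothetical JSON config shape (`routes` and `default` keys) and a placeholder `conservative-model` label; it applies the first and third changes from the list above as pure config edits while the `route` function stays identical. With Inworld Router these edits happen in the Router's own configuration; the JSON here only illustrates the principle.

```python
import json
from typing import Any, Dict

# Hypothetical config shape; in practice this would come from a file or config store.
CONFIG_V1 = """
{
  "default": "Claude Sonnet 4.6",
  "routes": [
    {"match": {"tier": "free"}, "model": "Claude Sonnet 4.6"}
  ]
}
"""


def route(config: Dict[str, Any], metadata: Dict[str, Any]) -> str:
    """Application code: unchanged no matter how the config evolves."""
    for rule in config["routes"]:
        if all(metadata.get(k) == v for k, v in rule["match"].items()):
            return rule["model"]
    return config["default"]


config_v1 = json.loads(CONFIG_V1)

# Config-only changes: swap the tier=free model and add a safety_review rule on top.
config_v2 = json.loads(CONFIG_V1)
config_v2["routes"][0]["model"] = "Gemini 3 Flash"
config_v2["routes"].insert(
    0, {"match": {"intent": "safety_review"}, "model": "conservative-model"}
)
```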

Because the API contract is unified and OpenAI‑compatible:

Client libraries don’t change.
Call shapes remain the same.
You can iterate on routing strategy daily without deploying new binaries.

Router analytics give you visibility into:

Which models are receiving traffic and for which metadata slices.
Error rates and latency per model and per routing rule.
Cost baselines, computed from each provider's $/1M‑token rates with no markup.
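With pass-through pricing, the cost arithmetic is simple: tokens divided by one million, times the provider's per-1M rate, summed over input and output. The sketch below aggregates hypothetical request logs into per-model error rate, median latency, and cost. The log fields, the `model-a`/`model-b` names, and the prices in `PRICE_PER_1M` are made-up placeholders, not real provider rates, and this is not the Router analytics schema.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple, TypedDict


class LogRecord(TypedDict):
    model: str
    latency_ms: float
    ok: bool
    input_tokens: int
    output_tokens: int


# Placeholder (input, output) prices in USD per 1M tokens; not real provider rates.
PRICE_PER_1M: Dict[str, Tuple[float, float]] = {
    "model-a": (1.00, 4.00),
    "model-b": (0.10, 0.40),
}


def request_cost(rec: LogRecord) -> float:
    """Cost of one request: token counts times per-1M-token prices."""
    in_price, out_price = PRICE_PER_1M[rec["model"]]
    return (rec["input_tokens"] * in_price + rec["output_tokens"] * out_price) / 1_000_000


def summarize_logs(records: List[LogRecord]) -> Dict[str, Dict[str, float]]:
    """Group logs by model and compute request count, error rate, p50 latency, cost."""
    by_model: Dict[str, List[LogRecord]] = defaultdict(list)
    for rec in records:
        by_model[rec["model"]].append(rec)
    report = {}
    for model, recs in by_model.items():
        report[model] = {
            "requests": len(recs),
            "error_rate": sum(not r["ok"] for r in recs) / len(recs),
            "p50_latency_ms": median(r["latency_ms"] for r in recs),
            "cost_usd": sum(request_cost(r) for r in recs),  # counts tokens as logged
        }
    return report


logs: List[LogRecord] = [
    {"model": "model-a", "latency_ms": 900, "ok": True, "input_tokens": 1000, "output_tokens": 500},
    {"model": "model-a", "latency_ms": 1500, "ok": False, "input_tokens": 1000, "output_tokens": 0},
    {"model": "model-b", "latency_ms": 300, "ok": True, "input_tokens": 2000, "output_tokens": 1000},
]
report = summarize_logs(logs)
```

For example, the first `model-a` request costs (1,000 × $1.00 + 500 × $4.00) / 1,000,000 = $0.003.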

Common Mistakes to Avoid

Hard-coding a single model everywhere:
This makes migrations painful and prevents you from optimizing for different user tiers or locales. Instead, front everything with Router and encode your strategy in metadata and routing rules.
Treating metadata as an afterthought:
If you’re not passing tier, locale, intent, and session_turns, you can’t route intelligently. Standardize metadata in your app (e.g., attach a context object on every call) and enforce it at the edge.
Experimenting without guardrails:
Swapping models directly in production code is how regressions slip in. Use Router’s A/B testing and failover to introduce new models gradually, with a clear rollback path.

Real-World Example

Imagine you’re running a global support assistant with free and paid plans:

Free users:
- Mostly short FAQ queries.
- You route them to Gemini 3 Flash (fast, cost‑efficient), with low context limits.
- For locale values where you see weaker quality, you test Qwen3 Max via A/B.
Pro users:
- Complex multi‑turn debugging conversations.
- You route them to Claude Sonnet 4.6 with longer context and richer tool calling.
- If Sonnet latency spikes, Router automatically fails over to Claude Opus 4.6.
Enterprise users:
- High‑stakes support where hallucinations are unacceptable.
- You route intent=safety_critical queries to your most conservative, audited model.
- You keep a parallel A/B arm trying a new provider for non‑critical intents only.
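The support-assistant scenario can be written down as a per-tier policy plus two overrides. This sketch is hypothetical throughout: the `POLICY` structure, the `audited-conservative-model` and `new-provider-model` labels, the use of Qwen3 Max as the enterprise fallback, and the 10% experiment share are illustrative assumptions, not the Router config format. It shows the key property of the example: enterprise `safety_critical` traffic never enters the experiment, while other enterprise intents may.

```python
import hashlib
from typing import Dict, List

POLICY: Dict[str, Dict[str, object]] = {
    "free": {"primary": "Gemini 3 Flash", "fallbacks": []},
    "pro": {"primary": "Claude Sonnet 4.6", "fallbacks": ["Claude Opus 4.6"]},
    "enterprise": {"primary": "Claude Opus 4.6", "fallbacks": ["Qwen3 Max"]},
}
SAFETY_MODEL = "audited-conservative-model"  # hypothetical label
EXPERIMENT_MODEL = "new-provider-model"      # hypothetical label
EXPERIMENT_SHARE = 0.10                      # hypothetical share of eligible users


def in_experiment(user_id: str) -> bool:
    """Stable per-user bucketing: hash to [0, 1) and compare with the share."""
    bucket = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:8], 16) / 2**32
    return bucket < EXPERIMENT_SHARE


def models_to_try(tier: str, intent: str, user_id: str) -> List[str]:
    """Return the ordered list of models to try for one request."""
    policy = POLICY[tier]
    if tier == "enterprise" and intent == "safety_critical":
        return [SAFETY_MODEL]  # never part of any experiment
    chain = [policy["primary"], *policy["fallbacks"]]
    if tier == "enterprise" and in_experiment(user_id):
        return [EXPERIMENT_MODEL, *chain]  # normal chain stays as the fallback
    return chain
```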

In all cases, the application sends the same API call shape—just with different metadata pulled from your auth/session layer. When you decide to change which models handle Pro vs Enterprise, you edit Router rules and watch analytics; there’s no redeploy and no client update.

Pro Tip: Design your metadata schema as carefully as your database schema. Decide up front on stable fields (tier, country, language, intent, emotion, session_turns, plan) and make every LLM call go through a helper that populates them consistently—your routing leverage grows with metadata quality.
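Following the tip, a single helper can own the metadata schema. The sketch below uses a frozen dataclass that validates on construction; the field set comes from the tip, but the allowed tier and intent values, the ISO-code conventions for `country` and `language`, and the `RequestContext`/`to_metadata` names are assumptions for illustration. Every call site would pass `to_metadata()` into the request's metadata field, so routing rules can rely on the fields being present and well-formed.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional

ALLOWED_TIERS = {"free", "pro", "enterprise"}  # assumption: match your plans
ALLOWED_INTENTS = {"faq", "support_billing", "code", "creative", "search", "safety_critical"}


@dataclass(frozen=True)
class RequestContext:
    tier: str
    country: str   # assumption: ISO 3166-1 alpha-2, e.g. "US"
    language: str  # assumption: ISO 639-1, e.g. "en"
    intent: str
    session_turns: int = 0
    emotion: Optional[str] = None
    plan: Optional[str] = None

    def __post_init__(self) -> None:
        # Fail fast at the edge instead of silently routing on bad metadata.
        if self.tier not in ALLOWED_TIERS:
            raise ValueError(f"unknown tier: {self.tier!r}")
        if self.intent not in ALLOWED_INTENTS:
            raise ValueError(f"unknown intent: {self.intent!r}")
        if len(self.country) != 2 or not self.country.isupper():
            raise ValueError(f"country must be a 2-letter uppercase code: {self.country!r}")
        if self.session_turns < 0:
            raise ValueError("session_turns must be non-negative")

    def to_metadata(self) -> Dict[str, object]:
        """Serialize for the request's metadata field, dropping unset optional fields."""
        return {k: v for k, v in asdict(self).items() if v is not None}


ctx = RequestContext(tier="pro", country="US", language="en",
                     intent="support_billing", session_turns=7)
try:
    RequestContext(tier="gold", country="US", language="en", intent="faq")
    rejected = False
except ValueError:
    rejected = True
```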

Summary

If you care about keeping UX, reliability, and cost under control at scale, you should not bind your product to a single LLM or a hard‑coded model ID. A proper LLM gateway that can route by metadata (user tier, locale, intent, emotion) and switch models without code changes is the difference between a demo and a durable product.

Inworld Router gives you that control:

A unified, OpenAI‑compatible API across OpenAI, Anthropic, Google, xAI, Groq, Mistral, and 200+ models.
Metadata-conditioned routing, with A/B testing, failover, and tiering—so every user and context gets the right model.
No-latency-added routing and clear economics (pay for what you consume, no markup on underlying LLM rates).

You keep your app simple. Router handles the complexity.


