
Vendor-agnostic model routing in TypeScript: one API for OpenAI/Anthropic/Gemini with fallbacks and cost controls
Most teams don’t want to hard‑wire “OpenAI here, Anthropic there” all over their codebase. They want one clean TypeScript API that can talk to OpenAI, Anthropic, Gemini, or OpenRouter, swap models at runtime, add fallbacks when a provider is down, and keep a lid on latency and cost. That’s exactly where vendor‑agnostic model routing comes in.
Quick Answer: You can use Mastra's ModelRouter and ModelRouterEmbeddingModel to define a single TypeScript interface for LLMs and embeddings, then plug in OpenAI, Anthropic, Gemini, or OpenRouter behind the scenes, with built-in key detection, dynamic selection, fallbacks, and cost-aware routing.
Frequently Asked Questions
How does vendor-agnostic model routing work in TypeScript?
Short Answer: You define one abstraction (e.g. a Mastra Agent using a model router) and configure provider/model pairs (OpenAI, Anthropic, Gemini, OpenRouter) behind it. Your application calls one API while Mastra selects the actual model at runtime.
Expanded Explanation:
With Mastra, you treat “which LLM to call” as configuration, not a compile‑time decision. The ModelRouter and ModelRouterEmbeddingModel primitives give you a single interface that can route to OpenAI, Anthropic, Google Gemini, or OpenRouter, based on simple strings or a richer config object.
You get a TypeScript‑first API:
import { Agent } from '@mastra/core/agent';
import { Memory } from '@mastra/memory';
import { ModelRouterEmbeddingModel } from '@mastra/core/llm';

const agent = new Agent({
  id: 'support-agent',
  name: 'Support Agent',
  // (chat model config omitted for brevity; this snippet focuses on embeddings)
  memory: new Memory({
    // "provider/model" shorthand: OpenAI's text-embedding-3-small
    embedder: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
  }),
});
That one agent can now use OpenAI embeddings today, Gemini embeddings tomorrow, or OpenRouter in a private environment—without rewriting your app.
Key Takeaways:
- You write against one TypeScript interface and swap providers through configuration.
- Mastra’s model router handles provider details (keys, URLs, formats) so you don’t duplicate glue code everywhere.
How do I set up model routing with OpenAI, Anthropic, and Gemini in Mastra?
Short Answer: Install Mastra, configure your API keys via environment variables, then use ModelRouterEmbeddingModel (and Agent’s model config) to point at the providers and models you want.
Expanded Explanation:
Mastra is built to make “multi‑vendor” a default, not an afterthought. For embeddings, you can use ModelRouterEmbeddingModel with either a shorthand "provider/model" string or a richer { providerId, modelId } object. The router auto‑detects your keys from environment variables (OPENAI_API_KEY, GOOGLE_GENERATIVE_AI_API_KEY, OPENROUTER_API_KEY) so you don’t have to wire each provider manually.
You can do the same for completion / chat models via the model field on an Agent, using a static config or a function that selects a model at runtime based on requestContext.
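To make the two config shapes concrete, here is a minimal sketch of how a shorthand string could map onto the object form. The toModelRef helper and the ModelRef type are hypothetical names used only for illustration, and splitting on the first slash is an assumption; this is not Mastra's actual parser.

```typescript
// Illustrative only: shows how the two config shapes relate.
// This is NOT Mastra's internal parsing logic.
type ModelRef = { providerId: string; modelId: string };

function toModelRef(input: string | ModelRef): ModelRef {
  if (typeof input !== 'string') return input;
  // Split on the FIRST slash only, so model ids that themselves contain
  // a slash (e.g. "openai/text-embedding-3-small" behind OpenRouter) stay intact.
  const slash = input.indexOf('/');
  if (slash <= 0 || slash === input.length - 1) {
    throw new Error(`Expected "provider/model", got "${input}"`);
  }
  return {
    providerId: input.slice(0, slash),
    modelId: input.slice(slash + 1),
  };
}

const direct = toModelRef('openai/text-embedding-3-small');
const viaOpenRouter = toModelRef('openrouter/openai/text-embedding-3-small');
```

Under this assumption, the second call yields the same providerId/modelId pair that the object form uses for OpenRouter in the setup steps.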
Steps:
- Install Mastra packages:
  pnpm add @mastra/core @mastra/memory
  # or
  npm install @mastra/core @mastra/memory
- Set provider API keys in your .env:
  OPENAI_API_KEY=sk-...
  GOOGLE_GENERATIVE_AI_API_KEY=...
  OPENROUTER_API_KEY=...
  ANTHROPIC_API_KEY=...
- Create an agent with routed embeddings and models:
  import { Agent } from '@mastra/core/agent';
  import { Memory } from '@mastra/memory';
  import { ModelRouterEmbeddingModel } from '@mastra/core/llm';

  export const supportAgent = new Agent({
    id: 'support-agent',
    name: 'Support Agent',
    model: {
      // e.g., primary LLM (OpenAI, Anthropic, Gemini, via your config)
      providerId: 'openai',
      modelId: 'gpt-4.1-mini',
    },
    memory: new Memory({
      embedder: new ModelRouterEmbeddingModel({
        providerId: 'openrouter',
        modelId: 'openai/text-embedding-3-small',
      }),
    }),
  });
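Because the router picks up keys from the environment, a missing key tends to surface only when the first request is made. A small startup check can catch that earlier. The sketch below is plain TypeScript with no Mastra dependency; the missingProviderKeys helper and the provider labels used as map keys are hypothetical, while the environment variable names are the ones listed in the steps above.

```typescript
// Env var names come from the setup steps; the provider labels used as
// keys of this map are illustrative, not an official list.
const PROVIDER_ENV_KEYS: Record<string, string> = {
  openai: 'OPENAI_API_KEY',
  anthropic: 'ANTHROPIC_API_KEY',
  google: 'GOOGLE_GENERATIVE_AI_API_KEY',
  openrouter: 'OPENROUTER_API_KEY',
};

type Env = Record<string, string | undefined>;

// Returns the env var names that are missing or blank for the given providers.
function missingProviderKeys(providers: string[], env: Env): string[] {
  const missing: string[] = [];
  for (const provider of providers) {
    const name = PROVIDER_ENV_KEYS[provider];
    if (name === undefined) {
      throw new Error(`Unknown provider: ${provider}`);
    }
    const value = env[name];
    if (value === undefined || value.trim() === '') missing.push(name);
  }
  return missing;
}

// Example with an in-memory env; in an app you would pass process.env.
const missing = missingProviderKeys(['openai', 'anthropic'], {
  OPENAI_API_KEY: 'sk-example',
});
```

Here missing contains only ANTHROPIC_API_KEY, so the app could log a warning or refuse to start before any agent is constructed.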
What’s the difference between using direct SDKs and Mastra’s model router?
Short Answer: Direct SDKs tie your code to a single vendor API, while Mastra’s model router gives you one unified TypeScript interface with pluggable providers, dynamic selection, and observability.
Expanded Explanation:
If you use vendor SDKs directly (openai, @anthropic-ai/sdk, @google/generative-ai), you end up with three or more client instances, different request/response shapes, and scattered retry logic. That’s manageable for a demo, but becomes painful once you have multiple agents, tools, and workflows in production.
Mastra centralizes this into a single abstraction:
- You configure providers and models once.
- You call them through Agent and ModelRouterEmbeddingModel.
- You get consistent tracing (token usage, latency, prompts, completions, tool calls) regardless of which vendor is used.
Comparison Snapshot:
- Option A: Direct SDKs
- Detail: Different client APIs, duplicated error handling, vendor‑specific types sprinkled across your app.
- Option B: Mastra Model Router
- Detail: One TypeScript API, configurable providers, dynamic routing, shared observability.
- Best for: Teams that expect to change models/vendors, need fallbacks, and want consistent tracing and cost control.
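To illustrate the "different response shapes" problem that direct SDK usage creates, here is a rough sketch of the normalization glue you would otherwise write yourself. The response types are heavily trimmed versions of each vendor's chat response, and extractText is a hypothetical helper; a router-style abstraction is meant to hide this kind of branching, though this is not how Mastra implements it internally.

```typescript
// Trimmed-down response shapes; real SDK types carry many more fields.
type OpenAIChatResponse = { choices: { message: { content: string | null } }[] };
type AnthropicMessageResponse = { content: { type: string; text?: string }[] };
type GeminiResponse = { candidates: { content: { parts: { text?: string }[] } }[] };

type RawResponse =
  | { provider: 'openai'; body: OpenAIChatResponse }
  | { provider: 'anthropic'; body: AnthropicMessageResponse }
  | { provider: 'google'; body: GeminiResponse };

// Collapse three vendor-specific shapes into one plain string.
function extractText(raw: RawResponse): string {
  switch (raw.provider) {
    case 'openai':
      return raw.body.choices[0]?.message.content ?? '';
    case 'anthropic':
      return raw.body.content
        .filter((block) => block.type === 'text')
        .map((block) => block.text ?? '')
        .join('');
    case 'google':
      return (raw.body.candidates[0]?.content.parts ?? [])
        .map((part) => part.text ?? '')
        .join('');
  }
}
```

Every call site that touches a vendor SDK directly needs some version of this switch, which is the duplication a single routed interface removes.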
How do I implement fallbacks and cost controls across OpenAI, Anthropic, and Gemini?
Short Answer: Use dynamic model selection (a function for Agent.model) plus request context (budget, latency requirements) to route to a primary model with sensible fallbacks and cheaper tiers when needed.
Expanded Explanation:
Mastra lets model be either a static config or a function:
const agent = new Agent({
  id: 'dynamic-agent',
  name: 'Dynamic Agent',
  model: ({ requestContext }) => {
    // you decide based on requestContext
    // e.g., user tier, endpoint, max cost, latency sensitivity
    return {
      providerId: 'openai',
      modelId: 'gpt-4.1-mini',
    };
  },
});
Inside this function, you can implement:
- Primary → fallback: Prefer OpenAI; if your own health signal (for example, recent errors or rate-limit responses) marks it as down, route to Anthropic or Gemini.
- Cost tiers: Free users get a smaller, cheaper model; paid users get GPT‑4.x or Anthropic Claude.
- Latency‑aware routing: For real‑time UX, prefer Gemini Flash; for backoffice batch jobs, prefer cost‑optimized models.
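The three strategies above can be sketched as one pure selection function. This is a minimal illustration, not Mastra's API: RoutingContext, Health, and selectModel are hypothetical names, the placeholder model ids must be replaced with real ones, and how you read values out of requestContext is left to your app. Note that a selection function runs before the request is sent, so the fallback case assumes something else (such as an error handler) keeps the health map up to date.

```typescript
type ModelRef = { providerId: string; modelId: string };
type Tier = 'free' | 'paid';
type RoutingContext = { tier: Tier; realtime: boolean };
// providerId -> false when the provider was recently failing or rate-limited.
type Health = Record<string, boolean>;

// Placeholder ids: substitute the models you actually use.
const FAST: ModelRef = { providerId: 'google', modelId: 'gemini-flash-placeholder' };
const DEFAULT_MODEL: ModelRef = { providerId: 'openai', modelId: 'gpt-4.1-mini' };

// Preference order per tier: first entry is the primary, the rest are fallbacks.
const CANDIDATES: Record<Tier, ModelRef[]> = {
  paid: [
    { providerId: 'openai', modelId: 'gpt-4.1' },
    { providerId: 'anthropic', modelId: 'claude-placeholder' },
    FAST,
  ],
  free: [DEFAULT_MODEL, FAST],
};

function selectModel(ctx: RoutingContext, health: Health): ModelRef {
  const byTier = CANDIDATES[ctx.tier];
  // Latency-sensitive requests try the fast model first.
  const ordered = ctx.realtime ? [FAST, ...byTier] : byTier;
  // Providers with no health entry are treated as healthy.
  const healthy = ordered.find((m) => health[m.providerId] !== false);
  // If everything is marked unhealthy, return the default and let the call fail loudly.
  return healthy ?? DEFAULT_MODEL;
}
```

Inside an Agent.model function you would build the RoutingContext from requestContext and return the result of selectModel. Keeping the logic in a pure function makes it easy to unit test without calling any provider.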
You pair this with Mastra’s observability to track token usage and latency by model, so you can refine your routing and set soft budgets.
What You Need:
- A requestContext contract in your app (e.g., user tier, endpoint, "max tokens," or "max cost per request").
- A routing strategy encoded in your Agent.model function (primary, fallbacks, cost tiers, and any guardrails).
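A "max cost per request" guardrail can be sketched as a worst-case estimate checked against a budget. Everything here is an assumption for illustration: the PricedModel and BudgetContext types, the pickWithinBudget helper, the placeholder model ids, and above all the prices, which are made up and should be replaced with your providers' current rates.

```typescript
type PricedModel = {
  providerId: string;
  modelId: string;
  // USD per 1M tokens. The example values below are MADE UP.
  inputPerMTok: number;
  outputPerMTok: number;
};

type BudgetContext = {
  estInputTokens: number; // estimated prompt size
  maxOutputTokens: number; // cap on completion size
  maxCostUsd: number; // budget for this request
};

// Worst-case cost: full prompt plus the maximum allowed completion.
function estimateCostUsd(m: PricedModel, ctx: BudgetContext): number {
  return (ctx.estInputTokens * m.inputPerMTok + ctx.maxOutputTokens * m.outputPerMTok) / 1e6;
}

// `models` is ordered from most to least preferred; returns the first that fits.
function pickWithinBudget(models: PricedModel[], ctx: BudgetContext): PricedModel | undefined {
  return models.find((m) => estimateCostUsd(m, ctx) <= ctx.maxCostUsd);
}

const MODELS: PricedModel[] = [
  { providerId: 'openai', modelId: 'large-placeholder', inputPerMTok: 2, outputPerMTok: 8 },
  { providerId: 'openai', modelId: 'small-placeholder', inputPerMTok: 0.4, outputPerMTok: 1.6 },
];

const choice = pickWithinBudget(MODELS, {
  estInputTokens: 2000,
  maxOutputTokens: 500,
  maxCostUsd: 0.005,
});
```

With these made-up prices the large model's worst case is $0.008, which exceeds the $0.005 budget, so the small model is chosen. When nothing fits, the function returns undefined and the caller decides whether to reject the request or shrink the prompt.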
How does this model routing approach support GEO (Generative Engine Optimization) and long-term maintainability?
Short Answer: Vendor‑agnostic routing gives you a stable API surface for agents while letting you swap in better, cheaper, or GEO‑aligned models over time—without refactoring your application.
Expanded Explanation:
GEO (Generative Engine Optimization) is about making your agents and content “play nicely” with AI systems that consume, summarize, and answer questions about your product. The LLM stack you run internally is part of that story: you want to be able to adopt better models quickly, experiment with providers that index or reason differently, and keep your cost/performance ratio healthy.
When you hard‑wire a single provider’s SDK into every service, every change becomes a migration project. With Mastra’s model router and dynamic Agent.model selection:
- You can introduce a new model (e.g., a Gemini or OpenRouter variant) as an experiment for a subset of traffic.
- You can roll back or swap defaults as model quality and pricing change.
- You keep your GEO‑oriented workflows (RAG, evals, processors) stable while iterating on the underlying models.
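Sending a subset of traffic to a new model is usually done with deterministic bucketing, so a given user always sees the same variant. The sketch below is one common way to do that and is not part of Mastra: fnv1a, inExperiment, and modelForUser are hypothetical helpers, the hash is FNV-1a applied to UTF-16 code units, and the experiment model id is a placeholder.

```typescript
type ModelRef = { providerId: string; modelId: string };

const CONTROL: ModelRef = { providerId: 'openai', modelId: 'gpt-4.1-mini' };
// Placeholder id: substitute the model you want to trial.
const EXPERIMENT: ModelRef = { providerId: 'google', modelId: 'experimental-placeholder' };

// FNV-1a (32-bit) over UTF-16 code units: small, dependency-free, deterministic.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// True when the user's bucket (0..99) falls below the rollout percentage.
function inExperiment(userId: string, percent: number): boolean {
  return fnv1a(userId) % 100 < percent;
}

function modelForUser(userId: string, percent: number): ModelRef {
  return inExperiment(userId, percent) ? EXPERIMENT : CONTROL;
}
```

Setting percent to 0 acts as an instant rollback, and raising it gradually widens the experiment without changing which bucket any existing user falls into.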
Why It Matters:
- Impact 1: Your agents become stable infrastructure with swappable internals, not fragile demos bound to one provider.
- Impact 2: You can continuously optimize for quality, latency, and cost—and adopt new models and vendors—without rewriting your TypeScript codebase.
Quick Recap
Vendor‑agnostic model routing in TypeScript is about giving your agents one clean, production‑grade interface while delegating provider choice (OpenAI, Anthropic, Gemini, OpenRouter) to configuration and runtime logic. Mastra’s Agent, ModelRouterEmbeddingModel, and dynamic model selection let you plug in multiple vendors, implement fallbacks, enforce cost/latency strategies, and keep full observability over token usage and behavior—so your “demo” agent can actually survive production.