
Migrating from ElevenLabs to LMNT: what are the API differences (streaming, voice IDs, auth) and what’s the fastest migration path?
Quick Answer: Migrating from ElevenLabs to LMNT is mostly about swapping endpoints, headers, and voice identifiers—and deciding when to move from REST-style synthesis to low-latency streaming. LMNT uses a simple `Authorization: Bearer` header, a different voice ID scheme, and WebSocket streaming that delivers audio in ~150–200ms for conversational apps. The fastest path is: validate voices in the Playground, map ElevenLabs voices → LMNT voice/clone IDs, then wire up LMNT’s REST and streaming endpoints behind a small abstraction layer so you can flip traffic gradually.
Why This Matters
If you’re already in production with ElevenLabs, you’ve probably felt the pain points that show up at scale: latency that breaks turn-taking, stricter rate limits, or voice cloning workflows that don’t match how fast you iterate. LMNT is built specifically to solve those constraints—150–200ms low-latency streaming, studio-quality clones from a 5-second recording, and no concurrency or rate limits—without forcing a full rewrite of your voice stack.
For teams shipping conversational agents, tutors, and games, understanding the concrete API differences (auth, endpoints, streaming protocol, and voice IDs) is what lets you migrate in days, not weeks.
Key Benefits:
- Lower latency for live experiences: LMNT’s 150–200ms streaming is tuned for real turn-taking in agents and games, so voice doesn’t lag behind the UI.
- Simpler scaling model: No concurrency or rate limits and volume-friendly pricing mean you can scale usage without redesigning your architecture around throttles.
- Fast voice cloning and parity: Clone production voices from ~5 seconds of audio, map them to new LMNT IDs, and keep your product UX intact.
Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Auth & headers | How you authenticate (API keys, headers) and structure requests | Determines how invasive the migration is in your codebase; LMNT uses straightforward Authorization: Bearer patterns. |
| Voices & cloning | How voices are identified and how you create custom clones | You’ll remap ElevenLabs voice IDs to LMNT system voices or clones; LMNT can recreate voices with minimal input. |
| Streaming vs REST | HTTP synthesis vs low-latency WebSocket streaming | Streaming is where most UX gains come from—LMNT’s 150–200ms streaming is ideal for conversational apps and games. |
How It Works (Step-by-Step)
Here’s a practical migration path I’d use as a voice product engineer moving an existing ElevenLabs integration over to LMNT.
1. Inventory your current ElevenLabs usage
Before you touch code, catalog how you actually use ElevenLabs today:
- APIs in use
- Text-to-speech REST endpoint(s)
- WebSocket or other streaming endpoints
- Voice cloning / voice management APIs
- Where it’s called
- Backend services (Node/Go/Python/etc.)
- Frontend (browser) or game engines (Unity, Unreal)
- Any serverless or edge functions
- Voice footprint
- System voices vs custom/cloned voices
- Which specific voice IDs are used in production
- Which experiences are latency-sensitive (agents, games) vs batch (content renders)
This gives you a clear list of:
- Endpoints to swap
- Voice IDs to map
- Places where streaming latency improvements will matter most
2. Explore LMNT voices in the Playground
Before wiring up APIs, align on sound and style.
- Go to the LMNT Playground (linked from lmnt.com).
- Try built-in voices like:
- Leah – cheerful assistant
- Vesper – nerdy tutor
- Natalie – youthful friend
- Tyler – smooth storyteller
- Brandon – engaging broadcaster
- Match each existing ElevenLabs voice to:
- A stock LMNT voice, or
- A planned clone (for cases where parity matters: brand voices, characters)
Result: a simple mapping table, e.g.:
| Use case | ElevenLabs voice ID | LMNT target |
|---|---|---|
| Support agent | eleven-voice-1 | Leah |
| History tutor | eleven-voice-2 | Vesper |
| Game character “Tony” | custom ID | LMNT clone |
| Newsreader | custom ID | Brandon |
You’ll translate this into actual LMNT voice IDs once you’re in the API.
3. Understand auth & request differences
ElevenLabs (typical pattern)
- Auth: API key in an `xi-api-key` header.
- Endpoints: `https://api.elevenlabs.io/v1/text-to-speech/{voice_id}` and variants.
- Request body: JSON with `text`, optional `model_id`, `voice_settings`, etc.
LMNT (typical pattern)
LMNT uses a simpler, standard auth style:
- Auth header: `Authorization: Bearer YOUR_LMNT_API_KEY`
- API surface (high level):
  - REST-style synthesis endpoint(s) for non-streaming use cases
  - WebSocket streaming for 150–200ms low-latency TTS
- Specs: browse `https://api.lmnt.com/spec` for the full OpenAPI definition.
In most backends, this is a mechanical swap:
```diff
- const apiKey = process.env.ELEVENLABS_API_KEY;
- const headers = { 'xi-api-key': apiKey, 'Content-Type': 'application/json' };
+ const apiKey = process.env.LMNT_API_KEY;
+ const headers = { 'Authorization': `Bearer ${apiKey}`, 'Content-Type': 'application/json' };
```
If you encapsulate this behind a `TtsClient` interface, you only change one implementation.
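One way to sketch that abstraction—the `TtsClient` interface and the header helpers below are illustrative names, not part of either vendor’s SDK:

```typescript
// Provider-agnostic surface your app code calls; each provider gets one implementation.
interface TtsClient {
  synthesize(text: string, voiceId: string): Promise<ArrayBuffer>;
}

// The auth swap is the only header-level change: xi-api-key vs Authorization: Bearer.
function elevenLabsHeaders(apiKey: string): Record<string, string> {
  return { 'xi-api-key': apiKey, 'Content-Type': 'application/json' };
}

function lmntHeaders(apiKey: string): Record<string, string> {
  return { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' };
}
```

With this in place, switching providers becomes a config change instead of a codebase-wide search.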
4. Map voice IDs and cloning workflow
ElevenLabs voice IDs
- Typically a UUID-like string.
- May be:
- Base/system voices, or
- Cloned/custom voices created via their voice management APIs.
Your code probably stores these IDs in:
- Config files / environment
- Database columns (per user, per bot, per character)
LMNT voice identifiers
LMNT also uses identifiers for:
- System voices like Leah, Vesper, Brandon, etc.
- Cloned voices you create (studio quality from a 5-second recording).
Fast migration strategy:
1. For system voices
   - Replace hardcoded or config-level ElevenLabs voice IDs with LMNT equivalents.
   - Keep everything else (SSML/text prompts) the same initially; fine-tune later.
2. For cloned voices
   - For each critical ElevenLabs clone:
     - Capture ~5 seconds of representative audio (you can reuse your existing training samples).
     - Use LMNT’s cloning flow (via Playground or API) to create a corresponding LMNT voice.
     - Store the returned LMNT voice ID in the same place your existing voice ID lives: either overwrite it (if you’re doing a hard cutover) or add a new column/field (if you want dual-provider fallback).
3. Maintain a mapping for rollout
   - In code or config, keep an explicit mapping for a while:

     ```json
     {
       "support-agent": { "elevenlabs": "eleven-voice-1", "lmnt": "lmnt-voice-leah" },
       "big-tony": { "elevenlabs": "eleven-custom-123", "lmnt": "lmnt-voice-tony-clone" }
     }
     ```

   - This lets you A/B test or route by environment (`staging` → LMNT, `prod` → ElevenLabs, then flip).
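That mapping can live behind a tiny resolver so routing is a config flip; the personas and IDs below are the placeholder values from the mapping table, and the `Provider` type is illustrative:

```typescript
type Provider = 'elevenlabs' | 'lmnt';

// Persona → per-provider voice ID map (IDs are placeholders from the mapping above).
const voiceMap: Record<string, Record<Provider, string>> = {
  'support-agent': { elevenlabs: 'eleven-voice-1', lmnt: 'lmnt-voice-leah' },
  'big-tony': { elevenlabs: 'eleven-custom-123', lmnt: 'lmnt-voice-tony-clone' },
};

// Route by environment: e.g. staging → 'lmnt', prod → 'elevenlabs', then flip.
function resolveVoiceId(persona: string, provider: Provider): string {
  const entry = voiceMap[persona];
  if (!entry) throw new Error(`Unknown persona: ${persona}`);
  return entry[provider];
}
```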
5. Move non-streaming calls to LMNT REST first
For workloads where latency is less critical (e.g., batch content generation, pre-rendered lessons):
- Implement an LMNT REST TTS client aligned with the OpenAPI spec at `https://api.lmnt.com/spec`.
- Mirror your existing REST logic:
- Input text or SSML
- Voice ID (mapped LMNT ID)
- Optional parameters (format, sample rate, etc.)
- Keep return types identical to your existing interface:
- Always return a binary audio buffer or URL, regardless of provider.
This gives you a safe first win: LMNT in production for non-realtime pathways without touching streaming yet.
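A sketch of the request construction, kept separate from the network call so it is easy to test. The endpoint path and body field names below are assumptions for illustration—verify them against the OpenAPI spec at `https://api.lmnt.com/spec`:

```typescript
interface SynthesisRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build an LMNT REST synthesis request. The path and body fields are hypothetical;
// confirm the exact names against https://api.lmnt.com/spec before shipping.
function buildLmntSynthesisRequest(
  apiKey: string,
  text: string,
  voiceId: string,
  format: string = 'mp3',
): SynthesisRequest {
  return {
    url: 'https://api.lmnt.com/v1/ai/speech', // hypothetical path; check the spec
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ text, voice: voiceId, format }),
    },
  };
}

// Usage (network call omitted):
// const { url, init } = buildLmntSynthesisRequest(process.env.LMNT_API_KEY!, 'Hello', 'lmnt-voice-leah');
// const audio = await fetch(url, init).then((r) => r.arrayBuffer());
```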
6. Upgrade your streaming pipeline to LMNT
This is where most of the UX benefit comes from: LMNT’s 150–200ms low-latency streaming is tuned for conversational agents, games, and tutors.
ElevenLabs streaming (conceptual)
- Typically WebSocket or server-sent style streams.
- You:
- Connect with auth and parameters (voice ID, format).
- Send text payload.
- Receive chunks of encoded audio.
The exact frame protocol is provider-specific.
LMNT streaming
- Protocol: WebSocket streaming (see `https://api.lmnt.com/spec` for the exact URL and message schema).
- Behavior: audio begins streaming within ~150–200ms after you send input.
- Built for:
- Agents and assistants
- LLM-driven tutors
- Real-time characters in games
Migration steps:
1. Isolate your streaming abstraction
   If you don’t already have one, create a small layer:

   ```typescript
   interface StreamingTts {
     connect(options: { voiceId: string }): Promise<void>;
     sendText(text: string): Promise<void>;
     onAudioChunk(cb: (chunk: ArrayBuffer) => void): void;
     close(): Promise<void>;
   }
   ```

2. Implement `ElevenLabsStreamingTts` and `LmntStreamingTts`
   Keep your app code calling the interface, not the provider’s protocol directly.
3. Wire the LMNT WebSocket based on the spec:
   - Open the connection with auth (`Authorization: Bearer`).
   - Pass the voice ID and configuration in the init/handshake.
   - Stream text and handle audio frames as they arrive.
   - Pipe audio chunks to:
     - Browser `AudioContext`/`MediaSource`
     - Game engine audio sources
     - WebRTC (e.g., with LiveKit, similar to LMNT’s “Big Tony’s Auto Emporium” demo)
4. Measure latency & tune turn-taking
   - Log the time from “user stops speaking” to “first LMNT audio frame.”
   - With LMNT’s 150–200ms target, you can:
     - Start speaking before the LLM has finished the full response (incremental generation path).
     - Tighten interrupt/“barge-in” behavior.
5. Roll out by feature
   - Migrate your most latency-sensitive flows first:
     - Live customer support agent
     - Tutor sessions
     - Game characters
   - Keep batch flows on REST.
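Here is a minimal sketch of the LMNT side of the `StreamingTts` interface above, written against an injectable socket so the same class works with a browser `WebSocket` or Node’s `ws` behind a thin adapter. The init/text frame shapes are assumptions for illustration; the real message schema is in the spec at `https://api.lmnt.com/spec`:

```typescript
// Minimal transport interface so this sketch stays runtime-agnostic.
interface TtsSocket {
  send(data: string): void;
  onMessage(cb: (data: ArrayBuffer | string) => void): void;
  close(): void;
}

class LmntStreamingTts {
  private chunkCb: (chunk: ArrayBuffer) => void = () => {};

  constructor(private socket: TtsSocket) {
    // Binary frames are audio; string frames would be control/status messages.
    socket.onMessage((data) => {
      if (data instanceof ArrayBuffer) this.chunkCb(data);
    });
  }

  async connect(options: { voiceId: string }): Promise<void> {
    // Hypothetical init frame: voice + format on the freshly opened socket
    // (auth via the Authorization: Bearer header during the WebSocket upgrade).
    this.socket.send(JSON.stringify({ type: 'init', voice: options.voiceId, format: 'pcm' }));
  }

  async sendText(text: string): Promise<void> {
    this.socket.send(JSON.stringify({ type: 'text', text }));
  }

  onAudioChunk(cb: (chunk: ArrayBuffer) => void): void {
    this.chunkCb = cb;
  }

  async close(): Promise<void> {
    this.socket.close();
  }
}
```

Because the socket is injected, you can unit-test the framing with a fake socket and only touch the real network in integration tests.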
7. Handle rate limits, scaling, and pricing differences
One of the most tangible differences from ElevenLabs:
- LMNT:
- No concurrency or rate limits.
- Affordable pricing that improves with volume.
- Startup Grant: 45M credits over 3 months for eligible teams.
- SOC-2 Type II for enterprise readiness.
Practical implications:
- You don’t need complicated:
- Request queues to avoid 429s.
- Sharding users across multiple accounts.
- You can trust a straightforward scaling path:
- Vertical: more users on the same voice surfaces.
- Horizontal: more agents, games, and characters using TTS concurrently.
For procurement and security reviews, SOC-2 Type II is a deployment unlock—pair it with your internal controls and you’re ready to run LMNT in production.
Common Mistakes to Avoid
- Treating the migration as a 1:1 endpoint swap only:
  Don’t just replace URLs and headers; take the opportunity to move your most interactive flows to LMNT streaming so you actually get the 150–200ms latency benefits.
- Not abstracting providers behind an interface:
  Hard-coding LMNT everywhere makes future changes painful. Wrap ElevenLabs and LMNT in a common `TtsClient`/`StreamingTts` interface so you can:
  - Flip providers per environment
  - Do canary rollouts
  - Fall back gracefully if needed
Real-World Example
Say you’ve built a “History Tutor” with ElevenLabs:
- Today:
- User asks a question in the browser.
- Backend calls an LLM, then ElevenLabs TTS with a tutor voice ID.
- You stream audio back via WebSocket or chunked HTTP.
- Latency is 500–800ms before audio starts, so conversations feel laggy.
Migration with LMNT:
1. Voices
   - In the LMNT Playground, pick Vesper (“nerdy tutor”) as the base voice.
   - Optionally clone your existing tutor voice using a 5-second sample.
   - Store the LMNT voice ID in your “Tutor” config.
2. API swap
   - Replace ElevenLabs REST calls with LMNT REST for non-live flows.
   - Implement `LmntStreamingTts` and plug it into your tutor’s realtime mode.
3. End-to-end streaming
   - Use incremental LLM output (token streaming) + LMNT WebSocket TTS to start speech as soon as the first tokens arrive.
   - Measured timeline:
     - 0ms: user stops speaking.
     - 80–150ms: LLM emits first tokens.
     - 150–200ms: LMNT begins audio playback.
The net result: your “History Tutor” feels far more human—less dead air, more overlapping conversational rhythm—without changing the core product idea.
Pro Tip: Before flipping all traffic, deploy LMNT to staging with a shadow mode: log LMNT’s latency and audio quality for the same requests you send to ElevenLabs, then switch provider IDs via config once you’re confident.
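A shadow-mode harness can be as small as timing “call started → first audio ready” for both providers on the same text; the provider callbacks here are stand-ins for your real `TtsClient` implementations:

```typescript
// Time how long a synthesis call takes to produce its first audio.
// `provider` is any async call that resolves when first audio is available.
async function timeFirstAudio(provider: () => Promise<unknown>): Promise<number> {
  const start = Date.now();
  await provider();
  return Date.now() - start;
}

// Serve the user from the current provider while logging the shadow provider's
// latency for the same request, so you can compare before flipping traffic.
async function shadowCompare(
  text: string,
  serveWithElevenLabs: (t: string) => Promise<unknown>,
  shadowWithLmnt: (t: string) => Promise<unknown>,
): Promise<{ elevenLabsMs: number; lmntMs: number }> {
  const [elevenLabsMs, lmntMs] = await Promise.all([
    timeFirstAudio(() => serveWithElevenLabs(text)),
    timeFirstAudio(() => shadowWithLmnt(text)),
  ]);
  return { elevenLabsMs, lmntMs };
}
```

Log the pairs per request ID, and flip the provider config once the LMNT numbers hold up across your real traffic mix.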
Summary
Migrating from ElevenLabs to LMNT comes down to a few concrete API shifts—`Authorization: Bearer` auth, a different set of voice IDs, and a WebSocket streaming model tuned for 150–200ms latency. The fastest path is:
- Inventory your ElevenLabs usage and where voice actually matters.
- Pick or clone LMNT voices in the Playground and map them to your existing personas.
- Swap non-streaming REST calls first for low-risk wins.
- Implement LMNT streaming for your most interactive agents, tutors, and game characters.
- Lean on LMNT’s no-rate-limit scaling and SOC-2 Type II posture when you push to production.
If you wrap all of this behind a provider-agnostic TtsClient, you can migrate incrementally and ship improvements continuously.