
Migrating from ElevenLabs to LMNT: what are the API differences (streaming, voice IDs, auth) and what’s the fastest migration path?
Most teams land on LMNT after they’ve already shipped something on ElevenLabs—and now need lower latency, more concurrency, or predictable scaling for agents and games. The good news: the APIs are conceptually similar (voices, text, audio out), but there are some real differences in streaming, voice IDs, and auth that you should plan for. With a bit of mapping, you can usually get a basic migration done in a day, then iterate on quality and fine-tuning.
Quick Answer: LMNT and ElevenLabs both expose HTTP and streaming TTS, but LMNT centers on ultra-low-latency WebSocket streaming, simpler auth, and studio-quality clones from a 5-second recording. The fastest migration path is: 1) map your existing voices to LMNT stock voices or clones, 2) swap your streaming client over to LMNT’s spec at https://api.lmnt.com/spec, and 3) progressively migrate traffic, starting with your most latency-sensitive flows.
Why This Matters
If you’re running conversational apps, agents, or game characters, your TTS stack is either invisible or it’s the reason users churn. A few hundred extra milliseconds, a throttled concurrency spike, or brittle voice cloning can break turn-taking and make the experience feel fake.
Migrating from ElevenLabs to LMNT correctly gives you:
- 150–200ms low-latency streaming for real-time exchanges.
- Unlimited voice clones from short captures (5 seconds).
- No concurrency or rate limits so you don’t get surprised at scale.
Done right, the migration is mostly plumbing—mapping endpoints and payloads—rather than a ground-up rewrite.
Key Benefits:
- Faster, more natural streaming: LMNT’s 150–200ms streaming keeps your agents and NPCs conversational instead of “call center” slow.
- Simpler scaling model: No concurrency or rate limits, plus character-based pricing that improves with volume, mean fewer production surprises.
- Builder-native migration path: Free Playground to match voices, clear API spec, and forkable demos so you can validate quickly before flipping traffic.
Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Streaming mode | How your app sends text and receives audio in real time (HTTP vs WebSocket, chunking, latency). | Turn-taking and “talk over” behavior in agents and games live or die on streaming latency and stability. |
| Voice identity & cloning | How you reference voices (IDs vs names) and create or import new voices. | Clean mapping from ElevenLabs voices to LMNT voices avoids regressions in brand sound and character continuity. |
| Auth & limits | How you authenticate (API keys, headers) and what throughput constraints apply. | A small auth refactor plus “no concurrency or rate limits” lets you scale without re-architecting around throttles. |
Core API Differences: LMNT vs ElevenLabs
Below is a high-level mapping of what typically changes when you move from ElevenLabs to LMNT. Exact details will depend on how you’re using ElevenLabs today, but these are the patterns I see most often.
1. Streaming: HTTP vs WebSocket and Latency
How ElevenLabs usually works
- Offers both HTTP and WebSocket streaming.
- Many integrations hit a REST endpoint for TTS and stream back chunks (or poll for a finished file).
- Latency can be fine for “voice-over” style use, but it tends to creep past conversational thresholds as traffic and complexity rise.
How LMNT works
- Designed primarily around real-time, WebSocket-based streaming TTS.
- End-to-end audio typically lands in 150–200ms, which is the budget you want for natural back-and-forth.
- Same mental model: send text (or partials), receive audio frames. Just faster and tuned for conversational apps, agents, and games.
Migration impact
- If you’re already using ElevenLabs WebSockets: this is mostly a URL + payload schema change plus minor client refactoring.
- If you’re using synchronous HTTP TTS today: moving to LMNT streaming is a good time to make your app streaming-native as well (e.g., stream TTS audio as it arrives instead of waiting for full files).
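To make "streaming-native" concrete, here is a minimal Python sketch contrasting the two playback styles. It does not call LMNT or ElevenLabs: `fake_tts_chunks` is a stand-in for whatever your real client yields, and `sink` is a hypothetical callback representing your audio output.

```python
from typing import Callable, Iterable, Iterator


def fake_tts_chunks(n: int = 3) -> Iterator[bytes]:
    """Stand-in for a TTS stream: yields n small audio chunks."""
    for i in range(n):
        yield bytes([i]) * 4


def play_buffered(chunks: Iterable[bytes], sink: Callable[[bytes], None]) -> int:
    """Synchronous style: wait for every chunk, then hand over one blob."""
    audio = b"".join(chunks)
    sink(audio)
    return 1  # the sink is called once, only after the stream has ended


def play_streaming(chunks: Iterable[bytes], sink: Callable[[bytes], None]) -> int:
    """Streaming-native style: hand each chunk to the sink as it arrives."""
    calls = 0
    for chunk in chunks:
        sink(chunk)  # playback can start on the first chunk
        calls += 1
    return calls
```

Both functions deliver the same bytes; the difference is that the streaming version lets playback begin as soon as the first chunk lands instead of after the last one.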
2. Voice IDs and Voice Cloning
How ElevenLabs handles voices
- Voices are typically tied to voice IDs you pass into the API.
- Cloning often requires more than just a few seconds of training audio to get truly reliable, production-quality results.
- Voice management is frequently its own set of endpoints and flows.
How LMNT handles voices
- LMNT ships with named voices (e.g., “Brandon” as an engaging broadcaster, “Leah” as a cheerful assistant) and studio-quality voice clones.
- Key constraints:
  - All you need is a 5-second recording.
  - Unlimited clones across plans, with a commercial license on paid tiers.
- Voices are referenced via identifiers, but the typical developer flow is:
  - Explore voices in the Playground for fit.
  - Clone voices you need for brand or character continuity.
  - Plug those IDs into your API calls.
Migration impact
- Map your ElevenLabs voices to LMNT voices:
  - For generic roles (assistant, tutor, narrator), start from stock LMNT voices (Leah, Vesper, Natalie, Tyler, Brandon).
  - For branded or character voices, create a 5-second capture clone that approximates, or intentionally improves on, your current sound.
- Update your code where you previously passed an ElevenLabs `voice_id` to now pass the corresponding LMNT voice ID or name.
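One way to keep that swap in a single place is a small lookup table. The sketch below is illustrative only: the ElevenLabs IDs and the clone name `npc-scout-01` are placeholder values, and `LmntVoice` and `to_lmnt_voice` are hypothetical helpers, not part of either SDK.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LmntVoice:
    voice: str  # LMNT voice ID or name (placeholder values below)
    kind: str   # "stock" or "clone"


# Hypothetical mapping: replace the keys with your real ElevenLabs voice IDs.
VOICE_MAP: Dict[str, LmntVoice] = {
    "eleven_labs_amy": LmntVoice("Leah", "stock"),
    "eleven_labs_npc1": LmntVoice("npc-scout-01", "clone"),
    "eleven_labs_news": LmntVoice("Brandon", "stock"),
}


def to_lmnt_voice(elevenlabs_voice_id: str) -> LmntVoice:
    """Translate an ElevenLabs voice ID, failing loudly on unmapped voices."""
    try:
        return VOICE_MAP[elevenlabs_voice_id]
    except KeyError:
        raise KeyError(
            f"No LMNT mapping for ElevenLabs voice {elevenlabs_voice_id!r}"
        ) from None
```

Failing loudly on an unmapped ID is a deliberate choice here: a silent fallback to a default voice is exactly the kind of regression users notice.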
3. Auth and Rate Limiting
How ElevenLabs usually works
- Standard API-key authentication.
- Practical limits around:
  - Requests per minute / concurrency.
  - Project or account-level quotas.
- Developers often add backoff handling, queueing, or per-tenant throttles to avoid bumping into caps.
How LMNT works
- Also API-key based, but the big operational difference is:
  - No concurrency or rate limits.
  - Pricing that gets better with volume, so you can treat TTS as infrastructure, not a fragile external service.
- LMNT is SOC-2 Type II compliant and offers enterprise plans when you’re ready or need something custom.
Migration impact
- Swap your authorization header to match LMNT’s spec.
- You can usually simplify your client-side throttling and retry logic because you’re not engineering around a concurrency ceiling.
- For security and procurement: LMNT’s SOC-2 Type II compliance is often enough to unblock production deployment in regulated orgs.
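The header swap itself can be isolated in one helper, sketched below in Python. ElevenLabs reads its key from the `xi-api-key` header. The LMNT header name `X-API-Key` is an assumption here and should be confirmed against https://api.lmnt.com/spec. The environment variable names and the `auth_headers` function are hypothetical.

```python
import os
from typing import Dict, Mapping

# Header each provider reads the API key from.
AUTH_HEADER = {
    "elevenlabs": "xi-api-key",
    "lmnt": "X-API-Key",  # assumption: verify against https://api.lmnt.com/spec
}

# Hypothetical environment variable names for this sketch.
ENV_VAR = {
    "elevenlabs": "ELEVENLABS_API_KEY",
    "lmnt": "LMNT_API_KEY",
}


def auth_headers(provider: str, env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build the auth header for a provider from environment configuration."""
    key = env.get(ENV_VAR[provider])
    if not key:
        raise RuntimeError(f"{ENV_VAR[provider]} is not set")
    return {AUTH_HEADER[provider]: key}
```

Keeping both providers behind one function makes it easy to run them side by side during the rollout and delete the ElevenLabs entry afterwards.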
How It Works (Step-by-Step)
Here’s a pragmatic migration path I’ve used with teams moving from ElevenLabs to LMNT. Treat this as a checklist rather than a big-bang rewrite.
- Baseline Your Current Usage
- Map Voices + Latency Requirements
- Implement LMNT Streaming and Gradually Flip Traffic
1. Baseline Your Current Usage
First, get clear on how you’re using ElevenLabs right now.
- Inventory your flows:
  - Which endpoints? (text-to-speech sync, streaming WebSocket, voice cloning, etc.)
  - Which products? (agents, tutors, game NPCs, broadcast-like narration.)
- Capture your latency and quality expectations:
  - What’s your acceptable time-to-first-audio?
  - Which flows absolutely need conversational latency vs “voiceover” style latency?
- List your voices:
  - How many distinct voices?
  - Which are “must exactly match” vs “okay to upgrade/tune”?
This gives you a voice mapping checklist and helps prioritize which integrations to migrate first.
2. Map Voices + Latency Requirements
Now translate that inventory into LMNT equivalents.
- Try LMNT voices in the Playground:
  - Match your core roles:
    - Assistants → e.g., Leah.
    - Tutors → e.g., Vesper.
    - Friends/companions → Natalie.
    - Storytellers → Tyler.
    - Broadcasters → Brandon.
  - Validate emotional tone, pace, and clarity.
- Clone your signature voices:
  - For voices that must stay on-brand, use a 5-second recording to create a studio-quality clone.
  - Compare playback against your current ElevenLabs voice; tweak capture if needed.
- Document your mapping. Example voice mapping table:

  | ElevenLabs voice ID | LMNT voice |
  |---|---|
  | "eleven_labs_amy" | "Leah" (stock, cheerful assistant) |
  | "eleven_labs_npc1" | Custom LMNT clone: "npc-scout-01" |
  | "eleven_labs_news" | "Brandon" (stock, engaging broadcaster) |

- Tie voices to latency budgets:
  - Agents and game NPCs → target LMNT’s streaming and 150–200ms envelope.
  - Longer-form narration → can still use streaming, but latency is less critical; focus on voice feel.
3. Implement LMNT Streaming and Gradually Flip Traffic
With voices mapped, you can start swapping your runtime integration.
a) Read the LMNT API spec
- Browse https://api.lmnt.com/spec to see the full schema.
- Start from a language you already use (Node, Python, Go, etc.).
- Build a minimal client that:
  - Connects to LMNT.
  - Sends a test string.
  - Writes audio to a file or plays it back.
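The "writes audio to a file" half of that minimal client needs no network code, so it can be sketched with Python's standard `wave` module. This assumes the chunks you receive are raw 16-bit mono PCM at 24 kHz; that format is an assumption, so check the audio format and sample rate you actually request in the spec. `write_wav` is a hypothetical helper name.

```python
import wave
from typing import Iterable


def write_wav(path: str, pcm_chunks: Iterable[bytes], sample_rate: int = 24000) -> int:
    """Write 16-bit mono PCM chunks to a WAV file and return frames written.

    Assumes raw little-endian 16-bit samples; adjust if your stream is
    encoded (e.g. MP3) or uses a different sample rate.
    """
    frames = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        for chunk in pcm_chunks:
            wav.writeframes(chunk)    # append each chunk as it arrives
            frames += len(chunk) // 2
    return frames
```

In a real client, `pcm_chunks` would be the iterator over audio messages from your connection; any iterable of `bytes` works for a smoke test.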
If you’re coming from ElevenLabs WebSockets, this is mostly: new URL, new message schema, new auth header.
b) Implement streaming in your agent/game runtime
- Replace your ElevenLabs streaming client with LMNT’s:
  - Same pattern: open a WebSocket, send text, receive audio chunks.
- Update your media pipeline (Web, mobile, or game engine) to consume LMNT audio frames.
- Keep your timeout and cancellation logic roughly the same—agents still need to interrupt speech when users interject.
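Interruption handling is provider-agnostic, which is why it can stay roughly the same. A minimal `asyncio` sketch follows, with a fake audio stream standing in for the real connection; `fake_audio_stream`, `speak`, and `demo` are hypothetical names, and a real client would also close or flush its socket when speech is cut off.

```python
import asyncio
from typing import AsyncIterator, Callable, List, Tuple


async def fake_audio_stream(n: int = 50, delay: float = 0.01) -> AsyncIterator[bytes]:
    """Stand-in for a TTS audio stream: n chunks, one every `delay` seconds."""
    for _ in range(n):
        await asyncio.sleep(delay)
        yield b"\x00" * 320


async def speak(
    stream: AsyncIterator[bytes],
    play: Callable[[bytes], None],
    interrupted: asyncio.Event,
) -> bool:
    """Play chunks until the stream ends or `interrupted` is set.

    Returns True if the utterance finished, False if it was cut off.
    """
    try:
        async for chunk in stream:
            if interrupted.is_set():
                return False  # user barged in: stop before playing more audio
            play(chunk)
        return True
    finally:
        # Release the underlying stream if it supports explicit closing.
        aclose = getattr(stream, "aclose", None)
        if aclose is not None:
            await aclose()


async def demo() -> Tuple[bool, int]:
    interrupted = asyncio.Event()
    played: List[bytes] = []
    task = asyncio.create_task(speak(fake_audio_stream(), played.append, interrupted))
    await asyncio.sleep(0.05)  # simulate the user starting to talk
    interrupted.set()
    finished = await task
    return finished, len(played)
```

Running `asyncio.run(demo())` returns `False` for the finished flag together with a small number of played chunks, since the utterance is cut off shortly after it starts.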
c) Validate with a structured test
- Run side-by-side:
  - Same prompts.
  - ElevenLabs vs LMNT.
- Measure:
  - Time to first audio.
  - Total latency for typical utterances.
  - Subjective naturalness (accent, pacing, expressiveness).
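The two objective measurements can be captured with a small harness like the Python sketch below. It times any callable that takes text and yields audio chunks, so the same function can wrap your ElevenLabs and LMNT clients. `measure`, `LatencyResult`, and `fake_provider` are hypothetical names, and the fake provider only simulates a delay.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class LatencyResult:
    time_to_first_audio_s: float
    total_s: float
    chunks: int


def measure(synthesize: Callable[[str], Iterable[bytes]], text: str) -> LatencyResult:
    """Time the first chunk and the full utterance for one provider call."""
    start = time.perf_counter()
    first = None
    count = 0
    for _chunk in synthesize(text):
        if first is None:
            first = time.perf_counter() - start  # time to first audio
        count += 1
    total = time.perf_counter() - start
    # If no audio arrived at all, report the total as the first-audio time.
    return LatencyResult(first if first is not None else total, total, count)


def fake_provider(first_delay: float, n: int = 3) -> Callable[[str], Iterator[bytes]]:
    """Stand-in provider: waits `first_delay` seconds, then yields n chunks."""
    def synthesize(text: str) -> Iterator[bytes]:
        time.sleep(first_delay)
        for _ in range(n):
            yield b"\x00" * 160
    return synthesize
```

Run it over the same prompt set for both providers, from the same region and network path as production, and compare medians and tail percentiles rather than single runs.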
d) Gradually flip traffic
- Phase rollout:
  - Internal test/staging agents → 100% LMNT.
  - A small percentage of production users → LMNT, with fallback to ElevenLabs.
  - Ramp to 100% LMNT once satisfied.
- For long-running sessions (e.g., tutoring), consider:
  - New sessions on LMNT.
  - Existing sessions continue on ElevenLabs until they naturally complete.
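A percentage rollout with session stickiness can be sketched in a few lines of Python. This is one possible approach using deterministic hashing rather than a feature-flag service; `bucket` and `pick_provider` are hypothetical names.

```python
import hashlib
from typing import Optional


def bucket(user_id: str) -> int:
    """Stable bucket in [0, 100) derived from the user ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def pick_provider(
    user_id: str,
    lmnt_percent: int,
    session_provider: Optional[str] = None,
) -> str:
    """Choose a TTS provider for a request.

    An existing session keeps the provider it started with; new sessions
    are assigned by bucket, so raising lmnt_percent only ever moves users
    from ElevenLabs to LMNT, never back.
    """
    if session_provider is not None:
        return session_provider
    return "lmnt" if bucket(user_id) < lmnt_percent else "elevenlabs"
```

Because the bucket is derived from the user ID, a given user gets the same provider on every request at a fixed percentage, which keeps their voice consistent during the ramp.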
Common Mistakes to Avoid
- Treating it as a pure “endpoint swap”: LMNT’s strength is low-latency streaming. If you keep your app fully synchronous (generate the full audio file, then play), you’ll ship a slower experience than necessary. Make your client streaming-native.
- Ignoring voice mapping until the end: Don’t leave voice selection as an afterthought. Users notice voice style changes more than they notice backend swaps. Start by matching your key voices in the Playground, then clone where needed, before you touch production traffic.
Real-World Example
Say you’ve built a browser-based AI tutor that uses ElevenLabs WebSockets. You see latency spike during peak hours and occasionally hit rate limits, which kills the conversational feel when students interrupt or ask follow-up questions.
The migration path might look like this:
- Recreate your tutor voice in LMNT:
  - You discover Vesper matches your “nerdy tutor” vibe out of the box.
  - For your branded math tutor, you capture a clean 5-second sample from your existing actor and clone it in LMNT.
- Implement a new streaming client:
  - You open a WebSocket to LMNT for each active tutoring session.
  - The text output from your LLM is streamed to LMNT as it’s generated; audio comes back in 150–200ms, which makes turn-taking feel natural.
- A/B test with real users:
  - 10% of sessions use LMNT; the rest stay on ElevenLabs.
  - You track response time, user satisfaction, and error rates.
- Flip to LMNT and simplify ops:
  - With no concurrency or rate limits, you can stop juggling custom rate limiters and backoff code.
  - Your infra team treats LMNT as reliable voice infrastructure, and your product team iterates on voice nuances instead of fighting latency.
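Streaming LLM output into TTS usually means deciding where to cut the text. One simple option, sketched below in Python, is to buffer tokens and forward complete sentences. Whether you send whole sentences or smaller partials depends on what the streaming endpoint accepts, so treat this as an illustration; `sentences` is a hypothetical helper, and the regex will also split on abbreviations such as "e.g. ".

```python
import re
from typing import Iterable, Iterator

# Sentence-ending punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"[.!?]+\s+")


def sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Group a stream of LLM tokens into complete sentences.

    A sentence is emitted only once the whitespace after its final
    punctuation has arrived, so "3.5" is not split mid-number.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            yield buffer[: match.end()].strip()
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the LLM stops
```

For example, `list(sentences(["Hel", "lo. ", "How are", " you? I", " am fine"]))` yields `["Hello.", "How are you?", "I am fine"]`, and each sentence can be sent to the TTS connection as soon as it is produced.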
Pro Tip: Start migration with your most latency-sensitive flows, like agent handoffs and live tutoring, because that’s where LMNT’s 150–200ms streaming and no-rate-limit model deliver the biggest immediate win.
Summary
Migrating from ElevenLabs to LMNT is less about relearning TTS and more about taking advantage of a stack built for real-time voice. The main differences you’ll touch are:
- Streaming: LMNT is optimized around WebSocket streaming in the 150–200ms range, which is what you want for conversational apps, agents, and games.
- Voices and cloning: You’ll map your existing voices to LMNT stock voices or studio-quality clones from a 5-second recording, keeping brand and character continuity.
- Auth and limits: You’ll swap in LMNT’s auth scheme and can stop engineering around concurrency and rate limits, knowing LMNT will scale with you.
If you inventory your current ElevenLabs usage, map voices early, and implement LMNT streaming behind a feature flag, you can usually move your critical paths over in a day or two and refine from there.