LMNT vs ElevenLabs: how do concurrency limits, rate limits, and load testing compare for a production voice agent?
Text-to-Speech APIs

LMNT vs ElevenLabs: how do concurrency limits, rate limits, and load testing compare for a production voice agent?

12 min read

Most production voice agents don’t fail because the model sounds bad. They fail because concurrency limits, rate limits, or unpredictable throttling kick in the minute real traffic hits. If you’re choosing between LMNT and ElevenLabs for a production voice agent, the real question is: which stack keeps you from hitting a wall when you go from 10 users to 10,000?

Quick Answer: LMNT is built to run production voice agents without hitting concurrency or rate limits, and explicitly advertises “No concurrency or rate limits” plus volume-friendly pricing. ElevenLabs offers strong voices, but published and anecdotal constraints (per‑minute limits, per‑project caps, and softer throttles at scale) can complicate load testing and force more aggressive traffic shaping. For teams planning serious load, LMNT’s “we’ll scale with you” posture, streaming-first design, and lack of hard limits make it easier to run and trust high-concurrency agents.

Why This Matters

In a real-time voice agent, your TTS layer is effectively part of your conversation loop. If concurrency limits or rate limits fire during peak load, the user doesn’t see a small error—they feel lag, dropped turns, or silence. That’s how you lose trust and retention.

Understanding how LMNT vs ElevenLabs treat concurrency, rate limits, and load testing determines whether you can:

  • Safely simulate peak conditions before launch
  • Run large numbers of parallel sessions (agents, callers, players)
  • Avoid mid-conversation throttles that break natural turn-taking

When you’re shipping conversational apps, agents, and games, this isn’t nice-to-have; it’s the difference between a polished product and a demo that falls apart under real traffic.

Key Benefits:

  • Stable scale-out behavior: Choosing a provider with no hard concurrency or rate limits means you can ramp traffic without constantly tuning around vendor ceilings.
  • Predictable performance under load: Low-latency streaming plus elastic capacity lets you keep turn-taking snappy even when thousands of sessions are active.
  • Simpler load testing & ops: Clear, generous limits (or none at all) make it easier to model worst-case scenarios, tune your backoff strategies, and avoid surprise throttling in production.

Core Concepts & Key Points

ConceptDefinitionWhy it's important
Concurrency limitsThe maximum number of simultaneous requests or streams you can run at once for a given account/project.Determines how many parallel voice sessions (agent calls, game players, concurrent users) you can serve before requests start failing or queuing.
Rate limitsCaps on how many requests, characters, or tokens you can send per unit time (per second/minute/day).Affects burst behavior and peak traffic; aggressive limits can force buffering, batching, or degrading the experience during spikes.
Load testingSystematic simulation of high traffic (many concurrent requests and bursts) to measure latency, error rates, and stability.Essential to prove your voice agent can handle real-world usage—launching without load tests is how you discover limits in production instead of in staging.

How It Works (Step-by-Step)

At a high level, here’s how you evaluate LMNT vs ElevenLabs on concurrency, rate limits, and load testing for a production voice agent.

  1. Model your concurrency needs
    Start from your product, not the vendor:

    • Peak simultaneous sessions (calls, chats, players)
    • Average speech duty cycle (talk time vs silence)
    • Target turn-taking latency (e.g., <300ms from text to audible speech)

    This yields a working budget. For example: 1,000 concurrent users, each requiring ~3–5 TTS responses per minute, with low latency. Your TTS provider must support that concurrency plus overhead.

  2. Map vendor limits and behavior under load
    For LMNT (based on the official docs):

    • No concurrency or rate limits. The docs explicitly state: “No concurrency or rate limits.”
    • Streaming-first design. Low latency streaming in the 150–200ms range, built for “conversational apps, agents, and games.”
    • Scale posture. “We’ll scale with you. Enterprise plans when you’re ready or need something custom.”

    For ElevenLabs (based on publicly known patterns and typical TTS vendor designs; verify current docs before launch):

    • Mix of per-minute or per-day character limits, tied to plan tiers.
    • Per-project or per-API-key limits that may not be obvious until you dig into the docs or hit them.
    • Throttling behavior (HTTP 429s, queued requests, or silent slowdowns) during burst traffic.

    The core difference: LMNT’s position is “no concurrency or rate limits” plus predictable character-based pricing that gets “even better with volume,” which makes it easier to plan for large, bursty loads. ElevenLabs is more traditionally quota-based, where you must architect around caps.

  3. Design and run your load tests

    Once you understand stated limits, you test them:

    • Step 3.1 – Prototype with Playground vs API

      • LMNT: Try voices and latency in the free Playground. Once the experience feels right, switch to the Developer API.
      • ElevenLabs: Prototype via their web UI, then move to API.
    • Step 3.2 – Build a load test harness

      • Use a language you already use for infra (Node, Python, Go, Rust).
      • For LMNT, you can follow the API spec at https://api.lmnt.com/spec to wire up streaming requests.
      • Simulate N concurrent sessions, each sending text chunks at realistic intervals (how your agent actually speaks).
    • Step 3.3 – Measure latency and error behavior
      Track:

      • Time to first byte (TTFB) of audio per request (LMNT targets 150–200ms).
      • End-to-end turn latency (LLM → TTS → playback).
      • Error rates under concurrency and burst conditions (429s, 5xx, connection drops).
    • Step 3.4 – Increase concurrency until something breaks

      • With LMNT, you’re looking for infrastructure bottlenecks on your side (network, LLM throughput, mixer), because the platform isn’t imposing hard concurrency or rate limits from the outset.
      • With ElevenLabs, you’ll want to find where soft or hard limits kick in and how the API signals those states (429s, slower responses, or silent throttling).

Common Mistakes to Avoid

  • Assuming demo behavior equals production behavior:
    A vendor’s Playground or demo often runs on privileged infrastructure that doesn’t reflect your API quotas. Always validate production behavior with the API and your actual traffic patterns.

  • Load testing with unrealistic traffic patterns:
    If your test script blasts 100 requests per second from a single client, you’re measuring something different than a thousand end users talking over WebRTC. Simulate realistic session structure: staggered starts, bursts of speech, natural pauses, and varied utterance lengths.

LMNT vs ElevenLabs: Practical Differences for a Production Voice Agent

From the perspective of actually running a large-scale voice agent, here’s how the two tend to differ.

1. Concurrency Limits

LMNT

  • Explicitly states: “No concurrency or rate limits.”
  • Designed for streaming first, so parallel WebSocket or HTTP streaming sessions are a core use case.
  • The “History Tutor” (LLM-driven streaming speech on Vercel) and “Big Tony’s Auto Emporium” (realtime speech-to-speech using LiveKit) demos demonstrate that multiple concurrent sessions are expected—not an edge case.
  • Enterprise plans available “when you’re ready or need something custom,” which typically means you can negotiate SLOs for very high concurrency use cases.

Implication for your agent: you can design around your own infra constraints (LLM throughput, signaling, and audio mix) rather than around an arbitrary “max concurrent streams” ceiling. Great fit for high-fan-out agents, multiplayer games, or large contact-center style deployments.

ElevenLabs

  • Public plans are governed by usage quotas: monthly characters, per-minute or per-day limits, and sometimes per-project ceilings.
  • While they can support significant concurrency, it’s usually bounded by tiered quotas; hitting limits may yield 429s or slower responses.
  • For real-time agents, this means you must implement:
    • Request-level backoff and retries
    • Queuing or buffering when you approach known caps
    • Additional observability to catch when you’re near thresholds

Implication for your agent: concurrency is workable but has to be explicitly engineered around the limits. For spiky traffic or unknown growth curves, this adds operational overhead.

2. Rate Limits and Burst Handling

LMNT

  • Positions itself with “No concurrency or rate limits” and character-based pricing that gets better with volume.

  • This is important for burst traffic:

    • Product launches
    • Live events
    • Seasonal spikes (e.g., holidays, school cycles for tutors)
  • Because you’re not fighting per-second or per-minute caps, you can rely more on your own autoscaling strategy and less on shaping demand around vendor constraints.

ElevenLabs

  • Tends to use more traditional rate limiting: requests/characters per unit time per account or API key.
  • Burst traffic may:
    • Trigger 429s
    • Require pre-warming or special enterprise arrangements
    • Force you to implement upstream rate-limiting to prevent user-visible throttling

For a real-time agent, hitting a rate limit mid-call is especially painful: either you stop speaking, or you fall back to a degraded or cached voice.

3. Load Testing and Pre-Production Validation

LMNT

  • “Get started in seconds” is literal: pull up your editor, use the API spec (https://api.lmnt.com/spec), and start driving streaming requests.
  • Free Playground makes it easy to lock in voice and latency before you even write a test harness.
  • Two “ready-to-fork” demos show real-world streaming behavior:
    • History Tutor – LLM-driven, Vercel-hosted, good starting point for a text-in / audio-out agent.
    • Big Tony’s Auto Emporium – LiveKit-based, speech-to-speech, closer to full duplex voice.

Because there are no published concurrency or rate limits, you can:

  • Ramp your load tests directly toward your expected production profile.
  • Focus on end-to-end metrics (user-perceived latency, quality, and stability) rather than juggling quota math.
  • Use LMNT’s Startup Grant (45M credits over 3 months) to run heavy tests without blowing your budget.

ElevenLabs

  • Load testing is absolutely possible, but constrained by:
    • Plan quotas (you can burn through monthly characters quickly in aggressive tests).
    • Per-minute or per-second rate limits that may kick in as you ramp.

This often results in more cautious test plans:

  • Throttled load tests to stay under limits.
  • “Segmented” tests (e.g., test groups of 100 concurrent sessions separately) that don’t fully reproduce worst-case scenarios.

For serious production readiness, you’ll typically want enterprise discussions in advance to ensure you won’t hit hidden ceilings during testing or at launch.

4. Latency Under Load

Both platforms focus on natural-sounding voices, but for a voice agent, latency is part of your concurrency story: if TTS slows as you add load, your effective concurrency is lower.

LMNT

  • Explicit claim: 150–200ms low-latency streaming, “great for conversational apps, agents, and games.”
  • Because it’s built for streaming and real-time use:
    • You can start playback as soon as audio begins streaming.
    • You get more headroom on concurrency before user-perceived latency becomes a problem.

In practice, this means your load tests can target conversational thresholds (e.g., sub‑300ms first audio) rather than just “eventual” responses.

ElevenLabs

  • Delivers high quality voices, but many typical workflows are request/response rather than aggressively streaming-first.
  • Under higher load or rate limiting conditions, you may see:
    • Increased response latencies
    • Jitter that affects turn-taking smoothness

You’ll want to tune your agent’s speech chunking and buffer strategy carefully if you’re pushing concurrency.

5. Pricing, Scaling, and Operational Risk

LMNT

  • Framed as “Fast.Lifelike.Affordable.” with character-based pricing and better economics at volume.
  • “No concurrency or rate limits” + no artificial throttles makes cost modeling and scaling straightforward:
    • Character in → predictable cost out.
    • Scale concurrency based on your own infra and budget, not on vendor throttle points.
  • SOC-2 Type II in the footer signals enterprise readiness: important if your agent touches sensitive domains (health, education, finance).

ElevenLabs

  • Well-known and widely adopted; strong voice quality, broad ecosystem.
  • Pricing is more traditional SaaS/quota style:
    • You pay for capacity and work around limits.
    • Scaling to large, bursty agents often triggers plan upgrades and negotiation.

Operationally, this adds more moving parts: quota monitoring, early-warning alerts, and sometimes multi-vendor fallback just to protect against rate limit outages.

Real-World Example

Imagine you’re launching a real-time tutoring agent serving 5,000 concurrent students during peak hours. Each student has a two-way voice session with an agent that:

  • Generates 3–5 TTS responses per minute
  • Must keep turn-taking latency near or below 300ms
  • Needs to gracefully handle spikes (e.g., class transitions, exam weeks)

With ElevenLabs, your rollout might look like:

  • Spend weeks clarifying rate limits and capacity with support.
  • Build a load tester that ramps up to 1,000 sessions, then stops to avoid hitting per-minute caps.
  • Add a complex traffic shaper: queue or downsample requests when 429s appear, and possibly fall back to a secondary TTS provider.
  • At launch, closely watch dashboards; a sudden spike from 3,000 to 5,000 sessions risks tripping limits, causing pauses or failures in speech.

With LMNT, the same project can instead:

  • Use the Playground to tune voices and ensure latency feels right.
  • Fork the History Tutor demo as a starting point, wiring your LLM and session manager into its streaming pipeline.
  • Build a load test harness that actually hits your target: thousands of concurrent WebSocket sessions streaming TTS, without artificial caps.
  • Run aggressive pre-launch load tests using the Startup Grant to validate that:
    • TTFB stays around 150–200ms.
    • Audio quality and stability hold under load.
    • Your own infra (LLM, signaling, hosting) is the limiting factor—not the TTS provider.

When launch day traffic spikes beyond expectations, LMNT’s “No concurrency or rate limits” posture and enterprise-ready scaling mean your agent keeps talking. You iterate on your own autoscaling rather than begging for limit increases mid-incident.

Pro Tip: When evaluating any TTS vendor, treat rate limits and concurrency as first-class API features. Ask for explicit numbers, behavior under overload (429 vs queue vs silent slowdown), and whether they support running your own aggressive load tests. LMNT’s clear “No concurrency or rate limits” stance dramatically simplifies this conversation.

Summary

For a production voice agent, concurrency limits and rate limits are as important as voice quality. LMNT and ElevenLabs both deliver natural-sounding output, but they differ sharply in how they handle scale:

  • LMNT:

    • Built for streaming and real-time use with 150–200ms latency.
    • Explicitly: No concurrency or rate limits.
    • Predictable, volume-friendly pricing and strong scale posture (“We’ll scale with you”).
    • Easier, more honest load testing—your bottleneck is your own infra, not the TTS ceiling.
  • ElevenLabs:

    • Strong voices and broad adoption, but governed by quotas and rate limits.
    • Requires more defensive engineering for concurrency and burst traffic.
    • Load testing must be designed around plan constraints and potential throttling.

If your voice agent is more than a demo—if you expect thousands of concurrent conversations, real spikes, and production SLOs—LMNT’s combination of streaming performance, lack of concurrency/rate limits, and builder-native workflow (Playground → API → forkable demos) makes it a better fit for high-concurrency, low-latency deployments.

Next Step

Get Started