
LMNT vs OpenAI: cost comparison at scale (per-character pricing, overages) for tens of millions of characters/month
Most teams don’t run into real TTS cost problems until usage jumps into the tens of millions of characters per month—right when a “quick test” becomes a production feature and invoices start to spike. That’s usually when the OpenAI line item gets a hard look and alternative engines like LMNT enter the conversation.
Quick Answer: At tens of millions of characters per month, LMNT is designed to stay predictable and affordable, with character-based pricing that improves with volume and no concurrency or rate limits. OpenAI’s TTS is competitively priced at small to medium scale, but the list price, hidden overages (from throttling and retries), and latency under real load can add up once you move into continuous, production-grade voice for agents, apps, and games.
Why This Matters
If your product speaks out loud—an agent, tutor, or game character—TTS spend becomes a core COGS driver, not a side experiment. The wrong pricing model can:
- Force aggressive rate limiting on users.
- Make you under-scope voice features just to stay under budget.
- Hide true costs behind throttling, timeouts, and operational workarounds.
A good TTS choice at this scale has to balance three things at once: per-character price, behavior under load (no surprise throttles or concurrency caps), and latency that’s low enough for turn-taking.
Key Benefits:
- Predictable per-character economics: LMNT’s character-based pricing is built to get cheaper as volume grows, so tens of millions of characters/month don’t break your COGS model.
- No concurrency or rate limits: You can scale usage without hidden overage penalties from throttling, retries, or forced queuing during peak traffic.
- Built for conversational latency: 150–200 ms low-latency streaming makes it viable to ship voice-forward agents and games without compromising experience to save on cost.
Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Per‑character pricing | Charging based on the number of characters synthesized into speech. | Lets you model cost directly against usage (e.g., characters/user/session) and project spend at 10M–100M characters/month. |
| Overages & throttling | The implicit “cost” of hitting concurrency or rate limits—timeouts, retries, and degraded UX. | Even if per-character price is low, aggressive limits can raise your effective cost and force architectural workarounds. |
| Latency at scale | End‑to‑end time from text to first audio chunk under real traffic. | If latency degrades as you scale, your agent or game stops feeling conversational and users drop off—wasting both time and TTS spend. |
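To make the "latency at scale" row concrete, here is a minimal sketch of timing the gap between sending text and receiving the first audio chunk. It assumes only that your TTS client exposes the response as an iterable of byte chunks; `time_to_first_chunk` and `fake_tts_stream` are hypothetical names for illustration, not part of either vendor's SDK.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_chunk(stream: Iterable[bytes]) -> Tuple[float, int]:
    """Return (seconds until the first non-empty chunk, total bytes received)."""
    start = time.perf_counter()  # start timing before the first chunk is requested
    first_chunk_s = None
    total_bytes = 0
    for chunk in stream:
        if chunk and first_chunk_s is None:
            first_chunk_s = time.perf_counter() - start
        total_bytes += len(chunk)
    if first_chunk_s is None:
        raise ValueError("stream produced no audio")
    return first_chunk_s, total_bytes


def fake_tts_stream(delay_s: float = 0.05, chunks: int = 3) -> Iterator[bytes]:
    """Stand-in for a vendor's streaming response; swap in the real client call."""
    time.sleep(delay_s)  # simulated wait before the first audio arrives
    for _ in range(chunks):
        yield b"\x00" * 1024


latency_s, received = time_to_first_chunk(fake_tts_stream())
print(f"first audio after {latency_s * 1000:.0f} ms, {received} bytes total")
```

Run the same measurement against each engine while your load test is active, since an idle-system number says little about behavior under real traffic.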
How It Works (Step‑by‑Step)
Here’s how I’d evaluate LMNT vs OpenAI for tens of millions of characters per month.
- Translate product usage into characters
  - Start with your voice surface area:
    - Average characters per response (e.g., 250–400 characters for an assistant turn).
    - Average turns per session.
    - Monthly active sessions.
  - Multiply to estimate monthly characters (e.g., 300 chars/turn × 12 turns/session × 250k sessions ≈ 900M characters/month).
  - This number is your baseline for comparing LMNT and OpenAI.
- Apply per‑character pricing and volume tiers
  - OpenAI:
    - TTS is typically priced per “input character” or per audio duration, with different rates for “standard” vs “HD” or “enhanced” voices.
    - Volume discounts may exist, but past a certain scale you are often negotiating them directly or modeling them by hand.
  - LMNT:
    - Character-based pricing explicitly designed to “get even better with volume.”
    - Unlimited voice clones and no concurrency limits mean you’re not adding hidden costs for “premium” features or scale.
  - For tens of millions of characters, you should:
    - Compute effective CPM (cost per million characters) for each platform.
    - Compare what happens when you double usage: do you get better effective pricing, or do you hit a new tier with penalties?
- Factor in operational overages
  - This is where the headline rate can be misleading.
  - OpenAI:
    - Global usage and shared infrastructure can mean rate limits, model-level caps, or request throttling at peak times.
    - Over time, teams end up adding:
      - Retries and backoff logic.
      - Queueing/batching layers to stay under limits.
      - Fallback engines to keep UX from breaking.
    - All of that adds “operational overage”: engineering time plus extra infra.
  - LMNT:
    - LMNT advertises “no concurrency or rate limits,” with “Enterprise plans when you’re ready or need something custom.”
    - You can treat TTS as a simple function of characters and latency, not a juggling act of caps and workarounds.
  - At tens of millions of characters, this often makes LMNT’s effective cost lower even when sticker prices look similar.
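The three steps above can be sketched as a small cost model. This is an illustration under stated assumptions, not either vendor's billing logic: the tier boundaries and rates in `PLACEHOLDER_TIERS` are made-up numbers (substitute the published or quoted prices you are comparing), tiers are assumed to be graduated, and retried characters are assumed to be billed again, which you should confirm with each vendor.

```python
from typing import List, Tuple

# (upper bound of tier in characters, USD per million characters).
# Placeholder values for illustration only, NOT real LMNT or OpenAI prices.
PLACEHOLDER_TIERS: List[Tuple[float, float]] = [
    (10_000_000, 20.0),
    (100_000_000, 15.0),
    (float("inf"), 10.0),
]


def monthly_characters(chars_per_turn: int, turns_per_session: int,
                       sessions_per_month: int) -> int:
    """Step 1: translate product usage into characters per month."""
    return chars_per_turn * turns_per_session * sessions_per_month


def tiered_cost(chars: int, tiers: List[Tuple[float, float]]) -> float:
    """Step 2: graduated pricing; each rate applies only to characters inside its tier."""
    cost, lower = 0.0, 0.0
    for upper, usd_per_million in tiers:
        if chars <= lower:
            break
        cost += (min(chars, upper) - lower) / 1_000_000 * usd_per_million
        lower = upper
    return cost


def effective_cpm(chars: int, tiers: List[Tuple[float, float]],
                  retry_rate: float = 0.0, monthly_ops_usd: float = 0.0) -> float:
    """Step 3: USD per million useful characters, including operational overhead.

    retry_rate is the fraction of characters re-sent after throttling or
    timeouts (assumed to be billed again); monthly_ops_usd is whatever you
    spend on queues, fallbacks, and engineering time for workarounds.
    """
    billed = round(chars * (1 + retry_rate))
    return (tiered_cost(billed, tiers) + monthly_ops_usd) / (chars / 1_000_000)


print(f"{monthly_characters(300, 12, 250_000):,} characters/month")
for volume in (50_000_000, 100_000_000):
    print(volume, effective_cpm(volume, PLACEHOLDER_TIERS))
print(effective_cpm(50_000_000, PLACEHOLDER_TIERS, retry_rate=0.05, monthly_ops_usd=500.0))
```

With these placeholder tiers, effective CPM falls from 16.0 to 15.5 when volume doubles from 50M to 100M characters, while a 5% retry rate plus $500/month of operational overhead pushes the 50M figure to 26.75. The point of the exercise is the gap between the list rate and the effective rate, not the specific numbers.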
Common Mistakes to Avoid
- Ignoring concurrency and rate limits in your cost model:
  It’s not enough to multiply characters by a price. Check whether your vendor has per-minute or per-project caps that will force you to:
  - Throttle user sessions.
  - Add a job queue.
  - Stand up a multi‑vendor failover strategy.

  When comparing LMNT and OpenAI, model what happens during your peak hour, not just your average month.
- Treating latency as a “nice to have” instead of a cost lever:
  Slow TTS leads to longer sessions, more user abandonment, and more retry logic, all of which inflate effective spend. LMNT’s 150–200 ms streaming is optimized for conversational turn‑taking; if you have to stack buffering or “thinking” animations on top of a slower engine to hide lag, that’s a real product cost.
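For the first mistake, a rough way to model the peak hour is to convert sessions into requests per minute and average in-flight requests, then compare both against whatever caps a vendor documents. The sketch below uses Little's law (in-flight = arrival rate × time each request stays open). The traffic share, stream duration, and cap values are hypothetical inputs, not figures from either vendor.

```python
from typing import Optional, Tuple


def peak_load(sessions_in_peak_hour: int, turns_per_session: int,
              avg_stream_seconds: float) -> Tuple[float, float]:
    """Return (requests per minute, average concurrent streams) for the peak hour."""
    requests_per_hour = sessions_in_peak_hour * turns_per_session
    requests_per_minute = requests_per_hour / 60
    # Little's law: average in-flight requests = arrival rate x time in system.
    concurrent_streams = requests_per_hour / 3600 * avg_stream_seconds
    return requests_per_minute, concurrent_streams


def exceeds(value: float, cap: Optional[float]) -> bool:
    """True if a documented cap exists and the projected load is above it."""
    return cap is not None and value > cap


# Assumption: 20% of 50,000 daily sessions land in the busiest hour,
# 15 turns per session, and each audio stream stays open about 4 seconds.
rpm, concurrent = peak_load(10_000, 15, 4.0)
print(f"{rpm:.0f} requests/min, ~{concurrent:.0f} concurrent streams")

HYPOTHETICAL_RPM_CAP = 1_000  # replace with the limit your vendor actually documents
print("needs queueing or throttling:", exceeds(rpm, HYPOTHETICAL_RPM_CAP))
print("no cap to plan around:", not exceeds(rpm, None))
```

Because this is an hourly average, bursts within the hour will be higher, which is one reason to back the estimate with a real load test.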
Real‑World Example
Say you’re shipping a voice tutor that uses streaming TTS for every answer:
- 300 characters per response on average.
- 15 responses per session.
- 50,000 sessions per day.
That’s:
- 300 × 15 × 50,000 = 225,000,000 characters/day.
- ~6.75B characters/month.
With two different engines:
- On OpenAI:
  - You estimate cost via the published per-character (or per‑second) rate, then layer on:
    - Higher‑quality voice multipliers (HD/Expressive tiers).
    - Potential rate limits that force you to:
      - Queue or slow responses at peak.
      - Implement multi‑region, multi‑vendor routing.
  - Your “list price” might look fine, but in practice you:
    - Drop some audio requests under load (wasted spend + bad UX).
    - Spend engineering time and infra budget on complex workarounds.
- On LMNT:
  - You model cost as:
    - Characters × a tiered CPM that improves as you approach billions of characters.
    - No added fee for voice cloning (studio‑quality from a 5‑second recording) or for peak concurrency: you can run as many simultaneous sessions as your app can drive.
  - Latency stays in the 150–200 ms range, so you don’t need buffering hacks to paper over slow speech.
  - Result: a more linear, predictable cost curve. When your tutor usage doubles, your TTS line item scales sensibly instead of tripping a series of new limits and hidden costs.
Pro Tip: When you run your vendor comparison, simulate your busiest 15 minutes, not your average day. Use a load test that mirrors real dialogue (short, frequent turns) and track: error rates, median/95th percentile latency to first audio, and any throttling behavior. Then translate those into “effective CPM” for both LMNT and OpenAI—this is where LMNT’s “no concurrency or rate limits” and low‑latency streaming usually show up as real savings.
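A minimal sketch of how the load-test metrics in the tip might be summarized, assuming your harness records one result per request. The `Result` record and its status labels are this example's own convention, and percentiles use the simple nearest-rank method.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Result:
    status: str                            # "ok", "throttled", or "error"
    first_audio_ms: Optional[float] = None  # latency to first audio, if the request succeeded


def percentile(sorted_values: List[float], pct: int) -> float:
    """Nearest-rank percentile of an already sorted list."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(results: List[Result]) -> Dict[str, float]:
    """Median and p95 latency over successful requests, plus failure rates over all requests."""
    ok = sorted(r.first_audio_ms for r in results
                if r.status == "ok" and r.first_audio_ms is not None)
    if not ok:
        raise ValueError("no successful requests to summarize")
    total = len(results)
    return {
        "p50_ms": statistics.median(ok),
        "p95_ms": percentile(ok, 95),
        "throttle_rate": sum(r.status == "throttled" for r in results) / total,
        "error_rate": sum(r.status == "error" for r in results) / total,
    }


# Synthetic data standing in for a real 15-minute load test.
sample = [Result("ok", float(ms)) for ms in range(1, 101)]
sample += [Result("throttled")] * 5 + [Result("error")] * 5
print(summarize(sample))
```

The throttle and error rates are the inputs that turn a list price into an effective CPM: every throttled or failed request is either retried (and possibly billed again) or dropped (wasted UX), so run the same summary for each engine under identical traffic.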
Summary
At small to moderate volumes, both LMNT and OpenAI can look comparable on headline per-character pricing. The real divergence appears when your usage climbs into tens or hundreds of millions of characters per month:
- LMNT leans into predictable, volume‑friendly per‑character pricing, unlimited voice cloning from just 5 seconds of audio, 24 languages with natural code‑switching, and no concurrency or rate limits.
- OpenAI often requires you to plan around caps, negotiate discounts, and build operational scaffolding to handle throttling—costs that don’t show up in the sticker price.
If you’re building conversational apps, agents, or games and expect usage at this scale, you want a TTS engine where cost is a straightforward function of characters and latency—not a maze of hidden overages.