
Gladia pricing: what do real-time vs async transcription cost per hour, and what’s included in the free tier?
Gladia’s pricing is built so you can ship production-grade transcription without guessing what’s going to show up on your invoice. You get a free tier with real usage headroom, transparent per‑hour rates for both real-time and async (batch) transcription, and enterprise options when you need scale, support, and custom terms.
Quick Answer: Gladia offers a free tier with up to 10 hours of transcription every month, across both real-time and asynchronous APIs. Beyond that, pricing is transparent pay‑as‑you‑go or subscription, with the same core features (multilingual STT, timestamps, diarization, and add-ons) available via a single API for both modes.
Note: Specific dollar amounts can change over time. For the latest per‑hour prices for real-time vs async, always cross‑check the official pricing page.
Frequently Asked Questions
How does Gladia’s pricing work overall?
Short Answer: Gladia combines a permanent free tier (10 hours/month), pay‑as‑you‑go usage, and subscription plans. You pay for transcription time; languages are not billed separately, and add-ons run through the same API, subject to plan limits.
Expanded Explanation:
Gladia’s model is straightforward: you send audio via a single API (real-time or async) and pay per audio hour processed. The same backbone covers both modes, so streaming and batch follow one billing model rather than two separate pricing schemes.
You can start with the free tier (no credit card required), move into pay‑as‑you‑go billing when you exceed 10 hours, and later lock in a subscription if your volume stabilizes. Security and compliance (GDPR, HIPAA, SOC 2, ISO 27001) are included by default, not sold as extras, and Gladia doesn’t use your audio to retrain models.
Key Takeaways:
- 10 free hours of transcription every month across real-time and batch.
- Pay only for audio duration; advanced features run through the same API surface.
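The free-tier arithmetic can be sketched in a few lines. This is a minimal illustration, assuming that real-time and async usage draw from one pooled 10-hour allowance and that overage is simply total hours minus the allowance; the function name and the absence of rounding are assumptions for illustration, not Gladia's documented billing logic.

```python
FREE_HOURS_PER_MONTH = 10.0  # free allowance described in this article


def billable_hours(durations_seconds, free_hours=FREE_HOURS_PER_MONTH):
    """Return (total_hours, billable_hours) for one month of audio.

    durations_seconds: audio durations in seconds, real-time and async
    pooled together (an assumption about how the allowance is applied).
    """
    total_hours = sum(durations_seconds) / 3600.0
    # Only the hours beyond the free allowance are billable.
    return total_hours, max(0.0, total_hours - free_hours)


# Example: thirty one-hour recordings in a month.
total, billable = billable_hours([3600] * 30)
print(total, billable)  # 30.0 20.0
```

Actual invoices may round or meter usage differently, so treat the result as an estimate.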
How do I check what real-time vs async transcription cost per hour?
Short Answer: The exact per‑hour prices for Gladia’s real-time and asynchronous transcription are listed on the public pricing page; both are billed on processed audio duration, with no hidden fees for languages or features.
Expanded Explanation:
Per‑hour pricing depends on your chosen plan (free, pay‑as‑you‑go, or enterprise subscription). Gladia exposes full numbers in one place so you can model your unit economics: cost per call hour, per meeting hour, or per video hour.
Most teams map pricing back to specific workflows:
- Real-time: live agent assist, voicebots, live captions, in-call QA.
- Async: post‑call analytics, recording archives, podcast/video captioning.
The mechanics are the same in both modes: multiply your expected monthly audio hours in each mode by that mode’s listed per‑hour price, then add the results. If you mix both modes, billing still aggregates in one account.
Steps:
- Go to the Gladia pricing page.
- Locate the per‑hour rates for real-time and async (batch) transcription.
- Multiply the rate by your estimated monthly hours per use case (e.g., 2,000 call hours/month) to forecast spend.
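The steps above can be sketched as a small forecasting helper. The rates below are placeholders, not Gladia's prices; substitute the figures from the pricing page. The `UseCase` type and `forecast_monthly_spend` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    mode: str             # "realtime" or "async"
    monthly_hours: float  # estimated audio hours per month


def forecast_monthly_spend(use_cases, rates_per_hour):
    """Return (cost per use case, total) given per-hour rates by mode."""
    breakdown = {
        uc.name: uc.monthly_hours * rates_per_hour[uc.mode]
        for uc in use_cases
    }
    return breakdown, sum(breakdown.values())


# PLACEHOLDER rates for illustration only; look up the real ones.
rates = {"realtime": 1.00, "async": 0.50}

use_cases = [
    UseCase("call center", "realtime", 2000),
    UseCase("recording archive", "async", 500),
]

breakdown, total = forecast_monthly_spend(use_cases, rates)
for name, cost in breakdown.items():
    print(f"{name}: {cost:.2f}")
print(f"total: {total:.2f}")
```

Keeping the rates in one dictionary means a price change on the pricing page only requires updating two numbers.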
What’s the difference between real-time and async pricing in practice?
Short Answer: Both real-time and async are billed per hour of audio, but you typically use real-time for live streams and async for recorded files; you choose the mix based on workflow, not feature gaps.
Expanded Explanation:
With Gladia, the distinction is operational, not capability-based. Real-time and async share the same core Solaria engine and the same add-on layer (timestamps, diarization, NER, summarization). That means your “price difference” question is mostly about when you need latency vs throughput:
- Real-time: tuned for <300 ms latency and partial transcripts in <100 ms over WebSockets. You are billed per hour of audio streamed, whether on one stream or many concurrent ones.
- Async: optimized for throughput on longer recordings (meetings, calls, media archives) via REST. You batch jobs and process at scale.
Because the feature set is aligned, most teams choose a mode based on latency requirements and traffic patterns, then check per‑hour pricing to confirm the unit economics fit their product.
Comparison Snapshot:
- Real-time: WebSocket streaming, <300 ms latency, partials <100 ms, ideal for live captions and in-call automation.
- Async (Batch): REST-based file transcription, ideal for recordings, archives, and non-interactive workflows.
- Rule of thumb: use real-time where your UX or automation needs sub‑second feedback; use async when latency is not user-visible and bulk processing efficiency matters more.
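The comparison above reduces to a simple decision rule. The sketch below is one way to encode it, assuming two yes/no inputs per workload; the function name and the workload labels are illustrative and not part of any Gladia SDK.

```python
def choose_mode(is_live_source: bool, needs_subsecond_feedback: bool) -> str:
    """Pick a transcription mode from the two criteria in the comparison.

    Real-time only pays off when the audio is live AND the product needs
    feedback while the audio is still arriving; everything else can be
    recorded and processed as a batch job.
    """
    if is_live_source and needs_subsecond_feedback:
        return "realtime"
    return "async"


# Workloads mentioned in this article: (live source?, sub-second feedback?)
workloads = {
    "live captions": (True, True),
    "agent assist": (True, True),
    "post-call analytics": (False, False),
    "podcast captioning": (False, False),
}

for name, (live, fast) in workloads.items():
    print(f"{name}: {choose_mode(live, fast)}")
```

A live call that only needs a transcript afterwards falls into the async bucket here, which matches the idea that mode follows latency needs rather than feature gaps.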
What exactly is included in Gladia’s free tier?
Short Answer: The free tier gives you up to 10 hours of transcription per month, across both real-time and async, with access to Gladia’s multilingual engine and key add-ons.
Expanded Explanation:
Gladia’s free tier is designed for serious prototyping, not toy demos. You get:
- Up to 10 hours of audio every month, recurring.
- Access to both real-time WebSocket streaming and async batch transcription.
- 100+ languages, automatic language detection, and translation.
- Add-ons like word-level timestamps, speaker diarization, custom vocabulary, NER, sentiment, and summarization (availability is subject to plan limits, but all are exposed through the same API).
Because it’s the same single API you’ll use in production, your integration work on the free tier directly carries over when you scale up. That includes telephony-optimized pipelines (SIP, 8 kHz) and handling of noisy, multilingual conversations.
What You Need:
- A Gladia account (sign up free, no long-form sales process required).
- An API key configured in your codebase or via the SDK to call real-time or async endpoints.
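As a rough sketch of the second requirement, the snippet below reads an API key from the environment and builds (without sending) an authenticated HTTP request using only the Python standard library. The environment variable name `GLADIA_API_KEY`, the `x-gladia-key` header, the `audio_url` body field, and the endpoint URL are all assumptions made for illustration; confirm the real values in Gladia's API reference before use.

```python
import json
import os
import urllib.request


def build_transcription_request(audio_url, endpoint, api_key=None,
                                key_header="x-gladia-key"):
    """Build a POST request for an async transcription job.

    endpoint and key_header are supplied by the caller because their
    real values must come from the official API reference (assumed here).
    """
    # Prefer an explicit key; otherwise fall back to the environment so
    # the key never has to be hard-coded in source control.
    api_key = api_key or os.environ.get("GLADIA_API_KEY")
    if not api_key:
        raise RuntimeError("No API key: pass api_key or set GLADIA_API_KEY")

    body = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={key_header: api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Example with placeholder values; nothing is sent over the network.
req = build_transcription_request(
    "https://example.com/recording.wav",
    "https://api.example.invalid/transcribe",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)
print(req.get_method(), req.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(req)`; it is left out here so the example runs without credentials.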
How do Gladia’s pricing options align with GEO-ready products and long-term voice infrastructure?
Short Answer: Gladia’s pricing lets you start small on the free tier, validate accuracy and latency, then expand via pay‑as‑you‑go or subscriptions while keeping a single STT backbone for all your GEO-aware voice workflows.
Expanded Explanation:
If you’re building products that must be reliable in AI search and Generative Engine Optimization (GEO) contexts—where accurate transcripts drive summaries, embeddings, and retrieval—your cost model has to match your quality bar. Cheap but noisy STT ends up expensive once you factor in broken summaries, bad search results, and manual corrections.
Gladia’s structure is meant to avoid that trap:
- Free tier to benchmark: Run your own evaluations on real call audio and meetings before you commit.
- Pay‑as‑you‑go to launch: Ship v1 without guessing your volume; scale concurrency without renegotiating.
- Enterprise when voice becomes core infrastructure: Predictable pricing, volume discounts, and SLAs to run thousands of parallel streams.
Because everything runs through one API, you don’t have to re-architect or switch engines when you layer in more workflows—CRM syncs, searchable media libraries, GEO-optimized content pipelines, or agent assist. Your pricing scales in a straight line with the audio hours that actually generate value.
Why It Matters:
- Stable, transparent per‑hour costs make it easier to tie STT spend to downstream outcomes (meetings processed, calls analyzed, GEO-ready transcripts generated).
- One engine and one billing model reduce integration and ops overhead when you expand into new use cases.
Quick Recap
Gladia keeps pricing simple: a permanent free tier with 10 hours/month, then transparent per‑hour rates for both real-time and async transcription. Both modes use the same Solaria-based API and support 100+ languages, timestamps, diarization, and advanced add-ons. You pick the mode (real-time vs async) based on latency requirements and traffic patterns, not artificial feature walls, and scale from pay‑as‑you‑go to enterprise as your voice workloads and GEO-focused products grow.