
How do I get started with LMNT in Node.js and stream audio back as it’s generated?
Most Node.js teams start with the same two questions: how fast can I get LMNT wired into my stack, and how do I stream audio back as it’s generated instead of waiting for a full file? The good news is that LMNT is built exactly for this—real-time, low-latency audio over streaming APIs—so you can go from “hello world” to conversational-grade voice in one session.
Quick Answer: Use LMNT’s streaming API from Node.js over WebSockets or HTTP chunked responses to generate audio as it’s synthesized, then pipe those chunks directly into your playback layer. Start by testing voices in the LMNT Playground, grab an API key, and then use a simple Node.js client to send text and stream the resulting audio buffers back to your client or media pipeline in real time.
Why This Matters
If you’re building conversational apps, agents, or games, buffering an entire TTS clip before playback kills the experience. Users feel the lag, turn-taking falls apart, and your otherwise-smart assistant sounds like a pre-rendered voicemail. LMNT’s 150–200ms low-latency streaming is designed to solve this: you start playing audio almost immediately, keep the interaction natural, and still get lifelike delivery across 24 languages and cloned voices.
Key Benefits:
- Conversational latency: 150–200ms streaming means users hear a response as soon as it’s ready, not after a full synthesis pass.
- Production-ready scaling: No concurrency or rate limits, so you can fan out across many simultaneous Node.js sessions without re-architecting.
- Flexible voice control: Use built-in voices or studio-quality clones from a 5-second recording, and code-switch mid-sentence in 24 languages.
Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Streaming TTS | Generating and delivering audio incrementally over a network connection (WebSocket or HTTP streaming) instead of waiting for a full file. | Enables conversational-grade turn-taking; users hear speech within ~150–200ms instead of seconds later. |
| Node.js audio pipeline | The flow of data from LMNT’s API → your Node.js backend → browser/native client or media server. | Determines how quickly and smoothly audio reaches users; a clean pipeline avoids buffering and jitter. |
| Voice selection & cloning | Choosing LMNT’s stock voices (e.g., “Brandon” for a broadcaster style) or creating a voice clone from a short recording. | Lets you align the voice with your product—newsreader, tutor, game NPC—without sacrificing latency. |
How It Works (Step-by-Step)
At a high level, you’ll: test voices, wire up Node.js, then stream audio to your front end or media layer.
1. Try LMNT in the Playground
   - Go to the LMNT Playground from the main site.
   - Test voices like “Brandon” (engaging broadcaster), “Leah” (cheerful assistant), or “Vesper” (nerdy tutor).
   - Adjust style and language, and confirm the voice that matches your use case (e.g., a newscaster-style agent).
2. Grab your API key
   - Create or log into your LMNT account.
   - Generate an API key from the dashboard (you’ll use this as a Bearer token or in an `x-api-key` header, depending on the client).
   - Keep it server-side in your Node.js app (environment variables, not hardcoded).
3. Set up your Node.js project

   Initialize a project:

   ```bash
   mkdir lmnt-node-streaming
   cd lmnt-node-streaming
   npm init -y
   npm install axios ws dotenv
   ```

   Add a `.env` file:

   ```
   LMNT_API_KEY=your_api_key_here
   ```

   Load your env vars in `index.js`:

   ```js
   require('dotenv').config();
   const API_KEY = process.env.LMNT_API_KEY;
   ```
4. Call LMNT’s streaming API from Node.js

   Depending on the API surface you prefer, you’ll either:

   - Use WebSockets for ultra-low-latency streaming (ideal for conversational agents, games, and speech-to-speech), or
   - Use HTTP streaming (chunked responses) for simpler server-to-client pipelines.

   Below is a conceptual WebSocket flow you can adapt once you’re in the docs.
   ```js
   const WebSocket = require('ws');

   const API_KEY = process.env.LMNT_API_KEY;
   const TTS_WS_URL = 'wss://api.lmnt.com/v1/stream'; // check docs for current path

   function streamTTS(text, options = {}) {
     return new Promise((resolve, reject) => {
       const ws = new WebSocket(TTS_WS_URL, {
         headers: { Authorization: `Bearer ${API_KEY}` },
       });

       const audioChunks = [];

       ws.on('open', () => {
         const payload = {
           text,
           voice: options.voice || 'brandon', // engaging broadcaster style
           format: options.format || 'audio/opus', // or 'audio/mpeg', 'audio/wav' – see docs
           language: options.language || 'en',
         };
         ws.send(JSON.stringify(payload));
       });

       ws.on('message', (data) => {
         // Depending on the protocol, you might receive JSON frames and raw audio frames.
         // A common pattern is { type: 'audio', chunk: <binary> } or binary frames directly.
         if (Buffer.isBuffer(data)) {
           // Raw audio chunk
           audioChunks.push(data);
           // Stream this chunk to your playback layer here
         } else {
           const msg = JSON.parse(data.toString());
           if (msg.type === 'end') {
             ws.close();
           }
           if (msg.type === 'error') {
             console.error('LMNT error:', msg);
             ws.close();
             reject(new Error(msg.error));
           }
         }
       });

       ws.on('close', () => {
         resolve(Buffer.concat(audioChunks));
       });

       ws.on('error', (err) => reject(err));
     });
   }

   (async () => {
     const text = 'Here are the latest headlines from NPR.';
     const audioBuffer = await streamTTS(text, { voice: 'brandon' });
     console.log(`Received ${audioBuffer.length} bytes of audio`);
   })().catch(console.error);
   ```

   In a real app, you won’t wait for `resolve()` before playing audio; you’ll stream each chunk out to your playback layer as soon as `audioChunks.push(data)` runs:

   - A browser client over WebSockets or Server-Sent Events.
   - A media server (e.g., LiveKit, WebRTC pipeline).
   - A Node audio player (CLI tool, bot, or backend processor).
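The message-handling branch in the flow above is the part most worth getting right. As a sketch, assuming binary frames carry raw audio and text frames carry JSON control messages like `{ type: 'end' }` (verify the actual framing against the LMNT docs), the dispatch logic can be factored into a small, testable helper:

```javascript
// Dispatch one WebSocket frame from an LMNT-style stream. Framing is assumed:
// binary frames are raw audio, text frames are JSON control messages such as
// { type: 'end' } or { type: 'error', error: '...' } (check the docs).
function handleFrame(data, { onAudio, onEnd, onError }) {
  if (Buffer.isBuffer(data)) {
    onAudio(data); // forward the audio chunk immediately instead of buffering it
    return 'audio';
  }
  const msg = JSON.parse(data.toString());
  if (msg.type === 'end') {
    onEnd();
    return 'end';
  }
  if (msg.type === 'error') {
    onError(new Error(msg.error));
    return 'error';
  }
  return 'ignored'; // unknown control frames are safe to skip
}

// Example with a fake binary frame:
const received = [];
handleFrame(Buffer.from([1, 2, 3]), {
  onAudio: (chunk) => received.push(chunk),
  onEnd: () => {},
  onError: () => {},
});
console.log(received.length); // 1
```

Keeping this dispatch pure (no socket references) lets you unit-test it with fake frames before you ever open a live connection.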
5. Stream audio from Node.js to the client

   Here’s a simple pattern using Express + WebSocket to forward LMNT’s streaming audio to a browser:

   ```bash
   npm install express ws
   ```

   ```js
   const express = require('express');
   const http = require('http');
   const WebSocketServer = require('ws').Server;
   const LMNTWebSocket = require('ws');

   const API_KEY = process.env.LMNT_API_KEY;

   const app = express();
   const server = http.createServer(app);
   const wss = new WebSocketServer({ server });

   wss.on('connection', (client) => {
     client.on('message', (raw) => {
       const { text } = JSON.parse(raw.toString());

       const lmnt = new LMNTWebSocket('wss://api.lmnt.com/v1/stream', {
         headers: { Authorization: `Bearer ${API_KEY}` },
       });

       lmnt.on('open', () => {
         lmnt.send(JSON.stringify({ text, voice: 'brandon', format: 'audio/opus' }));
       });

       lmnt.on('message', (data) => {
         // Forward raw audio chunks directly to the browser
         client.send(data);
       });

       lmnt.on('close', () => client.close());
       lmnt.on('error', (err) => {
         console.error(err);
         client.close();
       });
     });
   });

   server.listen(3000, () => console.log('Server listening on :3000'));
   ```

   On the frontend, you’d connect via WebSocket, receive binary audio frames, and feed them into a Web Audio or WebRTC pipeline for immediate playback.
6. Iterate on voice and language
   - Switch voices without changing your pipeline: `voice: 'leah'` for a friendly assistant, `voice: 'vesper'` for a tutor, etc.
   - Use multilingual support: set `language: 'es'` or mix languages mid-sentence; LMNT can code-switch naturally.
   - When you’re ready, create a voice clone from a 5-second recording, then reference its ID in place of a stock voice.
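Since voice, format, and language are just fields on the request payload, iterating is a one-line change. A minimal sketch of a payload builder (the field names mirror the earlier example and are assumptions to check against the current LMNT docs):

```javascript
// Build a synthesis payload with overridable defaults; swapping personas or
// languages never touches the streaming pipeline itself.
function buildPayload(text, overrides = {}) {
  return {
    text,
    voice: 'brandon',     // default persona
    format: 'audio/opus', // default container
    language: 'en',
    ...overrides,         // e.g. { voice: 'leah' } or a clone's voice ID
  };
}

// A Spanish tutor voice: same pipeline, different payload.
const payload = buildPayload('Hola, ¿cómo estás?', { voice: 'vesper', language: 'es' });
console.log(payload.voice, payload.language); // vesper es
```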
7. Scale to production

   When you’re past your first prototype:

   - Use LMNT’s character-based pricing to estimate and control cost per session.
   - Take advantage of no concurrency or rate limits to spawn many Node workers or containers without negotiating caps.
   - If you’re shipping at scale, lean on SOC-2 Type II compliance, and move to an enterprise plan when you’re ready or need something custom.
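Because pricing is per character, you can estimate cost per session directly from the text you synthesize. A rough sketch (the rate below is a placeholder, not LMNT’s actual price; substitute your plan’s rate):

```javascript
// Rough per-session cost estimator for character-based pricing.
// RATE_PER_1K_CHARS is a hypothetical placeholder, not a real LMNT price.
const RATE_PER_1K_CHARS = 0.05; // $/1,000 characters (placeholder)

function estimateCost(texts, ratePer1k = RATE_PER_1K_CHARS) {
  const chars = texts.reduce((sum, t) => sum + t.length, 0);
  return { chars, cost: (chars / 1000) * ratePer1k };
}

// Example: everything one session will speak.
const session = ['Welcome back!', 'Here are your top three headlines today.'];
console.log(estimateCost(session));
```

Logging this per session gives you a running cost figure you can compare against your bill as traffic grows.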
Common Mistakes to Avoid
- Treating streaming like a file download: Don’t wait for the full audio buffer to be assembled before sending anything to the client. Instead, forward each chunk as you receive it from LMNT so playback can begin immediately and stay conversational.
- Leaking API keys to the browser: Never embed your LMNT key in frontend code. Always terminate the LMNT connection server-side in Node.js and proxy a safe, app-specific stream to the client (WebSockets, SSE, or HTTP streaming).
- Ignoring audio format vs. playback stack: Make sure your chosen format (`audio/opus`, `audio/mpeg`, `audio/wav`) matches what your client can play. Browsers handle Opus-in-WebM/Ogg and MP3 well; lower-level engines or game runtimes may prefer raw PCM or WAV.
- Not testing latency under load: LMNT’s backend won’t throttle you, but your own infrastructure might. Test with many concurrent WebSocket sessions to ensure your Node event loop and network stack can keep up.
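For that last point, a simple way to simulate many sessions without unbounded fan-out is a small concurrency limiter. A sketch, where each `task` stands in for “open a WebSocket, stream, close”:

```javascript
// Run many simulated TTS sessions with a cap on how many are in flight at
// once, so the load test stresses your network path, not just your test rig.
async function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++; // safe: single-threaded, no await between read and increment
      results[i] = await tasks[i]();
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
}

// Example: 10 fake sessions, at most 3 running concurrently.
const fakeSessions = Array.from({ length: 10 }, (_, i) => async () => {
  await new Promise((r) => setTimeout(r, 10)); // simulate streaming time
  return `session-${i} done`;
});

runWithConcurrency(fakeSessions, 3).then((out) => console.log(out.length)); // 10
```

Swap the fake task for a real “connect, synthesize, play back” routine and ramp `limit` up while watching event-loop lag and time-to-first-chunk.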
Real-World Example
Say you’re building a Node.js service that reads the latest headlines from https://text.npr.org/ in a “Brandon” newscaster style and streams the audio to a client, continuously.
A minimal flow looks like this:
- Node.js fetches the latest headlines (using `axios` or `node-fetch`).
- You concatenate or summarize them into a short script.
- Your Node service opens an LMNT streaming connection with `voice: 'brandon'`.
- As LMNT sends back audio chunks, you immediately push them over a WebSocket to your frontend.
- The browser decodes the audio stream and plays it instantly, so the user hears a live, broadcaster-style feed with only ~150–200ms of perceived latency.
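The “concatenate or summarize” step is worth making explicit, since script length drives both clip duration and character-based cost. A minimal sketch (`buildNewsScript` is a hypothetical helper, and the character budget is an arbitrary choice):

```javascript
// Turn a list of scraped headlines into one short broadcast script,
// trimmed to a character budget so synthesis cost and clip length stay bounded.
function buildNewsScript(headlines, maxChars = 500) {
  const intro = 'Here are the latest headlines. ';
  let script = intro;
  for (const h of headlines) {
    const sentence = h.trim().replace(/\.?$/, '. '); // ensure trailing period
    if (script.length + sentence.length > maxChars) break; // stop at budget
    script += sentence;
  }
  return script.trim();
}

console.log(buildNewsScript([
  'Markets rally after rate decision',
  'New exoplanet discovered in nearby system',
]));
```

The resulting string is exactly what you’d pass as `text` to the streaming call with `voice: 'brandon'`.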
Pro Tip: Before wiring up your full news pipeline, validate the exact text and voice style in the LMNT Playground. Once you’re happy with pacing and tone, copy that same text and voice configuration into your Node.js code so your streaming output matches what you tested.
Summary
To get started with LMNT in Node.js and stream audio back as it’s generated, you:
- Test voices and styles in the Playground to pick the right sound.
- Grab an API key and wire up a Node.js client using WebSockets or HTTP streaming.
- Forward LMNT’s audio chunks directly to your client or media server so playback starts within ~150–200ms.
- Iterate on voices, languages, and clones without changing your streaming pipeline, then scale out confidently with no concurrency limits and predictable pricing.
Once your streaming path is in place, you can plug it into conversational agents, tutors, broadcasters, or game characters—all powered by the same Node.js + LMNT core.