How do I get started with LMNT in Node.js and stream audio back as it’s generated?

Quick Answer: To get started with LMNT in Node.js and stream audio back as it’s generated, you’ll create a WebSocket (or HTTP streaming) client to the LMNT API, send text plus voice parameters, and pipe the incoming audio chunks directly to your playback or transport layer. You can try voices in the LMNT Playground first, then drop the same settings into your Node.js code to get 150–200 ms latency, lifelike speech, and real-time turn-taking.

Why This Matters

If you’re building conversational apps, agents, or games, you can’t afford robotic voices or multi-second delays. Users expect your AI to sound natural and respond in roughly human turn-taking time; anything slower breaks trust. Streaming LMNT audio from Node.js addresses both problems.

Key Benefits:

Hit conversational latency: Streamed audio begins in ~150–200 ms, so your agent can start talking almost as soon as the LLM decides what to say.
Keep speech lifelike at scale: Use LMNT’s studio-quality voices and 5-second voice clones without adding custom infrastructure.
Stay builder-friendly: Prototype in the Playground, then reuse the same voice IDs and settings in a slim Node.js client—no extra SDKs, no rate-limit headaches.

Core Concepts & Key Points

Concept	Definition	Why it's important
Streaming TTS	Generating and sending audio in small chunks as the text is synthesized.	Enables sub-second responses instead of waiting for the full audio file to render. Critical for agents and games.
WebSocket / HTTP streaming	Long-lived connections where LMNT pushes audio frames as they’re ready.	Lets your Node.js app play or forward audio incrementally instead of buffering everything.
Voice + language config	The combination of voice (e.g., “Brandon”) and language parameters you send to LMNT.	Controls speaking style, language, and code-switch behavior across 24 languages, even mid-sentence.

How It Works (Step-by-Step)

At a high level, you’ll:

Test your voice setup in the LMNT Playground.
Create a Node.js client that streams audio from LMNT.
Pipe incoming audio frames to playback (browser) or transport (WebRTC, WebSocket, or file).

Below is a practical walkthrough you can adapt to your own stack.

1. Try voices in the LMNT Playground

Before writing Node.js code, lock in a voice and style:

Go to the LMNT Playground.
Try built‑in voices like:
- Brandon – engaging broadcaster style (great for newscaster / narrator responses).
- Leah – cheerful assistant.
- Vesper – nerdy tutor.
Paste in sample text, including multilingual code-switching, e.g.:

“Breaking news from Berlin. Und jetzt wechseln wir kurz ins Deutsche, bevor wir zurück ins Englische springen.”
Note the voice name and any configuration you like. You’ll mirror this from Node.

This gives you a reference sound and helps you pick a default voice for your agent or character.

2. Set up your Node.js project

Use a minimal Node setup—no heavy frameworks required.

mkdir lmnt-node-streaming
cd lmnt-node-streaming
npm init -y
npm install ws node-fetch

You’ll use:

ws for WebSocket connections (common for low-latency streaming).
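node-fetch (or the native fetch built into Node 18 and later) if you want HTTP-based streaming. Note that node-fetch v3 is ESM-only, so with the require() style used in this walkthrough, either rely on native fetch or install node-fetch@2.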

Add an environment file for your LMNT API key:

npm install dotenv

Create .env:

LMNT_API_KEY=your_lmnt_api_key_here

And a light bootstrap in index.js:

require('dotenv').config();

if (!process.env.LMNT_API_KEY) {
  throw new Error('Missing LMNT_API_KEY in .env');
}

3. Connect to the LMNT streaming endpoint

The exact URL and request shape live in LMNT’s API spec at:

https://api.lmnt.com/spec

The pattern you’ll follow:

Authenticate with your LMNT API key.
Open a streaming connection (WebSocket or HTTP).
Send text + voice config.
Receive audio chunks and metadata events.

Below is a conceptual WebSocket example you can adapt once you inspect the spec.

const WebSocket = require('ws');
require('dotenv').config();

const LMNT_API_KEY = process.env.LMNT_API_KEY;

// Replace this URL with the streaming endpoint from https://api.lmnt.com/spec
const LMNT_WS_URL = 'wss://api.lmnt.com/v1/tts/stream';

// onAudioChunk is called for every binary frame. It defaults to the debug
// logger below, so calls that pass no handler still work.
function createLmntStream({ text, voice, language, onAudioChunk = handleAudioChunk }) {
  return new Promise((resolve, reject) => {
    const ws = new WebSocket(LMNT_WS_URL, {
      headers: {
        Authorization: `Bearer ${LMNT_API_KEY}`,
      },
    });

    ws.on('open', () => {
      const payload = {
        text,
        voice,      // e.g. "brandon"
        language,   // e.g. "en"
        // You can add optional controls as supported by the spec:
        // rate, pitch, emotion, etc.
      };
      ws.send(JSON.stringify(payload));
    });

    ws.on('message', (data, isBinary) => {
      if (isBinary) {
        // This is an audio frame; forward it to your player/transport.
        onAudioChunk(data);
        return;
      }

      // JSON messages (events, markers, errors)
      let msg;
      try {
        msg = JSON.parse(data.toString());
      } catch (err) {
        ws.close();
        reject(new Error(`Unparseable message from LMNT: ${err.message}`));
        return;
      }
      if (msg.type === 'done') {
        ws.close();
      } else if (msg.type === 'error') {
        console.error('LMNT error:', msg);
        ws.close();
        reject(new Error(msg.error || 'LMNT streaming error'));
      }
    });

    ws.on('close', () => resolve());
    ws.on('error', reject);
  });
}

function handleAudioChunk(chunk) {
  // Here you decide what “stream back as it’s generated” means.
  // Options:
  // 1. Write to stdout (debug).
  // 2. Relay to a browser via WebSocket.
  // 3. Push into WebRTC / LiveKit.
  // 4. Accumulate into a file (less “streaming”, more recording).
  console.log('Received audio chunk of size', chunk.length);
}

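// Export the function so other files can reuse it. Later snippets assume this
// file is saved as lmnt-stream.js.
module.exports = { createLmntStream };

// Run the demo only when this file is executed directly (node lmnt-stream.js),
// not when it is required from another module.
if (require.main === module) {
  createLmntStream({
    text: 'Here are the latest headlines, brought to you by LMNT.',
    voice: 'brandon',
    language: 'en',
  }).catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}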

Use the spec to replace LMNT_WS_URL and to align with the exact message schema (e.g., event fields, audio field names, etc.).
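
The walkthrough mentions HTTP streaming as an alternative to WebSockets, so here is a minimal sketch of that variant using the fetch API built into Node 18 and later. The URL, the Bearer header, and the JSON body shape are placeholders carried over from the conceptual example above, not confirmed details of LMNT's API. The fetchImpl parameter is a hypothetical hook added so the function can be exercised without a network call.

```javascript
// Hypothetical endpoint and payload shape: confirm both against https://api.lmnt.com/spec.
const LMNT_HTTP_URL = 'https://api.lmnt.com/v1/tts/stream';

// Sends one request and hands each body chunk to onAudioChunk as soon as it arrives.
// Returns the total number of audio bytes received.
async function streamSpeechOverHttp({
  text,
  voice,
  language,
  apiKey,
  onAudioChunk,
  url = LMNT_HTTP_URL,
  fetchImpl = fetch, // injectable for tests; defaults to Node's global fetch
}) {
  const res = await fetchImpl(url, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ text, voice, language }),
  });
  if (!res.ok) {
    throw new Error(`LMNT request failed with status ${res.status}`);
  }

  // res.body is a web ReadableStream; read it incrementally instead of
  // awaiting res.arrayBuffer(), which would buffer the whole response.
  const reader = res.body.getReader();
  let totalBytes = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    totalBytes += value.byteLength;
    onAudioChunk(Buffer.from(value));
  }
  return totalBytes;
}
```

The key design point is reading from the body reader in a loop: each chunk reaches your handler while the rest of the response is still in flight, which is what keeps the HTTP path comparable to the WebSocket one.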

4. Stream audio back to your user as it’s generated

“Streaming back” usually means one of three things:

A. Streaming to a browser (most common)

Common architecture:

Node.js holds the LMNT streaming connection.
Browser client connects to Node via WebSocket.
Node forwards audio chunks from LMNT directly to the browser.
Browser decodes and plays them with the Web Audio API.

Server-side relay:

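// ws-server.js
const WebSocket = require('ws');
const { createLmntStream } = require('./lmnt-stream'); // from above

const server = new WebSocket.Server({ port: 8080 });

server.on('connection', (client) => {
  client.on('message', async (msg) => {
    try {
      const { text, voice, language } = JSON.parse(msg.toString());

      await createLmntStream({
        text,
        voice,
        language,
        onAudioChunk: (chunk) => {
          if (client.readyState === WebSocket.OPEN) {
            client.send(chunk); // binary frame to browser
          }
        },
      });
    } catch (err) {
      // Report bad input or LMNT failures instead of leaving an unhandled rejection.
      console.error('Relay failed:', err);
      if (client.readyState === WebSocket.OPEN) {
        client.send(JSON.stringify({ type: 'error', message: err.message }));
      }
    }
  });
});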

On the client, you’d:

Open a WebSocket to ws://your-node-host:8080.
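On each binary message, decode the chunk and schedule it for playback with the Web Audio API, or append it to a MediaSource SourceBuffer (which one fits depends on the audio format you request).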
Start playback as soon as the first chunks arrive.
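
The client steps above can be sketched with two small helpers: one that converts a raw chunk into Web Audio samples, and one that computes start times so consecutive chunks play back to back. This assumes the chunks are raw 16-bit little-endian mono PCM, which is an assumption for illustration; the actual format and sample rate depend on what you request from LMNT. The names pcm16ToFloat32 and ChunkScheduler are hypothetical.

```javascript
// Assumption: each chunk is raw 16-bit little-endian mono PCM.
// Converts the bytes to the [-1, 1) float samples the Web Audio API expects.
function pcm16ToFloat32(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Float32Array(Math.floor(bytes.byteLength / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

// Tracks when the next chunk should start so chunks play without gaps or overlap.
class ChunkScheduler {
  constructor(leadTimeSec = 0.05) {
    this.leadTimeSec = leadTimeSec; // small safety margin before the first chunk
    this.nextStart = 0;
  }

  // now: the current audio clock (AudioContext.currentTime in a browser).
  // Returns the time at which this chunk should start playing.
  schedule(durationSec, now) {
    const start = Math.max(this.nextStart, now + this.leadTimeSec);
    this.nextStart = start + durationSec;
    return start;
  }
}

// Browser wiring (not executed here), given an AudioContext `ctx`:
//   const samples = pcm16ToFloat32(new Uint8Array(event.data));
//   const buf = ctx.createBuffer(1, samples.length, sampleRate);
//   buf.copyToChannel(samples, 0);
//   const src = ctx.createBufferSource();
//   src.buffer = buf;
//   src.connect(ctx.destination);
//   src.start(scheduler.schedule(buf.duration, ctx.currentTime));
```

If the stream stalls and the audio clock passes nextStart, the scheduler falls back to "now plus lead time", so playback resumes cleanly instead of scheduling chunks in the past. For compressed formats such as MP3, the MediaSource route is usually a better fit than manual sample conversion.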

This pattern matches LMNT’s demos like History Tutor (LLM + streaming speech on Vercel) and Big Tony’s Auto Emporium (speech-to-speech using LiveKit).

B. Streaming into WebRTC / LiveKit

If you’re running a realtime experience (voice calls, in-game comms):

Node.js uses LMNT as the TTS engine.
You push audio frames into a WebRTC audio track or a LiveKit room.
The user hears the speech with ~150–200 ms LMNT latency plus your network overhead.

This is the same plumbing pattern as the Big Tony’s Auto Emporium demo.
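
Realtime transports generally expect fixed-duration audio frames (often 10 or 20 ms), while streamed chunks arrive in arbitrary sizes. A minimal sketch of the re-framing step follows. It assumes raw 16-bit mono PCM at 24 kHz, which is an illustrative assumption rather than a documented LMNT default, and FrameAssembler is a hypothetical helper. How you hand each frame to WebRTC or LiveKit depends on that SDK and is not shown.

```javascript
// Re-slices arbitrarily sized PCM chunks into fixed-size frames.
// Assumption: 16-bit mono PCM; adjust sampleRate to match the requested format.
class FrameAssembler {
  constructor({ sampleRate = 24000, frameMs = 20, bytesPerSample = 2 } = {}) {
    this.frameBytes = Math.round((sampleRate * frameMs) / 1000) * bytesPerSample;
    this.pending = Buffer.alloc(0);
  }

  // Accepts a chunk of any size and returns zero or more complete frames.
  push(chunk) {
    this.pending = Buffer.concat([this.pending, chunk]);
    const frames = [];
    while (this.pending.length >= this.frameBytes) {
      frames.push(this.pending.subarray(0, this.frameBytes));
      this.pending = this.pending.subarray(this.frameBytes);
    }
    return frames;
  }

  // Pads leftover samples with silence so the tail of the utterance is not dropped.
  // Returns null when nothing is pending.
  flush() {
    if (this.pending.length === 0) return null;
    const last = Buffer.alloc(this.frameBytes); // zero-filled = silence for signed PCM
    this.pending.copy(last);
    this.pending = Buffer.alloc(0);
    return last;
  }
}
```

In use, you would call push from your onAudioChunk handler, forward each returned frame to the audio track, and call flush once the stream reports completion.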

C. Streaming to another backend service

For pipelines like:

LLM → LMNT TTS → your own mixer / transcoder.
Node just bridges LMNT’s chunks into a queue, gRPC stream, or another microservice.
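
One way to sketch that bridge is to expose the incoming chunks as a standard Node.js stream, so any downstream consumer (a file, a transcoder's stdin, an HTTP response) can be attached with pipe. The createAudioBridge helper below is hypothetical and uses only the built-in stream module.

```javascript
const { PassThrough } = require('node:stream');

// Wraps a PassThrough stream so it can be fed from an onAudioChunk callback.
function createAudioBridge({ highWaterMark = 64 * 1024 } = {}) {
  const stream = new PassThrough({ highWaterMark });
  return {
    stream,
    // Returns false when the downstream consumer is falling behind (backpressure).
    onAudioChunk: (chunk) => stream.write(chunk),
    // Call once the TTS stream has finished so consumers see end-of-stream.
    end: () => stream.end(),
  };
}

// Example wiring (not executed here):
//   const bridge = createAudioBridge();
//   bridge.stream.pipe(require('node:fs').createWriteStream('speech.raw'));
//   await createLmntStream({ text, voice, language, onAudioChunk: bridge.onAudioChunk });
//   bridge.end();
```

Because the bridge is an ordinary stream, backpressure is visible: if onAudioChunk returns false, the consumer is slower than the producer, and you can decide whether to wait for the stream's 'drain' event or drop audio.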

5. Combine LMNT with your LLM in Node.js

For a conversational agent, LMNT is usually the last step:

Receive user input (text or ASR).
Call your LLM for the response.
As soon as you have text (or partial chunks), start LMNT streaming.

Pseudocode:

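const WebSocket = require('ws');
const { createLmntStream } = require('./lmnt-stream');

// callYourLLM is a placeholder for whatever LLM client you use.
async function handleUserMessage({ userText, wsToBrowser }) {
  const llmResponseText = await callYourLLM(userText); // or stream partials

  await createLmntStream({
    text: llmResponseText,
    voice: 'brandon',
    language: 'en',
    onAudioChunk: (chunk) => {
      // Skip the send if the browser disconnected mid-response.
      if (wsToBrowser.readyState === WebSocket.OPEN) {
        wsToBrowser.send(chunk);
      }
    },
  });
}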

For maximum responsiveness, you can:

Stream partial LLM outputs into LMNT rather than waiting for the full text.
Let LMNT start speaking while the LLM is still generating.
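
A common way to stream partial LLM output is to buffer tokens until a sentence boundary and send each complete sentence to TTS. The sketch below shows only the buffering step. SentenceBuffer is a hypothetical helper, and whether you send each sentence on one long-lived connection or as separate requests depends on what the LMNT spec supports.

```javascript
// Accumulates streamed LLM tokens and releases text one sentence at a time.
class SentenceBuffer {
  constructor() {
    this.pending = '';
  }

  // Adds a partial token and returns any sentences that are now complete.
  push(token) {
    this.pending += token;
    const sentences = [];
    // Treat ., !, or ? followed by whitespace as a sentence boundary.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.pending.slice(start, end).trim());
      start = end;
    }
    this.pending = this.pending.slice(start);
    return sentences;
  }

  // Returns whatever is left once the LLM has finished generating.
  flush() {
    const rest = this.pending.trim();
    this.pending = '';
    return rest ? [rest] : [];
  }
}
```

Sentence-sized units tend to be a reasonable trade-off: they are long enough for natural prosody but short enough that speech can begin well before the LLM finishes. The boundary rule here is deliberately simple and will split early on abbreviations such as "Dr."; tighten it if your content needs that.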

This is where the 150–200 ms LMNT streaming latency really matters—your agent sounds “live” instead of canned.

Common Mistakes to Avoid

Treating LMNT like a file-only TTS API:
If you wait for a full audio blob before playing, you’re throwing away LMNT’s low-latency advantage. Use streaming endpoints and forward chunks as soon as they arrive.
Not planning the client playback path:
Node can receive audio, but your user still needs to hear it. Decide early whether you’ll stream to a browser, WebRTC, or another service, and design your binary protocol and codec handling around that.
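
To check that you are not accidentally buffering, it helps to measure time to first audio chunk in your own pipeline. The helper below is a hypothetical sketch: it wraps any onAudioChunk handler and records when the first chunk arrives. The clock is injectable so the logic can be tested deterministically.

```javascript
// Measures how long the first audio chunk takes to arrive after the request starts.
function createLatencyMeter(now = () => performance.now()) {
  const startedAt = now();
  let firstChunkAt = null;
  let chunks = 0;

  return {
    // Wraps an onAudioChunk handler and records the arrival time of the first chunk.
    wrap(onAudioChunk) {
      return (chunk) => {
        if (firstChunkAt === null) firstChunkAt = now();
        chunks += 1;
        onAudioChunk(chunk);
      };
    },
    // firstChunkMs is null until at least one chunk has arrived.
    report() {
      return {
        chunks,
        firstChunkMs: firstChunkAt === null ? null : firstChunkAt - startedAt,
      };
    },
  };
}
```

Create the meter immediately before opening the stream and pass meter.wrap(yourHandler) as the chunk callback. If firstChunkMs is far above the latency you expect, the delay is likely in your own relay or playback path rather than in synthesis.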

Real-World Example

Say you’re building a “Breaking News” assistant in Node.js that reads headlines in a Brandon newscaster style. Your flow:

Node fetches headlines from https://text.npr.org/.
Concatenates them into a script.
Sends that script plus voice: "brandon" to LMNT via streaming.
Pipes LMNT’s live audio chunks through a WebSocket to the browser.
The browser starts playing within ~200 ms, and keeps playing as more audio arrives.

You get a continuous, lifelike news stream driven by your Node backend and the user perceives it as real-time broadcast speech—not pre-rendered clips.
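
The "concatenate into a script" step of this flow can be sketched as a small pure function. It assumes you have already extracted headline strings from the page; that parsing is not shown. The function name, the intro sentence, and the maxHeadlines option are illustrative choices.

```javascript
// Turns a list of headline strings into one script suitable for TTS.
function buildNewsScript(headlines, { maxHeadlines = 5 } = {}) {
  const cleaned = headlines
    .map((h) => h.trim())
    .filter(Boolean) // drop empty entries
    .slice(0, maxHeadlines)
    // End every headline with punctuation so the voice pauses between items.
    .map((h) => (/[.!?]$/.test(h) ? h : `${h}.`));

  if (cleaned.length === 0) return '';
  return `Here are the latest headlines. ${cleaned.join(' ')}`;
}
```

Capping the number of headlines keeps the script short for a first end-to-end test, in line with the tip below about validating the streaming path before adding complexity.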

Pro Tip: Start with a short test script (1–2 sentences) to validate your streaming path end-to-end—LMNT → Node → browser playback—before plugging in an LLM or complex content source like NPR headlines.

Summary

To get started with LMNT in Node.js and stream audio back as it’s generated, you:

Configure and test voices in the LMNT Playground (e.g., Brandon for broadcast-style delivery).
Use the LMNT API spec at https://api.lmnt.com/spec to open a streaming connection from Node and send text + voice parameters.
Forward LMNT’s binary audio chunks directly to your user via WebSockets, WebRTC, or another transport so playback begins within 150–200 ms.

This pattern gives your conversational apps, agents, and games fast, lifelike speech without juggling rate limits or custom streaming infrastructure.
