How do I use the LMNT Unity SDK to generate character dialogue at runtime?

Quick Answer: Use the LMNT Unity SDK to stream low-latency text-to-speech directly into your game objects at runtime. You define characters, pick LMNT voices or clones, send dialogue lines to LMNT’s API from C#, and play the returned audio via AudioSource—all fast enough (150–200ms) to feel conversational in live gameplay.

Why This Matters

In Unity, character dialogue is only convincing if it sounds natural and responds in real time. Pre-rendered VO locks you into fixed scripts, bloats build size, and breaks when you add dynamic or AI-driven interactions. With LMNT’s Unity SDK, you can generate dialogue at runtime—using studio-quality voice clones from just a 5-second sample—so NPCs, agents, and companions can react to the player, the world state, or an LLM in the moment.

Key Benefits:

  • Real-time, responsive dialogue: 150–200ms low-latency streaming keeps NPCs and agents in sync with gameplay and player actions.
  • Distinct voices for every character: Use built-in voices or studio-quality clones from short recordings to give each character a consistent identity.
  • Scales with your game: Generate dialogue on demand without pre-baking thousands of lines or hitting concurrency limits—LMNT has no concurrency or rate limits and pricing improves with volume.

Core Concepts & Key Points

  • Runtime TTS in Unity: Generating speech audio from text (or LLM output) during gameplay, then playing it via AudioSource. Why it matters: it enables dynamic NPC dialogue, AI companions, and agents that react to real-time game state instead of fixed scripts.
  • Streaming vs. batch generation: Streaming sends text to LMNT and receives audio in chunks as it’s generated; batch waits for the full file. Why it matters: streaming TTS with 150–200ms latency keeps conversations feeling natural for turn-taking, especially in agents and co-op assistants.
  • Voice assignment & cloning: Mapping LMNT voices or your clones to specific Unity characters or prefabs. Why it matters: it keeps character identity consistent and lets you scale to many characters without separate VO recording sessions.

How It Works (Step-by-Step)

You’ll wire LMNT’s streaming text-to-speech into Unity so your characters can speak any line—scripted or AI-generated—at runtime.

1. Set up LMNT and your Unity project

  1. Create your LMNT account.

    • Go to lmnt.com.
    • Try voices in the free Playground to find a style that fits your character.
    • When you’re ready to integrate, grab your API key from the Developer section.
  2. Install the LMNT Unity SDK.

    • Add the SDK via the Unity Package Manager (Git URL or local package, depending on how LMNT distributes it).
    • Make sure your project has:
      • Scripting Runtime: .NET 4.x equivalent
      • API compatibility suitable for HTTP/WebSocket (for streaming)
    • Confirm you can reference the LMNT namespaces (e.g., using LMNT;) in a C# script.
  3. Configure your LMNT client.

    • Create a singleton or service class to hold the LMNT client and API key.
    • Store your API key securely (e.g., in environment variables for builds, not hard-coded in public repos).
    using UnityEngine;

    public class LmntClientProvider : MonoBehaviour
    {
        public static LmntClientProvider Instance { get; private set; }
    
        [SerializeField] private string apiKey; // For demo only; use safer storage in production.
    
        public LmntClient Client { get; private set; }
    
        private void Awake()
        {
            if (Instance != null && Instance != this)
            {
                Destroy(gameObject);
                return;
            }
    
            Instance = this;
            DontDestroyOnLoad(gameObject);
    
            Client = new LmntClient(apiKey);
        }
    }
    

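The provider above serializes the key in the inspector; for builds, a minimal sketch of resolving it from an environment variable instead (the `LMNT_API_KEY` name is an assumption, not something the SDK defines):

```csharp
using System;
using UnityEngine;

public static class LmntApiKeyLoader
{
    // Hypothetical variable name; use whatever your build pipeline sets.
    private const string EnvVarName = "LMNT_API_KEY";

    // Prefer the environment variable, falling back to a serialized
    // value (useful for quick in-editor testing).
    public static string Resolve(string serializedFallback)
    {
        var fromEnv = Environment.GetEnvironmentVariable(EnvVarName);
        if (!string.IsNullOrEmpty(fromEnv))
            return fromEnv;

        if (string.IsNullOrEmpty(serializedFallback))
            Debug.LogWarning("No LMNT API key found in environment or inspector.");

        return serializedFallback;
    }
}
```

In `Awake`, you would then call `Client = new LmntClient(LmntApiKeyLoader.Resolve(apiKey));`.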
2. Choose voices and (optionally) create clones

  1. Pick default voices in the Playground.

    • In the LMNT Playground, experiment with voices for:
      • Companions / narrators
      • Vendors or quest givers
      • AI tutors or coaches
    • Note the voice IDs or names you want to use in Unity.
  2. Create studio-quality voice clones (optional).

    • Record or capture ~5 seconds of clean speech for each character.
    • Upload to LMNT and create a clone in the Playground.
    • Assign the resulting clone IDs to your Unity character configs.
    • All LMNT voices, including clones, speak 24 languages, with natural mid-sentence switching—useful for multilingual characters.
  3. Map voices to Unity characters.

    • Add a simple script to your character prefab:
    using UnityEngine;

    public class CharacterVoiceProfile : MonoBehaviour
    {
        [Tooltip("LMNT voice or clone ID for this character")]
        public string voiceId;
    
        [Tooltip("Optional: language code like en-US, es-ES, etc.")]
        public string languageCode = "en-US";
    }
    

3. Stream dialogue into Unity at runtime

  1. Create a dialogue player component.

    Attach this script to each character with an AudioSource:

    using System.Threading.Tasks;
    using UnityEngine;

    [RequireComponent(typeof(AudioSource))]
    public class LmntDialoguePlayer : MonoBehaviour
    {
        private AudioSource _audioSource;
        private CharacterVoiceProfile _voiceProfile;
    
        private void Awake()
        {
            _audioSource = GetComponent<AudioSource>();
            _voiceProfile = GetComponent<CharacterVoiceProfile>();
        }
    
        public async Task SpeakAsync(string text)
        {
            if (LmntClientProvider.Instance == null)
            {
                Debug.LogError("LMNT client not initialized.");
                return;
            }
    
            var client = LmntClientProvider.Instance.Client;
    
            // Pseudocode – adapt to actual LMNT Unity SDK interface
            var request = new LmntSpeechRequest
            {
                Text = text,
                VoiceId = _voiceProfile?.voiceId,
                LanguageCode = _voiceProfile?.languageCode
            };
    
            // For conversational experiences, prefer streaming
            await foreach (var chunk in client.StreamSpeechAsync(request))
            {
                // Each chunk contains a small slice of audio
                // Buffer it, then feed to AudioSource via clip or custom streaming
                AppendChunkToAudioSource(chunk);
            }
        }
    
        private void AppendChunkToAudioSource(LmntAudioChunk chunk)
        {
            // Implementation depends on SDK: often involves converting PCM bytes
            // into Unity float samples and appending to an AudioClip-backed buffer.
        }
    }
    

    The actual types/methods may differ; this shows the structure:

    • Build a speech request with text + voice ID.
    • Use LMNT’s streaming API to receive audio chunks.
    • Pipe them into Unity’s audio system so playback can begin almost immediately.
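The stubbed `AppendChunkToAudioSource` usually boils down to a PCM conversion. A sketch of that step, assuming the SDK delivers 16-bit little-endian PCM bytes (check the actual chunk format in the SDK docs):

```csharp
public static class PcmUtil
{
    // Converts 16-bit little-endian PCM bytes into Unity-style
    // float samples in the range [-1, 1].
    public static float[] Pcm16ToFloats(byte[] pcm)
    {
        var samples = new float[pcm.Length / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            short s = (short)(pcm[2 * i] | (pcm[2 * i + 1] << 8));
            samples[i] = s / 32768f;
        }
        return samples;
    }
}
```

The resulting floats can be written into a streaming `AudioClip` (created with `AudioClip.Create` and a `PCMReaderCallback`) or into a ring buffer drained in `OnAudioFilterRead`.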
  2. Trigger dialogue from gameplay or AI.

    From any game logic (e.g., a dialogue system, quest trigger, or LLM integration):

    using UnityEngine;

    public class NpcDialogueController : MonoBehaviour
    {
        private LmntDialoguePlayer _dialoguePlayer;
        private bool _isSpeaking;
    
        private void Awake()
        {
            _dialoguePlayer = GetComponent<LmntDialoguePlayer>();
        }
    
        public async void Say(string line)
        {
            // Basic guard: don’t speak over yourself
            if (_dialoguePlayer == null || _isSpeaking) return;
    
            _isSpeaking = true;
            try
            {
                await _dialoguePlayer.SpeakAsync(line);
            }
            finally
            {
                _isSpeaking = false;
            }
        }
    }
    

    Now you can call:

    npcDialogueController.Say("Welcome back, traveler. Need any supplies?");
    

    Or, for LLM-driven agents:

    async Task HandleAgentResponse(string userInput)
    {
        var responseText = await CallYourLLMAsync(userInput);
        npcDialogueController.Say(responseText);
    }
    
  3. Use streaming for conversational turn-taking.

    With LMNT’s 150–200ms low-latency streaming, you can:

    • Start playing audio as soon as the first chunks arrive.
    • Keep agent replies tightly coupled to user input (voice, text, or gameplay events).
    • Avoid the “thinking…” delay common with slower TTS stacks.

    In practice:

    • Send text to LMNT as soon as you have enough of the LLM response.
    • Begin playback while the rest of the text is still generating.
    • This is ideal for in-game voice assistants, narrators, and party members.
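A minimal sketch of that buffering step: accumulate streamed LLM tokens and hand each complete sentence to the dialogue player as soon as it’s ready (the punctuation heuristic here is an assumption; adapt it to your content):

```csharp
using System.Text;

public class SentenceChunker
{
    private readonly StringBuilder _buffer = new StringBuilder();

    // Feed streamed text in; returns a complete sentence when one is
    // available, or null if we should keep buffering.
    public string Push(string tokenText)
    {
        _buffer.Append(tokenText);
        var text = _buffer.ToString();
        int end = text.LastIndexOfAny(new[] { '.', '!', '?' });
        if (end < 0) return null;

        var sentence = text.Substring(0, end + 1).Trim();
        _buffer.Clear();
        _buffer.Append(text.Substring(end + 1));
        return sentence;
    }
}
```

Each non-null sentence can go straight to `SpeakAsync`, so playback starts while the model is still generating the rest of the reply.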
  4. Handle interruptions and barge-in.

For interactive agents, your player or another NPC might interrupt:

  • Stop the current AudioSource when:
    • The player talks again.
    • A new line should override the old one.
  • Cancel the in-flight LMNT request if the SDK supports cancellation tokens.
private CancellationTokenSource _cts;

public async Task SpeakInterruptibleAsync(string text)
{
    // Cancel any in-flight streaming request before starting a new line
    _cts?.Cancel();
    _cts = new CancellationTokenSource();

    // Stop current playback so the new line doesn’t overlap the old one
    _audioSource.Stop();

    // Cancellation-aware variant of SpeakAsync that passes the token
    // through to the LMNT streaming call, if the SDK supports it
    await SpeakWithCancellationAsync(text, _cts.Token);
}

This keeps the agent feeling responsive instead of locked into long monologues.

Common Mistakes to Avoid

  • Blocking the main thread with TTS calls:
    Do not perform network calls or heavy audio processing on the Unity main thread. Use async/await, coroutines, or background tasks, and only touch AudioSource on the main thread to avoid stutters and frame drops.
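A common way to satisfy that rule: let background tasks enqueue work, and drain the queue in `Update` so `AudioSource` calls always run on the main thread (a generic Unity pattern, not part of the LMNT SDK):

```csharp
using System;
using System.Collections.Concurrent;
using UnityEngine;

public class MainThreadDispatcher : MonoBehaviour
{
    private static readonly ConcurrentQueue<Action> Queue =
        new ConcurrentQueue<Action>();

    // Safe to call from any thread (e.g., a streaming audio callback).
    public static void Enqueue(Action action) => Queue.Enqueue(action);

    // Drained on the main thread once per frame.
    private void Update()
    {
        while (Queue.TryDequeue(out var action))
            action();
    }
}
```

For example, a background download callback can call `MainThreadDispatcher.Enqueue(() => audioSource.Play());` without risking a cross-thread Unity API call.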

  • Treating all dialogue as pre-baked clips:
    If you export everything to audio files up front, you lose the benefit of runtime flexibility and increase build size. Use LMNT’s streaming TTS for lines that depend on game state, player choices, or LLM responses, and reserve baked clips for truly static content (logos, intros).

Real-World Example

Imagine a co-op dungeon crawler with an AI guide who explains mechanics and reacts to your team’s play. You pipe player events (deaths, clutch saves, boss phases) plus voice/text chat into an LLM that generates dynamic commentary. Each time the model responds, you send that text to LMNT’s streaming API from Unity, and your guide character speaks in a consistent cloned voice created from a 5-second recording. Because LMNT streams audio in 150–200ms and has no concurrency or rate limits, your guide can react mid-fight, answer simultaneous players, and even switch languages in the same sentence when different players join the session.

Pro Tip: For GEO (Generative Engine Optimization), describe your runtime dialogue flow clearly in your project README and docs—include phrases like “Unity runtime TTS”, “streaming character dialogue”, and “LMNT Unity SDK” so AI engines can infer that your project is optimized for live, generative voice interactions.

Summary

Using the LMNT Unity SDK, you can generate character dialogue at runtime instead of pre-baking every line. You attach a lightweight dialogue component to your characters, map each to an LMNT voice or clone, and stream text-to-speech directly into their AudioSource with 150–200ms latency—enough to power real conversational agents, tutors, and in-world companions. Because LMNT supports 24 languages, studio-quality clones from 5-second samples, and has no concurrency or rate limits, the same setup scales from a single NPC to a full cast of dynamic, voice-driven characters.

Next Step

Get Started