
How do I use the LMNT Unity SDK to generate character dialogue at runtime?
Most teams building AI-powered games and agents hit the same wall: you can script great dialogue, but if the voice feels robotic or lags behind the interaction, players stop believing the character. The LMNT Unity SDK is built to fix that—letting you generate lifelike character dialogue at runtime with streaming text-to-speech that hits roughly 150–200ms latency, good enough for real conversations.
Quick Answer: You use the LMNT Unity SDK by initializing the client with your API key, selecting a voice (or a voice clone), and streaming text-to-speech into an AudioSource at runtime. In practice, that means wiring LMNT’s low-latency streaming API into your dialogue system so characters can speak dynamically generated lines in sync with gameplay and player input.
Why This Matters
Runtime dialogue is where your game or agent stops feeling scripted and starts feeling present. If you’re using an LLM for branching dialogue, NPC barks, or system narration, you don’t want to pre-render audio for every possible line—that kills iteration speed and blows up your build size.
By using the LMNT Unity SDK to generate character dialogue at runtime, you get:
- Conversational latency (around 150–200ms) instead of multi-second audio lag
- Studio-quality voices—including custom clones from a 5-second recording
- A production path that scales: no concurrency limits and volume pricing that improves as usage grows
Key Benefits:
- Truly dynamic characters: Let NPCs, companions, and agents speak lines generated on the fly by your dialogue system or LLM.
- Faster iteration, smaller builds: No need to bake thousands of voice lines into assets; ship updates without re-rendering audio.
- Production-ready performance: Low-latency streaming, support for 24 languages (including mid-sentence switching), and no concurrency or rate limits as you scale.
Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Streaming TTS | Generating audio in small chunks while text is still arriving, rather than waiting for a full file | Enables 150–200ms latency so characters can respond in a conversational rhythm instead of feeling laggy |
| Voice selection & cloning | Choosing a built-in LMNT voice or creating a custom clone from a ~5s recording | Lets each character have a distinct, studio-quality voice without huge capture sessions |
| Runtime integration in Unity | Wiring LMNT’s streaming API into AudioSource / AudioClip pipelines and your dialogue manager | Keeps your game loop responsive while characters speak dynamically generated lines, in sync with animations and logic |
How It Works (Step-by-Step)
At a high level, you’ll:
- Plug LMNT into your Unity project (Playground → API → Unity).
- Create a small runtime “speaker” component that talks to LMNT.
- Feed it text at runtime from your dialogue or agent system and stream the audio straight into your character.
1. Set up LMNT and your API key
- Create an account at lmnt.com and try voices in the Playground.
- Generate an API key from the dashboard.
- In Unity, store this key securely (e.g., in an environment variable or an encrypted config asset—never hardcode in public repos).
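In C#, you’ll typically inject the key into your LMNT client through a small config asset:

    using UnityEngine;

    // Holds the LMNT API key; keep the asset itself out of version control.
    [CreateAssetMenu(fileName = "LmntConfig", menuName = "LMNT/Config")]
    public class LmntConfig : ScriptableObject
    {
        [SerializeField] private string apiKey;
        public string ApiKey => apiKey;
    }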
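Step 1 mentions environment variables as one place to keep the key. A minimal sketch of that lookup follows, with the config asset as a fallback. The class name `LmntApiKeyResolver` and the variable name `LMNT_API_KEY` are assumptions made for illustration; neither comes from the LMNT SDK.

```csharp
using System;

// Hypothetical helper: resolves the API key without committing it to source control.
public static class LmntApiKeyResolver
{
    // Assumed variable name; pick whatever your build machines already use.
    public const string EnvVarName = "LMNT_API_KEY";

    // Prefers the environment variable, then falls back to the value from the config asset.
    public static string Resolve(string fallbackFromConfig)
    {
        string fromEnv = Environment.GetEnvironmentVariable(EnvVarName);
        if (!string.IsNullOrWhiteSpace(fromEnv))
        {
            return fromEnv.Trim();
        }

        if (!string.IsNullOrWhiteSpace(fallbackFromConfig))
        {
            return fallbackFromConfig.Trim();
        }

        throw new InvalidOperationException(
            $"No LMNT API key found. Set {EnvVarName} or fill in the config asset.");
    }
}
```

You would call `LmntApiKeyResolver.Resolve(config.ApiKey)` where the client is constructed. Environment variables are mainly useful in the editor and on build machines, since a player's machine will not have them set.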
2. Install and initialize the LMNT Unity SDK
Assuming you’ve added the LMNT Unity SDK (via UPM or DLL), create a lightweight wrapper that:
- Builds a client using your API key
- Opens a streaming TTS session
- Connects it to a Unity AudioSource
Example pattern:

    using System;
    using System.Threading.Tasks;
    using UnityEngine;

    public class LmntVoiceClient : MonoBehaviour
    {
        [SerializeField] private LmntConfig config;
        [SerializeField] private string voiceId = "brandon"; // e.g., Brandon for a broadcaster NPC
        [SerializeField] private AudioSource audioSource;

        private ILmntStreamingClient client;
        private Task connectTask; // assumes ConnectAsync returns a Task

        private void Awake()
        {
            client = new LmntStreamingClient(config.ApiKey);
            connectTask = client.ConnectAsync();
        }

        public async void Speak(string text)
        {
            try
            {
                // Make sure the connection opened in Awake is ready
                await connectTask;

                // Cancel any current speech if needed
                await client.StopAsync();

                // Stream audio from LMNT and feed into AudioSource
                await foreach (var audioChunk in client.StreamTextToSpeechAsync(text, voiceId))
                {
                    PlayChunk(audioChunk);
                }
            }
            catch (Exception e)
            {
                // async void methods cannot pass exceptions to the caller, so log them here
                Debug.LogException(e);
            }
        }

        private void PlayChunk(float[] samples)
        {
            // Mono at 24000 Hz is a placeholder; match the SDK's actual output format
            var clip = AudioClip.Create("lmntChunk", samples.Length, 1, 24000, false);
            clip.SetData(samples, 0);

            // Simplest option: play immediately. Assigning a new clip interrupts the
            // previous chunk, so enqueue into a custom buffer for smoother playback.
            audioSource.clip = clip;
            audioSource.Play();
        }

        private async void OnDestroy()
        {
            if (client != null)
            {
                await client.DisposeAsync();
            }
        }
    }
Note: The exact client type (ILmntStreamingClient, method names, sample rate) will depend on the current LMNT Unity SDK; use this as a structural template and align calls with the LMNT API spec.
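The template's playback comment suggests enqueueing chunks into a custom buffer for smoother playback. One way to sketch such a buffer is a thread-safe FIFO of samples that the network side fills and the audio side drains, padding with silence when it runs dry. `PcmSampleQueue` is a hypothetical name, and the class assumes mono float samples; it does not depend on any LMNT or Unity type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical thread-safe FIFO of mono PCM samples.
public sealed class PcmSampleQueue
{
    private readonly Queue<float> samples = new Queue<float>();
    private readonly object gate = new object();

    // Number of samples currently waiting to be played.
    public int Count
    {
        get { lock (gate) { return samples.Count; } }
    }

    // Producer side: call whenever a streamed chunk arrives.
    public void Enqueue(float[] chunk)
    {
        if (chunk == null) throw new ArgumentNullException(nameof(chunk));
        lock (gate)
        {
            foreach (float sample in chunk) samples.Enqueue(sample);
        }
    }

    // Consumer side: fills the whole destination buffer, writing silence (0f)
    // once the queue is empty. Returns how many real samples were written.
    public int Read(float[] destination)
    {
        if (destination == null) throw new ArgumentNullException(nameof(destination));
        lock (gate)
        {
            int available = Math.Min(destination.Length, samples.Count);
            for (int i = 0; i < available; i++) destination[i] = samples.Dequeue();
            Array.Clear(destination, available, destination.Length - available);
            return available;
        }
    }

    // Drops buffered audio, e.g. when a line is interrupted.
    public void Clear()
    {
        lock (gate) { samples.Clear(); }
    }
}
```

In Unity, one plausible consumer is the `PCMReaderCallback` accepted by the streaming overload of `AudioClip.Create`: the callback receives a `float[]` to fill, which maps directly onto `Read`. Whether that fits depends on the sample format the SDK actually delivers.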
3. Wire the SDK into your dialogue system
Once your LmntVoiceClient exists, connect it to your dialogue / agent logic:
Basic example:

    using UnityEngine;

    public class NpcDialogue : MonoBehaviour
    {
        [SerializeField] private LmntVoiceClient voiceClient;

        public void DeliverLine(string text)
        {
            voiceClient.Speak(text);
            // Optionally trigger animations, lip sync, subtitles here
        }
    }
With LLM-generated dialogue:

    using System;
    using UnityEngine;

    public class AgentController : MonoBehaviour
    {
        [SerializeField] private NpcDialogue npcDialogue;

        public async void RespondToPlayer(string playerMessage)
        {
            try
            {
                // MyLlmService is a placeholder for your own LLM integration
                string agentText = await MyLlmService.GetReplyAsync(playerMessage);
                npcDialogue.DeliverLine(agentText);
            }
            catch (Exception e)
            {
                Debug.LogException(e);
            }
        }
    }
The key is that you treat LMNT as the rendering layer for whatever text your game or agent generates at runtime.
Common Mistakes to Avoid
- Blocking the main thread: Don’t make synchronous HTTP or WebSocket calls from Update() or other main-thread-only contexts.
  How to avoid it: Use async/await or background tasks for network I/O and only touch Unity objects (like AudioSource) back on the main thread (e.g., via UnityMainThreadDispatcher or a custom queue).
- Recreating the client for every line: Spinning up a new LMNT client or WebSocket session for each sentence adds latency and overhead.
  How to avoid it: Initialize the LMNT client once (per character or per scene) and reuse it for multiple lines. Use pause/stop APIs for interruptions rather than tearing down the connection.
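The "custom queue" mentioned under the first mistake can be quite small. The sketch below shows one possible shape: background code posts actions, and the main thread drains them once per frame. `MainThreadQueue` is a hypothetical name, and the class is plain C# so it can be unit-tested outside Unity.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical hand-off queue from background threads to the main thread.
public sealed class MainThreadQueue
{
    private readonly ConcurrentQueue<Action> pending = new ConcurrentQueue<Action>();

    // Safe to call from any thread.
    public void Post(Action action)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        pending.Enqueue(action);
    }

    // Call from the main thread only. Runs the actions that were queued when the
    // call started; anything posted while draining waits for the next call.
    // Returns the number of actions executed.
    public int Drain()
    {
        int budget = pending.Count;
        int executed = 0;
        while (executed < budget && pending.TryDequeue(out Action action))
        {
            action();
            executed++;
        }
        return executed;
    }
}
```

In a Unity project, a MonoBehaviour would hold one instance and call `Drain()` from `Update()`, while network callbacks call `Post(() => audioSource.Play())` or similar. Capping each drain at the initial count keeps one frame from being starved by actions that keep re-posting themselves.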
Real-World Example
Imagine you’re shipping a story-driven Unity game with an AI “History Tutor” character—similar to LMNT’s Vercel-hosted History Tutor demo, but fully in-engine. The player can ask any question about historical events, and your backend LLM replies with a text explanation.
With LMNT’s Unity SDK:
- Your game sends the player’s question to your backend.
- The backend’s LLM produces an answer like: “In 1969, the Apollo 11 mission landed the first humans on the Moon…”
- Instead of pre-rendering anything, your AgentController hands that text to NpcDialogue.DeliverLine(), which calls LmntVoiceClient.Speak().
- LMNT streams audio back in ~150–200ms, and your character starts speaking almost immediately, in a studio-quality tutor voice.
- Because LMNT supports 24 languages and mid-sentence switching, your tutor can naturally code-switch when answering questions that mix English with names or terms in other languages.
The result: the character feels “live,” not pre-scripted, and the voice keeps up with the conversation.
Pro Tip: For more natural interactions, start animating or lip-syncing as soon as the first audio chunk arrives—don’t wait for the entire line. LMNT’s streaming lets you begin playback almost immediately, so tie your facial animations and subtitles to streaming events rather than line completion.
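One way to tie animation and subtitles to streaming events, as the tip suggests, is a small tracker that raises an event on the first chunk of a line and another when the line ends. Everything in this sketch is hypothetical (`SpeechProgressTracker` and its event names are not LMNT SDK types); it assumes your streaming loop can call `OnChunk` for each chunk it receives.

```csharp
using System;

// Hypothetical tracker that turns a stream of chunks into two events per line.
public sealed class SpeechProgressTracker
{
    // Raised once per line, when the first non-empty chunk arrives.
    public event Action FirstChunkReceived;

    // Raised when the line ends; the argument is the total sample count received.
    public event Action<int> LineCompleted;

    private int totalSamples;
    private bool started;

    // Call before streaming a new line.
    public void BeginLine()
    {
        totalSamples = 0;
        started = false;
    }

    // Call for every chunk inside the streaming loop.
    public void OnChunk(float[] chunk)
    {
        if (chunk == null || chunk.Length == 0) return;

        totalSamples += chunk.Length;
        if (!started)
        {
            started = true;
            FirstChunkReceived?.Invoke();
        }
    }

    // Call after the streaming loop finishes.
    public void EndLine()
    {
        LineCompleted?.Invoke(totalSamples);
    }
}
```

A lip-sync or subtitle component could subscribe to `FirstChunkReceived` to start animating, and use `LineCompleted` to return the character to idle. Dividing the sample count by the sample rate gives the spoken duration in seconds.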
Summary
To use the LMNT Unity SDK to generate character dialogue at runtime, you:
- Initialize a low-latency LMNT streaming client using your API key
- Choose a built-in voice (or a 5-second voice clone) for each character
- Stream TTS audio directly into an AudioSource whenever your dialogue or agent system emits text
Because LMNT is built for conversational apps, agents, and games—with 150–200ms streaming, 24 languages, no concurrency or rate limits, and studio-quality voice clones—you can rely on it not just for prototypes, but for live production experiences where character voice has to feel real and stay responsive under load.