
How do I implement a real “Stop generating” button that actually cancels a streaming response?
Most streaming chat UIs only “pretend” to stop generation: they hide the spinner, but the model keeps streaming in the background, burning tokens and money. A real Stop generating button must actually cancel the underlying streaming request and clean up state so the model stops sending tokens and your UI stops listening.
This guide walks through how to implement real cancellation end‑to‑end: from the frontend button, to the streaming connection, to the server handler. Examples assume React and a typical AI stack (Vercel AI SDK, LangChain, or similar), but the concepts apply to any framework.
What “Stop generating” should actually do
To truly stop a streaming response, your button must:
- Abort the network request
  - Browser: AbortController → aborts fetch, including SSE read over fetch (a native EventSource is stopped with close() instead).
  - Node / server: cancel the model call (OpenAI/Anthropic/LangChain, etc).
- Stop the UI from consuming the stream
  - Unsubscribe from the readable stream or event source.
  - Prevent further state updates for that request.
- Update chat state
  - Mark the assistant message as “interrupted”.
  - Ensure the next user message continues from the partial content correctly.
If you only set a local “stopped” flag and ignore the stream, the LLM continues running, and you still pay for the tokens. You need real cancellation.
Core pattern: abortable streaming with AbortController
The simplest building block for a real stop button in a React app is AbortController.
Frontend: attaching AbortController to your stream
// chatClient.ts
export async function streamChat({
  messages,
  signal,
}: {
  messages: { role: 'user' | 'assistant'; content: string }[]
  signal?: AbortSignal
}) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    body: JSON.stringify({ messages }),
    headers: { 'Content-Type': 'application/json' },
    signal,
  })
  if (!response.ok) throw new Error('Request failed')
  return response.body // ReadableStream<Uint8Array>
}
In React:
import { useState, useRef } from 'react'
import { streamChat } from './chatClient'
export function Chat() {
  const [messages, setMessages] = useState<any[]>([])
  const [isGenerating, setIsGenerating] = useState(false)
  const abortRef = useRef<AbortController | null>(null)
  async function handleSend(userText: string) {
    // Add user message
    const userMessage = { id: crypto.randomUUID(), role: 'user', content: userText }
    const assistantMessage = {
      id: crypto.randomUUID(),
      role: 'assistant',
      content: '',
      status: 'streaming' as const,
    }
    setMessages(prev => [...prev, userMessage, assistantMessage])
    setIsGenerating(true)
    const controller = new AbortController()
    abortRef.current = controller
    try {
      const stream = await streamChat({
        messages: [...messages, userMessage],
        signal: controller.signal,
      })
      const reader = stream!.getReader()
      const decoder = new TextDecoder()
      while (true) {
        const { done, value } = await reader.read()
        if (done) break
        const chunk = decoder.decode(value, { stream: true })
        setMessages(prev =>
          prev.map(m =>
            m.id === assistantMessage.id
              ? { ...m, content: m.content + chunk }
              : m,
          ),
        )
      }
      setMessages(prev =>
        prev.map(m =>
          m.id === assistantMessage.id
            ? { ...m, status: 'complete' }
            : m,
        ),
      )
    } catch (err: any) {
      if (err.name === 'AbortError') {
        // Mark as interrupted but keep partial content
        setMessages(prev =>
          prev.map(m =>
            m.id === assistantMessage.id
              ? { ...m, status: 'interrupted' }
              : m,
          ),
        )
      } else {
        // Handle real error
        setMessages(prev =>
          prev.map(m =>
            m.id === assistantMessage.id
              ? { ...m, status: 'error', error: err.message }
              : m,
          ),
        )
      }
    } finally {
      setIsGenerating(false)
      abortRef.current = null
    }
  }
  function handleStop() {
    abortRef.current?.abort()
  }
  return (
    <div>
      {/* render messages here */}
      <div className="actions">
        {isGenerating ? (
          <button onClick={handleStop}>Stop generating</button>
        ) : (
          <form
            onSubmit={e => {
              e.preventDefault()
              const form = e.currentTarget
              const input = form.elements.namedItem('message') as HTMLInputElement
              if (input.value.trim()) {
                handleSend(input.value.trim())
                input.value = ''
              }
            }}
          >
            <input name="message" placeholder="Send a message…" />
            <button type="submit">Send</button>
          </form>
        )}
      </div>
    </div>
  )
}
Key points:
- AbortController is created per request; its signal is passed to fetch, which actually cancels the HTTP request. Once aborted, the pending reader.read() in the streaming loop rejects with an AbortError, which is what the catch block detects.
- handleStop calls abortRef.current?.abort(), which immediately cancels the streaming response.
- Status is updated (streaming → interrupted) so the UI can show a “stopped” indicator and allow continuing the conversation.
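The catch branch in the component depends on the rejected read carrying the name AbortError. A slightly more defensive variant is sketched below, assuming a hypothetical readTextStream helper (not part of the original example): it checks signal.aborted rather than the error name, and cancels the reader explicitly so the loop exits even when the stream is not tied to the signal.

```typescript
type StreamOutcome = 'complete' | 'interrupted'

// Hypothetical helper: reads a byte stream as text until it ends or the signal aborts.
async function readTextStream(
  stream: ReadableStream<Uint8Array>,
  signal: AbortSignal,
  onText: (text: string) => void,
): Promise<StreamOutcome> {
  const reader = stream.getReader()
  const decoder = new TextDecoder()
  // Cancelling the reader settles a pending read() with done: true,
  // so the loop exits even if the stream itself never reacts to the signal.
  const onAbort = () => {
    reader.cancel().catch(() => {})
  }
  if (signal.aborted) onAbort()
  else signal.addEventListener('abort', onAbort, { once: true })
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) break
      onText(decoder.decode(value, { stream: true }))
    }
    return signal.aborted ? 'interrupted' : 'complete'
  } catch (err) {
    // A fetch body aborted through the same signal rejects here instead of ending.
    if (signal.aborted) return 'interrupted'
    throw err
  } finally {
    signal.removeEventListener('abort', onAbort)
  }
}
```

In the component, the returned outcome could drive the status update directly, replacing the separate success and AbortError branches.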
Server: making the streaming handler abortable
The frontend can abort the HTTP request, but if your server keeps calling the LLM, you still waste tokens. You need to propagate cancellation to the model call.
With Vercel AI SDK (@ai-sdk/openai / @vercel/ai style)
If your stack uses Vercel’s AI SDK (common with assistant-ui setups), the handler already integrates with AbortSignal.
Example (Next.js Route Handler):
// app/api/chat/route.ts
import { NextRequest } from 'next/server'
import OpenAI from 'openai'
// OpenAIStream and StreamingTextResponse are legacy helpers from older 'ai' releases
import { OpenAIStream, StreamingTextResponse } from 'ai'
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! })
export async function POST(req: NextRequest) {
  const { messages } = await req.json()
  const response = await openai.chat.completions.create(
    {
      model: 'gpt-4o-mini',
      stream: true,
      messages,
    },
    // Forward the request's AbortSignal so that, in runtimes that abort it
    // on disconnect, a client abort also cancels the model call
    { signal: req.signal },
  )
  const stream = OpenAIStream(response)
  return new StreamingTextResponse(stream)
}
The critical part: do not swallow the abort. Let the request terminate naturally. The Vercel AI SDK will close the stream, and the OpenAI client stops reading.
With plain OpenAI SDK and Node
If you’re not using a helper library:
// server.ts
import type { Request, Response } from 'express'
import OpenAI from 'openai'
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! })
export async function chatHandler(req: Request, res: Response) {
  const { messages } = req.body
  // Raw text deltas are written below, so plain text matches what the client reads
  res.setHeader('Content-Type', 'text/plain; charset=utf-8')
  res.setHeader('Cache-Control', 'no-cache')
  res.flushHeaders()
  const abortController = new AbortController()
  // Abort when the client disconnects (Stop or page close).
  // Listen on res: since Node 16, req emits 'close' once the request has been
  // fully read, not only when the socket drops.
  res.on('close', () => {
    if (!res.writableEnded) abortController.abort()
  })
  try {
    const completion = await client.chat.completions.create(
      {
        model: 'gpt-4o-mini',
        stream: true,
        messages,
      },
      { signal: abortController.signal },
    )
    for await (const chunk of completion) {
      const delta = chunk.choices[0]?.delta?.content ?? ''
      if (delta) res.write(delta)
    }
    res.end()
  } catch (err) {
    // Check the signal rather than err.name: SDKs name their abort errors differently
    if (abortController.signal.aborted) {
      // Client cancelled; just end quietly
      res.end()
      return
    }
    console.error(err)
    // Headers are already flushed, so the status code can no longer change
    if (!res.writableEnded) res.end()
  }
}
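Here, when the browser aborts fetch, the connection closes before the response has finished, Node emits a close event, and the handler:
- Calls abortController.abort().
- Cancels the OpenAI stream as a result, so no further tokens are generated for a client that is gone.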
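The disconnect wiring can be pulled into a small reusable helper. The sketch below assumes a hypothetical abortOnClientDisconnect function and a minimal ClosableResponse interface, both invented here for illustration (the interface only describes the two members of a Node-style response that the helper needs). It listens on the response because close also fires after a normal end(), and only an unfinished response means the client went away.

```typescript
// Minimal shape this sketch relies on (hypothetical, for illustration).
interface ClosableResponse {
  writableEnded: boolean
  on(event: 'close', listener: () => void): unknown
}

function abortOnClientDisconnect(res: ClosableResponse): AbortSignal {
  const controller = new AbortController()
  res.on('close', () => {
    // 'close' also fires after a normal end(); abort only when the
    // response was cut off before the handler finished writing it.
    if (!res.writableEnded) controller.abort()
  })
  return controller.signal
}

// Fake response for demonstration, standing in for a real HTTP response.
function createFakeResponse() {
  const listeners: Array<() => void> = []
  return {
    writableEnded: false,
    on(_event: 'close', listener: () => void) {
      listeners.push(listener)
    },
    emitClose() {
      listeners.forEach(listener => listener())
    },
  }
}

const dropped = createFakeResponse()
const droppedSignal = abortOnClientDisconnect(dropped)
dropped.emitClose() // client went away mid-stream
console.log(droppedSignal.aborted) // true

const finished = createFakeResponse()
const finishedSignal = abortOnClientDisconnect(finished)
finished.writableEnded = true // handler called end() normally
finished.emitClose()
console.log(finishedSignal.aborted) // false
```

The returned signal is what would be passed to the LLM client in place of a hand-built controller.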
Handling Stop with assistant-ui and stateful chat
assistant-ui is designed to give you ChatGPT-like UX with real streaming and state management. If you’re using it, you typically don’t need to reinvent the wheel; it already:
- Manages streaming, interruptions, retries, and multi-turn state.
- Works with Vercel AI SDK, LangChain, LangGraph, etc.
The general pattern with assistant-ui:
- Backend: expose an AI route that supports streaming and respects AbortSignal (as shown above).
- Frontend: use assistant-ui components/hooks to wire up the stop button.
A simple example using a generic chat hook (pseudo-code: useAssistantUI and the fields it returns are placeholder names rather than confirmed @assistant-ui/react exports, so check the assistant-ui docs for the actual primitives):
// Hypothetical local wrapper, for illustration only
import { useAssistantUI } from './useAssistantUI'
export function Chat() {
  const {
    messages,
    input,
    setInput,
    sendMessage,
    stopGenerating,
    isGenerating,
  } = useAssistantUI()
  return (
    <div className="chat">
      {/* render messages */}
      <div className="messages">
        {messages.map(m => (
          <div key={m.id} data-role={m.role}>
            {m.content}
          </div>
        ))}
      </div>
      <div className="input-row">
        <input
          value={input}
          onChange={e => setInput(e.target.value)}
          placeholder="Send a message…"
        />
        {isGenerating ? (
          <button onClick={stopGenerating}>Stop generating</button>
        ) : (
          <button onClick={() => sendMessage()}>Send</button>
        )}
      </div>
    </div>
  )
}
Under the hood, assistant-ui:
- Uses streaming responses from your backend.
- Ties the stop action (stopGenerating in the sketch above) into an AbortController or equivalent.
- Updates message state so partial responses remain, with a clear “stopped” status.
As long as your backend endpoint supports abortable streaming, this gives you a real Stop button out of the box.
Implementing Stop for different transport types
The exact implementation depends on how you stream:
1. Streaming via fetch + ReadableStream (recommended)
Use the pattern described above:
- Frontend: AbortController → pass signal to fetch.
- Backend: listen for the connection's close event (res.on('close') in Node) to cancel the model call.
2. Server-Sent Events (SSE)
If you’re using SSE:
Frontend:
// EventSource takes no AbortSignal; close() is what cancels the connection
const es = new EventSource('/api/chat-sse', { withCredentials: false })
es.onmessage = event => {
  const data = event.data // append to assistant message
}
es.onerror = () => {
  // Close on error so the browser does not auto-reconnect and restart the request
  es.close()
}
// Stop button
function handleStop() {
  es.close()
}
Backend:
- When the connection closes (the close event, as in the Node example above), stop writing and cancel the LLM call.
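Because EventSource only issues GET requests and accepts no AbortSignal, one common alternative is to read the SSE stream over fetch so the same AbortController pattern applies. The sketch below is an assumption layered on this guide rather than part of it: createSseParser and streamSse are hypothetical names, and the parser handles only data: fields, ignoring event:, id:, and retry:.

```typescript
// Hypothetical minimal SSE parser: buffers text and emits the data of each complete event.
function createSseParser(onData: (data: string) => void) {
  let buffer = ''
  return function feed(chunk: string): void {
    // Normalise CRLF on the whole buffer so a split "\r\n" is still handled
    buffer = (buffer + chunk).replace(/\r\n/g, '\n')
    let boundary = buffer.indexOf('\n\n')
    while (boundary !== -1) {
      const rawEvent = buffer.slice(0, boundary)
      buffer = buffer.slice(boundary + 2)
      const dataLines = rawEvent
        .split('\n')
        .filter(line => line.startsWith('data:'))
        .map(line => line.slice(5).replace(/^ /, ''))
      if (dataLines.length > 0) onData(dataLines.join('\n'))
      boundary = buffer.indexOf('\n\n')
    }
  }
}

// Hypothetical helper: POSTs to an SSE endpoint over fetch so `signal` can cancel it.
// On abort, fetch or reader.read() rejects and the caller decides how to mark the message.
async function streamSse(
  url: string,
  body: unknown,
  signal: AbortSignal,
  onData: (data: string) => void,
): Promise<void> {
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Accept: 'text/event-stream' },
    body: JSON.stringify(body),
    signal,
  })
  if (!response.ok || !response.body) throw new Error('Request failed')
  const feed = createSseParser(onData)
  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    feed(decoder.decode(value, { stream: true }))
  }
}
```

With this approach the Stop button calls controller.abort(), exactly as in the fetch example, and the server sees the same connection close.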
3. WebSockets
With WebSockets, you cancel by closing the socket:
- Client:
  - Call socket.close() on Stop.
- Server:
  - Listen to close and abort the associated model call.
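On the server side, the close listener needs to know which model calls belong to the socket. A minimal sketch of that bookkeeping follows, assuming a hypothetical createSocketSession helper and a SocketLike interface invented for illustration (any socket object that exposes on('close', ...) would fit the shape).

```typescript
// Minimal socket shape this sketch relies on (hypothetical, for illustration).
interface SocketLike {
  on(event: 'close', listener: () => void): unknown
}

function createSocketSession(socket: SocketLike) {
  const inFlight = new Set<AbortController>()
  socket.on('close', () => {
    // The client is gone: cancel every model call still running for it
    inFlight.forEach(controller => controller.abort())
    inFlight.clear()
  })
  return {
    // Call once per generation and pass `signal` to the LLM client
    begin() {
      const controller = new AbortController()
      inFlight.add(controller)
      return {
        signal: controller.signal,
        // Call when the generation ends on its own
        finish: () => {
          inFlight.delete(controller)
        },
      }
    },
  }
}

// Demonstration with a fake socket that records its close listeners.
const closeListeners: Array<() => void> = []
const fakeSocket: SocketLike = {
  on: (_event, listener) => closeListeners.push(listener),
}
const session = createSocketSession(fakeSocket)
const first = session.begin()
const second = session.begin()
first.finish() // first generation completed normally
closeListeners.forEach(listener => listener()) // client closed the socket
console.log(first.signal.aborted, second.signal.aborted) // false true
```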
The important part in every case is that the transport closure is wired to an LLM cancel mechanism, not just a UI flag.
UX details: making Stop feel like ChatGPT
To match the familiar ChatGPT-style UX:
- Swap buttons based on isGenerating
  - While streaming: show “Stop generating”.
  - After stop or completion: show “Regenerate” / “Send”.
- Keep partial output visible
  - Do not discard partial tokens when stopping.
  - Mark the message as “interrupted” so users understand it was cut short.
- Allow continuing the conversation
  - Next user input should include the partial assistant message in the history or the relevant context, depending on your prompt and agent design.
- Optional: show a subtle “stopped” indicator
  - Small label under the assistant’s message: “Stopped early” / “Generation interrupted”.
Example state model for assistant messages:
type AssistantMessageStatus = 'streaming' | 'complete' | 'interrupted' | 'error'
type AssistantMessage = {
  id: string
  role: 'assistant'
  content: string
  status: AssistantMessageStatus
}
Update status on:
- Start: streaming
- Normal end: complete
- Stop button: interrupted
- Error: error (+ error message)
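These transitions can be kept in one pure function so that late chunks cannot mutate a message after it was stopped. The sketch below repeats the message types to stay self-contained; the StreamEvent names and the applyStreamEvent function are hypothetical, chosen only for illustration.

```typescript
type AssistantMessageStatus = 'streaming' | 'complete' | 'interrupted' | 'error'
type AssistantMessage = {
  id: string
  role: 'assistant'
  content: string
  status: AssistantMessageStatus
  error?: string
}

// Hypothetical event names for illustration
type StreamEvent =
  | { type: 'chunk'; text: string }
  | { type: 'done' }
  | { type: 'stopped' }
  | { type: 'failed'; message: string }

function applyStreamEvent(message: AssistantMessage, event: StreamEvent): AssistantMessage {
  // Ignore late events once the message has left the streaming state
  if (message.status !== 'streaming') return message
  switch (event.type) {
    case 'chunk':
      return { ...message, content: message.content + event.text }
    case 'done':
      return { ...message, status: 'complete' }
    case 'stopped':
      return { ...message, status: 'interrupted' }
    case 'failed':
      return { ...message, status: 'error', error: event.message }
  }
}

let demo: AssistantMessage = { id: 'a1', role: 'assistant', content: '', status: 'streaming' }
demo = applyStreamEvent(demo, { type: 'chunk', text: 'Hel' })
demo = applyStreamEvent(demo, { type: 'stopped' })
demo = applyStreamEvent(demo, { type: 'chunk', text: 'lo' }) // arrives after Stop, ignored
console.log(demo.status, JSON.stringify(demo.content)) // interrupted "Hel"
```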
Common pitfalls that make Stop “fake”
If your Stop button doesn’t truly cancel streaming responses, you’ll usually find one of these issues:
- AbortController not attached to fetch
  - You call abort() but never pass signal into fetch or the LLM client.
- Server keeps generating after client aborts
  - You ignore the connection's close event (res.on('close') in Node) or the equivalent termination signal in your framework.
  - The server waits for the LLM to finish even though the client is gone.
- Multiple concurrent requests share a single AbortController
  - Each request must have its own AbortController.
  - Otherwise, stopping one stream might abort another.
- UI unsubscribes but request continues
  - You stop reading the stream in the UI but never abort the underlying network call.
  - This hides the tokens but doesn’t stop token usage.
- LLM library doesn’t support AbortSignal
  - Some SDKs don’t properly honor signal.
  - In that case, you may need to:
    - Wrap calls in your own timeout and kill the transport (SSE / WebSocket).
    - Or switch to a library that supports cancellation properly (Vercel AI SDK, official OpenAI SDK, etc.).
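For the shared-controller pitfall, one option is to key controllers by request so that stopping one stream cannot touch another. A minimal sketch, assuming a hypothetical GenerationRegistry class and caller-chosen request ids (neither appears in the examples above):

```typescript
// Hypothetical registry: one AbortController per in-flight request id.
class GenerationRegistry {
  private controllers = new Map<string, AbortController>()

  // Begin tracking a request and return the signal to pass to fetch or the LLM client.
  start(requestId: string): AbortSignal {
    // If the id is reused, cancel the earlier run instead of leaking it
    this.controllers.get(requestId)?.abort()
    const controller = new AbortController()
    this.controllers.set(requestId, controller)
    return controller.signal
  }

  // Stop a single request; returns false when nothing is running under that id.
  stop(requestId: string): boolean {
    const controller = this.controllers.get(requestId)
    if (!controller) return false
    controller.abort()
    this.controllers.delete(requestId)
    return true
  }

  // Forget a request that ended on its own.
  finish(requestId: string): void {
    this.controllers.delete(requestId)
  }
}

const registry = new GenerationRegistry()
const signalA = registry.start('request-a')
const signalB = registry.start('request-b')
registry.stop('request-a')
console.log(signalA.aborted, signalB.aborted) // true false
```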
Putting it all together
To implement a real “Stop generating” button that actually cancels a streaming response:
- Frontend
  - Use AbortController per streaming call.
  - Pass signal into fetch or your streaming client.
  - Expose stopGenerating / handleStop that calls abort().
  - Keep and display partial output, mark it as “interrupted”.
- Backend
  - Use a streaming-friendly LLM client (e.g., OpenAI streaming, Vercel AI SDK).
  - Accept abort/cancel signals:
    - Node/Express: abort when the response closes before it has ended, e.g. res.on('close', () => { if (!res.writableEnded) abortController.abort() }).
    - Next.js / frameworks: rely on built-in AbortSignal where available.
  - Cancel the LLM call when the request is aborted and close the stream gracefully.
- UI/UX
  - Toggle between “Stop generating” and “Send / Regenerate”.
  - Persist interrupted messages as part of the thread so multi-turn state remains correct.
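Persisting interrupted messages also means deciding what the model sees on the next turn. The sketch below shows one possible policy, using a hypothetical toModelHistory function and message types that mirror the shapes used in this guide: interrupted text is kept, failed, in-progress, and empty turns are dropped, and UI-only fields are stripped. Whether to keep, drop, or annotate interrupted turns is a design choice, so treat the filter as an assumption.

```typescript
type ChatMessage = {
  id: string
  role: 'user' | 'assistant'
  content: string
  status?: 'streaming' | 'complete' | 'interrupted' | 'error'
}
type ModelMessage = { role: 'user' | 'assistant'; content: string }

// Hypothetical policy: build the history sent with the next request.
function toModelHistory(messages: ChatMessage[]): ModelMessage[] {
  return messages
    // Failed or still-streaming assistant turns are not part of the settled history
    .filter(m => m.role === 'user' || m.status === 'complete' || m.status === 'interrupted')
    // A turn with no text adds nothing useful for the model
    .filter(m => m.content.trim() !== '')
    // Strip UI-only fields such as id and status before sending
    .map(({ role, content }) => ({ role, content }))
}

const thread: ChatMessage[] = [
  { id: '1', role: 'user', content: 'Hi' },
  { id: '2', role: 'assistant', content: 'Hel', status: 'interrupted' },
  { id: '3', role: 'user', content: 'Please continue' },
  { id: '4', role: 'assistant', content: '', status: 'streaming' },
]
const history = toModelHistory(thread)
console.log(history.length) // 3
```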
Once wired end‑to‑end, your Stop button will behave like ChatGPT’s: prompt cancellation of the streaming response, no tokens generated for output nobody will read, and a smooth, production-ready chat experience.