How do I implement a real “Stop generating” button that actually cancels a streaming response?

Most streaming chat UIs only “pretend” to stop generation: they hide the spinner, but the model keeps streaming in the background, burning tokens and money. A real Stop generating button must actually cancel the underlying streaming request and clean up state so the model stops sending tokens and your UI stops listening.

This guide walks through how to implement a real cancel behavior end‑to‑end: from the frontend button, to the streaming connection, to the server handler. Examples assume React and a typical AI stack (Vercel AI SDK, LangChain, or similar), but the concepts apply to any framework.


What “Stop generating” should actually do

To truly stop a streaming response, your button must:

  1. Abort the network request

    • Browser: AbortController aborts a fetch request; EventSource.close() ends an SSE connection.
    • Node / server: cancel the model call (OpenAI/Anthropic/LangChain, etc).
  2. Stop the UI from consuming the stream

    • Unsubscribe from the readable stream or event source.
    • Prevent further state updates for that request.
  3. Update chat state

    • Mark the assistant message as “interrupted”.
    • Ensure the next user message continues from the partial content correctly.

If you only set a local “stopped” flag and ignore the stream, the LLM continues running, and you still pay for the tokens. You need real cancellation.


Core pattern: abortable streaming with AbortController

The simplest building block for a real stop button in a React app is AbortController.

Frontend: attaching AbortController to your stream

// chatClient.ts
export async function streamChat({
  messages,
  signal,
}: {
  messages: { role: 'user' | 'assistant'; content: string }[]
  signal?: AbortSignal
}) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    body: JSON.stringify({ messages }),
    headers: { 'Content-Type': 'application/json' },
    signal,
  })

  if (!response.ok) throw new Error('Request failed')

  return response.body // ReadableStream<Uint8Array>
}

In React:

import { useState, useRef } from 'react'
import { streamChat } from './chatClient'

export function Chat() {
  const [messages, setMessages] = useState<any[]>([])
  const [isGenerating, setIsGenerating] = useState(false)
  const abortRef = useRef<AbortController | null>(null)

  async function handleSend(userText: string) {
    // Add user message
    const userMessage = { id: crypto.randomUUID(), role: 'user', content: userText }
    const assistantMessage = {
      id: crypto.randomUUID(),
      role: 'assistant',
      content: '',
      status: 'streaming' as const,
    }

    setMessages(prev => [...prev, userMessage, assistantMessage])
    setIsGenerating(true)

    const controller = new AbortController()
    abortRef.current = controller

    try {
      const stream = await streamChat({
        messages: [...messages, userMessage],
        signal: controller.signal,
      })

      const reader = stream!.getReader()
      const decoder = new TextDecoder()

      while (true) {
        const { done, value } = await reader.read()
        if (done) break

        const chunk = decoder.decode(value, { stream: true })

        setMessages(prev =>
          prev.map(m =>
            m.id === assistantMessage.id
              ? { ...m, content: m.content + chunk }
              : m,
          ),
        )
      }

      setMessages(prev =>
        prev.map(m =>
          m.id === assistantMessage.id
            ? { ...m, status: 'complete' }
            : m,
        ),
      )
    } catch (err: any) {
      if (err.name === 'AbortError') {
        // Mark as interrupted but keep partial content
        setMessages(prev =>
          prev.map(m =>
            m.id === assistantMessage.id
              ? { ...m, status: 'interrupted' }
              : m,
          ),
        )
      } else {
        // Handle real error
        setMessages(prev =>
          prev.map(m =>
            m.id === assistantMessage.id
              ? { ...m, status: 'error', error: err.message }
              : m,
          ),
        )
      }
    } finally {
      setIsGenerating(false)
      abortRef.current = null
    }
  }

  function handleStop() {
    abortRef.current?.abort()
  }

  return (
    <div>
      {/* render messages here */}

      <div className="actions">
        {isGenerating ? (
          <button onClick={handleStop}>Stop generating</button>
        ) : (
          <form
            onSubmit={e => {
              e.preventDefault()
              const form = e.currentTarget
              const input = form.elements.namedItem('message') as HTMLInputElement
              if (input.value.trim()) {
                handleSend(input.value.trim())
                input.value = ''
              }
            }}
          >
            <input name="message" placeholder="Send a message…" />
            <button type="submit">Send</button>
          </form>
        )}
      </div>
    </div>
  )
}

Key points:

  • AbortController is created per request, and its signal is passed to fetch, which cancels the HTTP request.
  • Aborting also makes the pending reader.read() reject with an AbortError, so the read loop exits through the catch block.
  • handleStop calls abortRef.current?.abort(), which cancels the streaming response on the client.
  • Status is updated (streaming → interrupted) so the UI can show a “stopped” indicator and allow continuing the conversation.
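
The read loop above can also be factored into a small helper, which makes the abort behavior easier to test on its own. This is a minimal sketch, not part of any library: readTextStream and its onChunk callback are hypothetical names, and the stream can be any ReadableStream<Uint8Array>, such as response.body.

```typescript
// Hypothetical helper: reads a byte stream as text until it ends or the signal aborts.
// Returns 'complete' or 'interrupted' so the caller can set the message status.
async function readTextStream(
  stream: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void,
  signal?: AbortSignal,
): Promise<'complete' | 'interrupted'> {
  const reader = stream.getReader()
  const decoder = new TextDecoder()

  if (signal?.aborted) {
    await reader.cancel()
    return 'interrupted'
  }

  // Cancelling the reader releases the underlying source when Stop is pressed.
  const onAbort = () => {
    void reader.cancel().catch(() => {})
  }
  signal?.addEventListener('abort', onAbort, { once: true })

  try {
    while (true) {
      const { done, value } = await reader.read()
      if (signal?.aborted) return 'interrupted'
      if (done) return 'complete'
      onChunk(decoder.decode(value, { stream: true }))
    }
  } catch (err) {
    // A fetch body rejects pending reads when its request is aborted.
    if (signal?.aborted) return 'interrupted'
    throw err
  } finally {
    signal?.removeEventListener('abort', onAbort)
  }
}
```

A caller would map the return value straight onto the message status, which keeps the try/catch in the component limited to real errors.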

Server: making the streaming handler abortable

The frontend can abort the HTTP request, but if your server keeps calling the LLM, you still waste tokens. You need to propagate cancellation to the model call.

With Vercel AI SDK (@ai-sdk/openai / @vercel/ai style)

If your stack uses Vercel’s AI SDK (common with assistant-ui setups), the handler already integrates with AbortSignal.

Example (Next.js Route Handler):

// app/api/chat/route.ts
import { NextRequest } from 'next/server'
import OpenAI from 'openai'
import { OpenAIStream, StreamingTextResponse } from 'ai'

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! })

export async function POST(req: NextRequest) {
  const { messages } = await req.json()

  // Pass the request's AbortSignal so the OpenAI call is cancelled
  // when the client disconnects or presses Stop.
  const response = await openai.chat.completions.create(
    {
      model: 'gpt-4o-mini',
      stream: true,
      messages,
    },
    { signal: req.signal },
  )

  // OpenAIStream and StreamingTextResponse are legacy helpers from older
  // releases of the `ai` package; newer releases favor streamText.
  const stream = OpenAIStream(response)

  return new StreamingTextResponse(stream)
}

The critical part: do not swallow the abort. Let the request terminate naturally. The Vercel AI SDK will close the stream, and the OpenAI client stops reading.
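
If you are not using the SDK helpers, the same idea can be approximated with web-standard streams. The sketch below is an assumption-laden illustration: streamFromDeltas is a hypothetical helper, and whether the runtime calls cancel() promptly when the client disconnects depends on the hosting platform.

```typescript
// Hypothetical helper: wraps an async iterable of text deltas in a byte stream.
// `onCancel` runs when the consumer cancels, e.g. because the client went away.
function streamFromDeltas(
  deltas: AsyncIterable<string>,
  onCancel: () => void,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder()
  const iterator = deltas[Symbol.asyncIterator]()

  return new ReadableStream<Uint8Array>({
    // Pull one delta at a time so nothing is generated ahead of the reader.
    async pull(controller) {
      const { done, value } = await iterator.next()
      if (done) controller.close()
      else controller.enqueue(encoder.encode(value))
    },
    // Called when the response body is cancelled by the consumer.
    async cancel() {
      onCancel() // e.g. () => abortController.abort()
      await iterator.return?.()
    },
  })
}
```

In a route handler, the returned stream would be passed to new Response(stream), with onCancel aborting the controller whose signal was given to the model call.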

With plain OpenAI SDK and Node

If you’re not using a helper library:

// server.ts
import { Request, Response } from 'express'
import OpenAI from 'openai'

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! })

export async function chatHandler(req: Request, res: Response) {
  const { messages } = req.body

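  // The body is raw text deltas, not SSE frames, so send it as plain text
  res.setHeader('Content-Type', 'text/plain; charset=utf-8')
  res.setHeader('Cache-Control', 'no-cache')
  res.flushHeaders()

  const abortController = new AbortController()

  // Abort when the client disconnects (Stop or page close).
  // Listen on res: on recent Node versions, req emits 'close' as soon as the
  // request body has been read, not only when the connection drops.
  res.on('close', () => {
    if (!res.writableEnded) abortController.abort()
  })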

  try {
    const completion = await client.chat.completions.create(
      {
        model: 'gpt-4o-mini',
        stream: true,
        messages,
      },
      { signal: abortController.signal },
    )

    for await (const chunk of completion) {
      const delta = chunk.choices[0]?.delta?.content ?? ''
      res.write(delta)
    }

    res.end()
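  } catch (err: any) {
    // Check the signal rather than err.name: the SDK may throw its own
    // abort error class instead of a plain AbortError
    if (abortController.signal.aborted) {
      // Client cancelled; just end quietly
      res.end()
      return
    }
    console.error(err)
    // Headers were already flushed, so the status code can no longer change
    if (!res.writableEnded) {
      res.end()
    }
  }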
}

Here, when the browser aborts fetch, the connection closes, the server's close handler runs, and we:

  • Call abortController.abort().
  • That cancels the OpenAI stream and stops further token generation.
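
The disconnect-to-abort wiring can be pulled into a small helper so every streaming route uses the same logic. This is a sketch under assumptions: abortOnDisconnect is a hypothetical name, and it listens on the response object (not the request) and relies only on the on('close') and writableEnded members that Node's ServerResponse provides.

```typescript
// Minimal structural type covering the parts of Node's ServerResponse used here.
interface ClosableResponse {
  writableEnded: boolean
  on(event: 'close', listener: () => void): unknown
}

// Hypothetical helper: returns a signal that aborts if the connection
// closes before the server has finished the response itself.
function abortOnDisconnect(res: ClosableResponse): AbortSignal {
  const controller = new AbortController()
  res.on('close', () => {
    // 'close' also fires after a normal res.end(); only abort on an early close.
    if (!res.writableEnded) controller.abort()
  })
  return controller.signal
}
```

The returned signal would then be passed as the signal option of the model call, so a Stop click or a closed tab cancels generation without any per-route bookkeeping.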

Handling Stop with assistant-ui and stateful chat

assistant-ui is designed to give you ChatGPT-like UX with real streaming and state management. If you’re using it, you typically don’t need to reinvent the wheel; it already:

  • Manages streaming, interruptions, retries, and multi-turn state.
  • Works with Vercel AI SDK, LangChain, LangGraph, etc.

The general pattern with assistant-ui:

  1. Backend: expose an AI route that supports streaming and respects AbortSignal (as shown above).
  2. Frontend: use assistant-ui components/hooks to wire up the stop button.

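A simple example using a generic chat hook. Treat useAssistantUI and the fields it returns as illustrative placeholders rather than confirmed assistant-ui exports; check the library's documentation for its current hooks and components:

// NOTE: useAssistantUI is a placeholder name used for illustration
import { useAssistantUI } from '@assistant-ui/react'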

export function Chat() {
  const {
    messages,
    input,
    setInput,
    sendMessage,
    stopGenerating,
    isGenerating,
  } = useAssistantUI()

  return (
    <div className="chat">
      {/* render messages */}
      <div className="messages">
        {messages.map(m => (
          <div key={m.id} data-role={m.role}>
            {m.content}
          </div>
        ))}
      </div>

      <div className="input-row">
        <input
          value={input}
          onChange={e => setInput(e.target.value)}
          placeholder="Send a message…"
        />
        {isGenerating ? (
          <button onClick={stopGenerating}>Stop generating</button>
        ) : (
          <button onClick={() => sendMessage()}>Send</button>
        )}
      </div>
    </div>
  )
}

Under the hood, assistant-ui:

  • Uses streaming responses from your backend.
  • Ties stopGenerating into an AbortController or equivalent.
  • Updates message state so partial responses remain, with a clear “stopped” status.

As long as your backend endpoint supports abortable streaming, this gives you a real Stop button out of the box.


Implementing Stop for different transport types

The exact implementation depends on how you stream:

1. Streaming via fetch + ReadableStream (recommended)

Use the pattern described above:

  • Frontend: AbortController → pass signal to fetch.
  • Backend: listen for the client disconnect (for example, the response's close event in Node) and cancel the model call.

2. Server-Sent Events (SSE)

If you’re using SSE:

Frontend:

// EventSource takes no AbortSignal; close() is what ends the connection
const es = new EventSource('/api/chat-sse', { withCredentials: false })

es.onmessage = event => {
  const data = event.data // append to assistant message
}

es.onerror = () => {
  // Close explicitly; otherwise EventSource reconnects automatically
  es.close()
}

// Stop button
function handleStop() {
  es.close()
}

Backend:

  • When the connection closes, stop writing and cancel the LLM call (see Node example above).
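
EventSource only issues GET requests and accepts no AbortSignal, so chat UIs that need to POST the message history often read SSE-formatted data through fetch instead, which brings AbortController back into play. Below is a minimal sketch of the parsing side, assuming the server sends plain data: lines separated by blank lines; createSseParser is a hypothetical name.

```typescript
// Hypothetical incremental parser for SSE text received via fetch.
// Handles only `data:` fields; event names, ids, and retry hints are ignored.
function createSseParser(onData: (data: string) => void) {
  let buffer = ''

  return function push(text: string): void {
    // Normalize CRLF so events are always separated by "\n\n".
    buffer = (buffer + text).replace(/\r\n/g, '\n')

    let boundary = buffer.indexOf('\n\n')
    while (boundary !== -1) {
      const rawEvent = buffer.slice(0, boundary)
      buffer = buffer.slice(boundary + 2)

      const dataLines = rawEvent
        .split('\n')
        .filter(line => line.startsWith('data:'))
        // Drop the field name and the single optional space after the colon.
        .map(line => line.slice(5).replace(/^ /, ''))

      if (dataLines.length > 0) onData(dataLines.join('\n'))
      boundary = buffer.indexOf('\n\n')
    }
  }
}
```

In the read loop, each decoded chunk would be passed to push(), and the Stop button would call abort() on the controller whose signal was given to fetch.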

3. WebSockets

With WebSockets, you cancel by closing the socket:

  • Client:
    • Call socket.close() on Stop.
  • Server:
    • Listen to close and abort the associated model call.

The important part in every case is that the transport closure is wired to an LLM cancel mechanism, not just a UI flag.
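
On the server, one way to wire socket events to cancellation is a per-connection registry that keeps one AbortController per in-flight generation. This is a sketch with hypothetical names (GenerationRegistry, requestId); it assumes each generation is tagged with an id chosen by your own protocol.

```typescript
// Hypothetical registry: one AbortController per in-flight generation.
class GenerationRegistry {
  private controllers = new Map<string, AbortController>()

  // Call when a generation starts; pass the returned signal to the model call.
  start(requestId: string): AbortSignal {
    this.cancel(requestId) // a reused id replaces any earlier generation
    const controller = new AbortController()
    this.controllers.set(requestId, controller)
    return controller.signal
  }

  // Call when the client asks to stop one generation.
  cancel(requestId: string): boolean {
    const controller = this.controllers.get(requestId)
    if (!controller) return false
    controller.abort()
    this.controllers.delete(requestId)
    return true
  }

  // Call when a generation finishes normally.
  finish(requestId: string): void {
    this.controllers.delete(requestId)
  }

  // Call from the socket's close handler to abort everything still running.
  cancelAll(): void {
    this.controllers.forEach(controller => controller.abort())
    this.controllers.clear()
  }
}
```

Keeping the controllers keyed by request also avoids the situation where stopping one stream aborts another that happens to share the same connection.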


UX details: making Stop feel like ChatGPT

To match the familiar ChatGPT-style UX:

  1. Swap buttons based on isGenerating

    • While streaming: show “Stop generating”.
    • After stop or completion: show “Regenerate” / “Send”.
  2. Keep partial output visible

    • Do not discard partial tokens when stopping.
    • Mark the message as “interrupted” so users understand it was cut short.
  3. Allow continuing the conversation

    • Next user input should include the partial assistant message in the history or the relevant context, depending on your agent design.
  4. Optional: show a subtle “stopped” indicator

    • Small label under the assistant’s message: “Stopped early” / “Generation interrupted”.

Example state model for assistant messages:

type AssistantMessageStatus = 'streaming' | 'complete' | 'interrupted' | 'error'

type AssistantMessage = {
  id: string
  role: 'assistant'
  content: string
  status: AssistantMessageStatus
}

Update status on:

  • Start: streaming
  • Normal end: complete
  • Stop button: interrupted
  • Error: error (+ error message)
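
These transitions can be kept in a single reducer so that no other code path changes the status. The sketch below uses hypothetical names (StreamEvent, applyStreamEvent) and assumes that every state other than streaming is terminal.

```typescript
type Status = 'streaming' | 'complete' | 'interrupted' | 'error'

type Msg = {
  id: string
  role: 'assistant'
  content: string
  status: Status
  error?: string
}

type StreamEvent =
  | { type: 'chunk'; text: string }
  | { type: 'done' }
  | { type: 'stopped' }
  | { type: 'failed'; message: string }

// Hypothetical reducer: the only place where a message's status changes.
function applyStreamEvent(msg: Msg, event: StreamEvent): Msg {
  // Terminal states ignore late events, e.g. a chunk arriving after Stop.
  if (msg.status !== 'streaming') return msg

  switch (event.type) {
    case 'chunk':
      return { ...msg, content: msg.content + event.text }
    case 'done':
      return { ...msg, status: 'complete' }
    case 'stopped':
      return { ...msg, status: 'interrupted' }
    case 'failed':
      return { ...msg, status: 'error', error: event.message }
  }
}
```

Ignoring events once a message is terminal means a chunk that was already in flight when the user pressed Stop cannot reopen or extend the message.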

Common pitfalls that make Stop “fake”

If your Stop button doesn’t truly cancel streaming responses, you’ll usually find one of these issues:

  1. AbortController not attached to fetch

    • You call abort() but never pass signal into fetch or the LLM client.
  2. Server keeps generating after client aborts

    • You never listen for the client disconnect (the close event in Node, or the equivalent termination signal in your framework).
    • The server waits for the LLM to finish even though the client is gone.
  3. Multiple concurrent requests share a single AbortController

    • Each request must have its own AbortController.
    • Otherwise, stopping one stream might abort another.
  4. UI unsubscribes but request continues

    • You stop reading the stream in the UI but never abort the underlying network call.
    • This hides the tokens but doesn’t stop token usage.
  5. LLM library doesn’t support AbortSignal

    • Some SDKs don’t properly honor signal.
    • In that case, you may need to:
      • Wrap calls in your own timeout and kill the transport (SSE / WebSocket).
      • Or switch to a library that supports cancellation properly (Vercel AI SDK, official OpenAI SDK, etc.).
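
When an SDK ignores the signal, the handler can at least stop waiting on it. The following is a minimal sketch with a hypothetical name, raceWithAbort; note that it only frees the caller and does not stop the underlying work, so the transport still has to be closed separately.

```typescript
// Hypothetical helper: rejects as soon as the signal aborts, even if the
// wrapped promise ignores the signal. It does NOT stop the underlying work.
function raceWithAbort<T>(work: Promise<T>, signal: AbortSignal): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const onAbort = () => {
      const err = new Error('Aborted')
      err.name = 'AbortError'
      reject(err)
    }

    // Always observe `work` so a late rejection is never left unhandled.
    work.then(
      value => {
        signal.removeEventListener('abort', onAbort)
        resolve(value)
      },
      error => {
        signal.removeEventListener('abort', onAbort)
        reject(error)
      },
    )

    if (signal.aborted) onAbort()
    else signal.addEventListener('abort', onAbort, { once: true })
  })
}
```

A handler could wrap the model call in raceWithAbort and, on AbortError, end the response immediately instead of waiting for the SDK to finish.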

Putting it all together

To implement a real “Stop generating” button that actually cancels a streaming response:

  1. Frontend

    • Use AbortController per streaming call.
    • Pass signal into fetch or your streaming client.
    • Expose stopGenerating / handleStop that calls abort().
    • Keep and display partial output, mark it as “interrupted”.
  2. Backend

    • Use a streaming-friendly LLM client (e.g., OpenAI streaming, Vercel AI SDK).
    • Accept abort/cancel signals:
      • Node/Express: abort when the connection closes early, e.g. res.on('close', () => { if (!res.writableEnded) abortController.abort() }).
      • Next.js / frameworks: rely on built-in AbortSignal where available.
    • Cancel the LLM call when the request is aborted and close the stream gracefully.
  3. UI/UX

    • Toggle between “Stop generating” and “Send / Regenerate”.
    • Persist interrupted messages as part of the thread so multi-turn state remains correct.

Once wired end‑to‑end, your Stop button will behave like ChatGPT’s: the streaming response is cancelled promptly, no further tokens are generated after the stop, and the partial reply stays in the thread for a smooth, production-ready chat experience.