How do I track token usage and cost per user request for an LLM feature in a Node/Express app?

Most Node and Express teams hit the same wall: you ship an LLM-powered feature, then realize you have no clear view of token usage or cost per request, per route, or per user. That’s a problem when you’re trying to keep spend predictable, debug bad outputs, or justify the feature to leadership.

Quick Answer: Add a tracing layer around your LLM calls that records token usage (input, output, total) and maps it to your Express request context (user ID, route, request ID). In Mastra, this is built into Observability and agent/workflow traces; you can then compute cost per request from your model’s pricing and export traces to your preferred observability stack.

In short, track token usage and cost per user request by:

instrumenting all LLM calls in your Express routes

recording token usage, model, and latency

enriching traces with user/request metadata

computing cost from model pricing

exporting data to Mastra Studio, Mastra Cloud, or your OpenTelemetry-compatible platform

Frequently Asked Questions

How do I capture token usage for each LLM request in a Node/Express app?

Short Answer: Wrap every LLM call in an instrumentation layer that records inputTokens, outputTokens, totalTokens, model, and latency, and attach those metrics to your Express request context.

Expanded Explanation:
In practice, you don’t want scattered console.log() calls and ad‑hoc counters. You want a single place where every LLM call is traced, and you can see, for a given Express route and user, exactly how many tokens were used and how long the model took.

In Mastra, every Agent and Workflow execution emits a trace that includes model interactions: prompts, completions, and token usage (inputTokens, outputTokens, totalTokens). That data is available via usage objects and is persisted by Observability, so you can see “this HTTP request → this agent/workflow → these model calls → this token usage”.
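For calls that don't go through a Mastra agent, or simply to see what this instrumentation boils down to, the call boundary can be sketched as a small wrapper. This is a minimal, provider-agnostic sketch, not Mastra's API: `trackLlmCall`, `LlmCallRecord`, and the `sink` callback are hypothetical names, and it assumes the wrapped call resolves to an object whose `usage` field has `inputTokens` and `outputTokens`.

```typescript
// Hypothetical helper: not part of Mastra or any provider SDK.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

interface LlmCallRecord {
  model: string;
  usage: TokenUsage;
  latencyMs: number;
  ok: boolean;
  metadata: Record<string, string>; // e.g. userId, requestId, route
}

// Wraps one LLM call, measures wall-clock latency, and hands a structured
// record to `sink` whether the call succeeds or throws.
async function trackLlmCall<T extends { usage: { inputTokens: number; outputTokens: number } }>(
  model: string,
  metadata: Record<string, string>,
  call: () => Promise<T>,
  sink: (record: LlmCallRecord) => void,
): Promise<T> {
  const start = Date.now();
  try {
    const result = await call();
    const { inputTokens, outputTokens } = result.usage;
    sink({
      model,
      usage: { inputTokens, outputTokens, totalTokens: inputTokens + outputTokens },
      latencyMs: Date.now() - start,
      ok: true,
      metadata,
    });
    return result;
  } catch (err) {
    // Token counts are unknown on failure; record zeros so errors still
    // show up per route and per user, then rethrow to the caller.
    sink({
      model,
      usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
      latencyMs: Date.now() - start,
      ok: false,
      metadata,
    });
    throw err;
  }
}
```

The `sink` is where a log line, a database insert, or a span attribute would go; keeping it a plain callback keeps the wrapper independent of any one backend.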

Key Takeaways:

Instrument at the LLM call boundary, not just at the HTTP layer.
Record structured usage: inputTokens, outputTokens, totalTokens, and model.
Use Mastra’s Observability to get token usage and traces without building your own pipeline.

What’s the process to wire token usage tracking into my Express routes?

Short Answer: Use Mastra to define your Agent/Workflow, enable Observability, then call your agent from your Express route and pass request metadata; Mastra will handle token usage tracking and tracing for you.

Expanded Explanation:
You want a clean separation: your route handler focuses on HTTP and auth, and your Mastra code handles LLM orchestration and token accounting. Mastra’s Observability records model interactions (token usage, latency, prompts, completions) and lets you export traces to Mastra Cloud or OpenTelemetry-compatible platforms. By including user IDs and request metadata in the trace, you can later ask, “What did this user cost us last week?” or “Which endpoint is burning tokens?”
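Including that metadata means every LLM call needs the user and request IDs. Passing them by hand works for a single route but is easy to drop once helpers and several agent calls are involved. One common Node pattern is to store the request context in `AsyncLocalStorage` (from `node:async_hooks`) once per request and read it wherever an LLM call is made. A minimal sketch; `RequestContext`, `withRequestContext`, and `currentLlmMetadata` are hypothetical names.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface RequestContext {
  userId: string;
  requestId: string;
  route: string;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Runs `fn` with `ctx` visible to everything it calls, including across awaits.
function withRequestContext<T>(ctx: RequestContext, fn: () => T): T {
  return requestContext.run(ctx, fn);
}

// Builds trace metadata for an LLM call from the active request, if any.
function currentLlmMetadata(): Record<string, string> {
  const ctx = requestContext.getStore();
  // Outside a request (e.g. a background job) there is no context to attach.
  return ctx ? { userId: ctx.userId, requestId: ctx.requestId, route: ctx.route } : {};
}
```

In Express, a middleware such as `app.use((req, res, next) => withRequestContext({ userId, requestId, route: req.path }, next))`, with `userId` and `requestId` derived from the request headers, would set the context once, and any code that builds agent metadata can then call `currentLlmMetadata()`.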

Steps:

Install and initialize Mastra

npm create mastra@latest
# or, inside an existing app:
npm install @mastra/core @mastra/agents @mastra/observability

Configure your workspace and observability once at startup:

// mastra.ts
import { Workspace } from "@mastra/core";
import { Observability, DefaultExporter } from "@mastra/observability";

export const workspace = new Workspace({
  name: "node-express-llm",
  observability: new Observability({
    exporter: new DefaultExporter({
      // traces to stdout, Mastra Studio, etc.
    }),
  }),
});

Define an Agent that uses your LLM provider

// agents/support-agent.ts
import { Agent } from "@mastra/agents";
import { workspace } from "../mastra";

export const supportAgent = new Agent({
  name: "support-agent",
  workspace,
  model: {
    provider: "openai",
    name: "gpt-4.1-mini",
  },
  system: "You are a helpful support agent.",
});

Mastra will capture token usage for this agent’s model calls automatically.

Call the Agent from your Express route and enrich with request context

// server.ts
import express from "express";
import { randomUUID } from "node:crypto";
import { supportAgent } from "./agents/support-agent";

const app = express();
app.use(express.json());

app.post("/api/support", async (req, res) => {
  const userId = req.header("x-user-id") || "anonymous";
  const requestId = req.header("x-request-id") || randomUUID();

  try {
    const result = await supportAgent.run(
      {
        input: req.body.message,
      },
      {
        // optional metadata that will appear in traces
        metadata: {
          userId,
          requestId,
          route: "/api/support",
        },
      }
    );

    // result.usage carries token stats for the run
    const usage = result.usage; // { inputTokens, outputTokens, totalTokens, ... }

    res.json({
      reply: result.output,
      usage,
    });
  } catch (err) {
    console.error("LLM error", err);
    res.status(500).json({ error: "LLM call failed" });
  }
});

app.listen(3000, () => {
  console.log("Server running on http://localhost:3000");
});

Here result.usage is the same structure Mastra uses internally:

inputTokens: tokens consumed by the input prompt
outputTokens: tokens generated in the response
totalTokens: sum of input and output

Mastra’s Observability also stores this in traces so you don’t have to persist it manually.
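If you also call provider SDKs directly, their raw usage fields use different names, so it helps to normalize them into this same inputTokens / outputTokens / totalTokens shape before storing anything. A minimal sketch covering three shapes: OpenAI Chat Completions (`prompt_tokens`, `completion_tokens`, `total_tokens`), snake_case `input_tokens` / `output_tokens` (as in Anthropic's Messages API), and the camelCase shape above. `normalizeUsage` and `sumUsage` are hypothetical helpers; check your SDK version's response type before relying on these field names.

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Loosely typed raw usage object, as returned by different providers.
type RawUsage = Record<string, number | undefined>;

function normalizeUsage(raw: RawUsage): TokenUsage {
  const inputTokens = raw.inputTokens ?? raw.input_tokens ?? raw.prompt_tokens ?? 0;
  const outputTokens = raw.outputTokens ?? raw.output_tokens ?? raw.completion_tokens ?? 0;
  // Prefer the provider's own total when present; otherwise derive it.
  const totalTokens = raw.totalTokens ?? raw.total_tokens ?? inputTokens + outputTokens;
  return { inputTokens, outputTokens, totalTokens };
}

// A single agent run can make several model calls; its usage is their sum.
function sumUsage(parts: TokenUsage[]): TokenUsage {
  return parts.reduce(
    (acc, u) => ({
      inputTokens: acc.inputTokens + u.inputTokens,
      outputTokens: acc.outputTokens + u.outputTokens,
      totalTokens: acc.totalTokens + u.totalTokens,
    }),
    { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
  );
}
```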

What’s the difference between doing this manually and using Mastra’s Observability?

Short Answer: Manual tracking means hand-rolling logging, storage, and correlation; Mastra’s Observability gives you token-aware tracing, structured usage, and exports out of the box.

Expanded Explanation:
You can absolutely wrap every LLM call yourself, capture token counts from the provider SDK, and store them in a database. The tradeoff is you now own trace correlation, schema evolution, and integrations with monitoring tools. With Mastra, Agents and Workflows emit traces that already understand AI patterns: model interactions, tool calls, memory reads/writes, and token usage. You get a unified view: “This HTTP request → this Agent run → these model calls → these tools → this cost.”

Comparison Snapshot:

Manual instrumentation:
- You call the provider SDK directly, parse token usage, log to your database.
- You must build trace correlation, dashboards, and alerts yourself.
Mastra Observability:
- Traces model interactions with token usage, latency, prompts, completions.
- Exposes usage objects at the API level and exports to Mastra Cloud or OpenTelemetry-compatible platforms.
Best for:
- Manual: small prototypes where you don’t care about long-term observability.
- Mastra: production Node/Express apps where you need end‑to‑end traces, cost control, and debuggability.

How do I calculate and store cost per user request?

Short Answer: Multiply inputTokens and outputTokens by your model’s input and output per‑token rates (most providers price the two differently), add the results, then store that cost alongside the trace metadata (user ID, route, request ID).

Expanded Explanation:
Most LLM providers publish separate prices for input and output tokens, usually quoted per 1K or per 1M tokens. Once you have inputTokens and outputTokens for a request, you can calculate an approximate cost. With Mastra, you can do this as soon as the agent call returns, or as a post‑processing step on trace data exported to your analytics platform.
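Written out with separate input and output rates, a hypothetical request with 1,200 input tokens and 300 output tokens, priced at the illustrative $0.15 and $0.60 per 1M tokens used in the pricing example in this section (not current list prices), costs:

```latex
\text{cost} = N_{\text{in}} \cdot r_{\text{in}} + N_{\text{out}} \cdot r_{\text{out}}
  = 1200 \cdot \frac{\$0.15}{10^{6}} + 300 \cdot \frac{\$0.60}{10^{6}}
  = \$0.00018 + \$0.00018 = \$0.00036
```

At that size, 1,000 such requests cost about $0.36, which is why per-request figures become most useful once they are aggregated per user, route, or feature.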

You typically want a small CostCalculator utility and a database table keyed by request ID or trace ID. That lets you answer questions like “Top 10 most expensive users”, “Cost per endpoint”, and “Cost per feature launch.”
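Once per-request rows exist, these questions are group-by queries: in SQL or ClickHouse, a `GROUP BY` on the user or route column. The in-memory sketch below shows the same logic in TypeScript, assuming hypothetical rows shaped like the `llm_request_costs` records written in the handler example in this section; `CostRow` and `topByCost` are illustrative names.

```typescript
interface CostRow {
  requestId: string;
  userId: string;
  route: string;
  cost: number; // USD
}

interface CostTotal {
  key: string;
  cost: number;
  requests: number;
}

// Sums cost per key (userId or route) and returns the `limit` most expensive keys.
function topByCost(rows: CostRow[], key: "userId" | "route", limit = 10): CostTotal[] {
  const totals = new Map<string, { cost: number; requests: number }>();
  for (const row of rows) {
    const k = row[key];
    const t = totals.get(k) ?? { cost: 0, requests: 0 };
    t.cost += row.cost;
    t.requests += 1;
    totals.set(k, t);
  }
  return Array.from(totals, ([k, t]) => ({ key: k, cost: t.cost, requests: t.requests }))
    .sort((a, b) => b.cost - a.cost)
    .slice(0, limit);
}
```

`topByCost(rows, "userId")` answers "Top 10 most expensive users", and `topByCost(rows, "route")` gives cost per endpoint.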

What You Need:

A mapping of model names → pricing (input and output rates, quoted per 1K or per 1M tokens).
A place to store per-request cost (database, data warehouse, or an OLAP store like ClickHouse for high-traffic apps).

Example cost calculation in your Express handler:

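// pricing.ts
// Illustrative per-token rates derived from per-1M-token prices.
// These numbers are examples only; check your provider's current pricing.
export const MODEL_PRICING: Record<string, { inputPerToken: number; outputPerToken: number }> = {
  "gpt-4.1-mini": {
    inputPerToken: 0.15 / 1_000_000,  // example: $0.15 per 1M input tokens
    outputPerToken: 0.60 / 1_000_000, // example: $0.60 per 1M output tokens
  },
  // add other models here
};

export function estimateCost(model: string, usage: { inputTokens: number; outputTokens: number }): number {
  const pricing = MODEL_PRICING[model];
  // Unknown models are costed at 0; consider logging a warning so gaps are visible.
  if (!pricing) return 0;

  const inputCost = usage.inputTokens * pricing.inputPerToken;
  const outputCost = usage.outputTokens * pricing.outputPerToken;
  return inputCost + outputCost; // USD
}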

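// in your Express route (server.ts)
import { randomUUID } from "node:crypto";
import { estimateCost } from "./pricing";
import { supportAgent } from "./agents/support-agent";
import { db } from "./db"; // hypothetical: placeholder for your own data-access layer

// The model configured on supportAgent; keep the two in sync (or share one config constant).
const SUPPORT_MODEL = "gpt-4.1-mini";

app.post("/api/support", async (req, res) => {
  const userId = req.header("x-user-id") || "anonymous";
  const requestId = req.header("x-request-id") || randomUUID();
  const route = "/api/support";

  try {
    const result = await supportAgent.run(
      { input: req.body.message },
      { metadata: { userId, requestId, route } },
    );

    const usage = result.usage;
    const cost = estimateCost(SUPPORT_MODEL, usage);

    // persist cost per request
    await db.insert("llm_request_costs", {
      requestId,
      userId,
      route,
      model: SUPPORT_MODEL,
      inputTokens: usage.inputTokens,
      outputTokens: usage.outputTokens,
      totalTokens: usage.totalTokens,
      cost,
      createdAt: new Date(),
    });

    res.json({
      reply: result.output,
      usage,
      cost,
    });
  } catch (err) {
    console.error("LLM error", err);
    res.status(500).json({ error: "LLM call failed" });
  }
});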
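`estimateCost` returns a floating-point dollar amount, which is fine for display but accumulates rounding error when millions of rows are summed. One option is to store integer nano-dollars instead: $0.15 per 1M tokens is exactly 150 nano-dollars per token. This is a sketch under the assumption that your prices have at most three decimal places per 1M tokens (so every rate is a whole number of nano-dollars); `PRICING_NANO_USD_PER_TOKEN`, `costNanoUsd`, and `formatUsd` are hypothetical names.

```typescript
// Illustrative rates in nano-USD per token (1 USD = 1e9 nano-USD):
// $0.15 / 1M tokens = 150 nano-USD per token; $0.60 / 1M = 600.
const PRICING_NANO_USD_PER_TOKEN: Record<string, { input: number; output: number }> = {
  "gpt-4.1-mini": { input: 150, output: 600 },
};

// Integer cost in nano-USD. Returns null for unknown models instead of 0,
// so missing prices stay visible rather than looking free.
function costNanoUsd(model: string, usage: { inputTokens: number; outputTokens: number }): number | null {
  const p = PRICING_NANO_USD_PER_TOKEN[model];
  if (!p) return null;
  return usage.inputTokens * p.input + usage.outputTokens * p.output;
}

// Convert to dollars only at display time.
function formatUsd(nano: number): string {
  return `$${(nano / 1e9).toFixed(6)}`;
}
```

A single row stays well within JavaScript's safe-integer range; for sums across many rows, a 64-bit integer column (such as `BIGINT`) in the database is the natural fit.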

How can I keep token usage and cost under control as my Node/Express app scales?

Short Answer: Combine proactive limits (like Mastra’s TokenLimiterProcessor), good prompt design, and observability-driven tuning (dashboards and alerts on tokens and cost per route/user).

Expanded Explanation:
Tracking is half the problem; control is the other half. You want guardrails so a single request can’t accidentally consume millions of tokens and blow your budget. Mastra’s processors give you a code-first way to enforce limits on LLM outputs. The TokenLimiterProcessor is an output processor that limits the number of tokens in model responses, truncating or blocking the response when the limit is exceeded.

Pair that with dashboards on top of Mastra’s traces (token usage, latency, prompts, completions) and a rule of thumb: keep hard caps in code, and use observability to tune soft limits over time.
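An output cap bounds a single response; it does not bound how many requests one user sends. A soft per-user budget can be layered on top of the per-request costs you already record. This `DailyBudget` class is a hypothetical, single-process, in-memory sketch; a multi-instance deployment would keep the counters in shared storage such as your database or a cache.

```typescript
// Hypothetical per-user daily spend guard (single process, in memory).
class DailyBudget {
  private readonly limitUsd: number;
  // Key: "<UTC day>:<userId>" -> USD spent. Old days are never evicted in this sketch.
  private readonly spent = new Map<string, number>();

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  private key(userId: string, now: Date): string {
    return `${now.toISOString().slice(0, 10)}:${userId}`; // UTC calendar day
  }

  // Call before the LLM request; false means the user is over today's budget.
  canSpend(userId: string, now: Date = new Date()): boolean {
    return (this.spent.get(this.key(userId, now)) ?? 0) < this.limitUsd;
  }

  // Call after the request with the cost computed from its usage.
  record(userId: string, costUsd: number, now: Date = new Date()): void {
    const k = this.key(userId, now);
    this.spent.set(k, (this.spent.get(k) ?? 0) + costUsd);
  }
}
```

In a route, checking `canSpend` before calling the agent and returning HTTP 429 when it is false keeps the guard outside the LLM code path; the threshold itself is the kind of soft limit you tune from your dashboards.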

Why It Matters:

Cost control: Hard limits and real‑time visibility help prevent runaway usage and surprise bills.
Reliability and UX: Consistent response sizes and latency make your LLM feature feel like part of your infrastructure, not a flaky experiment.

Example using TokenLimiterProcessor in an Agent:

import { Agent } from "@mastra/agents";
import { TokenLimiterProcessor } from "@mastra/agents/processors";
import { workspace } from "../mastra";

export const summarizerAgent = new Agent({
  name: "summarizer-agent",
  workspace,
  model: {
    provider: "openai",
    name: "gpt-4.1-mini",
  },
  system: "Summarize the input text in fewer than 200 words.",
  outputProcessors: [
    new TokenLimiterProcessor({
      maxTokens: 512, // hard limit on output tokens
      behavior: "truncate", // or "block"
    }),
  ],
});

Now every call through summarizerAgent will respect your token limit, with usage still recorded in traces.

Quick Recap

If you’re building an LLM feature in a Node/Express app, treat token usage and cost tracking as first‑class infrastructure, not a logging afterthought. Instrument your Agents and Workflows with Mastra, let Observability capture token usage, latency, prompts, and completions, then compute cost based on model pricing and store it per request and per user. Add TokenLimiterProcessor and similar guardrails to keep usage bounded, and use traces exported to Mastra Studio, Mastra Cloud, or your OpenTelemetry-compatible platform to monitor, debug, and optimize over time.

