
How do I run long-running background jobs in a Next.js app on serverless without hitting timeouts?
Most teams hit this problem the same way: you ship a Next.js feature on serverless, it works in staging, then production traffic arrives and your “background” work starts timing out. File processing, AI workflows, sync jobs—anything that takes longer than your platform’s HTTP limit (often 30–60 seconds) becomes brittle fast.
As someone who’s maintained Lambda and Kubernetes stacks with homegrown queues, my stance is blunt: you shouldn’t be rebuilding workers, retries, and dead-letter queues just to run long-running jobs. You want durability expressed in code, with first-class replay and visibility. That’s exactly where Inngest fits.
This guide walks through how to run long-running background jobs in a Next.js app on serverless without hitting timeouts—then shows you how to do it with Inngest using a few lines of code.
Why serverless Next.js hits timeouts for long-running jobs
Serverless runtimes (Vercel Functions, Netlify Functions, AWS Lambda behind API Gateway, etc.) are optimized for short-lived HTTP work:
- Hard HTTP timeouts: Typically 15–60 seconds. After that, the platform kills the request, no matter what your code is doing.
- No built-in durability: If a request dies midway through a multi-step flow, you’re left with partial state. You handle retries yourself, often with idempotency keys and custom tables.
- No native queueing: To “do it right,” teams bolt on SQS, BullMQ, Redis, or Cloud Tasks, and build:
  - Workers
  - Retry logic
  - Dead-letter queues
  - Recovery tooling & admin panels
- Operational blind spots: When something fails, you’re log-grepping across Next.js logs, queue logs, and maybe a database just to answer: “Did this job run? Where did it fail?”
For long-running background jobs, you need to break out of the HTTP request lifecycle and move to durable execution.
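As a rough illustration of leaving the request lifecycle, the sketch below has a handler that only records an event and returns, while a separate drain function does the slow work later. All names here (`JobEvent`, `pendingEvents`, `handleRequest`, `drainQueue`) are hypothetical, and the in-memory array only stands in for a durable queue or event platform: it would not survive a serverless instance being recycled.

```typescript
// Hypothetical event shape; names are illustrative only.
type JobEvent = { name: string; data: { uploadId: string } };

// Stand-in for a durable queue. A real system would persist events.
const pendingEvents: JobEvent[] = [];

// The "HTTP handler": record the event and return right away.
function handleRequest(uploadId: string): { status: number; body: string } {
  pendingEvents.push({ name: "app/upload.created", data: { uploadId } });
  return { status: 202, body: "queued" };
}

// The "worker": runs outside any request and processes what was queued.
function drainQueue(handle: (event: JobEvent) => void): number {
  let handled = 0;
  while (pendingEvents.length > 0) {
    const next = pendingEvents.shift() as JobEvent;
    handle(next);
    handled += 1;
  }
  return handled;
}
```

The point of the split is that the handler's latency no longer depends on how long the work takes; durability then depends entirely on what replaces the array.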
The anti-patterns: what not to do
If you’re searching “how do I run long-running background jobs in a Next.js app on serverless without hitting timeouts,” you’ve probably seen these options:
1. Keep the HTTP request open “until it’s done”
- Client calls an API route or Route Handler.
- You do all the work in that handler.
- You hope it completes before the platform timeout.
Result: timeouts, partial state, terrible UX, and no resilience.
2. Polling with ad-hoc “job status” tables
- API route kicks off work in the same process and returns a “job id.”
- Frontend polls another endpoint for status.
- You hand-roll a “jobs” table in your database.
You still hit timeouts on the backend, and now you’ve layered on custom state management.
3. Build your own job runner stack
- Add a queue (SQS, Redis, Cloud Tasks, etc.).
- Build workers (Node/Go/Python, deployed on EC2, containers, or serverless).
- Implement:
  - Backoff & retries
  - Idempotency
  - Concurrency limits
  - Dead-letter queue processing UI
This works, but you’ve just taken on infrastructure you’ll maintain for years.
There’s a better pattern: durable, event-driven functions where each unit of work is a named step that can retry, resume, and be inspected.
The pattern that actually works: event-driven, durable execution
To run long-running background jobs in a Next.js app on serverless without hitting timeouts, you want three things:
- Decouple HTTP from execution: HTTP requests just enqueue an event or trigger a function; the heavy work runs independently of any single request.
- Durable steps with automatic retries: Each significant operation (call an external API, write to DB, process a file, run an AI model) is a named step that:
  - Retries automatically on failure
  - Runs once on success
  - Checkpoints so the workflow resumes from the last successful step instead of starting over
- First-class observability & control: You can:
  - See every run’s steps, inputs, and outputs
  - Query runs by tenant, workflow, status
  - Cancel or replay runs without building your own admin UI
This is precisely what Inngest gives you for Next.js and other runtimes.
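To make the checkpointing idea concrete, here is a minimal sketch of steps whose results are saved by name, so a re-run skips anything that already succeeded. It is not Inngest's implementation: `Checkpoints`, `runStep`, and `runWithRetries` are hypothetical names, the store is an in-memory object, and steps are synchronous for brevity.

```typescript
// Hypothetical checkpoint store: step name -> saved result.
type Checkpoints = Record<string, unknown>;

// Run a step once; on later attempts, return the saved result instead.
function runStep<T>(saved: Checkpoints, name: string, fn: () => T): T {
  if (name in saved) return saved[name] as T;
  const result = fn();
  saved[name] = result; // checkpoint only after the step succeeds
  return result;
}

// Re-invoke the whole workflow until it finishes or attempts run out.
function runWithRetries(
  workflow: (saved: Checkpoints) => void,
  maxAttempts: number
): boolean {
  const saved: Checkpoints = {};
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      workflow(saved);
      return true;
    } catch {
      // A failed step leaves earlier checkpoints intact; try again.
    }
  }
  return false;
}

// Usage: the second step fails once, but the first step is not repeated.
let downloads = 0;
let transcodes = 0;
const ok = runWithRetries((saved) => {
  runStep(saved, "download-file", () => { downloads += 1; return "file"; });
  runStep(saved, "transcode", () => {
    transcodes += 1;
    if (transcodes === 1) throw new Error("transient failure");
    return "done";
  });
}, 3);
// ok === true, downloads === 1, transcodes === 2
```

A production system would persist the checkpoints outside the process and add delays between attempts; the sketch only shows why a retry does not have to start from scratch.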
How Inngest solves long-running jobs for Next.js on serverless
Inngest is an event-driven durable execution platform that plugs directly into your Next.js app:
- Infraless – No queue/worker stack to maintain. You don’t manage SQS, Redis, or cron.
- Agnostic – Triggers from API calls, webhooks, schedules. Runs on edge, serverless, or traditional runtimes.
- Observable – Real-time Traces with step-level inputs/outputs, structured logs, and actions to query, cancel, or replay runs.
You write code like:
import { inngest } from "@/inngest/client";

export const processVideo = inngest.createFunction(
  { id: "process-video" },
  { event: "app/video.uploaded" },
  async ({ event, step }) => {
    const file = await step.run("download-file", async () => {
      // long-running file download
    });

    const transcoded = await step.run("transcode", async () => {
      // long-running transcoding
    });

    await step.run("notify-user", async () => {
      // send email, push, etc.
    });
  }
);
Behind the scenes:
- Each step.run() is a code-level transaction.
- If transcode fails, only that step retries.
- When it succeeds, Inngest checkpoints progress and moves on. No double-processing of the download-file step.
You’re not keeping an HTTP request open. You’re not fighting Next.js function timeouts. You’re describing the workflow in code, and Inngest handles the durability.
At-a-glance: top options for long-running background jobs in Next.js
When teams ask how to run long-running background jobs in a Next.js app on serverless without hitting timeouts, they usually converge on three options.
Quick Answer: The best overall choice for durable, long-running background jobs in a Next.js app on serverless is Inngest. If your priority is full control over infra and you’re willing to manage workers yourself, a custom queue + workers stack (e.g., SQS with Lambda workers, or Redis with BullMQ) is often a stronger fit. For smaller apps that just need simple, short-lived jobs, consider Vercel Cron Jobs + short serverless functions.
At-a-Glance Comparison
| Rank | Option | Best For | Primary Strength | Watch Out For |
|---|---|---|---|---|
| 1 | Inngest | Production-grade long-running jobs & workflows | Durable steps with automatic retries and Traces | Additional managed service to adopt |
| 2 | Custom queue + workers (SQS + Lambda, Redis + BullMQ, etc.) | Teams that want full infra control | Highly customizable execution model | Significant infra tax: workers, queues, DLQs, tooling |
| 3 | Vercel Cron + serverless functions | Simple, periodic short jobs | Minimal setup in Vercel-only stacks | Still bound by function timeouts and limited durability |
Comparison Criteria
We evaluated each option using three practical criteria for long-running background jobs in Next.js on serverless:
- Durability & retries: Can long-running jobs survive timeouts, network errors, and partial failures without manual plumbing?
- Operational overhead: How much infrastructure do you need to provision and maintain (queues, workers, cron, observability, admin UIs)?
- Observability & control: How easy is it to see what ran, why it failed, and then query, cancel, or replay jobs?
1. Inngest (Best overall for durable, long-running jobs)
Inngest ranks as the top choice because it gives you code-level durability, built-in retries, and visibility for long-running background jobs without forcing you to build or maintain queue infrastructure.
What it does well
- Durable steps & automatic retries: Each step.run() is a named unit of work that:
  - Retries on failure with backoff
  - Runs once on success
  - Checkpoints so your workflow resumes from the last successful step

  This is exactly what you need for multi-step flows like:
  - AI agents calling multiple tools and models
  - Long-running file processing pipelines
  - Bi-directional sync across SaaS APIs
- Infraless long-running jobs on serverless: Inngest decouples Next.js HTTP handlers from execution:
  - Trigger functions from API calls, webhooks, or schedules
  - Run each step as its own invocation, so the workflow as a whole isn’t bound by a single request’s HTTP timeout
  - Keep your app stateless while your workflows can run for minutes or hours
- Traces, structured logs, and replay out of the box: Inngest Cloud gives you:
  - Real-time Traces of every run, step-by-step
  - Structured logs attached to each step
  - The ability to query runs (e.g., by tenant, event, status)
  - Replay and Bulk Cancellation so you can recover from incidents without building internal admin tools

  For AI workflows, you also get visibility into every prompt/response pair, which is invaluable when long-running jobs involve multiple LLM calls.
- Multi-tenant flow control: Noisy-neighbor behavior is a classic multi-tenant failure mode: one customer’s long-running jobs starve everyone else. Inngest ships with:
  - Concurrency keys – cap concurrent runs per tenant, user, or resource
  - Throttling & prioritization – control how work is distributed
  - Batching – group events into batches to smooth spikes

  You get queue-like flow control as a product feature, not an infrastructure project.
- DX-first setup for Next.js: Local dev is a one-command dev server:
  npx --ignore-scripts=false inngest-cli dev
  Connect it to Next.js, define functions with inngest.createFunction(...), and iterate with instant feedback.
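To show what a per-tenant concurrency cap means in practice, the sketch below admits a run only while that tenant is under its limit. This is a hand-rolled illustration, not Inngest's API: `Limiter`, `createLimiter`, `tryAcquire`, and `release` are hypothetical names, and the counters live in memory.

```typescript
// Hypothetical limiter state: a cap plus running counts per tenant.
type Limiter = { limit: number; running: Record<string, number> };

function createLimiter(limit: number): Limiter {
  return { limit, running: {} };
}

// Reserve a slot for the tenant if it is under its cap.
function tryAcquire(limiter: Limiter, tenantId: string): boolean {
  const current = limiter.running[tenantId] ?? 0;
  if (current >= limiter.limit) return false;
  limiter.running[tenantId] = current + 1;
  return true;
}

// Free a slot when one of the tenant's runs finishes.
function release(limiter: Limiter, tenantId: string): void {
  const current = limiter.running[tenantId] ?? 0;
  limiter.running[tenantId] = Math.max(0, current - 1);
}
```

Because the count is keyed by tenant, one customer saturating its own cap does not consume slots that another customer's runs would use.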
Tradeoffs & Limitations
- Another service to adopt: You’re integrating a durable execution platform alongside your Next.js app. That’s deliberate, since you’re outsourcing the parts that are painful to build yourself:
  - Workers & queues
  - Retry semantics
  - Traces & replay tooling

  But it’s still a new primitive for your team to learn.
- Cloud dependency for managed features: For capabilities like instant Traces, Replay, and Bulk Cancellation at scale, you’ll use Inngest Cloud. The open-source SDKs remain transparent, but the operational surfaces live in the managed service.
Decision Trigger
Choose Inngest if you want long-running, reliable background jobs and workflows in your Next.js app on serverless, and you prioritize:
- Not hitting platform timeouts
- Having code-level durability via step.run()
- Built-in Traces, query, cancel, and replay
- Multi-tenant flow control without managing queues or workers
2. Custom queue + workers (Best for teams who want full infra control)
A custom queue + workers stack (e.g., SQS + Lambda workers, Redis + BullMQ on containers) is the strongest fit if you have a seasoned platform team and you’re willing to own the infrastructure.
What it does well
- Highly customizable architecture: You can tune everything:
  - Queue type (SQS, RabbitMQ, Redis)
  - Worker runtime (Node, Go, Python)
  - Retry strategy, visibility timeouts, DLQ rules

  For certain compliance or performance scenarios, this level of control is useful.
- Full control over scaling behavior: If you’re already deep into Kubernetes or ECS, adding workers gives you:
  - Custom autoscaling policies
  - Very fine-grained control over resource allocation
  - Tailored isolation models per workload
Tradeoffs & Limitations
- High infrastructure tax: You’ll own:
  - Designing and provisioning queues
  - Writing and deploying workers
  - Implementing retries and idempotency
  - Building DLQ processing and recovery flows
  - Crafting an internal UI or scripts to inspect and replay work

  This is exactly the “queue stack” that Inngest is designed to remove.
- Fragmented observability: You’ll likely end up with:
  - App logs in one system
  - Worker logs in another
  - Queue metrics elsewhere

  Debugging a long-running job becomes “stitch three dashboards and some trace IDs together,” which is painful during incidents.
- Not tailored to Next.js serverless: This stack lives adjacent to Next.js, not within it. It works, but you’re context-switching between frameworks and repos to fix end-to-end flows.
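For a sense of what “implementing retries and idempotency” involves in a hand-built worker, here is a small sketch of the decision a worker makes after each delivery. Every name in it (`QueueMessage`, `Outcome`, `backoffDelayMs`, `handleMessage`) is hypothetical, the defaults (1 second base, 60 second cap, 5 attempts) are arbitrary, and the `processed` record stands in for a durable dedupe table.

```typescript
// Exponential backoff without jitter: base * 2^(attempt - 1), capped.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60000): number {
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}

type QueueMessage = { id: string; attempts: number };
type Outcome =
  | { action: "ack" }
  | { action: "retry"; delayMs: number }
  | { action: "dead-letter" };

// Decide what to do with a message after one processing attempt.
function handleMessage(
  msg: QueueMessage,
  work: (m: QueueMessage) => void,
  processed: Record<string, boolean>,
  maxAttempts = 5
): Outcome {
  // Duplicate delivery of a message that already succeeded: skip the work.
  if (processed[msg.id]) return { action: "ack" };
  try {
    work(msg);
    processed[msg.id] = true;
    return { action: "ack" };
  } catch {
    const attempts = msg.attempts + 1;
    if (attempts >= maxAttempts) return { action: "dead-letter" };
    return { action: "retry", delayMs: backoffDelayMs(attempts) };
  }
}
```

Even this small version leaves open where `processed` is stored, who re-enqueues with the delay, and how dead-lettered messages get inspected, which is the maintenance cost described above.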
Decision Trigger
Choose custom queue + workers if:
- You have a platform team chartered to own this infra
- You require bespoke behavior that can’t be modeled as durable steps
- You’re comfortable building your own Traces, replay, and admin surfaces
If your primary question is “how do I run long-running background jobs in a Next.js app on serverless without hitting timeouts,” this is often overkill.
3. Vercel Cron + serverless functions (Best for simple, short-lived tasks)
Vercel Cron Jobs + serverless functions stand out for small applications that only need simple scheduled work, not multi-step long-running workflows.
What it does well
- Minimal configuration in Vercel-only stacks: If you’re already on Vercel:
  - Define a cron job in vercel.json or the UI
  - Point it to a Next.js API route or Route Handler
  - Implement the work directly in that handler

  For simple tasks like sending daily digests or running a short report, this is fast to ship.
- Unified deployment surface: Everything lives in your Next.js project and Vercel dashboard, with no extra services.
Tradeoffs & Limitations
- Still bound by serverless timeouts: Cron doesn’t change the underlying function timeout. If your job takes longer than the runtime limit, it will still be killed.
- No built-in durability or step-level retries: You’re responsible for:
  - Retrying failed work
  - Avoiding double-processing
  - Handling partial state

  Long-running or multi-step jobs quickly become fragile.
- Limited visibility and control: You get function logs, but no:
  - Step-level tracing
  - Ability to replay failed jobs at scale
  - Multi-tenant concurrency controls out of the box
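If a cron-triggered handler has to stay inside the runtime limit, one common workaround is to process only as much as fits in a time budget and hand back a cursor for the next invocation. The sketch below assumes that approach; `BatchResult` and `processWithinBudget` are hypothetical names, and storing the cursor between invocations is left to the caller.

```typescript
type BatchResult = { processed: number; nextCursor: number | null };

// Process items from startCursor until the time budget is spent.
// The clock is injectable so the behavior can be checked deterministically.
function processWithinBudget<T>(
  items: T[],
  startCursor: number,
  budgetMs: number,
  handle: (item: T) => void,
  now: () => number = Date.now
): BatchResult {
  const deadline = now() + budgetMs;
  let cursor = startCursor;
  while (cursor < items.length && now() < deadline) {
    const item = items[cursor] as T;
    handle(item);
    cursor += 1;
  }
  return {
    processed: cursor - startCursor,
    // null means everything was handled; otherwise resume from here.
    nextCursor: cursor < items.length ? cursor : null,
  };
}
```

This keeps each invocation short, but the deadline is only checked between items, and retries, double-processing, and cursor storage are still yours to handle, which is the fragility this section describes.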
Decision Trigger
Choose Vercel Cron + serverless functions if:
- Your jobs are short-lived and simple
- You’re okay with minimal durability guarantees
- You primarily need periodic tasks, not long-running multi-step workflows
Once you start asking about retries, replay, or multi-tenant fairness, it’s time to look at Inngest.
How to wire Inngest into a Next.js app for long-running jobs
Here’s how you can run long-running background jobs in a Next.js app on serverless without hitting timeouts using Inngest.
1. Install and set up the dev server
From your Next.js project:
npm install inngest
npx --ignore-scripts=false inngest-cli dev
The dev server gives you a local Inngest environment with instant feedback.
2. Create an Inngest client
src/inngest/client.ts:
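import { Inngest } from "inngest";

// The function definitions in this guide use the id-based API, so the client is identified by id as well.
export const inngest = new Inngest({ id: "my-nextjs-app" });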
3. Define a long-running function with steps
For example: processing a large CSV upload that could take minutes.
// src/inngest/process-upload.ts
import { inngest } from "./client";

export const processUpload = inngest.createFunction(
  { id: "process-upload" },
  { event: "app/upload.created" },
  async ({ event, step }) => {
    const file = await step.run("download-file", async () => {
      // potentially long-running download
    });

    const parsed = await step.run("parse-csv", async () => {
      // long-running parsing
    });

    await step.run("write-records", async () => {
      // fan out writes, external API calls, etc.
    });

    await step.run("notify-user", async () => {
      // email, in-app notification, etc.
    });
  }
);
Each step.run:
- Retries on failure
- Runs once on success
- Checkpoints so you never re-run already-successful steps
For Inngest to invoke processUpload, the app also has to expose it through the SDK’s serve handler (in the App Router, typically an app/api/inngest/route.ts that exports the handlers returned by serve from inngest/next); that wiring is not shown here.
4. Trigger the workflow from Next.js without blocking HTTP
In a Route Handler (Next.js App Router example):
// app/api/uploads/route.ts
import { NextResponse } from "next/server";
import { inngest } from "@/inngest/client";

export async function POST(req: Request) {
  const body = await req.json();

  // Send the event and return. The request waits only for the event
  // to be accepted, not for the job itself to run.
  await inngest.send({
    name: "app/upload.created",
    data: {
      uploadId: body.uploadId,
      userId: body.userId,
    },
  });

  return NextResponse.json({ status: "queued" });
}
The HTTP request is short-lived; Inngest handles the long-running processing.
5. Monitor and replay with Traces
In Inngest Cloud:
- Watch real-time Traces for process-upload runs.
- Inspect each step’s input/output and logs.
- If something fails:
  - Fix the bug
  - Click Replay on the failed run
  - The workflow resumes from the last successful step, not from scratch
No DLQ scripting. No hand-rolled admin UI. Just query, cancel, or replay.
Final Verdict
If you’re asking how to run long-running background jobs in a Next.js app on serverless without hitting timeouts, the core problem isn’t “how do I stretch Vercel timeouts”—it’s “how do I get durable execution without rebuilding a queue stack?”
- Inngest is the best overall answer:
  - Durable steps with automatic retries and checkpointing
  - No workers or queues to manage
  - Traces, structured logs, and replay out of the box
  - Multi-tenant flow control so one tenant can’t take down everyone else
- Custom queue + workers makes sense if you explicitly want to own the infra and you’re ready to pay the long-term maintenance cost.
- Vercel Cron + functions works for simple, short-lived tasks but won’t save you from timeouts or partial failures in complex workflows.
Durability belongs in your code, not bolted on as an afterthought. step.run() and done.