How do I run long-running background jobs in a Next.js app on serverless without hitting timeouts?


Most teams hit this problem the same way: you ship a Next.js feature on serverless, it works in staging, then production traffic arrives and your “background” work starts timing out. File processing, AI workflows, sync jobs—anything that takes longer than your platform’s HTTP limit (often 30–60 seconds) becomes brittle fast.

As someone who’s maintained Lambda and Kubernetes stacks with homegrown queues, my stance is blunt: you shouldn’t be rebuilding workers, retries, and dead-letter queues just to run long-running jobs. You want durability expressed in code, with first-class replay and visibility. That’s exactly where Inngest fits.

This guide walks through how to run long-running background jobs in a Next.js app on serverless without hitting timeouts—then shows you how to do it with Inngest using a few lines of code.


Why serverless Next.js hits timeouts for long-running jobs

Serverless runtimes (Vercel Functions, Netlify Functions, AWS Lambda behind API Gateway, etc.) are optimized for short-lived HTTP work:

  • Hard HTTP timeouts
    Typically 15–60 seconds. After that, the platform kills the request—no matter what your code is doing.

  • No built-in durability
    If a request dies midway through a multi-step flow, you’re left with partial state. You handle retries yourself, often with idempotency keys and custom tables.

  • No native queueing
    To “do it right,” teams bolt on SQS, BullMQ, Redis, or Cloud Tasks, and build:

    • Workers
    • Retry logic
    • Dead-letter queues
    • Recovery tooling & admin panels
  • Operational blind spots
    When something fails, you’re log-grepping across Next.js logs, queue logs, and maybe a database just to answer: “Did this job run? Where did it fail?”

For long-running background jobs, you need to break out of the HTTP request lifecycle and move to durable execution.


The anti-patterns: what not to do

If you’re searching “how do I run long-running background jobs in a Next.js app on serverless without hitting timeouts,” you’ve probably seen these options:

1. Keep the HTTP request open “until it’s done”

  • Client calls an API route or Route Handler.
  • You do all the work in that handler.
  • You hope it completes before the platform timeout.

Result: timeouts, partial state, terrible UX, and no resilience.

2. Polling with ad-hoc “job status” tables

  • API route kicks off work in the same process and returns a “job id.”
  • Frontend polls another endpoint for status.
  • You hand-roll a “jobs” table in your database.

You still hit timeouts on the backend, and now you’ve layered on custom state management.

3. Build your own job runner stack

  • Add a queue (SQS, Redis, Cloud Tasks, etc.).
  • Build workers (Node/Go/Python, deployed on EC2, containers, or serverless).
  • Implement:
    • Backoff & retries
    • Idempotency
    • Concurrency limits
    • Dead-letter queue processing UI

This works, but you’ve just taken on infrastructure you’ll maintain for years.

There’s a better pattern: durable, event-driven functions where each unit of work is a named step that can retry, resume, and be inspected.


The pattern that actually works: event-driven, durable execution

To run long-running background jobs in a Next.js app on serverless without hitting timeouts, you want three things:

  1. Decouple HTTP from execution
    HTTP requests just enqueue an event or trigger a function; the heavy work runs independently of any single request.

  2. Durable steps with automatic retries
    Each significant operation (call an external API, write to DB, process a file, run an AI model) is a named step:

    • Retries automatically on failure
    • Runs once on success
    • Checkpoints so the workflow resumes from the last successful step instead of starting over
  3. First-class observability & control
    You can:

    • See every run’s steps, inputs, and outputs
    • Query runs by tenant, workflow, status
    • Cancel or replay runs—without building your own admin UI

This is precisely what Inngest gives you for Next.js and other runtimes.
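To make the checkpointing idea concrete, here is a toy sketch in TypeScript. It is not Inngest's implementation: `makeStep`, `StepStore`, and the in-memory `Map` are hypothetical stand-ins for a durable store, and the failure is simulated with a flag. It only shows why a re-run can skip steps that already succeeded.

```typescript
// Toy model of checkpointed steps. NOT Inngest's implementation:
// StepStore, makeStep, and workflow are hypothetical names for illustration.
type StepStore = Map<string, unknown>;

function makeStep(store: StepStore) {
  return {
    // Run `fn` once per step id; later calls return the saved result.
    async run<T>(id: string, fn: () => Promise<T>): Promise<T> {
      if (store.has(id)) {
        return store.get(id) as T; // checkpoint hit: skip the work
      }
      const result = await fn();
      store.set(id, result); // checkpoint only after success
      return result;
    },
  };
}

// Counts how often each step body actually executes.
const executions = { download: 0, transcode: 0 };
// Simulated outage: flip to false to let the transcode step succeed.
let transcodeShouldFail = true;

async function workflow(store: StepStore): Promise<string> {
  const step = makeStep(store);
  const file = await step.run("download", async () => {
    executions.download += 1;
    return "raw-bytes";
  });
  return step.run("transcode", async () => {
    executions.transcode += 1;
    if (transcodeShouldFail) throw new Error("transcoder unavailable");
    return `${file}:transcoded`;
  });
}
```

Calling `workflow` twice against the same store, once with the flag set and once with it cleared, runs the download body once and the transcode body twice. That is the resume-from-last-success behavior described above, in miniature.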


How Inngest solves long-running jobs for Next.js on serverless

Inngest is an event-driven durable execution platform that plugs directly into your Next.js app:

  • Infraless – No queue/worker stack to maintain. You don’t manage SQS, Redis, or cron.
  • Agnostic – Triggers from API calls, webhooks, schedules. Runs on edge, serverless, or traditional runtimes.
  • Observable – Real-time Traces with step-level inputs/outputs, structured logs, and actions to query, cancel, or replay runs.

You write code like:

import { inngest } from "@/inngest/client";

export const processVideo = inngest.createFunction(
  { id: "process-video" },
  { event: "app/video.uploaded" },
  async ({ event, step }) => {
    const file = await step.run("download-file", async () => {
      // long-running file download
    });

    const transcoded = await step.run("transcode", async () => {
      // long-running transcoding
    });

    await step.run("notify-user", async () => {
      // send email, push, etc.
    });
  }
);

Behind the scenes:

  • Each step.run() is a separately tracked unit of work: it either completes and has its result saved, or it fails and is retried on its own.
  • If transcode fails, only that step retries.
  • When it succeeds, Inngest checkpoints progress and moves on. No double-processing download-file.

You’re not keeping an HTTP request open. You’re not fighting Next.js function timeouts. You’re describing the workflow in code, and Inngest handles the durability.
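For intuition about "only that step retries," the sketch below wraps a single step in a retry loop with exponential backoff. `retryStep` is a hypothetical helper, not part of the Inngest SDK, and the attempt count and delays are assumptions chosen for illustration rather than Inngest's actual retry schedule.

```typescript
// Hypothetical helper, not part of the Inngest SDK: retry one step with
// exponential backoff. `sleep` is injectable so the sketch is easy to test.
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

async function retryStep<T>(
  fn: (attempt: number) => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500,
  sleep: Sleep = realSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn(attempt); // success ends the loop immediately
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Delay doubles each time: base, 2x base, 4x base, ...
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError; // every attempt failed
}
```

Only the wrapped function is re-invoked. Anything that ran before it, such as an earlier download step, is untouched, which is the property that makes step-level retries cheaper than restarting the whole job.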


At-a-glance: top options for long-running background jobs in Next.js

When teams ask how to run long-running background jobs in a Next.js app on serverless without hitting timeouts, they usually converge on three options.

Quick Answer: The best overall choice for durable, long-running background jobs in a Next.js app on serverless is Inngest. If your priority is full control over infra and you’re willing to manage workers yourself, a custom queue + workers (e.g., SQS with Lambda workers, or Redis with BullMQ) is often a stronger fit. For smaller apps that just need simple, short-lived jobs, consider Vercel Cron Jobs + short serverless functions.

At-a-Glance Comparison

| Rank | Option | Best For | Primary Strength | Watch Out For |
|------|--------|----------|------------------|---------------|
| 1 | Inngest | Production-grade long-running jobs & workflows | Durable steps with automatic retries and Traces | Additional managed service to adopt |
| 2 | Custom queue + workers (SQS/Redis + BullMQ, etc.) | Teams that want full infra control | Highly customizable execution model | Significant infra tax: workers, queues, DLQs, tooling |
| 3 | Vercel Cron + serverless functions | Simple, periodic short jobs | Minimal setup in Vercel-only stacks | Still bound by function timeouts and limited durability |

Comparison Criteria

We evaluated each option using three practical criteria for long-running background jobs in Next.js on serverless:

  • Durability & retries:
    Can long-running jobs survive timeouts, network errors, and partial failures without manual plumbing?

  • Operational overhead:
    How much infrastructure do you need to provision and maintain (queues, workers, cron, observability, admin UIs)?

  • Observability & control:
    How easy is it to see what ran, why it failed, and then query, cancel, or replay jobs?


1. Inngest (Best overall for durable, long-running jobs)

Inngest ranks as the top choice because it gives you code-level durability, built-in retries, and visibility for long-running background jobs without forcing you to build or maintain queue infrastructure.

What it does well

  • Durable steps & automatic retries
    Each step.run() is a named unit of work:

    • Retries on failure with backoff
    • Runs once on success
    • Checkpoints so your workflow resumes from the last successful step

    This is exactly what you need for multi-step flows like:

    • AI agents calling multiple tools and models
    • Long-running file processing pipelines
    • Bi-directional sync across SaaS APIs
  • Infraless long-running jobs on serverless
    Inngest decouples Next.js HTTP handlers from execution:

    • Trigger functions from API calls, webhooks, or schedules
    • Let a workflow outlast your platform’s HTTP timeout, because each step runs as its own short invocation rather than inside one long request
    • Keep your app stateless while your workflows can run for minutes or hours
  • Traces, structured logs, and replay out of the box
    Inngest Cloud gives you:

    • Real-time Traces of every run, step-by-step
    • Structured logs attached to each step
    • The ability to query runs (e.g., by tenant, event, status)
    • Replay and Bulk Cancellation so you can recover from incidents without building internal admin tools

    For AI workflows, you also get visibility into every prompt/response pair, which is invaluable when long-running jobs involve multiple LLM calls.

  • Multi-tenant flow control
    Noisy-neighbor behavior is a classic multi-tenant failure mode: one customer’s long-running jobs starve everyone else. Inngest ships with:

    • Concurrency keys – cap concurrent runs per tenant, user, or resource
    • Throttling & prioritization – control how work is distributed
    • Batching – group events into batches to smooth spikes

    You get queue-like flow control as a product feature, not an infrastructure project.

  • DX-first setup for Next.js
    Local dev is a one-command dev server:

    npx --ignore-scripts=false inngest-cli dev
    

    Connect it to Next.js, define functions with inngest.createFunction(...), and iterate with instant feedback.
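Returning to the concurrency keys mentioned under multi-tenant flow control: the idea is a limit that applies per key (tenant, user, resource) rather than globally. The sketch below is a hypothetical in-memory `KeyedLimiter` written only to illustrate that idea. Inngest enforces its limits as part of the service, and nothing here reflects its internals.

```typescript
// Hypothetical in-memory limiter illustrating a per-key concurrency cap.
class KeyedLimiter {
  private readonly limit: number;
  private readonly active = new Map<string, number>();
  private readonly waiting = new Map<string, Array<() => void>>();

  constructor(limit: number) {
    this.limit = limit;
  }

  // Run `task` once a slot for `key` is free; other keys are unaffected.
  async run<T>(key: string, task: () => Promise<T>): Promise<T> {
    await this.acquire(key);
    try {
      return await task();
    } finally {
      this.release(key);
    }
  }

  private acquire(key: string): Promise<void> {
    const current = this.active.get(key) ?? 0;
    if (current < this.limit) {
      this.active.set(key, current + 1);
      return Promise.resolve();
    }
    // At the limit: park until a running task for this key finishes.
    return new Promise<void>((resolve) => {
      const queue = this.waiting.get(key) ?? [];
      queue.push(() => resolve());
      this.waiting.set(key, queue);
    });
  }

  private release(key: string): void {
    const next = this.waiting.get(key)?.shift();
    if (next) {
      next(); // hand the slot directly to the next waiter
      return;
    }
    this.active.set(key, (this.active.get(key) ?? 1) - 1);
  }
}
```

With `new KeyedLimiter(2)`, five simultaneous calls to `limiter.run("tenant-a", task)` never have more than two tasks in flight, while calls keyed by a different tenant proceed independently. That independence is what prevents one tenant from starving the rest.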

Tradeoffs & Limitations

  • Another service to adopt
    You’re integrating a durable execution platform alongside your Next.js app. That’s deliberate—you’re outsourcing the parts that are painful to build yourself:

    • Workers & queues
    • Retry semantics
    • Traces & replay tooling

    But it’s still a new primitive for your team to learn.

  • Cloud dependency for managed features
    For capabilities like instant Traces, Replay, and Bulk Cancellation at scale, you’ll use Inngest Cloud. The open-source SDKs remain transparent, but the operational surfaces live in the managed service.

Decision Trigger

Choose Inngest if you want long-running, reliable background jobs and workflows in your Next.js app on serverless, and you prioritize:

  • Not hitting platform timeouts
  • Having code-level durability via step.run()
  • Built-in Traces, query, cancel, and replay
  • Multi-tenant flow control without managing queues or workers

2. Custom queue + workers (Best for teams who want full infra control)

A custom queue + workers stack (e.g., SQS + Lambda workers, Redis + BullMQ on containers) is the strongest fit if you have a seasoned platform team and you’re willing to own the infrastructure.

What it does well

  • Highly customizable architecture
    You can tune everything:

    • Queue type (SQS, RabbitMQ, Redis)
    • Worker runtime (Node, Go, Python)
    • Retry strategy, visibility timeouts, DLQ rules

    For certain compliance or performance scenarios, this level of control is useful.

  • Full control over scaling behavior
    If you’re already deep into Kubernetes or ECS, adding workers gives you:

    • Custom autoscaling policies
    • Very fine-grained control over resource allocation
    • Tailored isolation models per workload

Tradeoffs & Limitations

  • High infrastructure tax
    You’ll own:

    • Designing and provisioning queues
    • Writing and deploying workers
    • Implementing retries and idempotency
    • Building DLQ processing and recovery flows
    • Crafting an internal UI or scripts to inspect and replay work

    This is exactly the “queue stack” that Inngest is designed to remove.
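To give a sense of what "implementing retries" and "building DLQ processing" mean in practice, here is the core of a worker loop in miniature. Everything in it is a hypothetical stand-in: plain arrays play the role of the queue and dead-letter queue, and `drainQueue` and `Job` are illustrative names. A real worker would also need visibility timeouts, backoff, persistence, and concurrency handling.

```typescript
// Hypothetical in-memory stand-ins for a real queue and its dead-letter queue.
interface Job {
  id: string;
  payload: string;
  attempts: number;
}

function drainQueue(
  queue: Job[],
  handler: (job: Job) => void,
  maxAttempts = 3,
): { done: string[]; deadLetter: Job[] } {
  const done: string[] = [];
  const deadLetter: Job[] = [];
  while (queue.length > 0) {
    const job = queue.shift() as Job;
    job.attempts += 1;
    try {
      handler(job);
      done.push(job.id);
    } catch {
      if (job.attempts >= maxAttempts) {
        deadLetter.push(job); // give up: park the job for manual inspection
      } else {
        queue.push(job); // re-enqueue for another attempt
      }
    }
  }
  return { done, deadLetter };
}
```

Even this toy version leaves open questions the list above hints at: who inspects the dead-letter jobs, how they get replayed, and how the handler avoids repeating side effects on the second and third attempts.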

  • Fragmented observability
    You’ll likely end up with:

    • App logs in one system
    • Worker logs in another
    • Queue metrics elsewhere

    Debugging a long-running job becomes “stitch three dashboards and some trace IDs together,” which is painful during incidents.

  • Not tailored to Next.js serverless
    This stack lives adjacent to Next.js, not within it. It works, but you’re context-switching between frameworks and repos to fix end-to-end flows.

Decision Trigger

Choose custom queue + workers if:

  • You have a platform team chartered to own this infra
  • You require bespoke behavior that can’t be modeled as durable steps
  • You’re comfortable building your own Traces, replay, and admin surfaces

If your primary question is “how do I run long-running background jobs in a Next.js app on serverless without hitting timeouts,” this is often overkill.


3. Vercel Cron + serverless functions (Best for simple, short-lived tasks)

Vercel Cron Jobs + serverless functions stand out for small applications that only need simple scheduled work, not multi-step long-running workflows.

What it does well

  • Minimal configuration in Vercel-only stacks
    If you’re already on Vercel:

    • Define a cron job in vercel.json or the UI
    • Point it to a Next.js API route or Route Handler
    • Implement the work directly in that handler

    For simple tasks like sending daily digests or running a short report, this is fast to ship.

  • Unified deployment surface
    Everything lives in your Next.js project and Vercel dashboard—no extra services.

Tradeoffs & Limitations

  • Still bound by serverless timeouts
    Cron doesn’t change the underlying function timeout. If your job takes longer than the runtime limit, it will still be killed.

  • No built-in durability or step-level retries
    You’re responsible for:

    • Retrying failed work
    • Avoiding double-processing
    • Handling partial state

    Long-running or multi-step jobs quickly become fragile.

  • Limited visibility and control
    You get function logs, but no:

    • Step-level tracing
    • Ability to replay failed jobs at scale
    • Multi-tenant concurrency controls out of the box
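Because a cron-triggered function is still bound by the function timeout, a common manual workaround is to process work in slices and carry a cursor between invocations. The sketch below shows that pattern under stated assumptions: `processSlice` and `SliceResult` are hypothetical names, the clock is injectable for testing, and persisting the cursor between cron runs is left to the caller.

```typescript
// Hypothetical sketch: process as many items as fit in a time budget and
// return a cursor so the next cron invocation can continue from there.
interface SliceResult {
  nextCursor: number; // index of the first unprocessed item
  finished: boolean; // true once every item has been handled
}

function processSlice<T>(
  items: readonly T[],
  cursor: number,
  budgetMs: number,
  handle: (item: T) => void,
  now: () => number = Date.now,
): SliceResult {
  const deadline = now() + budgetMs;
  let i = cursor;
  // Stop when the work is done or the budget is used up, whichever is first.
  while (i < items.length && now() < deadline) {
    handle(items[i] as T);
    i += 1;
  }
  return { nextCursor: i, finished: i >= items.length };
}
```

This keeps each invocation under the limit, but the cursor storage, the handling of a crash between `handle` and saving the cursor, and the retry policy are all still yours to build. That is the partial-state burden listed above.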

Decision Trigger

Choose Vercel Cron + serverless functions if:

  • Your jobs are short-lived and simple
  • You’re okay with minimal durability guarantees
  • You primarily need periodic tasks, not long-running multi-step workflows

Once you start asking about retries, replay, or multi-tenant fairness, it’s time to look at Inngest.


How to wire Inngest into a Next.js app for long-running jobs

Here’s how you can run long-running background jobs in a Next.js app on serverless without hitting timeouts using Inngest.

1. Install and set up the dev server

From your Next.js project:

npm install inngest
npx --ignore-scripts=false inngest-cli dev

The dev server gives you a local Inngest environment with instant feedback.

2. Create an Inngest client

src/inngest/client.ts:

import { Inngest } from "inngest";

// The client is identified by `id`, matching the `{ id: ... }` style used for functions below.
export const inngest = new Inngest({ id: "my-nextjs-app" });

3. Define a long-running function with steps

For example: processing a large CSV upload that could take minutes.

// src/inngest/process-upload.ts
import { inngest } from "./client";

export const processUpload = inngest.createFunction(
  { id: "process-upload" },
  { event: "app/upload.created" },
  async ({ event, step }) => {
    const file = await step.run("download-file", async () => {
      // potentially long-running download
    });

    const parsed = await step.run("parse-csv", async () => {
      // long-running parsing
    });

    await step.run("write-records", async () => {
      // fan out writes, external API calls, etc.
    });

    await step.run("notify-user", async () => {
      // email, in-app notification, etc.
    });
  }
);

Each step.run:

  • Retries on failure
  • Runs once on success
  • Checkpoints so you never re-run already-successful steps

4. Trigger the workflow from Next.js without blocking HTTP

In a Route Handler (Next.js App Router example):

// app/api/uploads/route.ts
import { NextResponse } from "next/server";
import { inngest } from "@/inngest/client";

export async function POST(req: Request) {
  const body = await req.json();

  // Enqueue the event. The await covers only the send call, so the response returns without waiting for the job to run.
  await inngest.send({
    name: "app/upload.created",
    data: {
      uploadId: body.uploadId,
      userId: body.userId,
    },
  });

  return NextResponse.json({ status: "queued" });
}

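The HTTP request is short-lived; Inngest handles the long-running processing. Note that Inngest can only invoke processUpload once the app exposes its functions over HTTP, typically by registering them with the serve helper from inngest/next in an app/api/inngest/route.ts Route Handler.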

5. Monitor and replay with Traces

In Inngest Cloud:

  • Watch real-time Traces for process-upload runs.
  • Inspect each step’s input/output and logs.
  • If something fails:
    • Fix the bug
    • Click Replay on the failed run
    • The workflow resumes from the last successful step, not from scratch

No DLQ scripting. No hand-rolled admin UI. Just query, cancel, or replay.


Final Verdict

If you’re asking how to run long-running background jobs in a Next.js app on serverless without hitting timeouts, the core problem isn’t “how do I stretch Vercel timeouts”—it’s “how do I get durable execution without rebuilding a queue stack?”

  • Inngest is the best overall answer:

    • Durable steps with automatic retries and checkpointing
    • No workers or queues to manage
    • Traces, structured logs, and replay out of the box
    • Multi-tenant flow control so one tenant can’t take down everyone else
  • Custom queue + workers makes sense if you explicitly want to own the infra and you’re ready to pay the long-term maintenance cost.

  • Vercel Cron + functions works for simple, short-lived tasks but won’t save you from timeouts or partial failures in complex workflows.

Durability belongs in your code, not bolted on as an afterthought. step.run() and done.

