Cloudflare Workers background jobs: how do I handle long-running tasks and retries without a worker fleet?

Most teams hit the same wall with Cloudflare Workers: handling background jobs, long-running tasks, and reliable retries without standing up yet another worker fleet, queue, or cron layer. You want to keep everything on the edge, but you also need durability, ordering, and observability that Workers by themselves don’t give you.

As someone who’s run multi-tenant workloads on both serverless and Kubernetes, I’ll walk through how to handle Cloudflare Workers background jobs cleanly—and where an “infraless” execution layer like Inngest fits in so you don’t end up rebuilding a queue stack from scratch.


Quick Answer: The best overall choice for durable background jobs on Cloudflare Workers—without managing a worker fleet—is Inngest with Durable Functions + Steps.
If your priority is staying 100% on Cloudflare primitives, Durable Objects + Queues + Alarms is often a stronger fit.
For simple, low-volume tasks with best-effort retries, consider KV / D1 + cron-triggered Workers.


At-a-Glance Comparison

| Rank | Option | Best For | Primary Strength | Watch Out For |
| --- | --- | --- | --- | --- |
| 1 | Inngest Durable Functions on Cloudflare | Teams who want durable workflows, retries, and observability without building infra | Code-level durability (step.run()), instant Traces, replay, no worker fleet | Adds an external execution platform (Inngest) alongside your Workers |
| 2 | Durable Objects + Queues + Alarms | Cloudflare-only shops with strong infra skills | Native Workers primitives, full control over scheduling & queues | You own retries, backoff, dead-letter strategy, and observability |
| 3 | KV/D1 + Cron Workers | Simple periodic or low-volume async work | Easiest to start with; zero extra components | Not suited for heavy, multi-step, or multi-tenant workflows; fragile retries |

Comparison Criteria

We evaluated each approach to Cloudflare Workers background jobs against:

  • Durability & Retries:
    Can long-running tasks survive timeouts, deploys, and partial failures—and resume from the last successful step instead of starting over?

  • Operational Complexity:
    How much “queue stack” are you rebuilding—workers, queues, retry logic, dead-letter queues, instrumentation—versus focusing on business logic?

  • Observability & Control:
    Can you see each run, its step-level inputs/outputs, and then query, cancel, or replay runs without writing custom admin tools?


Detailed Breakdown

1. Inngest Durable Functions on Cloudflare (Best overall for production-grade reliability)

Inngest ranks as the top choice because it gives you code-level durability and retries on Cloudflare Workers—without a worker fleet, custom queues, or hand-rolled observability.

You keep writing TypeScript in your Cloudflare Workers environment; Inngest turns that into durable, step-based workflows that can run beyond HTTP timeouts, resume from checkpoints, and be replayed from the UI.

What it does well:

  • Code-level durability with step.run():
    You model your long-running tasks as Steps, and Inngest handles retries and checkpointing.

    import { inngest } from "./client";
    
    export const processReport = inngest.createFunction(
      { id: "process-report" },
      { event: "app/report.created" },
      async ({ event, step }) => {
        const data = await step.run("fetch-data", async () => {
          // call your APIs or databases
          return await fetch(event.data.url).then((r) => r.json());
        });
    
        const result = await step.run("generate-report", async () => {
          // CPU / IO heavy work, can take minutes
          return renderPdf(data);
        });
    
        await step.run("notify-user", async () => {
          await sendEmail(event.data.userId, result.url);
        });
    
        return { ok: true };
      }
    );
    

    Mechanism → outcome:

    • Each step.run() is a named unit of work.
    • If generate-report fails, Inngest retries it with backoff.
    • Once a step succeeds, its result is checkpointed.
    • A later failure in notify-user won’t re-run the previous heavy steps; execution resumes from the failed step.
  • Infraless: no queues, workers, or cron stack to manage:
    You don’t spin up a worker fleet, stitch queues to cron, or configure dead-letter queues. Inngest provides the execution layer; your Cloudflare Worker just emits events or calls an Inngest Durable Endpoint.

    Example: trigger background work from a Worker handler:

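    // inside a Cloudflare Worker fetch handler; `url` and `userId` come from the request
    const res = await fetch(env.INNGEST_EVENT_URL, {
      method: "POST",
      body: JSON.stringify({
        name: "app/report.created",
        data: { url, userId },
      }),
      headers: { "Content-Type": "application/json" },
    });

    // Only report success if the event was actually accepted
    if (!res.ok) {
      return new Response("Failed to queue", { status: 502 });
    }

    return new Response("Queued", { status: 202 });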
    
  • Long-running, multi-tenant workloads with flow control:
    Multi-tenant systems are where queue stacks get gnarly—free-tier users blocking paying customers, noisy neighbors saturating concurrency, and manual sharding across queues.
    Inngest bakes in multi-tenant flow control:

    • Concurrency keys (e.g., per-tenant or per-resource)
    • Throttling and prioritization
    • Batching

    You express intent (e.g., “only 1 sync per space at a time, paying customers first”) and Inngest enforces it, without the 15 queues and manual distribution that GitBook needed on Google Cloud Tasks.

  • Observable by default: Traces, structured logs, replay:
    Every run is traceable without you wiring up logging, trace IDs, or custom UIs.

    From Inngest Cloud you can:

    • Inspect step inputs/outputs and structured logs.
    • See real-time traces—including every tool or model call for AI-heavy flows.
    • Query runs by event data (e.g., spaceId, tenantId).
    • Replay a single run or bulk replay thousands after a bugfix.
    • Cancel in-flight runs if something goes wrong.

    That’s the difference between “grep logs and reconstruct state” and “click Replay.”

  • Agnostic execution: Works with Cloudflare & beyond:
    You can trigger Inngest from:

    • Cloudflare Workers (API calls, webhooks, Durable Endpoints)
    • Other runtimes (Node, serverless, Kubernetes)
    • Schedules (cron-like, defined in code)

    And you can deploy your Inngest functions to:

    • Edge
    • Serverless platforms
    • Traditional container environments

    But the primitives stay the same: inngest.createFunction(), step.run(), Traces, Replay.
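Flow control is configured on that same createFunction() primitive. As a rough sketch of the "one sync per space, paying customers first" intent mentioned earlier, the options object below uses the concurrency and priority settings as I understand them from the Inngest TypeScript SDK. The `spaceId` and `plan` event fields are hypothetical, and the priority values are illustrative, so check the field names against the current docs before relying on them.

```typescript
// Options you would pass as the first argument to inngest.createFunction().
// Shown as a plain object so the shape can be read on its own.
// Assumption: events carry hypothetical `spaceId` and `plan` fields.
const syncSpaceOptions = {
  id: "sync-space",
  // At most one run at a time for each distinct spaceId.
  concurrency: { limit: 1, key: "event.data.spaceId" },
  // Expression evaluated per run; a higher value lets the run
  // move ahead of older queued runs (values here are illustrative).
  priority: { run: "event.data.plan == 'paid' ? 120 : 0" },
};
```

The point of the sketch is that tenant isolation becomes a few lines of configuration instead of a set of per-tenant queues.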

Tradeoffs & Limitations:

  • External execution platform alongside Workers:
    You’re adding Inngest as a dedicated durable execution layer, not “pure Cloudflare.” For most teams, the trade is worth it: less infrastructure tax, more visibility. But it’s still another system in your architecture.

  • Learning the Steps model:
    You have to think in Steps instead of “just a function.” In practice, mapping long-running work to named steps is what saves you during failure and replay.

Decision Trigger:
Choose Inngest on Cloudflare if you want durable background jobs and workflows—automatic retries, checkpointing, multi-tenant controls, and instant Traces—without building a worker fleet, queues, and custom admin UIs.


2. Cloudflare Durable Objects + Queues + Alarms (Best for Cloudflare-only, infra-comfortable teams)

This stack is the strongest Cloudflare-only fit because it uses platform-native primitives to simulate background jobs and scheduling—but you’re responsible for most of the reliability and observability story.

What it does well:

  • Stateful coordination via Durable Objects:
    Durable Objects give you:

    • Single-threaded, stateful “actors” keyed by ID
    • The ability to serialize access per tenant or resource
    • A natural home for per-tenant queues and progress tracking

    You can have each tenant’s Durable Object maintain:

    • A queue of pending jobs
    • Current progress (e.g., which step is running)
    • Retry counters / backoff data
  • Event ingestion via Queues:
    Cloudflare Queues can feed work into your Durable Objects or Workers:

    • Producers enqueue messages.
    • Consumer Workers receive batches of messages and dispatch them to Durable Objects.
    • You can fan-in from many producers.

    This works well for:

    • Ingesting webhook events
    • Offloading heavy processing from edge entrypoints
    • Smoothing spikes across consumers
  • Scheduling via Alarms:
    Durable Object alarms act as timers (each object can have one alarm scheduled at a time):

    • Re-check the queue every X seconds/minutes
    • Trigger retries after a delay
    • Perform periodic maintenance / cleanup

    You effectively build your own cron runner, tied to per-object state.
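For the Queues piece of this stack, a consumer might look roughly like the sketch below. The local `QueueMessage` and `QueueBatch` interfaces mirror only the parts of the Cloudflare Queues consumer API used here (`attempts`, `ack()`, `retry({ delaySeconds })`); `consumeBatch`, `MAX_ATTEMPTS`, and the `onDeadLetter` callback are hypothetical names of mine, and the delay formula is just an illustration.

```typescript
// Minimal local types standing in for the Queues consumer message shape.
interface QueueMessage<T> {
  readonly id: string;
  readonly body: T;
  readonly attempts: number; // delivery attempts so far, starting at 1
  ack(): void;
  retry(options?: { delaySeconds?: number }): void;
}

interface QueueBatch<T> {
  readonly messages: readonly QueueMessage<T>[];
}

// Hypothetical cap chosen for this sketch.
const MAX_ATTEMPTS = 5;

// Handles a batch message by message so one failure does not
// force the whole batch to be redelivered.
async function consumeBatch<T>(
  batch: QueueBatch<T>,
  handle: (body: T) => Promise<void>,
  onDeadLetter: (msg: QueueMessage<T>) => Promise<void>,
): Promise<void> {
  for (const msg of batch.messages) {
    try {
      await handle(msg.body);
      msg.ack();
    } catch {
      if (msg.attempts >= MAX_ATTEMPTS) {
        // Out of attempts: record it somewhere durable, then stop redelivery.
        await onDeadLetter(msg);
        msg.ack();
      } else {
        // Ask for redelivery later; the delay grows with each attempt.
        msg.retry({ delaySeconds: 30 * msg.attempts });
      }
    }
  }
}
```

In a Worker, this would be called from the `queue(batch, env)` handler, with `handle` forwarding each message to the right Durable Object.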

Tradeoffs & Limitations:

  • You own retries, backoff, and idempotency:
    You’ll be writing logic like:

    • Max retry count per job
    • Exponential backoff or jitter
    • Idempotency checks (e.g., “have we processed this step already?”)
    • Handling partial failures and resuming work

    That’s exactly the “worker / retry / DLQ” tax most teams underestimate.

  • No built-in step-level checkpointing:
    You can persist checkpoints in Durable Object storage, but you have to define:

    • Step metadata
    • Transition rules
    • Resume logic

    In practice, you end up re-creating what step.run() gives you out of the box in a system like Inngest.

  • Limited out-of-the-box observability:
    You’ll lean on:

    • Logs (and maybe Logpush) for debugging
    • Custom trace IDs that you pass around
    • Your own UI or dashboards to inspect per-job progress

    This is where production incidents get painful—reconstructing multi-step runs across logs and messages.
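To give a sense of what hand-rolled step checkpointing involves, here is a minimal sketch. The `CheckpointStore` interface is an assumption made so the example stands alone (Durable Object storage offers async get/put along similar lines), and `runStep` is a hypothetical helper, not an API from Cloudflare or Inngest. It only covers skipping completed steps; retries, timeouts, and visibility would still be yours to build.

```typescript
// Assumed storage shape for this sketch.
interface CheckpointStore {
  get(key: string): Promise<unknown>;
  put(key: string, value: unknown): Promise<void>;
}

// Runs `fn` at most once per (runId, stepName) that completes successfully.
// If a result was already saved, it is returned without calling `fn` again.
async function runStep<T>(
  store: CheckpointStore,
  runId: string,
  stepName: string,
  fn: () => Promise<T>,
): Promise<T> {
  const key = `${runId}:${stepName}`;
  const saved = await store.get(key);
  if (saved !== undefined) {
    return (saved as { value: T }).value;
  }
  const value = await fn();
  // Wrap the value so a step that returns undefined still counts as done.
  await store.put(key, { value });
  return value;
}
```

Note the gap between `fn()` finishing and `put()` succeeding: a crash there re-runs the step, so each step still needs to be idempotent.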

Decision Trigger:
Choose Durable Objects + Queues + Alarms if you’re committed to staying 100% on Cloudflare primitives, have the time and expertise to own the queue stack, and are okay with building your own durability and observability patterns.
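One small piece of the retry logic you would own in this setup is the backoff schedule. The sketch below shows exponential backoff with full jitter; `backoffDelayMs`, `shouldRetry`, and the default values are hypothetical choices for illustration, not anything prescribed by Cloudflare.

```typescript
// Exponential backoff with "full jitter": the delay is a random value
// between 0 and min(capMs, baseMs * 2^attempt).
function backoffDelayMs(
  attempt: number, // 0 for the first retry, 1 for the second, and so on
  baseMs = 1_000,
  capMs = 60_000,
  random: () => number = Math.random, // injectable so tests are deterministic
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// True while the job still has retries left in its budget.
function shouldRetry(attempt: number, maxRetries = 5): boolean {
  return attempt < maxRetries;
}
```

A Durable Object could feed the result into its next alarm time, for example by scheduling the alarm at `Date.now() + backoffDelayMs(attempt)`.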


3. KV / D1 + Cron-triggered Workers (Best for simple, low-volume background tasks)

This pattern stands out for simple scenarios because it’s easy, Cloudflare-native, and requires minimal setup—but it doesn’t scale well to complex or long-running workloads.

What it does well:

  • Straightforward implementation:
    You can store “jobs” in KV or D1, and have a cron-triggered Worker process them:

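    // cron worker
    const MAX_RETRIES = 5; // illustrative cap so a failing job is not retried forever

    export default {
      async scheduled(event, env, ctx) {
        // Pick up pending jobs that still have retries left
        const jobs = await env.DB.prepare(
          "SELECT * FROM jobs WHERE status = 'pending' AND retries < ? LIMIT 100"
        ).bind(MAX_RETRIES).all();

        for (const job of jobs.results) {
          // waitUntil keeps the invocation alive until each job settles
          ctx.waitUntil(handleJob(job, env));
        }
      },
    };

    async function handleJob(job, env) {
      try {
        // do work
        await env.DB.prepare(
          "UPDATE jobs SET status = 'done' WHERE id = ?"
        ).bind(job.id).run();
      } catch (err) {
        // naive retry logic: bump the counter and let the next cron run try again
        // (jobs are not claimed, so overlapping runs can process the same job twice)
        await env.DB.prepare(
          "UPDATE jobs SET retries = retries + 1 WHERE id = ?"
        ).bind(job.id).run();
      }
    }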
    

    It’s a quick way to implement:

    • Daily reports
    • Best-effort email sending
    • Low-frequency data syncs
  • Zero extra components:
    You stay within:

    • Workers
    • KV/D1
    • Cron triggers

    No external platform, no queue service, no durable workflow engine.

Tradeoffs & Limitations:

  • Not designed for heavy or long-running workloads:
    You’re constrained by:

    • Worker execution limits
    • Cron schedule windows
    • Lack of built-in failure isolation

    Long-running tasks risk:

    • Timing out
    • Being partially executed
    • Needing manual clean-up or replay logic
  • Ad-hoc reliability and visibility:
    You’ll eventually add:

    • “Processed” flags
    • Retry counters
    • Custom status tables or columns
    • Dashboards to monitor stuck jobs

    This is how a simple cron job quietly evolves into an accidental workflow engine.
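A first step down that road usually looks like an explicit job state machine. The sketch below is one hypothetical shape for it: `JobRow`, `applyOutcome`, and the `failed` status are names I made up for illustration, and the retry budget is arbitrary.

```typescript
type JobStatus = "pending" | "done" | "failed";

interface JobRow {
  id: string;
  status: JobStatus;
  retries: number;
}

// Pure transition: given a job and whether the attempt succeeded,
// return the row that should be written back to the table.
function applyOutcome(job: JobRow, succeeded: boolean, maxRetries = 3): JobRow {
  if (succeeded) {
    return { ...job, status: "done" };
  }
  const retries = job.retries + 1;
  // Park the job as "failed" once the budget is spent, so it stops
  // being picked up by every cron run and can be inspected by hand.
  const status: JobStatus = retries >= maxRetries ? "failed" : "pending";
  return { ...job, status, retries };
}
```

Keeping the transition pure makes it easy to test, but the surrounding claim, update, and monitoring queries are still yours to write.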

Decision Trigger:
Choose KV/D1 + Cron Workers if your background jobs are simple, low-volume, and non-critical—and you’re okay with manually wiring retries and having limited traceability.


How Inngest Actually Plays with Cloudflare Workers

To make this concrete, here’s how you can pair Cloudflare Workers with Inngest to handle background jobs, long-running tasks, and retries—without managing a worker fleet.

1. Trigger Inngest from a Worker

From your Cloudflare Worker, emit an event when work should happen asynchronously:

export default {
  async fetch(request, env, ctx) {
    const { userId, payload } = await request.json();

    // Fire-and-forget background work
    ctx.waitUntil(
      fetch(env.INNGEST_EVENT_URL, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          name: "app/sync.requested",
          data: { userId, payload },
        }),
      })
    );

    return new Response("Sync started", { status: 202 });
  },
};
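One caveat with the fire-and-forget form above: if the POST fails, the Worker has already answered 202 and nothing surfaces the error. When losing an event is not acceptable, a small helper that checks the response can be awaited before replying. This is a sketch under my own assumptions; `sendEvent`, `FetchLike`, and `EventPayload` are hypothetical names, not part of any SDK.

```typescript
interface EventPayload {
  name: string;
  data: Record<string, unknown>;
}

// Narrow fetch signature so the helper can be tested with a fake.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

// Posts one event and throws on a non-2xx response so the caller can
// decide whether to fail the request or log the error.
async function sendEvent(
  fetchFn: FetchLike,
  eventUrl: string,
  event: EventPayload,
): Promise<void> {
  const res = await fetchFn(eventUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  });
  if (!res.ok) {
    throw new Error(`Event POST failed with status ${res.status}`);
  }
}
```

In the handler you would pass the global `fetch` and `env.INNGEST_EVENT_URL`, await the call, and return 202 only after it resolves.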

2. Implement the durable workflow in Inngest

import { inngest } from "./client";

export const syncData = inngest.createFunction(
  { id: "sync-data" },
  { event: "app/sync.requested" },
  async ({ event, step }) => {
    const source = await step.run("fetch-source", async () => {
      return fetchSourceSystem(event.data.userId);
    });

    const transformed = await step.run("transform", async () => {
      return transformPayload(source, event.data.payload);
    });

    await step.run("push-target", async () => {
      await pushToTargetSystem(event.data.userId, transformed);
    });

    return { ok: true };
  }
);

Now you get:

  • Automatic retries per step with backoff.
  • Checkpointing after each successful step.run().
  • The ability to resume from the last successful step on failure.
  • Per-run Traces with inputs/outputs, logs, and timing.
  • Replay from the UI or via API.

3. Run & iterate locally quickly

Local DX matters. With Inngest:

npx --ignore-scripts=false inngest-cli dev

You get:

  • A local dev server for your functions.
  • Automatic event replay during development.
  • A local Traces UI so you can see your flows before shipping.

Final Verdict

If you’re just dipping your toe into Cloudflare Workers background jobs and your tasks are simple, KV/D1 plus cron-triggered Workers can get you moving quickly—but you’ll feel the pain as soon as jobs become multi-step, multi-tenant, or business-critical.

If you’re deep in Cloudflare, comfortable with infrastructure, and determined to stay all-in on the platform, Durable Objects + Queues + Alarms can form a solid foundation—at the cost of rebuilding your own queue stack: retries, checkpointing, dead-letter handling, and observability included.

If you want reliable long-running tasks, retries, and background workflows on Cloudflare Workers without maintaining a worker fleet or custom admin tooling, Inngest is built to be that missing durable execution layer. You express durability directly in code with step.run(), run anywhere (including Cloudflare), and get instant Traces and replay when things go wrong—without log-grepping across systems.

