How do I use Modal to run a one-off batch job across thousands of workers and collect results back to my app?

Most teams hit the same wall the first time they try to fan out a “simple” batch job to thousands of workers: queues, fleets, backoff, retries, partial failures, and the dreaded “where did that job go?” debugging session. On Modal, that entire workflow collapses into a few Python functions you can run locally, then scale out to thousands of containers with the same code.

Quick Answer: Use @app.function plus .map() or .spawn() to fan out work across thousands of Modal workers, then aggregate results back in your app via .get() or a reducer function. You define the batch task as a Modal function, deploy it with modal deploy, and let Modal’s autoscaling, job queue, and result handling do the heavy lifting.

Why This Matters

One-off “big crunch” jobs are exactly the kind of workload that destroys traditional infrastructure ergonomics. You either overprovision a cluster that sits idle 99% of the time, or you risk timeouts and cascading failures when your batch spikes. With Modal, you don’t need a standing cluster at all: you describe your job in Python, fan it out to thousands of GPUs/CPUs in seconds, and collect the results back in the same code path.

This is the difference between “kick off a risky overnight batch” and “launch a 10,000-worker job from a FastAPI handler and still hit your SLA.” Because Modal is built as an AI-native runtime, the same primitives that serve LLM inference (@modal.fastapi_endpoint, stateful @app.cls) also handle embarrassingly parallel batch workloads with sub-second cold starts and instant autoscaling.

Key Benefits:

Massive parallelism on demand: Use .map() or .spawn() to fan out to thousands of workers without pre-provisioning clusters, quotas, or reservations.
Predictable, observable results: Track each job via FunctionCall IDs, aggregate results deterministically, and inspect logs/errors from the Modal dashboard.
Single codebase for batch + app: Use the same Python functions from your web app, CLI, and scheduled jobs (modal.Period/modal.Cron), without separate batch frameworks or YAML pipelines.

Core Concepts & Key Points

Concept	Definition	Why it's important
Modal functions	Python functions decorated with `@app.function()` that run in containers on Modal’s infrastructure.	They are your “worker units” – each call runs in an isolated container with the environment, hardware, and timeouts you define in code.
Job fan-out (`.map()` / `.spawn()`)	Ways to launch many function calls in parallel: `.map()` for bulk parallel execution, `.spawn()` for enqueueing individual jobs.	This is how you turn one batch request into thousands of concurrent workers without managing a queue or cluster yourself.
Result collection (`.get()` / aggregation)	Modal `FunctionCall.get()` (or `list(result_iterator)` for `.map()`) blocks until a result is ready or a timeout is hit.	This lets your app safely wait for all results, handle partial failures explicitly, and update downstream systems with the final aggregated output.

How It Works (Step-by-Step)

Let’s walk through a concrete pattern: you have a web app that needs to process 100,000 items (e.g., documents, images, eval prompts) in a single one-off batch, then return an aggregated result.

At a high level:

Define a Modal function that processes a single item.
Expose a “kickoff” endpoint (or CLI function) that calls .map() or .spawn() across your input set.
Collect results via .get() (or by iterating the mapped results), aggregate them, and store/return them to your app.

1. Define your worker function

Let’s get the imports out of the way and create a simple worker. This is the code that will run on thousands of workers in parallel.

# batch_workers.py
import modal

app = modal.App("one-off-batch-job")

image = (
    modal.Image.debian_slim()
    .pip_install(
        "numpy==1.26.4",
        "tqdm==4.66.4",
    )
)

@app.function(
    image=image,
    timeout=60 * 10,          # up to 10 minutes per item
    max_containers=1000,      # optional: cap how many containers run this function at once
)
def process_item(item: dict) -> dict:
    """
    Process a single item in the batch.
    Replace this with your real workload: model inference, ETL, etc.
    """
    # Example: pretend work based on an "id"
    item_id = item["id"]
    value = item["value"]

    # ... do CPU/GPU-heavy processing here ...
    result = {
        "id": item_id,
        "processed_value": value * 2,
    }
    return result

This is all regular Python. The only Modal-specific pieces:

app = modal.App("one-off-batch-job") names your app.
@app.function(...) declares that process_item will run in Modal containers with a specific image, timeout, and concurrency behavior.
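Before deploying, it can help to check the per-item logic on its own. The sketch below mirrors the body of process_item as a plain function; the name transform_item is hypothetical and not part of Modal or the code above.

```python
def transform_item(item: dict) -> dict:
    """Plain-Python version of the per-item logic in process_item."""
    return {"id": item["id"], "processed_value": item["value"] * 2}


# A tiny local smoke test: no containers or Modal account involved.
sample_items = [{"id": 1, "value": 10}, {"id": 2, "value": 20}]
local_results = [transform_item(item) for item in sample_items]
print(local_results)
```

If you structure the worker this way, process_item can simply return transform_item(item), and the same logic is covered by ordinary unit tests.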

To deploy the worker so it’s callable from anywhere:

modal deploy batch_workers.py

Modal now knows how to launch containers that can run process_item at scale.

2. Fan out the one-off batch

Now define the “batch driver” that your app will hit once, which in turn fans out to many workers.

You can do this as another function in the same file or a separate module. Here we’ll keep it together and show both a CLI entrypoint (for ad-hoc runs) and an HTTP endpoint (for app-triggered runs).

# batch_workers.py (continued)
from typing import List

@app.function(timeout=60 * 60)  # up to 1 hour for the overall batch
def run_batch(items: List[dict]) -> List[dict]:
    """
    Kick off a one-off batch across thousands of workers and
    aggregate the results in this function.
    """
    # Fan out to many workers. Each element in `items` becomes its own function call.
    # Modal schedules those calls across containers, reusing containers between calls
    # and scaling the pool up or down as needed.
    result_iterator = process_item.map(items)

    # Collect results. You can stream these instead of materializing the full list.
    results: List[dict] = []
    for result in result_iterator:
        results.append(result)

    return results

This pattern uses .map():

process_item.map(items) returns an iterator over results.
Modal distributes the calls across containers, autoscales the container pool, and retries failed calls if you set retries on @app.function.
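A failed call normally surfaces as an exception while you iterate the .map() results, which ends the loop early. Passing return_exceptions=True to .map() yields the exception objects in place instead, so the driver can separate good results from failures. The sketch below shows only that separation step in plain Python; split_outcomes is a hypothetical helper, and the simulated list stands in for the real iterator.

```python
from typing import Any, Iterable, List, Tuple


def split_outcomes(
    outcomes: Iterable[Any],
) -> Tuple[List[Any], List[Tuple[int, Exception]]]:
    """Separate successful results from exceptions, keeping input positions."""
    successes: List[Any] = []
    failures: List[Tuple[int, Exception]] = []
    for index, outcome in enumerate(outcomes):
        if isinstance(outcome, Exception):
            failures.append((index, outcome))
        else:
            successes.append(outcome)
    return successes, failures


# Stand-in for: process_item.map(items, return_exceptions=True)
simulated = [
    {"id": 1, "processed_value": 20},
    ValueError("bad input"),
    {"id": 3, "processed_value": 60},
]
successes, failures = split_outcomes(simulated)
print(len(successes), "succeeded;", len(failures), "failed")
```

Because .map() returns results in input order by default, the recorded index identifies which input to retry or report.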

If you want more control (e.g., job queue semantics, incremental retrieval), use .spawn() instead:

@app.function(timeout=60 * 60)
def run_batch_spawn(items: List[dict]) -> List[dict]:
    calls = [process_item.spawn(item) for item in items]

    results = []
    for call in calls:
        # call is a modal.FunctionCall
        # Block until this result is ready, with a per-call timeout.
        result = call.get(timeout=60 * 5)
        results.append(result)
    return results

.spawn() gives you explicit FunctionCall objects with IDs you can persist in your database if your app wants to poll progress over time.
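As a rough sketch of persisting those IDs, the example below stores each call's object_id in SQLite. The table layout, the helper names, and the FakeCall stand-in (used so the snippet runs without Modal) are all assumptions for illustration; the IDs shown are made up.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class FakeCall:
    """Stand-in for modal.FunctionCall; only object_id is used here."""
    object_id: str


def save_call_ids(conn: sqlite3.Connection, batch_id: str, calls: list) -> None:
    """Record one row per spawned call so progress can be checked later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS batch_calls "
        "(batch_id TEXT, position INTEGER, call_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO batch_calls VALUES (?, ?, ?)",
        [(batch_id, i, call.object_id) for i, call in enumerate(calls)],
    )
    conn.commit()


def load_call_ids(conn: sqlite3.Connection, batch_id: str) -> List[str]:
    """Return the stored call IDs for a batch, in spawn order."""
    rows = conn.execute(
        "SELECT call_id FROM batch_calls WHERE batch_id = ? ORDER BY position",
        (batch_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
save_call_ids(conn, "batch-001", [FakeCall("fc-aaa"), FakeCall("fc-bbb")])
stored_ids = load_call_ids(conn, "batch-001")
print(stored_ids)
```

In a real driver, the list passed to save_call_ids would be the calls returned by process_item.spawn(...), and each stored ID can later be turned back into a handle with modal.FunctionCall.from_id(call_id).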

3. Expose it to your app and collect results

You probably want to kick off this one-off batch from a web app or API server.

Using FastAPI on Modal:

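# batch_endpoint.py
import modal
from typing import List

app = modal.App("one-off-batch-job-endpoint")

# Web endpoints need FastAPI installed in the container image.
image = modal.Image.debian_slim().pip_install("fastapi[standard]")

# Look up the deployed driver by app name and function name instead of
# importing it, so this app does not need batch_workers.py on its path.
run_batch = modal.Function.from_name("one-off-batch-job", "run_batch")

@app.function(image=image)
@modal.fastapi_endpoint(method="POST")  # the default method is GET
async def start_batch(items: List[dict]):
    """
    HTTP endpoint: POST a JSON array of items, get the processed batch back.
    For very large batches, you'd return a job ID instead.
    """
    # Call the deployed run_batch function remotely and await its result.
    return await run_batch.remote.aio(items)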

Deploy this:

modal deploy batch_endpoint.py

Now you can trigger a one-off batch from your app:

curl -X POST "$YOUR_APP_ENDPOINT" \
  -H "Content-Type: application/json" \
  -d '[{"id": 1, "value": 10}, {"id": 2, "value": 20}]'

This will:

Call start_batch, running on Modal.
start_batch calls run_batch.remote.aio(items), which fans out to process_item across many containers.
run_batch collects the process_item results and returns them to start_batch, which returns them to your client.

For truly huge batches (hundreds of thousands or millions of items), you’d likely:

Accept the batch metadata via HTTP.
Write items to a dataset or queue (S3, database, or Modal Volume).
Kick off a driver function that spawns jobs and returns a job_id.
Let your app poll another endpoint that checks job status via modal.FunctionCall.from_id(job_id).

Modal’s job queue primitives (.spawn(), FunctionCall.get()) are built for this pattern.
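The status check in the last step can be sketched as a non-blocking poll: calling get(timeout=0) on a call that has not finished raises TimeoutError, which the poller treats as "still running". The FakeCall class and poll_calls helper below are hypothetical stand-ins so the snippet runs without Modal; in a real endpoint each handle would come from modal.FunctionCall.from_id(call_id).

```python
from typing import Any, Dict


class FakeCall:
    """Stand-in for modal.FunctionCall with the same get(timeout=...) shape."""

    def __init__(self, result: Any = None, finished: bool = True) -> None:
        self._result = result
        self._finished = finished

    def get(self, timeout: float = None) -> Any:
        if not self._finished:
            raise TimeoutError("result not ready yet")
        return self._result


def poll_calls(calls: Dict[str, Any]) -> dict:
    """Check every call once without blocking and summarize progress."""
    done: Dict[str, Any] = {}
    pending = []
    for call_id, call in calls.items():
        try:
            done[call_id] = call.get(timeout=0)
        except TimeoutError:
            pending.append(call_id)
    status = "running" if pending else "complete"
    return {"status": status, "done": done, "pending": pending}


calls = {
    "fc-1": FakeCall(result={"id": 1, "processed_value": 20}),
    "fc-2": FakeCall(finished=False),
}
snapshot = poll_calls(calls)
print(snapshot["status"], snapshot["pending"])
```

This sketch does not handle calls that failed; with real FunctionCall objects, get() re-raises the remote exception, so a production poller would catch and record those separately.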

Common Mistakes to Avoid

Packing everything into one giant function call:
If you pass 1,000,000 items into a single @app.function call and loop over them inside that call, everything runs in one container and Modal has nothing to scale out. Instead, design functions at the per-item or per-shard level and use .map() or .spawn() to parallelize.
Not persisting job IDs for long-running batches:
For batches that may outlive a single HTTP request or worker process, don’t assume a synchronous end-to-end flow. Persist FunctionCall.object_id for your driver function or individual tasks, and use modal.FunctionCall.from_id(...) to resume, poll status, or re-fetch results.

Real-World Example

Imagine an LLM-heavy evaluation run: you want to score 200,000 prompts against a new model checkpoint, compute metrics, and show a dashboard in your internal tooling. The naive way is to run this on a single GPU box overnight and hope it doesn’t die.

On Modal, you:

Define evaluate_prompt(prompt) as a Modal function with GPU hardware, e.g. gpu="A10G".
Use evaluate_prompt.map(prompts) from a run_eval driver to fan this out to thousands of GPUs in parallel.
Aggregate results in run_eval into a confusion matrix, score metrics, and write them to a Volume or your warehouse.
Trigger this entire flow from a single “Run eval” button in your web app by calling run_eval.spawn(...) and storing the call_id.

Modal spins up GPUs across its multi-cloud capacity pool, keeps cold starts sub-second, and gives you full logs for every failing prompt. Your app just sees a call_id and a final metrics payload; you never touch Kubernetes, auto-scaling groups, or queue configuration.
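The aggregation step in run_eval could be sketched as a single pass over the result iterator, so the driver never holds all 200,000 results in memory. The result shape (a dict with a boolean "correct" field) and the name summarize_scores are assumptions for illustration, not something the workflow above prescribes.

```python
from typing import Iterable


def summarize_scores(results: Iterable[dict]) -> dict:
    """Compute running totals over a stream of per-prompt results."""
    total = 0
    correct = 0
    for result in results:
        total += 1
        if result["correct"]:
            correct += 1
    accuracy = correct / total if total else 0.0
    return {"total": total, "correct": correct, "accuracy": accuracy}


# Stand-in for the iterator returned by evaluate_prompt.map(prompts).
simulated = iter(
    [
        {"prompt_id": 1, "correct": True},
        {"prompt_id": 2, "correct": False},
        {"prompt_id": 3, "correct": True},
        {"prompt_id": 4, "correct": True},
    ]
)
metrics = summarize_scores(simulated)
print(metrics)
```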

Pro Tip: For large batches, use hierarchical fan-out: have a driver function split the dataset into shards (e.g., 1,000 items per shard), spawn a process_shard function per shard, and only then fan out per-item inside each shard. This keeps scheduling overhead low and makes it easier to retry or resume specific shards without rerunning the entire batch.
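The sharding step from the tip can be sketched with a small helper. The name make_shards is hypothetical; each shard it yields would be the argument to one process_shard call.

```python
from typing import Iterator, List, Sequence


def make_shards(items: Sequence[dict], shard_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most shard_size items."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    for start in range(0, len(items), shard_size):
        yield list(items[start:start + shard_size])


items = [{"id": i, "value": i} for i in range(2500)]
shards = list(make_shards(items, 1000))
shard_sizes = [len(shard) for shard in shards]
print(shard_sizes)
```

Keeping shards as contiguous slices means a failed shard can be retried from its start offset alone, without tracking individual items.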

Summary

Running a one-off batch across thousands of workers on Modal is just Python:

Define your per-item worker as an @app.function.
Use .map() for straightforward fan-out and .spawn()/FunctionCall.get() when you need explicit job queue behavior and job IDs.
Wrap the driver in a @modal.fastapi_endpoint (or call .remote() from your app) to trigger batches from your existing systems.
Let Modal handle autoscaling, isolation via gVisor, and retries, while you focus on how to split and aggregate your workload.

You get the scale and elasticity of a large cluster, without the operational drag or YAML.
