
How do I use Modal to run a one-off batch job across thousands of workers and collect results back to my app?
Most teams hit the same wall the first time they try to fan out a “simple” batch job to thousands of workers: queues, fleets, backoff, retries, partial failures, and the dreaded “where did that job go?” debugging session. On Modal, that entire workflow collapses into a few Python functions you can run locally, then scale out to thousands of containers with the same code.
Quick Answer: Use @app.function plus .map() or .spawn() to fan out work across thousands of Modal workers, then aggregate results back in your app via .get() or a reducer function. You define the batch task as a Modal function, deploy it with modal deploy, and let Modal’s autoscaling, job queue, and result handling do the heavy lifting.
Why This Matters
One-off “big crunch” jobs are exactly the kind of workload that destroys traditional infrastructure ergonomics. You either overprovision a cluster that sits idle 99% of the time, or you risk timeouts and cascading failures when your batch spikes. With Modal, you don’t need a standing cluster at all: you describe your job in Python, fan it out to thousands of GPUs/CPUs in seconds, and collect the results back in the same code path.
This is the difference between “kick off a risky overnight batch” and “launch a 10,000-worker job from a FastAPI handler and still hit your SLA.” Because Modal is built as an AI-native runtime, the same primitives that serve LLM inference (@modal.fastapi_endpoint, stateful @app.cls) also handle embarrassingly parallel batch workloads with sub-second cold starts and instant autoscaling.
Key Benefits:
- Massive parallelism on demand: Use .map() or .spawn() to fan out to thousands of workers without pre-provisioning clusters, quotas, or reservations.
- Predictable, observable results: Track each job via FunctionCall IDs, aggregate results deterministically, and inspect logs/errors from the Modal dashboard.
- Single codebase for batch + app: Use the same Python functions from your web app, CLI, and scheduled jobs (modal.Period / modal.Cron), without separate batch frameworks or YAML pipelines.
Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Modal functions | Python functions decorated with @app.function() that run in containers on Modal’s infrastructure. | They are your “worker units” – each call runs in an isolated container with the environment, hardware, and timeouts you define in code. |
| Job fan-out (.map() / .spawn()) | Ways to launch many function calls in parallel: .map() for bulk parallel execution, .spawn() for enqueueing individual jobs. | This is how you turn one batch request into thousands of concurrent workers without managing a queue or cluster yourself. |
| Result collection (.get() / aggregation) | Modal FunctionCall.get() (or list(result_iterator) for .map()) blocks until a result is ready or a timeout is hit. | This lets your app safely wait for all results, handle partial failures explicitly, and update downstream systems with the final aggregated output. |
How It Works (Step-by-Step)
Let’s walk through a concrete pattern: you have a web app that needs to process 100,000 items (e.g., documents, images, eval prompts) in a single one-off batch, then return an aggregated result.
At a high level:
- Define a Modal function that processes a single item.
- Expose a “kickoff” endpoint (or CLI function) that calls .map() or .spawn() across your input set.
- Collect results via .get() (or by iterating the mapped results), aggregate them, and store/return them to your app.
1. Define your worker function
Let’s get the imports out of the way and create a simple worker. This is the code that will run on thousands of workers in parallel.
```python
# batch_workers.py
import modal

app = modal.App("one-off-batch-job")

image = (
    modal.Image.debian_slim()
    .pip_install(
        "numpy==1.26.4",
        "tqdm==4.66.4",
    )
)


@app.function(
    image=image,
    timeout=60 * 10,  # up to 10 minutes per item
    # Optional: cap how many containers run this function at once.
    # Older Modal releases called this parameter concurrency_limit.
    max_containers=1000,
)
def process_item(item: dict) -> dict:
    """
    Process a single item in the batch.
    Replace this with your real workload: model inference, ETL, etc.
    """
    # Example: pretend work based on an "id"
    item_id = item["id"]
    value = item["value"]
    # ... do CPU/GPU-heavy processing here ...
    result = {
        "id": item_id,
        "processed_value": value * 2,
    }
    return result
```
This is all regular Python. The only Modal-specific pieces:
- app = modal.App("one-off-batch-job") names your app.
- @app.function(...) declares that process_item will run in Modal containers with a specific image, timeout, and concurrency behavior.
To deploy the worker so it’s callable from anywhere:
```shell
modal deploy batch_workers.py
```
Modal now knows how to launch containers that can run process_item at scale.
2. Fan out the one-off batch
Now define the “batch driver” that your app will hit once, which in turn fans out to many workers.
You can do this as another function in the same file or a separate module. Here we’ll keep it together and show both a CLI entrypoint (for ad-hoc runs) and an HTTP endpoint (for app-triggered runs).
```python
# batch_workers.py (continued)
from typing import List


@app.function(timeout=60 * 60)  # up to 1 hour for the overall batch
def run_batch(items: List[dict]) -> List[dict]:
    """
    Kick off a one-off batch across thousands of workers and
    aggregate the results in this function.
    """
    # Fan out to many workers. Each element in `items` becomes its own function call.
    # Under the hood, Modal autoscaling spins up as many containers as needed
    # and reuses them across calls.
    result_iterator = process_item.map(items)

    # Collect results. You can stream these instead of materializing the full list.
    results: List[dict] = []
    for result in result_iterator:
        results.append(result)
    return results
```
This pattern uses .map():
- process_item.map(items) returns an iterator over results.
- Modal handles sharding across containers, retries on failures (if configured), and autoscaling.
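One way to stream mapped results instead of building a full list is to fold them into a running summary as they arrive. The sketch below is plain Python and works on any iterable; `summarize_results` and the summary field names are hypothetical, and it assumes results shaped like the dicts returned by process_item. It also assumes you may call .map() with return_exceptions=True, in which case failed inputs arrive in the stream as exception objects and are counted separately here.

```python
from typing import Any, Dict, Iterable


def summarize_results(results: Iterable[Any]) -> Dict[str, Any]:
    """Fold a stream of per-item results into a small summary in one pass."""
    summary: Dict[str, Any] = {
        "succeeded": 0,
        "failed": 0,
        "total_processed_value": 0,
        "errors": [],
    }
    for result in results:
        if isinstance(result, Exception):
            # Assumption: failures are delivered as exception objects,
            # as with .map(..., return_exceptions=True).
            summary["failed"] += 1
            summary["errors"].append(repr(result))
            continue
        summary["succeeded"] += 1
        summary["total_processed_value"] += result["processed_value"]
    return summary


if __name__ == "__main__":
    # Stand-in for the iterator returned by process_item.map(...).
    fake_stream = iter([
        {"id": 1, "processed_value": 20},
        ValueError("bad input"),
        {"id": 2, "processed_value": 40},
    ])
    print(summarize_results(fake_stream))
```

Because the function only iterates once and keeps a fixed-size summary (plus the error list), memory use stays flat even when the batch has hundreds of thousands of items.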
If you want more control (e.g., job queue semantics, incremental retrieval), use .spawn() instead:
```python
@app.function(timeout=60 * 60)
def run_batch_spawn(items: List[dict]) -> List[dict]:
    calls = [process_item.spawn(item) for item in items]

    results = []
    for call in calls:
        # call is a modal.FunctionCall
        # Block until this result is ready, with a per-call timeout.
        result = call.get(timeout=60 * 5)
        results.append(result)
    return results
```
.spawn() gives you explicit FunctionCall objects with IDs you can persist in your database if your app wants to poll progress over time.
3. Expose it to your app and collect results
You probably want to kick off this one-off batch from a web app or API server.
Using FastAPI on Modal:
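```python
# batch_endpoint.py
import modal
from typing import List

app = modal.App("one-off-batch-job-endpoint")

# FastAPI has to be installed in the image that serves the endpoint.
image = modal.Image.debian_slim().pip_install("fastapi[standard]")

# Look up the deployed driver by app name and function name. Importing
# run_batch from batch_workers would bind it to that module's App, not this one.
run_batch = modal.Function.from_name("one-off-batch-job", "run_batch")


@app.function(image=image)
@modal.fastapi_endpoint(method="POST")  # the default method is GET
async def start_batch(items: List[dict]):
    """
    HTTP endpoint: POST a list of items, get the processed batch back.
    For very large batches, you'd return a job ID instead.
    """
    # Call run_batch remotely and wait for the aggregated results.
    return await run_batch.remote.aio(items)
```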
Deploy this:
```shell
modal deploy batch_endpoint.py
```
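Now you can trigger a one-off batch from your app. Here $YOUR_APP_ENDPOINT stands for the URL that modal deploy prints for start_batch, and the request body is the JSON array of items itself:
```shell
curl -X POST "$YOUR_APP_ENDPOINT" \
  -H "Content-Type: application/json" \
  -d '[{"id": 1, "value": 10}, {"id": 2, "value": 20}]'
```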
This will:
- Call start_batch, running on Modal.
- start_batch calls run_batch.remote.aio(items), and run_batch fans out to process_item across many containers.
- run_batch collects the process_item results and returns them to start_batch, which returns them to your client.
For truly huge batches (hundreds of thousands or millions of items), you’d likely:
- Accept the batch metadata via HTTP.
- Write items to a dataset or queue (S3, database, or Modal Volume).
- Kick off a driver function that spawns jobs and returns a job_id.
- Let your app poll another endpoint that checks job status via modal.FunctionCall.from_id(job_id).
Modal’s job queue primitives (.spawn(), FunctionCall.get()) are built for this pattern.
Common Mistakes to Avoid
- Packing everything into one giant function call: If you pass 1,000,000 items into a single @app.function and loop over them inside that one call, you’ve effectively disabled Modal’s autoscaling. Instead, design functions at the per-item or per-shard level and use .map() or .spawn() to parallelize.
- Not persisting job IDs for long-running batches: For batches that may outlive a single HTTP request or worker process, don’t assume a synchronous end-to-end flow. Persist FunctionCall.object_id for your driver function or individual tasks, and use modal.FunctionCall.from_id(...) to resume, poll status, or re-fetch results.
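Persisting a job ID can be as simple as one small table. The sketch below uses SQLite from the standard library purely as an example store; the table name, column names, and helper functions are all hypothetical, and the ID shown is a placeholder string rather than a real Modal call ID. In practice you would save the value of FunctionCall.object_id right after calling .spawn().

```python
import sqlite3
from typing import Optional


def init_job_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table that maps a job name to a call ID."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS batch_jobs ("
        "job_name TEXT PRIMARY KEY, call_id TEXT NOT NULL)"
    )
    conn.commit()


def save_call_id(conn: sqlite3.Connection, job_name: str, call_id: str) -> None:
    """Store or overwrite the call ID for a named job."""
    conn.execute(
        "INSERT OR REPLACE INTO batch_jobs (job_name, call_id) VALUES (?, ?)",
        (job_name, call_id),
    )
    conn.commit()


def load_call_id(conn: sqlite3.Connection, job_name: str) -> Optional[str]:
    """Return the stored call ID, or None if the job is unknown."""
    row = conn.execute(
        "SELECT call_id FROM batch_jobs WHERE job_name = ?", (job_name,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_job_table(conn)
    save_call_id(conn, "nightly-eval", "placeholder-call-id")
    print(load_call_id(conn, "nightly-eval"))
```

Any durable store works here; the point is that the ID survives the process that launched the job, so a later request can pick the job back up.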
Real-World Example
Imagine an LLM-heavy evaluation run: you want to score 200,000 prompts against a new model checkpoint, compute metrics, and show a dashboard in your internal tooling. The naive way is to run this on a single GPU box overnight and hope it doesn’t die.
On Modal, you:
- Define evaluate_prompt(prompt) as a Modal function with GPU hardware, e.g. gpu="A10G".
- Use evaluate_prompt.map(prompts) from a run_eval driver to fan this out to thousands of GPUs in parallel.
- Aggregate results in run_eval into a confusion matrix, score metrics, and write them to a Volume or your warehouse.
- Trigger this entire flow from a single “Run eval” button in your web app by calling run_eval.spawn(...) and storing the call_id.
Modal spins up GPUs across its multi-cloud capacity pool, keeps cold starts sub-second, and gives you full logs for every failing prompt. Your app just sees a call_id and a final metrics payload; you never touch Kubernetes, auto-scaling groups, or queue configuration.
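The aggregation step inside run_eval might look like the sketch below. It is plain Python and makes one loud assumption: that each evaluate_prompt result is a dict with hypothetical "expected" and "predicted" label fields. The function names are illustrative as well; your real metrics will depend on what the evaluation returns.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

ConfusionMatrix = Dict[Tuple[str, str], int]


def build_confusion_matrix(results: Iterable[dict]) -> ConfusionMatrix:
    """Count (expected, predicted) label pairs across all evaluation results."""
    matrix: Counter = Counter()
    for result in results:
        # Assumption: every result carries these two hypothetical keys.
        matrix[(result["expected"], result["predicted"])] += 1
    return dict(matrix)


def accuracy(matrix: ConfusionMatrix) -> float:
    """Fraction of results where the predicted label matches the expected one."""
    total = sum(matrix.values())
    if total == 0:
        return 0.0
    correct = sum(
        count for (expected, predicted), count in matrix.items()
        if expected == predicted
    )
    return correct / total


if __name__ == "__main__":
    sample = [
        {"expected": "pass", "predicted": "pass"},
        {"expected": "pass", "predicted": "fail"},
        {"expected": "fail", "predicted": "fail"},
        {"expected": "pass", "predicted": "pass"},
    ]
    matrix = build_confusion_matrix(sample)
    print(matrix, accuracy(matrix))
```

Since build_confusion_matrix consumes any iterable, it can take the iterator from evaluate_prompt.map(prompts) directly and never hold all 200,000 results in memory.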
Pro Tip: For large batches, use hierarchical fan-out: have a driver function split the dataset into shards (e.g., 1,000 items per shard), spawn a process_shard function per shard, and only then fan out per-item inside each shard. This keeps scheduling overhead low and makes it easier to retry or resume specific shards without rerunning the entire batch.
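The splitting step of that hierarchical fan-out is ordinary list slicing. Here is a minimal sketch; `make_shards` is a hypothetical helper name, and the shard size of 1,000 simply mirrors the example figure above. Each returned shard is what you would hand to one process_shard call.

```python
from typing import List, Sequence


def make_shards(items: Sequence[dict], shard_size: int) -> List[List[dict]]:
    """Split items into consecutive shards of at most shard_size elements."""
    if shard_size <= 0:
        raise ValueError("shard_size must be a positive integer")
    return [
        list(items[start:start + shard_size])
        for start in range(0, len(items), shard_size)
    ]


if __name__ == "__main__":
    items = [{"id": i, "value": i} for i in range(2500)]
    shards = make_shards(items, shard_size=1000)
    # The last shard holds the remainder when the sizes do not divide evenly.
    print([len(shard) for shard in shards])
```

Keeping shards contiguous and deterministic means a shard index is enough to identify exactly which items to rerun after a failure.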
Summary
Running a one-off batch across thousands of workers on Modal is just Python:
- Define your per-item worker as an @app.function.
- Use .map() for straightforward fan-out and .spawn() / FunctionCall.get() when you need explicit job queue behavior and job IDs.
- Wrap the driver in a @modal.fastapi_endpoint (or call .remote() from your app) to trigger batches from your existing systems.
- Let Modal handle autoscaling, isolation via gVisor, and retries, while you focus on how to split and aggregate your workload.
You get the scale and elasticity of a large cluster, without the operational drag or YAML.