
How do I use Modal to run a one-off batch job across thousands of workers and collect results back to my app?
Quick Answer: Use Modal’s batch primitives to fan out a list of tasks into thousands of concurrent workers with `.map()` or `.spawn()`, then collect results with `.get()` or by aggregating into a shared store (like a Volume or external DB). You define the job as a regular Python function, decorate it with `@app.function`, and drive the whole batch from your app using the Modal Python client or a deployed endpoint.
Why This Matters
Every AI-heavy app eventually needs to run “oh no” jobs: huge backfills, massive eval sweeps, one-off data migrations, or embeddings re-indexes that have to finish soon but don’t justify standing up a whole new cluster. Spinning this up manually on Kubernetes or raw cloud instances crushes iteration speed—writing YAML, guessing capacity, watching nodes churn—while your actual feature work waits. With Modal, you can express the entire batch as code: a Python function that scales up to thousands of containers when you need it, then back down to zero when you’re done, while you keep full control over retries, timeouts, and result collection.
Key Benefits:
- Massive fan-out in minutes: Use `.map()` or `.spawn()` to spread work across thousands of workers without touching autoscaling configs or quotas.
- Deterministic result collection: Pull results back directly into your app with `FunctionCall.get()` / `.get()` on maps, or aggregate into storage for later consumption.
- Production-ready by default: Built-in retries, timeouts, logs, and observability on the Modal apps page so one-off jobs behave like first-class production workloads.
Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Batch function (`@app.function`) | A regular Python function decorated with `@app.function()` that Modal turns into a scalable unit of work. | This is your “worker binary.” You define environment, hardware, and concurrency in code, then fan it out to thousands of tasks. |
| Fan-out (`.map()` / `.spawn()`) | Ways to launch many independent function calls: `.map()` for map-style batches, `.spawn()` for job-queue patterns. | This is how you turn a list of 10,000 items into 10,000 concurrent containers without thinking about nodes, pods, or queues. |
| Result handles (`FunctionCall` / map results) | Objects that represent running or completed calls; you resolve them with `.get()` to retrieve results or raise errors. | Gives you structured control over completion, retries, and aggregation instead of scraping logs or polling random queues. |
How It Works (Step-by-Step)
Let’s walk through a concrete pattern: run a one-off batch across thousands of workers and collect the results in your main app process.
1. Define the batch worker function
This is the unit of work that will run on each worker. Treat it like a pure function over your input item.
```python
# batch_job.py
import modal

app = modal.App("one-off-batch-example")

@app.function(
    timeout=600,  # up to 10 minutes per task
    retries=modal.Retries(max_retries=3),
)
def process_item(item: dict) -> dict:
    # Heavy CPU/GPU logic, network calls, etc.
    # Keep it deterministic where possible.
    result = {
        "id": item["id"],
        "value": item["payload"] ** 2,  # placeholder for real work
    }
    return result
```
You can attach hardware here too (e.g., GPUs) if your job is model-heavy:
```python
@app.function(
    image=modal.Image.debian_slim().pip_install("torch", "transformers"),
    gpu="A10G",
    timeout=1800,
    retries=modal.Retries(max_retries=2),
)
def process_item(item: dict) -> dict:
    ...
```
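Because the worker body is plain Python, you can smoke-test its logic before deploying anything. The sketch below duplicates the worker’s computation as an undecorated function for a quick local check; `process_item_logic` is an illustrative name, not Modal API (Modal also lets you run the real function’s body in-process with `process_item.local(...)`):

```python
# Local smoke test for the worker logic (no Modal deployment needed).
# This mirrors the body of `process_item` above as plain Python.
def process_item_logic(item: dict) -> dict:
    return {
        "id": item["id"],
        "value": item["payload"] ** 2,  # placeholder for real work
    }

print(process_item_logic({"id": 7, "payload": 3}))
```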
2. Deploy the function so your app can call it
Deploy this once. After that, your existing service can trigger batches without re-deploying the worker code.
```shell
modal deploy batch_job.py
```
Modal will show an app name (we used "one-off-batch-example") and function name ("process_item"). These names are how your “controller” app will look up the worker.
3. Fan out work across thousands of workers
In your main application (could be a FastAPI service, a cron job, or a notebook), generate the workload and fan it out to Modal.
Option A: True batch fan-out with `.map()` (simple, synchronous controller)
```python
# controller.py
import modal

def run_batch(items):
    # Look up the deployed function
    process_item = modal.Function.from_name(
        "one-off-batch-example",  # app name
        "process_item",           # function name
    )

    # Fan out across thousands of workers.
    # This returns a lazy iterator; results come back in input order by
    # default (pass order_outputs=False to get them as they complete).
    results_iter = process_item.map(items)

    results = []
    for result in results_iter:
        results.append(result)
    return results

if __name__ == "__main__":
    items = [{"id": i, "payload": i} for i in range(10_000)]
    results = run_batch(items)
    print(f"Processed {len(results)} items")
```
Modal takes care of:
- Spinning up as many containers as needed, within platform limits.
- Balancing concurrency to keep latency low.
- Retrying failed tasks according to your `modal.Retries` config.
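For large fan-outs you often want partial failures to surface as data rather than abort the whole iteration. Assuming `.map()`’s `return_exceptions=True` flag (which yields raised exceptions as values instead of re-raising them mid-iteration), a small helper can split the stream:

```python
def partition_results(results):
    """Split a result stream into (successes, failures).

    Intended for use with `process_item.map(items, return_exceptions=True)`,
    which yields exceptions as values instead of raising mid-iteration.
    """
    successes, failures = [], []
    for r in results:
        (failures if isinstance(r, Exception) else successes).append(r)
    return successes, failures

# Controller usage (sketch):
#   ok, failed = partition_results(process_item.map(items, return_exceptions=True))
```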
Option B: Queue-style with `.spawn()` (if you want to decouple submission and result collection)
If you need your app to enqueue a batch and check completion later, use `.spawn()` and `FunctionCall.get()`, the same primitives used for the job-queue pattern.
```python
# controller_spawn.py
import modal

process_item = modal.Function.from_name("one-off-batch-example", "process_item")

def submit_batch(items):
    call_ids = []
    for item in items:
        call = process_item.spawn(item)
        call_ids.append(call.object_id)
    return call_ids

def collect_results(call_ids, timeout=600):
    results = []
    for call_id in call_ids:
        fc = modal.FunctionCall.from_id(call_id)
        # `get` blocks until completion or timeout
        result = fc.get(timeout=timeout)
        results.append(result)
    return results

if __name__ == "__main__":
    items = [{"id": i, "payload": i} for i in range(10_000)]
    call_ids = submit_batch(items)
    results = collect_results(call_ids)
    print(f"Processed {len(results)} items")
```
This pattern is closer to a traditional job queue: you can store `call_ids` in your DB and build a UI around them.
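If you do store `call_ids` in your DB, a status endpoint can poll each call without blocking: `FunctionCall.get(timeout=0)` returns immediately when the result is ready and raises `TimeoutError` while the call is still running. The sketch below wraps that pattern with the lookup injected as a callable (`fetch` is an illustrative parameter name, not Modal API), so the polling logic itself is testable without a deployment:

```python
def check_status(fetch, timeout_exc=TimeoutError):
    """Report one call's status without blocking.

    `fetch` wraps a non-blocking result lookup, e.g.:
        lambda: modal.FunctionCall.from_id(call_id).get(timeout=0)
    which raises TimeoutError while the call is still running.
    """
    try:
        return {"done": True, "result": fetch()}
    except timeout_exc:
        return {"done": False, "result": None}

# In a status endpoint (sketch):
#   fc = modal.FunctionCall.from_id(call_id)
#   status = check_status(lambda: fc.get(timeout=0))
```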
4. Wire this into your app API (optional but common)
If you want to trigger the whole batch from a web request and collect results back to your app, expose a controller endpoint with Modal’s web support (for example, serving a FastAPI app via `@modal.asgi_app()`) or run the controller in your existing backend.
Simple example using Modal’s FastAPI integration:
```python
# batch_api.py
import modal
from fastapi import FastAPI
from pydantic import BaseModel

app = modal.App("batch-api-example")
web = FastAPI()

process_item = modal.Function.from_name("one-off-batch-example", "process_item")

class BatchRequest(BaseModel):
    items: list[dict]

class BatchResponse(BaseModel):
    results: list[dict]

@app.function(image=modal.Image.debian_slim().pip_install("fastapi[standard]"))
@modal.asgi_app()
def fastapi_app():
    return web

@web.post("/run-batch", response_model=BatchResponse)
async def run_batch(req: BatchRequest):
    # Fan out work and collect results
    results = list(process_item.map(req.items))
    return BatchResponse(results=results)
```
Deploy with:
```shell
modal deploy batch_api.py
```
Your app now exposes a /run-batch endpoint that runs a one-off batch over thousands of workers and returns the aggregated results in a single HTTP response (subject to payload size).
Common Mistakes to Avoid
- Doing heavy work in the controller instead of the worker function: Keep `run_batch` and API handlers thin. All CPU/GPU-intensive work should live in the `@app.function` worker; otherwise you’re bottlenecked on a single process and not using Modal’s scaling.
- Forgetting retries/timeouts or leaving them at unsafe defaults for big jobs: For long-running, one-off batches, explicitly set `timeout` (up to 24 hours) and `modal.Retries` on the worker function. This prevents hanging tasks and makes failures explicit and observable on the apps page.
Real-World Example
Imagine you need to recompute embeddings for 1 million documents after updating your model. Doing this on a single machine would take hours or days and require tedious capacity planning. On Modal, you:
- Write a `@app.function(image=..., gpu="A10G")` worker that:
  - Loads your embedding model once per container in `@modal.enter`.
  - Computes embeddings for a single document.
  - Writes the result to your vector DB or a Modal Volume.
- Use a controller script that:
  - Streams your 1M document IDs from your primary DB.
  - Calls `process_item.map(docs)` in chunks to avoid holding everything in memory.
  - Optionally aggregates statistics (throughput, error rates) as results stream back.
- Run `modal run controller.py` to kick off the one-off batch. Modal fans out over hundreds or thousands of GPUs, finishes the re-index in minutes, then scales back to zero so you’re not paying for idle clusters.
Pro Tip: For truly massive batches, treat the controller as a streaming pipeline: feed `process_item.map()` a generator that yields work items progressively. This keeps memory bounded and lets Modal start processing before you’ve enumerated the whole dataset, tightening your wall-clock completion time.
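A minimal sketch of that streaming shape, assuming `stream_doc_ids()` is your own DB iterator (a hypothetical name): a generator lazily adapts rows into work items, and a chunker bounds how much is in flight at once:

```python
from itertools import islice

def iter_work_items(doc_ids):
    """Lazily adapt streamed IDs into work items; nothing is materialized up front."""
    for doc_id in doc_ids:
        yield {"id": doc_id, "payload": doc_id}

def chunks(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

# Controller usage (sketch):
#   for batch in chunks(iter_work_items(stream_doc_ids()), 10_000):
#       for result in process_item.map(batch):
#           handle(result)
```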
Summary
Running a one-off batch job across thousands of workers on Modal is just Python: define a worker function with `@app.function`, deploy it once with `modal deploy`, and then fan out work from your app using `.map()` or `.spawn()` plus `FunctionCall.get()`. Modal handles autoscaling, retries, cold starts, and isolation so you can focus on your data and business logic, not on provisioning another ephemeral cluster. You get a single, code-defined pipeline that you can run today, debug from logs on the apps page, and reuse the next time your product needs a massive backfill.