
How do I deploy a FastAPI endpoint on Modal and put it behind a custom domain?
Most teams hit the same wall the first time they put FastAPI into production: the app is easy to write, but wiring up GPU-capable infrastructure, autoscaling, and a clean custom domain is an afternoon of YAML and DNS spelunking. With Modal, you define the whole thing in Python—environment, hardware, scaling, and the HTTPS endpoint—and then hang your own domain in front of it.
Quick Answer: You deploy a FastAPI endpoint on Modal by returning a `FastAPI()` app from a function decorated with `@modal.asgi_app()` (or, for a single route, by decorating a plain function with `@modal.fastapi_endpoint()`), then running `modal deploy ...` to publish it as a web service. To put it behind a custom domain, terminate HTTPS for that domain at an edge of your choice (e.g., Cloudflare, a cloud load balancer, or your own reverse proxy), point your DNS (typically a CNAME) at that edge, and have the edge proxy requests to the Modal-generated URL.
Why This Matters
If you’re shipping AI-backed APIs, your FastAPI endpoints are often the public face of a pretty complex backend: GPU-heavy inference, batch workloads, or sandboxed tools. The last thing you want is for deployment and DNS plumbing to slow you down. Modal gives you an AI-native runtime with sub-second cold starts and instant autoscaling, while your users still see a clean, branded domain like api.yourcompany.com.
Running FastAPI on Modal lets you:
- iterate in Python instead of gluing together container YAML, load balancers, and certificates;
- scale to thousands of concurrent requests without provisioning GPUs yourself;
- keep endpoints isolated with gVisor-based sandboxes and team-level controls, while still fronting everything with your own domain and TLS setup.
Key Benefits:
- Python-defined infrastructure: Environment, hardware, and routing are just Python code (Images, decorators, and `modal deploy`), so you don’t maintain extra config layers.
- Elastic capacity for AI workloads: Autoscale CPU/GPU containers across clouds, including H100/A100/A10G, without reserving capacity ahead of time.
- Production-grade endpoints with clean URLs: Expose FastAPI apps as HTTPS endpoints on Modal and put your own domain in front via DNS and an edge proxy/CDN.
Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Modal App | A `modal.App(...)` object defined in a Python module, to which you attach Functions, Classes, and web endpoints. Deployed with `modal deploy`. | This is your deployment unit: logs, scaling, and resource usage are tracked per app in the Modal dashboard. |
| ASGI / FastAPI endpoint | Either a function decorated with `@modal.asgi_app()` that returns a FastAPI (or other ASGI) app, or a single function decorated with `@modal.fastapi_endpoint()`; both are exposed as HTTPS endpoints. | This is how you turn local FastAPI code into an autoscaled web service with sub-second cold starts. |
| Custom domain fronting | Pointing a custom DNS name (e.g., api.example.com) at an edge such as Cloudflare or a cloud load balancer, which terminates TLS and proxies to the Modal endpoint. | Users get a stable, branded URL, while Modal handles autoscaling containers and routing under the hood. |
How It Works (Step-by-Step)
At a high level, you:
- Define a Modal Image and App in Python.
- Wrap your FastAPI app in an ASGI endpoint on Modal.
- Deploy with `modal deploy` and test the generated URL.
- Put a reverse proxy / edge layer (Cloudflare, Nginx, etc.) in front of that URL.
- Point your custom domain at the edge and configure TLS.
Let’s walk through each step.
1. Define your Modal app and Image
First, codify your runtime environment. This is where you install FastAPI and any other dependencies.
```python
# fastapi_modal_app.py
import modal

image = (
    modal.Image.debian_slim()
    .pip_install(
        "fastapi[standard]",
        "uvicorn[standard]",
        # Add your app deps here:
        "pydantic",
        "requests",
    )
)

app = modal.App("fastapi-on-modal", image=image)
```
A few notes:
- `modal.Image.debian_slim()` gives you a slim Debian base, which keeps image builds and cold starts fast.
- Pin versions tightly in real deployments (e.g., `fastapi==0.115.0`). That keeps behavior reproducible across deploys.
- You can also add system packages (e.g., via `.apt_install("git", "ffmpeg")`) if you’re doing media or GPU work.
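To enforce the pinning advice, a small pre-deploy check can flag requirement strings that lack an exact `==` pin. This is a minimal sketch: the `find_unpinned` helper and the idea of running it in CI are illustrative assumptions, not part of Modal.

```python
import re

# Hypothetical helper: a requirement counts as pinned only if it is
# "name[optional-extras]==version" with nothing else attached.
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?==[^=\s]+$")


def find_unpinned(requirements: list[str]) -> list[str]:
    """Return the requirement strings that are not pinned to an exact version."""
    return [req for req in requirements if not _PINNED.match(req.strip())]


if __name__ == "__main__":
    reqs = ["fastapi[standard]", "sentence-transformers==2.7.0", "torch==2.2.0", "pydantic>=2"]
    print(find_unpinned(reqs))  # ['fastapi[standard]', 'pydantic>=2']
```

Feeding it the same list you pass to `.pip_install(...)` and failing the build on a non-empty result keeps unpinned packages from slipping into a deploy.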
2. Wrap your FastAPI app with @modal.asgi_app
Next, you create a FastAPI app inside a function decorated with @modal.asgi_app(). Modal will expose this as a web endpoint.
```python
# fastapi_modal_app.py (continued)

@app.function()
@modal.asgi_app()
def fastapi_app():
    # Import inside the function: FastAPI then only needs to be installed in the
    # container image, not on the machine that runs `modal deploy`.
    from fastapi import FastAPI

    web_app = FastAPI(title="My FastAPI on Modal")

    @web_app.get("/healthz")
    async def healthz():
        return {"status": "ok"}

    @web_app.get("/hello")
    async def hello(name: str = "world"):
        return {"message": f"Hello, {name}!"}

    return web_app
```
What this does:
- `@app.function()` tells Modal this is a deployable Function (it runs in containers built from the `image` we defined).
- `@modal.asgi_app()` serves the ASGI app that the function returns as a web server. Modal will:
  - spin up containers running this app,
  - expose a public HTTPS URL,
  - handle routing and autoscaling based on incoming load.
If you prefer one-function-per-route, you can also use @modal.fastapi_endpoint, but for a “normal” FastAPI app with multiple routes, @modal.asgi_app() is the most straightforward.
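Because `@modal.asgi_app()` accepts any ASGI application, not only FastAPI, it can help to see the interface that decorator expects the function to return. The sketch below is a hand-written ASGI callable serving `/healthz`, driven in-process with fake `receive`/`send` callables; it only illustrates the ASGI protocol and is not how Modal invokes your app internally.

```python
import asyncio
import json


async def health_app(scope, receive, send):
    """Minimal ASGI app: JSON 200 on GET /healthz, JSON 404 otherwise."""
    if scope["type"] != "http":
        return  # this sketch ignores lifespan and websocket scopes
    if scope["method"] == "GET" and scope["path"] == "/healthz":
        status, payload = 200, {"status": "ok"}
    else:
        status, payload = 404, {"detail": "Not Found"}
    body = json.dumps(payload).encode("utf-8")
    # An HTTP response is two messages: the start (status + headers), then the body.
    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(body)).encode("ascii")),
        ],
    })
    await send({"type": "http.response.body", "body": body})


async def call(app, method: str, path: str):
    """Drive an ASGI app in-process and return (status, decoded JSON body)."""
    scope = {"type": "http", "method": method, "path": path, "headers": []}
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app(scope, receive, send)
    return sent[0]["status"], json.loads(sent[1]["body"])


if __name__ == "__main__":
    print(asyncio.run(call(health_app, "GET", "/healthz")))  # (200, {'status': 'ok'})
```

FastAPI builds this same `(scope, receive, send)` callable for you, which is why returning `web_app` from the decorated function is enough.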
3. Deploy the FastAPI endpoint to Modal
Once the app file is ready, you deploy with the CLI.
From the same directory:

```shell
modal deploy fastapi_modal_app.py
```

Or, if you’ve packaged it as a module:

```shell
modal deploy -m fastapi_modal_app
```
The deploy does a few things:
- Builds your Image (installing all pip packages).
- Registers your `fastapi_app` ASGI endpoint with Modal.
- Returns a public URL for the endpoint, printed in the CLI and visible on the Modal apps page. It typically has the form `https://<workspace>--fastapi-on-modal-fastapi-app.modal.run`.
You can test it immediately:

```shell
curl "https://<your-modal-url>/healthz"
# {"status": "ok"}

curl "https://<your-modal-url>/hello?name=Modal"
# {"message": "Hello, Modal!"}
```
You now have a production-ready HTTPS FastAPI endpoint running on Modal’s infrastructure.
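If you would rather script that check (for example, in CI right after `modal deploy`), a stdlib-only smoke test might look like the sketch below. The `fetch` parameter is an assumption of this sketch, injected so the checking logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request


def http_get_json(url: str, timeout: float = 30.0):
    """GET a URL and return (status_code, parsed_json_body).

    urlopen raises urllib.error.HTTPError for 4xx/5xx, which also fails the check.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, json.loads(resp.read().decode("utf-8"))


def smoke_test(base_url: str, fetch=http_get_json) -> list[str]:
    """Check /healthz and /hello; return failure messages (empty list means pass)."""
    base = base_url.rstrip("/")
    failures = []

    status, body = fetch(f"{base}/healthz")
    if status != 200 or body != {"status": "ok"}:
        failures.append(f"/healthz returned {status}: {body}")

    query = urllib.parse.urlencode({"name": "Modal"})
    status, body = fetch(f"{base}/hello?{query}")
    if status != 200 or body.get("message") != "Hello, Modal!":
        failures.append(f"/hello returned {status}: {body}")

    return failures
```

Call it with the URL that `modal deploy` printed, e.g. `smoke_test("https://<your-modal-url>")`, and fail the pipeline if the returned list is non-empty. Later, the same check works unchanged against your custom domain.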
4. Put an edge / reverse proxy in front (for custom domain)
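Out of the box, your endpoint lives on a Modal-managed `*.modal.run` hostname, and Modal issues and manages the certificate for it. (Modal also supports custom domains natively on some plans, via a `custom_domains` argument on its web endpoint decorators; check Modal's custom domains documentation for availability.) To serve your API on a domain like api.example.com through your own edge, the standard pattern is:
- Terminate TLS and host the custom domain at an edge (Cloudflare, AWS CloudFront + ACM, GCP HTTPS LB, or even a small Nginx/Traefik proxy).
- Configure that edge to proxy traffic to the Modal endpoint URL, sending the Modal hostname (not your custom domain) as the upstream `Host`.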
Example: using Cloudflare as the edge
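- Create a CNAME:
  - Name: `api`
  - Target: `your-app--fastapi-app-xyz.modal.run` (or whatever hostname Modal gives you)
  - Proxy status: Proxied (orange cloud), so that Cloudflare terminates TLS.
- Rewrite the upstream host for `https://api.example.com/*`:
  - Modal routes requests by hostname, so a request arriving with `Host: api.example.com` won't reach your app unless that domain is registered with Modal. Have Cloudflare send `your-app--fastapi-app-xyz.modal.run` as the `Host` instead, e.g. with an Origin Rule Host header override (availability depends on your plan) or a small Worker that `fetch`es the Modal URL with the same path and query.
  - Don't use a Page Rule "Forwarding URL" for this: it answers with an HTTP redirect, so clients end up on the raw Modal URL instead of being proxied.
  - Preserve headers; Cloudflare sends `X-Forwarded-For`, `X-Forwarded-Proto`, and `CF-Connecting-IP` to the origin, which matters if you care about client IPs inside the app.
- Certificates:
  - Cloudflare issues and manages the TLS cert for `api.example.com`.
  - The hop from Cloudflare to Modal remains HTTPS; the Full (strict) SSL/TLS mode makes Cloudflare verify Modal's certificate on that hop.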
At this point, traffic flows like:
client → https://api.example.com → Cloudflare/edge → https://your-app--fastapi-app-xyz.modal.run → FastAPI on Modal.
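Since requests now reach FastAPI through one or more proxies, the TCP peer your app sees is a proxy, not the client. Recovering the client IP from `X-Forwarded-For` means trusting only the entries your own proxies appended, because clients can send arbitrary values. A minimal sketch, assuming you know how many trusted proxies append to the header; `trusted_hops` is a hypothetical parameter you would set from your own topology (Cloudflare alone, or Cloudflare plus any proxy in front of your app), which this article does not specify.

```python
from typing import Optional


def client_ip_from_xff(xff: Optional[str], trusted_hops: int = 1) -> Optional[str]:
    """Return the client address recorded by the outermost trusted proxy.

    Each trusted proxy appends the address of the peer that connected to it,
    so the client sits `trusted_hops` entries from the right. Anything further
    left was supplied by the client and cannot be trusted.
    """
    if not xff or trusted_hops < 1:
        return None
    entries = [e.strip() for e in xff.split(",") if e.strip()]
    if len(entries) < trusted_hops:
        return None  # fewer entries than expected proxies: don't guess
    return entries[-trusted_hops]
```

Inside a FastAPI route you would pass it `request.headers.get("x-forwarded-for")`. When Cloudflare is the only edge, reading `CF-Connecting-IP` directly is a simpler alternative.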
You can do the same with:
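- Nginx/Traefik: Terminate TLS on your own VM and proxy `location /` to the Modal URL. In Nginx, `proxy_pass https://<your-modal-host>;` already sends the upstream hostname as `Host` by default; also set `proxy_ssl_server_name on;` so the TLS handshake carries that hostname via SNI.
- AWS/GCP LBs: Use the Modal endpoint as the origin and attach your own domain + certificate.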
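Whichever edge you pick, the core request rewrite is the same: keep the path and query, swap the host for the Modal one, drop hop-by-hop headers, and record where the request came from. A pure-Python sketch of that transformation follows; `MODAL_ORIGIN` and `rewrite_request` are hypothetical names, and in practice this logic lives in Nginx, Cloudflare, or your load balancer rather than in Python.

```python
from urllib.parse import urlsplit

# Hypothetical origin; substitute the URL that `modal deploy` printed.
MODAL_ORIGIN = "https://your-app--fastapi-app-xyz.modal.run"

# Hop-by-hop headers apply to a single connection and must not be forwarded.
HOP_BY_HOP = {"connection", "keep-alive", "proxy-connection", "te",
              "trailer", "transfer-encoding", "upgrade"}


def rewrite_request(path_and_query: str, headers: dict[str, str], client_ip: str,
                    origin: str = MODAL_ORIGIN) -> tuple[str, dict[str, str]]:
    """Map an incoming edge request onto the Modal origin.

    Returns the upstream URL and the (lower-cased) headers to send upstream.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    out = {k: v for k, v in lowered.items() if k not in HOP_BY_HOP and k != "host"}
    # Modal routes by hostname, so the upstream Host must be the Modal host.
    out["host"] = urlsplit(origin).hostname
    # Append the connecting client to any existing X-Forwarded-For chain.
    prior = out.get("x-forwarded-for")
    out["x-forwarded-for"] = f"{prior}, {client_ip}" if prior else client_ip
    out["x-forwarded-proto"] = "https"
    if "host" in lowered:
        out["x-forwarded-host"] = lowered["host"]  # keep the custom domain visible
    return origin.rstrip("/") + path_and_query, out
```

The key design point is that the custom domain survives only as `X-Forwarded-Host`; the `Host` (and, in a real proxy, the TLS SNI) must name the Modal endpoint.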
5. Optional: configure routing and concurrency
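For more realistic workloads, you’ll want to tune how many requests each container handles at once. In Modal 1.0 and later this is `@modal.concurrent(max_inputs=...)`; older SDKs used `allow_concurrent_inputs=` on `@app.function()`. Note that the old `concurrency_limit` parameter capped the number of containers, not requests per container (it is now called `max_containers`).

```python
@app.function()
@modal.concurrent(max_inputs=64)  # up to 64 requests in flight per container
@modal.asgi_app()
def fastapi_app():
    from fastapi import FastAPI

    web_app = FastAPI(title="FastAPI on Modal with tuned concurrency")
    # routes...
    return web_app
```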
Use this when:
- You’re doing GPU inference and want to bound concurrent requests per GPU (e.g., batch size + max simultaneous decoding).
- You have blocking calls or slow IO and want to control latency under load.
Modal’s runtime handles autoscaling: if you saturate one container, it spins up more (across thousands of GPUs/CPUs if needed) as requests increase.
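As a rough mental model (not Modal's actual autoscaling algorithm, which also accounts for container startup time, idle scale-down, and warm buffers), the per-container concurrency limit determines how many containers a given load needs. The `containers_needed` helper below is hypothetical:

```python
import math
from typing import Optional


def containers_needed(in_flight: int, max_inputs: int,
                      max_containers: Optional[int] = None) -> int:
    """Rough container count for a given number of concurrent requests.

    Simplified model only: ignores startup time, scale-down windows, and buffers.
    """
    if max_inputs < 1:
        raise ValueError("max_inputs must be at least 1")
    needed = math.ceil(in_flight / max_inputs) if in_flight > 0 else 0
    # With a container cap, requests beyond capacity queue instead of scaling out.
    if max_containers is not None:
        needed = min(needed, max_containers)
    return needed
```

At 200 concurrent requests, a limit of 64 per container works out to four containers; lowering the limit to 32 for GPU-bound work raises that to seven, trading more containers for lower per-request latency.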
Common Mistakes to Avoid
- Ignoring cold-start behavior: If you load large models or caches inside request handlers, every request pays that cost, and every new container pays it again. Instead, use `@app.cls` with `@modal.enter` to load weights once per container, and keep FastAPI endpoints as thin orchestration layers that call those classes.
- Hardcoding hostnames and protocols: Don’t bake the Modal URL into clients or callbacks that will later move behind a custom domain. Use environment variables or a config dict so you can switch from `https://<modal-url>` to `https://api.example.com` without code changes. Also avoid assuming `http://`; Modal endpoints are HTTPS by default.
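For the hostname point, here is a minimal sketch of reading the base URL from the environment. The variable name `API_BASE_URL` and the `api_url` helper are hypothetical choices for illustration.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit


def api_url(path: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Build a full endpoint URL from the API_BASE_URL environment variable."""
    env = os.environ if env is None else env
    base = env.get("API_BASE_URL")
    if not base:
        raise RuntimeError("Set API_BASE_URL, e.g. https://api.example.com")
    # Refuse plain http:// so a misconfiguration fails loudly instead of downgrading.
    if urlsplit(base).scheme != "https":
        raise ValueError(f"API_BASE_URL must use https://, got {base!r}")
    return base.rstrip("/") + "/" + path.lstrip("/")
```

Moving clients from the Modal URL to the custom domain then becomes a deployment-time change (`API_BASE_URL=https://api.example.com`) rather than a code change.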
Real-World Example
Imagine you’re shipping a text-embedding API used by your product and internal tools. You want:
- A simple FastAPI interface: `POST /embed` with text, returns vectors.
- GPUs on demand, not 24/7 A100s burning money.
- A stable custom domain: `https://embeddings.yourcompany.com`.
Here’s how you’d implement it on Modal:
```python
# embeddings_api.py
import modal

image = (
    modal.Image.debian_slim()
    .pip_install(
        "fastapi[standard]",
        "uvicorn[standard]",
        "sentence-transformers==2.7.0",
        "torch==2.2.0",
    )
)

app = modal.App("embeddings-api", image=image)


@app.cls(gpu="A10G:1")
class EmbeddingModel:
    @modal.enter()
    def setup(self):
        # Runs once when a container starts, so weights load once, not per request.
        from sentence_transformers import SentenceTransformer

        self.model = SentenceTransformer("all-MiniLM-L6-v2")

    @modal.method()
    def embed(self, texts: list[str]) -> list[list[float]]:
        # encode() is synchronous and compute-bound, so a plain (non-async) method fits.
        import torch

        with torch.no_grad():
            return self.model.encode(texts, convert_to_numpy=True).tolist()


@app.function()
@modal.asgi_app()
def fastapi_app():
    from fastapi import FastAPI
    from pydantic import BaseModel

    web_app = FastAPI(title="Embeddings API")

    class EmbedRequest(BaseModel):
        texts: list[str]

    model = EmbeddingModel()

    @web_app.post("/embed")
    async def embed_endpoint(payload: EmbedRequest):
        # Call the GPU class remotely without blocking the event loop.
        embs = await model.embed.remote.aio(payload.texts)
        return {"embeddings": embs}

    @web_app.get("/healthz")
    async def healthz():
        return {"status": "ok"}

    return web_app
```
Deployment:

```shell
modal deploy embeddings_api.py
```

You get a URL like `https://<workspace>--embeddings-api-fastapi-app.modal.run`.
Now put embeddings.yourcompany.com in front via Cloudflare (or your proxy of choice), and your clients talk to:
```shell
curl -X POST "https://embeddings.yourcompany.com/embed" \
  -H "Content-Type: application/json" \
  -d '{"texts": ["hello world", "Modal is fun"]}'
```
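The same call from Python, using only the standard library, might look like this sketch. `build_embed_request` and `embed` are hypothetical helpers, and the response shape is assumed to match the `{"embeddings": [...]}` body that `embed_endpoint` returns above.

```python
import json
import urllib.request


def build_embed_request(base_url: str, texts: list[str]) -> urllib.request.Request:
    """Prepare a POST /embed request without sending it."""
    body = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(base_url: str, texts: list[str], timeout: float = 60.0) -> list[list[float]]:
    """Send the request and return one embedding vector per input text."""
    with urllib.request.urlopen(build_embed_request(base_url, texts), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["embeddings"]
```

For example, `embed("https://embeddings.yourcompany.com", ["hello world"])` should return a single vector; `all-MiniLM-L6-v2` produces 384-dimensional embeddings.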
Behind the scenes, Modal:
- starts GPU-backed containers,
- loads the model once per container (`@modal.enter`),
- autoscales containers based on traffic,
- exposes logs and metrics on the apps page.
Your infra footprint is basically one Python file plus a DNS record and an edge proxy rule.
Pro Tip: When you front Modal with a proxy that does retries (Cloudflare, LBs), align it with Modal’s own retry semantics. For non-idempotent endpoints, disable or narrow automatic retries at the edge and instead use `modal.Retries` in the parts of your backend that are safe to repeat (downstream calls, batch jobs, etc.).
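A sketch of such an edge-side policy, assuming you control retries in your own proxy or client: only methods that RFC 9110 defines as idempotent are retried, and only on gateway-style errors, with exponential backoff. The `should_retry` and `backoff_delay` helpers and their default numbers are illustrative, similar in spirit to the backoff settings `modal.Retries` offers but not its implementation.

```python
# RFC 9110 idempotent methods: sending them twice has the same effect as once.
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"}
# Status codes that usually indicate a transient upstream problem.
RETRYABLE_STATUS = {502, 503, 504}


def should_retry(method: str, status: int, attempt: int, max_attempts: int = 3) -> bool:
    """Decide whether the edge may resend a failed request.

    `attempt` counts attempts already made (1 after the first try).
    """
    return (
        method.upper() in IDEMPOTENT_METHODS
        and status in RETRYABLE_STATUS
        and attempt < max_attempts
    )


def backoff_delay(attempt: int, initial_delay: float = 1.0,
                  coefficient: float = 2.0, max_delay: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based), capped at max_delay."""
    return min(initial_delay * coefficient ** (attempt - 1), max_delay)
```

Under this policy a `POST` to an endpoint with side effects that hits a 503 is surfaced to the client rather than replayed, while a `GET /healthz` is retried after 1 s and then 2 s before giving up.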
Summary
Deploying a FastAPI endpoint on Modal and putting it behind a custom domain boils down to:
- Defining your environment and app in Python (`modal.Image`, `modal.App`).
- Wrapping your FastAPI app with `@modal.asgi_app()` and deploying via `modal deploy`.
- Using an edge (Cloudflare, Nginx, or a cloud LB) to terminate TLS on your own domain and proxy to the Modal endpoint.
You get an AI-native runtime with fast cold starts, instant autoscaling across GPUs and CPUs, and production-grade isolation, without giving up a clean, branded URL for your API consumers.