How do I deploy a FastAPI endpoint on Modal and put it behind a custom domain?

Most teams hit the same wall the first time they put FastAPI into production: the app is easy to write, but wiring up GPU-capable infrastructure, autoscaling, and a clean custom domain is an afternoon of YAML and DNS spelunking. With Modal, you define the whole thing in Python—environment, hardware, scaling, and the HTTPS endpoint—and then hang your own domain in front of it.

Quick Answer: You deploy a FastAPI endpoint on Modal by returning a FastAPI() app from a function decorated with @modal.asgi_app() (or by decorating a single handler with @modal.fastapi_endpoint()), then running modal deploy ... to publish it as a web service. To put it behind a custom domain, you place an edge in front of the Modal-generated URL (e.g., Cloudflare, a reverse proxy, or a cloud load balancer), terminate HTTPS for your domain there, and point your DNS record at that edge (with Cloudflare, typically a proxied CNAME).

Why This Matters

If you’re shipping AI-backed APIs, your FastAPI endpoints are often the public face of a pretty complex backend: GPU-heavy inference, batch workloads, or sandboxed tools. The last thing you want is for deployment and DNS plumbing to slow you down. Modal gives you an AI-native runtime with sub-second cold starts and instant autoscaling, while your users still see a clean, branded domain like api.yourcompany.com.

Running FastAPI on Modal lets you:

  • iterate in Python instead of gluing together container YAML, load balancers, and certificates;
  • scale to thousands of concurrent requests without provisioning GPUs yourself;
  • keep endpoints isolated with gVisor-based sandboxes and team-level controls, while still fronting everything with your own domain and TLS setup.

Key Benefits:

  • Python-defined infrastructure: Environment, hardware, and routing are just Python code (Images, decorators, and modal deploy), so you don’t maintain extra config layers.
  • Elastic capacity for AI workloads: Autoscale CPU/GPU containers across clouds, including H100/A100/A10G, without quotas or reservations.
  • Production-grade endpoints with clean URLs: Expose FastAPI apps as HTTPS endpoints on Modal and put your own domain in front via DNS and an edge proxy/CDN.

Core Concepts & Key Points

Concept | Definition | Why it’s important
Modal App | A modal.App(...) object defined in a Python module, plus the Functions, Classes, and web endpoints registered on it. Deployed with modal deploy. | This is your deployment unit: logs, scaling, and resource usage are tracked per app in the Modal dashboard.
ASGI / FastAPI endpoint | A function decorated with @modal.asgi_app() that returns a FastAPI (or other ASGI) app, or a single handler decorated with @modal.fastapi_endpoint(). Either one is exposed as an HTTPS endpoint. | This is how you turn local FastAPI code into an autoscaled web service with sub-second cold starts.
Custom domain fronting | Pointing a custom DNS name (e.g., api.example.com) at an edge like Cloudflare or your own proxy, which terminates TLS and forwards requests to the Modal endpoint. | Users get a stable, branded URL, while Modal handles autoscaling containers and routing under the hood.

How It Works (Step-by-Step)

At a high level, you:

  1. Define a Modal Image and App in Python.
  2. Wrap your FastAPI app in an ASGI endpoint on Modal.
  3. Deploy with modal deploy and test the generated URL.
  4. Put a reverse proxy / edge layer (Cloudflare, Nginx, etc.) in front of that URL, point your custom domain at it, and configure TLS.
  5. Optionally, tune per-container concurrency.

Let’s walk through each step.

1. Define your Modal app and Image

First, codify your runtime environment. This is where you install FastAPI and any other dependencies.

# fastapi_modal_app.py
import modal

image = (
    modal.Image.debian_slim()
    .pip_install(
        "fastapi[standard]",
        "uvicorn[standard]",
        # Add your app deps here:
        "pydantic",
        "requests",
    )
)

app = modal.App("fastapi-on-modal", image=image)

A few notes:

  • modal.Image.debian_slim() gives you a slim base image with fast builds and cold starts.
  • Pin versions tightly in real deployments: fastapi==0.115.0 etc. That keeps behavior reproducible across deploys.
  • You can also add system packages (e.g., via .apt_install("git", "ffmpeg")) if you’re doing media or GPU work.

2. Wrap your FastAPI app with @modal.asgi_app

Next, you create a FastAPI app inside a function decorated with @modal.asgi_app(). Modal will expose this as a web endpoint.

# fastapi_modal_app.py (continued)
from fastapi import FastAPI

@app.function()
@modal.asgi_app()
def fastapi_app():
    web_app = FastAPI(title="My FastAPI on Modal")

    @web_app.get("/healthz")
    async def healthz():
        return {"status": "ok"}

    @web_app.get("/hello")
    async def hello(name: str = "world"):
        return {"message": f"Hello, {name}!"}

    return web_app

What this does:

  • @app.function() tells Modal this is a deployable Function (runs in containers with the image we defined).
  • @modal.asgi_app() wraps that function as an ASGI web server. Modal will:
    • spin up containers running this app,
    • expose a public HTTPS URL,
    • handle routing and autoscaling based on incoming load.

If you prefer one-function-per-route, you can also use @modal.fastapi_endpoint, but for a “normal” FastAPI app with multiple routes, @modal.asgi_app() is the most straightforward.

3. Deploy the FastAPI endpoint to Modal

Once the app file is ready, you deploy with the CLI.

From the same directory:

modal deploy fastapi_modal_app.py

Or, if you’ve packaged it as a module:

modal deploy -m fastapi_modal_app

The deploy does a few things:

  • Builds your Image (installing all pip packages).
  • Registers your fastapi_app ASGI endpoint with Modal.
  • Returns a public URL for the endpoint—printed in the CLI and visible on the Modal apps page.

You can test it immediately:

curl "https://<your-modal-url>/healthz"
# {"status": "ok"}

curl "https://<your-modal-url>/hello?name=Modal"
# {"message": "Hello, Modal!"}

You now have a production-ready HTTPS FastAPI endpoint running on Modal’s infrastructure.
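If you would rather script that check than run curl by hand, a small standard-library smoke test could look like this. The function names are hypothetical, and the base URL is whatever modal deploy printed for you.

```python
import json
import sys
import urllib.parse
import urllib.request


def build_url(base_url: str, path: str, **params: str) -> str:
    """Join a base URL, a path, and optional query parameters."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def get_json(url: str, timeout: float = 10.0) -> dict:
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def smoke_test(base_url: str) -> None:
    """Check the two routes defined in step 2."""
    assert get_json(build_url(base_url, "/healthz")) == {"status": "ok"}
    body = get_json(build_url(base_url, "/hello", name="Modal"))
    assert body == {"message": "Hello, Modal!"}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python smoke_test.py https://<your-modal-url>
    smoke_test(sys.argv[1])
    print("smoke test passed")
```

The same script works unchanged against the custom domain later, since only the base URL differs.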

4. Put an edge / reverse proxy in front (for custom domain)

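Modal issues and manages certificates for its own modal.run domains. Modal’s documentation also describes a built-in custom domains feature whose availability depends on your plan, so check that first; the pattern below is the generic alternative that works with any edge. To serve your API on a custom domain like api.example.com this way, the standard pattern is: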

  • Terminate TLS and host the custom domain at an edge (Cloudflare, AWS CloudFront + ACM, GCP HTTPS LB, or even a small Nginx/Traefik proxy).
  • Configure that edge to forward traffic to the Modal endpoint URL.

Example: using Cloudflare as the edge

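  1. Create a CNAME:
    • Name: api
    • Target: your-app--fastapi-app-xyz.modal.run (or whatever hostname Modal gives you)
    • Proxy status: Proxied (orange cloud) if you want Cloudflare to terminate TLS.
  2. Route configuration:
    • Proxy every path under https://api.example.com/ to the same path on https://your-app--fastapi-app-xyz.modal.run/. Use a proxying mechanism (for example, a Worker or an origin rule) rather than a “Forwarding URL” page rule, which issues a redirect and would expose the Modal URL to clients.
    • The Modal endpoint is addressed by hostname, so the edge will most likely need to send the Modal hostname as the Host header on the upstream request; verify this against your own setup.
    • Preserve the remaining headers; optionally add X-Forwarded-For / X-Forwarded-Proto if you care about client IPs inside the app.
  3. Certificates:
    • Cloudflare issues and manages the TLS cert for api.example.com.
    • The hop from Cloudflare to Modal remains HTTPS.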

At this point, traffic flows like:

client → https://api.example.com → Cloudflare/edge → https://your-app--fastapi-app-xyz.modal.run → FastAPI on Modal.

You can do the same with:

  • Nginx/Traefik: Terminate TLS on your own VM and proxy location / to the Modal URL.
  • AWS/GCP LBs: Use the Modal endpoint as the origin and attach your own domain + certificate.

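5. Optional: tune concurrency

For more realistic workloads, you’ll want to bound how many requests each container handles at once.

@app.function()
@modal.concurrent(max_inputs=64)  # max concurrent requests per container
@modal.asgi_app()
def fastapi_app():
    # Note: concurrency_limit (max_containers in newer clients) caps the number
    # of containers, not the number of requests per container. On older clients
    # the per-container setting was allow_concurrent_inputs=64 on @app.function().
    web_app = FastAPI(title="FastAPI on Modal with tuned concurrency")
    # routes...
    return web_app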

Use this when:

  • You’re doing GPU inference and want to bound concurrent requests per GPU (e.g., batch size + max simultaneous decoding).
  • You have blocking calls or slow IO and want to control latency under load.

Modal’s runtime handles autoscaling: if you saturate one container, it spins up more (across thousands of GPUs/CPUs if needed) as requests increase.
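To get a feel for what a per-container limit implies, here is a back-of-the-envelope calculation. It is a simplification that only divides load by the limit; it does not model how Modal’s autoscaler actually decides when to add containers, and the function name is hypothetical.

```python
import math


def containers_needed(concurrent_requests: int, max_inputs_per_container: int) -> int:
    """Lower bound on containers needed to serve a given number of in-flight requests."""
    if max_inputs_per_container < 1:
        raise ValueError("max_inputs_per_container must be at least 1")
    if concurrent_requests <= 0:
        return 0
    return math.ceil(concurrent_requests / max_inputs_per_container)


# 1,000 in-flight requests with a per-container limit of 64 need at least 16 containers.
print(containers_needed(1000, 64))  # prints 16
```

Lowering the per-container limit trades more containers (and more cold starts) for less contention inside each one.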

Common Mistakes to Avoid

  • Ignoring cold-start behavior:
    If you treat a FastAPI endpoint like a permanent single server, you’ll end up loading large models or caches per request. Instead, use @app.cls with @modal.enter when you need to load weights once per container, and keep FastAPI endpoints thin orchestration layers that call those functions.

  • Hardcoding hostnames and protocols:
    Don’t bake the Modal URL into clients or callbacks that will later move behind a custom domain. Use environment variables or a config dict so you can switch from https://<modal-url> to https://api.example.com without code changes. Also avoid assuming http://—Modal endpoints are HTTPS by default.
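One way to avoid hardcoded hostnames is to read the base URL from the environment. This is a minimal sketch; the variable name API_BASE_URL, the default, and the helper names are assumptions for illustration.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Hypothetical default; override with the API_BASE_URL environment variable.
DEFAULT_BASE_URL = "https://api.example.com"


def api_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured base URL without a trailing slash."""
    env = os.environ if env is None else env
    return env.get("API_BASE_URL", DEFAULT_BASE_URL).rstrip("/")


def endpoint(path: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Build a full URL for a path on the configured API host."""
    return f"{api_base_url(env)}/{path.lstrip('/')}"
```

During the migration you can set API_BASE_URL to the Modal URL in one environment and to the custom domain in another, with no code changes.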

Real-World Example

Imagine you’re shipping a text-embedding API used by your product and internal tools. You want:

  • A simple FastAPI interface: POST /embed with text, returns vectors.
  • GPUs on demand, not 24/7 A100s burning money.
  • A stable custom domain: https://embeddings.yourcompany.com.

Here’s how you’d implement it on Modal:

# embeddings_api.py
import modal

image = (
    modal.Image.debian_slim()
    .pip_install(
        "fastapi[standard]",
        "uvicorn[standard]",
        "sentence-transformers==2.7.0",
        "torch==2.2.0",
        # torch 2.2.0 predates NumPy 2, so keep NumPy on the 1.x line.
        "numpy<2",
    )
)

app = modal.App("embeddings-api", image=image)

@app.cls(gpu="A10G:1")
class EmbeddingModel:
    @modal.enter()
    def setup(self):
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer("all-MiniLM-L6-v2")

    @modal.method()
    async def embed(self, texts: list[str]) -> list[list[float]]:
        import torch
        with torch.no_grad():
            embs = self.model.encode(texts, convert_to_numpy=True).tolist()
        return embs

@app.function()
@modal.asgi_app()
def fastapi_app():
    from fastapi import FastAPI
    from pydantic import BaseModel

    web_app = FastAPI(title="Embeddings API")

    class EmbedRequest(BaseModel):
        texts: list[str]

    model = EmbeddingModel()

    @web_app.post("/embed")
    async def embed_endpoint(payload: EmbedRequest):
        embs = await model.embed.remote.aio(payload.texts)
        return {"embeddings": embs}

    @web_app.get("/healthz")
    async def healthz():
        return {"status": "ok"}

    return web_app

Deployment:

modal deploy embeddings_api.py

You get a URL like:

https://embeddings-api--fastapi-app-xyz.modal.run

Now put embeddings.yourcompany.com in front via Cloudflare (or your proxy of choice), and your clients talk to:

curl -X POST "https://embeddings.yourcompany.com/embed" \
  -H "Content-Type: application/json" \
  -d '{"texts": ["hello world", "Modal is fun"]}'
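The same request from Python, as a minimal standard-library sketch. The helper names are hypothetical, and the response shape follows the /embed route defined above.

```python
import json
import urllib.request


def build_embed_request(base_url: str, texts: list[str]) -> urllib.request.Request:
    """Build the POST /embed request without sending it."""
    body = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(base_url: str, texts: list[str], timeout: float = 30.0) -> list[list[float]]:
    """Send the request and return one vector per input text."""
    request = build_embed_request(base_url, texts)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["embeddings"]
```

Calling embed("https://embeddings.yourcompany.com", ["hello world"]) would return a list containing one vector, assuming the deployment above is live.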

Behind the scenes, Modal:

  • starts GPU-backed containers,
  • loads the model once per container (@modal.enter),
  • autoscales containers based on traffic,
  • exposes logs and metrics on the apps page.

Your infra footprint is basically one Python file plus a DNS rule.

Pro Tip: When you front Modal with a proxy that does retries (Cloudflare, LBs), align it with Modal’s own retry semantics. For non-idempotent endpoints, disable or narrow automatic retries at the edge and instead use modal.Retries in the parts of your backend that are safe to repeat (downstream calls, batch jobs, etc.).

Summary

Deploying a FastAPI endpoint on Modal and putting it behind a custom domain boils down to:

  • Defining your environment and app in Python (modal.Image, modal.App).
  • Wrapping your FastAPI app with @modal.asgi_app() and deploying via modal deploy.
  • Using an edge (Cloudflare, Nginx, or a cloud LB) to terminate TLS on your own domain and proxy to the Modal endpoint.

You get an AI-native runtime with fast cold starts, instant autoscaling across GPUs and CPUs, and production-grade isolation, without giving up a clean, branded URL for your API consumers.
