
# Should I choose Modal Starter or Team if I need more than 3 seats and higher GPU concurrency?
If you’re already running real workloads on Modal and you’ve hit the limits of 3 seats and low GPU concurrency, you’re effectively deciding when to move from “single-team hacker project” to “production AI infra.” That’s the line between Modal Starter and Modal Team.
**Quick Answer:** If you need more than 3 seats and higher GPU concurrency, you’re firmly in Modal Team territory. Starter is optimized for small teams and early prototyping; Team is built for shared GPU capacity, higher parallelism, and governance so multiple engineers—and sometimes multiple products—can ship on top of the same Modal account.
## Why This Matters
Hitting the seat cap or GPU concurrency wall is a symptom, not the problem. It means:
- You’ve got more than one engineer touching the same infra surface.
- Your workloads aren’t just “run a demo once”; they’re serving traffic, training, or fanning out batch jobs.
- You can’t afford jobs to sit in queues or block on other people’s experiments.
Choosing between Modal Starter and Modal Team isn’t a pricing exercise; it’s about whether your infra should behave like a shared production substrate or a single-user sandbox. Team gives you higher GPU concurrency, more seats, and better operational controls, so you can keep shipping without micromanaging who is allowed to run what.
Key Benefits:
- More seats, less friction: On Team, you don’t have to play musical chairs with logins as your org grows past 3 engineers.
- Higher GPU concurrency: Run more training jobs, fine-tunes, evals, and inference endpoints in parallel without serializing everything behind one or two GPUs.
- Team-level controls and governance: Enforce access, manage costs, and keep production and experiments from stepping on each other.
## Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Seats | The number of individual users (engineers, data scientists, etc.) who can access and deploy to a Modal account. | If you need >3 people pushing code, debugging logs, and managing endpoints, you’ll outgrow Starter quickly. |
| GPU concurrency | How many GPU-backed containers can run simultaneously under your account. | Controls how many training jobs, LLM servers, or batch workers can be live at the same time without waiting in a queue. |
| Team account | A Modal plan designed for multi-engineer orgs with higher concurrency, credits, and governance features. | Lets you treat Modal as shared production infrastructure, not just individual dev sandboxes. |
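To make the GPU concurrency concept concrete, here is a minimal back-of-the-envelope simulation (plain Python; job durations and caps are made-up numbers, not Modal limits) of how jobs serialize once a plan’s concurrency cap is hit:

```python
import heapq

def total_makespan(job_minutes, max_concurrent_gpus):
    """Simulate FIFO scheduling of GPU jobs under a concurrency cap.

    Each job occupies one GPU slot for its full duration; returns the
    wall-clock minutes until every job has finished.
    """
    # Each heap entry is the time at which a GPU slot becomes free.
    slots = [0] * max_concurrent_gpus
    heapq.heapify(slots)
    finish = 0
    for duration in job_minutes:
        free_at = heapq.heappop(slots)  # earliest-available slot
        done = free_at + duration       # job runs to completion there
        finish = max(finish, done)
        heapq.heappush(slots, done)
    return finish

jobs = [60, 60, 30, 30, 30]  # e.g., two fine-tunes plus three eval batches
print(total_makespan(jobs, max_concurrent_gpus=1))  # 210 — everything queues
print(total_makespan(jobs, max_concurrent_gpus=5))  # 60 — everything runs at once
```

The gap between those two numbers is exactly what “higher GPU concurrency” buys you: the same work finishes in a fraction of the wall-clock time because nothing waits in a queue.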
## How It Works (Step-by-Step)
Let’s walk through how to reason about Starter vs Team when you need more seats and higher GPU concurrency.
1. **Map your workloads and concurrency needs**

   Write down what actually runs on Modal today (or what you expect to run in the next 1–3 months):
   - LLM inference endpoints (e.g., via `@modal.fastapi_endpoint` or `@modal.web_server`)
   - Training/fine-tuning jobs (multi-hour `@app.function` runs on A10G/A100/H100)
   - Batch processing that fans out with `.map()` or `.spawn()`
   - Sandboxed code execution (e.g., untrusted tools, eval frameworks)

   Then ask:
   - How many GPU-backed containers do you want running at the same time?
   - Do you need separate capacity for:
     - Long-running training jobs
     - Always-on inference endpoints
     - Spiky eval/batch traffic

   If the honest answer is “we routinely need multiple GPUs live for different workloads and can’t afford backlogs,” that’s a Modal Team signal.
2. **Count your engineers, not just users**

   Seats are not eyeballs; they’re people who actually:
   - Run `modal run` or `modal deploy`
   - Manage apps, logs, retries, and configuration
   - Touch secrets, Volumes, and production endpoints

   If you’re already at 3 people and talking about:
   - A dedicated ML engineer for training/fine-tuning
   - Backend engineers owning inference endpoints or tools
   - Data/infra folks managing ETL and batch pipelines

   …then Starter’s seat limit becomes a blocker, because:
   - You end up sharing a single login (bad for audits and security).
   - You can’t give everyone direct access to logs and metrics.
   - You slow down iteration because only a subset of the team can deploy.

   Modal Team removes the seat constraint so you can scale the “human backend” side along with your GPU usage.
3. **Layer in governance, cost control, and production guarantees**

   As soon as multiple engineers are deploying to the same production substrate, you need basic controls:
   - Permissioning and team controls: Who can deploy? Who can rotate Secrets? Who can touch production Volumes?
   - Quota-style thinking: How do you avoid one giant training job starving your LLM endpoints?
   - Security and compliance: gVisor isolation, SOC 2/HIPAA posture, and data residency controls start to matter when you’re handling real user data.

   Starter is built to get you shipping quickly. Team is built for “we’re putting this in front of customers, and multiple people need to operate it safely.” If you’re already asking questions about:
   - Separate environments (staging vs. prod in Modal apps)
   - Who can access logs or manage Proxy Auth tokens (`requires_proxy_auth=True`)
   - How to prevent surprise GPU spend

   …you’re in Team territory.
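On the “surprise GPU spend” question, even a rough spreadsheet-style estimate goes a long way. Here is a minimal sketch in plain Python; the hourly rates are placeholders for illustration, not Modal’s actual pricing (check the official pricing page for real numbers):

```python
# Rough monthly GPU spend estimate.
# Hourly rates below are illustrative placeholders, NOT Modal's real pricing.
HOURLY_RATE_USD = {"A10G": 1.10, "A100": 3.00, "H100": 5.00}

def monthly_spend(usage_hours_per_month):
    """usage_hours_per_month maps GPU type -> expected GPU-hours per month."""
    return sum(HOURLY_RATE_USD[gpu] * hours
               for gpu, hours in usage_hours_per_month.items())

# Example: an always-on A10G endpoint (24 h x 30 days)
# plus roughly 40 hours of A100 fine-tuning.
estimate = monthly_spend({"A10G": 24 * 30, "A100": 40})
print(round(estimate, 2))  # 912.0
```

Running this kind of estimate per team or per workload is the simplest version of the quota-style thinking described above: it tells you which workload dominates spend before the invoice does.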
## Common Mistakes to Avoid
- **Trying to stretch Starter past its intended scale:**
  Workarounds like sharing logins, manually scheduling who can run GPU jobs when, or pausing endpoints during training will crush your iteration speed and create production risk. If you’re doing this, move to Team.
- **Underestimating GPU concurrency needs:**
  People often plan for “average” usage instead of peaks. In reality, you’ll have moments where:
  - Evals fan out to hundreds of jobs using `.map()`
  - A multi-hour fine-tune runs on an A100
  - An LLM endpoint needs to absorb spiky traffic

  Plan for those peaks. If part of your workload must never queue (e.g., live inference), you want Team-level concurrency and autoscaling headroom.
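A quick way to plan for peaks rather than averages is a simple worksheet: list each workload with the number of GPU containers it needs at its peak, then sum the ones that can genuinely overlap. A minimal sketch in plain Python (the workload names and counts are illustrative, not Modal API calls):

```python
# Illustrative worksheet: peak GPU containers needed per workload.
# Names and numbers are made up for the example.
workloads = {
    "llm_inference_endpoint": 2,  # always-on, must never queue
    "nightly_eval_fanout": 8,     # spiky .map() batch work
    "finetune_job": 1,            # multi-hour A100 run
}

# Workloads that can realistically run at the same time
# (e.g., a fine-tune overlapping with nightly evals and live traffic).
overlapping = ["llm_inference_endpoint", "nightly_eval_fanout", "finetune_job"]

peak_gpu_concurrency = sum(workloads[name] for name in overlapping)
print(peak_gpu_concurrency)  # 11 concurrent GPU containers at peak
```

If that peak number sits well above what your current plan allows, that is the quantitative version of the “Modal Team signal” described earlier.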
## Real-World Example
Say you’re a startup building an AI coding assistant:
- You serve a GPT-style model through `@modal.fastapi_endpoint`, backed by GPUs, targeting near real-time latency.
- You run nightly evals using `.map()` across thousands of prompts to track regressions.
- You occasionally fine-tune or LoRA-adapt models on A100s, each job running for 2–8 hours.
- Your org setup:
  - 2 ML engineers working on models and training code
  - 2 backend engineers working on APIs and integration
  - 1 infra/DevOps engineer watching costs and reliability

On Starter, you immediately hit:

- **Seats:** You have 5 engineers who all need direct access to Modal. With a 3-seat cap, someone ends up blind—can’t view logs, can’t deploy—or you resort to shared credentials.
- **GPU concurrency:**
  - A training job is running on a GPU.
  - Evals want to spin up dozens of workers in parallel.
  - The inference endpoint must stay responsive.

  With low concurrency, something queues: either your evals take hours longer than necessary or, worse, your user-facing endpoint competes with internal workloads.

On Modal Team:

- All 5 engineers get their own seats, with clear roles and permissions.
- You configure your training functions with `@app.function(gpu="A100:1", timeout=24*60*60)` and let them run in parallel with your inference endpoints.
- You run evals via `.map()` whenever you want, without worrying about starving inference.
- The infra engineer has a single place (Modal apps and dashboards) to monitor usage, logs, and costs.
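The Team-side setup above can be sketched as a single Modal app file. Treat this as an illustrative deployment sketch, not a verbatim recipe: it assumes the `modal` package is installed, the app name and handlers are hypothetical, and the model, training, and scoring logic is stubbed out.

```python
import modal

app = modal.App("coding-assistant")  # hypothetical app name

# Web endpoints need FastAPI available inside the container image.
image = modal.Image.debian_slim().pip_install("fastapi[standard]")

# GPU-backed, user-facing inference endpoint. Proxy Auth keeps it from
# being publicly reachable without a token.
@app.function(gpu="A10G", image=image)
@modal.fastapi_endpoint(method="POST", requires_proxy_auth=True)
def generate(body: dict):
    prompt = body["prompt"]
    # ... run the model and return a completion (stubbed here) ...
    return {"completion": f"echo: {prompt}"}

# Long-running fine-tune: one A100, generous 24-hour timeout.
@app.function(gpu="A100:1", timeout=24 * 60 * 60)
def finetune(config: dict):
    # ... training loop (stubbed here) ...
    return {"status": "done"}

# Eval worker, fanned out with .map() over thousands of prompts.
@app.function(gpu="A10G")
def run_eval(prompt: str) -> bool:
    # ... score one prompt against the current model (stubbed here) ...
    return True

@app.local_entrypoint()
def nightly_evals():
    prompts = ["prompt-1", "prompt-2"]  # in practice: thousands
    passed = sum(run_eval.map(prompts))
    print(f"{passed}/{len(prompts)} evals passed")
```

Deployed with `modal deploy`, the point of Team-level concurrency is that the fine-tune, the eval fan-out, and the live endpoint can all hold GPU capacity at the same time instead of queueing behind each other.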
**Pro Tip:** Before you upgrade, write down your “must never queue” workloads (e.g., user-facing inference, critical cron jobs). Use those to size the GPU concurrency you ask for, and keep evals/batch work as the “elastic” layer that soaks up additional capacity.
## Summary
If you need more than 3 seats and higher GPU concurrency, you’ve already crossed the threshold where Modal Starter is going to slow you down. Starter is ideal for solo devs and very small teams validating ideas; Modal Team is designed for real multi-engineer orgs running production AI workloads with:
- Multiple users managing apps, logs, and deployments.
- Parallel GPU workloads: training, inference, evals, and batch.
- A need for governance, security, and predictable performance.
The moment you’re negotiating who gets seats or scheduling GPU usage by hand, skip the gymnastics and move to Team. Your iteration speed and operational sanity are worth more than squeezing one more month out of a smaller plan.