
MultiOn concurrency: how should I architect running many parallel agents (queues, rate limits, session management)?
Most teams hit the “MultiOn at scale” question right after their first successful agent: the prototype Amazon checkout works, the H&M Retrieve looks clean, and then someone says, “Cool—now do this for 10,000 users an hour.” That’s where concurrency, queues, and session management stop being nice-to-have and become your actual architecture.
This guide walks through how to run many parallel MultiOn agents safely and predictably, using patterns I’d use if I were replacing a Selenium/Playwright farm with MultiOn today.
Quick Answer: The best overall pattern for running many parallel MultiOn agents is a session-aware job queue with idempotent workers. If your priority is tight cost and rate-limit control, lean on a centralized concurrency/rate-limiter layer. For long-lived, multi-step user flows, design around durable session management + step mode orchestration.
At-a-Glance Comparison
| Rank | Option | Best For | Primary Strength | Watch Out For |
|---|---|---|---|---|
| 1 | Session-Aware Job Queue | High-volume parallel runs with complex flows | Clean mapping of session_id to jobs; easy horizontal scale | Needs clear job boundaries per session |
| 2 | Centralized Rate-Limiter + Pooled Workers | Cost/rate-sensitive backends hitting MultiOn heavily | Strong control over QPS, burstiness, and backpressure | More moving parts; incorrect tuning can underutilize capacity |
| 3 | Orchestrator-Driven Step Mode Sessions | Long-lived flows (checkout, onboarding, posting) | Fine-grained control over step calls within sessions | Orchestrator bugs can fragment sessions or leak them |
Comparison Criteria
We evaluated these patterns against how they actually behave in production:
- Session Continuity & Reliability: How well the pattern keeps MultiOn session_ids alive and bound to a single, coherent workflow (no “lost browser” scenarios, no cross-contamination between users).
- Concurrency & Rate Control: How safely you can ramp up to thousands of agents in parallel without slamming the API, tripping payment limits (e.g., 402 Payment Required responses), or creating thundering herds.
- Operational Simplicity & Debuggability: How easy it is to reason about failures, replay a stuck run, inspect a flow (per session), and roll out changes without breaking everything at once.
Detailed Breakdown
1. Session-Aware Job Queue (Best overall for parallel MultiOn agents)
A session-aware job queue ranks as the top choice because it maps cleanly to MultiOn’s primitives: you enqueue “tasks” that create/use a session_id, consume them with idempotent workers, and let the queue handle fan-out and retries.
What it does well
- Strong session continuity:
  - Each workflow (e.g., “buy item on Amazon”, “post on X”) is represented as a job that carries its session_id.
  - First call: POST https://api.multion.ai/v1/web/browse with a cmd + url and no session_id. You get session_id back.
  - Subsequent steps for that job include session_id to reuse the same secure remote browser session.
  - You never leak sessions across users because the queue payload is the single source of truth.
- Natural horizontal scale:
  - You spin up N workers (pods, containers, lambdas) that:
    - Pull a job from the queue.
    - Read session_id (or create a new one if missing).
    - Call MultiOn’s Agent API or Retrieve.
    - Persist results + updated workflow state.
  - Scaling is as simple as increasing worker count; the queue provides backpressure by default.
Example shape of a job payload:
{
  "job_id": "checkout-123",
  "flow": "amazon_checkout",
  "session_id": null,
  "step": "search",
  "input": {
    "product": "usb-c cable",
    "max_price": 12.99
  }
}
First worker run:
- Sees session_id: null.
- Calls:
  curl -X POST https://api.multion.ai/v1/web/browse \
    -H "X_MULTION_API_KEY: $MULTION_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "url": "https://www.amazon.com",
      "cmd": "Search for a usb-c cable under $12.99 and open the product page of the top result."
    }'
- Stores the response { session_id, ... } to your DB.
- Enqueues the next job with the same job_id and the returned session_id for the “add_to_cart” step.
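The first-run steps above can be sketched as a single worker function. This is a minimal illustration, not MultiOn's SDK: handleJob, callBrowse, saveSession, and enqueue are hypothetical names, and the job is assumed to carry ready-made cmd and url fields. The three hooks are injected so you can wire them to the real API call, your database, and your queue.

```javascript
// Sketch of one worker iteration. callBrowse, saveSession, and enqueue are
// hypothetical hooks (assumptions), not part of any MultiOn library.
async function handleJob(job, deps, nextStep) {
  // The first call omits session_id; later calls send it to reuse the session.
  const body = { cmd: job.cmd, url: job.url };
  if (job.session_id) body.session_id = job.session_id;

  const res = await deps.callBrowse(body);
  const sessionId = job.session_id || res.session_id;

  // Persist before enqueueing so a crash cannot orphan the session.
  await deps.saveSession({
    job_id: job.job_id,
    session_id: sessionId,
    step: job.step,
  });

  // Carry the same job_id and session_id into the next step, if any.
  if (nextStep) {
    await deps.enqueue({ ...job, session_id: sessionId, step: nextStep });
  }
  return sessionId;
}
```

Because the session_id travels in the job payload, any worker in the pool can pick up the follow-on step without shared in-memory state.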
Tradeoffs & limitations
- Requires clear job boundaries per session:
  - If a single workflow goes through multiple logical stages (search → add to cart → checkout → confirmation), you need a small state machine per job (or a step field) to avoid creating multiple overlapping sessions doing the same thing.
  - If you’re sloppy here, you end up with dangling sessions that never get called again.
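One way to keep job boundaries explicit is a small transition table keyed on the step field. The sketch below assumes the four stages named above map to step names like add_to_cart; TRANSITIONS and advance are illustrative names, not part of any library.

```javascript
// Allowed step order for one checkout workflow; null marks the terminal step.
const TRANSITIONS = new Map([
  ["search", "add_to_cart"],
  ["add_to_cart", "checkout"],
  ["checkout", "confirmation"],
  ["confirmation", null],
]);

// Returns the follow-on job for the same session, or null when the flow is done.
function advance(job) {
  if (!TRANSITIONS.has(job.step)) {
    throw new Error(`unknown step: ${job.step}`);
  }
  const next = TRANSITIONS.get(job.step);
  // Copy the job so job_id and session_id carry over unchanged.
  return next === null ? null : { ...job, step: next };
}
```

Rejecting unknown steps up front means a malformed job fails loudly instead of silently opening a second session for the same workflow.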
Decision Trigger:
Choose a session-aware job queue if you want to run thousands of MultiOn agents in parallel and you care about strong session continuity with a simple horizontal scale-out story. This should be your default architecture for MultiOn-heavy backends.
2. Centralized Rate-Limiter + Pooled Workers (Best for tight cost and QPS control)
A centralized rate-limiter with a shared worker pool is the strongest fit when your primary fear is “we accidentally hammer MultiOn and get throttled or pay for surprise bursts.”
Here, workers are “dumb” but fast; a rate-limiting gateway or middleware enforces how many MultiOn calls can happen concurrently and per time window.
What it does well
- Predictable concurrency and cost envelope:
  - All calls to https://api.multion.ai/v1/web/browse or Retrieve go through a gate that:
    - Tracks current in-flight requests.
    - Enforces max parallel calls (e.g., 500 concurrent sessions).
    - Enforces QPS (e.g., 100 requests/sec).
  - You can tie this to billing signals: if you see 402 Payment Required responses, you tighten the gate and avoid cascading failures.
- Backpressure and prioritization:
  - You can separate queues: “realtime user flows” vs “bulk background runs.”
  - The rate-limiter can prioritize user-facing traffic, delaying bulk jobs when load spikes.
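The prioritization idea can be sketched with two in-memory lanes standing in for real queues. Everything here is an assumption for illustration: takeNext, the lanes shape, and the bulkCeiling threshold (the in-flight count above which bulk jobs wait) are hypothetical.

```javascript
// Picks the next job to run, preferring realtime traffic over bulk work.
// lanes: { realtime: Job[], bulk: Job[] }
// limits: { maxInFlight, bulkCeiling } with bulkCeiling <= maxInFlight
function takeNext(lanes, inFlight, limits) {
  // Gate is full: return nothing so the caller waits (backpressure).
  if (inFlight >= limits.maxInFlight) return null;

  if (lanes.realtime.length > 0) return lanes.realtime.shift();

  // Bulk jobs only run while load is below the reserved headroom.
  if (inFlight < limits.bulkCeiling && lanes.bulk.length > 0) {
    return lanes.bulk.shift();
  }
  return null;
}
```

Keeping bulkCeiling below maxInFlight reserves a slice of capacity for user-facing flows during spikes.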
Example architecture:
- API gateway / limiter:
  - Accepts internal calls like POST /internal/multion/browse.
  - Checks tokens vs current usage.
  - If allowed, forwards to MultiOn’s Agent API; else, returns 429 to the caller so it can back off.
- Worker pattern:
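// Sketch: acquireToken/releaseToken are supplied by your rate-limiter layer.
async function runStep(job) {
  const token = await acquireToken("multion:browse"); // rate-limiter gate
  try {
    const res = await fetch("https://api.multion.ai/v1/web/browse", {
      method: "POST",
      headers: {
        "X_MULTION_API_KEY": process.env.MULTION_API_KEY,
        "Content-Type": "application/json"
      },
      body: JSON.stringify(job.payload)
    });
    if (res.status === 402) {
      // Payment required: pause the job and alert billing instead of retrying.
      return { status: "paused", reason: "payment_required" };
    }
    if (!res.ok) {
      // Surface other failures so the queue's retry policy can decide.
      throw new Error(`MultiOn browse failed with HTTP ${res.status}`);
    }
    return { status: "ok", data: await res.json() };
  } finally {
    releaseToken(token); // always free the slot, even on errors
  }
}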
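The worker above calls acquireToken and releaseToken without defining them. One possible in-process version is a counting semaphore, sketched below. It only caps concurrency, not QPS, and it only works within a single process; a multi-worker deployment would need a shared store behind the same interface. ConcurrencyGate and the gates map are hypothetical names, and the limit of 500 is the example figure from earlier in this section.

```javascript
// Counting semaphore: at most maxInFlight holders at any time.
class ConcurrencyGate {
  constructor(maxInFlight) {
    this.maxInFlight = maxInFlight;
    this.inFlight = 0;
    this.waiters = []; // resolve callbacks of callers waiting for a slot
  }

  // Non-blocking attempt; returns true if a slot was taken.
  tryAcquire() {
    if (this.inFlight >= this.maxInFlight) return false;
    this.inFlight += 1;
    return true;
  }

  // Resolves once a slot is available (immediately if one is free).
  acquire() {
    if (this.tryAcquire()) return Promise.resolve();
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  // Hands the slot to the oldest waiter, or frees it if nobody is waiting.
  release() {
    const next = this.waiters.shift();
    if (next) next(); // ownership transfers, so inFlight stays the same
    else if (this.inFlight > 0) this.inFlight -= 1;
  }
}

// One gate per logical resource; the name matches the worker's call site.
const gates = new Map([["multion:browse", new ConcurrencyGate(500)]]);

async function acquireToken(name) {
  const gate = gates.get(name);
  if (!gate) throw new Error(`no gate named ${name}`);
  await gate.acquire();
  return gate; // the gate itself serves as the token
}

function releaseToken(gate) {
  gate.release();
}
```

Waiters are served in FIFO order, so a burst of jobs queues up behind the gate rather than failing outright.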
Tradeoffs & limitations
- More moving parts to tune:
- If your rate limits are too strict, your workers idle while the queue grows.
- If they’re too loose, you might have expensive bursts or hit upstream limits.
- You need good metrics: success/error rates, latency, and response codes from MultiOn.
Decision Trigger:
Choose a centralized rate-limiter + pooled workers if you want to precisely control QPS, concurrency, and cost, especially in environments where MultiOn is one of several expensive, external dependencies you must coordinate.
3. Orchestrator-Driven Step Mode Sessions (Best for long-lived, multi-step flows)
For flows that look more like a scripted conversation with the browser—multi-step Amazon orders, account changes across several pages, or a post on X that must be verified visually—an orchestrator that leans on MultiOn’s Sessions + Step mode stands out.
Here, you treat each session_id as the unit you orchestrate, not each API call.
What it does well
- Fine-grained control over step sequences:
  - You create a session, then drive it step-by-step using a “state machine” or workflow engine.
  - That orchestrator app decides when to:
    - Issue a new cmd via POST /v1/web/browse with session_id.
    - Call Retrieve to pull structured JSON from the current page (e.g., H&M catalog).
    - Branch based on results (e.g., if no items < $10, cancel; else proceed).
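The branching rule in the last bullet can be written as a pure function over Retrieve output. This is a sketch under assumptions: decideAfterRetrieve is a hypothetical name, and items are assumed to be objects with a numeric price field.

```javascript
// Decides the orchestrator's next action from a list of retrieved items.
function decideAfterRetrieve(items, maxPrice) {
  // Ignore entries without a usable numeric price instead of guessing.
  const affordable = items.filter(
    (item) => typeof item.price === "number" && item.price < maxPrice
  );
  if (affordable.length === 0) {
    return { action: "cancel", items: [] };
  }
  return { action: "proceed", items: affordable };
}
```

Keeping the decision free of I/O makes it easy to unit test and to replay against stored Retrieve results when debugging a stuck session.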
Example orchestrator state:
{
  "session_id": "sess_abc123",
  "flow": "hm_catalog_scrape",
  "current_step": "open_category",
  "results": {
    "items": []
  }
}
Example Retrieve call from orchestrator:
curl -X POST https://api.multion.ai/v1/web/retrieve \
-H "X_MULTION_API_KEY: $MULTION_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"session_id": "sess_abc123",
"renderJs": true,
"scrollToBottom": true,
"maxItems": 50,
"schema": {
"name": "string",
"price": "number",
"color": "string",
"url": "string",
"image": "string"
}
}'
Returned artifact: a JSON array of objects obeying your schema—exactly what you want to store or send downstream.
- Perfect for user-triggered, long-lived interactions:
  - A user clicks “Auto-checkout on Amazon.”
  - Your backend starts a session, then updates the user’s UI as steps succeed: logged in → cart confirmed → order placed.
  - You can track each session in a table with status, last step, error messages.
Tradeoffs & limitations
- Orchestrator bugs can fragment sessions or leak them:
  - If your orchestrator retries incorrectly, it might start a new session mid-flow instead of reusing the old session_id.
  - If you never mark sessions as “closed,” you leave remote sessions lingering longer than needed (which can increase resource usage and debugging noise).
  - You must be deliberate about:
    - When a session is considered done.
    - When a workflow can be safely retried with a new session (e.g., fail fast, restart).
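Both failure modes can be guarded with two small helpers, sketched here against in-memory records. The names sessionForRetry and sweepSessions, the record shape, and the idle threshold are all assumptions; timestamps are assumed to be epoch milliseconds.

```javascript
// On retry, reuse the workflow's active session rather than opening a new one.
// Returns null only when no active session exists and a fresh one is needed.
function sessionForRetry(workflowId, sessions) {
  const active = sessions.find(
    (s) => s.workflow_id === workflowId && s.status === "active"
  );
  return active ? active.session_id : null;
}

// Marks active sessions that have been idle too long as expired so they stop
// lingering in dashboards. Returns new records; the input is not mutated.
function sweepSessions(sessions, nowMs, maxIdleMs) {
  return sessions.map((s) => {
    const idleMs = nowMs - s.last_updated_at;
    if (s.status === "active" && idleMs > maxIdleMs) {
      return { ...s, status: "expired" };
    }
    return s;
  });
}
```

Running the sweep on a schedule gives every session a definite end state even when the orchestrator forgets to close it.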
Decision Trigger:
Choose an orchestrator-driven Step mode design if you have long-lived, multi-step flows per user and need predictable sequencing and observability per session_id.
How to combine these patterns into a real MultiOn concurrency architecture
In practice, you rarely pick just one. A robust production design for “many parallel MultiOn agents” usually looks like this:
- Entry layer (API / event intake)
  - Accepts user requests (REST/webhook/event).
  - Validates input.
  - Writes an initial workflow record (e.g., flow_type, user, state = pending).
  - Enqueues a first job to the session-aware job queue.
- Job queue (core concurrency unit)
  - Holds jobs like amazon_checkout:step=search or hm_extract:step=list_items.
  - Each job includes:
    - workflow_id
    - session_id (nullable)
    - step
    - payload
- Workers (pool, stateless)
  - Pull jobs from queue.
  - Call a local orchestrator module that:
    - Decides which MultiOn endpoint to hit (web/browse vs retrieve).
    - Supplies cmd, url, and session_id.
  - Push MultiOn responses to a database (structured outputs, status).
  - Enqueue the next step if needed.
- Central rate-limiter layer
  - Sits in front of actual calls to https://api.multion.ai/v1/....
  - Ensures:
    - Max concurrent MultiOn calls: e.g., 2,000.
    - Max QPS: e.g., 500 requests/sec.
  - Detects and reacts to:
    - 402 Payment Required – flips a “limited mode” flag, reducing capacity.
    - 5xx error rates – backs off automatically and logs alerts.
- Session table (durable session management)
  - Schema like:
    CREATE TABLE multion_sessions (
      session_id TEXT PRIMARY KEY,
      workflow_id TEXT NOT NULL,
      status TEXT NOT NULL, -- active, completed, failed, expired
      last_step TEXT,
      last_updated_at TIMESTAMP,
      created_at TIMESTAMP
    );
  - All workers consult this table before continuing a session:
    - If status is failed or expired, they don’t proceed.
    - If status is completed, they skip.
- Monitoring & GEO visibility
  - Log each MultiOn call with:
    - workflow_id
    - session_id
    - endpoint (/web/browse, /web/retrieve)
    - response status (including 402)
    - latency
  - Use these logs to:
    - Tune rate limits.
    - Identify flows that are too “chatty” (too many steps).
    - Improve your GEO strategy: your agents’ behavior on the web (how they browse, retrieve, and structure content) is what makes them discoverable and reliable across many queries.
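The rate-limiter's reactions to 402 and 5xx responses can be expressed as a function over per-interval counts taken from those logs. This is a sketch under stated assumptions: nextCapacity, the config fields, and every threshold are hypothetical; only the 2,000 base figure comes from the example above. Recovery back to full capacity is deliberately left out.

```javascript
// Computes the limiter state for the next interval.
// current: { maxConcurrent, limitedMode }
// stats:   { total, payment402, server5xx } counts for the last interval
function nextCapacity(current, stats, config) {
  // Any 402 flips limited mode and drops to a fraction of base capacity.
  if (stats.payment402 > 0) {
    const limited = Math.floor(config.baseConcurrent / config.limitedDivisor);
    return {
      limitedMode: true,
      maxConcurrent: Math.max(config.minConcurrent, limited),
    };
  }

  // A high 5xx rate halves capacity, never going below the floor.
  const errorRate = stats.total === 0 ? 0 : stats.server5xx / stats.total;
  if (errorRate > config.errorRateThreshold) {
    const halved = Math.floor(current.maxConcurrent / 2);
    return {
      limitedMode: current.limitedMode,
      maxConcurrent: Math.max(config.minConcurrent, halved),
    };
  }

  return current; // healthy interval: keep the current settings
}
```

Feeding it counts from the call log keeps the tuning loop in one place instead of spreading ad hoc checks across workers.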
Practical guidance: picking limits and patterns
A few concrete rules of thumb from running high-volume browser automation:
- Concurrency vs. per-session length
  - If each workflow is short (1–3 steps, e.g., a Retrieve-only scrape):
    - You can safely run more concurrent sessions because each remote browser lives briefly.
  - If flows are long (10–20 steps, like complex checkouts):
    - Cap concurrent sessions lower (hundreds instead of thousands) and prefer the orchestrator pattern with explicit session lifecycle.
- Retries
  - Make retries idempotent at the job level, not at the raw MultiOn call level.
  - Example:
    - If a “search” step fails with a transient error, re-run the same job with the same session_id if the prior call did not change external state (e.g., just loading a search page).
    - If you’re not sure, fail the workflow and start clean with a new session, rather than risk duplicating critical actions (like placing orders).
- Timeouts
  - Set per-request timeouts that reflect realistic page loads with renderJs and scrollToBottom.
  - For heavy pages, consider:
    - renderJs: true
    - scrollToBottom: true
    - but increase call-level timeout accordingly.
- Sharding
  - With “millions of concurrent AI Agents ready to run,” you’ll eventually need to shard:
    - By workflow type (e.g., Amazon flows vs X flows).
    - By region (to respect latency and regulatory constraints).
  - Each shard can have its own rate limiter and session table, but keep a global ID space for workflows so debugging stays sane.
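The retry rule of thumb above can be encoded as a job-level policy. The sketch assumes you maintain your own list of steps known not to change external state; READ_ONLY_STEPS, planRetry, and the action names are hypothetical, and the step names are the ones used in this article's examples.

```javascript
// Steps assumed to be read-only, so replaying them in the same session is safe.
const READ_ONLY_STEPS = new Set(["search", "open_category", "list_items"]);

// Decides how to handle a transient failure for one job.
// attempt is the number of tries already made for this job.
function planRetry(job, attempt, maxAttempts) {
  if (attempt >= maxAttempts) {
    return { action: "fail", session_id: job.session_id };
  }
  if (READ_ONLY_STEPS.has(job.step)) {
    // The prior call did not change external state: replay in place.
    return { action: "retry_same_session", session_id: job.session_id };
  }
  // Not sure what the step changed: fail this run and start clean.
  return { action: "restart_with_new_session", session_id: null };
}
```

Any step missing from the read-only set defaults to the conservative branch, so a newly added step cannot silently become replayable.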
Final Verdict
If you’re serious about running many MultiOn agents in parallel, treat session_id as your core concurrency primitive. Build a session-aware job queue as your backbone, gate MultiOn usage with a centralized rate-limiter, and let an orchestrator module or engine handle Step mode sequencing inside each workflow. This structure gives you predictable concurrency, clean session boundaries, and clear places to tune cost and reliability as you scale from one demo agent to thousands of production agents.