
MultiOn concurrency: how should I architect running many parallel agents (queues, rate limits, session management)?
Most teams hit MultiOn’s concurrency ceiling long before they hit CPU or memory limits. The real constraints are: how you queue work, how you pace calls against rate limits, and how you manage session_id lifecycles so agents don’t step on each other.
Below is how I’d architect running many parallel MultiOn agents in production, based on years of keeping Playwright/Selenium farms alive and now mapping that experience onto MultiOn’s Agent API (V1 Beta), Retrieve, and Sessions + Step mode.
Quick Answer: The best overall choice for high-throughput MultiOn concurrency is a centralized job queue + stateless workers. If your priority is strict cost and rate-limit control, a token-bucket style rate limiter around the Agent API is often a stronger fit. For complex, long-lived session workflows, consider a dedicated session-orchestrator service that owns `session_id` lifecycles.
At-a-Glance Comparison
| Rank | Option | Best For | Primary Strength | Watch Out For |
|---|---|---|---|---|
| 1 | Centralized Job Queue + Stateless Workers | High-throughput, many parallel agents | Scales horizontally; keeps workers simple; easy to shard by use case | Requires external queue infra and clear job schemas |
| 2 | Token-Bucket Rate Limiter Around Agent API | Strict API pacing & cost control | Protects against 402s and vendor-side throttling; predictable QPS | Adds latency; needs good configuration per workflow type |
| 3 | Dedicated Session-Orchestrator Service | Long-lived, multi-step sessions | Strong guarantees on session_id reuse and ordering | More moving parts; overkill for short, one-shot tasks |
Comparison Criteria
We evaluated each pattern against three practical dimensions you actually feel in production:
- Throughput & Scalability: How easily can you run hundreds or thousands of concurrent agents calling `POST https://api.multion.ai/v1/web/browse` and Retrieve without workers bottlenecking or thrashing?
- Reliability & Session Integrity: How well does the pattern protect session continuity (`session_id` reuse), avoid double-executing steps, and keep long workflows (e.g., Amazon checkout) consistent?
- Operational Control (Cost & Limits): How straightforward is it to enforce rate limits, avoid “402 Payment Required”-style failures, and keep your MultiOn spend within budget while still hitting SLAs?
Detailed Breakdown
1. Centralized Job Queue + Stateless Workers (Best overall for high-throughput, many parallel agents)
A centralized queue with stateless workers ranks as the top choice because it maximizes throughput while keeping concurrency, retries, and failure handling out of your app’s hot path.
In this model:
- Jobs represent agent tasks (e.g., “order item on Amazon”, “post on X”, “extract H&M catalog page 3”).
- Workers pull jobs, call the Agent API (V1 Beta) and/or Retrieve, and push results back to a database or message bus.
- Workers are stateless: all they need is the job payload and access to MultiOn via `X_MULTION_API_KEY`.
What it does well:
- Scales cleanly with more agents:
  You can run many concurrent agents simply by increasing worker replicas. Each worker:
  - Reads a job from the queue
  - Calls `POST https://api.multion.ai/v1/web/browse` with a `cmd` + `url` and optional `session_id`
  - Optionally calls Retrieve for structured JSON output
  - Returns results, a new `session_id`, or a follow-up job to the queue

  Example job payload for an Amazon order flow:

  ```json
  {
    "type": "amazon_order",
    "step": "add_to_cart",
    "session_id": null,
    "input": {
      "url": "https://www.amazon.com/s?k=usb-c+cable",
      "itemCriteria": { "brand": "Anker", "maxPrice": 20 }
    }
  }
  ```

  Worker logic (high level; `job` comes from the queue, `enqueue` pushes the follow-up step):

  ```ts
  const res = await fetch("https://api.multion.ai/v1/web/browse", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X_MULTION_API_KEY": process.env.MULTION_API_KEY!
    },
    body: JSON.stringify({
      url: job.input.url,
      cmd: "Find an Anker USB-C cable under $20 and add the best option to cart.",
      session_id: job.session_id ?? undefined
    })
  });

  if (res.status === 402) {
    // Payment gating – push job to dead-letter queue or pause processing
  }

  const data = await res.json();

  // Persist session_id for subsequent steps
  const nextSessionId = data.session_id ?? job.session_id;
  enqueue({
    type: "amazon_order",
    step: "checkout",
    session_id: nextSessionId,
    input: { /* ... */ }
  });
  ```

- Keeps workers simple and stateless:
  No browser lifecycle, no selector maintenance, no Playwright grid to babysit. The only state you carry is:
  - `session_id` for session continuity
  - Workflow state (the `step` field)
  - Any business payload (cart details, user account, etc.)

- Built-in reliability via queue semantics:
  Use job visibility timeouts and retries instead of ad-hoc cron scripts. If a worker dies mid-call:
  - The job becomes visible again
  - Another worker can attempt the same Agent API call
  - You can deduplicate using idempotency keys in your job payload (e.g., `workflow_run_id`)
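To make the idempotency-key idea concrete, here is a minimal claim-based dedup sketch. The `AgentJob` shape, the `workflow_run_id:step` key, and the in-memory `Set` are assumptions for illustration; in production you would use an atomic conditional write (a Redis `SETNX`, a unique index on your jobs table) instead of process-local state.

```typescript
interface AgentJob {
  workflow_run_id: string;
  type: string;
  step: string;
}

// Process-local claim store; stands in for an atomic store in production.
const processedSteps = new Set<string>();

// Returns true if this delivery should be executed, false if the same
// workflow step was already claimed (e.g., a redelivery after a
// visibility timeout).
function claimJob(job: AgentJob): boolean {
  const key = `${job.workflow_run_id}:${job.step}`;
  if (processedSteps.has(key)) return false;
  processedSteps.add(key);
  return true;
}
```

A worker calls `claimJob` before issuing the Agent API call, so a redelivered job becomes a no-op instead of a double-executed step.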
Tradeoffs & Limitations:
- Requires queue infrastructure and discipline:
  You need to:
  - Pick a queue (SQS, RabbitMQ, Kafka, Redis-based task queue, etc.)
  - Define clear job schemas per workflow type
  - Implement DLQs for permanent failures (e.g., 4xx errors, bot-protection blocks not recoverable with retries)

  Without a well-defined schema, you end up with “payload soup” that’s hard to debug at scale.
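One way to keep payloads out of soup territory is a discriminated union per workflow type, so the compiler rejects malformed jobs before they ever hit a queue. The specific fields below are illustrative, not a MultiOn schema:

```typescript
// One union member per workflow type; `type` is the discriminant.
type Job =
  | {
      type: "amazon_order";
      step: "add_to_cart" | "review_cart" | "checkout";
      session_id: string | null;
      input: { url: string };
    }
  | { type: "hm_extract"; pageUrl: string; page: number };

// Example consumer: the switch is exhaustive, so adding a new job type
// without handling it becomes a compile error.
function routeJob(job: Job): string {
  switch (job.type) {
    case "amazon_order":
      return `agent:${job.step}`;
    case "hm_extract":
      return `retrieve:page-${job.page}`;
  }
}
```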
Decision Trigger: Choose Centralized Job Queue + Stateless Workers if you want high-throughput parallel MultiOn agents and prioritize scalability and reliability through standard queue patterns.
2. Token-Bucket Rate Limiter Around Agent API (Best for strict API pacing & cost control)
A token-bucket style limiter around POST https://api.multion.ai/v1/web/browse and related endpoints is the strongest fit when you care most about controlling QPS and cost, while still enabling concurrency.
You wrap every MultiOn call in a rate limiter that:
- Enforces global and per-tenant QPS
- Smooths burst traffic
- Shields you from hitting hard limits and getting payment/usage errors like “402 Payment Required”
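A minimal in-process token bucket makes the pattern concrete. This is a per-process sketch with assumed constructor parameters; a real fleet-wide limit would live in Redis or a gateway service, and `acquire` would be an async wait rather than a boolean check:

```typescript
// Token bucket: holds up to `capacity` tokens, refilled continuously at
// `refillPerSecond`. Each MultiOn call consumes one token.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSecond: number,
    now: number = Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true and consumes a token if one is available at time `now` (ms).
  tryAcquire(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A worker that fails `tryAcquire` simply leaves the job on the queue (or sleeps briefly), which is what smooths bursts into a predictable QPS.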
What it does well:
- Protects against vendor throttling and surprise bills:
  Before each call:

  ```ts
  await limiter.acquire("multion-global");
  const res = await fetch("https://api.multion.ai/v1/web/browse", { /* ... */ });
  ```

  You can:
  - Set conservative defaults globally (e.g., 50 req/s)
  - Override per workflow type (e.g., Retrieve for catalog extraction might get a higher allowance than expensive multi-step checkout flows)
  - Temporarily lower limits if you observe a spike in 402s or 429-like responses

- Works with any queue/worker architecture:
  This layer is orthogonal:
  - Put it inside your worker
  - Put it in a gateway/proxy service all workers route through
  - Or use a SaaS rate limiter in front of your backend

  It gives you a single enforcement point for all Agent API and Retrieve traffic.
Tradeoffs & Limitations:
- Adds latency and tuning overhead:
  If your limiter is too strict:
  - Jobs pile up in the queue
  - SLAs degrade
  - Workers sit idle waiting for tokens

  If it’s too loose:
  - You risk hitting financial or platform limits (e.g., more frequent “402 Payment Required” responses)
  - You may see partial workflows stalled mid-session due to upstream constraints

  You’ll need:
  - Observability on call volume per endpoint
  - Configurable limits per queue or workflow type
  - A path to override limits for high-priority workloads
Decision Trigger: Choose Token-Bucket Rate Limiter Around Agent API if you want strict control over cost and QPS and prioritize protecting MultiOn usage and budgets above raw latency.
3. Dedicated Session-Orchestrator Service (Best for long-lived, multi-step sessions)
A dedicated session-orchestrator service stands out for multi-step, long-running workflows where session_id is the real unit of reliability.
In this pattern:
- A session-orchestrator service owns:
  - Creating new sessions (the first `POST /v1/web/browse` without a `session_id`)
  - Storing `session_id` + metadata (user, workflow, last step)
  - Enforcing ordering of steps per session
- Workers never decide which session to use; they receive an already-resolved `session_id` from the orchestrator.
What it does well:
- Strong guarantees on session continuity and ordering:
  This is crucial for flows like:
  - Amazon order:
    - Step 1: Search + add to cart
    - Step 2: Navigate to cart and verify items
    - Step 3: Checkout and confirm order
  - Posting on X:
    - Step 1: Log in (if needed)
    - Step 2: Compose tweet
    - Step 3: Verify the tweet is live

  Sample session-orchestrator flow:

  ```ts
  // Pseudocode: create or fetch session for a workflow instance
  async function getOrCreateSession(workflowRunId: string) {
    const existing = await db.sessions.findOne({ workflowRunId });
    if (existing) return existing.session_id;

    const res = await fetch("https://api.multion.ai/v1/web/browse", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X_MULTION_API_KEY": process.env.MULTION_API_KEY!
      },
      body: JSON.stringify({
        url: "https://www.amazon.com/",
        cmd: "Open Amazon homepage."
      })
    });
    const data = await res.json();

    await db.sessions.insert({
      workflowRunId,
      session_id: data.session_id,
      createdAt: new Date()
    });
    return data.session_id;
  }
  ```

  Then every subsequent step uses the `session_id` from this service. You can guarantee:
  - Exactly one step executes at a time for a given session
  - Sessions are reused for all steps of a workflow
  - Sessions are cleaned up when workflows complete

- Encodes session lifetimes and reuse policies:
  You can enforce:
  - Maximum session age (e.g., 30 minutes of inactivity → create a new session)
  - Mapping of sessions to tenants (one session per tenant, or per workflow run)
  - Safe tear-down: when a workflow is marked “done,” you stop reusing its session
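The maximum-session-age policy can be a pure function the orchestrator consults before reusing a stored `session_id`. The `StoredSession` shape and the 30-minute default below are assumptions that mirror the example above:

```typescript
interface StoredSession {
  session_id: string;
  lastUsedAt: number; // epoch milliseconds
}

// Returns true when the orchestrator should create a fresh session:
// either no session exists yet, or the stored one has been idle too long.
function shouldCreateNewSession(
  existing: StoredSession | null,
  now: number,
  maxIdleMinutes = 30
): boolean {
  if (!existing) return true;
  return now - existing.lastUsedAt > maxIdleMinutes * 60_000;
}
```

Keeping this decision in one pure function makes the reuse policy trivially testable and easy to tune per workflow type.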
Tradeoffs & Limitations:
- More moving parts than most teams need at small scale:
  You’re introducing:
  - A new service
  - A `sessions` table or store
  - Coordination logic between queue, workers, and this orchestrator

  For simple one-shot tasks (e.g., “open an H&M product page and extract product details with Retrieve”) this is overkill; a single Agent API call plus Retrieve is fine, with no persistent `session_id` required.
Decision Trigger: Choose Dedicated Session-Orchestrator Service if you want robust handling of long-lived, multi-step workflows and prioritize session continuity and step ordering over architectural simplicity.
How to Combine These Patterns in a Real MultiOn Stack
In practice, you’ll likely use all three patterns together:
- Job Queue + Stateless Workers as the backbone
  - Each job is a logical user workflow: “checkout order,” “post tweet,” “extract catalog page.”
  - Workers:
    - Use the Agent API (V1 Beta) to control the browser
    - Use Retrieve to turn dynamic pages into structured JSON arrays of objects
    - Keep no local browser state—MultiOn’s secure remote sessions handle that.
- Rate Limiter Around All MultiOn Calls
  - Wrap calls to:
    - `POST https://api.multion.ai/v1/web/browse` (Agent API)
    - The Retrieve endpoint (for data extraction with `renderJs`, `scrollToBottom`, `maxItems`)
  - Enforce global and per-tenant limits, and react to “402 Payment Required” and similar responses with:
    - Backoff
    - Lowered limits
    - Temporary queues for low-priority jobs
- Session-Orchestrator for Critical Multi-Step Flows
  - Use in front of flows like:
    - Amazon end-to-end ordering
    - X posting + verification
    - Any login-gated workflow where session reuse matters
  - Persist and reuse `session_id` until the workflow is done.
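The backoff reaction above can be sketched as a small retry policy. Treating 402 as retryable is an assumption that the condition is transient (e.g., a usage cap that resets); if it reflects a hard billing failure in your account, route the job to a DLQ instead. Jitter is omitted to keep the sketch deterministic, but add it in production to avoid synchronized retries:

```typescript
// Exponential backoff: base delay doubles per attempt, capped at capMs.
function backoffMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retry on 402/429 and server errors, up to maxAttempts; other 4xx
// responses are treated as permanent and go to the dead-letter queue.
function shouldRetry(status: number, attempt: number, maxAttempts = 5): boolean {
  const retryable = status === 402 || status === 429 || status >= 500;
  return retryable && attempt < maxAttempts;
}
```

A worker that sees `shouldRetry(res.status, attempt)` return true re-enqueues the job with a delay of `backoffMs(attempt)` instead of hammering the API.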
Example: High-Volume H&M Catalog Extraction
Goal: convert dynamic H&M category pages into structured JSON arrays of objects, at scale.
- Queue:
  - Jobs like `{ type: "hm_extract", pageUrl: "...", page: 1 }`
- Worker:
  - Calls Retrieve with controls:

    ```ts
    const res = await fetch("https://api.multion.ai/v1/retrieve", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X_MULTION_API_KEY": process.env.MULTION_API_KEY!
      },
      body: JSON.stringify({
        url: job.pageUrl,
        instructions: "Extract all product cards as JSON with fields: name, price, colors, productUrl, imageUrl.",
        renderJs: true,
        scrollToBottom: true,
        maxItems: 50
      })
    });
    const data = await res.json(); // JSON array of product objects
    ```

  - No need for long-lived sessions here; each job is independent.
- Rate Limiter:
  - Caps Retrieve calls to stay within budget and avoid upstream throttling.
Example: Amazon Checkout Across Multiple Steps
Goal: multi-step purchase via a long-lived session.
- Session-orchestrator creates/returns a `session_id` for each `workflowRunId`.
- Queue:
  - `amazon_order:add_to_cart`
  - `amazon_order:review_cart`
  - `amazon_order:checkout`
- Workers:
  - Always call the Agent API with the orchestrator-provided `session_id`.
- Rate Limiter:
  - Only a controlled number of expensive checkout workflows run concurrently.
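Capping concurrent checkout workflows can be done with a small counting semaphore in front of the dispatcher. This is a per-process sketch; the `limit` knob is your own configuration, not a MultiOn feature, and a multi-worker fleet would need a distributed equivalent:

```typescript
// Counting semaphore: at most `limit` callers hold a permit at once;
// the rest queue up in FIFO order until a permit is released.
class Semaphore {
  private active = 0;
  private waiters: Array<() => void> = [];

  constructor(private limit: number) {}

  async acquire(): Promise<void> {
    if (this.active < this.limit) {
      this.active++;
      return;
    }
    // At capacity: park until release() wakes us.
    await new Promise<void>(resolve => this.waiters.push(resolve));
    this.active++;
  }

  release(): void {
    this.active--;
    const next = this.waiters.shift();
    if (next) next();
  }
}
```

A worker would `await checkoutSemaphore.acquire()` before starting an `amazon_order` workflow and `release()` in a `finally` block, so cheap extraction jobs keep flowing while expensive checkouts stay bounded.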
Final Verdict
For MultiOn concurrency, the safest default is to treat sessions as durable state, calls as metered resources, and workers as stateless executors:
- Use a central job queue + stateless workers as your backbone for running many parallel agents.
- Wrap Agent API (V1 Beta) and Retrieve calls in a token-bucket rate limiter so you don’t learn about limits from “402 Payment Required” logs.
- Introduce a dedicated session-orchestrator when workflows span multiple steps and you need tight control over `session_id` reuse and ordering.
If your current stack is a pile of brittle Playwright/Selenium scripts, this architecture gives you the same coverage—Amazon ordering, X posting, H&M extraction—without maintaining a browser farm. Intent goes in (`cmd` + `url` + optional `session_id`), MultiOn’s secure remote sessions execute in the browser with native proxy support, and you get structured JSON or workflow completion out.