MultiOn concurrency: how should I architect running many parallel agents (queues, rate limits, session management)?

Most teams hit MultiOn’s concurrency ceiling long before they hit CPU or memory limits. The real constraints are: how you queue work, how you pace calls against rate limits, and how you manage session_id lifecycles so agents don’t step on each other.

Below is how I’d architect running many parallel MultiOn agents in production, based on years of keeping Playwright/Selenium farms alive and now mapping that experience onto MultiOn’s Agent API (V1 Beta), Retrieve, and Sessions + Step mode.


Quick Answer: The best overall choice for high-throughput MultiOn concurrency is a centralized job queue + stateless workers. If your priority is strict cost and rate-limit control, a token-bucket style rate limiter around the Agent API is often a stronger fit. For complex, long-lived session workflows, consider a dedicated session-orchestrator service that owns session_id lifecycles.

At-a-Glance Comparison

| Rank | Option | Best For | Primary Strength | Watch Out For |
| --- | --- | --- | --- | --- |
| 1 | Centralized Job Queue + Stateless Workers | High-throughput, many parallel agents | Scales horizontally; keeps workers simple; easy to shard by use case | Requires external queue infra and clear job schemas |
| 2 | Token-Bucket Rate Limiter Around Agent API | Strict API pacing & cost control | Protects against 402s and vendor-side throttling; predictable QPS | Adds latency; needs good configuration per workflow type |
| 3 | Dedicated Session-Orchestrator Service | Long-lived, multi-step sessions | Strong guarantees on session_id reuse and ordering | More moving parts; overkill for short, one-shot tasks |

Comparison Criteria

We evaluated each pattern against three practical dimensions you actually feel in production:

  • Throughput & Scalability: How easily can you run hundreds or thousands of concurrent agents calling POST https://api.multion.ai/v1/web/browse and Retrieve without workers bottlenecking or thrashing?
  • Reliability & Session Integrity: How well does the pattern protect session continuity (session_id reuse), avoid double-executing steps, and keep long workflows (e.g., Amazon checkout) consistent?
  • Operational Control (Cost & Limits): How straightforward is it to enforce rate limits, avoid “402 Payment Required”-style failures, and keep your MultiOn spend within budget while still hitting SLAs?

Detailed Breakdown

1. Centralized Job Queue + Stateless Workers (Best overall for high-throughput, many parallel agents)

A centralized queue with stateless workers ranks as the top choice because it maximizes throughput while keeping concurrency, retries, and failure handling out of your app’s hot path.

In this model:

  • Jobs represent agent tasks (e.g., “order item on Amazon”, “post on X”, “extract H&M catalog page 3”).
  • Workers pull jobs, call the Agent API (V1 Beta) and/or Retrieve, and push results back to a database or message bus.
  • Workers are stateless: all they need is the job payload and access to MultiOn via X_MULTION_API_KEY.

What it does well:

  • Scales cleanly with more agents:
    You can run many concurrent agents simply by increasing worker replicas. Each worker:

    • Reads a job from the queue
    • Calls POST https://api.multion.ai/v1/web/browse with a cmd + url and optional session_id
    • Optionally calls Retrieve for structured JSON output
    • Returns results, a new session_id, or a follow-up job to the queue

    Example job payload for an Amazon order flow:

    {
      "type": "amazon_order",
      "step": "add_to_cart",
      "session_id": null,
      "input": {
        "url": "https://www.amazon.com/s?k=usb-c+cable",
        "itemCriteria": {
          "brand": "Anker",
          "maxPrice": 20
        }
      }
    }
    

    Worker logic (high level):

    const res = await fetch("https://api.multion.ai/v1/web/browse", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X_MULTION_API_KEY": process.env.MULTION_API_KEY!
      },
      body: JSON.stringify({
        url: job.input.url,
        cmd: "Find an Anker USB-C cable under $20 and add the best option to cart.",
        session_id: job.session_id ?? undefined
      })
    });
    
    if (res.status === 402) {
      // Payment gating – push job to dead-letter queue or pause processing,
      // and return so we don't fall through and treat this as a successful step
      return;
    }
    
    const data = await res.json();
    
    // Persist session_id for subsequent steps
    const nextSessionId = data.session_id ?? job.session_id;
    
    enqueue({
      type: "amazon_order",
      step: "checkout",
      session_id: nextSessionId,
      input: { /* ... */ }
    });
    
  • Keeps workers simple and stateless:
    No browser lifecycle, no selector maintenance, no Playwright grid to babysit. The only state you carry is:

    • session_id for session continuity
    • Workflow state (step field)
    • Any business payload (cart details, user account, etc.)
  • Built-in reliability via queue semantics:
    Use job visibility timeouts and retries instead of ad-hoc cron scripts. If a worker dies mid-call:

    • The job becomes visible again
    • Another worker can attempt the same Agent API call
    • You can deduplicate using idempotency keys in your job payload (e.g., workflow_run_id)
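The dedup step above can be sketched as a claim-once check keyed on the idempotency key. This is a minimal in-memory sketch; in production the `Set` would be Redis `SETNX`, a unique DB constraint, or your queue's native deduplication, and the `claimOnce`/`handleJob` names are this sketch's own, not MultiOn API surface:

```typescript
// In-memory stand-in for a shared dedup store.
const completed = new Set<string>();

// Returns true exactly once per (workflow run, step), so a redelivered
// job becomes a no-op on the second attempt.
function claimOnce(workflowRunId: string, step: string): boolean {
  const key = `${workflowRunId}:${step}`;
  if (completed.has(key)) return false;
  completed.add(key);
  return true;
}

async function handleJob(job: { workflow_run_id: string; step: string }) {
  // NB: claiming up front is simplified here — production code typically
  // marks completion *after* the step succeeds and relies on visibility
  // timeouts to recover from workers that crash mid-call.
  if (!claimOnce(job.workflow_run_id, job.step)) {
    return; // another worker already executed this step
  }
  // ... call the Agent API here ...
}
```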

Tradeoffs & Limitations:

  • Requires queue infrastructure and discipline:
    You need to:

    • Pick a queue (SQS, RabbitMQ, Kafka, Redis-based task queue, etc.)
    • Define clear job schemas per workflow type
    • Implement DLQs for permanent failures (e.g., 4xx errors, bot protection blocks not recoverable with retries)

    Without a well-defined schema, you end up with “payload soup” that’s hard to debug at scale.
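One way to keep payloads out of soup territory is a typed schema per workflow plus a runtime guard, so malformed jobs route to the DLQ before a worker touches them. A sketch for the Amazon flow above — every field beyond the `type`/`step`/`session_id` shape shown earlier is illustrative:

```typescript
// Hypothetical schema for the Amazon flow; adapt per workflow type.
type AmazonOrderStep = "add_to_cart" | "review_cart" | "checkout";

interface AmazonOrderJob {
  type: "amazon_order";
  step: AmazonOrderStep;
  session_id: string | null;
  workflow_run_id: string; // idempotency key for deduplication
  input: {
    url: string;
    itemCriteria?: { brand?: string; maxPrice?: number };
  };
}

// Runtime guard: anything that fails this goes straight to the DLQ.
function isAmazonOrderJob(j: unknown): j is AmazonOrderJob {
  const job = j as AmazonOrderJob;
  return (
    typeof job === "object" && job !== null &&
    job.type === "amazon_order" &&
    ["add_to_cart", "review_cart", "checkout"].includes(job.step) &&
    typeof job.workflow_run_id === "string" &&
    typeof job.input?.url === "string"
  );
}
```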

Decision Trigger: Choose Centralized Job Queue + Stateless Workers if you want high-throughput parallel MultiOn agents and prioritize scalability and reliability through standard queue patterns.


2. Token-Bucket Rate Limiter Around Agent API (Best for strict API pacing & cost control)

A token-bucket style limiter around POST https://api.multion.ai/v1/web/browse and related endpoints is the strongest fit when you care most about controlling QPS and cost, while still enabling concurrency.

You wrap every MultiOn call in a rate limiter that:

  • Enforces global and per-tenant QPS
  • Smooths burst traffic
  • Shields you from hitting hard limits and getting payment/usage errors like “402 Payment Required”

What it does well:

  • Protects against vendor throttling and surprise bills:
    Before each call:

    await limiter.acquire("multion-global");
    
    const res = await fetch("https://api.multion.ai/v1/web/browse", { /* ... */ });
    

    You can:

    • Set conservative defaults globally (e.g., 50 req/s)
    • Override per workflow type (e.g., Retrieve for catalog extraction might get a higher allowance than expensive multi-step checkout flows)
    • Temporarily lower limits if you observe a spike in 402s or 429-like responses
  • Works with any queue/worker architecture:
    This layer is orthogonal:

    • Put it inside your worker
    • Put it in a gateway/proxy service all workers route through
    • Or use a SaaS rate-limiter in front of your backend

    It gives you a single enforcement point for all Agent API and Retrieve traffic.
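The `limiter.acquire(...)` call above can be backed by a minimal in-process token bucket like the sketch below. The `TokenBucket` class and its method names are this sketch's own (MultiOn doesn't ship a limiter), and in a multi-worker deployment you would back the token state with Redis or put the bucket in the gateway instead:

```typescript
// Minimal token bucket: `capacity` bounds bursts, `refillPerSec` is sustained QPS.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerSec: number
  ) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  private refill(): void {
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
  }

  // Non-blocking: true if a call may proceed right now.
  tryAcquire(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  // Blocking variant: poll until a token frees up.
  async acquire(): Promise<void> {
    while (!this.tryAcquire()) {
      await new Promise((r) => setTimeout(r, 50));
    }
  }
}
```

Usage mirrors the earlier snippet: `const limiter = new TokenBucket(50, 50);` then `await limiter.acquire();` before each `fetch` to the Agent API or Retrieve.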

Tradeoffs & Limitations:

  • Adds latency and tuning overhead:
    If your limiter is too strict:

    • Jobs pile up in the queue
    • SLAs degrade
    • Workers sit idle waiting for tokens

    If it’s too loose:

    • You risk hitting financial or platform limits (e.g., more frequent “402 Payment Required” responses)
    • You may see partial workflows stalled mid-session due to upstream constraints

    You’ll need:

    • Observability on call volume per endpoint
    • Configurable limits per queue or workflow type
    • A path to override limits for high-priority workloads

Decision Trigger: Choose Token-Bucket Rate Limiter Around Agent API if you want strict control over cost and QPS and prioritize protecting MultiOn usage and budgets above raw latency.


3. Dedicated Session-Orchestrator Service (Best for long-lived, multi-step sessions)

A dedicated session-orchestrator service stands out for multi-step, long-running workflows where session_id is the real unit of reliability.

In this pattern:

  • A session-orchestrator service owns:
    • Creating new sessions (first POST /v1/web/browse without session_id)
    • Storing session_id + metadata (user, workflow, last step)
    • Enforcing ordering of steps per session
  • Workers never decide which session to use; they receive an already-resolved session_id from the orchestrator.

What it does well:

  • Strong guarantees on session continuity and ordering:
    This is crucial for flows like:

    • Amazon order:
      • Step 1: Search + add to cart
      • Step 2: Navigate to cart and verify items
      • Step 3: Checkout and confirm order
    • Posting on X:
      • Step 1: Log in (if needed)
      • Step 2: Compose tweet
      • Step 3: Verify the tweet is live

    Sample session-orchestrator flow:

    // Pseudocode: create or fetch session for a workflow instance
    async function getOrCreateSession(workflowRunId: string) {
      const existing = await db.sessions.findOne({ workflowRunId });
      if (existing) return existing.session_id;
    
      const res = await fetch("https://api.multion.ai/v1/web/browse", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "X_MULTION_API_KEY": process.env.MULTION_API_KEY!
        },
        body: JSON.stringify({
          url: "https://www.amazon.com/",
          cmd: "Open Amazon homepage."
        })
      });
    
      const data = await res.json();
      await db.sessions.insert({
        workflowRunId,
        session_id: data.session_id,
        createdAt: new Date()
      });
    
      return data.session_id;
    }
    

    Then every subsequent step uses session_id from this service. You can guarantee:

    • Exactly one step executes at a time for a given session
    • Sessions are reused for all steps of a workflow
    • Sessions are cleaned up when workflows complete
  • Encodes session lifetimes and reuse policies:
    You can enforce:

    • Maximum session age (e.g., 30 minutes of inactivity → create a new session)
    • Mapping of sessions to tenants (one session per tenant, or per workflow-run)
    • Safe tear-down: when a workflow is marked “done,” you can stop reusing its session
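The reuse policy above fits in a small predicate. The `SessionRecord` shape is hypothetical orchestrator state (not a MultiOn API object), and the 30-minute cutoff mirrors the inactivity rule mentioned:

```typescript
// Hypothetical orchestrator-side record for one MultiOn session.
interface SessionRecord {
  session_id: string;
  lastUsedAt: number; // epoch ms of the last step that used this session
  done: boolean;      // workflow marked complete
}

const MAX_IDLE_MS = 30 * 60 * 1000; // 30 minutes of inactivity

// Decide whether an existing session may be reused for the next step.
function canReuse(s: SessionRecord, now: number = Date.now()): boolean {
  if (s.done) return false;                 // finished workflows never reuse
  return now - s.lastUsedAt <= MAX_IDLE_MS; // otherwise enforce max idle age
}
```

When `canReuse` returns false, the orchestrator falls back to `getOrCreateSession`-style creation of a fresh session for the workflow run.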

Tradeoffs & Limitations:

  • More moving parts than most teams need at small scale:
    You’re introducing:

    • A new service
    • A sessions table or store
    • Coordination logic between queue, workers, and this orchestrator

    For simple one-shot tasks (e.g., “open H&M product page and extract product details with Retrieve”) this is overkill; a single Agent API call plus Retrieve is fine, no persistent session_id required.

Decision Trigger: Choose Dedicated Session-Orchestrator Service if you want robust handling of long-lived, multi-step workflows and prioritize session continuity and step ordering over architectural simplicity.


How to Combine These Patterns in a Real MultiOn Stack

In practice, you’ll likely use all three patterns together:

  1. Job Queue + Stateless Workers as the backbone

    • Each job is a logical user workflow: “checkout order,” “post tweet,” “extract catalog page.”
    • Workers:
      • Use the Agent API (V1 Beta) to control the browser
      • Use Retrieve to turn dynamic pages into structured JSON arrays of objects
      • Keep no local browser state—MultiOn’s secure remote sessions handle that.
  2. Rate Limiter Around All MultiOn Calls

    • Wrap calls to:
      • POST https://api.multion.ai/v1/web/browse (Agent API)
      • Retrieve endpoint (for data extraction with renderJs, scrollToBottom, maxItems)
    • Enforce global and per-tenant limits and react to “402 Payment Required” and similar responses with:
      • Backoff
      • Lowered limits
      • Temporary queues for low-priority jobs
  3. Session-Orchestrator for Critical Multi-Step Flows

    • Use in front of flows like:
      • Amazon end-to-end ordering
      • X posting + verification
      • Any login-gated workflow where session reuse matters
    • Persist and reuse session_id until the workflow is done.
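The backoff reaction to 402/429-style responses can be sketched as a wrapper around whatever call the worker makes. Retry counts and delays here are illustrative defaults, not MultiOn guidance, and the generic signature lets it wrap any client that exposes a `status`:

```typescript
// Retry on payment/usage gating (402) and throttling (429) with capped
// exponential backoff; everything else returns to the caller immediately.
async function callWithBackoff<T extends { status: number }>(
  doCall: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1_000
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const res = await doCall();
    if (res.status !== 402 && res.status !== 429) return res;
    if (attempt >= maxRetries) return res; // give up; caller routes job to DLQ
    const delayMs = Math.min(30_000, baseDelayMs * 2 ** attempt);
    await new Promise((r) => setTimeout(r, delayMs));
  }
}
```

A worker would wrap its Agent API call as `await callWithBackoff(() => fetch("https://api.multion.ai/v1/web/browse", { /* ... */ }))` and only dead-letter the job once retries are exhausted.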

Example: High-Volume H&M Catalog Extraction

Goal: convert dynamic H&M category pages into structured JSON arrays of objects, at scale.

  • Queue:

    • Jobs like { type: "hm_extract", pageUrl: "...", page: 1 }
  • Worker:

    • Calls Retrieve with controls:

      const res = await fetch("https://api.multion.ai/v1/retrieve", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "X_MULTION_API_KEY": process.env.MULTION_API_KEY!
        },
        body: JSON.stringify({
          url: job.pageUrl,
          instructions: "Extract all product cards as JSON with fields: name, price, colors, productUrl, imageUrl.",
          renderJs: true,
          scrollToBottom: true,
          maxItems: 50
        })
      });
      
      const data = await res.json(); // JSON array of product objects
      
    • No need for long-lived sessions here; each job is independent.

  • Rate Limiter:

    • Caps Retrieve calls to stay within budget and avoid upstream throttling.

Example: Amazon Checkout Across Multiple Steps

Goal: multi-step purchase via a long-lived session.

  • Session-orchestrator creates/returns session_id for workflowRunId.
  • Queue:
    • amazon_order:add_to_cart
    • amazon_order:review_cart
    • amazon_order:checkout
  • Workers:
    • Always call Agent API with the orchestrator-provided session_id.
  • Rate Limiter:
    • Only a controlled number of expensive checkout workflows run concurrently.

Final Verdict

For MultiOn concurrency, the safest default is to treat sessions as durable state, calls as metered resources, and workers as stateless executors:

  • Use a central job queue + stateless workers as your backbone for running many parallel agents.
  • Wrap Agent API (V1 Beta) and Retrieve calls in a token-bucket rate limiter so you don’t learn about limits from “402 Payment Required” logs.
  • Introduce a dedicated session-orchestrator when workflows span multiple steps and you need tight control over session_id reuse and ordering.

If your current stack is a pile of brittle Playwright/Selenium scripts, this architecture gives you the same coverage—Amazon ordering, X posting, H&M extraction—without maintaining a browser farm. Intent goes in (cmd + url + optional session_id), MultiOn’s secure remote sessions execute in the browser with native proxy support, and you get structured JSON or workflow completion out.
