MultiOn Step mode: how do I implement a multi-step flow (start → step → step) and handle retries/timeouts?

Most teams hit the limits of “single-shot” agents fast. Real product flows look more like: open site → log in → navigate → act → confirm. MultiOn’s Sessions + Step mode is built for exactly that shape: intent in, a real browser session kept alive, and controlled progression across multiple calls.

This guide walks through how to implement a start → step → step flow with the Agent API (V1 Beta), then harden it with retries, timeouts, and safe error handling.


Why Step mode exists (and what it actually does)

With Playwright/Selenium, you keep a browser alive and push it through a script. If something halfway through fails (login, 2FA, slow render), your selectors crumble and the run dies.

MultiOn flips this:

  • You send a high-level cmd (what you want) and a url (where to start) to
    POST https://api.multion.ai/v1/web/browse.
  • MultiOn runs that in a secure remote session (a real browser).
  • It gives you back:
    • A session_id you reuse on the next call.
    • A step result (what it did, what it found).

Step mode simply means: you keep that remote browser alive across calls by passing the same session_id, and you drive the workflow in controlled increments (start → step → step → finish) instead of one massive, opaque prompt.


Core pattern: start → step → step with session_id

1. Start the flow (create the session)

First call: you define the starting point and high-level goal.

curl -X POST "https://api.multion.ai/v1/web/browse" \
  -H "Content-Type: application/json" \
  -H "X_MULTION_API_KEY: $MULTION_API_KEY" \
  -d '{
    "url": "https://www.amazon.com/",
    "cmd": "Search for noise cancelling headphones and open the product page for the first organic result.",
    "mode": "step"
  }'

Key points:

  • mode: "step" (or the equivalent in the client SDK) tells MultiOn you’re driving a stepwise flow.
  • You don’t pass session_id yet; the platform will allocate one.
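If you call the REST endpoint directly instead of going through an SDK, the same start call can be built in TypeScript for use with fetch. This is a sketch that mirrors the curl example: the endpoint, the X_MULTION_API_KEY header, and the url/cmd/mode/session_id fields come from that example, while the buildBrowseRequest helper itself is hypothetical.

```typescript
// Hypothetical helper: builds the same request as the curl example, ready for fetch().
type BrowsePayload = {
  url?: string;        // starting URL (first call only)
  session_id?: string; // omitted on the first call, reused afterwards
  cmd: string;
  mode: "step";
};

const BROWSE_ENDPOINT = "https://api.multion.ai/v1/web/browse";

function buildBrowseRequest(payload: BrowsePayload, apiKey: string) {
  if (!payload.url && !payload.session_id) {
    // A step needs either a place to start or a session to continue
    throw new Error("Provide url (start) or session_id (continue)");
  }
  return {
    endpoint: BROWSE_ENDPOINT,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        X_MULTION_API_KEY: apiKey
      },
      // JSON.stringify drops undefined fields, so a start call carries no session_id
      body: JSON.stringify(payload)
    }
  };
}

// Usage (start call, no session_id yet):
// const { endpoint, init } = buildBrowseRequest(
//   { url: "https://www.amazon.com/", cmd: "Search for noise cancelling headphones", mode: "step" },
//   apiKey
// );
// const res = await fetch(endpoint, init);
```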

An illustrative response shape (check the Agent API reference for the exact field names) looks like:

{
  "session_id": "sess_12345abcde",
  "status": "success",
  "step": {
    "description": "Opened Amazon homepage, searched for 'noise cancelling headphones', and opened the first result.",
    "url": "https://www.amazon.com/dp/EXAMPLE",
    "screenshot": "https://...",
    "metadata": { /* page details, DOM info, etc. */ }
  }
}

From now on, session_id is your unit of continuity. Treat it like a browser handle.
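Because the response arrives as untyped JSON, it helps to validate it before you treat session_id as a handle. A minimal sketch, assuming the illustrative response shape shown here (session_id, status, and an optional step object); StepResponse and parseStepResponse are hypothetical names, and the field names should be adjusted to whatever the API actually returns.

```typescript
// Assumed response shape, based on the illustrative example response.
interface StepInfo {
  description?: string;
  url?: string;
  metadata?: Record<string, unknown>;
}

interface StepResponse {
  session_id: string;
  status: string;
  step?: StepInfo;
}

// Validate untyped JSON and return a typed response, or throw with a clear reason.
function parseStepResponse(raw: unknown): StepResponse {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Response is not an object");
  }
  const obj = raw as Record<string, unknown>;

  const sessionId = obj.session_id;
  if (typeof sessionId !== "string" || sessionId.length === 0) {
    throw new Error("Missing session_id: cannot continue this flow");
  }
  const status = obj.status;
  if (typeof status !== "string") {
    throw new Error("Missing status field");
  }
  const step =
    typeof obj.step === "object" && obj.step !== null ? (obj.step as StepInfo) : undefined;

  return { session_id: sessionId, status, step };
}
```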

2. Second step (continue within the same browser)

Next, you tell the agent what to do from the current page and pass the session_id to stay in the same secure remote session.

curl -X POST "https://api.multion.ai/v1/web/browse" \
  -H "Content-Type: application/json" \
  -H "X_MULTION_API_KEY: $MULTION_API_KEY" \
  -d '{
    "session_id": "sess_12345abcde",
    "cmd": "Add this item to cart and proceed to checkout until the final order review page.",
    "mode": "step"
  }'

Response:

{
  "session_id": "sess_12345abcde",
  "status": "success",
  "step": {
    "description": "Added item to cart and navigated to the order review page.",
    "url": "https://www.amazon.com/gp/buy/spc/handlers/display.html",
    "metadata": {
      "summary": "Order review page with price, shipping address, and payment method.",
      "total_price": "$219.99"
    }
  }
}

The important part: same session_id, new page state, same remote browser.

3. Third step (finalize or inspect)

You can continue chaining steps until the flow is done:

curl -X POST "https://api.multion.ai/v1/web/browse" \
  -H "Content-Type: application/json" \
  -H "X_MULTION_API_KEY: $MULTION_API_KEY" \
  -d '{
    "session_id": "sess_12345abcde",
    "cmd": "If everything looks correct, place the order. Then capture the order confirmation number.",
    "mode": "step"
  }'

Now you’ve implemented:

  • Start: discover product.
  • Step: add to cart and reach review.
  • Step: confirm and extract structured confirmation data.

You can apply the same pattern to other flows, such as posting on X or navigating a KYC portal. The rule never changes: create a session_id, then reuse that session_id for every step.
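The start → step → step pattern generalizes to a loop over commands. A sketch, assuming an injected browse function that accepts and returns the fields used in this article (url, session_id, cmd, mode, status); BrowseFn and runStepFlow are hypothetical names, and injecting the function lets you pass an SDK call or a test double.

```typescript
// Minimal request/response types, matching the fields used in this article.
type StepRequest = { url?: string; session_id?: string; cmd: string; mode: "step" };
type StepResult = { session_id: string; status: string; step?: { description?: string } };
type BrowseFn = (req: StepRequest) => Promise<StepResult>;

// Hypothetical helper: start at `url` with the first command, then reuse session_id.
async function runStepFlow(browse: BrowseFn, url: string, cmds: string[]) {
  if (cmds.length === 0) throw new Error("Need at least one command");

  const results: StepResult[] = [];
  let sessionId: string | undefined;

  for (let i = 0; i < cmds.length; i++) {
    const cmd = cmds[i];
    // The first call carries the url; every later call carries the session_id instead
    const req: StepRequest =
      i === 0 ? { url, cmd, mode: "step" } : { session_id: sessionId, cmd, mode: "step" };
    const res = await browse(req);
    if (res.status !== "success") {
      throw new Error(`Step ${i + 1} failed (status: ${res.status}): ${cmd}`);
    }
    sessionId = res.session_id; // the same handle carries through the whole flow
    results.push(res);
  }
  return { sessionId: sessionId as string, results };
}
```

Stopping on the first non-success keeps later commands from running against a page the flow never reached.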


Example: multi-step X posting flow (with confirmation)

Here’s what it looks like in Node.js with the MultiOn client, modeling a “compose → edit → post → confirm” sequence.

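npm install multion

// Client import and constructor as used throughout this article; check the
// multion SDK docs for the export name in the version you install.
import Multion from "multion";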

const client = new Multion({ apiKey: process.env.MULTION_API_KEY! });

async function postOnX() {
  // 1. Start: open X and draft a post
  const start = await client.web.browse({
    url: "https://x.com",
    cmd: "Log in if needed and open the composer with a draft saying: 'Shipping our new AI agent demo today.'",
    mode: "step"
  });

  if (start.status !== "success") throw new Error("Failed to start X session");
  const sessionId = start.session_id;

  // 2. Step: update the draft text
  const edit = await client.web.browse({
    session_id: sessionId,
    cmd: "Update the draft to: 'Shipping our new browser-operating AI agents today. Full demo link in reply.'",
    mode: "step"
  });

  if (edit.status !== "success") throw new Error("Failed to edit draft");

  // 3. Step: post and confirm
  const post = await client.web.browse({
    session_id: sessionId,
    cmd: "Post the tweet, then open my profile and confirm that the latest tweet matches the updated draft.",
    mode: "step"
  });

  if (post.status !== "success") throw new Error("Failed to post tweet");

  return {
    sessionId,
    confirmation: post.step?.metadata
  };
}

postOnX().catch(console.error);

This is the practical pattern you’ll reuse for any multi-step browser automation with MultiOn.


Implementing retries and timeouts around Step mode

You don’t control network, third-party site latency, or bot protections. You do control how your app wraps the Agent API.

Think in three layers:

  1. Request-level timeout: how long you let a single web.browse call run.
  2. Retry policy: if a call fails or times out, when and how to retry.
  3. Session strategy: whether retries reuse the same session_id or start fresh.
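One way to keep the three layers explicit is a small policy object per workflow type. A sketch: the StepPolicy type, the workflow names, and every number in it are hypothetical starting points, not MultiOn defaults.

```typescript
// Hypothetical policy covering the three layers for one workflow type.
type SessionStrategy = "reuse" | "restart";

interface StepPolicy {
  timeoutMs: number;                   // layer 1: per-call timeout
  maxRetries: number;                  // layer 2: retries after the first attempt
  baseDelayMs: number;                 // layer 2: first backoff delay
  onTransientFailure: SessionStrategy; // layer 3: default for keeping or replacing the session
}

type WorkflowKind = "checkout" | "social_post" | "form_fill";

// Illustrative values only; tune them from your own step logs.
const POLICIES: Record<WorkflowKind, StepPolicy> = {
  checkout:    { timeoutMs: 90000, maxRetries: 2, baseDelayMs: 2000, onTransientFailure: "reuse" },
  social_post: { timeoutMs: 30000, maxRetries: 3, baseDelayMs: 1000, onTransientFailure: "reuse" },
  form_fill:   { timeoutMs: 60000, maxRetries: 2, baseDelayMs: 2000, onTransientFailure: "reuse" },
};

function policyFor(kind: WorkflowKind): StepPolicy {
  return POLICIES[kind];
}
```

Keeping the values in one table lets a checkout flow get more time than a quick post without hard-coding numbers at every call site.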

1. Setting timeouts per call

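At the HTTP layer, always set a sane timeout (e.g., 30–90 seconds depending on the workflow). A small Promise.race wrapper works with any client, including the SDK call used above:

async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timeout after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    // Clear the pending timer so it doesn't keep the Node.js process alive
    clearTimeout(timer);
  }
}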

async function stepWithTimeout(payload: any, timeoutMs = 60000) {
  return withTimeout(
    client.web.browse(payload),
    timeoutMs
  );
}

Usage:

const result = await stepWithTimeout({
  session_id: sessionId,
  cmd: "Proceed to checkout and stop at the payment details step.",
  mode: "step"
}, 60000);

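If the call overruns the timeout, decide what to do next: retry in the same session, or abandon it. Keep in mind that a client-side timeout only stops your code from waiting; the step may still be running in the remote browser. Before retrying a command with side effects (placing an order, posting), have the next step inspect the current page rather than replaying the command blindly.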
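If you call the REST endpoint with fetch directly, an AbortController lets you cancel the HTTP request itself instead of only ignoring its result. A sketch, assuming the endpoint and header from the curl examples; postStepWithAbort, FetchLike, and the injectable fetchImpl parameter are hypothetical (injection just makes the function testable). Whether aborting the HTTP request also halts the step in the remote browser is not covered here, so treat the session state as unknown afterwards.

```typescript
// Minimal fetch-compatible signature so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function postStepWithAbort(
  payload: { url?: string; session_id?: string; cmd: string; mode: "step" },
  apiKey: string,
  timeoutMs: number,
  fetchImpl: FetchLike = (url, init) => (globalThis as any).fetch(url, init)
): Promise<unknown> {
  const controller = new AbortController();
  // Abort the in-flight request once the deadline passes
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl("https://api.multion.ai/v1/web/browse", {
      method: "POST",
      headers: { "Content-Type": "application/json", X_MULTION_API_KEY: apiKey },
      body: JSON.stringify(payload),
      signal: controller.signal
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return await res.json();
  } catch (err) {
    // Report aborts as timeouts so the retry layer can classify them
    if (controller.signal.aborted) throw new Error(`Timeout after ${timeoutMs}ms`);
    throw err;
  } finally {
    clearTimeout(timer);
  }
}
```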

2. Retry strategy: when to retry vs. fail fast

Some failures are worth retrying; some should bubble up:

  • Transient (safe to retry):

    • Network issues
    • Temporary 5xx from MultiOn or the target site
    • Browser session hiccups where the session is still valid
  • Permanent or critical (don’t auto-retry blindly):

    • Authentication failure (bad credentials)
    • Bot protection blocks that require human input
    • Billing issues (e.g., MultiOn responds with 402 Payment Required)

Design your wrapper to examine:

  • HTTP status codes (4xx vs 5xx vs 402).
  • Response status field (e.g., "success", "error").
  • Any error message/body MultiOn returns.
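Those checks can live in one classification function, so the retry loop only has to ask "retry or not?". A sketch: the FailureInfo shape and the message patterns are hypothetical, and should be matched to the status codes and error bodies you actually see from MultiOn and your target sites.

```typescript
// Hypothetical summary of a failed call: HTTP status (if any) plus error text.
interface FailureInfo {
  httpStatus?: number; // undefined for network errors and client-side timeouts
  message?: string;    // error message or body text, if any
}

type Decision = "retry" | "fail_fast";

function classifyFailure(f: FailureInfo): Decision {
  const msg = (f.message ?? "").toLowerCase();

  // Billing: a system alert, never a user-level retry
  if (f.httpStatus === 402 || msg.includes("payment required")) return "fail_fast";

  // Bad credentials, or blocks that need a human: resending won't help
  if (f.httpStatus === 401 || f.httpStatus === 403) return "fail_fast";
  if (/captcha|invalid (password|credentials)/.test(msg)) return "fail_fast";

  // Rate limiting and server errors are usually transient
  if (f.httpStatus === 429 || (f.httpStatus !== undefined && f.httpStatus >= 500)) return "retry";

  // No HTTP status at all: network error or timeout
  if (f.httpStatus === undefined) return "retry";

  // Any other 4xx means the request itself is wrong: fix it rather than resend it
  return "fail_fast";
}
```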

A simple exponential backoff example:

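type BrowsePayload = {
  url?: string;
  session_id?: string;
  cmd: string;
  mode: "step";
};

// Errors that should never be retried automatically (billing, hard stops)
class NonRetryableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NonRetryableError";
  }
}

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

async function safeStep(
  payload: BrowsePayload,
  {
    maxRetries = 3,      // retries after the first attempt
    baseDelayMs = 2000,
    timeoutMs = 60000
  } = {}
) {
  for (let attempt = 0; ; attempt++) {
    let lastError: unknown;

    try {
      const result = await withTimeout(
        client.web.browse(payload),
        timeoutMs
      );

      if (result.status === "success") return result;

      // Inspect error shape from MultiOn, if available
      const code = (result as any).error?.code;
      const message = (result as any).error?.message || "";

      // Fail fast on payment issues or obvious hard stops
      if (code === 402 || /Payment Required/i.test(message)) {
        throw new NonRetryableError("MultiOn billing issue (402 Payment Required). Abort.");
      }
      lastError = new Error(`Step returned status "${result.status}": ${message}`);
    } catch (err: any) {
      // Hard stops propagate immediately instead of being retried
      if (err?.name === "NonRetryableError") throw err;
      // SDK/HTTP errors may carry the status code (field name depends on the client)
      if (err?.statusCode === 402 || err?.status === 402) {
        throw new NonRetryableError("MultiOn billing issue (402 Payment Required). Abort.");
      }
      lastError = err; // network error or timeout: retryable
    }

    if (attempt >= maxRetries) {
      throw new Error(`Step failed after ${attempt + 1} attempts: ${String(lastError)}`);
    }

    // Exponential backoff: baseDelayMs, then 2x, 4x, ...
    await sleep(baseDelayMs * Math.pow(2, attempt));
  }
}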

You then use safeStep instead of calling client.web.browse directly:

const start = await safeStep({
  url: "https://www.amazon.com",
  cmd: "Open Amazon and search for 'noise cancelling headphones'.",
  mode: "step"
});

const sessionId = start.session_id;

const step2 = await safeStep({
  session_id: sessionId,
  cmd: "Open the first organic result product page.",
  mode: "step"
});
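The backoff in safeStep doubles on every attempt with no upper bound. A common refinement, swapped in here as a variant rather than anything MultiOn prescribes, is to cap the delay and add random "full jitter" so many parallel flows don't retry in lockstep. backoffDelay is a hypothetical helper, and its rng parameter is injectable only so it can be tested.

```typescript
// Capped exponential backoff with full jitter:
// a random delay in [0, min(maxDelayMs, baseDelayMs * 2^attempt)).
function backoffDelay(
  attempt: number,       // 0 for the first retry, 1 for the second, ...
  baseDelayMs = 2000,
  maxDelayMs = 30000,
  rng: () => number = Math.random
): number {
  const ceiling = Math.min(maxDelayMs, baseDelayMs * Math.pow(2, attempt));
  return Math.floor(rng() * ceiling);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Usage inside a retry loop:
// await sleep(backoffDelay(attempt));
```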

3. Session-aware retry logic

Not all retries should reuse the same session_id.

  • Reuse session_id when:

    • You hit a transient error but the session is likely intact.
    • You’re in the middle of a multistep flow and want continuity preserved.
  • Drop session_id and restart when:

    • You suspect the session is corrupted or expired.
    • The site clearly kicked you back to a login or error page.
    • Bot protection raised friction that likely requires a fresh session.

Pattern:

async function resilientFlow() {
  // Start new session
  const start = await safeStep({
    url: "https://www.amazon.com",
    cmd: "Log in if needed and open my homepage.",
    mode: "step"
  });

  let sessionId = start.session_id;

  try {
    const wishlistStep = await safeStep({
      session_id: sessionId,
      cmd: "Navigate to my wishlist and open the most recently saved item.",
      mode: "step"
    });

    const checkoutStep = await safeStep({
      session_id: sessionId,
      cmd: "Add the item to cart and proceed to the order review page.",
      mode: "step"
    });

    return { sessionId, checkoutStep };
  } catch (err) {
    // Billing problems go to ops; a fresh session won't fix them
    if (/Payment Required/i.test(String(err))) throw err;

    // Fallback: start a fresh session once
    const restart = await safeStep({
      url: "https://www.amazon.com",
      cmd: "Log in and go straight to my wishlist, then open the most recently saved item.",
      mode: "step"
    });

    sessionId = restart.session_id;

    // Same return shape as the happy path, so callers don't need to branch
    const checkoutStep = await safeStep({
      session_id: sessionId,
      cmd: "Make sure the item is in the cart exactly once, then proceed to the order review page.",
      mode: "step"
    });

    return { sessionId, checkoutStep };
  }
}

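This is more robust than blindly hammering the same broken session. One caveat: some state lives in the user's account rather than in the browser session (a shopping cart tied to the logged-in account, for instance), so a step that partially succeeded before the failure may already have changed it. Word restart commands so that repeating them is harmless.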


Handling long flows and explicit completion

For longer flows (e.g., full fintech KYC, multi-page forms), Step mode lets you:

  • Split the flow into milestones:
    • Start session → complete part A → confirm → part B → confirm → finalize.
  • Decide after each step whether to:
    • Continue, adjust the next cmd, or abort.
    • Serialize state into your own DB (e.g., last step description, URL, extracted metadata).

A practical convention:

  • Use cmd that includes both the action and a clear stop condition.
    Example:
    "Fill in all mandatory fields on this KYC page using the provided applicant data, but do not submit yet. Stop when the form shows no validation errors."
  • On the next step, you can safely say:
    "Review the completed form, then submit the application and stop when a confirmation page is visible. Extract any confirmation IDs."

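To keep that convention consistent across steps, you can assemble each cmd from its parts. A sketch: buildCmd and StepInstruction are hypothetical, and the result is still an ordinary natural-language cmd, so this standardizes your phrasing rather than guaranteeing the agent's behavior.

```typescript
// Hypothetical structure for one step: action, explicit don'ts, stop condition, extraction.
interface StepInstruction {
  action: string;   // what to do on this step
  doNot?: string[]; // guardrails stated in plain language
  stopWhen: string; // observable condition that ends the step
  extract?: string; // optional data to capture before stopping
}

function buildCmd(s: StepInstruction): string {
  // Normalize the action to end with exactly one period
  const parts = [s.action.trim().replace(/\.$/, "") + "."];
  for (const rule of s.doNot ?? []) parts.push(`Do not ${rule}.`);
  parts.push(`Stop when ${s.stopWhen}.`);
  if (s.extract) parts.push(`Extract ${s.extract}.`);
  return parts.join(" ");
}
```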
If you need structured data at intermediate steps (e.g., summary of the order, item list), combine Step mode with the Retrieve-like pattern: instruct the agent to read the page and return structured JSON in metadata or a similar field.
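Validate that JSON before it reaches your database, because model-produced output can be malformed or incomplete. A sketch assuming you asked for an order summary and received it either as a JSON string or as an already-parsed object; OrderSummary and parseOrderSummary are hypothetical, and the schema should be replaced with your own.

```typescript
// Hypothetical schema for an order summary requested in the cmd.
interface OrderSummary {
  items: { name: string; quantity: number }[];
  total_price: string; // kept as text, e.g. "$219.99", as in the example response
}

// Accepts a JSON string or a parsed object; returns null if the shape is wrong.
function parseOrderSummary(raw: unknown): OrderSummary | null {
  let value: unknown = raw;
  if (typeof raw === "string") {
    try {
      value = JSON.parse(raw);
    } catch {
      return null; // not valid JSON
    }
  }
  if (typeof value !== "object" || value === null) return null;

  const v = value as { items?: unknown; total_price?: unknown };
  const total = v.total_price;
  const rawItems = v.items;
  if (typeof total !== "string" || !Array.isArray(rawItems)) return null;

  const items: OrderSummary["items"] = [];
  for (const it of rawItems) {
    const name = it?.name;
    const quantity = it?.quantity;
    if (typeof name !== "string" || typeof quantity !== "number") return null;
    items.push({ name, quantity });
  }
  return { items, total_price: total };
}
```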


Operational tips from “selector PTSD”

Coming from Playwright/Selenium, the main shift is: you’re no longer hand-writing selectors, but you are still responsible for reliability. A few habits help:

  • Log every step: store session_id, cmd, status, step.description, and the url after each call. This becomes your postmortem trail.
  • Guardrails in language: be explicit in cmds about what not to do (e.g., “do not confirm the order yet”, “do not change any shipping address, only read it”). Step mode respects clear intent.
  • Wrap billing errors: if you see a 402 Payment Required from MultiOn, treat it as a system alert, not a user-level retry. Surface that to ops immediately.
  • Use timeouts per workflow type: checkout flows can afford more time than quick X posts. Don’t use one global timeout for everything.
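The logging habit can be a single function called after every step. A sketch: the record holds the fields listed here (session_id, cmd, status, step.description, url) plus a timestamp and duration; emitting one JSON line per step is an assumption that suits most log pipelines, and stepLogLine is a hypothetical name.

```typescript
// One structured log line per step.
interface StepLog {
  ts: string;
  session_id: string;
  cmd: string;
  status: string;
  description: string | null;
  url: string | null;
  duration_ms: number;
}

function stepLogLine(
  sessionId: string,
  cmd: string,
  result: { status: string; step?: { description?: string; url?: string } },
  startedAt: number,      // Date.now() captured before the call
  now: number = Date.now()
): string {
  const entry: StepLog = {
    ts: new Date(now).toISOString(),
    session_id: sessionId,
    cmd,
    status: result.status,
    description: result.step?.description ?? null,
    url: result.step?.url ?? null,
    duration_ms: now - startedAt
  };
  return JSON.stringify(entry);
}

// Usage: console.log(stepLogLine(sessionId, cmd, result, t0));
```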

Putting it all together: a reference flow

Conceptually, any MultiOn multi-step flow using Step mode looks like this:

  1. Start

    • Call POST https://api.multion.ai/v1/web/browse with url, cmd, and mode: "step".
    • Capture session_id.
  2. Step

    • Call the same endpoint with session_id, a new cmd, and mode: "step".
    • Handle result (success/error), log state, update your own workflow state machine.
  3. Step (repeat as needed)

    • Reuse session_id for each subsequent step.
    • Adjust cmd based on prior outputs.
  4. Retries + timeouts

    • Wrap each call with:
      • Request timeout.
      • Exponential backoff retry for transient errors.
      • Branch logic for hard failures (auth, payment, bot protection).
  5. Complete

    • When the flow is done (order placed, tweet posted, form submitted), persist whatever structured output you need (confirmation IDs, URLs, JSON summaries) and discard the session_id.
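Step 2 mentions updating your own workflow state machine. A minimal sketch of one, with hypothetical state and event names following the outline: it only tracks where the flow is and rejects invalid transitions, leaving the actual browse calls to wrappers such as safeStep.

```typescript
// States of one Step-mode flow.
type FlowState =
  | { kind: "idle" }
  | { kind: "running"; sessionId: string; stepsDone: number }
  | { kind: "complete"; output: Record<string, unknown> }
  | { kind: "failed"; reason: string };

type FlowEvent =
  | { type: "started"; sessionId: string }
  | { type: "stepSucceeded" }
  | { type: "finished"; output: Record<string, unknown> }
  | { type: "failed"; reason: string };

function transition(state: FlowState, event: FlowEvent): FlowState {
  switch (event.type) {
    case "started":
      if (state.kind !== "idle") throw new Error(`Cannot start from ${state.kind}`);
      return { kind: "running", sessionId: event.sessionId, stepsDone: 1 };
    case "stepSucceeded":
      if (state.kind !== "running") throw new Error(`No running session (${state.kind})`);
      return { ...state, stepsDone: state.stepsDone + 1 };
    case "finished":
      if (state.kind !== "running") throw new Error(`Cannot finish from ${state.kind}`);
      // Keep the structured output; the session_id is discarded here
      return { kind: "complete", output: event.output };
    case "failed":
      return { kind: "failed", reason: event.reason };
  }
}
```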

Once you’ve built that wrapper, you can plug in any high-level task (Amazon ordering, posting on X, H&M catalog navigation) and get reliable multi-step browser automation without maintaining a test farm or a selector graveyard.


Final Verdict

Use MultiOn’s Step mode whenever your workflow crosses more than one page or state: login, multi-step checkout, social posting, or any gate-heavy web UI. The pattern is simple: create a session_id, reuse it for each cmd, and wrap every call with explicit timeouts and retry logic that understands when to reuse a session and when to start over. That gives you the flexibility of a real browser, the safety of secure remote sessions with native proxy support for tricky bot protection, and the reliability you’d expect from a production automation stack, without rewriting brittle scripts every time the UI changes.
