MultiOn Step mode: how do I implement a multi-step flow (start → step → step) and handle retries/timeouts?

Most multi-step browser flows don’t fail because “the AI is dumb.” They fail because session state is lost, steps aren’t idempotent, and retries are bolted on after the fact. MultiOn’s Step mode is designed to flip that: you keep a single session_id alive across calls, move the agent through discrete steps, and build explicit retry and timeout handling around the same remote browser session.

This guide walks through how to implement a start → step → step flow with Step mode, and how to handle retries and timeouts in a way that won’t crumble the moment a login page is slow or Amazon decides to add one more interstitial.


What Step mode actually does

With MultiOn’s Agent API (V1 Beta), Step mode:

  • Spins up a secure remote session in a real browser.
  • Returns a session_id you reuse across subsequent calls.
  • Lets you advance the same agent/browser with new cmd instructions, step by step.
  • Gives you structured outputs (page state, optional JSON) at each step.

Think of it as:

Intent in (cmd + url) → actions executed in a persistent browser → structured output + session_id out.

You wire your own retry and timeout logic around calls to POST https://api.multion.ai/v1/web/browse.


Core call pattern: start → step → step

The minimal pattern looks like this:

  1. Start: create a session and run the first command.
  2. Step: continue with the same session_id and a new command.
  3. Step again: repeat until the workflow is complete, or you decide to close the session.

Everything else—retries, timeouts, idempotency—is built around this thread of session_id.

1. Start the session (login, initial navigation, or first action)

Use POST https://api.multion.ai/v1/web/browse with:

  • headers:
    • X_MULTION_API_KEY: YOUR_API_KEY
    • Content-Type: application/json
  • body:
    • url: where you want the agent to start
    • cmd: natural-language instruction
    • mode: "step"

Example: start an Amazon session and search for a product.

curl -X POST "https://api.multion.ai/v1/web/browse" \
  -H "X_MULTION_API_KEY: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.amazon.com",
    "cmd": "Search for noise cancelling headphones and open the results page.",
    "mode": "step"
  }'

Typical response shape (simplified):

{
  "session_id": "sess_abc123",
  "status": "success",
  "page_url": "https://www.amazon.com/s?k=noise+cancelling+headphones",
  "summary": "Opened Amazon and navigated to the search results for noise cancelling headphones."
}

Key artifact: session_id. From this point on, that ID is your unit of reliability.

2. First step: refine the action in the same session

You now call the same endpoint, but you:

  • omit url (or keep it if you want to assert a target),
  • pass session_id,
  • send a new cmd.

Example: pick a specific product and open its detail page.

curl -X POST "https://api.multion.ai/v1/web/browse" \
  -H "X_MULTION_API_KEY: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "sess_abc123",
    "cmd": "From the current results, open the product with the highest rating under $250.",
    "mode": "step"
  }'

Response (simplified):

{
  "session_id": "sess_abc123",
  "status": "success",
  "page_url": "https://www.amazon.com/dp/B0XXXXXXX",
  "summary": "Opened the detail page for a top-rated noise cancelling headset under $250."
}

3. Second step: complete the flow

Example: add to cart and proceed to checkout.

curl -X POST "https://api.multion.ai/v1/web/browse" \
  -H "X_MULTION_API_KEY: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "sess_abc123",
    "cmd": "Add this item to the cart and go to the checkout page.",
    "mode": "step"
  }'

Response (simplified):

{
  "session_id": "sess_abc123",
  "status": "success",
  "page_url": "https://www.amazon.com/checkout",
  "summary": "Added the item to the cart and navigated to the checkout page."
}

Your service now has a single logical “Amazon checkout flow” spanning three calls, stitched together by session_id.
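
The start call and the follow-up steps differ only in whether they carry url or session_id. As a minimal sketch, assuming the field names used in the curl examples above, a small helper can build either shape. buildBrowsePayload is a hypothetical name, not part of any MultiOn SDK.

```javascript
// Hypothetical helper: builds the request body for a start call (url, no
// session_id) or a follow-up step (session_id, url optional).
function buildBrowsePayload({ cmd, url, sessionId }) {
  if (!cmd) throw new Error("cmd is required");
  if (!url && !sessionId) {
    throw new Error("A start call needs url; a follow-up step needs sessionId.");
  }
  const payload = { cmd, mode: "step" };
  if (sessionId) payload.session_id = sessionId;
  if (url) payload.url = url;
  return payload;
}

// Start call, then a follow-up step in the same session.
const startBody = buildBrowsePayload({
  url: "https://www.amazon.com",
  cmd: "Search for noise cancelling headphones and open the results page."
});
const stepBody = buildBrowsePayload({
  sessionId: "sess_abc123",
  cmd: "Add this item to the cart and go to the checkout page."
});
```

Validating the payload before sending it keeps "forgot the session_id" bugs out of your retry loop, where they would otherwise look like flaky API behavior.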


Example: multi-step flow in Node.js

Below is a minimal implementation that:

  • Starts a session.
  • Performs two additional steps.
  • Wraps each call with timeout and retry logic.

npm install axios

// multion-step-flow.js
const axios = require("axios");

const MULTION_API_KEY = process.env.MULTION_API_KEY;
const BASE_URL = "https://api.multion.ai/v1/web/browse";

async function callMultion(payload, { timeoutMs = 60000 } = {}) {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), timeoutMs);

  try {
    const res = await axios.post(BASE_URL, payload, {
      headers: {
        "X_MULTION_API_KEY": MULTION_API_KEY,
        "Content-Type": "application/json"
      },
      signal: controller.signal
    });
    return res.data;
  } catch (err) {
    if (axios.isAxiosError(err) && err.response) {
      if (err.response.status === 402) {
        throw new Error("MultiOn returned 402 Payment Required – check billing/usage.");
      }
      throw new Error(
        `MultiOn error ${err.response.status}: ${JSON.stringify(err.response.data)}`
      );
    }
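    // axios rejects aborted requests with a CanceledError (code ERR_CANCELED),
    // not a DOM-style AbortError, so detect cancellation via axios.isCancel.
    if (axios.isCancel(err)) {
      throw new Error("MultiOn request timed out.");
    }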
    throw err;
  } finally {
    clearTimeout(timeout);
  }
}

async function withRetries(fn, { maxRetries = 2, backoffMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break;
      await new Promise((r) => setTimeout(r, backoffMs * (attempt + 1)));
    }
  }
  throw lastError;
}

async function runAmazonCheckoutFlow() {
  // 1. Start: search
  const start = await withRetries(
    () =>
      callMultion({
        url: "https://www.amazon.com",
        cmd: "Search for noise cancelling headphones and open the results page.",
        mode: "step"
      }),
    { maxRetries: 1 }
  );

  const sessionId = start.session_id;
  if (!sessionId) throw new Error("No session_id returned from MultiOn.");

  // 2. Step: open best product
  const step1 = await withRetries(
    () =>
      callMultion({
        session_id: sessionId,
        cmd: "From the current results, open the product with the highest rating under $250.",
        mode: "step"
      }),
    { maxRetries: 1 }
  );

  // 3. Step: add to cart and go to checkout
  const step2 = await withRetries(
    () =>
      callMultion({
        session_id: sessionId,
        cmd: "Add this item to the cart and go to the checkout page.",
        mode: "step"
      }),
    { maxRetries: 2 }
  );

  return { start, step1, step2 };
}

runAmazonCheckoutFlow()
  .then((result) => {
    console.log("Flow completed:", {
      startUrl: result.start.page_url,
      productUrl: result.step1.page_url,
      checkoutUrl: result.step2.page_url
    });
  })
  .catch((err) => {
    console.error("Flow failed:", err.message);
  });

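This is the “happy path” pattern you’ll reuse: a wrapper around the Agent API, plus a retry shell. As written, withRetries retries every failure, including the 402 and other 4xx errors that should fail fast; the error-handling section later in this guide covers how to narrow that.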


Handling retries correctly in Step mode

Retries are where most teams accidentally double-submit forms or add three copies of an item to a cart. With Step mode, you get a few levers to make retries safe.

1. Treat each step as a transactional unit

Design each cmd so that:

  • It detects state before acting.
  • It navigates to a stable target if needed.
  • It exits early if the action is already done.

Example command for adding to cart idempotently:

“If this item is not already in the cart, add it once, then go to the checkout page. If it is already in the cart, just go to the checkout page. Do not change the quantity.”

This way, if your callMultion wrapper retries after a network timeout, you won’t end up with 3x quantity.

2. Use session_id to distinguish safe vs unsafe retries

Two patterns:

  • Safe retry: same session_id, same cmd, same step. You assume the previous call didn’t finish.
  • Unsafe retry: new session_id. A fresh session starts without the page state that earlier steps built up, so any cmd that assumes that state will misfire. Only do this after you explicitly decide to “restart the flow.”

You can encode this in your code:

  • For transient errors (network blip, 5xx proxied upstream), retry with the same session_id.
  • For hard errors (invalid session, repeated navigation failure, or business-level error from your own app), discard the session_id and restart the workflow from step 1.
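
One way to encode the transient/hard split is a retry helper that only loops on errors explicitly marked as transient. The sketch below is an assumption about how you might structure it: StepError and retryTransient are hypothetical names, and deciding which failures count as transient is left to your wrapper.

```javascript
// Hypothetical error type: `transient` marks failures worth retrying in the
// same session (network blips, 5xx); everything else is treated as hard.
class StepError extends Error {
  constructor(message, { transient = false } = {}) {
    super(message);
    this.name = "StepError";
    this.transient = transient;
  }
}

// Retries `fn` only while it fails with a transient StepError.
// Hard errors propagate immediately so the caller can restart the flow.
async function retryTransient(fn, { maxRetries = 2, backoffMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = err instanceof StepError && err.transient;
      if (!retryable || attempt >= maxRetries) throw err;
      // Linear backoff between attempts.
      await new Promise((r) => setTimeout(r, backoffMs * (attempt + 1)));
    }
  }
}
```

The caller keeps the same session_id inside fn. A hard error surfacing from retryTransient is the signal to discard that session and restart from step 1.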

3. Retry budget per step

Not all steps deserve the same retry policy. For example:

  • Login: 1–2 retries max before failing hard; you don’t want to lock accounts.
  • Pagination or scrolling: more retries are usually fine.
  • “Place order” actions: no automatic retries unless the command is explicitly idempotent and checks for confirmation state.

Encode this per step:

const STEP_POLICIES = {
  login: { maxRetries: 1, timeoutMs: 90000 },
  search: { maxRetries: 2, timeoutMs: 60000 },
  addToCart: { maxRetries: 1, timeoutMs: 60000 },
  placeOrder: { maxRetries: 0, timeoutMs: 120000 }
};

Then apply the policy for each cmd: pass maxRetries to your withRetries wrapper and timeoutMs to callMultion.
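
A minimal sketch of that wiring, under the assumption that your call and retry helpers have the same signatures as callMultion and withRetries from the Node.js example. Both are injected here so the snippet runs on its own; makeStepRunner and POLICIES are hypothetical names.

```javascript
// Illustrative policies, mirroring two entries from the table above.
const POLICIES = {
  addToCart: { maxRetries: 1, timeoutMs: 60000 },
  placeOrder: { maxRetries: 0, timeoutMs: 120000 }
};

// `call(payload, { timeoutMs })` stands in for callMultion and
// `retry(fn, { maxRetries })` for withRetries; both are injected so the
// sketch stays independent of any HTTP client.
function makeStepRunner({ call, retry, policies }) {
  return function runStep(stepName, payload) {
    const policy = policies[stepName];
    if (!policy) throw new Error(`No policy defined for step "${stepName}"`);
    const { maxRetries, timeoutMs } = policy;
    // The timeout applies to each attempt; the retry count to the step.
    return retry(() => call(payload, { timeoutMs }), { maxRetries });
  };
}
```

With the helpers from the earlier example, this would be used as makeStepRunner({ call: callMultion, retry: withRetries, policies: POLICIES }). Failing loudly on an unknown step name avoids silently falling back to default retries on a step like "place order".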


Handling timeouts around MultiOn Step mode

Timeouts live at two levels:

  1. HTTP request timeout: how long you wait for MultiOn to respond.
  2. Business-level step timeout: how long you tolerate a slow or stuck step before you mark the whole step as failed.

1. HTTP timeout

As shown in the Node example, you can use:

  • An AbortController (fetch-style) or an HTTP client-level timeout.
  • A per-step timeoutMs so login can run longer than a simple click.

Recommended starting points:

  • Simple navigation / click steps: 30–60 seconds.
  • Heavy pages (bot-protected, lots of JS): 60–90 seconds.
  • Checkouts or large forms: 90–120 seconds, depending on your risk tolerance.

When you hit a timeout:

  • Treat it as an ambiguous result: the action may or may not have completed.
  • Only retry steps that are idempotent by design, with a cmd that checks current state first.

2. Business-level timeouts and circuit breaking

At a higher level (your own service boundary):

  • Set a max time budget for the entire multi-step flow (e.g., 3–5 minutes).
  • Track elapsed time across steps using logs or tracing.
  • If you hit the budget, abort the flow, record the session_id and last page_url, and surface a clear error to your user.

That prevents zombie flows where you’re indefinitely retrying a page that is simply blocked by aggressive bot protection or captchas.
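
A flow-level budget can be sketched as a small object that every step consults before it runs. createFlowBudget and its method names are hypothetical; the clock is injectable only so the logic can be exercised without real waiting.

```javascript
// Hypothetical helper: tracks a total time budget for one flow.
function createFlowBudget(totalMs, now = Date.now) {
  const startedAt = now();
  return {
    remainingMs: () => Math.max(0, totalMs - (now() - startedAt)),
    // Throws once the budget is spent; call before starting each step.
    assertTimeLeft(context = {}) {
      if (this.remainingMs() === 0) {
        const err = new Error("Flow time budget exhausted.");
        err.context = context; // e.g. { sessionId, lastPageUrl }
        throw err;
      }
    },
    // Never give a single step more time than the flow has left.
    stepTimeout(stepTimeoutMs) {
      return Math.min(stepTimeoutMs, this.remainingMs());
    }
  };
}

// Usage: a 4-minute budget shared by all steps of one flow.
const budget = createFlowBudget(4 * 60 * 1000);
budget.assertTimeLeft({ sessionId: "sess_abc123" });
const timeoutForNextStep = budget.stepTimeout(90000);
```

Capping each step's timeout by the remaining budget means the last step cannot push the whole flow past its limit, even when every earlier step finished just under its own timeout.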


Error patterns to expect (and how to react)

From an ops perspective, you’ll see a few recurring classes of issues.

1. MultiOn API-level errors

  • 402 Payment Required: your billing or quota is blocking the session.
    • Action: fail fast, do not retry. Tell the user or ops that capacity is exhausted.
  • 4xx request errors (bad headers, missing fields):
    • Action: treat as a bug in your integration; fix the code, not the flow.
  • 5xx errors:
    • Action: classify as transient; retry with backoff and the same session_id.
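
Those three rules fit in a few lines. The sketch below maps an HTTP status to the reaction described above; the action labels are made-up identifiers for your own code, not values returned by the API.

```javascript
// Maps an HTTP status from the API to the reaction described above.
function actionForStatus(status) {
  if (status === 402) return "fail_fast"; // billing/quota: never retry
  if (status >= 500 && status <= 599) return "retry_same_session"; // transient
  if (status >= 400 && status <= 499) return "fix_integration"; // bug in the request
  return "ok";
}
```

Checking 402 before the general 4xx branch matters: both are non-retryable, but they go to different people (billing or ops versus whoever owns the integration code).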

2. Session issues

Symptoms:

  • Response contains a message that implies the session is invalid or closed.
  • Returned session_id is missing or changed unexpectedly.

Actions:

  • Log the session_id and your internal job ID for forensics.
  • Fail the current flow and, if safe, start a new session from step 1.
  • Guard against accidental cross-session mixing (don’t reuse session_id across flows).
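
A cheap guard is to compare the session_id in every step response against the one you stored at the start. This sketch assumes the response carries session_id as in the simplified responses shown earlier; assertSameSession and the restartFlow flag are hypothetical.

```javascript
// Throws if a step response does not carry the session we expect.
function assertSameSession(expectedSessionId, response, jobId) {
  const actual = response && response.session_id;
  if (actual !== expectedSessionId) {
    const err = new Error(
      `Session mismatch for job ${jobId}: expected ${expectedSessionId}, got ${actual}`
    );
    err.restartFlow = true; // signal to the caller: do not retry in place
    throw err;
  }
  return response;
}
```

Because the error message includes both IDs and your internal job ID, the log line you need for forensics comes for free when the guard trips.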

3. Bot protection / rendering issues

These surface less as clean errors and more as “the agent didn’t get where we expected.”

Your detection tools:

  • The returned page_url.
  • The summary / description of the agent’s last actions.
  • Structured extraction results if you combine Step mode with Retrieve.

Actions:

  • If you suspect a bot-protection wall, allow one retry with the same session_id.
  • If still blocked, stop the flow and escalate (you may need to adjust native proxy settings or accept that this domain is too restrictive today).
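
Detection here is heuristic by nature. As a rough sketch, assuming the page_url and summary fields from the simplified responses in this guide, you can check whether the agent landed where you expected and whether the summary mentions a typical block page. The hint list and the inspectStep name are illustrative assumptions; tune them per target site.

```javascript
// Heuristic only: phrases that commonly appear on bot-protection pages.
const BLOCK_HINTS = ["captcha", "verify you are human", "access denied", "unusual traffic"];

// `expectedUrlPattern` should be a RegExp without the g flag, since
// RegExp.prototype.test is stateful on global patterns.
function inspectStep(response, expectedUrlPattern) {
  const url = String(response.page_url || "");
  const summary = String(response.summary || "").toLowerCase();
  return {
    offTarget: !expectedUrlPattern.test(url), // agent is not where we expected
    blockHint: BLOCK_HINTS.some((hint) => summary.includes(hint))
  };
}
```

An offTarget result without a blockHint usually points to a cmd that was too loose. Both together are a reasonable trigger for the single same-session retry described above.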

Combining Step mode with Retrieve for structured outputs

Many flows aren’t just about clicks; you also need structured JSON at certain steps. You can pair Step mode with MultiOn’s Retrieve feature for this.

Typical pattern:

  1. Use Step mode to navigate and interact (e.g., log in to an H&M account and open a catalog page).

  2. Call Retrieve against the current page to pull structured data:

    • JSON arrays of objects
    • Controls like renderJs, scrollToBottom, maxItems to handle dynamic lists.

At a high level:

curl -X POST "https://api.multion.ai/v1/web/retrieve" \
  -H "X_MULTION_API_KEY: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "sess_abc123",
    "renderJs": true,
    "scrollToBottom": true,
    "maxItems": 50
  }'

The output is a JSON array of objects (e.g., for H&M: [{ "name": "...", "price": "...", "color": "...", "url": "...", "image": "..." }, ...]), which you can combine with the session-driven actions from Step mode.

Retries and timeouts apply the same way: wrap the Retrieve call with the same callMultion/withRetries pattern, and treat it as read-only (safe to retry).


Practical design checklist for Step mode flows

When you implement a multi-step flow with Step mode and want resilient retries/timeouts, walk through this list:

  1. Session-first design

    • Always capture and log session_id on the start call.
    • Pass the same session_id for all subsequent steps in the flow.
    • Never mix session_id between concurrent flows.
  2. Step-level contracts

    • Define each cmd to be idempotent when retried.
    • Make the agent check state (“if already X, do not repeat Y”).
    • Bound each step with a specific goal (search, select, login, confirm).
  3. Timeouts

    • Use per-step HTTP timeouts via AbortController or client-level options.
    • Define a total flow time budget and enforce it.
    • Treat timeouts as ambiguous; only retry steps designed to be safe.
  4. Retry strategy

    • Wrap MultiOn calls in a generic withRetries helper.
    • Tune maxRetries and backoff per step type.
    • Avoid automatic retries on irreversible operations (e.g., “place order”) unless the command is explicitly idempotent.
  5. Error handling and observability

    • Special-case 402 Payment Required and fail fast.
    • Log session_id, page_url, step name, and error for each failure.
    • Use those logs to debug brittle steps (slow logins, dynamic UIs, bot walls).
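
For the observability items, one structured log line per failed step is usually enough to start with. The record shape below is an assumption, not a MultiOn format; failureLogLine is a hypothetical helper.

```javascript
// Builds one JSON log line per failed step.
function failureLogLine({ jobId, sessionId, stepName, pageUrl, error }) {
  return JSON.stringify({
    level: "error",
    event: "step_failed",
    job_id: jobId,
    session_id: sessionId,
    step: stepName,
    page_url: pageUrl ?? null, // last known URL, if any
    error: error instanceof Error ? error.message : String(error),
    at: new Date().toISOString()
  });
}

// Usage: emit the line wherever a step finally gives up.
console.error(
  failureLogLine({
    jobId: "job-7",
    sessionId: "sess_abc123",
    stepName: "addToCart",
    pageUrl: "https://www.amazon.com/dp/B0XXXXXXX",
    error: new Error("MultiOn request timed out.")
  })
);
```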

Moving from brittle scripts to Step mode flows

If you’ve been maintaining Playwright/Selenium for multi-step flows, the translation usually looks like this:

  • The test scenario becomes a series of natural-language cmd instructions.
  • Your browser cluster becomes MultiOn’s secure remote sessions.
  • Your session handling and selectors collapse into a single session_id and agent-driven actions.
  • Your retries and timeouts move from per-selector hacks to per-step policies around POST https://api.multion.ai/v1/web/browse.

To start implementing a flow:

  1. Install the SDK or use raw HTTP.
  2. Implement the three-call pattern (start → step → step) with logging of session_id.
  3. Wrap calls with your retry/timeout helpers.
  4. Add one flow at a time (e.g., Amazon search → product → checkout, or posting on X).
  5. Extend to parallel execution by running many flows concurrently, each with its own session_id.
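
For the last item, a bounded worker pool keeps concurrency predictable. This is a generic sketch, not a MultiOn feature: runFlowsConcurrently is a hypothetical helper, and each entry in flows is assumed to be a function that starts its own session and returns a promise.

```javascript
// Runs `flows` (functions returning promises) with at most `limit` in flight.
// Each flow is expected to create and use its own session_id internally.
async function runFlowsConcurrently(flows, limit = 5) {
  const results = new Array(flows.length);
  let next = 0;
  async function worker() {
    while (next < flows.length) {
      const index = next++; // claim the next flow
      try {
        results[index] = { status: "fulfilled", value: await flows[index]() };
      } catch (reason) {
        // One failed flow must not take down the others.
        results[index] = { status: "rejected", reason };
      }
    }
  }
  const workers = Array.from({ length: Math.min(limit, flows.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

Results come back in input order with the same fulfilled/rejected shape as Promise.allSettled, so one blocked domain shows up as a single rejected entry instead of failing the whole batch.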

Once that’s in place, you’ve converted fragile, selector-heavy scripts into intent-driven Step mode flows with explicit retry and timeout behavior—without owning your own “remote Chrome farm.”
