
MultiOn Step mode: how do I implement a multi-step flow (start → step → step) and handle retries/timeouts?
Most teams hit the limits of “single-shot” agents fast. Real product flows look more like: open site → log in → navigate → act → confirm. MultiOn’s Sessions + Step mode is built for exactly that shape: intent in, a real browser session kept alive, and controlled progression across multiple calls.
This guide walks through how to implement a start → step → step flow with the Agent API (V1 Beta), then harden it with retries, timeouts, and safe error handling.
Why Step mode exists (and what it actually does)
With Playwright/Selenium, you keep a browser alive and push it through a script. If something halfway through fails (login, 2FA, slow render), your selectors crumble and the run dies.
MultiOn flips this:
- You send a high-level cmd (what you want) and a url (where to start) to POST https://api.multion.ai/v1/web/browse.
- MultiOn runs that in a secure remote session (a real browser).
- It gives you back:
  - A session_id you reuse on the next call.
  - A step result (what it did, what it found).
Step mode simply means: you keep that remote browser alive across calls by passing the same session_id, and you drive the workflow in controlled increments (start → step → step → finish) instead of one massive, opaque prompt.
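To make the difference between a start call and a continuation call concrete, here is a minimal TypeScript sketch of the two payload shapes. The StepRequest type and the buildStepPayload helper are hypothetical names used for illustration, not part of any SDK; the field names (url, cmd, session_id, mode) are the ones described above.

```typescript
// Hypothetical types for illustration; field names follow the text above.
type StepRequest = {
  url?: string;        // where to start (first call only)
  session_id?: string; // returned by the first call, reused afterwards
  cmd: string;         // what you want the agent to do
  mode: "step";
};

// Build the payload for the next call: a continuation that reuses the
// session when one exists, otherwise a start payload that needs a url.
function buildStepPayload(
  cmd: string,
  opts: { url?: string; sessionId?: string }
): StepRequest {
  if (opts.sessionId) {
    return { session_id: opts.sessionId, cmd, mode: "step" };
  }
  if (!opts.url) {
    throw new Error("A start call needs a url when there is no session_id");
  }
  return { url: opts.url, cmd, mode: "step" };
}
```

Keeping this decision in one helper means the rest of your code never has to remember whether it is on the first call or a later one.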
Core pattern: start → step → step with session_id
1. Start the flow (create the session)
First call: you define the starting point and high-level goal.
curl -X POST "https://api.multion.ai/v1/web/browse" \
  -H "Content-Type: application/json" \
  -H "X_MULTION_API_KEY: $MULTION_API_KEY" \
  -d '{
    "url": "https://www.amazon.com/",
    "cmd": "Search for noise cancelling headphones and open the product page for the first organic result.",
    "mode": "step"
  }'
Key points:
- mode: "step" (or the equivalent in the client SDK) tells MultiOn you’re driving a stepwise flow.
- You don’t pass session_id yet; the platform will allocate one.
A typical response shape looks like:
{
  "session_id": "sess_12345abcde",
  "status": "success",
  "step": {
    "description": "Opened Amazon homepage, searched for 'noise cancelling headphones', and opened the first result.",
    "url": "https://www.amazon.com/dp/EXAMPLE",
    "screenshot": "https://...",
    "metadata": { /* page details, DOM info, etc. */ }
  }
}
From now on, session_id is your unit of continuity. Treat it like a browser handle.
2. Second step (continue within the same browser)
Next, you tell the agent what to do from the current page and pass the session_id to stay in the same secure remote session.
curl -X POST "https://api.multion.ai/v1/web/browse" \
  -H "Content-Type: application/json" \
  -H "X_MULTION_API_KEY: $MULTION_API_KEY" \
  -d '{
    "session_id": "sess_12345abcde",
    "cmd": "Add this item to cart and proceed to checkout until the final order review page.",
    "mode": "step"
  }'
Response:
{
  "session_id": "sess_12345abcde",
  "status": "success",
  "step": {
    "description": "Added item to cart and navigated to the order review page.",
    "url": "https://www.amazon.com/gp/buy/spc/handlers/display.html",
    "metadata": {
      "summary": "Order review page with price, shipping address, and payment method.",
      "total_price": "$219.99"
    }
  }
}
The important part: same session_id, new page state, same remote browser.
3. Third step (finalize or inspect)
You can continue chaining steps until the flow is done:
curl -X POST "https://api.multion.ai/v1/web/browse" \
  -H "Content-Type: application/json" \
  -H "X_MULTION_API_KEY: $MULTION_API_KEY" \
  -d '{
    "session_id": "sess_12345abcde",
    "cmd": "If everything looks correct, place the order. Then capture the order confirmation number.",
    "mode": "step"
  }'
Now you’ve implemented:
- Start: discover product.
- Step: add to cart and reach review.
- Step: confirm and extract structured confirmation data.
You can apply the exact same pattern to other flows like posting on X or navigating a KYC portal—always: create session_id → reuse session_id for every step.
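Because the pattern is always "create session_id, then reuse it", it can be captured in one small loop. The sketch below is an illustration under assumptions: BrowseFn stands in for whatever client call you use (it is injected so the code does not depend on a specific SDK), and runFlow, StepPayload, and StepResult are hypothetical names.

```typescript
// Hypothetical shapes; `browse` stands in for whatever client call you use.
type StepPayload = { url?: string; session_id?: string; cmd: string; mode: "step" };
type StepResult = { session_id: string; status: string };
type BrowseFn = (payload: StepPayload) => Promise<StepResult>;

// Run the first command against `url`, then every later command against
// the session_id returned by the previous call. Stops at the first failure.
async function runFlow(
  browse: BrowseFn,
  url: string,
  cmds: string[]
): Promise<StepResult[]> {
  const results: StepResult[] = [];
  let sessionId: string | undefined;
  for (const cmd of cmds) {
    const payload: StepPayload = sessionId
      ? { session_id: sessionId, cmd, mode: "step" }
      : { url, cmd, mode: "step" };
    const result = await browse(payload);
    if (result.status !== "success") {
      throw new Error(`Step failed (${result.status}): ${cmd}`);
    }
    sessionId = result.session_id;
    results.push(result);
  }
  return results;
}
```

Injecting the browse function also makes the flow easy to unit test with a fake that records the payloads it receives.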
Example: multi-step X posting flow (with confirmation)
Here’s what it looks like in Node.js with the MultiOn client, modeling a “compose → edit → post → confirm” sequence.
npm install multion
import Multion from "multion";

const client = new Multion({ apiKey: process.env.MULTION_API_KEY! });

async function postOnX() {
  // 1. Start: open X and draft a post
  const start = await client.web.browse({
    url: "https://x.com",
    cmd: "Log in if needed and open the composer with a draft saying: 'Shipping our new AI agent demo today.'",
    mode: "step"
  });
  if (start.status !== "success") throw new Error("Failed to start X session");
  const sessionId = start.session_id;

  // 2. Step: update the draft text
  const edit = await client.web.browse({
    session_id: sessionId,
    cmd: "Update the draft to: 'Shipping our new browser-operating AI agents today. Full demo link in reply.'",
    mode: "step"
  });
  if (edit.status !== "success") throw new Error("Failed to edit draft");

  // 3. Step: post and confirm
  const post = await client.web.browse({
    session_id: sessionId,
    cmd: "Post the tweet, then open my profile and confirm that the latest tweet matches the updated draft.",
    mode: "step"
  });
  if (post.status !== "success") throw new Error("Failed to post tweet");

  return {
    sessionId,
    confirmation: post.step?.metadata
  };
}

postOnX().catch(console.error);
This is the practical pattern you’ll reuse for any multi-step browser automation with MultiOn.
Implementing retries and timeouts around Step mode
You don’t control network, third-party site latency, or bot protections. You do control how your app wraps the Agent API.
Think in three layers:
- Request-level timeout: how long you let a single web.browse call run.
- Retry policy: if a call fails or times out, when and how to retry.
- Session strategy: whether retries reuse the same session_id or start fresh.
1. Setting timeouts per call
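Always set a sane timeout on every call (e.g., 30–90 seconds depending on the workflow). In Node.js, a small Promise.race wrapper around the client call is enough:
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timeout after ${ms}ms`)), ms);
  });
  try {
    // Note: this stops waiting; it does not cancel the underlying request.
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer); // don't leave a pending timer once the call settles
  }
}

async function stepWithTimeout(payload: any, timeoutMs = 60000) {
  return withTimeout(client.web.browse(payload), timeoutMs);
}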
Usage:
const result = await stepWithTimeout({
  session_id: sessionId,
  cmd: "Proceed to checkout and stop at the payment details step.",
  mode: "step"
}, 60000);
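If the call overruns the timeout, you decide what to do next: retry with the same session_id, or abandon the flow. The wrapper only stops waiting, so the remote step may still finish; check the page state before repeating a non-idempotent action such as placing an order.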
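If you call the REST endpoint directly with fetch rather than through a client library, an AbortController lets you actually cancel the HTTP request when the timeout fires. The sketch below assumes Node.js 18+ (global fetch) and the endpoint and header name shown in the curl examples; browseWithAbort and its fetchImpl parameter are hypothetical names, and aborting the request on your side does not necessarily stop the remote browser.

```typescript
// Hypothetical helper: POST a step payload and abort the HTTP request
// if it takes longer than timeoutMs. fetchImpl is injectable for testing.
async function browseWithAbort(
  payload: Record<string, unknown>,
  apiKey: string,
  timeoutMs: number,
  fetchImpl: typeof fetch = fetch
): Promise<unknown> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl("https://api.multion.ai/v1/web/browse", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X_MULTION_API_KEY": apiKey, // header name as used in the curl examples
      },
      body: JSON.stringify(payload),
      signal: controller.signal,
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return await res.json();
  } finally {
    clearTimeout(timer); // always release the timer
  }
}
```

A timed-out call rejects with an abort error, which your retry wrapper can treat like any other transient failure.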
2. Retry strategy: when to retry vs. fail fast
Some failures are worth retrying; some should bubble up:
- Transient (safe to retry):
  - Network issues
  - Temporary 5xx from MultiOn or the target site
  - Browser session hiccups where the session is still valid
- Permanent or critical (don’t auto-retry blindly):
  - Authentication failure (bad credentials)
  - Bot protection blocks that require human input
  - Billing issues (e.g., MultiOn responds with 402 Payment Required)
Design your wrapper to examine:
- HTTP status codes (5xx vs 4xx, with 402 handled as its own case).
- Response status field (e.g., "success", "error").
- Any error message/body MultiOn returns.
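One way to keep these checks in a single place is a small pure classifier. This is a sketch, not MultiOn's documented error model: classifyFailure is a hypothetical name, the message patterns are placeholders to adapt to the errors you actually observe, and treating 408 and 429 as retryable follows general HTTP semantics rather than anything specific to this API.

```typescript
type Decision = "retry" | "abort";

// Hypothetical classifier: maps an HTTP status (undefined for network
// errors or client-side timeouts) and an error message to a decision.
function classifyFailure(httpStatus: number | undefined, message = ""): Decision {
  // Billing problems need a human, not a retry.
  if (httpStatus === 402 || /payment required/i.test(message)) return "abort";
  // No HTTP status usually means a network error or a local timeout.
  if (httpStatus === undefined) return "retry";
  // Request timeout and rate limiting are retryable by HTTP convention.
  if (httpStatus === 408 || httpStatus === 429) return "retry";
  // Server-side errors are usually transient.
  if (httpStatus >= 500) return "retry";
  // Remaining 4xx (bad credentials, bad request) will not fix themselves.
  if (httpStatus >= 400) return "abort";
  // 2xx with a non-success body: placeholder patterns for hard stops.
  return /captcha|verification required/i.test(message) ? "abort" : "retry";
}
```

Your retry wrapper can then call classifyFailure once per failed attempt and only back off when the answer is "retry".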
A simple exponential backoff example:
type BrowsePayload = {
  url?: string;
  session_id?: string;
  cmd: string;
  mode: "step";
};

// Marks failures that must never be retried (e.g., billing).
class NonRetryableError extends Error {}

async function safeStep(
  payload: BrowsePayload,
  { maxRetries = 3, baseDelayMs = 2000, timeoutMs = 60000 } = {}
) {
  // One initial attempt plus up to maxRetries retries.
  for (let attempt = 1; ; attempt++) {
    let lastMessage = "";
    try {
      const result = await withTimeout(client.web.browse(payload), timeoutMs);
      if (result.status === "success") return result;

      // Inspect error shape from MultiOn, if available
      const code = (result as any).error?.code;
      const message = (result as any).error?.message || "";

      // Fail fast on payment issues or obvious hard stops
      if (code === 402 || /Payment Required/i.test(message)) {
        throw new NonRetryableError("MultiOn billing issue (402 Payment Required). Abort.");
      }
      lastMessage = message;
    } catch (err: any) {
      // Hard stops skip the retry loop; network errors and timeouts do not
      if (err instanceof NonRetryableError) throw err;
      lastMessage = String(err?.message ?? err);
    }

    if (attempt > maxRetries) {
      throw new Error(`Step failed after ${attempt} attempts: ${lastMessage}`);
    }
    // Exponential backoff before the next attempt
    const delay = baseDelayMs * Math.pow(2, attempt - 1);
    await new Promise((r) => setTimeout(r, delay));
  }
}
You then use safeStep instead of calling client.web.browse directly:
const start = await safeStep({
  url: "https://www.amazon.com",
  cmd: "Open Amazon and search for 'noise cancelling headphones'.",
  mode: "step"
});
const sessionId = start.session_id;

const step2 = await safeStep({
  session_id: sessionId,
  cmd: "Open the first organic result product page.",
  mode: "step"
});
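It helps to know how long one step can take in the worst case before you pick these numbers. With the defaults used above (maxRetries = 3, baseDelayMs = 2000, timeoutMs = 60000), the waits between attempts are 2 s, 4 s, and 8 s, and there can be up to four calls of 60 s each. The helper names below are hypothetical; the arithmetic is the same doubling formula used in safeStep.

```typescript
// Delay before retry number `retry` (1-based), doubling each time.
function backoffDelayMs(retry: number, baseDelayMs = 2000): number {
  return baseDelayMs * Math.pow(2, retry - 1);
}

// Full list of waits for a given number of retries.
function backoffSchedule(maxRetries: number, baseDelayMs = 2000): number[] {
  return Array.from({ length: maxRetries }, (_, i) => backoffDelayMs(i + 1, baseDelayMs));
}

// Upper bound for one step: every attempt times out, plus all waits.
function worstCaseMs(maxRetries: number, baseDelayMs: number, timeoutMs: number): number {
  const waits = backoffSchedule(maxRetries, baseDelayMs).reduce((a, b) => a + b, 0);
  return (maxRetries + 1) * timeoutMs + waits;
}

console.log(backoffSchedule(3));          // [ 2000, 4000, 8000 ]
console.log(worstCaseMs(3, 2000, 60000)); // 254000
```

So a single step can block for a little over four minutes with these defaults, which matters if a user is waiting on the result.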
3. Session-aware retry logic
Not all retries should reuse the same session_id.
- Reuse session_id when:
  - You hit a transient error but the session is likely intact.
  - You’re in the middle of a multistep flow and want continuity preserved.
- Drop session_id and restart when:
  - You suspect the session is corrupted or expired.
  - The site clearly kicked you back to a login or error page.
  - Bot protection raised friction that likely requires a fresh session.
Pattern:
async function resilientFlow() {
  // Start new session
  const start = await safeStep({
    url: "https://www.amazon.com",
    cmd: "Log in if needed and open my homepage.",
    mode: "step"
  });
  let sessionId = start.session_id;

  try {
    await safeStep({
      session_id: sessionId,
      cmd: "Navigate to my wishlist and open the most recently saved item.",
      mode: "step"
    });
    const checkoutStep = await safeStep({
      session_id: sessionId,
      cmd: "Add the item to cart and proceed to the order review page.",
      mode: "step"
    });
    return { sessionId, checkoutStep };
  } catch (err) {
    // A fresh session won't fix a billing failure; surface it instead
    if (/402 Payment Required/.test(String(err))) throw err;

    // Fallback: start a fresh session once. If this also fails, the error propagates.
    const restart = await safeStep({
      url: "https://www.amazon.com",
      cmd: "Log in and go straight to my wishlist, then open the most recently saved item.",
      mode: "step"
    });
    sessionId = restart.session_id;
    const checkoutStep = await safeStep({
      session_id: sessionId,
      cmd: "Add the item to cart and proceed to the order review page.",
      mode: "step"
    });
    // Same return shape as the happy path
    return { sessionId, checkoutStep };
  }
}
This is more robust than blindly hammering the same broken session.
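The reuse-or-restart rules can also be written as an explicit decision function, so the choice is testable instead of buried in a catch block. This is a sketch under assumptions: the FailureSignals fields and the URL and message patterns are hypothetical and should be replaced with whatever your steps actually report.

```typescript
// Hypothetical signals collected after a failed step.
type FailureSignals = {
  currentUrl?: string;       // url reported by the last step, if any
  message?: string;          // error text or step description
  failuresOnSession: number; // consecutive failures on this session_id
};

type SessionDecision = "reuse" | "restart";

function decideSession(s: FailureSignals, maxFailuresPerSession = 2): SessionDecision {
  const text = `${s.currentUrl ?? ""} ${s.message ?? ""}`.toLowerCase();
  // Kicked back to login or told the session expired: state is likely gone.
  if (/\/login|signin|session expired/.test(text)) return "restart";
  // Bot-protection friction: a fresh session may get a cleaner start.
  if (/captcha|unusual traffic/.test(text)) return "restart";
  // Repeated failures on one session suggest it is corrupted.
  if (s.failuresOnSession >= maxFailuresPerSession) return "restart";
  // Otherwise assume a transient hiccup and keep continuity.
  return "reuse";
}
```

A caller would track failuresOnSession per session_id and reset it to zero whenever a fresh session is created.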
Handling long flows and explicit completion
For longer flows (e.g., full fintech KYC, multi-page forms), Step mode lets you:
- Split the flow into milestones:
  - Start session → complete part A → confirm → part B → confirm → finalize.
- Decide after each step whether to:
  - Continue, adjust the next cmd, or abort.
- Serialize state into your own DB (e.g., last step description, URL, extracted metadata).
A practical convention:
- Use a cmd that includes both the action and a clear stop condition. Example:
  "Fill in all mandatory fields on this KYC page using the provided applicant data, but do not submit yet. Stop when the form shows no validation errors."
- On the next step, you can safely say:
  "Review the completed form, then submit the application and stop when a confirmation page is visible. Extract any confirmation IDs."
If you need structured data at intermediate steps (e.g., summary of the order, item list), combine Step mode with the Retrieve-like pattern: instruct the agent to read the page and return structured JSON in metadata or a similar field.
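If you write many of these commands, a small builder keeps the action, the stop condition, and any extraction request consistent. The buildCmd helper below is hypothetical, and whether the agent returns the requested JSON in metadata or another field is an assumption you should verify against real responses.

```typescript
// Hypothetical helper: compose a cmd string from an action, optional
// prohibitions, a stop condition, and optional fields to extract.
function buildCmd(parts: {
  action: string;
  stopWhen: string;
  doNot?: string[];
  extract?: string[];
}): string {
  // Normalize the action so it ends with exactly one period.
  const sentences = [parts.action.trim().replace(/\.$/, "") + "."];
  for (const rule of parts.doNot ?? []) {
    sentences.push(`Do not ${rule}.`);
  }
  sentences.push(`Stop when ${parts.stopWhen}.`);
  if (parts.extract?.length) {
    sentences.push(`Return a JSON object with the keys: ${parts.extract.join(", ")}.`);
  }
  return sentences.join(" ");
}

const cmd = buildCmd({
  action: "Fill in all mandatory fields on this KYC page",
  doNot: ["submit the form yet"],
  stopWhen: "the form shows no validation errors",
});
console.log(cmd);
// Fill in all mandatory fields on this KYC page. Do not submit the form yet. Stop when the form shows no validation errors.
```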
Operational tips from “selector PTSD”
Coming from Playwright/Selenium, the main shift is: you’re no longer hand-writing selectors, but you are still responsible for reliability. A few habits help:
- Log every step: store session_id, cmd, status, step.description, and the url after each call. This becomes your postmortem trail.
- Guardrails in language: be explicit in cmds about what not to do (e.g., “do not confirm the order yet”, “do not change any shipping address, only read it”). Step mode respects clear intent.
- Wrap billing errors: if you see a 402 Payment Required from MultiOn, treat it as a system alert, not a user-level retry. Surface that to ops immediately.
- Use timeouts per workflow type: checkout flows can afford more time than quick X posts. Don’t use one global timeout for everything.
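The logging and per-workflow timeout habits fit in a few lines. Everything named here is hypothetical: the workflow keys, the timeout values (placeholders inside the 30–90 second range mentioned earlier), the StepLog shape, and the in-memory stepLogs array, which stands in for your database or log pipeline.

```typescript
// Hypothetical per-workflow timeouts; tune the values for your flows.
const TIMEOUT_MS = {
  social_post: 30_000,
  checkout: 90_000,
  kyc_form: 90_000,
} as const;
type Workflow = keyof typeof TIMEOUT_MS;

// One record per call, with the fields suggested in the tips above.
type StepLog = {
  at: string; // ISO timestamp
  workflow: Workflow;
  session_id?: string;
  cmd: string;
  status: string;
  description?: string;
  url?: string;
};

const stepLogs: StepLog[] = []; // swap for your DB or log pipeline

function logStep(
  workflow: Workflow,
  cmd: string,
  result: { session_id?: string; status: string; step?: { description?: string; url?: string } }
): StepLog {
  const entry: StepLog = {
    at: new Date().toISOString(),
    workflow,
    session_id: result.session_id,
    cmd,
    status: result.status,
    description: result.step?.description,
    url: result.step?.url,
  };
  stepLogs.push(entry);
  return entry;
}
```

Calling logStep after every browse call, including failed ones, gives you the full trail for a session when something goes wrong.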
Putting it all together: a reference flow
Conceptually, any MultiOn multi-step flow using Step mode looks like this:
- Start
  - Call POST https://api.multion.ai/v1/web/browse with url, cmd, and mode: "step".
  - Capture session_id.
- Step
  - Call the same endpoint with session_id, a new cmd, and mode: "step".
  - Handle result (success/error), log state, update your own workflow state machine.
- Step (repeat as needed)
  - Reuse session_id for each subsequent step.
  - Adjust cmd based on prior outputs.
- Retries + timeouts
  - Wrap each call with:
    - Request timeout.
    - Exponential backoff retry for transient errors.
    - Branch logic for hard failures (auth, payment, bot protection).
- Complete
  - When the flow is done (order placed, tweet posted, form submitted), persist whatever structured output you need (confirmation IDs, URLs, JSON summaries) and discard the session_id.
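The "workflow state machine" mentioned in the Step stage can be very small. The sketch below is one possible shape, with hypothetical state and event names; it only tracks whether a flow is idle, running, completed, or failed, and leaves retries to the wrapper around each call.

```typescript
type FlowState = "idle" | "running" | "completed" | "failed";

type FlowEvent =
  | { type: "started" }
  | { type: "step_succeeded"; isLast: boolean }
  | { type: "step_failed"; retryable: boolean };

// Pure transition function: given the current state and an event,
// return the next state. Events that don't apply leave the state as-is.
function nextState(state: FlowState, event: FlowEvent): FlowState {
  switch (event.type) {
    case "started":
      return state === "idle" ? "running" : state;
    case "step_succeeded":
      if (state !== "running") return state;
      return event.isLast ? "completed" : "running";
    case "step_failed":
      if (state !== "running") return state;
      // Retryable failures keep the flow running so the wrapper can retry.
      return event.retryable ? "running" : "failed";
  }
}
```

Persisting the state alongside the step log makes it possible to resume or audit a flow after a process restart.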
Once you’ve implemented that wrapper once, you can plug in any high-level task—Amazon ordering, posting on X, H&M catalog navigation—and get reliable multi-step browser automation without maintaining a test farm or a selector graveyard.
Final Verdict
Use MultiOn’s Step mode whenever your workflow crosses more than one page or state: login, multi-step checkout, social posting, or any gate-heavy web UI. The pattern is simple: create a session_id, reuse it for each cmd, and wrap every call with explicit timeouts and retry logic that understands when to reuse a session and when to start over. That gives you the flexibility of a real browser, the safety of secure remote sessions with native proxy support for tricky bot protection, and the reliability you’d expect from a production automation stack—without rewriting brittle scripts every time the UI changes.