How do I use MultiOn Retrieve to extract a JSON array of objects from a JS-heavy page (renderJs + scrollToBottom + maxItems)?


JS-heavy pages are where traditional scrapers go to die: infinite scroll, delayed rendering, and content that only appears after a few client-side events. MultiOn’s Retrieve API is designed to handle exactly this pattern: you send a url, tell the agent to render JavaScript, scroll the page to completion, and return a JSON array of objects that matches your schema.

Below is a practical, implementation-focused guide on how to use MultiOn Retrieve with renderJs, scrollToBottom, and maxItems to reliably extract a JSON array of objects from dynamic pages.


What MultiOn Retrieve actually does

Retrieve is MultiOn’s “intent → structured JSON” surface.

  • You pass:
    • A url
    • A description of what you want (fields, structure)
    • Optional controls like renderJs, scrollToBottom, and maxItems
  • MultiOn runs a real browser in a secure remote session, loads the page, executes JS, scrolls as configured, and returns:
    • A JSON array of objects, shaped to your requested schema

This is not a selector-based scraper. You don’t maintain CSS/XPath. You declare the output, and MultiOn navigates the DOM in a real browser context.


When to use renderJs, scrollToBottom, and maxItems

For JS-heavy pages, these three parameters are your main levers:

  • renderJs:
    Tell MultiOn to fully execute client-side JavaScript. Use this when:

    • Content is rendered by React/Vue/Next.js
    • Data appears after initial load
    • Static HTML is mostly placeholders or skeletons
  • scrollToBottom:
    Instructs the agent to scroll down and trigger lazy loading. Use this when:

    • The page loads more results on scroll
    • Product lists, feeds, or catalogs extend as you move down
  • maxItems:
    Limits how many objects you want in the returned JSON array. Use this when:

    • You only need a sample (e.g., first 50 products)
    • You want predictable payload sizes
    • You’re building a paginated or incremental ingestion pipeline

Together, they give you a stable pattern:

“Render the JS, scroll until the list stabilizes (or hits a reasonable depth), then give me a JSON array of up to N items, each with fields A/B/C.”


Minimal Retrieve call shape

The exact endpoint may evolve, but conceptually, a Retrieve call looks like this:

POST https://api.multion.ai/v1/retrieve
X_MULTION_API_KEY: YOUR_API_KEY
Content-Type: application/json

Body (conceptual):

{
  "url": "https://example.com/js-heavy-list",
  "renderJs": true,
  "scrollToBottom": true,
  "maxItems": 50,
  "schema": {
    "description": "Extract a JSON array of objects representing items on the page.",
    "fields": {
      "title": "string",
      "price": "string",
      "url": "string",
      "image": "string"
    }
  }
}

Response (conceptual):

{
  "items": [
    {
      "title": "Item 1",
      "price": "$19.99",
      "url": "https://example.com/item-1",
      "image": "https://example.com/item-1.jpg"
    },
    {
      "title": "Item 2",
      "price": "$24.99",
      "url": "https://example.com/item-2",
      "image": "https://example.com/item-2.jpg"
    }
  ]
}

The important piece for your application is that items is a JSON array of objects. You integrate that directly into your pipeline without worrying about selectors.


Step-by-step: Extracting a product list from a JS-heavy page

Let’s walk through a concrete flow, using an H&M-style catalog as a representative example of ecommerce and similar JS-driven lists.

1. Define the JSON shape you need

Start from your downstream use case. For a product catalog, you might want:

  • name
  • price
  • productUrl
  • imageUrl
  • colors (array of strings)
  • inStock (boolean)

Schema description for Retrieve:

"schema": {
  "description": "Return a JSON array of product objects from this page.",
  "fields": {
    "name": "string - product name as shown on the page",
    "price": "string - price exactly as displayed, including currency symbol",
    "productUrl": "string - absolute URL to the product detail page",
    "imageUrl": "string - URL of the main product image",
    "colors": "string[] - list of color names if visible on this page",
    "inStock": "boolean - true if item appears available, false if sold out or unavailable"
  }
}

More explicit descriptions mean fewer ambiguities and more stable output.
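If you consume this schema from TypeScript, it can help to mirror it with an application-side type and a runtime guard. This is a sketch: the Product name and isProduct guard are illustrative application code, not part of the MultiOn API.

```typescript
// Application-side type mirroring the Retrieve schema above.
interface Product {
  name: string;
  price: string;
  productUrl: string;
  imageUrl: string;
  colors: string[];
  inStock: boolean;
}

// Runtime guard: Retrieve output is agent-shaped, so verify before ingesting.
function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    typeof v.price === "string" &&
    typeof v.productUrl === "string" &&
    typeof v.imageUrl === "string" &&
    Array.isArray(v.colors) &&
    v.colors.every((c) => typeof c === "string") &&
    typeof v.inStock === "boolean"
  );
}
```

Filtering the returned array through a guard like this keeps malformed objects out of your pipeline without any selector maintenance.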

2. Set renderJs for JS-heavy pages

For React/Vue/Next.js catalogs, always set:

"renderJs": true

This ensures the agent runs:

  • Initial JS bundles
  • Client-side routing
  • Data-fetching hooks

Without renderJs: true, you’ll often see empty lists or placeholders because the content never “hydrated.”

3. Use scrollToBottom to trigger lazy loading

If the page loads more products on scroll, set:

"scrollToBottom": true

This instructs the agent to:

  • Scroll down the page
  • Wait for new content to load
  • Continue until the bottom (or until a stability threshold is reached)

On most infinite-scroll designs, this will collect the full list that a user could see in a single scroll session.

4. Control volume with maxItems

You rarely want “everything forever” in one call, especially on massive catalogs. Use:

"maxItems": 100

This tells Retrieve: “Extract up to 100 product objects and stop.”

Benefits:

  • Predictable response size
  • Faster response times
  • Easier to batch or paginate your ingestion
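When you need more than one call’s worth of data, a small helper can plan how many items to request per call. A sketch, assuming you cap each call at maxItems and stop when the target is reached (batchSizes is a hypothetical name):

```typescript
// Split a total target count into per-call maxItems batches.
// e.g. a 230-item target with maxItems 100 becomes calls of 100, 100, 30.
function batchSizes(total: number, maxItems: number): number[] {
  const sizes: number[] = [];
  for (let remaining = total; remaining > 0; remaining -= maxItems) {
    sizes.push(Math.min(maxItems, remaining));
  }
  return sizes;
}

// batchSizes(230, 100) → [100, 100, 30]
```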

5. Full example: JS-heavy catalog Retrieve call

curl -X POST https://api.multion.ai/v1/retrieve \
  -H "X_MULTION_API_KEY: $MULTION_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.hm.com/us/category/ladies-new-arrivals",
    "renderJs": true,
    "scrollToBottom": true,
    "maxItems": 100,
    "schema": {
      "description": "Return a JSON array of product objects for women'\''s new arrivals.",
      "fields": {
        "name": "string - product name",
        "price": "string - price as displayed",
        "productUrl": "string - absolute URL to product page",
        "imageUrl": "string - main product image URL",
        "colors": "string[] - color options if shown",
        "inStock": "boolean - true if item appears orderable"
      }
    }
  }'

Expected response shape:

{
  "items": [
    {
      "name": "Rib-knit sweater",
      "price": "$34.99",
      "productUrl": "https://www.hm.com/us/productpage.1234567890.html",
      "imageUrl": "https://image.hm.com/assets/123/456.jpg",
      "colors": ["Beige", "Black"],
      "inStock": true
    },
    {
      "name": "Wide-leg trousers",
      "price": "$49.99",
      "productUrl": "https://www.hm.com/us/productpage.0987654321.html",
      "imageUrl": "https://image.hm.com/assets/789/012.jpg",
      "colors": ["Navy"],
      "inStock": true
    }
  ]
}

You now have a clean JSON array of objects ready for ingestion.


Using Retrieve from Node with the MultiOn SDK

For app integration, wire this into your backend or worker environment.

npm install multion

Example Node code:

import MultiOn from "multion";

const client = new MultiOn({ apiKey: process.env.MULTION_API_KEY! });

async function fetchCatalog() {
  const response = await client.retrieve({
    url: "https://www.hm.com/us/category/ladies-new-arrivals",
    renderJs: true,
    scrollToBottom: true,
    maxItems: 100,
    schema: {
      description: "JSON array of product objects from H&M catalog.",
      fields: {
        name: "string",
        price: "string",
        productUrl: "string",
        imageUrl: "string",
        colors: "string[]",
        inStock: "boolean"
      }
    }
  });

  const products = response.items; // JSON array of objects
  console.log(`Retrieved ${products.length} products`);
  return products;
}

fetchCatalog().catch(console.error);

This pattern gives you a repeatable, testable contract: a plain JS array of product objects.


Handling common JS-heavy edge cases

1. Content behind modals or banners

Some sites show cookie banners, age gates, or region selectors before content is usable.

Approach:

  • Clarify in your schema description that the agent should “dismiss popups or banners that block content.”
  • If needed, structure the instruction more explicitly (e.g., “Close any cookie or consent dialogs that prevent scrolling or product visibility.”).

The Retrieve agent operates in a real browser, so these are first-class interactions, not hacks.

2. Slow-loading sections

JS-heavy pages sometimes have delayed or staggered loads.

Recommendations:

  • Expect a slightly higher latency than static pages; you’re paying for real JS execution and scroll.
  • If content is still missing:
    • Re-check that renderJs is set to true.
    • Tighten your description so the agent knows which section to focus on (e.g., “main product grid” vs generic “products”).

3. Very large lists

If a single page shows hundreds or thousands of items:

  • Use maxItems to cap at a safe upper bound (e.g., 100–200 per call).
  • Combine with your own pagination strategy (e.g., multiple category URLs, query parameters, or filters) rather than trying to drain the entire catalog in one shot.
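One way to sketch that pagination strategy is to generate per-page URLs up front and issue one Retrieve call per URL. This assumes the target site accepts a page query parameter, which you should verify per site; pagedUrls is a hypothetical helper:

```typescript
// Build paginated category URLs for incremental ingestion,
// assuming the site supports a `page` query parameter.
function pagedUrls(baseUrl: string, pages: number): string[] {
  return Array.from({ length: pages }, (_, i) => {
    const url = new URL(baseUrl);
    url.searchParams.set("page", String(i + 1));
    return url.toString();
  });
}
```

Each generated URL then becomes its own Retrieve call with a modest maxItems, which keeps individual responses small and retryable.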


Comparing Retrieve to traditional Playwright/Selenium scraping

As someone who spent years maintaining Playwright/Selenium stacks:

  • With Playwright/Selenium, you:
    • Manage selectors for every page variant
    • Handle login flows, cookie banners, and infinite scroll manually
    • Maintain infra: sessions, proxies, bot protection, retries
  • With MultiOn Retrieve, you:
    • Describe the output JSON array of objects
    • Flip renderJs, scrollToBottom, and maxItems depending on page behavior
    • Let MultiOn’s secure remote sessions and native proxy support handle the operational side

The cost you’re optimizing for is engineering time and long-term brittleness, not just “can I get the data once.”


Recommended defaults for JS-heavy pages

If you’re building a general-purpose ingestion pipeline for modern sites, these are reliable defaults:

  • renderJs: true
    Assume React/Vue/SPA infrastructure.

  • scrollToBottom: true
    Assume lazy-loading or infinite-scroll lists.

  • maxItems: 50–200
    Start with 50 in development, increase as you validate timing and payloads.

  • Schema description:

    • Be explicit about:
      • The type of entity (product, post, listing)
      • The fields you need, including type and semantics
      • That the output must be a JSON array of objects

Example schema language:

“Return a JSON array of objects named items. Each object represents a single product in the main product grid.”

This helps the agent converge on a predictable top-level array.


Error handling and reliability signals

When you integrate Retrieve into a production system, you should treat it like any other critical API:

  • Check HTTP status codes:
    • 2xx: success
    • 4xx/5xx: handle, log, and retry as appropriate
    • Watch for 402 Payment Required in error flows; it’s an explicit signal that billing limits or quotas may be involved.
  • Validate the response:
    • Confirm items exists and is an array
    • Schema-check a few fields for type correctness
  • Instrument:
    • Log URLs, renderJs, scrollToBottom, maxItems settings
    • Track items.length to spot anomalies (e.g., sudden drop to 0 on a previously healthy page)
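A minimal sketch of that validation step, run on the parsed response body before anything enters your pipeline (validateRetrieveResponse is an illustrative name, not an SDK function):

```typescript
// Confirm the response carries a top-level items array before ingesting it.
function validateRetrieveResponse(body: unknown): unknown[] {
  if (typeof body !== "object" || body === null) {
    throw new Error("Retrieve response is not an object");
  }
  const items = (body as { items?: unknown }).items;
  if (!Array.isArray(items)) {
    throw new Error("Retrieve response missing items array");
  }
  return items;
}
```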

This gives you observability that’s comparable to a well-built Playwright/Selenium pipeline, without the infrastructure overhead.
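For the retry side, a simple exponential-backoff wrapper around any Retrieve invocation is usually enough. A sketch; in production you would inspect the status code first and skip retries for non-transient 4xx errors such as 402:

```typescript
// Retry a call with exponential backoff on failure.
async function withRetry<T>(
  call: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      // Back off: 500 ms, 1000 ms, 2000 ms, ...
      await new Promise((resolve) =>
        setTimeout(resolve, baseDelayMs * 2 ** attempt)
      );
    }
  }
  throw lastError;
}
```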


Putting it together: A simple decision checklist

When you want to extract a JSON array of objects from a JS-heavy page using MultiOn Retrieve:

  1. Is the page JS-heavy or SPA-based?

    • Yes → set renderJs: true.
  2. Does content load on scroll?

    • Yes → set scrollToBottom: true.
  3. How many items do you need per call?

    • Set maxItems to a safe upper bound (start with 50–100).
  4. What does each object represent?

    • Define a schema with clear field names, types, and semantics.
  5. What does your downstream system expect?

    • Ensure the agent returns a top-level JSON array of objects (e.g., items), and wire that into your ingestion or processing pipeline.

If you follow this pattern, you get exactly what you wanted Playwright/Selenium to give you all along: reliable, structured JSON arrays of objects from real, JS-heavy pages—without owning the browser farm yourself.


Next Step

Get Started