AgentQL vs ScrapeGraphAI: which one keeps the same JSON output shape when sites change?


Most teams comparing AgentQL vs ScrapeGraphAI are really asking one thing: when the site’s layout changes, which tool still returns the same JSON shape my code depends on? In other words, which one behaves more like a stable API contract instead of a fragile scraping script?

Quick Answer: AgentQL is explicitly designed to keep the same JSON output shape even as page structures and UIs change, using AI-driven “self-healing” selectors and schema-first queries. ScrapeGraphAI can extract structured data, but you’ll typically need to adjust prompts, XPaths, or pipeline logic as layouts evolve if you want a stable, reusable JSON schema over time.

Why This Matters

If your downstream systems (data pipelines, dashboards, LLM tools, or agents) expect a specific JSON schema, any break in that schema becomes a production incident: jobs fail, fields go missing, and your LLM grounding or analytics silently degrade. When you’re crawling tens or hundreds of domains, “fixing selectors” quickly becomes most of the job.

Choosing a tool that keeps the JSON output shape consistent turns web extraction into something closer to an API integration: define your schema once, harden it, and reuse it safely across similar pages—even as front‑end teams ship new designs.

Key Benefits:

  • Fewer breaking changes: Self-healing element location is designed to keep your JSON fields populated even when the DOM changes.
  • Schema-first development: You design the output shape once and plug it into your code, instead of constantly patching parsers.
  • LLM-grounding friendly: Consistent JSON hands your LLM tools/agents clean, named fields instead of raw HTML, which leaves less room for hallucinated values.

Core Concepts & Key Points

Concept | Definition | Why it's important
Schema-stable JSON | A JSON structure whose keys and nesting remain the same across runs and UI changes. | Lets you treat web extraction like an API contract; downstream code doesn’t break every time the site tweaks its layout.
Self-healing selectors | AgentQL’s AI-driven element targeting that adapts to DOM/CSS changes without you updating XPaths or CSS selectors. | Keeps the same fields populating in your JSON even when the underlying HTML changes.
Schema-first queries | Defining the desired JSON shape up front (via AgentQL queries) rather than scraping HTML and parsing afterwards. | Ensures consistent field names and structure, simplifies integration with LLMs and ETL jobs, and reduces parsing glue code.

How It Works (Step-by-Step)

At a high level, both AgentQL and ScrapeGraphAI help you get structured data from web pages, but they make very different tradeoffs around schema stability.

1. Define the JSON shape (AgentQL)

With AgentQL, you define the structure you want as a query—think of it as a schema expressed in a compact query language:

# Example AgentQL query for product listing pages
{
  products[] {
    product_name
    product_price(include currency symbol)
    product_rating
    product_url
  }
}

AgentQL uses this query as the contract: it will try to fill product_name, product_price, product_rating, and product_url on any similar product page or category page you point it at.

Typical JSON output:

{
  "products": [
    {
      "product_name": "Noise-cancelling headphones",
      "product_price": "$199.99",
      "product_rating": "4.5",
      "product_url": "https://example.com/product/123"
    },
    {
      "product_name": "Wireless earbuds",
      "product_price": "$89.00",
      "product_rating": "4.2",
      "product_url": "https://example.com/product/456"
    }
  ]
}

Notice: the keys are exactly what you wrote in the query. When the page changes, AgentQL’s job is to keep filling this structure, not to invent a new one.
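To make "same shape" concrete, here is a minimal sketch in plain Python. It reduces a JSON document to its keys, nesting, and value types, then compares two runs. The helper name shape_of and the sample payloads are illustrative assumptions, not part of AgentQL.

```python
import json


def shape_of(value):
    """Describe a JSON value's structure (keys, nesting, types) without its data."""
    if isinstance(value, dict):
        return {key: shape_of(child) for key, child in value.items()}
    if isinstance(value, list):
        # Keep one entry per distinct item shape, so list length does not matter.
        shapes = []
        for item in value:
            item_shape = shape_of(item)
            if item_shape not in shapes:
                shapes.append(item_shape)
        return shapes
    return type(value).__name__


# Hypothetical outputs: two runs of the same query, plus one drifted result.
before_redesign = json.loads(
    '{"products": [{"product_name": "Wireless earbuds", "product_price": "$89.00"}]}'
)
after_redesign = json.loads(
    '{"products": [{"product_name": "Wireless earbuds (2nd gen)", "product_price": "$79.00"},'
    ' {"product_name": "Noise-cancelling headphones", "product_price": "$199.99"}]}'
)
drifted = json.loads('{"items": [{"name": "Wireless earbuds", "cost": "$89.00"}]}')

same_contract = shape_of(before_redesign) == shape_of(after_redesign)
broken_contract = shape_of(before_redesign) != shape_of(drifted)
print(same_contract, broken_contract)  # prints: True True
```

Because the signature includes value types, a price that switches from a string to a number also counts as a shape change, which is usually what downstream code cares about.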

2. Let AI handle the selectors (AgentQL’s self-healing)

Instead of you writing XPath/CSS selectors, AgentQL:

  • Analyzes the page structure with AI.
  • Locates the elements that best match your query fields.
  • Adapts when classes, containers, or order change.

You don’t hard-code "div.product-card span.title"; you just say product_name, and AgentQL figures it out at runtime. When the frontend team ships a redesign, the query stays the same, and AgentQL’s selectors “self-heal” to keep the JSON contract alive.

You run this via:

  • JavaScript SDK (Playwright-based)

    npm install agentql
    agentql init
    
  • Python SDK (Playwright-based)

    pip3 install agentql
    agentql init
    python example.py
    

Or use the REST API, where you send a URL and a query and get JSON back, with no browser code to write.
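As a rough sketch of the "URL + query in, JSON out" flow, the snippet below builds such a request with only the Python standard library and does not send it. The endpoint URL, the X-API-Key header name, and the body field names are assumptions here; confirm all of them against the AgentQL REST API reference before relying on them.

```python
import json
import urllib.request

# Assumed endpoint; verify in the AgentQL REST API reference.
AGENTQL_ENDPOINT = "https://api.agentql.com/v1/query-data"

PRODUCT_QUERY = """
{
  products[] {
    product_name
    product_price(include currency symbol)
    product_rating
    product_url
  }
}
"""


def build_request(url, query, api_key):
    """Build (but do not send) a POST request carrying the page URL and the query."""
    body = json.dumps({"url": url, "query": query}).encode("utf-8")
    return urllib.request.Request(
        AGENTQL_ENDPOINT,
        data=body,
        # Assumed header name for the API key.
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    "https://example.com/search?q=headphones", PRODUCT_QUERY, "YOUR_API_KEY"
)
# To actually send it: urllib.request.urlopen(request), then json.load() the response.
```

Keeping the query in a module-level constant makes it easy to review and reuse across URLs, since the query is the part that acts as the contract.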

3. Test and refine with tight feedback loops

AgentQL gives you a query debugger (IDE browser extension + Playground) so you can:

  • Navigate to any page (e.g., Google, GitHub, Twitter, Google Developers—AgentQL works across these and more).
  • Write or tweak your AgentQL query live.
  • Immediately see the JSON output it produces.
  • Adjust field names or structure until your schema matches what your downstream systems expect.

You iterate on the query, not on low-level selectors. Once the query is stable, you check it into your codebase and reuse it across URLs.

4. Where ScrapeGraphAI fits, and what it means for schema stability

ScrapeGraphAI is positioned as an LLM-powered scraping framework: it orchestrates crawling, parsing, and extraction using graphs and LLMs. It can definitely output structured JSON, but:

  • It doesn’t focus on a dedicated query language that defines a strict schema up front.
  • Selector robustness is usually still your responsibility (XPaths, CSS selectors, or prompt adjustments).
  • When pages change, you’re likely updating:
    • Scraper configs (selectors, HTML paths), or
    • The prompt/graph logic that tells the LLM what to extract.

That means the JSON shape tends to be more prompt-dependent and page-dependent, which can drift over time. If an LLM decides to rename a field ("price" → "cost") or include extra nesting, your downstream code sees a different JSON contract.
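A small guardrail can surface this kind of rename before it reaches downstream code. The sketch below compares each record's keys against the expected field set; the function name field_drift and the sample record are hypothetical, and the check works on output from any extraction tool.

```python
EXPECTED_FIELDS = {"product_name", "product_price", "product_rating", "product_url"}


def field_drift(record, expected=EXPECTED_FIELDS):
    """Report which expected keys are absent and which unknown keys appeared."""
    actual = set(record)
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }


# Hypothetical record where the extractor renamed product_price to cost.
drifted_record = {
    "product_name": "Wireless earbuds",
    "cost": "$89.00",
    "product_rating": "4.2",
    "product_url": "https://example.com/product/456",
}

report = field_drift(drifted_record)
print(report)  # prints: {'missing': ['product_price'], 'unexpected': ['cost']}
```

Logging this report per domain gives an early signal of which sources are drifting, without yet failing the job.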

In practice:

  • AgentQL: schema-first, query-defined JSON, self-healing selectors aimed at keeping the JSON shape stable across UI changes.
  • ScrapeGraphAI: LLM-oriented scraping workflows where schema consistency is achievable, but usually requires more manual guardrails and ongoing maintenance as sites evolve.

Common Mistakes to Avoid

  • Treating HTML as the contract:
    Relying on specific DOM paths (//div[3]/span[2]) guarantees breakage when the UI shifts. Use a schema-first query (AgentQL) and let AI handle the brittle selector details.

  • Letting the LLM invent the schema every run:
    If your extraction logic allows the LLM to freely decide field names and structure, you’ll see drift. Constrain the schema via an explicit query (AgentQL) or rigid validation; don’t rely on “extract everything you find.”
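One way to implement the "rigid validation" option is to parse every record into a fixed type and fail fast on any mismatch. This is a minimal sketch using standard-library dataclasses; the Product class and parse_products function are illustrative names, and a fuller setup might also check value types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    product_name: str
    product_price: str
    product_rating: str
    product_url: str


def parse_products(payload):
    """Convert extraction output into Product records, rejecting any schema drift."""
    try:
        # Product(**item) raises TypeError on missing or unexpected keys.
        return [Product(**item) for item in payload["products"]]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"extraction output does not match the schema: {exc}") from exc


good_payload = {
    "products": [
        {
            "product_name": "Noise-cancelling headphones",
            "product_price": "$199.99",
            "product_rating": "4.5",
            "product_url": "https://example.com/product/123",
        }
    ]
}
products = parse_products(good_payload)
print(products[0].product_price)  # prints: $199.99
```

Raising instead of silently dropping unknown fields is a deliberate choice: a loud failure at the extraction boundary is easier to debug than a missing column three steps later.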

Real-World Example

Imagine you have a marketplace intelligence pipeline that scrapes product search results from multiple retailers. Your downstream model expects:

{
  "products": [
    {
      "product_name": "...",
      "product_price": "...",
      "product_rating": "...",
      "product_url": "..."
    }
  ]
}

You start with two domains. Over a year, you add ten more, and every one of them redesigns their product listing page at least once.

  • With fragile selectors or loosely defined LLM extraction (typical ScrapeGraphAI-style setups), each redesign means:
    • Updating XPaths/CSS selectors or prompt instructions.
    • Re-running tests to see if the LLM still outputs the same keys.
    • Fixing downstream breakages when "product_price" disappears or moves.
  • With AgentQL, you:
    • Keep using the same query across all those pages.
    • Let the self-healing selector layer adapt to DOM changes.
    • See consistent JSON keys (product_name, product_price, etc.) even as page layouts change.

Your team spends time refining one schema and one query instead of chasing dozens of tiny selector fixes.

Pro Tip: When you adopt AgentQL, treat your queries like API schemas: version them, review changes, and wire them into CI. If a query’s JSON shape changes, catch it in tests instead of letting it break your ETL or LLM tools in production.
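One possible way to wire that into CI is to pin a fingerprint of each query, so any edit that changes the contract needs an explicit review. The sketch below is an assumption about workflow, not an AgentQL feature; the names query_fingerprint and QUERY_V1 are hypothetical.

```python
import hashlib


def query_fingerprint(query):
    """Hash a query after collapsing whitespace, so pure reformatting is ignored."""
    normalized = " ".join(query.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


QUERY_V1 = """
{
  products[] {
    product_name
    product_price(include currency symbol)
  }
}
"""

# In a real repository this digest would be a literal committed next to the query.
PINNED_FINGERPRINT = query_fingerprint(QUERY_V1)

reindented = "{ products[] { product_name product_price(include currency symbol) } }"
with_new_field = QUERY_V1.replace("product_name", "product_name\n    product_rating")

# Reformatting passes the check; adding a field changes the contract and fails it.
reformat_ok = query_fingerprint(reindented) == PINNED_FINGERPRINT
change_detected = query_fingerprint(with_new_field) != PINNED_FINGERPRINT
print(reformat_ok, change_detected)  # prints: True True
```

A fingerprint check only tells you that the query text changed. Pair it with a test that validates real output against the expected keys to cover the extraction side as well.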

Summary

If your priority is keeping the same JSON output shape when sites change, AgentQL is built for that job:

  • You define the schema in an AgentQL query.
  • AgentQL uses AI to analyze page structure and find data—no fragile XPath/CSS selectors.
  • Its self-healing behavior aims for consistent results despite dynamic content and page changes, so your JSON keys and nesting stay stable.

ScrapeGraphAI can be a powerful framework for LLM-driven scraping workflows, but you’ll typically shoulder more work to enforce a stable schema as pages and prompts evolve.

If you want web extraction to feel like a reliable API—query in, JSON out, same shape every time—AgentQL’s schema-first, self-healing design aligns directly with that requirement.
