How do I build a web-browsing LLM agent that can reliably find buttons/fields and extract data?

Most teams discover the hard way that “let the LLM read the HTML and figure it out” doesn’t scale. Agents get lost in reams of markup, miss the right button, or hallucinate fields that aren’t there—and the whole thing breaks the moment a product manager ships a UI tweak. If you want a web-browsing LLM agent that reliably finds buttons/fields and extracts data, you have to treat the web like an API: define the output schema first, make element location resilient, and give the agent a predictable contract.

Quick Answer: To build a reliable web-browsing LLM agent, stop grounding it on raw HTML and brittle selectors. Instead, use a schema-first approach where the agent issues a structured query (via tools like AgentQL) that turns any page into predictable JSON, and use robust, AI-powered element targeting (Playwright + AgentQL) to click buttons, fill fields, and extract data consistently across layout changes.

Why This Matters

If your agent can’t consistently click “Submit” or extract a price, it’s not production-ready; it’s a demo. Raw HTML dumps blow up context windows and increase hallucinations, while fragile XPath/CSS selectors force engineers into endless “DOM whack-a-mole” every time the page shifts. A resilient, schema-first browsing layer turns flaky web automation into a dependable part of your data and automation stack, so you can scale agents across sites without rewriting scrapers every week.

Key Benefits:

Reliability across UI changes: AI-powered selectors analyze page structure instead of relying on hard-coded XPaths, so your agent keeps working when layouts move.
Clean, structured outputs: Define the shape of the data you want (query → JSON), making downstream LLM grounding and pipelines far easier to maintain.
Faster development and debugging: Use SDKs and a browser extension to iterate on queries in real time, rather than chasing DOM diffs and refactoring selectors.

Core Concepts & Key Points

Concept	Definition	Why it's important
Schema-first extraction	Designing your agent around “query → JSON” contracts instead of scraping arbitrary HTML.	Gives your LLM a predictable, compact representation of the page so it can reason and act without drowning in markup.
AI-powered selectors	Using tools like AgentQL to let AI analyze page structure and locate elements (buttons, fields, tables) instead of brittle XPath/CSS.	Reduces breakage when pages change and frees you from constantly updating DOM selectors.
Tool-based browsing loop	LLM calls tools (e.g., “navigate”, “query page”, “click element”, “fill form”) that are implemented with Playwright + AgentQL.	Makes the agent’s behavior transparent, testable, and easy to evolve—each tool has clear inputs/outputs.

How It Works (Step-by-Step)

At a high level, a robust web-browsing LLM agent looks like this:

The LLM plans what it needs (“find all products under $50 on this page”).
It calls a browsing tool that uses AgentQL to turn the page into structured JSON or locate interactive elements.
It reasons over the JSON, decides the next action (click, fill, navigate), and repeats until the goal is reached.

Below is the step-by-step breakdown.

1. Make the web “AI-ready”: query → JSON

Instead of asking your LLM to read the DOM, let your browsing layer convert any URL into JSON that matches a schema you control.

With AgentQL, you define the shape of the data you want:

{
  products[] {
    product_name
    product_price(include currency symbol)
    product_url
  }
}

Run this via the AgentQL REST API (no local browser required) or a Playwright-based SDK, and you get clean JSON:

{
  "products": [
    {
      "product_name": "Noise-Cancelling Headphones",
      "product_price": "$129.99",
      "product_url": "https://example.com/product/noise-cancelling-headphones"
    },
    {
      "product_name": "Wireless Earbuds",
      "product_price": "$59.99",
      "product_url": "https://example.com/product/wireless-earbuds"
    }
  ]
}

This is what your LLM grounds on—not 200 KB of HTML.
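Before the JSON reaches the model or a downstream pipeline, it helps to check it against the contract you defined. The sketch below is one minimal way to do that in Python. The field names come from the query above; the `Product` record, the `parse_products` helper, and the price format it accepts (a currency symbol followed by a number) are illustrative assumptions, not part of AgentQL.

```python
import re
from dataclasses import dataclass
from decimal import Decimal

# Field names mirror the query above; everything else is illustrative.
REQUIRED_FIELDS = ("product_name", "product_price", "product_url")
# Assumed price format: a currency symbol followed by a number, e.g. "$129.99".
PRICE_PATTERN = re.compile(r"^\s*([^\d\s.,-]+)\s*([\d,]+(?:\.\d+)?)\s*$")


@dataclass(frozen=True)
class Product:
    name: str
    price: Decimal
    currency: str
    url: str


def parse_products(payload: dict) -> list:
    """Turn extracted JSON into typed records, failing loudly on contract violations."""
    items = payload.get("products")
    if not isinstance(items, list):
        raise ValueError("expected a 'products' list")

    products = []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            raise ValueError(f"products[{index}] is not an object")
        missing = [name for name in REQUIRED_FIELDS if not item.get(name)]
        if missing:
            raise ValueError(f"products[{index}] is missing {missing}")

        match = PRICE_PATTERN.match(item["product_price"])
        if match is None:
            raise ValueError(f"products[{index}] has an unparseable price")

        currency, amount = match.groups()
        products.append(
            Product(
                name=item["product_name"],
                price=Decimal(amount.replace(",", "")),  # drop thousands separators
                currency=currency,
                url=item["product_url"],
            )
        )
    return products
```

Rejecting a malformed payload at this boundary keeps a bad extraction from turning into a confident but wrong answer later in the loop.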

You can use:

Python/JS SDKs with Playwright when you need to interact with the page (clicks, form fills, logins).
REST API (URL → JSON) for pure extraction from public-facing pages, no browser needed.
The browser extension + Playground to design and optimize queries live.
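For the REST path, a URL → JSON call can be sketched with nothing but the Python standard library. The endpoint URL, the `X-API-Key` header, and the `url`/`query` body fields below are assumptions about the AgentQL REST API and should be confirmed against the current API reference; `build_request` and `extract` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: endpoint path, header name, and body fields match the AgentQL
# REST API reference; verify them before relying on this.
AGENTQL_ENDPOINT = "https://api.agentql.com/v1/query-data"

PRODUCTS_QUERY = """
{
  products[] {
    product_name
    product_price(include currency symbol)
    product_url
  }
}
"""


def build_request(url: str, query: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a URL -> JSON extraction."""
    body = json.dumps({"url": url, "query": query}).encode("utf-8")
    return urllib.request.Request(
        AGENTQL_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def extract(url: str, query: str, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_request(url, query, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `extract("https://example.com/products", PRODUCTS_QUERY, api_key)` would then return the parsed response. Keeping request construction separate from sending makes the tool easy to unit-test without network access.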

2. Use AI-powered selectors instead of brittle XPaths

Traditional Playwright/Selenium flows look like:

await page.click('//div[3]/div/button[2]');

This is fine until someone adds a new <div>.

With AgentQL, the engine analyzes the page structure to find elements that match your query, acting as a robust alternative to fragile XPath and DOM/CSS selectors.

For example, to get all search results on Google:

{
  results {
    items[] {
      title
      url
      snippet
    }
  }
}

The same approach applies to LinkedIn, Google Support, or any other site: AgentQL’s smart selectors are not tied to a specific domain. You define what a “result” is in terms of fields; AgentQL handles the DOM nuances.
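The same query style can drive interactions, not just extraction. Here is a rough Python sketch of filling and submitting a login form. It assumes `page` is a Playwright page already wrapped by the AgentQL Python SDK, so that `query_elements()` returns an object whose fields behave like Playwright locators (or `None` when nothing matches). The query field names and the `log_in` helper are illustrative.

```python
# Assumption: `page` was wrapped with the AgentQL Python SDK and exposes
# query_elements(); the field names in the query are made up for this example.
LOGIN_QUERY = """
{
  login_form {
    email_input
    password_input
    submit_btn
  }
}
"""

FIELD_NAMES = ("email_input", "password_input", "submit_btn")


def log_in(page, email: str, password: str) -> None:
    """Fill and submit a login form located by description rather than by XPath."""
    form = page.query_elements(LOGIN_QUERY).login_form

    # Fail with a clear message instead of an AttributeError on a missing element.
    missing = [name for name in FIELD_NAMES if getattr(form, name, None) is None]
    if missing:
        raise LookupError(f"could not locate: {', '.join(missing)}")

    form.email_input.fill(email)
    form.password_input.fill(password)
    form.submit_btn.click()
```

Because the function only depends on `query_elements`, `fill`, and `click`, it can be exercised with a fake page object in tests.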

3. Wrap AgentQL + Playwright into LLM tools

Your agent shouldn’t know about Playwright or the DOM. It should only know that it has tools like:

open_url(url: string) -> page_id
extract_data(page_id: string, query: string) -> json
click(page_id: string, action_query: string) -> page_id
fill_field(page_id: string, field_query: string, value: string) -> page_id
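One way to keep browser objects out of the model’s view is to hand it opaque `page_id` strings and resolve them inside the tool layer. The sketch below shows that idea in Python. `PageRegistry` and the `open_page` factory are hypothetical names, and the page object is assumed to expose a `query_data(query)` method, as an AgentQL-wrapped page would.

```python
import itertools
from typing import Any, Callable, Dict


class PageRegistry:
    """Maps opaque page_id strings to live page objects held by the tool layer."""

    def __init__(self, open_page: Callable[[str], Any]) -> None:
        # open_page is a hypothetical factory: it navigates to a URL and
        # returns a page object (for example, a wrapped Playwright page).
        self._open_page = open_page
        self._pages: Dict[str, Any] = {}
        self._counter = itertools.count(1)

    def open_url(self, url: str) -> str:
        """Open a page and return the handle the model will use from now on."""
        page_id = f"page-{next(self._counter)}"
        self._pages[page_id] = self._open_page(url)
        return page_id

    def get(self, page_id: str) -> Any:
        """Resolve a handle, rejecting IDs the model may have invented."""
        if page_id not in self._pages:
            raise KeyError(f"unknown page_id: {page_id!r}")
        return self._pages[page_id]

    def extract_data(self, page_id: str, query: str) -> dict:
        # Assumes the page object exposes query_data(query) -> dict.
        return self.get(page_id).query_data(query)
```

Rejecting unknown IDs matters here because a model can hallucinate a handle just as easily as it can hallucinate a field.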

Example: navigation and extraction with AgentQL’s JS SDK (Playwright-based)

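import { chromium } from 'playwright';
import { configure, wrap } from 'agentql';

// API names (configure, wrap, queryData) follow the `agentql` npm package;
// check them against the SDK version you install.
configure({ apiKey: process.env.AGENTQL_API_KEY });

const PRODUCTS_QUERY = `
  {
    products[] {
      product_name
      product_price(include currency symbol)
      product_url
    }
  }
`;

async function extractProducts(url: string) {
  const browser = await chromium.launch();
  try {
    // wrap() adds AgentQL's query methods to a regular Playwright page.
    const page = await wrap(await browser.newPage());
    await page.goto(url, { waitUntil: 'networkidle' });

    // Returns plain JSON shaped like the query.
    return await page.queryData(PRODUCTS_QUERY);
  } finally {
    // Close the browser even if navigation or the query throws.
    await browser.close();
  }
}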

Your LLM tool just wraps extractProducts and returns the JSON to the model.

4. Implement a browsing loop in the agent

Once you have tools, the LLM’s job is:

Decide what it needs (e.g., “I need to filter by ‘Remote’ and then extract job listings.”).
Call tools in sequence until the goal is met.

A simple loop in pseudo-code:

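# Assumes MAX_STEPS, user_goal, llm, and the tools from step 3 are defined elsewhere.
# The current page is tracked in state["page_id"] rather than read from the last
# history entry, because extract_data entries do not carry a page_id.
state = {"goal": user_goal, "page_id": None, "history": []}

for step in range(MAX_STEPS):
    tool_choice = llm.decide_next_action(state)
    name, args = tool_choice.name, tool_choice.args
    result = None

    if name == "finish":
        break
    elif name == "open_url":
        state["page_id"] = open_url(args["url"])
    elif name == "extract_data":
        # Extraction does not navigate, so page_id stays the same.
        result = extract_data(state["page_id"], args["query"])
    elif name == "click":
        state["page_id"] = click(state["page_id"], args["action_query"])
    elif name == "fill_field":
        state["page_id"] = fill_field(state["page_id"], args["field_query"], args["value"])
    else:
        result = {"error": f"unknown tool: {name}"}

    # Record every step so the model can see what it has already done.
    state["history"].append({"tool": name, "args": args, "result": result})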

All the DOM complexity is encapsulated in the tools; AgentQL handles the element finding.
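To test the loop without a model or a browser, you can inject both the planner and the tools. The following is a minimal runnable variant under that assumption. `ToolCall`, `run_agent`, and the `decide` callback are hypothetical names; in production `decide` would call your LLM and `tools` would wrap AgentQL + Playwright.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


def run_agent(
    goal: str,
    decide: Callable[[dict], ToolCall],
    tools: Dict[str, Callable[..., Any]],
    max_steps: int = 10,
) -> dict:
    """Run the browsing loop until the planner returns 'finish' or steps run out."""
    state: Dict[str, Any] = {"goal": goal, "page_id": None, "history": []}

    for _ in range(max_steps):
        call = decide(state)
        if call.name == "finish":
            break
        if call.name not in tools:
            # Surface the mistake to the planner instead of crashing.
            state["history"].append({"tool": call.name, "error": "unknown tool"})
            continue

        if call.name == "open_url":
            result = tools["open_url"](**call.args)
        else:
            # Every other tool operates on the page that is currently open.
            result = tools[call.name](state["page_id"], **call.args)

        if call.name == "extract_data":
            # Extraction returns data and leaves the current page unchanged.
            state["history"].append({"tool": call.name, "data": result})
        else:
            # Navigation-style tools return the page_id to use next.
            state["page_id"] = result
            state["history"].append({"tool": call.name, "page_id": result})

    return state
```

With a scripted `decide` function that replays a fixed list of `ToolCall` objects, the whole loop becomes a deterministic unit test.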

5. Test, debug, and harden with the AgentQL IDE

To keep this reliable:

Use the AgentQL browser extension to inspect a live page and refine your query until the JSON output matches your schema.
Save queries and reuse them across similar pages (same product template, same support article layout, etc.).
Run your saved queries against a fixed set of test URLs on a schedule, and check each result against your schema, so you notice when a design change is too large for a query to “self-heal.”
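A regression check over a list of URLs can be very small. The sketch below assumes a `run_query` callable that takes a URL and returns the extracted JSON (for example, a thin wrapper over the REST API or SDK). `check_urls` and the required-field list are illustrative and reuse the product schema from step 1.

```python
from typing import Callable, Dict, List

REQUIRED_FIELDS = ("product_name", "product_price", "product_url")


def check_urls(
    urls: List[str],
    run_query: Callable[[str], dict],
) -> Dict[str, List[str]]:
    """Return {url: [problems]} for every URL whose output breaks the contract."""
    failures: Dict[str, List[str]] = {}
    for url in urls:
        try:
            payload = run_query(url)
        except Exception as exc:  # a failed query is itself a regression
            failures[url] = [f"query failed: {exc}"]
            continue

        problems: List[str] = []
        products = payload.get("products") or []
        if not products:
            problems.append("no products returned")
        for index, item in enumerate(products):
            for field_name in REQUIRED_FIELDS:
                if not item.get(field_name):
                    problems.append(f"products[{index}].{field_name} is empty")

        if problems:
            failures[url] = problems
    return failures
```

An empty result means every URL still satisfies the schema; anything else tells you which page and which field to look at in the extension.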

Common Mistakes to Avoid

- Feeding raw HTML to the LLM: This overwhelms the context window and encourages hallucinations.
  How to avoid it: Always convert pages to structured JSON via an AgentQL query or a similar schema-first layer before giving them to the model.
- Hard-coding XPath/CSS selectors for actions: These break on minor UI updates and are painful to maintain.
  How to avoid it: Use AgentQL’s AI-powered selectors with Playwright: describe the element in your query and let the engine analyze the page structure to find it.
- Skipping a schema contract: Letting the LLM invent arbitrary keys (“price”, “cost”, “amount”) makes downstream processing fragile.
  How to avoid it: Define the output schema in your AgentQL query (product_name, product_price(include currency symbol), etc.) and keep it stable over time.

Real-World Example

Imagine you’re building a web-browsing LLM agent to monitor competitor pricing across multiple ecommerce sites. Each site uses different HTML, but your internal pipeline expects consistent fields: product_name, product_price, product_url, and availability.

Using AgentQL, you define a single query per layout pattern:

{
  products[] {
    product_name
    product_price(include currency symbol)
    product_url
    availability
  }
}

For Site A, Site B, and Site C, you:

Use the browser extension to tune the query on a few sample URLs until the returned JSON matches your schema.
Save those queries and call them via the AgentQL REST API or SDK for hundreds of URLs per site.
Have your LLM agent:
- Decide which URLs to visit.
- Call a tool that runs the appropriate AgentQL query for that domain.
- Reason over the JSON to compare prices, flag anomalies, or generate summaries.
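The comparison step does not need the model at all; plain code over the JSON is often enough. As a rough illustration, the Python sketch below flags sites whose price for the same product deviates from the median by more than a chosen fraction. It assumes all prices share one currency and use “.” as the decimal separator; `to_amount`, `flag_outliers`, and the 20% default are hypothetical.

```python
from decimal import Decimal
from statistics import median
from typing import Dict


def to_amount(price: str) -> Decimal:
    """Keep only digits and the decimal point, e.g. '$1,299.00' -> Decimal('1299.00')."""
    digits = "".join(ch for ch in price if ch.isdigit() or ch == ".")
    return Decimal(digits)


def flag_outliers(
    prices_by_site: Dict[str, str],
    tolerance: Decimal = Decimal("0.20"),
) -> Dict[str, Decimal]:
    """Return sites whose price differs from the median by more than `tolerance`."""
    amounts = {site: to_amount(price) for site, price in prices_by_site.items()}
    mid = median(amounts.values())
    return {
        site: amount
        for site, amount in amounts.items()
        # Guard against a zero median before dividing.
        if mid and abs(amount - mid) / mid > tolerance
    }
```

The agent can then be asked to explain or summarize only the flagged entries, which keeps its context small.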

When any of those sites change their DOM, your AgentQL queries often keep working (“self-healing”) because the engine is not tied to a single brittle XPath—it understands the page structure. Where adjustments are needed, you use the Playground or extension to update the query once, then reuse across your fleet.

Pro Tip: Treat each AgentQL query like an API version. When you need to change the schema (e.g., add shipping_cost), bump the query version and migrate consumers gradually instead of silently changing field shapes.
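One lightweight way to version queries is a small catalog that never overwrites a published version. The Python sketch below illustrates the idea; `VersionedQuery` and `QueryCatalog` are hypothetical names, and a real setup might store the same data in a config file or database instead.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class VersionedQuery:
    name: str
    version: int
    text: str


class QueryCatalog:
    """Keeps every published version of a query so consumers can pin one."""

    def __init__(self) -> None:
        self._queries: Dict[Tuple[str, int], VersionedQuery] = {}

    def latest_version(self, name: str) -> int:
        """Return the highest published version of `name`, or 0 if none exist."""
        versions = [v for (n, v) in self._queries if n == name]
        return max(versions, default=0)

    def publish(self, name: str, text: str) -> VersionedQuery:
        """Register `text` as the next version; older versions stay available."""
        query = VersionedQuery(name, self.latest_version(name) + 1, text)
        self._queries[(name, query.version)] = query
        return query

    def get(self, name: str, version: Optional[int] = None) -> VersionedQuery:
        """Fetch a pinned version, or the latest one when no version is given."""
        if version is None:
            version = self.latest_version(name)
        return self._queries[(name, version)]
```

Consumers that pin `get("products", 1)` keep receiving the old shape while new consumers move to version 2 with `shipping_cost`.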

Summary

A reliable web-browsing LLM agent isn’t about a smarter model; it’s about better contracts with the web:

Stop crunching reams of HTML—convert pages into JSON using schema-first queries.
Replace fragile XPath/DOM/CSS selectors with AI-powered selectors that analyze page structure to find buttons, fields, and data.
Wrap AgentQL + Playwright/REST in clear tools and let the LLM orchestrate those tools in a browsing loop.
Use the browser extension and Playground for fast feedback, and treat queries as reusable, self-healing building blocks.

When you build this way, your agent can reliably find the right elements, extract structured data, and stay stable even as sites evolve.
