
AgentQL JavaScript/TypeScript SDK setup: Playwright example for extraction
Most JavaScript teams reach for Playwright when they need to automate the web—but the workflow usually stalls when it’s time to extract structured data. AgentQL turns those brittle XPath/CSS selectors into a schema-first query → JSON flow that plugs directly into your Playwright scripts.
Quick Answer: To use the AgentQL JavaScript/TypeScript SDK with Playwright, you install the agentql package, initialize a client with your API key, define the shape of your data using an AgentQL query, and run that query against a Playwright-controlled page to get clean JSON. The same query can then be reused across similar pages, giving you consistent, “self-healing” extraction without maintaining DOM selectors.
Why This Matters
If you’ve ever owned a web scraping or automation pipeline, you know the pain: every redesign breaks your selectors, LLMs choke on reams of HTML, and you end up debugging DOMs instead of shipping features. AgentQL flips that model by letting AI analyze the page’s structure and return exactly the JSON you asked for—without you hunting for CSS or XPath.
For web agents, data platforms, and internal tools, treating the web like a structured API surface is the difference between “works in a demo” and “survives production.”
Key Benefits:
- Schema-first extraction: Define the shape of your data in an AgentQL query and always get structured JSON back.
- Less selector maintenance: Replace fragile XPath/DOM/CSS selectors with AI-powered element location that adapts to layout changes.
- Drop-in with Playwright: Keep your existing automation scripts; just add AgentQL on top of your Playwright pages to extract data reliably.
Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| AgentQL query | A small schema-like query that describes the fields and list structures you want from a page or document. | You define the output shape up front, so you get predictable JSON instead of parsing raw HTML. |
| Playwright integration | Using the AgentQL JS/TS SDK alongside Playwright to interact with live pages and then run queries on them. | Lets you reuse your existing browser automation stack while offloading element detection and parsing to AgentQL. |
| Self-healing extraction | AgentQL uses AI to analyze page structure instead of fixed selectors, so queries can tolerate DOM/layout changes. | Cuts down the weekly churn of updating broken XPaths and CSS selectors across many sites. |
How It Works (Step-by-Step)
At a high level, you:
- Install Playwright and the AgentQL JavaScript/TypeScript SDK.
- Initialize a Playwright browser and navigate to a target URL.
- Run an AgentQL query against the page to get back structured JSON.
- Use that JSON in your pipeline (store it, feed it to an LLM, aggregate it, etc.).
Below, I’ll walk through a concrete example in JavaScript/TypeScript using a simple “product listing” extraction workflow.
1. Install the SDK
First, add the dependencies to your project:
# Playwright
npm install -D @playwright/test
# or
npm install playwright
# AgentQL JS SDK
npm install agentql
If you haven’t set up Playwright yet:
npx playwright install
You’ll also need an AgentQL API key from the AgentQL dashboard, exported as an environment variable:
export AGENTQL_API_KEY="your-api-key"
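The scripts later in this guide stop immediately when the key is missing. A minimal sketch of that check as a reusable helper follows. The name requireEnv is hypothetical and not part of the AgentQL SDK; the environment is passed in as a plain object so the helper can be exercised without touching real variables.

```typescript
// Hypothetical helper (not part of the AgentQL SDK): read a required
// variable from an environment-like object and fail fast if it is unset.
type Env = Record<string, string | undefined>;

function requireEnv(name: string, env: Env): string {
  const value = env[name];
  // Treat unset and blank values the same way.
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing ${name} environment variable`);
  }
  return value;
}
```

In a Node script this would be called as requireEnv('AGENTQL_API_KEY', process.env).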
2. Define Your AgentQL Query
Think about what you want the JSON to look like. For a product grid, maybe you care about name, price (with currency symbol), and rating.
Here’s an AgentQL query that expresses that schema:
{
  products[] {
    product_name
    product_price(include currency symbol)
    product_rating(optional)
  }
}
Key points:
- products[] tells AgentQL to find a repeating list of product-like items.
- Nested fields (product_name, product_price, product_rating) describe what you want for each item.
- The inline instruction (include currency symbol) clarifies how you want the price formatted.
- (optional) tells AgentQL that product_rating may not exist on every card.
You don’t need to specify selectors; AgentQL uses AI to analyze the page structure and map these fields to the right elements.
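Because the query is plain text, it can also be assembled from a small field list, which keeps field names and hints in one place. A minimal sketch follows, covering only the flat list shape shown above. FieldSpec and buildListQuery are hypothetical names, not part of the AgentQL SDK.

```typescript
// Hypothetical helper (not part of the AgentQL SDK): build a flat
// list query of the form shown above from a field specification.
interface FieldSpec {
  name: string;  // field name, e.g. "product_price"
  hint?: string; // optional inline instruction, e.g. "include currency symbol"
}

function buildListQuery(listName: string, fields: FieldSpec[]): string {
  // One line per field, with the hint in parentheses when present.
  const lines = fields.map((f) =>
    f.hint ? `    ${f.name}(${f.hint})` : `    ${f.name}`
  );
  return ['{', `  ${listName}[] {`, ...lines, '  }', '}'].join('\n');
}

const productQuery = buildListQuery('products', [
  { name: 'product_name' },
  { name: 'product_price', hint: 'include currency symbol' },
  { name: 'product_rating', hint: 'optional' },
]);
```

The resulting string has the same fields and hints as the handwritten query, so either form can be passed to the extraction call.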
3. Wire AgentQL into Playwright (JavaScript)
Here’s a complete example of running that query on a Playwright-controlled page.
JavaScript example (Node.js)
// example.js
// Uses ES module imports, so package.json needs "type": "module".
import { chromium } from 'playwright';
import { AgentQLClient } from 'agentql';

(async () => {
  const apiKey = process.env.AGENTQL_API_KEY;
  if (!apiKey) {
    throw new Error('Missing AGENTQL_API_KEY environment variable');
  }

  // 1. Create AgentQL client
  const client = new AgentQLClient({ apiKey });

  // 2. Launch Playwright
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();

  try {
    // 3. Navigate to a page you want to extract from
    await page.goto('https://example.com/products', { waitUntil: 'networkidle' });

    // 4. Define your AgentQL query (schema-first)
    const query = `
    {
      products[] {
        product_name
        product_price(include currency symbol)
        product_rating(optional)
      }
    }
    `;

    // 5. Run the query against the current page
    const result = await client.extract({
      page,  // Playwright Page instance
      query, // AgentQL query
    });

    // 6. You get structured JSON back
    console.log(JSON.stringify(result, null, 2));
  } finally {
    // Always release the browser, even if extraction throws
    await browser.close();
  }
})();
A typical JSON response might look like:
{
  "products": [
    {
      "product_name": "Stainless Steel Water Bottle",
      "product_price": "$24.99",
      "product_rating": "4.7"
    },
    {
      "product_name": "Insulated Coffee Mug",
      "product_price": "$18.50",
      "product_rating": "4.5"
    }
  ]
}
You never wrote a CSS selector or scraped innerText yourself; AgentQL handled element detection and text normalization.
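In the sample response the price and rating arrive as strings, so a pipeline that sorts or aggregates will usually convert them first. A minimal sketch of that conversion follows. RawProduct, ParsedProduct and parseProduct are hypothetical names, and treating commas as thousands separators is an assumption that does not hold for every locale.

```typescript
// Shape of one item in the sample response above.
interface RawProduct {
  product_name: string;
  product_price: string;   // e.g. "$24.99"
  product_rating?: string; // e.g. "4.7"
}

interface ParsedProduct {
  name: string;
  currency: string;      // leading symbol(s), "" if none
  amount: number;        // NaN if the price text has no leading number
  rating: number | null; // null if the rating is absent
}

// Hypothetical helper: split a price string into symbol and number.
function parseProduct(raw: RawProduct): ParsedProduct {
  // Non-digit prefix (currency), then digits with "." or "," separators.
  const match = raw.product_price.trim().match(/^([^\d]*)([\d.,]+)/);
  const currency = (match?.[1] ?? '').trim();
  // Assumption: commas are thousands separators, so drop them.
  const digits = (match?.[2] ?? '').replace(/,/g, '');
  const amount = digits === '' ? NaN : Number(digits);
  const rating = raw.product_rating === undefined ? null : Number(raw.product_rating);
  return { name: raw.product_name, currency, amount, rating };
}
```

Keeping the raw strings alongside the parsed values makes it easier to spot cases where the parsing assumption was wrong.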
4. TypeScript Version with Types
If you’re using TypeScript, you can type the expected response schema:
// example.ts
import { chromium, Page } from 'playwright';
import { AgentQLClient } from 'agentql';

interface Product {
  product_name: string;
  product_price: string;
  product_rating?: string;
}

interface ProductsResult {
  products: Product[];
}

async function extractProducts(page: Page): Promise<ProductsResult> {
  const apiKey = process.env.AGENTQL_API_KEY;
  if (!apiKey) {
    throw new Error('Missing AGENTQL_API_KEY environment variable');
  }

  const client = new AgentQLClient({ apiKey });

  const query = `
  {
    products[] {
      product_name
      product_price(include currency symbol)
      product_rating(optional)
    }
  }
  `;

  const result = await client.extract<ProductsResult>({ page, query });
  return result;
}

(async () => {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();

  try {
    await page.goto('https://example.com/products', { waitUntil: 'networkidle' });
    const data = await extractProducts(page);

    data.products.forEach((p) => {
      console.log(`${p.product_name} - ${p.product_price} (${p.product_rating ?? 'no rating'})`);
    });
  } finally {
    await browser.close();
  }
})();
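This is where the schema-first model shines: your TypeScript types mirror your AgentQL query, so your downstream code is type-checked against the shape you asked for. TypeScript erases these types at compile time, so they describe the expected response rather than verify it.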
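A runtime check can back up those types before downstream code relies on them. A minimal sketch follows, reusing the Product and ProductsResult interfaces from the example. The guard names isProduct and isProductsResult are hypothetical and not part of the AgentQL SDK.

```typescript
interface Product {
  product_name: string;
  product_price: string;
  product_rating?: string;
}

interface ProductsResult {
  products: Product[];
}

// Hypothetical guard: true only if the value matches the Product interface.
function isProduct(value: unknown): value is Product {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const rating = v['product_rating'];
  return (
    typeof v['product_name'] === 'string' &&
    typeof v['product_price'] === 'string' &&
    (rating === undefined || typeof rating === 'string')
  );
}

// Hypothetical guard: true only if the value has a products array of Product items.
function isProductsResult(value: unknown): value is ProductsResult {
  if (typeof value !== 'object' || value === null) return false;
  const products = (value as Record<string, unknown>)['products'];
  return Array.isArray(products) && products.every(isProduct);
}
```

Calling isProductsResult on the extraction result and throwing when it returns false turns a silent shape mismatch into an explicit error at the boundary.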
5. Test and Refine with the AgentQL Debugger
Before you bake a query into a script, it’s worth tuning it using the AgentQL IDE/browser extension and Playground:
- Install the browser extension (Debugger Installation Instructions are in the AgentQL docs).
- Navigate to the target page in your browser.
- Open the AgentQL debugger and paste your query.
- Inspect the returned JSON and tweak field names/instructions until the output matches what your app needs.
This tight feedback loop is much faster than “change selector → rerun Playwright → inspect HTML,” and it’s closer to what your production script will actually receive.
Once it’s stable, copy the final query into your JS/TS code.
6. Run Your Script
You can now run your script like any other Node/TS program:
node example.js
# or with ts-node
npx ts-node example.ts
Behind the scenes:
- Playwright drives the browser and builds the page.
- AgentQL analyzes the rendered DOM structure.
- Your query defines the JSON shape.
- The SDK returns structured JSON for your pipeline.
Common Mistakes to Avoid
- Treating AgentQL like CSS/XPath: Don’t translate your old selectors 1:1 into AgentQL. Instead, describe entities and attributes (products[] { product_name ... }). Let AI analyze the page structure rather than over-constraining it with markup details.
- Not refining queries before production: Skipping the debugger/playground leads to surprises when you finally inspect the JSON. Always iterate in the browser extension (or Playground) first, then copy a known-good query into your JS/TS code.
- Overloading the query with vague instructions: Avoid long, ambiguous instructions (e.g., “get everything about the product and clean it up”). Be explicit: ask for specific fields and use focused hints like (include currency symbol) or (optional) instead.
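The last point can be checked mechanically before a query ships. A rough sketch of such a lint follows. findLongHints is a hypothetical helper, and the six-word limit is an arbitrary assumption rather than an AgentQL rule.

```typescript
// Hypothetical lint (not part of the AgentQL SDK): return the
// parenthesized hints in a query that exceed a word limit.
function findLongHints(query: string, maxWords = 6): string[] {
  const pattern = /\(([^()]*)\)/g; // text between one pair of parentheses
  const longHints: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(query)) !== null) {
    const hint = (match[1] ?? '').trim();
    const wordCount = hint.split(/\s+/).filter((w) => w.length > 0).length;
    if (wordCount > maxWords) longHints.push(hint);
  }
  return longHints;
}
```

A word count is only a proxy for vagueness, so flagged hints are candidates for review rather than errors.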
Real-World Example
Imagine you’re building a marketplace intelligence service. Your existing Playwright stack already:
- Logs into vendor dashboards.
- Applies filters and pagination.
- Scrolls product lists into view.
The painful part has been extraction: each vendor uses different markup, and every redesign breaks your selectors. You also want to feed this data into an LLM for analysis, but raw HTML blows up your context window.
With AgentQL, you:
- Keep your Playwright navigation and login code exactly the same.
- Attach a single AgentQL extraction step at the end of each flow:
  const query = `
  {
    products[] {
      product_name
      product_price(include currency symbol)
      product_availability
    }
  }
  `;
  const data = await client.extract<{ products: Product[] }>({ page, query });
- Pipe data.products directly into your storage, metrics, or LLM grounding layer.
When a vendor’s layout changes, the query often keeps working as AgentQL reinterprets the new structure. If something does break, you update a small, schema-like query instead of hunting through hundreds of selectors across many files.
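Knowing when something did break is easier with a small check on every run. A minimal sketch follows, assuming that an empty list or a high share of blank required fields signals a query that needs attention. checkExtraction, ExtractionHealth and the 20% threshold are hypothetical choices, not AgentQL features.

```typescript
// Hypothetical health report for one extraction run.
interface ExtractionHealth {
  itemCount: number;
  missingByField: Record<string, number>; // field -> items where it is absent or empty
  healthy: boolean;
}

function checkExtraction<T extends object>(
  items: T[],
  requiredFields: Array<keyof T & string>,
  maxMissingRatio = 0.2, // assumed threshold; tune per site
): ExtractionHealth {
  const missingByField: Record<string, number> = {};
  for (const field of requiredFields) {
    // Count items where the field is undefined, null, or an empty string.
    missingByField[field] = items.filter((item) => {
      const value: unknown = item[field];
      return value === undefined || value === null || value === '';
    }).length;
  }
  // An empty result is never healthy; otherwise every field must stay under the threshold.
  const healthy =
    items.length > 0 &&
    requiredFields.every((field) => (missingByField[field] ?? 0) / items.length <= maxMissingRatio);
  return { itemCount: items.length, missingByField, healthy };
}
```

Logging the report per vendor gives an early signal to revisit a query instead of discovering gaps downstream.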
Pro Tip: Standardize your AgentQL schemas across sites (e.g., always return product_name, product_price, product_availability) and keep only the queries site-specific. That way, the rest of your pipeline can treat every site like the same JSON API.
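One way to enforce that convention is a registry of per-site queries plus a check that each one mentions the shared fields. A minimal sketch follows. The site keys, the availability hint and the helper missingStandardFields are all illustrative assumptions.

```typescript
// Fields every site-specific query is expected to return (from the tip above).
const STANDARD_FIELDS = ['product_name', 'product_price', 'product_availability'];

// Hypothetical registry: site key -> AgentQL query. Names and hints are illustrative.
const SITE_QUERIES: Record<string, string> = {
  vendor_a: `{ products[] { product_name product_price(include currency symbol) product_availability } }`,
  vendor_b: `{ products[] { product_name product_price(include currency symbol) product_availability(in stock or out of stock) } }`,
};

// Return the standard fields that a query does not mention as whole words.
function missingStandardFields(query: string): string[] {
  return STANDARD_FIELDS.filter((field) => !new RegExp(`\\b${field}\\b`).test(query));
}
```

Running missingStandardFields over every registry entry in a unit test catches a query that drifts from the shared schema before it reaches production.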
Summary
Using the AgentQL JavaScript/TypeScript SDK with Playwright lets you keep your proven browser automation while offloading the hardest part—robust, structured extraction—to an AI-powered query engine. You define the shape of your data in an AgentQL query, run it against your Playwright page, and get back clean JSON that stays consistent despite DOM and layout changes.
Instead of parsing reams of HTML or maintaining fragile XPath/CSS selectors, you treat the web like an API: query in, JSON out.