
How do I use AgentQL with Tetra remote Chromium sessions for hard or authenticated pages?
Tough, authenticated pages are exactly where traditional scrapers and LLM-based web agents tend to fall apart. Combining AgentQL with Tetra’s remote Chromium sessions gives you a durable way to handle login flows, MFA, client-side rendering, and “hard mode” layouts—while still getting clean JSON back from your AgentQL queries.
Quick Answer: Use Tetra to own the remote Chromium session (including login and any “hard” navigation), then point AgentQL at that live browser context via your SDK or REST flow. Let Tetra handle authentication and page rendering, and let AgentQL handle schema-first extraction (query → JSON) so your web agents can reliably work on authenticated dashboards and complex apps.
Why This Matters
Authenticated, JavaScript-heavy pages are where brittle XPath/CSS selectors and raw HTML grounding become unmanageable. You end up:
- Maintaining fragile Playwright/Selenium flows that break on minor DOM shifts.
- Shoving reams of HTML into an LLM and hitting context limits and hallucinations.
- Re-implementing login logic for every tool that needs access.
Using Tetra for remote Chromium and AgentQL for extraction turns that into a clean contract: Tetra keeps a stable, logged-in browser; AgentQL turns whatever is on the screen into structured JSON with self-healing selectors. Your AI agents and data pipelines get a single, reusable way to talk to “hard” pages like SaaS dashboards, internal tools, and complex account portals.
Key Benefits:
- Works on “hard” pages: Handle logins, 2FA, and SPA dashboards inside Tetra, then have AgentQL extract structured data from the rendered DOM.
- Self-healing vs fragile selectors: Replace XPath/DOM/CSS with AgentQL’s AI-powered element location so your queries survive layout changes.
- Schema-first for LLMs and pipelines: Define the output shape (query → JSON) once and reuse it across sessions, agents, and similar pages.
Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Remote Chromium session (Tetra) | A long-lived, remotely hosted Chromium instance you control via API/SDK (including cookies, localStorage, and full JS execution). | Lets you stay logged in, pass MFA, and navigate complex apps without re-authenticating every run. |
| AgentQL query → JSON | A schema-first query language where you define the shape of the data you want and get back structured JSON. | Removes the need for DOM selectors and raw HTML parsing; perfect for web agents and data workflows. |
| Self-healing extraction | AgentQL uses AI to analyze the page structure and locate target elements even as the DOM and layout change. | Reduces breakage from UI tweaks and lets you reuse the same query across similar pages and sessions. |
How It Works (Step-by-Step)
At a high level, the flow for “hard or authenticated” pages looks like:
- Use Tetra to create and manage a remote Chromium session (login, MFA, navigation).
- Ensure the target page is fully loaded in that session.
- Attach AgentQL to that session (via SDK or a browserless-style API) and run schema-first queries that return JSON.
Below is a concrete, developer-style breakdown.
1. Establish and authenticate the Tetra remote Chromium session
Your first job is to make the page “AI-ready” by getting Tetra’s browser to the right state: logged in, MFA cleared, and on the page you care about.
Pseudo-flow (language-agnostic):
- Create a new remote Chromium session with Tetra.
- Use Tetra’s control APIs to:
  - Navigate to the login URL
  - Fill in username/password
  - Handle MFA or SSO if needed (code input, WebAuthn, etc.)
- Navigate to the authenticated page (e.g., /dashboard, /billing, /reports).
You now have a live, remote browser that:
- Is logged in
- Has cookies and session storage
- Is displaying the authenticated or “hard” page you want AgentQL to extract from
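The login steps above can be sketched as a small helper that drives a Playwright page. This is a minimal sketch, not Tetra's own API: it assumes you control the remote session through a Playwright Page, and the URL, selectors, and landing pattern in LOGIN are hypothetical placeholders for your app.

```javascript
// Hypothetical URL and selectors; replace them with the ones your app uses.
const LOGIN = {
  url: 'https://example-auth-app.com/login',
  userField: 'input[name="username"]',
  passField: 'input[name="password"]',
  submit: 'button[type="submit"]',
  landingPattern: '**/dashboard',
};

// Drives a Playwright Page through a basic username/password login.
// `page` is any Playwright Page, e.g. one obtained from the remote session.
async function login(page, { username, password }, cfg = LOGIN) {
  await page.goto(cfg.url);
  await page.fill(cfg.userField, username);
  await page.fill(cfg.passField, password);
  await page.click(cfg.submit);
  // MFA or SSO steps would go here, before waiting for the landing page.
  await page.waitForURL(cfg.landingPattern);
  return page.url();
}
```

Keeping the selectors in one config object means a login-page redesign only touches LOGIN, not the flow itself.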
2. Make the page accessible to AgentQL
AgentQL works directly on real webpages—HTML plus all the runtime-generated DOM. With Tetra, you have two main patterns:
- Playwright + AgentQL SDK in the same code path
  - Use Tetra’s remote Chromium as the browser backend.
  - Control it via Playwright from your code.
  - Plug in the AgentQL JS or Python SDK against that Playwright page context.
- Browserless-style API + AgentQL REST
  - Use Tetra to maintain the session and expose the current URL / page state.
  - Call the AgentQL REST API with the current page URL and your query, letting AgentQL handle the rendering remotely. Keep in mind that a URL-only request is rendered outside your Tetra session, so it does not see that session’s login state on its own.
  - If you need to reuse Tetra’s exact in-session state (e.g., cookies, headers), you pass session-specific info to AgentQL’s backend (if available in your deployment model).
For most teams, the easiest production-ready approach is:
Use Tetra as the remote Chromium “engine,” then run Playwright in your AgentQL script pointed at that engine, so AgentQL’s SDK operates on the live, authenticated DOM.
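If you go the REST route instead, one way to keep the authenticated state is to send the HTML that the Tetra session has already rendered rather than a bare URL. The sketch below assumes the AgentQL REST endpoint is https://api.agentql.com/v1/query-data, that it authenticates with an X-API-Key header, and that it accepts an html field alongside query. Treat all three as assumptions and confirm them in the AgentQL REST reference before relying on this.

```javascript
// Assumed endpoint and header name; confirm both in the AgentQL REST reference.
const AGENTQL_ENDPOINT = 'https://api.agentql.com/v1/query-data';

// Builds a fetch() request that sends already-rendered HTML plus a query.
function buildQueryDataRequest({ apiKey, query, html }) {
  return {
    url: AGENTQL_ENDPOINT,
    options: {
      method: 'POST',
      headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify({ query, html }),
    },
  };
}

// Sends the request using the global fetch available in Node 18+.
async function queryRenderedHtml(args) {
  const { url, options } = buildQueryDataRequest(args);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`AgentQL request failed: ${res.status}`);
  return res.json();
}
```

With Playwright attached to the session, the html argument could come from `await page.content()`, which returns the current rendered DOM as a string.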
3. Define your AgentQL query (schema-first)
Once your Tetra session has the page open, AgentQL takes over the extraction side.
Example: Say you’re scraping an authenticated analytics dashboard with a list of campaigns.
You might define an AgentQL query like:
    {
      campaigns[] {
        campaign_name
        campaign_status
        impressions
        clicks
        ctr
      }
    }
AgentQL uses AI to analyze the rendered page (inside that authenticated Tetra session) and finds these fields—even if:
- The DOM is deeply nested.
- Class names are obfuscated or dynamic.
- The layout shifts slightly over time.
4. Run the query via AgentQL SDK (with Tetra-backed Playwright)
Here’s a simplified JavaScript example that assumes:
- Tetra exposes a Playwright-compatible endpoint for remote Chromium.
- You’ve installed Playwright and the AgentQL JS SDK:

      npm install agentql playwright

    import { chromium } from 'playwright';
    // API names follow the AgentQL JS SDK; confirm them against your installed version
    import { wrap, configure } from 'agentql';

    (async () => {
      // AgentQL needs an API key; read it from the environment
      configure({ apiKey: process.env.AGENTQL_API_KEY });

      // 1) Connect Playwright to Tetra's remote Chromium (placeholder endpoint)
      const browser = await chromium.connectOverCDP(
        'wss://tetra-remote-chromium.example.com?session_id=YOUR_SESSION_ID'
      );

      // 2) Reuse the session's existing context so its cookies and login apply.
      //    browser.newPage() would open a fresh, logged-out context instead.
      const context = browser.contexts()[0];
      const page = await wrap(context.pages()[0] ?? (await context.newPage()));

      // 3) Navigate within the authenticated session
      //    If Tetra already authenticated this session, you can go straight to the target URL
      await page.goto('https://example-auth-app.com/dashboard', { waitUntil: 'networkidle' });

      // 4) Define your schema-first query
      const query = `
      {
        campaigns[] {
          campaign_name
          campaign_status
          impressions
          clicks
          ctr
        }
      }
      `;

      // 5) Execute the query; queryData returns the extracted values as JSON
      const result = await page.queryData(query);
      console.log(JSON.stringify(result, null, 2));

      // Disconnect from the remote browser
      await browser.close();
    })();
The returned JSON might look like:
    {
      "campaigns": [
        {
          "campaign_name": "Spring Launch 2026",
          "campaign_status": "Active",
          "impressions": "124,309",
          "clicks": "2,914",
          "ctr": "2.34%"
        },
        {
          "campaign_name": "Retargeting - Q2",
          "campaign_status": "Paused",
          "impressions": "54,923",
          "clicks": "613",
          "ctr": "1.12%"
        }
      ]
    }
No XPath, no fragile CSS selectors, no scraping reams of HTML into your LLM—just a schema and the JSON it returns.
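Note that the values in the sample output are display strings ("124,309", "2.34%"). Before loading them into a pipeline you will usually want real numbers. Here is a minimal sketch of that cleanup; the helper names and the output field names are illustrative, not part of AgentQL.

```javascript
// Parses "124,309" into 124309; returns null when the text is not numeric.
function parseCount(text) {
  const cleaned = String(text).replace(/,/g, '').trim();
  if (cleaned === '') return null;
  const n = Number(cleaned);
  return Number.isFinite(n) ? n : null;
}

// Parses "2.34%" into 2.34 (still a percentage, not a fraction).
function parsePercent(text) {
  const n = parseFloat(String(text));
  return Number.isNaN(n) ? null : n;
}

// Converts one extracted campaign into a typed record.
function normalizeCampaign(c) {
  return {
    name: c.campaign_name,
    status: c.campaign_status,
    impressions: parseCount(c.impressions),
    clicks: parseCount(c.clicks),
    ctrPercent: parsePercent(c.ctr),
  };
}
```

Usage: `result.campaigns.map(normalizeCampaign)` gives you an array of records with numeric metrics, and null wherever a cell could not be parsed.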
5. Integrate with your agent or data pipeline
From here, you can:
- Feed the JSON into an LLM as structured context instead of raw HTML.
- Load it into a data warehouse.
- Use it inside an orchestration tool (Temporal, Airflow, etc.) for recurring jobs.
- Attach it to an agent that periodically checks dashboard metrics or performs actions via Playwright.
AgentQL “holds no opinions on what’s and how’s.” You can plug this into whatever architecture makes sense: cron jobs, event-driven triggers, or full agent frameworks.
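As one example of the first option, here is a rough sketch of turning the extracted JSON into compact LLM context. The function name and the prompt wording are made up for illustration; adapt both to whatever model and framework you use.

```javascript
// Builds a prompt that grounds the model in the extracted JSON only.
function buildMetricsPrompt(result, question) {
  return [
    'You are given campaign metrics as JSON extracted from a dashboard.',
    'Answer using only this data.',
    '',
    'DATA:',
    JSON.stringify(result),
    '',
    `QUESTION: ${question}`,
  ].join('\n');
}
```

Because the JSON is serialized without pretty-printing, the prompt stays small even when the dashboard lists many campaigns.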
Common Mistakes to Avoid
- Treating AgentQL like a DOM selector engine: AgentQL is schema-first. Don’t try to reconstruct XPath logic in your query. Instead, describe the fields you want (product_name, order_total, campaign_status) and let AgentQL infer the mapping from the page structure.
- Separating authentication and extraction into incompatible tools: If Tetra manages authentication but your AgentQL setup doesn’t actually operate on the same browser/session, you’ll hit login walls. Ensure your AgentQL SDK is bound to the same remote Chromium context Tetra controls (e.g., via Playwright connectOverCDP).
- Over-stuffing LLM context with HTML instead of using JSON: Once you have structured JSON, resist the urge to also dump raw HTML into your prompts. Use the JSON as the single source of truth; this keeps context lean and reduces hallucinations.
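A cheap guard against the second mistake is to check where the browser actually landed before running a query. The sketch below assumes your app redirects logged-out users to a path such as /login; the hint list is hypothetical and should match your app's real redirect.

```javascript
// Hypothetical path hints; adjust to the redirect your app uses when logged out.
const LOGIN_HINTS = ['/login', '/signin', '/sso'];

// Returns true when the URL path looks like a login page.
function looksLikeLoginWall(currentUrl, hints = LOGIN_HINTS) {
  const { pathname } = new URL(currentUrl);
  return hints.some((h) => pathname.toLowerCase().startsWith(h));
}

// Throws early instead of letting a query run against a login form.
function assertSameSession(currentUrl) {
  if (looksLikeLoginWall(currentUrl)) {
    throw new Error(`Not authenticated: landed on ${currentUrl}`);
  }
}
```

Call `assertSameSession(page.url())` right after navigation; a failure here usually means the extraction code is attached to a different context than the one that logged in.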
Real-World Example
Imagine you’re building a GEO-aware analytics agent that:
- Logs into a third-party ad platform that only exposes metrics via a web dashboard.
- Uses Tetra to maintain a persistent, MFA-cleared Chromium session.
- Queries multiple views (overview, campaigns, creatives, spend breakdowns).
- Returns a concise summary to downstream systems as structured JSON.
Flow:
- Tetra spins up a remote Chromium instance and runs the full login sequence (username/password, SSO redirect, 2FA).
- Once authenticated, Tetra navigates to https://ads.vendor.com/campaigns.
- Your AgentQL script connects to that session over Playwright.
- You run this query:

      {
        summary {
          total_spend
          date_range
        }
        campaigns[] {
          name
          status
          daily_budget
          yesterday_spend
          impressions
          clicks
        }
      }

- AgentQL returns JSON that your agent uses to:
  - Generate GEO-optimized reports.
  - Detect anomalies.
  - Populate dashboards without scraping HTML ever again.
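To make the anomaly step concrete, here is a deliberately simple check that flags campaigns whose spend yesterday exceeded their daily budget. It uses the field names from the query above and assumes money values arrive as display strings like "$1,200.50"; both the helper names and that format are assumptions for illustration.

```javascript
// Parses "$1,200.50" into 1200.5; returns null when no number is present.
function parseMoney(text) {
  const cleaned = String(text).replace(/[^0-9.-]/g, '');
  const n = parseFloat(cleaned);
  return Number.isNaN(n) ? null : n;
}

// Returns the names of campaigns that spent more than their daily budget.
function findOverBudget(campaigns) {
  return campaigns
    .filter((c) => {
      const budget = parseMoney(c.daily_budget);
      const spend = parseMoney(c.yesterday_spend);
      return budget !== null && spend !== null && spend > budget;
    })
    .map((c) => c.name);
}
```

Campaigns with unparseable values are skipped rather than flagged, so a layout hiccup does not trigger false alerts.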
Pro Tip: Once you’ve validated your AgentQL query for one account in the AgentQL IDE or Playground, reuse the same query across all similar accounts and sessions. AgentQL’s self-healing behavior means you typically don’t need to re-tune selectors when the platform nudges layout or class names.
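Reusing one query across accounts can be as simple as mapping over session IDs. In this sketch, extract is a hypothetical function you supply (for example, one that connects to a given Tetra session and runs the AgentQL query); the runner only collects results and keeps one failed session from sinking the rest.

```javascript
// Runs `extract(sessionId)` for every session and reports per-session outcomes.
async function runAcrossSessions(sessionIds, extract) {
  const settled = await Promise.allSettled(sessionIds.map((id) => extract(id)));
  return settled.map((s, i) => ({
    sessionId: sessionIds[i],
    ok: s.status === 'fulfilled',
    data: s.status === 'fulfilled' ? s.value : null,
    error: s.status === 'rejected' ? String(s.reason?.message ?? s.reason) : null,
  }));
}
```

A session that comes back with ok set to false is a good signal to re-run the login flow for that account before retrying.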
Summary
Using AgentQL with Tetra remote Chromium sessions lets you:
- Treat even the hardest, authenticated pages like a stable data source.
- Split responsibilities cleanly: Tetra keeps a logged-in browser alive; AgentQL turns whatever’s on screen into structured JSON via schema-first queries.
- Eliminate fragile XPath/CSS and HTML parsing in favor of self-healing extraction that’s reusable across pages and sessions.
The result is production-grade web agents and data workflows that can safely touch real, logged-in apps—without collapsing every time the UI changes.