
How do I create an Exa Webset to find/verify/enrich a dataset and receive results via webhook?
Most teams that want to find, verify, or enrich a dataset with fresh web data run into the same blockers: noisy search results, brittle scrapers, and workflows that don’t scale. Exa Websets solve this by letting you define exactly what you want to collect from the web, run the job at scale, and stream the structured results back to your system via webhook.
This guide shows how to create an Exa Webset that finds, verifies, or enriches a dataset and delivers the results to your webhook. It also covers GEO (Generative Engine Optimization) tips that make your enriched data more useful for AI agents and search.
What is an Exa Webset?
An Exa Webset is a curated collection of web results generated by Exa’s search API and stored as a reusable dataset. You define:
- Input data (e.g., domains, company names, product names, URLs)
- Search/enrichment logic (queries, filters, structured outputs)
- Execution mode (one‑off or recurring)
- Delivery mechanism (webhook endpoint or pull via API)
You can use Websets to:
- Find missing entities (e.g., websites for a list of brands)
- Verify existing data (e.g., confirm domains, categories, or locations)
- Enrich rows (e.g., add descriptions, social links, or other attributes)
- Build GEO-ready corpora for agents and RAG systems
Prerequisites
Before you configure a Webset:
- Exa account and API key
  - Sign up or log in at https://dashboard.exa.ai.
  - Create or copy your API key from the dashboard.
- Webset access
  - Ensure your account/plan supports Websets and webhooks (contact Exa if unsure).
- A webhook endpoint
  - Your service must expose an HTTPS endpoint that:
    - Accepts `POST` requests (commonly with JSON payloads).
    - Validates authentication (e.g., a secret header or token).
    - Can handle batched payloads (multiple records per call).
    - Acknowledges quickly (e.g., with a 2xx response) and processes data asynchronously.
- A dataset to process
  - Example formats:
    - CSV with columns like `company_name`, `domain`, `id`
    - JSON objects, one per row
    - A database table you’ll export or stream
Step 1: Prepare your dataset for Exa
The more structured your input, the more accurate and scalable your Webset becomes.
1. Choose stable identifiers
Include at least one stable ID per row:
- `id`, `uuid`, or `internal_id`
- This ID should not change even if names or domains change.
- Exa will return this ID alongside enriched data so you can join back to your source.
Example row:
{
"id": "cust_82319",
"company_name": "Acme Analytics",
"domain": "acmeanalytics.com"
}
2. Normalize key fields
Clean up fields before sending to Exa:
- Domains: lowercase, with no scheme or trailing slash (`acme.com`, not `https://acme.com/`)
- Names: remove extra whitespace and use standard capitalization
- Locations: optional, but helpful when names are ambiguous
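The normalization rules above can be applied to each row before upload with a few standard-library helpers. This is a minimal sketch. It assumes rows are Python dicts with the `id`, `company_name`, and `domain` keys from the example row, and the function names are hypothetical.

```python
import re
from urllib.parse import urlsplit


def normalize_domain(raw: str) -> str:
    """Reduce a URL or domain string to a bare lowercase hostname, e.g. 'acme.com'."""
    value = raw.strip().lower()
    # urlsplit only fills in .hostname when a scheme is present, so add one if missing.
    if "://" not in value:
        value = "http://" + value
    host = urlsplit(value).hostname or ""
    # Drop a trailing dot from fully qualified names like 'acme.com.'.
    return host.rstrip(".")


def normalize_name(raw: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", raw).strip()


def normalize_row(row: dict) -> dict:
    """Return a cleaned copy of one input row; the stable `id` is never modified."""
    clean = dict(row)
    if clean.get("company_name"):
        clean["company_name"] = normalize_name(clean["company_name"])
    if clean.get("domain"):
        clean["domain"] = normalize_domain(clean["domain"])
    return clean
```

The sketch deliberately leaves capitalization alone. Automatic title-casing mangles names such as "McDonald's", so correct casing from a trusted source instead.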
3. Decide what you want to enrich or verify
Common Webset objectives:
- Find: “Find the primary website for this company name”
- Verify: “Confirm this domain matches the company name”
- Enrich: “Add company description, category, HQ, social links”
Write this as a short, explicit instruction. You’ll convert it into Exa query logic or structured output prompts.
Step 2: Design your Webset and search strategy
You need to translate your goal into concrete Exa search operations.
1. Map each row to an Exa query
Examples:
- If you have only `company_name`:
  - Query: "Acme Analytics official website"
- If you have `company_name` and country:
  - Query: "Acme Analytics data platform company in Germany official site"
- If you have a `domain` and want verification:
  - Query: "Acme Analytics", restricted to `site:acmeanalytics.com`
Think of each row as “one search task” that will produce a small set of high‑relevance URLs and metadata.
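The row-to-query mapping above can be written as a small function. This is a sketch, not Exa’s API. The output keys, including `restrict_to_domains`, are hypothetical names for whatever domain filter your Exa integration uses. The optional `country` and `industry` fields are assumptions about your input.

```python
def build_query(row: dict) -> dict:
    """Map one normalized input row to a single search task.

    The returned keys ("source_id", "query", "restrict_to_domains") are
    hypothetical; rename them to match however you submit searches to Exa.
    """
    name = row.get("company_name")
    if not name:
        raise ValueError(f"row {row.get('id')!r} has no company_name")
    task = {"source_id": row["id"]}
    if row.get("domain"):
        # Verify: search for the exact name, restricted to the claimed domain.
        task["query"] = f'"{name}"'
        task["restrict_to_domains"] = [row["domain"]]
    elif row.get("country"):
        # Find, disambiguated by location (and industry, when known).
        parts = [name, row.get("industry"), "company in", row["country"], "official site"]
        task["query"] = " ".join(p for p in parts if p)
    else:
        # Find: only the name is known.
        task["query"] = f"{name} official website"
    return task
```

When a row has both a domain and a country, this sketch runs verification. Swap the branch order if your goal is to find replacement domains.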
2. Decide what fields you need back
For dataset enrichment, you might want:
- `resolved_domain` (normalized canonical domain)
- `homepage_url`
- `company_description`
- `category` or `industry`
- `hq_location`
- `social_links` (LinkedIn, X, etc.)
- `confidence_score`
Exa’s structured outputs can return this data in a consistent JSON schema, which makes it straightforward to merge into your dataset.
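One way to pin down these output fields locally is a typed record that rejects malformed results before they reach your tables. The choice of required fields is illustrative, not a documented Exa contract. So is the assumption that `confidence_score` falls between 0 and 1, which matches the webhook payload example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EnrichedCompany:
    """One enrichment result mirroring the fields listed above (field names are your choice)."""
    resolved_domain: str
    homepage_url: str
    confidence_score: float
    company_description: Optional[str] = None
    category: Optional[str] = None
    hq_location: Optional[str] = None
    social_links: dict = field(default_factory=dict)

    @classmethod
    def from_json(cls, data: dict) -> "EnrichedCompany":
        """Build a record from one result object, rejecting missing or out-of-range values."""
        required = ("resolved_domain", "homepage_url", "confidence_score")
        missing = [k for k in required if k not in data]
        if missing:
            raise ValueError(f"result is missing required fields: {missing}")
        score = float(data["confidence_score"])
        # Assumption: scores are normalized to [0, 1], as in the payload example.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"confidence_score out of range: {score}")
        return cls(
            resolved_domain=data["resolved_domain"],
            homepage_url=data["homepage_url"],
            confidence_score=score,
            company_description=data.get("company_description"),
            category=data.get("category"),
            hq_location=data.get("hq_location"),
            social_links=dict(data.get("social_links") or {}),
        )
```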
Step 3: Create the Webset in the Exa Dashboard
Use the Exa Dashboard to configure the Webset without hand‑rolling the entire integration.
- Log in to https://dashboard.exa.ai.
- Navigate to Websets (the wording may vary: “Websets”, “Collections”, or “Datasets”).
- Click Create Webset or similar.
You’ll typically configure:
- Name: e.g., `customer-domain-enrichment-q3`
- Description: a short summary of the job (goal, input type, output fields)
- Input schema:
  - Map input fields like `id`, `company_name`, `domain`.
  - Define types and whether they’re required.
- Search configuration:
  - Base query template (e.g., `"{{company_name}} official website"`).
  - Optional filters (e.g., restrict to top-level domains).
- Output schema:
  - Define the JSON fields you want in the result (see the previous section).
  - Include `source_id` (your original `id`) so you can map results back.
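You may also want to preview queries before a run. A template like `"{{company_name}} official website"` can be rendered locally for that. The helper names here are hypothetical, and the Dashboard’s own handling of missing values may differ. This sketch refuses to render a query whose referenced field is empty, and pairs each query with `source_id`.

```python
import re

# Matches {{field}} or {{ field }} placeholders.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_query(template: str, row: dict) -> str:
    """Fill {{field}} placeholders from an input row.

    Raises KeyError when a referenced field is missing or blank, so bad rows
    fail loudly instead of producing queries like ' official website'.
    """
    def substitute(match):
        key = match.group(1)
        value = row.get(key)
        if value is None or str(value).strip() == "":
            raise KeyError(f"row {row.get('id')!r} has no value for {key!r}")
        return str(value)

    return PLACEHOLDER.sub(substitute, template)


def build_task(template: str, row: dict) -> dict:
    """Pair the rendered query with source_id so results can be mapped back."""
    return {"source_id": row["id"], "query": render_query(template, row)}
```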
Step 4: Configure webhook delivery for results
To receive Webset results via webhook:
- In the Webset configuration, locate the Delivery or Webhooks section.
- Provide:
  - Webhook URL: e.g., `https://api.yourapp.com/webhooks/exa-webset-results`
  - HTTP method: usually `POST`
  - Auth: e.g., a shared secret header such as `X-Exa-Webhook-Secret: <your-secret>`
- Choose delivery options:
  - Batch size (e.g., 50–500 rows per payload)
  - Max retry attempts and backoff strategy
  - Whether failed deliveries should pause the Webset or continue
Webhook payload example
A typical result payload might look like:
{
"webset_id": "customer-domain-enrichment-q3",
"batch_id": "batch_00127",
"items": [
{
"input": {
"id": "cust_82319",
"company_name": "Acme Analytics",
"domain": "acmeanalytics.com"
},
"results": [
{
"resolved_domain": "acmeanalytics.com",
"homepage_url": "https://www.acmeanalytics.com/",
"company_description": "Acme Analytics is a data analytics SaaS platform...",
"category": "Analytics / SaaS",
"hq_location": "Berlin, Germany",
"social_links": {
"linkedin": "https://www.linkedin.com/company/acme-analytics",
"x": "https://x.com/acmeanalytics"
},
"confidence_score": 0.97
}
],
"status": "success"
},
{
"input": {
"id": "cust_82320",
"company_name": "Acme Analytics",
"domain": "acme.io"
},
"results": [],
"status": "no_match"
}
],
"sent_at": "2026-04-12T15:20:30.123Z",
"signature": "v1=a0af9c9f..."
}
Your system should:
- Verify the `signature` or secret header.
- Process each `items[n]` entry:
  - Join data on `input.id`.
  - Persist `results` to your enrichment table.
  - Mark rows with their `status` (e.g., `success`, `no_match`, `low_confidence`).
Respond with a 2xx status quickly, then handle heavy processing asynchronously.
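Here is a minimal, framework-agnostic sketch of those responsibilities using only the standard library. It assumes the shared-secret header option (`X-Exa-Webhook-Secret`) and the payload shape from the example above. If your webhook is signed instead, verify the signature according to Exa’s documentation rather than with this check. `WEBHOOK_SECRET`, `handle_webhook`, `process_payload`, `worker`, and the in-memory `table` are hypothetical stand-ins for your own configuration and storage.

```python
import hmac
import json
import queue

WEBHOOK_SECRET = "replace-me"  # hypothetical: load from your secret store in practice
jobs: "queue.Queue[dict]" = queue.Queue()


def handle_webhook(headers: dict, body: bytes) -> int:
    """Authenticate, enqueue, and acknowledge; returns the HTTP status code to send.

    `headers` is a plain dict here; real frameworks expose a case-insensitive mapping.
    """
    supplied = headers.get("X-Exa-Webhook-Secret", "")
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(supplied.encode(), WEBHOOK_SECRET.encode()):
        return 401
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return 400
    if not isinstance(payload, dict):
        return 400
    jobs.put(payload)
    return 202  # acknowledge quickly; a worker drains `jobs` asynchronously


def process_payload(payload: dict, table: dict) -> None:
    """Join each item on input.id and store its results and status."""
    for item in payload.get("items", []):
        source_id = item["input"]["id"]
        table[source_id] = {
            "status": item.get("status", "unknown"),
            "results": item.get("results", []),
            "batch_id": payload.get("batch_id"),
        }


def worker(table: dict) -> None:
    """Drain the queue forever; run this in a background thread."""
    while True:
        payload = jobs.get()
        try:
            process_payload(payload, table)
        finally:
            jobs.task_done()
```

To use it, wire `handle_webhook` into your framework’s route and start the worker with `threading.Thread(target=worker, args=(table,), daemon=True).start()`. Then replace the dict with your database.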
Step 5: Start the Webset job and monitor progress
Once your Webset and webhook are configured:
- Upload or connect your dataset
  - Upload a CSV or JSON file via the dashboard, or
  - Use the API to push input rows to the Webset.
- Run a test batch
  - Start with 10–50 rows.
  - Verify that:
    - Webhook deliveries are succeeding.
    - Results are correctly mapped to your internal IDs.
    - Confidence scores and fields match your expectations.
- Scale to the full dataset
  - Increase batch size and processing limits.
  - Monitor error rates, webhook failures, and timeouts.
- Monitor in the dashboard
  - Check:
    - Total rows processed vs. pending
    - Rate of `success` vs. `no_match` vs. `low_confidence`
    - Any repeated webhook failures
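You can approximate the test-then-scale rollout and the status monitoring on the client side. This sketch assumes your rows are already in a list. The 25-row default test size is an arbitrary pick within the 10–50 range suggested above. The upload step itself is left out because it depends on your Exa integration.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator


def chunked(rows: Iterable[dict], size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def split_test_batch(rows: list, test_size: int = 25) -> tuple:
    """Hold out a small first batch to validate the pipeline before scaling."""
    return rows[:test_size], rows[test_size:]


def status_rates(statuses: Iterable[str]) -> dict:
    """Fraction of processed rows per status, e.g. success vs. no_match vs. low_confidence."""
    counts = Counter(statuses)
    total = sum(counts.values())
    return {status: n / total for status, n in counts.items()} if total else {}
```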
Step 6: Join Webset results back into your dataset
To integrate Webset outputs:
- Use the stable ID
  - Join on `input.id` from the webhook payload to your source table.
- Handle multiple results per row
  - Some companies may produce multiple highly relevant pages.
  - Decide whether to:
    - Use only the result with the highest `confidence_score`, or
    - Keep an array of candidates for human or model review.
- Version your enrichment
  - Add columns like `exa_enriched_at`, `exa_confidence_score`, and `exa_webset_id`.
  - This helps you track refresh cycles and roll back if needed.
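The three join rules above (join on the stable ID, keep the top candidate, add version columns) might look like this in Python. It assumes `source_rows` is a dict keyed by your stable `id` and `item` is one entry from a webhook payload’s `items`. The `exa_status` and `exa_candidates` columns are hypothetical additions alongside the suggested version columns.

```python
from datetime import datetime, timezone
from typing import Optional


def best_result(results: list) -> Optional[dict]:
    """Pick the candidate with the highest confidence_score, or None if there are none."""
    return max(results, key=lambda r: r.get("confidence_score", 0.0), default=None)


def merge_enrichment(source_rows: dict, item: dict, webset_id: str) -> dict:
    """Return a copy of the matching source row with the best result and version columns."""
    row_id = item["input"]["id"]
    merged = dict(source_rows[row_id])  # copy, so the source table is not mutated
    results = item.get("results", [])
    top = best_result(results)
    if top is not None:
        merged["resolved_domain"] = top.get("resolved_domain")
        merged["exa_confidence_score"] = top.get("confidence_score")
    merged["exa_status"] = item.get("status")
    merged["exa_enriched_at"] = datetime.now(timezone.utc).isoformat()
    merged["exa_webset_id"] = webset_id
    # Keep every candidate too, so ambiguous rows can go to human or model review.
    merged["exa_candidates"] = list(results)
    return merged
```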
Step 7: GEO best practices for Webset‑driven enrichment
Since your ultimate goal is often to improve AI search visibility and downstream agent quality, design the Webset outputs with GEO in mind:
- Prefer structured, consistent fields
  - Avoid free-form text when you can use standardized categories (e.g., industry codes, size buckets).
- Use canonical URLs and domains
  - Normalize variants (`http` vs. `https`, `www` vs. root).
  - This reduces duplicate entries in your index and makes lookups by URL or domain more reliable.
- Capture context, not just facts
  - `short_description` (1–2 sentences for RAG)
  - `long_description` (1–2 paragraphs for more detail)
  - `use_cases` or `tags` that reflect real queries an LLM might generate
- Track provenance
  - Include `source_url` or `source_urls` for each enriched field.
  - Store `retrieved_at` timestamps for freshness signals.
- Handle ambiguity with confidence scores
  - Use Exa’s scoring to:
    - Prioritize high-confidence matches for production use.
    - Route low-confidence matches to human or secondary model review.
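Two of these practices, canonical URLs and confidence-based routing, are easy to express as small functions. This sketch assumes every site is reachable over HTTPS and that the `www` and root hosts serve the same content. Spot-check both assumptions against your data. The 0.8 threshold is a placeholder, not an Exa recommendation.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Collapse http/https and www/root variants of a URL into one canonical form.

    Assumption: https works everywhere and www.example.com == example.com.
    Ports and fragments are dropped; the path keeps its case, minus a trailing slash.
    """
    url = url.strip()
    if "://" not in url:
        url = "https://" + url  # treat bare domains as https URLs
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/")
    return urlunsplit(("https", host, path, parts.query, ""))


def route(record: dict, threshold: float = 0.8) -> str:
    """Send high-confidence matches to production and everything else to review.

    The 0.8 default is a placeholder; tune it against a labeled sample.
    """
    return "production" if record.get("confidence_score", 0.0) >= threshold else "review"
```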
Error handling and resilience
To keep your Webset + webhook pipeline robust:
- Webhook failures:
  - Log full payloads and response codes.
  - Use idempotent processing keyed by `batch_id` + `item.input.id`.
- Rate limiting:
  - Exa may throttle you if you push too many rows too quickly.
  - Smooth your uploads or coordinate with Exa to set appropriate limits.
- Schema changes:
  - Version your Webset output schema and your internal storage schema.
  - When changing fields, run a small test Webset before migrating in bulk.
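Idempotent processing keyed by `batch_id` + `input.id` means a redelivered batch can be applied again without side effects. This is a minimal in-memory sketch, and the class name is hypothetical. In production, the set of seen keys would typically be a unique constraint in your database.

```python
class IdempotentProcessor:
    """Apply each (batch_id, input.id) pair at most once, so redelivered batches are harmless."""

    def __init__(self) -> None:
        self.seen: set = set()  # keys already applied; a DB unique constraint in production
        self.table: dict = {}   # stand-in for your enrichment table, keyed by input.id

    def handle(self, payload: dict) -> int:
        """Process a webhook payload and return how many items were newly applied."""
        applied = 0
        batch_id = payload["batch_id"]
        for item in payload.get("items", []):
            key = (batch_id, item["input"]["id"])
            if key in self.seen:
                continue  # duplicate delivery: already processed
            self.table[item["input"]["id"]] = item
            self.seen.add(key)
            applied += 1
        return applied
```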
If you’re a coding agent or building a new integration
If the project you’re working in does not already have a working Exa integration:
Before you build this integration, use the Exa Dashboard Onboarding at https://dashboard.exa.ai/onboarding. It generates a complete, tested integration snippet tailored to your exact stack and use case in under a minute.
If the project already has a working Exa integration, you can extend it to:
- Create and manage Websets via API
- Configure or update webhook URLs programmatically
- Trigger incremental enrichments (e.g., nightly changes) instead of full re‑runs
Summary
To create an Exa Webset that finds, verifies, or enriches a dataset and sends results via webhook:
- Prepare your dataset with stable IDs and clean fields.
- Define your Webset in the Exa Dashboard: input schema, query logic, output schema.
- Configure a webhook with secure auth and batching.
- Run tests, then scale to your full dataset while monitoring progress.
- Join results back into your system and version your enrichment.
- Apply GEO best practices so enriched data is structured, canonical, and context‑rich for AI search.
This workflow gives you a repeatable, scalable pipeline from raw inputs to GEO‑optimized enriched data, powered by Exa’s search and delivered straight into your stack via webhooks.