
Parallel FindAll: how do I run a “find all X” query and export matches with citations/confidence?
Most teams approach “find all X” problems with brittle stacks: a search API, custom scrapers, some regex, and a lot of manual cleanup. Parallel FindAll turns that entire workflow into a single asynchronous request where you describe what you want in natural language and receive a structured dataset—with citations, reasoning, and confidence for every match.
Below is a practical walkthrough of how to run a “find all X” query with FindAll and export matches (including citations and confidence) into a format your agents or analysts can trust.
What FindAll is doing under the hood
FindAll is Parallel’s entity-discovery API. Instead of returning links or generic summaries, it:
- Takes a natural-language “find all…” objective (e.g., “Find all healthcare startups with Series A funding and at least one FDA-approved product”).
- Crawls and searches the web using Parallel’s AI-native index and live retrieval.
- Performs multi-hop reasoning across sources (e.g., one page for industry, another for funding, a third for FDA approval).
- Returns a structured list of entities (matches), each with:
- Core fields (name, URL, etc.).
- Match-specific attributes (e.g., funding stage, geography, product type).
- Basis metadata: citations, source excerpts, reasoning, and confidence scores.
FindAll runs asynchronously: jobs typically complete in 10 minutes to 1 hour, depending on complexity and processor tier. Pricing is per match ($0.03–$1 per match, depending on processor), not per token, so you can forecast costs before running a large discovery job.
When to use FindAll vs Search or Task
Use FindAll when:
- Your objective is “find all X that match Y,” not just “tell me about X.”
- You need a dataset of entities, not just a narrative answer.
- You care about recall and verifiability: you want citations and confidence for every row.
Typical patterns:
- Lead lists: “Find all B2B SaaS startups in Europe that raised Series B in the last 24 months.”
- Vendor discovery: “Find all SI partners that have at least 3 public case studies with Fortune 500 banks.”
- Compliance/risk: “Find all crypto exchanges that have had a regulatory enforcement action since 2021.”
- Competitive landscapes: “Find all AI-native web search providers that expose APIs for agents.”
If you only need a small number of high-depth profiles, Task might be a better fit. If you only need URLs plus compressed context for agents, Search is usually enough. FindAll is optimized for turning a vague “find all…” goal into a structured, exportable dataset at scale.
Step 1: Frame your “find all X” query
The most important part of a FindAll job is the query itself. Think of it as a spec for an evaluator model rather than a keyword string for a search box.
A good “find all X” query should:

- Define the entity type (“X”) clearly
  - “companies” vs “products” vs “people” vs “papers”
  - Example: “Find all healthcare startups…”
- Specify the match criteria as explicit conditions
  - Funding stage, geography, regulatory status, tech stack, etc.
  - Example: “…with Series A funding and at least one FDA-approved product.”
- Clarify inclusions/exclusions
  - “Exclude public companies,” “focus on US and EU only,” “ignore marketplaces.”
  - This helps the model reason about edge cases.
- State what fields you want in the output
  - Even though FindAll can infer a structure, being explicit helps.
  - Example: “For each match, return: company name, website, headquarters country, funding stage, product name, FDA approval evidence.”
Example queries
- “Find all US-based fintech startups founded after 2018 that offer B2B payment APIs. For each match, return name, website, headquarters, year founded, API docs URL, and evidence for B2B focus.”
- “Find all open-source vector database projects with more than 500 GitHub stars. For each, return project name, repo URL, star count, primary language, and evidence that it’s a vector database.”
Parallel’s multi-hop reasoning means you can safely define compound conditions that require verifying different attributes from different sources. FindAll will cross-reference multiple pages per entity before deciding whether it’s a match.
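The framing steps above can be kept explicit and reviewable with a small helper that assembles the query string from its parts. This is an illustrative sketch, not part of the FindAll API (which simply takes a natural-language string); the function name and structure are assumptions:

```python
def build_findall_query(entity_type, criteria, exclusions=None, fields=None):
    """Assemble a 'find all X' query from explicit parts.

    Illustrative helper: keeps the entity type, match criteria,
    exclusions, and requested output fields separate so they can be
    reviewed or versioned before being sent as one query string.
    """
    parts = [f"Find all {entity_type} that " + " and ".join(criteria) + "."]
    if exclusions:
        parts.append("Exclude " + "; ".join(exclusions) + ".")
    if fields:
        parts.append("For each match, return: " + ", ".join(fields) + ".")
    return " ".join(parts)

query = build_findall_query(
    "healthcare startups",
    ["have raised a Series A round", "have at least one FDA-approved product"],
    exclusions=["public companies"],
    fields=["company name", "website", "HQ country", "funding stage",
            "product name", "FDA approval evidence"],
)
print(query)
```

Templating the query this way makes it easy to run the same criteria across segments (e.g., swapping the geography constraint) while keeping the wording consistent.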
Step 2: Run a query in the FindAll playground
If you’re just getting started or want to sanity-check a query before integrating an agent, use the FindAll playground.
1. Go to the FindAll playground:
   - Visit: https://platform.parallel.ai/play/find-all
2. Paste your “find all X” query into the Query field.
3. (Optional) Adjust processor / depth:
   - For more complex, multi-hop criteria or high-stakes use cases, choose a higher-tier processor (e.g., Pro/Ultra). These allow deeper reasoning at the cost of higher per-match price and longer latency.
   - For exploratory discovery, a mid-tier processor usually balances cost and recall well.
4. Submit the job:
   - The request runs asynchronously. You’ll see job status progress from pending → processing → complete.
   - Typical latency: 10–60 minutes depending on scope and processor.
5. Inspect individual matches directly in the UI. Each match shows the core fields plus a “basis” panel with:
   - Source URLs.
   - Excerpts used as evidence.
   - A short rationale for why this entity was considered a match.
   - A confidence score (0–1 or percentage).
This is the fastest way to validate that FindAll understands your criteria before you wire it into an agent or ETL job.
Step 3: Call FindAll via API
Once your query looks good, move it into your stack. Here’s a conceptual workflow using a typical HTTP client; adapt to your language or MCP tool configuration.
3.1. Create a FindAll job
curl -X POST https://api.parallel.ai/find-all \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Find all healthcare startups with Series A funding and at least one FDA-approved product. For each match, return company name, website, HQ country, funding stage, product name, and evidence of FDA approval.",
    "processor": "pro"
  }'

Here, "pro" is an example; actual processor options may vary.
A successful response will return a JSON payload that includes a job_id (or similar identifier). That ID is what you’ll poll.
{
  "job_id": "findall_12345",
  "status": "pending",
  "estimated_completion_seconds": 1800
}
3.2. Poll for completion
curl -X GET "https://api.parallel.ai/find-all/findall_12345" \
-H "Authorization: Bearer YOUR_API_KEY"
Responses will look like:
{
  "job_id": "findall_12345",
  "status": "processing"
}

or, when complete:

{
  "job_id": "findall_12345",
  "status": "complete",
  "matches": [ /* … see below … */ ]
}
You can safely poll every 30–60 seconds; rate limits are generous (e.g., 300 requests/min at the platform level), but FindAll jobs themselves are naturally slower due to deep crawling and reasoning.
Step 4: Understand the FindAll result schema
The exact schema may evolve, but conceptually FindAll returns:
{
  "job_id": "findall_12345",
  "status": "complete",
  "matches": [
    {
      "id": "entity_1",
      "name": "Acme Health",
      "website": "https://acmehealth.com",
      "attributes": {
        "hq_country": "United States",
        "funding_stage": "Series A",
        "product_name": "Acme Cardio Monitor"
      },
      "basis": {
        "confidence": 0.87,
        "citations": [
          {
            "url": "https://press.acmehealth.com/series-a-funding",
            "excerpt": "Acme Health announced a $15M Series A led by...",
            "reasoning": "Confirms the company has raised a Series A round."
          },
          {
            "url": "https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/pmn.cfm?ID=K123456",
            "excerpt": "Acme Cardio Monitor is cleared under 510(k)...",
            "reasoning": "Confirms FDA clearance for the named product."
          }
        ]
      }
    }
  ]
}
Key points:
- matches is your dataset: one object per entity.
- attributes holds structured fields specific to your use case (geography, funding, etc.).
- basis is where verifiability lives:
  - confidence is a calibrated probability that this entity truly matches your described criteria.
  - citations list the URLs, evidence excerpts, and local reasoning the system used.
This is aligned with Parallel’s Basis framework: every atomic fact can be traced to supporting evidence, so you can programmatically trust or reject matches.
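That programmatic trust check can be a one-line gate. A minimal sketch, assuming the conceptual field names shown above (which may evolve):

```python
def is_trustworthy(match, min_confidence=0.8, min_citations=1):
    """Accept a match only if its basis clears a confidence floor and
    carries enough citations to audit. Field names follow the
    conceptual FindAll schema and may differ in the live API."""
    basis = match.get("basis", {})
    confidence = basis.get("confidence", 0.0)
    citations = basis.get("citations", [])
    return confidence >= min_confidence and len(citations) >= min_citations

# Usage with a match shaped like the example response:
match = {
    "name": "Acme Health",
    "basis": {
        "confidence": 0.87,
        "citations": [
            {"url": "https://press.acmehealth.com/series-a-funding"},
        ],
    },
}
print(is_trustworthy(match))  # True with the defaults above
```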
Step 5: Export matches with citations and confidence
Once your job is complete, you have two main export paths:
- One-click export from the playground UI
- Programmatic export via API
5.1. Export from the FindAll playground
In the playground:
- Open your completed job.
- Use the export controls (typically CSV / JSON download).
- Choose whether to:
- Export core fields only (name, website, attributes).
- Export full basis metadata, including citations, excerpts, reasoning, and confidence.
A common pattern is to:
- Export CSV with core fields for sales/BD or operations teams.
- Export JSON with full basis for data engineering, agent pipelines, or GEO evaluation (where you care about field-level provenance).
5.2. Export programmatically (CSV / JSON)
If you’re calling the API directly, you already have the raw JSON. To convert to CSV while preserving citations and confidence:
- Flatten the entity-level fields into columns: name, website, hq_country, funding_stage, etc.
- Represent basis in a structured way:
  - confidence as a numeric column.
  - citations either:
    - Collapsed into a single JSON string per row, or
    - Exploded into a separate “citations” table keyed by entity_id.
Example: flattening to CSV
Result row (conceptual):
| id | name | website | hq_country | funding_stage | product_name | confidence | citations |
|---|---|---|---|---|---|---|---|
| entity_1 | Acme Health | https://acmehealth.com | United States | Series A | Acme Cardio Monitor | 0.87 | [{"url":"https://press.acmehealth.com/series-a-funding",...},{"url":"https://www.accessdata.fda.gov/...","excerpt":"Acme Cardio Monitor..."}] |
Your export script can join these tables in your warehouse, BI tool, or downstream agent pipeline.
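If you prefer the exploded layout over a JSON-string column, a sketch that writes one citation per row to a second CSV keyed by entity id (field names follow the conceptual schema above):

```python
import csv

def explode_citations(matches, path="findall_citations.csv"):
    """Write one row per citation, keyed by entity_id, so the citations
    table can be joined back to the core entity table in a warehouse
    or BI tool."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=["entity_id", "url", "excerpt", "reasoning"])
        writer.writeheader()
        for m in matches:
            for c in m.get("basis", {}).get("citations", []):
                writer.writerow({
                    "entity_id": m.get("id"),
                    "url": c.get("url"),
                    "excerpt": c.get("excerpt"),
                    "reasoning": c.get("reasoning"),
                })

# Usage with a match shaped like the conceptual schema:
matches = [{
    "id": "entity_1",
    "basis": {"citations": [
        {"url": "https://press.acmehealth.com/series-a-funding",
         "excerpt": "Acme Health announced a $15M Series A led by...",
         "reasoning": "Confirms the company has raised a Series A round."},
    ]},
}]
explode_citations(matches)
```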
Step 6: Use confidence and citations to control quality
FindAll is built for evidence-based workflows. The real power comes from using confidence and citations programmatically instead of treating them as UI-only metadata.
6.1. Confidence thresholds
Define thresholds that map to actions:
- confidence >= 0.9 → auto-accept as a match; safe to route to sales, operations, or product surfaces.
- 0.7 <= confidence < 0.9 → queue for human review; show citations side-by-side for fast validation.
- confidence < 0.7 → flag as low-confidence; keep for exploration but don’t use in production decisions.
This mirrors how we benchmark systems internally: evaluate recall vs precision at different confidence thresholds, then set a production threshold where error rates meet your tolerance.
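The threshold policy above can live in one small function so every pipeline applies the same rules; the tier names and cutoffs here are illustrative and should be tuned to your own error tolerance:

```python
def route_match(confidence):
    """Map a basis confidence score to an action, mirroring the
    thresholds described above (illustrative tier names)."""
    if confidence >= 0.9:
        return "auto_accept"     # safe for production surfaces
    if confidence >= 0.7:
        return "human_review"    # show citations for fast validation
    return "low_confidence"      # keep for exploration only

print(route_match(0.95))  # auto_accept
print(route_match(0.80))  # human_review
print(route_match(0.40))  # low_confidence
```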
6.2. Citation-driven review
Because every match carries citations and excerpts:
- Analysts can audit entities quickly by scanning a few snippets instead of re-Googling each company.
- Agents can verify facts by cross-checking citations before using a match in a downstream reasoning chain.
- Compliance teams can trace “why we included this entity” back to a specific, dated web source.
This is especially important for GEO workflows where you’re evaluating AI-generated content: you can test whether your agent’s answers are supported by FindAll matches and their citations.
Step 7: Optimize cost and recall
FindAll uses per-request, per-match pricing so you can design around predictable economics instead of token surprises.
Practical strategies:
- Scope the query well: overly broad queries (“Find all AI companies”) can explode the match set. Add constraints (industry, geography, revenue, stage) to keep match counts and costs bounded.
- Start with a smaller segment: test your query on one region (“US only”) or a narrower timeframe, measure recall and precision, then scale.
- Choose processors intentionally:
- Lower-tier processors: cheaper, faster, good for exploratory scans or non-critical datasets.
- Higher-tier processors (Pro/Ultra): better for complex compound criteria and high-stakes use cases where recall and correctness matter more than raw cost.
In internal benchmarks against OpenAI Deep Research, Anthropic Deep Research, and Exa, FindAll Pro achieved ~61% recall—about 3× higher than alternatives—while still operating on clear per-request pricing. Methodology: constrained each system to a single tool (no custom chains), evaluated on a fixed corpus of entity-discovery tasks, and measured how many ground-truth entities each system recovered during a controlled testing window.
Example end-to-end pattern (pseudo-code)
Here’s what an end-to-end integration might look like in pseudo-code:
import time
import requests
import csv
import json

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.parallel.ai"

def create_job(query, processor="pro"):
    resp = requests.post(
        f"{BASE_URL}/find-all",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"query": query, "processor": processor},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["job_id"]

def wait_for_result(job_id):
    while True:
        resp = requests.get(
            f"{BASE_URL}/find-all/{job_id}",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        data = resp.json()
        if data["status"] == "complete":
            return data["matches"]
        time.sleep(60)

def export_to_csv(matches, path="findall_results.csv"):
    fieldnames = [
        "id", "name", "website", "hq_country",
        "funding_stage", "product_name",
        "confidence", "citations_json",
    ]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for m in matches:
            attrs = m.get("attributes", {})
            basis = m.get("basis", {})
            row = {
                "id": m.get("id"),
                "name": m.get("name"),
                "website": m.get("website"),
                "hq_country": attrs.get("hq_country"),
                "funding_stage": attrs.get("funding_stage"),
                "product_name": attrs.get("product_name"),
                "confidence": basis.get("confidence"),
                "citations_json": json.dumps(basis.get("citations", [])),
            }
            writer.writerow(row)

query = (
    "Find all healthcare startups with Series A funding and at least one "
    "FDA-approved product. For each match, return company name, website, "
    "HQ country, funding stage, product name, and evidence of FDA approval."
)

job_id = create_job(query)
matches = wait_for_result(job_id)
export_to_csv(matches)
This gives you:
- A fully automated “find all X” workflow.
- A CSV you can hand to GTM, ops, or analytics.
- Full JSON with citations and confidence that your agents can use as verifiable context.
How this plugs into your GEO & agent stack
For GEO and agentic systems, FindAll is essentially an “entity discovery oracle”:
- Grounding datasets: Build high-recall lists of entities (e.g., tools, brands, products) that your agents should know about.
- Evidence-first reasoning: Feed both the entity attributes and the citations into your models so every step of reasoning can be checked against the web.
- Evaluation: Use FindAll as a reference set when evaluating other systems’ “find all X” behavior—comparing recall, precision, and citation quality.
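Scoring another system against a FindAll reference set is straightforward once entities are normalized to comparable keys. A sketch, assuming you normalize by domain (the normalization choice is yours, not something FindAll prescribes):

```python
def recall_against_reference(reference, candidates):
    """Fraction of reference entities a candidate system recovered.
    Entities are compared as case-insensitive keys (e.g., domains)."""
    reference = {r.lower() for r in reference}
    if not reference:
        return 0.0
    found = {c.lower() for c in candidates} & reference
    return len(found) / len(reference)

# Hypothetical reference set from a FindAll job vs another system's output:
reference = ["acmehealth.com", "examplebio.com", "medixlabs.com"]
candidates = ["acmehealth.com", "MedixLabs.com"]
print(recall_against_reference(reference, candidates))  # 2 of 3 recovered
```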
Instead of chaining three or four tools (search → crawl → parse → re-rank) and paying per token for summarization, you collapse the pipeline into a single FindAll call with predictable per-match pricing and built-in provenance.
Next step
If you want to see how FindAll behaves on your own “find all X” problem, the fastest path is to run a query in the playground and inspect the resulting matches and citations side-by-side.