
Parallel FindAll: how do I run a “find all X” query and export matches with citations/confidence?
Most teams approach web discovery as a one-off scraping project: write a crawler, guess CSS selectors, hope anti-bot measures don’t fire, then patch everything when sites change. FindAll is the opposite. You describe the dataset you want in natural language (“find all X that match Y”), and Parallel returns a structured table of entities—with citations, excerpts, reasoning, and confidence for each match.
This guide walks through how to:
- Write a good “find all X” query
- Run it with Parallel FindAll (UI or API)
- Inspect citations, reasoning, and confidence
- Export matches to a file or pipe them into your own stack
All examples assume you care about evidence and reproducibility—every row should be explainable and auditable.
What FindAll actually does
FindAll turns a natural-language objective into a structured dataset:
- Input: one query, such as “Find all healthcare startups with Series A funding and at least one FDA-approved product.”
- Output: a list of entities (companies) in which each row includes:
  - The entity name and key attributes (e.g., industry, funding round, product)
  - Citations: URLs used to verify each attribute
  - Source excerpts: token-dense snippets tied to those URLs
  - Reasoning: why the entity was considered a match
  - Confidence: a calibrated score you can program against
Under the hood, FindAll uses Parallel’s AI-native web index, live crawling, and “multi-hop” reasoning: it can synthesize facts from multiple pages (industry from one source, funding from another, FDA approval from a third) to make a single match decision.
FindAll runs asynchronously, with latency of roughly 10 minutes to 1 hour, and pricing is per match ($0.03–$1). It is built for dataset creation, not single-turn search.
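You can turn the per-match price range into a rough budget once you guess how many matches a query will return. The sketch below assumes every returned match is billed somewhere within $0.03–$1. The expected match count is your own estimate; FindAll does not report it up front. `estimate_cost_range` is a hypothetical helper.

```python
# Per-match price range quoted in this guide (USD). Which rate applies to a
# given run is not specified here, so we only compute a low/high bracket.
PRICE_PER_MATCH_LOW = 0.03
PRICE_PER_MATCH_HIGH = 1.00


def estimate_cost_range(expected_matches: int) -> tuple[float, float]:
    """Return a (low, high) USD bracket for a run returning `expected_matches` rows."""
    if expected_matches < 0:
        raise ValueError("expected_matches must be non-negative")
    low = round(expected_matches * PRICE_PER_MATCH_LOW, 2)
    high = round(expected_matches * PRICE_PER_MATCH_HIGH, 2)
    return low, high


if __name__ == "__main__":
    low, high = estimate_cost_range(200)
    print(f"200 matches: ${low:.2f} to ${high:.2f}")  # 200 matches: $6.00 to $200.00
```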
Step 1: Design a precise “find all X” query
The most important work happens before you hit “run.” A good FindAll query encodes your schema and match criteria in plain language.
Define “X” and the matching rules
Think in terms of:
- Entity type: What are you actually enumerating?
  - Examples: “B2B SaaS vendors offering SOC2-compliant CRM,” “US universities with online data science master’s programs,” “companies that announced layoffs in Q1 2024.”
- Inclusion criteria: What must be true for a positive match?
  - Example for vendors: “Product must be a CRM, must serve B2B, must explicitly mention SOC2 compliance.”
- Exclusion criteria (optional but powerful):
  - “Exclude open-source projects without a commercial product.”
  - “Exclude agencies and consulting firms.”
- Fields you want back: What columns should the dataset contain?
  - Name, website, HQ country, target customer, pricing model, key feature, source URLs, confidence.
Example query structure
Here’s a pattern that works well with FindAll:
“Find all [ENTITY TYPE] that meet the following criteria:
- Must: [MUST-HAVE CONDITIONS]
- Exclude: [EXPLICITLY EXCLUDED CASES]
For each match, return a structured record with: [FIELDS].
Use multiple web sources if needed to verify each condition. Include citations, source excerpts, your reasoning, and a confidence score for each match.”
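If you generate queries for many segments, rendering this template from data keeps the must/exclude/fields structure consistent. The following is a minimal sketch. `build_findall_query` is a hypothetical helper of ours and is not part of any Parallel SDK. It only produces the query text you would paste into the playground or send to the API.

```python
def build_findall_query(entity_type: str, must: list[str],
                        exclude: list[str], fields: list[str]) -> str:
    """Render the "find all X" pattern into one natural-language query string."""
    lines = [f"Find all {entity_type} that meet the following criteria:", "Must:"]
    lines += [f"- {condition}" for condition in must]
    if exclude:  # exclusion rules are optional, so omit the section when empty
        lines.append("Exclude:")
        lines += [f"- {condition}" for condition in exclude]
    lines.append("For each match, return a structured record with: "
                 + ", ".join(fields) + ".")
    lines.append("Use multiple web sources if needed to verify each condition. "
                 "Include citations, source excerpts, your reasoning, "
                 "and a confidence score for each match.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_findall_query(
        "healthcare startups",
        must=["Has raised a Series A funding round",
              "Has at least one product or device that is FDA-approved"],
        exclude=["Public companies", "Non-US headquarters"],
        fields=["company_name", "website", "latest_funding_round"],
    ))
```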
Concrete example:
“Find all healthcare startups with Series A funding and at least one FDA-approved product.
Must:
- Company operates primarily in healthcare or digital health
- Has raised a Series A funding round
- Has at least one product or device that is FDA-approved
Exclude:
- Public companies
- Non-US headquarters
For each company, return:
- company_name
- website
- headquarters_country
- industry
- latest_funding_round
- primary_FDA_approved_product
- citations and excerpts used for each field
- reasoning for why this company matches
- confidence score (0–1) for the overall match.”
This gives FindAll a clear schema and lets you programmatically filter later (e.g., only accept matches with confidence ≥ 0.8).
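Here is a minimal sketch of that post-run check. It assumes each exported match is a JSON object keyed by the field names you requested, plus a top-level `confidence` number. That export shape is an assumption, so inspect a real result before relying on it. `missing_fields` and `accept` are hypothetical helpers.

```python
# Fields requested in the example query above.
REQUESTED_FIELDS = [
    "company_name", "website", "headquarters_country", "industry",
    "latest_funding_round", "primary_FDA_approved_product",
]


def missing_fields(match: dict, requested: list[str]) -> list[str]:
    """List requested fields that are absent or empty in one match record."""
    return [f for f in requested if match.get(f) in (None, "")]


def accept(match: dict, requested: list[str], min_confidence: float = 0.8) -> bool:
    """Keep a match only if every requested field is filled and confidence clears the bar."""
    confidence = float(match.get("confidence") or 0.0)
    return not missing_fields(match, requested) and confidence >= min_confidence


if __name__ == "__main__":
    row = {field: "example" for field in REQUESTED_FIELDS}
    row["confidence"] = 0.85
    print(accept(row, REQUESTED_FIELDS))  # True
```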
Step 2: Run the FindAll query in the playground
If you’re starting from scratch or debugging a new query, use the FindAll playground:
- Go to the FindAll playground
  Open: https://platform.parallel.ai/play/find-all
- Paste your “find all X” query
  Use the structured pattern above, including:
  - Entity type
  - Must / exclude conditions
  - Fields you want returned
  - Explicit request for citations, reasoning, and confidence
- Submit the query
  FindAll runs asynchronously. Because it’s doing multi-hop reasoning and live crawling, expect:
  - Latency: ~10 minutes to 1 hour, depending on complexity and requested depth
  - Pricing: per match, so you can estimate cost from the number of matches you expect
- Monitor status
  The playground lets you see when the dataset is ready and how many matches were found.
Step 3: Inspect citations, reasoning, and confidence
Once the run completes, you’ll see a table of matches. For each row:
- Click into a match to view:
  - Citations (URLs): all the pages used to verify this entity
  - Source excerpts: compressed, query-relevant snippets pulled from those pages
  - Reasoning: a narrative or structured explanation mapping your criteria to evidence
  - Confidence score: how certain FindAll is that this row satisfies your criteria
This is Parallel’s Basis framework in action: every atomic fact is traceable back to the web, not just the final answer.
How to use these fields effectively
- Citations
  - Sanity-check regulatory-sensitive fields (e.g., “FDA-approved”)
  - Identify authoritative vs. secondary sources (e.g., FDA database vs. news coverage)
- Excerpts
  - Quickly verify claims without opening every page
  - Feed into downstream summarization without another crawl
- Reasoning
  - See how FindAll resolved ambiguous cases
  - Debug your query wording if you notice systematic false positives/negatives
- Confidence
  - Set hard thresholds (e.g., drop matches below 0.7)
  - Route low-confidence rows to a human for review
  - Use confidence as a feature in downstream scoring models
Because every match is evidence-backed, you can treat the dataset as programmable rather than a static CSV you “just trust.”
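Excerpts also allow a cheap automated spot check: does the value FindAll extracted for a field actually appear in the evidence? The sketch assumes each match has a `citations` list of objects with `url` and `excerpt` keys, which is a hypothetical shape. It uses plain case-insensitive substring matching, so it will also flag paraphrased evidence. Treat a flagged field as “look closer”, not as wrong.

```python
def unsupported_fields(match: dict, fields: list[str]) -> list[str]:
    """Return fields whose extracted value does not appear verbatim in any excerpt.

    Case-insensitive substring matching only: paraphrased evidence is flagged
    too, so results mean "needs a look", not "wrong".
    """
    evidence = " ".join(
        (citation.get("excerpt") or "") for citation in match.get("citations") or []
    ).lower()
    flagged = []
    for field in fields:
        value = str(match.get(field) or "").strip().lower()
        if not value or value not in evidence:  # empty values count as unsupported
            flagged.append(field)
    return flagged


if __name__ == "__main__":
    match = {
        "primary_FDA_approved_product": "ExampleMonitor",
        "latest_funding_round": "Series A",
        "citations": [{"url": "https://example.com/press",
                       "excerpt": "ExampleMonitor received FDA approval ..."}],
    }
    print(unsupported_fields(match, ["primary_FDA_approved_product", "latest_funding_round"]))
    # ['latest_funding_round']
```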
Step 4: Export matches with citations and confidence
Once you’re satisfied with the run, you have two main export paths.
Export via the playground UI
From the FindAll playground:
- Download results
  - Choose CSV or JSON (JSON is better if you want nested citations and reasoning without flattening).
- Confirm included fields
  Make sure the export contains:
  - Entity-level fields (name, website, etc.)
  - Per-field or per-entity citations (URLs)
  - Excerpts / snippets
  - Reasoning
  - Confidence
- Load into your system
  - CSV → spreadsheet / BI tool
  - JSON → data warehouse, enrichment pipeline, or agent memory store
Because outputs are structured, you can retain the full Basis payload (evidence + rationale + confidence) alongside your core columns.
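If you download JSON but need a CSV for a spreadsheet, you can flatten it yourself without losing the evidence. This sketch writes one row per (match, citation) pair so every excerpt stays next to its URL. The keys `citations`, `url`, `excerpt`, `reasoning`, and `confidence` are assumed, not confirmed export keys.

```python
import csv
import io


def flatten_matches(matches: list[dict], core_fields: list[str]) -> list[dict]:
    """One output row per (match, citation) so each excerpt stays beside its URL."""
    rows = []
    for match in matches:
        base = {field: match.get(field, "") for field in core_fields}
        base["confidence"] = match.get("confidence", "")
        base["reasoning"] = match.get("reasoning", "")
        # Keep matches with no citations as a single row with empty evidence columns.
        for citation in match.get("citations") or [{}]:
            rows.append({**base,
                         "citation_url": citation.get("url", ""),
                         "citation_excerpt": citation.get("excerpt", "")})
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize flattened rows to CSV text (header taken from the first row)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In practice you would `json.load` the downloaded file into `matches` first, then write `rows_to_csv(flatten_matches(matches, [...]))` to disk.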
Export via API
For production workflows, call FindAll from your backend or agent:
- Create a FindAll job
  - POST the query (the same natural-language specification you tested in the playground).
  - You get back a job ID.
- Poll the job
  - Poll the status endpoint until the job is complete (10–60 minutes depending on complexity).
- Fetch results
  - GET the final matches as JSON.
  - Each match includes:
    - Core fields you requested (e.g., company_name, industry, latest_funding_round)
    - A citations array with URLs and excerpts
    - A reasoning field
    - A confidence score (often per entity, and in some configurations per field)
- Persist the dataset
  - Store matches and their Basis metadata (citations, reasoning, confidence) in your DB.
  - Keep job IDs so you can re-run or incrementally update the dataset later.
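The poll step is easy to get subtly wrong, for example by having no timeout or by hitting the status endpoint too often. This guide does not spell out the endpoint paths or status values, so the sketch below takes a `get_status` callable that you would implement against the real FindAll API. The status strings `completed`, `failed`, and `cancelled` are placeholders. Polling starts at 30 s and doubles up to 5 minutes. The 2-hour ceiling comfortably covers the 10–60 minute latency.

```python
import time
from typing import Callable


def wait_for_job(
    job_id: str,
    get_status: Callable[[str], str],
    *,
    poll_interval: float = 30.0,
    max_interval: float = 300.0,
    timeout: float = 2 * 60 * 60,
    done_states: frozenset = frozenset({"completed"}),
    failed_states: frozenset = frozenset({"failed", "cancelled"}),
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `get_status(job_id)` with exponential backoff until a terminal state.

    `get_status` is a hypothetical callable you implement against the real
    status endpoint; the state names above are placeholders.
    """
    deadline = time.monotonic() + timeout
    interval = poll_interval
    while True:
        status = get_status(job_id)
        if status in done_states:
            return status
        if status in failed_states:
            raise RuntimeError(f"FindAll job {job_id} ended with status {status!r}")
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"FindAll job {job_id} still {status!r} at timeout")
        sleep(interval)
        interval = min(interval * 2, max_interval)  # back off between polls
```

In production you leave `sleep` at `time.sleep`. Injecting it lets you test the loop without actually waiting.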
This approach collapses what used to be a multi-step pipeline—search → scrape → parse → re-rank → hand-label—into a single API call with predictable per-request economics.
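For the “Persist the dataset” step, one simple pattern is to keep a few query-friendly columns and store the full match as a JSON blob keyed by job ID, with its Basis metadata included. Below is a sketch using Python’s built-in `sqlite3`. It assumes each match has `company_name` and `confidence` keys, which are hypothetical field names taken from the example schema. The table name is illustrative.

```python
import json
import sqlite3


def persist_matches(conn: sqlite3.Connection, job_id: str, matches: list[dict],
                    name_field: str = "company_name") -> None:
    """Upsert matches, keeping the full record (citations, reasoning, confidence) as JSON."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS findall_matches (
               job_id      TEXT NOT NULL,
               entity_name TEXT NOT NULL,
               confidence  REAL,
               payload     TEXT NOT NULL,
               PRIMARY KEY (job_id, entity_name)
           )"""
    )
    # INSERT OR REPLACE makes re-running the same job idempotent per entity.
    conn.executemany(
        "INSERT OR REPLACE INTO findall_matches VALUES (?, ?, ?, ?)",
        [(job_id, m[name_field], m.get("confidence"), json.dumps(m)) for m in matches],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    persist_matches(conn, "job-123", [{"company_name": "Example Health", "confidence": 0.92,
                                       "citations": [], "reasoning": "..."}])
    print(conn.execute("SELECT COUNT(*) FROM findall_matches").fetchone()[0])  # 1
```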
Step 5: Filter, review, and iterate
One of the main advantages of FindAll over DIY scraping is programmable verifiability. Instead of manually eyeballing a spreadsheet, you can enforce quality via citations/confidence.
Programmatic filters
Examples of filters you might apply after export:
- confidence >= 0.8 for production use
- At least one citation from a primary source domain (e.g., .gov, official company site)
- Exclude matches where reasoning contains phrases like “cannot confirm” or “appears to”
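The three filters above can be combined into one predicate. In this sketch, “primary source” means a `.gov` host or the match’s own `website` domain (or a subdomain of it). That definition is our assumption, so extend the suffix tuple for your domain. The match keys (`confidence`, `website`, `reasoning`, `citations[].url`) are also assumed, not confirmed.

```python
from urllib.parse import urlparse

HEDGE_PHRASES = ("cannot confirm", "appears to")
PRIMARY_SUFFIXES = (".gov",)  # assumption: extend for your domain


def _host(url: str) -> str:
    return (urlparse(url).hostname or "").lower()


def has_primary_citation(match: dict) -> bool:
    """True if any citation is on a .gov host or on the match's own website domain."""
    site = _host(match.get("website") or "").removeprefix("www.")
    for citation in match.get("citations") or []:
        host = _host(citation.get("url") or "")
        if host.endswith(PRIMARY_SUFFIXES):
            return True
        if site and (host == site or host.endswith("." + site)):
            return True
    return False


def passes_filters(match: dict, min_confidence: float = 0.8) -> bool:
    """Apply the confidence, primary-source, and hedged-reasoning filters together."""
    reasoning = (match.get("reasoning") or "").lower()
    return (
        float(match.get("confidence") or 0.0) >= min_confidence
        and has_primary_citation(match)
        and not any(phrase in reasoning for phrase in HEDGE_PHRASES)
    )
```

A production subset is then just `[m for m in matches if passes_filters(m)]`.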
Because FindAll’s benchmarks show ~61% recall on the Pro tier (≈3× higher than alternatives like OpenAI Deep Research, Anthropic Deep Research, and Exa), you can combine high recall with stricter post-filters and still end up with a robust, evidence-backed dataset.
(About this benchmark: FindAll Pro was evaluated on entity-discovery tasks against alternative web research tools, measuring recall as the proportion of true matches surfaced. Tests were run under constrained tool use—only the relevant web research capability enabled.)
Human-in-the-loop review
For regulated or high-stakes use cases, pair machine filters with human review:
- Auto-accept matches with confidence ≥ 0.9 and strong citations
- Queue confidence 0.6–0.9 for human review inside a simple UI that surfaces:
  - Entity summary
  - Top citations
  - Excerpts and reasoning
- Reject or flag matches below 0.6 unless manually overridden
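The three review tiers map onto a small routing function. What counts as “strong citations” is left to you, so it is passed in as a boolean. The tiers above don’t say what happens to rows at or above 0.9 that lack strong citations; this sketch sends them to review, which is our assumption.

```python
from enum import Enum


class Route(Enum):
    ACCEPT = "auto-accept"
    REVIEW = "human review"
    REJECT = "reject or flag"


def route_match(confidence: float, strong_citations: bool) -> Route:
    """Map one match to a review tier using the thresholds described above."""
    if confidence >= 0.9 and strong_citations:
        return Route.ACCEPT
    if confidence >= 0.6:
        # Covers 0.6 to 0.9, plus >= 0.9 rows lacking strong citations (our assumption).
        return Route.REVIEW
    return Route.REJECT  # a reviewer can still override manually
```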
The Basis metadata makes human review fast and auditable—reviewers see exactly why FindAll decided something was a match.
Iterate on the query
If you see:
- False positives: tighten “must” criteria, add more “exclude” rules, or specify better source preferences.
- False negatives: broaden criteria wording and explicitly allow alternate phrasings (e.g., “Series A or seed-extension equivalent”).
Re-running the query is cheap relative to building a new scraping pipeline, and because pricing is per match, you keep cost predictable as the dataset grows.
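To decide which way to adjust, measure both error types on a sample you have checked by hand. This sketch assumes `truth` is a complete, hand-verified list of entities for a narrow slice (for example, one state), and that `predicted` is FindAll’s output restricted to that same slice. Both must be normalized to the same name form. The 0.8 target and the helper names are illustrative.

```python
def precision_recall(predicted: set[str], truth: set[str]) -> tuple[float, float]:
    """Precision and recall of FindAll's matches against a hand-checked sample."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def query_adjustments(precision: float, recall: float, target: float = 0.8) -> list[str]:
    """Translate the two error rates into the query edits suggested above."""
    hints = []
    if precision < target:
        hints.append("false positives: tighten 'must' criteria or add 'exclude' rules")
    if recall < target:
        hints.append("false negatives: broaden wording or allow alternate phrasings")
    return hints


if __name__ == "__main__":
    p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
    print(round(p, 2), round(r, 2))  # 0.5 0.67
    print(query_adjustments(p, r))
```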
Putting it together: A repeatable “find all X” workflow
To recap the end-to-end pattern:
- Frame your dataset as a natural-language “find all X” query with:
  - Entity type
  - Must + exclude conditions
  - Desired output fields
  - Explicit request for citations, reasoning, and confidence
- Prototype in the playground
  - Run the query (10–60 minutes latency).
  - Inspect matches, citations, reasoning, and confidence.
- Export results
  - Download CSV/JSON from the UI, or
  - Use the API to fetch JSON and persist it directly.
- Apply filters and review
  - Use confidence and citation quality to programmatically filter.
  - Add human review where needed.
- Iterate and operationalize
  - Refine the query for precision/recall.
  - Bake the FindAll run into your enrichment, prospecting, or monitoring workflows.
Parallel FindAll is designed for exactly this: turning complex, compound “find all X” objectives into structured, evidence-backed datasets without maintaining your own crawling and parsing infrastructure.