Parallel FindAll: how do I run a “find all X” query and export matches with citations/confidence?

Most teams approach web discovery as a one-off scraping project: write a crawler, guess CSS selectors, hope anti-bot measures don’t fire, then patch everything when sites change. FindAll is the opposite. You describe the dataset you want in natural language (“find all X that match Y”), and Parallel returns a structured table of entities—with citations, excerpts, reasoning, and confidence for each match.

This guide walks through how to:

  • Write a good “find all X” query
  • Run it with Parallel FindAll (UI or API)
  • Inspect citations, reasoning, and confidence
  • Export matches to a file or pipe them into your own stack

All examples assume you care about evidence and reproducibility—every row should be explainable and auditable.


What FindAll actually does

FindAll turns a natural-language objective into a structured dataset:

  • Input: One query like

    “Find all healthcare startups with Series A funding and at least one FDA-approved product.”

  • Output: A list of entities (companies) where each row includes:

    • The entity name and key attributes (e.g., industry, funding round, product)
    • Citations: URLs used to verify each attribute
    • Source excerpts: token-dense snippets tied to those URLs
    • Reasoning: why the entity was considered a match
    • Confidence: a calibrated score you can program against

Under the hood, FindAll uses Parallel’s AI-native web index, live crawling, and “multi-hop” reasoning: it can synthesize facts from multiple pages (industry from one source, funding from another, FDA approval from a third) to make a single match decision.

Runs are asynchronous and typically take ≈10 minutes–1 hour, with pricing per match ($0.03–$1 per match). This is built for dataset creation, not single-turn search.


Step 1: Design a precise “find all X” query

The most important work happens before you hit “run.” A good FindAll query encodes your schema and match criteria in plain language.

Define “X” and the matching rules

Think in terms of:

  • Entity type: What are you actually enumerating?

    • Examples: “B2B SaaS vendors offering SOC2-compliant CRM,” “US universities with online data science master’s programs,” “companies that announced layoffs in Q1 2024.”
  • Inclusion criteria: What must be true for a positive match?

    • Example for vendors: “Product must be a CRM, must serve B2B, must explicitly mention SOC2 compliance.”
  • Exclusion criteria (optional but powerful):

    • “Exclude open-source projects without a commercial product.”
    • “Exclude agencies and consulting firms.”
  • Fields you want back: What columns should the dataset contain?

    • Name, website, HQ country, target customer, pricing model, key feature, source URLs, confidence.

Example query structure

Here’s a pattern that works well with FindAll:

“Find all [ENTITY TYPE] that meet the following criteria:

  • Must: [MUST-HAVE CONDITIONS]
  • Exclude: [EXPLICITLY EXCLUDED CASES]

For each match, return a structured record with: [FIELDS].

Use multiple web sources if needed to verify each condition. Include citations, source excerpts, your reasoning, and a confidence score for each match.”
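If you generate queries from code (for example, one query per region or product category), a small template function keeps the must/exclude/fields structure consistent across runs. The sketch below only fills in the pattern above as plain text. `build_findall_query` is a hypothetical helper, not part of any Parallel SDK, and the exact wording it emits is just one reasonable rendering of the template.

```python
from typing import Sequence


def build_findall_query(
    entity_type: str,
    must: Sequence[str],
    exclude: Sequence[str],
    fields: Sequence[str],
) -> str:
    """Fill in the 'find all X' template: entity type, must/exclude rules, output fields."""
    lines = [f"Find all {entity_type} that meet the following criteria:", "", "Must:"]
    lines += [f"  - {cond}" for cond in must]
    if exclude:  # exclusion rules are optional, so omit the section when empty
        lines += ["", "Exclude:"]
        lines += [f"  - {cond}" for cond in exclude]
    lines += ["", "For each match, return a structured record with:"]
    lines += [f"  - {field}" for field in fields]
    lines += [
        "",
        "Use multiple web sources if needed to verify each condition. "
        "Include citations, source excerpts, your reasoning, and a confidence "
        "score for each match.",
    ]
    return "\n".join(lines)
```

Calling it with, say, the SOC2 CRM criteria from the list above yields plain text you can paste into the playground or send as the natural-language query in an API request.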

Concrete example:

“Find all healthcare startups with Series A funding and at least one FDA-approved product.

Must:

  • Company operates primarily in healthcare or digital health
  • Has raised a Series A funding round
  • Has at least one product or device that is FDA-approved

Exclude:

  • Public companies
  • Non-US headquarters

For each company, return:

  • company_name
  • website
  • headquarters_country
  • industry
  • latest_funding_round
  • primary_FDA_approved_product
  • citations and excerpts used for each field
  • reasoning for why this company matches
  • confidence score (0–1) for the overall match.”

This gives FindAll a clear schema and lets you programmatically filter later (e.g., only accept matches with confidence ≥ 0.8).


Step 2: Run the FindAll query in the playground

If you’re starting from scratch or debugging a new query, use the FindAll playground:

  1. Go to the FindAll playground
    Open: https://platform.parallel.ai/play/find-all

  2. Paste your “find all X” query
    Use the structured pattern above, including:

    • Entity type
    • Must / exclude conditions
    • Fields you want returned
    • Explicit request for citations, reasoning, and confidence
  3. Submit the query
    FindAll runs asynchronously. Because it’s doing multi-hop reasoning and live crawling, expect:

    • Latency: ~10 minutes–1 hour depending on complexity and requested depth
    • Pricing: per match, so total cost scales with the number of matches returned; if you cap the match count, you also cap the cost
  4. Monitor status
    The playground lets you see when the dataset is ready and how many matches were found.
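Because cost is per match, a back-of-envelope estimate is simply expected matches × per-match price. The sketch below uses the $0.03–$1 range quoted earlier as its bounds; which end of that range applies to a given run (tier, depth) is an assumption you should check against current pricing, and `estimate_cost_range` is a hypothetical helper.

```python
def estimate_cost_range(
    expected_matches: int,
    min_price: float = 0.03,  # low end of the quoted per-match range, in USD
    max_price: float = 1.00,  # high end of the quoted per-match range, in USD
) -> tuple[float, float]:
    """Return (low, high) estimated cost in USD for a run with per-match pricing."""
    if expected_matches < 0:
        raise ValueError("expected_matches must be non-negative")
    return (round(expected_matches * min_price, 2), round(expected_matches * max_price, 2))
```

For example, a run expected to return about 200 matches would land between $6 and $200 under these bounds.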


Step 3: Inspect citations, reasoning, and confidence

Once the run completes, you’ll see a table of matches. For each row:

  • Click into a match to view:
    • Citations (URLs): all the pages used to verify this entity
    • Source excerpts: compressed, query-relevant snippets pulled from those pages
    • Reasoning: a narrative or structured explanation mapping your criteria to evidence
    • Confidence score: how certain FindAll is that this row satisfies your criteria

This is Parallel’s Basis framework in action: every atomic fact, not just the final answer, is traceable back to the web.

How to use these fields effectively

  • Citations

    • Sanity-check regulatory-sensitive fields (e.g., “FDA-approved”)
    • Identify authoritative vs. secondary sources (e.g., FDA database vs. news coverage)
  • Excerpts

    • Quickly verify claims without opening every page
    • Feed into downstream summarization without another crawl
  • Reasoning

    • See how FindAll resolved ambiguous cases
    • Debug your query wording if you notice systematic false positives/negatives
  • Confidence

    • Set hard thresholds (e.g., drop matches below 0.7)
    • Route low-confidence rows to a human for review
    • Use confidence as a feature in downstream scoring models
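To check that every field you asked for is actually backed by evidence, you can scan each exported match for fields that have no citation URL. This sketch assumes a hypothetical record shape in which each citation carries a `field` key naming the attribute it supports; Parallel's real export may attach citations differently (for example, per entity rather than per field), so adapt the lookup to what your JSON actually contains.

```python
from typing import Any, Iterable


def uncited_fields(match: dict[str, Any], required_fields: Iterable[str]) -> list[str]:
    """Return requested fields that have no supporting citation URL in this match.

    Assumes (hypothetically) citations shaped like
    {"field": "website", "url": "https://...", "excerpt": "..."}.
    """
    cited = {
        c.get("field")
        for c in match.get("citations", [])
        if c.get("url")  # ignore citation entries with an empty or missing URL
    }
    return [f for f in required_fields if f not in cited]
```

Rows that come back with a non-empty list are good candidates for the manual sanity checks described above, especially for regulatory-sensitive fields.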

Because every match is evidence-backed, you can treat the dataset as programmable rather than a static CSV you “just trust.”


Step 4: Export matches with citations and confidence

Once you’re satisfied with the run, you have two main export paths.

Export via the playground UI

From the FindAll playground:

  1. Download results

    • Choose CSV or JSON (JSON is better if you want nested citations and reasoning without flattening).
  2. Confirm included fields
    Make sure the export contains:

    • Entity-level fields (name, website, etc.)
    • Per-field or per-entity citations (URLs)
    • Excerpts / snippets
    • Reasoning
    • Confidence
  3. Load into your system

    • CSV → spreadsheet / BI tool
    • JSON → data warehouse, enrichment pipeline, or agent memory store

Because outputs are structured, you can retain the full Basis payload (evidence + rationale + confidence) alongside your core columns.
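If you need a CSV for a spreadsheet or BI tool but don't want to lose the evidence, one option is to keep core fields as ordinary columns and serialize the nested citations array into a single JSON-string column. The record keys used here (`confidence`, `reasoning`, `citations`) are assumptions about the export shape, and `matches_to_csv` is a hypothetical helper built only on Python's standard `csv` and `json` modules.

```python
import csv
import io
import json
from typing import Any


def matches_to_csv(matches: list[dict[str, Any]], core_fields: list[str]) -> str:
    """Write one CSV row per match; nested citations are kept as a JSON string column."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=core_fields + ["confidence", "reasoning", "citations_json"])
    writer.writeheader()
    for m in matches:
        row = {f: m.get(f, "") for f in core_fields}
        row["confidence"] = m.get("confidence", "")
        row["reasoning"] = m.get("reasoning", "")
        # Serialize the full citations array so no URL or excerpt is lost in flattening.
        row["citations_json"] = json.dumps(m.get("citations", []), ensure_ascii=False)
        writer.writerow(row)
    return buf.getvalue()
```

The `csv` module handles quoting, so commas and quotes inside reasoning or excerpts survive the round trip; `json.loads` on the `citations_json` column recovers the original array.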

Export via API

For production workflows, call FindAll from your backend or agent:

  1. Create a FindAll job

    • POST the query (same natural-language specification you tested in the playground).
    • You get back a job ID.
  2. Poll the job

    • Poll the status endpoint until the job is complete (10–60 minutes depending on complexity).
  3. Fetch results

    • GET the final matches as JSON.
    • Each match includes:
      • Core fields you requested (e.g., company_name, industry, funding_round)
      • A citations array with URLs and excerpts
      • A reasoning field
      • A confidence score (often per entity, and in some configurations per field)
  4. Persist the dataset

    • Store matches and their Basis metadata (citations, reasoning, confidence) in your DB.
    • Keep job IDs so you can re-run or incrementally update the dataset later.
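The create → poll → fetch flow above is a standard asynchronous job pattern. The sketch below covers only the polling step, using capped exponential backoff so a 10–60 minute job doesn't hammer the status endpoint. It deliberately takes `get_status` as a parameter instead of calling a real endpoint: the status strings `"completed"` and `"failed"`, the delays, and the two-hour timeout are all assumptions, so wire `get_status` to whatever your FindAll status call actually returns.

```python
import time
from typing import Callable


def wait_for_job(
    job_id: str,
    get_status: Callable[[str], str],
    initial_delay: float = 30.0,
    max_delay: float = 300.0,
    timeout: float = 2 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status(job_id) with capped exponential backoff until a terminal state."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        status = get_status(job_id)
        if status in ("completed", "failed"):  # hypothetical terminal status strings
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout:.0f} s")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # 30 s, 60 s, 120 s, ... capped at 5 min
```

Injecting `sleep` and `get_status` keeps the loop testable without network access; in production you would pass a function that calls the status endpoint and extracts its status field.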

This approach collapses what used to be a multi-step pipeline (search → scrape → parse → re-rank → hand-label) into a single asynchronous job with predictable per-match economics.
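For the persistence step, one lightweight approach is a table keyed by job ID and entity name, with confidence as a column and the full Basis payload (citations and reasoning) stored as JSON. This sketch uses Python's built-in `sqlite3`; the table layout, the `save_matches` helper, and the record keys (`company_name` from the Step 1 example, plus `confidence`, `citations`, `reasoning`) are illustrative assumptions, not a schema Parallel prescribes.

```python
import json
import sqlite3
from typing import Any


def save_matches(conn: sqlite3.Connection, job_id: str, matches: list[dict[str, Any]]) -> None:
    """Upsert matches for one FindAll job, keeping the evidence payload as JSON."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS findall_matches (
            job_id      TEXT NOT NULL,
            entity_name TEXT NOT NULL,
            confidence  REAL,
            basis_json  TEXT NOT NULL,  -- citations + reasoning, kept verbatim
            PRIMARY KEY (job_id, entity_name)
        )
        """
    )
    rows = [
        (
            job_id,
            m["company_name"],
            m.get("confidence"),
            json.dumps({"citations": m.get("citations", []), "reasoning": m.get("reasoning", "")}),
        )
        for m in matches
    ]
    # INSERT OR REPLACE makes re-saving the same job's results idempotent.
    conn.executemany("INSERT OR REPLACE INTO findall_matches VALUES (?, ?, ?, ?)", rows)
    conn.commit()
```

Keeping `job_id` in the primary key lets you store several runs of an evolving query side by side and compare them later.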


Step 5: Filter, review, and iterate

One of the main advantages of FindAll over DIY scraping is programmable verifiability. Instead of manually eyeballing a spreadsheet, you can enforce quality via citations/confidence.

Programmatic filters

Examples of filters you might apply after export:

  • confidence >= 0.8 for production use
  • At least one citation from a primary source domain (e.g., .gov, official company site)
  • Exclude matches where reasoning contains phrases like “cannot confirm” or “appears to”
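The three filters above can be expressed as one predicate over exported matches. This is a sketch under assumptions: the record keys (`confidence`, `website`, `citations[].url`, `reasoning`) are a hypothetical export shape, "primary source" is approximated as a `.gov` host or the company's own domain, and the hedge-phrase check is a crude substring match you would tune for your data.

```python
from typing import Any
from urllib.parse import urlparse

# Phrases suggesting the reasoning did not fully verify a claim (tune for your data).
HEDGE_PHRASES = ("cannot confirm", "appears to")


def _host(url: str) -> str:
    """Lower-cased hostname without a leading 'www.' ('' if the URL has no host)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def has_primary_citation(match: dict[str, Any]) -> bool:
    """True if any citation is on a .gov host or on the company's own domain.

    Expects `website` to be a full URL such as https://acme.com.
    """
    company = _host(match.get("website", ""))
    for citation in match.get("citations", []):
        host = _host(citation.get("url", ""))
        if host.endswith(".gov"):
            return True
        if company and (host == company or host.endswith("." + company)):
            return True
    return False


def passes_filters(match: dict[str, Any], min_confidence: float = 0.8) -> bool:
    """Apply the confidence, primary-source, and hedge-phrase filters in turn."""
    if float(match.get("confidence") or 0.0) < min_confidence:
        return False
    if not has_primary_citation(match):
        return False
    reasoning = (match.get("reasoning") or "").lower()
    return not any(phrase in reasoning for phrase in HEDGE_PHRASES)
```

Usage is a one-liner over the exported list, e.g. `accepted = [m for m in matches if passes_filters(m)]`.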

Because FindAll’s benchmarks report ~61% recall on the Pro tier (≈3× higher than alternatives like OpenAI Deep Research, Anthropic Deep Research, and Exa), you can combine high recall with stricter post-filters and still end up with a robust, evidence-backed dataset.

(About this benchmark: FindAll Pro was evaluated on entity-discovery tasks against alternative web research tools, measuring recall as the proportion of true matches surfaced. Tests were run under constrained tool use—only the relevant web research capability enabled.)

Human-in-the-loop review

For regulated or high-stakes use cases, pair machine filters with human review:

  • Auto-accept matches with confidence ≥ 0.9 and strong citations
  • Queue confidence 0.6–0.9 for human review inside a simple UI that surfaces:
    • Entity summary
    • Top citations
    • Excerpts and reasoning
  • Reject or flag matches below 0.6 unless manually overridden

The Basis metadata makes human review fast and auditable—reviewers see exactly why FindAll decided something was a match.
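The triage rules above map naturally onto a small routing function. In this sketch the 0.9 and 0.6 thresholds come from the list above, while "strong citations" is approximated, as an assumption, by a minimum count of citations that carry a URL; the record keys are likewise a hypothetical export shape.

```python
from typing import Any, Literal

Decision = Literal["accept", "review", "reject"]


def route_match(
    match: dict[str, Any],
    accept_at: float = 0.9,
    review_at: float = 0.6,
    min_citations: int = 2,
) -> Decision:
    """Triage one match into auto-accept, human review, or reject."""
    confidence = float(match.get("confidence") or 0.0)
    # Crude proxy for "strong citations": count citations that carry a URL.
    cited = sum(1 for c in match.get("citations", []) if c.get("url"))
    if confidence >= accept_at and cited >= min_citations:
        return "accept"
    if confidence >= review_at:
        return "review"  # includes high-confidence rows with thin evidence
    return "reject"  # manual overrides happen outside this function
```

Sending high-confidence but thinly cited rows to review rather than auto-accepting them is a deliberately conservative choice for regulated use cases; loosen `min_citations` if that creates too much queue volume.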

Iterate on the query

If you see:

  • False positives: tighten “must” criteria, add more “exclude” rules, or specify better source preferences.
  • False negatives: broaden criteria wording and explicitly allow alternate phrasings (e.g., “Series A or seed-extension equivalent”).

Re-running the query is cheap relative to building a new scraping pipeline, and because pricing is per match, you keep cost predictable as the dataset grows.
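To tell whether a query revision actually helped, you can score each run against a small reference list you have verified by hand. This sketch assumes `reference` is a complete list for the same slice of the space the query covers (otherwise correct matches missing from the list count as false positives), and it matches entities by normalized name only, which is a simplification; `evaluate_run` is a hypothetical helper.

```python
def _normalize(names: set[str]) -> set[str]:
    """Case- and whitespace-insensitive entity names for comparison."""
    return {n.strip().lower() for n in names}


def evaluate_run(found: set[str], reference: set[str]) -> dict[str, object]:
    """Compare a run's entity names with a hand-verified reference list."""
    f, r = _normalize(found), _normalize(reference)
    true_positives = f & r
    return {
        "precision": len(true_positives) / len(f) if f else 0.0,
        "recall": len(true_positives) / len(r) if r else 0.0,
        "false_positives": sorted(f - r),  # candidates for tighter must/exclude rules
        "false_negatives": sorted(r - f),  # candidates for broader wording
    }
```

Running it on the old and new query versions gives a concrete before/after comparison instead of an impression from eyeballing the table.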


Putting it together: A repeatable “find all X” workflow

To recap the end-to-end pattern:

  1. Frame your dataset as a natural-language “find all X” query with:

    • Entity type
    • Must + exclude conditions
    • Desired output fields
    • Explicit request for citations, reasoning, and confidence
  2. Prototype in the playground

    • Run the query (10–60 minutes latency).
    • Inspect matches, citations, reasoning, and confidence.
  3. Export results

    • Download CSV/JSON from the UI, or
    • Use the API to fetch JSON and persist it directly.
  4. Apply filters and review

    • Use confidence and citation quality to programmatically filter.
    • Add human review where needed.
  5. Iterate and operationalize

    • Refine the query for precision/recall.
    • Bake the FindAll run into your enrichment, prospecting, or monitoring workflows.

Parallel FindAll is designed for exactly this: turning complex, compound “find all X” objectives into structured, evidence-backed datasets without maintaining your own crawling and parsing infrastructure.

