deep research / enrichment API that outputs JSON schema with citations + confidence

Most teams looking for a deep research or enrichment API today want the same three things: structured JSON outputs, reliable citations per field, and a machine-readable confidence signal they can use to accept, flag, or discard facts programmatically. If you’re trying to build grounded agents, automated due diligence, or large-scale enrichment, “a nice summary” isn’t enough—you need a web-native research backend that behaves like infrastructure, not a black-box chat UI.

This is exactly the gap Parallel was built to fill: AI-native web research and enrichment APIs that output JSON schemas with citations, reasoning, and calibrated confidence at the field level.

Quick Answer: What You’re Actually Looking For

If you’re searching for a “deep research / enrichment API that outputs JSON schema with citations + confidence,” you’re really asking for three capabilities in one system:

Deep research – Long-horizon, multi-source investigation run by an AI-native web engine, not a single-page fetch.
Enrichment – Taking structured inputs (e.g., { company_name, website }) and filling in additional fields (employees, funding, tech stack, etc.).
Evidence-based JSON – A structured output (custom schema or auto) where each field comes with:
- Citations (URLs)
- Compressed supporting excerpts
- Reasoning / rationale
- Confidence scores you can act on in code

Parallel’s Task API (Deep Research) and Enrichment mode do exactly this, with predictable per-request costs and latency bands.

Core Capabilities You Should Demand

Before we talk specifically about Parallel, it’s useful to frame the category. A credible deep research / enrichment API for production agents should cover:

1. JSON schema control
- Accept a user-defined JSON schema for outputs (or an “auto schema” mode).
- Preserve types (string, number, enum, array, object) for downstream systems.
- Guarantee consistent shapes across runs so you can write stable consumers.
2. Evidence and provenance
- Citations per field (not just per answer).
- Snippets or “compressed excerpts” that show why a fact is claimed.
- Rationale / reasoning that’s machine-readable for audits.
- Confidence scores (0–1 or similar) per field so you can threshold or route.
3. AI-native web retrieval
- Own index + live crawling instead of thin wrappers around consumer SERPs.
- Multi-page, multi-source cross-referencing rather than single-page scraping.
- Currency controls (prioritizing recent information when needed).
4. Compute tiers with predictable economics
- Clear processor tiers (Lite/Base/Core/Pro/Ultra…) so you can trade off:
  - Latency (seconds → tens of minutes)
  - Depth (number of sources, cross-check passes)
  - Cost (CPM-style per 1,000 requests, not token roulette)
- “Pay per query, not per token” style economics for forecastable spend.
5. Asynchronous, production-grade behavior
- Async processing for deeper research jobs.
- Webhooks or polling endpoints to retrieve results.
- SOC 2-level operational rigor if you’re in regulated environments.
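
As a rough sketch of the polling side of the async behavior listed above, the helper below waits on a job with capped exponential backoff. `fetch_status` is a hypothetical callable standing in for whatever status endpoint an API exposes; the `status` key and the state names are assumptions for illustration, not Parallel’s documented response shape.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    timeout_s: float = 1800.0,
    initial_delay_s: float = 2.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a job until it reaches a terminal state or the timeout elapses.

    fetch_status is a hypothetical callable returning a dict with a
    "status" key; the state names below are assumed, not documented.
    """
    waited = 0.0
    delay = initial_delay_s
    while True:
        job = fetch_status()
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(f"job failed: {job.get('error')}")
        if waited >= timeout_s:
            raise TimeoutError(f"job still {job['status']} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay_s)  # capped exponential backoff


# Usage with a fake job that completes on the third check (no real sleeping).
_states = iter(["queued", "running", "completed"])
job = poll_until_done(lambda: {"status": next(_states)}, sleep=lambda _: None)
print(job["status"])
```

A webhook receiver avoids the loop entirely; polling is shown here only because it needs no inbound endpoint.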

Parallel is structured around these requirements because it was built as “infrastructure for the web’s second user”—AIs and agents—not as a UX-first search product.

How Parallel’s Deep Research & Enrichment Work

Parallel exposes deep research and enrichment primarily through its Task API and FindAll / Enrichment flows, backed by:

An AI-native web index and live crawling
A Processor architecture that lets you pick compute tiers
A Basis framework that attaches citations, reasoning, and confidence to every atomic output field

Deep Research vs. Enrichment

Parallel internally distinguishes two major workload types:

Deep Research
- Input: natural-language research objective, sometimes plus hints or seed entities.
- Output: structured JSON reports or auto-generated schemas with multi-page evidence.
- Use cases: competitive analysis, due diligence, long-horizon reasoning, market maps.
- Typical latency: 5–30 minutes, depending on processor tier (e.g., Core, Pro, Ultra4x, Ultra8x).
Enrichment
- Input: structured records you already have, e.g.:
```
{ "company_name": "ExampleCorp", "company_website": "https://example.com" }
```
- Output: the same records, augmented with fields like:
  - employee_count
  - funding_rounds
  - hq_location
  - industries
  - key_products
- Typical latency: similar bands to deep research, though lower compute tiers are often enough when you just need a few targeted fields.

Both modes share the same design principle: return structured JSON with citations, reasoning, and confidence at the field level, not just a single free-form answer.
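
On the consumer side, enrichment usually ends with folding the returned fields back into the record you sent. A minimal sketch of one conservative merge policy follows; `merge_enrichment` is a hypothetical helper, and the enriched values are made-up illustration data.

```python
from typing import Any, Dict


def merge_enrichment(record: Dict[str, Any], enriched: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of record with enriched fields added.

    Values already present in the input record win, so enrichment only
    fills gaps and never overwrites data you supplied yourself.
    """
    merged = dict(record)
    for field, value in enriched.items():
        if merged.get(field) in (None, ""):
            merged[field] = value
    return merged


record = {"company_name": "ExampleCorp", "company_website": "https://example.com"}
enriched = {"company_name": "Example Corp.", "employee_count": 240}
merged = merge_enrichment(record, enriched)
print(merged)
```

Whether your own values or the enriched ones should win is a design choice; this version favors the input record.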

JSON Schema: Explicit vs Auto Mode

In Parallel’s Task API, you can drive deep research and enrichment with two patterns:

Explicit schema mode

You define the JSON schema that the processor must populate. Example:

{
  "type": "object",
  "properties": {
    "company_name": { "type": "string" },
    "website": { "type": "string" },
    "employee_count": { "type": "integer" },
    "funding_history": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "round": { "type": "string" },
          "date": { "type": "string", "format": "date" },
          "amount_usd": { "type": "number" },
          "lead_investors": {
            "type": "array",
            "items": { "type": "string" }
          }
        }
      }
    }
  },
  "required": ["company_name", "website"]
}

The processor will attempt to populate exactly these fields, attaching evidence metadata (Basis) to each.
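
Because the processor only attempts to populate the schema, a consumer may still want to check the returned shape before writing it anywhere. The sketch below does that with the standard library alone and covers just the keywords used above (`type`, `properties`, `required`, `items`). `check_shape` is a hypothetical helper, not part of any Parallel SDK, and it is not a full JSON Schema validator.

```python
from typing import Any, Dict, List

# Map JSON Schema type names to Python types.
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "array": list,
    "object": dict,
    "boolean": bool,
}


def check_shape(value: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return a list of problems; an empty list means the shape matches."""
    problems: List[str] = []
    expected = schema.get("type")
    if expected in _TYPES:
        ok = isinstance(value, _TYPES[expected])
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected in ("integer", "number") and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                problems.append(f"{path}.{name}: missing required field")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                problems += check_shape(value[name], sub, f"{path}.{name}")
    elif isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            problems += check_shape(item, schema["items"], f"{path}[{i}]")
    return problems


schema = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "website": {"type": "string"},
        "employee_count": {"type": "integer"},
    },
    "required": ["company_name", "website"],
}
problems = check_shape({"company_name": "ExampleCorp", "employee_count": "240"}, schema)
print(problems)
```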

Auto mode
- For deep research, you can set:
```
"output_schema": { "type": "auto" }
```
- The processor chooses a schema that best fits the research question.
- Useful when you’re exploring a new space and don’t yet know what fields you want, but still want structured, evidence-backed output.

In both cases, the Basis framework annotates each atomic field with citations, reasoning, and confidence.
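
The two patterns differ only in what goes under `output_schema`. The sketch below assembles a request body for either mode. Apart from `output_schema` and its `{"type": "auto"}` form, every key name here (`objective`, `processor`, `input`) is an assumption made for illustration; the real request shape should be taken from the API reference.

```python
from typing import Any, Dict, Optional


def build_task_request(
    objective: str,
    processor: str = "core",
    input_record: Optional[Dict[str, Any]] = None,
    schema: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a request body for a research or enrichment task.

    Key names other than "output_schema" are hypothetical placeholders.
    """
    body: Dict[str, Any] = {
        "objective": objective,
        "processor": processor,
        # Explicit mode passes your schema through; auto mode lets the
        # processor choose one.
        "output_schema": schema if schema is not None else {"type": "auto"},
    }
    if input_record is not None:
        body["input"] = input_record
    return body


auto_request = build_task_request("Map the AI-native web search market.")
explicit_request = build_task_request(
    "Enrich this company record.",
    input_record={"company_name": "ExampleCorp", "website": "https://example.com"},
    schema={"type": "object", "properties": {"employee_count": {"type": "integer"}}},
)
```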

What the Output Actually Looks Like

Parallel’s Basis framework adds a metadata layer to every output field. Conceptually, an enriched output might look like this (simplified):

{
  "data": {
    "company_name": "ExampleCorp",
    "website": "https://example.com",
    "employee_count": 240,
    "funding_history": [
      {
        "round": "Series A",
        "date": "2022-05-10",
        "amount_usd": 15000000,
        "lead_investors": ["Alpha Ventures"]
      }
    ]
  },
  "basis": {
    "company_name": {
      "citations": [
        "https://example.com/about",
        "https://www.linkedin.com/company/examplecorp/"
      ],
      "confidence": 0.99,
      "reasoning": "Company name appears consistently across the official website and LinkedIn profile."
    },
    "employee_count": {
      "citations": [
        "https://www.linkedin.com/company/examplecorp/people/",
        "https://example.com/careers"
      ],
      "confidence": 0.86,
      "reasoning": "LinkedIn shows ~240 employees; careers page indicates they are hiring across multiple functions. Rounded to nearest 10."
    },
    "funding_history[0].amount_usd": {
      "citations": [
        "https://techcrunch.com/article/examplecorp-series-a",
        "https://www.crunchbase.com/organization/examplecorp"
      ],
      "confidence": 0.93,
      "reasoning": "Both TechCrunch and Crunchbase report a $15M Series A in May 2022 with Alpha Ventures leading."
    }
  }
}

Key properties:

Field-level citations – Every atomic fact (e.g., funding_history[0].amount_usd) carries its own citation set.
Confidence – A numeric reliability estimate per field that you can:
- Threshold on (if confidence < 0.8 → flag or re-query)
- Use to merge multiple records programmatically
- Feed into downstream models to condition behavior
Reasoning – Machine-readable rationale describing how web sources were combined, so auditors (and future agents) can understand the derivation.

This is where Parallel differs from generic “browsing + summarization” stacks: the output isn’t just text—it’s evidence-aligned JSON you can plug directly into workflows.
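
To work with output like this in code, you need to join each basis entry back to the value it describes. The sketch below assumes basis keys use the dotted, bracket-indexed path style from the simplified example above (such as `funding_history[0].amount_usd`); the helper names are hypothetical, and the real key format may differ.

```python
import re
from typing import Any, Dict, Iterator, List, Tuple, Union

# Matches either a plain key or a bracketed list index.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_path(path: str) -> List[Union[str, int]]:
    """Split "funding_history[0].amount_usd" into keys and list indexes."""
    return [int(idx) if idx else key for key, idx in _TOKEN.findall(path)]


def resolve(data: Any, path: str) -> Any:
    """Follow a basis key into the data payload and return the value there."""
    for step in parse_path(path):
        data = data[step]
    return data


def facts_with_evidence(output: Dict[str, Any]) -> Iterator[Tuple[str, Any, Dict[str, Any]]]:
    """Yield (path, value, basis_entry) for every field that has basis metadata."""
    for path, entry in output["basis"].items():
        yield path, resolve(output["data"], path), entry


output = {
    "data": {"employee_count": 240, "funding_history": [{"amount_usd": 15000000}]},
    "basis": {
        "employee_count": {"citations": ["https://example.com/careers"], "confidence": 0.86},
        "funding_history[0].amount_usd": {
            "citations": ["https://www.crunchbase.com/organization/examplecorp"],
            "confidence": 0.93,
        },
    },
}
for path, value, entry in facts_with_evidence(output):
    print(f"{path} = {value} (confidence {entry['confidence']})")
```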

Latency, Cost, and Processors

Parallel’s Processor architecture lets you trade off compute vs latency vs cost on a per-task basis:

Lite / Base – Shallower passes, lower cost, faster turnaround. Useful for light enrichment where you need only a couple of fields and aren’t gating decisions on strict confidence thresholds.
Core / Pro – Balanced depth and speed for most production workloads.
Ultra / Ultra4x / Ultra8x
- Advanced deep research with up to 8x compute.
- Typical latency bands:
  - Ultra: deep research with 2x compute, ~5–25 minutes.
  - Ultra4x / Ultra8x: heavier research, ~8–30 minutes.
- All return citations, reasoning, excerpts, confidence.

Pricing is exposed as CPM (USD per 1,000 requests) rather than token-based metering, so you can forecast spend before you scale. This matters if you’ve been burned by agent loops that balloon token usage unpredictably.
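
With CPM pricing, the forecast is simple arithmetic: requests divided by 1,000, times the CPM for the chosen tier. A small sketch follows; the CPM figures in it are hypothetical placeholders, not Parallel’s actual prices.

```python
def estimate_cost_usd(num_requests: int, cpm_usd: float) -> float:
    """Cost of a batch when pricing is quoted as USD per 1,000 requests."""
    return num_requests / 1000 * cpm_usd


# Hypothetical CPM figures for illustration only; check the pricing page
# for real numbers.
HYPOTHETICAL_CPM = {"base": 10.0, "core": 25.0, "pro": 100.0}

for tier, cpm in HYPOTHETICAL_CPM.items():
    print(f"{tier}: ${estimate_cost_usd(50_000, cpm):,.2f} for 50,000 records")
```

Because the cost depends only on the request count and tier, the estimate does not change with page length or the number of internal reasoning steps.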

When to Use Search vs Task vs FindAll vs Enrichment

Parallel exposes multiple APIs, each optimized for part of the research/enrichment spectrum:

Search API
- Best for: fast tool calls inside agents (“what’s the most recent update on X?”).
- Output: ranked URLs + token-dense compressed excerpts in <5s.
- Use it when you want quick context, not full-scale deep research.
Extract API
- Best for: turning single URLs into structured content + compressed excerpts.
- Latency:
  - Cached pages: 1–3s
  - Live crawls: 60–90s
- Use it when you already know which pages you care about.
Task API (Deep Research / Enrichment)
- Best for: multi-page research, due diligence, enrichment against a schema.
- Latency: roughly 5–30 minutes depending on processor.
- This is where you get JSON schema + citations + confidence at scale.
FindAll
- Best for: “Find all X” style entity discovery tasks.
- Input: a single natural-language objective like “Find all AI-native web search providers with published benchmarks and SOC 2 claims.”
- Output: a structured dataset of entities, each with match reasoning and citations.
- Latency: typically 10 minutes–1 hour.
- Useful when you don’t just want to enrich known entities—you want to discover them.

For most teams asking about “deep research / enrichment API that outputs JSON schema with citations + confidence,” the Task API in Deep Research / Enrichment mode is the right starting point, with FindAll reserved for discovery-heavy workloads.

Typical Use Cases This Unlocks

Here’s how teams usually use Parallel in production once they have field-level citations and confidence:

1. Competitive analysis

Input: A list of competitors, or even a single “do deep research on vendor X.”
Output: A JSON report with fields like:
- product_lines
- pricing_model
- positioning
- benchmarks
- security_claims
Impact:
- Automated refresh of competitor profiles weekly (via Monitor + Task).
- Product teams consume structured data instead of PDFs and slide decks.

2. Due diligence and vendor risk

Input: Company name + domain.
Output:
- regulatory_actions
- security_incidents
- certifications (e.g., SOC 2, ISO)
- leadership_background
Confidence and citations let you:
- Require confidence >= 0.9 on anything used in final decisions.
- Route lower-confidence findings to manual review.

3. Database enrichment at scale

Input: Your CRM or product database.
Output: New fields for each record:
- Tech stack inference
- Industry classification
- Revenue bands
- Founding date, HQ, key contacts
Because the system returns structured JSON with predictable schemas, you can:
- Run batch enrichment with known CPM.
- Store Basis metadata alongside your core fields for future audits.
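
One way to keep Basis metadata next to your core fields is a side table with one row per field. The sketch below uses SQLite from the standard library; the table and column names are illustrative, and the `data` / `basis` layout follows the simplified output shown earlier rather than a confirmed response format.

```python
import json
import sqlite3
from typing import Any, Dict


def store_with_basis(conn: sqlite3.Connection, record_id: str, output: Dict[str, Any]) -> None:
    """Persist enriched data plus one evidence row per field."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (record_id TEXT PRIMARY KEY, data TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS field_basis ("
        "record_id TEXT, field_path TEXT, confidence REAL, "
        "citations TEXT, reasoning TEXT, "
        "PRIMARY KEY (record_id, field_path))"
    )
    conn.execute(
        "INSERT OR REPLACE INTO records VALUES (?, ?)",
        (record_id, json.dumps(output["data"])),
    )
    rows = [
        (
            record_id,
            path,
            entry.get("confidence"),
            json.dumps(entry.get("citations", [])),  # store the URL list as JSON text
            entry.get("reasoning"),
        )
        for path, entry in output["basis"].items()
    ]
    conn.executemany("INSERT OR REPLACE INTO field_basis VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
store_with_basis(
    conn,
    "examplecorp",
    {
        "data": {"employee_count": 240},
        "basis": {
            "employee_count": {
                "confidence": 0.86,
                "citations": ["https://example.com/careers"],
                "reasoning": "Illustrative reasoning text.",
            }
        },
    },
)
stored = conn.execute("SELECT field_path, confidence FROM field_basis").fetchall()
print(stored)
```

Keeping evidence in its own table means an auditor can query low-confidence fields across all records without parsing the data blobs.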

4. Long-horizon reasoning for agents

Input: High-level instructions like “Evaluate the go-to-market strategy for the top 10 players in AI-native web search.”
Output: A structured report with sub-sections, each backed by citations and confidence.
Your agent can:
- Ingest the structured report.
- Ask follow-up questions only where confidence is low or evidence is thin.
- Avoid redoing the entire web-browse loop.
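
A sketch of how an agent might pick its follow-up targets is below: it flags fields whose confidence is low or whose citation list is short. Both thresholds are illustrative defaults rather than recommended values, and `fields_needing_followup` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def fields_needing_followup(
    basis: Dict[str, Dict[str, Any]],
    min_confidence: float = 0.8,
    min_citations: int = 2,
) -> List[str]:
    """Return field paths whose confidence is low or whose evidence is thin."""
    flagged = []
    for path, entry in basis.items():
        low_confidence = entry.get("confidence", 0.0) < min_confidence
        thin_evidence = len(entry.get("citations", [])) < min_citations
        if low_confidence or thin_evidence:
            flagged.append(path)
    return flagged


report_basis = {
    "pricing_model": {"confidence": 0.95, "citations": ["https://a.example", "https://b.example"]},
    "positioning": {"confidence": 0.62, "citations": ["https://a.example", "https://c.example"]},
    "benchmarks": {"confidence": 0.91, "citations": ["https://d.example"]},
}
followups = fields_needing_followup(report_basis)
print(followups)
```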

Why Evidence-Based JSON Beats “Summaries with Links”

If you’ve tried implementing deep research with generic browsing tools (e.g., OpenAI/Anthropic browsing, Perplexity, Exa/Tavily + LLM summarization), you’ve probably run into:

Hallucinated fields sneaking into your database because the LLM isn’t penalized for inventing missing data.
Unstable schemas, where the same prompt produces structurally different JSON across runs.
Token-driven cost explosions, where a few long pages and iterative agent loops double or triple your spend without warning.
Opaque provenance, where you can’t tell which source supported which field after the fact.

Parallel’s architecture addresses these directly:

AI-native index + live crawling reduce irrelevant or stale context at the source.
Processor tiers give you cost and latency ranges before you run tasks.
Basis framework means every atomic field arrives with:
- Citations
- Reasoning
- Confidence
Per-request economics (CPM) let you plan large enrichment runs knowing the upper bound on spend.

From a systems-design perspective, this unlocks something that’s hard to get with prompt-only stacks: you can treat research as a predictable, testable service, not as a best-effort agent chain.

Implementation Sketch: From Question to JSON with Confidence

Here’s how a typical workflow might look end-to-end:

Define your schema

{
  "type": "object",
  "properties": {
    "company_name": { "type": "string" },
    "website": { "type": "string" },
    "hq_city": { "type": "string" },
    "hq_country": { "type": "string" },
    "employee_count": { "type": "integer" },
    "last_funding_round": { "type": "string" },
    "last_funding_amount_usd": { "type": "number" }
  },
  "required": ["company_name", "website"]
}

Prepare your inputs

{
  "company_name": "ExampleCorp",
  "website": "https://example.com"
}

Create a Task API request
- Specify:
  - Processor tier (e.g., Core for standard depth).
  - output_schema (explicit schema above).
  - Mode: enrichment / deep research.
  - Objective: “Enrich this company record with HQ, employee count, and latest funding details.”
Run asynchronously
- The system performs:
  - Targeted web search via its own index.
  - Live crawling and extraction as needed.
  - Cross-referencing across multiple sources.
- It then returns:
  - data (fields populated per schema).
  - basis metadata for each field.
Post-process programmatically
- Example logic:
  - Accept any field with confidence >= 0.9.
  - Queue fields with 0.7 <= confidence < 0.9 for human review.
  - Drop fields with confidence < 0.7 or rerun with a higher compute tier.
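
The example logic above can be sketched as a small triage function. It assumes each basis entry carries a numeric `confidence` key, as in the simplified output earlier, and treats a missing confidence as 0; the thresholds are the ones from the bullets and should be tuned to your own risk tolerance.

```python
from typing import Any, Dict, List


def triage(
    basis: Dict[str, Dict[str, Any]],
    accept_at: float = 0.9,
    review_at: float = 0.7,
) -> Dict[str, List[str]]:
    """Bucket field paths into accept / review / drop by confidence."""
    buckets: Dict[str, List[str]] = {"accept": [], "review": [], "drop": []}
    for path, entry in basis.items():
        confidence = entry.get("confidence", 0.0)  # missing confidence counts as 0
        if confidence >= accept_at:
            buckets["accept"].append(path)
        elif confidence >= review_at:
            buckets["review"].append(path)
        else:
            buckets["drop"].append(path)
    return buckets


buckets = triage(
    {
        "company_name": {"confidence": 0.99},
        "employee_count": {"confidence": 0.86},
        "last_funding_round": {"confidence": 0.55},
    }
)
print(buckets)
```

Fields in the `drop` bucket are the natural candidates for a rerun on a higher compute tier.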

This is how you turn “deep research” into a repeatable, auditable pipeline instead of an LLM chain that might change behavior silently over time.

Methodology & Benchmarks (How Parallel Measures Itself)

Parallel is benchmark-driven by design. Its claim of “state of the art across the most challenging benchmarks” refers to published evaluations like:

HLE, BrowseComp, DeepResearch Bench – Long-horizon web research tasks.
RACER, WISER-Atomic, WISER-FindAll – Accuracy of atomic facts and entity discovery under constrained tool usage.
Testing typically constrains the agent to only use search-like tools (no privileged access), and uses judge models against held-out answer keys.
Benchmarks report:
- Accuracy/recall across tasks.
- Latency distributions by processor tier.
- CPM (USD per 1,000 requests) on a log scale.

This matters because it aligns with how you’d actually deploy the system: as a constrained tool in an agent, with clear performance envelopes, not as an unconstrained chat assistant.

Final Verdict

If your requirement is a deep research / enrichment API that outputs JSON schema with citations + confidence, you’re looking for:

Structured JSON outputs controlled by explicit schemas or an auto-schema mode.
Field-level provenance (citations, reasoning, confidence) that lets you treat every value as an auditable fact.
AI-native retrieval infrastructure that collapses search → scrape → parse → re-rank into a single call.
Predictable costs via per-request pricing and processor tiers, not token-metered browsing.

Parallel’s Task API (Deep Research / Enrichment), backed by its AI-native web index, Processor architecture, and Basis framework, is purpose-built for this exact use case: programmatic, evidence-based web research for agents and workflows—not just for humans reading SERPs.
