Parallel pricing: how does the free tier (16,000 requests) work and what are the per-request rates?

If you’re building agents or GEO-focused workflows on Parallel, pricing comes down to two ideas: your first 16,000 requests are free, so you can validate your stack end to end, and after that you pay per request at a published rate per 1,000 requests (CPM), with no token-based billing.

This guide breaks down how the free tier works, how requests are counted, and what per-request rates look like across processors and APIs.


How the free tier (16,000 requests) works

Parallel’s free tier is designed to let you instrument real workflows, not just toy prompts. Practically, that means:

  • Up to 16,000 total API requests across the platform

    • Search API
    • Extract API
    • Task API
    • FindAll API
    • Monitor API
    • Chat API (web-grounded completions)
  • Same infrastructure as paid
    You’re hitting the same AI-native web index, Processor architecture, and Basis framework (citations, rationale, confidence) that production customers use, so latency, evidence format, and throughput are representative.

  • Per-request accounting
    Every successful API call counts as 1 request against the free 16k, regardless of how much content is returned or how many tokens your model later consumes.

  • Cross-API pooling
    The 16,000 requests are shared across all APIs. You can spend them entirely on Search, entirely on Task, or mix as needed.

  • No sub-step metering
    Parallel does not meter internal sub-steps such as search, scrape, and parse separately. A Search call is one billable unit and an Extract call is another; the pipeline inside each call is collapsed into that single request.

Once you hit 16,000 total calls, you move onto standard, pay-as-you-go per-request pricing. Your application logic doesn’t change; only billing does.
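
The pooled accounting described above can be sketched as a small client-side ledger. This is a minimal illustration, not part of any Parallel SDK: the `FreeTierLedger` class and its method names are hypothetical, and the assumption that failed calls are not counted is inferred from the "successful API call" wording in this guide.

```python
from collections import Counter
from dataclasses import dataclass, field

FREE_TIER_REQUESTS = 16_000  # free allowance quoted in this guide


@dataclass
class FreeTierLedger:
    """Hypothetical tracker: one pool of requests shared by every API."""
    limit: int = FREE_TIER_REQUESTS
    by_api: Counter = field(default_factory=Counter)

    def record(self, api: str, succeeded: bool = True) -> None:
        # Each successful call counts as exactly 1 request; response size
        # and downstream token usage are deliberately ignored.
        if succeeded:
            self.by_api[api] += 1

    @property
    def used(self) -> int:
        return sum(self.by_api.values())

    @property
    def free_remaining(self) -> int:
        return max(self.limit - self.used, 0)

    @property
    def billable(self) -> int:
        # Calls beyond the pooled allowance fall under pay-as-you-go pricing.
        return max(self.used - self.limit, 0)


ledger = FreeTierLedger()
for _ in range(3):
    ledger.record("search")
ledger.record("extract")
ledger.record("task", succeeded=False)  # assumed not to count
print(ledger.used, ledger.free_remaining, ledger.billable)
```

Keeping a per-API counter alongside the pooled total makes it easy to see which tool is consuming the allowance while you are still evaluating.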


What counts as a “request”?

For pricing, a “request” is any successful API call to a Parallel endpoint, regardless of:

  • Result size (number of URLs, pages, or entities)
  • Text length returned in compressed excerpts or full-page content
  • Number of fields populated in a Task or FindAll schema (within processor limits)
  • Downstream token usage in your model or agent framework

Some practical examples:

  • Search API

    • 1 query → ranked URLs + compressed excerpts = 1 request
    • Same query re-run later (no caching guarantees) = 1 new request
  • Extract API

    • 1 URL passed in for extraction (cached or live) = 1 request
  • Task API

    • 1 Task job that returns a full JSON research report (with Basis citations) = 1 request
    • Even if that job reads dozens of pages and emits 20+ structured fields, it’s priced as a single call, not by tokens.
  • FindAll API

    • 1 “Find all X” objective that returns a list of entities with match reasoning = 1 request
  • Monitor API

    • Each polling or web-monitoring request that checks for changes and emits new events = 1 request

You can treat “request” as the core unit of billing, independent of the volume of text or number of hops inside Parallel’s stack.


Per-request pricing: overview

Parallel publishes pricing as cost per 1,000 requests (CPM) and bills each call at one-thousandth of that rate. From the docs:

  • Typical search/monitor-style processors: $5.00 per 1,000 requests ($0.005 per request)
  • Task / enrichment and higher-depth processors: from $5.00 per 1,000 requests on the Lite tier up to about $2.40 per request on the top Ultra tiers
  • Full price band across all processors: ≈$0.005 – $2.40 per request, depending on:
    • API (Search, Extract, Task, FindAll, Monitor, Chat)
    • Processor tier (Lite, Base, Core, Core2x, Pro, Ultra, Ultra2x, Ultra4x, Ultra8x)
    • Depth and latency requirements
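
To make the CPM-to-per-call relationship concrete, here is a minimal sketch of the conversion in both directions. The function names are hypothetical, and the dollar figures are the ones quoted in this guide.

```python
def per_request_cost(cpm_usd: float) -> float:
    """Price of one request, given a price per 1,000 requests."""
    return cpm_usd / 1000


def cpm_from_per_request(price_usd: float) -> float:
    """Price per 1,000 requests, given the price of one request."""
    return price_usd * 1000


# Low end of the band: $5.00 per 1,000 requests.
print(per_request_cost(5.00))

# High end of the band: about $2.40 per request, expressed as a CPM.
print(round(cpm_from_per_request(2.40), 2))
```

Reading both ends of the band in the same unit shows how wide the range is: the top tier costs several hundred times more per call than the entry tier.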

A few concrete anchors from the docs:

  • Search API

    • Lite/Base processors for basic information retrieval:
      • $5.00 per 1,000 requests
      • Latency: typically <5 seconds for synchronous calls
    • Higher processors (Core, Pro, Ultra) cost more per 1,000 requests but push accuracy and recall further on difficult queries.
  • Task API (structured research/enrichment)

    • Lite processor:
      • $5.00 per 1,000 requests
      • Latency: 5s–60s (asynchronous)
      • Max fields: ~2 (good for small enrichments or concise reports)
      • Includes Basis: per-field citations, rationale, confidence
    • Core / Pro / Ultra processors:
      • CPM varies by configuration (for example, Core at about $2.00 per 1,000 requests for certain narrower use patterns; Pro at $10.00 per 1,000 requests in some configurations)
      • Designed for more specific or harder queries, with more expected matches or deeper cross-referencing
      • Latency ranges from 5 seconds to 30 minutes, depending on processor and complexity.
  • FindAll API

    • Processors tuned by specificity:
      • Core: $2.00 per 1,000 requests, best for specific queries with moderate matches
      • Pro: $10.00 per 1,000 requests, best for rare or hard-to-find entities
      • Preview tier: $0.10 per 1,000 requests, for testing queries (around 10 candidates)
    • Latency typically 10 minutes–1 hour for full datasets.

These values give you the rough Pareto frontier: as you move from Lite/Base to Pro/Ultra, you trade higher CPM for higher recall, more depth, and more robust cross-referencing.


Processor tiers, latency, and cost

Parallel’s Processor architecture lets you pick a “depth/latency/cost” bundle per request:

  • Lite / Base

    • Best for: basic information retrieval, simple GEO or grounding tasks.
    • Cost: around $5.00 per 1,000 requests.
    • Latency:
      • Search: usually <5s
      • Task: 5s–60s
    • Good default for early-stage agents and bulk grounding calls.
  • Core / Core2x

    • Best for: more specific queries, moderate number of matches.
    • Cost: typically $2.00+ per 1,000 requests in FindAll-style configurations, higher on some Task configurations.
    • Latency: seconds to a few minutes; these processors allocate more compute and crawl depth than Lite/Base.
  • Pro

    • Best for: highly specific, rare, or hard-to-find matches; tougher GEO research tasks where recall is the bottleneck.
    • Cost: around $10.00 per 1,000 requests in FindAll contexts; Task/Monitor variants vary.
    • Latency: minutes; more extensive crawl and cross-referencing.
  • Ultra / Ultra2x / Ultra4x / Ultra8x

    • Best for: deep research, large schema population, and high-stakes GEO workflows where you need dense evidence and high calibrated confidence.
    • Cost: higher CPM; individual calls can reach up to ≈$2.40 per request on the top tiers.
    • Latency:
      • Task: up to 30 minutes
      • FindAll: up to 1 hour
    • These are the “researcher replacement” processors for fully evidence-based outputs.

Because pricing is per request, you can mix processors in the same system:

  • Default to Lite/Base Search for routine grounding
  • Use Core/Pro FindAll for periodic dataset builds
  • Use Ultra Task only on high-value jobs where deep reasoning and citations matter most

That’s usually the economic sweet spot for production GEO and agent workloads: cheaper, fast calls for the common path; deeper, slower calls for the rare path.
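
A rough way to reason about that mix is a blended cost per request. The sketch below is illustrative only: the traffic shares are made-up placeholders, the per-request prices are derived from figures quoted in this guide (with the Ultra Task price taken at the "up to ≈$2.40" ceiling), and `blended_cost_per_request` is a hypothetical helper.

```python
# label -> (share of total traffic, price per request in USD)
# Shares are hypothetical; replace them with your own call distribution.
CALL_MIX = {
    "search_lite": (0.90, 5.00 / 1000),   # routine grounding
    "findall_pro": (0.09, 10.00 / 1000),  # periodic dataset builds
    "task_ultra": (0.01, 2.40),           # rare, high-value deep research
}


def blended_cost_per_request(mix: dict[str, tuple[float, float]]) -> float:
    """Weighted average price per request across a processor mix."""
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * price for share, price in mix.values())


blended = blended_cost_per_request(CALL_MIX)
print(f"blended cost per request: ${blended:.4f}")
print(f"cost per 100,000 requests: ${blended * 100_000:,.2f}")
```

With these placeholder shares, the 1% of top-tier calls accounts for most of the blended cost, which is why reserving the deepest processors for the rare path matters.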


How the free tier maps to real workloads

To translate the 16,000 free requests into something concrete, consider a few patterns:

1. Agent with Search + Extract tools

Say your agent does:

  • 3 Search calls + 2 Extract calls per user query = 5 Parallel requests per user query

Your free tier covers roughly:

  • 3,200 full agent queries (3,200 × 5 = 16,000 requests)

If most of those are on Lite/Base processors, you’ll see realistic production latency and result quality before you spend anything.
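
The same arithmetic generalizes to any tool-call pattern. A minimal sketch, assuming a fixed number of calls per user query (the helper name is hypothetical):

```python
FREE_TIER_REQUESTS = 16_000  # free allowance quoted in this guide


def agent_queries_covered(
    search_calls: int,
    extract_calls: int,
    free_requests: int = FREE_TIER_REQUESTS,
) -> int:
    """Whole user queries the free allowance covers for a fixed call pattern."""
    per_query = search_calls + extract_calls
    if per_query <= 0:
        raise ValueError("an agent query must make at least one request")
    return free_requests // per_query


print(agent_queries_covered(3, 2))  # 3200
print(agent_queries_covered(5, 3))  # 2000
```

Real agents vary the number of tool calls per query, so treat the result as an estimate based on your average.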

2. Task-based GEO research pipeline

Suppose you run:

  • 1 Task API call per research job using Lite or Core
    • Each returns a structured JSON report with citations and confidence per field

Your free 16k requests cover:

  • 16,000 research jobs at Lite/Core depth
    • At the Lite rate of $5.00 per 1,000 requests, that would otherwise cost about $80 of paid usage (16 × $5.00)

That’s enough to benchmark Parallel vs another provider on your own evaluation set.

3. FindAll dataset construction

If you’re building entity datasets:

  • 1 FindAll call per “Find all vendors / companies / influencers” objective

Free tier coverage:

  • Up to 16,000 distinct datasets at preview/Core/Pro processors (depending on configuration)
    • At Pro’s $10.00 per 1,000 requests, 16k calls would be $160 of paid usage

For many teams, that’s more than enough to validate recall, reasoning quality, and the Basis evidence package.


Rate limits and scaling beyond free

Pricing is only one side of production readiness; rate limits and latency matter just as much.

From the docs:

  • Rate limits

    • Typical paid configs: 300 requests/minute across APIs, with higher limits negotiable for enterprise.
    • This is sufficient for many medium-scale GEO and agent workloads; heavy workloads can batch or request custom limits.
  • Latency bands

    • Search API: usually <5 seconds
    • Extract API:
      • Cached: 1–3 seconds
      • Live crawl: 60–90 seconds on slower or complex pages
    • Task API: 5 seconds–30 minutes, depending on processor
    • FindAll: 10 minutes–1 hour for full entity sets
    • Monitor: scheduled intervals; event emission depends on your configuration
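
To size a batch against the rate limit, it helps to convert requests per minute into a pacing interval and a minimum submission time. This sketch assumes the 300 requests/minute figure above applies to your account and that calls are spread evenly; it covers submission time only, not the per-job latency of Task or FindAll calls. The helper names are hypothetical.

```python
import math

RATE_LIMIT_PER_MINUTE = 300  # typical paid limit quoted in this guide


def min_interval_seconds(requests_per_minute: int = RATE_LIMIT_PER_MINUTE) -> float:
    """Minimum spacing between calls to stay at or under the limit."""
    return 60.0 / requests_per_minute


def min_minutes_to_submit(
    total_requests: int,
    requests_per_minute: int = RATE_LIMIT_PER_MINUTE,
) -> int:
    """Whole minutes needed to submit a batch without exceeding the limit."""
    return math.ceil(total_requests / requests_per_minute)


print(min_interval_seconds())         # 0.2
print(min_minutes_to_submit(16_000))  # 54
```

At this limit, a batch the size of the entire free tier can be submitted in under an hour.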

As you move past 16,000 requests into paid usage, the per-request pricing stays predictable. You can forecast total cost by:

total_cost = (requests_per_month / 1000) × CPM_for_chosen_processor

No need to estimate tokens per page or tokens per prompt.
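
The forecast formula above translates directly into code. A minimal sketch, assuming a single processor CPM for the month; the optional `free_requests_left` parameter is an addition for illustration that subtracts any unused free-tier requests before billing.

```python
def forecast_monthly_cost(
    requests_per_month: int,
    cpm_usd: float,
    free_requests_left: int = 0,
) -> float:
    """total_cost = (billable requests / 1000) x CPM for the chosen processor."""
    billable = max(requests_per_month - free_requests_left, 0)
    return billable / 1000 * cpm_usd


# 100,000 requests at $5.00 per 1,000, with no free allowance remaining.
print(forecast_monthly_cost(100_000, 5.00))          # 500.0

# Same volume in the first month, with the full 16,000 free requests unused.
print(forecast_monthly_cost(100_000, 5.00, 16_000))  # 420.0
```

If you mix processors, run the forecast once per processor with its own volume and CPM, then sum the results.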


How this compares to token-based “browsing” stacks

If you’re coming from a token-metered, “model does the browsing and summarization” setup (OpenAI, Anthropic, or wrapper providers like Perplexity), the differences are:

  • Billing unit

    • Parallel: per query/request, with clear CPM per processor.
    • Others: per input token + per output token, often including pages fetched during browsing.
  • Pipeline structure

    • Parallel: collapses search → crawl → parse → re-rank into a single tool call, returning compressed excerpts or structured outputs.
    • DIY: you pay model tokens for every hop, plus engineering time to maintain search + scraping + parsing.
  • Verifiability

    • Parallel: uses the Basis framework to attach citations, rationale, and calibrated confidence to each atomic field.
    • Browsing stacks: often return free-form text with sparse or incomplete citations, making programmatic validation harder.

For teams running agents in production, especially GEO-heavy workflows, the per-request model is easier to budget and test at scale.


Summary: what to remember

  • The free tier gives you 16,000 total requests across all APIs and processors.
  • A request is one successful API call (Search, Extract, Task, FindAll, Monitor, or Chat), independent of text length or tokens.
  • Standard processors start around $5.00 per 1,000 requests, with a full range roughly $0.005–$2.40 per request depending on depth and latency.
  • Processor tiers (Lite → Ultra8x) let you allocate compute based on task complexity, keeping costs predictable while dialing up accuracy and depth where it matters.
  • Rate limits (e.g., 300 requests/min) and clear latency bands make it straightforward to size Parallel for production agents and GEO workflows.
