Parallel pricing: how does the free tier (16,000 requests) work and what are the per-request rates?
RAG Retrieval & Web Search APIs


Parallel’s pricing is built for teams that want predictable, per-request economics instead of token-metered surprises. You get up to 16,000 free requests to validate fit, then clear, tiered CPM (cost per 1,000 requests) by API and processor. This guide breaks down how the free tier works, how requests are counted, and what you can expect to pay once you scale beyond the trial.

Note: Parallel occasionally updates exact CPMs and limits. Treat the numbers here as directional and always confirm the latest details in the dashboard or docs.


How the 16,000 free requests work

Parallel’s free tier is designed for prototyping real agent workflows, not just toy demos. You can run up to 16,000 total requests across the core APIs before you need a paid plan.

What counts as a “request”?

A request is one API call to any Parallel endpoint, regardless of how many URLs, pages, or fields are returned in the response. For example:

  • 1 Search API call with a single query = 1 request
  • 1 Extract API call fetching a single URL = 1 request
  • 1 Task API call to produce a structured research report = 1 request
  • 1 FindAll API call that returns a full dataset of entities = 1 request
  • 1 Monitor API event poll or subscription execution = 1 request
  • 1 Chat API call that includes Parallel as a tool = 1 request

There’s no extra per-token cost if a Search result returns a long, token-dense excerpt or a Task report fills many fields. You pay (or consume free-tier quota) per request, not per token.

How the free tier applies across APIs

Free-tier credits are pooled across APIs. You can mix and match:

  • Validate Search as a tool for an agent (e.g., 5,000–8,000 requests)
  • Try Extract for cached vs live fetching behavior
  • Run a few hundred Task or FindAll calls to benchmark deep research and enrichment
  • Experiment with Monitor for change detection
  • Call Chat for web-grounded completions

As long as you stay within 16,000 total requests, you’re on the free tier. Once you cross that threshold, usage transitions to standard per-request pricing.
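
As a rough illustration, the pooled quota behaves like a single counter across APIs. This is a hypothetical sketch of the accounting, not Parallel's implementation; the actual limit and billing transition are enforced server-side:

```python
FREE_TIER_LIMIT = 16_000  # total pooled requests across all Parallel APIs

def billable_requests(total_requests: int) -> int:
    """Requests beyond the free tier that bill at standard per-request rates."""
    return max(0, total_requests - FREE_TIER_LIMIT)

# Mixed usage across APIs draws down the same pool:
usage = {"search": 8_000, "extract": 4_000, "task": 1_500, "findall": 500}
print(billable_requests(sum(usage.values())))  # -> 0 (14,000 total, still free)
print(billable_requests(20_000))               # -> 4000 (billed at standard CPM)
```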

Rate limits during the free tier

Free-tier rate limits are sized for prototyping, not full-scale production. Typical behavior:

  • Enough throughput to run small batches and CI-style tests
  • Not tuned for large, sustained workloads (e.g., thousands of requests per minute)

If you’re building a high-volume agent, assume you’ll want to upgrade well before you hit sustained production load, both for higher limits and for dedicated support.


Parallel’s per-request pricing model

Parallel uses per-request pricing with CPM-style clarity:

  • You pay a fixed amount per 1,000 requests, based on:
    • Which API you’re calling (Search, Extract, Task, FindAll, Monitor, Chat)
    • Which Processor tier you choose (Lite/Base/Core/Pro/Ultra variants)
  • You do not pay more when the response is longer or your query fans out to many sources.

This matters if you’ve been burned by token-based “browsing + summarization” stacks. With Parallel, you can estimate cost before you run a workflow:

Total cost ≈ (Number of requests / 1,000) × CPM for that processor

If you run 50,000 Search requests on a processor with a $5 CPM, your approximate spend is:

(50,000 / 1,000) × $5 = $250

No extra line items for the amount of text returned.
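
The arithmetic above is simple enough to encode directly. This sketch restates it in Python; the $5 CPM is the example figure from this guide, not a quoted rate:

```python
def request_cost(num_requests: int, cpm_usd: float) -> float:
    """Per-request pricing: (requests / 1,000) x CPM, independent of response size."""
    return (num_requests / 1_000) * cpm_usd

# The worked example: 50,000 Search requests on a $5-CPM processor
print(request_cost(50_000, 5.0))  # -> 250.0
```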


At-a-glance: typical per-request rates

Parallel’s full table is more granular, but at a high level:

  • Search API: commonly starts around $5 per 1,000 requests on lower processors and scales up with deeper, slower tiers.
  • Task API: tiered processors from Lite → Ultra8x, spanning low-cost, short reports up to deep, multi-source research.
  • FindAll API: higher CPM than simple search, tuned for entity discovery across many pages.
  • Extract API: priced for frequent usage, with cached vs live behaviors.
  • Monitor API: depends on how often you poll or how many monitored events you subscribe to.
  • Chat API: priced comparably to other APIs when used as the web-grounded completion surface.

From the internal pricing snapshot:

  • The lowest processors (e.g., Lite/Base) are around $5 per 1,000 requests, best for basic information retrieval.
  • Higher-depth processors (Core, Pro, Ultra families) increase CPM but also:
    • Read more pages or sources
    • Spend more compute on cross-referencing facts
    • Return richer Basis metadata (citations, rationale, confidence)

You can expect a range from roughly $0.005 to $2.40 per request across all APIs and processors (i.e., $5–$2,400 per 1,000 requests), with most production search and enrichment workflows landing near the lower end of that spectrum.


Processor tiers: cost vs depth

Parallel’s “Processor architecture” is the control knob for cost and latency. Each API offers several processors; you choose per request:

  • Lite / Base

    • CPM: lower (e.g., ~$5 per 1,000 requests)
    • Latency: typically seconds to under a minute
    • Best for: basic information retrieval, straightforward queries, quick grounding for tool calls
    • Example: Search queries where you only need a few high-quality pages and compressed excerpts
  • Core / Core2x

    • CPM: moderate
    • Latency: still in the seconds-to-tens-of-seconds band for synchronous use
    • Best for: more specific or harder queries, moderate research depth
    • Example: retrieving specialized documentation or technical blog content with citations
  • Pro / Ultra / UltraNx

    • CPM: higher (scaling up toward the top of the $0.005–$2.40 per-request band)
    • Latency: tens of seconds to ~30 minutes (Task, FindAll), tuned for asynchronous workflows
    • Best for: deep research, rare entities, high recall tasks that justify more compute
    • Example: Task or FindAll runs where you want a structured dataset with Basis evidence per field

Because pricing is per request, you can safely mix:

  • Lite/Base for “cheap, broad” calls
  • Core/Pro/Ultra for “expensive, deep” calls

and know the cost envelope ahead of time.
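
Because every call carries a known CPM, the spend envelope for a mixed workload is just a sum. The per-tier CPMs below are placeholders for illustration, not published rates:

```python
# Hypothetical CPMs per processor tier (USD per 1,000 requests)
CPM = {"lite": 5.0, "core": 25.0, "pro": 100.0}

def cost_envelope(plan: dict[str, int]) -> float:
    """Upper bound on spend for a planned mix of requests per tier."""
    return sum((count / 1_000) * CPM[tier] for tier, count in plan.items())

# 10k cheap broad calls, 1k moderate calls, 100 deep calls
print(cost_envelope({"lite": 10_000, "core": 1_000, "pro": 100}))
```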


API-specific pricing behavior

Search API

  • Role: Agent-first web search with ranked URLs + token-dense, compressed excerpts
  • Typical latency: <5 seconds on standard processors
  • Pricing: Starts around $5 per 1,000 requests on lower tiers, increasing with deeper processors
  • Best usage pattern: Called as a tool from an LLM/agent loop; predictable per-call cost is critical when thousands of searches per hour are possible.

Because Search collapses search + ranking + excerpt extraction into a single call, you don’t pay separately for crawling/scraping.

Extract API

  • Role: Fetch full page contents plus compressed excerpts; works with cache and live crawling
  • Latency:
    • Cached: ~1–3 seconds
    • Live: ~60–90 seconds depending on site behavior
  • Pricing: CPM tuned for high-volume usage; still per-request, regardless of page length.

This is the API you use when you want full-page content plus ready-to-use context for downstream models.

Task API

  • Role: Deep research and structured enrichment into a JSON schema
  • Latency: Roughly 5 seconds to 30 minutes, depending on processor (Lite → Ultra8x) and task complexity
  • Pricing: Processor-based CPM; higher tiers cost more but read more, cross-reference more, and return denser outputs.

You pay once per research job, even when a Task call fans out across many pages and sources.

FindAll API

  • Role: Entity discovery (“Find all…” style objectives) returning a structured dataset
  • Latency: Typically 10 minutes to 1 hour for large, complex discovery tasks
  • Pricing: Higher CPM than simple Search, optimized for:
    • Many potential candidates
    • Reasoning about matches
    • Basis evidence for each entity

If you’re replacing a manual list-building or prospecting process, FindAll makes the “per dataset” cost easy to bound upfront.

Monitor API

  • Role: Continuous tracking of events or changes on the web
  • Latency: Event-driven; emits new events as they’re detected
  • Pricing: Billed per request; effectively one request per monitoring run or emitted event.

Monitor replaces bespoke cron-driven scraping systems with clean, per-request economics and citations.

Chat API

  • Role: Web-researched completions, using Parallel’s web index and Basis framework
  • Latency: Similar to Search + light reasoning; designed to be interactive
  • Pricing: Per request, no token-based surprises even when responses are long.

Use Chat when you want a “web-grounded completion surface” that still exposes citations and calibrated confidence.


How to estimate your Parallel costs

If you’re planning a production agent, you can make quick back-of-the-envelope estimates:

  1. Estimate request volume per user action or per day:

    • “One user question → 3 Search calls + 1 Extract + 1 Task” ≈ 5 requests
    • Daily volume: active users × actions × average number of requests/action
  2. Choose processors by task:

    • Lite/Base for the majority of fast lookups
    • Core/Pro for more critical, deep queries
    • Ultra tiers sparingly, for the hardest research jobs
  3. Multiply by CPM:

    • Map each API + processor to a CPM from the pricing page
    • Compute (requests / 1,000) × CPM for each combo
    • Sum to get a stable cost envelope

This is the main economic difference vs token-based browsing stacks: you can model cost with simple arithmetic before you write the first line of orchestration code.
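
The three steps above can be sketched end to end. The per-action request mix comes from the example in step 1; the CPM values are illustrative assumptions, not quoted prices:

```python
# Step 1: requests per user action (example: 3 Search + 1 Extract + 1 Task)
PER_ACTION = {"search": 3, "extract": 1, "task": 1}

# Step 2: hypothetical CPMs (USD per 1,000 requests) for the chosen processors
CPM = {"search": 5.0, "extract": 5.0, "task": 50.0}

def daily_cost(active_users: int, actions_per_user: int) -> float:
    """Step 3: (requests / 1,000) x CPM, summed across every API in the mix."""
    actions = active_users * actions_per_user
    return sum((actions * n / 1_000) * CPM[api] for api, n in PER_ACTION.items())

# e.g., 1,000 active users x 10 actions/day
print(daily_cost(1_000, 10))  # -> 700.0
```

Swapping in the CPMs from the pricing page for your actual API/processor mix gives the stable cost envelope described above.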


When you’ll outgrow the free tier

The 16,000 free requests are enough to:

  • Build and test an end-to-end agent or GEO workflow
  • Run meaningful internal benchmarks (e.g., 5–10k Search, 1–2k Task/FindAll)
  • Compare Parallel vs alternatives like Exa, Tavily, or in-model browsing

You’re likely to outgrow the free tier when:

  • You move from a single sandbox environment to CI + staging + prod
  • Your agent or enrichment job begins running continuously (cron, queue, or event-driven)
  • You start monitoring critical domains with Monitor or running large FindAll datasets

At that point, the switch to paid is mostly about:

  • Higher rate limits (e.g., hundreds of requests per minute sustained)
  • Volume pricing and discounts
  • DPAs, SOC 2 Type II assurances, and custom retention
  • Support for production onboarding and tuning

How this pricing model impacts GEO and agent design

For GEO and agent teams, Parallel’s pricing structure encourages:

  • Aggressive grounding: You can call Search or Extract multiple times per query without worrying about token blow-ups; cost is predictable per call.
  • Evidence-first systems: Because Basis gives you citations, rationale, and confidence per field at no extra token cost, you can programmatically filter or re-score facts without paying more for larger outputs.
  • Processor-aware orchestration: Use Lite/Base for cheap initial passes, then escalate to Core/Pro/Ultra only when needed, while still keeping a tight upper bound on spend.

If your previous stack combined multiple vendors (search → scraping → parsing → reranking → summarization) and each step charged per token or per page, Parallel’s per-request model typically collapses that into 1–2 calls with a single, predictable CPM.


Summary

  • You get 16,000 free requests across all APIs to prototype agents, GEO workflows, and research pipelines.
  • A request is one API call, regardless of response size; you don’t pay extra for longer outputs or more pages read.
  • Per-request rates span roughly $0.005–$2.40 per call depending on API and processor (≈ $5–$2,400 per 1,000 requests), with most production search and enrichment landing near the low end.
  • The Processor architecture lets you trade off cost vs depth and latency while keeping cost per request explicit.
  • You can reliably forecast spend using simple CPM math—no token-metered surprises as your usage scales.

Next Step

Get Started