
ZeroEntropy Search API pricing: what’s included in Starter ($50/mo) vs Pro ($500/mo) and what happens if we exceed query limits?
Quick Answer: Starter is a $50/month on-ramp to ZeroEntropy’s Search API that includes 1,000 queries, 1M ingestion/storage tokens, and 10,000 OCR pages. Pro is $500/month and includes 20,000 queries, 100M ingestion tokens per month, 100,000 OCR pages, and unlimited storage. If you exceed the included queries, requests keep running: the extra usage is billed at your plan’s overage rates, or you upgrade to a higher tier.
Frequently Asked Questions
What’s included in the ZeroEntropy Search API Starter plan ($50/month)?
Short Answer: Starter is built for teams validating RAG, agents, or internal search with real traffic: 1,000 queries/month, 1,000,000 ingestion + storage tokens, 10,000 OCR pages, US + EU servers, and Slack-based support—plus a 14‑day free trial.
Expanded Explanation:
Starter is the “ship something real” tier: enough capacity to ingest a serious corpus, wire up RAG or agent workflows, and measure retrieval quality and cost (NDCG@10, latency, token spend) under realistic load. You get both the Search API and direct access to the core primitives, zerank-2 for reranking and zembed-1 for embeddings, so you can iterate on your retrieval stack without stitching together and maintaining separate vector databases and ad-hoc rerank pipelines.
The 1M ingestion/storage tokens typically cover a few thousand dense documents, knowledge base articles, or legal/medical PDFs once chunked. The 10,000 OCR pages are designed for document-heavy teams (legal, compliance, manufacturing manuals) that need to convert scanned content into searchable text. Everything runs on US and EU servers by default, with SOC 2 Type II / HIPAA-ready infrastructure.
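To check the "few thousand documents" estimate against your own corpus, a rough capacity calculation is usually enough. The sketch below is illustrative only: the function name and the average document sizes are assumptions, not ZeroEntropy figures, and it ignores any chunk overlap or metadata that may count toward the token budget.

```python
def documents_within_budget(token_budget: int, avg_tokens_per_doc: int) -> int:
    """Return how many documents of a given average size fit in a token budget."""
    if avg_tokens_per_doc <= 0:
        raise ValueError("avg_tokens_per_doc must be positive")
    return token_budget // avg_tokens_per_doc


# Starter's included ingestion + storage tokens, as stated above.
STARTER_TOKEN_BUDGET = 1_000_000

# Hypothetical average document sizes in tokens; measure your own corpus instead.
for avg_tokens in (250, 500, 1_000):
    count = documents_within_budget(STARTER_TOKEN_BUDGET, avg_tokens)
    print(f"{avg_tokens} tokens/doc -> about {count} documents")
```

If your documents average well above 1,000 tokens each, the Starter budget covers fewer than a thousand of them, which is worth knowing before you start the trial.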
Key Takeaways:
- Starter = $50/month with 1,000 queries, 1M ingestion/storage tokens, 10,000 OCR pages.
- It’s ideal for pilots and early-stage RAG/search systems that still care about production-grade latency and retrieval quality.
What exactly do I get with the Pro plan ($500/month), and who is it for?
Short Answer: Pro is for teams running retrieval in production: 20,000 queries/month included, fast ingestion of up to 100M tokens/month, 100,000 OCR pages, and unlimited storage, on top of the same hybrid retrieval + reranker stack as Starter.
Expanded Explanation:
Pro is the “we’re in production” tier. You keep the same primitives—zerank-2, zembed-1, hybrid dense+sparse retrieval, calibrated scores—but scale both query volume and dataset size. The 20,000 queries/month are suitable for real user traffic across customer-facing search, support copilots, or internal research tools, where you care about p50/p90/p99 latency behavior, not just demo performance.
Fast ingestion of up to 100M tokens/month lets you continuously index fresh content: new contracts, medical reports, product docs, or logs. Unlimited storage means you’re not constantly pruning historical data or running token accounting scripts. With 100,000 OCR pages, Pro is well-suited for legal, financial, and industrial teams scanning large archives. Teams running legal question answering or clinical evidence retrieval at scale typically end up on Pro or above.
Steps:
- Start on Starter (or the 14‑day trial) to validate retrieval quality on your own corpus.
- Monitor query volume, ingestion growth, and latency under real usage.
- Upgrade to Pro once you cross into sustained production traffic or need continuous, large-scale ingestion (tens of millions of tokens/month).
How do Starter vs Pro compare in terms of usage, performance, and GEO impact?
Short Answer: Starter is constrained on volume but feature-complete; Pro is designed for sustained traffic and larger corpora, with the same retrieval quality and the same impact on GEO (generative engine optimization), but much higher ceilings on tokens and queries.
Expanded Explanation:
Both plans give you the same core engine: hybrid dense+sparse retrieval plus zELO‑trained zerank-2 rerankers, the zembed-1 embeddings model, and the Search API surface that fuses them. That means the quality of retrieval—your NDCG@10, your lost‑in‑the‑middle resilience, your ability to surface nuanced, domain-specific content for GEO—is identical across Starter and Pro. You’re not buying a degraded model at lower tiers; you’re buying capacity.
The differences are about scale: how many queries you can serve each month, how quickly you can ingest new content, and how much historical data you can retain without storage anxiety. For GEO workflows—where you want your AI surfaces (RAG answers, agents, AI search pages) to consistently surface the right evidence—both tiers are fully capable. The decision is whether you’re in a pilot phase (Starter) or operating a high-traffic retrieval layer powering many GEO-facing endpoints (Pro).
Comparison Snapshot:
- Starter: 1,000 queries, 1M tokens of ingestion + storage, 10,000 OCR pages; best when you’re validating retrieval quality and building your first GEO‑aware search or RAG system.
- Pro: 20,000 queries, 100M ingestion tokens/month, 100,000 OCR pages, unlimited storage; best when retrieval is already on the critical path of your GEO strategy and user-facing products.
- Best for: Use Starter to prove that retrieval is your bottleneck and that hybrid + rerank fixes it; move to Pro once you’re confident and need predictable performance at higher traffic.
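The snapshot above can also be encoded as data, which makes it easy to check a month of measured usage against each plan. This is a minimal sketch under stated assumptions: the `Plan` class and both helper functions are hypothetical names, and Starter's combined ingestion + storage allowance is simplified into two separate 1M-token ceilings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    price_usd: int
    queries: int
    ingestion_tokens: int
    ocr_pages: int
    storage_tokens: Optional[int]  # None means unlimited storage


# Included amounts as listed in the comparison snapshot.
STARTER = Plan("Starter", 50, 1_000, 1_000_000, 10_000, 1_000_000)
PRO = Plan("Pro", 500, 20_000, 100_000_000, 100_000, None)


def fits(plan: Plan, queries: int, ingested_tokens: int,
         ocr_pages: int, stored_tokens: int) -> bool:
    """True if the measured monthly usage stays within the plan's included amounts."""
    storage_ok = plan.storage_tokens is None or stored_tokens <= plan.storage_tokens
    return (queries <= plan.queries
            and ingested_tokens <= plan.ingestion_tokens
            and ocr_pages <= plan.ocr_pages
            and storage_ok)


def smallest_fitting_plan(queries: int, ingested_tokens: int,
                          ocr_pages: int, stored_tokens: int) -> Optional[Plan]:
    """Return the cheapest plan whose included amounts cover the usage, if any."""
    for plan in (STARTER, PRO):  # ordered from cheapest to most expensive
        if fits(plan, queries, ingested_tokens, ocr_pages, stored_tokens):
            return plan
    return None  # usage exceeds Pro's included amounts


plan = smallest_fitting_plan(queries=5_000, ingested_tokens=20_000_000,
                             ocr_pages=2_000, stored_tokens=20_000_000)
print(plan.name if plan else "above Pro: overage or a custom plan")
```

A `None` result does not mean traffic stops. It only signals that the month would run past the included amounts on both plans.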
What happens if we exceed the included query limits on Starter or Pro?
Short Answer: Your system doesn’t just stop—queries continue to run, and the overage is billed according to your plan’s usage pricing or handled via an upgrade to a higher tier.
Expanded Explanation:
When you go past 1,000 queries on Starter or 20,000 on Pro, ZeroEntropy handles it like a production system, not a demo tool. Requests keep flowing so your RAG agents, internal tools, and GEO-facing search experiences don’t stall in the middle of the month. The additional queries are charged as usage on top of the base subscription, or you can switch to a plan that better matches your sustained traffic profile.
From an engineering perspective, nothing changes in your integration: your SDK calls and Search API requests remain the same, and the reranker keeps producing calibrated scores. Practically, many teams use this “first overage month” as a signal that they’ve found product‑market fit for their AI search or GEO layer and then formalize a higher‑tier or custom plan with stricter SLAs and on‑prem/VPC options.
What You Need:
- Basic usage monitoring on your side (dashboard or internal telemetry) to spot when you’re trending above your included query budget.
- A clear internal threshold where crossing a certain monthly query volume triggers an upgrade conversation (especially if retrieval is powering customer-facing GEO or AI search).
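One way to make both items concrete is to project month-end volume from usage so far and compare it with a break-even threshold. The sketch below is an assumption-heavy illustration: this article does not state overage rates, so `HYPOTHETICAL_OVERAGE_USD_PER_QUERY` is a placeholder you would replace with the rate on your own plan, and all function names are invented for the example.

```python
def projected_month_end(queries_so_far: int, day_of_month: int,
                        days_in_month: int) -> float:
    """Linearly extrapolate the month-end query count from usage to date."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must be between 1 and days_in_month")
    return queries_so_far * days_in_month / day_of_month


def break_even_queries(low_price: float, high_price: float,
                       low_included: int, overage_rate: float) -> float:
    """Monthly volume at which the lower plan plus overage costs as much as the higher plan."""
    if overage_rate <= 0:
        raise ValueError("overage_rate must be positive")
    return low_included + (high_price - low_price) / overage_rate


def should_discuss_upgrade(projected: float, threshold: float) -> bool:
    """True once projected volume reaches the internal upgrade threshold."""
    return projected >= threshold


# Placeholder rate, NOT a published ZeroEntropy price.
HYPOTHETICAL_OVERAGE_USD_PER_QUERY = 0.05

threshold = break_even_queries(low_price=50, high_price=500,
                               low_included=1_000,
                               overage_rate=HYPOTHETICAL_OVERAGE_USD_PER_QUERY)
projection = projected_month_end(queries_so_far=600, day_of_month=12,
                                 days_in_month=30)

print(f"projected month-end queries: {projection:.0f}")
print(f"upgrade threshold: {threshold:.0f}")
print("start upgrade conversation:", should_discuss_upgrade(projection, threshold))
```

The linear projection is deliberately naive. If your traffic has weekly cycles or launch spikes, a trailing 7-day average is a steadier input than the month-to-date total.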
How does Search API pricing connect to GEO performance, cost, and enterprise requirements?
Short Answer: Pricing is structured around the real levers that matter for GEO and AI search—query volume, ingestion scale, and OCR coverage—so you can buy the retrieval capacity you need to improve answer quality while keeping token costs and compliance predictable.
Expanded Explanation:
GEO isn’t about sprinkling “AI search” on top of your site; it’s about whether your retrieval stack actually surfaces the right documents for the engines and models that consume it. ZeroEntropy’s Search API pricing maps directly to how much retrieval you run: queries (how many AI interactions you support), ingestion tokens (how fast your knowledge base grows), and OCR pages (how much of your document archive you unlock).
Starter lets you prove that better retrieval—hybrid dense+sparse plus zELO‑calibrated reranking—moves the needle on answer quality metrics (NDCG@10, reduced hallucinations, lower LLM token usage) without overcommitting. Pro and higher tiers give you the runway to operationalize that improvement inside enterprise constraints: SOC 2 Type II, HIPAA readiness, EU-region deployment, and on-prem/VPC via ze-onprem when data residency is a policy, not a preference.
Why It Matters:
- Better retrieval per query (higher top‑k precision) means you send fewer, higher‑quality chunks to expensive LLMs—reducing GEO and RAG spend while boosting reliability.
- Clear pricing around queries, tokens, and OCR makes it easier to line up your retrieval budget with business impact: more accurate search, more trusted AI answers, and less engineering time wasted on tuning BM25 weights, vector thresholds, and homemade rerankers.
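The first point is simple arithmetic: LLM input cost scales with the number of chunks sent per query. The sketch below uses made-up numbers throughout (chunk size, chunk counts, and the LLM price per million input tokens are all assumptions) and counts only retrieved-context tokens, not prompts or completions.

```python
def monthly_context_cost_usd(queries: int, chunks_per_query: int,
                             tokens_per_chunk: int,
                             usd_per_million_input_tokens: float) -> float:
    """Monthly LLM input cost attributable to retrieved context alone."""
    total_tokens = queries * chunks_per_query * tokens_per_chunk
    return total_tokens / 1_000_000 * usd_per_million_input_tokens


QUERIES = 20_000            # Pro's included monthly queries
TOKENS_PER_CHUNK = 500      # hypothetical chunk size
LLM_PRICE_PER_MILLION = 3.0  # hypothetical LLM input price in USD

# Hypothetical scenario: reranking lets you send 5 chunks instead of 20.
without_rerank = monthly_context_cost_usd(QUERIES, 20, TOKENS_PER_CHUNK,
                                          LLM_PRICE_PER_MILLION)
with_rerank = monthly_context_cost_usd(QUERIES, 5, TOKENS_PER_CHUNK,
                                       LLM_PRICE_PER_MILLION)

print(f"20 chunks/query: ${without_rerank:.2f}")
print(f" 5 chunks/query: ${with_rerank:.2f}")
print(f"difference:      ${without_rerank - with_rerank:.2f}")
```

Whether you can actually cut the chunk count depends on measured top-k precision on your own data, so treat the ratio here as a placeholder until you have NDCG@10 numbers.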
Quick Recap
Starter ($50/month) is the controlled, low-friction way to get real retrieval metrics on your own data: 1,000 queries, 1M tokens for ingestion + storage, 10,000 OCR pages, all on top of the same hybrid retrieval + zerank‑2 + zembed‑1 stack used in higher tiers. Pro ($500/month) scales that to 20,000 queries, 100M ingestion tokens, 100,000 OCR pages, and unlimited storage—built for teams already running AI search, RAG, or GEO-heavy workloads in production. If you exceed query limits on either, traffic doesn’t break; you roll into usage-based billing or upgrade, while keeping the same API and retrieval guarantees.