How do I integrate ZeroEntropy zerank-2 as a reranking step on top of my existing vector DB results?

Most teams already have a vector database in place and just want their top-k results to stop missing the obviously relevant documents. zerank-2 is designed for exactly that: you keep your existing vector DB, add a rerank call on top, and get ranking much closer to human judgment without rebuilding your stack.

Quick Answer: To integrate zerank-2 with your existing vector DB, run your normal query to get a candidate list (e.g., the top 50–300 results), then send the query and candidate texts to ZeroEntropy’s rerank API and reorder your results using the calibrated scores that zerank-2 returns.

Frequently Asked Questions

How does zerank-2 fit on top of my existing vector DB?

Short Answer: zerank-2 sits as a second-stage reranker: your vector DB retrieves candidates, and zerank-2 reorders them by true semantic relevance using calibrated scores.

Expanded Explanation:
You don’t replace your current retrieval; you upgrade it. Your vector DB (Pinecone, Qdrant, Weaviate, pgvector, OpenSearch, etc.) still does fast dense or hybrid retrieval and returns the top N candidates. zerank-2 then scores each (query, document) pair and returns calibrated scores that you use to sort those candidates.

This two-stage setup is how you fix “the right document is sitting at position 67 so the LLM never sees it.” zerank-2 acts as a cross-encoder “judge” that understands nuance, domain-specific language, and context far better than raw embedding similarity. The result is higher NDCG@10, more complete answers from your RAG/agents, and lower LLM token spend because you send fewer, better chunks.

Key Takeaways:

  • Keep your vector DB; add zerank-2 as a lightweight second-stage rerank.
  • Pass the query + candidate texts to zerank-2 and sort by the returned calibrated scores.
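
Because the scores are described as calibrated, one option is to apply a score cutoff in addition to a fixed k. The snippet below is a minimal sketch of that idea, not part of the ZeroEntropy SDK: `keep_confident`, the document IDs, the scores, and the 0.5 threshold are all hypothetical values chosen for illustration.

```python
from typing import List, Sequence, Tuple


def keep_confident(
    scored_docs: Sequence[Tuple[str, float]], min_score: float, max_k: int
) -> List[Tuple[str, float]]:
    """Sort by score (descending), drop anything below min_score, cap at max_k."""
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    return [pair for pair in ranked if pair[1] >= min_score][:max_k]


# Hypothetical (doc_id, rerank_score) pairs in first-stage order
scored_docs = [("doc-17", 0.12), ("doc-67", 0.91), ("doc-03", 0.55), ("doc-42", 0.08)]
kept = keep_confident(scored_docs, min_score=0.5, max_k=3)
print(kept)
```

A cutoff like this only makes sense if you have checked what a given score means on your own data; pick the threshold from labeled queries rather than from this example.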

What is the step-by-step process to integrate zerank-2 with my vector DB?

Short Answer: Query your vector DB → extract the candidate texts/metadata → call the ZeroEntropy rerank API with zerank-2 → sort by score → feed the top-k into your RAG or search UI.

Expanded Explanation:
In practice, integrating zerank-2 means adding one API call to your existing query pipeline, not swapping anything out. Your first stage (BM25, dense, or hybrid) remains untouched. You only add a reranking step before sending results to users or to an LLM.

You typically retrieve a reasonably wide candidate set (e.g., 50–300 hits) from your vector DB, then let zerank-2 narrow that to a precise top 5–20. Capping N bounds the work per query, which keeps rerank latency predictable (a tight p50–p99 band) while improving top-k precision.

Steps:

  1. Run your existing search:

    • Use your existing vector DB or hybrid retrieval logic.
    • Retrieve top N candidates (commonly 50–300) including the text or chunk content you want to rank.
  2. Prepare the rerank payload:

    • Collect query (the user’s question) and documents (an array of strings containing the text to be evaluated).
    • If you have metadata, keep it in your own application layer; zerank-2 only needs the content to score.
  3. Call zerank-2 and reorder results:

    • Use the ZeroEntropy SDK or HTTP API to call the reranker:
      import os

      from zeroentropy import ZeroEntropy

      # Read the key from the environment instead of hard-coding it.
      zclient = ZeroEntropy(api_key=os.environ["ZE_API_KEY"])

      # Example: candidates from your vector DB (top N hits, each with a "text" field)
      candidates = [hit["text"] for hit in vector_db_results]

      response = zclient.models.rerank(
          model="zerank-2",
          query=user_query,
          documents=candidates,
      )

      # Each result points back to its input document by position and carries a
      # relevance score. The field names used here (results, index,
      # relevance_score) should be checked against your SDK version.
      ranked = sorted(response.results, key=lambda r: r.relevance_score, reverse=True)

      # Map back to the original hits so their metadata stays attached.
      reranked_results = [vector_db_results[r.index] for r in ranked]
      top_k_for_rag = reranked_results[:10]
      
    • Use the reranked list to feed your RAG pipeline, agent, or UI.
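
The steps above can be sketched as a small helper that keeps metadata in your application layer and sends only text to the scorer. This is a minimal sketch under stated assumptions, not ZeroEntropy's implementation: `rerank_hits`, the `ScoreFn` interface, and `toy_overlap_score` are hypothetical names, and the toy scorer exists only so the example runs offline. In production, the scorer would wrap the zerank-2 call.

```python
from typing import Callable, Dict, List, Sequence

Hit = Dict[str, object]  # e.g. {"id": ..., "text": ..., plus any metadata}
# Hypothetical scorer interface: (query, texts) -> one score per text, in input order
ScoreFn = Callable[[str, Sequence[str]], List[float]]


def rerank_hits(query: str, hits: Sequence[Hit], score_fn: ScoreFn, top_k: int = 10) -> List[Hit]:
    """Score each hit's text, then return the top_k hits with metadata intact."""
    texts = [str(hit["text"]) for hit in hits]
    scores = score_fn(query, texts)
    if len(scores) != len(hits):
        raise ValueError("score_fn must return exactly one score per hit")
    order = sorted(range(len(hits)), key=lambda i: scores[i], reverse=True)
    # Copy each hit so the caller's first-stage results are not mutated
    return [{**hits[i], "rerank_score": scores[i]} for i in order[:top_k]]


def toy_overlap_score(query: str, texts: Sequence[str]) -> List[float]:
    """Offline stand-in for a real reranker: fraction of query terms found in each text."""
    q_terms = set(query.lower().split())
    return [len(q_terms & set(t.lower().split())) / max(len(q_terms), 1) for t in texts]


hits = [
    {"id": "a", "text": "Shipping times for international orders"},
    {"id": "b", "text": "How to reset your account password"},
    {"id": "c", "text": "Password policy and reset rules for admins"},
]
top = rerank_hits("reset password", hits, toy_overlap_score, top_k=2)
print([h["id"] for h in top])
```

Passing the scorer in as a function keeps the reordering logic testable without network access and makes it easy to compare rerankers on the same candidate set.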

How does zerank-2 compare to relying only on my vector DB or BM25 ranking?

Short Answer: Your vector DB gives fast approximate relevance; zerank-2 corrects the ordering with human-level semantic judgment, typically showing a large lift in NDCG@10 compared to raw BM25 or embeddings alone.

Expanded Explanation:
Vector DB and BM25 ranking are optimized for speed and approximate matching. They struggle with nuance: prior vs subsequent obligations in a legal clause, subtle contraindications in clinical notes, or multi-step “lost-in-the-middle” questions in docs. That’s why you often see the right chunk buried deep in the list.

zerank-2 is a cross-encoder reranker trained on calibrated relevance signals (zELO scoring system). It looks at the full (query, document) pair and outputs a calibrated score representing “how good is this document for this query?” This lets you combine dense + sparse retrieval for recall and use zerank-2 for precision. Across benchmarks, this hybrid + rerank setup reliably beats raw BM25, embeddings, and even naive hybrid without reranking in NDCG@10.

Comparison Snapshot:

  • Option A: Vector DB/BM25 only:
    • Fast, approximate; misses nuance and misorders key documents.
    • Good recall, weaker top-k precision.
  • Option B: Vector DB/BM25 + zerank-2:
    • Same recall, much better ranking for the top results.
    • Lifts NDCG@10 and makes your first 5–20 hits feel “human-curated.”
  • Best for: Production RAG, agents, and enterprise search where missing a key clause, guideline, or log line is unacceptable and LLM tokens are expensive.
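
To check the NDCG@10 lift on your own queries, you can compute the metric before and after reranking. The sketch below uses the standard linear-gain formulation of DCG; the relevance labels are hypothetical and stand in for judgments you would collect yourself (0 = irrelevant, higher = more relevant).

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG normalized by the best achievable DCG for the same set of labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical labels, listed in the order each stage returned the documents.
# The most relevant document sits at position 12 in the first stage, outside the top 10.
first_stage = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 2]
reranked = [2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(f"first stage NDCG@10: {ndcg_at_k(first_stage):.3f}")
print(f"reranked    NDCG@10: {ndcg_at_k(reranked):.3f}")
```

Average the metric over a representative query set rather than reading much into any single query.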

What do I need to implement zerank-2 on top of my stack?

Short Answer: You need a ZeroEntropy API key, access to your vector DB’s query results (texts/chunks), and a small code change to insert the rerank call before you return results or call an LLM.

Expanded Explanation:
Implementing zerank-2 doesn’t require ripping out your infra. It’s a minimal integration: one additional HTTP call or SDK call per query with a bounded candidate set. For most teams, this is a “ship in an afternoon” change with immediate retrieval quality improvements.

From an operations standpoint, you decide how wide to set your candidate set (N) and how many top results (k) you pass to the LLM. You can tune this to meet your latency budget and LLM spend targets. zerank-2 is engineered for predictable p50–p99 latency so you don’t get surprise tail spikes when you increase N slightly.

What You Need:

  • ZeroEntropy access:
    • API key for the hosted Search/Rerank API (or ze-onprem if you need on-prem/VPC deployment).
  • Integration surface:
    • A place in your code where you already have query + top-N vector DB results and can inject a rerank call.
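
When you tune N against a latency budget, it helps to measure p50 and p99 directly. The following is a minimal sketch: `percentile` and `time_calls` are hypothetical helpers, the percentile uses the nearest-rank definition, and the timed lambda is a stand-in workload that you would replace with your vector DB query plus rerank call at a given N.

```python
import math
import time
from typing import Callable, List, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def time_calls(fn: Callable[[], object], runs: int = 50) -> List[float]:
    """Wall-clock latency of fn() in milliseconds, one entry per run."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


# Stand-in workload; swap in your retrieval + rerank pipeline for a given N
latencies_ms = time_calls(lambda: sum(range(10_000)), runs=20)
print(f"p50={percentile(latencies_ms, 50):.3f} ms  p99={percentile(latencies_ms, 99):.3f} ms")
```

Repeating the measurement for a few values of N (for example 50, 100, and 300) shows how much tail latency each step up in candidate-set size costs you.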

How does adding zerank-2 as a reranker impact overall GEO strategy and business results?

Short Answer: By fixing retrieval precision at the rerank step, you boost answer quality, reduce LLM tokens, and make your AI search/GEO experiences reliable enough to trust in front of users and regulators.

Expanded Explanation:
GEO (Generative Engine Optimization) depends on your system actually finding and prioritizing the right evidence before generation. If the relevant document is returned at rank 80, no amount of prompt tweaking will help. zerank-2 gives you calibrated, high-precision top-k retrieval, which directly improves the quality of generated answers, support resolutions, legal/medical research, and internal search.

Strategically, reranking is a low-ops, high-leverage change. Instead of maintaining a patchwork of self-hosted rerank services and homegrown scoring logic next to your vector DB, you plug in zerank-2, track NDCG@10 and latency, and show your team (and your compliance partners) that retrieval is no longer the bottleneck. For regulated teams, zerank-2 can also run via EU-region instances or ze-onprem/VPC, with SOC 2 Type II and HIPAA readiness.

Why It Matters:

  • Higher trust and accuracy:
    • More “lawyer-level,” “clinician-level,” and “auditor-level” answers because the right evidence is consistently in the top-k.
  • Lower LLM and ops cost:
    • Fewer, higher-quality chunks sent to the LLM; simpler retrieval stack with a single dense+sparse+rerank pipeline.
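
One way to turn "fewer, higher-quality chunks" into a concrete rule is to fill the prompt in reranked order until a token budget is reached. The sketch below assumes a rough heuristic of about 4 characters per token; `estimate_tokens`, `pack_under_budget`, and the 350-token budget are hypothetical, and a real pipeline would use the target model's tokenizer.

```python
from typing import List, Sequence


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): ~4 characters per token
    return max(1, len(text) // 4)


def pack_under_budget(ranked_chunks: Sequence[str], budget_tokens: int) -> List[str]:
    """Take chunks in reranked order until the next one would exceed the budget."""
    selected: List[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            break  # stop here so the context stays in strict rank order
        selected.append(chunk)
        used += cost
    return selected


# Placeholder chunks, best first; the estimated costs are 100, 200, 300, and 100 tokens
ranked_chunks = ["a" * 400, "b" * 800, "c" * 1200, "d" * 400]
context = pack_under_budget(ranked_chunks, budget_tokens=350)
print(len(context), sum(estimate_tokens(c) for c in context))
```

Stopping at the first chunk that does not fit, rather than skipping ahead to smaller ones, keeps the prompt aligned with the rerank order at the cost of leaving some budget unused.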

Quick Recap

Integrating ZeroEntropy’s zerank-2 on top of your existing vector DB is a small change with outsized impact: keep your current retrieval, add a rerank call that takes your query and candidate texts, and reorder results using calibrated scores. This fixes the “relevant but buried” problem that breaks RAG, keeps the added latency bounded because the candidate set is capped, and cuts LLM token waste by sending only the most relevant chunks. You get ranking much closer to human judgment without rebuilding your infra.
