
ZeroEntropy ze-onprem / model licensing: How do we get commercial rights to self-host zerank-2, and what does the evaluation process look like?
Most teams hit the same wall at the same time: you’ve proven that reranking is the bottleneck in your RAG or search stack, zerank-2 beats your current reranker on your internal evals, and now security/compliance is asking one question—“How do we get commercial rights to run this on our own infra?”
Quick Answer: You evaluate zerank-2 using our open-weight models from Hugging Face on your data, then move to a commercial license that grants production rights to self-host via ze-onprem in your own VPC or data center, with SLAs and compliance artifacts included for enterprise teams.
Frequently Asked Questions
How do we get commercial rights to self-host zerank-2?
Short Answer: You run your evaluation with the open weights, then we put a commercial license in place that covers production use and self-hosting zerank-2 (and other models) via ze-onprem.
Expanded Explanation:
ZeroEntropy’s rerankers and embeddings are fully open-weight so you can test them against your own corpus without a contract in place. Once you’re confident zerank-2 is winning on your internal benchmarks—NDCG@10, recall@k, ticket resolution accuracy—we move to a commercial license that gives you the right to deploy and operate the model stack in your own environment.
For most teams, that license covers: production traffic, internal and external end users, hybrid retrieval (dense + sparse + rerank), and deployment models ranging from self-managed on-prem/VPC to a managed EU instance if you don’t want to run the infra yourself. You keep operational control; we provide the model rights, support, and SLAs.
Key Takeaways:
- Evaluation can start immediately with open weights from Hugging Face—no paperwork needed.
- Commercial licensing unlocks production and self-hosted deployment of zerank-2 via ze-onprem.
What does the zerank-2 evaluation process look like for sensitive or proprietary data?
Short Answer: You download the open-weight model, run it against your real queries and documents in your own environment, and measure retrieval quality and latency before signing a commercial license.
Expanded Explanation:
Evaluation should mirror how your system behaves in production: same corpora, same query patterns, same latency and throughput constraints. With ZeroEntropy, you don’t have to ship sensitive data to us to find out whether zerank-2 is worth it. zerank-2—and its lighter variants—are available as open weights via Hugging Face, so you can run a full offline benchmark locally or in your VPC.
Most teams start by comparing zerank-2 against their incumbent reranker (often Cohere rerank-v3.5 or jina-reranker-m0) on a labeled dataset of realistic queries and candidate documents. You measure NDCG@10, top-k hit rate, and p50 and p99 latency. Once you see the lift and confirm that latency is within your budget, we lock in a license and you keep the same setup, now with commercial rights and support.
Steps:
- Pull the model: Download zerank-2 (or the relevant variant) from Hugging Face into your environment.
- Run your benchmark: Evaluate on your own queries, documents, and labels—track NDCG@10, recall@k, and p50/p99 latency.
- Decide and license: Once you’re confident, move to a commercial license to run zerank-2 in production via ze-onprem.
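The metrics named in the steps above can be computed with a few lines of plain Python. This is a minimal sketch, not our official evaluation harness: the function names are illustrative, NDCG uses linear gain over the labels of the candidates you actually scored, and the latency percentile uses the nearest-rank method. Swap in your own conventions if your existing evals differ.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded labels (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=10):
    """NDCG@k: DCG of the ranking divided by DCG of the ideal ordering.

    Assumption: the ideal ordering is built from the same candidate list,
    so relevant documents that were never retrieved are not counted here.
    """
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

def recall_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the labeled relevant documents that appear in the top k."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(ranked_doc_ids[:k]) & relevant) / len(relevant)

def latency_percentile(latencies_ms, pct):
    """Nearest-rank percentile of measured per-request latencies."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical data: relevance labels in the order a reranker returned them.
incumbent_ndcg = ndcg_at_k([0, 1, 0, 2])
candidate_ndcg = ndcg_at_k([2, 1, 0, 0])
print(f"incumbent NDCG@10={incumbent_ndcg:.3f}, candidate NDCG@10={candidate_ndcg:.3f}")
```

Run the same functions over both rerankers on the same labeled queries, then average per-query NDCG@10 and recall@k, so the comparison is apples to apples.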
How does ze-onprem differ from using the hosted API?
Short Answer: Hosted API is “managed SaaS with SLAs”; ze-onprem is “you run the stack in your own environment with a commercial license and our support.”
Expanded Explanation:
The hosted Search API is the fastest way to go from “we have data” to “we have human-level search,” especially if you don’t want to own the retrieval infra. You hit a single endpoint for dense+sparse+rerank, and we handle scaling, monitoring, and updates.
ze-onprem is for teams with strict data residency, network, or regulatory constraints—or those who simply prefer infra to live inside their own VPC or data center. Under ze-onprem, you get the same core stack (zerank-2 rerankers, zembed-1 embeddings, hybrid retrieval pipeline) but deploy it as containers or services you manage. You keep data fully in-house; we provide licensing, support, and, for enterprise, SLAs.
Comparison Snapshot:
- Hosted API: Fastest to integrate, ZeroEntropy manages infra, SOC 2 Type II / HIPAA-ready with EU-region option.
- ze-onprem: You self-host in your VPC/data center, complete data control, commercial model licenses plus support.
- ze-onprem is best for: Teams with strict compliance/data residency needs, or those standardizing on self-managed AI infra.
What do we need in place to run zerank-2 via ze-onprem in production?
Short Answer: You need an on-prem/VPC environment that can run GPU (or optimized CPU) workloads, a commercial license for zerank-2, and a clear integration path from your retrieval layer into the reranker.
Expanded Explanation:
zerank-2 is a cross-encoder reranker optimized for production search and RAG workloads. To run it in-house, you provision an environment that can handle your expected QPS and tail latency targets, then integrate the models into your retrieval pipeline: candidate generation (dense + sparse) → zerank-2 reranking → top-k results forwarded to your LLM or UI.
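The flow above (candidate generation → reranking → top-k) can be sketched in a few lines. This is an illustration of the pipeline shape only: `merge_candidates`, `rerank_top_k`, and `toy_overlap_score` are hypothetical names, and the toy scorer is a stand-in for wherever you call zerank-2 in your own deployment. It is not the model and not the ze-onprem interface.

```python
from typing import Callable, Dict, List, Sequence

def merge_candidates(dense_ids: Sequence[str], sparse_ids: Sequence[str]) -> List[str]:
    """Union dense and sparse hits, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for doc_id in [*dense_ids, *sparse_ids]:
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged

def rerank_top_k(
    query: str,
    candidate_ids: Sequence[str],
    corpus: Dict[str, str],
    score_fn: Callable[[str, str], float],
    k: int,
) -> List[str]:
    """Score every (query, document) pair and return the k best document ids."""
    scored = [(score_fn(query, corpus[doc_id]), doc_id) for doc_id in candidate_ids]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [doc_id for _, doc_id in scored[:k]]

def toy_overlap_score(query: str, document: str) -> float:
    """Placeholder scorer (fraction of query terms found in the document).

    Replace this with a call to your reranker; it is NOT zerank-2.
    """
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    return len(query_terms & doc_terms) / len(query_terms) if query_terms else 0.0

# Hypothetical corpus and retrieval results.
corpus = {
    "d1": "reset your password from the login page",
    "d2": "quarterly revenue report",
    "d3": "password policy and rotation rules",
}
candidates = merge_candidates(dense_ids=["d2", "d1"], sparse_ids=["d3", "d1"])
top = rerank_top_k("how to reset password", candidates, corpus, toy_overlap_score, k=2)
print(top)  # -> ['d1', 'd3']
```

Keeping the scorer behind a plain callable means the same pipeline code can be pointed at your incumbent reranker during evaluation and at the licensed deployment afterwards.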
From a process standpoint, you start with evaluation to size your infra, then when you go live under ze-onprem, we help with sizing guidance, configuration best practices, and operational questions (batching, concurrency, scaling). For enterprise, this is formalized with SLAs and white-glove onboarding.
What You Need:
- Technical environment: A VPC or on-prem cluster with GPUs or high-performance CPUs, container orchestration (Kubernetes or equivalent), and monitoring.
- Commercial + compliance setup: A signed license for zerank-2/ze-onprem plus your internal approvals (security, compliance, procurement).
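For a first pass at sizing the technical environment, Little's law (requests in flight = arrival rate × time in system) gives a rough replica count. The sketch below is a back-of-envelope estimate under assumed inputs: the function name, the headroom factor, and every number in the example are hypothetical, and the real values should come from your own benchmark of zerank-2 on your hardware.

```python
import math

def required_replicas(
    peak_qps: float,
    mean_latency_ms: float,
    concurrent_per_replica: int,
    headroom: float = 1.3,
) -> int:
    """Estimate how many reranker replicas are needed to absorb peak traffic.

    in_flight is Little's law: arrival rate times mean time per request.
    headroom is an assumed safety margin for bursts, not a vendor figure.
    """
    in_flight = peak_qps * (mean_latency_ms / 1000.0)
    return max(1, math.ceil(in_flight * headroom / concurrent_per_replica))

# Hypothetical inputs: 200 QPS at peak, 150 ms mean rerank latency,
# and 8 requests handled concurrently per replica.
print(required_replicas(peak_qps=200, mean_latency_ms=150, concurrent_per_replica=8))  # -> 5
```

This uses mean latency, so it says nothing about tail behavior; check p99 under load separately before committing to a replica count.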
How does licensing zerank-2 and ze-onprem connect to business value and risk reduction?
Short Answer: Licensing lets you turn a one-off evaluation win into a reliable, compliant retrieval layer that reduces LLM spend, improves answer quality, and satisfies your security team.
Expanded Explanation:
In most stacks, retrieval, not the LLM, is the limiting factor. If the right document sits at rank 67, your model never sees it. zerank-2 is trained with our zELO scoring system to produce calibrated scores and consistently surface the truly relevant evidence at the top of the list, which improves NDCG@10 and reduces “lost in the middle” failures. That translates into fewer hallucinations, fewer follow-up queries, and less wasted LLM context.
Licensing zerank-2 with ze-onprem lets you bank those gains without compromising on governance. You get machine-speed retrieval with human-level precision—on infra that meets SOC 2 Type II and HIPAA expectations, with EU-region options or full on-prem deployment. For leadership, that’s a clean story: better retrieval metrics, lower token costs, and a risk profile that your CISO can actually sign off on.
Why It Matters:
- Measurable retrieval gains: Higher NDCG@10 and better top-k precision mean more accurate RAG answers and fewer hallucinations.
- Compliance + cost: Self-hosted, licensed zerank-2 gives you data control, regulatory alignment, and lower LLM spend by sending fewer, better chunks.
Quick Recap
If you’re evaluating ZeroEntropy’s zerank-2 for ze-onprem, the path is straightforward: test the open weights on your own data, confirm the lift in NDCG@10 and that latency fits your budget vs. your current reranker, then move to a commercial license that grants production rights to self-host in your own VPC or data center. You get a unified dense+sparse+rerank stack, calibrated scores, and a deployment model that aligns with SOC 2 Type II / HIPAA expectations and any EU or on-prem constraints your team has.