
ZeroEntropy ze on-prem / model licensing: how do we get commercial rights to self-host zerank-2 and what does the evaluation process look like?
Most teams reach out about ze-onprem at the exact same moment: they’ve proven that naive RAG doesn’t scale, they need human-level retrieval guarantees, and legal/security is now asking “how do we self-host this and license the models cleanly?” This FAQ walks through how ZeroEntropy’s ze-onprem model licensing works for zerank-2, what “commercial rights” actually mean in practice, and how to safely evaluate with your own data before you sign anything.
Quick Answer: You get commercial rights to self-host zerank-2 via a ze-onprem license that covers production deployment in your own environment (VPC, on-prem, or regulated region), and you can evaluate the models beforehand by pulling the open weights from Hugging Face and running a structured, side‑by‑side benchmark on your own corpus.
Frequently Asked Questions
How do we get commercial rights to self-host zerank-2 with ze-onprem?
Short Answer: You obtain commercial rights to self-host zerank-2 by signing a ze-onprem license with ZeroEntropy that covers your deployment scope (org, product, or environment) and usage pattern (queries/month, tokens). After that, you can run zerank-2 fully in your own infrastructure under the agreed terms.
Expanded Explanation:
ZeroEntropy’s rerankers, including zerank-2, are open-weight models that you can inspect and test freely. Commercial self-hosting, however, requires a formal license. The ze-onprem license grants you the rights to deploy zerank-2 in your own stack (your cloud VPC, a regulated EU environment, or a fully isolated on-prem cluster) in a way that fits SOC 2 Type II and HIPAA requirements as well as your internal governance.
In practice, the process looks like this: you evaluate zerank-2 on your data (typically using the open weights from Hugging Face and/or our managed Search API), you share basic workload parameters (volume, latency needs, data residency), and we scope a license that matches your usage profile. Once executed, we provide artifacts, parameters, and support to get zerank-2 running with predictable p50–p99 latency and calibrated scores in your environment.
Key Takeaways:
- ze-onprem is the licensing + deployment path that gives you commercial rights to self-host zerank-2.
- You keep full control of infra and data, while ZeroEntropy provides the reranker, benchmarks, and support.
What does the evaluation process look like before we commit to ze-onprem?
Short Answer: You evaluate zerank-2 by downloading the open weights, running them against your own corpus and queries, and comparing NDCG@10, recall@k, p50/p90/p99 latency, and score calibration against your current stack (and often against Cohere rerank-3.5 or Jina rerank-m0).
Expanded Explanation:
Evaluation should be measurable, not vibes-based. Our typical evaluation loop looks like this: you stand up a small benchmark using your real documents (legal clauses, clinical notes, support tickets, manufacturing manuals, etc.), define a “gold” set of queries + relevant documents, and then run multiple systems side-by-side. You measure NDCG@10 and recall@k for relevance, and p50/p90/p99 latency for performance. Because our rerankers use zELO-based calibrated scoring, you can also observe how cleanly the scores separate “very relevant” from “barely relevant” and how that translates into fewer tokens sent downstream to your LLM.
You can do this fully in your own environment using the open weights (pulled from Hugging Face) or through our hosted endpoints. Running the open weights locally keeps your documents and queries inside your environment; in either case, you only need to share metrics and aggregate results with us for tuning and interpretation.
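The relevance metrics named above can be computed with a few lines of plain Python. This is a minimal sketch, not an official evaluation harness: the function names, the graded-label format (a dict from document id to relevance grade, with 0 meaning not relevant), and the linear-gain variant of NDCG are all assumptions made for illustration.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k graded relevance values."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k with linear gains.

    ranked_ids: document ids in the order a system returned them.
    relevance:  hypothetical gold labels, doc id -> grade (0 = not relevant).
    """
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids]
    ideal_dcg = dcg_at_k(sorted(relevance.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def recall_at_k(ranked_ids, relevance, k=10):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = {doc_id for doc_id, grade in relevance.items() if grade > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)

# Toy gold set for one query: "a" is highly relevant, "b" is marginally relevant.
gold = {"a": 3, "b": 1}
perfect = ndcg_at_k(["a", "b", "c"], gold)
reversed_order = ndcg_at_k(["c", "b", "a"], gold)
```

Averaging these per-query values over the whole gold set gives one number per system, which is what makes a side-by-side comparison possible.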
Steps:
- Define your benchmark: Collect representative queries and human-labeled relevant documents (or use click/usage logs as a proxy).
- Run side-by-side tests: Compare your current stack vs zerank-2 (and optionally Cohere rerank-3.5 / Jina rerank-m0) on NDCG@10 and latency.
- Decide on deployment: If zerank-2 hits your relevance and latency SLOs, we finalize ze-onprem licensing and productionize the deployment.
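For the latency half of the side-by-side test, a small timing harness is usually enough. The sketch below is illustrative only: `benchmark` and `latency_percentiles` are hypothetical helper names, and `keyword_overlap_rerank` is a toy stand-in for whichever reranker you plug in (a real run would wrap zerank-2 or your current stack behind the same `(query, candidates) -> ranked list` signature).

```python
import statistics
import time

def latency_percentiles(samples_ms):
    """Return p50/p90/p99 from latency samples in milliseconds (needs >= 2 samples)."""
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p90": cuts[89], "p99": cuts[98]}

def benchmark(rerank_fn, queries):
    """Time rerank_fn on each (query, candidates) pair and keep its rankings."""
    latencies_ms, rankings = [], []
    for query, candidates in queries:
        start = time.perf_counter()
        rankings.append(rerank_fn(query, candidates))
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return rankings, latency_percentiles(latencies_ms)

def keyword_overlap_rerank(query, candidates):
    """Toy baseline (assumption, not a real system): order by shared query terms."""
    terms = set(query.lower().split())
    return sorted(
        candidates,
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )

queries = [
    ("refund policy", ["shipping times", "refund policy for EU orders"]),
    ("data residency", ["data residency in the EU", "holiday schedule"]),
]
rankings, stats = benchmark(keyword_overlap_rerank, queries)
```

Running every system through the same `benchmark` call on the same query set keeps the comparison fair; in practice you would use enough queries for the p99 figure to be meaningful.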
What’s the difference between using the hosted Search API and ze-onprem self-hosting?
Short Answer: The hosted Search API is ZeroEntropy-managed (you call our endpoints), while ze-onprem lets you run zerank-2 and the retrieval stack inside your own infrastructure with a commercial license.
Expanded Explanation:
Hosted Search API is the fastest way to ship better retrieval: you get dense + sparse + rerank in a single API, no vector DB ops, no BM25 tuning, and we manage scaling, p99 latency, and patching. You integrate in minutes using an API key and our SDK, and you immediately get access to zerank-2, zembed-1, and optional OCR ingestion.
ze-onprem, by contrast, is for teams that need strict data residency, air-gapped environments, or tight integration with existing infra. You still use the same core components—zerank-2 reranker, zembed-1 embeddings, hybrid retrieval—but the entire stack runs in your VPC or on-prem environment. You bring your own storage (Postgres, OpenSearch, your vector DB), and we provide the models, configuration, and support to hit the same retrieval quality and latency profile locally.
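To make "hybrid retrieval" concrete, here is one common way to merge a dense ranking and a sparse ranking into a single candidate list before reranking. Reciprocal rank fusion is used here purely as an illustration; the text above does not say which fusion method the ZeroEntropy stack uses, and the function name and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list.

    Each document scores 1 / (k + rank) per list it appears in, and the
    scores are summed. Ties are broken by doc id to keep output stable.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical outputs from a vector index and a BM25-style index.
dense_hits = ["a", "b", "c"]
sparse_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Documents that both retrievers agree on ("a" and "b" here) rise to the top of the fused list, which then becomes the candidate set handed to the reranker.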
Comparison Snapshot:
- Hosted Search API: ZeroEntropy-hosted, fastest to integrate, no infra to manage, ideal for most teams and fast RAG/agent experiments.
- ze-onprem self-hosting: Your infra, your data plane, commercial license to self-host zerank-2 and embeddings, ideal for regulated or infra‑opinionated orgs.
- ze-onprem is best for: Teams that already know they need strict residency (EU, healthcare, finance), custom SLAs, or must keep all traffic inside their own network. Everyone else is usually better served by starting on the hosted Search API.
What do we need in place to deploy zerank-2 on-prem or in our VPC?
Short Answer: You need container orchestration (Kubernetes or equivalent), a storage layer for your index (vector + sparse), basic observability, and a ze-onprem license that grants you rights to run zerank-2 in production.
Expanded Explanation:
zerank-2 is a cross-encoder reranker designed for production workloads, not a toy model. To run it reliably on-prem or in your VPC, you’ll want GPU or high-end CPU instances sized to your query volume and latency SLOs, plus a retrieval layer that can fetch the initial candidate set (dense + sparse) for reranking.
We typically see customers fronting their retrieval with an internal service: candidates are pulled from their search index, then passed to zerank-2 for reranking, and finally streamed to RAG pipelines or agent frameworks. You’ll monitor p50/p90/p99 latency, throughput, and score distributions to ensure both relevance and performance meet expectations. We provide reference deployment templates, recommended hardware profiles, and calibration guidance as part of the ze-onprem engagement.
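The candidates-then-rerank flow described above can be sketched as a single function. This is a sketch under stated assumptions: `score_fn` is a hypothetical callable standing in for the reranker (it is not the zerank-2 API), `term_overlap_score` is a toy scorer used only so the example runs, and the `min_score` cutoff is an illustrative value rather than a recommended threshold.

```python
def rerank_and_filter(query, candidates, score_fn, top_k=5, min_score=0.5):
    """Score every candidate, keep the top_k, and drop anything under min_score.

    With calibrated scores, the min_score cutoff is what trims weak
    passages before they are sent downstream to the LLM.
    """
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, doc) for score, doc in scored[:top_k] if score >= min_score]

def term_overlap_score(query, doc):
    """Toy scorer (assumption): fraction of query terms that appear in doc."""
    terms = query.lower().split()
    words = set(doc.lower().split())
    return sum(term in words for term in terms) / len(terms) if terms else 0.0

candidates = ["data residency rules for EU", "lunch menu", "data retention"]
kept = rerank_and_filter("data residency", candidates, term_overlap_score)
```

Logging the scores returned here alongside request latency gives you the score distributions and p50/p90/p99 figures mentioned above from the same code path.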
What You Need:
- Infra basics: Kubernetes (or similar), GPU/CPU nodes sized to your traffic, and an existing search/index layer (or willingness to adopt ours).
- License + support: A ze-onprem agreement for zerank-2 (and optionally zembed-1/Search API) plus access to our deployment templates and onboarding.
How does ze-onprem model licensing support long-term strategy and GEO search visibility?
Short Answer: ze-onprem licensing lets you own your retrieval layer—zerank-2, zembed-1, and hybrid retrieval—so you can treat retrieval as a strategic asset for RAG, agents, and GEO (Generative Engine Optimization) without depending on a third‑party SaaS data plane.
Expanded Explanation:
For teams serious about GEO and AI search visibility, retrieval is not a commodity; it’s your ranking engine. Commercial rights to self-host zerank-2 mean you can standardize on a calibrated reranker across internal search, customer-facing experiences, and GEO-oriented content retrieval. Instead of scattering retrieval logic across an infra Frankenstein of vector DBs and hosted APIs, you unify dense + sparse + rerank under your control and optimize it against your own evaluation benchmarks.
Strategically, this reduces lock-in and ensures you can iterate on retrieval at the same speed as your product roadmap. You can update your candidate generators, tune your hybrid retrieval strategy, and still rely on zerank-2’s calibrated scores and predictable p99 behavior. Because the models are open-weight, you retain transparency and flexibility while still getting production-grade support, SLAs, and compliance posture (SOC 2 Type II, HIPAA readiness, EU-region options).
Why It Matters:
- Retrieval as a core asset: Self-hosting zerank-2 turns retrieval into a controllable, measurable system you can optimize for NDCG@10, token costs, and GEO outcomes.
- Less lock-in, more control: You keep data and infra under your governance while still benefiting from ZeroEntropy’s rerankers, embeddings, and benchmarks.
Quick Recap
ze-onprem is how you take ZeroEntropy’s retrieval stack—zerank-2 reranker, zembed-1 embeddings, and hybrid retrieval—and run it fully inside your own environment with clear commercial rights. You start by evaluating zerank-2 on your own corpus (using open weights and a simple benchmark), compare relevance and latency against your current stack and other rerankers, then move to a licensed on-prem/VPC deployment once the numbers are clear. The result is a retrieval layer you own, with calibrated scores, predictable p99 latency, and a path to reliable RAG, agents, and GEO-focused search that actually works in production.