How can we use generative AI on sensitive internal data without sending anything to an external LLM service?

Most regulated teams I work with start from the same hard constraint: they need generative AI on sensitive internal data, but they cannot send prompts, documents, or embeddings to an external LLM service. If you’re in that camp, the question isn’t “Can we use GenAI?” but “How do we do it entirely on our own infrastructure, with controls that our risk and security teams can actually sign off on?”

Below is a practical, production-oriented framework for using generative AI on sensitive internal data without ever exposing it to an external LLM API. It also covers how platforms like H2O AI implement that framework in on‑premise, air‑gapped, and VPC environments.


Quick Answer: Keep Models and Data Inside Your Perimeter

At a high level, you have three non‑negotiables:

  1. Run the LLM stack on your infrastructure

    • On‑premise, air‑gapped, or in your private cloud VPC.
    • No calls to public APIs, no SaaS endpoints in the data path.
  2. Bring the model to the data, not the data to the model

    • Index and retrieve from your internal sources (SharePoint, file shares, Google Drive, Slack, Teams) entirely inside your network.
    • Use retrieval‑augmented generation (RAG) with your own vector store, not a vendor’s managed one.
  3. Wrap it in enterprise controls

    • Authentication, authorization, audit logging.
    • Human‑in‑the‑loop escalation, monitoring, and explainability.
    • Clear evidence that “No data sharing. No model exfiltration.” is not just a slogan but a design constraint.

When those three are true, you can safely use generative AI for sensitive workflows—policy search, regulatory reporting, fraud investigations, KYC documentation, internal HR policies—without violating data residency, confidentiality, or model risk expectations.


At-a-Glance Comparison

If you’re evaluating how to deploy generative AI on sensitive internal data, there are three practical paths:

| Rank | Option | Best For | Primary Strength | Watch Out For |
| --- | --- | --- | --- | --- |
| 1 | Fully sovereign platform (e.g., h2oGPTe on‑prem / air‑gapped) | Regulated enterprises needing strict data control | End-to-end GenAI + agents deployed fully inside your environment | Requires IT/infra readiness and platform evaluation |
| 2 | VPC-deployed GenAI with private connectors | Cloud-forward teams with strong VPC governance | Scales like cloud, stays inside your VPC boundary | Misconfigured networking can accidentally open outbound paths |
| 3 | Self-managed open-source stack | Highly technical teams with in-house ML & MLOps | Maximal customization and control | High integration, maintenance, and evaluation burden |

Comparison Criteria

We evaluated these options on three criteria that actually matter when you tell your CISO and Model Risk team you’re “doing GenAI”:

  • Security & Sovereignty:
    Can you prove that prompts, documents, and embeddings never leave your infrastructure? Are deployments supported in air‑gapped and on‑prem environments, not just “private cloud”?

  • Accuracy & Governance:
    Does the solution go beyond surface‑level answers? Can you evaluate, monitor, and explain responses—especially when they drive decisions in fraud, credit, or regulatory processes?

  • Integration & Operationalization:
    Can you connect to your real systems (SharePoint, Google Drive, Slack, Teams, Snowflake, databases) and move from pilots to production? Or are you stuck in demo land with isolated POCs?


1. Fully Sovereign Platform (Best overall for regulated enterprises)

A fully sovereign GenAI platform—like h2oGPTe deployed on‑premise, air‑gapped, or in your data center—is the most straightforward way to use generative AI on sensitive internal data without sending anything to an external LLM service.

Why it ranks first

This approach is built around the constraint you care about: the entire GenAI stack (models, embeddings, vector store, connectors, agents) runs on your infrastructure. H2O AI’s platform is explicitly engineered for:

  • On‑premise & air‑gapped data centers
  • Cloud VPC deployments
  • “No data sharing. No model exfiltration.” by design

You bring the platform into your environment and integrate it with your private data sources; nothing calls out to an external LLM service.

What it does well

  • End-to-end controlled environment

    • The LLMs, SLMs (small language models), RAG pipeline, and agents live entirely inside your perimeter.
    • You control networking (no outbound internet), identity providers, and access control lists.
    • This is how NIH runs an intranet-native business assistant in an air‑gapped environment, helping 8,000 users across 28 institutes deflect up to 10,000 annual service requests—without exposing data outside NIH.
  • Deep research & high-precision RAG

    • h2oGPTe uses Citation RAG—answers come with linked, verifiable passages from your documents, not hallucinated references.
    • The h2oGPTe Agent “consistently tops the leaderboard for deep research accuracy” and was the first to achieve 75% accuracy on the GAIA test, ahead of OpenAI’s deep research. That matters when you’re relying on the agent for policy, regulatory, or procedural guidance.
  • Converged generative + predictive AI

    • You’re not limited to chat answers. The platform combines Generative AI with Predictive AI via H2O Driverless AI.
    • That means agents can forecast, reason, and take structured actions—for example:
      • Pre‑populating a regulatory report, then asking a human reviewer to confirm.
      • Using a fraud model to score a case, then generating a narrative summary for an investigator.
      • Pulling KYC data, checking it against rules, then drafting an onboarding decision memo.
  • Explainability and governance baked in

    • H2O’s background in predictive ML brings a mature explainability toolkit, automated documentation, and model monitoring.
    • For GenAI, you can configure guardrails, human-in-the-loop escalation, and MRM-aligned evaluation to ensure that when an agent touches a regulated workflow, you can defend it to risk and compliance.
  • Enterprise integrations inside your network

    • Connectors to Google Drive, SharePoint, Slack, Teams, GitHub, AWS, Snowflake and more—running inside your controlled environment.
    • Agents can sit inside existing tools (e.g., a call center console or internal portal) instead of forcing users into a new, isolated UI.

Tradeoffs & limitations

  • Requires platform adoption, not a quick script
    • You’re adopting an enterprise platform, not just spinning up a single open‑source model.
    • That’s a feature, not a bug, if you actually want monitored, production-grade AI—but it does mean cross‑team alignment and a proper evaluation cycle.

Decision trigger

Choose a fully sovereign GenAI platform if:

  • You must keep sensitive data and prompts inside your infrastructure.
  • You need to support air‑gapped, on‑premise, or strict VPC boundaries.
  • You care about accuracy, explainability, and monitoring, not just “cool demos.”
  • You want to transition from pilots to production on real workflows like KYC, fraud investigations, regulatory reporting, call center support, or internal policy search.

2. VPC-Hosted GenAI (Best for cloud-first teams with strict boundaries)

If your organization is comfortable in the cloud but still can’t expose data to external SaaS LLM APIs, deploying GenAI inside your own cloud VPC is the next strongest option.

Why it’s a strong fit

You keep the core principle: models and data stay inside your VPC, integrated with your existing IAM, networking, and logging. H2O AI supports this deployment pattern as part of its “end-to-end GenAI platform built for air-gapped, on‑premises or cloud VPC deployments.”

What it does well

  • Cloud flexibility with enterprise controls

    • You use your own cloud accounts (AWS, Azure, GCP), with your security controls: private subnets, security groups, NACLs, peering, etc.
    • The GenAI platform runs in your VPC, so nothing goes to a vendor’s shared multi‑tenant endpoint.
  • Simplified scaling and upgrades

    • Easier horizontal scaling of inference, vector stores, and agent orchestration using cloud-native primitives.
    • You can align GenAI capacity with existing infra patterns (e.g., auto-scaling groups, managed Kubernetes).
  • Strong integration with cloud data platforms

    • Direct, private connections to your Snowflake, data lakes, RDS/SQL, and internal APIs.
    • Ideal if a lot of your sensitive data is already in tightly controlled cloud services with strong RBAC.

Tradeoffs & limitations

  • Requires disciplined network configuration
    • Misconfigured outbound access can accidentally punch a hole to external endpoints.
    • You need clear policies: no outbound internet from subnets hosting LLM inference, vector stores, and connectors.
    • Internal risk assessment still needs to treat this as infrastructure you own, not vendor SaaS.
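The “no outbound internet” policy above is easier to keep when it is checked automatically rather than by eye. Below is a minimal sketch. It assumes you can export your egress rules into a simple list. The `EgressRule` shape, the subnet names, and the approved ranges are hypothetical placeholders, not any cloud provider’s API, and the RFC 1918 ranges stand in for whatever internal ranges your organization actually approves. The check flags any rule whose destination is not wholly contained in an approved internal range.

```python
from dataclasses import dataclass
from ipaddress import ip_network

# ASSUMPTION: the ranges your organization approves for GenAI traffic.
# RFC 1918 private ranges are used here purely as a placeholder.
APPROVED_INTERNAL_NETWORKS = [
    ip_network("10.0.0.0/8"),
    ip_network("172.16.0.0/12"),
    ip_network("192.168.0.0/16"),
]


@dataclass(frozen=True)
class EgressRule:
    """Hypothetical, cloud-agnostic view of one outbound rule."""
    subnet: str       # subnet the rule applies to, e.g. "llm-inference"
    destination: str  # destination CIDR, e.g. "10.20.0.0/16" or "0.0.0.0/0"
    port: int


def find_external_egress(rules):
    """Return every rule whose destination is not wholly inside an approved range."""
    violations = []
    for rule in rules:
        dest = ip_network(rule.destination, strict=False)
        # subnet_of() raises TypeError across IP versions, so compare versions first;
        # with this IPv4-only allowlist, any IPv6 destination is flagged.
        inside = any(
            dest.version == net.version and dest.subnet_of(net)
            for net in APPROVED_INTERNAL_NETWORKS
        )
        if not inside:
            violations.append(rule)
    return violations


if __name__ == "__main__":
    rules = [
        EgressRule("llm-inference", "10.20.0.0/16", 443),  # internal vector store
        EgressRule("connectors", "0.0.0.0/0", 443),        # default route to the internet
    ]
    for rule in find_external_egress(rules):
        print(f"BLOCK: {rule.subnet} -> {rule.destination}:{rule.port}")
```

The check uses containment (`subnet_of`) against an explicit allowlist rather than a generic “is this private?” test. The intent is that a broad rule such as `0.0.0.0/0` is always flagged, and the approved set stays an explicit, reviewable decision.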

Decision trigger

Choose VPC-hosted GenAI if:

  • Your security posture is “cloud‑allowed, but no external LLM services.”
  • You want sovereignty and control, but with the elasticity of cloud.
  • You’re ready to enforce a strict no-public-egress policy for GenAI components and have the cloud engineering maturity to keep it that way.

3. Self-Managed Open-Source Stack (Best for highly technical, hands-on teams)

Some teams choose to assemble their own solution using open-source models (e.g., LLaMA derivatives, Mistral‑based models), custom RAG, and homegrown agent orchestration—all deployed on-prem or in a VPC.

Why it’s attractive

This path gives you maximum control over every layer and can be run entirely inside your perimeter if done correctly:

  • Models hosted on your GPU servers.
  • Embeddings and vector store inside your network.
  • Custom connectors to your file shares, intranet sites, and internal apps.

What it does well

  • Fine-grained customization

    • You can tailor the stack to niche use cases, choose exactly which models to run, and tune them yourself.
    • Useful if you have unique domain needs or specialized languages/terminology.
  • Deep internal understanding

    • Your team knows every component, which can help in technical audits.
    • You control the full lifecycle—build, deploy, monitor, retire.

Tradeoffs & limitations

  • High engineering and governance burden

    • You’re responsible for:
      • Model evaluation and benchmarking.
      • Guardrails and safety filters.
      • Monitoring for drift, hallucinations, and failure modes.
      • Audit-ready documentation for model risk and compliance.
    • Most failures I see in this path are not technical—they’re governance gaps: no systematic evaluation harness, no human-in-the-loop policy, no clear escalation when the model is uncertain.
  • Risk of “surface-level insights”

    • Without rigorous evaluation and deep-research benchmarking, teams often deploy assistants that sound confident but produce shallow or unreliable answers.
    • In regulated workflows, that’s dangerous—and you will get pushback from risk and audit.
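Most of the governance gaps above come down to a missing evaluation harness. Below is a minimal sketch of one. It assumes you have a domain test set with expected answers and the source documents a reviewer would expect to be cited. `EvalCase`, `ModelOutput`, and `answer_fn` are hypothetical names. Exact-match accuracy is a deliberately crude placeholder for a proper grader, such as a rubric or human review.

```python
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    """One domain test question with the answer and sources a reviewer expects."""
    question: str
    expected_answer: str
    expected_sources: set = field(default_factory=set)


@dataclass
class ModelOutput:
    """What the assistant returned: an answer plus the source IDs it cited."""
    answer: str
    cited_sources: set


def evaluate(cases, answer_fn):
    """Run each case through answer_fn and aggregate simple quality metrics."""
    if not cases:
        raise ValueError("evaluation set is empty")
    correct = unsupported = cited_total = cited_correct = 0
    for case in cases:
        out = answer_fn(case.question)
        # Exact match is a crude placeholder for a real grader (rubric or human review).
        if out.answer.strip().lower() == case.expected_answer.strip().lower():
            correct += 1
        hits = out.cited_sources & case.expected_sources
        cited_total += len(out.cited_sources)
        cited_correct += len(hits)
        # An answer citing none of the expected sources is counted as unsupported.
        if not hits:
            unsupported += 1
    n = len(cases)
    return {
        "accuracy": correct / n,
        "citation_precision": cited_correct / cited_total if cited_total else 0.0,
        "unsupported_rate": unsupported / n,
    }


if __name__ == "__main__":
    # Hypothetical test case and a stubbed assistant; swap in your own model call.
    cases = [EvalCase("Who signs off on policy X?", "the compliance officer", {"policy-x"})]

    def stub(question):
        return ModelOutput("The compliance officer", {"policy-x"})

    print(evaluate(cases, stub))
```

The harness takes the model as a plain callable. That keeps it independent of how inference is hosted, so the same test set can be rerun after every model, prompt, or retrieval change.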

Decision trigger

Choose the DIY open-source route only if:

  • You have a strong in-house ML, MLOps, and security team.
  • You’re ready to build and maintain an evaluation framework, not just the inference stack.
  • You understand you’re effectively becoming your own GenAI platform vendor—and accept the ongoing cost and responsibility.

How to Keep Sensitive Data Off External LLM Services: A Concrete Checklist

Regardless of which deployment option you choose, here’s a practical checklist I’d expect to see in a design review where the brief is: “Use generative AI on sensitive internal data without sending anything to an external LLM service.”

1. Deployment & Networking

  • LLMs and SLMs deployed on‑premise, air‑gapped, or in private VPC, not as SaaS.
  • No outbound internet from:
    • LLM inference nodes.
    • Embedding/vector store services.
    • Connectors that touch sensitive data.
  • All traffic to the GenAI platform restricted to internal networks/VPN.
  • Logs and telemetry stored inside your environment, with retention policies aligned to governance.
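Network controls are the primary barrier. A cheap second line of defense is to fail closed in application code before any prompt or document is sent anywhere. Below is a minimal sketch. It assumes your services read their model and vector-store endpoints from configuration. The hostnames, ranges, `assert_internal_endpoint`, and `ExternalEndpointError` are all hypothetical placeholders.

```python
from ipaddress import ip_address, ip_network
from urllib.parse import urlsplit

# ASSUMPTION: internal hostnames and ranges approved for GenAI traffic.
APPROVED_HOSTS = {"llm.internal.example", "vectors.internal.example"}
APPROVED_NETWORKS = [ip_network("10.0.0.0/8"), ip_network("192.168.0.0/16")]


class ExternalEndpointError(RuntimeError):
    """Raised when a configured endpoint would leave the approved perimeter."""


def assert_internal_endpoint(url):
    """Return url unchanged if it is allowlisted; otherwise refuse (fail closed)."""
    host = urlsplit(url).hostname  # lowercased host, or None if absent
    if host is None:
        raise ExternalEndpointError(f"no host in {url!r}")
    if host in APPROVED_HOSTS:
        return url
    try:
        addr = ip_address(host)
    except ValueError:
        # Unknown hostnames are treated as external rather than resolved and trusted.
        raise ExternalEndpointError(f"host {host!r} is not allowlisted") from None
    # Membership across IP versions is simply False, so IPv6 literals are refused here.
    if any(addr in net for net in APPROVED_NETWORKS):
        return url
    raise ExternalEndpointError(f"address {addr} is outside approved networks")


if __name__ == "__main__":
    for url in ["https://llm.internal.example/v1", "https://api.public-llm.example/v1"]:
        try:
            assert_internal_endpoint(url)
            print("ok:", url)
        except ExternalEndpointError as err:
            print("refused:", err)
```

This check complements the egress rules on inference, vector-store, and connector subnets; it does not replace them. Note that an allowlisted hostname is only as trustworthy as the internal DNS that resolves it.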

2. Data & RAG Architecture

  • Connectors to internal systems (SharePoint, Google Drive, file shares, Slack, Teams, databases) run within your infrastructure.
  • Document ingestion, chunking, and embedding happens locally; no embeddings are sent to external services.
  • Vector stores or indexes (e.g., FAISS, Milvus, or a proprietary vector database) run inside your perimeter.
  • PII and highly sensitive data have appropriate masking/minimization where required by policy.
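The minimal sketch below makes “ingestion, chunking, and embedding happen locally” concrete: every step runs in-process, with no network calls. The assumptions are significant:

- The hashed bag-of-words `embed` function is a toy stand-in for a locally hosted embedding model.
- `LocalIndex` is a hypothetical in-memory stand-in for a vector store such as FAISS or Milvus.
- The two regex masks are illustrative only, not a complete PII policy.

```python
import hashlib
import math
import re

# ASSUMPTION: illustrative masking rules; real policies need far broader PII coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
DIM = 256  # size of the toy embedding space


def mask_pii(text):
    """Replace each PII match with a typed placeholder before anything is indexed."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def chunk(text, size=40, overlap=10):
    """Split text into overlapping windows of `size` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks


def embed(text):
    """Toy hashed bag-of-words vector, L2-normalized; stands in for a local model."""
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", text.lower()):
        vec[int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class LocalIndex:
    """Hypothetical in-memory index; nothing leaves the process."""

    def __init__(self):
        self.entries = []  # (doc_id, chunk_text, vector)

    def add(self, doc_id, text):
        for piece in chunk(mask_pii(text)):
            self.entries.append((doc_id, piece, embed(piece)))

    def search(self, query, k=3):
        """Return the k most similar chunks as (cosine score, doc_id, chunk_text)."""
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), doc_id, piece)
            for doc_id, piece, vec in self.entries
        ]
        return sorted(scored, key=lambda item: item[0], reverse=True)[:k]
```

Masking runs before chunking and embedding. As a result, neither the stored chunks nor their vectors ever contain the raw values, which is usually what a data-minimization policy is asking for.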

3. Identity, Access, and Audit

  • The GenAI platform integrates with enterprise IAM (e.g., SSO via SAML or OAuth/OpenID Connect) and enforces RBAC.
  • Access to specific documents or data sources is governed by user permissions—the assistant can’t surface content a user wouldn’t normally see.
  • All interactions are logged with:
    • User identity
    • Prompt
    • Retrieved documents
    • Model response
  • Logs are accessible for audit, incident response, and model risk reviews.
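The key design point here is ordering. Permission filtering happens before the model sees any context, so the assistant cannot surface a document the user could not open. Below is a minimal sketch under stated assumptions. Each indexed `Document` carries ACL groups copied from its source system. `generate` is a hypothetical callable wrapping your internal model. `audit_log` is any append-only sink: a list here, storage inside your environment in production. The audit record contains the four fields listed above.

```python
import json
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # ASSUMPTION: groups copied from the source system's ACL


def authorized(docs, user_groups):
    """Keep only documents the user could already open in the source system."""
    return [d for d in docs if d.allowed_groups & user_groups]


def audit_record(user_id, prompt, retrieved, response):
    """One JSON audit line: user identity, prompt, retrieved documents, model response."""
    return json.dumps({
        "ts": time.time(),
        "user": user_id,
        "prompt": prompt,
        "retrieved": [d.doc_id for d in retrieved],
        "response": response,
    })


def answer(user_id, user_groups, prompt, candidates, generate, audit_log):
    """Filter by permission BEFORE generation, then log the full interaction."""
    visible = authorized(candidates, user_groups)
    response = generate(prompt, visible)
    audit_log.append(audit_record(user_id, prompt, visible, response))
    return response
```

Because the audit line records only the documents that survived the permission filter, incident responders and model risk reviewers can reconstruct exactly what context the model was given for a given user.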

4. Accuracy, Guardrails, and Human-in-the-Loop

  • Citation RAG or equivalent: every answer is grounded in specific, linkable sources.
  • Evaluation harnesses are in place:
    • Domain-specific test sets (e.g., KYC policies, fraud rules, regulatory guidance).
    • Quantitative metrics for accuracy, citation quality, and hallucination rate.
  • Guardrails for:
    • Restricted topics.
    • Escalation to human review when confidence is low or stakes are high.
  • Human-in-the-loop workflows:
    • For high-risk use cases, the assistant drafts; a human owns the final decision.
    • This is documented and agreed with risk/compliance.
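Below is a minimal sketch of a post-generation gate covering the grounding and escalation items above. It assumes the model is prompted to cite sources inline as `[doc-id]`; that citation format, the 0.8 threshold, and all function names are hypothetical. `confidence` is whatever score your stack produces, such as a retrieval similarity or a verifier model’s output. The threshold itself should be agreed with risk and compliance, not hard-coded by engineering.

```python
import re
from dataclasses import dataclass

# ASSUMPTION: answers cite sources inline as "[doc-id]".
CITATION = re.compile(r"\[(?P<doc>[\w.-]+)\]")


@dataclass
class Decision:
    route: str   # "auto" or "human_review"
    reason: str


def check_grounding(answer, retrieved_ids):
    """Return (ids cited in the answer, cited ids that were never retrieved)."""
    cited = set(CITATION.findall(answer))
    return cited, cited - set(retrieved_ids)


def route_answer(answer, retrieved_ids, confidence, high_risk, threshold=0.8):
    """Send anything ungrounded, uncertain, or high-stakes to a human reviewer."""
    cited, unknown = check_grounding(answer, retrieved_ids)
    if not cited:
        return Decision("human_review", "no citations")
    if unknown:
        return Decision("human_review", f"cites unretrieved sources: {sorted(unknown)}")
    if high_risk:
        return Decision("human_review", "high-risk use case: assistant drafts, human decides")
    if confidence < threshold:
        return Decision("human_review", f"confidence {confidence:.2f} below {threshold}")
    return Decision("auto", "grounded and above threshold")
```

A citation pointing at a document that was never retrieved is a strong hallucination signal. That is why the gate checks it before confidence: a fabricated reference should never pass just because the score is high.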

How H2O AI Addresses This Constraint in Practice

Because H2O AI is built for sovereign AI in the world’s most regulated industries, the “no external LLM service” requirement is not an edge case; it’s the design center.

Concretely:

  • Deployment:

    • End-to-end GenAI platform built for air-gapped, on-premises, or cloud VPC deployments.
    • Used by banks, telcos, and government agencies worldwide where outbound SaaS is often disallowed.
  • Security posture:

    • “No data sharing. No model exfiltration.” is a core claim, not a footnote.
    • Air‑gapped and FedRAMP environments are supported; NIH’s use case is a live example.
  • Accuracy and research depth:

    • h2oGPTe Agent “consistently tops the leaderboard for deep research accuracy.”
    • First to hit 75% accuracy on the GAIA test, ahead of OpenAI’s deep research—critical if you want to replace “tribal knowledge” searches with something auditable.
  • Predictive + generative convergence:

    • H2O Driverless AI plus h2oGPTe gives you both:
      • Production‑grade predictive models (with automatic documentation, explainability, and monitoring).
      • Generative agents that can read those models’ outputs, explain them, and trigger next best actions in workflows.
  • Ecosystem and trust:

    • 2M+ data science users on H2O’s open-source ecosystem.
    • Recognition in the Gartner Magic Quadrant for Data Science and Machine Learning (DSML) Platforms for “completeness of vision and ability to execute.”
    • Customer outcomes like Australia’s largest bank cutting fraud by 70% and AT&T reporting 2X ROI in free cash flow signal that this stack is battle‑tested, not experimental.

If your security team’s first question is, “Does any data ever leave our perimeter?” and your Model Risk team’s first question is, “How do we evaluate and monitor this in production?”, H2O’s answer is to bring the entire Super Agent™ stack into your environment, then layer on the evaluations, guardrails, and monitoring you need.


Final Verdict

You absolutely can use generative AI on sensitive internal data without sending anything to an external LLM service—but only if you treat sovereignty, governance, and evaluation as first‑class design constraints.

  • A fully sovereign enterprise platform (like h2oGPTe deployed on-prem, air‑gapped, or in a VPC) is the most direct route: models, data, and agents live inside your perimeter, with “No data sharing. No model exfiltration.” and built‑in evaluation, guardrails, and explainability.
  • A VPC-hosted deployment is ideal for cloud-forward teams who can enforce strict network isolation and want cloud elasticity with enterprise governance.
  • A self-managed open-source stack can work for very technical teams willing to own the entire lifecycle—model hosting, RAG, evaluation, monitoring, and audit documentation.

The common pattern across all three: bring the model to your data on your infrastructure, enforce strong identity and monitoring, and never rely on “just trust the model” in a regulated workflow.

