How do I deploy mindSDB Teams (Deploy Anywhere) in our VPC or on‑prem environment?

Most teams evaluating mindSDB Teams (Deploy Anywhere) want the same thing: production-grade AI-powered analytics that run inside their own trust boundary—either in a private VPC or an on‑prem data center—without moving or duplicating data.

This guide walks through how to deploy mindSDB Teams (Deploy Anywhere) into your VPC or on‑prem environment, how it connects to your existing databases and document stores, and how to operate it safely at enterprise scale.


What “Deploy Anywhere” actually means

When we say “Deploy Anywhere” for mindSDB Teams, we mean:

  • Your data never leaves your infrastructure.
    mindSDB runs as a service inside your VPC or data center. It queries data sources in place—MySQL, PostgreSQL, Snowflake, BigQuery, file systems, Salesforce, etc.—without ETL or replication.

  • You control the network boundary.
    Deployed via containers (Kubernetes, Docker, or equivalent), fronted by your own load balancers, network policies, and firewalls.

  • You control authentication and governance.
    SSO, LDAP, and RBAC integrate with your identity provider, and document Knowledge Bases respect native permissions from systems like SharePoint, Google Drive, or internal file servers.

If you can run containers in your VPC or on‑prem environment, you can run mindSDB Teams there.


Prerequisites for VPC and on‑prem deployments

Before you deploy mindSDB Teams (Deploy Anywhere), make sure you have:

Infrastructure & networking

  • Container runtime / orchestrator
    • Kubernetes (EKS, AKS, GKE, self‑managed), OpenShift, or
    • Docker / container runtime for single‑node or PoC setups.
  • Network connectivity to data sources
    • Private subnets/routes to your databases (MySQL, PostgreSQL, MS SQL Server, Snowflake, BigQuery via private endpoints, etc.).
    • Access to document/storage systems for Knowledge Base: S3, GCS, Azure Blob, NFS, SharePoint, Google Drive, or in‑house DMS.
  • Ingress / Load balancer
    • HTTP(S) ingress to expose the mindSDB UI/API, typically behind:
      • An internal or external load balancer (ALB/NLB), or
      • A service mesh (Istio, Linkerd) if applicable.

Identity, access, and governance

  • SSO / LDAP / IdP
    • SAML, OIDC, or LDAP/Active Directory available for:
      • Single sign‑on for unlimited users.
      • Group-based roles and permissions.
  • RBAC model
    • Defined roles for:
      • Admins (infrastructure + data configuration).
      • Data owners (source connections, Knowledge Bases).
      • Business users (analytics, questions, semantic search).

Observability and operations

  • Logging
    • Centralized logging (CloudWatch, Stackdriver, ELK, Splunk, etc.) to ingest container and application logs.
  • Metrics
    • Prometheus / Grafana or equivalent for:
      • Latency, throughput, and error rates.
      • Embedding freshness and retrieval accuracy for Knowledge Bases.

Once these are ready, you can choose your deployment path: VPC or on‑prem. Mechanically they’re similar; the differences are mostly in network and identity integration.


Core architecture: how mindSDB Teams runs in your environment

Regardless of where you deploy (VPC or on‑prem), the architecture follows the same principles:

  1. mindSDB Services (Application layer)

    • Runs as a set of containers:
      • API / orchestration service.
      • Cognitive engine for planning, generation, validation, and execution.
      • Web UI for conversational analytics and configuration.
    • Stateless or lightly stateful; backed by a database you choose for metadata/config.
  2. Metadata / config store (Customer-controlled)

    • A database (e.g., PostgreSQL or MySQL) you operate.
    • Stores:
      • Connection definitions (encrypted).
      • Knowledge Base metadata and embeddings.
      • User profiles, roles, and access policies.
    • No customer business data is ingested or replicated; queries are executed in place.
  3. Data sources (Your existing systems)

    • Structured: MySQL, PostgreSQL, MS SQL Server, Snowflake, BigQuery, Redshift, etc.
    • SaaS: Salesforce, HubSpot, Zendesk, work management tools.
    • Unstructured: PDFs, Word, HTML, text in S3/GCS/Blob, SharePoint, internal file servers.
  4. LLM endpoints (Customer-selected)

    • You choose and configure:
      • Private LLM endpoints (e.g., in VPC).
      • Vendor APIs with your own keys, if allowed by your governance.
    • mindSDB orchestrates calls; it does not train on your data.
  5. Governance & observability

    • Every step of a query run is logged: planning → generation → validation → execution.
    • SQL is visible and reviewable before execution if you choose to enable “approve before run” modes.
    • Knowledge Base retrievals show citations and document-level sources.

This architecture is what makes “Deploy Anywhere” compatible with strict data residency and compliance: nothing leaves your boundary, and every operation is auditable.
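To make the layers concrete, here is an illustrative single-node sketch in Compose form. This is not the actual distribution—image names, ports, and environment variables are placeholders; real deployments use the vendor-provided manifests or Helm chart:

```yaml
# Illustrative single-node sketch only; names and keys are hypothetical.
services:
  mindsdb:
    image: registry.internal/mindsdb/mindsdb:latest   # mirrored into your registry
    ports:
      - "127.0.0.1:8080:8080"     # UI/API, fronted by your own proxy/ingress
    environment:
      METADATA_DB_URL: postgresql://mindsdb:${DB_PASSWORD}@metadata-db:5432/mindsdb
      LLM_ENDPOINT: https://llm-gateway.internal/v1   # customer-selected endpoint
    depends_on:
      - metadata-db
  metadata-db:
    image: postgres:16            # customer-controlled metadata/config store
    environment:
      POSTGRES_DB: mindsdb
      POSTGRES_USER: mindsdb
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - metadata:/var/lib/postgresql/data
volumes:
  metadata: {}
```

Note that the only persistent state lives in the metadata database you operate; business data stays in its source systems.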


Step-by-step: Deploying mindSDB Teams in your VPC

This section assumes a typical cloud-native setup (e.g., AWS VPC + EKS). Adapt the mechanics to your cloud of choice.

1. Plan your network placement

  • Decide on subnet placement
    • Deploy mindSDB pods in private subnets with:
      • Outbound access to data sources within the VPC.
      • Optional controlled outbound access to LLM APIs or private model endpoints.
  • Security groups / network policies
    • Allow inbound traffic:
      • From your ingress/load balancer to mindSDB service.
    • Allow outbound traffic:
      • To your databases (default ports or custom).
      • To storage (S3 endpoints, internal NFS, DMS).
      • To LLM endpoints or gateways.

This is where you enforce the “trust boundary”—nothing about the architecture requires public data egress.
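One way to enforce that boundary on Kubernetes is a NetworkPolicy that default-denies egress from the mindSDB pods and allows only the destinations listed above. The namespace, labels, CIDRs, and ports below are placeholders for your environment:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mindsdb-egress
  namespace: mindsdb              # assumed namespace
spec:
  podSelector:
    matchLabels:
      app: mindsdb                # assumed pod label
  policyTypes: ["Egress"]
  egress:
    - to:                         # in-VPC database subnet (placeholder CIDR)
        - ipBlock:
            cidr: 10.0.20.0/24
      ports:
        - protocol: TCP
          port: 5432
    - to:                         # private LLM endpoint or egress gateway
        - ipBlock:
            cidr: 10.0.30.0/24
      ports:
        - protocol: TCP
          port: 443
    - ports:                      # DNS resolution
        - protocol: UDP
          port: 53
```

Ingress stays governed by your load balancer and security groups; this policy only constrains what mindSDB can reach.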

2. Provision infrastructure

  • Kubernetes cluster (preferred)

    • Create or use an existing EKS/AKS/GKE cluster.
    • Ensure cluster has:
      • Node groups with enough CPU/RAM for:
        • Cognitive engine workloads.
        • Embedding generation for Knowledge Bases.
      • Storage classes for persistent volumes (metadata DB, embedding indexes, etc.).
  • Metadata database

    • Provision PostgreSQL or MySQL inside your VPC.
    • Restrict access to mindSDB service IP ranges / security groups only.
  • Object storage (for Knowledge Base, if you choose)

    • S3/GCS/Blob or internal object store.
    • Optional: bucket/prefix dedicated to mindSDB’s derived artifacts (embeddings, chunk metadata), still within your environment.

3. Deploy mindSDB containers

mindSDB Teams (Deploy Anywhere) is delivered as container images with configuration templates. The exact commands will be provided as part of your enterprise onboarding, but the flow is:

  • Pull images into your registry

    • Mirror mindSDB images into your private container registry (ECR, ACR, GCR, or on‑prem registry).
  • Apply deployment manifests / Helm chart

    • Configure:
      • Image registry / tags.
      • Service type (ClusterIP vs LoadBalancer; typically ClusterIP + Ingress).
      • Environment variables for:
        • Metadata DB connection.
        • LLM endpoints.
        • Feature flags (logging verbosity, validation modes).
    • Deploy via:
      kubectl apply -f mindsdb-deploy.yaml
      
      or
      helm install mindsdb mindsdb/mindsdb -f values.yaml
      
  • Expose the service

    • Configure Ingress to:
      • Terminate TLS at your ingress controller.
      • Route / to the mindSDB web UI and API service.
    • Optionally restrict by IP ranges or internal DNS only.
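As a sketch of what the values.yaml for the Helm install above might contain—the exact key schema comes from your enterprise onboarding materials, so treat every name here as a placeholder:

```yaml
# Hypothetical values.yaml sketch; use the schema shipped with your chart.
image:
  repository: 123456789.dkr.ecr.us-east-1.amazonaws.com/mindsdb/mindsdb
  tag: "x.y.z"                    # pin a tested tag, not "latest"
service:
  type: ClusterIP                 # expose via Ingress, not a public LoadBalancer
ingress:
  enabled: true
  hosts:
    - mindsdb.internal.yourdomain.com
  tls:
    - secretName: mindsdb-tls     # terminate TLS at your ingress controller
env:
  METADATA_DB_URL: postgresql://mindsdb@metadata.internal:5432/mindsdb
  LLM_ENDPOINT: https://llm-gateway.internal/v1
  LOG_LEVEL: info
  VALIDATION_MODE: strict         # e.g. require validation before execution
```

Keeping the service as ClusterIP plus Ingress means nothing is reachable except through the TLS-terminating entry point you control.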

4. Integrate with your IdP (SSO / LDAP)

With the service reachable inside your VPC, wire up authentication:

  • SSO (SAML/OIDC)

    • Register mindSDB as an application in Okta, Azure AD, Ping, or your IdP.
    • Configure:
      • Callback URLs (e.g., https://mindsdb.yourdomain.com/auth/callback).
      • Group/role mappings so enterprise roles are available inside mindSDB.
    • Enable SSO in mindSDB config (SSO URL, client ID/secret, certificate, etc.).
  • LDAP/AD (if required)

    • Provide LDAP servers, base DN, and group filters.
    • Limit access to specific organizational units or groups.

Once done, your users can sign in with existing enterprise credentials—no separate user management is required.
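An OIDC integration of this shape might be expressed as follows—field names are illustrative, not the actual configuration schema, and the issuer URL depends on your IdP:

```yaml
# Hypothetical SSO configuration sketch; field names are placeholders.
auth:
  oidc:
    issuer: https://idp.yourdomain.com          # Okta, Azure AD, Ping, etc.
    clientId: mindsdb-teams
    clientSecretRef: mindsdb-oidc-secret        # stored in your secret manager
    redirectUri: https://mindsdb.yourdomain.com/auth/callback
    groupsClaim: groups                         # map IdP groups to roles
  roleMappings:
    mindsdb-admins: admin
    data-owners: data_owner
    analytics-users: business_user
```

The group-to-role mapping is what lets you manage mindSDB access entirely from your existing IdP groups.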

5. Connect data sources with query‑in‑place execution

Now bring your data into scope—without moving it:

  • Relational and analytical databases

    • From the mindSDB UI or API, define connections to:
      • MySQL / PostgreSQL / MS SQL Server inside your VPC.
      • Snowflake, BigQuery, Redshift, or other warehouses (ideally via private endpoints).
    • For each connection:
      • Use service accounts with least-privilege access.
      • Restrict to allowed schemas / tables.
  • SaaS applications

    • Add connectors for Salesforce, HubSpot, Zendesk, etc.
    • Configure OAuth / API keys with read-only scopes where possible.
  • Document repositories / file systems

    • Configure a Knowledge Base pointing to:
      • S3/GCS/Blob buckets.
      • On‑prem file shares or DMS endpoints.
      • SharePoint, Google Drive, or similar.
    • Enable:
      • Chunking and embedding generation.
      • AutoSync for continuous updates as documents change.
      • Native permission inheritance, so users only see what their source system permissions allow.

This is where mindSDB’s “over 200 data connectors” matter: you eliminate ETL and get cross-system analytics by querying everything in place.
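In practice you create connections from the UI or API, but a declarative view of the least-privilege pattern described above could look like this—every name and field here is illustrative:

```yaml
# Hypothetical connection definitions; names and fields are illustrative only.
connections:
  - name: revops_postgres
    engine: postgres
    host: pg.revops.internal          # private endpoint, no public egress
    port: 5432
    database: revops
    user: mindsdb_readonly            # least-privilege service account
    passwordRef: revops-pg-secret
    allowedSchemas: [sales, billing]  # restrict scope explicitly
knowledgeBases:
  - name: support_docs
    source: s3://acme-support-docs/   # placeholder bucket
    autoSync: true                    # re-embed as documents change
    inheritPermissions: true          # respect source-system ACLs
```

The two knobs that matter most for governance are the read-only service account and permission inheritance on the Knowledge Base.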

6. Configure LLMs and validation policies

To keep AI outputs trustworthy:

  • Choose your LLM endpoints
    • Private models in your VPC (e.g., on GPU nodes or an internal model-serving platform).
    • Vendor APIs (OpenAI, Anthropic, etc.) if allowed; configured with your API keys.
  • Set validation and execution controls
    • Multi-phase validation for:
      • SQL correctness and safety before execution.
      • Plan consistency vs schema.
    • Governance settings to:
      • Require human review of generated SQL for privileged schemas.
      • Log all reasoning steps and SQL statements to your observability stack.

This config is how you enforce “trust and verify” while still moving from question to answer in seconds.
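A governance configuration along these lines might be sketched as follows—the keys are hypothetical, but they map one-to-one onto the controls described above:

```yaml
# Hypothetical governance configuration; keys are illustrative.
llm:
  endpoint: https://llm.internal:8443/v1   # private model endpoint in your VPC
  apiKeyRef: llm-gateway-key               # your own key, from your secret store
validation:
  sqlSafetyChecks: true           # block unsafe statements before execution
  schemaConsistencyCheck: true    # verify the plan against the live schema
  approveBeforeRun:
    schemas: [finance, hr]        # human review for privileged schemas
audit:
  logReasoningSteps: true         # plan, SQL, validation, execution
  sink: https://siem.internal:8088   # ship to your observability/SIEM stack
```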

7. Roll out to users and monitor

Finally, open the gates:

  • Onboard teams

    • Invite users via SSO groups.
    • Create spaces/workspaces by team (Finance, RevOps, Support).
    • Pre-configure common questions, dashboards, and saved queries if desired.
  • Monitor and iterate

    • Track:
      • Response latency and success rates.
      • Embedding freshness and retrieval accuracy for Knowledge Bases.
      • Model usage and cost where applicable.
    • Use logs to debug:
      • Misinterpreted business terms—then enrich schema semantics (e.g., teach “cases,” “tickets,” “projects”).
      • Edge-case queries—then adjust validation rules or permissions.

At this point, you have full-stack conversational analytics running inside your VPC, directly on your production data, without BI backlogs.
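If you run the Prometheus Operator, the signals above can be turned into alerts with a PrometheusRule. The metric names below are placeholders—substitute whatever your deployment actually exports:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mindsdb-alerts
  namespace: mindsdb
spec:
  groups:
    - name: mindsdb.availability
      rules:
        - alert: MindsdbHighErrorRate
          # metric names are hypothetical
          expr: |
            sum(rate(mindsdb_requests_failed_total[5m]))
              / sum(rate(mindsdb_requests_total[5m])) > 0.05
          for: 10m
          labels:
            severity: warning
        - alert: MindsdbSlowResponses
          expr: |
            histogram_quantile(0.95,
              sum(rate(mindsdb_request_duration_seconds_bucket[5m])) by (le)) > 10
          for: 15m
          labels:
            severity: warning
```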


Step-by-step: Deploying mindSDB Teams on‑prem

On‑prem deployments use the same architecture but run in your data center rather than a cloud VPC.

1. Prepare your on‑prem environment

  • Compute cluster
    • Kubernetes / OpenShift on bare metal or VMs, or
    • A VM cluster with Docker for smaller deployments.
  • Network
    • VLANs and routing to:
      • Database servers (MySQL, PostgreSQL, MS SQL Server, etc.).
      • File servers, NFS shares, on‑prem DMS.
    • Optional outbound proxy or gateway if you permit external LLM APIs.

2. Deploy containers inside your data center

  • Use your internal registry
    • Mirror mindSDB images into your on‑prem registry.
  • Apply manifests / Helm chart
    • Same pattern as VPC:
      • Metadata DB running on an on‑prem RDBMS.
      • Ingress through your corporate gateway / load balancer.
      • TLS terminated at your reverse proxy.

3. Integrate with LDAP/AD and SSO

  • Connect mindSDB to your on‑prem identity layer:
    • LDAP/AD for user lookup and group membership.
    • Optional SAML SSO integrated with your internal IdP.
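A directory integration of that shape might be configured as follows—field names are illustrative, and the DNs are placeholders for your own directory layout:

```yaml
# Hypothetical LDAP/AD configuration sketch; field names are placeholders.
auth:
  ldap:
    url: ldaps://ad.corp.internal:636        # LDAPS, not plain LDAP
    bindDnRef: mindsdb-ldap-bind             # service account in your secret store
    userBaseDn: OU=Employees,DC=corp,DC=internal
    userFilter: (objectClass=person)
    groupBaseDn: OU=Groups,DC=corp,DC=internal
    # limit access to a specific group, as described above
    groupFilter: (memberOf=CN=mindsdb-users,OU=Groups,DC=corp,DC=internal)
```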

4. Connect on‑prem data sources

  • Databases
    • Connect to your on‑prem MySQL, PostgreSQL, MS SQL Server, Oracle, etc.
    • Use service accounts scoped per team or schema.
  • Document repositories
    • Point Knowledge Bases to:
      • On‑prem file shares (SMB/NFS).
      • Internal DMS (via APIs or connectors).
    • Ensure network firewall rules allow mindSDB to access these shares.

5. Optional: Bridge to cloud LLMs or models

If your policy allows:

  • Use an outbound proxy
    • Route LLM requests through a controlled egress point with logging.
  • Or host models on‑prem
    • Run them in your own GPU cluster and configure mindSDB to call those endpoints.
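The egress-proxy option can use the standard `HTTPS_PROXY`/`NO_PROXY` environment convention in the container spec; the proxy address and exclusion list below are placeholders:

```yaml
# Deployment fragment: route LLM traffic through a controlled egress proxy
# via the standard proxy environment variables; addresses are placeholders.
spec:
  template:
    spec:
      containers:
        - name: mindsdb
          env:
            - name: HTTPS_PROXY
              value: http://egress-proxy.dmz.internal:3128
            - name: NO_PROXY
              # keep in-datacenter traffic (DBs, file shares, cluster DNS) direct
              value: .corp.internal,.svc,.cluster.local,10.0.0.0/8
```

Terminating all external LLM traffic at one logged egress point gives you a single place to audit and, if needed, revoke outbound access.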


Governance and compliance in VPC and on‑prem deployments

For both VPC and on‑prem, the governance story is the same:

  • Data residency

    • All business data stays in the original databases and storage systems.
    • mindSDB performs query‑in‑place execution—no data movement or centralized warehouse required.
  • Access control

    • RBAC for which users can:
      • Add or modify data sources.
      • Create Knowledge Bases.
      • Run or schedule sensitive queries.
    • Native permissions for Knowledge Bases:
      • Documents inherit access rules from their source system (SharePoint, Google Drive, file shares, etc.).
  • Auditability

    • Every question generates:
      • A plan (logged).
      • Generated SQL (logged and reviewable).
      • Validation steps and outcomes (logged).
      • Execution metadata (who ran what, when, and against which sources).
    • Logs can be shipped to your SIEM for unified security monitoring.
  • Data quality and verification

    • Multi-phase validation ensures:
      • SQL is schema-aware and safe before hitting live systems.
      • Answers are returned with citations and links back to source data, so analysts can verify quickly.

This is the backbone of running AI analytics where AI is a decision support layer, not an unreviewed decision maker.


Common deployment patterns and best practices

Pattern 1: Central AI analytics layer in a VPC

  • Deploy mindSDB in a central analytics VPC.
  • Connect:
    • Production databases via VPC peering.
    • Data warehouse (Snowflake, BigQuery).
    • Document storage buckets and SaaS tools.
  • Use SSO and group-based RBAC to partition access across teams.

Pattern 2: On‑prem mindSDB with hybrid data

  • mindSDB runs fully on‑prem.
  • Connect to:
    • On‑prem databases and file systems directly.
    • Cloud warehouses via private interconnect or secure tunnels, if allowed.
  • LLM requests are either:
    • Routed through a controlled egress gateway, or
    • Served by on‑prem model endpoints.

Best practices

  • Start with read-only accounts.
    Use read-only database roles until you are ready to enable writeback or operational workflows.

  • Scope by business domain first.
    Start with one or two teams (e.g., RevOps + Support), connect their critical systems, and expand from there.

  • Review and tune the cognitive engine.
    As you see how mindSDB interprets your business language, refine:

    • Schema descriptions.
    • Business term mappings (“MRR,” “ARR,” “churn,” “tickets,” “cases”).

  • Measure time-to-insight.
    Track the delta between legacy BI workflows (days to build dashboards) vs conversational analytics (minutes to ask, verify, and decide).


From first deployment to production rollout

A typical path for mindSDB Teams (Deploy Anywhere) in a VPC or on‑prem environment looks like this:

  1. Week 1:

    • Stand up the environment (containers, DB, ingress).
    • Wire SSO/LDAP.
    • Connect a couple of core data sources.
  2. Weeks 2–3:

    • Onboard a pilot team.
    • Build initial Knowledge Bases over key document stores.
    • Tune validation and governance rules.
  3. Weeks 4+:

    • Roll out to additional teams (Finance, Product, Support, Risk).
    • Add proactive monitoring and scheduled reporting.
    • Start embedding mindSDB into internal tools or customer-facing apps via API/OEM options.

You move from “We’re waiting days for dashboards” to “We ask in English or SQL and get citation-backed, cross-system answers in seconds”—all inside your VPC or data center.


Final thoughts

Deploying mindSDB Teams (Deploy Anywhere) in your VPC or on‑prem environment is ultimately about one thing: bringing AI directly to where your data already lives, under your governance, with no ETL sprawl and no black-box decisions.

If you want a detailed deployment plan tailored to your specific stack—cloud vs on‑prem, MySQL vs Snowflake vs BigQuery, document systems, and LLM constraints—the next step is straightforward:

Get Started