How do I deploy mindSDB Teams (Deploy Anywhere) in our VPC or on‑prem environment?

Most teams exploring mindSDB Teams (Deploy Anywhere) want the same thing: AI-powered analytics that live inside their existing data stack, not another SaaS silo. That’s exactly what the Deploy Anywhere model is designed for—run mindSDB fully within your VPC or on‑prem environment, keep data inside your trust boundary, and still give your organization conversational analytics and document intelligence across MySQL, PostgreSQL, Snowflake, BigQuery, Salesforce, and more.

This guide walks through how to plan, deploy, and operate mindSDB Teams (Deploy Anywhere) in your own infrastructure, with a focus on security, governance, and fast time‑to‑value.


What “Deploy Anywhere” Actually Means

Deploy Anywhere is the enterprise deployment pattern for mindSDB Teams where:

  • mindSDB runs in your infrastructure
    • Your private cloud VPC (AWS, GCP, Azure)
    • Your on‑prem data center (VMs, Kubernetes, bare metal)
  • Your data never leaves your trust boundary
    • No data movement to mindSDB‑hosted services
    • Query‑in‑place against databases, warehouses, CRMs, and file stores
  • You keep full control over governance
    • SSO/LDAP, RBAC, audit logs, native/source permissions

At a high level, Deploy Anywhere means:

“mindSDB comes to your data — not the other way around.”


Core Architecture: How mindSDB Runs in Your VPC or On‑Prem

mindSDB is an AI Business Insights Solution built as a set of deployable services:

  • Core application / API layer
    • Orchestrates natural language → plan → SQL → execution
    • Manages user sessions, permissions, and query history
  • Cognitive engine
    • Translates English (or SQL) into executable query plans
    • Generates and validates SQL against your schemas
    • Routes to LLMs you configure (can be self‑hosted or cloud endpoints)
  • Connector layer (200+ data sources)
    • Direct connections to MySQL, PostgreSQL, Snowflake, BigQuery, MS SQL Server, Salesforce, and others
    • Query‑in‑place: no ETL pipelines, no replicas required
  • Knowledge Base & document intelligence
    • Connects to object storage, DMS, file systems, cloud drives
    • Handles chunking, embeddings, metadata extraction, AutoSync, and native permissions
  • UI & OEM layer
    • Web application for conversational analytics and insights
    • Customizable UI & embeddable components for ISVs/OEM

In a Deploy Anywhere setup, all of these pieces run as containers or services inside your network, talking directly to your existing data sources.


Pre‑Deployment Checklist

Before you deploy in your VPC or on‑prem environment, align on the following:

1. Deployment Model

Decide where mindSDB will live:

  • Kubernetes (recommended for scale)
    • EKS, GKE, AKS, or on‑prem K8s (Rancher, OpenShift, bare metal)
    • Ideal for HA, auto‑scaling, and multi‑AZ setups
  • Container deployment (non‑K8s)
    • Docker or container‑orchestration on VMs
    • Good for smaller teams or POCs
  • Bare‑metal / VM services
    • Systemd, Docker Compose, or similar
    • Works when containers are constrained, but the containerized path is recommended

2. Network & Security Assumptions

Confirm:

  • mindSDB services will run inside your VPC / internal network
  • Outbound access is allowed only if you choose external LLM endpoints
  • Security controls you’ll integrate:
    • SSO (SAML/OIDC) or LDAP
    • Network segmentation & security groups
    • TLS certificate management (ingress / load balancer)

3. Data Sources & Permissions

List the systems you’ll connect on day one:

  • Databases & warehouses (e.g., MySQL, PostgreSQL, MS SQL Server, Snowflake, BigQuery)
  • SaaS systems (e.g., Salesforce, HubSpot)
  • File/document stores (e.g., S3, GCS, Azure Blob, on‑prem NAS, SharePoint, Google Drive)

For each source:

  • Create least‑privilege service accounts
  • Confirm read‑only where appropriate (especially for analytics)
  • Validate network routes from mindSDB to these systems
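
As a minimal sketch of the route check above, the following Python snippet tests whether a TCP connection can be opened from the host where mindSDB will run to each source. The hostnames and ports are placeholders, not values supplied by mindSDB. A successful connection only shows that the network path is open; it says nothing about credentials or permissions.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_sources(sources: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Check every named source and return a name -> reachable mapping."""
    return {name: can_reach(host, port) for name, (host, port) in sources.items()}


# Hypothetical day-one sources; replace with your own hosts and ports.
EXAMPLE_SOURCES = {
    "postgres": ("pg.internal.example", 5432),
    "mysql": ("mysql.internal.example", 3306),
}
```

Running check_sources(EXAMPLE_SOURCES) from a node or VM in the target subnet gives a quick pass/fail list to share with the network team before any connector is configured.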

4. Identity & Access Management

Plan how users will authenticate:

  • SSO via SAML/OIDC (Okta, Azure AD, Google Workspace, Ping, etc.)
  • LDAP / Active Directory
  • Role setup:
    • Admins (platform configuration, connectors, RBAC)
    • Data stewards (governance, schema exposure)
    • Business users (query access, project‑level access)

High‑Level Deployment Flow in Your VPC or On‑Prem

At a bird’s‑eye view, deployment follows this path:

  1. Provision infrastructure (K8s cluster or VM fleet) within your VPC/on‑prem
  2. Deploy the mindSDB core services using containers or Helm charts
  3. Configure networking & TLS (ingress, load balancer, certificates)
  4. Integrate identity providers (SSO/LDAP) and set up RBAC
  5. Connect data sources via secure connectors
  6. Configure LLM endpoints (bring your own, or use cloud endpoints you control)
  7. Enable Knowledge Base & AutoSync for document repositories
  8. Set up logging & observability to meet governance requirements
  9. Roll out to pilot users with a governance‑driven onboarding

Step‑by‑Step: Deploying mindSDB in Your Environment

Step 1: Provision Compute & Storage

For Kubernetes deployments:

  • Choose your cluster:
    • AWS EKS, GCP GKE, Azure AKS, or on‑prem K8s
  • Sizing guidance:
    • Start with a small node pool (e.g., 3–5 nodes)
    • Support horizontal pod autoscaling for the core API and cognitive engine
  • Storage:
    • Use a managed storage class for persistent volumes (PostgreSQL metadata, logs, caches)
    • Optionally, dedicated volumes for long‑term logs and observability
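
For rough capacity planning, it can help to see how the Kubernetes Horizontal Pod Autoscaler arrives at a replica count. The sketch below reproduces the formula from the Kubernetes documentation, desired = ceil(current × currentMetric / targetMetric), with the default 10% tolerance. The replica bounds and the numbers in the usage note are illustrative assumptions, not mindSDB sizing guidance.

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA scaling decision for a single metric."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current
    wanted = math.ceil(current * ratio)
    # Clamp to the configured bounds (assumed values here).
    return max(min_replicas, min(max_replicas, wanted))
```

For example, three API pods averaging 90% CPU against a 60% target would scale to desired_replicas(3, 90, 60), which is five pods.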

For VM / container deployments:

  • Dedicated app servers (e.g., 2–4 VMs to start)
  • Reverse proxy/load balancer in front (NGINX, HAProxy, ALB/ELB, etc.)
  • Centralized logging and metrics (CloudWatch, Prometheus, ELK, Datadog, etc.)

Step 2: Install mindSDB Services

Your mindSDB account team will provide artifacts appropriate for your setup (e.g., containers, Helm charts, or install scripts). At a high level:

  • Pull container images from the approved registry (private or vendor‑provided)
  • Deploy core components:
    • API / UI service
    • Cognitive engine service
    • Connector services
    • Metadata database (if not using your own)
    • Job scheduler for AutoSync and recurring tasks

For Kubernetes, this typically means:

  • Apply Helm chart or Kubernetes manifests
  • Validate:
    • All pods in Running state
    • Services reachable within the cluster
    • Ingress configured for external access
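
The first validation item can be scripted. The sketch below assumes you have captured the output of kubectl get pods -n <namespace> -o json (the namespace is whatever you chose for mindSDB) and reports every pod whose phase is neither Running nor Succeeded. The function name is hypothetical; the items, metadata.name, and status.phase fields are part of the standard Kubernetes pod list.

```python
import json


def pods_not_running(kubectl_json: str) -> list[str]:
    """Return names of pods whose phase is neither Running nor Succeeded."""
    data = json.loads(kubectl_json)
    unhealthy = []
    for pod in data.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        # Succeeded is normal for completed one-off jobs.
        if phase not in ("Running", "Succeeded"):
            unhealthy.append(pod["metadata"]["name"])
    return unhealthy
```

Note that a Running phase does not guarantee the containers are ready to serve traffic, so readiness probes and a request against the ingress remain worthwhile follow-up checks.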

For VM-based deployments:

  • Run Docker Compose or an equivalent orchestrator
  • Bind services to internal network interfaces
  • Configure systemd services for auto‑start and health monitoring

Step 3: Configure Ingress, TLS, and Internal Access

Expose mindSDB only as broadly as needed:

  • Ingress / Load Balancer:
    • Internal ALB/ELB (AWS), internal load balancer (GCP/Azure), or on‑prem LB
    • Map a friendly internal domain: e.g., mindsdb.yourcompany.internal
  • TLS / Certificates:
    • Use your internal CA or ACM/managed certificates
    • Enforce HTTPS‑only access
  • Network Restrictions:
    • Restrict access to corporate IP ranges / VPN
    • Apply security group rules to limit inbound ports and outbound destinations
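
Once HTTPS is enforced, an expired certificate takes the whole UI offline, so a small expiry check is worth scheduling. The sketch below uses only the Python standard library. The hostname you pass in (for example the internal domain you mapped) and the optional CA bundle path are your own values; nothing here is a mindSDB API.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, cafile: Optional[str] = None) -> str:
    """Return the server certificate's notAfter string.

    If cafile is given (e.g. your internal CA bundle), only that bundle is trusted.
    """
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Convert a notAfter string such as 'Jan  5 09:34:43 2018 GMT' to days left."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400
```

A monitoring job could call days_until_expiry(fetch_not_after(host, cafile=path)) and alert when the result drops below a threshold you choose.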

Step 4: Integrate SSO or LDAP

mindSDB Teams (Deploy Anywhere) supports:

  • Single Sign‑On (SSO)
    • SAML or OIDC with providers like Okta, Azure AD, Google Workspace, Ping
  • LDAP / Active Directory
    • Direct integration with your directory for auth and group mapping

Configuration steps typically include:

  1. Create a new “mindSDB” application in your IdP
  2. Set callback/redirect URLs pointing to your internal mindSDB domain
  3. Configure attribute mappings (NameID, email, groups, roles)
  4. Import IdP metadata into mindSDB
  5. Map groups to roles: admin, data steward, business user

Result: unlimited users in your org can log in via SSO, while you retain centralized identity and access management.
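
The group‑to‑role mapping in step 5 can be reasoned about as a small lookup with a deny‑by‑default rule. The following is a sketch of that logic only: the IdP group names are invented, and how mindSDB stores or evaluates the mapping is not shown here.

```python
from typing import Optional

# Hypothetical IdP group names mapped to the three roles named above.
GROUP_TO_ROLE = {
    "mindsdb-admins": "admin",
    "data-governance": "data_steward",
    "analytics-users": "business_user",
}

# Roles ranked from least to most privileged.
ROLE_RANK = {"business_user": 0, "data_steward": 1, "admin": 2}


def resolve_role(groups: list[str]) -> Optional[str]:
    """Return the most privileged role granted by the user's groups, or None."""
    roles = [GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE]
    if not roles:
        return None  # deny by default: unmapped users get no role
    return max(roles, key=ROLE_RANK.__getitem__)
```

Picking the highest‑ranked role is one possible policy; some teams prefer to reject users who match more than one mapped group so that conflicts are resolved in the directory instead.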

Step 5: Connect Databases and Data Warehouses

Once authentication is set up, connect your data where it already lives.

Typical first connections:

  • Operational databases: MySQL, PostgreSQL, MS SQL Server
  • Analytics warehouses: Snowflake, BigQuery, Redshift
  • Application data: Salesforce, HubSpot, other CRM/ERP systems

For each connection:

  1. Create a dedicated DB/service user with least‑privilege access
  2. Allowlist the mindSDB cluster/VM IPs or security groups on the source’s firewall or network policy
  3. Configure connection in mindSDB:
    • Host, port, database name
    • Username/password or managed secrets
    • Optional SSL parameters/certs
  4. Validate:
    • mindSDB can introspect schema
    • Test queries succeed and respect permissions
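
To keep credentials out of scripts and logs while you prepare these connections, the parameters can be loaded from environment variables or a secrets manager that populates them. The sketch below is an assumption about how you might stage the values before entering them in mindSDB; the variable naming scheme (PREFIX_HOST, PREFIX_PORT, and so on) and the sslmode default are hypothetical.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceConfig:
    host: str
    port: int
    database: str
    user: str
    password: str = field(repr=False)  # keep the secret out of logs and tracebacks
    sslmode: str = "require"


def load_source_config(prefix: str, env: Mapping[str, str] = os.environ) -> SourceConfig:
    """Build a SourceConfig from PREFIX_* variables, failing fast on gaps."""
    required = ["HOST", "PORT", "DATABASE", "USER", "PASSWORD"]
    missing = [f"{prefix}_{key}" for key in required if not env.get(f"{prefix}_{key}")]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return SourceConfig(
        host=env[f"{prefix}_HOST"],
        port=int(env[f"{prefix}_PORT"]),
        database=env[f"{prefix}_DATABASE"],
        user=env[f"{prefix}_USER"],
        password=env[f"{prefix}_PASSWORD"],
        sslmode=env.get(f"{prefix}_SSLMODE", "require"),
    )
```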

Because mindSDB executes queries in place, there’s:

  • No ETL required
  • No data movement or replication
  • No new warehouse to maintain

This is central to the “Deploy Anywhere” promise: your data stays inside MySQL, Snowflake, BigQuery, and your other systems, and mindSDB only sends SQL to them.

Step 6: Configure Knowledge Base for Documents

For unstructured content, configure the Knowledge Base:

  • Connect storage:
    • S3, GCS, Azure Blob
    • On‑prem file servers, NAS, or DMS
    • SharePoint, Google Drive, other repository systems
  • Enable AutoSync:
    • mindSDB crawls document locations on a schedule
    • Updates embeddings and metadata as content changes
  • Preserve native permissions:
    • mindSDB respects source‑system ACLs
    • Users only see insights from documents they’re allowed to access
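
Conceptually, preserving native permissions means filtering retrieved content by the ACLs captured from the source system before anything is shown to the user. The sketch below illustrates that idea with a group‑based ACL; the Chunk type and its fields are hypothetical and do not describe mindSDB’s internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # groups copied from the source-system ACL


def visible_chunks(chunks: list[Chunk], user_groups: list[str]) -> list[Chunk]:
    """Keep only chunks whose ACL shares at least one group with the user."""
    groups = set(user_groups)
    return [chunk for chunk in chunks if chunk.allowed_groups & groups]
```

Filtering before answer generation, not after, matters here: a chunk the user cannot read should never reach the model that writes the response.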

Under the hood, the Knowledge Base:

  • Chunks documents (PDF, Word, HTML, text, etc.)
  • Extracts and attaches metadata (author, dates, categories)
  • Generates embeddings using models you configure
  • Tracks embedding freshness and retrieval accuracy
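
To make the chunking and freshness steps concrete, here is a minimal sketch of overlapping character‑based chunking plus a content fingerprint that a sync job could compare to decide whether a document needs new embeddings. Chunk size, overlap, and the use of SHA‑256 are assumptions for illustration; the source does not specify how mindSDB chunks documents or detects changes.

```python
import hashlib


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += size - overlap
    return chunks


def content_fingerprint(text: str) -> str:
    """Stable hash of the document content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_reembedding(stored_fingerprint: str, current_text: str) -> bool:
    """True when the content changed since the stored fingerprint was taken."""
    return stored_fingerprint != content_fingerprint(current_text)
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.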

Result: users can ask natural language questions across structured data and documents and get citation‑backed answers with source links.

Step 7: Bring Your Own LLMs (Optional but Recommended)

In a VPC / on‑prem deployment, you may want full control over model endpoints:

  • Use self‑hosted models in your own VPC or on‑prem (e.g., on Kubernetes or GPU nodes)
  • Use cloud model APIs reachable from your network with strict egress controls
  • Route different workloads to different models (e.g., SQL planning vs summarization)
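
Workload‑based routing can be pictured as a lookup table with one safety rule layered on top. In the sketch below, every endpoint name and URL is a placeholder, and the rule that sensitive requests fall back to a self‑hosted model is an example policy, not documented mindSDB behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEndpoint:
    name: str
    url: str
    self_hosted: bool


# Hypothetical endpoints; names and URLs are placeholders.
SQL_MODEL = ModelEndpoint("sql-planner", "http://llm-sql.internal.example/v1", True)
SUMMARY_MODEL = ModelEndpoint("summarizer", "https://api.llm-vendor.example/v1", False)
FALLBACK_MODEL = ModelEndpoint("general", "http://llm-general.internal.example/v1", True)

ROUTES = {"sql_planning": SQL_MODEL, "summarization": SUMMARY_MODEL}


def pick_endpoint(workload: str, sensitive: bool = False) -> ModelEndpoint:
    """Choose an endpoint by workload; sensitive requests stay on self-hosted models."""
    endpoint = ROUTES.get(workload, FALLBACK_MODEL)
    if sensitive and not endpoint.self_hosted:
        return FALLBACK_MODEL
    return endpoint
```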

mindSDB’s cognitive engine:

  • Plans multi‑step workflows (planning → generation → validation → execution)
  • Validates SQL before it hits your production systems
  • Logs reasoning and queries for auditing and troubleshooting
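
If you want an additional guard of your own in front of production systems, a coarse read‑only check is easy to add. The sketch below is a deliberately conservative text filter, not a SQL parser and not mindSDB’s validation logic: it accepts a single statement that starts with SELECT or WITH and contains no write or DDL keywords. It can reject harmless queries (for example, a string literal containing the word “drop”), and it does not replace read‑only database accounts.

```python
import re

# Keywords that indicate writes or schema changes.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke|merge)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Return True for a single SELECT/WITH statement with no write keywords."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input or more than one statement
    first_word = statement.split(None, 1)[0].lower()
    if first_word not in ("select", "with"):
        return False
    return FORBIDDEN.search(statement) is None
```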

You choose:

  • Which models run where
  • What data they can see
  • How they’re invoked and monitored

Governance, Logging, and Observability

Deploy Anywhere is built for environments where trust, auditability, and compliance are non‑negotiable.

Auditable, Transparent Execution

For every question asked:

  • The plan, SQL, and execution steps are logged
  • Users can inspect:
    • How a question was interpreted
    • SQL queries that ran against each data source
    • Which documents and rows were used as evidence
  • Administrators can trace behavior across:
    • Users and roles
    • Data sources
    • Time ranges
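
For teams forwarding this trail to a SIEM, one structured record per question is a convenient shape. The sketch below shows what such a record might contain, based on the items listed above; the field names and the JSON‑lines format are assumptions, not mindSDB’s actual log schema.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(user: str, role: str, question: str, plan: list[str],
                 sql: list[str], sources: list[str],
                 at: Optional[datetime] = None) -> str:
    """Serialize one question's execution trail as a single JSON log line."""
    timestamp = at or datetime.now(timezone.utc)
    record = {
        "timestamp": timestamp.isoformat(),
        "user": user,
        "role": role,
        "question": question,
        "plan": plan,        # interpreted steps, in order
        "sql": sql,          # statements sent to each data source
        "sources": sources,  # databases and documents used as evidence
    }
    return json.dumps(record, sort_keys=True)
```

Keeping user, source, and timestamp as top‑level fields makes the three trace dimensions above (users and roles, data sources, time ranges) directly filterable.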

This supports:

  • Internal audit trails
  • Compliance reviews
  • Root‑cause analysis for unexpected outputs

Data Quality and Validation

mindSDB’s multi‑phase pipeline:

  1. Schema understanding: introspects your databases and adapts to business terminology (“projects,” “tickets,” “cases”).
  2. Plan generation: creates an executable plan to answer the question.
  3. SQL generation & validation: generates SQL and validates it against your schemas and constraints.
  4. Execution & response: runs queries in your databases/warehouses and returns a response with citations.
  5. Continuous evaluation: tracks retrieval accuracy, latency, and embedding freshness.

This makes analytics not just fast, but defensible — especially important when decisions carry financial, operational, or regulatory risk.

RBAC and Least Privilege

Inside your VPC/on‑prem environment, pair SSO with:

  • Role‑based access control (RBAC) at:
    • Project/workspace level
    • Data source level
    • Document collection level
  • Governance patterns:
    • Separate “sandbox” vs “production” projects
    • Limit who can create/edit data sources
    • Route sensitive workloads to specific model endpoints
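
The three RBAC levels can be modeled as grants keyed by role and scope, with anything not explicitly granted denied. This is a sketch of the access model only; the project, data source, and collection names, and the action names, are invented for illustration.

```python
# (role, scope type, scope id) -> allowed actions. All names are hypothetical.
GRANTS = {
    ("business_user", "project", "sales-sandbox"): {"query"},
    ("business_user", "collection", "hr-policies"): {"query"},
    ("data_steward", "datasource", "warehouse"): {"query", "edit_schema"},
    ("admin", "datasource", "warehouse"): {"query", "edit_schema", "edit_connection"},
}


def is_allowed(role: str, scope_type: str, scope_id: str, action: str) -> bool:
    """Deny by default: an action is allowed only if an explicit grant lists it."""
    return action in GRANTS.get((role, scope_type, scope_id), set())
```

Restricting edit_connection to admins mirrors the governance pattern of limiting who can create or edit data sources.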

On‑Prem vs VPC: What Changes and What Stays the Same

The overall deployment pattern is similar, but a few details differ.

VPC Deployment Highlights

  • Run on managed Kubernetes (EKS/GKE/AKS) or VM fleets
  • Use cloud‑native services:
    • Managed databases for metadata
    • Cloud logging (CloudWatch, Google Cloud Logging (formerly Stackdriver), Azure Monitor)
    • Managed load balancers and certificates
  • Control data residency by region and VPC boundaries
  • Integrate directly with private subnets and service endpoints

On‑Prem Deployment Highlights

  • Run on your data center infrastructure:
    • On‑prem Kubernetes
    • VM clusters
    • Bare metal with container runtimes
  • Integrate with:
    • On‑prem Active Directory/LDAP
    • On‑prem SIEM and logging
    • Existing LBs and firewalls
  • Useful when:
    • Data residency laws require local hosting
    • Regulatory policies forbid cloud services
    • You already operate a mature on‑prem stack

In both cases, the core principles hold:

  • No data movement to mindSDB‑hosted infrastructure
  • Query‑in‑place execution against your systems
  • Full visibility into reasoning, SQL, and sources

Typical Go‑Live Timeline

Most teams can go from “we have a cluster” to “pilot users are asking questions” in 2–4 weeks, roughly:

  • Week 1
    • Infra provisioned, mindSDB services deployed
    • SSO/LDAP integrated
    • First data sources connected
  • Week 2
    • Knowledge Base configured for key document stores
    • Governance/RBAC tuned
    • Pilot group onboarded (analysts, ops, finance, support)
  • Weeks 3–4
    • Expand data coverage and projects
    • Add more LLM endpoints (if needed)
    • Build OEM/embed experiences inside your own apps

That is a fraction of the months, or even years, it can take to stitch together ETL pipelines, BI dashboards, and DIY AI features.


When to Choose Deploy Anywhere vs SaaS

Choose mindSDB Teams (Deploy Anywhere) when:

  • You have strict data residency or compliance requirements
  • Your data lives across multiple private systems (databases, warehouses, CRMs, on‑prem file stores)
  • You want unlimited users with SSO/LDAP and enterprise RBAC
  • You need to run everything inside your own VPC or data center
  • Explainability, auditability, and governance are mandatory

If you’re early in your journey or experimenting, you can start with community/open‑source deployments in containers, then move to the full Teams (Deploy Anywhere) model as requirements grow.


Final Takeaway

Deploying mindSDB Teams (Deploy Anywhere) in your VPC or on‑prem environment gives you:

  • AI‑powered analytics and document intelligence that run inside your trust boundary
  • No data movement, no ETL pipelines, and no extra warehouses to maintain
  • Full governance: SSO/LDAP, RBAC, audit logs, and native/source permissions
  • Transparent, citation‑backed answers with reviewable SQL and reasoning

You move from “slow, fragmented insights” to “real‑time, cross‑system answers” without compromising on security or compliance.

