How do I deploy mindSDB Teams (Deploy Anywhere) in our VPC or on‑prem environment?

Most teams exploring mindSDB Teams (Deploy Anywhere) want the same thing: AI-powered analytics that live inside their existing data stack, not another SaaS silo. The Deploy Anywhere model is designed for exactly that. You run mindSDB fully within your VPC or on‑prem environment, keep data inside your trust boundary, and still give your organization conversational analytics and document intelligence across MySQL, PostgreSQL, Snowflake, BigQuery, Salesforce, and more.

This guide walks through how to plan, deploy, and operate mindSDB Teams (Deploy Anywhere) in your own infrastructure, with a focus on security, governance, and fast time‑to‑value.

What “Deploy Anywhere” Actually Means

Deploy Anywhere is the enterprise deployment pattern for mindSDB Teams where:

mindSDB runs in your infrastructure
- Your private cloud VPC (AWS, GCP, Azure)
- Your on‑prem data center (VMs, Kubernetes, bare metal)
Your data never leaves your trust boundary
- No data movement to mindSDB‑hosted services
- Query‑in‑place against databases, warehouses, CRMs, and file stores
You keep full control over governance
- SSO/LDAP, RBAC, audit logs, native/source permissions

At a high level, Deploy Anywhere means:

“mindSDB comes to your data — not the other way around.”

Core Architecture: How mindSDB Runs in Your VPC or On‑Prem

mindSDB is an AI Business Insights Solution built as a set of deployable services:

Core application / API layer
- Orchestrates natural language → plan → SQL → execution
- Manages user sessions, permissions, and query history
Cognitive engine
- Translates English (or SQL) into executable query plans
- Generates and validates SQL against your schemas
- Routes to LLMs you configure (can be self‑hosted or cloud endpoints)
Connector layer (200+ data sources)
- Direct connections to MySQL, PostgreSQL, Snowflake, BigQuery, MS SQL Server, Salesforce, and others
- Query‑in‑place: no ETL pipelines, no replicas required
Knowledge Base & document intelligence
- Connects to object storage, DMS, file systems, cloud drives
- Handles chunking, embeddings, metadata extraction, AutoSync, and native permissions
UI & OEM layer
- Web application for conversational analytics and insights
- Customizable UI & embeddable components for ISVs/OEM

In a Deploy Anywhere setup, all of these pieces run as containers or services inside your network, talking directly to your existing data sources.

Pre‑Deployment Checklist

Before you deploy in your VPC or on‑prem environment, align on the following:

1. Deployment Model

Decide where mindSDB will live:

Kubernetes (recommended for scale)
- EKS, GKE, AKS, or on‑prem K8s (Rancher, OpenShift, bare metal)
- Ideal for HA, auto‑scaling, and multi‑AZ setups
Container deployment (non‑K8s)
- Docker or container‑orchestration on VMs
- Good for smaller teams or POCs
Bare‑metal / VM services
- Systemd, Docker Compose, or similar
- Works when containers are constrained, but the recommended path is containerized

2. Network & Security Assumptions

Confirm:

mindSDB services will run inside your VPC / internal network
Outbound access is allowed only if you choose external LLM endpoints
Security controls you’ll integrate:
- SSO (SAML/OIDC) or LDAP
- Network segmentation & security groups
- TLS certificate management (ingress / load balancer)

3. Data Sources & Permissions

List the systems you’ll connect on day one:

Databases & warehouses (e.g., MySQL, PostgreSQL, MS SQL Server, Snowflake, BigQuery)
SaaS systems (e.g., Salesforce, HubSpot)
File/document stores (e.g., S3, GCS, Azure Blob, on‑prem NAS, SharePoint, Google Drive)

For each source:

Create least‑privilege service accounts
Confirm read‑only where appropriate (especially for analytics)
Validate network routes from mindSDB to these systems
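
The route validation above can be done before any mindSDB component is installed. Here is a minimal sketch, assuming you run it from a host in the same subnet where mindSDB will live. The hostnames and ports in SOURCES are hypothetical placeholders, and a successful TCP connect only shows the route and firewall are open, not that credentials or TLS settings are right.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_sources(sources: dict[str, tuple[str, int]],
                  timeout: float = 3.0) -> dict[str, bool]:
    """Check every named source and return a name -> reachable mapping."""
    return {name: can_reach(host, port, timeout)
            for name, (host, port) in sources.items()}


# Hypothetical inventory: replace with your real internal hostnames and ports.
SOURCES = {
    "orders-postgres": ("pg.internal.example", 5432),
    "crm-mysql": ("mysql.internal.example", 3306),
}
```

Calling check_sources(SOURCES) from the target subnet gives a quick pass/fail list to hand to the network team.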

4. Identity & Access Management

Plan how users will authenticate:

SSO via SAML/OIDC (Okta, Azure AD, Google Workspace, Ping, etc.)
LDAP / Active Directory
Role setup:
- Admins (platform configuration, connectors, RBAC)
- Data stewards (governance, schema exposure)
- Business users (query access, project‑level access)

High‑Level Deployment Flow in Your VPC or On‑Prem

At a bird's‑eye view, deployment follows this path:

Provision infrastructure (K8s cluster or VM fleet) within your VPC/on‑prem
Deploy the mindSDB core services using containers or Helm charts
Configure networking & TLS (ingress, load balancer, certificates)
Integrate identity providers (SSO/LDAP) and set up RBAC
Connect data sources via secure connectors
Configure LLM endpoints (bring your own, or use cloud endpoints you control)
Enable Knowledge Base & AutoSync for document repositories
Set up logging & observability to meet governance requirements
Roll out to pilot users with a governance‑driven onboarding

Step‑by‑Step: Deploying mindSDB in Your Environment

Step 1: Provision Compute & Storage

For Kubernetes deployments:

Choose your cluster:
- AWS EKS, GCP GKE, Azure AKS, or on‑prem K8s
Sizing guidance:
- Start with a small node pool (e.g., 3–5 nodes)
- Support horizontal pod autoscaling for the core API and cognitive engine
Storage:
- Use a managed storage class for persistent volumes (PostgreSQL metadata, logs, caches)
- Optionally, dedicated volumes for long‑term logs and observability
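
To reason about how far horizontal pod autoscaling will scale the core API or cognitive engine, it helps to know the replica calculation Kubernetes documents for its Horizontal Pod Autoscaler. The sketch below reproduces that formula only; it ignores the tolerance band and stabilization windows the real controller applies, and the min/max defaults are illustrative assumptions.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """ceil(current_replicas * current_metric / target_metric), clamped to bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))


# 3 pods averaging 90% CPU against a 60% target scale out to 5 pods.
print(desired_replicas(3, 90, 60))
```

Running the numbers this way before go-live helps you pick a max_replicas value that your node pool can actually schedule.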

For VM / container deployments:

Dedicated app servers (e.g., 2–4 VMs to start)
Reverse proxy/load balancer in front (NGINX, HAProxy, ALB/ELB, etc.)
Centralized logging and metrics (CloudWatch, Prometheus, ELK, Datadog, etc.)

Step 2: Install mindSDB Services

Your mindSDB account team will provide artifacts appropriate for your setup (e.g., containers, Helm charts, or install scripts). At a high level:

Pull container images from the approved registry (private or vendor‑provided)
Deploy core components:
- API / UI service
- Cognitive engine service
- Connector services
- Metadata database (if not using your own)
- Job scheduler for AutoSync and recurring tasks

For Kubernetes, this typically means:

Apply Helm chart or Kubernetes manifests
Validate:
- All pods in Running state
- Services reachable within the cluster
- Ingress configured for external access
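
The "all pods in Running state" check can be scripted against the JSON that `kubectl get pods -o json` emits. A minimal sketch follows; the pod names in the sample are hypothetical, and note that a Running phase does not guarantee readiness probes are passing, so treat this as a first-pass check.

```python
import json


def unhealthy_pods(pod_list: dict) -> list[str]:
    """Return 'name (phase)' for every pod that is not Running or Succeeded."""
    bad = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            bad.append(f"{name} ({phase})")
    return bad


# Trimmed sample in the shape of `kubectl get pods -o json` (names are made up).
raw = """
{"items": [
  {"metadata": {"name": "api-0"}, "status": {"phase": "Running"}},
  {"metadata": {"name": "engine-0"}, "status": {"phase": "Pending"}}
]}
"""
print(unhealthy_pods(json.loads(raw)))
```

In practice you would pipe the real kubectl output into json.loads instead of the inline sample.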

For VM-based deployments:

Run Docker Compose or an equivalent orchestrator
Bind services to internal network interfaces
Configure systemd services for auto‑start and health monitoring

Step 3: Configure Ingress, TLS, and Internal Access

Expose mindSDB only as broadly as needed:

Ingress / Load Balancer:
- Internal ALB/ELB (AWS), internal load balancer (GCP/Azure), or on‑prem LB
- Map a friendly internal domain: e.g., mindsdb.yourcompany.internal
TLS / Certificates:
- Use your internal CA or ACM/managed certificates
- Enforce HTTPS‑only access
Network Restrictions:
- Restrict access to corporate IP ranges / VPN
- Apply security group rules to limit inbound ports and outbound destinations
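
Before writing security group or firewall rules, it can be useful to test which client addresses a proposed allowlist would admit. This is a small sketch using Python's standard ipaddress module; the CIDR ranges are placeholder assumptions, and the actual enforcement still belongs in your security groups, firewall, or load balancer.

```python
import ipaddress

# Hypothetical corporate and VPN ranges: substitute your own CIDR blocks.
ALLOWED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "192.168.50.0/24")
]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any allowed range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_RANGES)


print(is_allowed("10.1.2.3"), is_allowed("8.8.8.8"))
```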

Step 4: Integrate SSO or LDAP

mindSDB Teams (Deploy Anywhere) supports:

Single Sign‑On (SSO)
- SAML or OIDC with providers like Okta, Azure AD, Google Workspace, Ping
LDAP / Active Directory
- Direct integration with your directory for auth and group mapping

Configuration steps typically include:

Create a new “mindSDB” application in your IdP
Set callback/redirect URLs pointing to your internal mindSDB domain
Configure attribute mappings (NameID, email, groups, roles)
Import IdP metadata into mindSDB
Map groups to roles: admin, data steward, business user

Result: unlimited users in your org can log in via SSO, while you retain centralized identity and access management.

Step 5: Connect Databases and Data Warehouses

Once authentication is set up, connect your data where it already lives.

Typical first connections:

Operational databases: MySQL, PostgreSQL, MS SQL Server
Analytics warehouses: Snowflake, BigQuery, Redshift
Application data: Salesforce, HubSpot, other CRM/ERP systems

For each connection:

Create a dedicated DB/service user with least‑privilege access
Allowlist the mindSDB cluster/VM IPs or security groups
Configure connection in mindSDB:
- Host, port, database name
- Username/password or managed secrets
- Optional SSL parameters/certs
Validate:
- mindSDB can introspect schema
- Test queries succeed and respect permissions
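
One way to keep the connection checklist above honest is to describe each source as a small record that references a secret by name rather than holding the password itself. This is an illustrative sketch, not mindSDB's connector schema: the SourceConnection class, its field names, and the environment variable are all assumptions.

```python
import os
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SourceConnection:
    name: str
    host: str
    port: int
    database: str
    user: str
    password_env: str        # name of the env var holding the secret
    ssl_mode: str = "require"

    def password(self) -> str:
        """Read the secret at use time so it is never stored on the record."""
        value = os.environ.get(self.password_env)
        if not value:
            raise RuntimeError(f"secret {self.password_env} is not set")
        return value

    def describe(self) -> dict:
        """Safe-to-log view: contains the secret's name, never its value."""
        return asdict(self)


# Hypothetical read-only service account for an orders database.
orders = SourceConnection("orders", "pg.internal.example", 5432,
                          "orders", "mindsdb_ro", "ORDERS_PG_PASSWORD")
print(orders.describe())
```

The same pattern works with a secrets manager: swap the environment lookup for a call to whichever managed-secrets client you already use.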

Because mindSDB executes queries in place, there’s:

No ETL required
No data movement or replication
No new warehouse to maintain

This is central to the “Deploy Anywhere” promise: your data stays inside MySQL, Snowflake, BigQuery, etc., and mindSDB just sends SQL to those systems.

Step 6: Configure Knowledge Base for Documents

For unstructured content, configure the Knowledge Base:

Connect storage:
- S3, GCS, Azure Blob
- On‑prem file servers, NAS, or DMS
- SharePoint, Google Drive, other repository systems
Enable AutoSync:
- mindSDB crawls document locations on a schedule
- Updates embeddings and metadata as content changes
Preserve native permissions:
- mindSDB respects source‑system ACLs
- Users only see insights from documents they’re allowed to access
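
To make the AutoSync idea concrete, here is a generic sketch of scheduled change detection: fingerprint each document, compare against the previous run, and re-embed only what changed. How mindSDB implements AutoSync internally is not described in this guide, so treat the function names and the hashing approach as assumptions for illustration.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Stable content hash used to detect changes between sync runs."""
    return hashlib.sha256(content).hexdigest()


def plan_sync(previous: dict[str, str],
              current_docs: dict[str, bytes]) -> tuple[dict[str, list[str]], dict[str, str]]:
    """Compare stored fingerprints with current content.

    Returns (plan, new_fingerprints) where plan lists added, changed,
    and removed document paths.
    """
    current = {path: fingerprint(data) for path, data in current_docs.items()}
    plan = {
        "added": sorted(p for p in current if p not in previous),
        "changed": sorted(p for p in current
                          if p in previous and previous[p] != current[p]),
        "removed": sorted(p for p in previous if p not in current),
    }
    return plan, current
```

Only the added and changed paths need new embeddings, and removed paths should have their embeddings deleted so stale content stops appearing in answers.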

Under the hood, the Knowledge Base:

Chunks documents (PDF, Word, HTML, text, etc.)
Extracts and attaches metadata (author, dates, categories)
Generates embeddings using models you configure
Tracks embedding freshness and retrieval accuracy

Result: users can ask natural language questions across structured data and documents and get citation‑backed answers with source links.
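
The chunking step mentioned above is worth understanding because chunk size and overlap affect retrieval quality. Below is a minimal fixed-size, character-based chunker with overlap. It is a common baseline, not necessarily what mindSDB's Knowledge Base does, and the default sizes are arbitrary assumptions.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=1))
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk, at the cost of embedding some text twice.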

Step 7: Bring Your Own LLMs (Optional but Recommended)

In a VPC / on‑prem deployment, you may want full control over model endpoints:

Use self‑hosted models in your own VPC or on‑prem (e.g., on Kubernetes or GPU nodes)
Use cloud model APIs reachable from your network with strict egress controls
Route different workloads to different models (e.g., SQL planning vs summarization)
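
Routing workloads to different models can be expressed as a small lookup table. The following is a sketch under stated assumptions: the endpoint names, URLs, and workload labels are hypothetical, and the rule that sensitive requests always go to a self-hosted model is one possible policy, not a mindSDB default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEndpoint:
    name: str
    url: str
    self_hosted: bool


# Hypothetical endpoints: one inside the VPC, one external cloud API.
ENDPOINTS = {
    "local-sql": ModelEndpoint("local-sql", "http://llm.internal.example/v1", True),
    "cloud-general": ModelEndpoint("cloud-general", "https://api.llm.example/v1", False),
}
ROUTES = {"sql_planning": "local-sql", "summarization": "cloud-general"}
DEFAULT = "local-sql"


def pick_endpoint(workload: str, sensitive: bool = False) -> ModelEndpoint:
    """Choose an endpoint by workload; sensitive requests never leave the VPC."""
    endpoint = ENDPOINTS[ROUTES.get(workload, DEFAULT)]
    if sensitive and not endpoint.self_hosted:
        endpoint = ENDPOINTS[DEFAULT]
    return endpoint


print(pick_endpoint("summarization", sensitive=True).name)
```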

mindSDB’s cognitive engine:

Plans multi‑step workflows (planning → generation → validation → execution)
Validates SQL before it hits your production systems
Logs reasoning and queries for auditing and troubleshooting
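
As an illustration of what validating SQL before execution can look like, here is a deliberately conservative read-only guard. It is not mindSDB's validator: it is a naive keyword check rather than a SQL parser, so it will reject some harmless queries (for example, a string literal containing the word "update"). The read-only database account remains the real enforcement layer.

```python
import re

# Keywords that indicate a write or DDL statement.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke|merge)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Accept a single SELECT/WITH statement containing no write keywords."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input or more than one statement
    if not re.match(r"(?i)(select|with)\b", statement):
        return False
    return FORBIDDEN.search(statement) is None


print(is_read_only("SELECT id, created_at FROM orders;"))
```

Erring on the side of rejection is intentional here: a false rejection costs a retry, while a false acceptance could modify production data.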

You choose:

Which models run where
What data they can see
How they’re invoked and monitored

Governance, Logging, and Observability

Deploy Anywhere is built for environments where trust, auditability, and compliance are non‑negotiable.

Auditable, Transparent Execution

For every question asked:

The plan, SQL, and execution steps are logged
Users can inspect:
- How a question was interpreted
- SQL queries that ran against each data source
- Which documents and rows were used as evidence
Administrators can trace behavior across:
- Users and roles
- Data sources
- Time ranges

This supports:

Internal audit trails
Compliance reviews
Root‑cause analysis for unexpected outputs
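
If you forward these logs to a SIEM, one structured record per question is usually the easiest shape to search. The sketch below shows such a record as a single JSON line. The field names are assumptions chosen to mirror the items listed above, not mindSDB's actual log schema.

```python
import json
from datetime import datetime, timezone


def audit_record(user: str, role: str, question: str,
                 sql_by_source: dict[str, str],
                 evidence: list[str]) -> dict:
    """Build one audit entry for a single question (hypothetical field names)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "question": question,
        "sql_by_source": sql_by_source,   # data source name -> SQL that ran
        "evidence": evidence,             # document or row references cited
    }


def to_log_line(record: dict) -> str:
    """Serialize to one JSON line with stable key order for easy diffing."""
    return json.dumps(record, sort_keys=True)
```

Keeping user, data source, and timestamp as top-level fields makes the three tracing dimensions above simple filters in most log tools.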

Data Quality and Validation

mindSDB’s multi‑phase pipeline:

Schema understanding: introspects your databases and adapts to business terminology (“projects,” “tickets,” “cases”).
Plan generation: creates an executable plan to answer the question.
SQL generation & validation: generates SQL and validates it against your schemas and constraints.
Execution & response: runs queries in your databases/warehouses and returns a response with citations.
Continuous evaluation: tracks retrieval accuracy, latency, and embedding freshness.

This makes analytics not just fast, but defensible — especially important when decisions carry financial, operational, or regulatory risk.
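
For the continuous evaluation phase, two numbers are simple to compute yourself from logged data: how often retrieval surfaced at least one relevant document, and tail latency. The sketch below uses hit rate at k and a nearest-rank percentile. These are standard definitions chosen for illustration; the guide does not specify which metrics mindSDB reports.

```python
import math


def hit_rate_at_k(results: list[tuple[list[str], list[str]]], k: int) -> float:
    """Fraction of queries whose top-k retrieved ids include a relevant id.

    Each item is (retrieved_ids_in_rank_order, relevant_ids).
    """
    if not results:
        return 0.0
    hits = sum(1 for retrieved, relevant in results
               if set(retrieved[:k]) & set(relevant))
    return hits / len(results)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=95 for p95 latency."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

Tracking both over time shows whether a schema change or a new document source has degraded answers before users report it.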

RBAC and Least Privilege

Inside your VPC/on‑prem environment, pair SSO with:

Role‑based access control (RBAC) at:
- Project/workspace level
- Data source level
- Document collection level
Governance patterns:
- Separate “sandbox” vs “production” projects
- Limit who can create/edit data sources
- Route sensitive workloads to specific model endpoints
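
The RBAC levels above combine two questions: what a role may do, and where a user holds that role. A compact way to model this is shown below. The role names, action names, and Grant structure are hypothetical and only illustrate the pattern, not mindSDB's permission model.

```python
from dataclasses import dataclass

# What each role may do (hypothetical action names).
ROLE_ACTIONS = {
    "admin": {"query", "manage_sources", "manage_roles"},
    "data_steward": {"query", "manage_sources"},
    "business_user": {"query"},
}


@dataclass(frozen=True)
class Grant:
    user: str
    role: str
    project: str   # e.g. "sandbox" or "production"


def can(user: str, action: str, project: str, grants: list[Grant]) -> bool:
    """True if any grant gives the user a role allowing action in project."""
    return any(
        g.user == user and g.project == project
        and action in ROLE_ACTIONS.get(g.role, set())
        for g in grants
    )


GRANTS = [
    Grant("ana", "data_steward", "sandbox"),
    Grant("ana", "business_user", "production"),
]
print(can("ana", "manage_sources", "production", GRANTS))
```

Here the same person can edit data sources in the sandbox project but only query in production, which matches the sandbox versus production separation suggested above.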

On‑Prem vs VPC: What Changes and What Stays the Same

The overall deployment pattern is similar, but a few details differ.

VPC Deployment Highlights

Run on managed Kubernetes (EKS/GKE/AKS) or VM fleets
Use cloud‑native services:
- Managed databases for metadata
- Cloud logging (CloudWatch, Google Cloud Logging (formerly Stackdriver), Azure Monitor)
- Managed load balancers and certificates
Control data residency by region and VPC boundaries
Integrate directly with private subnets and service endpoints

On‑Prem Deployment Highlights

Run on your data center infrastructure:
- On‑prem Kubernetes
- VM clusters
- Bare metal with container runtimes
Integrate with:
- On‑prem Active Directory/LDAP
- On‑prem SIEM and logging
- Existing LBs and firewalls
Useful when:
- Data residency laws require local hosting
- Regulatory policies forbid cloud services
- You already operate a mature on‑prem stack

In both cases, the core principles hold:

No data movement to mindSDB‑hosted infrastructure
Query‑in‑place execution against your systems
Full visibility into reasoning, SQL, and sources

Typical Go‑Live Timeline

Most teams can go from “we have a cluster” to “pilot users are asking questions” in 2–4 weeks, roughly:

Week 1
- Infra provisioned, mindSDB services deployed
- SSO/LDAP integrated
- First data sources connected
Week 2
- Knowledge Base configured for key document stores
- Governance/RBAC tuned
- Pilot group onboarded (analysts, ops, finance, support)
Weeks 3–4
- Expand data coverage and projects
- Add more LLM endpoints (if needed)
- Build OEM/embed experiences inside your own apps

That’s a fraction of the “months to years” it typically takes to stitch together ETL pipelines, BI dashboards, and DIY AI features.

When to Choose Deploy Anywhere vs SaaS

Choose mindSDB Teams (Deploy Anywhere) when:

You have strict data residency or compliance requirements
Your data lives across multiple private systems (databases, warehouses, CRMs, on‑prem file stores)
You want unlimited users with SSO/LDAP and enterprise RBAC
You need to run everything inside your own VPC or data center
Explainability, auditability, and governance are mandatory

If you’re early in your journey or experimenting, you can start with community/open‑source deployments in containers, then move to the full Teams (Deploy Anywhere) model as requirements grow.

Final Takeaway

Deploying mindSDB Teams (Deploy Anywhere) in your VPC or on‑prem environment gives you:

AI‑powered analytics and document intelligence that run inside your trust boundary
No data movement, no ETL pipelines, and no extra warehouses to maintain
Full governance: SSO/LDAP, RBAC, audit logs, and native/source permissions
Transparent, citation‑backed answers with reviewable SQL and reasoning

You get from “slow, fragmented insights” to “real‑time, cross‑system answers” without compromising on security or compliance.
