How do I set up a mindSDB proof of concept to answer questions across Postgres + BigQuery without building ETL?

Most teams trying to wire Postgres and BigQuery together hit the same wall: the second you introduce ETL, your “quick POC” turns into a data engineering project. With mindSDB, you can skip all of that. You connect directly to both databases, keep data where it already lives, and let an AI-powered analytics layer plan and run cross-system queries in place.

Below is a practical, step-by-step way to set up a mindSDB proof of concept (POC) that answers natural language questions across Postgres and BigQuery—without building ETL pipelines.


What this POC will prove

In this POC, you’ll verify that mindSDB can:

  • Connect to Postgres and BigQuery with no data movement
  • Let users ask questions in plain English (and SQL if they want)
  • Automatically plan, generate, and run cross-database queries
  • Return answers with citations, reasoning, and reviewable SQL
  • Run fully inside your trust boundary (VPC / on‑prem), respecting each system’s permissions

You’re not trying to solve every analytics use case. You’re trying to show that “query-in-place AI analytics” across Postgres + BigQuery works in days, not months—without ETL.


Step 1: Define a narrow, business-backed question set

Start with 5–10 concrete questions your stakeholders care about that require both Postgres and BigQuery to answer. This keeps the POC focused and measurable.

Examples:

  • “How many net new paid customers did we acquire last quarter by marketing channel?”
    • Postgres: customers, subscriptions
    • BigQuery: ad spend, campaign metrics
  • “What’s the churn rate for customers whose first touch came from Google Ads?”
  • “For our top 50 accounts by revenue, what’s their feature usage trend over the last 90 days?”

Document each question with:

  • The tables involved in Postgres and BigQuery
  • Any known joins and business definitions (e.g., what “active customer” means)
  • Acceptance criteria: how you’ll verify the answer is correct

These become your test harness for the POC.
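If you keep the harness in code, each question can be a small record that a script checks for completeness before testing starts. This is a minimal Python sketch that assumes nothing about mindSDB itself. The class name, the BigQuery table names, and the sample definition and acceptance text are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PocQuestion:
    """One entry in the POC test harness (hypothetical structure)."""
    question: str
    postgres_tables: list[str]
    bigquery_tables: list[str]
    definitions: dict[str, str] = field(default_factory=dict)  # business terms used
    acceptance: str = ""  # how a correct answer will be verified

    def is_cross_system(self) -> bool:
        # A POC question should need data from both systems.
        return bool(self.postgres_tables) and bool(self.bigquery_tables)


HARNESS = [
    PocQuestion(
        question="How many net new paid customers did we acquire last quarter by marketing channel?",
        postgres_tables=["customers", "subscriptions"],
        bigquery_tables=["ad_spend", "campaign_metrics"],  # hypothetical table names
        definitions={"net new paid customer": "first paid subscription starts in the quarter"},
        acceptance="Matches the trusted finance report for the same quarter",
    ),
]

# Flag entries that are not cross-system or have no acceptance criteria yet.
incomplete = [q.question for q in HARNESS if not (q.is_cross_system() and q.acceptance)]
print(f"{len(HARNESS)} question(s), {len(incomplete)} incomplete")
```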


Step 2: Decide where mindSDB runs (keep data in your boundary)

mindSDB is designed to run where your data already lives, not as a hosted SaaS that copies your data out.

For a POC, you typically have two options:

  1. Your VPC (recommended for realistic enterprise testing)

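    • Deploy mindSDB in the cloud environment closest to your data, for example a GCP VPC that can reach both your Postgres instance and the BigQuery API (use VPC peering or a VPN if Postgres runs in another cloud).
    • This keeps traffic on private network paths (for BigQuery, via Private Google Access) and under your existing security controls.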
  2. On-prem or local lab (for early validation)

    • Useful if Postgres is on-prem and you have secure connectivity to BigQuery via a VPN or private link.
    • You still keep data in-place; mindSDB only sends queries to the databases.

Key point: mindSDB does not persist or replicate your customer data. It issues queries to your databases, uses the results in memory to answer the user’s question, and logs every step.


Step 3: Connect Postgres and BigQuery (no ETL, no pipelines)

Once mindSDB is deployed, you’ll configure data connections. The goal is simple: expose Postgres and BigQuery as first-class data sources so the cognitive engine can reason across both.

3.1. Gather connection details

For Postgres, you’ll need:

  • Hostname / IP
  • Port (usually 5432)
  • Database name
  • User + password or IAM-based auth
  • SSL requirements

For BigQuery, you’ll need:

  • GCP project ID
  • Dataset(s) you want to expose
  • Service account credentials (JSON key or workload identity)
  • Access scopes / roles (e.g., roles/bigquery.dataViewer on the relevant datasets, plus roles/bigquery.jobUser on the project so the account can run query jobs)

Create least-privilege accounts—read-only for analytics is usually sufficient for a POC.
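Before registering anything, a quick pre-flight check catches missing details early. The sketch below is hypothetical: the field names mirror the checklists above rather than any mindSDB configuration format. It also checks for roles/bigquery.jobUser, because BigQuery requires permission to create query jobs in addition to read access on the datasets.

```python
# Hypothetical pre-flight check; field names mirror the checklists above,
# not any mindSDB configuration format.
REQUIRED_FIELDS = {
    "postgres": ["host", "port", "database", "user", "sslmode"],
    "bigquery": ["project_id", "dataset", "credentials"],
}

# Read access to datasets plus permission to run query jobs.
REQUIRED_BQ_ROLES = {"roles/bigquery.dataViewer", "roles/bigquery.jobUser"}


def missing_fields(kind: str, details: dict) -> list[str]:
    """Return required fields that are absent or empty for one data source."""
    return [name for name in REQUIRED_FIELDS[kind] if not details.get(name)]


def missing_roles(granted: set[str]) -> set[str]:
    """Return BigQuery roles the service account still needs."""
    return REQUIRED_BQ_ROLES - granted


pg = {"host": "10.0.0.5", "port": 5432, "database": "analytics", "user": "poc_reader"}
print(missing_fields("postgres", pg))                 # ['sslmode']
print(missing_roles({"roles/bigquery.dataViewer"}))   # {'roles/bigquery.jobUser'}
```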

3.2. Register the data sources in mindSDB

Using the mindSDB UI or its APIs, you define both connections. The exact parameters depend on the data handler and your mindSDB version, but conceptually you’re running:

  • CREATE DATABASE postgres_src ... pointing to your Postgres instance
  • CREATE DATABASE bigquery_src ... pointing to your BigQuery project & dataset

Once registered, mindSDB’s engine “sees” both systems and can inspect schemas as needed—without copying any data out or building ETL.
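For reference, mindSDB registers sources with SQL of the form CREATE DATABASE <name> WITH ENGINE = '<engine>', PARAMETERS = {...}. The Python sketch below only renders such statements as text so you can review them before running them in the mindSDB editor. The parameter names (especially service_account_keys for BigQuery) are assumptions to verify against the handler documentation for your version; host, database, project, and path values are placeholders.

```python
import json


def create_database_sql(name: str, engine: str, params: dict) -> str:
    """Render a CREATE DATABASE statement for review (text only; nothing is executed)."""
    body = json.dumps(params, indent=2)
    return f"CREATE DATABASE {name}\nWITH ENGINE = '{engine}',\nPARAMETERS = {body};"


postgres_sql = create_database_sql("postgres_src", "postgres", {
    "host": "10.0.0.5",            # placeholder private IP
    "port": 5432,
    "database": "analytics",       # placeholder
    "user": "poc_reader",          # read-only role
    "password": "<from your secret manager>",
    "sslmode": "require",
})

bigquery_sql = create_database_sql("bigquery_src", "bigquery", {
    "project_id": "my-gcp-project",                      # placeholder
    "dataset": "marketing_performance",
    "service_account_keys": "/secrets/bq-reader.json",   # assumed parameter name
})

print(postgres_sql)
print(bigquery_sql)
```

Once a connection exists, its tables are typically addressed with the connection name as a prefix (for example, SELECT * FROM postgres_src.customers LIMIT 10), which is a quick way to confirm connectivity. Keep real passwords and key files in a secret manager rather than in scripts.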


Step 4: Let mindSDB learn your schema and business language

To answer cross-system questions reliably, the engine needs to understand:

  • Table and column names in Postgres and BigQuery
  • Relationships (foreign keys, logical joins, shared IDs)
  • Business semantics (“MRR,” “active subscription,” “lead,” “opportunity,” etc.)

4.1. Expose the right schemas

Scope is your friend. For the POC:

  • Limit Postgres to the 3–5 schemas/tables that matter (e.g., customers, subscriptions, invoices).
  • Limit BigQuery to the relevant datasets (e.g., marketing_performance, ad_spend, web_analytics).

This makes the planning space smaller, which means faster and more accurate query generation.
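One way to pin down scope is an explicit allowlist that you review with the data owners. This hypothetical sketch keeps only the Postgres tables and BigQuery datasets named above from a list of discovered tables. The public schema and the other table names are illustrative, and how you then restrict mindSDB (for example, through each connection’s schema or dataset parameter) depends on your setup.

```python
# Hypothetical POC scope: (connection, schema or dataset, table or "*").
SCOPE = {
    ("postgres_src", "public", "customers"),
    ("postgres_src", "public", "subscriptions"),
    ("postgres_src", "public", "invoices"),
    ("bigquery_src", "marketing_performance", "*"),
    ("bigquery_src", "ad_spend", "*"),
    ("bigquery_src", "web_analytics", "*"),
}


def in_scope(connection: str, schema: str, table: str) -> bool:
    """True if the table is listed explicitly or its whole schema/dataset is in scope."""
    return (connection, schema, table) in SCOPE or (connection, schema, "*") in SCOPE


discovered = [
    ("postgres_src", "public", "customers"),
    ("postgres_src", "public", "audit_log"),
    ("bigquery_src", "ad_spend", "daily_spend"),
    ("bigquery_src", "raw_events", "clicks"),
]
exposed = [t for t in discovered if in_scope(*t)]
print(exposed)
```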

4.2. Capture business terminology

mindSDB’s cognitive engine is built to adapt to your domain language. You can improve POC quality quickly by:

  • Creating a short glossary (even a simple document) that defines key terms:
    • “Active customer = has at least one paid subscription in the last 30 days”
    • “New customer = first invoice date in the period”
  • Mapping those terms to actual columns and tables in Postgres and BigQuery.

During POC setup, supply this glossary to the engine as context (for example, alongside the instructions it works from) so it has clear guidance on how to translate natural language into SQL plans.
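A glossary is easiest to maintain as structured data that you render into plain text for the engine. In this sketch the dictionary layout, the column names, and the rendered format are hypothetical; only the two definitions come from the list above.

```python
# Hypothetical glossary: each business term maps to a definition and the
# tables/columns that implement it. Column names are illustrative.
GLOSSARY = {
    "active customer": {
        "definition": "has at least one paid subscription in the last 30 days",
        "maps_to": "postgres_src.subscriptions (status, paid_at)",
    },
    "new customer": {
        "definition": "first invoice date falls in the reporting period",
        "maps_to": "postgres_src.invoices (customer_id, MIN(invoice_date))",
    },
}


def glossary_context(glossary: dict) -> str:
    """Render the glossary as plain text that can be given to the engine as context."""
    lines = ["Business definitions:"]
    for term, entry in sorted(glossary.items()):
        lines.append(f"- {term}: {entry['definition']} [source: {entry['maps_to']}]")
    return "\n".join(lines)


print(glossary_context(GLOSSARY))
```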


Step 5: Configure natural language → SQL planning and validation

The core of this POC is showing that people can ask English questions and get cross-database answers, backed by verifiable SQL. That’s where mindSDB’s multi-step pipeline comes in:

  1. Planning – Understand the user question and decide which data sources and tables to use.
  2. Generation – Build the SQL queries (or sequence of queries) required.
  3. Validation – Check SQL against schema, constraints, and safety rules before execution.
  4. Execution – Run the queries directly on Postgres and BigQuery.
  5. Assembly & reasoning – Join partial results, compute metrics, and present a clear answer with citations.

For the POC, you’ll want to:

  • Enable logging for every phase so you can see how each question was interpreted.
  • Turn on SQL visibility so analysts can view and, if needed, refine queries.
  • Configure guardrails (e.g., row limits, timeout thresholds) for safe execution in shared environments.

This is where mindSDB differs from generic AI: you’re not getting a black-box answer—you’re getting a transparent, auditable execution plan that you can inspect and iterate.
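The validation and guardrail ideas above can be illustrated with a small pre-execution check. This is not mindSDB’s validator; it is a hypothetical Python sketch that allows a single SELECT (or WITH) statement, rejects write and DDL keywords, and appends a row limit when none is present. Timeouts are better enforced by the databases themselves, for example with Postgres statement_timeout on the read-only role and a job timeout (and maximum-bytes-billed cap) on BigQuery query jobs.

```python
import re

MAX_ROWS = 10_000  # hypothetical POC row cap

# Statement keywords that should never run during a read-only POC.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)


def validate_sql(sql: str, max_rows: int = MAX_ROWS) -> str:
    """Reject non-read statements and make sure a row limit is present.

    Simple text checks for illustration only; a real validator should
    parse the SQL rather than pattern-match it.
    """
    stmt = sql.strip().rstrip(";")
    if ";" in stmt:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(SELECT|WITH)\b", stmt, re.IGNORECASE):
        raise ValueError("only SELECT queries are allowed")
    if FORBIDDEN.search(stmt):
        raise ValueError("write or DDL keyword found")
    if not re.search(r"\bLIMIT\s+\d+\s*$", stmt, re.IGNORECASE):
        stmt = f"{stmt}\nLIMIT {max_rows}"
    return stmt
```

The pattern matching is deliberately crude: a string literal containing "delete" would be rejected, and LIMIT ... OFFSET is not recognized as an existing limit.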


Step 6: Test cross-system questions from a single interface

With connections and planning configured, it’s time to run your real questions.

6.1. Use natural language first

In the mindSDB conversational interface, ask one of your POC questions:

“Show me total new MRR last quarter by marketing channel, combining Stripe data in Postgres with campaign data in BigQuery, and sort descending by MRR.”

Behind the scenes, mindSDB will:

  • Identify which tables in Postgres correspond to payments/subscriptions.
  • Identify which tables in BigQuery correspond to campaigns and channels.
  • Build a plan to compute:
    • New MRR by customer (Postgres)
    • Channel attribution by customer (BigQuery)
  • Generate and validate the required SQL for each system.
  • Run those queries in-place, join the results, and return a concise answer.

The UI should show you:

  • The answer (e.g., a small table or chart, plus a natural language summary).
  • The SQL it ran on Postgres and BigQuery.
  • Citations or references to which tables/columns were used.
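The assembly step is essentially a join on a shared key followed by an aggregation. The sketch below shows that logic in plain Python on hypothetical partial results. It is not how mindSDB joins internally, and it assumes single-touch attribution (one channel per customer) and made-up field names.

```python
from collections import defaultdict

# Hypothetical partial results, shaped as each system might return them.
new_mrr_by_customer = [          # from Postgres (subscriptions / payments)
    {"customer_id": 1, "new_mrr": 500.0},
    {"customer_id": 2, "new_mrr": 1200.0},
    {"customer_id": 3, "new_mrr": 300.0},
    {"customer_id": 4, "new_mrr": 250.0},
]
channel_by_customer = [          # from BigQuery (campaign attribution)
    {"customer_id": 1, "channel": "google_ads"},
    {"customer_id": 2, "channel": "linkedin"},
    {"customer_id": 3, "channel": "google_ads"},
]


def mrr_by_channel(mrr_rows, channel_rows, unattributed="unattributed"):
    """Join on customer_id, sum new MRR per channel, sort descending by MRR."""
    channel_of = {row["customer_id"]: row["channel"] for row in channel_rows}
    totals = defaultdict(float)
    for row in mrr_rows:
        # Customers with no attribution row are kept, not silently dropped.
        totals[channel_of.get(row["customer_id"], unattributed)] += row["new_mrr"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(mrr_by_channel(new_mrr_by_customer, channel_by_customer))
```

Keeping customers without an attribution row under "unattributed" (a left join rather than an inner join) means the channel totals still add up to total new MRR in Postgres, which makes the answer easier to reconcile against a baseline.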

6.2. Validate the outputs with your analysts

For each POC question:

  1. Compare the mindSDB answer with a manually verified baseline (a query you already trust, or a pre-existing report).
  2. Look at the generated SQL—does it align with how your analysts would approach the question?
  3. Capture discrepancies: is it a misunderstanding of terminology, a join path, or a data quality issue?

You can then refine the schema scope, relationships, or glossary to improve subsequent runs.
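Comparing answers against the baseline is easier to repeat if it is scripted. A hypothetical sketch, assuming both the mindSDB answer and the trusted baseline can be reduced to a mapping from a dimension value (such as a channel) to a number; the 1% tolerance is an illustrative choice.

```python
import math


def compare_to_baseline(answer: dict, baseline: dict, rel_tol: float = 0.01) -> dict:
    """Compare an answer to a trusted baseline, key by key.

    Returns shared keys whose values differ beyond the tolerance, plus
    keys that appear on only one side.
    """
    shared = answer.keys() & baseline.keys()
    mismatched = {
        key: (answer[key], baseline[key])
        for key in sorted(shared)
        if not math.isclose(answer[key], baseline[key], rel_tol=rel_tol)
    }
    return {
        "mismatched": mismatched,
        "missing_from_answer": sorted(baseline.keys() - answer.keys()),
        "unexpected_in_answer": sorted(answer.keys() - baseline.keys()),
    }


report = compare_to_baseline(
    answer={"google_ads": 800.0, "linkedin": 1200.0, "unattributed": 250.0},
    baseline={"google_ads": 800.0, "linkedin": 1150.0, "organic": 250.0},
)
print(report)
```

Keys present on only one side often point to a join path or terminology mismatch; numeric gaps on shared keys are worth checking against the business definitions first.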


Step 7: Measure what actually matters (time-to-insight, not chart count)

A POC like this isn’t about building a shiny dashboard; it’s about collapsing the time between a question and a verified answer across Postgres and BigQuery.

Track a few simple metrics:

  • Time-to-first-answer

    • How long did it take from “we want this question answered” to mindSDB returning a usable result?
    • Benchmark against your current path: ticket → analyst → ETL → dashboard (often days).
  • Iteration speed

    • How quickly can a user ask follow-up questions (“segment this by region,” “limit to enterprise accounts,” “last 30 days only”) and get updated answers?
    • With mindSDB, this should be seconds, not re-ETL cycles.
  • Analyst time saved

    • How many hours of SQL and data wrangling did this question previously require?
    • For many teams, a cross-system request takes 5–20 hours; the POC target is a few minutes of verification.
  • Accuracy and trust

    • For each POC question, score answers as correct / partially correct / incorrect.
    • Examine logs and SQL where answers were off—this is where mindSDB’s transparency becomes a governance asset, not a liability.

This is the business case you’ll present to stakeholders: from “days + ETL” to “minutes, no ETL, fully logged.”
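To keep these metrics comparable across questions, record them per question and summarize at the end. A minimal sketch with hypothetical field names; the sample numbers are placeholders, not results. It credits analyst time saved only for answers scored correct, since partial or incorrect answers still need analyst work.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class QuestionResult:
    question: str
    verdict: str              # "correct", "partial", or "incorrect"
    minutes_to_answer: float  # from request to a usable, verified result
    baseline_hours: float     # analyst effort on the current path


def scorecard(results: list[QuestionResult]) -> dict:
    """Summarize POC runs: accuracy, verdict counts, speed, and analyst time saved."""
    counts = {"correct": 0, "partial": 0, "incorrect": 0}
    for r in results:
        counts[r.verdict] += 1  # raises KeyError on an unknown verdict
    return {
        "accuracy": counts["correct"] / len(results),
        "verdicts": counts,
        "median_minutes_to_answer": median(r.minutes_to_answer for r in results),
        # Credit time saved only where the answer was correct.
        "analyst_hours_saved": sum(
            r.baseline_hours - r.minutes_to_answer / 60
            for r in results
            if r.verdict == "correct"
        ),
    }


# Placeholder inputs for illustration, not measured results.
results = [
    QuestionResult("New paid customers by channel", "correct", 4.0, 12.0),
    QuestionResult("Churn for Google Ads first touch", "partial", 9.0, 8.0),
    QuestionResult("Top 50 accounts usage trend", "correct", 6.0, 16.0),
]
summary = scorecard(results)
print(summary)
```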


Step 8: Bring in more users, keep governance tight

Once the initial POC questions are working, expand in two directions: access and governance.

8.1. Add more users

Invite:

  • A few business leaders who regularly ask cross-system questions.
  • Frontline analysts who can validate and refine questions.
  • Optionally, a data engineer to verify performance and resource usage on Postgres/BigQuery.

Ask them to use mindSDB for real work for a week: preparing QBRs, answering ad hoc revenue questions, or analyzing campaign performance.

8.2. Enforce RBAC and permissions

mindSDB respects existing boundaries:

  • Use RBAC and SSO/LDAP so users inherit the right level of access.
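  • Ensure BigQuery and Postgres permissions stay intact: a user should not be able to see, through mindSDB, tables they cannot see in the underlying system. If a connection uses a shared service account, scope that account (or create separate connections per team) so it cannot widen anyone’s access.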
  • Log all queries and actions for auditability—who asked what, when, and which SQL was executed.

This is how you make conversational analytics production-grade in an environment where governance and compliance matter.
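The access check and audit trail can be pictured as follows. This hypothetical Python sketch is not mindSDB’s RBAC or audit feature: it maps roles to the connections they may query, denies requests that touch anything else, and writes one JSON-lines entry recording who asked what, when, which SQL was generated, and whether it was allowed. The underlying Postgres and BigQuery grants remain the real enforcement point.

```python
import json
from datetime import datetime, timezone

# Hypothetical mapping of roles to the mindSDB connections they may query;
# it should mirror what each role can already access in Postgres and BigQuery.
ROLE_SOURCES = {
    "marketing": {"bigquery_src"},
    "finance": {"postgres_src", "bigquery_src"},
}


def authorize(role: str, sources: set[str]) -> bool:
    """Allow a request only if every source it touches is granted to the role."""
    return sources <= ROLE_SOURCES.get(role, set())


def audit_record(user: str, role: str, question: str, sql_by_source: dict[str, str]) -> str:
    """Return one JSON-lines audit entry: who asked what, when, which SQL, and the decision."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "question": question,
        "sql": sql_by_source,  # logged even when denied, so attempts are reviewable
        "allowed": authorize(role, set(sql_by_source)),
    }
    return json.dumps(entry, sort_keys=True)


line = audit_record(
    "dana@example.com",
    "marketing",
    "Total new MRR last quarter by marketing channel",
    {
        "postgres_src": "SELECT customer_id, new_mrr FROM subscriptions LIMIT 10000",
        "bigquery_src": "SELECT customer_id, channel FROM marketing_performance.attribution LIMIT 10000",
    },
)
print(line)
```

Here the marketing role is denied because the question touches postgres_src, but the entry is still written, which is what an auditor needs to see.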


Step 9: Decide next steps based on POC outcomes

By this point, you should have:

  • A working mindSDB deployment talking to Postgres + BigQuery.
  • 5–10 real business questions being answered with no ETL.
  • Clear before/after comparisons on time-to-insight and analyst effort.
  • Evidence that queries, reasoning, and results are transparent and auditable.

From here, teams typically choose one of three paths:

  1. Expand to more systems

    • Add Salesforce, billing, or support systems via mindSDB’s broader connector set.
    • Keep leaning into query-in-place instead of adding new pipelines.
  2. Embed into existing workflows

    • Surface mindSDB’s capabilities inside tools your teams already use (internal portals, notebooks, BI tools).
    • Use SQL output as a building block for scheduled reports or monitors.
  3. Productionize with SLAs

    • Formalize latency and accuracy expectations.
    • Monitor key KPIs like retrieval accuracy, embedding freshness, and latency for document-heavy scenarios.
    • Treat mindSDB as a core “AI Business Insights Solution,” not a side experiment.
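Latency and accuracy expectations are easiest to enforce when they are checked the same way every time. A hypothetical sketch; the thresholds are placeholders to agree on with stakeholders, and the sample latencies are made up.

```python
from statistics import quantiles

# Placeholder SLA targets; agree on real values with stakeholders.
SLA = {"p95_latency_s": 30.0, "min_accuracy": 0.9}


def check_sla(latencies_s: list[float], correct: int, total: int) -> dict:
    """Compare observed answer latency and accuracy with the SLA targets."""
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    p95 = quantiles(latencies_s, n=20, method="inclusive")[-1]
    accuracy = correct / total
    return {
        "p95_latency_s": p95,
        "accuracy": accuracy,
        "latency_ok": p95 <= SLA["p95_latency_s"],
        "accuracy_ok": accuracy >= SLA["min_accuracy"],
    }


# Made-up sample: ten answer latencies in seconds, 8 of 10 answers correct.
status = check_sla([2.0, 3.5, 4.0, 5.0, 6.5, 7.0, 8.0, 9.5, 12.0, 40.0], correct=8, total=10)
print(status)
```

method="inclusive" keeps the 95th percentile within the observed range; the default exclusive method can extrapolate above the slowest observed answer when there are only a few samples.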


Why this approach beats an ETL-heavy POC

You could prove the same business value by copying Postgres data into BigQuery, building new models, and wiring dashboards on top. But that’s exactly the trap we built mindSDB to avoid:

  • You lose data freshness—ETL is always catching up.
  • You introduce more fragile pipelines and schema drift risk.
  • You waste weeks to months on data engineering before anyone can even ask a question.

By letting mindSDB query Postgres and BigQuery in-place, your POC shows a different path:

  • No data movement, no ETL. Connect, configure, and start asking questions.
  • < 5 minutes from key question to verified answer, once connections and schema understanding are in place.
  • Fully transparent—every plan, query, and answer is logged and reviewable.
  • Governance-first—your data never leaves your trust boundary; permissions and auditability stay under your control.

If you want help structuring this POC for your specific Postgres and BigQuery environment—or need to design a question set that will resonate with leadership—you can walk through it with our team.
