How do I set up a mindSDB proof of concept to answer questions across Postgres + BigQuery without building ETL?

Most teams that ask this question are stuck in the same loop: data spread across Postgres and BigQuery, ad-hoc SQL stitched together in notebooks, and a BI queue that moves in days—not minutes. You know there’s a better way, but you don’t want to spend weeks building ETL pipelines or committing to a full-blown platform rollout just to prove out an idea.

This guide walks through how to set up a focused mindSDB proof of concept (POC) that can answer cross-database questions across Postgres and BigQuery—with no data movement and no ETL. The goal: go from “idea” to “I can ask real business questions in natural language and see verifiable SQL over both systems” in a few days, not months.


What this proof of concept should actually prove

Before you touch infrastructure, define what “success” looks like. A good mindSDB POC across Postgres + BigQuery should validate four things:

  1. Query-in-place execution works end-to-end
    That you can ask a question in plain English, mindSDB generates a plan, produces SQL, and executes it directly on Postgres and BigQuery—without copying or replicating data.

  2. Cross-system questions return trustworthy answers
    mindSDB can join facts across both systems (e.g., “Postgres customers” + “BigQuery events”) and return answers with:

    • Citation-backed rationale
    • Visible SQL
    • Results you can independently verify
  3. Governance and trust boundaries are preserved
    The POC must show that:

    • Data stays inside your infrastructure (VPC / on-prem)
    • Native permissions and RBAC are respected
    • Every step is logged and auditable
  4. Time-to-insight is materially faster than your status quo
    If it currently takes 1–5 days to get a cross-system answer from BI/analytics, the POC should demonstrate:

    • You can ask and verify in under 5 minutes
    • You can iterate on follow-up questions in seconds

Capture these as POC acceptance criteria—so you’re not just “playing with AI,” you’re testing whether mindSDB is a viable AI-powered analytics layer for your Postgres + BigQuery stack.
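If it helps, you can capture the criteria as a tiny checklist script you revisit at the end of the POC. Everything below is illustrative (criterion names and wording are ours, not mindSDB configuration):

```python
# Hypothetical POC acceptance criteria; wording and thresholds are illustrative.
ACCEPTANCE_CRITERIA = {
    "query_in_place": "SQL executes directly on Postgres and BigQuery, no data copied",
    "trustworthy_answers": "every answer ships with visible SQL and a verifiable rationale",
    "governance": "data stays in our VPC; native RBAC respected; all steps logged",
    "time_to_insight": "ask-and-verify in under 5 minutes; follow-ups in seconds",
}

def poc_passed(results: dict) -> bool:
    """The POC passes only if every criterion was demonstrated."""
    return all(results.get(name, False) for name in ACCEPTANCE_CRITERIA)
```

At the end of the POC, fill `results` with a True/False per criterion and you have an objective go/no-go, not a vibe check.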


POC scope: keep it narrow, make it real

Resist the temptation to “connect everything.” You want a thin, high-value slice that hits real use cases but stays simple enough for a quick proof.

1. Pick one or two concrete cross-database questions

Examples that work well:

  • Revenue + usage analysis

    • Postgres: accounts, subscriptions
    • BigQuery: events, pageviews
    • Question:
      “For the last 90 days, which customers with MRR > $5,000 have seen a drop of more than 25% in weekly active users?”
  • Pipeline + product adoption

    • Postgres: opportunities, deals
    • BigQuery: feature_usage
    • Question:
      “Among deals closed in Q1, which features did winning customers use more frequently than lost deals?”
  • Support + revenue impact

    • Postgres: customers, billing
    • BigQuery: support_tickets
    • Question:
      “Which high-ARR customers opened more than 10 critical tickets in the last 60 days and also saw a spike in refunds?”

These questions are rich enough to prove value, but bounded enough that you can curate the minimal tables and columns you need.

2. Lock in your minimal schema slice

For each question, list:

  • The Postgres tables/columns needed
  • The BigQuery tables/columns needed
  • The join keys (e.g., account_id, customer_id, email)

Your POC will be dramatically smoother if:

  • Join keys are consistent (or at least mappable) across both systems
  • You avoid free-text joins and weird edge cases for the first iteration

This is not ETL—you’re not reshaping data or moving it. You’re simply defining the slice mindSDB should understand for the POC.
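A concrete way to "lock in" the slice is to write it down as data your team can review. The table and column names below are hypothetical stand-ins for your own schema:

```python
# Minimal schema slice for the revenue + usage question.
# All table/column names are hypothetical; substitute your own.
SLICE = {
    "postgres": {
        "accounts": ["id", "name", "region"],
        "subscriptions": ["account_id", "mrr", "status"],
    },
    "bigquery": {
        "events": ["account_id", "event_type", "event_ts"],
    },
    "join_keys": [("accounts.id", "events.account_id")],
}

def columns_needed(system: str) -> set:
    """Flatten the slice into 'table.column' names for one system."""
    return {f"{t}.{c}" for t, cols in SLICE[system].items() for c in cols}
```

A one-page artifact like this also doubles as the grant list for the read-only service accounts you create in the next step.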


Step 1: Choose your POC deployment model

The non-negotiable premise: mindSDB runs inside your trust boundary. It does not host, store, or transfer your customer data.

For a POC, you typically have three options:

  1. VPC deployment (most common for cloud data stacks)

    • mindSDB deployed into your AWS/GCP/Azure VPC
    • Connects directly to your managed Postgres and BigQuery instances
    • Best if you already have network and IAM patterns for internal services
  2. On-premise deployment (if Postgres is in your data center)

    • mindSDB runs on your own hardware or private cloud
    • Connects to on-prem Postgres and your BigQuery project over secure channels
  3. Developer / sandbox deployment (for early technical validation)

    • mindSDB deployed into a sandbox VPC or dev environment
    • Connected to non-production copies of Postgres/BigQuery
    • Good for initial experiments before pulling in security/IT

For this POC, aim for a deployment that mirrors how you’d run in production (VPC or on-prem). That way, your security and governance teams see the real picture from day one.


Step 2: Connect mindSDB to Postgres and BigQuery (no ETL required)

mindSDB’s whole thesis is: bring AI to the data, not the other way around. So you won’t be exporting CSVs, building pipelines, or duplicating datasets.

2.1 Prepare your Postgres connection

Have these ready:

  • Hostname and port
  • Database name
  • A service account with:
    • SELECT on the POC tables
    • Optional: read-only access at the schema level for future expansion
  • Network access from your mindSDB deployment (VPC/subnet, security groups/firewall rules)

You then configure mindSDB with this connection; it will introspect the schema so the cognitive engine can understand tables, columns, and relationships.

2.2 Prepare your BigQuery connection

You’ll need:

  • GCP project and dataset with the POC tables
  • A service account with:
    • Permission to run queries (bigquery.jobs.create)
    • The roles/bigquery.dataViewer role on the relevant datasets/tables
  • The service account JSON key or workload identity configuration
  • Network/DNS access from mindSDB to the BigQuery API endpoint (if using private networking, configure accordingly)

mindSDB uses its BigQuery connector to:

  • Discover table schemas
  • Generate and execute SQL in BigQuery’s dialect
  • Return results directly to the cognitive engine for aggregation and explanation

2.3 Confirm query-in-place for both systems

Before you even bring AI into it, validate that mindSDB can:

  • Run a simple SELECT against Postgres
  • Run a simple SELECT against BigQuery
  • Return rows and logs for each query

This ensures your connectors, credentials, and network setup are correct before you test cross-database analytics.
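A connector-agnostic smoke check can be as small as the sketch below; `run_query` stands in for whichever client you use (for example psycopg2 for Postgres or google-cloud-bigquery for BigQuery) and should take a SQL string and return rows:

```python
# Hedged sketch: run_query is a stand-in for your real client call;
# wire it to psycopg2 / google-cloud-bigquery in your environment.
def smoke_check(run_query) -> bool:
    """Run a trivial SELECT and confirm rows come back."""
    rows = run_query("SELECT 1 AS ok")
    return bool(rows) and rows[0][0] == 1
```

Run it once per system; if either check fails, fix credentials and networking before involving the cognitive engine.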


Step 3: Teach mindSDB your business language (minimal semantic setup)

One reason AI POCs fail is that they assume the model will “just know” what a “churned customer” or “closed-won opportunity” is. It doesn’t—unless you teach it.

The POC-level goal is not exhaustive modeling; it’s creating just enough semantic context for mindSDB to map natural language questions to your Postgres + BigQuery structure.

3.1 Define key business concepts

For your POC questions, define:

  • Entities: customers, accounts, subscriptions, opportunities, deals
  • Metrics: MRR, ARR, MAU/WAU/DAU, conversion rate, ticket volume
  • Events: signups, logins, feature usage, page views, ticket creation

Then tie each concept to the exact tables/columns:

  • MRR lives in Postgres on subscriptions.mrr
  • Active users per week is derived from BigQuery events with event_type = 'login', grouped by week

3.2 Establish join logic across Postgres and BigQuery

Spell out how mindSDB should relate data:

  • “Join Postgres accounts.id to BigQuery events.account_id”
  • “If both exist, prefer account_id over email for joins”
  • “Only consider events where environment = 'production'”

This is where you encode the multi-system equivalence that’s currently living in analysts’ heads as tribal knowledge.

3.3 Configure these semantics in mindSDB

Using mindSDB’s configuration / cognitive engine controls, you:

  • Expose the relevant tables and columns from both systems
  • Add friendly names / descriptions so the engine can map NL terms to schema
  • Define the preferred join patterns and metric definitions

No manual schema setup beyond what’s necessary—just enough to make your POC questions answerable and verifiable.
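One way to draft this semantic context before entering it into mindSDB is as plain data. The structure below is our own illustration, not mindSDB's configuration schema; table and column names are hypothetical:

```python
# Illustrative draft of the POC semantic layer, as plain data.
# This is NOT mindSDB's configuration schema; names are hypothetical.
SEMANTICS = {
    "metrics": {
        "mrr": {"system": "postgres", "source": "subscriptions.mrr"},
        "weekly_active_users": {
            "system": "bigquery",
            "source": "events",
            "filter": "event_type = 'login'",
            "grain": "week",
        },
    },
    "joins": [
        {"left": "postgres.accounts.id", "right": "bigquery.events.account_id"},
    ],
    "preferences": {
        "join_key": "account_id",  # prefer over email when both exist
        "event_filter": "environment = 'production'",
    },
}

def source_of(metric: str) -> str:
    """Where does a business term live?"""
    m = SEMANTICS["metrics"][metric]
    return f'{m["system"]}.{m["source"]}'
```

Whatever mechanism you use to configure mindSDB, this is the information it needs: term, system, source, filters, and preferred joins.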


Step 4: Validate cross-database questions end-to-end

Now you test the thing that matters: can someone ask a plain-English question and get a reliable, cross-system answer in minutes?

4.1 Start with a core POC question

For example:

“Over the last 90 days, which customers with MRR above $5,000 saw a week-over-week drop in active users of more than 25%?”

When you ask this through mindSDB:

  1. The cognitive engine plans the query:

    • Understands “customers” = Postgres accounts
    • Understands “MRR” = Postgres subscriptions.mrr
    • Understands “active users” = BigQuery events with event_type = 'login'
    • Designs a cross-system join using the configured keys
  2. It generates SQL for each system:

    • A Postgres query to fetch customer + MRR data
    • A BigQuery query to compute weekly active users per customer
  3. It validates the plan/queries:

    • Checks SQL syntax
    • Ensures queries match available schema
    • Applies sanity checks before execution
  4. It executes queries in place:

    • Postgres SQL runs directly on Postgres
    • BigQuery SQL runs directly on BigQuery
    • Results are pulled back for combination and explanation
  5. It explains the answer with citations:

    • Shows the SQL for each system
    • Describes how metrics were computed
    • Lists which tables were used from Postgres and BigQuery

4.2 Review the SQL and results

This is where trust is built.

  • Compare mindSDB’s SQL against what an analyst would write
  • Spot-check a few customers manually in BI or a SQL client
  • Validate that filters, joins, and date ranges are correctly applied

If something is off, you adjust:

  • Semantic mappings (e.g., clarify what “active” means)
  • Join rules (e.g., change primary key from email to account_id)
  • Metric definitions

Then re-run. Iteration here is measured in minutes, not days.

4.3 Add follow-up questions to test robustness

Try variations like:

  • “Break this down by region and customer success manager.”
  • “Show this as weekly aggregates rather than per customer.”
  • “Limit to customers created in the last 6 months.”

You’re testing whether mindSDB can adapt the plan and SQL while still staying within your Postgres + BigQuery reality.


Step 5: Bring governance and observability into the POC

Enterprises don’t just need answers—they need answers that are defensible.

5.1 Enforce RBAC, SSO, and native permissions

For the POC, wire up:

  • SSO/LDAP or your existing identity provider
  • Role-based access control so:
    • Some users can query all POC tables
    • Others are limited to specific schemas or datasets

Crucially, mindSDB honors the native permissions of Postgres and BigQuery:

  • If BigQuery denies access to a table, mindSDB can’t query it
  • If a Postgres role lacks SELECT on a table, those rows never show up in answers

This is how you avoid building yet another parallel permissions model.

5.2 Turn on logging and auditing

Every step of mindSDB’s pipeline should be observable:

  • The planning step: how it interpreted the natural language question
  • The generation step: the exact SQL for Postgres and BigQuery
  • The validation step: any corrections/rewrites made before execution
  • The execution step: query IDs, timestamps, and performance metrics

During the POC, collect a few representative logs to show:

  • How easy it is to reconstruct “why did we get this answer?”
  • How you can review SQL before trusting/automating insights
  • How you can monitor latency, accuracy, and model behavior over time

This is the backbone of “trust and verify” AI analytics.


Step 6: Quantify time-to-insight and value vs your status quo

To make this POC actionable, compare mindSDB against how you answer the same questions today.

6.1 Baseline your current process

For each POC question, estimate:

  • Time to define the requirement and ticket
  • Time for an analyst to:
    • Join Postgres + BigQuery data (often through ETL or exports)
    • Clean/validate data
    • Build/modify dashboards or spreadsheets
  • Time to iterate on follow-ups

It’s normal to see numbers like:

  • 1–5 days from question to answer
  • 4–8 hours of analyst time per complex cross-system question

6.2 Measure the POC results

With mindSDB:

  • Time from question to first answer (usually < 5 minutes once configured)
  • Time to iterate on follow-up questions (seconds to a minute)
  • Analyst hours saved per question

If you put even a light multiplier on this (questions per month, analysts in your org), you’ll start to see why teams report 20k+ hours saved and why we talk about “5 days vs 5 minutes” as a real, quantified delta.
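A quick back-of-the-envelope model makes that multiplier tangible. Every number below is an assumption; replace them with your own baseline measurements:

```python
# Back-of-the-envelope value model; every number is an assumption to be
# replaced with your own measured baseline.
baseline_hours_per_question = 6   # analyst time today (4-8h typical)
poc_minutes_per_question = 5      # observed in the POC
questions_per_month = 40
analyst_cost_per_hour = 90        # USD, fully loaded (assumption)

hours_saved = questions_per_month * (baseline_hours_per_question
                                     - poc_minutes_per_question / 60)
monthly_value = hours_saved * analyst_cost_per_hour
print(f"{hours_saved:.0f} analyst hours/month saved, worth about ${monthly_value:,.0f}")
```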


Step 7: Expand prudently beyond the first POC slice

A successful POC should end with a clear “what’s next,” not just a demo everyone forgets about.

Consider these next steps:

  • Add more Postgres schemas or BigQuery datasets
    Bring in finance tables, product telemetry, or marketing attribution.

  • Introduce scheduled queries and monitoring
    Use mindSDB to watch a cross-system KPI (e.g., “high-ARR customers with declining usage”) and surface alerts.

  • Embed conversational analytics into internal tools
    Use mindSDB’s APIs and SDKs to embed this capability into your internal portals so GTM, support, or product teams can ask their own questions.

The key is to preserve the same principles that made the POC successful:

  • No ETL sprawl
  • Query-in-place execution
  • Transparent, auditable reasoning and SQL
  • Data never leaving your trust boundary

Final verdict: what a good Postgres + BigQuery POC should prove

If you follow this approach, your mindSDB proof of concept should give you confident answers to three questions:

  1. Can we answer real, cross-system questions across Postgres and BigQuery without building ETL?
    Yes—via query-in-place execution and over 200 data connectors, mindSDB brings the cognitive engine to your existing databases and warehouses.

  2. Can we trust those answers in a high-stakes environment?
    Yes—because you can see the SQL, review the reasoning, inherit native permissions, and audit every step in the pipeline before relying on the insights.

  3. Is the time-to-insight improvement big enough to matter?
    In most environments, it is—shrinking “question to answer” from days to minutes is not a UX upgrade, it’s an operating model change.

If you’re ready to validate this in your own stack—against your own Postgres and BigQuery data—the next step is straightforward.
