ClickHouse vs BigQuery for near-real-time dashboards on event data—tradeoffs on performance, cost, and ops
Analytical Databases (OLAP)

ClickHouse vs BigQuery for near-real-time dashboards on event data—tradeoffs on performance, cost, and ops

10 min read

Near-real-time dashboards on high-volume event streams expose all the tradeoffs between warehouse-style systems like BigQuery and OLAP databases like ClickHouse. Both can power analytics at scale—but they make very different bets on latency, cost, and operational control as your data volume and concurrency grow.

Quick Answer: For near-real-time dashboards on event data, ClickHouse typically delivers lower latency (sub-second, millisecond-level queries) and lower cost at high concurrency, especially for continuously updated dashboards and investigative queries. BigQuery shines for ad hoc, sporadic analytics at massive scale, but its query- and slot-based pricing, ingest latency, and lack of local indexing primitives make it harder to run truly real-time, high-concurrency dashboards without cost and UX friction.

Why This Matters

If your product or internal teams depend on dashboards that reflect user activity, logs, or metrics within seconds—not minutes—your database choice shapes everything: customer experience, SRE workflows, and cloud bills. With event workloads (clickstreams, telemetry, log events), costs scale not just with data volume but with query concurrency and engineering effort to keep ingestion healthy.

ClickHouse and BigQuery can both ingest billions of events per day. The difference is how each behaves when you ask for:

  • “Latest 5 minutes of events, aggregated by user and endpoint, with P95 latency” every few seconds
  • 50–500 concurrent users slicing and dicing billions of rows interactively
  • Millisecond-level drilldowns for on-call engineers investigating incidents or anomalies

Key Benefits:

  • ClickHouse for real-time UX: Columnar storage, vectorized execution, and MergeTree indexes enable millisecond queries over fresh data, which is ideal for near-real-time dashboards and investigative workflows.
  • BigQuery for bursty, batch analytics: Serverless execution and separation of storage/compute works well when queries are periodic or ad hoc, and you’re comfortable with second-to-minute latency.
  • Operational control vs. abstraction: ClickHouse (Cloud/OSS) gives fine-grained tuning over ingestion, storage layout, and query behavior; BigQuery minimizes ops but gives you fewer levers to optimize real-time workloads.

Core Concepts & Key Points

ConceptDefinitionWhy it's important
Near-real-time dashboardsDashboards that reflect event data within seconds to a few minutes, typically auto-refreshing for operations or product analytics.Drives product decisions, incident response, and user experience; exposes latency, cost, and concurrency constraints in your data stack.
OLAP database vs. cloud warehouseClickHouse is an OLAP-first, column-oriented database with local indexes; BigQuery is a serverless, distributed warehouse optimized for large analytic scans.Determines how you pay (per query vs. per resource), how you index/partition, and how fast fresh data becomes queryable.
Operational envelopeThe combination of ingestion patterns, query shapes, concurrency, and maintenance tasks that a system can sustain reliably.The same system that’s great for weekly reports may struggle with always-on, high-cardinality, high-concurrency real-time dashboards.

How It Works (Step-by-Step)

Let’s walk through how a near-real-time event analytics pipeline typically differs on ClickHouse vs BigQuery.

1. Ingestion & Freshness

ClickHouse

  1. Event producers → buffer → batched inserts

    • Best practice is to batch between 1,000–100,000 rows per insert to keep MergeTree healthy and avoid “too many parts” errors.
    • In ClickHouse Cloud, you can ingest via Kafka, HTTP, or native drivers, often with seconds-level freshness.
  2. Storage in MergeTree tables
    Example schema for a clickstream table:

    CREATE TABLE events
    (
        event_time      DateTime,
        user_id         UInt64,
        session_id      UUID,
        event_type      LowCardinality(String),
        uri             String,
        latency_ms      UInt32,
        properties      JSON
    )
    ENGINE = MergeTree
    PARTITION BY toDate(event_time)
    ORDER BY (event_time, user_id, session_id);
    
    • PARTITION BY is used primarily for data lifecycle (e.g., daily drops), not micro-optimizing queries.
    • ORDER BY defines the primary index, which ClickHouse uses to skip large ranges of data during scans.
  3. Immediate queryability

    • New parts become visible to queries as soon as inserts commit—no separate load job required.
    • Background merges compact data asynchronously; you monitor them via system.merges.
    SELECT database, table, sum(rows) AS rows_processing
    FROM system.merges
    GROUP BY database, table;
    

BigQuery

  1. Event producers → streaming inserts or batch loads

    • Streaming inserts (via Storage Write API) can make data available in seconds but incur higher cost and may expose soft limits under extreme concurrency.
    • Batch loads (via GCS) are cheaper but have higher latency (minutes).
  2. Storage in columnar tables

    • You define partitioning (e.g., PARTITION BY DATE(event_time)) and clustering (e.g., on user_id, event_type).
    • There’s no local primary index like MergeTree; pruning is based on partition/cluster metadata and statistics.
  3. Eventual freshness

    • Streaming inserts typically result in seconds to low-minute latency before data is consistently queryable.
    • There’s no concept of “too many parts,” but high-frequency small streaming inserts show up as increased cost.

2. Querying & Latency

ClickHouse: OLAP database with millisecond queries

  • Vectorized execution scans columns efficiently, exploiting CPU and compression.

  • Primary index + data ordering enable aggressive data skipping.

  • For a P95 latency dashboard:

    SELECT
        toStartOfMinute(event_time) AS minute,
        event_type,
        quantile(0.95)(latency_ms) AS p95_latency,
        count() AS requests
    FROM events
    WHERE event_time >= now() - INTERVAL 15 MINUTE
    GROUP BY minute, event_type
    ORDER BY minute DESC, event_type
    LIMIT 100;
    
  • On billions of rows with proper ordering and compression, this pattern routinely executes in tens of milliseconds to low hundreds of milliseconds, supporting live, auto-refresh dashboards and investigative use.

You can validate real latency and concurrency through system.query_log:

SELECT
    query,
    count() AS executions,
    quantile(0.5)(query_duration_ms) AS p50_ms,
    quantile(0.95)(query_duration_ms) AS p95_ms
FROM system.query_log
WHERE event_time >= now() - INTERVAL 15 MINUTE
  AND type = 'QueryFinish'
GROUP BY query
ORDER BY p95_ms DESC
LIMIT 20;

BigQuery: warehouse-optimized, not latency-optimized

  • Columnar storage with distributed execution; ideal for scanning large data sets with seconds-level response.
  • Same query translated to BigQuery is straightforward, but typical latencies for always-on dashboards are hundreds of milliseconds to several seconds, depending on:
    • Data volume
    • Slot allocation / concurrency
    • Whether BI Engine is used for acceleration

BigQuery is excellent for heavy, infrequent analytics. For SLA-style sub-second dashboards, you often need:

  • BI Engine with caching
  • Carefully tuned partitioning/clustering
  • Pre-aggregation tables or materialized views

3. Cost & Operational Model

ClickHouse

  • Cost profile

    • You pay for provisioned compute/storage (ClickHouse Cloud) or your own hardware (OSS).
    • Compression is extremely efficient, often cutting storage costs vs row-oriented systems.
    • High concurrency and frequent queries don’t spike cost per query; they’re bound by your cluster resources.
  • Operational control

    • You control ingestion patterns, merges, and schema layout:

      • Monitor system.parts to keep parts per partition under control and avoid “too many parts.”
      • Tune insert batching rather than relying on opaque autoscaling.
      SELECT
          table,
          partition,
          count() AS parts,
          sum(rows) AS rows
      FROM system.parts
      WHERE database = 'default'
        AND active
      GROUP BY table, partition
      HAVING parts > 100
      ORDER BY parts DESC;
      
    • In ClickHouse Cloud, scaling and backups are managed, but you still see cluster behavior, making performance tuning transparent.

BigQuery

  • Cost profile

    • Storage billed per TB.
    • Compute billed per-query (on-demand) or via slots (flat-rate).
    • Streaming inserts are more expensive than batch loads.
    • High-concurrency dashboards and live filters can introduce unpredictable compute cost if users can fire many queries.
  • Operational abstraction

    • Little to no cluster management; no dealing with merges or parts.
    • Limited control over execution internals—optimization is mostly via schema design, partitioning, clustering, and caching.
    • Great when you want minimal ops but less ideal when you need precise, predictable performance for always-on real-time workloads.

Common Mistakes to Avoid

  • Using BigQuery as a hard real-time backend:
    Relying solely on BigQuery for dashboards that must refresh every few seconds over hot data can lead to user-visible lag and cost surprises. Mitigate by:

    • Adding a speed layer (e.g., ClickHouse) for the latest N minutes/hours.
    • Using pre-aggregations and BI Engine, and limiting ultra-tight refresh intervals.
  • Misusing ClickHouse partitioning as a query optimization tool:
    Over-partitioning (e.g., partition by hour or high-cardinality keys) can create too many parts, slow merges, and increased memory use. Follow ClickHouse guidance:

    • Partition for retention and lifecycle first (e.g., daily partitions).
    • Keep partition key cardinality relatively low (often < 100–1,000 values in active windows).
    • Rely on ORDER BY and compression for query performance, not excessively granular partitions.

Real-World Example

A product team wants a dashboard showing per-tenant error rates and P95 latency over the last 30 minutes, refreshing every 5 seconds. They also want engineers to drill into specific users or endpoints during incidents, scanning billions of rows.

BigQuery approach:

  • Events land via streaming inserts. Freshness is typically a few seconds to a minute.
  • The main dashboard query runs every 5 seconds per viewer, scanning recent partitions.
  • To keep costs manageable and latency acceptable, they:
    • Add materialized views or pre-aggregation tables that summarize metrics per minute.
    • Use BI Engine to accelerate repeated queries.
    • Enforce guardrails in the BI tool (no arbitrary filters for huge date ranges).

This works, but:

  • Engineers still see multi-second latencies for unconstrained queries.
  • Spiky usage (incident drills, many on-call engineers querying at once) can drive up query costs.

ClickHouse approach:

  • Events are batched (10,000–100,000 rows per insert) into a MergeTree table with daily partitions, ordered by (event_time, tenant_id, endpoint).

  • Dashboard queries over the last 30 minutes hit compressed, well-ordered data and routinely return in tens of milliseconds, even with high concurrency.

  • During incidents, SREs run exploratory queries directly:

    SELECT
        tenant_id,
        endpoint,
        quantile(0.99)(latency_ms) AS p99,
        countIf(status >= 500) AS errors,
        count() AS total
    FROM events
    WHERE event_time >= now() - INTERVAL 30 MINUTE
    GROUP BY tenant_id, endpoint
    ORDER BY p99 DESC
    LIMIT 50;
    

    They can confirm behavior and regressions by looking at system.query_log and adjust indices or settings if needed.

Operationally, the team:

  • Monitors system.parts and system.merges to keep MergeTree healthy.
  • Lets ClickHouse Cloud handle scaling and backups.
  • Treats BigQuery (if they use it at all) as a slower, cheaper archive or as a warehouse for offline modeling.

Pro Tip: For mixed workloads—near-real-time dashboards plus long-term, deep historical analytics—a common pattern is: ClickHouse as the speed layer for the recent N days/weeks of data and BigQuery as the cold, archival warehouse. You can stream the same events into both, use ClickHouse for operational dashboards and exploratory debugging, and BigQuery for heavy offline modeling, batch reporting, or cross-domain joins.

Summary

For near-real-time dashboards on high-volume event data, the tradeoffs between ClickHouse and BigQuery come down to:

  • Performance: ClickHouse is optimized as an OLAP database with local indexing and vectorized execution, delivering millisecond queries on fresh data at high concurrency. BigQuery is optimized for large batch analytics with seconds-level latency.
  • Cost: BigQuery’s per-query or slot-based model can become expensive with constantly refreshing dashboards and high user concurrency. ClickHouse’s resource-based model (Cloud or OSS) aligns better with continuous, interactive workloads; compression further reduces storage cost.
  • Ops: BigQuery minimizes operational overhead but limits your tuning levers. ClickHouse exposes the internals—parts, merges, query logs—letting you shape ingestion and schema to your workload. In ClickHouse Cloud, operational heavy lifting (scaling, backups) is managed while maintaining transparency and control.

If your primary requirement is always-on, near-real-time event dashboards and investigative querying over billions of rows, ClickHouse is usually the better fit—especially as concurrency and data volume grow. BigQuery remains excellent for more traditional warehouse use cases, long-running analytics, and organizations prioritizing minimal infrastructure management over hard real-time SLAs.

Next Step

Get Started