How does ZenML keep data and artifacts in our VPC—what exactly gets sent to ZenML Pro (metadata-only) and what stays in our cloud?



The demo era is over. If you’re serious about ML and GenAI in production, you can’t afford a platform that quietly pipes your data out of your VPC “for convenience.” ZenML Pro is explicitly built to do the opposite: it stays a metadata layer while your data, models, and artifacts never leave your cloud.

Quick Answer: ZenML Pro only sees metadata about your ML and GenAI workflows (pipeline names, run status, metrics, parameter configs, references to artifact locations, etc.). Your actual data, model binaries, embeddings, and artifacts stay in your own infrastructure (VPC / cloud accounts / on‑prem), behind your existing security and compliance controls.


The Quick Overview

  • What It Is: ZenML Pro is a metadata-first AI engineering layer that tracks, orchestrates, and governs ML and GenAI workflows without ever moving your raw data or artifacts into ZenML’s infrastructure.
  • Who It Is For: Teams that need to standardize ML and GenAI pipelines across Kubernetes, Slurm, Airflow, Kubeflow, etc., but operate under strict security, compliance, or data sovereignty constraints.
  • Core Problem Solved: You want reproducibility, lineage, and a unified control plane, but can’t send PII, model weights, or logs outside your VPC. ZenML gives you the metadata control layer without touching your data plane.

How It Works

ZenML’s architecture is simple on purpose: it’s a metadata layer on top of your existing infrastructure. Compute and storage live entirely in your cloud; ZenML Pro acts as the registry and brain that keeps track of what happened, where, and with which code and environment.

At a high level:

  1. Execution in Your VPC:
    Pipelines run where you already run workloads: your Kubernetes cluster, Slurm farm, VM fleet, or other environments. Data is read from and written to your own storage (S3/GCS/Azure Blob, internal object stores, databases, feature stores).

  2. Metadata Sync to ZenML Pro:
    During and after runs, ZenML clients and orchestrator integrations send a compact metadata stream to ZenML Pro: pipeline definitions, run status, parameters, metrics, artifact URIs, environment snapshots, and execution traces. No user data, no model binaries.

  3. Control, Governance, and UI on Metadata Only:
    ZenML Pro uses this metadata to power the UI, lineage graphs, diff/rollback, caching, and RBAC. When execution needs cloud resources or external tools, ZenML uses service connectors and short‑lived tokens so your workloads talk to those services directly—again, from inside your VPC.
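The split described in the three steps above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not ZenML's API: `StepMetadata`, `run_training_step`, the local temporary directory standing in for an object store, and the metric value are all assumptions made for the example.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class StepMetadata:
    """Hypothetical record of what a metadata layer would receive for one step."""
    pipeline: str
    step: str
    status: str
    parameters: dict
    metrics: dict
    artifact_uri: str  # a pointer only; the bytes stay in the artifact store


def run_training_step(store: Path, learning_rate: float) -> StepMetadata:
    """Run a toy step: the artifact is written to `store`, only metadata is returned."""
    # Stand-in for real model weights; in practice this lands in S3/GCS/Azure Blob.
    weights = bytes(range(256)) * 64
    artifact_path = store / "customer_churn_training" / "train_model" / "model.bin"
    artifact_path.parent.mkdir(parents=True, exist_ok=True)
    artifact_path.write_bytes(weights)

    return StepMetadata(
        pipeline="customer_churn_training",
        step="train_model",
        status="completed",
        parameters={"learning_rate": learning_rate},
        metrics={"accuracy": 0.91},  # placeholder value for illustration
        artifact_uri=artifact_path.as_uri(),
    )


with tempfile.TemporaryDirectory() as tmp:
    metadata = run_training_step(Path(tmp), learning_rate=0.01)
    # Total size of everything that stayed in the local "artifact store".
    artifact_bytes = sum(f.stat().st_size for f in Path(tmp).rglob("*") if f.is_file())

# Only this JSON document would cross the network boundary.
payload = json.dumps(asdict(metadata))
print(f"artifact kept locally: {artifact_bytes} bytes")
print(f"metadata payload sent: {len(payload)} bytes")
```

The point of the sketch is the shape of the payload: it carries names, status, parameters, metrics, and a URI, and its size does not grow with the size of the artifact.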


What Exactly Gets Sent to ZenML Pro (Metadata Only)

Here’s what actually leaves your VPC and goes to ZenML Pro. All of this is:

  • Non‑PII by design, as long as you keep sensitive values out of the names and parameters you choose.
  • Encrypted in transit and at rest.
  • About your workflows, not your data.

1. Pipeline and Step Definitions

  • Pipeline names and IDs (e.g., customer_churn_training, rag_agent_eval).
  • Step names and structure (e.g., load_data, train_model, evaluate_agent).
  • DAG/topology: which steps depend on which others.
  • Orchestrator references (e.g., this pipeline runs on “team‑k8s‑cluster”, “slurm‑research”).

Why: This lets ZenML show you standardized pipelines across Airflow, Kubeflow, and custom orchestrators in a single graph.

2. Run and Execution Metadata

  • Run IDs, timestamps, and status (queued, running, completed, failed).
  • Execution times and resource usage summaries (e.g., CPU/GPU hours at a step level if you enable it).
  • Retry counts and failure reasons (exception types and stack trace locations, not raw data).
  • Trigger information (manual, scheduled, API‑triggered).

Why: You get an execution history that’s audit-ready: who ran what, when, and what happened.

3. Parameters and Configurations

  • Hyperparameters and configuration values used for a run (e.g., learning rate, max sequence length, retrieval top‑k).
  • Pipeline configuration knobs (e.g., which model family, which environment, which storage bucket).
  • Environment selection (e.g., “use GPU‑large cluster,” “use internal LLM endpoint”).

Caution: You should avoid putting sensitive data directly into parameter names/values (e.g., don’t encode user IDs in parameter strings). ZenML gives you the hooks; you keep parameter content clean.
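One way to enforce that caution is a pre-flight check that runs before parameters are recorded. The sketch below is an assumption about how a team might do this on its own side; `SENSITIVE_PATTERNS` and `find_sensitive_parameters` are hypothetical names, and the two patterns are deliberately crude examples rather than a complete PII detector.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical patterns; a real deployment would use its own data-classification rules.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "long_digit_run": re.compile(r"\d{9,}"),  # e.g. account or national ID numbers
}


def find_sensitive_parameters(parameters: Dict[str, object]) -> List[Tuple[str, str]]:
    """Return (parameter name, pattern name) pairs for names or values that look sensitive."""
    findings = []
    for name, value in parameters.items():
        # Check the name and the value together, since either can leak data.
        text = f"{name}={value}"
        for label, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(text):
                findings.append((name, label))
    return findings


clean = {"learning_rate": 0.001, "max_sequence_length": 512, "retrieval_top_k": 5}
leaky = {"learning_rate": 0.001, "debug_user": "jane.doe@example.com"}

print(find_sensitive_parameters(clean))  # []
print(find_sensitive_parameters(leaky))  # [('debug_user', 'email')]
```

A check like this can fail the run early, so a leaky parameter never reaches the metadata store in the first place.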

4. Metrics and Evaluation Results

  • Scalar and structured metrics: accuracy, F1, BLEU, latency, cost per 1,000 tokens, etc.
  • Aggregated evaluation scores for agents and models.
  • Links to detailed evaluation artifacts that live in your storage (e.g., full eval dataset outputs in S3, just the URI in ZenML).

Why: You can compare runs, models, and agents over time without exposing underlying data.
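Comparing runs on metrics alone needs nothing but the recorded numbers. The following sketch assumes a simple list of run records; `RUNS`, `best_run`, and every metric value in it are made up for illustration and do not reflect a ZenML data model.

```python
# Hypothetical run records: only IDs and scalar metrics, no datasets or outputs.
RUNS = [
    {"run_id": "run-001", "metrics": {"f1": 0.81, "latency_ms": 120.0}},
    {"run_id": "run-002", "metrics": {"f1": 0.84, "latency_ms": 150.0}},
    {"run_id": "run-003", "metrics": {"f1": 0.79, "latency_ms": 95.0}},
]


def best_run(runs: list, metric: str, higher_is_better: bool = True) -> str:
    """Return the run_id with the best value for `metric`."""
    chooser = max if higher_is_better else min
    return chooser(runs, key=lambda run: run["metrics"][metric])["run_id"]


print(best_run(RUNS, "f1"))                                  # run-002
print(best_run(RUNS, "latency_ms", higher_is_better=False))  # run-003
```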

5. Artifact References (But Not Artifacts Themselves)

ZenML tracks:

  • Artifact names and types (e.g., “preprocessed_dataset”, “trained_xgboost_model”, “llm_embeddings”).
  • Storage locations as URIs (e.g., s3://ml-prod-bucket/..., gs://genai-artifacts/...).
  • Version metadata (which run created which artifact, hash checksums, etc.).

ZenML does not ingest or store:

  • The actual dataset files.
  • Model binaries/checkpoints (PyTorch, TensorFlow, etc.).
  • Embedding vectors.
  • Serialized agent states.

Those live entirely in your cloud storage; ZenML just holds pointers and lineage.
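A pointer-plus-checksum record can be sketched as follows. `ArtifactReference`, `register_artifact`, and `verify_artifact` are hypothetical names, and a temporary local file stands in for an object in your bucket; the only library calls used are from Python's standard `hashlib`, `pathlib`, and `tempfile` modules.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ArtifactReference:
    """Hypothetical pointer record: name, type, location, checksum. No content field."""
    name: str
    artifact_type: str
    uri: str
    sha256: str
    produced_by_run: str


def register_artifact(path: Path, name: str, artifact_type: str, run_id: str) -> ArtifactReference:
    """Hash the file where it lives and return a reference that can be sent as metadata."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return ArtifactReference(name, artifact_type, path.as_uri(), digest, run_id)


def verify_artifact(ref: ArtifactReference, path: Path) -> bool:
    """Recompute the checksum next to the data to confirm the pointer still matches."""
    return hashlib.sha256(path.read_bytes()).hexdigest() == ref.sha256


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "trained_xgboost_model.bin"
    model_path.write_bytes(b"stand-in for model weights")
    ref = register_artifact(model_path, "trained_xgboost_model", "model", run_id="run-001")

    intact = verify_artifact(ref, model_path)    # file unchanged since registration
    model_path.write_bytes(b"tampered content")
    tampered = verify_artifact(ref, model_path)  # file overwritten after registration

print(ref.name, ref.sha256[:12], intact, tampered)
```

Because the hash is computed where the file lives, the reference is enough to detect a changed or replaced artifact later without the metadata store ever holding the bytes.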

6. Environment and Dependency Snapshots

ZenML’s metadata layer snapshots the environment around each step so you can debug and roll back:

  • Code commit hashes or package versions.
  • Dependency versions (e.g., your Pydantic, PyTorch, LangChain, LlamaIndex versions).
  • Container image names and tags.
  • Hardware configuration (e.g., GPU type, memory).

Why: This is how you answer “what changed?” when an agent breaks after a library update—and get consistent, diffable, rollbackable runs without sending code or binaries to ZenML’s side.
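To make the "what changed?" question concrete, here is a rough sketch of capturing and diffing environment snapshots with the standard library's `importlib.metadata` and `platform` modules. `snapshot_environment` and `diff_snapshots` are hypothetical helpers, and the two hand-written snapshots use illustrative version numbers rather than output from a real run.

```python
import platform
from importlib import metadata
from typing import Dict, List


def snapshot_environment(packages: List[str]) -> dict:
    """Collect interpreter and package versions; packages not installed are recorded as None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "packages": versions,
    }


def diff_snapshots(before: dict, after: dict) -> Dict[str, tuple]:
    """Return {package: (old_version, new_version)} for every package whose version changed."""
    changed = {}
    for name in sorted(set(before["packages"]) | set(after["packages"])):
        old = before["packages"].get(name)
        new = after["packages"].get(name)
        if old != new:
            changed[name] = (old, new)
    return changed


# Illustrative snapshots for two runs (version numbers chosen for the example).
last_good_run = {"python": "3.11.6", "packages": {"pydantic": "1.10.13", "torch": "2.1.0"}}
failing_run = {"python": "3.11.6", "packages": {"pydantic": "2.5.0", "torch": "2.1.0"}}

print(diff_snapshots(last_good_run, failing_run))  # {'pydantic': ('1.10.13', '2.5.0')}
```

Only version strings are recorded and compared here; no source code or binaries are part of the snapshot.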

7. Execution Traces and Lineage Graph

For ML and GenAI pipelines:

  • Step‑level traces (which step executed with which inputs/outputs).
  • Lineage relations: datasets used, models produced, agents evaluated.
  • High‑level traces for agents (e.g., which tools were called, which retrieval step ran). These do not include the underlying raw prompts or documents unless you choose to encode them into metrics, which is not recommended for sensitive content.

Again, this is structural metadata—enough to reconstruct what happened, not the actual text, images, or user content flowing through your agents.
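Structural lineage of this kind amounts to a graph of names. The sketch below assumes a plain dictionary of "derived from" edges; `LINEAGE` and `upstream` are hypothetical, and the node names reuse the example artifact names from earlier in this article plus two invented ones (`raw_events`, `eval_report`).

```python
from typing import Dict, List

# Hypothetical lineage edges: each node maps to the nodes it was derived from.
LINEAGE: Dict[str, List[str]] = {
    "raw_events": [],
    "preprocessed_dataset": ["raw_events"],
    "trained_xgboost_model": ["preprocessed_dataset"],
    "eval_report": ["trained_xgboost_model", "preprocessed_dataset"],
}


def upstream(node: str, graph: Dict[str, List[str]]) -> List[str]:
    """Return every ancestor of `node`, each listed once, found by an iterative depth-first walk."""
    seen: List[str] = []
    stack = list(graph.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(graph.get(current, []))
    return seen


print(sorted(upstream("eval_report", LINEAGE)))
# ['preprocessed_dataset', 'raw_events', 'trained_xgboost_model']
```

Answering "which datasets fed this model?" is a walk over names and edges; none of the nodes needs to carry the data it refers to.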


What Stays in Your Cloud / VPC

Everything that could get you in trouble with a security review or regulator stays where it already is: inside your own controlled environments.

1. Raw Data and Features

  • Raw transactional tables, event streams, logs.
  • PII, PHI, financial data, or any regulated content.
  • Feature store backends and materialized features.
  • Text corpora, documents, images, audio, video.

ZenML orchestrates pipelines that read from and write to these sources, but the bits never route through ZenML Pro. They go straight from your compute to your storage.

2. Models and Checkpoints

  • Model weights and checkpoints (PyTorch, TensorFlow, XGBoost, custom).
  • Fine‑tuned LLM checkpoints and adapters (LoRAs, PEFT artifacts).
  • Vector indices and embedding stores (e.g., pgvector, Pinecone, internal vector DBs where configured inside VPC).

ZenML only stores metadata and URIs; model files stay in your buckets, artifact stores, or registries.

3. Artifacts and Intermediate Outputs

  • Preprocessed datasets, train/validation/test splits.
  • Intermediate features or transformed inputs.
  • Evaluation datasets and raw evaluation outputs.
  • Agent transcripts, prompts, and responses (if you log them at artifact level).

ZenML’s artifact tracking is pointer‑based. The physical storage is always your infra.

4. Secrets and Credentials

By design:

  • API keys for LLM providers.
  • Cloud provider access keys (AWS, GCP, Azure).
  • Database passwords and connection strings.
  • Internal service tokens and certificates.

These are managed via:

  • Your own secret managers (AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault, etc.).
  • ZenML’s service connectors, which generate short‑lived tokens for your workloads to talk to external services directly—without persisting long‑lived secrets in ZenML Pro.

ZenML Pro doesn’t need or keep your raw secrets to function as a metadata layer.
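The idea behind short‑lived tokens can be illustrated with a small sketch. It does not show how ZenML's service connectors are implemented; `TemporaryCredential`, `mint_credential`, and the 15-minute lifetime are assumptions chosen to show why an expiring credential limits exposure.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TemporaryCredential:
    """Hypothetical short-lived credential: a random token plus an expiry timestamp."""
    token: str
    expires_at: float  # Unix timestamp in seconds

    def is_valid(self, now: Optional[float] = None) -> bool:
        current = time.time() if now is None else now
        return current < self.expires_at


def mint_credential(ttl_seconds: int, now: Optional[float] = None) -> TemporaryCredential:
    """Create a credential that stops working `ttl_seconds` after it was issued."""
    issued_at = time.time() if now is None else now
    return TemporaryCredential(secrets.token_urlsafe(32), issued_at + ttl_seconds)


# Fixed timestamps make the behavior easy to follow.
credential = mint_credential(ttl_seconds=900, now=1_000.0)
print(credential.is_valid(now=1_000.0 + 60))   # True: one minute after issue
print(credential.is_valid(now=1_000.0 + 900))  # False: expired at the 15-minute mark
```

A leaked token of this kind is only useful until it expires, which is the property that makes it safer to hand to a workload than a long‑lived key.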

5. Compute and Orchestration Control

  • Kubernetes clusters (configs, nodes, pods).
  • Slurm clusters and job definitions.
  • Existing orchestrators, like Airflow or Kubeflow, and their job history.
  • Any internal job queues or batch systems.

ZenML integrates with these; it doesn’t replace them. The control plane for compute stays yours. ZenML issues instructions and tracks metadata; your orchestrators actually run the workloads.


How ZenML Pro Interfaces with Your Internal Services

ZenML is very explicit about the “your VPC, your data” design:

  1. Local Authentication for Internal Services
    The ZenML client and your orchestrators run inside your VPC. They authenticate to internal databases, object stores, and APIs just like any other internal workload—using your IAM roles, service accounts, or secret managers. ZenML Pro doesn’t see those credentials.

  2. Service Connectors & Short‑Lived Tokens
    When you need to reach external services (e.g., a cloud LLM API, external vector DB, or SaaS system), ZenML uses service connectors:

    • You configure the connector in a secure way.
    • At runtime, connectors mint short‑lived tokens or temporary credentials.
    • Your pipeline code calls the external service using that temporary auth, from inside your VPC.

    The ZenML server remains a metadata store, not a long‑term credential vault.

  3. Metadata Only Over the Wire
    Communication between your VPC and ZenML Pro is limited to the metadata described above. All of it is:

    • Encrypted in transit (TLS).
    • Encrypted at rest in ZenML’s infrastructure.

    There is no path or API in ZenML that requires your team to upload raw datasets, model binaries, or sensitive logs to ZenML Pro.
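Teams that want to verify the "metadata only" boundary on their own side could add an egress check. The sketch below is one hedged way to do that and is not part of ZenML: `ALLOWED_FIELDS`, `MAX_STRING_LENGTH`, `check_outgoing_payload`, and the example URI are all hypothetical and would need to match whatever your client actually sends.

```python
from typing import List

# Hypothetical allow-list of metadata fields permitted to leave the VPC.
ALLOWED_FIELDS = {"pipeline", "step", "run_id", "status", "parameters", "metrics", "artifact_uri"}
MAX_STRING_LENGTH = 2048  # assumed ceiling; long strings often mean embedded content


def check_outgoing_payload(payload: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means the payload passes."""
    problems = []
    for key, value in payload.items():
        if key not in ALLOWED_FIELDS:
            problems.append(f"unexpected field: {key}")
        if isinstance(value, (bytes, bytearray)):
            problems.append(f"binary content in field: {key}")
        if isinstance(value, str) and len(value) > MAX_STRING_LENGTH:
            problems.append(f"oversized string in field: {key}")
    return problems


good = {
    "pipeline": "rag_agent_eval",
    "status": "completed",
    "artifact_uri": "s3://example-bucket/eval/report.json",
}
bad = {"pipeline": "rag_agent_eval", "raw_rows": b"\x00\x01"}

print(check_outgoing_payload(good))  # []
print(check_outgoing_payload(bad))
# ['unexpected field: raw_rows', 'binary content in field: raw_rows']
```

An allow-list is used rather than a block-list so that any new, unreviewed field fails the check by default.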


Security, Compliance, and Governance Guarantees

ZenML backs this architecture with explicit compliance posture:

  • Your VPC, your data: ZenML is a metadata layer on top of your infrastructure. Data and compute stay in your cloud; ZenML only has metadata access.
  • SOC2 Type II & ISO 27001: ZenML is SOC2 Type II and ISO 27001 compliant, validating practices around data security, availability, and confidentiality of the metadata it does store.
  • RBAC and Governance: ZenML gives you RBAC, centralized run history, and lineage graphs over your ML and GenAI workflows, without centralizing your data.

From a security review standpoint, the conversation becomes:

  • “We’re not sending data or artifacts to ZenML—only pipeline metadata, metrics, and artifact references.”
  • “Secrets remain in our VPC; ZenML uses short‑lived tokens and service connectors.”
  • “All metadata stored by ZenML is encrypted at rest and in transit, with SOC2 Type II and ISO 27001 controls.”

Why This Matters in Practice

As someone who’s fought through audits in a regulated enterprise, here’s what this architecture actually buys you:

  • No compliance battles over data residency: You can adopt ZenML without re‑negotiating where your regulated datasets live. They never move.
  • Audit‑ready without data exposure: You get execution traces, lineage, and run histories that satisfy auditors (“show us how this model was trained”) without duplicating or exporting datasets.
  • No lock‑in to a “new data plane”: You keep your existing S3/GCS/Azure buckets, feature stores, model registries, and orchestrators (Airflow, Kubeflow). ZenML layers metadata on top rather than replacing anything.
  • Safer GenAI experimentation: You can run LangChain/LangGraph/LlamaIndex agents in production with full lineage and environment snapshots, but keep prompts, documents, and embeddings inside your own infrastructure.

Summary

ZenML Pro is deliberately “metadata‑only.” It sees:

  • Pipeline structures and run status.
  • Parameters, metrics, environment snapshots.
  • Artifact references and lineage.
  • Execution traces at a structural level.

It never needs:

  • Raw datasets, PII, or PHI.
  • Model weights, checkpoints, or embeddings.
  • Secrets and long‑lived credentials.
  • Your actual compute or storage control plane.

You get the missing layer for AI engineering—reproducible, traceable, RBAC‑controlled ML and GenAI pipelines—while your data and artifacts stay firmly inside your VPC and your cloud accounts.

