How do I install ZenML Open Source and run a first pipeline locally (quickstart steps)?

Most teams don’t fail at “AI strategy.” They fail at the first boring hurdle: getting a reproducible pipeline running anywhere other than a Jupyter notebook. ZenML Open Source exists to break that prototype wall, and you can feel that difference in under 15 minutes on your laptop.

Quick Answer: ZenML Open Source is a metadata-first layer that standardizes ML and GenAI pipelines in plain Python while keeping your data and compute where they are. You install it with pip, initialize a local ZenML environment, and run a simple pipeline that you can later move unchanged to Kubernetes, Slurm, or your favorite orchestrator.


The Quick Overview

  • What It Is: A unified AI “metadata layer” that snapshots code, dependencies, and artifacts for every pipeline run. It plugs into your existing tools instead of replacing them.
  • Who It Is For: ML/GenAI engineers, data scientists, and platform teams who are tired of fragile scripts, “it worked on my machine” failures, and stack rewrites when moving from notebooks to production.
  • Core Problem Solved: Notebook experiments don’t translate cleanly to batch evaluation, CI, or production serving. ZenML standardizes workflows as pipelines with full lineage and local-to-cloud portability.

How It Works (High Level)

ZenML doesn’t ask you to abandon your stack. You write Python functions, decorate them with @step, and connect them into a @pipeline. ZenML runs that graph locally at first, but the same definitions can later be scheduled in Airflow, executed on Kubernetes, or scaled out on Slurm—without you rewriting everything.

Under the hood, ZenML:

  • Tracks code, dependency versions, and container state for every run.
  • Manages artifacts between steps instead of leaving you to glue-code file paths.
  • Gives you a control plane to inspect runs, lineage, and execution traces.

Let’s walk through the quickstart: install, initialize, and run your first local pipeline.


Step 1 — Prerequisites

You’ll need:

  • Python: 3.9 or later (recent ZenML releases no longer support 3.8; 3.10 or 3.11 is a safe default).
  • Virtual environment: venv, conda, or poetry—anything that isolates dependencies.
  • Basic CLI access: macOS, Linux, or WSL on Windows.
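
Before creating the project, it's worth a quick sanity check that a compatible interpreter is on your PATH:

```shell
# Verify the Python version available on PATH (3.9+ recommended)
python3 --version
```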

If your team cares about reproducibility (they should), always treat the environment as part of the experiment. ZenML amplifies this by snapshotting environment metadata, but start clean anyway.


Step 2 — Create and Activate a Virtual Environment

From a new project directory:

mkdir zenml-quickstart
cd zenml-quickstart

# Using venv
python -m venv .venv
source .venv/bin/activate    # On Windows: .venv\Scripts\activate

You should see (.venv) or similar in your shell prompt. All ZenML dependencies will now stay scoped to this project.


Step 3 — Install ZenML Open Source

Install the core package from PyPI:

pip install --upgrade pip
pip install "zenml[server]"

Why "zenml[server]"? Because ZenML uses a lightweight local server to store metadata for your pipelines. Even in quickstart mode, you want that metadata—without it, orchestration is theater.

Verify the installation:

zenml version

You should see a version string (e.g., 0.x.y) with no errors.


Step 4 — Initialize ZenML in Your Project

Inside the zenml-quickstart directory:

zenml init

This:

  • Creates a .zen folder containing local configuration.
  • Sets up a default “stack” for local execution (local orchestrator, local artifact store, local metadata).
  • Prepares the project to track pipelines and runs.

Check that the stack exists:

zenml stack describe

You should see something like:

ACTIVE STACK: default
  Orchestrator: default
  Artifact Store: default
  ...

For now, everything is local and file-based. Later, you’ll swap those components for Kubernetes, S3/GCS artifact stores, and external orchestrators without touching pipeline code.
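
When that day comes, swapping components is a matter of registering new stack pieces with the CLI rather than rewriting pipelines. A hedged sketch (the store name, stack name, and bucket are placeholders; the exact flavors available depend on which integrations you install):

```shell
# Illustrative only: register an S3 artifact store and a stack that uses it.
# Requires the S3 integration (zenml integration install s3) and AWS credentials.
zenml artifact-store register my_s3_store --flavor=s3 --path=s3://my-bucket
zenml stack register cloud_stack -o default -a my_s3_store
zenml stack set cloud_stack
```

These are configuration commands against your own infrastructure, so they only make sense once you have a real bucket and credentials in place.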


Step 5 — Understand the ZenML Building Blocks

Before writing the pipeline, keep three core concepts in mind:

  1. Step: A single unit of work (load data, train model, call LLM, run evaluation). Implemented as a Python function wrapped with @step.
  2. Pipeline: A DAG that wires steps together. Implemented as a function wrapped with @pipeline.
  3. Stack: The runtime configuration (orchestrator, artifact store, etc.). For this quickstart it’s all local.

You’ll write pure Python. ZenML handles:

  • How artifacts move between steps.
  • How runs are recorded and versioned.
  • How to reproduce or debug a particular run later.
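
One practical consequence of "pure Python" is that step logic is unit-testable on its own, before any ZenML machinery is involved. A tiny sketch (mirroring the doubling logic the quickstart pipeline uses below):

```python
# Step logic is plain Python: write it as a function first, test it
# directly, then wrap it with @step when wiring the pipeline.
def transform(value: int) -> int:
    """Double the input (same logic as the quickstart's transform step)."""
    return value * 2


assert transform(21) == 42
print(transform(21))  # 42
```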

Step 6 — Write Your First Local Pipeline

Create a file pipeline.py in the zenml-quickstart directory:

from zenml import step, pipeline


@step
def ingest_data() -> int:
    """Dummy ingestion step: returns a constant."""
    value = 42
    print(f"[ingest_data] Produced value: {value}")
    return value


@step
def transform_data(input_value: int) -> int:
    """Simple transformation step."""
    transformed = input_value * 2
    print(f"[transform_data] Transformed {input_value} -> {transformed}")
    return transformed


@step
def report_result(result: int) -> None:
    """Final step: logs the result."""
    print(f"[report_result] Final result: {result}")


@pipeline
def simple_local_pipeline():
    """Connect the steps in a DAG."""
    raw = ingest_data()
    transformed = transform_data(raw)
    report_result(transformed)


if __name__ == "__main__":
    # Trigger a pipeline run locally
    simple_local_pipeline()

What’s happening here:

  • @step wraps plain functions and turns their inputs/outputs into versioned artifacts.
  • @pipeline builds a DAG that ZenML can orchestrate, cache, and track.
  • The if __name__ == "__main__": block lets you run the pipeline as a normal Python script.

This is deliberately minimal, but structurally identical to real pipelines where steps might be:

  • Ingestion via a data warehouse.
  • Model training in PyTorch or scikit-learn.
  • RAG retrieval with LlamaIndex.
  • Agent reasoning with LangChain or LangGraph.
  • Evaluation and batch inference.

Step 7 — Run the Pipeline Locally

With your virtual environment activated:

python pipeline.py

You should see log output like:

[ingest_data] Produced value: 42
[transform_data] Transformed 42 -> 84
[report_result] Final result: 84

Behind that simple printout, ZenML has:

  • Created a pipeline run record.
  • Versioned artifacts for each step output.
  • Stored run metadata in its local database (the .zen folder marks the project root; the metadata itself lives under ZenML’s global config directory).

You can list runs:

zenml pipeline runs list

And inspect details for a specific run (replace <RUN_NAME>):

zenml pipeline runs describe <RUN_NAME>

This is where ZenML starts to diverge from “just a Python script”: every run is now diffable, trackable, and repeatable.


Step 8 — Inspect Metadata and Lineage (Local UI)

ZenML Open Source includes a lightweight server with a UI. Launch it:

zenml up

(In recent ZenML releases, zenml up has been replaced by zenml login --local; both start the same local server and dashboard.)

This will start the local ZenML server and open a web UI (or print a URL) where you can:

  • See your simple_local_pipeline runs.
  • Inspect step-level execution.
  • Explore artifacts and lineage.

This UI is your first glimpse of ZenML as a control plane over your AI workflows. In an enterprise setting, that same plane lives inside your VPC, with RBAC, centralized credentials, and audit-ready histories.

To stop the local server, run zenml down (if the server is running in the foreground, Ctrl+C also works).


Step 9 — Add a Tiny ML Example (Optional but Useful)

Let’s make the pipeline less toy-like by adding a scikit-learn step. Install the dependency:

pip install scikit-learn

Update pipeline.py:

from typing import Tuple

import numpy as np
from sklearn.linear_model import LinearRegression
from zenml import step, pipeline


@step
def generate_training_data() -> Tuple[np.ndarray, np.ndarray]:
    """Generate synthetic training data."""
    X = np.arange(0, 10).reshape(-1, 1).astype(float)
    y = 3 * X.squeeze() + 1  # y = 3x + 1
    return X, y


@step
def train_model(X: np.ndarray, y: np.ndarray) -> LinearRegression:
    """Train a simple linear regression model."""
    model = LinearRegression()
    model.fit(X, y)
    print(f"[train_model] Coeff: {model.coef_}, Intercept: {model.intercept_}")
    return model


@step
def evaluate_model(model: LinearRegression) -> None:
    """Evaluate the model on a sample input."""
    x_test = np.array([[5.0]])
    y_pred = model.predict(x_test)
    print(f"[evaluate_model] Prediction for x=5: {y_pred[0]}")


@pipeline
def training_pipeline():
    X, y = generate_training_data()
    model = train_model(X, y)
    evaluate_model(model)


if __name__ == "__main__":
    training_pipeline()

Run it:

python pipeline.py

Now you have:

  • A real model artifact tracked by ZenML.
  • A pipeline that represents your ML workflow end-to-end.

This is exactly the path from “notebook cell” to “versioned pipeline” that most teams miss.


Step 10 — What Happens When You Outgrow Local?

The point of this quickstart is a local pipeline, but you should know what you’ve unlocked:

  • Same pipeline, different stacks: Use the identical @pipeline and @step definitions when you switch to:
    • Kubernetes/Slurm-backed orchestrators.
    • Cloud artifact stores (S3, GCS, Azure).
    • Airflow or Kubeflow for scheduling and execution.
  • Zero YAML gymnastics: You describe resources in Python; ZenML handles dockerization, GPU provisioning, and scaling behind the scenes.
  • Full lineage and rollback: When a future library update breaks your agent or model, ZenML’s snapshots of code, dependency versions (e.g., specific Pydantic versions), and container state let you diff and roll back quickly.

In other words, what you just ran on your laptop is structurally production-ready. The stack changes; your pipeline code mostly doesn’t.


Features & Benefits Breakdown (In the Context of This Quickstart)

  • Python-first pipelines — What it does: lets you define @step and @pipeline in pure Python, no YAML required. Primary benefit: turn notebooks into reproducible workflows without changing languages or tools.
  • Metadata & artifact tracking — What it does: records runs, artifacts, and environment info in a local server. Primary benefit: every experiment is auditable, reproducible, and debuggable instead of “it worked on my machine.”
  • Stack abstraction — What it does: separates pipeline code from infrastructure (orchestrator, storage, etc.). Primary benefit: run the same pipeline locally today and on Kubernetes/Slurm/cloud tomorrow with minimal changes.

Ideal Use Cases for a Local ZenML Quickstart

  • Best for “notebook-to-pipeline” migrations: Because it lets you lift your existing Python code into steps and pipelines without rewriting everything in Airflow or raw Kubernetes specs.
  • Best for teams prototyping ML and GenAI agents with a path to prod: Because you can bind scikit-learn training, LlamaIndex retrieval, and LangChain/LangGraph reasoning into one DAG locally, then move that DAG to your real infrastructure.

Limitations & Considerations

  • Local stack only: This quickstart uses a local orchestrator and local artifact store. It’s perfect for learning and early experiments, but you’ll want remote stacks for real workloads. ZenML is built for that—don’t stop at local.
  • Single-user by default: The local ZenML server and metadata store are scoped to your machine. For team-wide lineage, RBAC, and centralized credentials, you’ll want ZenML Cloud or a self-hosted ZenML deployment in your VPC.

Pricing & Plans (Where This Fits)

ZenML Open Source is Apache 2.0 licensed and free to run locally or inside your own infra. When you outgrow “one engineer on a laptop,” ZenML Pro and Cloud add collaboration and governance layers.

  • Open Source (OSS): Best for individual ML/GenAI engineers or small teams needing reproducible pipelines and easy local-to-remote migration without licensing friction.
  • ZenML Pro / Cloud: Best for teams and enterprises needing guided onboarding, managed infrastructure setup, RBAC, centralized secret management, SOC2 Type II / ISO 27001 compliance, and multi-user control planes.

You can start on OSS today and later connect to ZenML Cloud without throwing away your pipelines.
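
Connecting an existing OSS setup to a hosted server is a login rather than a rewrite. A hedged sketch (the URL is a placeholder for your own workspace; older releases used zenml connect --url=... for the same purpose):

```shell
# Point your local client at a hosted ZenML server; pipelines stay unchanged
zenml login https://your-workspace.zenml.io
```

This is a configuration step against a live server, so it only applies once you have a ZenML Cloud workspace or self-hosted deployment to connect to.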


Frequently Asked Questions

Do I need Docker or Kubernetes to run this first ZenML pipeline?

Short Answer: No. The quickstart runs everything locally without Docker or Kubernetes.

Details: ZenML’s stack abstraction is what enables you to standardize on Kubernetes or Slurm later, but for this local quickstart you only need Python and a virtual environment. The default local stack uses your filesystem for artifacts and a lightweight local server for metadata. When you’re ready, you can register new stacks with Kubernetes, Slurm, or Airflow/Kubeflow orchestrators—your @step and @pipeline code stays the same.


Does ZenML replace my orchestrator (Airflow, Kubeflow, etc.)?

Short Answer: No. ZenML is a metadata layer and control plane that complements orchestrators rather than replacing them.

Details: Orchestrators like Airflow and Kubeflow are good at scheduling and executing jobs. They’re not designed to snapshot your environment, track ML/GenAI artifacts, or give you run-level lineage from raw data to agent response. ZenML sits on top: you define your workflow in Python as a pipeline, and ZenML can run it with its own local orchestrator or delegate execution to Airflow, Kubeflow, Kubernetes, or Slurm. The result is one unified, diffable pipeline definition that’s independent of any single orchestrator.


Summary

In practical terms, “installing ZenML Open Source and running a first pipeline locally” looks like this:

  1. Create a virtual environment and pip install "zenml[server]".
  2. Run zenml init to set up a local stack.
  3. Write a simple @step + @pipeline script.
  4. Execute python pipeline.py and inspect the run with zenml up.

From there, you’ve already moved beyond notebooks into a world where every run is traceable and every pipeline is portable. The same definitions you just used on your laptop can later orchestrate scikit-learn training, LangGraph loops, and RAG workflows on Kubernetes—without you rewriting the core logic every time infra changes.


Next Step

Get Started: https://cloud.zenml.io/signup