
How do I install ZenML Open Source and run a first pipeline locally (quickstart steps)?
The demo era is over. If you want ZenML to do real work for you, you need it running on your machine with a pipeline you can actually execute and debug locally—before you ever point it at Kubernetes or Slurm.
This quickstart walks you through exactly that: installing ZenML Open Source, initializing a local project, and running your first ML pipeline end‑to‑end on your laptop. No Kubernetes, no YAML walls, just Python and a reproducible pipeline you can later scale out.
Quick Answer: Install ZenML via pip, initialize a ZenML repository in your project folder, define a simple @step and @pipeline in Python, then run it locally using the default local stack. This gives you a fully tracked, reproducible pipeline run on your machine in minutes.
The Quick Overview
- What It Is: A hands-on quickstart to install ZenML Open Source and run your first local ML / GenAI pipeline.
- Who It Is For: ML engineers, data scientists, and AI engineers who are stuck in notebooks and want a real pipeline that still runs locally.
- Core Problem Solved: Stop glue-coding and “it worked on my machine” scripts; start from a local pipeline that’s already reproducible, versioned, and ready to move to the cloud.
How It Works
At its core, ZenML is a metadata layer and workflow system that sits on top of your existing tools. For a local quickstart, you don’t need Airflow, Kubeflow, or Kubernetes. ZenML ships with a default “local stack” that:
- Stores artifacts (data, models, metrics) in a local directory
- Uses a local orchestrator to run your pipeline as a normal Python process
- Tracks every run, step, and environment so you can inspect lineage and debug
You’ll:
- Install ZenML and create an isolated Python environment.
- Initialize a ZenML repository and write a tiny pipeline.
- Run the pipeline using the default local stack and inspect the results.
From there, the same @step and @pipeline definitions can later be pointed at Kubernetes, Slurm, or your orchestrator of choice—without rewriting everything.
Step 1: Install ZenML Open Source (locally via pip)
Stop installing heavy MLOps stacks before you know they’ll run on your laptop. Start small.
1.1. Prepare a clean Python environment
Using venv (or conda if you prefer) keeps your ZenML dependencies isolated:
# Create and activate a virtualenv (Python 3.9+ recommended)
python -m venv .venv
source .venv/bin/activate # on macOS / Linux
# .venv\Scripts\activate # on Windows PowerShell
Check your Python version:
python --version
Aim for a modern 3.x (e.g., 3.9–3.12).
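If you want this check inside a setup script rather than run by hand, something like the following could work. It is a minimal sketch that assumes the 3.9 lower bound suggested above; the function name python_is_supported is hypothetical and not part of ZenML.

```python
import sys

# Assumed lower bound, taken from the "3.9+" recommendation in this guide.
MIN_VERSION = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if python_is_supported():
        print(f"Python {major}.{minor} looks fine for this quickstart.")
    else:
        print(f"Python {major}.{minor} is older than 3.9; consider upgrading first.")
```

Comparing only the major and minor fields keeps the check independent of patch releases.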
1.2. Install ZenML via pip
pip install --upgrade pip
pip install "zenml[server]"
- zenml installs the CLI and core library.
- zenml[server] adds the dependencies needed to run the local ZenML server UI later if you want.
Verify installation:
zenml version
You should see the installed ZenML version printed out.
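The same check can be done from Python, which is handy if the CLI is not on your PATH but the package is installed in the active environment. This is a small sketch using only the standard library; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "zenml") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("zenml is not installed in this environment.")
    else:
        print(f"zenml {found} is installed.")
```

If this prints that zenml is missing, the most likely cause is that the virtual environment from step 1.1 is not activated.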
Step 2: Initialize a ZenML Repository Locally
ZenML organizes everything (pipelines, stacks, artifacts) inside a ZenML repository. Think of it as your project’s metadata root.
2.1. Create a project folder
mkdir zenml-quickstart
cd zenml-quickstart
2.2. Initialize ZenML
zenml init
What this does:
- Creates a .zen directory for metadata and configuration.
- Registers a default local stack (local orchestrator + local artifact store + local metadata store).
- Marks the current directory as a ZenML repository.
Check the active stack:
zenml stack list
You should see something like default with components all set to local_*.
Your project is now ZenML-aware. Any pipeline you define in this directory will be tracked and reproducible.
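If you are ever unsure whether a script is running inside the repository, you can look for the marker directory yourself. The sketch below assumes, as described above, that zenml init leaves a .zen directory at the project root; find_zenml_root is a hypothetical helper, not a ZenML function.

```python
from pathlib import Path
from typing import Optional


def find_zenml_root(start: Path) -> Optional[Path]:
    """Walk upward from `start`; return the first directory containing `.zen`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        # Assumption: `zenml init` created a `.zen` directory at the repo root.
        if (candidate / ".zen").is_dir():
            return candidate
    return None


if __name__ == "__main__":
    root = find_zenml_root(Path.cwd())
    if root is None:
        print("No .zen directory found; run `zenml init` in your project folder.")
    else:
        print(f"ZenML repository root: {root}")
```

Running this from a subfolder such as zenml-quickstart/src should still report the project root.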
Step 3: Write Your First ZenML Pipeline (Pure Python)
This is where you break the prototype wall: move from “notebook cells in the right order” to a pipeline with concrete steps and lineage.
We’ll create a tiny pipeline with:
- A data loader step that returns some data.
- A trainer step that fits a simple Scikit-learn model.
- A pipeline that wires them together.
3.1. Install runtime dependencies (example: Scikit-learn)
pip install scikit-learn
3.2. Create a file for your pipeline
Create quickstart_pipeline.py in your repository:
touch quickstart_pipeline.py
Add the following code:
from typing import Tuple

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from zenml import pipeline, step

# Note: this uses the function-style API of recent ZenML releases, where
# steps are called directly inside the @pipeline function.


@step
def load_data() -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    """Load sample data and split into train and test sets."""
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2, random_state=42
    )
    return X_train, X_test, y_train, y_test


@step
def train_model(
    X_train: np.ndarray,
    X_test: np.ndarray,
    y_train: np.ndarray,
    y_test: np.ndarray,
) -> float:
    """Train a simple classifier and return the test accuracy."""
    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    # Cast to a plain float so the value matches the declared return type
    acc = float(accuracy_score(y_test, preds))
    print(f"Test accuracy: {acc:.3f}")
    return acc


@pipeline
def iris_training_pipeline():
    """Wire steps together into a reproducible pipeline DAG."""
    X_train, X_test, y_train, y_test = load_data()
    train_model(X_train, X_test, y_train, y_test)


if __name__ == "__main__":
    # Calling the decorated pipeline function triggers a run on the active stack
    iris_training_pipeline()
What’s happening here:
- @step turns plain Python functions into tracked pipeline steps.
- @pipeline defines the DAG: load_data → train_model.
- Each run will be recorded with inputs, outputs, and environment metadata in your local ZenML repository.
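To build intuition for what "tracked" means, here is a toy decorator in plain Python that records each call's inputs and output in an in-memory list. It is only an illustration of the idea, not how ZenML implements @step; the names tracked_step and RUN_LOG are hypothetical.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory stand-in for a metadata store (illustrative only).
RUN_LOG: List[Dict[str, Any]] = []


def tracked_step(func: Callable) -> Callable:
    """Wrap `func` so every call appends a record to RUN_LOG."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = time.time()
        result = func(*args, **kwargs)
        RUN_LOG.append(
            {
                "step": func.__name__,
                "args": args,
                "kwargs": kwargs,
                "output": result,
                "seconds": time.time() - started,
            }
        )
        return result

    return wrapper


@tracked_step
def double(x: int) -> int:
    return 2 * x


@tracked_step
def add_one(x: int) -> int:
    return x + 1


if __name__ == "__main__":
    add_one(double(20))
    for record in RUN_LOG:
        print(record["step"], record["args"], "->", record["output"])
```

A real metadata layer persists these records and the produced artifacts to storage, which is what lets you inspect lineage after the process has exited.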
Step 4: Run the Pipeline Locally with the Default Stack
You now have everything you need for a full local run.
From the project directory:
python quickstart_pipeline.py
You should see:
- Logs from your steps
- A printed accuracy line, e.g., Test accuracy: 0.967 (your exact value may differ)
Behind the scenes, ZenML:
- Used the local orchestrator to execute the pipeline.
- Stored step inputs/outputs in the local artifact store (under .zen/ and local dirs).
- Recorded run metadata, code snapshots, and lineage in the local metadata store.
This is your first real, tracked pipeline run—not just a script.
Step 5: Inspect the Run and Lineage (Optional but Recommended)
Orchestration without lineage is theater. ZenML gives you both, even locally.
5.1. List your pipeline runs
zenml pipeline runs list
You’ll see a run entry with:
- Pipeline name: iris_training_pipeline
- Status (Completed)
- Timestamp and ID
5.2. Start the local ZenML dashboard (if installed with [server])
zenml up
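Then visit the URL printed in your terminal (usually http://127.0.0.1:8237). On newer ZenML releases where zenml up has been deprecated, zenml login --local starts the same local dashboard.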
From the UI you can:
- View the pipeline run graph (steps and their status)
- Inspect artifacts produced by each step
- Drill into logs and metadata
This is the same kind of inspection enterprise teams use later for Kubernetes, Slurm, or distributed training—just applied to your laptop stack.
Step 6: Evolve the Quickstart (Optional Extensions)
You now have a minimal, working local setup. Here’s how teams usually extend it next, still staying local-first:
- Add a metrics step: Split metrics computation into a dedicated step so it can be cached and compared between runs.
- Parameterize your pipeline: Use ZenML configuration to pass hyperparameters into steps, making experiments traceable instead of “magic numbers” in a notebook.
- Integrate with your stack: Add steps using PyTorch, TensorFlow, LlamaIndex, or LangChain. ZenML doesn’t care what framework you use; it just tracks the DAG, artifacts, and versions.
- Prepare for your orchestrator: Later, you can bind this same pipeline to Airflow, Kubeflow, or a Kubernetes-native stack without rewriting the pipeline code, because ZenML is the metadata layer on top.
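The "parameterize" idea can be sketched without any framework: collect the hyperparameters in one explicit object instead of scattering literals, and derive a stable fingerprint so two runs with identical settings can be recognized. This is not ZenML's configuration mechanism; TrainParams and params_fingerprint are hypothetical names, and the default values are simply the ones used in quickstart_pipeline.py.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainParams:
    """Hyperparameters from the quickstart, gathered in one place."""

    max_iter: int = 200
    test_size: float = 0.2
    random_state: int = 42


def params_fingerprint(params: TrainParams) -> str:
    """Return a short, order-independent hash of the parameter values."""
    payload = json.dumps(asdict(params), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    baseline = TrainParams()
    longer = TrainParams(max_iter=500)
    print("baseline:", params_fingerprint(baseline))
    print("longer:  ", params_fingerprint(longer))
```

Two runs with the same fingerprint used the same settings, which is the property that makes caching and run-to-run comparison meaningful.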
Features & Benefits Breakdown
| Core Feature | What It Does | Primary Benefit |
|---|---|---|
| Local Stack by Default | Runs orchestrator, artifact store, and metadata store entirely on your machine. | Lets you get a real pipeline running without touching Kubernetes or external services. |
| @step and @pipeline APIs | Wrap plain Python functions and compose them into a DAG. | Converts ad‑hoc notebook logic into reproducible, inspectable workflows. |
| Metadata & Lineage Tracking | Records code, parameters, artifacts, and run metadata. | Makes every local run diffable, debuggable, and ready to move to production environments. |
Ideal Use Cases
- Best for local ML experimentation with structure: Because it lets you keep the flexibility of Python while making every run a tracked pipeline instead of a fragile notebook sequence.
- Best for teams preparing for production stacks: Because the exact same pipeline you build locally today can be bound to Kubernetes, Slurm, or Airflow/Kubeflow tomorrow without rewrites.
Limitations & Considerations
- Local resources only: Local runs are constrained by your machine’s CPU, RAM, and GPU. For heavier training or complex LangGraph loops, you’ll eventually want to connect ZenML to remote compute (e.g., Kubernetes or a managed stack).
- No multi-user governance on pure local: The local quickstart is single‑user and filesystem-based. For team RBAC, centralized credentials, and shared lineage, you’ll move to a shared ZenML deployment (self‑hosted or ZenML Cloud).
Pricing & Plans
Everything described so far uses ZenML Open Source (Apache 2.0) and runs entirely on your local machine.
If you want collaborative features, managed infrastructure, and guided onboarding, you can upgrade to ZenML Cloud:
- Open Source (Self‑managed): Best for individuals or small teams who want to run ZenML inside their own environment (or just on their laptops) and are comfortable managing infra themselves.
- ZenML Pro / Cloud: Best for teams needing shared stacks, RBAC, SOC2 Type II / ISO 27001 posture, and guided setup from “fresh repo” to “pipelines in production” without burning weeks on MLOps plumbing.
Frequently Asked Questions
Do I need Docker, Kubernetes, or an orchestrator to run the first pipeline?
Short Answer: No. The entire quickstart runs on the local stack with no external orchestrator.
Details: ZenML ships with a local orchestrator and local artifact/metadata stores that work out of the box. You don’t need Docker, Kubernetes, Airflow, or Kubeflow to run this first pipeline. Those come later when you’re ready to scale. The idea is to get a fully tracked pipeline running on your laptop first, then point the same pipeline at more powerful infrastructure when needed.
Can I still use Jupyter notebooks with ZenML?
Short Answer: Yes, but your core pipelines should live in Python modules, not as fragile notebook cell sequences.
Details: You can develop and debug ZenML steps inside notebooks, then move the final @step and @pipeline definitions into Python files (as in quickstart_pipeline.py). ZenML’s value shows up when pipelines are versioned and reproducible; notebooks remain useful for exploration, visualization, and ad‑hoc analysis around those pipelines. The local quickstart is intentionally script-based to create something you can immediately re-run, diff, and eventually deploy.
Summary
Installing ZenML Open Source and running a first local pipeline is the fastest way to move from “notebook demos” to a real, metadata-backed workflow. You:
- Installed ZenML into an isolated Python environment.
- Initialized a ZenML repository with a default local stack.
- Defined a simple Scikit-learn training pipeline with @step and @pipeline.
- Ran it locally and captured full lineage and artifacts.
From here, you can iterate on that pipeline, integrate your preferred frameworks, and eventually bind it to Kubernetes, Slurm, Airflow, or Kubeflow—without rewriting your core pipeline logic. Local is just the first step; the APIs stay the same as you scale out.