
How do I enable ZenML caching/deduplication to reduce repeated training steps and LLM eval costs?
The demo era is over. If you’re still paying to re-run the same training loop or the same LLM eval prompts on every pipeline run, you’re not doing AI engineering—you’re burning budget.
Quick Answer: ZenML’s caching and deduplication automatically reuse step outputs when nothing relevant has changed. Caching is on by default; keep it enabled on expensive steps and point every run at the same artifact store and ZenML server, and you can skip redundant training steps and avoid paying for identical LLM/tool calls across runs.
The Quick Overview
- What It Is: ZenML’s built-in caching and deduplication system, which computes a cache key from each step’s inputs, parameters, and source code so identical work is only computed once.
- Who It Is For: ML and GenAI teams running repeated pipelines (training, batch inference, and evaluation) that want to cut GPU/API waste and stabilize production runs.
- Core Problem Solved: Re-running the same training and LLM evaluation steps because your orchestrator doesn’t know what has changed—and what hasn’t.
How It Works
Raw orchestrators like Airflow or Kubeflow will happily re-run every task on every schedule. They don’t know that your training data, hyperparameters, and code are identical to last time, or that your LLM eval prompts haven’t changed. ZenML adds a metadata layer on top: it fingerprints each step’s input artifacts, parameters, and source code, then uses that fingerprint (the cache key) to decide whether to reuse a cached artifact or recompute.
At a high level:
- Step Fingerprinting & Metadata Snapshot:
  Each step (e.g., `train_model`, `eval_llm`) is hashed based on:
  - Input artifacts (data, models, configs), identified by artifact version
  - Parameters and hyperparameters
  - The step source code
  - The artifact store the outputs live in, plus the output names and materializers
  Environment details such as the container image are recorded with each run, but by default they are not part of the cache key.
- Cache Lookup & Deduplication:
  Before execution, ZenML looks for a previous successful run of the step with the same cache key. If it finds one, the step is skipped and its stored outputs are reused: no GPU spin-up, no LLM calls.
- Execution, Storage & Lineage:
  If there is no cache hit, the step runs on your infrastructure (Kubernetes, Slurm, local, etc.), and ZenML stores its outputs as versioned artifacts and updates lineage. Next time, the same fingerprint will resolve to this cached result.
This applies to both classical ML (Scikit-learn, PyTorch) and GenAI workflows (LangChain/LangGraph agents, LlamaIndex retrieval, LLM evaluations) in the same DAG.
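The fingerprint-then-reuse loop can be modeled in a few lines of plain Python. This is a hypothetical toy, not ZenML’s implementation: `cache_key`, `run_step`, and the in-memory `CACHE` dict stand in for ZenML’s internal key computation, its metadata database, and the artifact store.

```python
import hashlib
import json
from typing import Any, Callable

# Toy stand-in for ZenML's metadata + artifact store: cache key -> stored output.
CACHE: dict[str, Any] = {}


def cache_key(step_name: str, step_source: str, params: dict,
              input_ids: list[str], artifact_store_id: str) -> str:
    """Hash every component whose change should force a recompute."""
    payload = {
        "step": step_name,
        "source": step_source,
        "params": params,
        "inputs": input_ids,
        "artifact_store": artifact_store_id,
    }
    # sort_keys=True makes the hash independent of dict insertion order
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def run_step(key: str, compute: Callable[[], Any]) -> tuple[Any, bool]:
    """Return (output, was_cached): reuse a stored output or compute and store it."""
    if key in CACHE:
        return CACHE[key], True
    output = compute()
    CACHE[key] = output
    return output, False


key = cache_key("train_model", "def train_model(data, config): ...",
                {"lr": 1e-3}, ["data-v1"], "s3_artifact_store")
first, first_cached = run_step(key, lambda: {"accuracy": 0.91})
second, second_cached = run_step(key, lambda: {"accuracy": 0.0})  # never called
print(first_cached, second_cached)  # False True
```

The second call never runs its function; changing any component, for example `{"lr": 1e-4}`, yields a new key and a fresh computation.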
Enabling ZenML Caching in Practice
1. Turn on caching at the step level
Caching is controlled per step, so you decide where to save compute. ZenML enables caching by default; setting `enable_cache=True` explicitly on expensive training or LLM eval steps makes the intent visible in code.
```python
from zenml import step


@step(enable_cache=True)
def train_model(data: list[str], config: dict) -> dict:
    # your PyTorch / Scikit-learn training code
    ...


@step(enable_cache=True)
def eval_llm(responses: list[str], eval_prompts: list[str], criteria: dict) -> dict:
    # your LLM eval logic: call OpenAI/Anthropic, or a local model
    ...
```
To change the default for a whole pipeline, pass `enable_cache` to the `@pipeline` decorator (or set it in a YAML run configuration), then override it per step where you explicitly don’t want caching (e.g., steps that produce timestamps or other non-deterministic outputs):
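```python
from datetime import datetime, timezone
from zenml import step

@step(enable_cache=False)
def log_run_metadata() -> str:
    return datetime.now(timezone.utc).isoformat()  # a new value on every run
```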
Key idea: treat caching as an explicit control knob: “I’m okay reusing this result if nothing important changed.”
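The effect of a per-step toggle can be mimicked in plain Python. The decorator below is hypothetical (`cached_step` and `CALLS` are illustrative names, not ZenML APIs): it keys on the function’s name, bytecode, and keyword parameters, and only reuses a stored result when `enable_cache` is true.

```python
import functools
import hashlib
import json
from typing import Any, Callable

_STORE: dict[str, Any] = {}
CALLS: dict[str, int] = {}  # how many times each step body actually ran


def cached_step(enable_cache: bool = True) -> Callable:
    """Hypothetical decorator mimicking a per-step cache toggle (not a ZenML API)."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(**params: Any) -> Any:
            # Key on the function's name, its bytecode, and its keyword parameters
            raw = json.dumps({"name": fn.__qualname__, "params": params}, sort_keys=True)
            key = hashlib.sha256(raw.encode("utf-8") + fn.__code__.co_code).hexdigest()
            if enable_cache and key in _STORE:
                return _STORE[key]
            CALLS[fn.__name__] = CALLS.get(fn.__name__, 0) + 1
            result = fn(**params)
            _STORE[key] = result  # outputs are still stored; only reuse is toggled
            return result
        return wrapper
    return decorator


@cached_step(enable_cache=True)
def train_model(lr: float) -> dict:
    return {"lr": lr, "weights": [lr * 2, lr * 3]}


@cached_step(enable_cache=False)
def log_run_metadata(tag: str) -> str:
    return f"{tag}-run"


for _ in range(3):
    train_model(lr=0.1)
    log_run_metadata(tag="nightly")

print(CALLS)  # {'train_model': 1, 'log_run_metadata': 3}
```

Three identical runs execute the cached step once and the uncached step every time.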
2. Make sure artifacts and metadata are stored centrally
Caching only works if ZenML can find previous artifacts. That means:
- A reachable artifact store (e.g., S3/GCS/Azure Blob, or a shared filesystem)
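- A shared ZenML server (ZenML Cloud, or a self-hosted server backed by a database such as MySQL) tracking fingerprints and lineage
- Consistent configuration across your runs and orchestrators: the artifact store is part of the cache key, so runs against a different artifact store will not reuse each other’s outputs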
Example: a local orchestrator backed by a shared S3 artifact store (for experimentation):
```shell
zenml integration install s3  # or gcp, azure, etc.
zenml artifact-store register s3_artifact_store --flavor=s3 --path=s3://<your-bucket>
zenml stack register local_cached_stack \
    -a s3_artifact_store \
    -o default \
    --set
```
In production, you typically:
- Keep artifact storage in your own cloud account or VPC
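- Use ZenML Cloud, or a managed database in your own infrastructure, as the backend for your ZenML server
- Keep the same artifact store and ZenML server when you move pipelines between orchestrators such as Airflow or Kubeflow, so earlier cache entries remain reusable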
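Because the artifact store feeds into the cache key, two machines only share cache hits when they point at the same store. The toy key below (a hypothetical helper, not ZenML’s real hashing) makes that concrete:

```python
import hashlib
import json


def toy_cache_key(artifact_store_id: str, step_source: str, params: dict,
                  input_ids: list[str]) -> str:
    # Hypothetical: the artifact store ID is hashed together with code, params, inputs
    payload = [artifact_store_id, step_source, params, input_ids]
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


shared = {
    "step_source": "def preprocess(data): ...",
    "params": {"lowercase": True},
    "input_ids": ["raw-data-v3"],
}
laptop_key = toy_cache_key("s3_artifact_store", **shared)
ci_key = toy_cache_key("s3_artifact_store", **shared)
local_key = toy_cache_key("default", **shared)

print(laptop_key == ci_key)     # True: same store and inputs, so a cache hit is possible
print(laptop_key == local_key)  # False: a different store means a recompute
```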
3. Wire expensive steps into your pipelines as first-class ZenML steps
Caching and deduplication only apply to ZenML steps. If your LLM eval is buried inside an untracked Python script called by Airflow, ZenML can’t help.
Structure your pipeline:
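```python
from zenml import pipeline, step


@step(enable_cache=True)
def load_data() -> list[str]:
    # Placeholder: load raw records from your data source
    return ["Example record one", "Example record two"]


@step(enable_cache=True)
def preprocess(data: list[str]) -> list[str]:
    return [record.lower() for record in data]


@step(enable_cache=True)
def train_model(preprocessed_data: list[str], config: dict) -> dict:
    # Placeholder for your PyTorch / Scikit-learn training loop
    return {"config": config, "num_examples": len(preprocessed_data)}


@step(enable_cache=True)
def eval_llm(model: dict, eval_dataset: list[str], prompts: list[str]) -> dict:
    # e.g., call a LangChain or LangGraph loop here
    return {"num_prompts": len(prompts)}


@pipeline
def training_and_eval_pipeline(config: dict, prompts: list[str]):
    data = load_data()
    processed = preprocess(data)
    model = train_model(processed, config=config)
    eval_llm(model, processed, prompts=prompts)


if __name__ == "__main__":
    training_and_eval_pipeline(config={"lr": 1e-3}, prompts=["Summarize the record."])
```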
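Then run this pipeline as often as you like (on schedule, on demand, or triggered by CI). ZenML will skip `train_model` and `eval_llm` whenever their inputs, parameters, and code haven’t changed. One caveat: `load_data` has no inputs, so if it reads from an external source, its cached output will be reused even after that source changes. Disable its cache or pass a data version as a parameter.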
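To see how cache hits propagate through a DAG, here is a toy four-step pipeline in plain Python. It is a hypothetical sketch: `run_cached` and `run_pipeline` are illustrative, and unlike ZenML (which keys on input artifact IDs) this toy keys on input values. Changing only the training config re-executes `train_model` and, because its output changed, `eval_llm`, while upstream steps stay cached.

```python
import hashlib
import json
from typing import Any, Callable

CACHE: dict[str, Any] = {}


def run_cached(name: str, fn: Callable[..., Any], *inputs: Any,
               **params: Any) -> tuple[Any, str]:
    """Hypothetical step runner keyed on step name, input values, and parameters."""
    raw = json.dumps({"step": name, "inputs": inputs, "params": params}, sort_keys=True)
    key = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    if key in CACHE:
        return CACHE[key], "cached"
    CACHE[key] = fn(*inputs, **params)
    return CACHE[key], "executed"


def run_pipeline(config: dict) -> dict[str, str]:
    """Toy four-step DAG mirroring training_and_eval_pipeline; returns step statuses."""
    status: dict[str, str] = {}
    data, status["load_data"] = run_cached("load_data", lambda: ["Record A", "Record B"])
    processed, status["preprocess"] = run_cached(
        "preprocess", lambda d: [x.lower() for x in d], data)
    model, status["train_model"] = run_cached(
        "train_model", lambda p, config: {"n": len(p), **config}, processed, config=config)
    _, status["eval_llm"] = run_cached(
        "eval_llm", lambda m, p: {"score": m["n"]}, model, processed)
    return status


first = run_pipeline({"lr": 0.1})   # every step executed
second = run_pipeline({"lr": 0.1})  # every step cached
third = run_pipeline({"lr": 0.2})   # only train_model and eval_llm re-execute
print(third)
```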
4. Understand what breaks the cache (and what doesn’t)
To trust caching for real workloads, you need to know when a step will re-run.
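ZenML invalidates the cache when:
- Input artifacts change: a different data version, a new training split, an updated embedding store, etc.
- Parameters change: hyperparameters, eval criteria, temperature, max tokens, or any other `@step` parameter.
- Step code changes: refactors, bug fixes, or a changed model architecture.
- Storage configuration changes: a different artifact store, or different output names or materializers.
It does not invalidate the cache when:
- Downstream steps change but the step itself is unchanged.
- The orchestration schedule changes (e.g., the same pipeline runs more often with no input change).
- Unrelated parts of your stack are modified (e.g., Airflow DAG wrapper code).
- The environment changes on its own: by default, a new container image or library version (e.g., a Pydantic or PyTorch upgrade) does not change the cache key. Disable caching, or pass the version as a parameter, when an upgrade should force recomputation.
- External state changes: files or APIs that a step reads without receiving them as inputs are invisible to the cache.
Caching decides when to recompute; lineage tells you what happened. Because ZenML records the code, artifacts, and run environment for each step run, you can compare runs to see whether a library update broke an agent or a model, and roll back to a known-good artifact version.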
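To make a dependency upgrade an explicit, visible trigger for recomputation, you can pass the library version as a step parameter. The toy key below is hypothetical (it hashes only code, parameters, and input IDs), and the parameter name `langchain_version` and the version strings are illustrative; in a real step you might read the installed version with `importlib.metadata.version(...)`.

```python
import hashlib
import json


def toy_key(source: str, params: dict, input_ids: list[str]) -> str:
    # Hypothetical key: code + parameters + input IDs only
    raw = json.dumps({"source": source, "params": params, "inputs": input_ids},
                     sort_keys=True)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


SOURCE = "def eval_llm(model, eval_dataset, prompts): ..."
INPUTS = ["eval-set-v7"]

key_v1 = toy_key(SOURCE, {"temperature": 0.2, "langchain_version": "0.2.1"}, INPUTS)
key_v1_again = toy_key(SOURCE, {"temperature": 0.2, "langchain_version": "0.2.1"}, INPUTS)
key_v2 = toy_key(SOURCE, {"temperature": 0.2, "langchain_version": "0.3.0"}, INPUTS)

print(key_v1 == key_v1_again)  # True: same version, so a cache hit is possible
print(key_v1 == key_v2)        # False: the upgrade forces a recompute
```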
5. Use caching for LLM eval deduplication
LLM evaluations are the worst place to pay twice. Large prompt matrices, multiple criteria, and long contexts add up quickly.
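A back-of-the-envelope calculation shows why. Every number below (prompt count, criteria, tokens per call, price, run counts) is hypothetical; substitute your own.

```python
def eval_cost_usd(num_prompts: int, num_criteria: int, tokens_per_call: int,
                  usd_per_million_tokens: float) -> float:
    """Cost of one full eval matrix: one LLM call per (prompt, criterion) pair."""
    total_tokens = num_prompts * num_criteria * tokens_per_call
    return total_tokens * usd_per_million_tokens / 1_000_000


# Hypothetical: 500 prompts x 4 criteria x 2,000 tokens per call at $5 per 1M tokens
per_run = eval_cost_usd(500, 4, 2_000, 5.0)

runs_per_month = 60     # e.g., nightly runs plus CI checks
runs_with_changes = 8   # runs where eval inputs or parameters actually changed

without_cache = per_run * runs_per_month   # every run pays for every call
with_cache = per_run * runs_with_changes   # unchanged runs are served from cache
print(per_run, without_cache, with_cache)  # 20.0 1200.0 160.0
```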
Pattern:
- Represent your eval data (prompts, expected behaviors, rubrics) as ZenML artifacts.
- Wrap LLM calls into a ZenML step with `enable_cache=True`.
- Ensure evaluation configs (model, temperature, criteria) are step parameters.
Example:
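```python
from typing import Annotated, Tuple

from zenml import pipeline, step


@step(enable_cache=True)
def load_eval_inputs() -> Tuple[
    Annotated[list[str], "prompts"],
    Annotated[list[str], "targets"],
    Annotated[dict, "rubric"],
]:
    # Placeholder: load your versioned eval data here
    return ["What is the capital of France?"], ["Paris"], {"exact_match": 1.0}


@step(enable_cache=True)
def llm_eval_step(
    model_name: str,
    temperature: float,
    prompts: list[str],
    target_responses: list[str],
    rubric: dict,
) -> dict:
    # Placeholder: call OpenAI/Anthropic/Bedrock or a LangChain evaluator here
    # and return structured scores & metadata
    return {"model": model_name, "num_prompts": len(prompts), "scores": []}


@pipeline
def llm_eval_pipeline():
    prompts, targets, rubric = load_eval_inputs()
    llm_eval_step(
        model_name="gpt-4.1",
        temperature=0.2,
        prompts=prompts,
        target_responses=targets,
        rubric=rubric,
    )
```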
Run this pipeline as you:
- Tune non-LLM parts of the system (retrieval, routing)
- Change your orchestration
- Trigger CI checks before release
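As long as the eval inputs and parameters are identical, ZenML will reuse previous scores and skip LLM calls, so you only pay when the evaluation actually changed. The cache can’t see provider-side changes, though: if a model alias starts pointing at a newer model, pin a specific model snapshot in `model_name` where the provider offers one, or force a re-run.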
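Step-level caching removes repeated work across runs. Within a single run, a large eval matrix can also contain duplicate requests, for example when several criteria re-ask the same prompt. The helper below is a complementary, hypothetical sketch (not a ZenML feature; `dedup_eval_calls` and `fake_llm` are illustrative) that calls the model once per unique `(model, temperature, prompt)` and reuses the answer.

```python
from typing import Callable

# (model, temperature, prompt) identifies one LLM request in this sketch
Request = tuple[str, float, str]


def dedup_eval_calls(requests: list[Request],
                     call_llm: Callable[[str, float, str], str]) -> tuple[list[str], int]:
    """Call the LLM once per unique request and fan the answers back out in order.

    Assumes one sample per unique request is acceptable; at temperature > 0,
    repeated calls would normally give different answers.
    """
    answers: dict[Request, str] = {}
    for req in requests:
        if req not in answers:
            answers[req] = call_llm(*req)
    return [answers[req] for req in requests], len(answers)


calls: list[str] = []


def fake_llm(model: str, temperature: float, prompt: str) -> str:
    # Stand-in for a real API client; records each prompt it is asked
    calls.append(prompt)
    return prompt.upper()


matrix = [("gpt-4.1", 0.2, "q1"), ("gpt-4.1", 0.2, "q2"), ("gpt-4.1", 0.2, "q1")]
outputs, unique_calls = dedup_eval_calls(matrix, fake_llm)
print(outputs, unique_calls)  # ['Q1', 'Q2', 'Q1'] 2
```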
6. Combine caching with infra abstraction to avoid idle GPU and YAML overhead
Caching doesn’t live in isolation: it’s most powerful when combined with ZenML’s infrastructure abstraction.
- Define hardware in Python:
  “This step needs 1x A100, 40GB RAM” instead of 100 lines of Kubernetes YAML.
- ZenML handles dockerization, GPU provisioning, and scaling on Kubernetes or Slurm.
- Smart caching ensures those GPUs only spin up when you actually need to recompute.
Example (conceptual):
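```python
from zenml import pipeline, step
from zenml.config import ResourceSettings

@step(enable_cache=True, settings={"resources": ResourceSettings(gpu_count=1, memory="40GB")})
def heavy_training_step() -> None:
    ...  # GPU training code goes here

@pipeline
def gpu_training_pipeline():
    heavy_training_step()

# Run with an active stack whose orchestrator honors resource settings (e.g., Kubernetes)
gpu_training_pipeline()
```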
You eliminate two anti-patterns at once:
- YAML-driven infra complexity
- Paying for idle / redundant GPU jobs
Features & Benefits Breakdown
| Core Feature | What It Does | Primary Benefit |
|---|---|---|
| Step-level Smart Caching | Reuses outputs when a step’s cache key (inputs, parameters, and code) matches a previous run | Skips redundant training runs and repeated LLM eval runs |
| Artifact & Environment Versioning | Snapshots data, models, and container state for every step | Makes cache behavior inspectable, diffable, and rollbackable |
| Deduplication Across Runs | Shares cached artifacts across multiple pipeline runs and schedules | Cuts GPU and LLM API costs in multi-team / CI / scheduled pipelines |
Ideal Use Cases
- Best for repeated training & retraining workflows: Because scheduled daily or weekly training jobs often fire when nothing upstream has changed, ZenML caching avoids retraining the same model on identical data and hyperparameters. Any data change, however small, produces a new input artifact and triggers a retrain.
- Best for LLM evaluation and agent regression suites: Because you can run large eval matrices on every release candidate without re-paying for identical prompts and criteria, while still getting full lineage and execution traces.
Limitations & Considerations
- Non-deterministic steps:
  If a step relies on randomness or external non-deterministic services without stable inputs, caching may hide changes you care about. Use `enable_cache=False`, or pass explicit seeds and inputs as parameters to stabilize behavior.
- Storage and retention:
  Caching trades cheap storage for expensive compute. Over time, you may want retention policies or manual cleanup of old artifacts, especially in S3/GCS-backed stores. Plan your artifact lifecycle as part of platform governance.
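Passing seeds as parameters works because a seeded generator makes the step a pure function of its parameters, so a cached result stays faithful. A minimal plain-Python sketch (the helper name `sample_eval_subset` is hypothetical):

```python
import random


def sample_eval_subset(items: list[str], k: int, seed: int) -> list[str]:
    """Deterministic for a given (items, k, seed), so reusing a cached result is safe."""
    rng = random.Random(seed)  # local RNG, unaffected by global random state
    return rng.sample(items, k)


items = [f"prompt-{i}" for i in range(100)]
a = sample_eval_subset(items, 5, seed=42)
b = sample_eval_subset(items, 5, seed=42)
print(a == b)  # True: same seed, same subset

# A different seed generally gives a different subset; because the seed is a
# step parameter, it also gives a different cache key, so the work is redone.
c = sample_eval_subset(items, 5, seed=7)
```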
Pricing & Plans
Caching and deduplication are core capabilities of ZenML’s metadata layer, available across open-source and managed deployments. You don’t pay extra to “turn caching on,” but you do need a properly configured stack (an artifact store plus a ZenML server) and, for managed convenience, many teams choose ZenML Cloud.
- Open Source / Self-Hosted: Best for teams that want full control inside their own VPC, managing artifact and metadata stores themselves while leveraging caching across their orchestrators.
- ZenML Cloud: Best for teams wanting a managed control plane with SOC2 Type II and ISO 27001 posture, unified governance dashboards, and seamless setup of caching and lineage without running their own control services.
(For current plan details, feature matrices, and enterprise options, see ZenML’s pricing page.)
Frequently Asked Questions
How do I know if a particular step was cached or re-executed?
Short Answer: Check the pipeline run UI or CLI: ZenML marks whether a step was served from cache or executed fresh, and you can inspect the associated artifacts and metadata.
Details:
In the ZenML dashboard, each pipeline run shows:
- Step status (executed vs. cached)
- Linked artifacts for inputs and outputs
- Execution traces and logs
When a step uses cached outputs, ZenML records that relationship in the run lineage. You can drill down to see:
- Which prior run produced the cached artifact
- The code version, container image, and dependency versions at that time
- The exact fingerprints used to determine cache hits
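From the CLI, `zenml pipeline runs list` shows recent runs; for step-level detail, the ZenML Python `Client` (e.g., `Client().get_pipeline_run(...)`) exposes each step run’s status, including whether it was cached, so you can debug why a step did or did not hit the cache.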
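If you export the parameters, input artifact IDs, and a hash of the step code for two step runs (for example from the dashboard or the Python `Client`), a small diff shows which component caused a miss. The record layout and the helper `explain_cache_miss` below are hypothetical:

```python
from typing import Any


def explain_cache_miss(previous: dict[str, Any], current: dict[str, Any]) -> list[str]:
    """Return the fingerprint components whose values differ between two step runs."""
    components = sorted(set(previous) | set(current))
    return [name for name in components if previous.get(name) != current.get(name)]


last_run = {"code_hash": "a1f3", "params": {"temperature": 0.2}, "input_ids": ["eval-set-v7"]}
this_run = {"code_hash": "a1f3", "params": {"temperature": 0.3}, "input_ids": ["eval-set-v7"]}
print(explain_cache_miss(last_run, this_run))  # ['params']
```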
Can I force ZenML to ignore the cache and recompute a step?
Short Answer: Yes. You can disable caching at the step level, change inputs/parameters, or force a re-run through configuration or ad-hoc pipeline changes.
Details:
There are several strategies:
- Disable caching for a step:
  Set `enable_cache=False` on the step decorator to always recompute.
- Change a parameter or input:
  Tweaking a config value or adding a “cache busting” parameter (e.g., `force_rerun=True`) changes the fingerprint and triggers a new execution. Reusing the same value on the next run hits the cache again.
- Pipeline-level or run-level toggles:
  Pass `enable_cache=False` to the `@pipeline` decorator, call `my_pipeline.with_options(enable_cache=False)()` for a single run, or set `enable_cache: False` in a YAML run configuration.
The goal is control: you keep caching on for normal CI/scheduled runs, but explicitly force recompute when investigating a bug, testing a new library version, or validating that a fix behaves as expected.
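A cache-busting parameter only forces a recompute when its value differs from earlier runs. A toy model that hashes only parameters makes this visible (hypothetical; ZenML also hashes code and inputs, and `force_rerun` and `run_date` are illustrative parameter names):

```python
import hashlib
import json

seen_keys: set[str] = set()


def would_execute(params: dict) -> bool:
    """Toy cache check on parameters only: True on a miss (records the key), False on a hit."""
    key = hashlib.sha256(json.dumps(params, sort_keys=True).encode("utf-8")).hexdigest()
    if key in seen_keys:
        return False
    seen_keys.add(key)
    return True


base = {"temperature": 0.2}
print(would_execute({**base, "force_rerun": False}))  # True: first run
print(would_execute({**base, "force_rerun": True}))   # True: new key, recompute
print(would_execute({**base, "force_rerun": True}))   # False: same key, cached again
# A value that differs per run (e.g., the run date) busts the cache whenever it changes
print(would_execute({**base, "run_date": "2024-06-01"}))  # True
print(would_execute({**base, "run_date": "2024-06-02"}))  # True
```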
Summary
Stop paying to prove the same thing twice. ZenML’s caching and deduplication layer sits on top of your existing orchestrators and infrastructure, fingerprinting step inputs, parameters, and code so identical work is only done once. For ML training jobs, that means skipping redundant training runs when data and hyperparameters haven’t changed. For LLM evaluations and agent loops, it means avoiding repeated, identical API calls while still logging full execution traces and lineage.
You get practical control: turn caching on where it saves real money (training, eval, heavy preprocessing), off where non-determinism matters, and let ZenML handle the metadata, artifact versioning, and rollback when a library update breaks your workflows.