How do I run Oxen.ai batch inference on a dataset and save the outputs to a new branch/commit?

Running batch inference on a dataset and saving the outputs as a clean branch/commit is exactly the kind of workflow Oxen.ai was built for: treat model outputs like any other dataset artifact, version them, and keep a clear audit trail of “which model produced which predictions on which data.”

Quick Answer: Use Oxen.ai to version your input dataset in a repository, run batch inference against that dataset (via Oxen’s UI, a script using the Oxen API, or a custom runner), write the outputs back into the repo as new files/columns, then commit those changes to a new branch. That branch becomes your versioned record of model outputs, tied to a specific model and dataset snapshot.

Why This Matters

If you care about reproducibility and GEO-ready evaluation, you can’t treat inference outputs as disposable logs. For every batch run you should be able to answer: which dataset version did we use, which model/weights generated these predictions, and what changed between runs? By running Oxen.ai batch inference on a dataset and saving the outputs to a new branch/commit, you create a traceable lineage that lets you debug quality, compare models, and plug the outputs back into your data flywheel for fine-tuning and evaluation.

Key Benefits:

Reproducible experiments: Pin predictions to specific dataset and model versions, so you can rerun or audit any batch.
Safe iteration: Use branches to add prediction columns, embeddings, or augmented data without polluting your main dataset until you’re ready.
Faster data flywheel: Turn batch outputs into training/eval data for the next fine-tune loop with clear version history inside Oxen.

Core Concepts & Key Points

Concept	Definition	Why it's important
Dataset repository	An Oxen.ai repo that stores your input data (tabular, JSONL, text, images, etc.) with full version history.	This is the “source of truth” for what you ran inference on; every batch run should point to a specific commit.
Branch for inference outputs	A named branch in the same repo where you write new prediction/embedding/metadata files.	Lets you isolate model outputs, experiment freely, and merge only when you’re confident in quality.
Inference → commit loop	The workflow of pulling data at a commit, running a model, writing outputs, and committing back to Oxen.	Encodes “which model produced what” directly into your version history, closing the loop from dataset → model → outputs.

How It Works (Step-by-Step)

At a high level, you:

1. Version your dataset in Oxen.
2. Create a branch for the batch inference run.
3. Select or configure the model.
4. Run inference against the dataset.
5. Write predictions back into the repo.
6. Commit and push the outputs, tied to the specific dataset/model.

Below is the workflow in more detail. I’ll describe it in a tool-agnostic way so it fits both UI-based and API/script flows.

1. Version Your Input Dataset in Oxen

Before you think about batch inference, treat the dataset as a first-class artifact.

Create a repository in Oxen.ai (e.g., org/product-search-logs).
Upload your data:
- Tabular/JSONL (e.g., logs.parquet, queries.jsonl)
- Text files
- Images/audio/video, depending on your model
Commit and push so you have a stable baseline:
- main at commit A now represents “input data for this batch.”

This is the commit you’ll reference when someone asks, “What data did we run inference on?”

2. Create a Branch for the Batch Inference Run

Next, create a new branch from the commit you want to run on:

Name it something descriptive and timestamped, for example:
- inference-2024-11-oxen-embed-v1
- gpt4o-summary-batch-2024-11-15
Branch from the last known good commit on main:
- Conceptually: git checkout -b inference-… <commit-id>
- In Oxen, you perform the equivalent through the UI or CLI.

This branch becomes your workspace for:

New prediction columns (pred_label, score, embedding_*)
New files (embeddings.parquet, summaries.jsonl)
Any debug artifacts you want to keep in history (e.g., error logs)
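
The naming convention above can be generated rather than typed by hand. This is a minimal sketch: the helper name `inference_branch_name` and the exact `inference-<date>-<model>` layout are illustrative assumptions, not an Oxen requirement.

```python
import re
from datetime import date


def inference_branch_name(model: str, run_date: date) -> str:
    """Build a branch name such as 'inference-2024-11-15-oxen-embedding-v2'."""
    # Collapse every run of characters that is not a lowercase letter or digit
    # into a single hyphen, so slashes and spaces in model names are safe.
    slug = re.sub(r"[^a-z0-9]+", "-", model.lower()).strip("-")
    return f"inference-{run_date.isoformat()}-{slug}"


print(inference_branch_name("oxen/embedding-v2", date(2024, 11, 15)))
# prints: inference-2024-11-15-oxen-embedding-v2
```

Putting the date first keeps branches sorted chronologically in most listings.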

3. Select or Configure the Model for Batch Inference

Oxen.ai gives you multiple ways to run models:

Use Oxen’s built-in model catalog & endpoints
- “Try any model” in the UI, then move to batch logic via API.
- For example: an LLM for classification, an image model for tagging, or an embedding model for retrieval.
Use a fine-tuned model you trained in Oxen
- If you used Oxen’s “Zero-code fine-tuning” on a dataset, you already have model weights tied to a training dataset.
- Deploy that model to a serverless endpoint in one click.
Integrate your own model through Oxen’s API
- Call Oxen endpoints from your own batch script.
- Or just use Oxen for data/versioning and call an external model, then write outputs back into the repo.

Whichever route you use, keep these details recorded (in a README or metadata file in the branch):

Model name / version (e.g., oxen/llama3-70b-instruct, my-org/product-llm-v3)
Endpoint or config ID
Generation parameters that matter (temperature, max tokens, top_k, etc.)

This metadata is the glue between “batch outputs” and “which model did this.”
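
One way to record these details is a small JSON file written next to the outputs. The sketch below uses only the standard library. The field names, the `write_run_metadata` helper, and the placeholder values are assumptions made for illustration, and the demo writes to a temporary directory rather than a real repo checkout.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_run_metadata(path, dataset_commit, model, endpoint, params):
    """Write a run_metadata.json-style record and return it as a dict."""
    record = {
        "run_timestamp": datetime.now(timezone.utc).isoformat(),
        "dataset_commit": dataset_commit,  # commit the inputs were read from
        "model": model,
        "endpoint": endpoint,
        "params": params,
    }
    Path(path).write_text(
        json.dumps(record, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    return record


# Demo only: in a real run the target would be the working copy of the
# inference branch, and the placeholders would hold actual identifiers.
metadata_path = Path(tempfile.mkdtemp()) / "run_metadata.json"
metadata = write_run_metadata(
    metadata_path,
    dataset_commit="<commit-id>",
    model="my-org/product-llm-v3",
    endpoint="<endpoint-or-config-id>",
    params={"temperature": 0.0, "max_tokens": 256},
)
print(metadata_path.read_text(encoding="utf-8"))
```

Sorting the keys keeps the file stable between runs, so diffs show only the values that changed.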

4. Run Batch Inference Against the Dataset

Now iterate over the dataset and call your model. The pattern is the same whether you use Python, Node, or a workflow tool:

Read the dataset from the branch
- Load your main table/file(s) from the branch you created.
- Use stable IDs (e.g., example_id, row_id, or file path) so you can join outputs back.
Chunk the data
- For LLMs or heavy vision models, batch in reasonable sizes:
  - e.g., 64–512 items per batch depending on model latency and limits.
- Throttle calls to stay within rate limits, and estimate the number of calls up front so pay-as-you-go costs stay predictable.
Call the model endpoint
- For each batch:
  - Package inputs (text, image URLs, file paths).
  - Call the Oxen model endpoint (or your own).
  - Get structured outputs: labels, probabilities, embeddings, summaries, etc.
Accumulate results
- Store outputs in memory or stream them to disk as you go:
  - predictions.parquet
  - embeddings.parquet
  - outputs.jsonl (with {id: ..., prediction: ..., model_version: ...})
Handle failures gracefully
- Log rows that fail to parse or time out.
- Write an inference_errors.jsonl file so you can quickly see what went wrong.
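
The loop described above (chunk, call, accumulate, log failures) might look roughly like the following. It is a sketch under stated assumptions: `fake_predict` is a stand-in for whatever endpoint you actually call, every row is assumed to carry an `id` field, and a failed call marks the whole batch as failed instead of retrying row by row.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable, Iterator


def chunked(rows: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def run_batch_inference(
    rows: list[dict],
    predict: Callable[[list[dict]], list],
    out_dir: str,
    model_version: str,
    batch_size: int = 64,
    pause_seconds: float = 0.0,
) -> tuple[int, int]:
    """Stream predictions to outputs.jsonl and failures to inference_errors.jsonl."""
    out_path = Path(out_dir) / "outputs.jsonl"
    err_path = Path(out_dir) / "inference_errors.jsonl"
    ok = failed = 0
    with open(out_path, "w", encoding="utf-8") as out, \
            open(err_path, "w", encoding="utf-8") as err:
        for batch in chunked(rows, batch_size):
            try:
                predictions = predict(batch)
                if len(predictions) != len(batch):
                    raise ValueError("prediction count does not match batch size")
            except Exception as exc:
                # Record every row of the failed batch so it can be re-run later.
                for row in batch:
                    err.write(json.dumps({"id": row["id"], "error": str(exc)}) + "\n")
                failed += len(batch)
            else:
                for row, prediction in zip(batch, predictions):
                    record = {"id": row["id"], "prediction": prediction,
                              "model_version": model_version}
                    out.write(json.dumps(record) + "\n")
                ok += len(batch)
            if pause_seconds > 0:
                time.sleep(pause_seconds)  # crude throttle between calls
    return ok, failed


def fake_predict(batch: list[dict]) -> list[str]:
    """Hypothetical model call: rejects empty text, labels by input length."""
    if any(not row["text"] for row in batch):
        raise ValueError("empty input text")
    return ["long" if len(row["text"]) > 10 else "short" for row in batch]


rows = [
    {"id": "q1", "text": "reset my password"},
    {"id": "q2", "text": "pricing"},
    {"id": "q3", "text": ""},
    {"id": "q4", "text": "hi"},
]
out_dir = tempfile.mkdtemp()  # demo only; a real run writes into the repo checkout
ok, failed = run_batch_inference(rows, fake_predict, out_dir, "demo-v1", batch_size=2)
print(ok, failed)  # prints: 2 2
```

Streaming to disk as each batch completes means a crash midway leaves a partial outputs.jsonl that can be inspected rather than losing everything held in memory.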

5. Write Outputs Back to the Repo as New Artifacts

Treat model outputs like first-class data. A few common patterns:

Add prediction columns to the existing table
- For tabular/JSONL data, add:
  - pred_label
  - pred_confidence
  - model_version
- Ensure the row order/IDs haven’t changed.
Create separate joinable artifacts
- For high-dimensional outputs (embeddings):
  - Create embeddings.parquet with columns:
    - id (matches the input record)
    - embedding (vector/list/array field)
    - model_version
- For long-form LLM outputs:
  - Create summaries.jsonl or augmented_examples.jsonl.
Document the run
- Add/update README.md in the branch:
  - Date/time of run
  - Dataset commit used as input
  - Model + endpoint name
  - Key parameters and any filters (e.g., “only EN language rows”)

All of this lives in your inference branch, isolated from main until you’re sure you like the results.
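
Adding prediction columns safely comes down to joining on a stable ID instead of on row position. The sketch below does that with plain dictionaries. The `id`, `label`, and `confidence` field names and the `join_predictions` helper are illustrative assumptions; with a dataframe library the same idea is a left join on the ID column.

```python
def join_predictions(rows: list[dict], predictions: list[dict], model_version: str) -> list[dict]:
    """Attach pred_label / pred_confidence / model_version to each input row by id."""
    by_id = {}
    for prediction in predictions:
        if prediction["id"] in by_id:
            raise ValueError(f"duplicate prediction for id {prediction['id']!r}")
        by_id[prediction["id"]] = prediction

    unknown = set(by_id) - {row["id"] for row in rows}
    if unknown:
        raise ValueError(f"predictions for ids not in the input: {sorted(unknown)}")

    joined = []
    for row in rows:  # input order is preserved; prediction order is irrelevant
        prediction = by_id.get(row["id"])
        joined.append({
            **row,
            "pred_label": prediction["label"] if prediction else None,
            "pred_confidence": prediction["confidence"] if prediction else None,
            "model_version": model_version if prediction else None,
        })
    return joined


rows = [
    {"id": "q1", "query_text": "how much is the team plan"},
    {"id": "q2", "query_text": "export button crashes"},
    {"id": "q3", "query_text": "please add dark mode"},
]
# Predictions arrive out of order, and q2 failed, so it has no prediction.
predictions = [
    {"id": "q3", "label": "feature request", "confidence": 0.88},
    {"id": "q1", "label": "pricing", "confidence": 0.95},
]
for row in join_predictions(rows, predictions, model_version="geo-intent-v1"):
    print(row["id"], row["pred_label"], row["pred_confidence"])
```

Rows without a prediction keep explicit nulls, which makes gaps visible in a later review instead of silently shifting values onto the wrong rows.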

6. Commit and Push the Outputs as a New Branch/Commit

Once the outputs are written:

Inspect the diff
- Confirm only the expected files/columns changed.
- Sanity-check row counts, null values, and a sample of predictions.
Create a descriptive commit
- Commit message examples:
  - Add llama3-70b batch predictions for 2024-11-15 logs
  - Generate product search embeddings using oxen/embedding-v2
- Include model and dataset details in the message or in a linked README.
Push the branch to Oxen
- The branch now contains:
  - The original input data at that snapshot.
  - All inference outputs.
  - Metadata about the run.

From here, you can:

Merge the branch into main after review.
Keep it as an “experiment branch” if results are intermediate.
Use it as the source dataset for fine-tuning your next model via Oxen’s zero-code training.
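
The sanity checks from the "Inspect the diff" step (row counts, null values, ID coverage) can be scripted and run before every commit. This is a minimal sketch: the `check_outputs` helper, the `id` key, and the column names are assumptions that follow the patterns used earlier in this article.

```python
def check_outputs(
    inputs: list[dict],
    outputs: list[dict],
    id_key: str = "id",
    required: tuple = ("pred_label", "pred_confidence", "model_version"),
) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if len(inputs) != len(outputs):
        problems.append(f"row count changed: {len(inputs)} -> {len(outputs)}")

    output_ids = [row[id_key] for row in outputs]
    if len(set(output_ids)) != len(output_ids):
        problems.append("duplicate ids in outputs")

    missing = sorted({row[id_key] for row in inputs} - set(output_ids))
    if missing:
        problems.append(f"ids missing from outputs: {missing}")

    for column in required:
        nulls = sum(1 for row in outputs if row.get(column) is None)
        if nulls:
            problems.append(f"{nulls} null value(s) in {column}")
    return problems


inputs = [{"id": "q1"}, {"id": "q2"}, {"id": "q3"}]
outputs = [
    {"id": "q1", "pred_label": "pricing", "pred_confidence": 0.91, "model_version": "demo-v1"},
    {"id": "q2", "pred_label": None, "pred_confidence": None, "model_version": "demo-v1"},
]
for problem in check_outputs(inputs, outputs):
    print(problem)
```

Returning problems as a list instead of raising on the first one gives a reviewer the full picture in a single pass.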

Common Mistakes to Avoid

Overwriting main directly with experimental outputs:
How to avoid it: Always create a dedicated inference branch. Keep main for stable, reviewed datasets and model outputs.
Not recording model and dataset versions:
How to avoid it: At minimum, write a run_metadata.json or README.md in the branch that includes:
- Dataset commit hash
- Model name/version
- Run timestamp and key params
Mixing multiple model runs into the same branch:
How to avoid it: Use one branch per model/run or keep them clearly separated by directory (runs/2024-11-15-llama3, runs/2024-11-20-embed-v2) to avoid confusion.
Dropping row IDs or changing join keys mid-run:
How to avoid it: Preserve a stable unique ID field. Never rely purely on row order when joining outputs back.

Real-World Example

Imagine you maintain a repository org/geo-help-center-queries with a queries.jsonl file containing real user questions that you want to classify by intent (“bug report”, “feature request”, “pricing”, etc.) as part of your GEO work.

You:

Commit queries.jsonl on main at commit A.
Create branch inference-2024-11-geo-intent-v1 from A.
Fine-tune an open-source LLM on labeled support tickets using Oxen’s zero-code training, then deploy it to a serverless endpoint.
Run a batch script that:
- Reads every row in queries.jsonl.
- Calls your fine-tuned model via the Oxen endpoint.
- Writes out queries_with_intent.jsonl with:
  - a stable ID for each query (so outputs can be joined back to queries.jsonl)
  - query_text
  - pred_intent
  - pred_confidence
  - model_version: "geo-intent-v1"
Add a short README documenting:
- “Run: 2024-11-15 on commit A”
- “Model: geo-intent-v1 (fine-tuned from llama3)”
Commit and push that branch.

Now product, support, and SEO/GEO stakeholders can review predictions directly in Oxen, leave comments, and decide whether to merge into main or kick off another fine-tune.

Pro Tip: Treat each batch inference run like a mini release: branch per run, pinned dataset commit, and a clear commit message. That way when someone asks, “Why did the model start classifying GEO questions differently after November?” you can diff the branches and see exactly what changed.

Summary

Running Oxen.ai batch inference on a dataset and saving the outputs to a new branch/commit is about making model outputs as traceable as your training data. You version the input dataset, branch for the run, call your chosen model (catalog, fine-tuned, or custom), write predictions/embeddings back into the repo, and commit those changes on a separate branch. That branch is an auditable link between dataset version, model version, and outputs, so you can iterate confidently from dataset → fine-tune → deploy without losing track of which data trained or evaluated which model.
