Oxen.ai quickstart: what are the exact CLI commands to upload a large dataset and push the first version?
AI Data Version Control

6 min read

Quick Answer: To upload a large dataset and push the first version with the Oxen CLI, you’ll: log in, init a repo, add your dataset, commit, and push. The core commands are oxen login, oxen init, oxen add, oxen commit -m "...", and oxen push pointing at your remote on Oxen.ai.

Why This Matters

If you can’t reliably version and push large datasets, your whole AI loop stalls. You end up with ad‑hoc S3 buckets, mystery ZIP files, and no clear answer to “which data trained which model?” Using Oxen’s CLI to treat datasets like code—versioned, reviewable, and reproducible—unlocks faster iteration from dataset → fine‑tune → deploy, without custom infra.

Key Benefits:

  • Version large datasets like code: Use oxen init, oxen add, and oxen commit to track massive, multi-modal datasets with Git‑style workflows that don’t fall over on big files.
  • Push once, share everywhere: oxen push sends your dataset and its version history to Oxen.ai so ML, data, product, and creative teams can review and collaborate.
  • Enable reproducible training: With every dataset version pushed and tagged, you can always answer which dataset version trained which model and roll back if needed.

Core Concepts & Key Points

| Concept | Definition | Why it's important |
| --- | --- | --- |
| Oxen repository | A version-controlled project managed by the Oxen CLI, containing datasets, model weights, and metadata. | Gives you Git‑like history for large AI assets so you can track exactly what changed and when. |
| Add & commit | The workflow of staging files with oxen add and recording a snapshot with oxen commit. | Turns a loose folder of data into a reproducible dataset version you can reference in experiments and training runs. |
| Remote push | Sending local commits to a remote repository on Oxen.ai with oxen push <remote> <branch>. | Centralizes your dataset, enables collaboration, and connects data versions to fine‑tuning and deployment in the Oxen platform. |

How It Works (Step-by-Step)

Below is a minimal, end‑to‑end quickstart for uploading a large dataset and pushing the first version. This assumes you’ve already created an Oxen.ai account.

1. Install and log in to the Oxen CLI

Install the CLI (check the docs for your OS if needed), then authenticate:

oxen login

You’ll be prompted to authenticate with your Oxen.ai account. This connects your local CLI to your Oxen user so you can push to your remote repos.

2. Initialize a new Oxen repository

In your project or dataset directory:

cd /path/to/your-dataset-folder

# Initialize a new Oxen repo in this directory
oxen init

This creates the Oxen metadata needed to track versions (similar to git init, but built for large datasets and model artifacts).

If you haven’t already created a repository on Oxen.ai via the UI, do that now (give it a name like my-large-dataset). The UI will show you the remote URL for the repo—something like:

oxen://<your-username>/my-large-dataset

Add that as a remote:

oxen remote add origin oxen://<your-username>/my-large-dataset

3. Add your large dataset files

Put your dataset under the repo directory (if it isn’t already). For example:

# Example: large image dataset
ls
# images/  labels.csv  ...

# Stage everything in the current directory
oxen add .

Or, to be more explicit:

# Add specific paths
oxen add images/
oxen add labels.csv

oxen add is optimized for large files and multi‑modal datasets, so you can skip the usual dilemma of “syncing to S3 will be slow, but zipping it will take forever.”
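Before staging a very large folder, it can help to sanity-check what you are about to version. Here is a minimal sketch using plain POSIX tools; it assumes Oxen keeps its metadata in a .oxen/ directory (analogous to Git's .git/), which you would want to exclude from the count:

```shell
# Count the files and measure the size you are about to stage,
# excluding Oxen's own metadata directory (assumed to be .oxen/).
find . -type f -not -path './.oxen/*' | wc -l   # number of files to track
du -sh .                                        # total size on disk
```

If the count or size looks wrong (for example, it includes far more than your dataset), fix your working directory before running oxen add .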

4. Commit your first dataset version

Once the files are added, create the first commit:

oxen commit -m "Initial dataset import"

This snapshots the current state of your dataset. You can later diff versions, roll back, or branch, just as you would with code—but the engine is tuned for big assets.

5. Push to Oxen.ai (first version)

Finally, push your main branch to the remote repo on Oxen.ai:

# Push the current branch (commonly 'main') to the 'origin' remote
oxen push origin main

The CLI uploads your dataset and commit history to Oxen.ai. For large datasets, the first push can take a while, but subsequent pushes transfer only the changes.

Once the push completes, you can:

  • View the dataset and its history in the Oxen UI
  • Share the repo with collaborators
  • Use this dataset version for fine‑tuning custom models, then deploy them to serverless endpoints in a few clicks

Common Mistakes to Avoid

  • Pushing from the wrong directory:
    It’s easy to run oxen init in a parent folder and then push things you didn’t intend. Double‑check pwd and ls so the repo only includes your dataset and related metadata, not your whole home directory.

  • Skipping the remote setup step:
    If you forget oxen remote add origin oxen://<your-username>/my-large-dataset, oxen push origin main will fail or push to the wrong place. Always confirm your remote URL matches what you see in the Oxen.ai UI.
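The first pitfall above can be caught with a small shell guard before you ever run oxen init. This is a sketch, not an Oxen feature; images/ and labels.csv are example names from this guide:

```shell
# Only proceed with `oxen init` if the expected dataset files are
# present in the current directory (example names from this guide).
if [ -d images ] && [ -f labels.csv ]; then
  echo "Dataset folder confirmed: $(pwd)"
else
  echo "Unexpected contents in $(pwd); check before running oxen init" >&2
fi
```

Swap in whatever files or folders define your own dataset root.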

Real-World Example

Say you’ve got a 300 GB image classification dataset spread across messy folders: raw/, train/, val/, plus a labels.csv. You want your team to review it and then fine‑tune a vision model.

Your exact terminal workflow might look like this:

# 1. Log in
oxen login

# 2. Go to your dataset folder
cd /data/image-project

# 3. Initialize the Oxen repo
oxen init

# 4. Add the remote you created in the Oxen UI
oxen remote add origin oxen://team/image-classification-dataset

# 5. Add all relevant data
oxen add train/
oxen add val/
oxen add labels.csv

# 6. Commit the first dataset version
oxen commit -m "Initial labeled image dataset v1.0"

# 7. Push the first version
oxen push origin main

After this push, your product and creative stakeholders can browse the dataset in Oxen, annotate issues, and sign off on the exact version you’ll use to fine‑tune a model. From there, you can move straight into zero‑code fine‑tuning and deploy the resulting model to a serverless endpoint—without spinning up or maintaining infra.

Pro Tip: Treat every meaningful dataset change (new labels, filtered data, added sources) as a new commit with a clear message—e.g., "Filtered low-res images", "Added new category: outdoor_scenes". That discipline is what makes it trivial later to answer “which dataset version improved our model?” or to roll back after a bad change.
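One lightweight way to enforce that discipline is a pre-commit helper that rejects vague messages. This is purely illustrative—check_msg and its keyword list are hypothetical, not part of the Oxen CLI:

```shell
# Hypothetical helper: flag vague dataset commit messages before you
# run `oxen commit`. The rejected keywords are just examples.
check_msg() {
  case "$1" in
    ""|update|wip|fix)
      echo "Too vague: describe the dataset change" >&2
      return 1
      ;;
    *)
      echo "ok: oxen commit -m \"$1\""
      ;;
  esac
}

check_msg "Filtered low-res images"
check_msg "wip" || echo "rejected as expected"
```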

Summary

To upload a large dataset and push the first version with Oxen.ai, you:

  1. Log in with oxen login.
  2. Initialize a repo with oxen init and point it at your remote with oxen remote add origin ....
  3. Stage your dataset files with oxen add.
  4. Commit the snapshot with oxen commit -m "...".
  5. Push with oxen push origin main.

Those few commands are the foundation for reproducible, dataset‑first AI workflows—version every asset, collaborate at scale, and move cleanly from dataset → fine‑tune → deploy.

Next Step

Get Started