Oxen.ai quickstart: what are the exact CLI commands to upload a large dataset and push the first version?

Quick Answer: To upload a large dataset and push the first version with the Oxen CLI, you'll authenticate, create a repo, initialize it locally, add your dataset files, commit them, point the local repo at the remote, and push to Oxen. The core commands are oxen config --auth, oxen create-remote (or the web UI), oxen init, oxen add, oxen commit, oxen config --set-remote, and oxen push.

Most teams don’t struggle to train a model—they struggle to track which exact dataset trained which exact model. A clean Oxen.ai quickstart for your first large dataset upload solves that immediately: you get a versioned repository, reproducible history, and a single source of truth for everyone touching the data.

Key Benefits:

  • Version Every Asset: Treat large datasets like code, with commit history, diffs, and reproducible versions.
  • Handle Large Files at Scale: Skip fragile “zip-then-sync-to-S3” workflows and let Oxen handle big artifacts directly.
  • Unblock Collaboration: Give ML, product, and creative teams a shared, reviewable dataset they can trust.

Core Concepts & Key Points

Concept | Definition | Why it's important
--- | --- | ---
Oxen Repository | A version-controlled store for datasets, model weights, and other large AI artifacts. | Keeps all assets for a project in one place with full history, so "which data trained which model?" becomes answerable.
Commit & Version | A snapshot of your repository at a point in time, created with oxen commit. | Enables rollbacks, audits, and reproducible training runs mapped to specific data states.
Push to Remote | Uploading local commits to Oxen's servers with oxen push. | Makes your dataset shareable, backed up, and accessible for fine-tuning and deployment workflows.

How It Works (Step-by-Step)

Below is a practical Oxen.ai quickstart for a typical workflow: you have a local folder with a large dataset and want to upload and version it with the CLI.

Assumptions:

  • You already created an Oxen.ai account at https://www.oxen.ai.
  • You have the Oxen CLI installed (see the installation instructions in the Oxen docs if you need details).
  • You’re in a shell (macOS, Linux, or WSL) with your dataset directory ready, e.g. ~/data/my_large_dataset.

1. Authenticate with Oxen

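First, give the CLI your identity and an API key so it can talk to your Oxen account. Copy your API key from your account settings on Oxen.ai, then run:

oxen config --name "YOUR_NAME" --email "you@example.com"
oxen config --auth hub.oxen.ai YOUR_AUTH_TOKEN

The name and email are attached to your commits, and the token ties the CLI to your Oxen user so you can create and push repos. This is a one-time setup per machine.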
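Token-based setup with oxen config --auth is easy to script. To keep the token out of your shell history, you could wrap it in a small function that reads everything from environment variables. This is a minimal sketch, not part of the Oxen CLI: oxen_configure, OXEN_USER_NAME, OXEN_USER_EMAIL, OXEN_API_TOKEN, and OXEN_HOST are hypothetical names, and it assumes the oxen config --name/--email and --auth HOST TOKEN forms.

```bash
#!/usr/bin/env bash
# Hypothetical helper (name and env vars are our own): configure the Oxen CLI
# from environment variables so the API token never appears in shell history.
# Assumes the `oxen config --name/--email` and `--auth HOST TOKEN` forms;
# confirm with `oxen config --help` on your installed version.
oxen_configure() {
  local host="${OXEN_HOST:-hub.oxen.ai}" var
  # Stop before calling the CLI if any required variable is empty or unset.
  for var in OXEN_USER_NAME OXEN_USER_EMAIL OXEN_API_TOKEN; do
    if [[ -z "${!var:-}" ]]; then
      echo "oxen_configure: $var is not set" >&2
      return 1
    fi
  done
  oxen config --name "$OXEN_USER_NAME" --email "$OXEN_USER_EMAIL" &&
    oxen config --auth "$host" "$OXEN_API_TOKEN"
}
```

For example, read -rs OXEN_API_TOKEN prompts for the token without echoing it, after which oxen_configure runs both config calls.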

2. Create a remote repository on Oxen

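Create a new repo on Oxen to hold this dataset. The simplest route is the web UI: create a new repository on Oxen.ai named my-large-dataset. You can also do it from the terminal (replace YOUR_USERNAME and my-large-dataset with your own values, and run oxen create-remote --help to check the flags your CLI version supports):

oxen create-remote --name YOUR_USERNAME/my-large-dataset --host hub.oxen.ai

Either way, you end up with a remote repository on Oxen.ai where your data, versions, and history will live.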

3. Initialize a local Oxen repository in your dataset folder

Move into your dataset directory and initialize Oxen tracking.

cd /path/to/my_large_dataset

# Initialize Oxen repository
oxen init

This creates a hidden .oxen directory holding the repository metadata (similar to git init creating .git), so the CLI can track your files and versions.
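Initializing in the wrong folder is cheap to guard against. This hypothetical safe_oxen_init function (our own, not an Oxen command) resolves the target path, refuses the filesystem root, your home directory, and folders that already contain a .oxen directory, lists the top-level entries, and only then runs oxen init.

```bash
#!/usr/bin/env bash
# Hypothetical guard for step 3 (the function name is our own): refuse to run
# `oxen init` in directories that are almost certainly wrong, and list what
# the new repository would cover so you can confirm it first.
safe_oxen_init() {
  local dir home
  dir="$(cd "${1:-.}" && pwd -P)" || return 1   # absolute, symlink-free path
  home="$(cd "${HOME:-/}" && pwd -P)"
  if [[ "$dir" == "/" || "$dir" == "$home" ]]; then
    echo "safe_oxen_init: refusing to init in $dir; cd into the dataset folder" >&2
    return 1
  fi
  if [[ -d "$dir/.oxen" ]]; then                # an existing repo keeps metadata in .oxen
    echo "safe_oxen_init: $dir is already an Oxen repository" >&2
    return 1
  fi
  echo "About to track $dir, which contains:"
  ls -A "$dir"
  (cd "$dir" && oxen init)                      # subshell: caller's cwd is unchanged
}
```

Usage: safe_oxen_init ~/data/my_large_dataset.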

4. Add your large dataset files

Add all files in the current folder:

oxen add .

Or, if you want to be explicit (for example, only the images and labels folders):

oxen add images/ labels/

oxen add stages these files for the next commit. Oxen tracks each file individually and is built to handle large files, so there is no manual zipping and no hand-rolled S3 sync script.
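Before staging hundreds of thousands of files, it helps to know what the first commit should contain. A minimal sketch using only find, du, and wc (no Oxen calls; dataset_inventory is a hypothetical name) prints one line per top-level folder with its file count and size in KiB:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for step 4 (our own helper, plain find/du/wc,
# no Oxen calls): print file count and size per top-level folder so you know
# what the first commit should contain before you stage anything.
dataset_inventory() {
  local root="${1:-.}" entry files kib
  for entry in "$root"/*/; do                         # hidden folders like .oxen are skipped
    [[ -d "$entry" ]] || continue                     # glob matched nothing
    files=$(find "$entry" -type f | wc -l | tr -d ' ')
    kib=$(du -sk "$entry" | cut -f1)                  # disk usage in KiB
    printf '%s\t%s files\t%s KiB\n' "$(basename "$entry")" "$files" "$kib"
  done
}
```

Running dataset_inventory . before oxen add gives you numbers to compare against what the Oxen UI shows after the push.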

5. Commit the first dataset version

Now record the snapshot of your dataset as the first commit:

oxen commit -m "Initial dataset import – v1.0"

This creates a versioned state of your dataset. If someone asks “what did v1.0 contain?”, this commit is the answer.
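To make the commit self-describing, you could generate the message from the data itself. This sketch (the commit_dataset_version name and message format are our own) counts regular files under the given paths, skipping .oxen, and passes a message of the form "Dataset VERSION: N files (YYYY-MM-DD)" to oxen commit -m:

```bash
#!/usr/bin/env bash
# Hypothetical helper for step 5 (our own naming): commit with a message that
# records the version label, the number of files, and the UTC date, so the
# history itself answers "what did v1.0 contain?".
commit_dataset_version() {
  local version="$1" count
  shift
  (( $# )) || set -- .                               # default to the whole folder
  # Count regular files under the given paths, skipping Oxen's own metadata.
  count=$(find "$@" -path '*/.oxen' -prune -o -type f -print | wc -l | tr -d ' ')
  oxen commit -m "Dataset ${version}: ${count} files ($(date -u +%Y-%m-%d))"
}
```

Usage: commit_dataset_version v1.0 images/ labels/.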

6. Set the remote and push the first version

oxen init does not configure a remote, so point the local repo at the repository you created in step 2:

oxen config --set-remote origin https://hub.oxen.ai/YOUR_USERNAME/my-large-dataset

Then push your committed data to Oxen.ai:

oxen push origin main

Adjust the branch name if you’re using something other than main. The CLI will upload your large dataset and associate it with the commit you just created.
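Pushing a large dataset can take a long time, and a dropped connection should not mean redoing the steps by hand. The following is a minimal sketch, assuming the oxen config --set-remote origin URL form, the https://hub.oxen.ai/NAMESPACE/REPO remote URL pattern, and that re-running oxen push after a failure is safe; the oxen_push_with_retry name, the OXEN_RETRY_DELAY variable, the retry count, and the backoff policy are our own choices.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for step 6 (name and retry policy are our own): set the
# hub remote, then push, retrying with exponential backoff if a long upload is
# interrupted. Assumes `oxen config --set-remote` and `oxen push REMOTE BRANCH`,
# and that re-running a push after a failure is safe.
oxen_push_with_retry() {
  local repo="$1" branch="${2:-main}" max_attempts="${3:-5}"
  local url="https://hub.oxen.ai/${repo}"
  local attempt=1 delay="${OXEN_RETRY_DELAY:-2}"   # seconds before first retry
  oxen config --set-remote origin "$url" || return 1
  until oxen push origin "$branch"; do
    if (( attempt >= max_attempts )); then
      echo "oxen_push_with_retry: giving up after $attempt attempts" >&2
      return 1
    fi
    echo "push attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
    delay=$(( delay * 2 ))                         # exponential backoff
  done
}
```

Usage: oxen_push_with_retry YOUR_USERNAME/my-large-dataset main. Doubling the wait (2, 4, 8 seconds by default) avoids hammering the server during an outage while still recovering quickly from a brief network drop.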

7. Verify the dataset in the Oxen UI

Once the push completes:

  1. Go to https://www.oxen.ai/YOUR_USERNAME/my-large-dataset
  2. Browse files, preview samples, and verify the commit message and version history.
  3. Invite collaborators as needed so they can review and edit data with you.

Now you’ve completed the Oxen.ai quickstart loop: local dataset → versioned commit → remote repo on Oxen.

Common Mistakes to Avoid

  • Skipping the commit step:
    oxen push uploads commits, not staged files. Anything you have only run oxen add on stays local until you run oxen commit -m "message", so commit first and check oxen status before pushing.

  • Initializing in the wrong directory:
    Running oxen init one level above your dataset can accidentally pull in unrelated files. cd into the exact dataset folder you want to track, confirm with pwd and ls, then run oxen init.

  • Using ad-hoc zips instead of Oxen:
    Old habits die hard—don’t zip your dataset and store it as a single blob. Let Oxen version individual files and directories so you can diff changes and selectively update subsets.
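A quick way to catch leftover archives before you stage anything is a find over the dataset folder. This helper (find_archives, our own name) prints every .zip, .tar, .tar.gz, .tgz, or .7z file and skips Oxen's .oxen metadata folder; the extension list is an assumption you can extend.

```bash
#!/usr/bin/env bash
# Hypothetical check for the "ad-hoc zips" mistake (our own helper, plain
# find): list archive files under the dataset so you can unpack them into
# individual files before `oxen add`. Oxen's .oxen metadata folder is skipped.
find_archives() {
  local root="${1:-.}"
  find "$root" -path "$root/.oxen" -prune -o -type f \
    \( -name '*.zip' -o -name '*.tar' -o -name '*.tar.gz' -o -name '*.tgz' -o -name '*.7z' \) \
    -print
}
```

For example: if [ -n "$(find_archives .)" ]; then echo "Unpack these archives before oxen add"; fi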

Real-World Example

Say you’re building a vision model for product images. You’ve got:

  • images/ with 500,000 JPEGs
  • annotations/ with JSON or CSV labels
  • A model team and a creative team both needing to review the dataset

You run:

cd ~/datasets/product_images
oxen config --auth hub.oxen.ai YOUR_AUTH_TOKEN
oxen create-remote --name YOUR_USERNAME/product-images --host hub.oxen.ai
oxen init
oxen add images/ annotations/
oxen commit -m "Initial product image dataset – 500k images"
oxen config --set-remote origin https://hub.oxen.ai/YOUR_USERNAME/product-images
oxen push origin main

Now everyone can open the Oxen repo, inspect samples, propose label fixes, and you can later fine-tune a model directly against this exact versioned dataset—no mystery about which “final_final_v3.zip” was used.

Pro Tip: After pushing v1, create smaller “experiment” subsets as new commits (e.g., filtered by quality, geography, or category) in branches. That way you can benchmark model performance across data variants without duplicating raw assets all over S3.
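One way to follow this tip without copying images is to commit only a small list of selected rows on a new branch. This is a hypothetical sketch: the make_subset_branch name, the annotations/labels.csv layout (header row, category in column 2, no quoted commas), and the splits/ folder are assumptions, as is git-style branch creation with oxen checkout -b; check oxen checkout --help for your version.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the Pro Tip (function name, file layout, and CSV
# format are assumptions): create an experiment branch whose only new file is
# a list of selected rows, so raw images are referenced, not copied.
# Assumes annotations/labels.csv has a header row and the category in column 2,
# with no quoted commas, and that `oxen checkout -b` creates a branch as in git.
make_subset_branch() {
  local branch="$1" category="$2"
  local src="annotations/labels.csv" out="splits/${branch##*/}.csv"
  oxen checkout -b "$branch" || return 1
  mkdir -p splits
  # Keep the header plus every row whose second field equals the category.
  awk -F, -v want="$category" 'NR == 1 || $2 == want' "$src" > "$out"
  local rows=$(( $(wc -l < "$out") - 1 ))          # data rows, header excluded
  oxen add "$out" &&
    oxen commit -m "Subset: category=${category} (${rows} rows)" &&
    oxen push origin "$branch"
}
```

Usage: make_subset_branch exp-shoes shoes leaves main untouched and pushes a branch whose only addition is splits/exp-shoes.csv, which a training job can read to select images from the shared raw data.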

Summary

For an Oxen.ai quickstart with a large dataset, the core CLI flow is:

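oxen config --name "YOUR_NAME" --email "you@example.com"
oxen config --auth hub.oxen.ai YOUR_AUTH_TOKEN
oxen create-remote --name YOUR_USERNAME/my-large-dataset --host hub.oxen.ai   # or create the repo in the web UI
cd /path/to/my_large_dataset
oxen init
oxen add .
oxen commit -m "Initial dataset import – v1.0"
oxen config --set-remote origin https://hub.oxen.ai/YOUR_USERNAME/my-large-dataset
oxen push origin main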

You end up with a versioned dataset repository you can query, explore, and feed directly into fine-tuning and deployment workflows—without custom infra or homegrown S3 scripts.
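The command flow above can be wrapped in a script that you review in dry-run mode before starting a long upload. This sketch covers init through push only (authentication and repo creation are one-time steps), assumes the oxen config --set-remote form and the hub.oxen.ai host, and uses hypothetical run and first_push helpers; with DRY_RUN=1 it prints each command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical end-to-end script for init through push (auth and repo creation
# are one-time steps done beforehand). With DRY_RUN=1 it prints each command
# instead of running it. Assumes `oxen config --set-remote` and hub.oxen.ai.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

first_push() {
  local repo="$1" dir="$2" branch="${3:-main}"
  local msg="${4:-Initial dataset import (v1.0)}"
  # Subshell keeps the caller's working directory; && stops at the first failure.
  (
    cd "$dir" &&
      run oxen init &&
      run oxen add . &&
      run oxen commit -m "$msg" &&
      run oxen config --set-remote origin "https://hub.oxen.ai/${repo}" &&
      run oxen push origin "$branch"
  )
}
```

Usage: DRY_RUN=1 first_push YOUR_USERNAME/my-large-dataset ~/data/my_large_dataset prints the five commands; rerun without DRY_RUN once they look right.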
