How do I set up a persistent GPU Workspace in VESSL AI with Jupyter + SSH access?


For most teams, the ideal GPU dev environment is simple: one persistent workspace where Jupyter, your tools, and SSH just work—without babysitting clusters. On VESSL AI, you can get there in minutes and keep it stable across runs.

This guide walks through how to set up a persistent GPU Workspace in VESSL AI with both Jupyter and SSH access, plus how to avoid common pitfalls like losing data when a session ends.


Quick Answer: The best overall choice for a persistent GPU workspace with Jupyter + SSH is a VESSL Workspace on On‑Demand GPUs with attached Cluster Storage.
If your priority is minimizing cost during experimentation, Spot GPUs are often a stronger fit.
For critical, always-on research or services, consider Reserved GPUs with a capacity guarantee.


At-a-Glance Comparison

| Rank | Option | Best For | Primary Strength | Watch Out For |
| --- | --- | --- | --- | --- |
| 1 | Workspace + On-Demand GPUs | Daily Jupyter + SSH dev | Reliable, auto-failover-backed capacity | Higher hourly cost vs Spot |
| 2 | Workspace + Spot GPUs | Batch experiments, non-critical dev | Lowest cost, easy to spin up | Can be preempted; reconnect needed |
| 3 | Workspace + Reserved GPUs | Long-running, mission-critical workspaces | Guaranteed capacity, discounts up to ~40% | Requires term commitment and planning |

Comparison Criteria

We evaluated each option against the realities of running a persistent GPU dev environment on VESSL AI:

  • Reliability for Interactive Work:
    Can your Jupyter + SSH session stay up long enough for real work? How does it behave during provider issues or preemptions?

  • Cost vs. Uptime Balance:
    Does the choice make sense for how often you’re in the workspace—daily, occasionally, or 24/7?

  • Persistence of Data and Tools:
    Do your code, environments, and datasets survive session restarts, GPU changes, and scaling?


Detailed Breakdown

1. Workspace + On-Demand GPUs (Best overall for reliable daily dev)

Workspace + On-Demand ranks as the top choice because it balances interactive reliability with the ability to scale up or down GPUs while keeping your Jupyter + SSH workflow stable.

What it does well:

  • Reliable capacity with automatic failover:
    On-Demand capacity is backed by VESSL’s reliability primitives like Auto Failover. If a provider or region has issues, your workloads can fail over without you manually re-provisioning everything. For a Jupyter-heavy workflow, this reduces the risk of mid-session outages.

  • Smooth interactive experience (Jupyter + SSH):
    You run a Workspace with:

    • Jupyter Lab or Notebook in the browser.
    • SSH access directly into the same container.
    • GPU access to A100, H100, H200, B200, GB200, B300, and more, depending on the SKU you choose.

    It feels like a long-lived dev box instead of a fragile job.

  • Persistence through storage, not containers:
    VESSL Workspaces are designed to be ephemeral compute with persistent storage:

    • Attach Cluster Storage (shared, high-performance file system) for code and active datasets.
    • Optionally add Object Storage for cheaper, large datasets and artifacts.

    Your environment survives across Workspace restarts because your files live outside the container.

Tradeoffs & Limitations:

  • Higher hourly cost than Spot:
    You pay more than Spot GPUs, but you avoid surprise preemptions. For intensive daily dev, the time saved from not re-attaching and restarting everything often pays for itself.

Decision Trigger:
Choose Workspace + On-Demand GPUs if you want a stable, persistent-feeling dev environment and prioritize reliability and low “job wrangling” overhead over the lowest possible hourly price.


2. Workspace + Spot GPUs (Best for cost-sensitive experimentation)

Workspace + Spot is the strongest fit if you’re running budget-conscious experiments and you’re comfortable reconnecting Jupyter/SSH when capacity is preempted.

What it does well:

  • Lowest-cost interactive GPUs:
    Spot uses preemptible, excess capacity across providers. This is ideal if:

    • You run a lot of temporary experiments from Jupyter.
    • You don’t mind occasionally losing the running session.
    • Your real persistence comes from storage, not long-lived processes.
  • Same interface, cheaper GPU:
    You get:

    • The same Workspace UX.
    • Jupyter and SSH access.
    • The same A100/H100/H200/B200/GB200/B300-class GPUs (depending on availability and selections).

    The difference is reliability vs. cost, not features.

Tradeoffs & Limitations:

  • Preemptions and restarts:

    • The GPU can be reclaimed by the provider at any time.
    • Your Workspace session will end; you’ll need to restart and reconnect.
    • Any in-container state not stored on Cluster Storage or Object Storage is lost.

    If you don’t build around persistent storage, this will feel painful.

Decision Trigger:
Choose Workspace + Spot GPUs if you want the lowest cost per hour for interactive work and you’re disciplined about:

  • Saving code and checkpoints to persistent storage.
  • Treating the Workspace as disposable compute, not a pet server.
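
The "disposable compute, persistent storage" discipline is easy to sketch in a training loop: write checkpoints atomically to the persistent mount so a preemption mid-write can never corrupt the last good copy. This is a minimal illustration, not a VESSL API; the /workspace path and JSON state format are assumptions carried over from this guide's examples.

```python
import json
import os
from pathlib import Path

def save_checkpoint(state, ckpt_dir, name="latest.json"):
    """Atomically persist training state to the mounted storage path.

    Writes to a temp file first, then renames: if Spot preempts the
    Workspace mid-write, you keep either the previous checkpoint or the
    new one, never a torn file.
    """
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    tmp = ckpt_dir / (name + ".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, ckpt_dir / name)  # atomic rename on POSIX filesystems
    return ckpt_dir / name

# e.g. after every epoch:
# save_checkpoint({"epoch": 3, "loss": 0.41}, "/workspace/ckpts")
```

On restart, your first cell reads the latest checkpoint back from the same mount and resumes, which is what makes Spot tolerable for real work.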

3. Workspace + Reserved GPUs (Best for mission-critical, always-on dev)

Workspace + Reserved stands out when you need guaranteed capacity and your Workspace is closer to a critical, always-on lab or service than a casual dev box.

What it does well:

  • Guaranteed capacity with dedicated support:
    Reserved gives you:

    • Capacity guarantees on chosen GPUs (e.g., specific A100/H100/B200/GB200 SKUs).
    • Discounts up to ~40% with commitment.
    • Priority and dedicated support from VESSL when you need help.
  • Clear planning for long-term teams:
    If your org:

    • Has multiple researchers sharing persistent Jupyter Workspaces.
    • Runs long-running interactive workflows (e.g., labs, production analysis).
    • Needs predictable capacity in specific regions/providers.

    Reserved fits that capacity planning story better than “just hope Spot or On-Demand is free.”

Tradeoffs & Limitations:

  • Commitment and planning required:
    • Typically involves term commitments (e.g., 3+ months).
    • Makes the most sense if your workspace/GPU utilization is consistently high.
    • Not ideal for very bursty, short-lived experimentation.

Decision Trigger:
Choose Workspace + Reserved GPUs if you want always-on, guaranteed Workspace capacity and prioritize predictable availability and cost efficiency over flexibility.


How to Set Up a Persistent GPU Workspace with Jupyter + SSH

Below is the practical, step-by-step flow you’d use on VESSL AI. Exact UI labels may evolve, but the structure and intent are stable.

Step 1: Decide your reliability tier and GPU SKU

Start with your workload reality:

  • Light or occasional dev: Spot (if interruptions are okay).
  • Daily, interactive dev: On-Demand (recommended).
  • Always-on, shared lab or production analysis: Reserved.

For the GPU itself:

  • A100: Solid baseline for many LLMs and vision models.
  • H100/H200: Strong choice for larger LLM post-training and mixed workloads.
  • B200/GB200/B300: Next-gen and multi-GPU-heavy workloads, especially for scale-critical experiments.

Pick the smallest GPU that reliably fits your model and notebook workflow. You can scale up later.
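
As a rough sanity check before picking a SKU, you can estimate whether a model even fits in VRAM. The sketch below is a back-of-the-envelope heuristic, not VESSL tooling: the bytes-per-parameter and overhead figures are assumptions you should tune for your precision, optimizer, and batch size (training with optimizer state needs far more than this inference-style estimate).

```python
def fits_on_gpu(n_params, gpu_vram_gb, bytes_per_param=2, overhead=1.5):
    """Back-of-the-envelope check: does a model plausibly fit in GPU memory?

    bytes_per_param: 2 for fp16/bf16 weights, 4 for fp32 (assumption).
    overhead: multiplier for activations, KV cache, CUDA context (assumption).
    """
    needed_gb = n_params * bytes_per_param * overhead / 1024**3
    return needed_gb <= gpu_vram_gb, round(needed_gb, 1)

# A 7B-parameter model in bf16 against an 80 GB A100-class card:
ok, needed = fits_on_gpu(7e9, 80)  # → (True, 19.6)
```

If the estimate lands near the limit, step up one SKU rather than fighting out-of-memory errors in the notebook.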

Step 2: Create a Workspace

  1. Open the Web Console
    Log in to VESSL Cloud via the Web Console.

  2. Navigate to Workspaces

    • In the left-hand navigation, select Workspaces.
    • Click Create Workspace (or similar).
  3. Choose a template or base image

    • Select an image with Jupyter pre-installed if available (e.g., a standard Python/ML image).
    • If you prefer to manage your own environment:
      • Choose a base CUDA/PyTorch image.
      • Plan to install your tools via a bootstrap script or a configuration file.
  4. Assign GPU and reliability tier

    • Select Spot, On-Demand, or Reserved.
    • Choose your GPU type (A100/H100/H200/B200/GB200/B300).
    • Configure number of GPUs (start with 1 unless you know you need more).

Step 3: Attach persistent storage

This is where “persistent workspace” really happens. The container is ephemeral; your storage is not.

  1. Attach Cluster Storage (recommended)

    • Under Storage (or a similar section), attach a Cluster Storage volume:
      • Name it something like workspace-home or research-lab.
      • Mount it to a path like /workspace or /home/vessl.
    • Use this for:
      • Source code.
      • Virtual environments or conda/env directories.
      • Checkpoints and intermediate artifacts.
      • Notebook files.
  2. Optionally attach Object Storage

    • For large datasets or long-term artifacts, attach Object Storage:
      • Useful when you want lower-cost storage than Cluster Storage.
      • Mount or access via a client in your code.
  3. Set workspace directory defaults

    • Configure your Jupyter root directory or working directory to point into the attached storage path. This ensures that:
      • New notebooks are created on persistent storage.
      • Git repos live in persistent storage.
      • Your SSH sessions drop you into that directory.
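
Concretely, the Jupyter side of this step can be pinned with a standard Jupyter Server config file. The /workspace path is the assumed mount from above, and the file location is an assumption about your image (typically ~/.jupyter/jupyter_server_config.py):

```python
# ~/.jupyter/jupyter_server_config.py
c = get_config()  # noqa -- injected by Jupyter Server at startup
c.ServerApp.root_dir = "/workspace"  # file tree and new notebooks on the persistent mount
```

Keeping this file itself on the persistent mount (and symlinking or copying it in a bootstrap script) means the setting survives Workspace recreation.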

Step 4: Enable Jupyter access

Most VESSL Workspace images designed for dev already expose a Jupyter interface.

  1. Check Jupyter configuration

    • Confirm the Workspace template:
      • Starts Jupyter Lab/Notebook on container launch, or
      • Provides a simple “Start Jupyter” action in the UI.
  2. Launch the Workspace

    • Click Start (or Run) to provision the Workspace.
    • Wait until the Workspace status is Running.
  3. Open Jupyter

    • In the Workspace detail view, click Open Jupyter or Open in Browser.
    • You’ll see:
      • Your file tree rooted at the persistent storage path you configured.
      • The GPU visible in notebooks (e.g., torch.cuda.is_available() should return True).
  4. Save your default environment configuration

    • Install critical packages using pip or conda into:
      • A path on Cluster Storage, or
      • A dedicated environment managed via a script.
    • Optionally, commit a bootstrap script (e.g., setup_env.sh) stored on Cluster Storage so re-creating the Workspace is trivial.
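
A quick sanity cell for this step, runnable in the first notebook you open. It is guarded so it degrades gracefully on images without PyTorch (the package itself is an assumption about your environment):

```python
def gpu_status():
    """Report whether a CUDA device is visible to PyTorch, if installed."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    return f"{torch.cuda.device_count()} x {torch.cuda.get_device_name(0)}"

print(gpu_status())
```

On a healthy GPU Workspace this prints something like "1 x NVIDIA A100-SXM4-80GB"; anything else means the image or GPU assignment needs attention before you start real work.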

Step 5: Enable SSH access

SSH lets you drop into the same container and storage backing Jupyter.

  1. Locate SSH connection details

    • In the Workspace detail page, find SSH or Connect via SSH.
    • You’ll see either:
      • A one-click “Launch terminal” in-browser, and/or
      • An SSH endpoint and port with a command template, like:
        ssh -p <PORT> <USER>@<HOST>
        
  2. Generate or upload SSH keys

    • If the platform uses SSH keys:
      • Add your public key in the account/profile section.
      • Or follow the Workspace’s SSH key setup instructions.
    • If it uses ephemeral tokens:
      • Copy the suggested command from the UI; it will embed a token/identity.
  3. Connect and verify persistence

    • Once connected via SSH:
      • cd into the mount path you configured (e.g., /workspace).
      • Confirm your code, notebooks, and environment files are present.
    • Align your SSH working directory with your Jupyter root so behaviors match.
  4. Automate quality-of-life defaults

    • Add aliases in ~/.bashrc on the persistent storage mount:
      • cd /workspace on login.
    • Optionally install editor tooling (e.g., vim, nano) and debugging tools.
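
To make reconnecting repeatable, you can pin the endpoint in ~/.ssh/config so that after a restart the command is just `ssh vessl-ws`. The host alias and key path are arbitrary choices; `<HOST>`, `<PORT>`, and `<USER>` are the placeholders from the UI's command template:

```
# ~/.ssh/config
Host vessl-ws
    HostName <HOST>
    Port <PORT>
    User <USER>
    IdentityFile ~/.ssh/id_ed25519
    ServerAliveInterval 30    # keep long sessions from idling out
```

Note that a restarted Workspace may come back with a new port or host key, so expect to update this entry (or accept the new key) after restarts.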

Step 6: Make the Workspace “persistent” in practice

Even with stable GPUs, treat the Workspace like a reproducible configuration, not a snowflake.

  • Persist everything you care about

    • Code, notebooks, scripts → Cluster Storage.
    • Virtual envs, conda env configs → stored on Cluster Storage or defined via environment.yml.
    • Data and artifacts → Cluster Storage or Object Storage.
  • Use Git for environment and code

    • Keep your repository in a folder on Cluster Storage.
    • Commit:
      • requirements.txt or environment.yml.
      • Setup scripts for new team members to recreate the same environment.
  • Plan for restart behavior

    • Expect that:
      • Spot can preempt.
      • You may want to occasionally change GPU type or capacity.
    • Your “persistent Workspace” is the combination of:
      • Storage + Repo + Environment config.
      • A reproducible Workspace spec (image + GPU + mounts).
  • Leverage Auto Failover with On-Demand and Reserved

    • For On-Demand and Reserved, VESSL’s reliability primitives (like Auto Failover and Multi-Cluster) help keep workloads alive even during provider issues.
    • For interactive Workspaces, this means fewer surprises when one provider or region blips.
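
One way to keep the "storage + repo + environment config" combination honest is to snapshot the installed packages to the persistent mount periodically. This sketch uses only the Python standard library; the default output path is an assumption matching the mount used throughout this guide:

```python
from importlib import metadata
from pathlib import Path

def snapshot_environment(out_path="/workspace/requirements.lock.txt"):
    """Write one sorted name==version line per installed distribution."""
    lines = sorted(
        {f"{dist.metadata['Name']}=={dist.version}"
         for dist in metadata.distributions()
         if dist.metadata["Name"]}  # skip entries with broken metadata
    )
    Path(out_path).write_text("\n".join(lines) + "\n")
    return lines
```

Diffing this snapshot against the requirements.txt in your repo catches drift between what the notebook actually runs and what the setup script declares.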

Common Patterns and Best Practices

1. One Workspace per user, shared storage per team

  • Give each researcher a personal Workspace.
  • Point them all at shared datasets via Cluster Storage or Object Storage.
  • Avoid “one giant shared interactive box” that everyone fights over.

2. Use Workspaces for interactive work, Jobs for automation

  • Interactive:
    • Jupyter + SSH in a Workspace.
  • Automated:
    • CI-style or batch experiments via vessl run in the CLI.
  • Both can share the same storage and environments, so you can:
    • Prototype in Jupyter.
    • Lock in a script.
    • Run it at scale via the CLI without rewriting your environment.

3. Match reliability tier to workload seriousness

  • Spot:
    • Early exploration, quick tests, non-critical analyses.
  • On-Demand:
    • Daily research dev, notebooks you rely on during the workday.
  • Reserved:
    • Long-lived labs, shared workstations, and “this must always be up” environments.

Final Verdict

If you want a persistent GPU Workspace in VESSL AI with Jupyter + SSH, the core move is to:

  • Treat compute as ephemeral (Spot/On-Demand/Reserved GPU Workspaces).
  • Treat Cluster Storage + Object Storage + configuration as your persistent environment.
  • Use On-Demand as your default for stable daily dev, with Spot for cheap experimentation and Reserved for always-on, high-stakes work.

This model keeps you out of the “pet server” trap while giving you a dev environment that feels persistent and reliable—without constant monitoring or job wrangling.


Next Step

Get Started