
Coder installation options: Kubernetes vs VMs — which should we choose for an enterprise rollout?
Most platform teams evaluating Coder for an enterprise rollout end up asking the same question: should we run coderd on Kubernetes, or on a VM fleet? The right answer depends less on what’s “modern” and more on what you already operate reliably, how you plan to scale, and where your compliance boundaries sit.
Quick Answer: For most enterprises with an existing Kubernetes footprint and platform team, Kubernetes is the better default for running coderd at scale; a VM-based install is often the fastest path for initial pilots, smaller teams, or environments where Kubernetes isn’t yet a first-class platform.
Frequently Asked Questions
How should we decide between Kubernetes and VMs for our Coder rollout?
Short Answer: Use Kubernetes if you already run critical workloads on it and have a supporting platform team; use VMs if you need a simpler, self-contained deployment or don’t operate Kubernetes yet.
Expanded Explanation:
Coder’s control plane (coderd) runs cleanly on both Kubernetes and traditional virtual machines. The trade-off is operational: Kubernetes gives you built-in primitives for scaling, rolling updates, and multi-tenant isolation, while VMs keep the stack simpler and easier to understand if you don’t already run clusters. In both modes, you can still host developer workspaces on Kubernetes or VMs; the key decision is where coderd itself lives, and who will own it operationally.
In regulated or air‑gapped environments, I look at one thing first: what does your team know how to patch, back up, and monitor at 2 a.m.? If that’s Kubernetes—with Helm, network policies, and cluster autoscaling already in place—Kubernetes is your natural control plane. If it’s a hardened VM image with config management (Ansible, Chef, Puppet, or cloud-init) and a traditional load balancer, a VM-based install makes more sense to start, and you can still connect it to Kubernetes clusters that host the actual workspaces.
Key Takeaways:
- Choose Kubernetes when you already operate it as a platform and want built-in scaling, rollout, and isolation primitives.
- Choose VMs when operational simplicity, quick evaluation, or non-Kubernetes shops matter more than deep cluster integration.
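To make the rule of thumb above concrete, here is a minimal Python sketch that encodes it as a function. The PlatformProfile fields and the recommend_control_plane name are hypothetical. Real decisions weigh more factors, such as budget, accreditation timelines, and team capacity, so treat this as a checklist written in code rather than a policy engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformProfile:
    """Hypothetical summary of what your team already runs reliably."""
    runs_critical_workloads_on_k8s: bool  # production services already on clusters
    has_platform_team: bool               # someone owns upgrades, DR, and on-call
    k8s_accredited: bool                  # Kubernetes approved in this environment


def recommend_control_plane(p: PlatformProfile) -> str:
    """Return "kubernetes" or "vm" for where coderd itself should run."""
    if not p.k8s_accredited:
        # e.g. an air-gapped network where only hardened VMs are approved.
        return "vm"
    if p.runs_critical_workloads_on_k8s and p.has_platform_team:
        # The team already patches, backs up, and monitors clusters at 2 a.m.
        return "kubernetes"
    # Pilots, smaller teams, or shops without mature cluster operations.
    return "vm"


if __name__ == "__main__":
    print(recommend_control_plane(PlatformProfile(True, True, True)))
```

The function only decides where coderd runs. Workspaces can still be placed on Kubernetes or VMs either way.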
What’s the process to deploy Coder on Kubernetes vs on VMs?
Short Answer: On Kubernetes, you install Coder using a Helm chart into an existing or new cluster; on VMs, you deploy coderd and its dependencies onto one or more virtual machines behind a load balancer.
Expanded Explanation:
A Kubernetes deployment of coderd looks like any other production-grade application. You prepare a cluster (GKE, EKS, AKS, or on-prem) and choose node machine types sized for coderd and for any workspaces you co-locate. Then you install Coder with Helm, supplying values for the PostgreSQL database coderd uses for its state, for ingress, and for identity (OIDC SSO). You can co-locate workspace pods on the same cluster, or you can dedicate separate clusters and reach them from coderd through network connectivity and Terraform templates that target those clusters.
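As a sketch of the Helm values described above, the following Python builds a minimal values structure and prints it as JSON. Because JSON is valid YAML, Helm can read the output with -f. The key names (coder.env, coder.service, coder.ingress) and the CODER_ACCESS_URL and CODER_PG_CONNECTION_URL variables reflect our understanding of the Coder chart. The hostnames and the coder-db-url and coder-tls Secret names are placeholders. Check every key against the values.yaml of the chart version you pin.

```python
import json


def coder_helm_values(access_url: str, host: str, tls_secret: str,
                      db_secret: str = "coder-db-url") -> dict:
    """Minimal values for the Coder Helm chart (key names are assumptions)."""
    return {
        "coder": {
            "env": [
                # coderd keeps its state in PostgreSQL; pull the connection URL
                # from a Kubernetes Secret instead of inlining credentials.
                {
                    "name": "CODER_PG_CONNECTION_URL",
                    "valueFrom": {"secretKeyRef": {"name": db_secret, "key": "url"}},
                },
                # The external URL users and workspace agents use to reach coderd.
                {"name": "CODER_ACCESS_URL", "value": access_url},
            ],
            # Keep the Service internal and expose coderd through the Ingress.
            "service": {"type": "ClusterIP"},
            "ingress": {
                "enable": True,
                "host": host,
                "tls": {"enable": True, "secretName": tls_secret},
            },
        }
    }


if __name__ == "__main__":
    # JSON is valid YAML, so Helm can read this output as a values file.
    print(json.dumps(
        coder_helm_values("https://coder.example.com", "coder.example.com", "coder-tls"),
        indent=2,
    ))
```

For example, redirect the output to values.json and run `helm install coder coder-v2/coder --namespace coder --values values.json`. This assumes the Coder chart repository was added under the name coder-v2 and that the coder-db-url and coder-tls Secrets already exist in that namespace.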
A VM deployment is more traditional: you provision one or more VMs (cloud or on‑prem), install coderd and its dependencies, configure TLS, and front it with a load balancer or reverse proxy. Workspaces can still run on Kubernetes or additional VMs; coderd just isn’t sharing a control plane with them. This can be easier for constrained environments—for example, air‑gapped networks where Kubernetes isn’t accredited yet, but hardened VMs are.
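On the VM path, coderd is typically configured through environment variables. The sketch below renders them as a KEY=value file. It assumes three things: the packaged systemd unit reads /etc/coder.d/coder.env, coderd terminates TLS itself, and a TCP load balancer sits in front. The CODER_TLS_* names mirror coder server's --tls-* options as we understand them. The paths, hostnames, and listen address are placeholders, so verify them against your Coder version.

```python
def render_coder_env(access_url: str, pg_url: str,
                     cert_file: str, key_file: str) -> str:
    """Render KEY=value lines for coderd's environment file on a VM."""
    if not access_url.startswith("https://"):
        raise ValueError("access URL should use HTTPS")
    settings = {
        "CODER_ACCESS_URL": access_url,
        # Fetch this from your secret manager at deploy time; it holds DB credentials.
        "CODER_PG_CONNECTION_URL": pg_url,
        # coderd terminates TLS itself; the load balancer passes TCP through.
        "CODER_TLS_ENABLE": "true",
        "CODER_TLS_ADDRESS": "0.0.0.0:3443",
        "CODER_TLS_CERT_FILE": cert_file,
        "CODER_TLS_KEY_FILE": key_file,
    }
    return "".join(f"{key}={value}\n" for key, value in settings.items())


if __name__ == "__main__":
    print(render_coder_env(
        "https://coder.example.com",
        "postgres://coder:CHANGE_ME@db.internal:5432/coder?sslmode=require",
        "/etc/coder.d/tls.crt",
        "/etc/coder.d/tls.key",
    ), end="")
```

Write the output to /etc/coder.d/coder.env with root-only permissions, because it contains database credentials. Then start the service, for example with `sudo systemctl enable --now coder`, assuming the unit is named coder.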
Steps:
- For Kubernetes:
  - Size and create your cluster.
    - For GKE, choose ubuntu_containerd or cos_containerd node images; Docker-based node images are not supported in GKE 1.24+ and will block cluster creation.
    - Start with 2 nodes for evaluation and enable autoscaling up to 8 (or more) nodes depending on workspace load and GPU needs.
    - If you’ll attach GPUs to workspaces, use general-purpose n1 machine types.
  - Configure cluster features.
    - Enable network policies and autoscaling.
    - For GKE, common flags include: --enable-autoscaling, --min-nodes 1, --max-nodes 8, --enable-network-policy, --addons HorizontalPodAutoscaling,HttpLoadBalancing.
  - Install coderd via Helm.
    - Provide values for ingress (TLS, hostnames), the PostgreSQL connection coderd uses for persistent state, OIDC SSO, and RBAC defaults.
    - Validate health, logging, and workspace provisioning from your templates.
  - Connect additional clusters or node pools for workspaces as your fleet grows.
- For VMs:
  - Provision VMs on your target platform (AWS/Azure/GCP, vSphere, OpenStack, bare metal).
    - Size the primary coderd VM for control-plane needs (CPU, memory, and disk for logs and metadata).
    - Optionally front multiple coderd instances with a load balancer for HA.
  - Install coderd and its dependencies.
    - Harden the OS, configure TLS certificates, and set up log shipping to your SIEM.
    - Configure OIDC SSO and RBAC in the coderd configuration.
  - Configure workspaces.
    - Point workspace templates at Kubernetes clusters or additional VM pools via Terraform.
    - Flesh out Terraform templates to standardize CPU/memory/GPU, disks, dev URL policies, and permitted IDEs.
- For both:
  - Define and test your “golden path” Terraform templates.
  - Set idle-stop policies and quotas.
  - Wire coderd audit logs into your central logging and monitoring stack.
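The GKE-specific steps can be collected into one `gcloud container clusters create` invocation. Those steps are a containerd node image, two starting nodes, autoscaling between 1 and 8 nodes, network policy, and the HPA and HTTP load-balancing add-ons. This Python sketch only assembles the command and prints it for review. The cluster name, zone, and n1-standard-4 machine type are placeholders. Production clusters usually need more flags, for example for networking, release channel, and node service accounts.

```python
import shlex


def gke_create_cluster_argv(name: str, zone: str,
                            machine_type: str = "n1-standard-4",
                            image_type: str = "UBUNTU_CONTAINERD",
                            num_nodes: int = 2, min_nodes: int = 1,
                            max_nodes: int = 8) -> list[str]:
    """Assemble a `gcloud container clusters create` argv list (a sketch)."""
    if image_type.lower() not in {"ubuntu_containerd", "cos_containerd"}:
        # Docker-based node images are not supported on GKE 1.24+.
        raise ValueError(f"unsupported node image: {image_type}")
    if not min_nodes <= num_nodes <= max_nodes:
        raise ValueError("num_nodes must lie within the autoscaling bounds")
    return [
        "gcloud", "container", "clusters", "create", name,
        "--zone", zone,
        "--machine-type", machine_type,
        "--image-type", image_type,
        "--num-nodes", str(num_nodes),
        "--enable-autoscaling",
        "--min-nodes", str(min_nodes),
        "--max-nodes", str(max_nodes),
        "--enable-network-policy",
        "--addons", "HorizontalPodAutoscaling,HttpLoadBalancing",
    ]


if __name__ == "__main__":
    # Print for review; pass the list to subprocess.run(...) to actually run it.
    print(shlex.join(gke_create_cluster_argv("coder", "us-central1-a")))
```

Returning an argv list instead of a shell string avoids quoting problems if you later hand it to subprocess.run.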
How do Kubernetes and VMs compare for scale, governance, and performance?
Short Answer: Kubernetes tends to win for high-scale, multi-tenant governance and elastic performance; VMs can be simpler for smaller or highly controlled deployments, but require more manual scaling and policy enforcement.
Expanded Explanation:
Kubernetes gives you fine-grained control over how coderd runs and how workspaces get scheduled: pod/resource limits, node pools per team, network policies, and autoscaling are all native. That’s ideal when you need to govern hundreds or thousands of developers and AI coding agents, spread across multiple clusters or clouds, and still keep cost and performance predictable.
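The sketch below illustrates these namespace-level controls. It generates two standard Kubernetes objects for a hypothetical per-team workspace namespace. The first is a NetworkPolicy that denies all inbound traffic to the namespace's pods. The second is a ResourceQuota that caps the CPU, memory, and pod count the team's workspaces can request. The sketch assumes workspace agents only need outbound connections to coderd. Confirm that your connection paths, for example direct peer-to-peer connections, still work before you enforce deny-by-default. The namespace name and quota values are illustrative.

```python
import json


def team_namespace_guardrails(namespace: str, cpu: str, memory: str,
                              max_pods: int) -> list[dict]:
    """Default-deny ingress plus a resource quota for one team's namespace."""
    deny_ingress = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {},           # empty selector: every pod in the namespace
            "policyTypes": ["Ingress"],  # no ingress rules listed: deny all ingress
        },
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "workspace-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "pods": str(max_pods),
            }
        },
    }
    return [deny_ingress, quota]


if __name__ == "__main__":
    # kubectl accepts a v1 List, e.g. `kubectl apply -f guardrails.json`.
    items = team_namespace_guardrails("team-payments", "64", "256Gi", 40)
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Two side effects are worth knowing. A NetworkPolicy only takes effect if the cluster's network plugin enforces it, which is why the GKE steps enable network policy. Also, once a namespace has a quota on requests.cpu or requests.memory, Kubernetes rejects pods that do not declare those requests, so your workspace templates must set them.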
VM-based installations keep operational overhead low when your footprint is smaller or more static. You rely on your hypervisor or cloud auto-scaling groups for elasticity, rather than on the Kubernetes cluster autoscaler and Horizontal Pod Autoscaler (HPA). Governance is still strong, since Coder’s OIDC SSO, RBAC, dev URL access levels, and workspace templates all work the same way. However, isolation and bin-packing are less dynamic than what you can get from Kubernetes.
Comparison Snapshot:
- Kubernetes:
  - Native auto-scaling (HPA, cluster autoscaler), node pools, and network policies.
  - Strong multi-tenant isolation with pod-level and namespace-level controls.
  - Best fit when you already have a platform team and want to standardize dev environments across clusters and regions.
- VMs:
  - Simpler setup and mental model; fewer moving parts to accredit.
  - Depends on VM orchestration or scripts for scaling and failover.
  - Best for smaller teams, restricted environments, or organizations without mature Kubernetes operations.
- Best for:
  - Kubernetes when your priority is large-scale, multi-team governance and efficient resource utilization across many workspaces.
  - VMs when your priority is fast time-to-value with minimal new platform surface area.
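The efficiency argument comes down to arithmetic: how many workspaces fit on a node, and whether the resulting node count stays within your autoscaler's ceiling. This rough Python sketch estimates the node count for a pool of identical workspaces, using whichever of CPU or memory runs out first. The function name and inputs are illustrative. The sketch ignores system reservations, DaemonSets, and fragmentation, so read the result as a lower bound.

```python
import math


def nodes_needed(workspaces: int, cpu_per_ws: float, mem_gib_per_ws: float,
                 node_cpu: float, node_mem_gib: float) -> int:
    """Lower-bound node count for identical workspaces (ignores overhead)."""
    # Workspaces per node are limited by whichever resource runs out first.
    per_node = min(math.floor(node_cpu / cpu_per_ws),
                   math.floor(node_mem_gib / mem_gib_per_ws))
    if per_node < 1:
        raise ValueError("a single workspace does not fit on one node")
    return math.ceil(workspaces / per_node)


if __name__ == "__main__":
    needed = nodes_needed(40, cpu_per_ws=2, mem_gib_per_ws=8, node_cpu=16, node_mem_gib=64)
    max_nodes = 8  # autoscaler ceiling used in the GKE example
    print(needed, "nodes;", "within" if needed <= max_nodes else "above", "the autoscaling cap")
```

Consider 40 workspaces requesting 2 vCPU and 8 GiB each, on nodes with 16 vCPU and 64 GiB allocatable. Eight workspaces fit per node, so the pool needs 5 nodes, which is inside the 1 to 8 autoscaling range from the GKE steps. At 70 workspaces the pool needs 9 nodes, so the cap would have to rise.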
How do we implement Coder so developers get speed and security teams get control?
Short Answer: Define Coder workspaces as Terraform templates, run coderd on your chosen platform (Kubernetes or VMs), and enforce identity, network, and AI governance through OIDC SSO, RBAC, dev URL policies, and AI Bridge.
Expanded Explanation:
Regardless of where you install coderd, the implementation pattern is the same: you standardize everything as code. Workspaces are defined via Terraform templates that specify compute, storage, base images, and allowed IDEs (VS Code Remote, JetBrains Gateway, browser IDEs, Cursor, Windsurf, Jupyter). Platform teams maintain these templates, while developers self-serve workspaces in seconds from a catalog that already encodes your guardrails.
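Coder templates are ordinarily written in HCL. To show the shape of a minimal template while keeping these examples in one language, the sketch below emits the same structure in Terraform's JSON syntax, as a main.tf.json file. The resource and data source names (coder_workspace, coder_parameter, coder_agent) follow the coder/coder provider as we understand it. Verify the attributes against the provider version you pin. A real template would also declare the pod or VM that runs the agent, plus the IDE apps you permit.

```python
import json


def workspace_template(max_cpu: int = 8) -> dict:
    """Skeletal Coder template in Terraform JSON syntax (attribute names assumed)."""
    return {
        "terraform": {
            "required_providers": {"coder": {"source": "coder/coder"}}
        },
        "data": {
            "coder_workspace": {"me": {}},
            # Developers choose CPU, but only inside the range the platform team allows.
            "coder_parameter": {
                "cpu": {
                    "name": "cpu",
                    "type": "number",
                    "default": "2",
                    "mutable": True,
                    "validation": {"min": 2, "max": max_cpu},
                }
            },
        },
        "resource": {
            "coder_agent": {"main": {"os": "linux", "arch": "amd64"}}
        },
    }


if __name__ == "__main__":
    # Save the output as main.tf.json in the template directory.
    print(json.dumps(workspace_template(), indent=2))
```

The validation block is where a golden-path guardrail lives. Raising or lowering max_cpu changes the limit for every workspace built from the template, without touching developer workflows.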
Security and governance controls live in the coderd control plane: you integrate your identity provider using OIDC SSO, define roles via RBAC, and use dev URL access levels to control who can reach running workspaces and how. For AI, you enable AI Bridge inside coderd so that AI coding agents and editor plugins call LLMs through your infrastructure, with prompts, token usage, and tool invocations logged for audit.
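On the identity side, coderd reads its OIDC settings from its configuration, for example from environment variables. This sketch assembles those variables. It refuses to proceed if the issuer is not served over HTTPS or if the client secret is missing. The CODER_OIDC_* names reflect Coder's server options as we understand them. OIDC_CLIENT_SECRET is a hypothetical variable that your secret manager would populate.

```python
import os
from urllib.parse import urlparse


def oidc_settings(issuer_url: str, client_id: str, email_domain: str,
                  secret_env: str = "OIDC_CLIENT_SECRET") -> dict[str, str]:
    """Build coderd OIDC env vars; the secret comes from the environment, not a file."""
    if urlparse(issuer_url).scheme != "https":
        raise ValueError("OIDC issuer must be served over HTTPS")
    secret = os.environ.get(secret_env)
    if not secret:
        raise RuntimeError(f"{secret_env} is not set")
    return {
        "CODER_OIDC_ISSUER_URL": issuer_url,
        "CODER_OIDC_CLIENT_ID": client_id,
        "CODER_OIDC_CLIENT_SECRET": secret,
        # Restrict sign-in to accounts whose e-mail address is in this domain.
        "CODER_OIDC_EMAIL_DOMAIN": email_domain,
    }
```

The resulting pairs can go into the coder.env list of a Helm deployment, preferably with the secret supplied through a Kubernetes secretKeyRef, or into a VM's environment file. RBAC roles and dev URL access levels are then managed in coderd itself.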
What You Need:
- Infrastructure & platform baselines:
  - A Kubernetes cluster (with supported node images like ubuntu_containerd/cos_containerd) or hardened VMs with TLS, backups, and monitoring.
  - Network paths from coderd to your workspace pools (Kubernetes node pools or VM fleets), Git providers, and artifact registries.
- Governance & templates:
  - OIDC SSO integration (e.g., Okta, Azure AD, Google Workspace) and RBAC role definitions.
  - A set of Terraform-based workspace templates that reflect your “golden paths” per team, with clear policies for compute limits, idle-stop behavior, dev URL access, and AI Bridge usage.
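Before a pilot, it helps to confirm the network paths listed above from the coderd host. This Python sketch attempts a TCP connection to each target and reports which attempts succeed. The labels, hosts, and ports are placeholders. Replace them with your Kubernetes API endpoint, Git provider, and artifact registry.

```python
import socket


def check_reachable(targets: dict[str, tuple[str, int]],
                    timeout: float = 3.0) -> dict[str, bool]:
    """Map each label to whether a TCP handshake with (host, port) succeeded."""
    results = {}
    for label, (host, port) in targets.items():
        try:
            # create_connection resolves the name and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[label] = True
        except OSError:
            # Covers DNS failures, refused connections, and timeouts.
            results[label] = False
    return results
```

A typical call is `check_reachable({"git": ("git.example.com", 443), "registry": ("registry.example.com", 443)})`. A True result only shows that the TCP handshake completed. It says nothing about TLS trust, proxies, or credentials.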
Caution: Don’t store secrets in Terraform templates. Assume every user with access to a template can see it in cleartext. Use your existing secret management approach (cloud KMS, Vault, etc.) and authenticated Terraform providers instead.
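One lightweight guard for this rule is a pre-merge check on your template repository. The Python sketch below flags lines in .tf files that look like inline secrets. It matches three patterns: a quoted literal assigned to a password-, secret-, token-, or api_key-like attribute; an AWS-style access key ID; or a PEM private key header. These patterns are illustrative heuristics, not a complete detector, and they can produce false positives such as a secret_name attribute that holds a Kubernetes Secret's name.

```python
import re
from pathlib import Path

# Illustrative heuristics only; tune them for your repositories.
SUSPICIOUS = [
    # A quoted literal (no ${...} interpolation) assigned to a secret-like attribute.
    re.compile(r'(?i)\b\w*(password|secret|token|api_key)\w*\s*=\s*"[^"$]+"'),
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # shape of an AWS access key ID
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]


def find_inline_secrets(template_dir: str) -> list[tuple[str, int]]:
    """Return (file, line number) pairs in *.tf files that look like inline secrets."""
    hits = []
    for path in sorted(Path(template_dir).rglob("*.tf")):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if any(pattern.search(line) for pattern in SUSPICIOUS):
                hits.append((str(path), lineno))
    return hits
```

References such as `password = var.db_password` pass the check, because the value comes from a variable or data source rather than a quoted literal.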
Strategically, which option positions us better for long-term enterprise use?
Short Answer: Kubernetes generally positions you better for a long-term, enterprise-wide rollout, but the best strategy is often VM-based evaluation now, with a clear path to a Kubernetes-backed control plane once you prove internal value.
Expanded Explanation:
Coder is used by organizations like the U.S. Department of Defense, Dropbox, Palantir, Discord, Goldman Sachs, and Mercedes to standardize remote development at scale. In those environments, the winning pattern is consistent: the control plane runs on a platform that already has organizational trust and operational maturity. For many enterprises, that’s Kubernetes—with established processes for upgrades, DR, and accreditation—so Kubernetes becomes the long-term home for coderd and its associated services.
At the same time, you don’t need to over-engineer day one. I’ve seen teams get real leverage by standing up Coder quickly on VMs to replace fragile laptop setups or expensive VDI, prove outcomes like 4x faster onboarding and 90% VDI cost reduction, and then migrate coderd into their primary Kubernetes platform once the internal demand and budget are anchored. In both scenarios, your Terraform templates, governance model, and AI Bridge configuration stay largely the same; you’re just changing where the control plane pods/processes run.
Why It Matters:
- Strategic fit: Aligning Coder with your primary platform (Kubernetes or VM-based) determines who owns it, how fast you can scale it, and how easily you can meet accreditation and logging requirements.
- Future-proofing: A Kubernetes-based control plane generally provides the most flexibility for multi-cluster, multi-cloud, and high-scale AI agent usage—without changing Coder’s core operating model around Terraform templates and on‑infrastructure governance.
Quick Recap
Choosing between Kubernetes and VMs for your Coder installation is an operational decision, not a feature trade-off. Kubernetes is usually the better long-term home when you already operate clusters for critical workloads and want tight, automated control at scale. VMs are often the fastest way to prove value in constrained or early-stage environments. In both cases, the real leverage comes from defining workspaces as Terraform, enforcing identity and access through OIDC SSO + RBAC + dev URL policies, and keeping source code and AI context inside infrastructure you control.