self-hosted AI coding agent Kubernetes VPC (behind firewall)

Most engineering leaders looking into a self-hosted AI coding agent on Kubernetes, inside a VPC and behind the firewall, are wrestling with the same constraint: you want real autonomy on your codebase, but nothing may leave the network boundary of your own VPC. You need agents that can run in Kubernetes, respect your firewall, integrate with GitHub and CI, and still give you full visibility into every command, diff, and artifact they produce.

Quick Answer: A self-hosted AI coding agent in a Kubernetes VPC runs as containerized services behind your firewall, connects to your internal Git/GitHub Enterprise, CI, and ticketing systems, and talks to your chosen LLMs via controlled egress or BYO-LLM. Platforms like OpenHands give you a sandboxed runtime, model-agnostic LLM selection, and full auditability so you can automate code reviews, test fixes, and dependency upgrades without exposing your code to a third-party SaaS.

Why This Matters

AI agents are moving from “interesting demos” to core SDLC infrastructure. If you let them touch source code, secrets, or production-like environments, you can’t treat them as a black-box SaaS. You need the same controls you demand from CI/CD: Kubernetes-native deployment, RBAC, audit logging, and the ability to replay what happened when something breaks.

Self-hosting an AI coding agent inside your Kubernetes VPC means:

  • Your code, configs, and logs remain inside your network boundary.
  • You control the runtime (Docker/Kubernetes) and the execution policy.
  • You can wire agents into existing workflows (GitHub/GitLab, Jenkins, GitHub Actions, Argo, Jira) without punching risky holes through your firewall.

Done right, this lets you take on the “outer loop” work—code reviews, test maintenance, dependency upgrades, vulnerability remediation—at scale, without sacrificing security or governance.

Key Benefits:

  • Data control & compliance: Keep source code, artifacts, and logs inside your VPC with fine-grained access control and auditability.
  • Real autonomy with guardrails: Run agents in a containerized sandbox, inspect every diff and log, and re-run tasks deterministically when needed.
  • Scalability & flexibility: Scale from a single coding agent to thousands of parallel runs, using your choice of models and Kubernetes primitives.

Core Concepts & Key Points

  • Self-hosted AI coding agent
    • Definition: An autonomous coding system you deploy and operate yourself, rather than consuming as a multi-tenant SaaS.
    • Why it's important: Keeps code and telemetry within your control, lets you integrate with internal systems, and aligns with compliance requirements.
  • Kubernetes VPC deployment
    • Definition: Running the agent platform as pods and services inside a Kubernetes cluster that lives in your VPC or private cloud.
    • Why it's important: Gives you standard infrastructure controls: namespaces, NetworkPolicies, service accounts, autoscaling, and integration with existing observability.
  • Sandboxed runtime & model-agnostic LLMs
    • Definition: Executing agent work in containerized sandboxes with scoped credentials, while routing prompts to whichever LLMs you choose (BYOK, multiple providers, self-hosted models).
    • Why it's important: Limits blast radius when agents touch code/infrastructure and avoids vendor lock-in while still getting best-in-class model performance.

How It Works (Step-by-Step)

At a high level, a self-hosted AI coding agent setup on Kubernetes, inside a VPC and behind the firewall, looks like this: Kubernetes hosts the agent platform and sandboxes, your VPC provides the network boundary, and you connect to LLMs via controlled egress or internal model endpoints. OpenHands is an example of such a platform: open, model-agnostic, and built to run in isolated Docker or Kubernetes environments with full auditability.

  1. Deploy the agent platform into your cluster

    • Install the core platform (e.g., OpenHands) as Kubernetes deployments/services in a dedicated namespace.
    • Use Kubernetes primitives to harden the environment:
      • NetworkPolicies to restrict egress only to approved LLM endpoints and internal services (Git, CI, artifact stores).
      • Service accounts and RBAC roles scoped to what the agents actually need (e.g., access to certain repos, not the whole cluster).
      • Pod Security Standards (enforced through Pod Security Admission) and securityContext settings to constrain capabilities (no privileged containers, read-only root FS where feasible).
    • Optionally front the Web GUI with your standard ingress controller (NGINX, Envoy, ALB/ELB ingress) and SSO/SAML for authentication.
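The hardening bullets in step 1 can be sketched as manifests generated from Python. This is a minimal illustration under stated assumptions, not the configuration of OpenHands or any other platform: the namespace `ai-agents`, the policy name, the CIDR (a documentation-range address), and the `git` and `ci` namespaces are all hypothetical. Standard NetworkPolicy matches IP blocks and selectors rather than DNS names, so allowing an LLM provider by hostname typically needs an egress proxy or a CNI-specific policy in addition to something like this.

```python
import json

# Assumption: container-level securityContext mirroring the bullets above
# (no privileged containers, read-only root filesystem where feasible).
HARDENED_SECURITY_CONTEXT = {
    "privileged": False,
    "allowPrivilegeEscalation": False,
    "readOnlyRootFilesystem": True,
    "runAsNonRoot": True,
    "capabilities": {"drop": ["ALL"]},
}


def build_egress_policy(namespace, allowed_cidrs, internal_namespaces):
    """Build a NetworkPolicy dict that restricts egress for every pod in
    `namespace` to DNS, the given CIDRs on TCP 443, and the given namespaces."""
    # A rule with ports but no "to" allows those ports to any destination;
    # here it only opens DNS so pods can still resolve names.
    egress = [{
        "ports": [
            {"protocol": "UDP", "port": 53},
            {"protocol": "TCP", "port": 53},
        ],
    }]
    if allowed_cidrs:
        # Approved external endpoints (for example an LLM gateway), HTTPS only.
        egress.append({
            "to": [{"ipBlock": {"cidr": cidr}} for cidr in allowed_cidrs],
            "ports": [{"protocol": "TCP", "port": 443}],
        })
    if internal_namespaces:
        # Internal services (Git, CI, artifact stores) selected by namespace name.
        egress.append({
            "to": [
                {"namespaceSelector": {
                    "matchLabels": {"kubernetes.io/metadata.name": ns}
                }}
                for ns in internal_namespaces
            ],
        })
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "agent-egress-allowlist", "namespace": namespace},
        "spec": {
            "podSelector": {},          # empty selector: all pods in the namespace
            "policyTypes": ["Egress"],  # anything not listed below is denied
            "egress": egress,
        },
    }


if __name__ == "__main__":
    policy = build_egress_policy(
        namespace="ai-agents",                # hypothetical namespace
        allowed_cidrs=["203.0.113.10/32"],    # hypothetical gateway address
        internal_namespaces=["git", "ci"],    # hypothetical internal namespaces
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be reviewed and applied like any other manifest, and generating it from code keeps the allowlist in version control alongside the rest of your IaC.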
  2. Configure sandboxed runtimes and credentials

    • Define how individual agents run tasks:
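      • Each agent invocation spins up in a containerized sandbox (isolated pods, sidecar-based sandboxes, or Docker-in-Docker; note that Docker-in-Docker usually requires a privileged container, which conflicts with the "no privileged containers" hardening in step 1).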
      • Mount only the repositories, config files, and env vars needed for that task.
    • Use scoped credentials:
      • Git access tokens limited to specific orgs or repos.
      • Short-lived CI tokens for pushing branches or triggering pipelines.
      • No direct access to production secrets; if you need to touch infra, do it through existing automation (Terraform PRs, deployment pipelines).
    • Ensure every run is auditable:
      • Persist logs, shell commands, and agent reasoning traces.
      • Store generated artifacts (diffs, PRs, test files, release notes) in version control or your artifact store.
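One way to picture the auditability requirement in step 2 is a per-run record written as a JSON line to your existing log pipeline. The sketch below is an assumption for illustration only: the `AgentRunRecord` class and its field names are hypothetical and do not reflect the log format of OpenHands or any other platform.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now():
    """Timestamp in ISO 8601 format, always in UTC."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AgentRunRecord:
    """Hypothetical audit record for a single agent run."""
    run_id: str
    repo: str
    triggered_by: str
    started_at: str = field(default_factory=_utc_now)
    commands: list = field(default_factory=list)
    diff_sha256: str = ""

    def log_command(self, command, exit_code):
        # Record every shell command the agent executed and how it exited.
        self.commands.append({"command": command, "exit_code": exit_code})

    def attach_diff(self, diff_text):
        # Store a digest so the record can later be matched to the exact
        # diff kept in version control or the artifact store.
        self.diff_sha256 = hashlib.sha256(diff_text.encode("utf-8")).hexdigest()

    def to_json_line(self):
        # One JSON object per line is easy to ship to most log stacks.
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    record = AgentRunRecord(
        run_id="run-0001",             # hypothetical identifiers
        repo="payments/ledger-service",
        triggered_by="ci-pipeline",
    )
    record.log_command("pytest -q", exit_code=1)
    record.log_command("pytest -q", exit_code=0)
    record.attach_diff("--- a/app.py\n+++ b/app.py\n")
    print(record.to_json_line())
```

Keeping a digest of the diff rather than the diff itself keeps the log line small while still letting you tie a record back to a specific PR or artifact.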
  3. Wire agents into your SDLC workflows

    • Connect to your code hosts:
      • GitHub/GitHub Enterprise, GitLab, or Bitbucket inside your VPC.
      • Agents can summarize pull requests, apply reviewer feedback, fix failing tests, and push branches back.
    • Integrate with CI/CD:
      • Trigger agents from CI jobs to remediate failing tests or flaky suites.
      • Run OpenHands headlessly from your pipelines for scheduled tasks (e.g., weekly dependency upgrades).
    • Hook into collaboration tools:
      • Slack or chat tools (if allowed) for triggering agents on demand.
      • Issue trackers like Jira to turn bug tickets and vulnerability issues into reviewable PRs.
    • Choose your model strategy:
      • BYOK (bring your own key) to cloud LLMs (Anthropic, OpenAI, Amazon Bedrock, etc.) over egress that’s locked down and monitored.
      • Or route to self-hosted models inside the same VPC if you’re running your own inference stack.
    • Scale out as you prove trust:
      • Start with a small set of repos and read-only modes.
      • Grow to parallel agent runs across repositories once you’ve validated observability, RBAC, and sandboxing.
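The model strategy in step 3 comes down to an allowlist decision: prefer an internal endpoint, fall back to an approved provider, and refuse anything else. The sketch below assumes a hypothetical internal host (`llm.internal.example`) and a hypothetical approved set. It is an application-level check that complements network-level egress controls and does not replace them.

```python
from urllib.parse import urlparse

# Assumption: hosts your security team has approved. The internal host is
# hypothetical; the external one stands in for whichever provider you use.
APPROVED_LLM_HOSTS = frozenset({
    "llm.internal.example",
    "api.anthropic.com",
})


def resolve_llm_endpoint(candidates, approved_hosts=APPROVED_LLM_HOSTS):
    """Return the first candidate URL that uses HTTPS and an approved host.

    Candidates are tried in order, so list internal endpoints first if you
    want self-hosted models to take priority over cloud providers.
    """
    for url in candidates:
        parsed = urlparse(url)
        if parsed.scheme == "https" and parsed.hostname in approved_hosts:
            return url
    raise ValueError("no approved LLM endpoint among the candidates")


if __name__ == "__main__":
    chosen = resolve_llm_endpoint([
        "http://llm.internal.example/v1",    # rejected: not HTTPS
        "https://llm.internal.example/v1",   # accepted: approved host over HTTPS
        "https://api.anthropic.com/v1",
    ])
    print(chosen)
```

Failing closed with an exception, rather than silently falling back to a default provider, keeps a misconfigured run from sending prompts somewhere unapproved.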

Common Mistakes to Avoid

  • Treating agents like a SaaS add-on instead of infrastructure:

    • How to avoid it: Manage your self-hosted AI coding agent as first-class infra. Use IaC (Helm, Kustomize, Terraform) to deploy OpenHands or similar platforms, integrate them with centralized logging/metrics, and subject them to the same change-management processes as CI/CD.
  • Giving agents broad, unscoped access to code and credentials:

    • How to avoid it: Design for least privilege from day one. Create per-project service accounts, repo-scoped tokens, and sandbox configs. Make sure you can see exactly what each agent run touched, and have fast ways to roll back or disable access if needed.
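The least-privilege advice above can be made concrete with a small authorization check that runs before any sandbox is created. Everything here is a hypothetical sketch: the `ProjectPolicy` type, the project names, and the service account names are assumptions, and a real deployment would back this with Kubernetes RBAC and repo-scoped tokens rather than an in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectPolicy:
    """Hypothetical per-project policy: one service account, an explicit
    repo allowlist, and push access that is off by default."""
    service_account: str
    allowed_repos: frozenset
    allow_push: bool = False


def authorize_run(policies, project, repo, wants_push):
    """Return (allowed, reason) for a requested agent run.

    Unknown projects, repos outside the allowlist, and push requests from
    read-only projects are all denied.
    """
    policy = policies.get(project)
    if policy is None:
        return False, "unknown project"
    if repo not in policy.allowed_repos:
        return False, "repo is outside this project's allowlist"
    if wants_push and not policy.allow_push:
        return False, "project is read-only"
    return True, f"run as {policy.service_account}"


if __name__ == "__main__":
    policies = {
        "payments": ProjectPolicy(
            service_account="agent-payments",
            allowed_repos=frozenset({"payments/ledger-service"}),
        ),
    }
    print(authorize_run(policies, "payments", "payments/ledger-service", False))
    print(authorize_run(policies, "payments", "payments/ledger-service", True))
    print(authorize_run(policies, "payments", "identity/auth-service", False))
```

Defaulting `allow_push` to false matches the rollout order described earlier: start read-only, then grant write access per project once you trust what the logs show.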

Real-World Example

Imagine you’re running a regulated fintech with a Kubernetes cluster in a locked-down VPC. You’ve got hundreds of microservices, a mix of GitHub Enterprise and GitLab, and a backlog of security vulnerabilities and dependency upgrades that never quite gets to the top of the sprint.

You deploy OpenHands into a dedicated namespace in your production-adjacent Kubernetes cluster. The Web GUI is fronted by your internal ingress and SSO; only authenticated engineers can launch or approve tasks. Each agent run spins up in its own sandboxed pod with access to a single repo and a repo-scoped token.

You configure egress so the cluster can only talk to your chosen LLM providers and your internal Git and CI endpoints. Agents start by summarizing pull requests, generating tests, and applying review feedback—each change landing as a PR that your team can review. Once you’re satisfied with visibility and behavior, you introduce a weekly pipeline that runs OpenHands headlessly: parallel agents sweep repos for outdated dependencies and known vulnerable versions, then open PRs to upgrade and regenerate tests. All of this happens behind your firewall, in a runtime you control, with full logs and artifacts stored in your existing observability stack.

Pro Tip: Treat your self-hosted AI coding agent like a new CI worker tier: separate namespaces, strict NetworkPolicies, distinct service accounts, and end-to-end logging from “trigger” to “PR merged.” That framing makes security and platform teams far more comfortable approving the rollout.

Summary

Running a self-hosted AI coding agent on Kubernetes inside your VPC gives you the best of both worlds: the ability to automate serious engineering work—code reviews, tests, dependency upgrades, vulnerability fixes—without pushing your source code or execution traces into a black-box SaaS. The key is to treat the agent platform as infrastructure: containerized sandbox runtimes, model-agnostic LLM routing, SSO/RBAC, audit logging, and deterministic re-runs.

OpenHands was built for exactly this deployment pattern: cloud coding agents that scale from one task to thousands of parallel runs, running in isolated Docker or Kubernetes environments you control, with full visibility into every agent and artifact. If you want a self-hosted AI coding agent on Kubernetes, inside your VPC and behind your firewall, without sacrificing autonomy or security, that’s the architecture to aim for.
