AutoGen: how do I run model-generated code using DockerCommandLineCodeExecutor with restricted filesystem and no network?

Most teams hit the same wall with model-generated code: you want AutoGen agents to write and run code, but you cannot give them a shell on your host, a writable filesystem, or a path to the public internet. DockerCommandLineCodeExecutor in autogen-ext is the escape hatch for that: it executes model-generated code inside a Docker container, where you can tightly control filesystem access and networking.

Quick Answer: Use DockerCommandLineCodeExecutor from autogen-ext and lock down the container it runs in: a minimal image, a non-root user, read-only mounts, and no network, so the code sees a throwaway filesystem and has no outbound path. Then wire the executor into your AutoGen agents or tools so any model-generated code runs inside that sandbox instead of on the host.

Why This Matters

In a regulated environment, “let the LLM run this code” is a non-starter unless you can prove isolation, auditability, and blast-radius limits. AutoGen’s DockerCommandLineCodeExecutor gives you a concrete runtime boundary: all model-generated code runs in a container you control, with read-only mounts, ephemeral workspaces, and networking explicitly disabled at the Docker level. Note that a default container still gets a bridge network, so no-network is something you configure, not something you inherit.

You stop debating “is code execution safe in principle” and start reasoning in concrete controls: what image, what mounts, what user, what capabilities.

Key Benefits:

Hard isolation via containers: Keep model-generated code out of your host OS and production runtimes by executing it in a tightly-scoped Docker container.
Filesystem control: Use read-only mounts and throwaway working directories to prevent persistent writes and to keep state from accumulating across runs.
Network lock-down: Disable networking at the container level so model-generated code can’t exfiltrate data or call external APIs.

Core Concepts & Key Points

Concept	Definition	Why it's important
`DockerCommandLineCodeExecutor`	An executor in `autogen_ext.code_executors.docker` that runs commands/code inside a Docker container.	Gives you a configurable sandbox for model-generated code without changing your agent logic.
Sandboxed workspace	The container’s working directory plus any mounted volumes exposed to the code.	Lets you control what the model can read/write and ensures temporary artifacts are isolated from the host.
Network restrictions	Docker-level flags (`--network none`, capabilities, seccomp) that remove external connectivity.	Prevents model-generated code from reaching the internet or lateral-moving inside your network.
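
The controls in the table map onto ordinary `docker run` flags. The sketch below only assembles an argument list so the mapping is visible; `restricted_docker_args` and its default limits are hypothetical, not part of autogen-ext, and the executor creates its container through the Docker SDK, not through this command line.

```python
from typing import Dict, List


def restricted_docker_args(
    image: str,
    read_only_mounts: Dict[str, str],
    user: str = "sandbox",
    memory: str = "256m",
    pids_limit: int = 128,
) -> List[str]:
    """Build a `docker run` argument list for a locked-down, throwaway container.

    Hypothetical helper for illustration: it assembles arguments and never
    calls Docker itself.
    """
    args = [
        "docker", "run", "--rm",              # remove the container on exit
        "--network", "none",                  # loopback only, no outbound path
        "--read-only",                        # root filesystem is read-only
        "--tmpfs", "/tmp",                    # in-memory scratch space
        "--user", user,                       # non-root user from the image
        "--cap-drop", "ALL",                  # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--memory", memory,
        "--pids-limit", str(pids_limit),
    ]
    # Mount each host path read-only at its container path.
    for host_path, container_path in sorted(read_only_mounts.items()):
        args += ["--volume", f"{host_path}:{container_path}:ro"]
    args.append(image)
    return args


if __name__ == "__main__":
    print(" ".join(restricted_docker_args(
        "my-sandboxed-python:latest",
        {"/tmp/autogen-sandbox": "/workspace/host_ro"},
    )))
```

Reading the flags one by one is a quick way to review a sandbox design before deciding which of them your executor version can apply and which must come from the image or the Docker host.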

How It Works (Step-by-Step)

At a high level you:

Install the AutoGen packages and ensure Docker is running.
Configure a DockerCommandLineCodeExecutor with a restricted image, volumes, user, and network.
Plug the executor into your AutoGen agents/tools so all code execution flows through that sandbox.

1. Installation & prerequisites

Python 3.10 or later is required.

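Install the core packages and extensions, including the Docker and OpenAI extras of autogen-ext that the examples below rely on:

pip install -U "autogen-agentchat" "autogen-core" "autogen-ext[docker,openai]"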

You also need:

Docker installed and accessible to your Python process.
A base image that contains the runtime you need (for example, Python 3.11).
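
These prerequisites are cheap to check before any agent starts. A minimal sketch, assuming a hypothetical helper named `preflight_problems`; note that finding the `docker` CLI on PATH does not prove the daemon is running or that your process may talk to it.

```python
import shutil
import sys
from typing import List, Optional, Tuple


def preflight_problems(
    python_version: Tuple[int, int],
    docker_cli: Optional[str],
) -> List[str]:
    """Return a list of unmet prerequisites (an empty list means none found)."""
    problems: List[str] = []
    if python_version < (3, 10):
        problems.append(
            f"Python 3.10+ required, found {python_version[0]}.{python_version[1]}"
        )
    if docker_cli is None:
        problems.append("docker CLI not found on PATH")
    return problems


if __name__ == "__main__":
    # shutil.which returns the executable path, or None when it is missing.
    for problem in preflight_problems(sys.version_info[:2], shutil.which("docker")):
        print("preflight:", problem)
```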

Example base image (Dockerfile sketch):

FROM python:3.11-slim

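# Optional: add the libraries you want available; pin versions in real builds
RUN pip install --no-cache-dir numpy pandas

# Create a non-root user and a workspace directory that user owns
RUN useradd -ms /bin/bash sandbox \
    && mkdir -p /workspace \
    && chown sandbox:sandbox /workspace
USER sandbox
WORKDIR /workspace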

Build it:

docker build -t my-sandboxed-python:latest .

2. Create a `DockerCommandLineCodeExecutor` with a restrictive config

DockerCommandLineCodeExecutor lives in autogen_ext.code_executors.docker. In the 0.4-series releases, its constructor mainly covers the filesystem and lifetime side of the sandbox:

image: Docker image name (e.g., my-sandboxed-python:latest).
work_dir: a host directory that the executor mounts into the container as the working directory and uses for the generated code files.
extra_volumes (newer releases): additional host paths to mount, in the Docker SDK’s {"bind": ..., "mode": ...} format.
timeout and auto_remove: a per-execution time limit in seconds, and removal of the container once it stops.

The container user and the network do not appear to be constructor options in those releases; they come from the image and from Docker itself.

Below is a minimal pattern that you can adapt; names may differ slightly depending on the exact version, so check the autogen-ext API reference for final signatures.

from pathlib import Path

from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor

# Host scratch directory; the executor mounts it into the container as the
# working directory, so keep it dedicated and disposable.
work_dir = Path("/tmp/autogen-work")  # example path
work_dir.mkdir(parents=True, exist_ok=True)

# Define a very restricted executor
executor = DockerCommandLineCodeExecutor(
    image="my-sandboxed-python:latest",
    container_name="autogen-sandbox",  # fixed name, handy for docker inspect
    work_dir=work_dir,
    timeout=20,                        # seconds per execution
    auto_remove=True,                  # remove the container once it stops
    # Only mount the data the code must see, and mount it read-only.
    # extra_volumes is assumed to exist in your release; drop it if not.
    extra_volumes={
        "/tmp/autogen-sandbox": {"bind": "/workspace/host_ro", "mode": "ro"},
    },
)

Non-root execution comes from the USER line in the image. If your installed version exposes no network option, one workaround is to detach the started container from Docker’s default network (for example, docker network disconnect bridge autogen-sandbox) before any code runs; another is to subclass the executor so the container is created without a network. Check the reference docs for what your version supports.

Practical constraints:

Only mount directories the code must see (e.g., a read-only data snapshot).
Prefer read-only (mode: "ro") mounts; keep any writable scratch space in a dedicated throwaway directory, never in a host path that holds real data.
Use a non-root user in the image to limit damage even inside the container.
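
The constraints above can be enforced mechanically before a container is ever started. A minimal sketch, assuming volumes are described in the Docker SDK’s `{host_path: {"bind": ..., "mode": ...}}` shape; `mount_problems` and the `SENSITIVE_ROOTS` list are hypothetical and should be adapted to your environment.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Host locations that should never be exposed to generated code.
# Illustrative list only; extend it for your own hosts.
SENSITIVE_ROOTS = [
    PurePosixPath(p)
    for p in ("/home", "/etc", "/root", "/var/run/docker.sock")
]


def mount_problems(volumes: Dict[str, Dict[str, str]]) -> List[str]:
    """Return human-readable problems found in a volumes mapping."""
    problems: List[str] = []
    for host_path, spec in volumes.items():
        path = PurePosixPath(host_path)
        # Flag the filesystem root, a sensitive root, or anything beneath one.
        sensitive = str(path) == "/" or any(
            path == root or root in path.parents for root in SENSITIVE_ROOTS
        )
        if sensitive:
            problems.append(f"{host_path}: sensitive host path")
        # A missing mode is treated as writable, which is Docker's default.
        if spec.get("mode", "rw") != "ro":
            problems.append(f"{host_path}: mount is writable")
    return problems


if __name__ == "__main__":
    print(mount_problems({"/home/alice": {"bind": "/workspace/home", "mode": "rw"}}))
```

Running a check like this in the code path that builds the executor turns the mount policy into something a reviewer can read and a test can exercise.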

3. Wire the executor into an AutoGen agent

In the AutoGen stack, executors are used by tools. For model-generated Python, you typically expose a tool such as run_python that delegates execution to DockerCommandLineCodeExecutor.

This snippet uses AgentChat’s AssistantAgent and a simple “code execution tool” pattern:

import asyncio
import os
from pathlib import Path

from autogen_agentchat.agents import AssistantAgent
from autogen_core import CancellationToken
from autogen_core.code_executor import CodeBlock
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor
from autogen_ext.models.openai import OpenAIChatCompletionClient

# 1. Model client
model_client = OpenAIChatCompletionClient(
    model="gpt-4o-mini",
    api_key=os.environ["OPENAI_API_KEY"],
)

# 2. Docker-based code executor with a restricted configuration.
#    Parameter names follow the 0.4-series API; verify them for your version.
work_dir = Path("/tmp/autogen-work")  # dedicated host scratch directory (example path)
work_dir.mkdir(parents=True, exist_ok=True)
code_executor = DockerCommandLineCodeExecutor(
    image="my-sandboxed-python:latest",
    work_dir=work_dir,
    timeout=20,        # seconds; limits long-running code
    auto_remove=True,  # remove the container once it stops
    extra_volumes={"/tmp/autogen-sandbox": {"bind": "/workspace/host_ro", "mode": "ro"}},
)

# 3. Tool: takes a code string and runs it inside the container
async def run_python(code: str) -> str:
    """Run Python code in the Docker sandbox and return its output."""
    result = await code_executor.execute_code_blocks(
        [CodeBlock(code=code, language="python")],
        cancellation_token=CancellationToken(),
    )
    return result.output or f"(no output, exit code {result.exit_code})"

# 4. Agent with the tool attached
assistant = AssistantAgent(
    name="code_writer",
    model_client=model_client,
    # In 0.4 AgentChat, tools is a list; plain functions are wrapped automatically
    tools=[run_python],
)

# 5. Simple interaction
async def main() -> None:
    # Ask the agent to write and run code, but force it to use the tool
    user_prompt = """
    Write a short Python script that prints the squares of numbers 1 to 5,
    then execute it using the `run_python` tool and return the output only.
    """
    await code_executor.start()  # create and start the container
    try:
        result = await assistant.run(task=user_prompt)
        print(result.messages[-1].content)
    finally:
        await code_executor.stop()  # stop the container; auto_remove cleans it up

if __name__ == "__main__":
    asyncio.run(main())

Note that this pattern keeps all model-generated execution inside the Docker sandbox. AutoGen’s runtime (SingleThreadedAgentRuntime / distributed runtimes like GrpcWorkerAgentRuntime) only sees function calls and results.

Common Mistakes to Avoid

Mounting sensitive host paths:
Don’t mount /, /home, or production data volumes into the container. Create a dedicated /tmp/autogen-sandbox with a read-only snapshot or synthetic test data instead.
Forgetting to disable the network:
Docker’s default bridge network gives code an outbound path. Run the container with no network (--network none, or whatever equivalent your executor version supports) so all external calls fail by design.
Running as root inside the container:
Always set USER in your Dockerfile so the commands the executor runs inherit that non-root user. Containers isolate, but root-in-container plus a Docker escape bug is a risk you don’t need.
Unbounded runtime and resources:
Set the executor’s timeout, and apply resource limits (CPU, memory, pids) at the Docker level where you can. Without limits, model-generated code can loop forever or exhaust RAM.
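
Three of these mistakes (network, root user, missing limits) can be detected from the outside by reading the JSON that `docker inspect <container>` prints. A minimal sketch, assuming you pass in one container object from that output; `sandbox_findings` is a hypothetical name, and mounts are left out because the executor’s working directory is writable by design.

```python
from typing import Any, Dict, List


def sandbox_findings(inspect: Dict[str, Any]) -> List[str]:
    """Check one `docker inspect` container object for loose sandbox settings."""
    host = inspect.get("HostConfig") or {}
    config = inspect.get("Config") or {}
    findings: List[str] = []

    # Network: prefer the live attachment list; fall back to the creation-time mode.
    networks = (inspect.get("NetworkSettings") or {}).get("Networks")
    if networks is None:
        network_on = host.get("NetworkMode") != "none"
    else:
        network_on = any(name != "none" for name in networks)
    if network_on:
        findings.append("network is enabled")

    # User: an empty value means the image default, which is root unless USER is set.
    user = (config.get("User") or "").split(":")[0]
    if user in ("", "root", "0"):
        findings.append("container runs as root")

    # Limits: Docker reports 0 (or null) when no limit is configured.
    if not host.get("Memory"):
        findings.append("no memory limit")
    pids = host.get("PidsLimit")
    if pids is None or pids <= 0:
        findings.append("no pids limit")
    return findings


if __name__ == "__main__":
    sample = {
        "Config": {"User": ""},
        "HostConfig": {"NetworkMode": "bridge", "Memory": 0, "PidsLimit": None},
        "NetworkSettings": {"Networks": {"bridge": {}}},
    }
    for finding in sandbox_findings(sample):
        print("finding:", finding)
```

A check like this fits well in a smoke test that starts the sandbox once and fails the build if any finding is reported.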

Real-World Example

In our internal “agent platform,” we let analysts request ad-hoc data transforms in Python via an AutoGen AgentChat workflow. The LLM writes small scripts to manipulate a read-only CSV snapshot and returns summaries. We cannot let that code anywhere near production databases or the corporate network.

We use DockerCommandLineCodeExecutor with:

A slim Python image that includes only pandas and numpy.
A read-only volume mount of sanitized CSVs at /workspace/data.
No network (--network none) and a non-root sandbox user.
Containers auto-removed after completion.

Analysts get a smooth experience: “Ask, get a script and a result.” From our side, we can prove that:

Code never touches the live warehouse (only the snapshot volume is mounted).
There’s no network path to exfiltrate the data.
Every run is ephemeral, and we can capture logs for audit.
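
One way to make the audit claim concrete is to write a small record per execution. The sketch below is an assumption about what such a record could contain, not a description of our production logging; `audit_record` and its field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(code: str, exit_code: int, output: str, max_output: int = 2000) -> str:
    """Return one JSON line describing a single sandboxed execution."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Hash the code so identical scripts can be correlated across runs.
        "code_sha256": hashlib.sha256(code.encode("utf-8")).hexdigest(),
        "exit_code": exit_code,
        # Cap stored output so a noisy script cannot flood the log.
        "output": output[:max_output],
        "output_truncated": len(output) > max_output,
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(audit_record("print('hello')", 0, "hello\n"))
```

Appending these lines to a write-once log outside the container keeps the evidence out of reach of the code being audited.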

Pro Tip: Treat your Docker image for DockerCommandLineCodeExecutor like a production runtime: pin versions, strip unnecessary tools (curl, compilers, and any shell beyond what your executor version needs to start the container), and add only the libraries your workflows need. A smaller image gives generated code fewer tools to misuse.

Summary

DockerCommandLineCodeExecutor is the clean way to let AutoGen agents run model-generated code without exposing your host or network. By:

Building a minimal, non-root Docker image,
Mounting only the volumes you truly need (preferably read-only),
Disabling networking and setting runtime limits,

you get a well-defined security and privacy boundary around code execution, aligned with the rest of AutoGen’s runtime model. The agents don’t need to know about Docker—they just see a tool that can “run code” and return structured results.
