Agent Framework with Distributed Workers (Multi-Process / Multi-Machine) and Message Routing

Most teams that outgrow single-LLM scripts hit the same wall: you need an agent framework with distributed workers (multi-process / multi-machine) and message routing that you can reason about and control. That’s exactly where AutoGen’s layered stack—and especially the distributed runtime in autogen-core—starts to matter.

Quick Answer: If you need an agent framework with distributed workers (multi-process / multi-machine) and robust message routing, use AutoGen Core’s distributed Agent Runtime. Combine it with AgentChat for higher-level agent behaviors, and rely on topics/subscriptions instead of hard-coded IDs to route messages across processes and machines.

Why This Matters

Once you move beyond a single-process proof of concept, naive patterns break down: agents can’t all live in one Python process, tenants need isolation from each other, and tracing who talked to whom becomes the hard part. A distributed runtime with explicit message routing gives you:

a runtime that knows where agents live,
a consistent way to address them (topics, subscriptions, IDs),
and observable execution paths when something goes wrong in production.

AutoGen’s distributed runtime lets you run agents across machines and languages while keeping the same programming model as the standalone runtime. That means you can prototype locally, then scale out to a host servicer + workers + gateways topology without rewriting your agent logic.

Key Benefits:

Scale across processes and machines: Use workers and gateways so heavy tools or models can run where it makes operational sense.
Deterministic message routing: Use topics, subscriptions, and runtime-enforced addressing instead of ad-hoc “call this agent directly” logic.
Isolation and control: Separate tenants or workloads into different workers while keeping a single logical agent system.

Core Concepts & Key Points

Concept	Definition	Why it's important
Distributed Agent Runtime	A runtime composed of a host servicer, workers, and gateways, used to run agents across processes/machines.	Gives you multi-process / multi-machine execution without changing agent implementations.
Message Routing (Topics & Subscriptions)	A pattern where messages are delivered based on topics and subscribers rather than hard-coded agent IDs.	Decouples producers from consumers, improves portability, and simplifies multi-tenant designs.
AgentChat vs Core	AgentChat is a high-level API for building multi-agent apps; `autogen-core` provides the underlying event-driven runtime and primitives.	Start with AgentChat for productivity, drop to Core when you need fine‑grained runtime, routing, and scaling control.

How It Works (Step-by-Step)

At a high level, getting from a prototype to distributed workers with message routing in AutoGen takes three steps:

Prototype the conversation flow with AgentChat (single process).
Use AssistantAgent, Teams, and patterns like selector group chat to confirm your agent roles and tools.
Move the same agents onto AutoGen Core’s distributed runtime.
Stand up a host servicer and one or more workers (possibly on different machines). Agents run on workers, which connect to the host servicer via gateways; the host servicer routes messages between workers and maintains connection state.
Introduce topic-based message routing and subscriptions.
Instead of direct calls to specific agent IDs, register subscriptions based on types and routing rules so that messages find the right agent instances—even when they’re distributed.

Below I’ll walk through concrete commands and code so you can actually run this end-to-end.

Installation

Python 3.10 or later is required.

1. Core + AgentChat + Extensions

pip install -U "autogen-core" "autogen-agentchat" "autogen-ext[openai]"

If you’re using Azure OpenAI, you’ll typically add:

pip install -U "autogen-ext[azure]"

Set your model provider environment variables (for example, for OpenAI):

export OPENAI_API_KEY="sk-..."

Step 1: Prototype Agents in a Single Process (AgentChat)

AgentChat is the recommended starting point if you’re new to AutoGen. It hides the runtime details and gives you “agents and teams” primitives.

Minimal single-process AgentChat example

# pip install -U "autogen-agentchat" "autogen-ext[openai]"

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(
    model="gpt-4o-mini",  # choose a supported model
)

assistant = AssistantAgent(
    name="research_assistant",
    model_client=model_client,
    system_message="You are a concise research assistant.",
)

async def main():
    # run() takes the task as a keyword argument and returns a TaskResult
    result = await assistant.run(task="Summarize the pros and cons of multi-agent AI systems.")
    print(result.messages[-1].content)

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

At this stage, you’re not thinking about distributed workers yet; you’re validating:

your agent’s role and behavior,
the prompts and tools it will need,
basic message flows.

Once this feels right, you move down one layer to autogen-core for runtime control.

Step 2: Understand the Runtime: Standalone vs Distributed

autogen-core gives you an event-driven framework for multi-agent systems. The key choice is the runtime:

Standalone runtime (SingleThreadedAgentRuntime):
One process, simpler debugging, great for local dev, unit tests, and small workflows.
Distributed runtime (host servicer + workers + gateways):
Suitable for multi-process and multi-machine deployments—agents can run on different machines, in different languages, but share one logical system.

The design goal is that agents work the same way in both runtime types, so you can switch runtimes without changing agent implementations. That’s critical if you want to start small and scale out later.
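
To make the "same agent, different runtime" idea concrete, here is a minimal stdlib-only sketch. The `Runtime` protocol, `InProcessRuntime`, and `PooledRuntime` are hypothetical names I made up for illustration, not AutoGen classes; the thread pool only stands in for a remote worker. The point is that `triage` never changes when the runtime does.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Protocol

Handler = Callable[[str], str]


class Runtime(Protocol):
    """Hypothetical minimal runtime interface (not the AutoGen API)."""

    def register(self, agent_type: str, handler: Handler) -> None: ...

    def send(self, agent_type: str, message: str) -> str: ...


class InProcessRuntime:
    """Calls the handler directly, like a single-process runtime."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, agent_type: str, handler: Handler) -> None:
        self._handlers[agent_type] = handler

    def send(self, agent_type: str, message: str) -> str:
        return self._handlers[agent_type](message)


class PooledRuntime(InProcessRuntime):
    """Runs the handler on a pool thread, standing in for a remote worker."""

    def __init__(self) -> None:
        super().__init__()
        self._pool = ThreadPoolExecutor(max_workers=2)

    def send(self, agent_type: str, message: str) -> str:
        future = self._pool.submit(self._handlers[agent_type], message)
        return future.result()

    def close(self) -> None:
        self._pool.shutdown()


def triage(message: str) -> str:
    """Agent logic: identical no matter which runtime hosts it."""
    return "policy" if "policy" in message.lower() else "research"


def run_with(runtime: Runtime) -> str:
    """Register the agent on the given runtime and send it one message."""
    runtime.register("triage_agent", triage)
    return runtime.send("triage_agent", "Is this allowed by our policy?")
```

Calling `run_with(InProcessRuntime())` and `run_with(PooledRuntime())` goes through the same agent function; only the delivery mechanism differs.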

Use the standalone runtime when:

you’re iterating quickly on behaviors,
you want simpler logs and debugging,
a single process meets your latency and throughput needs and you don’t need multi-process isolation.

Move to the distributed runtime when:

you need to scale tools or models across machines,
you’re partitioning workloads/tenants,
you need isolation (different processes/containers) for regulated workloads.

Step 3: Bring Topics and Subscriptions into the Picture

Message routing across distributed workers (multi-process / multi-machine) means more than “call this agent by ID.”

AutoGen Core leans on topics and subscriptions:

Topic = (Topic Type, Topic Source)
Often represented as a string: Topic_Type/Topic_Source.
Subscriptions say: “deliver messages of topic type X to agents of type Y.”

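Example, following the pattern in the docs:

from autogen_core import TypeSubscription  # exported by autogen-core 0.4.x

# Topic type "default" -> agent type "triage_agent"; the topic source becomes the agent key.
TypeSubscription(topic_type="default", agent_type="triage_agent")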

Why I recommend topics/subscriptions over hard-coded agent IDs:

You can add new agents that handle a topic without changing producers.
You can run multiple copies of the same agent type on different workers.
Multi-tenant routing can key off topic source instead of special-casing IDs.

When you deploy to the distributed runtime, the host servicer uses these routing rules to deliver messages to the right worker and agent instance.
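
The routing rule described in this step can be sketched in a few lines of plain Python. This is a toy model under my own assumptions, not AutoGen's implementation: `Topic`, `Subscription`, and `resolve` are hypothetical names, and `audit_agent` in the usage note is invented for the example. It mirrors the idea that a type-based subscription maps a topic type to an agent type and reuses the topic source as the agent key.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Topic:
    """A topic is a (type, source) pair, written as 'type/source'."""

    type: str
    source: str

    @classmethod
    def parse(cls, text: str) -> "Topic":
        topic_type, _, source = text.partition("/")
        if not topic_type or not source:
            raise ValueError(f"expected 'type/source', got {text!r}")
        return cls(topic_type, source)


@dataclass(frozen=True)
class Subscription:
    """Deliver messages of `topic_type` to agents of `agent_type`."""

    topic_type: str
    agent_type: str


def resolve(subscriptions: List[Subscription], topic: Topic) -> List[Tuple[str, str]]:
    """Return the (agent_type, agent_key) recipients for a topic.

    The topic source is reused as the agent key, so every source
    (for example, every tenant) maps to its own agent instance.
    """
    return [
        (sub.agent_type, topic.source)
        for sub in subscriptions
        if sub.topic_type == topic.type
    ]
```

With subscriptions for `policy_agent` and a second, hypothetical `audit_agent` on topic type `triage_decision`, resolving `triage_decision/tenant_a` yields both agent types with key `tenant_a`. The producer never names either agent.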

Step 4: Spin Up a Distributed Runtime

Below is a minimal, conceptual outline for running a distributed runtime. Exact APIs may differ slightly by version, so always cross-check with the latest docs, but the architecture is stable:

Start the host servicer.
This process listens for workers, maintains connection state, and routes messages.
Start workers with gateways.
Each worker process runs one or more agents and connects to the host servicer via a gateway.
Register agents (and their subscriptions) on workers.
Registering an agent type on a worker advertises it to the host servicer, so the host knows where to route messages.
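
The three steps above boil down to a placement table on the host. The sketch below is a deliberately simplified, in-memory stand-in: `Host` and `Worker` are hypothetical classes of mine, there is no networking, and each agent type lives on exactly one worker, which a real deployment need not assume.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[str], str]


class Worker:
    """Hypothetical worker: owns the handlers for the agent types it hosts."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.handlers: Dict[str, Handler] = {}

    def deliver(self, agent_type: str, message: str) -> str:
        return self.handlers[agent_type](message)


class Host:
    """Hypothetical host: tracks which worker advertised which agent type."""

    def __init__(self) -> None:
        self._placement: Dict[str, Worker] = {}

    def register(self, worker: Worker, agent_type: str, handler: Handler) -> None:
        # The worker keeps the handler; the host only records the placement.
        worker.handlers[agent_type] = handler
        self._placement[agent_type] = worker

    def route(self, agent_type: str, message: str) -> Tuple[str, str]:
        """Forward a message and report which worker handled it."""
        worker = self._placement[agent_type]
        return worker.name, worker.deliver(agent_type, message)
```

A client only ever calls `host.route(...)`; moving `policy_agent` to another worker changes the placement table, not the caller.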

Example: Standalone to Distributed (Conceptual Flow)

Standalone (single process):

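import asyncio
from dataclasses import dataclass

from autogen_core import AgentId, MessageContext, RoutedAgent, SingleThreadedAgentRuntime, message_handler

@dataclass
class Message:
    content: str

class MyAgent(RoutedAgent):
    @message_handler
    async def handle(self, message: Message, ctx: MessageContext) -> Message:
        return Message(content=f"Processed: {message.content}")

async def main() -> None:
    runtime = SingleThreadedAgentRuntime()
    # Register an agent type with a factory; the runtime creates instances on demand.
    await MyAgent.register(runtime, "my_agent", lambda: MyAgent("Example agent"))
    runtime.start()
    # Send a message directly to the instance with key "default" and await its reply.
    reply = await runtime.send_message(Message("Process this request."), AgentId("my_agent", "default"))
    print(reply.content)
    await runtime.stop_when_idle()

asyncio.run(main())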

Distributed:

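# host_servicer.py
import asyncio
from autogen_ext.runtimes.grpc import GrpcWorkerAgentRuntimeHost  # pip install "autogen-ext[grpc]"

async def main() -> None:
    host = GrpcWorkerAgentRuntimeHost(address="0.0.0.0:9000")
    host.start()  # serve in the background
    await host.stop_when_signal()  # keep running until interrupted

asyncio.run(main())

# worker.py
import asyncio
from autogen_ext.runtimes.grpc import GrpcWorkerAgentRuntime
from my_agents import MyAgent  # hypothetical module holding your RoutedAgent subclass

async def main() -> None:
    worker = GrpcWorkerAgentRuntime(host_address="hostservicer:9000")
    await worker.start()
    # Register local agent types on this worker so the host learns where they live.
    await MyAgent.register(worker, "my_agent", lambda: MyAgent("Example agent"))
    await worker.stop_when_signal()

asyncio.run(main())

Now, any client talking to the host servicer can send a message that will be routed to my_agent, regardless of which worker it lives on.

Note: GrpcWorkerAgentRuntimeHost and GrpcWorkerAgentRuntime are the gRPC-based classes shipped in autogen-ext for 0.4.x, and they implement the documented architecture (host servicer + workers + gateways). Signatures have shifted between releases (for example, whether the worker’s start() must be awaited), so always check the exact class names and signatures in your installed version and the docs, particularly if you’re migrating from 0.2.x to 0.4.x because some APIs have changed.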

Step 5: A Minimal End-to-End Flow with TaskResult

A good sanity check before and after you distribute your agents is to drive a simple task through the system and inspect the TaskResult.

In AgentChat, TaskResult shows you:

the final messages,
stop_reason, a short description of why the run ended (for example, a termination condition was met or the turn limit was reached).

Even in a distributed setup, you should keep this habit of interrogating results explicitly.

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")

assistant = AssistantAgent(
    name="calc_agent",
    model_client=model_client,
    system_message="You are a precise calculator. Answer with only the final number.",
)

# A one-agent team; max_turns caps the run, so it always ends with a stop_reason.
team = RoundRobinGroupChat([assistant], max_turns=1)

async def main():
    result = await team.run(task="What is 17 * 23?")  # returns a TaskResult

    print("Stop reason:", result.stop_reason)
    print("Final answer:", result.messages[-1].content)

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

As you move to distributed runtimes, this style of structured results lets you reason about failure modes across processes and machines rather than just eyeballing logs.
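
One way to put that habit to work is to collect results per worker and count stop reasons instead of reading logs. The sketch below assumes a hypothetical `Outcome` record that you would fill in yourself from each run's stop_reason and the worker that ran it; neither `Outcome` nor `summarize` is part of AutoGen.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Outcome:
    """Hypothetical record: which worker ran a task and why it stopped."""

    worker: str
    stop_reason: Optional[str]


def summarize(outcomes: List[Outcome]) -> Dict[str, Counter]:
    """Count stop reasons per worker; a missing reason is counted as 'unknown'."""
    summary: Dict[str, Counter] = {}
    for outcome in outcomes:
        reasons = summary.setdefault(outcome.worker, Counter())
        reasons[outcome.stop_reason or "unknown"] += 1
    return summary
```

A worker whose counts are dominated by turn-limit stops or by "unknown" is a much better starting point for debugging than a wall of interleaved logs.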

Common Mistakes to Avoid

Hard-coding agent IDs for routing:
This ties your topology to your code and makes distributed deployment painful.
How to avoid it: Use topic-based routing and TypeSubscription-style patterns so agents can be added or moved between workers without changing producers.
Skipping a standalone runtime phase:
Going straight to the distributed runtime before you understand your agents’ behaviors will slow you down.
How to avoid it: Start with SingleThreadedAgentRuntime or AgentChat, write minimal end-to-end flows, then “lift and shift” to the distributed runtime once the behaviors are stable.

Real-World Example

In our regulated environment, we started with a single-process AgentChat prototype for a “document triage” workflow:

A triage_agent read user requests.
It delegated to either a research_agent or a policy_agent.
Everything ran in one container with shared tools.

Once that workflow proved useful, we needed to:

put research_agent near GPU-heavy models on a different machine,
keep policy_agent in a locked-down environment with restricted tools,
route tenant-specific messages without leaking context.

We moved the workflow to AutoGen Core’s distributed runtime:

The host servicer ran in our control plane.
A GPU worker hosted research_agent with model-heavy tools.
A regulated worker hosted policy_agent with a strict tool set.

Instead of hard-coded calls like “triage_agent → policy_agent,” we:

defined topics for triage_decision/default,
subscribed policy_agent and research_agent to the appropriate topics,
used tenant-specific topic sources to isolate data.

The result was a single logical “triage system” distributed across machines, with clear routing rules and the ability to add new agents (e.g., billing_agent) by only updating subscriptions, not producer code.
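
The tenant isolation part is easiest to see in miniature. This is a stdlib-only sketch under my own assumptions: `PolicyAgent` and `TenantRouter` are hypothetical stand-ins, and the only thing they model is that each topic source gets its own lazily created agent instance with its own history.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PolicyAgent:
    """Hypothetical agent instance that keeps history for one tenant only."""

    tenant: str
    history: List[str] = field(default_factory=list)

    def handle(self, message: str) -> int:
        self.history.append(message)
        return len(self.history)


class TenantRouter:
    """Creates one agent instance per topic source on first use."""

    def __init__(self) -> None:
        self._instances: Dict[str, PolicyAgent] = {}

    def publish(self, topic: str, message: str) -> int:
        topic_type, _, source = topic.partition("/")
        if topic_type != "triage_decision" or not source:
            raise ValueError(f"unroutable topic: {topic!r}")
        if source not in self._instances:
            self._instances[source] = PolicyAgent(tenant=source)
        return self._instances[source].handle(message)

    def history(self, source: str) -> List[str]:
        """Return a copy of the messages seen by one tenant's instance."""
        return list(self._instances[source].history)
```

Publishing to `triage_decision/tenant_a` and `triage_decision/tenant_b` produces two separate instances, so one tenant's documents never appear in the other's history.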

Pro Tip: Treat topics like public APIs: once you define a topic type and source scheme, avoid breaking changes. It lets you evolve your agents and workers independently while keeping routing contracts stable.

Summary

If you’re serious about an agent framework with distributed workers (multi-process / multi-machine) and message routing, focus on the runtime before you obsess over prompts or models. AutoGen gives you a layered path:

AgentChat to prototype multi-agent behaviors in a single process.
autogen-core with a standalone runtime for early event-driven workflows.
The distributed runtime (host servicer + workers + gateways) when you need multi-process / multi-machine deployments.

By leaning on topics, subscriptions, and structured results like TaskResult(stop_reason=...), you get a system you can scale and debug—not just a clever demo.
