How do I migrate from Amazon MSK to Redpanda with minimal downtime and no client changes?

Most teams don’t migrate off Amazon MSK because it’s easy. They do it because the operational drag is killing them—but they still can’t afford downtime or a risky client rewrite. The good news: because Redpanda is fully Kafka API compatible, you can move from MSK to Redpanda with near-zero downtime and no client code changes if you structure the migration correctly.

This guide walks through a practical, production-grade path: how to plan, run, and cut over a migration from Amazon MSK to Redpanda so your producers and consumers barely notice the switch.


Quick Answer: Use Redpanda’s Kafka compatibility and topic-level mirroring (with tools like MirrorMaker 2 or Kafka Connect) to run MSK and Redpanda side-by-side, keep data in sync, then switch DNS/bootstrap servers once you’ve verified lag, offsets, and consumer behavior. No protocol-level client changes required.

The Quick Overview

  • What It Is: A step-by-step, low-risk migration pattern to move Kafka workloads from Amazon MSK to Redpanda using Kafka-compatible tooling, replication, and DNS-based cutover.
  • Who It Is For: Platform, data, and infra teams running MSK who want Kafka semantics with lower TCO, simpler operations, and better latency—without touching client code.
  • Core Problem Solved: Escaping MSK’s cost/complexity and dependency sprawl while preserving topic data, offsets, SLAs, and client behavior.

How It Works

At a high level, you treat Redpanda as a drop-in Kafka broker layer, run it in “shadow mode” next to MSK, mirror data between the two, validate, then flip clients over via configuration (ideally DNS). Because Redpanda speaks the Kafka protocol, your clients don’t know or care that the broker under them has changed.

The workflow looks like this:

  1. Plan & Prepare:
    Size and deploy Redpanda, map your MSK topics and security model, and define a migration window and rollback plan.

  2. Mirror & Verify:
    Set up one-way replication from MSK → Redpanda, keep topics and ACLs aligned, and validate that data, throughput, and consumer offsets behave as expected.

  3. Cut Over & Decommission:
    Move producers, then consumers to Redpanda with DNS/bootstrap changes, run both systems in parallel for a short period, then turn off MSK once you’re confident.

Let’s break this down piece by piece.


Phase 1: Plan & Prepare

1. Inventory your MSK environment

Start by making the invisible visible:

  • Clusters:

    • Number of MSK clusters and regions
    • Broker counts and instance types
    • Storage type and retention configurations
  • Topics & traffic:

    • Topic list, partitions, replication factor
    • Throughput (MB/s or GB/min), peak vs steady-state
    • Consumer groups and lag profiles
  • Security:

    • Authentication: IAM access control, mutual TLS, SASL/SCRAM
    • Authorization: ACLs per topic/consumer group
    • Network: VPCs, subnets, security groups

This inventory becomes your migration checklist.

2. Right-size your Redpanda cluster

Redpanda’s C++ engine and single-binary architecture are measurably more efficient than a typical Java-based Kafka stack, so you can often run with fewer nodes:

  • Use your peak throughput from MSK as the target baseline.
  • Map that to Redpanda sizing (e.g., if MSK needs N brokers, you may get equivalent or better performance with significantly fewer Redpanda nodes).
  • Plan for:
    • Enough local storage (or tiered storage if you keep large histories)
    • Zonal redundancy
    • Future growth (10–30% headroom)
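
The storage part of that plan can be estimated with simple arithmetic. The sketch below is a back-of-envelope calculation, not an official Redpanda sizing formula: it assumes a steady average write rate, time-based retention, and no tiered storage, and the function name and default values are illustrative.

```python
def required_storage_gb(avg_write_mb_s: float, retention_hours: float,
                        replication_factor: int = 3, headroom: float = 0.3) -> float:
    """Rough cluster-wide disk estimate in GB (1 GB = 1000 MB)."""
    # Data written during the retention window, before replication.
    retained_mb = avg_write_mb_s * 3600 * retention_hours
    # Every record is stored once per replica.
    replicated_mb = retained_mb * replication_factor
    # Add growth headroom on top (0.3 = 30%).
    return replicated_mb * (1 + headroom) / 1000


# Example: 50 MB/s average ingest, 24 h retention, RF=3, 30% headroom.
total_gb = required_storage_gb(50, 24)
print(f"{total_gb:,.0f} GB across the cluster")
```

Divide the result by your planned node count to get a per-node disk target, and size compute and network separately from your peak throughput.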

Decide your deployment model:

  • Self-hosted in your VPC / Kubernetes (common MSK replacement path)
  • Redpanda BYOC or managed if you want SaaS convenience but keep data in your account
  • Air-gapped or private environments if you have strict compliance requirements

3. Network and DNS alignment

To keep client changes minimal (ideally zero code changes):

  • Deploy Redpanda into the same or peered VPC as MSK, so security groups and routes are predictable.
  • Plan a DNS-based cutover:
    • Use a CNAME like kafka-bootstrap.your-domain.internal currently pointing at MSK.
    • When it’s time to switch, point this record to your Redpanda brokers/load balancer.
  • Keep the same ports and TLS mode (if possible) to avoid client configuration churn.
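
Before and after the DNS change, it helps to confirm what the bootstrap name actually resolves to from inside the VPC. A minimal sketch using only the standard library follows; the hostname, port, and IP addresses are placeholders, and the `resolver` parameter exists only so the check can be exercised without real DNS.

```python
import socket


def resolved_ips(hostname, port=9092, resolver=socket.getaddrinfo):
    """Return the set of IP addresses the bootstrap name currently resolves to."""
    infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def points_at(hostname, expected_ips, port=9092, resolver=socket.getaddrinfo):
    """True when every resolved address belongs to the expected cluster."""
    ips = resolved_ips(hostname, port, resolver)
    return bool(ips) and ips <= set(expected_ips)


# Usage (placeholder addresses for the Redpanda brokers or load balancer):
# points_at("kafka-bootstrap.your-domain.internal", {"10.0.1.5", "10.0.1.6"})
```

Run it from the same subnets your clients use, since split-horizon DNS can return different answers elsewhere.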

4. Align authentication and authorization

If you use:

  • Mutual TLS (mTLS): Issue broker and client certificates for Redpanda from a CA your clients already trust, and configure Redpanda’s Kafka listeners for TLS with client authentication.
  • SASL (SCRAM/OIDC/Kerberos): Mirror credentials or token mechanisms into Redpanda’s auth configuration.
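  • IAM: MSK IAM access control relies on an AWS-specific SASL mechanism (AWS_MSK_IAM), so plan to move those clients to SCRAM, OIDC, or mTLS on Redpanda. This is a configuration change (new SASL properties, and the MSK IAM auth library is no longer needed), not an application code change.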

The goal: once apps switch bootstrap servers, their credentials still work and topic-level access is intact.
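
To show how small the IAM-to-SCRAM change is on the client side, here is a sketch that rewrites a Java-client properties map. The property names are the standard Kafka client ones; the function name, the username, and the password are placeholders, and in practice credentials would come from your secret store rather than a literal.

```python
# Property that only makes sense with the MSK IAM auth library.
IAM_ONLY_KEYS = {"sasl.client.callback.handler.class"}


def iam_to_scram(props: dict, username: str, password: str,
                 mechanism: str = "SCRAM-SHA-256") -> dict:
    """Return a copy of client properties with MSK IAM auth swapped for SASL/SCRAM."""
    if props.get("sasl.mechanism") != "AWS_MSK_IAM":
        return dict(props)  # not an IAM client, nothing to rewrite
    out = {k: v for k, v in props.items() if k not in IAM_ONLY_KEYS}
    out["security.protocol"] = "SASL_SSL"
    out["sasl.mechanism"] = mechanism
    out["sasl.jaas.config"] = (
        "org.apache.kafka.common.security.scram.ScramLoginModule required "
        f'username="{username}" password="{password}";'
    )
    return out
```

Everything else in the map, including bootstrap.servers and serializer settings, passes through untouched, which is the point: the application code never sees the difference.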


Phase 2: Mirror & Verify

Now you run MSK and Redpanda in parallel.

1. Create matching topics in Redpanda

For each topic in MSK:

  • Create the same topic name in Redpanda.
  • Match:
    • Partitions
    • Replication factor
    • Compaction vs delete policies
    • Retention (time and/or size)
    • Key/record formats (Avro, Protobuf, JSON—your serializers don’t change)

You can script this via:

  • kafka-topics.sh against MSK to dump configs.
  • A script or tooling to create equivalent topics via rpk (Redpanda CLI) or Kafka Admin API.
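
One way to script this is to parse the output of kafka-topics.sh --describe and emit matching rpk topic create commands. The sketch below assumes the tab-separated summary line format printed by recent Kafka releases and carries over only an illustrative allow-list of topic configs; review that list against what your Redpanda version supports before running the generated commands.

```python
import re
import shlex

# Matches the per-topic summary line; per-partition lines lack PartitionCount.
HEADER = re.compile(
    r"Topic:\s*(?P<name>\S+).*?PartitionCount:\s*(?P<parts>\d+)"
    r"\s+ReplicationFactor:\s*(?P<rf>\d+)(?:\s+Configs:\s*(?P<cfg>\S*))?"
)
# Assumption: configs worth copying as-is. Adjust for your environment.
CARRY_OVER = {"cleanup.policy", "retention.ms", "retention.bytes",
              "compression.type", "max.message.bytes", "segment.bytes"}


def parse_configs(cfg: str) -> dict:
    """Split 'k=v,k=v' while keeping comma-valued settings together."""
    configs, last = {}, None
    for token in filter(None, cfg.split(",")):
        if "=" in token:
            last, _, value = token.partition("=")
            configs[last] = value
        elif last is not None:
            configs[last] += "," + token  # e.g. cleanup.policy=compact,delete
    return configs


def rpk_commands(describe_output: str) -> list[str]:
    """Turn kafka-topics.sh --describe output into rpk topic create commands."""
    commands = []
    for line in describe_output.splitlines():
        m = HEADER.search(line)
        if not m:
            continue
        args = ["rpk", "topic", "create", m["name"], "-p", m["parts"], "-r", m["rf"]]
        for key, value in parse_configs(m["cfg"] or "").items():
            if key in CARRY_OVER:
                args += ["-c", f"{key}={value}"]
        commands.append(shlex.join(args))
    return commands
```

Print the commands and review them first; configs that are dropped by the allow-list deserve a deliberate decision rather than a silent default.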

2. Set up replication: MSK → Redpanda

Use Kafka-native replication tooling. Common options:

  • MirrorMaker 2 (MM2)
  • Kafka Connect with MirrorSourceConnector
  • Strimzi/Kafka Connect operators if you already have them

Core idea: MSK is the source of truth, Redpanda is the replica until you cut over.

Config highlights:

  • source.bootstrap.servers: MSK brokers
  • target.bootstrap.servers: Redpanda brokers
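  • Topic selection: use the topics setting (a comma-separated list of names or regexes) to control what you mirror
  • Preserve topic names one-to-one: by default MM2 prefixes mirrored topics with the source cluster alias (for example, source.orders), so set replication.policy.class to IdentityReplicationPolicy (available in Kafka 3.0 and later) to keep names unchanged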

Make sure:

  • TLS/SASL settings match your clusters.
  • Replication factor and partitions are respected.
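
As an illustration of those settings together, this sketch renders a one-way mm2.properties file for MirrorMaker 2 in dedicated mode. The property keys are standard MM2 settings; the broker addresses and topic regex are placeholders, and the TLS/SASL properties for each cluster (prefixed with source. or target.) are deliberately left out because they depend on your environment.

```python
def mm2_properties(msk_bootstrap: str, redpanda_bootstrap: str,
                   topics_regex: str = ".*", replication_factor: int = 3) -> str:
    """Render a minimal one-way MSK -> Redpanda MirrorMaker 2 config."""
    settings = {
        "clusters": "source, target",
        "source.bootstrap.servers": msk_bootstrap,
        "target.bootstrap.servers": redpanda_bootstrap,
        # One direction only: MSK stays the source of truth.
        "source->target.enabled": "true",
        "target->source.enabled": "false",
        "source->target.topics": topics_regex,
        # Keep topic names identical instead of prefixing with "source.".
        "replication.policy.class":
            "org.apache.kafka.connect.mirror.IdentityReplicationPolicy",
        "replication.factor": str(replication_factor),
        "sync.topic.configs.enabled": "true",
        # Checkpoints and group offset sync support the consumer cutover later.
        "emit.checkpoints.enabled": "true",
        "sync.group.offsets.enabled": "true",
    }
    return "\n".join(f"{k} = {v}" for k, v in settings.items()) + "\n"


if __name__ == "__main__":
    print(mm2_properties("b-1.msk.example:9096", "redpanda-0.example:9092",
                         topics_regex="orders.*"))
```

Write the output to a file and pass it to connect-mirror-maker.sh from the Kafka distribution. Treat the result as a starting point and test it in staging before pointing it at production topics.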

3. Validate data and performance

While MSK is still primary:

  • Produce test traffic into MSK (if you don’t already have real load).
  • Use consumers against Redpanda to:
    • Verify identical message counts
    • Validate ordering per partition
    • Confirm schema compatibility (no serialization issues)
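
The count and ordering checks can be combined into one order-sensitive digest per partition. The sketch below works on plain (partition, key, value) tuples with bytes payloads so it stays client-agnostic; feeding it from real consumers on each cluster is left to you, and it assumes both sides are read over the same window of records (for example, from the beginning of a test topic).

```python
import hashlib
from collections import defaultdict


def partition_digests(records):
    """Fold (partition, key, value) tuples into a (count, digest) pair per partition."""
    hashers = defaultdict(hashlib.sha256)
    counts = defaultdict(int)
    for partition, key, value in records:
        h = hashers[partition]
        for field in (key, value):
            data = b"" if field is None else field
            # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
            h.update(len(data).to_bytes(4, "big") + data)
        counts[partition] += 1
    return {p: (counts[p], hashers[p].hexdigest()) for p in hashers}


def mismatched_partitions(source_records, target_records):
    """Partitions whose record count, content, or order differs between clusters."""
    src = partition_digests(source_records)
    dst = partition_digests(target_records)
    return sorted(p for p in src.keys() | dst.keys() if src.get(p) != dst.get(p))
```

An empty result means every partition holds the same records in the same order on both sides; any partition listed is worth inspecting before you go further.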

Monitor:

  • Lag in MM2/Connect (source vs target)
  • Redpanda’s CPU, disk, and network during peak flows
  • End-to-end latency for a sampled message path
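
A simple go/no-go gate on replication lag might look like the sketch below. It assumes you can obtain, per topic-partition, the MSK log end offset and the position the mirror's consumer has reached on MSK (for example, from your monitoring stack); the dictionary shapes and the max_lag threshold are illustrative choices.

```python
def mirror_lag(source_end_offsets: dict, mirrored_positions: dict) -> dict:
    """Per-partition lag: records on MSK that the mirror has not yet copied."""
    return {tp: max(end - mirrored_positions.get(tp, 0), 0)
            for tp, end in source_end_offsets.items()}


def ready_for_cutover(source_end_offsets: dict, mirrored_positions: dict,
                      max_lag: int = 100):
    """Return (ok, lag_by_partition); ok is True when the worst lag is within budget."""
    lag = mirror_lag(source_end_offsets, mirrored_positions)
    worst = max(lag.values(), default=0)
    return worst <= max_lag, lag


# Keys are (topic, partition) tuples.
ends = {("orders", 0): 1000, ("orders", 1): 500}
positions = {("orders", 0): 990, ("orders", 1): 500}
ok, lag = ready_for_cutover(ends, positions, max_lag=50)
```

A partition missing from the mirrored positions is treated as fully lagging, which errs on the side of blocking the cutover.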

If you’re planning a stateful consumer cutover:

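  • Remember that a record’s offset on Redpanda is not guaranteed to equal its offset on MSK (retention, compaction, and transaction markers can all shift the numbering), so committed offsets must be translated rather than copied verbatim. Test that each consumer group resumes at the right position with its translated offsets.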

Phase 3: Cut Over & Decommission

1. Choose a cutover strategy

You have two main patterns:

  1. Big-bang cutover
    All producers and consumers switch bootstrap servers within a defined window.
    Pros: Simple reasoning, clear rollback.
    Cons: Requires tight coordination.

  2. Phased cutover
    Move producers first, then consumers (or vice versa), per service or domain.
    Pros: Lower blast radius, easy to test.
    Cons: Requires careful deduplication and ordering logic if both clusters receive writes during the transition.

For “minimal downtime and no client changes,” a phased cutover with DNS is usually safest.

2. Move producers from MSK to Redpanda

Two options:

  • Option A: DNS switch (best case)
    If producers use kafka-bootstrap.your-domain.internal:

    1. Lower TTL ahead of time (e.g., 30–60 seconds).
    2. Switch DNS target from MSK bootstrap to Redpanda.
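    3. Restart producers (a rolling restart is enough). Running clients stay connected to the MSK brokers they discovered through cluster metadata, so the DNS change only takes effect when a client bootstraps again.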
  • Option B: Config-only change
    If DNS wasn’t abstracted:

    1. Update bootstrap.servers in producer configs to your Redpanda brokers/load balancer.
    2. Redeploy apps with no code change, just configuration.

During this stage:

  • Keep MSK → Redpanda replication running for topics still being written to MSK.
  • For topics where producers now write directly to Redpanda, you can:
    • Stop mirroring, or
    • Switch direction if any consumers still depend on MSK (short-lived bridging).

3. Move consumers with offset alignment

Consumers can move once producers are stable on Redpanda.

Key question: Do you need exact offset continuity?

  • If your consumers are idempotent or can tolerate reprocessing, you can:

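    • Start them with auto.offset.reset set to earliest or latest on Redpanda. This setting only applies when the group has no committed offsets on the target cluster.
    • Accept some replays (earliest) or a small gap (latest).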
  • If you need strict continuity:

    • Replicate committed offsets from MSK to Redpanda. Possible approaches:
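      • Mirror consumer group offsets via MM2’s offset translation: MirrorCheckpointConnector emits checkpoints, and with sync.group.offsets.enabled=true (Kafka 2.7 and later) it writes translated offsets to the target cluster for groups that are not active there.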
      • Export offsets from MSK (e.g., using Kafka admin tools) and import into Redpanda using admin APIs.
    • Once offsets are in Redpanda, start consumers pointing to Redpanda with the same group IDs.
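
The idea behind offset translation can be sketched in a few lines. This is a simplified model, not MM2's actual implementation: it assumes you have a sorted list of sync points pairing an MSK offset with the Redpanda offset of the same record, and it deliberately picks the nearest sync point at or before the committed offset so that a consumer may replay a few records but never skips any.

```python
from bisect import bisect_right
from typing import Optional


def translate_offset(committed_upstream: int,
                     syncs: list[tuple[int, int]]) -> Optional[int]:
    """Map a committed MSK offset to a safe starting offset on Redpanda.

    `syncs` holds (upstream_offset, downstream_offset) pairs sorted by
    upstream offset. Returns None when no sync point is old enough, in
    which case the caller must fall back to its reset policy.
    """
    upstream = [u for u, _ in syncs]
    i = bisect_right(upstream, committed_upstream)
    if i == 0:
        return None
    return syncs[i - 1][1]


# Hypothetical sync points for one partition.
syncs = [(0, 0), (100, 90), (200, 185)]
start = translate_offset(150, syncs)  # resumes from the sync point at MSK offset 100
```

The gap between the sync point and the true committed position is the "controlled reprocessing" you trade for safety; denser sync points shrink it.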

Execute:

  • Update consumer bootstrap servers (via DNS or config).
  • Monitor lag and processing health on Redpanda.
  • Verify that no unexpected replays or gaps occur, especially for stateful services.

4. Run in dual mode briefly

Once both producers and consumers are on Redpanda:

  • Keep MSK alive but quiet for a short “observation” period:
    • No new writes (unless you’re explicitly bridging).
    • Stop MSK → Redpanda replication once nothing writes to MSK; keep any temporary Redpanda → MSK bridge running only as long as you need it for rollback.

Use this window to:

  • Stress test Redpanda under real production load.
  • Confirm metrics, alerting, and dashboards are accurate.
  • Validate that downstream systems (sinks, warehouses, search, etc.) are consuming Redpanda topics correctly.

5. Decommission MSK

When you’re confident:

  1. Shut down replication jobs.
  2. Stop any remaining consumers/producers pointing to MSK.
  3. Remove MSK from DNS, configs, and IaC templates.
  4. Decommission the MSK clusters to stop paying for them.

Keep backups/snapshots and IaC history in case you ever need a forensic reconstruction of your pre-migration environment.


Features & Benefits Breakdown

Why go through all this to land on Redpanda instead of just reshuffling Kafka brokers inside AWS?

| Core Feature | What It Does | Primary Benefit |
| --- | --- | --- |
| Kafka API compatibility | Speaks the same Kafka protocol your clients are already using. | Migrate from MSK with no client code changes, just bootstrap/config updates. |
| Single-binary, zero-dependency architecture | Collapses multiple Kafka ecosystem components into one C++-based broker binary. | Simplifies day-two ops, with fewer moving parts and lower TCO vs MSK. |
| Performance-engineered streaming core | Maximizes hardware utilization and can reduce latency by up to 10x vs Kafka. | Handles higher throughput with fewer nodes, letting you cut costs and complexity. |

You also get enterprise-grade capabilities like tiered storage, SSO, audit logging, and a managed/BYOC option if you don’t want to operate clusters yourself.


Ideal Use Cases

  • Best for teams hitting MSK cost or complexity ceilings:
    Because Redpanda delivers Kafka-compatible streaming with up to 6x TCO savings through reduced compute footprint, storage costs, and operational overhead.

  • Best for teams planning AI/agent workloads on top of streaming data:
    Because Redpanda is evolving into an Agentic Data Plane—the plane agents run on—with a governed access layer, unified SQL across streams + history, and enterprise-grade control surfaces for production-safe AI.


Limitations & Considerations

  • Cross-region or multi-cluster complexity:
    If you run several MSK clusters across regions/accounts, plan migration per cluster. Use clear boundaries (per domain or region) rather than a single global “flip” to avoid tangled replication paths.

  • Offset translation edge cases:
    Some consumer patterns (stateful stream processors, custom offset storage) may need extra care. Always test offset migration in a staging environment and be prepared to tolerate controlled reprocessing in worst-case scenarios.


Pricing & Plans

Redpanda offers multiple deployment models and editions so you can choose the level of control vs convenience that fits your team:

  • Community Edition (self-managed):
    Best for developers and teams prototyping or running smaller workloads who want to trial Redpanda locally or in their own infra at no cost.

  • Enterprise / Managed (including BYOC):
    Best for production teams with strict SLOs, governance, and compliance needs, requiring features like SSO, audit logging, tiered storage, and 24x7 support—while still keeping data in their own VPC or even air-gapped.

For detailed pricing and plan comparisons, you’ll want to talk directly with Redpanda so costs can be mapped to your current MSK footprint and growth projections.


Frequently Asked Questions

Do I need to change my Kafka client libraries to migrate from MSK to Redpanda?

Short Answer: No. Redpanda is fully Kafka API compatible, so you don’t need to change client libraries—just bootstrap servers and possibly security configs.

Details:
Redpanda implements the Kafka wire protocol. Your existing producers and consumers (Java, Go, Python, Node, .NET, etc.) connect using the same Kafka clients they use for MSK. In most migrations, the only changes are:

  • bootstrap.servers (or equivalent) pointing to Redpanda instead of MSK.
  • TLS/SASL configuration alignment if your auth model changes (e.g., IAM → SASL/OIDC).

This is what makes a “no client changes” migration realistic: you’re swapping the broker plane, not the client stack.


How do I minimize or eliminate downtime during the MSK → Redpanda cutover?

Short Answer: Run MSK and Redpanda side-by-side with replication, then switch clients using DNS or config during a controlled window while monitoring lag and health.

Details:
The core playbook for minimal downtime is:

  1. Deploy Redpanda and mirror topics from MSK.
  2. Validate that every topic needed in production is present and in sync.
  3. Lower DNS TTLs or prepare a config rollout.
  4. Switch producers first (MSK → Redpanda).
  5. Align offsets and switch consumers.
  6. Keep MSK idle but available while you observe behavior.
  7. Decommission MSK once you’re confident.

If something goes sideways, your rollback is simply:

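  • Point clients back to MSK (DNS or config, followed by a restart).
  • Resume treating MSK as the source of truth. Records produced to Redpanda after the producer switch exist on MSK only if you mirrored them back through a short-lived reverse bridge, so decide before cutover whether your rollback plan needs that path.
  • Use the audit and metric trails to debug.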

Summary

You can absolutely migrate from Amazon MSK to Redpanda with minimal downtime and no client code changes. The key is to treat Redpanda as a Kafka-compatible drop-in, run it alongside MSK long enough to mirror topics and test behavior, then flip the switch via DNS or configuration once you’re confident.

You get:

  • The same Kafka semantics and clients you already rely on.
  • A simpler, faster, single-binary streaming platform.
  • Lower operational overhead and TCO, with room to grow into an agent-first data plane when you’re ready.

If MSK has become the bottleneck—operationally, financially, or architecturally—you don’t have to live with it. You just need a disciplined migration path.

