Schema Registry tools for Kafka: what should we use for Avro/Protobuf/JSON Schema and compatibility rules?

Most Kafka teams reach for a Schema Registry when they realize producers and consumers are tightly coupled, deployments are fragile, and evolving message formats breaks downstream apps. Choosing the right tools—and using them correctly for Avro, Protobuf, and JSON Schema with compatibility rules—is what turns “just Kafka” into a reliable, evolvable event streaming platform.

This guide walks through the main Schema Registry options, how to work with each supported format, and practical recommendations for compatibility modes and tooling in production.


Why you need a Schema Registry for Kafka

Without a Schema Registry, schemas tend to be:

  • Embedded in code or config
  • Passed around manually between teams
  • Validated only at runtime (if at all)

That leads to:

  • Breaking changes between producer and consumer versions
  • Hard-to-debug deserialization errors in production
  • Slow schema evolution (every team has to synchronize changes)
  • Weak governance and data quality issues

A Schema Registry centralizes:

  • Schema storage for Avro, Protobuf, and JSON Schema
  • Versioning and evolution rules (compatibility modes)
  • Producer/consumer integration via serializers/deserializers
  • Governance (who can change what, and how)

With the right tool, you get:

  • Decoupled teams and services
  • Backward-compatible schema evolution
  • Easier debugging and auditing of data formats
  • Foundation for reliable stream processing and analytics

Schema Registry options for Kafka

There are three main approaches:

  1. Confluent Schema Registry (self-managed or Confluent Cloud)
  2. Open-source alternatives / forks that mimic Confluent’s API
  3. DIY or “no registry” approaches (generally not recommended for serious Kafka workloads)

1. Confluent Schema Registry (Cloud & self-managed)

Confluent supports industry-standard data formats like Avro, Protobuf, and JSON Schema through Schema Registry, which is part of Stream Quality in Stream Governance (Confluent’s fully managed governance suite).

Key capabilities:

  • Native support for Avro, Protobuf, and JSON Schema
  • Central store of all schemas with version history
  • Compatibility rules enforced on registration
  • Integration with Confluent Platform/Cloud clients and connectors
  • Tight integration with Kafka Streams, ksqlDB, and 120+ connectors
  • Managed version (Confluent Cloud) with no infrastructure overhead

This is the de facto standard in Kafka ecosystems and is battle-tested in production. If you’re already using Confluent Platform or Confluent Cloud, this should be your default choice.

When to use it:

  • You need all three formats (Avro, Protobuf, JSON Schema)
  • You want compatibility enforcement and governance out of the box
  • You’re using Confluent Cloud or Platform, or their connectors
  • You need strong separation between producers and consumers with safe evolution

2. Open-source / compatible registries

Several open-source projects implement an API compatible with Confluent Schema Registry, enabling:

  • Basic schema storage and versioning
  • Support for at least Avro (some add Protobuf/JSON Schema)
  • Use of Confluent’s serializers/deserializers

They can be attractive if:

  • You are constrained from using commercial products
  • You have operational maturity to run and secure another critical service
  • Your schema needs are simpler and governance is lightweight

However, you’ll usually lose:

  • Tight integration with Confluent Cloud governance features
  • Enterprise-grade support
  • Some advanced management, UIs, or security integrations

3. DIY or “no registry” approaches

Some teams try to:

  • Store schemas in Git, a database, or S3, and
  • Distribute them through configuration, environment variables, or internal tools

This seems simple but quickly becomes painful:

  • No automatic compatibility checking
  • No standardized integration with serializers
  • Harder to audit and govern
  • Lots of custom glue code

For any moderate-to-large Kafka deployment, this is not recommended. Use a proper Registry, ideally one that supports all formats and compatibility modes you need.


Supported data formats: Avro vs Protobuf vs JSON Schema

Confluent Schema Registry supports three main formats:

  • Avro
  • Protobuf
  • JSON Schema

Choosing the right one depends on your ecosystem and requirements.

Avro

Best for:

  • Apache Kafka ecosystems
  • High-performance binary format
  • Long-lived event streams where schema evolution is critical

Pros:

  • Compact binary encoding (smaller messages)
  • Mature ecosystem in Kafka world
  • Strong schema evolution story (widely adopted patterns)
  • Great fit for “event-as-fact” models

Cons:

  • Not human-readable on the wire
  • Less common outside Kafka/Hadoop ecosystems compared to JSON

Use Avro when:

  • Kafka is your primary transport
  • You want compact, efficient messages
  • You’re okay with binary encoding
  • You want a well-understood evolution model and examples

Protobuf

Best for:

  • Polyglot microservices (gRPC + Kafka)
  • Teams already invested in Protobuf

Pros:

  • Strong typing and well-known in gRPC ecosystems
  • Language-friendly code generation
  • Compact binary format
  • Works well across multiple transports (HTTP/gRPC/Kafka)

Cons:

  • Some evolution patterns are more rigid (e.g., field numbering)
  • Slightly more complex for data analytics teams unfamiliar with it

Use Protobuf when:

  • Your microservices already use Protobuf/gRPC
  • You want to reuse the same schema across Kafka and other services
  • You value generated code and strong typing in statically typed languages

JSON Schema

Best for:

  • Data formats where JSON readability is important
  • Frontend/back-end integration
  • External APIs that are already JSON-based

Pros:

  • Human-readable and easy to inspect
  • Natural fit for HTTP/REST and front-end tooling
  • Widely understood by non-engineering stakeholders

Cons:

  • Larger message sizes than Avro/Protobuf
  • Some schema evolution patterns are less constrained
  • Can be slower for high-throughput workloads

Use JSON Schema when:

  • You need human-readable messages (debugging, external integrations)
  • JSON is already standard in your APIs
  • You want to validate JSON payloads consistently

How Schema Registry works with Kafka

At a high level:

  1. Producer:

    • On first use of a schema, the producer registers it with Schema Registry.
    • The registry stores the schema under a subject and returns a schema ID.
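    • The producer sends records to Kafka with:
      • a 5-byte prefix on the serialized key or value (in Confluent's default wire format: a magic byte followed by the 4-byte schema ID)
      • the serialized payload (Avro/Protobuf/JSON Schema)
  2. Consumer:

    • Reads the schema ID from the prefix of the serialized key or value.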
    • Fetches the schema (if not cached) from Schema Registry.
    • Deserializes the message into an object based on that schema.
  3. Schema Evolution:

    • New versions of schemas are registered with the same subject.
    • Registry checks the new version against compatibility rules.
    • If compatible, the schema is stored; otherwise, registration fails.

This “schema ID plus payload” approach lets consumers stay compatible with multiple versions of a schema, as long as evolution rules are respected.
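
To make the framing concrete, here is a minimal sketch of how a schema ID can be attached to and read back from a message. It assumes Confluent's default wire format (one magic byte set to 0, then the schema ID as a 4-byte big-endian integer, then the payload). The function names `frame` and `unframe` are hypothetical, and real serializers do this for you. Protobuf messages carry additional message-index bytes after the ID, which this sketch ignores.

```python
import struct

MAGIC_BYTE = 0  # marker byte used by Confluent's wire format


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-serialized payload with the magic byte and schema ID."""
    # ">bI" = big-endian, 1 signed byte + 4-byte unsigned int, no padding (5 bytes)
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> tuple[int, bytes]:
    """Split a framed message into (schema_id, payload)."""
    if len(message) < 5:
        raise ValueError("message too short to carry a schema ID")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    return schema_id, message[5:]
```

A consumer would call `unframe` first, look up the returned ID in its local schema cache (falling back to the registry on a miss), and only then decode the payload.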


Compatibility rules: what you should use

Compatibility rules define how a new schema version must relate to existing versions. Confluent Schema Registry supports modes such as:

  • BACKWARD
  • BACKWARD_TRANSITIVE
  • FORWARD
  • FORWARD_TRANSITIVE
  • FULL
  • FULL_TRANSITIVE
  • NONE

The most important modes in practice

BACKWARD (and BACKWARD_TRANSITIVE)

A new schema is backward compatible if consumers using the new schema can read data produced with older schemas.

This is usually the safest default for event streams because:

  • Old data must remain readable by new consumers
  • You can reprocess historical data without breaking

Two variants:

  • BACKWARD: new schema must be compatible with the latest version.
  • BACKWARD_TRANSITIVE: new schema must be compatible with all previous versions.

Recommendation for most Kafka topics:

  • Use BACKWARD or, where history is critical, BACKWARD_TRANSITIVE.
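
The difference between the plain and transitive variants comes down to which earlier versions a candidate schema is compared against. The sketch below illustrates only that selection step; `versions_to_check` is a hypothetical helper, not a registry API, and `history` is assumed to be ordered oldest to newest.

```python
def versions_to_check(mode: str, history: list[str]) -> list[str]:
    """Return the earlier versions a new schema must be checked against."""
    if mode == "NONE" or not history:
        return []  # nothing is enforced, or this is the first version
    if mode.endswith("_TRANSITIVE"):
        return list(history)  # every registered version
    return [history[-1]]  # only the latest registered version


history = ["v1", "v2", "v3"]
print(versions_to_check("BACKWARD", history))
print(versions_to_check("BACKWARD_TRANSITIVE", history))
```

With plain BACKWARD, a change that is compatible with v3 but not with v1 would still be accepted, which is why topics that get reprocessed from the beginning benefit from the transitive variant.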

FORWARD

A new schema is forward compatible if consumers using the old schema can read data produced with the new schema.

This can be useful when:

  • Producers are upgraded before consumers
  • You care more about current consumers not breaking than about reading old data

FULL

FULL combines backward and forward compatibility: both old and new consumers can read old and new data.

  • Strongest guarantee, but sometimes too restrictive for fast-moving schemas.
  • FULL_TRANSITIVE enforces this across all historical versions.

Practical defaults

For most teams:

  • System-wide default: BACKWARD at the registry or global level
  • Critical “facts” topics (payments, orders, events you might reprocess): BACKWARD_TRANSITIVE
  • Temporary or experimental topics: NONE or a relaxed mode if you’re intentionally iterating quickly
  • Strict contracts with external consumers: consider FULL or FULL_TRANSITIVE
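
As a rough sketch of how these defaults could be applied, the snippet below builds the HTTP requests for the Schema Registry config endpoints (`PUT /config` for the global level, `PUT /config/{subject}` for a single subject). The registry address and the subject name `payments-value` are assumptions for illustration. The code only constructs the requests; nothing is sent until you pass one to `urllib.request.urlopen`.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

REGISTRY_URL = "http://localhost:8081"  # assumed local registry address


def compatibility_request(mode: str, subject: Optional[str] = None) -> urllib.request.Request:
    """Build a PUT request that sets the compatibility mode globally or per subject."""
    path = "/config"
    if subject is not None:
        path += "/" + urllib.parse.quote(subject, safe="")
    body = json.dumps({"compatibility": mode}).encode("utf-8")
    return urllib.request.Request(
        REGISTRY_URL + path,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


# Global default, then a stricter override for a critical (hypothetical) subject.
global_default = compatibility_request("BACKWARD")
payments = compatibility_request("BACKWARD_TRANSITIVE", subject="payments-value")
```

Keeping these calls in a script under version control makes the per-topic overrides reviewable in the same way as the schemas themselves.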

How compatibility relates to Avro, Protobuf, and JSON Schema

Each format has its own rules for what is considered compatible. Common patterns:

Avro evolution patterns

Usually backward compatible if you:

  • Add optional fields with default values
  • Remove fields that had defaults and are not required by consumers
  • Rename fields carefully when using aliases

Breaking changes to avoid in backward mode:

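  • Adding a field without a default value (old data cannot populate it)
  • Removing a field that older consumers still read without a default (this passes a BACKWARD check, but it breaks those consumers)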
  • Changing field types incompatibly (e.g., string to int)
  • Changing default values in a way that would surprise consumers
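
The core of the Avro backward rule can be sketched in a few lines. This is a deliberately simplified check, not the registry's implementation: it handles flat record schemas only, treats any type difference as a break (real Avro allows promotions such as int to long), and ignores unions and nested records. The function name `backward_problems` is hypothetical.

```python
def backward_problems(old_schema: dict, new_schema: dict) -> list[str]:
    """List reasons a reader on new_schema could not decode data written with old_schema."""
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    problems = []
    for field in new_schema["fields"]:
        # A reader field matches a writer field by name or by one of its aliases.
        names = [field["name"], *field.get("aliases", [])]
        writer_field = next((old_fields[n] for n in names if n in old_fields), None)
        if writer_field is None:
            # Old data has no value for this field, so the reader needs a default.
            if "default" not in field:
                problems.append(f"new field '{field['name']}' has no default")
        elif writer_field["type"] != field["type"]:
            problems.append(
                f"field '{field['name']}' changed type from "
                f"{writer_field['type']} to {field['type']}"
            )
    return problems


order_v1 = {
    "type": "record",
    "name": "Order",
    "fields": [{"name": "id", "type": "string"}, {"name": "amount", "type": "double"}],
}
order_v2 = dict(
    order_v1,
    fields=order_v1["fields"] + [{"name": "currency", "type": "string", "default": "USD"}],
)
print(backward_problems(order_v1, order_v2))  # empty list: the new field has a default
```

Note that fields removed in the new schema never show up as problems here: a reader on the new schema simply skips them in old data.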

Protobuf evolution patterns

Typically backward compatible if you:

  • Add optional fields with new field numbers
  • Do not reuse or change existing field numbers
  • Avoid changing field types for existing numbers

Avoid:

  • Renaming without understanding how code generation treats it
  • Reusing field numbers for different semantics
  • Changing scalar types or cardinality (from repeated to optional, etc.) without a migration plan
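
Because Protobuf identifies fields on the wire by number rather than by name, a useful pre-commit check is to compare field numbers between two versions of a message. The sketch below assumes each version has already been reduced to a mapping of field number to `(name, type)`; that representation and the name `field_number_problems` are illustrative assumptions, and the check does not cover cardinality changes or nested messages.

```python
def field_number_problems(
    old: dict[int, tuple[str, str]], new: dict[int, tuple[str, str]]
) -> list[str]:
    """Flag field numbers whose type or name differs between two message versions."""
    problems = []
    for number, (old_name, old_type) in sorted(old.items()):
        if number not in new:
            continue  # removed field; its number should be marked reserved in the .proto
        new_name, new_type = new[number]
        if new_type != old_type:
            problems.append(f"field {number} changed type from {old_type} to {new_type}")
        elif new_name != old_name:
            # Wire-compatible, but generated code and JSON mappings use the name.
            problems.append(f"field {number} renamed from {old_name} to {new_name}")
    return problems


v1 = {1: ("id", "string"), 2: ("amount", "double")}
v2 = {1: ("id", "string"), 2: ("amount", "double"), 3: ("currency", "string")}
print(field_number_problems(v1, v2))  # empty list: only a new number was added
```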

JSON Schema evolution patterns

Compatibility is more flexible but also easier to get wrong. Generally:

  • Add optional properties
  • Avoid making previously optional fields required
  • Avoid narrowing allowed types or values

Because JSON is more free-form, using a registry and compatibility rules is especially important to prevent “schema drift.”
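
The two most common mistakes listed above, newly required properties and changed types, are easy to detect mechanically. The following is a simplified sketch rather than a full JSON Schema compatibility checker: it looks only at top-level `properties` and `required`, and assumes each `type` is a single string. The function name is hypothetical.

```python
def json_schema_backward_problems(old: dict, new: dict) -> list[str]:
    """Flag changes that would make documents valid under `old` invalid under `new`."""
    problems = []
    newly_required = set(new.get("required", [])) - set(old.get("required", []))
    for name in sorted(newly_required):
        problems.append(f"property '{name}' became required")
    old_props = old.get("properties", {})
    for name, spec in new.get("properties", {}).items():
        old_type = old_props.get(name, {}).get("type")
        new_type = spec.get("type")
        if old_type is not None and new_type is not None and new_type != old_type:
            problems.append(f"property '{name}' changed type from {old_type} to {new_type}")
    return problems


v1 = {"type": "object", "properties": {"id": {"type": "string"}}, "required": ["id"]}
v2 = {
    "type": "object",
    "properties": {"id": {"type": "string"}, "email": {"type": "string"}},
    "required": ["id", "email"],
}
print(json_schema_backward_problems(v1, v2))
```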


Choosing the right combination: registry + format + compatibility

Here are common patterns that work well in production.

Pattern 1: Kafka-centric data platform with Avro

  • Registry: Confluent Schema Registry (or equivalent)
  • Format: Avro
  • Compatibility:
    • Global default: BACKWARD
    • Critical topics (orders, payments, compliance): BACKWARD_TRANSITIVE

Best when:

  • Kafka is the backbone of your data architecture
  • You have many internal consumers and stream processing applications
  • You care about efficient storage and long-term reprocessing

Pattern 2: Microservices with Protobuf across Kafka and gRPC

  • Registry: Confluent Schema Registry
  • Format: Protobuf
  • Compatibility:
    • Typically BACKWARD on core topics
    • Possibly FULL for certain highly shared contracts

Best when:

  • Services communicate via gRPC and Kafka
  • You want a single schema definition for multiple transports
  • Teams are comfortable with Protobuf tooling

Pattern 3: API-first architecture using JSON Schema

  • Registry: Confluent Schema Registry
  • Format: JSON Schema
  • Compatibility:
    • BACKWARD for internal events
    • FULL for public or external-facing contracts that must be very stable

Best when:

  • JSON is already the standard for external APIs
  • Many non-Kafka consumers also rely on JSON schemas
  • Readability and ease of debugging are important

Operational tooling and best practices

Once you’ve decided on tools and formats, you’ll need processes and supporting tools.

1. Integrate schema management into CI/CD

  • Treat schemas as versioned artifacts in source control.
  • Use CLI or API tools to:
    • Validate schemas locally before commit
    • Check compatibility against registry in CI
    • Automatically register new versions during deployment
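
A CI compatibility check can be as small as one HTTP call. The sketch below builds a request for the registry's compatibility endpoint (`POST /compatibility/subjects/{subject}/versions/latest`) and parses the `is_compatible` flag from the response. The registry address and the function names are assumptions for illustration; authentication and error handling are left out.

```python
import json
import urllib.parse
import urllib.request

REGISTRY_URL = "http://localhost:8081"  # assumed registry address for the CI environment


def compatibility_check_request(
    subject: str, schema_text: str, schema_type: str = "AVRO"
) -> urllib.request.Request:
    """Build a request that tests schema_text against the subject's latest version."""
    url = (
        f"{REGISTRY_URL}/compatibility/subjects/"
        f"{urllib.parse.quote(subject, safe='')}/versions/latest"
    )
    # The registry expects the schema itself as a string inside the JSON body.
    body = json.dumps({"schema": schema_text, "schemaType": schema_type}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


def is_compatible(response_body: bytes) -> bool:
    """Read the is_compatible flag from the registry's JSON response."""
    return bool(json.loads(response_body).get("is_compatible", False))
```

In a pipeline step you would send the request with `urllib.request.urlopen(req)`, pass the response body to `is_compatible`, and fail the build when it returns `False`.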

2. Use subjects and naming conventions consistently

Common patterns:

  • Per-topic subjects: topic-name-value, topic-name-key
  • Per-entity subjects: customer-value, order-value

Choose one pattern and document it, so producers and consumers know where to look in the registry.
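
Confluent's serializers expose this choice as a subject name strategy, and the per-topic pattern above corresponds to the default `TopicNameStrategy`. The helper below is a hypothetical sketch of how the three built-in strategies derive a subject name; in practice you configure the strategy on the serializer rather than computing names yourself. The topic and record names are made up for illustration.

```python
def subject_name(strategy: str, topic: str, record_name: str = "", is_key: bool = False) -> str:
    """Derive a registry subject name the way the named strategy would."""
    if strategy == "TopicNameStrategy":
        return f"{topic}-{'key' if is_key else 'value'}"
    if not record_name:
        raise ValueError(f"{strategy} needs the fully qualified record name")
    if strategy == "RecordNameStrategy":
        return record_name  # same subject regardless of topic
    if strategy == "TopicRecordNameStrategy":
        return f"{topic}-{record_name}"
    raise ValueError(f"unknown strategy: {strategy}")


print(subject_name("TopicNameStrategy", "orders"))
print(subject_name("TopicRecordNameStrategy", "orders", "com.example.Order"))
```

The record-based strategies are what allow several event types to share one topic, since each type then evolves under its own subject.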

3. Leverage connectors and Kafka Streams with Schema Registry

Confluent’s connectors (e.g., S3 Sink, HTTP Sink, MSSQL Source, MongoDB Source) and stream processing APIs integrate directly with Schema Registry:

  • Schema-aware ingestion from sources (e.g., SQL Server, MongoDB)
  • Schema-based serialization to storage (e.g., S3, warehouses)
  • Kafka Streams applications can transform and enrich events while preserving or evolving schemas.

This lets you:

  • Build new views of your data without manual schema handling
  • Maintain clean separation between producers and consumers
  • Modernize legacy systems by streaming structured events into cloud-native services like AWS Fargate, Lambda, and S3.

4. Lock down schema changes

  • Restrict who can register schemas or change compatibility settings.
  • Use reviews (code review or a data governance board) for changes to critical schemas.
  • Monitor registry access and changes as part of your security posture.

Recommendations: what you should actually use

If you’re looking for a practical starting point:

  1. Use Confluent Schema Registry if you’re in the Kafka/Confluent ecosystem.

    • It’s the most straightforward way to get Avro, Protobuf, and JSON Schema with governance and compatibility rules.
  2. Default to Avro for Kafka internal event streams unless you have strong reasons to choose Protobuf or JSON Schema.

    • Avro + backward compatibility is a proven pattern for event streaming.
  3. Set the global compatibility mode to BACKWARD, then tighten to BACKWARD_TRANSITIVE on critical topics.

  4. Use Protobuf where you already rely heavily on gRPC, or you want strong code generation across multiple services and transports.

  5. Use JSON Schema when:

    • Human readability matters
    • You are aligning with REST/HTTP APIs
    • Non-Kafka tools need to consume the same definitions
  6. Integrate schema evolution into your development lifecycle, so developers see compatibility errors before deployment, not in production.

With a solid Schema Registry, clear format choices, and well-defined compatibility rules, your Kafka platform becomes safer to evolve, easier to integrate, and more resilient—setting you up for long-term success with data streaming.