Getting Started with Confluent
Schema Registry
Patrick Druley
Senior Solution Engineer @confluentinc
Agenda
1. Schema-less Kafka
2. Schema Registry Basics
3. 3 Good Habits
4. Schema Validation Demo
https://www.linkedin.com/in/patrickdruley/
twitter: @PatrickLovesAK
Schema-less Kafka
“Someone changed
the date field from
unix timestamp to
datetime and broke
all our reporting
dashboards”
“I need to add a
new field to
support new
functionality, but I
don’t want to
update ALL the
potential
downstream
consumers of this
data”
“My compliance
team wants an
audit to ensure
there is no PII data
in Kafka, there’s no
easy way to get all
the message
metadata”
“My data is
‘democratized’ in
Kafka, but my dev
teams have no way
to know what data
is in Kafka without
domain knowledge
or coming to my
platform team”
“My event is JSON” - Producer
{
id : 1,
name: “fox”
}
Serialization humans-topic
101010111100
Producer Consumer
JSON
“Sounds great” - Consumer
{
id : 1,
name: “fox”
}
Serialization humans-topic Deserialization
{
id : 1,
name: “fox”
}
101010111100 101010111100
Producer Consumer
JSON JSON
Time Passes
{
id : 1,
name: “fox”
}
{
id : 2,
name: “jet”
}
{
id : 3,
name: “oak”
}
humans-topic
Tail Head
“I added a field” - Producer
{
id : 4,
name: “roc”,
Lname: “druley”
}
Serialization humans-topic Deserialization
{
id : 3,
name: “oak”
}
101010111100 101010111100
Producer Consumer
JSON JSON
“You did what?” - Consumer
{
id : 4,
name: “roc”,
Lname: “druley”
}
Serialization humans-topic Deserialization
{
id : 1,
name: “patrick”,
Lname: “druley”
}
101010111100 101010111100
Producer Consumer
JSON JSON
ERROR:
Lname doesn’t exist!
What Happened?
{
id : 1,
name: “patrick”
}
{
id : 2,
name: “roc”
}
{
id : 3,
name: “oak”
}
humans-topic
Tail Head
{
id : 4,
name: “roc”,
Lname: “druley”
}
Consumer code breaks:
# new field Lname
Lname = humans["Lname"]
KeyError: 'Lname'
Consumer
But there’s no “Lname”
in event 3.
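The failure above can be reproduced without Kafka at all. A minimal Python sketch (hypothetical consumer logic, not the Kafka client API) of a consumer that assumes every event carries Lname:

```python
import json

# Events already on humans-topic, oldest first; the first was produced
# before the Lname field existed.
events = [
    '{"id": 3, "name": "oak"}',
    '{"id": 4, "name": "roc", "Lname": "druley"}',
]

def read_lname(raw):
    human = json.loads(raw)
    return human["Lname"]        # raises KeyError on any pre-change event

def read_lname_defensively(raw):
    human = json.loads(raw)
    return human.get("Lname")    # returns None instead of crashing
```

A `.get()` call papers over the crash, but it does nothing to tell producers and consumers what the contract actually is — which is the gap Schema Registry fills.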
Schema Registry Basics
Confluent Schema Registry
Schema Compatibility - Avro
Compatibility Type    | Changes Allowed                             | Check Against                   | Upgrade First
BACKWARD              | Delete fields, add optional fields          | Last version                    | Consumers
BACKWARD_TRANSITIVE   | Delete fields, add optional fields          | All previous versions           | Consumers
FORWARD               | Add fields, delete optional fields          | Last version                    | Producers
FORWARD_TRANSITIVE    | Add fields, delete optional fields          | All previous versions           | Producers
FULL                  | Add optional fields, delete optional fields | Last version                    | Any order
FULL_TRANSITIVE       | Add optional fields, delete optional fields | All previous versions           | Any order
NONE                  | All changes accepted                        | Compatibility checking disabled | Depends
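The compatibility mode in the table is set per subject through the Schema Registry REST API, and a candidate schema can be dry-run against the latest registered version before you deploy it. A sketch, assuming a registry at localhost:8081 and a humans-topic-value subject:

```shell
# Set BACKWARD compatibility for one subject
curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "BACKWARD"}' \
  http://localhost:8081/config/humans-topic-value

# Test a candidate schema against the latest registered version
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\":\"record\",\"name\":\"human\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}"}' \
  http://localhost:8081/compatibility/subjects/humans-topic-value/versions/latest
```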
Subject Naming Strategies
TopicNameStrategy
Subject Name =
Topic Name + [key|value]
Example:
Topic Name = mytopic
Value Subject Name = mytopic-value
Default
RecordNameStrategy
Subject Name =
fully-qualified record name
Example:
{"type":"record",
"name":"myrecord",
"fields":
[{"name":"f1",
"type":"string"}]
}
Subject Name = myrecord
Time Ordered Events
TopicRecordNameStrategy
Subject Name =
Topic Name + fully-qualified record name
Example:
Topic Name = mytopic
Record Name = myrecord
Subject Name = mytopic-myrecord
Most Granular
https://docs.confluent.io/current/schema-registry/serdes-develop/index.html#subject-name-strategy
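As an illustration only (this is a toy helper, not the Confluent serializer API), the subject names the three strategies derive; per the linked docs, RecordNameStrategy and TopicRecordNameStrategy use the record's fully-qualified name without a -key/-value suffix:

```python
def subject_name(strategy, topic=None, record=None, part="value"):
    """Toy reimplementation of the three built-in subject naming strategies."""
    if strategy == "TopicNameStrategy":        # the default
        return f"{topic}-{part}"
    if strategy == "RecordNameStrategy":       # fully-qualified record name
        return record
    if strategy == "TopicRecordNameStrategy":  # topic + record name
        return f"{topic}-{record}"
    raise ValueError(f"unknown strategy: {strategy}")
```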
Kafka with Schema Registry
{
id : 1,
name: “fox”
}
Serialization
humans-topic
Deserialization
{
id : 1,
name: “fox”
}
101010111100 101010111100
Producer Consumer
AVRO AVRO
{
id : int,
name: string
}
Schema Registry
v1
v1
{
id : 1,
name: “fox”
}
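Under the hood, the serializer does not embed the schema in every message; it prepends a small header so the consumer can fetch the schema by ID from the registry. A sketch of the Confluent wire format (magic byte 0, a 4-byte big-endian schema ID, then the encoded payload):

```python
import struct

MAGIC_BYTE = 0

def frame(schema_id: int, payload: bytes) -> bytes:
    """Prepend the Confluent wire-format header to an encoded payload."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload

def unframe(message: bytes):
    """Split a framed message back into (schema_id, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("not Confluent wire format")
    return schema_id, message[5:]
```

The 5-byte header is why the per-message overhead of using the registry is tiny: the schema itself is stored once, centrally, under a version.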
Time Passes, Again
v1
{
id : 1,
name: “fox”
}
v1
{
id : 2,
name: “jet”
}
v1
{
id : 3,
name: “oak”
}
humans-topic
Tail Head
“I added a field” - Producer
{
id : 4,
name: “roc”,
Lname: “druley”
}
Serialization
humans-topic
Deserialization
{
id : 3,
name: “oak”
}
101010111100 101010111100
Producer Consumer
AVRO AVRO
{
id : int,
name: string,
Lname: string
}
Schema Registry
v1
{
id : 3,
name: “oak”
}
v2
Backward Compatibility
compare(v2, v1) => fail
ERROR:
Schema
Incompatible
“I added an optional field” - Producer
{
id : 4,
name: “roc”,
Lname: “druley”
}
Serialization
humans-topic
Deserialization
{
id : 3,
name: “oak”,
Lname: “druley”
}
101010111100 101010111100
Producer Consumer
AVRO AVRO
{
id : int,
name: string,
Lname: string (default: “druley”)
}
Schema Registry
v1
{
id : 3,
name: “oak”
}
v2
use “druley” as the default,
it’s optional
Backward Compatibility
compare(v2, v1) => ok
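The reason the default makes v2 backward compatible: a consumer decoding with the v2 schema fills in the declared default for any field a v1 writer never wrote. A hand-rolled illustration of that resolution rule (not the real Avro library):

```python
# Reader's v2 schema fields; Lname is optional because it has a default.
V2_FIELDS = [
    {"name": "id"},
    {"name": "name"},
    {"name": "Lname", "default": "druley"},
]

def resolve(record, reader_fields):
    """View a record through the reader's schema, applying defaults."""
    out = {}
    for field in reader_fields:
        if field["name"] in record:
            out[field["name"]] = record[field["name"]]
        elif "default" in field:
            out[field["name"]] = field["default"]
        else:
            raise ValueError(f"no value and no default for {field['name']}")
    return out

old_event = {"id": 3, "name": "oak"}   # written under schema v1
```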
3 Good Habits
1. Set auto register
to false and add
Schema Validation
to CI/CD Pipeline.
Habit:
In prod and near-prod, clients should not
automatically register new schemas.
Ideally, this should be done in a CI/CD
pipeline.
Producer Setting:
auto.register.schemas=false
Exceptions:
1. Dev environments
2. Schema Registry ACLs are enabled
using Confluent’s security plugin
https://docs.confluent.io/current/confluent-security-plugins/schema-registry/authorization/index.html#authorization-for-sr-operations-and-resources
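With the confluent-kafka Python client, the same habit looks like the following serializer settings (key names from that client's AvroSerializer configuration; a sketch of the config only, not a full producer):

```python
# Serializer settings for prod: never register schemas from the application;
# always serialize against the version the CI/CD pipeline already registered.
avro_serializer_conf = {
    "auto.register.schemas": False,
    "use.latest.version": True,
}
```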
Schema Registry Maven Plugin - pom.xml
<plugin>
<groupId>io.confluent</groupId>
<artifactId>kafka-schema-registry-maven-plugin</artifactId>
<version>${confluent.version}</version>
<configuration>
<schemaRegistryUrls>
<param>${schemaRegistryUrl}</param>
</schemaRegistryUrls>
<userInfoConfig>${schemaRegistryBasicAuthUserInfo}</userInfoConfig>
<subjects>
<transactions-value>src/main/resources/avro/io/confluent/examples/clients/basicavro/Payment2a.avsc</transactions-value>
</subjects>
</configuration>
<executions>
<execution>
<goals>
<goal>test-compatibility</goal>
</goals>
</execution>
</executions>
</plugin>
https://docs.confluent.io/current/schema-registry/schema_registry_onprem_tutorial.html#maven
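With the plugin declared in the pom, the compatibility check runs as a Maven goal, so it slots into any CI/CD step:

```shell
mvn schema-registry:test-compatibility
```

The build fails if the local .avsc is incompatible with the subject's registered schema, which is exactly the gate you want before anything reaches production.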
2. Create new
topics if you need
to break
compatibility.
Habit:
Figure out the right compatibility setting
for your use case, and create new topics
when you need to break it. Don’t loosen
your compatibility just for a one-time
major schema change.
Exceptions:
-Dev
3. Use Confluent
Schema Validation.
Scale schemas reliably
• Automated broker-side schema
validation and enforcement
• Direct interface from the broker
to Confluent Schema Registry
Granular control
• Enable validation at the topic
level
• Set subject naming strategy at
the topic level
Producer → Broker → Schema Registry
1. Invalid schema
2. Error message
confluent.value.schema.validation=true
Broker Side Schema Validation
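Validation is enabled as a per-topic config (Confluent Server only; this sketch assumes the broker is already configured with confluent.schema.registry.url):

```shell
kafka-topics --create --bootstrap-server localhost:9092 \
  --topic humans-topic --partitions 1 --replication-factor 1 \
  --config confluent.value.schema.validation=true
```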
https://docs.confluent.io/current/schema-registry/schema-validation.html#sv

Schema Validation Demo
Confluent Developer
developer.confluent.io
Learn Kafka.
Start building with
Apache Kafka at
Confluent Developer.
Project Metamorphosis
Unveiling the next-gen event
streaming platform
For Updates Visit
cnfl.io/pm
Jay Kreps
Co-founder and CEO
Confluent
Q&A
https://www.linkedin.com/in/patrickdruley/
patrick@confluent.io
cnfl.io/meetups   cnfl.io/slack   cnfl.io/blog
https://github.com/confluentinc/demo-scene/tree/master/industry-themes
