Kafka 101 & Developer Best Practices
Agenda
● Kafka Overview
● Kafka 101
● Best Practices for Writing to Kafka: A tour of the Producer
● Best Practices for Reading from Kafka: The Consumer
● General Considerations
ETL/Data Integration (stored records):
● Batch
● Expensive
● Time Consuming

Messaging (transient messages):
● Difficult to Scale
● No Persistence
● Data Loss
● No Replay

Both of these are a complete mismatch to how your business works.

Event Streaming Paradigm:
● High Throughput
● Durable
● Persistent
● Maintains Order
● Fast (Low Latency)

To rethink data as not stored records or transient messages, but instead as a continually updating stream of events.
Confluent: Central Nervous System For Enterprise

[Diagram: a Universal Event Pipeline ingests from data stores, logs, device logs, 3rd-party apps, SaaS apps, Amazon S3, and custom apps/microservices, and feeds event-streaming applications such as real-time customer 360, financial fraud detection, real-time risk analytics, real-time payments, and machine learning models.]
Confluent uniquely enables Event Streaming success
● Confluent founders are original creators of Kafka
● Confluent team wrote 80% of Kafka commits and has over 1M hours technical experience with Kafka
● Confluent helps enterprises successfully deploy event streaming at scale and accelerate time to market
● Confluent Platform extends Apache Kafka to be a secure, enterprise-ready platform

(Hall of Innovation: CTO Innovation Award Winner 2019; Enterprise Technology Innovation Awards)
Kafka 101
● Scalability of a Filesystem
● Guarantees of a Database
● Distributed By Design
● Rewind and Replay
KAFKA: A Modern, Distributed Platform for Data Streams

[Diagram: a Kafka cluster of brokers, with Kafka producers (writers) on one side and Kafka consumers (readers, optionally grouped into consumer groups) on the other; a single client can act as both consumer and producer.]
Kafka Topics

[Diagram: topic my-topic is divided into my-topic-partition-0, my-topic-partition-1, and my-topic-partition-2, spread across broker-1, broker-2, and broker-3.]
Creating a topic

$ kafka-topics --bootstrap-server broker:9092 \
    --create \
    --topic my-topic \
    --replication-factor 3 \
    --partitions 3

(Older versions used --zookeeper zk:2181; that flag is deprecated.)
Or use the AdminClient API!
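For reference, a minimal sketch of the same topic creation via the AdminClient API (broker address assumed):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Properties;

Properties props = new Properties();
props.put("bootstrap.servers", "broker:9092");
try (AdminClient admin = AdminClient.create(props)) {
    // 3 partitions, replication factor 3 - matching the CLI example above
    NewTopic topic = new NewTopic("my-topic", 3, (short) 3);
    admin.createTopics(Collections.singleton(topic)).all().get(); // block until created
}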
Producing to Kafka

[Diagram: records are appended to the end of each partition's log as time advances; consumers (C) track their own positions in the log.]
Kafka's distributed nature

[Diagram: Topic-1 has four partitions, each replicated three times across Brokers 1-4; every partition has one leader replica, with follower replicas on other brokers.]
Producing to Kafka
Clients - Producer Design

[Diagram: a Producer Record (topic, optional partition, optional key, value) passes through the Serializer and Partitioner, is collected into per-topic-partition batches (e.g. Topic A / Partition 0, Topic B / Partition 1), and is sent to the Kafka broker. If send() fails, the producer retries when it can; when it can't retry, it throws an exception; on success it returns the record's metadata.]
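A minimal send() sketch showing that flow with an asynchronous callback (topic name and error handling are illustrative):

producer.send(new ProducerRecord<>("my-topic", "AAA", "payload"), (metadata, exception) -> {
    if (exception != null) {
        // retries exhausted, or a non-retriable error: handle/log it
        exception.printStackTrace();
    } else {
        // success: the broker returned the record's metadata
        System.out.printf("wrote to %s-%d@%d%n",
            metadata.topic(), metadata.partition(), metadata.offset());
    }
});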
The Serializer

Kafka doesn't care about what you send to it, as long as it's been converted to a byte stream beforehand: JSON, CSV, Avro, Protobuf, or XML (if you must).

Reference: https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html
The Serializer

private Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "broker1:9092,broker2:9092");
kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
kafkaProps.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
kafkaProps.put("schema.registry.url", "https://schema-registry:8083");

Producer<String, SpecificRecord> producer = new KafkaProducer<String, SpecificRecord>(kafkaProps);

Reference: https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html
Record Keys and why they're important - Ordering

Record keys determine the partition with the default Kafka partitioner. If a key isn't provided, messages are distributed across partitions in a round-robin fashion.

Keys are used in the default partitioning algorithm:

partition = hash(key) % numPartitions

[Diagram: Producer Records keyed AAA, BBB, CCC, and DDD each pass through the partitioner, which routes every record with a given key to the same partition, preserving per-key ordering.]
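For illustration, the hash above is murmur2 in the Java client; a simplified sketch of the default partitioner's keyed path (the real implementation is org.apache.kafka.clients.producer.internals.DefaultPartitioner):

import org.apache.kafka.common.utils.Utils;

// The same key bytes always map to the same partition - as long as
// numPartitions doesn't change (adding partitions breaks key ordering!)
int partitionFor(byte[] keyBytes, int numPartitions) {
    return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
}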
Record Keys and why they're important - Key Cardinality

Key cardinality affects the amount of work done by the individual consumers in a group; a poor key choice can lead to uneven workloads.

Keys in Kafka don't have to be primitives like strings or ints. Like values, they can be anything: JSON, Avro, etc. So create a key that will evenly distribute groups of records around the partitions.

Car·di·nal·i·ty /ˌkärdəˈnalədē/ (noun): the number of elements in a set or other grouping, as a property of that grouping.
You don't have to but... use a Schema!

A Data Producer Service sends JSON to a Data Consumer Service. Some records look like:

{
  "Name": "John Smith",
  "Address": "123 Apple St.",
  "City": "Philadelphia",
  "State": "PA",
  "Zip": "19101"
}

while others carry only:

{
  "Name": "John Smith",
  "Address": "123 Apple St.",
  "Zip": "19101"
}

leaving the consumer asking: "Where's record.City?"

Reference: https://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you-really-need-one/
Schema Registry: Make Data Backwards Compatible and Future-Proof
● Define the expected fields for each Kafka topic
● Automatically handle schema changes (e.g. new fields)
● Prevent backwards-incompatible changes
● Support multi-data-center environments

[Diagram: producing and consuming apps run records through serializers that check schemas against the Schema Registry attached to the Kafka topic; example consumers include Elastic, Cassandra, and HDFS. Open source feature.]
Avro allows for evolution of schemas

The Data Producer Service sends an AvroRecord using schema Version 2, registered in the Schema Registry:

{
  "Name": "John Smith",
  "Address": "123 Apple St.",
  "City": "Philadelphia",
  "State": "PA",
  "Zip": "19101"
}

The Data Consumer Service can still read records written with Version 1; fields missing from older records are filled with the schema's defaults:

{
  "Name": "John Smith",
  "Address": "123 Apple St.",
  "Zip": "19101",
  "City": "NA",
  "State": "NA"
}

Reference: https://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you-really-need-one/
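A sketch of what the Version 2 Avro schema could look like: the new City and State fields carry defaults, so Version 1 records remain readable (the record name is illustrative; field names follow the example above):

{
  "type": "record",
  "name": "Customer",
  "fields": [
    {"name": "Name",    "type": "string"},
    {"name": "Address", "type": "string"},
    {"name": "Zip",     "type": "string"},
    {"name": "City",    "type": "string", "default": "NA"},
    {"name": "State",   "type": "string", "default": "NA"}
  ]
}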
Developing with Confluent Schema Registry

Confluent provides a Maven plugin with several goals for developing against the Schema Registry:
● download - download a subject's schema to your project
● register - register a new schema to the schema registry from your development environment
● test-compatibility - test changes made to a schema against the compatibility rules set by the schema registry

Reference: https://docs.confluent.io/current/schema-registry/docs/maven-plugin.html
<plugin>
    <groupId>io.confluent</groupId>
    <artifactId>kafka-schema-registry-maven-plugin</artifactId>
    <version>5.0.0</version>
    <configuration>
        <schemaRegistryUrls>
            <param>http://192.168.99.100:8081</param>
        </schemaRegistryUrls>
        <outputDirectory>src/main/avro</outputDirectory>
        <subjectPatterns>
            <param>^TestSubject000-(key|value)$</param>
        </subjectPatterns>
    </configuration>
</plugin>
Use Kafka's Headers

Producer Record: Topic, [Partition], [Timestamp], [Key], [Headers], Value

Kafka headers are simply an interface that requires a key of type String and a value of type byte[]; the headers are stored as an iterable collection on the ProducerRecord.

Example Use Cases
● Data lineage: reference previous topic partition/offsets
● Producing host/application/owner
● Message routing
● Encryption metadata (which key pair was this message payload encrypted with?)

Reference: https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
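A minimal sketch of writing and reading a header (the header key "producing-host" is hypothetical):

import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

// producer side: attach a header to the record before sending
ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "AAA", "payload");
record.headers().add("producing-host", "app-01".getBytes(StandardCharsets.UTF_8));

// consumer side: iterate the headers of a polled ConsumerRecord
for (Header header : consumerRecord.headers()) {
    System.out.println(header.key() + " = " + new String(header.value(), StandardCharsets.UTF_8));
}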
Producer Guarantees

Producer Properties: acks=0

[Diagram: the producer (P) writes to the leader of Topic1-partition1 on Broker 1; followers on Brokers 2 and 3 replicate it. With acks=0 the producer doesn't wait for any acknowledgement ("fire and forget").]

Reference: https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
Producer Properties: acks=1

[Diagram: the leader acknowledges the write as soon as it is written locally, without waiting for the followers to replicate it.]
Producer Properties: acks=all, min.insync.replicas=2

[Diagram: the leader acknowledges the write only after the in-sync followers have replicated it; min.insync.replicas=2 requires at least two in-sync replicas before the write is accepted.]
Producer Guarantees - without exactly-once guarantees

Producer Properties: acks=all, min.insync.replicas=2

[Diagram: the write {key: 1234, data: abcd} succeeds and lands at offset 3345, but the ack back to the producer fails.]
[Diagram: the producer retries, and the broker appends the same record again at offset 3346 and acks it:
{key: 1234, data: abcd} - offset 3345
{key: 1234, data: abcd} - offset 3346
A dupe!]

Reference: https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
Producer Guarantees - with exactly-once guarantees

Producer Properties:
enable.idempotence=true
max.in.flight.requests.per.connection=5
acks=all
retries > 0 (preferably Integer.MAX_VALUE)

[Diagram: every record carries a (producer id, sequence) pair:
(100, 1) {key: 1234, data: abcd} - offset 3345, ack lost
(100, 1) {key: 1234, data: abcd} - retry rejected as a duplicate, ack re-sent
(100, 2) {key: 5678, data: efgh} - offset 3346
No dupe!]

Reference: https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
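Those properties in code - a minimal sketch (broker address assumed):

Properties props = new Properties();
props.put("bootstrap.servers", "broker:9092");
props.put("enable.idempotence", "true");  // broker de-duplicates on (producer id, sequence)
props.put("acks", "all");
props.put("retries", Integer.toString(Integer.MAX_VALUE));
props.put("max.in.flight.requests.per.connection", "5"); // must be <= 5 with idempotence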
Transactional Producer

[Diagram: the producer writes a sequence of records (T1 ... T1) that commit or abort atomically.]

Producer:

KafkaProducer producer = createKafkaProducer(
    "bootstrap.servers", "broker:9092",
    "transactional.id", "my-transactional-id");

producer.initTransactions();
producer.beginTransaction();
-- send some records --
producer.commitTransaction();

Consumer:

KafkaConsumer consumer = createKafkaConsumer(
    "bootstrap.servers", "broker:9092",
    "group.id", "my-group-id",
    "isolation.level", "read_committed");

Reference: https://www.confluent.io/blog/transactions-apache-kafka/
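The full loop commits on success and aborts on error; a condensed sketch (record construction elided):

import org.apache.kafka.common.KafkaException;

producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(record1);
    producer.send(record2);
    producer.commitTransaction();
} catch (KafkaException e) {
    // consumers with isolation.level=read_committed never see the aborted records
    producer.abortTransaction();
}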
Consuming from Kafka
A basic Java Consumer

final Consumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList(topic));

try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            -- Do Some Work --
        }
    }
} finally {
    consumer.close();
}
Consuming From Kafka - Single Consumer

[Diagram: a single consumer (C) reads from every partition of the topic.]

Consuming From Kafka - Grouped Consumers

[Diagram: two consumer groups (C1 and C2) read the same topic; each group independently receives every record, while within a group the partitions are divided among the members.]

[Diagram: four partitions (0-3) are assigned one per consumer in a group of four; when a consumer leaves, its partition is rebalanced to a surviving member, which then owns partitions 0 and 3.]
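What puts consumers into the same group is simply the group.id config; a minimal sketch (group name assumed):

Properties props = new Properties();
props.put("bootstrap.servers", "broker:9092");
props.put("group.id", "my-group"); // all consumers with this id share the topic's partitions
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
final Consumer<String, String> consumer = new KafkaConsumer<>(props);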
Kafka's Interceptors

ProducerInterceptor
● onSend(ProducerRecord<K, V> record) - returns a ProducerRecord<K, V>. Called from send() before the key and value are serialized and the partition is assigned. This method is allowed to modify the record.
● onAcknowledgement(RecordMetadata metadata, java.lang.Exception exception) - called when the record sent to the server has been acknowledged, or when sending fails before the record reaches the server. Used for observability and reporting.

ConsumerInterceptor
● onConsume(ConsumerRecords<K, V> records) - called just before the records are returned by KafkaConsumer.poll(). This method is allowed to modify the consumer records, in which case the new records are returned.
● onCommit(Map<TopicPartition, OffsetAndMetadata> offsets) - called when offsets get committed. Used for observability and reporting.

References:
https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/producer/ProducerInterceptor.html
https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/consumer/ConsumerInterceptor.html
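A skeletal ProducerInterceptor for observability, for illustration (class name and counter are hypothetical):

import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class CountingInterceptor implements ProducerInterceptor<String, String> {
    private long acked = 0;

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        return record; // may modify and return a new record here
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        if (exception == null) acked++; // observability/reporting only
    }

    @Override public void close() {}
    @Override public void configure(Map<String, ?> configs) {}
}

// enabled via: props.put("interceptor.classes", "com.example.CountingInterceptor");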
Should I pool connections?

NO! Kafka connections are long-lived, so there is no reason to pool them. It's common to keep one client instance per thread (a producer is thread-safe and can be shared; a consumer is not).
Use a good client!
Clients
● Java/Scala - the default clients, shipped with Kafka
● C/C++ - https://github.com/edenhill/librdkafka
● C#/.Net - https://github.com/confluentinc/confluent-kafka-dotnet
● Python - https://github.com/confluentinc/confluent-kafka-python
● Golang - https://github.com/confluentinc/confluent-kafka-go
● Node/JavaScript - https://github.com/Blizzard/node-rdkafka (not supported by Confluent!)
New Kafka features will only be available to modern, updated clients!
Resources
Free E-Books from Confluent!
I Heart Logs:
https://www.confluent.io/ebook/i-heart-logs-event-data-stream-processing-and-data-integration/
Kafka: The Definitive Guide: https://www.confluent.io/resources/kafka-the-definitive-guide/
Designing Event Driven Systems:
https://www.confluent.io/designing-event-driven-systems/
Confluent Blog: https://www.confluent.io/blog
Thank You!
pascal@confluent.io