Kafka Intro With Simple Java Producer Consumers
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Cassandra and Kafka Support on AWS/EC2
Cloudurable
Kafka Introduction
Support around Cassandra
and Kafka running in EC2
Cassandra / Kafka Support in EC2/AWS
Kafka Introduction Kafka messaging
What is Kafka?
❖ Distributed Streaming Platform
❖ Publish and Subscribe to streams of records
❖ Fault tolerant storage
❖ Process records as they occur
Kafka Usage
❖ Build real-time streaming data pipelines
❖ Enable in-memory microservices (actors, Akka, Vert.x,
QBit)
❖ Build real-time streaming applications that react to
streams
❖ Real-time data analytics
❖ Transform, react, aggregate, join real-time data flows
Kafka Use Cases
❖ Metrics / KPIs gathering
❖ Aggregate statistics from many sources
❖ Event Sourcing
❖ Used with microservices (in-memory) and actor systems
❖ Commit Log
❖ External commit log for distributed systems. Replicates
data between nodes and lets nodes re-sync to restore state
❖ Real-time data analytics, Stream Processing, Log
Aggregation, Messaging, Click-stream tracking, Audit trail,
etc.
Who uses Kafka?
❖ LinkedIn: Activity data and operational metrics
❖ Twitter: Uses it as part of Storm – stream processing
infrastructure
❖ Square: Kafka as bus to move all system events to various
Square data centers (logs, custom events, metrics, and so
on). Outputs to Splunk, Graphite, Esper-like alerting
systems
❖ Spotify, Uber, Tumblr, Goldman Sachs, PayPal, Box,
Cisco, Cloudflare, Datadog, LucidWorks, MailChimp,
Netflix, etc.
Kafka Fundamentals
❖ Records have a key, value, and timestamp
❖ Topic: a stream of records
❖ Log: topic storage on disk
❖ Partition / Segments: parts of the topic log
❖ Producer API: produces streams of records
❖ Consumer API: consumes a stream of records
❖ Broker: a Kafka server. A Kafka cluster consists of many brokers running as
processes on many servers
❖ ZooKeeper: coordinates brokers and consumers. Consistent file system
for configuration information and leader election
Kafka: Topics, Producers, and Consumers
[Diagram: producers send records to a topic in the Kafka cluster; consumers read records from the topic]
ZooKeeper does coordination for Kafka Consumer
and Kafka Cluster
[Diagram: producers, consumers, a topic hosted on three Kafka brokers, and ZooKeeper]
Kafka Extensions
❖ Streams API: transform, aggregate, and process records
from a stream and produce derivative streams
❖ Connector API: reusable producers and consumers
(e.g., stream of changes from DynamoDB)
Kafka Connectors and Streams
[Diagram: apps attached to the Kafka cluster as producers, consumers, and streams; connectors link the cluster to databases]
Kafka Polyglot clients / Wire
protocol
❖ Clients and servers communicate using a wire
protocol over TCP
❖ Protocol is versioned
❖ Maintains backwards compatibility
❖ Many languages supported
Topics and Logs
❖ Topic is a stream of records
❖ Topics are stored in a log
❖ Log is broken up into partitions and segments
❖ Topic is a category or stream name
❖ Topics are pub/sub
❖ Can have zero or many consumers (subscribers)
❖ Topics are broken up into partitions for speed and size
Topic Partitions
❖ Topics are broken up into partitions
❖ Partition is usually decided by the key of the record
❖ Key of record determines which partition
❖ Partitions are used to scale Kafka across many servers
❖ Record sent to correct partition by key
❖ Partitions are used to facilitate parallel consumers
❖ Records are consumed in parallel up to the number of
partitions
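The key-to-partition mapping described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Kafka's own partitioner: `String.hashCode()` stands in for the hash that Kafka's default partitioner applies to the serialized key bytes, and the class and method names are hypothetical.

```java
// Hypothetical illustration of key-based partitioning (not Kafka's own code).
class KeyPartitionSketch {

    // Map a record key to a partition index in [0, numPartitions).
    // Assumption: String.hashCode() stands in for Kafka's hash of the key bytes.
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        String[] keys = {"user-1", "user-2", "user-1", "user-3"};
        for (String key : keys) {
            // The same key always lands on the same partition.
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

Because the mapping depends only on the key and the partition count, all records with the same key stay in one partition, which is what preserves their relative order.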
Partition Log
❖ Partition is an ordered, immutable sequence of records that is continually
appended to: a structured commit log
❖ Records in partitions are assigned a sequential id number called
the offset
❖ Offset identifies each record within the partition
❖ Topic partitions allow the Kafka log to scale beyond a size that will fit on a
single server
❖ Topic partition must fit on the servers that host it, but a topic can span
many partitions hosted by many servers
❖ Topic partitions are the unit of parallelism: each partition is consumed by
one consumer in a consumer group at a time
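The idea of an append-only partition log with sequential offsets can be modeled in a few lines of Java. This is only an in-memory sketch; the class name and methods are hypothetical and do not correspond to Kafka's storage code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of one partition: an append-only list of records.
class PartitionLogSketch {
    private final List<String> records = new ArrayList<>();

    // Append a record to the end of the log and return the offset it was assigned.
    long append(String value) {
        records.add(value);
        return records.size() - 1;
    }

    // Read the record stored at the given offset.
    String read(long offset) {
        return records.get((int) offset);
    }

    // Offset that the next appended record will receive.
    long endOffset() {
        return records.size();
    }

    public static void main(String[] args) {
        PartitionLogSketch log = new PartitionLogSketch();
        long first = log.append("order-created");
        long second = log.append("order-paid");
        System.out.println(first + ": " + log.read(first));
        System.out.println(second + ": " + log.read(second));
        System.out.println("next offset: " + log.endOffset());
    }
}
```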
Kafka Topic Partitions Layout
[Diagram: a topic with four partitions (Partition 0 to Partition 3); each partition is a sequence of numbered offsets running from older to newer, and writes are appended at the newer end]
Kafka Record retention
❖ Kafka cluster retains all published records
❖ Time based – configurable retention period
❖ Size based
❖ Compaction
❖ Retention policy can be, for example, three days, two weeks, or a month
❖ A record is available for consumption until discarded by time, size, or
compaction
❖ Consumption speed is not impacted by log size
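Time-based and size-based retention can be pictured with the following Java sketch. It is a simplified model under loud assumptions: size is counted in records and records are discarded one at a time, whereas a real broker measures bytes and removes whole log segments. All names are hypothetical, and compaction is not modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of time- and size-based retention for one partition.
// Assumptions: size is counted in records and records are discarded one by one.
class RetentionSketch {

    // One stored record: its offset and the time it was written (epoch millis).
    private static final class Entry {
        final long offset;
        final long timestampMs;

        Entry(long offset, long timestampMs) {
            this.offset = offset;
            this.timestampMs = timestampMs;
        }
    }

    private final Deque<Entry> log = new ArrayDeque<>();
    private final long retentionMs;
    private final int maxRecords;
    private long nextOffset = 0;

    RetentionSketch(long retentionMs, int maxRecords) {
        this.retentionMs = retentionMs;
        this.maxRecords = maxRecords;
    }

    // Append a record written at time nowMs; it receives the next offset.
    void append(long nowMs) {
        log.addLast(new Entry(nextOffset++, nowMs));
    }

    // Discard the oldest records while they are too old or the log is too large.
    void enforce(long nowMs) {
        while (!log.isEmpty()
                && (nowMs - log.peekFirst().timestampMs > retentionMs
                    || log.size() > maxRecords)) {
            log.removeFirst();
        }
    }

    // Oldest offset still available to consumers.
    long earliestOffset() {
        return log.isEmpty() ? nextOffset : log.peekFirst().offset;
    }

    int size() {
        return log.size();
    }

    public static void main(String[] args) {
        RetentionSketch partition = new RetentionSketch(1000L, 3);
        for (int i = 0; i < 5; i++) {
            partition.append(i * 100L);
        }
        partition.enforce(400L);
        System.out.println("earliest offset after cleanup: " + partition.earliestOffset());
    }
}
```

Offsets are never reused in this model: after old records are discarded, the earliest available offset simply moves forward.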
Kafka Consumers / Producers
[Diagram: producers append records to Partition 0; Consumer Group A and Consumer Group B each read from their own offset in the partition]
Consumers remember the offset where they left off.
Each consumer group has its own offset.
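The per-group offset can be illustrated with a small Java model in which two groups read the same partition independently. This is a sketch with hypothetical names; it tracks offsets in a local map and does not reflect how Kafka actually stores committed offsets.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: one partition read by several consumer groups,
// each group keeping its own position (offset) in the log.
class GroupOffsetSketch {
    private final List<String> partition;
    private final Map<String, Integer> nextOffsetByGroup = new HashMap<>();

    GroupOffsetSketch(List<String> partition) {
        this.partition = partition;
    }

    // Return the next record for the group and advance that group's offset.
    // Returns null when the group has caught up with the end of the log.
    String poll(String group) {
        int offset = nextOffsetByGroup.getOrDefault(group, 0);
        if (offset >= partition.size()) {
            return null;
        }
        nextOffsetByGroup.put(group, offset + 1);
        return partition.get(offset);
    }

    // Offset of the next record the group will read.
    int offsetOf(String group) {
        return nextOffsetByGroup.getOrDefault(group, 0);
    }

    public static void main(String[] args) {
        GroupOffsetSketch log = new GroupOffsetSketch(Arrays.asList("r0", "r1", "r2"));
        System.out.println("A reads " + log.poll("A"));
        System.out.println("A reads " + log.poll("A"));
        // Group B starts from its own offset, unaffected by group A.
        System.out.println("B reads " + log.poll("B"));
    }
}
```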
Kafka Partition Distribution
❖ Each partition has a leader server and zero or more follower
servers
❖ Leader handles all read and write requests for the partition
❖ Followers replicate the leader, and take over if the leader dies
❖ Used for parallel consumer handling within a group
❖ Partitions of log are distributed over the servers in the Kafka cluster
with each server handling data and requests for a share of partitions
❖ Each partition can be replicated across a configurable number of
Kafka servers
❖ Used for fault tolerance
Kafka Producers
❖ Producers send records to topics
❖ Producer picks which partition to send each record to, per
topic
❖ Can be done round-robin
❖ Can be based on priority
❖ Typically based on key of record
❖ Important: Producer picks partition
Kafka Consumers
❖ Consumers are grouped into a Consumer Group
❖ Consumer group has a unique name
❖ Each consumer group is a subscriber
❖ Each consumer group maintains its own offset
❖ Multiple subscribers = multiple consumer groups
❖ A record is delivered to one consumer in a consumer group
❖ Each consumer in a consumer group takes records, and only one
consumer in the group gets a given record
❖ Consumers in a consumer group load balance record
consumption
2 server Kafka cluster hosting 6 partitions (P0-P5)
[Diagram: Kafka cluster with Server 1 hosting P2, P3, P4 and Server 2 hosting P0, P1, P5; two consumer groups, Consumer Group A (C0, C1, C3) and Consumer Group B (C0, C1, C3)]
Kafka Consumer
Consumption
❖ Kafka divides a topic's partitions over the consumer instances in a group
❖ Each consumer is the exclusive consumer of a "fair share" of partitions
❖ Consumer membership in a group is handled dynamically by the Kafka
protocol
❖ If new consumers join a consumer group, they get a share of partitions
❖ If a consumer dies, its partitions are split among the remaining live
consumers in the group
❖ Order is only guaranteed within a single partition
❖ Since records are typically assigned to a partition by key, order per
partition is sufficient for most use cases
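The "fair share" assignment and the rebalance after a consumer dies can be sketched as follows. This is a hypothetical round-robin stand-in written for illustration; Kafka's actual assignment strategies are pluggable and may distribute partitions differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical round-robin partition assignment within one consumer group.
class AssignmentSketch {

    // Spread partitions 0..numPartitions-1 over the live consumers.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return result;
        }
        for (int p = 0; p < numPartitions; p++) {
            // Each partition goes to exactly one consumer in the group.
            String owner = consumers.get(p % consumers.size());
            result.get(owner).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // Three live consumers share six partitions.
        System.out.println(assign(Arrays.asList("C0", "C1", "C2"), 6));
        // One consumer dies: its partitions are split among the remaining ones.
        System.out.println(assign(Arrays.asList("C0", "C1"), 6));
    }
}
```

Re-running the assignment with the reduced consumer list is the simplest way to model a rebalance; every partition still has exactly one owner in the group.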
Kafka vs JMS Messaging
❖ It is a bit like both Queues and Topics in JMS
❖ Kafka acts like a queue within a consumer group: records are load
balanced over the group's consumers, like a JMS queue
❖ Kafka acts like a topic (pub/sub) by offering consumer groups, which act like
subscriptions
❖ Broadcast to multiple consumer groups
❖ By design Kafka is better suited for scale due to the partitioned topic log
❖ Also, the consumer rather than the broker controls its position in the
log, so the broker does less tracking
❖ Handles parallel consumers better
Kafka scalable message
storage
❖ Kafka acts as a good storage system for records/messages
❖ Records written to Kafka topics are persisted to disk and replicated to
other servers for fault-tolerance
❖ Kafka producers can wait on acknowledgement
❖ Write is not considered complete until fully replicated
❖ Kafka disk structures scale well
❖ Writing in large streaming batches is fast
❖ Clients/consumers control read position (offset)
❖ Kafka acts like a high-speed file system for commit log storage and
replication
Kafka Stream Processing
❖ Kafka for stream processing
❖ Kafka enables real-time processing of streams.
❖ Kafka supports stream processors
❖ Stream processor takes continual streams of records from input topics, performs some
processing, transformation, or aggregation on the input, and produces one or more output
streams
❖ A video player app might take in input streams of videos watched and videos paused, and
output a stream of user preferences, then gear new video recommendations to recent
user activity or to aggregate activity of many users to see what new videos are hot
❖ Kafka Streams API solves hard problems with out-of-order records, aggregating across
multiple streams, joining data from multiple streams, allowing for stateful computations, and
more
❖ Streams API builds on core Kafka primitives and has a life of its own
Using Kafka Single
Node
Run Kafka
❖ Run ZooKeeper
❖ Run Kafka Server/Broker
❖ Create Kafka Topic
❖ Run producer
❖ Run consumer
Run ZooKeeper
Run Kafka Server
Create Kafka Topic
Kafka Producer
Kafka Consumer
Running Kafka Producer and
Consumer
Lab 1-A: Use Kafka
Use Kafka to send and receive messages
Use single-server version of Kafka
Using Kafka Cluster
Running many nodes
❖ Modify properties files
❖ Change port
❖ Change Kafka log location
❖ Start up many Kafka server instances
❖ Create Replicated Topic
Leave everything from before
running
Create two new
server.properties files
❖ Copy existing server.properties to server-1.properties
and server-2.properties
❖ Change server-1.properties to use port 9093, broker
id 1, and log.dirs "/tmp/kafka-logs-1"
❖ Change server-2.properties to use port 9094, broker
id 2, and log.dirs "/tmp/kafka-logs-2"
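The per-broker overrides listed above can be expressed as a small Java sketch that builds the changed settings for broker N. The `broker.id` and `log.dirs` keys come from the slide; using the `listeners` key to set the port and deriving the port as 9092 plus the broker id are assumptions made here, since the slide's properties screenshot is not part of the transcript. The class and method names are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper that builds the settings which differ per broker.
class BrokerConfigSketch {

    // Overrides for the broker with the given id (1 -> port 9093, 2 -> port 9094).
    static Properties overridesFor(int brokerId) {
        Properties props = new Properties();
        props.setProperty("broker.id", Integer.toString(brokerId));
        // Assumption: the port is set through the listeners key, base port 9092.
        props.setProperty("listeners", "PLAINTEXT://:" + (9092 + brokerId));
        props.setProperty("log.dirs", "/tmp/kafka-logs-" + brokerId);
        return props;
    }

    public static void main(String[] args) {
        for (int id = 1; id <= 2; id++) {
            Properties props = overridesFor(id);
            System.out.println("server-" + id + ".properties overrides:");
            for (String name : props.stringPropertyNames()) {
                System.out.println("  " + name + "=" + props.getProperty(name));
            }
        }
    }
}
```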
server-x.properties
Start second and third servers
Create Kafka replicated topic
my-failsafe-topic
Start Kafka consumer and
producer
Kafka consumer and producer
running
Use Kafka Describe Topic
The leader is broker 0
There is only one partition
There are three in-sync replicas (ISR)
Test Failover by killing 1st
server
Use Kafka topic describe to see that a new leader was elected!
NEW LEADER IS 2!
Lab 2-A: Use Kafka
Use Kafka to send and receive messages
Use a Kafka cluster to replicate a Kafka topic log
Kafka Consumer and Producers
Working with producers and consumers
Step by step first example
Objectives Create Producer and Consumer
example
❖ Create simple example that creates a Kafka Consumer
and a Kafka Producer
❖ Create a new replicated Kafka topic
❖ Create Producer that uses topic to send records
❖ Send records with Kafka Producer
❖ Create Consumer that uses topic to receive messages
❖ Process messages from Kafka with Consumer
Create Replicated Kafka
Topic
Build script
Create Kafka Producer to send
records
❖ Specify bootstrap servers
❖ Specify client.id
❖ Specify Record Key serializer
❖ Specify Record Value serializer
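The four producer settings listed above can be collected in a `java.util.Properties` object. The sketch below uses only the JDK so it runs on its own; the broker addresses, the client id, and the choice of String serializers for both key and value are assumptions for illustration, and the class name is hypothetical.

```java
import java.util.Properties;

// Hypothetical helper that collects the producer settings named on the slide.
class ProducerConfigSketch {

    static Properties producerProps(String bootstrapServers, String clientId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("client.id", clientId);
        // Assumption: both key and value are Strings.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        // Assumed addresses: three local brokers on ports 9092 to 9094.
        Properties props = producerProps(
                "localhost:9092,localhost:9093,localhost:9094", "ExampleProducer");
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + "=" + props.getProperty(name));
        }
    }
}
```

With the kafka-clients library on the classpath, properties like these would typically be passed to the `KafkaProducer` constructor; that step is left out here so the sketch has no external dependency.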
Common Kafka imports and
constants
Create Kafka Producer to send
records
Send async records with Kafka
Producer
Send sync records with Kafka
Producer
Create Consumer using Topic to Receive
Records
❖ Specify bootstrap servers
❖ Specify client.id
❖ Specify Record Key deserializer
❖ Specify Record Value deserializer
❖ Specify Consumer Group
❖ Subscribe to Topic
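The consumer settings listed above can be sketched the same way, again with only the JDK. The broker address, client id, group name, and String deserializers are assumptions for illustration; the class name is hypothetical.

```java
import java.util.Properties;

// Hypothetical helper that collects the consumer settings named on the slide.
class ConsumerConfigSketch {

    static Properties consumerProps(String bootstrapServers, String clientId, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("client.id", clientId);
        // The group id names the consumer group this consumer belongs to.
        props.setProperty("group.id", groupId);
        // Assumption: both key and value are Strings.
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps("localhost:9092", "ExampleConsumer", "ExampleGroup");
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + "=" + props.getProperty(name));
        }
    }
}
```

With the kafka-clients library available, a `KafkaConsumer` built from these properties would then subscribe to the topic by name; consumers that share the same `group.id` split the topic's partitions among themselves.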
Create Consumer using Topic to Receive
Records
Process messages from Kafka with
Consumer
Running both Consumer and
Producer
Java Kafka simple example
recap
❖ Created simple example that creates a Kafka
Consumer and a Kafka Producer
❖ Created a new replicated Kafka topic
❖ Created Producer that uses topic to send records
❖ Sent records with Kafka Producer
❖ Created Consumer that uses topic to receive
messages
❖ Processed records from Kafka with Consumer


Editor's Notes

  • #8 https://cwiki.apache.org/confluence/display/KAFKA/Powered+By