Technical Principles of
Kafka
www.huawei.com
Copyright © 2018 Huawei Technologies Co., Ltd. All rights reserved.
Objectives
Upon completion of this course, you will understand:
Basic concepts and application scenarios of Kafka
System architecture of Kafka
Key processes of Kafka
Contents
1. Introduction to Kafka
2. Architecture and Functions of Kafka
3. Key Processes of Kafka
Kafka Overview
Definition of Kafka: Kafka is a high-throughput, distributed,
publish-subscribe messaging system. With Kafka, a large messaging
system can be built on low-cost servers.
Kafka Overview
Application scenarios
Compared with other components, Kafka features message persistence, high throughput,
distributed processing, and real-time processing. It suits online and offline message
consumption and massive data collection scenarios, such as website activity tracking,
operational data monitoring in aggregation and statistics systems, and log collection.
[Figure: Kafka application scenario. Components shown: frontend and backend producers, Flume, Kafka, Storm, Spark, Hadoop, and Farmer.]
Position of Kafka in FusionInsight
[Figure: FusionInsight architecture. Labels shown: Application service layer; OpenAPI/SDK; REST/SNMP/Syslog; Data, Information, Knowledge, Wisdom; DataFarm (Porter, Miner, Farmer); Manager (System management, Service governance, Security management); Hadoop API; Plugin API; Hadoop (M/R, Hive, Kafka, Spark Streaming, Solr, YARN/ZooKeeper, HDFS/HBase); LibrA.]
Kafka is a distributed messaging system that supports online and offline
message processing and provides Java APIs for other components.
Kafka Topology
[Figure: Kafka topology. Producers (front ends and a service) push messages to the Kafka brokers, which are connected to a ZooKeeper ensemble. Consumers (Hadoop cluster, real-time monitoring, other services, and data warehouse) pull messages from the brokers.]
Kafka Topics
Producers (Producer 1 to Producer N) append messages to the end of a topic.
Each consumer group (for example, consumer group 1 and consumer group 2) uses offsets to record its read position.
Kafka cleans up old messages based on time and size.
[Figure: a Kafka topic drawn as a sequence from older to newer messages, with producers appending at the new end and consumer groups reading at their own offsets.]
Kafka Partition
Each topic contains one or more partitions. Each partition is an
ordered and immutable sequence of messages. Partitions ensure high
throughput capabilities of Kafka.
Partition 0 0 1 2 3 4 5 6 7 8 9 10 11 12
Partition 1 0 1 2 3 4 5 6 7 8 9 Writes
Partition 2 0 1 2 3 4 5 6 7 8 9 10 11 12
Old New
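The append-only behavior of partitions can be sketched with a toy model. This is not Kafka client code: the Topic class is hypothetical, and using CRC32 of the message key to pick a partition is an assumption made for illustration only.

```python
import zlib


class Topic:
    """Toy model of a topic: a fixed number of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        """Append a message and return its (partition, offset)."""
        # Assumption: CRC32 of the key selects the partition, so messages
        # with the same key always land in the same partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        # The offset is simply the position of the message in the partition.
        return partition, len(log) - 1


topic = Topic(num_partitions=3)
first = topic.append("user-1", "login")
second = topic.append("user-1", "click")
```

Because both messages share a key, they go to the same partition and receive consecutive offsets, so their relative order is preserved within that partition.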
Kafka Partition
Consumer group A has two consumers that read data from four partitions.
Consumer group B has four consumers that read data from four partitions.
Kafka Cluster
Server 1 Server 2
P0 P3 P1 P2
C1 C2 C3 C4 C5 C6
Consumer group A Consumer group B
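The way partitions are shared inside a consumer group can be sketched as follows. The assign function and its round-robin rule are assumptions for illustration; the slide does not say which partition each consumer actually reads.

```python
def assign(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        # Each partition gets exactly one owner within the group.
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]
group_a = assign(partitions, ["C1", "C2"])              # two partitions each
group_b = assign(partitions, ["C3", "C4", "C5", "C6"])  # one partition each
```

Each group is assigned independently, so both groups cover all four partitions, while inside a group no partition has more than one reader.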
Kafka Partition Offset
The position of a message in a partition log is called its offset, a long integer
that uniquely identifies the message within that partition. Consumers use the topic,
partition, and offset together to track records.
Consumer
group C1
Partition 0 0 1 2 3 4 5 6 7 8 9 10 11 12
Partition 1 0 1 2 3 4 5 6 7 8 9 Writes
Partition 2 0 1 2 3 4 5 6 7 8 9 10 11 12
Old New
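Offset tracking can be sketched with a toy consumer that stores its next read position per (topic, partition). The Consumer class, the topic name "demo", and the batch size are hypothetical and exist only for this example.

```python
class Consumer:
    """Toy consumer that remembers the next offset per (topic, partition)."""

    def __init__(self):
        self.positions = {}

    def poll(self, topic, partition, log, max_records=5):
        """Return up to max_records messages starting at the saved position."""
        start = self.positions.get((topic, partition), 0)
        records = log[start:start + max_records]
        # Advance the position so the next poll continues where this one ended.
        self.positions[(topic, partition)] = start + len(records)
        return records


log = [f"msg-{offset}" for offset in range(13)]  # offsets 0..12
consumer = Consumer()
batch_1 = consumer.poll("demo", 0, log)
batch_2 = consumer.poll("demo", 0, log)
```

The second poll resumes at offset 5 because the position is stored per topic and partition, not per message.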
Kafka Partition Replica (1)
Kafka Cluster
Broker 1 Broker 2 Broker 3 Broker 4
Partition-0 Partition-1 Partition-2 Partition-3
Partition-3 Partition-0 Partition-1 Partition-2
Kafka Partition Replica (2)
[Figure: the producer writes to the leader partition (messages 1 to 10) and receives an ack. The follower partition (messages 1 to 7) pulls data from the leader through ReplicaFetcherThread.]
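The pull-based replication between follower and leader can be sketched as a toy model. The Replica class and fetch_from_leader function are hypothetical; they only mimic the idea that a follower asks the leader for everything after its own log end.

```python
class Replica:
    """Toy replica: a list whose index stands in for the message offset."""

    def __init__(self):
        self.log = []

    def log_end_offset(self):
        return len(self.log)


def fetch_from_leader(leader, follower, max_records=4):
    """Copy the next batch from the leader; return how many records moved."""
    start = follower.log_end_offset()
    batch = leader.log[start:start + max_records]
    follower.log.extend(batch)
    return len(batch)


leader, follower = Replica(), Replica()
leader.log.extend(range(1, 11))   # leader holds messages 1..10
follower.log.extend(range(1, 8))  # follower has replicated 1..7

fetched = fetch_from_leader(leader, follower)
```

After one fetch the follower has caught up with the leader, and a further fetch would return nothing until the producer writes again.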
Kafka Partition Replica (3)
[Figure: two replica layouts across Broker 1, Broker 2, and Broker 3. In the first, Broker 1 holds the leaders of Partition-0 and Partition-1, Broker 2 and Broker 3 hold the followers, and each follower broker runs a ReplicaFetcherThread. In the second, Broker 1 holds the leader of Partition-0, Broker 2 holds the leader of Partition-1, and Broker 3 holds the followers of both, using ReplicaFetcherThread-1 and ReplicaFetcherThread-2.]
Kafka Logs (1)
A large file in a partition is split into multiple small segments. Segments
make it easy to periodically clear or delete files that have already been
consumed, which reduces disk usage.
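Locating a message in a segmented log can be sketched with a binary search over the first offset of each segment. The find_segment function and the segment boundaries below are hypothetical; the sketch only assumes that each segment is identified by the offset of its first message.

```python
import bisect


def find_segment(base_offsets, offset):
    """Return the base offset of the segment that contains `offset`."""
    # base_offsets must be sorted; the right segment is the last one whose
    # base offset is less than or equal to the requested offset.
    index = bisect.bisect_right(base_offsets, offset) - 1
    if index < 0:
        raise ValueError(f"offset {offset} is older than the first segment")
    return base_offsets[index]


# Hypothetical partition whose segments start at these offsets.
segments = [0, 1000, 2000]
```

With this layout, deleting old data means dropping whole segments from the front of the list rather than rewriting one large file.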
Kafka Logs (2)
Kafka Logs (3)
Kafka Log Cleanup (1)
Log cleanup modes: delete and compact.
Thresholds for deleting logs: the retention time limit and the total size of
all logs in a partition.
Parameter            Default Value  Range                     Description
log.cleanup.policy   delete         delete or compact         Outdated log cleanup policy.
log.retention.hours  168            1 ~ 2147483647            Maximum retention time of log files. Unit: hour.
log.retention.bytes  -1             -1 ~ 9223372036854775807  Maximum size of log data in a partition. By default, the value is not restricted. Unit: byte.
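How the two delete thresholds interact can be sketched with a toy function. The segments_to_delete function and the segment tuples are hypothetical, and the sketch is simplified: it works on whole segments, oldest first, and ignores details such as protecting the segment currently being written.

```python
HOUR_MS = 60 * 60 * 1000


def segments_to_delete(segments, now_ms, retention_hours=168, retention_bytes=-1):
    """Return the names of segments a delete policy would remove.

    `segments` is a list of (name, last_modified_ms, size_bytes), oldest first.
    """
    doomed = []
    remaining = list(segments)

    # Time threshold: drop segments older than the retention time.
    cutoff = now_ms - retention_hours * HOUR_MS
    while remaining and remaining[0][1] < cutoff:
        doomed.append(remaining.pop(0)[0])

    # Size threshold: -1 means unrestricted; otherwise drop the oldest
    # segments until the partition fits within retention_bytes.
    if retention_bytes >= 0:
        total = sum(size for _, _, size in remaining)
        while remaining and total > retention_bytes:
            name, _, size = remaining.pop(0)
            doomed.append(name)
            total -= size

    return doomed


now = 200 * HOUR_MS
segments = [
    ("seg-0", 10 * HOUR_MS, 500),   # 190 hours old
    ("seg-1", 100 * HOUR_MS, 500),  # 100 hours old
    ("seg-2", 199 * HOUR_MS, 500),  # 1 hour old
]
by_time = segments_to_delete(segments, now)
by_size = segments_to_delete(segments, now, retention_bytes=600)
```

With the default values only the 190-hour-old segment exceeds the 168-hour limit. Adding a 600-byte size limit also removes the next-oldest segment, because the two remaining segments together hold 1000 bytes.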
Kafka Log Cleanup (2)
Kafka Data Reliability
All Kafka messages are stored on disk, and topic partitions are
replicated to ensure data reliability.
How is data reliability ensured during message delivery?
Message Delivery Semantics
There are three data delivery modes:
At Most Once
Messages may be lost.
Messages are never redelivered or reprocessed.
At Least Once
Messages are never lost.
Messages may be redelivered and reprocessed.
Exactly Once
Messages are never lost.
Messages are processed only once.
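On the consumer side, the difference between at-most-once and at-least-once can be illustrated by the order in which a consumer commits its position and processes a message. The consume function below is a hypothetical simulation with one injected crash between the two steps; it is a sketch of the idea, not Kafka client code.

```python
def consume(messages, commit_first, crash_at):
    """Process messages with one simulated crash and restart.

    The crash happens between the commit step and the processing step of
    the message at index `crash_at`. Returns the processed messages.
    """
    processed = []
    committed = 0  # offset to resume from after a restart
    crashed = False
    offset = 0
    while offset < len(messages):
        crash_now = offset == crash_at and not crashed
        if commit_first:
            committed = offset + 1               # step 1: commit
            if crash_now:
                crashed, offset = True, committed
                continue                         # message is never processed
            processed.append(messages[offset])   # step 2: process
        else:
            processed.append(messages[offset])   # step 1: process
            if crash_now:
                crashed, offset = True, committed
                continue                         # message will be processed again
            committed = offset + 1               # step 2: commit
        offset += 1
    return processed


messages = ["a", "b", "c"]
at_most_once = consume(messages, commit_first=True, crash_at=1)
at_least_once = consume(messages, commit_first=False, crash_at=1)
```

Committing first loses message "b" after the crash, while processing first delivers "b" twice. Exactly-once behavior needs the commit and the processing result to succeed or fail together, which this sketch does not model.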
Kafka Message Delivery
Messages are delivered in different modes to ensure reliability in different application
scenarios.
Delivery mode                                          No replicas    Synchronous replication  Asynchronous replication
                                                                      (leader and followers)   (leader)
Synchronous delivery without confirmation              At most once   At most once             At most once
Synchronous delivery with confirmation                 At least once  At least once            Messages may be lost or repeated.
Asynchronous delivery without confirmation             At most once   At most once             At most once
Asynchronous delivery with confirmation and retries    At least once  At least once            Messages may be lost or repeated.
Asynchronous delivery with confirmation but no retries At least once  At least once            Messages may be lost or repeated.
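On the producer side, the effect of confirmation and retries can be sketched with a toy simulation of a single broker without replicas. The produce function and its failure-injection parameters (dropped, lost_acks) are hypothetical; they only show why sending without confirmation can lose messages and why retrying after a lost acknowledgement can repeat them.

```python
def produce(messages, confirm, retries, dropped=(), lost_acks=()):
    """Send messages to a toy broker and return the broker's log.

    `dropped` and `lost_acks` hold attempt numbers (counted from 0 across
    the whole run) at which the request or its acknowledgement is lost.
    """
    log = []
    attempt = 0
    for message in messages:
        # Without confirmation the producer cannot detect failures, so it
        # sends each message exactly once.
        for _ in range(1 + (retries if confirm else 0)):
            request_lost = attempt in dropped
            ack_lost = attempt in lost_acks
            attempt += 1
            if not request_lost:
                log.append(message)  # the broker stored the message
            if not confirm or not (request_lost or ack_lost):
                break  # no confirmation expected, or the ack arrived
    return log


messages = ["m0", "m1", "m2"]
fire_and_forget = produce(messages, confirm=False, retries=0, dropped={1})
with_retries = produce(messages, confirm=True, retries=2, lost_acks={1})
```

In the first run the second request is dropped and never resent, so "m1" is missing (at most once). In the second run the broker stores "m1" but its acknowledgement is lost, so the retry writes it again (at least once).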
Kafka Cluster Mirroring
Contents
1. Introduction to Kafka
2. Architecture and Functions of Kafka
3. Key Processes of Kafka
Kafka Write Process
Kafka Read Process
Write Data by Producer
[Figure: a producer creates messages from data and publishes them to the Kafka cluster.]
Read Data by Consumer
Overall process: A consumer connects to the leader broker where the specified topic
partition is located and pulls messages from Kafka logs.
[Figure: a consumer subscribes to the Kafka cluster, receives messages, and processes them into data.]
Summary
This module describes the following information about Kafka:
basic concepts and application scenarios, system architecture
and key processes.
Quiz
1. Which of the following are features of Kafka? ( )
A. High throughput
B. Distributed
C. Data persistence
D. Random message read
Quiz
2. Which component does Kafka directly depend on to run? ( )
A. HDFS
B. ZooKeeper
C. HBase
D. Spark
Quiz
1. How is Kafka data reliability ensured?
2. What operations can be performed on topics with the shell commands
provided by the Kafka client?
More Information
Training materials:
http://support.huawei.com/learning/Certificate!showCertificate?lang=en&pbiPath=term1000025450&id=Node1000011796
Exam outline:
http://support.huawei.com/learning/Certificate!toExamOutlineDetail?lang=en&nodeId=Node1000011797
Mock exam:
http://support.huawei.com/learning/Certificate!toSimExamDetail?lang=en&nodeId=Node1000011798
Authentication process:
http://support.huawei.com/learning/NavigationAction!createNavi#navi[id]=_40
Thank You