Kafka Concepts


Data Engineering 101 - Kafka

KAFKA: Concepts

KAFKA BROKER
A KAFKA BROKER IS A SERVER THAT
RUNS THE KAFKA SOFTWARE AND IS
RESPONSIBLE FOR STORING AND
SERVING DATA. BROKERS RECEIVE
MESSAGES FROM PRODUCERS, ASSIGN
OFFSETS TO MESSAGES, AND STORE
THEM ON DISK.

EXAMPLE: IN A KAFKA CLUSTER,
MULTIPLE BROKERS WORK TOGETHER
TO ENSURE DATA IS RELIABLY STORED
AND SERVED.

Shwetank Singh
GritSetGrow - GSGLearn.com

TOPICS
TOPICS ARE LOGICAL CHANNELS TO
WHICH MESSAGES ARE SENT BY
PRODUCERS AND FROM WHICH
MESSAGES ARE READ BY CONSUMERS.
A TOPIC IS DIVIDED INTO MULTIPLE
PARTITIONS TO ALLOW PARALLEL
PROCESSING.

EXAMPLE: A "USER_ACTIVITY" TOPIC
MIGHT BE DIVIDED INTO SEVERAL
PARTITIONS TO HANDLE HIGH
MESSAGE VOLUME.


PARTITIONS

PARTITIONS ARE SUBDIVISIONS OF
TOPICS. EACH PARTITION IS AN
ORDERED, IMMUTABLE SEQUENCE OF
MESSAGES THAT IS CONTINUALLY
APPENDED TO. PARTITIONS ENABLE
KAFKA TO SCALE HORIZONTALLY.
MESSAGE ORDER IS GUARANTEED ONLY
WITHIN A PARTITION, NOT ACROSS
THE PARTITIONS OF A TOPIC.

EXAMPLE: PARTITION 0 OF THE
"USER_ACTIVITY" TOPIC STORES
MESSAGES FOR A SPECIFIC SUBSET OF
USERS.


PRODUCERS
PRODUCERS ARE CLIENTS THAT SEND
MESSAGES TO KAFKA TOPICS. THEY
CAN SEND MESSAGES TO SPECIFIC
PARTITIONS BASED ON A
PARTITIONING STRATEGY OR
DISTRIBUTE THEM EVENLY ACROSS ALL
PARTITIONS.

EXAMPLE: A WEB APPLICATION THAT
LOGS USER ACTIVITY SENDS THESE
LOGS TO A KAFKA TOPIC AS
MESSAGES.
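
A KEY-BASED PARTITIONING STRATEGY CAN BE SKETCHED IN PYTHON AS FOLLOWS.
THIS IS A TOY MODEL, NOT KAFKA CLIENT CODE: THE NAME choose_partition IS
HYPOTHETICAL, AND zlib.crc32 STANDS IN FOR THE HASH (KAFKA'S DEFAULT
PARTITIONER USES A DIFFERENT HASH FUNCTION). THE POINT IS ONLY THAT THE
SAME KEY ALWAYS LANDS ON THE SAME PARTITION.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index using hash-modulo."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # crc32 is a stand-in hash chosen for illustration only.
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    for user in (b"user-1", b"user-2", b"user-3"):
        print(user.decode(), "->", choose_partition(user, 6))
```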


CONSUMERS
CONSUMERS ARE CLIENTS THAT READ
MESSAGES FROM KAFKA TOPICS.
CONSUMERS CAN OPERATE
INDIVIDUALLY OR AS PART OF A
CONSUMER GROUP, WHICH ALLOWS
FOR PARALLEL PROCESSING OF
MESSAGES.

EXAMPLE: AN ANALYTICS SERVICE
READS USER ACTIVITY LOGS FROM A
KAFKA TOPIC TO GENERATE REPORTS.


CONSUMER GROUPS
CONSUMER GROUPS ALLOW MULTIPLE
CONSUMERS TO COLLABORATE ON
PROCESSING MESSAGES FROM A
TOPIC. EACH PARTITION IN A TOPIC IS
ASSIGNED TO ONLY ONE CONSUMER
WITHIN A GROUP AT A TIME,
ENSURING PARALLEL PROCESSING
AND LOAD BALANCING.

EXAMPLE: THREE CONSUMERS IN A
GROUP PROCESS MESSAGES FROM SIX
PARTITIONS, TWO PARTITIONS EACH.
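
THE ONE-PARTITION-ONE-CONSUMER RULE CAN BE ILLUSTRATED WITH A SMALL
PYTHON SKETCH. IT ASSUMES A SIMPLE ROUND-ROBIN SPREAD; KAFKA'S REAL
ASSIGNORS ARE MORE ELABORATE, AND THE NAME assign_partitions IS
HYPOTHETICAL.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    print(assign_partitions(range(6), ["c1", "c2", "c3"]))
```

WITH MORE CONSUMERS THAN PARTITIONS, THE EXTRA CONSUMERS IN THIS MODEL
RECEIVE AN EMPTY LIST AND STAY IDLE.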


OFFSETS
OFFSETS ARE UNIQUE IDENTIFIERS
ASSIGNED TO EACH MESSAGE WITHIN
A PARTITION. CONSUMERS USE
OFFSETS TO TRACK WHICH MESSAGES
HAVE BEEN READ.

EXAMPLE: A CONSUMER READS
MESSAGES UP TO OFFSET 105 AND
RESUMES FROM OFFSET 106 AFTER A
RESTART.
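
A MINIMAL PYTHON SKETCH OF RESUMING FROM A COMMITTED OFFSET, USING AN
IN-MEMORY LIST AS A STAND-IN FOR ONE PARTITION. THE NAMES PartitionLog
AND consume ARE HYPOTHETICAL; NO KAFKA CLIENT IS INVOLVED.

```python
class PartitionLog:
    """In-memory stand-in for one partition: an append-only list."""

    def __init__(self):
        self._records = []

    def append(self, value):
        self._records.append(value)
        return len(self._records) - 1  # offset assigned to the new record

    def read_from(self, offset):
        return self._records[offset:]


def consume(log, committed_offset):
    """Read from the committed position; return values and the next offset."""
    values = log.read_from(committed_offset)
    return values, committed_offset + len(values)


if __name__ == "__main__":
    log = PartitionLog()
    for i in range(110):
        log.append(f"m{i}")
    # The consumer handled offsets 0..105 earlier, so it committed 106.
    print(consume(log, 106))
```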


KAFKA CLUSTER
A KAFKA CLUSTER IS COMPOSED OF
MULTIPLE BROKERS THAT WORK
TOGETHER. CLUSTERS PROVIDE FAULT
TOLERANCE AND HIGH AVAILABILITY.

EXAMPLE: A CLUSTER WITH THREE
BROKERS CAN CONTINUE OPERATING
IF ONE BROKER FAILS, PROVIDED ITS
PARTITIONS ARE REPLICATED ON THE
REMAINING BROKERS.


REPLICATION
KAFKA REPLICATES PARTITIONS
ACROSS MULTIPLE BROKERS TO
ENSURE FAULT TOLERANCE. EACH
PARTITION HAS ONE LEADER AND
ZERO OR MORE FOLLOWERS,
DEPENDING ON THE REPLICATION
FACTOR. THE LEADER HANDLES ALL
WRITES AND, BY DEFAULT, ALL
READS, WHILE FOLLOWERS REPLICATE
THE DATA.

EXAMPLE: PARTITION 0 HAS ONE
LEADER AND TWO FOLLOWERS ACROSS
THREE BROKERS.


ZOOKEEPER

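ZOOKEEPER HAS TRADITIONALLY BEEN
USED FOR DISTRIBUTED COORDINATION
AND METADATA MANAGEMENT IN KAFKA.
IT STORES BROKER METADATA AND
CONFIGURATION AND IS USED TO
ELECT THE CONTROLLER BROKER.
NEWER KAFKA VERSIONS REPLACE
ZOOKEEPER WITH THE BUILT-IN KRAFT
CONSENSUS PROTOCOL, AND KAFKA 4.0
REMOVES ZOOKEEPER ENTIRELY.

EXAMPLE: IF THE CONTROLLER BROKER
FAILS, ZOOKEEPER IS USED TO ELECT
A NEW CONTROLLER, WHICH THEN
ASSIGNS NEW PARTITION LEADERS.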


PRODUCERS AND ACKS

PRODUCERS SEND MESSAGES TO


BROKERS AND CAN CONFIGURE
ACKNOWLEDGMENT SETTINGS (ACKS)
TO ENSURE RELIABLE MESSAGE
DELIVERY.

EXAMPLE: A PRODUCER SETS acks=all
TO WAIT FOR CONFIRMATION FROM ALL
IN-SYNC REPLICAS BEFORE
CONSIDERING A MESSAGE SENT.
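
THE THREE VALUES OF THE PRODUCER SETTING acks (0, 1, all) CAN BE
COMPARED WITH A SMALL PYTHON DECISION FUNCTION. THIS IS A SIMPLIFIED
MODEL UNDER STATED ASSUMPTIONS: THE NAME is_acknowledged AND ITS
PARAMETERS ARE HYPOTHETICAL, AND BROKER-SIDE SETTINGS SUCH AS A MINIMUM
IN-SYNC REPLICA COUNT ARE NOT MODELED.

```python
def is_acknowledged(acks, leader_wrote, in_sync_replicas, replicas_with_record):
    """Decide whether a send counts as acknowledged under a given acks value.

    acks: "0", "1", or "all".
    leader_wrote: True if the partition leader appended the record.
    in_sync_replicas: set of broker ids currently in sync (leader included).
    replicas_with_record: set of broker ids that hold the record.
    """
    if acks == "0":
        return True  # fire and forget: the producer awaits no response
    if acks == "1":
        return leader_wrote  # the leader alone is enough
    if acks == "all":
        # Every in-sync replica must hold the record.
        return leader_wrote and in_sync_replicas <= replicas_with_record
    raise ValueError(f"unsupported acks value: {acks!r}")
```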


RETENTION POLICY

KAFKA TOPICS CAN HAVE RETENTION


POLICIES THAT DETERMINE HOW LONG
MESSAGES ARE STORED. POLICIES CAN
BE TIME-BASED OR SIZE-BASED.

EXAMPLE: A TOPIC IS CONFIGURED TO
RETAIN MESSAGES FOR 7 DAYS, AFTER
WHICH THEY ARE DELETED.
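
TIME-BASED RETENTION CAN BE SKETCHED IN PYTHON AS A FILTER OVER
TIMESTAMPED RECORDS. THIS IS A DELIBERATE SIMPLIFICATION: KAFKA DELETES
WHOLE LOG SEGMENTS RATHER THAN INDIVIDUAL RECORDS, AND THE NAMES
apply_time_retention AND SEVEN_DAYS_MS ARE HYPOTHETICAL.

```python
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000


def apply_time_retention(records, now_ms, retention_ms):
    """Keep only (timestamp_ms, value) records no older than retention_ms."""
    cutoff = now_ms - retention_ms
    return [(ts, value) for ts, value in records if ts >= cutoff]
```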


LOG COMPACTION

LOG COMPACTION ENSURES THAT AT
LEAST THE LATEST MESSAGE FOR EACH
KEY IS RETAINED IN A TOPIC, USEFUL
FOR MAINTAINING THE LATEST STATE.
OLDER MESSAGES FOR THE SAME KEY
ARE REMOVED IN THE BACKGROUND.

EXAMPLE: A LOG-COMPACTED TOPIC
KEEPS THE LATEST UPDATE FOR EACH
USER PROFILE AND EVENTUALLY
DISCARDS EARLIER UPDATES.
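
THE EFFECT OF COMPACTION CAN BE SKETCHED IN PYTHON AS KEEPING THE LAST
RECORD PER KEY. THIS IS A TOY MODEL: THE NAME compact IS HYPOTHETICAL,
AND A VALUE OF None IS TREATED AS A DELETE MARKER THAT IS DROPPED
IMMEDIATELY, WHEREAS KAFKA KEEPS SUCH MARKERS FOR A WHILE.

```python
def compact(log):
    """Return the log with only the last record per key, in offset order.

    log: list of (key, value) pairs in offset order.
    A value of None marks the key as deleted.
    """
    last_index = {key: i for i, (key, _) in enumerate(log)}
    return [
        (key, value)
        for i, (key, value) in enumerate(log)
        if last_index[key] == i and value is not None
    ]
```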


KAFKA CONNECT

KAFKA CONNECT IS A FRAMEWORK


FOR INTEGRATING KAFKA WITH OTHER
DATA SYSTEMS. IT PROVIDES
CONNECTORS TO MOVE DATA IN AND
OUT OF KAFKA.

EXAMPLE: USING KAFKA CONNECT TO


SYNC DATA BETWEEN A MYSQL
DATABASE AND A KAFKA TOPIC.


KAFKA STREAMS

KAFKA STREAMS IS A LIBRARY FOR


BUILDING STREAM PROCESSING
APPLICATIONS ON TOP OF KAFKA. IT
ALLOWS PROCESSING AND
TRANSFORMING DATA IN REAL TIME.

EXAMPLE: AN APPLICATION USING


KAFKA STREAMS AGGREGATES
CLICKSTREAM DATA TO GENERATE
REAL-TIME METRICS.


MIRRORMAKER

MIRRORMAKER IS A TOOL FOR


REPLICATING DATA BETWEEN KAFKA
CLUSTERS, OFTEN USED FOR CROSS-
DATACENTER REPLICATION.

EXAMPLE: USING MIRRORMAKER TO


REPLICATE MESSAGES FROM A
PRIMARY DATACENTER TO A BACKUP
DATACENTER.


KAFKA API

KAFKA PROVIDES APIS FOR


PRODUCING, CONSUMING, AND
MANAGING DATA, INCLUDING
PRODUCER API, CONSUMER API, AND
ADMIN API.

EXAMPLE: USING THE PRODUCER API


TO SEND MESSAGES FROM A JAVA
APPLICATION TO A KAFKA TOPIC.


SECURITY

KAFKA SUPPORTS VARIOUS SECURITY


FEATURES, INCLUDING SSL
ENCRYPTION, SASL AUTHENTICATION,
AND ACLS FOR AUTHORIZATION.

EXAMPLE: CONFIGURING SSL TO


ENCRYPT DATA IN TRANSIT AND SASL
FOR CLIENT AUTHENTICATION.


ADMINCLIENT API

THE ADMINCLIENT API ALLOWS


PROGRAMMATIC MANAGEMENT OF
KAFKA TOPICS, BROKERS, AND
CONFIGURATIONS.

EXAMPLE: USING ADMINCLIENT TO


CREATE A NEW TOPIC AND CONFIGURE
ITS RETENTION POLICY.


MONITORING AND
METRICS
KAFKA EXPOSES METRICS, PRIMARILY
THROUGH JMX, FOR MONITORING
CLUSTER HEALTH AND PERFORMANCE.
TOOLS LIKE PROMETHEUS AND GRAFANA
CAN BE USED TO COLLECT AND
VISUALIZE THESE METRICS.

EXAMPLE: MONITORING CONSUMER


LAG AND BROKER HEALTH USING
PROMETHEUS AND GRAFANA
DASHBOARDS.


MESSAGE DELIVERY
SEMANTICS
KAFKA SUPPORTS THREE TYPES OF
MESSAGE DELIVERY SEMANTICS: AT
MOST ONCE, AT LEAST ONCE, AND
EXACTLY ONCE.

EXAMPLE: ENABLING IDEMPOTENCE AND
TRANSACTIONS ON A PRODUCER SO
THAT MESSAGES ARE NEITHER LOST
NOR DUPLICATED WITHIN KAFKA.
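
THE DIFFERENCE BETWEEN AT-MOST-ONCE AND AT-LEAST-ONCE COMES DOWN TO
WHETHER A CONSUMER COMMITS ITS OFFSET BEFORE OR AFTER PROCESSING. THE
PYTHON SKETCH BELOW SIMULATES ONE CRASH AND ONE RESTART UNDER EACH
ORDERING. IT IS A TOY MODEL WITH HYPOTHETICAL NAMES, AND IT DOES NOT
MODEL EXACTLY-ONCE.

```python
def run_consumer(messages, commit_first, crash_at):
    """Simulate a consumer that crashes once at index crash_at, then restarts.

    Returns every message whose processing completed, across both runs.
    Pass crash_at=None for a run without a crash.
    """
    processed = []
    committed = 0  # next offset to read after a restart
    for offset in range(len(messages)):
        if commit_first:
            committed = offset + 1
            if offset == crash_at:
                break  # crashed after the commit, before processing
            processed.append(messages[offset])
        else:
            processed.append(messages[offset])
            if offset == crash_at:
                break  # crashed after processing, before the commit
            committed = offset + 1
    # Restart: resume from the last committed offset, with no further crash.
    processed.extend(messages[committed:])
    return processed


if __name__ == "__main__":
    print(run_consumer(["a", "b", "c"], commit_first=True, crash_at=1))
    print(run_consumer(["a", "b", "c"], commit_first=False, crash_at=1))
```

COMMITTING FIRST LOSES THE MESSAGE IN FLIGHT AT THE CRASH; COMMITTING
LAST PROCESSES IT TWICE.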


STATEFUL PROCESSING

KAFKA STREAMS SUPPORTS STATEFUL


PROCESSING, ALLOWING
APPLICATIONS TO MAINTAIN STATE
ACROSS MESSAGES USING STATE
STORES.

EXAMPLE: A STREAM PROCESSING


APPLICATION THAT MAINTAINS A
RUNNING COUNT OF EVENTS OVER A
WINDOW OF TIME.


WINDOWED OPERATIONS

KAFKA STREAMS PROVIDES SUPPORT


FOR WINDOWED OPERATIONS,
ENABLING TIME-BASED
AGGREGATIONS AND
TRANSFORMATIONS.

EXAMPLE: CALCULATING THE AVERAGE
NUMBER OF USER CLICKS PER MINUTE
USING WINDOWED OPERATIONS.
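
A TIME-BASED AGGREGATION CAN BE SKETCHED IN PLAIN PYTHON AS A COUNT PER
FIXED, NON-OVERLAPPING ONE-MINUTE WINDOW. THIS IS NOT KAFKA STREAMS
CODE; THE NAME tumbling_window_counts IS HYPOTHETICAL, AND TIMESTAMPS
ARE ASSUMED TO BE MILLISECONDS.

```python
from collections import Counter


def tumbling_window_counts(timestamps_ms, window_ms=60_000):
    """Count events per fixed window, keyed by the window start time."""
    counts = Counter((ts // window_ms) * window_ms for ts in timestamps_ms)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    clicks = [0, 1_000, 59_999, 60_000, 125_000]
    per_minute = tumbling_window_counts(clicks)
    print(per_minute)
    print(sum(per_minute.values()) / len(per_minute))  # average per window
```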


KSQL

KSQL (NOW KSQLDB, A CONFLUENT
PROJECT) IS A SQL-LIKE INTERFACE
FOR STREAM PROCESSING ON KAFKA,
SIMPLIFYING THE CREATION OF
STREAM PROCESSING APPLICATIONS.

EXAMPLE: USING KSQL TO FILTER,


AGGREGATE, AND TRANSFORM
STREAMS OF DATA IN REAL TIME.


KAFKA ECOSYSTEM

KAFKA'S ECOSYSTEM INCLUDES


VARIOUS TOOLS AND FRAMEWORKS
FOR COMPREHENSIVE DATA
PROCESSING, SUCH AS KAFKA
CONNECT, KAFKA STREAMS, AND
KSQL.

EXAMPLE: INTEGRATING KAFKA WITH


A RELATIONAL DATABASE USING
KAFKA CONNECT AND PROCESSING
THE DATA WITH KAFKA STREAMS.


PUBLISH/SUBSCRIBE
MESSAGING
PUB/SUB SYSTEMS ALLOW
DECOUPLING OF MESSAGE PRODUCERS
AND CONSUMERS. KAFKA ACTS AS A
BROKER FACILITATING THIS.

EXAMPLE: AN APPLICATION
PUBLISHES USER ACTIVITY LOGS
WHICH CAN BE CONSUMED BY
ANALYTICS SERVICES.


MESSAGES AND BATCHES

MESSAGES ARE THE BASIC UNIT OF


DATA IN KAFKA, STORED AS BYTE
ARRAYS. MESSAGES ARE WRITTEN IN
BATCHES FOR EFFICIENCY.

EXAMPLE: A BATCH OF LOG MESSAGES
SENT FROM AN APPLICATION.
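
BATCHING CAN BE SKETCHED IN PYTHON AS GROUPING BYTE-ARRAY MESSAGES UP
TO A SIZE LIMIT. THE NAME make_batches AND THE LIMIT PARAMETER ARE
HYPOTHETICAL; THIS ILLUSTRATES THE IDEA ONLY AND IS NOT HOW THE KAFKA
PRODUCER IS IMPLEMENTED.

```python
def make_batches(messages, max_batch_bytes):
    """Group byte-string messages into batches of at most max_batch_bytes.

    A single message larger than the limit gets a batch of its own.
    """
    batches, current, size = [], [], 0
    for message in messages:
        if current and size + len(message) > max_batch_bytes:
            batches.append(current)
            current, size = [], 0
        current.append(message)
        size += len(message)
    if current:
        batches.append(current)
    return batches
```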


SCHEMAS

SCHEMAS DEFINE THE STRUCTURE OF


MESSAGES, ENSURING CONSISTENCY.
APACHE AVRO IS A COMMON
SERIALIZATION FRAMEWORK USED
WITH KAFKA.

EXAMPLE: AVRO SCHEMA FOR USER


PROFILE DATA.


TOPICS AND PARTITIONS

TOPICS ARE CATEGORIES TO WHICH
MESSAGES ARE PUBLISHED. TOPICS
ARE DIVIDED INTO PARTITIONS FOR
SCALABILITY, AND PARTITIONS ARE
REPLICATED FOR REDUNDANCY.

EXAMPLE: A "USER_ACTIVITY" TOPIC


WITH PARTITIONS.


PRODUCERS AND
CONSUMERS
PRODUCERS CREATE AND SEND
MESSAGES TO KAFKA TOPICS.
CONSUMERS READ MESSAGES FROM
TOPICS.

EXAMPLE: A MICROSERVICE
PRODUCING ORDER DATA AND
ANOTHER CONSUMING FOR
PROCESSING.


BROKERS AND CLUSTERS

A BROKER IS A KAFKA SERVER THAT


STORES DATA AND SERVES CLIENTS.
MULTIPLE BROKERS FORM A KAFKA
CLUSTER, PROVIDING FAULT
TOLERANCE AND SCALABILITY.

EXAMPLE: A KAFKA CLUSTER WITH


THREE BROKERS.


DISK-BASED RETENTION

KAFKA RETAINS MESSAGES ON DISK


FOR A CONFIGURED PERIOD,
ALLOWING CONSUMERS TO READ AT
THEIR PACE.

EXAMPLE: RETAINING LOGS FOR 7


DAYS.


MULTIPLE PRODUCERS
AND CONSUMERS
KAFKA SUPPORTS MULTIPLE
PRODUCERS AND CONSUMERS FOR THE
SAME TOPIC, ENABLING FLEXIBLE
DATA PIPELINES.

EXAMPLE: MULTIPLE SENSORS


PRODUCING DATA TO A SINGLE TOPIC,
MULTIPLE ANALYTICS SERVICES
CONSUMING IT.


HIGH THROUGHPUT

KAFKA CAN HANDLE LARGE VOLUMES


OF MESSAGES EFFICIENTLY DUE TO ITS
ARCHITECTURE.

EXAMPLE: PROCESSING MILLIONS OF


LOG ENTRIES PER SECOND.


STREAM PROCESSING

KAFKA SUPPORTS REAL-TIME


PROCESSING OF STREAMS OF DATA
USING TOOLS LIKE KAFKA STREAMS.

EXAMPLE: REAL-TIME ANALYTICS ON


INCOMING TRANSACTION DATA.


KAFKA CONNECT

KAFKA CONNECT SIMPLIFIES THE


INTEGRATION OF KAFKA WITH OTHER
DATA SYSTEMS.

EXAMPLE: USING KAFKA CONNECT TO


SYNC DATA BETWEEN A DATABASE
AND A KAFKA TOPIC.


KAFKA STREAMS API

KAFKA STREAMS API ALLOWS BUILDING


STREAM PROCESSING APPLICATIONS
WITH KAFKA.

EXAMPLE: AN APPLICATION THAT


AGGREGATES USER CLICKSTREAM DATA
IN REAL-TIME.


LOG COMPACTION

KAFKA CAN RETAIN ONLY THE LATEST


MESSAGE PER KEY IN A LOG-
COMPACTED TOPIC, USEFUL FOR
CHANGELOG DATA.

EXAMPLE: KEEPING ONLY THE LATEST


UPDATE TO USER PROFILES.


EXACTLY ONCE
SEMANTICS
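KAFKA CAN PROVIDE EXACTLY-ONCE
PROCESSING FOR DATA THAT IS READ
FROM AND WRITTEN BACK TO KAFKA,
USING IDEMPOTENT PRODUCERS AND
TRANSACTIONS. IT MUST BE
CONFIGURED EXPLICITLY AND DOES
NOT AUTOMATICALLY EXTEND TO
EXTERNAL SYSTEMS.

EXAMPLE: FINANCIAL TRANSACTIONS
PROCESSED WITHOUT DUPLICATES.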


IDEMPOTENT PRODUCER

WITH IDEMPOTENCE ENABLED
(enable.idempotence=true),
PRODUCERS CAN SAFELY RETRY
SENDING MESSAGES WITHOUT
DUPLICATING THEM IN THE LOG.

EXAMPLE: RETRYING A PAYMENT
CONFIRMATION MESSAGE AFTER A
NETWORK ERROR WITHOUT WRITING IT
TO THE PARTITION TWICE.
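
SAFE RETRIES CAN BE ILLUSTRATED WITH A PYTHON SKETCH IN WHICH THE
RECEIVING SIDE REMEMBERS THE HIGHEST SEQUENCE NUMBER SEEN PER PRODUCER
AND IGNORES REPEATS. THIS IS A SIMPLIFIED MODEL WITH HYPOTHETICAL
NAMES; THE REAL BROKER-SIDE BOOKKEEPING IS MORE INVOLVED.

```python
class PartitionWithDedup:
    """Toy partition that skips retried records it has already appended."""

    def __init__(self):
        self.log = []
        self._last_seq = {}  # producer_id -> highest sequence appended

    def append(self, producer_id, sequence, value):
        """Append the record unless its sequence number was already seen."""
        if sequence <= self._last_seq.get(producer_id, -1):
            return False  # duplicate from a retry: nothing is appended
        self._last_seq[producer_id] = sequence
        self.log.append(value)
        return True
```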


TRANSACTIONS

KAFKA SUPPORTS ATOMIC WRITES


ACROSS MULTIPLE PARTITIONS AND
TOPICS USING TRANSACTIONS.

EXAMPLE: ENSURING THAT A SERIES
OF RELATED MESSAGES ARE EITHER
ALL WRITTEN OR NONE ARE.
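
THE ALL-OR-NOTHING OUTCOME CAN BE SKETCHED IN PYTHON BY BUFFERING
WRITES AND APPLYING THEM ONLY ON COMMIT. THIS MODELS THE RESULT ONLY,
NOT KAFKA'S ACTUAL MECHANISM, AND THE CLASS ToyTransaction AND THE
PARTITION NAMES ARE HYPOTHETICAL.

```python
class ToyTransaction:
    """Buffer writes to several partitions; apply all of them or none."""

    def __init__(self, partitions):
        self._partitions = partitions  # dict: partition name -> list of records
        self._pending = []

    def send(self, partition, value):
        if partition not in self._partitions:
            raise KeyError(partition)
        self._pending.append((partition, value))

    def commit(self):
        """Make every buffered write visible."""
        for partition, value in self._pending:
            self._partitions[partition].append(value)
        self._pending.clear()

    def abort(self):
        """Discard every buffered write."""
        self._pending.clear()
```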


MIRRORMAKER

TOOL FOR REPLICATING KAFKA TOPICS


ACROSS CLUSTERS, USEFUL FOR
DISASTER RECOVERY AND MULTI-
DATACENTER SETUPS.

EXAMPLE: MIRRORING PRODUCTION


DATA TO A BACKUP DATACENTER.


SECURITY

KAFKA SUPPORTS AUTHENTICATION,


AUTHORIZATION, AND ENCRYPTION
TO SECURE DATA.

EXAMPLE: USING SSL FOR


ENCRYPTING DATA IN TRANSIT.


KAFKA ADMINCLIENT

ADMINCLIENT API ALLOWS


PROGRAMMATIC MANAGEMENT OF
KAFKA.

EXAMPLE: CREATING TOPICS,


ALTERING CONFIGURATIONS
PROGRAMMATICALLY.


MONITORING AND
METRICS
KAFKA PROVIDES METRICS AND
MONITORING TOOLS TO TRACK
CLUSTER PERFORMANCE.

EXAMPLE: MONITORING CONSUMER


LAG AND BROKER HEALTH.


SERIALIZATION AND
DESERIALIZATION
KAFKA REQUIRES SERIALIZATION OF
DATA FOR TRANSMISSION, WITH
SUPPORT FOR VARIOUS FORMATS LIKE
AVRO, JSON.

EXAMPLE: SERIALIZING USER DATA TO
AVRO FORMAT BEFORE SENDING TO
KAFKA.
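
A MINIMAL PYTHON SKETCH OF THE SERIALIZE/DESERIALIZE ROUND TRIP, USING
JSON FROM THE STANDARD LIBRARY AS A STAND-IN (AVRO WOULD NEED A
THIRD-PARTY LIBRARY AND A SCHEMA). THE FUNCTION NAMES AND THE SAMPLE
RECORD ARE HYPOTHETICAL.

```python
import json


def serialize(record: dict) -> bytes:
    """Turn a record into the bytes that would be sent as a message value."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def deserialize(payload: bytes) -> dict:
    """Turn received bytes back into a record."""
    return json.loads(payload.decode("utf-8"))


if __name__ == "__main__":
    user = {"id": 42, "name": "Ada"}
    print(deserialize(serialize(user)) == user)
```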


MESSAGE ORDERING

KAFKA MAINTAINS THE ORDER OF


MESSAGES WITHIN A PARTITION,
IMPORTANT FOR CONSISTENCY.

EXAMPLE: ENSURING ORDER OF


TRANSACTION LOGS.


CONSUMER GROUP

CONSUMERS CAN JOIN GROUPS TO
BALANCE LOAD. EACH MESSAGE IS
DELIVERED TO ONE CONSUMER IN THE
GROUP, ALTHOUGH IT MAY BE
REDELIVERED AFTER A FAILURE OR
REBALANCE.

EXAMPLE: MULTIPLE CONSUMERS


PROCESSING A HIGH-VOLUME TOPIC
COLLABORATIVELY.


OFFSET MANAGEMENT

KAFKA TRACKS THE OFFSET OF


MESSAGES TO MANAGE CONSUMER
PROGRESS.

EXAMPLE: STORING OFFSETS IN KAFKA


TO RESUME PROCESSING AFTER A
RESTART.


TOPIC REPLICATION

KAFKA REPLICATES PARTITIONS


ACROSS MULTIPLE BROKERS FOR
FAULT TOLERANCE.

EXAMPLE: A PARTITION REPLICATED


ACROSS THREE BROKERS TO HANDLE
BROKER FAILURE.


MESSAGE COMPRESSION

KAFKA SUPPORTS COMPRESSING


MESSAGES TO SAVE BANDWIDTH AND
STORAGE.

EXAMPLE: COMPRESSING LOG
MESSAGES BEFORE SENDING TO
KAFKA.
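
THE BANDWIDTH SAVING CAN BE DEMONSTRATED WITH A SHORT PYTHON SKETCH
THAT GZIPS A BATCH OF REPETITIVE LOG LINES. THE FUNCTION NAMES ARE
HYPOTHETICAL, THE NEWLINE FRAMING IS AN ASSUMPTION OF THIS SKETCH
(MESSAGES MUST NOT CONTAIN NEWLINES), AND IT DOES NOT REPRODUCE
KAFKA'S ON-WIRE BATCH FORMAT.

```python
import gzip


def compress_batch(messages):
    """Join messages with newlines and gzip the whole batch."""
    return gzip.compress(b"\n".join(messages))


def decompress_batch(payload):
    """Reverse compress_batch and return the original messages."""
    return gzip.decompress(payload).split(b"\n")


if __name__ == "__main__":
    logs = [b"GET /index.html 200"] * 100
    packed = compress_batch(logs)
    print(len(b"\n".join(logs)), "bytes raw,", len(packed), "bytes compressed")
```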


ZOOKEEPER

OLDER KAFKA VERSIONS USE
ZOOKEEPER FOR DISTRIBUTED
COORDINATION AND METADATA
MANAGEMENT. NEWER VERSIONS USE
KRAFT INSTEAD.

EXAMPLE: ZOOKEEPER STORING
BROKER METADATA AND SUPPORTING
CONTROLLER ELECTION.


KAFKA BROKER
CONFIGURATION
BROKERS CAN BE CONFIGURED FOR
PERFORMANCE, RETENTION POLICIES,
AND MORE.

EXAMPLE: CONFIGURING A BROKER TO
RETAIN MESSAGES FOR 30 DAYS
(log.retention.hours=720).


PRODUCER
CONFIGURATION
PRODUCERS HAVE CONFIGURABLE
PARAMETERS FOR MESSAGE DELIVERY,
RETRIES, AND MORE.

EXAMPLE: SETTING PRODUCER


RETRIES TO HANDLE TRANSIENT
FAILURES.


CONSUMER
CONFIGURATION
CONSUMERS CAN BE CONFIGURED FOR
FETCH SIZES, TIMEOUT SETTINGS, AND
MORE.

EXAMPLE: CONFIGURING CONSUMER


FETCH SIZE FOR OPTIMAL
PERFORMANCE.


TOPIC MANAGEMENT

TOPICS CAN BE CREATED, DELETED,


AND MANAGED PROGRAMMATICALLY
OR VIA CLI.

EXAMPLE: CREATING A NEW TOPIC


FOR STORING EVENT LOGS.


QUOTAS AND
THROTTLING
KAFKA SUPPORTS SETTING QUOTAS TO
CONTROL RESOURCE USAGE BY
CLIENTS.

EXAMPLE: THROTTLING A
HIGH-VOLUME PRODUCER TO PREVENT
OVERWHELMING THE CLUSTER.
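
THE IDEA BEHIND THROTTLING CAN BE SKETCHED IN PYTHON AS COMPUTING HOW
LONG A CLIENT MUST WAIT SO THAT ITS AVERAGE RATE FALLS BACK TO THE
QUOTA. THE FORMULA AND THE NAME throttle_delay_seconds ARE ASSUMPTIONS
OF THIS SKETCH, NOT KAFKA'S EXACT CALCULATION.

```python
def throttle_delay_seconds(bytes_sent, window_seconds, quota_bytes_per_sec):
    """Delay needed so bytes_sent over (window + delay) averages to the quota."""
    if window_seconds <= 0 or quota_bytes_per_sec <= 0:
        raise ValueError("window and quota must be positive")
    observed_rate = bytes_sent / window_seconds
    if observed_rate <= quota_bytes_per_sec:
        return 0.0  # within quota: no delay
    return bytes_sent / quota_bytes_per_sec - window_seconds
```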


REBALANCE PROTOCOL

KAFKA HANDLES REBALANCING OF


CONSUMERS WITHIN A GROUP TO
MAINTAIN LOAD BALANCE.

EXAMPLE: REBALANCING PARTITIONS


WHEN A NEW CONSUMER JOINS THE
GROUP.


KAFKA REST PROXY

PROVIDES A RESTFUL INTERFACE TO


INTERACT WITH KAFKA CLUSTERS.

EXAMPLE: SENDING MESSAGES TO


KAFKA USING HTTP REQUESTS.


KAFKA API

KAFKA PROVIDES APIS FOR


PRODUCING, CONSUMING, AND
MANAGING DATA.

EXAMPLE: USING THE KAFKA


PRODUCER API TO SEND MESSAGES
FROM A JAVA APPLICATION.


SCHEMA REGISTRY

CONFLUENT SCHEMA REGISTRY


MANAGES AND ENFORCES SCHEMAS
FOR KAFKA MESSAGES.

EXAMPLE: ENSURING ALL MESSAGES


IN A TOPIC FOLLOW A PREDEFINED
SCHEMA.


KAFKA STREAMS DSL

A HIGH-LEVEL API FOR STREAM


PROCESSING IN KAFKA.

EXAMPLE: USING KAFKA STREAMS DSL


TO FILTER AND TRANSFORM A STREAM
OF EVENTS.


FAULT TOLERANCE

KAFKA’S DESIGN ENSURES HIGH


AVAILABILITY AND FAULT TOLERANCE.

EXAMPLE: AUTOMATIC FAILOVER TO


REPLICAS WHEN A BROKER FAILS.


REAL-TIME ANALYTICS

KAFKA SUPPORTS REAL-TIME DATA


ANALYTICS AND PROCESSING.

EXAMPLE: REAL-TIME DASHBOARD


UPDATING WITH LIVE METRICS FROM
KAFKA.


ETL PIPELINES

KAFKA CAN BE USED TO BUILD


EFFICIENT ETL PIPELINES FOR DATA
INTEGRATION.

EXAMPLE: EXTRACTING DATA FROM


DATABASES, TRANSFORMING IT, AND
LOADING IT INTO A DATA WAREHOUSE
VIA KAFKA.


KAFKA UPGRADES

KAFKA SUPPORTS ROLLING UPGRADES


TO MINIMIZE DOWNTIME.

EXAMPLE: UPGRADING KAFKA


BROKERS WITHOUT DISRUPTING
MESSAGE FLOW.


MESSAGE
TIMESTAMPING
KAFKA MESSAGES CAN HAVE
TIMESTAMPS FOR TIME-BASED
PROCESSING.

EXAMPLE: USING TIMESTAMPS FOR


EVENT TIME PROCESSING IN KAFKA
STREAMS.


STATE STORES

KAFKA STREAMS ALLOWS


MAINTAINING STATEFUL PROCESSING
WITH STATE STORES.

EXAMPLE: COUNTING OCCURRENCES


OF EVENTS OVER A WINDOW OF TIME
USING STATE STORES.


WINDOWED OPERATIONS

KAFKA STREAMS SUPPORTS


WINDOWED OPERATIONS FOR
AGGREGATIONS OVER TIME WINDOWS.

EXAMPLE: CALCULATING THE SUM OF


TRANSACTIONS EVERY MINUTE.


KSQL

KSQL IS A SQL-LIKE INTERFACE FOR


STREAM PROCESSING WITH KAFKA.

EXAMPLE: USING KSQL TO PERFORM


REAL-TIME FILTERING AND
AGGREGATIONS ON KAFKA TOPICS.


KAFKA ECOSYSTEM

KAFKA’S ECOSYSTEM INCLUDES TOOLS


LIKE CONNECT, STREAMS, KSQL, AND
MORE FOR COMPREHENSIVE DATA
PROCESSING.

EXAMPLE: USING KAFKA CONNECT TO


INTEGRATE WITH DATABASES, KAFKA
STREAMS FOR PROCESSING, AND KSQL
FOR QUERYING STREAMS.


KAFKA CONNECTORS

PRE-BUILT CONNECTORS FOR


INTEGRATING KAFKA WITH VARIOUS
DATA SOURCES AND SINKS.

EXAMPLE: USING A JDBC CONNECTOR


TO SYNC DATA BETWEEN A DATABASE
AND KAFKA.


KAFKA CLUSTER
MANAGEMENT
TOOLS AND PRACTICES FOR
MANAGING KAFKA CLUSTERS
EFFICIENTLY.

EXAMPLE: USING TOOLS LIKE KAFKA
MANAGER (NOW CALLED CMAK) FOR
MONITORING AND MANAGING CLUSTER
HEALTH.


TIERED STORAGE

KAFKA’S TIERED STORAGE ALLOWS


OFFLOADING OLDER DATA TO CHEAPER
STORAGE.

EXAMPLE: STORING OLDER KAFKA


TOPIC DATA IN S3 TO REDUCE ON-
PREM STORAGE COSTS.
