Kafka Interview Preparation

Apache Kafka is a distributed messaging system that allows real-time data transmission, storage, and processing, utilizing a publish-subscribe model. It features advantages such as loose coupling, durability, scalability, and flexibility, and differentiates itself from traditional message brokers like ActiveMQ/RabbitMQ through its pull-based message delivery and high throughput. Kafka organizes data into topics and partitions, ensuring reliable message replication and leader-follower dynamics for efficient data handling.

Uploaded by

peacock20221

What is Apache Kafka?

Apache Kafka is a distributed, fault-tolerant, and high-throughput messaging system
based on the publish-subscribe model. It was originally developed at LinkedIn and open-
sourced in 2011.

Kafka is used to:

• Send and receive real-time data

→ Like a chat app, where one service sends a message and others get it almost instantly.

• Store data safely

→ If a service is offline, Kafka keeps the data (for the configured retention period) so the service can read it when it comes back.

• Process data as it flows

→ Kafka can also analyze data as it moves, using the Kafka Streams API.

Example:

Imagine you are getting live order data from an online store:

o Without Kafka Streams: You just collect and store all the orders.
o With Kafka Streams: You can count orders, find totals, or detect fraud —
while the data is still coming in.

So, Kafka Streams helps you do smart things with data in real time, without needing to
store and analyze it later.
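
The order example above can be sketched in plain Java. This is not the Kafka Streams API; the class `StreamingOrderStatsSketch`, the method `onOrder`, and the fraud threshold are hypothetical names chosen for illustration. The point is only that the count and total are updated as each order arrives, rather than computed later from stored data.

```java
// Running aggregates updated per event; plain Java, not the Kafka Streams API.
class StreamingOrderStatsSketch {
    private long orderCount = 0;
    private double totalAmount = 0.0;
    private final double suspiciousAbove; // hypothetical fraud threshold

    StreamingOrderStatsSketch(double suspiciousAbove) {
        this.suspiciousAbove = suspiciousAbove;
    }

    // Called once for every order as it arrives.
    // Returns true if the order amount is above the threshold.
    boolean onOrder(double amount) {
        orderCount++;
        totalAmount += amount;
        return amount > suspiciousAbove;
    }

    long orderCount() { return orderCount; }
    double totalAmount() { return totalAmount; }

    public static void main(String[] args) {
        StreamingOrderStatsSketch stats = new StreamingOrderStatsSketch(1000.0);
        double[] incomingOrders = {20.0, 35.5, 5000.0};
        for (double amount : incomingOrders) {
            if (stats.onOrder(amount)) {
                System.out.println("possible fraud: " + amount);
            }
        }
        System.out.println(stats.orderCount() + " orders, total " + stats.totalAmount());
    }
}
```

A real Kafka Streams application would express the same idea as a topology that reads from a topic, but the per-event update shown here is the core of it.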

Kafka Advantages
• Loose Coupling: Services communicate without needing to know about each other.
• Durability: Messages are stored on disk and can be replayed even if the consumer
is down.
• Scalability: Easily handles large volumes of data and can scale horizontally.
• Flexibility: New consumers can be added without modifying the producers.

What are horizontal scaling and vertical scaling?

Horizontal Scaling (Scale Out)

• Add more machines to handle more load.

• Each machine (or server) handles part of the work.
• Common in distributed systems like Kafka.

Example:

You have 1 Kafka broker. If traffic increases, you add 2 more brokers — now 3 Kafka servers
share the load.

Pros: More flexible, fault-tolerant, and easy to scale.

Cons: Slightly complex to manage.

Vertical Scaling (Scale Up)

• Add more power (CPU, RAM, storage) to your existing machine.

• You're making a single server stronger.

Example:

You upgrade your Kafka server from 8 GB RAM to 32 GB RAM to handle more messages.

Pros: Simpler setup.

Cons: There's a limit to how much you can upgrade one machine, and if it fails —
everything fails.

Kafka uses horizontal scaling, because it can run on multiple servers (called brokers) and spread the data among them.

Step-by-step Kafka Flow:

1. Producer creates and sends events.

2. Kafka producer batches the events (internally).
3. These batches are sent directly to Kafka brokers.
4. The broker receives and stores them in the topic partition on disk.
5. Consumers later pull from these topic partitions.
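
The five steps above can be modeled with a small in-memory sketch. It is not the Kafka client API: `MiniBroker`, `MiniProducer`, `send`, `flush`, and `poll` are hypothetical names, and a single list stands in for one topic partition. It only illustrates batching on the producer side, an append-only log on the broker, and consumers pulling by offset.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory model of the flow above; not the real Kafka client API.
class KafkaFlowSketch {
    // Steps 4-5: one partition as an append-only log; the list index is the offset.
    static class MiniBroker {
        private final List<String> log = new ArrayList<>();

        // Append a whole batch; returns the offset of the first record in it.
        long append(List<String> batch) {
            long baseOffset = log.size();
            log.addAll(batch);
            return baseOffset;
        }

        // Consumers pull: return up to maxRecords starting at fromOffset.
        List<String> poll(long fromOffset, int maxRecords) {
            int start = (int) Math.min(fromOffset, log.size());
            int end = Math.min(start + maxRecords, log.size());
            return new ArrayList<>(log.subList(start, end));
        }
    }

    // Steps 1-3: the producer buffers events and ships them in batches.
    static class MiniProducer {
        private final MiniBroker broker;
        private final int batchSize;
        private final List<String> buffer = new ArrayList<>();

        MiniProducer(MiniBroker broker, int batchSize) {
            this.broker = broker;
            this.batchSize = batchSize;
        }

        void send(String event) {
            buffer.add(event);
            if (buffer.size() >= batchSize) flush();
        }

        void flush() {
            if (buffer.isEmpty()) return;
            broker.append(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        MiniBroker broker = new MiniBroker();
        MiniProducer producer = new MiniProducer(broker, 2);
        producer.send("order-1");
        producer.send("order-2"); // batch of 2 is full, so it is sent
        producer.send("order-3"); // stays buffered until flush()
        System.out.println(broker.poll(0, 10)); // [order-1, order-2]
        producer.flush();
        System.out.println(broker.poll(2, 10)); // [order-3]
    }
}
```

Note that `poll` never removes anything from the log, which is why a second consumer could read the same records from offset 0.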

Example Flow (RabbitMQ):

1. Producer sends message → Broker queues it.
2. Broker delivers the message to a consumer.
3. Consumer processes it and sends back an acknowledgment (ACK).
4. Broker removes the message from the queue.

If no ACK is received, broker will:

• Assume failure (e.g., consumer crashed).

• Redeliver the message to another consumer.

This process is called message tracking or delivery state management.
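
The delivery-state tracking described above can be pictured with a toy model. It is not the RabbitMQ client API; `AckTrackingSketch`, `deliver`, `ack`, and `requeue` are hypothetical names. The broker holds each delivered message in an "unacked" map until an ACK arrives, and puts it back in the queue if the consumer fails.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model of broker-side delivery tracking; not the RabbitMQ client API.
class AckTrackingSketch {
    private final Deque<String> ready = new ArrayDeque<>();     // waiting for delivery
    private final Map<Long, String> unacked = new HashMap<>();   // delivered, no ACK yet
    private long nextTag = 1;

    void publish(String message) {
        ready.addLast(message);
    }

    // Deliver the next message and remember it until it is acknowledged.
    // Returns the delivery tag, or -1 if nothing is ready.
    long deliver() {
        String message = ready.pollFirst();
        if (message == null) return -1;
        long tag = nextTag++;
        unacked.put(tag, message);
        return tag;
    }

    // ACK received: the broker can forget the message.
    void ack(long tag) {
        unacked.remove(tag);
    }

    // No ACK (for example the consumer crashed): put the message back for redelivery.
    void requeue(long tag) {
        String message = unacked.remove(tag);
        if (message != null) ready.addFirst(message);
    }

    int readyCount() { return ready.size(); }
    int unackedCount() { return unacked.size(); }

    public static void main(String[] args) {
        AckTrackingSketch broker = new AckTrackingSketch();
        broker.publish("send-email");
        long tag = broker.deliver();
        broker.requeue(tag);              // consumer failed before acknowledging
        long retryTag = broker.deliver();
        broker.ack(retryTag);             // second attempt succeeded
        System.out.println(broker.readyCount() + " ready, " + broker.unackedCount() + " unacked");
    }
}
```

This per-message bookkeeping is the work a Kafka broker avoids, since Kafka consumers track their own position with offsets.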

Differences between ActiveMQ/RabbitMQ and Kafka?

| Feature | ActiveMQ/RabbitMQ | Kafka |
| --- | --- | --- |
| Type | Message broker (traditional) | Distributed event streaming platform |
| Message Delivery | Push-based: the broker sends (pushes) messages to consumers. | Pull-based: consumers fetch (pull) messages when they are ready. |
| Throughput | Lower throughput: each message needs an acknowledgment, and message state is tracked. | High throughput: producers send asynchronously in batches, and the broker does not track per-message delivery state. |
| Message Storage | Short-term: once consumed, the message is deleted. | Long-term: messages are stored for a defined time, even after being read. |
| Message Replay | Not possible: once a message is consumed, it's gone. | Possible: consumers can re-read old messages using offsets. |
| Ordering | Maintains order within a queue. | Maintains order within a partition. |
| Scaling | Limited horizontal scaling: difficult to scale across nodes. | Easy horizontal scaling: add more brokers and partitions. |
| Delivery Guarantees | Supports at-most-once, at-least-once, exactly-once (with plugins). | Supports at-most-once, at-least-once, and exactly-once (with config). |
| Latency | Low latency for small workloads. | Slightly higher latency, but optimized for throughput. |
| Stream Processing | Not supported natively. | Built-in Kafka Streams API for real-time processing. |
| Use Cases | Ideal for traditional messaging: commands, notifications, tasks. | Ideal for data pipelines, analytics, log aggregation, and real-time apps. |

What "message state is tracked" means →

The broker keeps track of each message's state:

• Has it been delivered?

• Has it been acknowledged (successfully processed)?
• Should it be redelivered if no ACK is received?

What "consumers can re-read old messages using offsets" means →

In Kafka, every message in a partition is stored with a unique offset, a number that gives the position of the message. Because reading does not remove a message, a consumer can go back to an earlier offset and read from there again.
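
Replay by offset can be illustrated with a minimal sketch. The class `OffsetReplaySketch` and its `next` and `seek` methods are hypothetical; the real Java consumer offers a comparable `seek` operation on a topic partition, but this model uses only a list and an integer position.

```java
import java.util.List;

// A partition modeled as an immutable list: reading never removes anything,
// so a consumer can move its offset back and read the same messages again.
class OffsetReplaySketch {
    private final List<String> partition;
    private int position = 0; // offset of the next message to read

    OffsetReplaySketch(List<String> partition) {
        this.partition = List.copyOf(partition);
    }

    // Returns the next message, or null when the consumer has caught up.
    String next() {
        return position < partition.size() ? partition.get(position++) : null;
    }

    // Jump to any offset between 0 and the end of the partition.
    void seek(int offset) {
        if (offset < 0 || offset > partition.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        position = offset;
    }

    public static void main(String[] args) {
        OffsetReplaySketch consumer = new OffsetReplaySketch(List.of("a", "b", "c"));
        System.out.println(consumer.next()); // a
        System.out.println(consumer.next()); // b
        consumer.seek(0);                    // rewind to the first message
        System.out.println(consumer.next()); // a (read again)
    }
}
```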

Kafka UI:
Kafka UI is a web-based tool that helps you interact with Kafka visually (for example, the page you open in your browser at an address such as localhost:8080 when it runs locally).

Instead of using command-line tools, Kafka UI lets you see and manage Kafka data easily
from your browser.

Kafka UI Lets You:

1. See all topic names in the Kafka cluster

2. Check how many partitions each topic has
3. View replication factor (how many copies per partition)
4. See how many messages are in each topic
5. Send (produce) messages to any topic manually
6. Read (consume) messages from a topic
7. Test message flow without writing code → Kafka UI allows you to send and read messages directly from your browser, without needing to write Java, Python, or CLI scripts.

8. View all consumer groups

9. Track offsets (how far each group has read) → If a partition has 100 messages (offsets 0 to 99), and a consumer group has read up to offset 79, then:

It has read 80 messages (offsets 0 to 79)

The group's committed offset = 80, because Kafka stores the offset of the next message to read

10. Check consumer lag (how far behind the group is) → Lag is the number of messages in a partition that the group has not read yet.

Example:

a. Latest message in topic = offset 99

b. Group has read up to = offset 79
c. Lag = 99 - 79 = 20

So the consumer is 20 messages behind (offsets 80 to 99). That's the consumer lag.
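
Kafka tools usually report lag as the log end offset (the offset the next produced message will get) minus the group's committed offset (the next offset the group will read). With the numbers above that is 100 - 80, the same 20 as 99 - 79. A minimal sketch of the arithmetic, with illustrative method and variable names:

```java
// Lag arithmetic only; the method and parameter names are illustrative.
class ConsumerLagSketch {
    // logEndOffset: offset the next produced message will receive.
    // committedOffset: next offset the group will read.
    static long lag(long logEndOffset, long committedOffset) {
        return Math.max(0, logEndOffset - committedOffset);
    }

    public static void main(String[] args) {
        long lastMessageOffset = 99; // newest message in the partition
        long lastReadOffset = 79;    // newest message the group has processed
        long behind = lag(lastMessageOffset + 1, lastReadOffset + 1);
        System.out.println("lag = " + behind); // lag = 20
    }
}
```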

11. Monitor group activity/status

12. See all Kafka brokers and their status (online/offline)
13. Create new topics
14. Delete topics
15. Edit topic settings (like partitions, retention time) → Retention time defines how long Kafka keeps messages before deleting them. The partition count of an existing topic can be increased but not decreased.

Why Use Kafka UI?

• Easy to use — no command-line needed

• Helps developers test Kafka features
• Useful for debugging and monitoring
• Saves time during development and support
What is a Kafka Topic?
A Kafka topic is like a container where messages (also called events) are stored.

• Producers send (write) messages into the topic.

• Consumers read messages from the topic.
• You can have many producers and many consumers for a single topic — this is
called multi-producer, multi-consumer.

Important point:

In most traditional systems, once a message is read, it's deleted.

But in Kafka, messages are not deleted after being read.

Instead, Kafka keeps them for a configured amount of time (7 days by default). This is called retention.

Think of it like this:

Topic = Folder

Message = File inside the folder

What are Partitions?

Kafka divides a topic into smaller parts called partitions.

• Each partition stores messages in the order they arrive.

• Once a message is added to a partition, it can’t be changed.
• Every message inside a partition has a unique number called an offset (used to
identify the message).

Why use partitions?

• So that multiple consumers can read from the same topic at the same time.
• This improves speed and performance.

Example:
If a topic has 3 partitions, then up to 3 consumers in the same consumer group can read from it in parallel.

What is Replication?
Kafka makes copies of each partition to make the system more reliable. These copies are
called replicas.

• Replication is done at the partition level.

• One of the replicas is the leader → by default it handles all reads and writes.
• The other replicas are followers → they copy data from the leader.

If the leader fails, one of the followers that is in sync automatically takes over as the new leader.

This keeps the system running, and messages that were already copied to that follower are not lost.

Leader and Follower Behavior

• Each partition has one broker acting as the leader.
• The other brokers that hold a replica of that partition are followers.
• The leader handles all client requests (reading or writing messages).
• Followers replicate (copy) the leader's data to stay in sync.

If the leader fails, Kafka automatically promotes an in-sync follower to become the new leader, so clients can carry on after a short failover.
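
Leader failover can be pictured with a deliberately simplified sketch. The class and method names are hypothetical, and the rule used here (take the first replica whose broker is alive) is an assumption made for illustration. Real Kafka normally chooses only among replicas that are fully in sync with the old leader, which this model ignores.

```java
import java.util.List;
import java.util.Set;

// Simplified leader election: pick the first replica whose broker is still alive.
class LeaderElectionSketch {
    // replicas: broker ids holding a copy of the partition, in preference order.
    // Returns the new leader's broker id, or -1 if no replica is available.
    static int electLeader(List<Integer> replicas, Set<Integer> aliveBrokers) {
        for (int brokerId : replicas) {
            if (aliveBrokers.contains(brokerId)) return brokerId;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Integer> replicas = List.of(1, 2, 3);
        System.out.println(electLeader(replicas, Set.of(1, 2, 3))); // 1
        System.out.println(electLeader(replicas, Set.of(2, 3)));    // 2 (broker 1 failed)
    }
}
```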

How a Producer Sends a Message

1. The producer sends a message to a topic.
2. The producer client picks a partition and sends the message to the broker that leads that partition.
3. The leader:
a. Appends the message to the partition.
b. Assigns it a unique offset.
c. Increments the offset for the next message.

In Simple Terms
• Topic = a folder to hold messages.
• Partitions = split the topic into parts for better performance.
• Offsets = unique ID for each message in a partition.
• Replication = backup copies of partitions to prevent data loss
• Leader = handles all writes/reads.
• Followers = sync data from leader.
• Producers send to the leader, and the leader stores the message with an offset.

Will all replicas be on different brokers?

Yes. Kafka places each replica of a partition on a different broker.

This is done to ensure high availability.

• If one broker goes down, another broker still has the data.
• This is why the replication factor must be less than or equal to the number of brokers.
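
The constraint above follows directly from the placement rule, as this sketch shows. The class `ReplicaPlacementSketch` and the "consecutive brokers" layout are assumptions chosen for illustration; Kafka's real assignment algorithm is more involved (it also considers racks, for example), but it enforces the same limit.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified replica placement: each partition's replicas land on distinct brokers.
class ReplicaPlacementSketch {
    // Returns, for each partition, the list of broker indexes (0-based) holding a replica.
    // The first broker in each list can be read as the preferred leader.
    static List<List<Integer>> assign(int partitions, int replicationFactor, int brokers) {
        if (replicationFactor > brokers) {
            throw new IllegalArgumentException(
                "replication factor " + replicationFactor + " exceeds broker count " + brokers);
        }
        List<List<Integer>> assignment = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                replicas.add((p + r) % brokers); // consecutive brokers, never the same one twice
            }
            assignment.add(replicas);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(3, 2, 3)); // [[0, 1], [1, 2], [2, 0]]
    }
}
```

With 3 brokers and a replication factor of 4 there is no way to keep every copy on a separate broker, so the sketch rejects the request.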

How Does the Producer Choose the Partition?

When a producer sends a message to a Kafka topic, it decides which partition the
message should go to using one of the following methods:

1. Manual Partitioning

You explicitly specify the partition number:

new ProducerRecord<>("topic", 2, "key", "value");

• The message is sent directly to partition 2.

• Kafka does not interfere.

2. Key-based Partitioning

You provide a key, but not a partition:

new ProducerRecord<>("topic", "user123", "value");

• The producer uses:

partition = hash(key) % total_partitions

(Kafka's default partitioner computes a murmur2 hash of the serialized key.)

• All messages with the same key (e.g., "user123") go to the same partition, as long as the number of partitions does not change.
• Ensures order is preserved for that key.

3. Round-Robin Partitioning (No Key)

You don’t provide a key or partition:

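new ProducerRecord<>("topic", "value");

• The producer spreads messages across the partitions, classically in a round-robin fashion. Since Kafka 2.4 the default partitioner is "sticky": it fills a batch for one partition before moving on to another.

• Either way, load is balanced across all partitions over time.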

Where does the total_partitions value in partition = hash(key) % total_partitions come from?

When the producer sends a message:

• It automatically contacts the Kafka cluster

• Gets the metadata of the topic
o This includes how many partitions the topic has

So, you don’t need to manually set total_partitions in your code.

The Kafka producer client fetches it from the broker.
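
The three partition choices described in this section can be combined in one sketch. It is not Kafka's partitioner: `PartitionChoiceSketch` and `choose` are hypothetical names, `String.hashCode()` stands in for Kafka's murmur2 hash of the serialized key, and `totalPartitions` is passed in as a parameter where the real client reads it from topic metadata. The actual partition numbers will therefore differ from what a real producer picks.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrates manual, key-based, and round-robin partition choice.
// Kafka's own partitioner hashes the serialized key bytes with murmur2,
// so real partition numbers will differ from the ones computed here.
class PartitionChoiceSketch {
    private final AtomicInteger counter = new AtomicInteger();

    // explicitPartition and key may be null, mirroring the three ProducerRecord forms.
    int choose(Integer explicitPartition, String key, int totalPartitions) {
        if (explicitPartition != null) {
            return explicitPartition;                                      // 1. manual
        }
        if (key != null) {
            return Math.floorMod(key.hashCode(), totalPartitions);         // 2. key-based
        }
        return Math.floorMod(counter.getAndIncrement(), totalPartitions);  // 3. round-robin
    }

    public static void main(String[] args) {
        PartitionChoiceSketch partitioner = new PartitionChoiceSketch();
        System.out.println(partitioner.choose(2, "key", 3));    // 2
        int first = partitioner.choose(null, "user123", 3);
        int second = partitioner.choose(null, "user123", 3);
        System.out.println(first == second);                    // true
        System.out.println(partitioner.choose(null, null, 3));  // 0
        System.out.println(partitioner.choose(null, null, 3));  // 1
    }
}
```

`Math.floorMod` is used instead of `%` so that a negative hash code still maps to a valid partition index.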

Consumers
• A Kafka consumer is an application that reads messages from a Kafka topic. Just as a producer sends data, a consumer receives it.
• When a topic has multiple partitions, a consumer can read from one or more of those partitions. In most cases, consumers work together in a consumer group → each consumer in the group takes care of some partitions, so they share the work and process messages faster.
• A consumer group is a set of consumers that share the work of reading messages from a topic. Within a group, each partition is read by only one consumer at a time, so two members of the same group never work on the same partition at once. If there are more partitions than consumers, some consumers handle multiple partitions. If there are more consumers than partitions, some consumers will be idle.
• Every message in a partition has a unique number called an offset. The consumer keeps track of the offset to know which messages it has already read, so it can continue from there instead of starting over. Kafka can auto-commit this offset or let the developer commit it manually.
• Kafka does not send messages to consumers automatically. Instead, consumers ask (poll) Kafka regularly to check whether there are new messages. This is called pull-based communication.
• You can have many consumer groups reading from the same topic, and each group will get all the messages. So if you have two different services (like billing and notification), both can read the same topic independently, using their own group IDs.
• Each group's offsets are stored separately, so one group's progress does not affect another. The committed offset (saved automatically or manually) is where a consumer resumes after a restart.
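
The partition-sharing rule for consumer groups described above can be sketched as follows. `GroupAssignmentSketch` and the simple round-robin rule are assumptions for illustration; Kafka's real assignment strategies are configurable and more elaborate, but they respect the same rule of one owner per partition within a group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Round-robin style assignment of partitions to the consumers of one group.
// Kafka's real assignors are configurable; this only shows the sharing rule.
class GroupAssignmentSketch {
    static Map<String, List<Integer>> assign(List<String> consumers, int partitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) return assignment;
        for (int p = 0; p < partitions; p++) {
            // Each partition goes to exactly one consumer in the group.
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // More partitions than consumers: one consumer handles two partitions.
        System.out.println(assign(List.of("c1", "c2"), 3));
        // More consumers than partitions: the last consumer stays idle.
        System.out.println(assign(List.of("c1", "c2", "c3", "c4"), 3));
    }
}
```

In the second call the fourth consumer ends up with an empty list, which is the "idle consumer" case.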

Broker
A broker is a Kafka server.

Kafka runs as a cluster of brokers — usually 3 or more.

Each broker:

• Stores topic data (partitions)

• Handles read/write requests from producers and consumers
• Coordinates with other brokers in the cluster

Example:

If you have 3 brokers: Broker 1, Broker 2, Broker 3 — Kafka will spread partitions and
replicas across these brokers.
Partition
A partition is a part (or slice) of a Kafka topic.

Messages inside a partition are:

• Stored in order
• Identified by a unique number called an offset

A topic can have one or many partitions.

Example:

If a topic has 3 partitions, Kafka will split messages across Partition 0, Partition 1, and
Partition 2.

What is ZooKeeper?
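ZooKeeper is a coordination service that older Kafka versions use to manage and monitor the Kafka cluster. Newer versions replace it with Kafka's built-in KRaft mode, and Kafka 4.0 removes ZooKeeper support entirely.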

Think of ZooKeeper like a manager who:

• Keeps track of which Kafka brokers are alive

• Knows who is the leader of each partition
• Helps elect a new leader if a broker goes down

What Does ZooKeeper Do in Kafka?

1. Keeps track of all brokers in the cluster

(e.g., Broker 1, Broker 2, etc.)

2. Manages leader election

For each partition, there is one leader broker.

If the leader fails, ZooKeeper helps pick a new leader from the followers.

3. Stores metadata
Like topic names, partition counts, replica info, etc.

4. Helps Kafka maintain cluster health
