Apache Kafka
Apache Kafka is a communication system that helps different parts of a
computer system exchange data by publishing messages to topics and subscribing to them.
[Diagram: Sender (Publisher) -> Apache Kafka -> Receiver (Subscriber)]
Why we use Apache Kafka
• Ola driver location update
• Zomato live food tracking
• Notification systems that reach a very large number of users
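• Increases database throughput
• Reading from and writing to the database directly on a frequent basis is difficult
[Diagram, without Kafka: the Zomato delivery boy's live location is stored directly in
the database, which the Zomato server and the user read from.]
[Diagram, with Kafka: the Zomato delivery boy publishes location updates to a Kafka
topic; the Zomato server consumes them, updates the user, and writes to the database
as a bulk batch operation.]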
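The bulk batch idea above can be sketched in a few lines. This is a toy in-memory model, not Kafka client code: `LocationTopic`, `FakeDatabase`, `drain_in_batches`, and the `rider_id`/`seq` fields are hypothetical names chosen for illustration. It shows why buffering updates in a topic reduces the number of database writes.

```python
from collections import deque


class LocationTopic:
    """Toy stand-in for a Kafka topic: an in-memory queue of location updates."""

    def __init__(self):
        self._queue = deque()

    def publish(self, update):
        # The delivery app publishes every location update here.
        self._queue.append(update)

    def poll(self, max_records):
        # Return up to max_records pending updates, oldest first.
        batch = []
        while self._queue and len(batch) < max_records:
            batch.append(self._queue.popleft())
        return batch


class FakeDatabase:
    """Hypothetical database that counts how many write calls it receives."""

    def __init__(self):
        self.rows = []
        self.write_calls = 0

    def bulk_insert(self, rows):
        self.write_calls += 1
        self.rows.extend(rows)


def drain_in_batches(topic, db, batch_size):
    # The server reads updates in batches and writes each batch in one call.
    while True:
        batch = topic.poll(batch_size)
        if not batch:
            break
        db.bulk_insert(batch)


topic = LocationTopic()
for i in range(10):
    topic.publish({"rider_id": "rider-1", "lat": 12.90 + i * 0.001, "lon": 77.60, "seq": i})

db = FakeDatabase()
drain_in_batches(topic, db, batch_size=4)
print(db.write_calls, len(db.rows))
```

Ten updates reach the database in three write calls instead of ten. A real deployment would use a Kafka client library and a real database driver in place of these stand-ins.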
Kafka Architecture
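[Diagram: within the Kafka ecosystem, a producer publishes to Topic A and Topic B in a
Kafka cluster made up of Kafka Broker 1 and Kafka Broker 2. Each topic is split into
partitions, and messages in a partition are addressed by offset. A consumer reads from
the cluster, and ZooKeeper coordinates the brokers.]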
Key concepts
• Kafka Ecosystem: The Kafka ecosystem refers to the entire suite of tools,
 libraries, and components that complement Apache Kafka for building real-time
 data pipelines, stream processing applications, and other distributed systems.
• Kafka Topic: A Kafka topic is a category or feed name to which messages are
 published by producers. Topics are divided into partitions to allow parallelism and
 scalability within a Kafka cluster. Each message published to a topic is appended
 to one of its partitions.
• Kafka Broker: A Kafka broker is a Kafka server that runs in a Kafka cluster. It stores
  and manages partitions, handles producer requests, and serves consumer
  requests. Brokers are responsible for storing and replicating data across the
  cluster.
• Kafka Cluster: A Kafka cluster is a group of Kafka brokers working together to
 store and manage topics and handle the load from producers and consumers.
 Kafka clusters provide scalability, fault tolerance, and high availability by
 distributing data partitions across multiple brokers.
• Partition: A partition is a unit of parallelism in Kafka. Topics are divided into
 partitions, and each partition can be replicated across multiple brokers for fault
 tolerance, as set by the topic's replication factor. Messages within a partition are
 ordered and assigned a sequential id called an offset.
• Offset: An offset is a unique identifier assigned to each message within a
 partition. Offsets are sequential integers that represent the position of a message
 within the partition. Consumers use offsets to track their position in a partition
 and retrieve messages.
• ZooKeeper: Apache ZooKeeper is a centralized service used by Kafka for
 managing and coordinating Kafka brokers and maintaining cluster metadata. It
 handles tasks such as leader election, maintaining configuration information, and
 detecting broker failures. Newer Kafka versions can run without ZooKeeper by using
 the built-in KRaft mode, and Kafka 4.0 removed ZooKeeper support.
• Producer: A Kafka producer is a client application that publishes messages to
  Kafka topics. Producers send messages (key-value pairs) to Kafka brokers. The
  producer chooses the target partition from the message key (optional) and its
  partitioning strategy, and the broker leading that partition appends the message.
• Consumer: A Kafka consumer is a client application that subscribes to topics and
  reads messages from Kafka brokers. Consumers read messages from partitions,
  process them, and maintain their own offset to track their position in each
  partition. Consumers can be part of a consumer group for load balancing and
  parallelism.
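To make topics, partitions, offsets, and consumer positions concrete, here is a minimal in-memory sketch. It is a toy model and not the Kafka client API: `ToyTopic` and `ToyConsumer` are hypothetical names, and CRC32 is used as a stand-in hash (the Java client's default partitioner for keyed messages uses murmur2).

```python
import zlib


class ToyTopic:
    """Toy topic: a list of partitions, each an append-only list of messages."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Same key -> same partition. CRC32 is only a stand-in hash here.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        partition = self.partition_for(key)
        log = self.partitions[partition]
        offset = len(log)  # offsets are sequential within one partition
        log.append((key, value))
        return partition, offset

    def read(self, partition, offset, max_records=10):
        return self.partitions[partition][offset:offset + max_records]


class ToyConsumer:
    """Toy consumer that tracks its own position (next offset) per partition."""

    def __init__(self, topic):
        self.topic = topic
        self.positions = {p: 0 for p in range(len(topic.partitions))}

    def poll(self):
        records = []
        for partition, position in list(self.positions.items()):
            batch = self.topic.read(partition, position)
            for i, (key, value) in enumerate(batch):
                records.append((partition, position + i, key, value))
            # Advance past everything just read so it is not returned again.
            self.positions[partition] = position + len(batch)
        return records


topic = ToyTopic("driver-locations", num_partitions=3)
p1, o1 = topic.append("driver-42", "loc-a")
p2, o2 = topic.append("driver-42", "loc-b")
topic.append("driver-7", "loc-c")

consumer = ToyConsumer(topic)
first = consumer.poll()   # returns all three messages
second = consumer.poll()  # returns nothing new
```

Both `driver-42` messages land in the same partition with consecutive offsets, so their order is preserved. Ordering across different partitions is not guaranteed, which is why related messages usually share a key.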
Key Features of Kafka
• Distributed Messaging System: Kafka is designed as a distributed messaging system,
  providing a unified platform for handling real-time data feeds with high-throughput,
  fault tolerance, and horizontal scalability.
• Partitioning: Kafka topics are divided into partitions, allowing data within a topic to be
  distributed across multiple Kafka brokers. Partitioning enables horizontal scalability
  and improves parallelism for data processing.
• Replication: Kafka replicates partitions across multiple brokers to ensure fault
  tolerance and data durability. Replication ensures that data is not lost even if some
  brokers or nodes fail.
• High Throughput: Kafka is optimized for high throughput and low latency, making it
  suitable for handling large volumes of data and supporting real-time data processing
  and analytics.
• Fault Tolerance: Kafka provides built-in replication and leader election mechanisms to
  maintain availability and durability of data, even in the event of broker failures.
• Scalability: Kafka scales horizontally by adding more brokers to the cluster and
  partitioning topics across multiple nodes. This scalability allows Kafka to handle
  increasing data volumes and growing workloads.
• Streaming: Kafka supports stream processing with the Kafka Streams API and
  integrates with external systems such as databases and data lakes through
  Kafka Connect.
• Exactly-once Semantics: Kafka supports exactly-once semantics through idempotent
  producers and transactions. These must be enabled and used correctly; otherwise
  delivery is typically at-least-once. When they are used, each message takes effect
  exactly once within Kafka, addressing concerns about data consistency.
• Connectivity and Integration: Kafka Connect simplifies integration with external
  systems by providing connectors for various data sources and sinks. It allows
  seamless data movement between Kafka and other systems.
• Ecosystem and Community: Kafka has a vibrant ecosystem with support for
  monitoring, management, and integration tools. It is backed by a strong
  community and active development, ensuring continuous improvement and
  innovation.
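The replication and fault tolerance features above can be illustrated with a small sketch of replica placement and leader failover. This is a simplified model under stated assumptions, not Kafka's actual algorithm: `assign_replicas` and `elect_leaders` are hypothetical helpers, placement is plain round-robin, and every live replica is treated as in sync (real Kafka elects leaders from the in-sync replica set).

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers, round-robin."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return assignment


def elect_leaders(assignment, live_brokers):
    """Pick the first live replica of each partition as leader (None if all are down)."""
    leaders = {}
    for partition, replicas in assignment.items():
        leaders[partition] = next((b for b in replicas if b in live_brokers), None)
    return leaders


brokers = ["broker-1", "broker-2", "broker-3"]
assignment = assign_replicas(num_partitions=3, brokers=brokers, replication_factor=2)

before = elect_leaders(assignment, live_brokers=set(brokers))
after = elect_leaders(assignment, live_brokers={"broker-2", "broker-3"})  # broker-1 fails

print(before)
print(after)
```

With a replication factor of 2, losing `broker-1` moves leadership of partition 0 to `broker-2` and leaves every partition available. With a replication factor of 1, partition 0 would have no leader until the broker returned.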