Building Data Streaming Applications with Apache Kafka
Apache Kafka is a popular distributed streaming platform that acts as a messaging queue or
an enterprise messaging system. It lets you publish and subscribe to a stream of records, and
process them in a fault-tolerant way as they occur.
This course is a comprehensive guide to designing and architecting enterprise-grade
streaming applications using Apache Kafka and other big data tools. It includes best practices
for building such applications and tackles common challenges, such as how to use Kafka
efficiently and how to handle high data volumes with ease. The course first introduces the
types of messaging systems and then provides a thorough introduction to Apache Kafka and
its internals. The second part of the course takes you through designing streaming
applications using various frameworks and tools, such as Apache Spark, Apache Storm, and
more. Once you grasp the basics, we take you through more advanced Apache Kafka
concepts, such as capacity planning and security.
What You Will Learn
• Learn the basics of Apache Kafka from scratch
• Use the basic building blocks of a streaming application
• Design effective streaming applications with Kafka using Spark, Storm, and Heron
• Understand the importance of a low-latency, high-throughput, and fault-tolerant
messaging system
• Plan capacity effectively when deploying your Kafka application
• Understand and implement the best security practices
Course Contents
• Introduction to Messaging Systems
1. Understanding the principles of messaging systems
2. Understanding messaging systems
3. Peeking into a point-to-point messaging system
4. Publish-subscribe messaging system
5. Advanced Message Queuing Protocol (AMQP)
6. Using messaging systems in big data streaming applications
• Introducing Kafka, the Distributed Messaging Platform
1. Kafka origins
2. Kafka's architecture
3. Message topics
4. Message partitions
5. Replication and replicated logs
6. Message producers
7. Message consumers
8. Role of ZooKeeper
• Deep Dive into Kafka Producers
1. Kafka producer internals
2. Kafka Producer APIs
3. Java Kafka producer example
4. Common message publishing patterns
5. Best practices
• Deep Dive into Kafka Consumers
1. Kafka consumer internals
2. Kafka consumer APIs
3. Java Kafka consumer
4. Scala Kafka consumer
5. Common message consuming patterns
6. Best practices
• Building Spark Streaming Applications with Kafka
1. Introduction to Spark
2. Spark Streaming
3. Use case: log processing for fraud IP detection
4. Producer
• Building Storm Applications with Kafka
1. Introduction to Apache Storm
2. Introduction to Apache Heron
3. Integrating Apache Kafka with Apache Storm - Java
4. Integrating Apache Kafka with Apache Storm - Scala
5. Use case: log processing with Storm, Kafka, and Hive
• Using Kafka with Confluent Platform
1. Introduction to Confluent Platform
2. Deep dive into Confluent architecture
3. Understanding Kafka Connect and Kafka Streams
4. Playing with Avro using Schema Registry
5. Moving Kafka data to HDFS
• Building ETL Pipelines Using Kafka
1. Considerations for using Kafka in ETL pipelines
2. Introducing Kafka Connect
3. Deep dive into Kafka Connect
4. Introductory examples of using Kafka Connect
5. Kafka Connect common use cases
• Building Streaming Applications Using Kafka Streams
1. Introduction to Kafka Streams
2. Kafka Streams architecture
3. Integrated framework advantages
4. Understanding tables and streams together
5. Use case example of Kafka Streams
• Kafka Cluster Deployment
1. Kafka cluster internals
2. Capacity planning
3. Single cluster deployment
4. Multicluster deployment
5. Decommissioning brokers
6. Data migration
• Using Kafka in Big Data Applications
1. Managing high volumes in Kafka
2. Kafka message delivery semantics
3. Big data and Kafka common usage patterns
4. Kafka and data governance
5. Alerting and monitoring
6. Useful Kafka metrics
• Securing Kafka
1. An overview of securing Kafka
2. Wire encryption using SSL
3. Kerberos SASL for authentication
4. Understanding ACLs and authorization
5. Understanding ZooKeeper authentication
6. Apache Ranger for authorization
7. Best practices
• Streaming Application Design Considerations
1. Latency and throughput
2. Data and state persistence
3. Data sources
4. External data lookups
5. Data formats
6. Data serialization
7. Level of parallelism
8. Out-of-order events
9. Message processing semantics