Apache Kafka 
Introduction 
http://kafka.apache.org/
Joe Stein 
• Developer, Architect & Technologist 
• Founder & Principal Consultant => Big Data Open Source Security LLC - http://stealth.ly 
Big Data Open Source Security LLC provides professional services and product solutions for the collection, 
storage, transfer, real-time analytics, batch processing and reporting for complex data streams, data sets and 
distributed systems. BDOSS is all about the "glue" and helping companies to not only figure out what Big Data 
Infrastructure Components to use but also how to change their existing (or build new) systems to work with 
them. 
• Apache Kafka Committer & PMC member 
• Blog & Podcast - http://allthingshadoop.com 
• Twitter @allthingshadoop
Apache Kafka 
• Apache Kafka 
o http://kafka.apache.org 
• Apache Kafka Source Code 
o https://github.com/apache/kafka 
• Documentation 
o http://kafka.apache.org/documentation.html 
• Wiki 
o https://cwiki.apache.org/confluence/display/KAFKA/Index
Kafka decouples data-pipelines
Topics & Partitions
A high-throughput distributed messaging system 
rethought as a distributed commit log.
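To make the "commit log" idea concrete, here is a minimal in-memory sketch of a topic split into append-only partitions. It is not Kafka's implementation: the `Topic` class, its method names, and the CRC32-based partitioner are assumptions made for illustration only.

```python
import zlib


class Topic:
    """Toy in-memory topic: a fixed set of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Hypothetical partitioner: stable hash of the key modulo partition count.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        # Appends are sequential; the offset is the position in the partition log.
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition, offset, max_messages=10):
        # Reads never remove data; the caller chooses where to start.
        return self.partitions[partition][offset:offset + max_messages]


topic = Topic("events", num_partitions=3)
p1, o1 = topic.append("user-1", "login")
p2, o2 = topic.append("user-1", "click")
print(p1 == p2, o1, o2)
```

Because both messages share a key, they land in the same partition and receive consecutive offsets, which is what gives ordering within a partition (and only within a partition).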
More! 
• Producers - ** push ** 
o Batching 
o Compression 
o Sync (Ack), Async (auto batch) 
o Replication 
o Sequential writes, guaranteed ordering within each partition 
• Consumers - ** pull ** 
o No state held by broker 
o Consumers control reading from the stream 
• Zero Copy for producers and consumers to and from the broker 
http://kafka.apache.org/documentation.html#maximizingefficiency 
• Messages stay on disk after they are consumed; they are deleted by TTL (retention) or by log compaction 
https://kafka.apache.org/documentation.html#compaction
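The pull model and compaction can be sketched in a few lines. This is a toy model, not Kafka's API: `PullConsumer`, `poll`, `seek`, and `compact` are hypothetical names, and the log is a plain list of `(offset, (key, value))` entries.

```python
def compact(log):
    """Keep only the latest record per key, preserving original offsets."""
    latest = {}
    for offset, (key, value) in log:
        latest[key] = (offset, value)
    return sorted((offset, (key, value)) for key, (offset, value) in latest.items())


class PullConsumer:
    """The consumer, not the broker, remembers how far it has read."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, max_messages=2):
        # Fetch entries at or after the current offset; the log is not modified.
        batch = [entry for entry in self.log if entry[0] >= self.offset][:max_messages]
        if batch:
            self.offset = batch[-1][0] + 1
        return batch

    def seek(self, offset):
        # Rewinding is just changing a number held by the consumer.
        self.offset = offset


log = [(0, ("k1", "a")), (1, ("k2", "b")), (2, ("k1", "c"))]
consumer = PullConsumer(log)
first = consumer.poll()
consumer.seek(0)
replayed = consumer.poll()
compacted = compact(log)
print(first == replayed, compacted)
```

Consuming does not delete anything, so the same messages can be replayed after a `seek`. Compaction, in this sketch, drops the older `k1` record and keeps the most recent value for each key.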
Client Libraries 
Community Clients https://cwiki.apache.org/confluence/display/KAFKA/Clients 
• Python - Pure Python implementation with full protocol support. Consumer and Producer 
implementations included, GZIP and Snappy compression supported. 
• C - High performance C library with full protocol support 
• C++ - Native C++ library with protocol support for Metadata, Produce, Fetch, and Offset. 
• Go (aka golang) - Pure Go implementation with full protocol support. Consumer and Producer 
implementations included, GZIP and Snappy compression supported. 
• Ruby - Pure Ruby, Consumer and Producer implementations included, GZIP and Snappy 
compression supported. Ruby 1.9.3 and up (CI runs MRI 2. 
• Clojure - Clojure DSL for the Kafka API 
• JavaScript (NodeJS) - NodeJS client in a pure JavaScript implementation 
• stdin & stdout 
Wire Protocol Developers Guide 
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
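As a rough illustration of what a client library does on the wire, the sketch below frames a version 0 Metadata request the way the protocol guide describes it: a 4-byte big-endian size prefix, then api key, api version, correlation id, and client id, followed by the request body. The function names are hypothetical, and the layout should be checked against the guide before relying on it.

```python
import struct


def encode_string(s):
    # Protocol strings: int16 length followed by UTF-8 bytes.
    data = s.encode("utf-8")
    return struct.pack(">h", len(data)) + data


def encode_metadata_request(correlation_id, client_id, topics):
    api_key_metadata = 3  # assumed from the protocol guide's api key table
    api_version = 0
    header = struct.pack(">hhi", api_key_metadata, api_version, correlation_id)
    header += encode_string(client_id)
    # Body: an array (int32 count) of topic name strings.
    body = struct.pack(">i", len(topics)) + b"".join(encode_string(t) for t in topics)
    payload = header + body
    # Every request is prefixed with its size as a big-endian int32.
    return struct.pack(">i", len(payload)) + payload


frame = encode_metadata_request(1, "demo", ["events"])
print(len(frame), frame.hex())
```

A real client would send this frame over a TCP connection to a broker and match the response by correlation id; that part is omitted here.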
Really Quick Start (Scala) 
1) Install Vagrant http://www.vagrantup.com/ 
2) Install VirtualBox https://www.virtualbox.org/ 
3) git clone https://github.com/stealthly/scala-kafka 
4) cd scala-kafka 
5) vagrant up 
ZooKeeper will be running on 192.168.86.5 
BrokerOne will be running on 192.168.86.10 
All the tests in ./src/test/scala/* should pass (this is also /vagrant/src/test/scala/* in the VM) 
6) ./gradlew test
Really Quick Start (Go) 
1) Install Vagrant http://www.vagrantup.com/ 
2) Install VirtualBox https://www.virtualbox.org/ 
3) git clone https://github.com/stealthly/go-kafka 
4) cd go-kafka 
5) vagrant up 
6) vagrant ssh brokerOne 
7) cd /vagrant 
8) sudo ./test.sh
Questions? 
/******************************************* 
Joe Stein 
Founder, Principal Consultant 
Big Data Open Source Security LLC 
http://www.stealth.ly 
Twitter: @allthingshadoop 
********************************************/
