Apache Kafka
Emre Akış
Outline
• Why do we use Apache Kafka?
• What is it?
• How does it work?
• Demo
• Ecosystem
Big Data
• Data doesn’t fit on one computer
• Welcome to distributed systems
(Near) Real-time Big Data & Analytics
• Events (e.g. clickstreams)
• Sensors
• Internet of Things (IoT)
• Data streams
Messaging Queues
• FIFO (first in, first out)
Distributed Messaging Queues
• Scalable
• Reliable
• High throughput (read & write)
Why Apache Kafka?
• Clean and simple architecture
• Easy to use
• Easy to deploy
• High throughput
• Scalability
• High availability
• Persistence (messages are retained for a configurable period)
Apache Kafka 101
• Distributed, partitioned, replicated commit log service.
• Provides the functionality of a messaging system.
Cluster
• Cluster => group of servers (brokers)
• Clients and brokers communicate over a language-agnostic TCP protocol
Topic
• Category or feed name to which messages are published.
• Partitioned log
• Each partition is
– Ordered
– An immutable sequence of messages
– Continually appended to
• Offset => sequential id number that identifies each message within its partition
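The topic model above can be sketched in a few lines of plain Python. This is a toy in-memory illustration, not Kafka's API: the `Partition` and `Topic` classes, the topic name `clicks`, and the message strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of one partition: an append-only list of messages."""
    messages: list = field(default_factory=list)

    def append(self, message):
        # The offset is simply the position of the message in this log.
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset):
        # Reading never removes anything; stored messages are not modified.
        return self.messages[offset]


@dataclass
class Topic:
    """Toy model of a topic: a name plus a fixed list of partitions."""
    name: str
    partitions: list

    @classmethod
    def create(cls, name, num_partitions):
        return cls(name, [Partition() for _ in range(num_partitions)])


topic = Topic.create("clicks", num_partitions=2)
first = topic.partitions[0].append("page_view:/home")
second = topic.partitions[0].append("page_view:/cart")
other = topic.partitions[1].append("page_view:/help")
print(first, second, other)  # offsets are counted per partition
```

Note that offsets restart at 0 in every partition, so an offset only identifies a message together with its partition.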
Partition Distribution
• Partitions are distributed over the servers in the cluster
• Replicated for fault tolerance (replication factor is configurable)
• Each partition has a leader server (handles reads & writes)
• The other replicas act as followers (they replicate the leader)
• If the leader fails, one of the followers becomes the new leader
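A rough sketch of the leader/follower idea, under simplifying assumptions: the `ReplicatedPartition` class and the broker ids are hypothetical, and the toy simply promotes the first live follower. Real Kafka chooses the new leader from the replicas that are in sync with the old one, which this sketch does not model.

```python
class ReplicatedPartition:
    """Toy model of one partition replicated across several brokers."""

    def __init__(self, replicas):
        # replicas: broker ids holding a copy; the first one starts as leader.
        self.replicas = list(replicas)
        self.alive = set(replicas)
        self.leader = self.replicas[0]

    def followers(self):
        # Every live replica that is not the current leader.
        return [b for b in self.replicas if b != self.leader and b in self.alive]

    def broker_failed(self, broker_id):
        self.alive.discard(broker_id)
        if broker_id == self.leader:
            candidates = self.followers()
            if not candidates:
                raise RuntimeError("no live replica left for this partition")
            # Simplification: promote the first live follower.
            self.leader = candidates[0]


partition = ReplicatedPartition(replicas=[1, 2, 3])
partition.broker_failed(1)  # the leader goes down
print(partition.leader, partition.followers())
```

With a replication factor of 3, the partition stays available after one broker failure and still has one follower left.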
Producer
• Decides which partition each message goes to
– Round-robin (to balance load)
– Semantic partitioning (e.g. based on a key in the message)
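The two partitioning strategies can be sketched as small functions. This is an illustration only: the function names are hypothetical, and `zlib.crc32` is a stand-in chosen because it is deterministic across runs; it is not the hash function Kafka's own clients use.

```python
import itertools
import zlib


def round_robin_partitioner(num_partitions):
    """Return a function that cycles through partitions 0..n-1."""
    counter = itertools.cycle(range(num_partitions))
    return lambda key=None: next(counter)


def key_partitioner(num_partitions):
    """Return a function that maps the same key to the same partition."""
    def choose(key):
        # crc32 is an assumption for this sketch, not Kafka's hash.
        return zlib.crc32(key.encode("utf-8")) % num_partitions
    return choose


rr = round_robin_partitioner(3)
print([rr() for _ in range(4)])

by_user = key_partitioner(3)
print(by_user("user-42") == by_user("user-42"))
```

Key-based partitioning is what makes per-key ordering possible: all messages for one key land in the same partition.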
Consumer
• Queue vs. publish/subscribe (consumer groups cover both models)
• Traditional queue ordering vs. per-partition ordering
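One way to picture queue versus publish/subscribe is through consumer groups: inside a group each partition goes to exactly one consumer (queue-like), while every group receives all partitions (publish/subscribe-like). The sketch below assumes a simple round-robin assignment; the function name, consumer names, and group names are hypothetical, and real assignment strategies are configurable.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of ONE group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Queue-like: one group, two consumers split the partitions between them.
queue_like = assign_partitions(partitions, ["c1", "c2"])

# Publish/subscribe-like: two groups, each group sees every partition.
pubsub_like = {
    "group-a": assign_partitions(partitions, ["a1"]),
    "group-b": assign_partitions(partitions, ["b1"]),
}

print(queue_like)
print(pubsub_like)
```

Because a partition has a single reader per group, ordering is preserved per partition rather than across the whole topic.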
Guarantees
• Messages sent by a producer to a partition are appended in the order they are sent.
• Consumers see messages in the order they are stored in the log.
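The two guarantees can be checked on a toy model. This sketch assumes in-memory lists as partition logs; `send_all`, `consume`, and the message values are hypothetical names for illustration, not part of any Kafka client.

```python
def send_all(messages, num_partitions):
    """Append (partition, value) pairs to per-partition logs in send order."""
    logs = [[] for _ in range(num_partitions)]
    for partition, value in messages:
        logs[partition].append(value)
    return logs


def consume(log, start_offset=0):
    """Yield (offset, value) pairs in the order they are stored."""
    for offset in range(start_offset, len(log)):
        yield offset, log[offset]


sent = [(0, "a1"), (1, "b1"), (0, "a2"), (1, "b2"), (0, "a3")]
logs = send_all(sent, num_partitions=2)

print(list(consume(logs[0])))     # [(0, 'a1'), (1, 'a2'), (2, 'a3')]
print(list(consume(logs[1], 1)))  # [(1, 'b2')]
```

The order holds inside each partition; nothing here defines an order between "a" and "b" messages across partitions.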
Demo
• Basic Command Line Tools
– Start a server
– Create a topic
– Send a message
– Start a consumer
– Multi-broker cluster
• Running a tool with no arguments displays usage information
Clients
• Java
• Python
• Ruby
• Go
• C/C++
• .NET
• Clojure
• Node.js
• Scala
• JRuby
• Perl
• Erlang
• PHP
• Rust
• HTTP REST
https://cwiki.apache.org/confluence/display/KAFKA/Clients
Administrative Tools
• Kafka Manager (by Yahoo)
• Kafkat: Command-line administration for Kafka brokers.
• Kafka Web Console: Displays information about your Kafka cluster, including which nodes are up and what topics they host data for.
• Kafka Offset Monitor: Displays the state of all consumers and how far behind the head of the stream they are.
Ecosystem
• Samza
• Spark Streaming
• Storm
https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
Use Cases
• Messaging
• Website activity tracking (at LinkedIn)
• Metrics
• Log aggregation
• Stream processing (with Storm or Samza)
• Event sourcing (state changes are logged by time)
• Commit log (like a database transaction log; see log compaction)
Who uses it?
• LinkedIn
• Yahoo
• Twitter
• Netflix
• Spotify
• Pinterest
• Uber
• Goldman Sachs
• Tumblr
• PayPal
• Box
• Airbnb
• Mozilla
• Cisco
• Etsy
• Foursquare
• StumbleUpon
• Coursera
• …
https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
Resources
• http://kafka.apache.org/
• https://cwiki.apache.org/confluence/display/KAFKA/Index
• https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
• http://www.confluent.io/blog
Q & A
About Me
• Twitter : @akisemre
• LinkedIn : https://tr.linkedin.com/in/emreakis

Editor's Notes

  • #11 Partitions provide scalability and parallelism