Juggling with Bits and Bytes
How Apache Flink operates on binary data
Fabian Hueske
fhueske@apache.org @fhueske
Big Data frameworks on JVMs
• Many (open source) Big Data frameworks run on JVMs
– Hadoop, Drill, Spark, Hive, Pig, and ...
– Flink as well
• Common challenge: How to organize data in-memory?
– In-memory processing (sorting, joining, aggregating)
– In-memory caching of intermediate results
• Memory management of a system influences
– Reliability
– Resource efficiency, performance & performance predictability
– Ease of configuration
The straightforward approach
Store and process data as objects on the heap
• Put objects in an Array and sort it
A few notable drawbacks
• Predicting memory consumption is hard
– If you fail, an OutOfMemoryError will kill you!
• High garbage collection overhead
– Easily 50% of time spent on GC
• Objects have space overhead
– At least 8 bytes for each (nested) object! (Depends on arch)
FLINK’S APPROACH
Flink adopts DBMS technology
• Allocates fixed number of memory segments upfront
• Data objects are serialized into memory segments
• DBMS-style algorithms work on binary representation
Why is that good?
• Memory-safe execution
– Used and available memory segments are easy to count
• Efficient out-of-core algorithms
– Memory segments can be efficiently written to disk
• Reduced GC pressure
– Memory segments are never deallocated
– Data objects are short-lived or reused
• Space-efficient data representation
• Efficient operations on binary data
What does it cost?
• Significant implementation investment
– Using java.util.HashMap
vs.
– Implementing a spillable hash table backed by byte arrays
and custom serialization stack
• Other systems use similar techniques
– Apache Drill, Apache Ignite, Apache Geode
• Apache Spark plans to evolve in a similar direction
MEMORY ALLOCATION
Memory segments
• Unit of memory distribution in Flink
– Fixed number allocated when worker starts
• Backed by a regular byte array (default 32KB)
• R/W access through Java’s efficient Unsafe methods
• Multiple memory segments can be concatenated to
a larger chunk of memory
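A minimal sketch of such a segment: Flink’s actual MemorySegment accesses its backing byte array through sun.misc.Unsafe for speed; the class below (SimpleSegment, an illustrative name) uses plain byte arithmetic instead to stay portable.

```java
// Simplified sketch of a Flink-style memory segment. The real MemorySegment
// uses sun.misc.Unsafe; this version encodes ints big-endian by hand.
class SimpleSegment {
    private final byte[] memory;

    SimpleSegment(int size) {          // e.g. 32 * 1024 for the 32KB default
        this.memory = new byte[size];
    }

    void putInt(int offset, int value) {
        memory[offset]     = (byte) (value >>> 24);
        memory[offset + 1] = (byte) (value >>> 16);
        memory[offset + 2] = (byte) (value >>> 8);
        memory[offset + 3] = (byte) value;
    }

    int getInt(int offset) {
        return ((memory[offset]     & 0xff) << 24)
             | ((memory[offset + 1] & 0xff) << 16)
             | ((memory[offset + 2] & 0xff) << 8)
             |  (memory[offset + 3] & 0xff);
    }

    int size() { return memory.length; }
}
```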
Memory allocation
DATA SERIALIZATION
Custom de/serialization stack
• Many alternatives for Java object serialization
– Kryo, Apache Avro, Apache Thrift, Protobufs, …
• But Flink has its own serialization stack
– Operating on serialized data requires knowledge of layout
– Control over layout can improve efficiency of operations
– Data types are known before execution
Rich & extensible type system
• Serialization framework requires knowledge of types
• Flink analyzes return types of functions
– Java: Reflection based type analyzer
– Scala: Compiler information
• Rich type system
– Atomics: Primitives, Writables, Generic types, …
– Composites: Tuples, POJOs, case classes
– Extensible by custom types
Serializers & comparators
• All types have dedicated de/serializers
– Primitives are natively serialized
– Writables use their own serialization functions
– Generic types use Kryo
– …
• Serialization automatically goes through Java’s Unsafe methods
• Comparators compare and hash objects
– On binary representation if possible
• Composite serializers and comparators delegate to
serializers and comparators of member types
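The delegation scheme can be sketched as follows (interface and class names are illustrative, not Flink’s actual API): the composite serializer writes no structure of its own and simply forwards field by field to its member serializers.

```java
import java.io.*;

interface FieldSerializer<T> {
    void serialize(T value, DataOutput out) throws IOException;
    T deserialize(DataInput in) throws IOException;
}

class IntFieldSerializer implements FieldSerializer<Integer> {
    public void serialize(Integer v, DataOutput out) throws IOException { out.writeInt(v); }
    public Integer deserialize(DataInput in) throws IOException { return in.readInt(); }
}

class StringFieldSerializer implements FieldSerializer<String> {
    public void serialize(String v, DataOutput out) throws IOException { out.writeUTF(v); }
    public String deserialize(DataInput in) throws IOException { return in.readUTF(); }
}

class Pair<A, B> {
    final A f0; final B f1;
    Pair(A f0, B f1) { this.f0 = f0; this.f1 = f1; }
}

// Composite serializer: delegates to the serializers of its member types.
class PairSerializer<A, B> implements FieldSerializer<Pair<A, B>> {
    private final FieldSerializer<A> s0;
    private final FieldSerializer<B> s1;
    PairSerializer(FieldSerializer<A> s0, FieldSerializer<B> s1) { this.s0 = s0; this.s1 = s1; }

    public void serialize(Pair<A, B> p, DataOutput out) throws IOException {
        s0.serialize(p.f0, out);   // field 0 handled by its own serializer
        s1.serialize(p.f1, out);   // field 1 likewise
    }
    public Pair<A, B> deserialize(DataInput in) throws IOException {
        return new Pair<>(s0.deserialize(in), s1.deserialize(in));
    }

    // Convenience helpers for a byte[] round trip.
    byte[] toBytes(Pair<A, B> p) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            serialize(p, new DataOutputStream(bos));
            return bos.toByteArray();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }
    Pair<A, B> fromBytes(byte[] bytes) {
        try {
            return deserialize(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }
}
```

The same pattern nests: a serializer for Tuple3&lt;Integer, Double, Person&gt; would delegate to an int serializer, a double serializer, and a POJO serializer for Person.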
Serializing a Tuple3&lt;Integer, Double, Person&gt;
OPERATING ON BINARY DATA
Data Processing Algorithms
• Flink’s algorithms are based on RDBMS technology
– External Merge Sort, Hybrid Hash Join, Sort Merge Join, …
• Algorithms receive a budget of memory segments
• Operate in-memory as long as data fits into budget
– And gracefully spill to disk if data exceeds memory
In-Memory Sort – Fill the Sort Buffer
In-Memory Sort – Sort the Buffer
In-Memory Sort – Read Sorted Buffer
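The three phases above (fill, sort, read) can be sketched as follows; the key idea is that serialized records stay in place, and only compact fixed-size (key, pointer) index entries are reordered. Class name and layout are illustrative, not Flink’s actual sort buffer.

```java
import java.util.*;

// Fill: serialize records into a byte buffer, remember (sort key, pointer).
// Sort: reorder only the small index entries, never the records themselves.
// Read: follow the sorted pointers back into the untouched data region.
class SortBufferSketch {
    private final byte[] data = new byte[1 << 16];         // stand-in for memory segments
    private int writePos = 0;
    private final List<long[]> index = new ArrayList<>();  // {sort key, pointer}

    void add(int value) {
        data[writePos]     = (byte) (value >>> 24);
        data[writePos + 1] = (byte) (value >>> 16);
        data[writePos + 2] = (byte) (value >>> 8);
        data[writePos + 3] = (byte) value;
        index.add(new long[] { value, writePos });
        writePos += 4;
    }

    List<Integer> sortedValues() {
        index.sort(Comparator.comparingLong(e -> e[0]));   // sort index entries only
        List<Integer> out = new ArrayList<>();
        for (long[] e : index) {
            int p = (int) e[1];                            // deserialize via pointer
            out.add(((data[p] & 0xff) << 24) | ((data[p + 1] & 0xff) << 16)
                  | ((data[p + 2] & 0xff) << 8) | (data[p + 3] & 0xff));
        }
        return out;
    }
}
```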
SHOW ME NUMBERS!
Sort benchmark
• Task: Sort 10 million Tuple2<Integer, String> records
– String length 12 chars
• Tuple has 16 bytes of raw data
• ~152 MB raw data
– Integers uniformly distributed, Strings long-tail distributed
– Sort on Integer field and on String field
• Input provided as mutable object iterator
• Use JVM with 900 MB heap size
– Minimum size to reliably run the benchmark
Sorting methods
1. Objects-on-Heap:
– Put cloned data objects in an ArrayList and use Java’s Collections.sort().
– ArrayList is initialized with the right size.
2. Flink-serialized:
– Using Flink’s custom serializers.
– Integer with full binary sorting key, String with 8-byte prefix key.
3. Kryo-serialized:
– Serialize fields with Kryo.
– No binary sorting keys, objects are deserialized for comparison.
• All implementations use a single thread
• Average execution time of 10 runs reported
• GC triggered between runs (not included in the measured time)
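Method 2’s 8-byte prefix key for Strings can be illustrated as follows. This is an ASCII-only sketch with illustrative names; Flink’s actual normalized-key encoding handles encodings and sort order more carefully.

```java
class StringPrefixKey {
    // Pack the first 8 characters into a long, zero-padded. Assumes
    // ASCII input (as in the benchmark's 12-char strings).
    static long prefixKey(String s) {
        long key = 0;
        for (int i = 0; i < 8; i++) {
            long b = i < s.length() ? (s.charAt(i) & 0xffL) : 0;
            key = (key << 8) | b;
        }
        return key;
    }

    // Most comparisons are decided on the fixed-size binary key alone;
    // only ties fall back to comparing the full (deserialized) strings.
    static int compareByPrefix(String a, String b) {
        int c = Long.compareUnsigned(prefixKey(a), prefixKey(b));
        return c != 0 ? c : a.compareTo(b);
    }
}
```

Because the key is a single long, the sort’s hot loop compares plain integers instead of deserializing two String objects per comparison.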
Execution time
Garbage collection and heap usage
(charts: Objects-on-heap vs. Flink-serialized)
Memory usage
• Breakdown: Flink-serialized – Sort Integer
– 4 bytes Integer
– 12 bytes String
– 4 bytes String length
– 4 bytes pointer
– 4 bytes Integer sorting key
– 28 bytes * 10M records = 267 MB
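As a sanity check, the breakdown above can be computed directly (a trivial sketch; the constants are taken from the breakdown, the class name is illustrative):

```java
class MemoryBreakdown {
    // Per-record footprint of the Flink-serialized "Sort Integer" case, in bytes.
    static final long PER_RECORD = 4   // Integer value
                                 + 12  // String characters (12 chars)
                                 + 4   // String length field
                                 + 4   // pointer
                                 + 4;  // Integer sorting key

    // Total footprint in MB (1 MB = 1024 * 1024 bytes).
    static long totalMegabytes(long records) {
        return PER_RECORD * records / (1024 * 1024);
    }
}
```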
               Objects-on-heap   Flink-serialized   Kryo-serialized
Sort Integer   approx. 700 MB    277 MB             266 MB
Sort String    approx. 700 MB    315 MB             266 MB
WHAT’S NEXT?
We’re not done yet!
• Move memory segments to off-heap memory
– Smaller JVM, lower GC pressure, easier configuration
• Table API provides full semantics for execution
– Use code generation to operate fully on binary data
• Serialization layouts tailored towards operations
– More efficient operations on binary data
• …
Summary
• Active memory management avoids OutOfMemoryErrors.
• Highly efficient data serialization stack
– Facilitates operations on binary data
– Makes more data fit into memory
• DBMS-style operators operate on binary data
– High performance in-memory processing
– Graceful destaging to disk if necessary
• Read the full story:
http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
http://flink.apache.org @ApacheFlink
Apache Flink
