Juggling with Bits and Bytes
How Apache Flink operates on binary data
Fabian Hueske
fhueske@apache.org @fhueske
Big Data frameworks on JVMs
• Many (open source) Big Data frameworks run on JVMs
– Hadoop, Drill, Spark, Hive, Pig, and ...
– Flink as well
• Common challenge: How to organize data in-memory?
– In-memory processing (sorting, joining, aggregating)
– In-memory caching of intermediate results
• Memory management of a system influences
– Reliability
– Resource efficiency, performance & performance predictability
– Ease of configuration
The straightforward approach
Store and process data as objects on the heap
• Put objects in an Array and sort it
A few notable drawbacks
• Predicting memory consumption is hard
– If you fail, an OutOfMemoryError will kill you!
• High garbage collection overhead
– Easily 50% of time spent on GC
• Objects have space overhead
– At least 8 bytes for each (nested) object! (Depends on arch)
FLINK’S APPROACH
Flink adopts DBMS technology
• Allocates fixed number of memory segments upfront
• Data objects are serialized into memory segments
• DBMS-style algorithms work on binary representation
Why is that good?
• Memory-safe execution
– Used and available memory segments are easy to count
• Efficient out-of-core algorithms
– Memory segments can be efficiently written to disk
• Reduced GC pressure
– Memory segments are never deallocated
– Data objects are short-lived or reused
• Space-efficient data representation
• Efficient operations on binary data
What does it cost?
• Significant implementation investment
– Using java.util.HashMap
vs.
– Implementing a spillable hash table backed by byte arrays
and custom serialization stack
• Other systems use similar techniques
– Apache Drill, Apache Ignite, Apache Geode
• Apache Spark plans to evolve in a similar direction
MEMORY ALLOCATION
Memory segments
• Unit of memory distribution in Flink
– Fixed number allocated when worker starts
• Backed by a regular byte array (default 32KB)
• R/W access through Java’s efficient Unsafe methods
• Multiple memory segments can be concatenated to
a larger chunk of memory
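A minimal sketch of such a segment: Flink’s actual MemorySegment accesses its backing byte array through sun.misc.Unsafe for speed; the class below (SimpleSegment, an illustrative name) uses plain byte arithmetic instead to stay portable.

```java
// Simplified sketch of a Flink-style memory segment. The real MemorySegment
// uses sun.misc.Unsafe; this version encodes ints big-endian by hand.
class SimpleSegment {
    private final byte[] memory;

    SimpleSegment(int size) {          // e.g. 32 * 1024 for the 32KB default
        this.memory = new byte[size];
    }

    void putInt(int offset, int value) {
        memory[offset]     = (byte) (value >>> 24);
        memory[offset + 1] = (byte) (value >>> 16);
        memory[offset + 2] = (byte) (value >>> 8);
        memory[offset + 3] = (byte) value;
    }

    int getInt(int offset) {
        return ((memory[offset]     & 0xff) << 24)
             | ((memory[offset + 1] & 0xff) << 16)
             | ((memory[offset + 2] & 0xff) << 8)
             |  (memory[offset + 3] & 0xff);
    }

    int size() { return memory.length; }
}
```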
Memory allocation
DATA SERIALIZATION
Custom de/serialization stack
• Many alternatives for Java object serialization
– Kryo, Apache Avro, Apache Thrift, Protobufs, …
• But Flink has its own serialization stack
– Operating on serialized data requires knowledge of layout
– Control over layout can improve efficiency of operations
– Data types are known before execution
Rich & extensible type system
• Serialization framework requires knowledge of types
• Flink analyzes return types of functions
– Java: Reflection based type analyzer
– Scala: Compiler information
• Rich type system
– Atomics: Primitives, Writables, Generic types, …
– Composites: Tuples, POJOs, case classes
– Extensible by custom types
Serializers & comparators
• All types have dedicated de/serializers
– Primitives are natively serialized
– Writables use their own serialization functions
– Generic types use Kryo
– …
• Serialization automatically goes through Java’s Unsafe methods
• Comparators compare and hash objects
– On binary representation if possible
• Composite serializers and comparators delegate to
serializers and comparators of member types
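The delegation scheme can be sketched as follows (interface and class names are illustrative, not Flink’s actual API): the composite serializer writes no structure of its own and simply forwards field by field to its member serializers.

```java
import java.io.*;

interface FieldSerializer<T> {
    void serialize(T value, DataOutput out) throws IOException;
    T deserialize(DataInput in) throws IOException;
}

class IntFieldSerializer implements FieldSerializer<Integer> {
    public void serialize(Integer v, DataOutput out) throws IOException { out.writeInt(v); }
    public Integer deserialize(DataInput in) throws IOException { return in.readInt(); }
}

class StringFieldSerializer implements FieldSerializer<String> {
    public void serialize(String v, DataOutput out) throws IOException { out.writeUTF(v); }
    public String deserialize(DataInput in) throws IOException { return in.readUTF(); }
}

class Pair<A, B> {
    final A f0; final B f1;
    Pair(A f0, B f1) { this.f0 = f0; this.f1 = f1; }
}

// Composite serializer: delegates to the serializers of its member types.
class PairSerializer<A, B> implements FieldSerializer<Pair<A, B>> {
    private final FieldSerializer<A> s0;
    private final FieldSerializer<B> s1;
    PairSerializer(FieldSerializer<A> s0, FieldSerializer<B> s1) { this.s0 = s0; this.s1 = s1; }

    public void serialize(Pair<A, B> p, DataOutput out) throws IOException {
        s0.serialize(p.f0, out);   // field 0 handled by its own serializer
        s1.serialize(p.f1, out);   // field 1 likewise
    }
    public Pair<A, B> deserialize(DataInput in) throws IOException {
        return new Pair<>(s0.deserialize(in), s1.deserialize(in));
    }

    // Convenience helpers for a byte[] round trip.
    byte[] toBytes(Pair<A, B> p) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            serialize(p, new DataOutputStream(bos));
            return bos.toByteArray();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }
    Pair<A, B> fromBytes(byte[] bytes) {
        try {
            return deserialize(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }
}
```

The same pattern nests: a serializer for Tuple3&lt;Integer, Double, Person&gt; would delegate to an int serializer, a double serializer, and a POJO serializer for Person.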
Serializing a Tuple3&lt;Integer, Double, Person&gt;
OPERATING ON BINARY DATA
Data Processing Algorithms
• Flink’s algorithms are based on RDBMS technology
– External Merge Sort, Hybrid Hash Join, Sort Merge Join, …
• Algorithms receive a budget of memory segments
• Operate in-memory as long as data fits into budget
– And gracefully spill to disk if data exceeds memory
In-Memory Sort – Fill the Sort Buffer
In-Memory Sort – Sort the Buffer
In-Memory Sort – Read Sorted Buffer
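The three phases above (fill, sort, read) can be sketched as follows; the key idea is that serialized records stay in place, and only compact fixed-size (key, pointer) index entries are reordered. Class name and layout are illustrative, not Flink’s actual sort buffer.

```java
import java.util.*;

// Fill: serialize records into a byte buffer, remember (sort key, pointer).
// Sort: reorder only the small index entries, never the records themselves.
// Read: follow the sorted pointers back into the untouched data region.
class SortBufferSketch {
    private final byte[] data = new byte[1 << 16];         // stand-in for memory segments
    private int writePos = 0;
    private final List<long[]> index = new ArrayList<>();  // {sort key, pointer}

    void add(int value) {
        data[writePos]     = (byte) (value >>> 24);
        data[writePos + 1] = (byte) (value >>> 16);
        data[writePos + 2] = (byte) (value >>> 8);
        data[writePos + 3] = (byte) value;
        index.add(new long[] { value, writePos });
        writePos += 4;
    }

    List<Integer> sortedValues() {
        index.sort(Comparator.comparingLong(e -> e[0]));   // sort index entries only
        List<Integer> out = new ArrayList<>();
        for (long[] e : index) {
            int p = (int) e[1];                            // deserialize via pointer
            out.add(((data[p] & 0xff) << 24) | ((data[p + 1] & 0xff) << 16)
                  | ((data[p + 2] & 0xff) << 8) | (data[p + 3] & 0xff));
        }
        return out;
    }
}
```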
SHOW ME NUMBERS!
Sort benchmark
• Task: Sort 10 million Tuple2<Integer, String> records
– String length 12 chars
• Tuple has 16 bytes of raw data
• ~152 MB raw data
– Integers uniformly distributed, Strings long-tail distributed
– Sort on Integer field and on String field
• Input provided as mutable object iterator
• Use JVM with 900 MB heap size
– Minimum size to reliably run the benchmark
Sorting methods
1. Objects-on-Heap:
– Put cloned data objects in an ArrayList and use Java’s Collections.sort().
– ArrayList is initialized with the right size.
2. Flink-serialized:
– Using Flink’s custom serializers.
– Integer with full binary sorting key, String with 8-byte prefix key.
3. Kryo-serialized:
– Serialize fields with Kryo.
– No binary sorting keys, objects are deserialized for comparison.
• All implementations use a single thread
• Average execution time of 10 runs reported
• GC triggered between runs (not included in the measured time)
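Method 2’s 8-byte prefix key for Strings can be illustrated as follows. This is an ASCII-only sketch with illustrative names; Flink’s actual normalized-key encoding handles encodings and sort order more carefully.

```java
class StringPrefixKey {
    // Pack the first 8 characters into a long, zero-padded. Assumes
    // ASCII input (as in the benchmark's 12-char strings).
    static long prefixKey(String s) {
        long key = 0;
        for (int i = 0; i < 8; i++) {
            long b = i < s.length() ? (s.charAt(i) & 0xffL) : 0;
            key = (key << 8) | b;
        }
        return key;
    }

    // Most comparisons are decided on the fixed-size binary key alone;
    // only ties fall back to comparing the full (deserialized) strings.
    static int compareByPrefix(String a, String b) {
        int c = Long.compareUnsigned(prefixKey(a), prefixKey(b));
        return c != 0 ? c : a.compareTo(b);
    }
}
```

Because the key is a single long, the sort’s hot loop compares plain integers instead of deserializing two String objects per comparison.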
Execution time
Garbage collection and heap usage
(charts: Objects-on-heap vs. Flink-serialized)
Memory usage
• Breakdown: Flink-serialized – Sort Integer
– 4 bytes Integer
– 12 bytes String
– 4 bytes String length
– 4 bytes pointer
– 4 bytes Integer sorting key
– 28 bytes * 10M records = 267 MB
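As a sanity check, the breakdown above can be computed directly (a trivial sketch; the constants are taken from the breakdown, the class name is illustrative):

```java
class MemoryBreakdown {
    // Per-record footprint of the Flink-serialized "Sort Integer" case, in bytes.
    static final long PER_RECORD = 4   // Integer value
                                 + 12  // String characters (12 chars)
                                 + 4   // String length field
                                 + 4   // pointer
                                 + 4;  // Integer sorting key

    // Total footprint in MB (1 MB = 1024 * 1024 bytes).
    static long totalMegabytes(long records) {
        return PER_RECORD * records / (1024 * 1024);
    }
}
```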
               Objects-on-heap   Flink-serialized   Kryo-serialized
Sort Integer   approx. 700 MB    277 MB             266 MB
Sort String    approx. 700 MB    315 MB             266 MB
WHAT’S NEXT?
We’re not done yet!
• Move memory segments to off-heap memory
– Smaller JVM, lower GC pressure, easier configuration
• Table API provides full semantics for execution
– Use code generation to operate fully on binary data
• Serialization layouts tailored towards operations
– More efficient operations on binary data
• …
Summary
• Active memory management avoids OutOfMemoryErrors.
• Highly efficient data serialization stack
– Facilitates operations on binary data
– Makes more data fit into memory
• DBMS-style operators operate on binary data
– High performance in-memory processing
– Graceful destaging to disk if necessary
• Read the full story:
http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
http://flink.apache.org @ApacheFlink
Apache Flink
