Apache Flink Worst Practices
Konstantin Knauf
Head of Product - Ververica Platform
@snntrable
Don't use an iterative development process!
Analysis → Project Setup → Development (Testing) → Go Live → Maintenance
...choose the right setup! Worst practices:
● start with one of your most challenging use cases
● no training
● don't use the community
● no prior stream processing knowledge
Getting Help!
● Community
  ○ user@flink.apache.org
    ■ ~600 threads per month
    ■ ∅ 30h until first response
  ○ www.stackoverflow.com
    ■ How not to ask questions:
      https://data.stackexchange.com/stackoverflow/query/1115371/most-down-voted-apache-flink-questions
  ○ (ASF Slack #flink)
● Training
  ○ @FlinkForward
  ○ https://www.ververica.com/training
Don't use an iterative development process!
Analysis → Project Setup → Development (Testing) → Go Live → Maintenance
...don't think too much about your requirements!
● don't think about consistency & delivery guarantees
● don't think about the scale of your problem
● don't think too much about your actual business requirements
● don't think about upgrades and application evolution
Three Questions

1. Do I care about losing records?
   No → No checkpointing, any source
   Yes → next question
2. Do I care about correct results?
   No → CheckpointingMode.AT_LEAST_ONCE & replayable sources
   Yes → next question
3. Do I care about duplicate (yet correct) records downstream?
   No → CheckpointingMode.EXACTLY_ONCE & replayable sources
   Yes → CheckpointingMode.EXACTLY_ONCE, replayable sources & transactional sinks

(The slide's vertical "LATENCY" axis runs down this tree: the stronger the guarantee, the higher the latency.)
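A minimal sketch of configuring the chosen mode on a job, assuming the 1.9-era DataStream API (the interval value is illustrative):

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// checkpoint every 10 seconds; EXACTLY_ONCE refers to state updates,
// not to records emitted downstream (that additionally needs a transactional sink)
env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);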
Basics
[figure: a Flink job's user code performs local reads/writes that manipulate state in local state backends; on a savepoint, that state is persisted to a DFS]
Reasons to upgrade an application:
1. Upgrade Flink cluster
2. Fix bugs
3. Pipeline topology changes
4. Job reconfigurations
5. Adapt state schema
6. ...
On restart, the job reloads state from the persisted savepoint into the local state backends and continues to access it.
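A hedged sketch of the corresponding CLI workflow (job IDs and paths are placeholders):

# take a savepoint, then stop the old job
flink savepoint <jobId> s3://bucket/savepoints
flink cancel <jobId>
# resume the upgraded job from the savepoint
flink run -s s3://bucket/savepoints/savepoint-<id> <jar-file> <arguments>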
Topology Changes
flink run --allowNonRestoredState <jar-file> <arguments>
[figure: job graph in which a "Sink: LateDataSink" operator differs between the savepoint and the new topology]
Topology Changes
new operator starts ➙ empty state
Still the same operator? Assign explicit IDs:
streamOperator
    .uid("AggregatePerSensor")
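A hedged sketch of what that looks like on a small pipeline (function and sink names are illustrative); state in a savepoint is matched to operators by these IDs:

stream.keyBy(r -> r.getSensorId())
      .flatMap(new AggregatePerSensor())
      .uid("AggregatePerSensor")
      .addSink(new MySink())
      .uid("MySink");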
State Schema Evolution
● Avro types ✓
  ○ https://avro.apache.org/docs/1.7.7/spec.html#Schema+Resolution
● Flink POJOs ✓
  ○ https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/state/schema_evolution.html#pojo-types
● Kryo ✗
● Key data types cannot be changed
● Otherwise, the State Processor API to the rescue:
  "State Unlocked", Seth Wiesman & Tzu-Li (Gordon) Tai, October 9, 2:30 pm - 3:10 pm, B 07 - B 08
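A hedged illustration of the supported POJO evolution (class and field names assumed): adding a field to a Flink POJO is allowed, and the new field is initialized to its default value when restoring from an old savepoint.

// before
public class SensorReading {
    public String sensorId;
    public double value;
    public SensorReading() {}
}

// after: 'unit' was added; restores from old savepoints with unit == null
public class SensorReading {
    public String sensorId;
    public double value;
    public String unit;
    public SensorReading() {}
}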
State Size & Network
Back-of-the-envelope formulas for a Source → keyBy → stateful operator → Sink job (FilesystemStatebackend, one slot per TaskManager, #tm = number of TaskManagers):
● Ingress MB/s ≈ #records/s * record_size
● Shuffle MB/s (each side of the keyBy) ≈ #records/s * record_size * (#tm - 1)/#tm
● State size ≈ #keys * state_size
● Checkpoint MB/s ≈ overall_state_size / checkpoint_interval
● Egress MB/s ≈ #messages/s * message_size
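To make that concrete (illustrative numbers): at 10,000 records/s of 1 KB each on 4 TaskManagers, ingress is ~10 MB/s and the keyBy shuffles ~10 MB/s * 3/4 ≈ 7.5 MB/s over the network; 100 GB of overall state checkpointed every 10 minutes adds another ~170 MB/s of checkpoint traffic.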
Event Time & Out-Of-Orderness
"I want to send an alarm when the number of transactions per customer exceeds three in ten seconds."
[figure: the same out-of-order event stream (timestamps 4, 8, 13, 11, 15, 21) evaluated three ways]
● Tumbling Window (10 secs)
● Sliding Window (10/5 secs)
● "Look Back" (10 secs): fire immediately, or wait for the watermark?
For the windows: fire per record, or wait till the watermark reaches the end of the window?
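A hedged sketch of the tumbling-window variant in the 1.9-era DataStream API (the type, field, aggregate and sink names are assumptions):

transactions
    .keyBy(t -> t.getCustomerId())
    .window(TumblingEventTimeWindows.of(Time.seconds(10)))
    .aggregate(new CountTransactions())   // pre-aggregated count per customer and window
    .filter(count -> count > 3)           // alarm condition
    .addSink(new AlarmSink());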
Batch Processing Requirements
"I receive two files every ten minutes: transactions.txt & quotes.txt.
● They need to be transformed & joined.
● The output must be exactly one file.
● The operation needs to be atomic."
→ Don't try to re-implement your batch jobs as stream processing jobs.
Don't use an iterative development process!
Analysis → Project Setup → Development (Testing) → Go Live → Maintenance
Analytics or Application?

Applications (physical) → DataStream API:
● types are Java / Scala classes
● transformation functions
● explicit control over state
● explicit control over time
● executes as described

Analytics (declarative) → Table API/SQL:
● logical schema for tables
● declarative language (SQL, Table DSL)
● state implicit in operations
● SLAs define when to trigger
● automatic optimization
Table API / SQL Red Flags
● "When upgrading Apache Flink or my application, I want to migrate state."
● "I cannot lose late data."
● "I want to change the behaviour of my application at runtime."
Worst Practices
● make use of deeply-nested, complex data types (don't think about potential schema evolution)
● KeySelector#getKey can return any serializable type; use this freedom
● serialization is not to be underestimated
● the simpler your data types, the better
● use Flink POJOs or Avro SpecificRecords
● key types matter most
  ○ part of every KeyedState
  ○ part of every Timer
● tune locally

Serializer                    Ops/s
PojoSerializer                  305
Kryo                            102
Avro (Reflect API)              127
Avro (SpecificRecord API)       297
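A hedged illustration of why key types matter (Event, CustomerProfile and the getters are placeholders): the key returned by the KeySelector is serialized into every keyed-state entry and every timer, so prefer a small, stable key over a complex object.

// worst practice: heavyweight key type, copied into all keyed state and timers
stream.keyBy((KeySelector<Event, CustomerProfile>) Event::getProfile);

// better: a simple, stable key
stream.keyBy((KeySelector<Event, String>) Event::getCustomerId);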
Don't process data you don't need:
● project early
● filter early (the snippet below filters only after keying, windowing and counting)
● don't deserialize unused fields, e.g. keep them as raw bytes:

sourceStream.flatMap(new Deserializer())
            .keyBy("cities")
            .timeWindow(...)
            .count()
            .filter(new GeographyFilter("America"))
            .addSink(...)

public class Record {
    private City city;
    private byte[] enclosedRecord; // deserialized only where actually needed
}
Worst practices:
● static variables to share state between Tasks
● spawning threads in user functions

Why it hurts:
● bugs
● deadlocks & lock contention (interaction with framework code)
● synchronization overhead
● complicated, error-prone (checkpointing)

Instead (see the sketch below):
● use async I/O (AsyncDataStream) to reduce wait times on external IO
● use Timers to schedule tasks
● increase operator parallelism to increase parallelism
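A hedged sketch of the Timer alternative (Event and Alert are placeholder types): schedule follow-up work through the TimerService instead of a background thread, so it runs on the task thread and cooperates with checkpointing.

public class ScheduledWork extends KeyedProcessFunction<String, Event, Alert> {

    @Override
    public void processElement(Event event, Context ctx, Collector<Alert> out) {
        // revisit this key one minute from now instead of spawning a thread
        long oneMinuteLater = ctx.timerService().currentProcessingTime() + 60_000L;
        ctx.timerService().registerProcessingTimeTimer(oneMinuteLater);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Alert> out) {
        // the scheduled work runs here, on the task thread
    }
}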
Avoid custom windowing:

stream.keyBy("key")
      .timeWindow(Time.of(30, DAYS), Time.of(5, SECONDS))
      .apply(new MyWindowFunction())
● each record is added to > 500k windows (30 days / 5-second slide = 518,400 windows)
● without pre-aggregation

stream.keyBy("key")
      .window(GlobalWindows.create())
      .trigger(new CustomTrigger())
      .evictor(new CustomEvictor())
      .reduce/aggregate/fold/apply()
● a KeyedProcessFunction is usually
  ○ less error-prone
  ○ simpler
Queryable State for Inter-Task Communication
● non-thread-safe state access
  ○ RocksDBStatebackend ✓
  ○ FilesystemStatebackend ✗
● performance?
● consistency guarantees?
→ Use Queryable State for debugging and monitoring only.
stream.keyBy("key")
      .flatMap(..)
      .keyBy("key")
      .process(..)
      .keyBy("key")
      .timeWindow(..)
● repeated keyBy on an unchanged key causes needless shuffles:
  DataStreamUtils#reinterpretAsKeyedStream
● Note: the stream needs to be partitioned exactly as Flink would partition it.

public void flatMap(Bar bar, Collector<Foo> out) throws Exception {
    // worst practice: a new factory and parser for every record
    MyParserFactory factory = MyParserFactory.newInstance();
    MyParser parser = factory.newParser();
    out.collect(parser.parse(bar));
}
● use RichFunction#open for initialization logic (see the sketch below)
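A hedged sketch of the open() fix, reusing the names from the slide:

public class ParseFunction extends RichFlatMapFunction<Bar, Foo> {

    private transient MyParser parser;

    @Override
    public void open(Configuration parameters) {
        // initialize once per task, not once per record
        parser = MyParserFactory.newInstance().newParser();
    }

    @Override
    public void flatMap(Bar bar, Collector<Foo> out) {
        out.collect(parser.parse(bar));
    }
}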
Don't use an iterative development process!
Analysis → Project Setup → Development (Testing) → Go Live → Maintenance
That's a pyramid! (the testing pyramid, from the base up)
● UDF unit tests
● operator/function tests with AbstractTestHarness
● integration tests with MiniClusterResource
● system tests
Testing Stateful and Timely Operators and Functions
Examples: https://github.com/knaufk/flink-testing-pyramid

@Test
public void testingStatefulFlatMapFunction() throws Exception {
    // push (timestamped) elements into the operator (and hence the user-defined function)
    testHarness.processElement(2L, 100L);

    // trigger event-time timers by advancing the event time of the operator with a watermark
    testHarness.processWatermark(100L);

    // trigger processing-time timers by advancing the processing time of the operator directly
    testHarness.setProcessingTime(100L);

    // retrieve the list of emitted records for assertions
    assertThat(testHarness.getOutput(), containsInExactlyThisOrder(3L));

    // retrieve records emitted to a specific side output for assertions (ProcessFunction only)
    assertThat(testHarness.getSideOutput(new OutputTag<>("invalidRecords")), hasSize(0));
}
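A hedged sketch of how such a testHarness could be created for a stateful FlatMapFunction (MyStatefulFlatMap is a placeholder; see the linked repository for the real setup):

StreamFlatMap<Long, Long> operator = new StreamFlatMap<>(new MyStatefulFlatMap());
KeyedOneInputStreamOperatorTestHarness<Long, Long, Long> testHarness =
    new KeyedOneInputStreamOperatorTestHarness<>(operator, value -> value, Types.LONG);
testHarness.open();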
Don't use an iterative development process!
Analysis → Project Setup → Development (Testing) → Go Live → Maintenance
Ignore Spiky Loads!
[figure: records/s over processing time, spiking far above the average; a second chart adds a sustained catch-up scenario]
Sources of load spikes:
● seasonal fluctuations
● checkpoint alignment
● watermark interval
● GC pauses
● timers
● catch-up scenarios (e.g. reprocessing a backlog)
● ...
Monitoring & Metrics

Worst practice: if at all, start monitoring only now (at go-live)
● don't miss the chance to learn about Flink's runtime during development
● not sure how to start → read [1]

Worst practice: use the Flink Web Interface as a monitoring system
● not the right tool for the job → MetricsReporters (e.g. Prometheus, InfluxDB, Datadog)
● too many metrics can bring JobManagers down

Worst practice: using latency markers in production
● high overhead (in particular with metrics.latency.granularity: subtask)
● measure event-time lag instead [2]

[1] https://flink.apache.org/news/2019/02/25/monitoring-best-practices.html
[2] https://flink.apache.org/2019/06/05/flink-network-stack.html
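A hedged sketch of wiring up a MetricsReporter in flink-conf.yaml (the Prometheus reporter of the Flink 1.9 era; the port is illustrative):

# expose metrics for scraping instead of polling the Web UI
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249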
Configuration

Worst practice: choosing RocksDBStatebackend by default
● FilesystemStatebackend is faster & easier to configure
● the State Processor API can be used to switch later

Worst practice: NFS/EBS/etc. as state.backend.rocksdb.localdir
● disk IO ultimately determines how fast you can read/write state

Worst practice: playing around with Slots and SlotSharingGroups (too early)
● usually not worth it
● leads to less chaining, additional serialization, more network shuffles and lower resource utilization
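A hedged sketch of choosing the backend explicitly (Flink 1.9-era API; the checkpoint URI is a placeholder):

// heap-based state backend; checkpoints and savepoints are persisted to a DFS
env.setStateBackend(new FsStateBackend("s3://bucket/checkpoints"));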
Don't use an iterative development process!
Analysis → Project Setup → Development (Testing) → Go Live → Maintenance
With a fast-paced project like Apache Flink: don't upgrade!
Questions?
www.ververica.com · @snntrable · konstantin@ververica.com