Apache Flink Worst Practices
Konstantin Knauf
Head of Product - Ververica Platform
@snntrable
Don't use an iterative development process!
Analysis → Project Setup → Development (Testing) → Go Live → Maintenance
...choose the right setup! Worst practices:
● start with one of your most challenging use cases
● no training
● don't use the community
● no prior stream processing knowledge
Getting Help!
● Community
  ○ user@flink.apache.org
    ■ ~600 threads per month
    ■ ∅ 30h until first response
  ○ www.stackoverflow.com
    ■ How not to ask questions:
      https://data.stackexchange.com/stackoverflow/query/1115371/most-down-voted-apache-flink-questions
  ○ (ASF Slack #flink)
● Training
  ○ @FlinkForward
  ○ https://www.ververica.com/training
Don't use an iterative development process!
Analysis → Project Setup → Development (Testing) → Go Live → Maintenance
...don't think too much about your requirements!
● don't think about consistency & delivery guarantees
● don't think about the scale of your problem
● don't think too much about your actual business requirements
● don't think about upgrades and application evolution
Three Questions

1. Do I care about losing records?
   No → No checkpointing, any source
   Yes → next question
2. Do I care about correct results?
   No → CheckpointingMode.AT_LEAST_ONCE & replayable sources
   Yes → next question
3. Do I care about duplicate (yet correct) records downstream?
   No → CheckpointingMode.EXACTLY_ONCE & replayable sources
   Yes → CheckpointingMode.EXACTLY_ONCE, replayable sources & transactional sinks

(The slide's vertical "LATENCY" axis runs down this tree: the stronger the guarantee, the higher the latency.)
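A minimal sketch of configuring the chosen mode on a job, assuming the 1.9-era DataStream API (the interval value is illustrative):

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// checkpoint every 10 seconds; EXACTLY_ONCE refers to state updates,
// not to records emitted downstream (that additionally needs a transactional sink)
env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);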
Basics
[figure: a Flink job's user code performs local reads/writes that manipulate state in local state backends; on a savepoint, that state is persisted to a DFS]
Reasons to upgrade an application:
1. Upgrade Flink cluster
2. Fix bugs
3. Pipeline topology changes
4. Job reconfigurations
5. Adapt state schema
6. ...
On restart, the job reloads state from the persisted savepoint into the local state backends and continues to access it.
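A hedged sketch of the corresponding CLI workflow (job IDs and paths are placeholders):

# take a savepoint, then stop the old job
flink savepoint <jobId> s3://bucket/savepoints
flink cancel <jobId>
# resume the upgraded job from the savepoint
flink run -s s3://bucket/savepoints/savepoint-<id> <jar-file> <arguments>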
Topology Changes
flink run --allowNonRestoredState <jar-file> <arguments>
[figure: job graph in which a "Sink: LateDataSink" operator differs between the savepoint and the new topology]
Topology Changes
new operator starts ➙ empty state
Still the same operator? Assign explicit IDs:
streamOperator
    .uid("AggregatePerSensor")
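A hedged sketch of what that looks like on a small pipeline (function and sink names are illustrative); state in a savepoint is matched to operators by these IDs:

stream.keyBy(r -> r.getSensorId())
      .flatMap(new AggregatePerSensor())
      .uid("AggregatePerSensor")
      .addSink(new MySink())
      .uid("MySink");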
State Schema Evolution
● Avro types ✓
  ○ https://avro.apache.org/docs/1.7.7/spec.html#Schema+Resolution
● Flink POJOs ✓
  ○ https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/state/schema_evolution.html#pojo-types
● Kryo ✗
● Key data types cannot be changed
● Otherwise, the State Processor API to the rescue:
  "State Unlocked", Seth Wiesman & Tzu-Li (Gordon) Tai, October 9, 2:30 pm - 3:10 pm, B 07 - B 08
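A hedged illustration of the supported POJO evolution (class and field names assumed): adding a field to a Flink POJO is allowed, and the new field is initialized to its default value when restoring from an old savepoint.

// before
public class SensorReading {
    public String sensorId;
    public double value;
    public SensorReading() {}
}

// after: 'unit' was added; restores from old savepoints with unit == null
public class SensorReading {
    public String sensorId;
    public double value;
    public String unit;
    public SensorReading() {}
}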
State Size & Network
Back-of-the-envelope formulas for a Source → keyBy → stateful operator → Sink job (FilesystemStatebackend, one slot per TaskManager, #tm = number of TaskManagers):
● Ingress MB/s ≈ #records/s * record_size
● Shuffle MB/s (each side of the keyBy) ≈ #records/s * record_size * (#tm - 1)/#tm
● State size ≈ #keys * state_size
● Checkpoint MB/s ≈ overall_state_size / checkpoint_interval
● Egress MB/s ≈ #messages/s * message_size
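To make that concrete (illustrative numbers): at 10,000 records/s of 1 KB each on 4 TaskManagers, ingress is ~10 MB/s and the keyBy shuffles ~10 MB/s * 3/4 ≈ 7.5 MB/s over the network; 100 GB of overall state checkpointed every 10 minutes adds another ~170 MB/s of checkpoint traffic.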
Event Time & Out-Of-Orderness
"I want to send an alarm when the number of transactions per customer exceeds three in ten seconds."
[figure: the same out-of-order event stream (timestamps 4, 8, 13, 11, 15, 21) evaluated three ways]
● Tumbling Window (10 secs)
● Sliding Window (10/5 secs)
● "Look Back" (10 secs): fire immediately, or wait for the watermark?
For the windows: fire per record, or wait till the watermark reaches the end of the window?
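A hedged sketch of the tumbling-window variant in the 1.9-era DataStream API (the type, field, aggregate and sink names are assumptions):

transactions
    .keyBy(t -> t.getCustomerId())
    .window(TumblingEventTimeWindows.of(Time.seconds(10)))
    .aggregate(new CountTransactions())   // pre-aggregated count per customer and window
    .filter(count -> count > 3)           // alarm condition
    .addSink(new AlarmSink());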
Batch Processing Requirements
"I receive two files every ten minutes: transactions.txt & quotes.txt.
● They need to be transformed & joined.
● The output must be exactly one file.
● The operation needs to be atomic."
→ Don't try to re-implement your batch jobs as stream processing jobs.
Don't use an iterative development process!
Analysis → Project Setup → Development (Testing) → Go Live → Maintenance
Analytics or Application?

Applications (physical) → DataStream API:
● types are Java / Scala classes
● transformation functions
● explicit control over state
● explicit control over time
● executes as described

Analytics (declarative) → Table API/SQL:
● logical schema for tables
● declarative language (SQL, Table DSL)
● state implicit in operations
● SLAs define when to trigger
● automatic optimization
Table API / SQL Red Flags
● "When upgrading Apache Flink or my application, I want to migrate state."
● "I cannot lose late data."
● "I want to change the behaviour of my application at runtime."
Worst Practices
● make use of deeply-nested, complex data types (don't think about potential schema evolution)
● KeySelector#getKey can return any serializable type; use this freedom
● serialization is not to be underestimated
● the simpler your data types, the better
● use Flink POJOs or Avro SpecificRecords
● key types matter most
  ○ part of every KeyedState
  ○ part of every Timer
● tune locally

Serializer                    Ops/s
PojoSerializer                  305
Kryo                            102
Avro (Reflect API)              127
Avro (SpecificRecord API)       297
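A hedged illustration of why key types matter (Event, CustomerProfile and the getters are placeholders): the key returned by the KeySelector is serialized into every keyed-state entry and every timer, so prefer a small, stable key over a complex object.

// worst practice: heavyweight key type, copied into all keyed state and timers
stream.keyBy((KeySelector<Event, CustomerProfile>) Event::getProfile);

// better: a simple, stable key
stream.keyBy((KeySelector<Event, String>) Event::getCustomerId);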
Don't process data you don't need:
● project early
● filter early (the snippet below filters only after keying, windowing and counting)
● don't deserialize unused fields, e.g. keep them as raw bytes:

sourceStream.flatMap(new Deserializer())
            .keyBy("cities")
            .timeWindow(...)
            .count()
            .filter(new GeographyFilter("America"))
            .addSink(...)

public class Record {
    private City city;
    private byte[] enclosedRecord; // deserialized only where actually needed
}
Worst practices:
● static variables to share state between Tasks
● spawning threads in user functions

Why it hurts:
● bugs
● deadlocks & lock contention (interaction with framework code)
● synchronization overhead
● complicated, error-prone (checkpointing)

Instead (see the sketch below):
● use async I/O (AsyncDataStream) to reduce wait times on external IO
● use Timers to schedule tasks
● increase operator parallelism to increase parallelism
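A hedged sketch of the Timer alternative (Event and Alert are placeholder types): schedule follow-up work through the TimerService instead of a background thread, so it runs on the task thread and cooperates with checkpointing.

public class ScheduledWork extends KeyedProcessFunction<String, Event, Alert> {

    @Override
    public void processElement(Event event, Context ctx, Collector<Alert> out) {
        // revisit this key one minute from now instead of spawning a thread
        long oneMinuteLater = ctx.timerService().currentProcessingTime() + 60_000L;
        ctx.timerService().registerProcessingTimeTimer(oneMinuteLater);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Alert> out) {
        // the scheduled work runs here, on the task thread
    }
}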
Avoid custom windowing:

stream.keyBy("key")
      .timeWindow(Time.of(30, DAYS), Time.of(5, SECONDS))
      .apply(new MyWindowFunction())
● each record is added to > 500k windows (30 days / 5-second slide = 518,400 windows)
● without pre-aggregation

stream.keyBy("key")
      .window(GlobalWindows.create())
      .trigger(new CustomTrigger())
      .evictor(new CustomEvictor())
      .reduce/aggregate/fold/apply()
● a KeyedProcessFunction is usually
  ○ less error-prone
  ○ simpler
Queryable State for Inter-Task Communication
● non-thread-safe state access
  ○ RocksDBStatebackend ✓
  ○ FilesystemStatebackend ✗
● performance?
● consistency guarantees?
→ Use Queryable State for debugging and monitoring only.
stream.keyBy("key")
      .flatMap(..)
      .keyBy("key")
      .process(..)
      .keyBy("key")
      .timeWindow(..)
● repeated keyBy on an unchanged key causes needless shuffles:
  DataStreamUtils#reinterpretAsKeyedStream
● Note: the stream needs to be partitioned exactly as Flink would partition it.

public void flatMap(Bar bar, Collector<Foo> out) throws Exception {
    // worst practice: a new factory and parser for every record
    MyParserFactory factory = MyParserFactory.newInstance();
    MyParser parser = factory.newParser();
    out.collect(parser.parse(bar));
}
● use RichFunction#open for initialization logic (see the sketch below)
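A hedged sketch of the open() fix, reusing the names from the slide:

public class ParseFunction extends RichFlatMapFunction<Bar, Foo> {

    private transient MyParser parser;

    @Override
    public void open(Configuration parameters) {
        // initialize once per task, not once per record
        parser = MyParserFactory.newInstance().newParser();
    }

    @Override
    public void flatMap(Bar bar, Collector<Foo> out) {
        out.collect(parser.parse(bar));
    }
}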
Don't use an iterative development process!
Analysis → Project Setup → Development (Testing) → Go Live → Maintenance
That's a pyramid! (the testing pyramid, from the base up)
● UDF unit tests
● operator/function tests with AbstractTestHarness
● integration tests with MiniClusterResource
● system tests
Testing Stateful and Timely Operators and Functions
Examples: https://github.com/knaufk/flink-testing-pyramid

@Test
public void testingStatefulFlatMapFunction() throws Exception {
    // push (timestamped) elements into the operator (and hence the user-defined function)
    testHarness.processElement(2L, 100L);

    // trigger event-time timers by advancing the event time of the operator with a watermark
    testHarness.processWatermark(100L);

    // trigger processing-time timers by advancing the processing time of the operator directly
    testHarness.setProcessingTime(100L);

    // retrieve the list of emitted records for assertions
    assertThat(testHarness.getOutput(), containsInExactlyThisOrder(3L));

    // retrieve records emitted to a specific side output for assertions (ProcessFunction only)
    assertThat(testHarness.getSideOutput(new OutputTag<>("invalidRecords")), hasSize(0));
}
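A hedged sketch of how such a testHarness could be created for a stateful FlatMapFunction (MyStatefulFlatMap is a placeholder; see the linked repository for the real setup):

StreamFlatMap<Long, Long> operator = new StreamFlatMap<>(new MyStatefulFlatMap());
KeyedOneInputStreamOperatorTestHarness<Long, Long, Long> testHarness =
    new KeyedOneInputStreamOperatorTestHarness<>(operator, value -> value, Types.LONG);
testHarness.open();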
Don't use an iterative development process!
Analysis → Project Setup → Development (Testing) → Go Live → Maintenance
Ignore Spiky Loads!
[figure: records/s over processing time, spiking far above the average; a second chart adds a sustained catch-up scenario]
Sources of load spikes:
● seasonal fluctuations
● checkpoint alignment
● watermark interval
● GC pauses
● timers
● catch-up scenarios (e.g. reprocessing a backlog)
● ...
Monitoring & Metrics

Worst practice: if at all, start monitoring only now (at go-live)
● don't miss the chance to learn about Flink's runtime during development
● not sure how to start → read [1]

Worst practice: use the Flink Web Interface as a monitoring system
● not the right tool for the job → MetricsReporters (e.g. Prometheus, InfluxDB, Datadog)
● too many metrics can bring JobManagers down

Worst practice: using latency markers in production
● high overhead (in particular with metrics.latency.granularity: subtask)
● measure event-time lag instead [2]

[1] https://flink.apache.org/news/2019/02/25/monitoring-best-practices.html
[2] https://flink.apache.org/2019/06/05/flink-network-stack.html
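A hedged sketch of wiring up a MetricsReporter in flink-conf.yaml (the Prometheus reporter of the Flink 1.9 era; the port is illustrative):

# expose metrics for scraping instead of polling the Web UI
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249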
Configuration

Worst practice: choosing RocksDBStatebackend by default
● FilesystemStatebackend is faster & easier to configure
● the State Processor API can be used to switch later

Worst practice: NFS/EBS/etc. as state.backend.rocksdb.localdir
● disk IO ultimately determines how fast you can read/write state

Worst practice: playing around with Slots and SlotSharingGroups (too early)
● usually not worth it
● leads to less chaining, additional serialization, more network shuffles and lower resource utilization
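A hedged sketch of choosing the backend explicitly (Flink 1.9-era API; the checkpoint URI is a placeholder):

// heap-based state backend; checkpoints and savepoints are persisted to a DFS
env.setStateBackend(new FsStateBackend("s3://bucket/checkpoints"));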
Don't use an iterative development process!
Analysis → Project Setup → Development (Testing) → Go Live → Maintenance
With a fast-paced project like Apache Flink: don't upgrade!
Questions?
www.ververica.com · @snntrable · konstantin@ververica.com