Lecture #7.1 - Introducing Streaming Data

This document introduces streaming data systems. It defines real-time systems and distinguishes them from streaming data systems. Key terminology for streaming data systems is explained, including event time vs processing time, windows of data, and message delivery semantics. Finally, it gives example use cases for streaming data systems, such as clickstream analytics, high-frequency trading, predictive maintenance, and fraud detection.

Uploaded by Sumit Khaitan

MODERN DATA ARCHITECTURES FOR BIG DATA II

INTRODUCING STREAMING DATA


Agenda

● What’s a real-time system?
○ Soft & Near real-time
● Real-time vs Streaming systems
○ Streams Data Systems
● Streams Data Systems terminology
○ Event vs Processing/Stream time
○ Windows of data
○ Message delivery semantics
● Use cases
1. WHAT’S A REAL-TIME SYSTEM?

What’s a real-time system?

● Real-time systems and computing have been around for decades.

● Their recent popularity, however, has caused ambiguity and debate about what the term means.

● The following classification helps clarify it:

○ Hard real-time
○ Soft real-time
○ Near real-time

What’s a real-time system?

Stream processing & real-time analytics


Soft & Near real-time

● Latency is measured in milliseconds, seconds or even minutes.

● Tolerance for delay ranges from low to high:

○ No system failure, no life at risk
○ What a relief!

● The distinction between soft and near real-time is blurry, so both are often simplified to just real-time.

Soft & Near real-time

● A generic real-time system with consumers:

Tightly coupled; hard to make it scalable!

2. REAL-TIME VS STREAMING SYSTEMS

Streams Data System

● A non-hard real-time system that makes its data available at the moment a client application needs it.

● Clients may not be consuming the data in real time due to network delays, application design, or because the client applications aren’t even running.
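
A minimal sketch of this decoupling, assuming an in-memory append-only log in which each client keeps track of its own read position. The names `StreamLog`, `append` and `read_from` are hypothetical and do not come from the lecture or from any particular streaming product.

```python
class StreamLog:
    """Append-only in-memory log: producers write, clients read at their own pace."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Store an event and return its offset (its position in the log)."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset, max_events=10):
        """Return up to max_events events starting at offset, plus the next offset."""
        batch = self._events[offset:offset + max_events]
        return batch, offset + len(batch)


log = StreamLog()
for reading in ("t=20.1", "t=20.4", "t=20.9"):
    log.append(reading)  # the producer writes while no client is running

# A client that starts later still gets every event, because the log retained them.
batch, next_offset = log.read_from(0)

# Events that arrive afterwards are picked up from where the client left off.
log.append("t=21.3")
late_batch, next_offset = log.read_from(next_offset)
```

Because the client asks for data by offset, the producer never has to wait for it, which is what removes the tight coupling of the generic real-time system.
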
Streams Data System

● A data stream is a continuous flow of data that is usually modeled as a sequence of elements and, theoretically, is unbounded/infinite in size.

● Streaming data systems process huge amounts of data, and processing it may require algorithms that rely on approximations or aggregations:

○ Sampling the stream
○ Filtering the stream to keep relevant elements
○ Estimating the number of distinct elements
○ ...
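
As one illustration of "sampling the stream", the sketch below uses reservoir sampling, a standard technique for keeping a fixed-size uniform sample of a stream whose length is not known in advance. The lecture does not name a specific algorithm, so this choice, the function name `reservoir_sample` and its parameters are assumptions made for the example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)       # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item     # keep the new item with probability k / (i + 1)
    return reservoir


# Only k items are held in memory, however long the stream is.
sample = reservoir_sample(range(100_000), k=5, rng=random.Random(42))

# A stream shorter than k is returned in full.
short = reservoir_sample(iter("abc"), k=5)
```
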
Streams Data System

● Streaming data architectural blueprint we’ll use as a reference during this course:

Already covered in the “Introduction to Big Data Architecture” course

3. STREAMS DATA SYSTEMS TERMINOLOGY

Event vs Processing/Stream time

● The elements of a data stream are usually called events.

● Two notions of time apply to events:

○ Event time: the time at which the event occurs.
○ Processing/Stream time: the time at which an event is processed/enters the streaming system.

Event vs Processing/Stream time

● Ideally, event time == processing/stream time.

● Processing-time lag: how much delay is observed between when the events for a given time occurred and when they were processed.

● Event-time skew: how far behind the ideal (in event time) the pipeline currently is.

● They’re just two ways of looking at the same thing; processing-time lag and event-time skew at any given point in time are identical.
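
The two measures can be sketched with a few made-up observations. The timestamps below are illustrative values in seconds since the start of the stream, and both function names are hypothetical.

```python
# (event_time, processing_time) pairs, in seconds since the start of the stream.
# These values are invented for illustration only.
observations = [(0, 1), (10, 12), (20, 25), (30, 31)]


def processing_time_lag(event_time, processing_time):
    """Delay between when an event occurred and when it was processed."""
    return processing_time - event_time


def event_time_skew(event_time, processing_time):
    """How far behind the ideal (event time == processing time) the pipeline is."""
    ideal_event_time = processing_time  # with no delay the pipeline would be here
    return ideal_event_time - event_time


lags = [processing_time_lag(e, p) for e, p in observations]
skews = [event_time_skew(e, p) for e, p in observations]
```

Both lists come out the same, which is the point of the last bullet: lag reads the gap along the processing-time axis and skew reads it along the event-time axis.
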
Windows of data

● Because streams of data are never-ending and unbounded in size, the data cannot all be kept in memory.

● Computation/processing is possible by using windows of data: a bounded amount of data that can be processed at once.
Windows of data

● Attributes common to windowing techniques:

○ Trigger policy: signals that it’s time to process all the data in the window.
○ Eviction policy: decides whether a data element should be evicted from the window.

● Common windowing strategies are based on:

○ Number of events: tumbling windows
○ Time: fixed, sliding and session windows
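
A minimal sketch of a count-based tumbling window, showing where the trigger and eviction policies fit. The class name `CountTumblingWindow` and the choice of summing each window are assumptions made for the example.

```python
class CountTumblingWindow:
    """Count-based tumbling window: emits every `size` events, then starts empty."""

    def __init__(self, size):
        self.size = size
        self._buffer = []

    def add(self, event):
        """Add an event; return the full window when the trigger fires, else None."""
        self._buffer.append(event)
        if len(self._buffer) == self.size:           # trigger policy: the window is full
            window, self._buffer = self._buffer, []  # eviction policy: evict everything
            return window
        return None


window = CountTumblingWindow(size=3)
totals = []
for value in [4, 8, 15, 16, 23, 42, 7]:
    full = window.add(value)
    if full is not None:
        totals.append(sum(full))  # process all the data that was in the window
```

The last value (7) stays buffered because its window has not filled yet. Evicting the whole buffer on each trigger is what makes the windows non-overlapping.
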
Windows of data

● Tumbling windows:

● Time based windowing strategies:
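
The time-based strategies can be sketched as window-assignment functions. This is an illustration under stated assumptions, not the lecture's own definition: timestamps are integer seconds, the stream starts at 0, and the function names are hypothetical.

```python
def fixed_window(ts, size):
    """Start and end of the single fixed window that contains timestamp ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts, size, slide):
    """All sliding windows containing ts, where a new window starts every `slide` seconds."""
    last_start = ts - ts % slide
    starts = range(last_start, ts - size, -slide)  # keep starts with start + size > ts
    return [(s, s + size) for s in reversed(starts) if s >= 0]


def session_windows(timestamps, gap):
    """Group sorted timestamps into sessions split by more than `gap` seconds of inactivity."""
    sessions = []
    for ts in timestamps:
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])    # inactivity exceeded the gap: start a new session
    return sessions


fixed = fixed_window(25, size=10)
sliding = sliding_windows(25, size=20, slide=10)
sessions = session_windows([1, 2, 3, 10, 11, 30], gap=5)
```

An event belongs to exactly one fixed window but to several sliding windows whenever the slide is smaller than the window size. Session windows have no fixed length, since they are defined by gaps in activity.
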


Message delivery semantics

● Delivery semantics are defined within the context of producers, brokers, consumers and messages, which will be studied later.
Message delivery semantics

● Three common semantic guarantees when looking at message queuing:

○ At most once: a message may get lost, but there won’t be duplicates at the consumer side.
○ At least once: a message will never get lost, but there might be duplicates at the consumer side.
○ Exactly once: a message is never lost and is read by a consumer once and only once.
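
The difference between the guarantees can be sketched with a toy simulation in which the consumer crashes once while handling one message. Everything here is a simplifying assumption: `deliver`, `deduplicate` and the `fail_on` parameter are hypothetical names, and de-duplicating by message id is only one common way to get an exactly-once effect on top of at-least-once delivery.

```python
def deliver(messages, fail_on, ack_before_processing):
    """Deliver (msg_id, payload) pairs to a consumer that crashes once, on `fail_on`.

    ack_before_processing=True  -> at most once (the crashed message is never redelivered)
    ack_before_processing=False -> at least once (the unacknowledged message is redelivered)
    """
    received = []
    for msg_id, payload in messages:
        if ack_before_processing:
            # The ack is sent first, so a crash before processing loses the message.
            if msg_id != fail_on:
                received.append((msg_id, payload))
        else:
            # The message is processed first; a crash before the ack causes a redelivery.
            received.append((msg_id, payload))
            if msg_id == fail_on:
                received.append((msg_id, payload))
    return received


def deduplicate(received):
    """Drop repeated message ids so that each message takes effect only once."""
    seen, unique = set(), []
    for msg_id, payload in received:
        if msg_id not in seen:
            seen.add(msg_id)
            unique.append((msg_id, payload))
    return unique


messages = [(1, "a"), (2, "b"), (3, "c")]
at_most_once = deliver(messages, fail_on=2, ack_before_processing=True)
at_least_once = deliver(messages, fail_on=2, ack_before_processing=False)
effectively_once = deduplicate(at_least_once)
```
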
4. USE CASES

Clickstream/site Analytics

● Stream of visitors (identified with cookies)

● Applications:

○ Finding hot links
○ Identifying frequent customers
○ Estimating the probability of a click on new content
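
A small sketch of the first two applications, assuming each click arrives as a (cookie, URL) pair. The sample clicks are invented for illustration.

```python
from collections import Counter

# (visitor cookie, clicked URL) pairs; the data is made up for illustration.
clicks = [
    ("cookie-1", "/home"),
    ("cookie-2", "/home"),
    ("cookie-1", "/pricing"),
    ("cookie-3", "/home"),
    ("cookie-1", "/blog"),
    ("cookie-2", "/pricing"),
]

# Hot links: the most clicked URLs in this batch (or window) of the stream.
hot_links = Counter(url for _, url in clicks).most_common(2)

# Frequent customers: how many clicks each cookie produced.
clicks_per_visitor = Counter(cookie for cookie, _ in clicks)
```

On a real stream these counts would be computed per window of data, since the full clickstream cannot be kept in memory.
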
High-frequency trading

● Stream of stock trades

● Applications:

○ Exploiting trading opportunities that may open up for only milliseconds or seconds.
○ Taking advantage of small price imbalances to generate sizable profits.
Predictive maintenance

● Stream of measurements (whatever can be measured)

● Applications:

○ Condition-based maintenance.
○ Avoiding unnecessary maintenance and use of spare parts.
○ Guaranteeing levels of availability.
Fraud detection

● Stream of actions (whatever can be done)

● Applications:

○ Credit card fraud
○ Stock trading fraud
○ Video game cheaters
○ Cybersecurity risks
