Streaming Data

The document discusses data stream processing architectures, specifically Lambda, Kappa, and Delta architectures. Lambda architecture separates batch and real-time processing into three layers, while Kappa simplifies this by using a single stream processing pipeline for both. The choice of architecture depends on the specific needs of the application, considering factors like complexity and performance.

Uploaded by Muneeba Kaleem


Examples of data streams
• Data-driven marketing: sales trends and sales distribution.
• Monitoring and fault detection: roughly 60 million flight events per week.

Data Stream
A possibly unbounded sequence of data records. Records may be timestamped and geo-tagged.
Streaming Data System
Data streams flow into the system, which produces results. Its characteristics:
• Manages one record or a small time window at a time.
• Near-real-time.
• Independent computations.
• Non-interactive.
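
As a rough illustration of "one record or a small time window", the sketch below groups timestamped records into fixed-size (tumbling) windows and emits a count as soon as each window closes. The function name `tumbling_window_counts`, the `(timestamp, payload)` record shape, and the 10-second window are assumptions made for this example; the slides do not prescribe any of them. It also assumes records arrive in timestamp order.

```python
from typing import Iterable, Iterator, Tuple


def tumbling_window_counts(
    records: Iterable[Tuple[float, str]],
    window_seconds: float = 10.0,
) -> Iterator[Tuple[float, int]]:
    """Yield (window_start, count) for each non-empty window.

    Only the state of the current window is kept, so memory use does not
    grow with the length of the stream. Assumes in-order timestamps.
    """
    current_start = None
    count = 0
    for timestamp, _payload in records:
        # Start of the window this record falls into.
        start = (timestamp // window_seconds) * window_seconds
        if current_start is None:
            current_start = start
        elif start != current_start:
            # A record from a later window closes the current one.
            yield current_start, count
            current_start, count = start, 0
        count += 1
    if current_start is not None:
        # Flush the last window when the (finite) input ends.
        yield current_start, count


if __name__ == "__main__":
    events = [(1.0, "a"), (4.5, "b"), (12.0, "c"), (27.3, "d"), (29.9, "e")]
    for window_start, n in tumbling_window_counts(events):
        print(window_start, n)
```

Because the function is a generator, each result is available as soon as its window closes rather than after the whole input has been read.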






Static / Batch Processing: size determines time and space.

Streaming Processing: unbounded size, but finite time and space.
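
The contrast can be sketched with a mean computation. The batch version needs the whole data set in memory, so its cost grows with the input size; the streaming version keeps two numbers regardless of how many records have arrived. The names `batch_mean` and `RunningMean` are hypothetical and chosen only for this illustration.

```python
from typing import List


def batch_mean(values: List[float]) -> float:
    """Batch style: requires the full, finite data set up front."""
    return sum(values) / len(values)


class RunningMean:
    """Streaming style: constant space, one update per incoming record."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        # Incremental mean: shift the old mean toward the new value.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


if __name__ == "__main__":
    data = [2.0, 4.0, 6.0, 8.0]
    running = RunningMean()
    for x in data:
        print("running mean so far:", running.update(x))
    print("batch mean:", batch_mean(data))
```

The streaming version can report a result after every record, which is what makes it usable on an unbounded input.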


Lambda (λ) Architecture
[Figure: a timeline (axis: Time) ending at "Now", divided into Batch and Real-time segments.]
The Lambda architecture is structured into three layers:

1. Batch Layer:
• Responsible for managing the historical data and processing it in
batches.
• Runs complex algorithms over the entire data set to produce
comprehensive and accurate results.
• Output is stored in a batch view, which is a read-optimized view of the
data.
2. Speed Layer:
• Handles real-time data processing.
• Provides low-latency results for recent data.
• Output from the speed layer is combined with the batch layer's results
to generate a complete and up-to-date view of the data.
3. Serving Layer:
• Answers queries in real time.
• Merges the results from the batch and speed layers to provide a unified
view.
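
The three layers above can be sketched in a few lines. This is a toy model under stated assumptions, not a reference implementation: events are plain strings (for example page names), both views are in-memory counters, and the names `build_batch_view`, `SpeedLayer`, and `serve` are hypothetical. It also assumes that events held in the speed layer have not yet been included in the batch view.

```python
from collections import Counter
from typing import Iterable, Mapping


def build_batch_view(historical_events: Iterable[str]) -> Counter:
    """Batch layer: recompute a read-optimized view from all historical data."""
    return Counter(historical_events)


class SpeedLayer:
    """Speed layer: update a small view incrementally as each event arrives."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def ingest(self, event: str) -> None:
        self.view[event] += 1


def serve(key: str, batch_view: Mapping[str, int], speed_view: Mapping[str, int]) -> int:
    """Serving layer: merge both views to answer a query."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)


if __name__ == "__main__":
    batch_view = build_batch_view(["home", "cart", "home"])
    speed = SpeedLayer()
    for event in ["home", "checkout"]:
        speed.ingest(event)
    print(serve("home", batch_view, speed.view))
```

In a real deployment the speed view would be discarded or trimmed once a new batch run has absorbed the same events; that bookkeeping is omitted here.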

Kappa Architecture
In contrast to the Lambda architecture, which maintains separate batch and stream
processing paths, the Kappa architecture proposes using a single stream processing
pipeline for both real-time and batch data. The key idea is to treat batch processing
as a special case of stream processing.

Here are the main components of the Kappa architecture:

1. Stream Ingestion:
• All data, whether historical or real-time, is ingested through a unified
stream processing pipeline.
2. Stream Processing:
• A stream processing engine processes the data in real-time as it arrives.
• The same processing logic is applied to both historical and real-time
data.
3. Storage:
• Processed data is stored in a storage system that is optimized for
efficient querying and retrieval.
4. Query Layer:
• The query layer interacts with the storage system to serve queries and
provide access to the processed data.
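
The key idea, one processing path for both historical and live data, can be sketched as follows. The append-only list standing in for the event log, the string events, and the names `process` and `run_pipeline` are assumptions made for this illustration; a real system would use a durable log and a stream processing engine.

```python
from typing import Dict, Iterable


def process(state: Dict[str, int], event: str) -> None:
    """The single piece of processing logic, used for every event."""
    state[event] = state.get(event, 0) + 1


def run_pipeline(log: Iterable[str]) -> Dict[str, int]:
    """Replay a log from the beginning through the same logic.

    "Batch" processing is just this replay over historical events.
    """
    state: Dict[str, int] = {}
    for event in log:
        process(state, event)
    return state


if __name__ == "__main__":
    log = ["home", "cart", "home"]       # historical events in the log
    state = run_pipeline(log)            # historical data: replay the log

    for event in ["checkout", "home"]:   # live data: same process() call
        log.append(event)
        process(state, event)

    # Reprocessing the full log reproduces the live state.
    print(state == run_pipeline(log))
```

Since there is only one code path, changing the logic means editing `process` and replaying the log, rather than keeping separate batch and streaming implementations in sync.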

While the Kappa architecture offers simplicity and elegance, it may not be suitable
for all use cases. For example, if batch processing requires complex algorithms or if
there is a need for explicit separation of concerns between batch and real-time
processing, the Lambda architecture or a hybrid approach might be more
appropriate.

Ultimately, the choice between the Kappa and Lambda architectures depends on the
specific requirements of a given big data application and the trade-offs that the
architecture introduces in terms of complexity, maintainability, and performance.

Delta Architecture
Size

Frequency
• Periodic: evenings, weekends, etc.
• Sporadic: major events

Average = 6,000 tweets/second
Record > 144,000 tweets/second
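
A quick calculation shows how far the record sits from the average. The two rates are taken from the figures above; the constant and function names are hypothetical. Because the record is given only as "more than 144,000", the computed ratio is a lower bound.

```python
AVERAGE_TWEETS_PER_SECOND = 6_000
RECORD_TWEETS_PER_SECOND = 144_000  # stated as "more than", so a lower bound
SECONDS_PER_DAY = 24 * 60 * 60


def peak_to_average_ratio(peak: float, average: float) -> float:
    """How many times larger the peak rate is than the average rate."""
    return peak / average


def events_per_day(rate_per_second: int) -> int:
    """Total events in one day at a constant per-second rate."""
    return rate_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    ratio = peak_to_average_ratio(RECORD_TWEETS_PER_SECOND, AVERAGE_TWEETS_PER_SECOND)
    print(f"peak is at least {ratio:.0f}x the average")
    print(f"{events_per_day(AVERAGE_TWEETS_PER_SECOND):,} tweets/day at the average rate")
```

A system sized only for the average rate would therefore face at least a 24-fold overload at the record peak.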



Many challenges! A streaming data system must:
• Manage one record or a small time window at a time.
• Deliver results in near-real-time.
• Run independent computations.
• Operate non-interactively.
