Data stream applications: sales trends, sales distribution, data-driven marketing, and monitoring and fault detection (e.g., ~60M flight events weekly).
Data Stream
A possibly unbounded sequence of data records. Records may be timestamped and geo-tagged.
Streaming Data System
Data Streams → Streaming Data System → Results
A streaming data system:
• Manages one record or a small time window at a time
• Produces results in near-real-time
• Performs independent computations
• Is non-interactive
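The characteristics above can be illustrated with a tumbling-window aggregate. This is a minimal sketch under stated assumptions: the `Record` type, its `timestamp` and `value` fields, and the `tumbling_window_sums` function are hypothetical, and records are assumed to arrive in timestamp order. Each window is computed independently, only one window's state is held at a time, and results are emitted as soon as a window closes, with no user query involved.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    timestamp: float  # hypothetical field: event time in seconds
    value: float      # hypothetical field: the measured quantity


def tumbling_window_sums(
    stream: Iterable[Record], window_seconds: float
) -> Iterator[tuple[float, float]]:
    """Yield (window_start, sum_of_values) for each tumbling window.

    Assumes records arrive in timestamp order. Only the current window's
    start and running sum are kept, so memory does not grow with the stream.
    """
    current_start: Optional[float] = None
    total = 0.0
    for rec in stream:
        start = rec.timestamp - (rec.timestamp % window_seconds)
        if current_start is None:
            current_start = start
        elif start != current_start:
            # Previous window is complete: emit it without waiting for a query.
            yield current_start, total
            current_start, total = start, 0.0
        total += rec.value
    if current_start is not None:
        # Flush the last window when the (finite) input ends.
        yield current_start, total
```

For records at t = 0, 1, 5, 6, 11 with values 1 to 5 and 5-second windows, the generator yields (0.0, 3.0), (5.0, 7.0), and (10.0, 5.0). Out-of-order or late records would need extra handling, which this sketch deliberately omits.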
Static / Batch: size determines processing time and space.
Streaming: unbounded size, but finite processing time and space.
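The batch vs. streaming contrast can be seen in something as simple as a mean. In this sketch (function and class names are illustrative, not from the slides), the batch version needs the whole data set in memory, while the streaming version keeps only a count and the current mean, so its space stays constant however long the stream runs.

```python
def batch_mean(data: list[float]) -> float:
    # Static/batch: the whole data set is materialized, so time and
    # space both grow with its size.
    return sum(data) / len(data)


class StreamingMean:
    """Running mean with constant space: only a count and a mean are stored."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, x: float) -> float:
        # Incremental update: mean_n = mean_{n-1} + (x - mean_{n-1}) / n
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean
```

Feeding 2, 4, 6, 8 into `StreamingMean.update` returns 2.0, 3.0, 4.0, 5.0, and the final value matches `batch_mean([2, 4, 6, 8])`.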
Lambda Architecture
[Figure (λ): timeline up to Now, divided into repeated Batch and Real-time segments along the Time axis.]
The Lambda architecture is structured into three layers:
1. Batch Layer:
• Responsible for managing the historical data and processing it in
batches.
• Runs complex algorithms over the entire data set to provide
comprehensive and accurate results.
• Output is stored in a batch view, which is a read-optimized view of the
data.
2. Speed Layer:
• Handles real-time data processing.
• Provides low-latency results for recent data.
• Output from the speed layer is combined with the batch layer's results
to generate a complete and up-to-date view of the data.
3. Serving Layer:
• Serves results to queries in real time.
• Merges the results from the batch and speed layers to provide a unified
view.
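The three layers can be sketched with per-key event counts. This is an illustrative model, not a reference implementation: the function and class names, and the idea of counting page views, are assumptions. The batch layer recomputes its view from the full historical data set, the speed layer counts only events that arrived since the last batch run, and the serving layer merges both at query time.

```python
from collections import Counter
from collections.abc import Iterable


def batch_layer(master_dataset: Iterable[str]) -> Counter:
    # Batch layer: recompute the view from scratch over all historical events.
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer: incrementally counts events newer than the batch view."""

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()

    def on_event(self, key: str) -> None:
        self.realtime_view[key] += 1

    def reset(self) -> None:
        # Called once a new batch run has absorbed these events.
        self.realtime_view.clear()


def serving_layer(batch_view: Counter, realtime_view: Counter, key: str) -> int:
    # Serving layer: batch result for older data + speed result for recent data.
    return batch_view[key] + realtime_view[key]
```

With historical events `["home", "cart", "home"]` and one live `"home"` event, a query for `"home"` returns 3. Keeping the two views separate is what lets the batch layer stay accurate while the speed layer stays fast; the cost is maintaining the same logic twice.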
Kappa Architecture
In contrast to the Lambda architecture, which maintains separate batch and stream
processing paths, the Kappa architecture proposes using a single stream processing
pipeline for both real-time and batch data. The key idea is to treat batch processing
as a special case of stream processing.
Here are the main components of the Kappa architecture:
1. Stream Ingestion:
• All data, whether historical or real-time, is ingested through a unified
stream processing pipeline.
2. Stream Processing:
• A stream processing engine processes the data in real time as it arrives.
• The same processing logic is applied to both historical and real-time
data.
3. Storage:
• Processed data is stored in a storage system that is optimized for
efficient querying and retrieval.
4. Query Layer:
• The query layer interacts with the storage system to serve queries and
provide access to the processed data.
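The Kappa components can be sketched as one pipeline function applied to an event log. This is a hedged illustration: the `(key, amount)` event shape, `run_pipeline`, and `query` are hypothetical, and a plain dict stands in for the storage system. Historical and live events pass through the same `process` function; reprocessing with new logic means replaying the log from the beginning into a fresh store.

```python
from collections.abc import Callable, Iterable

Event = tuple[str, int]  # hypothetical event shape: (key, amount)


def run_pipeline(
    log: Iterable[Event],
    process: Callable[[Event], tuple[str, int]],
    store: dict[str, int],
) -> None:
    # Stream processing: every event, historical or live, goes through
    # the same `process` function, one record at a time.
    for event in log:
        key, delta = process(event)
        # Storage: results are kept in a key-value store for fast lookup.
        store[key] = store.get(key, 0) + delta


def query(store: dict[str, int], key: str) -> int:
    # Query layer: reads from the store, never from the raw stream.
    return store.get(key, 0)
```

For example, running `run_pipeline(log, lambda e: e, store)` over `[("a", 5), ("b", 2), ("a", 1)]` makes `query(store, "a")` return 6; to change the logic, replay the same log with a new `process` into a new store and switch queries over once it catches up.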
While the Kappa architecture is simpler, with one processing pipeline instead of two, it may not be suitable
for all use cases. For example, if batch processing requires complex algorithms or if
there is a need for explicit separation of concerns between batch and real-time
processing, the Lambda architecture or a hybrid approach might be more
appropriate.
Ultimately, the choice between the Kappa and Lambda architectures depends on the
specific requirements of a given big data application and the trade-offs that the
architecture introduces in terms of complexity, maintainability, and performance.
Delta Architecture
Stream Size and Frequency
Stream volume is not constant: peaks are periodic (evenings, weekends, etc.) or sporadic (major events).
Example: tweets average about 6,000 per second, with a record of more than 144,000 per second.
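The tweet rates above (average 6,000/s, record > 144,000/s) show why provisioning for the average is risky. The arithmetic below uses 144,000/s as a lower bound for the record; the `backlog_after_burst` helper and the 10-second burst duration are hypothetical, chosen only to illustrate the scale of the problem.

```python
AVERAGE_RATE = 6_000   # tweets/second (average, from the slide)
RECORD_RATE = 144_000  # tweets/second (record; the slide says "> 144,000")

peak_to_average = RECORD_RATE / AVERAGE_RATE  # 24.0


def backlog_after_burst(capacity: int, arrival_rate: int, seconds: int) -> int:
    """Records left queued after arrivals exceed capacity for `seconds` seconds."""
    return max(0, arrival_rate - capacity) * seconds
```

A system sized for the average rate would fall behind by (144,000 - 6,000) × 10 = 1,380,000 tweets during an assumed 10-second record burst, so `backlog_after_burst(6_000, 144_000, 10)` returns 1380000.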
Many challenges! A streaming system must:
• Manage one record or a small time window
• Respond in near-real-time
• Perform independent computations
• Run non-interactively