Lambda Architecture - Wikipedia

Lambda architecture is a data-processing framework that combines batch and stream-processing methods to manage large volumes of data, balancing latency, throughput, and fault-tolerance. It consists of three layers: a batch layer for accurate data processing, a speed layer for real-time data processing, and a serving layer for responding to queries. While effective for real-time analytics, it faces criticism for its complexity and the need for maintaining separate code bases for batch and streaming processes.

Lambda architecture

Lambda architecture is a data-processing architecture designed
to handle massive quantities of data by taking advantage of both
batch and stream-processing methods. This approach to architecture
attempts to balance latency, throughput, and fault-tolerance by
using batch processing to provide comprehensive and accurate views
of batch data, while simultaneously using real-time stream
processing to provide views of online data. The two view outputs
may be joined before presentation. The rise of lambda architecture is
correlated with the growth of big data, real-time analytics, and the
drive to mitigate the latencies of map-reduce.[1]

[Figure: Flow of data through the processing and serving layers of a
generic lambda architecture.]

Lambda architecture depends on a data model with an append-only,
immutable data source that serves as a system of record.[2]:32 It is intended for ingesting and processing
timestamped events that are appended to existing events rather than overwriting them. State is
determined from the natural time-based ordering of the data.
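
The data model described above can be illustrated with a minimal
sketch. The Event fields, the EventLog class, and the current_city
function are hypothetical names chosen for illustration; they do not
come from any particular lambda-architecture implementation. The
sketch assumes integer timestamps and shows that state is derived by
replaying events in timestamp order rather than by overwriting records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an event cannot be modified once created
class Event:
    timestamp: int  # assumed to be an integer such as epoch seconds
    user: str
    city: str


class EventLog:
    """Append-only log: events can be added, never updated or deleted."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def events(self):
        # Return an immutable snapshot so callers cannot alter the log.
        return tuple(self._events)


def current_city(log, user):
    """Derive a user's state by replaying their events in timestamp order."""
    city = None
    for event in sorted(log.events(), key=lambda e: e.timestamp):
        if event.user == user:
            city = event.city
    return city


log = EventLog()
log.append(Event(100, "alice", "Paris"))
log.append(Event(300, "alice", "Berlin"))
log.append(Event(200, "alice", "Rome"))  # arrives late, carries an earlier timestamp

print(current_city(log, "alice"))  # Berlin
```

Because the late-arriving event is ordered by its timestamp and not by
its arrival position, the derived state is still the most recent city.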

Contents
Overview
Batch layer
Speed layer
Serving layer
Optimizations
Lambda architecture in use
Criticism
See also
References
External links

Overview
Lambda architecture describes a system consisting of three layers: batch processing, speed (or real-time)
processing, and a serving layer for responding to queries.[3]:13 The processing layers ingest from an
immutable master copy of the entire data set. This paradigm was first described by Nathan Marz in a
blog post titled "How to beat the CAP theorem" in which he originally termed it the "batch/realtime
architecture".[4]

Batch layer
The batch layer precomputes results using a distributed processing system that can handle very large
quantities of data. The batch layer aims at perfect accuracy by being able to process all available data
when generating views. This means it can fix any errors by recomputing based on the complete data set,
then updating existing views. Output is typically stored in a read-only database, with updates completely
replacing existing precomputed views.[3]:18

Apache Hadoop is the leading batch-processing system used in most high-throughput architectures.[5]
New massively parallel, elastic, relational databases such as Snowflake, Redshift, Synapse, and BigQuery are
also used in this role.
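
As a rough sketch of the recompute-and-replace behaviour described in
this section, the following plain-Python example rebuilds a page-view
count from the complete data set on every run. The function
compute_batch_view and the event fields are assumptions made for
illustration; a real batch layer would run a comparable job on a
distributed system rather than in a single process.

```python
from collections import Counter


def compute_batch_view(master_dataset):
    """Recompute page-view counts from scratch over the full data set."""
    return dict(Counter(event["page"] for event in master_dataset))


# Hypothetical master data set: an append-only list of raw events.
master_dataset = [
    {"ts": 1, "page": "/home"},
    {"ts": 2, "page": "/docs"},
    {"ts": 3, "page": "/home"},
]
batch_view = compute_batch_view(master_dataset)

# New data is appended. The next batch run rebuilds the view from all
# data and replaces the previous view wholesale instead of patching it.
master_dataset.append({"ts": 4, "page": "/docs"})
batch_view = compute_batch_view(master_dataset)

print(batch_view)  # {'/home': 2, '/docs': 2}
```

Since each run starts from the raw data, a bug in the view logic can be
fixed by correcting the function and rerunning it.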

Speed layer

The speed layer processes data streams in real time and without the
requirements of fix-ups or completeness. This layer sacrifices
throughput as it aims to minimize latency by providing real-time
views into the most recent data. Essentially, the speed layer is
responsible for filling the "gap" caused by the batch layer's lag in
providing views based on the most recent data. This layer's views
may not be as accurate or complete as the ones eventually produced
by the batch layer, but they are available almost immediately after
data is received, and can be replaced when the batch layer's views for
the same data become available.[3]:203

[Figure: Diagram showing the flow of data through the processing and
serving layers of lambda architecture. Example named components are
shown.]

Stream-processing technologies typically used in this layer include
Apache Storm, SQLstream, Apache Samza, Apache Spark, and Azure
Stream Analytics. Output is typically stored on fast NoSQL databases.[6][7]
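
The incremental, replaceable nature of the speed layer's views can be
sketched as follows. The SpeedLayer class, its method names, and the
choice of fixed-size time buckets are all assumptions for illustration
and are not the API of any of the systems listed above. Counts are
updated per event, and buckets are discarded once the batch layer is
assumed to have covered them.

```python
from collections import defaultdict


class SpeedLayer:
    """Keeps incremental counts per time bucket for recent events only."""

    def __init__(self, bucket_size):
        self._bucket_size = bucket_size
        # Maps bucket index -> page -> view count.
        self._counts = defaultdict(lambda: defaultdict(int))

    def on_event(self, ts, page):
        # Update the view immediately; no recomputation over old data.
        self._counts[ts // self._bucket_size][page] += 1

    def view(self):
        totals = defaultdict(int)
        for pages in self._counts.values():
            for page, n in pages.items():
                totals[page] += n
        return dict(totals)

    def expire_before(self, bucket):
        # Drop buckets that a finished batch run is assumed to cover.
        for b in [b for b in self._counts if b < bucket]:
            del self._counts[b]


speed = SpeedLayer(bucket_size=60)
speed.on_event(10, "/home")
speed.on_event(70, "/home")
speed.on_event(75, "/docs")
before = speed.view()

speed.expire_before(1)  # a batch view now covers bucket 0
after = speed.view()

print(before)  # {'/home': 2, '/docs': 1}
print(after)   # {'/home': 1, '/docs': 1}
```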

Serving layer

Output from the batch and speed layers is stored in the serving
layer, which responds to ad-hoc queries by returning precomputed
views or building views from the processed data.

[Figure: Diagram showing a lambda architecture with a Druid data store.]

Examples of technologies used in the serving layer include Druid,
which provides a single cluster to handle output from both layers.[8]
Dedicated stores used in the serving layer include Apache
Cassandra, Apache HBase, Azure Cosmos DB, MongoDB, VoltDB or
Elasticsearch for speed-layer output, and Elephant DB
(https://github.com/nathanmarz/elephantdb), Apache Impala, SAP HANA or Apache Hive for batch-layer
output.[2]:45[6]
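
One way a serving layer might combine the two outputs at query time is
sketched below. The query function and the sample views are
hypothetical. The sketch assumes that the views hold additive counts
and that the real-time view contains only events newer than the last
batch run, so that nothing is counted twice.

```python
def query(page, batch_view, realtime_view):
    """Answer a query by merging the batch view with the real-time view."""
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


# Hypothetical precomputed views.
batch_view = {"/home": 120, "/docs": 45}    # covers data up to the last batch run
realtime_view = {"/home": 3, "/pricing": 1}  # covers data since the last batch run

print(query("/home", batch_view, realtime_view))     # 123
print(query("/pricing", batch_view, realtime_view))  # 1
print(query("/missing", batch_view, realtime_view))  # 0
```

Views that are not additive, such as distinct counts, would need a
different merge function than simple addition.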

Optimizations
To optimize the data set and improve query efficiency, various rollup and aggregation techniques are
executed on raw data,[8]:23 while estimation techniques are employed to further reduce computation
costs.[9] And while expensive full recomputation is required for fault tolerance, incremental computation
algorithms may be selectively added to increase efficiency, and techniques such as partial computation
and resource-usage optimizations can effectively help lower latency.[3]:93,287,293
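
A rollup of the kind mentioned above can be sketched in plain Python.
The rollup function, the event fields, and the hourly granularity are
assumptions for illustration. Raw events are collapsed into one row per
time bucket and dimension value, so queries scan fewer rows at the cost
of losing per-event detail.

```python
from collections import defaultdict


def rollup(events, granularity=3600):
    """Aggregate raw events into (bucket_start, campaign) rows."""
    buckets = defaultdict(lambda: {"count": 0, "revenue_cents": 0})
    for event in events:
        # Truncate the timestamp to the start of its bucket.
        bucket_start = event["ts"] - event["ts"] % granularity
        row = buckets[(bucket_start, event["campaign"])]
        row["count"] += 1
        row["revenue_cents"] += event["revenue_cents"]
    return dict(buckets)


# Hypothetical raw events; revenue is kept in integer cents.
raw_events = [
    {"ts": 10, "campaign": "a", "revenue_cents": 5},
    {"ts": 20, "campaign": "a", "revenue_cents": 7},
    {"ts": 30, "campaign": "b", "revenue_cents": 2},
    {"ts": 3700, "campaign": "a", "revenue_cents": 1},
]
rolled = rollup(raw_events)

print(len(raw_events), "raw rows ->", len(rolled), "rolled-up rows")
print(rolled[(0, "a")])  # {'count': 2, 'revenue_cents': 12}
```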

Lambda architecture in use
Metamarkets, which provides analytics for companies in the programmatic advertising space, employs a
version of the lambda architecture that uses Druid for storing and serving both the streamed and batch-
processed data.[8]:42

For running analytics on its advertising data warehouse, Yahoo has taken a similar approach, also using
Apache Storm, Apache Hadoop, and Druid.[10]:9,16

The Netflix Suro project has separate processing paths for data, but does not strictly follow lambda
architecture since the paths may be intended to serve different purposes and not necessarily to provide
the same type of views.[11] Nevertheless, the overall idea is to make selected real-time event data
available to queries with very low latency, while the entire data set is also processed via a batch pipeline.
The latter is intended for applications that are less sensitive to latency and require a map-reduce type of
processing.

Criticism
Criticism of lambda architecture has focused on its inherent complexity and its limiting influence. The
batch and streaming sides each require a different code base that must be maintained and kept in sync
so that processed data produces the same result from both paths. Yet attempting to abstract the code
bases into a single framework puts many of the specialized tools in the batch and real-time ecosystems
out of reach.[12]

In a technical discussion over the merits of employing a pure streaming approach, it was noted that
using a flexible streaming framework such as Apache Samza could provide some of the same benefits as
batch processing without the latency.[13] Such a streaming framework could allow for collecting and
processing arbitrarily large windows of data, accommodate blocking, and handle state.
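
To make the idea of windowed, stateful stream processing concrete, the
following sketch counts events per fixed-size (tumbling) window. It is
plain Python and does not use the Apache Samza API; the function name
and the assumption that events arrive ordered by timestamp are
illustrative only.

```python
def tumbling_window_counts(stream, window_size):
    """Yield (window_start, count) pairs; assumes timestamp-ordered input."""
    current_start = None  # state carried between events
    count = 0
    for ts, _payload in stream:
        start = ts - ts % window_size
        if current_start is None:
            current_start = start
        if start != current_start:
            # The previous window is complete; emit it and start a new one.
            yield current_start, count
            current_start, count = start, 0
        count += 1
    if current_start is not None:
        yield current_start, count  # flush the final window


stream = [(1, "a"), (4, "b"), (11, "c"), (12, "d"), (19, "e"), (31, "f")]
results = list(tumbling_window_counts(stream, window_size=10))

print(results)  # [(0, 2), (10, 3), (30, 1)]
```

Windows that receive no events, such as the one starting at 20 here,
produce no output in this sketch.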

See also
Event stream processing

References
1. Schuster, Werner. "Nathan Marz on Storm, Immutability in the Lambda Architecture, Clojure"
(http://www.infoq.com/interviews/marz-lambda-architecture). www.infoq.com. Interview with Nathan Marz, 6
April 2014.
2. Bijnens, Nathan. "A real-time architecture using Hadoop and Storm"
(http://lambda-architecture.net/architecture/2013-12-11-a-real-time-architecture-using-hadoop-and-storm-devoxx).
11 December 2013.
3. Marz, Nathan; Warren, James. Big Data: Principles and best practices of scalable realtime data
systems. Manning Publications, 2013.
4. Marz, Nathan. "How to beat the CAP theorem"
(http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html). 13 October 2011.
5. Kar, Saroj. "Hadoop Sector will Have Annual Growth of 58% for 2013-2020"
(http://cloudtimes.org/2014/05/28/hadoop-sector-will-have-annual-growth-of-58-for-2013-2020/). Archived
(https://archive.is/20140826020014/http://cloudtimes.org/2014/05/28/hadoop-sector-will-have-annual-growth-of-58-for-2013-2020/)
2014-08-26 at archive.today, 28 May 2014. Cloud Times.
6. Kinley, James. "The Lambda architecture: principles for architecting realtime Big Data systems"
(https://jameskinley.tumblr.com/post/37398560534/the-lambda-architecture-principles-for-architecting),
retrieved 26 August 2014.
7. Ferrera Bertran, Pere. "Lambda Architecture: A state-of-the-art"
(http://www.datasalt.com/2014/01/lambda-architecture-a-state-of-the-art/). 17 January 2014, Datasalt.
8. Yang, Fangjin, and Merlino, Gian. "Real-time Analytics with Open Source Technologies"
(https://speakerdeck.com/druidio/real-time-analytics-with-open-source-technologies-1). 30 July 2014.
9. Ray, Nelson. "The Art of Approximating Distributions: Histograms and Quantiles at Scale"
(https://metamarkets.com/2013/histograms/). 12 September 2013. Metamarkets.
10. Rao, Supreeth; Gupta, Sunil. "Interactive Analytics in Human Time"
(http://www.slideshare.net/Hadoop_Summit/interactive-analytics-in-human-time?next_slideshow=1). 17 June 2014.
11. Bae, Jae Hyeon; Yuan, Danny; Tonse, Sudhir. "Announcing Suro: Backbone of Netflix's Data
Pipeline" (http://techblog.netflix.com/2013/12/announcing-suro-backbone-of-netflixs.html), Netflix, 9
December 2013.
12. Kreps, Jay. "Questioning the Lambda Architecture"
(http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html). radar.oreilly.com. Oreilly.
Retrieved 15 August 2014.
13. Hacker News (https://news.ycombinator.com/item?id=7976785), retrieved 20 August 2014.

External links
Repository of Information on Lambda Architecture (http://lambda-architecture.net/)

Retrieved from "https://en.wikipedia.org/w/index.php?title=Lambda_architecture&oldid=997909581"

This page was last edited on 2 January 2021, at 20:57 (UTC).

Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site,
you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a
non-profit organization.
