
BIG DATA Master IT Lecture 8 Course code: M25331

Batch Analytics

Dr. Ali Haider Shamsan

1
Lecture Outlines
• Batch Analysis frameworks
• Hadoop and MapReduce
• Pig
• Apache Oozie
• Apache Spark
• Apache Solr

Review

• Data Acquisition

Keywords

Big Data, Database, Batch Analytics

2
Review
NoSQL

• Data Acquisition Considerations


• Publish - Subscribe Messaging Frameworks
• Big Data Collection Systems
• Messaging Queues
• Custom Connectors

3
Batch Analysis frameworks
• Batch analytics is a type of analytics that involves processing large amounts of
data in batches, typically over a period of time.
• Batch analytics typically involves the use of ETL (Extract-Transform-Load)
processes to
• extract data from various sources,
• transform the data into a structure suitable for analytics,
• and load it into a data warehouse or other data repository.
• This is then followed by the use of analytical tools to
• explore the data,
• discover patterns, and
• generate insights.
• Batch analytics can be used to conduct predictive and descriptive analytics, as
well as more complex analytics such as machine learning and deep learning.
4
Batch Analysis frameworks

• Hadoop-MapReduce
• Pig
• Spark
• Solr.

5
Batch Analysis frameworks
Hadoop and
MapReduce
• Apache Hadoop is an open source framework for distributed batch
processing of big data.
• Similarly, MapReduce is a parallel programming model suitable for the
analysis of big data.
• MapReduce algorithms allow large-scale computations to be
automatically parallelized across a large cluster of servers.

6
Batch Analysis frameworks
MapReduce
Programming Model
• MapReduce is a parallel data processing model for the processing and analysis
of massive scale data.
• MapReduce model has two phases: Map and Reduce.
• The input data to the map and reduce phases is in the form of key-value
pairs.
• Run-time systems for MapReduce are typically large clusters built of
commodity hardware.
• The MapReduce run-time systems take care of tasks such as partitioning the
data, scheduling jobs and communication between nodes in the cluster.
• This makes it easier for programmers to analyze massive scale data
without worrying about tasks such as data partitioning and scheduling. 7
Batch Analysis frameworks
MapReduce
Programming Model
• In the Map phase, data is read from a distributed file system, partitioned
among a set of computing nodes in the cluster, and sent to the nodes as a
set of key-value pairs.
• The Map tasks process the input records independently of each other
and produce intermediate results as key-value pairs.
• When all the Map tasks are completed, the Reduce phase begins in
which the intermediate data with the same key is aggregated.
• An optional Combine task can be used to aggregate the mapper's intermediate
output for the same key before it is transferred to the Reduce task
(a minimal word-count sketch of these phases follows below).
8
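To make the Map and Reduce phases concrete, here is a minimal word-count sketch against the standard Hadoop MapReduce Java API; the class names, file layout, and the choice of word count as the example are illustrative, not part of the lecture material.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map phase: each input record is processed independently and emits
// intermediate (word, 1) key-value pairs.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // intermediate key-value pair
        }
    }
}

// Reduce phase: all intermediate values with the same key are aggregated.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```

Because summing partial counts per key is associative, the same reducer class could also be registered as the optional Combiner mentioned above.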
Batch Analysis frameworks

Hadoop YARN

• Hadoop YARN is the next-generation architecture of Hadoop (version 2.x).
• In the YARN architecture, the original processing engine of Hadoop
(MapReduce) has been separated from the resource management
component (which is now part of YARN).
• This makes YARN effectively an operating system for Hadoop that
supports different processing engines on a Hadoop cluster such as
• MapReduce for batch processing, Apache Tez for interactive queries, Apache
Storm for stream processing.

9
Batch Analysis frameworks

Hadoop YARN

Figure 8.1
10
Batch Analysis frameworks

Hadoop YARN

The key components of YARN are described as follows:


1. Resource Manager (RM):
• RM manages the global assignment of compute resources to
applications.
• RM consists of two main services:
• Scheduler: Scheduler is a pluggable service that manages and enforces
the resource scheduling policy in the cluster.
• Applications Manager (AsM):
• AsM manages the running Application Masters in the cluster.
• AsM is responsible for starting application masters and for monitoring
and restarting them on different nodes in case of failures.
11
Batch Analysis frameworks

Hadoop YARN
The key components of YARN are described as follows:
2. Application Master (AM):
• An Application Master (AM) manages the application’s life cycle.
• AM is responsible for negotiating resources from the RM and working with the
NMs to execute and monitor the tasks.
3. Node Manager (NM):
• An NM runs on each node of the cluster and manages the user processes on that machine.
4. Containers:
• Container is a bundle of resources allocated by RM (memory, CPU and network).
• A container is a conceptual entity that grants an application the privilege to use a
certain amount of resources on a given machine to run a task.
• Each node has multiple containers based on the resource allocations made by
the RM. 12
Batch Analysis frameworks

Hadoop YARN

Figure 8.2

13
Batch Analysis frameworks

Hadoop YARN
• Figure 8.2 shows a YARN cluster with a Resource Manager node and
three Node Manager nodes.
• There are as many Application Masters running as there are applications
(jobs).
• Each application’s AM manages the application tasks such as
• starting, monitoring and restarting tasks in case of failures.
• Each application has multiple tasks.
• Each task runs in a separate container.
• Each container in YARN can be used for both map and reduce tasks.
• The resource allocation model of YARN is more flexible with the
introduction of resource containers which improve cluster utilization.
14
Batch Analysis frameworks

Hadoop YARN

Figure 8.3

15
Batch Analysis frameworks

Hadoop YARN
• To better understand the YARN job execution workflow, let us analyze the interactions
between the main components of YARN.
• Figure 8.3 shows the interactions between a Client and Resource Manager.
• Job execution begins with the submission of a new application request by the client to
the RM.
• The RM then responds with a unique application ID and information about cluster
resource capabilities that the client will need in requesting resources for running the
application’s AM.
• Using the information received from the RM, the client constructs and submits an
Application Submission Context which contains information such as scheduler queue,
priority and user information.
• The Application Submission Context also contains a Container Launch Context which
contains the application’s jar, job files, security tokens and any resource requirements.
• The client can query the RM for application reports.
• The client can also "force kill" an application by sending a request to the RM (these client-RM interactions are sketched below).
16
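As a hedged illustration of this client-side sequence (obtain an application ID, build the submission and container launch contexts, submit, then query or kill), the sketch below uses the YarnClient API shipped with Hadoop 2.x/3.x; the application name, queue, and resource sizes are illustrative assumptions.

```java
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class YarnSubmitSketch {
    public static void main(String[] args) throws Exception {
        // Connect to the Resource Manager.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Step 1: request a new application ID and cluster capability info from the RM.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        ApplicationId appId = appContext.getApplicationId();

        // Step 2: build the Application Submission Context, including the
        // Container Launch Context for the Application Master.
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        appContext.setAMContainerSpec(amContainer);
        appContext.setApplicationName("example-app");       // illustrative name
        appContext.setQueue("default");                      // scheduler queue
        appContext.setResource(Resource.newInstance(1024, 1)); // AM memory (MB) and vCores

        // Step 3: submit the Application Submission Context to the RM.
        yarnClient.submitApplication(appContext);

        // The client can later query the RM for an application report,
        // or force-kill the application.
        System.out.println(yarnClient.getApplicationReport(appId).getYarnApplicationState());
        // yarnClient.killApplication(appId);
    }
}
```

A real client would also populate the ContainerLaunchContext with the application's jar, environment, and AM launch command before submitting.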
Batch Analysis frameworks

Hadoop YARN

Figure 8.4

17
Batch Analysis frameworks

Hadoop YARN
• Figure 8.4 shows the interactions between the Resource Manager and the
Application Master.
• Upon receiving an application submission context from a client, the RM finds
an available container meeting the resource requirements for running the AM
for the application.
• On finding a suitable container, the RM contacts the NM for the container to
start the AM process on its node.
• When the AM is launched it registers itself with the RM.
• The registration process consists of handshaking that conveys information such
as the port that the AM will be listening on, the tracking URL for monitoring the
application’s status and progress, etc.
• The registration response from the RM contains information for the AM that is
used in calculating and requesting any resource requests for the application’s
individual tasks (such as minimum and maximum resource capabilities for the
cluster). 18
Batch Analysis frameworks

Hadoop YARN

• The AM relays heartbeat and progress information to the RM.


• The AM sends resource allocation requests to the RM; each request contains a list
of requested containers and may also list containers released by the AM.
• Upon receiving the allocation request, the scheduler component of the
RM computes a list of containers that satisfy the request and sends back
an allocation response.
• Upon receiving the resource list, the AM contacts the associated NMs
for starting the containers.
• When the job finishes, the AM sends a Finish Application message to the
RM (this AM-RM exchange is sketched below).
19
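A minimal sketch of this exchange from the AM's side, assuming the AMRMClient helper API that ships with YARN; the host name, tracking URL, and resource sizes are placeholders.

```java
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmSketch {
    public static void main(String[] args) throws Exception {
        // Register the Application Master with the Resource Manager
        // (host, port and tracking URL are part of the registration handshake).
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();
        rmClient.registerApplicationMaster("am-host", 0, "http://am-host/tracking");

        // Ask the RM for one task container (memory/cores are illustrative).
        Resource capability = Resource.newInstance(512, 1);
        rmClient.addContainerRequest(
                new ContainerRequest(capability, null, null, Priority.newInstance(0)));

        // Heartbeat/allocate call: the RM answers with granted containers.
        AllocateResponse response = rmClient.allocate(0.1f);
        for (Container container : response.getAllocatedContainers()) {
            // The AM would now contact the container's Node Manager to launch a task.
            System.out.println("Granted container on " + container.getNodeId());
        }

        // When the job finishes, tell the RM the application is done.
        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", null);
    }
}
```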
Batch Analysis frameworks

Hadoop YARN

Figure 8.5
20
Batch Analysis frameworks

Hadoop YARN

• Figure 8.5 shows the interactions between an Application Master and the Node Manager.
• Based on the resource list received from the RM, the AM requests the
hosting NM for each container to start the container.
• The AM can request and receive a container status report from the Node
Manager.
• Figure 8.6 shows the MapReduce job execution within a YARN cluster.

21
Batch Analysis frameworks

Hadoop YARN

Figure 8.6

22
Hadoop YARN

Hadoop Schedulers

• The scheduler is a pluggable component in Hadoop that allows it to support different scheduling algorithms.
• The pluggable scheduler framework provides the flexibility to support a
variety of workloads with varying priority and performance constraints.
• The Hadoop scheduling algorithms are described as follows:
FIFO
• The FIFO scheduler maintains a work queue in which the jobs are queued.
The scheduler pulls jobs in a first-in, first-out manner (oldest job first) for
scheduling.
• There is no concept of priority or job size in the FIFO scheduler.
23
Hadoop YARN

Hadoop Schedulers
Fair Scheduler
• The Fair Scheduler was originally developed by Facebook.
• Facebook uses Hadoop to manage the massive content and log data it
accumulates every day.
• The need for the Fair Scheduler arose when Facebook wanted to share its data
warehousing infrastructure among multiple users.
• The Fair Scheduler allocates resources evenly between multiple jobs and also
provides capacity guarantees.
• Fair Scheduler assigns resources to jobs such that each job gets an equal share
of the available resources on average over time.
• The Fair Scheduler lets short jobs finish in reasonable time while not starving
long jobs.
24
Hadoop YARN

Hadoop Schedulers

Fair Scheduler
• Task slots that are free are assigned to new jobs, so that each job
gets roughly the same amount of CPU time.
• The Fair Scheduler maintains a set of pools into which jobs are placed.
Each pool has a guaranteed capacity.
• When there is a single job running, all the resources are assigned to that
job. When there are multiple jobs in the pools, each pool gets at least as
many task slots as guaranteed.
• This lets the scheduler guarantee capacity for pools while utilizing
resources efficiently when these pools don’t contain jobs (a simplified allocation sketch follows below).
25
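The following is a simplified sketch (not the actual Fair Scheduler code) of the allocation rule described above: each active pool first receives its guaranteed capacity, and the remaining slots are split evenly among the pools that currently have jobs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified fair-share allocation: guaranteed capacity first, then an
// even split of the leftover slots among pools that actually have jobs.
class FairShareSketch {
    record Pool(String name, int guaranteed, boolean hasJobs) {}

    static Map<String, Integer> allocate(List<Pool> pools, int totalSlots) {
        Map<String, Integer> shares = new LinkedHashMap<>();
        List<Pool> active = pools.stream().filter(Pool::hasJobs).toList();
        int remaining = totalSlots;
        for (Pool p : active) {                       // honour the guarantees
            int grant = Math.min(p.guaranteed(), remaining);
            shares.put(p.name(), grant);
            remaining -= grant;
        }
        int extra = active.isEmpty() ? 0 : remaining / active.size();
        for (Pool p : active) {                       // split what is left evenly
            shares.merge(p.name(), extra, Integer::sum);
        }
        return shares;
    }

    public static void main(String[] args) {
        System.out.println(allocate(
                List.of(new Pool("etl", 4, true), new Pool("adhoc", 2, true),
                        new Pool("reporting", 6, false)),
                20)); // {etl=11, adhoc=9}: the idle pool's capacity is reused
    }
}
```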
Hadoop YARN

Hadoop Schedulers

Fair Scheduler
• The Fair Scheduler keeps track of the compute time received by each
job.
• The Fair Scheduler is useful when a Hadoop cluster, whether small or large, is shared
among multiple groups of users.
• Though the fair scheduler ensures fairness by maintaining a set of pools
and providing guaranteed capacity to each pool, it does not provide any
timing guarantees and hence it is ill-equipped for real-time jobs.

26
Hadoop YARN

Hadoop Schedulers

Capacity Scheduler
• Capacity scheduler has similar functionality as the Fair Scheduler but
adopts a different scheduling philosophy.
• In Capacity Scheduler, multiple named queues are defined, each with a
configurable number of map and reduce slots.
• Each queue is also assigned a guaranteed capacity.
• The Capacity Scheduler gives each queue its capacity when it contains
jobs, and shares any unused capacity between the queues.

27
Hadoop YARN

Hadoop Schedulers

Capacity Scheduler
• When a TaskTracker has free slots, the Capacity Scheduler picks the queue
for which the ratio of running slots to capacity is lowest (a small sketch of this selection rule follows below).
• The Capacity Scheduler is useful when a large Hadoop cluster is shared
among multiple clients with different types and priorities of jobs.
• Though the capacity scheduler ensures fairness by maintaining a set of
queues and providing guaranteed capacity to each queue, it does not
provide any timing guarantees and, therefore, it may be ill-equipped for
real-time jobs.

28
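A hedged sketch of that selection rule, with made-up queue names and numbers; the real Capacity Scheduler implementation is more involved.

```java
import java.util.Comparator;
import java.util.List;

// Pick the queue whose ratio of running tasks to guaranteed capacity is lowest,
// which is how the Capacity Scheduler decides where a freed slot should go.
class CapacityPickSketch {
    record Queue(String name, int running, int capacity) {}

    static Queue pick(List<Queue> queues) {
        return queues.stream()
                .min(Comparator.comparingDouble(q -> (double) q.running() / q.capacity()))
                .orElseThrow();
    }

    public static void main(String[] args) {
        Queue next = pick(List.of(
                new Queue("prod", 8, 10),     // 0.8 of capacity in use
                new Queue("research", 2, 5),  // 0.4 of capacity in use -> gets the slot
                new Queue("default", 3, 5))); // 0.6 of capacity in use
        System.out.println(next.name());      // prints "research"
    }
}
```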
Batch Analysis frameworks

Pig
• While MapReduce is a powerful model for big data analysis, for certain complex
analysis jobs developers may find it difficult to identify the key-value pairs
involved at each step and then implement the map and reduce functions.
• Moreover, complex analysis jobs may require multiple MapReduce jobs to be
chained.
• Pig is a high-level data processing language which makes it easy for developers
to write data analysis scripts, which are translated into MapReduce programs
by the Pig compiler.
Pig includes:
• (1) a high-level language (called Pig Latin) for expressing data analysis programs and
• (2) a compiler which produces sequences of MapReduce programs from the Pig scripts.
• Pig can be executed either in local mode or MapReduce mode. 29
Batch Analysis frameworks

Pig
• In local mode, Pig runs inside a single JVM process on a local machine.
• Local mode is useful for development purpose and testing the
scripts with small data files on a single machine.
• MapReduce mode requires a Hadoop cluster.
• In MapReduce mode, Pig can analyze data stored in HDFS.
• Pig compiler translates the pig scripts into MapReduce programs
which are executed in a Hadoop cluster.
• Pig provides an interactive shell called grunt, for developing pig scripts.
• Pig provides various built-in functions such as AVG, MIN, MAX, SUM,
and COUNT.
30
Pig
Pig Operators
• Pig operators are the set of relational operators that Apache Pig provides for building
data analysis programs.
These operators include the following (several of them appear in the sketch after this list):
• 1. Foreach
• This operator generates a new data set by applying a set of transformations to each record
of the input data set.
• 2. Join
• This operator allows two data sets to be joined together to form a single data set.
• 3. Filter
• This operator is used to select a subset of records from the input data set that meet certain
criteria.
• 4. Group
• This operator is used to group related records from the input data set.
• 5. Sort
• This operator is used to sort the records of the input data set in ascending or descending
order. 31
Pig
Pig Operators
• 6. Limit
• This operator is used to limit the number of records returned from the input data
set.
• 7. Distinct
• This operator is used to remove duplicate records from the input data set.
• 8. Union
• This operator is used to combine two data sets into a single data set.
• 9. Load
• This operator is used to load data from an external source such as a file or database
into the Apache Pig environment.
• 10. Store
• This operator is used to write the output data set to an external source such as a file
or database.
32
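To show several of these operators together, here is a hedged sketch that embeds a small Pig Latin script in Java through the PigServer API in local mode; the file name, schema, and aliases are illustrative.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigOperatorsSketch {
    public static void main(String[] args) throws Exception {
        // Local mode: Pig runs inside a single JVM, useful for testing scripts.
        PigServer pig = new PigServer(ExecType.LOCAL);

        // LOAD: read records from a file (path and schema are illustrative).
        pig.registerQuery("students = LOAD 'students.tsv' AS (name:chararray, dept:chararray, gpa:double);");

        // FILTER: keep only the records that meet a condition.
        pig.registerQuery("good = FILTER students BY gpa >= 3.0;");

        // GROUP + FOREACH: group related records and apply built-in functions.
        pig.registerQuery("by_dept = GROUP good BY dept;");
        pig.registerQuery("avg_gpa = FOREACH by_dept GENERATE group AS dept, AVG(good.gpa) AS avg, COUNT(good) AS n;");

        // Nothing has executed so far (lazy evaluation); STORE triggers the run.
        pig.store("avg_gpa", "avg_gpa_out");
    }
}
```

The same statements could be typed interactively in the grunt shell; in MapReduce mode the Pig compiler would translate them into a chain of MapReduce jobs over data in HDFS.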
Pig
Data Types
• The simple data types work the same way as in other programming languages.
Let us look at the complex data types in detail.
• Tuple
• A tuple is an ordered set of fields.
• A collection of fields with different data types.
• Bag
• A bag is an unordered collection of tuples.
• A bag is represented with curly braces.
• Map
• A Map is a set of key-value pairs.
• A map is represented with square brackets.
• A # is used to separate the key and value (a short sketch of building these types programmatically follows below).
33
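As a hedged illustration, the complex types can also be constructed through Pig's Java data API; the values used here are made up.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class PigDataTypesSketch {
    public static void main(String[] args) throws Exception {
        // Tuple: an ordered set of fields, possibly of different types.
        Tuple t = TupleFactory.getInstance().newTuple();
        t.append("alice");
        t.append(3.8);

        // Bag: an unordered collection of tuples (printed with curly braces).
        DataBag bag = BagFactory.getInstance().newDefaultBag();
        bag.add(t);

        // Map: key-value pairs (written as [key#value] in Pig Latin).
        Map<String, Object> m = new HashMap<>();
        m.put("dept", "cs");

        System.out.println(t);    // (alice,3.8)
        System.out.println(bag);  // {(alice,3.8)}
        System.out.println(m);    // {dept=cs}
    }
}
```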
Pig
Storing Results
• To save the results on the filesystem, the STORE operator is used.
• Pig uses a lazy evaluation strategy and delays the evaluation of
expressions till a STORE or DUMP operator triggers the results to be
stored or displayed.
• The DUMP operator is used to dump the results on the console.
• The EXPLAIN operator is used to view the logical, physical, and
MapReduce execution plans for computing a relation.
• The ILLUSTRATE operator is used to display the step-by-step
execution of statements to compute a relation with a small sample of
data.

34
Batch Analysis frameworks

Apache Oozie
• Many batch analysis applications require more than one MapReduce job to
be chained to perform data analysis.
• This can be accomplished using the Apache Oozie system.
• Oozie is a workflow scheduler system for managing Hadoop jobs.
• With Oozie, you can create workflows, which are collections of actions
(such as MapReduce jobs) arranged as Directed Acyclic Graphs (DAGs).
• Control dependencies exist between the actions in a workflow.
• Thus, an action is executed only when the preceding action is completed.
• An Oozie workflow specifies a sequence of actions that need to be
executed using an XML-based Process Definition Language called Hadoop
Process Definition Language (hPDL).
• Oozie supports various types of actions such as Hadoop MapReduce, Hadoop
file system, Pig, Java, Email, Shell, Hive, Sqoop, SSH and custom actions (a client-side submission sketch follows below).
35
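As a hedged sketch of submitting and monitoring such a workflow from Java, assuming the standard OozieClient API; the server URL, HDFS paths, and the properties other than OozieClient.APP_PATH are placeholders that depend on the workflow definition.

```java
import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class OozieSubmitSketch {
    public static void main(String[] args) throws Exception {
        // Point the client at the Oozie server (URL is a placeholder).
        OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");

        // Job properties: where the workflow.xml (hPDL definition) lives,
        // plus any parameters the workflow expects.
        Properties conf = oozie.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/demo/wordcount-wf");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("queueName", "default");

        // Submit and start the workflow job.
        String jobId = oozie.run(conf);
        System.out.println("Submitted workflow: " + jobId);

        // Poll until the DAG of actions has finished executing.
        while (oozie.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
            Thread.sleep(10_000);
        }
        System.out.println("Final status: " + oozie.getJobInfo(jobId).getStatus());
    }
}
```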
Next lecture

• Realtime Analysis

Assignment

Deadline

Previous Deadline

54
