Module 4 - Hadoop HDFS

The document provides an overview of the Hadoop Distributed Filesystem (HDFS), detailing its architecture, components like NameNode and DataNode, and features such as HDFS Federation and High Availability. It explains the design of HDFS for handling large files with streaming data access and its operational principles, including data storage, replication, and management. Additionally, it discusses the importance of checkpointing and the role of secondary NameNode in maintaining system resilience.

HADOOP DISTRIBUTED

FILESYSTEM
Presented by: Le Ngoc Thanh
Outline
o A brief review of Hadoop
o Hadoop Distributed Filesystem (HDFS)
• NameNode and DataNode
• HDFS Federation
• HDFS High Availability
• Data processing on HDFS
o Read data from HDFS
o Write data to HDFS

©lnthanh
A brief review of Hadoop
An essential ecosystem for Big Data

Hadoop architecture
o The core of Hadoop includes HDFS and MapReduce.

Hadoop architecture
o Working nodes in a Hadoop cluster can be categorized into master nodes and worker (slave) nodes.
Pros of using Hadoop

o Distributed storage
o Parallel processing
o Cost effective
o Automatic failover management
Pros of using Hadoop

o Data locality optimization
• Traditional approach: copy the data from the USA to Singapore → not good in terms of performance and time.
• Better approach: copy the source code from Singapore to the USA.
A typical workflow in HDFS

1. Load data into the cluster (HDFS write)
2. Analyze the data (MapReduce)
3. Store the result in the cluster (HDFS write)
4. Read the result from the cluster (HDFS read)
HDFS
A distributed filesystem designed to run on commodity hardware

What is HDFS?
o Hadoop Distributed Filesystem (HDFS): A filesystem
designed for storing very large files with streaming
data access patterns, running on clusters of
commodity hardware.
HDFS is designed for
Very large files
Files of hundreds of megabytes, gigabytes, or terabytes in size. There are Hadoop
clusters running today that store petabytes of data.

Streaming data access
Most efficient data processing pattern: write-once, read-many.
The time to read the whole dataset is more important than the latency in reading the first record.

Commodity hardware
Expensive and/or highly reliable hardware is not obligatory. HDFS is designed to carry on working without a noticeable interruption to the user in the face of hardware failure.
But HDFS is NOT designed for

Low-latency data access
Not for applications that require access in the tens of milliseconds. HDFS is optimized for delivering a high throughput of data, and this may be at the expense of latency.

Lots of small files
The limit to the number of files in a filesystem is governed by the amount of memory on the namenode.

Multiple writers, arbitrary file modifications
Writes are always made at the end of the file, in append-only fashion. There is no support for multiple writers or for modifications at arbitrary offsets in the file.
Hadoop Distributed File System (HDFS) vs. Google File System (GFS)

HDFS: Cross platform | GFS: Linux
HDFS: Developed in a Java environment | GFS: Developed in a C/C++ environment
HDFS: Initially developed by Yahoo, now an open-source framework | GFS: Developed by Google
HDFS: NameNode and DataNode | GFS: Master node and chunk server
HDFS: Default block size is 128 MB | GFS: Default block size is 64 MB
HDFS: Heartbeat: DataNode → NameNode | GFS: Heartbeat: chunk server → master node
HDFS: Commodity hardware is used | GFS: Commodity hardware is used
HDFS: WORM – write once, read many times | GFS: Multiple-writer, multiple-reader model
HDFS: Deleted files are renamed into a particular folder and then removed via garbage collection | GFS: Deleted files are not reclaimed immediately; they are renamed in a hidden namespace and deleted after 3 days if not in use
HDFS: No network stack issue | GFS: Network stack issue
HDFS: Journal, editlog | GFS: Operational log
HDFS: Only append is possible | GFS: Random file writes are possible
Storing data with HDFS

Blocks in HDFS vs. in local filesystem

o Like a local filesystem: HDFS files are broken into block-sized chunks and stored as independent units.
o Unlike a local filesystem: an HDFS file that is smaller than a single block does not occupy a full block's worth of underlying storage.
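The block arithmetic can be sketched in a few lines of Python (a toy illustration, not Hadoop code; the 300 MB example file size is made up):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size: 128 MB

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE):
    """Return the sizes of the block-sized chunks a file is split into.

    The last block may be smaller than block_size: unlike many local
    filesystems, HDFS does not pad it to a full block on disk.
    """
    blocks = []
    remaining = file_size
    while remaining > 0:
        blocks.append(min(block_size, remaining))
        remaining -= block_size
    return blocks

# A 300 MB file occupies two full 128 MB blocks plus one 44 MB block.
sizes = split_into_blocks(300 * 1024 * 1024)
print(len(sizes))                   # 3
print(sizes[-1] // (1024 * 1024))   # 44
```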
HDFS Daemons

NameNode

NameNode
o Manage the filesystem namespace by maintaining
the filesystem tree and the metadata for all the files
and directories in the tree
o This information is stored persistently on the local
disk in the namespace image (fsimage) and edit log
(editLog) files.

NameNode
o The NameNode knows the DataNodes on which all
the blocks for a given file are located.
o However, it does not store block locations
persistently.
o This information is reconstructed from DataNodes
when the system starts.

NameNode in use

o DataNodes send heartbeats to the NameNode every 3 seconds via a TCP handshake.
o Every 10th heartbeat is a Block report, allowing the NameNode to build its metadata and ensure that three copies of each block exist in the system.
o If the NameNode is down, HDFS is down.
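The heartbeat-based liveness tracking can be modeled with a toy Python sketch (not Hadoop's implementation; the 30-second dead-node threshold is made up for the example, while real HDFS waits on the order of ten minutes before declaring a node dead):

```python
import time

HEARTBEAT_INTERVAL = 3.0   # DataNodes report every 3 seconds (from the slides)
DEAD_TIMEOUT = 30.0        # illustrative threshold; real HDFS waits much longer

class HeartbeatMonitor:
    """Toy NameNode-side view of DataNode liveness."""

    def __init__(self):
        self.last_seen = {}  # DataNode id -> timestamp of last heartbeat

    def heartbeat(self, node_id, now=None):
        self.last_seen[node_id] = time.time() if now is None else now

    def dead_nodes(self, now=None):
        now = time.time() if now is None else now
        return [n for n, t in self.last_seen.items() if now - t > DEAD_TIMEOUT]

mon = HeartbeatMonitor()
mon.heartbeat("dn1", now=0.0)
mon.heartbeat("dn2", now=0.0)
mon.heartbeat("dn1", now=60.0)   # dn2 stops reporting
print(mon.dead_nodes(now=61.0))  # ['dn2']
```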
Re-replicating missing replicas

o Missing heartbeats signify lost nodes.
o The NameNode consults its metadata and finds the affected data.
o The NameNode consults the Rack Awareness script.
o The NameNode tells a DataNode to re-replicate.
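The re-replication decision can be sketched as a toy Python model (rack awareness is omitted here; the block and node names are made up):

```python
REPLICATION = 3  # default HDFS replication factor

def find_under_replicated(block_map, dead_node):
    """Given block -> set of hosting nodes, return the blocks that fall
    below the target replication factor once dead_node is removed."""
    affected = []
    for block, nodes in block_map.items():
        live = nodes - {dead_node}
        if dead_node in nodes and len(live) < REPLICATION:
            affected.append(block)
    return affected

def pick_target(block_map, block, all_nodes, dead_node):
    """Pick any live node that does not already hold a replica of the block.
    (Real HDFS also applies the rack-awareness policy, omitted here.)"""
    live = set(all_nodes) - {dead_node}
    candidates = live - block_map[block]
    return sorted(candidates)[0] if candidates else None

blocks = {"blk_1": {"dn1", "dn2", "dn3"}, "blk_2": {"dn2", "dn3", "dn4"}}
nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
lost = find_under_replicated(blocks, "dn3")
print(sorted(lost))                                # ['blk_1', 'blk_2']
print(pick_target(blocks, "blk_1", nodes, "dn3"))  # dn4
```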
Functions of a NameNode
o Maintain and execute the filesystem namespace
• Any modifications in the filesystem namespace or in its
properties is tracked by the NameNode with the help of a
transactional log, i.e. EditLog.
• E.g., when a directory is created in HDFS, the NameNode
will immediately record this in the EditLog.

Functions of a NameNode
o Govern the mechanism of storing data
• Map a file to a set of blocks and maps a block to the
DataNodes where it is located
• Keep a record of how the files in HDFS are divided into
blocks, in which nodes these blocks are stored.
• Take care of the replication factor of blocks
• A change in the replication factor of any block will be recorded in
the editLog by the NameNode.

Functions of a NameNode
o Record the metadata of all the files stored in the cluster
• E.g., the location, the size of the files, permissions, hierarchy,
etc.
o Make sure that the DataNodes are working properly
• Regularly receive heartbeats and Block reports from all
DataNodes.
• In case of a DataNode failure, it chooses new DataNodes for new
replicas, balances disk usage and also manages the
communication traffic to the DataNodes.
o Direct DataNodes to execute the low-level I/O operations
o By and large the NameNode manages cluster
configuration.
Secondary NameNode
o The filesystem cannot be used without the NameNode.
• All the files on the filesystem would be lost since there would be
no way of reconstructing them from the blocks on the
DataNodes.
o It is important to make the NameNode resilient to
failure.
o Conventional solution: back up the files on local disks or a remote NFS mount.
o More resilient solution: run a secondary NameNode.
Secondary NameNode

o Not a hot standby for the NameNode.
o Connects to the NameNode every hour (by default).
o Performs housekeeping and backs up the NameNode metadata.
o The saved metadata can be used to rebuild a failed NameNode.
HDFS fsimage file
o A persistent checkpoint of HDFS metadata
o Binary file, containing information about files and
directories

Advanced HDFS capacity planning
o Analyze the fsimage to learn how fast HDFS grows
o Combine it with “external” datasets
• number of daily/monthly active users.
• total size of logs generated by users.
• number of queries / day run by data analysts.

HDFS edit/audit file

o Logs all filesystem access requests sent to the NameNode.
o Easy to parse and aggregate.
HDFS edit/audit file
o Some files/directories are accessed more often than
others.
• E.g., fresh logs, core datasets, dictionary files, etc.
• Use the edit file to find them.
o To process a file faster, increase its replication factor while it is “hot”.
o To save disk space, decrease its replication factor when it becomes “cold”.
Checkpointing
o A process that takes an fsimage and editLog and
compacts them into a new fsimage
o Instead of replaying a potentially unbounded edit
log, the NameNode can load the final in-memory
state directly from the fsimage.
o More efficient, NameNode startup time reduced.
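The essence of checkpointing, replaying the edit log on top of the old fsimage to produce a new, compacted fsimage, can be sketched in Python (a toy model with made-up operation types, not the real fsimage/editLog binary formats):

```python
def apply_edit(namespace, edit):
    """Apply one edit-log entry to the in-memory namespace (a dict of
    path -> metadata). Only two toy operation types are modeled."""
    if edit["op"] == "mkdir":
        namespace[edit["path"]] = {"type": "dir"}
    elif edit["op"] == "delete":
        namespace.pop(edit["path"], None)
    return namespace

def checkpoint(fsimage, edit_log):
    """Replay the edit log on top of the old fsimage and return the new,
    compacted fsimage; the edit log can then be truncated."""
    namespace = dict(fsimage)
    for edit in edit_log:
        apply_edit(namespace, edit)
    return namespace

old_fsimage = {"/user": {"type": "dir"}}
edits = [{"op": "mkdir", "path": "/user/alice"},
         {"op": "mkdir", "path": "/tmp"},
         {"op": "delete", "path": "/tmp"}]
new_fsimage = checkpoint(old_fsimage, edits)
print(sorted(new_fsimage))  # ['/user', '/user/alice']
```

Loading `new_fsimage` directly is what saves the NameNode from replaying the three edits at startup.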

Checkpointing: An example
o An example of HDFS metadata directory taken from
a NameNode

A typical workflow of checkpointing

Checkpointing with NameNode HA configured

Checkpointing with NameNode – Secondary NN
http://blog.cloudera.com/blog/2014/03/a-guide-to-checkpointing-in-hadoop/

DataNode

DataNode
o DataNodes are the workhorses of the filesystem.
o They register with the NameNode.
o They store and retrieve blocks when told to (by clients or the NameNode).
o They report back to the NameNode periodically with lists of the blocks they are storing.
Functions of a DataNode
o Perform low-level read and write requests from
clients
• Enable pipelining of data, forward data to other specified
DataNodes.
o Responsible for manipulating blocks based on the
decisions taken by the NameNode
• Create, delete, and replicate blocks
• Store each HDFS data block in separate files in its local
filesystem

Functions of a DataNode
o Send heartbeats to the NameNode every 3 seconds
to report the overall health of HDFS
• Total storage capacity, fraction of storage in use, and the
number of data transfers currently in progress.
o Periodically send a report on all the blocks present
in the cluster to the NameNode
• Block ID, the generation stamp and the length for each
block replica the server hosts.
• When getting started, DataNode scans through its local
filesystem, creates a list of all data blocks that relate to
each of these local files and sends a Block report to the
NameNode.
Heartbeat messages
o HDFS uses heartbeat messages to detect connectivity
between NameNode and DataNodes.

Heartbeat messages
o Periodically sent from each DataNode to NameNode.
o The NameNode marks those from which heartbeats
are missing as dead DataNodes.
• Refrain from sending further requests to dead nodes
• Data stored on a dead node is no longer available, which is
effectively removed from the system.
o The death of a node may cause the replication factor
of data blocks to drop below their minimum value.
• The NameNode initiates additional replication to bring the
replication factor back to a normalized state.

Block caching
o Blocks of frequently accessed files may be explicitly
cached in the DataNode’s memory, in an off-heap
block cache.
o By default, a block is cached in the memory of only
one DataNode.
• Users/applications instruct the NameNode which files to cache (and for how long) by adding a cache directive to a cache pool.

HDFS Federation

HDFS Federation

Hadoop 1.0 vs. Hadoop 2.0
HDFS in Hadoop 1.0
o Namespace
• Directories, files and blocks
• Create, delete, modify and list files or dirs operations

o Block Storage
• Block Management
• DataNode cluster membership
• Support create/delete/modify/get block location operations
• Manage replication and replica placement
• Storage - provide read and write access to blocks

HDFS in Hadoop 1.0
o Only a single namespace for the entire cluster is
allowed.
o A single NameNode manages this namespace.
o If cluster is very large with many files, memory
becomes the limiting factor of scaling.

HDFS Federation
o HDFS Federation allows adding multiple
NameNodes, and hence multiple namespaces, to the
HDFS filesystem.
o For example:
• One NameNode might manage all files rooted under /user
• A second NameNode might handle files under /data
• A third NameNode might handle files under /share
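A federated namespace can be thought of as a mount table mapping path prefixes to NameNodes, which is how the client-side ViewFs works. A toy Python sketch, with made-up NameNode names:

```python
# Path-prefix -> NameNode mount table (the NameNode names are illustrative).
MOUNT_TABLE = {
    "/user": "namenode-1",
    "/data": "namenode-2",
    "/share": "namenode-3",
}

def resolve_namenode(path: str) -> str:
    """Route a path to the NameNode owning the longest matching prefix."""
    best = None
    for prefix, namenode in MOUNT_TABLE.items():
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, namenode)
    if best is None:
        raise KeyError(f"no NameNode mounted for {path}")
    return best[1]

print(resolve_namenode("/user/alice/logs"))  # namenode-1
print(resolve_namenode("/data/warehouse"))   # namenode-2
```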

HDFS Federation
o Namespace volumes are independent of each other; the failure of one NameNode does not affect the others.

HDFS Federation

o Namespace Volume = Namespace + Block Pool
o Block storage as a generic storage service
• The set of blocks for a Namespace Volume is called a Block Pool.
• DataNodes store blocks for all the Namespace Volumes – no partitioning.
HDFS High Availability

Single Point of Failure

Single Point of Failure
o HDFS already provides:
• Replicating NameNode metadata on multiple filesystems.
• Using the Secondary NameNode to create checkpoints, which protects against data loss.
o Yet this does not provide high availability of the filesystem.
o The NameNode is still a single point of failure (SPOF).
• If it fails, all clients are unable to read or write.
Recall: the Secondary NameNode

Recover from a failed NameNode
o Start a new primary NameNode and configure
DataNode and clients to use this new NameNode
o Note that new NameNode is not able to serve
requests until
• It has loaded its namespace image into memory
• Replayed its edit log
• Received enough block reports from DataNodes to leave
safe mode.
o For a large cluster, this can take 30 minutes or more.

HDFS High Availability

o A pair of NameNodes runs in an active-standby configuration.
o On failure, the standby takes over and continues servicing client requests without a significant interruption.
HDFS High Availability
o Use shared storage to share edit log
• Some techniques in shared storage: NFS filer, Quorum
journal manager (QJM)
o When coming up, a standby NameNode will
• Read up to end of the shared edit log to synchronize its
state with the active NameNode, then
• Continue to read new entries as they are written by the
active NameNode.
o DataNodes must send block reports (and heartbeats) to both NameNodes.
o Clients must be configured to handle NameNode
failover.
Shared storage with Journal Nodes

HDFS High Availability with YARN

Failover controller
o The transition from the active NameNode to the standby is managed by an entity in the system called the failover controller.
• E.g., the ZooKeeper Failover Controller (ZKFC).
o A failover initiated manually by an administrator is called a graceful failover.
• For example, in the case of routine maintenance.
• The failover controller arranges an orderly transition for
both NameNodes to switch roles.

Fencing
o It is impossible to be sure that a failed NameNode has actually stopped running.
• E.g., a slow network or a network partition can trigger a failover transition while the previously active NameNode is still running and still believes it is active.
o Fencing is the mechanism that ensures the previously active NameNode is prevented from doing any damage or causing corruption.
o This can be coordinated with ZooKeeper or something similar.

HDFS in Hadoop 3.0

Support for Erasure Encoding
o Erasure Coding is mostly used in RAID.

Support for Erasure Encoding
o Erasure coding stores data and provides fault tolerance with less space overhead than HDFS replication.
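The space savings come down to simple arithmetic, sketched here in Python (RS(6,3) is a common Reed-Solomon erasure-coding policy in Hadoop 3; the comparison itself is generic):

```python
def storage_overhead(data_units: int, redundancy_units: int) -> float:
    """Extra storage as a fraction of the raw data size."""
    return redundancy_units / data_units

# 3-way replication: 1 data copy + 2 extra copies -> 200% overhead.
print(storage_overhead(1, 2))  # 2.0

# Reed-Solomon RS(6,3): 6 data blocks + 3 parity blocks -> 50% overhead,
# while still tolerating the loss of any 3 of the 9 blocks.
print(storage_overhead(6, 3))  # 0.5
```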

Support for more than 2
NameNodes
o Business critical deployments require higher degrees
of fault-tolerance.

Other improvements
o Support for Filesystem Connector
• Hadoop now supports integration with Microsoft Azure
Data Lake and Aliyun Object Storage System.
o Intra-DataNode Balancer
• A single DataNode manages multiple disks, which can become unevenly filled (similar to the unbalanced-cluster problem).
• This can be handled via the hdfs diskbalancer CLI.

Data processing on HDFS

Data processing with Map task

o Map: “Run this computation on your local data”.
o The JobTracker delivers the code to DataNodes that hold the local data.
What if data is not local?
o Rack awareness: the JobTracker tries to select a node in the same rack as the data.
Data processing with Reduce task

o Reduce: “Run this computation across the Map results”.
o Map tasks deliver output data over the network.
o Reduce task output is written to and read from HDFS.
Unbalanced cluster
o Hadoop prefers local processing when possible.
o New servers can be underutilized for MapReduce and HDFS.
o You might see more network bandwidth usage and slower job times.
Cluster balancing
o The Balancer utility (if used) runs in the background.
o It does not interfere with MapReduce or HDFS.
o Default bandwidth limit: 1 MB/s.
An example of HDFS
configurations
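As an illustration, a minimal hdfs-site.xml might set the block size, replication factor, and metadata/data directories. The property names below are real HDFS settings; the values and paths are made up for the example:

```xml
<configuration>
  <!-- Default block size in bytes: 128 MB -->
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
  <!-- Number of replicas per block -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Where the NameNode stores fsimage and edit log (illustrative path) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/var/hadoop/hdfs/name</value>
  </property>
  <!-- Where DataNodes store block files (illustrative path) -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/var/hadoop/hdfs/data</value>
  </property>
</configuration>
```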

Read Data from HDFS
Anatomy of a file read

A general workflow of data read

Anatomy of a file read

- Step 1: The client calls open() on the FileSystem object.
- Step 2: DistributedFileSystem calls the NameNode to determine the locations of the first few blocks in the file.
- For each block, the NameNode returns the addresses of the DataNodes that have a copy of that block, sorted by proximity to the client.
- If the client is itself a DataNode and hosts a copy of the block, it will read from local storage.
Read data from HDFS

- The DistributedFileSystem returns an FSDataInputStream (an input stream that supports file seeks) to the client for it to read data from.
- FSDataInputStream in turn wraps a DFSInputStream, which manages the DataNode and NameNode I/O.
Read data from HDFS

- Step 3: The client calls read() on the stream, and DFSInputStream connects to the first (closest) DataNode for the first block in the file.
- Step 4: Data is streamed from the DataNode back to the client, which calls read() repeatedly on the stream.
Read data from HDFS

- Step 5: When the end of the block is reached, DFSInputStream closes the connection to that DataNode, then finds the best DataNode for the next block.
- This happens transparently to the client, which from its point of view is just reading a continuous stream.
- Blocks are read in order. The DFSInputStream opens new connections to DataNodes as the client reads through the stream. It also calls the NameNode to retrieve the DataNode locations for the next batch of blocks as needed.
Read data from HDFS

- Step 6: When the client has finished reading, it calls close() on the
FSDataInputStream.

Read data from HDFS
o The DFSInputStream will try the next closest
DataNode for a block if it fails to communicate with
current node.
• Failed DataNodes are remembered so that the DFSInputStream does not needlessly retry them for later blocks.
o It also verifies checksums for the data transferred
from the DataNode.
• If a corrupted block is found, the DFSInputStream attempts
to read a replica of the block from another DataNode; it
also reports the corrupted block to the NameNode.
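The failover-and-checksum logic above can be sketched as a toy Python model (the function names, node names, and the use of CRC32 here are illustrative stand-ins, not Hadoop's actual client protocol):

```python
import zlib

class ReadError(Exception):
    pass

def read_block(block_id, replicas, fetch, checksum):
    """Try each replica in proximity order; skip nodes that fail or return
    corrupt data. fetch(node, block_id) and the expected checksum are
    stand-ins for the real DataNode transfer protocol."""
    failed = []
    for node in replicas:
        try:
            data = fetch(node, block_id)
        except IOError:
            failed.append(node)      # remember, and skip for later blocks
            continue
        if zlib.crc32(data) != checksum:
            failed.append(node)      # corrupt replica: skip and try the next
            continue
        return data, failed
    raise ReadError(f"no healthy replica for {block_id}")

good = b"block-bytes"
def fetch(node, block_id):
    if node == "dn1":
        raise IOError("connection refused")  # simulated dead node
    if node == "dn2":
        return b"corrupted!!"                # simulated bad checksum
    return good

data, failed = read_block("blk_7", ["dn1", "dn2", "dn3"], fetch, zlib.crc32(good))
print(data == good, failed)  # True ['dn1', 'dn2']
```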

Example: Read data from HDFS

o The client receives a DataNode list for each block and picks the first DataNode for each block.
o The client reads blocks sequentially.
Example: Read data from HDFS

o The NameNode provides rack-local nodes first.
o This leverages in-rack bandwidth and a single hop.
Writing Data to HDFS
Anatomy of a file write

Write data to HDFS

Write data to HDFS

- Step 1: The client calls create() on DistributedFileSystem.
- Step 2: DistributedFileSystem calls the NameNode to create a new file in the filesystem namespace, with no blocks associated with it.
- The NameNode performs various checks to make sure the file does not already exist and that the client has the right permissions to create the file.
- If the checks pass, the NameNode makes a record of the new file; otherwise, file creation fails and the client is thrown an IOException.
- The DistributedFileSystem returns an FSDataOutputStream for the client to start writing data to.
Write data to HDFS

- Step 3: As the client writes data, the DFSOutputStream splits it into packets, which it writes to an internal queue called the data queue.
- The queue is consumed by the DataStreamer, which is responsible for asking the NameNode to allocate new blocks by picking a list of suitable DataNodes to store the replicas (usually 3 nodes).
- The list of DataNodes forms a pipeline.
- Step 4: The DataStreamer sequentially streams the packets through the nodes.
- The first DataNode in the pipeline stores each packet and forwards it to the second DataNode. Similarly, the second DataNode stores the packet and forwards it to the third (and last) DataNode.
Write data to HDFS

- Step 5: The DFSOutputStream also maintains an internal queue of packets that are waiting to be acknowledged by DataNodes, called the ack queue.
- A packet is removed from the queue only when it has been acknowledged by all DataNodes in the pipeline.
Write data to HDFS: Failure
o First, the pipeline is closed, and any packets in the ack queue are added to
the front of the data queue so that DataNodes that are downstream from
the failed node will not miss any packets.
o The current block on the good DataNodes is given a new identity, which is
communicated to the NameNode, so that the partial block on the failed
DataNode will be deleted if the failed DataNode recovers later on.
o The failed DataNode is removed from the pipeline, and a new pipeline is
constructed from the two good DataNodes.
o The remainder of the block’s data is written to the good DataNodes in the
pipeline.
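The queue manipulation in these recovery steps can be sketched with Python deques (a toy model; block re-identification and NameNode notification are omitted, and the packet/node names are made up):

```python
from collections import deque

def handle_pipeline_failure(data_queue, ack_queue, pipeline, failed_node):
    """Toy version of the recovery steps above: unacknowledged packets are
    pushed back to the FRONT of the data queue (preserving their order),
    and the failed node is dropped from the pipeline."""
    # Re-queue unacked packets ahead of everything else, keeping their order.
    for packet in reversed(ack_queue):
        data_queue.appendleft(packet)
    ack_queue.clear()
    # Rebuild the pipeline from the remaining good DataNodes.
    return [node for node in pipeline if node != failed_node]

data_q = deque(["p4", "p5"])
ack_q = deque(["p2", "p3"])  # sent but not yet acknowledged
new_pipeline = handle_pipeline_failure(data_q, ack_q, ["dn1", "dn2", "dn3"], "dn2")
print(list(data_q))   # ['p2', 'p3', 'p4', 'p5']
print(new_pipeline)   # ['dn1', 'dn3']
```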

Write data to HDFS: Failure
o The aforementioned actions are transparent to the client writing the data.
o The NameNode notices that the block is under-replicated, and it arranges
for a further replica to be created on another node. Subsequent blocks are
then treated as normal.

Write data to HDFS

- Step 6: When the client has finished writing data, it calls close().
- Step 7: All the remaining packets are flushed to the DataNode pipeline, and the client waits for acknowledgments before contacting the NameNode to signal that the file is complete.
An example of writing data to HDFS

o The client consults the NameNode and writes the block directly to one DataNode.
o The DataNodes replicate the block.
o The cycle repeats for the next block.
Hadoop Rack Awareness

o Never lose all data if an entire rack fails.
o Keep bulky flows in-rack when possible.
o Assumption: in-rack communication offers higher bandwidth and lower latency.
HDFS Write Pipeline

o The NameNode picks two nodes in the same rack and one node in a different rack.
o This gives data protection and locality for MapReduce.
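The placement policy on this slide (two replicas sharing one rack, the third in a different rack) can be sketched in Python. This is a toy model with made-up rack/node names; real HDFS additionally prefers the writer's own node for the first replica and then places the second and third replicas together on a different rack:

```python
def place_replicas(racks, local_node):
    """Toy rack-aware placement: first replica on the writer's node,
    second and third replicas on two different nodes in a remote rack.
    `racks` maps rack id -> list of node names."""
    local_rack = next(r for r, nodes in racks.items() if local_node in nodes)
    # Second replica: a node in a different rack.
    remote_rack = next(r for r in racks if r != local_rack)
    second = racks[remote_rack][0]
    # Third replica: another node in that same remote rack.
    third = next(n for n in racks[remote_rack] if n != second)
    return [local_node, second, third]

racks = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
print(place_replicas(racks, "dn1"))  # ['dn1', 'dn3', 'dn4']
```

Losing either rack still leaves at least one replica, and the bulky second-to-third transfer stays in-rack.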
HDFS Write Pipeline

o The first and second DataNodes pass data along as it is received.
o Data transfer uses TCP port 50010.
