This document provides an overview of Hadoop and instructions for using it. Key points:
- Hadoop uses HDFS for distributed storage; HDFS divides each file into 64 MB chunks and spreads them across data servers.
- The master node tracks the namespace and metadata while slave nodes store data blocks.
- Scripts such as start-all.sh and stop-all.sh start and stop Hadoop across the nodes.
- The hadoop dfs command is used to interact with files in HDFS through options such as -ls, -put, and -get.
- Configuration files allow Hadoop to be customized.
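
To make the 64 MB chunking mentioned above concrete, here is a minimal Python sketch of the arithmetic involved. It is an illustration only, not Hadoop code: the function name split_into_blocks and the 200 MB example file are hypothetical, and it assumes the final chunk holds only the leftover bytes rather than being padded to the full chunk size.

```python
# Illustration only: this models the chunking arithmetic, not HDFS itself.
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size stated in the summary


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each chunk a file would be divided into.

    Assumption: the last chunk holds only the remaining bytes.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


# Hypothetical example: a 200 MB file.
sizes = split_into_blocks(200 * 1024 * 1024)
print(len(sizes), "chunks; last chunk is", sizes[-1] // (1024 * 1024), "MB")
```

Under these assumptions, a 200 MB file yields three full 64 MB chunks plus one 8 MB chunk, each of which could be stored on a different data server.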