2013.09.16
Ryan Ahn
HADOOP ADMINISTRATION
2 9/27/2013
CONTENTS
1. Hadoop Introduction
2. Hadoop Distributed File System
3. Hadoop MapReduce
4. Hadoop Cluster Planning
5. Hadoop Installation and Configuration
6. Hadoop Security
7. Hadoop Resource Management
8. Hadoop Cluster Management
9. Hadoop Monitoring, Backup and Recovery
10. Hadoop 2.0; Glance at YARN
1. Hadoop Introduction
Data paradigm shift
• Everyone has a mobile device
• Service portals such as Facebook and Twitter
> 10 billion photos → several PB of storage
• Mobile carriers
> Over 250 GB per hour
> 6 TB per day
> And after 1, 5, or 10 years?
• Convergence of IT services
> Mobile + business (finance, shopping, etc.)
Big Data = Big Chance
• Change → Chance
Rank | 2011 | 2012 | 2013
1 | Cloud computing | Media tablets and beyond | Mobile device battles
2 | Mobile apps and media tablets | Mobile-centric applications and interfaces | Mobile apps and HTML5
3 | Social communication and collaboration | Contextual and social user experience | Personal cloud
4 | Video | M2M | IoT
5 | Next-generation analytics | App stores and marketplaces | Hybrid IT and cloud computing
6 | Social analytics | Next-generation analytics | Strategic big data
7 | Context-aware computing | Big data | Actionable analytics
8 | Storage-class memory | In-memory computing | In-memory computing
9 | Ubiquitous computing | Low-energy servers | Integrated ecosystems
10 | Fabric-based computing | Cloud computing | Enterprise app stores
Data management
- Production
- Management
- Utilization
Big Data Goal: 4V + 1C
• 4V: Volume, Variety, Velocity, Value
• 1C: Complexity
Why Hadoop
• Collecting and processing data takes a lot of time and money
> Infrastructure architecture and data-center capacity are constraints
> Existing infrastructure consists of standalone systems, each with its own application development and maintenance
> A platform is needed
• A cheap(?) architecture for storing data
> HDFS
• A bundling framework for processing data
> Map + Reduce
• Logic on Data
> Guarantees data locality
• I/O-intensive work combined with CPU computation
> A file-processing mindset and a multi-node load-distribution philosophy
• Performance scales linearly as hardware is added
> Ultimately, there is a noticeable difference in perceived speed compared with a DB
Hadoop Architecture
Hadoop & Eco-system
2. Hadoop Distributed File System
HDFS Goal and Motivation
• Meets a subset of the POSIX requirements
• Meets both performance and cost goals with a system built from many independent machines
• Can store millions of files, each tens of gigabytes in size
> Tens of PB or more
• Scale-out model
> Uses JBOD instead of RAID to build large-capacity storage
> Application-level data replication provides availability while keeping performance high
• Optimized for streaming reads and writes of large files
> Hadoop responds very slowly when it handles large numbers of small files
> Batch throughput matters more than response time
• Fault tolerance
> Copes with failures of components such as machines and disks
• Must integrate with the MapReduce framework
HDFS Design
• User-level file system
> Runs as an application outside the kernel, so no system mount is required
> What about when FUSE is used?
• Distributed file system
• Disk block size
> Default size → 64 MB
> Can be increased to 128 MB, 256 MB, or 1 GB (a trade-off)
> Why increase the block size? It minimizes drive seek operations and improves I/O performance
• Data protection
> Data blocks are replicated across multiple machines
> Data cannot be modified once it has been written
> A read fetches only one of the replicas
– The data comes from the replica on the machine closest on the network
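The sketch below shows why larger blocks help. It estimates what fraction of a single block read is spent seeking instead of streaming data. The 10 ms seek time and 100 MB/s sequential throughput are assumed figures for a typical SATA disk; they do not come from the slide.

```python
def seek_overhead(block_mb: float, seek_ms: float = 10.0, throughput_mb_s: float = 100.0) -> float:
    """Fraction of the time to read one block that is spent seeking (assumed HDD figures)."""
    transfer_ms = block_mb * 1000 / throughput_mb_s  # time to stream the block sequentially
    return seek_ms / (seek_ms + transfer_ms)

for size_mb in (1, 64, 128, 256):
    print(f"{size_mb:>4} MB block: {seek_overhead(size_mb):.1%} of read time spent seeking")
```

With these assumptions, a 1 MB block spends half of its read time seeking, while a 64 MB block spends about 1.5%. The gain shrinks with each further increase, which is part of the trade-off.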
HDFS Daemon
• NameNode
> Keeps all file system metadata in memory
> Needs about 1 GB of heap for the metadata of 1 million blocks
• Secondary NameNode
> Not intended as a backup
> Manages the NameNode image; a kind of checkpoint server
Daemon | Count per cluster | Purpose
NameNode | 1 | Stores the file system metadata and provides the global image of the file system
Secondary NameNode | 1 | Checkpoints the NameNode transaction log
DataNode | Many | Stores block data (file contents)
How HDFS Works (Read)
[Figure: Application and Hadoop Client; NameNode and Secondary NameNode; DataNodes spread across rack1 and rack2]
1. The client asks the NameNode for the file path /foo/bar/test.txt
2. The NameNode replies with block 1 and the hosts that hold it (Host1, Host2, Host3)
3. The client sends a read request for block 1 to a DataNode
4. The DataNode returns the data
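In step 2 the client receives several hosts for each block and reads from the closest one. Hadoop measures closeness on its network topology tree: 0 for the same node, 2 for the same rack, and 4 for another rack in the same data center. The sketch reproduces that distance for /rack/host paths. The rack and host names are hypothetical.

```python
from typing import List

def network_distance(a: str, b: str) -> int:
    """Hops from each node up to their closest common ancestor, summed (paths like '/rack1/host1')."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)

def pick_replica(client: str, replicas: List[str]) -> str:
    """Choose the replica closest to the client; ties keep the order the NameNode returned."""
    return min(replicas, key=lambda r: network_distance(client, r))

replicas = ["/rack2/host4", "/rack1/host2", "/rack1/host1"]
print(pick_replica("/rack1/host1", replicas))  # a replica on the client's own node wins
print(pick_replica("/rack1/host3", replicas))  # otherwise a replica in the same rack wins
```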
How HDFS Works (Write)
[Figure: Application and Hadoop Client; NameNode (memory, edits, fsimage) and Secondary NameNode; DataNodes Host1, Host2, and Host3 spread across rack1 and rack2]
1. The client asks the NameNode to create the path for a new file
- File path: /foo/bar
- Number of replicas: 3
2.1 The NameNode creates the file path entry (in memory)
2.2 The NameNode creates a lock (so that no other client can create the same file)
3.1 The NameNode selects the DataNodes that will store the file data and returns their host information (Host1, Host2, Host3)
4. The client sends the file data and the DataNode list
5.1 The data is stored locally
5.2 A replica is stored
5.3 A replica is stored
5.4 The write completes (close() call)
6. The NameNode writes the in-memory change to the edits file (registering it in the namespace)
• When the data exceeds the configured block size, the client asks the NameNode for new DataNodes
• The Secondary NameNode periodically downloads fsimage and edits, merges them, and sends the resulting fsimage file back to the NameNode
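The write path above repeats for each block. Whenever the configured block size is reached, the client gets a new DataNode pipeline from the NameNode. The sketch models only that block-splitting and allocation loop; it transfers no data. `fake_allocate` is a hypothetical stand-in for the NameNode's choice of DataNodes.

```python
from typing import Callable, List, Tuple

MB = 1024 * 1024

def plan_write(file_bytes: int, block_bytes: int,
               allocate: Callable[[int], List[str]]) -> List[Tuple[int, int, List[str]]]:
    """Split a write into blocks and request a fresh DataNode pipeline for each block."""
    plan = []
    offset, index = 0, 0
    while offset < file_bytes:
        length = min(block_bytes, file_bytes - offset)  # the last block may be partial
        pipeline = allocate(index)                      # e.g. ["Host1", "Host2", "Host3"]
        plan.append((index, length, pipeline))
        offset += length
        index += 1
    return plan

def fake_allocate(index: int) -> List[str]:
    # Hypothetical allocator: rotate through five DataNodes, three replicas per block.
    hosts = [f"Host{i}" for i in range(1, 6)]
    return [hosts[(index + k) % len(hosts)] for k in range(3)]

for idx, length, pipeline in plan_write(150 * MB, 64 * MB, fake_allocate):
    # In HDFS, data flows along the pipeline and acks return in reverse; here we only print it.
    print(idx, length // MB, "MB", " -> ".join(pipeline))
```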
How HDFS Works (Metadata)
[Figure: NameNode and Secondary NameNode, each with its own disk]
1. The NameNode rolls the edits file (keeps it and creates edits.new)
2. The Secondary NameNode copies the fsimage and edits files
3. It reads both files, applies the edits, and creates a new fsimage file
4. It sends the new fsimage file to the NameNode
5. The NameNode renames edits.new to edits
• A checkpoint occurs every hour (default) or when the NameNode's edits file reaches 64 MB
• Recent Hadoop versions trigger checkpoints based on the number of transactions
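In Hadoop 1.x the two triggers above correspond to fs.checkpoint.period (3600 seconds) and fs.checkpoint.size (67108864 bytes). Hadoop 2 uses a transaction-count trigger instead of the size one (dfs.namenode.checkpoint.txns). The sketch below shows the Hadoop 1.x decision. It assumes both thresholds are simply compared against the current values.

```python
CHECKPOINT_PERIOD_S = 3600                 # fs.checkpoint.period default (1 hour)
CHECKPOINT_SIZE_BYTES = 64 * 1024 * 1024   # fs.checkpoint.size default (64 MB)

def checkpoint_due(seconds_since_last: float, edits_bytes: int,
                   period_s: float = CHECKPOINT_PERIOD_S,
                   size_bytes: int = CHECKPOINT_SIZE_BYTES) -> bool:
    """True when either trigger fires: the period has elapsed or the edits log is too large."""
    return seconds_since_last >= period_s or edits_bytes >= size_bytes

print(checkpoint_due(600, 10 * 1024 * 1024))  # 10 minutes, 10 MB of edits: not yet
```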
Hadoop HA (NameNode)
• The Hadoop NameNode is a single point of failure (SPOF)
• It therefore needs to be made highly available (HA)
• Built-in HA does not exist in Hadoop 1.x and earlier
> The most popular feature of Hadoop 2.0 (YARN)
• HA is implemented with NFS, ZooKeeper, and similar tools
[Figure: a ZooKeeper quorum; a failover controller for each NameNode (Active) and NameNode (Standby); metadata shared between them over NFS]
* A federation architecture also exists
3. Hadoop MapReduce
MapReduce Features
• MapReduce framework philosophy
> Logic on Data; built around data locality
• MapReduce: Simplified Data Processing on Large Clusters
• Simple to develop
> You only write a data-processing program that handles one record at a time
> However, you need to understand functional programming concepts: KEY-VALUE
– Strictly speaking, Java MapReduce is a variant
> Implements filtering, projection, grouping, aggregation, and so on
• Scalability
> Tasks run in parallel on distributed machines without communicating with each other or sharing state
• Fault tolerance
> Failure is not the exception; it happens all the time
> If a task fails on a worker node in the cluster, it is retried
MapReduce in Four Phases
• (1) Job submit → (2) Map task → (3) Shuffle & sort → (4) Reduce task
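The four phases can be illustrated with the classic word count. The sketch writes each phase as a plain Python function rather than using the Hadoop Java API. The whole job runs in one process, so it shows only the data flow (key-value pairs out of map, grouped and sorted by key, then reduced), not distribution or data locality.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # (2) Map: one record in, zero or more key-value pairs out
    for word in line.split():
        yield word.lower(), 1

def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    # (3) Shuffle & sort: group values by key, then order the keys
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # (4) Reduce: aggregate all values for one key
    return key, sum(values)

def run_job(records: Iterable[str]) -> List[Tuple[str, int]]:
    # (1) "Job submit" is reduced here to calling this function on the input records
    pairs = (pair for record in records for pair in map_phase(record))
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(pairs)]

print(run_job(["the quick fox", "The lazy dog", "quick quick"]))
```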
MapReduce Daemon
• JobTracker
> One per cluster
> Communicates with clients and TaskTrackers over RPC
> TaskTrackers send their status and other information through heartbeats
> Responsible for job configuration
• TaskTracker
> The daemon that runs user code
> Periodically reports progress to the JobTracker
> When the JobTracker assigns it a task, it spawns a new process to run a task attempt
– Task vs. task attempt
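A task is the logical unit of work, and each try at running it is a task attempt. A failed attempt is rescheduled until a maximum is reached. In Hadoop 1.x that maximum is 4 by default, set by mapred.map.max.attempts and mapred.reduce.max.attempts. The sketch below shows that retry loop. Its attempt-ID format is simplified and hypothetical, not Hadoop's real one.

```python
from typing import Callable, List

MAX_ATTEMPTS = 4  # mirrors the mapred.map.max.attempts default in Hadoop 1.x

def run_task(task_id: str, attempt_fn: Callable[[str], bool],
             max_attempts: int = MAX_ATTEMPTS) -> List[str]:
    """Run one logical task as a series of attempts until one succeeds.
    Returns the attempt IDs that were launched; raises if every attempt failed."""
    launched = []
    for n in range(max_attempts):
        attempt_id = f"attempt_{task_id}_{n}"  # simplified, hypothetical attempt naming
        launched.append(attempt_id)
        if attempt_fn(attempt_id):             # in Hadoop each attempt runs in its own child JVM
            return launched
    raise RuntimeError(f"task {task_id} failed after {max_attempts} attempts")

outcomes = iter([False, False, True])          # first two attempts fail, the third succeeds
print(run_task("m_000003", lambda _: next(outcomes)))
```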
4. Hadoop Cluster Planning
Cluster Planning
1. Choosing a Hadoop distribution and version
2. Choosing hardware
3. Choosing and preparing the operating system
4. Kernel tuning
5. Network design
Choosing a Hadoop Distribution and Version
• Apache Hadoop vs. packaged Hadoop
> Apache Hadoop
> Hortonworks
> Cloudera
Hardware Selection
• Clusters under 20 nodes (small clusters): low-cost investment in master hardware
> CPU: two 2.6 GHz quad-core CPUs
> Memory: 24 GB DDR3 RAM
> Network: two 1 Gb NICs
> Controller: SAS drive controller, SAS II (OS devices as JBOD)
> Disk storage: at least 1 TB
• Clusters under 300 nodes
> Only the memory changes: 24 GB or 48 GB
• Large clusters
> 96 GB of memory
Category | Target | Considerations
Master hardware | NameNode | Dedicated machine that keeps metadata in memory; memory matters more than CPU
Master hardware | Secondary NameNode | Same as the NameNode; needs the same memory and disk capacity
Master hardware | JobTracker | Uses a lot of memory; same spec as the NameNode
Hardware Selection
Category | Target | Considerations
Worker hardware | DataNode | Handles both storage and computation, so consider both CPU and disk storage; replication factor ×3; 20-30% extra storage for MapReduce temporary data; disk matters more than memory; choose CPUs for computation
Worker hardware | TaskTracker | Mid-range (more memory, more disks, 1 GbE) or high-end (large memory, very fast disks, 10 GbE)
Mid-range
• CPU: two 2.9 GHz 6-core CPUs with 15 MB cache
• Memory: 64 GB DDR3-1600 ECC
• Disk controller: one SAS 6 Gb/s
• Disks: twelve 3 TB LFF SATA II 7200 RPM HDDs
• Network: two 1 GbE
• Notes: Intel Hyper-Threading and QPI recommended
• 3- or 4-channel memory configuration
High-end
• CPU: two 2.9 GHz 6-core CPUs with 15 MB cache
• Memory: 96 GB DDR3-1600 ECC
• Disk controller: two SAS 6 Gb/s
• Disks: twenty-four 3 TB LFF SATA II 7200 RPM HDDs
• Network: one 10 GbE
• Notes: Intel Hyper-Threading and QPI recommended
• 3- or 4-channel memory configuration
Hardware Selection: Sizing the Cluster
• Scale out instead of scaling up
• Virtualization is generally not used
> Hypervisors, even bare-metal ones, affect I/O performance
Item | Value | Basis
Daily ingest | 1 TB |
Replication factor | 3 | Copies per block
Actual daily usage | 3 TB | Daily ingest × replication factor
Node storage capacity | 24 TB | Twelve 2 TB SATA II HDDs
MapReduce temporary data | 25% | Reserved for MapReduce temporary data
Usable storage per node | 18 TB | Node storage capacity − MapReduce temporary data
1 year | 61 nodes | Daily ingest × replication factor × 365 / usable storage per node
1 year (5% monthly growth) | 81 nodes |
1 year (10% monthly growth) | 109 nodes |
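The first row of the estimate can be reproduced directly. The slide does not say how the growth rows were computed. The sketch assumes the daily ingest grows by a fixed percentage each month, compounding from the second month, and that each month has 365/12 days. Under those assumptions it yields the same 61, 81, and 109 nodes.

```python
import math

def nodes_needed(daily_ingest_tb: float = 1.0, replication: int = 3,
                 node_raw_tb: float = 24.0, mr_temp_fraction: float = 0.25,
                 monthly_growth: float = 0.0, months: int = 12) -> int:
    usable_per_node = node_raw_tb * (1 - mr_temp_fraction)   # 24 TB - 25% = 18 TB
    days_per_month = 365 / 12
    # Assumption: month m (0-based) ingests (1 + growth)^m times the first month's volume.
    month_factors = sum((1 + monthly_growth) ** m for m in range(months))
    total_tb = daily_ingest_tb * replication * days_per_month * month_factors
    return math.ceil(total_tb / usable_per_node)

for growth in (0.0, 0.05, 0.10):
    print(f"{growth:.0%} monthly growth: {nodes_needed(monthly_growth=growth)} nodes")
```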
Choosing and Preparing the Operating System
• Optimized for Linux distributions such as RedHat, CentOS, Ubuntu, and SuSE
• A configuration management system such as Puppet or Chef is needed (open source)
• Software
> Oracle Java 1.6 or later
– To use the Hadoop RPMs, install Oracle Java from its RPM as well
> Cron daemon
> ntp
> SSH
> SNTP
> rsync
• Hostname and DNS resolution
> /etc/hostname, /etc/hosts, Java DNS
Directory Layout
Daemon | Location | Configuration parameter | Owner:group | Permissions
NameNode | /data/1/dfs/nn, /data/2/dfs/nn, /data/3/dfs/nn | dfs.name.dir | hdfs:hadoop | 0700
Secondary NameNode | /data/1/dfs/snn | fs.checkpoint.dir | hdfs:hadoop | 0700
DataNode | /data/1/dfs/dn, /data/2/dfs/dn, /data/3/dfs/dn | dfs.data.dir | hdfs:hadoop | 0700
TaskTracker | /data/1/mapred/local, /data/2/mapred/local, /data/3/mapred/local | mapred.local.dir | mapred:hadoop | 0700
JobTracker | /data/1/mapred/local | mapred.local.dir | mapred:hadoop | 0700
All | /var/log/hadoop | $HADOOP_LOG_DIR | root:hadoop | 0775
All | /tmp/hadoop-${user.name} | hadoop.tmp.dir | root:root | 1777
Kernel Tuning
• Kernel parameters are set in /etc/sysctl.conf
> They are applied at boot; run sysctl -p to apply them without rebooting
• vm.swappiness
> Controls how readily the kernel swaps data out of memory
> 0~100; the higher the value, the more data is swapped (default 60)
> For Hadoop nodes a low value is generally recommended, so that daemon and task JVM heaps are not swapped out
• vm.overcommit_memory
> Controls how memory allocations requested through malloc() are granted
> 0: heuristic (default); the kernel grants the allocation if enough memory appears to be available
> 1: the kernel accepts every allocation request unconditionally
– Never recommended
> 2: strict; the commit limit is swap + physical memory × vm.overcommit_ratio / 100
– With a ratio of 50, 1 GB of physical memory, and 1 GB of swap, the limit is 1.5 GB
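Under strict accounting (vm.overcommit_memory=2), the kernel refuses an allocation once committed memory would exceed the commit limit. The one-function sketch below computes that limit and ignores huge pages. It assumes the slide's example has 1 GB of swap, since the slide does not state the swap size.

```python
def commit_limit_mb(ram_mb: float, swap_mb: float, overcommit_ratio: float = 50) -> float:
    """Approximate CommitLimit with vm.overcommit_memory=2:
    swap + RAM * vm.overcommit_ratio / 100 (huge pages ignored)."""
    return swap_mb + ram_mb * overcommit_ratio / 100

print(commit_limit_mb(ram_mb=1024, swap_mb=1024))  # the slide's example: 1 GB RAM, ratio 50
```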
Disk Configuration
• Do not use Linux LVM
> If the data disks appear as /dev/vg* rather than /dev/sd*, the setup is wrong
• In most cases, use the operating system's usual file system
> ext3, ext4, xfs
• Must always be added to the mount options
Network Design
• Always design the network together with a network engineer
• An isolated network is best
• Example equipment for 1,152 hosts
> Cisco Nexus 7000
> Two spine fabric switches
[Figure: spine-and-leaf design; 48x10GbE leaf switches, each connected to the spine by 4 x 10GbE uplinks, with hosts attached to the leaf switches]
5. Hadoop Installation and Configuration
Depending on the Hadoop Distribution
• Apache Hadoop
> With the tarball, installation is just extracting the archive
> Directory locations can be adjusted for more professional management
• Cloudera, Hortonworks
> Provide installation managers
> Cloudera Manager
> Hortonworks Management Center
When Using RPM Packages
• Version management is easy
• Directory locations are consistent
> /etc/hadoop
– conf directory (described on a later slide)
> /etc/rc.d/init.d
– Daemon-style start/stop/restart scripts
> /usr/bin
– The hadoop executable and the task-controller binary
> /usr/include/hadoop
– C++ header files for Hadoop Pipes
> /usr/lib
– Hadoop C libraries
> /usr/libexec
> /usr/sbin/
> /usr/share/doc/hadoop
Files Under the Hadoop conf Directory
• hadoop-env.sh
• core-site.xml
• hdfs-site.xml
• mapred-site.xml
• log4j.properties
• masters
• slaves
• fair-scheduler.xml
• capacity-scheduler.xml
• dfs.include
• dfs.exclude
• hadoop-policy.xml
• mapred-queue-acls.xml
• taskcontroller.cfg
Core of the Configuration Framework
<configuration>
<property>
<name></name>
<value></value>
<final></final>
</property>
</configuration>
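Every *-site.xml file uses the `<configuration>`/`<property>` structure above. A property marked `<final>true</final>` cannot be overridden by resources loaded later. The minimal sketch below reads name, value, and final from such a file. Its sample input uses values from the HDFS configuration table.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://centos1:9000</value>
    <final>true</final>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>
  </property>
</configuration>"""

def read_properties(xml_text: str) -> Dict[str, Tuple[str, bool]]:
    """Return {name: (value, is_final)} for every <property> in a *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        final = prop.findtext("final", default="false").strip().lower() == "true"
        props[name] = (value, final)
    return props

print(read_properties(SAMPLE))
```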
Installation and Configuration
• Demo
HDFS Configurations (hdfs-site.xml)
Name | Value | Description
fs.default.name (core-site.xml) | hdfs://centos1:9000 | URL of the file system
dfs.name.dir | /data/1/dfs/nn | NameNode metadata storage
dfs.data.dir | /data/1/dfs/dn | DataNode block storage
fs.checkpoint.dir | /data/1/dfs/snn | Checkpoint metadata storage
dfs.permissions.supergroup | hadoop | Superuser group; its members can perform any HDFS operation
io.file.buffer.size (core-site.xml) | 65536 | I/O buffer size; larger values improve network transfer efficiency but also increase memory use and latency
dfs.balance.bandwidthPerSec | | Bandwidth (bytes per second) each DataNode may use for the balancer, the tool that evens out distributed blocks
dfs.block.size | 134217728 (128 MB) | Block size used when a new file is created
fs.trash.interval (core-site.xml) | 1440 (24 hours) | Trash feature; minutes before deleted files are purged
• dfs.datanode.du.reserved
• dfs.namenode.handler.count
• dfs.datanode.failed.volumes.tolerated
• dfs.hosts
• dfs.hosts.exclude
MapReduce Configurations (mapred-site.xml)
Name | Value | Description
mapred.job.tracker | centos1:8021 | JobTracker address as host:port (not an hdfs:// URL, and not the NameNode's port)
mapred.local.dir | /data/1/mapred/local | Stores temporary output on local disk
mapred.child.java.opts | -Xmx2g | JVM options (here, the heap size) for each child task JVM
mapred.child.ulimit | 1572864 | Limits a task's virtual memory (in KB)
mapred.tasktracker.map.tasks.maximum | 16 | Maximum number of map tasks each machine runs concurrently
mapred.tasktracker.reduce.tasks.maximum | 8 | Maximum number of reduce tasks each machine runs concurrently
mapred.compress.map.output | true | Compresses map task output when it is written to disk
mapred.map.output.compression.codec | org.apache.hadoop.io.compress.SnappyCodec | Codec used to compress map output
mapred.output.compression.type | BLOCK | Compression type (NONE, RECORD, or BLOCK) for SequenceFile job output
• mapred.jobtracker.taskScheduler
• mapred.reduce.parallel.copies
• mapred.reduce.tasks
• tasktracker.http.threads
• mapred.reduce.slowstart.completed.maps
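With the values in the table, a worker can run 16 map and 8 reduce tasks at once, and each child JVM gets a 2 GB heap (-Xmx2g). The sketch below does a quick worst-case check of the total child heap. It ignores non-heap JVM overhead and the DataNode and TaskTracker daemons themselves.

```python
def child_heap_budget_gb(map_slots: int, reduce_slots: int, child_heap_gb: float) -> float:
    """Worst-case heap committed when every map and reduce slot runs a task at once."""
    return (map_slots + reduce_slots) * child_heap_gb

budget = child_heap_budget_gb(map_slots=16, reduce_slots=8, child_heap_gb=2.0)  # values from the table
print(f"{budget:.0f} GB of task heap")  # compare with the worker's physical RAM
```

The result is 48 GB. That fits within the 64 GB mid-range worker on the hardware slide, but it leaves only 16 GB for the operating system, page cache, and Hadoop daemons.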
Rack Topology
• Replicas of an HDFS block are stored on different machines
> Without location information, however, they might all end up in a single rack
> If the whole rack fails, availability suffers
• The location of the first replica is chosen randomly
• The second and third replicas are stored on two machines in a different rack
> Three racks are not used because rack failures are rarer than machine failures
> Placing the second and third replicas on machines in the same rack makes data exchange easy
– Because the traffic only passes through the rack switch
• Write the mapping as a script and register it in the configuration
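Hadoop 1.x calls the script named by topology.script.file.name (net.topology.script.file.name in Hadoop 2). It passes one or more host names or IP addresses as arguments and reads one rack path per argument from standard output. The minimal Python sketch below follows that contract. The RACKS inventory and host names are hypothetical, and unknown hosts map to /default-rack.

```python
#!/usr/bin/env python3
"""Minimal rack-topology script sketch: Hadoop passes host names or IPs as arguments
and reads back one rack path per argument from stdout."""
import sys
from typing import Dict, List

# Hypothetical inventory; in practice this would be generated from a file or an asset database.
RACKS: Dict[str, str] = {
    "10.1.1.11": "/rack1", "dn1.example.com": "/rack1",
    "10.1.2.11": "/rack2", "dn2.example.com": "/rack2",
}
DEFAULT_RACK = "/default-rack"  # fallback for hosts that are not in the inventory

def resolve(hosts: List[str]) -> List[str]:
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]

if __name__ == "__main__":
    print(" ".join(resolve(sys.argv[1:])))
```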
6. Hadoop Security
Identification, Authentication, Authorization
• Kerberos is usually used
> Its three components: principal, instance, and realm
> Kerberos has to be learned on its own, and its learning curve is quite steep
> So work on it together with the system administrators and operators
> By contrast, on the Hadoop side you only need to edit the core-site.xml/mapred-site.xml configuration files
Name | Value
hadoop.security.authentication | kerberos
hadoop.security.authorization | true
dfs.namenode.keytab.file, dfs.datanode.keytab.file | /etc/hadoop/conf/hdfs.keytab
dfs.block.access.token.enable |
dfs.namenode.kerberos.principal, dfs.namenode.kerberos.https.principal, dfs.datanode.kerberos.principal, dfs.datanode.kerberos.https.principal | host/_HOST@MYREALM.MYCOMPANY.COM
dfs.https.address | 0.0.0.0
dfs.datanode.http.address | 0.0.0.0:1006
dfs.https.port | 50470
dfs.datanode.address | 0.0.0.0:1004
dfs.datanode.data.dir.perm | 0700
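Principals such as host/_HOST@MYREALM.MYCOMPANY.COM take the form primary/instance@REALM. Hadoop replaces _HOST with the local host name, so a single configuration file can be shared by every node. The sketch below shows that substitution and splits a principal into its three components. The host name dn1.mycompany.com is hypothetical.

```python
import socket
from typing import Optional, Tuple

def expand_principal(principal: str, fqdn: Optional[str] = None) -> str:
    """Replace the _HOST placeholder with a fully qualified domain name (this machine's by default)."""
    host = fqdn if fqdn is not None else socket.getfqdn()
    return principal.replace("_HOST", host)

def split_principal(principal: str) -> Tuple[str, str, str]:
    """Split 'primary/instance@REALM' into its three components (instance may be empty)."""
    name, realm = principal.rsplit("@", 1)
    primary, _, instance = name.partition("/")
    return primary, instance, realm

p = expand_principal("host/_HOST@MYREALM.MYCOMPANY.COM", fqdn="dn1.mycompany.com")
print(p, split_principal(p))
```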
Hadoop Security Planning
• Permissions, authentication, and authorization must also be configured for the other ecosystem components
• Either secure or not!
> Data is precious, so it sits at the top level even inside an ordinary data center
> Run an in-house SOC (Security Operations Center)
– An island! No internet, and no mobile phones either
– It must therefore be separated from the development cluster
– A node through which Hadoop clients connect is mandatory
• Otherwise, follow a complete security guide
> But the higher the security level, the lower the performance
> Data encryption
– Should decoding happen in Hadoop, or on an external server?
– It is usually not done in Hadoop → decoding itself causes severe I/O problems
7. Hadoop Resource Management
HDFS Quotas
• Assign quotas to users by using the directory tree structure
# hadoop dfsadmin -setSpaceQuota 10737418240 /user/joel
# hadoop fs -count -q /user/joel
/
--data/
----user-activity/
----syslog/
----purchase/ # no quota limit
--group/
----ps/ # 100 TB quota
--user/ # 1 TB quota per user
----joel/
----ryan/
----simon/
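The -setSpaceQuota value is in bytes (10737418240 bytes = 10 GB), and HDFS charges space quotas for every replica of a block. The quick sketch below shows how much file data fits under such a quota at a given replication factor.

```python
GB = 1024 ** 3

def max_file_bytes(space_quota_bytes: int, replication: int = 3) -> int:
    """HDFS space quotas count every replica, so usable file data = quota / replication."""
    return space_quota_bytes // replication

quota = 10737418240                 # the value passed to -setSpaceQuota above
print(quota / GB, "GB quota")
print(round(max_file_bytes(quota) / GB, 2), "GB of file data at replication 3")
```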
MapReduce Scheduler
• FIFO scheduler
> First come, first served
> A single job can monopolize the cluster
> Supports priorities
– Very low < low < normal < high < very high
> Use only for small, experimental, or development clusters
• Fair scheduler
> Jobs submitted to the queue are placed in one of the pools
> Based on the cluster's total slot capacity
> The number of task slots a pool gets depends on other pools' current demand, minimum guaranteed slots, available slot capacity, and so on
• Capacity scheduler
> Sets up multiple queues, each with a share of the total cluster slots
> Capacity is reserved only when a queue has demand
Fair Scheduler Algorithm
Pool | Demand | Minimum share | Actual share
Joel | 20 | 0 | 20
Ryan | 40 | 0 | 40
Total slot capacity: 80

Pool | Demand | Minimum share | Actual share
Joel | 20 | 0 | 20
Ryan | 40 | 0 | 30
Simon | 120 | 0 | 30
Total slot capacity: 80

Pool | Demand | Minimum share | Actual share
Joel | 40 | 0 | 25
Ryan | 30 | 0 | 25
Simon | 30 | 50 | 30
Total slot capacity: 80

Pool | Demand | Minimum share | Actual share
Joel | 40 | 0 | 15
Ryan | 30 | 0 | 15
Simon | 60 | 50 | 50
Total slot capacity: 80

Pool | Demand | Weight | Actual share
Joel | 80 | 1 | 26
Ryan | 60 | 2 | 53
Total slot capacity: 80
Fair Scheduler
• Demo
8. Hadoop Cluster Management
Hadoop Process Management
• All Hadoop daemons are Java processes
• If the daemons run under a regular user account
> # jps
• If they run as services, i.e., started by root
> # ps -elf | grep java
> # ps -elf | grep -i namenode …
• Starting and stopping Hadoop processes
> # start-all.sh // start the NameNode, DataNodes, JobTracker, and TaskTrackers
> # stop-all.sh // stop the NameNode, DataNodes, JobTracker, and TaskTrackers
> # start-dfs.sh // start the NameNode and DataNodes
> # stop-dfs.sh // stop the NameNode and DataNodes
> # start-mapred.sh // start the JobTracker and TaskTrackers
> # stop-mapred.sh // stop the JobTracker and TaskTrackers
> # hadoop-daemon.sh // start or stop an individual Hadoop daemon on a node
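jps prints one line per JVM owned by the current user, in the form "<pid> <main class>". A small helper like the hypothetical one below can flag daemons that are expected on a node but missing. It works on captured text so it runs without a cluster, and the sample output is invented.

```python
from typing import Dict, Set

# Hypothetical `jps` output captured on a single-node test box.
SAMPLE_JPS = """\
2817 NameNode
2964 DataNode
3112 SecondaryNameNode
3205 JobTracker
3351 TaskTracker
3470 Jps
"""

def running_daemons(jps_output: str) -> Dict[str, int]:
    """Map main-class name to PID, skipping the jps process itself."""
    daemons = {}
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] != "Jps":
            pid, name = parts
            daemons[name] = int(pid)
    return daemons

def missing(expected: Set[str], jps_output: str) -> Set[str]:
    return expected - set(running_daemons(jps_output))

print(missing({"NameNode", "DataNode", "JobTracker", "TaskTracker"}, SAMPLE_JPS))
```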
Cluster Commissioning and Decommissioning
• Demo
9. Hadoop Monitoring, Backup and Recovery
Monitoring
• Performance monitoring
> JVM, dfs, mapred
> Job status, failed jobs, number of tasks
• Health monitoring
> CPU, memory, disk, network traffic
> Hadoop daemons, Hadoop logs
• Using commercial tools
> Quite expensive
• Using common open-source tools
> Ganglia + Nagios + Ambari (Hadoop ecosystem)
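One common open-source pattern is a Nagios-style check on NameNode metrics. The sketch assumes the NameNode serves its metrics as JSON at http://<namenode>:50070/jmx, which applies to releases that include the JMX JSON servlet. It also assumes the FSNamesystemState bean reports NumLiveDataNodes and NumDeadDataNodes. To stay self-contained it parses an invented sample payload instead of making an HTTP request, and the warning and critical thresholds are arbitrary.

```python
import json
from typing import Tuple

# Hypothetical payload shaped like the NameNode's /jmx JSON output; the numbers are invented.
SAMPLE = json.dumps({"beans": [{
    "name": "Hadoop:service=NameNode,name=FSNamesystemState",
    "NumLiveDataNodes": 58,
    "NumDeadDataNodes": 3,
}]})

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3   # Nagios plugin exit codes

def check_dead_datanodes(payload: str, warn: int = 1, crit: int = 5) -> Tuple[int, str]:
    beans = json.loads(payload).get("beans", [])
    state = next((b for b in beans if b.get("name", "").endswith("name=FSNamesystemState")), None)
    if state is None or "NumDeadDataNodes" not in state:
        return UNKNOWN, "UNKNOWN - FSNamesystemState bean not found"
    dead, live = state["NumDeadDataNodes"], state.get("NumLiveDataNodes", "?")
    message = f"dead={dead} live={live}"
    if dead >= crit:
        return CRITICAL, "CRITICAL - " + message
    if dead >= warn:
        return WARNING, "WARNING - " + message
    return OK, "OK - " + message

status, text = check_dead_datanodes(SAMPLE)
print(text)  # a real plugin would also exit with `status`
```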
Backup and Recovery
• Data backup
> Distributed copy (distcp)
> Duplicate the data right away when it is ingested in parallel
– Use Apache Flume to write directly through HDFS sinks
• Metadata backup
> Call the /getimage servlet on the NameNode's embedded web server
> getimage=1 extracts the fsimage
# curl -o fsimage.201309 'http://centos1:50070/getimage?getimage=1'
> getedit=1 extracts the edits
# curl -o edits.201309 'http://centos1:50070/getimage?getedit=1'
[Figure: a data source feeding Flume, whose sinks store the data simultaneously in Hadoop Cluster 1 and Hadoop Cluster 2]
Distributed Copy: distcp
• Basic usage
# hadoop distcp hdfs://centos1:9000/path/one hdfs://remote:9000/path/two
> -m: controls the number of mappers
> -overwrite: overwrites existing files
> -update: copies only what has changed
> -delete: deletes files that exist at the destination but not at the source
• Homogeneous clusters (default)
> hdfs://
• Heterogeneous clusters
> webhdfs://
> httpfs://
• Amazon S3 support
> s3://
10. Hadoop NG; Glance at YARN
Hadoop 2.0
Aster 6.0 Key Cap.
- Graph
- BSP
- ADFS
YARN
• ResourceManager
> Takes over the JobTracker's resource management
> Cluster monitoring
• NodeManager
> Plays the TaskTracker's role
– Manages maps and reduces
> Actually executes MapReduce
• ApplicationMaster
> Not a single JobTracker (one ApplicationMaster per application)

Hadoop administration

  • 1.
  • 2.
    2 9/27/2013 1. HadoopIntroduction 2. Hadoop Distributed File System 3. Hadoop MapReduce 4. Hadoop Cluster Planning 5. Hadoop Installation and Configuration 6. Hadoop Security 7. Hadoop Resource Management 8. Hadoop Cluster Management 9. Hadoop Monitoring, Backup and Recovery 10. Hadoop 2.0; Glance at YARN CONTENTS
  • 3.
    3 9/27/2013 1. HadoopIntroduction 2. Hadoop Distributed File System 3. Hadoop MapReduce 4. Hadoop Cluster Planning 5. Hadoop Installation and Configuration 6. Hadoop Security 7. Hadoop Resource Management 8. Hadoop Cluster Management 9. Hadoop Monitoring, Backup and Recovery 10. Hadoop NG; Glance at YARN CONTENTS
  • 4.
    4 9/27/2013 โ€ข ๋ˆ„๊ตฌ๋‚˜Mobile device โ€ข Facebook, Twitter ๋“ฑ์˜ ์„œ๋น„์Šค ํฌํƒˆ > 100์–ต์žฅ์˜ ์‚ฌ์ง„ ๏ƒ  ์ˆ˜ PB ์Šคํ† ๋ฆฌ์ง€ โ€ข ์ด๋™ํ†ต์‹  > ์‹œ๊ฐ„๋‹น 250 GB ์ด์ƒ > ํ•˜๋ฃจ 6TB > 1๋…„, 5๋…„, 10๋…„? โ€ข IT ์„œ๋น„์Šค ์œตํ•ฉ > Mobile + Biz(๊ธˆ์œต, ์‡ผํ•‘ ๋“ฑ) Data paradigm shift 10244 10245 10248 10246 10247
  • 5.
    5 9/27/2013 โ€ข Change๏ƒจ Chance Big Data = Big Chance 2011๋…„ 2012๋…„ 2013๋…„ 1 ํด๋ผ์šฐ๋“œ ์ปดํ“จํŒ… ๋ฏธ๋””์–ด ํƒœ๋ธ”๋ฆฟ ์ดํ›„ ๋ชจ๋ฐ”์ผ ๊ธฐ๊ธฐ ๋Œ€์ „ 2 ๋ชจ๋ฐ”์ผ ์•ฑ๊ณผ ๋ฏธ๋””์–ด ํƒœ๋ธ”๋ฆฟ ๋ชจ๋ฐ”์ผ ์ค‘์‹ฌ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜๊ณผ ์ธํ„ฐํŽ˜์ด์Šค ๋ชจ๋ฐ”์ผ ์•ฑ๊ณผ HTML5 3 ์†Œ์…œ ์ปค๋ฎค๋‹ˆ์ผ€์ด์…˜ ๋ฐ ํ˜‘์—… ์ƒํ™ฉ์ธ์‹๊ณผ ์†Œ์…œ์ด ๊ฒฐํ•ฉ๋œ ์‚ฌ์šฉ์ž ๊ฒฝํ—˜ ํผ์Šค๋„ ํด๋ผ์šฐ๋“œ 4 ๋น„๋””์˜ค M2M IoT 5 ์ฐจ์„ธ๋Œ€ ๋ถ„์„ ์•ฑ์Šคํ† ์–ด์™€ ๋งˆ์ผ“ ํ”Œ๋ ˆ์ด์Šค ํ•˜์ด๋ธŒ๋ฆฌ๋“œ IT์™€ ํด๋ผ์šฐ๋“œ ์ปดํ“จํŒ… 6 ์†Œ์…œ ๋ถ„์„ ์ฐจ์„ธ๋Œ€ ๋ถ„์„ ์ „๋žต์  ๋น…๋ฐ์ดํ„ฐ 7 ์ƒํ™ฉ์ธ์‹ ์ปดํ“จํŒ… ๋น…๋ฐ์ดํ„ฐ ์‹คํ–‰ ๊ฐ€๋Šฅํ•œ ๋ถ„์„ 8 ์Šคํ† ๋ฆฌ์ง€๊ธ‰ ๋ฉ”๋ชจ๋ฆฌ ์ธ๋ฉ”๋ชจ๋ฆฌ ์ปดํ“จํŒ… ์ธ๋ฉ”๋ชจ๋ฆฌ ์ปดํ“จํŒ… 9 ์œ ๋น„์ฟผํ„ฐ์Šค ์ปดํ“จํŒ… ์ €์ „๋ ฅ ์„œ๋ฒ„ ํ†ตํ•ฉ ์—์ฝ”์‹œ์Šคํ…œ 10 ํŒจ๋ธŒ๋ฆญ ๊ธฐ๋ฐ˜ ์ปดํ“จํŒ… ํด๋ผ์šฐ๋“œ ์ปดํ“จํŒ… ์—”ํ„ฐํ”„๋ผ์ด์ฆˆ ์•ฑ์Šคํ† ์–ด Data ๊ด€๋ฆฌ - ์ƒ์‚ฐ - ๊ด€๋ฆฌ - ํ™œ์šฉ
  • 6.
  • 7.
    7 9/27/2013 โ€ข ๋ฐ์ดํ„ฐ๋ฅผ์ˆ˜์ง‘ํ•˜๊ณ  ์ฒ˜๋ฆฌํ•˜๋Š”๋ฐ ๋งŽ์€ ์‹œ๊ฐ„๊ณผ ๋น„์šฉ์ด ๋“ฌ > ์ธํ”„๋ผ์˜ ๊ตฌ์กฐ, ๋ฐ์ดํ„ฐ ์„ผํ„ฐ ์ˆ˜์šฉ ๊ฐ€๋Šฅ์„ฑ > ๊ธฐ์กด ์ธํ”„๋ผ๋Š” ๋…๋ฆฝ์  ์‹œ์Šคํ…œ, ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ฐœ๋ฐœ, ์œ ์ง€๋ณด์ˆ˜ > ํ”Œ๋žซํผ์ด ํ•„์š” โ€ข ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•  ์ˆ˜ ์žˆ๋Š” ๊ฐ’์‹ผ(?) ๊ตฌ์กฐ > HDFS โ€ข ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•œ Bundling Framework > Map + Reduce โ€ข Logic on Data > Data Locality ๋ณด์žฅ โ€ข I/O ์ง‘์ค‘์ ์ด๋ฉด์„œ CPU ์—ฐ์‚ฐ > ํŒŒ์ผ์ฒ˜๋ฆฌ ์‚ฌ๊ณ , ๋ฉ€ํ‹ฐ๋…ธ๋“œ ๋ถ€ํ•˜๋ถ„์‚ฐ ์‚ฌ์ƒ โ€ข ํ•˜๋“œ์›จ์–ด ์ถ”๊ฐ€์‹œ ์„ฑ๋Šฅ Linear > ๊ฒฐ๊ตญ, DB ๋ณด๋‹ค ์ฒด๊ฐ์†๋„ ์กด์žฌ ํ•จ Why Hadoop
  • 8.
  • 9.
  • 10.
    10 9/27/2013 1. HadoopIntroduction 2. Hadoop Distributed File System 3. Hadoop MapReduce 4. Hadoop Cluster Planning 5. Hadoop Installation and Configuration 6. Hadoop Security 7. Hadoop Resource Management 8. Hadoop Cluster Management 9. Hadoop Monitoring, Backup and Recovery 10. Hadoop NG; Glance at YARN CONTENTS
  • 11.
    11 9/27/2013 โ€ข POSIX์š”๊ตฌ์‚ฌํ•ญ ์ผ๋ถ€๋ฅผ ๋งŒ์กฑ โ€ข ๋‹ค์ˆ˜์˜ ๋…๋ฆฝ ๋จธ์‹ ์œผ๋กœ ์‹œ์Šคํ…œ์œผ๋กœ ์„ฑ๋Šฅ๊ณผ ๋น„์šฉ์„ ๋ชจ๋‘ ๋งŒ์กฑ โ€ข ์ˆ˜๋ฐฑ๋งŒ ๊ฐœ์˜ ์ˆ˜์‹ญ ๊ธฐ๊ฐ€๋ฐ”์ดํŠธ ํฌ๊ธฐ์˜ ํŒŒ์ผ์„ ์ €์žฅ ๊ฐ€๋Šฅ > ์ˆ˜์‹ญ PB ์ด์ƒ๋„ ๊ฐ€๋Šฅ โ€ข Scale out ๋ชจ๋ธ > ๋Œ€์šฉ๋Ÿ‰ ์Šคํ† ๋ฆฌ์ง€ ๊ตฌ์„ฑ์„ ์œ„ํ•ด RAID ๋Œ€์‹  JBOD๋ฅผ ์ง€์› > ์• ํ”Œ๋ฆฌ์ผ€์ด์„  ์ˆ˜์ค€์˜ ๋ฐ์ดํ„ฐ ๋ณต์ œ๋กœ ๊ฐ€์šฉ์„ฑ ํ™•๋ณด์™€ ๋†’์€ ์„ฑ๋Šฅ ์œ ์ง€ โ€ข ํฐ ํŒŒ์ผ์˜ ์ŠคํŠธ๋ฆฌ๋ฐ ์ฝ๊ธฐ์™€ ์“ฐ๊ธฐ์— ๋” ์ตœ์ ํ™” > ํ•˜๋‘ก์€ ๋‹ค์ˆ˜์˜ ์ž‘์€ ํŒŒ์ผ์— ๋Œ€ํ•œ ๋งค์šฐ ๋А๋ฆฐ ์‘๋‹ต > ๋ฐฐ์น˜ ์‹คํ–‰์ด ์‘๋‹ต ์†๋„๋ณด๋‹ค ๋” ์ค‘์š” โ€ข Fault Tolerance > ๋จธ์‹ ๊ณผ ๋””์Šคํฌ ๋“ฑ์˜ ์ปดํฌ๋„ŒํŠธ ์‹คํŒจ์— ๋Œ€์ฒ˜ โ€ข ๋งต๋ฆฌ๋“€์Šค Framework ์—ฐ๊ณ„ ๊ฐ€๋Šฅํ•ด์•ผ ํ•จ HDFS Goal and Motivation
  • 12.
    12 9/27/2013 โ€ข UserLevel File System > ์ปค๋„ ์™ธ๋ถ€์—์„œ Application์ด ์ˆ˜ํ–‰ ๋จ, System Mount ๋ถˆํ•„์š” > FUSE ์‚ฌ์šฉ ์‹œ์—๋Š”? โ€ข Distributed File System โ€ข Disk Block Size > Default Size ๏ƒ  64M > 128MB, 256MB, 1GB ๋Š˜๋ฆด ์ˆ˜ ์žˆ์Œ(Trade-off) > ์™œ ๋ธ”๋ก ์‚ฌ์ด์ฆˆ๋ฅผ ๋А๋ฆด๊นŒ? ๋“œ๋ผ์ด๋ธŒ ํƒ์ƒ‰ ์กฐ์ž‘ ์ตœ์†Œํ™” I/O ์„ฑ๋Šฅ ํ–ฅ์ƒ โ€ข Data Protection > ์—ฌ๋Ÿฌ ๋จธ์‹ ์— ๋ฐ์ดํ„ฐ ๋ธ”๋ก ๋ณต์ œ > ๋ฐ์ดํ„ฐ๋Š” ํ•œ ๋ฒˆ ์“ฐ๊ฒŒ ๋˜๋ฉด ์ˆ˜์ • ๋ถˆ๊ฐ€๋Šฅ > ๋ฐ์ดํ„ฐ READ ์‹œ์—๋Š” ๋ณต์ œ ์ค‘ ํ•˜๋‚˜๋งŒ ์ฝ์Œ โ€“ ๋„คํŠธ์›Œํฌ ์ƒ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ๋จธ์‹ ์˜ ๋ ˆํ”Œ๋ฆฌ์นด์—์„œ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ ธ์˜ค๊ฒŒ ๋จ HDFS Design
  • 13.
    13 9/27/2013 โ€ข ๋„ค์ž„๋…ธ๋“œ(NameNode) >ํŒŒ์ผ์‹œ์Šคํ…œ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ์ „๋ถ€ ๋ฉ”๋ชจ๋ฆฌ์— ์ €์žฅ > 1๋ฐฑ๋งŒ ๋ธ”๋ก์˜ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๊ธฐ ์œ„ํ•ด 1GB์˜ Heap ํ•„์š” โ€ข ๋ณด์กฐ ๋„ค์ž„๋…ธ๋“œ(Secondary NameNode) > ๋ฐฑ์—…์€ ์šฉ๋„๋Š” ์•„๋‹˜ > ๋„ค์ž„๋…ธ๋“œ ์ด๋ฏธ์ง€๋ฅผ ๊ด€๋ฆฌ, ์ผ์ข…์˜ Check Pointer Server HDFS Daemon Daemon ํด๋Ÿฌ์Šคํ„ฐ๋‹น ๊ฐœ์ˆ˜ ์šฉ๋„ ๋„ค์ž„๋…ธ๋“œ 1 ํŒŒ์ผ์‹œ์Šคํ…œ์˜ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ์ €์žฅ, ํŒŒ์ผ์‹œ์Šคํ…œ์˜ ๊ธ€๋กœ๋ฒŒ ์ด๋ฏธ์ง€ ์ œ๊ณต ๋ณด์กฐ ๋„ค์ž„๋…ธ๋“œ 1 ๋„ค์ž„๋…ธ๋“œ ํŠธ๋žœ์žญ์…˜ ๋กœ๊ทธ์˜ ์ฒดํฌํฌ์ธํŠธ ์ž‘์—…์ˆ˜ํ–‰ ๋ฐ์ดํ„ฐ๋…ธ๋“œ ๋‹ค์ˆ˜ ๋ธ”๋ก ๋ฐ์ดํ„ฐ ์ €์žฅ(ํŒŒ์ผ๋‚ด์šฉ)
  • 14.
    14 9/27/2013 HDFS ๋™์ž‘๋ฐฉ์‹(Read) NameNode Secondly Name Node Data Node Data Node Data Node Data Node Data Node rack1 rack2 HDFS Application Hadoop Client 1. ํŒŒ์ผ๊ฒฝ๋กœ: /foo/bar/test.txt ์š”์ฒญ 2. ๋ธ”๋ก1, ํ˜ธ์ŠคํŠธโ€ฆ ์‘๋‹ต Host1 Host2 Host3 3. ๋ธ”๋ก1 ์ฝ๊ธฐ ์š”์ฒญ 4. ๋ฐ์ดํ„ฐ ์‘๋‹ต
  • 15.
    15 9/27/2013 HDFS ๋™์ž‘๋ฐฉ์‹(Write) NameNode Secondly Name Node Data Node Data Node Data Node Data Node Data Node rack1 rack2 HDFS Application Hadoop Client 1. ํŒŒ์ผ ์ƒ์„ฑ์„ ์œ„ํ•œ ํŒŒ์ผ๊ฒฝ๋กœ ์ƒ์„ฑ ์š”์ฒญ - ํŒŒ์ผ๊ฒฝ๋กœ: /foo/bar - ๋ณต์ œ๋ณธ์ˆ˜: 3 Memory 2.1 ํŒŒ์ผ ๊ฒฝ๋กœ ์ •๋ณด ์ƒ์„ฑ(๋ฉ”๋ชจ๋ฆฌ์— ์ƒ์„ฑ) 2.2 ๋ฝ ์ƒ์„ฑ(๋‹ค๋ฅธ ํด๋ผ์ด์–ธํŠธ๊ฐ€ ์ƒ์„ฑํ•˜์ง€ ๋ชปํ•˜๊ฒŒ) Host1 Host2 3.1ํŒŒ์ผ ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•  ๋ฐ์ดํ„ฐ ๋…ธ๋“œ ์„ ํƒ ํ›„ ํ˜ธ์ŠคํŠธ ์ •๋ณด ๋ฐ˜ํ™˜ (Host1, Host2, Host3) 4. ํŒŒ์ผ ๋ฐ์ดํ„ฐ ๋ฐ ๋ฐ์ดํ„ฐ ๋…ธ๋“œ ๋ชฉ๋ก ์ „์†ก Host3 5.1 ๋กœ์ปฌ์ €์žฅ 5.2๋ณต์ œ๋ณธ ์ €์žฅ 5.3๋ณต์ œ๋ณธ ์ €์žฅ edits 5.4 ์ €์žฅ์™„๋ฃŒ (close()๋ช…๋ น) 6. ๋ฉ”๋ชจ๋ฆฌ์˜ ๋‚ด์šฉ์„ edits ํŒŒ์ผ์— ๊ธฐ๋ก(๋„ค์ž„์ŠคํŽ˜์ด์Šค ๋“ฑ๋ก) ์ •ํ•ด์ง„ ๋ธ”๋กํฌ๊ธฐ๋ฅผ ๋„˜์–ด์„œ๋ฉด ํด๋ผ์ด์–ธํŠธ๋Š” ๋„ค์ž„ ๋…ธ๋“œ๋กœ ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ ๋…ธ๋“œ ์š”์ฒญ fsimage ์ฃผ๊ธฐ์ ์œผ๋กœ ๋‹ค์šด๋กœ๋“œ ํ›„ edits์™€ fsimage ๋ณ‘ํ•ฉ fsimage ํŒŒ์ผ์„ Name Node๋กœ ์ „์†ก
  • 16.
    16 9/27/2013 HDFS ๋™์ž‘๋ฐฉ์‹(๋ฉ”ํƒ€๋ฐ์ดํ„ฐ) NameNode Secondly Name Node 1. edit ํŒŒ์ผ ํšŒ์ „(๋ณด๊ด€, edits.new ์ƒ์„ฑ) 2. fsimage์™€ edits ํŒŒ์ผ ๋ณต์‚ฌ 4. ์ƒˆ๋กœ์šด fsimage ํŒŒ์ผ ์ „์†ก Disk 3. ๋‘ ํŒŒ์ผ์„ ์ฝ์–ด์„œ edits ๋ฅผ ๋ฐ˜์˜, ์ƒˆ๋กœ์šด fsimage ํŒŒ์ผ ์ƒ์„ฑ5. edits.new ํŒŒ์ผ๋ช…์„ edits๋กœ ๋ณ€๊ฒฝ Disk โ€ข ๋งค์‹œ๊ฐ„(๊ธฐ๋ณธ๊ฐ’) ๋˜๋Š” ๋„ค์ž„๋…ธ๋“œ edits ํŒŒ์ผ์˜ ํฌ๊ธฐ๊ฐ€ 64MB๊ฐ€ ๋˜๋ฉด ๋ฐœ์ƒ โ€ข ์ตœ๊ทผ ํ•˜๋‘ก๋ฒ„์ „์€ ํŠธ๋žœ์žญ์…˜ ๊ฐœ์ˆ˜ ๊ธฐ์ค€์œผ๋กœ ์ฒดํฌ ํฌ์ธํŠธ๋ฅผ ์ˆ˜ํ–‰
  • 17.
    Hadoop HA (NameNode)
    • The Hadoop NameNode is a single point of failure (SPOF)
    • It therefore needs to be made highly available (HA)
    • Hadoop 1.x and earlier have no built-in HA
      > Built-in NameNode HA is the most popular feature of Hadoop 2.0 (the release that introduced YARN)
    • HA is implemented with NFS, ZooKeeper, etc.
    Diagram: ZooKeeper quorum; one failover controller per NameNode; NameNode (Active) and NameNode (Standby) sharing metadata (e.g. on NFS)
    * HDFS Federation is another, separate architecture
  • 18.
    CONTENTS: 3. Hadoop MapReduce
  • 19.
    MapReduce Features
    • MapReduce framework philosophy
      > Move the logic to the data (data locality)
    • Reference: "MapReduce: Simplified Data Processing on Large Clusters" (Google's paper)
    • Simple to develop
      > You only write code that processes one record at a time
      > But you need to understand functional-programming concepts (key-value pairs)
        – Strictly speaking, Java MapReduce is a variant of the functional model
      > Filtering, projection, grouping, aggregation, etc. are all built this way
    • Scalability
      > Tasks run in parallel on distributed machines without communicating with each other or sharing state
    • Fault tolerance
      > Failure is not the exception but something that happens all the time
      > If a task fails on a worker node, it is retried
  • 20.
    MapReduce: 4-Stage Processing
    • (1) Job submission → (2) Map tasks → (3) Shuffle & sort → (4) Reduce tasks
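The four stages can be simulated in plain Python with the classic word-count example. This in-memory sketch illustrates the programming model only. The function names are hypothetical, and nothing here uses Hadoop's Java API. Stage (1), job submission, corresponds to calling `run_job`.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """(2) Map: emit (word, 1) for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def shuffle_and_sort(pairs):
    """(3) Shuffle & sort: group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """(4) Reduce: aggregate all values seen for one key."""
    return key, sum(values)


def run_job(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return [reduce_phase(k, vs) for k, vs in shuffle_and_sort(mapped)]


if __name__ == "__main__":
    print(run_job(["big data big chance", "Big data"]))
    # [('big', 3), ('chance', 1), ('data', 2)]
```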
  • 21.
    MapReduce Daemons
    • JobTracker
      > One per cluster
      > Communicates with clients and TaskTrackers over RPC
      > TaskTrackers send their status and information through heartbeats
      > Responsible for job configuration
    • TaskTracker
      > Daemon that runs user code
      > Periodically reports progress to the JobTracker
      > When the JobTracker assigns it a task, it starts a new process for a task attempt
        – Task vs. task attempt: one task may run as several attempts (e.g. after a failure or under speculative execution)
  • 22.
    CONTENTS: 4. Hadoop Cluster Planning
  • 23.
    Cluster Planning
    1. Choosing a Hadoop distribution and version
    2. Choosing hardware
    3. Choosing and preparing the operating system
    4. Kernel tuning
    5. Network design
  • 24.
    Choosing a Hadoop Distribution and Version
    • Apache Hadoop vs. packaged Hadoop
      > Apache Hadoop
      > Hortonworks
      > Cloudera
  • 25.
    Choosing Hardware (Masters)
    • Clusters of fewer than 20 nodes (small clusters): low-cost master hardware
      > CPU: 2 × quad-core 2.6 GHz
      > Memory: 24 GB DDR3 RAM
      > Network: 2 × 1 GbE NICs
      > Controller: SAS drive controller, SAS II (JBOD for OS devices)
      > Disk storage: at least 1 TB
    • Fewer than 300 nodes
      > Same, except 24 GB or 48 GB of memory
    • Large clusters
      > 96 GB of memory
    Category | Target | Considerations
    Master hardware | NameNode | Dedicated machine; keeps metadata in memory, so memory matters more than CPU
    Master hardware | Secondary NameNode | Same as the NameNode; needs the same memory and disk capacity
    Master hardware | JobTracker | Uses a lot of memory; same spec as the NameNode
  • 26.
    Choosing Hardware (Workers)
    Category | Target | Considerations
    Worker hardware | DataNode | Both stores and computes, so consider CPU and disk storage; replication factor × 3; 20-30% extra storage for MapReduce temporary data; disk matters more than memory; choose CPUs for computation
    Worker hardware | TaskTracker | Mid-range or high-end spec below
    Spec | Mid-range (more memory, more disks, 1 GbE) | High-end (large memory, very fast disks, 10 GbE)
    CPU | 2 × 6-core 2.9 GHz, 15 MB cache | 2 × 6-core 2.9 GHz, 15 MB cache
    Memory | 64 GB DDR3-1600 ECC | 96 GB DDR3-1600 ECC
    Disk controller | 1 × SAS 6 Gb/s | 2 × SAS 6 Gb/s
    Disks | 12 × 3 TB LFF SATA II 7200 RPM HDD | 24 × 3 TB LFF SATA II 7200 RPM HDD
    Network | 2 × 1 GbE | 1 × 10 GbE
    Notes | Intel Hyper-Threading and QPI recommended; 3- or 4-channel memory configuration | Intel Hyper-Threading and QPI recommended; 3- or 4-channel memory configuration
  • 27.
    Choosing Hardware: Sizing the Cluster
    • Scale out instead of scaling up
    • Virtualization is generally not used
      > Hypervisors, including bare-metal ones, hurt I/O performance
    Item | Value | Basis
    Daily ingest | 1 TB |
    Replication factor | 3 | number of block copies
    Actual daily usage | 3 TB | daily ingest × replication factor
    Raw storage per node | 24 TB | 12 × 2 TB SATA II HDD
    MapReduce temporary data | 25% | space kept for MapReduce intermediate data
    Usable storage per node | 18 TB | raw node storage - MapReduce temporary data
    1 year | 61 nodes | daily ingest × replication factor × 365 / usable storage per node
    1 year (5% growth per month) | 81 nodes |
    1 year (10% growth per month) | 109 nodes |
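The sizing table can be reproduced with a short Python function. The no-growth row follows the slide's formula directly. For the growth rows the slide does not state its method. The sketch assumes monthly compounding: month m ingests (1 + g)^m times the base rate, and each month is 365/12 days long. That assumption happens to reproduce both the 81-node and 109-node figures.

```python
import math


def nodes_for_one_year(daily_ingest_tb: float = 1.0, replication: int = 3,
                       node_raw_tb: float = 24.0, temp_fraction: float = 0.25,
                       monthly_growth: float = 0.0) -> int:
    """Nodes needed to hold one year of data.

    Assumption: the daily ingest rate grows by monthly_growth each month,
    compounded, and every month is 365/12 days long."""
    usable_per_node = node_raw_tb * (1 - temp_fraction)  # 24 TB - 25% = 18 TB
    days_per_month = 365 / 12
    stored_tb = sum(daily_ingest_tb * replication * days_per_month * (1 + monthly_growth) ** m
                    for m in range(12))
    return math.ceil(stored_tb / usable_per_node)


if __name__ == "__main__":
    for growth in (0.0, 0.05, 0.10):
        print(f"{growth:.0%} monthly growth: {nodes_for_one_year(monthly_growth=growth)} nodes")
```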
  • 28.
    Choosing and Preparing the Operating System
    • Hadoop is optimized for Linux: Red Hat, CentOS, Ubuntu, SuSE, etc.
    • A configuration management system such as Puppet or Chef (open source) is needed
    • Software
      > Oracle Java 1.6 or later
        – To use the Hadoop RPMs, install Oracle Java from its RPM as well
      > cron daemon
      > ntp
      > SSH
      > SNTP
      > rsync
    • Host name and DNS resolution
      > /etc/hostname, /etc/hosts, Java DNS lookups
  • 29.
    Directory Layout
    Daemon | Location | Configuration parameter | Owner:group | Permissions
    NameNode | /data/1/dfs/nn, /data/2/dfs/nn, /data/3/dfs/nn | dfs.name.dir | hdfs:hadoop | 0700
    Secondary NameNode | /data/1/dfs/snn | fs.checkpoint.dir | hdfs:hadoop | 0700
    DataNode | /data/1/dfs/dn, /data/2/dfs/dn, /data/3/dfs/dn | dfs.data.dir | hdfs:hadoop | 0700
    TaskTracker | /data/1/mapred/local, /data/2/mapred/local, /data/3/mapred/local | mapred.local.dir | mapred:hadoop | 0700
    JobTracker | /data/1/mapred/local | mapred.local.dir | mapred:hadoop | 0700
    All | /var/log/hadoop | $HADOOP_LOG_DIR | root:hadoop | 0775
    All | /tmp/hadoop-${user.name} | hadoop.tmp.dir | root:root | 1777
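Creating one directory per data disk by hand is error-prone. A small helper can generate both the comma-separated property value and the directories themselves. This Python sketch makes four assumptions:
- The disks follow the `/data/N/...` convention from the table.
- The `hdfs` and `mapred` users already exist.
- The `hadoop` group already exists.
- `create_dirs` runs as root.

The `LAYOUT` dictionary and the function names are hypothetical.

```python
import os
import shutil

# role -> (subdirectory under each /data/N mount, property, owner, group, mode)
LAYOUT = {
    "namenode":    ("dfs/nn",       "dfs.name.dir",     "hdfs",   "hadoop", 0o700),
    "datanode":    ("dfs/dn",       "dfs.data.dir",     "hdfs",   "hadoop", 0o700),
    "tasktracker": ("mapred/local", "mapred.local.dir", "mapred", "hadoop", 0o700),
}


def role_dirs(role: str, disks: int) -> list:
    """One directory per data disk, following the /data/N/... convention."""
    subdir = LAYOUT[role][0]
    return [f"/data/{n}/{subdir}" for n in range(1, disks + 1)]


def property_line(role: str, disks: int) -> tuple:
    """(property name, comma-separated value) for the *-site.xml file."""
    return LAYOUT[role][1], ",".join(role_dirs(role, disks))


def create_dirs(role: str, disks: int) -> None:
    """Create the directories with the owner and mode from the table (needs root)."""
    _, _, owner, group, mode = LAYOUT[role]
    for path in role_dirs(role, disks):
        os.makedirs(path, exist_ok=True)
        shutil.chown(path, user=owner, group=group)
        os.chmod(path, mode)  # set explicitly: makedirs' mode is masked by umask


if __name__ == "__main__":
    print(property_line("datanode", 3))
```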
  • 30.
    Kernel Tuning
    • Kernel parameters are set in /etc/sysctl.conf
      > Changes take effect after a reboot or after running sysctl -p
    • vm.swappiness
      > Controls how readily the kernel swaps application memory out to disk
      > Range 0-100; the higher the value, the more aggressively data is swapped
      > Distributions typically default to 60; for Hadoop nodes a low value is generally preferred so that daemon heaps are not swapped out
    • vm.overcommit_memory
      > Policy for granting memory that applications request via malloc()
      > 0: heuristic; the kernel grants a request if enough memory appears to be available (default)
      > 1: the kernel grants every allocation request unconditionally
        – Strongly discouraged
      > 2: strict accounting; total committed memory may not exceed swap + RAM × vm.overcommit_ratio / 100
        – e.g. with ratio 50, 1 GB of RAM and 1 GB of swap, the limit is 1.5 GB
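Under `vm.overcommit_memory = 2` the kernel enforces a commit limit. That is why forking a large JVM, as Hadoop Streaming does, can fail even when little physical memory is actually touched. The Python arithmetic below is a simplified sketch. It ignores huge pages and the newer `vm.overcommit_kbytes` setting, and `fork_allowed` only approximates the kernel's accounting. On a live system the kernel's own figures appear as `CommitLimit` and `Committed_AS` in `/proc/meminfo`.

```python
GiB = 1024 ** 3


def commit_limit(ram_bytes: int, swap_bytes: int, overcommit_ratio: int = 50) -> int:
    """CommitLimit enforced when vm.overcommit_memory = 2 (ignoring huge pages):
    swap + RAM * ratio / 100."""
    return swap_bytes + ram_bytes * overcommit_ratio // 100


def fork_allowed(limit: int, committed: int, parent_private_bytes: int) -> bool:
    """Simplified strict-mode check: fork() must be able to commit another copy
    of the parent's private writable memory, even though copy-on-write means
    most of it is never actually duplicated."""
    return committed + parent_private_bytes <= limit


if __name__ == "__main__":
    limit = commit_limit(ram_bytes=1 * GiB, swap_bytes=1 * GiB, overcommit_ratio=50)
    print(limit / GiB)  # 1.5
```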
  • 31.
    Disk Configuration
    • Do not use Linux LVM for data disks
      > If data disks appear as /dev/vg* rather than /dev/sd*, the setup is wrong
    • Most clusters use the operating system's usual filesystem
      > ext3, ext4, xfs
    • Always add the recommended option (typically noatime) to the mount options
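The disk rules can be checked mechanically against `/proc/mounts`. This minimal Python sketch assumes that data disks are mounted under `/data/` and that `noatime` is the option you want enforced. A `/dev/mapper/` device is only a hint of LVM, since dm-crypt and multipath devices appear there too.

```python
import os

LVM_PREFIXES = ("/dev/mapper/", "/dev/vg")  # /dev/mapper may also be dm-crypt or multipath
GOOD_FILESYSTEMS = {"ext3", "ext4", "xfs"}


def check_data_mounts(mounts_text: str, prefix: str = "/data/") -> list:
    """Return warnings for data-disk mounts given /proc/mounts (or fstab) text."""
    warnings = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or line.lstrip().startswith("#") or not fields[1].startswith(prefix):
            continue
        device, mount_point, fstype, options = fields[:4]
        if device.startswith(LVM_PREFIXES):
            warnings.append(f"{mount_point}: {device} looks like an LVM volume")
        if fstype not in GOOD_FILESYSTEMS:
            warnings.append(f"{mount_point}: unexpected filesystem {fstype}")
        if "noatime" not in options.split(","):
            warnings.append(f"{mount_point}: noatime not set")
    return warnings


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        for warning in check_data_mounts(f.read()):
            print("WARNING:", warning)
```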
  • 32.
    Network Design
    • Always design the network together with a network engineer
    • An isolated network is best
    • Example equipment supporting 1,152 hosts
      > Cisco Nexus 7000
      > 2 spine fabric switches
    Diagram: two spine switches (48 × 10GbE each); leaf switches with 48 × 10GbE host ports and 4 × 10GbE uplinks each; hosts attached to the leaves
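The diagram's numbers fit standard leaf-spine arithmetic. The Python sketch below makes three assumptions:
- Each spine offers 48 usable 10GbE ports.
- Each leaf has 48 host ports plus 4 uplinks, spread evenly over the spines.
- All links run at the same speed.

Under those assumptions, one spine gives the speaker note's 48 × 12 = 576 hosts and two spines give 1,152, with 12:1 oversubscription at each leaf.

```python
def leaf_spine_capacity(spines: int, ports_per_spine: int,
                        uplinks_per_leaf: int, host_ports_per_leaf: int):
    """Return (leaf switches, hosts, oversubscription) for a two-tier fabric.

    Assumes every leaf spreads its uplinks evenly over the spines, all links run
    at the same speed, and spine ports are used only for leaf uplinks."""
    leaves = spines * ports_per_spine // uplinks_per_leaf
    hosts = leaves * host_ports_per_leaf
    oversubscription = host_ports_per_leaf / uplinks_per_leaf
    return leaves, hosts, oversubscription


if __name__ == "__main__":
    print(leaf_spine_capacity(1, 48, 4, 48))  # (12, 576, 12.0)
    print(leaf_spine_capacity(2, 48, 4, 48))  # (24, 1152, 12.0)
```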
  • 33.
    CONTENTS: 5. Hadoop Installation and Configuration
  • 34.
    By Hadoop Distribution
    • Apache Hadoop
      > With the tarball, installation is just extracting the archive
      > Directory locations can be adjusted for more professional management
    • Cloudera, Hortonworks
      > Provide installation managers
      > Cloudera Manager
      > Hortonworks Management Center
  • 35.
    When Using RPM Packages
    • Easier version management
    • Consistent directory locations
      > /etc/hadoop – conf directory (explained on the next slide)
      > /etc/rc.d/init.d – init scripts to start/stop/restart the daemons
      > /usr/bin – hadoop executable and task-controller binary
      > /usr/include/hadoop – C++ header files for Hadoop Pipes
      > /usr/lib – Hadoop C libraries
      > /usr/libexec
      > /usr/sbin/
      > /usr/share/doc/hadoop
  • 36.
    Files in the Hadoop conf Directory
    • hadoop-env.sh
    • core-site.xml
    • hdfs-site.xml
    • mapred-site.xml
    • log4j.properties
    • masters
    • slaves
    • fair-scheduler.xml
    • capacity-scheduler.xml
    • dfs.include
    • dfs.exclude
    • hadoop-policy.xml
    • mapred-queue-acls.xml
    • taskcontroller.cfg
    Core of the configuration framework:
    <configuration>
      <property>
        <name></name>
        <value></value>
        <final></final>
      </property>
    </configuration>
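The `<final>` element matters because Hadoop loads several resources in order: defaults first, then site files. A later value cannot override one marked final. The Python sketch below mimics that merge rule with `xml.etree.ElementTree`. It is not Hadoop's `Configuration` class, and the property values in the example are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_configuration(*xml_documents: str) -> dict:
    """Merge <configuration> documents in order (e.g. defaults first, then site).

    A later document overrides an earlier value unless the earlier property
    was marked <final>true</final>."""
    values, final = {}, set()
    for text in xml_documents:
        for prop in ET.fromstring(text).iter("property"):
            name = (prop.findtext("name") or "").strip()
            if not name or name in final:
                continue  # skip nameless entries; never override a final property
            values[name] = prop.findtext("value")
            if (prop.findtext("final") or "").strip().lower() == "true":
                final.add(name)
    return values


if __name__ == "__main__":
    defaults = ("<configuration><property><name>dfs.block.size</name>"
                "<value>67108864</value><final>true</final></property></configuration>")
    site = ("<configuration><property><name>dfs.block.size</name>"
            "<value>134217728</value></property></configuration>")
    print(load_configuration(defaults, site))  # {'dfs.block.size': '67108864'}
```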
  • 37.
  • 38.
    HDFS Configurations (hdfs-site.xml)
    name | value | description
    fs.default.name (core-site.xml) | hdfs://centos1:9000 | URL of the default filesystem
    dfs.name.dir | /data/1/dfs/nn | NameNode metadata storage
    dfs.data.dir | /data/1/dfs/dn | DataNode block storage
    fs.checkpoint.dir | /data/1/dfs/snn | Checkpoint metadata storage
    dfs.permissions.supergroup | hadoop | Superuser group; its members can perform any HDFS operation
    io.file.buffer.size (core-site.xml) | 65536 | I/O buffer size; larger values improve network transfer efficiency but increase memory use and latency
    dfs.balance.bandwidthPerSec | | Maximum bandwidth (bytes/s) each DataNode may use while the balancer redistributes blocks
    dfs.block.size | 134217728 (128 MB) | Block size for newly created files
    fs.trash.interval (core-site.xml) | 1440 (minutes, i.e. 24 hours) | Trash: how long deleted files are kept before being purged
    Other parameters to review:
    • dfs.datanode.du.reserved
    • dfs.namenode.handler.count
    • dfs.datanode.failed.volumes.tolerated
    • dfs.hosts
    • dfs.hosts.exclude
  • 39.
    MapReduce Configurations (mapred-site.xml)
    name | value | description
    mapred.job.tracker | host:port, e.g. centos1:9001 | JobTracker RPC address; a plain host:port pair, not an hdfs:// URL, on a different port from the NameNode
    mapred.local.dir | /data/1/mapred/local | Local-disk directory for intermediate (temporary) output
    mapred.child.java.opts | -Xmx2g | JVM options for task processes (here a 2 GB maximum heap)
    mapred.child.ulimit | 1572864 | Virtual-memory limit for task processes, in KB (1572864 KB = 1.5 GB); must be at least as large as the -Xmx heap
    mapred.tasktracker.map.tasks.maximum | 16 | Maximum number of map tasks each machine runs concurrently
    mapred.tasktracker.reduce.tasks.maximum | 8 | Maximum number of reduce tasks each machine runs concurrently
    mapred.compress.map.output | true | Compress map task output when it is written to local disk
    mapred.map.output.compression.codec | org.apache.hadoop.io.compress.SnappyCodec | Codec used for map output compression
    mapred.output.compression.type | BLOCK | Compression granularity for SequenceFile job output: NONE, RECORD or BLOCK
    Other parameters to review:
    • mapred.jobtracker.taskScheduler
    • mapred.reduce.parallel.copies
    • mapred.reduce.tasks
    • tasktracker.http.threads
    • mapred.reduce.slowstart.completed.maps
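The speaker notes for this slide give a rule of thumb for the two `...maximum` settings. Allow about 1.5 slots per CPU core, with two thirds for map and one third for reduce. The Python sketch below implements that rule. It also checks that every slot can hold a JVM with the `-Xmx` heap from `mapred.child.java.opts` at the same time. The 8 GB reserved for the OS and the DataNode/TaskTracker daemons is a hypothetical figure. Note that 16 map and 8 reduce slots are what the rule gives for 16 cores.

```python
import math


def task_slots(cores: int, per_core: float = 1.5, map_share: float = 2 / 3):
    """(map slots, reduce slots): cores x 1.5 slots in total,
    two thirds for map and one third for reduce."""
    total = math.floor(cores * per_core)
    maps = round(total * map_share)
    return maps, total - maps


def child_heaps_fit(map_slots: int, reduce_slots: int, child_heap_gb: float,
                    node_ram_gb: float, reserved_gb: float = 8) -> bool:
    """Every slot may run a task JVM with -Xmx child_heap_gb at the same time.
    reserved_gb (OS plus DataNode/TaskTracker daemons) is a guess."""
    return (map_slots + reduce_slots) * child_heap_gb + reserved_gb <= node_ram_gb


if __name__ == "__main__":
    print(task_slots(12))                 # (12, 6)
    print(task_slots(16))                 # (16, 8)
    print(child_heaps_fit(16, 8, 2, 64))  # True: 24 x 2 GB + 8 GB = 56 GB
```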
  • 40.
    Rack Topology
    • HDFS stores the replicas of a block on different machines
      > But without location information, all replicas may end up in a single rack
      > If that whole rack fails, the data becomes unavailable
    • The first replica goes on the writer's own node if the client runs on a DataNode, otherwise on a randomly chosen node
    • The second and third replicas go on two different machines in another rack
      > Replicas are not spread over three racks because rack failures are much rarer than machine failures
      > Keeping the second and third replicas in the same rack makes copying between them cheap, since the traffic only crosses that rack's switch
    • Rack topology is supplied as a script that is referenced from the configuration
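Hadoop 1.x finds racks by running the script named in `topology.script.file.name` (`net.topology.script.file.name` in 2.x). It passes one or more host names or IP addresses as arguments and reads back one rack path per argument. Below is a minimal Python script that follows that contract. The `RACKS` inventory is hypothetical; it would normally come from your own host list or subnet plan.

```python
#!/usr/bin/env python3
"""Rack topology script: prints one rack path per host name or IP argument."""
import sys

# Hypothetical inventory; replace with your own hosts or derive racks from subnets.
RACKS = {
    "host1": "/rack1", "10.0.1.11": "/rack1",
    "host2": "/rack2", "10.0.2.12": "/rack2",
    "host3": "/rack2", "10.0.2.13": "/rack2",
}
DEFAULT_RACK = "/default-rack"  # Hadoop's own default when a host is unknown


def resolve(name: str) -> str:
    return RACKS.get(name, DEFAULT_RACK)


def main(args) -> None:
    # Hadoop may pass several hosts in one call; answer in the same order.
    print(" ".join(resolve(a) for a in args))


if __name__ == "__main__":
    main(sys.argv[1:])
```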
  • 41.
    CONTENTS: 6. Hadoop Security
  • 42.
    Identification, Authentication, Authorization
    • Kerberos is usually used
      > Principals have three components: primary, instance, realm
      > Kerberos has to be learned in its own right, and the learning curve is steep
      > So set it up together with the system administrators/operators
      > On the Hadoop side, by contrast, only the configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml) need to change
    name | value
    hadoop.security.authentication | kerberos
    hadoop.security.authorization | true
    dfs.namenode.keytab.file, dfs.datanode.keytab.file | /etc/hadoop/conf/hdfs.keytab
    dfs.block.access.token.enable | true
    dfs.namenode.kerberos.principal, dfs.datanode.kerberos.principal | hdfs/_HOST@MYREALM.MYCOMPANY.COM
    dfs.namenode.kerberos.https.principal, dfs.datanode.kerberos.https.principal | host/_HOST@MYREALM.MYCOMPANY.COM
    dfs.https.address | 0.0.0.0
    dfs.datanode.http.address | 0.0.0.0:1006
    dfs.https.port | 50470
    dfs.datanode.address | 0.0.0.0:1004
    dfs.datanode.data.dir.perm | 0700
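The principals in the table use `_HOST`. Hadoop replaces it with the lower-cased, fully qualified domain name of the host the daemon runs on, so one configuration file can serve every node. The Python sketch below mimics that substitution and the `primary/instance@REALM` split. It is an illustration, not Hadoop's `SecurityUtil`, and the host names are hypothetical.

```python
def split_principal(principal: str):
    """'primary/instance@REALM' -> (primary, instance, realm); parts may be empty."""
    name, _, realm = principal.partition("@")
    primary, _, instance = name.partition("/")
    return primary, instance, realm


def server_principal(principal: str, fqdn: str) -> str:
    """Replace an instance of _HOST with the lower-cased host name."""
    primary, instance, realm = split_principal(principal)
    if instance == "_HOST":
        instance = fqdn.lower()
    name = f"{primary}/{instance}" if instance else primary
    return f"{name}@{realm}" if realm else name


if __name__ == "__main__":
    print(server_principal("hdfs/_HOST@MYREALM.MYCOMPANY.COM", "DN1.mycompany.com"))
    # hdfs/dn1.mycompany.com@MYREALM.MYCOMPANY.COM
```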
  • 43.
    Hadoop Security Planning
    • Permissions, authentication and authorization must also be configured for the other ecosystem components
    • Either secure or not!
      > Data is precious, so the cluster belongs at the top security level, even inside an ordinary data center
      > Operate it inside the company's SOC (Security Operations Center)
        – An island: no Internet access, not even mobile phones
        – It must therefore be separated from the development cluster
        – A node that Hadoop clients can connect to is required
    • Otherwise, follow a complete security guide
      > But the higher the security level, the lower the performance
      > Data encryption
        – Should decryption happen in Hadoop, or on external servers?
        – It is usually not done in Hadoop → decryption itself causes severe I/O overhead
  • 44.
    CONTENTS: 7. Hadoop Resource Management
  • 45.
    HDFS Quotas
    • Assign quotas to users by using the directory tree structure
      > Set a 10 GB (10737418240-byte) space quota on /user/joel and check it:
      # hadoop dfsadmin -setSpaceQuota 10737418240 /user/joel
      # hadoop fs -count -q /user/joel
    /
      data/              # no quota
        user-activity/
        syslog/
        purchase/
      group/
        ps/              # 100 TB quota
      user/              # 1 TB quota per user
        joel/
        ryan/
        simon/
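Space quotas are charged for replicated bytes. At replication factor 3, the 10 GB quota on `/user/joel` therefore holds only about a third of that in file data. Below is a simplified Python sketch of the accounting. In practice HDFS charges a full block times the replication factor while each block is being written, so a write can be refused slightly earlier than this arithmetic suggests.

```python
GiB = 1024 ** 3


def raw_usage(file_bytes: int, replication: int = 3) -> int:
    """Bytes charged against a space quota: every replica counts."""
    return file_bytes * replication


def fits_space_quota(quota_bytes: int, used_raw_bytes: int,
                     new_file_bytes: int, replication: int = 3) -> bool:
    return used_raw_bytes + raw_usage(new_file_bytes, replication) <= quota_bytes


if __name__ == "__main__":
    quota = 10737418240                         # value passed to -setSpaceQuota (10 GiB)
    print(round(quota / 3 / GiB, 2))            # 3.33 GiB of file data at replication 3
    print(fits_space_quota(quota, 0, 3 * GiB))  # True  (9 GiB charged)
    print(fits_space_quota(quota, 0, 4 * GiB))  # False (12 GiB charged)
```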
  • 46.
    MapReduce Schedulers
    • FIFO scheduler
      > First come, first served
      > One job can monopolize the cluster
      > Supports priorities: VERY_LOW < LOW < NORMAL < HIGH < VERY_HIGH
      > Suitable only for small, experimental or development clusters
    • Fair scheduler
      > Each submitted job is placed in one of the pools
      > A pool's number of task slots is determined by the cluster's total slot capacity, the other pools' current demand, guaranteed minimum slots, and available slot capacity
    • Capacity scheduler
      > Configures multiple queues that share the cluster's total slots
      > A queue's capacity is reserved only while that queue has demand
  • 47.
    Fair Scheduler Algorithm (total slot capacity: 80 in every case)
    Case 1
      Pool | Demand | Min share | Actual share
      Joel | 20 | 0 | 20
      Ryan | 40 | 0 | 40
    Case 2
      Pool | Demand | Min share | Actual share
      Joel | 20 | 0 | 20
      Ryan | 40 | 0 | 30
      Simon | 120 | 0 | 30
    Case 3
      Pool | Demand | Min share | Actual share
      Joel | 40 | 0 | 25
      Ryan | 30 | 0 | 25
      Simon | 30 | 50 | 30
    Case 4
      Pool | Demand | Min share | Actual share
      Joel | 40 | 0 | 15
      Ryan | 30 | 0 | 15
      Simon | 60 | 50 | 50
    Case 5 (weights)
      Pool | Demand | Weight | Actual share
      Joel | 80 | 1 | 26
      Ryan | 60 | 2 | 53
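All five tables are consistent with a max-min fair-share computation. It finds one ratio R such that each pool receives min(demand, max(minimum share, R × weight)) and the shares add up to the slot capacity. The Python sketch below re-implements that idea independently to check the tables. It is not the Fair Scheduler's own code, and it assumes the minimum shares fit within the capacity. For the weighted case it yields 26.67 and 53.33 slots; the slide rounds these down to 26 and 53.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    demand: int           # slots the pool's jobs could use right now
    min_share: int = 0    # guaranteed minimum slots
    weight: float = 1.0


def slots_at(ratio, pools):
    """Slots handed out if every pool gets ratio * weight, raised to at least
    its min_share but never more than its demand."""
    return sum(min(p.demand, max(p.min_share, ratio * p.weight)) for p in pools)


def fair_shares(pools, capacity):
    """Max-min fair shares; assumes the minimum shares fit within capacity."""
    if sum(p.demand for p in pools) <= capacity:
        return {p.name: float(p.demand) for p in pools}  # every demand fits
    hi = 1.0
    while slots_at(hi, pools) < capacity:  # grow an upper bound for the ratio
        hi *= 2
    lo = 0.0
    for _ in range(100):                   # bisect for the ratio that fills capacity
        mid = (lo + hi) / 2
        if slots_at(mid, pools) < capacity:
            lo = mid
        else:
            hi = mid
    return {p.name: min(p.demand, max(p.min_share, hi * p.weight)) for p in pools}


if __name__ == "__main__":
    pools = [Pool("Joel", 40), Pool("Ryan", 30), Pool("Simon", 60, min_share=50)]
    print({name: round(s) for name, s in fair_shares(pools, 80).items()})
    # {'Joel': 15, 'Ryan': 15, 'Simon': 50}
```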
  • 48.
  • 49.
    CONTENTS: 8. Hadoop Cluster Management
  • 50.
    Managing Hadoop Processes
    • Every Hadoop daemon is a Java process
    • If the daemons run under a regular user account
      > # jps
    • If they run as services, i.e. as root
      > # ps -elf | grep java
      > # ps -elf | grep -i namenode …
    • Starting and stopping Hadoop processes
      > # start-all.sh      // start NameNode, Secondary NameNode, DataNodes, JobTracker, TaskTrackers
      > # stop-all.sh       // stop NameNode, Secondary NameNode, DataNodes, JobTracker, TaskTrackers
      > # start-dfs.sh      // start NameNode, Secondary NameNode, DataNodes
      > # stop-dfs.sh       // stop NameNode, Secondary NameNode, DataNodes
      > # start-mapred.sh   // start JobTracker, TaskTrackers
      > # stop-mapred.sh    // stop JobTracker, TaskTrackers
      > # hadoop-daemon.sh  // start or stop a single daemon on one node (e.g. hadoop-daemon.sh start datanode)
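When the daemons run under a regular account, `jps` prints one `<pid> <main class>` line per JVM. That output is easy to check against what a node should be running. Below is a small Python sketch. The role-to-daemon mapping in `EXPECTED` is hypothetical and should match your own layout. If `jps` cannot see the daemons (for example when they run as another user), fall back to `ps` as shown on the slide.

```python
import shutil
import subprocess

# Hypothetical roles: which Hadoop 1.x daemons each kind of node should run.
EXPECTED = {
    "master": {"NameNode", "JobTracker"},
    "checkpoint": {"SecondaryNameNode"},
    "worker": {"DataNode", "TaskTracker"},
}


def running_daemons(jps_output: str) -> set:
    """jps prints '<pid> <main class>' per JVM; collect the class names."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str, role: str) -> list:
    return sorted(EXPECTED[role] - running_daemons(jps_output))


if __name__ == "__main__" and shutil.which("jps"):
    output = subprocess.run(["jps"], capture_output=True, text=True).stdout
    print("missing:", missing_daemons(output, "worker"))
```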
  • 51.
  • 52.
    CONTENTS: 9. Hadoop Monitoring, Backup and Recovery
  • 53.
    Monitoring
    • Performance monitoring
      > JVM, dfs, mapred metrics
      > Job status, failed jobs, number of tasks
    • Health monitoring
      > CPU, memory, disk, network traffic
      > Hadoop daemons, Hadoop logs
    • Commercial tools
      > Quite expensive
    • Common open-source stack
      > Ganglia + Nagios + Ambari (Hadoop ecosystem)
  • 54.
    Backup and Recovery
    • Data backup
      > Distributed copy (distcp)
      > Duplicate data at ingestion time, when collecting in parallel
        – Use Apache Flume to write directly through HDFS sinks
    • Metadata backup
      > Call the /getimage servlet of the NameNode's built-in web server
      > getimage=1 fetches fsimage
        # curl -o fsimage.201309 'http://centos1:50070/getimage?getimage=1'
      > getedit=1 fetches edits
        # curl -o edits.201309 'http://centos1:50070/getimage?getedit=1'
    Diagram: data source → Flume → one sink into Hadoop Cluster 1 and one into Hadoop Cluster 2 (stored simultaneously)
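The two curl calls can be wrapped in a small script, for example for a nightly cron job on a backup host. This Python sketch assumes the Hadoop 1.x `/getimage` servlet and web UI port 50070 shown above. The function names and the YYYYMM file-name stamp are hypothetical choices.

```python
import datetime
import shutil
import urllib.request
from typing import List, Optional, Tuple


def metadata_backup_targets(namenode: str, http_port: int = 50070,
                            stamp: Optional[str] = None) -> List[Tuple[str, str]]:
    """(URL, local file) pairs for the Hadoop 1.x NameNode /getimage servlet."""
    stamp = stamp or datetime.date.today().strftime("%Y%m")  # e.g. 201309
    base = f"http://{namenode}:{http_port}/getimage"
    return [(f"{base}?getimage=1", f"fsimage.{stamp}"),
            (f"{base}?getedit=1", f"edits.{stamp}")]


def download(targets: List[Tuple[str, str]]) -> None:
    """Stream each servlet response to a local file."""
    for url, filename in targets:
        with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
            shutil.copyfileobj(response, out)
```

On a host that can reach the NameNode, `download(metadata_backup_targets("centos1"))` fetches both files.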
  • 55.
    Distributed Copy (distcp)
    • Basic usage (hdfs:// URLs use the NameNode RPC port)
      # hadoop distcp hdfs://centos1:9000/path/one hdfs://remote:9000/path/two
      > -m: number of mappers
      > -overwrite: overwrite existing files
      > -update: copy only files that are missing or changed at the destination
      > -delete: delete files that exist at the destination but not in the source
    • Clusters running the same version
      > hdfs://
    • Clusters running different versions
      > webhdfs:// (also served by an HttpFS gateway)
    • Amazon S3
      > s3://
  • 56.
    CONTENTS: 10. Hadoop NG; Glance at YARN
  • 57.
    Figure: Hadoop 2.0 and Aster 6.0, key capabilities: Graph, BSP, ADFS
  • 58.
    YARN
    • ResourceManager
      > Takes over the JobTracker's resource management
      > Monitors the cluster
    • NodeManager
      > Plays the TaskTracker's role – manages map and reduce work
      > Actually runs the MapReduce tasks
    • ApplicationMaster
      > One per application, so there is no longer a single JobTracker managing every job

Editor's Notes

  • #7 Velocity: batch, near real-time, real-time, stream. Volume: terabytes, records, transactions, tables, files. Variety: structured, semi-structured and unstructured; all data. Complexity: IT convergence.
  • #8 Map and reduce come from LISP; they are the programming model of functional programming. MapReduce can therefore be written in any language but suits functional languages best; Java's version is a variant. Map + Reduce → described in a paper Google published after its GFS paper; Doug Cutting implemented it in Java.
  • #14 Personally, I think this may be the worst name ever chosen: many people believe the cluster keeps running on the Secondary NameNode when the NameNode dies. It does not.
  • #16 I analyzed the source starting with Hadoop 0.15 (HDFS in the early versions) and the MapReduce code of Hadoop 0.19, but keeping up with every version was very hard.
  • #17 The fsimage and edits files are critical: fsimage is a snapshot of the filesystem metadata, and edits holds the accumulated changes to that metadata. Because edits is a write-ahead log (WAL) that is only ever appended to, it puts little load on I/O and avoids performance-degrading seeks. But it keeps growing and has to be merged into fsimage, and the NameNode would have to do that while still serving client requests. This is why the Secondary NameNode is needed.
    1. The Secondary NameNode tells the NameNode to set the edits file aside and write new log entries to edits.new.
    2. The Secondary NameNode copies the NameNode's fsimage and edits files to its local checkpoint directory.
    3. The Secondary NameNode loads fsimage, applies edits from the beginning to build a new image, and saves the new fsimage to disk.
    4. The Secondary NameNode sends the new fsimage to the NameNode and tells it to start using it.
    5. The NameNode renames edits.new to edits.
  • #18 Linux-HA is not used because it suits stateless services that serve static content, not stateful systems such as the NameNode. It also relies on a virtual IP, which makes it a poor fit.
  • #31 Starting a process or calling fork() duplicates the whole page table. In other words, the child gets a copy of the parent's memory (copy-on-write on modern kernels, but still charged against the kernel's commit accounting), so up to twice the memory must be available. If the child then calls exec(), the time spent copying the parent's memory is wasted as well. Because fork() is called so often, vfork() was introduced: it does not copy the parent's memory, and the child shares it until it calls exec(). Note that the HotSpot JVM implements Java's process launching with fork() rather than vfork(), which is where the problem lies. Why does this matter for Hadoop? Hadoop Streaming (the library that lets programs in other languages run MapReduce jobs through standard input and output) launches user code as a child process and exchanges data with it through pipes. Each child task therefore needs extra memory. Process startup also wastes time and, as expected, memory use doubles. For these reasons vm.overcommit_memory must be chosen deliberately and vm.overcommit_ratio set accurately; note that the kernel applies vm.overcommit_ratio only when vm.overcommit_memory is 2.
  • #33 48 x 12 = 576
  • #37
    • hadoop-env.sh: Hadoop's scripts are written in Bourne shell. This file sets the environment variables Hadoop needs, such as the JDK location, JVM options for the daemons, and the PID file and log directories. These variables are explained on page 120, "Environment Variables and Shell Scripts".
    • core-site.xml: XML file with parameters for all Hadoop daemons and clients.
    • hdfs-site.xml: XML file with parameters for the HDFS daemons and clients.
    • mapred-site.xml: XML file with parameters for the MapReduce daemons and clients.
    • log4j.properties: Java properties file containing all logging settings. These are explained on page 123, "Logging Configuration".
    • masters (optional): list of machines that run the Secondary NameNode, one per line; used by the start-*.sh helper scripts.
    • slaves (optional): list of machines that run a DataNode/TaskTracker, one per line; used by the start-*.sh helper scripts.
    • fair-scheduler.xml (optional): defines the resource pools and settings of the Fair Scheduler, one of MapReduce's task scheduler plugins.
    • capacity-scheduler.xml (optional): defines the queues and settings of the Capacity Scheduler, another MapReduce task scheduler plugin.
    • dfs.include (optional, conventional name): list of machines allowed to connect to the NameNode, one per line.
    • dfs.exclude (optional, conventional name): list of machines not allowed to connect to the NameNode, one per line.
    • hadoop-policy.xml: XML file listing the users or groups allowed to call specific RPC functions when communicating with Hadoop.
    • mapred-queue-acls.xml: XML file listing the users or groups allowed to submit jobs to MapReduce job queues.
    • taskcontroller.cfg: Java properties-style file with the values needed by the setuid task controller, a MapReduce helper program, when running in secure mode.
  • #40 With 12 CPU cores, multiply by 1.5 → 18 slots: 2/3 for map tasks, 1/3 for reduce tasks.