Building the Right Platform Architecture for Hadoop
ROMMEL GARCIA
HORTONWORKS
#whoami
▸ Sr. Solutions Engineer & Platform Architect @hortonworks
▸ Global Security SME Lead @hortonworks
▸ Author of “Virtualizing Hadoop: How to Install, Deploy,
and Optimize Hadoop in a Virtualized Architecture”
▸ Runs Atlanta Hadoop User Group
Overview
The Elephant keeps getting bigger
THE HADOOP ECOSYSTEM
THE TWO FACES OF HADOOP
Hadoop Cluster
▸ Master Nodes: manage cluster state and data & job state
▸ Worker Nodes: manage CPU, RAM, I/O, and network resources
▸ Run with YARN
PLATFORM FOCUS
▸ Performance (SLA)
▸ Scale (Bursts)
▸ Speed (RT)
▸ Throughput (data/time, compute/time)
▸ Resiliency (HA)
▸ Graceful Degradation (Throttling/Failure Mgt.)
Network
It’s not Pyramidal
NETWORK IS EVERYTHING!
[Diagram: compass (N/S/E/W), cluster traffic flows in every direction]
Workloads
Your workload no longer affects me
HADOOP AND THE WORLD OF WORKLOADS
[Diagram: layered platform view]
▸ Workloads (YARN): OLAP, OLTP, Data Science, Streaming
▸ Data (HDFS): structured and unstructured
▸ Hadoop Cluster
▸ Deployment
HADOOP WORKLOAD MANAGEMENT
[Diagram: YARN maps queues and their containers (C1, C2, C3, ..., CN) across the Hadoop cluster onto physical memory/CPU]
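As a hedged illustration of that queue-to-resource mapping, a capacity-scheduler.xml fragment could dedicate one YARN queue per workload class; the queue names and percentages below are assumptions, not defaults.

# These <property> elements belong inside the <configuration> block of
# capacity-scheduler.xml (often managed through Ambari); queue names and
# capacities are illustrative only and must sum to 100.
cat > /tmp/capacity-queues-fragment.xml <<'EOF'
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>olap,oltp,datascience,streaming</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.olap.capacity</name>
  <value>40</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.oltp.capacity</name>
  <value>10</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.datascience.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.streaming.capacity</name>
  <value>20</value>
</property>
EOF
# After merging the fragment into the live config:
yarn rmadmin -refreshQueues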
DEFAULT GRID SETTINGS FOR ALL WORKLOADS
▸ ext4 or XFS for Worker Nodes ENFORCED
▸ Transparent Huge Pages (THP) Compaction OFF
▸ Masters’ Swappiness = 1, Workers’ Swap turned OFF
▸ Jumbo Frames ENABLED
▸ IO Scheduling “deadline” ENABLED
▸ Limiting “processes” and “files” ENFORCED
▸ Name Service Cache ENABLED
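A minimal shell sketch of applying these defaults, assuming a RHEL/CentOS-style host; the device names (bond0, sdb), service accounts, and limit values are placeholders to adapt per node role.

# Transparent Huge Page compaction off (sysfs path varies by kernel/distro).
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Masters keep minimal swappiness; workers drop swap entirely.
sysctl -w vm.swappiness=1   # master nodes
swapoff -a                  # worker nodes

# Jumbo frames on the bonded interface (switch ports must match the MTU).
ip link set dev bond0 mtu 9000

# Deadline I/O scheduler on each data disk.
echo deadline > /sys/block/sdb/queue/scheduler

# Raise process and open-file limits for the Hadoop service accounts.
cat >> /etc/security/limits.conf <<'EOF'
hdfs - nofile 128000
hdfs - nproc  65536
yarn - nofile 128000
yarn - nproc  65536
EOF

# Name service caching (on systemd-based hosts).
systemctl enable --now nscd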
OLAP
Give back my precious SQL!
HIVE/SPARK SQL WORKLOAD BEHAVIOR
▸ Hive/Spark SQL workload, depends on YARN
▸ Near-realtime to Batch SLA
▸ Large data set; consumes lots of memory with fair CPU usage
▸ Hundreds to hundreds of thousands of analytical jobs daily
▸ Typically Memory bound first, then I/O
HIVE/SPARK SQL WORKLOAD PARALLELISM
▸ Hive
▸ Hive auto-parallelism ENABLED
▸ Hive on Tez ENABLED
▸ Reuse Tez Session ENABLED
▸ ORC for Hive ENFORCED
▸ Spark
▸ Repartition Spark SQL RDD ENFORCED
▸ 2GB to 4GB YARN container size
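A hedged sketch of the Hive side of these settings as session properties plus ORC storage; the JDBC host and table names are placeholders, and property names should be verified against your Hive version. The Spark side (repartitioning Spark SQL RDDs) is done in application code rather than configuration.

# Hive-on-Tez session settings plus ORC storage; container size is in MB (2-4 GB).
beeline -u jdbc:hive2://hs2-host:10000 -e "
SET hive.execution.engine=tez;
SET hive.exec.parallel=true;
SET hive.tez.auto.reducer.parallelism=true;
SET hive.tez.container.size=4096;
CREATE TABLE sales_orc STORED AS ORC AS SELECT * FROM sales_staging;
"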
HIVE/SPARK SQL WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 6 CPU Cores
▸ 128 GB RAM
▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 8 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12 x 4TB SATA/SAS/NL-SAS
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Batch - S3/Blob/ADLS
▸ Interactive - EBS/Premium
▸ Near-realtime - Local/Local
▸ vm types (AWS/Azure)
▸ >= m4.4xlarge (EBS) / >= A9 (Blob/Premium)
▸ >= i2.4xlarge (Local) / >= DS14 (Premium/Local)
▸ >= d2.2xlarge (Local) / >= DS14_v2 (Premium/Local)
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 48 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node
▸ 6 vCPU Cores
▸ 32 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe NIC Bonded
▸ Storage (data)
▸ SAN/NAS
▸ Appliance
▸ NetApp
▸ Isilon
▸ vBlock
OLTP
Come on, OLTP is fun!
HBASE/SOLR WORKLOAD BEHAVIOR
▸ HBase & Solr workloads
▸ Realtime, sub-second SLA
▸ Large data set
▸ Millions to Billions of hits per day
▸ HBase
▸ HBase for GET/PUT/SCAN
▸ Typically IO bound first, then memory for HBase
▸ Solr
▸ Solr for Search
▸ Typically CPU bound first, then memory for Solr
HBASE WORKLOAD PARALLELISM
▸ Use HBase Java API/Phoenix, not ThriftServer
▸ HBase “short circuit” reads ENABLED
▸ HBase BucketCache ENABLED
▸ Manage HBase Compaction for block locality
▸ HBase Read HA ENABLED
▸ Properly design “rowkey” and ColumnFamily
▸ Properly design RPC Handlers
▸ Minimize GC pauses
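A sketch of several of these knobs as an hbase-site.xml fragment; the values are placeholders to size against your own heap and off-heap budget, and short-circuit reads also require matching HDFS-side settings. Read HA (timeline-consistent region replicas) is shown against a hypothetical 'events' table.

# Merge these properties into the live hbase-site.xml rather than appending.
cat > /tmp/hbase-tuning-fragment.xml <<'EOF'
<property><name>dfs.client.read.shortcircuit</name><value>true</value></property>
<property><name>hbase.bucketcache.ioengine</name><value>offheap</value></property>
<property><name>hbase.bucketcache.size</name><value>16384</value></property>
<property><name>hbase.regionserver.handler.count</name><value>60</value></property>
EOF

# Enable read replicas per table, e.g. for a hypothetical 'events' table:
echo "alter 'events', {REGION_REPLICATION => 2}" | hbase shell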
HBASE WORKLOAD DEPLOYMENT MODEL
Bare Metal
Master Node (ZK, HBase Master)
▸ 2 x 6 CPU Cores
▸ 128 GB RAM
▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 8-12 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12-24 x 4TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local
▸ vm types (AWS/Azure)
▸ >= d2.4xlarge / >= D14_v2
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 48 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 8-12 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12-24 x 4TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10Gbe NIC Bonded
SOLR WORKLOAD PARALLELISM
▸ # of shards drives I/O operations, which are bound by disk speed
▸ # of threads drives indexing speed
▸ Properly manage RAM Buffer Size
▸ Merge Factor = 25 (indexing), 2 (search)
▸ Use large disk cache
▸ Minimize GC pauses
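An illustrative <indexConfig> fragment for solrconfig.xml matching the guidance above; element names are those used in Solr 5.x-era configs, and the RAM buffer size is an assumption to tune against available heap.

cat > /tmp/solr-indexconfig-fragment.xml <<'EOF'
<indexConfig>
  <ramBufferSizeMB>512</ramBufferSizeMB>  <!-- larger buffer means fewer flushes while indexing -->
  <mergeFactor>25</mergeFactor>           <!-- indexing-heavy collections; move toward 2 when search-heavy -->
</indexConfig>
EOF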
SOLR WORKLOAD DEPLOYMENT MODEL
Bare Metal
Master Node (ZK)
▸ 2 x 4 CPU Cores
▸ 32 GB RAM
▸ 2 x 1TB SSD (OS) RAID 1
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 10 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD (OS) RAID 1
▸ 12 x 2 - 4TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local
▸ vm types (AWS/Azure)
▸ >= d2.4xlarge / >= D14_v2
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 48 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 10 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12 x 2 - 4TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10Gbe NIC Bonded
Data Science
But Captain, one does not simply warp into Mordor
DATA SCIENCE WORKLOAD BEHAVIOR
▸ Spark ML workload, depends on YARN
▸ Near-realtime to Batch SLA
▸ Runs “Monte Carlo” type of processing
▸ Hundreds to hundreds of thousands of analytical jobs daily
▸ CPU bound first, then memory
SPARK ML WORKLOAD PARALLELISM
▸ YARN ENABLED
▸ Cache data
▸ Serialized/Raw for fast access/processing
▸ Off-heap, slower processing
▸ Use Checkpointing
▸ Use Broadcast Variables to minimize network traffic
▸ Minimize GC by limiting object creation, use Builders
▸ executor-memory = between 8GB and 64GB
▸ executor-cores = between 2 and 4
▸ num-executors = with caching, size so total app memory is roughly 2x the data size
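A hedged spark-submit shape for the sizing rules above; the application class, jar, and executor count are placeholders, and Kryo serialization backs the "serialized/raw caching" bullet.

# Sizes follow the 8-64 GB per executor / 2-4 cores guidance; pick
# num-executors so total executor memory is about 2x the cached data set.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 16G \
  --executor-cores 3 \
  --num-executors 250 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --class com.example.ChurnModel churn-model.jar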
SPARK ML WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 6 CPU Cores
▸ 64 GB RAM
▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 12 CPU Cores
▸ 256 - 512 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12 x 4TB SATA/SAS/NL-SAS
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local
▸ vm types (AWS/Azure)
▸ >= d2.4xlarge / >= D14_v2
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 48 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 8-12 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12-24 x 4TB SATA/SAS/NL-SAS
▸ 2 x 10Gbe NIC Bonded
Streaming
Stream me in, Scotty!
NIFI/STORM WORKLOAD BEHAVIOR
▸ NiFi & Storm Streaming workloads
▸ Always “ON” data ingest and processing
▸ Guarantees data delivery
▸ Highly distributed
▸ simple event processing -> complex event processing
NIFI WORKLOAD PARALLELISM
▸ Network bound first, then memory or CPU
▸ Go granular on NiFi processors
▸ nifi.bored.yield.duration=10
▸ RAID 10 for repositories: Content, FlowFile, and Provenance
▸ nifi.queue.swap.threshold=20000
▸ No sharing of disk across repositories, use RAID 10
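The corresponding keys in conf/nifi.properties, as a fragment to merge in; the three mount points are placeholders for separate RAID 10 volumes per repository, and the bored-yield value takes a time unit in current NiFi releases.

cat > /tmp/nifi-repo-tuning.properties <<'EOF'
nifi.bored.yield.duration=10 millis
nifi.queue.swap.threshold=20000
nifi.flowfile.repository.directory=/data1/nifi/flowfile_repository
nifi.content.repository.directory.default=/data2/nifi/content_repository
nifi.provenance.repository.directory.default=/data3/nifi/provenance_repository
EOF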
NIFI WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 4 CPU Cores
▸ 32 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 8 CPU Cores
▸ 128 GB RAM
▸ 2 x 1TB SSD (OS) RAID 1
▸ 6 x 1TB SATA/SAS/NL-SAS RAID 10
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local/EBS/Premium
▸ vm types (AWS/Azure)
▸ >= m4.4xlarge / >= A9
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 16 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 8 CPU Cores
▸ 128 GB RAM
▸ 2 x 1TB SSD (OS) RAID 1
▸ 6 x 1TB SATA/SAS/NL-SAS RAID 10
▸ 2 x 10Gbe NIC Bonded
STORM WORKLOAD PARALLELISM
▸ CPU bound first, then memory
▸ Keep tuple processing light so execute() returns quickly
▸ Use Google’s Guava for bolt-local caches (mem <1GB)
▸ Shared caches across bolts: HBase, Phoenix, Redis, or MemcacheD
▸ # of workers scales with available memory
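One way to size workers against memory at submit time is shown in the sketch below; the jar, topology class, topology name, worker count, and heap size are placeholders, and the -c overrides are just one common way to set per-topology config without touching code.

# Worker count and worker heap expressed as topology config overrides.
storm jar click-topology.jar com.example.ClickTopology prod-clicks \
  -c topology.workers=8 \
  -c topology.worker.childopts="-Xmx2g"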
STORM WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 4 CPU Cores
▸ 32 GB RAM
▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 12 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local/EBS/Premium
▸ vm types (AWS/Azure)
▸ >= r3.4xlarge / >= A9
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 24 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 12 CPU Cores
▸ 256 GB RAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe NIC Bonded
http://hortonworks.com/products/sandbox/
Questions?
THANK YOU!
@rommelgarcia
/in/rommelgarcia
#hortonworks
