Building the Right Platform Architecture for Hadoop
ROMMEL GARCIA
HORTONWORKS
#whoami
▸ Sr. Solutions Engineer & Platform Architect @hortonworks
▸ Global Security SME Lead @hortonworks
▸ Author of “Virtualizing Hadoop: How to Install, Deploy,
and Optimize Hadoop in a Virtualized Architecture”
▸ Runs Atlanta Hadoop User Group
Overview
The Elephant keeps getting bigger
THE HADOOP ECOSYSTEM
THE TWO FACES OF HADOOP
Hadoop Cluster
▸ Master Nodes: manage cluster state and data & job state
▸ Worker Nodes: manage CPU, RAM, I/O, and network resources
▸ Run with YARN
PLATFORM FOCUS
▸ Performance (SLA)
▸ Scale (Bursts)
▸ Speed (RT)
▸ Throughput (data/time, compute/time)
▸ Resiliency (HA)
▸ Graceful Degradation (Throttling/Failure Mgt.)
Network
It’s not Pyramidal
NETWORK IS EVERYTHING!
[Diagram: compass (N/S/E/W), cluster traffic flows in every direction]
Workloads
Your workload no longer affects me
HADOOP AND THE WORLD OF WORKLOADS
[Diagram: layered platform view]
▸ Workloads (YARN): OLAP, OLTP, Data Science, Streaming
▸ Data (HDFS): structured and unstructured
▸ Hadoop Cluster
▸ Deployment
HADOOP WORKLOAD MANAGEMENT
[Diagram: YARN maps queues and their containers (C1, C2, C3, ..., CN) across the Hadoop cluster onto physical memory/CPU]
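As a hedged illustration of that queue-to-resource mapping, a capacity-scheduler.xml fragment could dedicate one YARN queue per workload class; the queue names and percentages below are assumptions, not defaults.

# These <property> elements belong inside the <configuration> block of
# capacity-scheduler.xml (often managed through Ambari); queue names and
# capacities are illustrative only and must sum to 100.
cat > /tmp/capacity-queues-fragment.xml <<'EOF'
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>olap,oltp,datascience,streaming</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.olap.capacity</name>
  <value>40</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.oltp.capacity</name>
  <value>10</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.datascience.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.streaming.capacity</name>
  <value>20</value>
</property>
EOF
# After merging the fragment into the live config:
yarn rmadmin -refreshQueues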
DEFAULT GRID SETTINGS FOR ALL WORKLOADS
▸ ext4 or XFS for Worker Nodes ENFORCED
▸ Transparent Huge Pages (THP) Compaction OFF
▸ Masters’ Swappiness = 1, Workers’ Swap turned OFF
▸ Jumbo Frames ENABLED
▸ IO Scheduling “deadline” ENABLED
▸ Limiting “processes” and “files” ENFORCED
▸ Name Service Cache ENABLED
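A minimal shell sketch of applying these defaults, assuming a RHEL/CentOS-style host; the device names (bond0, sdb), service accounts, and limit values are placeholders to adapt per node role.

# Transparent Huge Page compaction off (sysfs path varies by kernel/distro).
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Masters keep minimal swappiness; workers drop swap entirely.
sysctl -w vm.swappiness=1   # master nodes
swapoff -a                  # worker nodes

# Jumbo frames on the bonded interface (switch ports must match the MTU).
ip link set dev bond0 mtu 9000

# Deadline I/O scheduler on each data disk.
echo deadline > /sys/block/sdb/queue/scheduler

# Raise process and open-file limits for the Hadoop service accounts.
cat >> /etc/security/limits.conf <<'EOF'
hdfs - nofile 128000
hdfs - nproc  65536
yarn - nofile 128000
yarn - nproc  65536
EOF

# Name service caching (on systemd-based hosts).
systemctl enable --now nscd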
OLAP
Give back my precious SQL!
HIVE/SPARK SQL WORKLOAD BEHAVIOR
▸ Hive/Spark SQL workload, depends on YARN
▸ Near-realtime to Batch SLA
▸ Large data set; consumes lots of memory with fair CPU usage
▸ Hundreds to hundreds of thousands of analytical jobs daily
▸ Typically Memory bound first, then I/O
HIVE/SPARK SQL WORKLOAD PARALLELISM
▸ Hive
▸ Hive auto-parallelism ENABLED
▸ Hive on Tez ENABLED
▸ Reuse Tez Session ENABLED
▸ ORC for Hive ENFORCED
▸ Spark
▸ Repartition Spark SQL RDD ENFORCED
▸ 2GB to 4GB YARN container size
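A hedged sketch of the Hive side of these settings as session properties plus ORC storage; the JDBC host and table names are placeholders, and property names should be verified against your Hive version. The Spark side (repartitioning Spark SQL RDDs) is done in application code rather than configuration.

# Hive-on-Tez session settings plus ORC storage; container size is in MB (2-4 GB).
beeline -u jdbc:hive2://hs2-host:10000 -e "
SET hive.execution.engine=tez;
SET hive.exec.parallel=true;
SET hive.tez.auto.reducer.parallelism=true;
SET hive.tez.container.size=4096;
CREATE TABLE sales_orc STORED AS ORC AS SELECT * FROM sales_staging;
"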
HIVE/SPARK SQL WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 6 CPU Cores
▸ 128 GB RAM
▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 8 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12 x 4TB SATA/SAS/NL-SAS
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Batch - S3/Blob/ADLS
▸ Interactive - EBS/Premium
▸ Near-realtime - Local/Local
▸ vm types (AWS/Azure)
▸ >= m4.4xlarge (EBS) / >= A9 (Blob/Premium)
▸ >= i2.4xlarge (Local) / >= DS14 (Premium/Local)
▸ >= d2.2xlarge (Local) / >= DS14_v2 (Premium/Local)
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 48 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node
▸ 6 vCPU Cores
▸ 32 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe NIC Bonded
▸ Storage (data)
▸ SAN/NAS
▸ Appliance
▸ NetApp
▸ Isilon
▸ vBlock
OLTP
Come on, OLTP is fun!
HBASE/SOLR WORKLOAD BEHAVIOR
▸ HBase & Solr workloads
▸ Realtime, sub-second SLA
▸ Large data set
▸ Millions to Billions of hits per day
▸ HBase
▸ HBase for GET/PUT/SCAN
▸ Typically IO bound first, then memory for HBase
▸ Solr
▸ Solr for Search
▸ Typically CPU bound first, then memory for Solr
HBASE WORKLOAD PARALLELISM
▸ Use HBase Java API/Phoenix, not ThriftServer
▸ HBase “short circuit” reads ENABLED
▸ HBase BucketCache ENABLED
▸ Manage HBase Compaction for block locality
▸ HBase Read HA ENABLED
▸ Properly design “rowkey” and ColumnFamily
▸ Properly design RPC Handlers
▸ Minimize GC pauses
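A sketch of several of these knobs as an hbase-site.xml fragment; the values are placeholders to size against your own heap and off-heap budget, and short-circuit reads also require matching HDFS-side settings. Read HA (timeline-consistent region replicas) is shown against a hypothetical 'events' table.

# Merge these properties into the live hbase-site.xml rather than appending.
cat > /tmp/hbase-tuning-fragment.xml <<'EOF'
<property><name>dfs.client.read.shortcircuit</name><value>true</value></property>
<property><name>hbase.bucketcache.ioengine</name><value>offheap</value></property>
<property><name>hbase.bucketcache.size</name><value>16384</value></property>
<property><name>hbase.regionserver.handler.count</name><value>60</value></property>
EOF

# Enable read replicas per table, e.g. for a hypothetical 'events' table:
echo "alter 'events', {REGION_REPLICATION => 2}" | hbase shell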
HBASE WORKLOAD DEPLOYMENT MODEL
Bare Metal
Master Node (ZK, HBase Master)
▸ 2 x 6 CPU Cores
▸ 128 GB RAM
▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 8-12 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12-24 x 4TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local
▸ vm types (AWS/Azure)
▸ >= d2.4xlarge / >= D14_v2
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 48 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 8-12 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12-24 x 4TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10Gbe NIC Bonded
SOLR WORKLOAD PARALLELISM
▸ # of shards drives I/O operations, which are bound by disk speed
▸ # of threads drives indexing speed
▸ Properly manage RAM Buffer Size
▸ Merge Factor = 25 (indexing), 2 (search)
▸ Use large disk cache
▸ Minimize GC pauses
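An illustrative <indexConfig> fragment for solrconfig.xml matching the guidance above; element names are those used in Solr 5.x-era configs, and the RAM buffer size is an assumption to tune against available heap.

cat > /tmp/solr-indexconfig-fragment.xml <<'EOF'
<indexConfig>
  <ramBufferSizeMB>512</ramBufferSizeMB>  <!-- larger buffer means fewer flushes while indexing -->
  <mergeFactor>25</mergeFactor>           <!-- indexing-heavy collections; move toward 2 when search-heavy -->
</indexConfig>
EOF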
SOLR WORKLOAD DEPLOYMENT MODEL
Bare Metal
Master Node (ZK)
▸ 2 x 4 CPU Cores
▸ 32 GB RAM
▸ 2 x 1TB SSD (OS) RAID 1
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 10 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD (OS) RAID 1
▸ 12 x 2 - 4TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local
▸ vm types (AWS/Azure)
▸ >= d2.4xlarge / >= D14_v2
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 48 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 10 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12 x 2 - 4TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10Gbe NIC Bonded
Data Science
But Captain, one does not simply warp into Mordor
DATA SCIENCE WORKLOAD BEHAVIOR
▸ Spark ML workload, depends on YARN
▸ Near-realtime to Batch SLA
▸ Runs “Monte Carlo” type of processing
▸ Hundreds to hundreds of thousands of analytical jobs daily
▸ CPU bound first, then memory
SPARK ML WORKLOAD PARALLELISM
▸ YARN ENABLED
▸ Cache data
▸ Serialized/Raw for fast access/processing
▸ Off-heap, slower processing
▸ Use Checkpointing
▸ Use Broadcast Variables to minimize network traffic
▸ Minimize GC by limiting object creation, use Builders
▸ executor-memory = between 8GB and 64GB
▸ executor-cores = between 2 and 4
▸ num-executors = with caching, size so total app memory is roughly 2x the data size
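A hedged spark-submit shape for the sizing rules above; the application class, jar, and executor count are placeholders, and Kryo serialization backs the "serialized/raw caching" bullet.

# Sizes follow the 8-64 GB per executor / 2-4 cores guidance; pick
# num-executors so total executor memory is about 2x the cached data set.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 16G \
  --executor-cores 3 \
  --num-executors 250 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --class com.example.ChurnModel churn-model.jar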
SPARK ML WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 6 CPU Cores
▸ 64 GB RAM
▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 12 CPU Cores
▸ 256 - 512 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12 x 4TB SATA/SAS/NL-SAS
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local
▸ vm types (AWS/Azure)
▸ >= d2.4xlarge / >= D14_v2
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 48 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 8-12 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12-24 x 4TB SATA/SAS/NL-SAS
▸ 2 x 10Gbe NIC Bonded
Streaming
Stream me in, Scotty!
NIFI/STORM WORKLOAD BEHAVIOR
▸ NiFi & Storm Streaming workloads
▸ Always “ON” data ingest and processing
▸ Guarantees data delivery
▸ Highly distributed
▸ simple event processing -> complex event processing
NIFI WORKLOAD PARALLELISM
▸ Network bound first, then memory or CPU
▸ Go granular on NiFi processors
▸ nifi.bored.yield.duration=10
▸ RAID 10 for repositories: Content, FlowFile, and Provenance
▸ nifi.queue.swap.threshold=20000
▸ No sharing of disk across repositories, use RAID 10
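The corresponding keys in conf/nifi.properties, as a fragment to merge in; the three mount points are placeholders for separate RAID 10 volumes per repository, and the bored-yield value takes a time unit in current NiFi releases.

cat > /tmp/nifi-repo-tuning.properties <<'EOF'
nifi.bored.yield.duration=10 millis
nifi.queue.swap.threshold=20000
nifi.flowfile.repository.directory=/data1/nifi/flowfile_repository
nifi.content.repository.directory.default=/data2/nifi/content_repository
nifi.provenance.repository.directory.default=/data3/nifi/provenance_repository
EOF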
NIFI WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 4 CPU Cores
▸ 32 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 8 CPU Cores
▸ 128 GB RAM
▸ 2 x 1TB SSD (OS) RAID 1
▸ 6 x 1TB SATA/SAS/NL-SAS RAID 10
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local/EBS/Premium
▸ vm types (AWS/Azure)
▸ >= m4.4xlarge / >= A9
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 16 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 8 CPU Cores
▸ 128 GB RAM
▸ 2 x 1TB SSD (OS) RAID 1
▸ 6 x 1TB SATA/SAS/NL-SAS RAID 10
▸ 2 x 10Gbe NIC Bonded
STORM WORKLOAD PARALLELISM
▸ CPU bound first, then memory
▸ Keep tuple processing light so execute() returns quickly
▸ Use Google’s Guava for bolt-local caches (mem <1GB)
▸ Shared caches across bolts: HBase, Phoenix, Redis, or MemcacheD
▸ # of workers scales with available memory
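One way to size workers against memory at submit time is shown in the sketch below; the jar, topology class, topology name, worker count, and heap size are placeholders, and the -c overrides are just one common way to set per-topology config without touching code.

# Worker count and worker heap expressed as topology config overrides.
storm jar click-topology.jar com.example.ClickTopology prod-clicks \
  -c topology.workers=8 \
  -c topology.worker.childopts="-Xmx2g"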
STORM WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 4 CPU Cores
▸ 32 GB RAM
▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 12 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local/EBS/Premium
▸ vm types (AWS/Azure)
▸ >= r3.4xlarge / >= A9
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 24 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 12 CPU Cores
▸ 256 GB RAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe NIC Bonded
http://hortonworks.com/products/sandbox/
Questions?
THANK YOU!
@rommelgarcia
/in/rommelgarcia
#hortonworks
