Tajo case study bay area hug 20131105

1.
A case study: Tajo on Big Telco

2.
Jeong-shik Jang
System Development Deployment
Gruter Inc, Seoul, South Korea
©2013 Gruter. All rights reserved.

4.
Mobile carriers in S. Korea

6.
Test setup
Performance test on Hive / Impala / Tajo
H/W:
  CPU: 24 cores (Xeon 2.5 GHz, HT)
  Memory: 64 GB
  Disks: 3 TB x 6 (NLSAS 7200 RPM)
  Network: 10G
Size: 1 master + 6 data nodes
Versions:
  Hadoop: cdh4.3.0
  Hive: 0.10.0-cdh4.3.0
  Impala: impalad_version_1.1.1_RELEASE
  Tajo: 0.2-SNAPSHOT
Data size: 1.7 TB (4.1B rows, Q1); 8 GB or less (results of Q1, rest of Qs)
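As a rough sanity check on the data size above, the following sketch works out the average row size and the share of the Q1 input per data node. It assumes decimal units (1 TB = 10^12 bytes), an even spread of data across the 6 data nodes, and that "3TB x 6" means six disks per node; none of these is stated on the slide, and HDFS replication is ignored.

```python
# Back-of-the-envelope figures for the Q1 input data set.
# Assumptions (not stated on the slide): 1 TB = 10**12 bytes, data is spread
# evenly over the data nodes, each node has 6 disks, replication is ignored.
DATA_BYTES = 1.7 * 10**12   # 1.7 TB
ROWS = 4.1 * 10**9          # 4.1B rows
DATA_NODES = 6
DISKS_PER_NODE = 6

bytes_per_row = DATA_BYTES / ROWS
gb_per_node = DATA_BYTES / DATA_NODES / 10**9
gb_per_disk = gb_per_node / DISKS_PER_NODE

print(f"average row size: {bytes_per_row:.0f} bytes")
print(f"input per data node: {gb_per_node:.0f} GB")
print(f"input per disk: {gb_per_disk:.1f} GB")
```

Under these assumptions a row averages a little over 400 bytes, and each data node holds under 300 GB of the Q1 input.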

8.
Test setup: Queries
Q1: scan using about 20 text pattern matching filters
Q2: 7 unions with joins
Q3: join
Q4: group by and order by
Q5: 30 text pattern matching filters with OR conditions, group by, having, and order by
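The slides describe the queries only by shape, not by text. The sketch below illustrates what a Q1-style filter scan and a Q5-style query (OR-ed pattern filters with group by, having, and order by) might look like. The table `access_log`, its columns, the sample rows, and the patterns are all hypothetical, and SQLite stands in for the engines under test purely so the example can run on its own.

```python
import sqlite3

# Hypothetical table and data; the real schema is not shown in the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (subscriber TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?)",
    [
        ("a", "http://news.example.com/sports"),
        ("a", "http://shop.example.com/cart"),
        ("b", "http://news.example.com/weather"),
        ("b", "http://video.example.com/watch"),
        ("a", "http://news.example.com/politics"),
        ("c", "http://shop.example.com/item"),
    ],
)

# Q1-style: a full scan with text pattern matching filters
# (two filters here; the slide mentions about 20).
q1 = """
SELECT subscriber, url
FROM access_log
WHERE url LIKE '%news%' AND url NOT LIKE '%weather%'
"""
q1_rows = conn.execute(q1).fetchall()

# Q5-style: pattern filters combined with OR, then group by, having, order by
# (two filters here; the slide mentions 30).
q5 = """
SELECT subscriber, COUNT(*) AS hits
FROM access_log
WHERE url LIKE '%news%' OR url LIKE '%shop%'
GROUP BY subscriber
HAVING COUNT(*) >= 2
ORDER BY hits DESC
"""
q5_rows = conn.execute(q5).fetchall()

print(q1_rows)
print(q5_rows)
```

The Q5-style query returns far fewer rows than it scans, which matches the slide's remark that Q5 results in a smaller set of output.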

10.
Results: Q1 – filter scan
Q1: scan using about 20 text pattern matching filters
[Bar chart: processing time (sec.) for Hive, Impala, and Tajo. Values shown: 1445.69, 895.96, 789.09]
NB: Tajo showed enhanced performance due to dynamic task scheduling
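To put the Q1 times in perspective, the sketch below converts each reported time into an implied aggregate scan rate over the 1.7 TB input. It assumes 1 TB = 10^12 bytes and that the whole data set is read once per query. The times are listed in the order they appear on the chart, without attributing them to a particular engine.

```python
# Implied aggregate scan throughput for Q1.
# Assumptions: 1 TB = 10**12 bytes; the full 1.7 TB is read once per run.
DATA_BYTES = 1.7 * 10**12
DATA_NODES = 6
Q1_TIMES_SEC = [1445.69, 895.96, 789.09]  # values as shown on the chart

throughputs = [DATA_BYTES / t / 10**9 for t in Q1_TIMES_SEC]  # GB/s, cluster-wide
per_node = [x / DATA_NODES for x in throughputs]              # GB/s per data node

for t, total, node in zip(Q1_TIMES_SEC, throughputs, per_node):
    print(f"{t:8.2f} s -> {total:.2f} GB/s total, {node:.2f} GB/s per data node")
```

Under these assumptions the fastest run scans a bit over 2 GB/s across the cluster, or roughly a third of a GB/s per data node.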

12.
Results: Q2 – unions, joins
Q2: 7 unions with joins
[Bar chart: processing time (sec.) for Hive, Impala, and Tajo. Values shown: 63.64, 38.64, 9.11]
NB:
* Tajo materializes all query results to HDFS, as is the main goal
* Unions are processed in sequence in Tajo now (parallel processing is coming soon)

14.
Results: Q3 – join
Q3: join
[Bar chart: processing time (sec.) for Hive, Impala, and Tajo. Values shown: 101.45, 36.81, 31.92]
NB: Tajo has an optimal selection/projection push down

16.
Results: Q4 – group by and sort
Q4: group by and order by
[Bar chart: processing time (sec.) for Hive, Impala, and Tajo. Values shown: 24.7, 0.45, 0.65]

18.
Results: Q5 – filters, group by, having and sort
Q5: 30 text pattern matching filters with OR conditions, group by, having, and order by, resulting in a smaller set of output
[Bar chart: processing time (sec.) for Hive, Impala, and Tajo. Values shown: 128.78, 17.03, 6.03]

20.
Results: Wrap up
The project is underway; more findings are expected in the future.
Performance enhancement thanks to dynamic task scheduling: some results showed better performance than Impala, despite Tajo materializing every query result to HDFS, the project still being in its early stages, and Tajo still being an early build.

22.
GRUTER: YOUR PARTNER IN THE BIG DATA REVOLUTION
Phone: +82-70-8129-2950
Fax: +82-70-8129-2952
E-mail: contact@gruter.com
Web: www.gruter.com
Gruter, Inc. 5F Sehwa Office Building, 889-70 Daechi-dong, Gangnam-gu, Seoul, South Korea 135-839
Jeong-shik Jang: jsjang@gruter.com
