Tajo case study, Bay Area HUG, 2013-11-05
A case study:
Tajo on Big Telco
 
Jeong-shik Jang
System Development & Deployment
Gruter Inc, Seoul, South Korea

©2013 Gruter. All rights reserved.
 
Mobile carriers in S. Korea

 
Test setup
Performance test on Hive / Impala / Tajo
H/W
  CPU:     24 cores (Xeon 2.5 GHz, HT)
  Memory:  64 GB
  Disks:   3 TB x 6 (NL-SAS, 7200 RPM)
  Network: 10G
  Size:    1 master + 6 data nodes

Versions
  Hadoop:  cdh4.3.0
  Hive:    0.10.0-cdh4.3.0
  Impala:  impalad_version_1.1.1_RELEASE
  Tajo:    0.2-SNAPSHOT

Data size
  Q1:      1.7 TB (4.1B rows)
  Q2-Q5:   8 GB or less (these queries run on the results of Q1)
 
Test setup: Queries
Q1: scan using about 20 text pattern matching filters
Q2: 7 unions with joins
Q3: join
Q4: group by and order by
Q5: 30 text pattern matching filters with OR conditions, group by, having, and order by

 
Results: Q1 – filter scan
Q1: scan using about 20 text pattern matching filters

[Bar chart: processing time (sec.) for Hive, Impala, and Tajo. Bar values recovered from the chart: 1445.69, 895.96, 789.09]

NB:
* Tajo showed enhanced performance due to dynamic task scheduling
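
As a rough sanity check on the Q1 figures, the sketch below converts each reported processing time into an implied scan throughput for the 1.7 TB input, spread over the 6 data nodes with 6 disks each from the test setup. It rests on assumptions the slides do not state: 1 TB is taken as 10^12 bytes, the whole input is read exactly once, and the work is spread evenly across nodes and disks. The times are deliberately not mapped to engines, and the function and constant names are illustrative only.

```python
# Back-of-the-envelope scan throughput for Q1.
# Assumptions (not stated in the slides): 1 TB = 10**12 bytes, the full
# input is read once, and load is spread evenly over nodes and disks.

DATA_BYTES = 1.7e12        # Q1 input size (1.7 TB)
ROWS = 4.1e9               # Q1 input rows (4.1B)
DATA_NODES = 6             # "1 master + 6 data nodes"
DISKS_PER_NODE = 6         # "3 TB x 6" per node
Q1_TIMES_SEC = [1445.69, 895.96, 789.09]  # values read off the Q1 chart

# Average row width implied by the data size and row count.
AVG_ROW_BYTES = DATA_BYTES / ROWS


def scan_throughput(seconds):
    """Return (aggregate, per-node, per-disk) throughput in MB/s."""
    aggregate = DATA_BYTES / seconds / 1e6
    per_node = aggregate / DATA_NODES
    per_disk = per_node / DISKS_PER_NODE
    return aggregate, per_node, per_disk


def report():
    """Build one summary line per reported Q1 processing time."""
    lines = [f"average row size: {AVG_ROW_BYTES:.0f} bytes"]
    for seconds in Q1_TIMES_SEC:
        aggregate, per_node, per_disk = scan_throughput(seconds)
        lines.append(
            f"{seconds:8.2f} s -> {aggregate:7.1f} MB/s total, "
            f"{per_node:6.1f} MB/s per node, {per_disk:5.1f} MB/s per disk"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Per-disk figures of this kind are useful mainly for judging whether a scan-heavy query such as Q1 is likely to be limited by disk bandwidth rather than by CPU; they say nothing about how each engine actually schedules its reads.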

Results: Q2 – unions, joins

[Bar chart: processing time (sec.). Legend entries recovered: Impala, Tajo. Bar values recovered: 63.64, 38.64]

NB:
* Tajo materializes all query results to HDFS, as is the main goal
* Unions are processed in sequence in Tajo now (parallel processing is coming soon)

Results: Q3 – join

[Bar chart. Legend entries recovered: Hive, Impala. Bar values recovered: 101.45, 36.81]

NB:
* Tajo has an optimal selection/projection push down

Results: Q4 – group by and sort
Q4: group by and order by

[Bar chart: processing time (sec.) for Hive, Impala, and Tajo. Bar values recovered: 24.7, 0.45, 0.65]

Results: Q5 – filters, group by, having and sort
Q5: 30 text pattern matching filters with OR conditions, group by, having, and order by, resulting in a smaller set of output

[Bar chart: processing time (sec.) for Hive, Impala, and Tajo. Bar values recovered: 128.78, 17.03, 6.03]

Results: Wrap up
The project is underway; more findings are expected in the future.
Performance enhancement thanks to dynamic task scheduling: some results showed better performance than Impala, despite Tajo materializing every query result to HDFS, the project still being in its early stages, and Tajo still being an early build.

GRUTER: YOUR PARTNER IN THE BIG DATA REVOLUTION
Phone:  +82-70-8129-2950
Fax:    +82-70-8129-2952
E-mail: contact@gruter.com
Web:    www.gruter.com
Gruter, Inc.
5F Sehwa Office Building, 889-70 Daechi-dong, Gangnam-gu, Seoul, South Korea 135-839
Jeong-shik Jang: jsjang@gruter.com