Engineering Principles of Kylin 
October 2014 
Jiang Xu
Done is better than perfect!
How to get product ideas?
Get ideas from real product problems 
• We got these ideas from the ??? project and from MicroStrategy’s limitations:
– Although data is being onboarded to Hadoop, accessing it is a big issue. Hive is too slow!
– Although MicroStrategy is fast, it can’t handle 2+ billion records
– Although there are lots of SQL-on-Hadoop solutions, they can’t guarantee low latency for big queries
• Lessons learned
– Try to get ideas from customers’ pain points
– Always get ideas from real product problems 
Think in terms of a product, not a project
• We aimed to build a generic product or platform
– Standard: ANSI SQL
– Full Stack: ODBC/JDBC for BI tools integration 
– … 
• Lessons learned:
– When you have an idea, think of it as a product or platform
– A product is more generic and easier to adopt in the long term
Control scope to build the best solution
• Due to time and resource limitations, we must control the scope of the product
– Focus on MOLAP instead of HOLAP 
– Focus on Tableau instead of MicroStrategy 
– Don’t support real-time 
– Don’t support full SQL 
• Try to build the best solution for a “small problem”
Reference industry solutions & academic papers
• Study industry analysis reports
– Gartner 
– Forrester 
– … 
• Study existing solutions
– Google BigQuery 
– Google Dremel & PowerDrill 
– SQL-on-Hadoop (Hive, Presto, Phoenix, Druid…) 
• Study academic papers 
– Data Mining: Concepts and Techniques, 3rd edition
– Lots of papers on data cubes, OLAP…
How to set up a team?
Find the right people 
• Due to the complexity of this product, we put a lot of effort into setting up the team
– Smart 
– Diligent 
– Solid CS background 
– Matching the team’s chemistry 
• Use your connections to find good candidates
– Find very good team members through friends
• Give a tough interview to find good candidates
– Give a 1:1 interview of 2+ hours, focused mostly on coding, algorithms, and problem solving
Assign the right tasks to the right people 
• Assign components based on each team member’s capabilities and interests
• All members have to do the dirty work
• All members have the opportunity to take on challenging tasks
• People have to prove themselves to take on more challenging tasks
Lead by example 
• Leader Knows Details, Leader Writes Code 
• If you want team members to follow the engineering principles, the leader must follow them first.
– For example, with test-driven development, the leader must write test cases first.
• The leader should take on the tasks nobody wants
– Support 
– Testing 
– Customer onboarding
– … 
How to design a product?
Done is better than perfect 
• It’s easy to design a “perfect” system. But it’s hard to design a feasible system! 
• Due to resource limitations, we must make sure the team can actually deliver the design.
• Don’t be average at everything. Try to be the best at one thing!
KISS – Keep it simple, stupid
• Designing a simple system is much more challenging than designing a complex one.
– Give simple solutions to complex problems
– Build a system that is easy to maintain and extend over time 
• For example, Kylin has a very simple deployment architecture: just a web server alongside Hadoop
SOLID Principles - Robert C. Martin 
• SRP: Single responsibility principle 
• OCP: Open/closed principle 
• LSP: Liskov substitution principle 
• ISP: Interface segregation principle 
• DIP: Dependency inversion principle 
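As a minimal, hypothetical illustration of the dependency inversion principle (with single responsibility and open/closed in passing): the high-level query code depends only on an abstract storage interface, so a new backend can be plugged in without modifying the query code. The names `CubeStorage`, `InMemoryStorage`, and `QueryEngine` are invented for this sketch and are not Kylin’s actual classes.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class CubeStorage(ABC):
    """Abstraction that the high-level query code depends on (DIP)."""

    @abstractmethod
    def lookup(self, key: Tuple[str, ...]) -> float:
        """Return the pre-aggregated measure for a dimension key."""


class InMemoryStorage(CubeStorage):
    """One concrete backend; another backend could implement the same interface."""

    def __init__(self, cells: Dict[Tuple[str, ...], float]) -> None:
        self._cells = dict(cells)

    def lookup(self, key: Tuple[str, ...]) -> float:
        # In this sketch, a missing cell aggregates to zero.
        return self._cells.get(key, 0.0)


class QueryEngine:
    """High-level policy: knows only the CubeStorage interface and only answers queries (SRP)."""

    def __init__(self, storage: CubeStorage) -> None:
        self._storage = storage  # dependency is injected, not constructed here

    def total(self, *keys: Tuple[str, ...]) -> float:
        return sum(self._storage.lookup(k) for k in keys)


engine = QueryEngine(InMemoryStorage({("2014", "US"): 10.0, ("2014", "CN"): 5.0}))
print(engine.total(("2014", "US"), ("2014", "CN")))  # 15.0
```

Adding a new storage engine means writing one more `CubeStorage` subclass; `QueryEngine` stays closed for modification.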
Don’t reinvent the wheel
• Reuse existing open source products
– Calcite 
– Hive 
– MapReduce 
– HBase 
– … 
• Reference existing solutions
– Bias error in HyperLogLog
• Google HyperLogLog++
• Facebook Presto: magic parameter 
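To make the bias-error point concrete, here is a minimal HyperLogLog sketch. It uses the raw estimator from Flajolet et al. plus their original small-range correction (falling back to linear counting while many registers are still zero). This is a simplified stand-in, not HyperLogLog++, which replaces that correction with empirically measured bias tables and adds a sparse representation. Using the first 8 bytes of SHA-1 as the 64-bit hash and `p = 12` are assumptions for illustration.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p: int = 12) -> None:
        # p >= 7 so that the alpha formula below (valid for m >= 128) applies.
        self.p = p                    # number of index bits
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def _hash64(self, item: str) -> int:
        # First 8 bytes of SHA-1, read as an unsigned 64-bit integer.
        return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        width = 64 - self.p
        idx = x >> width                  # top p bits choose the register
        rest = x & ((1 << width) - 1)     # remaining bits
        # Rank = 1-based position of the leftmost 1-bit in `rest` (width + 1 if all zero).
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: the raw estimate is strongly biased for small
        # cardinalities, so switch to linear counting there.
        if raw <= 2.5 * self.m and zeros > 0:
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog(p=12)
for i in range(10000):
    hll.add(f"user-{i}")
print(round(hll.count()))  # close to 10000, typically within a few percent
```

The threshold switch between the two estimators is exactly where the bias shows up in practice, which is the problem HyperLogLog++ addresses more carefully.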
80-20 Rule 
• Put 80% of the effort into the 20% most important features
• What should be done? 
– ODBC driver 
– Analytic SQL: groups, aggregation, filter, join, projection, sub-query… 
– … 
• What shouldn’t be done? 
– BI tools 
– Full ANSI SQL 
– … 
Explain your design in simple words 
• If you can’t explain it to your peers in simple words, something must be wrong.
• Challenge each other! 
• Good design is evolving!
Build a workable prototype 
• Paperwork can’t verify your design
• Only the workable prototype can validate your design. 
• We spent 1 month building a workable prototype
– SQL was parsed with a hand-written ANTLR grammar
– The cube was built by simple MapReduce scripts
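The idea behind a map/reduce cube build can be sketched in a single process: the map step emits one record per cuboid (every subset of the dimensions, with rolled-up dimensions replaced by a placeholder), and the reduce step sums the measure per key. This is a hypothetical illustration with invented column values, not the prototype’s actual scripts or Kylin’s build job.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterator, List, Sequence, Tuple

ALL = "*"  # placeholder for a dimension that has been aggregated away


def map_row(dims: Sequence[str], measure: float) -> Iterator[Tuple[Tuple[str, ...], float]]:
    """Map step: emit (cuboid key, measure) for every subset of the dimensions."""
    n = len(dims)
    for size in range(n + 1):
        for kept in combinations(range(n), size):
            key = tuple(dims[i] if i in kept else ALL for i in range(n))
            yield key, measure


def build_cube(rows: List[Tuple[Sequence[str], float]]) -> Dict[Tuple[str, ...], float]:
    """Shuffle + reduce step: group by cuboid key and sum the measure."""
    cube: Dict[Tuple[str, ...], float] = defaultdict(float)
    for dims, measure in rows:
        for key, value in map_row(dims, measure):
            cube[key] += value
    return dict(cube)


# Dimensions: (year, country); measure: sales amount.
rows = [(("2014", "US"), 10.0), (("2014", "CN"), 5.0), (("2013", "US"), 2.0)]
cube = build_cube(rows)
print(cube[("2014", ALL)])  # 15.0
print(cube[(ALL, "US")])    # 12.0
print(cube[(ALL, ALL)])     # 17.0
```

With n dimensions each input row fans out to 2^n keys, so build cost grows quickly with dimensionality; at query time an aggregate becomes a single lookup.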
How to develop a product?
Automated Testing
• Automated integration tests >> automated unit tests
– No mock! 
– Test on live system 
– Each case covers one use case
• At least one automated test for each feature and for each bug fix
• Reusing a gold-standard test sample simplifies building test cases
• Automate everything 
– Compare SQL results between H2 and Hadoop
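The H2-vs-Hadoop comparison is a form of differential testing: run the same query on a trusted reference engine and on the system under test, then compare the result sets. In this sketch Python’s built-in `sqlite3` stands in for both engines purely for illustration, and the helper names are hypothetical. Rows are compared as multisets, because SQL results have no guaranteed order without ORDER BY, and floats are rounded to absorb differences in aggregation order.

```python
import sqlite3
from collections import Counter
from typing import Iterable, List, Tuple


def normalize(rows: Iterable[Tuple], digits: int = 6) -> Counter:
    """Turn a result set into an order-insensitive multiset, rounding floats."""
    return Counter(
        tuple(round(v, digits) if isinstance(v, float) else v for v in row)
        for row in rows
    )


def results_match(reference: sqlite3.Connection, candidate: sqlite3.Connection, sql: str) -> bool:
    """Run the same SQL on both engines and compare the normalized results."""
    expected = normalize(reference.execute(sql).fetchall())
    actual = normalize(candidate.execute(sql).fetchall())
    return expected == actual


def make_db(rows: List[Tuple[str, str, float]]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (year TEXT, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


data = [("2014", "US", 10.0), ("2014", "CN", 5.0), ("2013", "US", 2.0)]
ref = make_db(data)
sut = make_db(list(reversed(data)))  # same data, different insertion order
query = "SELECT year, SUM(amount) FROM sales GROUP BY year"
print(results_match(ref, sut, query))  # True
```

Combined with a reusable gold-standard dataset, every new query in the regression suite becomes one extra line rather than a hand-written expected result.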
Code Review - Simple is Beautiful 
• Code should be clear to read and easy to change
• If I have trouble understanding your code, FIX it!
– One class has more than one responsibility
– Code looks complex
– Hard to enhance
– Duplicate logic 
– Package organization looks messy 
– … 
Code Review – Buddy Programming 
• Can code review find bugs? NO!
• How can we find bugs?
– Test as a customer, with vertical use cases
– You write the first version, I write the second version
– Each component has 2+ owners
Continuous Code Refactoring
• If other people have trouble understanding your code, REFACTOR it!
• A comprehensive automated test suite makes refactoring much easier
DevOps – Develop for Operations
• Log all important information
• Export all important metrics
• Easy to troubleshoot
• Easy to monitor 
• One-liner installation 
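One lightweight way to log important events and export metrics is to wrap key operations so that every call records a success or failure count and its latency. The `instrumented` decorator and in-process `METRICS` registry below are a hypothetical, standard-library-only sketch; a production service would publish these values to its monitoring system rather than keep them in a dict.

```python
import functools
import logging
import time
from collections import defaultdict
from typing import Callable, Dict

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s %(message)s")
log = logging.getLogger("ops-sketch")

# Hypothetical in-process registry: metric name -> accumulated value.
METRICS: Dict[str, float] = defaultdict(float)


def instrumented(name: str) -> Callable:
    """Decorator that logs each call and records success/failure counts and latency."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                METRICS[f"{name}.success"] += 1
                return result
            except Exception:
                METRICS[f"{name}.failure"] += 1
                log.exception("%s failed", name)  # includes the stack trace
                raise
            finally:
                elapsed = time.perf_counter() - start
                METRICS[f"{name}.seconds"] += elapsed
                log.info("%s took %.3f s", name, elapsed)
        return wrapper
    return decorator


@instrumented("query")
def run_query(sql: str) -> int:
    return len(sql)  # stand-in for real work


run_query("SELECT 1")
print(METRICS["query.success"])  # 1.0
```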
Performance Tuning - Question Everything 
• System Level 
– CPU, Memory 
• JVM Level 
– GC: Calcite generates code that uses up the permanent generation (PermGen) and triggers full GC
– Use a Java profiler to question every hotspot
– Remove hotspots one by one
• Hadoop 
– Data Skew 
– MapReduce Job Tuning 
– … 
• Algorithm 
– HyperLogLog
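A quick way to spot data skew before tuning a MapReduce job is to check how records would be spread across reducers by key. This sketch assumes simple hash partitioning using CRC32 of the key (chosen because it is stable across runs); Hadoop’s default partitioner also hashes keys, but with Java’s `hashCode`, so real reducer assignments would differ. It reports the largest partition relative to the mean and the most frequent keys; what ratio counts as "too skewed" is a judgment call.

```python
import zlib
from collections import Counter
from typing import Iterable, List, Tuple


def partition_sizes(keys: Iterable[str], num_reducers: int) -> List[int]:
    """Count records per reducer under hash partitioning (CRC32 of the key)."""
    sizes = [0] * num_reducers
    for key in keys:
        sizes[zlib.crc32(key.encode("utf-8")) % num_reducers] += 1
    return sizes


def skew_ratio(sizes: List[int]) -> float:
    """Largest partition divided by the mean; 1.0 means perfectly balanced."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


def heavy_hitters(keys: Iterable[str], top: int = 3) -> List[Tuple[str, int]]:
    """The most frequent keys are usually what overloads a single reducer."""
    return Counter(keys).most_common(top)


# One hot key ("US") dominates the input.
keys = ["US"] * 900 + [f"C{i}" for i in range(100)]
sizes = partition_sizes(keys, num_reducers=10)
print(skew_ratio(sizes))       # >= 9.0: the reducer holding "US" gets at least 900 of 1000 records
print(heavy_hitters(keys, 1))  # [('US', 900)]
```

Adding reducers does not help here, since every "US" record still hashes to the same partition; the hot key itself has to be handled differently.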
Open Source Adoption 
• Open source software is budget-free, but isn’t bug-free 
• We fixed lots of bugs in
– Calcite 
– Trev4j 
– HyperLogLog
– … 
How to onboard a product?
Customer is the 1st priority
• Work closely with customers
– Help customers design cubes
– Refine requirements to reduce complexity 
• because we make the impossible possible
• Fix bug quickly 
– Fixing product bugs is more important than feature development
• Continuous Improvement
– Prioritize customer requirements
– Give a workable solution quickly, then improve it later. 
• Specific requirements vs. generic requirements
– Do your best to give a generic solution for a specific requirement
– Say NO to very specific solutions
2+ Cases for Product Verification 
• To develop a good product, we need at least 2 use cases to verify and finalize our design.
• Try to find different use cases to verify product 
– Transaction Data 
– Behavior Data 
Usability is Key 
• Usability is key for customer onboarding 
• An easy-to-use GUI hides the complex concepts
• … 
Q & A 