H2O platform workshop
H2O Workshop 
Hassan Namarvar 
Principal Data Scientist 
Oct 8, 2014
2 
WHY USE A NEW MACHINE LEARNING TOOL? 
Existing large-scale ML tools, such as Apache Mahout, Vowpal Wabbit, Hadoop RMR and native Spark MLlib, each have their own limitations.
Critical features for a state-of-the-art ML package:
• Ease of use
• System reliability
• In-memory (fast)
• Distributed
• Extensible (API/SDK)
• Accurate algorithms
• Visualization (data and results)
• …
3 
INTRODUCTION TO H2O PLATFORM 
H2O is billed as the world’s fastest in-memory, open source machine learning library.
Key features:
• Open source, licensed under the Apache License 2.0
• Scalable in-memory processing for big data (written in Java)
• Runs on a single node or a multi-node cluster
• High-quality implementations of state-of-the-art ML algorithms
• H2O package for R
• Spark + H2O = Sparkling Water
4 
WORKSHOP AGENDA
• Download the bleeding-edge version of the platform
• Tutorial on the Web API
• Upload a real dataset into the platform
• Build a cost-per-action (CPA) model using the GLM algorithm
• Validate the CPA model on a test set
• Build more advanced models:
  ◦ Gradient Boosting Machines (GBMs)
  ◦ BigData Random Forest
  ◦ Deep Learning neural networks
• Model selection
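Validating on a test set presupposes that the uploaded data has been split into training and held-out rows. A minimal sketch of a reproducible random train/test split in plain Python; the helper name `train_test_split_rows`, the 20% default test fraction and the seed are illustrative assumptions, not part of H2O's API.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split_rows(rows: Sequence[T], test_fraction: float = 0.2,
                          seed: int = 42) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: hold out a random fraction of rows as a test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded RNG, so the split is reproducible
    n_test = int(round(len(rows) * test_fraction))
    test_idx = set(indices[:n_test])
    # Rows keep their original order inside each partition.
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


if __name__ == "__main__":
    train, test = train_test_split_rows(list(range(10)), test_fraction=0.2, seed=7)
    print(len(train), len(test))  # 8 2
```

Fixing the seed means every model in the comparison is trained and scored on the same rows, which keeps the later AUC comparison fair.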
5 
LET’S DO SOME HACKING! 
Download the bleeding-edge version of the platform from:
http://0xdata.com/download/
Run locally:
cd ~/Downloads
unzip h2o-2.7.0.1533.zip
cd h2o-2.7.0.1533          # the extracted folder, not the .zip file
java -Xmx4g -jar h2o.jar   # start H2O with a 4 GB maximum Java heap
Point your browser to: 
http://localhost:54321
6 
BUILDING A CPA MODEL
RETARGETED VISITS AS A PROXY FOR CONVERSIONS
• USER-CENTRIC: focus on retargeted (RT) users
• OPTIMAL TIME: deliver ads at the optimal times
• BETTER PERFORMANCE: leverage optimization opportunities
• DON’T WASTE IMPRESSIONS: target users who are likely to convert
7 
GLM MODEL 
Screenshot of the CPA model built with the GLM algorithm.
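Assuming the CPA target is binary (converted or not), a GLM with a binomial family and logit link is logistic regression. A minimal sketch in plain Python that fits one by batch gradient descent on the mean log-loss, with an optional L2 penalty; the function names and hyperparameters are hypothetical, and it is not meant to reproduce H2O's own GLM solver.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic (inverse logit) function, written to avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_glm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lr: float = 0.1, epochs: int = 2000,
                     l2: float = 0.0) -> Tuple[float, List[float]]:
    """Fit a binomial GLM (logit link) by batch gradient descent.

    Minimizes mean log-loss plus (l2 / 2) * ||w||^2; the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, w = 0.0, [0.0] * p
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # d(log-loss)/d(linear predictor) = predicted probability - label
            err = sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            g0 += err
            for j in range(p):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, g)]
    return b0, w


def predict_proba(b0: float, w: Sequence[float],
                  X: Sequence[Sequence[float]]) -> List[float]:
    """Predicted probability of the positive class for each row."""
    return [sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


if __name__ == "__main__":
    # Toy one-feature data, purely illustrative.
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0, 0, 1, 1]
    b0, w = fit_logistic_glm(X, y)
    print(predict_proba(b0, w, X))
```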
8 
GBM MODEL 
Screenshot of the CPA model built with the GBM algorithm.
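Gradient boosting builds an additive model one small tree at a time, each tree fitted to the negative gradient of the loss. A minimal sketch for a binary target with log-loss, using depth-1 trees (stumps) whose leaf values are the mean pseudo-residual; that simplification, the function names and the defaults are assumptions for illustration, and the sketch is far simpler than H2O's GBM implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A stump is (feature index, threshold, value if x[j] <= threshold, value otherwise).
Stump = Tuple[int, float, float, float]


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """Least-squares regression stump: the single split that best fits r."""
    best_sse, best = math.inf, None  # type: float, Optional[Stump]
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if sse < best_sse:
                best_sse, best = sse, (j, t, lv, rv)
    if best is None:  # no valid split (all rows identical): constant stump
        m = sum(r) / len(r)
        return (0, math.inf, m, m)
    return best


def predict_stump(s: Stump, x: Sequence[float]) -> float:
    j, t, lv, rv = s
    return lv if x[j] <= t else rv


def fit_gbm(X, y, n_trees: int = 50, learning_rate: float = 0.1):
    """Boost stumps on log-loss; returns (initial log-odds, learning rate, stumps)."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)  # clip so the log-odds are finite
    f0 = math.log(p / (1 - p))
    F = [f0] * len(X)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        # Negative gradient of log-loss w.r.t. F: the "pseudo-residuals".
        r = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        s = fit_stump(X, r)
        stumps.append(s)
        F = [fi + learning_rate * predict_stump(s, xi) for fi, xi in zip(F, X)]
    return f0, learning_rate, stumps


def gbm_predict_proba(model, X) -> List[float]:
    f0, lr, stumps = model
    return [sigmoid(f0 + lr * sum(predict_stump(s, xi) for s in stumps)) for xi in X]


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]  # toy data, purely illustrative
    y = [0, 0, 1, 1]
    print(gbm_predict_proba(fit_gbm(X, y), X))
```

The learning rate shrinks each stump's contribution, trading more trees for a smoother, less overfit model.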
9 
BIGDATA RANDOM FOREST MODEL
Screenshot of the CPA model built with the RF algorithm.
10 
MODEL COMPARISON 
Comparison of the ROC curves (and AUC values) of the GLM, GBM and RF models on the test data:
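The AUC summarizes a ROC curve as one number: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. A minimal sketch of that pairwise definition in plain Python; the function name is hypothetical, and the O(P·N) loop over positives and negatives suits small test sets.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

For large test sets, a rank-based (Mann-Whitney U) computation gives the same value in O(n log n).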
11 
LIVE TEST ON A CAR INSURANCE CAMPAIGN 
TESTED FOR TWO MONTHS; PERFORMANCE MEASURED WITH DFA (DOUBLECLICK FOR ADVERTISERS).
The CPA test for a car insurance campaign showed a 58% improvement in effective CPA (eCPA) and a 57% improvement in conversion rate (CVR).
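eCPA is total spend divided by conversions (lower is better), while CVR is conversions divided by an exposure count (higher is better), so a relative improvement is computed in opposite directions for the two. A minimal sketch; the function names, the use of impressions as the CVR denominator, and the example figures are hypothetical and are not the campaign's data.

```python
def ecpa(spend: float, conversions: int) -> float:
    """Effective cost per action: total spend / number of conversions."""
    if conversions <= 0:
        raise ValueError("eCPA is undefined with zero conversions")
    return spend / conversions


def cvr(conversions: int, impressions: int) -> float:
    """Conversion rate; using impressions as the denominator is an assumption."""
    if impressions <= 0:
        raise ValueError("need at least one impression")
    return conversions / impressions


def pct_improvement(baseline: float, test: float, lower_is_better: bool) -> float:
    """Relative improvement of `test` over `baseline`, in percent."""
    change = baseline - test if lower_is_better else test - baseline
    return 100.0 * change / baseline


if __name__ == "__main__":
    # Hypothetical figures for illustration only (not the campaign's data).
    base_ecpa, test_ecpa = ecpa(1000.0, 50), ecpa(1000.0, 80)
    base_cvr, test_cvr = cvr(50, 100_000), cvr(80, 100_000)
    print(f"eCPA improvement: {pct_improvement(base_ecpa, test_ecpa, True):.1f}%")  # 37.5%
    print(f"CVR improvement: {pct_improvement(base_cvr, test_cvr, False):.1f}%")    # 60.0%
```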
THANK YOU!
