Winning Kaggle 101:
Introduction to Stacking
Erin LeDell Ph.D.
March 2016
Introduction
• Statistician & Machine Learning Scientist at H2O.ai in
Mountain View, California, USA
• Ph.D. in Biostatistics with Designated Emphasis in Computational Science and Engineering from UC Berkeley (focus on Machine Learning)
• Worked as a data scientist at several startups
Ensemble Learning
In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained by any of the constituent algorithms.
— Wikipedia (2015)
Common Types of Ensemble Methods
Bagging
• Reduces variance and increases accuracy
• Robust against outliers or noisy data
• Often used with Decision Trees (e.g. Random Forest)
Boosting
• Also reduces variance and increases accuracy
• Not robust against outliers or noisy data
• Flexible: can be used with any loss function
Stacking
• Used to ensemble a diverse group of strong learners
• Involves training a second-level machine learning algorithm called a “metalearner” to learn the optimal combination of the base learners
History of Stacking
Stacked Generalization
• David H. Wolpert, “Stacked Generalization” (1992)
• First formulation of stacking via a metalearner
• Blended Neural Networks
Stacked Regressions
• Leo Breiman, “Stacked Regressions” (1996)
• Modified the algorithm to use CV to generate level-one data
• Blended Neural Networks and GLMs (separately)
Super Learning
• Mark van der Laan et al., “Super Learner” (2007)
• Provided the theory to prove that the Super Learner is the asymptotically optimal combination
• First R implementation in 2010
The Super Learner Algorithm
• Start with the design matrix, X, and response, y (the “level-zero” data)
• Specify L base learners (with model params)
• Specify a metalearner (just another algorithm)
• Perform k-fold CV on each of the L learners
The Super Learner Algorithm
• Collect the predicted values from the k-fold CV that was performed on each of the L base learners
• Column-bind these prediction vectors together to form a new design matrix, Z (the “level-one” data)
• Train the metalearner using Z, y
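Taken together, those two steps fit in a few lines of code. Below is a minimal sketch in base R (plus the recommended rpart package), under assumed toy choices: a synthetic regression task, two base learners (lm and rpart), and a no-intercept OLS metalearner. The real implementations are the SuperLearner and h2oEnsemble packages.

```r
# Minimal sketch of the Super Learner algorithm (illustration only).
# Assumptions: toy regression data, base learners lm and rpart,
# and a simple no-intercept OLS metalearner.
library(rpart)

set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- sin(dat$x1) + dat$x2^2 + rnorm(n, sd = 0.3)

k <- 5
folds <- sample(rep(1:k, length.out = n))

# Level-one data Z: cross-validated predictions from each base learner
Z <- matrix(NA_real_, nrow = n, ncol = 2,
            dimnames = list(NULL, c("lm", "rpart")))
for (i in 1:k) {
  train <- dat[folds != i, ]
  test  <- dat[folds == i, ]
  Z[folds == i, "lm"]    <- predict(lm(y ~ ., data = train), newdata = test)
  Z[folds == i, "rpart"] <- predict(rpart(y ~ ., data = train), newdata = test)
}

# Metalearning step: train one more model on (Z, y) -- no CV required
meta <- lm(y ~ . - 1, data = data.frame(Z, y = dat$y))
coef(meta)  # weights for combining the base learners' predictions
```

To score new data, each base learner is refit on all of the level-zero data and the metalearner's weights are applied to their predictions.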
Super Learning vs. Parameter Tuning/Search
• A common task in machine learning is to perform model selection by
specifying a number of models with different parameters.
• An example of this is Grid Search or Random Search.
• The first phase of the Super Learner algorithm is computationally
equivalent to performing model selection via cross-validation.
• The latter phase of the Super Learner algorithm (the metalearning step)
is just training another single model (no CV).
• With Super Learner, your computation does not go to waste!
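To illustrate that last point, continuing the toy sketch above (the cp values here are arbitrary): each grid-search candidate simply becomes one more base learner, i.e. one more column of cross-validated predictions in Z, so the CV work done for model selection is reused by the metalearner rather than discarded.

```r
# Each grid-search candidate contributes one more column of
# cross-validated predictions to Z (cp values are arbitrary).
for (cp in c(0.001, 0.01, 0.1)) {
  preds <- rep(NA_real_, n)
  for (i in 1:k) {
    fit <- rpart(y ~ ., data = dat[folds != i, ],
                 control = rpart.control(cp = cp))
    preds[folds == i] <- predict(fit, newdata = dat[folds == i, ])
  }
  Z <- cbind(Z, preds)
  colnames(Z)[ncol(Z)] <- paste0("rpart_cp_", cp)
}
# Refit the metalearner on the wider Z; the grid-search CV is not wasted.
meta <- lm(y ~ . - 1, data = data.frame(Z, y = dat$y))
```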
H2O Ensemble
[Diagram: base learners feeding the ensemble: Lasso GLM, Ridge GLM, Random Forest, GBM, Rectifier DNN, Maxout DNN]
H2O Ensemble Overview
Super Learner
• H2O Ensemble implements the Super Learner algorithm.
• Super Learner finds the optimal combination of a collection of base learning algorithms.
Why Ensembles?
• When a single algorithm does not approximate the true prediction function well.
• Win Kaggle competitions!
ML Tasks
• Regression
• Binary Classification
• Coming soon: Support for multi-class classification
How to Win Kaggle
https://www.kaggle.com/c/GiveMeSomeCredit/leaderboard/private
https://www.kaggle.com/c/GiveMeSomeCredit/forums/t/1166/congratulations-to-the-winners/7229#post7229
https://www.kaggle.com/c/GiveMeSomeCredit/forums/t/1166/congratulations-to-the-winners/7230#post7230
H2O Ensemble R Package
H2O Ensemble R Interface
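The screenshots these slides showed are not reproduced here; below is a hedged reconstruction of the R interface, based on the h2oEnsemble examples of this era. The wrapper names, argument names, file name ("train.csv"), and column name ("response") are assumptions and may differ across package versions.

```r
# Hedged sketch of the h2oEnsemble R interface circa 2016; wrapper and
# argument names follow the package's examples but may vary by version.
library(h2oEnsemble)
h2o.init()

train <- h2o.importFile("train.csv")  # hypothetical dataset
y <- "response"                       # hypothetical response column
x <- setdiff(names(train), y)
train[, y] <- as.factor(train[, y])   # binary classification

learner <- c("h2o.glm.wrapper", "h2o.randomForest.wrapper",
             "h2o.gbm.wrapper", "h2o.deeplearning.wrapper")
metalearner <- "h2o.glm.wrapper"

fit <- h2o.ensemble(x = x, y = y, training_frame = train,
                    family = "binomial",
                    learner = learner,
                    metalearner = metalearner,
                    cvControl = list(V = 5))

pred <- predict(fit, train)  # score a held-out frame in practice
```

Note the one-to-one mapping to the algorithm: `learner` is the list of L base learners, `cvControl` sets the k-fold CV that generates the level-one data, and `metalearner` is the second-level model trained on it.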
Live Demo!
The H2O Ensemble demo, including R code:
http://tinyurl.com/github-h2o-ensemble
The H2O Ensemble homepage on Github:
http://tinyurl.com/learn-h2o-ensemble
New H2O Ensemble features!
h2o.stack
Early access to a new H2O Ensemble function:
h2o.stack
http://tinyurl.com/h2o-stacking
ML@Berkeley Exclusive!!
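For the curious, here is a hedged sketch of how h2o.stack was used in the early demos; the argument names are recalled from those demos and may be inaccurate. The key requirement is that each base model is a cross-validated H2O model trained with the same fold assignment and with its holdout (level-one) predictions saved.

```r
# Hedged sketch of h2o.stack; argument names follow the h2oEnsemble
# demos of the time and may differ by version. Dataset is hypothetical.
library(h2oEnsemble)
h2o.init()

train <- h2o.importFile("train.csv")  # hypothetical dataset
y <- "response"                       # hypothetical response column
x <- setdiff(names(train), y)

# Base models: identical fold assignment, holdout predictions kept
gbm <- h2o.gbm(x = x, y = y, training_frame = train,
               nfolds = 5, fold_assignment = "Modulo",
               keep_cross_validation_predictions = TRUE)
rf  <- h2o.randomForest(x = x, y = y, training_frame = train,
               nfolds = 5, fold_assignment = "Modulo",
               keep_cross_validation_predictions = TRUE)

# Stack the pre-trained models with a GLM metalearner
stack <- h2o.stack(models = list(gbm, rf),
                   response_frame = train[, y],
                   metalearner = "h2o.glm.wrapper")
```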
Where to learn more?
• H2O Online Training (free): http://learn.h2o.ai
• H2O Slidedecks: http://www.slideshare.net/0xdata
• H2O Video Presentations: https://www.youtube.com/user/0xdata
• H2O Community Events & Meetups: http://h2o.ai/events
• Machine Learning & Data Science courses: http://coursebuffet.com
Thank you!
@ledell on Github, Twitter
erin@h2o.ai
http://www.stat.berkeley.edu/~ledell
