Common Problems in Hyperparameter
Optimization
Alexandra Johnson
@alexandraj777
What are Hyperparameters?
Hyperparameter Optimization
● Hyperparameter tuning, model tuning, model selection
● Finding "the best" values for the hyperparameters of your model
Better Performance
● +315% accuracy boost for a TensorFlow ConvNet
● +49% accuracy boost for xgboost
● 41% error reduction for a recommender system
#1 Trusting the Defaults
● Default values are an implicit choice
● Defaults are not always appropriate for your model
● You may build a classifier that looks like this:
Default Values
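As an illustration (not the snippet from the original slide), here is a minimal sketch using scikit-learn's RandomForestClassifier as a stand-in: both classifiers are valid, but only the second makes its hyperparameter choices explicit and tunable. The specific values are illustrative, not recommendations.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Implicit choice: every hyperparameter silently takes the library default.
default_model = RandomForestClassifier(random_state=0)

# Explicit choice: the same hyperparameters written out, and therefore tunable.
explicit_model = RandomForestClassifier(
    n_estimators=300,    # library default is 100 in recent scikit-learn
    max_depth=8,         # library default is None (grow trees until pure)
    min_samples_leaf=5,  # library default is 1
    random_state=0,
)

for name, model in [("defaults", default_model), ("explicit", explicit_model)]:
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))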
#2 Using the Wrong Metric
Choosing a Metric
● Balance long-term and short-term goals
● Question underlying assumptions
● Example from Microsoft
Choose Multiple Metrics
● Composite Metric
● Multi-metric
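A hedged sketch of the two options above (the metrics and the weighting are illustrative assumptions, not from the talk): either fold several metrics into one composite objective, or report them separately to a multi-metric optimizer.

import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
start = time.perf_counter()
model.predict(X_test)
latency_ms = 1000 * (time.perf_counter() - start) / len(X_test)

# Composite metric: one number that trades accuracy against prediction latency
# (the 0.1 weight is an arbitrary illustrative choice).
composite = accuracy - 0.1 * latency_ms

# Multi-metric: report both values and let the optimizer explore the trade-off.
print({"accuracy": accuracy, "latency_ms": latency_ms, "composite": composite})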
#3 Overfitting
Metric Generalization
● Cross validation
● Backtesting
● Regularization terms
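A minimal sketch of the cross-validation bullet, assuming scikit-learn (the model and values are illustrative): report the mean score across folds to the optimizer so it cannot overfit a single lucky train/test split.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

def evaluate(n_estimators, max_depth):
    """Metric reported to the optimizer: mean accuracy over 5 folds."""
    model = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=0
    )
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

print(evaluate(n_estimators=200, max_depth=6))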
#4 Too Few Hyperparameters
Optimize all Parameters at Once
Include Feature Parameters
Example: xgboost
● The optimized model always performed better with tuned feature parameters
● No matter which optimization method was used
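A sketch of tuning feature parameters and model parameters in one search, assuming xgboost's scikit-learn wrapper with a PCA feature step (the cited experiment used different features; the values below are illustrative).

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=40, random_state=0)

pipeline = Pipeline([
    ("features", PCA()),          # feature parameters live in this step
    ("model", XGBClassifier()),   # model parameters live in this step
])

search_space = {
    "features__n_components": [5, 10, 20, 40],  # feature parameter
    "model__max_depth": [3, 5, 7],              # model parameters
    "model__learning_rate": [0.01, 0.1, 0.3],
    "model__n_estimators": [100, 300],
}

search = RandomizedSearchCV(pipeline, search_space, n_iter=20, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)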
#5 Hand Tuning
What is an Optimization Method?
You are not an Optimization Method
● Hand tuning is time-consuming and expensive
● Algorithms can quickly and cheaply beat expert tuning
Grid Search vs. Random Search vs. Bayesian Optimization
Use an Algorithm
#6 Grid Search
No Grid Search
Number of hyperparameters vs. model evaluations, with 10 candidate values per hyperparameter:

Hyperparameters   Model Evaluations
2                 100
3                 1,000
4                 10,000
5                 100,000
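A quick sketch of where those numbers come from, assuming 10 candidate values per hyperparameter: the grid size is 10 raised to the number of hyperparameters, and every grid point is a full training run.

from itertools import product

VALUES_PER_HYPERPARAMETER = 10

for n_hyperparameters in range(2, 6):
    # Every combination in the grid is one full model training run.
    grid = product(range(VALUES_PER_HYPERPARAMETER), repeat=n_hyperparameters)
    evaluations = sum(1 for _ in grid)
    print(f"{n_hyperparameters} hyperparameters -> {evaluations:,} model evaluations")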
#7 Random Search
Random Search
● Theoretically more effective than grid search
● Large variance in results
● No intelligence
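A minimal random-search sketch in the spirit of Bergstra and Bengio (the objective below is a stand-in for training a model and returning a validation score; the ranges and budget are illustrative): for the same budget as a 10 x 10 grid, the points are sampled rather than enumerated.

import random

random.seed(0)

def objective(learning_rate, max_depth):
    # Stand-in for "train the model, return a validation metric".
    return -((learning_rate - 0.1) ** 2) - 0.01 * (max_depth - 6) ** 2

best_score, best_params = float("-inf"), None
for _ in range(100):  # same budget as a 10 x 10 grid
    params = {
        "learning_rate": 10 ** random.uniform(-4, 0),  # log-uniform sample
        "max_depth": random.randint(2, 12),
    }
    score = objective(**params)
    if score > best_score:
        best_score, best_params = score, params

print(best_score, best_params)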
Use an Intelligent Method
Genetic algorithms
Bayesian optimization
Particle-based methods
Convex optimizers
Simulated annealing
To name a few...
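As one concrete example from the list above, here is a hedged Bayesian optimization sketch using scikit-optimize's gp_minimize (not the implementation behind SigOpt); the objective is again a stand-in for a real training run.

from skopt import gp_minimize
from skopt.space import Integer, Real

def objective(params):
    learning_rate, max_depth = params
    # Stand-in for "train the model, return a validation loss to minimize".
    return (learning_rate - 0.1) ** 2 + 0.01 * (max_depth - 6) ** 2

result = gp_minimize(
    objective,
    dimensions=[Real(1e-4, 1.0, prior="log-uniform"), Integer(2, 12)],
    n_calls=30,  # far fewer evaluations than an exhaustive grid
    random_state=0,
)
print(result.x, result.fun)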
SigOpt: Bayesian Optimization Service
Three API calls:
1. Define hyperparameters
2. Receive suggested hyperparameters
3. Report observed performance
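A sketch of how those three calls fit into a tuning loop, assuming the classic SigOpt Python client (exact names and signatures depend on the client version, so treat this as an approximation rather than the definitive API); train_and_evaluate is a hypothetical stand-in for your own training code.

from sigopt import Connection

def train_and_evaluate(assignments):
    # Hypothetical stand-in: train your model with the suggested
    # hyperparameters and return the observed metric.
    return -((assignments["learning_rate"] - 0.1) ** 2)

conn = Connection(client_token="YOUR_SIGOPT_API_TOKEN")

# 1. Define hyperparameters.
experiment = conn.experiments().create(
    name="Example experiment",
    parameters=[
        dict(name="learning_rate", type="double", bounds=dict(min=1e-4, max=1.0)),
        dict(name="max_depth", type="int", bounds=dict(min=2, max=12)),
    ],
)

for _ in range(30):
    # 2. Receive suggested hyperparameters.
    suggestion = conn.experiments(experiment.id).suggestions().create()
    value = train_and_evaluate(suggestion.assignments)
    # 3. Report observed performance.
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id, value=value
    )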
Thank You!
References - by Section
Intro
Ian Dewancker. SigOpt for ML: TensorFlow ConvNets on a Budget with Bayesian Optimization.
Ian Dewancker. SigOpt for ML: Unsupervised Learning with Even Less Supervision Using Bayesian Optimization.
Ian Dewancker. SigOpt for ML: Bayesian Optimization for Collaborative Filtering with MLlib.
#1 Trusting the Defaults
Keras recurrent layers documentation
#2 Using the Wrong Metric
Ron Kohavi et al. Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained.
Xavier Amatriain. 10 Lessons Learned from Building ML Systems [Video at 19:03].
Image from PhD Comics.
See also: SigOpt in Depth: Intro to Multicriteria Optimization.
#4 Too Few Hyperparameters
Image from TensorFlow Playground.
Ian Dewancker. SigOpt for ML: Unsupervised Learning with Even Less Supervision Using Bayesian Optimization.
#5 Hand Tuning
On algorithms beating experts: Scott Clark, Ian Dewancker, and Sathish Nagappan. Deep Neural Network Optimization with SigOpt and Nervana Cloud.
#6 Grid Search
NoGridSearch.com
#7 Random Search
James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization.
Ian Dewancker, Michael McCourt, Scott Clark, Patrick Hayes, Alexandra Johnson, George Ke. A Stratified Analysis of Bayesian Optimization Methods.
Learn More
blog.sigopt.com
sigopt.com/research
