Hyperparameter Optimization
with Hyperband Algorithm
Deep Learning Meetup Italy
● Gilberto Batres-Estrada
Senior Data Scientist @ Trell Technologies
● AIFI: Graduate teaching fellow
● Co-author: Big Data and Machine Learning
in Quantitative Investment, Wiley (chapter on LSTMs)
● MSc in Theoretical Physics, Stockholm University
● MSc in Engineering: Applied Mathematics and Statistics,
KTH Royal Institute of Technology, Stockholm
Goals for today’s talk
1. Make the training of neural networks faster
2. Train better-performing, more accurate neural networks (lower test error)
3. Free up more time for exploring different architectures
Agenda
● Random Search for Hyper-Parameter Optimization
● Bayesian optimization
● Hyperband
● Other methods
● Implementations and examples
Random Search
Proposed by James Bergstra and Yoshua Bengio
http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
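As a rough illustration (not the authors' code), random search simply draws independent configurations from a predefined space and keeps the best one. The search space, sample_config, and evaluate below are hypothetical placeholders.

```python
import random

# Hypothetical search space: each hyperparameter gets its own sampling rule.
def sample_config():
    return {
        "learning_rate": 10 ** random.uniform(-4, -1),          # log-uniform
        "num_hidden_units": random.choice([64, 128, 256, 512]),
        "dropout": random.uniform(0.0, 0.5),
    }

def random_search(evaluate, n_trials=20):
    """Evaluate n_trials i.i.d. configurations and keep the best one.
    `evaluate(config)` is a hypothetical function returning a validation
    loss (lower is better)."""
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        config = sample_config()
        loss = evaluate(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss
```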
Bayesian Optimization
Model the conditional probability p(y | λ),
where y is an evaluation metric such as test error and λ is a set of hyperparameters.
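To make this concrete, here is a minimal sketch of one common instantiation: a Gaussian-process surrogate for p(y | λ) with an expected-improvement acquisition function (in the spirit of the GP-based methods below). The objective function is a toy placeholder standing in for training a model and measuring its test error.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

# Toy 1-D example: lambda is a single hyperparameter in [0, 1];
# objective() stands in for "train the model and return the test error".
def objective(lam):
    return (lam - 0.3) ** 2 + 0.01 * np.random.randn()

def expected_improvement(gp, candidates, best_y):
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (best_y - mu) / sigma            # improvement over current best (minimization)
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)

X = np.random.rand(3, 1)                 # a few initial random configurations
y = np.array([objective(x[0]) for x in X])

for _ in range(20):
    gp = GaussianProcessRegressor().fit(X, y)      # surrogate for p(y | lambda)
    candidates = np.random.rand(1000, 1)
    ei = expected_improvement(gp, candidates, y.min())
    x_next = candidates[np.argmax(ei)]             # most promising candidate
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next[0]))

print("best lambda:", X[np.argmin(y)][0], "best loss:", y.min())
```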
Sequential Model-Based Algorithm Configuration (SMAC)
SMAC uses random forests to model p(y | λ)
as a Gaussian distribution (Hutter et al., 2011)
Tree Structured Parzen Estimator (TPE)
TPE is a non-standard Bayesian optimization algorithm based on tree-structured
Parzen density estimators (Bergstra et al., 2011)
Spearmint
Uses Gaussian Processes (GPs) to model p(y | λ)
and performs slice sampling over the GP's hyperparameters (Snoek et al., 2012)
Hyperband
Hyperband
Successive Halving
Hyperband extends Successive Halving (Jamieson and Talwalkar, 2016) and uses it as a
subroutine:
● Uniformly allocate a budget to a set of hyperparameter configurations
● Evaluate the performance of all configurations
● Throw out the worst half
● Repeat until one configuration remains
The algorithm allocates exponentially more resources to more promising configurations.
Lisha Li et al. (2018) http://jmlr.org/papers/volume18/16-558/16-558.pdf
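The bullet points above translate almost directly into code. The following is a hedged sketch, where evaluate(config, budget) is a hypothetical function that trains one configuration on the given budget and returns its validation loss.

```python
import math

def successive_halving(configs, evaluate, total_budget):
    """Repeatedly train all surviving configurations on a uniformly allocated
    per-round budget and discard the worst half, until one configuration remains.
    `evaluate(config, budget)` is a hypothetical training/evaluation function."""
    n_rounds = int(math.ceil(math.log2(len(configs))))
    while len(configs) > 1:
        budget_per_config = total_budget / (len(configs) * n_rounds)   # uniform allocation this round
        losses = [evaluate(c, budget_per_config) for c in configs]
        ranked = sorted(zip(losses, range(len(configs))))              # sort by loss, keep indices
        configs = [configs[i] for _, i in ranked[: len(configs) // 2]]  # keep the better half
    return configs[0]
```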
Hyperband
● get_hyperparameter_configuration(n): returns a set of n i.i.d. samples from some
distribution defined over the hyperparameter configuration space. Here, the hyperparameters are sampled uniformly
from a predefined space (a hypercube with min and max bounds for each hyperparameter).
● run_then_return_val_loss(t, r): a function that takes a hyperparameter configuration t
and resource allocation r as input and returns the validation loss after training the configuration for the
allocated resources.
● top_k(configs, losses, k): a function that takes a set of configurations as well as their
associated losses and returns the top k performing configurations.
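These three subroutines could look roughly as follows; the search space SPACE and the train_and_evaluate helper are hypothetical placeholders, not code from the paper.

```python
import random

# Hypothetical search space: min/max bounds per hyperparameter (a hypercube).
SPACE = {"learning_rate": (1e-4, 1e-1), "dropout": (0.0, 0.5)}

def get_hyperparameter_configuration(n):
    """n i.i.d. uniform samples from the hypercube defined by SPACE."""
    return [{name: random.uniform(lo, hi) for name, (lo, hi) in SPACE.items()}
            for _ in range(n)]

def run_then_return_val_loss(t, r):
    """Train configuration t for r units of resource (e.g. epochs) and return
    the validation loss; train_and_evaluate is a hypothetical helper."""
    return train_and_evaluate(t, budget=r)

def top_k(configs, losses, k):
    """Return the k configurations with the smallest losses."""
    order = sorted(range(len(configs)), key=lambda i: losses[i])
    return [configs[i] for i in order[:k]]
```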
Hyperband: Implementation
Lisha Li et al. (2018) http://jmlr.org/papers/volume18/16-558/16-558.pdf
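The pseudocode in Li et al. (2018) combines these pieces. Below is a rough Python transcription (a sketch, not the authors' code) that takes the three subroutines above as arguments; max_resource (R) is the maximum budget a single configuration can receive (e.g. epochs) and eta is the halving factor.

```python
import math

def hyperband(get_hyperparameter_configuration, run_then_return_val_loss, top_k,
              max_resource=81, eta=3):
    """Sketch of Hyperband following the pseudocode in Li et al. (2018)."""
    s_max = int(math.floor(math.log(max_resource, eta)))
    B = (s_max + 1) * max_resource                     # total budget per bracket

    best_config, best_loss = None, float("inf")
    for s in reversed(range(s_max + 1)):               # each s is one bracket
        n = int(math.ceil(B / max_resource * eta ** s / (s + 1)))   # initial number of configs
        r = max_resource * eta ** (-s)                 # initial resource per config
        configs = get_hyperparameter_configuration(n)
        for i in range(s + 1):                         # successive halving inside the bracket
            n_i = int(math.floor(n * eta ** (-i)))
            r_i = r * eta ** i
            losses = [run_then_return_val_loss(t, r_i) for t in configs]
            # keep track of the best configuration seen anywhere
            for t, loss in zip(configs, losses):
                if loss < best_loss:
                    best_config, best_loss = t, loss
            configs = top_k(configs, losses, max(int(math.floor(n_i / eta)), 1))
    return best_config, best_loss
```

Brackets with large s start with many configurations on a small budget (aggressive early stopping); the bracket with s = 0 is plain random search on the full budget, which is what lets Hyperband hedge against the risk of stopping slow starters too early.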
Finding the right hyperparameter configuration
Takeaways from Figure 2: more resources are needed to differentiate between two configurations when either:
1. The envelope functions are wider
2. The terminal losses are closer together
Lisha Li et al. (2018) http://jmlr.org/papers/volume18/16-558/16-558.pdf
Example from the Paper: LeNet
Example from the Paper: LeNet, Parameter Space
Experiment in the Paper
CNN used in Snoek et al. (2012) and Domhan et al. (2015)
Datasets
● CIFAR-10 (40k, 10k, 10k)
● Rotated MNIST with Background images (MRBI)
(Larochelle et al., 2007) (10k, 2k, 50k)
● Street View House Numbers (SVHN) (600k, 6k, 26k)
Keras Tuner: Hyperparameter search
https://keras-team.github.io/keras-tuner/
Source code for Hyperband:
https://github.com/keras-team/keras-tuner/blob/master/kerastuner/tuners/hyperband.py
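A minimal usage sketch, assuming the kerastuner package as in the linked repository (argument names may differ in newer keras_tuner releases); the model architecture and dataset are illustrative only.

```python
from tensorflow import keras
from kerastuner.tuners import Hyperband   # package layout as in the linked repo

# build_model receives a HyperParameters object and returns a compiled model.
def build_model(hp):
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(hp.Int("units", min_value=32, max_value=512, step=32),
                           activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

tuner = Hyperband(build_model,
                  objective="val_accuracy",
                  max_epochs=30,          # R, the maximum resource per configuration
                  factor=3,               # eta, the halving factor
                  directory="hyperband_demo",
                  project_name="mnist")

# Assuming (x_train, y_train) and (x_val, y_val) are loaded, e.g. from keras.datasets.mnist:
# tuner.search(x_train, y_train, validation_data=(x_val, y_val))
# best_model = tuner.get_best_models(num_models=1)[0]
```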
Other Methods: Cyclical Learning Rate
Leslie N. Smith
https://arxiv.org/pdf/1506.01186.pdf
Cyclical Learning Rate (CLR)
Torch:
Learning Rate Scheduler tf.keras
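A minimal sketch of a triangular CLR schedule (Smith, arXiv:1506.01186) plugged into tf.keras via the LearningRateScheduler callback. For simplicity it steps once per epoch, whereas the paper cycles the rate per iteration; in PyTorch, torch.optim.lr_scheduler.CyclicLR provides a built-in equivalent.

```python
import numpy as np
import tensorflow as tf

# Triangular cyclical learning rate: oscillates linearly between base_lr and
# max_lr with period 2 * step_size (here measured in epochs).
def make_triangular_clr(base_lr=1e-4, max_lr=1e-2, step_size=5):
    def schedule(epoch, lr=None):
        cycle = np.floor(1 + epoch / (2 * step_size))
        x = np.abs(epoch / step_size - 2 * cycle + 1)
        return float(base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x))
    return schedule

clr_callback = tf.keras.callbacks.LearningRateScheduler(make_triangular_clr())
# model.fit(x_train, y_train, epochs=30, callbacks=[clr_callback])
```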
References
Gilberto Batres-Estrada
+46703387868
gilberto.batres-estrada@live.com
Repository https://github.com/gilberto-BE/deep_learning_italia
Cyclical Learning Rate: https://arxiv.org/pdf/1506.01186.pdf
Random Search: http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
Hyperband: Lisha Li et al. (2018) http://jmlr.org/papers/volume18/16-558/16-558.pdf
Keras Tuner: https://keras-team.github.io/keras-tuner/
Learning Rate Scheduler: fastai (PyTorch high-level API) https://docs.fast.ai/callbacks.one_cycle.html
Source code for Hyperband: https://github.com/keras-team/keras-tuner/blob/master/kerastuner/tuners/hyperband.py
