Python and AutoML at PyCon JP 2019
[Figure: ML pipeline. Raw Data → Data Cleaning → Feature Preprocessing → Feature Selection → Model Selection]
Automated Machine Learning in Python
PyCon JP 2019
CyberAgent AI Lab
Masashi SHIBATA
c-bata (GitHub) / @c_bata_ (Twitter)
[Figure: ML pipeline. Data Cleaning → Feature Preprocessing → Feature Selection → Model Selection → Parameter Optimization → Model Validation]
Agenda
1. AutoML
2. Automated Hyperparameter Optimization
3. Automated Feature Engineering
4. Automated Algorithm (Model) Selection
[Figure: ML pipeline. Data Cleaning → Feature Preprocessing → Feature Selection → Feature Construction → Model Selection → Parameter Optimization → Model Validation]
Topic 1: AutoML


Jeff Dean, "An Overview of Google's Work on AutoML and Future Directions", ICML 2019.
https://slideslive.com/38917182/an-overview-of-googles-work-on-automl-and-future-directions
Automated Hyperparameter Optimization:
Hyperopt, Optuna, SMAC3, scikit-optimize, …


HPO + Automated Feature Engineering:
featuretools, tsfresh, boruta, …


Automated Algorithm (Model) Selection:
Auto-sklearn, TPOT, H2O, auto_ml, MLBox, …
[Figure: ML pipeline diagram]
Topic 2: Automated Hyperparameter Optimization
Grid Search / Random Search
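Both baselines are available out of the box in scikit-learn. A minimal sketch (the model, grid, and distribution are illustrative; scipy.stats.loguniform assumes SciPy 1.4 or later):

import scipy.stats
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Grid search: exhaustively evaluates every combination in the grid
grid = GridSearchCV(SVC(gamma='auto'), {'C': [0.1, 1, 10, 100]}, cv=3)
grid.fit(X, y)

# Random search: samples a fixed budget of configurations from a distribution
rand = RandomizedSearchCV(
    SVC(gamma='auto'),
    {'C': scipy.stats.loguniform(1e-3, 1e3)},
    n_iter=20, cv=3, random_state=0)
rand.fit(X, y)
print(grid.best_params_, rand.best_params_)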
[Figure: Bayesian optimization of an expensive objective, from Brochu et al.]
Eric Brochu, Vlad M. Cora, and Nando de Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. 2010. arXiv:1012.2599.

Jamieson, K. G. and Talwalkar, A. S. Non-stochastic best arm identification and hyperparameter optimization. In AISTATS, 2016.
[Figure: Successive Halving; nine trials start at 10 epochs, and only the best-performing trials continue to 30 and then 90 epochs]
Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Moritz Hardt, Benjamin Recht, and Ameet Talwalkar. Massively parallel hyperparameter tuning. arXiv preprint arXiv:1810.05934, 2018.


Optuna: TPE sampler; Asynchronous Successive Halving and Median Stopping Rule pruners; Define-by-Run API.
https://github.com/pfnet/optuna
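These pruners plug into the same define-by-run objective. A minimal sketch of intermediate-value reporting and pruning (train_one_epoch is a hypothetical training function; API names assume a recent Optuna release):

import optuna

def objective(trial):
    lr = trial.suggest_loguniform('lr', 1e-5, 1e-1)
    accuracy = 0.0
    for step in range(100):
        accuracy = train_one_epoch(lr, step)  # hypothetical training step
        trial.report(accuracy, step)  # report an intermediate value
        if trial.should_prune():  # the configured pruner (e.g. ASHA) decides
            raise optuna.TrialPruned()
    return accuracy

study = optuna.create_study(
    direction='maximize',
    pruner=optuna.pruners.SuccessiveHalvingPruner())
study.optimize(objective, n_trials=100)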
[Figure: Asynchronous Successive Halving; trials enter rung 0 and are promoted through rungs 1 to 3 when their intermediate loss ranks in the top fraction of their rung]


ASHA makes promotion decisions asynchronously, so it keeps many parallel workers (e.g. 10) busy without synchronization barriers.
scikit-optimize: Gaussian-process-based Bayesian optimization with a simple minimize-style API.
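A minimal scikit-optimize sketch (the toy objective and search space are illustrative):

from skopt import gp_minimize
from skopt.space import Real

def objective(params):
    (x,) = params
    return (x - 2.0) ** 2  # toy 1-D objective to minimize

result = gp_minimize(
    objective,
    dimensions=[Real(-10.0, 10.0, name='x')],
    n_calls=30,      # total number of objective evaluations
    random_state=0)
print(result.x, result.fun)  # best parameters and best value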


[Figure: ML pipeline diagram]
Topic 3: Automated Feature Engineering
AutoML feature preprocessing (TPOT / Auto-sklearn)

1. Feature Preprocessing Operators: StandardScaler, RobustScaler, MinMaxScaler, MaxAbsScaler, RandomizedPCA, Binarizer, and PolynomialFeatures.
2. Feature Selection Operators: VarianceThreshold, SelectKBest, SelectPercentile, SelectFwe, and Recursive Feature Elimination (RFE).

M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter. Efficient and robust automated machine learning. In Neural Information Processing Systems (NIPS), 2015.
R. S. Olson and J. H. Moore. TPOT: A tree-based pipeline optimization tool for automating machine learning. In Workshop on Automatic Machine Learning, 2016.
• featuretools: Deep Feature Synthesis (see the sketch below)
• tsfresh: automated time-series feature extraction
• Feature selection: wrapper methods, filter methods, embedded methods (scikit-learn, Boruta)
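A minimal Deep Feature Synthesis sketch with featuretools (assuming the featuretools 1.x API; the two toy tables are illustrative):

import pandas as pd
import featuretools as ft

customers = pd.DataFrame({
    'customer_id': [1, 2],
    'join_date': pd.to_datetime(['2019-01-01', '2019-02-01'])})
orders = pd.DataFrame({
    'order_id': [10, 11, 12],
    'customer_id': [1, 1, 2],
    'amount': [20.0, 35.0, 50.0]})

es = ft.EntitySet(id='shop')
es = es.add_dataframe(dataframe_name='customers', dataframe=customers, index='customer_id')
es = es.add_dataframe(dataframe_name='orders', dataframe=orders, index='order_id')
es = es.add_relationship('customers', 'customer_id', 'orders', 'customer_id')

# Deep Feature Synthesis stacks aggregation/transform primitives across tables
feature_matrix, feature_defs = ft.dfs(entityset=es, target_dataframe_name='customers')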
[Figure: ML pipeline diagram]








Time Series FeatuRE extraction based on Scalable Hypothesis tests
https://github.com/blue-yonder/tsfresh
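A minimal tsfresh sketch (the column names and values are illustrative):

import pandas as pd
from tsfresh import extract_features
from tsfresh.utilities.dataframe_functions import impute

# long-format time series: one row per (id, time) observation
df = pd.DataFrame({
    'id':    [1, 1, 1, 2, 2, 2],
    'time':  [0, 1, 2, 0, 1, 2],
    'value': [1.0, 2.0, 3.0, 2.0, 2.5, 1.5],
})
X = extract_features(df, column_id='id', column_sort='time')
X = impute(X)  # replace NaN/inf produced by features undefined on short series
# with labels y, tsfresh.select_features(X, y) keeps statistically relevant features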
[Figure: ML pipeline diagram]
Guyon, I. and Elisseeff, A. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.

Filter methods: score each feature with a model-independent statistic (e.g. sklearn.feature_selection.SelectKBest).
Wrapper methods: search feature subsets using model performance; examples: sklearn.feature_selection.RFE (Recursive Feature Elimination), Boruta (boruta_py).
Embedded methods: rely on importances learned during training, e.g. scikit-learn estimators' feature_importances_.
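A minimal RFE sketch with scikit-learn (the dataset and feature count are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = load_breast_cancer(return_X_y=True)
# recursively drop the least important features until 10 remain
selector = RFE(RandomForestClassifier(n_estimators=100), n_features_to_select=10)
selector.fit(X, y)
print(selector.support_)  # boolean mask of the selected features
print(selector.ranking_)  # 1 = selected; larger values were eliminated earlier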
[Figure: ML pipeline diagram]
Topic 4: Automated Algorithm (Model) Selection
scikit-learn's "Choosing the right estimator" cheat-sheet:
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
AutoML as a CASH Problem
Combined Algorithm Selection and Hyperparameter optimization
M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter. Efficient and robust automated machine learning. In Neural Information Processing Systems (NIPS), 2015.
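For reference, the CASH objective as formulated by Feurer et al.: jointly choose an algorithm and its hyperparameters to minimize the average validation loss over K cross-validation folds.

A^*_{\lambda^*} \in \operatorname*{argmin}_{A^{(j)} \in \mathcal{A},\ \lambda \in \Lambda^{(j)}} \frac{1}{K} \sum_{i=1}^{K} \mathcal{L}\left(A^{(j)}_{\lambda},\ D_{\mathrm{train}}^{(i)},\ D_{\mathrm{valid}}^{(i)}\right)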
Using Optuna for CASH Problems

import optuna
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
import sklearn.svm

def objective(trial):
    iris = sklearn.datasets.load_iris()
    x, y = iris.data, iris.target

    # Algorithm selection
    classifier_name = trial.suggest_categorical('classifier', ['SVC', 'RandomForest'])

    # Hyperparameter optimization
    if classifier_name == 'SVC':
        svc_c = trial.suggest_loguniform('svc_c', 1e-10, 1e10)
        classifier_obj = sklearn.svm.SVC(C=svc_c, gamma='auto')
    else:
        rf_max_depth = int(trial.suggest_loguniform('rf_max_depth', 2, 32))
        classifier_obj = sklearn.ensemble.RandomForestClassifier(
            max_depth=rf_max_depth, n_estimators=10)

    score = sklearn.model_selection.cross_val_score(
        classifier_obj, x, y, n_jobs=-1, cv=3)
    accuracy = score.mean()
    return accuracy
https://github.com/pfnet/optuna/blob/v0.16.0/examples/sklearn_simple.py
AutoML frameworks:
• auto-sklearn
• TPOT
• h2o-3
• auto_ml (unmaintained)
• MLBox

Adithya Balaji and Alexander Allen. Benchmarking Automatic Machine Learning Frameworks. https://arxiv.org/pdf/1808.06492v1.pdf
Auto-sklearn: an AutoML system built on SMAC3; winner of the ChaLearn AutoML challenge (2 tracks).
import sklearn.metrics
from sklearn.model_selection import train_test_split

import autosklearn.classification

X_train, X_test, y_train, y_test = train_test_split(…)

automl = autosklearn.classification.AutoSklearnClassifier(…)
automl.fit(X_train.copy(), y_train.copy(), dataset_name='breast_cancer')
print(automl.show_models())

predictions = automl.predict(X_test)
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))
https://github.com/automl/auto-sklearn
TPOT: Tree-based Pipeline Optimization Tool for Automating Data Science
https://github.com/EpistasisLab/tpot
from tpot import TPOTClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(…)

tpot = TPOTClassifier(verbosity=2, max_time_mins=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_iris_pipeline.py')
Auto-sklearn:
• 20 preprocessors
• 16 feature selectors
• one-hot encoding, missing value imputation, balancing, scaling
• 17 classifiers
• pre-defined hyperparameter spaces

TPOT:
• 20 preprocessors
• 12 classifiers
• pre-defined hyperparameter spaces
• flexible pipeline search space: combines tree-shaped pipelines
Automated Neural Architecture Search
[Figure: ML pipeline. Data Cleaning → Feature Preprocessing → Feature Selection → Model Selection → Parameter Optimization → Model Validation]


THANK YOU






