FLAML: A Fast Library for AutoML & Tuning
github.com/microsoft/FLAML
Chi Wang1, Qingyun Wu2, Susan Xueqing Liu3, Luis Quintanilla1
1. Microsoft
2. Penn State University
3. Stevens Institute of Technology
Copyright © 2022, by tutorial authors
What Will You Learn
How to use FLAML to (1) find accurate ML models with low computational
resources for common ML tasks; (2) tune hyperparameters generically
How to leverage the flexible and rich customization choices
Finish the last mile for deployment
Create new applications
Code examples, demos, use cases
Research & Development opportunities
Agenda
First half (9:30 AM – 11:10 AM)
• Overview of AutoML and FLAML (9:30 AM)
• Task-oriented AutoML with FLAML (9:45 AM)
• ML.NET demo (10:30 AM)
• Tune user defined functions with FLAML (10:40 AM)
Break, Q & A
Second half (11:20 AM – 12:30 PM)
• Zero-shot AutoML (11:20 AM)
• Time series forecasting
• Natural language processing (11:45 AM)
• Online AutoML (12:00 PM)
• Fair AutoML
• Challenges and open problems
Overview
7 Cross-industry Disruptive Trends in the Next Few Decades (McKinsey, 2021)
• Applied AI: >75% of all digital-service touch points will see improved usability, enriched personalization, and increased conversion
• Future of programming: ~30x reduction in the working time required for software development and analytics
Companies need to master MLOps to make the most of both trends
Machine Learning Workflow
Get the data → Prepare the data → Train → Evaluate → Deploy → Inferencing
(Model training: Get the data through Evaluate. Model consumption: Deploy and Inferencing.)
Lots of decisions to make to optimize model performance:
• Select learner
• Tune hyperparameters, for example:
o n_estimators = 5, n_leaves = 10, learning_rate = 0.1, …
o n_estimators = 1000, n_leaves = 100, learning_rate = 0.01, …
o …
• …
AutoML
[Diagram: AutoML overlaid on the ML workflow]
Market size forecast in 2030: AutoML $14.8 billion; autonomous data platform $4.8 billion
Benefits of AutoML
Enables and empowers novices
Standardizes the ML workflow for better reproducibility, code
maintainability, knowledge sharing
Prevents suboptimal results due to idiosyncrasies of ML Innovators
Builds models more effectively and efficiently
Enables rapid prototyping
Fosters learning
[Xin et al. 2021] Xin, D., Wu, E. Y., Lee, D. J.-L., Salehi, N., and Parameswaran, A. Whither AutoML? Understanding the Role of Automation in Machine Learning Workflows. CHI 2021.
Deficiencies of AutoML
• Lacks comprehensive end-to-end support → Offer both AutoML and Tune API & extensible
• Causes system failures due to compute-intensive workloads (no usable model returned; users revert to manual development) → Fast, Economical
• Lacks customizability (or users need to make too many hard choices) → Smooth customization
• Lacks transparency and interpretability
FLAML:
• Offers both AutoML and Tune APIs, and is extensible
• Fast
• Economical
• Smooth customization
Python Library flaml
Task-oriented AutoML
Customization is easy
Tune user-defined function
Economical Tuning & AutoML
• Leverage the joint impact of multiple factors on cost & error:
o Hyperparameter
o Learner
o Sample size
o Resampling strategy: CV or holdout
Fast library for AutoML and tuning [MLSys’21]
• Online tuning: ChaCha (Champion-Challenger) [ICML’21]
• Offline tuning: BlendSearch (local + global search) [ICLR’21]
• Meta learning: Zero-shot [KDD-AutoML’22]
• Cost-frugal optimization [AAAI’21]: simple and provable local search
Users’ Background
[Charts: users by job title and by years of programming experience]
Example Use Cases (application: field)
• AutoML tool for .NET developers (Visual Studio, ML.NET): SDE
• Suspicious behavior detector: Security
• Credit assessment: Finance
• Auto finance functions (classification/regression): Finance
• Hardware demand forecast: Supply chain
• A/B testing with automated causal inference (auto-causality): CRM
• Air quality estimate: Science
• Pricing: Insurance
Impact: Accuracy, Productivity, R&D
Blog Post in Towards Data Science
Overview of AutoML and Tune in FLAML
• Task-oriented AutoML
• Tune user-defined function
Agenda and Resources to Be Used in This Tutorial
https://github.com/microsoft/FLAML
https://github.com/microsoft/FLAML/tree/tutorial/tutorial
Task-oriented AutoML
What Is Task-oriented AutoML
Inputs:
• Resource budget
• ML task
o Training data
o Task type
[Diagram: AutoML takes over the model-training steps (Prepare the data → Train → Evaluate), before model consumption (Deploy → Inferencing)]
Task-oriented AutoML: Supported Tasks
Required data format:
• X: NumPy array or DataFrame
• y: NumPy array or Series of labels in shape n × 1
• Time series forecasting tasks
• NLP tasks
Task-specific Built-in ML Estimators
• Classification/regression: XGBoost, RandomForest, CatBoost, …
• NLP: Transformer
• Time series forecasting: Prophet, ARIMA, …
Task-oriented AutoML: A Basic Use Case
• Get data
• AutoML with FLAML
• AutoML Result
Benefits of Task-oriented AutoML in FLAML
Save manual effort (human resources)
Effectiveness and efficiency: use small computational resources to find models with good performance
Figure 1. Box plot of scaled score difference between FLAML and other libraries when FLAML uses an equal or smaller budget (a positive difference means FLAML is better). [MLSys’21]
Rich customization choices in FLAML
Helpful in scenarios where, due to business/deployment requirements, you need to use
- A special ML learner (e.g., a domain-specific one), and/or
- Custom metrics, and/or
- Various constraints (e.g., on computational resources, model complexity, inference time)
- …
Task-oriented AutoML: Overview
Empowered by our cost-frugal Hyperparameter Optimization (HPO) algorithms
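Inputs: resource budget, ML task (and more). Outputs: ML model.
• Split data into $D_{train}$ and $D_{val}$
• Estimator selection: choose an estimator $m$
• Hyperparameter selection: choose a config for $m$, giving the configured estimator $m_c$
• Train and validate: $m_c.fit(D_{train})$, then $m_c.predict(D_{val})$ yields $loss_{m_c}$, which feeds back into the next estimator and hyperparameter selection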
The Cost-Frugal HPO Problem
• $f(x)$: validation loss under hyperparameter configuration $x$
• Each time you query $f(x)$, you need to pay a cost of $g(x)$
• Two important properties:
o f(x) is a black-box function
o Function value evaluation is expensive
• Vanilla HPO: find $x^* = \arg\min_x f(x)$ with a small number of iterations
• Cost-frugal HPO: find $x^*$ while keeping the total cost $\sum_i g(x_i)$ small
Insights On The Cost-Frugal HPO Problem
• Vanilla HPO: find $x^*$ with a small number of iterations
• Cost-frugal HPO: find $x^*$ while keeping the total cost $\sum_i g(x_i)$ small
1. If $g(x)$ is constant, low total cost ⇔ a small number of iterations. Bayesian optimization optimizes for this case.
2. It is common to encounter cost-related hyperparameters (examples: the number of estimators and leaves in gradient boosted trees; the number of layers and neurons in DNNs), so $g(x)$ varies widely across the search space.
[Figure: cost $g(\#leaves, \#estimators)$ over the search space, ranging from low cost to high cost]
Assumptions about Cost-related HPs
1. Lipschitz continuous
2. Easy to know low-cost configurations before we start the optimization
[Figure: cost surface ranging from a cheap, low-cost region to expensive configurations]
Cost-Frugal HPO Algorithm Design
To avoid high-cost points until necessary → low-cost starting point + local search
To find low-loss points → follow loss-descent directions
We cannot directly use a gradient-based method: no gradient is available.
Surprise: function values are enough to find the right directions for efficient convergence.
More surprise: a sign comparison between function values is enough!
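A Cost-Frugal HPO Algorithm (CFO) [AAAI’21]
Repeat the following steps after each move:
1. Uniformly sample a direction from a local unit sphere;
2. Compare the loss of the point in that direction with the loss of the incumbent;
3. Move (and break) if it is better, or try the opposite direction;
4. Move (and break) if the opposite point is better, or stay.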
Implications:
At any iteration,
• The incumbent has the best loss found so far
• The point evaluated in the next iteration is in the neighborhood of the incumbent ⇒ its cost is not far from the incumbent's cost
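The loop above can be sketched in plain Python. This is a simplified illustration of the idea, not FLAML's implementation: it uses a fixed step size (CFO as published also adapts its step size), and `cfo_sketch`, `toy_loss`, and `toy_cost` are hypothetical names, with a made-up cost that grows with both coordinates the way cost grows with #leaves and #estimators. Only the outcome of each loss comparison decides whether to move, and the search starts from the low-cost corner.

```python
import math
import random


def sample_unit_direction(dim, rng):
    """Uniformly sample a direction on the unit sphere in `dim` dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    return [c / norm for c in v]


def cfo_sketch(loss, cost, x0, step=0.1, iters=200, seed=0):
    """Simplified CFO-style local search driven only by loss comparisons.

    Starts from the low-cost point x0 and returns
    (incumbent, incumbent_loss, total_cost_of_all_evaluations).
    """
    rng = random.Random(seed)
    x, fx = list(x0), loss(x0)
    total_cost = cost(x0)
    for _ in range(iters):
        u = sample_unit_direction(len(x), rng)  # step 1: random direction
        for sign in (1.0, -1.0):  # step 3: fall back to the opposite direction
            proposal = [xi + sign * step * ui for xi, ui in zip(x, u)]
            f_new = loss(proposal)
            total_cost += cost(proposal)
            if f_new < fx:  # step 2: only the comparison result matters
                x, fx = proposal, f_new
                break  # move (and break)
        # step 4: if neither direction was better, stay at the incumbent
    return x, fx, total_cost


def toy_loss(x):
    """Hypothetical validation loss with its minimum at (0.6, 0.4)."""
    return (x[0] - 0.6) ** 2 + (x[1] - 0.4) ** 2


def toy_cost(x):
    """Hypothetical evaluation cost that grows with both hyperparameters."""
    return 1.0 + 10.0 * max(x[0], 0.0) * max(x[1], 0.0)


if __name__ == "__main__":
    best, best_loss, spent = cfo_sketch(toy_loss, toy_cost, x0=[0.0, 0.0])
    print("incumbent:", best, "loss:", best_loss, "total cost:", spent)
```

Because every proposal is one small step from the incumbent, each evaluation costs roughly what the incumbent costs, so expensive regions are reached only once the loss comparisons lead there.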
A Cost-Frugal HPO Algorithm (CFO) [AAAI’21]
• Theoretical guarantees on:
o Convergence rate
o The total evaluation cost
[Figures: search trajectory moving from high loss to low loss, and through regions from low cost to high cost]
Combine local search and global search
• Local search (LS): low cost; may get trapped in local optima
• Global search: able to explore the whole space; high cost
Framework
• Economical Hyperparameter Optimization With Blended Search Strategy.
Chi Wang, Qingyun Wu, Silu Huang, Amin Saied. ICLR 2021.
An Example Use Case That Needs Customization
Application domain: Security
Overall objective: Find a good model to detect suspicious behaviors.
Comments about customization (in blue).
Comments about performance (in green).
“
• I am able to use *** classifiers as the custom learner and use search space for random forest,
lr and lightgbm.
…
• It is useful for me to optimize the hyperparameters in a short time. So I appreciate that, and
to be able to customize the metrics. My job is to create detectors using models, my last few
models were optimized using flaml. It is a corporate level product.
…
• Adding 0.5% positive precision increase to "suspicious behavior detector" while adding 3.5% more
true positives (positive recall). Adding 0.8% positive precision increase to "suspicious remote behavior
detector" while adding 24% more true positives (positive recall). Contributed to ~2000 detections
weekly for both detectors above.
”
Task-oriented AutoML: Optimization Metric
[Diagram: the task-oriented AutoML loop, highlighting the loss computed on $D_{val}$, i.e., the optimization metric]
“
I appreciate that I am able to customize the metrics. My job is to create
detectors using models, my last few models were optimized using flaml. ”
Built-in Optimization Metrics
• metric
User-defined Metric Function
• metric
[Figure: a user-defined metric function returns the optimization metric and the metrics to log]
“It does allow us to build our models fast; esp. it allows us to control overfitting.
We did observe that the models we built using the packages are simpler (in terms of #trees,
depth, and #features in the models) and more robust (over time), compared to other models
we built in the past using similar tools like ***(another HPO service).”
The Power of a User-defined Metric Function
“Closing the gap between the loss function we optimize in ML and the product metrics we
really want to optimize.”
--Carlos Guestrin at KDD ’19, on 4 Perspectives in Human-Centered Machine Learning
• User-defined metric functions in FLAML allow creative metrics toward the ultimate objectives:
o Metrics beyond typical ML predictive performance metrics
o For example, business objectives such as profit or revenue
Concrete examples:
1. A heuristic objective to control overfitting: $obj = Loss_{val} \cdot (1 + \alpha) - \alpha \cdot Loss_{train}$
2. Integrating business optimization with a machine learning model
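A user-defined metric for the first concrete example could look like the sketch below. The positional arguments follow the custom-metric signature described in FLAML's documentation (X_val, y_val, estimator, labels, X_train, y_train, weight_val, weight_train, with further arguments absorbed by `*args`), and the function returns the value to minimize together with a dict of metrics to log. To stay dependency-free, the log loss comes from a hypothetical helper `_log_loss` and sample weights are ignored; `alpha = 0.5` and the `ConstantModel` stand-in are illustrative assumptions.

```python
import math
import time


def _log_loss(y_true, proba, labels, eps=1e-15):
    """Mean negative log-likelihood; column j of `proba` belongs to labels[j]."""
    column = {label: j for j, label in enumerate(labels)}
    total = 0.0
    for y, row in zip(y_true, proba):
        p = min(max(row[column[y]], eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


def custom_metric(X_val, y_val, estimator, labels, X_train, y_train,
                  weight_val=None, weight_train=None, *args, alpha=0.5):
    """Overfitting-aware objective: Loss_val * (1 + alpha) - alpha * Loss_train.

    Returns (value_to_minimize, metrics_to_log). Sample weights are ignored here.
    """
    start = time.perf_counter()
    val_proba = estimator.predict_proba(X_val)
    pred_time = (time.perf_counter() - start) / len(X_val)  # per-row prediction time
    val_loss = _log_loss(y_val, val_proba, labels)
    train_loss = _log_loss(y_train, estimator.predict_proba(X_train), labels)
    obj = val_loss * (1 + alpha) - alpha * train_loss
    return obj, {"val_loss": val_loss, "train_loss": train_loss, "pred_time": pred_time}


class ConstantModel:
    """Hypothetical stand-in for a fitted binary classifier."""

    def predict_proba(self, X):
        return [[0.8, 0.2] for _ in X]


if __name__ == "__main__":
    obj, logged = custom_metric([[1], [2]], [0, 0], ConstantModel(), [0, 1],
                                [[1], [2], [3]], [0, 0, 1])
    print(obj, logged)
```

With FLAML installed, such a function would be passed as `metric=custom_metric` to `AutoML.fit()`. The objective equals $Loss_{val} + \alpha (Loss_{val} - Loss_{train})$, so a large train/validation gap is penalized.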
Task-oriented AutoML: Estimators and Search Space
“I am able to use *** classifiers as the custom learner and use search space for random
forest, lr and lightgbm.”
[Diagram: the task-oriented AutoML loop, highlighting the candidate estimators (estimator selection) and the hyperparameter search space (hyperparameter selection)]
Estimators
estimator_list:
• Use built-in estimators with the default search space
o Classification/regression tasks: "lgbm", "xgboost", "rf", "extra_tree", "catboost", "lrl2"/"lrl1", "kneighbor"
o Time series forecasting tasks: "prophet", "arima", "sarimax"
o NLP tasks: "transformer"
• Add custom estimators
Adding a Custom Estimator
1. Build a custom estimator by inheriting from flaml.model.BaseEstimator or a class derived from it.
2. Give the custom estimator a name and add it to AutoML.
3. Tune the newly added custom estimator depending on your needs.
Search Space
Each hyperparameter is associated with a dict with the following fields:
• "domain": the possible values of the hyperparameter and their distribution.
• "init_value" (optional): the initial value of the hyperparameter.
• "low_cost_init_value" (optional): the value of the hyperparameter that is associated with low computation cost.
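The three fields can be illustrated with plain dicts. In the sketch below, domains are written as hypothetical `(kind, low, high)` tuples (FLAML itself uses `flaml.tune` sampling objects for `domain`), the numbers are only loosely modeled on a gradient-boosting search space, and `starting_config` and `cost_related` are hypothetical helpers, not FLAML functions. The helper starts each hyperparameter at `low_cost_init_value` if given, else at `init_value`, else at the midpoint of its range, so cost-related hyperparameters begin at their cheap end.

```python
# Hypothetical plain-dict search space; (kind, low, high) tuples stand in for
# flaml.tune domains such as tune.lograndint(...) or tune.loguniform(...).
SEARCH_SPACE = {
    "n_estimators": {"domain": ("int", 4, 32768), "init_value": 4, "low_cost_init_value": 4},
    "max_leaves": {"domain": ("int", 4, 32768), "init_value": 4, "low_cost_init_value": 4},
    "learning_rate": {"domain": ("float", 1 / 1024, 1.0), "init_value": 0.1},
    "subsample": {"domain": ("float", 0.1, 1.0)},
}


def starting_config(space):
    """Pick the first config: low_cost_init_value, else init_value, else the range midpoint."""
    config = {}
    for name, spec in space.items():
        kind, low, high = spec["domain"]
        if "low_cost_init_value" in spec:
            value = spec["low_cost_init_value"]  # cheap end of a cost-related hyperparameter
        elif "init_value" in spec:
            value = spec["init_value"]
        else:
            value = (low + high) / 2  # arbitrary fallback chosen for this sketch
            if kind == "int":
                value = round(value)
        if not low <= value <= high:
            raise ValueError(f"starting value for {name!r} lies outside its domain")
        config[name] = value
    return config


def cost_related(space):
    """Names of hyperparameters that declare a low-cost starting value."""
    return sorted(name for name, spec in space.items() if "low_cost_init_value" in spec)


if __name__ == "__main__":
    print(starting_config(SEARCH_SPACE))
    print(cost_related(SEARCH_SPACE))
```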
Cost-related Hyperparameter in the Search Space
Customize the Search Spaces (of Existing Estimators)
Option 1. Create a new estimator with a revised search space.
Option 2. A shortcut to override the search space of existing estimators
via custom_hp.
Customize the Search Spaces (of Existing Estimators)
• Using a different search range for "n_estimators"
• Disabling search by setting "domain" to None
• Setting a constant value
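A sketch of how the three kinds of override could combine with a default search space. It rests on assumptions that reflect our reading rather than a documented contract: each entry in `custom_hp[estimator_name]` replaces the whole spec for that hyperparameter, a spec whose `domain` is `None` is dropped from the search, and a non-range `domain` is used as a constant. Domains are hypothetical `(kind, low, high)` tuples instead of `flaml.tune` objects, and `apply_custom_hp` is a hypothetical helper.

```python
def apply_custom_hp(default_space, custom_hp, estimator_name):
    """Hypothetical sketch of merging custom_hp overrides into one estimator's space.

    Assumptions: an override replaces the whole spec of that hyperparameter,
    "domain": None drops it from the search, and a non-tuple domain is a constant.
    Returns (searched_specs, constant_values).
    """
    merged = dict(default_space)
    merged.update(custom_hp.get(estimator_name, {}))  # whole-spec replacement
    searched, constants = {}, {}
    for name, spec in merged.items():
        domain = spec["domain"]
        if domain is None:
            continue  # search disabled; the estimator's own default would apply
        if isinstance(domain, tuple):
            searched[name] = spec  # still searched over its (kind, low, high) range
        else:
            constants[name] = domain  # fixed value, not searched
    return searched, constants


# Illustrative default space; tuples stand in for flaml.tune domains.
DEFAULT_SPACE = {
    "n_estimators": {"domain": ("lograndint", 4, 32768), "low_cost_init_value": 4},
    "max_leaves": {"domain": ("lograndint", 4, 32768), "low_cost_init_value": 4},
    "subsample": {"domain": ("uniform", 0.1, 1.0), "init_value": 1.0},
}

CUSTOM_HP = {
    "xgboost": {
        # a different search range for n_estimators
        "n_estimators": {"domain": ("lograndint", 100, 1000), "low_cost_init_value": 100},
        # disable search for max_leaves
        "max_leaves": {"domain": None},
        # pin subsample to a constant
        "subsample": {"domain": 0.8},
    }
}

if __name__ == "__main__":
    print(apply_custom_hp(DEFAULT_SPACE, CUSTOM_HP, "xgboost"))
```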
Task-oriented AutoML: ML Procedure and Ensemble
[Diagram: the task-oriented AutoML loop, highlighting the data split and the fit/predict steps]
• eval_method: a string specifying the resampling strategy, one of ['auto', 'cv', 'holdout']
• split_ratio, n_splits
• split_type: one of ['auto', 'stratified', 'uniform', 'time', 'group']
• X_val, y_val
• fit_kwargs: additional keyword arguments to pass to the fit() function of the candidate learners, such as sample_weight
• fit_kwargs_by_estimator: user-specified keyword arguments, grouped by estimator name
Task-oriented AutoML: Advanced Functionalities
• Required inputs
o X_train, y_train, task, time_budget/max_iter
• Logging
o log_file_name: a string of the log file name
o log_type: "better" or "all"
• Constraints
o train_time_limit
o pred_time_limit
o metric_constraints
• Warm start
o starting_points
• Parallel tuning [Xin et al. 2021]
o n_concurrent_trials
Task-oriented AutoML: Logging
{"record_id": 12, "iter_per_learner": 2, "logged_metric": {"pred_time": 3.085368373117264e-07},
"trial_time": 0.056574344635009766, "wall_clock_time": 1.4277665615081787, "validation_loss":
0.40724474179893333, "config": {"n_estimators": 4, "max_leaves": 4, "learning_rate": 0.03859136192132082,
"subsample": 1.0, "colsample_bylevel": 0.8148474110627004, "colsample_bytree": 0.9777234800442423,
"reg_alpha": 0.0009765625, "reg_lambda": 5.525802807180917, "min_child_weight": 0.01199969653421202,
"FLAML_sample_size": 10000}, "learner": "xgboost", "sample_size": 10000}
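Because each record is a single JSON object, a log written via log_file_name can be inspected with the standard library alone. The sketch below assumes one record per line (JSON Lines), matching the record format shown above, and skips any line without a "validation_loss" key in case the file also holds bookkeeping entries; `read_records`, `best_record`, and `best_loss_per_learner` are hypothetical helpers (FLAML also provides its own log utilities, which are not used here).

```python
import json


def read_records(lines):
    """Parse JSON Lines, keeping only trial records that carry a validation loss."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "validation_loss" in record:  # skip any bookkeeping lines
            records.append(record)
    return records


def best_record(records):
    """Trial record with the lowest validation loss."""
    return min(records, key=lambda r: r["validation_loss"])


def best_loss_per_learner(records):
    """Lowest validation loss observed for each learner."""
    best = {}
    for r in records:
        learner, loss = r["learner"], r["validation_loss"]
        if learner not in best or loss < best[learner]:
            best[learner] = loss
    return best


# Usage (hypothetical file name):
#   with open("automl.log") as f:
#       top = best_record(read_records(f))
#       print(top["learner"], top["validation_loss"], top["config"])
```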
Task-oriented AutoML: Logging with MLFlow
Task-oriented AutoML: More Constraints
• Constraints
o train_time_limit: training time constraint in seconds
o pred_time_limit: prediction latency constraint in seconds (average prediction time per row)
o metric_constraints: a list of constraints on certain metrics
Constraints of 4 Different Types
1. Constraints on the AutoML process: time_budget, max_iter
2. Constraints on the constructor arguments of the estimators
3. Constraints on the models tried in AutoML: train_time_limit and/or pred_time_limit
4. Constraints on the metrics of the ML models tried in AutoML: metric_constraints
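FLAML expresses metric_constraints as a list of 3-tuples: a metric name, a sign chosen from ">=" and "<=", and a threshold. The sketch below only illustrates what such a list means for one trial's logged metrics (for example, the dict returned by a user-defined metric function); `satisfies` and `feasible` are hypothetical helpers, and FLAML's internal handling of constraint violations during the search is not modeled.

```python
import operator

_SIGNS = {">=": operator.ge, "<=": operator.le}


def satisfies(logged_metrics, metric_constraints):
    """True if every (name, sign, threshold) constraint holds for one trial."""
    for name, sign, threshold in metric_constraints:
        if sign not in _SIGNS:
            raise ValueError(f"unsupported sign {sign!r}; use '>=' or '<='")
        if name not in logged_metrics:
            raise KeyError(f"metric {name!r} was not logged for this trial")
        if not _SIGNS[sign](logged_metrics[name], threshold):
            return False
    return True


def feasible(trials, metric_constraints):
    """Keep the (config, logged_metrics) pairs that meet every constraint."""
    return [t for t in trials if satisfies(t[1], metric_constraints)]


if __name__ == "__main__":
    # Hypothetical constraints: precision >= 0.9 and per-row prediction time <= 10 microseconds.
    constraints = [("precision", ">=", 0.9), ("pred_time", "<=", 1e-5)]
    trials = [
        ({"n_estimators": 4}, {"precision": 0.95, "pred_time": 2e-6}),
        ({"n_estimators": 512}, {"precision": 0.97, "pred_time": 4e-5}),
    ]
    print(feasible(trials, constraints))  # only the first trial is feasible
```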
Task-oriented AutoML: Warm Start
• Warm start
o starting_points: starting hyperparameter config for the estimators
Warm Start
starting_points:
A dictionary or a str to specify the starting hyperparameter config for the estimators | default="data".
If str:
- if "data", use data-dependent defaults;
- if "data:path", use data-dependent defaults which are stored at path;
- if "static", use data-independent defaults.
If dict, keys are the estimator names and values are the starting hyperparameter configs for those estimators.
(Data-dependent defaults will be covered later in zero-shot AutoML.)
Task-oriented AutoML: Parallel Tuning
• Parallel tuning
o n_concurrent_trials: the number of concurrent trials
Parallel Tuning
[Diagram: AutoML and Tune can run trials sequentially or in parallel; parallel backends: Ray, NNI]
Easy Parallel Tuning Is a Desirable Feature
So I work in Research and that entire application I wrote myself over the last 9 months. My job
(and the teams) currently is to research a next state application stack for our existing products. In
particular the models out of this product *** have been implemented in FLAML (this is our own
GLM and GBM) which is exactly why we liked FLAML because it allowed us to easily extend the
estimators, something *** (an alternative HPO service) does not allow. Anyway the reason we
love FLAML and Ray is to get large scale parallel tuning, something the product team are
struggling with because it’s desktop and C# based, so to have FLAML out of the box
distribute across an auto scaled cluster removes at least 18 months work for us.
-- An S&P 500 company
Parallel vs Sequential Tuning
Heads-up: Parallel tuning is not necessarily more desirable than sequential tuning.
Things to Consider:
• Different overhead and trial time
For example, parallel tuning uses the Ray backend, which incurs a larger computation overhead than sequential tuning.
• Availability of parallel resources
• Different randomness
Find more in this doc:
https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#parallel-tuning
Parallel vs Sequential Tuning
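A rough estimation of the wall-clock time needed to finish N trials:
• Sequential tuning: $N \cdot \text{SingleTrialTime}$
• Parallel tuning: $N \cdot (\text{SingleTrialTime} + \text{Overhead}) / k$
where Overhead is the computation overhead, k is the scale of parallelism, and SingleTrialTime is the trial time to evaluate a particular hyperparameter configuration.
Example: with k (scale of parallelism) = 8, SingleTrialTime ≈ 0.3, and Overhead (parallel tuning) ≈ 2.6, sequential tuning is faster.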
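As a rough model, assume sequential tuning of N trials takes about N × SingleTrialTime and parallel tuning about N × (SingleTrialTime + Overhead) / k; N then cancels out of the comparison. The sketch below (plain Python, hypothetical function names) encodes that model and reproduces the slide's example: with k = 8, SingleTrialTime ≈ 0.3, and Overhead ≈ 2.6, each parallel trial effectively costs (0.3 + 2.6) / 8 ≈ 0.36, more than 0.3 sequentially. The model ignores uneven trial times and idle workers, so treat it only as a first estimate.

```python
def sequential_wall_clock(n_trials, single_trial_time):
    """Rough wall-clock time when N trials run one after another."""
    return n_trials * single_trial_time


def parallel_wall_clock(n_trials, single_trial_time, overhead, k):
    """Rough wall-clock time when N trials share k workers and each pays an overhead."""
    return n_trials * (single_trial_time + overhead) / k


def parallel_pays_off(single_trial_time, overhead, k):
    """Parallel is faster iff (T + O) / k < T, i.e. T * (k - 1) > O, for any N."""
    return (single_trial_time + overhead) / k < single_trial_time


if __name__ == "__main__":
    k, t, o = 8, 0.3, 2.6  # numbers from the slide
    print(sequential_wall_clock(100, t))  # about 30 for 100 trials
    print(parallel_wall_clock(100, t, o, k))  # about 36.25
    print(parallel_pays_off(t, o, k))  # False: sequential tuning is faster
    print(o / (k - 1))  # break-even trial time, about 0.371
```

Under this model, parallel tuning starts to pay off once a single trial takes longer than Overhead / (k − 1).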
use_ray in Sequential Tuning
use_ray: boolean or dict.
If boolean, default=False | Whether to use ray to run the training in
separate processes. This can be used to prevent OOM for large
datasets, but will incur more overhead in time.
If dict: the dict contains the keyword arguments to be passed to ray.tune.run.
Suggested scenarios to use the Ray backend:
1. Parallel tuning
2. Sequential tuning with potential out-of-memory (OOM) failures
AutoML Use Cases: 1. Credit Scoring and Fraud Detection in the Financial Industry
Overall objective and constraints: Find an “optimal” (gradient boosting or deep learning) model that
• performs the best in the out-of-time (OOT) period;
• controls model complexity (e.g., # of variables in the model, # of trees, tree depth);
• maintains model explainability;
• preferably meets the following criteria: (1) does not overfit the training data; (2) performs consistently across training/holdout/OOT; (3) is simple and intuitive, and can pass regulatory exams.
Advantages of FLAML:
• Fast HPO
• Controlled by our HPO algorithm
• Allows custom metrics
“
• It does allow us to build our models fast; esp. it allows us to control overfitting.
• We did observe that the models we built using the packages are simpler (in terms of #trees,
depth, and #features in the models) and more robust (over time), compared to other models
we built in the past using similar tools like ***(another HPO service).
”
AutoML Use Cases: 2. Investment Management
• Overall objectives and constraints: Find the best model (customized
estimator) based on business requirements for deployment.
• Advantages of FLAML:
• Saved dev time
• High compatibility
“
Dev time saved 30 - 40 percent on average; for gigantic datasets the saving is even more;
regarding the performance, AUC wise the lift is about 0.1 - 0.2 point
For us, a big part of R&D is testing new algorithms released in recent publications with
internal data; thanks to the high compatibility of FLAML, we can conveniently incorporate
these new algorithms into the existing pipeline and make apples-to-apples comparisons.
” -- A large private equity firm
AutoML Use Cases: 3. Customer Retention and Growth Analysis
Overall objectives and constraints: Find a good multiclass
workload classification model (mainly xgboost) based on the usage
and behavior pattern.
Advantages of FLAML:
• Fast (30 hours grid search -> minutes)
• Can find accurate models
“
the model runs faster (in minutes) and we are able to try out various sampling
techniques and additional derived attributes. Thus, we are using FLAML model
in the production classification models. It really helped to improve our team
productivity and reduced development iterations.
”
AutoML Use Cases: 4. Security
Overall objective and constraints: Find a good model to detect suspicious
behaviors.
Advantages of FLAML:
• Support customized learner, custom metrics
• Fast
• Can find accurate models
“
It is useful for me to optimize the hyperparameters in a short time. So I appreciate that, and to be
able to customize the metrics. My job is to create detectors using models, my last few models
were optimized using flaml. It is a corporate level product.
…
Adding 0.5% positive precision increase to "suspicious behavior detector" while adding 3.5% more
true positives (positive recall). Adding 0.8% positive precision increase to "suspicious remote
behavior detector" while adding 24% more true positives (positive recall). Contributed to ~2000
detections weekly for both detectors above.
”
Task-oriented AutoML: Q & A
[Diagram: the task-oriented AutoML overview: inputs (resource budget, ML task), data split into $D_{train}$ and $D_{val}$, estimator selection, hyperparameter selection, fit/predict, and the output ML model]
ML.NET Platform
• Model Builder (Visual Studio UI): wizard-like experience
• ML.NET CLI (cross-platform global tool): command-line tool; CI/CD
• AutoML .NET API (FLAML)
• ML.NET API (Microsoft.ML)
Demo: FLAML AutoML in .NET
Tune User Defined Function
AutoML vs Tune
AutoML
• Inputs: 1. Resource budget; 2. ML task (and more)
• Loop: split data into $D_{train}$ and $D_{val}$; estimator selection picks $m$; hyperparameter selection yields $m_c$; $m_c.fit(D_{train})$ and $m_c.predict(D_{val})$ give $loss_{m_c}$
• Outputs: ML model
Tune
• Inputs: 1. Resource budget; 2. Search space; 3. User-defined function
• Loop: hyperparameter selection proposes a config $c$; the user-defined function returns $r_c$
• Outputs: Best config
Tune User Defined Function
Tune
Inputs: 1. Resource budget; 2. Search space; 3. User-defined function. Outputs: Best config.
Hyperparameter selection proposes a config $c$; the user-defined function evaluates it and returns $r_c$.
The user-defined function can wrap, for example:
• Model training
• Inference
• Downstream applications
Tune User Defined Function
It can be used to tune generic hyperparameters for, e.g.:
• MLOps workflows and pipelines: AzureML pipelines, MLflow pipelines
• Mathematical/statistical models: causal models
• Algorithms: RL policies
• Computing experiments: simulations in environmental science
• Software configurations: database configurations
…
Resources to Be Used
https://github.com/microsoft/FLAML/tree/tutorial/tutorial
Tune User Defined Function: A Basic Tuning Procedure
Inputs:
1. Resource budget
2. Search space
3. User defined function, specified by:
• Evaluation function
• Optimization metric
• Optimization mode
Hyperparameter selection proposes a config $c$, and the user defined function evaluates it.
Outputs: Best config
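The three inputs can be sketched in plain Python. This is a minimal illustration under stated assumptions: random search stands in for FLAML's own search algorithms, the resource budget is a number of trials, and `random_search` plus the toy objective are hypothetical. In FLAML the corresponding pieces would be passed to `flaml.tune.run` (for example through its `evaluation_function`, `config`, `metric`, `mode`, and `time_budget_s` or `num_samples` parameters).

```python
import random

def evaluation_function(config):
    """User defined function: evaluate one config and report the metric."""
    x, y = config["x"], config["y"]
    return {"score": (x - 2) ** 2 + (y + 1) ** 2}

# Search space: each hyperparameter is sampled uniformly from (low, high).
search_space = {"x": (-5.0, 5.0), "y": (-5.0, 5.0)}

def random_search(evaluation_function, search_space, metric, mode, num_samples, seed=0):
    """Hypothetical tuner: return the best config found within num_samples trials."""
    rng = random.Random(seed)
    sign = 1 if mode == "min" else -1  # flip the comparison for mode="max"
    best_config, best_value = None, None
    for _ in range(num_samples):
        config = {name: rng.uniform(low, high) for name, (low, high) in search_space.items()}
        value = evaluation_function(config)[metric]
        if best_value is None or sign * value < sign * best_value:
            best_config, best_value = config, value
    return best_config, best_value

best_config, best_value = random_search(
    evaluation_function, search_space, metric="score", mode="min", num_samples=200
)
print(best_config, best_value)
```

Switching `mode` to `"max"` only flips the comparison; the evaluation function and search space stay the same.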
Benefits of Tune
• Saves manual effort (human resources)
• Effectiveness and efficiency: finds hyperparameter configurations with good performance using small computational resources
• Flexibility: supports even more diverse use cases than AutoML
More About the Search Space
Inputs:
1. Resource budget
2. Search space
3. User defined function
Outputs: Best config
Cost-related Hyperparameters in Search Space
• low_cost_partial_config (optional): a dictionary mapping a subset of controlled dimensions to their initial low-cost values.
• cat_hp_cost (optional): a dictionary mapping a subset of categorical dimensions to the relative cost of each choice.
Together with the search space, these tell the tuner:
• For the controlled dimensions, which values are known to be low-cost
• The relative cost across different categorical choices
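The two dictionaries below follow the shapes described above; we assume the `cat_hp_cost` list gives one relative cost per categorical choice, in the order the choices are listed. The helper `initial_config` and the search-space representation are hypothetical: they only illustrate how such hints let a tuner start from a cheap point, and are not FLAML's internal logic. In FLAML the two dictionaries are passed to `flaml.tune.run` under the same names.

```python
# Hypothetical search space: numeric dims as (low, high), categorical dims as lists.
search_space = {
    "n_estimators": (4, 1000),
    "max_leaves": (4, 1000),
    "learning_rate": (0.01, 1.0),
    "tree_method": ["auto", "approx", "hist"],
}
# Parameter names follow the slide; the values are illustrative.
low_cost_partial_config = {"n_estimators": 4, "max_leaves": 4}
cat_hp_cost = {"tree_method": [1, 1, 2]}  # relative cost of each choice

def initial_config(space, low_cost, cat_cost):
    """Pick a cheap starting point (illustrative heuristic, not FLAML's code)."""
    config = {}
    for name, domain in space.items():
        if name in low_cost:
            config[name] = low_cost[name]  # known low-cost value
        elif isinstance(domain, list):
            costs = cat_cost.get(name, [1] * len(domain))
            config[name] = domain[costs.index(min(costs))]  # cheapest choice
        else:
            low, high = domain
            config[name] = (low + high) / 2  # no hint: start mid-range (assumption)
    return config

start = initial_config(search_space, low_cost_partial_config, cat_hp_cost)
print(start)
```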
Hierarchical Search Space
A hierarchical search space for xgboost
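In a hierarchical search space, which hyperparameters exist depends on an earlier choice. The sketch below uses a hypothetical nested-dict representation (not FLAML's `tune` domain objects) in which choosing xgboost's `gbtree` booster exposes `max_depth` and `eta`, while `gblinear` exposes `lambda` and `alpha`; the ranges are made up for illustration.

```python
import random

# Hypothetical nested representation: a list means "choose one branch",
# a (low, high) tuple means a numeric range, anything else is a fixed value.
hierarchical_space = {
    "booster": [
        {"booster": "gbtree", "max_depth": (1, 10), "eta": (0.01, 0.3)},
        {"booster": "gblinear", "lambda": (0.0, 1.0), "alpha": (0.0, 1.0)},
    ]
}

def sample(space, rng):
    """Draw one config, recursing into whichever branch is chosen."""
    config = {}
    for name, domain in space.items():
        if isinstance(domain, list):
            branch = rng.choice(domain)
            config[name] = sample(branch, rng) if isinstance(branch, dict) else branch
        elif isinstance(domain, tuple):
            low, high = domain
            if isinstance(low, int) and isinstance(high, int):
                config[name] = rng.randint(low, high)  # integer range, inclusive
            else:
                config[name] = rng.uniform(low, high)
        else:
            config[name] = domain
    return config

print(sample(hierarchical_space, random.Random(0)))
```

Only the sub-hyperparameters of the chosen branch appear in a sampled config, so the tuner never wastes trials on, say, `max_depth` for a linear booster.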
HPO Algorithm
• CFO: local search [AAAI’21]
• BlendSearch: local search + global search [ICLR’21]
CFO is suggested when:
• The search space is simple
• Known good (low-cost) starting points are available
• Tuning is non-parallel or has low parallelism
Tune User Defined Function: More Constraints?
Inputs:
1. Resource budget
2. Search space
3. User defined function
- More constraints?
Outputs: Best config
More Constraints on the Tuning
• config_constraints: constraints on the configurations, given as a list of 3-tuples (f, inequality, threshold), where f: config -> float
More Constraints on the Tuning
• metric_constraints: constraints on the metrics, given as a list of 3-tuples (metric name, inequality, threshold). The constrained metrics need to be reported by the evaluation function.
config_constraints vs metric_constraints
Does calculating the constraint rely on the evaluation procedure in the evaluation function?
• No: use config_constraints
• Yes: use metric_constraints
Note: config_constraints can be checked before evaluation, so a config that does not satisfy them is not evaluated at all (which saves computation).
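The decision rule above can be made concrete with a small sketch. The 3-tuple shapes follow the slides; the checking code, the toy evaluation function, and the `latency` metric are hypothetical (in FLAML the two lists are passed to `flaml.tune.run` as `config_constraints` and `metric_constraints`, and the library does the checking). The point is the ordering: config constraints are checked before any compute is spent, metric constraints only afterwards.

```python
import operator

# Map the inequality strings used in the 3-tuples to comparison functions.
OPS = {"<=": operator.le, ">=": operator.ge}
evaluated = []  # configs that actually went through evaluation

def evaluation_function(config):
    """Toy evaluation; 'latency' stands in for a metric only known after running."""
    evaluated.append(config)
    return {"loss": (config["x"] - 3) ** 2, "latency": 0.5 * config["y"]}

# Constraints in the 3-tuple shapes described above (values are illustrative).
config_constraints = [(lambda c: c["x"] + c["y"], "<=", 10)]
metric_constraints = [("latency", "<=", 2.0)]

def try_config(config):
    """Return the result for a feasible config, or None if a constraint fails."""
    if not all(OPS[op](f(config), bound) for f, op, bound in config_constraints):
        return None  # rejected before evaluation: no compute spent
    result = evaluation_function(config)
    if not all(OPS[op](result[name], bound) for name, op, bound in metric_constraints):
        return None  # evaluated, but a reported metric violates its constraint
    return result
```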
Tune User Defined Function: Parallel Tuning
Inputs:
1. Resource budget
2. Search space
3. User defined function
- More constraints?
- Enable parallel tuning?
Outputs: Best config
Parallel Tuning
• resources_per_trial: a dict of hardware resources to allocate per trial
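Conceptually, parallel tuning runs several trials at once, each with a fixed share of the hardware (FLAML's `flaml.tune.run` takes that share as `resources_per_trial`). The sketch below only illustrates the idea with Python's standard `concurrent.futures`; it does not use FLAML or Ray, and `n_workers` is a hypothetical stand-in for the concurrency limit that the per-trial resources and the available hardware would impose.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluation_function(config):
    """Toy trial; a real one would train and validate a model."""
    return {"loss": (config["x"] - 3) ** 2}

candidate_configs = [{"x": x} for x in range(7)]
n_workers = 3  # at most this many trials run concurrently

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    # map preserves input order, so results line up with candidate_configs
    results = list(pool.map(evaluation_function, candidate_configs))

best_config, best_result = min(
    zip(candidate_configs, results), key=lambda pair: pair[1]["loss"]
)
print(best_config, best_result)
```

Threads keep the example self-contained; CPU-bound trials would typically run in separate processes or, with FLAML, through Ray.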
Tune User Defined Function: Warm Start
Inputs:
1. Resource budget
2. Search space
3. User defined function
- More constraints?
- Enable parallel tuning?
- Enable warm start?
Outputs: Best config
Warm Start
• points_to_evaluate: a list of initial configs to try first
• evaluated_rewards: a list of rewards for the corresponding configs in points_to_evaluate (must be the same length as or shorter than points_to_evaluate)
Use case: leverage the results from previous runs, i.e., provide results from previous runs plus some new configs to try first.
Warm Start
In points_to_evaluate, the already evaluated configs come first, with their results supplied as the evaluated rewards; the remaining entries are additional points to evaluate.
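A minimal sketch of how a tuner can consume the two warm-start lists, assuming (as described above) that the rewards belong to the first entries of `points_to_evaluate`; FLAML's `flaml.tune.run` names the second parameter `evaluated_rewards`. This is plain Python with a toy objective, not FLAML's implementation: known results are reused without re-running, and the remaining initial points are evaluated before any further search.

```python
evaluation_count = 0

def evaluation_function(config):
    """Toy objective; counts how many trials actually run."""
    global evaluation_count
    evaluation_count += 1
    return {"loss": (config["x"] - 3) ** 2}

# Previously evaluated configs come first, followed by new configs to try.
points_to_evaluate = [{"x": 0}, {"x": 5}, {"x": 2}, {"x": 4}]
evaluated_rewards = [9, 4]  # losses already known for the first two configs
assert len(evaluated_rewards) <= len(points_to_evaluate)

history = []
n_known = len(evaluated_rewards)
# Reuse the known results instead of re-running those trials.
for config, reward in zip(points_to_evaluate[:n_known], evaluated_rewards):
    history.append((config, reward))
# Evaluate the remaining initial points before any new search starts.
for config in points_to_evaluate[n_known:]:
    history.append((config, evaluation_function(config)["loss"]))

best_config, best_loss = min(history, key=lambda item: item[1])
print(best_config, best_loss, evaluation_count)
```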
Tune User Defined Function: Trial Scheduling
Inputs:
1. Resource budget
2. Search space
3. User defined function
- More constraints?
- Enable parallel tuning?
- Enable warm start?
- Enable trial scheduling?
Outputs: Best config
What Is a Scheduler Doing?
A scheduler helps manage the execution of trials. It can be used to perform multi-fidelity evaluation and/or early stopping.
Trial Scheduling
• scheduler: a scheduler for executing the trials:
- 'flaml': FLAML's built-in scheduler
- 'asha': the Asynchronous Successive Halving Algorithm
- an instance of the TrialScheduler class from ray.tune
• resource_attr: a string specifying the resource dimension used by the scheduler
• min_resource: a float, the minimum resource to use for resource_attr
• max_resource: a float, the maximum resource to use for resource_attr
• reduction_factor: a float, the reduction factor used for incremental pruning
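To make `min_resource`, `max_resource`, and `reduction_factor` concrete, here is a simplified, synchronous successive-halving loop in plain Python. It is neither ASHA (which is asynchronous) nor FLAML's own scheduler, and the toy evaluation function is made up; it only shows how a small resource screens out weak configs before the full resource is spent.

```python
import math

def evaluation_function(config, resource):
    """Toy loss that improves with more resource (e.g., sample size);
    'quality' is the loss the config would reach with unlimited resource."""
    return config["quality"] + 1.0 / resource

def successive_halving(configs, min_resource, max_resource, reduction_factor):
    """Simplified synchronous successive halving (assumes reduction_factor > 1)."""
    resource = min_resource
    survivors = list(configs)
    while True:
        ranked = sorted(survivors, key=lambda c: evaluation_function(c, resource))
        if resource >= max_resource or len(ranked) == 1:
            return ranked[0], resource
        # keep the top 1/reduction_factor fraction, then grow the resource
        keep = max(1, math.ceil(len(ranked) / reduction_factor))
        survivors = ranked[:keep]
        resource = min(resource * reduction_factor, max_resource)

configs = [{"quality": q} for q in (0.5, 0.1, 0.3, 0.9, 0.2, 0.7, 0.4, 0.6)]
best, final_resource = successive_halving(
    configs, min_resource=1, max_resource=8, reduction_factor=2
)
print(best, final_resource)  # {'quality': 0.1} 8
```

With eight configs and a reduction factor of 2, only one config is ever evaluated at the full resource of 8; the other seven are dropped after cheaper evaluations at resources 1, 2, and 4.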
Trial Scheduling
scheduler='flaml' (FLAML's built-in scheduler):
• Starts the search with the minimum resource.
• At any time point (before the maximum resource is reached), switches between HPO with the current resource and increasing the resource for evaluation, depending on which leads to faster improvement.
Settings:
• resource_attr = the attribute name for r (e.g., sample size)
• min_resource = r
• max_resource = R
• reduction_factor = 2, giving the resource schedule r, 2r, 4r, …, R
[Figure: resource schedule versus the search trajectory of configs]
Effectiveness of the "flaml" Scheduler
[Figure: three panels comparing tuning w/o and w/ the flaml scheduler]
Effectiveness of FLAML's built-in scheduler (scheduler='flaml') [MLSys’21]
Trial Scheduling With scheduler='flaml'
• Specifying “sample_size” as the resource dimension
• Using the “flaml” scheduler
• Setting the lower limit, upper limit, and reduction factor of the resource
Trial Scheduling With scheduler='flaml'
• The suggested config contains the additional resource_attr dimension.
• The evaluation function should use the resource dimension properly, e.g., train on a subsample of size config["sample_size"].
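Here is a sketch of an evaluation function that honors the resource dimension, assuming `resource_attr="sample_size"` as on the previous slide. The toy data, the 1-D ridge model, and the `lam` hyperparameter are all hypothetical; the only point is that `config["sample_size"]` controls how much data each trial trains on.

```python
import random

rng = random.Random(0)
xs = [rng.uniform(0, 1) for _ in range(1000)]
data = [(x, 2 * x + rng.gauss(0, 0.1)) for x in xs]  # toy data: y = 2x + noise
train_pool, validation = data[:800], data[800:]

def evaluation_function(config):
    # "sample_size" is the resource dimension added by the scheduler;
    # "lam" is the ordinary hyperparameter being tuned (ridge penalty).
    sample_size = min(int(config["sample_size"]), len(train_pool))
    train = train_pool[:sample_size]
    # Closed-form 1-D ridge regression without intercept: w = sum(xy) / (sum(x^2) + lam)
    w = sum(x * y for x, y in train) / (sum(x * x for x, _ in train) + config["lam"])
    mse = sum((y - w * x) ** 2 for x, y in validation) / len(validation)
    return {"mse": mse}

print(evaluation_function({"sample_size": 30, "lam": 0.1}))
print(evaluation_function({"sample_size": 800, "lam": 0.1}))
```

A cheap 30-sample trial is already enough to tell a badly over-regularized config from a good one, which is exactly what the scheduler exploits.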
A Use Case of Tune: Validating Strategies
Overall objectives and constraints: tune hyperparameters to find the strategies that lead to the highest profit rate, where evaluating each strategy requires an expensive simulation.
Advantages of FLAML.Tune:
• Fits the use case well
• The "scheduler" allows for low-cost evaluations
“I started playing around with the tuning module - which is really cool by the way and fits really neatly into one of my use cases, so cheers for that! I'm running simulations on market strategies in a given bootstrapped sample.
…
I love the idea of having a resource_attr that increases to validate low cost results on a full dataset. For my case, it's the number of bootstrap iterations that my data should be sampled with each strategy I'm testing. So the low cost is 30 samples, which runs in a few seconds, and the validation/max scenario would be 1000 samples, which takes several minutes. The 30 folds help me exclude obviously bad strategies without having to waste resources on running them 1000 times.”
Application: Supercharge A/B testing with automated causal inference
• AutoML: automating generic regression/classification tasks (estimator selection, hyperparameter selection, model training, and model validation on a $D_{train}$/$D_{val}$ data split)
• Tune: tuning causal models (hyperparameter selection over a user defined function)
“First of all, I use FLAML models FLAML AutoML as the generic regression. So whenever the models need a regression component, I throw flaml into there. And also if they need the classifier, I use FLAML classifier. The only exception is if my sample is actually random. I use the dummy classifier as a propensity function. But then also on the second level I use FLAML for model hyperparameter and estimator search.”
- Head of AI at WISE
Interested in Knowing More?
GitHub: https://github.com/microsoft/FLAML
Documentation: https://microsoft.github.io/FLAML/
Frequently Asked Questions (FAQ)
https://microsoft.github.io/FLAML/docs/FAQ
• About low_cost_partial_config in Tune
• How does FLAML handle imbalanced data (an unequal distribution of target classes in a classification task)?
• How do I interpret model performance? Is it possible to visualize feature importance, SHAP values, or the optimization history?
Zero-shot AutoML
What Is Zero-shot AutoML
Zero-shot AutoML = "no-tuning" AutoML: recommend data-dependent default configurations at runtime.
• The configuration depends on the dataset (X_train and y_train).
• Example (dataset: houses): static default r2 = 0.8296; data-dependent default r2 = 0.8537.
• It requires mining good hyperparameter configurations across different datasets offline.
Benefits of Zero-shot AutoML
• Trains only one model
• The hyperparameter configuration is decided instantly
• User code remains the same
• Requires less input (tuning budget) from the user
• The offline preparation can be customized for a domain and can leverage historical tuning data
Concerns of Zero-shot AutoML, and How They Are Addressed
• Performance: robust meta-learning; combine with HPO
• Transparency: a transparent portfolio-based approach
• New tasks/estimators/metrics: customizable meta-learning
Data-dependent Defaults vs. Data-agnostic Defaults
Mature libraries such as LightGBM, XGBoost, and Random Forest already have carefully crafted static defaults.
Can data-dependent defaults consistently outperform static defaults?
Learned Zero-shot vs. Static Default
The margin is large in many tasks:
• bng_libras_move (+24%)
• nyc_taxi (+15%)
• Yolanda (+17%)
• An improvement of more than 1% on more than 67% of the tasks
The static default performs catastrophically on some tasks:
• brazil_houses (static -0.39 vs. learned 0.76)
• poker (static 0.28 vs. learned 0.94)
The learned zero-shot default is slightly worse on 1 task:
• comet (learned 0.9857 vs. static 0.9907, -0.5%)
Use Zero-shot AutoML: Import a “Flamlized” Learner
• LGBMClassifier, LGBMRegressor (inheriting LGBMClassifier, LGBMRegressor from lightgbm)
• XGBClassifier, XGBRegressor (inheriting XGBClassifier, XGBRegressor from xgboost)
• RandomForestClassifier, RandomForestRegressor (inheriting from scikit-learn)
• ExtraTreesClassifier, ExtraTreesRegressor (inheriting from scikit-learn)
Magic Behind the Scenes
• flaml.default.LGBMRegressor inherits lightgbm.LGBMRegressor.
• It decides the hyperparameter configuration based on the task, the training data (size and other metafeatures), and offline AutoML results.
• The decision is made instantly.
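A data-dependent default can be picked from offline results using only the dataset's metafeatures, which is why no training runs are needed. The sketch below is a deliberately simplified nearest-neighbor stand-in (assumptions: two metafeatures, log-scaled distance, a made-up offline table); it is not the portfolio method FLAML actually uses.

```python
import math

# Hypothetical offline results: metafeatures (rows, features) of previously
# tuned datasets and the best config found for each. Numbers are made up.
offline_results = [
    {"metafeatures": (1000, 10), "config": {"n_estimators": 50, "num_leaves": 8}},
    {"metafeatures": (100000, 20), "config": {"n_estimators": 500, "num_leaves": 64}},
    {"metafeatures": (5000, 300), "config": {"n_estimators": 200, "num_leaves": 16}},
]

def metafeatures(X):
    """(number of rows, number of features) of a list-of-rows dataset."""
    return (len(X), len(X[0]))

def suggest_config(X):
    """Return the config of the most similar offline dataset."""
    # Compare on a log scale so row counts don't dominate feature counts.
    target = [math.log(v) for v in metafeatures(X)]

    def distance(entry):
        return math.dist(target, [math.log(v) for v in entry["metafeatures"]])

    return min(offline_results, key=distance)["config"]

print(suggest_config([[0.0] * 8 for _ in range(500)]))  # {'n_estimators': 50, 'num_leaves': 8}
```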
Check the Configuration Before Training
Combine Zero-shot AutoML and HPO
Further improve the accuracy by tuning
Who May Consider Customized Defaults?
Use your own meta-learned defaults if you are:
• An AutoML provider for a particular domain
• A data scientist or engineer who needs to repeatedly train models for similar tasks with varying training data
• A researcher or developer who would like to leverage meta-learning for new tasks/estimators/metrics
Meta Learning in FLAML
Inputs: metafeatures, configurations, and evaluation results
Learn Data-dependent Defaults
Inputs (metafeatures, configurations, and evaluation results):
my/
  all/metafeatures.csv
  lgbm/
    2dplanes.json
    …
    results.csv
  rf/…
Command:
python -m flaml.default.portfolio --output my --input my --metafeatures my/all/metafeatures.csv --task binary --estimator lgbm rf
Outputs:
my/
  lgbm/binary.json
  rf/binary.json
  all/binary.json
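One common way to build a portfolio of defaults from offline results is greedy selection: repeatedly add the config that most reduces the total loss when each dataset is served by its best portfolio member. The sketch below assumes a made-up loss table; treat it as an illustration of the idea, not as what `flaml.default.portfolio` computes.

```python
# Hypothetical offline results: loss[config][dataset] (lower is better).
loss = {
    "cfg_a": {"d1": 0.10, "d2": 0.40, "d3": 0.30},
    "cfg_b": {"d1": 0.35, "d2": 0.12, "d3": 0.33},
    "cfg_c": {"d1": 0.20, "d2": 0.20, "d3": 0.15},
}
datasets = ["d1", "d2", "d3"]

def portfolio_loss(portfolio):
    """Total loss if each dataset used the best config in the portfolio.
    (At runtime a selector, e.g. based on metafeatures, must pick that member.)"""
    return sum(min(loss[c][d] for c in portfolio) for d in datasets)

def greedy_portfolio(size):
    """Greedily add the config that lowers the portfolio loss the most."""
    portfolio = []
    while len(portfolio) < size:
        remaining = [c for c in loss if c not in portfolio]
        best = min(remaining, key=lambda c: portfolio_loss(portfolio + [c]))
        portfolio.append(best)
    return portfolio

print(greedy_portfolio(2))  # ['cfg_c', 'cfg_a']
```

`cfg_c` is chosen first because it is the most robust single default; `cfg_a` is added next because it covers dataset `d1`, where `cfg_c` is weakest.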
Use Your Own Meta-learned Defaults
• Use a “flamlized” learner
• Combine zero-shot and HPO
Learned defaults location, e.g.:
location_for_defaults/
  all/multiclass.json
  lgbm/multiclass.json
  xgb_limitdepth/multiclass.json
  rf/multiclass.json
"Flamlize" a Learner
• Share it with others or with your future self
• Update the learned defaults continuously
Reference
Mining Robust Default Configurations for Resource-constrained AutoML. Moe Kayali, Chi Wang. KDD-AutoML 2022.
Time Series Forecasting
Time Series Forecasting Tasks
• The forecast label can be numerical (default) or categorical:
- Numerical: task="ts_forecast" or task="ts_forecast_regression"
- Categorical: task="ts_forecast_classification"
• Data are ordered by a datetime column with equal intervals, e.g.:
- Daily: 3-30-2022, 3-31-2022, 4-1-2022, 4-2-2022, …
- Weekly: 3-28-2022, 4-4-2022, 4-11-2022, …
- Monthly: 3-1-2022, 4-1-2022, 5-1-2022, …
• Forecast horizon: how many future time points to predict
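The equal-interval requirement can be checked before training. The helper below is hypothetical (FLAML's own validation is not shown on the slides); it classifies a sorted list of dates as daily, weekly, or monthly, treating "monthly" as the same day of consecutive calendar months, since months differ in length.

```python
from datetime import date, timedelta

def infer_frequency(dates):
    """Classify sorted dates as 'daily', 'weekly', or 'monthly'; None if irregular."""
    if len(dates) < 2:
        return None
    gaps = {later - earlier for earlier, later in zip(dates, dates[1:])}
    if gaps == {timedelta(days=1)}:
        return "daily"
    if gaps == {timedelta(days=7)}:
        return "weekly"
    # Months differ in length, so compare calendar months instead of day counts.
    month_index = [d.year * 12 + d.month for d in dates]
    same_day_of_month = len({d.day for d in dates}) == 1
    consecutive_months = all(b - a == 1 for a, b in zip(month_index, month_index[1:]))
    if same_day_of_month and consecutive_months:
        return "monthly"
    return None

print(infer_frequency([date(2022, 3, 1), date(2022, 4, 1), date(2022, 5, 1)]))  # monthly
```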
Time Series Forecasting Dataset
SKU | Date | Volume | Month | Price | Temperature
SKU_01 | 01-01-2022 | 4 | Jan | 11 | 25
SKU_02 | 01-01-2022 | 8 | Jan | 12 | 25
SKU_01 | 02-01-2022 | 20 | Feb | 10.5 | 27
SKU_02 | 02-01-2022 | 32 | Feb | 11.2 | 27
… | … | … | … | … | …
Column roles:
• SKU: time series ID (categorical)
• Date: timestamp (datetime)
• Volume: forecast label (numerical or categorical)
• Month, Price: known features (numerical or categorical)
• Temperature: unknown features (numerical or categorical)
Estimators
• Statistical models: Prophet, ARIMA, SARIMAX
• Regressors/classifiers: LightGBM, XGBoost, RandomForest, ExtraTrees
• Neural networks: TemporalFusionTransformer (pytorch-forecasting); training cost is high
Notes: Id columns; unknown features
Data Splitter
• Holdout: the series is split in time order into a training part followed by a validation part.
• Cross validation: 5 folds (Fold 1 to Fold 5) along the time axis.
[Figure: holdout and 5-fold cross validation over time points 1–24]
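Both splitting schemes must respect time order. The sketch below is a plain-Python illustration of the general technique (similar in spirit to scikit-learn's `TimeSeriesSplit`; we do not claim it reproduces FLAML's splitter): the holdout validates on the most recent points, and each cross-validation fold trains on everything before its validation window. Setting the window size to the forecast horizon is a natural choice, but that is our assumption here.

```python
def holdout_split(n, val_size):
    """Time-ordered holdout: the last val_size points are used for validation."""
    return list(range(n - val_size)), list(range(n - val_size, n))

def expanding_window_splits(n, n_splits, val_size):
    """Time-ordered CV: each fold trains on all points before its validation window."""
    splits = []
    for k in range(n_splits):
        val_end = n - (n_splits - 1 - k) * val_size
        val_start = val_end - val_size
        if val_start <= 0:
            raise ValueError("not enough data for the requested splits")
        splits.append((list(range(val_start)), list(range(val_start, val_end))))
    return splits

# 24 time points (indices 0..23), 5 folds, validation windows of 4 points
for train_idx, val_idx in expanding_window_splits(24, 5, 4):
    print(f"train 0..{train_idx[-1]}, validate {val_idx[0]}..{val_idx[-1]}")
```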
Example: Prepare Dataset
Example: Run AutoML.fit()
List of ML learners in AutoML run: ['lgbm', 'rf', 'xgboost', 'extra_tree', 'xgb_limitdepth', 'prophet', 'arima', 'sarimax']
Example: Check Result
Example: Prediction
Natural Language Processing
pip install "flaml[nlp]"
Natural Language Processing Tasks by FLAML
Natural language understanding:
• Sequence classification: sentiment classification / hate speech detection / document categorization
• Sequence regression: review rating prediction, book price prediction
• Token classification: named entity recognition / POS tagging
• Multiple choice classification
Natural language generation:
• Seq2seq: text summarization
FLAML NLP Backbone: Fine-Tuning Language Models
FLAML's AutoML and Tune are built on top of Transformers:
• A library for fine-tuning transformer models
• Supports major NLP tasks
• Open access to most state-of-the-art language models: huggingface.co/models
FLAML NLP Backbone: Fine-Tuning Language Models
• Stage 1: pretraining on unlabeled data
• Stage 2: fine-tuning on downstream tasks, e.g., natural language inference, named entity recognition, QA
NLP: A Basic Use Case
Steps: get the data; run AutoML using FLAML; check the result.
FLAML accuracy = 1 - loss ≈ 0.922 (loss ≈ 0.077); Electra accuracy = 0.912
Fine-Tuning Hyperparameters: FLAML vs Transformers
FLAML's AutoML.fit vs. Transformers' hyperparameter_search API:
• Low-code implementation
• More customization, e.g., custom metrics
• Better performance
Hyperparameter Fine-Tuning with Transformers
AutoML with Transformers for text summarization: 107 lines
Hyperparameter Fine-Tuning with Transformers
AutoML with Transformers for NER: 133 lines
Hyperparameter Fine-Tuning with Transformers
User code for implementing summarization with Transformers: pre-processing/tokenization, model, data collator, Trainer, computing metrics, and post-processing
Hyperparameter Fine-Tuning with Transformers
User code for implementing NER with Transformers: pre-processing/tokenization, model, data collator, Trainer, and computing metrics
Low-code Hyperparameter Fine-Tuning with FLAML
FLAML (AutoML and Tune) sits on top of Transformers and handles pre-processing, model loading, data collation, computing metrics, training, and post-processing; the user provides training arguments (e.g., the model).
Low-code Hyperparameter Fine-Tuning with FLAML
• AutoML with FLAML for text summarization: 15 lines
• AutoML with FLAML for NER: 15 lines
Performance: FLAML vs Transformers
Validation accuracy: FLAML > Transformers in 8 cases; FLAML < Transformers in 1 case; FLAML = Transformers in 1 case
Number of Trials: FLAML vs Transformers
Number of trials: FLAML > Transformers in 8 cases; FLAML < Transformers in 2 cases
Open Problems on Fine-Tuning Hyperparameters
• Model selection
• Warm-starting AutoML / zero-shot AutoML
• Troubleshooting AutoML failures
• Optimal search space
• ...
Model Selection for Fine-Tuning LM Hyperparameters
RoBERTa: 84.6 vs. BERT: 69.0
Model Selection for Fine-Tuning LM Hyperparameters
• A non-trivial problem: there is no single "best" model!
• On spooky-author-identification, BERT outperforms RoBERTa (and Electra, Muppet, etc.)
[Figure: validation accuracy vs. wall-clock time]
Troubleshooting AutoML Failure
Random search (RS), Bayesian optimization (BO), and ASHA underperform the recommended hyperparameters [ACL’21]
Troubleshooting AutoML Failure
Reducing the search space can effectively reduce overfitting in HPO [ACL’21]
FLAML NLP: Summary
• An AutoML layer on top of Transformers
• Low-code implementation
• Outperforms the tuning performance of Transformers' hyperparameter_search
• Join us to build a better AutoML tool for NLP
Online AutoML (https://vowpalwabbit.org/)
Properties of online machine learning:
• Data arrive sequentially over time
• Training/learning is performed online
• Predictions/decisions need to be made online
• Example: Azure Personalizer improves user experience with real-time learning (https://azure.microsoft.com/en-us/services/cognitive-services/personalizer/)
Online ML -> Online AutoML
• The environment provides an input $X_t$; the online learner outputs a prediction $\hat{Y}_t = \mathrm{Predict}(X_t)$; the environment reveals the ground truth $Y_t$; the learner calls $\mathrm{Learn}(X_t, \hat{Y}_t, Y_t)$.
• Performance is measured by online evaluation (e.g., progressive validation loss, cumulative regret).
• Hyperparameters that need AutoML: featurization choices, learning rate, regularization, …
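The predict-then-learn loop, scored by progressive validation loss, can be sketched as follows. The SGD linear learner and the toy stream are hypothetical stand-ins for a VW learner and real data; `learning_rate` is the kind of hyperparameter online AutoML would need to choose while data keep arriving.

```python
import random

class OnlineLinearRegressor:
    """Minimal online learner: one SGD step on squared error per example."""

    def __init__(self, n_features, learning_rate):
        self.w = [0.0] * n_features
        self.learning_rate = learning_rate  # a hyperparameter online AutoML could tune

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def learn(self, x, y_hat, y):
        error = y_hat - y
        self.w = [wi - self.learning_rate * error * xi for wi, xi in zip(self.w, x)]

def progressive_validation_loss(learner, stream):
    """Average squared error of each prediction made before its label is revealed."""
    total, t = 0.0, 0
    for t, (x, y) in enumerate(stream, start=1):
        y_hat = learner.predict(x)   # 1. predict on the new input
        total += (y_hat - y) ** 2    # 2. score against the revealed ground truth
        learner.learn(x, y_hat, y)   # 3. update the model
    return total / max(t, 1)

rng = random.Random(0)
xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(2000)]
stream = [(x, 3 * x[0] - 2 * x[1]) for x in xs]
for lr in (0.0001, 0.1):
    print(lr, progressive_validation_loss(OnlineLinearRegressor(2, lr), stream))
```

Because every prediction is scored before the learner sees its label, the same loss doubles as an honest online estimate of generalization, with no separate validation set.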
Online Namespace Interactions Tuning
• In VW (Vowpal Wabbit), features are grouped into namespaces.
• Adding feature interactions is sometimes helpful.
• How do we automatically decide which namespace interactions to use? (A doubly exponentially large search space)
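To see why the search space is doubly exponential, assume (for illustration) that any set of interactions of order two or higher among n namespaces may be enabled. There are 2^n - n - 1 such interactions, hence 2^(2^n - n - 1) possible sets of interactions. The helper names below are hypothetical.

```python
from itertools import combinations

def candidate_interactions(namespaces):
    """All interactions of order >= 2 among the given namespaces."""
    return [
        combo
        for k in range(2, len(namespaces) + 1)
        for combo in combinations(namespaces, k)
    ]

def search_space_size(n):
    """Number of possible sets of interactions for n namespaces: 2 ** (2**n - n - 1)."""
    return 2 ** (2 ** n - n - 1)

print(candidate_interactions("abc"))  # [('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'b', 'c')]
print(search_space_size(3))  # 16
```

Even with only 5 namespaces there are 2^26 (about 67 million) candidate interaction sets, far too many to evaluate online.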
Online AutoML?
Unique challenges of online AutoML (which make conventional AutoML methods not directly applicable):
• Sharp computation constraints: only a constant factor more computation is allowed at any time point
• Vastly different scales of data volume
• Learning algorithms are being evaluated constantly
Online AutoML
Research question: how can we best design an efficient online automated machine learning algorithm?
Key challenge: finding a balance between
• searching over a large number of plausible choices, and
• concentrating the limited computation budget on a few promising choices, so that we do not pay a high 'learning price' (regret)
Champion-Challengers (ChaCha) for Online AutoML [ICML’21]
• Champion: the proven best config at the current time point
• Challengers: the remaining candidate configs under consideration
Components: champion update; challenger generation (starting from an initial config for online learning); live challenger scheduling
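The champion update can be illustrated with confidence bounds: a challenger is promoted only when its loss is better than the champion's with high confidence, so noise early in the stream does not cause churn. The sketch below is a simplified assumption (losses in [0, 1], Hoeffding-style radius, no challenger generation or scheduling), not ChaCha's exact test.

```python
import math

class Candidate:
    """Tracks the running loss of one config (champion or challenger)."""

    def __init__(self, name):
        self.name = name
        self.total_loss = 0.0
        self.n = 0

    def update(self, loss):
        self.total_loss += loss
        self.n += 1

    def mean(self):
        return self.total_loss / self.n

    def radius(self, delta=0.05):
        # Hoeffding-style confidence radius, assuming losses lie in [0, 1]
        return math.sqrt(math.log(2 / delta) / (2 * self.n))

def update_champion(champion, challengers, delta=0.05):
    """Promote a challenger only if it beats the champion with high confidence."""
    if champion.n == 0:
        return champion
    winners = [
        c for c in challengers
        if c.n > 0 and c.mean() + c.radius(delta) < champion.mean() - champion.radius(delta)
    ]
    return min(winners, key=lambda c: c.mean()) if winners else champion
```

After a handful of examples the confidence intervals overlap and the champion stays; once enough data has arrived, a consistently better challenger takes over.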
Demo: AutoVW +
[Figure: AutoVW vs. vanilla VW]
ML Fairness and Fair AutoML
• Fairness in ML. Existing and ongoing efforts:
- Fairness definitions in ML
- Unfairness mitigation methods
Source: https://fairmlbook.org/tutorial1.html
ML Fairness and Fair AutoML
• Fair AutoML: finding models that are not only accurate but also fair.
• Is it possible to find a fair model without applying additional unfairness mitigation?
• If unfairness mitigation is necessary, how should it be incorporated into an AutoML pipeline?
• If we can find fair models (with or without unfairness mitigation), how will the "utility" (e.g., accuracy, cost) be affected?
Fair AutoML
• Developed abstractions for fairness assessment and unfairness mitigation techniques in AutoML.
• Investigated the need for, and the impact of, unfairness mitigation in AutoML.
[preprint] Fair AutoML. Qingyun Wu, Chi Wang. arXiv preprint arXiv:2111.06495 (2022).
Fair AutoML
• The fair AutoML problem
[Figure: the fair AutoML problem, combining unfairness mitigation and fairness assessment]
Fair AutoML
FairAutoML: a fair AutoML framework
Inputs:
- Data: $D_{train}$, $D_{val} = (X_{val}, Y_{val})$
- Loss function: Loss()
- Hyperparameter search space $C$
- Fairness assessment procedure Fair() (e.g., from Fairlearn)
- Unfairness mitigation procedure (e.g., from Fairlearn)
Workflow:
1. The HyperparameterSearcher suggests a config $c = \mathrm{Suggest}(C)$.
2. The FairnessManager decides whether to perform unfairness mitigation:
- No: regular model building ($m = 0$) produces the ML model $f_{c,0}$.
- Yes: model building with unfairness mitigation ($m = 1$), $\mathrm{Mitigate}(c')$ on a config $c' = \mathrm{Suggest}(C)$, produces $f_{c',1}$.
3. The model prediction $\hat{Y}_{val} = f_{c,m}(X_{val})$ is scored by $\mathrm{Loss}(\hat{Y}_{val}, Y_{val})$ and by the fairness assessment $\mathrm{Fair}(f_{c,m}, D_{val})$.
4. Both results update the HyperparameterSearcher and the FairnessManager.
FairnessManager: decides whether we should perform the mitigation.
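A fairness assessment step can be as simple as a demographic-parity check. The sketch below computes the demographic parity difference in plain Python (Fairlearn offers a metric of the same name in `fairlearn.metrics`) and defines a hypothetical `fair_loss` that keeps the ordinary loss only when unfairness stays within a threshold. This thresholded form, the threshold value, and the toy predictions are our assumptions for illustration, not necessarily the definitions used in the Fair AutoML paper.

```python
def demographic_parity_difference(y_pred, sensitive):
    """Largest gap in positive-prediction rate between groups of the sensitive attribute."""
    rates = []
    for group in set(sensitive):
        preds = [p for p, s in zip(y_pred, sensitive) if s == group]
        rates.append(sum(preds) / len(preds))
    return max(rates) - min(rates)

def fair_loss(loss, unfairness, threshold=0.05):
    """Hypothetical fair loss: the usual loss if the model is fair enough, else infinity."""
    return loss if unfairness <= threshold else float("inf")

sensitive = ["F", "F", "F", "F", "M", "M", "M", "M"]
y_pred_a = [1, 1, 0, 0, 1, 0, 1, 0]  # 50% positives in both groups
y_pred_b = [1, 1, 1, 0, 1, 0, 0, 0]  # 75% vs. 25% positives
print(demographic_parity_difference(y_pred_a, sensitive))  # 0.0
print(demographic_parity_difference(y_pred_b, sensitive))  # 0.5
```

Under this definition, a FairnessManager-style component would trigger mitigation only when the best models found so far have an infinite fair loss, i.e., when HPO alone has not produced a fair enough model.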
FairFLAML
• A self-adaptive strategy (via the FairnessManager) that balances HPO and unfairness mitigation, with the goal of finding a model with the best fair loss efficiently and effectively.
[Figure: fair loss on the adult dataset (sensitive attribute = “sex”)]
Future Work
Open Problems and Research Opportunities on AutoML
Deficiencies of AutoML:
• Causes system failures due to compute-intensive workloads
• Lacks customizability
• Lacks comprehensive end-to-end support
• Lacks transparency and interpretability
Open problems: timeout; elasticity; multi-output tasks; early stopping of trials; search space suggestion; overfitting; multiple optimization metrics; customizing meta-learning
Open Problems and Research Opportunities on AutoML (continued)
Further open problems: deployment constraints; online learning; guardrails; how to decide the time budget; trustworthiness and ethics; how many data samples are needed; reproducibility
Use FLAML to Improve/Inspire Your Research
• ML model tuning
- Tuning reinforcement learning models
- Tuning graph neural networks
…
• Tuning certain choices in your task toward a particular objective
- Synthetic dataset generation by tuning choices related to data generation
- Finding the most profitable investment strategies
- RL policy tuning
…
• Comparing different methods/models more comprehensively, with sufficient tuning of each method/model
Call for Contribution
This project welcomes and encourages all forms of contribution, including but not limited to:
• Pushing patches
• Code review of pull requests
• Documentation, examples, and test cases
• Community participation in issues, discussions, and Gitter
• Tutorials, blog posts, and talks that promote the project
• Sharing application scenarios and/or related research
Call For Contribution: Opportunities
• Share your use cases and needs
- Connect via Gitter: https://gitter.im/FLAMLer/community
- Create issues
• Development
- Improve existing features: zero-shot AutoML, time series forecasting, online AutoML
- Develop new features: computer vision tasks, multi-modal models, fair AutoML, quality monitoring and drift detection, visualization and explanation
• Integration with other libraries
Call for Contribution
Gitter: https://gitter.im/FLAMLer/community
Contributing guide: https://microsoft.github.io/FLAML/docs/Contribute
Roadmap: https://github.com/microsoft/FLAML/wiki/Roadmap-for-Upcoming-Features
References
• [MLSys’21] FLAML: A Fast and Lightweight AutoML Library. Chi Wang, Qingyun Wu, Markus Weimer, Erkang Zhu.
• [ICML’21] ChaCha for Online AutoML. Qingyun Wu, Chi Wang, John Langford, Paul Mineiro, Marco Rossi.
• [ICLR’21] Economical Hyperparameter Optimization With Blended Search Strategy. Chi Wang, Qingyun Wu, Silu Huang, Amin Saied.
• [AAAI’21] Frugal Optimization for Cost-related Hyperparameters. Qingyun Wu, Chi Wang, Silu Huang.
• [ACL’21] An Empirical Study on Hyperparameter Optimization for Fine-Tuning Pre-trained Language Models. Susan Liu, Chi Wang.
• [KDD-AutoML’22] Mining Robust Default Configurations for Resource-constrained AutoML. Moe Kayali, Chi Wang.
• [preprint] Fair AutoML. Qingyun Wu, Chi Wang. arXiv preprint arXiv:2111.06495 (2022).
Q&A
aka.ms/FLAML
Thanks to all contributors & collaborators!