FLAML: A Fast Library for AutoML & Tuning

github.com/microsoft/FLAML

Chi Wang1, Qingyun Wu2, Susan Xueqing Liu3, Luis Quintanilla1

1. Microsoft
2. Penn State University
3. Stevens Institute of Technology

1
Copyright © 2022, by tutorial authors
What Will You Learn

How to use FLAML to (1) find accurate ML models with low computational
resources for common ML tasks; (2) tune hyperparameters generically

How to leverage the flexible and rich customization choices
• Finish the last mile for deployment
• Create new applications

Code examples, demos, use cases

Research & Development opportunities


Agenda
First half (9:30 AM – 11:10 AM)
• Overview of AutoML and FLAML (9:30 AM)
• Task-oriented AutoML with FLAML (9:45 AM)
• ML.NET demo (10:30 AM)
• Tune user defined functions with FLAML (10:40 AM)

Break, Q & A

Second half (11:20 AM – 12:30 PM)
• Zero-shot AutoML (11:20 AM)
• Time series forecasting
• Natural language processing (11:45 AM)
• Online AutoML (12:00 PM)
• Fair AutoML
• Challenges and open problems
Overview
7 Cross-industry Disruptive Trends in Next Few Decades – McKinsey (2021)
• Applied AI: >75% of all digital-service touch points will see improved usability, enriched personalization, and increased conversion
• Future of programming: ~30x reduction in the working time required for software development and analytics

Companies need to master MLOps to make the most of both trends

Machine Learning Workflow

Get the data Prepare the data Train Evaluate Deploy Inferencing

Model Training Model Consumption


Lots of decisions to make to optimize model performance:
• Select learner
• Tune hyperparameters
• …

Example hyperparameter configurations:
n_estimators = 5, n_leaves = 10, learning_rate = 0.1
…
n_estimators = 1000, n_leaves = 100, learning_rate = 0.01
…

AutoML
Get the data Prepare the data Train Evaluate Deploy Inferencing

Model Training Model Consumption

Market size forecast in 2030:
• AutoML: $14.8 billion
• Autonomous data platform: $4.8 billion
Benefits of AutoML
Enables and empowers novices
Standardizes the ML workflow for better reproducibility, code
maintainability, knowledge sharing
Prevents suboptimal results due to idiosyncrasies of ML Innovators
Builds models more effectively and efficiently
Enables rapid prototyping
Fosters learning

[Xin et al. 2021] Xin, D., Wu, E. Y., Lee, D. J.-L., Salehi, N., and Parameswaran, A. Whither AutoML? Understanding the Role of Automation in Machine Learning Workflows. CHI 2021.

Deficiencies of AutoML
• Lacks comprehensive end-to-end support
• Causes system failures due to compute-intensive workloads (no usable model returned; revert to manual development)
• Lacks customizability (or users need to make too many hard choices)
• Lacks transparency and interpretability

Response to each deficiency:
• Lacks comprehensive end-to-end support -> Offer both AutoML and Tune API & extensible
• Causes system failures due to compute-intensive workloads -> Fast, economical
• Lacks customizability -> Smooth customization

FLAML (AutoML and Tune):
• Offers both AutoML and Tune API & extensible
• Fast
• Economical
• Smooth customization

Python Library flaml
Task-oriented AutoML

Customization is easy

Tune user-defined function


Economical Tuning & AutoML

✓ Leverage joint impact of multiple factors on cost & error

✓ Hyperparameter
✓ Learner
✓ Sample size
✓ Resampling strategy: CV or Holdout

Fast library for AutoML and tuning [MLSys’21]
• Online tuning: ChaCha (Champion-Challenger) [ICML’21]
• Meta learning: Zero-shot [KDD-AutoML’22]
• Offline tuning: BlendSearch (local + global search) [ICLR’21]
• Cost-frugal optimization [AAAI’21]: simple and provable local search

Users’ Background
[Charts: Job Title; Years of Programming Experience]

Example Use Cases
APPLICATION WHO FIELD
AutoML tool for .NET developers (Visual Studio, ML.NET) SDE
Suspicious behavior detector Security
Credit assessment Finance
Auto finance functions (classification/regression) Finance
Hardware demand forecast Supply chain
A/B testing with automated causal inference (auto-causality) CRM
Air quality estimate Science
Pricing Insurance

Impact: Accuracy, Productivity, R&D


Blog Post in Towards Data Science

Task-oriented AutoML AutoML

Tune user-defined function Tune ML


Agenda and Resources to Be Used in This Tutorial

https://github.com/microsoft/FLAML

https://github.com/microsoft/FLAML/tree/tutorial/tutorial

Task-oriented AutoML
What Is Task-oriented AutoML

Inputs:
• Resource budget
• ML task
  ✓ Training data
  ✓ Task type

[Workflow diagram: Prepare the data -> AutoML (Train, Evaluate) -> Deploy -> Inferencing; Model Training | Model Consumption]

Task-oriented AutoML: Supported Tasks
Required data format:
X: Numpy array or dataframe
y: Numpy array or series of labels in shape n*1.

Time series forecasting tasks

NLP tasks

Task-specific Built-in ML Estimators
Built-in estimators by task type (AutoML on top of Tune):
• Classification/regression: XGBoost, RandomForest, CatBoost, …
• NLP: Transformer, …
• Time series forecasting: Prophet, ARIMA, …

Resources to Be Used

https://github.com/microsoft/FLAML/tree/tutorial/tutorial

Task-oriented AutoML: A Basic Use Case
• Get data
• AutoML with FLAML
• AutoML Result
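
The three steps above might look like the following minimal sketch. It assumes `flaml` and scikit-learn are installed; the iris dataset, the 10-second budget, the log file name, and the helper name `run_basic_automl` are illustrative choices rather than material from the slides. The imports are deferred into the function so the settings can be inspected on their own.

```python
# Settings passed to AutoML.fit(); all values here are illustrative assumptions.
automl_settings = {
    "task": "classification",            # ML task
    "time_budget": 10,                   # resource budget in seconds
    "metric": "accuracy",                # optimization metric
    "log_file_name": "flaml_basic.log",  # hypothetical log file name
}


def run_basic_automl(settings=automl_settings):
    """Get data, run AutoML with FLAML, and return a summary of the result."""
    from flaml import AutoML                # requires `pip install flaml`
    from sklearn.datasets import load_iris  # stand-in dataset (assumption)

    # Get data
    X_train, y_train = load_iris(return_X_y=True)

    # AutoML with FLAML
    automl = AutoML()
    automl.fit(X_train=X_train, y_train=y_train, **settings)

    # AutoML result
    return {
        "best_estimator": automl.best_estimator,  # name of the best learner
        "best_config": automl.best_config,        # its hyperparameters
        "best_loss": automl.best_loss,            # best validation loss found
    }
```

Calling `run_basic_automl()` runs the search within the time budget; the fitted `AutoML` object can also be used directly through `predict()`.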


Benefits of Task-oriented AutoML in FLAML
Save manual efforts (human resource)
Effectiveness and efficiency: use small computation resource
to find models with good performance

Figure 1. Box plot of scaled score difference between FLAML and other libraries when FLAML
uses equal or smaller budget (positive difference meaning FLAML is better). [MLSys’21]


Inputs: resource budget, ML task (and more)
Outputs: ML model

[Diagram: AutoML search loop]
Estimator selection -> estimator $m$ -> Hyperparameter selection -> estimator and config $m_c$
Split data into $D_{train}$ and $D_{val}$: $m_c.\mathrm{fit}(D_{train})$, then $m_c.\mathrm{predict}(D_{val})$ gives $\mathit{loss}_{m_c}$

The Cost-Frugal HPO Problem
$f(x)$: validation loss under hyperparameter configuration $x$.
Each time you query $f(x)$, you need to pay a cost of $g(x)$.

• Two important properties:
  o $f(x)$ is a black-box function
  o Function value evaluation is expensive

• Vanilla HPO: find $x^*$ with a small number of iterations
• Cost-frugal HPO: find $x^*$ while keeping the total cost $\sum_i g(x_i)$ small

Insights On The Cost-Frugal HPO Problem
• Vanilla HPO: find $x^*$ with a small number of iterations
• Cost-frugal HPO: find $x^*$ while keeping the total cost $\sum_i g(x_i)$ small

1. If $g(x)$ is constant, low cost <=> small number of iterations. Bayesian Optimization optimizes for this case.
2. It is common to encounter cost-related hyperparameters (examples: the number of estimators and leaves in gradient boosted trees; the number of layers and neurons in a DNN).

[Figure: cost $g(\#leaves, \#estimators)$, ranging from low cost to high cost]
Assumptions about Cost-related HPs

1. Lipschitz continuous
2. Easy to know low-cost configurations before we start the optimization

[Figure: cost surface ranging from cheap (low-cost region) to expensive]

Cost-Frugal HPO Algorithm Design
To avoid high-cost points until necessary -> Low-cost starting point
+ local search
To find low-loss points -> Follow loss-descent directions
Cannot directly use gradient-based method: no gradient available.

Surprise: function values are enough to find the right directions for
efficient convergence.
More surprise: sign comparison between function values is enough!!


A Cost-Frugal HPO Algorithm (CFO) [AAAI’21]

Repeat the following steps after each move:
1. Uniformly sample a direction from a local unit sphere;
2. Compare;
3. Move (and break) or try the opposite direction;
4. Move (and break) or stay.

Implications:
At any iteration,
• The incumbent has the best loss
• The evaluated point in the next iteration is in the neighboring area of the incumbent => the cost is not far away from the incumbent
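
The four steps can be sketched as a simplified randomized direct search. This is an illustration under stated assumptions, not FLAML's CFO implementation: the step size is fixed (the published algorithm adapts it), the search space is unconstrained and continuous, and the names `sample_unit_direction` and `local_search` are hypothetical.

```python
import math
import random


def sample_unit_direction(dim, rng):
    """Sample a direction uniformly from the unit sphere in `dim` dimensions."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0:
            return [c / norm for c in v]


def local_search(f, x0, step=0.5, n_evals=200, seed=0):
    """Randomized direct search that only compares function values."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)  # start from a (low-cost) initial point
    evals = 1
    while evals < n_evals:
        u = sample_unit_direction(len(x), rng)         # 1. sample a direction
        for sign in (1.0, -1.0):                       # 3./4. try u, then -u
            cand = [xi + sign * step * ui for xi, ui in zip(x, u)]
            f_cand = f(cand)
            evals += 1
            if f_cand < fx:                            # 2. compare
                x, fx = cand, f_cand                   # move (and break)
                break
            if evals >= n_evals:
                break
        # If neither direction improved, the incumbent stays where it is.
    return x, fx
```

Because a move happens only when the candidate is strictly better, the incumbent always holds the best loss seen so far, and every evaluated point lies within `step` of the incumbent, matching the two implications listed above.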
A Cost-Frugal HPO Algorithm (CFO) [AAAI’21]
• Theoretical guarantees on:
  o Convergence rate
  o The total evaluation cost

[Figures: loss surface from high loss to low loss; cost surface from high cost to low cost]
Combine local search and global search
Local search: low cost; may get trapped in local optima
Global search: able to explore the whole space; high cost

Framework
• Economical Hyperparameter Optimization With Blended Search Strategy.
Chi Wang, Qingyun Wu, Silu Huang, Amin Saied. ICLR 2021.


Benefits of Task-oriented AutoML in FLAML
Save manual efforts (human resource)
Effectiveness and efficiency: use small computation resource
to find models with good performance
Rich customization choices in FLAML
Helpful in scenarios such as:
Due to business/deployment requirements, you need to use
- A special ML learner (e.g., a domain-specific one), and/or
- Custom metrics, and/or
- Various constraints (e.g., in terms of computation resource, model complexity, inference time)
…
An Example Use Case That Needs Customization
Application domain: Security
Overall objective: Find a good model to detect suspicious behaviors.
Comments about customization (in blue).
Comments about performance (in green).
“
• I am able to use *** classifiers as the custom learner and use search space for random forest,
lr and lightgbm.
…
• It is useful for me to optimize the hyperparameters in a short time. So I appreciate that, and
to be able to customize the metrics. My job is to create detectors using models, my last few
models were optimized using flaml. It is a corporate level product.
…
• Adding 0.5% positive precision increase to "suspicious behavior detector" while adding 3.5% more
true positives (positive recall). Adding 0.8% positive precision increase to "suspicious remote behavior
detector" while adding 24% more true positives (positive recall). Contributed to ~2000 detections
weekly for both detectors above.
”
Task-oriented AutoML: Optimization Metric

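[Diagram: AutoML search loop]
Estimator selection -> estimator $m$ -> Hyperparameter selection -> estimator and config $m_c$
Split data into $D_{train}$ and $D_{val}$: $m_c.\mathrm{fit}(D_{train})$, then $m_c.\mathrm{predict}(D_{val})$ gives $\mathit{loss}_{m_c}$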

“
I appreciate that I am able to customize the metrics. My job is to create
detectors using models, my last few models were optimized using flaml. ”


Built-in Optimization Metrics
• metric


User-defined Metric Function
• metric


Optimization metric

Metrics to log

“It does allow us to build our models fast; esp. it allows us to control overfitting.
We did observe that the models we built using the packages are simpler (in terms of #trees, depth, and #features in the models) and more robust (over time), compared to other models we built in the past using similar tools like *** (another HPO service).”
The Power of a User-defined Metric Function
“Closing the gap between the loss function we optimize in ML and the product metrics we
really want to optimize.”
--Carlos Guestrin at KDD ’19, on 4 Perspectives in Human-Centered Machine Learning

• User-defined Metric Function in FLAML:

Allow creative metrics toward the ultimate objectives
• Metrics beyond typical ML predictive performance metrics
• For example, business objectives, such as profit or revenue

Concrete examples:
1. A heuristic objective to control overfitting: $obj = Loss_{val} \cdot (1 + \alpha) - \alpha \cdot Loss_{train}$
2. Integrating business optimization with a machine learning model
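
The first concrete example could be written as a user-defined metric function along the following lines. The argument order follows FLAML's documented custom-metric convention (validation data, estimator, labels, training data, optional weights) and the function returns the value to minimize plus a dict of metrics to log. The value `alpha = 0.5`, the binary-classification setting, and the helper `mean_log_loss` are assumptions made for this sketch.

```python
import math


def mean_log_loss(y_true, proba_pos):
    """Binary log loss from positive-class probabilities (helper for this sketch)."""
    eps = 1e-15
    total = 0.0
    for y, p in zip(y_true, proba_pos):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def custom_metric(X_val, y_val, estimator, labels, X_train, y_train,
                  weight_val=None, weight_train=None, *args):
    """Heuristic objective that penalizes the gap between validation and training loss."""
    alpha = 0.5  # assumed trade-off weight
    val_loss = mean_log_loss(
        y_val, [row[1] for row in estimator.predict_proba(X_val)])
    train_loss = mean_log_loss(
        y_train, [row[1] for row in estimator.predict_proba(X_train)])
    # obj = Loss_val * (1 + alpha) - alpha * Loss_train
    objective = val_loss * (1 + alpha) - alpha * train_loss
    # First value: metric to minimize; second value: metrics to log.
    return objective, {"val_loss": val_loss, "train_loss": train_loss}
```

Such a function would be passed as `metric=custom_metric` to `AutoML.fit()`; the logged `val_loss` and `train_loss` then become available for metric constraints.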


Task-oriented AutoML: Estimators and Search Space
“I am able to use *** classifiers as the custom learner and use search space for random forest, lr and lightgbm.”

Candidate Estimators (estimator selection) and Hyperparameter Search Space (hyperparameter selection)

[Diagram: AutoML search loop]
Estimator selection -> estimator $m$ -> Hyperparameter selection -> estimator and config $m_c$
Split data into $D_{train}$ and $D_{val}$: $m_c.\mathrm{fit}(D_{train})$, then $m_c.\mathrm{predict}(D_{val})$ gives $\mathit{loss}_{m_c}$

Estimators
estimator_list
Use built-in estimators with default search space
✓ Classification/regression task: "lgbm", "xgboost", "rf", "extra_tree", "catboost", "lrl2"/"lrl1", "kneighbor"
✓ Time series forecasting task: "prophet", "arima", "sarimax"
✓ NLP task: "transformer"

Add custom estimators

Adding a Custom Estimator
1. Build a custom estimator by inheriting flaml.model.BaseEstimator or a derived class.
2. Give the custom estimator a name and add it in AutoML.
3. Tune the newly added custom estimator depending on your needs.
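
A minimal sketch of the three steps, assuming `flaml` and LightGBM are installed. The class name `MyLargeLGBM`, the learner name `"my_lgbm"`, the search ranges, and the wrapper functions are hypothetical; the class is built inside a function only so that the import is deferred, and in real code it would normally be defined at module level. Some newer FLAML releases expose the estimator classes under `flaml.automl.model` instead of `flaml.model`.

```python
def make_custom_lgbm_class():
    """Step 1: build a custom estimator by inheriting a FLAML estimator class."""
    from flaml import tune
    from flaml.model import LGBMEstimator  # a class derived from BaseEstimator

    class MyLargeLGBM(LGBMEstimator):
        @classmethod
        def search_space(cls, data_size, task, **params):
            # Each hyperparameter maps to a dict with "domain" and optional fields.
            return {
                "n_estimators": {
                    "domain": tune.lograndint(lower=4, upper=32768),
                    "low_cost_init_value": 4,
                },
                "num_leaves": {
                    "domain": tune.lograndint(lower=4, upper=32768),
                    "low_cost_init_value": 4,
                },
            }

    return MyLargeLGBM


def tune_custom_learner(X_train, y_train, time_budget=10):
    """Steps 2 and 3: register the custom estimator and tune only that learner."""
    from flaml import AutoML

    automl = AutoML()
    # Step 2: give the custom estimator a name and add it in AutoML.
    automl.add_learner(learner_name="my_lgbm",
                       learner_class=make_custom_lgbm_class())
    # Step 3: tune the newly added custom estimator.
    automl.fit(X_train=X_train, y_train=y_train, task="classification",
               estimator_list=["my_lgbm"], time_budget=time_budget)
    return automl
```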

Search Space
Each hyperparameter is associated with a dict with the following fields:
• "domain": specifies the possible values of the hyperparameter and their distribution.
• "init_value" (optional): specifies the initial value of the hyperparameter.
• "low_cost_init_value" (optional): specifies the value of the hyperparameter that is associated with low computation cost.

Cost-related hyperparameter
Cost-related Hyperparameter in the Search Space


Resources to Be Used in This Tutorial

https://github.com/microsoft/FLAML/tree/tutorial/tutorial

Customize the Search Spaces (of Existing Estimators)
Option 1. Create a new estimator with a revised search space.
Option 2. A shortcut to override the search space of existing estimators
via custom_hp.


Customize the Search Spaces (of Existing Estimators)

Using a different search range for “n_estimators”

Disable search by setting “domain” to None

Setting a constant value
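
The three kinds of override might be expressed in one `custom_hp` dict, as in the sketch below. The choice of estimators, hyperparameters, and ranges is illustrative, and `make_custom_hp` is a hypothetical helper that receives the `flaml.tune` module as an argument so the dict layout can be read without importing FLAML.

```python
def make_custom_hp(tune):
    """Build a custom_hp dict; `tune` is expected to be the `flaml.tune` module."""
    return {
        "xgboost": {
            # Using a different search range for "n_estimators"
            "n_estimators": {
                "domain": tune.lograndint(lower=10, upper=1000),
                "low_cost_init_value": 10,
            },
        },
        "rf": {
            # Disable search by setting "domain" to None
            "max_leaves": {"domain": None},
        },
        "lgbm": {
            # Setting a constant value
            "subsample_freq": {"domain": 1},
        },
    }
```

With FLAML installed, the result would be passed as `custom_hp=make_custom_hp(flaml.tune)` in the call to `AutoML.fit()`.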


Task-oriented AutoML: ML Procedure and Ensemble
[Diagram: AutoML search loop]
Estimator selection -> estimator $m$ -> Hyperparameter selection -> estimator and config $m_c$
Split data into $D_{train}$ and $D_{val}$: $m_c.\mathrm{fit}(D_{train})$, then $m_c.\mathrm{predict}(D_{val})$ gives $\mathit{loss}_{m_c}$

• eval_method: A string specifying the resampling strategy, one of ['auto', 'cv', 'holdout'].
• split_ratio, n_splits
• split_type: one of ['auto', 'stratified', 'uniform', 'time', 'group']
• X_val, y_val
• fit_kwargs: Additional keyword arguments to pass to the fit() function of the candidate learners, such as sample_weight.
• fit_kwargs_by_estimator: The user-specified keyword arguments, grouped by estimator name.

Task-oriented AutoML: Advanced Functionalities
• Required inputs
  • X_train, y_train, task, time_budget/max_iter
• Logging
  • log_file_name: A string of the log file name
  • log_type: "better" or "all"
• Constraints
  • train_time_limit
  • pred_time_limit
  • metric_constraints
• Warm start
  • starting_points
• Parallel tuning
  • n_concurrent_trials
[Xin et al. 2021]
Task-oriented AutoML: Logging

{"record_id": 12, "iter_per_learner": 2, "logged_metric": {"pred_time": 3.085368373117264e-07},

"trial_time": 0.056574344635009766, "wall_clock_time": 1.4277665615081787, "validation_loss":
0.40724474179893333, "config": {"n_estimators": 4, "max_leaves": 4, "learning_rate": 0.03859136192132082,
"subsample": 1.0, "colsample_bylevel": 0.8148474110627004, "colsample_bytree": 0.9777234800442423,
"reg_alpha": 0.0009765625, "reg_lambda": 5.525802807180917, "min_child_weight": 0.01199969653421202,
"FLAML_sample_size": 10000}, "learner": "xgboost", "sample_size": 10000}

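
A log record like the one above can be read back with the standard `json` module. This sketch assumes the log file stores one JSON object per line and that trial records carry a `validation_loss` field, as in the sample record; the function names are hypothetical.

```python
import json


def load_trial_records(log_lines):
    """Parse log lines (one JSON object per line) into a list of trial records."""
    records = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "validation_loss" in record:  # skip lines that are not trial records
            records.append(record)
    return records


def best_record(records):
    """Return the trial record with the lowest validation loss."""
    return min(records, key=lambda r: r["validation_loss"])
```

For a real run, `log_lines` would be the open log file, for example `open(log_file_name)`.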

Task-oriented AutoML: Logging with MLflow

Task-oriented AutoML: More Constraints
• Constraints
  • train_time_limit: Training time constraint in seconds
  • pred_time_limit: Prediction time constraint in seconds
  • metric_constraints: A list of constraints on certain metrics

Constraints of 4 Different Types
1. Constraints on the AutoML process: time_budget, max_iter
2. Constraints on the constructor arguments of the estimators.
3. Constraints on the models tried in AutoML: train_time_limit, pred_time_limit
4. Constraints on the metrics of the ML model tried in AutoML: metric_constraints
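
The four constraint types could be combined in one settings dict, sketched below. All numeric values are illustrative assumptions. The monotone-constraint entry is one possible way to constrain a constructor argument through `custom_hp`, and the metric names in `metric_constraints` assume a user-defined metric that logs `train_loss` and `val_loss`.

```python
constrained_settings = {
    # 1. Constraints on the AutoML process
    "time_budget": 60,   # total search time in seconds
    "max_iter": 100,     # maximum number of trials
    # 2. Constraints on the constructor arguments of the estimators
    "custom_hp": {"xgboost": {"monotone_constraints": {"domain": "(1, -1)"}}},
    # 3. Constraints on the models tried in AutoML
    "train_time_limit": 1,    # training time constraint in seconds
    "pred_time_limit": 1e-3,  # prediction time constraint in seconds
    # 4. Constraints on the metrics of the models tried in AutoML:
    #    (metric name, "<=" or ">=", threshold)
    "metric_constraints": [("train_loss", "<=", 0.1), ("val_loss", "<=", 0.1)],
}


def check_metric_constraints(constraints):
    """Sanity-check the (name, operator, threshold) form of each constraint."""
    for name, op, threshold in constraints:
        if not isinstance(name, str) or op not in ("<=", ">="):
            return False
        if not isinstance(threshold, (int, float)):
            return False
    return True
```

These settings would be unpacked into `AutoML.fit()` together with the data, the task, and a metric function that logs the constrained metrics.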

Task-oriented AutoML: Warm Start
• Warm start
  • starting_points: Starting hyperparameter config for the estimators

Warm Start
starting_points:
A dictionary of configs to start with, or a str specifying the starting hyperparameter config for the estimators | default="data".
If str (these options will be covered later in zero-shot AutoML):
- if "data", use data-dependent defaults;
- if "data:path", use data-dependent defaults which are stored at path;
- if "static", use data-independent defaults.
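
When `starting_points` is a dictionary, one common pattern is to reuse the best configurations of an earlier run. The sketch below assumes `flaml` is installed; the budgets, the classification task, and the function name `warm_start_fit` are illustrative.

```python
def warm_start_fit(X_train, y_train, first_budget=10, second_budget=10):
    """Run AutoML twice, starting the second run from the first run's best configs."""
    from flaml import AutoML

    first = AutoML()
    first.fit(X_train=X_train, y_train=y_train,
              task="classification", time_budget=first_budget)

    second = AutoML()
    second.fit(
        X_train=X_train, y_train=y_train,
        task="classification", time_budget=second_budget,
        # Dictionary form: estimator name -> starting hyperparameter config
        starting_points=first.best_config_per_estimator,
    )
    return second
```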

Task-oriented AutoML: Parallel Tuning
• Parallel tuning
  • n_concurrent_trials: The number of concurrent trials

Parallel Tuning
[Diagram: AutoML and Tune can run trials sequentially or in parallel; Ray and NNI are shown for parallel tuning]

Easy Parallel Tuning Is a Desirable Feature

So I work in Research and that entire application I wrote myself over the last 9 months. My job
(and the teams) currently is to research a next state application stack for our existing products. In
particular the models out of this product *** have been implemented in FLAML (this is our own
GLM and GBM) which is exactly why we liked FLAML because it allowed us to easily extend the
estimators, something *** (an alternative HPO service) does not allow. Anyway the reason we
love FLAML and Ray is to get large scale parallel tuning, something the product team are
struggling with because it’s desktop and C# based, so to have FLAML out of the box
distribute across an auto scaled cluster removes at least 18 months work for us.
-- A S&P 500 company

Parallel vs Sequential Tuning

Heads-up: Parallel tuning is not necessarily more desirable than sequential tuning.

Things to Consider:
• Different overhead and trial time.
E.g., parallel tuning (which uses the ray backend) incurs a computation overhead that is larger than in sequential tuning.
• Availability of parallel resources
• Different randomness

Find more in this doc:

https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#parallel-tuning

Parallel vs Sequential Tuning
A rough estimation of the wall-clock time needed to finish N trials depends on:
• Computation overhead
• Scale of parallelism (k)
• Trial time to evaluate a particular hyperparameter configuration

Example: k (scale of parallelism) = 8, SingleTrialTime ≈ 0.3, OverHead (parallel tuning) ≈ 2.6 -> sequential tuning is faster
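
The formula itself did not survive extraction, so the following sketch assumes the estimate is N / k * (SingleTrialTime + OverHead), with k = 1 and negligible overhead for sequential tuning. Under that assumption the slide's numbers reproduce its conclusion; the function names and N = 100 are illustrative.

```python
def sequential_time(n_trials, single_trial_time, overhead=0.0):
    """Estimated wall-clock time for sequential tuning (k = 1)."""
    return n_trials * (single_trial_time + overhead)


def parallel_time(n_trials, single_trial_time, overhead, k):
    """Estimated wall-clock time with k concurrent trials (assumed formula)."""
    return n_trials / k * (single_trial_time + overhead)


# Numbers from the slide: k = 8, SingleTrialTime ≈ 0.3, OverHead (parallel) ≈ 2.6
n = 100
seq = sequential_time(n, 0.3)
par = parallel_time(n, 0.3, 2.6, k=8)
sequential_is_faster = seq < par
```

Per trial, the parallel estimate is (0.3 + 2.6) / 8 ≈ 0.36, which exceeds the sequential 0.3, so parallelism only pays off when the trial time is large relative to the overhead.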
use_ray in Sequential Tuning
use_ray: boolean or dict.
If boolean (default=False): whether to use ray to run the training in separate processes. This can be used to prevent OOM for large datasets, but will incur more overhead in time.
If dict: the dict contains the keyword arguments to be passed to ray.tune.run.

Suggested scenarios to use ray backend:

1. Parallel tuning
2. Sequential tuning with potential Out Of Memory
(OOM) failure


AutoML Use Cases: 1. Credit Scoring and Fraud Detection in Financial Industry
Overall objective and constraints: Finding an “optimal” (gradient boosting or deep learning) model that
• performs the best in the out-of-time (OOT) period;
• controls model complexity (e.g., # of variables in the model, # of trees, tree depth);
• maintains model explainability;
• preferably meets the following criteria: (1) does not overfit the training data; (2) performs consistently between training/holdout/OOT; (3) is simple and intuitive, and can pass regulatory exams.
Advantages of FLAML:
• Fast HPO
• Controlled by our HPO algorithm
• Allow custom metric
“
• It does allow us to build our models fast; esp. it allows us to control overfitting.
• We did observe that the models we built using the packages are simpler (in terms of #trees, depth, and #features in the models) and more robust (over time), compared to other models we built in the past using similar tools like *** (another HPO service).
”
AutoML Use Cases: 2. Investment Management
• Overall objectives and constraints: Find the best model (customized
estimator) based on business requirements for deployment.
• Advantages of FLAML:
• Saved dev time
• High compatibility
“
Dev time saved 30 - 40 percent on average; for gigantic datasets the saving is even more;
regarding the performance, AUC wise the lift is about 0.1 - 0.2 point

For us, a big part of R&D is testing new algorithms released in recent publications with
internal data; thanks to the high compatibility of FLAML, we can conveniently incorporate
these new algorithms into the existing pipeline and make apples-to-apples comparisons.
” -- A large private equity firm
AutoML Use Cases: 3. Customer Retention and Growth Analysis
Overall objectives and constraints: Find a good multiclass
workload classification model (mainly xgboost) based on the usage
and behavior pattern.
Advantages of FLAML:
• Fast (30 hours grid search -> minutes)
• Can find accurate models
“
the model runs faster (in minutes) and we are able to try out various sampling
techniques and additional derived attributes. Thus, we are using FLAML model
in the production classification models. It really helped to improve our team
productivity and reduced development iterations.
”
AutoML Use Cases: 4. Security
Overall objective and constraints: Find a good model to detect suspicious
behaviors.
Advantages of FLAML:
• Support customized learner, custom metrics
• Fast
• Can find accurate models
“
It is useful for me to optimize the hyperparameters in a short time. So I appreciate that, and to be
able to customize the metrics. My job is to create detectors using models, my last few models
were optimized using flaml. It is a corporate level product.
…
Adding 0.5% positive precision increase to "suspicious behavior detector" while adding 3.5% more
true positives (positive recall). Adding 0.8% positive precision increase to "suspicious remote
behavior detector" while adding 24% more true positives (positive recall). Contributed to ~2000
detections weekly for both detectors above.
”
Task-oriented AutoML: Q & A

Inputs: resource budget, ML task (and more)
Outputs: ML model

[Diagram: AutoML search loop]
Estimator selection -> estimator $m$ -> Hyperparameter selection -> estimator and config $m_c$
Split data into $D_{train}$ and $D_{val}$: $m_c.\mathrm{fit}(D_{train})$, then $m_c.\mathrm{predict}(D_{val})$ gives $\mathit{loss}_{m_c}$

ML.NET Platform
• Model Builder (Visual Studio UI): wizard-like experience
• ML.NET CLI (cross-platform global tool): command-line tool, CI/CD
• AutoML .NET API (FLAML)
• ML.NET API (Microsoft.ML)

Demo: FLAML AutoML in .NET


AutoML
Inputs:
1. Resource budget
2. ML task (and more)
Outputs: ML model
[Diagram: Estimator selection -> estimator $m$ -> Hyperparameter selection -> estimator and config $m_c$; split data into $D_{train}$ and $D_{val}$; $m_c.\mathrm{fit}(D_{train})$, then $m_c.\mathrm{predict}(D_{val})$ gives $\mathit{loss}_{m_c}$]

Tune
Inputs:
1. Resource budget
2. Search space
3. User defined function
Outputs: Best config
[Diagram: Hyperparameter selection sends a config $c$ to the user defined function, which returns a result $r_c$]
Tune User Defined Function
Inputs:
1. Resource budget
2. Search space
3. User defined function
Outputs: Best config
[Diagram: Hyperparameter selection sends a config $c$ to the user defined function, which returns a result $r_c$; the user defined function can cover model training, inference, and downstream applications]

Tune User Defined Function
It can be used to tune generic hyperparameters for:
• MLOps workflows, pipelines (e.g., AzureML pipeline, MLflow pipeline)
• Mathematical/statistical models (e.g., causal models)
• Algorithms (e.g., RL policies)
• Computing experiments (e.g., simulations in environmental science)
• Software configurations (e.g., database configurations)
…
Resources to Be Used

https://github.com/microsoft/FLAML/tree/tutorial/tutorial

Tune User Defined Function: A Basic Tuning Procedure
Inputs: Hyperparameter selection
1. Resource budget
𝑟3 𝑐 Outputs: Best config
2. Search space
3. User defined function
User defined function

114 Copyright © 2022, by tutorial authors

Tune User Defined Function: A Basic Tuning Procedure
Inputs: Hyperparameter selection
1. Resource budget
𝑟3 𝑐 Outputs: Best config
2. Search space
3. User defined function
User defined function

115 Copyright © 2022, by tutorial authors

Tune User Defined Function: A Basic Tuning Procedure
Inputs: Hyperparameter selection
1. Resource budget
𝑟3 𝑐 Outputs: Best config
2. Search space
3. User defined function
User defined function

• Evaluation function
• Optimization metric
• Optimization mode
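
The three inputs above can be sketched in a few lines. This is only an illustration that assumes plain random search in place of FLAML's own search algorithms; `random_search`, `evaluate`, and the `(low, high)` search-space encoding are hypothetical names, not FLAML's API.

```python
import random

def evaluate(config):
    """User defined function: returns a dict that holds the metric to optimize."""
    x, y = config["x"], config["y"]
    return {"score": (x - 3) ** 2 + (y + 1) ** 2}

def random_search(evaluation_function, space, metric, mode, num_samples, seed=0):
    """Sample configs uniformly from `space` and keep the best one seen."""
    rng = random.Random(seed)
    sign = 1 if mode == "min" else -1  # turn "max" problems into "min" problems
    best_config, best_value = None, float("inf")
    for _ in range(num_samples):  # num_samples plays the role of the resource budget
        config = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        value = sign * evaluation_function(config)[metric]
        if value < best_value:
            best_config, best_value = config, value
    return best_config, sign * best_value

# Search space: a (low, high) range per dimension (hypothetical encoding).
space = {"x": (-10.0, 10.0), "y": (-10.0, 10.0)}
best_config, best_score = random_search(evaluate, space, "score", "min", num_samples=200)
print(best_config, best_score)
```

FLAML takes the same kinds of inputs (evaluation function, search space, metric, mode, and a budget) through `flaml.tune.run`; its documentation gives the exact arguments.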


Benefits of Tune
• Saves manual effort (human resources)
• Effectiveness and efficiency: uses few computational resources to find hyperparameter configurations with good performance
• Flexibility: supports even more diverse use cases than AutoML

More about search space

Cost-related Hyperparameters in Search Space
• low_cost_partial_config (optional): A dictionary from a subset of controlled
dimensions to the initial low-cost values.
• cat_hp_cost (optional): A dictionary from a subset of categorical dimensions to
the relative cost of each choice.
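
As an illustration of the two dictionaries, here is a minimal sketch with a made-up search space. The hyperparameter names, ranges, and relative costs are assumptions for the example, and `low_cost_start` is a hypothetical helper showing how a search could build a cheap starting point; it is not FLAML's implementation.

```python
import random

# Hypothetical search space: (low, high) numeric ranges or a list of categorical choices.
space = {
    "n_estimators": (4, 1000),
    "max_leaves": (4, 1000),
    "booster": ["gbtree", "dart"],
    "learning_rate": (0.01, 1.0),
}
# Initial low-cost values for the subset of dimensions that drive the cost.
low_cost_partial_config = {"n_estimators": 4, "max_leaves": 4}
# Relative cost of each categorical choice, in the same order as the choices (assumed values).
cat_hp_cost = {"booster": [1, 3]}

def low_cost_start(space, low_cost_partial_config, cat_hp_cost, seed=0):
    """Build a starting config that is as cheap as the provided hints allow."""
    rng = random.Random(seed)
    config = {}
    for name, domain in space.items():
        if name in low_cost_partial_config:
            config[name] = low_cost_partial_config[name]  # known low-cost value
        elif name in cat_hp_cost:
            costs = cat_hp_cost[name]
            config[name] = domain[costs.index(min(costs))]  # cheapest category
        elif isinstance(domain, list):
            config[name] = rng.choice(domain)
        else:
            low, high = domain
            config[name] = rng.uniform(low, high)
    return config

start = low_cost_start(space, low_cost_partial_config, cat_hp_cost)
print(start)
```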

• Search space
• Controlled dimensions where we know the low-cost values
• The relative cost across different categorical choices

Hierarchical Search Space
A hierarchical search space for xgboost


HPO algorithm
• CFO: local search [AAAI’21]
• BlendSearch: local search + global search [ICLR’21]

CFO is suggested when there is:
• A simple search space
• Known good (low-cost) starting points
• Non-parallel tuning or low parallelization

Tune User Defined Function: More Constraints?

More Constraints on the Tuning
• config_constraints: constraints on the configurations (a list of 3-tuples). Each tuple holds a function f: config -> float, an inequality sign, and a constraint threshold.
• metric_constraints: constraints on the metrics (a list of 3-tuples). The constrained metric needs to be reported in the evaluation function.

config_constraints vs metric_constraints
Does the calculation of the constraint rely on the evaluation procedure in the metric function?
• No: use config_constraints. This type of constraint can be checked before evaluation, so a config that does not satisfy config_constraints will not be evaluated (which saves computation).
• Yes: use metric_constraints.
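
The difference can be sketched as follows. The checker functions, the toy `evaluate`, and the thresholds are hypothetical; the sketch only filters infeasible results and does not reproduce how FLAML's search handles constraints internally.

```python
import operator

OPS = {"<=": operator.le, ">=": operator.ge}

def satisfies_config_constraints(config, config_constraints):
    """Each constraint is (f, op, threshold), where f maps a config to a float."""
    return all(OPS[op](f(config), threshold) for f, op, threshold in config_constraints)

def satisfies_metric_constraints(result, metric_constraints):
    """Each constraint is (metric_name, op, threshold); metrics come from the evaluation."""
    return all(OPS[op](result[name], threshold) for name, op, threshold in metric_constraints)

def evaluate(config):
    # Hypothetical evaluation: reports the optimized metric and a constrained metric.
    return {"loss": 1.0 / config["width"], "latency": 0.5 * config["width"]}

config_constraints = [(lambda c: c["width"] * c["depth"], "<=", 100)]
metric_constraints = [("latency", "<=", 4.0)]

evaluated = 0
feasible = []
candidates = [{"width": 4, "depth": 10}, {"width": 20, "depth": 10}, {"width": 10, "depth": 5}]
for config in candidates:
    if not satisfies_config_constraints(config, config_constraints):
        continue  # rejected before evaluation, so no computation is spent
    evaluated += 1
    result = evaluate(config)
    if satisfies_metric_constraints(result, metric_constraints):
        feasible.append((config, result))

print(evaluated, feasible)
```

Only two of the three candidates are evaluated: the second one violates the config constraint and is skipped up front, while the third is evaluated and then found to violate the metric constraint.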


Tune User Defined Function: Parallel Tuning

Parallel Tuning
• resources_per_trial: a dict of hardware resources to allocate per trial

Tune User Defined Function: Warm Start

Warm Start
Motivation: the need to leverage results from previous runs.
• points_to_evaluate: a list of initial configs to try first
• evaluated_rewards: a list of rewards for the corresponding configs provided in points_to_evaluate (must be the same length as or shorter than points_to_evaluate)

points_to_evaluate = results from previous runs + some new configs to try first. The evaluated configs come with their results in evaluated_rewards; the remaining entries are additional points to evaluate.
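
A small sketch of how the two lists line up, assuming (as the length rule above suggests) that the rewards correspond to the first entries of points_to_evaluate in order. `split_warm_start` and the example configs are hypothetical.

```python
def split_warm_start(points_to_evaluate, evaluated_rewards):
    """Pair the first len(evaluated_rewards) points with their rewards;
    the remaining points still need to be evaluated."""
    if len(evaluated_rewards) > len(points_to_evaluate):
        raise ValueError("evaluated_rewards must not be longer than points_to_evaluate")
    n = len(evaluated_rewards)
    known = list(zip(points_to_evaluate[:n], evaluated_rewards))
    pending = points_to_evaluate[n:]
    return known, pending

points_to_evaluate = [
    {"lr": 0.1, "depth": 3},   # evaluated in a previous run
    {"lr": 0.05, "depth": 6},  # evaluated in a previous run
    {"lr": 0.01, "depth": 8},  # new config to try first
]
evaluated_rewards = [0.81, 0.86]

known, pending = split_warm_start(points_to_evaluate, evaluated_rewards)
print(known)
print(pending)
```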

Tune User Defined Function: Trial Scheduling

What Is a Scheduler Doing?
A scheduler can help manage the trials’ execution. It can be used to perform multi-fidelity evaluation and/or early stopping.

Trial Scheduling
scheduler: A scheduler for executing the trials.
• 'flaml': Authentic scheduler in FLAML
• 'asha': The Asynchronous Successive Halving Algorithm
• An instance of the TrialScheduler class from ray.tune
resource_attr: A string to specify the resource dimension used by the scheduler.
min_resource: A float of the minimal resource to use for the resource_attr.
max_resource: A float of the maximal resource to use for the resource_attr.
reduction_factor: A float of the reduction factor used for incremental pruning.
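
To make the last three parameters concrete, the sketch below lists the resource levels they imply, assuming a geometric schedule that starts at min_resource, multiplies by reduction_factor, and is capped at max_resource. `resource_schedule` is a hypothetical helper, not part of FLAML.

```python
def resource_schedule(min_resource, max_resource, reduction_factor):
    """Resource levels r, r*f, r*f^2, ... capped at max_resource."""
    if reduction_factor <= 1:
        raise ValueError("reduction_factor must be greater than 1")
    schedule = []
    resource = min_resource
    while resource < max_resource:
        schedule.append(resource)
        resource *= reduction_factor
    schedule.append(max_resource)  # the last level is always the full resource
    return schedule

print(resource_schedule(1000, 10000, 2))  # [1000, 2000, 4000, 8000, 10000]
```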

Trial Scheduling
scheduler='flaml' (an authentic scheduler in FLAML):
• Starts the search with the minimum resource.
• At any time point (before the max resource is reached), switches between HPO with the current resource and increasing the resource for evaluation, depending on which leads to faster improvement.

resource_attr = the attribute name for r (e.g., sample size)
min_resource = r
max_resource = R
reduction_factor = 2

[Figure: resource schedule. The resource grows from r to 2r, 4r, … up to R along the search trajectory of configs.]

Effectiveness of the "flaml" Scheduler
[Figure: results with vs. without the flaml scheduler. Effectiveness of the authentic scheduler (scheduler='flaml') in FLAML [MLSys’21]]

Trial Scheduling With scheduler='flaml'
• Specify "sample_size" as the resource dimension.
• Use the "flaml" scheduler.
• Set the lower limit, upper limit, and changing factor of the resource.
• The suggested config contains the additional resource_attr dimension.
• In the evaluation function, the resource dimension must be used properly.
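
A minimal sketch of an evaluation function that honors the resource dimension. The dataset, the one-parameter model, and the `reg` hyperparameter are made up for illustration; the only point is that the function reads `sample_size` from the suggested config and trains on that many rows.

```python
import random

random.seed(0)
# Hypothetical dataset: y = 2x plus Gaussian noise.
data = [(x, 2.0 * x + random.gauss(0.0, 0.1)) for x in [i / 100 for i in range(1, 1001)]]

def evaluate(config):
    """Fit a one-parameter model on the first `sample_size` rows only."""
    sample = data[: int(config["sample_size"])]  # the resource dimension from the config
    # Regularized least-squares slope through the origin; `reg` is the tuned hyperparameter.
    slope = sum(x * y for x, y in sample) / (sum(x * x for x, _ in sample) + config["reg"])
    # For simplicity the error is measured on all rows rather than on a held-out set.
    mse = sum((y - slope * x) ** 2 for x, y in data) / len(data)
    return {"mse": mse, "sample_size": len(sample)}

cheap = evaluate({"reg": 0.0, "sample_size": 30})    # low-cost, low-fidelity evaluation
full = evaluate({"reg": 0.0, "sample_size": 1000})   # full-resource evaluation
print(cheap, full)
```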
A Use Case of Tune: Validating Strategies
Overall objectives and constraints:
Tune hyperparameters to find the best strategies (the evaluation of which requires expensive simulation) that lead to the highest profitable rate.
Advantages of FLAML.Tune:
• Fits the use case well
• “scheduler” allows for low-cost evaluations
“I started playing around with the tuning module - which is really cool by the way and fits really neatly into one of my use cases, so cheers for that! I'm running simulations on market strategies in a given bootstrapped sample.
…
I love the idea of having a resource_attr that increases to validate low cost results on a full dataset. For my case, it's the number of bootstrap iterations that my data should be sampled with each strategy I'm testing. So the low cost is 30 samples, which runs in a few seconds, and the validation/max scenario would be 1000 samples, which takes several minutes. The 30 folds help me exclude obviously bad strategies without having to waste resources on running them 1000 times.”

Application: Supercharge A/B testing w/ automated causal inference
• AutoML: automating generic regression/classification tasks
• Tune: tuning causal models
“First of all, I use FLAML models FLAML AutoML as the generic regression. So whenever the models need a regression component, I throw flaml into there. And also if they need the classifier, I use FLAML classifier. The only exception is if my sample is actually random. I use the dummy classifier as a propensity function. But then also on the second level I use FLAML for model hyperparameter and estimator search.”
- Head of AI at WISE

Interested in Knowing More?
GitHub: https://github.com/microsoft/FLAML
Documentation: https://microsoft.github.io/FLAML/

Frequently Asked Questions (FAQ)
https://microsoft.github.io/FLAML/docs/FAQ
• About low_cost_partial_config in Tune
• How does FLAML handle imbalanced data (unequal distribution of target classes in classification task)?
• How to interpret model performance? Is it possible for me to visualize feature importance, SHAP values, optimization history?

Agenda
First half (9:30 AM – 11:10 AM)
• Overview of AutoML and FLAML (9:30 AM)
• Task-oriented AutoML with FLAML (9:45 AM)
• ML.NET demo (10:30 AM)
• Tune user defined functions with FLAML (10:40 AM)

Break, Q & A

Second half (11:20 AM – 12:30 PM)
• Zero-shot AutoML (11:20 AM)
• Time series forecasting
• Natural language processing (11:45 AM)
• Online AutoML (12:00 PM)
• Fair AutoML
• Challenges and open problems

Zero-shot AutoML

What Is Zero-shot AutoML
• Zero-shot AutoML = "no-tuning" AutoML
• Recommend data-dependent default configurations at runtime
• The configuration depends on the dataset (X_train and y_train)
• Requires mining good hyperparameter configurations across different datasets offline

Example (dataset name: houses): static default, r2 = 0.8296; data-dependent default, r2 = 0.8537

Benefits of Zero-shot AutoML
• Training one model only
• The decision of hyperparameter configuration is instant
• User code remains the same
• It requires less input (tuning budget) from the user
• The offline preparation can be customized for a domain and leverage the historical tuning data

Concerns of Zero-shot AutoML
• Performance: robust meta-learning; combine with HPO
• Transparency: transparent portfolio-based approach
• New tasks/estimators/metrics: customizable meta-learning

Data-dependent Defaults vs. Data-agnostic Defaults
Mature libraries already have carefully crafted static defaults:
• LightGBM
• XGBoost
• Random Forest
Can data-dependent defaults outperform static defaults consistently?

Learned Zero-shot vs. Static Default
The margin is large in many tasks
• bng_libras_move (+24%)
• nyc_taxi (+15%)
• Yolanda (+17%)
• >1% on >67% of the tasks
The static default performs catastrophically in some tasks
• brazil_houses (-0.39 vs. 0.76)
• poker (0.28 vs. 0.94)
Learned zero-shot is slightly worse in 1 task
• comet (0.9857 vs. 0.9907, -0.5%)

Use Zero-shot AutoML: Import a "Flamlized" Learner
• LGBMClassifier, LGBMRegressor (inheriting LGBMClassifier, LGBMRegressor from lightgbm)
• XGBClassifier, XGBRegressor (inheriting XGBClassifier, XGBRegressor from xgboost)
• RandomForestClassifier, RandomForestRegressor (inheriting from scikit-learn)
• ExtraTreesClassifier, ExtraTreesRegressor (inheriting from scikit-learn)

Magic Behind the Scene
flaml.default.LGBMRegressor inherits lightgbm.LGBMRegressor
Decide the hyperparameter configurations (task -> configuration) based on
• Training data (size and other metafeatures)
• Offline AutoML results
The decision is made instantly
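
One way to picture a data-dependent default is a nearest-neighbor lookup over metafeatures. The sketch below is an assumption made for illustration, not FLAML's actual meta-learning method (see the KDD-AutoML 2022 reference for that); the portfolio entries, metafeatures, and configs are invented.

```python
import math

# Hypothetical offline results: metafeatures of past tasks and their best configs.
portfolio = [
    {"metafeatures": {"n_rows": 1_000, "n_features": 10},
     "config": {"n_estimators": 50, "max_leaves": 8}},
    {"metafeatures": {"n_rows": 100_000, "n_features": 50},
     "config": {"n_estimators": 400, "max_leaves": 64}},
    {"metafeatures": {"n_rows": 5_000_000, "n_features": 200},
     "config": {"n_estimators": 1500, "max_leaves": 255}},
]

def metafeatures(X):
    """Cheap metafeatures computed from the training data (a list of feature rows)."""
    return {"n_rows": len(X), "n_features": len(X[0])}

def suggest_config(X, portfolio):
    """Return the config of the past task whose log-scaled metafeatures are closest."""
    target = metafeatures(X)

    def distance(entry):
        return math.sqrt(sum(
            (math.log10(target[k]) - math.log10(entry["metafeatures"][k])) ** 2
            for k in target
        ))

    return min(portfolio, key=distance)["config"]

X_train = [[0.0] * 40 for _ in range(20_000)]
print(suggest_config(X_train, portfolio))
```

The lookup involves no model training, which is why a decision of this kind can be made instantly at fit time.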

Check the Configuration Before Training

Combine Zero-shot AutoML and HPO
Further improve the accuracy by tuning

Use Your Own Meta-learned Defaults
Who may consider customized defaults?
• AutoML providers for a particular domain
• Data scientists or engineers who need to repeatedly train models for similar tasks with varying training data
• Researchers or developers who would like to leverage meta learning for new tasks/estimators/metrics

Meta Learning in FLAML
• Metafeatures
• Configurations
• Evaluation results

Learn Data-dependent Defaults
Inputs: metafeatures, configurations, evaluation results

python -m flaml.default.portfolio --output my --input my --metafeatures my/all/metafeatures.csv --task binary --estimator lgbm rf

Input:
my/
  all/metafeatures.csv
  lgbm/
    2dplanes.json
    …
    results.csv
  rf/…

Output:
my/
  lgbm/binary.json
  rf/binary.json
  all/binary.json

Use Your Own Meta-learned Defaults
• Use “flamlized” learner
• Combine zero-shot and HPO

location_for_defaults/
  all/multiclass.json
  lgbm/multiclass.json
  xgb_limitdepth/multiclass.json
  rf/multiclass.json

"Flamlize" a Learner
• Share it with others or the future yourself
• Update the learned defaults continuously

Reference
Mining Robust Default Configurations for Resource-constrained AutoML. Moe Kayali, Chi Wang. KDD-AutoML 2022.

Time Series Forecasting Tasks

Forecast label can be numerical (default) or categorical
• Numerical: task="ts_forecast", or task="ts_forecast_regression"
• Categorical: task="ts_forecast_classification"

Data ordered by a datetime column with equal intervals
• Daily: 3-30-2022, 3-31-2022, 4-1-2022, 4-2-2022, …
• Weekly: 3-28-2022, 4-4-2022, 4-11-2022, …
• Monthly: 3-1-2022, 4-1-2022, 5-1-2022, …

Forecast horizon
• How many future time points to predict

Time Series Forecasting Dataset
SKU     Date        Volume  Month  Price  Temperature
SKU_01  01-01-2022  4       Jan    11     25
SKU_02  01-01-2022  8       Jan    12     25
SKU_01  02-01-2022  20      Feb    10.5   27
SKU_02  02-01-2022  32      Feb    11.2   27
…       …           …       …      …      …

Column roles:
• SKU: time series ID (categorical)
• Date: timestamp (datetime)
• Volume: forecast label (numerical or categorical)
• Remaining columns: known features (numerical or categorical) and unknown features (numerical or categorical)

Estimators for time series forecasting:
• Statistical models: Prophet, ARIMA, SARIMAX. Caveats: Id columns, unknown features.
• Regressors/Classifiers: LightGBM, XGBoost, RandomForest, ExtraTrees. Caveat: unknown features.
• Neural networks: TemporalFusionTransformer (pytorch-forecasting). Caveat: training cost is high.

Data Splitter
• Holdout: training, followed by validation
• Cross validation: Fold 1 to Fold 5
[Figure: the splits laid out over time points 1 to 24]
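
The exact fold layout in the figure is not recoverable from the text, so the following sketch assumes a common time-ordered scheme: holdout validates on the last `horizon` points, and each cross-validation fold trains on all points before its validation block. Both helper functions are hypothetical.

```python
def holdout_split(n_points, horizon):
    """Train on everything except the last `horizon` points; validate on those."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    indices = list(range(n_points))
    return indices[:-horizon], indices[-horizon:]

def rolling_cv_splits(n_points, horizon, n_folds):
    """Each fold validates on a later block of `horizon` points
    and trains on all earlier points (expanding window)."""
    splits = []
    for fold in range(n_folds):
        val_end = n_points - (n_folds - 1 - fold) * horizon
        val_start = val_end - horizon
        splits.append((list(range(val_start)), list(range(val_start, val_end))))
    return splits

train, val = holdout_split(24, 4)
splits = rolling_cv_splits(24, 2, 5)
for fold, (fold_train, fold_val) in enumerate(splits, start=1):
    print(f"Fold {fold}: train 0..{fold_train[-1]}, validate {fold_val}")
```

Keeping every validation point later than every training point avoids leaking future values into training, which is the reason time series data is not split at random.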

Example: Prepare Dataset

Example: Run AutoML.fit()
List of ML learners in AutoML Run: ['lgbm', 'rf', 'xgboost',
'extra_tree', 'xgb_limitdepth', 'prophet', 'arima', 'sarimax']

Example: Check Result

Example: Prediction


Natural Language Processing Tasks by FLAML
Natural language understanding
• Sequence classification: sentiment classification / hate speech detection / document categorization
• Sequence regression: review rating prediction, book price prediction
• Token classification: named entity recognition / POS tagging
• Multiple choice classification: multiple choice classification
Natural language generation
• Seq2seq: text summarization

FLAML NLP Backbone: Fine-Tuning Language Models
[Diagram: AutoML and Tune built on top of Transformers]
Transformers:
• Library for fine-tuning transformer models
• Supports major NLP tasks
• Open access to most state-of-the-art language models: huggingface.co/models

Stage 1: Pretraining on unlabeled data
Stage 2: Fine tuning on downstream tasks (e.g., natural language inference, named entity recognition, QA)

NLP: A Basic Use Case
1. Get the data
2. AutoML using FLAML
3. Result

FLAML accuracy = 1 - 0.077 ≈ 0.92; Electra accuracy = 0.912

Fine-Tuning Hyperparameters: FLAML vs Transformers

FLAML's AutoML.fit vs Transformers' hyperparameter_search API:

Low code implementation
More customization, e.g., custom metric
Better performance

Hyperparameter Fine-Tuning with Transformers
• AutoML with Transformers for text summarization: 107 lines
• AutoML with Transformers for NER: 133 lines
User code for implementing NER with Transformers covers pre-processing/tokenization, the model, the data collator, the Trainer, and post processing.

With FLAML (AutoML and Tune on top of Transformers), the library covers pre-processing, model loading, data collation, computing metrics, training, and post processing. The user provides the training arguments (e.g., model).

FLAML for Text Summarization and NER
• AutoML with FLAML for text summarization: 15 lines
• AutoML with FLAML for NER: 15 lines

Performance: FLAML vs Transformers
[Figure: validation accuracy, marking where FLAML > Transformers, FLAML = Transformers, and FLAML < Transformers]

Number of Trials: FLAML vs Transformers
[Figure: number of trials, marking where FLAML > Transformers and FLAML < Transformers]

Open Problems on Fine-Tuning Hyperparameters

Model selection
Warm starting AutoML/ zero shot AutoML
Troubleshooting AutoML failure
Optimal search space
...

Model Selection for Fine-Tuning LM Hyperparameters

RoBERTa: 84.6

• Non trivial problem. The "best" model does not exist!

Spooky-author-identification: BERT outperforms RoBERTa (and Electra, Muppet, etc.)

Validation Accuracy

Wall clock time

RS, BO, and ASHA underperform

Reducing the search space can effectively reduce

An AutoML layer on top of Transformers

Low-code implementation
Outperforms Transformers' tuning performance
Join us to provide a better tool for AutoML for NLP


Properties of online machine learning

Data is available in a sequential order
Training/learning is performed online
Prediction/decision needs to be made online

• Improves user experience with real-time learning

Environment and learning agent (online learner):
• Input X_t
• Prediction output Ŷ_t = Predict(X_t)
• Ground truth Y_t
• Learn(X_t, Ŷ_t, Y_t)
Hyperparameters of the online learner (need AutoML):
- featurization choices
- learning rate
- regularization
- …
Online evaluation (e.g., progressive validation loss, cumulative regret)
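
Progressive validation can be sketched with a toy online learner: each example is scored before the learner sees its label, so the running loss is always measured on unseen data. The one-weight linear model, the data stream, and the learning rates below are assumptions for illustration.

```python
def progressive_validation(stream, learning_rate):
    """Predict each example before learning from it; return the average squared loss."""
    weight, total_loss, count = 0.0, 0.0, 0
    for x, y in stream:
        prediction = weight * x                  # Predict(X_t), made before Y_t is used
        total_loss += (prediction - y) ** 2      # online evaluation on unseen data
        weight -= learning_rate * 2 * (prediction - y) * x  # Learn(X_t, prediction, Y_t)
        count += 1
    return total_loss / count

# Hypothetical noiseless stream with true weight 3 and inputs cycling over 0.1 .. 1.0.
stream = [(x, 3.0 * x) for x in [0.1 * (i % 10 + 1) for i in range(500)]]
for lr in (0.001, 0.1, 0.5):
    print(lr, progressive_validation(stream, lr))
```

On this stream a larger learning rate reaches the true weight sooner and therefore accumulates less progressive loss, which is the kind of signal an online AutoML method can use to compare hyperparameter choices without a separate validation set.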

Online Namespace Interactions Tuning

• Features grouped into namespaces in VW:

• Sometimes adding feature interactions is helpful.

• How to automatically decide which namespace interactions to use?

(A double exponentially large search space)

Online AutoML?
Unique challenges of online AutoML (which make conventional AutoML methods not directly applicable):
• Sharp computation constraints (only a constant factor more computation is allowed at any time point)
• Vastly different scales of data volume
• Learning algorithms are being evaluated constantly

Online AutoML
Research question
How can we best design an efficient online automated
machine learning algorithm?

Key challenge
Finding a balance between
• Searching over a large number of plausible choices
• Concentrating the limited computation budget on a few promising choices such that we do not pay a high “learning price” (regret)

Champion-Challengers (ChaCha) for Online AutoML [ICML’21]
• Champion: proven best config at the concerned time point
• Challengers: the rest of the candidate configs under consideration
• Components: an initial config, champion update, challenger generation

For online learning:
https://github.com/microsoft/FLAML/tree/tutorial/tutorial

Demo: AutoVW
[Figure: results of AutoVW and vanilla VW]

ML Fairness and Fair AutoML
• Fairness in ML

Existing and ongoing efforts:
✓ Fairness definitions in ML
✓ Unfairness mitigation methods
Source: https://fairmlbook.org/tutorial1.html

• Fair AutoML: find models that are not only accurate but also fair.
• Is it possible to find a model that is fair without applying additional unfairness mitigation?
• If unfairness mitigation is necessary, how should it be incorporated into an AutoML pipeline?
• If we can find fair models (w/ or w/o unfairness mitigation), how will the “utility” (e.g., accuracy, cost) be affected?

Fair AutoML
• Developed abstractions for fairness assessment and unfairness mitigation techniques in AutoML.
• Investigated the need and impact of unfairness mitigation on AutoML.

The fair AutoML problem
Inputs:
- Data: D_train, D_val = (X_val, Y_val)
- Loss function: Loss()
- Hyperparameter search space C
- Fairness assessment procedure Fair()
- Unfairness mitigation procedure Mitigate()
The fairness assessment and unfairness mitigation procedures can come, e.g., from FairLearn.

FairAutoML: A fair AutoML framework
• HyperparameterSearcher: suggests a config c = Suggest(C) and is updated with the results.
• FairnessManager: decides whether we should perform the mitigation, and is updated with the results.
  - No (m=0): regular model building with config c.
  - Yes (m=1): model building with unfairness mitigation, Mitigate(c’), where c’ = Suggest(C).
• The resulting ML model f makes the prediction Ŷ_val = f(X_val), which is scored by the loss Loss(Ŷ_val, Y_val) and by the fairness assessment Fair(f, D_val).
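
The slides do not say which fairness definition Fair() uses. As one possible instance, the sketch below computes the demographic parity difference, a common group-fairness measure; the threshold in `is_fair` and the example data are hypothetical.

```python
def demographic_parity_difference(predictions, sensitive):
    """Largest gap in positive-prediction rate between groups of the sensitive attribute."""
    counts = {}
    for pred, group in zip(predictions, sensitive):
        positives, total = counts.get(group, (0, 0))
        counts[group] = (positives + (pred == 1), total + 1)
    rates = [positives / total for positives, total in counts.values()]
    return max(rates) - min(rates)

def is_fair(predictions, sensitive, threshold=0.05):
    """A hypothetical Fair() procedure: fair if the gap stays within the threshold."""
    return demographic_parity_difference(predictions, sensitive) <= threshold

predictions = [1, 0, 1, 1, 0, 0, 1, 0]
sensitive = ["F", "F", "F", "F", "M", "M", "M", "M"]
print(demographic_parity_difference(predictions, sensitive))  # 0.5
print(is_fair(predictions, sensitive))                        # False
```

Here the positive rate is 0.75 for one group and 0.25 for the other, so the model would be flagged as unfair and become a candidate for unfairness mitigation.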
FairFLAML
• A self-adaptive strategy (via FairnessManager) to balance HPO
and unfairness mitigation with a goal of finding a model with
the best fair loss efficiently and effectively.

[Figure: Fair loss on the adult dataset (sensitive attribute = “sex”)]

Future Work
Open Problems and Research Opportunities on AutoML
Deficiencies of AutoML:
• Causes system failures due to compute intensive workloads
• Lacks customizability
• Lacks comprehensive end-to-end support
• Lacks transparency and interpretability
Open problems:
• Timeout
• Elasticity
• Multi-output tasks
• Early stopping of trials
• Search space suggestion
• Overfitting
• Multiple optimization metrics
• Customize meta learning
• Deployment constraints
• Online learning
• Guardrail
• How to decide time budget
• Trustworthiness and Ethics
• How many data samples are needed
• Reproducibility

Use FLAML to Improve/Inspire Your Research
✓ ML model tuning
- Tuning reinforcement learning models
- Tuning graph neural networks
…
✓ Tuning certain choices in your task towards a particular objective
- Synthetic dataset generation by tuning data generation related choices
- Finding the most profitable investment strategies
- RL policy tuning
…
✓ Compare different methods/models more comprehensively with sufficient tuning of each method/model

Call for Contribution
This project welcomes and encourages all forms of contribution, including but not limited to:
✓ Pushing patches.
✓ Code review of pull requests.
✓ Documentation, examples and test cases.
✓ Community participation in issues, discussions, and gitter.
✓ Tutorials, blog posts, talks that promote the project.
✓ Sharing application scenarios and/or related research.

Call For Contribution: Opportunities
✓ Share your use cases and needs
- Connect via gitter: https://gitter.im/FLAMLer/community
- Create issues
✓ Development
- Improve existing features: zero-shot AutoML, time series forecasting, online AutoML
- Develop new features: computer vision tasks, multi-modal model, fair AutoML, quality monitoring and drift detection, visualization and explanation
✓ Integration with other libraries

Call for Contribution

Gitter:
https://gitter.im/FLAMLer/community

Contributing guide:
https://microsoft.github.io/FLAML/docs/Contribute

Roadmap:
https://github.com/microsoft/FLAML/wiki/Roadmap-for-Upcoming-Features

References
• [MLSys’21] FLAML: A Fast and Lightweight AutoML Library. Chi Wang, Qingyun Wu, Markus Weimer, Erkang Zhu.
• [ICML’21] ChaCha for Online AutoML. Qingyun Wu, Chi Wang, John Langford, Paul Mineiro, Marco Rossi.
• [ICLR’21] Economical Hyperparameter Optimization With Blended Search Strategy. Chi Wang, Qingyun Wu, Silu Huang, Amin Saied.
• [AAAI’21] Frugal Optimization for Cost-related Hyperparameters. Qingyun Wu, Chi Wang, Silu Huang.
• [ACL’21] An Empirical Study on Hyperparameter Optimization for Fine-Tuning Pre-trained Language Models. Susan Liu, Chi Wang.
• [KDD-AutoML’22] Mining Robust Default Configurations for Resource-constrained AutoML. Moe Kayali, Chi Wang.
• [preprint] Fair AutoML. Qingyun Wu, Chi Wang. arXiv preprint arXiv:2111.06495 (2022).

Q&A
aka.ms/FLAML

Thanks to all contributors & collaborators!

49 pages
Sample Resume
No ratings yet
Sample Resume
2 pages
V Model in Automotive ECU Development
No ratings yet
V Model in Automotive ECU Development
20 pages
Design Process: Obstacles and Pitfalls in Development Path
No ratings yet
Design Process: Obstacles and Pitfalls in Development Path
33 pages
RD 30 - AUD - AWR Oracle Procurement 2
No ratings yet
RD 30 - AUD - AWR Oracle Procurement 2
9 pages
5.4 Organization Structure and Risk Management
No ratings yet
5.4 Organization Structure and Risk Management
10 pages
Introduction To Power Apps Portals
No ratings yet
Introduction To Power Apps Portals
4 pages
IT Infrastructure Architecture Infrastructure Building Blocks and Concepts Laan PDF Download
100% (1)
IT Infrastructure Architecture Infrastructure Building Blocks and Concepts Laan PDF Download
56 pages
Implementing Business Logic in Microservices
No ratings yet
Implementing Business Logic in Microservices
5 pages
Microsoft Teams Admin Exam Guide
No ratings yet
Microsoft Teams Admin Exam Guide
8 pages
Introduction To Tableau: Data Visualization With Tableau
No ratings yet
Introduction To Tableau: Data Visualization With Tableau
17 pages
Software Project Management Guide
No ratings yet
Software Project Management Guide
15 pages
