Machine Learning Overview

Objectives

⚫ Upon completion of this course, you will understand:


◼ Learning algorithm definitions and machine learning process
◼ Related concepts such as hyperparameters, gradient descent, and cross-validation
◼ Common machine learning algorithms

2 Huawei Confidential
Contents

1. Machine Learning Algorithms

2. Types of Machine Learning

3. Machine Learning Process

4. Important Machine Learning Concepts

5. Common Machine Learning Algorithms

Machine Learning Algorithms (1)
⚫ Machine learning is often combined with deep learning methods to study and observe AI
algorithms. A computer program is said to learn from experience 𝐸 with respect to some class
of tasks 𝑇 and performance measure 𝑃, if its performance at tasks in 𝑇, as measured by 𝑃,
improves with experience 𝐸.

Figure: Data (experience 𝐸) → Learning algorithm (task 𝑇) → Understanding (performance measure 𝑃)

Machine Learning Algorithms (2)

Figure (human learning): Experience → summarize → Rules; New problem → input → Rules → predict → Future
Figure (machine learning): Historical data → train → Model; New data → input → Model → predict → Future attributes

Created by: Jim Liang

Differences Between Machine Learning Algorithms and Traditional Rule-based Methods

Rule-based method
• Explicit programming is used to solve problems.
• Rules can be manually determined.

Machine learning
• Models are trained on samples.
• Decision-making rules are complex or difficult to describe.
• Machines automatically learn rules.

Figure (machine learning): Training data → Machine learning → Model; New data → Model → Prediction

When to Use Machine Learning (1)
⚫ Machine learning provides solutions to complex problems, or those involving a large amount of data
whose distribution function cannot be determined.
⚫ Consider the following scenarios:

◼ Rules are complex or difficult to describe, for example, speech recognition.
◼ Task rules change over time, for example, part-of-speech tagging, in which new words or word meanings can be generated at any time.
◼ Data distribution changes over time and programs need to adapt to new data constantly, for example, sales trend forecast.

When to Use Machine Learning (2)

Figure: choice of method by complexity of rules (vertical axis) and scale of the problem (horizontal axis).

Complexity of rules | Small scale | Large scale
High | Manual rules | Machine learning algorithms
Low | Simple questions | Rule-based algorithms

Rationale of Machine Learning Algorithms

Ideal: target function $f: X \to Y$
Actual: training data $D = \{(x_1, y_1), \dots, (x_n, y_n)\}$ → learning algorithm → hypothesis function $g \approx f$

⚫ The target function $f$ is unknown, and the learning algorithm cannot recover $f$ exactly.
⚫ The hypothesis function $g$ approximates $f$ but may differ from it.

Main Problems Solved by Machine Learning
⚫ Machine learning can solve many types of tasks. The three most common types are:
◼ Classification: The program assigns the input to one of $k$ categories. To do so, the learning algorithm usually outputs a function $f: \mathbb{R}^n \to \{1, 2, \dots, k\}$. For example, image classification algorithms in computer vision solve classification tasks.
◼ Regression: The program predicts a numeric output for a given input. The learning algorithm usually outputs a function $f: \mathbb{R}^n \to \mathbb{R}$. Such tasks include predicting the claim amount of a policyholder to set an insurance premium, or predicting the price of a security.
◼ Clustering: Based on internal similarities, the program groups a large amount of unlabeled data into multiple classes. Data within a class is more similar than data across classes. Clustering tasks include search by image and user profiling.
⚫ Classification and regression are two major types of prediction tasks. The output of classification is discrete class values, and the output of regression is continuous values.

Types of Machine Learning
⚫ Supervised learning: The program takes a known set of samples and trains an optimal model to generate
predictions. Then, the trained model maps all inputs to outputs and performs simple judgment on the outputs. In
this way, unknown data is classified.
⚫ Unsupervised learning: The program builds a model based on unlabeled input data. For example, a clustering model
groups objects based on similarities. Unsupervised learning algorithms model the highly similar samples, calculate
the similarity between new and existing samples, and classify new samples by similarity.
⚫ Semi-supervised learning: The program trains a model through a combination of a small amount of labeled data and
a large amount of unlabeled data.
⚫ Reinforcement learning: The learning system learns behavior by interacting with the environment so as to maximize the value of a reward (reinforcement) signal. Reinforcement learning differs from supervised learning in connectionism in that the environment does not tell the system the correct action; instead, it provides scalar reinforcement signals that evaluate the system's actions.
⚫ The evolution of machine learning is producing new learning types, for example, self-supervised learning, contrastive learning, and generative learning.

Supervised Learning

Figure: each sample consists of data features (Feature 1, ······, Feature n) and a label (Target); the labeled samples are fed to the supervised learning algorithm.

Weather | Temperature | Wind Speed | Suitable for Exercise
Sunny | High | High | Yes
Rainy | Low | Medium | No
Sunny | Low | Low | Yes

Supervised Learning - Regression
⚫ Regression reflects the features of sample attributes in a dataset. A function is used to express
the sample mapping relationship and further discover the dependency between attributes.
Examples include:
◼ How much money can I make from stocks next week?
◼ What will the temperature be on Tuesday?

Figure: Monday 38°, Tuesday ?

Supervised Learning - Classification
⚫ Classification uses a classification model to map samples in a dataset to a given category.
◼ What category of garbage does the plastic bottle belong to?
◼ Is the email spam?

Unsupervised Learning
Figure: each sample consists of data features only (Feature 1, ······, Feature n); the unsupervised learning algorithm groups the samples by intra-cluster similarity.

Monthly Sales Volume | Product | Sale Duration
1000-2000 | Badminton racket | 6:00-12:00
500-1000 | Basketball | 18:00-24:00
1000-2000 | Game console | 00:00-6:00

Category (assigned by the algorithm): Cluster 1, Cluster 2

Unsupervised Learning - Clustering
⚫ Clustering uses a clustering model to classify samples in a dataset into several categories based
on similarity.
◼ Defining fish of the same species.
◼ Recommending movies for users.

Semi-supervised Learning

Figure: only some samples carry a label (Target); the labels of the remaining samples are unknown. Both kinds of samples are fed to the semi-supervised learning algorithm.

Weather | Temperature | Wind Speed | Suitable for Exercise
Sunny | High | High | Yes
Rainy | Low | Medium | /
Sunny | Low | Low | /

Reinforcement Learning
⚫ A reinforcement learning model learns from the environment, takes actions, and adjusts the
actions based on a system of rewards.

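Figure: the model receives the status $s_t$ and the reward $r_t$ from the environment and takes action $a_t$; the environment then returns the next status $s_{t+1}$ and the next reward $r_{t+1}$.
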
Reinforcement Learning - Best Action
⚫ Reinforcement learning always tries to find the best action.
◼ Autonomous vehicles: The traffic lights are flashing yellow. Should the vehicle brake or accelerate?
◼ Robot vacuum: The battery level is 10%, and a small area is not cleaned. Should the robot continue cleaning or
recharge?

Machine Learning Process

Data preparation → Data cleansing → Feature extraction and selection → Model training → Model evaluation → Model deployment and integration (with feedback and iteration across the steps)

Machine Learning Basic Concept - Dataset
⚫ Dataset: collection of data used in machine learning tasks, where each piece of data is called a sample. Items or attributes that reflect the presentation or nature of a sample in a particular aspect are called features.
◼ Training set: dataset used in the training process, where each sample is called a training sample. Learning (or training) is the process of building a model from data.
◼ Test set: dataset used in the testing process, where each sample is called a test sample. Testing is the process in which the learned model is used for prediction.

Data Overview
⚫ Typical dataset composition

Set | No. | Area (Feature 1) | Location (Feature 2) | Orientation (Feature 3) | House Price (Label)
Training set | 1 | 100 | 8 | South | 1000
Training set | 2 | 120 | 9 | Southwest | 1300
Training set | 3 | 60 | 6 | North | 700
Training set | 4 | 80 | 9 | Southeast | 1100
Test set | 5 | 95 | 3 | South | 850

Importance of Data Processing
⚫ Data is crucial to models and determines the scope of model capabilities. All good models
require good data.

Data preprocessing includes:
• Data cleansing: Fill in missing values, and detect and eliminate noise and other abnormal points.
• Data standardization: Standardize data to reduce noise and improve model accuracy.
• Data dimension reduction: Simplify data attributes to avoid the curse of dimensionality.

Data Cleansing
⚫ Most machine learning models process features, which are usually numeric representations of
input variables that can be used in the model.
⚫ In most cases, only preprocessed data can be used by algorithms. Data preprocessing involves
the following operations:
◼ Data filtering
◼ Data loss handling
◼ Handling of possible error or abnormal values
◼ Merging of data from multiple sources
◼ Data consolidation
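
Two of the operations listed above (handling missing data and filtering duplicates) can be sketched as follows. This is a minimal illustration only: the records, the column names, and the choice to fill gaps with the column mean are assumptions, not part of the course material.

```python
from statistics import mean

# Hypothetical raw records: None marks a missing value; the third row repeats the first.
raw = [
    {"area": 100, "price": 1000},
    {"area": None, "price": 1300},
    {"area": 100, "price": 1000},
    {"area": 60, "price": 700},
]


def clean(records):
    """Drop exact duplicate rows, then fill missing values with the column mean."""
    seen, unique = set(), []
    for row in records:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))
    for col in unique[0]:
        known = [r[col] for r in unique if r[col] is not None]
        fill = mean(known)
        for r in unique:
            if r[col] is None:
                r[col] = fill
    return unique


cleaned = clean(raw)
```

Mean imputation is only one option; dropping incomplete rows or using the median are equally common choices.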

Dirty Data
⚫ Raw data usually contains data quality problems:
◼ Incompleteness: Incomplete data or lack of relevant attributes or values.
◼ Noise: Data contains incorrect records or abnormal points.
◼ Inconsistency: Data contains conflicting records.

Figure: examples of dirty data in a table, including a missing value, an invalid value, a misfielded value, invalid duplicate items, an incorrect format, dependent attributes, and a misspelling.

Data Conversion
⚫ Preprocessed data needs to be converted into a representation suitable for machine learning models.
The following are typically used to convert data:
◼ Encoding categorical data into numerals for classification
◼ Converting numeric data into categorical data to reduce the number of distinct values of a variable (for example, segmenting age data)
◼ Other data:
  • Converting words in text into word vectors through word embedding (typically, models such as word2vec and BERT are used)
  • Image data processing, such as color space conversion, grayscale image conversion, geometric conversion, Haar-like features, and image enhancement
◼ Feature engineering:
  • Normalizing and standardizing features to ensure that different input variables of a model fall into the same value range
  • Feature augmentation: combining or converting the existing variables to generate new features, such as averages

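A few of these conversions can be sketched in plain Python. The helper names `min_max`, `z_score`, and `one_hot` are illustrative, and the sample values are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def min_max(values):
    """Normalize values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(categories):
    """Encode each categorical value as a 0/1 vector, one position per distinct level."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


# Assumed sample values for illustration.
areas = [100, 120, 60, 80]
orientations = ["South", "Southwest", "North", "Southeast"]

scaled = min_max(areas)
standardized = z_score(areas)
encoded = one_hot(orientations)
```

One-hot encoding avoids implying an order between categories, which a plain integer code would do.
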
Necessity of Feature Selection
⚫ Generally, a dataset has many features, some of which may be unnecessary or irrelevant to the
values to be predicted.
⚫ Feature selection is necessary in the following aspects:

• Simplifies models for easy interpretation
• Shortens training time
• Avoids the curse of dimensionality
• Improves model generalization and avoids overfitting

Feature Selection Methods - Filter
⚫ Filter methods are independent of models during feature selection.

By evaluating the correlation between each feature and the target attribute, a filter method scores each feature using a statistical measure and then sorts the features by score. The scores can be used to preserve or eliminate specific features.

Filter method process: Traversing all features → Selecting the best feature subset → Learning algorithm → Model evaluation

Common methods:
• Pearson correlation coefficient
• Chi-square coefficient
• Mutual information

Limitations of filter methods:
• Filter methods tend to select redundant variables because they do not consider the relationships between features.

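A filter method based on the Pearson correlation coefficient might look like the sketch below. The feature values are illustrative assumptions, and ranking by the absolute correlation is one possible scoring rule rather than a prescribed one.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Assumed numeric features and target, for illustration only.
features = {
    "area": [100, 120, 60, 80, 95],
    "location": [8, 9, 6, 9, 3],
}
price = [1000, 1300, 700, 1100, 850]

# Score every feature independently of any model, then sort by score.
scores = {name: abs(pearson(col, price)) for name, col in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because each feature is scored on its own, two strongly correlated features would both rank high, which is the redundancy limitation noted for filter methods.
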
Feature Selection Methods - Wrapper
⚫ Wrapper methods use a prediction model to score a feature subset.

Wrapper methods treat feature selection as a search problem in which different feature combinations are evaluated and compared. A predictive model is used to evaluate each combination, and the feature subsets are scored by model accuracy.

Wrapper method process: Traversing all features → Generating a feature subset → Learning algorithm → Model evaluation, repeated to select the best feature subset

Common method:
• Recursive feature elimination

Limitations of wrapper methods:
• Wrapper methods train a new model for each feature subset, which can be computationally intensive.
• Wrapper methods usually provide high-performance feature sets for a specific type of model.

Feature Selection Methods - Embedded
⚫ Embedded methods treat feature selection as a part of the modeling process.

Regularization is the most common type of embedded method. Regularization methods, also called penalization methods, introduce additional constraints into the optimization of a predictive algorithm to bias the model toward lower complexity and reduce the number of features.

Embedded method process: Traversing all features → Generating a feature subset → Learning algorithm + model evaluation, repeated to select the most appropriate feature subset

Common method:
• LASSO regression

Supervised Learning Example - Learning Phase
⚫ Use a classification model to determine whether a person is a basketball player based on
specific features.
Service data (cleansed features and labels):

Name | City | Age | Label
Mike | Miami | 42 | yes
Jerry | New York | 32 | no
Bryan | Orlando | 18 | no
Patricia | Miami | 45 | yes
Elodie | Phoenix | 35 | no
Remy | Chicago | 72 | yes
John | New York | 48 | yes

Name, City, and Age are the features (attributes); Label is the target (label).

The data is split into two sets, and the model is trained on the training set:
• Training set: data used by the model to determine the relationships between features and targets.
• Test set: new data for evaluating model effectiveness.

Each feature or set of features provides a judgment basis for the model.

Supervised Learning Example - Prediction Phase
New data (unknown data): for these people, it is not yet known whether they are basketball players.

Name | City | Age | Label
Marine | Miami | 45 | ?
Julien | Miami | 52 | ?
Fred | Orlando | 20 | ?
Michelle | Boston | 34 | ?
Nicolas | Phoenix | 90 | ?

Apply the model:
IF city = Miami → Probability = +0.7
IF city = Orlando → Probability = +0.2
IF age > 42 → Probability = +0.05*age + 0.06
IF age <= 42 → Probability = +0.01*age + 0.02

Predicted data (predicted possibility): use the model against the new data to predict the possibility of each person being a basketball player.

Name | City | Age | Prediction
Marine | Miami | 45 | 0.3
Julien | Miami | 52 | 0.9
Fred | Orlando | 20 | 0.6
Michelle | Boston | 34 | 0.5
Nicolas | Phoenix | 90 | 0.4

What Is a Good Model?

• Generalization
The accuracy of predictions based on actual data

• Explainability
Predicted results are easy to explain

• Prediction speed
The time needed to make a prediction

Model Effectiveness (1)
⚫ Generalization capability: Machine learning aims to ensure models perform well on new
samples, not just those used for training. Generalization capability, also called robustness, is
the extent to which a learned model can be applied to new samples.
⚫ Error is the difference between the prediction of a learned model on a sample and the actual
result of the sample.
◼ Training error is the error of the model on the training set.
◼ Generalization error is the error of the model on new samples. Obviously, we prefer a model
with a smaller generalization error.
⚫ Underfitting: The training error is large.
⚫ Overfitting: The training error of a trained model is small while the generalization error is large.

Model Effectiveness (2)
⚫ Model capacity, also known as model complexity, is the capability of the model to fit various functions.
◼ With sufficient capacity to handle task complexity and training data volumes, the algorithm results are optimal.
◼ Models with an insufficient capacity cannot handle complex tasks because underfitting may occur.
◼ Models with a large capacity can handle complex tasks, but overfitting may occur when the capacity is greater
than the amount required by a task.

Figure: three fits of the same data. Underfitting: features not learned. Good fitting. Overfitting: noise learned.

Cause of Overfitting - Errors
⚫ $\text{Prediction error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible error}$
⚫ In general, the two main factors of prediction error are variance and bias.
⚫ Variance:
◼ How much a prediction result deviates from the mean of the predictions.
◼ Variance is caused by the sensitivity of the model to small fluctuations in the training set.
⚫ Bias:
◼ Difference between the average of the predicted values and the actual values.

Variance and Bias
⚫ Different combinations of variance and bias are as
follows:
◼ Low bias & low variance ➜ good model
◼ Low bias & high variance ➜ inadequate model
◼ High bias & low variance ➜ inadequate model
◼ High bias & high variance ➜ bad model
⚫ An ideal model accurately captures the rules in the training data and also generalizes to unseen (new) data. In practice, a model usually cannot do both perfectly at the same time, so bias and variance have to be balanced.

Complexity and Errors of Models
⚫ The more complex a model is, the smaller its training error is.
⚫ As the model complexity increases, the test error decreases before increasing again, forming a
convex curve.

Figure: error plotted against model complexity, with one curve for the training error and one for the test error.

Performance Evaluation of Machine Learning - Regression
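⚫ Mean absolute error (MAE). An MAE value closer to 0 indicates that the model fits the training data better.
$$\mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left| y_i - \hat{y}_i \right|$$
⚫ Mean squared error (MSE).
$$\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2$$
⚫ Coefficient of determination ($R^2$). The value of $R^2$ is at most 1, and it lies in $[0,1]$ for a least-squares fit evaluated on its training data. A larger value indicates that the model fits the training data better. $TSS$ indicates the difference between samples, and $RSS$ indicates the difference between the predicted values and sample values.
$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{m} \left( y_i - \bar{y} \right)^2}$$
Here, $m$ is the number of samples, $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the actual values.
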
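The three regression metrics can be computed directly from their definitions. The following is a minimal sketch; the sample values are made up for illustration.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - RSS / TSS."""
    y_bar = sum(y_true) / len(y_true)
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    tss = sum((y - y_bar) ** 2 for y in y_true)
    return 1 - rss / tss


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
```

For these values, `mae` gives 0.25, `mse` gives 0.125, and `r2` gives 0.975.
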
Performance Evaluation of Machine Learning - Classification (1)
⚫ Terms:
◼ $P$: positive, indicating the number of real positive cases in the data.
◼ $N$: negative, indicating the number of real negative cases in the data.
◼ $TP$: true positive, indicating the number of positive cases that are correctly classified as positive.
◼ $TN$: true negative, indicating the number of negative cases that are correctly classified as negative.
◼ $FP$: false positive, indicating the number of negative cases that are incorrectly classified as positive.
◼ $FN$: false negative, indicating the number of positive cases that are incorrectly classified as negative.

Confusion matrix:

Actual \ Predicted | Yes | No | Total
Yes | $TP$ | $FN$ | $P$
No | $FP$ | $TN$ | $N$
Total | $P'$ | $N'$ | $P + N$

⚫ The confusion matrix is at least an $m \times m$ table, where $m$ is the number of classes. The entry $CM_{i,j}$ in the first $m$ rows and $m$ columns indicates the number of cases that belong to class $i$ but are labeled as $j$.
◼ For classifiers with high accuracy, most of the cases should be represented by entries on the diagonal of the confusion matrix from $CM_{1,1}$ to $CM_{m,m}$, while other entries are 0 or close to 0. That is, $FP$ and $FN$ are close to 0.

Performance Evaluation of Machine Learning - Classification (2)
Measurement | Formula
Accuracy, recognition rate | $\frac{TP + TN}{P + N}$
Error rate, misclassification rate | $\frac{FP + FN}{P + N}$
True positive rate, sensitivity, recall | $\frac{TP}{P}$
True negative rate, specificity | $\frac{TN}{N}$
Precision | $\frac{TP}{TP + FP}$
$F_1$ value, harmonic mean of precision and recall | $\frac{2 \times precision \times recall}{precision + recall}$
$F_\beta$ value, where $\beta$ is a non-negative real number | $\frac{(1 + \beta^2) \times precision \times recall}{\beta^2 \times precision + recall}$

Performance Evaluation of Machine Learning - Example
⚫ In this example, an ML model was trained to identify images of cats. To evaluate the model's performance, 200 images were used, of which 170 were cats.
⚫ The model reported that 160 images were cats.

Actual \ Predicted | yes | no | Total
yes | 140 | 30 | 170
no | 20 | 10 | 30
Total | 160 | 40 | 200

Precision: $\frac{TP}{TP + FP} = \frac{140}{140 + 20} = 87.5\%$
Recall: $\frac{TP}{P} = \frac{140}{170} \approx 82.4\%$
Accuracy: $\frac{TP + TN}{P + N} = \frac{140 + 10}{170 + 30} = 75\%$

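The cat example can be checked with a short script that derives the metrics from the four confusion matrix counts. The function name `classification_metrics` is an illustrative choice, not a library API.

```python
def classification_metrics(tp, fn, fp, tn, beta=1.0):
    """Derive common classification metrics from confusion matrix counts."""
    p, n = tp + fn, fp + tn  # real positives and real negatives
    precision = tp / (tp + fp)
    recall = tp / p
    f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return {
        "accuracy": (tp + tn) / (p + n),
        "error_rate": (fp + fn) / (p + n),
        "precision": precision,
        "recall": recall,
        "specificity": tn / n,
        "f_beta": f_beta,
    }


# Counts from the cat example: 140 TP, 30 FN, 20 FP, 10 TN.
metrics = classification_metrics(tp=140, fn=30, fp=20, tn=10)
```

With `beta=1.0`, `f_beta` is the $F_1$ value. Note the low specificity (10 of 30 non-cat images recognized) despite the reasonable accuracy, which is a typical effect of imbalanced classes.
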
Machine Learning Training Methods - Gradient Descent (1)
⚫ This method uses the negative gradient direction at the current position as the search direction, which is the direction of steepest descent at the current position. The formula is as follows:
$$w_{k+1} = w_k - \eta \nabla f_{w_k}(x^i)$$
$\eta$ is the learning rate, and $i$ indicates the $i$-th data record. $\eta \nabla f_{w_k}(x^i)$ indicates the change of the weight parameter $w$ in each iteration.
⚫ Training stops at convergence, that is, when the value of the objective function changes very little, or when the maximum number of iterations is reached.

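The update rule and both stopping conditions can be illustrated on a one-dimensional objective. This sketch assumes a simple function $f(w) = (w - 3)^2$ with a known gradient instead of a per-sample loss; the function name, learning rate, and tolerance are illustrative assumptions.

```python
def gradient_descent(f, grad, w0, eta=0.1, max_iters=1000, tol=1e-12):
    """Minimize f by repeatedly stepping against its gradient.

    Stops when the objective changes by less than tol (convergence)
    or when max_iters iterations have been performed.
    """
    w = w0
    for _ in range(max_iters):
        w_next = w - eta * grad(w)  # w_{k+1} = w_k - eta * gradient
        if abs(f(w) - f(w_next)) < tol:
            return w_next
        w = w_next
    return w


# Assumed toy objective with its minimum at w = 3.
w_star = gradient_descent(
    f=lambda w: (w - 3) ** 2,
    grad=lambda w: 2 * (w - 3),
    w0=0.0,
)
```

A learning rate that is too large makes the iterates overshoot and diverge, whereas one that is too small makes convergence slow.
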
Parameters and Hyperparameters
⚫ A model contains not only parameters but also hyperparameters. Hyperparameters enable the
model to learn the optimal configurations of the parameters.
◼ Parameters are automatically learned by models. Parameters are "distilled" from data.
◼ Hyperparameters are manually set. They are used to control training.

Figure: hyperparameters control the training that produces the model.

Hyperparameters
Hyperparameters are configurations outside the model:
• Commonly used for model parameter estimation.
• Specified by the user.
• Set heuristically.
• Often tuned for a given predictive modeling problem.

Common hyperparameters:
• $\lambda$ of Lasso/Ridge regression
• Learning rate, number of iterations, batch size, activation function, and number of neurons of a neural network to be trained
• $C$ and $\sigma$ of support vector machines (SVMs)
• k in the k-nearest neighbors (k-NN) algorithm
• Number of trees in a random forest

Hyperparameter Search Process and Methods

General process of hyperparameter search:
1. Divide a dataset into a training set, validation set, and test set.
2. Optimize the model parameters using the training set based on the model performance metrics.
3. Search for model hyperparameters using the validation set based on model performance metrics.
4. Perform step 2 and step 3 alternately until model parameters and hyperparameters are determined, and assess the model using the test set.

Search algorithms (step 3):
• Grid search
• Random search
• Heuristic intelligent search
• Bayesian search

Hyperparameter Tuning Methods - Grid Search

⚫ Grid search performs an exhaustive search of all possible hyperparameter combinations, which form a grid of hyperparameter values.
⚫ In practice, the hyperparameter ranges and steps are manually specified.
⚫ Grid search is expensive and time-consuming.
◼ This method works well when there are relatively few hyperparameters. Therefore, it is feasible for general machine learning algorithms, but not for neural networks (see the deep learning course).

Figure: Grid search. Candidate points form a regular grid over Hyperparameter 1 (vertical axis) and Hyperparameter 2 (horizontal axis).

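A grid search can be sketched with `itertools.product`, which enumerates every combination of the manually specified values. The scoring function `validation_score` is a hypothetical stand-in for "train on the training set, evaluate on the validation set"; the hyperparameter names and values are also assumptions.

```python
from itertools import product


def validation_score(learning_rate, depth):
    """Hypothetical stand-in for a validation metric (higher is better)."""
    return -((learning_rate - 0.1) ** 2) - (depth - 3) ** 2


# Manually specified ranges and steps for each hyperparameter.
grid = {
    "learning_rate": [0.01, 0.1, 1.0],
    "depth": [1, 2, 3, 4, 5],
}

names = list(grid)
best_score, best_params = float("-inf"), None
n_evaluated = 0
for values in product(*(grid[name] for name in names)):
    params = dict(zip(names, values))
    score = validation_score(**params)
    n_evaluated += 1
    if score > best_score:
        best_score, best_params = score, params
```

The number of evaluations is the product of the list lengths (here 3 × 5 = 15), which is why the cost grows quickly as hyperparameters are added.
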
Hyperparameter Tuning Methods - Random Search
⚫ If the hyperparameter search space is large, random search is more appropriate than grid search.
⚫ In a random search, each setting is sampled from the possible parameter values to find the most appropriate parameter subset.
⚫ Note:
◼ In a random search, a search is first performed within a broad range, and then the range is narrowed based on the location of the best result.
◼ Some hyperparameters are more important than others and affect random search preferences.

Figure: Random search. Candidate points are sampled over Hyperparameter 1 (vertical axis) and Hyperparameter 2 (horizontal axis).

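A random search replaces the fixed grid with sampled settings. In this sketch, `validation_score` is again a hypothetical stand-in for a real validation metric, and the sampling ranges (a log-uniform learning rate and an integer depth) are assumptions.

```python
import random


def validation_score(learning_rate, depth):
    """Hypothetical stand-in for a validation metric (higher is better)."""
    return -((learning_rate - 0.1) ** 2) - (depth - 3) ** 2


def random_search(n_trials, seed=0):
    """Sample n_trials random settings and keep the best-scoring one."""
    rng = random.Random(seed)  # seeded for reproducibility
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        params = {
            # Sample the exponent uniformly so small and large rates are both covered.
            "learning_rate": 10 ** rng.uniform(-3, 0),
            "depth": rng.randint(1, 10),
        }
        score = validation_score(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


score_50, params_50 = random_search(50)
```

After a broad first pass like this one, the ranges can be narrowed around `params_50` and the search repeated.
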
Cross-Validation (1)
⚫ Cross-validation is a statistical analysis method used to check the performance of classifiers. It splits the original
data into the training set and validation set. The former is used to train a classifier, whereas the latter is used to
evaluate the classifier by testing the trained model.
⚫ k-fold cross-validation (k-fold CV):
◼ Divides the original data into 𝑘 (usually equal-sized) subsets.
◼ Each unique group is treated as a validation set, and the remaining 𝑘 − 1 groups are treated as the training set.
In this way, 𝑘 models are obtained.
◼ The average classification accuracy score of the 𝑘 models on the validation set is used as the performance metric
for k-fold CV classifiers.
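
The k-fold procedure above can be sketched as follows. The helper names are illustrative, and `fit_and_score` is a hypothetical callback that trains on the training indices and returns the accuracy on the validation indices.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k folds of (nearly) equal size.

    Returns a list of (train_indices, validation_indices) pairs, one per fold.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds


def cross_validate(n_samples, k, fit_and_score):
    """Average the validation score of the k models, one per fold."""
    scores = [fit_and_score(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / k


folds = k_fold_indices(10, 3)
```

In practice the samples are usually shuffled before splitting; the sketch keeps them in order so that the folds are easy to inspect.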

Cross-Validation (2)

Figure: the full dataset is first split into a training set and a test set; the training set is then further split into a training set and a validation set.

⚫ Note: k in k-fold CV is a hyperparameter.

Machine Learning Algorithm Overview
Machine learning
• Supervised learning
  ◦ Classification: logistic regression, SVM, neural network, decision tree, random forest, gradient boosted decision tree (GBDT), k-NN, Naive Bayes
  ◦ Regression: linear regression, SVM, neural network, decision tree, random forest, GBDT, k-NN
• Unsupervised learning
  ◦ Clustering: k-means clustering, hierarchical clustering, density-based clustering
  ◦ Other: association rule, principal component analysis, Gaussian mixture modeling

Linear Regression (1)
⚫ Linear regression uses the regression analysis of mathematical statistics to determine the
quantitative relationship between two or more variables.
⚫ Linear regression is a type of supervised learning.

Figure: simple linear regression and multiple linear regression.

Linear Regression (2)
⚫ The model function of linear regression is as follows, where $w$ is the weight parameter, $b$ is the bias, and $x$ represents the sample:
$$h_w(x) = w^T x + b$$
⚫ The relationship between the value predicted by the model and the actual value is as follows, where $y$ indicates the actual value, and $\varepsilon$ indicates the error:
$$y = w^T x + b + \varepsilon$$
⚫ The error $\varepsilon$ is affected by many independent factors. Linear regression assumes that the error $\varepsilon$ follows a normal distribution. The loss function of linear regression can be obtained using the normal distribution function and maximum likelihood estimation (MLE), where $m$ is the number of samples:
$$J(w) = \frac{1}{2m} \sum \left( h_w(x) - y \right)^2$$
⚫ We want the predicted value to be as close to the actual value as possible, that is, to minimize the loss value. We can use a gradient descent algorithm to find the weight parameter $w$ at which the loss function reaches its minimum, thereby completing model building.

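Minimizing this loss with gradient descent can be sketched for a single feature. The learning rate, iteration count, and data are assumptions; the gradients follow from differentiating $J(w)$ with respect to $w$ and $b$.

```python
def fit_linear(xs, ys, eta=0.05, iters=5000):
    """Fit h(x) = w * x + b by batch gradient descent on J = (1/2m) * sum((h(x) - y)^2)."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(iters):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dw
        grad_b = sum(errors) / m                             # dJ/db
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b


# Noise-free toy data generated from y = 2x + 1 (an assumption for the example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear(xs, ys)
```

On this data the fit recovers a slope close to 2 and an intercept close to 1.
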
Linear Regression Extension - Polynomial Regression
⚫ Polynomial regression is an extension of linear regression. It is used when the complexity of a dataset exceeds what a straight line can fit (obvious underfitting occurs if the original linear regression model is used).
$$h_w(x) = w_1 x + w_2 x^2 + \dots + w_n x^n + b$$
Here, the $n$-th power indicates the degree of the polynomial.

Polynomial regression is a type of linear regression. Although its features are non-linear, the relationship between its weight parameters $w$ is still linear.

Figure: comparison between linear and polynomial regression.

Preventing Overfitting of Linear Regression
⚫ Regularization terms help reduce overfitting. The values of $w$ should not become too large or too small in the sample space. To achieve this, you can add a sum-of-squares penalty to the target function:
$$J(w) = \frac{1}{2m} \sum \left( h_w(x) - y \right)^2 + \lambda \lVert w \rVert_2^2$$
⚫ Regularization term: This regularization term is called the L2-norm. Linear regression that uses this loss function is called Ridge regression.
$$J(w) = \frac{1}{2m} \sum \left( h_w(x) - y \right)^2 + \lambda \lVert w \rVert_1$$
⚫ Linear regression with an absolute-value (L1-norm) penalty is called Lasso regression.

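The two regularized losses differ only in the penalty term, as this sketch shows. The function names and the sample values are illustrative assumptions.

```python
def squared_loss(w, b, xs, ys):
    """Unregularized loss (1/2m) * sum((h(x) - y)^2) for h(x) = w . x + b."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += (h - y) ** 2
    return total / (2 * m)


def ridge_loss(w, b, xs, ys, lam):
    """Squared loss plus the L2 penalty lam * ||w||_2^2."""
    return squared_loss(w, b, xs, ys) + lam * sum(wi**2 for wi in w)


def lasso_loss(w, b, xs, ys, lam):
    """Squared loss plus the L1 penalty lam * ||w||_1."""
    return squared_loss(w, b, xs, ys) + lam * sum(abs(wi) for wi in w)


# Hypothetical weights and a single sample that the model fits exactly,
# so that only the penalty contributes to the loss.
w, b = [3.0, -4.0], 0.0
xs, ys = [[0.0, 0.0]], [0.0]
ridge = ridge_loss(w, b, xs, ys, lam=0.1)
lasso = lasso_loss(w, b, xs, ys, lam=0.1)
```

The L1 penalty tends to drive some weights exactly to zero, which is why Lasso regression also serves as an embedded feature selection method.
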
Logistic Regression (1)
⚫ The logistic regression model is a classification model used to resolve classification problems. The model is defined as follows:
$$P(Y = 0 \mid x) = \frac{e^{-(wx+b)}}{1 + e^{-(wx+b)}}$$
$$P(Y = 1 \mid x) = \frac{1}{1 + e^{-(wx+b)}}$$
$w$ represents the weight, $b$ represents the bias, and $wx + b$ represents a linear function with respect to $x$. Compare the preceding two probability values. $x$ belongs to the type with the larger probability value.

Logistic Regression (2)
⚫ Logistic regression and linear regression are both linear models in a broad sense. The former introduces a non-linear factor (the sigmoid function) on the basis of the latter and sets a threshold. Therefore, logistic regression applies to binary classification.
⚫ According to the model function of logistic regression, the loss function of logistic regression can be calculated through maximum likelihood estimation as follows:
$$J(w) = -\frac{1}{m} \sum \left( y \ln h_w(x) + (1 - y) \ln \left( 1 - h_w(x) \right) \right)$$
⚫ In the formula, $w$ indicates the weight parameter, $m$ indicates the number of samples, $x$ indicates the sample, and $y$ indicates the actual value. You can also obtain the values of all the weight parameters $w$ by using a gradient descent algorithm.

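The model function and its loss can be written out for a single scalar feature. This is a sketch under the assumption that $h_w(x)$ is the sigmoid of $wx + b$, that is, $P(Y = 1 \mid x)$; the function names and data are illustrative.

```python
from math import exp, log


def sigmoid(z):
    """Map any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def predict_proba(w, b, x):
    """P(Y = 1 | x) for a scalar feature x."""
    return sigmoid(w * x + b)


def log_loss(w, b, xs, ys):
    """J(w) = -(1/m) * sum(y * ln h(x) + (1 - y) * ln(1 - h(x)))."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = predict_proba(w, b, x)
        total += y * log(h) + (1 - y) * log(1 - h)
    return -total / m


# With w = 0 and b = 0 the model outputs 0.5 for every sample,
# so the loss equals ln 2 regardless of the labels.
baseline = log_loss(0.0, 0.0, xs=[1.0, 2.0], ys=[0, 1])
```

A threshold of 0.5 on `predict_proba` corresponds to choosing the class with the larger of the two probabilities.
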
Decision Tree
⚫ Each non-leaf node of the decision tree denotes a test on an attribute; each branch represents the outcome of a test; and each leaf (or terminal) node holds a class label. The algorithm starts at the root node (topmost node in the tree), tests the selected attributes on the intermediate (internal) nodes, and generates branches according to the outcomes of the tests. Then, it saves the class labels on the leaf nodes as the decision results.

Root node
  Small
    Does not squeak → It could be a squirrel.
    Squeaks → It could be a rat.
  Large
    Short-necked
      Short-nosed
        Stays on land → It could be a rhino.
        Stays in water → It could be a hippo.
      Long-nosed → It could be an elephant.
    Long-necked → It could be a giraffe.

Structure of a Decision Tree

Figure: a root node at the top branches into subnodes and leaf nodes; each subnode branches further until every path ends in a leaf node.

Key to Decision Tree Construction
⚫ A decision tree requires feature attributes and an appropriate tree structure. The key step of
constructing a decision tree is to divide data of all feature attributes, compare the result sets in terms of
purity, and select the attribute with the highest purity as the data point for dataset division.
⚫ Purity is measured mainly through the information entropy and GINI coefficient. The formula is as
follows:
K K
H ( X )= - pk log 2 ( pk ) Gini = 1 −  pk2
k =1 2 k =1 2
𝑚𝑖𝑛𝑗,𝑠 [𝑚𝑖𝑛𝑐1 ෍ 𝑦𝑖 − 𝑐1 + 𝑚𝑖𝑛𝑐2 ෍ 𝑦𝑖 − 𝑐2 ]
𝑥𝑖 ∈𝑅1 𝑗,𝑠 𝑥𝑖 ∈𝑅2 𝑗,𝑠

⚫ 𝑝𝑘 indicates the probability that a sample belongs to category 𝑘 (in a total of K categories). A larger purity difference
between the sample before and after division indicates a better decision tree.
⚫ Common decision tree algorithms include ID3, C4.5, and CART.

Decision Tree Construction Process
⚫ Feature selection: Select one of the features of the training data as the split criterion of the
current node. (Different criteria distinguish different decision tree algorithms.)
⚫ Decision tree generation: Generate subnodes from the top down based on the selected feature,
and stop when the dataset can no longer be split.
⚫ Pruning: The decision tree can easily overfit unless the necessary pruning (including
pre-pruning and post-pruning) is performed to reduce the tree size and optimize its node
structure.

Decision Tree Example
⚫ The following figure shows a decision tree for a classification problem. The classification result is affected
by three attributes: refund, marital status, and taxable income.

TID  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

[Figure: decision tree with Refund at the root, followed by Marital Status and then Taxable Income; the leaf nodes hold the class labels No and Yes.]
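
As a rough illustration of how a split on this table can be scored, the sketch below computes the Gini index of the Cheat label before any split and the weighted Gini index after splitting on Refund or on Marital Status. The records are copied from the table; the function names are assumptions, and a real CART implementation would also evaluate thresholds on Taxable Income.

```python
from collections import Counter, defaultdict

# (refund, marital_status, cheat) for TIDs 1 to 10, copied from the table.
records = [
    ("Yes", "Single", "No"), ("No", "Married", "No"), ("No", "Single", "No"),
    ("Yes", "Married", "No"), ("No", "Divorced", "Yes"), ("No", "Married", "No"),
    ("Yes", "Divorced", "No"), ("No", "Single", "Yes"), ("No", "Married", "No"),
    ("No", "Single", "Yes"),
]

def gini(labels):
    """Gini index of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def weighted_gini_after_split(rows, attribute_index):
    """Group rows by one attribute and average the Gini index of the groups,
    weighting each group by its share of the rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute_index]].append(row[-1])
    return sum(len(group) / len(rows) * gini(group) for group in groups.values())

gini_before = gini([row[-1] for row in records])
gini_refund = weighted_gini_after_split(records, 0)
gini_marital = weighted_gini_after_split(records, 1)

print("before split:        ", round(gini_before, 4))
print("split on Refund:     ", round(gini_refund, 4))
print("split on Marital St.:", round(gini_marital, 4))
```

A lower weighted value after the split means purer subsets, which is the quantity a Gini-based tree builder compares across candidate attributes.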

Support Vector Machine
⚫ Support vector machines (SVMs) are binary classification models. Their basic model is the linear classifier
that maximizes the width of the gap between the two categories in the feature space. SVMs can also use the
kernel trick, which turns them into non-linear classifiers. Training an SVM amounts to solving a
convex quadratic programming problem.

[Figure: a mapping from a low-dimensional space, where the data is difficult to split, to a high-dimensional space,
where the data is easy to split.]

Linear SVM (1)
⚫ How can we divide the red and blue data points with just one line?

[Figure: a two-dimensional data set with two sample categories, shown with two alternative dividing lines. Both
division methods, on the left and on the right, can divide the data. But which is correct?]

Linear SVM (2)
⚫ Many different straight lines can divide the data into the two categories. An SVM finds the straight line that
keeps the nearest points as far from the line as possible. This gives the model a strong generalization
capability. These nearest points are called support vectors.
⚫ In the two-dimensional space, a straight line is used for division; in the high-dimensional space, a
hyperplane is used for division.

[Figure: maximize the distance from each support vector to the line.]
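
The idea of preferring the line with the widest gap can be sketched as follows. This is only an illustration of how the margin of a given line is measured, not an SVM solver; the data points and the two candidate lines are hypothetical.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin_and_nearest_points(w, b, points, labels, tol=1e-9):
    """Return the smallest label-signed distance and the points that attain it."""
    distances = [y * signed_distance(w, b, x) for x, y in zip(points, labels)]
    margin = min(distances)
    nearest = [x for x, d in zip(points, distances) if abs(d - margin) <= tol]
    return margin, nearest

# Hypothetical 2-D data: label +1 on the right, label -1 on the left.
points = [(2.0, 1.0), (3.0, 3.0), (-2.0, 0.0), (-4.0, 2.0)]
labels = [1, 1, -1, -1]

# Two candidate vertical lines, x = 0 and x = 1; both separate the data.
margin_a, nearest_a = margin_and_nearest_points((1.0, 0.0), 0.0, points, labels)
margin_b, nearest_b = margin_and_nearest_points((1.0, 0.0), -1.0, points, labels)

print("line x = 0:", margin_a, nearest_a)
print("line x = 1:", margin_b, nearest_b)
```

Both lines classify every point correctly, but the first one leaves a wider gap to its nearest points, which is the property an SVM optimizes for.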

Non-linear SVM (1)
⚫ How can we divide a linearly inseparable data set?

[Figure, left: a linear SVM works well on a linearly separable data set. Right: a non-linear data set cannot
be divided using a straight line.]

Non-linear SVM (2)
⚫ Kernel functions can be used to create non-linear SVMs.
⚫ Kernel functions allow algorithms to fit a maximum-margin hyperplane in a transformed high-
dimensional feature space.
⚫ Common kernel functions:
◼ Linear kernel
◼ Polynomial kernel
◼ Gaussian kernel
◼ Sigmoid kernel
[Figure: a mapping from the input space to a high-dimensional feature space.]
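
The four kernels listed above are commonly written as in the sketch below. The parameter names (gamma, coef0, degree) and their default values are assumptions chosen for illustration. The last function shows the kernel trick for one small case: a degree-2 polynomial kernel on 2-D inputs gives the same value as an ordinary dot product after an explicit mapping to a 3-D feature space.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * dot(x, y) + coef0) ** degree

def gaussian_kernel(x, y, gamma=0.5):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)

def quadratic_feature_map(x):
    """Explicit feature map whose dot product equals (x.y) ** 2 for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, 4.0)
via_kernel = polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=0.0)
via_mapping = dot(quadratic_feature_map(x), quadratic_feature_map(y))
print(via_kernel, via_mapping)
```

The kernel computes the high-dimensional dot product without ever building the mapped vectors, which is what makes a maximum-margin hyperplane in the transformed space affordable.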

k-Nearest Neighbors Algorithm (1)
⚫ The k-nearest neighbor (k-NN) classification algorithm is a theoretically mature method and one of the
simplest machine learning algorithms. The idea of k-NN classification is that, if most of the k closest samples
(nearest neighbors) of a sample in the feature space belong to a category, the sample also belongs to this
category.

[Figure: an unlabeled point "?" surrounded by labeled samples. The category of point "?" varies according
to how many neighbor nodes are chosen.]

k-Nearest Neighbors Algorithm (2)
⚫ The logic of k-NN is simple: If most of an object's k nearest neighbors belong to a class, so does the
object.
⚫ k-NN is a non-parametric method and is often used for datasets with irregular decision
boundaries.
◼ k-NN typically uses the majority voting method for classification, and the mean value
method for regression.
⚫ k-NN requires a very large amount of computing, because each prediction compares the new sample with the
stored training samples.
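
A minimal brute-force sketch of both prediction modes is shown below, assuming Euclidean distance. The training points, labels, values, and query are hypothetical.

```python
import math
from collections import Counter

def k_nearest(train_points, query, k):
    """Indices of the k training points closest to query (Euclidean distance)."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    return order[:k]

def knn_classify(train_points, train_labels, query, k):
    """Majority vote among the k nearest neighbors."""
    neighbors = k_nearest(train_points, query, k)
    votes = Counter(train_labels[i] for i in neighbors)
    return votes.most_common(1)[0][0]

def knn_regress(train_points, train_values, query, k):
    """Mean of the target values of the k nearest neighbors."""
    neighbors = k_nearest(train_points, query, k)
    return sum(train_values[i] for i in neighbors) / len(neighbors)

# Hypothetical training data: two red samples and three blue samples.
points = [(1.0, 1.0), (1.5, 1.8), (3.0, 3.2), (3.5, 2.9), (3.2, 3.8)]
labels = ["red", "red", "blue", "blue", "blue"]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
query = (2.0, 2.0)

for k in (1, 3, 5):
    print(k, knn_classify(points, labels, query, k))
print(knn_regress(points, values, query, k=2))
```

Every prediction sorts all training points by distance, which is where the computing cost comes from; the predicted class of the query can also change as k changes.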

k-Nearest Neighbors Algorithm (3)
⚫ Typically, a larger k value reduces the impact of noise on classification, but makes the boundary between
classes less obvious.
◼ A large k value indicates a higher probability of underfitting because the division is too coarse, while a small k
value indicates a higher probability of overfitting because the division is too fine.

• As seen from the figure, the boundary becomes smoother as the k value increases.
• As the k value increases, the points will eventually become all blue or all red.

Naive Bayes (1)
⚫ Naive Bayes classifiers are a family of simple "probabilistic classifiers" based on Bayes' theorem with strong
independence assumptions between the features. For a given sample with features 𝑋1 , 𝑋2 , … , 𝑋𝑛 , the probability
that the sample belongs to category 𝐶𝑘 is:
P(C_k \mid X_1, \dots, X_n) = \frac{P(X_1, \dots, X_n \mid C_k) \, P(C_k)}{P(X_1, \dots, X_n)}

◼ 𝑋1 , 𝑋2 , … , 𝑋𝑛 are data features, which are usually described by m measurement values of the attribute set.
◼ For example, the color feature may take the values red, yellow, and blue.
◼ 𝐶𝑘 indicates that the data belongs to class 𝑘.
◼ 𝑃(𝐶𝑘 |𝑋1 , 𝑋2 , … , 𝑋𝑛 ) is the posterior probability, that is, the probability of class 𝐶𝑘 given the features 𝑋1 , 𝑋2 , … , 𝑋𝑛 .
◼ 𝑃(𝑋1 , 𝑋2 , … , 𝑋𝑛 |𝐶𝑘 ) is the likelihood, that is, the probability of the features given class 𝐶𝑘 .
◼ P(𝐶𝑘 ) is the prior probability of class 𝐶𝑘 , independent of 𝑋1 , 𝑋2 , … , 𝑋𝑛 .
◼ 𝑃(𝑋1 , 𝑋2 , … , 𝑋𝑛 ) is the prior probability of the features 𝑋.

Naive Bayes (2)
⚫ Feature independence assumption example:
◼ If a fruit is red, round, and about 10 cm in diameter, it can be considered an apple.
◼ A Naive Bayes classifier assumes that each of these features independently contributes to the
probability of the fruit being an apple, regardless of any possible correlation between the color,
roundness, and diameter features.
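
Under the independence assumption, the class-conditional probability factorizes into a product of per-feature probabilities, P(X1, ..., Xn | Ck) = P(X1 | Ck) × ... × P(Xn | Ck). The sketch below applies this to categorical features by simple counting. The fruit samples are made up for illustration, and no smoothing is applied, so a feature value never seen with a class gives that class a probability of zero.

```python
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Count classes and, per class, how often each feature takes each value."""
    class_counts = Counter(labels)
    # feature_counts[c][i][v] = number of class-c samples whose feature i equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in zip(samples, labels):
        for i, value in enumerate(features):
            feature_counts[label][i][value] += 1
    return class_counts, feature_counts

def posterior(model, features):
    """Posterior P(C_k | X_1, ..., X_n) for every class under the naive assumption."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total                                  # prior P(C_k)
        for i, value in enumerate(features):
            score *= feature_counts[label][i][value] / count   # P(X_i | C_k)
        scores[label] = score
    evidence = sum(scores.values())                            # normalizing constant
    return {label: score / evidence for label, score in scores.items()}

# Hypothetical training set: (color, shape) -> fruit class.
samples = [
    ("red", "round"), ("red", "round"), ("green", "round"), ("red", "oval"),
    ("green", "round"), ("yellow", "oval"), ("green", "oval"), ("red", "oval"),
]
labels = ["apple"] * 4 + ["other"] * 4

model = train_naive_bayes(samples, labels)
result = posterior(model, ("red", "round"))
print(result)
```

Color and shape are scored separately and multiplied, so any correlation between them in the data is ignored, as the apple example describes.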

Ensemble Learning
⚫ Ensemble learning is a machine learning paradigm in which multiple learners are trained and combined to resolve an
issue. When multiple learners are used, the generalization capability of the ensemble can be much stronger than
that of a single learner.
⚫ For example, if you ask thousands of people at random a complex question and then summarize their answers, the
summarized answer is more accurate than an expert's answer in most cases. This is the wisdom of the crowd.

[Figure: the training set is divided into datasets 1 to m; models 1 to m are trained on them and then combined
into one large ensemble model.]

Types of Ensemble Learning

Ensemble learning
◼ Bagging (example: random forest)
• Bagging independently builds multiple base learners and then averages their predictions.
• On average, an ensemble learner is usually better than a single base learner because of a smaller variance.
◼ Boosting (examples: AdaBoost, GBDT, XGBoost)
• Boosting builds base learners in sequence and gradually reduces the bias of the ensemble learner. The
ensemble learner has a strong fitting capability but may overfit.

Ensemble Learning - Random Forest
⚫ Random forest = Bagging + Classification and regression tree (CART)
⚫ Random forest builds multiple decision trees and aggregates their results to make predictions more accurate and
stable.
◼ The random forest algorithm can be used for classification and regression problems.
[Figure: bootstrap sampling draws subsets 1 to n from all training data; a tree is built on each subset and
produces predictions 1 to n; the results are aggregated into the final prediction.]
• Classification: majority voting
• Regression: mean value
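
The sampling and aggregation steps can be sketched as below. To keep the example short, the base learner is a stand-in that always predicts the mean of its bootstrap sample; a real random forest would build a CART tree on each sample instead. All names and data here are assumptions made for illustration.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Aggregation for classification."""
    return Counter(predictions).most_common(1)[0][0]

def mean_value(predictions):
    """Aggregation for regression."""
    return sum(predictions) / len(predictions)

def bagging_predict(train, fit, query, n_models, aggregate, seed=0):
    """Fit one model per bootstrap sample and aggregate their predictions."""
    rng = random.Random(seed)
    models = [fit(bootstrap_sample(train, rng)) for _ in range(n_models)]
    return aggregate([model(query) for model in models])

def fit_mean(sample):
    """Stand-in base learner: always predicts the mean of its training targets."""
    average = sum(sample) / len(sample)
    return lambda query: average

# Hypothetical regression targets.
targets = [3.0, 5.0, 4.0, 6.0, 10.0, 2.0]
prediction = bagging_predict(targets, fit_mean, None, n_models=25, aggregate=mean_value)
print(prediction)
print(majority_vote(["yes", "no", "yes"]))
```

Passing `majority_vote` instead of `mean_value` as the `aggregate` argument gives the classification variant.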

Ensemble Learning - Gradient Boosted Decision Tree
⚫ Gradient boosted decision tree (GBDT) is a type of boosting algorithm.
⚫ The prediction result of the ensemble model is the sum of the results of all base learners. The essence of GBDT is that
each new base learner is fitted to the residual of the current ensemble, that is, the error between the actual
value and the value predicted so far.
⚫ GBDT model training aims to make the loss function value of the model's predictions on the samples as small as
possible.
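[Figure: the actual value is 30 and the first learner predicts 20, leaving a residual of 10; the second learner
predicts 9 for that residual, leaving a residual of 1; the third learner predicts 1.]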
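
The residual-fitting loop can be sketched with one-split regression trees (stumps) on a single input feature. This is a simplified illustration under squared-error loss, where the residual is the quantity each new learner fits; the learning rate, the number of rounds, and the data are assumptions.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree (stump) by least squares on 1-D inputs."""
    best = None
    for threshold in sorted(set(xs))[:-1]:        # keep both sides non-empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def fit_gbdt(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
    # The ensemble prediction is the sum of the results of all base learners.
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Hypothetical 1-D regression data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [5.0, 7.0, 9.0, 20.0, 22.0, 30.0]

mean_y = sum(ys) / len(ys)
baseline_error = mse(lambda x: mean_y, xs, ys)
model = fit_gbdt(xs, ys)
boosted_error = mse(model, xs, ys)
print(baseline_error, boosted_error)
```

Each round shrinks the remaining residual on the training data, in the same way that the figure's successive predictions close the gap to the actual value.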

Unsupervised Learning - k-Means Clustering
⚫ k-means clustering takes the number of clusters k and a dataset of n objects as inputs, and outputs k
clusters with minimized within-cluster variances.
⚫ In the k-means algorithm, the number of clusters is k, and n data objects are split into k clusters. The
obtained clusters meet the following requirements: high similarity between objects in the same cluster,
and low similarity between objects in different clusters.
[Figure: scatter plots over the axes x1 and x2, before and after k-means clustering. k-means clustering
automatically classifies unlabeled data.]
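
A minimal sketch of the usual iterative procedure is shown below: assign every object to its nearest centroid, then move each centroid to the mean of its cluster. The fixed iteration count, the choice of initial centroids, and the data points are assumptions made for illustration.

```python
import math

def assign(points, centroids):
    """Put each point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def k_means(points, initial_centroids, n_iterations=10):
    """Alternate between the assignment step and the centroid update step."""
    centroids = list(initial_centroids)
    for _ in range(n_iterations):
        clusters = assign(points, centroids)
        centroids = [
            # An empty cluster keeps its previous centroid.
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, assign(points, centroids)

# Hypothetical unlabeled data with two visible groups (k = 2).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
print(clusters)
```

The result depends on the initial centroids, so practical implementations usually run several initializations and keep the clustering with the lowest within-cluster variance.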

Unsupervised Learning - Hierarchical Clustering
⚫ Hierarchical clustering divides a dataset at different layers and forms a tree-like clustering structure. The
dataset division may use a "bottom-up" aggregation policy or a "top-down" splitting policy. The
hierarchy of clusters is represented in a tree diagram. The root is the single cluster that contains all samples, and
the leaves are clusters of single samples.
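
The "bottom-up" policy can be sketched as follows: start with one cluster per sample and repeatedly merge the two closest clusters until only the root remains. The sketch assumes single-linkage distance (the distance between the closest pair of members), which is only one of several possible choices; the data points are hypothetical.

```python
import math

def single_linkage(cluster_a, cluster_b):
    """Distance between two clusters: the closest pair of members."""
    return min(math.dist(a, b) for a in cluster_a for b in cluster_b)

def agglomerative(points):
    """Bottom-up clustering; returns the merge history and the root cluster."""
    clusters = [[p] for p in points]          # leaves: one cluster per sample
    merges = []
    while len(clusters) > 1:
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges, clusters[0]                # root: the cluster of all samples

# Hypothetical data: two tight pairs and one distant sample.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
merges, root = agglomerative(points)
for left, right in merges:
    print(left, "+", right)
```

The merge history, read from first to last, is the tree diagram from the leaves up to the root; cutting it at a chosen level gives a flat clustering.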

Summary

⚫ This course first describes the definition and types of machine learning, as well as
problems machine learning solves. Then, it introduces key knowledge points of
machine learning, including the overall procedure (data preparation, data cleansing,
feature selection, model evaluation, and model deployment), common algorithms
(including linear regression, logistic regression, decision tree, SVM, Naive Bayes, k-
NN, ensemble learning, and k-means clustering), and hyperparameters.

Quiz

1. (Single-answer) Which of the following is not a supervised learning algorithm? ( )


A. Linear regression
B. Decision tree
C. k-NN
D. k-means clustering
2. (True or false) Gradient descent is the only method of machine learning. ( )

Recommendations

⚫ Huawei Talent
◼ https://e.huawei.com/en/talent/portal/#/

⚫ Huawei knowledge base
◼ https://support.huawei.com/enterprise/en/knowledge?lang=en

Thank you.
Bring digital to every person, home, and organization for a fully connected, intelligent world.

Copyright © 2023 Huawei Technologies Co., Ltd. All Rights Reserved.

The information in this document may contain predictive statements
including, without limitation, statements regarding the future financial
and operating results, future product portfolio, new technology, etc.
There are a number of factors that could cause actual results and
developments to differ materially from those expressed or implied in the
predictive statements. Therefore, such information is provided for
reference purpose only and constitutes neither an offer nor an
acceptance. Huawei may change the information at any time without
notice.
