
STATISTICAL LEARNING

STÉPHANIE VAN DEN BERG


MARYAM AMIR HAERI

RECAP OF THIS COURSE

• Linear regression
• Logistic regression
• Data pre-processing
• Feature extraction and selection

• Next step: machine learning


BASICS OF STATISTICAL/MACHINE LEARNING:
AN EXAMPLE

▪ IBM Watson
▪ Watson is a question answering computer system capable of answering questions posed in natural
language, developed in IBM's DeepQA project. It can handle spoken language.
▪ It famously competed on the quiz show Jeopardy!
PROCESS OF TEXT MINING AND STATISTICAL LEARNING

▪ Text preprocessing
▪ Feature generation
▪ Feature selection
▪ Mining
▪ Analysis of results

[Pipeline figure: Text preprocessing → Feature generation → Feature selection → Mining → Analysis of results]

MACHINE LEARNING METHODS

[Diagram: machine learning methods, split into Supervised Learning (Classification, Regression) and Unsupervised Learning (Clustering)]

Sources:
https://www.diegocalvo.es/en/machine-learning-supervised-unsupervised/
https://datasolut.com/wiki/unsupervised-learning/
SUPERVISED VS UNSUPERVISED

▪ In supervised learning, a function 𝐹 is sought that predicts the outcome 𝑦 based on the input 𝑥:

𝑦 = 𝐹(𝑥)

The function 𝐹 is trained on known cases.

▪ Unsupervised learning is more about recognizing groups of observations 𝑥 that have more in common with each other than with the other observations.
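A minimal sketch of the contrast, assuming scikit-learn (the tiny dataset is made up for illustration):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.cluster import KMeans

    X = np.array([[1.0], [2.0], [3.0], [4.0]])

    # Supervised: known outcomes y are available, so we learn F with y = F(x)
    y = np.array([2.1, 3.9, 6.2, 8.1])
    F = LinearRegression().fit(X, y)
    print(F.predict([[5.0]]))  # predicted outcome for a new input

    # Unsupervised: no outcomes; we only look for groups in the inputs
    groups = KMeans(n_clusters=2, n_init=10).fit_predict(X)
    print(groups)  # cluster label per observation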
CLASSIFICATION

[Figure: a classifier is built in a training phase and then used in a prediction phase]
THIS COURSE

▪ Supervised learning
▪ Focus on classification and regression trees
▪ KNN
▪ Evaluation of classification
▪ Accuracy, sensitivity, specificity, PPV (precision), NPV, F-score
CLASSIFICATION TASK

[Diagram: training data with features X1, X2, …, Xn and label Y goes into a learning algorithm, which learns a model; the model is then applied to new data]
Training data:

Years of experience | Behavior test score | University degree | Past work performance | Qualified for the job
10                  | 85                  | Master            | 80                    | Yes
4                   | 55                  | PhD               | 65                    | No
7                   | 80                  | Master            | 90                    | Yes
3                   | 40                  | PhD               | 50                    | No

[Diagram: this training data is fed to the learning algorithm, which learns the model]
CLASSIFICATION TASK

The learned model is then applied (deduction) to new, unlabeled cases:

Years of experience | Behavior test score | University degree | Past work performance
5                   | 35                  | Master            | 60
3                   | 55                  | Master            | 85
9                   | 87                  | PhD               | 75
8                   | 90                  | Master            | 30
CLASSIFICATION TASK

Applying the model yields a predicted output for each new case:

Years of experience | Behavior test score | University degree | Past work performance | Predicted: accept for job?
5                   | 35                  | Master            | 60                    | No
3                   | 55                  | Master            | 85                    | Yes
9                   | 87                  | PhD               | 75                    | Yes
8                   | 90                  | Master            | 30                    | No
OUTPUT OF A CLASSIFIER

▪ Label output:

Input → Classifier → Ŷ, with Ŷ ∈ {0, 1}

i.e. the output can be a label, for example class zero or class one.

▪ Score (rank) output:

Input → Classifier → Ŷ, where Ŷ can be any number in a given range, for example [0, 1].
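A small sketch of both output types, assuming scikit-learn (the toy training rows loosely follow the hiring example above):

    from sklearn.linear_model import LogisticRegression

    # features: years of experience, behavior test score (made-up data)
    X_train = [[10, 85], [4, 55], [7, 80], [3, 40]]
    y_train = [1, 0, 1, 0]

    clf = LogisticRegression().fit(X_train, y_train)

    print(clf.predict([[5, 60]]))        # label output: class 0 or class 1
    print(clf.predict_proba([[5, 60]]))  # score output: probabilities in [0, 1]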
EXAMPLE OF DECISION TREE CLASSIFIER

Years of experience?
├─ ≥ 5 → Past performance score?
│        ├─ ≥ 70 → Behavior test score?
│        │        ├─ ≥ 60 → Yes
│        │        └─ < 60 → No
│        └─ < 70 → No
└─ < 5 → Behavior test score?
         ├─ ≥ 60 → University degree?
         │        ├─ PhD → Yes
         │        └─ Master → No
         └─ < 60 → No
Applying the tree to a new applicant:

Years of experience | Behavior test score | University degree | Past work performance | Qualified for the job
10                  | 85                  | Master            | 80                    | ?

[Same decision tree as above]
Following the tree (years ≥ 5, past performance 80 ≥ 70, behavior test 85 ≥ 60) leads to a leaf labeled Yes:

Years of experience | Behavior test score | University degree | Past work performance | Qualified for the job
10                  | 85                  | Master            | 80                    | Yes
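The same tree written out as code, as a sketch (the function name and argument order are chosen here to match the table columns):

    def qualified_for_job(years, behavior_score, degree, past_performance):
        """Hand-coded version of the decision tree above."""
        if years >= 5:
            if past_performance >= 70:
                return "Yes" if behavior_score >= 60 else "No"
            return "No"
        if behavior_score >= 60:
            return "Yes" if degree == "PhD" else "No"
        return "No"

    # the applicant from the table: 10 years, score 85, Master, performance 80
    print(qualified_for_job(10, 85, "Master", 80))  # -> Yes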
LOGISTIC REGRESSION CLASSIFIER

▪ Provides a score as the output

▪ The score can be transformed into a label by applying a threshold

[Figure: sigmoid curve; scores above the threshold map to Yes, below to No]

https://www.javatpoint.com/logistic-regression-in-machine-learning
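A minimal sketch of that transformation (the threshold of 0.5 is an assumed, common default):

    import math

    def sigmoid(z):
        # maps any real-valued model output z (e.g. w·x + b) to a score in (0, 1)
        return 1.0 / (1.0 + math.exp(-z))

    score = sigmoid(1.3)
    label = "Yes" if score >= 0.5 else "No"
    print(score, label)  # ~0.786 -> Yes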
CLASSIFICATION AND REGRESSION TREES
DECISION TREE

https://www.aunalytics.com/decision-trees-an-overview/

REGRESSION TREE

[Regression tree figure: splits on Income and Age, with leaf predictions 98, 150, 250, and 300 (average home value in thousands)]

https://www.aunalytics.com/decision-trees-an-overview/
INTERPRETATION

▪ Can you describe the rules that come with this tree?

▪ If Income ≤ 50k and Age ≤ 50, then the average home value is 98k
▪ If Income ≤ 50k and Age > 50, then the average home value is 150k
▪ If Income > 50k and Age ≤ 50, then the average home value is 250k
▪ If Income > 50k and Age > 50, then the average home value is 300k
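The four rules written as a function, as a sketch (all values in thousands):

    def predicted_home_value(income, age):
        # each branch corresponds to one leaf of the regression tree
        if income <= 50:
            return 98 if age <= 50 else 150
        return 250 if age <= 50 else 300

    print(predicted_home_value(income=40, age=55))  # -> 150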
EXERCISE: SHOULD I PLAY BASEBALL?

▪ Please describe the set of rules for the following classification tree:

[Decision tree figure: should I play baseball?]
K NEAREST NEIGHBOR

[Figures: k-nearest-neighbour classification, where a new point is assigned the majority class among its k closest training points]

Source: https://www.datacamp.com/community/tutorials/k-nearest-neighbor-classification-scikit-learn
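A minimal k-nearest-neighbour sketch with scikit-learn (the library used in the linked tutorial); the toy data is made up:

    from sklearn.neighbors import KNeighborsClassifier

    X_train = [[1, 1], [2, 1], [8, 9], [9, 8]]
    y_train = [0, 0, 1, 1]

    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)
    print(knn.predict([[7, 8]]))  # majority vote among the 3 nearest points -> [1]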
TWO ISSUES IN MACHINE LEARNING
• CURSE OF DIMENSIONALITY
• OVER-FITTING YOUR MODEL
ISSUE 1: CURSE OF DIMENSIONALITY

▪ To train more complex models, exponentially more data is needed.


▪ Let’s go back to the example of the regression trees.
HOW MUCH DATA WOULD YOU NEED?

▪ Imagine a data set with 1000 individuals for a binary classification


[Figure: the 1000 cases spread over the cells formed by the variables.
1 binary variable: 2 cells of roughly 500 cases each.
2 binary variables: 4 cells of roughly 250.
3 binary variables: 8 cells of roughly 125.
4 binary variables: 16 cells of roughly 60.]
HOW MUCH DATA WOULD YOU NEED?

▪ With 4 times more variables, you need almost 10 times more cases.
▪ As a consequence, for large models, you will have very sparse areas without many observations.

▪ Imagine what would happen with more categories per variable, and with even more than 4 variables.
HOW MUCH DATA WOULD YOU NEED?

▪ Rule of thumb: 50 respondents per combination of categories.

▪ Example with three variables (3, 2, and 2 categories): 3 × 2 × 2 = 12 combinations, therefore 12 × 50 = 600 observations.

▪ Example with four variables (3, 3, 2, and 2 categories): 3 × 3 × 2 × 2 = 36 combinations, therefore 36 × 50 = 1800 observations.
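The rule of thumb as a one-liner, as a sketch (the function name is made up here):

    from math import prod

    def required_n(categories_per_variable, per_cell=50):
        # cells = product of the category counts; 50 respondents per cell
        return per_cell * prod(categories_per_variable)

    print(required_n([3, 2, 2]))     # 12 cells -> 600
    print(required_n([3, 3, 2, 2]))  # 36 cells -> 1800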


ISSUE 2: OVER-FITTING YOUR MODEL

▪ A model might describe the data that you use to train it with perfectly. However, for future unseen cases, it might be pretty far off.
▪ Technically, if the number of variables in your model is as large as the number of observations, you can always arrive at a perfect fit on the training data.
OVERFITTING

Source: https://datascience.foundation/sciencewhitepaper/underfitting-and-overfitting-in-machine-learning
EXAMPLE OF K-NEAREST NEIGHBOR
ISSUE OF OVER-FITTING

▪ When k-nearest neighbour is applied as the machine learning algorithm, the optimal value for k has to be chosen.
▪ k = 1: over-fits the data
▪ k = optimum value
▪ k = very high: under-fits the data
▪ Because of this, various settings of the algorithm have to be tried, as sketched below. The difference in fit between the training set and the test set is an indication of over-fitting.
▪ Usually the k value with the best fit on the test set is chosen as optimal.
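A sketch of that procedure with scikit-learn (the iris data and the candidate k values are arbitrary choices):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for k in [1, 3, 5, 15, 51]:
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
        # a large gap between the two accuracies suggests over-fitting
        print(k, knn.score(X_tr, y_tr), knn.score(X_te, y_te))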
STEP 5: INTERPRETATION AND EVALUATION
HOW TO TEST THE QUALITY OF THE MACHINE LEARNING MODEL?

▪ With linear regression, you check the R-squared.

▪ For classification algorithms, instead, we use a matrix related to (in)correct decisions. In the case of two possible outcomes:

                           PREDICTED CLASS
                           Class=Yes   Class=No
ACTUAL CLASS   Class=Yes   TP          FN
               Class=No    FP          TN
METRICS FOR PERFORMANCE EVALUATION

Confusion matrix:

                                PREDICTED CLASS (R)
                                Class=1               Class=0
ACTUAL CLASS (Y)   Class=1      TP (true positive)    FN (false negative)
                   Class=0      FP (false positive)   TN (true negative)
EXAMPLE OF CONFUSION MATRIX

                           PREDICTED CLASS
                           Class=Yes   Class=No
ACTUAL CLASS   Class=Yes   TP=400      FN=25
               Class=No    FP=50       TN=100
ACCURACY

▪ How many predictions were correct (as a proportion)?

                           PREDICTED CLASS
                           Class=Yes   Class=No
ACTUAL CLASS   Class=Yes   TP=400      FN=25
               Class=No    FP=50       TN=100

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Error rate = 1 − Accuracy = (FP + FN) / (TP + TN + FP + FN)
ACCURACY

                           PREDICTED CLASS
                           Class=Yes   Class=No
ACTUAL CLASS   Class=Yes   TP=400      FN=25
               Class=No    FP=50       TN=100

Accuracy = (TP + TN) / (TP + TN + FP + FN) = 500 / 575 = 0.869

Error rate = 1 − Accuracy = (FP + FN) / (TP + TN + FP + FN) = 75 / 575 = 0.130
SENSITIVITY (RECALL)

✓ Sensitivity focuses on the percentage of instances predicted as positive with respect to the number of existing positive instances.
✓ Other names for sensitivity are True Positive Rate (TPR) and recall.
✓ How many of the actual Yes cases were correctly identified (as a proportion)?

Sensitivity = TP / (TP + FN)

                           PREDICTED CLASS
                           Class=Yes   Class=No
ACTUAL CLASS   Class=Yes   TP          FN
               Class=No    FP          TN
SENSITIVITY (RECALL)

Sensitivity = TP / (TP + FN) = 400 / 425 = 0.941

                           PREDICTED CLASS
                           Class=Yes   Class=No
ACTUAL CLASS   Class=Yes   TP=400      FN=25
               Class=No    FP=50       TN=100
SPECIFICITY

                           PREDICTED CLASS
                           Class=Yes   Class=No
ACTUAL CLASS   Class=Yes   TP          FN
               Class=No    FP          TN

Specificity = TNR = TN / (TN + FP)

▪ Specificity measures the proportion of actual negative cases that got predicted correctly.
SPECIFICITY

                           PREDICTED CLASS
                           Class=Yes   Class=No
ACTUAL CLASS   Class=Yes   TP=400      FN=25
               Class=No    FP=50       TN=100

Specificity = TNR = TN / (TN + FP) = 100 / 150 = 0.667
POSITIVE PREDICTIVE VALUE (PPV OR PRECISION)

✓ Positive predictive value (PPV), or precision, focuses on the percentage of instances predicted as positive that are actually positive.

                           PREDICTED CLASS
                           Class=Yes   Class=No
ACTUAL CLASS   Class=Yes   TP          FN
               Class=No    FP          TN

PPV = Precision = TP / (TP + FP)
POSITIVE PREDICTIVE VALUE (PPV OR PRECISION)

                           PREDICTED CLASS
                           Class=Yes   Class=No
ACTUAL CLASS   Class=Yes   TP=400      FN=25
               Class=No    FP=50       TN=100

PPV = Precision = TP / (TP + FP) = 400 / 450 = 0.889
NEGATIVE PREDICTIVE VALUE (NPV)

▪ The proportion of predicted negatives which are real negatives. It reflects the probability that a predicted negative is a true negative.

                           PREDICTED CLASS
                           Class=Yes   Class=No
ACTUAL CLASS   Class=Yes   TP          FN
               Class=No    FP          TN

NPV = TN / (TN + FN)
NEGATIVE PREDICTIVE VALUE (NPV)

                           PREDICTED CLASS
                           Class=Yes   Class=No
ACTUAL CLASS   Class=Yes   TP=400      FN=25
               Class=No    FP=50       TN=100

NPV = TN / (TN + FN) = 100 / 125 = 0.8
F-SCORE

                           PREDICTED CLASS
                           Class=Yes   Class=No
ACTUAL CLASS   Class=Yes   TP          FN
               Class=No    FP          TN

F1-Score = 2 × (PPV × Sensitivity) / (PPV + Sensitivity) = 2TP / (2TP + FP + FN)

❑ PPV considers true positives and false positives.
❑ Sensitivity, however, considers true positives and false negatives.
❑ The F-measure considers everything except true negatives.
❑ The F-measure is high only if both precision and recall are high.
F-SCORE
                           PREDICTED CLASS
                           Class=Yes   Class=No
ACTUAL CLASS   Class=Yes   TP=400      FN=25
               Class=No    FP=50       TN=100

F1-Score = 2 × (PPV × Sensitivity) / (PPV + Sensitivity) = 2TP / (2TP + FP + FN) = 800 / 875 = 0.91
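All of the metrics above, computed for this running example in plain Python as a check:

    TP, FN, FP, TN = 400, 25, 50, 100

    accuracy    = (TP + TN) / (TP + TN + FP + FN)  # 0.869
    sensitivity = TP / (TP + FN)                   # 0.941
    specificity = TN / (TN + FP)                   # 0.667
    ppv         = TP / (TP + FP)                   # 0.889
    npv         = TN / (TN + FN)                   # 0.800
    f1          = 2 * TP / (2 * TP + FP + FN)      # 0.914

    print(accuracy, sensitivity, specificity, ppv, npv, f1)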
EVALUATION OF SUPERVISED MODELS

[Diagram: the dataset is split into a Train part and a Test part, and each metric is computed on both:
Training accuracy vs Test accuracy, Training PPV vs Test PPV, Training sensitivity vs Test sensitivity, and so on]
K-FOLD CROSS-VALIDATION

Source: https://bradleyboehmke.github.io/HOML/process.html
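A minimal k-fold cross-validation sketch with scikit-learn (5 folds and the iris data are assumed here for illustration):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    # each fold takes a turn as the test set; the rest is used for training
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
    print(scores, scores.mean())  # one accuracy per fold, then the average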
