Lecture Slide 02 - Supervised Learning-1

The document outlines the principles of supervised learning in machine learning, detailing the process of training models using labeled data to minimize prediction errors. It discusses the distinction between regression and classification problems, the importance of model selection, and techniques to avoid overfitting, including cross-validation methods. Additionally, it emphasizes the need for proper data partitioning and evaluation to ensure accurate model performance on unseen data.


Supervised Learning Setup

Course 4232: Machine Learning

Dept. of Computer Science


Faculty of Science and Technology

Week No: 2


Instructor: Prof. Dr. Kamruddin Nur (kamruddin@aiub.edu)
Supervised Learning
 Training experience: a set of labeled examples of the form
( x1, x2, …, xn, y )

 where the xj are values of the input variables and y is the output

 This implies the existence of a “teacher” who knows the right answers

 What to learn: a function f : X1 × X2 × … × Xn → Y, which maps the input variables into the output domain

 Goal: minimize the error (loss function) on the training examples


Supervised Learning Problem
 Given a data set D ⊂ X1 × X2 × … × Xn × Y, find a function
h : X1 × X2 × … × Xn → Y
such that h(x) is a good predictor for the value of y
h is called a hypothesis

 Example: Suppose we have a dataset D with:


 Input features (X₁, X₂, …, Xₙ): These are the variables used to make predictions.
(Example: In house price prediction, X₁ = size, X₂ = location, etc.)
 Output/target (Y): This is what you want to predict.
(Example: House price, whether an email is spam, etc.)
 Goal: Find a function h (called a hypothesis) that takes X₁, X₂, …, Xₙ as input and predicts Y
accurately.
 What is h? (The Hypothesis)
 h is just a rule or model that makes predictions.
(Example: A linear equation like h(X) = 2X + 3 could predict house prices.)
3 "Good predictor" means h(X) should be close to the true Y (measured by accuracy, error, etc.).
Supervised Learning Problem

 If Y is the set of real numbers, this problem is called regression
 If Y is a finite discrete set, this problem is called classification

 Case 1: Regression (Y is a real number)


 Y is continuous (any numeric value).
(Example: Predicting house prices, temperature, etc.)
 Example Hypothesis:
h(size, location) = 50,000 + 200*(size) + 10,000*(location_rating)

 Case 2: Classification (Y is discrete)


 Y is a category (like labels).
(Example: Spam/Not spam, Cat/Dog/Bird, etc.)
 Binary Classification (2 classes):
Y = {0, 1} or {Spam, Not Spam}.
(Example: h(email_text) = "Spam" or "Not Spam".)
 Multiclass Classification (>2 classes):
Y = {Cat, Dog, Bird, ...}.
(Example: h(image) = "Cat".)
Supervised Learning Steps
 Decide what the training examples are
 Data collection
 Feature extraction or selection:
 Discriminative features
 Relevant and insensitive to noise
 Input space X, output space Y, and feature vectors
 Choose a model, i.e. a representation for h;
 or, the hypothesis class H = {h1, …, hr}
 Choose an error function to define the best hypothesis
 Choose a learning algorithm: regression or classification method
 Training

 Evaluation = testing
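As a minimal end-to-end sketch of these steps in Python (scikit-learn; the synthetic dataset and the choice of a linear model are illustrative assumptions):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Data collection (synthetic here): feature matrix X, targets y
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))                    # two input features
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, 100)  # labeled outputs

# Choose a model (hypothesis class H = linear functions) and an error function (MSE)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_train, y_train)         # training

# Evaluation = testing on held-out data
print(mean_squared_error(y_test, model.predict(X_test)))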
Example: What Model or Hypothesis Space H?

• Training examples:

ei = <xi, yi>

for i = 1, …, 10
Linear Hypothesis

Sum-of-Squares Error function
(or Mean Squared Error, MSE)
 We should define an error function that measures the difference between the predictions and the true answers
 How to find prediction errors?

Purpose:
• Quantifies model accuracy: measures how well the hypothesis hw fits the training data.
• Optimization goal: find the parameters w that minimize J(w).
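The formula on the original slide was a figure that did not survive extraction; a standard sum-of-squares form consistent with the notation here (m training examples, hypothesis hw with parameters w) is:

J(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right)^2

Some texts drop the 1/2 (plain MSE) or the 1/m (pure sum of squares); the 1/2 is a convenience that cancels when differentiating J.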
Example – Minimizing J(w)

Least Mean Squares (LMS)

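The body of this slide was a figure; as a hedged reconstruction, the classical LMS (Widrow-Hoff) rule updates each weight after seeing one example (x^{(i)}, y^{(i)}), with learning rate \alpha:

w_j \leftarrow w_j + \alpha \left( y^{(i)} - h_w(x^{(i)}) \right) x_j^{(i)}

This is stochastic gradient descent on the squared-error J(w) above, one example at a time.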
Limitation of MSE

Mean Absolute Error (MAE)
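The MAE slides were figures; the standard definition, in the same notation as the MSE above, is:

\mathrm{MAE}(w) = \frac{1}{m} \sum_{i=1}^{m} \left| h_w(x^{(i)}) - y^{(i)} \right|

Because residuals enter linearly rather than squared, MAE penalizes outliers less heavily than MSE, which is the usual answer to the limitation of MSE noted above.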
Huber Loss and RMSE

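This slide was also a figure; the standard definitions, with residual e = y - h_w(x) and threshold \delta, are:

L_\delta(e) = \begin{cases} \frac{1}{2} e^2 & \text{if } |e| \le \delta \\ \delta \left( |e| - \frac{1}{2} \delta \right) & \text{otherwise} \end{cases}

\mathrm{RMSE} = \sqrt{ \frac{1}{m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right)^2 }

Huber loss behaves like MSE for small residuals and like MAE for large ones, combining smooth gradients with robustness to outliers.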
Some Linear Algebra

Some Linear Algebra …

Some Linear Algebra - The Solution!

Example of Linear Regression - Data Matrices

XᵀX

XᵀY

Solving for w – Regression Curve

Linear Regression - Summary
 The optimal solution can be computed in polynomial time in the size of the
data set.
 Too simple for most real-valued problems
 The solution is w = (XᵀX)⁻¹XᵀY, where
 X is the data matrix, augmented with a column of 1’s
 Y is the column vector of target outputs
 A very rare case in which an analytical exact solution is possible
 Nice math, closed-form formula, unique global optimum
 Problems arise when (XᵀX) does not have an inverse
 Possible solutions to this:
1. Include high-order terms in hw
2. Transform the input X to some other space X’, and apply linear regression on X’
3. Use a different but more powerful hypothesis representation
 Is linear regression enough?
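A minimal numpy sketch of this closed-form solution on synthetic data (illustrative; when XᵀX is singular, np.linalg.pinv or np.linalg.lstsq is the safer choice):

import numpy as np

# Synthetic 1-D data generated from y = 2x + 1 plus noise
x = np.linspace(0, 5, 20)
y = 2 * x + 1 + np.random.default_rng(0).normal(0, 0.1, x.size)

X = np.column_stack([np.ones_like(x), x])  # data matrix augmented with a column of 1's
w = np.linalg.inv(X.T @ X) @ X.T @ y       # w = (X^T X)^-1 X^T Y
print(w)                                   # approximately [1.0, 2.0]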
Generalization Ability vs Overfitting
 A very important issue for any machine learning algorithm.
 Can your algorithm predict the correct target y of any unseen x?
 Hypothesis may perfectly predict for all known x’s but not unseen x’s
 This is called overfitting
 Each hypothesis h has an unknown true error on the universe: JU(h)
 But we only measured the empirical error on the training set: JD(h)
 Let h1 and h2 be two hypotheses compared on training set D, such that
we obtained the result JD(h1) < JD(h2)
 If h2 is “truly” better, that is JU(h2) < JU(h1)
 Then your algorithm is overfitting, and won’t generalize to unseen data
 We are not interested in memorizing the training set
 In our examples, the highest-degree d hypotheses overfit (i.e. memorize) the data

 We need methods to overcome overfitting.


Overfitting
 We have overfitting when hypothesis h is more complex than the data
 Complexity of h = number of parameters in h
 Number of weight parameters in our example increases with degree d
 Overfitting = low error on training data but high error on unseen data
 Assume D is drawn from some unknown probability distribution
 Given the universe U of data, we want to learn a hypothesis h from the
training set D ⊂ U, minimizing the error on the unseen data U \ D.
 Every h has a true error JU(h) on U, which is the expected error when the
data is drawn from the distribution
 We can only measure the empirical error JD(h) on D; we do not have U
 Then… How can we estimate the error JU(h) from D?
 Apply a cross-validation method on D
 Determining the hypothesis h which generalizes best is called model selection.
Avoiding Overfitting
• Red curve = Test set
• Blue curve = Training set

• What is the best h?


• Find the degree d such that JT(h) is minimal
• Training error decreases with complexity of h;
degree d in our example
• Testing error decreases initially then increases
• We need three disjoint subsets T, V, U of the data D
• Learn a potential h using the training set T
• Estimate error of h using the validation set V
• Report an unbiased error estimate for h using the test set U
Cross-Validation
 General procedure for estimating the true error of a learner.

 Randomly partition the data into three subsets:

1. Training Set T: used only to find the parameters of the classifier, e.g. w.


2. Validation Set V: used to find the correct hypothesis class, e.g. d.
3. Test Set U: used to estimate the true error of your algorithm

 These three sets do not intersect, i.e. they are disjoint


 Repeat cross-validation many times
 Results are averaged to give the true error estimate.

Cross-Validation and Model Selection
 How do we find the degree d which fits the data D best?
 Randomly partition the available data D into three disjoint sets;
training set T, validation set V, and test set U, then:
1. Cross-validation: For each degree d, perform a cross-validation method using T and V sets for evaluating the goodness of d.
 Some cross-validation techniques to be discussed later
2. Model Selection: Given the best d found in step 1, find hw,d using T and V sets and report the prediction error of hw,d using the test set U
 Some model selection approaches to be discussed later.
 The prediction error on U is an unbiased estimate of the true error
Leave-One-Out Cross-Validation
 For each degree d do:
1. for i ← 1 to m do:
1. Validation set Vi ← {ei = ( xi, yi )} ; leave the i-th sample out
2. Training set: Ti ← D \ Vi
3. wd,i ← Train(Ti, d) ; optimal wd,i using training set Ti
4. J(d, i) ← Test(Vi) ; validation error of wd,i on xi
; J(d, i) is an unbiased estimate of the true prediction error
2. Average validation error: J(d) ← (1/m) Σ_{i=1}^{m} J(d, i)
 d* ← arg min_d J(d) ; select the degree d with the lowest average error
; J(d*) is not an unbiased estimate since all data is used to find it.
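A runnable numpy sketch of this procedure (the data are synthetic; np.polyfit stands in for Train and squared error for Test):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 12)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, x.size)
m = x.size

avg_err = {}
for d in range(1, 6):                                   # candidate degrees
    errs = []
    for i in range(m):                                  # leave the i-th sample out
        mask = np.arange(m) != i                        # Ti = D \ Vi
        w = np.polyfit(x[mask], y[mask], d)             # Train(Ti, d)
        errs.append((np.polyval(w, x[i]) - y[i]) ** 2)  # J(d, i) on Vi = {(xi, yi)}
    avg_err[d] = np.mean(errs)                          # J(d) = (1/m) sum_i J(d, i)

print(min(avg_err, key=avg_err.get))                    # d* = arg min_d J(d)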
Example: Estimating True Error for d = 1

Example: Estimation results for all d

 Optimal choice is d = 2
 Overfitting for d > 2
 Very high validation error for d = 8 and 9
Model Selection
 J(d*) is not unbiased since it was obtained using all m samples
 We chose the hypothesis class d* based on J(d) = (1/m) Σ_{i=1}^{m} J(d, i)
 We want both a hypothesis class and an unbiased estimate of the true error
 If we want to compare different learning algorithms (or different hypotheses), an independent test set U is required in order to decide on the best algorithm or the best hypothesis
 In our case, we are trying to decide which regression model to
use, d=1, or d=2, or …, or d=11?
 And, which has the best unbiased true error estimate
k-Fold Cross-Validation
 Partition D into k disjoint subsets of the same size and distribution, P1, P2, …, Pk
 For each degree d do:
 for i ← 1 to k do:
1. Validation set Vi ← Pi ; leave Pi out for validation
2. Training set Ti ← D \ Vi
3. wd,i ← Train(Ti, d) ; train on Ti
4. J(d, i) ← Test(Vi) ; compute validation error on Vi
 Average validation error: J(d) ← (1/k) Σ_{i=1}^{k} J(d, i)
 d* ← arg min_d J(d) ; return the optimal degree d
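The same loop written with scikit-learn's KFold, again as an illustrative sketch with polynomial fitting standing in for Train:

import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, x.size)

kf = KFold(n_splits=5, shuffle=True, random_state=0)       # partitions P1, ..., Pk
avg_err = {}
for d in range(1, 6):
    errs = []
    for train_idx, val_idx in kf.split(x.reshape(-1, 1)):  # Ti = D \ Vi, Vi = Pi
        w = np.polyfit(x[train_idx], y[train_idx], d)      # Train(Ti, d)
        err = np.mean((np.polyval(w, x[val_idx]) - y[val_idx]) ** 2)
        errs.append(err)                                   # J(d, i) on Vi
    avg_err[d] = np.mean(errs)                             # J(d) = (1/k) sum_i J(d, i)

print(min(avg_err, key=avg_err.get))                       # d*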
Learning a Class from Examples
 Class C of a “family car”
 Prediction: Is car x a family car?
 Knowledge extraction: What do people expect from a
family car?
 Output:
Positive (+) and negative (–) examples
 Input representation:
x1: price, x2: engine power

Training set X
X = { (xᵗ, rᵗ) }, t = 1, …, N

r = 1 if x is a positive example
r = 0 if x is a negative example

x = (x1, x2)
Class C
(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)
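In Python, such a rectangle hypothesis is just a conjunction of two interval tests; the bounds p1, p2, e1, e2 below are illustrative placeholders:

# h(x) = 1 iff (price, engine_power) falls inside the rectangle [p1, p2] x [e1, e2]
def h(price, engine_power, p1=15_000, p2=30_000, e1=100, e2=200):
    return 1 if (p1 <= price <= p2) and (e1 <= engine_power <= e2) else 0

print(h(22_000, 150))  # 1: inside the rectangle, predicted family car
print(h(40_000, 250))  # 0: outside, predicted not a family car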



Hypothesis class H
h(x) = 1 if h says x is positive
h(x) = 0 if h says x is negative

Error of h on X:

E(h | X) = Σ_{t=1}^{N} 1(h(xᵗ) ≠ rᵗ)
S, G, and the Version Space

most specific hypothesis, S


most general hypothesis, G

Any h ∈ H between S and G is consistent; together they make up the version space (Mitchell, 1997)

Margin
 Choose the h with the largest margin
Noise and Model Complexity
Use the simpler one because:
 Simpler to use (lower computational complexity)
 Easier to train (lower space complexity)
 Easier to explain (more interpretable)
 Generalizes better (lower variance - Occam's razor)

Multiple Classes, Ci, i = 1, …, K

X = { (xᵗ, rᵗ) }, t = 1, …, N

riᵗ = 1 if xᵗ ∈ Ci
riᵗ = 0 if xᵗ ∈ Cj, j ≠ i

Train K hypotheses hi(x), i = 1, …, K:

hi(xᵗ) = 1 if xᵗ ∈ Ci
hi(xᵗ) = 0 if xᵗ ∈ Cj, j ≠ i
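A brief scikit-learn sketch of this one-vs-rest scheme; the Gaussian-blob data and the logistic-regression base classifier are illustrative assumptions:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
# Three classes C1, C2, C3 as 2-D Gaussian blobs centered at 0, 3, and 6
X = np.vstack([rng.normal(c, 0.5, size=(20, 2)) for c in (0, 3, 6)])
y = np.repeat([0, 1, 2], 20)

# Fits K binary hypotheses h_i, each separating class C_i from all the others
clf = OneVsRestClassifier(LogisticRegression()).fit(X, y)
print(clf.predict([[3.1, 2.9]]))  # [1]: the point lies in the class-1 blob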
Regression

X = x , r
t

t N
t =1
g(x ) = w1x + w0
rt 
g(x ) = w 2 x 2 + w1 x + w 0
r t = f (x t ) + 

1 N t
N t =1

E (g | X ) =  r − g (x )
t 2

1 N t
N t =1

E (w1 , w 0 | X ) =  r − (w1 x + w 0 )
t 2

Cross-Validation
 To estimate generalization error, we need data unseen during
training. We split the data as
 Training set (50%)
 Validation set (25%)
 Test (publication) set (25%)
 Use resampling when there is little data; a split sketch follows below
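A minimal sketch of this 50/25/25 split using two calls to scikit-learn's train_test_split (the array contents are placeholders):

import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(100).reshape(-1, 1), np.arange(100)

# Carve off 50% for training, then split the remainder evenly into validation and test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.50, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50, random_state=0)
print(len(X_train), len(X_val), len(X_test))  # 50 25 25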
Textbook / Reference Materials

 Introduction to Machine Learning by Ethem Alpaydin
 Machine Learning: An Algorithmic Perspective by Stephen Marsland
 Pattern Recognition and Machine Learning by Christopher M. Bishop

