UCK358E – INTRODUCTION TO ARTIFICIAL INTELLIGENCE
SPRING ‘24
LECTURE 2
MACHINE LEARNING BASICS
Instructor: Barış Başpınar
Learning Algorithms: Tasks
A learning algorithm is an algorithm that performs a task by using data.
A learning problem usually has three components: a task 𝑇, a performance measure 𝑃 and an experience 𝐸
Here are some common learning tasks
Classification: Find a function that maps input data to one of k categories
Example: Object recognition
Classification with missing inputs: Find a function or a collection of functions that maps input data
to one of k categories, where the input vector might have missing values
Example: Medical diagnosis
Regression: Find a function that predicts a continuous output
Example: Predict the expected claim amount an insured person will make
Transcription: Observe some unstructured data and transcribe it into discrete, textual form
Example: Optical character recognition (image-to-text), speech recognition
Learning Algorithms: Tasks
Here are more learning tasks
Machine Translation: Convert from one language to another
Example: Translation for natural languages
Structured Output: Any task whose output is a vector (or other structure) with important relationships among
its elements
Example: Parsing a sentence into nouns, verbs, etc.; image captioning
Anomaly Detection: Flag unusual or atypical objects
Example: Spam filtering, credit card fraud detection
Data Completion: Estimate missing values in data
Example: Product recommendation
Denoising: Recover a clean example from a corrupt one
Example: Image and video denoising
Density Estimation: Estimate the probability density 𝑝(𝒙) that generated the data
Example: Estimate the structure of a PGM
Learning Algorithms: Performance
We need a quantitative measure 𝑃 to measure how well our algorithm is performing on task 𝑇
For classification and translation tasks, we measure the performance by accuracy,
Just divide the number of correctly classified examples by the total number of examples
Performance is evaluated on test data, data that has never been seen by the algorithm before
This set is usually different from the training data
Sometimes selecting the correct performance criteria can be a complicated task itself
For translation, should we measure the accuracy on a word-by-word basis or a sentence-by-sentence
basis?
For regression, should we penalize medium-sized mistakes made frequently, or large mistakes made rarely?
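As a minimal sketch (assuming NumPy arrays of true and predicted labels; the toy labels below are made up), test-set accuracy can be computed as:

    import numpy as np

    def accuracy(y_pred, y_true):
        # fraction of correctly classified examples
        return np.mean(y_pred == y_true)

    # hypothetical test-set labels and predictions
    y_true = np.array([0, 1, 1, 0, 1])
    y_pred = np.array([0, 1, 0, 0, 1])
    print(accuracy(y_pred, y_true))  # 0.8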
Learning Algorithms: Experience
Many learning algorithms are categorized as supervised or unsupervised based on the kind of data
they experience
Roughly speaking, unsupervised algorithms try to determine 𝑝(𝒙) from 𝒙
Whereas supervised algorithms try to determine 𝑝( 𝒚 ∣ 𝒙 ) from (𝒙, 𝒚) pairs
However, sometimes the lines between the two categories get blurry
For example, in semi-supervised algorithms the data is only partially labelled
Unsupervised algorithms can also be used for supervised learning
Simply estimate 𝑝( 𝒙, 𝒚 ), then marginalize over $y$ and condition: $p(y \mid \boldsymbol{x}) = p(\boldsymbol{x}, y) \,/\, \sum_{y'} p(\boldsymbol{x}, y')$
And vice versa:
Use the chain rule $p(\boldsymbol{x}) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1})$ to turn an unsupervised problem into a collection of $n$
supervised problems
Learning Algorithms: Linear Regression Example
Let’s use a linear regression example to clarify these subjects
In linear regression, our model is $\hat{y} = \boldsymbol{w}^T \boldsymbol{x}$,
where $\boldsymbol{w} \in \mathbb{R}^n$ is a weight vector. Hence the task is to find 𝒘 that does well according to some
performance measure.
A popular performance measure for this type of task is the mean squared error, $\text{MSE} = \frac{1}{m} \sum_{i=1}^{m} \bigl(\hat{y}^{(i)} - y^{(i)}\bigr)^2$
The experience is gained by observing the dataset
It can be shown that the optimal solution is given by the normal equations: $\boldsymbol{w} = (\boldsymbol{X}^T \boldsymbol{X})^{-1} \boldsymbol{X}^T \boldsymbol{y}$
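A minimal NumPy sketch of this closed-form solution (the toy data and variable names are illustrative, not from the slides):

    import numpy as np

    # toy training data: 100 samples, 3 features
    X = np.random.randn(100, 3)
    true_w = np.array([1.0, -2.0, 0.5])
    y = X @ true_w + 0.1 * np.random.randn(100)

    # normal equations: w = (X^T X)^{-1} X^T y (solve is used instead of an explicit inverse)
    w = np.linalg.solve(X.T @ X, X.T @ y)

    # mean squared error of the fit on the training set
    print(w, np.mean((X @ w - y) ** 2))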
Capacity, Overfitting and Underfitting: Generalization
A central challenge in machine learning is to perform well on new, previously unseen inputs.
This property is known as generalization. The error on the test data is referred to as generalization
error
But how can we say something about the test data, if all we see is training data?
If these datasets were generated arbitrarily, we can’t...
However, if they come from the same distribution 𝑝(𝒙, 𝑦) , then we can say something! For
instance, their mean should be equal
In general, we want to find a 𝒘 that:
I. Makes the training error small
II. Makes the gap between training and test error small
Unfortunately, we can rarely do both well at the same time
Failing at I is known as underfitting: our model’s capacity is not sufficient to capture 𝑝(𝒙, 𝑦)
Succeeding at I but failing at II is known as overfitting: our model’s capacity is so high that it becomes
overtuned to work only on the training data
Capacity, Overfitting and Underfitting: Capacity
Capacity, Overfitting and Underfitting: Regularization
Regularization is any modification we make to a learning algorithm that is intended to reduce its generalization
error but not its training error.
The most common form of regularization is to modify the loss function so that large parameters
are penalized, e.g. by adding a weight decay term $\lambda\, \boldsymbol{w}^T \boldsymbol{w}$ to the loss
There are other forms of regularization; for instance ‘dropout’ (randomly dropping some of the
units in your model during training) is very popular in deep learning
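A sketch of the weight-decay idea for linear regression (ridge regression); the loss ‖Xw − y‖² + λ‖w‖² still has a closed-form solution. The data and the value of λ below are arbitrary:

    import numpy as np

    def ridge_fit(X, y, lam):
        # minimize ||Xw - y||^2 + lam * ||w||^2
        # closed form: w = (X^T X + lam * I)^{-1} X^T y
        n = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

    X = np.random.randn(50, 5)
    y = X @ np.array([1.0, 0.0, -1.0, 2.0, 0.5]) + 0.1 * np.random.randn(50)
    print(ridge_fit(X, y, lam=0.1))  # larger lam shrinks the weights toward zero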
Hyperparameters and Validation Sets
Hyperparameters are parameters that control the capacity of our model, as well as the
behaviour of the learning algorithm
Examples: degree of a polynomial, number of layers in a neural net, structure of a PGM...
These are usually set by domain experts
Can we ’learn’ those parameters as well?
We can try optimizing them, this is called hyperparameter optimization
However, optimizing them on the whole training data does not make sense
The optimization procedure would simply select the hyperparameters that result in the highest capacity (i.e.
overfitting)
Similar to how we separated the whole data into test and training, we can further divide the
training data into two disjoint subsets
The smaller part, which is used for guiding the selection of the hyperparameters, is called the validation set
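A sketch of hyperparameter selection with a validation set, using polynomial degree as the hyperparameter; the 80/20 split, the toy data, and the candidate degrees are all illustrative choices:

    import numpy as np

    # toy 1-D regression data
    x = np.random.uniform(-1, 1, 200)
    y = np.sin(3 * x) + 0.1 * np.random.randn(200)

    # split the available training data into a training part and a validation part
    idx = np.random.permutation(200)
    train, val = idx[:160], idx[160:]

    best_deg, best_err = None, np.inf
    for degree in range(1, 10):                          # candidate hyperparameter values
        coeffs = np.polyfit(x[train], y[train], degree)  # fit on the training part
        val_err = np.mean((np.polyval(coeffs, x[val]) - y[val]) ** 2)  # evaluate on the validation part
        if val_err < best_err:
            best_deg, best_err = degree, val_err
    print(best_deg, best_err)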
Estimators, Bias and Variance: Estimation
How can we quantify the concept of overfitting/underfitting?
Classical statistics and estimation theory can help with that
We will take the frequentist view for the rest of this section
Point estimation is an attempt to provide a single best estimate for a quantity of interest. We
denote the point estimate of $\boldsymbol{\theta}$ as $\hat{\boldsymbol{\theta}}$
Let $\{x^{(1)}, \ldots, x^{(m)}\}$ be a set of $m$ i.i.d. data points. A point estimator or statistic is any function
of the data: $\hat{\boldsymbol{\theta}}_m = g\bigl(x^{(1)}, \ldots, x^{(m)}\bigr)$
$\{x^{(1)}, \ldots, x^{(m)}\}$ is generated randomly, hence $\hat{\boldsymbol{\theta}}_m$ is a random variable, even though $\boldsymbol{\theta}$ is not
We are interested in analyzing the relationship between $\hat{\boldsymbol{\theta}}_m$ and $\boldsymbol{\theta}$, both for finite $m$ and as $m \to \infty$
Estimators, Bias and Variance: Bias
The bias of an estimator is defined as $\text{Bias}(\hat{\boldsymbol{\theta}}_m) = \mathbb{E}[\hat{\boldsymbol{\theta}}_m] - \boldsymbol{\theta}$, where the expectation is taken over the data
An estimator is unbiased if $\text{Bias}(\hat{\boldsymbol{\theta}}_m) = 0$ and asymptotically unbiased if $\lim_{m \to \infty} \text{Bias}(\hat{\boldsymbol{\theta}}_m) = 0$
Let $x^{(i)}$ be generated from a Bernoulli distribution with mean $\theta$. Then the estimator
$\hat{\theta}_m = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$ is unbiased
The same estimator is also unbiased when the $x^{(i)}$ are generated from a Gaussian distribution with
mean $\mu$
However, the variance estimator $\hat{\sigma}_m^2 = \frac{1}{m} \sum_{i=1}^{m} \bigl(x^{(i)} - \hat{\mu}_m\bigr)^2$ is only asymptotically unbiased. We can make it
unbiased by setting $\hat{\sigma}_m^2 = \frac{1}{m-1} \sum_{i=1}^{m} \bigl(x^{(i)} - \hat{\mu}_m\bigr)^2$
It might seem like unbiased estimators are always preferable, but this is not necessarily true; we will
soon see that biased estimators can possess some interesting properties
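A quick empirical check of the two variance estimators above (a sketch; the sample size, number of trials, and true variance are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    m, trials, true_var = 10, 100_000, 4.0

    biased, unbiased = [], []
    for _ in range(trials):
        x = rng.normal(0.0, 2.0, m)                           # Gaussian samples with variance 4
        mu_hat = x.mean()
        biased.append(np.sum((x - mu_hat) ** 2) / m)          # divides by m
        unbiased.append(np.sum((x - mu_hat) ** 2) / (m - 1))  # divides by m - 1

    print(np.mean(biased))    # ~3.6: underestimates 4 by the factor (m - 1) / m
    print(np.mean(unbiased))  # ~4.0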
Estimators, Bias and Variance: Variance and Standard Error
Another interesting property of an estimator is how much it varies as a function of the data
sample. This is called the variance, $\text{Var}(\hat{\theta})$; its square root is the standard error, $\text{SE}(\hat{\theta})$
Low-variance estimators give similar outputs regardless of the sampled data. Variance is an
excellent measure for quantifying the uncertainty in our estimates.
For instance, for Gaussian distributions we can say that the true mean falls into the
interval $\bigl(\hat{\mu}_m - 1.96\,\text{SE}(\hat{\mu}_m),\ \hat{\mu}_m + 1.96\,\text{SE}(\hat{\mu}_m)\bigr)$
with 95% probability.
Hence ideally, we want both low bias and low variance.
For the Bernoulli distribution, the variance of the estimator is $\text{Var}(\hat{\theta}_m) = \frac{1}{m}\,\theta(1 - \theta)$
The variance decreases as 𝑚 increases, this is a common property of popular estimators
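A sketch verifying that the variance of the Bernoulli mean estimator shrinks like θ(1 − θ)/m; θ, the sample sizes, and the number of repetitions are illustrative:

    import numpy as np

    rng = np.random.default_rng(1)
    theta = 0.3
    for m in (10, 100, 1000):
        # draw many datasets of size m and compute the estimator on each
        theta_hat = rng.binomial(1, theta, size=(20_000, m)).mean(axis=1)
        print(m, theta_hat.var(), theta * (1 - theta) / m)  # empirical vs. theoretical variance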
Estimators, Bias and Variance: Bias-Variance Decomposition
Bias and variance are two different sources of error in an estimator
Bias measures the expected deviation from the true value.
Variance measures deviation from the expected estimator value among different subsets of data.
Should we choose a model with low bias or low variance?
The answer is given by the key observation that the mean squared error can be decomposed as follows:
$\text{MSE} = \mathbb{E}\bigl[(\hat{\theta}_m - \theta)^2\bigr] = \text{Bias}(\hat{\theta}_m)^2 + \text{Var}(\hat{\theta}_m)$
Hence, for a fixed MSE, there is a trade-off between variance and bias
In general, increasing capacity increases the variance and decreases the bias.
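The decomposition can be verified in a few lines, using $\mathrm{Var}(\hat{\theta}_m) = \mathbb{E}[\hat{\theta}_m^2] - \mathbb{E}[\hat{\theta}_m]^2$:

    \mathrm{MSE} = \mathbb{E}\bigl[(\hat{\theta}_m - \theta)^2\bigr]
                 = \mathbb{E}[\hat{\theta}_m^2] - 2\theta\,\mathbb{E}[\hat{\theta}_m] + \theta^2
                 = \mathrm{Var}(\hat{\theta}_m) + \mathbb{E}[\hat{\theta}_m]^2 - 2\theta\,\mathbb{E}[\hat{\theta}_m] + \theta^2
                 = \mathrm{Var}(\hat{\theta}_m) + \bigl(\mathbb{E}[\hat{\theta}_m] - \theta\bigr)^2
                 = \mathrm{Var}(\hat{\theta}_m) + \mathrm{Bias}(\hat{\theta}_m)^2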
Estimators, Bias and Variance: Bias-Variance Trade-off
Supervised Learning Algorithms: Probabilistic Supervised Learning
Supervised learning is mainly about estimating the probability distribution 𝑝(𝑦 ∣ 𝒙)
To use MLE for this job, we first define a family of probability distributions 𝑝(𝑦 ∣ 𝒙; 𝜽)
The linear regression problem can be expressed by the family $p(y \mid \boldsymbol{x}; \boldsymbol{\theta}) = \mathcal{N}\bigl(y;\, \boldsymbol{\theta}^T \boldsymbol{x},\, \sigma^2\bigr)$
It can be shown that in this case $\boldsymbol{\theta}_{ML}$ has a closed-form expression, which is the same as the least
squares solution!
For binary classification, we can use the logistic sigmoid function: $p(y = 1 \mid \boldsymbol{x}; \boldsymbol{\theta}) = \sigma(\boldsymbol{\theta}^T \boldsymbol{x})$ (logistic regression)
No closed form solution in this case, we need to use numerical optimization methods.
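A minimal gradient-descent sketch for this logistic regression model (maximizing the likelihood is equivalent to minimizing the average negative log-likelihood; the toy data, learning rate, and iteration count are arbitrary):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # toy, linearly separable binary classification data
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)

    theta, lr = np.zeros(2), 0.1
    for _ in range(1000):
        p = sigmoid(X @ theta)          # p(y = 1 | x; theta) for every sample
        grad = X.T @ (p - y) / len(y)   # gradient of the average negative log-likelihood
        theta -= lr * grad
    print(theta)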
Supervised Learning Algorithms: Parametric Supervised Learning
Not all supervised learning algorithms are probabilistic. We can simply define a parametric
family of deterministic models 𝑦 = 𝑓(𝒙; 𝒘) and use some loss function to find an optimal 𝒘∗
One of the most famous approaches to this problem is the Support Vector Machines (SVMs)
For binary classification, we let 𝑓(𝒙) = 𝒘𝑇 𝒙 + 𝑏
If 𝑓(𝒙) > 0 we predict that the sample belongs to the positive class, and to the negative class otherwise
SVM is useful for two reasons
Solving for optimal 𝒘 is a convex optimization problem
It can be shown that the linear function can be re-written as $\boldsymbol{w}^T \boldsymbol{x} + b = b + \sum_{i=1}^{m} \alpha_i\, \boldsymbol{x}^T \boldsymbol{x}^{(i)}$, where $\boldsymbol{\alpha}$ is a vector of coefficients
The second part is the key result: we can replace $\boldsymbol{x}^T \boldsymbol{x}^{(i)}$ by any inner product and we can
still use the same algorithm!
This is how we generalize to nonlinear classifiers: $f(\boldsymbol{x}) = b + \sum_{i} \alpha_i\, k\bigl(\boldsymbol{x}, \boldsymbol{x}^{(i)}\bigr)$
The function $k(\cdot, \cdot)$ is called a kernel
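A sketch of this kernelized decision function with a Gaussian (RBF) kernel; the coefficients αᵢ and b are assumed to come from some training procedure, which is not shown here:

    import numpy as np

    def rbf_kernel(x, z, gamma=1.0):
        # Gaussian (RBF) kernel: k(x, z) = exp(-gamma * ||x - z||^2)
        return np.exp(-gamma * np.sum((x - z) ** 2))

    def decision_function(x, X_train, alpha, b, gamma=1.0):
        # f(x) = b + sum_i alpha_i * k(x, x^(i)); predict the positive class if f(x) > 0
        return b + sum(a * rbf_kernel(x, xi, gamma) for a, xi in zip(alpha, X_train))

    # toy example with made-up coefficients (normally obtained by solving the SVM problem)
    X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
    alpha, b = np.array([1.0, -1.0]), 0.0
    print(decision_function(np.array([0.1, 0.1]), X_train, alpha, b))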
Supervised Learning Algorithms: Nonparametric Supervised Learning
For most supervised learning algorithms we need to fix the form and number of parameters in
the model beforehand.
Can we also learn those from the data?
Models that allow this are called nonparametric; in some sense, they let the data speak for itself
One of the simplest nonparametric classifiers is the k-nearest neighbors method
Fix a positive integer k (yes, even nonparametric models have parameters)
For a new point, find the k nearest samples in the dataset, compute the average of their classes/outputs,
and simply predict this value for the new point
There are other popular nonparametric classifiers as well, decision trees, random forests etc.
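A minimal k-nearest-neighbors sketch for classification, using Euclidean distance and a majority vote (all data and names are illustrative):

    import numpy as np

    def knn_predict(x, X_train, y_train, k=3):
        dists = np.linalg.norm(X_train - x, axis=1)    # distance from the query to every training sample
        nearest = np.argsort(dists)[:k]                # indices of the k closest samples
        return np.bincount(y_train[nearest]).argmax()  # majority class among the neighbors

    X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
    y_train = np.array([0, 0, 1, 1])
    print(knn_predict(np.array([0.2, 0.1]), X_train, y_train, k=3))  # predicts class 0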
Unsupervised Learning
There are unsupervised learning algorithms for dimensionality reduction, e.g. PCA
PCA takes unlabeled, difficult-to-interpret data and transforms it into a much more manageable
form
Another popular form of unsupervised learning is clustering
How can we segment the data in a meaningful way?
k-means clustering is a very popular algorithm for this job (a minimal sketch appears at the end of this slide)
Generating association rules is another task that can be solved via unsupervised learning
Association rule learning is an unsupervised method for finding relationships
between variables in large databases.
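A minimal k-means sketch, as referenced above; random initialization from the data and a fixed number of iterations are simplifications:

    import numpy as np

    def kmeans(X, k, iters=20, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), k, replace=False)]  # initialize centers from random samples
        for _ in range(iters):
            # assign each point to its nearest center
            labels = np.argmin(np.linalg.norm(X[:, None] - centers[None, :], axis=2), axis=1)
            # move each center to the mean of its assigned points
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
        return centers, labels

    X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
    centers, labels = kmeans(X, k=2)
    print(centers)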
References
I. Goodfellow, Y. Bengio and A. Courville, “Deep Learning”, MIT Press, 2016.