MACHINE LEARNING HOMEWORK 1 SOLUTIONS
HARISH BALAJI BOOMINATHAN (hb2917)
September 16, 2024
Problem 1:
Goal: To find a one-dimensional function that takes a scalar input and outputs a scalar, $f : \mathbb{R} \to \mathbb{R}$.
Form of the function:
$$ f(x;\theta) = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_d x^d $$
where $d$ is the degree of the polynomial.
The empirical risk is given by
$$ R_{\mathrm{emp}}(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \big( y_i - f(x_i;\theta) \big)^2 $$
On partially differentiating with respect to $\theta$ and equating to $0$, we get:
$$ \theta^{*} = (X^{T}X)^{-1}X^{T}Y $$
After splitting the dataset into two random halves, we compute $\theta^{*}$ on one half and evaluate this regression model for $d$ ranging from 1 to 100; a sketch of this procedure is given below.
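The following is a minimal MATLAB sketch of the procedure just described, assuming the data are already loaded as N-by-1 column vectors x and y; the variable names and the use of pinv (to guard against the nearly singular $X^{T}X$ that arises at high degrees) are illustrative choices, not the exact submitted code.

```matlab
% Sketch: closed-form polynomial least squares on a random half/half split,
% evaluated for degrees d = 1..100. Assumes x and y are N-by-1 column vectors.
N = numel(x);
idx = randperm(N);
trIdx = idx(1:floor(N/2));          teIdx = idx(floor(N/2)+1:end);
xTr = x(trIdx);  yTr = y(trIdx);    xTe = x(teIdx);  yTe = y(teIdx);

maxDeg = 100;
trainErr = zeros(maxDeg, 1);
testErr  = zeros(maxDeg, 1);
for d = 1:maxDeg
    XTr = xTr .^ (0:d);                          % design matrix [1, x, x.^2, ..., x.^d]
    XTe = xTe .^ (0:d);
    theta = pinv(XTr' * XTr) * (XTr' * yTr);     % closed-form theta* = (X'X)^-1 X'Y
    trainErr(d) = mean((yTr - XTr * theta).^2) / 2;   % empirical risk on the training half
    testErr(d)  = mean((yTe - XTe * theta).^2) / 2;   % empirical risk on the held-out half
end
plot(1:maxDeg, trainErr, 1:maxDeg, testErr);
xlabel('degree d'); ylabel('empirical risk'); legend('train error', 'test error');
```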
Figure: fitted regression model for $d = 100$.
When we use cross-validation and plot the training error against the test error, we see that the test error attains its minimum at degree $d = 9$, with an error value of 1.085044e+21.
As the graph shows, the testing error reaches its minimum at degree 9 and increases afterwards, while the training error keeps decreasing and stays constant once it reaches 0. This shows that the model starts to overfit as the degree increases, which is why the testing error grows rapidly.
Problem 2:
Goal: To build a multivariate regression function $f : \mathbb{R}^{100} \to \mathbb{R}$, where the basis functions are of the form
$$ f(x;\theta) = \sum_{i=1}^{k} \theta_i x_i $$
We use an $\ell_2$ penalty to regularize the model and reduce the risk of overfitting.
$$ R_{\mathrm{reg}}(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \big( y_i - f(x_i;\theta) \big)^2 + \frac{\lambda}{2N} \|\theta\|^2 $$
When we partially differentiate the risk with respect to $\theta$ and equate it to $0$, we get the model with minimum risk:
$$ \theta^{*} = (X^{T}X + \lambda I)^{-1}X^{T}Y $$
On applying two-fold cross-validation and plotting the error for values of $\lambda$ in the range 0 to 1000, we obtain the following.
We can see that the test error is minimum for $\lambda = 422$. As $\lambda$ increases further, we can also observe that the values in $\theta$ shrink and the model slowly starts to underfit; hence both the training error and the testing error increase.
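A minimal MATLAB sketch of this two-fold cross-validation sweep over $\lambda$ is given below. The variable names X (an N-by-100 design matrix) and y, and the integer grid of $\lambda$ values, are assumptions rather than the exact code behind the reported numbers; the backslash solve is used in place of an explicit matrix inverse but computes the same $\theta^{*}$.

```matlab
% Sketch: ridge regression, theta* = (X'X + lambda*I)^-1 X'Y,
% with two-fold cross-validation over lambda in 0..1000.
% Assumes X is N-by-100 and y is N-by-1.
N = size(X, 1);
idx = randperm(N);
A = idx(1:floor(N/2));  B = idx(floor(N/2)+1:end);
lambdas = 0:1000;
cvErr = zeros(numel(lambdas), 1);

ridgeFit = @(Xt, yt, lam) (Xt' * Xt + lam * eye(size(Xt, 2))) \ (Xt' * yt);
halfMse  = @(Xv, yv, th) mean((yv - Xv * th).^2) / 2;

for j = 1:numel(lambdas)
    lam = lambdas(j);
    thA = ridgeFit(X(A,:), y(A), lam);   % train on fold A, test on fold B
    thB = ridgeFit(X(B,:), y(B), lam);   % train on fold B, test on fold A
    cvErr(j) = (halfMse(X(B,:), y(B), thA) + halfMse(X(A,:), y(A), thB)) / 2;
end
[~, best] = min(cvErr);
fprintf('minimum test error at lambda = %d\n', lambdas(best));
```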
Problem 3:
(i) To prove:
For $g(z) = \frac{1}{1+e^{-z}}$, the identity $g(-z) = 1 - g(z)$ holds.
Proof:
$$ g(z) = \frac{1}{1+e^{-z}} \quad \rightarrow (1) $$
Then,
$$ g(-z) = \frac{1}{1+e^{z}} $$
On multiplying and dividing by $e^{-z}$, we get
$$ g(-z) = \frac{e^{-z}}{e^{-z}+1} $$
On adding and subtracting 1 in the numerator,
$$ g(-z) = \frac{1+e^{-z}-1}{1+e^{-z}} $$
$$ g(-z) = \frac{1+e^{-z}}{1+e^{-z}} - \frac{1}{1+e^{-z}} $$
$$ g(-z) = 1 - \frac{1}{1+e^{-z}} $$
Using equation (1):
$$ g(-z) = 1 - g(z) $$
Hence proved.
(ii) To prove:
$$ g^{-1}(y) = \ln\left(\frac{y}{1-y}\right) $$
Proof:
We know that $g(g^{-1}(y)) = y$. Therefore,
$$ g(g^{-1}(y)) = \frac{1}{1+e^{-g^{-1}(y)}} = y $$
$$ \frac{1}{y} = 1 + e^{-g^{-1}(y)} $$
$$ \frac{1}{y} - 1 = e^{-g^{-1}(y)} $$
$$ \frac{1-y}{y} = e^{-g^{-1}(y)} $$
On taking the natural logarithm on both sides we have,
$$ \ln\left(\frac{1-y}{y}\right) = \ln\left(e^{-g^{-1}(y)}\right) $$
Using the properties of logarithms,
$$ \ln\left(\frac{1-y}{y}\right) = -g^{-1}(y) $$
Therefore,
$$ g^{-1}(y) = -\ln\left(\frac{1-y}{y}\right) $$
Again using the properties of logarithms,
$$ g^{-1}(y) = \ln\left(\frac{y}{1-y}\right) $$
Hence proved.
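Both identities can also be checked numerically. The short MATLAB snippet below (a sanity check, not part of the required proof) evaluates them on a grid of points; both residuals come out close to machine precision.

```matlab
% Numerical sanity check of the two identities proved above.
g    = @(z) 1 ./ (1 + exp(-z));     % logistic function g(z)
ginv = @(y) log(y ./ (1 - y));      % claimed inverse g^-1(y)

z = linspace(-5, 5, 101);
fprintf('max |g(-z) - (1 - g(z))| = %g\n', max(abs(g(-z) - (1 - g(z)))));
fprintf('max |g^-1(g(z)) - z|     = %g\n', max(abs(ginv(g(z)) - z)));
```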
Problem 4:
Goal:
To implement a linear logistic regression algorithm for binary classification in MATLAB using gradient descent.
Classification function:
$$ f(x;\theta) = \big(1 + \exp(-\theta^{T}x)\big)^{-1} $$
that minimizes the empirical risk with the logistic loss
$$ R_{\mathrm{emp}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} (y_i - 1)\log\big(1 - f(x_i;\theta)\big) - y_i \log\big(f(x_i;\theta)\big) $$
$$ R_{\mathrm{emp}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} (1 - y_i)\log\big(1 - f(x_i;\theta)\big) + y_i \log\big(f(x_i;\theta)\big) $$
We know that
$$ f(x_i;\theta) = g(\theta^{T}x_i) \quad\text{and}\quad \frac{\partial}{\partial\theta}\, g(\theta^{T}x_i) = g(\theta^{T}x_i)\big(1 - g(\theta^{T}x_i)\big)\,x_i \quad \rightarrow (1) $$
Therefore,
$$ R_{\mathrm{emp}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} (1 - y_i)\log\big(1 - g(\theta^{T}x_i)\big) + y_i \log\big(g(\theta^{T}x_i)\big) $$
Hence the gradient is,
$$ \frac{\partial}{\partial\theta} R_{\mathrm{emp}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} (1 - y_i)\left(\frac{1}{1 - g(\theta^{T}x_i)}\right)\left(-\frac{\partial}{\partial\theta} g(\theta^{T}x_i)\right) + y_i \left(\frac{1}{g(\theta^{T}x_i)}\right)\left(\frac{\partial}{\partial\theta} g(\theta^{T}x_i)\right) $$
$$ \frac{\partial}{\partial\theta} R_{\mathrm{emp}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \left(\frac{1}{g(\theta^{T}x_i)}\right) - (1 - y_i)\left(\frac{1}{1 - g(\theta^{T}x_i)}\right) \right) \frac{\partial}{\partial\theta} g(\theta^{T}x_i) $$
Using equation (1):
$$ \frac{\partial}{\partial\theta} R_{\mathrm{emp}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \left(\frac{1}{g(\theta^{T}x_i)}\right) - (1 - y_i)\left(\frac{1}{1 - g(\theta^{T}x_i)}\right) \right) g(\theta^{T}x_i)\big(1 - g(\theta^{T}x_i)\big)\,x_i $$
$$ \frac{\partial}{\partial\theta} R_{\mathrm{emp}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \Big( y_i \big(1 - g(\theta^{T}x_i)\big) - (1 - y_i)\,g(\theta^{T}x_i) \Big)\,x_i $$
$$ \frac{\partial}{\partial\theta} R_{\mathrm{emp}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \big( y_i - y_i\,g(\theta^{T}x_i) - g(\theta^{T}x_i) + y_i\,g(\theta^{T}x_i) \big)\,x_i $$
$$ \frac{\partial}{\partial\theta} R_{\mathrm{emp}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \big( y_i - g(\theta^{T}x_i) \big)\,x_i $$
We can apply batch gradient descent to obtain $\theta^{*}$:
$$ \theta^{(t+1)} = \theta^{(t)} - \eta\,\frac{\partial}{\partial\theta} R_{\mathrm{emp}}\big(\theta^{(t)}\big) $$
Here, $\eta$ is the step size, and the iterations can be stopped when the descent step is negligible in size, i.e. when it is less than a tolerance $\varepsilon$; both $\varepsilon$ and $\eta$ are hyperparameters.
We initialize $\theta^{(0)}$ to a small random vector.
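A minimal MATLAB sketch of this batch gradient descent loop is shown below. The function name, the iteration cap maxIter, and the assumption that X carries a leading column of ones and y takes values in {0, 1} are illustrative choices, not the exact submitted code.

```matlab
% Sketch: batch gradient descent for linear logistic regression.
% X: N-by-(d+1) design matrix (first column all ones), y: N-by-1 labels in {0,1}.
% eta: step size, tol: tolerance epsilon, maxIter: safety cap on iterations.
function [theta, t] = logisticGD(X, y, eta, tol, maxIter)
    N = size(X, 1);
    theta = 0.01 * randn(size(X, 2), 1);        % small random initialization
    for t = 1:maxIter
        p = 1 ./ (1 + exp(-X * theta));         % f(x_i; theta) for every sample
        grad = -(X' * (y - p)) / N;             % gradient derived above
        step = eta * grad;
        theta = theta - step;                   % batch update
        if norm(step) < tol                     % stop when the descent step is negligible
            break;
        end
    end
end
```

With this sketch, the runs reported below would correspond to calls such as logisticGD(X, y, 2, 0.01, 50000), with a sample classified as 1 whenever $f(x;\theta) \geq 0.5$ (the usual threshold, assumed here).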
For $\eta = 3$ and $\varepsilon = 0.1$, the model produces a 12.5% binary classification error (87.5% accuracy) and takes 19 iterations to converge.
For $\eta = 2$ and $\varepsilon = 0.03$, the model produces a 10.5% binary classification error (89.5% accuracy) and takes 161 iterations to converge.
For $\eta = 2$ and $\varepsilon = 0.01$, the model produces a 2% binary classification error (98% accuracy) and takes 1050 iterations to converge.
For $\eta = 2$ and $\varepsilon = 0.001$, the model produces a 0% binary classification error (100% accuracy) and takes 27,784 iterations to converge.
This model performs excellently, with 0% error and 100% accuracy, but it takes 27,784 iterations and converges many iterations after it first reaches zero error. So we can increase the step size and the tolerance so that the model reaches convergence and attains zero error at just the right time.
For $\eta = 4$ and $\varepsilon = 0.0025$, the model produces a 0% binary classification error (100% accuracy) and takes just 10,292 iterations to converge.
Here the model performs with 100% accuracy and 0% error, converges right after the binary classification error reaches zero, and runs in the least amount of time.