CMPE 442 Introduction to Machine Learning
• Polynomial Regression
• Locally Weighted Linear Regression
• Logistic Regression/ Softmax Regression
Polynomial Regression
What if your data is actually more complex than a simple straight line?
We can use a linear model to fit nonlinear data
A model is said to be linear if it is linear in parameters
y = θ₀ + θ₁x₁ + θ₂x₂
y = θ₀ + θ₁x + θ₂x²
y = θ₀ + θ₁x₁ + θ₂x₂ + θ₁₁x₁² + θ₂₂x₂² + θ₁₂x₁x₂
Add powers of each feature as new features, then train a linear model on
this extended set of features
Polynomial Regression
A kth order polynomial model in one variable is given by
y = θ₀ + θ₁x + θ₂x² + ... + θₖxᵏ
A second order polynomial model in two variables is given by
y = θ₀ + θ₁x₁ + θ₂x₂ + θ₁₁x₁² + θ₂₂x₂² + θ₁₂x₁x₂
An array containing n features can be transformed into an array containing (n + d)! / (d! n!) features, where d is the polynomial order
Polynomial Regression: Example
Let’s generate some nonlinear data based on a simple quadratic equation
x’s are assigned randomly
y = 2 + x + 0.5x²
Polynomial Regression: Example
A straight line poorly fits nonlinear data
A case of underfitting
h(x) = 3.52 + 0.98x
Polynomial Regression: Example
Add powers of each feature as new features
x^(i) → [x^(i), (x^(i))²]
y = 2 + x + 0.5x²
h(x) = 2.04 + 0.96x + 0.51x²    (see the sketch below)
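A minimal sketch of this example in Python with Scikit-Learn. The noise term, sample size, and x-range are assumptions, since the slide's data-generation code is not shown:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Generate noisy quadratic data: y = 2 + x + 0.5*x^2 + noise (assumed setup)
m = 100
X = 6 * np.random.rand(m, 1) - 3            # x sampled randomly in [-3, 3]
y = 2 + X + 0.5 * X**2 + np.random.randn(m, 1)

# Add x^2 as a new feature, then train an ordinary linear model on the extended features
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)              # columns: [x, x^2]

lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
print(lin_reg.intercept_, lin_reg.coef_)    # should come out close to 2, 1 and 0.5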
Polynomial Regression: Example
Polynomial Regression: Example
Evaluating the Performance of the Model
RMSE is the square root of the mean of the squared residuals:
RMSE(X, h) = √( (1/m) Σᵢ₌₁ᵐ (h(x^(i)) − y^(i))² )
R² score, or the coefficient of determination, explains how much the total variance of the dependent variable can be reduced by using the least-squares regression.
R² = 1 − SS_res / SS_tot,  where SS_tot = Σᵢ (y^(i) − ȳ)² and SS_res = Σᵢ (h(x^(i)) − y^(i))²    (see the sketch below)
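As a sketch, both metrics can be computed by hand or with Scikit-Learn; the arrays y and h_pred below are placeholders for the targets and the model predictions:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y = np.array([1.2, 2.3, 3.1, 4.8])        # placeholder targets y^(i)
h_pred = np.array([1.0, 2.5, 3.0, 5.0])   # placeholder predictions h(x^(i))

# RMSE: square root of the mean of squared residuals
rmse = np.sqrt(np.mean((h_pred - y) ** 2))
rmse_sk = np.sqrt(mean_squared_error(y, h_pred))

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((h_pred - y) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
r2_sk = r2_score(y, h_pred)

print(rmse, rmse_sk, r2, r2_sk)
```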
Locally Weighted Linear Regression
In the original linear regression algorithm, to make a prediction at a query
point x, we would:
1. Fit θ to minimize Σᵢ (y^(i) − θᵀx^(i))²
2. Output θᵀx
In the locally weighted linear regression algorithm:
1. Fit θ to minimize Σᵢ w^(i) (y^(i) − θᵀx^(i))²
2. Output θᵀx
Linear Regression
y = 2 + x + 0.5x²
Linear Regression
y = 2 + x + 0.5x²    h(x) = 2.04 + 0.96x + 0.51x²
Linear Regression
y = 2 + x + 0.5x²    h(x) = 2.04 + 0.96x + 0.51x²
x_new = 2.5    h(2.5) = 2.04 + 0.96·2.5 + 0.51·2.5² ≈ 7.63
Locally Weighted Linear Regression
y = 2 + x + 0.5x²
Locally Weighted Linear Regression
y = 2 + x + 0.5x²
x_new = 2.5
Locally Weighted Linear Regression
y = 2 + x + 0.5x²
x_new = 2.5
Locally Weighted Linear Regression
y = 2 + x + 0.5x²    (this line is learnt specifically for x_new)
x_new = 2.5
Locally Weighted Linear Regression
y = 2 + x + 0.5x²
x_new = 2.5
Gradient Descent
Update Rule:
θⱼ := θⱼ − η Σᵢ w^(i) (h_θ(x^(i)) − y^(i)) xⱼ^(i)
The magnitude of the update is proportional to the error and the weight of
a sample
Locally Weighted Linear Regression
w(i)’s are non-negative valued weights.
Weights depend on the particular query point x at which we are trying to make a prediction.
w^(i) = exp( −(x^(i) − x)² / (2τ²) ),  where τ is the bandwidth parameter
If |x^(i) − x| is small, then w^(i) is close to 1
If |x^(i) − x| is large, then w^(i) is small
θ is chosen giving a much higher weight to the (errors on) training examples
close to the query point x.
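A minimal NumPy sketch of locally weighted linear regression, using the closed-form weighted normal equation rather than the gradient-descent update above; the bandwidth τ and the toy data are assumptions, not the slide's exact setup:

```python
import numpy as np

def lwlr_predict(x_query, X, y, tau=0.5):
    """Predict y at x_query by fitting a locally weighted linear model."""
    m = X.shape[0]
    Xb = np.c_[np.ones(m), X]                       # add intercept column x0 = 1
    xq = np.r_[1.0, np.atleast_1d(x_query)]         # query point with intercept

    # Gaussian weights: w^(i) = exp(-(x^(i) - x)^2 / (2 tau^2))
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)

    # Weighted normal equation: theta = (X^T W X)^-1 X^T W y
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
    return xq @ theta

# Toy usage on the quadratic data from the slides
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 2 + X[:, 0] + 0.5 * X[:, 0] ** 2
print(lwlr_predict(2.5, X, y))                      # close to 2 + 2.5 + 0.5*2.5^2 = 7.625
```

The closed form is used here only for compactness; the same θ could be reached with the weighted gradient-descent update shown above.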
Locally Weighted Linear Regression
The linear regression algorithm is a parametric learning algorithm
It has a fixed, finite number of parameters which are fit to the data
Once we fit the θ's and store them away, we don't need to keep the training data for future predictions
Locally weighted linear regression is a non-parametric algorithm
To make predictions we need to keep the entire training set around
The amount of data we need to keep in order to represent the hypothesis h grows linearly with the size of the training set
Question
Is the gradient descent algorithm affected by the scales of features?
Question
Is the gradient descent algorithm affected by the scales of features?
Feature Normalization
We can speed up gradient descent by normalizing the features (bringing them into approximately equal ranges)
θ will descend quickly on small ranges and slowly on large ranges
1) Feature scaling: xⱼ := xⱼ / (max(xⱼ) − min(xⱼ))
2) Mean normalization: xⱼ := (xⱼ − μⱼ) / (max(xⱼ) − min(xⱼ))  or  xⱼ := (xⱼ − μⱼ) / σⱼ
For polynomial regression, feature normalization is even more important (see the sketch below).
Not needed for the Normal Equation.
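A brief sketch of both normalization schemes in NumPy, plus the standardization scaler from Scikit-Learn; the feature matrix X is a placeholder with deliberately mismatched ranges:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[2000.0, 3], [1600.0, 2], [2400.0, 4]])  # placeholder features with very different ranges

# 1) Feature scaling: divide each feature by its range (max - min)
X_scaled = X / (X.max(axis=0) - X.min(axis=0))

# 2) Mean normalization: subtract the mean, then divide by the range (or the std)
X_mean_norm = (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization (zero mean, unit variance), commonly used before gradient descent
X_std = StandardScaler().fit_transform(X)
print(X_scaled, X_mean_norm, X_std, sep="\n")
```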
Summary on Linear Regression
Considered the case when the hypothesis function is linear.
We can make our hypothesis nonlinear by combining existing features into new features, making the function quadratic, cubic, etc.
Closed-form solution: Normal Equations
Scaling the data for gradient descent algorithm is important.
Logistic Regression
Classification problem
Similar to regression problem
Assume a binary classifier: y can take on only two values 0 and 1.
Ex.: spam email (1, positive class), ham email (0, negative class)
Linear Regression
h_θ(x) = θ₀ + θ₁x₁ + θ₂x₂ + ⋯ + θₙxₙ
Classification Problem
Threshold the classifier output at 0.5:
If h_θ(x) ≥ 0.5, predict y = 1; if h_θ(x) < 0.5, predict y = 0
Fitted line: h_θ(x) = −0.34 + 1.58x
Classification Problem
y ∈ {0, 1}
So, we want 0 ≤ h_θ(x) ≤ 1
Logistic Regression
Let's change the form of the hypothesis h_θ(x)
h_θ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx))
g(z) = 1 / (1 + e^(−z))    (the logistic function, or sigmoid function)
Logistic Regression
g(z) = 1 / (1 + e^(−z))
g(z) tends towards 1 as z → ∞
g(z) tends towards 0 as z → −∞
g(z) is always bounded between 0 and 1
h_θ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx))
θᵀx = θ₀x₀ + Σⱼ₌₁ⁿ θⱼxⱼ,  with x₀ = 1    (see the sketch below)
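A small sketch of the sigmoid and the resulting hypothesis in NumPy; the particular θ and x values are illustrative, not prescribed by the slides:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^-z), bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-3.0, 1.0, 1.0])      # illustrative parameters [theta0, theta1, theta2]
x = np.array([1.0, 2.0, 2.0])           # x0 = 1 (intercept term), x1 = 2, x2 = 2

print(sigmoid(np.array([-10, 0, 10])))  # -> approximately [0, 0.5, 1]
print(sigmoid(theta @ x))               # h_theta(x) = g(theta^T x)
```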
Logistic Regression
Predicts something that is True or False, instead of something continuous like size
Logistic Regression
Interpretation of Logistic Function
h_θ(x) can be interpreted as the estimated probability that y = 1 on input x
h_θ(x) = P(y = 1 | x; θ)
P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ)
Decision Boundary
Assume: y = 1 if h_θ(x) ≥ 0.5
        y = 0 if h_θ(x) < 0.5
h_θ(x) = g(θ₀ + θ₁x₁ + θ₂x₂)
θ = [−3, 1, 1]ᵀ
Predict y = 1 if −3 + x₁ + x₂ ≥ 0
Decision boundary: x₁ + x₂ = 3
Logistic Regression
Given the logistic regression model, how do we fit θ for it?
Logistic Regression
Training set: {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))}
m examples
x = [x₀, x₁, …, xₙ]ᵀ,  x₀ = 1
y ∈ {0, 1}
h_θ(x) = 1 / (1 + e^(−θᵀx))
Cost Function
Linear Regression: J(θ) = (1/m) Σᵢ₌₁ᵐ ½ (h_θ(x^(i)) − y^(i))²
Cost(h_θ(x), y) = ½ (h_θ(x) − y)²
Using this cost function for logistic regression gives a non-convex function
Not guaranteed to converge to the global minimum
Cost Function
Cost(h_θ(x), y) = −log(h_θ(x))      if y = 1
Cost(h_θ(x), y) = −log(1 − h_θ(x))  if y = 0
Cost = 0 if y = 1 and h_θ(x) = 1
But as h_θ(x) → 0, Cost → ∞
If h_θ(x) = 0, we predict that P(y = 1 | x; θ) = 0; if actually y = 1, the learning algorithm is penalized by a very large cost
Cost Function
Cost(h_θ(x), y) = −log(h_θ(x)) if y = 1,  −log(1 − h_θ(x)) if y = 0
J(θ) = (1/m) Σᵢ₌₁ᵐ Cost(h_θ(x^(i)), y^(i))
[Plots: Cost vs. h_θ(x) for y = 1 and for y = 0]
Cost Function
Cost(h_θ(x), y) = −log(h_θ(x)) if y = 1,  −log(1 − h_θ(x)) if y = 0
y = 1 or y = 0: only two values are possible, so the two cases can be combined:
Cost(h_θ(x), y) = −y log(h_θ(x)) − (1 − y) log(1 − h_θ(x))
J(θ) = (1/m) Σᵢ₌₁ᵐ Cost(h_θ(x^(i)), y^(i)) = −(1/m) Σᵢ₌₁ᵐ [ y^(i) log(h_θ(x^(i))) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]
This is a convex function
Logistic Regression
To fit parameters 𝜃:
min_θ J(θ) to obtain θ
To make a prediction on a given new x:
Output h_θ(x) = P(y = 1 | x; θ)
Gradient Descent
Want min_θ J(θ):
Repeat {
    θⱼ := θⱼ − η ∂J(θ)/∂θⱼ
} (simultaneously update all θⱼ)
Gradient Descent
J(θ) = (1/m) Σᵢ₌₁ᵐ Cost(h_θ(x^(i)), y^(i))
     = −(1/m) Σᵢ₌₁ᵐ [ y^(i) log(h_θ(x^(i))) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]
∂J(θ)/∂θⱼ = (1/m) Σᵢ₌₁ᵐ (h_θ(x^(i)) − y^(i)) xⱼ^(i)
Gradient Descent
Want min_θ J(θ):      h_θ(x) = 1 / (1 + e^(−θᵀx))
Repeat {
    θⱼ := θⱼ − η Σᵢ₌₁ᵐ (h_θ(x^(i)) − y^(i)) xⱼ^(i)
} (simultaneously update all θⱼ)    (a NumPy sketch follows below)
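A compact NumPy sketch of this gradient-descent loop; the toy data, the learning rate, the iteration count, and the 1/m factor in the update are assumptions chosen here for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary data: label tends to be 1 when x1 + x2 > 3 (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 200) > 3).astype(float)
Xb = np.c_[np.ones(len(X)), X]                     # add the intercept feature x0 = 1

theta = np.zeros(Xb.shape[1])
eta, n_iters, m = 0.1, 1000, len(y)

for _ in range(n_iters):
    h = sigmoid(Xb @ theta)                        # h_theta(x^(i)) for every example
    gradient = Xb.T @ (h - y) / m                  # (1/m) * sum_i (h - y^(i)) x_j^(i)
    theta -= eta * gradient                        # simultaneous update of all theta_j

# Cross-entropy cost J(theta) after training
h = np.clip(sigmoid(Xb @ theta), 1e-12, 1 - 1e-12)
J = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
print("theta:", theta, "cost:", J)
```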
Python Implementation
The model gives probabilities (via predict_proba).
To get the classes, just use log_reg.predict() (see the sketch below)
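The scikit-learn code itself is not reproduced on the slide; a minimal sketch of the intended usage might look like this, with placeholder training arrays X and y:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])  # placeholder feature values
y = np.array([0, 0, 0, 1, 1, 1])                          # placeholder binary labels

log_reg = LogisticRegression()
log_reg.fit(X, y)

X_new = np.array([[1.7], [3.2]])
print(log_reg.predict_proba(X_new))   # estimated probabilities for classes 0 and 1
print(log_reg.predict(X_new))         # hard class predictions (threshold at 0.5)
```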
Example
Let's apply Logistic Regression to the Iris dataset (a sketch follows below).
Iris is a famous dataset that contains the sepal and petal length and width of 150 iris flowers
Three different species: Iris-Setosa, Iris-Versicolor and Iris-Virginica
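A sketch of one common way to set this up as a binary problem; the choice of petal width as the single feature and Iris-Virginica as the positive class is an assumption for illustration:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris["data"][:, 3:]                  # petal width (cm) as the single feature
y = (iris["target"] == 2).astype(int)    # 1 if Iris-Virginica, else 0

log_reg = LogisticRegression()
log_reg.fit(X, y)

print(log_reg.predict_proba([[1.7], [2.5]]))  # P(not Virginica), P(Virginica) per flower
print(log_reg.predict([[1.7], [2.5]]))        # hard class predictions
```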
Iris Dataset
Example
Example
Nonlinear Decision Boundary
Nonlinear Decision Boundary
Nonlinear Decision Boundary
h_θ(x) = g(θ₀ + θ₁x₁ + θ₂x₂)
h_θ(x) = g(θ₀ + θ₁x₁ + θ₂x₂ + θ₃x₁² + θ₄x₂²)
θ = [−1, 0, 0, 1, 1]ᵀ
Predict y = 1 if −1 + x₁² + x₂² ≥ 0,  i.e. x₁² + x₂² ≥ 1
Decision boundary: x₁² + x₂² = 1    (see the sketch below)
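As a sketch, the same effect can be obtained by adding the squared features before fitting a logistic regression; the circular toy data below is an assumption chosen to match the slide's boundary:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data labeled by the circle x1^2 + x2^2 = 1 (the boundary from the slide)
rng = np.random.default_rng(42)
X = rng.uniform(-2, 2, size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 >= 1).astype(int)

# Add the squared (and cross) terms as features, then fit a plain logistic regression
clf = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                    LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.predict([[0.0, 0.0], [1.5, 1.5]]))   # expected: inside the circle -> 0, outside -> 1
```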
Multiclass Classification
What if y ∈ {1, 2, …, k}, i.e. k classes?
Example: weather prediction {sunny, cloudy, snowy, …}
One-vs-all Classifier (one-vs-rest)
Class 1: h_θ^(1)(x)    Class 2: h_θ^(2)(x)    Class 3: h_θ^(3)(x)
[One scatter plot per class: class i is treated as the positive class and the remaining classes as negative]
One-vs-all Classifier (one-vs-rest)
h_θ^(i)(x) = P(y = i | x; θ) for i = 1, 2, 3
One-vs-all:
Train a logistic regression classifier h_θ^(i)(x) for each class i to predict the probability that y = i
On a new input x, to make a prediction, pick the class i that maximizes h_θ^(i)(x)    (a sketch follows below)
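A brief sketch of one-vs-all built directly from K binary logistic regressions; the 3-class toy data comes from make_blobs and is an assumption (scikit-learn's OneVsRestClassifier automates the same idea):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_blobs

# Toy 3-class problem (assumed for illustration)
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Train one binary classifier per class i: positive when y == i, negative otherwise
classifiers = []
for i in np.unique(y):
    clf = LogisticRegression()
    clf.fit(X, (y == i).astype(int))
    classifiers.append(clf)

# Predict by picking the class whose classifier gives the highest P(y = 1 | x)
def predict_ova(X_new):
    probs = np.column_stack([clf.predict_proba(X_new)[:, 1] for clf in classifiers])
    return probs.argmax(axis=1)

print(predict_ova(X[:5]), y[:5])   # predictions vs. true labels for the first few points
```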
Softmax Regression
Multi-class classification
Multinomial Logistic Regression
Generalization of logistic regression
𝑦 ∈ {1,2, … , 𝐾}
Softmax Regression
Given a test input x, estimate the probability P(y = k | x) for each value of k = 1, …, K
Estimate the probability of the class label taking on each of the 𝐾 different
possible values.
Our hypothesis will output a 𝐾-dimensional vector (whose elements sum to
1) giving 𝐾 estimated probabilities.
Softmax Regression: Hypothesis
The hypothesis h_θ(x) takes the form:
h_θ(x) = [ P(y = 1 | x; θ), P(y = 2 | x; θ), …, P(y = K | x; θ) ]ᵀ
       = (1 / Σⱼ₌₁ᴷ exp(θ^(j)ᵀ x)) · [ exp(θ^(1)ᵀ x), exp(θ^(2)ᵀ x), …, exp(θ^(K)ᵀ x) ]ᵀ
θ^(1), θ^(2), …, θ^(K) ∈ ℝⁿ are the parameters of the model
The term 1 / Σⱼ₌₁ᴷ exp(θ^(j)ᵀ x) normalizes the distribution so that it sums to one
θ = [ θ^(1) … θ^(K) ] is an n-by-K matrix whose columns are the θ^(k)    (see the sketch below)
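A small NumPy sketch of this hypothesis; the parameter matrix Theta and the input x below are illustrative values, not from the slides:

```python
import numpy as np

def softmax_hypothesis(Theta, x):
    """Return the K-vector of probabilities P(y = k | x; Theta), k = 1..K.

    Theta is an n-by-K matrix whose columns are theta^(1), ..., theta^(K).
    """
    scores = Theta.T @ x                        # theta^(k)^T x for each class k
    exp_scores = np.exp(scores - scores.max())  # subtract max for numerical stability
    return exp_scores / exp_scores.sum()        # normalize so the entries sum to 1

Theta = np.array([[1.0, 0.5, -1.0],
                  [0.2, -0.3, 0.8]])            # n = 2 features, K = 3 classes (illustrative)
x = np.array([1.0, 2.0])
p = softmax_hypothesis(Theta, x)
print(p, p.sum())                               # K probabilities summing to 1
```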
Softmax Regression: Cost Function
1{·} is the indicator function:
1{true statement} = 1 and 1{false statement} = 0
Minimize the cost function:
J(θ) = − Σᵢ₌₁ᵐ Σₖ₌₁ᴷ 1{y^(i) = k} log( exp(θ^(k)ᵀ x^(i)) / Σⱼ₌₁ᴷ exp(θ^(j)ᵀ x^(i)) )
P(y^(i) = k | x^(i); θ) = exp(θ^(k)ᵀ x^(i)) / Σⱼ₌₁ᴷ exp(θ^(j)ᵀ x^(i))
Softmax Regression
We cannot solve for the minimum of 𝐽(𝜃) analytically
Use an iterative optimization algorithm instead.
Taking derivatives, one can show that the gradient is:
∇_θ^(k) J(θ) = − Σᵢ₌₁ᵐ x^(i) ( 1{y^(i) = k} − P(y^(i) = k | x^(i); θ) )
where P(y^(i) = k | x^(i); θ) = exp(θ^(k)ᵀ x^(i)) / Σⱼ₌₁ᴷ exp(θ^(j)ᵀ x^(i))    (a sketch follows below)
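A compact sketch of this cost and gradient in NumPy, with a plain gradient-descent loop; the toy data, learning rate, and iteration count are assumptions:

```python
import numpy as np

def softmax_probs(X, Theta):
    """Row-wise softmax of the scores theta^(k)^T x^(i)."""
    scores = X @ Theta                                  # m x K score matrix
    scores -= scores.max(axis=1, keepdims=True)         # for numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def cost_and_grad(X, y, Theta):
    K = Theta.shape[1]
    P = softmax_probs(X, Theta)                         # P(y^(i) = k | x^(i); Theta)
    Y = np.eye(K)[y]                                    # one-hot rows encode 1{y^(i) = k}
    J = -np.sum(Y * np.log(P))                          # cost function from the slide
    grad = -X.T @ (Y - P)                               # columns are the gradients wrt theta^(k)
    return J, grad

# Toy 3-class data (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int) + (X[:, 0] - X[:, 1] > 1).astype(int)  # labels 0, 1, 2

Theta = np.zeros((X.shape[1], 3))
for _ in range(1000):
    J, grad = cost_and_grad(X, y, Theta)
    Theta -= 0.005 * grad                               # plain gradient descent step
print("final cost:", J)
```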
Example: Iris dataset
Python Implementation
Scikit-Learn's LogisticRegression uses one-versus-all by default
Switch to Softmax Regression by setting multi_class="multinomial"    (see the sketch below)
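A sketch of that switch on the Iris data. Note that the behavior depends on the scikit-learn version: older releases defaulted to one-vs-rest and required multi_class="multinomial", while newer releases select the multinomial (softmax) formulation automatically for multiclass data; the feature choice and C value here are ours:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]     # petal length and petal width
y = iris["target"]              # three classes: 0, 1, 2

# On older scikit-learn versions, pass multi_class="multinomial" explicitly:
#   LogisticRegression(multi_class="multinomial", solver="lbfgs", C=10)
softmax_reg = LogisticRegression(C=10, max_iter=1000)
softmax_reg.fit(X, y)

print(softmax_reg.predict([[5.0, 2.0]]))        # predicted class
print(softmax_reg.predict_proba([[5.0, 2.0]]))  # K probabilities summing to 1
```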
Example: MNIST dataset
70,000 images of digits handwritten by high school students and employees of the US Census Bureau
Each image is 28x28 pixels (hence has 784 features)
Each feature simply represents one pixel's intensity, from 0 (white) to 255 (black)
It is already split into a training set (the first 60,000 images) and a test set (the last 10,000 images)    (a loading sketch follows below)
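A sketch of how this dataset is commonly loaded and split; fetch_openml with the "mnist_784" name is one common route (the download takes a while on first use and is cached afterwards):

```python
from sklearn.datasets import fetch_openml

# Download MNIST from OpenML (cached locally after the first call)
mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X, y = mnist["data"], mnist["target"].astype(int)

print(X.shape, y.shape)          # (70000, 784): each image flattened to 784 pixel intensities

# The dataset is already ordered: first 60000 images for training, last 10000 for testing
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]
```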
Example: MNIST dataset
Example: MNIST dataset
Learned Weights
Example: MNIST dataset
Illustration of the decrease of the cost function over time
Training Accuracy: 0.9209666666666667
Test Accuracy: 0.9206
Question
Suppose you train a logistic regression classifier and your hypothesis function H is
Which of the following figures represents the decision boundary given by the above classifier?
Question
Imagine you are given the graph below for logistic regression, which shows the relationship between the cost function and the number of iterations for 3 different learning-rate values (different colors show the curves for the different learning rates).
Suppose you save the graph for future reference but forget to save the values of the different learning rates for this graph. Now you want to find out the relation between the learning-rate values of these curves. Which of the following is the true relation?
Note:
1. The learning rate for blue is l1
2. The learning rate for red is l2
3. The learning rate for green is l3
A) l1>l2>l3
B) l1 = l2 = l3
C) l1 < l2 < l3
D) None of these
Question
Suppose you are dealing with a 4-class classification problem and you want to train a logistic regression model on the data; for that, you are using the one-vs-all method.
How many logistic regression models do we need to fit for a 4-class classification problem?