DASC7606-1B
Deep Learning
Linear Models
Dr Bethany Chan
Professor Francis Chin
2023
1
Outline
• Supervised learning, classification &
regression problems, linear models
• Linear regression and gradient descent
• Logistic regression and classification
2
How did we learn in school?
Asked Questions
given Answers
Questions & Answers
→ Practice exercises (training data)
→ Mock exams (validation)
→ Final exams (testing)
Supervised Learning
3
Classification problems (discrete answers)
• Example 1: Given an image, determine whether the image is a dog or not
• Example 2: Given a loan applicant, approve or deny the loan
      salary                 150,000
      current debt           75,000
      age                    28 years old
      years in current job   3
      …                      …
• Example 3: Given an image of a handwritten digit, determine which digit it is
Binary vs. multi-class classification
4
Regression problems (continuous answers)
• Example 1: Given the car camera view, predict how much you
should turn the steering wheel
• Example 2: Given the features of a flat (e.g. size, number of
bedrooms, number of bathrooms, age of building), predict the
rent of the flat
      1st feature    2nd feature      3rd feature       4th feature            label
      Size (sq.ft.)  No. of Bedrooms  No. of Bathrooms  Age of Building (yrs)  Rent per Month ($)
x1    1700           4                3                 10                     70k
x2    1420           3                2                 12                     54k
x3    1290           4                1.5               8                      45k
x4    880            2                2                 2                      40k
x5    510            2                2                 3                      26.5k
5
Learning Model for supervised learning
System (unknown function) f: X → Y
Questions (x1, x2, … , xM) → Answers (y1, y2, … , yM)

Training Data Set: (x1, y1), (x2, y2), … , (xM, yM)
Past data → Known answers, e.g.
  Picture → Dog or not dog
  Car camera view → Turn right 15%
  … → …

Learning Algorithm → Trained machine (hypothesis function h ≈ f)
New Question → Trained machine → Predicted Answer
6
Perspective on Classification vs. Regression
7
Linear models for supervised learning
System (unknown function) f: X → Y
Questions (x1, x2, … , xM) → Answers (y1, y2, … , yM)

Training Data Set: (x1, y1), (x2, y2), … , (xM, yM)

Learning Algorithm: learn coefficients or weights θ of a line or
hyperplane to minimize error

New Question → Trained machine (hypothesis function h ≈ f) → Predicted Answer
8
Outline
• Supervised learning, classification &
regression problems, linear models
• Linear regression and gradient descent
• Logistic regression and classification
9
Linear Regression
Predict h(x) given x = (x_1, … , x_N)
Find hyperplane or line
  h(x) = θ_0 + θ_1 x_1 + ⋯ + θ_N x_N
  [or h_θ(x) = θx with x_0 = 1]
to minimize error.

Loss function vs. cost function:
A loss function is for a single training example (sometimes called error function).
A cost function J(θ) is the average loss over the entire training dataset, which is to be minimized.

[figure: data points and a fitted line h(x) plotted against x]
10
Linear Regression cost functions
  x      y
  1.00   1.00
  2.00   2.00
  3.00   1.30
  4.00   3.75
  5.00   2.25

[figure: the data points (x, y) together with a candidate line h_θ(x)]

Mean square error (norm 2):
  J(θ) = (1/M) Σ_{i=1}^{M} (h(x_i) − y_i)²

Absolute Error Loss (norm 1):
  J(θ) = (1/M) Σ_{i=1}^{M} |h(x_i) − y_i|
11
Look at cost J(θ) = (1/M) Σ_{i=1}^{M} (h(x_i) − y_i)²  (norm 2)
where h(x_i) = θ_0 + θ_1 x_i = 0.785 + 0.425 x_i

  x      y      h(x)     h(x) − y   (h(x) − y)²
  1.00   1.00   1.210     0.210      0.044
  2.00   2.00   1.635    −0.365      0.133
  3.00   1.30   2.060     0.760      0.578
  4.00   3.75   2.485    −1.265      1.600
  5.00   2.25   2.910     0.660      0.436

[figure: cost surface J plotted over (θ_0, θ_1)]
12
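As a sanity check on the table above, here is a minimal NumPy sketch (variable names are illustrative, not from the slides) that recomputes h(x), the residuals, and the norm-2 and norm-1 costs for θ_0 = 0.785, θ_1 = 0.425.

```python
import numpy as np

# Toy dataset from the slide
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.0, 1.3, 3.75, 2.25])

# Candidate parameters from the worked example
theta0, theta1 = 0.785, 0.425

h = theta0 + theta1 * x           # hypothesis h(x) = theta_0 + theta_1 * x
residuals = h - y                 # the h(x) - y column of the table

mse = np.mean(residuals ** 2)     # norm-2 cost J(theta)
mae = np.mean(np.abs(residuals))  # norm-1 (absolute error) cost

print(h)          # [1.21  1.635 2.06  2.485 2.91 ]
print(residuals)  # [ 0.21  -0.365  0.76  -1.265  0.66 ]
print(mse, mae)
```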
Iterative way to learn 𝜽
Make an initial guess for 𝜽
Calculate the error J(θ)
Repeat until error is small enough:
  make a better guess for θ
  calculate the error J(θ)
Return θ
Technique: Gradient Descent
13
Gradient Descent (one feature)
Objective is to minimize cost function:
  J(θ_0, θ_1) = (1/M) Σ_{i=1}^{M} (h_θ(x_i) − y_i)²

[figure: cost surface with a random initial guess of (θ_0, θ_1)]
14
[figure: cost surface; a random guess of (θ_0, θ_1), where the gradient is the slope of the tangent at (θ_0, θ_1)]
Move in the opposite direction of the gradient (steepest descent)
15
[figure: 1-D cost curve J(θ) with its tangent line at θ; the update step is −αΔJ(θ), where α is the learning rate]
Move in the opposite direction of the gradient (steepest descent).
Step size is determined by the gradient magnitude ΔJ(θ):
larger step size for a steeper tangent line.
Small step when close to the minimum, where ΔJ(θ) is small
as the tangent line is almost horizontal.
16
[figure: repeated gradient steps on the cost surface, starting from a random guess of (θ_0, θ_1); the gradient is the slope of the tangent at (θ_0, θ_1)]
Make steps repeatedly.
Move in the opposite direction of the gradient (steepest descent)
17
Gradient Descent (one feature)
Make steps of size α (learning rate) down the cost
function J in the direction of steepest descent
(as determined by the slope of the tangent at (θ_0, θ_1))

Repeat until error is small enough:
  θ_0 ← θ_0 − α ∂J(θ_0, θ_1)/∂θ_0
  θ_1 ← θ_1 − α ∂J(θ_0, θ_1)/∂θ_1
18
Cost Function for Gradient Descent
(one variable)
Mean-square error:
  J(θ_0, θ_1) = (1/M) Σ_{i=1}^{M} (h_θ(x_i) − y_i)²

Half mean-square error (the factor ½ cancels the 2 that appears when differentiating):
  J(θ_0, θ_1) = (1/(2M)) Σ_{i=1}^{M} (h_θ(x_i) − y_i)²
19
Taking derivatives (one variable)
J(θ_0, θ_1) = (1/(2M)) Σ_{i=1}^{M} (h_θ(x_i) − y_i)²   and   h_θ(x_i) = θ_0 + θ_1 x_i

∂J(θ_0, θ_1)/∂θ_0 = (1/M) Σ_{i=1}^{M} (h_θ(x_i) − y_i)

∂J(θ_0, θ_1)/∂θ_1 = (1/M) Σ_{i=1}^{M} (h_θ(x_i) − y_i) x_i

Repeat until error is small enough:
  θ_0 ← θ_0 − α ∂J(θ_0, θ_1)/∂θ_0 = θ_0 − α (1/M) Σ_{i=1}^{M} (h_θ(x_i) − y_i)
  θ_1 ← θ_1 − α ∂J(θ_0, θ_1)/∂θ_1 = θ_1 − α (1/M) Σ_{i=1}^{M} (h_θ(x_i) − y_i) x_i
20
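Putting the two update rules into code: below is a minimal, illustrative gradient-descent loop for the one-feature case. The function name, initial guess, stopping tolerance and learning rate are my own choices, not from the slides; the stopping rule (change in J(θ) below a threshold) anticipates a later slide.

```python
import numpy as np

def gradient_descent_1d(x, y, alpha=0.01, max_iters=10_000, tol=1e-9):
    """Fit h(x) = theta0 + theta1*x by gradient descent on the half mean-square error."""
    M = len(x)
    theta0, theta1 = 0.0, 0.0                 # initial guess for theta
    prev_cost = np.inf
    for _ in range(max_iters):
        h = theta0 + theta1 * x               # current predictions h_theta(x_i)
        error = h - y                         # h_theta(x_i) - y_i
        cost = np.sum(error ** 2) / (2 * M)
        if abs(prev_cost - cost) < tol:       # stop when the error barely changes
            break
        prev_cost = cost
        # simultaneous update of both parameters
        theta0 -= alpha * np.sum(error) / M
        theta1 -= alpha * np.sum(error * x) / M
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.0, 1.3, 3.75, 2.25])
print(gradient_descent_1d(x, y))              # roughly (0.785, 0.425), the line used earlier
```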
Problem: Learning Rate
Small learning rate:
  - Many iterations till convergence
  - Trapped in local minimum
Large learning rate:
  - Overshooting
  - No convergence
21
Learning Rate and No. of Iterations
• Plot cost J(θ) (y-axis) against no. of iterations (x-axis)
• If J(θ) increases, then decrease the learning rate α
• Stop when ΔJ(θ) is smaller than a chosen threshold
22
Gradient Descent (multiple features)
Repeat until error is small enough:
  θ_0 ← θ_0 − α (1/M) Σ_{i=1}^{M} (h_θ(x_i) − y_i) x_{i,0}
  θ_1 ← θ_1 − α (1/M) Σ_{i=1}^{M} (h_θ(x_i) − y_i) x_{i,1}
  θ_2 ← θ_2 − α (1/M) Σ_{i=1}^{M} (h_θ(x_i) − y_i) x_{i,2}
  …
  θ_N ← θ_N − α (1/M) Σ_{i=1}^{M} (h_θ(x_i) − y_i) x_{i,N}

⇒ Repeat until error is small enough:
  θ_j ← θ_j − α (1/M) Σ_{i=1}^{M} (h_θ(x_i) − y_i) x_{i,j}   for j = 0, …, N
23
Multiple feature example
      1st feature    2nd feature      3rd feature       4th feature            label
      Size (sq.ft.)  No. of Bedrooms  No. of Bathrooms  Age of Building (yrs)  Rent per Month ($)
x1    1700           4                3                 10                     70k
x2    1420           3                2                 12                     54k
x3    1290           4                1.5               8                      45k
x4    880            2                2                 2                      40k
x5    510            2                2                 3                      26.5k

x_{i,j} = j-th feature of the i-th example, with x_{i,0} = 1
y_i = label associated with the i-th example
x_1 = (x_{1,0}, x_{1,1}, x_{1,2}, x_{1,3}, x_{1,4}) = (1, 1700, 4, 3, 10)
24
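To use these updates in code, the examples are usually stacked into a matrix with the constant feature x_{i,0} = 1 prepended. A small sketch with the rent data (the array names are my own):

```python
import numpy as np

# One row per flat; columns are the 4 features from the table
features = np.array([
    [1700, 4, 3.0, 10],   # x1
    [1420, 3, 2.0, 12],   # x2
    [1290, 4, 1.5,  8],   # x3
    [ 880, 2, 2.0,  2],   # x4
    [ 510, 2, 2.0,  3],   # x5
])
y = np.array([70_000, 54_000, 45_000, 40_000, 26_500])   # rent per month ($)

# Prepend the constant feature x_{i,0} = 1 so that theta_0 acts as the intercept
X = np.hstack([np.ones((features.shape[0], 1)), features])

print(X[0])   # the vector x_1 = (1, 1700, 4, 3, 10) from the slide
```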
Feature Scaling
• Note that the value ranges of different features may
be very different, e.g., size of flat is in hundreds or
thousands sq. ft., age of flat is in tens.
• As the same α is used across all N features (variables), we would like the
  input values to be roughly in the same range
  ⇒ feature scaling or mean normalization:
    x_{i,j} ← (x_{i,j} − mean_j) / (max_j − min_j)
25
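A minimal sketch of this normalization, assuming the design matrix X from the previous sketch (its leading column of ones is left untouched):

```python
import numpy as np

def mean_normalize(X):
    """x <- (x - mean) / (max - min), applied column-wise to every feature
    except the constant x_{i,0} = 1 column."""
    X = X.astype(float).copy()
    cols = X[:, 1:]
    col_mean = cols.mean(axis=0)
    col_range = cols.max(axis=0) - cols.min(axis=0)
    X[:, 1:] = (cols - col_mean) / col_range
    return X, col_mean, col_range

X_scaled, mu, rng = mean_normalize(X)    # X from the previous sketch
print(X_scaled[:, 1:].mean(axis=0))      # scaled feature columns now have mean ~0
```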
Gradient Descent (multiple variables)
Make steps of size α (learning rate) down the cost
function J in the direction of steepest descent
(as determined by the slope of the tangent at (θ_0, θ_1, …, θ_N))

Repeat until error is small enough:
  θ_j ← θ_j − α (1/M) Σ_{i=1}^{M} (h_θ(x_i) − y_i) x_{i,j}   for j = 0, …, N

  [or θ ← θ − (α/M) Xᵀ(Xθ − y) in vectorized form]
26
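The vectorized update translates almost line-for-line into NumPy. This sketch assumes the scaled design matrix X_scaled and labels y from the previous sketches; the learning rate and iteration count are arbitrary choices.

```python
import numpy as np

def linear_regression_gd(X, y, alpha=0.1, num_iters=5000):
    """Vectorized gradient descent: theta <- theta - (alpha/M) * X^T (X theta - y)."""
    M, num_params = X.shape
    theta = np.zeros(num_params)
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / M
        theta -= alpha * gradient
    return theta

theta = linear_regression_gd(X_scaled, y)   # X_scaled, y from the previous sketches
predictions = X_scaled @ theta              # h_theta(x_i) for every training example
```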
Outline
• Supervised learning, classification &
regression problems, linear models
• Linear regression and gradient descent
• Logistic regression and classification
27
Logistic Regression
Predict the probability of an event occurring
(e.g. prediction of a heart attack)
Prediction: h(x) = σ(s) = σ(θx)
h : ℝ^N → [0, 1], interpreted as a probability
θx gives a sort of "risk score" that gets passed
through the sigmoid function (a.k.a. logistic
function) σ in order to determine the probability of
the event (e.g. heart attack)
28
Logistic Function (sigmoid)
σ(s) = e^s / (1 + e^s) = 1 / (1 + e^{−s})

[figure: sigmoid curve; σ(s) → 1 as s → ∞, σ(s) → 0 as s → −∞, and σ(0) = 0.5]
29
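A small sketch of the logistic function. The two algebraically equivalent forms above are used on different branches so that exp() never overflows; this numerical detail is my addition, not something on the slide.

```python
import numpy as np

def sigmoid(s):
    """sigma(s) = e^s / (1 + e^s) = 1 / (1 + e^{-s})."""
    s = np.asarray(s, dtype=float)
    out = np.empty_like(s)
    pos = s >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-s[pos]))      # safe for large positive s
    exp_s = np.exp(s[~pos])
    out[~pos] = exp_s / (1.0 + exp_s)             # safe for large negative s
    return out

print(sigmoid(np.array([-5.0, 0.0, 5.0])))        # [0.0067 0.5    0.9933]
```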
Logistic Regression and classification
σ(s) = e^s / (1 + e^s) = 1 / (1 + e^{−s})

For classification:
  If the output > 0.5 (50% probability), predict y = 1 (positive class).
  If the output < 0.5, predict y = 0 (negative class).
30
Logistic Regression Loss Function
  y = 1 and h(x) ≈ 1: loss is 0        y = 1 and h(x) ≈ 0: loss is very high
  y = 0 and h(x) ≈ 0: loss is 0        y = 0 and h(x) ≈ 1: loss is very high
(cross entropy)

• To predict positive class (y = 1), loss = − log(h(x))
• To predict negative class (y = 0), loss = − log(1 − h(x))

[figure: loss curves against h(x): − log(h(x)) for the positive class (y = 1) and − log(1 − h(x)) for the negative class (y = 0)]
Logistic Regression Loss Function
J(h_θ(x), y) = − log(h_θ(x))       if y = 1
               − log(1 − h_θ(x))   if y = 0

J(h_θ(x), y) = −y log(h_θ(x)) − (1 − y) log(1 − h_θ(x))
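The single-formula version of the loss maps directly to code. A minimal sketch (the eps clamp is my addition, to keep log() finite when h is exactly 0 or 1):

```python
import numpy as np

def logistic_loss(h, y, eps=1e-12):
    """Per-example loss: -y*log(h) - (1-y)*log(1-h)."""
    h = np.clip(h, eps, 1 - eps)
    return -(y * np.log(h) + (1 - y) * np.log(1 - h))

print(logistic_loss(np.array([0.9, 0.1]), np.array([1, 0])))   # confident and correct: small losses
print(logistic_loss(np.array([0.1, 0.9]), np.array([1, 0])))   # confident and wrong: large losses
```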
Gradient Descent (Logistic Regression)
J(θ) = −(1/M) Σ_{i=1}^{M} [ y_i log(h(x_i)) + (1 − y_i) log(1 − h(x_i)) ]

∂J(θ)/∂θ_j = −(1/M) Σ_{i=1}^{M} ∂/∂θ_j [ y_i log(h(x_i)) + (1 − y_i) log(1 − h(x_i)) ]
           = −(1/M) Σ_{i=1}^{M} ∂J_i(θ)/∂θ_j,
             where J_i(θ) = y_i log(h(x_i)) + (1 − y_i) log(1 − h(x_i))

where h(x_i) = σ(s_i);  σ(s_i) = 1 / (1 + e^{−s_i});
s_i = x_i θ = θ_0 x_{i,0} + … + θ_j x_{i,j} + … + θ_N x_{i,N};  N = dimension

∂J_i(θ)/∂θ_j = (∂J_i/∂σ(x_i)) · (∂σ(x_i)/∂s_i) · (∂s_i/∂θ_j)    (chain rule)

∂J_i/∂σ(x_i) = y_i/σ(x_i) − (1 − y_i)/(1 − σ(x_i))        [ using d(log q)/dq = 1/q ]
             = ( y_i − σ(x_i) ) / ( σ(x_i)(1 − σ(x_i)) )
33
Gradient Descent (Logistic Regression)
J_i(θ) = y_i log(h(x_i)) + (1 − y_i) log(1 − h(x_i))
where h(x_i) = σ(s_i),  σ(s_i) = 1 / (1 + e^{−s_i})
and s_i = x_i θ = θ_0 x_{i,0} + … + θ_j x_{i,j} + … + θ_N x_{i,N}

∂J_i(θ)/∂θ_j = (∂J_i/∂σ(x_i)) · (∂σ(x_i)/∂s_i) · (∂s_i/∂θ_j)    (chain rule)

1) ∂J_i/∂σ(x_i) = ( y_i − σ(x_i) ) / ( σ(x_i)(1 − σ(x_i)) )

2) ∂σ(x_i)/∂s_i = e^{−s_i} / (1 + e^{−s_i})²
                = (1 / (1 + e^{−s_i})) · (1 − 1 / (1 + e^{−s_i})) = σ(s_i)(1 − σ(s_i))

3) ∂s_i/∂θ_j = x_{i,j}

Multiplying the three factors, and using h(x_i) = σ(s_i):
  ∂J_i(θ)/∂θ_j = ( y_i − h(x_i) ) x_{i,j}

so ∂J(θ)/∂θ_j = −(1/M) Σ_{i=1}^{M} ( y_i − h(x_i) ) x_{i,j}

Compare Linear Regression:
  J(θ) = (1/(2M)) Σ_{i=1}^{M} (h(x_i) − y_i)²
  ∂J(θ)/∂θ_j = (1/M) Σ_{i=1}^{M} (h(x_i) − y_i) x_{i,j}

Repeat until error is small:
  θ_j ← θ_j − α ∂J(θ)/∂θ_j    for j = 0, …, N

Same equation as Linear Regression.
34
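Since the gradient has the same form as in linear regression, the training loop is nearly identical; only the hypothesis changes to σ(Xθ). A minimal sketch on a made-up, linearly separable toy set (the data, learning rate and iteration count are illustrative choices, not from the slides):

```python
import numpy as np

def logistic_regression_gd(X, y, alpha=0.1, num_iters=5000):
    """Gradient descent for logistic regression:
    theta <- theta - (alpha/M) * X^T (h - y), with h = sigma(X theta)."""
    M, num_params = X.shape
    theta = np.zeros(num_params)
    for _ in range(num_iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))    # sigma(x_i theta) for every example
        theta -= alpha * X.T @ (h - y) / M
    return theta

# Toy data (not from the slides): a bias column plus one feature
X_toy = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y_toy = np.array([0.0, 0.0, 1.0, 1.0])
theta = logistic_regression_gd(X_toy, y_toy)
probs = 1.0 / (1.0 + np.exp(-(X_toy @ theta)))
print(probs > 0.5)                                # [False False  True  True]
```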
Logistic Function and classification
σ(s) = e^s / (1 + e^s) = 1 / (1 + e^{−s})

For classification:
[figure: sigmoid curve with a decision threshold; predict y = 1 above the threshold, predict y = 0 below it]
35
Outcomes of Binary Classification
predict 1 if f(x) > threshold
predict 0 otherwise
Actual 0 Actual 1
Predicted 0 true negative false negative
Predicted 1 false positive true positive
• True positives:
data points predicted as positive that are actually positive
• False positives:
data points predicted as positive that are actually negative
• True negatives:
data points predicted as negative that are actually negative
• False negatives:
data points predicted as negative that are actually positive
Accuracy of Predictions
predict 1 if f(x) > threshold
predict 0 otherwise
Actual 0 Actual 1
Predicted 0 true negative false negative
Predicted 1 false positive true positive
precision means how "accurate" the answer is:
  precision = true positives / predicted positives
            = true positives / (false positives + true positives)

recall means how "good" the answer is, i.e., the ability to find the correct ones:
  recall = true positives / actual positives
         = true positives / (false negatives + true positives)
37
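A minimal sketch that counts the outcomes and computes the two ratios; the labels and predictions below are made up purely to exercise the formulas.

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # hypothetical actual labels
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical predictions
print(precision_recall(y_true, y_pred))   # (0.75, 0.75)
```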
Accuracy of Predictions
predict 1 if f(x) > threshold
predict 0 otherwise
Actual 0 Actual 1
Predicted 0 true negative false negative
Predicted 1 false positive true positive
precision = true positives / predicted positives
          = true positives / (false positives + true positives)

↑ threshold ⇒ ↑ precision
↓ threshold ⇒ ↓ precision
38
Accuracy of Predictions
predict 1 if f(x) > threshold
predict 0 otherwise
Actual 0 Actual 1
Predicted 0 true negative false negative
Predicted 1 false positive true positive
recall = true positives / actual positives
       = true positives / (false negatives + true positives)

↑ threshold ⇒ ↑ precision, ↓ recall
↓ threshold ⇒ ↓ precision, ↑ recall
39
Combining Precision and Recall
• Ideally both precision and recall are 1
• With different thresholds, we can trade higher
  precision for higher recall, or vice versa
• Depending on applications
– Disease screening - higher recall
– Prosecution of criminals – higher precision
– Identify terrorists - both
  F1 = 2 · (precision · recall) / (precision + recall)
• Harmonic mean instead of simple average to
penalize extreme values
40
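A tiny sketch of F1, compared against the simple average, to show why the harmonic mean penalizes an extreme precision/recall imbalance (the example numbers are invented):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Extreme imbalance: perfect precision but almost no recall
print(f1_score(1.0, 0.01))     # ~0.0198, dominated by the weaker score
print((1.0 + 0.01) / 2)        # 0.505, the simple average hides the weakness
```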
Logistic Regression, diagrammatically
Prediction: h(x) = σ(s) = σ(θx)

[diagram: inputs 1, x_1, …, x_N are weighted by θ_0, θ_1, …, θ_N and summed (Σ) to give s,
which is passed through σ to produce the output h(x)]
41
Multi-classification example
MNIST (Mixed National Institute of Standards and Technology) database
42
Multi-Classification
[diagram: K raw scores are passed through Softmax to give Pr(Class 1), Pr(Class 2), …, Pr(Class K)]

Pr(Class k) = probability of Class k being the correct class
The class with the highest probability is the predicted class

Softmax function: not only normalizes a set of scores to
numbers in [0, 1] but also makes sure that the
numbers all add up to 1
Sigmoid vs. Softmax
For sigmoid (2-class):
  Class 1 probability: σ(s) = 1 / (1 + e^{−s})
  Class 2 probability: 1 − σ(s) = e^{−s} / (1 + e^{−s}) = 1 / (1 + e^{s})

For softmax (K-class):
  Given scores s_1, s_2, …, s_K
  Class k probability: e^{s_k} / Σ_j e^{s_j}

For softmax (2-class):
  Class 1 probability: e^{s_1} / (e^{s_1} + e^{s_2}) = 1 / (1 + e^{s_2 − s_1})
  Class 2 probability: e^{s_2} / (e^{s_1} + e^{s_2}) = 1 / (1 + e^{s_1 − s_2})
44
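A quick numerical check of the 2-class equivalence above (the max-subtraction inside softmax is a standard stability trick, my addition rather than something on the slide):

```python
import numpy as np

def softmax(scores):
    z = np.asarray(scores, dtype=float)
    e = np.exp(z - z.max())          # subtracting the max does not change the result
    return e / e.sum()

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

s1, s2 = 2.0, 0.5                    # arbitrary pair of class scores
print(softmax([s1, s2])[0])          # class-1 probability: 1 / (1 + e^{s2 - s1})
print(sigmoid(s1 - s2))              # same value via the sigmoid
```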
• Question: Why do we have to pass each value
through an exponential before normalizing them?
Why can’t we just normalize the values themselves?
• Answer: This is because the goal of softmax is to
make sure one value is very high (close to 1) and
all other values are very low (close to 0).
• Softmax uses exponential to make sure this happens.
45
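To see the effect described above, compare softmax with plain normalization on a made-up score vector (the numbers are illustrative only):

```python
import numpy as np

def softmax(scores):
    e = np.exp(np.asarray(scores, dtype=float))
    return e / e.sum()

scores = np.array([1.0, 5.0, 2.0])       # hypothetical raw class scores

print(scores / scores.sum())             # plain normalization: [0.125 0.625 0.25 ]
print(softmax(scores))                   # softmax: ~[0.017 0.936 0.047]
# The exponential exaggerates the gaps, pushing the winner toward 1 and the rest toward 0.
# Plain normalization also breaks down when scores are negative or sum to zero.
```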
Softmax
[figure: example scores s (0.3 and 2.1) are passed through Softmax to give probabilities h(x)]
Loss function for multi-classification
Categorical Cross Entropy loss:
  Loss(h(x), y) = − Σ_k y_k log(h(x)_k)

[figure: loss = − log(h(x)) curve for the positive class (y = 1), plotted against h(x)]

Ex:  Predicted h(x) = [0.94, 0.01, 0.05],  label y = [0, 1, 0]
Loss = −[0 · log(0.94) + 1 · log(0.01) + 0 · log(0.05)] = − log(0.01)
46
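A minimal sketch that reproduces the worked example (the function name is my own):

```python
import numpy as np

def categorical_cross_entropy(h, y):
    """Loss = -sum_k y_k * log(h_k) for predicted probabilities h and a one-hot label y."""
    h, y = np.asarray(h, dtype=float), np.asarray(y, dtype=float)
    return -np.sum(y * np.log(h))

h = [0.94, 0.01, 0.05]   # predicted h(x) from the slide
y = [0, 1, 0]            # one-hot label: the second class is correct
print(categorical_cross_entropy(h, y))   # 4.605... = -log(0.01)
```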
Another Loss Function: Hinge Loss
• Probability for the correct class (p_{i,c_i}) should be
  > the other probabilities by at least margin Δ
• For instance i: L_i = Σ_{j≠c_i} max(0, p_{i,j} − p_{i,c_i} + Δ)
• Cost function = Σ_{i=1}^{M} L_i

Ex:  Predicted h(x) = [0.25, 0.35, 0.4],  label y = [0, 0, 1]
With Δ = 0.1:
  L_i = max(0, 0.25 − 0.4 + Δ) + max(0, 0.35 − 0.4 + Δ)
      = 0 + 0.05 = 0.05
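A minimal sketch of the per-instance hinge loss, reproducing the example above (the function name and argument order are my own choices):

```python
import numpy as np

def hinge_loss(p, correct_class, delta=0.1):
    """L_i = sum over j != c_i of max(0, p_j - p_{c_i} + delta)."""
    p = np.asarray(p, dtype=float)
    margins = np.maximum(0.0, p - p[correct_class] + delta)
    margins[correct_class] = 0.0       # the j = c_i term is excluded from the sum
    return margins.sum()

print(hinge_loss([0.25, 0.35, 0.4], correct_class=2, delta=0.1))   # 0.05 (up to rounding)
```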