Classification
Motivations
FALL 2024: WEEK 3
Classification
Question                          Answer $y$
Is this email spam?               no / yes
Is the transaction fraudulent?    no / yes
Is the tumor malignant?           no / yes

Here $y$ can take only one of two values, false or true, coded as 0 or 1. This is "binary classification": 0 is the "negative class" and 1 is the "positive class".
[Figure: training examples plotted as malignant? (0 = no, 1 = yes) against tumor size $x$ (diameter in cm); benign tumors cluster at small sizes, malignant at larger sizes.]
A first attempt: fit linear regression, $f_{w,b}(x) = wx + b$, and threshold its output at 0.5.

[Figure: the linear fit over the tumor-size data, with the 0.5 threshold marked on the malignant? axis (0 = no, 1 = yes).]

if $f_{w,b}(x) < 0.5$, predict $\hat{y} = 0$
if $f_{w,b}(x) \geq 0.5$, predict $\hat{y} = 1$
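To make the thresholding idea concrete, here is a minimal NumPy sketch (the toy data and variable names are mine, not from the lecture): fit a least-squares line to the 0/1 labels, then predict 1 wherever the line's output reaches 0.5.

```python
import numpy as np

# Hypothetical toy data: tumor size in cm, label 1 = malignant
x = np.array([1.0, 2.0, 3.0, 5.0, 6.0, 7.0])
y = np.array([0, 0, 0, 1, 1, 1])

# Least-squares fit of f(x) = w*x + b to the 0/1 labels
w, b = np.polyfit(x, y, deg=1)

# Threshold the linear output at 0.5 to get class predictions
f = w * x + b
y_hat = (f >= 0.5).astype(int)
print(y_hat)  # [0 0 0 1 1 1] on this easy, well-separated toy set
```

This works here, but a single large-tumor outlier would drag the fitted line and move the 0.5 crossing, which is one motivation for logistic regression below.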
Classification
Logistic Regression
Want outputs between 0 and 1.

[Figure: left, the malignant? data with the 0.5 threshold; right, the sigmoid (logistic) function $g(z)$ plotted for $z$ from $-3$ to $3$, rising from 0 toward 1 and passing through $g(0) = 0.5$.]

Sigmoid (logistic) function:

$g(z) = \dfrac{1}{1 + e^{-z}}, \qquad 0 < g(z) < 1$
To get outputs between 0 and 1, pass the linear model through the sigmoid:

$z = \mathbf{w} \cdot \mathbf{x} + b$

$g(z) = \dfrac{1}{1 + e^{-z}}, \qquad 0 < g(z) < 1$

$f_{\mathbf{w},b}(\mathbf{x}) = g(\mathbf{w} \cdot \mathbf{x} + b) = \dfrac{1}{1 + e^{-(\mathbf{w} \cdot \mathbf{x} + b)}}$

This model is "logistic regression".
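A minimal NumPy sketch of the sigmoid and the logistic model defined above (the function names `sigmoid` and `predict_proba` are my own):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """f_{w,b}(x) = g(w . x + b), computed for each row of X."""
    return sigmoid(X @ w + b)

# Key values of the sigmoid
print(sigmoid(0.0))                    # 0.5
print(sigmoid(np.array([-3.0, 3.0])))  # ~[0.047 0.953]
```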
Interpretation of logistic regression output

$f_{\mathbf{w},b}(\mathbf{x}) = \dfrac{1}{1 + e^{-(\mathbf{w} \cdot \mathbf{x} + b)}} = P(y = 1 \mid \mathbf{x}; \mathbf{w}, b)$

the "probability" that the class is 1: the probability that $y$ is 1, given input $\mathbf{x}$ and parameters $\mathbf{w}, b$.

Example: $x$ is "tumor size" and $y$ is 0 (not malignant) or 1 (malignant). Since $P(y = 0) + P(y = 1) = 1$, an output $f_{\mathbf{w},b}(\mathbf{x}) = 0.7$ means a 70% chance that $y$ is 1 (and hence a 30% chance that $y$ is 0).
Classification
Decision Boundary
$f_{\mathbf{w},b}(\mathbf{x}) = g(\mathbf{w} \cdot \mathbf{x} + b) = \dfrac{1}{1 + e^{-(\mathbf{w} \cdot \mathbf{x} + b)}} = P(y = 1 \mid \mathbf{x}; \mathbf{w}, b)$

[Figure: the sigmoid $g(z)$ with the 0.5 level crossing at $z = 0$.]

Is $f_{\mathbf{w},b}(\mathbf{x}) \geq 0.5$? Yes: $\hat{y} = 1$. No: $\hat{y} = 0$.

When is $f_{\mathbf{w},b}(\mathbf{x}) \geq 0.5$? Since $g(z) = \dfrac{1}{1 + e^{-z}}$ with $z = \mathbf{w} \cdot \mathbf{x} + b$:

$g(z) \geq 0.5$ exactly when $z \geq 0$, i.e. $\mathbf{w} \cdot \mathbf{x} + b \geq 0 \;\Rightarrow\; \hat{y} = 1$
$g(z) < 0.5$ exactly when $z < 0$, i.e. $\mathbf{w} \cdot \mathbf{x} + b < 0 \;\Rightarrow\; \hat{y} = 0$
Decision boundary

$f_{\mathbf{w},b}(\mathbf{x}) = g(z) = g(w_1 x_1 + w_2 x_2 + b)$

The decision boundary is where $z = \mathbf{w} \cdot \mathbf{x} + b = 0$. For example, with $w_1 = 1$, $w_2 = 1$, $b = -3$:

$z = x_1 + x_2 - 3 = 0$, i.e. the line $x_1 + x_2 = 3$

[Figure: the line $x_1 + x_2 = 3$ in the $(x_1, x_2)$ plane; $\hat{y} = 1$ in the region $x_1 + x_2 \geq 3$, $\hat{y} = 0$ where $x_1 + x_2 < 3$.]
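The $z \geq 0$ rule turns directly into code; a sketch using the slide's parameters $w_1 = w_2 = 1$, $b = -3$ (the function name is mine):

```python
import numpy as np

def predict(X, w, b):
    """y_hat = 1 where w . x + b >= 0, equivalent to f_{w,b}(x) >= 0.5."""
    return (X @ w + b >= 0).astype(int)

# Boundary x1 + x2 = 3 from w = [1, 1], b = -3
w = np.array([1.0, 1.0])
b = -3.0
X = np.array([[2.0, 2.0],   # x1 + x2 = 4 >= 3 -> y_hat = 1
              [1.0, 1.0]])  # x1 + x2 = 2 <  3 -> y_hat = 0
print(predict(X, w, b))     # [1 0]
```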
Non-linear decision boundaries

$f_{\mathbf{w},b}(\mathbf{x}) = g(z) = g(w_1 x_1^2 + w_2 x_2^2 + b)$

With $w_1 = 1$, $w_2 = 1$, $b = -1$, the decision boundary is

$z = x_1^2 + x_2^2 - 1 = 0$, i.e. the unit circle $x_1^2 + x_2^2 = 1$

[Figure: the circle $x_1^2 + x_2^2 = 1$ in the $(x_1, x_2)$ plane; $\hat{y} = 1$ outside ($x_1^2 + x_2^2 \geq 1$), $\hat{y} = 0$ inside ($x_1^2 + x_2^2 < 1$).]
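A sketch of this circular boundary, built by feeding the engineered features $x_1^2, x_2^2$ into the same linear rule (weights from the slide's example; the helper name is mine):

```python
import numpy as np

def predict_circle(X):
    """y_hat = 1 where w1*x1^2 + w2*x2^2 + b >= 0, i.e. on/outside the unit circle."""
    w = np.array([1.0, 1.0])  # w1 = w2 = 1, b = -1, as in the slide's example
    b = -1.0
    Z = X ** 2                # engineered features: [x1^2, x2^2]
    return (Z @ w + b >= 0).astype(int)

X = np.array([[0.5, 0.5],   # 0.25 + 0.25 - 1 < 0 -> inside,  y_hat = 0
              [1.0, 1.0],   # 1 + 1 - 1 >= 0      -> outside, y_hat = 1
              [0.0, 0.0]])  # -1 < 0              -> inside,  y_hat = 0
print(predict_circle(X))    # [0 1 0]
```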
Non-linear decision boundaries

With higher-order polynomial features, the boundary can take still more complex shapes:

$f_{\mathbf{w},b}(\mathbf{x}) = g(z) = g(w_1 x_1 + w_2 x_2 + w_3 x_1^2 + w_4 x_1 x_2 + w_5 x_2^2 + w_6 x_1^3 + \cdots + b)$

[Figure: two examples of curved decision regions in the $(x_1, x_2)$ plane.]
Cost Function
Cost Function for
Logistic Regression
Training set ($i = 1, \ldots, m$ examples; $j = 1, \ldots, n$ features):

tumor size (cm)   patient's age   malignant?
10                52              1
2                 73              0
5                 55              0
12                49              1
...               ...             ...

The target $y$ is 0 or 1, and the model is

$f_{\mathbf{w},b}(\mathbf{x}) = \dfrac{1}{1 + e^{-(\mathbf{w} \cdot \mathbf{x} + b)}}$

How should we choose $\mathbf{w} = [w_1 \; w_2 \; \cdots \; w_n]$ and $b$?
Squared error cost

The cost is the average loss over the training set: $J(\mathbf{w}, b) = \dfrac{1}{m} \sum_{i=1}^{m} L\!\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}\right)$. With the squared error loss:

Linear regression: $f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$, and $J(\mathbf{w}, b)$ is convex.
Logistic regression: $f_{\mathbf{w},b}(\mathbf{x}) = \dfrac{1}{1 + e^{-(\mathbf{w} \cdot \mathbf{x} + b)}}$, and $J(\mathbf{w}, b)$ is non-convex, with many local minima where gradient descent can get stuck. We need a different loss.
Logistic loss function

If $y^{(i)} = 1$:

$L\!\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}\right) = -\log\!\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)})\right)$

[Figure: $\log f$ and $-\log f$ plotted for $f \in (0, 1]$.]

As $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) \to 1$, the loss $\to 0$; as $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) \to 0$, the loss $\to \infty$. The loss is lowest when $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ predicts close to the true label $y^{(i)}$.
Logistic loss function

If $y^{(i)} = 0$:

$L\!\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}\right) = -\log\!\left(1 - f_{\mathbf{w},b}(\mathbf{x}^{(i)})\right)$

[Figure: $-\log(1 - f)$ plotted for $f \in [0, 1)$.]

As $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) \to 0$, the loss $\to 0$; as $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) \to 1$, the loss $\to \infty$. The further the prediction $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is from the target $y^{(i)}$, the higher the loss.
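A sketch of the two-case loss in NumPy (names are mine; the small `eps` clip guards against log(0), an implementation detail not on the slides):

```python
import numpy as np

def logistic_loss(f, y, eps=1e-15):
    """-log(f) when y = 1, -log(1 - f) when y = 0."""
    f = np.clip(f, eps, 1.0 - eps)  # avoid log(0) at f = 0 or f = 1
    return np.where(y == 1, -np.log(f), -np.log(1.0 - f))

# For y = 1, the loss grows as the prediction f moves away from 1
f = np.array([0.99, 0.5, 0.01])
print(logistic_loss(f, np.array([1, 1, 1])))  # ~[0.01 0.69 4.61]
```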
Cost

$J(\mathbf{w}, b) = \dfrac{1}{m} \sum_{i=1}^{m} L\!\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}\right)$

With this choice of loss, $J(\mathbf{w}, b)$ is convex, so gradient descent can reach the global minimum.
Cost Function
Simplified Cost Function for Logistic Regression
Simplified loss function

Both cases can be written as a single expression:

$L\!\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}\right) = -y^{(i)} \log\!\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)})\right) - \left(1 - y^{(i)}\right) \log\!\left(1 - f_{\mathbf{w},b}(\mathbf{x}^{(i)})\right)$

If $y^{(i)} = 1$: the second term vanishes, leaving $L = -\log\!\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)})\right)$.
If $y^{(i)} = 0$: the first term vanishes, leaving $L = -\log\!\left(1 - f_{\mathbf{w},b}(\mathbf{x}^{(i)})\right)$.

So the single expression reproduces exactly the two-case logistic loss above.
Simplified cost function

$L\!\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}\right) = -y^{(i)} \log\!\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)})\right) - \left(1 - y^{(i)}\right) \log\!\left(1 - f_{\mathbf{w},b}(\mathbf{x}^{(i)})\right)$

$J(\mathbf{w}, b) = \dfrac{1}{m} \sum_{i=1}^{m} L\!\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}\right) = -\dfrac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\!\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)})\right) + \left(1 - y^{(i)}\right) \log\!\left(1 - f_{\mathbf{w},b}(\mathbf{x}^{(i)})\right) \right]$
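A vectorized sketch of this cost (the function name is mine; the `eps` clip is a numerical guard, not part of the math above):

```python
import numpy as np

def compute_cost(X, y, w, b, eps=1e-15):
    """J(w, b) = -(1/m) * sum( y*log(f) + (1-y)*log(1-f) )."""
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # f_{w,b}(x^(i)) for every example
    f = np.clip(f, eps, 1.0 - eps)          # keep log() finite
    return -np.mean(y * np.log(f) + (1.0 - y) * np.log(1.0 - f))
```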
Training logistic regression

1. Find $\mathbf{w}, b$ by minimizing the cost $J(\mathbf{w}, b)$.
2. Given a new $\mathbf{x}$, output $f_{\mathbf{w},b}(\mathbf{x}) = \dfrac{1}{1 + e^{-(\mathbf{w} \cdot \mathbf{x} + b)}} = P(y = 1 \mid \mathbf{x}; \mathbf{w}, b)$.
Gradient descent

repeat {
  $w_j = w_j - \alpha \dfrac{\partial}{\partial w_j} J(\mathbf{w}, b)$, where $\dfrac{\partial}{\partial w_j} J(\mathbf{w}, b) = \dfrac{1}{m} \sum_{i=1}^{m} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right) x_j^{(i)}$
  $b = b - \alpha \dfrac{\partial}{\partial b} J(\mathbf{w}, b)$, where $\dfrac{\partial}{\partial b} J(\mathbf{w}, b) = \dfrac{1}{m} \sum_{i=1}^{m} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)$
} (simultaneous updates)
Gradient descent for logistic regression

repeat {
  $w_j = w_j - \alpha \dfrac{1}{m} \sum_{i=1}^{m} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right) x_j^{(i)}$
  $b = b - \alpha \dfrac{1}{m} \sum_{i=1}^{m} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)$
} (simultaneous updates)

The same concepts carry over from linear regression:
• Monitor gradient descent (learning curve)
• Vectorized implementation
• Feature scaling

The update rules look identical to linear regression's, but the model $f$ differs:

Linear regression: $f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$
Logistic regression: $f_{\mathbf{w},b}(\mathbf{x}) = \dfrac{1}{1 + e^{-(\mathbf{w} \cdot \mathbf{x} + b)}}$
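Putting the update rules together, a minimal batch gradient descent loop (learning rate, iteration count, and data are illustrative choices, not the lecture's):

```python
import numpy as np

def train_logistic(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent on the logistic cost J(w, b)."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(num_iters):
        f = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # f_{w,b}(x^(i)) for all examples
        err = f - y                              # f(x^(i)) - y^(i)
        dj_dw = X.T @ err / m                    # (1/m) * sum(err * x_j^(i))
        dj_db = err.mean()                       # (1/m) * sum(err)
        w = w - alpha * dj_dw                    # simultaneous update of w and b
        b = b - alpha * dj_db
    return w, b

# Hypothetical usage on the earlier tumor-size toy data
X = np.array([[1.0], [2.0], [3.0], [5.0], [6.0], [7.0]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = train_logistic(X, y)
f = 1.0 / (1.0 + np.exp(-(X @ w + b)))
print((f >= 0.5).astype(int))  # [0 0 0 1 1 1]
```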
QUESTIONS???
ACKNOWLEDGEMENT
• Various contents in this presentation have been taken from different books, lecture notes, and the web. They belong solely to their owners and are used here only to clarify various educational concepts. No copyright infringement is intended.