Transfer Functions
Supervised Learning - Classification
Techniques used in ML applications
Supervised learning:
Defined by its use of labeled datasets.
The datasets are designed to train or “supervise” algorithms into
classifying data or predicting outcomes accurately.
Using labeled inputs and outputs, the model can measure its accuracy
and learn over time.
Regression
Classification
• Logistic regression, support
vector machines, decision trees
and random forest
Background
There are three methods to establish a classifier
a) Model a classification rule directly
Examples: k-NN, decision trees, perceptron, SVM
b) Model the probability of class memberships given input data
Example: feedforward ANN (multi-layered perceptron)
c) Make a probabilistic model of data within each class
Examples: naive Bayes, model based classifiers
a) and b) are examples of discriminative classification
c) is an example of generative classification
b) and c) are both examples of probabilistic classification
5
Learning a Logistic Regression Model
• How to learn a logistic regression model 𝒉𝜽 𝒙 = 𝒈(𝜽𝑻 𝒙), where 𝞱 =
[𝜽𝟎 , … , 𝜽𝒎 ] and 𝒙 = [𝒙𝟎, … , 𝒙𝒎 ]?
• By minimizing the following cost function:
1 1
Cost(𝒉𝜽 𝒙 , y) = −𝑦 log 𝑇 − (1 − 𝑦) log 1 − 𝑇
1+𝑒 −𝜽 𝑥 1 + 𝑒 −𝜽 𝑥
• That is: 𝑛
1 𝑖
minimize Cost(𝒉𝜽 𝒙 ,𝑦 𝑖 )
𝜃 𝑛
𝑖=1
≣
𝑛
1 1 1 Cost function
minimize −𝑦 𝑖 log − (1 − 𝑦) log 1 −
𝜃 𝑛 1+
𝑇
𝑒 −𝜽 𝑥
𝑖
1 + 𝑒 −𝜽
𝑇
𝑥𝑖 𝑱(𝞱)
𝑖=1
Gradient Descent For Logistic Regression
• Outline:
• Have cost function 𝑱 𝞱 , where 𝞱 = [𝜽𝟎 , … , 𝜽𝒎 ]
• Start off with some guesses for 𝜃0 , … , 𝜃𝑚
• It does not really matter what values you start off with, but a common
choice is to set them all initially to zero
• Repeat until convergence Partial derivative
𝜕𝑱 𝞱
𝜃𝑗 = 𝜃𝑗 − α Note: Update all 𝜽𝒋 simulatenously
𝜕 𝜃𝑗
Learing rate, which controls how big a step we take
when we update 𝜃𝑗
Gradient Descent For Logistic Regression
• Outline:
• Have cost function 𝑱 𝞱 , where 𝞱 = [𝜽𝟎 , … , 𝜽𝒎 ]
• Start off with some guesses for 𝜃0 , … , 𝜃𝑚
• It does not really matter what values you start off with, but a common
choice is to set them all initially to zero
• Repeat until convergence
𝑛 The final formula
1 𝑖
𝜃𝑗 = 𝜃𝑗 − α 𝑇 𝑖
−𝑦 𝑥𝑗 (𝑖) after applying
1+ 𝑒 −𝜽 𝑥 partial derivatives
𝑖=1
Inference After Learning
• After learning the parameters 𝞱 = [𝜃0 , … , 𝜃𝑚 ], we can predict the
output of any new unseen 𝒙 = [𝒙𝟎, … , 𝒙𝒎 ] as follows:
1
𝒊𝒇 𝒉𝜽 𝒙 = −𝜽𝑇
𝒙
< 𝟎. 𝟓 𝐩𝐫𝐞𝐝𝐢𝐜𝐭 𝟎
1+𝑒
1
𝑬𝒍𝒔𝒆 𝒊𝒇 𝒉𝜽 𝒙 = −𝜽𝑇
𝒙
≥ 𝟎. 𝟓 𝐩𝐫𝐞𝐝𝐢𝐜𝐭 𝟏
1+𝑒
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
and vaccine the of nigeria y
Email a 1 1 0 1 1 1
Email b 0 0 1 1 0 0
Email c 0 1 1 0 0 1
Email d 1 0 0 1 0 0
Email e 1 0 1 0 1 1
Email f 1 0 1 1 0 0
A Training Dataset
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
and vaccine the of nigeria y
Email a 1 1 0 1 1 1
Email b 0 0 1 1 0 0
Email c 0 1 1 0 0 1
Email d 1 0 0 1 0 0
Email e 1 0 1 0 1 1
Email f 1 0 1 1 0 0
1 entails that a word (i.e., “and”) is present in an email (i.e., “Email a”)
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
and vaccine the of nigeria y
Email a 1 1 0 1 1 1
Email b 0 0 1 1 0 0
Email c 0 1 1 0 0 1
Email d 1 0 0 1 0 0
Email e 1 0 1 0 1 1
Email f 1 0 1 1 0 0
0 entails that a word (i.e., “and”) is abscent in an email (i.e., “Email b”)
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0] We define 6 parameters
(the first one, i.e., 𝜃0 ,
5 words (or features) = [𝒙𝟏 , 𝒙𝟐 , 𝒙𝟑 , 𝒙𝟒 , 𝒙𝟓 ] is the intercept)
𝒙𝟏 = and 𝒙𝟐 = vaccine 𝒙𝟑 = the 𝒙𝟒 = of 𝒙𝟓 = nigeria y
Email a 1 1 0 1 1 1
Email b 0 0 1 1 0 0
Email c 0 1 1 0 0 1
Email d 1 0 0 1 0 0
Email e 1 0 1 0 1 1
Email f 1 0 1 1 0 0
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0] The parameter vector:
𝜽 = [𝜽𝟎 , 𝜽𝟏 , 𝜽𝟐 , 𝜽𝟑 , 𝜽𝟒 , 𝜽𝟓 ]
x = [𝒙𝟎 , 𝒙𝟏 , 𝒙𝟐 , 𝒙𝟑 , 𝒙𝟒 , 𝒙𝟓 ] The feature vector
𝒙𝟎 = 𝟏 𝒙𝟏 = and 𝒙𝟐 = vaccine 𝒙𝟑 = the 𝒙𝟒 = of 𝒙𝟓 = nigeria y
Email a 1 1 1 0 1 1 1
Email b 1 0 0 1 1 0 0
Email c 1 0 1 1 0 0 1
Email d 1 1 0 0 1 0 0
Email e 1 1 0 1 0 1 1
Email f 1 1 0 1 1 0 0
To account for the intercept
Recap: Gradient Descent For Logistic
Regression
• Outline:
• Have cost function 𝑱 𝞱 , where 𝞱 = [𝜽𝟎 , … , 𝜽𝒎 ]
• Start off with some guesses for 𝜃0 , … , 𝜃𝑚
• It does not really matter what values you start off with, but a common
choice is to set them all initially to zero
• Repeat until convergence
𝑛
1 𝑖
𝜃𝑗 = 𝜃𝑗 − α 𝑇 𝑖
−𝑦 𝑥𝑗 (𝑖)
1+ 𝑒 −𝜽 𝑥
𝑖=1 First, let us calculate this factor
for every example in our
training dataset
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
𝒙 𝒚 𝜽𝑻 𝒙 1
(1+𝑒 −𝜽 𝒙 − 𝒚)𝒙𝟎
𝑇
[1,1,1,0,1,1] 1 [0,0,0,0,0,0]×[1,1,1,0,1,1]=0 -0.5
[1,0,0,1,1,0] 0 [0,0,0,0,0,0]×[1,0,0,1,1,0]=0 0.5
[1,0,1,1,0,0] 1 [0,0,0,0,0,0]×[1,0,1,1,0,0]=0 -0.5
[1,1,0,0,1,0] 0 [0,0,0,0,0,0]×[1,1,0,0,1,0]=0 0.5
[1,1,0,1,0,1] 1 [0,0,0,0,0,0]×[1,1,0,1,0,1]=0 -0.5
[1,1,0,1,1,0] 0 [0,0,0,0,0,0]×[1,1,0,1,1,0]=0 0.5
Recap: Gradient Descent For Logistic
Regression
• Outline:
• Have cost function 𝑱 𝞱 , where 𝞱 = [𝜽𝟎 , … , 𝜽𝒎 ]
• Start off with some guesses for 𝜃0 , … , 𝜃𝑚
• It does not really matter what values you start off with, but a common
choice is to set them all initially to zero
• Repeat until convergence Second, let us calculate
this equation for every
𝑛 example in our training
1 𝑖 dataset and for every 𝜽𝒋 ,
𝜃𝑗 = 𝜃𝑗 − α 𝑇 𝑖
−𝑦 𝑥𝑗 (𝑖) where j is between 0
1+ 𝑒 −𝜽 𝑥
𝑖=1 and m
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
𝒙 𝒚 𝜽𝑻 𝒙 1
(1+𝑒 −𝜽 𝒙 − 𝒚)𝒙𝟎
𝑇
[1,1,1,0,1,1] 1 [0,0,0,0,0,0]×[1,1,1,0,1,1]=0 ( 1 − 𝟏) × 𝟏 = -0.5
1+𝑒 −𝟎
[1,0,0,1,1,0] 0 [0,0,0,0,0,0]×[1,0,0,1,1,0]=0 1
(1+1 − 𝟎) × 𝟏 = 0.5
[1,0,1,1,0,0] 1 [0,0,0,0,0,0]×[1,0,1,1,0,0]=0 1
(1+1 − 𝟏) × 𝟏 = -0.5
[1,1,0,0,1,0] 0 [0,0,0,0,0,0]×[1,1,0,0,1,0]=0 1
(1+1 − 𝟎) × 𝟏 = 0.5
[1,1,0,1,0,1] 1 [0,0,0,0,0,0]×[1,1,0,1,0,1]=0 1
(1+1 − 𝟏) × 𝟏 = -0.5
[1,1,0,1,1,0] 0 [0,0,0,0,0,0]×[1,1,0,1,1,0]=0 1
(1+1 − 𝟎) × 𝟏 = 0.5
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
𝒙 𝒚 𝜽𝑻 𝒙 1
(1+𝑒 −𝜽 𝒙 − 𝒚)𝒙𝟎
𝑇
[1,1,1,0,1,1] 1 [0,0,0,0,0,0]×[1,1,1,0,1,1]=0 ( 1 − 𝟏) × 𝟏 = -0.5
1+𝑒 −𝟎
[1,0,0,1,1,0] 0 [0,0,0,0,0,0]×[1,0,0,1,1,0]=0 1
(1+1 − 𝟎) × 𝟏 = 0.5
[1,0,1,1,0,0] 1 [0,0,0,0,0,0]×[1,0,1,1,0,0]=0 1
(1+1 − 𝟏) × 𝟏 = -0.5
[1,1,0,0,1,0] 0 [0,0,0,0,0,0]×[1,1,0,0,1,0]=0 1
(1+1 − 𝟎) × 𝟏 = 0.5
[1,1,0,1,0,1] 1 [0,0,0,0,0,0]×[1,1,0,1,0,1]=0 1
(1+1 − 𝟏) × 𝟏 = -0.5
[1,1,0,1,1,0] 0 [0,0,0,0,0,0]×[1,1,0,1,1,0]=0 1
(1+1 − 𝟎) × 𝟏 = 0.5
Recap: Gradient Descent For Logistic
Regression
• Outline:
• Have cost function 𝑱 𝞱 , where 𝞱 = [𝜽𝟎 , … , 𝜽𝒎 ]
• Start off with some guesses for 𝜃0 , … , 𝜃𝑚
• It does not really matter what values you start off with, but a common
choice is to set them all initially to zero
• Repeat until convergence
𝑛
1 𝑖 (𝑖) Third, let us compute
𝜃𝑗 = 𝜃𝑗 − α 𝑇 𝑖
−𝑦 𝑥𝑗 every 𝜽𝒋
1+ 𝑒 −𝜽 𝑥
𝑖=1
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
𝑛
𝒙 𝒚 𝑻
𝜽 𝒙 1 1
(1+𝑒 −𝜽𝑇𝒙 − 𝒚)𝒙𝟎 𝑇 𝑖
−𝑦 𝑖
𝑥0 (𝑖) = 𝟎
𝑖=1
1 + 𝑒 −𝜽 𝑥
[1,1,1,0,1,1] 1 [0,0,0,0,0,0]×[1,1,1,0,1,1]=0 -0.5
[1,0,0,1,1,0] 0 [0,0,0,0,0,0]×[1,0,0,1,1,0]=0 𝑻𝒉𝒆𝒏,
0.5
[1,0,1,1,0,0] 1 [0,0,0,0,0,0]×[1,0,1,1,0,0]=0 𝜃0 = 𝜃0 − α ×0
-0.5
[1,1,0,0,1,0] 0 [0,0,0,0,0,0]×[1,1,0,0,1,0]=0 New 𝜽𝟎
0.5
[1,1,0,1,0,1] 1 [0,0,0,0,0,0]×[1,1,0,1,0,1]=0 -0.5
[1,1,0,1,1,0] 0 [0,0,0,0,0,0]×[1,1,0,1,1,0]=0 0.5
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
𝑛
𝒙 𝒚 𝑻
𝜽 𝒙 1 1
(1+𝑒 −𝜽𝑇𝒙 − 𝒚)𝒙𝟎 𝑇 𝑖
−𝑦 𝑖
𝑥0 (𝑖) = 𝟎
𝑖=1
1 + 𝑒 −𝜽 𝑥
[1,1,1,0,1,1] 1 [0,0,0,0,0,0]×[1,1,1,0,1,1]=0 -0.5
[1,0,0,1,1,0] 0 [0,0,0,0,0,0]×[1,0,0,1,1,0]=0 𝑻𝒉𝒆𝒏,
0.5
[1,0,1,1,0,0] 1 [0,0,0,0,0,0]×[1,0,1,1,0,0]=0 𝜃0 = 𝜃0 − α ×0
-0.5
[1,1,0,0,1,0] 0 [0,0,0,0,0,0]×[1,1,0,0,1,0]=0 0.5 Old 𝜽𝟎
[1,1,0,1,0,1] 1 [0,0,0,0,0,0]×[1,1,0,1,0,1]=0 -0.5
[1,1,0,1,1,0] 0 [0,0,0,0,0,0]×[1,1,0,1,1,0]=0 0.5
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
𝑛
𝒙 𝒚 𝑻
𝜽 𝒙 1 1
(1+𝑒 −𝜽𝑇𝒙 − 𝒚)𝒙𝟎 𝑇 𝑖
−𝑦 𝑖
𝑥0 (𝑖) = 𝟎
𝑖=1
1 + 𝑒 −𝜽 𝑥
[1,1,1,0,1,1] 1 [0,0,0,0,0,0]×[1,1,1,0,1,1]=0 -0.5
[1,0,0,1,1,0] 0 [0,0,0,0,0,0]×[1,0,0,1,1,0]=0 𝑻𝒉𝒆𝒏,
0.5
[1,0,1,1,0,0] 1 [0,0,0,0,0,0]×[1,0,1,1,0,0]=0 𝜃0 = 𝜃0 − α ×0
-0.5
[1,1,0,0,1,0] 0 [0,0,0,0,0,0]×[1,1,0,0,1,0]=0 0.5 = 0 − 0.5 × 𝟎 = 𝟎
[1,1,0,1,0,1] 1 [0,0,0,0,0,0]×[1,1,0,1,0,1]=0 -0.5
[1,1,0,1,1,0] 0 [0,0,0,0,0,0]×[1,1,0,1,1,0]=0 0.5 New Paramter Vector:
𝜽 = [𝟎, 𝜽𝟏 , 𝜽𝟐 , 𝜽𝟑 , 𝜽𝟒 , 𝜽𝟓 ]
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
𝒙 𝒚 𝜽𝑻 𝒙 1
(1+𝑒 −𝜽 𝒙 − 𝒚)𝒙𝟏
𝑇
[1,1,1,0,1,1] 1 [0,0,0,0,0,0]×[1,1,1,0,1,1]=0 -0.5
[1,0,0,1,1,0] 0 [0,0,0,0,0,0]×[1,0,0,1,1,0]=0 0
[1,0,1,1,0,0] 1 [0,0,0,0,0,0]×[1,0,1,1,0,0]=0 0
[1,1,0,0,1,0] 0 [0,0,0,0,0,0]×[1,1,0,0,1,0]=0 0.5
[1,1,0,1,0,1] 1 [0,0,0,0,0,0]×[1,1,0,1,0,1]=0 -0.5
[1,1,0,1,1,0] 0 [0,0,0,0,0,0]×[1,1,0,1,1,0]=0 0.5
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
𝑛
𝒙 𝒚 𝑻
𝜽 𝒙 1 1
(1+𝑒 −𝜽𝑇𝒙 − 𝒚)𝒙𝟏 𝑇 𝑖
−𝑦 𝑖
𝑥1 (𝑖) = 𝟎
𝑖=1
1 + 𝑒 −𝜽 𝑥
[1,1,1,0,1,1] 1 [0,0,0,0,0,0]×[1,1,1,0,1,1]=0 -0.5
[1,0,0,1,1,0] 0 [0,0,0,0,0,0]×[1,0,0,1,1,0]=0 𝑻𝒉𝒆𝒏,
0
[1,0,1,1,0,0] 1 [0,0,0,0,0,0]×[1,0,1,1,0,0]=0 𝜃1 = 𝜃1 − α ×0
0
[1,1,0,0,1,0] 0 [0,0,0,0,0,0]×[1,1,0,0,1,0]=0 0.5 = 0 − 0.5 × 𝟎 = 𝟎
[1,1,0,1,0,1] 1 [0,0,0,0,0,0]×[1,1,0,1,0,1]=0 -0.5
[1,1,0,1,1,0] 0 [0,0,0,0,0,0]×[1,1,0,1,1,0]=0 0.5 New Paramter Vector:
𝜽 = [𝟎, 𝟎, 𝜽𝟐 , 𝜽𝟑 , 𝜽𝟒 , 𝜽𝟓 ]
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
𝒙 𝒚 𝜽𝑻 𝒙 1
(1+𝑒 −𝜽 𝒙 − 𝒚)𝒙𝟐
𝑇
[1,1,1,0,1,1] 1 [0,0,0,0,0,0]×[1,1,1,0,1,1]=0 -0.5
[1,0,0,1,1,0] 0 [0,0,0,0,0,0]×[1,0,0,1,1,0]=0 0
[1,0,1,1,0,0] 1 [0,0,0,0,0,0]×[1,0,1,1,0,0]=0 0
[1,1,0,0,1,0] 0 [0,0,0,0,0,0]×[1,1,0,0,1,0]=0 0.5
[1,1,0,1,0,1] 1 [0,0,0,0,0,0]×[1,1,0,1,0,1]=0 -0.5
[1,1,0,1,1,0] 0 [0,0,0,0,0,0]×[1,1,0,1,1,0]=0 0.5
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
𝑛
𝒙 𝒚 𝑻
𝜽 𝒙 1 1
(1+𝑒 −𝜽𝑇𝒙 − 𝒚)𝒙𝟐 𝑇 𝑖
−𝑦 𝑖
𝑥2 (𝑖) = −𝟏
𝑖=1
1 + 𝑒 −𝜽 𝑥
[1,1,1,0,1,1] 1 [0,0,0,0,0,0]×[1,1,1,0,1,1]=0 -0.5
[1,0,0,1,1,0] 0 [0,0,0,0,0,0]×[1,0,0,1,1,0]=0 𝑻𝒉𝒆𝒏,
0
[1,0,1,1,0,0] 1 [0,0,0,0,0,0]×[1,0,1,1,0,0]=0 𝜃2 = 𝜃2 − α × (−𝟏)
-0.5
[1,1,0,0,1,0] 0 [0,0,0,0,0,0]×[1,1,0,0,1,0]=0 0 = 0 − 0.5 × −𝟏 = 𝟎. 𝟓
[1,1,0,1,0,1] 1 [0,0,0,0,0,0]×[1,1,0,1,0,1]=0 0
[1,1,0,1,1,0] 0 [0,0,0,0,0,0]×[1,1,0,1,1,0]=0 0 New Paramter Vector:
𝜽 = [𝟎, 𝟎, 𝟎. 𝟓, 𝜽𝟑 , 𝜽𝟒 , 𝜽𝟓 ]
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
𝒙 𝒚 𝜽𝑻 𝒙 1
(1+𝑒 −𝜽 𝒙 − 𝒚)𝒙𝟑
𝑇
[1,1,1,0,1,1] 1 [0,0,0,0,0,0]×[1,1,1,0,1,1]=0 -0.5
[1,0,0,1,1,0] 0 [0,0,0,0,0,0]×[1,0,0,1,1,0]=0 0
[1,0,1,1,0,0] 1 [0,0,0,0,0,0]×[1,0,1,1,0,0]=0 0
[1,1,0,0,1,0] 0 [0,0,0,0,0,0]×[1,1,0,0,1,0]=0 0.5
[1,1,0,1,0,1] 1 [0,0,0,0,0,0]×[1,1,0,1,0,1]=0 -0.5
[1,1,0,1,1,0] 0 [0,0,0,0,0,0]×[1,1,0,1,1,0]=0 0.5
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
𝑛
𝒙 𝒚 𝑻
𝜽 𝒙 1 1
(1+𝑒 −𝜽𝑇𝒙 − 𝒚)𝒙𝟑 𝑇 𝑖
−𝑦 𝑖
𝑥3 (𝑖) = 𝟎
𝑖=1
1 + 𝑒 −𝜽 𝑥
[1,1,1,0,1,1] 1 [0,0,0,0,0,0]×[1,1,1,0,1,1]=0 0
[1,0,0,1,1,0] 0 [0,0,0,0,0,0]×[1,0,0,1,1,0]=0 𝑻𝒉𝒆𝒏,
0.5
[1,0,1,1,0,0] 1 [0,0,0,0,0,0]×[1,0,1,1,0,0]=0 𝜃3 = 𝜃3 − α × 𝟎
-0.5
[1,1,0,0,1,0] 0 [0,0,0,0,0,0]×[1,1,0,0,1,0]=0 0 = 0 − 0.5 × 0 = 𝟎
[1,1,0,1,0,1] 1 [0,0,0,0,0,0]×[1,1,0,1,0,1]=0 -0.5
[1,1,0,1,1,0] 0 [0,0,0,0,0,0]×[1,1,0,1,1,0]=0 0.5 New Paramter Vector:
𝜽 = [𝟎, 𝟎, 𝟎. 𝟓, 𝟎, 𝜽𝟒 , 𝜽𝟓 ]
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
𝒙 𝒚 𝜽𝑻 𝒙 1
(1+𝑒 −𝜽 𝒙 − 𝒚)𝒙𝟒
𝑇
[1,1,1,0,1,1] 1 [0,0,0,0,0,0]×[1,1,1,0,1,1]=0 -0.5
[1,0,0,1,1,0] 0 [0,0,0,0,0,0]×[1,0,0,1,1,0]=0 0
[1,0,1,1,0,0] 1 [0,0,0,0,0,0]×[1,0,1,1,0,0]=0 0
[1,1,0,0,1,0] 0 [0,0,0,0,0,0]×[1,1,0,0,1,0]=0 0.5
[1,1,0,1,0,1] 1 [0,0,0,0,0,0]×[1,1,0,1,0,1]=0 -0.5
[1,1,0,1,1,0] 0 [0,0,0,0,0,0]×[1,1,0,1,1,0]=0 0.5
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
𝑛
𝒙 𝒚 𝑻
𝜽 𝒙 1 1
(1+𝑒 −𝜽𝑇𝒙 − 𝒚)𝒙𝟒 𝑇 𝑖
−𝑦 𝑖
𝑥4 (𝑖) = 𝟏
𝑖=1
1 + 𝑒 −𝜽 𝑥
[1,1,1,0,1,1] 1 [0,0,0,0,0,0]×[1,1,1,0,1,1]=0 -0.5
[1,0,0,1,1,0] 0 [0,0,0,0,0,0]×[1,0,0,1,1,0]=0 𝑻𝒉𝒆𝒏,
0.5
[1,0,1,1,0,0] 1 [0,0,0,0,0,0]×[1,0,1,1,0,0]=0 𝜃4 = 𝜃4 − α × 𝟏
0
[1,1,0,0,1,0] 0 [0,0,0,0,0,0]×[1,1,0,0,1,0]=0 0.5 = 0 − 0.5 × 1 = −𝟎. 𝟓
[1,1,0,1,0,1] 1 [0,0,0,0,0,0]×[1,1,0,1,0,1]=0 0
[1,1,0,1,1,0] 0 [0,0,0,0,0,0]×[1,1,0,1,1,0]=0 0.5 New Paramter Vector:
𝜽 = [𝟎, 𝟎, 𝟎. 𝟓, 𝟎, −𝟎. 𝟓, 𝜽𝟓 ]
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
𝒙 𝒚 𝜽𝑻 𝒙 1
(1+𝑒 −𝜽 𝒙 − 𝒚)𝒙𝟓
𝑇
[1,1,1,0,1,1] 1 [0,0,0,0,0,0]×[1,1,1,0,1,1]=0 -0.5
[1,0,0,1,1,0] 0 [0,0,0,0,0,0]×[1,0,0,1,1,0]=0 0
[1,0,1,1,0,0] 1 [0,0,0,0,0,0]×[1,0,1,1,0,0]=0 0
[1,1,0,0,1,0] 0 [0,0,0,0,0,0]×[1,1,0,0,1,0]=0 0.5
[1,1,0,1,0,1] 1 [0,0,0,0,0,0]×[1,1,0,1,0,1]=0 -0.5
[1,1,0,1,1,0] 0 [0,0,0,0,0,0]×[1,1,0,1,1,0]=0 0.5
A Concrete Example: The Training Phase
• Let us apply logistic regression on the spam email recognition problem,
assuming 𝞪 = 0.5 and starting with 𝜽 = [0, 0, 0, 0, 0, 0]
𝑛
𝒙 𝒚 𝑻
𝜽 𝒙 1 1
(1+𝑒 −𝜽𝑇𝒙 − 𝒚)𝒙𝟓 −𝑦 𝑖
𝑥5 (𝑖) = −𝟏
𝑇 𝑖
𝑖=1
1 + 𝑒 −𝜽 𝑥
[1,1,1,0,1,1] 1 [0,0,0,0,0,0]×[1,1,1,0,1,1]=0
-0.5
[1,0,0,1,1,0] 0 [0,0,0,0,0,0]×[1,0,0,1,1,0]=0 𝑻𝒉𝒆𝒏,
0
[1,0,1,1,0,0] 1 [0,0,0,0,0,0]×[1,0,1,1,0,0]=0 𝜃5 = 𝜃5 − α × (−𝟏)
0
[1,1,0,0,1,0] 0 [0,0,0,0,0,0]×[1,1,0,0,1,0]=0 = 0 − 0.5 × (−1) = 𝟎. 𝟓
0
[1,1,0,1,0,1] 1 [0,0,0,0,0,0]×[1,1,0,1,0,1]=0
-0.5
[1,1,0,1,1,0] 0 [0,0,0,0,0,0]×[1,1,0,1,1,0]=0 New Paramter Vector:
0
𝜽 = [𝟎, 𝟎, 𝟎. 𝟓, 𝟎, −𝟎. 𝟓, 𝟎. 𝟓]
A Concrete Example: Testing
• Let us now test logistic regression on the spam email recognition
problem, using the just learnt 𝜽 = [𝟎, 𝟎, 𝟎. 𝟓, 𝟎, −𝟎. 𝟓,𝟎. 𝟓]
• Note: Testing is typically done over a portion of the dataset that is not used
during training, but rather kept only for testing the accuracy of the algorithm’s
predictions thus far
• In this example, we will test over all the examples that we used during training,
just for illustrative purposes
A Concrete Example: Testing
• Let us test logistic regression on the spam email recognition problem,
using the just learnt 𝜽 = [𝟎, 𝟎, 𝟎. 𝟓, 𝟎, −𝟎. 𝟓,𝟎. 𝟓]
𝒙 𝒚 𝜽𝑻 𝒙 1
𝒉𝜽 𝒙 = (1+𝑒 −𝜽 𝒙 )
𝑇
Predicted Class
[1,1,1,0,1,1] 1 [0,0,0.5,0,-0.5,0.5]×[1,1,1,0,1,1]=0.5 0.622459331
[1,0,0,1,1,0] 0 [0,0,0.5,0,-0.5,0.5]×[1,0,0,1,1,0]=-0.5 0.377540669
[1,0,1,1,0,0] 1 [0,0,0.5,0,-0.5,0.5]×[1,0,1,1,0,0]=0.5 0.622459331
[1,1,0,0,1,0] 0 [0,0,0.5,0,-0.5,0.5]×[1,1,0,0,1,0]=-0.5 0.377540669
[1,1,0,1,0,1] 1 [0,0,0.5,0,-0.5,0.5]×[1,1,0,1,0,1]=0.5 0.622459331
[1,1,0,1,1,0] 0 [0,0,0.5,0,-0.5,0.5]×[1,1,0,1,1,0]=-0.5 0.377540669
A Concrete Example: Testing
• Let us test logistic regression on the spam email recognition problem,
using the just learnt 𝜽 = [𝟎, 𝟎, 𝟎. 𝟓, 𝟎, −𝟎. 𝟓,𝟎. 𝟓]
𝒙 𝒚 𝜽𝑻 𝒙 1
𝒉𝜽 𝒙 = (1+𝑒 −𝜽 𝒙 )
𝑇
Predicted Class
[1,1,1,0,1,1] 1 [0,0,0.5,0,-0.5,0.5]×[1,1,1,0,1,1]=0.5 0.622459331
[1,0,0,1,1,0] 0 [0,0,0.5,0,-0.5,0.5]×[1,0,0,1,1,0]=-0.5 0.377540669
[1,0,1,1,0,0] 1 [0,0,0.5,0,-0.5,0.5]×[1,0,1,1,0,0]=0.5 0.622459331
[1,1,0,0,1,0] 0 [0,0,0.5,0,-0.5,0.5]×[1,1,0,0,1,0]=-0.5 0.377540669
[1,1,0,1,0,1] 1 [0,0,0.5,0,-0.5,0.5]×[1,1,0,1,0,1]=0.5 0.622459331
[1,1,0,1,1,0] 0 [0,0,0.5,0,-0.5,0.5]×[1,1,0,1,1,0]=-0.5 0.377540669
A Concrete Example: Testing
• Let us test logistic regression on the spam email recognition problem,
using the just learnt 𝜽 = [𝟎, 𝟎, 𝟎. 𝟓, 𝟎, −𝟎. 𝟓,𝟎. 𝟓]
(𝒊𝒇 𝒉𝜽 𝒙 ≥ 𝟎. 𝟓, 𝒚’ = 𝟏; 𝒆𝒍𝒔𝒆 𝒚’ = 𝟎)
𝒙 𝒚 𝜽𝑻 𝒙 1
𝒉𝜽 𝒙 = (1+𝑒 −𝜽 𝒙 )
𝑇
Predicted Class (or 𝒚’)
[1,1,1,0,1,1] 1 [0,0,0.5,0,-0.5,0.5]×[1,1,1,0,1,1]=0.5 0.622459331
[1,0,0,1,1,0] 0 [0,0,0.5,0,-0.5,0.5]×[1,0,0,1,1,0]=-0.5 0.377540669
[1,0,1,1,0,0] 1 [0,0,0.5,0,-0.5,0.5]×[1,0,1,1,0,0]=0.5 0.622459331
[1,1,0,0,1,0] 0 [0,0,0.5,0,-0.5,0.5]×[1,1,0,0,1,0]=-0.5 0.377540669
[1,1,0,1,0,1] 1 [0,0,0.5,0,-0.5,0.5]×[1,1,0,1,0,1]=0.5 0.622459331
[1,1,0,1,1,0] 0 [0,0,0.5,0,-0.5,0.5]×[1,1,0,1,1,0]=-0.5 0.377540669
A Concrete Example: Testing
• Let us test logistic regression on the spam email recognition problem,
using the just learnt 𝜽 = [𝟎, 𝟎, 𝟎. 𝟓, 𝟎, −𝟎. 𝟓,𝟎. 𝟓]
(𝒊𝒇 𝒉𝜽 𝒙 ≥ 𝟎. 𝟓, 𝒚’ = 𝟏; 𝒆𝒍𝒔𝒆 𝒚’ = 𝟎)
𝒙 𝒚 𝜽𝑻 𝒙 1
𝒉𝜽 𝒙 = (1+𝑒 −𝜽 𝒙 )
𝑇
Predicted Class (or 𝒚’)
[1,1,1,0,1,1] 1 [0,0,0.5,0,-0.5,0.5]×[1,1,1,0,1,1]=0.5 0.622459331 1
[1,0,0,1,1,0] 0 [0,0,0.5,0,-0.5,0.5]×[1,0,0,1,1,0]=-0.5 0.377540669 0
[1,0,1,1,0,0] 1 [0,0,0.5,0,-0.5,0.5]×[1,0,1,1,0,0]=0.5 0.622459331 1
[1,1,0,0,1,0] 0 [0,0,0.5,0,-0.5,0.5]×[1,1,0,0,1,0]=-0.5 0.377540669 0
[1,1,0,1,0,1] 1 [0,0,0.5,0,-0.5,0.5]×[1,1,0,1,0,1]=0.5 0.622459331 1
[1,1,0,1,1,0] 0 [0,0,0.5,0,-0.5,0.5]×[1,1,0,1,1,0]=-0.5 0.377540669 0
A Concrete Example: Testing
• Let us test logistic regression on the spam email recognition problem,
using the just learnt 𝜽 = [𝟎, 𝟎, 𝟎. 𝟓, 𝟎, −𝟎. 𝟓,𝟎. 𝟓]
(𝒊𝒇 𝒉𝜽 𝒙 ≥ 𝟎. 𝟓, 𝒚’ = 𝟏; 𝒆𝒍𝒔𝒆 𝒚’ = 𝟎)
𝒙 𝒚 𝜽𝑻 𝒙 1
𝒉𝜽 𝒙 = (1+𝑒 −𝜽 𝒙 )
𝑇
Predicted Class (or 𝒚’)
[1,1,1,0,1,1] 1 [0,0,0.5,0,-0.5,0.5]×[1,1,1,0,1,1]=0.5 0.622459331 1
[1,0,0,1,1,0] 0 [0,0,0.5,0,-0.5,0.5]×[1,0,0,1,1,0]=-0.5 0.377540669 0
[1,0,1,1,0,0] 1 [0,0,0.5,0,-0.5,0.5]×[1,0,1,1,0,0]=0.5 0.622459331 NO
1
[1,1,0,0,1,0] 0 [0,0,0.5,0,-0.5,0.5]×[1,1,0,0,1,0]=-0.5 Mispredictions!
0.377540669 0
[1,1,0,1,0,1] 1 [0,0,0.5,0,-0.5,0.5]×[1,1,0,1,0,1]=0.5 0.622459331 1
[1,1,0,1,1,0] 0 [0,0,0.5,0,-0.5,0.5]×[1,1,0,1,1,0]=-0.5 0.377540669 0
A Concrete Example: Inference
• Let us infer whether a given new email, say, k = [1, 0, 1, 0, 0, 1] is a spam
or not, using logistic regression with the just learnt parameter vector
𝜽 = [𝟎, 𝟎, 𝟎. 𝟓, 𝟎, −𝟎. 𝟓,𝟎. 𝟓]
𝒙𝟎 = 𝟏 𝒙𝟏 = and 𝒙𝟐 = vaccine 𝒙𝟑 = the 𝒙𝟒 = of 𝒙𝟓 = nigeria y
Email a 1 1 1 0 1 1 1
Email b 1 0 0 1 1 0 0
Email c 1 0 1 1 0 0 1
Email d 1 1 0 0 1 0 0
Email e 1 1 0 1 0 1 1
Email f 1 1 0 1 1 0 0
Email k 1 0 1 0 0 1 ?
Our Training Dataset
A Concrete Example: Inference
• Let us infer whether a given new email, say, k = [1, 0, 1, 0, 0, 1] is a spam
or not, using logistic regression with the just learnt parameter vector
𝜽 = [𝟎, 𝟎, 𝟎. 𝟓, 𝟎, −𝟎. 𝟓,𝟎. 𝟓]
𝒙𝟎 = 𝟏 𝒙𝟏 = and 𝒙𝟐 = vaccine 𝒙𝟑 = the 𝒙𝟒 = of 𝒙𝟓 = nigeria y
Email a 1 1 1 0 1 1 1
Email b 1 0 0 1 1 0 0
Email c 1 0 1 1 0 0 1
Email d 1 1 0 0 1 0 0
Email e 1 1 0 1 0 1 1
Email f 1 1 0 1 1 0 0
Email k 1 0 1 0 0 1 ?
A Concrete Example: Inference
• Let us infer whether a given new email, say, k = [1, 0, 1, 0, 0, 1] is a spam
or not, using logistic regression with the just learnt parameter vector
𝜽 = [𝟎, 𝟎, 𝟎. 𝟓, 𝟎, −𝟎. 𝟓,𝟎. 𝟓]
𝟎
1 𝟎
𝒉𝜽 𝒙 = 𝑇
𝟎. 𝟓
1+𝑒 𝒙
−𝜽 𝟏, 𝟎, 𝟏, 𝟎, 𝟎, 𝟏 = 𝟎. 𝟓 × 𝟏 + 𝟎. 𝟓 × 𝟏 = 𝟏
𝟎
1 −𝟎. 𝟓
= 𝟎. 𝟓
1 + 𝑒 −𝟏
= 0.731
≥ 0.5 Class 1 (i.e., Spam)
A Concrete Example: Inference
• Let us infer whether a given new email, say, k = [1, 0, 1, 0, 0, 1] is a spam
or not, using logistic regression with the just learnt parameter vector
𝜽 = [𝟎, 𝟎, 𝟎. 𝟓, 𝟎, −𝟎. 𝟓,𝟎. 𝟓]
𝒙𝟎 = 𝟏 𝒙𝟏 = and 𝒙𝟐 = vaccine 𝒙𝟑 = the 𝒙𝟒 = of 𝒙𝟓 = nigeria y
Email a 1 1 1 0 1 1 1
Email b 1 0 0 1 1 0 0
Email c 1 0 1 1 0 0 1
Email d 1 1 0 0 1 0 0
Email e 1 1 0 1 0 1 1
Email f 1 1 0 1 1 0 0
Email k 1 0 1 0 0 1 1
Somehow interesting since it considered “vaccine” and “nigeria” indicative of spam!