Lecture 4
Logistic regression
and neural networks
Machine Learning
Andrey Filchenkov
08.06.2016
Lecture plan
• Logistic regression
• Single-layer neural network
• Completeness problem of neural
networks
• Multilayer neural networks
• Backpropagation
• Modern neural networks
• The presentation is prepared using
materials from K.V. Vorontsov’s
course “Machine Learning”.
Logistic regression
We may want to talk about the probability of belonging to a class
(we will discuss it in Lecture 5 in detail).
The model maps ⟨w, x⟩ ∈ (−∞, +∞) into (0, 1):
y = 1 / (1 + exp(−⟨w, x⟩)) = σ(⟨w, x⟩),
where σ(z) is the logistic (sigmoid) function.
The weights are learned by minimizing
Q(a, T^ℓ) = Σ_{i=1}^{ℓ} ln(1 + exp(−⟨w, x_i⟩ y_i)) → min_w.
That is the logarithmic loss function.
Logarithmic loss function plot
Gradient descent
Derivative of the sigmoid:
σ′(s) = σ(s) σ(−s).
Gradient:
∇Q(w) = −Σ_{i=1}^{ℓ} y_i x_i σ(−M_i(w)),
where M_i(w) = ⟨w, x_i⟩ y_i is the margin.
Gradient step for a single object x_i (stochastic gradient):
w^(t+1) = w^(t) + µ y_i x_i σ(−M_i(w^(t))).
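A minimal NumPy sketch of this stochastic gradient update (the function name, learning rate, and toy data are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic(X, y, lr=0.1, epochs=100):
    """Stochastic gradient steps w <- w + lr * y_i * x_i * sigmoid(-M_i(w))."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            margin = np.dot(w, x_i) * y_i           # M_i(w) = <w, x_i> y_i
            w += lr * y_i * x_i * sigmoid(-margin)  # step on the log loss
    return w

# toy usage: labels must be in {-1, +1}
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = sgd_logistic(X, y)
```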
Smoothed Hebb’s rule
Hebb’s rule:
if −⟨w, x_i⟩ y_i > 0, then w^(t+1) = w^(t) + µ x_i y_i.
The threshold indicator [M_i < 0] and its smoothed version σ(−M_i):
Logistic regression implementation
Python (scikit-learn): LogisticRegression with different solvers
Weka: Logistic
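A usage sketch with scikit-learn (the synthetic dataset and the particular solver are only for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# synthetic binary classification data
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(solver="lbfgs")  # other solvers: "liblinear", "saga", ...
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))          # accuracy on the test set
print(clf.predict_proba(X_test[:3]))      # class probabilities from the sigmoid
```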
Lecture plan
• Logistic regression
• Single-layer neural network
• Completeness problem of neural
networks
• Multilayer neural networks
• Backpropagation
• Modern neural networks
Biological intuition
Neuron
Generalized McCulloch-Pitts neuron:
a(x, w) = σ(Σ_{j=1}^{n} w_j f_j(x) − w_0),
where σ is an activation function, f_j(x) are features,
w_j are weights, and w_0 is the threshold.
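A direct reading of this formula as code; the tanh activation and the example numbers are arbitrary illustrative choices:

```python
import numpy as np

def neuron(x, w, w0, activation=np.tanh):
    """Generalized neuron: a(x, w) = activation(sum_j w_j * f_j(x) - w0).
    Here the features f_j(x) are simply the components of x."""
    return activation(np.dot(w, x) - w0)

# example: a single neuron with a tanh activation
print(neuron(np.array([0.5, -1.0]), w=np.array([2.0, 1.0]), w0=0.1))
```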
Activation functions
Rosenblatt’s rule and Hebb’s rule
Rosenblatt’s rule (for {1; 0} labels) for weight
learning: for each object x_i, change the
weight vector:
w^(t+1) := w^(t) − η (a(x_i) − y_i) x_i.
Hebb’s rule (for {1; −1} labels) for weight
learning: for each object x_i, change the
weight vector:
if ⟨w, x_i⟩ y_i < 0, then w^(t+1) := w^(t) + η x_i y_i.
Delta rule
Let L(a, x_i) = ½ (⟨w, x_i⟩ − y_i)².
Delta rule for weight learning: for each object
x_i, change the weight vector:
w^(t+1) := w^(t) − η (⟨w, x_i⟩ − y_i) x_i.
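A small sketch of the delta rule as a training loop (the function name, learning rate, and toy data are illustrative):

```python
import numpy as np

def delta_rule(X, y, lr=0.01, epochs=100):
    """Delta rule: w <- w - lr * (<w, x_i> - y_i) * x_i for each object."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            w -= lr * (np.dot(w, x_i) - y_i) * x_i
    return w

# toy usage with labels in {-1, +1}
X = np.array([[1.0, 1.0], [2.0, 0.5], [-1.5, -1.0], [-0.5, -2.0]])
y = np.array([1, 1, -1, -1])
print(delta_rule(X, y))
```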
Lecture plan
• Logistic regression
• Single-layer neural network
• Completeness problem of neural
networks
• Multilayer neural networks
• Backpropagation
• Modern neural networks
Completeness problem (for neuron)
Basic idea: synthesize combinations of neurons.
Completeness problem: how rich is the family of
functions that can be represented by a neural
network?
Start with a single neuron.
Logical functions as neural networks
Logical AND
x_1 ∧ x_2 = [x_1 + x_2 − 3/2 > 0]
Logical OR
x_1 ∨ x_2 = [x_1 + x_2 − 1/2 > 0]
Logical NOT
¬x_1 = [−x_1 + 1/2 > 0]
Two ways of making it more complex
Example (Minsky): x_1 ⊕ x_2 (XOR) is not linearly separable.
Two ways of making it more complex
(see the sketch after this list):
1. Use a non-linear transformation:
x_1 ⊕ x_2 = [x_1 + x_2 − 2 x_1 x_2 − 1/2 > 0]
2. Build a superposition:
x_1 ⊕ x_2 = [(x_1 ∨ x_2) − (x_1 ∧ x_2) − 1/2 > 0]
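These threshold formulas can be checked directly in code (a sketch; the function names are only for illustration):

```python
def AND(x1, x2):
    return int(x1 + x2 - 1.5 > 0)

def OR(x1, x2):
    return int(x1 + x2 - 0.5 > 0)

def NOT(x1):
    return int(-x1 + 0.5 > 0)

def XOR(x1, x2):
    # superposition of two linear threshold neurons
    return int(OR(x1, x2) - AND(x1, x2) - 0.5 > 0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b), XOR(a, b))
```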
Completeness problem (Boolean functions)
Completeness problem: how rich is the family of
functions that can be represented by a neural
network?
DNF Theorem:
Any Boolean function can be represented by one
and only one full disjunctive normal form.
What about all possible functions?
Gorban Theorem
Theorem (Gorban, 1998)
Let
• X be a compact space,
• C(X) be the algebra of continuous real-valued
functions on X,
• F be a linear subspace of C(X), closed with respect to
a nonlinear continuous function φ (f ∈ F ⇒ φ(f) ∈ F)
and containing the constant function (1 ∈ F),
• F separate points in X.
Then F is dense in C(X).
Lecture plan
• Logistic regression
• Single-layer neural network
• Completeness problem of neural
networks
• Multilayer neural networks
• Backpropagation
• Modern neural networks
Multilayer neural network
Multilayer neural network
Any number of layers
Any number of neurons on each layer
Any number of connections between different layers
Weights adjusting
Let us use SGD to learn the weights
w = (w_jh, w_hm):
w^(t+1) = w^(t) − η ∇L(w^(t), x_i, y_i),
where L(w, x_i, y_i) is the loss function (it depends on the
problem we are solving).
Lecture plan
• Logistic regression
• Single-layer neural network
• Completeness problem of neural
networks
• Multilayer neural networks
• Backpropagation
• Modern neural networks
Derivation of functions superposition
a^m(x) = σ_m(Σ_{h=0}^{H} w_hm u^h(x));  (output layer)
u^h(x) = σ_h(Σ_{j=0}^{n} w_jh f_j(x));  (hidden layer)
Let L_i(w) = ½ Σ_{m=1}^{M} (a^m(x_i) − y_i^m)².
Find the partial derivatives
∂L_i(w)/∂a^m and ∂L_i(w)/∂u^h.
Errors on layers
∂L_i(w)/∂a^m = a^m(x_i) − y_i^m
ε_i^m = a^m(x_i) − y_i^m is the error on the output layer.
∂L_i(w)/∂u^h = Σ_{m=1}^{M} (a^m(x_i) − y_i^m) σ′_m w_hm = Σ_{m=1}^{M} ε_i^m σ′_m w_hm
ε_i^h = Σ_{m=1}^{M} ε_i^m σ′_m w_hm is the error on the hidden layer.
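A compact NumPy sketch of one backpropagation/SGD step for a two-layer network, assuming sigmoid hidden units, linear outputs, and the squared loss above (all names, sizes, and the learning rate are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, W2, lr=0.1):
    """One SGD step; sigmoid hidden layer, linear output, loss 1/2 * sum (a - y)^2."""
    # forward pass
    u = sigmoid(W1 @ x)                       # hidden layer outputs u^h(x)
    a = W2 @ u                                # output layer a^m(x) (linear)
    # backward pass
    eps_out = a - y                           # errors on the output layer
    eps_hid = (W2.T @ eps_out) * u * (1 - u)  # errors on the hidden layer (sigma' = u(1-u))
    # gradient step
    W2 -= lr * np.outer(eps_out, u)
    W1 -= lr * np.outer(eps_hid, x)
    return W1, W2

# toy usage: 3 inputs, 4 hidden neurons, 2 outputs
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x, y = rng.normal(size=3), rng.normal(size=2)
W1, W2 = backprop_step(x, y, W1, W2)
```

The hidden-layer errors are obtained from the output-layer errors through the same weights W2, which is exactly the backward propagation of errors described above.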
Backpropagation discussion (advantages)
Advantages:
• efficiency: the gradient is computed in time
comparable to a single forward pass of the network;
• easily applies to any activation σ and any loss L;
• can be applied in online (dynamic) learning;
• the whole sample need not be used at once;
• can be parallelized.
Backpropagation discussion
(disadvantages)
Disadvantages:
• does not always converge;
• can get stuck in local optima;
• the number of neurons in the hidden layer must be
fixed in advance;
• the more connections, the more probable overfitting is;
• “paralysis” of a single neuron or of the whole network.
Lecture plan
• Logistic regression
• Single-layer neural network
• Completeness problem of neural
networks
• Multilayer neural networks
• Backpropagation
• Modern neural networks
Plethora of neural networks
Tens or even hundreds of different neural network
types exist:
• self-organizing map
• deep learning networks
• recurrent neural networks
• radial basis function networks
• Bayesian neural networks
• modular neural networks
• …