
Lecture 3: Logistic Regression, Neural Networks, and Deep Learning

Rohit Kumar
IIT Delhi
August 8, 2025


Logistic Regression:

▶ Binary logistic regression assumes there are two output labels Yi ∈ {0, 1}.

▶ Binary logistic regression postulates the conditional probability Pr(Yi = 1 | Xi) of the form

      Pr(Yi = 1 | Xi) = σ(Xi⊤β + α),

  where

      σ(x) = exp(x) / (1 + exp(x))

  is called the sigmoid function.
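A minimal NumPy sketch of this probability model follows; the names sigmoid, predict_proba, beta, and alpha are illustrative rather than taken from the lecture.

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = exp(x) / (1 + exp(x)), written in a numerically stable form
    return np.where(x >= 0, 1.0 / (1.0 + np.exp(-x)), np.exp(x) / (1.0 + np.exp(x)))

def predict_proba(X, beta, alpha):
    # Pr(Y = 1 | X) = sigma(X^T beta + alpha), computed for each row of X
    return sigmoid(X @ beta + alpha)
```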

Rohit Kumar IIT Delhi August 8, 2025 2 / 26


Logistic Regression: Neural Network:

Logistic Regression:

▶ Based on the MLE estimates (β̂, α̂) we get

      P̂r(Yi = 1 | Xi) = σ(Xi⊤β̂ + α̂).

▶ We can define our classifier based on this as

      ϕ(Xi) = arg max { P̂r(Yi = 1 | Xi), P̂r(Yi = 0 | Xi) }
            = 1 if P̂r(Yi = 1 | Xi) ≥ 0.5, and 0 otherwise.

▶ It is easy to see that

      P̂r(Yi = 1 | Xi) ≥ 0.5  ⇐⇒  Xi⊤β̂ + α̂ ≥ 0.
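Continuing the sketch above, the classifier can be implemented by thresholding the estimated probability at 0.5, which is equivalent to checking the sign of the linear score. The name classify and the arguments beta_hat, alpha_hat are illustrative.

```python
def classify(X, beta_hat, alpha_hat):
    # phi(X) = 1 iff the estimated Pr(Y = 1 | X) >= 0.5,
    # which is equivalent to X^T beta_hat + alpha_hat >= 0.
    return (X @ beta_hat + alpha_hat >= 0).astype(int)
```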

Rohit Kumar IIT Delhi August 8, 2025 3 / 26


Logistic Regression: Neural Network:

Logistic Regression:

▶ Here ϕ(Xi) is the predicted label and Yi is the true label.

▶ We can express the Type I and Type II errors as

      FP = Pr(Yi = 0 and ϕ(Xi) = 1),
      FN = Pr(Yi = 1 and ϕ(Xi) = 0).

▶ Sample splitting (evaluating on a held-out sample) gives an estimate of FP and FN; see the sketch below.
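A minimal sketch of the sample-splitting idea, assuming (β̂, α̂) were fitted on the training split only and Y_pred comes from the classify helper above; fp_fn_estimate is an illustrative name.

```python
import numpy as np

def fp_fn_estimate(Y_true, Y_pred):
    # Empirical FP and FN frequencies on a held-out test split,
    # where Y_pred = classify(X_test, beta_hat, alpha_hat) and
    # (beta_hat, alpha_hat) were estimated on the training split only.
    fp = np.mean((Y_true == 0) & (Y_pred == 1))
    fn = np.mean((Y_true == 1) & (Y_pred == 0))
    return fp, fn
```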

Rohit Kumar IIT Delhi August 8, 2025 4 / 26


Logistic Regression: Neural Network:

Logistic Regression:

Figure: logistic regression drawn as a single neuron — inputs x1, …, xk with weights β1, …, βk and bias α are combined as σ(α + Σᵢ βᵢ xᵢ).


Multiclass Logistic Regression:

▶ Multiclass logistic regression assumes there are J + 1 output labels Yi ∈ {0, 1, . . . , J}.

▶ The model postulates the conditional probability Pr(Yi = k | Xi) of the form

      Pr(Yi = k | Xi) = exp(Zi^k) / Σ_{j=0}^{J} exp(Zi^j),

  where Zi^k = Xi⊤β^k + α^k. This normalization is called the softmax function.
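A minimal NumPy sketch of the softmax probabilities, assuming the parameters are stacked into a matrix B of shape (d, J+1) and a bias vector A of length J+1 (illustrative names).

```python
import numpy as np

def softmax_proba(X, B, A):
    # Z[i, k] = X[i]^T beta^k + alpha^k, with B of shape (d, J+1) and A of shape (J+1,)
    Z = X @ B + A
    Z = Z - Z.max(axis=1, keepdims=True)   # shift scores for numerical stability
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)
```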


Multiclass Logistic Regression:

▶ Based on the MLE estimates we get

      P̂r(Yi = k | Xi) = exp(Ẑi^k) / Σ_{j=0}^{J} exp(Ẑi^j).

▶ We can define our classifier based on this as

      ϕ(Xi) = arg max_{k ∈ {0, 1, . . . , J}} P̂r(Yi = k | Xi).

▶ It is easy to see that

      P̂r(Yi = k | Xi) ≥ P̂r(Yi = j | Xi) for all j ≠ k  ⇐⇒  Ẑi^k ≥ Ẑi^j for all j ≠ k.
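Because the softmax is monotone in the scores, the classifier can skip the probabilities entirely and take the argmax of the linear scores. A short sketch, using the same illustrative (d, J+1) parameter layout as above:

```python
def classify_multiclass(X, B_hat, A_hat):
    # phi(X) = argmax_k Pr_hat(Y = k | X); since softmax is monotone in the scores,
    # this equals the argmax over Z_hat[i, k] = X[i]^T beta_hat^k + alpha_hat^k.
    Z_hat = X @ B_hat + A_hat
    return Z_hat.argmax(axis=1)
```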


Multiclass Logistic Regression:

Figure: network diagram with an input layer (x1, …, x4), a hidden layer, and an output layer.

Revisit Logistic Regression:

Figure: the XOR labelling plotted in the (x1, x2) plane — (0, 0) → 0, (1, 0) → 1, (0, 1) → 1, (1, 1) → 0; the two classes are not linearly separable.


Revisit Logistic Regression:

Figure: a network with an input layer (x1, x2), one hidden layer, and an output layer producing y.


Neural Network:

▶ Logistic regression is an example of the simplest neural network, consisting only of an input and an output layer.

▶ Neural networks are a generalization of multiclass logistic regression.

▶ In general, a neural network has one input layer, one or more hidden layers, and one output layer.

▶ There can be many hidden layers, and each layer is assigned an activation function.


Shallow Network:

Figure: a shallow network with an input layer (x1, …, x4), a single hidden layer, and an output layer.


Deep Network:

Figure: a deep network with an input layer (x1, …, x4), hidden layers 1 through L, and an output layer.


Neural Network:

▶ Our objective is to approximate a function f : X → R, where X ⊆ R^d, using a neural network.

▶ We consider a network with L hidden layers, with the width of layer l denoted by H_l for l = 0, 1, ..., L + 1.

▶ H_0 = d is the number of inputs and H_{L+1} = 1 is the width of the output layer.

▶ Let us denote the output vector of the l-th layer by x^{(l)} ∈ R^{H_l}, which serves as the input to the next layer.

▶ Each output of the l-th layer is fed into the (l + 1)-th layer as input.


Neural Network:

▶ Each input is weighted by β and shifted by a bias α, giving β⊤x + α; an activation function is then applied to obtain the layer output σ(β⊤x + α).

▶ For layer l + 1, the output of the k-th neuron is

      x_k^{(l+1)} = σ( (β_k^{(l+1)})⊤ x^{(l)} + α_k^{(l+1)} ),   1 ≤ k ≤ H_{l+1},

  where β_k^{(l+1)} and α_k^{(l+1)} are known as the weights and bias, respectively, while the function σ(·) is known as the activation function.


Neural Network:

▶ We can write the whole layer in matrix form as

      x^{(l+1)} = σ(L_{l+1}(x^{(l)})) = σ ∘ L_{l+1}(x^{(l)}),

  where L_{l+1}(x^{(l)}) = B^{(l+1)} x^{(l)} + A^{(l+1)}, with

      B^{(l+1)} = [ (β_1^{(l+1)})⊤ ; (β_2^{(l+1)})⊤ ; … ; (β_{H_{l+1}}^{(l+1)})⊤ ],
      A^{(l+1)} = ( α_1^{(l+1)}, α_2^{(l+1)}, …, α_{H_{l+1}}^{(l+1)} )⊤,

  so the rows of B^{(l+1)} are the weight vectors and the entries of A^{(l+1)} are the biases of the H_{l+1} neurons in layer l + 1.

▶ The final output can be written as

      f(x; θ) = L^{(L+1)} ∘ σ ∘ L^{(L)} ∘ σ ∘ L^{(L−1)} ∘ ⋯ ∘ σ ∘ L^{(1)}(x).
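A minimal NumPy sketch of this forward pass, assuming the weights are stored as a list of matrices Bs and a list of bias vectors As (illustrative names, not from the lecture); ReLU stands in for a generic activation σ.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, Bs, As, sigma=relu):
    # f(x; theta) = L^{(L+1)} o sigma o L^{(L)} o ... o sigma o L^{(1)}(x),
    # where Bs[l], As[l] hold B^{(l+1)}, A^{(l+1)} for l = 0, ..., L.
    h = x
    for B, A in zip(Bs[:-1], As[:-1]):
        h = sigma(B @ h + A)        # hidden layers: affine map followed by activation
    return Bs[-1] @ h + As[-1]      # output layer: affine map only
```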


Activation Function:

▶ The activation function plays a pivotal role in enabling the network to represent complex non-linear functions.

▶ Examples (a short sketch of these follows the list):

  1. Linear: σ(x) = x.
  2. ReLU: σ(x) = max{0, x}.
  3. Leaky ReLU: σ(x) = max{0, x} + α min{0, x}.
  4. Sigmoid: σ(x) = exp(x)/(1 + exp(x)).
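Minimal NumPy sketches of the activations listed above; the default α = 0.1 for the leaky ReLU matches the value annotated in the figure on the next slide, and the function names are illustrative.

```python
import numpy as np

def linear(x):
    return x

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.1):
    # max{0, x} + alpha * min{0, x}: slope 1 for x > 0, slope alpha for x < 0
    return np.maximum(0.0, x) + alpha * np.minimum(0.0, x)

def sigmoid(x):
    return np.where(x >= 0, 1.0 / (1.0 + np.exp(-x)), np.exp(x) / (1.0 + np.exp(x)))
```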


Activation Functions:

Figure: (1) Linear, (2) ReLU, (3) Leaky ReLU (with α = 0.1); each panel plots σ(ξ) against ξ.


Activation Functions:

Figure: (1) Logistic, (2) Tanh, (3) Sine; each panel plots σ(ξ) against ξ.


Example: 1 Layer.

Figure: a one-hidden-layer ReLU network built step by step on the interval [0, 4]; each hidden unit (with its own weight w and bias b) contributes one kink, and combining two units yields a piecewise-linear output with two kinks.
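A small sketch of the idea behind this figure: with ReLU hidden units, each unit adds one kink, so the network output is piecewise linear. The weights and biases below are illustrative, not the values annotated in the figure.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Illustrative one-hidden-layer network with two ReLU units on x in [0, 4]:
# each unit contributes one kink, so the output is piecewise linear with two kinks.
x = np.linspace(0.0, 4.0, 9)
h1 = relu(1.0 * x + 0.0)      # kink at x = 0
h2 = relu(2.0 * x - 2.0)      # kink at x = 1
y = 1.0 * h1 - 1.0 * h2       # piecewise-linear output: slope changes at the kinks
print(np.column_stack([x, y]))
```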


Deep vs Shallow:

▶ When and Why Are Deep Networks Better than Shallow Ones?

▶ Both shallow and deep networks can approximate arbitrarily well any
continuous function of d variables on a compact domain.

▶ Suppose we want to approximate functions with a compositional structure

      f(x1, …, xd) = h1(h2(…(hj(h_{i1}(x1, x2), h_{i2}(x3, x4)), …)).

▶ For shallow learning, we need parameter complexity of order ε^{−d/r}.

▶ For deep learning, we need parameter complexity of around ε^{−2/r}.


Gradient Descent:

▶ Suppose we wish to solve the minimization problem θ* = arg min_θ Π(θ).

▶ Consider the Taylor expansion about θ0:

      Π(θ0 + ∆θ) = Π(θ0) + (∂Π(θ0)/∂θ) ∆θ + ½ ∆θ⊤ (∂²Π(θ̂)/∂θ∂θ′) ∆θ,

  for some θ̂ = θ0 + α∆θ with 0 ≤ α ≤ 1.

▶ When |∆θ| is small we can neglect the second-order term:

      Π(θ0 + ∆θ) ≈ Π(θ0) + (∂Π(θ0)/∂θ) ∆θ.


Gradient Descent:

▶ We should choose ∆θ so as to reduce the objective function.

▶ We therefore take the step ∆θ in the direction opposite to the gradient,

      ∆θ = −η ∂Π(θ0)/∂θ,

  with step-size η ≥ 0, also known as the learning rate.

▶ This is the crux of the gradient descent (GD) algorithm.


Gradient Descent:

1. Initialize k = 0 and θ0.

2. While |Π(θk) − Π(θk−1)| > ϵ1, do:

   (a) Evaluate ∂Π(θk)/∂θ.
   (b) Update θk+1 = θk − η ∂Π(θk)/∂θ.
   (c) Increment k = k + 1.
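A minimal sketch of this loop, assuming a user-supplied gradient function; the names gradient_descent, grad, Pi, eta, and eps1 are illustrative.

```python
import numpy as np

def gradient_descent(grad, theta0, eta=0.1, eps1=1e-8, max_iter=10_000, Pi=None):
    # Plain gradient descent: theta_{k+1} = theta_k - eta * dPi/dtheta(theta_k).
    # If the objective Pi is supplied, stop once |Pi(theta_k) - Pi(theta_{k-1})| <= eps1.
    theta = np.asarray(theta0, dtype=float)
    prev = Pi(theta) if Pi is not None else None
    for _ in range(max_iter):
        theta = theta - eta * grad(theta)
        if Pi is not None:
            cur = Pi(theta)
            if abs(cur - prev) <= eps1:
                break
            prev = cur
    return theta

# Illustrative use on Pi(theta) = ||theta||^2, whose gradient is 2 * theta:
theta_star = gradient_descent(lambda t: 2 * t, theta0=[3.0, -2.0], eta=0.1,
                              Pi=lambda t: float(np.dot(t, t)))
```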


Advanced Algorithms:
▶ In general, the update formula for most optimization algorithms is

      [θk+1]i = [θk]i − [ηk]i [gk]i,   1 ≤ i ≤ Nθ.

▶ Momentum methods make use of the history of the gradient:

      [ηk]i = η,   gk = β1 gk−1 + (1 − β1) ∂Π(θk)/∂θ,   g−1 = 0.

▶ Adam's algorithm: gk is the same as in the momentum algorithm (typically β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁸). Additionally,

      [Gk]i = β2 [Gk−1]i + (1 − β2) (∂Π(θk)/∂θi)²,
      [ηk]i = η / (√([Gk]i) + ϵ).
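A sketch of one update following the recursions above; adam_step and its argument names are illustrative, and standard Adam additionally applies bias-correction terms that are not shown on the slide.

```python
import numpy as np

def adam_step(theta, grad_val, g, G, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam-style update (no bias correction), following the slide's recursions:
    # g_k = beta1 * g_{k-1} + (1 - beta1) * grad,  G_k = beta2 * G_{k-1} + (1 - beta2) * grad^2,
    # [theta_{k+1}]_i = [theta_k]_i - eta / (sqrt([G_k]_i) + eps) * [g_k]_i.
    g = beta1 * g + (1.0 - beta1) * grad_val
    G = beta2 * G + (1.0 - beta2) * grad_val ** 2
    theta = theta - (eta / (np.sqrt(G) + eps)) * g
    return theta, g, G
```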


Back Propagation:

▶ In deep learning we minimize the least-squares objective

      Σ_{i=1}^{N} (Yi − f(Xi; θ))².

▶ Recall that

      f(x; θ) = L^{(L+1)} ∘ σ ∘ L^{(L)} ∘ σ ∘ L^{(L−1)} ∘ ⋯ ∘ σ ∘ L^{(1)}(x).

▶ We can compute the derivatives with respect to all parameters by working backwards through this composition, applying the chain rule layer by layer; a sketch follows.
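A hedged sketch of backpropagation for a single observation through a one-hidden-layer ReLU network under the squared loss; backprop_one_hidden and the parameter names B1, A1, b2, a2 are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)

def backprop_one_hidden(x, y, B1, A1, b2, a2):
    # Forward pass through one hidden ReLU layer and a linear output,
    # then a backward pass for the squared loss (y - f(x; theta))^2.
    z1 = B1 @ x + A1            # pre-activation of the hidden layer
    h = relu(z1)                # hidden layer output
    y_hat = b2 @ h + a2         # network output f(x; theta)

    d_yhat = -2.0 * (y - y_hat)         # dLoss/dy_hat
    d_b2 = d_yhat * h                   # dLoss/db2
    d_a2 = d_yhat                       # dLoss/da2
    d_h = d_yhat * b2                   # dLoss/dh, propagated back through the output layer
    d_z1 = d_h * relu_grad(z1)          # dLoss/dz1, chain rule through the activation
    d_B1 = np.outer(d_z1, x)            # dLoss/dB1
    d_A1 = d_z1                         # dLoss/dA1
    return (d_B1, d_A1, d_b2, d_a2), y_hat
```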
