Deep Learning.
Rohit Kumar
IIT Delhi
August 8, 2025
Logistic Regression:
▶ Binary logistic regression assumes there are two output labels $Y_i \in \{0, 1\}$.
▶ Binary logistic regression postulates the conditional probability $\Pr(Y_i = 1 \mid X_i)$ of the form
\[
\Pr(Y_i = 1 \mid X_i) = \sigma(X_i^\top \beta + \alpha),
\]
where
\[
\sigma(x) = \frac{\exp(x)}{1 + \exp(x)}
\]
is called the sigmoid function.
Logistic Regression:
▶ Based on the MLE estimator we get
\[
\widehat{\Pr}(Y_i = 1 \mid X_i) = \sigma(X_i^\top \hat{\beta} + \hat{\alpha}).
\]
▶ We can define our classifier based on this as
\[
\phi(X_i) = \arg\max \bigl\{ \widehat{\Pr}(Y_i = 1 \mid X_i),\; \widehat{\Pr}(Y_i = 0 \mid X_i) \bigr\}
= \begin{cases} 1 & \widehat{\Pr}(Y_i = 1 \mid X_i) \ge .5 \\ 0 & \text{otherwise.} \end{cases}
\]
▶ It is easy to see that
\[
\widehat{\Pr}(Y_i = 1 \mid X_i) \ge .5 \iff X_i^\top \hat{\beta} + \hat{\alpha} \ge 0.
\]
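A minimal Python sketch of this fitted rule, assuming estimates $\hat\beta$ and $\hat\alpha$ are already available from some MLE routine; the numerical values below are purely illustrative.

```python
import numpy as np

def sigmoid(x):
    """sigma(x) = exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + np.exp(-x))

def classify(X, beta_hat, alpha_hat):
    """Label 1 if Pr_hat(Y = 1 | X) >= 0.5, else 0.

    Since sigma is monotone, this is equivalent to X @ beta_hat + alpha_hat >= 0.
    """
    p_hat = sigmoid(X @ beta_hat + alpha_hat)   # Pr_hat(Y_i = 1 | X_i)
    return (p_hat >= 0.5).astype(int)

# Illustrative estimates for two covariates (not fitted to any real data).
beta_hat, alpha_hat = np.array([1.5, -0.8]), 0.2
X = np.array([[0.4, 0.1], [-1.0, 2.0]])
print(classify(X, beta_hat, alpha_hat))   # array of 0/1 labels
```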
Logistic Regression:
▶ So $\phi(X_i)$ is the predicted label and $Y_i$ is the true label.
▶ We can express the Type I and Type II errors as
\[
FP = \Pr(Y_i = 0 \,\&\, \phi(X_i) = 1), \qquad
FN = \Pr(Y_i = 1 \,\&\, \phi(X_i) = 0).
\]
▶ We use sample splitting to get an idea of FP and FN: fit on one part of the data and evaluate on a held-out part.
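A hedged sketch of sample splitting: fit on one half, estimate FP and FN on the held-out half. The simulated data and the plugged-in estimates are placeholders for whatever data and fitting routine are actually used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data; in practice (X, Y) are the observed sample.
n, d = 500, 2
X = rng.normal(size=(n, d))
Y = (X @ np.array([1.0, -1.0]) + rng.normal(size=n) > 0).astype(int)

# Split the sample: fit on the first half, evaluate on the held-out half.
idx = rng.permutation(n)
train, test = idx[: n // 2], idx[n // 2:]

# ... fit beta_hat, alpha_hat by MLE on X[train], Y[train] ...
beta_hat, alpha_hat = np.array([1.0, -1.0]), 0.0   # stand-in values

p_hat = 1.0 / (1.0 + np.exp(-(X[test] @ beta_hat + alpha_hat)))
pred = (p_hat >= 0.5).astype(int)
FP = np.mean((Y[test] == 0) & (pred == 1))   # estimate of Pr(Y = 0 & phi(X) = 1)
FN = np.mean((Y[test] == 1) & (pred == 0))   # estimate of Pr(Y = 1 & phi(X) = 0)
print(FP, FN)
```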
Logistic Regression:
[Figure: logistic regression as a single neuron — the inputs $1, x_1, \ldots, x_k$ enter with weights $\alpha, \beta_1, \ldots, \beta_k$, and the output is $\sigma\bigl(\alpha + \sum_{i=1}^{k} \beta_i x_i\bigr)$.]
Multiclass Logistic Regression:
▶ Multiclass logistic regression assumes there are $J + 1$ output labels $Y_i \in \{0, 1, \ldots, J\}$.
▶ The logistic regression postulates the conditional probability $\Pr(Y_i = k \mid X_i)$ of the form
\[
\Pr(Y_i = k \mid X_i) = \frac{\exp(Z_i^k)}{\sum_{j=0}^{J} \exp(Z_i^j)},
\]
where
\[
Z_i^k = X_i^\top \beta^k + \alpha^k.
\]
The map from the scores $(Z_i^0, \ldots, Z_i^J)$ to these probabilities is called the softmax function.
Multiclass Logistic Regression:
▶ Based on the MLE estimator we get
\[
\widehat{\Pr}(Y_i = k \mid X_i) = \frac{\exp(\hat{Z}_i^k)}{\sum_{j=0}^{J} \exp(\hat{Z}_i^j)}.
\]
▶ We can define our classifier based on this as
\[
\phi(X_i) = \arg\max_{k \in \{0, 1, \ldots, J\}} \bigl\{ \widehat{\Pr}(Y_i = k \mid X_i) \bigr\}.
\]
▶ It is easy to see that
\[
\widehat{\Pr}(Y_i = k \mid X_i) \ge \widehat{\Pr}(Y_i = j \mid X_i) \text{ for all } j \neq k
\iff \hat{Z}_i^k \ge \hat{Z}_i^j \text{ for all } j \neq k.
\]
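A short Python sketch of this rule: stack the class scores $\hat Z_i^k$, apply the softmax, and take the argmax (which, by monotonicity, equals the argmax over the scores themselves). The estimates below are illustrative, not fitted.

```python
import numpy as np

def softmax(Z):
    """Softmax along the last axis, with the usual max shift for numerical stability."""
    Z = Z - Z.max(axis=-1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=-1, keepdims=True)

def classify_multiclass(X, B_hat, a_hat):
    """B_hat holds one column beta_hat^k per class; a_hat one intercept per class."""
    Z_hat = X @ B_hat + a_hat        # scores Z_hat_i^k, shape (n, J + 1)
    probs = softmax(Z_hat)           # Pr_hat(Y_i = k | X_i)
    return probs.argmax(axis=1)      # same as Z_hat.argmax(axis=1)

# Illustrative estimates: d = 2 covariates, J + 1 = 3 classes.
B_hat = np.array([[1.0, 0.0, -1.0],
                  [0.0, 1.0, -1.0]])
a_hat = np.array([0.0, 0.1, -0.1])
X = np.array([[2.0, -1.0], [-0.5, 1.5]])
print(classify_multiclass(X, B_hat, a_hat))
```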
Multiclass Logistic Regression:
[Figure: a network diagram with an input layer ($x_1, \ldots, x_4$), a hidden layer, and an output layer.]
Revisit Logistic Regression:
[Figure: the four points of the unit square in the $(x_1, x_2)$ plane with XOR labels — $(0,0) \to 0$, $(1,1) \to 0$, $(0,1) \to 1$, $(1,0) \to 1$ — which are not linearly separable.]
Revisit Logistic Regression:
[Figure: a network with inputs $x_1, x_2$, one hidden layer, and a single output $y$.]
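To see how one hidden layer resolves the XOR pattern above, here is a sketch with hand-chosen (not estimated) weights: one hidden unit approximates OR, the other AND, and the output unit fires when OR holds but AND does not.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def xor_net(x1, x2):
    """Hand-chosen weights: h1 ~ OR(x1, x2), h2 ~ AND(x1, x2), y ~ h1 AND NOT h2."""
    h1 = sigmoid(20 * (x1 + x2) - 10)    # ~1 if at least one input is 1
    h2 = sigmoid(20 * (x1 + x2) - 30)    # ~1 only if both inputs are 1
    return sigmoid(20 * (h1 - h2) - 10)  # ~1 exactly when the inputs differ

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, int(xor_net(x1, x2) > 0.5))
# prints labels 0, 1, 1, 0 — the XOR pattern no single linear boundary can produce
```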
Neural Network:
▶ Logistic regression is an example of the simplest neural network, consisting only of the input and output layers.
▶ Neural networks are just a generalization of multiclass logistic regression.
▶ In general, a neural network has one input layer, one or more hidden layers, and one output layer.
▶ There can be many hidden layers, and each layer is assigned an activation function.
Shallow Network:
[Figure: a shallow network with an input layer ($x_1, \ldots, x_4$), a single hidden layer, and an output layer.]
Deep Network:
[Figure: a deep network with an input layer ($x_1, \ldots, x_4$), hidden layers $1$ through $L$, and an output layer.]
Neural Network:
▶ Our objective is to approximate a function $f: \mathcal{X} \mapsto \mathbb{R}$, where $\mathcal{X} \subseteq \mathbb{R}^d$, using a neural network.
▶ We consider a network with $L$ hidden layers, with the width of layer $l$ denoted $H_l$ for $l = 0, 1, \ldots, L + 1$.
▶ $H_0 = d$ is the number of inputs and $H_{L+1} = 1$ is the width of the output layer.
▶ Let us denote the output vector of the $l$-th layer by $x^{(l)} \in \mathbb{R}^{H_l}$, which will serve as the input to the next layer.
▶ Each output of the $l$-th layer is fed into the $(l+1)$-th layer as input.
Neural Network:
▶ Each input is weighted by $\beta$ and shifted by a bias $\alpha$ as $\beta^\top x + \alpha$; an activation function is then applied to get the output of the layer,
\[
\sigma(\beta^\top x + \alpha).
\]
▶ For layer $l + 1$ the output of each neuron is
\[
x_k^{(l+1)} = \sigma\bigl( (\beta_k^{(l+1)})^\top x^{(l)} + \alpha_k^{(l+1)} \bigr), \qquad 1 \le k \le H_{l+1},
\]
where $\beta_k^{(l+1)}$ and $\alpha_k^{(l+1)}$ are respectively known as the weights and bias, while the function $\sigma(\cdot)$ is known as the activation function.
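A small sketch of a single layer's forward pass, one neuron at a time, matching the formula above; the widths, weights, and ReLU activation are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def layer_forward(x_prev, betas, alphas, act=relu):
    """Compute the H_{l+1} outputs of layer l+1 from the layer-l output x_prev.

    betas  : list of H_{l+1} weight vectors beta_k^{(l+1)}, each of length H_l
    alphas : list of H_{l+1} biases alpha_k^{(l+1)}
    """
    return np.array([act(b @ x_prev + a) for b, a in zip(betas, alphas)])

# Illustrative layer with H_l = 3 inputs and H_{l+1} = 2 neurons.
x_prev = np.array([1.0, -2.0, 0.5])
betas = [np.array([0.2, 0.4, -0.1]), np.array([-0.3, 0.1, 0.5])]
alphas = [0.1, -0.2]
print(layer_forward(x_prev, betas, alphas))
```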
Neural Network:
▶ We can write the whole thing in matrix form as
\[
x^{(l+1)} = \sigma(\mathcal{L}^{(l+1)}(x^{(l)})) = \sigma \circ \mathcal{L}^{(l+1)}(x^{(l)}),
\]
where $\mathcal{L}^{(l+1)}(x^{(l)}) = B^{(l+1)} x^{(l)} + A^{(l+1)}$ with
\[
B^{(l+1)} = \begin{pmatrix} (\beta_1^{(l+1)})^\top \\ (\beta_2^{(l+1)})^\top \\ \vdots \\ (\beta_{H_{l+1}}^{(l+1)})^\top \end{pmatrix},
\qquad
A^{(l+1)} = \begin{pmatrix} \alpha_1^{(l+1)} \\ \alpha_2^{(l+1)} \\ \vdots \\ \alpha_{H_{l+1}}^{(l+1)} \end{pmatrix}.
\]
▶ The final output can be written as
\[
f(x; \theta) = \mathcal{L}^{(L+1)} \circ \sigma \circ \mathcal{L}^{(L)} \circ \sigma \circ \mathcal{L}^{(L-1)} \circ \cdots \circ \sigma \circ \mathcal{L}^{(1)}(x).
\]
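The same computation vectorized: stack the $\beta_k^{(l+1)}$ into $B^{(l+1)}$ and the $\alpha_k^{(l+1)}$ into $A^{(l+1)}$, then compose the layers to evaluate $f(x; \theta)$. The random initialization and the ReLU choice are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def init_params(widths):
    """widths = [H_0, H_1, ..., H_{L+1}]; returns one (B, A) pair per layer."""
    return [(rng.normal(size=(h_out, h_in)), rng.normal(size=h_out))
            for h_in, h_out in zip(widths[:-1], widths[1:])]

def forward(x, params, act=relu):
    """f(x; theta) = L^(L+1) o sigma o L^(L) o ... o sigma o L^(1)(x)."""
    for B, A in params[:-1]:
        x = act(B @ x + A)          # x^(l+1) = sigma(B^(l+1) x^(l) + A^(l+1))
    B, A = params[-1]
    return B @ x + A                # final affine layer, no activation

params = init_params([4, 8, 8, 1])   # d = 4 inputs, two hidden layers, one output
print(forward(np.array([0.5, -1.0, 2.0, 0.1]), params))
```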
Activation Function:
▶ The activation function plays a pivotal role in helping the network represent complex non-linear functions.
▶ Examples (see the sketch after this list):
1. Linear: $\sigma(x) = x$.
2. ReLU: $\sigma(x) = \max\{0, x\}$.
3. Leaky ReLU: $\sigma(x) = \max\{0, x\} + \alpha \min\{0, x\}$ for a small slope $\alpha > 0$.
4. Sigmoid: $\sigma(x) = \exp(x)/(1 + \exp(x))$.
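The four examples as one-line functions (here $\alpha$ is the leaky-ReLU slope, not a bias).

```python
import numpy as np

def linear(x):
    return x

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.1):
    # x for x >= 0, alpha * x for x < 0
    return np.maximum(0.0, x) + alpha * np.minimum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 1.5])
for f in (linear, relu, leaky_relu, sigmoid):
    print(f.__name__, f(x))
```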
Activation Functions:
[Figure: plots of $\sigma(\xi)$ — (1) Linear (2) ReLU (3) Leaky ReLU with $\alpha = 0.1$.]
Activation Functions:
[Figure: plots of $\sigma(\xi)$ — (1) Logistic (2) Tanh (3) Sine.]
Example: 1 Layer.
[Figure: a one-hidden-layer ReLU network on a scalar input $x_1^{(0)} \in [0, 4]$. Each hidden unit (one with weight $w = 2$, bias $b = -2$; one with weight $w = 1$, bias $b = 0$) produces a piecewise-linear output $x^{(1)}$ with one kink; the output layer (weights $w = 1$, bias $b = 0$) sums them into a piecewise-linear function $x_1^{(2)}$ with two kinks.]
Deep vs Shallow:
▶ When and why are deep networks better than shallow ones?
▶ Both shallow and deep networks can approximate arbitrarily well any continuous function of $d$ variables on a compact domain.
▶ Suppose we want to approximate functions with a compositional structure
\[
f(x_1, \ldots, x_d) = h_1\bigl(h_2 \ldots \bigl(h_j\bigl(h_{i_1}(x_1, x_2),\, h_{i_2}(x_3, x_4)\bigr), \ldots \bigr)\bigr).
\]
▶ For shallow learning, we need parameter complexity of order $\epsilon^{-d/r}$.
▶ For deep learning, we need parameter complexity of only around $\epsilon^{-2/r}$.
Gradient Descent:
▶ Suppose we wish to solve the minimization problem $\theta^* = \arg\min_\theta \Pi(\theta)$.
▶ Consider the Taylor expansion about $\theta_0$,
\[
\Pi(\theta_0 + \Delta\theta) = \Pi(\theta_0) + \frac{\partial \Pi(\theta_0)}{\partial \theta}^{\!\top} \Delta\theta
+ \frac{1}{2}\, \Delta\theta^\top \frac{\partial^2 \Pi(\hat{\theta})}{\partial \theta\, \partial \theta^\top} \Delta\theta,
\]
for some $\hat{\theta} = \theta_0 + \alpha \Delta\theta$, where $0 \le \alpha \le 1$.
▶ When $\|\Delta\theta\|$ is small we can neglect the second-order term:
\[
\Pi(\theta_0 + \Delta\theta) \approx \Pi(\theta_0) + \frac{\partial \Pi(\theta_0)}{\partial \theta}^{\!\top} \Delta\theta.
\]
Gradient Descent:
▶ We should choose $\Delta\theta$ so as to reduce the objective function.
▶ We therefore choose the step $\Delta\theta$ in the opposite direction of the gradient,
\[
\Delta\theta = -\eta\, \frac{\partial \Pi(\theta_0)}{\partial \theta},
\]
with the step size $\eta \ge 0$, also known as the learning rate.
▶ This is the crux of the GD algorithm.
Gradient Descent:
1. Initialize $k = 0$ and $\theta_0$.
2. While $|\Pi(\theta_k) - \Pi(\theta_{k-1})| > \epsilon_1$, do
   (a) Evaluate $\dfrac{\partial \Pi(\theta_k)}{\partial \theta}$.
   (b) Update $\theta_{k+1} = \theta_k - \eta\, \dfrac{\partial \Pi(\theta_k)}{\partial \theta}$.
   (c) Increment $k = k + 1$.
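A direct Python transcription of this loop, applied to a simple quadratic objective whose minimizer is known; the learning rate $\eta$ and tolerance $\epsilon_1$ are tuning choices.

```python
import numpy as np

def gradient_descent(pi, grad, theta0, eta=0.1, eps1=1e-8, max_iter=10_000):
    """Minimize Pi(theta): stop once |Pi(theta_k) - Pi(theta_{k-1})| <= eps1."""
    theta = np.asarray(theta0, dtype=float)
    prev_val = np.inf
    for _ in range(max_iter):
        val = pi(theta)
        if abs(val - prev_val) <= eps1:
            break
        theta = theta - eta * grad(theta)   # theta_{k+1} = theta_k - eta * dPi/dtheta
        prev_val = val
    return theta

# Illustrative objective Pi(theta) = ||theta - c||^2 with known minimizer c.
c = np.array([1.0, -2.0])
pi = lambda th: float(np.sum((th - c) ** 2))
grad = lambda th: 2.0 * (th - c)
print(gradient_descent(pi, grad, theta0=np.zeros(2)))   # approximately [1, -2]
```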
Advanced Algorithms:
▶ In general, the update formula for most optimization algorithms is
\[
[\theta_{k+1}]_i = [\theta_k]_i - [\eta_k]_i\, [g_k]_i, \qquad 1 \le i \le N_\theta.
\]
▶ Momentum methods make use of the history of the gradient:
\[
[\eta_k]_i = \eta, \qquad g_k = \beta_1 g_{k-1} + (1 - \beta_1)\, \frac{\partial \Pi(\theta_k)}{\partial \theta}, \qquad g_{-1} = 0.
\]
▶ Adam's algorithm: $g_k$ is the same as in the momentum algorithm (with $\beta_1 = 0.9$, $\beta_2 = 0.999$ and $\epsilon = 10^{-8}$). Additionally,
\[
[G_k]_i = \beta_2 [G_{k-1}]_i + (1 - \beta_2) \left( \frac{\partial \Pi(\theta_k)}{\partial \theta_i} \right)^{\!2},
\qquad
[\eta_k]_i = \frac{\eta}{\sqrt{[G_k]_i} + \epsilon}.
\]
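A sketch of the two update rules as written above; note that the full Adam algorithm also includes bias-correction terms, which the slide (and this sketch) omit. The hyperparameter defaults follow the values quoted above.

```python
import numpy as np

def momentum_step(theta, grad, g_prev, eta=0.01, beta1=0.9):
    """Momentum update: g_k = beta1 * g_{k-1} + (1 - beta1) * grad."""
    g = beta1 * g_prev + (1 - beta1) * grad
    return theta - eta * g, g

def adam_step(theta, grad, g_prev, G_prev, eta=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam-style update as written on the slide (no bias correction shown)."""
    g = beta1 * g_prev + (1 - beta1) * grad                 # first moment
    G = beta2 * G_prev + (1 - beta2) * grad ** 2            # second moment, per coordinate
    step = eta / (np.sqrt(G) + eps)                         # coordinate-wise learning rate
    return theta - step * g, g, G

# One illustrative step on a toy gradient.
theta, grad = np.zeros(3), np.array([0.5, -1.0, 2.0])
theta_m, g = momentum_step(theta, grad, g_prev=np.zeros(3))
theta_a, g, G = adam_step(theta, grad, g_prev=np.zeros(3), G_prev=np.zeros(3))
print(theta_m, theta_a)
```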
Back Propagation:
▶ In deep learning we are minimizing the least-squares criterion
\[
\sum_{i=1}^{N} \bigl(Y_i - f(X_i; \theta)\bigr)^2.
\]
▶ Recall that
\[
f(x; \theta) = \mathcal{L}^{(L+1)} \circ \sigma \circ \mathcal{L}^{(L)} \circ \sigma \circ \mathcal{L}^{(L-1)} \circ \cdots \circ \sigma \circ \mathcal{L}^{(1)}(x).
\]
▶ Applying the chain rule to this composition, we can compute the derivative with respect to every parameter by working backwards from the output layer; this is back propagation.
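A hedged sketch of back propagation for the squared loss with one hidden layer: the forward pass stores the intermediate quantities, and the backward pass applies the chain rule from the output back to the first layer's parameters. The shapes and the ReLU activation are assumptions for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)

def loss_and_grads(X, Y, B1, A1, B2, A2):
    """Squared loss sum_i (Y_i - f(X_i))^2 for f = L2 o relu o L1, with gradients."""
    # Forward pass (keep intermediates for the backward pass).
    Z1 = X @ B1.T + A1          # hidden-layer pre-activations, shape (n, H1)
    H = relu(Z1)                # hidden output x^(1)
    f = H @ B2.T + A2           # network output, shape (n, 1)
    resid = f - Y[:, None]
    loss = np.sum(resid ** 2)

    # Backward pass: chain rule, starting from dLoss/df = 2 * resid.
    dF = 2.0 * resid                        # (n, 1)
    dB2 = dF.T @ H                          # dLoss/dB2, shape (1, H1)
    dA2 = dF.sum(axis=0)                    # dLoss/dA2
    dH = dF @ B2                            # (n, H1)
    dZ1 = dH * relu_grad(Z1)                # back through the activation
    dB1 = dZ1.T @ X                         # dLoss/dB1, shape (H1, d)
    dA1 = dZ1.sum(axis=0)                   # dLoss/dA1
    return loss, (dB1, dA1, dB2, dA2)

# Tiny illustrative problem.
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(20, 3)), rng.normal(size=20)
B1, A1 = rng.normal(size=(5, 3)), np.zeros(5)
B2, A2 = rng.normal(size=(1, 5)), np.zeros(1)
print(loss_and_grads(X, Y, B1, A1, B2, A2)[0])
```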