10/24/24, 10:09 AM Deep Learning - IIT Ropar - - Unit 10 - Week 7
Week 7 : Assignment 7
The due date for submitting this assignment has passed.
Due on 2024-09-11, 23:59 IST.
Assignment submitted on 2024-09-11, 15:16 IST
Common Data Q1-Q2
Consider two models:
$\hat{f}_1(x) = w_0 + w_1 x$
$\hat{f}_2(x) = w_0 + w_1 x^2 + w_2 x^2 + w_4 x^4 + w_5 x^5$
1) Which of these models has higher complexity? 1 point
$\hat{f}_1(x)$
$\hat{f}_2(x)$
It is not possible to decide without knowing the true distribution of data points in the dataset.
Yes, the answer is correct.
Score: 1
Accepted Answers:
$\hat{f}_2(x)$
2) We generate the data using the following model: 1 point
$y = 5x^3 + 2x + x + 3$.
We fit the two models $\hat{f}_1(x)$ and $\hat{f}_2(x)$ on this data and train them using a neural network.
https://onlinecourses.nptel.ac.in/noc24_cs114/unit?unit=92&assessment=295 1/4
$\hat{f}_1(x)$ has a higher bias than $\hat{f}_2(x)$.
$\hat{f}_1(x)$ has a higher variance than $\hat{f}_2(x)$.
$\hat{f}_2(x)$ has a higher bias than $\hat{f}_1(x)$.
$\hat{f}_2(x)$ has a higher variance than $\hat{f}_1(x)$.
Yes, the answer is correct.
Score: 1
Accepted Answers:
$\hat{f}_1(x)$ has a higher bias than $\hat{f}_2(x)$.
$\hat{f}_2(x)$ has a higher variance than $\hat{f}_1(x)$.
Common Data Q3-Q6
Consider a function $L(w, b) = 0.5w^2 + 5b^2 + 1$ and its contour plot given below:
[Figure: contour plot of L(w, b)]
3) What is the value of $L(w^*, b^*)$, where $w^*$ and $b^*$ are the values that minimize the function?
1
Yes, the answer is correct.
Score: 1
Accepted Answers:
(Type: Range) 0.9,1.1
4) What is the sum of the elements of $\nabla L(w^*, b^*)$? 1 point
0
Yes, the answer is correct.
Score: 1
Accepted Answers:
(Type: Numeric) 0
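The answers to Q3 and Q4 can be checked numerically. A minimal sketch, assuming plain gradient descent with a hypothetical starting point and step size: since $\nabla L(w, b) = (w,\ 10b)$, the minimizer is $(w^*, b^*) = (0, 0)$, where $L = 1$ and both gradient components are zero.

```python
import numpy as np

def loss(w, b):
    # L(w, b) = 0.5 w^2 + 5 b^2 + 1 (common data for Q3-Q6)
    return 0.5 * w**2 + 5 * b**2 + 1

def grad(w, b):
    # Analytic gradient: dL/dw = w, dL/db = 10 b
    return np.array([w, 10 * b])

w, b = 3.0, -2.0   # hypothetical starting point
lr = 0.05          # hypothetical step size; must be below 2/10 for the b-direction to converge
for _ in range(500):
    g = grad(w, b)
    w, b = w - lr * g[0], b - lr * g[1]

print(loss(w, b))        # approaches 1.0 at the minimizer (0, 0)
print(grad(w, b).sum())  # approaches 0.0: the gradient vanishes at the minimum
```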
5) What is the determinant of $H_L(w^*, b^*)$, where $H$ is the Hessian of the function? 1 point
10
Yes, the answer is correct.
Score: 1
Accepted Answers:
(Type: Numeric) 10
6) Compute the eigenvalues and eigenvectors of the Hessian. According to the eigenvalues of the Hessian, which parameter is the loss more sensitive to? 1 point
b
w
Yes, the answer is correct.
Score: 1
Accepted Answers:
b
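For Q5 and Q6, the Hessian of this quadratic is constant, $H_L = \begin{pmatrix} 1 & 0 \\ 0 & 10 \end{pmatrix}$, so its determinant is $1 \cdot 10 = 10$ and its eigenvalues are 1 (eigenvector along $w$) and 10 (eigenvector along $b$). The NumPy sketch below checks this; the variable names are illustrative. The larger eigenvalue belongs to the $b$ direction, meaning the loss curves ten times more sharply along $b$ than along $w$.

```python
import numpy as np

# Hessian of L(w, b) = 0.5 w^2 + 5 b^2 + 1 (constant, since L is quadratic):
# d2L/dw2 = 1, d2L/db2 = 10, mixed partials d2L/dwdb = d2L/dbdw = 0
H = np.array([[1.0, 0.0],
              [0.0, 10.0]])
params = ["w", "b"]                   # row/column order of H

det_H = np.linalg.det(H)              # Q5: 1 * 10 = 10
eigvals, eigvecs = np.linalg.eigh(H)  # eigh: symmetric input, eigenvalues in ascending order

# Q6: the eigenvector of the largest eigenvalue is the direction of highest curvature;
# its dominant component identifies the parameter the loss is most sensitive to.
top = eigvecs[:, np.argmax(eigvals)]
most_sensitive = params[int(np.argmax(np.abs(top)))]

print("det(H) =", det_H)
print("eigenvalues:", eigvals)
print("eigenvectors (columns):\n", eigvecs)
print("most sensitive parameter:", most_sensitive)
```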
7) Suppose that a model produces zero training error. What happens, in general, if we use L2 regularization? 1 point
It might increase training error
It might decrease test error
It might decrease training error
It reduces the complexity of the model by driving less important weights close to zero
Yes, the answer is correct.
Score: 1
Accepted Answers:
It might increase training error
It might decrease test error
It reduces the complexity of the model by driving less important weights close to zero
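To see why L2 regularization can raise the training error of a model that already fits its training data perfectly, consider a hypothetical linear least-squares problem with more features than samples. The sketch below assumes random Gaussian data and a regularization strength `lam = 1.0` chosen only for illustration, and uses the closed-form ridge solution $(X^\top X + \lambda I)^{-1} X^\top y$. The unregularized minimum-norm solution interpolates the data (zero training error), while the L2 solution trades some training error for smaller weights. Whether test error actually drops depends on the data and on $\lambda$, which this sketch does not try to show.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 5, 8     # hypothetical sizes: more features than samples
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)

def mse(w):
    return np.mean((X @ w - y) ** 2)

def ridge(X, y, lam):
    # Closed-form L2-regularized least squares: (X^T X + lam * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Without regularization the minimum-norm least-squares solution interpolates
# the 5 training points, so its training error is zero up to rounding.
w_plain, *_ = np.linalg.lstsq(X, y, rcond=None)
w_l2 = ridge(X, y, lam=1.0)

print(f"train MSE:   plain {mse(w_plain):.1e}, L2 {mse(w_l2):.3f}")
print(f"weight norm: plain {np.linalg.norm(w_plain):.3f}, L2 {np.linalg.norm(w_l2):.3f}")
```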
8) Suppose that we apply Dropout regularization to a feedforward neural network. Suppose further that the mini-batch gradient descent algorithm is used for updating the parameters of the network. Choose the correct statement(s) from the following statements. 1 point
The dropout probability p can be different for each hidden layer
Batch gradient descent cannot be used to update the parameters of the network
Dropout with p = 0.5 acts as an ensemble regularizer
The weights of the neurons which were dropped during forward propagation at the $t$-th iteration will not get updated during the $(t+1)$-th iteration
Yes, the answer is correct.
Score: 1
Accepted Answers:
The dropout probability p can be different for each hidden layer
Dropout with p = 0.5 acts as an ensemble regularizer
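The sketch below shows inverted dropout in NumPy, assuming `p_drop` is the probability of dropping a unit (some texts define $p$ as the probability of keeping a unit instead; at $p = 0.5$ the two conventions coincide). The function `dropout_forward` is hypothetical. Each layer gets its own rate, and a fresh random mask is drawn on every forward pass, so each mini-batch trains a different thinned sub-network. This is the sense in which dropout behaves like an implicit ensemble, and why a unit dropped at iteration $t$ can be active and updated again at iteration $t + 1$.

```python
import numpy as np

def dropout_forward(h, p_drop, rng, train=True):
    """Inverted dropout (hypothetical helper).

    Zeroes each unit independently with probability p_drop and scales the
    survivors by 1 / (1 - p_drop), so the expected activation is unchanged
    and no rescaling is needed at test time.
    """
    if not train or p_drop == 0.0:
        return h, np.ones_like(h)
    mask = (rng.random(h.shape) >= p_drop).astype(h.dtype)
    return h * mask / (1.0 - p_drop), mask

rng = np.random.default_rng(0)
h1 = np.ones((4, 6))  # hidden layer 1 activations for a mini-batch of 4
h2 = np.ones((4, 3))  # hidden layer 2 activations

# Different drop rates per hidden layer are allowed.
out1, mask1 = dropout_forward(h1, p_drop=0.5, rng=rng)
out2, mask2 = dropout_forward(h2, p_drop=0.2, rng=rng)

# The next mini-batch draws a new mask, i.e. a different thinned network.
out1_next, mask1_next = dropout_forward(h1, p_drop=0.5, rng=rng)
print("layer-1 mask, step t:\n", mask1)
print("layer-1 mask, step t+1:\n", mask1_next)
```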
9) We have trained four different models on the same dataset using various hyperparameters. The training and validation errors for each model are provided below. Based on this information, which model is likely to perform best on the test dataset? 1 point
Model 1
Model 2
Model 3
Model 4
Yes, the answer is correct.
Score: 1
Accepted Answers:
Model 2
10) Consider the problem of recognizing a letter of the English alphabet (in upper case or lower case) in an image. There are 26 letters in the alphabet. A team decided to use a CNN to solve this problem. Suppose that a data augmentation technique is being used for regularization. Which of the following transformation(s) on all the training images is (are) appropriate to the problem? 1 point
Rotating the images by ±10°
Rotating the images by ±180°
Translating the image by 1 pixel in all directions
Cropping
Yes, the answer is correct.
Score: 1
Accepted Answers:
Rotating the images by ±10°
Translating the image by 1 pixel in all directions
Cropping
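A minimal NumPy sketch of the three accepted augmentations, assuming grayscale images stored as 2-D arrays with a black (zero) background; the function names and the pad-and-crop variant of cropping are illustrative choices, not taken from the assignment. Large rotations are excluded because they can change the label: rotating a lower-case "b" by 180° gives something that looks like "q", and "d" becomes "p". Small rotations, 1-pixel shifts and small crops keep the letter recognizable.

```python
import numpy as np

def rotate_small(img, max_deg, rng):
    """Rotate by a random angle in [-max_deg, max_deg] degrees (nearest neighbour)."""
    angle = np.deg2rad(rng.uniform(-max_deg, max_deg))
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.mgrid[0:h, 0:w]
    c, s = np.cos(angle), np.sin(angle)
    # Inverse mapping: for each output pixel, find the source pixel it came from.
    src_x = np.rint(c * (xs - cx) + s * (ys - cy) + cx).astype(int)
    src_y = np.rint(-s * (xs - cx) + c * (ys - cy) + cy).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)
    out[inside] = img[src_y[inside], src_x[inside]]
    return out

def translate_1px(img, rng):
    """Shift by -1, 0 or +1 pixel along each axis, filling the gap with zeros."""
    dy, dx = rng.integers(-1, 2, size=2)
    h, w = img.shape
    padded = np.pad(img, 1)  # zero border, so nothing wraps around
    return padded[1 - dy:1 - dy + h, 1 - dx:1 - dx + w]

def pad_and_crop(img, pad, rng):
    """Zero-pad by `pad` pixels, then cut a window of the original size at a random offset."""
    h, w = img.shape
    padded = np.pad(img, pad)
    top, left = rng.integers(0, 2 * pad + 1, size=2)
    return padded[top:top + h, left:left + w]

rng = np.random.default_rng(0)
img = np.zeros((28, 28))
img[5:20, 8:18] = 1.0  # stand-in for a letter: a bright block on black
augmented = [rotate_small(img, 10, rng), translate_1px(img, rng), pad_and_crop(img, 2, rng)]
print([a.shape for a in augmented])
```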