DEPARTMENT OF COMPUTER SCIENCE ENGINEERING
GULZAR COLLEGE OF ENGINEERING,
KHANNA, LUDHIANA
COURSE FILE
B. TECH: 7TH SEMESTER
COURSE NAME: DEEP LEARNING
COURSE CODE: BTCS 704-18
Course File (Part A)
Table of Contents
Sr. No. Particulars
1 University syllabus copy
2 University question papers of the last 03 years
3 Academic Calendar
4 Faculty Time Table
5 Lesson Plan, to be prepared with allocation of the number of hours devoted to each Unit as per university syllabus
6 Content Beyond Syllabus covered/to be covered in class
7 List of registered students
8 Quiz question paper(s) and result (minimum 3 quizzes are to be conducted)
9 Tutorial sheets (NA)
10 Identification of Academically Weak Students and action taken for improvement
11 Identification of Fast learners and efforts made to help fast learners to achieve university merit positions
12 Monthly students’ attendance report
13 Subject Notes
DEEP LEARNING (THEORY)
BTCS 704-18 DEEP LEARNING 3L:0T:0P 3 Credits
Detailed Contents:
UNIT 1: Machine Learning Basics: Learning, Under-fitting, Overfitting, Estimators, Bias,
Variance, Maximum Likelihood Estimation, Bayesian Statistics, Supervised Learning,
Unsupervised Learning and Stochastic Gradient Descent. [4hrs] (CO 1)
UNIT 2: Deep Feedforward Network: Feed-forward Networks, Gradient-based Learning, Hidden
Units, Architecture Design, Computational Graphs, Back-Propagation, Regularization, Parameter
Penalties, Data Augmentation, Multi-task Learning, Bagging, Dropout and Adversarial Training
and Optimization. [4hrs] (CO 2)
UNIT 3: Convolution Networks: Convolution Operation, Pooling, Basic Convolution Function,
Convolution Algorithm, Unsupervised Features and Neuroscientific Basis for Convolutional Networks.
[6hrs] (CO 3)
UNIT 4: Sequence Modelling: Recurrent Neural Networks (RNNs), Bidirectional RNNs,
Encoder-Decoder Sequence-to-Sequence Architectures, Deep Recurrent Network, Recursive
Neural Networks and Echo State Networks. [12hrs] (CO 4)
UNIT 5: Deep Generative Models: Boltzmann Machines, Restricted Boltzmann Machines, Deep
Belief Networks, Deep Boltzmann Machines, Sigmoid Belief Networks, Directed Generative Net,
Drawing Samples from Autoencoders. [14hrs] (CO 5)
Last Year Question Papers
Academic Calendar
Session 2024
Faculty Time Table
GULZAR GROUP OF ENGINEERING, LUDHIANA
LESSON PLAN
B. TECH CSE (AIML & DS)
DEEP LEARNING (BTCS 704-18)
Course Outcomes
CO1: Comprehend the advancements in learning techniques
CO2: Compare and explain various deep learning architectures and algorithms.
CO3: Demonstrate the applications of Convolution Networks
CO4: Apply Recurrent Network for Sequence Modelling
CO5: Deploy the Deep Generative Models
LESSON PLAN SCHEDULE
Columns: Unit No. & Unit Name, Lecture No., Topic, Learning Objectives, Referred Book* with Page No., Web Link, Lecture Delivery Date, Deviation from lesson plan if any, Faculty signature
Referred Book (all lectures): Goodfellow I., Bengio Y. and Courville A., Deep Learning, MIT Press (2016).
Web Link (all lectures): https://www.geeksforgeeks.org/deep-learning-tutorial/

Unit 1, Lecture 1
Topic: Machine Learning Basics: Learning, Under-fitting
Learning Objectives: Understand the process of training a model to recognize patterns in data through experience.
Deviation from lesson plan: NO. Faculty signature: MK

Unit 1, Lecture 2
Topic: Overfitting, Estimators
Learning Objectives: Identify scenarios where a model is too simple to capture the underlying patterns in the data.
Deviation from lesson plan: NO. Faculty signature: MK

Unit 1, Lecture 3
Topic: Bias, Variance
Learning Objectives: Recognize when a model is too complex and captures noise in the data, leading to poor generalization.
Deviation from lesson plan: NO. Faculty signature: MK

Unit 1, Lecture 4
Topic: Maximum Likelihood Estimation, Bayesian Statistics
Learning Objectives: Learn how different algorithms estimate model parameters to fit data.
Deviation from lesson plan: NO. Faculty signature: MK

Unit 1, Lecture 5
Topic: Supervised Learning, Unsupervised Learning
Learning Objectives: Understand the error introduced by approximating a real-world problem, often due to simplifying assumptions in the model.
Deviation from lesson plan: NO. Faculty signature: MK

Unit 1, Lecture 6
Topic: Stochastic Gradient Descent
Learning Objectives: Understand the framework of Bayesian statistics and how it updates the probability of a hypothesis as more evidence or information becomes available.
Deviation from lesson plan: NO. Faculty signature: MK

Unit 2, Lecture 7
Topic: Deep Feedforward Network: Feed-forward Networks
Learning Objectives: Understand the principles and techniques of deep feedforward networks, including their structure (feed-forward networks, hidden units, architecture design, computational graphs) and learning methods (gradient-based learning, back-propagation, optimization).
Deviation from lesson plan: NO. Faculty signature: MK

Unit 2, Lecture 8
Topic: Measure, Dimension, Discrete and Continuous.
Learning Objectives: Understand the principles and techniques of deep feedforward networks, including their structure (feed-forward networks, hidden units, architecture design, computational graphs) and learning methods (gradient-based learning, back-propagation, optimization).
Deviation from lesson plan: NO. Faculty signature: MK

Unit 2, Lecture 9
Topic: Gradient-based Learning, Hidden Units
Learning Objectives: Understand the principles and techniques of deep feedforward networks, including their structure (feed-forward networks, hidden units, architecture design, computational graphs) and learning methods (gradient-based learning, back-propagation, optimization).
Deviation from lesson plan: NO. Faculty signature: MK

Unit 2, Lecture 10
Topic: Architecture Design, Computational Graphs
Learning Objectives: Understand the principles and techniques of deep feedforward networks, including their structure (feed-forward networks, hidden units, architecture design, computational graphs) and learning methods (gradient-based learning, back-propagation, optimization).
Deviation from lesson plan: NO. Faculty signature: MK

Unit 2, Lecture 11
Topic: Back-Propagation, Regularization
Learning Objectives: Understand the principles and techniques of deep feedforward networks, including their structure (feed-forward networks, hidden units, architecture design, computational graphs) and learning methods (gradient-based learning, back-propagation, optimization).
Deviation from lesson plan: NO. Faculty signature: MK

Unit 2, Lecture 12
Topic: Regularization, Parameter Penalties, Data Augmentation
Learning Objectives: Understand the principles and techniques of deep feedforward networks, including their structure (feed-forward networks, hidden units, architecture design, computational graphs) and learning methods (gradient-based learning, back-propagation, optimization).
Deviation from lesson plan: NO. Faculty signature: MK

Unit 2, Lecture 13
Topic: Multi-task Learning, Bagging, Dropout
Learning Objectives: Understand the principles and techniques of deep feedforward networks, including their structure (feed-forward networks, hidden units, architecture design, computational graphs) and learning methods (gradient-based learning, back-propagation, optimization).
Deviation from lesson plan: NO. Faculty signature: MK

Unit 2, Lecture 14
Topic: Adversarial Training and Optimization
Learning Objectives: Understand the principles and techniques of deep feedforward networks, including their structure (feed-forward networks, hidden units, architecture design, computational graphs) and learning methods (gradient-based learning, back-propagation, optimization).
Deviation from lesson plan: NO. Faculty signature: MK

Unit 3, Lecture 15
Topic: Convolution Operation, Pooling, Basic Convolution Function
Learning Objectives: in spreadsheets is to effectively visualize complex data relationships, trends, and comparisons to facilitate better analysis and insights.
Deviation from lesson plan: NO. Faculty signature: MK

Unit 3, Lecture 16
Topic: Convolution Algorithm, Unsupervised Features
Learning Objectives: is to effectively visualize different dimensions of data to reveal patterns, relationships, and trends in a clear and accessible manner.
Deviation from lesson plan: NO. Faculty signature: MK

Unit 3, Lecture 17
Topic: Neuroscientific Basis for Convolutional Networks
Learning Objectives: To provide a concise visual comparison of performance against a target metric.
Deviation from lesson plan: NO. Faculty signature: MK

Unit 4, Lecture 18
Topic: Sequence Modelling: Recurrent Neural Networks (RNNs)
Learning Objectives: To provide customizable mapping solutions that allow developers to create interactive and visually appealing maps for various applications.

Unit 4, Lecture 19
Topic: Bidirectional RNNs, Encoder-Decoder Sequence-to-Sequence Architectures
Learning Objectives: is to effectively visualize complex data relationships, project timelines, financial trends, and comparative metrics for clearer analysis and insights.

Unit 4, Lecture 20
Topic: Deep Recurrent Network, Recursive Neural Networks and Echo State Networks
Learning Objectives: To visually represent proportions of a whole, emphasizing the relationship between parts and the total.

Unit 5, Lecture 21
Topic: Boltzmann Machines, Restricted Boltzmann Machines
Learning Objectives: is to visually compare changes in two related datasets over time or categories, highlighting trends and differences effectively.
Deviation from lesson plan: NO. Faculty signature: MK

Unit 5, Lecture 22
Topic: Deep Belief Networks, Deep Boltzmann Machines
Learning Objectives: to streamline data management and visualization processes, ensuring efficient interactions and intuitive user experiences.
Deviation from lesson plan: NO. Faculty signature: MK

Unit 5, Lecture 23
Topic: Sigmoid Belief Networks, Directed Generative Net
Learning Objectives: to enable users to customize data views and receive timely notifications, enhancing interactive analysis and decision-making within dashboards.
Deviation from lesson plan: NO. Faculty signature: MK

Unit 5, Lecture 24
Topic: Drawing Samples from Autoencoders
Learning Objectives: to create tailored visual representations that enhance data storytelling and improve the clarity and impact of visualizations.
Deviation from lesson plan: NO. Faculty signature: MK
Contents Beyond Syllabus
B.Tech CSE (AIML & DS) Semester: 7th
Topics identified under "Contents Beyond Syllabus" and Resource to Refer:
Grad-CAM: https://arxiv.org/abs/1610.02391
RMSprop: https://deepchecks.com/glossary/rmsprop/
List of Registered Students
Branch/Course: B. Tech. - CSE (AIML & DS)
Subject Name: DEEP LEARNING
Semester: 7th Section/Group: A
Sr. No. Roll No. Name of Student Contact No Email
1 2121931 Abhinav Thakur
2 2121932 Abhishek Kumar Mishra
3 2121933 Ashish Sharma
4 2121934 Bhavya Rathore
5 2121935 Dhanraj Verma
6 2121936 Fardin alam
7 2121937 Ghansham
8 2121938 HARSHIT KUMAR MISHRA
9 2121939 Hartej Singh Gill
10 2121940 INDRA PRASAD THARU
11 2121941 Jasmeet Kaur
12 2121942 Khushpreet Kaur
13 2121943 Krishu
14 2121944 Md anas alam
15 2121946 MD Rahil Khan
16 2121947 Md Shadakat Ekwal
17 2121948 Mohd Amir
18 2121949 Mohd Amir
19 2121950 Muskaan Nazir
20 2121951 Rajvansh Singh
21 2121952 Rakesh Ranjan
22 2121956 SHAHID RAZA
23 2121958 SHAILENDRA
24 2121959 Shivam Shriwastav
25 2121960 Shubham Thakur
26 2121961 SONU
27 2121962 Sujay Mann
28 2121964 Suraj Rao
29 2121965 Vaseem
30 2221619 NIKET VERMA
31 2221620 Pooja Kumari
32 2221621 SHREYA SUMAN
33 2221622 Taha Pathan
34 2121966 Aakib Hussain
35 2121968 Arbab Ali
36 2121969 MAAZ AHMAD
37 2121970 MAHIMA SINGH
38 2121971 Pradeep Kumar
39 2121972 Saheb kumar
40 2121973 Sami akhtar
41 2121974 Sudha Kumari
42 2121975 Vishal kumar
43 2221667 Manvir Singh
44 2221668 Tanvi
Quiz 1
Which of the following is a popular activation function used in deep learning models?
a) Softmax
b) ReLU
c) Tanh
d) All of the above
What is the main advantage of using dropout in neural networks?
a) It increases the size of the neural network
b) It reduces overfitting
c) It improves training speed
d) It increases the model's complexity
Which of the following is not a type of neural network architecture?
a) Convolutional Neural Network (CNN)
b) Recurrent Neural Network (RNN)
c) Random Forest
d) Generative Adversarial Network (GAN)
In the context of deep learning, what does "backpropagation" refer to?
a) The process of initializing the weights in a neural network
b) The process of updating the weights of the network based on the error gradient
c) The process of splitting the data into training and testing sets
d) The process of normalizing the input data
What is the purpose of an embedding layer in a neural network?
a) To reduce the dimensionality of the input data
b) To convert categorical data into numerical data
c) To learn a low-dimensional representation of high-dimensional data
d) To apply regularization to the model
Which of the following techniques is used to prevent the vanishing gradient problem?
a) Batch normalization
b) ReLU activation function
c) LSTM cells
d) All of the above
Which type of neural network is best suited for sequential data, such as time series or
text?
a) Feedforward Neural Network
b) Convolutional Neural Network (CNN)
c) Recurrent Neural Network (RNN)
d) Radial Basis Function Network
What does "overfitting" mean in the context of machine learning and deep learning?
a) The model performs well on the training data but poorly on new, unseen data
b) The model performs well on both training and test data
c) The model performs poorly on both training and test data
d) The model's complexity is too low
Which of the following optimizers is commonly used in deep learning?
a) Gradient Descent
b) Stochastic Gradient Descent (SGD)
c) Adam
d) All of the above
Quiz 2
What is the main purpose of a loss function in a neural network?
a) To initialize the weights of the network
b) To measure the difference between the predicted and actual values
c) To update the learning rate
d) To regularize the model
Which of the following is a commonly used loss function for binary classification
problems?
a) Mean Squared Error (MSE)
b) Cross-Entropy Loss
c) Hinge Loss
d) L1 Loss
In the context of Convolutional Neural Networks (CNNs), what does "stride" refer
to?
a) The number of neurons in the convolutional layer
b) The step size with which the convolutional filter moves across the input
c) The depth of the convolutional layer
d) The number of output channels
Which type of layer is typically used in a neural network to reduce the risk of
overfitting?
a) Convolutional layer
b) Dense layer
c) Dropout layer
d) Pooling layer
What is the main function of an optimizer in training a neural network?
a) To add regularization to the model
b) To find the weights that minimize the loss function
c) To split the dataset into training and validation sets
d) To preprocess the input data
Which of the following is true about Transfer Learning?
a) It involves training a neural network from scratch
b) It leverages a pre-trained model for a different but related task
c) It requires more data than training from scratch
d) It is not suitable for image classification tasks
In the context of Recurrent Neural Networks (RNNs), what does "vanishing gradient" refer
to?
a) The network's output becoming zero
b) The gradient values becoming too small during backpropagation through time
c) The network's weights becoming too large
d) The network converging too quickly
What is the key feature of Long Short-Term Memory (LSTM) networks that helps them
handle long-term dependencies?
a) Convolutional layers
b) Gated units that control the flow of information
c) Dropout regularization
d) Large number of layers
Which of the following techniques is used to augment training data in image classification
tasks?
a) Dropout
b) Batch normalization
c) Data augmentation
d) Early stopping
What does "epoch" mean in the context of training a neural network?
a) One forward and backward pass of all the training examples
b) The number of layers in the neural network
c) The number of neurons in a layer
d) The size of the training dataset
Which of the following is not an activation function used in neural networks?
a) Sigmoid
b) Tanh
c) ReLU
d) k-means
In a neural network, what is the function of the softmax layer?
a) To perform convolution operations
b) To normalize the output probabilities so they sum to one
c) To reduce the number of parameters in the network
d) To apply regularization to the model
Which of the following describes a "fully connected layer" in a neural network?
a) Each neuron is connected to every neuron in the previous layer
b) Neurons are connected to their neighbors in a grid
c) Neurons are connected in a tree structure
d) Neurons have connections only with their own layer
What is the main advantage of using a Convolutional Neural Network (CNN) for
image processing tasks?
a) CNNs can handle sequential data
b) CNNs can learn spatial hierarchies of features
c) CNNs require less training data than other models
d) CNNs are more interpretable than other models
Which of the following regularization techniques adds a penalty proportional to the
absolute value of the weights?
a) L1 regularization
b) L2 regularization
c) Dropout regularization
d) Data augmentation
In the context of neural networks, what does "gradient descent" refer to?
a) A method to initialize the weights
b) A method to optimize the loss function by iteratively adjusting the weights
c) A technique to normalize the input data
d) A method to split the data into training and test sets
Which of the following describes the purpose of batch normalization?
a) To reduce the number of layers in the network
b) To standardize the inputs to a layer for each mini-batch
c) To randomly drop units during training
d) To increase the model's capacity
What is a common characteristic of deep learning models?
a) They have a large number of layers and parameters
b) They are always interpretable
c) They require little data to train effectively
d) They always outperform traditional machine learning models
Which of the following is true about Generative Adversarial Networks (GANs)?
a) They consist of a generator and a discriminator network
b) They are used primarily for supervised learning tasks
c) They cannot generate realistic images
d) They do not require a loss function
SNO NAME ROLL NO QUIZ 1 QUIZ 2 TOTAL
1 Abhinav Thakur 2121931
2 Abhishek Kumar Mishra 2121932
3 Ashish Sharma 2121933
4 Bhavya Rathore 2121934
5 Dhanraj Verma 2121935
6 Fardin alam 2121936
7 Ghansham 2121937
8 HARSHIT KUMAR MISHRA 2121938
9 Hartej Singh Gill 2121939
10 INDRA PRASAD THARU 2121940
11 Jasmeet Kaur 2121941
12 Khushpreet Kaur 2121942
13 Krishu 2121943
14 Md anas alam 2121944
15 MD Rahil Khan 2121946
16 Md Shadakat Ekwal 2121947
17 Mohd Amir 2121948
18 Mohd Amir 2121949
19 Muskaan Nazir 2121950
20 Rajvansh Singh 2121951
21 Rakesh Ranjan 2121952
22 SHAHID RAZA 2121956
23 SHAILENDRA 2121958
24 Shivam Shriwastav 2121959
25 Shubham Thakur 2121960
26 SONU 2121961
27 Sujay Mann 2121962
28 Suraj Rao 2121964
29 Vaseem 2121965
30 NIKET VERMA 2221619
31 Pooja Kumari 2221620
32 SHREYA SUMAN 2221621
33 Taha Pathan 2221622
34 Aakib Hussain 2121966
35 Arbab Ali 2121968
36 MAAZ AHMAD 2121969
37 MAHIMA SINGH 2121970
38 Pradeep Kumar 2121971
39 Saheb kumar 2121972
40 Sami akhtar 2121973
42 Vishal kumar 2121975
43 Manvir Singh 2221667
44 Tanvi 2221668
Academically Weak Students
Identified as those having less than 40% marks in MST(s)
Branch/ Course: Semester:
Sr. No. Name of Student Roll Number MST(s) marks Student attendance
1
*Remedial measures undertaken:
Extra sessions by simplifying the topic by one-to-one interaction
Unit 1 was revised after MST 1.
Students were encouraged to make notes on these contents.
Fast Learners
Identified as those having good past results and MST(s) score
Branch/ Course: Semester:
Sr. No. Name of Student Roll Number MST(s) marks University Position last year if any
1
2
3
4
5
*Action taken:
Motivate them to learn new things beyond syllabus
They were asked to help slow learners in creating notes
Preparation to achieve good campus placement
Student Attendance
Course: B. Tech. - CSE (AIML & DS)
Year / Semester: 7th
Subject Name: DEEP LEARNING
Faculty Name: Milandeep Kour Bali
Faculty Type: Regular
Sr. No. Student Name Roll No. Total No. Of Classes Attended Classes % Of Attendance
1 Abhinav Thakur 2221623
2 Abhishek Kumar Mishra 2221624
3 Ashish Sharma 2221625
4 Bhavya Rathore 2221626
5 Dhanraj Verma 2221627
6 Fardin alam 2221628
7 Ghansham 2221632
8 HARSHIT KUMAR MISHRA 2221633
9 Hartej Singh Gill 2221634
10 INDRA PRASAD THARU 2221635
11 Jasmeet Kaur 2221636
12 Khushpreet Kaur 2221638
13 Krishu 2221639
14 Md anas alam 2221643
15 MD Rahil Khan 2221646
16 Md Shadakat Ekwal 2221647
17 Mohd Amir 2221648
18 Mohd Amir 2221649
19 Muskaan Nazir 2221650
20 Rajvansh Singh 2221651
21 Rakesh Ranjan 2221653
22 SHAHID RAZA 2221654
23 SHAILENDRA 2221655
24 Shivam Shriwastav 2221656
25 Shubham Thakur 2221657
26 SONU 2221659
27 Sujay Mann 2221660
28 Suraj Rao 2221661
29 Vaseem 2221662
30 NIKET VERMA 2221663
31 Pooja Kumari 2221664
32 SHREYA SUMAN 2221665
33 Taha Pathan 2221666
34 Aakib Hussain 2121966
35 Arbab Ali 2121968
36 MAAZ AHMAD 2121969
37 MAHIMA SINGH 2121970
38 Pradeep Kumar 2121971
39 Saheb kumar 2121972
40 Sami akhtar 2121973
41 Sudha Kumari 2121974
42 Vishal kumar 2121975
43 Manvir Singh 2221667
44 Tanvi 2221668
Subject Notes
Deep learning is a branch of machine learning that is based on artificial neural networks.
Because neural networks are loosely modelled on the human brain, deep learning can also be
seen as a loose imitation of how the brain processes information.
What is Deep Learning?
Deep Learning is a part of Machine Learning that uses artificial neural networks to learn
from lots of data without needing explicit programming. These networks are inspired by the
human brain and can be used for things like recognizing images, understanding speech, and
processing language. There are different types of deep learning networks, like feedforward
neural networks, convolutional neural networks, and recurrent neural networks. Deep
Learning needs lots of labeled data and powerful computers to work well, but it can achieve
very good results in many applications.
Why is Deep Learning Important?
The reasons why deep learning has become the industry standard:
Handling unstructured data: Deep learning models can learn directly from unstructured data
such as images, audio, and text, which reduces the time and resources spent standardizing data sets.
Handling large data: Due to the introduction of graphics processing units (GPUs), deep
learning models can process large amounts of data quickly.
High Accuracy: Deep learning models provide highly accurate results in computer vision,
natural language processing (NLP), and audio processing.
Pattern Recognition: Most traditional models require manual feature engineering by a machine
learning engineer, whereas deep learning models can learn many kinds of patterns automatically.
Core Concepts of Deep Learning
Before diving into the intricacies of deep learning algorithms and their applications, it's
essential to understand the foundational concepts that make this technology so revolutionary.
This section will introduce you to the building blocks of deep learning: neural networks, deep
neural networks, and activation functions.
Neural networks
At the heart of deep learning are neural networks, which are computational models inspired
by the human brain. These networks consist of interconnected nodes, or "neurons," that work
together to process information and make decisions. Just like our brain has different regions
for different tasks, a neural network has layers designated for specific functions.
Deep neural networks
What makes a neural network "deep" is the number of layers it has between the input and
output. A deep neural network has multiple layers, allowing it to learn more complex features
and make more accurate predictions. The "depth" of these networks is what gives deep
learning its name and its power to solve intricate problems.
Activation functions
In a neural network, activation functions are like the decision-makers. They determine what
information should be passed along to the next layer. These functions add a level of
complexity, enabling the network to learn from the data and make nuanced decisions.
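As a minimal sketch of this idea, the snippet below defines three common activation functions (sigmoid, tanh, and ReLU) and applies them to a few illustrative input values. The input values are assumptions chosen only for demonstration.

```python
import math

def sigmoid(z):
    """Squash any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Pass positive values through unchanged and map negative values to zero."""
    return max(0.0, z)

# Illustrative pre-activation values (assumed for this example).
pre_activations = [-2.0, 0.0, 2.0]

for z in pre_activations:
    # math.tanh squashes values into the range (-1, 1).
    print(z, sigmoid(z), math.tanh(z), relu(z))
```

Each function decides how strongly a neuron's weighted input is passed on: ReLU blocks negative values entirely, while sigmoid and tanh compress large values into a bounded range.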
How Deep Learning Works
Deep learning uses feature extraction to recognize similar features of the same label and then
uses decision boundaries to determine which features accurately represent each label. In the
cats and dogs classification, the deep learning models will extract information such as the
eyes, face, and body shape of animals and divide them into two classes.
A deep learning model consists of deep neural networks. A simple neural network
consists of an input layer, a hidden layer, and an output layer. Deep learning models consist of
multiple hidden layers, and adding layers can improve the model's accuracy.
The input layer contains raw data and passes it to the nodes of the hidden layers. The
hidden layers' nodes classify the data points based on the broader target information, and with
every subsequent layer, the scope of the target value narrows down to produce accurate
predictions. The output layer uses hidden layer information to select the most probable label.
In our case, that means correctly predicting that an image shows a dog rather than a cat.
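The flow from input layer to hidden layer to output layer can be sketched as a small forward pass. This is only an illustration: the three input features, the weights, and the biases are hypothetical hand-picked numbers, whereas a real model would learn its parameters from data.

```python
import math

def relu(values):
    """Apply ReLU to each value in a list."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turn raw scores into probabilities that sum to one."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum of all inputs plus a bias per unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Hypothetical parameters (assumed for illustration, not learned).
W_HIDDEN = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
B_HIDDEN = [0.0, 0.1]
W_OUT = [[1.0, -1.0], [-1.0, 1.0]]
B_OUT = [0.0, 0.0]
LABELS = ["cat", "dog"]

def predict(features):
    """Input layer -> hidden layer (ReLU) -> output layer (softmax)."""
    hidden = relu(dense(features, W_HIDDEN, B_HIDDEN))
    probs = softmax(dense(hidden, W_OUT, B_OUT))
    return LABELS[probs.index(max(probs))], probs

label, probs = predict([0.2, 0.9, 0.5])
print(label, probs)
```

The output layer returns one probability per label, and the label with the highest probability is selected as the prediction.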
What is Deep Learning Used For?
Recently, the world of technology has seen a surge in artificial intelligence applications, and
they all are powered by deep learning models. The applications range from recommending
movies on Netflix to Amazon warehouse management systems.
In this section, we are going to learn about some of the most famous applications built using
deep learning. This will help you realize the full potential of deep neural networks.
Computer Vision
Computer vision (CV) is used in self-driving cars to detect objects and avoid collisions. It is
also used for face recognition, pose estimation, image classification, and anomaly detection.
Automatic Speech Recognition
Automatic speech recognition (ASR) is used by billions of people worldwide. It is in our
phones and is commonly activated by saying "Hey, Google" or "Hi, Siri." Such audio
applications are also used for text-to-speech, audio classification, and voice activity detection.
Generative AI
Generative AI has seen a surge in demand; for example, a CryptoPunk NFT from a generative
art collection sold for $1 million. The introduction of the GPT-4 model by OpenAI has
transformed the text generation domain through the ChatGPT tool; such models can now draft
an entire novel or write code for data science projects.
Translation
Deep learning translation is not limited to language translation, as we are now able to translate
photos to text by using OCR, or translate text to images by using NVIDIA GauGAN2.
Time Series Forecast
Time series forecasting is used for predicting market crashes, stock prices, and changes in
the weather. The financial sector survives on speculation and future projections. Deep
learning and time series models are better than humans in detecting patterns and so are pivotal
tools in this and similar industries.
Deep Learning Models
Let's learn about different types of deep learning models and how they work.
Supervised Learning
Supervised learning uses a labeled dataset to train models to either classify data or predict
values. The dataset contains features and target labels, which allow the algorithm to learn over
time by minimizing the loss between predicted and actual labels. Supervised learning can be
divided into classification and regression problems.
Classification
The classification algorithm divides the dataset into various categories based on feature
extraction. Popular deep learning models include ResNet50 for image classification
and BERT (a language model) for text classification.
Regression
Instead of dividing the dataset into categories, the regression model learns the relationship
between input and output variables to predict the outcome. Regression models are commonly
used for predictive analysis, weather forecasting, and predicting stock market
performance. LSTM and RNN are popular deep learning regression models.
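The idea of learning from labeled data by minimizing the loss between predicted and actual values can be sketched with a tiny regression example. Note the assumptions: a plain linear model stands in for a deep network, the data points are made up (they follow y = 2x + 1), and the learning rate and iteration count are arbitrary choices.

```python
# Labeled data: features xs and target values ys (illustrative, from y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def mse(w, b):
    """Mean squared error between predictions w*x + b and the targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

w, b = 0.0, 0.0          # start from an uninformed model
learning_rate = 0.05     # assumed step size
initial_loss = mse(w, b)

for _ in range(2000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Move the parameters a small step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse(w, b)
print(w, b, initial_loss, final_loss)
```

The same loop structure (predict, measure the loss, adjust the parameters) underlies the training of deep models, where the gradients are obtained by back-propagation instead of by hand.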
Unsupervised Learning
Unsupervised learning algorithms learn the pattern within an unlabeled dataset and create
clusters. Deep learning models can learn hidden patterns without human intervention and
these models are often used in recommendation engines.
Unsupervised learning is used for grouping various species, medical imaging, and market
research. The most common deep learning model for clustering is the deep embedded
clustering algorithm.
Reinforcement Learning
Reinforcement learning (RL) is a machine learning method where agents learn various
behaviors from the environment. This agent takes random actions and gets rewards. The agent
learns to achieve goals by trial and error in a complex environment without human
intervention.
Just like a baby with encouragement from its parents learns to walk, the AI learns to perform
certain tasks by maximizing rewards, and the designer sets the rewards policy. Recently, RL
has seen high demand in automation due to advances in robotics, self-driving cars,
game-playing agents that defeat professional players, and landing rockets back on Earth.
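Trial-and-error learning from rewards can be illustrated with a very small sketch. The setup is hypothetical: an environment with two actions that pay a reward of 1 with different probabilities, and an epsilon-greedy agent (one simple strategy among many) that mostly repeats the action with the best estimated reward but sometimes acts at random.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical environment: each action pays a reward of 1 with this probability.
REWARD_PROBABILITY = {"left": 0.2, "right": 0.8}

def step(action):
    """Return the reward the environment gives for the chosen action."""
    return 1.0 if random.random() < REWARD_PROBABILITY[action] else 0.0

value = {"left": 0.0, "right": 0.0}  # running estimate of each action's reward
count = {"left": 0, "right": 0}      # how often each action was tried
epsilon = 0.1                        # assumed exploration rate

for _ in range(2000):
    if random.random() < epsilon:
        action = random.choice(list(value))   # explore: take a random action
    else:
        action = max(value, key=value.get)    # exploit: best estimate so far
    reward = step(action)
    count[action] += 1
    # Incremental mean: nudge the estimate toward the observed reward.
    value[action] += (reward - value[action]) / count[action]

print(value, count)
```

The agent is never told which action is better; it discovers this from the rewards it collects, which is the essence of learning by trial and error.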
Generative Adversarial Networks
Generative adversarial networks (GANs) use two neural networks, and together, they
produce synthetic instances of original data. GANs have gained a lot of popularity in recent
years as they are able to mimic some of the great artists to produce masterpieces. They are
widely used for generating synthetic art, video, music, and texts.
What is a Deep Feed-Forward Network?
Deep Feed-Forward Networks (DFNs) are neural networks in which information flows in one
direction only: the input is fed forward through a function, say f*, to produce the output.
There is no feedback mechanism in a DFN. Networks in which the output is fed back into the
model are called Recurrent Neural Networks.
Computational Graphs in Deep Learning
Computational graphs are graphs used to represent mathematical expressions. For deep
learning models they act like a descriptive language, providing a functional description of
the required computation.
In general, the computational graph is a directed graph that is used for expressing and
evaluating mathematical expressions.
These can be used for two different types of calculations:
1. Forward computation
2. Backward computation
The following sections define a few key terminologies in computational graphs.
A variable is represented by a node in a graph. It could be a scalar, vector, matrix, tensor, or
even another type of variable.
A function argument and data dependency are both represented by an edge. These are
similar to node pointers.
A simple function of one or more variables is called an operation. There is a set of
operations that are permitted. Functions that are more complex than these operations in this
set can be represented by combining multiple operations.
For example, consider $Y = (a + b) \times (b - c)$.
For better understanding, we introduce two variables d and e such that every operation has
an output variable. We now have:
$d = a + b$
$e = b - c$
$Y = d \times e$
Here, we have three operations: addition, subtraction, and multiplication. To create a
computational graph, we create one node per operation and connect it to its input
variables. The direction of each arrow shows the direction in which values flow into
other nodes.
We can find the final output value by initializing input variables and accordingly computing
nodes of the graph.
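The forward computation for this graph can be sketched in a few lines of Python. This is only an illustration: the function name `forward` and the input values are assumptions chosen for the example, not part of the original notes.

```python
# Forward evaluation of the graph for Y = (a + b) * (b - c).
# The input values used below are arbitrary, chosen only for illustration.

def forward(a, b, c):
    d = a + b      # addition node
    e = b - c      # subtraction node
    y = d * e      # multiplication node (final output)
    return d, e, y

d, e, y = forward(a=2.0, b=5.0, c=1.0)
print(d, e, y)
```

Each intermediate variable is computed only after the nodes it depends on, which is exactly the order imposed by the arrows of the graph.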
Computational Graphs in Deep Learning
Computations of the neural network are organized in terms of a forward pass or forward
propagation step in which we compute the output of the neural network, followed by a
backward pass or backward propagation step, which we use to compute
gradients/derivatives. Computation graphs explain why it is organized this way.
If one wants to understand derivatives in a computational graph, the key is to understand
how a change in one variable brings change on the variable that depends on it. If a directly
affects c, then we want to know how it affects c. If we make a slight change in the value
of a how does c change? We can term this as the partial derivative of c with respect to a.
Graph for backpropagation to get derivatives will look something like this:
We have to follow chain rule to evaluate partial derivatives of final output variable with
respect to input variables: a, b, and c. Therefore the derivatives can be given as :
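The derivatives can be reconstructed from the definitions d = a + b, e = b - c and Y = d * e given earlier. The following is a worked sketch based on those definitions, not a copy of the original figure:

```latex
\begin{align*}
\frac{\partial Y}{\partial d} &= e, \qquad \frac{\partial Y}{\partial e} = d \\
\frac{\partial Y}{\partial a} &= \frac{\partial Y}{\partial d}\cdot\frac{\partial d}{\partial a} = e \cdot 1 = b - c \\
\frac{\partial Y}{\partial b} &= \frac{\partial Y}{\partial d}\cdot\frac{\partial d}{\partial b}
  + \frac{\partial Y}{\partial e}\cdot\frac{\partial e}{\partial b} = e + d = a + 2b - c \\
\frac{\partial Y}{\partial c} &= \frac{\partial Y}{\partial e}\cdot\frac{\partial e}{\partial c} = d \cdot (-1) = -(a + b)
\end{align*}
```

Note that b reaches Y along two paths (through d and through e), so its derivative is the sum of the contributions of both paths.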
This gives us an idea of how computational graphs make it easier to get the derivatives
using backpropagation.
Types of computational graphs:
Type 1: Static Computational Graphs
Involves two phases:-
o Phase 1:- Make a plan for your architecture.
o Phase 2:- To train the model and generate predictions, feed it a lot of data.
The benefit of a static graph is that it enables powerful offline graph optimization and
scheduling. As a result, static graphs are generally expected to be faster than dynamic graphs.
The drawback is that handling structured or variable-sized data is awkward.
Type 2: Dynamic Computational Graphs
As the forward computation is performed, the graph is implicitly defined.
This graph has the advantage of being more adaptable. The library is less intrusive and
enables interleaved graph generation and evaluation. The forward computation is
implemented in your preferred programming language, complete with all of its features and
algorithms. Debugging dynamic graphs is simple. Because it permits line-by-line execution
of the code and access to all variables, finding bugs in your code is considerably easier. If
you want to employ Deep Learning for any genuine purpose in the industry, this is a must-
have feature.
The disadvantage of employing this graph is that there is limited time for graph
optimization, and the effort may be wasted if the graph does not change.
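A minimal sketch of the dynamic (define-by-run) idea is shown below. The class `Value` is a hypothetical helper written for this illustration and is not the API of any particular library: each arithmetic operation records its inputs while the forward computation runs, and the recorded graph is then walked backward to obtain the gradients.

```python
class Value:
    """A scalar node that records how it was produced (hypothetical helper)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # nodes this value depends on
        self.local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __sub__(self, other):
        return Value(self.data - other.data, (self, other), (1.0, -1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Visit nodes in reverse topological order so that each node's gradient
        # is complete before it is passed on to its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad


a, b, c = Value(2.0), Value(5.0), Value(1.0)
y = (a + b) * (b - c)   # the graph is built while this line executes
y.backward()
print(y.data, a.grad, b.grad, c.grad)
```

Because the graph is created by ordinary Python statements, it can be inspected line by line in a debugger, which is the debugging advantage described above.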
Backpropagation in Neural Network
A neural network is a network of computing units (neurons), which give it the ability to
compute a function. The neurons are connected by edges, each neuron has an assigned
activation function, and the connections carry adjustable parameters (weights and biases).
These adjustable parameters determine the function that the network computes. The
activation function maps the weighted input of a neuron to its output, which is called the
activation value.
What is backpropagation?
In machine learning, backpropagation is an efficient algorithm used to train artificial neural
networks, especially feed-forward neural networks.
Backpropagation computes how much each weight and bias contributes to the cost function,
and so determines how they should be adjusted. During every epoch, the model
learns by adapting the weights and biases to reduce the loss, moving in the direction
opposite to the gradient of the error. It is therefore used together with an optimization
algorithm such as gradient descent or stochastic gradient descent.
The gradient is computed with the chain rule from calculus, which lets the algorithm work
backward through the layers of the neural network.
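As a small illustration of how the chain rule and gradient descent work together, the sketch below trains a single linear neuron on one example. The data point, initial parameters and learning rate are assumptions chosen for the illustration.

```python
# One linear neuron y = w * x + b trained with gradient descent on squared error.
# The data point, initial values and learning rate are illustrative assumptions.
x, target = 2.0, 7.0
w, b = 0.5, 0.0
learning_rate = 0.05

losses = []
for epoch in range(20):
    y = w * x + b                 # forward pass
    error = y - target
    losses.append(error ** 2)     # squared-error loss
    # Chain rule: dL/dy = 2 * error, dy/dw = x, dy/db = 1
    grad_w = 2 * error * x
    grad_b = 2 * error
    # Step against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(losses[0], losses[-1])
```

The loss shrinks from one epoch to the next because every update moves the parameters against the gradient.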
fig(a) A simple illustration of how the backpropagation works by adjustments of weights
Advantages of Using the Backpropagation Algorithm in Neural Networks
Backpropagation, a fundamental algorithm in training neural networks, offers several
advantages that make it a preferred choice for many machine learning tasks. Here, we
discuss some key advantages of using the backpropagation algorithm:
1. Ease of Implementation: Backpropagation does not require prior knowledge of neural
networks, making it accessible to beginners. Its straightforward nature simplifies the
programming process, as it primarily involves adjusting weights based on error derivatives.
2. Simplicity and Flexibility: The algorithm’s simplicity allows it to be applied to a wide
range of problems and network architectures. Its flexibility makes it suitable for various
scenarios, from simple feedforward networks to complex recurrent or convolutional neural
networks.
3. Efficiency: Backpropagation accelerates the learning process by directly updating weights
based on the calculated error derivatives. This efficiency is particularly advantageous in
training deep neural networks, where learning features of a function can be time-consuming.
4. Generalization: By iteratively adjusting weights during training, backpropagation lets a
network fit patterns in the training data that can carry over to unseen data, provided
overfitting is kept under control. This ability is crucial for
developing models that can make accurate predictions on new, unseen examples.
5. Scalability: Backpropagation scales well with the size of the dataset and the complexity of
the network. This scalability makes it suitable for large-scale machine learning tasks, where
training data and network size are significant factors.
In conclusion, the backpropagation algorithm offers several advantages that contribute to its
widespread use in training neural networks. Its ease of implementation, simplicity,
efficiency, generalization ability, and scalability make it a valuable tool for developing and
training neural network models for various machine learning applications.
Working of Backpropagation Algorithm
The backpropagation algorithm works in two passes:
Forward pass
Backward pass
How does Forward pass work?
In the forward pass, the input is first fed into the input layer. The inputs are the raw
data on which the neural network is trained.
The inputs and their corresponding weights are passed to the hidden layer. The hidden layer
performs the computation on the data it receives. If there are two hidden layers in the neural
network, as in the illustration fig(a), where h1 and h2 are the two hidden layers,
the output of h1 is used as the input of h2. The bias is added to the weighted sum before
the activation function is applied.
In the hidden layer, the activation function is applied to the weighted sum of inputs of each
of its neurons. A commonly used activation function is ReLU, which returns the input if it
is positive and zero otherwise.
This introduces non-linearity into the model, which enables the network to
learn complex relationships in the data. Finally, the weighted outputs from the last
hidden layer are fed into the output layer to compute the final prediction. This layer can use
the softmax activation function, which converts the
weighted outputs into probabilities for each class.
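The forward pass described above can be sketched as follows. The layer sizes, weights and biases are made-up values for illustration only, and the helper names `dense`, `relu` and `softmax` are assumptions of this sketch.

```python
import math

def relu(values):
    # Return each value if it is positive, otherwise zero.
    return [max(0.0, v) for v in values]

def softmax(values):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    # weights[j][i] connects input i to neuron j; the bias is added to the weighted sum.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Illustrative parameters: 2 inputs -> hidden layer h1 -> hidden layer h2 -> 2 classes.
x = [1.0, 2.0]
W1, b1 = [[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1]
W2, b2 = [[0.3, 0.7], [-0.6, 0.2]], [0.05, 0.0]
W3, b3 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]

h1 = relu(dense(x, W1, b1))          # output of the first hidden layer
h2 = relu(dense(h1, W2, b2))         # h1 is the input of the second hidden layer
probs = softmax(dense(h2, W3, b3))   # class probabilities
print(probs)
```

The softmax outputs are positive and sum to 1, so they can be read as class probabilities.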
The forward pass using weights and biases
How does backward pass work?
In the backward pass, the error is transmitted back through the network, which
helps the network improve its performance by learning and adjusting the internal
weights.
To find the error produced by the forward pass, we can use one of the most
commonly used measures, the mean squared error, which is based on the squared difference
between the predicted output and the desired output. For a single output the formula is:
$\text{Mean squared error} = (\text{predicted output} - \text{actual output})^2$
With several outputs or training examples, the squared differences are averaged.
Once we have done the calculation at the output layer, we then propagate the error
backward through the network, layer by layer.
The key calculation during the backward pass is determining the gradients for each weight
and bias in the network. This gradient is responsible for telling us how much each
weight/bias should be adjusted to minimize the error in the next forward pass. The chain
rule is used iteratively to calculate this gradient efficiently.
In addition to the gradient calculation, the activation function plays a crucial role in
backpropagation: each gradient includes the derivative of the activation function as one of
its chain-rule factors.
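For a single output neuron this can be written out explicitly. The sketch below assumes the loss E = 1/2 (t_j - y_j)^2 (the factor 1/2 is a common convention that simplifies the derivative), where t_j is the target, y_j = F(a_j) is the neuron output, a_j is the weighted sum of its inputs x_i, and F is the sigmoid function:

```latex
\begin{align*}
E &= \tfrac{1}{2}\,(t_j - y_j)^2, \qquad y_j = F(a_j), \qquad a_j = \sum_i w_{i,j}\, x_i \\
\frac{\partial E}{\partial w_{i,j}}
  &= \frac{\partial E}{\partial y_j}\cdot\frac{\partial y_j}{\partial a_j}\cdot\frac{\partial a_j}{\partial w_{i,j}}
   = -(t_j - y_j)\; F'(a_j)\; x_i \\
F'(a_j) &= y_j\,(1 - y_j) \quad \text{for the sigmoid } F(a) = \frac{1}{1 + e^{-a}}
\end{align*}
```

The middle factor is the derivative of the activation function, which is why the activation function appears in every weight gradient.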
Example of Backpropagation in Machine Learning
Let us now take an example to explain backpropagation in machine learning.
Assume that the neurons use the sigmoid activation function in both the forward and
backward pass. Also assume that the target (actual) output y is 0.5 and
the learning rate is 1. We now perform one round of the backpropagation
algorithm.
Example (1) of backpropagation sum
Implementing forward propagation:
Step 1: Before calculating the forward propagation, we need to know two
formulae:
$a_j = \sum_i (w_{i,j} \cdot x_i)$
Where,
$a_j$ is the weighted sum of all the inputs and weights at node j,
$w_{i,j}$ represents the weight associated with the ith input to the jth neuron,
$x_i$ represents the value of the ith input,
$y_j = F(a_j) = \frac{1}{1 + e^{-a_j}}$, where $y_j$ is the output value and F denotes the activation
function (the sigmoid activation function is used here), which transforms the weighted sum into
the output value.
Step 2: To compute the forward pass, we need to compute the outputs y3, y4, and
y5.
To find the outputs of y3, y4 and y5
We start by calculating the weighted sum of the inputs using the formula
$a_j = \sum_i (w_{i,j} \cdot x_i)$. To find y3, we need to consider its incoming edges along with
their weights and inputs. Here the incoming edges are from X1 and X2.
At h1 node,
$a_1 = (w_{1,1} x_1) + (w_{2,1} x_2) = (0.2 \cdot 0.35) + (0.2 \cdot 0.7) = 0.21$
Once we have calculated the a1 value, we can proceed to find the y3 value:
$y_j = F(a_j) = \frac{1}{1 + e^{-a_j}}$
$y_3 = F(0.21) = \frac{1}{1 + e^{-0.21}}$
$y_3 = 0.55$
Similarly, find the values of y4 at h2 and y5 at O3:
$a_2 = (w_{1,2} x_1) + (w_{2,2} x_2) = (0.3 \cdot 0.35) + (0.3 \cdot 0.7) = 0.315$
$y_4 = F(0.315) = \frac{1}{1 + e^{-0.315}} = 0.58$
$a_3 = (w_{1,3} y_3) + (w_{2,3} y_4) = (0.3 \cdot 0.55) + (0.9 \cdot 0.58) = 0.687$
$y_5 = F(0.687) = \frac{1}{1 + e^{-0.687}} = 0.67$
Values of y3, y4 and y5
Note that our target output is 0.5 but we obtained 0.67. To calculate the error, we can use
the formula below:
$Error = y_{target} - y_5$
Error = 0.5 - 0.67
= -0.17
Using this error value, we will be backpropagating.
Implementing Backward Propagation
Each weight in the network is changed by
$\Delta w_{i,j} = \eta \, \delta_j \, O_i$
$\delta_j = O_j (1 - O_j)(t_j - O_j)$ (if j is an output unit)
$\delta_j = O_j (1 - O_j) \sum_k \delta_k w_{j,k}$ (if j is a hidden unit; k ranges over the units fed by j)
where,
$\eta$ is the constant learning rate,
$O_i$ is the output of the unit (or the input value) that feeds the weight, and $O_j$ is the output of unit j,
$t_j$ is the correct output for unit j,
$\delta_j$ is the error measure for unit j.
Step 3: To calculate the backpropagation, we need to start from the output unit:
To compute $\delta_5$, we use the output of the forward pass,
$\delta_5 = y_5 (1 - y_5)(y_{target} - y_5)$
= 0.67(1-0.67)(-0.17)
= -0.0376
For hidden unit,
To compute the hidden units, we use the value of $\delta_5$:
$\delta_3 = y_3 (1 - y_3)(w_{1,3} \cdot \delta_5)$
= 0.55(1-0.55)(0.3 * -0.0376)
= -0.0028
$\delta_4 = y_4 (1 - y_4)(w_{2,3} \cdot \delta_5)$
= 0.58(1-0.58)(0.9 * -0.0376)
= -0.0082
Step 4: We need to update the weights, from output unit to hidden unit,
$\Delta w_{i,j} = \eta \, \delta_j \, O_i$
Note: here our learning rate is 1
$\Delta w_{2,3} = \eta \, \delta_5 \, y_4$
= 1 * (-0.0376) * 0.58
= -0.021808
We update the weight based on the old weight of the network,
$w_{2,3}(new) = \Delta w_{2,3} + w_{2,3}(old)$
= -0.021808 + 0.9
= 0.878192
From hidden unit to input unit,
For a weight between an input and a hidden node, the calculation is:
$\Delta w_{1,1} = \eta \, \delta_3 \, x_1$
= 1 * (-0.0028) * 0.35
= -0.00098
Similarly, we calculate the new weight value using the old one:
$w_{1,1}(new) = \Delta w_{1,1} + w_{1,1}(old)$
= -0.00098 + 0.2
= 0.19902
Similarly, we update the other weights. The new weights are listed
below:
w1,2 (new) = 0.29713
w1,3 (new) = 0.27932
w2,1 (new) = 0.19804
w2,2 (new) = 0.29426
The updated weights are illustrated below,
Through backward pass the weights are updated
Once the above process is done, we perform the forward pass again to check whether we obtain the
target output of 0.5.
While performing the forward pass again, we obtain the following values:
y3 = 0.55
y4 = 0.58
y5 = 0.66
We can see that y5 is now 0.66, which is closer to the target but still not equal to it. So
we again find the error and backpropagate through the network, updating the weights,
until the output is close enough to the target.
$Error = y_{target} - y_5$
= 0.5 - 0.66
= -0.16
This is how backpropagation works: it performs the forward pass first to see whether
we obtain the target output; if not, it computes the error and then
propagates it backward through the layers of the network, adjusting the weights
according to the error. This process continues until the output of the neural network is
acceptably close to the target.
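The worked example can be checked with a short Python sketch. It uses the inputs (0.35 and 0.7), the initial weights, the target 0.5 and the learning rate 1 from the example; the dictionary keys such as `w11` are hypothetical names introduced here for w1,1 and so on. Because the code keeps full precision instead of rounding intermediate results, its numbers can differ slightly from a hand calculation.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x1, x2, w):
    y3 = sigmoid(w["w11"] * x1 + w["w21"] * x2)   # hidden node h1
    y4 = sigmoid(w["w12"] * x1 + w["w22"] * x2)   # hidden node h2
    y5 = sigmoid(w["w13"] * y3 + w["w23"] * y4)   # output node O3
    return y3, y4, y5

def backward(x1, x2, w, target, lr):
    """Return the weights after one backpropagation update."""
    y3, y4, y5 = forward(x1, x2, w)
    d5 = y5 * (1 - y5) * (target - y5)            # error measure of the output unit
    d3 = y3 * (1 - y3) * w["w13"] * d5            # error measures of the hidden units
    d4 = y4 * (1 - y4) * w["w23"] * d5
    return {
        "w11": w["w11"] + lr * d3 * x1,
        "w21": w["w21"] + lr * d3 * x2,
        "w12": w["w12"] + lr * d4 * x1,
        "w22": w["w22"] + lr * d4 * x2,
        "w13": w["w13"] + lr * d5 * y3,
        "w23": w["w23"] + lr * d5 * y4,
    }

x1, x2, target, lr = 0.35, 0.7, 0.5, 1.0
weights = {"w11": 0.2, "w21": 0.2, "w12": 0.3, "w22": 0.3, "w13": 0.3, "w23": 0.9}

y5_before = forward(x1, x2, weights)[2]
new_weights = backward(x1, x2, weights, target, lr)
y5_after = forward(x1, x2, new_weights)[2]
print(round(y5_before, 4), round(y5_after, 4))
```

Calling `backward` repeatedly on its own result moves y5 step by step toward the target, which is the iteration described above.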
What are recurrent neural networks?
A recurrent neural network (RNN) is a type of artificial neural network which uses
sequential data or time series data. These deep learning algorithms are commonly used for
ordinal or temporal problems, such as language translation, natural language processing (NLP),
speech recognition, and image captioning; they are incorporated into popular applications
such as Siri, voice search, and Google Translate. Like feedforward and convolutional neural
networks (CNNs), recurrent neural networks utilize training data to learn. They are
distinguished by their “memory” as they take information from prior inputs to influence the
current input and output. While traditional deep neural networks assume that inputs and
outputs are independent of each other, the output of a recurrent neural network depends on the
prior elements within the sequence. While future events would also be helpful in determining
the output of a given sequence, unidirectional recurrent neural networks cannot account for
these events in their predictions.
Let’s take an idiom, such as “feeling under the weather”, which is commonly used when
someone is ill, to aid us in the explanation of RNNs. In order for the idiom to make sense, it
needs to be expressed in that specific order. As a result, recurrent networks need to account
for the position of each word in the idiom and they use that information to predict the next
word in the sequence.
Another distinguishing characteristic of recurrent networks is that they share parameters
across each layer (time step) of the unrolled network. While feedforward networks have
different weights at each layer, a recurrent neural network reuses the same weight
parameters at every time step. That said, these weights are still adjusted through the
processes of backpropagation and gradient descent to facilitate learning.
Recurrent neural networks leverage the backpropagation through time (BPTT) algorithm to
determine the gradients, which is slightly different from traditional backpropagation as it is
specific to sequence data. The principles of BPTT are the same as traditional
backpropagation, where the model trains itself by calculating errors from its output layer to its
input layer. These calculations allow us to adjust and fit the parameters of the model
appropriately. BPTT differs from the traditional approach in that BPTT sums errors at each
time step whereas feedforward networks do not need to sum errors as they do not share
parameters across each layer.
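Parameter sharing across time steps can be sketched with a single-unit RNN. The function name `rnn_forward`, the tanh activation and the weight values are assumptions made for this illustration.

```python
import math

def rnn_forward(inputs, w_xh, w_hh, b_h, h0=0.0):
    """Run a single-unit RNN over a sequence, reusing the same weights at every step."""
    h = h0
    states = []
    for x in inputs:
        # The same w_xh, w_hh and b_h are applied at every time step;
        # h carries information from earlier inputs forward.
        h = math.tanh(w_xh * x + w_hh * h + b_h)
        states.append(h)
    return states

states = rnn_forward([1.0, 0.5, -0.5], w_xh=0.8, w_hh=0.5, b_h=0.0)
print(states)
```

Since every hidden state depends on the previous one, feeding the same values in a different order produces a different final state. During BPTT, the gradients of the shared weights are summed over all time steps.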
Through this process, RNNs tend to run into two problems, known as exploding gradients and
vanishing gradients. These issues are defined by the size of the gradient, which is the slope of
the loss function along the error curve. When the gradient is too small, it continues to become
smaller, updating the weight parameters until they become insignificant—i.e. 0. When that
occurs, the algorithm is no longer learning. Exploding gradients occur when the gradient is
too large, creating an unstable model. In this case, the model weights will grow too large, and
they will eventually be represented as NaN. One solution to these issues is to reduce the
number of hidden layers within the neural network, eliminating some of the complexity in the
RNN model.
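The effect can be illustrated with a deliberately simplified sketch: during BPTT the gradient is multiplied by a similar factor at every time step, so a factor below 1 shrinks it toward zero and a factor above 1 makes it grow without bound. The constant factors 0.5 and 1.5 and the 50 time steps are assumptions chosen for the illustration.

```python
def gradient_through_time(factor, steps):
    """Multiply a unit gradient by the same per-step factor `steps` times."""
    grad = 1.0
    for _ in range(steps):
        grad *= factor
    return grad

small = gradient_through_time(0.5, 50)   # shrinks toward 0 (vanishing gradient)
large = gradient_through_time(1.5, 50)   # grows very large (exploding gradient)
print(small, large)
```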
Variant RNN architectures
Bidirectional recurrent neural networks (BRNN): These are a variant network architecture
of RNNs. While unidirectional RNNs can only draw on previous inputs to make
predictions about the current state, bidirectional RNNs also pull in future data to improve
their accuracy. Returning to the earlier example of “feeling under the weather”,
the model can better predict that the second word in that phrase is “under” if it knows that the
last word in the sequence is “weather.”
Long short-term memory (LSTM): This is a popular RNN architecture, which was
introduced by Sepp Hochreiter and Juergen Schmidhuber as a solution to the vanishing gradient
problem. In their paper, they work to address the problem of
long-term dependencies. That is, if the previous state that is influencing the current prediction
is not in the recent past, the RNN model may not be able to accurately predict the current
state. As an example, let’s say we wanted to predict the final words, “peanut butter”, in the
following: “Alice is allergic to nuts. She can’t eat peanut butter.” The context of a nut allergy can help us
anticipate that the food that cannot be eaten contains nuts. However, if that context was a few
sentences prior, then it would make it difficult, or even impossible, for the RNN to connect
the information. To remedy this, LSTMs have “cells” in the hidden layers of the neural
network, which have three gates: an input gate, an output gate, and a forget gate. These gates
control the flow of information which is needed to predict the output in the network. For
example, if gender pronouns, such as “she”, were repeated multiple times in prior sentences,
you may exclude that from the cell state.
Gated recurrent units (GRUs): This RNN variant is similar to LSTMs as it also works to
address the short-term memory problem of RNN models. Instead of using a “cell state” to
regulate information, it uses hidden states, and instead of three gates, it has two: a reset gate
and an update gate. Similar to the gates within LSTMs, the reset and update gates control how
much and which information to retain.
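The gating idea can be made concrete with one step of a single-unit LSTM, written in the commonly used formulation. The function `lstm_step`, the parameter layout and the weight values are assumptions of this sketch, not taken from a specific library.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM; p maps each gate name to (input weight, recurrent weight, bias)."""
    def gate(name, activation):
        w, u, b = p[name]
        return activation(w * x + u * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to write
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # candidate values to write
    c = f * c_prev + i * g             # updated cell state
    h = o * math.tanh(c)               # updated hidden state
    return h, c

# Illustrative parameters: the same (weight, recurrent weight, bias) for every gate.
params = {name: (0.5, 0.5, 0.0) for name in ("forget", "input", "output", "candidate")}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

When the forget gate is close to 1 and the input gate close to 0, the cell state is carried over almost unchanged, which is how an LSTM preserves information over long spans. A GRU merges these roles into its reset and update gates and keeps no separate cell state.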
Restricted Boltzmann Machine
Restricted Boltzmann Machine (RBM) is a type of artificial neural network that is used for
unsupervised learning. It is a type of generative model that is capable of learning a
probability distribution over a set of input data.
RBMs were first proposed by Paul Smolensky in 1986 under the name Harmonium, and were
popularized in the mid-2000s by Geoffrey Hinton and collaborators such as Ruslan
Salakhutdinov, who developed fast learning algorithms for them. An RBM is a neural network
that consists of two layers of neurons: a visible layer and a hidden layer. The visible layer
represents the input data, while the hidden layer represents a set of features that are
learned by the network.
The RBM is called “restricted” because the connections between the neurons in the same
layer are not allowed. In other words, each neuron in the visible layer is only connected to
neurons in the hidden layer, and vice versa. This allows the RBM to learn a compressed
representation of the input data by reducing the dimensionality of the input.
The RBM is trained using a process called contrastive divergence, which approximates the
gradient of the log-likelihood and is applied with stochastic gradient updates. During
training, the network adjusts the weights of
the connections between the neurons in order to maximize the likelihood of the training
data. Once the RBM is trained, it can be used to generate new samples from the learned
probability distribution.
RBM has found applications in a wide range of fields, including computer vision, natural
language processing, and speech recognition. It has also been used in combination with
other neural network architectures, such as deep belief networks and deep neural networks,
to improve their performance.
What are Boltzmann Machines?
A Boltzmann machine is a network of neurons in which all the neurons are connected to each
other. The neurons are divided into two groups: visible (input) units and hidden units. The
visible layer is denoted as v and the hidden layer is denoted as h. In a Boltzmann machine,
there is no output layer. Boltzmann machines are stochastic, generative neural networks
capable of learning internal representations and are able to represent and (given enough
time) solve tough combinatorial problems.
The model takes its name from the Boltzmann distribution (also known as the Gibbs
distribution), which is an integral part of statistical mechanics and relates the probability
of a state to its energy and the temperature. For this reason, Boltzmann machines belong to
the family of Energy-Based Models (EBM). The Boltzmann machine was invented in 1985 by
Geoffrey Hinton, then a Professor at Carnegie Mellon University, and Terry Sejnowski, then a
Professor at Johns Hopkins University.
What are Restricted Boltzmann Machines (RBM)?
The term “restricted” means that units within the same layer are not allowed to connect to
each other. In other words, two neurons of the input layer, or two neurons of the hidden
layer, cannot be connected to each other, whereas neurons of the hidden layer and the
visible layer can be connected.
Since this machine has no output layer, the question arises of how we
identify and adjust the weights and how we measure whether our prediction is accurate.
The training procedure of the Restricted Boltzmann Machine answers these questions.
The RBM, for which Geoffrey Hinton and collaborators developed fast learning algorithms in
the 2000s, learns a probability distribution over its training inputs. It has seen wide
applications in different
areas of supervised/unsupervised machine learning such as feature learning, dimensionality
reduction, classification, collaborative filtering, and topic modeling.
Consider a movie-rating example from recommender systems.
Movies like Avengers, Avatar, and Interstellar have strong associations with the latest
fantasy and science fiction factor. Based on the user rating RBM will discover latent factors
that can explain the activation of movie choices. In short, RBM describes variability among
correlated variables of input dataset in terms of a potentially lower number of unobserved
variables.
The energy function is given by
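The standard energy function of a binary RBM is reproduced below as a sketch. The symbols are assumptions of this reconstruction: v_i and h_j are the states of visible unit i and hidden unit j, a_i and b_j are their biases, and w_ij is the weight between them.

```latex
\begin{align*}
E(v, h) &= -\sum_i a_i v_i \;-\; \sum_j b_j h_j \;-\; \sum_i \sum_j v_i\, w_{ij}\, h_j \\
p(v, h) &= \frac{e^{-E(v, h)}}{Z}, \qquad Z = \sum_{v, h} e^{-E(v, h)}
\end{align*}
```

Configurations with low energy receive high probability, and training lowers the energy of configurations that resemble the training data.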
Applications of Restricted Boltzmann Machine
Restricted Boltzmann Machines (RBMs) have found numerous applications in various
fields, some of which are:
Collaborative filtering: RBMs are widely used in collaborative filtering for recommender
systems. They learn to predict user preferences based on their past behavior and recommend
items that are likely to be of interest to the user.
Image and video processing: RBMs can be used for image and video processing tasks such
as object recognition, image denoising, and image reconstruction. They can also be used for
tasks such as video segmentation and tracking.
Natural language processing: RBMs can be used for natural language processing tasks
such as language modeling, text classification, and sentiment analysis. They can also be
used for tasks such as speech recognition and speech synthesis.
Bioinformatics: RBMs have found applications in bioinformatics for tasks such as protein
structure prediction, gene expression analysis, and drug discovery.
Financial modeling: RBMs can be used for financial modeling tasks such as predicting
stock prices, risk analysis, and portfolio optimization.
Anomaly detection: RBMs can be used for anomaly detection tasks such as fraud detection
in financial transactions, network intrusion detection, and medical diagnosis.
It is used in Filtering.
It is used in Feature Learning.
It is used in Classification.
It is used in Risk Detection.
It is used in Business and Economic analysis.
How do Restricted Boltzmann Machines work?
In RBM there are two phases through which the entire RBM works:
1st Phase: In this phase, we take the input layer and, using the weights and
biases, activate the hidden layer. This process is called the Feed Forward
Pass. In Feed Forward Pass we are identifying the positive association and negative
association.
Feed Forward Equation:
Positive Association — When the association between the visible unit and the hidden unit
is positive.
Negative Association — When the association between the visible unit and the hidden unit
is negative.
2nd Phase: Since there is no output layer, instead of calculating an output we
reconstruct the input layer from the activated hidden state. This process is called the
Feed Backward Pass: we trace back from the activated
hidden neurons to the input layer. After this step we have a reconstructed input, so we can
calculate the error and adjust the weights as follows:
Feed Backward Equation:
Error = Reconstructed Input Layer - Actual Input Layer
Adjust Weight = Input * Error * Learning Rate (0.1)
After completing these steps, we obtain the pattern that is responsible for activating the
hidden neurons.
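The two phases can be sketched as one step of contrastive divergence (CD-1) for a tiny binary RBM. Note that this uses the standard CD-1 update (positive association minus negative association) rather than the simplified weight rule written above. The layer sizes, training patterns, learning rate of 0.1 and all function names are assumptions made for the illustration.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sample(probs, rng):
    # Turn activation probabilities into binary states.
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def hidden_probs(v, W, b_h):
    # W[i][j] connects visible unit i to hidden unit j.
    return [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b_h))]

def visible_probs(h, W, b_v):
    return [sigmoid(b_v[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b_v))]

def cd1_update(v0, W, b_v, b_h, lr, rng):
    """One CD-1 step on a single binary input vector; returns the reconstruction error."""
    ph0 = hidden_probs(v0, W, b_h)      # phase 1: visible -> hidden (feed forward)
    h0 = sample(ph0, rng)
    v1 = visible_probs(h0, W, b_v)      # phase 2: hidden -> reconstructed input
    ph1 = hidden_probs(v1, W, b_h)
    for i in range(len(v0)):
        for j in range(len(b_h)):
            # positive association minus negative association
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
    for i in range(len(v0)):
        b_v[i] += lr * (v0[i] - v1[i])
    for j in range(len(b_h)):
        b_h[j] += lr * (ph0[j] - ph1[j])
    return sum((a - b) ** 2 for a, b in zip(v0, v1))

rng = random.Random(0)
n_visible, n_hidden = 4, 2
W = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
b_v, b_h = [0.0] * n_visible, [0.0] * n_hidden
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]

errors = []
for epoch in range(200):
    errors.append(sum(cd1_update(v, W, b_v, b_h, 0.1, rng) for v in data))
print(errors[0], errors[-1])
```

The reconstruction error returned by each step corresponds to the difference between the reconstructed and the actual input layer described in the second phase.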
To understand how it works:
Consider an example in which visible unit V1 activates hidden units h1 and h2, and
visible unit V2 activates hidden units h2 and h3. Now
suppose a new visible unit V5 enters the machine and also activates h1 and
h2. We can then trace back from the hidden units and identify that the
characteristics of the new unit V5 match those of V1, because V1
activated the same hidden units earlier.
Restricted Boltzmann Machines
Types of RBM :
There are mainly two types of Restricted Boltzmann Machine (RBM) based on the types of
variables they use:
1. Binary RBM: In a binary RBM, the input and hidden units are binary variables. Binary
RBMs are often used in modeling binary data such as images or text.
2. Gaussian RBM: In a Gaussian RBM, the input and hidden units are continuous variables
that follow a Gaussian distribution. Gaussian RBMs are often used in modeling continuous
data such as audio signals or sensor data.
Apart from these two types, there are also variations of RBMs such as:
1. Deep Belief Network (DBN): A DBN is a type of generative model that consists of
multiple layers of RBMs. DBNs are often used in modeling high-dimensional data such as
images or videos.
2. Convolutional RBM (CRBM): A CRBM is a type of RBM that is designed specifically for
processing images or other grid-like structures. In a CRBM, the connections between the
input and hidden units are local and shared, which makes it possible to capture spatial
relationships between the input units.
3. Temporal RBM (TRBM): A TRBM is a type of RBM that is designed for processing
temporal data such as time series or video frames. In a TRBM, the hidden units are
connected across time steps, which allows the network to model temporal dependencies in
the data.
Course File (Part B)
Table of Contents
Sr. Page
No. Particulars No
1 Vision and Mission of Department
2 PEO, PO and PSO
3 CO PO Mappings
4 Syllabus Mapped with CO
5 MST question papers
6 MST questions mapping with COs
7 MST's results
8 Micro level analysis of MST’s result
9 05 representative Answer Books of MST’s
Report on action taken on MST results such as change in teaching
10 pedagogy if any.
11 Assignments / alternate method of assessment such as Project work.
12 Assignments mapping with COs
13 Assignments Result
14 Course University Results Analysis of last 03 years
15 Course Assessment Report
Department of Computer Science & Engineering
Departmental Vision
To be the department of choice for students opting for computer science engineering education.
Departmental Mission
To prepare quality computer science professionals, who depending upon their choice will be
readily employable by the industry, or will venture for higher studies or entrepreneurship.
PROGRAM EDUCATION OBJECTIVES (PEO)
PEO1: To impart exhaustive knowledge of Computer Science & Engineering, applied sciences
and humanities as well as management abilities.
PEO2: To enable students to understand, analyze and solve real life problems in Computer
Science & Engineering through hands-on practice in laboratories.
PEO3: To promote collaborative learning and application development through multidisciplinary
projects and professional ethics.
PEO4: To expose students to industrial environment through meaningful internships.
PEO5: To inculcate communication skills and leadership abilities among students.
PROGRAMME OUTCOMES (PO)
1. Engineering Knowledge: An ability to apply knowledge of computing and mathematics
appropriate to information technology.
2. Problem Identification and Analysis: An ability to analyze a problem, identify and define
the computing requirements appropriate to its solution.
3. Design/Development of Solutions: An ability to design, implement and evaluate a
computer-based system, process, component, or program to meet desired needs.
4. Investigation of complex problems: An ability to indulge in research and methods to
design new experiments, analyse and interpret data and apply results to improve the
processes.
5. Usage of modern tools: An ability to use current techniques, skills, and tools necessary for
computing practice.
6. Engineer and Society Relationship: An ability to analyze the local and global impact of
computing on individuals, organizations, and society.
7. Environment and Sustainability: Analyze the local and global impact of computing
solutions on individuals, organizations and society and apply obtained knowledge for
sustainability.
8. Ethics and Responsibilities: An understanding of professional, ethical, legal, security,
social, political, and economic issues and responsibilities.
9. Role of Engineer as an Individual and Team: An ability to function effectively as a
member or leader of a technical team.
10. Communication and Presentation: An ability to communicate effectively with a range of
audiences.
11. Functionality of Engineering and Management Principles: An ability to apply the
knowledge of engineering and management principles to effectively manage projects in
diverse environments as a member/leader in the team.
12. Life-Long Learning: An understanding of the need for and an ability to engage in self-
directed continuing professional development.
PROGRAM SPECIFIC OUTCOMES (PSO)
PSO1. Students should be able to understand the principles, concepts, knowledge gained
during the course of the program to analyze, specify, design, develop, test and maintain real
life engineering problems/applications relating to industry/research/education work using
appropriate data structures, algorithms and latest technologies.
PSO2. Students should be able to apply professional engineering practices, software
engineering practices, innovative ideas, and knowledge of ethical and management principles
required to lead or work in a team to deliver efficient solutions as per the requirements of the
customers.
Syllabus mapped with CO
UNITS TO CO MAPPING
UNIT CO1 CO2 CO3 CO4 CO5
I 1 0 0 0 0
II 0 1 0 0 0
III 0 0 1 0 0
IV 0 0 0 1 0
V 0 0 0 0 1
MST Question Paper
MST Question Paper with COs
MST questions mapping with COs
S.No. Question List Marks CO1 CO2 CO3 CO4 *BLOOM's Taxonomy
1 Ques 1 2
2 Ques 2 2
3 Ques 3 2
4 Ques 4 2
5 Ques 5 4
6 Ques 6 4
7 Ques 7 4
8 Ques 8 8
9 Ques 9 8
MST 1 Marks
Section Name 21iNurture2 & 4
Program Name B. Tech. - CSE (AIML & DS)
Semester 7th
Deep Learning
S. No. Roll No. Name Milandeep Kour
24
10
11
12
13
14
15
Average Marks of Class
Maximum Marks of Class
Minimum Marks of Class
No. of students scoring <40% (<8)
No. of students scoring 40-60% (8 to <15)
No. of students scoring 60-75% (15 to <18)
No. of students scoring >75% (>18)
No. of students absent
>20% Failure Rate
ASSIGNMENT QUESTIONS
Practical Questions
1. Implementing a Custom Layer: Implement a custom neural network layer in a deep learning
framework of your choice (e.g., TensorFlow, PyTorch). Explain the functionality of this layer
and provide an example of how it can be used in a neural network architecture.
2. GAN Training Stability: Train a Generative Adversarial Network (GAN) on a dataset of
your choice. Describe the challenges you faced in training the GAN, including any stability
issues. What techniques did you use to stabilize the training process?
3. Autoencoders: Build an autoencoder for image compression using a deep learning
framework. Evaluate the performance of your autoencoder in terms of compression ratio and
reconstruction quality. Discuss the trade-offs between the depth of the network and the quality
of the compressed images.
4. Hyperparameter Tuning: Choose a deep learning model and a corresponding dataset.
Perform hyperparameter tuning using techniques such as grid search, random search, or
Bayesian optimization. Report on the process and findings, including which hyperparameters
had the most significant impact on performance.
5. Sequence-to-Sequence Models: Implement a sequence-to-sequence model for a language
translation task. Discuss the challenges associated with training such models and how
attention mechanisms can help improve their performance. Provide an evaluation of your
model's performance on a test set.
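Starting point for Practical Question 4: the sketch below contrasts grid search with random search, using only the Python standard library. It is an illustration under stated assumptions, not a model answer. The function toy_validation_loss is a hypothetical stand-in for "train the model and return the validation loss", and the two hyperparameters (learning_rate, dropout) and their ranges are chosen only for the example.

```python
import itertools
import math
import random


def toy_validation_loss(learning_rate, dropout):
    """Hypothetical stand-in for training a model and measuring validation loss.

    By construction the minimum (loss 0) is at learning_rate=0.01, dropout=0.3.
    In a real assignment this function would train and evaluate a network.
    """
    return (math.log10(learning_rate) + 2.0) ** 2 + (dropout - 0.3) ** 2


def grid_search(lr_values, dropout_values):
    """Evaluate every combination; return (best_loss, best_params)."""
    best = None
    for lr, dr in itertools.product(lr_values, dropout_values):
        loss = toy_validation_loss(lr, dr)
        if best is None or loss < best[0]:
            best = (loss, {"learning_rate": lr, "dropout": dr})
    return best


def random_search(n_trials, seed=0):
    """Sample hyperparameters at random; return (best_loss, best_params)."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    best = None
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-4, -1)  # log-uniform over [1e-4, 1e-1]
        dr = rng.uniform(0.0, 0.6)      # uniform over [0, 0.6]
        loss = toy_validation_loss(lr, dr)
        if best is None or loss < best[0]:
            best = (loss, {"learning_rate": lr, "dropout": dr})
    return best


print("grid:  ", grid_search([1e-3, 1e-2, 1e-1], [0.1, 0.3, 0.5]))
print("random:", random_search(n_trials=9, seed=0))
```

Both calls above use nine evaluations, so the two strategies can be compared at equal cost. The learning rate is sampled on a logarithmic scale because useful values typically span several orders of magnitude.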
Theoretical Questions
1. Bias-Variance Tradeoff: Explain the bias-variance tradeoff in the context of deep learning.
How does model complexity affect bias and variance? Provide a detailed analysis of this
tradeoff using mathematical formulations and graphical representations.
2. Activation Functions: Discuss the role of activation functions in deep learning. Compare and
contrast different activation functions such as ReLU, sigmoid, tanh, and their variants.
Provide examples of scenarios where one activation function might be more appropriate than
others.
3. Convergence and Initialization: Analyze the impact of weight initialization on the
convergence of deep learning models. Discuss different weight initialization techniques, such
as Xavier and He initialization. Provide experimental results to support your analysis.
4. Loss Functions: Discuss the importance of choosing the right loss function for a given task in
deep learning. Compare different loss functions used for classification and regression tasks.
Explain how imbalanced datasets can affect the choice of loss function and model
performance.
5. Regularization Techniques: Examine various regularization techniques used in deep
learning, such as L1/L2 regularization, dropout, batch normalization, and data augmentation.
Discuss the theoretical underpinnings of these techniques and provide examples of how they
improve model generalization.
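Starting point for Theoretical Question 3: the sketch below computes the standard deviations used by the normal-distribution variants of Xavier (Glorot) and He initialization and draws a weight matrix with them. It uses only the Python standard library. The helper names (xavier_std, he_std, init_weights, empirical_std) and the layer sizes are assumptions made for this illustration and do not come from any particular framework.

```python
import math
import random


def xavier_std(fan_in, fan_out):
    """Xavier (Glorot) normal: Var(W) = 2 / (fan_in + fan_out)."""
    return math.sqrt(2.0 / (fan_in + fan_out))


def he_std(fan_in):
    """He normal: Var(W) = 2 / fan_in, commonly paired with ReLU units."""
    return math.sqrt(2.0 / fan_in)


def init_weights(fan_in, fan_out, std, seed=0):
    """Return a fan_out x fan_in matrix of zero-mean Gaussian weights."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]


def empirical_std(weights):
    """Standard deviation of all entries of a nested-list matrix."""
    flat = [w for row in weights for w in row]
    mean = sum(flat) / len(flat)
    return math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))


# Hypothetical layer: 64 inputs, 32 outputs.
target = he_std(64)
weights = init_weights(fan_in=64, fan_out=32, std=target)
print("He target std:   ", target)
print("He empirical std:", empirical_std(weights))
print("Xavier target std:", xavier_std(64, 32))
```

For the experimental part of the question, the same helpers can be used to compare how activations scale through several layers under each scheme.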
Assignment Mapping CO & BLOOM's Taxonomy
Branch/ Course: B. Tech CSE (AIML & DS) Semester: 7th
Assignment No.:
S.No. Question List Marks CO1 CO2 CO3 CO4 CO5 *BLOOM's Taxonomy
1 Ques 01 30
2 Ques 02 30
3 Ques 03 30
Assignments Result
Marks: Assignment 1
Section Name 21iNurture2 & 4
Program Name B. Tech. - CSE (AIML & DS)
Semester 7th
Deep Learning
S. No. Roll No. Name
Milandeep Kour Bali
30