Deep Learning Notes
[Figure: Machine learning vs deep learning pipeline: Input -> Feature extraction -> Classification -> Output. In classical machine learning, feature extraction and classification are separate steps; in deep learning they are learned together.]

What is Deep Learning?
Deep learning is a machine learning technique that learns features and tasks directly from data, where the data may be images, text, or sound. Deep learning is a method in artificial intelligence (AI) that teaches computers to process data in a way that is inspired by the human brain.

What are the uses of deep learning?
Deep learning is used in automotive, aerospace, manufacturing, electronics, medical research, and other fields.
- Self-driving cars use deep learning models to automatically detect road signs and pedestrians.
- Defense systems use deep learning.
- Medical image analysis uses deep learning to automatically detect cancer cells for medical diagnosis.
- Factories use deep learning applications to automatically detect when people or objects are within an unsafe distance of machines.

How does deep learning work?
Deep learning algorithms are neural networks that are modeled after the human brain. For example, a human brain contains millions of interconnected neurons that work together to learn and process information. Similarly, deep learning neural networks, or artificial neural networks, are made of many layers of artificial neurons that work together inside the computer. Artificial neurons are software modules called nodes, which use mathematical calculations to process data. Artificial neural networks are deep learning algorithms that use these nodes to solve complex problems.

What are the components of a deep learning network?
- Input layer: An artificial neural network has several nodes that input data into it.
- Hidden layers: The input layer processes and passes the data to layers further in the neural network. These hidden layers process information at different levels.
- Output layer: The output layer consists of the nodes that output the data.
[Figure: Input layer -> Hidden layer #1 -> Hidden layer #2 -> Output layer]

What is deep learning in the context of machine learning?
Classical machine learning methods require significant human effort to train the software. For example, in animal image recognition you would:
1. Manually label hundreds of thousands of animal images.
2. Make the machine learning algorithms process those images.
3. Test those algorithms on a set of unknown images.
4. Identify why some results are inaccurate.
5. Improve the dataset by labeling new images to improve result accuracy.
This process is called supervised learning.

What are the benefits of deep learning over machine learning?
Machine learning methods find unstructured data, such as text documents, challenging to process because the training dataset can have infinite variations. On the other hand, deep learning models can comprehend unstructured data and make general observations without manual feature extraction. For example, a neural network can recognize that these two different input sentences have the same meaning:
- Can you tell me how to make the payment?
- How do I transfer money?

A Brief History of Deep Learning :-
Deep learning is a more evolved branch of machine learning that uses layers of algorithms to process data, imitate the thinking process, and develop abstractions. It is often used to visually recognize objects and understand human speech. Information is passed through each layer, with the output of the previous layer providing input for the next layer. The first layer in a network is called the input layer, while the last is called an output layer.
All the layers between input and output are referred to as hidden layers. Each layer is typically a simple, uniform algorithm containing one kind of activation function. Feature extraction is another aspect of deep learning. It is used for pattern recognition and image processing. Feature extraction uses an algorithm to automatically construct meaningful "features" of the data for purposes of training, learning, and understanding. Normally a data scientist, or a programmer, is responsible for feature extraction.

The history of deep learning can be traced back to 1943, when Walter Pitts and Warren McCulloch created a computer model based on the neural networks of the human brain. They used a combination of algorithms and mathematics they called "threshold logic" to mimic the thought process. Since that time, deep learning has evolved steadily, with only two significant breaks in its development.

1. 1943: Warren McCulloch and Walter Pitts created a computer model based on the neural networks of the human brain, laying the groundwork for neural network theory.
2. 1958: Frank Rosenblatt developed the Perceptron, an early type of artificial neural network that could learn to recognize patterns.
3. 1960s: The basics of backpropagation, a method for training neural networks, were developed by Henry J. Kelley and later refined by others.
4. 1970s: Kunihiko Fukushima introduced the first convolutional neural network (CNN), which is now widely used in image recognition.
5. 1980s: The term "deep learning" was introduced by Rina Dechter in 1986, and significant advancements were made in neural network training techniques.
6. 2006: Geoffrey Hinton and his team made breakthroughs in training deep neural networks, leading to a resurgence of interest and research in deep learning.
7. 2012: The success of deep learning was highlighted when a deep neural network won the ImageNet competition, significantly outperforming other methods in image recognition.
8. 2014-Present: Deep learning has continued to evolve, with advancements in architectures like Generative Adversarial Networks (GANs), transformers, and applications across various fields such as natural language processing, medical imaging, and autonomous driving.

Application of Deep Learning
1. Healthcare: Deep learning is used for disease diagnosis, medical imaging, and personalized treatment plans. For example, it helps in detecting anomalies in X-rays and MRIs.
2. Automotive: Self-driving cars rely heavily on deep learning to process data from sensors and cameras to navigate and make decisions.
3. Finance: It is used for fraud detection, risk management, and algorithmic trading by analyzing large datasets to identify patterns and anomalies.
4. Retail: Deep learning enhances customer experience through personalized recommendations, inventory management, and demand forecasting.
5. Natural Language Processing (NLP): Applications include virtual assistants like Siri and Alexa, language translation, and sentiment analysis.
6. Entertainment: It powers recommendation systems for streaming services like Netflix and Spotify, and is used in creating realistic animations and special effects.
7. Cybersecurity: Deep learning helps in detecting malware, phishing attacks, and other cybersecurity threats by analyzing network traffic and user behavior.
8. Manufacturing: It is used for predictive maintenance, quality control, and optimizing supply chains.
9. Robotics: Deep learning enables robots to perform complex tasks such as object recognition, path planning, and autonomous navigation.
10. Agriculture: Applications include crop monitoring, soil analysis, and predicting yields to improve farming practices.
11. Weather Forecasting: Colorful Clouds is using GPU computing and AI to process, predict, and communicate weather and air-quality conditions quickly, ushering in a new age of forecast generation and reporting.

McCulloch-Pitts Neuron :-
It is well known that the most fundamental unit of deep neural networks is the artificial neuron/perceptron. But the very first step towards the perceptron we use today was taken in 1943 by Warren McCulloch and Walter Pitts, by mimicking the functionality of a biological neuron. The McCulloch-Pitts neuron is one of the earliest mathematical models of a biological neuron, proposed in 1943.

Biological Neurons: An Overly Simplified Illustration
[Figure: A biological neuron, with dendrites, soma, axon, and synapse labeled.]
- Dendrite: Receives signals from other neurons
- Soma: Processes the information
- Axon: Transmits the output of this neuron
- Synapse: Point of connection to other neurons

Basically, a neuron takes an input signal (dendrite), processes it like a CPU (soma), and passes the output through a cable-like structure to other connected neurons (axon to synapse to another neuron's dendrite). This might be biologically inaccurate, as there is a lot more going on, but at a high level this is what a neuron in our brain does: it takes an input, processes it, and throws out an output.

Our sense organs interact with the outer world and send the visual and sound information to the neurons. Let's say you are watching Friends. The information your brain receives is taken in by the "laugh or not" set of neurons that helps you decide whether to laugh. Each neuron gets fired/activated only when its respective criteria (more on this later) is met.

1. Basic Structure:
The McCulloch-Pitts neuron model is a simplified version of a biological neuron and consists of:
- Inputs (dendrites): The model receives multiple binary inputs, each representing a signal.
- Weights: Each input has a corresponding weight, which determines the strength or importance of that input.
- Summation Function: The inputs are summed together, and the result is compared to a threshold.
- Activation (or Output): If the sum exceeds a certain threshold, the neuron "fires" and produces an output of 1 (active); otherwise, it produces an output of 0 (inactive).

2. Mathematical Model:
The behavior of the McCulloch-Pitts neuron can be described by the following equation:

$$y = f\left(\sum_i w_i x_i\right)$$

Where:
- $y$ is the output (1 or 0),
- $w_i$ are the weights associated with each input $x_i$,
- $\theta$ is the threshold value,
- $f(z)$ is the activation function, which is typically a step function:
  - If the weighted sum exceeds the threshold $\theta$, the output is $y = 1$,
  - If the weighted sum is below the threshold, the output is $y = 0$.

3. Threshold and Activation:
- Threshold: A predefined value that the sum of the weighted inputs is compared against.
- Step Function: The activation function is usually a simple step function that outputs either 0 or 1, depending on whether the summed input exceeds the threshold.

4. Properties:
- Binary Inputs and Outputs: The McCulloch-Pitts neuron works with binary values (either 0 or 1).
- Linear Decision Boundary: The neuron makes decisions based on a linear combination of inputs and outputs a binary value.
This means it can only solve problems that are linearly separable, like the logical operations AND, OR, and NOT.

5. Logical Operations:
McCulloch and Pitts demonstrated that this simple model could perform logical operations. For example:
- AND Operation: The neuron will only fire (output 1) if all inputs are 1.
- OR Operation: The neuron will fire if at least one input is 1.
- NOT Operation: The output is the inverse of the input.

The McCulloch-Pitts neural model, which was the earliest ANN model, has only two types of inputs: excitatory and inhibitory. The excitatory inputs have weights of positive magnitude and the inhibitory inputs have weights of negative magnitude. The inputs of the McCulloch-Pitts neuron can be either 0 or 1. It has a threshold function as an activation function, so the output signal y_out is 1 if the input y_sum is greater than or equal to a given threshold value, else 0.
[Figure: Diagrammatic representation of the McCulloch-Pitts model: input signals x_i feeding a summation junction followed by a threshold function.]

For better understanding, consider an example: John carries an umbrella if it is sunny or raining. There are four given situations, and I need to decide when John will carry the umbrella:
- First scenario: It is not raining, nor is it sunny
- Second scenario: It is not raining, but it is sunny
- Third scenario: It is raining, and it is not sunny
- Fourth scenario: It is raining as well as sunny

To analyse the situations using the McCulloch-Pitts neural model, I consider the input signals as follows:
- x1: Is it raining?
- x2: Is it sunny?

The value of each input can be either 0 or 1. I use the value 1 for both weights w1 and w2 and a threshold of 1. The truth table for this case is:

Situation | x1 | x2 | y_sum | y_out
1         | 0  | 0  | 0     | 0
2         | 0  | 1  | 1     | 1
3         | 1  | 0  | 1     | 1
4         | 1  | 1  | 2     | 1

So,

$$y_{sum} = \sum_{i=1}^{2} w_i x_i, \qquad y_{out} = f(y_{sum}) = \begin{cases} 1 & y_{sum} \ge 1 \\ 0 & y_{sum} < 1 \end{cases}$$

From the truth table, I can conclude that in the situations where the value of y_out is 1, John needs to carry an umbrella. Hence, he will need to carry an umbrella in scenarios 2, 3 and 4. A short code sketch of this example follows.
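The following is a minimal Python sketch of the umbrella example (the function name `mcculloch_pitts` is mine, not part of the original model):

```python
# A McCulloch-Pitts neuron for the umbrella example: both weights are 1
# and the threshold is 1, as chosen above.
def mcculloch_pitts(inputs, weights, threshold):
    y_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if y_sum >= threshold else 0

# x1: is it raining?  x2: is it sunny?
for x1 in (0, 1):
    for x2 in (0, 1):
        y_out = mcculloch_pitts([x1, x2], weights=[1, 1], threshold=1)
        print(f"x1={x1}, x2={x2} -> carry umbrella: {y_out}")
```

Running it reproduces the truth table: the neuron fires in scenarios 2, 3 and 4.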
Perceptron :-
A Perceptron is an artificial neuron. It is the simplest possible neural network, and neural networks are the building blocks of deep learning.

Frank Rosenblatt
Frank Rosenblatt (1928-1971) was an American psychologist notable in the field of Artificial Intelligence. In 1957 he started something really big: he "invented" a Perceptron program, on an IBM 704 computer at Cornell Aeronautical Laboratory. Scientists had discovered that brain cells (neurons) receive input from our senses by electrical signals. The neurons, in turn, use electrical signals to store information and to make decisions based on previous input. Frank had the idea that perceptrons could simulate brain principles, with the ability to learn and make decisions.

The Perceptron
The Perceptron is one of the simplest types of artificial neural networks and serves as the foundation for more complex neural network architectures. It was introduced by Frank Rosenblatt in 1958 and is primarily used for binary classification tasks.

Structure of the Perceptron
[Figure: Inputs and weights feeding a summation unit with bias, followed by an activation function that outputs 1 if the sum is >= 0 and 0 otherwise.]
A perceptron consists of:
1. Inputs (x1, x2, ..., xn): Features of the data.
2. Weights (w1, w2, ..., wn): Parameters that determine the importance of each input.
3. Bias (b): An offset that allows the decision boundary to shift.
4. Activation Function: Determines the output of the perceptron, typically a step function in the original formulation.

The perceptron computes the weighted sum of inputs and applies the activation function:

$$z = \sum_{i=1}^{n} w_i x_i + b$$

Working of the Perceptron
1. Input Data: The perceptron takes input values (x1, x2, ..., xn).
2. Weighted Sum: It calculates the weighted sum z using the formula above.
3. Activation Function: The step function outputs 1 if z >= 0, and 0 if z < 0.
4. Output: The result is a binary classification (e.g., true/false, positive/negative).

Learning in the Perceptron
The perceptron learns by adjusting its weights and bias using a learning algorithm:
1. Start with random weights and bias.
2. For each training sample:
   - Compute the perceptron's output.
   - Compare it to the true label (target output).
   - Update the weights and bias using the perceptron learning rule: $w_i \leftarrow w_i + \Delta w_i$, where $\Delta w_i = \eta\,(y_{true} - y_{pred})\,x_i$. Here $\eta$ is the learning rate (a small positive constant), $y_{true}$ the true label, and $y_{pred}$ the predicted label.
3. Repeat until the weights converge or a stopping criterion is met.
[Figure: The perceptron learning algorithm, repeating the update w_j <- w_j + η·d·x_j, and a schematic of a neuron with its activation function.]

Applications
The perceptron can be used to implement logic gates (here, the OR gate):

x1 | x2 | y
0  | 0  | 0
0  | 1  | 1
1  | 0  | 1
1  | 1  | 1

More generally, it can be used to classify any linearly separable set of inputs.

Perceptron Example
Imagine a perceptron (in your brain). The perceptron tries to decide if you should go to a concert. Is the artist good? Is the weather good? What weights should these facts have?

Criteria         | Input       | Weight
Artist is Good   | x1 = 0 or 1 | w1 = 0.7
Weather is Good  | x2 = 0 or 1 | w2 = 0.6
Friend will Come | x3 = 0 or 1 | w3 = 0.5
Food is Served   | x4 = 0 or 1 | w4 = 0.3
Water is Served  | x5 = 0 or 1 | w5 = 0.4

The Perceptron Algorithm :-
Frank Rosenblatt suggested this algorithm:
1. Set a threshold value
2. Multiply all inputs with their weights
3. Sum all the results
4. Activate the output

1. Set a threshold value:
   - Threshold = 1.5
2. Multiply all inputs with their weights:
   - x1 * w1 = 1 * 0.7 = 0.7
   - x2 * w2 = 0 * 0.6 = 0
   - x3 * w3 = 1 * 0.5 = 0.5
   - x4 * w4 = 0 * 0.3 = 0
   - x5 * w5 = 1 * 0.4 = 0.4
3. Sum all the results:
   - 0.7 + 0 + 0.5 + 0 + 0.4 = 1.6 (the weighted sum)
4. Activate the output:
   - Return true if the sum > 1.5 ("Yes, I will go to the concert").

The same decision, written as code, is shown below.
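A minimal sketch of the concert decision in plain Python, using the inputs and weights from the table above:

```python
# The concert decision: five binary facts, the weights from the table,
# and Rosenblatt's recipe with a threshold of 1.5.
inputs  = [1, 0, 1, 0, 1]            # artist, weather, friend, food, water
weights = [0.7, 0.6, 0.5, 0.3, 0.4]
threshold = 1.5

weighted_sum = sum(x * w for x, w in zip(inputs, weights))  # 0.7 + 0.5 + 0.4 = 1.6
print(weighted_sum > threshold)      # True -> "Yes, I will go to the concert"
```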
Multi-layer Perceptron :-
History: Deep learning deals with training multi-layer artificial neural networks, also called deep neural networks. After the Rosenblatt perceptron was developed in the 1950s, there was a lack of interest in neural networks until 1986, when Dr. Hinton and his colleagues developed the backpropagation algorithm to train a multi-layer neural network. Today it is a topic in which many leading firms like Google, Facebook, and Microsoft invest heavily.

A multi-layer perceptron is also known as an MLP. It consists of fully connected dense layers, which transform any input dimension to the desired dimension. A multi-layer perceptron is a neural network that has multiple layers. To create a neural network we combine neurons together so that the outputs of some neurons are inputs of other neurons. A multi-layer perceptron has one input layer with one neuron (or node) for each input, one output layer with a single node for each output, and any number of hidden layers, where each hidden layer can have any number of nodes.
[Figure: Schematic diagram of a Multi-Layer Perceptron (MLP) with input layer, hidden layers, and output layer.]

Architecture of an MLP
1. Input Layer:
   - This is the first layer of the network.
   - Each neuron corresponds to one feature of the input data.
   - No computations are performed here; it merely passes the input to the next layer.
2. Hidden Layers:
   - Located between the input and output layers.
   - Each neuron in a hidden layer computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function (e.g., ReLU, Sigmoid, Tanh).
   - These layers enable the network to learn complex, non-linear relationships in the data.
3. Output Layer:
   - Provides the final output of the network.
   - The number of neurons depends on the task:
     - Regression: Single neuron, or multiple neurons for vector outputs.
     - Classification: One neuron (binary classification) or one neuron per class (multi-class classification).

Every node in the multi-layer perceptron uses a sigmoid activation function. The sigmoid activation function takes real values as input and converts them to numbers between 0 and 1 using the sigmoid formula:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Sigmoid Neurons:
In the perceptron model, the limitation is that there is a very harsh change in the output function (binary output), which requires linearly separable data. However, in most real-life cases we need a continuous output, so we propose the sigmoid neuron model:

$$y = \frac{1}{1 + e^{-\left(\sum_i w_i x_i + b\right)}}$$

[Figure: The smooth sigmoid curve (red) compared with the perceptron's hard step (blue).]
The function above is a sigmoid: it takes a linear input and produces a smooth, continuous output. The output value lies in [0, 1] irrespective of the number of inputs, and it changes smoothly as the weighted sum changes. The sigmoid model can be used both for regression and classification problems. In the case of regression, the predicted value is the output of the sigmoid itself; in a classification problem, we first predict using the sigmoid function and then choose a threshold value that separates the classes. The threshold can be 0.5, the mean of the predicted values, or anything else depending on the problem. A small code sketch follows.
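A minimal NumPy sketch contrasting the sigmoid's smooth output with the perceptron's hard step (helper names are mine):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def step(x):
    # The perceptron's hard threshold at 0.
    return np.where(x >= 0, 1, 0)

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(sigmoid(z))   # [0.018 0.269 0.5 0.731 0.982] -- changes smoothly
print(step(z))      # [0 0 1 1 1]                   -- flips abruptly at 0
```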
Feed Forward Process in Deep Neural Network
"In feedforward propagation, input data is passed through the network layer by layer, with each layer performing a computation based on the inputs it receives and passing the result to the next layer."
"The process of receiving an input to produce some kind of output to make some kind of prediction is known as feed forward."
The information flows through the layers from input to output without feedback loops. Feedforward networks are used for classification and regression.

Architecture of Feedforward Neural Networks
The architecture of a feedforward neural network consists of three types of layers: the input layer, hidden layers, and the output layer. Each layer is made up of units known as neurons, and the layers are interconnected by weights.
- Input Layer: This layer consists of neurons that receive inputs and pass them on to the next layer. The number of neurons in the input layer is determined by the dimensions of the input data.
- Hidden Layers: These layers are not exposed to the input or output and can be considered the computational engine of the neural network. Each hidden layer's neurons take the weighted sum of the outputs from the previous layer, apply an activation function, and pass the result to the next layer. The network can have zero or more hidden layers.
- Output Layer: The final layer, which produces the output for the given inputs. The number of neurons in the output layer depends on the number of possible outputs the network is designed to produce.

Each neuron in one layer is connected to every neuron in the next layer, making this a fully connected network. The strength of the connection between neurons is represented by weights, and learning in a neural network involves updating these weights based on the error of the output.

How Feedforward Neural Networks Work
The working of a feedforward neural network involves two phases: the feedforward phase and the backpropagation phase.
- Feedforward Phase: In this phase, the input data is fed into the network and propagates forward through it. At each hidden layer, the weighted sum of the inputs is calculated and passed through an activation function, which introduces non-linearity into the model. This process continues until the output layer is reached, and a prediction is made.
- Backpropagation Phase: Once a prediction is made, the error (the difference between the predicted output and the actual output) is calculated. This error is then propagated back through the network, and the weights are adjusted to minimize it. The adjustment of weights is typically done using a gradient descent optimization algorithm.

Activation Functions
Neurons in hidden layers use activation functions to introduce non-linearity into the model. This helps the network learn from complex data. Common activation functions:
- ReLU (Rectified Linear Unit): Outputs the input directly if it is positive; otherwise, it outputs zero.
- Sigmoid: Converts the input into a value between 0 and 1, useful for binary classification.
- Tanh: Similar to Sigmoid but outputs values between -1 and 1, often used in tasks where the input data is centered around zero.

Training Feedforward Neural Networks
Training a feedforward neural network involves using a dataset to adjust the weights of the connections between neurons. This is done through an iterative process where the dataset is passed through the network multiple times and, each time, the weights are updated to reduce the error in prediction. This process is known as gradient descent, and it continues until the network performs satisfactorily on the training data.

Applications of Feedforward Neural Networks
Feedforward neural networks are used in a variety of machine learning tasks, including:
- Pattern recognition
- Classification tasks
- Regression analysis
- Image recognition
- Time series prediction

To gain a better understanding of the feed-forward process, let's see it mathematically.

1) The first input is fed to the network, represented as the matrix $[x_1 \; x_2 \; 1]$, where the 1 is the bias value.

2) Each input is multiplied by a weight with respect to the first and second model to obtain its probability of being in the positive region in each model. So we multiply our inputs by a matrix of weights using matrix multiplication:

$$[x_1 \; x_2 \; 1] \times \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \\ b_1 & b_2 \end{bmatrix} = [\text{score}_1 \; \text{score}_2]$$
3) After that, we take the sigmoid of our scores, which gives us the probability of the point being in the positive region in both models:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

4) We multiply the probabilities obtained from the previous step by the second set of weights, always including a bias of one whenever taking a combination of inputs:

$$[\text{probability}_1 \; \text{probability}_2 \; 1] \times \begin{bmatrix} w_{31} \\ w_{32} \\ b \end{bmatrix} = [\text{score}]$$

And, as we know, to obtain the probability of the point being in the positive region of this model, we take the sigmoid, thus producing our final output in a feed-forward process:

$$\sigma(\text{score}) = [\text{probability}]$$

Let us take the neural network we had previously, with the following linear models in the hidden layer, which combine to form the non-linear model in the output layer. We will use this non-linear model to produce an output describing the probability of the point being in the positive region. The point is (2, 2), so along with the bias we represent the input as $[2 \; 2 \; 1]$.

The first linear model in the hidden layer is

$$-4x_1 - x_2 + 12$$

which means that in the first layer, to obtain the linear combination, the inputs are multiplied by -4 and -1 and the bias value is multiplied by 12. For the second model, the inputs are multiplied by -1/5 and -1, and the bias is multiplied by 3. So:

$$[2 \; 2 \; 1] \times \begin{bmatrix} -4 & -1/5 \\ -1 & -1 \\ 12 & 3 \end{bmatrix} = [\,2(-4) + 2(-1) + 1(12) \quad 2(-1/5) + 2(-1) + 1(3)\,] = [2 \; 0.6]$$

Now, to obtain the probability that the point is in the positive region relative to both models, we apply the sigmoid to both scores:

$$[\sigma(2) \; \sigma(0.6)] = [0.88 \; 0.64]$$

The second layer contains the weights that dictate the combination of the linear models in the first layer to obtain the non-linear model in the second layer. The weights are 1.5 and 1, with a bias value of 0.5. We multiply our probabilities from the first layer by the second set of weights:

$$[0.88 \; 0.64 \; 1] \times \begin{bmatrix} 1.5 \\ 1 \\ 0.5 \end{bmatrix} = 0.88(1.5) + 0.64(1) + 1(0.5) = 2.46$$

Finally, we take the sigmoid of our final score:

$$\sigma(2.46) = 0.92$$

That is the complete math behind the feed-forward process, where the inputs from the input layer traverse the entire depth of the neural network. In this example there is only one hidden layer; whether there is one hidden layer or twenty, the computational process is the same for all hidden layers. The whole pass can be written in a few lines of NumPy, as shown below.
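A minimal NumPy sketch of the worked example above (variable names and matrix layout are mine; the numbers come from the text):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The two linear models of the hidden layer, written as one weight matrix
# (one column per model) plus a bias vector, as in the matrices above.
x  = np.array([2.0, 2.0])                 # the point (2, 2)
W1 = np.array([[-4.0, -1/5],
               [-1.0, -1.0]])             # first-layer weights
b1 = np.array([12.0, 3.0])                # first-layer biases

hidden = sigmoid(x @ W1 + b1)             # scores [2, 0.6] -> [0.88, 0.64]

W2 = np.array([1.5, 1.0])                 # second-layer weights
b2 = 0.5

output = sigmoid(hidden @ W2 + b2)        # score ~2.46 -> ~0.92
print(hidden, output)
```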
Backpropagation Process in Deep Neural Network :-
Backpropagation is an algorithm used to train neural networks by iteratively adjusting the network's weights and biases in order to minimize the loss function. A loss function (also known as a cost function or objective function) is a measure of how well the model's predictions match the true target values in the training data. The loss function quantifies the difference between the predicted output of the model and the actual output, providing a signal that guides the optimization process during training.

Consider a network with two inputs, two hidden neurons (H1, H2) and two output neurons (y1, y2), with the following values:

Input values: x1 = 0.05, x2 = 0.10
Weight values: w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30, w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55
Bias values: b1 = 0.35, b2 = 0.60
Target values: T1 = 0.01, T2 = 0.99

Now, we first calculate the values of H1 and H2 by a forward pass.

Forward Pass
To find the value of H1, we multiply the input values by the weights:

H1 = x1*w1 + x2*w2 + b1 = 0.05*0.15 + 0.10*0.20 + 0.35 = 0.3775

To calculate the final output of H1, we apply the sigmoid function:

H1_final = 1 / (1 + e^(-0.3775)) = 0.593269992

We calculate the value of H2 in the same way:

H2 = x1*w3 + x2*w4 + b1 = 0.05*0.25 + 0.10*0.30 + 0.35 = 0.3925
H2_final = 1 / (1 + e^(-0.3925)) = 0.596884378

Now we calculate the values of y1 and y2 in the same way, using the outputs of H1 and H2 as inputs:

y1 = H1_final*w5 + H2_final*w6 + b2 = 0.593269992*0.40 + 0.596884378*0.45 + 0.60 = 1.10590597
y1_final = 1 / (1 + e^(-1.10590597)) = 0.75136507

y2 = H1_final*w7 + H2_final*w8 + b2 = 0.593269992*0.50 + 0.596884378*0.55 + 0.60 = 1.2249214
y2_final = 1 / (1 + e^(-1.2249214)) = 0.772928465

Our target values are 0.01 and 0.99, so y1_final and y2_final do not match the targets T1 and T2. The total error is the sum of squared differences between targets and outputs:

$$E_{total} = \sum \tfrac{1}{2}(\text{target} - \text{output})^2$$

E_total = ½(0.01 - 0.75136507)² + ½(0.99 - 0.772928465)²
        = 0.274811084 + 0.023560026
        = 0.298371109

Now we backpropagate this error to update the weights using a backward pass.

Backward pass at the output layer
To update a weight, we calculate the error contribution of that weight by differentiating the total error with respect to it. We work backwards, so first consider the last weight, w5:

Error_w5 = ∂E_total/∂w5

E_total does not contain w5 directly, so we split the derivative into multiple terms with the chain rule:

$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial y_{1,final}} \times \frac{\partial y_{1,final}}{\partial y_1} \times \frac{\partial y_1}{\partial w_5}$$

We calculate each term one by one:

∂E_total/∂y1_final = -(T1 - y1_final) = -(0.01 - 0.75136507) = 0.74136507

Since y1_final = 1 / (1 + e^(-y1)), the derivative of the sigmoid is y1_final × (1 - y1_final):

∂y1_final/∂y1 = 0.75136507 × (1 - 0.75136507) = 0.186815602

And since y1 = H1_final*w5 + H2_final*w6 + b2:

∂y1/∂w5 = H1_final = 0.593269992

Putting these together:

∂E_total/∂w5 = 0.74136507 × 0.186815602 × 0.593269992 = 0.082167041

Now we calculate the updated weight w5_new (here η = learning rate = 0.5):

w5_new = w5 - η × ∂E_total/∂w5 = 0.40 - 0.5 × 0.082167041 = 0.35891648
In the same way, we calculate w6_new, w7_new and w8_new, which gives the following values:

w5_new = 0.35891648
w6_new = 0.408666186
w7_new = 0.511301270
w8_new = 0.561370121

Backward pass at the hidden layer
Now we backpropagate to the hidden layer and update the weights w1, w2, w3, and w4 as we did with w5, w6, w7, and w8. The error at w1 is:

Error_w1 = ∂E_total/∂w1

E_total does not contain w1 directly, so we again split the derivative with the chain rule:

$$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial H_{1,final}} \times \frac{\partial H_{1,final}}{\partial H_1} \times \frac{\partial H_1}{\partial w_1}$$

Because H1_final feeds into both outputs, E_total = E1 + E2, and so

$$\frac{\partial E_{total}}{\partial H_{1,final}} = \frac{\partial E_1}{\partial H_{1,final}} + \frac{\partial E_2}{\partial H_{1,final}}$$

For the first output:

∂E1/∂y1_final × ∂y1_final/∂y1 = 0.74136507 × 0.186815602 = 0.138498562
∂y1/∂H1_final = w5 = 0.40
∂E1/∂H1_final = 0.138498562 × 0.40 = 0.055399425

For the second output:

∂E2/∂y2_final = -(T2 - y2_final) = -(0.99 - 0.772928465) = -0.217071535
∂y2_final/∂y2 = y2_final × (1 - y2_final) = 0.772928465 × (1 - 0.772928465) = 0.175510053
∂E2/∂y2_final × ∂y2_final/∂y2 = -0.217071535 × 0.175510053 = -0.038098237
∂y2/∂H1_final = w7 = 0.50
∂E2/∂H1_final = -0.038098237 × 0.50 = -0.019049118

Adding the two contributions:

∂E_total/∂H1_final = 0.055399425 + (-0.019049118) = 0.036350307

Next, since H1_final = 1 / (1 + e^(-H1)):

∂H1_final/∂H1 = H1_final × (1 - H1_final) = 0.593269992 × (1 - 0.593269992) = 0.241300709

And since H1 = x1*w1 + x2*w2 + b1, the partial derivative of the net input to H1 with respect to w1 is computed the same as for the output neuron:

∂H1/∂w1 = x1 = 0.05

Putting it all together:

Error_w1 = ∂E_total/∂w1 = 0.036350307 × 0.241300709 × 0.05 = 0.000438568

Now we calculate the updated weight w1_new
with the help of the following formula (again with learning rate η = 0.5):

w1_new = w1 - η × ∂E_total/∂w1 = 0.15 - 0.5 × 0.000438568 = 0.149780716

In the same way, we calculate w2_new, w3_new and w4_new; for example, w4_new = 0.29950229. With that, all the weights have been updated. We found an error of 0.298371109 on the network when we fed forward the inputs 0.05 and 0.10. After the first round of backpropagation, the total error is down to 0.291027924. After repeating this process 10,000 times, the total error is down to 0.0000351085. At this point the output neurons generate 0.015912196 and 0.984065734, i.e. values near our targets, when we feed forward the inputs 0.05 and 0.10. The full example can be reproduced in a few lines of NumPy, as sketched below.
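A compact NumPy sketch of this worked example (matrix layout and variable names are mine; biases are left fixed, as in the walkthrough above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x  = np.array([0.05, 0.10])               # inputs x1, x2
t  = np.array([0.01, 0.99])               # targets T1, T2
W1 = np.array([[0.15, 0.25],              # columns: [w1, w3]
               [0.20, 0.30]])             #          [w2, w4]
W2 = np.array([[0.40, 0.50],              # columns: [w5, w7]
               [0.45, 0.55]])             #          [w6, w8]
b1, b2, lr = 0.35, 0.60, 0.5

for step in range(10000):
    h = sigmoid(x @ W1 + b1)              # [0.59326999, 0.59688438] on the first pass
    y = sigmoid(h @ W2 + b2)              # [0.75136507, 0.77292847] on the first pass

    delta_out = (y - t) * y * (1 - y)     # dE/dnet at the output layer
    delta_hid = (delta_out @ W2.T) * h * (1 - h)

    W2 -= lr * np.outer(h, delta_out)     # w5 becomes 0.35891648 on the first step
    W1 -= lr * np.outer(x, delta_hid)     # w1 becomes 0.14978072 on the first step

print(y)                                  # ~[0.0159, 0.9841], close to the targets
```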
Convolutional Neural Networks (CNNs) :-
A Convolutional Neural Network (CNN) is a type of deep learning neural network architecture commonly used in computer vision. Computer vision is a field of Artificial Intelligence that enables a computer to understand and interpret images and other visual data. When it comes to machine learning, artificial neural networks perform really well; the CNN is an extended version of the artificial neural network (ANN) that is predominantly used to extract features from grid-like matrix datasets, for example visual datasets like images or videos, where data patterns play an extensive role.

CNN Architecture :-
A Convolutional Neural Network consists of multiple layers: the input layer, convolutional layers, pooling layers, and fully connected layers.
[Figure: Input layer -> Convolutional layer -> Max pooling layer -> Dense layer -> Output layer]
A complete Convolutional Neural Network architecture is also known as a covnet. A covnet is a sequence of layers, and every layer transforms one volume to another through a differentiable function.

Key components of a Convolutional Neural Network include:
1. Convolutional Layers: These layers apply convolutional operations to input images, using filters (also known as kernels) to detect features such as edges, textures, and more complex patterns. Convolutional operations help preserve the spatial relationships between pixels.
2. Pooling Layers: Pooling layers downsample the spatial dimensions of the input, reducing the computational complexity and the number of parameters in the network. Max pooling is a common pooling operation, selecting the maximum value from a group of neighboring pixels.
3. Activation Functions: Non-linear activation functions, such as the Rectified Linear Unit (ReLU), introduce non-linearity to the model, allowing it to learn more complex relationships in the data.
4. Fully Connected Layers: These layers are responsible for making predictions based on the high-level features learned by the previous layers. They connect every neuron in one layer to every neuron in the next layer.

Applications of CNN
Convolutional Neural Networks (CNNs) are a specialized type of neural network designed for image processing and computer vision tasks. Here are some applications of CNNs in more detail:
- Image Classification: One of the most common applications of CNNs is image classification, where the task is to assign a label or category to an input image. CNNs can learn to recognize patterns and features in the input image and use them to make accurate predictions.
- Object Detection: Object detection is the task of identifying and localizing objects in an image. CNNs can be used to detect the presence and location of objects in an image, which is useful in applications such as self-driving cars, surveillance systems, and robotics.
- Facial Recognition: CNNs can be used for facial recognition, which involves identifying and verifying the identity of a person from a digital image or video. This application is used in security systems, law enforcement, and social media.
- Medical Image Analysis: CNNs can be used to analyze medical images such as X-rays, CT scans, and MRI scans. This can aid in the diagnosis of diseases and conditions, as well as the development of personalized treatment plans.
- Natural Language Processing: While CNNs are primarily used in image processing, they can also be used in natural language processing tasks such as text classification and sentiment analysis. CNNs can learn patterns and features in text data, which can be used to classify or analyze text.
- Video Analysis: CNNs can also be used for video analysis, such as detecting and tracking objects in video streams or recognizing actions in videos.

CNNs are widely used in applications related to image processing, computer vision, and natural language processing. They are especially useful in tasks that involve recognizing patterns and features in complex data such as images and videos. There are many popular tools and frameworks for developing CNNs, including:
- TensorFlow: An open-source software library for deep learning developed by Google.
- PyTorch: An open-source deep learning framework developed by Facebook.
- MXNet: An open-source deep learning framework developed by Apache.
- Keras: A high-level deep learning API for Python that can be used with TensorFlow, PyTorch, or MXNet.

Why not a plain ANN for images?
[Figure: The same digit 9 shifted in location and varied in shape; a plain ANN must handle every variation.]
To handle such variety in digits we could use a simple artificial neural network (ANN), but the numbers quickly explode:

Image size = 1920 × 1080 × 3
First layer neurons = 1920 × 1080 × 3 ≈ 6 million (one per pixel value)
Hidden layer neurons = let's say you keep it at ~4 million
Weights between input and hidden layer = 6 million × 4 million ≈ 24 × 10¹²

Disadvantages of using an ANN for image classification (the arithmetic is sketched in code after this list):
1. Too much computation
2. Treats local pixels the same as pixels far apart
3. Sensitive to the location of an object in the image
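A back-of-the-envelope check of the figures quoted above (the hidden-layer size is the text's "let's say" assumption):

```python
# Cost of a fully connected first layer on a Full-HD RGB image.
input_neurons  = 1920 * 1080 * 3          # ~6.2 million (one per RGB value)
hidden_neurons = 4_000_000                # "let's say you keep it ~4 million"

weights = input_neurons * hidden_neurons
print(f"{weights:.1e} weights")           # ~2.5e13 -> tens of trillions of parameters
```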
[Figure: Hand-crafted filters as feature detectors: a "loopy pattern" filter, a diagonal-line filter, and a vertical-line filter. Sliding a filter over the image and averaging the element-wise products, e.g. (-1+1+1-1-1-1-1+1+1)/9 = -1/9, yields feature-map values such as 0.55, 0.11, -0.33; regions that match the filter score high.]

Filters are nothing but feature detectors. Convolving each filter with the image produces a feature map that lights up wherever the pattern occurs; the network stacks such maps into a pipeline of feature extraction (convolution + ReLU + pooling) followed by classification (dense layers). A code sketch of both core operations follows below.

Pooling layer
The pooling layer is used to reduce the size of the feature maps. For example, a 2 × 2 max-pooling filter with stride = 2 keeps only the largest value in each 2 × 2 block, halving each spatial dimension. Even when the digit 9 is shifted to a different position, the pooled maps retain the strongest activations.

Benefits of pooling:
- Reduces dimensions and computation
- Reduces overfitting, as there are fewer parameters
- Makes the model tolerant towards small variations and distortions

In the overall pipeline, convolution detects simple parts (edges, loops, and later eyes, nose, ears, etc.), ReLU introduces non-linearity (and the resulting sparsity reduces overfitting and makes the model robust), pooling shrinks the feature maps, and the final dense layers perform the classification.

Note that a CNN by itself doesn't take care of rotation and scale: you need to have rotated and scaled samples in the training dataset, and if you don't have such samples, use data augmentation methods to generate new rotated/scaled samples from the existing training samples.
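A minimal NumPy sketch of the two operations described above (a hand-rolled valid convolution, which in CNN practice is cross-correlation, and 2 × 2 max pooling with stride 2; the image and filter are toy examples of mine):

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over every valid position and sum the products.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2, stride=2):
    # Keep only the largest value in each block.
    oh, ow = fmap.shape[0] // stride, fmap.shape[1] // stride
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = fmap[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

image = np.array([[0, 1, 1, 0, 0],
                  [0, 1, 1, 0, 0],
                  [0, 1, 1, 0, 0],
                  [0, 1, 1, 0, 0],
                  [0, 1, 1, 0, 0]], dtype=float)   # a vertical bar
vertical_edge = np.array([[1.0, -1.0],
                          [1.0, -1.0]])            # a tiny vertical-edge detector

fmap = conv2d(image, vertical_edge)
print(fmap)            # strongest responses along the bar's edges
print(max_pool(fmap))  # half the size, keeping the strongest activations
```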
Different Types of CNN Models
1. LeNet
2. AlexNet
3. ResNet
4. GoogleNet
5. MobileNet
6. VGG

1. LeNet
LeNet is one of the earliest convolutional neural network (CNN) architectures, introduced by Yann LeCun in 1989. It was primarily designed for handwritten digit recognition (e.g., the MNIST dataset) and played a foundational role in the development of modern deep learning techniques.

LeNet-5 Architecture
LeNet-5 is the most well-known version of LeNet. Its architecture consists of 7 layers, including convolutional, pooling, and fully connected layers, excluding the input layer. Here's a breakdown:
1. Input Layer:
   - Accepts grayscale images of size 32 × 32.
   - If the input image is smaller (e.g., MNIST's 28 × 28), padding is applied to resize it.
2. Convolutional Layer (C1):
   - Applies 6 filters (kernels) of size 5 × 5 with a stride of 1.
   - Output feature maps: 6 × 28 × 28.
   - Activation function: Sigmoid or Tanh (used historically).
3. Pooling Layer (S2):
   - Sub-sampling through average pooling with a window size of 2 × 2 and a stride of 2.
   - Output feature maps: 6 × 14 × 14.
4. Convolutional Layer (C3):
   - Applies 16 filters of size 5 × 5.
   - Output feature maps: 16 × 10 × 10.
   - The filters are connected to subsets of the previous layer's feature maps, introducing sparsity.
5. Pooling Layer (S4):
   - Average pooling with a 2 × 2 window and stride of 2.
   - Output feature maps: 16 × 5 × 5.
6. Fully Connected Layer (C5):
   - Fully connected to all neurons in the previous layer.
   - Output size: 120 neurons.
7. Fully Connected Layer (F6):
   - Fully connected to the previous layer.
   - Output size: 84 neurons.
8. Output Layer:
   - Fully connected layer with 10 neurons (for the 10 classes in MNIST).
   - Activation: Softmax to output probabilities for classification.

Key Features of LeNet
1. Convolutional Layers:
   - Extract spatial features from input images by applying learnable filters.
2. Pooling Layers:
   - Reduce spatial dimensions, which helps computational efficiency and robustness to small translations.
3. Fully Connected Layers:
   - Perform high-level reasoning for classification based on extracted features.
4. Activation Functions:
   - Originally used Sigmoid or Tanh, but modern implementations often use ReLU for better gradient flow.
5. Simple Connections:
   - Filters are not fully connected to all input channels, which reduces parameters and computational complexity.

Advantages of LeNet
1. Efficient Feature Extraction:
   - Convolution and pooling layers enable efficient learning of spatial hierarchies.
2. Reduced Parameters:
   - By using shared weights in convolutional layers, LeNet significantly reduces the number of learnable parameters.
3. Scalability:
   - The architecture inspired modern CNNs like AlexNet, VGG, and ResNet.

Limitations of LeNet
1. Small Input Sizes:
   - Designed for small 32 × 32 inputs, limiting its application to larger, more complex datasets.
2. Limited Depth:
   - Shallow architecture compared to modern CNNs, which limits its ability to capture highly complex patterns.
3. Outdated Activation Functions:
   - Sigmoid and Tanh can lead to vanishing gradients, making training deeper networks challenging.

Applications
1. Handwritten Digit Recognition:
   - Initially used for recognizing digits in postal codes (e.g., the MNIST dataset).
2. Document Processing:
   - Applied in OCR (Optical Character Recognition) systems.

2. AlexNet
AlexNet is a deep convolutional neural network architecture that revolutionized computer vision by winning the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. It was proposed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton and demonstrated the power of deep learning on large datasets using GPUs.

Key Contributions of AlexNet
1. Deep Convolutional Architecture:
   - Introduced depth and complexity, with 8 layers (5 convolutional layers and 3 fully connected layers).
2. ReLU Activation:
   - Replaced Sigmoid/Tanh with ReLU (Rectified Linear Unit), enabling faster training.
3. Dropout Regularization:
   - Used to prevent overfitting in the fully connected layers.
4. GPU Utilization:
   - Leveraged GPUs for parallel processing, accelerating training on large datasets.
5. Data Augmentation:
   - Applied techniques like random cropping, flipping, and image jittering to increase dataset size and reduce overfitting.
6. Overlapping Pooling:
   - Used overlapping max-pooling instead of average pooling to improve feature representation.

AlexNet Architecture
AlexNet processes images of size 227 × 227 × 3 and outputs predictions for 1000 classes. Here's the detailed breakdown:
1. Input Layer: Accepts RGB images of size 227 × 227 × 3.
2. Convolutional Layer 1 (Conv1): 96 filters, kernel size 11 × 11, stride 4, ReLU. Output: 55 × 55 × 96.
3. Max Pooling 1 (Pool1): pool size 3 × 3, stride 2. Output: 27 × 27 × 96.
4. Convolutional Layer 2 (Conv2): 256 filters, kernel size 5 × 5, stride 1, ReLU. Output: 27 × 27 × 256.
5. Max Pooling 2 (Pool2): pool size 3 × 3, stride 2. Output: 13 × 13 × 256.
6. Convolutional Layer 3 (Conv3): 384 filters, kernel size 3 × 3, stride 1, ReLU. Output: 13 × 13 × 384.
7. Convolutional Layer 4 (Conv4): 384 filters, kernel size 3 × 3, stride 1, ReLU. Output: 13 × 13 × 384.
8. Convolutional Layer 5 (Conv5): 256 filters, kernel size 3 × 3, stride 1, ReLU. Output: 13 × 13 × 256.
9. Max Pooling 3 (Pool3): pool size 3 × 3, stride 2. Output: 6 × 6 × 256.
10. Fully Connected Layer 1 (FC6): 4096 neurons, ReLU, dropout applied.
11. Fully Connected Layer 2 (FC7): 4096 neurons, ReLU, dropout applied.
12. Fully Connected Layer 3 (FC8): 1000 neurons (the number of classes in ImageNet), Softmax activation.
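The layer sizes above can be sanity-checked with the standard output-size formula (the padding of 2 for Conv2 is my assumption of AlexNet's usual setting; it is not stated in the list above):

```python
# out = (in - kernel + 2*padding) // stride + 1
def conv_out(size, kernel, stride=1, padding=0):
    return (size - kernel + 2 * padding) // stride + 1

print(conv_out(227, 11, stride=4))   # Conv1 -> 55
print(conv_out(55, 3, stride=2))     # Pool1 -> 27
print(conv_out(27, 5, padding=2))    # Conv2 -> 27 (assumes padding 2)
print(conv_out(27, 3, stride=2))     # Pool2 -> 13
```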
Key Innovations
1. Local Response Normalization (LRN):
   - Applied after ReLU activations to enhance generalization.
2. Overlapping Pooling:
   - Reduced dimensions while maintaining more spatial information.
3. Dropout:
   - Regularized the model by randomly setting a fraction of neurons to zero during training.
4. GPU Training:
   - Trained using two GPUs, with layers split across them.

Advantages of AlexNet
1. Breakthrough Accuracy:
   - Achieved a top-5 error rate of 15.3% on ImageNet, significantly better than previous models.
2. Scalable to Large Datasets:
   - Handled millions of images effectively.
3. Inspired Modern Architectures:
   - Paved the way for deeper networks like VGG, ResNet, and Inception.

Limitations of AlexNet
1. Computationally Intensive:
   - Requires significant hardware resources.
2. Fixed Input Size:
   - Only accepts 227 × 227 inputs.
3. Manual Design:
   - Lacks the automation seen in later architectures like NASNet.

3. ResNet (Residual Networks): Revolutionizing Deep Learning
ResNet, introduced in 2015 by Kaiming He et al., is a groundbreaking deep convolutional neural network architecture that addressed the vanishing gradient problem and enabled the training of extremely deep networks. It won the ILSVRC 2015 competition, achieving remarkable accuracy on the ImageNet dataset.

Key Idea: Residual Learning
The core innovation of ResNet is the introduction of residual connections, which create shortcut paths that allow gradients to flow directly through layers. This addresses the degradation problem, where adding more layers leads to increased training error due to difficulties in optimizing very deep models.

Residual Block:
A residual block is the building block of ResNet. Instead of learning a full mapping H(x), it learns a residual mapping F(x), where:

$$H(x) = F(x) + x$$

Here:
- F(x): the residual function (the output of the block's convolutional layers)
- x: the input (shortcut connection)

ResNet Architecture Variants
ResNet comes in several versions based on depth:
1. ResNet-18: 18 layers
2. ResNet-34: 34 layers
3. ResNet-50: 50 layers
4. ResNet-101: 101 layers
5. ResNet-152: 152 layers

Structure of ResNet
1. Convolutional Layer:
   - 7 × 7 convolution with stride 2.
   - Followed by a 3 × 3 max-pooling layer.
2. Residual Blocks:
   - Each block consists of stacked convolutional layers with shortcut connections.
3. Fully Connected Layer:
   - After global average pooling, a fully connected layer outputs class probabilities.

ResNet-50 Detailed Architecture
ResNet-50 is a deeper version that uses bottleneck blocks to reduce computational complexity. Each bottleneck block has three convolutional layers:
1. 1 × 1: Reduces dimensions.
2. 3 × 3: Extracts features.
3. 1 × 1: Restores dimensions.

Layer Configuration:

Layer Name         | Output Size | Layers
Conv1              | 112 × 112   | 7 × 7, 64, stride 2
MaxPool            | 56 × 56     | 3 × 3, stride 2
Conv2_x            | 56 × 56     | [1 × 1, 64; 3 × 3, 64; 1 × 1, 256]
Conv3_x            | 28 × 28     | [1 × 1, 128; 3 × 3, 128; 1 × 1, 512]
Conv4_x            | 14 × 14     | [1 × 1, 256; 3 × 3, 256; 1 × 1, 1024]
Conv5_x            | 7 × 7       | [1 × 1, 512; 3 × 3, 512; 1 × 1, 2048]
Global Avg Pooling | 1 × 1       | -
FC                 | 1000        | Fully Connected

Advantages of ResNet
1. Mitigates Vanishing Gradients:
   - Shortcut connections ensure gradients flow through the network without diminishing.
2. Supports Very Deep Networks:
   - Enables training of networks with 100+ layers.
3. Improved Accuracy:
   - Achieves state-of-the-art performance on image classification tasks.
4. Modularity:
   - Residual blocks can be stacked to create deeper architectures.
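A toy sketch of the residual idea on plain vectors (the shapes, random weights, and two matrix "layers" standing in for convolutions are my illustration, not ResNet's actual layers):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    f = relu(x @ W1)          # first transformation
    f = f @ W2                # second transformation (no activation yet)
    return relu(f + x)        # add the identity shortcut: H(x) = F(x) + x

rng = np.random.default_rng(0)
x  = rng.normal(size=4)
W1 = rng.normal(size=(4, 4)) * 0.1
W2 = rng.normal(size=(4, 4)) * 0.1

print(residual_block(x, W1, W2))
# Even if W1 and W2 are near zero, the block passes x through almost
# unchanged, which is why gradients survive in very deep stacks.
```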
5. MobileNet
MobileNet, introduced by Andrew G. Howard et al. in 2017, is a convolutional neural network designed for mobile and embedded vision applications. It focuses on efficiency, enabling deployment on devices with limited computational power.

Key Innovations in MobileNet
1. Depthwise Separable Convolutions:
   - Replaces standard convolution with two steps:
     - Depthwise Convolution: Applies a single filter to each input channel.
     - Pointwise Convolution: Applies a 1 × 1 convolution to combine features across channels.
   - Reduces computation by a factor of roughly 1/N + 1/D², where N is the number of output channels and D is the kernel size (a code check of this factor follows below).
2. Width Multiplier (α):
   - Scales the number of channels in each layer to trade off accuracy for efficiency.
   - α ∈ (0, 1].
3. Resolution Multiplier (ρ):
   - Scales the input image resolution to reduce computational cost.
   - ρ ∈ (0, 1].

MobileNet Architecture
MobileNet processes images of size 224 × 224 by default. Here's the structure:

Type                                | Output Size | Filters | Kernel Size | Stride
Input                               | 224 × 224   | -       | -           | -
Conv + BatchNorm + ReLU6            | 112 × 112   | 32      | 3 × 3       | 2
Depthwise Conv + BN + ReLU6         | 112 × 112   | 32      | 3 × 3       | 1
Pointwise Conv + BN + ReLU6         | 112 × 112   | 64      | 1 × 1       | 1
Depthwise Conv + BN + ReLU6         | 56 × 56     | 64      | 3 × 3       | 2
Pointwise Conv + BN + ReLU6         | 56 × 56     | 128     | 1 × 1       | 1
Depthwise + Pointwise Conv (repeat) | -           | -       | -           | -
Avg Pooling                         | 1 × 1       | -       | -           | -
Fully Connected                     | 1 × 1       | 1000    | -           | -

Advantages of MobileNet
1. Efficiency:
   - Lower computational cost due to depthwise separable convolutions.
2. Flexibility:
   - Adjustable width and resolution multipliers allow customization for specific hardware constraints.
3. Performance:
   - Competitive accuracy on benchmarks like ImageNet with significantly fewer parameters.

Variants of MobileNet
1. MobileNetV1:
   - Initial version with depthwise separable convolutions.
2. MobileNetV2:
   - Adds inverted residuals with linear bottlenecks for better performance.
3. MobileNetV3:
   - Combines techniques like squeeze-and-excitation (SE) blocks and NAS (Neural Architecture Search) for further optimization.
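A quick numeric check of the 1/N + 1/D² reduction quoted above (the layer sizes D, M, N, F are example values of mine):

```python
# Multiply-add counts for one convolutional layer on an F x F feature map
# with M input channels, N output channels, and a D x D kernel.
D, M, N, F = 3, 32, 64, 56

standard  = D * D * M * N * F * F              # standard convolution
separable = D * D * M * F * F + M * N * F * F  # depthwise + 1x1 pointwise

print(separable / standard)          # ~0.127
print(1 / N + 1 / D**2)              # ~0.127, the quoted approximation
```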
6. VGG
The VGG network, introduced by Karen Simonyan and Andrew Zisserman in 2014, is a deep convolutional neural network architecture known for its simplicity and effectiveness. It was a top performer in the ILSVRC 2014 competition, achieving high accuracy on the ImageNet dataset.

Key Features of VGG
1. Simplicity:
   - Uses only 3 × 3 convolutional layers stacked in increasing depth.
   - Avoids complex designs like inception modules or skip connections.
2. Deep Architecture:
   - Depth varies across VGG variants (e.g., VGG-11, VGG-16, VGG-19), with up to 19 layers.
3. Uniform Design:
   - Convolutional layers use the same kernel size, stride, and padding throughout the network.
4. Fully Connected Layers:
   - Concludes with one or more fully connected layers, followed by a softmax layer for classification.

VGG Architecture
VGG processes images of size 224 × 224 and increases the number of filters as the depth increases.

VGG-16 Detailed Architecture:

Layer Type        | Output Size | Configuration
Input             | 224 × 224   | -
Conv Block 1      | 224 × 224   | 64: 3 × 3, ReLU (×2)
Max Pooling       | 112 × 112   | 2 × 2, stride 2
Conv Block 2      | 112 × 112   | 128: 3 × 3, ReLU (×2)
Max Pooling       | 56 × 56     | 2 × 2, stride 2
Conv Block 3      | 56 × 56     | 256: 3 × 3, ReLU (×3)
Max Pooling       | 28 × 28     | 2 × 2, stride 2
Conv Block 4      | 28 × 28     | 512: 3 × 3, ReLU (×3)
Max Pooling       | 14 × 14     | 2 × 2, stride 2
Conv Block 5      | 14 × 14     | 512: 3 × 3, ReLU (×3)
Max Pooling       | 7 × 7       | 2 × 2, stride 2
Fully Connected 1 | 4096        | ReLU
Fully Connected 2 | 4096        | ReLU
Fully Connected 3 | 1000        | Softmax

Variants of VGG
1. VGG-11:
   - 11 layers (8 convolutional + 3 fully connected).
2. VGG-16:
   - 16 layers (13 convolutional + 3 fully connected).
   - Most widely used variant.
3. VGG-19:
   - 19 layers (16 convolutional + 3 fully connected).

Recurrent Neural Network (RNN) :-
A recurrent neural network, or RNN, is a deep neural network trained on sequential or time series data to create a machine learning (ML) model that can make sequential predictions or conclusions based on sequential inputs. RNNs can be used to solve ordinal or temporal problems such as language translation, natural language processing (NLP), sentiment analysis, speech recognition and image captioning.

How Recurrent Neural Networks Work :-
A Recurrent Neural Network (RNN) is a type of neural network where the output from the previous step is fed as input to the current step. In traditional neural networks, all the inputs and outputs are independent of each other. But in cases where it is required to predict the next word of a sentence, the previous words are needed, and hence there is a need to remember them. Thus the RNN came into existence, which solved this issue with the help of a hidden layer. The main and most important feature of an RNN is its hidden state, which remembers some information about a sequence. The state is also referred to as the memory state, since it remembers the previous input to the network. The RNN uses the same parameters for each input, as it performs the same task on all the inputs or hidden layers to produce the output. This reduces the complexity of parameters, unlike other neural networks.
[Figure: A recurrent neural network, with the hidden state looping back into itself.]
1. Input Layer:
   - Takes in the sequence data at each time step. This could be a series of words in a sentence or values in a time series.
2. Hidden Layer(s):
   - Contains neurons with recurrent connections, allowing information from previous time steps to influence the current state.
   - The hidden state is updated using the current input and the previous hidden state:

$$h_t = f(U x_t + W h_{t-1} + b)$$

   where U, W, and b are the weights and bias, $x_t$ is the input at time t, and $h_{t-1}$ is the previous hidden state.
3. Output Layer:
   - Produces the final output for each time step, which could be used immediately or fed into the next stage of processing.
   - The output is typically calculated as:

$$o_t = g(V h_t + c)$$

   where V and c are the weights and bias for the output layer.

Recurrent Nature
The key feature of RNNs is the recurrence within the hidden layers. Information is not only passed from the input to the output layer but also looped back within the network, allowing it to maintain a form of memory. A single time step can be written in a few lines of code, as shown below.
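A minimal NumPy sketch of one RNN time step, following the equations above with tanh as f and the identity as g (dimensions and random weights are illustrative assumptions):

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, V, b, c):
    # h_t = tanh(U x_t + W h_{t-1} + b);  o_t = V h_t + c
    h_t = np.tanh(U @ x_t + W @ h_prev + b)
    o_t = V @ h_t + c
    return h_t, o_t

rng = np.random.default_rng(1)
input_dim, hidden_dim, output_dim = 3, 5, 2
U = rng.normal(size=(hidden_dim, input_dim)) * 0.1
W = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1   # the recurrent weights
V = rng.normal(size=(output_dim, hidden_dim)) * 0.1
b, c = np.zeros(hidden_dim), np.zeros(output_dim)

h = np.zeros(hidden_dim)                 # initial hidden state
sequence = rng.normal(size=(4, input_dim))
for x_t in sequence:                     # the same U, W, V are reused at every step
    h, o = rnn_step(x_t, h, U, W, V, b, c)
    print(o)
```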
Variants
1. LSTM (Long Short-Term Memory):
   - Contains special units called memory cells that can maintain information over long periods.
   - Includes gates (input, forget, and output gates) to control the flow of information.
2. GRU (Gated Recurrent Unit):
   - A simpler alternative to LSTM that combines the forget and input gates into a single update gate.

How does an RNN differ from a Feedforward Neural Network?
Artificial neural networks that do not have looping nodes are called feedforward neural networks. Because all information is only passed forward, this kind of neural network is also referred to as a multi-layer neural network. Information moves from the input layer to the output layer, through any hidden layers that are present, unidirectionally in a feedforward neural network. These networks are appropriate for tasks such as image classification, where input and output are independent. Nevertheless, their inability to retain previous inputs automatically renders them less useful for sequential data analysis.
[Figure: (a) A recurrent neural network with feedback loops; (b) a feed-forward neural network without them.]

Recurrent Neuron and RNN Unfolding
The fundamental processing unit in a Recurrent Neural Network (RNN) is a recurrent unit, which is not explicitly called a "recurrent neuron." This unit has the unique ability to maintain a hidden state, allowing the network to capture sequential dependencies by remembering previous inputs while processing. Long Short-Term Memory (LSTM) and its variants improve the RNN's ability to handle long-term dependencies.
[Figure: A recurrent neuron, and the same network unfolded over time with the weight matrix W shared across all steps.]

Types of RNN
There are four types of RNNs, based on the number of inputs and outputs in the network: one to one, one to many, many to one, and many to many.

One to One
This type of RNN behaves the same as any simple neural network; it is also known as a vanilla neural network. In this neural network, there is only one input and one output.

One to Many
In this type of RNN, there is one input and many outputs associated with it. One of the most used examples of this network is image captioning, where, given an image, we predict a sentence having multiple words.

Many to One
In this type of network, many inputs are fed to the network at several states of the network, generating only one output. This type of network is used for problems like sentiment analysis, where we give multiple words as input and predict only the sentiment of the sentence as output.

Many to Many
In this type of neural network, there are multiple inputs and multiple outputs corresponding to a problem. One example of this problem is language translation: we provide multiple words from one language as input and predict multiple words from the second language as output.

Key Differences Between CNN and RNN
- CNN is applicable to sparse data like images. RNN is applicable to time series and sequential data.
- While training the model, CNN uses simple backpropagation, whereas RNN uses backpropagation through time to calculate the loss.
- RNN has no restriction on the length of inputs and outputs, but CNN has finite inputs and finite outputs.
- CNN is a feedforward network, while RNN works with loops to handle sequential data.
- CNN can also be used for video and image processing. RNN is primarily used for speech and text analysis.
