Their primary role is to transform the summed weighted input from a neuron into an
output value. This output is then fed to the next layer of neurons or becomes the final output
of the network. Without them, a neural network would just be a linear regression model, no
matter how many layers it had, and couldn't learn complex patterns.
Let's explore different types of activation functions:
1. Binary Step Function
Imagine a light switch: it's either ON or OFF.
How it works: This function is like a simple "if-else" statement. You set a
"threshold" value. If the input to the neuron is greater than this threshold, the neuron
"activates" (outputs 1). Otherwise, it "deactivates" (outputs 0).
Equation:
o f(x)=1 if x≥threshold
o f(x)=0 if x<threshold
Example: Let's say your threshold is 0.5.
o If the input is 0.7, output is 1.
o If the input is 0.3, output is 0.
When to use: It's mainly used in binary classification problems (e.g., spam or not
spam) where you need a clear 0 or 1 output.
Limitations:
o Cannot handle multi-class classification: If you have more than two
categories (e.g., cat, dog, bird), this function won't work.
o No gradient: It's not differentiable, which means you can't use it with
gradient-based optimization algorithms (like backpropagation) that are crucial
for training neural networks. This is its biggest drawback.
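To make the thresholding behavior concrete, here is a minimal NumPy sketch; the function name binary_step and the 0.5 default threshold are illustrative choices, not part of any particular library.

import numpy as np

def binary_step(x, threshold=0.5):
    # Output 1 where the input meets or exceeds the threshold, 0 otherwise.
    return np.where(x >= threshold, 1, 0)

print(binary_step(np.array([0.7, 0.3])))  # -> [1 0]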
2. Linear Activation Function
Think of a direct pipeline where what goes in, comes straight out, perhaps scaled.
How it works: This function simply outputs a value that is directly proportional to the
input. It doesn't really "activate" anything in a non-linear way. It's also called "no
activation" or "identity function."
Equation: y=Wx+b (where W is weight, x is input, b is bias). In the context of an
activation function, it simplifies to f(x)=x or f(x)=cx (where c is a constant).
Example:
o If the input is 5, output is 5 (if f(x)=x).
o If the input is 2 and f(x)=2x, output is 4.
When to use: While simple, it's not commonly used as an activation function in
hidden layers of deep neural networks because stacking multiple linear layers still
results in a single linear transformation. It might be used in the output layer for
regression problems where you need to predict a continuous value.
Limitations:
o Cannot learn complex patterns: Since it's linear, it can only model linear
relationships. Neural networks need non-linear activation functions to learn
from complex, real-world data.
o Vanishing/Exploding Gradients: In deep networks, gradients can shrink
toward zero or grow uncontrollably large during training, making learning very
difficult.
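For completeness, a minimal NumPy sketch of the identity-style behavior described above; the function name linear and the scaling constant c are purely illustrative.

import numpy as np

def linear(x, c=1.0):
    # "No activation": the output is just a (possibly scaled) copy of the input.
    return c * x

print(linear(np.array([5.0])))         # -> [5.]
print(linear(np.array([2.0]), c=2.0))  # -> [4.]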
3. Non-Linear Activation Functions
These are the real stars of neural networks! They introduce non-linearity, allowing networks
to learn complex relationships and patterns in data.
a) Sigmoid / Logistic Activation Function
Imagine a smooth "S" curve that squishes any input value into a range between 0 and 1. This
is great for probabilities.
How it works: Takes any real-valued input and transforms it into an output between 0
and 1. Larger positive inputs get closer to 1, while larger negative inputs get closer to
0.
Equation: f(x) = 1 / (1 + e^(-x))
Example:
o If input is very large positive (e.g., 100), output is very close to 1 (e.g.,
0.999...).
o If input is 0, output is 0.5.
o If input is very large negative (e.g., -100), output is very close to 0 (e.g.,
0.000...).
When to use:
o Output layer for binary classification: Since its output is between 0 and 1,
it's perfect for predicting probabilities (e.g., the probability of an email being
spam).
o Historically popular in hidden layers: While less common now due to
limitations, it was widely used in hidden layers.
Advantages:
o Smooth gradient: It's differentiable everywhere, providing a smooth gradient
for backpropagation.
o Probabilistic output: Maps outputs to probabilities, which is intuitive for
many problems.
Limitations:
o Vanishing Gradient Problem: The most significant issue. As inputs move
further away from 0 (either very positive or very negative), the gradient of the
function becomes very small, almost flat. This means during backpropagation,
the gradients become tiny, and the network learns very slowly or stops
learning altogether in earlier layers. Imagine trying to learn by taking tiny, tiny
steps.
o Not Zero-centered: The output is always positive (between 0 and 1). This can
lead to issues with gradient updates where all gradients are either positive or
negative, causing a "zigzagging" effect during optimization.
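A minimal NumPy sketch of the sigmoid as defined above (the function name is illustrative; deep learning frameworks ship their own implementations):

import numpy as np

def sigmoid(x):
    # Squash any real value into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-100.0, 0.0, 100.0])))  # -> approximately [0., 0.5, 1.]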
b) Tanh Function (Hyperbolic Tangent)
Similar to Sigmoid, but "zero-centered," meaning its output ranges from -1 to 1.
How it works: Like the sigmoid, it has an S-shape, but its output values range from -
1 to 1. Larger positive inputs are closer to 1, and larger negative inputs are closer to -
1.
Equation: f(x) = (e^x − e^(-x)) / (e^x + e^(-x))
Example:
o If input is very large positive, output is very close to 1.
o If input is 0, output is 0.
o If input is very large negative, output is very close to -1.
When to use: Often preferred over sigmoid in hidden layers because of its zero-
centered output. This helps with the optimization process.
Advantages:
o Zero-centered output: This addresses one of the sigmoid's limitations,
making training generally more stable and faster. It means the mean of the
hidden layer outputs is closer to zero, which helps in centering the data for the
next layer.
o Stronger gradients near the origin: Compared to sigmoid, it has a larger
range of significant gradients.
Limitations:
o Still suffers from Vanishing Gradient Problem: Although better than
sigmoid, it still faces the vanishing gradient issue for very large or very small
inputs, as its ends are also flat.
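A minimal sketch using NumPy's built-in np.tanh; the example inputs are just for illustration.

import numpy as np

def tanh(x):
    # Zero-centered S-curve, equivalent to (e^x - e^(-x)) / (e^x + e^(-x)).
    return np.tanh(x)

print(tanh(np.array([-100.0, 0.0, 100.0])))  # -> [-1.  0.  1.]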
c) ReLU Activation Function (Rectified Linear Unit)
The most popular choice for hidden layers in deep learning today. It's simple and efficient.
How it works: It's straightforward: if the input is positive, it outputs the input
directly. If the input is negative, it outputs zero.
Equation: f(x)=max(0,x)
Example:
o If input is 5, output is 5.
o If input is -2, output is 0.
When to use: Widely used in hidden layers of deep neural networks for tasks like
image classification, natural language processing, etc.
Advantages:
o Solves Vanishing Gradient Problem (for positive inputs): For positive
inputs, the gradient is always 1, preventing the vanishing gradient issue.
o Computationally efficient: Simple calculation (just a comparison and a
return). This makes training much faster.
o Sparsity: It introduces sparsity in the network by setting negative activations
to zero. This can lead to more efficient representations.
Limitations:
o Dying ReLU Problem: This is its main drawback. If a neuron consistently
receives negative inputs, its output will always be zero, and its gradient will
also be zero. This means the neuron will stop learning and effectively "die,"
never activating again. Imagine a light that's permanently off.
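A minimal NumPy sketch of ReLU (the function name is illustrative; frameworks provide their own versions):

import numpy as np

def relu(x):
    # Positive inputs pass through unchanged; negative inputs are clamped to zero.
    return np.maximum(0, x)

print(relu(np.array([5.0, -2.0])))  # -> [5. 0.]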
d) Leaky ReLU Function
An improvement over ReLU to address the "Dying ReLU" problem.
How it works: Instead of setting negative inputs to exactly zero, Leaky ReLU allows
a small, non-zero slope for negative inputs (e.g., 0.01 * x). This small slope ensures
that the neuron can still learn, even if the input is negative.
Equation: f(x)=max(0.01x,x)
Example:
o If input is 5, output is 5.
o If input is -2, output is 0.01 × (−2) = −0.02.
When to use: When you suspect your ReLU neurons are "dying" and you want to
ensure some gradient flow for negative inputs.
Advantages:
o Addresses Dying ReLU: By allowing a small gradient for negative values, it
prevents neurons from becoming permanently inactive.
o Computationally efficient: Still very fast to compute.
o Allows negative values during backpropagation: Ensures that information
flows even for negative inputs.
Limitations:
o The "leak" (the small slope) is a fixed hyperparameter (e.g., 0.01). While
better than 0, it might not be optimal for all situations.
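A minimal NumPy sketch, assuming the commonly used 0.01 slope as the default for the leak (called alpha here):

import numpy as np

def leaky_relu(x, alpha=0.01):
    # Negative inputs keep a small slope (alpha) instead of being zeroed out.
    return np.where(x >= 0, x, alpha * x)

print(leaky_relu(np.array([5.0, -2.0])))  # -> [ 5.   -0.02]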
e) Parameterized ReLU (PReLU)
An even more flexible version of Leaky ReLU.
How it works: Instead of fixing the small slope for negative inputs (like 0.01 in
Leaky ReLU), PReLU makes this slope 'a' a learnable parameter. The network itself
learns the optimal value for 'a' during training.
Equation:
o f(x)=x if x≥0
o f(x)=ax if x<0 where 'a' is a trainable parameter.
Example:
o If input is 5, output is 5.
o If input is -2, output is a × (−2) = −2a. The value of 'a' will be learned by the network.
When to use: When you want the network to learn the optimal "leakiness" for
negative activations, potentially leading to faster and more optimal convergence.
Advantages:
o Learns optimal 'a': Allows the network to adapt the activation function to the
specific data, potentially leading to better performance than a fixed Leaky
ReLU.
o Addresses Dying ReLU: Like Leaky ReLU, it prevents dead neurons.
Limitations:
o Adds another parameter to learn, slightly increasing computational
complexity, but often worth it for the performance gains.
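A minimal NumPy sketch of the PReLU forward pass only; the slope 'a' is passed in explicitly, and the 0.25 starting value is just an assumed initialization that a framework would normally update during training.

import numpy as np

def prelu(x, a):
    # Same shape as Leaky ReLU, but the slope 'a' is a learnable parameter.
    return np.where(x >= 0, x, a * x)

a = 0.25  # assumed initial value; gradient descent would adjust it
print(prelu(np.array([5.0, -2.0]), a))  # -> [ 5.  -0.5]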
f) Softmax Function
The go-to for multi-class classification problems, providing probabilities for each class.
How it works: Unlike the previous functions that operate on a single input, Softmax
operates on a vector of inputs (typically the output of the last hidden layer). It
squashes these values into a range between 0 and 1, and crucially, all the outputs
sum up to 1. This makes them interpretable as probabilities for each class.
Equation (for a single output y_i in a vector of K outputs): y_i = e^(z_i) / Σ_{j=1..K} e^(z_j), where z_i is the raw input (logit) for class i, and the sum is over all K classes.
Example: Imagine you're classifying an image as a "cat," "dog," or "bird." The raw
outputs from the previous layer might be [2.0, 1.0, 0.1]. Applying Softmax gives
approximately [0.66, 0.24, 0.10]. This means the network predicts roughly a 66% chance
of being a cat, 24% a dog, and 10% a bird.
When to use: Almost exclusively used in the output layer of neural networks for
multi-class classification problems.
Advantages:
o Probabilistic output: Provides well-normalized probabilities for each class,
making the output easily interpretable.
o Handles multiple classes: Essential for problems with more than two
categories.
Limitations:
o Can be computationally expensive for a very large number of classes.
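A minimal NumPy sketch of softmax applied to the raw scores from the example above; subtracting the maximum is a standard numerical-stability trick, not part of the definition.

import numpy as np

def softmax(z):
    # Exponentiate (shifted for stability) and normalize so the outputs sum to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # raw scores for cat, dog, bird
print(softmax(scores))              # -> approximately [0.66 0.24 0.10]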
g) Swish Activation Function
A relatively newer, self-gated activation function showing promising results.
How it works: It's a smooth, non-monotonic function that allows small negative
values to pass through, unlike ReLU which zeros them out. This "self-gating"
mechanism helps in learning.
Equation: f(x) = x · sigmoid(x) = x / (1 + e^(-x))
Example:
o If input is very positive, it behaves similarly to ReLU (output close to input).
o If input is 0, output is 0.
o If input is slightly negative (e.g., -0.5), it allows a small negative value to pass
through, unlike ReLU which would output 0.
o If input is very negative, output approaches 0, similar to ReLU.
When to use: Being explored as an alternative to ReLU in deep networks, especially
in challenging domains like image classification and machine translation.
Advantages:
o Smooth function: Unlike ReLU, it doesn't have an abrupt change at x=0,
which can help with training stability.
o Allows small negative values: This can preserve information that might be
lost by ReLU, as small negative values might still contain relevant patterns.
o Non-monotonicity: Its non-monotonic behavior can enhance the expression
of input data and weight learning.
o Outperforms ReLU: Researchers have shown it often matches or
outperforms ReLU on deep networks.
Limitations:
o Slightly more computationally expensive than ReLU due to the sigmoid
calculation.
o Still relatively new compared to ReLU, so its long-term performance and
common pitfalls are still being researched and understood.
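A minimal NumPy sketch of Swish in the simple form f(x) = x · sigmoid(x); the function name is illustrative (some frameworks expose an equivalent function under the name SiLU).

import numpy as np

def swish(x):
    # The input gates itself: x scaled by sigmoid(x).
    return x / (1.0 + np.exp(-x))

print(swish(np.array([5.0, 0.0, -0.5, -100.0])))  # -> approximately [4.97, 0., -0.19, -0.]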