Ai Unit 4 Compiled Notes
UNIT 4
Machine-Learning Paradigms: Introduction, Machine Learning Systems, Supervised and
Unsupervised Learning, Inductive Learning, Learning Decision Trees. Artificial Neural
Networks: Introduction, Artificial Neural Networks, Single-Layer Feed-Forward Networks,
Multi-Layer Feed-Forward Networks. Reinforcement Learning: Learning from Rewards, Passive
and Active Reinforcement Learning, Applications.
---------------------------------------------------------------------------------------------------------------------
Supervised Learning
Supervised learning involves labeled datasets, where each data observation is paired with a
corresponding class label. Algorithms in supervised learning aim to build a mathematical function
that maps input features to desired output values based on these labeled examples. Common
applications include classification and regression.
Unsupervised Learning
In unsupervised learning, algorithms work with unlabeled data to identify patterns and
relationships. These methods uncover commonalities within the data without predefined
categories. Techniques such as clustering and association rules fall under unsupervised learning.
Semi-supervised Learning
Semi-supervised learning strikes a balance by combining a small amount of labelled data with a
larger pool of unlabeled data. This approach leverages the benefits of both supervised and
unsupervised learning paradigms, making it a cost-effective and efficient method for training
models when the labeled data is limited.
Reinforcement Learning
Reinforcement learning focuses on enabling intelligent agents to learn tasks through trial-and-
error interactions with dynamic environments. Without the need for labelled datasets, agents make
decisions to maximize a reward function. This autonomous exploration and learning approach is
crucial for tasks where explicit programming is challenging.
Action-Reward feedback loop: an agent takes actions in an environment, which is interpreted
into a reward and a representation of the state, which are fed back into the agent.
Reinforcement learning operates on an action-reward feedback loop, where agents take actions,
receive rewards, and interpret the environment’s state. This iterative process allows the agent to
autonomously learn optimal actions to maximize positive feedback.
Data science is an interdisciplinary field that employs scientific methods and machine learning
algorithms to extract insights and knowledge from structured and unstructured data.
Computer Vision
Computer vision teaches computers to interpret and analyze information from images and
videos. It enables machines to “see” and “understand” the world.
Computer vision is used in facial recognition for security systems and authentication, and in self-
driving cars for detecting pedestrians, traffic signs, and other objects on the road. Additionally,
it’s used in healthcare for diagnosing diseases from X-ray images and MRI scans.
Predictive Analytics
Predictive analytics empowers computers to learn patterns from past data, and use them to
forecast future trends, behaviors, or outcomes.
Data scientists use predictive analytics across industries, for example, to detect fraud, assess
credit risk, understand and anticipate customer churn, forecast energy demand, and optimize the
supply chain, among many other applications.
Recommendation Systems
Recommendation systems are algorithms that analyze user preferences. They analyze past
behavior, such as past purchases, films watched, or songs listened to and liked, to suggest
personalized content, products, or services the customer might be interested in.
Recommender systems are used in streaming services like Spotify or Netflix and in e-commerce
like Amazon.
Speech Recognition
Speech recognition involves converting spoken language into text. Once it is in the form of text,
we can use NLP to allow computers to understand it.
Speech recognition is used in virtual assistants and customer services to understand and respond
to users and customers.
Supervised Learning
Supervised learning involves training a model on labeled data, where each input is associated
with an output. The goal of supervised learning is to learn a mapping function from input
variables to output variables. This allows the algorithm to make predictions or decisions when
given new, unseen data.
As an illustration, imagine a training set containing many labeled observations: some are
triangles, some are circles, and some are squares. We use that data to train a machine-learning
algorithm, and the model learns to match observations to shapes based on their characteristics.
Later on, we can give new observations to the model, and it will tell us which shape each
one has.
Regression
Regression models predict continuous values. For example, predicting house prices based on
features like square footage, number of bedrooms, and location is an example of a regression.
Popular algorithms for regression are linear regression, polynomial regression, decision tree
regression, random forest regression, and support vector regression.
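As a minimal sketch of the regression idea, a one-feature linear regression can be fit with the closed-form least-squares solution. The house-price numbers below are invented toy data, not from the text:

```python
# Least-squares fit of y = a*x + b for a single feature, from scratch.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

# Toy data: house price (in thousands) vs. square footage.
sqft  = [1000, 1500, 2000, 2500]
price = [200, 300, 400, 500]
a, b = fit_line(sqft, price)
predicted = a * 1800 + b  # price estimate for an unseen 1800 sqft house
```

Real projects would typically use a library implementation (for example, scikit-learn's LinearRegression) rather than this hand-rolled version.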
Classification
Classification models predict discrete outcomes, or categories. For instance, classifying emails as
spam or non-spam based on their content is an example of classification.
Popular algorithms for classification are Logistic Regression, Naive Bayes, Support Vector
Machines, Decision Trees, Random Forest Classifiers, and K-Nearest Neighbors (KNN).
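To make the classification idea concrete, here is a tiny K-Nearest Neighbors classifier written from scratch; the two-feature "spam score" data is an invented toy example:

```python
# Minimal k-nearest-neighbors classifier: label a point by majority
# vote among the k closest training observations.
from collections import Counter

def knn_predict(train_X, train_y, point, k=3):
    # squared Euclidean distance from the query point to every observation
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, point)), label)
        for x, label in zip(train_X, train_y)
    )
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Toy data: two features per email, labeled "spam" / "ham".
X = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
y = ["spam", "spam", "ham", "ham"]
label = knn_predict(X, y, (0.85, 0.85), k=3)
```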
Unsupervised Learning
Unsupervised learning is a type of machine learning where the algorithm learns patterns and
structures from unlabeled data. Unlike supervised learning, there are no predefined labels for
unsupervised learning tasks. Instead, the algorithm seeks to discover hidden patterns or
groupings within the data.
For example, we might pass a dataset without labels to a machine learning model, which,
by analyzing the intrinsic data patterns, learns to group observations based on their similarities.
Unsupervised learning has many applications. It can be used in clustering to find groups of
similar observations. It can be used to simplify the data representation through dimensionality
reduction. It can also be used to find anomalies.
Clustering
Clustering algorithms group similar data points together into clusters. The goal is to identify
natural groupings or clusters in the data without any prior knowledge of their labels. The
grouping is done by identifying similar patterns among variables.
Clustering can be used, for example, in customer segmentation to group together customers with
similar purchasing behaviors. Some machine learning techniques used for clustering are K-Means
Clustering, Hierarchical Clustering, and DBSCAN.
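The K-Means idea (assign points to the nearest centroid, then move each centroid to its cluster mean) can be sketched in a few lines. The one-dimensional "customer spend" numbers are invented toy data:

```python
# A bare-bones K-Means: alternate between assigning points to the
# nearest centroid and recomputing each centroid as its cluster mean.
def kmeans_1d(points, centroids, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# Toy 1-D "customer spend" data with two obvious groups.
spend = [10, 12, 11, 90, 95, 92]
centers = sorted(kmeans_1d(spend, centroids=[0.0, 100.0]))
```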
Dimensionality Reduction
Dimensionality reduction techniques aim to reduce the number of features in a dataset while
preserving its essential information.
Principal Component Analysis is a popular dimensionality reduction technique that projects
high-dimensional data into a lower dimension while preserving as much information as possible.
This can help visualize and analyze complex datasets more effectively.
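A minimal PCA sketch, assuming NumPy is available: center the data, take the covariance matrix, and project onto the eigenvector with the largest eigenvalue. The 2-D points are invented toy data:

```python
import numpy as np

# PCA via eigendecomposition of the covariance matrix: center the data,
# then project onto the top eigenvector (the first principal component).
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
top = eigvecs[:, np.argmax(eigvals)]     # direction of maximum variance
X_1d = X_centered @ top                  # 2-D data reduced to 1-D
```

The variance of the projected data equals the largest eigenvalue, which is exactly the "preserve as much information as possible" property described above.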
Anomaly Detection
Anomaly detection with unsupervised learning involves identifying unusual patterns or outliers
in data without labeled examples. By analyzing the inherent structure and distribution of the data,
unsupervised learning algorithms detect deviations or irregularities that stand out from the
typical patterns, thus flagging potential anomalies.
Anomaly detection can be done by clustering and finding observations that do not fit in any
cluster, by determining distributions and flagging outliers, or by using specific machine learning
techniques, like one-class support vector machines or isolation forests.
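As one simple distribution-based approach from the list above, we can flag any value more than two standard deviations from the mean (a z-score rule). The sensor readings are invented toy data:

```python
# Distribution-based anomaly detection: flag points more than z_cut
# standard deviations away from the mean.
def find_outliers(values, z_cut=2.0):
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v for v in values if abs(v - mean) / std > z_cut]

readings = [10, 11, 9, 10, 12, 10, 11, 50]  # 50 is the anomaly
outliers = find_outliers(readings)
```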
Labels
Labels, also known as targets or responses, are the outcomes or values we want to predict. In a
dataset of house prices, the features may include square footage, number of bedrooms, and
location, while the label would be the actual sale price of the house.
Data Preprocessing
The data collected by automated sensors, machines, or systems is not suitable in its raw
format for training machine learning models. Instead, data scientists devote a lot of time to
preparing data to train machine learning models.
Data preprocessing is done to convert the raw data into a processable form that can be fed to a
machine-learning model for training and making predictions. In fact, data preprocessing is the
initial step in data analysis and machine learning projects.
Data preprocessing includes, among other things, the following:
Cleaning data, handling missing values and outliers, and removing duplicates.
Scaling or normalizing data for uniformity.
Encoding categorical data into a numerical format the machine can understand.
Transforming variables to meet model assumptions.
Extracting features from complex structures, like texts, transactions, or time series.
Creating new features that capture business knowledge.
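Two of the steps above, scaling and categorical encoding, can be sketched from scratch. The area and city values are invented toy data:

```python
# Min-max scaling to [0, 1] and one-hot encoding of a categorical column.
def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

areas = [1000, 1500, 2000]             # numeric feature -> scaled
cities = ["Delhi", "Mumbai", "Delhi"]  # categorical feature -> encoded
scaled = min_max_scale(areas)
encoded = one_hot(cities)
```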
Exploratory Data Analysis
Data preprocessing goes hand in hand with exploratory data analysis (EDA). Through EDA, data
scientists seek to understand data patterns, correlations, and trends to gain insights into the
structure, characteristics, and relationships between features.
Visualizations, graphs, and plots are actively used during EDA. This step is crucial for data-
driven decision-making and hypothesis-testing. EDA also aids in creating predictive features and
optimizing model performance.
Training data
The training dataset is used to train the machine learning model by adjusting its parameters based
on the input features and corresponding target labels.
Validation data
This set is used to evaluate and adjust a model during training. It acts like pseudo-test data,
providing an independent measure of how well the model generalizes to new data and guiding
adjustments that improve its effectiveness.
Test data
This set is used to evaluate the final performance of a trained machine learning model, providing
independent examples with input features and target labels that the model has not seen during
training or validation. It serves as an unbiased measure to assess the model’s effectiveness in
real-world scenarios.
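A common way to produce these three sets is a shuffled 60/20/20 split. The exact proportions here are an assumption for illustration, not prescribed by the text:

```python
import random

# A 60/20/20 train/validation/test split after shuffling.
def split(data, train=0.6, val=0.2, seed=42):
    data = data[:]                      # copy so the original stays intact
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train, n_val = int(n * train), int(n * val)
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]

observations = list(range(100))
train_set, val_set, test_set = split(observations)
```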
Model Training
With the data ready, it is time to train and evaluate the machine learning models. Model training
involves feeding the training data into a machine learning algorithm to adjust its parameters and
optimize its performance.
During model training and evaluation, it’s important to watch out for two common pitfalls:
overfitting and underfitting.
Hyperparameters
Hyperparameters are like settings or configurations that govern how a machine learning model
operates. These parameters are not learned from the data but are rather adjusted to control the
learning of a model.
Hyperparameters can be considered like the knobs of the machine learning model, which we can
adjust to make changes to how the model fits the data. Examples of hyperparameters are the
maximum depth of a decision tree, the number of trees in a random forest, or the kernel type in
SVM.
Methods like grid search and random search are used to find the optimal values for these
hyperparameters in a process called hyperparameter optimization to achieve the best
performance from a model.
Cross-validation
Cross-validation is a technique used to assess the performance and generalization ability of
machine learning models. It involves dividing the dataset into multiple subsets, training the
model on different combinations of these subsets, and evaluating its performance on the
remaining data, aiding in obtaining a more reliable estimate of the model’s performance.
K-fold cross-validation is a popular cross-validation technique. The dataset is divided into K
equal-sized subsets (folds). The model is trained K times, each time using K-1 folds for training
and the remaining fold for validation. This ensures that each data point is used for validation
exactly once. The final performance is calculated by averaging the results from the K validation
runs.
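The K-fold splitting described above can be sketched as an index generator: each fold serves as the validation set exactly once while the remaining K-1 folds form the training set:

```python
# K-fold cross-validation indices: each fold is used for validation
# exactly once while the other K-1 folds form the training set.
def k_fold_splits(n, k):
    fold_size = n // k
    indices = list(range(n))
    for i in range(k):
        val = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, val

splits = list(k_fold_splits(n=10, k=5))
```

In practice you would train the model on each `train` index set, score it on the matching `val` set, and average the K scores.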
Model Evaluation
To assess the performance of a model, we use evaluation metrics. These metrics measure the
error in the model’s predictions. “Error” in machine learning refers to the difference between the
predicted values generated by a model and the actual values observed in the dataset. The smaller
the error, the better the performance of the model.
There are evaluation metrics for regression and for classification models.
Regression Metrics
There are several metrics that help us determine the performance of a regression model. Here, I
describe the most common ones.
Mean Squared Error (MSE): Measures the average squared difference between the predicted
and actual values. A smaller MSE indicates better model performance.
Root Mean Squared Error (RMSE): Similar to MSE but takes the square root of the average
squared difference. It’s easier to interpret since it’s in the same units as the target variable.
Mean Absolute Error (MAE): Measures the average absolute difference between the predicted
and actual values. It provides a more interpretable measure of error compared to MSE.
R-squared: Indicates how well the independent variables in a regression model explain the
variation in the dependent variable. R-squared values vary between 0 and 1, with higher values
indicating better model fit.
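The four regression metrics above can be computed directly from their definitions. The actual and predicted values below are invented toy numbers:

```python
# MSE, RMSE, MAE, and R-squared computed from scratch.
def regression_metrics(actual, predicted):
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e ** 2 for e in errors) / n
    rmse = mse ** 0.5
    mae = sum(abs(e) for e in errors) / n
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1 - sum(e ** 2 for e in errors) / ss_tot
    return mse, rmse, mae, r2

actual    = [3, 5, 7, 9]
predicted = [2, 5, 8, 9]
mse, rmse, mae, r2 = regression_metrics(actual, predicted)
```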
Classification Metrics
These are the most common evaluation metrics for classification:
Accuracy: Measures the proportion of correctly classified instances.
Precision: measures the proportion of true positive predictions out of all positive predictions
made by the model. It focuses on the accuracy of positive predictions.
Recall: Measures the proportion of true positive predictions out of all actual positive instances in
the dataset. It focuses on the model’s ability to capture all positive instances.
F1 Score: The harmonic mean of precision and recall, the F1 score provides a balance between
precision and recall.
ROC Curve (Receiver Operating Characteristic Curve): A graphical plot that illustrates the
trade-off between true positive rate (TPR) and false positive rate (FPR) across different threshold
values. The higher the area under the ROC curve, the better the performance.
Confusion matrix: a table that summarizes the performance of a classification model by
comparing actual and predicted class labels. It provides insights into the model’s true positive,
true negative, false positive, and false negative predictions.
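Accuracy, precision, recall, and F1 all follow from the four confusion-matrix counts. The spam labels below are invented toy data:

```python
# Classification metrics derived from TP, TN, FP, FN counts.
def classification_metrics(actual, predicted, positive=1):
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

actual    = [1, 1, 1, 0, 0, 0]   # 1 = spam, 0 = non-spam
predicted = [1, 1, 0, 0, 0, 1]
acc, prec, rec, f1 = classification_metrics(actual, predicted)
```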
Conclusion
Machine learning is revolutionizing how we approach digital challenges. It empowers computers
to learn autonomously, uncover patterns in data, and transform industries with predictive
insights. By grasping the machine learning basics, we open doors to endless possibilities,
enabling collaboration between humans and machines for a brighter, more innovative future.
What is Inductive Learning Algorithm?
Inductive Learning Algorithm (ILA) is an iterative, inductive machine learning algorithm used
to generate a set of classification rules of the form "IF-THEN" from a set of examples,
producing rules at each iteration and appending them to the rule set.
There are basically two methods for knowledge extraction: from domain experts, and through
machine learning. For very large amounts of data, domain experts are not very useful or
reliable, so we move toward the machine learning approach. One machine learning method is to
replicate the expert's logic in the form of algorithms, but this work is tedious, time-consuming,
and expensive. So we move toward inductive algorithms, which generate a strategy for
performing a task without needing separate instructions at each step.
The need arose from the pitfalls present in previous algorithms; one of the major pitfalls
was the lack of generalization of rules.
ID3 and AQ used decision tree production methods that were too specific, difficult to
analyze, and very slow even for basic short classification problems.
Decision-tree-based algorithms were unable to handle a new problem if some attributes
were missing.
ILA instead produces a general set of rules rather than decision trees, which overcomes
the above problems.
Step 1: divide the table ‘T’ containing m examples into n sub-tables (t1, t2,…..tn).
One table for each possible value of the class attribute. (repeat steps 2 -8 for each
sub-table)
Step 2: Initialize the attribute combination count ‘ j ‘ = 1.
Step 3: For the sub-table on which work is going on, divide the attribute list into
distinct combinations, each combination with ‘j ‘ distinct attributes.
Step 4: For each combination of attributes, count the number of occurrences of
attribute values that appear under the same combination of attributes in unmarked
rows of the sub-table under consideration and, at the same time, do not appear
under the same combination of attributes in the other sub-tables. Call the first
combination with the maximum number of occurrences the max-combination 'MAX'.
Step 5: If ‘MAX’ == null, increase ‘ j ‘ by 1 and go to Step 3.
Step 6: Mark all rows of the sub-table being worked on in which the values of
'MAX' appear as classified.
Step 7: Add a rule (IF attribute = "XYZ" THEN decision is YES/NO) to R, whose
left-hand side contains the attribute names of 'MAX' with their values separated
by AND, and whose right-hand side contains the decision attribute value
associated with the sub-table.
Step 8: If all rows are marked as classified, move on to the next sub-table and
go to Step 2. Else, go to Step 4. If no sub-tables remain, exit with the set of
rules obtained so far.
As an example of ILA, suppose an example set with the attributes place type,
weather, location, and decision, containing seven examples. Our task is to generate a set
of rules that determine the decision under each condition.
Subset – 1
Subset – 2
At iteration 1, column weather is selected for rows 3 and 4, and rows 3 and 4 are
marked. The rule added to R: IF the weather is warm THEN the decision is yes.
At iteration 2, column place type is selected for row 1, and row 1 is marked. The
rule added to R: IF the place type is hilly THEN the decision is yes.
At iteration 3, column location is selected for row 2, and row 2 is marked. The
rule added to R: IF the location is Shimla THEN the decision is yes.
At iteration 4, column location is selected for rows 5 and 6, and rows 5 and 6 are
marked. The rule added to R: IF the location is Mumbai THEN the decision is no.
At iteration 5, columns place type and weather are selected for row 7, and row 7 is
marked. The rule added to R: IF the place type is beach AND the weather is windy
THEN the decision is no.
Finally, we get the rule set:
Rule 1: IF the weather is warm THEN the decision is yes.
Rule 2: IF the place type is hilly THEN the decision is yes.
Rule 3: IF the location is Shimla THEN the decision is yes.
Rule 4: IF the location is Mumbai THEN the decision is no.
Rule 5: IF the place type is beach AND the weather is windy THEN the decision is
no.
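The five rules above can be applied directly as an ordered IF-THEN chain. The example query values (such as "Goa") are invented for illustration:

```python
# The five ILA rules above, tried in order; earlier rules take priority.
def decide(place_type, weather, location):
    if weather == "warm":
        return "yes"                     # Rule 1
    if place_type == "hilly":
        return "yes"                     # Rule 2
    if location == "Shimla":
        return "yes"                     # Rule 3
    if location == "Mumbai":
        return "no"                      # Rule 4
    if place_type == "beach" and weather == "windy":
        return "no"                      # Rule 5
    return None                          # no rule covers this example

result = decide("beach", "windy", "Goa")
```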
What is Artificial Neural Network?
The term "Artificial Neural Network" is derived from the biological neural networks that
make up the structure of the human brain. Just as the human brain has neurons
interconnected with one another, artificial neural networks have neurons interconnected
with one another in the various layers of the network. These neurons are known as nodes.
In this analogy, dendrites in a biological neural network correspond to inputs in an
artificial neural network, the cell nucleus to nodes, synapses to weights, and the axon
to the output.
An artificial neural network is a construct in the field of artificial intelligence that
attempts to mimic the network of neurons making up the human brain, so that computers can
understand things and make decisions in a human-like manner. An artificial neural network
is built by programming computers to behave like interconnected brain cells.
The human brain contains on the order of 100 billion neurons, and each neuron connects to
somewhere between 1,000 and 100,000 others. In the human brain, data is stored in a
distributed manner, and we can extract more than one piece of this data from memory in
parallel when necessary. We can say that the human brain is made up of incredibly
powerful parallel processors.
We can understand the artificial neural network with an example. Consider a digital logic
gate that takes inputs and gives an output: an "OR" gate takes two inputs; if one or both
inputs are "On," the output is "On," and if both inputs are "Off," the output is "Off."
Here the output depends only on the input. Our brain does not perform the same task: the
relationship between outputs and inputs keeps changing, because the neurons in our brain
are "learning."
The architecture of an artificial neural network:
To understand the architecture of an artificial neural network, we have to understand
what a neural network consists of. A neural network consists of a large number of
artificial neurons, termed units, arranged in a sequence of layers. Let us look at the
various types of layers available in an artificial neural network.
Artificial Neural Network primarily consists of three layers:
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer:
The hidden layer sits between the input and output layers. It performs all the
calculations to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
The artificial neural network takes the inputs, computes their weighted sum, and includes
a bias. This computation is represented in the form of a transfer function. The weighted
total is then passed as input to an activation function to produce the output. Activation
functions choose whether a node should fire or not; only the nodes that fire pass values
on to the output layer. There are distinctive activation functions available that can be
applied depending on the sort of task we are performing.
Advantages of Artificial Neural Network (ANN)
Parallel processing capability:
Artificial neural networks can perform more than one task simultaneously.
Storing data on the entire network:
Unlike traditional programming, where data is stored in a database, an ANN stores data
across the whole network. The disappearance of a couple of pieces of data in one place
doesn't prevent the network from working.
Capability to work with incomplete knowledge:
After training, an ANN may produce output even with inadequate data. The loss of
performance depends on the significance of the missing data.
Having a memory distribution:
For an ANN to be able to adapt, it is important to determine suitable examples and to
train the network toward the desired output by showing it these examples. The success of
the network is directly proportional to the chosen instances; if the event cannot be
shown to the network in all its aspects, the network can produce false output.
Having fault tolerance:
Corruption of one or more cells of an ANN does not prevent it from generating output, and
this feature makes the network fault-tolerant.
Disadvantages of Artificial Neural Network:
Assurance of proper network structure:
There is no particular guideline for determining the structure of artificial neural networks.
The appropriate network structure is accomplished through experience, trial, and error.
Unrecognized behavior of the network:
This is the most significant issue with ANNs. When an ANN produces a solution, it does
not provide insight into why and how, which decreases trust in the network.
Hardware dependence:
Artificial neural networks need processors with parallel processing power, in line with
their structure, making them dependent on suitable hardware.
Difficulty of showing the issue to the network:
ANNs can work with numerical data. Problems must be converted into numerical values
before being introduced to ANN. The presentation mechanism to be resolved here will
directly impact the performance of the network. It relies on the user's abilities.
The duration of the network is unknown:
Training is stopped when the error falls to a specific value, but reaching this value
does not guarantee optimum results.
Artificial neural networks, which entered the scientific world in the mid-20th century,
are developing exponentially. Here we have reviewed the advantages of artificial neural
networks and the issues encountered in their use. It should not be overlooked that the
drawbacks of ANNs, a flourishing branch of science, are being eliminated one by one,
while their advantages grow day by day. Artificial neural networks will progressively
become an irreplaceable part of our lives.
How do artificial neural networks work?
An artificial neural network is best represented as a weighted directed graph, in which
the artificial neurons form the nodes. The associations between neuron outputs and neuron
inputs can be viewed as directed edges with weights. The artificial neural network
receives its input signal from an external source in the form of a pattern or image,
represented as a vector. Each of the n inputs is then denoted mathematically by the
notation x(n).
Afterward, each input is multiplied by its corresponding weight (these weights are the
details the artificial neural network uses to solve a specific problem). In general
terms, these weights represent the strength of the interconnections between neurons
inside the artificial neural network. All the weighted inputs are summed inside the
computing unit.
If the weighted sum is zero, a bias is added to make the output non-zero, or otherwise
to scale up the system's response. The bias behaves like an extra input fixed at 1 with
its own adjustable weight. Here the total of weighted inputs can range from 0 to positive
infinity.
To keep the response within the limits of the desired value, a certain maximum value is
benchmarked, and the total of weighted inputs is passed through the activation function.
The activation function refers to the set of transfer functions used to achieve the
desired output. There are different kinds of activation functions, primarily either
linear or non-linear sets of functions. Some of the commonly used activation functions
are the binary, linear, and tan hyperbolic sigmoidal activation functions. Let us take a
look at each of them in detail:
Binary:
In the binary activation function, the output is either a 1 or a 0. To accomplish this, a
threshold value is set up: if the net weighted input of the neuron exceeds the threshold,
the activation function returns 1; otherwise it returns 0.
Sigmoidal Hyperbolic:
The sigmoidal hyperbola function is generally seen as an "S"-shaped curve. Here the tan
hyperbolic function is used to approximate the output from the actual net input. The
function is defined as:
F(x) = 1 / (1 + exp(-λx))
where λ is the steepness parameter.
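The two activation functions just described can be written directly from their definitions, with the threshold and steepness as parameters:

```python
import math

# Binary step activation: fires (returns 1) only above a threshold.
def binary_step(x, threshold=0.0):
    return 1 if x > threshold else 0

# Sigmoid activation F(x) = 1 / (1 + exp(-steepness * x)).
def sigmoid(x, steepness=1.0):
    return 1.0 / (1.0 + math.exp(-steepness * x))

out_step = binary_step(0.7, threshold=0.5)   # above threshold -> fires
out_mid = sigmoid(0.0)                       # sigmoid is 0.5 at x = 0
```

A larger steepness value makes the sigmoid's "S" curve approach the hard step function.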
Types of Artificial Neural Network:
There are various types of artificial neural networks, modeled on how the neurons and
networks of the human brain function; an artificial neural network performs tasks in a
similar way. Most artificial neural networks bear some resemblance to their more complex
biological counterpart and are very effective at their intended tasks, for example,
segmentation or classification.
Feedback ANN:
In this type of ANN, the output is returned into the network to internally arrive at the
best-evolved results. As per the University of Massachusetts Lowell Centre for
Atmospheric Research, feedback networks feed information back into themselves and are
well suited to solving optimization problems. Internal system error corrections utilize
feedback ANNs.
Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input layer, an output
layer, and at least one layer of neurons. By assessing the network's output against its
input, the intensity of the network can be observed based on the group behavior of the
associated neurons, and the output is decided. The primary advantage of this network is
that it learns to evaluate and recognize input patterns.
What is Perceptron?
Perceptron is a type of neural network that performs binary classification, mapping input
features to an output decision and usually classifying data into one of two categories,
such as 0 or 1.
A perceptron consists of a single layer of input nodes that are fully connected to a layer
of output nodes. It is particularly good at learning linearly separable patterns. It
utilizes a variation of artificial neurons called Threshold Logic Units (TLU), first
introduced by Warren McCulloch and Walter Pitts in the 1940s. This foundational model has
played a crucial role in the development of more advanced neural networks and machine
learning algorithms.
Types of Perceptron
1. Single-Layer Perceptron: a type of perceptron limited to learning linearly separable
patterns. It is effective for tasks where the data can be divided into distinct
categories by a straight line. While powerful in its simplicity, it struggles with more
complex problems where the relationship between inputs and outputs is non-linear.
2. Multi-Layer Perceptron: possesses enhanced processing capabilities, as it consists of
two or more layers, and is adept at handling more complex patterns and relationships
within the data.
Basic Components of Perceptron
A Perceptron is composed of key components that work together to process information and
make predictions.
Input Features: The perceptron takes multiple input features, each representing a
characteristic of the input data.
Weights: Each input feature is assigned a weight that determines its influence on the
output. These weights are adjusted during training to find the optimal values.
Summation Function: The perceptron calculates the weighted sum of its inputs,
combining them with their respective weights.
Activation Function: The weighted sum is passed through the Heaviside step
function, comparing it to a threshold to produce a binary output (0 or 1).
Output: The final output is determined by the activation function, often used
for binary classification tasks.
Bias: The bias term helps the perceptron make adjustments independent of the input,
improving its flexibility in learning.
Learning Algorithm: The perceptron adjusts its weights and bias using a learning
algorithm, such as the Perceptron Learning Rule, to minimize prediction errors.
These components enable the perceptron to learn from data and make predictions. While a
single perceptron can handle simple binary classification, complex tasks require multiple
perceptrons organized into layers, forming a neural network.
How does Perceptron work?
A weight is assigned to each input node of a perceptron, indicating the importance of that input
in determining the output. The Perceptron’s output is calculated as a weighted sum of the
inputs, which is then passed through an activation function to decide whether the Perceptron
will fire.
The weighted sum is computed as:
z = w1x1 + w2x2 + … + wnxn = X^T W
The step function compares this weighted sum to a threshold. If the input is larger than the
threshold value, the output is 1; otherwise, it is 0. The most common activation function
used in Perceptrons is the Heaviside step function:
h(z) = 0 if z < Threshold, 1 if z ≥ Threshold
A perceptron consists of a single layer of Threshold Logic Units (TLU), with each TLU fully
connected to all input nodes.
In a fully connected layer, also known as a dense layer, all neurons in one layer are connected
to every neuron in the previous layer.
The output of the fully connected layer is computed as:
fW,b(X) = h(XW + b)
where X is the input matrix, W is the weight matrix (one weight per input neuron), b is the
bias, and h is the step function.
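The fully connected computation fW,b(X) = h(XW + b) can be sketched with NumPy; the shapes and values below are illustrative, not from the notes:

```python
import numpy as np

def step(z):
    # Heaviside step function: 1 where z >= 0, else 0
    return (z >= 0).astype(int)

def dense_forward(X, W, b):
    # Output of a fully connected layer of TLUs: h(XW + b)
    return step(X @ W + b)

# Two samples with three input features, feeding two TLUs
X = np.array([[1.0, 0.5, -1.0],
              [0.2, 0.3,  0.9]])
W = np.array([[ 0.4, -0.2],
              [ 0.1,  0.5],
              [-0.3,  0.8]])
b = np.array([0.0, -0.5])
print(dense_forward(X, W, b))
```

Each row of the output holds the binary firing decisions of the two TLUs for one sample.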
During training, the Perceptron’s weights are adjusted to minimize the difference between the
predicted output and the actual output. This is achieved using supervised learning algorithms
like the delta rule or the Perceptron learning rule.
The weight update formula is:
wi,j = wi,j + η(yj − ŷj)xi
Where:
wi,j is the weight between the ith input and jth output neuron,
xi is the ith input value,
yj is the actual value and ŷj is the predicted value,
η is the learning rate, controlling how much the weights are adjusted.
This process enables the perceptron to learn from data and improve its prediction accuracy
over time.
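The weight-update rule above can be turned into a small training loop. The sketch below (dataset and hyperparameters chosen for illustration) learns the logical AND function, which is linearly separable and therefore learnable by a single perceptron:

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, epochs=20):
    # Perceptron learning rule: w_i = w_i + eta * (y - y_hat) * x_i,
    # applied example by example; the bias is updated the same way.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            y_hat = 1 if xi @ w + b >= 0 else 0
            update = eta * (target - y_hat)
            w += update * xi
            b += update
    return w, b

# Logical AND: linearly separable, so the perceptron converges
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
preds = [1 if x @ w + b >= 0 else 0 for x in X]
print(preds)  # should match y for the AND function
```

Note that a single perceptron could not learn XOR this way, since XOR is not linearly separable; that is exactly the limitation multi-layer networks address.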
Example: Perceptron in Action
Let’s take a simple example of classifying whether a given fruit is an apple or not based on two
inputs: its weight (in grams) and its color (on a scale of 0 to 1, where 1 means red). The
perceptron receives these inputs, multiplies them by their weights, adds a bias, and applies the
activation function to decide whether the fruit is an apple or not.
Input 1 (Weight): 150 grams
Input 2 (Color): 0.9 (since the fruit is mostly red)
Weights: [0.5, 1.0]
Bias: 1.5
The perceptron’s weighted sum would be:
(150 × 0.5) + (0.9 × 1.0) + 1.5 = 77.4
Let’s assume the activation function uses a threshold of 75. Since 77.4 > 75, the perceptron
classifies the fruit as an apple (output = 1).
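This arithmetic can be checked with a short sketch (the helper function name is illustrative):

```python
def perceptron_output(inputs, weights, bias, threshold):
    # Weighted sum of inputs plus bias, then Heaviside step against the threshold
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return z, 1 if z >= threshold else 0

# Apple classifier from the example: weight in grams, redness on a 0-1 scale
z, label = perceptron_output([150, 0.9], [0.5, 1.0], bias=1.5, threshold=75)
print(z, label)
```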
Reinforcement Learning: An Overview
Reinforcement Learning (RL) is a branch of machine learning focused on making decisions to
maximize cumulative rewards in a given situation. Unlike supervised learning, which relies on
a training dataset with predefined answers, RL involves learning through experience. In RL,
an agent learns to achieve a goal in an uncertain, potentially complex environment by
performing actions and receiving feedback through rewards or penalties.
Key Concepts of Reinforcement Learning
Agent: The learner or decision-maker.
Environment: Everything the agent interacts with.
State: A specific situation in which the agent finds itself.
Action: All possible moves the agent can make.
Reward: Feedback from the environment based on the action taken.
How Reinforcement Learning Works
RL operates on the principle of learning optimal behavior through trial and error. The agent
takes actions within the environment, receives rewards or penalties, and adjusts its behavior to
maximize the cumulative reward. This learning process is characterized by the following
elements:
Policy: A strategy used by the agent to determine the next action based on the current
state.
Reward Function: A function that provides a scalar feedback signal based on the state
and action.
Value Function: A function that estimates the expected cumulative reward from a
given state.
Model of the Environment: A representation of the environment that helps in
planning by predicting future states and rewards.
Example: Navigating a Maze
The problem is as follows: we have an agent and a reward, with many hurdles in between.
The agent is supposed to find the best possible path to reach the reward. The following
example illustrates the problem.
The above image shows a robot, a diamond, and fire. The goal of the robot is to get the
reward, the diamond, while avoiding the hurdles, the fire. The robot learns by trying all
possible paths and then choosing the path that reaches the reward with the fewest hurdles.
Each right step gives the robot a reward, and each wrong step subtracts from its reward.
The total reward is calculated when it reaches the final reward, the diamond.
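The trial-and-error loop described above can be sketched with tabular Q-learning on a tiny one-dimensional grid; the grid layout, rewards, and hyperparameters are illustrative, not from the notes:

```python
import random

random.seed(0)

# Illustrative 1-D grid: states 0..5. State 5 holds the diamond (+10),
# state 0 holds the fire (-10); every other move costs -1.
TERMINAL_REWARDS = {5: 10, 0: -10}
ACTIONS = [-1, 1]  # move left or move right

def env_step(state, action):
    next_state = state + action
    reward = TERMINAL_REWARDS.get(next_state, -1)
    done = next_state in TERMINAL_REWARDS
    return next_state, reward, done

# Tabular Q-learning: learn action values by trial and error
Q = {(s, a): 0.0 for s in range(6) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(500):
    state, done = 2, False  # each episode starts between fire and diamond
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = env_step(state, action)
        # Q-learning update toward reward plus discounted best next value
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# The greedy policy should point right (toward the diamond) in states 1..4
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(1, 5)}
print(policy)
```

The agent starts out wandering into the fire, but the penalties push its Q-values away from that path, and the learned policy heads toward the diamond from every non-terminal state.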
Examples: chess playing, text summarization, object recognition, spam detection.
Types of Reinforcement:
1. Positive: Positive reinforcement occurs when an event, occurring as a result of a
particular behavior, increases the strength and frequency of that behavior. In other
words, it has a positive effect on behavior.
Advantages:
Maximizes performance
Sustains change for a long period of time
Disadvantage:
Too much reinforcement can lead to an overload of states, which can diminish
the results
2. Negative: Negative reinforcement is the strengthening of a behavior because a
negative condition is stopped or avoided.
Advantages:
Increases behavior
Provides defiance to a minimum standard of performance
Disadvantage:
It only provides enough to meet the minimum behavior
Elements of Reinforcement Learning
i) Policy: Defines the agent’s behavior at a given time.
ii) Reward Function: Defines the goal of the RL problem by providing feedback.
iii) Value Function: Estimates long-term rewards from a state.
iv) Model of the Environment: Helps in predicting future states and rewards for planning.
Applications of Reinforcement Learning
i) Robotics: Automating tasks in structured environments like manufacturing.
ii) Game Playing: Developing strategies in complex games like chess.
iii) Industrial Control: Real-time adjustments in operations like refinery controls.
iv) Personalized Training Systems: Customizing instruction based on individual needs.
Advantages and Disadvantages of Reinforcement Learning
Advantages:
1. Reinforcement learning can be used to solve very complex problems that cannot be solved
by conventional techniques.
2. The model can correct the errors that occurred during the training process.
3. In RL, training data is obtained via the direct interaction of the agent with the environment.
4. Reinforcement learning can handle environments that are non-deterministic, meaning that
the outcomes of actions are not always predictable. This is useful in real-world applications
where the environment may change over time or is uncertain.
5. Reinforcement learning can be used to solve a wide range of problems, including those that
involve decision making, control, and optimization.
6. Reinforcement learning is a flexible approach that can be combined with other machine
learning techniques, such as deep learning, to improve performance.
Disadvantages:
1. Reinforcement learning is not preferable for solving simple problems.
2. Reinforcement learning needs a lot of data and a lot of computation.
3. Reinforcement learning is highly dependent on the quality of the reward function. If the
reward function is poorly designed, the agent may not learn the desired behavior.
4. Reinforcement learning can be difficult to debug and interpret. It is not always clear why
the agent is behaving in a certain way, which can make it difficult to diagnose and fix
problems.
Conclusion
Reinforcement learning is a powerful technique for decision-making and optimization in
dynamic environments. Its applications range from robotics to personalized learning systems.
However, the complexity of RL requires careful design of reward functions and significant
computational resources. By understanding its principles and applications, one can leverage
RL to solve intricate real-world problems.
Passive Learning | Active Learning
Uses a large set of pre-labeled data to train the algorithm | Starts with a small set of labeled data and requests additional data from the user
The algorithm does not interact with the user | The algorithm interacts with the user to acquire additional data
It does not require user input after training is complete | May continue to request additional data until a satisfactory level of accuracy is achieved
Suitable for applications where a large dataset is available | Suitable for applications where labeled data is scarce or expensive to acquire
Conclusion:
In conclusion, passive learning and active learning are two approaches used in machine learning
to acquire data. Passive learning uses a large set of pre-labeled data to train the algorithm, while
active learning starts with a small set of labeled data and requests additional data from the user to
improve accuracy. The choice between passive learning and active learning depends on the
availability of labeled data and the application’s requirements.
Decision Trees
Decision Tree is a Supervised learning technique that can be
used for both classification and Regression problems, but
mostly it is preferred for solving Classification problems. It is
a tree-structured classifier, where internal nodes represent
the features of a dataset, branches represent the decision
rules and each leaf node represents the outcome.
In a decision tree, there are two types of nodes: the
Decision Node and the Leaf Node. Decision nodes are
used to make decisions and have multiple branches,
whereas leaf nodes are the outputs of those decisions and
do not contain any further branches.
The decisions or the test are performed on the basis of
features of the given dataset.
Gain(S,Temperature) = 0.94 - (4/14)(1.0) - (6/14)(0.9183) - (4/14)(0.8113) = 0.0289
Attribute : Humidity
Values(Humidity) = High, Normal
Attribute : Wind
Values(Wind) = Strong, Weak
The same dataset is used to evaluate both attributes:
Day | Outlook | Temperature | Humidity | Wind | Play Golf
D1 | Sunny | Hot | High | Weak | No
D2 | Sunny | Hot | High | Strong | No
D3 | Overcast | Hot | High | Weak | Yes
D4 | Rain | Mild | High | Weak | Yes
D5 | Rain | Cool | Normal | Weak | Yes
D6 | Rain | Cool | Normal | Strong | No
D7 | Overcast | Cool | Normal | Strong | Yes
D8 | Sunny | Mild | High | Weak | No
D9 | Sunny | Cool | Normal | Weak | Yes
D10 | Rain | Mild | Normal | Weak | Yes
D11 | Sunny | Mild | Normal | Strong | Yes
D12 | Overcast | Mild | High | Strong | Yes
D13 | Overcast | Hot | Normal | Weak | Yes
D14 | Rain | Mild | High | Strong | No
Calculating the information gain for all attributes:
Gain(S,Outlook) = 0.2464
Gain(S,Temperature) = 0.0289
Gain(S,Humidity) = 0.1516
Gain(S,Wind) = 0.0478
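These gains can be reproduced by computing entropy directly from the dataset; note that working with unrounded entropies gives values roughly 0.0003 higher than the rounded figures quoted in these notes:

```python
from collections import Counter
from math import log2

# Play Golf dataset: (Outlook, Temperature, Humidity, Wind, PlayGolf)
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def entropy(rows):
    # Shannon entropy of the PlayGolf labels (last column)
    counts = Counter(r[-1] for r in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())

def gain(rows, attr):
    # Information gain: entropy(S) minus the weighted entropy of each partition
    i = ATTRS[attr]
    total = len(rows)
    remainder = sum(
        len(subset) / total * entropy(subset)
        for value in {r[i] for r in rows}
        for subset in [[r for r in rows if r[i] == value]]
    )
    return entropy(rows) - remainder

for attr in ATTRS:
    print(attr, round(gain(DATA, attr), 4))
```

The same two functions, applied to the Sunny and Rain subsets of DATA, reproduce the subtree gains computed later in this section.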
We can clearly see that Gain(S, Outlook) has the highest information gain of 0.246, hence we
choose the Outlook attribute as the root node. At this point, the decision tree looks as
follows.
Here we observe that whenever the outlook is Overcast, Play Golf is always ‘Yes’. This is
no coincidence: this simple subtree results precisely because the Outlook attribute gives
the highest information gain.
Now how do we proceed from this point? We can simply apply recursion; you might want
to look at the algorithm steps described earlier.
Now that we have used Outlook, three attributes remain: Humidity, Temperature, and
Wind. And we had three possible values of Outlook: Sunny, Overcast, and Rain. Since the
Overcast node already ended in the leaf node ‘Yes’, we are left with two subtrees to
compute: Sunny and Rain.
For the Sunny subtree, the candidate attributes and their values are:
Attribute : Temperature
Values(Temperature) = Hot, Mild, Cool
Attribute : Humidity
Values(Humidity) = High, Normal
Attribute : Wind
Values(Wind) = Strong, Weak
The Sunny subset of the data is:
Day | Temperature | Humidity | Wind | Play Golf
D1 | Hot | High | Weak | No
D2 | Hot | High | Strong | No
D8 | Mild | High | Weak | No
D9 | Cool | Normal | Weak | Yes
D11 | Mild | Normal | Strong | Yes
Gain(Ssunny,Temperature) = 0.570
Gain(Ssunny,Humidity) = 0.97
Gain(Ssunny,Wind) = 0.0192
Humidity has the highest information gain, so it is chosen for the Sunny branch.
For the Rain subtree:
Gain(Srain,Humidity) = 0.0192
Gain(Srain,Wind) = 0.97
Wind has the highest information gain, so it is chosen for the Rain branch.
Decision Tree