
ARTIFICIAL INTELLIGENCE

UNIT 4
Machine-Learning Paradigms: Introduction, Machine Learning Systems, Supervised and
Unsupervised Learning, Inductive Learning, Learning Decision Trees. Artificial Neural
Networks: Introduction, Artificial Neural Networks, Single-Layer Feed-Forward Networks,
Multi-Layer Feed-Forward Networks. Reinforcement Learning: Learning from rewards, Passive
and Active reinforcement learning, Applications.

---------------------------------------------------------------------------------------------------------------------

Machine Learning Paradigms: Machine learning (ML) is a dynamic field dedicated to
developing methods that enable machines to learn from extensive datasets and make
predictions. The learning paradigms in ML are categorized by the degree of human
intervention they involve, each serving specific purposes and applications. This dynamic
field encompasses various learning paradigms, each with its unique approach to handling
data.

Supervised and Unsupervised learning


Supervised Learning (SL)

Supervised learning involves labelled datasets, where each data observation is paired with a
corresponding class label. Algorithms in supervised learning aim to build a mathematical function
that maps input features to desired output values based on these labeled examples. Common
applications include classification and regression.
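As a concrete sketch of this mapping from features to labels, here is a minimal nearest-neighbour classifier in Python. The feature values and labels are invented for illustration; real projects would typically use a library such as scikit-learn.

```python
# Minimal sketch of supervised learning: a 1-nearest-neighbour classifier.
# The toy data (feature pairs and class labels) are invented for illustration.

def euclidean(a, b):
    # Distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict(train_X, train_y, x):
    # Label the new point with the label of its closest training example.
    distances = [(euclidean(x, xi), yi) for xi, yi in zip(train_X, train_y)]
    return min(distances)[1]

# Labelled training set: each observation is paired with a class label.
train_X = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (4.8, 5.3)]
train_y = ["small", "small", "large", "large"]

print(predict(train_X, train_y, (1.1, 1.0)))  # "small": near the first group
print(predict(train_X, train_y, (5.0, 5.0)))  # "large": near the second group
```

The labelled examples play the role of the training set: the "learned function" here is simply a lookup of the closest known observation.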

Stages in Supervised Learning

Understanding Supervised Learning pictorially

Unsupervised Learning

In unsupervised learning, algorithms work with unlabeled data to identify patterns and
relationships. These methods uncover commonalities within the data without predefined
categories. Techniques such as clustering and association rules fall under unsupervised learning.
Stages in Unsupervised Learning

Understanding Unsupervised Learning pictorially

Semi-supervised Learning

Semi-supervised learning strikes a balance by combining a small amount of labelled data with a
larger pool of unlabeled data. This approach leverages the benefits of both supervised and
unsupervised learning paradigms, making it a cost-effective and efficient method for training
models when the labeled data is limited.
Understanding Semi-supervised Learning pictorially

Self-supervised Learning (SSL)

In scenarios where obtaining high-quality labeled data is challenging, self-supervised learning


emerges as a solution. In this paradigm, models are pre-trained using unlabeled data, and data
labels are generated automatically during subsequent iterations. SSL transforms unsupervised ML
problems into supervised ones, enhancing learning efficiency. This paradigm is particularly
relevant with the rise of large language models.

Reinforcement Learning

Reinforcement learning focuses on enabling intelligent agents to learn tasks through trial-and-
error interactions with dynamic environments. Without the need for labelled datasets, agents make
decisions to maximize a reward function. This autonomous exploration and learning approach is
crucial for tasks where explicit programming is challenging.
Action-Reward feedback loop: an agent takes actions in an environment, which is interpreted
into a reward and a representation of the state, which are fed back into the agent.

Action-Reward Feedback Loop:

Reinforcement learning operates on an action-reward feedback loop, where agents take actions,
receive rewards, and interpret the environment’s state. This iterative process allows the agent to
autonomously learn optimal actions to maximize positive feedback.
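The action-reward loop can be sketched in a few lines of Python. This is a deliberately simplified, deterministic setup (a two-action "bandit" environment with fixed rewards, invented for illustration), not a full reinforcement learning algorithm.

```python
# Minimal sketch of the action-reward feedback loop: an agent repeatedly
# picks an action, the environment returns a reward, and the agent updates
# its value estimates. The reward values here are invented for illustration.

def environment(action):
    # Deterministic rewards: action 1 is strictly better.
    return {0: 0.2, 1: 1.0}[action]

def run(steps=20):
    values = [0.0, 0.0]   # the agent's estimate of each action's reward
    counts = [0, 0]
    for t in range(steps):
        # Try each action once, then act greedily on current estimates.
        if t < 2:
            action = t
        else:
            action = values.index(max(values))
        reward = environment(action)          # feedback from the environment
        counts[action] += 1
        # Incremental average update of the value estimate.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

values, counts = run()
print(values)   # estimates converge to the true rewards [0.2, 1.0]
```

After a brief exploration phase the agent settles on the action that maximizes its reward, which is exactly the loop described above: action, reward, updated state of knowledge, next action.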


Understanding these ML paradigms provides valuable insights into the diverse approaches used to
address different types of problems. Each paradigm comes with its strengths and applications,
contributing to the versatility of machine learning in various domains.
What is Machine Learning?
In today’s digital age, machines can perform tasks that were once thought to be solely in the
realm of human expertise. How can machines carry out these tasks? Thanks to machine learning.
Machine learning is a field of computer science that consists of developing procedures that
enable computers to learn from data without being explicitly programmed. These procedures are
called algorithms.
In simple terms, machine learning allows computers to learn from data and make decisions based
on what they’ve learned. It’s like teaching a computer to recognize faces in photos, understand
spoken language, translate texts, or even play games like chess or Go—all without being
explicitly programmed to do so.

Machine Learning, Deep Learning and Artificial intelligence


You probably hear these terms a lot. How are they related?
Machine learning is a subset of artificial intelligence that focuses on developing algorithms that
enable computers to learn from data.
Deep learning is a specific type of machine learning that uses neural networks to learn complex
patterns in data. Artificial neural networks are modeled after the workings of the human brain. It
is said that a neural network can approximate, i.e., learn, any mathematical function, and
therefore, its learning potential is enormous.
Artificial intelligence (AI) is a broader concept involving any technique or system that tries to
mimic human intelligence. That includes machine learning and deep learning as specific
approaches within the field.

Data science is an interdisciplinary field that employs scientific methods and machine learning
algorithms to extract insights and knowledge from structured and unstructured data.

What Can Machine Learning Do?


Machine learning allows computers to recognize patterns in data, understand language, identify
objects in images or videos, make recommendations, and predict future outcomes based on past
data.
In fact, machine learning is revolutionizing numerous industries with its ability to analyze vast
amounts of data and extract valuable insights. These are some of the key applications:

Natural Language Processing (NLP)


NLP enables computers to understand, interpret, and generate human language. Thanks to NLP,
computers can detect sentiment, translate and make text summaries.
Generative AI consists of more modern algorithms that allow computers to return human-like
text. These algorithms are called “Generative Pretrained Transformers” or GPT. They are the
state-of-the-art models for NLP.

Computer Vision
Computer vision teaches computers to interpret and analyze information from images and
videos. It enables machines to “see” and “understand” the world.
Computer vision is used in facial recognition for security systems and authentication, and in self-
driving cars for detecting pedestrians, traffic signs, and other objects on the road. Additionally,
it’s used in healthcare for diagnosing diseases from X-ray images and MRI scans.

Predictive Analytics
Predictive analytics empowers computers to learn patterns from past data, and use them to
forecast future trends, behaviors, or outcomes.
Data scientists use predictive analytics across industries, for example, to detect fraud, assess
credit risk, understand and anticipate customer churn, forecast energy demand, and optimize the
supply chain, among many other applications.

Recommendation Systems
Recommendation systems are algorithms that analyze user preferences. They analyze past
behavior, for example, past purchases, viewed films, or songs listened to and liked, to suggest
personalized content, products, or services that the customer might be interested in.
Recommender systems are used in streaming services like Spotify or Netflix and in e-commerce
like Amazon.

Speech Recognition
Speech recognition involves converting spoken language into text. Once it is in the form of text,
we can use NLP to allow computers to understand it.
Speech recognition is used in virtual assistants and customer services to understand and respond
to users and customers.

Other Applications of Machine Learning


Other areas, like robotics—powered by reinforcement learning—are common examples where
machine learning plays a pivotal role.
As you can see, machine learning offers boundless applications in the real world. These are
possible thanks to different types of machine learning methodologies. What are these
methodologies?

Types of Machine Learning Algorithms


Machine learning methodologies can be broadly categorized into two main types: supervised
learning and unsupervised learning.

Supervised Learning
Supervised learning involves training a model on labeled data, where each input is associated
with an output. The goal of supervised learning is to learn a mapping function from input
variables to output variables. This allows the algorithm to make predictions or decisions when
given new, unseen data.
As we see in the diagram, initially we have a training set containing many observations, and each
observation is labeled. Some are triangles, some are circles, and some are squares. We use that
data to train a machine-learning algorithm. The model learns to match observations to shapes
based on their characteristics. And later on, we can give new observations to the model, and it
will be able to tell us which shape they have.

We can use supervised learning for regression and for classification.

Regression
Regression models predict continuous values. For example, predicting house prices based on
features like square footage, number of bedrooms, and location is an example of a regression.
Popular algorithms for regression are linear regression, polynomial regression, decision tree
regression, random forest regression, and support vector regression.

Classification
Classification models predict discrete outcomes, or categories. For instance, classifying emails as
spam or non-spam based on their content is an example of classification.
Popular algorithms for classification are Logistic Regression, Naive Bayes, Support Vector
Machines, Decision Trees, Random Forest Classifiers, and K-Nearest Neighbors (KNN).
Unsupervised Learning
Unsupervised learning is a type of machine learning where the algorithm learns patterns and
structures from unlabeled data. Unlike supervised learning, there are no predefined labels for
unsupervised learning tasks. Instead, the algorithm seeks to discover hidden patterns or
groupings within the data.

In the following diagram, we pass a dataset without labels to a machine learning model, which,
by analyzing the intrinsic data patterns, learns to group observations based on their similarities:

Unsupervised learning has many applications. It can be used in clustering to find groups of
similar observations. It can be used to simplify the data representation through dimensionality
reduction. It can also be used to find anomalies.

Clustering
Clustering algorithms group similar data points together into clusters. The goal is to identify
natural groupings or clusters in the data without any prior knowledge of their labels. The
grouping is done by identifying similar patterns among variables.
Clustering can be used, for example, in customer segmentation to group together customers with
similar purchasing behaviors. Some machine learning techniques used for clustering are K-
Means Clustering, Hierarchical Clustering, and DBSCAN.
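K-Means can be sketched compactly in one dimension. This illustrative version fixes the initial centroids for determinism; real implementations choose them randomly and work in many dimensions.

```python
# Minimal sketch of K-Means clustering in one dimension. The points and
# starting centroids are invented for illustration.

def kmeans_1d(points, centroids, iters=10):
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster's mean.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]        # two obvious groups
centroids, clusters = kmeans_1d(points, [0.0, 10.0])
print(centroids)   # converges close to [1.0, 8.0]
```

No labels were given: the two groups emerge purely from the similarity of the observations, which is the essence of clustering.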

Dimensionality Reduction
Dimensionality reduction techniques aim to reduce the number of features in a dataset while
preserving its essential information.
Principal Component Analysis is a popular dimensionality reduction technique that projects
high-dimensional data into a lower dimension while preserving as much information as possible.
This can help visualize and analyze complex datasets more effectively.
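For two features, PCA can even be done by hand, because the 2x2 covariance matrix has a closed-form largest eigenvector. The points below are invented and spread mainly along the line y = x, so the first principal component should point along that diagonal. Real datasets would use a library routine instead.

```python
# Minimal sketch of PCA for two features, via the closed-form eigenvector
# of the 2x2 covariance matrix [[a, b], [b, c]] (assumes b != 0).
import math

def first_component(points):
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the covariance matrix.
    a = sum((p[0] - mx) ** 2 for p in points) / n
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    c = sum((p[1] - my) ** 2 for p in points) / n
    # Largest eigenvalue of a symmetric 2x2 matrix, then its eigenvector.
    lam = (a + c + math.sqrt((a - c) ** 2 + 4 * b * b)) / 2
    vx, vy = b, lam - a
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)

# Points spread mainly along the line y = x.
points = [(-2.0, -2.1), (-1.0, -0.9), (1.0, 1.1), (2.0, 1.9)]
print(first_component(points))   # roughly (0.707, 0.707)
```

Projecting the data onto this single direction keeps most of the variance while halving the number of features.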

Anomaly Detection
Anomaly detection with unsupervised learning involves identifying unusual patterns or outliers
in data without labeled examples. By analyzing the inherent structure and distribution of the data,
unsupervised learning algorithms detect deviations or irregularities that stand out from the
typical patterns, thus flagging potential anomalies.
Anomaly detection can be done by clustering and finding observations that do not fit in any
cluster, by determining distributions and flagging outliers, or by using specific machine learning
techniques, like one-class support vector machines or isolation forests.
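The "determining distributions and flagging outliers" approach can be sketched with z-scores: flag any value that sits many standard deviations from the mean. The sensor-style readings and the threshold of 2.5 are illustrative choices (a single extreme point inflates the standard deviation, so a very high threshold would mask it).

```python
# Minimal sketch of distribution-based anomaly detection: flag points whose
# z-score (distance from the mean in standard deviations) exceeds a
# threshold. The data and the 2.5 threshold are invented for illustration.

def z_score_outliers(values, threshold=2.5):
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v for v in values if abs(v - mean) / std > threshold]

readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 55.0]
print(z_score_outliers(readings))  # only the 55.0 reading stands out
```

No labels mark the anomaly in advance; it is detected purely because it deviates from the typical pattern of the data.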

Fundamentals of Machine Learning


As you can see, machine learning has many applications in the real world, thanks to different
types of machine learning methodologies. However, the fundamentals of machine learning are
the same across applications and algorithms.
These machine learning basics include key components such as data, algorithms, training,
testing, and evaluation techniques, which are essential for building effective models that
generalize well to new, unseen data.
Let’s flesh these components out one by one.

Data in Machine Learning


Machines learn patterns, make predictions, and generate insights from data. Data is essential for
model performance, decision-making, and optimization. In fact, the field of data science is
devoted to analyzing, processing, and preparing data, either for machine learning or to extract
insight to drive decisions.
Data comes in many forms. We can have tables with numbers, images, or text. Images and texts
are self-explanatory. Tabular data, however, has different flavors. Let’s discover some of them.

Features and Labels


Tabular data comes in the form of tables, where each row is an observation and each column is a
feature or attribute.
Features, also called variables, are individual measurable properties or characteristics of the data
being analyzed. For example, height is a feature, weight is another feature, as is color, vehicle
make, city of residence, and so on.
Features serve as input variables for machine learning algorithms and can be numeric,
categorical, or binary in nature. They provide the information necessary for the algorithm to
learn patterns and make predictions or decisions.

Labels, also known as targets or responses, are the outcomes or values we want to predict. In a
dataset of house prices, features and variables may include square footage, number of bedrooms,
and location, while the label would be the actual sale price of the house.

Numerical and Categorical Data


Numerical data consists of numerical values that represent quantities or measurements. Examples
of numerical variables are the number of rooms in a house, the median income, and the blood
pressure, among others.
Categorical data consists of categories or labels that represent qualitative attributes or
characteristics. Examples of categorical features are gender, marital status, vehicle make, city of
residence, and so on.

Data Preprocessing
The data that is collected either by automated sensors, machines, or systems is not suitable in its
raw format to train machine learning models. Instead, data scientists devote a lot of time to
preparing data to train machine learning models.
Data preprocessing is done to convert the raw data into a processable form that can be fed to a
machine-learning model for training and making predictions. In fact, data preprocessing is the
initial step in data analysis and machine learning projects.
Data preprocessing includes, among other things, the following:
 Cleaning data, handling missing values and outliers, and removing duplicates.
 Scaling or normalizing data for uniformity.
 Encoding categorical data into a numerical format that the machine can understand.
 Transforming variables to meet model assumptions.
 Extracting features from complex structures, like texts, transactions, or time series.
 Creating new features that capture business knowledge.
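Two of these steps, scaling numeric features and encoding categorical ones, can be sketched directly. The toy `ages` and `cities` values are invented for illustration; libraries like scikit-learn provide production versions of both transforms.

```python
# Minimal sketch of two common preprocessing steps: min-max scaling for a
# numeric feature and one-hot encoding for a categorical one.

def min_max_scale(values):
    # Rescale to the [0, 1] range for uniformity across features.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    # Map each category to a binary indicator vector.
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

ages = [20, 30, 40]
cities = ["goa", "mumbai", "goa"]
print(min_max_scale(ages))   # [0.0, 0.5, 1.0]
print(one_hot(cities))       # [[1, 0], [0, 1], [1, 0]]
```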
Exploratory Data Analysis
Data preprocessing goes hand in hand with exploratory data analysis (EDA). Through EDA, data
scientists seek to understand data patterns, correlations, and trends to gain insights into the
structure, characteristics, and relationships between features.
Visualizations, graphs, and plots are actively used during EDA. This step is crucial for data-
driven decision-making and hypothesis-testing. EDA also aids in creating predictive features and
optimizing model performance.

Training and Validation Data Sets


After data preprocessing and EDA, we are ready to start training machine learning models. To
train machine learning models, we typically split the original data set into separate sets: a
training set, a validation set, and a test set.

Training data
The training dataset is used to train the machine learning model by adjusting its parameters based
on the input features and corresponding target labels.

Validation data
This set is used to evaluate and adjust a model during training. It acts like pseudo-test data,
which provides an independent measure of how well the model generalizes to new data and
makes adjustments to improve its effectiveness.

Test data
This set is used to evaluate the final performance of a trained machine learning model, providing
independent examples with input features and target labels that the model has not seen during
training or validation. It serves as an unbiased measure to assess the model’s effectiveness in
real-world scenarios.
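A three-way split can be sketched as below. The 70/15/15 proportions are a common but arbitrary choice, and a real split should shuffle the data first; both are assumptions of this illustration.

```python
# Minimal sketch of splitting a dataset into training, validation, and
# test sets. Proportions (70/15/15) are an illustrative choice, and the
# data should normally be shuffled before splitting.

def split(data, train_frac=0.7, val_frac=0.15):
    n = len(data)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

data = list(range(100))
train, val, test = split(data)
print(len(train), len(val), len(test))   # 70 15 15
```

The key property is that the three sets are disjoint: the model never sees validation or test examples during training.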

Model Training
With the data ready, it is time to train and evaluate the machine learning models. Model training
involves feeding the training data into a machine learning algorithm to adjust its parameters and
optimize its performance.
During model training and evaluation, it’s important to watch out for two common pitfalls:
overfitting and underfitting.

Overfitting & Underfitting


Overfitting occurs when a model learns the training data too well, capturing noise or irrelevant
patterns that do not generalize to new data. This leads to poor performance on unseen data.
Underfitting occurs when a model is too simplistic to capture the underlying patterns in the data.
This leads to poor performance both on the training and test datasets.

Bias & Variance


Overfitting and underfitting are related to the trade-off between bias and variance in model
performance. Bias refers to the error due to overly simplistic assumptions in the model, and
variance relates to the model’s sensitivity to fluctuations in the training data.
Bias represents the error introduced by the model’s assumptions or simplifications. High-bias
models underfit the data, leading to poor performance on both training and test datasets.
Variance represents the sensitivity of our model to a given data point. High-variance models
may overfit the data, capturing noise or irrelevant patterns and failing to generalize to new data.
A good model should strike a balance between bias and variance, known as the “bias-variance
tradeoff.”

Hyperparameters
Hyperparameters are like settings or configurations that govern how a machine learning model
operates. These parameters are not learned from the data but are rather adjusted to control the
learning of a model.
Hyperparameters can be considered like the knobs of the machine learning model, which we can
adjust to make changes to how the model fits the data. Examples of hyperparameters are the
maximum depth of a decision tree, the number of trees in a random forest, or the kernel type in
SVM.
Methods like grid search and random search are used to find the optimal values for these
hyperparameters in a process called hyperparameter optimization to achieve the best
performance from a model.
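Grid search itself is simple to sketch: enumerate every combination of hyperparameter values and keep the best-scoring one. The `validation_error` function below is a stand-in for "train a model with these settings and measure its validation error"; its shape and the parameter names are invented for illustration.

```python
# Minimal sketch of grid search: try every combination of hyperparameter
# values and keep the one with the best (lowest) validation score.
import itertools

def grid_search(param_grid, score):
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for combo in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        s = score(params)                 # stand-in for train + evaluate
        if s < best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Pretend validation error: smallest when max_depth=3 and n_trees=100.
def validation_error(p):
    return abs(p["max_depth"] - 3) + abs(p["n_trees"] - 100) / 100

grid = {"max_depth": [1, 3, 5], "n_trees": [50, 100, 200]}
best_params, best_score = grid_search(grid, validation_error)
print(best_params, best_score)   # {'max_depth': 3, 'n_trees': 100} 0.0
```

Random search works the same way but samples combinations instead of enumerating all of them, which scales better when the grid is large.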

Cross-validation
Cross-validation is a technique used to assess the performance and generalization ability of
machine learning models. It involves dividing the dataset into multiple subsets, training the
model on different combinations of these subsets, and evaluating its performance on the
remaining data, aiding in obtaining a more reliable estimate of the model’s performance.
K-fold cross-validation is a popular cross-validation technique. The dataset is divided into K
equal-sized subsets (folds). The model is trained K times, each time using K-1 folds for training
and the remaining fold for validation. This ensures that each data point is used for validation
exactly once. The final performance is calculated by averaging the results from the K validation
runs.
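The K-fold index bookkeeping described above can be sketched as follows; the last fold simply absorbs any remainder when the dataset size is not divisible by K.

```python
# Minimal sketch of K-fold cross-validation splitting: each of the K folds
# serves as the validation set exactly once, with the other K-1 for training.

def k_fold_indices(n, k):
    indices = list(range(n))
    fold_size = n // k
    folds = []
    for i in range(k):
        start = i * fold_size
        end = start + fold_size if i < k - 1 else n
        val = indices[start:end]
        train = indices[:start] + indices[end:]
        folds.append((train, val))
    return folds

folds = k_fold_indices(10, 5)
for train, val in folds:
    print(val)   # each data point appears in exactly one validation fold
```

The final performance estimate would be the average of the model's score over these K train/validation pairs.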

Model Evaluation
To assess the performance of a model, we use evaluation metrics. These metrics measure the
error in the model’s predictions. “Error” in machine learning refers to the difference between the
predicted values generated by a model and the actual values observed in the dataset. The smaller
the error, the better the performance of the model.
There are evaluation metrics for regression and for classification models.

Regression Metrics
There are several metrics that help us determine the performance of a regression model. Here, I
describe the most common ones.
Mean Squared Error (MSE): Measures the average squared difference between the predicted
and actual values. A smaller MSE indicates better model performance.
Root Mean Squared Error (RMSE): Similar to MSE but takes the square root of the average
squared difference. It’s easier to interpret since it’s in the same units as the target variable.
Mean Absolute Error (MAE): Measures the average absolute difference between the predicted
and actual values. It provides a more interpretable measure of error compared to MSE.
R-squared: Indicates how well the independent variables in a regression model explain the
variation in the dependent variable. R-squared values vary between 0 and 1, with higher values
indicating better model fit.
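These four metrics follow directly from their definitions. The predictions below are invented so the results are easy to check by hand.

```python
# Minimal sketch of the regression metrics described above, computed on
# invented true values and predictions.

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return mse(y_true, y_pred) ** 0.5

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
print(mse(y_true, y_pred))        # 0.125
print(mae(y_true, y_pred))        # 0.25
print(r_squared(y_true, y_pred))  # 0.975
```

Note how MAE (0.25) is in the same units as the target, while MSE (0.125) squares the errors and so penalizes large misses more heavily.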

Classification Metrics
These are the most common evaluation metrics for classification:
Accuracy: Measures the proportion of correctly classified instances.
Precision: measures the proportion of true positive predictions out of all positive predictions
made by the model. It focuses on the accuracy of positive predictions.
Recall: Measures the proportion of true positive predictions out of all actual positive instances in
the dataset. It focuses on the model’s ability to capture all positive instances.
F1 Score: The harmonic mean of precision and recall, the F1 score provides a balance between
precision and recall.
ROC Curve (Receiver Operating Characteristic Curve): A graphical plot that illustrates the
trade-off between true positive rate (TPR) and false positive rate (FPR) across different threshold
values. The higher the area under the ROC curve, the better the performance.
Confusion matrix: a table that summarizes the performance of a classification model by
comparing actual and predicted class labels. It provides insights into the model’s true positive,
true negative, false positive, and false negative predictions.
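All of these classification metrics derive from the four confusion-matrix counts. The spam/ham labels below are invented for illustration.

```python
# Minimal sketch of classification metrics computed from confusion-matrix
# counts (true/false positives and negatives). Labels invented.

def confusion(y_true, y_pred, positive="spam"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

y_true = ["spam", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "spam", "spam", "ham"]

tp, fp, fn, tn = confusion(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)                       # of predicted positives
recall = tp / (tp + fn)                          # of actual positives
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)
```

Here precision and recall both come out to 2/3: one ham email was wrongly flagged as spam (a false positive) and one spam email slipped through (a false negative).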

Conclusion
Machine learning is revolutionizing how we approach digital challenges. It empowers computers
to learn autonomously, uncover patterns in data, and transform industries with predictive
insights. By grasping the machine learning basics, we open doors to endless possibilities,
enabling collaboration between humans and machines for a brighter, more innovative future.
What is the Inductive Learning Algorithm?
Inductive Learning Algorithm (ILA) is an iterative, inductive machine learning algorithm
used for generating a set of classification rules of the form “IF-THEN” from a set of
examples, producing rules at each iteration and appending them to the set of rules.
There are basically two methods for knowledge extraction: from domain experts, and through
machine learning. For a very large amount of data, domain experts are not very useful or
reliable, so we move towards the machine learning approach. One machine-learning method is
to replicate the expert’s logic in the form of algorithms, but this work is very tedious,
time-consuming, and expensive. So we move towards inductive algorithms, which generate the
strategy for performing a task without needing separate instructions at each step.

Why Should You Use Inductive Learning?


The ILA is a newer algorithm that was needed even when other inductive learning algorithms
like ID3 and AQ were available.

 The need arose from the pitfalls present in the previous algorithms; one of the major
pitfalls was the lack of generalization of rules.
 ID3 and AQ used the decision tree production method, which was too specific, difficult
to analyze, and very slow to perform for basic short classification problems.
 The decision-tree-based algorithms were unable to work on a new problem if some
attributes were missing.
 The ILA uses the method of producing a general set of rules instead of decision trees,
which overcomes the above problems.

Basic Requirements to Apply Inductive Learning Algorithm


1. List the examples in the form of a table ‘T’ where each row corresponds to an
example and each column contains an attribute value.
2. Create a set of m training examples, each example composed of k attributes and a
class attribute with n possible decisions.
3. Create a rule set, R, having the initial value false.
4. Initially, all rows in the table are unmarked.

Necessary Steps for Implementation

 Step 1: Divide the table ‘T’ containing m examples into n sub-tables (t1, t2, ….. tn),
one table for each possible value of the class attribute. (Repeat steps 2-8 for each
sub-table.)
 Step 2: Initialize the attribute combination count ‘j’ = 1.
 Step 3: For the sub-table on which work is going on, divide the attribute list into
distinct combinations, each combination with ‘j’ distinct attributes.
 Step 4: For each combination of attributes, count the number of occurrences of
attribute values that appear under the same combination of attributes in unmarked
rows of the sub-table under consideration and, at the same time, do not appear under
the same combination of attributes in other sub-tables. Call the first combination
with the maximum number of occurrences the max-combination ‘MAX’.
 Step 5: If ‘MAX’ == null, increase ‘j’ by 1 and go to Step 3.
 Step 6: Mark all rows of the sub-table being worked on, in which the values of
‘MAX’ appear, as classified.
 Step 7: Add a rule (IF attribute = “XYZ” –> THEN decision is YES/NO) to R
whose left-hand side has the attribute names of ‘MAX’ with their values
separated by AND, and whose right-hand side contains the decision attribute value
associated with the sub-table.
 Step 8: If all rows are marked as classified, then move on to process another
sub-table and go to Step 2. Else, go to Step 4. If no sub-tables are available, exit with
the set of rules obtained till then.
An example showing the use of ILA: suppose an example set has the attributes place type,
weather, and location, plus a decision attribute, with seven examples. Our task is to generate a
set of rules that tell us, under each condition, what the decision is.

Example no.   Place type   Weather   Location   Decision

1.            hilly        winter    kullu      Yes
2.            mountain     windy     Mumbai     No
3.            mountain     windy     Shimla     Yes
4.            beach        windy     Mumbai     No
5.            beach        warm      goa        Yes
6.            beach        windy     goa        No
7.            beach        warm      Shimla     Yes

Subset – 1

s.no   Place type   Weather   Location   Decision

1.     hilly        winter    kullu      Yes
2.     mountain     windy     Shimla     Yes
3.     beach        warm      goa        Yes
4.     beach        warm      Shimla     Yes

Subset – 2

s.no   Place type   Weather   Location   Decision

5.     mountain     windy     Mumbai     No
6.     beach        windy     Mumbai     No
7.     beach        windy     goa        No

 At iteration 1, the weather column is selected for rows 3 & 4, and rows 3 & 4 are
marked. The rule added to R: IF the weather is warm THEN the decision is yes.
 At iteration 2, the place type column is selected for row 1, and row 1 is marked. The
rule added to R: IF the place type is hilly THEN the decision is yes.
 At iteration 3, the location column is selected for row 2, and row 2 is marked. The
rule added to R: IF the location is Shimla THEN the decision is yes.
 At iteration 4, the location column is selected for rows 5 & 6, and rows 5 & 6 are
marked. The rule added to R: IF the location is Mumbai THEN the decision is no.
 At iteration 5, the place type & weather columns are selected for row 7, and row 7 is
marked. The rule added to R: IF the place type is beach AND the weather is
windy THEN the decision is no.
Finally, we get the rule set:
 Rule 1: IF the weather is warm THEN the decision is yes.
 Rule 2: IF the place type is hilly THEN the decision is yes.
 Rule 3: IF the location is Shimla THEN the decision is yes.
 Rule 4: IF the location is Mumbai THEN the decision is no.
 Rule 5: IF the place type is beach AND the weather is windy THEN the decision is
no.
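The numbered steps above can be sketched in Python. This is an illustrative implementation, not a canonical one: tie-breaking among equally frequent combinations is arbitrary here, so rules may be discovered in a different order than in the walkthrough, but for this table the same five rules come out.

```python
# Sketch of the ILA procedure, following the numbered steps above.
# Attribute names and the example table are taken from the text.
from itertools import combinations

examples = [
    {"place type": "hilly",    "weather": "winter", "location": "kullu",  "decision": "Yes"},
    {"place type": "mountain", "weather": "windy",  "location": "Mumbai", "decision": "No"},
    {"place type": "mountain", "weather": "windy",  "location": "Shimla", "decision": "Yes"},
    {"place type": "beach",    "weather": "windy",  "location": "Mumbai", "decision": "No"},
    {"place type": "beach",    "weather": "warm",   "location": "goa",    "decision": "Yes"},
    {"place type": "beach",    "weather": "windy",  "location": "goa",    "decision": "No"},
    {"place type": "beach",    "weather": "warm",   "location": "Shimla", "decision": "Yes"},
]
attributes = ["place type", "weather", "location"]

def ila(examples, attributes, class_attr="decision"):
    rules = []
    classes = sorted({e[class_attr] for e in examples})
    for cls in classes:                                   # Step 1: sub-tables
        sub = [e for e in examples if e[class_attr] == cls]
        others = [e for e in examples if e[class_attr] != cls]
        marked = [False] * len(sub)
        j = 1                                             # Step 2
        while not all(marked) and j <= len(attributes):
            # Steps 3-4: among all j-attribute combinations, count value
            # tuples in unmarked rows that never occur in other sub-tables.
            best, best_count = None, 0
            for combo in combinations(attributes, j):
                forbidden = {tuple(e[a] for a in combo) for e in others}
                counts = {}
                for i, e in enumerate(sub):
                    vals = tuple(e[a] for a in combo)
                    if not marked[i] and vals not in forbidden:
                        counts[vals] = counts.get(vals, 0) + 1
                for vals, count in counts.items():
                    if count > best_count:
                        best, best_count = (combo, vals), count
            if best is None:                              # Step 5: MAX is null
                j += 1
                continue
            combo, vals = best
            for i, e in enumerate(sub):                   # Step 6: mark rows
                if tuple(e[a] for a in combo) == vals:
                    marked[i] = True
            rules.append((dict(zip(combo, vals)), cls))   # Step 7: add rule
    return rules

rules = ila(examples, attributes)
for condition, decision in rules:
    print("IF", condition, "THEN", decision)
```

Running this recovers the five rules of the rule set above, and by construction every rule's condition matches only rows of its own class.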
 What is Artificial Neural Network?
 The term "Artificial Neural Network" is derived from Biological neural networks that
develop the structure of a human brain. Similar to the human brain that has neurons
interconnected to one another, artificial neural networks also have neurons that are
interconnected to one another in various layers of the networks. These neurons are known
as nodes.


 The given figure illustrates the typical diagram of Biological Neural Network.
 The typical Artificial Neural Network looks something like the given figure.

 Dendrites from Biological Neural Network represent inputs in Artificial Neural
Networks, cell nucleus represents Nodes, synapse represents Weights, and Axon
represents Output.
 Relationship between Biological neural network and artificial neural network:
 An Artificial Neural Network is an attempt, in the field of Artificial Intelligence, to
mimic the network of neurons that makes up a human brain, so that computers will have
an option to understand things and make decisions in a human-like manner. The artificial
neural network is designed by programming computers to behave simply like
interconnected brain cells.
 There are around 86 billion neurons in the human brain. Each neuron has an
association point somewhere in the range of 1,000 and 100,000. In the human brain, data
is stored in a distributed manner, and we can extract more than one piece of this data
when necessary from our memory in parallel. We can say that the human brain is
made up of incredibly amazing parallel processors.
 We can understand the artificial neural network with an example, consider an example of
a digital logic gate that takes an input and gives an output. "OR" gate, which takes two
inputs. If one or both the inputs are "On," then we get "On" in output. If both the inputs
are "Off," then we get "Off" in output. Here the output depends upon input. Our brain
does not perform the same task. The outputs to inputs relationship keep changing because
of the neurons in our brain, which are "learning."
 The architecture of an artificial neural network:
 To understand the architecture of an artificial neural network, we have to understand
what a neural network consists of: a large number of artificial neurons, termed units,
arranged in a sequence of layers. Let us look at the various types of layers available in an
artificial neural network.
 An Artificial Neural Network primarily consists of three layers:


 Input Layer:
 As the name suggests, it accepts inputs in several different formats provided by the
programmer.
 Hidden Layer:
 The hidden layer presents in-between input and output layers. It performs all the
calculations to find hidden features and patterns.
 Output Layer:
 The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
 The artificial neural network takes input and computes the weighted sum of the inputs
and includes a bias. This computation is represented in the form of a transfer function.
 The weighted total is then passed as an input to an activation function to produce the
output. The activation function decides whether a node should fire or not; only the
nodes that fire make it to the output layer. There are distinctive activation functions
available that can be applied depending upon the sort of task we are performing.
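The computation described above (weighted sum of the inputs plus a bias, passed through an activation function) can be sketched in a few lines of Python. The weights, bias, and threshold below are illustrative values chosen to reproduce the OR gate mentioned earlier, not values from the notes:

```python
# Minimal sketch of a single artificial neuron: weighted sum + bias,
# passed through a step activation function. Values are illustrative.

def neuron(inputs, weights, bias, threshold=0.0):
    # Weighted sum of the inputs plus the bias (the "transfer function")
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Step activation: the node "fires" (outputs 1) only above the threshold
    return 1 if total >= threshold else 0

# An OR gate, as in the example above: fires if either input is on
or_weights = [1.0, 1.0]
print(neuron([0, 0], or_weights, bias=0.0, threshold=0.5))  # 0
print(neuron([1, 0], or_weights, bias=0.0, threshold=0.5))  # 1
print(neuron([1, 1], or_weights, bias=0.0, threshold=0.5))  # 1
```

Unlike this fixed gate, a real neural network adjusts the weights during training, which is what makes the input-output relationship change as the network "learns."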
 Advantages of Artificial Neural Network (ANN)
 Parallel processing capability:
Artificial neural networks can perform more than one task simultaneously.
 Storing data on the entire network:
 Data that is used in traditional programming is stored on the whole network, not on a
database. The disappearance of a couple of pieces of data in one place doesn't prevent the
network from working.
 Capability to work with incomplete knowledge:
After training, an ANN may produce output even with incomplete data. The
loss of performance here depends upon the significance of the missing data.
 Having a memory distribution:
For an ANN to be able to adapt, it is important to determine the examples and to
train the network according to the desired output by demonstrating these examples
to the network. The success of the network is directly proportional to the chosen
instances; if the event can't be shown to the network in all its aspects, it can produce
false output.
 Having fault tolerance:
Corruption of one or more cells of an ANN does not prevent it from generating output, and
this feature makes the network fault-tolerant.
 Disadvantages of Artificial Neural Network:
 Assurance of proper network structure:
 There is no particular guideline for determining the structure of artificial neural networks.
The appropriate network structure is accomplished through experience, trial, and error.
 Unrecognized behavior of the network:
It is the most significant issue of ANN. When an ANN produces a solution, it does
not provide insight concerning why and how it was reached. This decreases trust in the network.
 Hardware dependence:
Artificial neural networks need processors with parallel processing power, in accordance
with their structure, making their realization dependent on suitable hardware.
 Difficulty of showing the issue to the network:
 ANNs can work with numerical data. Problems must be converted into numerical values
before being introduced to ANN. The presentation mechanism to be resolved here will
directly impact the performance of the network. It relies on the user's abilities.
 The duration of the network is unknown:
Training stops when the network error is reduced to a specific value, but this
value does not guarantee optimal results.

 Artificial neural networks, which stepped into the world in the mid-20th
century, are developing exponentially. Above we have examined the pros of
artificial neural networks and the issues encountered in the course of their
utilization. It should not be overlooked that the cons of ANNs, a flourishing
branch of science, are being eliminated one by one, while their pros increase
day by day. This means that artificial neural networks will progressively
become an irreplaceable part of our lives.
 How do artificial neural networks work?
 An Artificial Neural Network can be best represented as a weighted directed graph, where
the artificial neurons form the nodes. The associations between neuron outputs and
neuron inputs can be viewed as directed edges with weights. The Artificial Neural
Network receives the input signal from the external source in the form of a pattern or
image, represented as a vector. These inputs are then mathematically denoted by the
notation x(n) for every n-th input.

 Afterward, each input is multiplied by its corresponding weight (these weights
are the details utilized by the artificial neural network to solve a specific problem). In
general terms, these weights represent the strength of the interconnections
between neurons inside the artificial neural network. All the weighted inputs are
summed inside the computing unit.
 If the weighted sum is equal to zero, then a bias is added to make the output non-zero,
or to otherwise scale up the system's response. The bias behaves like an extra input
fixed at 1 with its own weight. The total of the weighted inputs can lie in the range 0 to
positive infinity; to keep the response within the limits of the desired value, a certain
maximum value is benchmarked, and the total of the weighted inputs is passed through the
activation function.
 The activation function refers to the set of transfer functions used to achieve the desired
output. There are different kinds of activation functions, primarily either linear or
non-linear sets of functions. Some of the commonly used activation functions are
the binary, linear, and tan hyperbolic sigmoidal activation functions. Let us take a look
at each of them in detail:
 Binary:
 In the binary activation function, the output is either a 1 or a 0. To accomplish this,
a threshold value is set up. If the net weighted input of the neuron is more than the
threshold, then the final output of the activation function is returned as 1; otherwise
the output is returned as 0.
 Sigmoidal Hyperbolic:
 The Sigmoidal Hyperbola function is generally seen as an "S"-shaped curve. Here the tan
hyperbolic function is used to approximate output from the actual net input. The function
is defined as:
 F(x) = 1 / (1 + exp(−βx))
 Where β is considered the steepness parameter.
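The two activation functions just described can be sketched directly; `beta` here stands for the steepness parameter β, and the threshold value is an illustrative choice:

```python
import math

# Sketch of the binary (step) and sigmoidal activation functions described above.

def binary(x, threshold=0.0):
    # Output 1 if the net input exceeds the threshold, else 0
    return 1 if x > threshold else 0

def sigmoid(x, beta=1.0):
    # F(x) = 1 / (1 + exp(-beta * x)); squashes any input into (0, 1)
    return 1.0 / (1.0 + math.exp(-beta * x))

print(binary(0.7, threshold=0.5))   # 1
print(round(sigmoid(0.0), 2))       # 0.5 (midpoint of the "S" curve)
```

Increasing `beta` makes the sigmoid's S-curve steeper, so it behaves more and more like the binary step function.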
 Types of Artificial Neural Network:
 There are various types of Artificial Neural Networks (ANN); depending upon the human
brain's neurons and network functions, an artificial neural network performs tasks in a
similar way. The majority of artificial neural networks have some similarities with
their more complex biological counterparts and are very effective at their intended
tasks, for example segmentation or classification.
 Feedback ANN:
 In this type of ANN, the output returns into the network to accomplish the best-evolved
results internally. As per the University of Massachusetts Lowell Center for
Atmospheric Research, feedback networks feed information back into themselves and are
well suited to solving optimization issues. Internal system error corrections utilize
feedback ANNs.
 Feed-Forward ANN:
 A feed-forward network is a basic neural network comprising an input layer, an output
layer, and at least one hidden layer of neurons. By assessing its output with respect to
its input, the strength of the network can be observed from the group behavior of the
associated neurons, and the output is decided. The primary advantage of this network is
that it learns to evaluate and recognize input patterns.

What is Perceptron?
Perceptron is a type of neural network that performs binary classification that maps input
features to an output decision, usually classifying data into one of two categories, such as 0 or
1.
Perceptron consists of a single layer of input nodes that are fully connected to a layer of output
nodes. It is particularly good at learning linearly separable patterns. It utilizes a variation of
artificial neurons called Threshold Logic Units (TLU), which were first introduced by
Warren McCulloch and Walter Pitts in the 1940s. This foundational model has played a crucial role in
the development of more advanced neural networks and machine learning algorithms.
Types of Perceptron
1. Single-Layer Perceptron: this type of perceptron is limited to learning linearly
separable patterns. It is effective for tasks where the data can be divided into distinct
categories by a straight line. While powerful in its simplicity, it struggles with more
complex problems where the relationship between inputs and outputs is non-linear.
2. Multi-Layer Perceptron: multi-layer perceptrons possess enhanced processing
capabilities, as they consist of two or more layers, adept at handling more complex
patterns and relationships within the data.
Basic Components of Perceptron
A Perceptron is composed of key components that work together to process information and
make predictions.
 Input Features: The perceptron takes multiple input features, each representing a
characteristic of the input data.
 Weights: Each input feature is assigned a weight that determines its influence on the
output. These weights are adjusted during training to find the optimal values.
 Summation Function: The perceptron calculates the weighted sum of its inputs,
combining them with their respective weights.
 Activation Function: The weighted sum is passed through the Heaviside step
function, comparing it to a threshold to produce a binary output (0 or 1).
 Output: The final output is determined by the activation function, often used
for binary classification tasks.
 Bias: The bias term helps the perceptron make adjustments independent of the input,
improving its flexibility in learning.
 Learning Algorithm: The perceptron adjusts its weights and bias using a learning
algorithm, such as the Perceptron Learning Rule, to minimize prediction errors.
These components enable the perceptron to learn from data and make predictions. While a
single perceptron can handle simple binary classification, complex tasks require multiple
perceptrons organized into layers, forming a neural network.
How does Perceptron work?
A weight is assigned to each input node of a perceptron, indicating the importance of that input
in determining the output. The Perceptron’s output is calculated as a weighted sum of the
inputs, which is then passed through an activation function to decide whether the Perceptron
will fire.
The weighted sum is computed as:
z = w1x1 + w2x2 + … + wnxn = XᵀW
The step function compares this weighted sum to a threshold. If the input is larger than the
threshold value, the output is 1; otherwise, it's 0. The activation function most commonly
used in perceptrons is the Heaviside step function:
h(z) = 0 if z < threshold, 1 if z ≥ threshold
A perceptron consists of a single layer of Threshold Logic Units (TLU), with each TLU fully
connected to all input nodes.

Threshold Logic units

In a fully connected layer, also known as a dense layer, all neurons in one layer are connected
to every neuron in the previous layer.
The output of the fully connected layer is computed as:
f_W,b(X) = h(XW + b)
where X is the input, W is the weight matrix, b is the bias, and h is the step function.
During training, the Perceptron’s weights are adjusted to minimize the difference between the
predicted output and the actual output. This is achieved using supervised learning algorithms
like the delta rule or the Perceptron learning rule.
The weight update formula is:
w_i,j = w_i,j + η(y_j − ŷ_j)x_i
Where:
 w_i,j is the weight between the i-th input and j-th output neuron,
 x_i is the i-th input value,
 y_j is the actual value, and ŷ_j is the predicted value,
 η is the learning rate, controlling how much the weights are adjusted.
This process enables the perceptron to learn from data and improve its prediction accuracy
over time.
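The learning rule above can be sketched as a short training loop. The task here (learning the logical AND function), the learning rate, and the epoch count are illustrative choices; AND is linearly separable, so the perceptron is guaranteed to converge on it:

```python
# Sketch of the perceptron learning rule, trained on the logical AND function.

def step(z, threshold=0.0):
    # Heaviside step activation
    return 1 if z >= threshold else 0

def train_perceptron(data, lr=0.1, epochs=20):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            y_hat = step(w[0] * x1 + w[1] * x2 + b)
            # w_i <- w_i + eta * (y - y_hat) * x_i  (and likewise for the bias)
            w[0] += lr * (y - y_hat) * x1
            w[1] += lr * (y - y_hat) * x2
            b += lr * (y - y_hat)
    return w, b

and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_data)
for (x1, x2), y in and_data:
    print((x1, x2), step(w[0] * x1 + w[1] * x2 + b))  # matches each label y
```

Note that the weights only change when the prediction is wrong (the error term y − ŷ is zero otherwise), which is exactly the "adjust to minimize prediction errors" behavior described above.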
Example: Perceptron in Action
Let’s take a simple example of classifying whether a given fruit is an apple or not based on two
inputs: its weight (in grams) and its color (on a scale of 0 to 1, where 1 means red). The
perceptron receives these inputs, multiplies them by their weights, adds a bias, and applies the
activation function to decide whether the fruit is an apple or not.
 Input 1 (Weight): 150 grams
 Input 2 (Color): 0.9 (since the fruit is mostly red)
 Weights: [0.5, 1.0]
 Bias: 1.5
The perceptron’s weighted sum would be:
(150 × 0.5) + (0.9 × 1.0) + 1.5 = 77.4
Let’s assume the activation function uses a threshold of 75. Since 77.4 > 75, the perceptron
classifies the fruit as an apple (output = 1).
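The arithmetic can be checked directly: 150 · 0.5 = 75, and adding 0.9 · 1.0 and the bias of 1.5 gives 77.4, which clears the threshold of 75:

```python
# Checking the fruit example's weighted sum: (150 * 0.5) + (0.9 * 1.0) + 1.5
weights = [0.5, 1.0]
inputs = [150, 0.9]
bias = 1.5

z = sum(x * w for x, w in zip(inputs, weights)) + bias
print(z)                    # 77.4
print(1 if z > 75 else 0)   # 1 -> classified as an apple
```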
Reinforcement Learning: An Overview
Reinforcement Learning (RL) is a branch of machine learning focused on making decisions to
maximize cumulative rewards in a given situation. Unlike supervised learning, which relies on
a training dataset with predefined answers, RL involves learning through experience. In RL,
an agent learns to achieve a goal in an uncertain, potentially complex environment by
performing actions and receiving feedback through rewards or penalties.
Key Concepts of Reinforcement Learning
 Agent: The learner or decision-maker.
 Environment: Everything the agent interacts with.
 State: A specific situation in which the agent finds itself.
 Action: All possible moves the agent can make.
 Reward: Feedback from the environment based on the action taken.
How Reinforcement Learning Works
RL operates on the principle of learning optimal behavior through trial and error. The agent
takes actions within the environment, receives rewards or penalties, and adjusts its behavior to
maximize the cumulative reward. This learning process is characterized by the following
elements:
 Policy: A strategy used by the agent to determine the next action based on the current
state.
 Reward Function: A function that provides a scalar feedback signal based on the state
and action.
 Value Function: A function that estimates the expected cumulative reward from a
given state.
 Model of the Environment: A representation of the environment that helps in
planning by predicting future states and rewards.
Example: Navigating a Maze
The problem is as follows: we have an agent and a reward, with many hurdles in between.
The agent is supposed to find the best possible path to reach the reward. The following
example explains the problem more clearly.
The above image shows the robot, the diamond, and the fire. The goal of the robot is to get the
reward, the diamond, while avoiding the hurdles, the fire. The robot learns by trying all
the possible paths and then choosing the path that reaches the reward with the fewest
hurdles. Each right step gives the robot a reward, and each wrong step subtracts from the
robot's reward. The total reward is calculated when it reaches the final reward, that
is, the diamond.
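A trial-and-error learner like the robot above can be sketched with tabular Q-learning on a tiny one-dimensional "corridor." The grid size, the reward values, and the hyper-parameters (learning rate alpha, discount gamma, exploration rate epsilon) are all illustrative assumptions, not part of the notes:

```python
import random

# Tabular Q-learning sketch: an agent on a 1-D corridor of cells 0..4 must
# reach the "diamond" at cell 4 (+10 reward); every other step costs -1.
random.seed(0)

N, GOAL = 5, 4
ACTIONS = [-1, +1]                      # move left / move right
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(200):                    # episodes of trial and error
    s = 0
    while s != GOAL:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N - 1)
        r = 10 if s2 == GOAL else -1
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# After training, the greedy policy in every cell is "move right"
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)]
print(policy)  # [1, 1, 1, 1]
```

Like the robot, the agent is never told the path; it discovers that moving right maximizes the total reward purely from the rewards and penalties it receives.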

Main points in Reinforcement learning –


 Input: The input should be an initial state from which the model will start
 Output: There are many possible outputs as there are a variety of solutions to a
particular problem
 Training: The training is based upon the input, The model will return a state and the
user will decide to reward or punish the model based on its output.
 The model continues to learn.
 The best solution is decided based on the maximum reward.

Difference between Reinforcement learning and Supervised learning:

Reinforcement learning:
 Reinforcement learning is all about making decisions sequentially. In simple words,
the output depends on the state of the current input, and the next input depends on the
output of the previous input.
 Decisions are dependent, so we give labels to sequences of dependent decisions.
 Example: chess game, text summarization.

Supervised learning:
 The decision is made on the initial input, or the input given at the start.
 Decisions are independent of each other, so labels are given to each decision.
 Example: object recognition, spam detection.
Types of Reinforcement:
1. Positive: Positive reinforcement is defined as when an event that occurs due to a
particular behavior increases the strength and frequency of that behavior. In other
words, it has a positive effect on behavior.
Advantages:
 Maximizes performance
 Sustains change for a long period of time
Disadvantage:
 Too much reinforcement can lead to an overload of states, which can diminish
the results
2. Negative: Negative reinforcement is defined as the strengthening of a behavior because a
negative condition is stopped or avoided.
Advantages:
 Increases behavior
 Provides defiance to a minimum standard of performance
Disadvantage:
 It only provides enough to meet the minimum behavior
Elements of Reinforcement Learning
i) Policy: Defines the agent’s behavior at a given time.
ii) Reward Function: Defines the goal of the RL problem by providing feedback.
iii) Value Function: Estimates long-term rewards from a state.
iv) Model of the Environment: Helps in predicting future states and rewards for planning.
Application of Reinforcement Learnings
i) Robotics: Automating tasks in structured environments like manufacturing.
ii) Game Playing: Developing strategies in complex games like chess.
iii) Industrial Control: Real-time adjustments in operations like refinery controls.
iv) Personalized Training Systems: Customizing instruction based on individual needs.
Advantages and Disadvantages of Reinforcement Learning
Advantages:
1. Reinforcement learning can be used to solve very complex problems that cannot be solved
by conventional techniques.
2. The model can correct the errors that occurred during the training process.
3. In RL, training data is obtained via the direct interaction of the agent with the environment
4. Reinforcement learning can handle environments that are non-deterministic, meaning that
the outcomes of actions are not always predictable. This is useful in real-world applications
where the environment may change over time or is uncertain.
5. Reinforcement learning can be used to solve a wide range of problems, including those that
involve decision making, control, and optimization.
6. Reinforcement learning is a flexible approach that can be combined with other machine
learning techniques, such as deep learning, to improve performance.
Disadvantages:
1. Reinforcement learning is not preferable to use for solving simple problems.
2. Reinforcement learning needs a lot of data and a lot of computation
3. Reinforcement learning is highly dependent on the quality of the reward function. If the
reward function is poorly designed, the agent may not learn the desired behavior.
4. Reinforcement learning can be difficult to debug and interpret. It is not always clear why
the agent is behaving in a certain way, which can make it difficult to diagnose and fix
problems.
Conclusion
Reinforcement learning is a powerful technique for decision-making and optimization in
dynamic environments. Its applications range from robotics to personalized learning systems.
However, the complexity of RL requires careful design of reward functions and significant
computational resources. By understanding its principles and applications, one can leverage
RL to solve intricate real-world problems.

Passive and Active learning in Machine Learning


Machine learning is a subfield of artificial intelligence that deals with the creation of algorithms
that can learn and improve themselves without explicit programming. One of the most critical
factors that contribute to the success of a machine learning model is the quality and quantity of
data used to train it. Passive learning and active learning are two approaches used in machine
learning to acquire data.
Passive Learning:
Passive learning, also known as batch learning, is a method of acquiring data by processing a
large set of pre-labeled data. In passive learning, the algorithm uses all the available data to learn
and improve its performance. The algorithm does not interact with the user or request additional
data to improve its accuracy.

Example: An example of passive learning is training a machine learning model to classify
emails as spam or not spam. The algorithm is fed a large dataset of labeled emails and uses it to
learn how to identify spam emails. Once the training is complete, the algorithm can accurately
classify new emails without any further input from the user.
Active Learning:
Active learning is a method of acquiring data where the algorithm interacts with the user to
acquire additional data to improve its accuracy. In active learning, the algorithm starts with a
small set of labeled data and requests the user to label additional data. The algorithm uses the
newly labeled data to improve its performance and may continue to request additional data until
a satisfactory level of accuracy is achieved.
Example:- An example of active learning is training a machine learning model to recognize
handwritten digits. The algorithm may start with a small set of labeled data and ask the user to
label additional data that the algorithm is uncertain about. The algorithm uses the newly labeled
data to improve its accuracy, and the process repeats until the algorithm can accurately recognize
most handwritten digits.
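The query loop just described can be sketched with uncertainty sampling: the learner repeatedly asks the "user" (here simulated by an `oracle` function) to label the example it is least sure about. The model here is a deliberately simple 1-D threshold classifier, an illustrative stand-in for a real learner:

```python
# Active-learning sketch: uncertainty sampling with a 1-D threshold model.
# True concept: label = 1 iff x >= 50; oracle() plays the role of the user
# answering label queries.

def oracle(x):
    return 1 if x >= 50 else 0

pool = list(range(0, 100, 5))                 # unlabeled pool of examples
labeled = {0: oracle(0), 95: oracle(95)}      # small initial labeled set

def fit_threshold(labeled):
    # Model: midpoint between the largest known 0 and the smallest known 1
    zeros = [x for x, y in labeled.items() if y == 0]
    ones = [x for x, y in labeled.items() if y == 1]
    return (max(zeros) + min(ones)) / 2

for _ in range(10):
    t = fit_threshold(labeled)
    # Query the unlabeled point the model is most uncertain about,
    # i.e. the one closest to its current decision boundary
    candidates = [x for x in pool if x not in labeled]
    query = min(candidates, key=lambda x: abs(x - t))
    labeled[query] = oracle(query)            # "ask the user" for this label

print(fit_threshold(labeled))  # close to the true boundary of 50
```

With only a handful of well-chosen queries the estimated boundary lands near 50, whereas a passive learner would need labels for the whole pool to do as well.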

Passive learning and Active learning

Difference Between Passive Learning and Active Learning:


The following table summarizes the differences between passive learning and active learning:

Passive Learning Active Learning

Uses a large set of pre-labeled data to Starts with a small set of labeled data and requests
train the algorithm additional data from the user

The algorithm does not interact with The algorithm interacts with the user to acquire
the user additional data

It does not require user input after May continue to request additional data until a
training is complete satisfactory level of accuracy is achieved

Suitable for applications where a Suitable for applications where labeled data is scarce
large dataset is available or expensive to acquire

Conclusion:
In conclusion, passive learning and active learning are two approaches used in machine learning
to acquire data. Passive learning uses a large set of pre-labeled data to train the algorithm, while
active learning starts with a small set of labeled data and requests additional data from the user to
improve accuracy. The choice between passive learning and active learning depends on the
availability of labeled data and the application’s requirements.
Decision Trees
 Decision Tree is a Supervised learning technique that can be
used for both classification and Regression problems, but
mostly it is preferred for solving Classification problems. It is
a tree-structured classifier, where internal nodes represent
the features of a dataset, branches represent the decision
rules and each leaf node represents the outcome.
 In a Decision tree, there are two nodes, which are
the Decision Node and Leaf Node. Decision nodes are used
to make any decision and have multiple branches, whereas
Leaf nodes are the output of those decisions and do not
contain any further branches.
 The decisions or the test are performed on the basis of
features of the given dataset.
An example of a decision tree can be explained using the above binary tree. Let’s say
you want to predict whether a person is fit given their information like age, eating
habit, and physical activity, etc. The decision nodes here are questions like ‘What’s
the age?’, ‘Does he exercise?’, ‘Does he eat a lot of pizzas?’, and the leaves are
outcomes like either ‘fit’ or ‘unfit’. In this case this was a binary classification
problem (a yes/no type problem).
There are two main types of Decision Trees:
Classification trees (Yes/No types)
 What we’ve seen above is an example of classification tree,
where the outcome was a variable like ‘fit’ or ‘unfit’. Here the
decision variable is Categorical.

Regression trees (Continuous data types)
 Here the decision or the outcome variable is Continuous, e.g. a
number like 123.
 Working: Now that we know what a Decision Tree is, we’ll see how it works
internally.
 There are many algorithms out there which construct Decision
Trees, but one of the best is called as ID3 Algorithm. ID3
Stands for Iterative Dichotomiser 3.
 ID3 algorithm is a classification algorithm that follows
a greedy approach of building a decision tree by selecting
a best attribute that yields maximum Information Gain
(IG) or minimum Entropy (H).
 Entropy: Entropy is the measure of impurity, disorder, or uncertainty in a bunch of
examples.
 Entropy controls how a Decision Tree decides to split the data. It actually affects how
a Decision Tree draws its boundaries.
 The Equation of Entropy:
H(S) = − Σ p_i log2(p_i), where p_i is the proportion of examples belonging to class i.
 Information gain (IG): It measures how much "information" a feature gives us about
the class. It is also called the Kullback-Leibler divergence.
Why does it matter?
 Information gain is the main key used by Decision Tree algorithms to construct a
decision tree.
 Decision Tree algorithms will always try to maximize information gain.
 The attribute with the highest information gain will be tested/split first.
 The Equation of Information gain:
Gain(S, A) = H(S) − Σ_v (|S_v|/|S|) H(S_v), where the sum runs over the values v of
attribute A and S_v is the subset of S for which A = v.
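Both formulas can be checked numerically on the Play Golf data used below (9 'Yes' and 5 'No' overall; for the Wind attribute, 8 Weak examples split 6/2 and 6 Strong examples split 3/3):

```python
from math import log2

def entropy(counts):
    # H = -sum(p_i * log2(p_i)) over the class proportions
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c)

# Entropy of the full Play Golf sample: 9 Yes, 5 No
H_S = entropy([9, 5])
print(round(H_S, 2))  # 0.94

# Information gain of the attribute Wind:
# Weak -> 6 Yes / 2 No (8 rows), Strong -> 3 Yes / 3 No (6 rows)
gain_wind = H_S - (8 / 14) * entropy([6, 2]) - (6 / 14) * entropy([3, 3])
print(round(gain_wind, 3))  # 0.048
```

These match the hand calculations in the worked example that follows.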
Entropy calculations:

Day Outlook Temperature Humidity Wind Play Golf
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No

Entropy of the current state: in the above example, we can see that in total there are
5 No’s and 9 Yes’s, so
E(S) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.94
Remember that the entropy is 0 if all members belong to the same class, and 1 when half
of them belong to one class and the other half belong to the other class, which is
perfect randomness.
Out of the 6 Strong examples, we have 3 examples where the outcome was ‘Yes’ for Play
Golf and 3 where we had ‘No’, so
E(S_strong) = −(3/6) log2(3/6) − (3/6) log2(3/6) = 1
Information Gain (IG) calculation:
Gain(S, Wind) = E(S) − Σ_x (|S_x|/|S|) E(S_x)
where x ranges over the possible values of the attribute. Here, the attribute ‘Wind’
takes two possible values in the sample data, hence x = {Weak, Strong}. Amongst all 14
examples we have 8 places where the wind is Weak and 6 where the wind is Strong, so
we’ll have to calculate
Gain(S, Wind) = E(S) − (8/14) E(S_weak) − (6/14) E(S_strong)
Now, out of the 8 Weak examples, 6 of them were ‘Yes’ for Play Golf and 2 of them were
‘No’. So we have
E(S_weak) = −(6/8) log2(6/8) − (2/8) log2(2/8) = 0.811
We already calculated E(S_strong) = 1, so
Gain(S, Wind) = 0.94 − (8/14)(0.811) − (6/14)(1) = 0.048
Draw a decision tree for the given data set using the ID3 algorithm:
Day Outlook Temperature Humidity Wind Play Golf
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
In the given example there are four attributes {Outlook, Temperature, Humidity, Wind},
and there is a class which contains binary values, i.e., Yes or No.
We need to calculate the information gain for each attribute so that we can decide
which attribute will be taken as the root node for drawing the decision tree.
For calculating information gain, we first need to calculate the entropy values.
Attribute : Outlook
Values(Outlook) = Sunny, Overcast, Rain
Sunny: 2 Yes, 3 No → E(S_sunny) = 0.971
Overcast: 4 Yes, 0 No → E(S_overcast) = 0
Rain: 3 Yes, 2 No → E(S_rain) = 0.971
Gain(S,Outlook) = 0.94 − (5/14)(0.971) − (4/14)(0) − (5/14)(0.971) = 0.2464
Attribute : Temperature
Values(Temperature) = Hot, Mild, Cool
Hot: 2 Yes, 2 No → E = 1.0
Mild: 4 Yes, 2 No → E = 0.9183
Cool: 3 Yes, 1 No → E = 0.8113
Gain(S,Temperature) = 0.94 − (4/14)(1.0) − (6/14)(0.9183) − (4/14)(0.8113) = 0.0289
Attribute : Humidity
Values(Humidity) = High, Normal
High: 3 Yes, 4 No → E = 0.9852
Normal: 6 Yes, 1 No → E = 0.5917
Gain(S,Humidity) = 0.94 − (7/14)(0.9852) − (7/14)(0.5917) = 0.1516
Attribute : Wind
Values(Wind) = Strong, Weak
Weak: 6 Yes, 2 No → E = 0.8113
Strong: 3 Yes, 3 No → E = 1.0
Gain(S,Wind) = 0.94 − (8/14)(0.8113) − (6/14)(1.0) = 0.0478
 Calculating the information gain for all attributes:
Gain(S,Outlook) = 0.2464
Gain(S,Temperature) = 0.0289
Gain(S,Humidity) = 0.1516
Gain(S,Wind) = 0.0478
 We can clearly see that Gain(S, Outlook) has the highest information gain of 0.246,
hence we choose the Outlook attribute as the root node. At this point, the decision tree
looks like this:
 Here we observe that whenever the Outlook is Overcast, Play Golf is always ‘Yes’.
This is no coincidence: this simple subtree results because the attribute Outlook
gives the highest information gain.
 Now how do we proceed from this point? We can simply
apply recursion, you might want to look at the algorithm
steps described earlier.
 Now that we’ve used Outlook, we’ve got three of them
remaining Humidity, Temperature, and Wind. And, we had
three possible values of Outlook: Sunny, Overcast, Rain.
 Where the Overcast node already ended up having leaf
node ‘Yes’, so we’re left with two subtrees to compute:
Sunny and Rain.
Sunny subtree (rows where Outlook = Sunny):

Day Temperature Humidity Wind Play Golf
D1 Hot High Weak No
D2 Hot High Strong No
D8 Mild High Weak No
D9 Cool Normal Weak Yes
D11 Mild Normal Strong Yes

Gain(Ssunny,Temperature) = 0.570
Gain(Ssunny,Humidity) = 0.97
Gain(Ssunny,Wind) = 0.0192

Humidity has the highest gain, so the Sunny branch splits on Humidity: High → No,
Normal → Yes.
Rain subtree (rows where Outlook = Rain):

Day Temperature Humidity Wind Play Golf
D4 Mild High Weak Yes
D5 Cool Normal Weak Yes
D6 Cool Normal Strong No
D10 Mild Normal Weak Yes
D14 Mild High Strong No

Gain(Srain,Temperature) = 0.0192
Gain(Srain,Humidity) = 0.0192
Gain(Srain,Wind) = 0.97

Wind has the highest gain, so the Rain branch splits on Wind: Weak → Yes, Strong → No.
This completes the decision tree.
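The whole worked example can be reproduced with a short ID3 sketch. The dataset is the 14-row Play Golf table from above, and the recursion follows the steps described: pick the attribute with the highest information gain, then recurse on each of its values:

```python
from math import log2
from collections import Counter

# ID3 sketch on the Play Golf data: each row is
# (Outlook, Temperature, Humidity, Wind, PlayGolf).
DATA = [
    ("Sunny","Hot","High","Weak","No"), ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"), ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"), ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"), ("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"), ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"), ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"), ("Rain","Mild","High","Strong","No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def gain(rows, i):
    # IG = H(S) - sum over values v of (|S_v|/|S|) * H(S_v)
    n = len(rows)
    remainder = 0.0
    for v in set(r[i] for r in rows):
        subset = [r for r in rows if r[i] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(rows) - remainder

def id3(rows, attrs):
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:            # pure subset -> leaf node
        return labels[0]
    if not attrs:                        # no attributes left -> majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, ATTRS.index(a)))
    i = ATTRS.index(best)
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[i] == v], rest)
                   for v in set(r[i] for r in rows)}}

tree = id3(DATA, ATTRS)
print(tree["Outlook"]["Overcast"])  # Yes (matching the Overcast observation above)
```

Running this picks Outlook as the root, Humidity under Sunny, and Wind under Rain, exactly as derived by hand in the worked example.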