Decision Tree & Random Forest Notes
Decision Tree Classification Algorithm

o Decision Tree is a Supervised learning technique that can be used for both Classification and Regression problems, but it is mostly preferred for solving Classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
o In a Decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make any decision and have multiple branches, whereas Leaf nodes are the output of those decisions and do not contain any further branches.
o The decisions or tests are performed on the basis of the features of the given dataset.
o It is a graphical representation for getting all the possible
solutions to a problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the
root node, which expands on further branches and constructs a tree-
like structure.
o In order to build a tree, we use the CART algorithm, which stands
for Classification and Regression Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
o Below diagram explains the general structure of a decision tree:

Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.
Why use Decision Trees?

There are various algorithms in Machine Learning, so choosing the best algorithm for the given dataset and problem is a key decision when creating a machine learning model. Below are two reasons for using the Decision Tree:

o Decision Trees usually mimic human thinking while making a decision, so they are easy to understand.
o The logic behind the decision tree can be easily understood because
it shows a tree-like structure.

Decision Tree Terminologies

• Root Node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further after a leaf node.
• Splitting: Splitting is the process of dividing a decision node/root node into sub-nodes according to the given conditions.
• Branch/Sub-Tree: A subtree formed by splitting the tree.
• Pruning: Pruning is the process of removing unwanted branches from the tree.
• Parent/Child node: A node that is divided into sub-nodes is called the parent node, and the sub-nodes are its child nodes.

How does the Decision Tree algorithm Work?

In a decision tree, for predicting the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record (from the real dataset) and, based on the comparison, follows a branch and jumps to the next node.

For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the below algorithm:

o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values of the best attribute.
o Step-4: Generate the decision tree node that contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; each such final node is a leaf node.
Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by ASM). The root node splits further into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (cab facility) and one leaf node. Finally, that decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the below diagram:
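As a rough illustration of these steps, the sketch below (not part of the original notes) trains a CART tree with scikit-learn on a made-up version of the job-offer example; the feature values, column names, and labels are all invented purely for illustration.

```python
# A minimal sketch using scikit-learn's CART implementation.
# The toy job-offer data below is invented purely for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [salary_in_lakhs, distance_from_office_km, cab_facility (1 = yes)]
X = [
    [12, 5, 1],
    [4, 20, 0],
    [15, 25, 1],
    [6, 3, 0],
    [18, 30, 0],
]
y = ["Accept", "Decline", "Accept", "Decline", "Decline"]  # target labels

# criterion="gini" is the CART default; "entropy" would split on information gain instead
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=["salary", "distance", "cab_facility"]))
print(tree.predict([[10, 8, 1]]))  # predict the decision for a new candidate
```

Switching the criterion between "gini" and "entropy" corresponds to the two attribute selection measures described in the next section.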

Attribute Selection Measures

While implementing a Decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes. To solve such problems, there is a technique called the Attribute Selection Measure, or ASM. With this measurement, we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM, which are:
o Information Gain
o Gini Index

1. Information Gain:

o Information gain is the measurement of the change in entropy after the segmentation of a dataset based on an attribute.
o It calculates how much information a feature provides us about a
class.
o According to the value of information gain, we split the node and
build the decision tree.
o A decision tree algorithm always tries to maximize the value of
information gain, and a node/attribute having the highest
information gain is split first. It can be calculated using the below
formula:

Information Gain = Entropy(S) − [Weighted Avg × Entropy(each feature)]

Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies randomness in data. Entropy can be calculated as:

Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)


Where,

o S = the total set of samples
o P(yes) = probability of yes
o P(no) = probability of no
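As a minimal sketch of these two formulas in plain Python (the toy "play/outlook" labels below are invented for illustration), entropy and information gain can be computed as follows:

```python
# Sketch of the entropy and information gain formulas above.
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum over classes of P(class) * log2 P(class)."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)

def information_gain(labels, feature_values):
    """Entropy(S) minus the weighted average entropy of each feature value's subset."""
    total = len(labels)
    weighted = 0.0
    for value in set(feature_values):
        subset = [lab for lab, val in zip(labels, feature_values) if val == value]
        weighted += (len(subset) / total) * entropy(subset)
    return entropy(labels) - weighted

# Toy example: how much does "outlook" tell us about "play"? (values are made up)
play = ["yes", "yes", "no", "no", "yes", "no"]
outlook = ["sunny", "overcast", "sunny", "rain", "overcast", "rain"]
print(information_gain(play, outlook))
```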

2. Gini Index:

o Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over one with a high Gini index.
o The CART algorithm uses the Gini index to create splits, and it only creates binary splits.
o The Gini index can be calculated using the below formula:

Gini Index = 1 − Σj (Pj)²
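A minimal sketch of this formula in plain Python (the example label lists are invented):

```python
# Sketch of the Gini index formula above: Gini = 1 - sum_j (P_j)^2.
from collections import Counter

def gini_index(labels):
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

print(gini_index(["yes", "yes", "no", "no"]))    # 0.5: maximally impure binary node
print(gini_index(["yes", "yes", "yes", "yes"]))  # 0.0: pure node
```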

Pruning: Getting an Optimal Decision tree

Pruning is the process of deleting unnecessary nodes from a tree in order to get the optimal decision tree.

A too-large tree increases the risk of overfitting, while a small tree may not capture all the important features of the dataset. A technique that decreases the size of the learned tree without reducing accuracy is therefore known as pruning. There are mainly two types of tree pruning techniques used:

o Cost Complexity Pruning
o Reduced Error Pruning
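As one possible example of cost complexity pruning, the sketch below uses scikit-learn's cost_complexity_pruning_path and the ccp_alpha parameter; the iris dataset and the simple "pick the best test score" rule are only stand-ins for illustration, not the only way to choose a pruned tree.

```python
# Sketch of cost complexity pruning with scikit-learn (assumed available).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compute the sequence of effective alphas for cost complexity pruning
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Refit one tree per alpha and keep the most-pruned tree with the best held-out score
best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = pruned.score(X_test, y_test)
    if score >= best_score:
        best_alpha, best_score = alpha, score

print(f"best ccp_alpha={best_alpha:.4f}, test accuracy={best_score:.3f}")
```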

Advantages of the Decision Tree

o It is simple to understand, as it follows the same process that a human follows while making a decision in real life.
o It can be very useful for solving decision-related problems.
o It helps to think about all the possible outcomes of a problem.
o There is less requirement for data cleaning compared to other algorithms.

Disadvantages of the Decision Tree

o The decision tree contains lots of layers, which makes it complex.


o It may have an overfitting issue, which can be resolved using
the Random Forest algorithm.
o For more class labels, the computational complexity of the decision
tree may increase.

Ensemble Methods
Imagine that you have decided to build a bicycle because you are not happy with the options available in stores and online. You pick out the best individual parts, and once you've assembled these great parts, the resulting bike should outlast all other options.

Ensemble methods apply the same idea: they combine multiple predictive models (supervised ML) to obtain higher-quality predictions than any single model.

For example, the Random Forest algorithm is an ensemble method that combines multiple decision trees trained on different samples from a dataset. As a result, the quality of the predictions of a random forest exceeds that of a single decision tree.

Ensembles are also a way to reduce the variance and bias of a single machine learning model. By combining models, the quality of the predictions becomes more balanced: any given model may be accurate under some conditions but inaccurate under others, and another model may show the reverse behaviour, so together they compensate for each other's weaknesses.

Most of the top winners of Kaggle competitions use some ensemble method. The most popular ensemble algorithms are Random Forest, XGBoost, and LightGBM.

Random Forest Algorithm

Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both Classification and Regression problems in ML. It is based on the concept of ensemble learning, which is the process of combining multiple classifiers to solve a complex problem and improve the performance of the model.

As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of those predictions, produces the final output.

A greater number of trees in the forest generally leads to higher accuracy and helps prevent the problem of overfitting.

The below diagram explains the working of the Random Forest algorithm:
Note: To better understand the Random Forest Algorithm, you should have knowledge
of the Decision Tree Algorithm.

Assumptions for Random Forest

Since the random forest combines multiple trees to predict the class of the
dataset, it is possible that some decision trees may predict the correct
output, while others may not. But together, all the trees predict the correct
output. Therefore, below are two assumptions for a better Random forest
classifier:

o There should be some actual values in the feature variables of the dataset so that the classifier can predict accurate results rather than guessed results.
o The predictions from each tree must have very low correlations.

Why use Random Forest?

Below are some points that explain why we should use the Random Forest
algorithm:

o It takes less training time as compared to other algorithms.


o It predicts output with high accuracy, and it runs efficiently even on large datasets.
o It can also maintain accuracy when a large proportion of data is
missing.

How does Random Forest algorithm work?

Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make predictions using each of the trees created in the first phase.

The Working process can be explained in the below steps and diagram:

Step-1: Select K random data points from the training set.

Step-2: Build the decision trees associated with the selected data points (subsets).

Step-3: Choose the number N of decision trees that you want to build.

Step-4: Repeat Steps 1 & 2.

Step-5: For a new data point, find the prediction of each decision tree, and assign the new data point to the category that wins the majority of votes.
The working of the algorithm can be better understood by the below
example:

Example: Suppose there is a dataset that contains multiple fruit images. So,
this dataset is given to the Random forest classifier. The dataset is divided
into subsets and given to each decision tree. During the training phase,
each decision tree produces a prediction result, and when a new data point
occurs, then based on the majority of results, the Random Forest classifier
predicts the final decision. Consider the below image:
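A rough sketch of these two phases (not taken from the notes) is shown below, using scikit-learn decision trees, random subsets of the built-in iris dataset as a stand-in for the fruit images, and a manual majority vote:

```python
# Phase 1: build N trees on random subsets; Phase 2: let them vote on a new point.
import random
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
N_TREES, K_POINTS = 5, 60  # choose N trees, each trained on K random data points

forest = []
for i in range(N_TREES):
    # Phase 1: pick K random data points (with replacement) and fit a tree on them
    idx = [random.randrange(len(X)) for _ in range(K_POINTS)]
    tree = DecisionTreeClassifier(random_state=i).fit(X[idx], y[idx])
    forest.append(tree)

# Phase 2: each tree predicts for a new data point; the majority vote wins
new_point = X[0].reshape(1, -1)
votes = [int(tree.predict(new_point)[0]) for tree in forest]
print("votes:", votes, "-> majority:", Counter(votes).most_common(1)[0][0])
```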

Applications of Random Forest

There are mainly four sectors where Random Forest is mostly used:

1. Banking: The banking sector mostly uses this algorithm for the identification of loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of the disease can be identified.
3. Land Use: We can identify areas of similar land use using this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.
Advantages of Random Forest

o Random Forest is capable of performing both Classification and Regression tasks.
o It is capable of handling large datasets with high dimensionality.
o It enhances the accuracy of the model and prevents the overfitting
issue.

Disadvantages of Random Forest

o Although random forest can be used for both classification and regression tasks, it is less suitable for Regression tasks.
Python Implementation Steps :

o Data Pre-processing step


o Fitting the Random forest algorithm to the Training set
o Predicting the test result
o Test accuracy of the result (Creation of Confusion matrix)
o Visualizing the test set result.
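A possible end-to-end sketch of these steps with scikit-learn, using the built-in iris dataset as a placeholder for a real dataset (the visualization step is dataset-specific and is left out):

```python
# Sketch of the implementation steps listed above, assuming scikit-learn.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Data pre-processing: split and (optionally) scale the features
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 2. Fit the Random Forest to the training set
clf = RandomForestClassifier(n_estimators=100, criterion="gini", random_state=0)
clf.fit(X_train, y_train)

# 3. Predict the test results
y_pred = clf.predict(X_test)

# 4. Test the accuracy of the result with a confusion matrix
print(confusion_matrix(y_test, y_pred))
print("accuracy:", accuracy_score(y_test, y_pred))
# 5. Visualizing the test set result would depend on the dataset and is omitted here.
```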

Bagging Vs Boosting

We all use decision-tree-style reasoning in day-to-day life to make decisions. Organizations use supervised machine learning techniques like decision trees to make better decisions and to generate more surplus and profit.

Ensemble methods combine different decision trees to deliver better predictive results than utilizing a single decision tree. The primary principle behind the ensemble model is that a group of weak learners comes together to form a strong learner.

There are two techniques, described below, that are used to build an ensemble of decision trees.

Bagging

Bagging is used when our objective is to reduce the variance of a decision tree. The idea is to create several subsets of data from the training sample, chosen randomly with replacement. Each subset of data is then used to train its own decision tree, so we end up with an ensemble of different models. The average of all the predictions from the numerous trees is used, which is more robust than a single decision tree.

Random Forest is an extension of bagging. It takes one additional step: as well as taking a random subset of the data, it also makes a random selection of features rather than using all features to grow the trees. When we have numerous such random trees, it is called a Random Forest.

These are the following steps which are taken to implement a Random
forest:

o Let us consider X observations and Y features in the training data set. First, a sample from the training data set is taken randomly with replacement.
o The tree is grown to its largest extent.
o The given steps are repeated, and the prediction is given based on the collection of predictions from n trees.
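For comparison, plain bagging of decision trees can be sketched with scikit-learn's BaggingClassifier as below; the dataset and parameter values are placeholders for illustration only.

```python
# Sketch of bagging with scikit-learn: several trees are trained on bootstrap
# samples of the training data and their predictions are aggregated.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # base learner ("base_estimator" in older scikit-learn)
    n_estimators=50,                     # number of bootstrap-trained trees
    bootstrap=True,                      # sample the training data with replacement
    random_state=0,
)
bagging.fit(X_train, y_train)
print("bagging test accuracy:", bagging.score(X_test, y_test))
```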
Advantages of using Random Forest technique:

o It manages higher-dimensional data sets very well.
o It handles missing values and maintains accuracy for missing data.
Disadvantages of using Random Forest technique:

Since the final prediction is based on the mean of the predictions from the subset trees, it will not give a precise value for regression problems.

Boosting:

Boosting is another ensemble procedure for building a collection of predictors. Here, we fit consecutive trees, usually on random samples, and at each step the objective is to reduce the net error from the prior trees.

If a given input is misclassified by a hypothesis, its weight is increased so that the next hypothesis is more likely to classify it correctly. Combining the whole set of hypotheses at the end converts weak learners into a better-performing model.

Gradient Boosting is an expansion of the boosting procedure.

1. Gradient Boosting = Gradient Descent + Boosting


It utilizes a gradient descent algorithm that can optimize any differentiable loss function. The ensemble of trees is built one tree at a time, and the individual trees are summed successively. Each new tree tries to recover the loss (the difference between the actual and predicted values).
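A minimal sketch of this procedure with scikit-learn's GradientBoostingClassifier, again using the iris dataset purely as a placeholder:

```python
# Sketch of gradient boosting: shallow trees are added successively, each one
# fitting the residual error of the current ensemble.
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbc = GradientBoostingClassifier(
    n_estimators=100,   # number of boosting stages (trees added successively)
    learning_rate=0.1,  # shrinks each tree's contribution; needs careful tuning
    max_depth=3,        # shallow trees act as the weak learners
    random_state=0,
)
gbc.fit(X_train, y_train)
print("gradient boosting test accuracy:", gbc.score(X_test, y_test))
```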

Advantages of using Gradient Boosting methods:

o It supports different loss functions.


o It works well with interactions.
Disadvantages of using Gradient Boosting methods:

o It requires cautious tuning of different hyper-parameters.

Difference between Bagging and Boosting:

o Data sampling: In Bagging, various training data subsets are randomly drawn with replacement from the whole training dataset. In Boosting, each new subset contains the examples that were misclassified by previous models.
o Goal: Bagging attempts to tackle the over-fitting issue, whereas Boosting tries to reduce bias.
o When to use: If the classifier is unstable (high variance), apply Bagging; if the classifier is steady and simple (high bias), apply Boosting.
o Model weighting: In Bagging, every model receives an equal weight; in Boosting, models are weighted by their performance.
o Objective: Bagging aims to decrease variance, not bias; Boosting aims to decrease bias, not variance.
o Combination: Bagging is the easiest way of combining predictions that belong to the same type, while Boosting combines predictions that belong to different types.
o Dependence: In Bagging, every model is constructed independently; in Boosting, new models are affected by the performance of previously built models.
