Decision Tree & Random Forest Notes
Decision Tree Classification Algorithm

o Decision Tree is a Supervised learning technique that can be used for both Classification and Regression problems, but it is mostly preferred for solving Classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
o In a Decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make any decision and have multiple branches, whereas Leaf nodes are the output of those decisions and do not contain any further branches.
o The decisions or tests are performed on the basis of the features of the given dataset.
o It is a graphical representation for getting all the possible
solutions to a problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the
root node, which expands on further branches and constructs a tree-
like structure.
o In order to build a tree, we use the CART algorithm, which stands
for Classification and Regression Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
o Below diagram explains the general structure of a decision tree:

Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.
Why use Decision Trees?

There are various algorithms in Machine Learning, so choosing the best algorithm for the given dataset and problem is a key decision when creating a machine learning model. Below are two reasons for using the Decision Tree:

o Decision Trees usually mimic human thinking while making a decision, so they are easy to understand.
o The logic behind the decision tree can be easily understood because
it shows a tree-like structure.

Decision Tree Terminologies

• Root Node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further after a leaf node.
• Splitting: Splitting is the process of dividing a decision node/root node into sub-nodes according to the given conditions.
• Branch/Sub-Tree: A subtree formed by splitting the tree.
• Pruning: Pruning is the process of removing unwanted branches from the tree.
• Parent/Child node: A node that is divided into sub-nodes is called the parent node, and the sub-nodes are its child nodes.

How does the Decision Tree algorithm Work?

In a decision tree, for predicting the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record (from the real dataset) and, based on the comparison, follows a branch and jumps to the next node.

For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the below algorithm:

o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values of the best attribute.
o Step-4: Generate the decision tree node that contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; each such final node is a leaf node.
Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by ASM). The root node splits further into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (cab facility) and one leaf node. Finally, that decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the below diagram:
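As a rough illustration of these steps, the sketch below (not part of the original notes) trains a CART tree with scikit-learn on a made-up version of the job-offer example; the feature values, column names, and labels are all invented purely for illustration.

```python
# A minimal sketch using scikit-learn's CART implementation.
# The toy job-offer data below is invented purely for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [salary_in_lakhs, distance_from_office_km, cab_facility (1 = yes)]
X = [
    [12, 5, 1],
    [4, 20, 0],
    [15, 25, 1],
    [6, 3, 0],
    [18, 30, 0],
]
y = ["Accept", "Decline", "Accept", "Decline", "Decline"]  # target labels

# criterion="gini" is the CART default; "entropy" would split on information gain instead
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=["salary", "distance", "cab_facility"]))
print(tree.predict([[10, 8, 1]]))  # predict the decision for a new candidate
```

Switching the criterion between "gini" and "entropy" corresponds to the two attribute selection measures described in the next section.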

Attribute Selection Measures

While implementing a Decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes. To solve such problems, there is a technique called the Attribute Selection Measure, or ASM. With this measurement, we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM, which are:
o Information Gain
o Gini Index

1. Information Gain:

o Information gain is the measurement of the change in entropy after the segmentation of a dataset based on an attribute.
o It calculates how much information a feature provides us about a
class.
o According to the value of information gain, we split the node and
build the decision tree.
o A decision tree algorithm always tries to maximize the value of
information gain, and a node/attribute having the highest
information gain is split first. It can be calculated using the below
formula:

Information Gain = Entropy(S) − [Weighted Avg × Entropy(each feature)]

Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies randomness in data. Entropy can be calculated as:

Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)


Where,

o S = the total set of samples
o P(yes) = probability of yes
o P(no) = probability of no
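As a minimal sketch of these two formulas in plain Python (the toy "play/outlook" labels below are invented for illustration), entropy and information gain can be computed as follows:

```python
# Sketch of the entropy and information gain formulas above.
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum over classes of P(class) * log2 P(class)."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)

def information_gain(labels, feature_values):
    """Entropy(S) minus the weighted average entropy of each feature value's subset."""
    total = len(labels)
    weighted = 0.0
    for value in set(feature_values):
        subset = [lab for lab, val in zip(labels, feature_values) if val == value]
        weighted += (len(subset) / total) * entropy(subset)
    return entropy(labels) - weighted

# Toy example: how much does "outlook" tell us about "play"? (values are made up)
play = ["yes", "yes", "no", "no", "yes", "no"]
outlook = ["sunny", "overcast", "sunny", "rain", "overcast", "rain"]
print(information_gain(play, outlook))
```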

2. Gini Index:

o Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over one with a high Gini index.
o The CART algorithm uses the Gini index to create splits, and it only creates binary splits.
o The Gini index can be calculated using the below formula:

Gini Index = 1 − Σj (Pj)²
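A minimal sketch of this formula in plain Python (the example label lists are invented):

```python
# Sketch of the Gini index formula above: Gini = 1 - sum_j (P_j)^2.
from collections import Counter

def gini_index(labels):
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

print(gini_index(["yes", "yes", "no", "no"]))    # 0.5: maximally impure binary node
print(gini_index(["yes", "yes", "yes", "yes"]))  # 0.0: pure node
```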

Pruning: Getting an Optimal Decision tree

Pruning is the process of deleting unnecessary nodes from a tree in order to get the optimal decision tree.

A too-large tree increases the risk of overfitting, while a small tree may not capture all the important features of the dataset. A technique that decreases the size of the learned tree without reducing accuracy is therefore known as pruning. There are mainly two types of tree pruning techniques used:

o Cost Complexity Pruning
o Reduced Error Pruning
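As one possible example of cost complexity pruning, the sketch below uses scikit-learn's cost_complexity_pruning_path and the ccp_alpha parameter; the iris dataset and the simple "pick the best test score" rule are only stand-ins for illustration, not the only way to choose a pruned tree.

```python
# Sketch of cost complexity pruning with scikit-learn (assumed available).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compute the sequence of effective alphas for cost complexity pruning
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Refit one tree per alpha and keep the most-pruned tree with the best held-out score
best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = pruned.score(X_test, y_test)
    if score >= best_score:
        best_alpha, best_score = alpha, score

print(f"best ccp_alpha={best_alpha:.4f}, test accuracy={best_score:.3f}")
```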

Advantages of the Decision Tree

o It is simple to understand, as it follows the same process that a human follows while making a decision in real life.
o It can be very useful for solving decision-related problems.
o It helps to think about all the possible outcomes of a problem.
o There is less requirement for data cleaning compared to other algorithms.

Disadvantages of the Decision Tree

o The decision tree contains lots of layers, which makes it complex.


o It may have an overfitting issue, which can be resolved using
the Random Forest algorithm.
o For more class labels, the computational complexity of the decision
tree may increase.

Ensemble Methods
Imagine that you have decided to build a bicycle because you are not happy with the options available in stores and online. You pick out the best individual parts, and once you've assembled these great parts, the resulting bike should outlast all other options.

Ensemble methods apply the same idea: they combine multiple predictive models (supervised ML) to obtain higher-quality predictions than any single model.

For example, the Random Forest algorithm is an ensemble method that combines multiple decision trees trained on different samples from a dataset. As a result, the quality of the predictions of a random forest exceeds that of a single decision tree.

Ensembles are also a way to reduce the variance and bias of a single machine learning model. By combining models, the quality of the predictions becomes more balanced: any given model may be accurate under some conditions but inaccurate under others, and another model may show the reverse behaviour, so together they compensate for each other's weaknesses.

Most of the top winners of Kaggle competitions use some ensemble method. The most popular ensemble algorithms are Random Forest, XGBoost, and LightGBM.

Random Forest Algorithm

Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both Classification and Regression problems in ML. It is based on the concept of ensemble learning, which is the process of combining multiple classifiers to solve a complex problem and improve the performance of the model.

As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of those predictions, produces the final output.

A greater number of trees in the forest generally leads to higher accuracy and helps prevent the problem of overfitting.

The below diagram explains the working of the Random Forest algorithm:
Note: To better understand the Random Forest Algorithm, you should have knowledge
of the Decision Tree Algorithm.

Assumptions for Random Forest

Since the random forest combines multiple trees to predict the class of the
dataset, it is possible that some decision trees may predict the correct
output, while others may not. But together, all the trees predict the correct
output. Therefore, below are two assumptions for a better Random forest
classifier:

o There should be some actual values in the feature variables of the dataset so that the classifier can predict accurate results rather than guessed results.
o The predictions from each tree must have very low correlations.

Why use Random Forest?

Below are some points that explain why we should use the Random Forest
algorithm:

o It takes less training time as compared to other algorithms.


o It predicts output with high accuracy, and it runs efficiently even on large datasets.
o It can also maintain accuracy when a large proportion of data is
missing.

How does Random Forest algorithm work?

Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make predictions using each of the trees created in the first phase.

The Working process can be explained in the below steps and diagram:

Step-1: Select K random data points from the training set.

Step-2: Build the decision trees associated with the selected data points (subsets).

Step-3: Choose the number N of decision trees that you want to build.

Step-4: Repeat Steps 1 & 2.

Step-5: For a new data point, find the prediction of each decision tree, and assign the new data point to the category that wins the majority of votes.
The working of the algorithm can be better understood by the below
example:

Example: Suppose there is a dataset that contains multiple fruit images. So,
this dataset is given to the Random forest classifier. The dataset is divided
into subsets and given to each decision tree. During the training phase,
each decision tree produces a prediction result, and when a new data point
occurs, then based on the majority of results, the Random Forest classifier
predicts the final decision. Consider the below image:
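A rough sketch of these two phases (not taken from the notes) is shown below, using scikit-learn decision trees, random subsets of the built-in iris dataset as a stand-in for the fruit images, and a manual majority vote:

```python
# Phase 1: build N trees on random subsets; Phase 2: let them vote on a new point.
import random
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
N_TREES, K_POINTS = 5, 60  # choose N trees, each trained on K random data points

forest = []
for i in range(N_TREES):
    # Phase 1: pick K random data points (with replacement) and fit a tree on them
    idx = [random.randrange(len(X)) for _ in range(K_POINTS)]
    tree = DecisionTreeClassifier(random_state=i).fit(X[idx], y[idx])
    forest.append(tree)

# Phase 2: each tree predicts for a new data point; the majority vote wins
new_point = X[0].reshape(1, -1)
votes = [int(tree.predict(new_point)[0]) for tree in forest]
print("votes:", votes, "-> majority:", Counter(votes).most_common(1)[0][0])
```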

Applications of Random Forest

There are mainly four sectors where Random Forest is mostly used:

1. Banking: The banking sector mostly uses this algorithm for the identification of loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of the disease can be identified.
3. Land Use: We can identify areas of similar land use using this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.
Advantages of Random Forest

o Random Forest is capable of performing both Classification and Regression tasks.
o It is capable of handling large datasets with high dimensionality.
o It enhances the accuracy of the model and prevents the overfitting
issue.

Disadvantages of Random Forest

o Although random forest can be used for both classification and regression tasks, it is less suitable for Regression tasks.
Python Implementation Steps :

o Data Pre-processing step


o Fitting the Random forest algorithm to the Training set
o Predicting the test result
o Test accuracy of the result (Creation of Confusion matrix)
o Visualizing the test set result.
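A possible end-to-end sketch of these steps with scikit-learn, using the built-in iris dataset as a placeholder for a real dataset (the visualization step is dataset-specific and is left out):

```python
# Sketch of the implementation steps listed above, assuming scikit-learn.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Data pre-processing: split and (optionally) scale the features
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 2. Fit the Random Forest to the training set
clf = RandomForestClassifier(n_estimators=100, criterion="gini", random_state=0)
clf.fit(X_train, y_train)

# 3. Predict the test results
y_pred = clf.predict(X_test)

# 4. Test the accuracy of the result with a confusion matrix
print(confusion_matrix(y_test, y_pred))
print("accuracy:", accuracy_score(y_test, y_pred))
# 5. Visualizing the test set result would depend on the dataset and is omitted here.
```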

Bagging Vs Boosting

We all use decision-tree-style reasoning in day-to-day life to make decisions. Organizations use supervised machine learning techniques like decision trees to make better decisions and to generate more surplus and profit.

Ensemble methods combine different decision trees to deliver better predictive results than utilizing a single decision tree. The primary principle behind the ensemble model is that a group of weak learners comes together to form a strong learner.

There are two techniques, described below, that are used to build an ensemble of decision trees.

Bagging

Bagging is used when our objective is to reduce the variance of a decision tree. The idea is to create several subsets of data from the training sample, chosen randomly with replacement. Each subset of data is then used to train its own decision tree, so we end up with an ensemble of different models. The average of all the predictions from the numerous trees is used, which is more robust than a single decision tree.

Random Forest is an extension of bagging. It takes one additional step: as well as taking a random subset of the data, it also makes a random selection of features rather than using all features to grow the trees. When we have numerous such random trees, it is called a Random Forest.

These are the following steps which are taken to implement a Random
forest:

o Let us consider X observations and Y features in the training data set. First, a sample from the training data set is taken randomly with replacement.
o The tree is grown to its largest extent.
o The given steps are repeated, and the prediction is given based on the collection of predictions from n trees.
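For comparison, plain bagging of decision trees can be sketched with scikit-learn's BaggingClassifier as below; the dataset and parameter values are placeholders for illustration only.

```python
# Sketch of bagging with scikit-learn: several trees are trained on bootstrap
# samples of the training data and their predictions are aggregated.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # base learner ("base_estimator" in older scikit-learn)
    n_estimators=50,                     # number of bootstrap-trained trees
    bootstrap=True,                      # sample the training data with replacement
    random_state=0,
)
bagging.fit(X_train, y_train)
print("bagging test accuracy:", bagging.score(X_test, y_test))
```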
Advantages of using Random Forest technique:

o It manages higher-dimensional data sets very well.
o It handles missing values and maintains accuracy for missing data.
Disadvantages of using Random Forest technique:

Since the final prediction is based on the mean of the predictions from the subset trees, it will not give a precise value for regression problems.

Boosting:

Boosting is another ensemble procedure for building a collection of predictors. Here, we fit consecutive trees, usually on random samples, and at each step the objective is to reduce the net error from the prior trees.

If a given input is misclassified by a hypothesis, its weight is increased so that the next hypothesis is more likely to classify it correctly. Combining the whole set of hypotheses at the end converts weak learners into a better-performing model.

Gradient Boosting is an expansion of the boosting procedure.

1. Gradient Boosting = Gradient Descent + Boosting


It utilizes a gradient descent algorithm that can optimize any differentiable loss function. The ensemble of trees is built one tree at a time, and the individual trees are summed successively. Each new tree tries to recover the loss (the difference between the actual and predicted values).
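A minimal sketch of this procedure with scikit-learn's GradientBoostingClassifier, again using the iris dataset purely as a placeholder:

```python
# Sketch of gradient boosting: shallow trees are added successively, each one
# fitting the residual error of the current ensemble.
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbc = GradientBoostingClassifier(
    n_estimators=100,   # number of boosting stages (trees added successively)
    learning_rate=0.1,  # shrinks each tree's contribution; needs careful tuning
    max_depth=3,        # shallow trees act as the weak learners
    random_state=0,
)
gbc.fit(X_train, y_train)
print("gradient boosting test accuracy:", gbc.score(X_test, y_test))
```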

Advantages of using Gradient Boosting methods:

o It supports different loss functions.


o It works well with interactions.
Disadvantages of using Gradient Boosting methods:

o It requires cautious tuning of different hyper-parameters.

Difference between Bagging and Boosting:

o Data sampling: In Bagging, various training data subsets are randomly drawn with replacement from the whole training dataset. In Boosting, each new subset contains the examples that were misclassified by previous models.
o Goal: Bagging attempts to tackle the over-fitting issue, whereas Boosting tries to reduce bias.
o When to use: If the classifier is unstable (high variance), apply Bagging; if the classifier is steady and simple (high bias), apply Boosting.
o Model weighting: In Bagging, every model receives an equal weight; in Boosting, models are weighted by their performance.
o Objective: Bagging aims to decrease variance, not bias; Boosting aims to decrease bias, not variance.
o Combination: Bagging is the easiest way of combining predictions that belong to the same type, while Boosting combines predictions that belong to different types.
o Dependence: In Bagging, every model is constructed independently; in Boosting, new models are affected by the performance of previously built models.
