Random Forest Algorithm
Gayathri Prasad S
Introduction
A random forest is a supervised machine learning algorithm built from decision trees.
It utilizes ensemble learning, which is a
technique that combines many classifiers to
provide solutions to complex problems.
A random forest algorithm consists of many
decision trees.
The random forest algorithm establishes the
outcome based on the predictions of the
decision trees.
For regression, it predicts by averaging the outputs of the individual trees; for classification, it takes a majority vote.
Increasing the number of trees generally makes the outcome more stable and accurate.
A random forest mitigates the main limitations of a single decision tree: it reduces overfitting and improves accuracy.
Features of a Random Forest Algorithm (RFA)
It is usually more accurate than a single decision tree.
It can produce a reasonable prediction
without hyper-parameter tuning.
It reduces the overfitting seen in individual decision trees.
In every random forest tree, a subset of
features is selected randomly at the node’s
splitting point.
The main difference between the decision tree algorithm and the random forest algorithm is that, in the latter, the features used to establish root nodes and split nodes are chosen at random.
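In scikit-learn (used here only as an illustrative choice; the slides do not prescribe a library), this per-split feature sampling is controlled by the max_features parameter. A minimal sketch:

```python
from sklearn.ensemble import RandomForestClassifier

# With max_features="sqrt", each node split considers only a random
# subset of sqrt(n_features) candidate features, which is what makes
# the individual trees differ from one another.
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
```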
The random forest employs the bagging/bootstrap
aggregation method to generate the required prediction.
Bagging involves using different samples of data
(training data) rather than just one sample. A training
dataset comprises observations and features that are
used for making predictions. The decision trees produce
different outputs, depending on the training data fed to
the random forest algorithm. These outputs are then aggregated: the most frequent prediction (or, for regression, the average) is selected as the final output.
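To make the bagging step concrete, here is a minimal sketch in Python (scikit-learn and NumPy are assumptions; the iris dataset is used purely for illustration). It draws bootstrap samples, fits one decision tree per sample, and combines the trees by majority vote:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

trees = []
for _ in range(25):
    # Bootstrap: sample the training rows with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Each tree votes; the most frequent class is the final output.
votes = np.stack([t.predict(X) for t in trees])  # shape: (n_trees, n_samples)
majority = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
print("ensemble accuracy on training data:", (majority == y).mean())
```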
Classification in random forests
Classification in random forests employs an ensemble methodology to attain the outcome. The training data is used to fit multiple decision trees; it consists of observations and features, and a random subset of features is considered when splitting each node. The leaf node reached in each tree is the final output of that specific decision tree, and the selection of the forest's final output follows the majority-voting system.
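A short sketch of this with scikit-learn (again an assumption, not something the slides specify) is below. One caveat: scikit-learn's classifier actually averages the trees' class probabilities, a soft-voting variant of the majority vote described above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Average the per-tree class probabilities (soft voting) and take the
# most likely class; this reproduces the forest's own predictions.
proba = np.mean([tree.predict_proba(X) for tree in rf.estimators_], axis=0)
assert (rf.classes_[proba.argmax(axis=1)] == rf.predict(X)).all()
```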
Regression in random forests
In a random forest regression, each tree
produces a specific prediction. The mean
prediction of the individual trees is the
output of the regression. This contrasts with random forest classification, where the output is determined by the mode of the decision trees' predicted classes.
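This averaging can be verified directly; in scikit-learn (assumed as before) the regressor's prediction is exactly the mean of the individual trees' predictions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# The forest output is the mean of the per-tree predictions.
per_tree = np.stack([tree.predict(X) for tree in rf.estimators_])
assert np.allclose(per_tree.mean(axis=0), rf.predict(X))
```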
Random forest algorithms are not ideal in the
following situations:
Extrapolation
Random forest regression is not ideal for extrapolating data. Unlike linear regression, it cannot use existing observations to estimate values beyond the observed range: a tree can only output values that appeared in its training leaves. This partly explains why most applications of random forests relate to classification.
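A small demonstration on made-up data (scikit-learn assumed): a forest trained on x in [0, 10] cannot predict beyond the targets it has seen, while linear regression extrapolates freely.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Noise-free training data: y = 2x on the interval [0, 10).
X_train = np.arange(0, 10, 0.1).reshape(-1, 1)
y_train = 2 * X_train.ravel()
X_new = np.array([[15.0], [20.0]])  # far outside the training range

rf = RandomForestRegressor(random_state=0).fit(X_train, y_train)
lin = LinearRegression().fit(X_train, y_train)

print(rf.predict(X_new))   # stays near the training maximum (about 20)
print(lin.predict(X_new))  # extrapolates to roughly [30, 40]
```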
Sparse data
Random forests do not produce good results when the data is very sparse: the randomly chosen feature subsets are then likely to be uninformative, generating unproductive splits that degrade the outcome.
Advantages of random forest
It can perform both regression and
classification tasks.
A random forest produces good predictions that are straightforward to use and interpret.
It can handle large datasets efficiently.
The random forest algorithm predicts outcomes with higher accuracy than a single decision tree.
Disadvantages of random forest
A random forest requires more computational resources and takes more time to train than a single decision tree.
Datasets
https://drive.google.com/file/d/15pc24lVzokKXhPvjqjvgmMNqSc611EoL/view?usp=sharing
https://drive.google.com/file/d/1ailAwduVTt08yG12MYIzq86-Etz4N9kM/view?usp=sharing
Thank You!!