
Lecture 3

Machine learning

Uploaded by

dpmanish
Copyright
© All Rights Reserved

Chapter 3: Ensemble Learning

Contents
3.1 Understanding Ensembles, K-fold cross validation, Boosting, Stumping, XGBoost
3.2 Bagging, Subagging, Random Forest, Comparison with Boosting, Different ways to
combine classifiers

3.1 Ensemble learning

• Ensemble learning is based on the idea that combining many solutions is usually better than relying on only one solution.
• The basic idea behind ensemble learning is to combine different models into a single one. The individual models give slightly different results on the same dataset. The combined result is generally better than any single solution, provided the solutions are combined well; otherwise the result can be considerably worse.
• For example, suppose you are planning a family picnic and the first important task is booking a hotel. To select one hotel, you check many factors: its location (for example, whether it is close to the market), its average rating, room availability, the other facilities provided, and the cost. All these factors are then weighed together before a room is selected.
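The claim that a combination can beat its parts can be checked with a tiny majority-vote sketch in plain Python. The three "models" below are just made-up prediction lists, not trained models. They were chosen so that each one is wrong on a different sample. The function name `majority_vote` is illustrative only.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine several models' label lists into one label per sample."""
    combined = []
    # zip(*predictions) groups the i-th prediction of every model together
    for sample_labels in zip(*predictions):
        # most_common(1) returns [(label, count)] for the most frequent label
        combined.append(Counter(sample_labels).most_common(1)[0][0])
    return combined

# Made-up predictions: each model is wrong on exactly one (different) sample
truth   = [1, 0, 1, 1, 0]
model_a = [0, 0, 1, 1, 0]
model_b = [1, 1, 1, 1, 0]
model_c = [1, 0, 1, 0, 0]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)  # [1, 0, 1, 1, 0]
```

Each individual model here is 80% accurate, but the vote is 100% accurate because the models make their mistakes on different samples. If all three were wrong on the same samples, voting would not help. This is why ensembles need diverse models.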

3.1.1 Cross Validation in Machine Learning:

• Cross-validation is used to estimate the performance of a model on unseen data.
• It divides the given data into a number of subsets, or folds. One of the subsets is used as the validation set and the model is trained on the remaining subsets.
• This process is repeated multiple times, each time with a different subset used as the validation set.
• Finally, the results from all validation rounds are averaged. This gives a more robust estimate of the model's performance than a single split.
• The main purpose of cross-validation is to detect overfitting so that it can be avoided.
• We must ensure that we train enough on the data to produce a generalized model; if we train for too long, the model will overfit the data.
• In cross-validation we evaluate the model on several validation sets, which tells us whether it is likely to perform well on new or unseen data.
• The basic steps of cross-validation are:
1. Divide the dataset into subsets, or folds.
2. Reserve one subset as the validation set.
3. Use the remaining subsets to train the model.
4. Evaluate model performance using the validation set.
• Methods used for cross-validation:
1. Leave-one-out cross-validation
2. Validation set approach
3. Leave-P-out cross-validation
4. K-fold cross-validation
5. Stratified K-fold cross-validation
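The basic steps above can be sketched in their simplest form: a single hold-out split, i.e. the validation set approach from the list above. This is a plain-Python illustration. The name `holdout_split` and its 20% validation fraction are assumptions. Step 4 would simply score a trained model on the samples listed in `val_idx`.

```python
import random

def holdout_split(n_samples, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx): a shuffled, disjoint split of range(n_samples)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)     # step 1: shuffle before dividing
    n_val = int(n_samples * val_fraction)
    val_idx = sorted(indices[:n_val])        # step 2: reserve one subset for validation
    train_idx = sorted(indices[n_val:])      # step 3: the rest is used for training
    return train_idx, val_idx

train_idx, val_idx = holdout_split(10)
print(len(train_idx), len(val_idx))  # 8 2
```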

3.1.1.1 K-Fold Cross-validation:

• In this approach, the dataset is divided into k samples of equal size. Each individual sample is called a fold.
• In each round, k-1 folds are used to train the prediction function and the remaining fold is used for testing.
• The steps of k-fold cross-validation are:
1. Divide the dataset into k folds.
2. For each iteration:
- Reserve one fold as the test set.
- Use the remaining folds for training.
- Evaluate the performance of the model on the test fold.
3. Average the k performance scores.
• For example, consider 5-fold cross-validation, where the dataset is divided into 5 folds. In the 1st iteration, the first fold is reserved for testing and the remaining folds are used for training. In the 2nd iteration, the second fold is used for testing and the rest for training. This continues until every fold has been used once as the test set.

Fig. K-fold Cross-validation process
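The 5-fold procedure described above can be written out in plain Python. To keep the sketch self-contained, the "model" is a hypothetical baseline that always predicts the most frequent training label. Any real classifier could be passed in through the `fit`/`predict` arguments instead. The data are made up, and the folds are taken in order without shuffling; in practice the data is usually shuffled first.

```python
from collections import Counter

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of (almost) equal size."""
    # The first n_samples % k folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(X, y, k, fit, predict):
    """Return (mean accuracy, per-fold accuracies) over k train/test rounds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):   # each fold is the test set once
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores), scores

# Hypothetical baseline "model": always predicts the most frequent training label
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def predict_majority(model, x):
    return model

X = list(range(10))
y = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
mean_acc, per_fold = cross_validate(X, y, k=5, fit=fit_majority, predict=predict_majority)
print(per_fold, mean_acc)  # [1.0, 0.5, 0.5, 1.0, 0.5] 0.7
```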

3.1.1.2 Stratified k-fold cross-validation:

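• This technique is very similar to k-fold cross-validation, with one change.
• It is based on stratification, which means arranging the data so that each fold is a good representative of the complete dataset. For classification, each fold keeps roughly the same class proportions as the whole dataset.
• This makes the performance estimate less biased and less variable, especially when the classes are imbalanced.
• For example, with mobile phone prices, some phones are much more expensive than others. A plain split could put most of the high-priced phones into one fold. Stratified k-fold cross-validation avoids this by spreading the price groups evenly across the folds.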
• Advantages of cross-validation:
- Overfitting: it helps detect overfitting, because it evaluates the model's performance on unseen data.
- Model selection: cross-validation lets us compare the performance of different models and select the best one.
- Data efficiency: every sample is used for both training and testing (in different rounds), so the data is used efficiently.
• Disadvantages of cross-validation:
- Expensive: the model has to be trained k times, which is costly when the model is complex and takes a long time to train.
- Time consuming: the more complex the models being compared, the more time is needed for training and testing.
- Bias-variance trade-off: the choice of k affects the estimate. A small k tends to give a more biased estimate, while a large k (such as leave-one-out) can give a higher-variance one.
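Stratified folds for a classification problem can be built simply: group the samples by class, then deal the indices across the folds in turn, like dealing cards. This is a plain-Python sketch under that assumption; libraries may use a different assignment scheme. The function name `stratified_k_fold` is illustrative. The 0/1 "price class" labels are made up to echo the mobile-price example.

```python
from collections import Counter, defaultdict

def stratified_k_fold(y, k):
    """Assign sample indices to k folds so each fold keeps the class proportions of y."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    # List indices class by class, then deal them round-robin across the folds
    order = [idx for label in sorted(by_class) for idx in by_class[label]]
    folds = [[] for _ in range(k)]
    for position, idx in enumerate(order):
        folds[position % k].append(idx)
    return [sorted(f) for f in folds]

# Made-up labels: 8 low-price phones (0) and 4 high-price phones (1)
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
for fold in stratified_k_fold(y, k=4):
    print(fold, Counter(y[i] for i in fold))
```

Every fold ends up with two low-price phones and one high-price phone. With plain, unshuffled k-fold on the same labels, the folds would be [0, 1, 2], [3, 4, 5], [6, 7, 8] and [9, 10, 11]. The last fold would then contain only high-price phones, and the first two none at all.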

3.2 Boosting:

• Ensemble learning combines several models to improve performance compared with a single model.
• Basically, we learn a set of classifiers ("experts") and allow them to vote. Boosting is one such ensemble technique.
• Boosting is based on weak learners: it builds a strong classifier from several weak classifiers.
• First, a model is built from the training data; then a second model is built that corrects the errors of the first model. This process continues until the training set is predicted correctly or a maximum number of models has been added.
• There are several boosting algorithms, but AdaBoost was the first successful boosting algorithm, developed for binary classification. AdaBoost (Adaptive Boosting) combines several weak classifiers into a single strong classifier.
• Algorithm:
- Initialize the training dataset.
- Assign an equal weight to every training sample.
- Train a weak model on the weighted data and separate the samples into wrongly classified and correctly classified ones.
- Increase the weights of the wrongly classified samples, decrease the weights of the correctly classified samples, and normalize the weights.
- If the predicted output is correct (or enough models have been added), stop the process; otherwise repeat from the training step with the updated weights.
Fig Boosting Process
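The AdaBoost loop described above can be sketched in plain Python, using decision stumps (Section 3.3) as the weak learners. This is a minimal discrete AdaBoost for labels -1/+1 on made-up one-dimensional data. It runs a fixed number of rounds rather than stopping once every sample is correct, and the function names are illustrative. Each stump gets a vote weight alpha = 0.5 * ln((1 - error) / error). The sample weights are then updated as w_i * exp(-alpha * y_i * h(x_i)) and renormalized, which is the standard AdaBoost rule.

```python
import math

def fit_stump(X, y, w):
    """Return (weighted_error, threshold, polarity) of the best 1-D stump."""
    xs = sorted(set(X))
    # Candidate thresholds: below the minimum, then midpoints between neighbours
    thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            preds = [polarity if x > t else -polarity for x in X]
            err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def stump_predict(t, polarity, x):
    return polarity if x > t else -polarity

def adaboost(X, y, rounds):
    """Discrete AdaBoost for labels in {-1, +1}; returns [(alpha, t, polarity), ...]."""
    n = len(X)
    w = [1.0 / n] * n                              # equal initial weights
    ensemble = []
    for _ in range(rounds):
        err, t, polarity = fit_stump(X, y, w)      # train a weak learner
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)    # better stumps get a bigger vote
        ensemble.append((alpha, t, polarity))
        # Raise weights of misclassified samples, lower the rest
        w = [wi * math.exp(-alpha * yi * stump_predict(t, polarity, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]               # renormalize to sum to 1
    return ensemble

def adaboost_predict(ensemble, x):
    score = sum(alpha * stump_predict(t, p, x) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Made-up 1-D data that no single stump can separate (+1 samples in the middle)
X = [1, 2, 3, 4, 5, 6]
y = [-1, -1, 1, 1, -1, -1]
model = adaboost(X, y, rounds=3)
print([adaboost_predict(model, x) for x in X])  # [-1, -1, 1, 1, -1, -1]
```

No single stump can classify this data perfectly, because the +1 samples sit in the middle. After three rounds, however, the weighted vote of three stumps classifies all six samples correctly.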

3.3 Stumping:

• A decision stump is a one-level decision tree. It consists of a single internal node, the root, which is connected directly to the terminal nodes (leaves).
• Decision stumps are used as weak learners in ensemble learning techniques such as boosting and bagging.
• A stump's decision is based on a single input feature, so it is also called a 1-rule technique.
• In binary classification with a numeric feature, we first choose a threshold value. If the input value is greater than the threshold, we classify it as 1; if it is less than or equal to the threshold, we classify it as 0.
• There are several ways to build a stump, depending on the type of input feature. For a nominal (categorical) feature, the stump can have one leaf for each possible value. Alternatively, it can have two leaves: one for a chosen category and one for all other categories. For a continuous feature, a threshold is used as described above.

Fig. Binary classification with decision stump (root node: Student Gender == Male; Yes → Boys, No → Girls)


• The figure above shows one root node with two leaf nodes; the decision is based on yes or no.
• Algorithm:
- Input: feature matrix X and label vector Y (target values).
- For each feature (and each candidate threshold or value):
  ◦ Mark as "yes" the samples that satisfy the rule.
  ◦ Mark as "no" the samples that do not satisfy the rule.
  ◦ Compute the prediction for each sample.
  ◦ Compute the error, i.e. the number of samples whose prediction is not equal to the label.
- Output: the final model, i.e. the stump rule with the lowest error.
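The stump algorithm above can be sketched in plain Python for numeric features. It follows the rule from the text: predict 1 if the value is greater than the threshold, else 0. It tries every observed value of every feature as a threshold. A fuller implementation would also try the reversed rule and handle nominal features. The function names and the study-hours data are made up for illustration.

```python
def fit_decision_stump(X, y):
    """Return (feature, threshold, error_count) of the best one-rule split.

    Rule (as in the text): predict 1 if row[feature] > threshold, else 0.
    """
    best = None
    n_features = len(X[0])
    for f in range(n_features):                      # try every feature
        for threshold in sorted(set(row[f] for row in X)):
            preds = [1 if row[f] > threshold else 0 for row in X]
            errors = sum(p != t for p, t in zip(preds, y))   # prediction != label
            if best is None or errors < best[2]:
                best = (f, threshold, errors)
    return best

def stump_predict(stump, row):
    feature, threshold, _ = stump
    return 1 if row[feature] > threshold else 0

# Made-up data: [hours_studied, hours_slept] -> passed exam (1) or not (0)
X = [[1, 8], [2, 5], [4, 7], [5, 4], [7, 6], [8, 8]]
y = [0, 0, 0, 1, 1, 1]
stump = fit_decision_stump(X, y)
print(stump)  # (0, 4, 0): "passed if hours_studied > 4", with 0 errors
```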
3.4 XGBoost:

• XGBoost stands for eXtreme Gradient Boosting. It can handle large datasets efficiently and achieves good performance in both classification and regression.
• It combines many weak models into a strong prediction model, which helps us understand the data and make better decisions.
• Its advantages are speed, ease of use and good performance on large datasets.
• One important feature of XGBoost is that it can handle missing values in real-world data without any preprocessing. For each split, it learns a default direction in which samples with a missing value are sent.
• It is used in many applications, such as recommendation systems, Kaggle competitions and click-through-rate prediction.
• Its parameters can be tuned, so the model can be highly optimized and tailored to the problem.
• It helps control overfitting through regularization. Its training objective adds a penalty on the number of leaves and on the size of the leaf weights of each tree.
• Although the trees are added one after another, XGBoost parallelizes the work of building each tree, so it scales easily to clusters. It supports both classification and regression models.
• It trains a series of models and combines them to obtain a highly accurate model.
• Each new model is added using gradient descent: the new tree is chosen to reduce the loss, using the gradient of the loss with respect to the current predictions.
• If we want to predict a target output variable that is continuous from a set of input features, a regression algorithm is used.
• In this supervised setting, our job is to guide the model with labeled data so that it achieves high accuracy.
• If the target output feature we want to predict is categorical, a classification algorithm is used. Here too, the model is guided by past observations in the dataset.
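The idea of adding models by gradient descent can be illustrated with plain gradient boosting for regression under squared-error loss. For this loss, the negative gradient is simply the residual (y minus the current prediction), so each new stump is fitted to the current residuals. This is not the XGBoost library or its exact algorithm. XGBoost additionally uses second-order gradient information, regularization and many engineering optimizations. This is a simplified plain-Python sketch on made-up 1-D data, and the `n_rounds` and `learning_rate` values are arbitrary choices.

```python
def fit_regression_stump(X, residuals):
    """Best single split on 1-D X, minimizing the squared error of the residuals."""
    xs = sorted(set(X))
    best = None
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2
        left = [r for x, r in zip(X, residuals) if x <= t]
        right = [r for x, r in zip(X, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)   # leaf values = means
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def gradient_boost(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)                       # start from the mean prediction
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply y - prediction
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_regression_stump(X, residuals)
        stumps.append(stump)
        # Take a small step in the direction suggested by the new stump
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, X)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

X = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0, 7.5, 8.0]
model = gradient_boost(X, y)
print([round(model(x), 2) for x in X])
```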

3.5 Bagging:

• Bagging, also known as Bootstrap Aggregation, is used to reduce variance and avoid overfitting. It is a model-averaging method that is most often applied to decision trees.
• A bagging classifier fits a base classifier on each of several random subsets of the original dataset. It then aggregates (by averaging or voting) their individual predictions to form a better final prediction.
• Each classifier is trained on its own training set, generated randomly by sampling with replacement, so the classifiers can be trained in parallel. Bagging reduces overfitting by averaging or voting. The training sets of the classifiers are drawn independently of each other.
Fig Steps of Prediction by using Bagging

• The figure above shows how bagging works. Bagging creates subsets of the original data by sampling with replacement (bootstrap resampling) and trains a model on each subset separately.
• The final prediction is obtained by averaging or voting over the predictions of all the models.
• A bagging classifier can use different base classifiers, such as decision trees, neural networks, linear classifiers, and so on.
• Algorithm:
- Subsets of the original dataset are created by bootstrap resampling with replacement.
- A base classifier is created for each subset.
- The classifiers are trained in parallel on their own subsets, independently of each other.
- The final prediction is obtained by averaging or voting over all the predictions.
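The bagging algorithm above can be sketched in plain Python. The base classifier is a tiny hypothetical threshold rule on one-dimensional data: predict 1 if x > t. Bootstrap sampling uses `random.Random.randrange` with a fixed seed so the run is repeatable. The models are trained in a simple loop here; because they are independent, they could just as well be trained in parallel. Names, the number of models and the data are assumptions for illustration.

```python
import random
from collections import Counter

def bootstrap_sample(n, rng):
    """Draw n indices with replacement: some repeat, some are left out."""
    return [rng.randrange(n) for _ in range(n)]

def fit_threshold(X, y):
    """Tiny base classifier on 1-D data: the threshold t whose rule
    'predict 1 if x > t else 0' makes the fewest mistakes."""
    best_t, best_err = None, None
    for t in sorted(set(X)):
        err = sum((1 if x > t else 0) != yi for x, yi in zip(X, y))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_fit(X, y, n_models=15, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = bootstrap_sample(len(X), rng)            # 1. bootstrap subset
        X_boot = [X[i] for i in idx]
        y_boot = [y[i] for i in idx]
        models.append(fit_threshold(X_boot, y_boot))   # 2. one base model per subset
    return models

def bagging_predict(models, x):
    votes = [1 if x > t else 0 for t in models]        # 3. every model votes
    return Counter(votes).most_common(1)[0][0]         # 4. majority vote

X = list(range(1, 11))
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
models = bagging_fit(X, y)
print([bagging_predict(models, x) for x in X])
```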

3.6 Random Forest:


• Random forest is a supervised learning algorithm that can be used for both classification and regression.
• It is an ensemble learning method that combines the predictions of many classifiers to improve overall performance.
• Random subsets (bootstrap samples) are drawn from the input dataset, a decision tree is built on each subset, and the predictions of all trees are combined. For classification, the final output is the majority vote of the trees. In addition, each split in a tree considers only a random subset of the features, which keeps the trees less correlated.
• Averaging many trees helps prevent overfitting, and accuracy generally improves as more trees are added to the forest (up to a point).

Fig Random Forest algorithm

• The figure above shows how the random forest algorithm works.
• In this algorithm, multiple trees predict the output. Some trees may predict the correct output and others may not, but taken together (by majority vote) they tend to predict the correct output.
• Two assumptions are needed for this. First, the predictions of the individual trees should have no or low correlation with each other. Second, the features in the dataset should contain actual signal, so that the classifier predicts accurate results rather than guesses.
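• Algorithm: the random forest algorithm works as follows:
- Select a random subset (bootstrap sample) from the training set.
- Build a decision tree on the selected data points.
- Repeat these two steps for the chosen number of trees, then combine the trees' predictions by averaging (regression) or voting (classification).
- The result is the final prediction model, which usually has higher accuracy than a single tree.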
• Advantages:
- It reduces overfitting and achieves higher accuracy.
- It maintains good accuracy even when some data is missing.
- It works well on large datasets.
• Disadvantages:
- It requires more computational resources.
- Its complexity is a major issue: a forest of many trees is much harder to interpret than a single decision tree.
- Training can be a time-consuming process.
• Applications:
- Banking: deciding whether to sanction a loan.
- Marketing: identifying marketing trends.
- Medicine: disease prediction.
- Land use: identifying similar areas.
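The following is a heavily simplified random-forest sketch in plain Python. Each "tree" is only a decision stump, trained on a bootstrap sample and restricted to one randomly chosen feature, and the forest predicts by majority vote. Real random forests grow full decision trees and draw a random subset of features at every split. The stump simplification, the function names and the made-up data are assumptions for illustration.

```python
import random
from collections import Counter

def fit_stump_on_feature(X, y, f):
    """Best rule 'predict 1 if row[f] > t else 0' on a single feature f."""
    best_t, best_err = None, None
    for t in sorted(set(row[f] for row in X)):
        err = sum((1 if row[f] > t else 0) != yi for row, yi in zip(X, y))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return f, best_t

def random_forest_fit(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # random bootstrap subset
        f = rng.randrange(n_features)                # random feature for this "tree"
        X_sub = [X[i] for i in idx]
        y_sub = [y[i] for i in idx]
        forest.append(fit_stump_on_feature(X_sub, y_sub, f))   # build the "tree"
    return forest

def random_forest_predict(forest, row):
    votes = [1 if row[f] > t else 0 for f, t in forest]   # every tree votes
    return Counter(votes).most_common(1)[0][0]            # majority vote

# Made-up data: two numeric features, binary label
X = [[1, 12], [2, 15], [3, 22], [4, 30], [5, 48], [6, 55], [7, 61], [8, 70]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = random_forest_fit(X, y)
print([random_forest_predict(forest, row) for row in X])
```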

3.7 Difference between Boosting and Bagging:


Sr. No. | Boosting | Bagging
1 | Combines different types of predictions. | Combines predictions of the same type.
2 | Used to decrease bias. | Used to decrease variance.
3 | Each model depends on the previously built models. | Each model is built independently.
4 | Tries to reduce underfitting (bias). | Tries to solve the overfitting problem.
5 | Classifiers are trained sequentially. | Classifiers are trained in parallel.
6 | Use boosting if the classifier has high bias. | Use bagging if the classifier has high variance.
7 | Example: AdaBoost | Example: Random Forest

3.8 Difference between Boosting and Random Forest:

Sr. No. | Boosting | Random Forest
1 | Combines weak learners to make predictions. | Uses decision trees to make predictions.
2 | The decision is a weighted combination of the learners' predictions. | Uses voting or averaging for prediction.
3 | Can be very accurate for both classification and regression, but is sensitive to noisy data and outliers. | Often gives accuracy comparable to boosting and is more robust to noise.
4 | Each new learner (e.g., a stump) depends on the previous ones. | Each decision tree is built independently.
5 | Mainly resolves the bias problem. | Mainly resolves the overfitting (variance) problem.
6 | Models are combined sequentially. | Models are combined in parallel.

3.8 Different ways to combine classifier:

• Combining classifiers means grouping the individual decisions of a set of classifiers to make a single prediction.
• A combination of classifiers usually gives more accurate results than an individual classifier.
• There are two main reasons for combining classifiers. The training set may not provide enough data for a single classifier, and some complex problems cannot be solved by a single learning algorithm.
• Classifiers can be combined in several ways: by simple voting or averaging (as in bagging and random forest), sequentially (as in boosting), or by taking a weighted combination of all the models' predictions.
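A weighted combination can be sketched in plain Python as weighted voting. Each model's vote counts as much as its weight, and the label with the largest total weight wins. The predictions and weights below are made up. In practice, a weight might be something like each model's validation accuracy, but that choice is an assumption here.

```python
def weighted_vote(predictions, weights):
    """Combine class labels from several models; each vote counts `weight` times."""
    combined = []
    for sample_labels in zip(*predictions):      # labels of all models for one sample
        scores = {}
        for label, w in zip(sample_labels, weights):
            scores[label] = scores.get(label, 0.0) + w
        combined.append(max(scores, key=scores.get))   # label with largest total weight
    return combined

# Made-up predictions for three samples (one row per model)
preds = [
    ["cat", "dog", "dog"],   # model A
    ["dog", "dog", "cat"],   # model B
    ["dog", "cat", "cat"],   # model C
]
weights = [0.9, 0.4, 0.3]
print(weighted_vote(preds, weights))     # ['cat', 'dog', 'dog']
print(weighted_vote(preds, [1, 1, 1]))   # ['dog', 'dog', 'cat'] (plain majority vote)
```

With equal weights this reduces to a plain majority vote. With unequal weights, a single trusted model (A) can outvote two weaker ones, which changes the first and last predictions.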

University Asked Questions:

1. Explain the Random Forest algorithm in detail Dec 22 10M

Ans. Refer Section 3.6

2. Explain different ways to combine classifiers Dec 22 10M

Ans. Refer Section 3.2, 3.5 and 3.8

Review Questions:

1. What is ensemble learning? Explain in detail.


Ans. Refer Section 3.1

2. Explain K-Fold Cross Validation.

Ans. Refer Section 3.1.1.1

3. Difference between boosting and bagging.

Ans. Refer Section 3.7

4. Difference between Boosting and Random forest.

Ans. Refer Section 3.8

5. Explain Stumping in detail.

Ans. Refer Section 3.3

6. Explain XGBoost in detail.

Ans. Refer Section 3.4

Summary

• This chapter covered combining different types of classifiers to predict the result.
• It also covered different methods of combining classifiers.
