Ensemble Learning
Ensemble Learning is a machine learning technique that combines multiple individual models
to produce a more robust and accurate prediction than any single model. The core idea is that a
group of models, when used together, can perform better than individual models by reducing the
risk of overfitting and increasing generalization to new, unseen data.
Ensemble methods typically rely on diversity among the models, so that their strengths and
weaknesses complement each other. The two foundational approaches are Bagging and Boosting;
Stacking and Voting classifiers are also widely used and are covered below.
Types of Ensemble Learning:
1. Bagging (Bootstrap Aggregating)
● Goal: Reduce variance and prevent overfitting by aggregating the predictions of multiple
models trained independently.
● How it Works:
○ In bagging, multiple models are trained in parallel, each on a different subset of
the data.
○ These subsets are created by bootstrapping, which involves randomly sampling
the dataset with replacement (i.e., some data points may appear more than once
in a subset).
○ After training the models, their predictions are aggregated to produce the final
output. For classification, the final prediction is usually made by voting (majority
class). For regression, it’s typically the average of the predictions.
● Advantages:
○ Reduces overfitting by smoothing out the model's predictions.
○ Works well with unstable models (models that have high variance) like decision
trees.
● Popular Algorithms:
○ Random Forest: An ensemble of decision trees, each trained on a random
subset of the data, and the final prediction is made by averaging or voting.
○ Bagged Decision Trees: Decision trees trained using bootstrapped samples of
the data.
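To make the bootstrap-and-vote procedure above concrete, here is a minimal Python sketch using scikit-learn decision trees; the synthetic dataset, the 25 estimators, and the use of make_classification are illustrative assumptions rather than anything prescribed by the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data, purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(25):
    # Bootstrap: sample the training set with replacement
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Aggregate: majority vote across the 25 trees (labels are 0/1 here)
votes = np.array([tree.predict(X) for tree in trees])    # shape (25, n_samples)
bagged_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("training accuracy:", (bagged_pred == y).mean())
```

In practice, scikit-learn's RandomForestClassifier automates this loop and additionally considers a random subset of features at each split, which increases the diversity of the trees.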
2. Boosting
● Goal: Reduce bias and improve the predictive accuracy by combining models in a
sequential manner, where each new model corrects the errors made by the previous
ones.
● How it Works:
○ In boosting, models are trained sequentially. Each new model focuses more on
the errors made by the previous models by assigning higher weights to
misclassified data points.
○ The process continues until a predefined number of models are trained or a
certain level of accuracy is achieved.
○ The final prediction is made by combining the predictions of all models, usually
with weighted averaging or voting.
● Advantages:
○ Can significantly improve the performance of weak models by focusing on
difficult-to-classify cases.
○ Works well for both classification and regression tasks.
● Popular Algorithms:
○ AdaBoost (Adaptive Boosting): Focuses on training weak classifiers (e.g.,
decision trees) sequentially. Each model corrects the errors of the previous one,
and the final prediction is a weighted combination of all models.
○ Gradient Boosting Machines (GBM): Fits new models that correct the residual
errors of previous models, using gradient descent optimization to minimize the
error.
○ XGBoost: A more efficient and regularized implementation of gradient boosting
that is widely used for structured data.
○ LightGBM: A fast, distributed implementation of gradient boosting for large
datasets.
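As a hedged sketch of the boosting algorithms listed above, the snippet below fits scikit-learn's AdaBoost and gradient boosting classifiers on a synthetic dataset; the dataset, number of estimators, learning rates, and tree depth are arbitrary illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# AdaBoost: sequential weak learners, misclassified points get higher weight
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5).fit(X_tr, y_tr)

# Gradient boosting: each new tree fits the residual errors of the ensemble so far
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3).fit(X_tr, y_tr)

print("AdaBoost test accuracy:", ada.score(X_te, y_te))
print("GBM test accuracy:     ", gbm.score(X_te, y_te))
```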
3. Stacking (Stacked Generalization)
● Goal: Combine predictions from multiple models (which may be of different types) by
training a meta-model (a higher-level model) on their outputs.
● How it Works:
○ First, multiple base models (e.g., decision trees, logistic regression, etc.) are
trained on the dataset.
○ Then, a meta-model is trained on the outputs of these base models (i.e., the
predictions made by each base model) to make the final prediction.
● Advantages:
○ Leverages the strengths of various types of models, combining them into a more
powerful predictive system.
● Popular Algorithms: Stacking can be applied with any combination of models (e.g.,
decision trees, SVMs, neural networks, etc.).
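A minimal stacking sketch with scikit-learn's StackingClassifier, assuming a random forest and an SVM as base models and logistic regression as the meta-model; these particular choices (and the synthetic data) are illustrative, since stacking works with any combination of models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Base models of different types; a logistic-regression meta-model learns
# how to combine their cross-validated predictions.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(),
    cv=5,
)
print("training accuracy:", stack.fit(X, y).score(X, y))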
4. Voting Classifier
● Goal: Use a combination of different models to make a final prediction based on majority
voting or averaging.
● How it Works:
○ In a hard voting classifier, each model casts a vote for a class, and the class
that gets the most votes becomes the final prediction.
○ In a soft voting classifier, each model provides class probabilities, and the class
with the highest average probability across all models is selected.
● Advantages:
○ Simple to implement and understand.
○ Often used as a baseline ensemble method.
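A short sketch of hard versus soft voting using scikit-learn's VotingClassifier; the three base models and the synthetic dataset are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = [("lr", LogisticRegression(max_iter=1000)),
          ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
          ("nb", GaussianNB())]

hard = VotingClassifier(estimators=models, voting="hard")  # majority vote on predicted labels
soft = VotingClassifier(estimators=models, voting="soft")  # average of predicted class probabilities
print(hard.fit(X, y).score(X, y), soft.fit(X, y).score(X, y))
```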
Key Differences between Bagging and Boosting:
● Training Process: Bagging trains models in parallel on random subsets of the data; Boosting
trains models sequentially, each one correcting the errors of the previous one.
● Focus: Bagging reduces variance by averaging predictions from several models; Boosting
reduces bias by giving more weight to misclassified data points in each round.
● Model Dependence: In Bagging, each model is trained independently; in Boosting, each model
depends on the previous one, and errors are corrected sequentially.
● Final Prediction: Bagging uses majority voting (classification) or averaging (regression);
Boosting uses a weighted sum of the predictions from all models.
● Example Algorithms: Random Forest and Bagged Decision Trees for Bagging; AdaBoost,
Gradient Boosting, and XGBoost for Boosting.
Advantages of Ensemble Learning:
1. Improved Accuracy:
○ By combining multiple models, ensemble learning can often achieve higher
accuracy compared to individual models.
2. Reduces Overfitting:
○ Ensemble methods like bagging help reduce the risk of overfitting by averaging
predictions from multiple models trained on different subsets of data.
3. Robustness:
○ Ensemble methods are more robust to noisy data or outliers, as they rely on the
consensus of multiple models.
4. Flexibility:
○ Can be used with any machine learning model (e.g., decision trees, logistic
regression, neural networks), making them versatile for various tasks.
Disadvantages of Ensemble Learning:
1. Increased Complexity:
○ Ensembles of models are more complex to implement, train, and tune. They
require more computational resources and time compared to individual models.
2. Interpretability:
○ Since ensembles combine several models, it can be harder to interpret their
decisions, making them less transparent.
3. Risk of Overfitting (in Boosting):
○ Although boosting can reduce bias, it is also more prone to overfitting if the
models are too complex or the data is noisy.
Applications of Ensemble Learning:
● Image Classification: Combining different types of classifiers for object recognition in
images.
● Spam Detection: Using multiple models to classify emails as spam or not.
● Customer Churn Prediction: Using ensembles to predict whether a customer will leave
a service.
● Financial Forecasting: Predicting stock prices or market trends using various models
trained on financial data.
Summary:
Ensemble Learning combines the predictions of multiple models to improve overall
performance. It works by either training models in parallel (Bagging) to reduce variance or in
sequence (Boosting) to reduce bias. Popular ensemble techniques include Random Forest
(Bagging) and AdaBoost/Gradient Boosting (Boosting). While ensemble methods can improve
predictive accuracy, they are computationally more expensive and harder to interpret.
You said:
elaborate. K-Nearest Neighbors (K-NN): A non-parametric algorithm that classifies a data point
based on the majority class among its ‘k’ closest neighbors. It’s simple but computationally
expensive for large datasets.
ChatGPT said:
K-Nearest Neighbors (K-NN)
K-Nearest Neighbors (K-NN) is a non-parametric, lazy learning algorithm used for both
classification and regression tasks. It is one of the simplest machine learning algorithms, but it
can be computationally expensive, especially with large datasets.
Key Characteristics of K-NN:
1. Non-Parametric:
○ K-NN does not assume any underlying distribution or structure for the data. It
makes decisions based purely on the data itself, without learning a model in the
traditional sense.
2. Lazy Learning:
○ K-NN is a lazy learner, meaning it doesn’t learn or build a model during training.
Instead, it stores the training dataset and only performs computation at the time
of prediction. This is different from eager learning algorithms like decision trees
or neural networks, which learn a model during training.
3. Instance-Based:
○ K-NN makes decisions by comparing a test data point to the training data. It
uses the stored data for prediction rather than generalizing a model.
How K-NN Works:
1. Training Phase:
○ The algorithm simply stores the training data. There is no model fitting or
parameter adjustment in this phase.
2. Prediction Phase:
○ Given a test data point, K-NN finds the K closest neighbors from the training
dataset. These neighbors are identified based on a distance metric (e.g.,
Euclidean distance, Manhattan distance, or Cosine similarity).
○ Distance Metric: A mathematical function that measures how far apart two
points are in the feature space. For most cases, Euclidean distance is used, but
others like Manhattan or Minkowski distance can also be used.
3. The Euclidean distance between two points $x = (x_1, x_2, \dots, x_n)$ and
$y = (y_1, y_2, \dots, y_n)$ in an n-dimensional space is:
$$\text{Distance}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
Once the K nearest neighbors are identified, the prediction is made:
○ For Classification: The class label is determined by a majority vote among the
K nearest neighbors. The class that appears most frequently among the K
neighbors is assigned to the test point.
○ For Regression: The output is typically the average or mean of the values of the
K nearest neighbors.
4. Choosing K:
○ The value of K (the number of neighbors) is a crucial hyperparameter in K-NN.
○ Small K (e.g., 1 or 3) may lead to overfitting, as the model becomes too sensitive
to noise in the data.
○ Large K provides smoother, more generalized predictions, but may suffer from
underfitting, as it might smooth out important distinctions between classes.
○ Generally, the right value of K is chosen through cross-validation, picking the
value that minimizes the validation error.
Example of K-NN for Classification:
Consider a dataset with two features: height and weight, and the goal is to classify people as
either "tall" or "short" based on their height and weight.
1. Training data might include points like:
○ (170 cm, 65 kg) -> "Tall"
○ (160 cm, 55 kg) -> "Short"
○ (180 cm, 75 kg) -> "Tall"
○ (165 cm, 60 kg) -> "Short"
2. Test point: (167 cm, 62 kg)
○ Find the 3 nearest neighbors to (167, 62) using the Euclidean distance.
○ Suppose the 3 nearest neighbors are: (160 cm, 55 kg), (170 cm, 65 kg), and (165
cm, 60 kg).
○ The majority of the neighbors are "Short", so the test point (167 cm, 62 kg) would
be classified as "Short".
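The worked example above can be reproduced with a few lines of NumPy; knn_predict is a hypothetical helper written for this illustration, not a library function.

```python
import numpy as np
from collections import Counter

# Training data from the example above: (height in cm, weight in kg) -> label
X_train = np.array([[170, 65], [160, 55], [180, 75], [165, 60]])
y_train = np.array(["Tall", "Short", "Tall", "Short"])

def knn_predict(x_test, X, y, k=3):
    """Classify x_test by majority vote among its k nearest neighbors."""
    distances = np.sqrt(((X - x_test) ** 2).sum(axis=1))  # Euclidean distance to every training point
    nearest = np.argsort(distances)[:k]                   # indices of the k closest points
    return Counter(y[nearest]).most_common(1)[0][0]       # majority class among them

print(knn_predict(np.array([167, 62]), X_train, y_train, k=3))  # prints "Short"
```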
Advantages of K-NN:
1. Simplicity:
○ K-NN is easy to understand and implement. It does not require any complex
training process or model fitting.
2. Non-Parametric:
○ Since K-NN doesn’t assume any prior knowledge or distribution about the data, it
can work well with a wide range of datasets.
3. Versatile:
○ K-NN can be used for both classification (categorical output) and regression
(continuous output) tasks.
4. Flexibility:
○ The algorithm is very flexible and can easily be adjusted by changing the value of
K or the distance metric.
5. Adaptive:
○ K-NN naturally adapts to the complexity of the data. If more data points are
added, K-NN’s performance can improve, as it leverages the entire dataset.
Disadvantages of K-NN:
1. Computationally Expensive:
○ Since K-NN stores all the training data and performs computation at the time of
prediction, it can be very slow, especially with large datasets.
○ Distance calculations between the test point and all training points require
significant computation.
2. Memory Intensive:
○ K-NN requires storing the entire dataset in memory, which can be inefficient for
large datasets.
3. Sensitive to Noise:
○ K-NN is sensitive to noisy data, as it makes decisions based on the closest
neighbors. Outliers or irrelevant features can heavily influence the model's
predictions.
4. Curse of Dimensionality:
○ As the number of features (dimensions) increases, the distance between points
becomes less meaningful, and the performance of K-NN degrades. This is known
as the curse of dimensionality.
○ In high-dimensional spaces, the concept of "nearness" can become less clear,
and K-NN may require dimensionality reduction techniques (like PCA or LDA) to
work well.
5. Choice of K:
○ Selecting the optimal value for K is a challenge, and improper choices can lead to
overfitting (too small K) or underfitting (too large K).
K-NN in Practice:
● Distance Metric:
○ Euclidean distance is most common, but other distance measures may be more
suitable depending on the problem (e.g., Manhattan distance, Minkowski
distance).
● Scaling and Normalization:
○ Features should be scaled or normalized before applying K-NN, because the
distance metric is affected by the scale of the data. Features with larger ranges
dominate the distance calculation, leading to biased results.
● Weighted K-NN:
○ Instead of treating all neighbors equally, you can weight the neighbors based on
their distance from the test point. Neighbors that are closer to the test point are
given higher weights.
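A hedged scikit-learn sketch tying these practical points together: features are standardized before the distance computation, neighbors are weighted by inverse distance (weights="distance"), and several values of K are compared by cross-validation; the synthetic dataset and the candidate K values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Scale features so that no single feature dominates the distance metric,
# and weight neighbors by inverse distance instead of treating them equally.
for k in (1, 3, 5, 11):
    knn = make_pipeline(StandardScaler(),
                        KNeighborsClassifier(n_neighbors=k, weights="distance"))
    score = cross_val_score(knn, X, y, cv=5).mean()
    print(f"k={k}: cross-validated accuracy = {score:.3f}")
```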
Applications of K-NN:
1. Image Classification: Classifying images by comparing pixel values to known labeled
images.
2. Recommendation Systems: Finding items similar to those a user has liked or
purchased.
3. Pattern Recognition: Identifying patterns in handwriting, speech, or other sensory data.
4. Medical Diagnosis: Classifying diseases based on medical data, such as classifying a
tumor as benign or malignant based on features like size, shape, etc.
5. Anomaly Detection: Detecting rare events or outliers based on the patterns of
surrounding data points.
Summary:
K-Nearest Neighbors (K-NN) is a simple, intuitive, and versatile machine learning algorithm
used for classification and regression tasks. It classifies data points based on the majority class
or average of the K nearest neighbors. While K-NN is non-parametric and doesn’t make any
assumptions about the data, it can be computationally expensive, especially with large datasets,
and is sensitive to noisy data and high-dimensional spaces. Properly tuning the K value and
preprocessing the data (such as scaling features) are essential for optimizing K-NN’s
performance.
You said:
elaborate Gradient Descent: An optimization algorithm used to minimize a model’s loss function.
Variants include: Batch Gradient Descent: Uses the entire dataset to calculate the gradient.
Stochastic Gradient Descent (SGD): Uses one sample at a time, reducing computation.
Mini-Batch Gradient Descent: Combines the advantages of batch and stochastic methods by
using small random batches.
ChatGPT said:
Gradient Descent
Gradient Descent is an optimization algorithm used in machine learning to minimize the loss
function of a model. The loss function (also called the cost function) measures how far off the
model's predictions are from the actual outcomes. The goal of gradient descent is to find the
model parameters (weights) that minimize the loss function by iteratively adjusting the
parameters in the direction that reduces the loss.
Concept of Gradient Descent
Gradient descent works by updating the model's parameters based on the gradient (derivative)
of the loss function with respect to the parameters. In simple terms, it finds the steepest
direction to reduce the error and takes steps in that direction.
1. Loss Function: Measures how well the model predicts the target variable.
2. Gradient: The derivative of the loss function with respect to the model's parameters. The
gradient tells us the direction and rate of change in the loss function.
3. Learning Rate: A hyperparameter that determines the step size during each iteration of
gradient descent. A higher learning rate may speed up convergence but risk
overshooting the minimum, while a lower learning rate may lead to slow convergence.
The basic update rule in gradient descent for a parameter $\theta$ is:
$$\theta = \theta - \alpha \cdot \nabla_{\theta} J(\theta)$$
Where:
● $\theta$ is the parameter (e.g., the weights in a linear model).
● $\alpha$ is the learning rate (step size).
● $\nabla_{\theta} J(\theta)$ is the gradient of the loss function $J(\theta)$ with respect to $\theta$.
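To illustrate this update rule, here is a minimal NumPy sketch that fits a one-variable linear model y ≈ w·x + b by gradient descent on a mean-squared-error loss; the toy data, learning rate, and iteration count are arbitrary choices for demonstration.

```python
import numpy as np

# Toy data for y = 2x + 1 with a little noise (illustrative values only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=100)
y = 2 * X + 1 + 0.05 * rng.normal(size=100)

w, b = 0.0, 0.0      # parameters (theta)
alpha = 0.1          # learning rate
for _ in range(2000):
    error = w * X + b - y
    grad_w = 2 * np.mean(error * X)   # dJ/dw for the mean-squared-error loss
    grad_b = 2 * np.mean(error)       # dJ/db
    w -= alpha * grad_w               # theta = theta - alpha * gradient
    b -= alpha * grad_b

print(round(w, 2), round(b, 2))       # approaches 2 and 1
```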
Types of Gradient Descent
There are three main variants of gradient descent, each with different trade-offs regarding
computational efficiency and convergence speed:
1. Batch Gradient Descent (BGD)
Batch Gradient Descent uses the entire dataset to compute the gradient of the loss function
and updates the model parameters accordingly.
How it works:
● In each iteration, BGD computes the gradient of the loss function with respect to the
model parameters over the entire dataset.
● It then adjusts the parameters based on the average gradient of all data points.
Advantages:
● Converges smoothly because the gradient is averaged over the whole dataset, leading
to more stable updates.
● Can take advantage of vectorized operations (efficient computation) for large datasets.
Disadvantages:
● Computationally expensive for large datasets because it requires calculating the
gradient over the entire dataset in every iteration.
● Slow to update parameters, especially when the dataset is large.
Use case:
● Suitable when the dataset is small enough to be processed all at once and
computational resources are sufficient.
2. Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent computes the gradient and updates the model parameters
based on one random data point at a time.
How it works:
● In each iteration, SGD picks a single random data point and computes the gradient of
the loss function with respect to that point.
● It then adjusts the parameters based on this single data point.
Advantages:
● Faster updates since it processes only one data point at a time.
● Can handle large datasets efficiently because it doesn't need to load the entire dataset
into memory.
● Introduces more randomness (noise) in the updates, which can help escape local
minima or saddle points, potentially leading to better solutions.
Disadvantages:
● The updates are noisy and the path to convergence can be unstable. As a result, the
loss function fluctuates and tends to oscillate around a minimum rather than settling exactly at it.
● Slower convergence to the minimum compared to batch gradient descent, as it does
not take the average gradient.
Use case:
● Works well with large datasets or real-time learning tasks where the data continuously
changes, such as in online learning.
3. Mini-Batch Gradient Descent
Mini-Batch Gradient Descent is a compromise between batch gradient descent and stochastic
gradient descent. It uses small, random subsets (batches) of the dataset, rather than the entire
dataset or a single data point, to compute the gradient.
How it works:
● In each iteration, mini-batch gradient descent divides the dataset into small random
batches (usually 32, 64, or 128 data points in each batch).
● The model parameters are updated after computing the gradient for each batch.
Advantages:
● Faster convergence compared to batch gradient descent, as it updates parameters
more frequently.
● Less noisy than stochastic gradient descent, leading to more stable convergence.
● Can take advantage of vectorization and modern hardware (e.g., GPUs) for efficient
computation.
● Works well with large datasets because it balances memory usage and computational
efficiency.
Disadvantages:
● The optimal mini-batch size is a hyperparameter that needs to be tuned.
● May still be sensitive to the choice of learning rate and batch size.
Use case:
● Mini-batch gradient descent is widely used in deep learning because it strikes a balance
between efficiency, memory usage, and the ability to converge faster than BGD and with
less noise than SGD.
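The three variants differ only in how much data feeds each parameter update, which the following sketch makes explicit: batch_size=len(X) recovers batch gradient descent and batch_size=1 recovers SGD; the toy data, learning rate, epoch count, and batch sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=200)
y = 2 * X + 1 + 0.05 * rng.normal(size=200)

def fit(batch_size, alpha=0.1, epochs=300):
    """Mini-batch gradient descent on a 1-D linear model with an MSE loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        order = rng.permutation(len(X))                # reshuffle the data every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            error = w * X[idx] + b - y[idx]
            w -= alpha * 2 * np.mean(error * X[idx])   # gradient of MSE w.r.t. w on this batch
            b -= alpha * 2 * np.mean(error)            # gradient of MSE w.r.t. b on this batch
    return round(w, 2), round(b, 2)

print(fit(batch_size=len(X)))   # batch gradient descent: one update per epoch
print(fit(batch_size=32))       # mini-batch gradient descent
print(fit(batch_size=1))        # stochastic gradient descent: one update per data point
```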
Comparison of the Three Variants:
● Data Processing: Batch uses the entire dataset at once; SGD uses one data point at a time;
Mini-Batch uses small random batches of the dataset.
● Update Frequency: Batch updates after processing the entire dataset; SGD after each data
point; Mini-Batch after each mini-batch.
● Speed of Updates: Batch is slow; SGD is fast; Mini-Batch is faster than Batch but slower
than SGD.
● Computational Efficiency: Batch is less efficient on large datasets; SGD is highly efficient
on large datasets; Mini-Batch is efficient on large datasets and offers a good balance.
● Convergence: Batch is stable and smooth; SGD is noisy and fluctuates but may escape local
minima; Mini-Batch is stable with some fluctuations.
● Suitable For: Batch suits small datasets; SGD suits large datasets and online learning;
Mini-Batch suits large datasets and deep learning.
● Memory Usage: Batch is high; SGD is low (one data point at a time); Mini-Batch is moderate
(depends on batch size).
Key Concepts to Tune:
● Learning Rate: A crucial hyperparameter that determines how large a step is taken
toward the minimum. Too high a learning rate can cause overshooting, while too low a
rate can make convergence slow.
● Mini-Batch Size: For mini-batch gradient descent, the size of the mini-batches needs to
be chosen carefully. A batch size that is too small can lead to noisy updates, while a size
that is too large can slow down the computation.
● Momentum: Momentum is used in some gradient descent variants (like SGD with
momentum), where the update is based not only on the current gradient but also on the
past gradients. This helps to smooth out the updates and avoid oscillations.
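A minimal sketch of the momentum idea, assuming a user-supplied gradient_fn that returns the gradient of the loss at the current parameters; the coefficients alpha and beta and the quadratic example are illustrative.

```python
import numpy as np

def sgd_momentum(theta, gradient_fn, alpha=0.01, beta=0.9, steps=1000):
    """Gradient descent with momentum: keep a decaying average of past gradients."""
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = beta * v + gradient_fn(theta)   # velocity: past gradients plus the current one
        theta = theta - alpha * v           # step along the smoothed direction
    return theta

# Example: minimize J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
print(sgd_momentum(np.array([0.0]), lambda t: 2 * (t - 3)))   # approaches [3.]
```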
Applications of Gradient Descent:
1. Linear and Logistic Regression: Optimizing weights in regression models to minimize
error.
2. Neural Networks: Training deep learning models by adjusting weights in layers using
backpropagation, which utilizes gradient descent.
3. Support Vector Machines: Optimizing the hyperplane to separate classes.
4. Recommender Systems: Finding optimal parameters for collaborative filtering.
5. Deep Learning Frameworks: Popular frameworks like TensorFlow, Keras, and PyTorch
use variations of gradient descent to train deep neural networks.
Summary:
Gradient Descent is a widely used optimization algorithm for minimizing a model's loss function.
It has several variants, including Batch Gradient Descent, which uses the entire dataset,
Stochastic Gradient Descent (SGD), which uses one data point at a time, and Mini-Batch
Gradient Descent, which combines the advantages of both by using small batches of data.
Each variant has its own strengths and trade-offs in terms of computation, convergence speed,
and stability. Gradient descent is at the core of many machine learning and deep learning
algorithms.