# Unit IV: Ensemble Learning & Unsupervised Learning – Study Material
## Ensemble Learning
Ensemble Learning is a technique where multiple models are combined to improve
overall performance. It reduces errors, increases accuracy, and handles data variability
better than individual models.
### Key Features:
1. Combines multiple weak learners to create a strong learner.
2. Improves generalization and reduces overfitting.
3. Works well for both classification and regression tasks.
### Types of Ensemble Learning:
- **Bagging**: Reduces variance by training multiple models on random subsets (e.g.,
Random Forest).
- **Boosting**: Reduces bias by training models sequentially, giving more weight to
misclassified instances (e.g., AdaBoost, Gradient Boosting).
- **Stacking**: Combines multiple models using a meta-learner for final predictions.
## Model Combination Schemes
Different strategies exist for combining multiple models in ensemble learning; a short voting and stacking sketch follows the list below.
1. **Voting**: In classification, multiple models vote, and the majority class is selected.
2. **Error-Correcting Output Codes (ECOC)**: Decomposes multi-class problems into
multiple binary classifications.
3. **Bagging (Bootstrap Aggregating)**: Trains models independently on different
subsets of data and averages results.
4. **Boosting**: Models are trained sequentially, correcting errors from previous
models.
5. **Stacking**: Outputs from base learners are combined using another model (meta-
learner) for final predictions.
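As a minimal sketch of the voting and stacking schemes, assuming scikit-learn is available: the synthetic dataset, choice of base learners, and meta-learner below are illustrative assumptions, not prescriptions from these notes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

base_learners = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(max_depth=3)),
    ("nb", GaussianNB()),
]

# Voting: each base model predicts, and the majority class is selected.
voting = VotingClassifier(estimators=base_learners, voting="hard")
voting.fit(X_train, y_train)
print("Voting accuracy:", voting.score(X_test, y_test))

# Stacking: a meta-learner combines the base models' outputs.
stacking = StackingClassifier(estimators=base_learners,
                              final_estimator=LogisticRegression(max_iter=1000))
stacking.fit(X_train, y_train)
print("Stacking accuracy:", stacking.score(X_test, y_test))
```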
## Bagging: Random Forest
Bagging (Bootstrap Aggregating) improves stability and accuracy by training models independently on bootstrap samples and aggregating their predictions, which reduces variance and overfitting. A Random Forest sketch follows the lists below.
### **Random Forest**:
- Uses multiple Decision Trees trained on different subsets of data.
- Predictions are averaged (regression) or majority-voted (classification).
- Handles missing values and large datasets well.
### **Advantages**:
- Reduces overfitting.
- Works well with high-dimensional data.
- Can be used for feature importance ranking.
### **Disadvantages**:
- Requires more computational power.
- Loses interpretability compared to individual Decision Trees.
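A minimal Random Forest sketch with scikit-learn; the Iris dataset and hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 Decision Trees, each trained on a bootstrap sample of the data.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

print("Test accuracy:", rf.score(X_test, y_test))
# Feature importance ranking, one of the advantages noted above.
print("Feature importances:", rf.feature_importances_)
```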
## Boosting: AdaBoost
Boosting combines weak models sequentially, giving more weight to misclassified instances; an AdaBoost sketch appears after the lists below.
### **AdaBoost (Adaptive Boosting)**:
- Assigns weights to each sample and updates them iteratively.
- Focuses on misclassified samples to improve predictions.
- Uses weak classifiers like Decision Stumps.
### **Advantages**:
- Reduces bias, improving weak classifiers.
- Often achieves higher accuracy than bagging on complex datasets.
### **Disadvantages**:
- Sensitive to noise in the dataset.
- Slower training due to sequential model building.
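A minimal AdaBoost sketch with scikit-learn, whose default weak learner is a decision stump (depth-1 tree); the dataset and parameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Each new stump focuses on the samples that earlier stumps misclassified.
ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=1)
ada.fit(X_train, y_train)
print("Test accuracy:", ada.score(X_test, y_test))
```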
## Unsupervised Learning
Unsupervised Learning finds patterns in **unlabeled data**. Unlike supervised learning,
it does not rely on predefined outputs.
### **Key Features**:
1. Works with **unlabeled** data.
2. Groups similar data points or reduces dimensionality.
3. Used in anomaly detection, recommendation systems, and exploratory data analysis.
### **Main Types**:
- **Clustering**: Groups similar data points.
- **Dimensionality Reduction**: Reduces dataset complexity while preserving essential
information (e.g., PCA, LLE, Factor Analysis).
## Clustering: Introduction
Clustering is an unsupervised learning technique that **groups similar data points**
based on some similarity measure.
### **Types of Clustering**:
1. **Hierarchical Clustering**: Builds a hierarchy of clusters (e.g., AGNES, DIANA).
2. **Partitional Clustering**: Divides data into distinct clusters (e.g., K-Means, K-Modes).
3. **Density-Based Clustering**: Identifies clusters based on dense regions (e.g., DBSCAN, Mean-Shift); a short DBSCAN sketch follows this list.
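As a quick illustration of density-based clustering, the sketch below runs DBSCAN on a toy two-moons dataset; the `eps` and `min_samples` values are illustrative assumptions for this data.

```python
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Points in dense regions form clusters; sparse points are labelled -1 (noise).
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print("Cluster labels found:", set(labels))
```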
## Hierarchical Clustering: AGNES & DIANA
Hierarchical Clustering builds a nested structure of clusters; an agglomerative clustering sketch follows the lists below.
### **AGNES (Agglomerative Nesting)**:
- A **bottom-up** approach: Each data point starts as its own cluster and merges step by
step.
- Uses linkage methods (single, complete, average).
### **DIANA (Divisive Analysis)**:
- A **top-down** approach: All data points start in one cluster and are split iteratively.
### **Advantages**:
- No need to predefine the number of clusters.
- Dendrograms provide visual insights.
### **Disadvantages**:
- Computationally expensive for large datasets.
- Sensitive to noise and outliers.
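A minimal bottom-up (AGNES-style) sketch using scikit-learn's AgglomerativeClustering; the toy dataset, linkage method, and cluster count are illustrative assumptions, and only the agglomerative side is sketched here.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Each point starts as its own cluster; clusters merge until 3 remain.
agnes = AgglomerativeClustering(n_clusters=3, linkage="average")
labels = agnes.fit_predict(X)
print("Cluster sizes:", [list(labels).count(c) for c in range(3)])
```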
## Partitional Clustering: K-Means & K-Modes
Partitional Clustering divides data into a **fixed number (K) of clusters**; a K-Means sketch follows the lists below.
### **K-Means Clustering**:
- Assigns data points to **K clusters** based on distance (usually Euclidean).
- Iteratively updates centroids to minimize variance.
### **K-Modes Clustering**:
- Used for categorical data instead of numerical values.
- Replaces means with **modes** (most frequent values).
### **Advantages**:
- Fast and scalable for large datasets.
- Works well when clusters are well-separated.
### **Disadvantages**:
- Sensitive to initial cluster centers.
- Does not handle outliers well.
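A minimal K-Means sketch with scikit-learn; K=3 and the toy blobs dataset are illustrative assumptions. (K-Modes is not in scikit-learn and is typically provided by the third-party `kmodes` package, so only K-Means is shown.)

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Centroids are updated iteratively to minimise within-cluster variance.
km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)
print("Centroids:\n", km.cluster_centers_)
```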
## Dimensionality Reduction: PCA & LLE
Dimensionality reduction techniques reduce the number of features while preserving
important information; a PCA and LLE sketch follows the lists below.
### **Principal Component Analysis (PCA)**:
- Finds new feature axes (principal components) that maximize variance.
- Used in image compression, face recognition.
### **Locally Linear Embedding (LLE)**:
- A nonlinear technique preserving local relationships in data.
- Suitable for highly nonlinear structures.
### **Advantages**:
- Reduces noise and redundancy.
- Speeds up model training.
### **Disadvantages**:
- Can lose interpretability.
- Assumes linearity (for PCA).
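A minimal sketch of PCA and LLE with scikit-learn; the digits dataset, component counts, and neighbour count are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding

X, _ = load_digits(return_X_y=True)   # 64 pixel features per image

# PCA: linear projection onto the directions of maximum variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print("Variance explained by 2 components:", pca.explained_variance_ratio_.sum())

# LLE: nonlinear embedding that preserves each point's local neighbourhood.
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
X_lle = lle.fit_transform(X)
print("Embedded shape:", X_lle.shape)
```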