In conclusion, dimensionality reduction improves machine learning models by simplifying
data, enhancing performance, and preventing overfitting. Both feature selection and feature
extraction help reduce complexity, with feature selection focusing on important features and
feature extraction creating new, compact features. While it offers benefits like better
visualization and faster computation, care must be taken to avoid data loss and
interpretability issues. Proper use of dimensionality reduction techniques can lead to more
efficient and effective models.
COVARIANCE MATRIX
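As a minimal sketch (using a made-up toy data matrix), the sample covariance matrix, used for
example when computing principal components, can be estimated with NumPy as follows:

```python
import numpy as np

# Toy data matrix: 5 samples (rows) and 3 features (columns).
X = np.array([[2.5, 2.4, 1.2],
              [0.5, 0.7, 0.3],
              [2.2, 2.9, 1.0],
              [1.9, 2.2, 0.9],
              [3.1, 3.0, 1.4]])

# Centre each feature, then average the outer products; dividing by n - 1 gives
# the unbiased sample estimate.
X_centred = X - X.mean(axis=0)
cov_manual = X_centred.T @ X_centred / (X.shape[0] - 1)

# NumPy's built-in estimator returns the same matrix (rowvar=False: columns are features).
cov_numpy = np.cov(X, rowvar=False)

print(np.allclose(cov_manual, cov_numpy))   # True
print(cov_numpy)
```

The (i, j) entry measures how features i and j vary together, and the diagonal holds the
per-feature variances.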
Early stopping
Early stopping is a regularization technique used to prevent overfitting in neural networks. It
monitors the model’s performance on a validation set during training and stops the training
process when the performance starts to degrade.
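As a minimal sketch of the mechanism, the loop below uses a synthetic validation-loss curve in
place of real measurements; in practice each value would come from evaluating the model on a
held-out validation set after every epoch:

```python
import numpy as np

# Synthetic validation-loss curve: it improves early on, then slowly degrades
# (the typical overfitting pattern). In a real training run each value would
# come from evaluating the model on the validation set after an epoch.
rng = np.random.default_rng(0)
val_losses = [1.0 / (epoch + 1) + 0.01 * epoch + 0.005 * rng.standard_normal()
              for epoch in range(100)]

patience = 5                      # epochs of no improvement we tolerate
best_loss = float("inf")
best_epoch = 0
epochs_without_improvement = 0

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_loss:      # validation performance improved
        best_loss = val_loss
        best_epoch = epoch        # remember the best checkpoint
        epochs_without_improvement = 0
    else:                         # no improvement this epoch
        epochs_without_improvement += 1

    if epochs_without_improvement >= patience:
        print(f"Stopping at epoch {epoch}; best epoch was {best_epoch} "
              f"(validation loss {best_loss:.3f})")
        break
```

The patience parameter trades off reacting too quickly to noisy validation scores against
training too long past the best checkpoint.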
BIAS AND VARIANCE
There are various ways to evaluate a machine-learning model.
We can use MSE (Mean Squared Error) or Mean Absolute Error for regression, and Precision,
Recall, and the ROC (Receiver Operating Characteristic) curve for classification problems.
In a similar way, bias and variance help us tune parameters and decide which of several
candidate models fits best.
• Bias is one type of error; it arises from wrong assumptions about the data, such as assuming
the data is linear when in reality it follows a more complex function.
• Variance, on the other hand, is introduced by high sensitivity to variations in the training
data. It is also a type of error, since we want the model to be robust against noise.
• There are two types of error in machine learning: reducible error and irreducible error. Bias
and variance come under reducible error (the sketch after this list makes the decomposition
concrete).
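The following self-contained sketch (synthetic data and NumPy polynomial fits, with degrees 1
and 9 chosen purely for illustration) estimates bias squared and variance by refitting the same
model class on many resampled training sets:

```python
import numpy as np

rng = np.random.default_rng(0)
true_fn = lambda x: np.sin(2 * np.pi * x)   # non-linear ground truth
x_test = np.linspace(0.05, 0.95, 50)        # fixed evaluation points

def predictions(degree, n_repeats=200, n_train=20, noise=0.3):
    """Fit a polynomial of the given degree on many resampled training sets."""
    preds = np.empty((n_repeats, x_test.size))
    for i in range(n_repeats):
        x_train = rng.uniform(0.0, 1.0, n_train)
        y_train = true_fn(x_train) + noise * rng.standard_normal(n_train)
        coeffs = np.polyfit(x_train, y_train, degree)   # refit on this resample
        preds[i] = np.polyval(coeffs, x_test)
    return preds

for degree in (1, 9):
    preds = predictions(degree)
    # Bias^2: squared gap between the average prediction and the true function.
    bias_sq = np.mean((preds.mean(axis=0) - true_fn(x_test)) ** 2)
    # Variance: how much predictions move around across training resamples.
    variance = np.mean(preds.var(axis=0))
    print(f"degree {degree}: bias^2 = {bias_sq:.3f}, variance = {variance:.3f}")
```

Typically the degree-1 fit shows high bias squared with low variance (underfitting), while the
degree-9 fit shows the opposite pattern.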
ML UNDERFITTING AND OVERFITTING
● Machine learning models aim to perform well on both the training data and new, unseen
data. A model is considered “good” if:
● It learns patterns effectively from the training data.
● It generalizes well to new, unseen data.
● It avoids memorizing the training data (overfitting) or failing to capture relevant
patterns (underfitting).
● To evaluate how well a model learns and generalizes, we monitor its performance on
both the training data and a separate validation or test dataset, typically measured by
accuracy or prediction error; a short sketch of this check appears below. However, achieving
this balance can be challenging. Two common issues that affect a model’s performance and
generalization ability are overfitting and underfitting, and they are major contributors to
poor performance in machine learning models. Let us look at what they are and how they
arise in ML models.
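A small illustration of this monitoring, using synthetic data and decision trees of varying
depth purely as an example:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data, split into training and held-out validation parts.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)

# Compare training vs. validation accuracy for trees of increasing complexity.
for depth in (1, 5, None):        # None lets the tree grow until it memorizes
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    print(f"max_depth={depth}: train accuracy={model.score(X_train, y_train):.2f}, "
          f"validation accuracy={model.score(X_val, y_val):.2f}")

# A large gap between the two scores suggests overfitting; low scores on both
# suggest underfitting.
```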
Bias and Variance in Machine Learning
● Bias and variance are two key sources of error in machine learning models that
directly impact their performance and generalization ability.
● Bias is the error that occurs when a machine learning model is too simple and does not
learn enough detail from the data. It is like assuming that all birds are small and can fly,
so the model fails to recognize large, flightless birds such as ostriches or penguins, and
its predictions are biased as a result.
● These assumptions make the model easier to train but may prevent it from capturing
the underlying complexities of the data.
● High bias typically leads to underfitting, where the model performs poorly on both
training and testing data because it fails to learn enough from the data.
● Example: a linear regression model applied to a dataset with a non-linear relationship
(see the sketch just below).
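Here is one way that example might look in code, assuming a synthetic sine-shaped dataset and
scikit-learn's LinearRegression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic non-linear relationship: y = sin(2*pi*x) plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# High-bias model: plain linear regression on a non-linear relationship.
linear = LinearRegression().fit(X_train, y_train)

# Lower-bias alternative: the same linear model after a degree-5 polynomial expansion.
poly = make_pipeline(PolynomialFeatures(degree=5), LinearRegression()).fit(X_train, y_train)

for name, model in (("linear", linear), ("degree-5 polynomial", poly)):
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")

# The plain linear model has high error on both splits, the signature of underfitting.
```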
Reasons for Underfitting:
● The model is too simple, so it may not be capable of representing the complexities in the data.
● The input features used to train the model are not adequate representations of the
underlying factors influencing the target variable.
● The training dataset is too small.
● Excessive regularization is used to prevent overfitting, which constrains the model too much
to capture the data well (illustrated in the sketch after this list).
● Features are not scaled.
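As a sketch of the regularization point, the comparison below uses Ridge regression on synthetic
linear data; an extremely large penalty shrinks the coefficients toward zero and the model
underfits even the training set:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic linear data with 20 informative features.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 20))
true_coef = rng.standard_normal(20)
y = X @ true_coef + 0.5 * rng.standard_normal(300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Moderate vs. excessive L2 regularization strength.
for alpha in (1.0, 1e6):
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:g}: train R^2={model.score(X_train, y_train):.2f}, "
          f"test R^2={model.score(X_test, y_test):.2f}")

# With alpha=1e6 the coefficients are shrunk almost to zero, so the model underfits:
# the R^2 score is poor on the training data as well as on the test data.
```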