What is Machine Learning?
Machine Learning is a subset of Artificial Intelligence (AI) where computers learn from data to make predictions
or decisions without being explicitly programmed.
Example:
Instead of writing rules to detect spam emails, an ML model learns from past emails (labeled as spam/not spam)
and predicts future emails.
Types of Machine Learning
There are three main types:
1. Supervised Learning
Learns from labeled data (input-output pairs).
Used for classification (e.g., spam detection) and regression (e.g., house price prediction).
2. Unsupervised Learning
Works with unlabeled data to find hidden patterns.
Used for clustering (e.g., customer segmentation) and dimensionality reduction.
3. Reinforcement Learning
Learns by interacting with an environment and receiving rewards/punishments.
Used in gaming (e.g., AlphaGo), robotics, and self-driving cars.
Types of Data in Machine Learning
1. Numerical Data (Quantitative)
Represents measurable quantities and can be either:
Discrete: Countable numbers (whole numbers).
Example: Number of students in a class (1, 2, 3, ...).
Continuous: Can take any value within a range (decimals).
Example: Temperature (36.5°C, 98.6°F), Height (5.9 ft).
Use in ML:
Regression (predicting continuous values).
Algorithms: Linear Regression, Decision Trees, Neural Networks.
2. Categorical Data (Qualitative)
Represents categories or groups with no inherent order.
Binary: Only two categories (Yes/No, True/False).
Nominal: More than two categories with no order.
Example: Colors (Red, Green, Blue), Gender (Male, Female).
Use in ML:
Classification (grouping data into categories).
Algorithms: Logistic Regression, Random Forest, SVM.
Preprocessing Needed:
One-Hot Encoding (converting categories into binary columns).
3. Ordinal Data
Represents categories with a meaningful order but no fixed numerical difference.
Example:
Education Level (High School < Bachelor’s < Master’s).
Customer Ratings (Poor < Fair < Good < Excellent).
Use in ML:
Can be treated as categorical or numerical (if encoded properly).
Algorithms: Decision Trees, Ordinal Regression.
Preprocessing Needed:
Label Encoding (assigning numbers in order, e.g., Poor=1, Excellent=4).
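As a small illustration of ordered encoding (the pandas DataFrame and its "education" column here are hypothetical):
import pandas as pd

# Hypothetical data with an ordered education column
df = pd.DataFrame({"education": ["High School", "Master's", "Bachelor's", "High School"]})

# Define the order explicitly, then map each category to its rank
order = {"High School": 1, "Bachelor's": 2, "Master's": 3}
df["education_encoded"] = df["education"].map(order)

print(df)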
What is One-Hot Encoder?
One-Hot Encoding is a technique used in machine learning to convert categorical data into a numerical format
so that it can be used by algorithms. It creates binary columns for each unique category.
sparse=False (older scikit-learn versions) or sparse_output=False (newer versions) in OneHotEncoder
These parameters control the output format returned by the OneHotEncoder: when set to False, it returns a dense NumPy array instead of a sparse matrix.
One-Hot Encoding creates new columns, one for each category, and marks 1 or 0 depending on presence.
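A minimal sketch with scikit-learn's OneHotEncoder (the color values are made up; sparse_output is the newer parameter name, older versions use sparse):
from sklearn.preprocessing import OneHotEncoder
import numpy as np

colors = np.array([["Red"], ["Green"], ["Blue"], ["Green"]])

encoder = OneHotEncoder(sparse_output=False)   # return a dense NumPy array instead of a sparse matrix
encoded = encoder.fit_transform(colors)

print(encoder.categories_)   # the unique categories found
print(encoded)               # one binary column per category, 1 marks the row's category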
What is Label Encoding?
Label Encoding is a technique to convert categorical values (like strings) into numerical labels (integers). It
assigns a unique number to each category.
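A minimal sketch with scikit-learn's LabelEncoder (the rating values are made up):
from sklearn.preprocessing import LabelEncoder

ratings = ["Poor", "Good", "Excellent", "Fair", "Good"]

le = LabelEncoder()
encoded = le.fit_transform(ratings)

print(list(le.classes_))   # categories sorted alphabetically
print(list(encoded))       # each rating replaced by its integer label
Note that LabelEncoder assigns integers in alphabetical order, so for truly ordinal data an explicit mapping (as in the education example above) is often safer.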
1. Variance
Definition: Variance measures the average squared distance from the mean. It tells us
how far the values are spread out in the dataset.
Variance (σ²) = Σ (xᵢ - μ)² / N, where:
σ² is the population variance.
xᵢ is each individual data point.
μ is the population mean.
N is the number of data points in the population.
2. Standard Deviation
Definition: Standard Deviation is the square root of the variance. It shows the average
distance from the mean, in the same unit as the original data.
Standard Deviation (σ) = √Variance
What Does Standard Deviation Suggest?
A low SD indicates that the values are close to the mean.
A high SD indicates that the values are spread over a wider range.
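A quick numeric sketch with NumPy (the data values are made up for illustration):
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = data.mean()                        # μ = 5.0
variance = ((data - mean) ** 2).mean()    # population variance σ² = 4.0 (same as data.var())
std_dev = np.sqrt(variance)               # σ = 2.0 (same as data.std())

print(mean, variance, std_dev)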
Overfitting: the model performs well on training data but poorly on testing data.
(or)
Overfitting means low bias and high variance: the model fits the training data very well (low bias), but its performance changes a lot on new data (high variance).
Underfitting: the model performs poorly on both training and testing data
(high bias and low variance).
Appropriate / generalized fitting: the model performs well on both training and testing data
(low bias and low variance).
Noisy: noisy data refers to irrelevant or meaningless data points within a dataset that can
negatively impact model performance. It may come from spelling mistakes, numeric typing mistakes,
and similar errors.
Ex: [85, 87, 90, 88, 89, 0, 91]
The 0 might be a noisy value if it was entered wrongly (maybe it was meant to be 80).
Outliers : outliers are data points that significantly deviate from the majority of the dataset. They can be much
higher or lower than other values and can negatively impact model performance.
Ex: [30k, 32k, 29k, 35k, 1 crore]
The 1 crore is an outlier. It might be a real value (like a CEO's salary), but it is very far
from the average.
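A common way to flag outliers is the 1.5 × IQR rule; a minimal sketch (the salary values follow the example above, with 1 crore written as 10,000,000):
import numpy as np

salaries = np.array([30_000, 32_000, 29_000, 35_000, 10_000_000])

q1, q3 = np.percentile(salaries, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = salaries[(salaries < lower) | (salaries > upper)]
print(outliers)   # the extreme salary is flagged as an outlier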
Data Preprocessing
Data preprocessing is a critical step in machine learning that transforms raw data into a clean, structured format
for modeling. Poor-quality data leads to poor model performance, so proper preprocessing is essential.
Why Data Preprocessing?
Real-world data is messy (missing values, noise, inconsistencies).
ML algorithms require structured numerical data.
Preprocessing improves accuracy, efficiency, and reliability.
Step 1: Data Collection
Step 2: Handling Missing Data
Step 3: Handling Categorical Data
Step 4: Feature Scaling
Step 5: Splitting Data into Train & Test Sets
Step 6: Handling Outliers
Step 7: Feature Engineering
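A compact sketch of steps 1 through 5 with pandas and scikit-learn (the file name and column names are hypothetical):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("data.csv")                          # Step 1: load the collected data (hypothetical file)

df["age"] = df["age"].fillna(df["age"].median())      # Step 2: fill missing values in a numeric column
df = pd.get_dummies(df, columns=["city"])             # Step 3: one-hot encode a categorical column

X = df.drop(columns=["price"])                        # features
y = df["price"]                                       # target

X_train, X_test, y_train, y_test = train_test_split(  # Step 5: train/test split
    X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()                             # Step 4: feature scaling (fit on train only)
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)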
Feature Scaling: Standardization vs. Normalization
Feature scaling is a critical preprocessing step that ensures all numerical features are on a similar scale,
preventing some features from dominating others in machine learning algorithms.
Why Scale Features?
Algorithms like SVM, KNN, Neural Networks, and Gradient Descent are sensitive to feature scales.
Distance-based algorithms (K-Means, KNN) perform poorly with unscaled data.
Helps models converge faster (especially in gradient descent).
MinMaxScaler
Use when: you want to scale features between 0 and 1; data has no outliers.
Algorithms affected: KNN, SVM, Neural Networks, Logistic Regression.
StandardScaler
Use when: you want mean = 0 and std = 1; data has a normal (Gaussian) distribution.
Algorithms affected: Linear Regression, Logistic Regression, SVM, PCA.
RobustScaler
Use when: data has outliers; you want to reduce the effect of extreme values.
Algorithms affected: any algorithm where outliers can affect performance.
MaxAbsScaler
Use when: data is already centered around 0; you want to preserve sparse data (like in NLP).
Algorithms affected: works well for text-based models, sparse datasets, L1/L2 models.
Normalizer
Use when: you want to scale each data row (not columns) to unit norm; mainly used in text data.
Algorithms affected: KNN, cosine similarity models, NLP applications.
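A small sketch comparing MinMaxScaler and StandardScaler on the same toy feature (the ages are made up):
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

ages = np.array([[25], [30], [45], [60]])   # one numeric feature, one value per row

print(MinMaxScaler().fit_transform(ages).ravel())    # rescaled into the [0, 1] range
print(StandardScaler().fit_transform(ages).ravel())  # rescaled to mean 0 and std 1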
1. Percentile: A percentile tells you what percentage of people are below you in
something.
Example:
You wrote an exam and got the 90th percentile.
That means:
You did better than 90% of the students in that exam.
It doesn’t mean you got 90 marks. It just means your position (rank) is higher than 90% of others.
2. Quantile : Quantile means dividing data into equal parts.
Example:
Let’s say you have marks of 100 students, and you want to divide them into 4 equal groups:
1st group (0% to 25%)
2nd group (25% to 50%)
3rd group (50% to 75%)
4th group (75% to 100%)
These are called quartiles (because you divided into 4 parts).
Quantile just means:
“Let’s cut the data into parts — like 2 parts, 4 parts, or 10 parts.”
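A quick sketch with NumPy (the marks are made up):
import numpy as np

marks = np.array([35, 42, 55, 58, 61, 67, 72, 78, 85, 93])

print(np.percentile(marks, 90))                 # the mark below which about 90% of students fall
print(np.quantile(marks, [0.25, 0.5, 0.75]))    # quartiles: cut the data into 4 equal parts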
What is EDA? ( Exploratory Data Analysis )
EDA means exploring and understanding the data before building a model.
It helps us know what’s inside the data — patterns, problems, or interesting things.
Why EDA is Important?
To understand the shape and size of data
To spot missing values or outliers
To find relationships between columns
To decide what cleaning and transformations are needed
What Do We Do in EDA?
1. Data Summary: see rows, columns, data types
2. Null Values: check if any values are missing
3. Unique Values: see what categories are there (e.g., cat, dog, etc.)
4. Descriptive Stats: check mean, median, min, max, std
5. Distribution Check: see how values are spread using histograms
6. Correlation Matrix: check which columns are related to each other
7. Outlier Detection: find extreme values (too high/too low)
8. Visualizations: use plots (bar, pie, box, scatter, etc.) to see patterns
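A minimal pandas sketch of the first few EDA steps (the tiny DataFrame is illustrative; in practice it would come from a file):
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47, 51, None],
    "salary": [30000, 42000, 55000, 60000, 52000],
    "city": ["Hyd", "Pune", "Hyd", "Delhi", "Pune"],
})

print(df.shape)                     # 1. Data summary: number of rows and columns
print(df.dtypes)                    #    data types of each column
print(df.isnull().sum())            # 2. Null values per column
print(df["city"].unique())          # 3. Unique categories in a column
print(df.describe())                # 4. Descriptive stats: mean, std, min, max
print(df.corr(numeric_only=True))   # 6. Correlation between numeric columns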
Common EDA Visualizations
Histogram: see the distribution of numbers
Box Plot: spot outliers and data spread
Count Plot: count of categories (bar chart)
Heatmap: show correlation between features
Pie Chart: show part-to-whole (not preferred much)
Scatter Plot: relationship between 2 numeric columns
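A small matplotlib/seaborn sketch of three of these charts (the data is made up; seaborn is assumed to be installed):
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame({"age": [25, 32, 47, 51, 38],
                   "salary": [30000, 42000, 55000, 60000, 52000]})

df["salary"].plot(kind="hist", title="Salary distribution")   # Histogram: distribution of numbers
plt.show()

sns.boxplot(x=df["salary"])                                    # Box plot: outliers and spread
plt.show()

sns.heatmap(df.corr(numeric_only=True), annot=True)            # Heatmap: correlation between features
plt.show()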
How does machine learning work?
The Machine Learning process includes project setup, data preparation, modeling, and deployment.
Stages of machine learning
The following are the stages (detailed sequential process) of Machine Learning:
1. Data Collection
What happens: Gather data from various sources (databases, text files, images, audio, web scraping, etc.).
Goal: Get the raw data needed for the problem.
Output: Store in CSV or a database format.
2. Data Pre-processing
What happens: Clean and prepare the data.
Remove duplicates
Fix errors
Handle missing values (remove or fill)
Format data properly
Goal: Make the data ready for model training.
3. Choosing the Right Model
What happens: Select a suitable machine learning model.
Example models: Linear Regression, Decision Trees, Neural Networks
Goal: Choose the model based on data type, size, problem complexity, and resources.
4. Training the Model
What happens: Feed the clean data into the model so it can learn patterns.
Goal: Help the model make accurate predictions.
5. Evaluating the Model
What happens: Test the model on new (unseen) data.
Goal: Check how well the model performs.
6. Hyperparameter Tuning and Optimization
What happens: Adjust model settings (hyperparameters) to improve performance.
Use techniques like cross-validation.
Goal: Make the model work better on different data.
7. Predictions and Deployment
What happens: Use the trained model to make predictions on new data.
Deployment: Integrate the model into a real-world system or app.
Goal: Use the model for decision-making or automation.
Types of Machine Learning:
1. Supervised Machine Learning − It is a type of machine learning that trains the model using labeled datasets to
predict outcomes.
2. Unsupervised Machine Learning − It is a type of machine learning that learns patterns and structures within
the data without human supervision.
3. Semi-supervised Learning − It is a type of machine learning that is neither fully supervised nor fully
unsupervised. The semi-supervised learning algorithms basically fall between supervised and unsupervised learning
methods.
4. Reinforcement Machine Learning − It is a type of machine learning model that is similar to supervised
learning but does not use sample data to train the algorithm. This model learns by trial and error.
Importance of Machine Learning / features :
Machine Learning (ML) plays a crucial role in automation, data analysis, and decision-
making. Here’s why it’s important:
1. Data Processing
Analyzes massive data from sources like social media, sensors, etc.
Reveals hidden patterns to support better decisions.
2. Data-Driven Insights
Finds trends and connections in large datasets that humans may miss.
Helps in making accurate predictions and smarter decisions.
3. Automation
Automates repetitive and time-consuming tasks.
Reduces human error and saves effort.
4. Personalization
Understands user behavior to give personalized recommendations.
Used in e-commerce, social media, and streaming services.
5. Predictive Analytics
Predicts future outcomes using past data.
Used in sales forecasting, risk analysis, and demand planning.
6. Pattern Recognition
Recognizes patterns in images, speech, and text.
Applied in image processing, speech recognition, and NLP.
7. Finance
Used in credit scoring, fraud detection, and algorithmic trading.
8. Retail
Improves recommendation systems, customer service, and inventory management.
9. Fraud Detection & Cybersecurity
Detects suspicious behavior and fraud in real-time.
10. Continuous Improvement
ML models improve over time by learning from new data.
Disadvantages of Machine Learning :
1.Data Dependency
ML needs a large amount of high-quality data.
Poor or insufficient data leads to poor results.
2.High Computational Cost
Some ML models (like deep learning) need powerful hardware (GPUs, high RAM).
Increases cost and energy usage.
3.Time-Consuming
Training models can take a lot of time, especially with large datasets.
4.Lack of Interpretability
Complex models like neural networks are hard to understand ("black box").
It’s difficult to explain why the model gave a certain output.
5.Overfitting
The model may perform well on training data but fail on new (real-world) data.
6.Security and Privacy Issues
Models trained on sensitive data may leak private information.
Vulnerable to adversarial attacks.
7.Not Always Accurate
ML predictions are not 100% correct.
Wrong predictions can cause serious problems (e.g., in healthcare or finance).
Challenges of Machine Learning :
1.Data Collection & Quality
Getting clean, labeled, and relevant data is difficult.
Garbage in → garbage out.
2.Choosing the Right Algorithm
There are many algorithms. Picking the best one for the task is a challenge.
3.Bias in Data
If training data is biased, the model will also be biased and unfair.
4.Model Interpretability
Hard to explain how and why complex models work.
A big challenge in critical fields like healthcare, law, and finance.
5.Scalability
Models may work on small data but struggle with huge datasets or in real time.
6.Hyperparameter Tuning
Finding the best settings for the model is trial and error and takes time.
7.Deployment and Maintenance
Putting ML into production is complex.
Models need to be monitored and updated regularly.
8.Ethical & Legal Concerns
Data privacy, algorithm fairness, and accountability are major concerns.
Machine Learning vs. Deep Learning :
Definition: ML is a subset of AI that learns from data; DL is a subset of ML that uses neural networks with multiple layers.
Data Requirement: ML works with small to medium-sized data; DL requires large amounts of data.
Algorithms Used: ML uses Decision Trees, SVM, KNN, Linear Regression, etc.; DL uses CNN, RNN, LSTM, GANs (deep neural networks).
Feature Engineering: ML often requires manual feature selection; DL automatically extracts features from raw data.
Hardware Dependency: ML can work on traditional computers; DL needs powerful GPUs and high computing resources.
Training Time: ML is faster to train; DL is slower to train due to complex architectures.
Interpretability: ML is easier to understand and explain; DL is harder to interpret (black-box models).
Application Examples: ML is used in spam detection, credit scoring, customer segmentation; DL in image recognition, speech recognition, self-driving cars.
1. Supervised learning: Supervised learning can be further classified into two types −
classification and regression.
· Classification: Predict categories (e.g., spam or not spam)
· Regression: Predict continuous values (e.g., house price)
a. Linear Regression
b. Logistic Regression
c. Decision Trees
d. Random Forest
e. K-Nearest Neighbors (KNN)
f. Support Vector Machines (SVM)
g. Naive Bayes
h. Neural Networks (for classification/regression)
2. Unsupervised learning: Unsupervised learning algorithms can be classified into three types −
clustering, association, and dimensionality reduction.
1. Clustering
Goal: Group similar data points together.
Use Case: Customer segmentation, market research, pattern recognition.
Common Algorithms:
a. K-Means Clustering
b. Hierarchical Clustering
c. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
d. Gaussian Mixture Models (GMM)
2. Association
Goal: Discover interesting relationships or rules among variables in large datasets.
Use Case: Market basket analysis (e.g., “People who buy bread often buy butter”).
🔧 Common Algorithms:
a. Apriori Algorithm
b. Eclat Algorithm
c. FP-Growth Algorithm
3. Dimensionality Reduction
Goal: Reduce the number of input variables while preserving important information.
Use Case: Data visualization, noise removal, speeding up machine learning algorithms.
🔧 Common Algorithms:
a. Principal Component Analysis (PCA)
b. t-Distributed Stochastic Neighbor Embedding (t-SNE)
c. Linear Discriminant Analysis (LDA)
d. Autoencoders
Clustering: grouping similar data points. Example algorithms: K-Means, DBSCAN, Hierarchical.
Association: finding relationships/patterns. Example algorithms: Apriori, FP-Growth, Eclat.
Dimensionality Reduction: simplifying high-dimensional data. Example algorithms: PCA, t-SNE, Autoencoders.
3. Reinforcement Learning
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by
performing actions and receiving rewards or penalties from the environment.
It’s like learning by trial and error.
The goal is to maximize cumulative rewards over time.
How it Works (Basic Terms)
Agent: the learner or decision-maker (e.g., robot, AI player).
Environment: the world through which the agent moves.
Action: a move the agent can make (e.g., move left, buy stock).
State: the current situation of the agent.
Reward: feedback from the environment (positive or negative).
Policy: a strategy the agent follows to take actions.
Value Function: expected reward in the long run from a given state.
· Positive Reinforcement
Reward for a good action.
Helps the agent learn quickly.
· Negative Reinforcement
Penalizes wrong actions.
Discourages unwanted behavior.
Real-Life Examples of Reinforcement Learning :
🎮 Games: AlphaGo, Chess, Atari – learned strategies better than humans.
🚗 Self-Driving Cars: learn how to navigate, avoid obstacles, and follow traffic rules.
📈 Stock Trading: decide when to buy or sell stocks based on market rewards.
🧹 Robotics: teach robots to walk, pick up objects, etc.
🧠 Healthcare: personalized treatment plans and dosage control.
Libraries and Packages :
NumPy: for numeric computations.
Pandas: for data manipulation and data preprocessing.
Scikit-learn: implements almost all classical machine learning algorithms, such as linear regression, logistic
regression, k-means clustering, k-nearest neighbors, etc.
Matplotlib: for data visualization.
Mathematics and statistics :
Mathematics and Statistics play an important role in developing machine
learning and data science applications.
Algebra: variables, functions. Useful in ML for building equations and models.
Linear Algebra: vectors, matrices, tensors. Useful for data representation and transformations.
Statistics: mean, median, standard deviation, Bayes. Useful for understanding and analyzing data.
Probability: probability, conditional probability, Bayes rule. Useful for predictions and classification.
Calculus: derivatives, gradients, chain rule. Useful for training models via optimization.
Trigonometry: tanh, sin, cos. Used in some activation functions.
Supervised learning in machine learning:
Linear Regression: predicts continuous values using a linear relationship. Use cases: house price prediction, sales forecasting. Libraries: scikit-learn, statsmodels.
Logistic Regression: predicts binary/class labels using a logistic function. Use cases: spam detection, disease prediction. Libraries: scikit-learn, statsmodels.
Decision Tree: tree-like model for decision making. Use cases: credit risk analysis, medical diagnosis. Libraries: scikit-learn, XGBoost.
Random Forest: ensemble of decision trees for better accuracy. Use cases: fraud detection, stock prediction. Libraries: scikit-learn, XGBoost.
Support Vector Machine (SVM): finds the optimal hyperplane for classification/regression. Use cases: face detection, text classification. Libraries: scikit-learn.
K-Nearest Neighbors (KNN): classifies based on the closest training examples. Use cases: recommender systems, handwriting recognition. Libraries: scikit-learn.
Naive Bayes: probabilistic classifier using Bayes' theorem. Use cases: email filtering, sentiment analysis. Libraries: scikit-learn, nltk.
Gradient Boosting: builds additive models by combining weak learners. Use cases: insurance pricing, customer churn prediction. Libraries: XGBoost, LightGBM.
AdaBoost: boosting algorithm that focuses on misclassified examples. Use cases: fraud detection, credit scoring. Libraries: scikit-learn.
CatBoost: boosting algorithm optimized for categorical data. Use cases: e-commerce ranking, churn modeling. Libraries: CatBoost.
Linear Discriminant Analysis (LDA): reduces dimensionality and classifies. Use cases: face recognition, bioinformatics. Libraries: scikit-learn.
Quadratic Discriminant Analysis (QDA): similar to LDA but assumes different covariances per class. Use cases: classification with non-linear class boundaries. Libraries: scikit-learn.
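A tiny end-to-end sketch training one of these models with scikit-learn (the bundled iris dataset is used purely for illustration):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)   # any classifier from the list above could be swapped in
model.fit(X_train, y_train)

print(accuracy_score(y_test, model.predict(X_test)))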
Unsupervised Learning Algorithms
K-Means (Clustering): grouping similar data. Key features: centroid-based, needs the number of clusters. Example use case: customer segmentation.
Hierarchical Clustering (Clustering): building cluster trees. Key features: dendrogram structure, no fixed k. Example use case: gene sequence analysis.
DBSCAN (Clustering): grouping with noise handling. Key features: density-based, detects outliers. Example use case: spatial data analysis.
Gaussian Mixture Model (GMM) (Clustering): soft clustering. Key features: probabilistic model, assumes normal distribution. Example use case: market segmentation.
PCA (Principal Component Analysis) (Dimensionality Reduction): reducing the feature space. Key features: linear, unsupervised, keeps variance. Example use case: image compression, pre-processing.
t-SNE (Dimensionality Reduction): visualizing high-dimensional data. Key features: non-linear, good for 2D/3D projections. Example use case: visualizing embeddings.
Autoencoders (Neural Network, Dimensionality Reduction): feature extraction. Key features: deep learning-based encoder-decoder. Example use case: anomaly detection, image denoising.
Apriori Algorithm (Association): finding item relationships. Key features: frequent itemsets, support-confidence. Example use case: market basket analysis.
FP-Growth (Association): faster association rule mining. Key features: tree-based, avoids candidate generation. Example use case: e-commerce recommendation.
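A minimal K-Means sketch with scikit-learn (the 2D points are made up to form two obvious groups):
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(kmeans.labels_)            # cluster assigned to each point
print(kmeans.cluster_centers_)   # coordinates of the two centroids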
Reinforcement Learning Algorithms
Q-Learning (Value-Based): learning optimal actions. Key features: off-policy, uses a Q-table. Example use case: game AI, robot path planning.
SARSA (Value-Based): learns from the current policy. Key features: on-policy learning. Example use case: grid-world navigation.
Deep Q-Network (DQN) (Value-Based, Deep RL): Q-learning with deep networks. Key features: uses neural nets to estimate Q-values. Example use case: playing Atari games.
REINFORCE (Policy-Based): learning the policy directly. Key features: Monte Carlo-based, updates after episodes. Example use case: simple control problems.
Actor-Critic (Policy + Value-Based): stable and fast learning. Key features: two networks, an actor (policy) and a critic (value). Example use case: continuous action space problems.
A3C (Asynchronous Advantage Actor-Critic) (Policy + Value): parallel training. Key features: asynchronous agents improve learning speed. Example use case: advanced game-playing agents.
DDPG (Deep Deterministic Policy Gradient) (Policy-Based): continuous control. Key features: combines DQN and policy gradient methods. Example use case: robotics, autonomous driving.
PPO (Proximal Policy Optimization) (Policy-Based): safe and stable training. Key features: simple, efficient, widely used. Example use case: game playing, real-world control.
TD(0), TD(λ) (Temporal Difference Methods): learning with bootstrapping. Key features: balance between MC and TD methods. Example use case: real-time learning tasks.
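A toy sketch of the Q-learning update rule on a made-up 1D grid world (the states, rewards, and hyperparameter values are all illustrative):
import numpy as np

n_states, n_actions = 5, 2                      # tiny 1D world: action 0 = left, 1 = right
q_table = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2           # learning rate, discount factor, exploration rate
rng = np.random.default_rng(0)

for episode in range(200):
    state = 0
    while state != n_states - 1:                # reaching the last state ends the episode
        if rng.random() < epsilon:              # explore with a random action
            action = int(rng.integers(n_actions))
        else:                                   # otherwise exploit the current Q-table
            action = int(np.argmax(q_table[state]))
        next_state = max(state - 1, 0) if action == 0 else min(state + 1, n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Off-policy Q-learning update: uses the max over next-state actions
        q_table[state, action] += alpha * (
            reward + gamma * np.max(q_table[next_state]) - q_table[state, action])
        state = next_state

print(np.argmax(q_table, axis=1))   # best learned action per state (should prefer moving right)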
What Are Parameters in Machine Learning?
Parameters are values that the learning algorithm optimizes during training to make better predictions.
They directly affect the output of the model.
Examples:
In linear regression, the parameters are the weights (coefficients) and bias (intercept).
In neural networks, parameters include the weights and biases of each neuron.
In decision trees, the structure (like split thresholds) is based on learned decisions — though not
typically called "parameters", they play a similar role.
Types of Parameters
There are two main categories often mentioned:
1. Model Parameters
Learned during training.
Affect predictions.
Examples:
Weights in linear regression or neural networks.
Split values in decision trees
2. Hyperparameters
Set manually before training.
Not learned from data — instead, you tune them using methods like grid search or cross-
validation.
Examples:
Learning rate
Number of hidden layers
Number of trees in a random forest
Regularization strength
Parameters: learned during training. Examples: weights, biases, thresholds. Role: determine the model's output.
Hyperparameters: set manually before training. Examples: learning rate, depth, number of neurons. Role: control the training process.
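A quick sketch with scikit-learn to see the difference (the data and the alpha value are made up):
import numpy as np
from sklearn.linear_model import Ridge

X = np.array([[1], [2], [3], [4]])
y = np.array([3, 5, 7, 9])              # roughly y = 2x + 1

model = Ridge(alpha=0.1)                # alpha is a hyperparameter: chosen before training
model.fit(X, y)

print(model.coef_, model.intercept_)    # parameters: the weight and bias learned during training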
What Are Hyperparameters?
Hyperparameters are the settings or controls that you choose before training a machine learning
model.
They are not learned from data like parameters. Instead, you set them manually or with tuning techniques (like
Grid Search or Random Search).
Why Are Hyperparameters Important?
They control how the model learns.
They affect the speed, accuracy, and performance of the model.
Good hyperparameter choices → Better model.
Examples :
--------------
Imagine you’re baking a cake.
Ingredients like flour, sugar = parameters (learned from data)
Oven temperature, baking time = hyperparameters (you set them before baking)
Use Hyperparameter Tuning:
a. Grid Search – Try all combinations
b. Random Search – Try random combinations
c. Bayesian Optimization – Smart search
d. Automated ML (AutoML) – Automatically selects best ones
What is Hyperparameter Tuning?
When building a machine learning model, you choose values like:
Learning rate
Number of layers
Max depth
Number of neighbors (k in KNN)
But how do you know which values are best?
==> Hyperparameter tuning helps you find the best combination for high accuracy.
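A minimal GridSearchCV sketch for tuning k in KNN (the parameter grid and dataset are just for illustration):
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

param_grid = {"n_neighbors": [3, 5, 7, 9]}                 # candidate values for k
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)   # the best combination found by cross-validation
print(search.best_score_)    # its mean cross-validated accuracy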
fit(), transform(), and fit_transform()
· fit() – Learns statistical properties from your data (like mean, standard deviation, etc.)
· transform() – Applies those statistics to scale or modify the data
· fit_transform() – A shortcut that runs both steps at the same time.
When to use fit_transform()?
Always use it on your training data during data preprocessing:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
Then use only .transform() on your test or new data:
X_test_scaled = scaler.transform(X_test)
Using fit_transform():
Normalizes or standardizes the data
Removes bias caused by large values dominating smaller ones
Helps models converge faster and learn better
Example: suppose a dataset has two features, Age and Salary:
Age 25, Salary 30,000
Age 45, Salary 80,000
Without scaling: the model thinks salary is more important (because 80,000 > 45), which leads to biased learning.
After fit_transform(): both features are scaled to a similar range (e.g., between -1 and 1), so the model treats them fairly.
Simply: fit_transform() makes your data cleaner, fairer, and more understandable for the machine.
Choosing the algorithm and evaluation metrics
a. Regression (continuous values): Linear Regression, Decision Tree Regressor, Random Forest Regressor
b. Classification (categorical values): Logistic Regression, K-Nearest Neighbors, Decision Tree Classifier, Naive Bayes, Support Vector Machine (SVM)
· For Regression: MAE, MSE, RMSE, R² Score
· For Classification: Accuracy, Precision, Recall, F1-score, Confusion Matrix, ROC-AUC
Hyperparameter Tuning: GridSearchCV, RandomizedSearchCV
Regression metric formulas
MAE (Mean Absolute Error) = (1/n) Σ |y_true - y_pred|. The average absolute difference between predictions and actual values.
MSE (Mean Squared Error) = (1/n) Σ (y_true - y_pred)². Squared differences; penalizes large errors more.
RMSE (Root Mean Squared Error) = √MSE. Same units as the target variable, but punishes big mistakes heavily.
MAPE (Mean Absolute Percentage Error) = (1/n) Σ |y_true - y_pred| / |y_true| × 100. The average error as a percentage of the actual values.
R² Score = 1 - SS_res / SS_tot. Measures how much variance in the target is explained by the model.
Total Error = Bias² + Variance + Noise
This formula tells us why machine learning models make mistakes. Every model has some error, and this error comes from 3 parts: Bias, Variance, and Noise.
Bias (Underfitting): error from wrong assumptions. The model is too simple and misses patterns.
Variance (Overfitting): error from too much sensitivity. The model is too complex and changes a lot.
Noise: error we cannot remove. It is randomness or unknown factors in the data.
By the source of error (Bias–Variance decomposition)
These explain why errors happen in models:
Bias Error
Comes from wrong assumptions or oversimplified models.
High bias = underfitting (model is too simple, missing patterns).
Variance Error
Comes from too much sensitivity to training data.
High variance = overfitting (model memorizes instead of generalizing).
Irreducible Error (Noise)
Comes from randomness or missing information in the real world.
Can't be reduced by any model.
Formula:
Total Error = Bias² + Variance + Noise
Classification errors
For categorical predictions (e.g., spam vs. not spam), errors are often measured differently:
Accuracy = % of correct predictions.
Precision = Out of predicted positives, how many were actually positive?
Recall = Out of actual positives, how many were predicted as positive?
F1-score = Harmonic mean of precision and recall.
Log Loss = Measures how close predicted probabilities are to the true classes.
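A small sketch computing these classification metrics with scikit-learn (the labels and probabilities are made up; 1 = spam, 0 = not spam):
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, log_loss

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]   # predicted probability of class 1

print(accuracy_score(y_true, y_pred))    # % of correct predictions
print(precision_score(y_true, y_pred))   # of predicted positives, how many are truly positive
print(recall_score(y_true, y_pred))      # of actual positives, how many were found
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
print(log_loss(y_true, y_prob))          # how close the probabilities are to the true classes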