Machine Learning

Machine Learning (ML) is a subset of Artificial Intelligence that enables computers to learn from data for predictions without explicit programming. It includes types such as supervised, unsupervised, and reinforcement learning, each with specific applications and algorithms. Data preprocessing, feature scaling, and exploratory data analysis are critical steps in the ML process to ensure model accuracy and efficiency.

What is Machine Learning?

Machine Learning is a subset of Artificial Intelligence (AI) where computers learn from data to make predictions
or decisions without being explicitly programmed.

Example:

Instead of writing rules to detect spam emails, an ML model learns from past emails (labeled as spam/not spam)
and predicts future emails.

Types of Machine Learning


There are three main types:

1. Supervised Learning

Learns from labeled data (input-output pairs).


Used for classification (e.g., spam detection) and regression (e.g., house price prediction).

2. Unsupervised Learning

Works with unlabeled data to find hidden patterns.

Used for clustering (e.g., customer segmentation) and dimensionality reduction.

3. Reinforcement Learning

Learns by interacting with an environment and receiving rewards/punishments.


Used in gaming (e.g., AlphaGo), robotics, and self-driving cars.

1. Numerical Data (Quantitative)


Represents measurable quantities and can be either:

Discrete: Countable numbers (whole numbers).

Example: Number of students in a class (1, 2, 3, ...).


Continuous: Can take any value within a range (decimals).

Example: Temperature (36.5°C, 98.6°F), Height (5.9 ft).

Use in ML:

Regression (predicting continuous values).


Algorithms: Linear Regression, Decision Trees, Neural Networks.
2. Categorical Data (Qualitative)
Represents categories or groups with no inherent order.

Binary: Only two categories (Yes/No, True/False).


Nominal: More than two categories with no order.

Example: Colors (Red, Green, Blue), Gender (Male, Female).

Use in ML:

Classification (grouping data into categories).


Algorithms: Logistic Regression, Random Forest, SVM.

Preprocessing Needed:

One-Hot Encoding (converting categories into binary columns).

3. Ordinal Data
Represents categories with a meaningful order but no fixed numerical difference.

Example:

Education Level (High School < Bachelor’s < Master’s).


Customer Ratings (Poor < Fair < Good < Excellent).

Use in ML:

Can be treated as categorical or numerical (if encoded properly).


Algorithms: Decision Trees, Ordinal Regression.

Preprocessing Needed:

Label Encoding (assigning numbers in order, e.g., Poor=1, Excellent=4).

What is One-Hot Encoder?

One-Hot Encoding is a technique used in machine learning to convert categorical data into a numerical format
so that it can be used by algorithms. It creates binary columns for each unique category.

sparse=False or sparse_output=False in OneHotEncoder

These parameters control the type of output format returned by the OneHotEncoder.

One-Hot Encoding creates new columns — one for each category — and marks 1 or 0
depending on presence.
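As a minimal sketch with scikit-learn (the colour column is a made-up example; sparse_output is the parameter name in scikit-learn 1.2+, older versions use sparse=False):

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data with one categorical column
df = pd.DataFrame({"color": ["Red", "Green", "Blue", "Green"]})

# sparse_output=False returns a dense NumPy array instead of a sparse matrix
encoder = OneHotEncoder(sparse_output=False)
encoded = encoder.fit_transform(df[["color"]])

print(encoder.get_feature_names_out())  # ['color_Blue' 'color_Green' 'color_Red']
print(encoded)                           # one binary column per category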

What is Label Encoding?

Label Encoding is a technique to convert categorical values (like strings) into numerical labels (integers). It
assigns a unique number to each category.
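A small sketch of label encoding (the ratings are made up; note that scikit-learn's LabelEncoder numbers categories alphabetically, so a manual mapping is shown for a true ordinal order):

from sklearn.preprocessing import LabelEncoder

ratings = ["Poor", "Fair", "Good", "Excellent", "Good"]

# LabelEncoder assigns integers in alphabetical order of the categories
le = LabelEncoder()
print(list(le.fit_transform(ratings)))   # e.g. Poor -> 3, Excellent -> 0
print(list(le.classes_))                 # ['Excellent', 'Fair', 'Good', 'Poor']

# For a meaningful ordinal order (Poor < Fair < Good < Excellent), map manually
order = {"Poor": 1, "Fair": 2, "Good": 3, "Excellent": 4}
print([order[r] for r in ratings])       # [1, 2, 3, 4, 3]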
1. Variance
Definition: Variance measures the average squared distance from the mean. It tells us how far the values are spread out in the dataset.

Variance: σ² = Σ (xᵢ − μ)² / N, where:
σ² is the population variance.
xᵢ is each individual data point.
μ is the population mean.
N is the number of data points in the population.

2. Standard Deviation
Definition: Standard Deviation is the square root of the variance. It shows the average distance from the mean, in the same unit as the original data.

Standard Deviation: σ = √Variance

What Does Standard Deviation Suggest?

 Low SD indicates that the values are close to the mean.
 High SD indicates that the values are spread out over a wider range.
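A small sketch with NumPy (the scores are made up):

import numpy as np

# Hypothetical exam scores
data = np.array([85, 87, 90, 88, 89, 80, 91])

mean = data.mean()
variance = ((data - mean) ** 2).mean()   # population variance: Σ(xᵢ − μ)² / N
std_dev = np.sqrt(variance)              # standard deviation: √variance

print(mean, variance, std_dev)
print(np.var(data), np.std(data))        # NumPy gives the same results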

Overfitting : the model performs well on training data but not well on testing data.
(or)
Overfitting means low bias and high variance: the model fits the training data very closely (low bias) but is so sensitive to it that it performs poorly on unseen test data (high variance).

Underfitting : the model does not perform well on either training or testing data.
( high bias and low variance )

Appropriate / Generalized fitting : the model performs well on both training and testing data.
( low bias and low variance )

Noisy : noisy data refers to irrelevant or meaningless data points within a dataset that can negatively impact model performance. Noise may come from spelling mistakes, numeric typing mistakes, and similar entry errors.
Ex: [85, 87, 90, 88, 89, 0, 91]
The 0 might be a noisy value if it was entered wrongly (maybe meant to be 80).

Outliers : outliers are data points that significantly deviate from the majority of the dataset. They can be much
higher or lower than other values and can negatively impact model performance.

Ex: [30k, 32k, 29k, 35k, 1 crore]


The 1 crore is an outlier. It might be a real value (like a CEO’s salary), but it is very far from the average.
Data Preprocessing
Data preprocessing is a critical step in machine learning that transforms raw data into a clean, structured format
for modeling. Poor-quality data leads to poor model performance, so proper preprocessing is essential.

Why Data Preprocessing?


 Real-world data is messy (missing values, noise, inconsistencies).
 ML algorithms require structured numerical data.
 Preprocessing improves accuracy, efficiency, and reliability.

Step 1: Data Collection

Step 2: Handling Missing Data

Step 3: Handling Categorical Data

Step 4: Feature Scaling

Step 5: Splitting Data into Train & Test Sets

Step 6: Handling Outliers

Step 7: Feature Engineering
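A minimal end-to-end sketch of these steps with pandas and scikit-learn (the file name and column names are hypothetical):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

# Step 1: Data collection (hypothetical CSV with numeric and categorical columns)
df = pd.read_csv("customers.csv")
X = df.drop(columns=["target"])
y = df["target"]

numeric_cols = ["age", "income"]          # hypothetical numeric features
categorical_cols = ["city", "gender"]     # hypothetical categorical features

# Steps 2-4: handle missing values, encode categories, scale numbers
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])

# Step 5: split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the preprocessing on training data only, then apply it to both sets
X_train_ready = preprocess.fit_transform(X_train)
X_test_ready = preprocess.transform(X_test)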

Feature Scaling: Standardization vs. Normalization


Feature scaling is a critical preprocessing step that ensures all numerical features are on a similar scale,
preventing some features from dominating others in machine learning algorithms.

Why Scale Features?


 Algorithms like SVM, KNN, Neural Networks, and Gradient Descent are sensitive to feature scales.
 Distance-based algorithms (K-Means, KNN) perform poorly with unscaled data.
 Helps models converge faster (especially in gradient descent).
MinMaxScaler
  Use when: you want to scale features between 0 and 1; data has no outliers.
  Algorithms affected: KNN, SVM, Neural Networks, Logistic Regression.

StandardScaler
  Use when: you want mean = 0 and std = 1; data has a normal (Gaussian) distribution.
  Algorithms affected: Linear Regression, Logistic Regression, SVM, PCA.

RobustScaler
  Use when: data has outliers; you want to reduce the effect of extreme values.
  Algorithms affected: any algorithm where outliers can affect performance.

MaxAbsScaler
  Use when: data is already centered around 0; you want to preserve sparse data (like in NLP).
  Algorithms affected: works well for text-based models, sparse datasets, L1/L2 models.

Normalizer
  Use when: you want to scale each data row (not columns) to unit norm; mainly used on text data.
  Algorithms affected: KNN, cosine similarity models, NLP applications.
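A small sketch comparing three of these scalers on one made-up feature (salary in thousands, with an outlier):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler

# Hypothetical salary values; the last one is an outlier
X = np.array([[30], [32], [29], [35], [300]])

print(MinMaxScaler().fit_transform(X).ravel())   # squeezed into [0, 1]; the outlier dominates the range
print(StandardScaler().fit_transform(X).ravel()) # mean 0, std 1
print(RobustScaler().fit_transform(X).ravel())   # uses median/IQR, so the outlier has less influence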
1. Percentile : A percentile tells you what percentage of people are below you in something.

Example:

You wrote an exam and got the 90th percentile.

That means:

You did better than 90% of the students in that exam.

It doesn’t mean you got 90 marks. It just means your position (rank) is higher than 90% of others.

2. Quantile : Quantile means dividing data into equal parts.

Example:

Let’s say you have marks of 100 students, and you want to divide them into 4 equal groups:

1st group (0% to 25%)

2nd group (25% to 50%)

3rd group (50% to 75%)

4th group (75% to 100%)

These are called quartiles (because you divided into 4 parts).

Quantile just means:

“Let’s cut the data into parts — like 2 parts, 4 parts, or 10 parts.”
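A small sketch with NumPy (the marks are randomly generated for illustration):

import numpy as np

marks = np.random.randint(0, 100, size=100)   # hypothetical marks of 100 students

# Percentile: the value below which a given percentage of the data falls
p90 = np.percentile(marks, 90)   # 90% of students scored below this value

# Quartiles: cut the data into 4 equal parts (the 25th, 50th and 75th percentiles)
q1, q2, q3 = np.quantile(marks, [0.25, 0.50, 0.75])

print(p90, q1, q2, q3)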

What is EDA? ( Exploratory Data Analysis )

EDA means exploring and understanding the data before building a model.
It helps us know what’s inside the data — patterns, problems, or interesting things.

Why EDA is Important?


To understand the shape and size of data

To spot missing values or outliers

To find relationships between columns

To decide what cleaning and transformations are needed


What Do We Do in EDA?

1. Data Summary: see rows, columns, data types
2. Null Values: check if any values are missing
3. Unique Values: see what categories are there (e.g., cat, dog, etc.)
4. Descriptive Stats: check mean, median, min, max, std
5. Distribution Check: see how values are spread using histograms
6. Correlation Matrix: check which columns are related to each other
7. Outlier Detection: find extreme values (too high/too low)
8. Visualizations: use plots (bar, pie, box, scatter, etc.) to see patterns

Common EDA Visualizations


Histogram: see the distribution of numbers
Box Plot: spot outliers and data spread
Count Plot: count of categories (bar chart)
Heatmap: show correlation between features
Pie Chart: show part-to-whole (not preferred much)
Scatter Plot: relationship between 2 numeric columns
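A quick EDA sketch with pandas, seaborn, and matplotlib (the data.csv file is hypothetical):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")           # hypothetical dataset

print(df.shape)                         # 1. data summary: rows and columns
df.info()                               #    column data types
print(df.isnull().sum())                # 2. null values per column
print(df.nunique())                     # 3. unique values per column
print(df.describe())                    # 4. descriptive stats (mean, std, min, max)

df.hist(figsize=(10, 6))                # 5. distribution check
sns.heatmap(df.corr(numeric_only=True), annot=True)      # 6. correlation matrix
sns.boxplot(data=df.select_dtypes("number"))              # 7. outlier detection
plt.show()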

How does machine learning work?


Machine Learning process includes Project Setup, Data Preparation, Modeling and Deployment.
Stages of machine learning
The following are the stages (detailed sequential process) of Machine Learning:

1. Data Collection

What happens: Gather data from various sources (databases, text files, images, audio, web scraping, etc.).

Goal: Get the raw data needed for the problem.

Output: Store in CSV or a database format.

2. Data Pre-processing

What happens: Clean and prepare the data.

 Remove duplicates
 Fix errors
 Handle missing values (remove or fill)
 Format data properly

Goal: Make the data ready for model training.

3. Choosing the Right Model

What happens: Select a suitable machine learning model.

Example models: Linear Regression, Decision Trees, Neural Networks

Goal: Choose the model based on data type, size, problem complexity, and resources.
4. Training the Model

What happens: Feed the clean data into the model so it can learn patterns.

Goal: Help the model make accurate predictions.

5. Evaluating the Model

What happens: Test the model on new (unseen) data.

Goal: Check how well the model performs.

6. Hyperparameter Tuning and Optimization

What happens: Adjust model settings (hyperparameters) to improve performance.

Use techniques like cross-validation.

Goal: Make the model work better on different data.

7. Predictions and Deployment

What happens: Use the trained model to make predictions on new data.

Deployment: Integrate the model into a real-world system or app.

Goal: Use the model for decision-making or automation.
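A compact sketch of these stages with scikit-learn, using its built-in Iris dataset so the example is self-contained (the parameter choices are arbitrary):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 1-2. Data collection and (already clean) preprocessing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3-4. Choose a model and train it
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# 5. Evaluate on unseen data
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 6. Hyperparameter tuning with cross-validation
grid = GridSearchCV(model, {"n_estimators": [50, 100], "max_depth": [3, None]}, cv=5)
grid.fit(X_train, y_train)
print("Best params:", grid.best_params_)

# 7. Predictions (deployment would wrap this in an app or API)
print("Prediction for first test sample:", grid.predict(X_test[:1]))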

Types of Machine Learning:


1. Supervised Machine Learning − It is a type of machine learning that trains the model using labeled datasets to predict outcomes.
2. Unsupervised Machine Learning − It is a type of machine learning that learns patterns and structures within
the data without human supervision.

3. Semi-supervised Learning − It is a type of machine learning that is neither fully supervised nor fully
unsupervised. The semi-supervised learning algorithms basically fall between supervised and unsupervised learning
methods.

4. Reinforcement Machine Learning − It is a type of machine learning model that is similar to supervised
learning but does not use sample data to train the algorithm. This model learns by trial and error.
Importance of Machine Learning / features :

Machine Learning (ML) plays a crucial role in automation, data analysis, and decision-
making. Here’s why it’s important:

1. Data Processing

 Analyzes massive data from sources like social media, sensors, etc.
 Reveals hidden patterns to support better decisions.

2. Data-Driven Insights

 Finds trends and connections in large datasets that humans may miss.
 Helps in making accurate predictions and smarter decisions.

3. Automation

 Automates repetitive and time-consuming tasks.


 Reduces human error and saves effort.

4. Personalization

 Understands user behavior to give personalized recommendations.


 Used in e-commerce, social media, and streaming services.

5. Predictive Analytics

 Predicts future outcomes using past data.


 Used in sales forecasting, risk analysis, and demand planning.

6. Pattern Recognition

 Recognizes patterns in images, speech, and text.


 Applied in image processing, speech recognition, and NLP.

7. Finance

 Used in credit scoring, fraud detection, and algorithmic trading.

8. Retail

 Improves recommendation systems, customer service, and inventory management.

9. Fraud Detection & Cybersecurity

 Detects suspicious behavior and fraud in real-time.

10. Continuous Improvement

 ML models improve over time by learning from new data.


Disadvantages of Machine Learning :

1.Data Dependency

 ML needs a large amount of high-quality data.


 Poor or insufficient data leads to poor results.

2.High Computational Cost

 Some ML models (like deep learning) need powerful hardware (GPUs, high RAM).
 Increases cost and energy usage.

3.Time-Consuming

 Training models can take a lot of time, especially with large datasets.

4.Lack of Interpretability

 Complex models like neural networks are hard to understand ("black box").
 It’s difficult to explain why the model gave a certain output.

5.Overfitting

 The model may perform well on training data but fail on new (real-world) data.

6.Security and Privacy Issues

 Models trained on sensitive data may leak private information.


 Vulnerable to adversarial attacks.

7.Not Always Accurate

 ML predictions are not 100% correct.


 Wrong predictions can cause serious problems (e.g., in healthcare or finance).

Challenges of Machine Learning :


1.Data Collection & Quality

 Getting clean, labeled, and relevant data is difficult.


 Garbage in → garbage out.

2.Choosing the Right Algorithm

 There are many algorithms. Picking the best one for the task is a challenge.

3.Bias in Data

 If training data is biased, the model will also be biased and unfair.
4.Model Interpretability

 Hard to explain how and why complex models work.


 A big challenge in critical fields like healthcare, law, and finance.

5.Scalability

 Models may work on small datasets but struggle with huge datasets or real-time workloads.

6.Hyperparameter Tuning

 Finding the best settings for the model is trial and error and takes time.

7.Deployment and Maintenance

 Putting ML into production is complex.


 Models need to be monitored and updated regularly.

8.Ethical & Legal Concerns

 Data privacy, algorithm fairness, and accountability are major concerns.

Machine Learning vs. Deep Learning :


Definition: ML is a subset of AI that learns from data; DL is a subset of ML that uses neural networks with multiple layers.
Data Requirement: ML works with small to medium-sized data; DL requires large amounts of data.
Algorithms Used: ML uses Decision Trees, SVM, KNN, Linear Regression, etc.; DL uses CNN, RNN, LSTM, GANs (deep neural networks).
Feature Engineering: ML often requires manual feature selection; DL automatically extracts features from raw data.
Hardware Dependency: ML can work on traditional computers; DL needs powerful GPUs and high computing resources.
Training Time: ML is faster to train; DL is slower to train due to complex architectures.
Interpretability: ML is easier to understand and explain; DL is harder to interpret (black-box models).
Application Examples: ML is used for spam detection, credit scoring, customer segmentation; DL for image recognition, speech recognition, self-driving cars.
1. Supervised learning: Supervised learning can further be classified into two types − classification and regression.

· Classification: Predict categories (e.g., spam or not spam)

· Regression: Predict continuous values (e.g., house price)

a. Linear Regression
b. Logistic Regression
c. Decision Trees
d. Random Forest
e. K-Nearest Neighbors (KNN)
f. Support Vector Machines (SVM)
g. Naive Bayes
h. Neural Networks (for classification/regression)

2. Unsupervised learning: Unsupervised learning algorithms can be classified into three types − clustering, association, and dimensionality reduction.

1. Clustering

Goal: Group similar data points together.

Use Case: Customer segmentation, market research, pattern recognition.

Common Algorithms:

a. K-Means Clustering
b. Hierarchical Clustering
c. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
d. Gaussian Mixture Models (GMM)

2. Association

Goal: Discover interesting relationships or rules among variables in large datasets.

Use Case: Market basket analysis (e.g., “People who buy bread often buy butter”).

🔧Common Algorithms:

a. Apriori Algorithm
b. Eclat Algorithm
c. FP-Growth Algorithm
3. Dimensionality Reduction

Goal: Reduce the number of input variables while preserving important information.

Use Case: Data visualization, noise removal, speeding up machine learning algorithms.

🔧 Common Algorithms:

a. Principal Component Analysis (PCA)


b. t-Distributed Stochastic Neighbor Embedding (t-SNE)
c. Linear Discriminant Analysis (LDA)
d. Autoencoders

Clustering: grouping similar data points. Example algorithms: K-Means, DBSCAN, Hierarchical.
Association: finding relationships/patterns. Example algorithms: Apriori, FP-Growth, Eclat.
Dimensionality Reduction: simplifying high-dimensional data. Example algorithms: PCA, t-SNE, Autoencoders.
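A small sketch of clustering and dimensionality reduction with scikit-learn (the customer values are made up):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Hypothetical customer data: [annual income, spending score]
X = np.array([[15, 39], [16, 81], [17, 6], [80, 75], [85, 13], [88, 79]])

# Clustering: group similar customers into 2 segments
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print("Cluster labels:", labels)

# Dimensionality reduction: compress 2 features down to 1 while keeping most variance
X_reduced = PCA(n_components=1).fit_transform(X)
print("Reduced shape:", X_reduced.shape)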

3. Reinforcement Learning
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by performing actions and receiving rewards or penalties from the environment.

 It’s like learning by trial and error.


 The goal is to maximize cumulative rewards over time.

How it Works (Basic Terms)


Agent: the learner or decision-maker (e.g., robot, AI player).
Environment: the world through which the agent moves.
Action: a move the agent can make (e.g., move left, buy stock).
State: the current situation of the agent.
Reward: feedback from the environment (positive or negative).
Policy: a strategy the agent follows to take actions.
Value Function: expected reward in the long run from a given state.

· Positive Reinforcement

 Reward for a good action.


 Helps the agent learn quickly.

· Negative Reinforcement

 Penalizes wrong actions.


 Discourages unwanted behavior.
Real-Life Examples of Reinforcement Learning :

🎮 Games: AlphaGo, Chess, Atari – learned strategies better than humans.
🚗 Self-Driving Cars: learn how to navigate, avoid obstacles, and follow traffic rules.
📈 Stock Trading: decide when to buy or sell stocks based on market rewards.
🧹 Robotics: teach robots to walk, pick up objects, etc.
🧠 Healthcare: personalized treatment plans and dosage control.

Libraries and Packages :

Numpy : for numeric computations


Pandas : for data manipulation and data preprocessing .
Scikit-learn : implements almost all the common machine learning algorithms, such as linear regression, logistic regression, k-means clustering, k-nearest neighbors, etc.

Matplotlib : for data visualization .

Mathematics and statistics :

Mathematics and Statistics play an important role in developing machine learning and data science applications.

Algebra: variables, functions. Useful for building equations and models.
Linear Algebra: vectors, matrices, tensors. Useful for data representation and transformations.
Statistics: mean, median, standard deviation, Bayes. Useful for understanding and analyzing data.
Probability: probability, conditional probability, Bayes rule. Useful for predictions and classification.
Calculus: derivatives, gradients, chain rule. Useful for training models via optimization.
Trigonometry: tanh, sin, cos. Used in some activation functions.
Supervised learning in machine learning :

Linear Regression: predicts continuous values using a linear relationship. Use cases: house price prediction, sales forecasting. Libraries: scikit-learn, statsmodels.
Logistic Regression: predicts binary/class labels using a logistic function. Use cases: spam detection, disease prediction. Libraries: scikit-learn, statsmodels.
Decision Tree: tree-like model for decision making. Use cases: credit risk analysis, medical diagnosis. Libraries: scikit-learn, XGBoost.
Random Forest: ensemble of decision trees for better accuracy. Use cases: fraud detection, stock prediction. Libraries: scikit-learn, XGBoost.
Support Vector Machine (SVM): finds the optimal hyperplane for classification/regression. Use cases: face detection, text classification. Libraries: scikit-learn.
K-Nearest Neighbors (KNN): classifies based on the closest training examples. Use cases: recommender systems, handwriting recognition. Libraries: scikit-learn.
Naive Bayes: probabilistic classifier using Bayes' theorem. Use cases: email filtering, sentiment analysis. Libraries: scikit-learn, nltk.
Gradient Boosting: builds additive models by combining weak learners. Use cases: insurance pricing, customer churn prediction. Libraries: XGBoost, LightGBM.
AdaBoost: boosting algorithm that focuses on misclassified examples. Use cases: fraud detection, credit scoring. Libraries: scikit-learn.
CatBoost: boosting algorithm optimized for categorical data. Use cases: e-commerce ranking, churn modeling. Libraries: CatBoost.
Linear Discriminant Analysis (LDA): reduces dimensionality and classifies. Use cases: face recognition, bioinformatics. Libraries: scikit-learn.
Quadratic Discriminant Analysis (QDA): similar to LDA but assumes different covariances per class. Use cases: classification with non-linear class boundaries. Libraries: scikit-learn.
Unsupervised Learning Algorithms

K-Means (Clustering): grouping similar data. Key features: centroid-based, needs the number of clusters. Example use case: customer segmentation.
Hierarchical Clustering (Clustering): building cluster trees. Key features: dendrogram structure, no fixed k. Example use case: gene sequence analysis.
DBSCAN (Clustering): grouping with noise handling. Key features: density-based, detects outliers. Example use case: spatial data analysis.
Gaussian Mixture Model (GMM) (Clustering): soft clustering. Key features: probabilistic model, assumes normal distribution. Example use case: market segmentation.
PCA (Principal Component Analysis) (Dimensionality Reduction): reducing feature space. Key features: linear, unsupervised, keeps variance. Example use case: image compression, pre-processing.
t-SNE (Dimensionality Reduction): visualizing high-dimensional data. Key features: non-linear, good for 2D/3D projections. Example use case: visualizing embeddings.
Autoencoders (Neural Network, Dim. Reduction): feature extraction. Key features: deep learning-based encoder-decoder. Example use case: anomaly detection, image denoising.
Apriori Algorithm (Association): finding item relationships. Key features: frequent itemsets, support-confidence. Example use case: market basket analysis.
FP-Growth (Association): faster association rule mining. Key features: tree-based, avoids candidate generation. Example use case: e-commerce recommendation.

Reinforcement Learning Algorithms


Q-Learning (Value-Based): learning optimal actions. Key features: off-policy, uses a Q-table. Example use case: game AI, robot path planning.
SARSA (Value-Based): learns from the current policy. Key features: on-policy learning. Example use case: grid-world navigation.
Deep Q-Network (DQN) (Value-Based, Deep RL): Q-learning with deep networks. Key features: uses neural nets to estimate Q-values. Example use case: playing Atari games.
REINFORCE (Policy-Based): learning the policy directly. Key features: Monte Carlo-based, updates after episodes. Example use case: simple control problems.
Actor-Critic (Policy + Value-Based): stable and fast learning. Key features: two networks, actor (policy) and critic (value). Example use case: continuous action space problems.
A3C (Asynchronous Advantage Actor-Critic) (Policy + Value): parallel training. Key features: asynchronous agents improve learning speed. Example use case: advanced game-playing agents.
DDPG (Deep Deterministic Policy Gradient) (Policy-Based): continuous control. Key features: combines DQN and policy gradient methods. Example use case: robotics, autonomous driving.
PPO (Proximal Policy Optimization) (Policy-Based): safe and stable training. Key features: simple, efficient, widely used. Example use case: game playing, real-world control.
TD(0), TD(λ) (Temporal Difference Methods): learning with bootstrapping. Key features: balance between MC and TD methods. Example use case: real-time learning tasks.
What Are Parameters in Machine Learning?

Parameters are values that the learning algorithm optimizes during training to make better predictions.
They directly affect the output of the model.

Examples:

In linear regression, the parameters are the weights (coefficients) and bias (intercept).

In neural networks, parameters include the weights and biases of each neuron.

In decision trees, the structure (like split thresholds) is based on learned decisions — though not
typically called "parameters", they play a similar role.

Types of Parameters

There are two main categories often mentioned:

1. Model Parameters

Learned during training.

Affect predictions.

Examples:

 Weights in linear regression or neural networks.


 Split values in decision trees

2. Hyperparameters

Set manually before training.

Not learned from data — instead, you tune them using methods like grid search or cross-
validation.

Examples:

 Learning rate
 Number of hidden layers
 Number of trees in a random forest
 Regularization strength

Parameters: learned during training. Examples: weights, biases, thresholds. Role: determine the model output.

Hyperparameters: set manually before training. Examples: learning rate, depth, number of neurons. Role: control the training process.
What Are Hyperparameters?
Hyperparameters are the settings or controls that you choose before training a machine learning
model.

They are not learned from data like parameters. Instead, you set them manually or with tuning techniques (like
Grid Search or Random Search).

Why Are Hyperparameters Important?


They control how the model learns.

They affect the speed, accuracy, and performance of the model.

Good hyperparameter choices → Better model.

Examples :

--------------

Imagine you’re baking a cake.

Ingredients like flour, sugar = parameters (learned from data)

Oven temperature, baking time = hyperparameters (you set them before baking)

Use Hyperparameter Tuning:

a. Grid Search – Try all combinations


b. Random Search – Try random combinations
c. Bayesian Optimization – Smart search
d. Automated ML (AutoML) – Automatically selects best ones

What is Hyperparameter Tuning?


When building a machine learning model, you choose values like:

 Learning rate
 Number of layers
 Max depth
 Number of neighbors (k in KNN)

But how do you know which values are best?

==> Hyperparameter tuning helps you find the best combination for high accuracy.
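A small sketch of both search strategies with scikit-learn, tuning a KNN model on the built-in Iris dataset (the candidate values are chosen only for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters to try for KNN
param_grid = {"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]}

# Grid Search: tries every combination with 5-fold cross-validation
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X, y)
print("Grid best:", grid.best_params_, grid.best_score_)

# Random Search: samples a fixed number of random combinations
rand = RandomizedSearchCV(KNeighborsClassifier(), param_grid, n_iter=4, cv=5, random_state=42)
rand.fit(X, y)
print("Random best:", rand.best_params_, rand.best_score_)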
Total Error = Bias² + Variance + Noise

This formula tells us why machine learning models make mistakes. Every model has some error. This error comes from 3 parts: Bias, Variance, and Noise.

Bias (Underfitting): error from wrong assumptions. The model is too simple and misses patterns.
Variance (Overfitting): error from too much sensitivity. The model is too complex and changes a lot.
Noise: error we cannot remove. It's randomness or unknown factors in the data.

Common algorithms by problem type:

a. Regression ( continuous values )

 Linear Regression
 Decision Tree Regressor
 Random Forest Regressor

b. Classification ( categorical values )

 Logistic Regression
 K-Nearest Neighbors
 Decision Tree Classifier
 Naive Bayes
 Support Vector Machine (SVM)

Hyperparameter Tuning

 GridSearchCV
 RandomizedSearchCV

fit(), transform(), and fit_transform()

· fit() – Learns statistical properties from your data (like mean, standard deviation, etc.)

· transform() – Applies those statistics to scale or modify the data

· The shortcut fit_transform() runs both at the same time.

Simply: fit_transform() makes your data cleaner, fairer, and more understandable for the machine.

Using fit_transform() :

 Normalizes or standardizes the data
 Removes bias caused by large values dominating smaller ones
 Helps models converge faster and learn better

Example:

Age    Salary
25     30,000
45     80,000

Without scaling:

 The model thinks salary is more important (because 80,000 > 45).
 It leads to biased learning.

After fit_transform():

 Both features are scaled to a similar range (e.g., between -1 and 1).
 Now the model treats them fairly.

When to use fit_transform()?

Always use it on your training data during data preprocessing:

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)

Then use only .transform() on your test or new data:

X_test_scaled = scaler.transform(X_test)

Evaluation metrics:

· For Regression: MAE, MSE, RMSE, R² Score
· For Classification: Accuracy, Precision, Recall, F1-score, Confusion Matrix, ROC-AUC

Regression error metrics:

MAE (Mean Absolute Error) = (1/n) Σ |y_true − y_pred| : average absolute difference between actual and predicted values.
MSE (Mean Squared Error) = (1/n) Σ (y_true − y_pred)² : squared differences; penalizes large errors more.
RMSE (Root Mean Squared Error) = √MSE : same units as the target variable, but punishes big mistakes heavily.
MAPE (Mean Absolute Percentage Error) = (1/n) Σ |y_true − y_pred| / |y_true| : average error as a percentage of the actual values.
R² Score = 1 − SS_res / SS_tot : measures how much variance in the target is explained by the model.
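As a small sketch, these regression metrics can be computed with scikit-learn (the numbers below are made up):

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([30, 32, 29, 35, 40])   # hypothetical actual values
y_pred = np.array([28, 33, 30, 37, 38])   # hypothetical predictions

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                        # root of MSE, same units as the target
r2 = r2_score(y_true, y_pred)

print(f"MAE={mae:.2f}, MSE={mse:.2f}, RMSE={rmse:.2f}, R²={r2:.2f}")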
By the source of error (Bias–Variance decomposition)
These explain why errors happen in models:

Bias Error

Comes from wrong assumptions or oversimplified models.

High bias = underfitting (model is too simple, missing patterns).

Variance Error

Comes from too much sensitivity to training data.

High variance = overfitting (model memorizes instead of generalizing).

Irreducible Error (Noise)

Comes from randomness or missing information in the real world.

Can’t be reduced by any model

Formula:

Total Error = Bias² + Variance + Noise

Classification errors
For categorical predictions (e.g., spam vs. not spam), errors are often measured differently:

Accuracy = % of correct predictions.

Precision = Out of predicted positives, how many were actually positive?

Recall = Out of actual positives, how many were predicted as positive?

F1-score = Harmonic mean of precision and recall.

Log Loss = Measures how close predicted probabilities are to the true classes.
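A small sketch computing these classification metrics with scikit-learn (the labels and probabilities are made up for a spam example, 1 = spam):

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, log_loss)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                     # hypothetical actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]                     # hypothetical predicted labels
y_prob = [0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.95, 0.3]    # predicted probabilities of spam

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("Log loss :", log_loss(y_true, y_prob))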
