Part 1: Foundational ML Concepts 🧠
Types of ML Techniques
● Supervised Learning: This is like learning with a teacher. The algorithm is trained on a
labeled dataset, which means each data point is tagged with the correct output. The
goal is to learn a mapping function that can predict the output for new, unseen data.
○ Example: Predicting house prices (regression) based on features like area and
location, or classifying emails as spam or not spam (classification).
● Unsupervised Learning: This is like learning without a teacher. The algorithm is given
unlabeled data and tries to find patterns and structure on its own.
○ Example: Grouping customers into different segments based on their purchasing
behavior (clustering), or reducing the number of features in a dataset
(dimensionality reduction).
● Semi-Supervised Learning: A middle ground between supervised and unsupervised
learning. It uses a small amount of labeled data and a large amount of unlabeled data.
This is useful when labeling data is expensive and time-consuming.
● Reinforcement Learning: This is about learning to make decisions. An agent learns by
interacting with an environment. It receives rewards for good actions and penalties for
bad ones. The goal is to learn a policy that maximizes the cumulative reward.
○ Example: Training a program to play a game like chess or controlling a robot to
perform a task.
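To make the supervised/unsupervised distinction concrete, here is a minimal scikit-learn sketch; the tiny toy dataset and the choice of LogisticRegression and KMeans are illustrative assumptions, not part of the notes above. The classifier needs labels, while the clustering algorithm finds groups on its own.
```python
# Minimal sketch: supervised learning uses labels, unsupervised learning does not.
# The toy data and model choices are purely illustrative.
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Supervised: every row of X comes with a label in y.
X = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
y = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.5, 1.5]]))       # predicted class for an unseen point

# Unsupervised: the same X with no labels; the algorithm finds structure itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                      # cluster assignment for each data point
```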
Key Learning Paradigms
● Agent: The learner or decision-maker in reinforcement learning.
● Self-supervised Learning: A type of unsupervised learning where the supervision
signal is generated from the input data itself. For example, a model learns to predict the
next word in a sentence by looking at the previous words.
● Active Learning: The algorithm can interactively query a user (or another information
source) to label new data points. It tries to select the most informative data points to
query, to learn more efficiently.
● Passive Learning: The standard paradigm where the algorithm is given a fixed dataset
and learns from it passively, without any ability to influence which data it sees.
Eager vs. Lazy Learners
● Eager Learners: These algorithms build a classification model from the training data
before receiving any test data. They spend more time on training but are fast during
prediction.
○ Examples: Linear Regression, Decision Trees, SVM.
● Lazy Learners: These algorithms defer the learning process until it's time to make a
prediction. They simply store the training data. They are fast to train but can be slow to
predict.
○ Example: K-Nearest Neighbors (KNN), where the algorithm looks for the 'k'
closest training examples to make a prediction.
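A minimal KNN sketch, assuming scikit-learn and an invented one-feature dataset, showing what "lazy" means in practice: fit() essentially just stores the data, and the neighbour search happens at prediction time.
```python
# Lazy learning with KNN: fit() memorizes the training data; the real work
# (finding the k nearest neighbours and voting) happens at predict() time.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1], [2], [3], [10], [11], [12]]   # hypothetical 1-D feature
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)        # effectively just stores X_train and y_train
print(knn.predict([[2.5]]))      # neighbours 1, 2, 3 vote -> class 0
```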
Hypothesis, Bias, and Learning Types
● Hypothesis (h) and Hypothesis Space (H): A hypothesis is a specific function that the
learning algorithm picks to best approximate the true target function. The hypothesis
space is the set of all possible hypotheses the algorithm can choose from. For linear
regression, the hypothesis space is the set of all possible linear equations.
● Inductive Learning: This is the core of most ML. It involves generalizing from specific
examples to create a general rule. The conclusions are probable, not guaranteed.
● Inductive Bias: The set of assumptions a learner uses to make predictions on unseen
data. Without some bias, an algorithm cannot generalize beyond the training data. A
common bias is assuming a linear relationship in linear regression.
● Deductive Learning: This involves moving from a general rule to a specific conclusion.
It's about logical deduction and is less common in machine learning, which typically
deals with uncertain data.
Parametric vs. Non-parametric Algorithms
● Parametric Algorithms: These algorithms have a fixed number of parameters,
regardless of the amount of training data. They make strong assumptions about the form
of the function they are trying to learn (e.g., linear).
○ Pros: Fast, require less data.
○ Cons: Limited complexity, can lead to underfitting if assumptions are wrong.
○ Examples: Linear Regression, Logistic Regression.
● Non-parametric Algorithms: These algorithms do not make strong assumptions about
the form of the target function. The number of parameters often grows with the training
data.
○ Pros: Flexible, can fit a wide range of functions.
○ Cons: Require more data, slower, prone to overfitting.
○ Examples: K-Nearest Neighbors, Decision Trees.
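A rough side-by-side sketch of the two families, using synthetic 1-D data (the data-generating function and model choices are assumptions for illustration): the linear model always has exactly two parameters, while the KNN regressor keeps every training point around.
```python
# Parametric (fixed parameter count) vs non-parametric (complexity grows with data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=50)   # nonlinear target + noise

lin = LinearRegression().fit(X, y)                  # always just slope + intercept
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)  # stores all 50 training points

print(lin.coef_, lin.intercept_)   # two numbers, no matter how much data we add
print(knn.predict([[5.0]]))        # prediction built from the 5 nearest stored points
```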
Overfitting and Underfitting
This is a central challenge in machine learning, related to the bias-variance tradeoff.
● Underfitting: The model is too simple to capture the underlying patterns in the data. It
performs poorly on both the training and test sets. It has high bias.
● Overfitting: The model learns the training data too well, including the noise. It performs
very well on the training set but poorly on the test set. It has high variance.
● Good Fit: The model captures the underlying pattern and generalizes well to new data.
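One way to see the tradeoff is to fit polynomials of increasing degree to noisy synthetic data and compare training and test error; the degrees, noise level, and split below are arbitrary illustrative choices, not prescriptions.
```python
# Under- vs overfitting: a degree-1 polynomial underfits a sine curve, while a
# degree-15 polynomial chases the noise and does worse on the test set.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 1, 80)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=80)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 4, 15):   # too simple, about right, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),   # training error
          mean_squared_error(y_te, model.predict(X_te)))   # test error
```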
Sample OLS Question
Q: Given the data points (1, 2), (2, 4), (3, 5), find the OLS regression line ŷ = β₀ + β₁x.
A:
1. Calculate the means: x̄ = (1+2+3)/3 = 2, ȳ = (2+4+5)/3 = 11/3.
2. Calculate β₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²:
○ Numerator: (1−2)(2−11/3) + (2−2)(4−11/3) + (3−2)(5−11/3) = (−1)(−5/3) + 0 + (1)(4/3) = 5/3 + 4/3 = 9/3 = 3.
○ Denominator: (1−2)² + (2−2)² + (3−2)² = 1 + 0 + 1 = 2.
○ β₁ = 3/2 = 1.5.
3. Calculate β₀:
○ β₀ = ȳ − β₁x̄ = 11/3 − (1.5)(2) = 11/3 − 3 = 2/3 ≈ 0.67.
4. Result: The regression line is ŷ = 0.67 + 1.5x.
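The hand calculation can be checked in a few lines of NumPy (a sanity-check sketch, not part of the original exercise):
```python
# Verify the worked OLS example: beta1 = sum((x - x̄)(y - ȳ)) / sum((x - x̄)^2).
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 5.0])

beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
print(beta0, beta1)   # ≈ 0.667 and 1.5, matching the hand calculation
```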
Stochastic Gradient Descent (SGD)
● Batch Gradient Descent: Computes the gradient using the entire training set at each
step. Can be very slow for large datasets.
● Stochastic Gradient Descent (SGD): Computes the gradient using a single training
example at each step. Much faster and can help escape local minima, but the updates
are noisy.
● Mini-Batch Gradient Descent: A compromise that updates the parameters using a
small batch of training examples. This is the most common approach.
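A from-scratch mini-batch gradient descent sketch for simple linear regression, to show the update rule; the learning rate, batch size, epoch count, and synthetic data are arbitrary assumptions for illustration.
```python
# Mini-batch gradient descent for y ≈ w*x + b, minimizing mean squared error.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(1000, 1))
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=1.0, size=1000)   # true w=3, b=2

w, b = 0.0, 0.0
lr, batch_size = 0.01, 32

for epoch in range(50):
    idx = rng.permutation(len(X))                 # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch].ravel(), y[batch]
        err = w * xb + b - yb
        # Gradients of the mini-batch MSE with respect to w and b
        w -= lr * 2 * np.mean(err * xb)
        b -= lr * 2 * np.mean(err)

print(w, b)   # should end up close to the true values 3 and 2
```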
Correlation and Multicollinearity
Correlation Analysis for Multicollinearity
Correlation measures the statistical relationship or association between two variables: how strongly, and in which direction, one variable tends to change as the other changes.
Multicollinearity is a phenomenon that occurs in regression models when two or more
independent variables (predictors) are highly correlated with each other. This means one
predictor can be linearly predicted from the others with a substantial degree of accuracy.
Think of it like having two different witnesses in a trial who tell the exact same story. Hearing the
story a second time doesn't add much new information and can make it difficult to know which
witness is more important. Similarly, in a model, multicollinearity makes it hard to determine the
individual effect of each correlated predictor on the outcome variable.
Why is it a problem? 🧐
● Unstable Coefficients: The coefficient estimates for the correlated variables can
change erratically in response to small changes in the model or the data.
● Difficult Interpretation: It becomes challenging to determine the individual contribution
of each predictor. You can't say "a one-unit increase in X1 is associated with a β1 increase
in Y, holding the other predictors constant," because you can't change X1 without also
changing its correlated counterpart, X2.
● Inflated Standard Errors: This makes the coefficients seem statistically insignificant
when they might actually be important.
How to detect it?
1. Correlation Matrix: Calculate the correlation coefficient between every pair of
independent variables. A common rule of thumb is that a correlation coefficient of 0.7 or
higher (or lower than -0.7) indicates potential multicollinearity.
2. Heatmap Visualization: A heatmap provides a clear visual representation of the
correlation matrix, making it easy to spot highly correlated pairs.
3. Variance Inflation Factor (VIF): For each predictor, regress it on all the other
predictors and compute VIF = 1 / (1 − R²). The VIF quantifies how much the variance of that
predictor's coefficient estimate is inflated by multicollinearity.
Interpreting VIF:
● VIF = 1: No correlation. This is the baseline.
● 1 < VIF < 5: Moderate correlation. This is often acceptable.
● VIF > 5 or 10: High correlation and a cause for concern. It indicates that the model's
coefficients are poorly estimated.
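A small sketch of both detection routes, assuming pandas and statsmodels and an invented three-feature dataset in which area and rooms are deliberately collinear:
```python
# Detecting multicollinearity with a correlation matrix and VIF on synthetic data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
area = rng.normal(100, 20, 200)
rooms = area / 25 + rng.normal(0, 0.3, 200)   # strongly tied to area
age = rng.normal(30, 10, 200)                 # independent of both
X = pd.DataFrame({"area": area, "rooms": rooms, "age": age})

print(X.corr())   # area and rooms should show a correlation near 1

# VIF per predictor; a constant column is added so the R² used inside is centered.
X_const = sm.add_constant(X)
for i, col in enumerate(X_const.columns[1:], start=1):
    print(col, variance_inflation_factor(X_const.values, i))
```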
Logistic Regression
Logistic Regression Overview
Despite its name, Logistic Regression is a fundamental algorithm for binary classification,
not regression. It's used to predict a categorical outcome that has two possible values, such as
Yes/No, True/False, or 1/0.
For example, you could use logistic regression to predict:
● Whether an email is spam (1) or not spam (0).
● Whether a customer will churn (Yes) or not (No).
The core idea is to model the probability that a given input belongs to a particular class.
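A minimal sketch (with an invented one-feature spam example) of what the model actually does: a linear score is passed through the sigmoid to get a probability, which is then thresholded into a class.
```python
# Logistic regression: probability = sigmoid(w·x + b), then threshold at 0.5.
import numpy as np
from sklearn.linear_model import LogisticRegression

# One hypothetical feature: number of "suspicious" words in an email; 1 = spam.
X = np.array([[0], [1], [2], [3], [6], [7], [8], [9]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[4]]))   # [P(not spam), P(spam)] for a borderline email
print(clf.predict([[4]]))         # class label after thresholding at 0.5

# The same probability computed by hand from the learned weights.
w, b = clf.coef_[0, 0], clf.intercept_[0]
print(1 / (1 + np.exp(-(w * 4 + b))))
```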
Model Validation and Evaluation
Splitting the Data
To evaluate a model's performance on unseen data, we must first split our dataset.
● Hold-Out Method: This is the simplest strategy. The dataset is split into two parts: a
training set (e.g., 80%) and a testing set (e.g., 20%). The model is built on the training
set and evaluated on the testing set.
○ Drawback: The performance metric can be highly dependent on which data
points end up in the training vs. testing set.
● Stratified Partition: This is an improved version of the hold-out method, crucial for
classification. It ensures that the proportion of different classes is the same in both the
training and testing sets as it is in the original dataset. This prevents a situation where,
for example, the testing set has no examples of a minority class.
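A quick sketch of a stratified hold-out split with scikit-learn's train_test_split; the imbalanced toy labels are invented to show that the minority class ends up in both splits.
```python
# Stratified hold-out split: class proportions are preserved in train and test.
from sklearn.model_selection import train_test_split

X = list(range(20))
y = [0] * 16 + [1] * 4            # imbalanced: 80% class 0, 20% class 1

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
print(sum(y_tr), sum(y_te))       # 3 and 1: the minority class appears in both splits
```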
Cross-Validation (CV)
Cross-validation is a more robust technique that gives a more reliable estimate of model
performance.
● k-Fold Cross-Validation:
1. The dataset is randomly split into 'k' equal-sized subsets, called folds.
2. The model is trained and tested 'k' times.
3. In each iteration, one fold is held out as the test set, and the remaining k-1 folds
are used for training.
4. The final performance is the average of the performance scores from all 'k'
iterations. Common choices for 'k' are 5 or 10.
● Leave-One-Out Cross-Validation (LOOCV): This is an extreme case of k-Fold CV
where k equals the number of data points (n). In each iteration, one single data point is
used to test the model, and the rest (n−1) are used to train it. It's computationally
expensive but provides a very thorough validation.
● Stratified k-Fold: This combines k-Fold CV with stratification. When creating the folds, it
ensures that each fold is representative of the overall class distribution. This is the
recommended standard for most classification problems.
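A sketch of stratified 5-fold cross-validation on a synthetic, imbalanced classification problem; the dataset generator and the logistic-regression model are placeholder assumptions.
```python
# Stratified k-fold CV: each fold keeps the overall class distribution.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=200, n_classes=2, weights=[0.8, 0.2],
                           random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores)          # one accuracy score per fold
print(scores.mean())   # the averaged estimate of model performance
```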
Metrics for Multiclass Classifiers
When you have more than two classes (e.g., classifying images as Cat, Dog, or Bird), you need
to average the binary metrics.
● Macro-Averaging: Calculate the metric (e.g., precision) independently for each class
and then take the unweighted average. It treats all classes equally.
● Micro-Averaging: Aggregate the counts of TPs, FPs, and FNs across all classes and
then calculate the metric once. It gives more weight to the bigger classes. For
Micro-Averaging, Precision = Recall = F1 Score = Overall Accuracy.
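A toy illustration, with invented 3-class label vectors of unequal sizes, of how the two averages differ and of the fact that micro-averaged precision equals overall accuracy:
```python
# Macro- vs micro-averaged precision on an imbalanced 3-class toy problem.
from sklearn.metrics import precision_score, accuracy_score

y_true = ["cat"] * 6 + ["dog"] * 3 + ["bird"] * 1
y_pred = ["cat"] * 5 + ["dog"] + ["dog"] * 2 + ["cat"] + ["bird"]

print(precision_score(y_true, y_pred, average="macro"))  # unweighted mean over classes
print(precision_score(y_true, y_pred, average="micro"))  # equals overall accuracy
print(accuracy_score(y_true, y_pred))
```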
Decision Trees
Decision Tree Overview
A Decision Tree is a supervised learning algorithm that works like a flowchart. It splits the data
into smaller and smaller subsets based on a series of questions about the features, eventually
arriving at a decision or a class label in the leaf nodes.
Key Terminology:
● Root Node: The starting node, which represents the entire dataset.
● Decision Node: An internal node that tests a feature and splits the data into branches
based on the outcome.
● Leaf Node: A terminal node that represents a final classification (decision).
● Homogeneous (Pure) Node: A node containing data points from only one class.
● Heterogeneous (Impure) Node: A node containing a mix of data points from multiple
classes.
The algorithm's goal is to create splits that make the resulting child nodes as pure as possible.
Weather Forecast Decision Tree Example 🌳
Goal: Decide whether to Play or Don't Play tennis based on the weather. Features:
Outlook, Humidity, Wind.
Here's how the tree might be built:
1. Root Node: The tree starts with the entire dataset, which is impure (contains both Play
and Don't Play examples).
2. First Split: The algorithm calculates which feature (Outlook, Humidity, or Wind) will
best split the data into purer subsets. Let's say it's Outlook. This becomes the root
node.
3. Branches: The Outlook node splits into three branches: Sunny, Overcast, and
Rain.
4. Subsequent Splits:
○ The data that goes down the Overcast branch might all be Play. This branch
ends in a pure leaf node labeled Play.
○ The data in the Sunny branch is still mixed. The algorithm now looks at the
remaining features (Humidity, Wind) to find the best feature to split this Sunny
subset. Let's say Humidity is best.
○ This creates two new branches under Sunny: High (which leads to a Don't
Play leaf node) and Normal (which leads to a Play leaf node).
○ This process continues recursively until all branches end in pure leaf nodes or a
stopping condition is met.
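A hypothetical scikit-learn version of this example; the small weather table below is invented to match the narrative, and the categorical features are one-hot encoded because DecisionTreeClassifier expects numeric input.
```python
# Fit a decision tree on a made-up play-tennis table and print its splits.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Outlook":  ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Overcast", "Sunny", "Rain"],
    "Humidity": ["High", "High", "High", "Normal", "High", "Normal", "Normal", "Normal"],
    "Wind":     ["Weak", "Strong", "Weak", "Weak", "Strong", "Strong", "Weak", "Weak"],
    "Play":     ["No", "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes"],
})

X = pd.get_dummies(data[["Outlook", "Humidity", "Wind"]])   # one-hot encode categories
y = data["Play"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))     # text view of the learned splits
```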