Machine Learning Notes

Unit 1: Machine Learning

1) What is machine learning?


Ans: Machine learning is a type of artificial intelligence (AI) that
allows computers to learn and make decisions without being explicitly
programmed. Instead of being told exactly what to do, a machine
learning model is trained using data, allowing it to identify patterns
and make predictions or decisions based on new, unseen data.

In simpler terms, it's like teaching a computer to recognize patterns, make decisions, and improve over time by showing it examples (data), rather than manually programming every rule.

There are three main types of machine learning:

1. Supervised learning: The model is trained on labeled data (i.e., data with known outcomes). The goal is for the model to learn the relationship between input data and the correct output.
   ○ Example: Predicting house prices based on features like location, size, and age.
2. Unsupervised learning: The model is trained on data without labels. It tries to find hidden patterns or groupings in the data on its own.
   ○ Example: Grouping customers into clusters based on purchasing behavior.
3. Reinforcement learning: The model learns by interacting with an environment and receiving feedback (rewards or penalties). The goal is to maximize the cumulative reward over time.
   ○ Example: Teaching a robot to navigate a maze by rewarding it for getting closer to the exit.

Machine learning is used in a wide variety of applications, from recommendations on streaming services (like Netflix or Spotify) to voice assistants (like Siri) and self-driving cars.

2) General architecture of ML systems

Ans: The general architecture of a Machine Learning (ML) system typically consists of several key components:

1. Data Collection
● Raw data is collected from various sources, such as databases, sensors, or user input, which forms the foundation for the model.

2. Data Preprocessing
● The data is cleaned and transformed (handling missing values, removing outliers, normalizing) to make it usable for training the model.

3. Feature Engineering
● Important features are selected or created from the raw data to improve the model's ability to make accurate predictions.

4. Model Selection
● Based on the problem (e.g., classification, regression), an appropriate machine learning algorithm (like decision trees, neural networks, etc.) is chosen.

5. Model Training
● The selected model is trained using historical data, learning the underlying patterns through a process like gradient descent or backpropagation.

6. Model Evaluation
● The model's performance is tested using separate test data to check how well it generalizes to new, unseen data. Metrics like accuracy or mean squared error are used.

7. Model Tuning
● Hyperparameters (e.g., learning rate, number of layers) are adjusted to improve model performance, often through methods like grid search or cross-validation.

8. Deployment
● Once the model performs well, it is deployed into a production environment where it can make real-time predictions on new data.

9. Monitoring & Maintenance
● Post-deployment, the model's performance is monitored, and it may need to be retrained periodically as data patterns change over time.


3) Supervised Learning
Ans: Supervised Learning is a type of machine learning where the
model is trained on labeled data, meaning the input data is paired with
the correct output or label. The goal is for the model to learn the
relationship between inputs and outputs so that it can predict the
output for new, unseen data.

Key Points:

1. Labeled Data: The training dataset includes both input data and the correct output (label).
   ○ Example: In a dataset of house prices, the features (input) might include square footage, number of bedrooms, and location, while the label (output) would be the price.
2. Training Process: The algorithm learns by finding patterns in the labeled data. It adjusts its parameters to minimize the error between its predictions and the actual labels.
   ○ Example: A model trained on labeled images of cats and dogs learns to differentiate between them.
3. Types of Supervised Learning:
   ○ Classification: The output is a category or class.
      ■ Example: Email spam detection (spam or not spam).
   ○ Regression: The output is a continuous value.
      ■ Example: Predicting house prices based on features like area and number of rooms.

Common Algorithms:

● Linear Regression (for regression problems)
● Logistic Regression (for classification problems)
● Decision Trees
● Support Vector Machines (SVM)
● K-Nearest Neighbors (KNN)

Example:

A supervised learning model can be trained on a dataset where inputs like age, income, and education level are used to predict whether a person will buy a product (yes or no).

In short, supervised learning works by learning from labeled data to make predictions or classifications for new, unseen data.
4) Unsupervised Learning
Ans: Unsupervised Learning is a type of machine learning where the
model is trained on data that is not labeled. In other words, the
algorithm is given input data without any corresponding output labels,
and the goal is for the model to find patterns, structures, or groupings
in the data on its own.

Key Points:

1. No Labeled Data: The training dataset consists of input data without any known output or label.
   ○ Example: A dataset of customer purchase behavior without knowing which customer belongs to which group.
2. Pattern Recognition: The model tries to identify patterns, similarities, or relationships in the data without predefined labels.
   ○ Example: Grouping customers into clusters based on their purchasing habits.
3. Common Tasks in Unsupervised Learning:
   ○ Clustering: Grouping similar data points together.
      ■ Example: Segmenting customers into clusters based on purchasing behavior (e.g., frequent shoppers, occasional buyers).
   ○ Association: Finding relationships between variables.
      ■ Example: Market basket analysis (e.g., people who buy bread are also likely to buy butter).
   ○ Dimensionality Reduction: Reducing the number of features or variables in the data while retaining important information.
      ■ Example: Using techniques like PCA (Principal Component Analysis) to simplify complex datasets.

Common Algorithms:

● K-Means Clustering (for clustering tasks)
● Hierarchical Clustering
● DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
● Principal Component Analysis (PCA) (for dimensionality reduction)
● Apriori Algorithm (for association rule learning)

Example:

An e-commerce platform might use unsupervised learning to segment its customers into different groups based on their browsing behavior and purchase history, without needing prior knowledge of customer categories.

In short, unsupervised learning helps uncover hidden structures in data when labels are not available, allowing for tasks like clustering and anomaly detection.

5) Reinforcement Learning
Ans: Reinforcement Learning (RL) is a type of machine learning
where an agent learns how to make decisions by interacting with an
environment and receiving feedback in the form of rewards or
penalties. The goal of reinforcement learning is to find a strategy (or
policy) that maximizes the cumulative reward over time.

Key Points:

1. Agent and Environment: In reinforcement learning, there are two main components:
   ○ Agent: The entity that takes actions in the environment.
   ○ Environment: The external system the agent interacts with.
2. Actions, States, and Rewards:
   ○ State: A representation of the current situation or condition of the environment.
   ○ Action: A decision made by the agent to change the state of the environment.
   ○ Reward: A numerical value given to the agent based on the action it takes. Positive rewards encourage certain actions, while negative rewards (penalties) discourage others.
3. Learning Process:
   ○ The agent explores the environment by taking actions and receiving feedback (rewards or penalties). Over time, it learns which actions lead to higher rewards.
   ○ The agent tries to maximize the cumulative reward, often by balancing exploration (trying new actions) and exploitation (choosing known, rewarding actions).
4. Policy: A strategy or function that maps states to actions. The goal is to find the optimal policy that maximizes the expected cumulative reward.
5. Value Function: A function that estimates the expected reward for an agent starting from a certain state, helping the agent determine the best actions to take. A small sketch of this learning loop follows below.
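
To make the loop concrete, here is a minimal sketch of tabular Q-learning (one classic RL algorithm) on a toy 5-state corridor; the environment, reward, and hyperparameters are all invented for illustration and are not part of these notes.

```python
# Hypothetical sketch: tabular Q-learning on a 5-state corridor.
# The agent earns reward +1 only when it reaches the rightmost state.
import random

n_states, n_actions = 5, 2             # actions: 0 = left, 1 = right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration

for episode in range(500):
    s = 0                              # start at the left end
    while s != n_states - 1:           # episode ends at the goal state
        # Epsilon-greedy policy: balance exploration and exploitation.
        if random.random() < epsilon:
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda i: Q[s][i])
        # Environment step: move left or right; reward only at the goal.
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: nudge Q(s, a) toward r + gamma * max Q(s', .).
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

# Greedy policy after training: should prefer "right" (1) in every state.
print([max(range(n_actions), key=lambda i: Q[s][i]) for s in range(n_states - 1)])
```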

Unit 2

1) Linear Regression Models
Ans: Linear Regression is one of the simplest and most widely
used algorithms in machine learning and statistics. It models the
relationship between a dependent variable (target) and one or more
independent variables (features) by fitting a linear equation to
observed data.

There are two main forms: Simple Linear Regression and Multiple Linear Regression.
Simple Linear Regression:

● Definition: Simple linear regression models the relationship between a single independent variable and a dependent variable by fitting a straight line.
● Equation: y = \beta_0 + \beta_1 x + \epsilon
  Where:
   ○ y is the dependent variable (target).
   ○ x is the independent variable (feature).
   ○ \beta_0 is the intercept.
   ○ \beta_1 is the slope (coefficient of x).
   ○ \epsilon is the error term (residuals).
● Example: Predicting a person's salary based on years of experience.

Multiple Linear Regression:

● Definition: Multiple linear regression models the relationship between multiple independent variables and a dependent variable by fitting a plane (or hyperplane in higher dimensions).
● Equation: y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon
  Where:
   ○ y is the dependent variable (target).
   ○ x_1, x_2, \dots, x_n are multiple independent variables (features).
   ○ \beta_0 is the intercept.
   ○ \beta_1, \beta_2, \dots, \beta_n are the coefficients of the features.
   ○ \epsilon is the error term.
● Example: Predicting a person's salary based on multiple factors like years of experience, education level, and age.

In essence, Simple Linear Regression uses one feature, while Multiple Linear Regression uses two or more features to predict the target variable, as the sketch below illustrates.
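
To make the two forms concrete, here is a minimal sketch using scikit-learn's LinearRegression; the salary and experience numbers are made up for illustration.

```python
# Minimal sketch: simple and multiple linear regression with
# scikit-learn. All numbers below are made up for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple linear regression: one feature (years of experience) -> salary.
X_simple = np.array([[1], [3], [5], [7], [9]])      # years of experience
y = np.array([35000, 50000, 62000, 75000, 90000])   # salary

model = LinearRegression().fit(X_simple, y)
print("intercept (beta_0):", model.intercept_)
print("slope (beta_1):", model.coef_[0])
print("predicted salary for 4 years:", model.predict([[4]])[0])

# Multiple linear regression: several features -> salary.
# Columns: years of experience, education level (coded 1-3), age.
X_multi = np.array([[1, 1, 22], [3, 2, 26], [5, 2, 30],
                    [7, 3, 34], [9, 3, 38]])
multi = LinearRegression().fit(X_multi, y)
print("coefficients (beta_1..beta_n):", multi.coef_)
```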

2) Logistic Regression
Ans: Logistic Regression is a statistical method used for binary
classification problems, where the goal is to predict a categorical
outcome (typically two classes, such as 0 or 1, Yes or No, True or
False). Despite its name, logistic regression is used for classification,
not regression.

Key Points:

1. Goal:
   ○ Predict the probability that a given input belongs to a certain class (usually class 1).
   ○ The output is a probability value between 0 and 1.
2. Sigmoid Function:
   ○ Logistic regression uses the sigmoid function (also called the logistic function) to map any real-valued number into a probability.
   ○ The sigmoid function is given by: \sigma(z) = \frac{1}{1 + e^{-z}}
     Where:
      ■ z is the linear combination of input features (i.e., z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots).
3. Logistic Regression Model:
   ○ The model is based on the equation: P(y=1|x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots)}}
     Where:
      ■ P(y=1|x) is the probability that the target variable y equals 1 for a given input x.
      ■ \beta_0, \beta_1, \dots are the coefficients to be learned.
4. Decision Boundary:
   ○ After obtaining the probability, a threshold (typically 0.5) is applied to classify the input:
      ■ If P(y=1|x) > 0.5, classify as 1 (positive class).
      ■ If P(y=1|x) \leq 0.5, classify as 0 (negative class).
5. Training:
   ○ The coefficients \beta_0, \beta_1, \dots are learned using a method called Maximum Likelihood Estimation (MLE), which aims to maximize the probability of the observed data given the model. A short sketch follows below.
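
As a rough illustration, here is a minimal sketch of the sigmoid function and a scikit-learn LogisticRegression fit on made-up data (age and income in thousands predicting a yes/no purchase).

```python
# Minimal sketch: the sigmoid function, plus a scikit-learn logistic
# regression on made-up data: (age, income in $1000s) -> buys (0/1).
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Map any real-valued z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))   # 0.5 -- the usual decision threshold in z-space

X = np.array([[22, 20], [25, 30], [35, 60], [45, 80], [52, 110]])
y = np.array([0, 0, 1, 1, 1])   # hypothetical labels

clf = LogisticRegression(max_iter=1000).fit(X, y)
p = clf.predict_proba([[30, 50]])[0, 1]   # P(y=1 | x)
print("P(buy) =", p, "-> class", int(p > 0.5))
```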

3) Concept of Classification

Ans: Classification is a type of supervised learning where the goal is to predict the category or class label of an input based on its features. In classification problems, the output variable is categorical, meaning the target variable consists of distinct classes or categories.

Key Concepts:

1. Classes/Labels:
   ○ The output of a classification model is a discrete value (label), which represents the category or class the input belongs to.
   ○ Example: In a spam detection model, the classes could be "Spam" and "Not Spam."
2. Input Features:
   ○ The model learns from the input data, which consists of one or more features (variables).
   ○ Example: For spam detection, the features might include the length of the email, frequency of certain keywords, or the sender's domain.
3. Goal of Classification:
   ○ The objective is to learn a mapping from input features to class labels, such that the model can predict the correct class for new, unseen data.
4. Types of Classification:
   ○ Binary Classification: Involves two possible classes or labels.
      ■ Example: Predicting whether an email is spam or not spam.
   ○ Multiclass Classification: Involves more than two possible classes.
      ■ Example: Classifying animals into categories like "Cat," "Dog," and "Bird."
   ○ Multilabel Classification: Each input can belong to multiple classes at the same time.
      ■ Example: A movie could belong to both the "Action" and "Comedy" genres.
5. Classification Algorithms:
   ○ Several algorithms can be used for classification, such as:
      ■ Logistic Regression
      ■ Decision Trees
      ■ Support Vector Machines (SVM)
      ■ K-Nearest Neighbors (KNN)
      ■ Random Forest
      ■ Naive Bayes
      ■ Neural Networks
6. Performance Metrics:
   ○ To evaluate the performance of a classification model, several metrics can be used (see the sketch after this list):
      ■ Accuracy: The proportion of correct predictions out of all predictions.
      ■ Precision: The proportion of true positive predictions among all positive predictions.
      ■ Recall (Sensitivity): The proportion of true positive predictions among all actual positives.
      ■ F1-Score: The harmonic mean of precision and recall, balancing both metrics.
      ■ Confusion Matrix: A table that describes the performance of a classification model by showing the true positive, true negative, false positive, and false negative values.
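
Assuming scikit-learn is available, a minimal sketch of these metrics on a small set of made-up true and predicted labels:

```python
# Minimal sketch: common classification metrics via scikit-learn,
# computed on a small made-up set of true vs. predicted labels.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual classes (made up)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model's predictions (made up)

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1-score :", f1_score(y_true, y_pred))
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
```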

4) Need and Working of KNN

Ans: K-Nearest Neighbors (KNN) is a simple, instance-based machine learning algorithm used for classification and regression. It makes predictions based on the majority class (for classification) or average value (for regression) of the K nearest data points in the feature space.

Need for KNN:

1. Simplicity: KNN is easy to understand and implement. It requires minimal assumptions about the data, making it useful when you don't know much about the underlying data distribution.
2. Versatility: It can be used for both classification and regression tasks, making it adaptable to different types of problems.
3. Non-parametric: KNN doesn't assume any predefined relationship between the features and the target variable. It relies purely on the data's proximity, making it flexible for complex datasets.
4. No Training Phase: Unlike many other algorithms, KNN doesn't require a model-building phase. It stores the training data and makes predictions during the testing phase.

Working of KNN:

1. Choose the number of neighbors (K):
   ○ First, you need to select the number of neighbors, K, that will be used to classify a new data point. A common choice is 3 or 5, but it can be any positive integer.
   ○ Small K: High variance, may lead to overfitting.
   ○ Large K: High bias, may lead to underfitting.
2. Distance Metric:
   ○ KNN uses a distance metric to find the nearest neighbors. The most common metric is Euclidean distance, but other distance metrics like Manhattan, Minkowski, or cosine similarity can also be used.
   ○ The Euclidean distance between two points p and q in a 2D feature space is calculated as: \text{Distance}(p, q) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}, where (x_1, y_1) and (x_2, y_2) are the coordinates of the points.
3. Find the K Nearest Neighbors:
   ○ For a given test point, calculate the distance from the test point to all points in the training dataset.
   ○ Sort these distances in ascending order to find the K nearest neighbors.
4. Classification (for classification problems):
   ○ The algorithm then checks the classes (labels) of the K nearest neighbors.
   ○ The test point is assigned the most common class among the K neighbors (majority voting).
   ○ Example: If K = 3, and the nearest 3 points belong to two classes (2 points from class 1 and 1 from class 2), the new point will be classified as class 1.
5. Regression (for regression problems):
   ○ For regression, instead of voting, the predicted value is the average (or sometimes the median) of the K neighbors' values.

These steps are illustrated in the sketch below.
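
A minimal from-scratch sketch of these steps (Euclidean distance plus majority voting), with made-up training points:

```python
# Minimal from-scratch KNN classifier sketch: Euclidean distance and
# majority voting. Training points and labels are made up.
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest neighbors."""
    # Distance from x_new to every training point.
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # Majority vote among those neighbors' labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1, 1], [2, 1], [1, 2], [6, 5], [7, 6], [6, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([2, 2])))  # expected: 0
print(knn_predict(X_train, y_train, np.array([6, 5])))  # expected: 1
```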

5) Decision Tree-based Algorithm

Ans: A Decision Tree is a supervised learning algorithm used for both classification and regression tasks. It models the relationship between input features and output labels by splitting the data into subsets based on the most significant feature, making predictions based on the structure of the tree.
Basic Concept:

A decision tree works by recursively splitting the dataset into subsets based on a feature that best separates the data at each step. The final result is a tree-like structure where each leaf node represents a class label (for classification) or a predicted value (for regression), and each internal node represents a feature test or decision.

Structure of a Decision Tree:

1. Root Node: The topmost node that represents the entire dataset.
2. Internal Nodes: Represent decision points where the data is split based on feature values.
3. Leaf Nodes: Represent the final output of the decision tree, i.e., the predicted class (for classification) or value (for regression).
4. Branches: Represent the outcome of a decision rule that splits the data.

Working of Decision Tree:

1. Splitting: The tree starts at the root and splits the dataset into two or more homogeneous sets based on the best feature. The best feature is selected by using various splitting criteria like Gini Impurity, Information Gain (Entropy), or Variance Reduction.
2. Stopping Criteria: The tree continues to split until one or more stopping criteria are met:
   ○ The dataset is completely classified.
   ○ The tree reaches a pre-defined maximum depth.
   ○ There are no further features to split on.
   ○ The number of data points in a node is smaller than a threshold.
3. Prediction: Once the tree is built, it can make predictions by following the path from the root node to the leaf node, based on the input features.

Key Concepts in Decision Trees:

1. Entropy and Information Gain (for Classification):
   ○ Entropy: Measures the disorder or impurity in the dataset.
   ○ Information Gain: Measures how much uncertainty is reduced after a split based on a particular feature.
   ○ The feature that maximizes the Information Gain (i.e., minimizes entropy) is chosen for splitting.
   ○ The formula for Entropy is: \text{Entropy}(S) = -\sum_{i=1}^{k} p_i \log_2 p_i, where p_i is the probability of class i in the dataset.
   ○ Information Gain is calculated as: \text{Information Gain}(S, A) = \text{Entropy}(S) - \sum_{v \in A} \frac{|S_v|}{|S|} \cdot \text{Entropy}(S_v), where A is the attribute (feature), and S_v is the subset of data for which the feature A has value v.
2. Gini Impurity (Alternative to Entropy):
   ○ Gini Impurity is another criterion for choosing the best feature to split on. It measures how often a randomly chosen element would be incorrectly classified.
   ○ The formula for Gini Impurity is: \text{Gini}(S) = 1 - \sum_{i=1}^{k} p_i^2, where p_i is the probability of class i.
3. Variance Reduction (for Regression):
   ○ For regression tasks, instead of entropy or Gini impurity, the variance reduction criterion is used. The algorithm chooses the feature that results in the greatest reduction in variance in the target variable after a split.

A short sketch of the entropy and Gini calculations follows below.
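
A minimal sketch of the two split criteria above, applied to a hypothetical node's class labels:

```python
# Minimal sketch: entropy and Gini impurity of a hypothetical node
# containing 3 "yes" and 2 "no" labels.
import numpy as np

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def gini(labels):
    """Gini(S) = 1 - sum_i p_i^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())

labels = ["yes", "yes", "yes", "no", "no"]
print(entropy(labels))  # ~0.971 bits
print(gini(labels))     # 0.48
```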

Steps to Build a Decision Tree:

1. Select the Best Feature: Choose the feature that provides the best split, based on the criterion (e.g., Information Gain, Gini Impurity, or Variance Reduction).
2. Split the Dataset: Divide the dataset into subsets based on the chosen feature.
3. Repeat the Process: Recursively apply the splitting process to each subset, creating child nodes and branches.
4. Stopping Conditions: If a stopping condition (like a pure node or maximum depth) is met, stop the recursion.
5. Assign Output: Once the tree is built, assign the class label or regression value to the leaf nodes based on the majority class or mean value of the target variable in that leaf.

6) SVM and its working

Ans: Support Vector Machine (SVM) is a powerful supervised learning algorithm primarily used for classification tasks, although it can also be used for regression. The main goal of an SVM is to find the optimal hyperplane that best separates different classes in the feature space.

SVM is known for its effectiveness in high-dimensional spaces and is widely used in tasks such as image classification, text classification, and bioinformatics.

Key Concepts of SVM:

1. Hyperplane:
   ○ A hyperplane is a decision boundary that separates the feature space into two halves.
   ○ In 2D, the hyperplane is simply a line.
   ○ In 3D, the hyperplane is a plane, and in higher dimensions, it's a general hyperplane.
2. Support Vectors:
   ○ Support Vectors are the data points that are closest to the hyperplane. These are the most important data points because the position of the hyperplane is determined by them.
   ○ They "support" the decision boundary (hence the name) and help define the margin.
3. Margin:
   ○ The margin is the distance between the hyperplane and the nearest support vectors.
   ○ SVM aims to maximize this margin, which helps in creating a more generalizable model.
   ○ A larger margin generally means better classification performance and fewer errors on unseen data.
4. Linear and Non-linear SVM:
   ○ Linear SVM is used when the data is linearly separable, meaning a straight line (or hyperplane in higher dimensions) can perfectly separate the classes.
   ○ Non-linear SVM is used when the data is not linearly separable. In such cases, SVM uses a kernel trick to map the data into a higher-dimensional space where it becomes linearly separable.

Working of SVM:

1. Linear SVM (For linearly separable data):

● Goal: Find the optimal hyperplane that separates the classes with the maximum margin.
● Steps:
   1. Representation of Data: Suppose you have a dataset with two classes (e.g., Class 1 and Class 2). Each data point has a feature vector and a corresponding class label.
   2. Hyperplane Definition: The hyperplane in a 2D space is represented as a line: w \cdot x + b = 0
      Where:
      ■ w is the weight vector (normal vector to the hyperplane).
      ■ x is the feature vector.
      ■ b is the bias term (offset of the hyperplane).
   3. Maximizing the Margin: The SVM algorithm aims to maximize the margin between the classes. This is achieved by minimizing the cost function \frac{1}{2} \|w\|^2, subject to the constraints y_i (w \cdot x_i + b) \geq 1, where y_i is the class label of the i-th data point, and x_i is the i-th data point.
   4. Solution: The solution is found using optimization techniques, and the resulting hyperplane is the one that maximizes the margin between the two classes.

2. Non-linear SVM (For non-linearly separable data):

● When the data cannot be separated by a straight line (or hyperplane), SVM uses a kernel trick to transform the data into a higher-dimensional space where it becomes linearly separable.
● Kernel Trick:
   ○ The kernel function computes the inner product of the data points in a higher-dimensional space without explicitly computing the transformation.
   ○ Commonly used kernels include:
      ■ Linear Kernel: Used when the data is linearly separable.
      ■ Polynomial Kernel: Used when the data has polynomial relationships.
      ■ Radial Basis Function (RBF) Kernel (Gaussian Kernel): Often used for non-linear decision boundaries.
      ■ Sigmoid Kernel: Similar to a neural network activation function.
   ○ By using a kernel, the algorithm implicitly maps the input data into a higher-dimensional feature space, where it looks for the optimal hyperplane in that space.

Steps for Non-Linear SVM:

1. Kernel Function: Apply a kernel function (e.g., RBF, polynomial) to transform the data into a higher-dimensional space.
2. Optimization: Maximize the margin in the transformed space while minimizing misclassification.
3. Decision Boundary: After applying the kernel trick, the SVM finds the hyperplane that best separates the data in the higher-dimensional space.
4. Class Prediction: Use the resulting hyperplane to classify new data points, as in the sketch below.
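
A minimal sketch using scikit-learn's SVC on a tiny made-up 2D dataset, showing both a linear kernel and an RBF kernel:

```python
# Minimal sketch: linear- and RBF-kernel SVMs with scikit-learn on a
# tiny made-up 2D dataset.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [1, 2], [6, 6], [7, 7], [6, 7]])
y = np.array([0, 0, 0, 1, 1, 1])

linear_svm = SVC(kernel="linear").fit(X, y)
print("support vectors:\n", linear_svm.support_vectors_)
print("linear prediction for [3, 3]:", linear_svm.predict([[3, 3]])[0])

# For data that is not linearly separable, swap in a kernel, e.g. RBF:
rbf_svm = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X, y)
print("RBF prediction for [3, 3]:", rbf_svm.predict([[3, 3]])[0])
```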

Unit 3

1) K-means Clustering and its implementation

Ans: K-means Clustering is an unsupervised machine learning algorithm used to partition a set of data points into K distinct groups, or clusters, based on their similarity. It is one of the most popular clustering algorithms due to its simplicity and efficiency.

Key Concepts of K-means Clustering:

1. Clusters:
   ○ A cluster is a collection of data points that are more similar to each other than to points in other clusters. Data points within the same cluster are close to one another in terms of a chosen distance metric (typically Euclidean distance).
2. Centroid:
   ○ The centroid of a cluster is the central point that represents the "average" of all data points within the cluster. It is the point that minimizes the sum of squared distances from all points in the cluster to it.
3. K (Number of Clusters):
   ○ K is the number of clusters that the algorithm will divide the dataset into. This value must be predefined by the user before applying the algorithm. The goal of K-means is to partition the data into K clusters, where each data point belongs to the nearest centroid.

How K-means Clustering Works:

The K-means algorithm follows an iterative process to find the best possible centroids and cluster assignments:

1. Initialization:
   ○ Choose K initial centroids randomly from the dataset. These centroids act as the starting points for the clusters.
2. Assignment Step:
   ○ For each data point, calculate the distance from the data point to each of the K centroids and assign the point to the nearest centroid. The most common distance metric used is Euclidean distance: \text{Distance}(x, c) = \sqrt{\sum_{i=1}^{n} (x_i - c_i)^2}, where x is the data point and c is the centroid.
3. Update Step:
   ○ After assigning all data points to the nearest centroid, update the centroids by computing the mean of all data points in each cluster. The new centroid is the average of all the data points assigned to that cluster: c_k = \frac{1}{|S_k|} \sum_{x_i \in S_k} x_i, where c_k is the centroid of the k-th cluster, and S_k is the set of data points in the k-th cluster.
4. Repeat:
   ○ Repeat the Assignment Step and Update Step until the centroids no longer change (i.e., convergence) or a predefined number of iterations is reached. At this point, the algorithm has converged and the final clusters are formed. A from-scratch implementation sketch follows below.
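
A minimal from-scratch implementation sketch of the four steps above; the data and K are chosen for illustration (production code would also handle empty clusters and use smarter initialization such as k-means++):

```python
# Minimal from-scratch K-means sketch following the steps above.
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick k random data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # 2. Assignment: each point goes to its nearest centroid
        #    (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: each centroid becomes the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0)
                                  for j in range(k)])
        # 4. Repeat until the centroids no longer change (convergence).
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
              [8.0, 8.0], [8.5, 8.2], [7.8, 8.4]])
centroids, labels = kmeans(X, k=2)
print(centroids)
print(labels)   # e.g. [0 0 0 1 1 1]
```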

2) Association Rule Mining

Ans: Association Rule Mining is a technique in data mining used to discover interesting relationships or patterns among items in large datasets. It is widely used in market basket analysis to find associations between products frequently bought together.

Key Concepts:

1. Itemsets: A collection of one or more items. For example, {bread, butter} is an itemset.
2. Association Rule: A rule of the form A → B, meaning if itemset A occurs, itemset B is likely to occur as well. For example, "If a customer buys bread, they are likely to also buy butter."
3. Support: The frequency of occurrence of an itemset in the dataset.
   \text{Support}(A) = \frac{\text{Number of transactions containing } A}{\text{Total number of transactions}}
4. Confidence: The likelihood that itemset B occurs given that A has occurred.
   \text{Confidence}(A \rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)}
5. Lift: The measure of how much more likely B is bought when A is bought compared to when they are independent.
   \text{Lift}(A \rightarrow B) = \frac{\text{Confidence}(A \rightarrow B)}{\text{Support}(B)}

These three measures are computed in the sketch below.
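
A minimal sketch computing support, confidence, and lift for the rule {bread} → {butter} over a made-up transaction list:

```python
# Minimal sketch: support, confidence, and lift for {bread} -> {butter}
# over a tiny made-up transaction list.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

A, B = {"bread"}, {"butter"}
conf = support(A | B) / support(A)   # Confidence(A -> B)
lift = conf / support(B)             # Lift(A -> B)
print("support(A∪B):", support(A | B))   # 3/5 = 0.6
print("confidence  :", conf)             # 0.6 / 0.8 = 0.75
print("lift        :", lift)             # 0.75 / 0.8 = 0.9375
```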

Steps in Association Rule Mining:

1. Frequent Itemset Generation: Identify itemsets that appear frequently in the dataset, i.e., those that meet the minimum support threshold.
2. Rule Generation: Generate rules from the frequent itemsets, and calculate their confidence. Strong rules are those that have high confidence and lift.

Popular Algorithms:

● Apriori Algorithm: It generates frequent itemsets using a bottom-up approach, and prunes out infrequent itemsets early.
● FP-Growth: An efficient algorithm that builds a Frequent Pattern Tree (FP-tree) to find frequent itemsets without generating candidate itemsets.

Applications:

1. Market Basket Analysis: Identifying products often bought together, helping in product placement and promotions.
2. Recommendation Systems: Suggesting products based on what customers have bought in the past.
3. Cross-Selling: Offering complementary products based on purchasing patterns.

Advantages:

1. Discover Hidden Patterns: Finds valuable relationships in large datasets.
2. Business Insights: Helps businesses optimize marketing, inventory, and sales strategies.

Disadvantages:

1. Computational Complexity: The process can be slow for large datasets.
2. Threshold Sensitivity: Choosing the right support and confidence thresholds can be difficult.
Conclusion:

Association rule mining is a powerful tool for discovering relationships in data. It's widely used for tasks like market basket analysis, product recommendations, and cross-selling. However, careful tuning of parameters is necessary for optimal performance.
