
Module 3

Supervised Machine
Learning algorithms
Naïve Bayes Classifier Algorithm
• The Naïve Bayes algorithm is a supervised learning algorithm based on
Bayes' theorem and used for solving classification problems.
• It is mainly used in text classification, which involves high-dimensional
training datasets.
• The Naïve Bayes classifier is one of the simplest and most effective
classification algorithms; it helps build fast machine learning models that
can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the
probability of an object.
• Some popular applications of the Naïve Bayes algorithm are spam filtering,
sentiment analysis, and classifying articles.
Why is it called Naïve Bayes?
• The name Naïve Bayes combines two words, Naïve and Bayes, which can
be described as:
• Naïve: It is called naïve because it assumes that the occurrence of a
certain feature is independent of the occurrence of the other features.
For example, if a fruit is identified on the basis of colour, shape, and
taste, then a red, spherical, and sweet fruit is recognized as an apple.
Hence each feature individually contributes to identifying it as an apple,
without depending on the others.
• Bayes: It is called Bayes because it depends on the principle of
Bayes' theorem.
Bayes' Theorem:
• Bayes' theorem, also known as Bayes' rule or Bayes' law, is used to
determine the probability of a hypothesis with prior knowledge. It depends
on conditional probability.
• The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) · P(A) / P(B)

• P(A|B) is the Posterior probability: the probability of hypothesis A given the
observed event B.
• P(B|A) is the Likelihood probability: the probability of the evidence B given
that hypothesis A is true.
• P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.
• P(B) is the Marginal probability: the probability of the evidence.
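The formula above can be worked through numerically. The sketch below plugs illustrative numbers into Bayes' theorem for a spam-filtering scenario; all the probabilities are assumed for illustration and are not from the slides.

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Assumed illustrative numbers: 20% of mail is spam (prior), the word
# "offer" appears in 60% of spam and in 5% of non-spam mail.
p_spam = 0.20                      # prior P(A)
p_offer_given_spam = 0.60          # likelihood P(B|A)
p_offer_given_ham = 0.05

# marginal P(B) via the law of total probability
p_offer = p_offer_given_spam * p_spam + p_offer_given_ham * (1 - p_spam)

# posterior P(A|B): probability the mail is spam given it contains "offer"
p_spam_given_offer = p_offer_given_spam * p_spam / p_offer
print(round(p_spam_given_offer, 3))  # 0.75
```

Note how the marginal P(B) is itself assembled from the likelihoods and the prior, which is why slides often omit it from the term list.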
Algorithm of Naïve Bayes'
Classifier:
• Compute the prior probability for each target class.
• Compute the frequency matrix and the likelihood probability for each
feature.
• Use Bayes' theorem to calculate the posterior probability of each
hypothesis.
• Use the maximum a posteriori (MAP) rule to classify the test object into
the hypothesis with the highest posterior probability.
• Advantages of Naïve Bayes Classifier:
• Naïve Bayes is one of the fastest and easiest ML algorithms for
predicting the class of a dataset.
• It can be used for binary as well as multi-class classification.
• It performs well in multi-class prediction compared to other
algorithms.
• It is a popular choice for text classification problems.
• Disadvantages of Naïve Bayes Classifier:
• Naïve Bayes assumes that all features are independent or unrelated,
so it cannot learn relationships between features.
Applications of Naïve Bayes
Classifier
• It is used for Credit Scoring.
• It is used in medical data classification.
• It can be used for real-time predictions because the Naïve Bayes
classifier is an eager learner.
• It is used in text classification, such as spam filtering and sentiment
analysis.
Types of Naïve Bayes Model
• Gaussian: The Gaussian model assumes that features follow a normal distribution.
This means that if predictors take continuous values instead of discrete ones, the
model assumes these values are sampled from a Gaussian distribution.
• Multinomial: The Multinomial Naïve Bayes classifier is used when the data is
multinomially distributed. It is primarily used for document classification problems,
i.e. predicting which category a particular document belongs to, such as sports,
politics, or education.
The classifier uses word frequencies as the predictors.
• Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but
the predictor variables are independent Boolean variables, such as whether a
particular word is present in a document or not. This model is also popular for
document classification tasks.
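The Gaussian model above can be sketched in a few lines: estimate a per-class prior plus a per-feature mean and variance, then score a new point with the product of prior and normal densities. This is a minimal from-scratch illustration, not a production implementation; the toy two-class dataset is assumed.

```python
import math

def gaussian_pdf(x, mean, var):
    """Normal density used as the per-feature likelihood."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit(X, y):
    """Estimate per-class priors and per-feature means/variances."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        vars_ = [sum((v - m) ** 2 for v in col) / len(rows)
                 for col, m in zip(zip(*rows), means)]
        model[c] = (len(rows) / len(y), means, vars_)
    return model

def predict(model, x):
    """Pick the class with the highest posterior score (prior * likelihoods)."""
    best, best_p = None, -1.0
    for c, (prior, means, vars_) in model.items():
        p = prior
        for xi, m, v in zip(x, means, vars_):
            p *= gaussian_pdf(xi, m, v)   # naive independence assumption
        if p > best_p:
            best, best_p = c, p
    return best

# assumed toy data: two well-separated classes with two continuous features
X = [[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [8.5, 8.7]]
y = [0, 0, 1, 1]
model = fit(X, y)
print(predict(model, [1.1, 2.1]))  # 0
print(predict(model, [8.2, 9.1]))  # 1
```

In practice a log-probability formulation is used to avoid underflow when many features are multiplied together.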
Example
• For the training dataset shown on the slide, calculate the class of:
• X = (age = young, income = medium, student = yes, credit_rating = fair)
KNN(K-Nearest Neighbor)
• The K-NN algorithm assumes similarity between the new case/data and the
available cases, and puts the new case into the category most similar to
the available categories.
• The K-NN algorithm stores all the available data and classifies a new data
point based on similarity. This means that when new data appears, it can
be easily classified into a well-suited category using the K-NN algorithm.
• It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the dataset and performs an
action on it at the time of classification.
• At the training phase, the KNN algorithm just stores the dataset; when it
gets new data, it classifies that data into the category most similar to the
new data.
KNN(K-Nearest Neighbor)
• The working of K-NN can be explained on the basis of the below
algorithm:
• Step-1: Select the number K of neighbors.
• Step-2: Calculate the Euclidean distance between the new data point and
each training data point.
• Step-3: Take the K nearest neighbors as per the calculated Euclidean
distances.
• Step-4: Among these K neighbors, count the number of data points
in each category.
• Step-5: Assign the new data point to the category for which the
number of neighbors is maximum.
• Step-6: Our model is ready.
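The steps above map almost line-for-line onto code. This is a minimal sketch; the toy two-class training data is assumed for illustration.

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=3):
    # Step 2: Euclidean distance from the new point to every training point
    dists = sorted(
        (math.dist(x, new_point), label) for x, label in train
    )
    # Step 3: take the K nearest neighbours
    nearest = [label for _, label in dists[:k]]
    # Steps 4-5: count categories among the K neighbours and majority-vote
    return Counter(nearest).most_common(1)[0][0]

# assumed toy data: class "A" clustered near (1,1), class "B" near (6,6)
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]
print(knn_classify(train, (2, 2), k=3))  # A
print(knn_classify(train, (6, 5), k=3))  # B
```

Sorting every training point is exactly why K-NN's prediction cost grows with the size of the training set, as noted in the disadvantages below.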
KNN(K-Nearest Neighbor)
• Advantages of KNN Algorithm:
• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective if the training data is large.
• Disadvantages of KNN Algorithm:
• The value of K always needs to be determined, which may be
complex at times.
• The computation cost is high because of calculating the
distance between the new data point and all the training
samples.
KNN(K-Nearest Neighbor)
• Example
Decision tree
Introduction
• Decision Tree is a Supervised learning technique that can be used for
both classification and Regression problems, but mostly it is preferred for
solving Classification problems. It is a tree-structured classifier,
where internal nodes represent the features of a dataset, branches
represent the decision rules and each leaf node represents the outcome.
• In a Decision tree, there are two types of nodes: the Decision
Node and the Leaf Node. Decision nodes are used to make decisions and
have multiple branches, whereas leaf nodes are the outputs of those
decisions and do not contain any further branches.
• The decisions or tests are performed on the basis of the features of the
given dataset.
• It is a graphical representation for getting all the possible solutions
to a problem/decision based on given conditions.
• It is called a decision tree because, similar to a tree, it starts with the
root node, which expands on further branches and constructs a tree-
like structure.
• In order to build a tree, we use the CART algorithm, which stands
for Classification and Regression Tree algorithm.
• A decision tree simply asks a question and, based on the answer
(Yes/No), further splits the tree into subtrees.
• Why use Decision Trees?
• Decision trees usually mimic human thinking ability while making a
decision, so they are easy to understand.
• The logic behind a decision tree can be easily understood because it
shows a tree-like structure.
Decision Tree Terminologies
• Root Node: Root node is from where the decision tree starts. It
represents the entire dataset, which further gets divided into two or
more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes, and the tree cannot
be segregated further after reaching a leaf node.
• Decision node: The internal nodes, where the dataset is split based on a
feature.
Working of algorithm
• Step-1: Begin the tree with the root node, say S, which contains the complete
dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection Measure
(ASM).
• Step-3: Divide S into subsets that contain the possible values of the best
attribute.
• Step-4: Generate the decision tree node, which contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset
created in Step-3. Continue this process until a stage is reached where the nodes
cannot be classified further; call the final nodes leaf nodes.
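The recursive loop in Steps 1–5 can be sketched compactly. This sketch uses weighted Gini impurity as its ASM, and the tiny salary/distance dataset is assumed for illustration (it is not the slide's dataset).

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum(p_j^2)."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_attribute(rows, labels, attrs):
    """Step 2: pick the attribute whose split gives the lowest weighted Gini."""
    def weighted_gini(a):
        total = 0.0
        for v in set(r[a] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[a] == v]
            total += len(subset) / len(labels) * gini(subset)
        return total
    return min(attrs, key=weighted_gini)

def build_tree(rows, labels, attrs):
    """Steps 3-5: split on the best attribute and recurse until pure."""
    if len(set(labels)) == 1 or not attrs:        # leaf node
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, labels, attrs)
    node = {"attr": a, "branches": {}}
    for v in set(r[a] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[a] == v]
        node["branches"][v] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [x for x in attrs if x != a])
    return node

# assumed toy data: accept a job offer based on salary and distance
rows = [{"salary": "high", "distance": "near"},
        {"salary": "high", "distance": "far"},
        {"salary": "low",  "distance": "near"},
        {"salary": "low",  "distance": "far"}]
labels = ["yes", "yes", "no", "no"]
tree = build_tree(rows, labels, ["salary", "distance"])
print(tree["attr"])  # salary — it splits the labels perfectly
```

Because "salary" separates the classes completely, it becomes the root and each of its branches is immediately a pure leaf.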
Attribute Selection Measures

• While implementing a decision tree, the main issue that arises is how to
select the best attribute for the root node and for the sub-nodes. To
solve such problems there is a technique called the Attribute Selection
Measure, or ASM. With this measurement, we can easily select the best
attribute for the nodes of the tree. There are two popular techniques
for ASM, which are:
• Information Gain
• Gini Index
1. Information Gain:
• It calculates how much information a feature provides about a class.
• According to the value of information gain, we split the node and build the
decision tree.
• A decision tree algorithm always tries to maximize the value of information gain,
and the node/attribute having the highest information gain is split first. It can be
calculated using the below formula:

Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]

• Entropy: Entropy is a metric to measure the impurity in a given attribute. It
specifies the randomness in the data. Entropy can be calculated as:

Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)
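The entropy and information-gain formulas above translate directly into code. A minimal sketch, with toy labels assumed for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum over classes of P(class) * log2 P(class)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Entropy(S) minus the weighted-average entropy after splitting."""
    n = len(labels)
    remainder = sum(
        (sum(1 for f in feature_values if f == v) / n)
        * entropy([l for l, f in zip(labels, feature_values) if f == v])
        for v in set(feature_values)
    )
    return entropy(labels) - remainder

labels  = ["yes", "yes", "no", "no"]
feature = ["a", "a", "b", "b"]            # splits the classes perfectly
print(entropy(labels))                    # 1.0 — maximally impure 50/50 set
print(information_gain(labels, feature))  # 1.0 — the split removes all impurity
```

A feature that splits the classes perfectly yields a gain equal to the parent's entropy, which is why such a feature is always chosen first.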
Example
Suppose there is a candidate who has a job offer and wants to decide
whether he should accept the offer or not. To solve this problem,
the decision tree starts with the root node (the Salary attribute, selected
by ASM). The root node splits further into the next decision node (distance
from the office) and one leaf node based on the corresponding labels. The
next decision node further splits into one decision node (cab facility)
and one leaf node. Finally, the decision node splits into two leaf
nodes (Accepted offer and Declined offer).
2. Gini Index:
• The Gini index is a measure of impurity or purity used while creating a
decision tree in the CART (Classification and Regression Tree)
algorithm.
• An attribute with a low Gini index should be preferred over one with a
high Gini index.
• It only creates binary splits, and the CART algorithm uses the Gini
index to create binary splits.
• The Gini index can be calculated using the below formula:
• Gini Index = 1 − Σj Pj²
Steps to Calculate Gini for a split
• Calculate Gini for each sub-node, using the formula: the sum of the
squares of the probabilities of success and failure (p² + q²).
• Calculate Gini for the split using the weighted Gini score of each node
of that split.
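The two steps above can be sketched directly; note they use the p² + q² score (node purity) rather than the 1 − Σ Pj² impurity form. The toy split below is assumed for illustration.

```python
from collections import Counter

def gini_score(labels):
    """Step 1, per node: p^2 + q^2 (sum of squared class probabilities)."""
    n = len(labels)
    return sum((c / n) ** 2 for c in Counter(labels).values())

def gini_for_split(groups):
    """Step 2: weighted Gini score across the nodes of the split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini_score(g) for g in groups)

# assumed candidate split: 4 samples go left, 2 go right
left  = ["yes", "yes", "yes", "no"]
right = ["no", "no"]
print(round(gini_for_split([left, right]), 4))  # 0.75
```

The left node scores (3/4)² + (1/4)² = 0.625 and the pure right node scores 1.0; weighting by size gives 4/6 · 0.625 + 2/6 · 1.0 = 0.75.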
Advantages and disadvantages
• Advantages of the Decision Tree
• It is simple to understand, as it follows the same process which a human follows
while making any decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think about all the possible outcomes for a problem.
• There is less requirement for data cleaning compared to other algorithms.
• Disadvantages of the Decision Tree
• A decision tree may contain lots of layers, which makes it complex.
• It may have an overfitting issue, which can be resolved using the Random Forest
algorithm.
• For more class labels, the computational complexity of the decision tree may
increase.
Dataset
Linear regression
• Linear regression is one of the easiest and most popular Machine
Learning algorithms.
• It is a statistical method that is used for predictive analysis.
• Linear regression makes predictions for continuous/real or numeric
variables such as sales, salary, age, product price, etc.
• The linear regression algorithm shows a linear relationship between a
dependent (y) variable and one or more independent (x) variables, hence
it is called linear regression.
Linear regression
• Since linear regression shows a linear relationship, it finds how the
value of the dependent variable changes according to the value of the
independent variable.
• The linear regression model provides a sloped straight line
representing the relationship between the variables.
Linear regression

The values of the x and y variables are the training dataset used for the
linear regression model representation.
Types of Linear Regression
• Simple Linear Regression:
• If a single independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression
algorithm is called Simple Linear Regression.
• Multiple Linear regression:
• If more than one independent variable is used to predict the value of
a numerical dependent variable, then such a Linear Regression
algorithm is called Multiple Linear Regression.
Linear Regression Line
• A straight line showing the relationship between the dependent and
independent variables is called a regression line.
• Positive Linear Relationship:
• If the dependent variable increases on the Y-axis as the independent
variable increases on the X-axis, then such a relationship is termed a
positive linear relationship.
Linear Regression Line
• Negative Linear Relationship:
• If the dependent variable decreases on the Y-axis as the independent
variable increases on the X-axis, then such a relationship is called a
negative linear relationship.
Simple Linear Regression Model
• The Simple Linear Regression model can be represented using the below equation:

y = a0 + a1·x + ε

• Where,
• a0 = the intercept of the regression line (obtained by putting x = 0), also
called the Bias in ML.
• a1 = the slope of the regression line, which tells whether the line is
increasing or decreasing.
• ε = the error term (for a good model it will be negligible).
Formulas
• Slope of line a1:
• a1 (slope) = Σ((xi − mean(x)) · (yi − mean(y))) / Σ((xi − mean(x))²)
• a0 = mean(y) − a1 · mean(x)
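The slope and intercept formulas can be checked on data where the answer is known in advance. A minimal sketch, with a tiny dataset assumed so that the line y = 1 + 2x is recovered exactly:

```python
def fit_simple_lr(x, y):
    """Least-squares slope a1 and intercept a0, per the formulas above."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    a1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    a0 = my - a1 * mx
    return a0, a1

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]          # exactly y = 1 + 2x, assumed for illustration
a0, a1 = fit_simple_lr(x, y)
print(a0, a1)  # 1.0 2.0
```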
Example
Year    GDP    4-wheeler passenger vehicle sales (in lakhs)
2011    6.2    26.3
2012    6.5    26.6
2013    5.4    25
2014    6.5    26
2015    7.1    27.9
2016    7.9    30.4
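The same formulas can be applied to the table above (values copied from the table; the fitted coefficients below are computed here, not stated on the slides):

```python
gdp   = [6.2, 6.5, 5.4, 6.5, 7.1, 7.9]       # x: GDP
sales = [26.3, 26.6, 25, 26, 27.9, 30.4]     # y: sales in lakhs

mx, my = sum(gdp) / len(gdp), sum(sales) / len(sales)
a1 = (sum((x - mx) * (y - my) for x, y in zip(gdp, sales))
      / sum((x - mx) ** 2 for x in gdp))     # slope
a0 = my - a1 * mx                            # intercept
print(round(a1, 3), round(a0, 3))            # 2.16 12.777

# hypothetical GDP value, assumed only to show how the line is used
predicted_sales = a0 + a1 * 8.0
```

So a one-point rise in GDP corresponds to roughly 2.16 lakh additional vehicle sales under this fit.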
Multivariate/multiple linear regression
• When the response variable is affected by more than one predictor variable,
the Multiple Linear Regression algorithm is used.
• Multiple Linear Regression is one of the important regression algorithms;
it models the linear relationship between a single dependent continuous
variable and more than one independent variable.
• For MLR, the dependent or target variable (Y) must be continuous/real,
but the predictor or independent variables may be of continuous or
categorical form.
• Each feature variable must model a linear relationship with the
dependent variable.
• MLR tries to fit a regression line through a multidimensional space of
data points.
Multivariate/multiple linear regression
• In Multiple Linear Regression, the target variable (Y) is a linear
combination of multiple predictor variables x1, x2, x3, ..., xn. Since it is
an enhancement of Simple Linear Regression, the same form applies to the
multiple linear regression equation, which becomes:
Y = a0 + a1X1 + a2X2 + a3X3 + a4X4 + … + anXn

• Where,
• Y = the output/response variable
• a0, a1, a2, a3, ..., an = the coefficients of the model
• x1, x2, x3, x4, ... = the various independent/feature variables
Multivariate/multiple linear regression
b1 = [(Σx2²)(Σx1y) − (Σx1x2)(Σx2y)] / [(Σx1²)(Σx2²) − (Σx1x2)²]

b2 = [(Σx1²)(Σx2y) − (Σx1x2)(Σx1y)] / [(Σx1²)(Σx2²) − (Σx1x2)²]

b0 = mean(Y) − b1 · mean(X1) − b2 · mean(X2)
Multivariate/multiple linear regression
• Σx1² = ΣX1² − (ΣX1)² / n
• Σx2² = ΣX2² − (ΣX2)² / n
• Σx1y = ΣX1Y − (ΣX1 · ΣY) / n
• Σx2y = ΣX2Y − (ΣX2 · ΣY) / n
• Σx1x2 = ΣX1X2 − (ΣX1 · ΣX2) / n
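The deviation sums above feed directly into the b1/b2 formulas. A minimal helper-function sketch, with variable names following the slide's notation and a tiny dataset assumed so the known plane Y = 1 + 2·X1 + 3·X2 is recovered:

```python
def fit_two_predictor_lr(X1, X2, Y):
    """Two-predictor least squares via the deviation-sum formulas above."""
    n = len(Y)
    Sx1x1 = sum(v * v for v in X1) - sum(X1) ** 2 / n              # Σx1²
    Sx2x2 = sum(v * v for v in X2) - sum(X2) ** 2 / n              # Σx2²
    Sx1y  = sum(a * b for a, b in zip(X1, Y)) - sum(X1) * sum(Y) / n
    Sx2y  = sum(a * b for a, b in zip(X2, Y)) - sum(X2) * sum(Y) / n
    Sx1x2 = sum(a * b for a, b in zip(X1, X2)) - sum(X1) * sum(X2) / n
    den = Sx1x1 * Sx2x2 - Sx1x2 ** 2
    b1 = (Sx2x2 * Sx1y - Sx1x2 * Sx2y) / den
    b2 = (Sx1x1 * Sx2y - Sx1x2 * Sx1y) / den
    b0 = sum(Y) / n - b1 * sum(X1) / n - b2 * sum(X2) / n
    return b0, b1, b2

# assumed toy data generated from Y = 1 + 2*X1 + 3*X2
b0, b1, b2 = fit_two_predictor_lr([0, 1, 2, 1], [0, 0, 1, 2], [1, 3, 8, 9])
print(b0, b1, b2)  # 1.0 2.0 3.0
```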
Example
X1         X2    Y
1          2     3
2          3     4
3          1     6
4          5     8
5          4     10
Sum = 15   15    31
Mean = 3   3     6.2
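Working the example table above through the formulas step by step; the coefficient values in the comments are computed here, not stated on the slides:

```python
X1 = [1, 2, 3, 4, 5]
X2 = [2, 3, 1, 5, 4]
Y  = [3, 4, 6, 8, 10]
n  = len(Y)

# deviation sums, per the formulas above
Sx1x1 = sum(v * v for v in X1) - sum(X1) ** 2 / n                   # 10.0
Sx2x2 = sum(v * v for v in X2) - sum(X2) ** 2 / n                   # 10.0
Sx1y  = sum(a * b for a, b in zip(X1, Y)) - sum(X1) * sum(Y) / n    # 18.0
Sx2y  = sum(a * b for a, b in zip(X2, Y)) - sum(X2) * sum(Y) / n    # 11.0
Sx1x2 = sum(a * b for a, b in zip(X1, X2)) - sum(X1) * sum(X2) / n  # 6.0

den = Sx1x1 * Sx2x2 - Sx1x2 ** 2                                    # 64.0
b1 = (Sx2x2 * Sx1y - Sx1x2 * Sx2y) / den                            # 1.78125
b2 = (Sx1x1 * Sx2y - Sx1x2 * Sx1y) / den                            # 0.03125
b0 = sum(Y) / n - b1 * sum(X1) / n - b2 * sum(X2) / n               # ≈ 0.7625
print(b0, b1, b2)
```

So the fitted model is approximately Y = 0.7625 + 1.78125·X1 + 0.03125·X2: almost all of the variation in Y is explained by X1.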
