Naive Bayes Classifier, Types of Naive Bayes Classifier, Advantages, Disadvantages, Python Implementation, Support Vector Machine, SVM, Types of SVM, Working of SVM, Advantages, Disadvantages
Machine Learning
Sanjivani Rural Education Society's
Sanjivani College of Engineering, Kopargaon-423603
(An Autonomous Institute Affiliated to Savitribai Phule Pune University, Pune)
NAAC ‘A’ Grade Accredited
Department of Information Technology
NBA Accredited UG Programme
Ms. K. D. Patil
Assistant Professor
Contents - Classification
• Sigmoid function, Classification Algorithm in Machine Learning,
Decision Trees, Ensemble Techniques: Bagging and Boosting, AdaBoost
and Gradient Boost, Random Forest, Naïve Bayes Classifier, Support
Vector Machines. Performance Evaluation: Confusion Matrix, Accuracy,
Precision, Recall, AUC-ROC Curves, F-Measure
Course Outcome
• CO3: To apply different classification algorithms for various machine
learning applications.
Naive Bayes Classifier
• The Naive Bayes classifier is a straightforward and powerful algorithm for
the classification task.
• Even when working on a data set with millions of records and several
attributes, it is worth trying the Naive Bayes approach.
• The Naive Bayes classifier gives great results for textual data analysis,
such as in Natural Language Processing (NLP).
• It is named "Naive" because it assumes that the presence of one feature
does not affect other features. The "Bayes" part of the name refers to its
basis in Bayes' Theorem.
• Naive Bayes is a classifier that uses Bayes' Theorem.
Naive Bayes Classifier
• Bayes' Theorem works on conditional probability.
• Conditional probability is the probability that something will happen,
given that something else has already occurred.
• Using conditional probability, we can calculate the probability of an
event from prior knowledge, as in the card example below.
• The classifier predicts membership probabilities for each class, i.e., the
probability that a given record or data point belongs to a particular class.
• The class with the highest probability is considered the most likely class.
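• Example: when drawing one card from a standard 52-card deck,
P(King) = 4/52 = 1/13. Given that the drawn card is a face card
(12 cards: Jacks, Queens, Kings), the conditional probability becomes
P(King | Face) = 4/12 = 1/3.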
Naive Bayes Classifier
• The Naive Bayes classifier assumes that all the features are unrelated to
each other.
• The presence or absence of a feature does not influence the presence or
absence of any other feature.
• This approach is based on the assumption that the features of the input
data are conditionally independent given the class, allowing the algorithm
to make predictions quickly and accurately (see the factorization below).
• Examples:
• Spam Filtering
• Sentiment Analysis
• Fraud Detection
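• Under this independence assumption, the class-conditional likelihood
factorizes into a product of per-feature terms, so the predicted class y
for features x1, ..., xn is the one that maximizes:
P(y | x1, ..., xn) ∝ P(y) . P(x1 | y) . P(x2 | y) ... P(xn | y)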
Types of Naive Bayes Classifier
• Gaussian Naive Bayes:
• When attribute values are continuous, it is assumed that the values
associated with each class follow a Gaussian distribution, i.e., the
Normal Distribution.
• When plotted, this gives a bell-shaped curve that is symmetric about
the mean of the feature values.
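• For a continuous feature x in class y, the likelihood is then computed
from the class mean μ and variance σ² estimated from the training data:
P(x | y) = (1 / √(2πσ²)) . exp(−(x − μ)² / (2σ²))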
Types of Naive Bayes Classifier
• Multinomial Naive Bayes:
• Multinomial Naive Bayes is preferred for data that follows a
multinomial distribution.
• It is one of the standard classic algorithms used in text
categorization (classification), as sketched below.
• Each event in text classification represents the occurrence of a
word in a document.
• A multinomial distribution is a statistical distribution that models
the results of N independent trials, where each trial can result in
one of K distinct, mutually exclusive outcomes with fixed
probabilities that sum to one.
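• A minimal sketch of text classification with scikit-learn's
MultinomialNB; the toy documents and labels below are made up for
illustration:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
# Toy corpus: the word counts per document are the multinomial "trials"
docs = ["free prize money", "meeting at noon",
        "win free money now", "project meeting notes"]
labels = ["spam", "ham", "spam", "ham"]
# Convert each document into a vector of word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
# Train on the count features and classify a new document
clf = MultinomialNB()
clf.fit(X, labels)
print(clf.predict(vectorizer.transform(["free money prize"])))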
Types of Naive Bayes Classifier
• Bernoulli Naive Bayes:
• Bernoulli Naive Bayes is used on data that is distributed according
to multivariate Bernoulli distributions, i.e., there can be multiple
features, but each one is assumed to be a binary-valued (Bernoulli,
boolean) variable. So, it requires the features to be binary valued.
• Example:
• One application is text classification with the "bag of words"
model, where 1 means "word occurs in the document" and 0 means
"word does not occur in the document", as in the sketch below.
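• A minimal sketch with scikit-learn's BernoulliNB; the tiny binary
bag-of-words matrix below is made up for illustration:
import numpy as np
from sklearn.naive_bayes import BernoulliNB
# Binary bag of words: columns = ["free", "money", "meeting"],
# 1 = word occurs in the document, 0 = it does not
X = np.array([[1, 1, 0],
              [0, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])
y = ["spam", "ham", "spam", "ham"]
clf = BernoulliNB()
clf.fit(X, y)
print(clf.predict([[1, 0, 0]]))  # a document containing only "free"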
Naive Bayes Classifier
• Refer to the following Naive Bayes classifier formula (Bayes' Theorem) to
calculate the probability:
P(A|B) = P(B|A) . P(A) / P(B)
Where,
P(A|B) = the posterior probability of the class given the predictor
P(B|A) = the likelihood: the probability of the predictor given the class
P(A) = the prior probability of the class
P(B) = the prior probability of the predictor
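• A small numeric check of the formula in Python; the probabilities below
are made-up values for a spam-filtering scenario:
# Assumed numbers: 30% of emails are spam; "free" appears in 60% of
# spam emails and in 5% of non-spam emails
p_spam = 0.30
p_free_given_spam = 0.60
p_free_given_ham = 0.05
# P(B): total probability that an email contains "free"
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)
# P(A|B) = P(B|A) . P(A) / P(B)
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(round(p_spam_given_free, 3))  # 0.837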
Naive Bayes Classifier
• Advantages:
• Naive Bayes is a fast, highly scalable algorithm.
• Naive Bayes can be used for binary and multiclass classification.
• It comes in different variants, such as GaussianNB, MultinomialNB,
and BernoulliNB.
• It is a simple algorithm that mostly relies on counting.
• It is a great choice for text classification problems.
• It is a popular choice for spam email classification.
• It can be trained easily on a small dataset.
• Disadvantages:
• It considers all the features to be unrelated, so it cannot learn the
relationships between features.
Naive Bayes Classifier in Python
# Import libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
# Load a sample dataset (features X, class labels y); the original slide
# assumes X and y already exist
X, y = load_iris(return_X_y=True)
# Split the data: 70% for training, 30% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
# Build a Gaussian Naive Bayes classifier
model = GaussianNB()
# Model training
model.fit(X_train, y_train)
Naive Bayes Classifier in Python
# Predict the output for a single test sample
predicted = model.predict([X_test[6]])
print("Actual Value:", y_test[6])
print("Predicted Value:", predicted[0])
# Evaluation metrics
from sklearn.metrics import accuracy_score, f1_score
y_pred = model.predict(X_test)
# scikit-learn metrics expect (y_true, y_pred) in that order
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average="weighted")
print("Accuracy:", accuracy)
print("F1 Score:", f1)
Support Vector Machine
• Support Vector Machine (SVM) is a supervised machine learning algorithm
in which we try to find a hyperplane that best separates the two classes.
• SVM selects the best hyperplane by maximizing the margin, i.e., the
distance between the hyperplane and the nearest points of the two classes
(see the sketch below).
• Examples:
• Bioinformatics and Medical Applications
• Text and Natural Language Processing (NLP)
• Image Recognition and Classification
• Recommendation Systems
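• A minimal sketch of a linear SVM with scikit-learn on a toy two-class
dataset (the blob data is generated only for illustration):
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
# Two well-separated clusters of points, one per class
X, y = make_blobs(n_samples=100, centers=2, random_state=42)
# Linear SVM: finds the maximum-margin separating hyperplane
clf = SVC(kernel="linear")
clf.fit(X, y)
print(clf.predict(X[:5]))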
Support Vector Machine
• Terminology:
• Support Vectors:
• These are the points that are closest to the hyperplane.
• The separating line is defined with the help of these data points.
• Margin:
• It is the distance between the hyperplane and the observations
closest to the hyperplane (the support vectors).
• In SVM, a large margin is considered a good margin.
• There are two types of margins: hard margin and soft margin (see
the sketch below).
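• In scikit-learn, the softness of the margin is controlled by the C
parameter; a sketch of the two settings (the values are illustrative):
from sklearn.svm import SVC
# Small C -> softer margin: wider margin, more violations tolerated
soft_margin_svm = SVC(kernel="linear", C=0.1)
# Large C -> approaches a hard margin: few violations tolerated
near_hard_margin_svm = SVC(kernel="linear", C=1000)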
Support Vector Machine
• Terminology:
• Hyperplane:
• It is a decision boundary that separates different classes in feature
space.
• In linear classification it is represented by the equation w . x + b = 0.
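• The decision rule follows directly from this equation: a point is
classified by the sign of w . x + b. A tiny sketch with made-up
parameters:
import numpy as np
# Hypothetical learned parameters of the hyperplane w . x + b = 0
w = np.array([2.0, -1.0])
b = -0.5
x = np.array([1.0, 1.0])
score = np.dot(w, x) + b        # which side of the hyperplane x lies on
label = 1 if score >= 0 else -1
print(score, label)             # 0.5 1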
Support Vector Machine
• SVM can be of two types:
• Linear SVM:
• Linear SVM is used for linearly separable data: if a dataset can be
classified into two classes by a single straight line, the data is
termed linearly separable, and the classifier used is called a
Linear SVM classifier.
• Non-linear SVM:
• Non-linear SVM is used for non-linearly separable data: if a dataset
cannot be classified by a straight line, the data is termed
non-linear, and the classifier used is called a Non-linear SVM
classifier (see the kernel sketch below).
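• A sketch of a non-linear SVM using the RBF kernel on data that no
straight line can separate (the half-moons dataset is generated for
illustration):
from sklearn.datasets import make_moons
from sklearn.svm import SVC
# Two interleaving half-moons: not linearly separable
X, y = make_moons(n_samples=200, noise=0.1, random_state=42)
# The RBF kernel implicitly maps the data into a higher-dimensional
# space where a separating hyperplane exists
clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy, typically close to 1.0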
Support Vector Machine - Working
• SVM is defined in terms of the support vectors only; we do not have to
worry about the other observations, since the margin is made using the
points that are closest to the hyperplane (the support vectors).
• Suppose we have a dataset with two classes (green and blue), and we want
to classify a new data point as either blue or green.
Support Vector Machine - Working
• To classify these points, we can have many decision boundaries.
• Since we are plotting the data points in a 2-dimensional graph, we call
this decision boundary a straight line, but if we have more dimensions, we
call this decision boundary a "hyperplane".
Support Vector Machine - Working
• The best hyperplane is the one that has the maximum distance from both
classes; finding it is the main aim of SVM.
• This is done by finding the different hyperplanes that classify the labels
in the best way, then choosing the one which is farthest from the data
points, i.e., the one with the maximum margin (see the sketch below).
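• After fitting, scikit-learn exposes the support vectors that define the
margin; a sketch on the same kind of toy blob data as above:
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
X, y = make_blobs(n_samples=100, centers=2, random_state=42)
clf = SVC(kernel="linear").fit(X, y)
# Only these points determine the hyperplane; removing any other
# training point would not change the decision boundary
print(clf.support_vectors_)  # coordinates of the support vectors
print(clf.n_support_)        # number of support vectors per class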
Support Vector Machine - Working
• Advantages of SVM
• SVM works well when the data is linearly separable
• It is more effective in high dimensions
• With the help of the kernel trick, we can solve many complex,
non-linear problems
• With a soft margin, SVM is relatively robust to outliers
• It can help us with image classification
• Disadvantages of SVM
• Choosing a good kernel is not easy
• It does not show good results on big datasets, since training becomes
slow