What is the Naive Bayes algorithm?
It is a classification technique based on Bayes’ Theorem with an assumption of
independence among predictors. In simple terms, a Naive Bayes classifier assumes
that the presence of a particular feature in a class is unrelated to the presence of
any other feature.
For example, a fruit may be considered to be an apple if it is red, round, and about
3 inches in diameter. Even if these features depend on each other or on the
existence of the other features, each of these properties independently contributes
to the probability that the fruit is an apple, and that is why it is known as 'Naive'.
The Naive Bayes model is easy to build and particularly useful for very large data sets.
Along with its simplicity, Naive Bayes is known to perform competitively with, and
sometimes outperform, highly sophisticated classification methods.
Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c),
P(x), and P(x|c). Look at the equation below:

P(c|x) = [ P(x|c) * P(c) ] / P(x)

Above,
P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
P(c) is the prior probability of class.
P(x|c) is the likelihood which is the probability of predictor given class.
P(x) is the prior probability of predictor.
How does the Naive Bayes algorithm work?
Let's understand it using an example. Below is a training data set of 14 weather
observations and the corresponding target variable 'Play' (indicating whether a
game was played). We need to classify whether players will play or not based on
the weather condition. Let's follow the steps below to perform it.
Step 1: Convert the data set into a frequency table.
Step 2: Create a likelihood table by finding the probabilities, for example
P(Overcast) = 4/14 ≈ 0.29 and P(Yes) = 9/14 ≈ 0.64. For this data set, the
frequency and likelihood tables look like:

Weather  | No          | Yes         | P(Weather)
Overcast | 0           | 4           | 4/14 ≈ 0.29
Sunny    | 2           | 3           | 5/14 ≈ 0.36
Rainy    | 3           | 2           | 5/14 ≈ 0.36
Total    | 5           | 9           | 14
P(Play)  | 5/14 ≈ 0.36 | 9/14 ≈ 0.64 |

Step 3: Now, use the Naive Bayes equation to calculate the posterior probability for
each class. The class with the highest posterior probability is the outcome of the
prediction.
Problem: Players will play if the weather is sunny. Is this statement correct?
We can solve it using above discussed method of posterior probability.
P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
Here we have P(Sunny | Yes) = 3/9 ≈ 0.33, P(Sunny) = 5/14 ≈ 0.36, and
P(Yes) = 9/14 ≈ 0.64.
Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 ≈ 0.60. The same calculation for the other
class gives P(No | Sunny) = P(Sunny | No) * P(No) / P(Sunny) = (2/5) * (5/14) / (5/14)
= 0.40, so 'Yes' has the higher posterior probability and the statement is correct.
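The same arithmetic can be scripted directly. Here is a minimal Python sketch that
computes both posteriors from the frequency counts in the likelihood table above:

```python
# Posterior probabilities computed from the frequency table above.
counts = {"Sunny": {"Yes": 3, "No": 2},
          "Overcast": {"Yes": 4, "No": 0},
          "Rainy": {"Yes": 2, "No": 3}}
totals = {"Yes": 9, "No": 5}
n = 14  # total number of observations

def posterior(weather, play):
    """P(play | weather) = P(weather | play) * P(play) / P(weather)."""
    likelihood = counts[weather][play] / totals[play]  # P(weather | play)
    prior = totals[play] / n                           # P(play)
    evidence = sum(counts[weather].values()) / n       # P(weather)
    return likelihood * prior / evidence

print(posterior("Sunny", "Yes"))  # 0.6
print(posterior("Sunny", "No"))   # 0.4
```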
Naive Bayes uses a similar method to predict the probability of different classes
based on various attributes. This algorithm is mostly used in text classification and
in problems with multiple classes.
Applications of Naive Bayes Algorithms
Real-time Prediction: Naive Bayes is an eager learning classifier and it is
very fast. Thus, it can be used for making predictions in real time.
Multi-class Prediction: This algorithm is also well known for its multi-class
prediction capability. Here we can predict the probabilities of multiple classes of
the target variable.
Text Classification / Spam Filtering / Sentiment Analysis: Naive Bayes classifiers
are widely used in text classification (thanks to their good results on multi-class
problems and the independence assumption) and often achieve a higher success
rate than other algorithms on such tasks. As a result, Naive Bayes is widely used in
spam filtering (identifying spam e-mail) and sentiment analysis (in social media
analysis, to identify positive and negative customer sentiment).
Recommendation Systems: A Naive Bayes classifier and collaborative filtering can
be combined to build a recommendation system that uses machine learning and
data mining techniques to filter unseen information and predict whether a user
would like a given resource.
Naive Bayes classifiers are a collection of classification algorithms based
on Bayes' Theorem. It is not a single algorithm but a family of algorithms
that all share a common principle: every pair of features being classified is
independent of each other.
To start with, consider a fictional dataset that describes the weather conditions for
playing a game of golf. Given the weather conditions, each tuple classifies the
conditions as fit (“Yes”) or unfit (“No”) for playing golf.
Naïve Bayes Classifier Algorithm
o The Naïve Bayes algorithm is a supervised learning algorithm,
which is based on Bayes' theorem and used for solving
classification problems.
o It is mainly used in text classification tasks that involve a
high-dimensional training dataset.
o The Naïve Bayes classifier is one of the simplest and most
effective classification algorithms, and it helps in building
fast machine learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on
the basis of the probability that an object belongs to each class.
o Some popular applications of the Naïve Bayes algorithm are spam
filtering, sentiment analysis, and classifying articles.
Why is it called Naïve Bayes?
The name Naïve Bayes is made up of the two words Naïve and
Bayes, which can be described as:
o Naïve: It is called Naïve because it assumes that the
occurrence of a certain feature is independent of the
occurrence of other features. For example, if a fruit is
identified on the basis of color, shape, and taste, then a red,
spherical, and sweet fruit is recognized as an apple. Hence each
feature individually contributes to identifying it as an apple,
without depending on the others.
o Bayes: It is called Bayes because it depends on the principle
of Bayes' Theorem.
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law,
which is used to determine the probability of a hypothesis with
prior knowledge. It depends on the conditional probability.
o The formula for Bayes' theorem is given as:

P(A|B) = [ P(B|A) * P(A) ] / P(B)

Where,
P(A|B) is Posterior probability: Probability of hypothesis A given the
observed event B.
P(B|A) is Likelihood probability: Probability of the evidence given
that hypothesis A is true.
P(A) is Prior Probability: Probability of hypothesis before
observing the evidence.
P(B) is Marginal Probability: Probability of Evidence.
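To make these four quantities concrete, here is a tiny numeric illustration in
Python (the numbers are made up purely for this example):

```python
# Bayes' theorem with made-up numbers, purely to show the mechanics.
p_A = 0.3           # prior:      P(A)
p_B_given_A = 0.8   # likelihood: P(B|A)
p_B = 0.5           # marginal:   P(B)

p_A_given_B = p_B_given_A * p_A / p_B   # posterior: P(A|B)
print(p_A_given_B)  # 0.48
```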
Working of Naïve Bayes' Classifier:
The working of the Naïve Bayes classifier can be understood with the help
of the example below:
Suppose we have a dataset of weather conditions and a
corresponding target variable "Play". Using this dataset, we need
to decide whether we should play or not on a particular day
according to the weather conditions. To solve this problem, we
need to follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given
features.
3. Now, use Bayes' theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
Solution: This is the same calculation worked through above: compare
P(Yes | Sunny) ≈ 0.60 with P(No | Sunny) = 0.40 and predict the class with
the higher posterior probability, i.e. "Yes".
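As a cross-check, the same result can be reproduced with scikit-learn's
CategoricalNB. This is a sketch that assumes scikit-learn is installed; the 14
training rows are reconstructed from the frequency counts used earlier:

```python
# Reproducing the "Sunny" example with scikit-learn's CategoricalNB.
# alpha is set to a tiny value so Laplace smoothing does not shift
# the probabilities away from the hand calculation.
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Encode Outlook: 0 = Sunny, 1 = Overcast, 2 = Rainy
X = np.array([0]*5 + [1]*4 + [2]*5).reshape(-1, 1)
y = np.array(["Yes"]*3 + ["No"]*2       # Sunny:    3 Yes, 2 No
             + ["Yes"]*4                # Overcast: 4 Yes, 0 No
             + ["Yes"]*2 + ["No"]*3)    # Rainy:    2 Yes, 3 No

model = CategoricalNB(alpha=1e-10).fit(X, y)
print(model.predict([[0]]))        # ['Yes']
print(model.predict_proba([[0]]))  # ~[[0.4, 0.6]] for classes ['No', 'Yes']
```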
Advantages of Naïve Bayes Classifier:
o Naïve Bayes is one of the fastest and easiest ML algorithms for
predicting the class of a data point.
o It can be used for Binary as well as Multi-class Classifications.
o It performs well in multi-class prediction compared to
other algorithms.
o It is one of the most popular choices for text classification
problems.
Disadvantages of Naïve Bayes Classifier:
o Naive Bayes assumes that all features are independent or
unrelated, so it cannot learn relationships between features.
Applications of Naïve Bayes Classifier:
o It is used for Credit Scoring.
o It is used in medical data classification.
o It can be used in real-time predictions because Naïve Bayes
Classifier is an eager learner.
o It is used in Text classification such as Spam
filtering and Sentiment analysis.
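As an illustration of the spam-filtering use case, here is a minimal sketch that
pairs scikit-learn's CountVectorizer with MultinomialNB; the four documents and
their labels are invented purely for this example:

```python
# A toy spam filter: bag-of-words counts fed into Multinomial NB.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["win cash prize now", "meeting agenda attached",
        "cheap prize offer", "project status meeting"]
labels = ["spam", "ham", "spam", "ham"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(docs, labels)
print(clf.predict(["prize offer now"]))  # ['spam']
```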
Types of Naïve Bayes Model:
There are three types of Naive Bayes Model, which are given below:
o Gaussian: The Gaussian model assumes that features follow a
normal distribution. This means that if predictors take continuous
values instead of discrete ones, the model assumes these values
are sampled from a Gaussian distribution.
o Multinomial: The Multinomial Naïve Bayes classifier is used
when the data is multinomially distributed. It is primarily used
for document classification problems, i.e. deciding which
category a particular document belongs to, such as sports,
politics, or education.
The classifier uses the frequencies of words as the predictors.
o Bernoulli: The Bernoulli classifier works similarly to the
Multinomial classifier, but the predictor variables are
independent Boolean variables, such as whether a particular word
is present or not in a document. This model is also well suited
to document classification tasks. A short code sketch of all
three variants follows.
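The three variants are all available in scikit-learn. The following sketch, with
made-up toy data, shows the kind of input each one expects:

```python
# Contrasting the three Naive Bayes variants in scikit-learn.
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 1, 0, 1])

# GaussianNB: continuous features (e.g., a measurement)
X_cont = np.array([[1.2], [3.4], [0.9], [3.1]])
print(GaussianNB().fit(X_cont, y).predict([[3.0]]))           # [1]

# MultinomialNB: non-negative counts (e.g., word counts per document)
X_counts = np.array([[2, 0, 1], [0, 3, 0], [1, 0, 2], [0, 2, 1]])
print(MultinomialNB().fit(X_counts, y).predict([[0, 2, 0]]))  # [1]

# BernoulliNB: binary presence/absence of each word
X_bin = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bin, y).predict([[0, 1, 0]]))       # [1]
```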
Advantages of Naive Bayes Classifier
The following are some of the benefits of the Naive Bayes classifier:
It is simple and easy to implement
It requires relatively little training data
It handles both continuous and discrete data
It is highly scalable with the number of predictors and data points
It is fast and can be used to make real-time predictions
It is relatively insensitive to irrelevant features
Where is Naive Bayes Used?
You can use Naive Bayes for the following things:
Face Recognition
As a classifier, it can be used to identify faces or facial features such as
the nose, mouth, and eyes.
Weather Prediction
It can be used to predict if the weather will be good or bad.
Medical Diagnosis
Doctors can diagnose patients by using the information that the classifier
provides. Healthcare professionals can use Naive Bayes to indicate if a
patient is at high risk for certain diseases and conditions, such as heart
disease, cancer, and other ailments.
News Classification
With the help of a Naive Bayes classifier, Google News can recognize
whether an article is politics, world news, and so on.
As the Naive Bayes Classifier has so many applications, it’s worth
learning more about how it works.
Understanding Naive Bayes and Machine Learning
Machine learning falls into two categories:
Supervised learning
Unsupervised learning
Supervised learning falls into two categories:
Classification
Regression
The Naive Bayes algorithm falls under classification.
What is Naive Bayes?
Let's start with a basic introduction to the Bayes theorem, named after Thomas Bayes
from the 1700s. The Naive Bayes classifier works on the principle of conditional
probability, as given by the Bayes theorem.
Let us go through some of the simple concepts of probability that we will use.
Consider the following example of tossing two coins. If we toss two coins and look at
all the different possibilities, we have the sample space as: {HH, HT, TH, TT}
While calculating the math on probability, we usually denote probability as P.
Some of the probabilities in this event would be as follows:
The probability of getting two heads = 1/4
The probability of at least one tail = 3/4
The probability of the second coin being heads given the first coin is
tails = 1/2
The probability of getting two heads given the first coin is heads =
1/2
The Bayes theorem gives us the conditional probability of event A,
given that event B has occurred. In this case, the first coin toss is B
and the second coin toss is A. This may seem confusing because we have
reversed the usual order and reason from B to A instead of from A to B.
According to Bayes' theorem:

P(A|B) = [ P(B|A) * P(A) ] / P(B)
Let us apply Bayes theorem to our coin example. Here, we have two
coins, and the first two probabilities of getting two heads and at least one
tail are computed directly from the sample space.
Now in this sample space, let A be the event that the second coin is
heads, and B be the event that the first coin is tails. Again, the order
is reversed because we want to predict the second toss from the first.
We're going to focus on A, and we write that out as a probability of A
given B:
P(A|B) = [ P(B|A) * P(A) ] / P(B)
= [ P(first coin being tails given the second coin is heads)
    * P(second coin being heads) ] / P(first coin being tails)
= [ (1/2) * (1/2) ] / (1/2)
= 1/2 = 0.5
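This result can be double-checked by enumerating the sample space in a few
lines of Python:

```python
# Enumerate the four equally likely outcomes to verify the result.
from itertools import product

space = list(product("HT", repeat=2))     # HH, HT, TH, TT

B = [s for s in space if s[0] == "T"]     # first coin is tails
A_and_B = [s for s in B if s[1] == "H"]   # ... and second coin is heads

print(len(B) / len(space))    # P(B)   = 0.5
print(len(A_and_B) / len(B))  # P(A|B) = 0.5
```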
Bayes theorem calculates the conditional probability of the occurrence
of an event based on prior knowledge of conditions that might be related
to the event.
As with any of our other machine learning tools, it's important to
understand where Naive Bayes fits in the hierarchy.