Probability-based Learning
Sections 6.1, 6.2, 6.3
John D. Kelleher and Brian Mac Namee and Aoife D’Arcy
1 Big Idea
2 Fundamentals
    Bayes’ Theorem
    Bayesian Prediction
    Conditional Independence and Factorization
3 Standard Approach: The Naive Bayes’ Classifier
    A Worked Example
4 Summary
Big Idea
Figure: A game of find the lady.

Figure: A game of find the lady: (a) the cards dealt face down on a table; and (b) the initial likelihoods of the queen ending up in each position (Left, Center, Right).

Figure: A game of find the lady: (a) the cards dealt face down on a table; and (b) a revised set of likelihoods for the position of the queen based on evidence collected.

Figure: A game of find the lady: (a) the set of cards after the wind blows over the one on the right; and (b) the revised likelihoods for the position of the queen based on this new evidence.

Figure: A game of find the lady: the final positions of the cards in the game.
We can use estimates of likelihoods to determine the most likely prediction that should be made.
More importantly, we can revise these predictions whenever extra evidence becomes available from the data we collect.
Fundamentals
Table: A simple dataset for MENINGITIS diagnosis with descriptive features that describe the presence or absence of three common symptoms of the disease: HEADACHE, FEVER, and VOMITING.

ID  HEADACHE  FEVER  VOMITING  MENINGITIS
1 true true false false
2 false true false false
3 true false true false
4 true false true false
5 false true false true
6 true false true false
7 true false true false
8 true false true true
9 false true false false
10 true false true true
A probability function, P(), returns the probability of a
feature taking a specific value.
A joint probability refers to the probability of an
assignment of specific values to multiple different features.
A conditional probability refers to the probability of one feature taking a specific value given that we already know the value of a different feature.
A probability distribution is a data structure that
describes the probability of each possible value a feature
can take. The sum of a probability distribution must equal
1.0.
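To make these definitions concrete, here is a minimal sketch, in Python, of how each kind of probability can be computed from the meningitis dataset by simple counting. The `dataset` layout and the `prob`/`cond_prob` helper names are our own illustration, not from the text.

```python
# A minimal sketch (our own illustration) of the probability definitions
# above, computed from the meningitis dataset by simple counting.
dataset = [
    # (HEADACHE, FEVER, VOMITING, MENINGITIS)
    (True,  True,  False, False),   # d1
    (False, True,  False, False),   # d2
    (True,  False, True,  False),   # d3
    (True,  False, True,  False),   # d4
    (False, True,  False, True),    # d5
    (True,  False, True,  False),   # d6
    (True,  False, True,  False),   # d7
    (True,  False, True,  True),    # d8
    (False, True,  False, False),   # d9
    (True,  False, True,  True),    # d10
]

COLUMNS = {"h": 0, "f": 1, "v": 2, "m": 3}

def prob(**assignment):
    """P() for a single or joint assignment, e.g. prob(h=True, m=True)."""
    matches = [row for row in dataset
               if all(row[COLUMNS[name]] == value
                      for name, value in assignment.items())]
    return len(matches) / len(dataset)

def cond_prob(event, given):
    """P(event | given), where both arguments are feature -> value dicts."""
    return prob(**event, **given) / prob(**given)

print(prob(m=True))                         # P(m) = 0.3
print(prob(h=True, f=False, v=True))        # P(h, ¬f, v) = 0.6
print(cond_prob({"h": True}, {"m": True}))  # P(h|m) = 0.666...
```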
A joint probability distribution is a probability distribution
over more than one feature assignment and is written as a
multi-dimensional matrix in which each cell lists the
probability of a particular combination of feature values
being assigned.
The sum of all the cells in a joint probability distribution
must be 1.0.
P(H, F, V, M) =
    ⟨ P(h, f, v, m),      P(¬h, f, v, m),
      P(h, f, v, ¬m),     P(¬h, f, v, ¬m),
      P(h, f, ¬v, m),     P(¬h, f, ¬v, m),
      P(h, f, ¬v, ¬m),    P(¬h, f, ¬v, ¬m),
      P(h, ¬f, v, m),     P(¬h, ¬f, v, m),
      P(h, ¬f, v, ¬m),    P(¬h, ¬f, v, ¬m),
      P(h, ¬f, ¬v, m),    P(¬h, ¬f, ¬v, m),
      P(h, ¬f, ¬v, ¬m),   P(¬h, ¬f, ¬v, ¬m) ⟩
Given a joint probability distribution, we can compute the
probability of any event in the domain that it covers by
summing over the cells in the distribution where that event
is true.
Calculating probabilities in this way is known as summing
out.
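As a small illustration of summing out, the sketch below represents a joint distribution over two binary features as a dictionary (the cell values are invented for illustration) and sums over the cells where the event of interest holds:

```python
# Summing out, sketched with a toy joint distribution over two binary
# features H and M (the cell values are invented for illustration and
# sum to 1.0, as every joint distribution must).
joint = {
    # (H, M): probability
    (True,  True):  0.2,
    (True,  False): 0.5,
    (False, True):  0.1,
    (False, False): 0.2,
}

# P(h): sum over every cell in which H = true (i.e., sum out M)
p_h = sum(p for (h, m), p in joint.items() if h)
print(p_h)  # 0.7
```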
Bayes’ Theorem
P(X|Y) = P(Y|X)P(X) / P(Y)
Example
After a yearly checkup, a doctor informs their patient that there is both bad news and good news. The bad news is that the patient has tested positive for a serious disease, and that the test the doctor used is 99% accurate (i.e., the probability of testing positive when a patient has the disease is 0.99, as is the probability of testing negative when a patient does not have the disease). The good news, however, is that the disease is extremely rare, striking only 1 in 10,000 people.

What is the actual probability that the patient has the disease?
Why is the rarity of the disease good news given that the patient has tested positive for it?
P(d|t) = P(t|d)P(d) / P(t)

P(t) = P(t|d)P(d) + P(t|¬d)P(¬d)
     = (0.99 × 0.0001) + (0.01 × 0.9999) = 0.0101

P(d|t) = (0.99 × 0.0001) / 0.0101 = 0.0098
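The arithmetic is easy to check in a few lines of Python; this sketch simply re-runs the example’s numbers (only the variable names are ours):

```python
# Re-running the rare-disease numbers from the example above
# (only the variable names are ours).
p_t_given_d = 0.99      # P(t|d): positive test given disease
p_t_given_not_d = 0.01  # P(t|¬d): positive test given no disease
p_d = 0.0001            # P(d): prior, 1 in 10,000

# Theorem of Total Probability gives the divisor P(t)
p_t = p_t_given_d * p_d + p_t_given_not_d * (1 - p_d)

# Bayes' Theorem
p_d_given_t = p_t_given_d * p_d / p_t
print(round(p_t, 4), round(p_d_given_t, 4))  # 0.0101 0.0098
```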
Deriving Bayes’ Theorem

The product rule lets us write the joint probability two ways:

P(Y|X)P(X) = P(X|Y)P(Y)

Dividing both sides by P(Y):

P(X|Y)P(Y) / P(Y) = P(Y|X)P(X) / P(Y)

The P(Y) terms on the left cancel, giving:

⇒ P(X|Y) = P(Y|X)P(X) / P(Y)
The divisor is the prior probability of the evidence.
This division functions as a normalization constant, ensuring that:

0 ≤ P(X|Y) ≤ 1

Σ_i P(X_i|Y) = 1.0
We can calculate this divisor directly from the dataset:

P(Y) = |{rows where Y is the case}| / |{rows in the dataset}|

Or, we can use the Theorem of Total Probability to calculate this divisor:

P(Y) = Σ_i P(Y|X_i)P(X_i)    (1)
Bayesian Prediction
Generalized Bayes’ Theorem

P(t = l | q[1], . . . , q[m]) = P(q[1], . . . , q[m] | t = l) × P(t = l) / P(q[1], . . . , q[m])
Chain Rule
P(q[1], . . . , q[m]) =
    P(q[1]) × P(q[2]|q[1]) × · · · × P(q[m]|q[m − 1], . . . , q[2], q[1])

To apply the chain rule to a conditional probability we just add the conditioning term to each term in the expression:

P(q[1], . . . , q[m] | t = l) =
    P(q[1]|t = l) × P(q[2]|q[1], t = l) × · · · × P(q[m]|q[m − 1], . . . , q[2], q[1], t = l)
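The chain rule can be checked numerically. The sketch below assumes the meningitis `dataset` and the hypothetical `prob`/`cond_prob` helpers from the earlier sketch are in scope, and confirms that the chained factors multiply back to the joint probability:

```python
# Numeric check of the chain rule on the meningitis dataset, assuming
# the dataset and the prob()/cond_prob() helpers from the earlier
# sketch are in scope: P(h, ¬f, v) = P(h) × P(¬f|h) × P(v|¬f, h).
lhs = prob(h=True, f=False, v=True)
rhs = (prob(h=True)
       * cond_prob({"f": False}, {"h": True})
       * cond_prob({"v": True}, {"f": False, "h": True}))
print(lhs, rhs)  # both ≈ 0.6
```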
Returning to the meningitis dataset, we want to make a prediction for the following query instance:
HEADACHE  FEVER  VOMITING  MENINGITIS
true      false  true      ?
P(M|h, ¬f, v) = ?

In the terms of Bayes’ Theorem this problem can be stated as:

P(M|h, ¬f, v) = P(h, ¬f, v | M) × P(M) / P(h, ¬f, v)

There are two values in the domain of the MENINGITIS feature, ’true’ and ’false’, so we have to do this calculation twice.
We will do the calculation for m first.
To carry out this calculation we need to know the following probabilities: P(m), P(h, ¬f, v), and P(h, ¬f, v | m).
We can calculate the required probabilities directly from the data. For example, we can calculate P(m) and P(h, ¬f, v) as follows:

P(m) = |{d5, d8, d10}| / |{d1, d2, d3, d4, d5, d6, d7, d8, d9, d10}| = 3/10 = 0.3

P(h, ¬f, v) = |{d3, d4, d6, d7, d8, d10}| / |{d1, d2, d3, d4, d5, d6, d7, d8, d9, d10}| = 6/10 = 0.6
However, as an exercise, we will use the chain rule to calculate:

P(h, ¬f, v | m) = ?
Using the chain rule:
P(h, ¬f, v | m) = P(h | m) × P(¬f | h, m) × P(v | ¬f, h, m)

               = |{d8, d10}| / |{d5, d8, d10}| × |{d8, d10}| / |{d8, d10}| × |{d8, d10}| / |{d8, d10}|

               = 2/3 × 2/2 × 2/2 = 0.6666
So the calculation of P(m|h, ¬f, v) is:

P(m|h, ¬f, v) = ( P(h|m) × P(¬f|h, m) × P(v|¬f, h, m) ) × P(m) / P(h, ¬f, v)

             = (0.6666 × 0.3) / 0.6 = 0.3333
The corresponding calculation for P(¬m|h, ¬f, v) is:

P(¬m|h, ¬f, v) = P(h, ¬f, v | ¬m) × P(¬m) / P(h, ¬f, v)

              = ( P(h|¬m) × P(¬f|h, ¬m) × P(v|¬f, h, ¬m) ) × P(¬m) / P(h, ¬f, v)

              = (0.7143 × 0.8 × 1.0 × 0.7) / 0.6 = 0.6667
P(m|h, ¬f, v) = 0.3333
P(¬m|h, ¬f, v) = 0.6667

These calculations tell us that it is twice as probable that the patient does not have meningitis as it is that they do, even though the patient is suffering from a headache and is vomiting!
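For comparison, the sketch below computes both posteriors for this query directly from the dataset, again assuming the hypothetical helpers introduced earlier:

```python
# The exact (non-naive) posteriors for the query (h, ¬f, v), computed
# directly from the dataset with the helpers sketched earlier.
evidence = {"h": True, "f": False, "v": True}

p_m = cond_prob({"m": True}, evidence)       # P(m | h, ¬f, v)
p_not_m = cond_prob({"m": False}, evidence)  # P(¬m | h, ¬f, v)
print(round(p_m, 4), round(p_not_m, 4))      # 0.3333 0.6667
```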
The Paradox of the False Positive

The mistake of forgetting to factor in the prior gives rise to the paradox of the false positive, which states that in order to make accurate predictions about a rare event, the model has to be as accurate as the event is rare, or there is a significant chance of false positive predictions (i.e., predicting the event when it is not the case).
Bayesian MAP Prediction Model

M_MAP(q) = argmax_{l ∈ levels(t)} P(t = l | q[1], . . . , q[m])

         = argmax_{l ∈ levels(t)} ( P(q[1], . . . , q[m] | t = l) × P(t = l) ) / P(q[1], . . . , q[m])

Bayesian MAP Prediction Model (without normalization)

M_MAP(q) = argmax_{l ∈ levels(t)} P(q[1], . . . , q[m] | t = l) × P(t = l)
Now consider a second query instance:

HEADACHE  FEVER  VOMITING  MENINGITIS
true      true   false     ?
P(m | h, f, ¬v) = ?
P(¬m | h, f, ¬v) = ?
P(m | h, f, ¬v) = ( P(h|m) × P(f|h, m) × P(¬v|f, h, m) ) × P(m) / P(h, f, ¬v)

               = (0.6666 × 0 × 0 × 0.3) / 0.1 = 0
P(¬m | h, f, ¬v) = ( P(h|¬m) × P(f|h, ¬m) × P(¬v|f, h, ¬m) ) × P(¬m) / P(h, f, ¬v)

                = (0.7143 × 0.2 × 1.0 × 0.7) / 0.1 = 1.0
P(m | h, f, ¬v) = 0
P(¬m | h, f, ¬v) = 1.0
There is something odd about these results!
Curse of Dimensionality

As the number of descriptive features grows, the number of potential conditioning events grows exponentially. Consequently, the size of the dataset must grow exponentially as each new descriptive feature is added, to ensure that for any conditional probability there are enough matching instances in the training data for the resulting probability estimate to be reasonable.
The probability of a patient who has a headache and a
fever having meningitis should be greater than zero!
Our dataset is not large enough → our model is over-fitting
to the training data.
The concepts of conditional independence and
factorization can help us overcome this flaw of our current
approach.
Conditional Independence and Factorization
If knowledge of one event has no effect on the probability of another event, and vice versa, then the two events are independent of each other.

If two events X and Y are independent then:

P(X|Y) = P(X)
P(X, Y) = P(X) × P(Y)

Recall that when two events are dependent these rules are:

P(X|Y) = P(X, Y) / P(Y)
P(X, Y) = P(X|Y) × P(Y) = P(Y|X) × P(X)
Full independence between events is quite rare.
A more common phenomenon is that two or more events may be independent if we know that a third event has happened.
This is known as conditional independence.
For two events, X and Y, that are conditionally independent given knowledge of a third event, here Z, the definitions of the probability of a joint event and of conditional probability are:

P(X|Y, Z) = P(X|Z)
P(X, Y|Z) = P(X|Z) × P(Y|Z)

Compare the two cases:

X and Y are dependent:             X and Y are independent:
P(X|Y) = P(X, Y) / P(Y)            P(X|Y) = P(X)
P(X, Y) = P(X|Y) × P(Y)            P(X, Y) = P(X) × P(Y)
        = P(Y|X) × P(X)
If the event t = l causes the events q[1], . . . , q[m] to happen, then the events q[1], . . . , q[m] are conditionally independent of each other given knowledge of t = l, and the chain rule definition can be simplified as follows:

P(q[1], . . . , q[m] | t = l)
    = P(q[1] | t = l) × P(q[2] | t = l) × · · · × P(q[m] | t = l)
    = ∏_{i=1}^{m} P(q[i] | t = l)
Using this we can simplify the calculations in Bayes’
Theorem, under the assumption of conditional
independence between the descriptive features given the
level l of the target feature:
P(t = l | q[1], . . . , q[m]) = ( ∏_{i=1}^{m} P(q[i] | t = l) ) × P(t = l) / P(q[1], . . . , q[m])
Without conditional independence:

P(X, Y, Z, W) = P(X|W) × P(Y|X, W) × P(Z|Y, X, W) × P(W)

With conditional independence, the joint factorizes as:

P(X, Y, Z, W) = P(X|W)  × P(Y|W)  × P(Z|W)  × P(W)
               Factor 1  Factor 2  Factor 3  Factor 4
The joint probability distribution for the meningitis dataset.
P(H, F, V, M) =
    ⟨ P(h, f, v, m),      P(¬h, f, v, m),
      P(h, f, v, ¬m),     P(¬h, f, v, ¬m),
      P(h, f, ¬v, m),     P(¬h, f, ¬v, m),
      P(h, f, ¬v, ¬m),    P(¬h, f, ¬v, ¬m),
      P(h, ¬f, v, m),     P(¬h, ¬f, v, m),
      P(h, ¬f, v, ¬m),    P(¬h, ¬f, v, ¬m),
      P(h, ¬f, ¬v, m),    P(¬h, ¬f, ¬v, m),
      P(h, ¬f, ¬v, ¬m),   P(¬h, ¬f, ¬v, ¬m) ⟩
Assuming the descriptive features are conditionally independent of each other given MENINGITIS, we only need to store four factors:

Factor 1: < P(M) >
Factor 2: < P(h|m), P(h|¬m) >
Factor 3: < P(f|m), P(f|¬m) >
Factor 4: < P(v|m), P(v|¬m) >

P(H, F, V, M) = P(M) × P(H|M) × P(F|M) × P(V|M)
Calculate the factors from the data:

Factor 1: < P(M) >
Factor 2: < P(h|m), P(h|¬m) >
Factor 3: < P(f|m), P(f|¬m) >
Factor 4: < P(v|m), P(v|¬m) >
Factor 1: < P(m) = 0.3 >
Factor 2: < P(h|m) = 0.6666, P(h|¬m) = 0.7143 >
Factor 3: < P(f|m) = 0.3333, P(f|¬m) = 0.4286 >
Factor 4: < P(v|m) = 0.6666, P(v|¬m) = 0.5714 >
Using the factors above, calculate the probability of MENINGITIS = ’true’ for the following query:

HEADACHE  FEVER  VOMITING  MENINGITIS
true      true   false     ?
P(m|h, f, ¬v) = ( P(h|m) × P(f|m) × P(¬v|m) × P(m) ) / Σ_i ( P(h|M_i) × P(f|M_i) × P(¬v|M_i) × P(M_i) )

             = (0.6666 × 0.3333 × 0.3333 × 0.3) / ( (0.6666 × 0.3333 × 0.3333 × 0.3) + (0.7143 × 0.4286 × 0.4286 × 0.7) )

             = 0.1948
Using the factors above, calculate the probability of MENINGITIS = ’false’ for the same query.
P(¬m|h, f, ¬v) = ( P(h|¬m) × P(f|¬m) × P(¬v|¬m) × P(¬m) ) / Σ_i ( P(h|M_i) × P(f|M_i) × P(¬v|M_i) × P(M_i) )

              = (0.7143 × 0.4286 × 0.4286 × 0.7) / ( (0.6666 × 0.3333 × 0.3333 × 0.3) + (0.7143 × 0.4286 × 0.4286 × 0.7) )

              = 0.8052
P(m|h, f , ¬v ) = 0.1948
P(¬m|h, f , ¬v ) = 0.8052
As before, the MAP prediction would be MENINGITIS = ’false’.
The posterior probabilities are not as extreme!
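The factorised calculation is easy to reproduce in code. The sketch below is our own illustration, with the factor values copied from the slides:

```python
# Reproducing the factorised calculation with the factor values above.
p_m = 0.3
p_h_m, p_f_m, p_v_m = 0.6666, 0.3333, 0.6666     # P(·|m) factors
p_h_nm, p_f_nm, p_v_nm = 0.7143, 0.4286, 0.5714  # P(·|¬m) factors

# Query: h, f, ¬v. Note P(¬v|m) = 1 - P(v|m), and similarly for ¬m.
score_m = p_h_m * p_f_m * (1 - p_v_m) * p_m
score_nm = p_h_nm * p_f_nm * (1 - p_v_nm) * (1 - p_m)

norm = score_m + score_nm  # the summed-out divisor
print(round(score_m / norm, 4))   # 0.1948
print(round(score_nm / norm, 4))  # 0.8052
```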
Standard Approach: The Naive Bayes’ Classifier
Naive Bayes’ Classifier
M(q) = argmax_{l ∈ levels(t)} ( ∏_{i=1}^{m} P(q[i] | t = l) ) × P(t = l)
Naive Bayes’ is simple to train!
1. Calculate the priors for each of the target levels.
2. Calculate the conditional probabilities for each feature given each target level.
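These two steps can be sketched in a few lines of Python. The function names and the dict-based instance representation below are our own illustration, not the book’s:

```python
# A minimal Naive Bayes trainer and MAP predictor, sketching the two
# training steps above. The function and data-structure choices are
# our own illustration: each instance is a dict of feature -> value.
from collections import Counter, defaultdict

def train_naive_bayes(instances, target):
    # Step 1: priors for each target level
    level_counts = Counter(inst[target] for inst in instances)
    priors = {lvl: c / len(instances) for lvl, c in level_counts.items()}
    # Step 2: conditional probability of each feature value given each level
    cond_counts = defaultdict(Counter)
    for inst in instances:
        for feature, value in inst.items():
            if feature != target:
                cond_counts[inst[target]][(feature, value)] += 1
    conditionals = {lvl: {fv: c / level_counts[lvl] for fv, c in counts.items()}
                    for lvl, counts in cond_counts.items()}
    return priors, conditionals

def predict(priors, conditionals, query):
    # MAP prediction without normalization: argmax over target levels
    def score(level):
        s = priors[level]
        for feature, value in query.items():
            s *= conditionals[level].get((feature, value), 0.0)
        return s
    return max(priors, key=score)
```

Trained on the fraud dataset that follows, this sketch should reproduce the priors and conditional probabilities listed in the table below.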
Table: A dataset from a loan application fraud detection domain.

ID  CREDIT HISTORY  GUARANTOR/COAPPLICANT  ACCOMMODATION  FRAUD
1 current none own true
2 paid none own false
3 paid none own false
4 paid guarantor rent true
5 arrears none own false
6 arrears none own true
7 current none own false
8 arrears none own false
9 current none rent false
10 none none own true
11 current coapplicant own false
12 current none own true
13 current none rent true
14 paid none own false
15 arrears none own false
16 current none own false
17 arrears coapplicant rent false
18 arrears none free false
19 arrears none own false
20 paid none own false
Table: The probabilities needed by a Naive Bayes prediction model, calculated from the dataset. Notation key: FR = FRAUD, CH = CREDIT HISTORY, GC = GUARANTOR/COAPPLICANT, ACC = ACCOMMODATION.

P(fr) = 0.3                              P(¬fr) = 0.7
P(CH = ’none’ | fr) = 0.1666             P(CH = ’none’ | ¬fr) = 0
P(CH = ’paid’ | fr) = 0.1666             P(CH = ’paid’ | ¬fr) = 0.2857
P(CH = ’current’ | fr) = 0.5             P(CH = ’current’ | ¬fr) = 0.2857
P(CH = ’arrears’ | fr) = 0.1666          P(CH = ’arrears’ | ¬fr) = 0.4286
P(GC = ’none’ | fr) = 0.8334             P(GC = ’none’ | ¬fr) = 0.8571
P(GC = ’guarantor’ | fr) = 0.1666        P(GC = ’guarantor’ | ¬fr) = 0
P(GC = ’coapplicant’ | fr) = 0           P(GC = ’coapplicant’ | ¬fr) = 0.1429
P(ACC = ’own’ | fr) = 0.6666             P(ACC = ’own’ | ¬fr) = 0.7857
P(ACC = ’rent’ | fr) = 0.3333            P(ACC = ’rent’ | ¬fr) = 0.1429
P(ACC = ’free’ | fr) = 0                 P(ACC = ’free’ | ¬fr) = 0.0714
Now consider the following query:

CREDIT HISTORY  GUARANTOR/COAPPLICANT  ACCOMMODATION  FRAUD
paid            none                   rent           ?
A Worked Example
The relevant probabilities for this query are:

P(fr) = 0.3                            P(¬fr) = 0.7
P(CH = ’paid’ | fr) = 0.1666           P(CH = ’paid’ | ¬fr) = 0.2857
P(GC = ’none’ | fr) = 0.8334           P(GC = ’none’ | ¬fr) = 0.8571
P(ACC = ’rent’ | fr) = 0.3333          P(ACC = ’rent’ | ¬fr) = 0.1429

∏_{k=1}^{m} P(q[k] | fr) × P(fr) = 0.1666 × 0.8334 × 0.3333 × 0.3 = 0.0139

∏_{k=1}^{m} P(q[k] | ¬fr) × P(¬fr) = 0.2857 × 0.8571 × 0.1429 × 0.7 = 0.0245
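Both scores are easy to verify by multiplying out the probabilities from the table (a quick sketch; the variable names are ours):

```python
# Verifying the two scores by multiplying out the probabilities from
# the table above (variable names are ours).
score_fr = 0.1666 * 0.8334 * 0.3333 * 0.3      # Π P(q[k]|fr) × P(fr)
score_not_fr = 0.2857 * 0.8571 * 0.1429 * 0.7  # Π P(q[k]|¬fr) × P(¬fr)
print(round(score_fr, 4), round(score_not_fr, 4))  # 0.0139 0.0245
```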
Since 0.0245 > 0.0139, the MAP prediction is:
CREDIT HISTORY  GUARANTOR/COAPPLICANT  ACCOMMODATION  FRAUD
paid            none                   rent           ’false’
The model is generalizing beyond the dataset!
Notice that the query instance (CREDIT HISTORY = ’paid’, GUARANTOR/COAPPLICANT = ’none’, ACCOMMODATION = ’rent’) does not match any instance in the training dataset, yet the model was still able to make a prediction for it.
Summary
P(t|d) = P(d|t) × P(t) / P(d)    (2)
A Naive Bayes’ classifier naively assumes that each of the
descriptive features in a domain is conditionally
independent of all of the other descriptive features, given
the state of the target feature.
This assumption, although often wrong, enables the Naive
Bayes’ model to maximally factorise the representation that
it uses of the domain.
Surprisingly, given the naivety and strength of the
assumption it depends upon, a Naive Bayes’ model often
performs reasonably well.