Lecture 5 Bayesian

The document outlines the principles of probabilistic learning, focusing on Bayesian learning and Naïve Bayes classification. It explains how Bayesian classifiers utilize Bayes' theorem to determine the probability of a given sample belonging to a particular class, and discusses the advantages and disadvantages of Naïve Bayes classifiers. Practical examples illustrate the application of these concepts in classification problems.


Outline

• Motivation of Probabilistic Learning


• Bayesian Thinking Analogy
• Bayesian Learning
• Naïve Bayes Classification
• Practical Examples
• Advantages and Disadvantages

Classification Problem

• Training data: examples of the form (d,h(d))


– where d are the data objects to classify (inputs)
– and h(d) is the correct class label for d, h(d) ∈ {1,…,K}
• Goal: given d_new, provide h(d_new)

Motivation

Basic Classification

• Spam filtering: input "!!!!$$$!!!!", output Spam vs. Not-Spam (binary)
• Character recognition: input "C", output C vs. other 25 characters (multi-class)
Outline

• Motivation of Probabilistic Learning


• Bayesian Thinking Analogy
• Bayesian Learning
• Naïve Bayes Classification
• Practical Examples
• Advantages and Disadvantages

Bayesian Thinking
Outline

• Motivation of Probabilistic Learning


• Bayesian Thinking Analogy
• Bayesian Learning
• Naïve Bayes Classification
• Practical Examples
• Advantages and Disadvantages

Bayesian Learning

• Bayesian Classifiers

– Bayesian classifiers are statistical classifiers based on Bayes' theorem

– They can calculate the probability that a given sample belongs to a particular class

– Bayesian learning algorithms are among the most practical approaches to certain types of
learning problems.

– Their results are comparable in many cases to the performance of other classifiers, such as
decision trees and neural networks

– Optimal decisions can be made by reasoning about these probabilities together with
observed data

Why Bayesian?

• Provides practical learning algorithms


– E.g. Naïve Bayes

• Prior knowledge and observed data can be


combined

• It is a generative (model based) approach,


which offers a useful conceptual framework
– Any kind of object can be classified, based on a probabilistic
model specification
Bayes Theorem

• In machine learning we are interested in determining the best hypothesis from the hypothesis

space H, given the observed training data D.

• One way to specify what we mean by the best hypothesis is to say that we
demand the most probable hypothesis, given the data D plus any initial
knowledge about the prior probabilities of the various hypotheses in H
• best hypothesis ≈ most probable hypothesis

• Bayes theorem provides a direct method for calculating such probabilities.


P(h | D) = P(D | h) P(h) / P(D)
Bayes Theorem

P(h|D) is called the posterior probability of h, because it reflects our confidence that h holds
after we have seen the training data D. This posterior probability is the quantity we are
interested in evaluating.

Bayes' Theorem Example
Two equal-size sacks A and B contain red and white balls. We are interested in the
probability that a chosen ball came from sack A, given that the ball is red.

Given data:
• 60% of the balls in sack A are white and the rest are red.
• Sack B contains only red balls.

(h) Ball is chosen from sack A.    (D) Ball is red.

P(h|D) = P(D|h) * P(h) / P(D) = (0.4 * 0.5) / 0.7 = 0.29 (29%)
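The slide uses P(D) = 0.7 without showing where it comes from; since the two sacks are chosen
with equal probability (P(h) = P(¬h) = 0.5), it follows from the law of total probability:

P(D) = P(D \mid h)\,P(h) + P(D \mid \lnot h)\,P(\lnot h) = 0.4 \times 0.5 + 1.0 \times 0.5 = 0.7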


Bayes' Theorem Example
Interested in finding out a patient's probability of
having liver disease if they are an alcoholic.

(h) Patient has liver disease. (D) Patient is alcoholic.

Past data:
• 10% of patients entering your clinic have liver disease: P(h) = 0.10
• Five percent of the clinic's patients are alcoholics: P(D) = 0.05

You might also know that among those patients diagnosed with liver disease, 7% are alcoholics:
P(D|h) = 0.07.
Bayes' theorem tells you: P(h|D) = (0.07 * 0.1) / 0.05 = 0.14
If the patient is an alcoholic, their chance of having liver disease is 0.14 (14%).
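The same arithmetic as a quick Python check (a minimal sketch; the variable names are mine, not
from the slides):

# Bayes' theorem: P(h|D) = P(D|h) * P(h) / P(D)
p_h = 0.10           # prior: patient has liver disease
p_d = 0.05           # evidence: patient is an alcoholic
p_d_given_h = 0.07   # likelihood: alcoholic, given liver disease

p_h_given_d = p_d_given_h * p_h / p_d
print(round(p_h_given_d, 2))   # 0.14, i.e. 14%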
Choosing Hypothesis (MAP Hypothesis)

• In many learning scenarios, the learner considers some set of candidate


hypotheses H and is interested in finding the most probable hypothesis h∈H
given the observed training data D

• Any maximally probable hypothesis is called a


– maximum a posteriori (MAP) hypothesis

P(D) can be dropped, because it is a constant independent of h.
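Spelled out (standard form, consistent with the definitions above):

h_{MAP} = \arg\max_{h \in H} P(h \mid D)
        = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)}
        = \arg\max_{h \in H} P(D \mid h)\,P(h)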


Choosing Hypothesis (ML Hypothesis)

• Sometimes it is assumed that every hypothesis is equally probable a priori

• In this case, the equation can be further simplified. P(D|h) is often called the likelihood
of D given h, and any hypothesis that maximizes P(D|h) is called a maximum likelihood (ML)
hypothesis.

P(h) can be dropped, because it is equal for each h ∈ H
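In symbols (standard form, assuming the equal priors stated above):

h_{ML} = \arg\max_{h \in H} P(D \mid h)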


Bayes Theorem and Concept Learning

• What is the relationship between Bayes theorem and the problem of concept
learning?

• It can be used for designing a straightforward learning algorithm

• Brute-Force MAP LEARNING algorithm (a minimal sketch follows below)


– For each hypothesis h ∈ H, calculate the posterior probability P(h|D)

– Output the hypothesis h_MAP with the highest posterior probability.
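A minimal sketch of the brute-force procedure (the helper functions prior(h) and likelihood(D, h)
are assumptions for illustration, not part of the lecture):

def brute_force_map(hypotheses, D, prior, likelihood):
    """Return the hypothesis maximizing P(D|h) * P(h).

    P(D) is dropped: it is the same constant for every h, so it does not
    change which hypothesis is maximal.
    """
    best_h, best_score = None, float("-inf")
    for h in hypotheses:
        score = likelihood(D, h) * prior(h)  # proportional to the posterior P(h|D)
        if score > best_score:
            best_h, best_score = h, score
    return best_h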

Outline

• Motivation of Probabilistic Learning


• Bayesian Thinking Analogy
• Bayesian Learning
• Naïve Bayes Classification
• Practical Examples
• Advantages and Disadvantages

Probability Model for Classifier

• Say, class label C has k distinct values: c1…ck


• Goal:
– Given values for all the features, we want to predict the
probability for C=c1, C=c2,…C=ck
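By Bayes' theorem (notation for the features x_1, ..., x_n is mine), this means computing, for
each class value c_j:

P(C = c_j \mid x_1, \ldots, x_n) = \frac{P(x_1, \ldots, x_n \mid c_j)\,P(c_j)}{P(x_1, \ldots, x_n)}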

Naive Bayes Classifier

• Special, simple optimal classifier, where


– hypothesis = classification
– all attributes are independent given the class

(Diagram: a single class node with arrows to attrib. 1, attrib. 2 and attrib. 3; each attribute
depends only on the class.)
Probability Model for Classifier

• Assuming conditional independence:
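Under this assumption the class-conditional distribution factorizes (the standard naïve Bayes
form, reconstructed here since the slide's formula is not reproduced above):

P(x_1, \ldots, x_n \mid c_j) = \prod_{i=1}^{n} P(x_i \mid c_j),
\qquad\text{so}\qquad
P(c_j \mid x_1, \ldots, x_n) \propto P(c_j)\,\prod_{i=1}^{n} P(x_i \mid c_j)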

Naïve Bayes Classifiers - Properties

• Estimating the conditional probability of each attribute given the class, P(x_i | c_j),
instead of calculating the joint distribution of all attributes, P(x_1, x_2, ..., x_n | c_j),
greatly reduces the number of parameters.

• The learning step in Naïve Bayes consists of estimating


P(x_i | c_j) and P(c_j) based on the frequencies in the training data.

• An unseen instance is classified by computing the class that maximizes


the posterior
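A minimal sketch of this estimation and prediction step for categorical attributes (my own
illustration, not code from the lecture):

import math
from collections import Counter, defaultdict

def train_naive_bayes(X, y):
    """Estimate P(c_j) and P(x_i | c_j) as relative frequencies in the training data."""
    n = len(y)
    class_counts = Counter(y)                # how often each class occurs
    value_counts = defaultdict(Counter)      # (class, attribute index) -> value counts
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            value_counts[(c, i)][v] += 1
    priors = {c: cnt / n for c, cnt in class_counts.items()}   # P(c_j)
    def cond_prob(i, v, c):                                    # P(x_i = v | c_j)
        return value_counts[(c, i)][v] / class_counts[c]
    return priors, cond_prob

def predict(xs, priors, cond_prob):
    """Pick the class maximizing P(c_j) * prod_i P(x_i | c_j)."""
    scores = {c: p * math.prod(cond_prob(i, v, c) for i, v in enumerate(xs))
              for c, p in priors.items()}
    return max(scores, key=scores.get)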

Outline

• Motivation of Probabilistic Learning


• Bayesian Thinking Analogy
• Bayesian Learning
• Naïve Bayes Classification
• Practical Examples
• Advantages and Disadvantages

Naïve Bayes Classifiers – Play Tennis Example
Day Outlook Temperature Humidity Wind PlayTennis

Day1 Sunny Hot High Weak No


Day2 Sunny Hot High Strong No
Day3 Overcast Hot High Weak Yes
Day4 Rain Mild High Weak Yes
Day5 Rain Cool Normal Weak Yes
Day6 Rain Cool Normal Strong No
Day7 Overcast Cool Normal Strong Yes
Day8 Sunny Mild High Weak No
Day9 Sunny Cool Normal Weak Yes
Day10 Rain Mild Normal Weak Yes
Day11 Sunny Mild Normal Strong Yes
Day12 Overcast Mild High Strong Yes
Day13 Overcast Hot Normal Weak Yes
Day14 Rain Mild High Strong No

For the day <sunny, cool, high, strong>, what’s the play prediction?

Naïve Bayes Classifiers – Play Tennis Example


The evidence E relates all attributes, without exceptions.

Outlook   Temp.   Humidity   Windy   Play
Sunny     Cool    High       True    ?        (evidence E)

Probability of class "yes":

Pr[yes | E] = Pr[Outlook = Sunny | yes]
            × Pr[Temperature = Cool | yes]
            × Pr[Humidity = High | yes]
            × Pr[Windy = True | yes]
            × Pr[yes] / Pr[E]
Outlook            Temperature        Humidity           Windy              Play
          Yes No             Yes No             Yes No             Yes No   Yes  No
Sunny      2   3   Hot        2   2   High       3   4   False      6   2    9    5
Overcast   4   0   Mild       4   2   Normal     6   1   True       3   3
Rainy      3   2   Cool       3   1

Sunny     2/9 3/5  Hot       2/9 2/5  High      3/9 4/5  False     6/9 2/5  9/14 5/14
Overcast  4/9 0/5  Mild      4/9 2/5  Normal    6/9 1/5  True      3/9 3/5
Rainy     3/9 2/5  Cool      3/9 1/5

Outlook Temp Humidity Windy Play

Sunny Hot High False No

Sunny Hot High True No

Overcast Hot High False Yes

Rainy Mild High False Yes

Rainy Cool Normal False Yes

Rainy Cool Normal True No

Overcast Cool Normal True Yes

Sunny Mild High False No

Sunny Cool Normal False Yes

Rainy Mild Normal False Yes

Sunny Mild Normal True Yes

Overcast Mild High True Yes

Overcast Hot Normal False Yes

Rainy Mild High True No



Compute Prediction For New Day

Outlook   Temp.   Humidity   Windy   Play
Sunny     Cool    High       True    ?

To compute the prediction for the new day (using the relative frequencies from the table above):

Pr[yes | E] = Pr[Outlook = Sunny | yes]
            × Pr[Temperature = Cool | yes]
            × Pr[Humidity = High | yes]
            × Pr[Windy = True | yes]
            × Pr[yes] / Pr[E]

Likelihood of the two classes:
For "yes" = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
For "no"  = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206

Conversion into a probability by normalization:
P("yes") = 0.0053 / (0.0053 + 0.0206) = 0.205
P("no")  = 0.0206 / (0.0053 + 0.0206) = 0.795
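A quick numeric check of these figures (a minimal sketch, not code from the lecture):

# Relative frequencies read off the table above
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # likelihood of "yes": ~0.0053
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # likelihood of "no":  ~0.0206

total = p_yes + p_no
print(round(p_yes / total, 3))   # 0.205
print(round(p_no / total, 3))    # 0.795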
Naïve Bayes Classifiers – Buy Computer Example

Naïve Bayes Classifier – Fruit Example
Outline

• Motivation of Probabilistic Learning


• Bayesian Thinking Analogy
• Bayesian Learning
• Naïve Bayes Classification
• Practical Examples
• Advantages and Disadvantages


Naïve Bayesian Classifier:


Advantages and Disadvantages
• Advantages :
– Easy to implement.
– Good results obtained in most of the cases.
• Disadvantages
– Assumption: class conditional independence, therefore loss of accuracy
– Practically, dependencies exist among variables
– E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.),
  disease (lung cancer, diabetes, etc.)
– Dependencies among these cannot be modeled by a Naïve Bayesian classifier.
• How to deal with these dependencies?
– Bayesian Belief Networks.
Summary

• Combine prior knowledge with observed data

• Computationally efficient classifier

• Can be extended to capture real-world learning problems
