Chapter 4 Solutions: Classification: Basic Concepts
1. Bayesian classifiers can predict class membership probabilities, in other words, __________
a. The probability that a given tuple belongs to a particular class.
b. The probability that a given tuple does not belong to a particular class.
c. None of the above.
2. What is the prior probability?
The prior (a priori) probability is the likelihood of an event before any evidence is taken into
account. (Any definition that conveys the same meaning.)
3. What is the posterior probability?
The posterior probability is the probability that an event will happen after all evidence or background
information has been taken into account. (Any definition that conveys the same meaning.)
4. Which one of the following statements is TRUE for a Decision Tree?
a. Decision tree is only suitable for the classification problem statement.
b. In a decision tree, the entropy of a node decreases as we go down a decision tree.
c. In a decision tree, entropy determines purity.
d. A decision tree can be used only for numeric-valued and continuous attributes.
(Entropy helps to determine the impurity of a node and as we go down the decision tree, entropy
decreases)
5. How do you choose the right node while constructing a decision tree?
a. An attribute having high entropy
b. An attribute having high entropy and information gain
c. An attribute having the lowest information gain.
d. An attribute having the highest information gain.
(We first select the attribute with the maximum information gain.)
6. In a naive Bayes algorithm, when an attribute value in the testing record has no example in the training
set, then the entire posterior probability will be zero.
a. True
b. False
c. None of the above.
(For that attribute value, the class-conditional probability is zero because no training example
contains it. Since the posterior is a product of these probabilities, the whole product becomes zero.
This is known as the zero-probability problem in the Naive Bayes algorithm.)
7. Which of the following statements about Naive Bayes is incorrect?
a. Attributes are equally important.
b. Attributes are statistically dependent on one another given the class value.
c. Attributes are statistically independent of one another given the class value.
d. Attributes can be nominal or numeric
e. All of the above
8. High entropy means that the partitions in classification are?
a. Pure
b. Not pure
c. Useful
d. Useless
e. None of the above
9. Which of the following is considered a disadvantage of Naïve Bayes?
a. Easy to implement.
b. Reduces computational cost.
c. Loss of accuracy due to the class conditional independence.
d. None of the above.
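To illustrate the zero-probability issue from question 6, the following minimal sketch multiplies class-conditional estimates where one attribute value never occurs in the training data for a class. The counts are hypothetical and invented only for this illustration. The add-one (Laplace) estimate shown alongside is a commonly used remedy that is not part of the question itself; the assumption of 3 distinct values per attribute is also illustrative.

```python
from math import prod

# Hypothetical counts for one class, invented purely for illustration:
# (training tuples in the class that have the tested value, tuples in the class).
counts = [(3, 9), (0, 9), (6, 9)]  # the second attribute value never occurs

# Unsmoothed estimate: a single zero factor wipes out the whole product.
unsmoothed = prod(n / total for n, total in counts)

# Add-one (Laplace) estimate, assuming each attribute has 3 distinct values.
num_values = 3
smoothed = prod((n + 1) / (total + num_values) for n, total in counts)

print(unsmoothed)          # 0.0
print(round(smoothed, 4))
```

With the unsmoothed estimate the class can never be predicted for such a record, regardless of how strongly the other attributes support it.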
For questions 10 and 11, show ALL calculations and workings. You will be required to submit your
answers next week.
Decision Tree Classification
10) For the following medical diagnosis data, create a decision tree using the ID3 method.
Strep Throat = 3
Allergy = 3
Cold = 4
-----
10
$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$
$$\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \mathrm{Info}(D_j)$$
$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$$
Each attribute is discrete
Attribute Selection: Information Gain
The expected information needed to classify a tuple in D:
$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i) = -\frac{3}{10}\log_2\frac{3}{10} - \frac{3}{10}\log_2\frac{3}{10} - \frac{4}{10}\log_2\frac{4}{10} = 1.571$$
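The value 1.571 can be checked with a few lines of Python. This is a minimal sketch; the function name `info` is chosen here to mirror the notation Info(D) and is not taken from any library.

```python
from math import log2

def info(counts):
    """Expected information (entropy, in bits) of a class distribution given as counts."""
    total = sum(counts)
    # Classes with zero count contribute nothing (0 * log2(0) is taken as 0).
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# Class counts from the question: Strep Throat = 3, Allergy = 3, Cold = 4.
info_d = info([3, 3, 4])
print(round(info_d, 3))  # 1.571
```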
Finding the splitting attribute:
Let us determine the expected information needed to classify a
tuple in D if the tuples are partitioned according to:
(i) Sore throat
Sore Throat (ST)   Strep Throat   Allergy   Cold   Total
Yes                2              1         2      5
No                 1              2         2      5
"ST = yes" and "ST = no" each cover 5 out of the 10 samples.
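$$\mathrm{Info}_{ST}(D) = \frac{5}{10} \times \left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{1}{5}\log_2\frac{1}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) + \frac{5}{10} \times \left(-\frac{1}{5}\log_2\frac{1}{5} - \frac{2}{5}\log_2\frac{2}{5} - \frac{2}{5}\log_2\frac{2}{5}\right)$$
$$= 0.5 \times 1.52 + 0.5 \times 1.52 = 1.52 \text{ bits}$$
Hence, the gain in information from such a partitioning would be:
$$\mathrm{Gain}(ST) = \mathrm{Info}(D) - \mathrm{Info}_{ST}(D) = 1.571 - 1.52 = 0.051$$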
(ii) Fever
Fever (F)   Strep Throat   Allergy   Cold
Yes         1              0         4
No          2              3         0
$$\mathrm{Info}_{F}(D) = \frac{5}{10} \times \left(-\frac{1}{5}\log_2\frac{1}{5} - \frac{4}{5}\log_2\frac{4}{5}\right) + \frac{5}{10} \times \left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right)$$
$$= 0.5 \times 0.722 + 0.5 \times 0.971 = 0.85 \text{ bits}$$
Hence, the gain in information from such a partitioning would be:
$$\mathrm{Gain}(F) = \mathrm{Info}(D) - \mathrm{Info}_{F}(D) = 1.571 - 0.85 = 0.721$$
(iii) Swollen Glands
Swollen Glands (SG)   Strep Throat   Allergy   Cold
Yes                   3              0         0
No                    0              3         4
$$\mathrm{Info}_{SG}(D) = \frac{3}{10} \times \left(-\frac{3}{3}\log_2\frac{3}{3}\right) + \frac{7}{10} \times \left(-\frac{3}{7}\log_2\frac{3}{7} - \frac{4}{7}\log_2\frac{4}{7}\right)$$
$$= 0.3 \times 0 + 0.7 \times 0.985 = 0.69 \text{ bits}$$
Hence, the gain in information from such a partitioning would be:
$$\mathrm{Gain}(SG) = \mathrm{Info}(D) - \mathrm{Info}_{SG}(D) = 1.571 - 0.69 = 0.88$$
(iv) Congestion
Congestion (C)   Strep Throat   Allergy   Cold
Yes              1              3         4
No               2              0         0
$$\mathrm{Info}_{C}(D) = \frac{8}{10} \times \left(-\frac{1}{8}\log_2\frac{1}{8} - \frac{3}{8}\log_2\frac{3}{8} - \frac{4}{8}\log_2\frac{4}{8}\right) + \frac{2}{10} \times \left(-\frac{2}{2}\log_2\frac{2}{2}\right)$$
$$= 0.8 \times 1.405 + 0 = 1.124 \text{ bits}$$
Hence, the gain in information from such a partitioning would be:
$$\mathrm{Gain}(C) = \mathrm{Info}(D) - \mathrm{Info}_{C}(D) = 1.571 - 1.124 = 0.45$$
(v) Headache
Headache (H)   Strep Throat   Allergy   Cold
Yes            1              2         2
No             2              1         2
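$$\mathrm{Info}_{H}(D) = \frac{5}{10} \times \left(-\frac{1}{5}\log_2\frac{1}{5} - \frac{2}{5}\log_2\frac{2}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) + \frac{5}{10} \times \left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{1}{5}\log_2\frac{1}{5} - \frac{2}{5}\log_2\frac{2}{5}\right)$$
$$= 0.5 \times 1.52 + 0.5 \times 1.52 = 1.52 \text{ bits}$$
Hence, the gain in information from such a partitioning would be:
$$\mathrm{Gain}(H) = \mathrm{Info}(D) - \mathrm{Info}_{H}(D) = 1.571 - 1.52 = 0.051$$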
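The five gains can be reproduced from the count tables with a short script. This is a sketch under the assumption that the per-attribute counts in the tables are complete; the helper names `info` and `info_after_split` are illustrative and follow the notation Info(D) and Info_A(D).

```python
from math import log2

def info(counts):
    """Entropy in bits of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_after_split(partitions):
    """Weighted entropy Info_A(D) over the partitions produced by attribute A."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * info(p) for p in partitions)

# (Strep Throat, Allergy, Cold) counts for the Yes and No branch of each attribute.
splits = {
    "Sore Throat":    [(2, 1, 2), (1, 2, 2)],
    "Fever":          [(1, 0, 4), (2, 3, 0)],
    "Swollen Glands": [(3, 0, 0), (0, 3, 4)],
    "Congestion":     [(1, 3, 4), (2, 0, 0)],
    "Headache":       [(1, 2, 2), (2, 1, 2)],
}

info_d = info((3, 3, 4))
gains = {a: info_d - info_after_split(p) for a, p in splits.items()}
root = max(gains, key=gains.get)

for attribute, g in gains.items():
    print(f"{attribute}: {g:.3f}")
print("Root:", root)  # Root: Swollen Glands
```

Because the worked solution rounds intermediate values to two or three decimals, the unrounded gains differ slightly in the last digit (for example, about 0.049 rather than 0.051 for Sore Throat and about 0.725 rather than 0.721 for Fever). The ranking of the attributes is unchanged.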
Swollen Glands has the highest gain (0.88), so it is the root node.
Swollen Glands
  Yes: Strep Throat
  No: Fever
    Yes: Cold
    No: Allergy
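The choice of Fever at the second level can be checked the same way. The sketch below assumes, as the Swollen Glands table implies, that the "Swollen Glands = No" partition contains exactly the 3 Allergy and 4 Cold tuples (all three Strep Throat tuples have Swollen Glands = Yes), so the counts for that partition are the original counts with the Strep Throat column dropped. The helper names are illustrative.

```python
from math import log2

def info(counts):
    """Entropy in bits of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gain(partitions):
    """Information gain of a split; each partition holds per-class counts."""
    # Sum the branches class by class to recover the parent distribution.
    parent = [sum(col) for col in zip(*partitions)]
    n = sum(parent)
    return info(parent) - sum(sum(p) / n * info(p) for p in partitions)

# (Allergy, Cold) counts for the Yes and No branch of each remaining attribute,
# restricted to the tuples with Swollen Glands = No.
remaining = {
    "Sore Throat": [(1, 2), (2, 2)],
    "Fever":       [(0, 4), (3, 0)],
    "Congestion":  [(3, 4), (0, 0)],
    "Headache":    [(2, 2), (1, 2)],
}

gains = {a: gain(p) for a, p in remaining.items()}
best = max(gains, key=gains.get)
print(best, round(gains[best], 3))  # Fever 0.985
```

Fever separates the partition perfectly (Fever = Yes gives only Cold, Fever = No gives only Allergy), so both branches become leaves and the tree is complete.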
11) For the given test record, use the Naive Bayes classifier to determine whether the weather is suitable for playing
golf. X = (outlook = sunny, temperature = mild, humidity = normal, wind = true)
Yes = 9
No = 5
-----
14
Each attribute is discrete
◼ P(Ci): P(play = “yes”) = 9/14 = 0.643
P(play = “no”) = 5/14= 0.357
◼ Compute P(X|Ci) for each class
P(outlook = “sunny” | play = “yes”) = 2/9 = 0.222
P(outlook = “sunny” | play = “no”) = 3/5 = 0.6
P(temperature = “mild” | play = “yes”) = 4/9 = 0.444
P(temperature = “mild” | play = “no”) = 2/5 = 0.4
P(humidity = “normal” | play = “yes”) = 6/9 = 0.667
P(humidity = “normal” | play = “no”) = 1/5 = 0.2
P(wind = “true” | play = “yes”) = 6/9 = 0.667
P(wind = “true” | play = “no”) = 2/5 = 0.4
◼ X = (outlook = sunny, temperature = mild, humidity = normal, wind = true)
P(X|Ci): P(X|play = “yes”) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X|play = “no”) = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
P(X|Ci) × P(Ci): P(X|play = “yes”) × P(play = “yes”) = 0.044 × 0.643 = 0.028
P(X|play = “no”) × P(play = “no”) = 0.019 × 0.357 = 0.007
Therefore, since 0.028 > 0.007, X belongs to class (play = “yes”).
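The same comparison can be carried out in Python. This minimal sketch uses the priors and class-conditional probabilities exactly as listed in the worked solution; it does not recompute them from the training table, which is not reproduced here. The variable names are illustrative.

```python
from fractions import Fraction as F
from math import prod

# Priors and class-conditional probabilities as listed in the worked solution.
prior = {"yes": F(9, 14), "no": F(5, 14)}
likelihood = {
    # outlook = sunny, temperature = mild, humidity = normal, wind = true
    "yes": [F(2, 9), F(4, 9), F(6, 9), F(6, 9)],
    "no":  [F(3, 5), F(2, 5), F(1, 5), F(2, 5)],
}

# Unnormalised posterior P(X|Ci) * P(Ci) for each class.
score = {c: prod(likelihood[c]) * prior[c] for c in prior}
prediction = max(score, key=score.get)

for c, s in score.items():
    print(c, round(float(s), 3))   # yes 0.028, then no 0.007
print("Predicted class:", prediction)  # Predicted class: yes
```

The scores are not normalised by P(X), since the same divisor applies to both classes and does not change which one is larger.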