Q1. Consider the following data, where the Y label is whether or not the child goes out to play.
Q2. The following table records the factors influencing the decision to play tennis outside over the previous 14 days. Build a decision tree using the ID3 algorithm.
Q3. A dataset has the following class distributions before and after a split:
Before Split
• Class 1: 10 samples
• Class 2: 10 samples
After Split
Left Node:
• Class 1: 8, Class 2: 2
Right Node:
• Class 1: 2, Class 2: 8
Calculate the information gain if the entropy before the split is 1.
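A minimal Python sketch of this calculation, assuming the standard definition of information gain as the entropy before the split minus the instance-weighted entropy of the child nodes:

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

# Q3: entropy before the split is given as 1 (a 10/10 class balance).
left, right = [8, 2], [2, 8]
n = sum(left) + sum(right)
weighted = sum(left) / n * entropy(left) + sum(right) / n * entropy(right)
print(round(1.0 - weighted, 3))         # information gain ~0.278
```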
Q4. A bank uses a decision tree with the following rules:
1. If credit score ≥ 700 → Approve
2. If income ≥ $50,000 & credit score < 700 → Approve
3. Otherwise → Reject
If a customer has:
• Credit Score: 680
• Income: $55,000
Will they be approved?
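A short sketch encoding the three rules exactly as listed (thresholds taken from the question):

```python
# Q4: the bank's rules applied in order.
credit_score, income = 680, 55_000

if credit_score >= 700:                 # rule 1
    decision = "Approve"
elif income >= 50_000:                  # rule 2 (credit score < 700 here)
    decision = "Approve"
else:                                   # rule 3
    decision = "Reject"
print(decision)                         # "Approve" -- rule 2 fires
```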
Q5. A VFDT model starts with an error rate of 12%, but after training on 500,000 instances,
the error drops to 7%.
Compute the percentage reduction in error.
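A one-line check, assuming "percentage reduction" is measured relative to the initial error rate:

```python
# Q5: relative reduction in error rate.
initial, final = 0.12, 0.07
print(round((initial - final) / initial * 100, 1))   # ~41.7 %
```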
Q6. A standard decision tree takes 3 seconds per 1,000 instances, while VFDT processes
10,000 instances per second.
How much faster is VFDT?
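A quick throughput comparison, with both rates converted to instances per second:

```python
# Q6: throughput ratio of VFDT to the batch tree.
batch_rate = 1_000 / 3                  # ~333 instances per second
vfdt_rate = 10_000                      # instances per second
print(vfdt_rate / batch_rate)           # 30.0 -> VFDT is 30x faster
```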
Q7. An exhaustive search model evaluates 2^50 feature sets, while a heuristic model evaluates only 1 million sets.
What percentage of the total feature space does the heuristic model check?
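A sketch of the percentage calculation, under the reading that the exhaustive search covers 2^50 feature sets:

```python
# Q7: fraction of the 2^50-set feature space covered by 1 million evaluations.
total = 2 ** 50
checked = 1_000_000
print(checked / total * 100)            # ~8.9e-8 percent
```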
Q8. Draw the decision tree for the given dataset.
Q9. A model has Class A accuracy: 90% and Class B accuracy: 80%.
After drift, accuracies drop to Class A: 78%, Class B: 72%.
Compute the overall drop in performance.
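A sketch assuming the overall accuracy is the unweighted mean of the two class accuracies (the question does not give class proportions):

```python
# Q9: overall accuracy taken as the mean of the class accuracies.
before = (0.90 + 0.80) / 2              # 0.85
after = (0.78 + 0.72) / 2               # 0.75
print(round((before - after) * 100, 1))             # 10.0 percentage points
print(round((before - after) / before * 100, 1))    # ~11.8 % relative drop
```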
Q10. A model initially had an accuracy of 75%, which dropped to 60% due to concept drift.
After an adaptive method was applied, accuracy improved to 68%.
Calculate the percentage of lost accuracy recovered.
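A sketch of the recovery calculation, taking "lost accuracy" as the 15-point drop from 75% to 60%:

```python
# Q10: share of the lost accuracy that the adaptive method recovered.
original, dropped, adapted = 75, 60, 68
print(round((adapted - dropped) / (original - dropped) * 100, 1))  # ~53.3 %
```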
Q11. A real-time fraud detection model has the following accuracy over time:
• Week 1: 95%
• Week 2: 93%
• Week 3: 85%
• Week 4: 70%
Compute the overall percentage drop in accuracy.
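A sketch measuring the drop from Week 1 to Week 4, relative to the Week 1 accuracy:

```python
# Q11: overall relative drop across the four weeks.
weekly = [95, 93, 85, 70]
print(round((weekly[0] - weekly[-1]) / weekly[0] * 100, 1))  # ~26.3 %
```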
Q12. A dataset has three classes with the following proportions:
• Class A: 40%
• Class B: 35%
• Class C: 25%
Compute the entropy of the dataset.
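A direct entropy computation from the given proportions:

```python
from math import log2

# Q12: entropy of a three-class distribution.
p = [0.40, 0.35, 0.25]
print(round(-sum(pi * log2(pi) for pi in p), 3))  # ~1.559 bits
```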
Q13. A dataset is split into two subsets:
• Subset 1: (Class A = 20, Class B = 10)
• Subset 2: (Class A = 5, Class B = 15)
Compute the Gini Index after the split.
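A sketch computing the instance-weighted Gini index of the two subsets:

```python
def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Q13: weighted Gini index after the split.
s1, s2 = [20, 10], [5, 15]
n = sum(s1) + sum(s2)
weighted = sum(s1) / n * gini(s1) + sum(s2) / n * gini(s2)
print(round(weighted, 4))               # ~0.4167
```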
Q14. A batch decision tree takes O(n²) time for training, while VFDT takes O(n log n).
For n = 100,000, compute the ratio of their complexities.
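A sketch of the ratio, assuming base-2 logarithms and ignoring constant factors; the ratio reduces to n / log n:

```python
from math import log2

# Q14: ratio of O(n^2) to O(n log n) at n = 100,000.
n = 100_000
print(round(n**2 / (n * log2(n))))      # ~6021
```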
Q15. A VFDT model has seen 100,000 instances, and two attributes have the following observed information gains:
• IG1 = 0.05
• IG2 = 0.04
Given a confidence threshold of δ = 0.01, determine whether a split is made using the Hoeffding bound:
ε = √(ln(1/δ) / (2N))
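A sketch of the split test, using the bound as given (the Hoeffding bound with range R = 1); in VFDT a split is made when the observed gain difference exceeds ε:

```python
from math import log, sqrt

# Q15: Hoeffding bound with N = 100,000 and delta = 0.01.
N, delta = 100_000, 0.01
eps = sqrt(log(1 / delta) / (2 * N))
delta_ig = 0.05 - 0.04                  # IG1 - IG2
print(round(eps, 4))                    # ~0.0048
print(delta_ig > eps)                   # True -> the split is made
```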
Q16. A dataset has a binary attribute A with values A1 and A2, whose probabilities are
P(A1) = 0.7 and P(A2) = 0.3.
Calculate the entropy of A.
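A direct computation of the binary entropy:

```python
from math import log2

# Q16: binary entropy of attribute A.
p1, p2 = 0.7, 0.3
print(round(-(p1 * log2(p1) + p2 * log2(p2)), 3))  # ~0.881 bits
```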
Q17. A decision tree considers a split on Feature A and Feature B. The dataset entropy is
initially 0.94.
After splitting:
• Feature A Split: weighted entropy = 0.75
• Feature B Split: weighted entropy = 0.68
Compute the entropy reduction percentage for each feature and determine which split is preferred.
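A sketch computing each feature's entropy reduction relative to the initial entropy:

```python
# Q17: relative entropy reduction for each candidate split.
h0 = 0.94
for name, h in [("A", 0.75), ("B", 0.68)]:
    print(name, round((h0 - h) / h0 * 100, 1), "%")
# A: ~20.2 %, B: ~27.7 % -> the Feature B split is preferred
```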
Q18. A VFDT model receives 250,000 instances, and the confidence threshold is δ = 0.05.
Compute the Hoeffding bound: ε = √(ln(1/δ) / (2N)).
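Plugging the given values into the bound:

```python
from math import log, sqrt

# Q18: Hoeffding bound with N = 250,000 and delta = 0.05.
N, delta = 250_000, 0.05
print(round(sqrt(log(1 / delta) / (2 * N)), 5))  # ~0.00245
```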
Q19. A dataset contains 4 classes with the following distribution before a split:
• Class A: 50 instances
• Class B: 30 instances
• Class C: 20 instances
• Class D: 100 instances
Compute the entropy before split.
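A direct entropy computation over the four class counts (200 instances in total):

```python
from math import log2

# Q19: entropy of the class distribution before the split.
counts = [50, 30, 20, 100]
total = sum(counts)
print(round(-sum(c / total * log2(c / total) for c in counts), 3))  # ~1.743 bits
```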
Q20. Construct a decision tree using the Gini index.