1. Give 5 properties of Python and explain why Python is suitable for Data Mining.
Python is an easy-to-use, object-oriented, readable, expressive, open-source, and portable programming language, with a large collection of libraries implementing data mining algorithms.
2. Write the output of the following code.
<class 'tuple'>
[2.0, 100, 5]
3. (10 pts) Please make the necessary change in the given code so that it doesn’t give the following
error message and works as commented:
Line 1 must be → import pandas as pd
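For illustration, a minimal sketch of the kind of fix intended (the original listing and error message are not reproduced here; the DataFrame usage below is hypothetical and only assumes the later lines refer to pandas through the pd alias):

# Line 1 fixed: import pandas under the alias pd so that later references
# to pd resolve (otherwise Python raises: NameError: name 'pd' is not defined).
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # hypothetical continuation
print(df)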
4. What is the output of the following code? (Hint: if the loop test is False, then
execution jumps to the else: row.)
4 320
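As a side note, a tiny illustrative snippet (not the exam's code) showing the while/else behavior the hint describes:

i = 1
total = 1
while i < 4:
    i += 1
    total *= i
else:
    # Runs once the loop test (i < 4) becomes False.
    print(i, total)  # prints: 4 24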
5. What is the output of the following code?
True
False
None
6. We are writing a sublist function which compares two lists and returns True if the first list (lst1) is
a sublist of (is contained inside) the second list (lst2). We created a version of the second list, ls2,
by eliminating all elements of lst2 which are not in the first list, to see whether the final lists are
the same. However, even though the final lists contain the same elements, the comparison evaluates to False.
This output needs to be True, since the elements of the list [2,1] are also elements of the list [1,2,5,3].
What property of lists can we use in the comparison (?==?) so that the function gives the correct result
(True) in the given example above?
Line 4 must be → return sorted(lst1) == sorted(ls2)
Note: Another sublist function given in the Apriori algorithm code runs faster.
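For reference, a minimal sketch of the corrected function (an assumed reconstruction, since the exam's full listing is not shown here; only the fixed line 4 is given verbatim above):

def sublist(lst1, lst2):
    # Keep only the elements of lst2 that also occur in lst1.
    ls2 = [x for x in lst2 if x in lst1]
    # Line 4 fixed: sorting both lists makes the comparison order-insensitive.
    return sorted(lst1) == sorted(ls2)

print(sublist([2, 1], [1, 2, 5, 3]))  # True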
7. What is the output of the following code?
25
81
75
8. a) Describe the difference between unsupervised and supervised
techniques of Data Mining and give an example of each. b) Define overfitting
(o.f.); for which of the above techniques is o.f. a problem?
a) Supervised techniques can be used when a labeled dataset is available for
training and testing, whereas unsupervised techniques do not require a
labeled dataset. Unsupervised techniques are used to detect new patterns
and clusters in relatively unknown or unstructured data; supervised
techniques are used to predict future data when there is enough structured
and analyzed past information. K-means clustering is an unsupervised
technique; decision tree analysis is an example of a supervised technique.
b) Overfitting is a problem of supervised techniques: the model is customized
too closely to the specific training data at hand, so it does not perform
well on the test data and future data.
9. Describe the predictive modeling process. Which techniques are most suitable
for modeling datasets with nominal categories? Give 2 examples of these
techniques.
In predictive modeling, a labeled dataset is split into two parts, a training
dataset and a test dataset. A model is built using the training dataset, and
the test dataset is fed into the model to predict its labels. The actual labels
of the test dataset and the predicted labels are compared to evaluate the
performance of the model.
Classification techniques are most suited to predicting or describing datasets
with binary or nominal categories. Decision Trees and Rule-Based Classifiers
are examples.
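A minimal sketch of this workflow, assuming scikit-learn and its bundled Iris dataset (any classifier and labeled dataset would do):

# Predictive modeling: split a labeled dataset, train a classifier,
# then compare predicted and actual labels on the held-out test set.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # labeled dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)  # training/test split
model = DecisionTreeClassifier().fit(X_train, y_train)  # build model on training data
y_pred = model.predict(X_test)  # predict labels of the test data
print(accuracy_score(y_test, y_pred))  # evaluate: actual vs. predicted labels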
10. At one stage in K-Means Clustering of the given data set with two
attributes, distances of the points to each centroid are given in the following
table. What will be the centroid coordinates in the next stage?
Id x y distance_from_1 distance_from_2 distance_from_3
0 12 39 26.93 56.08 56.73
1 28 30 14.14 41.76 53.34
2 29 54 38.12 40.80 34.06
3 24 55 39.05 45.88 37.44
4 45 63 50.70 31.14 16.40
5 52 70 59.93 32.25 6.71
6 52 63 53.71 26.40 13.34
7 55 58 51.04 20.62 18.00
8 53 23 27.89 24.21 53.04
9 55 14 29.07 30.87 62.00
10 64 19 38.12 23.35 57.71
11 69 7 43.93 35.01 70.41
To find the updated centroid coordinates, we first assign each point to its
nearest existing centroid (check which point is closest to which centroid;
e.g., points 0, 1 and 9 are closest to C1). We then take the arithmetic mean
(a.m.) of the x and y coordinates of the assigned points to get the updated
coordinates of each centroid:
C1 = [a.m.(X0, X1, X9), a.m.(Y0, Y1, Y9)] = [(12+28+55)/3, (39+30+14)/3]
Similarly,
C2 = [a.m.(X8, X10, X11), a.m.(Y8, Y10, Y11)] = [(53+64+69)/3, (23+19+7)/3]
and
C3 = [a.m.(X2, X3, X4, X5, X6, X7), a.m.(Y2, Y3, Y4, Y5, Y6, Y7)] = [(29+24+45+52+52+55)/6, (54+55+63+70+63+58)/6]
Answer:
C1 = [31.67, 27.67], C2 = [62.0, 16.33], C3 = [42.83, 60.5]
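The update can be double-checked with a short script (a sketch assuming NumPy; the coordinates and distance table are copied from above):

import numpy as np

points = np.array([[12, 39], [28, 30], [29, 54], [24, 55], [45, 63], [52, 70],
                   [52, 63], [55, 58], [53, 23], [55, 14], [64, 19], [69, 7]])
dists = np.array([[26.93, 56.08, 56.73], [14.14, 41.76, 53.34],
                  [38.12, 40.80, 34.06], [39.05, 45.88, 37.44],
                  [50.70, 31.14, 16.40], [59.93, 32.25, 6.71],
                  [53.71, 26.40, 13.34], [51.04, 20.62, 18.00],
                  [27.89, 24.21, 53.04], [29.07, 30.87, 62.00],
                  [38.12, 23.35, 57.71], [43.93, 35.01, 70.41]])
assign = dists.argmin(axis=1)  # index of the nearest centroid per point
new_centroids = [points[assign == k].mean(axis=0) for k in range(3)]
print(np.round(new_centroids, 2))  # [[31.67 27.67] [62. 16.33] [42.83 60.5]]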
11. How many different splits can be made on the dataset given below, and what is the
information gain of each?
Note: Use the "Entropy" measure for information gain, given by the following
formula:
Entropy = -Σ_i p_i * log2(p_i), where p_i is the fraction of records belonging to class i.
Id a1 a2 a3 Class
1 T T 1.0 +
2 T T 6.0 +
3 T F 5.0 -
4 F F 4.0 +
5 F T 7.0 -
6 F T 3.0 -
7 F F 8.0 -
8 T F 7.0 +
9 F T 5.0 -
8 splits are possible: on a1, on a2, and the six threshold splits a3 >= 3.0, 4.0, 5.0, 6.0, 7.0, and 8.0.
Entropy original (E.O.): -4/9 * log2(4/9) - 5/9 * log2(5/9) = 0.9911
Split information gains:
Ex. a1 children entropy = 4/9 * Entropy(3+,1-) + 5/9 * Entropy(1+,4-)
= 4/9 * [-3/4 * log2(3/4) - 1/4 * log2(1/4)] + 5/9 * [-1/5 * log2(1/5) - 4/5 * log2(4/5)] = 0.7616
Ex. a2 children entropy = 5/9 * Entropy(2+,3-) + 4/9 * Entropy(2+,2-)
= 5/9 * [-2/5 * log2(2/5) - 3/5 * log2(3/5)] + 4/9 * [-1/2 * log2(1/2) - 1/2 * log2(1/2)] = 0.9839
a1 info gain: E.O. - (a1 children entropy) = E.O. - 0.7616 = 0.2294
a2 info gain: E.O. - (a2 children entropy) = E.O. - 0.9839 = 0.0072
a3 >= 3.0 info gain: E.O. - (a3 >= 3.0 children entropy) = E.O. - 0.8484 = 0.1427
a3 >= 4.0 info gain: E.O. - (a3 >= 4.0 children entropy) = E.O. - 0.9885 = 0.0026
a3 >= 5.0 info gain: E.O. - (a3 >= 5.0 children entropy) = E.O. - 0.9183 = 0.0728
a3 >= 6.0 info gain: E.O. - (a3 >= 6.0 children entropy) = E.O. - 0.9839 = 0.0072
a3 >= 7.0 info gain: E.O. - (a3 >= 7.0 children entropy) = E.O. - 0.9728 = 0.0183
a3 >= 8.0 info gain: E.O. - (a3 >= 8.0 children entropy) = E.O. - 0.8889 = 0.1022
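These numbers can be verified with a short script (an assumed reconstruction; the exam itself provides no code):

from math import log2

def entropy(pos, neg):
    # Entropy of a node holding pos positive and neg negative records.
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c)

data = [("T", "T", 1.0, "+"), ("T", "T", 6.0, "+"), ("T", "F", 5.0, "-"),
        ("F", "F", 4.0, "+"), ("F", "T", 7.0, "-"), ("F", "T", 3.0, "-"),
        ("F", "F", 8.0, "-"), ("T", "F", 7.0, "+"), ("F", "T", 5.0, "-")]

def info_gain(split):
    # split maps a row to True/False, defining the two child nodes.
    counts = {True: [0, 0], False: [0, 0]}  # [negatives, positives] per child
    for row in data:
        counts[split(row)][row[3] == "+"] += 1
    n = len(data)
    children = sum((neg + pos) / n * entropy(pos, neg)
                   for neg, pos in counts.values())
    return entropy(4, 5) - children  # E.O. = Entropy(4+,5-) = 0.9911

print(round(info_gain(lambda r: r[0] == "T"), 4))  # a1: 0.2294
print(round(info_gain(lambda r: r[1] == "T"), 4))  # a2: 0.0072
for t in (3.0, 4.0, 5.0, 6.0, 7.0, 8.0):
    print(t, round(info_gain(lambda r, t=t: r[2] >= t), 4))  # a3 >= t gains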