Tutorial 1: Introduction to Data Mining
1. What is data mining?
a. A process or method that extracts or “mines” interesting knowledge or patterns from
large amounts of data
b. A method of data analysis that automates analytical model building
c. Both (a) and (b) are incorrect
2. Why do we use data mining?
a. Large-scale data is being collected and stored at enormous speeds
b. High-dimensional and complex data is everywhere
c. All of the above are correct
3. Which of the following are data mining steps?
a. Data cleaning, data integration, and data transformation
b. Data classification and prediction
c. Data clustering
d. Association rule mining
4. Which of the following are data mining tasks, or functionalities?
a. Quantitative mining
b. Discrete mining
c. Descriptive mining
d. Continuous mining
5. Which of the following are data mining techniques?
a. Data cleaning
b. Classification
c. Data cleaning, data integration, and data transformation
d. Data selection
6. What does machine learning refer to?
a. A branch of artificial intelligence (AI) and computer science which focuses on the use of
data and algorithms to imitate the way that humans learn, gradually improving its
accuracy
b. A method of data analysis that automates analytical model building
c. Both (a) and (b) are correct
7. Which of the following are types of machine learning?
a. Supervised learning
b. Unsupervised learning
c. Semi-Supervised learning
d. All of the above are correct
8. Data cleaning can be defined as…
a. A process in which multiple data sources are combined
b. A process that removes or transforms noise and inconsistent data
c. A process where data relevant to the analysis task are retrieved from the database
d. All of the above are incorrect
9. What is the difference between classification and clustering?
a. Classification is a process by which a model is created to predict an outcome for some
target variable, while clustering is a process of grouping data into categories by similarity
b. Clustering is a process by which a model is created to predict an outcome for some
target variable, while classification is a process of grouping data into categories by
similarity
c. Both (a) and (b) are incorrect
10. Clustering may be considered a form of…
a. Supervised learning
b. Unsupervised learning
c. Reinforcement learning
d. Semi-Supervised learning
11. Describe 3 real-life examples in which the use of machine learning by companies, banks, social
media platforms, etc. has directly affected you.
Emails being filed automatically in the junk folder, adverts targeting you on social media as a
result of your Google searches, voice recognition used to verify your identity when you phone
the bank, etc.
12. Name 3 broad sectors or fields in which data mining is commonly used.
Health care, finance, marketing, etc.
13. Name the broad type of machine learning that is associated with developing a predictive model
using a data set containing input attributes as well as labelled data for the target variable.
Supervised learning
14. Name the broad type of machine learning that is used to detect patterns in the data when the
data is not labelled.
Unsupervised learning
15. What does a classification model do?
a. Assigns data to predefined category labels for the target variable
b. Predicts real number response values for the target variable
c. Compares predicted data classifications to the actual class labels in the data
d. Clusters responses in groups based on similarity, to find patterns
16. Briefly describe the association rule mining approach.
This mining approach aims to discover interesting relationships between objects in large
databases.
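As a rough illustration of what an "interesting relationship" can mean, the sketch below computes two commonly used measures for a rule such as "butter implies bread": support and confidence. The shopping baskets and the function names are hypothetical, invented only for this example; they do not come from the tutorial.

```python
# Hypothetical shopping baskets, invented purely for illustration.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Rule under test: {butter} -> {bread}
rule_support = support({"bread", "butter"}, TRANSACTIONS)
rule_confidence = confidence({"butter"}, {"bread"}, TRANSACTIONS)
print(rule_support, rule_confidence)
```

A rule is usually reported only if both measures exceed thresholds chosen by the analyst; the thresholds themselves are a modelling decision and are not fixed by the method.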
17. Which data mining technique would be most appropriate to apply to the data set below in order
to predict whether or not a new unseen patient (not included in the sample) has the same
disease?
Classification
Sample  Age  Overweight  Smoker  Disease result
1       <40  No          Yes     -
2       >40  No          Yes     +
3       <40  Yes         No      -
4       >40  No          No      -
5       >40  No          Yes     +
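To make the idea of classification concrete, here is a minimal sketch that trains a very simple classifier on the five samples in the table. It uses a one-attribute rule (often called 1R): for each attribute, predict the majority class for each attribute value, then keep the attribute with the fewest training errors. The choice of 1R, the dictionary keys and the new patient record are assumptions made for illustration; the tutorial does not prescribe a particular algorithm.

```python
from collections import Counter, defaultdict

# The five training samples from the table above.
SAMPLES = [
    {"age": "<40", "overweight": "No", "smoker": "Yes", "disease": "-"},
    {"age": ">40", "overweight": "No", "smoker": "Yes", "disease": "+"},
    {"age": "<40", "overweight": "Yes", "smoker": "No", "disease": "-"},
    {"age": ">40", "overweight": "No", "smoker": "No", "disease": "-"},
    {"age": ">40", "overweight": "No", "smoker": "Yes", "disease": "+"},
]
ATTRIBUTES = ["age", "overweight", "smoker"]
TARGET = "disease"


def fit_one_rule(samples, attribute):
    """Map each attribute value to its majority class; return (rule, training errors)."""
    by_value = defaultdict(Counter)
    for row in samples:
        by_value[row[attribute]][row[TARGET]] += 1
    rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
    errors = sum(rule[row[attribute]] != row[TARGET] for row in samples)
    return rule, errors


def train(samples):
    """Pick the attribute whose one-level rule makes the fewest training errors."""
    candidates = [(attr, *fit_one_rule(samples, attr)) for attr in ATTRIBUTES]
    return min(candidates, key=lambda c: c[2])  # ties go to the earlier attribute


best_attribute, best_rule, best_errors = train(SAMPLES)


def predict(patient):
    """Classify a patient using the single selected attribute."""
    return best_rule[patient[best_attribute]]


# Hypothetical unseen patient, not part of the sample.
new_patient = {"age": ">40", "overweight": "No", "smoker": "Yes"}
print(best_attribute, best_errors, predict(new_patient))
```

With only five samples, any model learned here is illustrative at best; the point is that the class label (disease result) is known for the training data, which is what makes this a classification problem.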
18. Suppose we have a data set with 𝑛 attributes 𝑋1, 𝑋2, … , 𝑋𝑛, where the class label 𝑌 for each instance is
missing. What would be the most suitable data mining technique for such a data set?
Clustering or association
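When no class label is available, a clustering algorithm groups instances using the attribute values alone. The sketch below is a bare-bones k-means on a handful of two-attribute points. The data, the number of clusters and the initial centroids are all hypothetical, chosen only to illustrate the idea; k-means is one of several possible clustering methods.

```python
import math

# Hypothetical unlabelled instances with two numeric attributes (X1, X2).
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New centroid is the per-attribute mean of the cluster members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid unchanged
        centroids = updated
    return centroids, assign(points, centroids)


# Assumed starting centroids: the first and fourth points.
centroids, labels = kmeans(POINTS, [POINTS[0], POINTS[3]])
print(labels)
```

The cluster indices carry no meaning by themselves; interpreting what each group represents is left to the analyst.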
19. Which of the following would be considered a data mining task?
a. Predicting the weather using satellite imaging data
b. Predicting loan repayments using customers' payment history
c. Both of the above