Assignment 2
Q.1 Describe the following terms:
Data cleaning
Data transformation
Data reduction
Binning
Outliers
Entity identification problem
Q.2 Suppose that the values for a given set of data are grouped into intervals. The intervals and
corresponding frequencies are as follows. Compute an approximate median value for the data.
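For reference, the usual interpolation formula for the median of grouped data can be written as below. The symbol names are introduced here for illustration: L_1 is the lower boundary of the median interval, N is the total number of values, (sum freq)_l is the sum of the frequencies of all intervals below the median interval, freq_median is the frequency of the median interval, and width is the width of the median interval.

```latex
\text{median} \approx L_1 + \left( \frac{N/2 - \left(\sum \text{freq}\right)_l}{\text{freq}_{\text{median}}} \right) \times \text{width}
```

The median interval is the first interval whose cumulative frequency reaches N/2.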
Q.3 Suppose that the data for analysis includes the attribute age. The age values for the data
tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(a) What is the mean of the data? What is the median?
(b) What is the mode of the data? Comment on the data’s modality (i.e., bimodal, trimodal, etc.).
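Hand calculations for Q.3 can be cross-checked with a short script. The following is a minimal sketch using Python's standard `statistics` module (`multimode` needs Python 3.8 or later); the variable names are illustrative only.

```python
from statistics import mean, median, multimode

# Age values copied from Q.3 (already sorted in increasing order).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
        35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

age_mean = mean(ages)        # arithmetic mean of all 27 values
age_median = median(ages)    # middle value of the sorted list
age_modes = multimode(ages)  # every value tied for the highest frequency

print(f"mean   = {age_mean:.2f}")
print(f"median = {age_median}")
print(f"modes  = {age_modes}")
```

The number of entries in `age_modes` indicates the modality: one entry means unimodal, two means bimodal, and so on.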
Q.4 Using the data for age given in Q.3, answer the following.
(a) Use smoothing by bin means to smooth the data, using a bin depth of 3. Illustrate your steps.
(b) How might you determine outliers in the data?
(c) What other methods are there for data smoothing?
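The procedure in Q.4(a) can be sketched in Python as follows. This is one possible reading of smoothing by bin means: the sorted values are split into consecutive equal-frequency bins of the given depth, and each value is replaced by the mean of its bin. The function name `smooth_by_bin_means` is a hypothetical helper, not part of any library.

```python
# Age values copied from Q.3 (already sorted in increasing order).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
        35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
BIN_DEPTH = 3  # number of values per bin, as stated in Q.4(a)


def smooth_by_bin_means(sorted_values, depth):
    """Replace each value with the mean of its equal-frequency bin."""
    smoothed = []
    for start in range(0, len(sorted_values), depth):
        bin_values = sorted_values[start:start + depth]
        bin_mean = sum(bin_values) / len(bin_values)
        # Every member of the bin takes the bin mean.
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


smoothed_ages = smooth_by_bin_means(ages, BIN_DEPTH)

# Show each original bin next to its smoothed value.
for start in range(0, len(ages), BIN_DEPTH):
    print(ages[start:start + BIN_DEPTH], "->", round(smoothed_ages[start], 2))
```

With 27 values and a depth of 3, the data divide evenly into 9 bins; a final partial bin, if there were one, would simply be averaged over the values it contains.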
Q.5 Use the two methods below to normalize the following group of data: 200, 300, 400, 600,
1000
(a) min-max normalization, setting the new minimum to 0 and the new maximum to 1
(b) z-score normalization
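Both methods in Q.5 can be sketched in a few lines of Python. The function names are hypothetical, and the z-score version assumes the population standard deviation (`statistics.pstdev`); using the sample standard deviation (`statistics.stdev`) instead would give slightly different values.

```python
from statistics import mean, pstdev

# Data copied from Q.5.
values = [200, 300, 400, 600, 1000]


def min_max_normalize(data, new_min=0.0, new_max=1.0):
    """Linearly map data from [min(data), max(data)] onto [new_min, new_max]."""
    old_min, old_max = min(data), max(data)
    scale = (new_max - new_min) / (old_max - old_min)
    return [(v - old_min) * scale + new_min for v in data]


def z_score_normalize(data):
    """Subtract the mean and divide by the standard deviation."""
    mu = mean(data)
    sigma = pstdev(data)  # assumption: population standard deviation
    return [(v - mu) / sigma for v in data]


min_max_values = min_max_normalize(values)
z_score_values = z_score_normalize(values)

print("min-max:", [round(v, 3) for v in min_max_values])
print("z-score:", [round(v, 3) for v in z_score_values])
```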
Q.6 Using the data for age given in Q.3 answer the following:
(a) Use min-max normalization to transform the value 35 for age onto the range [0.0, 1.0].
(b) Use z-score normalization to transform the value 35 for age, where the standard deviation of
age is 12.94 years.
(c) Use normalization by decimal scaling to transform the value 35 for age.
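Normalization by decimal scaling, as asked for in Q.6(c), divides each value by 10^j, where j is the smallest integer such that the largest absolute value in the data falls below 1. A minimal Python sketch follows; `decimal_scale` is a hypothetical helper name.

```python
# Age values copied from Q.3.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
        35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def decimal_scale(value, data):
    """Return (scaled value, j) where j is the smallest power of ten
    that brings the largest absolute value in data below 1."""
    max_abs = max(abs(v) for v in data)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return value / 10 ** j, j


scaled_value, j = decimal_scale(35, ages)
print(f"j = {j}, scaled value = {scaled_value}")
```

Only the largest absolute value in the data determines j, so the same divisor applies to every age.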
Q.7 Suppose a group of 12 sales price records has been sorted as follows:
5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215
Partition them into three bins by each of the following methods:
(a) equal-frequency (equidepth) partitioning
(b) equal-width partitioning
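The two partitioning schemes in Q.7 can be sketched in Python as follows. The function names are hypothetical. The equal-frequency version assumes the number of values divides evenly by the number of bins, and the equal-width version assumes each bin is closed on the left and open on the right, except the last bin, which also includes the maximum.

```python
# Sorted sales prices copied from Q.7.
prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
NUM_BINS = 3


def equal_frequency_bins(sorted_values, k):
    """Split sorted values into k bins holding the same number of values."""
    depth = len(sorted_values) // k  # assumes len(sorted_values) % k == 0
    return [sorted_values[i * depth:(i + 1) * depth] for i in range(k)]


def equal_width_bins(values, k):
    """Split values into k bins that each span the same numeric range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in values:
        # The maximum would land at index k, so clamp it into the last bin.
        index = min(int((v - lo) / width), k - 1)
        bins[index].append(v)
    return bins


frequency_bins = equal_frequency_bins(prices, NUM_BINS)
width_bins = equal_width_bins(prices, NUM_BINS)

print("equal-frequency:", frequency_bins)
print("equal-width:    ", width_bins)
```

Comparing the two outputs shows how equal-width partitioning reacts to skewed data: most values crowd into the first bin while the large prices occupy the last.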