Assg 2 Pre-Processing

This document contains 7 questions about data analysis concepts and techniques. Question 1 defines key terms like data cleaning, transformation, reduction, binning, and outlier detection. Question 2 asks to compute the median of grouped data. Question 3 analyzes age data, finding mean, median, mode, and commenting on modality. Question 4 has subparts about smoothing age data using bin means and determining outliers. Question 5 covers normalizing data using min-max and z-score normalization. Question 6 applies normalization techniques to an age value. Question 7 involves partitioning sorted data into bins using equal-frequency and equal-width methods.

Assignment 2

Q.1 Describe the following terms:

• Data cleaning
• Data transformation
• Data reduction
• Binning
• Outliers
• Entity identification problem

Q.2 Suppose that the values for a given set of data are grouped into intervals. The intervals and
corresponding frequencies are as follows. Compute an approximate median value for the data.
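
The interval table for this question is not reproduced above, so the sketch below uses made-up intervals purely to illustrate the usual interpolation formula for the median of grouped data: median ≈ L1 + ((N/2 − cum_below) / freq_median) × width, where L1 is the lower boundary of the median interval and cum_below is the total frequency of all lower intervals. The function name `grouped_median` and the sample intervals are assumptions, not part of the assignment.

```python
def grouped_median(intervals):
    """Approximate the median of grouped data by linear interpolation.

    intervals: list of (lower, upper, frequency) tuples sorted by lower bound.
    """
    total = sum(freq for _, _, freq in intervals)
    if total == 0:
        raise ValueError("intervals must contain at least one observation")
    half = total / 2
    cum_below = 0  # total frequency of the intervals below the current one
    for lower, upper, freq in intervals:
        if freq > 0 and cum_below + freq >= half:
            # Median interval found: interpolate linearly inside it.
            return lower + (half - cum_below) / freq * (upper - lower)
        cum_below += freq
    raise ValueError("could not locate the median interval")


# Hypothetical intervals, NOT the ones from the assignment.
example_intervals = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]
example_median = grouped_median(example_intervals)
print(example_median)
```

To answer the question itself, the hypothetical list would be replaced with the intervals and frequencies from the assignment's table.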

Q.3 Suppose that the data for analysis includes the attribute age. The age values for the data
tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(a) What is the mean of the data? What is the median?
(b) What is the mode of the data? Comment on the data’s modality (i.e., bimodal, trimodal, etc.).
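
One way to check the hand calculation for Q.3 is a short Python sketch using the standard `statistics` module. The variable names are illustrative; the data is the age list given in the question.

```python
import statistics

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
        35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean_age = statistics.mean(ages)      # sum of values divided by their count
median_age = statistics.median(ages)  # middle value of the sorted list
modes = statistics.multimode(ages)    # every value tied for highest frequency

print(f"mean   = {mean_age:.2f}")
print(f"median = {median_age}")
print(f"modes  = {modes}")
```

The number of values returned by `multimode` indicates the modality: one value means unimodal, two means bimodal, and so on.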

Q.4 Using the data for age given in Q.3, answer the following.
(a) Use smoothing by bin means to smooth the data, using a bin depth of 3. Illustrate your steps.
(b) How might you determine outliers in the data?
(c) What other methods are there for data smoothing?
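
The following is a minimal sketch of parts (a) and (b), assuming "bin depth of 3" means equal-frequency bins of three consecutive sorted values. For outliers it uses the 1.5 × IQR rule, which is only one common convention and not necessarily the answer the assignment expects; clustering or simple inspection are alternatives. The function name `smooth_by_bin_means` is illustrative.

```python
import statistics

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
        35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def smooth_by_bin_means(sorted_values, depth):
    """Replace every value by the mean of its equal-frequency bin."""
    smoothed = []
    for start in range(0, len(sorted_values), depth):
        bin_values = sorted_values[start:start + depth]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


smoothed_ages = smooth_by_bin_means(ages, depth=3)
print([round(v, 2) for v in smoothed_ages])

# Outlier check with the 1.5 * IQR rule (an assumed convention).
q1, _, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [a for a in ages if a < low_fence or a > high_fence]
print(outliers)
```

With 27 values and a depth of 3, the data splits evenly into nine bins, so no partial bin needs special handling here.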

Q.5 Use the two methods below to normalize the following group of data: 200, 300, 400, 600,
1000
(a) min-max normalization by setting min = 0 and max = 1
(b) z-score normalization
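
Both methods in Q.5 can be sketched in a few lines of Python. The z-score version below assumes the population standard deviation (dividing by n); using the sample standard deviation (dividing by n − 1) would give slightly smaller magnitudes. The function names are illustrative.

```python
import statistics

data = [200, 300, 400, 600, 1000]


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so that min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min)
            for v in values]


def z_score(values):
    """Centre on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population std (divide by n)
    return [(v - mean) / std for v in values]


min_max_values = min_max(data)
z_values = z_score(data)
print(min_max_values)
print([round(z, 4) for z in z_values])
```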

Q.6 Using the data for age given in Q.3 answer the following:
(a) Use min-max normalization to transform the value 35 for age onto the range [0.0, 1.0].
(b) Use z-score normalization to transform the value 35 for age, where the standard deviation of
age is 12.94 years.
(c) Use normalization by decimal scaling to transform the value 35 for age.
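
The three transformations of the value 35 can be sketched as follows. The standard deviation of 12.94 is taken from the question rather than recomputed, and decimal scaling is assumed to divide by the smallest power of 10 that brings every absolute value below 1. Variable names are illustrative.

```python
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
        35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
value = 35
given_std = 12.94  # standard deviation stated in the question

# (a) min-max normalization onto [0.0, 1.0]
min_max_35 = (value - min(ages)) / (max(ages) - min(ages))

# (b) z-score normalization using the stated standard deviation
mean_age = sum(ages) / len(ages)
z_35 = (value - mean_age) / given_std

# (c) decimal scaling: find the smallest j with max(|v|) / 10**j < 1
max_abs = max(abs(a) for a in ages)
j = 0
while max_abs / 10 ** j >= 1:
    j += 1
decimal_35 = value / 10 ** j

print(f"min-max:         {min_max_35:.3f}")
print(f"z-score:         {z_35:.3f}")
print(f"decimal scaling: {decimal_35} (j = {j})")
```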

Q.7 Suppose a group of 12 sales price records has been sorted as follows:
5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215
Partition them into three bins by each of the following methods:
(a) equal-frequency (equidepth) partitioning
(b) equal-width partitioning
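
The two partitioning methods can be sketched as below. The equal-frequency version assumes the number of values is divisible by the number of bins, which holds for 12 values and 3 bins. The equal-width version splits the range from the minimum to the maximum into intervals of identical width and places the maximum value in the last bin. Function names are illustrative.

```python
prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]


def equal_frequency_bins(sorted_values, k):
    """Split sorted values into k bins holding the same number of values.

    Assumes len(sorted_values) is divisible by k.
    """
    size = len(sorted_values) // k
    return [sorted_values[i * size:(i + 1) * size] for i in range(k)]


def equal_width_bins(values, k):
    """Split values into k bins that each cover an interval of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in values:
        # Clamp so the maximum value falls in the last bin, not a (k+1)-th one.
        index = min(int((v - lo) / width), k - 1)
        bins[index].append(v)
    return bins


frequency_bins = equal_frequency_bins(prices, 3)
width_bins = equal_width_bins(prices, 3)
print(frequency_bins)
print(width_bins)
```

Here the range is 215 − 5 = 210, so each equal-width interval spans 70 units, and most values fall into the first bin because the two largest prices stretch the range.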
