Mathematical Data
Management
John Irish G. Lira, PhD
What is Data Management?
Data Management is the development and
execution of architectures, policies,
practices and procedures in order to
manage the information lifecycle needs of
an enterprise in an effective manner.
Data Management Association International 2019
What is Statistics?
A branch of Mathematics that examines
and investigates ways to process and
analyze gathered data.
It provides procedures for data collection,
presentation, organization, and
interpretation that yield meaningful information
useful to decision-makers.
Frequency Distribution
and Graph
What is Frequency Distribution?
Tabular Form
It is a grouping of data into categories
showing the number of observations in
each of the non-overlapping classes.
Mutually Exclusive
Grouped Frequency Distribution
It is used when the range of the data set is
large.
The data are grouped into classes
✓ Categorical
✓ Interval or Ratio
Constructing Frequency Distribution
Grouped Frequency
Categorical Frequency
Grouped Frequency Distribution
Determining Class Interval
Categorical Frequency
It is used to organize nominal-level or
ordinal-level data.
Examples:
Gender
Political affiliation
Business type
Year level
Example 1
Twenty applicants were given a
performance evaluation appraisal. The
data set is
High High High Low Average
Average Low Average Average Average
Low Average Average High High
Low Low Average High High
Construct a frequency distribution for the data.
Step 1
Construct a table.
Class Tally Frequency Percent
High
Average
Low
Step 2
Tally the raw data.
Class Tally Frequency Percent
High IIII-II
Average IIII-III
Low IIII
Step 3
Convert the tallied data into numerical
frequencies.
Class Tally Frequency
High IIII-II 7
Average IIII-III 8
Low IIII 5
Step 4
Determine the percentage.
Class Tally Frequency Percent
High IIII-II 7 35
Average IIII-III 8 40
Low IIII 5 25
Percentage Formula:
$$\% = \frac{f}{n} \times 100\%$$
where f is the frequency of the class and n is the total number of values.
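The four steps of Example 1 can be sketched in a few lines of Python. This is only an illustration, assuming the ratings are typed in as plain strings; the names `ratings`, `counts`, and `table` are made up for this sketch and do not come from the slides.

```python
from collections import Counter

# Raw ratings of the 20 applicants in Example 1.
ratings = (
    "High High High Low Average "
    "Average Low Average Average Average "
    "Low Average Average High High "
    "Low Low Average High High"
).split()

counts = Counter(ratings)   # f: frequency of each class
n = len(ratings)            # n: total number of values

# One (class, frequency, percent) row per class, in the slide's order.
table = [(c, counts[c], 100 * counts[c] / n) for c in ("High", "Average", "Low")]

for cls, f, pct in table:
    print(f"{cls:<8}{f:>3}{pct:>7.1f}")
```

`Counter` does the tallying of Steps 2 and 3, and the percent column applies the percentage formula to each class.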
Determining Class Interval
Rule 1: $2^k \ge n$
$$i = \frac{\text{Range}}{\text{No. of Classes}} = \frac{HV - LV}{k}$$
Rule 2:
$$i = \frac{\text{Range}}{1 + 3.322 \log N}$$
Rule 3:
$$i = \frac{\text{Range}}{\text{No. of Classes}}$$
Note: i is the suggested class interval.
Example 2
Suppose a researcher wishes to study the
monthly salary (in ₱ thousands) of call
center agents at selected call center
companies. The researcher would first have to
collect the data by asking each call center
agent about their monthly salary. Data
collected in original form are called raw data.
In this case, the data are
Example 2 (continuation)
18.80 22.00 23.40 24.30 27.00 27.90 31.00 26.00 20.80 17.00
20.00 22.60 23.40 24.50 27.00 29.30 32.10 26.10 21.00 17.30
20.25 22.75 23.70 24.70 27.40 30.10 33.70 26.30 21.60 17.80
18.40 21.90 23.00 23.85 26.80 27.80 30.80 25.00 20.40 15.50
18.70 21.90 23.20 24.10 26.90 27.90 30.90 25.20 20.50 15.70
17.95 21.75 22.90 23.70 26.50 27.50 30.60 24.75 20.25 14.10
18.35 21.80 22.90 23.70 26.50 27.60 30.75 25.00 20.30 14.30
20.20 22.80 23.50 24.60 27.30 29.50 32.90 26.20 21.30 17.40
Example 2 (continuation)
Construct a frequency distribution using
Rule 1 and determine the following:
Range
Interval
Class limits
Class boundaries
Relative frequencies
Percentage
Midpoints
Cumulative frequencies
Step 1
Arrange the raw data in ascending or
descending order.
14.10 17.95 20.25 21.75 22.90 23.70 24.75 26.50 27.50 30.60
14.30 18.35 20.30 21.80 22.90 23.70 25.00 26.50 27.60 30.75
15.50 18.40 20.40 21.90 23.00 23.85 25.00 26.80 27.80 30.80
15.70 18.70 20.50 21.90 23.20 24.10 25.20 26.90 27.90 30.90
17.00 18.80 20.80 22.00 23.40 24.30 26.00 27.00 27.90 31.00
17.30 20.00 21.00 22.60 23.40 24.50 26.10 27.00 29.30 32.10
17.40 20.20 21.30 22.75 23.50 24.60 26.20 27.30 29.50 32.90
17.80 20.25 21.60 22.80 23.70 24.70 26.30 27.40 30.10 33.70
Step 2
Determine the classes
✓ Find the Highest Value (HV) and Lowest
Value (LV) in the data set.
HV = 33.70 and LV = 14.10
✓ Find the Range
Range = HV – LV = 33.70 – 14.10 = 19.60
✓ Determine the number of classes using
the $2^k \ge n$ rule
Determining the number of classes
$2^k \ge n$ (2 raised to the power of k)
When k = 6: $2^6 = 64$, and $64 \ge 80$ is false
When k = 7: $2^7 = 128$, and $128 \ge 80$ is true
Thus, the recommended no. of classes is 7.
Determine the class interval (or width)
$$i = \frac{\text{Range}}{\text{No. of Classes}} = \frac{HV - LV}{k}$$
$$i = \frac{33.70 - 14.10}{7} = \frac{19.60}{7} = 2.80 \approx 3$$
Thus, the interval is 3.
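As a rough check of the Rule 1 arithmetic, the sketch below searches for the smallest k with 2 raised to k at least n, then divides the range by k. The function name `classes_by_rule1` is hypothetical, and rounding the interval up with `math.ceil` is an assumption of this sketch; the slide only shows 2.80 ≈ 3.

```python
import math

def classes_by_rule1(n: int) -> int:
    """Return the smallest k such that 2**k >= n (Rule 1)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

n = 80                    # number of observations in Example 2
hv, lv = 33.70, 14.10     # highest and lowest values

k = classes_by_rule1(n)               # 2**6 = 64 < 80 <= 2**7 = 128
raw_interval = (hv - lv) / k          # range divided by number of classes
interval = math.ceil(raw_interval)    # assumption: round up to a whole number

print(k, round(raw_interval, 2), interval)
```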
Step 2 (continuation)
Select a starting point for the lowest class limit:
12 (at or below the lowest value, 14.10)
Step 2 (continuation)
Determine Lower and Upper class limits
Class Limits (Lower Limit – Upper Limit)
12 – 14
15 – 17
18 – 20
21 – 23
24 – 26
27 – 29
30 – 32
33 – 35
Step 2 (continuation)
Determine the Class Boundaries
Class Limits    Class Boundaries
12 – 14         11.5 – 14.5
15 – 17         14.5 – 17.5
18 – 20         17.5 – 20.5
21 – 23         20.5 – 23.5
24 – 26         23.5 – 26.5
27 – 29         26.5 – 29.5
30 – 32         29.5 – 32.5
33 – 35         32.5 – 35.5
Lower boundary = lower limit – 0.5 (e.g., 12 – 0.5 = 11.5)
Upper boundary = upper limit + 0.5 (e.g., 20 + 0.5 = 20.5)
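The class limits and boundaries can be generated rather than typed by hand. The sketch below assumes whole-number class limits, so boundaries sit 0.5 below the lower limit and 0.5 above the upper limit; `class_table` and its parameters are illustrative names only.

```python
def class_table(start: int, width: int, count: int):
    """Return [((lower, upper), (lower_boundary, upper_boundary)), ...]."""
    rows = []
    for j in range(count):
        lower = start + j * width        # lower class limit
        upper = lower + width - 1        # upper class limit
        rows.append(((lower, upper), (lower - 0.5, upper + 0.5)))
    return rows

# Classes used in Example 2: start at 12, interval 3, eight classes.
rows = class_table(start=12, width=3, count=8)

for (lo, hi), (lb, ub) in rows:
    print(f"{lo} – {hi}    {lb} – {ub}")
```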
Step 3
Tally the raw data
Class Limits    Real Boundaries       Tally
12 – 14         11.4445 – 14.4444     II
15 – 17         14.4445 – 17.4444     IIII
18 – 20         17.4445 – 20.4444     IIII-IIII-II
21 – 23         20.4445 – 23.4444     IIII-IIII-IIII-IIII
24 – 26         23.4445 – 26.4444     IIII-IIII-IIII-III
27 – 29         26.4445 – 29.4444     IIII-IIII-IIII
30 – 32         29.4445 – 32.4444     IIII-III
33 – 35         32.4445 – 35.4444     II
Step 4
Convert the tallied data to numerical frequencies
Class Limits Tally Frequency
12 – 14 II 2
15 – 17 IIII 5
18 – 20 IIII-IIII-II 12
21 – 23 IIII-IIII-IIII-IIII 19
24 – 26 IIII-IIII-IIII-III 18
27 – 29 IIII-IIII-IIII 14
30 – 32 IIII-III 8
33 – 35 II 2
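The tally can be reproduced in code. The sketch below is one possible way to do it, assuming each salary is assigned by its offset from the first lower boundary (11.5) and that a value falling exactly on a boundary, such as 20.50, is counted in the higher class. The variable names are made up for this illustration.

```python
# Monthly salaries (in thousands) from Example 2, as listed in the raw data.
salaries = [
    18.80, 22.00, 23.40, 24.30, 27.00, 27.90, 31.00, 26.00, 20.80, 17.00,
    20.00, 22.60, 23.40, 24.50, 27.00, 29.30, 32.10, 26.10, 21.00, 17.30,
    20.25, 22.75, 23.70, 24.70, 27.40, 30.10, 33.70, 26.30, 21.60, 17.80,
    18.40, 21.90, 23.00, 23.85, 26.80, 27.80, 30.80, 25.00, 20.40, 15.50,
    18.70, 21.90, 23.20, 24.10, 26.90, 27.90, 30.90, 25.20, 20.50, 15.70,
    17.95, 21.75, 22.90, 23.70, 26.50, 27.50, 30.60, 24.75, 20.25, 14.10,
    18.35, 21.80, 22.90, 23.70, 26.50, 27.60, 30.75, 25.00, 20.30, 14.30,
    20.20, 22.80, 23.50, 24.60, 27.30, 29.50, 32.90, 26.20, 21.30, 17.40,
]

first_lower_limit = 12   # lower limit of the first class
width = 3                # class interval
num_classes = 8          # classes 12 – 14 through 33 – 35

frequencies = [0] * num_classes
for x in salaries:
    # Offset from the first lower boundary (11.5), divided by the width,
    # gives the index of the class that contains x.
    index = int((x - (first_lower_limit - 0.5)) // width)
    frequencies[index] += 1

print(frequencies)
```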
Step 5
Determine the relative frequency (rf)
Class Limits Frequency Relative Frequency
12 – 14 2 0.0250
15 – 17 5 0.0625
18 – 20 12 0.1500
21 – 23 19 0.2375
24 – 26 18 0.2250
27 – 29 14 0.1750
30 – 32 8 0.1000
33 – 35 2 0.0250
Total 80 1.00
8 ÷ 80 = 0.10
Step 6
Determine the percentage
Class Limits Frequency Percentage
12 – 14 2 2.50
15 – 17 5 6.25
18 – 20 12 15.00
21 – 23 19 23.75
24 – 26 18 22.50
27 – 29 14 17.50
30 – 32 8 10.00
33 – 35 2 2.50
Total 80 100
(8 ÷ 80) x 100% = 10%
Step 7
Determine the cumulative frequencies (cf)
Class Limits f cf Found by
12 – 14 2 2 2
15 – 17 5 7 2+5
18 – 20 12 19 2 + 5 +12
21 – 23 19 38 2 + 5 +12 + 19
24 – 26 18 56 2 + 5 +12 + 19 + 18
27 – 29 14 70 2 + 5 +12 + 19 + 18 + 14
30 – 32 8 78 2 + 5 +12 + 19 + 18 + 14 + 8
33 – 35 2 80 2 + 5 +12 + 19 + 18 + 14 + 8 + 2
Total 80
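Relative frequencies, percentages, and cumulative frequencies all follow from the frequency column. A minimal sketch, assuming the frequencies from Step 4 are already known; the list names are illustrative.

```python
from itertools import accumulate

# Class frequencies from Step 4 of Example 2.
freq = [2, 5, 12, 19, 18, 14, 8, 2]
n = sum(freq)                           # total number of values

rel_freq = [f / n for f in freq]        # rf = f / n
percent = [100 * f / n for f in freq]   # % = (f / n) x 100
cum_freq = list(accumulate(freq))       # running total of f

for f, rf, pct, cf in zip(freq, rel_freq, percent, cum_freq):
    print(f"{f:>3} {rf:>7.4f} {pct:>6.2f} {cf:>3}")
```

`accumulate` produces the same running sums that the "Found by" column writes out term by term.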
Step 8
Determine the midpoints (X)
Class Limits f X Found by
12 – 14 2 13 (12 + 14) ÷ 2
15 – 17 5 16 (15 + 17) ÷ 2
18 – 20 12 19 (18 + 20) ÷ 2
21 – 23 19 22 (21 + 23) ÷ 2
24 – 26 18 25 (24 + 26) ÷ 2
27 – 29 14 28 (27 + 29) ÷ 2
30 – 32 8 31 (30 + 32) ÷ 2
33 – 35 2 34 (33 + 35) ÷ 2
Total 80
Example 3
SJS Travel Agency, a nationwide local travel
agency, offers special rates during the summer
period. The owner wants additional
information on the ages of the people
taking travel tours. A random sample of 50
customers who took travel tours last summer
revealed these ages.
Example 3 (continuation)
18 29 42 57 61 67 37 49 53 47
24 34 45 58 63 70 39 51 54 48
28 36 46 60 66 77 40 52 56 49
19 31 44 58 62 68 38 50 54 48
27 36 46 59 64 74 39 51 55 48
Construct a frequency distribution using
Rule 2.
Step 1
Arrange the raw data in ascending or
descending order.
18 29 37 42 47 49 53 57 61 67
19 31 38 44 48 50 54 58 62 68
24 34 39 45 48 51 54 58 63 70
27 36 39 46 48 51 55 59 64 74
28 36 40 46 49 52 56 60 66 77
Step 2
✓ Find the Highest Value (HV) and Lowest
Value (LV) in the data set.
HV = 77 and LV = 18
✓ Find the Range
Range = HV – LV = 77 – 18 = 59
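Step 2 (continuation)
✓ Determine the class interval using Rule 2
$$
\begin{aligned}
i &= \frac{\text{Range}}{1 + 3.322 \log N} \\
  &= \frac{77 - 18}{1 + 3.322(\log 50)} \\
  &= \frac{59}{1 + 3.322(1.698970004)} \\
  &= \frac{59}{6.643978354} = 8.88 \approx 9
\end{aligned}
$$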
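The same computation can be checked numerically. The sketch below assumes the logarithm is base 10 (which matches log 50 = 1.698970004 above); `interval_by_rule2` is a hypothetical helper name.

```python
import math

def interval_by_rule2(values):
    """Suggested class interval: Range / (1 + 3.322 * log10(N))."""
    data_range = max(values) - min(values)
    return data_range / (1 + 3.322 * math.log10(len(values)))

# Ages of the 50 sampled customers in Example 3.
ages = [
    18, 29, 42, 57, 61, 67, 37, 49, 53, 47,
    24, 34, 45, 58, 63, 70, 39, 51, 54, 48,
    28, 36, 46, 60, 66, 77, 40, 52, 56, 49,
    19, 31, 44, 58, 62, 68, 38, 50, 54, 48,
    27, 36, 46, 59, 64, 74, 39, 51, 55, 48,
]

raw_interval = interval_by_rule2(ages)   # about 8.88
interval = round(raw_interval)           # rounded to the nearest whole number

print(round(raw_interval, 2), interval)
```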
Step 2 (continuation)
Select a starting point for the lowest class limit:
18 (the lowest value in the data set)
Step 2 (continuation)
Determine Lower and Upper class limits
Class Limits (Lower Limit – Upper Limit)
18 – 26
27 – 35
36 – 44
45 – 53
54 – 62
63 – 71
72 – 80
Step 2 (continuation)
Determine the Class Boundaries
Class Limits    Class Boundaries
18 – 26         17.5 – 26.5
27 – 35         26.5 – 35.5
36 – 44         35.5 – 44.5
45 – 53         44.5 – 53.5
54 – 62         53.5 – 62.5
63 – 71         62.5 – 71.5
72 – 80         71.5 – 80.5
Lower boundary = lower limit – 0.5 (e.g., 18 – 0.5 = 17.5)
Upper boundary = upper limit + 0.5 (e.g., 44 + 0.5 = 44.5)
Step 3
Tally the raw data
Class Limits Class Boundaries Tally
18 – 26 17.5 – 26.5 III
27 – 35 26.5 – 35.5 IIII
36 – 44 35.5 – 44.5 IIII-IIII
45 – 53 44.5 – 53.5 IIII-IIII-IIII
54 – 62 53.5 – 62.5 IIII-IIII-I
63 – 71 62.5 – 71.5 IIII-I
72 – 80 71.5 – 80.5 II
Step 4
Convert the tallied data to numerical frequencies
Class Limits Tally Frequency
18 – 26 III 3
27 – 35 IIII 5
36 – 44 IIII-IIII 9
45 – 53 IIII-IIII-IIII 14
54 – 62 IIII-IIII-I 11
63 – 71 IIII-I 6
72 – 80 II 2
Step 5
Determine the relative frequency (rf)
Class Limits Frequency Relative Frequency
18 – 26 3 0.06
27 – 35 5 0.10
36 – 44 9 0.18
45 – 53 14 0.28
54 – 62 11 0.22
63 – 71 6 0.12
72 – 80 2 0.04
Total 50 1.00
2 ÷ 50 = 0.04
Step 6
Determine the percentage
Class Limits Frequency Percentage
18 – 26 3 6
27 – 35 5 10
36 – 44 9 18
45 – 53 14 28
54 – 62 11 22
63 – 71 6 12
72 – 80 2 4
Total 50 100
(2 ÷ 50) x 100 = 4
Step 7
Determine the cumulative frequencies (cf)
Class Limits f cf Found by
18 – 26 3 3 3
27 – 35 5 8 3+5
36 – 44 9 17 3+5+9
45 – 53 14 31 3 + 5 + 9 + 14
54 – 62 11 42 3 + 5 + 9 + 14 + 11
63 – 71 6 48 3 + 5 + 9 + 14 + 11 + 6
72 – 80 2 50 3 + 5 + 9 + 14 + 11 + 6 + 2
Total 50
Step 8
Determine the midpoints (X)
Class Limits f X Found by
18 – 26 3 22 (18 + 26) ÷ 2
27 – 35 5 31 (27 + 35) ÷ 2
36 – 44 9 40 (36 + 44) ÷ 2
45 – 53 14 49 (45 + 53) ÷ 2
54 – 62 11 58 (54 + 62) ÷ 2
63 – 71 6 67 (63 + 71) ÷ 2
72 – 80 2 76 (72 + 80) ÷ 2
Total 50
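Steps 2 through 8 of Example 3 can be strung together as one short script. This is a sketch under the assumptions used on the slides (starting point 18, class interval 9, whole-number ages); all variable names are illustrative.

```python
from itertools import accumulate

# Ages of the 50 sampled customers in Example 3.
ages = [
    18, 29, 42, 57, 61, 67, 37, 49, 53, 47,
    24, 34, 45, 58, 63, 70, 39, 51, 54, 48,
    28, 36, 46, 60, 66, 77, 40, 52, 56, 49,
    19, 31, 44, 58, 62, 68, 38, 50, 54, 48,
    27, 36, 46, 59, 64, 74, 39, 51, 55, 48,
]

start, width = 18, 9                               # starting point and interval
num_classes = (max(ages) - start) // width + 1     # classes needed to reach HV

# (lower limit, upper limit) for each class.
limits = [(start + j * width, start + (j + 1) * width - 1)
          for j in range(num_classes)]

freq = [sum(lo <= a <= hi for a in ages) for lo, hi in limits]   # f
rel_freq = [f / len(ages) for f in freq]                         # rf
cum_freq = list(accumulate(freq))                                # cf
midpoints = [(lo + hi) / 2 for lo, hi in limits]                 # X

for (lo, hi), f, rf, cf, x in zip(limits, freq, rel_freq, cum_freq, midpoints):
    print(f"{lo} – {hi} {f:>3} {rf:>5.2f} {cf:>3} {x:>5.1f}")
```

Because the ages are whole numbers, counting by class limits gives the same result as counting by class boundaries.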
What is a Stem-and-Leaf plot?
To some extent, this method overcomes the
loss of actual observations brought about by
the histogram.
The advantage of the stem-and-leaf plot over
the histogram is that we can see the actual
observations.
It was introduced by John Tukey.
The stem is the leading digit or digits.
The leaf is the trailing digit.
Example 3 (Stem and Leaf)
NU Travel Agency, a nationwide local travel
agency, offers special rates during the summer
period. The owner wants additional
information on the ages of the people
taking travel tours. A random sample of 50
customers who took travel tours last summer
revealed these ages.
Example 3 (continuation)
18 29 42 57 61 67 37 49 53 47
24 34 45 58 63 70 39 51 54 48
28 36 46 60 66 77 40 52 56 49
19 31 44 58 62 68 38 50 54 48
27 36 46 59 64 74 39 51 55 48
Construct a stem-and-leaf plot.
Example 3 (Stem and Leaf)
Stem | Leaf
1 | 8, 9
2 | 4, 7, 8, 9
3 | 1, 4, 6, 6, 7, 8, 9, 9
4 | 0, 2, 4, 5, 6, 6, 7, 8, 8, 8, 9, 9
5 | 0, 1, 1, 2, 3, 4, 4, 5, 6, 7, 8, 8, 9
6 | 0, 1, 2, 3, 4, 6, 7, 8
7 | 0, 4, 7
Stem: tens digit (leading digits)
Leaf: units digit (trailing digits)
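A stem-and-leaf plot for two-digit data can be built by splitting each value into its tens digit and units digit. The sketch below assumes all values are two-digit whole numbers, as in this example; `stems` and `lines` are illustrative names.

```python
from collections import defaultdict

# Ages of the 50 sampled customers.
ages = [
    18, 29, 42, 57, 61, 67, 37, 49, 53, 47,
    24, 34, 45, 58, 63, 70, 39, 51, 54, 48,
    28, 36, 46, 60, 66, 77, 40, 52, 56, 49,
    19, 31, 44, 58, 62, 68, 38, 50, 54, 48,
    27, 36, 46, 59, 64, 74, 39, 51, 55, 48,
]

stems = defaultdict(list)
for age in sorted(ages):
    # Stem = tens digit (leading digit), leaf = units digit (trailing digit).
    stems[age // 10].append(age % 10)

lines = [f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
         for stem, leaves in sorted(stems.items())]
print("\n".join(lines))
```

Sorting the data first keeps the leaves of each stem in ascending order, so every original observation can be read back from the plot.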
Graphing Frequency Distribution
Histogram
Frequency Polygon
Cumulative Frequency or Ogive
Example 3
Let us consider the incomes of 80 middle-income
families living in the National Capital Region.
Class Limits Class Boundaries X f cf
18 – 26 17.5 – 26.5 22 4 4
27 – 35 26.5 – 35.5 31 9 13
36 – 44 35.5 – 44.5 40 16 29
45 – 53 44.5 – 53.5 49 23 52
54 – 62 53.5 – 62.5 58 17 69
63 – 71 62.5 – 71.5 67 8 77
72 – 80 71.5 – 80.5 76 3 80
Construct a histogram, frequency polygon,
and cumulative frequency polygon.
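Before drawing, it can help to list what each graph plots. The sketch below derives the plotting coordinates from the table above, assuming the usual conventions: histogram bars span the class boundaries, frequency polygon points sit at the class midpoints, and ogive points sit at the upper class boundaries. It computes coordinates only and leaves the drawing to any plotting tool; the names are illustrative.

```python
from itertools import accumulate

# Class limits and frequencies from the table of 80 families.
limits = [(18, 26), (27, 35), (36, 44), (45, 53), (54, 62), (63, 71), (72, 80)]
freq = [4, 9, 16, 23, 17, 8, 3]

# Histogram: each bar spans (lower boundary, upper boundary) with height f.
bars = [((lo - 0.5, hi + 0.5), f) for (lo, hi), f in zip(limits, freq)]

# Frequency polygon: one point per class at (midpoint, f).
polygon = [((lo + hi) / 2, f) for (lo, hi), f in zip(limits, freq)]

# Ogive: one point per class at (upper boundary, cumulative frequency).
ogive = [(hi + 0.5, cf) for (_, hi), cf in zip(limits, accumulate(freq))]

print(bars[0], polygon[0], ogive[0])
```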
Histogram
A graph in which the classes are marked on
the horizontal axis (x-axis) and the class
frequencies on the vertical axis (y-axis).
[Figure: Histogram of Middle Income Families at NCR. x-axis: class midpoints (22, 31, 40, 49, 58, 67, 76), Salary (in Thousands); y-axis: Frequency (0 to 25).]
Frequency Polygon
A graph that displays the data using points
which are connected by lines.
[Figure: Frequency Polygon for Call Center Agents' Salary. x-axis: class midpoints (15, 18, 21, 24, 27, 30, 33), Salary (in Thousands); y-axis: Frequency (0 to 25).]
Cumulative Frequency Polygon
A graph that displays the cumulative
frequencies for the classes in a frequency
distribution.
[Figure: Ogive for Call Center Agents' Salary. x-axis: upper class boundaries (16.5, 19.5, 22.5, 25.5, 28.5, 31.5, 34.5), Real Limit (Salary in Thousands); y-axis: Cumulative Frequency (0 to 100).]
Other Types of Graphs/Charts
Pareto Chart
Bar Chart (Bar Graph)
Pie Chart (Circle Graph)
Time Series Graph
Pictograph
Scatter Plot
Example 4
Using the information in the table below about
the favorite snacks of 870 youths, construct a
Pareto chart, bar chart, and pie chart.
Products Sales
Junk Foods 135
Candy 250
Ice Cream 185
Chocolate 210
Others 90
Pareto Chart
It represents a frequency distribution for
categorical (nominal-level) data. Frequencies
are displayed by the heights of vertical bars,
which are arranged in order from highest to
lowest.
[Figure: Pareto chart, "Favorite Snacks". x-axis: Products, in the order Candy, Chocolate, Ice Cream, Junk Foods, Others; y-axis: Sales (in Millions), 0 to 300.]
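The only computation a Pareto chart needs is the ordering of the bars. A minimal sketch using the Example 4 table; the dictionary name `sales` and the list `pareto` are illustrative.

```python
# Favorite-snack data from Example 4, in the order given in the table.
sales = {
    "Junk Foods": 135,
    "Candy": 250,
    "Ice Cream": 185,
    "Chocolate": 210,
    "Others": 90,
}

# Pareto order: categories sorted from highest to lowest frequency.
pareto = sorted(sales.items(), key=lambda item: item[1], reverse=True)

for product, value in pareto:
    print(f"{product:<12}{value:>4}")
```

A plain bar chart would keep the table's original order instead of sorting.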
Bar Chart (Bar Graph)
The bases of the rectangles are arbitrary intervals
whose centers are the codes. The height of each
rectangle represents the frequency of that
category. It is also applicable to categorical
(nominal-level) data.
[Figure: Bar chart, "Favorite Snacks". x-axis: Products, in the order Junk Foods, Candy, Ice Cream, Chocolate, Others; y-axis: Sales (in Millions), 0 to 300.]
Pie Chart (Circle Graph)
A circle divided into portions that represent the
relative frequencies (or percentages) of the data
belonging to different categories. The data in a
pie chart should be categorical or nominal-level.
[Figure: Pie chart, "Favorite Snacks". Candy 29%, Chocolate 24%, Ice Cream 21%, Junk Foods 16%, Others 10%.]
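The slice sizes of a pie chart are the relative frequencies of the categories. The sketch below computes each category's percentage and its central angle (share of 360 degrees) from the Example 4 table; rounding the percentages to whole numbers is an assumption that matches the labels shown on the chart.

```python
# Favorite-snack data from Example 4.
sales = {
    "Junk Foods": 135,
    "Candy": 250,
    "Ice Cream": 185,
    "Chocolate": 210,
    "Others": 90,
}

total = sum(sales.values())   # 870 youths in all

# Percentage of the circle for each category, rounded to a whole percent.
percent = {name: round(100 * value / total) for name, value in sales.items()}

# Central angle of each slice in degrees.
angle = {name: 360 * value / total for name, value in sales.items()}

for name in sales:
    print(f"{name:<12}{percent[name]:>3}% {angle[name]:>6.1f} degrees")
```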
Time Series Graph
It represents data that occur over a specific
period of time under observation.
It shows a trend or pattern of increase
or decrease over the period of time.
Example for Time Series Graph
Using the information in the table below about
the dollar to peso exchange rate from January to
December of 2009, construct a time series
graph.
Month     Peso/US Dollar Exchange Rate
Jan       41
Feb       42
March     43
April     46
May       44
June      45
July      43
August    42
Sept      45
Oct       44
Nov       45
Dec       43
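The increases and decreases that a time series graph makes visible can also be listed as month-to-month changes. A small sketch using the table above; the names `months`, `rates`, and `changes` are illustrative.

```python
months = ["Jan", "Feb", "March", "April", "May", "June",
          "July", "August", "Sept", "Oct", "Nov", "Dec"]
rates = [41, 42, 43, 46, 44, 45, 43, 42, 45, 44, 45, 43]   # pesos per US dollar

# Change from the previous month: positive = increase, negative = decrease.
changes = {months[i]: rates[i] - rates[i - 1] for i in range(1, len(rates))}

# Month with the highest exchange rate in the period.
peak_month = months[rates.index(max(rates))]

for month, change in changes.items():
    print(f"{month:<7}{change:+d}")
print("Peak:", peak_month)
```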
Example for Time Series Graph
[Figure: Time series graph, "Peso-US Dollar Exchange Rate". x-axis: Months (Jan to Dec); y-axis: Peso per US Dollar (38 to 47).]
Pictograph
It immediately suggests the nature of the data
being shown.
It combines the attention-getting quality
of pictures with the accuracy of the bar chart.
Appropriate pictures arranged in a row
(sometimes in a column) present the
quantities for comparison.
Example for Pictograph
VSAS Realty Inc. is a real estate company that
develops housing in Rizal province. The
table shows the number of houses
constructed from 2005 to 2009. Construct
a pictograph.
Year 2005 2006 2007 2008 2009
No. of Houses 400 250 600 550 700
Example for Pictograph
[Figure: Pictograph of houses constructed per year. x-axis: Year (2005 to 2009); y-axis: No. of houses (100 to 800). Legend: one symbol = 100 houses.]
Scatter Plot
It is used to examine possible relationships
between two numerical variables.
The two variables are plotted on the x-axis and y-axis.
Example for Scatter Diagram
The owner of a chain of halo-halo stores
would like to study the effect of atmospheric
temperature on sales during the summer
season. A random sample of 12 days is
selected with the results given as follows:
Day 1 2 3 4 5 6 7 8 9 10 11 12
Temperature (°F) 79 76 78 84 90 83 93 94 97 85 88 82
Total Sales 147 143 147 168 206 155 192 211 209 187 200 150
Guidelines for Developing Graphs/Charts
✓ The graph or chart should include a title.
✓ The scales for all axes should be included.
✓ The scale on the y-axis should start at zero.
✓ The graph or chart should not distort the data.
✓ The x-axis and y-axis should be properly labeled.
✓ The graph or chart should not contain
unnecessary decorations.
✓ The simplest possible graph or chart should be
used for any data set.
Example for Scatter Plot
[Figure: Scatter plot of sales against temperature. x-axis: Temperature (X), 0 to 90; y-axis: Sales (Y), 0 to 225.]
Statistics: The only Science that enables
different experts using the same figure to
draw different conclusions.
– Evan Esar