Probability and Statistics (PHM113)
Lecture 2:
Data Organization & Data Presentation
Organizing Data
• A frequency distribution is the organization of raw data in table
form, using classes and frequencies.
• Two types of frequency distributions that are most often used are the categorical frequency distribution and the grouped frequency distribution.
Frequency distributions
  - Categorical frequency distribution
  - Quantitative frequency distribution
      - Ungrouped frequency distribution
      - Grouped frequency distribution
Categorical Frequency Distributions
• The categorical frequency distribution is used for data that can be placed in specific categories, such as nominal- or ordinal-level (qualitative) data.
• Examples: data such as favorite sport or major field of study.
Example 1
Twenty-five army inductees were given a blood test to determine their blood type.
The data set is:
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Construct a frequency distribution for the data.
Solution: Distribution of Blood Types
Since the data are categorical, discrete classes can be used. There are four blood types: A, B, O, and
AB. These types will be used as the classes for the distribution.
The frequency distribution for the data is:
Class   Frequency   Relative Frequency (%)
A       5           20
B       7           28
O       9           36
AB      4           16

$$\text{Relative Frequency (\%)} = \frac{\text{freq.}}{\text{Total No. of Values}} \times 100$$
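The counting behind this table can be checked with a short Python sketch. It uses the standard-library `collections.Counter`; the helper name `categorical_distribution` is hypothetical, introduced only for illustration.

```python
from collections import Counter

# Blood types of the 25 inductees, read row by row from the Example 1 data set
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()

def categorical_distribution(data, classes):
    """Hypothetical helper: (class, frequency, relative frequency %) per class."""
    counts = Counter(data)
    total = len(data)
    # Relative frequency (%) = freq. / total no. of values * 100
    return [(c, counts[c], counts[c] * 100 / total) for c in classes]

for cls, freq, pct in categorical_distribution(blood_types, ["A", "B", "O", "AB"]):
    print(f"{cls:<4}{freq:>3}{pct:>8.1f}")
```

Multiplying by 100 before dividing keeps these particular percentages exact in floating point.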
Grouped Frequency Distributions
• When the range of the data is large, the data must be grouped into classes that
are more than one unit in width, in what is called a grouped frequency
distribution.
• Class limits: lower limit – higher limit
• Class boundaries (for whole-number data): from $(\text{Lower limit} - 0.5)$ to $(\text{Higher limit} + 0.5)$
• Midpoint $= \dfrac{\text{Higher limit} + \text{Lower limit}}{2}$
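These rules translate directly into code. A minimal Python sketch, assuming whole-number data so that the boundaries lie 0.5 beyond the limits; the function names are hypothetical, and the class 100–104 is taken from Example 2.

```python
def class_boundaries(lower, higher, half_unit=0.5):
    """Hypothetical helper: boundaries lie half a unit beyond the class limits.

    half_unit=0.5 assumes the data are recorded as whole numbers.
    """
    return lower - half_unit, higher + half_unit

def class_midpoint(lower, higher):
    """Hypothetical helper: midpoint = (higher limit + lower limit) / 2."""
    return (higher + lower) / 2

print(class_boundaries(100, 104))  # (99.5, 104.5)
print(class_midpoint(100, 104))    # 102.0
```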
Example 2
These data represent the record high temperatures in degrees Fahrenheit (°F) for each of the 50 states. Construct a grouped frequency distribution for the data using 7 classes.
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
Step 1: Determine the classes
• Range = highest value − lowest value = 134 − 100 = 34
• 7 classes are given.
• $\text{Class width} = \dfrac{\text{Range}}{\text{number of classes}} = \dfrac{34}{7} \approx 4.9$, rounded up to 5
• Classes start: 100, 105, 110, 115, 120, 125, 130
• Class boundaries: from $(\text{Lower limit} - 0.5)$ to $(\text{Higher limit} + 0.5)$

Class limits   Class boundaries   Class midpoints
100–104        99.5–104.5         102
105–109        104.5–109.5        107
110–114        109.5–114.5        112
115–119        114.5–119.5        117
120–124        119.5–124.5        122
125–129        124.5–129.5        127
130–134        129.5–134.5        132
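Step 1 can be sketched in Python as follows. The sketch assumes whole-number data and uses `math.ceil` for the "round up" rule; it does not handle the exact-division case discussed in Note 1. The name `make_classes` is hypothetical.

```python
import math

# Record high temperatures (°F) for the 50 states, from Example 2
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

def make_classes(data, n_classes):
    """Hypothetical helper: width = range / classes (rounded up) and class limits."""
    lowest, highest = min(data), max(data)
    width = math.ceil((highest - lowest) / n_classes)
    limits = [(lowest + i * width, lowest + (i + 1) * width - 1)
              for i in range(n_classes)]
    return width, limits

width, limits = make_classes(temps, 7)
for lo, hi in limits:
    # Boundaries sit half a unit beyond each limit; midpoint is the average of the limits
    print(f"{lo}-{hi}  {lo - 0.5}-{hi + 0.5}  {(lo + hi) / 2:g}")
```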
Step 2: Find the numerical frequencies.
Class limits   Class boundaries   Class midpoints   Frequency
100–104        99.5–104.5         102               2
105–109        104.5–109.5        107               8
110–114        109.5–114.5        112               18
115–119        114.5–119.5        117               13
120–124        119.5–124.5        122               7
125–129        124.5–129.5        127               1
130–134        129.5–134.5        132               1
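The tallying in Step 2 amounts to counting the values that fall between each pair of class limits. A Python sketch; `class_frequencies` is a hypothetical name, and the class limits are copied from Step 1.

```python
# Record high temperatures (°F) for the 50 states, from Example 2
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]
# Class limits from Step 1
limits = [(100, 104), (105, 109), (110, 114), (115, 119),
          (120, 124), (125, 129), (130, 134)]

def class_frequencies(data, limits):
    """Hypothetical helper: count the values within each class's limits."""
    return [sum(1 for x in data if lo <= x <= hi) for lo, hi in limits]

freqs = class_frequencies(temps, limits)
print(freqs)       # [2, 8, 18, 13, 7, 1, 1]
print(sum(freqs))  # 50
```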
Relative Frequency
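The relative frequency of a class is its frequency divided by the total number of values, the same formula used for the blood types in Example 1. Assuming this slide applies it to the Example 2 distribution (n = 50), a short Python sketch:

```python
# Class limits and frequencies from the Step 2 table of Example 2
classes = ["100-104", "105-109", "110-114", "115-119", "120-124", "125-129", "130-134"]
freqs = [2, 8, 18, 13, 7, 1, 1]
n = sum(freqs)  # total number of values (50)

# Relative frequency = f / n, as a proportion and as a percentage
relative = [f / n for f in freqs]
percents = [f * 100 / n for f in freqs]

for cls, rel, pct in zip(classes, relative, percents):
    print(f"{cls}  {rel:.2f}  {pct:g}%")
```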
Cumulative frequency distribution
Class boundaries   Cumulative frequency
Less than 99.5     0
Less than 104.5    2
Less than 109.5    10
Less than 114.5    28
Less than 119.5    41
Less than 124.5    48
Less than 129.5    49
Less than 134.5    50
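Each cumulative frequency is a running total of the class frequencies, paired with the upper class boundary. A Python sketch using the standard-library `itertools.accumulate`:

```python
from itertools import accumulate

# Upper class boundaries and frequencies from the Example 2 table
upper_boundaries = [104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
freqs = [2, 8, 18, 13, 7, 1, 1]

# Nothing lies below the lowest boundary, 99.5, so that row starts at 0;
# each later row adds the frequency of the class just below its boundary
rows = [(99.5, 0)] + list(zip(upper_boundaries, accumulate(freqs)))

for boundary, cum in rows:
    print(f"Less than {boundary}: {cum}")
```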
Cumulative Relative Frequency
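The cumulative relative frequency divides each cumulative frequency by the total number of values. Assuming this slide applies it to the Example 2 distribution, a Python sketch:

```python
from itertools import accumulate

# Upper class boundaries and frequencies from the Example 2 table
upper_boundaries = [104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
freqs = [2, 8, 18, 13, 7, 1, 1]
n = sum(freqs)  # 50 values

# Cumulative relative frequency = cumulative frequency / n
cumulative = list(accumulate(freqs))
cum_relative = [c / n for c in cumulative]

for boundary, cr in zip(upper_boundaries, cum_relative):
    print(f"Less than {boundary}: {cr:.2f}")
```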
Note 1: When Range ÷ number of classes is a whole number
Construct a grouped frequency distribution for the following data using 5 classes:
112 100 127 120 135 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
Range = highest value − lowest value = 135 − 100 = 35
$$\text{Class width} = \frac{\text{Range}}{\text{number of classes}} = \frac{35}{5} = 7$$
• Classes start: 100, 107, 114, 121, 128
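With a width of 7, the classes are 100–106, 107–113, 114–120, 121–127, and 128–134, so the highest value, 135, falls outside every class. Because the division gives a whole number, add 1 to the class width:
$$\text{Class width} = \frac{\text{Range}}{\text{number of classes}} + 1 = \frac{35}{5} + 1 = 7 + 1 = 8$$
• Classes start: 100, 108, 116, 124, 132, so the last class, 132–139, now includes 135.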
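The rule in Note 1 can be combined with the round-up rule of Example 2 in one function. A Python sketch; `class_width` and `class_starts` are hypothetical names:

```python
import math

def class_width(lowest, highest, n_classes):
    """Hypothetical helper: range / classes, rounded up, or plus 1 if exact."""
    width = (highest - lowest) / n_classes
    if width.is_integer():
        return int(width) + 1  # Note 1: exact division, so add 1
    return math.ceil(width)    # otherwise round up, as in Example 2

def class_starts(lowest, width, n_classes):
    """Hypothetical helper: lower class limits, stepping by the width."""
    return [lowest + i * width for i in range(n_classes)]

w = class_width(100, 135, 5)
starts = class_starts(100, w, 5)
print(w, starts)           # 8 [100, 108, 116, 124, 132]
print(starts[-1] + w - 1)  # 139: higher limit of the last class, so 135 is covered
```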
Ungrouped frequency distribution
When the range of the data values is relatively small, a
frequency distribution can be constructed using single data
values for each class. This type of distribution is called an
ungrouped frequency distribution.
Example
Construct a frequency distribution for the number of cats owned by each of 20 families.
The data set is: 3, 0, 1, 4, 4, 1, 2, 0, 2, 2, 0, 2, 0, 1, 3, 1, 2, 1, 1, 3.
The frequency distribution for the data is:
No. of Cats Frequency
0 4
1 6
2 5
3 3
4 2
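Because each class is a single data value, the ungrouped distribution is a direct count. A Python sketch with `collections.Counter`:

```python
from collections import Counter

# Number of cats owned by each of the 20 families
cats = [3, 0, 1, 4, 4, 1, 2, 0, 2, 2, 0, 2, 0, 1, 3, 1, 2, 1, 1, 3]

counts = Counter(cats)
# One class per data value, from the smallest to the largest
table = [(value, counts[value]) for value in range(min(cats), max(cats) + 1)]

for value, freq in table:
    print(value, freq)
```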
Data Presentation
Stem and Leaf Plots
A stem and leaf plot is a data plot that uses part of the data value as
the stem and part of the data value as the leaf to form groups or
classes.
• A combination of sorting and graphing
• It has the advantage over a grouped frequency distribution of
retaining the actual data while showing them in graphical form.
Example
At an outpatient testing center, the number of cardiograms performed
each day for 20 days is shown. Construct a stem and leaf plot for the
data.
25 31 20 32 13
14 43 02 57 23
36 32 33 32 44
32 52 44 51 45
Solution
1. Arrange the data in order:
02, 13, 14, 20, 23, 25, 31, 32, 32, 32, 32, 33, 36, 43, 44, 44, 45, 51, 52, 57
2. Separate the data according to the first digit, as shown.
02
13, 14
20, 23, 25
31, 32, 32, 32, 32, 33, 36
43, 44, 44, 45
51, 52, 57
3. A display is made by using the leading digit as the stem and the trailing digit as the leaf:
0 | 2
1 | 3 4
2 | 0 3 5
3 | 1 2 2 2 2 3 6
4 | 3 4 4 5
5 | 1 2 7
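The three steps can be sketched in Python for two-digit whole numbers, taking the tens digit as the stem and the units digit as the leaf. The value 02 is written as 2 because Python does not allow leading zeros in integer literals; `stem_and_leaf` is a hypothetical name.

```python
from collections import defaultdict

# Cardiograms per day for 20 days (02 is written as 2)
cardiograms = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23,
               36, 32, 33, 32, 44, 32, 52, 44, 51, 45]

def stem_and_leaf(data):
    """Hypothetical helper: map each stem (tens digit) to its leaves, in order."""
    plot = defaultdict(list)
    for x in sorted(data):             # step 1: arrange the data in order
        plot[x // 10].append(x % 10)   # step 2: group by the leading digit
    return plot

plot = stem_and_leaf(cardiograms)
# Step 3: display every stem, including any stem that has no leaves
for stem in range(min(plot), max(plot) + 1):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```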
Example
An insurance company researcher conducted a survey on the number of car
thefts in a large city for a period of 30 days last summer. The raw data are
shown. Construct a stem and leaf plot by using classes
50–54, 55–59, 60–64, 65–69, 70–74, and 75–79.
52 62 51 50 69
58 77 66 53 57
75 56 55 67 73
79 59 68 65 72
57 51 63 69 75
65 53 78 66 55
Step 1: Arrange the data in order.
50, 51, 51, 52, 53, 53, 55, 55, 56, 57, 57, 58, 59, 62, 63, 65, 65, 66, 66,
67, 68, 69, 69, 72, 73, 75, 75, 77, 78, 79
Step 2: Separate the data according to the classes.
50–54: 50, 51, 51, 52, 53, 53
55–59: 55, 55, 56, 57, 57, 58, 59
60–64: 62, 63
65–69: 65, 65, 66, 66, 67, 68, 69, 69
70–74: 72, 73
75–79: 75, 75, 77, 78, 79
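With classes five units wide, each tens digit serves as the stem for two rows (leaves 0–4 and leaves 5–9). A Python sketch under that assumption; the name `group_into_classes` is hypothetical, and a class with no values would simply not appear:

```python
# Car thefts per day for 30 days
thefts = [52, 62, 51, 50, 69, 58, 77, 66, 53, 57,
          75, 56, 55, 67, 73, 79, 59, 68, 65, 72,
          57, 51, 63, 69, 75, 65, 53, 78, 66, 55]

WIDTH = 5  # classes 50-54, 55-59, ..., 75-79

def group_into_classes(data, width=WIDTH):
    """Hypothetical helper: map each class's lower limit to its leaves."""
    classes = {}
    for x in sorted(data):          # step 1: arrange the data in order
        lower = x // width * width  # step 2: e.g. 57 falls in class 55-59
        classes.setdefault(lower, []).append(x % 10)  # leaf = units digit
    return classes

for lower, leaves in group_into_classes(thefts).items():
    stem = lower // 10
    print(f"{lower}-{lower + WIDTH - 1}   {stem} | {' '.join(map(str, leaves))}")
```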
Graphs
The three most commonly used graphs in research are:
1. The histogram is a graph that displays the data by using
vertical bars of various heights to represent the frequencies of
the classes.
2. The frequency polygon is a graph that displays the data by
using lines that connect points plotted for the frequencies at
the midpoints of the classes.
3. The ogive is a graph that represents the cumulative
frequencies for the classes in a frequency distribution.
Histogram
• Back to Example 2
[Figure: histogram of the Example 2 frequency distribution]
Frequency polygon
• Back to Example 2
[Figure: frequency polygon of the Example 2 frequency distribution]
Ogive
• Back to Example 2
[Figure: ogive of the Example 2 cumulative frequency distribution]
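Each of the three graphs is built from columns of the Example 2 tables. The Python sketch below only computes the plotted points; drawing them (for example with matplotlib's `bar` and `plot` functions) is left out. Closing the frequency polygon with zero-frequency points one class width beyond the first and last midpoints is a common convention, assumed here rather than taken from the slides.

```python
from itertools import accumulate

# Columns of the Example 2 frequency distribution
lower_boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5]
midpoints = [102, 107, 112, 117, 122, 127, 132]
freqs = [2, 8, 18, 13, 7, 1, 1]
width = 5

# Histogram: one bar per class, from lower to upper boundary, height = frequency
bars = [(b, b + width, f) for b, f in zip(lower_boundaries, freqs)]

# Frequency polygon: (midpoint, frequency), closed with zero-frequency end points
polygon = ([(midpoints[0] - width, 0)]
           + list(zip(midpoints, freqs))
           + [(midpoints[-1] + width, 0)])

# Ogive: (upper boundary, cumulative frequency), starting from (lowest boundary, 0)
ogive = [(lower_boundaries[0], 0)] + [
    (b + width, c) for b, c in zip(lower_boundaries, accumulate(freqs))
]

print(bars)
print(polygon)
print(ogive)
```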