Lesson 2: DATA COLLECTION AND PRESENTATION
Learning Objectives
State the different methods of collecting and presenting data.
Differentiate probability from non-probability sampling.
Construct a frequency distribution table.
Enumerate the different graphical presentations.
Collected data are useless and meaningless unless they are properly
presented for analysis and interpretation. All statistical procedures help to describe
data.
In this lesson, you will learn the different ways of presenting data, whether tabular or
graphical. These methods of presentation bring out important characteristics of the
data more directly than is possible with statistical analysis alone.
A. Data Collection
Methods of Data Collection
Direct Method. Referred to as the interview method. This is a
person-to-person exchange of ideas between the one soliciting
information (interviewer) and the one supplying the information
(interviewee). The interview may be structured or unstructured. This
method is mainly used for a small sample size.
Indirect Method. Popularly known as the paper-and-pencil method or the
questionnaire method. The researcher prepares questions relevant
to the subject of the study.
Registration Method. Referred to as documentary analysis, where the
researcher makes use of data, facts, or information on file. These
documents are records kept as required by a certain law or policy.
They include birth, death, license, and other records.
Observation Method. Data pertaining to the behavior of an individual or a
group of individuals at the time a given situation occurs are
best obtained by direct observation. Subjects may be observed individually or
collectively, depending on the target of the investigator. This method is
also used if the subjects of the study cannot talk or write, like plants
and animals.
Experimental Method. This method examines the cause and effect of
certain phenomena. Data are obtained through a series of
experiments which require laboratory results.
B. Sampling Techniques
Probability Sampling.
It is a sampling procedure wherein every element of the population is given a
nonzero chance of being selected as a sample. This means that everyone in
the population has a chance to be included in the sample. It is also known as
Random Sampling.
Simple Random Sampling. Selection is done fairly, justly, and without bias.
The researcher sets no criteria and stays objective in the selection of samples.
Examples: drawing the winning stub from a tambiolo; selecting numbers from
a table of random numbers; and others.
Systematic Sampling. The researcher obtains samples by following a
pattern, such as taking every nth element after a starting point. The
starting point can also be chosen through random selection.
Stratified Sampling. Samples are drawn from each stratum (subgroup) of the
population, either in equal numbers or in proportion to the size of each
stratum. This technique is commonly used when there are several sources of data.
Cluster Sampling. This technique is done by choosing samples in groups.
Clusters are selected at random. When a group is chosen,
everyone in the group is considered part of the sample, regardless of who they are.
Multistage Sampling. This technique selects samples
in several stages of sampling.
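The first four techniques above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed procedure: the function names, the `seed` parameter, and the population of ID numbers are all hypothetical, and the systematic version assumes the interval is the population size divided by the sample size.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every element is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every k-th element (assumed k = N // n)."""
    rng = random.Random(seed)
    k = len(population) // n          # sampling interval
    start = rng.randrange(k)          # random start among the first k elements
    return [population[start + j * k] for j in range(n)]

def stratified_sample(strata, n, seed=None):
    """Proportional allocation: each stratum contributes according to its size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = round(n * len(members) / total)  # rounding may shift the total slightly
        sample[name] = rng.sample(members, share)
    return sample

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose whole clusters at random; all members of a chosen cluster are kept."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical population of 100 student ID numbers.
students = list(range(1, 101))
print(simple_random_sample(students, 10, seed=1))
print(systematic_sample(students, 10, seed=1))
```

Multistage sampling could be imitated by chaining these functions, for example choosing clusters first and then drawing a simple random sample within each chosen cluster.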
Non-Probability Sampling.
It is a sampling technique wherein not every element of the population is
given a chance of being selected as a sample. The researcher's preference
for certain samples guides the selection. It is otherwise known as non-random sampling.
Purposive Sampling. It is a non-random sampling technique in which
the researcher chooses samples according to criteria and rules he defines.
Quota Sampling. The researcher or investigator limits the number of
samples to the number required for the subject of his study.
Convenience Sampling. The researcher chooses his most preferred
location / venue where he can conduct his study. The researcher
specifies the place and time where he can collect his data.
C. Data Presentation
Textual Presentation
Data collected are presented in paragraph form if they are purely qualitative or when
there are very few numbers involved. This method is commonly adopted by
researchers undertaking qualitative research.
Tabular Presentation
A more effective way of presenting data is by means of a table, which
arranges the data in rows and columns. Data presented in tabular form can be
easily used for comparison and emphasis. One can easily draw relationships from
the presented table.
A statistical table has four components: table heading, body, stubs, and box heads.
Table 1
Frequency Distribution of Respondents in terms of Sex
Sex       Frequency   Percentage
Male         20           29
Female       50           71
Total        70          100
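A table like Table 1 can be produced by counting each category and dividing by the total. The sketch below is one possible way to do it; the function name `frequency_table` is hypothetical, and the list of responses is rebuilt from the frequencies in Table 1 rather than taken from any raw data.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, frequency, percentage) rows followed by a total row."""
    counts = Counter(values)
    n = sum(counts.values())
    # Percentage = frequency / total x 100, rounded to a whole number.
    rows = [(cat, f, round(100 * f / n)) for cat, f in counts.items()]
    rows.append(("Total", n, 100))
    return rows

# Responses rebuilt from Table 1: 20 males and 50 females.
responses = ["Male"] * 20 + ["Female"] * 50
for category, freq, pct in frequency_table(responses):
    print(f"{category:<8}{freq:>10}{pct:>12}")
```

Here 20/70 x 100 is about 28.57 and 50/70 x 100 is about 71.43, which round to the 29 and 71 shown in the table.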
Graphical Presentation of Data
Statistics often uses graphs for better analysis of variables. Two commonly used types
of graphs for analyzing variables are:
- Histogram (bar chart)
- Pie Chart
Histogram is a standard graph where the variants of the variable are
represented on one axis and their frequencies on the other axis.
Individual frequency values are then displayed as bars (boxes,
vectors, logs, cones, etc.).
Pie Chart represents relative frequencies of individual variants of a
variable. Frequencies are presented as proportion in a sector of a circle.
Bar Graph is used to represent discrete data, so the bars are separated
instead of being joined as in the histogram. The length of each bar
represents the frequency within the given class. The width of the bars is
arbitrary, but all bars must have the same width, as in the
histogram.
Frequency Polygon is a line chart. The frequency is placed along the
vertical axis and the individual variants are placed along the horizontal
axis. The plotted points are connected by a line.
Ogive is a graphical presentation of cumulative frequencies or relative
cumulative frequencies. The vertical axis is the cumulative frequency or
relative cumulative frequency. The horizontal axis represents the variants.
The graph always starts at zero, at the lowest variant, and ends at the
total frequency.
Pareto Graph is a bar chart for a qualitative variable with the bars
arranged by frequency. The variants are on the horizontal axis and are
sorted from the highest importance to the lowest.
Stem-and-Leaf Plot is a device for presenting quantitative data in graphical
format, similar to a histogram, to assist in visualizing the shape of a
distribution.
To construct a stem-and-leaf display, the observations must first be sorted
in ascending order. When working by hand, this is most easily done by constructing
a draft of the stem-and-leaf display with the leaves unsorted, then sorting the
leaves to produce the final stem-and-leaf display.
Here is the sorted set of data values that will be used in the following
example:
44 46 47 49 63 64 66 68 68 72 72
75 76 81 84 88 106
In this example the leaf represents the ones place and the stem represents the
rest of the number. The stem-and-leaf display is drawn with two columns separated
by a vertical line. The stems are listed to the left of the vertical line. It is important
that each stem is listed only once and that no numbers are skipped, even if it means
that some stems have no leaves. The leaves are listed in increasing order in a row to
the right of each stem.
Stem | Leaf
   4 | 4 6 7 9
   5 |
   6 | 3 4 6 8 8
   7 | 2 2 5 6
   8 | 1 4 8
   9 |
  10 | 6
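The construction described above can be sketched as a short Python function. This is an illustrative sketch that assumes non-negative whole numbers; the name `stem_and_leaf` is hypothetical. It splits each value into a stem (all digits except the last) and a leaf (the ones digit), and lists every stem even when it has no leaves.

```python
def stem_and_leaf(data):
    """Return the display as a list of text lines, one per stem."""
    values = sorted(data)                 # leaves must appear in increasing order
    leaves = {}
    for v in values:
        leaves.setdefault(v // 10, []).append(v % 10)
    lines = []
    # List every stem from the lowest to the highest, skipping none.
    for stem in range(values[0] // 10, values[-1] // 10 + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>3} | {row}".rstrip())
    return lines

data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72,
        75, 76, 81, 84, 88, 106]
print("\n".join(stem_and_leaf(data)))
```

For the sample data, stems 5 and 9 are printed with no leaves, matching the display in the example.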
D. Frequency Distribution
Frequency Distribution is an arrangement of data showing the frequency of
occurrence of the different values of the variable.
Frequency Distribution Table is the tabular arrangement of data by classes or
categories together with their corresponding frequencies.
Constructing Frequency Distribution Table
Suppose we have collected the raw data shown below:
Given: 70 83 87 76 80 87 75 84 85
76 81 82 89 77 84 86 71 80
80 79 84 86 93 83 85 88 72
84 84 92
Steps
1. Find the Range ( R) of values. Get the difference of the highest value
(HV) and the lowest value (LV).
R = HV - LV
R = 93 - 70
R = 23
2. Determine the desired number of Class Intervals (CI). The ideal number of class
intervals is somewhere between 5 and 15, preferably an odd number of class
intervals. A more systematic way is to apply the formula:
CI = 3.33 + log n
= 3.33 + log 30
= 3.33 + 1.4771
= 4.81 or 5
3. Compute the Class Size (i). Divide the computed range (R) by the
computed number of class intervals (CI).
i = R / CI
= 23 / 5 = 4.6 or 5
4. Construct a frequency table by making class intervals. Use the
lowest value as the lower limit of the first class interval, then add the
computed class size (i) to obtain the lower limit of the next class
interval. Continue adding the class size to the lower limits until you
reach the desired number of class intervals (CI). Get the upper limit of each class
interval by subtracting one from the lower limit of the next class
interval.
5. Determine the number of data (frequency) for every class interval by
tallying the raw data.
6. Write the frequency (f) of each class interval by counting
its tally marks.
7. Determine the Class mark ( X ) of each class interval. Add the lower
limit (LL) and the upper limit (UL ) then divide the sum by 2 to get its
mid-point.
8. Determine the class boundaries (CB), or true class limits, by subtracting 0.5
from every lower limit and adding 0.5 to every upper limit.
9. Determine the less than cumulative frequency ( < F ) and the greater
than cumulative frequency ( > F ). To determine the less than
cumulative frequencies, write the first class frequency (f) under the
column ( < F ), then add the frequency of the next class
interval to obtain the 2nd < F. To that cumulative sum, add the third class frequency
to obtain the 3rd < F. Continue the process until you reach
the last class interval. To determine the greater than cumulative
frequencies, write the total number of data collected (n) under the
column > F as the 1st > F. Subtract the first class frequency to obtain the
2nd > F, then subtract the second class frequency to obtain the 3rd > F.
Continue the operation until the last class interval is
reached.
10. Obtain the relative frequencies (RF) to determine the percentage
distribution of frequencies. Divide the class frequency (f) of each class
interval by the total number of data (n), then multiply by 100.
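Steps 1 to 3 above can be checked with a few lines of Python. This is a sketch, not a definitive procedure: it applies the class-interval formula exactly as the lesson prints it, and it rounds the class size up to a whole number, which is an assumption (the lesson simply rounds 4.6 to 5).

```python
import math

scores = [70, 83, 87, 76, 80, 87, 75, 84, 85,
          76, 81, 82, 89, 77, 84, 86, 71, 80,
          80, 79, 84, 86, 93, 83, 85, 88, 72,
          84, 84, 92]

# Step 1: range = highest value - lowest value
R = max(scores) - min(scores)

# Step 2: number of class intervals, using the lesson's formula as printed
CI = round(3.33 + math.log10(len(scores)))

# Step 3: class size, rounded up to a whole number (assumption)
i = math.ceil(R / CI)

print("R =", R, " CI =", CI, " i =", i)
```

For these 30 scores the sketch gives R = 23, CI = 5, and i = 5, the same values computed by hand. Note that the widely cited Sturges' rule is usually written k = 1 + 3.322 log n, which gives about 6 classes for n = 30, so results may differ if that rule is applied instead.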
Frequency Distribution Table
Class Interval    f     X     CB             <F    >F    RF
70 - 74           3     72    69.5 - 74.5     3    30    3/30 x 100 = 10
75 - 79           5     77    74.5 - 79.5     8    27    5/30 x 100 = 16.67
80 - 84          12     82    79.5 - 84.5    20    22    12/30 x 100 = 40
85 - 89           8     87    84.5 - 89.5    28    10    8/30 x 100 = 26.67
90 - 94           2     92    89.5 - 94.5    30     2    2/30 x 100 = 6.67
Based on the table above, notice that 70, 75, 80, 85, and 90 are called
lower limits (LL) and 74, 79, 84, 89, and 94 are called upper limits (UL).
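The whole table can also be built programmatically. The sketch below is one possible approach for whole-number data; the function name `frequency_distribution` and its parameters are hypothetical, and the lowest limit, class size, and number of classes are passed in as computed in the steps.

```python
def frequency_distribution(data, lowest, size, classes):
    """Build one row per class interval: limits, f, X, CB, <F, >F, and RF."""
    n = len(data)
    rows = []
    less_than = 0        # running total for the <F column
    greater_than = n     # the >F column starts at n
    for c in range(classes):
        ll = lowest + c * size
        ul = ll + size - 1                     # one less than the next lower limit
        f = sum(ll <= v <= ul for v in data)   # tally the values in this class
        less_than += f
        rows.append({
            "class": f"{ll} - {ul}",
            "f": f,
            "X": (ll + ul) / 2,                # class mark (midpoint)
            "CB": (ll - 0.5, ul + 0.5),        # class boundaries
            "<F": less_than,
            ">F": greater_than,
            "RF": round(100 * f / n, 2),       # relative frequency in percent
        })
        greater_than -= f                      # remove this class for the next >F
    return rows

scores = [70, 83, 87, 76, 80, 87, 75, 84, 85,
          76, 81, 82, 89, 77, 84, 86, 71, 80,
          80, 79, 84, 86, 93, 83, 85, 88, 72,
          84, 84, 92]

for r in frequency_distribution(scores, lowest=70, size=5, classes=5):
    print(r["class"], r["f"], r["X"], r["CB"], r["<F"], r[">F"], r["RF"])
```

Run on the 30 scores, the frequencies, cumulative frequencies, and relative frequencies agree with the table above.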
Try to answer the following :
1. Which class has the greatest frequency ?
2. Which class has the least frequency?
3. What limits does 85 - 89 class interval have?
4. How many respondents got 80 and above?
5. How many respondents got 89 and below?
6. About what percentage of the respondents belongs to 75 - 79?
7. What is the midpoint of 80 - 84?
Definition of Terms
Range (R) – it is determined by the difference between the highest and
lowest values.
Class Interval (CI) – it is a grouping or category defined by a
lower limit and an upper limit.
Class Size (i) – refers to the quotient of the computed range
and the desired number of class intervals.
Class frequency (f) – refers to the number of observations
belonging to a class interval or the number of items within a
category.
Class Boundaries (CB) – the true limits, which are situated
between the upper limit of one interval and the lower limit of the
next interval. These are more precise expressions of the class
limits, extending each limit by 0.5.
Class Mark ( X ) – refers to the midpoint of a class
interval. It is obtained by adding the lower and upper limits and dividing
the sum by 2.
Cumulative Frequency – the total number of observations that
have values less than or equal to a specified amount.
Relative Frequency (RF) – the percentage of the total observations
that falls in each class interval.
Exercise 2
1. Construct a complete frequency distribution table for the following data.
35 58 43 80 48 85 42 39 63 44 35
54 38 63 62 65 37 76 46 34 34 45
36 44 42 47 51 40 31 80 54 50 50
34 50
Find the following
1.1 Class size
1.2 Number of classes
1.3 Class mark of the 3rd class
1.4 Lower limit of the 4th class
1.5 Upper class boundary of the third class
1.6 Total frequency
1.7 Highest frequency
1.8 Class that comprises 30% of the distribution
1.9 Class with the highest frequency
1.10 Class boundaries of the class with the lowest frequency
2. The grades given to you are the following:
84 81 74 92 80 88 98 79
82 85 97 82 89 84 86 91
85 87 95 90 90 84 93 92
88 85 86 90 86 89 88 91
88 98 96 94 83 92 95 87
From this data, prepare the following:
1. Stem-and-leaf display
2. Complete frequency distribution table using 5 class intervals
3. Histogram