Chapter - Two

Chapter Two discusses methods of data collection and presentation, highlighting primary and secondary sources of data. It outlines various data collection methods such as observation, questionnaires, and interviews, as well as presentation techniques including tabular and graphical formats. The chapter emphasizes the importance of organizing raw data for statistical analysis and provides guidelines for constructing frequency distributions and visual representations like histograms and pie charts.

Uploaded by

gesgisermias

Chapter Two

2. Methods of Data Collection and Presentation
2.1. Sources of Data
• There are two sources of data.
• These are Primary sources and Secondary sources.
• Primary sources of data are the objects or persons from which we collect figures as first-hand information.
• The data obtained from primary sources are measurements observed or recorded as part of an original study or survey.
• Secondary sources are either published or unpublished
materials or records.
• Secondary data can be defined literally as second-hand information, that is, information that was gathered by someone else (e.g., researchers, institutions, other NGOs).
Some of the sources of secondary data are:
• Government documents
• Official statistics
• Research institutes
• Technical reports
• Scholarly journals
• Trade journals
• Review articles
• Reference books
• Universities
• Hospitals
• Libraries
• Library search engines
Before using secondary data, the investigator should examine:
• The type and objective of the situation.
• Whether the purpose for which the data were collected is compatible with the present problem.
• Whether the nature and classification of the data are appropriate to the problem.
• Whether the published data are free of bias and misreporting.
• The reliability, homogeneity, and completeness of the data.
Collecting scientific primary data involves two activities: planning and measuring.
1. Planning:
• Identify the source and elements of the data.
• Decide whether to consider a sample or a census.
• If sampling is preferred, decide on the sample size, selection method, etc.
• Decide on the measurement procedure.
• Set up the necessary organizational structure.

2. Measuring: there are different options:
• Telephone interview
• Mail questionnaires
• Personal interview
• New product registration
• Laboratory experiment/experimental design
• Focus group discussion
Assignment: Please read about the advantages and disadvantages of the methods of data collection.
2.2. Methods of Data Collection
1. Observation: involves recording the behavioral patterns of people, objects, and events in a systematic manner.

2. Questionnaire: a popular means of collecting data. A set of questions is administered to respondents either in person or by mail (email, postal, etc.).

3. Interviewing: interviews can be undertaken in person (face to face) or by telephone (indirect method).

4. Extract from Records/Documentary Sources: a method of collecting information (secondary data) from published or unpublished sources.
2.3. Methods of Data Presentation

• So far you know how to collect data. What do we do with the collected data next?

• Now you have to present the data you have collected. The collected data, also known as 'raw data', are always in an unorganized form.

• They need to be organized and presented in a meaningful and readily comprehensible form in order to facilitate further statistical analysis.

• The presentation of data is broadly classified into the following two categories:
• Tabular presentation
• Diagrammatic and graphic presentation
2.3.1. Tabular presentation of data

Tables are important for summarizing a large volume of data in a more understandable way.

Based on the characteristics they present, tables are:

i. Simple (one-way) table: a table that presents one characteristic, for example an age distribution.

ii. Two-way table: a table that presents two characteristics in columns and rows, for example age versus sex.

iii. Higher-order table: a table that presents three or more characteristics in one table.
In statistics we usually use a frequency distribution table for different types of data.

• Frequency: the number of values in a specific class of the distribution.

• Frequency distribution: the organization of raw data in table form, using classes and frequencies.

There are three basic types of frequency distributions:

• Categorical frequency distribution
• Ungrouped frequency distribution
• Grouped frequency distribution
Categorical Frequency Distribution

• Used for data that can be placed in specific categories, such as nominal or ordinal level data.

• For example: marital status, political affiliation, religious affiliation, blood type, …
Steps for constructing a categorical frequency distribution

Step 1: Confirm that the data are on a nominal or ordinal scale of measurement.

Step 2: Make a table with columns for the class (A), the tally (B), the frequency (C), and the percent (D).

Step 3: Put the distinct values of the data set in column A.

Step 4: Tally the data and place the result in column B.

Step 5: Count the tallies and place the results in column C.

Step 6: Find the percentage of values in each class by using the formula
$\% = \dfrac{f}{n} \times 100$,
where $f$ is the frequency of the class and $n$ is the total number of values.
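
The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `categorical_frequency_table` and the small blood-type sample are assumptions made here, and the sample is not the data of Example 2.1.

```python
from collections import Counter


def categorical_frequency_table(values):
    """Return (class, frequency, percent) rows for nominal or ordinal data."""
    n = len(values)
    counts = Counter(values)          # Steps 3-5: distinct classes and their counts
    rows = []
    for cls in sorted(counts):
        f = counts[cls]
        percent = 100 * f / n         # Step 6: % = f / n * 100
        rows.append((cls, f, percent))
    return rows


# Hypothetical sample, for illustration only (not the data of Example 2.1).
sample = ["A", "B", "B", "AB", "O", "O", "O", "B", "A", "O"]

rows = categorical_frequency_table(sample)
for cls, f, percent in rows:
    print(f"{cls:<6}{f:>3}{percent:>8.1f}%")
```

The tally column is omitted because counting by program makes it unnecessary; the percentages should always sum to 100.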
Example

Example 2.1: Twenty-five army inductees were given a blood test to determine their blood type. The data set is given as follows:

Construct a frequency distribution for the above data.


Ungrouped Frequency Distribution

• Is a table of all the potential raw score values that could possibly occur in the data, along with the number of times each actually occurred.

• Is often constructed for a small data set or for data on a discrete variable.

• The major components of this type of frequency distribution are class, tally, frequency, and cumulative frequency.
• Cumulative frequency (CF): shows how many values are accumulated up to and including a specific class.

• Less than cumulative frequency (LCF): the total number of observations in or below a specified class.

• More than cumulative frequency (MCF): the total number of observations in or above a specified class.
Constructing ungrouped frequency distribution

• First find the smallest and largest raw scores in the collected data.

• Arrange the data in order of magnitude and count the frequency of each value.

• To facilitate counting, one may include a column of tallies.

Example
Example 2.2: A demographer interested in the number of children a family may have took a sample of 30 families and obtained the following observations.

Number of children in a sample of 30 families

Construct a frequency distribution for these data.


Solution
• These individual observations can be arranged in ascending or descending order of magnitude:
• 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 7, 7, 8, 8, 8

• The frequency distribution of the number of children in the 30 families is as follows:

Class   Tally        Frequency   LCF   MCF
2       |||||            5         5    30
3       ||||| ||         7        12    25
4       ||||| |||        8        20    18
5       ||||             4        24    10
6       |                1        25     6
7       ||               2        27     5
8       |||              3        30     3
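
As a cross-check, the table for Example 2.2 can be reproduced with a short Python sketch. The function name `ungrouped_frequency_table` is an assumption made here for illustration; the observations are the sorted values listed in the solution. Note that this sketch lists only values that actually occur, so a value with zero frequency would need to be added by hand.

```python
from collections import Counter
from itertools import accumulate

# Sorted observations from Example 2.2 (number of children in 30 families).
children = [2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4,
            5, 5, 5, 5, 6, 7, 7, 8, 8, 8]


def ungrouped_frequency_table(values):
    """Return (class, frequency, LCF, MCF) rows for a discrete data set."""
    counts = Counter(values)
    classes = sorted(counts)
    freqs = [counts[c] for c in classes]
    lcf = list(accumulate(freqs))                    # running total from the smallest class
    mcf = list(accumulate(reversed(freqs)))[::-1]    # running total from the largest class
    return list(zip(classes, freqs, lcf, mcf))


table = ungrouped_frequency_table(children)
print("Class  Frequency  LCF  MCF")
for cls, f, less, more in table:
    print(f"{cls:<7}{f:<11}{less:<5}{more}")
```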
Grouped Frequency Distribution

• Several values are grouped into one class, so each class is more than one unit wide.

• Used when the range of the data is large, and for data from a continuous variable.
Some of the most frequently used basic terms

• Upper class limits: the largest numbers that can belong to the different classes.

• Lower class limits: the smallest numbers that can belong to the different classes.

• Class boundaries (true class limits): the numbers used to separate classes, but without the gaps created by class limits.

• Class marks (midpoints): the midpoints of the classes.

• Class width: the difference between two consecutive lower class limits or two consecutive lower class boundaries.
Steps in constructing grouped frequency distribution

Step 1: Find the highest and the lowest values.

Step 2: Find the range: $R = \text{highest value} - \text{lowest value}$.

Step 3: Select the number of classes desired. Either choose the number of classes arbitrarily between 5 and 20, or use Sturges' rule, $k = 1 + 3.322 \log_{10} n$, where $k$ is the number of classes desired and $n$ is the number of observations.

Step 4: Find the class width ($W$) by dividing the range by the number of classes: $W = R / k$.

Note that: Round the value of W up to the nearest whole number if there is a remainder. For instance, both 4.7 and 4.12 round up to 5.
Step 5: Select the starting point as the lowest class limit. This is usually the lowest score (observation).

• Add the width to that score to get the lower class limit of the next class.
• Keep adding until you reach the number of classes chosen in Step 3.

Step 6: Find the upper class limit of the first class by subtracting the unit of measurement (U) from the lower class limit of the second class.

• Then add the width to each upper class limit to get all the upper class limits.
Step 7: The unit of measurement (U) is the smallest difference between consecutive possible observations. For example, U = 1 for data recorded in whole numbers and U = 0.1 for data recorded to one decimal place.

Note that: U = 1 is the maximum value of the unit of measurement.

Step 8: Find the class boundaries: subtract U/2 from each lower class limit and add U/2 to each upper class limit.

Step 9: Tally the data and write the numerical values for the tallies in the frequency column.

Step 10: Find the cumulative frequencies (LCF and MCF).

Step 11: Find the relative frequency and/or the relative cumulative frequency.

• A relative frequency distribution enables us to understand the distribution of the data and to compare different sets of data.
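
The eleven steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive procedure: the function name `grouped_frequency_table` and the 20 scores are hypothetical, the number of classes from Sturges' rule is rounded to the nearest whole number (the slides do not say how to round it), and one extra class is appended if the classes from Step 3 would not reach the highest value.

```python
import math


def grouped_frequency_table(data, unit=1):
    """Build a grouped frequency distribution; `unit` is the unit of measurement U."""
    n = len(data)
    low, high = min(data), max(data)                     # Step 1
    data_range = high - low                              # Step 2: R = highest - lowest
    k = round(1 + 3.322 * math.log10(n))                 # Step 3: Sturges' rule (rounding is an assumption)
    width = math.ceil(data_range / k)                    # Step 4: round W up
    lower_limits = [low + i * width for i in range(k)]   # Step 5
    if lower_limits[-1] + width - unit < high:           # assumption: add a class if the maximum is not covered
        lower_limits.append(lower_limits[-1] + width)

    rows, lcf = [], 0
    for lower in lower_limits:
        upper = lower + width - unit                     # Step 6
        freq = sum(lower <= x <= upper for x in data)    # Step 9
        mcf = n - lcf                                    # Step 10: observations in or above this class
        lcf += freq                                      # Step 10: observations in or below this class
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - unit / 2, upper + unit / 2),  # Step 8
            "frequency": freq,
            "lcf": lcf,
            "mcf": mcf,
            "relative": freq / n,                        # Step 11
        })
    return rows


# Hypothetical whole-number scores, for illustration only.
scores = [10, 12, 15, 17, 18, 21, 22, 24, 25, 27,
          28, 30, 31, 33, 35, 38, 40, 43, 46, 49]

for row in grouped_frequency_table(scores):
    print(row)
```

For these 20 scores the range is 39, Sturges' rule gives about 5.3 (so 5 classes), and the width is 39 / 5 = 7.8, rounded up to 8. The classes are therefore 10-17, 18-25, 26-33, 34-41, and 42-49.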
Example
Example 2.3: Consider the following set of data and construct
the frequency distribution.

Solution:
Diagrammatic Presentation of the Data

• We have discussed the techniques of classification and tabulation that help us organize the collected data in a meaningful fashion.

• However, this way of presenting statistical data is not always interesting to a layman.

• One of the most effective and interesting alternative ways of presenting statistical data is through diagrams and graphs.

• The three most commonly used diagrammatic presentations for discrete as well as qualitative data are:
• Pie charts
• Pictograms
• Bar charts
Pie Chart
• A pie chart can be used to compare the relation between the whole and its components.

• A pie chart is a circular diagram in which the area of each sector of the circle represents one component.

• To construct a pie chart (sector diagram), we draw a circle with radius proportional to the square root of the total.

• The total angle of the circle is $360^\circ$.
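
Because the whole circle corresponds to the total, each component takes a sector whose angle is its share of 360 degrees, that is, (f / n) × 360. The Python sketch below illustrates this; the function name `sector_angles` and the category counts are hypothetical and chosen only for illustration.

```python
def sector_angles(frequencies):
    """Map each category to its sector angle in degrees: (f / n) * 360."""
    total = sum(frequencies.values())
    return {category: 360 * f / total for category, f in frequencies.items()}


# Hypothetical category counts, for illustration only.
counts = {"A": 5, "B": 4, "AB": 1, "O": 10}

angles = sector_angles(counts)
for category, angle in angles.items():
    print(f"{category:<4}{angle:>7.1f} degrees")
```

The angles always add up to 360, which is a quick check that no category has been left out.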


Bar Charts

• Bar charts (simple bar charts, multiple bar charts) use vertical or horizontal bars to represent the frequencies of a distribution.

• A simple bar chart is used to represent data involving only one variable classified on a spatial, quantitative, or temporal basis.

• In a simple bar chart, we make bars of equal width but variable length, i.e. the magnitude of a quantity is represented by the height or length of the bar.
Example
• Draw a simple bar diagram to represent the profits of a bank for 5 years.
Multiple Bars
• When two or more interrelated series of data are depicted by a bar diagram, the diagram is known as a multiple-bar diagram.

• Suppose we have export and import figures for a few years.

• We can display them as two bars next to each other, one for exports and the other for imports. This is suitable where some comparison is involved.
Graphical Presentation of Data

• Graphical presentation is often used for continuous data, that is, for results from a grouped frequency distribution and for continuous variables distributed over time.

A. Histogram

B. Frequency Polygon

C. Ogive Graph
Procedures for constructing statistical graphs:
• Draw and label the X and Y axes.

• Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y axis.

• Represent the class boundaries (for the histogram or ogive) or the midpoints (for the frequency polygon) on the X axis.

• Plot the points.

• Draw the bars, or draw lines to connect the points.



Histogram
• A histogram is a special type of bar graph in which the horizontal scale represents classes of data values and the vertical scale represents frequencies.

• The heights of the bars correspond to the frequency values, and the bars are drawn adjacent to each other (without gaps).

• We can construct a histogram once we have completed a frequency distribution table for the data set.
Example
Example 2.9: The histogram for the data in Example 2.4 is shown below.

[Histogram: vertical axis "Frequency", scaled from 0.0 to 7.0; horizontal axis "Class boundaries", marked at 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, and 41.5.]
Frequency Polygon
• A frequency polygon uses line segments connecting points located directly above the class midpoint values.

• The heights of the points correspond to the class frequencies. The line segments are extended to the left and right so that the graph begins and ends on the horizontal axis, at the positions where the previous and next midpoints would be located.
[Frequency polygon: vertical axis labeled "Com. frequency", scaled from 2.0 to 7.0; horizontal axis "Midpoints", marked at 2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, and 44.5.]
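
The points of a frequency polygon can be computed as in the Python sketch below. The class boundaries are those marked on the histogram axis of Example 2.9; the class frequencies are hypothetical, since the underlying data are not reproduced here, and the function name `frequency_polygon_points` is an assumption. Equal class widths are assumed.

```python
def frequency_polygon_points(boundaries, frequencies):
    """Return (midpoint, frequency) points, closed on the horizontal axis at both ends."""
    width = boundaries[1] - boundaries[0]      # assumes all classes have equal width
    midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    # Extend one class width to the left and right with zero frequency.
    xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))


boundaries = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5]   # from the histogram axis
frequencies = [2, 3, 5, 6, 4, 1]                          # hypothetical, for illustration only

for x, y in frequency_polygon_points(boundaries, frequencies):
    print(x, y)
```

The resulting horizontal coordinates run from 2.5 to 44.5 in steps of 6, which matches the midpoint axis of the frequency polygon figure.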
Ogive Graph
• An ogive is a line graph that depicts cumulative frequencies.
• Note that the ogive uses class boundaries along the horizontal scale, and the graph begins at the lower boundary of the first class and ends at the upper boundary of the last class.

• There are two types of ogive, namely the less-than ogive and the more-than ogive.
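
Both kinds of ogive can be sketched in Python from the class boundaries and frequencies. As before, the boundaries are those on the histogram axis of Example 2.9, while the frequencies and the function name `ogive_points` are hypothetical and used only for illustration.

```python
from itertools import accumulate


def ogive_points(boundaries, frequencies):
    """Return the points of the less-than and more-than ogives."""
    n = sum(frequencies)
    below = [0] + list(accumulate(frequencies))      # observations below each boundary
    less_than = list(zip(boundaries, below))         # rises from 0 to n
    more_than = list(zip(boundaries, [n - c for c in below]))  # falls from n to 0
    return less_than, more_than


boundaries = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5]   # from the histogram axis
frequencies = [2, 3, 5, 6, 4, 1]                          # hypothetical, for illustration only

less_than, more_than = ogive_points(boundaries, frequencies)
print("Less than:", less_than)
print("More than:", more_than)
```

The less-than ogive starts at zero on the lower boundary of the first class, and the more-than ogive ends at zero on the upper boundary of the last class.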
End

Thank you!!!
Quiz (5%)
An investigator was interested in studying the marital status of people in a certain town, which is often grouped as Single (S), Married (M), Divorced (D), and Widowed (W). The following data were obtained.
DSDDSWSDSSDDWMMSDDDWMSSWMDDM
DWDSSWDDSDSMWMDSDWDMSSDWWSSS
WSDMWSS
A. To which scale of measurement do these data belong?
B. Summarize the data by constructing the appropriate frequency
distribution
C. Present the data using the appropriate Graph/Diagrams.
