Research Methods Chapter 7
The statistical investigation can take two forms. In the first, the researcher studies every unit of the field of study (survey) and draws conclusions by aggregating the results for all units. This type of survey is called a census survey. In the second, the researcher studies only some units in the field of survey; this type of survey is called a sample survey. In the sample technique, some units are taken as representative of the whole field or domain, and the conclusions drawn from the sample are extended to the whole population. This chapter emphasizes the second form of investigation, as it is the form most commonly used in research work.
Before going into the details and uses of sampling, it is appropriate to be familiar with some basic definitions concerning sampling.
Population: The theoretically specified aggregation of survey elements from which the survey sample is actually selected.
Sampling Frame: The list of elements from which the sample is drawn.
Sampling: The process of using a small number or part of a larger population to draw conclusions about the whole population.
Element: The unit from which information is collected and which provides the basis of analysis.
Statistic and parameter: When we work out a measure such as the mean from a sample, it is called a statistic. When such a measure describes a characteristic of the population, it is called a parameter.
The precision and accuracy of survey results are affected by the manner in which the sample has been chosen, so strict attention must be paid to planning the sample. Regardless of the type of project to be conducted, the process of selecting a sample follows well-defined activities.
Sample planning involves the following procedure:
Defining population
Census Vs Sample
Sampling Design
Sample Size
The first thing the sample plan must include is a definition of the population to be investigated. Defining the target population means specifying the subject of the study. Specifying a population involves identifying which elements (items) are included, as well as where and when. If the research problem is not properly defined, defining the population will be difficult.
For example, a financial institution considering making a new type of loan plan available might acquire information from any one or all of the following groups:

Elements (who)                        Location (where)            Time (when)
Depositors who have borrowed money    Designated bank             For the last 12 months
All people who have borrowed money    Specified geographic area   For the last 12 months
Thus, the researcher must begin with careful specification of his population.
Once the population has been defined, the researcher must decide whether the survey is to be conducted among all members of the population or only a subset of it. That is, a choice must be made between a census and a sample.
Advantages of census
Reliability: Data derived through a census are highly reliable. The only possible errors are errors of computation.
Limitations of census
Excessive time and energy: Besides the cost factor, a census survey takes a long time and consumes a great deal of energy.
Need for sampling
The use of a sample in a research project has the objective of estimating, testing and making inferences about a population on the basis of information taken from the sample.
Sampling can save time and money (it is more economical than a census). Sampling may enable more accurate measurement, because a sample study is generally conducted by trained and experienced investigators. Sampling remains the only way when the population contains infinitely many members. It usually makes it possible to estimate the sampling error and thus assists in obtaining information about some characteristics of the population.
If the sample units are chosen with due care and the matter under survey is not heterogeneous, the conclusions of a sample survey can be almost as reliable as those of a census survey.
Sampling also enables researchers to carry out a detailed study: as the number of sample units is fairly small, these can be studied intensively and elaborately.
Limitations of sampling
Less accuracy: Compared with the census technique, conclusions derived from a sample are more liable to error. Therefore, the sampling technique is less accurate than the census technique.
Misleading conclusions: If the sample is not carefully selected, or if sample units are arbitrarily selected, the conclusions derived from them will be misleading when extended to the whole population.
Need for specialized knowledge: The sampling technique can be successful only if a competent and able scientist makes the selection.
A beginning researcher commonly asks when and where the sampling technique is appropriate for his study. The sampling technique is used under the following conditions:
Vast data: When the number of units is very large, the sampling technique must be used, because it economizes on money, time and effort.
When utmost accuracy is not required: The sampling technique is very suitable in situations where 100% accuracy is not required; otherwise the census technique is unavoidable.
When a census is impossible: If we want to know the amount of mineral wealth in a country, we cannot dig all the mines to discover and count it. Rather, we have to use the sampling technique.
Homogeneity: If all units of the population are alike (similar), the sampling technique is easy to use.
Characteristics of a good sample
Representativeness: An ideal sample must adequately represent the whole population. It should not lack any quality found in the whole population.
Adequacy: The number of units included in the sample should be sufficient to derive conclusions applicable to the whole population. A sample containing 10% of the whole population can be considered adequate.
Homogeneity: The elements included in the sample must bear a likeness to the other elements of the population.
Operationally, sample design is the heart of sample planning. Specifying the sample design, including the method of selecting individual sample units, involves both theoretical and practical considerations. The sample design should answer the following questions:
What type of sample should be used? Different types of samples are considered and examined, and an appropriate sampling technique is selected.
What is the appropriate sampling unit? Is a single element or a group of elements of the defined population subject to selection in the sample? The sampling unit can thus be either a single element or a group of elements.
In actual practice, the sample will be drawn from a list of population elements, which can be different from the target population that has been defined. The sample frame is the list of elements from which the sample is drawn; it is a physical list of the population elements. Ideally, the sample frame should identify each population element once and only once, and it should not include elements that are not in the defined population.
The most widely used frame in survey research is the telephone directory. Using such a frame, however, may lead to errors arising from the exclusion of:
Voluntarily unlisted numbers
Involuntarily unlisted numbers
[Figure: Relationship between population and frame, showing an incomplete frame, a too-complete frame and a complete frame]
How are refusals and non-response to be handled? The sample plan must include provisions for handling refusals and non-response, such as whether additional sampling units are to be chosen as replacements and, if so, how they are to be selected. These matters should be planned well ahead.
A researcher is concerned about sample size because sample size (the number of elements in the sample) and the precision of the study are directly related: the larger the sample, the higher the precision. Sample size determination is largely a statistical activity that requires statistical knowledge. There are a number of methods for determining sample size.
Personal judgment: In some cases, the researcher's personal judgment and subjective decision can be used as a basis for determining the size of the sample.
Budgetary approach: Under this approach, the sample size is determined by the funds available for the proposed study.
E.g., if the cost of surveying one individual or unit is 30 birr and the total available fund for the survey is, say, 1800 birr, the sample size is determined as
$$n = \frac{\text{total survey budget}}{\text{cost per unit surveyed}} = \frac{1800}{30} = 60 \text{ units}$$
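The budgetary approach is a single division. The sketch below is a minimal illustration in Python; the function name `budget_sample_size` is hypothetical, and it assumes that leftover funds too small to cover a whole unit are simply left unused.

```python
def budget_sample_size(total_budget: float, cost_per_unit: float) -> int:
    """Budgetary approach: number of units the available funds can cover."""
    if cost_per_unit <= 0:
        raise ValueError("cost per unit must be positive")
    # Floor division: a unit that cannot be fully paid for is not surveyed.
    return int(total_budget // cost_per_unit)


print(budget_sample_size(1800, 30))  # 60 units, as in the example
```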
Traditional inference: This approach is based on the precision rate and the confidence level. To estimate the sample size using this approach, we need information about the estimated variance of the population, the magnitude of acceptable error and the confidence level.
Variance or heterogeneity of the population: This refers to the spread (variance or standard deviation) of the population. The sample size depends on the variance of the population: if the population is homogeneous, a small sample can be enough.
E.g., predicting the average age of college students vs. predicting the average age of people visiting a given supermarket on a given day. The ages of college students vary much less, so a smaller sample would suffice.
If information about the variance is not available, the researcher is expected to estimate it. Estimating the variance or standard deviation is not an easy undertaking. The researcher can either carry out a pilot study to estimate the population standard deviation or use the rule of thumb. According to the rule of thumb, the standard deviation is about one-sixth of the range, since nearly all values of a normally distributed variable lie within three standard deviations on either side of the mean.
E.g., if households' yearly average income is expected to range between 1500 and 24000 birr, the range is 24000 - 1500 = 22500, and by the rule of thumb the standard deviation will be
$$\sigma \approx \frac{22500}{6} = 3750$$
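The rule of thumb can be written as a one-line helper. This is a minimal Python sketch; `sd_rule_of_thumb` is a hypothetical name, and the approximation is only as good as the assumption that the variable is roughly normal.

```python
def sd_rule_of_thumb(low: float, high: float) -> float:
    """Rough standard deviation: one-sixth of the expected range.

    Assumes the variable is roughly normal, so nearly all values lie
    within three standard deviations on either side of the mean.
    """
    return (high - low) / 6


print(sd_rule_of_thumb(1500, 24000))  # 3750.0
```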
Magnitude of acceptable error: The magnitude of error (range of possible error) indicates how precise the study must be; it is the error acceptable for that study. The researcher makes a subjective judgment about the desired magnitude of error.
E.g., to estimate the average income of households, one may allow an error of, say, 50 birr.
Confidence level: In most research, a 95% confidence level is used. That is, it is assumed that in 95 out of 100 samples, the interval estimated from the sample will include the population parameter.
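Once the above concepts are understood and determined, the sample size is quite simple to determine. It is based on the following relationship:
$$n = \left(\frac{Z\,S}{E}\right)^2$$
where Z is the standard normal value for the chosen confidence level, S is the (estimated) population standard deviation and E is the acceptable error.
E.g., if household yearly income is expected to range from 1000 to 25000, the range is 24000, and by the rule of thumb the standard deviation is 24000 × 1/6 = 4000.
For a 95% confidence level, Z = 1.96. With E = 20 and S = 200:
$$n = \left(\frac{1.96 \times 200}{20}\right)^2 = 19.6^2 = 384.16 \approx 385$$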
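The relationship n = (Z S / E)^2 can be sketched in Python as follows. The function names are hypothetical; Z is looked up with the standard library's `statistics.NormalDist`, and the result is rounded up, since a fractional unit cannot be surveyed.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Two-sided standard normal value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def sample_size_for_mean(z: float, s: float, e: float) -> int:
    """n = (Z * S / E) ** 2, rounded up to the next whole unit."""
    return math.ceil((z * s / e) ** 2)


print(round(z_for_confidence(0.95), 2))     # 1.96
print(sample_size_for_mean(1.96, 200, 20))  # 385
```

Note how sensitive n is to S / E: keeping E = 20 but using the income example's S = 4000 would demand a sample of well over a hundred thousand units.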
Bayesian statistics: Under this approach, the sample size selected is the one that maximizes the difference between the expected value of information (EVI) and the cost of sampling. That is, at the optimum, the marginal cost of information (MCI) equals the marginal value of information (MVI). Determining the optimum sample size therefore requires comparing the weighted cost of additional information against the additional expected value of information.
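The Bayesian rule can be sketched as a search for the n that maximizes EVI(n) minus cost(n). This is a minimal Python sketch: the text gives no functional forms, so the EVI curve (`hypothetical_evi`, diminishing returns capped at 1000) and the cost of 2 per unit (`hypothetical_cost`) are invented purely for illustration.

```python
import math
from typing import Callable


def optimal_sample_size(
    evi: Callable[[int], float],
    cost: Callable[[int], float],
    n_max: int,
) -> int:
    """Return the n in 0..n_max that maximizes EVI(n) - cost(n)."""
    return max(range(n_max + 1), key=lambda n: evi(n) - cost(n))


def hypothetical_evi(n: int) -> float:
    # Invented EVI curve: value of information grows with n but levels off.
    return 1000 * (1 - math.exp(-n / 100))


def hypothetical_cost(n: int) -> float:
    # Invented variable cost of 2 per sampled unit.
    return 2 * n


best = optimal_sample_size(hypothetical_evi, hypothetical_cost, n_max=1000)
print(best)  # 161
# Near the optimum, the value of one more unit is roughly its cost (2).
print(hypothetical_evi(best + 1) - hypothetical_evi(best))
```

Scanning every n is crude but makes the marginal condition visible: before the optimum each extra unit adds more value than it costs, and after it the reverse holds.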
The main reason a large sample size is desired is that sample size is related to random sampling error: the larger the sample, the smaller the random sampling error.
v. Cost of Sampling
The sample plan must take into account the estimated cost of sampling. Such costs are of two types: overhead costs and variable costs. In reality, however, it may be difficult, and to some people not reasonable, to separate sampling cost from the overall study cost.
The last step in sample planning is the execution of the sampling process (procedure); in short, the sample is actually chosen. The actual requirements of the sampling procedure depend on the sampling technique chosen.
Sampling techniques are basically of two types, namely non-probability sampling and probability sampling.
8.2.1. Non-probability
Non-probability sampling does not give each element of the population a known chance of being included in the sample. Units are selected at the discretion of the researcher, and such samples derive their control from the researcher's judgment. Some of the disadvantages of non-probability sampling are the following:
No confidence can be placed in the data obtained from such samples, as they do not represent the larger population. Therefore, the results obtained may not be generalized to the entire population.
Sometimes such samples are based on an obsolete frame, which does not adequately cover the population.
The advantages of non-probability sampling, on the other hand, are that it is much less complicated and less expensive, and the researcher can take advantage of available respondents without the statistical complexity of probability sampling. Moreover, it is very convenient when the sample to be selected is very small and the researcher wants to get some idea of the population's characteristics.
Non-probability sampling can be adequate if the researcher has no desire to generalize the findings beyond the sample, or if the study is merely a trial run for a larger study (preliminary research). The main types of non-probability sampling are:
Quota sampling
Judgment sampling
Snowball sampling
Convenience sampling
1. Quota sampling
Under this sampling approach, interviewers are simply given quotas to be fulfilled from the different strata (groups).
E.g., an interviewer in a particular city may be assigned, say, 100 interviews, which are divided among different subgroups (say, 50 male respondents and 50 female respondents).
Even though quota sampling is not probabilistic, the researcher must take precautions against biased selection and make sure that the sample is as representative and generalizable as possible.
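Quota filling can be sketched as accepting respondents in the order they are met until each group's quota is used up. This is a minimal Python sketch; `fill_quotas` and the respondent stream are hypothetical. It makes the non-random step explicit: within each group, whoever comes first is taken.

```python
from collections import Counter


def fill_quotas(respondents, quotas, key):
    """Accept respondents in the order they are met until every quota is filled.

    Selection within each group is NOT random: whoever the interviewer
    meets first is taken, which is why quota samples can be biased.
    """
    remaining = dict(quotas)
    sample = []
    for person in respondents:
        group = key(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas met
    return sample


# Hypothetical stream of people the interviewer meets (2 men for every woman).
people = [{"id": i, "sex": "male" if i % 3 else "female"} for i in range(300)]
sample = fill_quotas(people, {"male": 50, "female": 50}, key=lambda p: p["sex"])
print(len(sample), Counter(p["sex"] for p in sample))
```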
2. Judgment sampling
In this approach, the investigator has complete freedom in choosing the sample according to his wishes. The experienced researcher selects the sample based on his judgment about certain characteristics required of the sample members.
The intent is to select elements that are believed to be typical or representative of the population, in such a way that errors of judgment in the selection cancel each other out. The researcher selects a sample to serve a specific purpose, even if this makes the sample less than fully representative.
E.g., the Consumer Price Index (CPI) is based on judgment sampling: it is based on the prices of a basket of goods and services purchased by average households.
The key assumption underlying this type of sampling is that, with sound expert judgment and an appropriate strategy, one can carefully and consciously choose the elements to be included in the sample. Its advantages are low cost, convenience and speed, and it can be as good as probability sampling.
However, its weakness is that its value depends entirely on the expert judgment of the researcher: without an objective basis for making the judgment or an external check, there is no way to know whether the so-called typical cases are, in fact, typical.
3. Snowball Sampling
The term snowball comes from the analogy of a snowball, which begins small but becomes bigger and bigger as it rolls downhill. Snowball sampling is popular among scholars conducting observational research and community studies.
The major purpose of snowball sampling is to estimate characteristics that are rare in the total population. The initial respondents are selected randomly, but additional respondents are then obtained from referrals or other information provided by the initial respondents.
E.g., suppose a researcher wants to study the impact of the September 11 terrorist attack on the social life and lifestyle of its survivors, and uses the telephone to obtain referrals. Random telephone calls are made, and the respondents answering the call are asked whether they know someone who meets the study's respondent qualification, such as someone who survived the September 11 terrorist attack in New York.
The major advantages of this type of sampling are that it substantially increases the probability of finding the desired characteristic in the population, and that it lowers sampling variance and cost.
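Snowball growth can be sketched as a breadth-first walk over referrals. This is a minimal Python sketch: `snowball_sample` and the referral network are hypothetical, and the seed respondents are passed in directly, whereas in the procedure described above they would first be chosen at random.

```python
from collections import deque


def snowball_sample(seeds, referrals, target_size):
    """Grow a sample from seed respondents by following referrals (breadth-first)."""
    sample = []
    seen = set()
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        if person in seen:
            continue  # already interviewed; skip duplicate referrals
        seen.add(person)
        sample.append(person)
        # Each respondent names others who meet the study's qualification.
        queue.extend(referrals.get(person, []))
    return sample


# Hypothetical referral network: who names whom.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "E": ["F"],
}
print(snowball_sample(["A"], referrals, target_size=5))  # ['A', 'B', 'C', 'D', 'E']
```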
4. Convenience Sampling
This is a "hit or miss" procedure of study in which no planned effort is made to collect information. The researcher comes across certain people and things, interacts with them, and then tries to make generalizations about the whole population. This sampling technique is not scientific and has no value as a research technique. However, as with any "hit or miss" method, sometimes hits are secured. In general, availability and willingness to respond are the major factors in selecting the respondents. Such a sample is commonly taken to test ideas, or even to gain ideas, about a subject of interest.
8.2.2. Probability sampling
All probability samples are based on chance selection procedures. Because the process is random, chance selection eliminates the bias inherent in non-probability sampling procedures.
The procedure of randomization should not be thought of as unplanned or unscientific; rather, it is the basis of all probability sampling techniques.
Probability sampling is the most preferred type of sampling because of the following characteristics:
The sample units are not selected at the discretion of the researcher.
Each unit of the population has a known probability of entering the sample.
The process of sampling is automatic in one or more steps of the selection of units in the sample.
There are a number of probability sampling techniques; some of them are discussed below:
Simple Random Sampling
Systematic Sampling
Stratified Sampling
Cluster Sampling
Multi-stage Sampling
1. Simple Random Sampling
This is the basic sampling method assumed in most statistical computations. Each element in the population has an equal chance of being included in the sample, which is drawn by a random procedure from a sample frame. Drawing names from a hat is a typical simple random sampling technique. The sampling process is simple because it requires only one stage of sample selection.
A random sample is selected as follows. Each element in the sample frame is assigned a number. Each number is then written on a separate piece of paper, the pieces are properly mixed, and one is selected. If, say, the sample size is 45, the selection procedure is repeated 45 times. When the population consists of a large number of elements, a table of random digits or computer-generated random numbers is used.
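Computer-generated random numbers replace the hat in practice. This is a minimal Python sketch using the standard library's `random.Random.sample`, which draws distinct elements, so no element can be picked twice; the function name and the numbered frame are hypothetical, and the seed is only there to make a draw reproducible.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements; every element has the same chance of selection."""
    rng = random.Random(seed)  # seeded only so the draw can be reproduced
    # random.sample draws without replacement, like drawing names from a hat.
    return rng.sample(frame, n)


frame = list(range(1, 501))  # hypothetical frame: elements numbered 1..500
sample = simple_random_sample(frame, 45, seed=42)
print(sorted(sample)[:5])
```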
2. Systematic Sampling
Systematic sampling involves only a slight difference from simple random sampling, and the mechanics of taking a systematic sample are rather simple. If the population contains N ordered elements and a sample of size n is desired, we find the ratio of these two numbers, N/n, to obtain the sampling interval.
E.g., say the population size is N = 600 and the desired sample size is n = 60; then the sampling interval will be 600/60 = 10.
A random starting point within the first interval (between 1 and 10) is chosen, and every 10th element after it is selected. For example, if the researcher starts from the fourth element, the 4th, 14th, 24th, etc. elements will be selected.
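The mechanics above translate directly into list slicing. This is a minimal Python sketch; `systematic_sample` is a hypothetical name, and it assumes N is (close to) a multiple of n, using the integer interval k = N // n.

```python
import random


def systematic_sample(frame, n, start=None, seed=None):
    """Select every k-th element after a random start, where k = N // n.

    Assumes N is (close to) a multiple of n; any remainder at the end of
    the list is simply never reached.
    """
    k = len(frame) // n  # sampling interval N/n
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in 1..k
    # start is 1-based (the "4th element"); Python indexing is 0-based.
    return frame[start - 1::k][:n]


frame = list(range(1, 601))  # N = 600 elements numbered 1..600
sample = systematic_sample(frame, 60, start=4)
print(sample[:3], len(sample))  # [4, 14, 24] 60
```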
Systematic sampling assumes that the population elements are ordered in some fashion (like names in a telephone directory). Some types of ordering, such as an alphabetical listing, will usually be uncorrelated with the characteristics to be investigated (say, income or family size). If the arrangement of the elements in the list is itself random with regard to the characteristics under study, systematic sampling will tend to give results close to those provided by simple random sampling. We say close because, in systematic sampling, not every possible combination of elements has an equal chance of being selected as the sample. Systematic sampling may increase representativeness when items are ordered with regard to the characteristic of interest.
E.g., if a population of customers is ordered by decreasing purchase volume, a systematic sample will be sure to contain some high-volume and some low-volume customers.
The problem of periodicity occurs if a list has a systematic pattern, that is, if the list is not random in character (as with a cyclical or seasonal pattern).
E.g., consider collecting retail store sales volume: if the researcher chooses a sampling interval of seven days, every selected day will fall on the same day of the week, so the sample would not reflect day-of-the-week variation in sales.
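The periodicity problem is easy to see when the sampling interval equals the length of the cycle. This minimal Python sketch uses hypothetical daily records labelled only by weekday.

```python
# Hypothetical 4 weeks of daily records, labelled by weekday.
weekdays = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
days = [weekdays[i % 7] for i in range(28)]

interval = 7  # sampling interval equal to the length of the weekly cycle
start = 5     # 0-based index: the first selected record is a Saturday
picked = days[start::interval]
print(picked)  # ['Sat', 'Sat', 'Sat', 'Sat']
```

Whatever start is drawn, all selected records fall on one weekday, so a busy Saturday start would overstate weekly sales and a quiet Monday start would understate them.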
3. Stratified Sampling
This method of sampling is a mixture of deliberate and random sampling techniques. If the population from which the sample is to be drawn does not constitute a homogeneous group, stratified sampling is used to obtain a representative sample. Under this technique, the population is divided into various classes or sub-populations, each of which is more homogeneous than the total population. The different sub-populations are called strata. Then certain items (elements) are selected from each stratum by random sampling. Since each stratum is more homogeneous than the total population, we are able to get a more precise estimate for each stratum, and by estimating each component part (sub-population) more accurately, we get a better estimate of the whole population. In other words, the population is broken into different strata based on one or more characteristics, say, frequency of purchase of a product, type of customer (credit card versus non-credit card), or industry. Thus, we may have strata of customers, strata of industries, etc.
E.g., in a study of the male population of a town, we would first split the whole male population into various strata on the basis of, say, profession, such as:
Businessmen
Shopkeepers
From these different groups, the researcher selects elements using the random sampling technique.
Strata are thus formed on the basis of characteristics common to the items (elements) put in each stratum, in such a way that the elements within each stratum are more homogeneous.
The strata are formed purposively and are usually based on the past experience and personal judgment of the researcher.
The usual method for selecting items for the sample from each stratum is simple random sampling. Systematic sampling can also be used if it is considered more appropriate in certain situations.
The sample size for each stratum can be made proportionate to the size of the stratum, in which case the sample drawn from each stratum is proportionate to the relative size of that stratum:
$$n_i = n\,\frac{N_i}{N}$$
The sample size for each stratum can also be made disproportionate to its size. That is, the sample size from each stratum is based on other circumstances, such as the relative variability of the stratum; here we take larger samples from more variable (heterogeneous) strata. One such allocation (known as Neyman or optimum allocation) is
$$n_i = \frac{n\,N_i\,\sigma_i}{N_1\sigma_1 + N_2\sigma_2 + \cdots + N_k\sigma_k}$$
where $\sigma_1, \sigma_2, \ldots, \sigma_k$ denote the standard deviations of the k strata, $N_1, N_2, \ldots, N_k$ the sizes of the k strata (with $N = N_1 + N_2 + \cdots + N_k$), $n_i$ the sample size of the i-th stratum and n the total sample size.
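Both allocation rules can be sketched in a few lines of Python. The function names and the three strata (sizes and standard deviations) are hypothetical; the functions return unrounded allocations, leaving rounding to whole units to the researcher.

```python
def proportional_allocation(n, sizes):
    """n_i = n * N_i / N."""
    total = sum(sizes)
    return [n * size / total for size in sizes]


def neyman_allocation(n, sizes, sds):
    """n_i = n * N_i * sigma_i / sum_j(N_j * sigma_j)."""
    weights = [size * sd for size, sd in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Hypothetical strata: sizes N_i and standard deviations sigma_i.
sizes = [600, 300, 100]
sds = [10, 10, 60]
print(proportional_allocation(100, sizes))  # [60.0, 30.0, 10.0]
print(neyman_allocation(100, sizes, sds))   # [40.0, 20.0, 40.0]
```

The small but highly variable third stratum receives 40 units under the variance-based rule instead of 10 under proportional allocation, which is exactly the "larger samples from more variable strata" idea.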
The entire population is first divided into a set of strata (sub-population groups), using some external source, such as census data.
From each separate sample, some statistic (e.g., the mean) is computed and properly weighted to form an overall estimated mean for the whole population.
Sample variances are also computed within each separate stratum and appropriately weighted to yield a combined estimate for the whole population.
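The weighting steps above can be sketched as follows, using stratum weights W_h = N_h / N. This is a minimal Python sketch with hypothetical data; the variance of the combined mean is computed as the sum of W_h^2 s_h^2 / n_h, which ignores the finite population correction (an assumption of this sketch).

```python
from statistics import mean, variance


def stratified_estimate(strata):
    """Combine per-stratum samples into an overall mean and its variance.

    `strata` maps a stratum name to (N_h, sample values). Weights are
    W_h = N_h / N. The variance formula sum(W_h**2 * s_h**2 / n_h) ignores
    the finite population correction.
    """
    total = sum(size for size, _ in strata.values())
    overall_mean = 0.0
    var_of_mean = 0.0
    for size, values in strata.values():
        w = size / total
        overall_mean += w * mean(values)
        var_of_mean += w ** 2 * variance(values) / len(values)
    return overall_mean, var_of_mean


# Hypothetical data: two strata of 600 and 400 elements.
strata = {
    "A": (600, [10, 12, 14]),
    "B": (400, [20, 22, 24, 26]),
}
est_mean, est_var = stratified_estimate(strata)
print(round(est_mean, 2), round(est_var, 4))  # 16.4 0.7467
```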
4. Cluster sampling
This technique samples economically while retaining the characteristics of probability sampling. In cluster sampling, the primary sampling unit is no longer the individual element in the population; rather, it is, say, a manufacturing unit, a city or a block of a city, etc.
After randomly selecting the primary sampling units (cities, parts of a city), we survey or interview all families or elements in each selected primary sampling unit. The area sample is the most commonly used type of cluster sampling.
E.g., suppose the machine parts in an inventory are stored in 400 cases. Using cluster sampling, we would consider the 400 cases as clusters, randomly select, say, n cases, and examine all the machine parts in each randomly selected case.
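The case example can be sketched as: pick whole cases at random, then inspect every part inside them. This is a minimal Python sketch; the function name, the 50 parts per case and the defect pattern are all hypothetical, chosen so the true defect rate (0.04) is known.

```python
import random


def cluster_sample_proportion(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, examine every element in them,
    and return (proportion of elements flagged True, elements examined)."""
    rng = random.Random(seed)
    chosen = rng.sample(clusters, n_clusters)  # pick clusters at random
    examined = [part for case in chosen for part in case]  # all parts in each case
    return sum(examined) / len(examined), len(examined)


# Hypothetical inventory: 400 cases of 50 parts; True marks a defective part.
# Every case here holds exactly 2 defective parts, so the true rate is 0.04.
cases = [[i < 2 for i in range(50)] for _ in range(400)]
rate, n_parts = cluster_sample_proportion(cases, n_clusters=10, seed=7)
print(rate, n_parts)  # 0.04 500
```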
Cluster sampling clearly reduces costs by concentrating the survey in selected clusters, but it is less precise than a simple random sample of the same size, because elements within a cluster tend to resemble one another. Cluster sampling is used mainly because of its economic advantage.
5. Multi-stage sampling
Items are selected at random in different stages. Multi-stage sampling is a further development of cluster sampling.
E.g., suppose we wish to estimate the yield per hectare of a given crop, say coffee, in Jimma zone. We begin by randomly selecting, say, 5 districts in the first stage. In each of these 5 districts, 10 villages are chosen in the same manner in the second stage. In the final stage, we again randomly select 5 farms from every selected village. Thus, we examine per-hectare yield on a total of 5 × 10 × 5 = 250 farms across the zone.
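[Figure: Multi-stage selection: zone or region → districts (5, first stage) → villages (10 per district, second stage) → farms (5 per village, third stage)]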
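The coffee example can be sketched as a recursive random draw down a hierarchy. This is a minimal Python sketch; `multistage_sample` and the zone of 20 districts × 30 villages × 40 farms are hypothetical, chosen only to be large enough for the 5-10-5 design.

```python
import random


def multistage_sample(tree, sizes, rng):
    """Select units stage by stage.

    `tree` is a nested dict (district -> village -> list of farms);
    `sizes` gives how many units to draw at random at each stage.
    """
    if isinstance(tree, list):  # final stage: list of farms
        return rng.sample(tree, sizes[0])
    picked = rng.sample(sorted(tree), sizes[0])  # random units at this stage
    result = []
    for unit in picked:
        result.extend(multistage_sample(tree[unit], sizes[1:], rng))
    return result


# Hypothetical zone: 20 districts x 30 villages x 40 farms.
zone = {
    f"D{d}": {
        f"D{d}-V{v}": [f"D{d}-V{v}-F{f}" for f in range(40)]
        for v in range(30)
    }
    for d in range(20)
}
farms = multistage_sample(zone, sizes=[5, 10, 5], rng=random.Random(1))
print(len(farms))  # 250 = 5 districts x 10 villages x 5 farms
```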
This sampling technique has two advantages. It is easier to administer than most sampling techniques, and, because of sequential clustering, a large number of units can be sampled for a given cost, which is not possible in most sample designs.
Multi-stage sampling is a relatively convenient, less time-consuming and less expensive method of sampling. However, an element of sampling bias is introduced because of the unequal sizes of some of the selected sub-samples. This method is recommended only when it would be impractical to draw a sample with the simple random sampling technique.
Sample studies are subject to sampling and non-sampling errors, which may be random and/or constant in nature. Errors that arise from sampling and whose average magnitude can be determined are called sampling errors, while the others are called sampling bias.
Sampling error is the difference between the result of a sample and the result of a census, that is, the difference between the sample estimate and the actual value of the population.
These errors arise by chance alone. Even if the sample is properly selected, there will be some difference between the sample statistic and the actual value (population parameter). The mean of the sample might differ from the population mean by chance alone, and the standard deviation of the sample might also differ from the population standard deviation. This difference between the sample statistic and the population parameter is known as sampling error. To illustrate, let us take a very simple example. Suppose a student has scored the following grades in 10 subjects (consider these subjects as the population): 55, 60, 65, 90, 55, 75, 88, 45, 85, 82. Say a sample of four grades, 55, 65, 82 and 90, is selected at random from this population to estimate the student's average grade. The mean of this sample is 73, but the population mean is 70, so the sampling error is 73 - 70 = 3. The variation due to random fluctuation (sampling error) decreases as the sample size increases, although it is not possible to avoid sampling error completely.
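The grade example can be checked directly, and the claim that sampling error shrinks with sample size can be made concrete by averaging the squared error over every possible sample of a given size. This is a minimal Python sketch; `mean_squared_sampling_error` is a hypothetical helper that enumerates samples drawn without replacement with `itertools.combinations`, so the averages are exact rather than simulated.

```python
from itertools import combinations
from statistics import mean

grades = [55, 60, 65, 90, 55, 75, 88, 45, 85, 82]  # population (10 subjects)
sample = [55, 65, 82, 90]                           # sample from the text

population_mean = mean(grades)  # 70
sample_mean = mean(sample)      # 73
print(sample_mean - population_mean)  # 3


def mean_squared_sampling_error(population, n):
    """Average of (sample mean - population mean)**2 over every possible
    sample of size n drawn without replacement."""
    mu = mean(population)
    errors = [(mean(s) - mu) ** 2 for s in combinations(population, n)]
    return mean(errors)


for n in (2, 4, 8):
    print(n, round(mean_squared_sampling_error(grades, n), 2))
```

The printed averages fall steeply as n grows from 2 to 8, but none is zero until the sample is the whole population.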
Systematic error is also called sampling bias. Such error can be created by errors in the sampling procedure, and it cannot be reduced or eliminated by increasing the sample size. It occurs because of human mistakes, not chance variation. Possible factors that contribute to such error include an inappropriate sampling frame, accessibility bias, a defective measuring device, and non-response bias or defects in data collection.
1. Inappropriate sampling: If the sample units misrepresent the population, the result is sample bias. This could happen when a researcher gathers data from a sample drawn from some favored locations. It occurs when not all units in the population have some probability of being selected for the sample.
2. Accessibility bias: In many research studies, researchers tend to select the respondents who are most accessible to them. When all members of the population are not equally accessible, the researcher must provide some control mechanism to prevent over- and under-representation of some respondents.
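Total error is usually measured as the total error variance, also known as the mean squared error (MSE). (For more information, refer to Zikmund (1998).)
$$(TE)^2 = (SE)^2 + (NE)^2$$
where TE is the total error, SE the sampling error and NE the non-sampling error.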
Generally, non-sampling errors occur in sample surveys as well as in census surveys, whereas sampling error occurs only in sample surveys. Preparing the survey questionnaire and handling the data properly can minimize non-sampling error.
If a sample is taken from a normally distributed population $N(\mu, \sigma_p)$, the sampling distribution of the mean is also normal, with
$$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma_p}{\sqrt{n}}$$
In other words, the means of samples drawn from a normal population are themselves normally distributed.
When the sample is not from a normal population, the sample size plays a critical role. When n is small, the shape of the sampling distribution will depend largely on the shape of the parent population. But as n gets larger ($n \ge 30$), the shape of the sampling distribution becomes more and more like a normal distribution.
The theorem that explains this relationship between the shape of the population distribution and the sampling distribution of the mean is known as THE CENTRAL LIMIT THEOREM. This theorem assures that the sampling distribution of the mean approaches a normal distribution as the sample size increases.
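A small simulation illustrates the theorem. This minimal Python sketch draws samples from an exponential population (mean 1, standard deviation 1, strongly right-skewed) with the standard library's `random.expovariate`; the helper names, the seed and the 2000 repetitions are arbitrary choices. As n grows, the spread of the sample means tracks 1/√n and their skewness falls toward 0, the value for a normal distribution.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed for reproducibility


def sample_means(n, reps=2000):
    """Means of `reps` samples of size n from an exponential population
    (mean 1, standard deviation 1), which is skewed, not normal."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]


def skewness(xs):
    """Population skewness: 0 for a symmetric (e.g. normal) distribution."""
    m, s = mean(xs), pstdev(xs)
    return mean(((x - m) / s) ** 3 for x in xs)


for n in (2, 30):
    means = sample_means(n)
    # Spread of the sample means should be close to sigma / sqrt(n) = 1 / sqrt(n).
    print(n, round(mean(means), 3), round(pstdev(means), 3),
          round(1 / n ** 0.5, 3), round(skewness(means), 2))
```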
The significance of the central limit theorem lies in the fact that it permits us to use sample statistics to make inferences about population parameters without knowing anything about the shape of the population's frequency distribution other than what we get from the sample.
Sampling theory
Sampling theory is the study of the relationship between a population and samples drawn from it. Sampling theory is applicable only to random samples. It is concerned with estimating the properties of the population from those of the samples, and also with gauging the precision of the estimates.
This movement from the particular (sample) to the general (population) is known as statistical induction or statistical inference. In simple words, from the sample we attempt to draw inferences about the population.
In order to follow this inductive method, we first follow a deductive argument: we imagine a population and investigate the behavior of samples drawn from it by applying the laws of probability.
The methodology dealing with all this is known as sampling theory. Sampling theory is designed to attain one or more of the following objectives:
Statistical inference: Sampling theory helps in making generalizations about the population from studies based on samples drawn from it. It also helps in determining the accuracy of such generalizations.