Chapter three
Sampling techniques and sample size determination
Sampling
A sample is “a smaller (but hopefully representative)
collection of units from a population used to
determine truths about that population”.
Why sample?
Cost in terms of money, time and manpower
Accessibility
Utility: e.g. for a diagnostic laboratory test you
don’t draw all of the patient’s blood.
A census is a study that includes the entire population.
Even though a census is not foolproof, it gives detailed
information about every small area of the population.
It has the following disadvantages:
Expensive
Takes a long time
Cumbersome and therefore prone to inaccuracy (a careful sample
can produce more accurate data than a census)
Sampling…..
Sampling is the process of selecting a representative sample
from a population.
It involves selecting cases (elements), or locating people (or other units of analysis),
from a target population in order to study the population.
[Figure: sampling takes a sample from the population; inference generalizes from the sample back to the population]
Population vs. Sample
[Figure: the population of interest is described by parameters; a sample drawn from it is described by statistics]
We measure the sample using statistics in order to draw
inferences about the population and its parameters.
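To make the parameter/statistic distinction concrete, here is a minimal sketch on simulated data. The population of 1,000 blood-pressure values is hypothetical and generated at random purely for illustration; the population mean plays the role of the parameter and the sample mean the statistic used to estimate it.

```python
import random
import statistics

# Hypothetical population: systolic blood pressures (mmHg) of 1,000 people,
# simulated for illustration only.
random.seed(1)
population = [random.gauss(120, 15) for _ in range(1000)]

# Parameter: a fixed (in practice usually unknown) property of the whole population.
mu = statistics.mean(population)

# Statistic: computed from a sample and used to estimate the parameter.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.1f}")
print(f"sample mean (statistic):     {x_bar:.1f}")
```

Rerunning with a different seed gives a different statistic, while the parameter of a given population stays fixed.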
Advantages of sampling
We obtain a sample rather than a complete enumeration (a
census) of the population for many reasons.
Feasibility: it may be the only feasible method of
collecting data.
Reduced cost: sampling reduces demands on resources
such as finance, personnel and materials.
Greater accuracy: sampling may lead to more accurate
data collection.
Greater speed: data can be collected and summarized
more quickly.
Disadvantages of Sampling
If the sample is biased, unrepresentative or too small, the
conclusions may not be valid or reliable.
If the population is very large and there are many sections and
subsections, the sampling procedure becomes very complicated.
If the researcher does not possess the necessary skill and
technical knowledge in sampling procedures, the outcome
may be seriously compromised.
Key issues in sampling
Selecting the right number of the right people
Minimizing sampling error, i.e. choosing the wrong
people by chance
Characteristics of Good Samples
Three factors influence sample
representativeness:
Sampling procedure
Sample size
Participation (response)
Basic Terms
Population (also called source population or target
population): a group of individuals, persons, objects, or
items from which samples are taken for measurement.
It refers to the entire group of individuals or objects to
which researchers are interested in generalizing their
conclusions.
Basic Terms cont’d…
Census: obtained by collecting information about
each member of a population. Studying the whole
population requires a great deal of time, money
and energy.
Sample survey: studying a sample and drawing conclusions
about the population. It is cheaper, more practical
and convenient in terms of technicalities, and
saves time and energy.
Basic Terms cont’d…
Sampling frame: the list of people from which the
sample is taken, i.e. the list from which the potential
respondents are drawn.
It should be comprehensive, complete and up-to-date.
Examples of sampling frames: Electoral Register;
Postcode Address File; telephone book and so on.
Probability samples: with probability sampling methods, each
population element has a known (non-zero) chance of being
chosen for the sample.
Basic Terms cont’d…
Non-probability samples: with non-probability
sampling methods, we do not know the probability
that each population element will be chosen, and/or
we cannot be sure that each population element has a
non-zero chance of being chosen.
Sampling unit: the unit of selection in the sampling
process.
Basic Terms cont’d…
Sampling fraction: the ratio of the number of units in the sample to
the number of units in the reference
population (n/N). Its reciprocal, N/n, is the sampling interval
used in systematic sampling.
The hierarchy of sampling (from broadest to narrowest):
Target population
Source population: the population from whom the study subjects
would be obtained
Sampling frame: the list of potential subjects from
which the sample is drawn
Sample: subjects who are selected
Study subjects: the actual participants in the study
Characteristics of a Good Sample Design
Sample design must result in a truly representative sample.
Sample design must result in a small
sampling error.
Sample design must be viable in the context of the funds
available for the research study.
Sample design must allow systematic bias to be
controlled as well as possible.
The results of the sample study should be applicable, in general,
to the universe (population) with a reasonable
level of confidence.
Errors in Statistical Studies
A sample is expected to mirror the population from which it
comes; however, there is no guarantee that any sample will be
precisely representative of the population.
No sample is the exact mirror image of the population.
Errors fall into two types:
Sampling (random) errors
Non-sampling (systematic) errors
1. Sampling error
Random error: the sample selected is not
representative of the population due to chance.
The uncertainty associated with an estimate that is based
on data gathered from a sample of the population rather
than the full population is known as sampling error.
Sampling errors are the random variations in the sample
estimates around the true population parameters.
Sampling error cont’d…
Its level is controlled by sample size: a larger sample
leads to a smaller sampling error. It decreases as the
sample size increases, and it is of smaller magnitude in a
homogeneous population.
When n = N (a census) ⇒ sampling error = 0
Whenever a sample (n < N) is used, it cannot be avoided or totally eliminated, only reduced.
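The effect of sample size on sampling error can be checked with a small simulation. This is a sketch under stated assumptions: the population of N = 500 values is simulated (hypothetical), samples are drawn by simple random sampling without replacement, and "sampling error" is measured here as the average absolute gap between the sample mean and the population mean over repeated draws.

```python
import random
import statistics

random.seed(42)
# Hypothetical finite population of N = 500 values (simulated for illustration).
N = 500
population = [random.gauss(50, 10) for _ in range(N)]
mu = statistics.mean(population)

def avg_sampling_error(n, reps=1000):
    """Average |sample mean - population mean| over `reps` simple random
    samples of size n, each drawn without replacement."""
    gaps = [abs(statistics.mean(random.sample(population, n)) - mu)
            for _ in range(reps)]
    return statistics.mean(gaps)

errors = {n: avg_sampling_error(n) for n in (10, 50, 200, N)}
for n, e in errors.items():
    # The error shrinks as n grows and is zero when n = N (a census).
    print(f"n = {n:3d}: average |sample mean - population mean| = {e:.3f}")
```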
2. Non-sampling Error
It is a type of systematic error in the design or conduct of a
sampling procedure which results in distortion of the sample, so
that it is no longer representative of the reference population.
We can eliminate or reduce the non-sampling error (bias) by
careful design of the sampling procedure and not by increasing
the sample size.
It can occur whether the total study population or a sample is
being used.
Non-sampling Error……
o The basic types of non-sampling error
Non-response error
Response or data error
o A non-response error occurs when units selected as part of the
sampling procedure do not respond in whole or in part
If non-respondents are not different from those that did
respond, there is no non-response error
When non-respondents constitute a significant proportion of
the sample (about 15% or more)
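Why non-respondents only matter when they differ can be shown with a standard decomposition: if a fraction m/n of the selected units do not respond, the respondent mean differs from the full-sample mean by (m/n) × (respondent mean − non-respondent mean). The sketch below assumes that decomposition; the figures (400 selected, 340 respondents, prevalences of 0.30 and 0.50) are hypothetical.

```python
def nonresponse_bias(n_resp, mean_resp, n_nonresp, mean_nonresp):
    """Bias of the respondent-only mean relative to the full-sample mean.

    Uses the identity
        mean_resp - mean_all = (n_nonresp / n_total) * (mean_resp - mean_nonresp),
    so the bias is zero whenever respondents and non-respondents do not differ.
    """
    n_total = n_resp + n_nonresp
    return (n_nonresp / n_total) * (mean_resp - mean_nonresp)

# Hypothetical survey: 400 selected, 340 respond (15% non-response).
selected, responded = 400, 340
rate = (selected - responded) / selected
print(f"non-response rate: {rate:.0%}")

# Prevalence 0.30 among respondents vs 0.50 among non-respondents (hypothetical).
print(nonresponse_bias(340, 0.30, 60, 0.50))  # about -0.03: prevalence underestimated
print(nonresponse_bias(340, 0.40, 60, 0.40))  # no difference, so no bias
```

The bias grows with both the non-response rate and the gap between the two groups, which is why a large non-response proportion is a warning sign.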
Non-sampling Error…….
o A response or data error is any systematic bias
that occurs during data collection, analysis or
interpretation
Respondent error (e.g., lying, forgetting, etc.)
Interviewer bias
Recording errors
Poorly designed questionnaires
Non-sampling Error cont’d…
Systematic error makes survey results unrepresentative of the
target population by distorting the survey estimates in one
direction.
Random error can distort the results in either direction but
tends to balance out on average.
Thus, total survey error = sampling error + non-sampling error.
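The contrast between random and systematic error can be sketched with simulated measurements. Assumptions: a hypothetical true value of 100, random noise with standard deviation 5, and a constant +3 distortion standing in for a systematic error such as a miscalibrated instrument.

```python
import random
import statistics

random.seed(7)
true_value = 100.0  # hypothetical true population mean

# Random error: noise that pushes each measurement up or down by chance.
random_only = [true_value + random.gauss(0, 5) for _ in range(10_000)]

# Systematic error: the same noise plus a constant +3 distortion in one
# direction (hypothetical miscalibrated instrument).
with_bias = [x + 3.0 for x in random_only]

# Averaging many measurements cancels the random part but not the bias.
print(statistics.mean(random_only) - true_value)  # close to 0
print(statistics.mean(with_bias) - true_value)    # close to +3
```

Taking more measurements shrinks the first gap but leaves the second at about +3, mirroring the point that bias is reduced by better design, not by a larger sample.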
Types of Sampling Methods
Sampling methods are divided into two groups:
Probability samples: simple random, systematic, stratified random, cluster and multistage random sampling
Non-probability samples: quota, judgemental and convenience sampling
Probability Sampling Method …
The random ("equal chance") and "independent" components of
random sampling are what give us confidence that the sample has a
reasonable chance of representing the population.
What does it mean to be independent? The researchers select each
person for the study separately.
Suppose you were asked to participate in an experiment, enjoyed it,
and told your friends to contact the researcher to volunteer for the study.
This would be an example of non-independent sampling.
Probability Sampling Method cont’d …
In probability sampling:
A sampling frame exists or can be compiled.
Each unit should have an equal, or at least a known non-zero, chance
of being included in the sample.
Generalization is possible (from sample to population).
Probability sampling methods include:
Simple Random Sampling
Systematic Sampling
Stratified Random Sampling
Cluster Sampling
Multistage Sampling
1. Simple Random Sampling (SRS)
Simple random sampling is the most straightforward of the
random sampling strategies.
To use SRS:
o there should be a sampling frame for the population
o all possible samples of n subjects are equally likely to occur (each has probability $1/\binom{N}{n}$)
o the population should be small, relatively homogeneous and readily available
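The "all possible samples are equally likely" condition can be checked by brute force on a tiny frame. The five-unit frame below is hypothetical; with N = 5 and n = 2 there are C(5, 2) = 10 possible samples, each with probability 1/10 under SRS, and each unit ends up in a fraction n/N of them.

```python
from itertools import combinations
from math import comb

# Hypothetical sampling frame of N = 5 units; samples of size n = 2.
frame = ["A", "B", "C", "D", "E"]
n = 2

all_samples = list(combinations(frame, n))
print(len(all_samples), comb(len(frame), n))  # 10 10

# Under SRS every possible sample has the same probability, 1 / C(N, n).
print(1 / comb(len(frame), n))  # 0.1

# Consequently each unit's inclusion probability is n / N.
p_include_A = sum("A" in s for s in all_samples) / len(all_samples)
print(p_include_A)  # 0.4
```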
Simple Random Sampling cont’d …
Procedures to select the sample
The specific procedures that you follow may vary depending
on your resources, but all involve some type of random
process. Depending on the complexity of the population, we
can use different tools to select n units from the given
sampling frame.
These include the lottery method,
a table of random numbers (available in the appendix
of many research methods and statistics textbooks) and
computer-generated random numbers.
Simple Random Sampling cont’d …
The lottery method is appropriate if the total population is not too
large; if the population is large, the lottery method becomes very
difficult to use.
In that case, a table of random numbers or computer-generated random
numbers is the feasible method.
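Computer-generated random numbers can be produced with Python's standard `random` module. A minimal sketch, assuming a hypothetical frame of 2,000 households numbered 1 to 2000; `random.Random(seed).sample` draws n distinct IDs without replacement, and fixing the seed lets the draw be reproduced or audited.

```python
import random

# Hypothetical sampling frame: IDs 1..N for N = 2,000 registered households.
N, n = 2000, 100
frame = list(range(1, N + 1))

rng = random.Random(2024)                  # fixed seed so the draw can be reproduced
sample_ids = sorted(rng.sample(frame, n))  # n distinct IDs, drawn without replacement
print(sample_ids[:10])
```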
2. Systematic Random Sampling
Systematic sampling is considered random as long as the periodic interval is
determined beforehand and the starting point is random.
It is a method of selecting sample members from a larger population according to
a random starting point and a fixed, periodic interval.
Typically, every kth member is selected from the total population for inclusion
in the sample.
It is frequently chosen by researchers for its simplicity and its periodic
quality.
It works best when the population is homogeneous; however, the method does not
always require a complete sampling frame prepared in advance.
Steps in systematic sampling:
Define the population
Determine the desired sample size (n)
List the population from 1 to N
Determine k, where k = N/n
Select a random number between 1 and k; denote this number by “a”
Starting at a, take every kth number on the list until the desired sample is
obtained.
Then the selected list will be
a, a+k, a+2k, a+3k, …, a+(n-1)k
Note: systematic sampling should not be used when a cyclic repetition is
inherent in the sampling frame.
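The warning about cyclic repetition can be illustrated with a hypothetical frame that has a built-in cycle: staff listed in teams of 10, each team listed supervisor first. When the sampling interval matches the cycle (k = 10), the sample contains either only supervisors or none, although supervisors are 10% of the frame.

```python
# Hypothetical frame with a built-in cycle: 200 staff listed in teams of 10,
# each team listed supervisor first, then 9 workers.
N, k = 200, 10
frame = ["supervisor" if i % 10 == 0 else "worker" for i in range(N)]

def systematic(frame, k, start):
    """Every kth element beginning at 0-based index `start` (0 <= start < k)."""
    return frame[start::k]

for start in range(k):
    s = systematic(frame, k, start)
    # Random start 1 picks 20 supervisors out of 20; any other start picks none.
    print(f"random start {start + 1}: {s.count('supervisor')} supervisors out of {len(s)}")
```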
E.g. systematic sampling
• N = 1200, and n = 60
sampling interval k = 1200/60 = 20
• List persons from 1 to 1200
• Randomly select a number between 1 and 20
(e.g. 8)
• 1st person selected = the 8th on the list
• 2nd person = 8 + 20 = the 28th on the list, etc.
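The steps and the worked example can be reproduced with a short function. This is a sketch: the name `systematic_sample` is hypothetical, positions are 1-based as in the example, and if N is not a multiple of n the interval is rounded down.

```python
import random

def systematic_sample(N, n, a=None):
    """Return 1-based positions a, a+k, ..., a+(n-1)k, with k = N // n.
    If `a` is not given, it is drawn at random from 1..k."""
    k = N // n                    # sampling interval (rounded down if N is not a multiple of n)
    if a is None:
        a = random.randint(1, k)  # random start between 1 and k inclusive
    return [a + i * k for i in range(n)]

picks = systematic_sample(1200, 60, a=8)  # start fixed at 8 to match the example
print(picks[:3], picks[-1])               # [8, 28, 48] 1188
```

In practice `a` would be left to the random draw; it is fixed here only so the output matches the example.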
Systematic sampling….
o It relies on arranging the target population according to some
ordering scheme and then selecting elements at regular
intervals through that ordered list.
o Systematic sampling involves a random start and then
proceeds with the selection of every kth element from then
onwards. In this case, k = population size / sample size.
o It is important that the starting point is not automatically the
first in the list, but is instead randomly chosen from within
the first to the kth element in the list.
3. Stratified Random Sampling
Stratified random sampling is used when we have subgroups in
our population that are likely to differ substantially in their
responses or behavior (i.e. if the population is heterogeneous).
In stratified random sampling, the population is first divided into
a number of parts or 'strata' according to some characteristic,
chosen to be related to the major variables being studied.
For example, suppose you are interested in visual-spatial reasoning and
previous research suggests that men and women perform
differently on these types of task.
Stratified Random Sampling cont’d…
So, you divide the population into male and female members and
randomly select the required sample size within each subgroup
(or "stratum").
With this technique, you are guaranteed to have enough of each
subgroup for meaningful analysis.
Simple random sampling is often used to select a sample from
each stratum after stratification.
Steps involved in stratified sampling:
Define the population
Determine the desired sample size
Identify the variable and subgroups (strata) for which you want to guarantee
appropriate representation (either proportional or equal)
Classify all members of the population as a member of one of the identified
subgroups
Randomly select (using simple random sampling or others) an appropriate
number of individuals from each subgroup.
Then the total sample size will be the sum of all samples from each subgroup.
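The steps above can be sketched as: classify every unit into its stratum, then draw a simple random sample of the required size within each stratum. The function name `stratified_sample` and the 300-person frame (180 female, 120 male) are hypothetical.

```python
import random

def stratified_sample(frame, stratum_of, sizes, seed=None):
    """Draw a simple random sample of sizes[h] units from each stratum h."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:  # classify every unit into exactly one stratum
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for h, n_h in sizes.items():  # SRS within each stratum
        sample.extend(rng.sample(strata[h], n_h))
    return sample  # total size = sum of the per-stratum sizes

# Hypothetical frame of 300 people as (id, sex): 180 female, 120 male.
frame = [(i, "F" if i < 180 else "M") for i in range(300)]
s = stratified_sample(frame, lambda u: u[1], {"F": 15, "M": 15}, seed=3)
print(len(s), sum(u[1] == "F" for u in s))  # 30 15
```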
There are two methods to get the study subjects from each subgroup:
proportional allocation or
equal allocation.
We use proportional allocation when the subgroups vary dramatically in size
in the population.
Let N be the total population and $N_1, N_2, \dots, N_k$ be the subtotal populations of strata 1, 2,
…, k respectively. Moreover, let n be the total sample size and $n_1, n_2, \dots, n_k$ be the
subsamples for strata 1, 2, …, k respectively, where $N = N_1 + N_2 + \dots + N_k$
and $n = n_1 + n_2 + \dots + n_k$.
Then the subsample $n_i$ to be selected from subgroup i can be computed by
$$n_i = \frac{N_i}{N} \times n$$
The larger the population of a subgroup, the larger its
subsample will be.
However, equal allocation is used if the populations
of the subgroups are approximately equal.
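Proportional allocation, n_i = n × N_i / N, usually produces fractions, so some rounding rule is needed to keep the subsamples summing to n. The sketch below assumes the largest-remainder rule (one convention among several); the function name and the stratum sizes are hypothetical.

```python
def proportional_allocation(stratum_sizes, n):
    """n_i = n * N_i / N, rounded so that the subsamples still add up to n.

    Rounding uses the largest-remainder rule (an assumption; other
    rounding conventions exist)."""
    N = sum(stratum_sizes)
    exact = [n * N_i / N for N_i in stratum_sizes]
    alloc = [int(x) for x in exact]  # round every share down first
    # Give the units lost to rounding to the strata with the largest fractional parts.
    order = sorted(range(len(exact)), key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in order[: n - sum(alloc)]:
        alloc[i] += 1
    return alloc

print(proportional_allocation([600, 300, 100], 100))  # [60, 30, 10]
print(proportional_allocation([450, 350, 200], 75))   # [34, 26, 15]
```

In the second call the exact shares are 33.75, 26.25 and 15, so the single unit lost to rounding goes to the first stratum.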
Advantage of stratified sampling over simple random sampling
The representativeness of the sample is improved. That is,
adequate representation of minority subgroups of interest can
be ensured by stratification and by varying the sampling
fraction between strata as required.
Disadvantage
The sampling frame for the entire population has to be prepared
separately for each stratum.
4. Cluster Random Sampling
In this sampling scheme, selection of the required sample is done on groups
of study units (clusters) instead of each study unit individually.
The sampling unit is a cluster, and the sampling frame is a list of these
clusters.
Cluster sampling is useful when the study covers a wide geographical area,
where using the other methods would be too costly.
The idea is to divide the total population into different clusters; the
unit of selection is then the cluster.
All members of the selected clusters are then taken as the sample.
Steps in cluster sampling are:
Define the population
Determine the desired sample size
Identify and define a logical cluster
Make a list of all clusters in the population
Estimate the average number of population members per cluster
Determine the number of clusters needed by dividing the sample
size by the estimated cluster size
Randomly select the required number of clusters (e.g. using a table of
random numbers, since the total number of clusters is manageable)
Include in the sample all members of the selected clusters.
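The cluster-sampling steps can be sketched as follows, assuming a hypothetical frame of 40 villages with 50 households each and a target sample of 300. The number of clusters needed is the target size divided by the average cluster size (rounded up), the clusters are chosen by simple random sampling, and every member of each chosen cluster is included.

```python
import math
import random

def cluster_sample(clusters, n_target, seed=None):
    """Select whole clusters at random, then include all of their members.

    clusters: dict mapping cluster name -> list of units in that cluster.
    """
    rng = random.Random(seed)
    avg_size = sum(len(units) for units in clusters.values()) / len(clusters)
    m = math.ceil(n_target / avg_size)        # number of clusters needed
    chosen = rng.sample(sorted(clusters), m)  # SRS of clusters
    sample = [u for c in chosen for u in clusters[c]]  # all members of chosen clusters
    return chosen, sample

# Hypothetical frame: 40 villages with 50 households each.
clusters = {f"village{j:02d}": [f"v{j:02d}-hh{h}" for h in range(50)] for j in range(40)}
chosen, sample = cluster_sample(clusters, n_target=300, seed=11)
print(len(chosen), len(sample))  # 6 300
```

Only the list of villages is needed to make the selection; households are enumerated just within the chosen villages.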
Consider the following graphical display of cluster sampling:
[Figure: graphical display of cluster sampling; image not reproduced]
5. Multistage Random Sampling
This is the most complex sampling strategy.
The researcher combines simpler sampling methods to address sampling
needs in the most effective way possible.
Example:
An administrator might begin with a cluster sample of the schools in the
district.
Then he might set up a stratified sampling process within the selected clusters.
Within schools, the administrator could conduct a simple random sample
of classes or grades.
By combining various methods, researchers can obtain results useful in
different contexts.
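A simplified two-stage version of the school example can be sketched as follows. Assumptions: a hypothetical district of 30 schools with 12 classes each, 5 schools chosen at stage 1 and 3 classes per school at stage 2; the stratification step mentioned in the example is omitted to keep the sketch short.

```python
import random

rng = random.Random(5)

# Hypothetical district: 30 schools, each with 12 classes (IDs like "S07-C03").
district = {f"S{s:02d}": [f"S{s:02d}-C{c:02d}" for c in range(12)] for s in range(30)}

# Stage 1: random sample of schools (the clusters).
schools = rng.sample(sorted(district), 5)

# Stage 2: simple random sample of classes within each selected school.
classes = {school: rng.sample(district[school], 3) for school in schools}

for school, picked in classes.items():
    print(school, picked)
```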
Non-Probability Sampling Method
When constraints prevent the use of probability sampling
strategies, the alternative is non-probability
sampling.
Non-probability sampling strategies are used when it is
practically impossible to use probability sampling strategies.
Non-probability sampling is a sampling procedure that does not
provide any basis for estimating the probability that each item in
the population has of being included in the sample.
Cont’d……….
Selection is subjective: units of the population have a zero or unknown
probability of selection before the sample is drawn. Hence the
samples obtained may not be representative.
Sampling error cannot be computed.
Survey results cannot be projected to the population.
Advantages
Cheaper and faster than probability sampling
Reasonably representative if collected in a thorough manner
1. Judgment Sampling / Purposive Sampling
The researcher selects the sample based on judgment, making
some effort to select a sample that seems to be the
most appropriate for the study.
This is used primarily when there is a limited number of
people with expertise in the area being researched.
2. Convenience Sampling
Convenience sampling selects a particular group of people, but
it does not come close to sampling all of a population.
The results would generalize only to similar programs in
similar cities.
It can look just like cluster sampling.
The major difference is that the clusters of research
participants are selected by convenience rather than by a
random process.
Cont’d………..
Sometimes known as grab or opportunity sampling or
accidental or haphazard sampling.
A type of non-probability sampling in which the sample is
drawn from the part of the population that is close to
hand, that is, readily available and convenient.
The researcher using such a sample cannot scientifically make
generalizations about the total population from this sample
because it would not be representative enough.
This type of sampling is most useful for pilot testing.
3. Quota sampling
It is a method that ensures that a certain number of sample units
from different categories with specific characteristics are
represented. The investigator interviews as many people in
each category of study unit as he can find until he has filled his
quota.
It is the non-probability equivalent of stratified sampling. It
differs from stratified sampling, in which the strata are filled
by random sampling.
The population is first segmented into mutually exclusive
subgroups, just as in stratified sampling.
Cont’d
Then judgment is used to select subjects or units from
each segment based on a specified proportion.
For example, an interviewer may be told to sample
200 females and 300 males aged between 45 and
60.
It is this second step which makes the technique one
of non-probability sampling.
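The quota example (200 females and 300 males aged 45 to 60) can be mimicked by accepting people in the order the interviewer meets them until each quota is full. This is a hypothetical sketch: the arrival stream is invented (every third person female, all assumed already screened for the age band), and `fill_quotas` is an illustrative name. Who ends up in the sample depends only on who turned up first.

```python
def fill_quotas(arrivals, quotas):
    """Accept people in the order they are met until every quota is full.

    arrivals: iterable of (person_id, category) in the order they are met.
    quotas:   dict category -> number required.
    Selection depends on who happens to turn up first, not on a random draw.
    """
    counts = {category: 0 for category in quotas}
    chosen = []
    for person_id, category in arrivals:
        if category in quotas and counts[category] < quotas[category]:
            chosen.append(person_id)
            counts[category] += 1
        if counts == quotas:  # all quotas filled: stop interviewing
            break
    return chosen, counts

# Hypothetical stream of people aged 45-60 in the order the interviewer meets
# them: every third person is female.
arrivals = [(i, "F" if i % 3 == 0 else "M") for i in range(1000)]
chosen, counts = fill_quotas(arrivals, {"F": 200, "M": 300})
print(counts, max(chosen))
```

Everyone met after the quotas were filled (here, anyone after person 597) has no chance of selection, which is the weakness discussed next.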
Cont’d
In quota sampling the selection of the sample is non-random.
For example, interviewers might be tempted to interview
those who look most helpful. The problem is that these
samples may be biased because not everyone gets a chance of
selection.
This non-random element is its greatest weakness, and quota
versus probability sampling has been a matter of controversy for many
years.
Sample Size Determination
Determining the sample size is a crucial component of study
design: the study must include sufficient numbers of subjects so that
statistically significant results can be detected.
"How large a sample do I need?"
The answer will depend on the aims, nature and scope of the
study and on the expected result, all of which should be
carefully considered at the planning stage.
Sample size…
o If the sample size n is large:
accuracy increases, but the study is costly and complex
o If the sample is small:
accuracy decreases, but the study is less costly
Take an optimum sample size. How?
Factors that determine sample size
Size of population
Resources: subjects, financial, manpower
Method of sampling: random, stratified
Degree of difference to be detected
Variability (S.D.): from a pilot study or historical data
Degree of accuracy (or error)
o Sample size determination depends on the outcome variable.
There are three possible categories of outcome variables.
The first is where the variable of interest has only two
alternative responses: yes/no, dead/alive, vaccinated/not
vaccinated and so on.
The second category covers outcome variables with
multiple, mutually exclusive alternative responses, such as
marital status, religion, blood group and so on.
For these two categories of outcome variables, the data are
generally expressed as percentages or rates,
so percentages (proportions) are used to compute the sample size.
The third category covers continuous response variables
such as birth weight, age at first marriage, blood
pressure and serum uric acid level, for which
numerical measurements are usually made.
In this case the data are summarized in the form of means
and standard deviations or their derivatives.
Sample Size…
There are several approaches to determining the sample size.
Depending on the type of response variable, whether it is
categorical or continuous, we have two sets of formulas.
The sample size determination formulas come from the formulas
for the maximum error of the estimates and are derived by solving
for n.
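As a worked illustration of "solving for n", assuming the usual large-sample (normal approximation) formulas for the maximum error w, with σ the population standard deviation and p the anticipated proportion (p is a symbol introduced here, not defined in the original):

```latex
% Single population mean: maximum error of the estimate
w = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
\;\Longrightarrow\;
\sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{w}
\;\Longrightarrow\;
n = \left(\frac{z_{\alpha/2}\,\sigma}{w}\right)^{2}

% Single population proportion: maximum error of the estimate
w = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}
\;\Longrightarrow\;
n = \frac{z_{\alpha/2}^{2}\,p(1-p)}{w^{2}}
```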
To estimate sample size using simple or systematic
random sampling, you need to know:
o an estimate of the prevalence of the outcome
o the precision desired
o the level of confidence (usually 95%)
o the size of the total population
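These inputs feed the usual single-proportion formula n = z² p(1 − p) / w². The sketch below assumes that formula, with w the desired precision; when the population size N is given it applies Cochran's finite population correction n0 / (1 + (n0 − 1)/N), one common convention among several. The function name is hypothetical; `statistics.NormalDist().inv_cdf` supplies z_{α/2}.

```python
import math
from statistics import NormalDist

def sample_size_proportion(p, w, confidence=0.95, N=None):
    """n = z^2 * p * (1 - p) / w^2 for estimating a single proportion.

    p: anticipated prevalence (0.5 gives the largest n when p is unknown)
    w: desired precision (maximum acceptable difference), e.g. 0.05
    N: population size; if given, Cochran's finite population correction
       n0 / (1 + (n0 - 1) / N) is applied (one common convention).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n0 = z**2 * p * (1 - p) / w**2
    if N is not None:
        n0 = n0 / (1 + (n0 - 1) / N)
    return math.ceil(n0)  # round up to whole subjects

print(sample_size_proportion(0.5, 0.05))          # 385
print(sample_size_proportion(0.5, 0.05, N=1000))  # 278
```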
Sample size
Standard deviation (σ) of the population: it is rare that a
researcher knows the exact standard deviation of the population.
Typically, the standard deviation of the population is estimated:
from the results of a previous survey,
from a pilot study,
from secondary data, or
from the judgment of the researcher.
Maximum acceptable difference (w): the maximum
amount of error that you are willing to accept.
Desired confidence level ($z_{\alpha/2}$): your level of certainty that
the sample mean does not differ from the true population mean
by more than the maximum acceptable difference. Commonly
we use a 95% confidence level.
Then the sample size determination formula for a single
population mean is:
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{w}\right)^2$$
Sample size for a single population mean cont’d…
where
α = the level of significance, obtained as 1 − confidence
level
σ = standard deviation of the population
w = maximum acceptable difference
$z_{\alpha/2}$ = the value from the standard normal table for the
given confidence level
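The single-mean formula n = (z_{α/2} σ / w)² translates directly into code. A minimal sketch: the function name and the inputs (σ = 10 mmHg, perhaps from a pilot study, and w = 2 mmHg) are hypothetical, and the result is rounded up because a fraction of a subject cannot be enrolled.

```python
import math
from statistics import NormalDist

def sample_size_mean(sigma, w, confidence=0.95):
    """n = (z_{alpha/2} * sigma / w)^2, rounded up to a whole subject."""
    alpha = 1 - confidence                   # level of significance
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    return math.ceil((z * sigma / w) ** 2)

# Hypothetical inputs: sigma = 10 mmHg (e.g. from a pilot study), w = 2 mmHg.
print(sample_size_mean(10, 2))                   # 97
print(sample_size_mean(10, 2, confidence=0.99))  # 166
```

Halving w roughly quadruples n, since w enters the formula squared.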
An incorrect sample size can lead to:
o Wrong conclusions
o Poor quality research (errors)
o Increased Type II error (which can be minimized by increasing the sample size)
o Waste of resources
o Loss of money
o Ethical problems
o Delay in completion