Statistical sampling procedures
Statistical inference relies on the inductive projection of findings from a sample
to the population. This rests on the assumption that the sample is representative
of the population from which it is drawn. Representativeness is pursued through
random sampling, in which the individual units in the population are selected by
chance, each with a known (often equal) opportunity of being selected.
Definition of terms
Sample
A sample is a subset of the larger population or parent population. Alternatively,
a sample is a finite subset of individuals in a population selected for study. A
sample should represent the population adequately as the purpose is to infer the
characteristics of the population from the sample findings.
Parent Population
This is the whole group of individuals under study, which may be finite or infinite.
Sample Statistic
This is a measure computed from the sample observations alone, e.g. the sample
mean (x̄) and the sample standard deviation (s).
Sample Size
This is the total number of units in the sample.
Sampling Unit
This is the smallest unit in the selection process. It should be well defined in
every survey, e.g. a household or an under-five child in a community survey of
nutritional status, or a pregnant woman studied in the community.
Sampling Error
This is an index of the precision of the estimate obtained from a sample, i.e. of
how far the sample estimate is likely to lie from the true population value.
What is sampling?
In survey research, sampling is the process of using a subset of a population to
represent the whole population. Sampling allows large-scale research to be
carried out with a more realistic cost and time frame because it uses a smaller
number of individuals in the population with representative characteristics to
stand in for the whole.
Sampling is based on two principles, viz.
1. Elimination of bias, by ensuring that:
· Every unit has a known, non-zero chance of being selected (an equal chance in
simple random sampling).
· A probability sampling technique is used to select each unit.
2. Obtaining high precision. Precision is the degree to which repeated
observations or estimates agree with one another. It is commonly expressed as the
margin of error, the quantity obtained by multiplying the reliability factor by
the standard error of the mean.
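As a rough illustration of the margin of error described above, the following
Python sketch multiplies a reliability factor by the standard error of the mean.
The function name margin_of_error, the default reliability factor of 1.96 (the
usual value for a 95% confidence level under a normal approximation), and the
height data are assumptions made for this example only.

```python
import math
import statistics

def margin_of_error(sample, reliability_factor=1.96):
    """Return reliability factor x standard error of the mean.

    reliability_factor=1.96 is an assumed default (95% level,
    normal approximation); it is not prescribed by the text.
    """
    n = len(sample)
    s = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
    standard_error = s / math.sqrt(n)   # standard error of the mean
    return reliability_factor * standard_error

# Made-up heights in cm, purely for illustration.
heights_cm = [150, 160, 155, 165, 170, 158, 162, 168, 152, 160]
mean = statistics.mean(heights_cm)
moe = margin_of_error(heights_cm)
print(f"mean = {mean:.1f} cm, margin of error = +/- {moe:.2f} cm")
```

A larger sample shrinks the standard error, and therefore the margin of error,
which is one way of seeing why sample size matters for precision.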
Sources of Bias
1. Design defect. When the wrong study design is adopted, the study will be
defective because bias is introduced from the outset.
2. Observer error. This may be inter-observer error, where bias is introduced
by different persons whose observations or measurements are at variance. It may
also be intra-observer error, where the same person’s repeated observations or
measurements of the same variable are at variance.
3. Instrument error. Measurements, including biological ones, are subject to
variability. This variability may be inherent to the instrument, or may arise
from environmental factors (e.g. climate), from variation within an individual
from one occasion to another, or from one observer to another. Variation in
instruments can introduce errors in results or outcomes. Therefore, to assess
biological data we need statistical techniques that help us cope with such
variability.
4. Communication problems. Poor communication, non-response, and incomplete or
incorrect information may arise between observer and respondent.
Advantages of sampling
1. It reduces the cost of study.
2. Not all persons in the population are studied.
3. Material used is less.
4. Demand for personnel is less.
5. It yields quicker results, as a smaller group is studied.
6. It eases time constraints.
7. It can reduce measurement error and improve accuracy, because the smaller
number of units can be studied more carefully.
Limitations of sampling
1. Some people or units are excluded.
2. The sample mean may not be equal to the population mean, i.e. x̄ ≠ μ.
3. It may be difficult sometimes to select a sample that is representative of the
population.
4. In human populations, it is easy to introduce bias (e.g. discrimination) in
sampling.
5. Some studies are not suited to sampling because everyone has to be counted,
e.g. a census.
Types of sampling
There are two major types of sampling methods: probability and non-probability
sampling.
Probability sampling, also known as random sampling, is a kind of sample
selection where randomization is used instead of deliberate choice. Each member
of the population has a known, non-zero chance of being selected.
Non-probability sampling techniques are where the researcher deliberately picks
items or individuals for the sample based on non-random factors such as
convenience, geographic availability, or costs.
Probability sampling methods
1. Simple random sampling
With simple random sampling, every element in the population has an equal
chance of being selected as part of the sample. It’s something like picking a
name out of a hat. Simple random sampling can be done by anonymizing the
population – e.g. by assigning each item or person in the population a number
and then picking numbers at random.
Advantages: Simple random sampling is easy to do and cheap. Designed to
ensure that every member of the population has an equal chance of being
selected, it reduces the risk of bias compared to non-random sampling.
Disadvantages: It can lead to unrepresentative groupings being picked by
chance, because there is no control over the composition of the sample.
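The "numbers out of a hat" idea for simple random sampling can be sketched in
Python as follows. This is a minimal illustration, not a prescribed procedure:
the function name simple_random_sample, the population of 2,000 numbered units,
and the seed are all assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every unit has an equal chance.

    seed is optional and only makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: units numbered 1 to 2000.
population = list(range(1, 2001))
sample = simple_random_sample(population, 200, seed=42)
print(len(sample), "units drawn; first five:", sample[:5])
```

Numbering the units first means the draw depends only on the random numbers, not
on who or what each unit is.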
2. Systematic sampling
With systematic sampling, the random selection only applies to the first item
chosen. A rule then applies so that every nth item is picked afterward. The best
practice is to put your list in random order first, so that the fixed interval
does not line up with any pattern in the list. This is commonly achieved using a
random number generator. If that’s not available, you might order your list
alphabetically by first name and then pick every fifth name, on the assumption
that name order is unrelated to what is being measured.
Advantages: Systematic sampling is efficient and straightforward, especially
when dealing with populations that have a clear order. It ensures a uniform
selection across the population.
Disadvantages: There’s a potential risk of introducing bias if there’s an
unrecognized pattern in the population that aligns with the sampling interval.
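A minimal Python sketch of systematic sampling is given here: a random start
within the first interval, then every kth item. The function name
systematic_sample and the choice of interval k = N // n (population size divided
by sample size, rounded down) are assumptions for illustration.

```python
import random

def systematic_sample(items, n, seed=None):
    """Pick a random start, then every k-th item, where k = len(items) // n."""
    if not 0 < n <= len(items):
        raise ValueError("n must be between 1 and the population size")
    rng = random.Random(seed)
    k = len(items) // n          # sampling interval (assumed rule)
    start = rng.randrange(k)     # the only random step: start in 0..k-1
    return [items[start + i * k] for i in range(n)]

# Hypothetical ordered list of 100 units, sampling 20 of them (every 5th).
units = list(range(100))
print(systematic_sample(units, 20, seed=1))
```

If the list has a cycle whose length matches k (for example, every fifth house
is a corner house), every selected unit shares that feature, which is the bias
risk noted in the disadvantages.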
3. Stratified sampling
Stratified sampling involves random selection within predefined groups (strata).
It’s a useful method when researchers know which characteristics of the
population are highly correlated with what’s being measured. The population is
first subdivided (stratified) in a way that makes sense for the research, and a
random sample is then drawn from each stratum.
For example, suppose we want to measure the height of students at a college
where 80% of students are female and 20% are male. We know that gender is highly
correlated with height, and if we took a simple random sample of 200 students
(out of the 2,000 who attend the college), we could by chance get far too few
males, or in the extreme case 200 females and not one male. This would bias our
results and we would underestimate the height of students overall. Instead, we
could stratify by gender and make sure that 20% of our sample (40 students) are
male and 80% (160 students) are female.
Advantage: Stratified sampling enhances the representation of all identified
subgroups within a population, leading to more accurate results in
heterogeneous populations.
Disadvantage: This method requires accurate knowledge about the
population’s stratification, and its design and execution can be more intricate
than other methods.
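The college example can be sketched in Python with proportional allocation,
where each stratum contributes to the sample in proportion to its share of the
population. The function name stratified_sample, the stratum_of callback, and
the toy student records are assumptions made for this illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n, seed=None):
    """Proportional-allocation stratified sample (illustrative sketch).

    Each stratum's share is rounded, so the total can differ slightly
    from n when the proportions do not divide evenly.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # group units by stratum
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(units))
        sample.extend(rng.sample(members, share))  # random draw within stratum
    return sample

# Hypothetical college: 1,600 female and 400 male students.
students = [("F", i) for i in range(1600)] + [("M", i) for i in range(400)]
sample = stratified_sample(students, lambda s: s[0], 200, seed=0)
females = sum(1 for s in sample if s[0] == "F")
males = sum(1 for s in sample if s[0] == "M")
print(females, males)   # 160 40
```

The selection inside each stratum is still random; stratification only fixes how
many units come from each group.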
4. Cluster sampling
With cluster sampling, groups rather than individual units of the target
population are selected randomly for the sample. These might be pre-existing
groups, such as people in certain zip codes or students belonging to an
academic year.
Cluster sampling can be done by selecting the entire cluster, or in the case of
two-stage cluster sampling, by randomly selecting the cluster itself, and then
selecting at random again within the cluster.
Advantage: Cluster sampling is economically beneficial and logistically easier
when dealing with vast and geographically dispersed populations.
Disadvantage: Due to potential similarities within clusters, this method can
introduce a greater sampling error compared to other methods.
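Both variants of cluster sampling can be sketched in Python as follows: clusters
are chosen at random, and then either the whole cluster is taken or a second
random draw is made inside it. The function name cluster_sample, the dictionary
layout, and the zip-code style cluster names are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Randomly select clusters, then units within them.

    clusters: dict mapping cluster name -> list of units.
    n_per_cluster=None keeps every unit of each chosen cluster
    (one-stage); a number triggers a second random draw (two-stage).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # stage 1: pick clusters
    result = {}
    for name in chosen:
        members = clusters[name]
        if n_per_cluster is None:
            result[name] = list(members)                 # take the entire cluster
        else:
            k = min(n_per_cluster, len(members))
            result[name] = rng.sample(members, k)        # stage 2: pick units
    return result

# Hypothetical population: 10 areas with 100 residents each.
areas = {f"zip{i}": list(range(i * 100, (i + 1) * 100)) for i in range(10)}
print(cluster_sample(areas, n_clusters=3, n_per_cluster=5, seed=0))
```

Only the chosen clusters need to be visited, which is where the cost and
logistics savings come from.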
Non-probability sampling methods
Non-probability sampling doesn’t offer the same bias-removal benefits as
probability sampling, but there are times when these types of sampling are
chosen for expediency or simplicity. Here are some forms of non-probability
sampling and how they work.
1. Convenience sampling
Elements in a sample are selected based on their accessibility and availability. If
you are doing a research survey and you work at a university, for example, a
convenience sample might consist of students or co-workers who happen to be
on campus with open schedules and are willing to take your questionnaire. This
kind of sample can have value, especially if it’s done as an early or preliminary
step, but it is likely to introduce significant bias.
Advantage: Convenience sampling is the most straightforward method,
requiring minimal planning, making it quick to implement.
Disadvantage: Due to its non-random nature, the method is highly susceptible
to biases, and the results often generalize poorly to the wider population.
2. Quota sampling
Like the probability-based stratified sampling method, this approach aims to
achieve a spread across the target population by specifying who should be
recruited for a survey according to certain groups or criteria. For example, your
quota might include a certain number of males and a certain number of females.
Alternatively, you might want your samples to be at a specific income level or in
certain age brackets or ethnic groups.
Advantage: Quota sampling ensures certain subgroups are adequately
represented, making it great for when random sampling isn’t feasible but
representation is necessary.
Disadvantage: The selection within each quota is non-random and researchers’
discretion can influence the representation, both of which increase the risk of
bias.
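To make the contrast with stratified sampling concrete, here is a small Python
sketch of quota filling. People are accepted in whatever order they turn up
until each quota is full, so nothing in the selection is random. The function
name quota_sample, the group_of callback, and the arrival list are assumptions
for this example.

```python
def quota_sample(arrivals, group_of, quotas):
    """Accept people in arrival order until every quota is filled.

    quotas: dict mapping group -> number of people wanted.
    No randomization is involved, which is the source of bias.
    """
    remaining = dict(quotas)
    sample = []
    for person in arrivals:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break                      # all quotas met, stop recruiting
    return sample

# Hypothetical respondents as (gender, id), in the order they were met.
arrivals = [("M", 1), ("M", 2), ("F", 3), ("M", 4), ("F", 5), ("F", 6)]
print(quota_sample(arrivals, lambda p: p[0], {"M": 2, "F": 2}))
# [('M', 1), ('M', 2), ('F', 3), ('F', 5)]
```

The quotas are met exactly, yet who ends up in the sample depends entirely on
who was encountered first.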
3. Purposive sampling
Elements in the sample are chosen consciously by researchers based on their
knowledge and understanding of the research question at hand or their goals.
Also known as judgment sampling, this technique is unlikely to result in a
representative sample, but it is a quick and fairly easy way to get a range of
results or responses.
Advantage: Purposive sampling targets specific criteria or characteristics, making
it ideal for studies that require specialized participants or specific conditions.
Disadvantage: It’s highly subjective and based on researchers’ judgment,
which can introduce biases and limit the study’s real-world application.
4. Snowball or referral sampling
With this approach, people recruited to be part of a sample are asked to invite
those they know to take part, who are then asked to invite their friends and
family, and so on. The participation radiates through a community of connected
individuals like a snowball rolling downhill.
Advantage: Especially useful for hard-to-reach or secretive populations, snowball
sampling is effective for certain niche studies.
Disadvantage: The method can introduce bias due to the reliance on participant
referrals, and the choice of initial seeds can significantly influence the final
sample.
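The referral process can be sketched in Python as a breadth-first walk through a
network of contacts, starting from the initial seeds. The function name
snowball_sample, the referrals dictionary, and the people named in it are
hypothetical; real referral chains are rarely this tidy.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Recruit seeds first, then the people they refer, wave by wave.

    referrals: dict mapping a person -> list of people they invite.
    """
    seeds = list(dict.fromkeys(seeds))   # drop duplicate seeds, keep order
    seen = set(seeds)
    queue = deque(seeds)
    sample = []
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:      # nobody is recruited twice
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"]}
print(snowball_sample(referrals, ["A"], max_size=4))
# ['A', 'B', 'C', 'D']
```

Starting from a different seed reaches a different part of the network, which
illustrates why the choice of initial seeds shapes the final sample.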
A structured sampling procedure
1) Define your research goals
If you aim to get a general sense of a larger group, simple random or stratified
sampling could be your best bet. For focused insights or studying unique
communities, snowball or purposive sampling might be more suitable.
2) Assess the nature of your population
The nature of the group you’re studying can guide your method. For a diverse
group with different categories, stratified sampling can ensure all segments are
covered. If they’re widely spread geographically, cluster sampling becomes
useful. If they’re arranged in a certain sequence or order, systematic sampling
might be effective.
3) Consider your constraints
Your available time, budget, and ease of accessing participants matter.
Convenience or quota sampling can be practical for quicker studies, but they
come with some trade-offs. If reaching everyone in your desired group is
challenging, snowball or purposive sampling can be more feasible.
4) Determine the reach of your findings
Decide if you want your findings to represent a much broader group. For a wider
representation, methods that give every unit a known chance of selection (i.e.
probability sampling) are a good option. For specialized insights into specific
groups, non-probability sampling methods can be more suitable.
5) Avoid or reduce sampling errors and bias
Using a sample is a kind of shortcut. If you could ask every single person in a
population to take part in your study and have each of them reply, you’d have a
highly accurate (and very labor-intensive) project on your hands.
But since that’s not realistic, sampling offers a “good-enough” solution that
sacrifices some accuracy for the sake of practicality and ease. How much accuracy
you lose out on depends on how well you control for sampling error,
non-sampling error, and bias in your survey design.