Sampling
Understanding different sampling methods
Whenever sociologists start a piece of research, they have in mind a group of people they want to
study. These people, whoever they may be, are known as the target population: in effect, everyone
in the particular group the researcher would like to study.
When you conduct research about a group of people, it’s rarely possible to collect data from every
person in that group. Instead, you select a sample.
The sample is the group of individuals who will actually participate in the research.
To draw valid conclusions from your results, you have to carefully decide how you will select a
sample that is representative of the group as a whole. There are two types of sampling methods:
• Probability sampling involves random selection, allowing you to make statistical
inferences about the whole group.
• Non-probability sampling involves non-random selection based on convenience or other
criteria, allowing you to easily collect initial data.
Population vs sample
First, you need to identify the target population of your research.
The population is the entire group that you want to draw conclusions about. The
population can be defined in terms of geographical location, age, income, and many other
characteristics.
The sample is the specific group of individuals that you will collect data from.
The population can be very broad or quite narrow: maybe you want to make inferences about the whole adult
population of your country; maybe your research focuses on customers of a certain company,
patients with a specific health condition, or students in a single school. It is important to carefully
define your target population according to the purpose and practicalities of your project.
If the population is very large, demographically mixed, and geographically dispersed, it might be
difficult to gain access to a representative sample.
Sampling Frame and Sampling Units
The sampling frame is the actual list of individuals that the sample will be drawn from. Ideally, it
should include the entire target population (and nobody who is not part of that population).
If we assume, for the sake of argument, that sociologists generally try to construct representative
samples, it follows that a researcher will normally need some way of identifying everyone in their
target population so that an accurate, representative sample can be taken.
The sampling frame serves this purpose, and the individuals, groups or other phenomena it lists
(you could, for example, have a target population of magazines aimed at a particular age group
that you are going to sample) are known as sampling units.
Examples of sampling frames might be:
• Electoral Roll / Register - provides a list of everyone registered to vote.
• School Registers - provide lists of children attending school.
• Professional Membership Lists - organisations such as the British Medical Association
(BMA) keep lists of the doctors who are their members.
• Company payrolls - provide a list of all employees in a company.
Example
You are doing research on working conditions at Company X. Your population is all 1000
employees of the company. Your sampling frame is the company’s HR database which lists
the names and contact details of every employee.
For most types of sampling (there are exceptions) a sampling frame is essential for two main
reasons:
1. If a researcher can't identify everyone in their target population it's unlikely that their
sample will be representative of that population.
2. If a researcher is to make contact with the people in their sample, they clearly need to know
who they are...
Sample size
The number of individuals in your sample depends on the size of the population, and on how
precisely you want the results to represent the population as a whole.
You can use a sample size calculator to determine how big your sample should be. In general, the
larger the sample size, the more accurately and confidently you can make inferences about the
whole population.
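As a rough illustration of what a sample size calculator does, the sketch below uses Cochran's formula with a finite population correction. This is one common approach, not necessarily the one any particular calculator uses. The function name and the default values (a 95% confidence level via z = 1.96, a 5% margin of error and an assumed proportion of 0.5) are assumptions chosen for the example.

```python
import math

def sample_size(population, margin_of_error=0.05, z=1.96, proportion=0.5):
    """Estimate a sample size using Cochran's formula plus a finite population correction.

    All defaults are illustrative assumptions: z=1.96 corresponds to 95% confidence,
    and proportion=0.5 is the most cautious guess about how opinions are split.
    """
    # Sample size needed for an effectively infinite population
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    # Shrink it to account for the finite size of the real population
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# Company X has 1000 employees
print(sample_size(1000))  # 278
```

Under these assumptions, tightening the margin of error increases the required sample, while the required sample grows only slowly as the population gets very large.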
Probability sampling methods
Probability sampling means that every member of the population has a known, non-zero chance of being selected.
If you want to produce results that are representative of the whole population, you need to use a
probability sampling technique.
There are four main types of probability sample.
a) Simple random sampling
In a simple random sample, every member of the population has an equal chance of being selected.
Your sampling frame should include the whole population.
This is one of the most basic (simple) forms of sampling, based on the probability that the random
selection of names from a sampling frame will produce a sample that is representative of a target
population. In this respect, a simple random sample is similar to a lottery:
• Everyone in the target population is identified on a sampling frame.
• The sample is selected by randomly choosing names from the frame until the sample is
complete.
For example, a 20% sample of a target population of 100 people would involve the random
selection of 20 people to be in the sample.
An example of a simple random sample you could easily construct would be to take the names of
every student in your class from the register, write all the names on separate pieces of paper and
put them in a box.
If you then draw out a certain percentage of names at random you will have constructed your
simple random sample…
To conduct this type of sampling, you can use tools like random number generators or other
techniques that are based entirely on chance.
Example
You want to select a simple random sample of 100 employees of Company X. You assign a number
to every employee in the company database from 1 to 1000, and use a random number generator
to select 100 numbers.
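The Company X example can be sketched in a few lines of Python. This is a minimal illustration that assumes employees are simply numbered 1 to 1000; the function name and the optional seed (used only so the draw can be repeated) are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement, purely by chance."""
    rng = random.Random(seed)  # a seed is optional; it only makes the draw repeatable
    return rng.sample(frame, n)

# Hypothetical sampling frame: employee numbers 1 to 1000
employee_numbers = list(range(1, 1001))

sample = simple_random_sample(employee_numbers, 100, seed=42)
print(len(sample))       # 100
print(len(set(sample)))  # 100, because nobody is selected twice
```

Selecting without replacement matters here: it is the software equivalent of not putting a name back in the box once it has been drawn.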
b) Systematic sampling
Systematic sampling is similar to simple random sampling, but it is usually slightly easier to
conduct. Every member of the population is listed with a number, but instead of randomly
generating numbers, individuals are chosen at regular intervals.
It is a variation on the above: the names for your sample are selected systematically rather than
on a simple random basis. Thus, instead of putting all the names on your sampling frame
individually into a box, it's less trouble to select your sample from the sampling frame itself.
For example, if you were constructing a 20% sample of a target population containing 100 names,
a systematic sample would involve choosing every fifth name from your sampling frame.
A simple example of a systematic sample would be for you to use a class register to construct a
sample of students in your class.
You could try constructing a 50% sample of your class using this sampling technique.
This type of sampling technique tends to be used when the target population is very large.
For example, if you were going to select a 10% sample from a target population of 1 million people
by simple random sampling, you would either need a very large box and a lot of patience or a
computer and some means of getting the names in your target population into a program that would
select your sample randomly.
Example
All employees of the company are listed in alphabetical order. From the first 10 numbers, you
randomly select a starting point: number 6. From number 6 onwards, every 10th person on the list
is selected (6, 16, 26, 36, and so on), and you end up with a sample of 100 people.
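The same procedure can be written as a short Python sketch. It assumes the alphabetical list is represented by employee numbers 1 to 1000; the function name and seed parameter are hypothetical.

```python
import random

def systematic_sample(frame, interval, seed=None):
    """Pick a random start within the first `interval` units, then take every interval-th unit."""
    rng = random.Random(seed)
    start = rng.randint(1, interval)    # position between 1 and interval, inclusive
    return frame[start - 1::interval]   # convert the 1-based position to a 0-based index

# Hypothetical alphabetical list, represented by employee numbers 1 to 1000
employees = list(range(1, 1001))

sample = systematic_sample(employees, interval=10, seed=1)
print(len(sample))  # 100

# With the starting point fixed at number 6, as in the example above:
print(employees[6 - 1::10][:4])  # [6, 16, 26, 36]
```

Only the starting point is left to chance; once it is chosen, the rest of the sample is fully determined by the interval.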
If you use this technique, it is important to make sure that there is no hidden pattern in the list that
might skew the sample. For example, if the HR database groups employees by team, and team
members are listed in order of seniority, there is a risk that your interval might skip over people in
junior roles, resulting in a sample that is skewed towards senior employees.
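A small sketch makes the risk concrete. It assumes a deliberately extreme, hypothetical list: 100 teams of exactly 10 people, each team listed from most senior (rank 1) to most junior (rank 10). Real lists are rarely this regular, so treat it as an illustration of the mechanism only.

```python
# Hypothetical frame: 100 teams of 10, each team listed in order of seniority
frame = [(team, rank) for team in range(1, 101) for rank in range(1, 11)]

interval = 10
start = 1  # suppose the randomly chosen starting point happens to be the first name
sample = frame[start - 1::interval]

# Every selected person holds the same rank, because the interval matches the team size
ranks_in_sample = {rank for _, rank in sample}
print(len(sample))       # 100
print(ranks_in_sample)   # {1}
```

Whatever starting point is drawn, a list like this yields a sample made up of a single seniority level, which is why the ordering of the frame should be checked before an interval is chosen.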
Systematic samples are near-random rather than truly random because if you have a list of 100
people, for example, and you start to select a 20% sample beginning with the first name on your
sampling frame, the second, third, fourth and fifth names actually have no chance of being
included in your sample...
When deciding which of these two types of sampling to use, their basic advantages and
disadvantages are very similar and we can summarise them in the following terms:
Uses:
1. Both are relatively quick and easy ways of selecting samples (if the target population is
reasonably small).
2. They are random / near random, which means that everyone in the target population has an
equal chance of appearing in the sample (this is not quite true of systematic sampling, but such
samples are “random enough” for most research purposes).
3. They are both reasonably inexpensive to construct. Both simply require a sampling frame
that is accurate for the target population.
4. Other than some means of identifying people in the target population (a name and address,
for example), the researcher does not require any other knowledge about this population (an idea
that will become more significant when we consider some other forms of sampling).
Limitations
1. The fact that these types of sample always need a sampling frame means that, in some
cases, it may not be possible to use these types of sampling. For example, a study into “underage
drinking” could not be based on a simple random or systematic sample because no sampling frame
exists for the target population.
2. In many cases a researcher will want to get the views of different categories of people
within a target population and it is not always certain that these types of sampling will produce a
sample that is representative of all shades of opinion.
For example, in a classroom it might be important to get the views of both the teacher and their
students about some aspect of education. A simple random / systematic sample may not include a
teacher because this category is likely to be a very small percentage of the overall class; there is a
high probability that the teacher would not be chosen by any sample that is simply based on
chance…
One way of trying to overcome some of the potential limitations of simple random / systematic
sampling is to use an alternative sampling technique that avoids the problem of possible under-
representation, while retaining the idea of selection based on chance.
c) Stratified sampling
This sampling method is appropriate when the population has mixed characteristics, and you want
to ensure that every characteristic is proportionally represented in the sample.
You divide the population into subgroups (called strata) based on the relevant characteristic (e.g.
gender, age range, income bracket, job role).
From the overall proportions of the population, you calculate how many people should be sampled
from each subgroup. Then you use random or systematic sampling to select a sample from each
subgroup.
Example
The company has 800 female employees and 200 male employees. You want to ensure that the
sample reflects the gender balance of the company, so you sort the population into two strata based
on gender. Then you use random sampling on each group, selecting 80 women and 20 men, which
gives you a representative sample of 100 people.
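The proportional allocation in this example can be sketched as follows. The employee identifiers (F1, M1 and so on), the function name and the seed are hypothetical; the stratum sizes come from the example above.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Randomly sample from each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Quota = this stratum's share of the population, applied to the sample size.
        # With awkward proportions, rounding may leave the total slightly off total_n.
        quota = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, quota)
    return sample

# Hypothetical identifiers for the 800 women and 200 men at the company
strata = {
    "female": [f"F{i}" for i in range(1, 801)],
    "male": [f"M{i}" for i in range(1, 201)],
}

sample = stratified_sample(strata, total_n=100, seed=7)
print({name: len(members) for name, members in sample.items()})
# {'female': 80, 'male': 20}
```

Selection within each stratum is still left to chance; only the number taken from each group is fixed in advance.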
The following outlines some of the uses and limitations of stratified random and stratified non-
random sampling.
Uses
1. This type of sampling ensures that known differences in the target population will be
accurately reflected in the sample. In basic terms, therefore, we can be sure that in terms of the
characteristics of our target population our sample will be broadly representative.
2. Stratified samples do not have to be very big, since it is possible, using small samples that
are carefully stratified, to make certain that we have accurately reflected the make-up of our target
population.
3. Stratified samples, in particular, are usually relatively cheap and quick to construct
accurately.
Limitations
1. In order to stratify a sample the researcher must have accurate and up-to-date information
about the target population. This is not always available.
2. Even in situations when a researcher has accurate information about the different groups
that make up the target population it is possible that this information may be out-of-date by the
time the research based on the sample is actually conducted. This is especially true where the
sample is large and complex and in situations where the composition of the target population may
change rapidly and consistently.
d) Cluster sampling
Cluster sampling also involves dividing the population into subgroups, but each subgroup should
have similar characteristics to the whole population. Instead of sampling individuals from each
subgroup, you randomly select entire subgroups.
If it is practically possible, you might include every individual from each sampled cluster. If the
clusters themselves are large, you can also sample individuals from within each cluster using one
of the techniques above.
This form of sampling is usually done when a target population is spread over a wide geographic
area.
For example, an opinion poll into voting behaviour may involve a sample of 1000 people to
represent the 35 million people eligible to vote in a General Election. If a simple random sample
were taken it's possible that the researcher might have to poll 10 people in Newcastle, 15 people
in Cardiff, 3 people in Bournemouth and so forth. In other words, it would be a time-consuming
and very expensive process and the results from the poll would probably be out-of-date before the
poll could be finished.
To avoid these problems, a researcher can use a multi-stage / cluster sample that first divides
the country into smaller units (in this example, electoral constituencies) and then into smaller
units within constituencies (for example, local boroughs). Boroughs could then be selected which,
based on past research, show a representative cross-section of voters and a sample of electors could
be taken from a relatively small number of boroughs across the country.
This method is good for dealing with large and dispersed populations, but there is more risk of
error in the sample, as there could be substantial differences between clusters. It’s difficult to
guarantee that the sampled clusters are really representative of the whole population.
Example
The company has offices in 10 cities across the country (all with roughly the same number of
employees in similar roles). You don’t have the capacity to travel to every office to collect your
data, so you use random sampling to select 3 offices – these are your clusters.
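A minimal sketch of this example is shown below. It assumes each of the 10 offices has exactly 100 employees and that everyone in a selected office is included; the office names, employee identifiers, function name and seed are all hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters; everyone in a chosen cluster enters the sample."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # choose cluster names, not individuals
    return {name: clusters[name] for name in chosen}

# Hypothetical frame: 10 offices with 100 employees each
offices = {
    f"Office {i}": [f"O{i}-E{j}" for j in range(1, 101)]
    for i in range(1, 11)
}

selected = cluster_sample(offices, n_clusters=3, seed=3)
participants = [person for staff in selected.values() for person in staff]

print(len(selected))      # 3
print(len(participants))  # 300
```

If the selected offices were too large to survey in full, a second stage could apply simple random or systematic sampling to the staff list of each chosen office.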
We can note the following uses and limitations with this type of sampling:
Uses
1. This type of sample saves the researcher time and money.
2. Once a relatively reliable sample has been established, the researcher can use the same or
a similar sample again and again (as with political opinion polling).
Limitations
1. Unless great care is taken by the researcher it is possible that the cluster samples will not
be representative of the target population.
2. Although cluster sampling is relatively cheap compared with other forms of sampling, it is
not necessarily cheap in absolute terms. A sample that seeks to represent the whole of Britain, for
example, is still going to be too expensive for many researchers.
Non-probability sampling methods
In a non-probability sample, individuals are selected based on non-random criteria, and not every
individual has a chance of being included. This type of sample is easier and cheaper to access, but
you can’t use it to make valid statistical inferences about the whole population.
Non-probability sampling techniques are often appropriate for exploratory and qualitative
research. In these types of research, the aim is not to test a hypothesis about a broad population,
but to develop an initial understanding of a small or under-researched population.
a) Convenience sampling
A convenience sample simply includes the individuals who happen to be most accessible to the
researcher. This is an easy and inexpensive way to gather initial data, but there is no way to tell if
the sample is representative of the population, so it can’t produce generalizable results.
Example
You are researching opinions about student support services in your university, so after each of
your classes, you ask your fellow students to complete a survey on the topic. This is a convenient
way to gather data, but as you only surveyed students taking the same classes as you at the same
level, the sample is not representative of all the students at your university.
b) Voluntary response sampling
Similar to a convenience sample, a voluntary response sample is mainly based on ease of access.
Instead of the researcher choosing participants and directly contacting them, people volunteer
themselves (e.g. by responding to a public online survey).
Voluntary response samples are always at least somewhat biased, as some people will inherently
be more likely to volunteer than others.
Example
You send out the survey to all students at your university and a lot of students decide to complete
it. This can certainly give you some insight into the topic, but the people who responded are more
likely to be those who have strong opinions about the student support services, so you can’t be
sure that their opinions are representative of all students.
c) Purposive sampling
This type of sampling involves the researcher using their judgement to select a sample that is most
useful to the purposes of the research.
It is often used in qualitative research, where the researcher wants to gain detailed knowledge
about a specific phenomenon rather than make statistical inferences. An effective purposive
sample must have clear criteria and rationale for inclusion.
Example
You want to know more about the opinions and experiences of disabled students at your university,
so you purposefully select a number of students with different support needs in order to gather a
varied range of data on their experiences with student services.
d) Snowball sampling
If the population is hard to access, snowball sampling can be used to recruit participants via other
participants. The number of people you have access to “snowballs” as you get in contact with more
people.
Just as a snowball rolling downhill gets larger and larger as it picks up more snow, a “snowball
sample” picks up more and more people to be in the sample over time.
It is not always possible for a researcher to get hold of a sampling frame for a target population.
This may be because such a list doesn't exist or because someone who controls access to the list
will not release it to a researcher. Whatever the reason, it may still be possible to construct a sample
in an ad hoc (unsystematic) way.
As the name suggests, a snowball or opportunity sample involves the researcher identifying
someone in the target population who is willing to be researched.
This person may then suggest another 2 or 3 people (perhaps more) who will help. These people,
in turn, suggest another 2 or 3 people until, in a relatively short space of time, the researcher has a
sample they feel they can use in their research.
Clearly, this type of sampling is not going to produce a sample that is truly representative of a
target population, but it may be the best that can be achieved in certain situations.
If you use this type of sampling in your (project) work, please pay careful attention to the
limitations of this technique...
Example
You are researching experiences of homelessness in your city. Since there is no list of all homeless
people in the city, probability sampling isn’t possible. You meet one person who agrees to
participate in the research, and she puts you in contact with other homeless people that she knows
in the area.
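The referral process can be sketched as a walk through a network of contacts. The referral data below is entirely made up for illustration (participants P1 to P6 and who each one introduces), as are the function name and the target size; in real research the referrals only become known as the fieldwork proceeds.

```python
from collections import deque

def snowball_sample(referrals, seeds, target_size):
    """Recruit participants wave by wave, following referrals from people already recruited."""
    sample = []
    seen = set(seeds)       # everyone already recruited or waiting to be contacted
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        # Each participant may introduce contacts the researcher has not yet met
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referrals: each participant names the people they can introduce
referrals = {
    "P1": ["P2", "P3"],
    "P2": ["P4"],
    "P3": ["P4", "P5"],
    "P5": ["P6"],
}

print(snowball_sample(referrals, seeds=["P1"], target_size=5))
# ['P1', 'P2', 'P3', 'P4', 'P5']
```

Anyone who is not connected to the first participant, directly or through others, can never be reached, which is one reason this kind of sample is unlikely to be representative.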
We can note the following uses and limitations of this sampling technique.
Uses
1. This type of sampling enables a researcher to construct a sample in situations where it
would not be possible to do so using any other sampling technique.
2. It can be a relatively cheap and quick method of sampling.
Limitations
1. The sample is unlikely to be representative of a target population.
2. There is no way of checking whether or not your sample is representative.
3. There is a high likelihood of a self-selected sample being constructed (see below: Sampling
Errors