Cluster Sampling Techniques Explained

Cluster sampling is a method used when a complete list of population elements is not available. The population is divided into clusters, and clusters rather than individual elements are sampled. Key aspects include: 1. The population is divided into clusters according to a well-defined rule, with clusters treated as sampling units. 2. A sample of clusters is selected using a procedure such as simple random sampling. 3. All elements within the selected clusters are enumerated to collect data. This method is easier and cheaper than sampling individual elements when cluster listings are available but element listings are not. Bias is introduced if some population element belongs to no cluster or to more than one cluster. Clusters are ideally constructed so that elements are heterogeneous within clusters.

Uploaded by

ASHUTOSH GAURAV

Cluster Sampling

It is one of the basic assumptions in any sampling procedure that the population can be divided into a finite
number of distinct and identifiable units, called sampling units. The smallest units into which the population
can be divided are called elements of the population. The groups of such elements are called clusters.
In many practical situations and many types of populations, a list of elements is not available and so the use
of an element as a sampling unit is not feasible. The method of cluster sampling can be used in such
situations.
In cluster sampling:
- Divide the whole population into clusters according to some well-defined rule.
- Treat the clusters as sampling units.
- Choose a sample of clusters according to some procedure.
- Carry out a complete enumeration of the selected clusters, i.e., collect information on all the elements
in the selected clusters.
Examples:
1. In a city, a list of all the individual persons staying in the houses may be difficult to obtain or may
not be available at all, but a list of all the houses in the city may be available. So every individual person
is treated as an element and every house as a cluster.
2. A list of all the agricultural farms in a village or a district may not be easily available, but a list of
villages or districts generally is. In this case, every farm is an element and every village or
district is a cluster.
Moreover, it is easier, faster, cheaper and more convenient to collect information cluster by cluster than on
individually sampled elements.
In both examples, draw a sample of clusters (houses or villages) and then collect the observations on
all the elements in the selected clusters.
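
The procedure above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `cluster_sample`, the `houses` data and the use of SRSWOR as the selection procedure are assumptions made for the example, not part of the original notes.

```python
import random

def cluster_sample(clusters, n, seed=None):
    """Select n clusters by SRSWOR and completely enumerate each selected cluster.

    `clusters` maps a cluster label to the list of values observed on its elements.
    """
    rng = random.Random(seed)
    # The sampling units are the cluster labels, not the individual elements.
    chosen = rng.sample(sorted(clusters), n)
    # Complete enumeration: keep every element of every selected cluster.
    return {label: list(clusters[label]) for label in chosen}

# Hypothetical frame: each house (cluster) lists the ages of its residents (elements).
houses = {
    "house_1": [34, 36, 8],
    "house_2": [61, 59],
    "house_3": [27],
    "house_4": [45, 43, 15, 12],
    "house_5": [70, 30, 5],
}

sample = cluster_sample(houses, n=2, seed=0)
for label, residents in sample.items():
    print(label, residents)
```

Only a list of houses is needed to draw the sample. The residents are listed only for the houses that are actually selected.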

Conditions under which the cluster sampling is used:


Cluster sampling is preferred when
(i) No reliable listing of elements is available, and it is expensive to prepare one.
(ii) Even if a list of elements is available, locating or identifying the units may be difficult.
(iii) A necessary condition for the validity of this procedure is that every unit of the population under study
must belong to one and only one cluster, so that the clusters in the frame together cover all the units of
the population under study without any omission or duplication. When
this condition is not satisfied, bias is introduced.
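
Condition (iii) can be checked mechanically when a list of units happens to be available for a pilot area. The sketch below is a hypothetical helper (`check_partition` and the toy data are assumptions for illustration) that reports units omitted from every cluster and units assigned to more than one cluster.

```python
from collections import Counter

def check_partition(population_units, clusters):
    """Return (omitted, duplicated) units for a proposed clustering.

    `clusters` maps a cluster label to the list of units assigned to it.
    Both lists are empty exactly when every unit lies in one and only one cluster.
    """
    counts = Counter(unit for members in clusters.values() for unit in members)
    omitted = [u for u in population_units if counts[u] == 0]
    duplicated = [u for u in population_units if counts[u] > 1]
    return omitted, duplicated

# Toy example: unit "d" is left out and unit "b" is assigned twice.
units = ["a", "b", "c", "d"]
proposed = {"cluster_1": ["a", "b"], "cluster_2": ["b", "c"]}
omitted, duplicated = check_partition(units, proposed)
print("omitted:", omitted, "duplicated:", duplicated)
```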

Construction of clusters:
The clusters are constructed such that the elements are heterogeneous within the clusters and
homogeneous among the clusters, i.e., the clusters resemble one another. This is the opposite of the construction of strata in stratified sampling.
There are two options for constructing the clusters: equal size and unequal size. We discuss the estimation of
the population mean, and the variance of its estimator, in both cases.

Case of equal clusters


1. Suppose the population is divided into N clusters and each cluster is of size M.
2. Select a sample of n clusters from N clusters by the method of SRSWOR.
So
Total population size = NM
Total sample size = nM.
Let
$y_{ij}$: value of the characteristic under study for the $j$th element ($j = 1, 2, \ldots, M$) in the $i$th
cluster ($i = 1, 2, \ldots, N$).
$\bar{y}_i = \frac{1}{M} \sum_{j=1}^{M} y_{ij}$: mean per element of the $i$th cluster.

[Figure: the population of NM units is divided into N clusters of M units each; the sample consists of n of these clusters, each of M units.]

Estimation of population mean:
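First select n clusters from the N clusters by SRSWOR. For each selected cluster, compute the cluster mean
from all M of its elements. This gives the cluster means $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_n$. Consider the
mean of these cluster means as an estimator of the population mean $\bar{Y}$:
$$\bar{y}_{cl} = \frac{1}{n} \sum_{i=1}^{n} \bar{y}_i$$
Bias
$$E(\bar{y}_{cl}) = \frac{1}{n} \sum_{i=1}^{n} E(\bar{y}_i) = \frac{1}{n} \sum_{i=1}^{n} \bar{Y} \quad \text{(since SRSWOR is used)} \quad = \bar{Y}$$
Here $E(\bar{y}_i) = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_i$ under SRSWOR, and with equal cluster sizes this equals
$\bar{Y} = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij}$.
Thus $\bar{y}_{cl}$ is an unbiased estimator of the population mean $\bar{Y}$.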
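
The unbiasedness can be checked exactly on a small example by enumerating every possible SRSWOR sample of clusters. The following is a minimal sketch. The population values are made up for illustration, and exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population: N = 4 clusters, each with M = 3 elements.
population = [[2, 4, 6], [1, 3, 8], [5, 5, 5], [0, 7, 11]]
N, M, n = len(population), len(population[0]), 2

# Cluster means and the population mean per element.
cluster_means = [Fraction(sum(c), M) for c in population]
Y_bar = Fraction(sum(sum(c) for c in population), N * M)

def y_cl(sample_idx):
    """Mean of the cluster means for the selected cluster indices."""
    return sum(cluster_means[i] for i in sample_idx) / len(sample_idx)

# Under SRSWOR every subset of n clusters is equally likely,
# so the expectation is the plain average over all subsets.
estimates = [y_cl(s) for s in combinations(range(N), n)]
expected_value = sum(estimates) / len(estimates)

print("E(y_cl) =", expected_value, " Y_bar =", Y_bar)
```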

Variance:

The variance of $\bar{y}_{cl}$ can be derived on the same lines as the variance of the sample mean in SRSWOR.
The only difference is that in SRSWOR the sampling units are $y_1, y_2, \ldots, y_n$, whereas in the case of $\bar{y}_{cl}$ the
sampling units are $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_n$.
[Note that in the case of SRSWOR, $Var(\bar{y}) = \frac{N-n}{Nn} S^2$ and $\widehat{Var}(\bar{y}) = \frac{N-n}{Nn} s^2$.]
Hence
$$Var(\bar{y}_{cl}) = E(\bar{y}_{cl} - \bar{Y})^2 = \frac{N-n}{Nn} S_b^2,$$
where
$$S_b^2 = \frac{1}{N-1} \sum_{i=1}^{N} (\bar{y}_i - \bar{Y})^2$$
is the mean sum of squares between the cluster means in the population.
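
As a numerical check of this variance formula, the sketch below compares it with the exact sampling variance obtained by enumerating all SRSWOR samples. The population values are invented for illustration, and exact fractions are used so that the two quantities can be compared for equality.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population: N = 4 clusters, each with M = 3 elements.
population = [[2, 4, 6], [1, 3, 8], [5, 5, 5], [0, 7, 11]]
N, M, n = len(population), len(population[0]), 2

cluster_means = [Fraction(sum(c), M) for c in population]
Y_bar = sum(cluster_means) / N  # equals the population mean when sizes are equal

# Exact sampling variance: average squared deviation over all equally likely samples.
estimates = [sum(cluster_means[i] for i in s) / n
             for s in combinations(range(N), n)]
var_enumerated = sum((e - Y_bar) ** 2 for e in estimates) / len(estimates)

# Formula: Var(y_cl) = (N - n) / (N n) * S_b^2.
S_b2 = sum((m - Y_bar) ** 2 for m in cluster_means) / (N - 1)
var_formula = Fraction(N - n, N * n) * S_b2

print("enumerated:", var_enumerated, " formula:", var_formula)
```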
Case of unequal clusters:
In practice, clusters of equal size arise only by design. In real applications, it is hard to get
clusters of equal size. For example, villages with equal areas or districts with the same
number of persons are difficult to find, and the number of members may differ from household to
household in a given area.

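Let there be N clusters and let $M_i$ be the size of the $i$th cluster. Let

$$M_0 = \sum_{i=1}^{N} M_i$$

$$\bar{M} = \frac{1}{N} \sum_{i=1}^{N} M_i$$

$$\bar{y}_i = \frac{1}{M_i} \sum_{j=1}^{M_i} y_{ij} \quad \text{(mean of the } i\text{th cluster)}$$

$$\bar{Y} = \frac{1}{M_0} \sum_{i=1}^{N} \sum_{j=1}^{M_i} y_{ij} = \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_i = \frac{1}{N} \sum_{i=1}^{N} \frac{M_i}{\bar{M}} \bar{y}_i$$

Suppose that n clusters are selected by SRSWOR and all the elements in these selected clusters are surveyed.
Assume that the $M_i$ ($i = 1, 2, \ldots, N$) are known.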
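
The three expressions for the population mean are algebraically identical. A quick sketch with made-up cluster data (the values and variable names are assumptions for illustration) confirms this using exact fractions.

```python
from fractions import Fraction

# Hypothetical population with unequal cluster sizes M_i = 2, 3, 1, 4.
population = [[2, 4], [1, 3, 8], [5], [0, 7, 11, 6]]
N = len(population)
sizes = [len(c) for c in population]      # M_i
M0 = sum(sizes)                           # total number of elements
M_bar = Fraction(M0, N)                   # average cluster size
cluster_means = [Fraction(sum(c), len(c)) for c in population]

# Form 1: grand total divided by the number of elements.
Y_direct = Fraction(sum(sum(c) for c in population), M0)
# Form 2: cluster means weighted by M_i / M_0.
Y_weighted = sum(Fraction(Mi, M0) * yi for Mi, yi in zip(sizes, cluster_means))
# Form 3: average over clusters of (M_i / M_bar) * cluster mean.
Y_scaled = sum(Mi / M_bar * yi for Mi, yi in zip(sizes, cluster_means)) / N

print(Y_direct, Y_weighted, Y_scaled)
```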
[Figure: the population is divided into N clusters of sizes M_1, M_2, ..., M_N units; the sample consists of n of these clusters.]

Based on this scheme, several estimators of the population mean can be constructed. We consider two
such estimators.

1. Mean of cluster means:


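Consider the simple arithmetic mean of the cluster means:
$$\bar{\bar{y}}_c = \frac{1}{n} \sum_{i=1}^{n} \bar{y}_i$$
Its expectation is
$$E(\bar{\bar{y}}_c) = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_i \ne \bar{Y} \quad \left(\text{where } \bar{Y} = \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_i\right).$$
The bias of $\bar{\bar{y}}_c$ is
$$Bias(\bar{\bar{y}}_c) = E(\bar{\bar{y}}_c) - \bar{Y}$$
$$= \frac{1}{N} \sum_{i=1}^{N} \bar{y}_i - \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_i$$
$$= -\frac{1}{M_0} \left[ \sum_{i=1}^{N} M_i \bar{y}_i - \frac{M_0}{N} \sum_{i=1}^{N} \bar{y}_i \right]$$
$$= -\frac{1}{M_0} \left[ \sum_{i=1}^{N} M_i \bar{y}_i - \frac{\left(\sum_{i=1}^{N} M_i\right)\left(\sum_{i=1}^{N} \bar{y}_i\right)}{N} \right]$$
$$= -\frac{1}{M_0} \left[ \sum_{i=1}^{N} M_i \bar{y}_i - \bar{M} \sum_{i=1}^{N} \bar{y}_i \right]$$
$$= -\frac{1}{M_0} \left[ \sum_{i=1}^{N} M_i \bar{y}_i - \bar{Y} \sum_{i=1}^{N} M_i - \bar{M} \sum_{i=1}^{N} \bar{y}_i + \bar{Y} \sum_{i=1}^{N} M_i \right]$$
$$= -\frac{1}{M_0} \left[ \sum_{i=1}^{N} M_i \bar{y}_i - \bar{Y} \sum_{i=1}^{N} M_i - \bar{M} \sum_{i=1}^{N} \bar{y}_i + \bar{Y} N \bar{M} \right]$$
$$= -\frac{1}{M_0} \sum_{i=1}^{N} \left[ M_i \bar{y}_i - \bar{Y} M_i - \bar{M} \bar{y}_i + \bar{Y} \bar{M} \right]$$
$$= -\frac{1}{M_0} \sum_{i=1}^{N} \left[ M_i (\bar{y}_i - \bar{Y}) - \bar{M} (\bar{y}_i - \bar{Y}) \right]$$
$$= -\frac{1}{M_0} \sum_{i=1}^{N} (M_i - \bar{M})(\bar{y}_i - \bar{Y})$$
$$= -\left( \frac{N-1}{M_0} \right) S_{m\bar{y}},$$
where
$$S_{m\bar{y}} = \frac{1}{N-1} \sum_{i=1}^{N} (M_i - \bar{M})(\bar{y}_i - \bar{Y}).$$
$Bias(\bar{\bar{y}}_c) = 0$ if $M_i$ and $\bar{y}_i$ are uncorrelated.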

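$$MSE(\bar{\bar{y}}_c) = Var(\bar{\bar{y}}_c) + \left[ Bias(\bar{\bar{y}}_c) \right]^2 = \frac{N-n}{Nn} S_b^2 + \left( \frac{N-1}{M_0} \right)^2 S_{m\bar{y}}^2$$
where
$$S_b^2 = \frac{1}{N-1} \sum_{i=1}^{N} (\bar{y}_i - \bar{y}_N)^2, \qquad \bar{y}_N = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_i.$$
In the variance term the deviations are taken about $\bar{y}_N = E(\bar{\bar{y}}_c)$, the unweighted mean of the cluster
means, rather than about $\bar{Y}$, because the two differ whenever the bias is nonzero.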
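
The bias expression for the mean of cluster means can be verified on a small example. This is a sketch with invented data in which larger clusters tend to have larger means, so the estimator is biased. The exact expectation is obtained by enumerating all SRSWOR samples and compared with $-\frac{N-1}{M_0} S_{m\bar{y}}$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population with unequal cluster sizes M_i = 2, 3, 1, 4.
population = [[2, 4], [1, 3, 8], [5], [0, 7, 11, 6]]
N, n = len(population), 2
sizes = [len(c) for c in population]
M0 = sum(sizes)
M_bar = Fraction(M0, N)
cluster_means = [Fraction(sum(c), len(c)) for c in population]
Y_bar = Fraction(sum(sum(c) for c in population), M0)

# Exact expectation of the unweighted mean of cluster means under SRSWOR.
estimates = [sum(cluster_means[i] for i in s) / n
             for s in combinations(range(N), n)]
bias_enumerated = sum(estimates) / len(estimates) - Y_bar

# Formula: Bias = -((N - 1) / M_0) * S_my.
S_my = sum((Mi - M_bar) * (yi - Y_bar)
           for Mi, yi in zip(sizes, cluster_means)) / (N - 1)
bias_formula = -Fraction(N - 1, M0) * S_my

print("enumerated bias:", bias_enumerated, " formula:", bias_formula)
```

In this example the estimator underestimates the population mean, because the largest cluster also has the largest mean and receives no extra weight.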

2. Weighted mean of cluster means


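Consider the arithmetic mean based on cluster totals:
$$\bar{y}_c^* = \frac{1}{n \bar{M}} \sum_{i=1}^{n} M_i \bar{y}_i$$
Its expectation is
$$E(\bar{y}_c^*) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\bar{M}} E(M_i \bar{y}_i) = \frac{n}{n \bar{M}} \cdot \frac{1}{N} \sum_{i=1}^{N} M_i \bar{y}_i = \frac{1}{M_0} \sum_{i=1}^{N} \sum_{j=1}^{M_i} y_{ij} = \bar{Y}$$
Thus $\bar{y}_c^*$ is an unbiased estimator of $\bar{Y}$. The variance of $\bar{y}_c^*$ is given by
$$Var(\bar{y}_c^*) = Var\left( \frac{1}{n} \sum_{i=1}^{n} \frac{M_i}{\bar{M}} \bar{y}_i \right) = \frac{N-n}{Nn} S_b^{*2},$$
where
$$S_b^{*2} = \frac{1}{N-1} \sum_{i=1}^{N} \left( \frac{M_i}{\bar{M}} \bar{y}_i - \bar{Y} \right)^2.$$
Its sample analogue $s_b^{*2} = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{M_i}{\bar{M}} \bar{y}_i - \bar{y}_c^* \right)^2$ is unbiased for it:
$$E(s_b^{*2}) = S_b^{*2}$$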
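
The properties of the weighted estimator can be checked by complete enumeration. This sketch uses invented data and exact fractions. It confirms that the estimator is unbiased, that its variance equals $\frac{N-n}{Nn} S_b^{*2}$, and that the sample mean square, assumed here to use divisor $n-1$ and deviations about the estimate, averages to $S_b^{*2}$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population with unequal cluster sizes M_i = 2, 3, 1, 4.
population = [[2, 4], [1, 3, 8], [5], [0, 7, 11, 6]]
N, n = len(population), 2
sizes = [len(c) for c in population]
M0 = sum(sizes)
M_bar = Fraction(M0, N)
cluster_means = [Fraction(sum(c), len(c)) for c in population]
Y_bar = Fraction(sum(sum(c) for c in population), M0)

# Scaled cluster values (M_i / M_bar) * cluster mean; their population mean is Y_bar.
scaled = [Mi * yi / M_bar for Mi, yi in zip(sizes, cluster_means)]

def y_star(s):
    """Weighted mean of cluster means for the selected cluster indices."""
    return sum(scaled[i] for i in s) / len(s)

def s_b_star2(s):
    """Sample mean square of the scaled values (divisor n - 1)."""
    est = y_star(s)
    return sum((scaled[i] - est) ** 2 for i in s) / (len(s) - 1)

samples = list(combinations(range(N), n))   # all equally likely SRSWOR samples
estimates = [y_star(s) for s in samples]

mean_estimate = sum(estimates) / len(samples)
var_enumerated = sum((e - Y_bar) ** 2 for e in estimates) / len(samples)

S_b_star2 = sum((z - Y_bar) ** 2 for z in scaled) / (N - 1)
var_formula = Fraction(N - n, N * n) * S_b_star2
mean_s2 = sum(s_b_star2(s) for s in samples) / len(samples)

print("E(y*_c) =", mean_estimate, " Y_bar =", Y_bar)
print("Var enumerated =", var_enumerated, " formula =", var_formula)
print("E(s_b*^2) =", mean_s2, " S_b*^2 =", S_b_star2)
```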
