Chapter 3 Sampling & Estimation Theory
Prepared By: Nahom M.
April 5, 2011
Lecture Outline
• Introduction
• Advantages and disadvantages of sample survey
• Types of sampling techniques
• Sampling and non-sampling error
• Examples and exercises
• Estimation theory
3.1 Introduction
A sample is a representative part of a population.
It is a subset of the population that is selected
for further statistical investigation or analysis.
Sampling is the process of selecting samples
from the population.
A proper procedure should be adopted when
designing the sampling plan so that the units
selected are representative of the population.
Advantages of sample survey
• It saves time, money, effort, and labor.
• It is useful when the population is infinitely large. If
taking a census is practically impossible, sampling
is the only option.
• It is useful when the available data is limited.
• It minimizes destruction when examining a unit damages or destroys it.
• It can be supervised more accurately, and the data
can be collected more carefully.
Limitations or drawbacks of sample survey
• It gives unreliable data if it is not designed and
executed carefully, which leads to sampling error.
• A sample survey is not useful when information is
needed about each and every unit of the
population.
A good sample possesses two characteristics:
i. Representativeness of the population, and
ii. Adequacy of size (sample size)
Sampling Error
Sampling error is the difference between a sample statistic
and the population parameter it estimates. It arises
because a sample represents only a portion of the
population. In such cases, the information contained in
the sample may lead to incorrect inferences about the
parent population.
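As a rough illustration of sampling error, the sketch below draws one simple random sample from a made-up population (the integers 1 to 1000, an assumption chosen only for illustration) and compares the sample mean with the population mean. The function name `sampling_error` is hypothetical.

```python
import random
from statistics import mean

def sampling_error(population, n, seed=0):
    """Draw one simple random sample of size n (without replacement)
    and return (sample mean, population mean, sampling error)."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    sample = rng.sample(population, n)
    x_bar = mean(sample)
    mu = mean(population)
    return x_bar, mu, x_bar - mu

# Hypothetical population: the integers 1..1000 (population mean 500.5).
population = list(range(1, 1001))

x_bar, mu, error = sampling_error(population, n=50)
print(f"sample mean = {x_bar}, population mean = {mu}, sampling error = {error}")

# A census (n = population size) has no sampling error.
census_error = sampling_error(population, n=len(population))[2]
print(f"census sampling error = {census_error}")
```

Changing `seed` gives a different sample and therefore a different sampling error, which is the variation that interval estimation tries to quantify.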
2. Interval Estimation (Confidence Interval)
• Point estimation produces a single value as an estimate of a
population parameter. The estimate may or may not be
close to the actual parameter value; thus, the estimate might
be incorrect.
• An interval estimate is a range of values believed to
include the unknown population parameter.
• Associated with the interval is a measure of
confidence that the interval does indeed contain the
parameter of interest.
• Because of this, interval estimates are more
desirable than point estimates.
• A confidence interval or interval estimate has two
components:
– A range or interval of values
– An associated level of confidence
3.4 Confidence Interval Estimation of a
Population Mean
• If the population distribution is normal, the sampling distribution
of the mean is normal.
In notation: $\bar{X} \sim N(\mu, \sigma^2/n)$
• If the sample size is sufficiently large, regardless of the shape of
the population distribution, the sampling distribution of the mean is
approximately normal (Central Limit Theorem).
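The Central Limit Theorem can be checked informally by simulation. The sketch below is only an illustration under assumed settings: it uses an exponential population with mean 1 and standard deviation 1 (a skewed distribution chosen for the example), samples of size n = 40, and 4000 repetitions. The function name `simulate_sample_means` is hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_sample_means(n, reps, seed=1):
    """Return `reps` sample means, each from n draws of an
    exponential population with mean 1 and standard deviation 1."""
    rng = random.Random(seed)
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

n, reps = 40, 4000
means = simulate_sample_means(n, reps)

center = mean(means)              # should be close to mu = 1
spread = stdev(means)             # should be close to sigma / sqrt(n)
theory_spread = 1.0 / sqrt(n)

# Share of sample means within 1.96 standard errors of mu (roughly 95%).
within = sum(abs(m - 1.0) <= 1.96 * theory_spread for m in means) / reps

print(f"mean of sample means  = {center:.4f}")
print(f"sd of sample means    = {spread:.4f} (theory {theory_spread:.4f})")
print(f"share within 1.96 SE  = {within:.3f}")
```

Even though the population is skewed, the sample means cluster around the population mean with a spread close to sigma / sqrt(n).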
The normal distribution probability density function:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ for $-\infty < x < \infty$,
where $e = 2.718281\ldots$ and $\pi = 3.14159265\ldots$
[Figure: Normal Distribution: $\mu = 0$, $\sigma = 1$; f(x) plotted against x.]
[Figure: sampling distribution of $\bar{x}$ with 2.5% in each tail outside $\mu \pm 1.96\,\sigma/\sqrt{n}$, and intervals $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ drawn around individual sample means; intervals that miss $\mu$ are marked *.]
• 95% of such intervals around the sample mean can be expected to include the
actual value of the population mean. (When the sample mean falls within the
95% interval around the population mean.)
• 5% of such intervals around the sample mean can be expected not to include the
actual value of the population mean. (When the sample mean falls outside the
95% interval around the population mean.)
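A $(1-\alpha)100\%$ Confidence Interval for $\mu$
We define $z_{\alpha/2}$ as the z value that cuts off a right-tail area of $\alpha/2$ under the standard
normal curve. $(1-\alpha)$ is called the confidence coefficient, $\alpha$ is called the error
probability, and $(1-\alpha)100\%$ is called the confidence level.
$P(z > z_{\alpha/2}) = \alpha/2$
$P(z < -z_{\alpha/2}) = \alpha/2$
$P(-z_{\alpha/2} < z < z_{\alpha/2}) = (1-\alpha)$
$(1-\alpha)100\%$ Confidence Interval: $\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
[Figure: Standard Normal Distribution; f(z) against z from -5 to 5, with central area $(1-\alpha)$ and an area of $\alpha/2$ in each tail.]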
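The critical value that cuts off a right-tail area of alpha/2 under the standard normal curve can be computed rather than read from a table. The following is a minimal sketch using `NormalDist` from Python's standard `statistics` module; the helper name `z_critical` and the list of confidence levels are assumptions made for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return the z value that leaves a right-tail area of alpha/2
    under the standard normal curve, where alpha = 1 - confidence."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

# Illustrative confidence levels (chosen for this sketch).
for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    alpha = 1.0 - level
    print(f"(1 - alpha) = {level:.2f}   alpha/2 = {alpha / 2:.3f}   "
          f"z = {z_critical(level):.3f}")
```

For a 95% confidence level this gives the familiar value of about 1.96.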
Critical Values of z and Levels of Confidence
[Table columns: $(1-\alpha)$, $\alpha/2$, $z_{\alpha/2}$. Figure: Standard Normal Distribution; f(z) against z from -5 to 5, with central area $(1-\alpha)$.]
Standard normal table (area under the curve between 0 and z):
z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
---  ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
0.7  0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8  0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9  0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0  0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1  0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2  0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3  0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4  0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5  0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6  0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7  0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8  0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9  0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0  0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
[Figure: standard normal density, f(z) against z from -5 to 5, with the point z = 1.56 marked.]
A $(1-\alpha)100\%$ confidence interval for $\mu$ when $\sigma$ is not known:
$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$
where $t_{\alpha/2}$ is the value of the t distribution with n-1 degrees of
freedom that cuts off a tail area of $\alpha/2$ to its right.
The t Distribution table
df   t0.100 t0.050 t0.025 t0.010 t0.005
---  ------ ------ ------ ------ ------
1    3.078  6.314  12.706 31.821 63.657
2    1.886  2.920  4.303  6.965  9.925
3    1.638  2.353  3.182  4.541  5.841
4    1.533  2.132  2.776  3.747  4.604
5    1.476  2.015  2.571  3.365  4.032
6    1.440  1.943  2.447  3.143  3.707
7    1.415  1.895  2.365  2.998  3.499
8    1.397  1.860  2.306  2.896  3.355
9    1.383  1.833  2.262  2.821  3.250
10   1.372  1.812  2.228  2.764  3.169
11   1.363  1.796  2.201  2.718  3.106
12   1.356  1.782  2.179  2.681  3.055
13   1.350  1.771  2.160  2.650  3.012
14   1.345  1.761  2.145  2.624  2.977
15   1.341  1.753  2.131  2.602  2.947
16   1.337  1.746  2.120  2.583  2.921
17   1.333  1.740  2.110  2.567  2.898
18   1.330  1.734  2.101  2.552  2.878
19   1.328  1.729  2.093  2.539  2.861
20   1.325  1.725  2.086  2.528  2.845
21   1.323  1.721  2.080  2.518  2.831
22   1.321  1.717  2.074  2.508  2.819
23   1.319  1.714  2.069  2.500  2.807
24   1.318  1.711  2.064  2.492  2.797
25   1.316  1.708  2.060  2.485  2.787
26   1.315  1.706  2.056  2.479  2.779
27   1.314  1.703  2.052  2.473  2.771
28   1.313  1.701  2.048  2.467  2.763
29   1.311  1.699  2.045  2.462  2.756
30   1.310  1.697  2.042  2.457  2.750
40   1.303  1.684  2.021  2.423  2.704
60   1.296  1.671  2.000  2.390  2.660
120  1.289  1.658  1.980  2.358  2.617
∞    1.282  1.645  1.960  2.326  2.576
[Figure: t Distribution: df = 10; f(t) against t, with Area = 0.10 in each tail beyond ±1.372 and Area = 0.025 in each tail beyond ±2.228.]
Whenever $\sigma$ is not known (and the population is assumed normal), the correct
distribution to use is the t distribution with n-1 degrees of freedom.
Note, however, that for large degrees of freedom, the t distribution is
approximated well by the Z distribution.
Example 1
A stock market analyst wants to estimate the average return on a certain
stock. A random sample of 15 days yields an average (annualized) return
of $\bar{x} = 10.37\%$ and a standard deviation of s = 3.5%. Assuming a normal
population of returns, give a 95% confidence interval for the average return
on this stock.
df   t0.100 t0.050 t0.025 t0.010 t0.005
---  ------ ------ ------ ------ ------
1    3.078  6.314  12.706 31.821 63.657
.    .      .      .      .      .
13   1.350  1.771  2.160  2.650  3.012
14   1.345  1.761  2.145  2.624  2.977
15   1.341  1.753  2.131  2.602  2.947
.    .      .      .      .      .
The critical value of t for df = (n - 1) = (15 - 1) = 14 and a right-tail
area of 0.025 is: $t_{0.025} = 2.145$
The corresponding confidence interval or interval estimate is:
$\bar{x} \pm t_{0.025}\,\frac{s}{\sqrt{n}} = 10.37 \pm 2.145 \times \frac{3.5}{\sqrt{15}} = 10.37 \pm 1.94 = [8.43, 12.31]$
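The arithmetic of Example 1 can be reproduced with a few lines of Python. This is a minimal sketch: the function name `t_interval` is hypothetical, and the critical value 2.145 is taken from the t table (df = 14, right-tail area 0.025) rather than computed, because Python's standard library has no t distribution.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Confidence interval x_bar ± t_crit * s / sqrt(n) for the mean
    when sigma is unknown; t_crit is read from a t table with n - 1 df."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 1: n = 15 days, mean return 10.37%, s = 3.5%, t_{0.025, 14} = 2.145
low, high = t_interval(x_bar=10.37, s=3.5, n=15, t_crit=2.145)
print(f"95% CI for the mean return: [{low:.2f}, {high:.2f}]")
```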
Example 2
A random sample of 100 customer accounts at a large firm is selected
for the purpose of estimating the mean number of transactions per
year for each customer. The sample mean is 43 and the sample
standard deviation is 12. Determine a 90% confidence interval
estimate for $\mu$.
Solution: In this problem, as $\sigma$ is not known, the appropriate
distribution is the t distribution. However, since n > 30, we can
approximate it by the standard normal distribution.
n = 100, $\bar{x}$ = 43, S = 12, C.L. = 0.90, and $z_{\alpha/2}$ = 1.645
$S_{\bar{x}} = S/\sqrt{n} = 12/\sqrt{100} = 1.2$
The interval estimate is $\bar{x} \pm z_{\alpha/2} S_{\bar{x}}$ = 43 ± 1.645(1.2) = 43 ± 1.97
Thus, the interval is from 41.03 to 44.97.
We are 90% confident that the mean number of transactions per year
per customer falls between 41.03 and 44.97.
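The large-sample calculation in Example 2 can be sketched as follows. The helper name `z_interval` is an assumption; the critical value is computed with `NormalDist` from the standard `statistics` module instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, s, n, confidence):
    """Large-sample confidence interval x_bar ± z * s / sqrt(n),
    using the sample standard deviation s in place of sigma."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # right-tail area alpha/2
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 2: n = 100 accounts, sample mean 43, sample sd 12, 90% confidence
low, high = z_interval(x_bar=43, s=12, n=100, confidence=0.90)
print(f"90% CI for mean transactions: {low:.2f} to {high:.2f}")
```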
Example 3
A quality control inspector of a company selects frequent random
samples of size n = 6 from the output of an automatic machine to
check on the average diameter of parts being made. Diameters are
normally distributed. The sample has a mean diameter of 2.0016
inches and a standard deviation of 0.0012 inches. Construct the 99%
confidence interval for $\mu$.
Solution: In this problem $\sigma$ is not known and n < 30. Therefore, we use
the t distribution.
$\bar{x}$ = 2.0016, C.L. = 0.99, $\alpha$ = 1 - 0.99 = 0.01, df = n - 1 = 6 - 1 = 5,
$S_{\bar{x}} = 0.0012/\sqrt{6} = 0.0004899$, $t_{\alpha/2,v} = t_{0.005,5} = 4.032$
The interval is given by $\bar{x} \pm t_{\alpha/2,v} S_{\bar{x}}$ = 2.0016 ± (4.032 × 0.0004899)
= 2.0016 ± 0.0019753. This means that we are 99% confident that $\mu$
lies between 1.9996247 and 2.0035753 inches.
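To see why the t distribution matters for a sample this small, the sketch below compares the margin of error from the t critical value (4.032 for df = 5) with the margin that the normal critical value 2.576 would give for the same data. Both critical values are table values; the function name `margin_of_error` is hypothetical.

```python
from math import sqrt

def margin_of_error(critical_value, s, n):
    """Half-width of the interval: critical value times the standard error."""
    return critical_value * s / sqrt(n)

s, n = 0.0012, 6                              # Example 3 data
t_margin = margin_of_error(4.032, s, n)       # t_{0.005}, df = 5
z_margin = margin_of_error(2.576, s, n)       # z_{0.005} (normal approximation)
ratio = t_margin / z_margin

print(f"t-based margin: {t_margin:.7f}")
print(f"z-based margin: {z_margin:.7f}")
print(f"the t interval is about {ratio:.2f} times as wide")
```

Using z here would understate the uncertainty, because with only 6 observations the sample standard deviation is itself a noisy estimate of the population standard deviation.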
Determination of Sample Size
• Collecting valid information through sampling requires
careful planning, including determination of an appropriate
sample size.
• How large should the sample size be? The answer depends
on the following three factors.
1. How precise (narrow) do we want a confidence interval estimate
to be?
2. How confident do we want to be that the interval estimate is
correct?
3. What is the standard deviation of the population in question?
• Generally the higher the desired precision or level of
confidence, the larger will be the sample size.
• And also, the larger the population variability is, the larger
will be the sample size.
Sample Size for Interval Estimation of $\mu$
• Consider
$z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$
Solving this for n, we get the following:
$n = (z^2 \sigma^2)/(\bar{x} - \mu)^2$
This is the formula for computing sample size for interval
estimation of $\mu$.
• There are three quantities that determine the value of n
– The value of z, reflecting the confidence level
– The absolute value of $(\bar{x} - \mu)$, which represents the maximum error
in estimation
– The variance (or standard deviation) of the population in question.
When $\sigma$ is not known, the standard deviation S from a pilot
sample is used in its place.
Example:
For the purpose of illustration, assume the desired confidence
level is 95%. If $\sigma$ = 15 and we want an estimate of $\mu$ with a
maximum error in estimation of 5, the required sample size
would be computed as follows.
Solution: $n = (z^2 \sigma^2)/(\bar{x} - \mu)^2$
C.L. = 0.95, $\alpha$ = 0.05, z = 1.96
$\sigma$ = 15
$|\bar{x} - \mu|$ = 5
$n = [(1.96)^2 (15)^2]/5^2$
= 34.5744, or 35
In sample size determination, no matter what the value of the
decimal part is, we always round upward.
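The sample size formula and the round-up rule can be sketched in a few lines. The function name `sample_size` and its parameter names are assumptions made for illustration; `max_error` stands for the maximum error in estimation, $|\bar{x} - \mu|$.

```python
from math import ceil

def sample_size(z, sigma, max_error):
    """Smallest n satisfying n >= z^2 * sigma^2 / max_error^2.
    Any fractional result is rounded up, never down."""
    return ceil((z * sigma / max_error) ** 2)

# Example: 95% confidence (z = 1.96), sigma = 15, maximum error = 5
raw = (1.96 * 15 / 5) ** 2
n = sample_size(z=1.96, sigma=15, max_error=5)
print(f"computed value = {raw:.4f}, required sample size = {n}")
```

Halving the maximum error to 2.5 would roughly quadruple the required sample size, since n grows with the inverse square of the allowed error.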