Sampling and Sampling Distributions
Learning Objectives
Determine when to use sampling.
Determine the pros and cons of various sampling techniques.
Be aware of the different types of errors that can occur in a study.
Understand the impact of the Central Limit Theorem on statistical analysis.
Use the sampling distributions of the sample mean and sample proportion.
Reasons for Sampling
Sampling: a means of gathering information about a population without conducting a census
Information is gathered from the sample, and inferences are made about the population
Sampling has advantages over a census:
Sampling can save money.
Sampling can save time.
Random Versus Nonrandom Sampling
Nonrandom sampling: not every unit of the population has the same probability of being included in the sample.
Random sampling: every unit of the population has the same probability of being included in the sample.
Random Sampling Techniques
Simple Random Sample: the basis for other random sampling techniques
Each unit is numbered from 1 to N (the size of the population).
A random number generator can be used to select the n items that form the sample.
Random Sampling Techniques
Stratified Random Sample
The population is divided into strata, each made up of units with similar characteristics (e.g., men and women, or young, middle-aged, and old people).
Efficient when differences exist between strata
Proportionate: the percentage of the sample drawn from each stratum equals that stratum's percentage of the whole population.
Systematic Random Sample
Define k = N/n. Choose one unit at random from the first k units, and then select every kth unit from there.
Cluster (or Area) Sampling
The population is divided into predetermined clusters (students in classes, apples on trees, etc.).
A random sample of clusters is chosen, and all or some of the units within the chosen clusters are used as the sample.
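The stratified and cluster designs above can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the function names, the rounding rule for each stratum's allocation, and the one-stage cluster design (keep every unit of each chosen cluster) are choices made for the example, and the populations are hypothetical.

```python
import random


def proportionate_stratified_sample(strata, n, rng):
    """Proportionate stratified random sample (illustrative sketch).

    strata: dict mapping a stratum label to the list of units in that stratum.
    Each stratum contributes round(n * N_h / N) units, where N_h is the stratum
    size; because of rounding, the total can differ slightly from n.
    """
    N = sum(len(units) for units in strata.values())
    sample = {}
    for label, units in strata.items():
        n_h = round(n * len(units) / N)
        # simple random sample (without replacement) inside the stratum
        sample[label] = rng.sample(units, n_h)
    return sample


def cluster_sample(clusters, m, rng):
    """One-stage cluster sample: choose m whole clusters at random and keep all their units."""
    chosen = rng.sample(sorted(clusters), m)
    return {label: clusters[label] for label in chosen}


if __name__ == "__main__":
    rng = random.Random(42)  # arbitrary seed for reproducibility
    # Hypothetical population: 60 women and 40 men
    strata = {
        "women": [f"W{i}" for i in range(60)],
        "men": [f"M{i}" for i in range(40)],
    }
    print(proportionate_stratified_sample(strata, 10, rng))  # 6 women, 4 men
    # Hypothetical clusters: students grouped by class
    classes = {
        "class A": ["s1", "s2", "s3"],
        "class B": ["s4", "s5"],
        "class C": ["s6", "s7", "s8", "s9"],
    }
    print(cluster_sample(classes, 2, rng))
```

With a 60/40 split and n = 10, the proportionate rule allocates 6 units to the first stratum and 4 to the second, mirroring each stratum's share of the population.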
Simple Random Sample: Population Members
01 Alaska Airlines 11 DuPont 21 Lucent
02 Alcoa 12 Exxon Mobil 22 Mattel
03 Ashland 13 General Dynamics 23 Mead
04 Bank of America 14 General Electric 24 Microsoft
05 BellSouth 15 General Mills 25 Occidental Petroleum
06 Chevron 16 Halliburton 26 JCPenney
07 Citigroup 17 IBM 27 Procter & Gamble
08 Clorox 18 Kellogg 28 Ryder
09 Delta Air Lines 19 Kmart 29 Sears
10 Disney 20 Lowe’s 30 Time Warner
Population size of N = 30
Desired sample size of n = 6
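As a minimal sketch of using a random number generator to draw the n = 6 companies, the Python below numbers the 30 population members and calls the standard library's random.sample, which draws without replacement so that every set of 6 members is equally likely. The helper name simple_random_sample and the seed value are arbitrary choices for illustration; a different seed gives a different sample.

```python
import random

# Population from the slide, numbered 01..30 (N = 30)
companies = [
    "Alaska Airlines", "Alcoa", "Ashland", "Bank of America", "BellSouth",
    "Chevron", "Citigroup", "Clorox", "Delta Air Lines", "Disney",
    "DuPont", "Exxon Mobil", "General Dynamics", "General Electric", "General Mills",
    "Halliburton", "IBM", "Kellogg", "Kmart", "Lowe's",
    "Lucent", "Mattel", "Mead", "Microsoft", "Occidental Petroleum",
    "JCPenney", "Procter & Gamble", "Ryder", "Sears", "Time Warner",
]
population = {i: name for i, name in enumerate(companies, start=1)}


def simple_random_sample(units, n, seed=None):
    """Return n distinct unit numbers chosen at random (hypothetical helper)."""
    rng = random.Random(seed)
    # random.sample draws without replacement; every subset of size n is equally likely
    return sorted(rng.sample(sorted(units), n))


numbers = simple_random_sample(population, 6, seed=2024)  # arbitrary seed
for i in numbers:
    print(f"{i:02d} {population[i]}")
```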
Simple Random Sample: Sample Members
[Figure: the same list of 30 companies, marking the n = 6 companies selected for the sample]
Systematic Sampling: Example
Purchase orders for the previous fiscal year are serialized 1 to 10,000 (N = 10,000).
A sample of fifty (n = 50) purchase orders is needed for an audit.
k = 10,000/50 = 200
Systematic Sampling: Example (continued)
The first sample element is randomly selected from the first 200 purchase orders. Assume the 45th purchase order is selected.
Sample elements: 45, 245, 445, 645, . . ., 9845
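The purchase-order example can be reproduced with a short Python sketch. The function name systematic_sample is hypothetical, and the sketch assumes N is a multiple of n so that k = N/n is a whole number; otherwise the final slice only caps the sample at n units and the selection probabilities are no longer exactly equal.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Systematic random sample of unit numbers 1..N (illustrative sketch)."""
    k = N // n  # sampling interval k = N/n (assumes N is a multiple of n)
    if start is None:
        # random start among the first k units
        start = random.Random(seed).randint(1, k)
    # take the start unit, then every kth unit after it
    return list(range(start, N + 1, k))[:n]


orders = systematic_sample(10_000, 50, start=45)  # start fixed at 45 as on the slide
print(orders[:4], orders[-1])
```

With start=45 the list begins 45, 245, 445, 645 and ends at 9845, giving exactly 50 purchase orders.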
Convenience (Nonrandom) Sampling
Nonrandom sampling: sampling techniques that select elements from the population by any mechanism that does not involve a random selection process
These techniques are not desirable for making statistical inferences.
Examples: treating the members of this class as representative of all students at our university; asking the first five people who walk into a store about their shopping preferences
Non-sampling Errors
Non-sampling errors: all errors other than the variation expected from random sampling
Missing data, data entry errors, and analysis errors
Leading questions, poorly conceived concepts, unclear definitions, and defective questionnaires
Response errors, which occur when respondents do not know the answer, will not say, or overstate their answers
Distribution of a Small Finite Population
N = 8
54, 55, 59, 63, 64, 68, 69, 70
Sample Space for n = 2 with Replacement
Sample Mean Sample Mean Sample Mean Sample Mean
1 (54,54) 54.0 17 (59,54) 56.5 33 (64,54) 59.0 49 (69,54) 61.5
2 (54,55) 54.5 18 (59,55) 57.0 34 (64,55) 59.5 50 (69,55) 62.0
3 (54,59) 56.5 19 (59,59) 59.0 35 (64,59) 61.5 51 (69,59) 64.0
4 (54,63) 58.5 20 (59,63) 61.0 36 (64,63) 63.5 52 (69,63) 66.0
5 (54,64) 59.0 21 (59,64) 61.5 37 (64,64) 64.0 53 (69,64) 66.5
6 (54,68) 61.0 22 (59,68) 63.5 38 (64,68) 66.0 54 (69,68) 68.5
7 (54,69) 61.5 23 (59,69) 64.0 39 (64,69) 66.5 55 (69,69) 69.0
8 (54,70) 62.0 24 (59,70) 64.5 40 (64,70) 67.0 56 (69,70) 69.5
9 (55,54) 54.5 25 (63,54) 58.5 41 (68,54) 61.0 57 (70,54) 62.0
10 (55,55) 55.0 26 (63,55) 59.0 42 (68,55) 61.5 58 (70,55) 62.5
11 (55,59) 57.0 27 (63,59) 61.0 43 (68,59) 63.5 59 (70,59) 64.5
12 (55,63) 59.0 28 (63,63) 63.0 44 (68,63) 65.5 60 (70,63) 66.5
13 (55,64) 59.5 29 (63,64) 63.5 45 (68,64) 66.0 61 (70,64) 67.0
14 (55,68) 61.5 30 (63,68) 65.5 46 (68,68) 68.0 62 (70,68) 69.0
15 (55,69) 62.0 31 (63,69) 66.0 47 (68,69) 68.5 63 (70,69) 69.5
16 (55,70) 62.5 32 (63,70) 66.5 48 (68,70) 69.0 64 (70,70) 70.0
Distribution of the Sample Means
[Figure: Sampling Distribution Histogram; Frequency of the 64 sample means, horizontal axis marked from 53.75 to 71.25]
Sampling Distribution of \(\bar{x}\)
The sampling distribution of \(\bar{x}\) is the probability distribution of all possible values of the sample mean \(\bar{x}\).
Expected Value of \(\bar{x}\)
\[ E(\bar{x}) = \mu \]
where:
\(\mu\) = the population mean
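Because the N = 8 population above is tiny, its sampling distribution can be enumerated exactly rather than estimated. The Python sketch below assumes ordered samples of size 2 drawn with replacement, as in the sample-space table, and builds all 8² = 64 sample means. It checks that their mean equals \(\mu\) and that their standard deviation equals \(\sigma/\sqrt{n}\) when \(\sigma\) is the population standard deviation (divisor N).

```python
import math
from itertools import product
from statistics import fmean, pstdev

population = [54, 55, 59, 63, 64, 68, 69, 70]  # N = 8
n = 2

# All N**n = 64 ordered samples of size 2 drawn with replacement
sample_means = [fmean(s) for s in product(population, repeat=n)]

mu = fmean(population)       # population mean
sigma = pstdev(population)   # population standard deviation (divisor N)

print(len(sample_means))               # 64
print(fmean(sample_means), mu)         # 62.75 62.75
print(pstdev(sample_means), sigma / math.sqrt(n))  # both about 4.12
```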
Sampling Distribution of the Mean \(\bar{x}\)
Proper analysis and interpretation of a sample statistic requires knowledge of its distribution.
[Figure: Process of Inferential Statistics. From a population with an unknown parameter \(\mu\), select a random sample, calculate \(\bar{x}\), and use \(\bar{x}\) to estimate \(\mu\).]
If the Population is not Normal
We can apply the Central Limit Theorem:
Even if the population is not normal, sample means from the population will be approximately normal as long as the sample size is large enough.
Properties of the sampling distribution:
\[ \mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
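A quick simulation makes the central limit theorem concrete. The Python sketch below assumes an exponential population with rate 1 (mean 1, standard deviation 1, strongly right-skewed); this population, the replication count, and the seed are illustrative choices, not part of the original material. For each n it reports the mean and standard deviation of the simulated sample means next to \(\sigma/\sqrt{n}\), plus their skewness, which for this population should shrink roughly like \(2/\sqrt{n}\) as the distribution becomes more normal.

```python
import math
import random
from statistics import fmean, pstdev


def skewness(xs):
    """Moment coefficient of skewness of a list of numbers."""
    m, s = fmean(xs), pstdev(xs)
    return fmean(((x - m) / s) ** 3 for x in xs)


def simulate_means(n, reps=20_000, seed=0):
    """Means of `reps` samples of size n from an exponential(rate 1) population."""
    rng = random.Random(seed)
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


sigma = 1.0  # standard deviation of the exponential(rate 1) population
for n in (1, 5, 30):
    means = simulate_means(n)
    print(f"n={n:2d}  mean={fmean(means):.3f}  sd={pstdev(means):.3f}  "
          f"sigma/sqrt(n)={sigma / math.sqrt(n):.3f}  skew={skewness(means):.2f}")
```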
Central Limit Theorem
As the sample size n gets large enough, the sampling distribution of \(\bar{x}\) becomes almost normal regardless of the shape of the population.
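If the Population is not Normal (continued)
Sampling distribution properties:
Central tendency: \(\mu_{\bar{x}} = \mu\)
Variation: \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\)
[Figure: Population Distribution, and Sampling Distribution (becomes normal as n increases) for a smaller and a larger sample size, both centered at \(\mu_{\bar{x}}\); the larger sample size gives the narrower curve]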
How Large is Large Enough?
For most distributions, n > 30 will give a sampling distribution that is nearly normal.
For fairly symmetric distributions, n > 15 is usually enough.
For normal population distributions, the sampling distribution of the mean is always normally distributed, whatever the sample size.
Central Limit Theorem
Consider taking a sample of size n from a population.
The sampling distribution of the sample mean is the distribution of the means of repeated samples of size n from that population.
The central limit theorem states that as the sample size increases:
The shape of the distribution approaches a normal distribution (this condition is typically considered to be met when n is at least 30).
The variance of the sample mean is the population variance divided by n (\(\sigma^2/n\)), so it shrinks as n grows.
Sampling from a Normal Population
The distribution of sample means is normal for
any sample size.
If \(\bar{x}\) is the mean of a random sample of size n from a normal population with mean \(\mu\) and standard deviation \(\sigma\), the distribution of \(\bar{x}\) is a normal distribution with mean \(\mu_{\bar{x}} = \mu\) and standard deviation \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\).
Z Formula for Sample Means
\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
Tire Store Example
Suppose that the mean expenditure per customer
at a tire store is $85.00, with a standard deviation
of $9.00. If a random sample of 40 customers is
taken, what is the probability that the sample
average expenditure per customer for this sample
will be $87.00 or more?
Solution: Because the sample size is greater than 30, the central limit theorem can be used to state that the sample mean is approximately normally distributed, and the problem can proceed using normal distribution calculations.
Solution to Tire Store Example
Population parameters: \(\mu = 85\), \(\sigma = 9\)
Sample size: n = 40
\[ P(\bar{x} \ge 87) = P\left(Z \ge \frac{87 - 85}{9/\sqrt{40}}\right) = P(Z \ge 1.41) = 0.0793 \]
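The same calculation can be checked with Python's standard-library statistics.NormalDist; this is a sketch assuming nothing beyond the standard library. Rounding z to two decimals mimics the printed normal table used in these slides, while the unrounded z gives a slightly different tail area of about 0.080.

```python
import math
from statistics import NormalDist

mu, sigma, n = 85.0, 9.0, 40               # population mean, population sd, sample size
se = sigma / math.sqrt(n)                  # standard error of the mean, 9/sqrt(40)
z = (87.0 - mu) / se                       # z-score of the sample mean 87
std_normal = NormalDist()                  # standard normal distribution, mean 0 and sd 1
p_table = 1 - std_normal.cdf(round(z, 2))  # upper tail with z rounded as in a printed table
p_exact = 1 - std_normal.cdf(z)            # upper tail with the unrounded z
print(round(z, 2), round(p_table, 4))      # 1.41 0.0793
```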
Graphic Solution to Tire Store Example
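[Figure: Normal curve for \(\bar{x}\) centered at 85 with \(\sigma_{\bar{x}} = 9/\sqrt{40} = 1.42\), beside the standard normal curve centered at 0 with \(\sigma = 1\); Z = (87 - 85)/1.42 = 1.41. Each half of a curve has area .5000 and the area between the center and 87 (Z = 1.41) is .4207, so the equal upper-tail areas are .5000 - .4207 = .0793.]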
Demonstration Problem 7.1
Suppose that during any hour in a large department
store, the average number of shoppers is 448, with
a standard deviation of 21 shoppers. What is the
probability that a random sample of 49 different
shopping hours will yield a sample mean between
441 and 446 shoppers?
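Graphic Solution for Demonstration Problem 7.1
\[ Z = \frac{441 - 448}{21/\sqrt{49}} = -2.33 \qquad Z = \frac{446 - 448}{21/\sqrt{49}} = -0.67 \]
[Figure: Normal curve for \(\bar{x}\) with \(\sigma_{\bar{x}} = 3\), marking 441, 446, and 448, beside the standard normal curve with \(\sigma = 1\), marking Z = -2.33, -0.67, and 0. The area from -2.33 to 0 is .4901 and the area from -0.67 to 0 is .2486, so P(441 ≤ \(\bar{x}\) ≤ 446) = .4901 - .2486 = .2415.]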
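A Python sketch of the same computation, again with the standard-library statistics.NormalDist: it builds the sampling distribution of \(\bar{x}\) directly as a normal distribution with mean 448 and standard deviation \(21/\sqrt{49} = 3\). The table-style result rounds z to -2.33 and -0.67; with unrounded z values the probability comes out slightly larger, about 0.243.

```python
import math
from statistics import NormalDist

mu, sigma, n = 448.0, 21.0, 49
se = sigma / math.sqrt(n)                          # 21/7 = 3
xbar_dist = NormalDist(mu, se)                     # sampling distribution of x-bar: mean 448, sd 3
p_exact = xbar_dist.cdf(446) - xbar_dist.cdf(441)  # P(441 <= x-bar <= 446), unrounded

# Table-style version: standardize, round z to two decimals, take the area between
std_normal = NormalDist()
z_low = round((441 - mu) / se, 2)    # -2.33
z_high = round((446 - mu) / se, 2)   # -0.67
p_table = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(round(p_table, 4), round(p_exact, 4))
```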
Sampling Distribution of \(\hat{p}\)
Sample Proportion
\[ \hat{p} = \frac{x}{n} \]
where:
x = number of items in a sample that possess the characteristic
n = number of items in the sample
Sampling Distribution
By the central limit theorem, the distribution of \(\hat{p}\) is approximately normal if np > 5 and nq > 5
(p is the population proportion and q = 1 - p).
The mean of the distribution is p.
The variance of the distribution is pq/n.
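To see the stated mean and variance of \(\hat{p}\) emerge, the Python sketch below simulates many samples of n items, each item independently having the characteristic with probability p (an assumption that matches sampling from a large population). The values p = 0.10 and n = 80 are borrowed from Demonstration Problem 7.3; the replication count, seed, and helper name simulate_phat are arbitrary illustrative choices.

```python
import random
from statistics import fmean, pvariance


def simulate_phat(p, n, reps=20_000, seed=0):
    """Simulated sample proportions from `reps` samples of size n (hypothetical helper)."""
    rng = random.Random(seed)
    # Each sample: count the items with the characteristic (each with probability p), divide by n
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


p, n = 0.10, 80
phats = simulate_phat(p, n)
print(fmean(phats), p)                     # simulated mean vs. p
print(pvariance(phats), p * (1 - p) / n)   # simulated variance vs. pq/n = 0.001125
```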
Sampling Distribution of \(\hat{p}\) ("p hat")
\(\hat{p}\), or "p hat", is a sample proportion.
Whereas the mean is computed by averaging a set of values, the sample proportion is computed by dividing the frequency with which a given characteristic occurs in a sample by the number of items in the sample.
Z Formula for Sample Proportions
\[ Z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}} \]
where:
\(\hat{p}\) = sample proportion
n = sample size
p = population proportion
q = 1 - p
np ≥ 5
nq ≥ 5
Demonstration Problem 7.3
If 10% of a population of parts is defective,
what is the probability of randomly selecting
80 parts and finding that 12 or more parts are
defective?
Solution for Demonstration Problem 7.3
Population parameters: p = 0.10, q = 1 - p = 1 - 0.10 = 0.90
Sample: n = 80, x = 12, \(\hat{p} = x/n = 12/80 = 0.15\)
Check: np = 80(0.10) = 8 > 5 and nq = 80(0.90) = 72 > 5
\[ P(\hat{p} \ge 0.15) = P\left(Z \ge \frac{0.15 - 0.10}{\sqrt{\frac{(0.10)(0.90)}{80}}}\right) = P\left(Z \ge \frac{0.05}{0.0335}\right) = P(Z \ge 1.49) \]
\[ = 0.5 - P(0 \le Z \le 1.49) = 0.5 - 0.4319 = 0.0681 \]
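A short Python check of this solution, assuming only the standard-library statistics.NormalDist. As in the table-based solution, z is rounded to two decimals before the tail area is taken.

```python
import math
from statistics import NormalDist

p, n, x = 0.10, 80, 12   # population proportion, sample size, defective parts found
q = 1 - p
# Rule of thumb from the slides: the normal approximation needs np > 5 and nq > 5
assert n * p > 5 and n * q > 5
p_hat = x / n                              # sample proportion, 0.15
se = math.sqrt(p * q / n)                  # standard error of p-hat, about 0.0335
z = (p_hat - p) / se
prob = 1 - NormalDist().cdf(round(z, 2))   # upper tail with z rounded as in a printed table
print(p_hat, round(z, 2), round(prob, 4))  # 0.15 1.49 0.0681
```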
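Graphic Solution for Demonstration Problem 7.3
\[ Z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}} = \frac{0.15 - 0.10}{\sqrt{\frac{(0.10)(0.90)}{80}}} = \frac{0.05}{0.0335} = 1.49 \]
[Figure: Normal curve for \(\hat{p}\) centered at p = 0.10 with \(\sigma_{\hat{p}} = 0.0335\), beside the standard normal curve centered at 0 with \(\sigma = 1\). Each half has area .5000; the area from the center to \(\hat{p}\) = 0.15 (Z = 1.49) is .4319.]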