Probability and Statistic Chapter6

The document discusses inferences based on two samples, specifically paired samples and two independent samples. It covers the distribution of differences between paired samples, which follows a normal distribution. It also discusses confidence intervals and hypothesis testing for the difference in means between paired samples using a t-test. As an example, it analyzes data on zinc concentrations in river water to test if bottom water concentrations exceed surface water concentrations.

Paired Samples

Two Independent samples

PROBABILITY AND STATISTICS


CHAPTER 6: INFERENCES BASED ON TWO SAMPLES

Dr. Phan Thi Huong

HoChiMinh City University of Technology


Faculty of Applied Science, Department of Applied Mathematics
Email: huongphan@hcmut.edu.vn

HCM city — 2021.

Dr. Phan Thi Huong Probability and Statistics


OUTLINE

1 PAIRED SAMPLES

2 TWO INDEPENDENT SAMPLES

DISTRIBUTION OF THE SAMPLE DIFFERENCES

ASSUMPTIONS
The data consist of n independently selected pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, with $E(X_i) = \mu_1$ and $E(Y_i) = \mu_2$. Let

$$D_1 = X_1 - Y_1,\quad D_2 = X_2 - Y_2,\quad \ldots,\quad D_n = X_n - Y_n,$$

so the $D_i$'s are the differences within pairs. The $D_i$'s are assumed to be normally distributed with mean $\mu_D$ and variance $\sigma_D^2$.

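REMARK 1
Let $D = X - Y$. Then the expected difference is

$$\mu_D = E(X - Y) = E(X) - E(Y) = \mu_1 - \mu_2.$$

The $D_i$'s then constitute a normal random sample with mean $\mu_D$. Moreover,

$$T = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} \sim t_{n-1},$$

where $\bar{D}$ and $S_D$ are the sample mean and sample standard deviation of the $D_i$'s.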

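CI AND HT ON THE DIFFERENCE IN MEANS

CONFIDENCE INTERVALS
The paired t CI for $\mu_D$ is

$$\bar{D} \pm t_{n-1,\alpha/2}\,\frac{S_D}{\sqrt{n}}.$$

A one-sided confidence bound results from retaining the relevant sign and replacing $t_{n-1,\alpha/2}$ by $t_{n-1,\alpha}$.

HYPOTHESIS TESTING
Test statistic for $H_0: \mu_D = \Delta_0$:

$$T = \frac{\bar{D} - \Delta_0}{S_D / \sqrt{n}}$$

H1                      Rejection Region
$\mu_D \neq \Delta_0$    $|T| > t_{n-1,\alpha/2}$
$\mu_D > \Delta_0$       $T > t_{n-1,\alpha}$
$\mu_D < \Delta_0$       $T < -t_{n-1,\alpha}$

⇒ use the t-test.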
EXAMPLE 1
Trace metals in drinking water affect the flavor, and unusually high
concentrations can pose a health hazard. An article reports on a
study in which six river locations were selected (six experimental
objects) and the zinc concentration (mg/L) was determined for both
surface water and bottom water at each location. The six pairs of
observations are displayed in the accompanying table. Do the
data suggest that the true average concentration in bottom water
exceeds that of surface water? (α = 0.05)

Zinc concentration (mg/L)     1      2      3      4      5      6
in bottom water (x)         0.430  0.266  0.567  0.531  0.707  0.716
in surface water (y)        0.415  0.238  0.390  0.410  0.605  0.609
Difference (x − y)          0.015  0.028  0.177  0.121  0.102  0.107
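
The paired t procedure can be applied to these data as a minimal sketch in Python (standard library only). The function name `paired_t` is illustrative and not part of the slides, and the critical value $t_{5,0.05} = 2.015$ is taken from a standard t table rather than computed. The hypotheses are assumed to be $H_0: \mu_D = 0$ against $H_1: \mu_D > 0$.

```python
import math
import statistics

# Zinc concentrations (mg/L) from the table in Example 1.
bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716]
surface = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609]


def paired_t(x, y, delta0=0.0):
    """Return (mean difference, sample sd of differences, t statistic)."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)  # sample standard deviation (divisor n - 1)
    t = (d_bar - delta0) / (s_d / math.sqrt(n))
    return d_bar, s_d, t


d_bar, s_d, t_stat = paired_t(bottom, surface)

# Assumed table value: upper 5% point of t with n - 1 = 5 degrees of freedom.
T_CRIT = 2.015
reject_h0 = t_stat > T_CRIT  # upper-tailed test, H1: mu_D > 0

print(f"d_bar = {d_bar:.4f}, s_D = {s_d:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the statistic falls in the rejection region, which supports the claim that the true average zinc concentration is higher in bottom water.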

Introduction
Paired Samples
Inferences for Two Population Means
Two Independent samples
Inferences for Population Proportions (Large-Sample)

BASIC ASSUMPTIONS
1 $X_1, X_2, \ldots, X_m$ is a random sample from a distribution with mean $\mu_1$ and variance $\sigma_1^2$.
2 $Y_1, Y_2, \ldots, Y_n$ is a random sample from a distribution with mean $\mu_2$ and variance $\sigma_2^2$.
3 The X and Y samples are independent of one another.

If both X and Y are normal then

$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}} \sim N(0, 1),$$

because

$$E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_1 - \mu_2,$$

$$\mathrm{Var}(\bar{X} - \bar{Y}) = \mathrm{Var}(\bar{X}) + \mathrm{Var}(\bar{Y}) = \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}.$$
NORMAL POPULATION + KNOWN σ (HYPOTHESIS TESTS ON THE DIFFERENCE IN MEANS)

EXTRA ASSUMPTIONS
The X and Y samples are normally distributed with known variances $\sigma_1^2$ and $\sigma_2^2$.

First, compute the statistic

$$z = \frac{(\bar{x} - \bar{y}) - \Delta}{se}, \qquad se = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}.$$

Then apply the following decision rule (z-test) for $H_0: \mu_1 - \mu_2 = \Delta$:

H1                            Rejection Region
$\mu_1 - \mu_2 \neq \Delta$    $|z| > z_{\alpha/2}$
$\mu_1 - \mu_2 < \Delta$       $z < -z_{\alpha}$
$\mu_1 - \mu_2 > \Delta$       $z > z_{\alpha}$
EXAMPLE 2
A consumer-research organization routinely selects several car
models each year and evaluates their fuel efficiency. In this year’s
study of two similar subcompact models from two different
automakers, the average gas mileage for twelve cars of brand A was
27.2 miles per gallon. The nine brand B cars that were tested
averaged 32.1 mpg. At α = 0.01, should the organization conclude that brand B cars
have higher average gas mileage than brand A cars do? Suppose
that the two populations are normally distributed with standard
deviations 3.8 mpg and 4.3 mpg, respectively.
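
As a rough sketch, the z-test above can be applied to Example 2 in Python. It assumes that brand A plays the role of X (so that the alternative is $H_1: \mu_1 - \mu_2 < 0$) and that the standard deviations 3.8 and 4.3 belong to brands A and B in that order. The helper name `two_sample_z` is illustrative.

```python
import math
from statistics import NormalDist


def two_sample_z(x_bar, y_bar, sigma1, sigma2, m, n, delta=0.0):
    """z statistic for H0: mu1 - mu2 = delta with known population sds."""
    se = math.sqrt(sigma1**2 / m + sigma2**2 / n)
    return ((x_bar - y_bar) - delta) / se


# Brand A: 12 cars, mean 27.2 mpg, sigma 3.8; brand B: 9 cars, mean 32.1 mpg, sigma 4.3.
z = two_sample_z(27.2, 32.1, 3.8, 4.3, m=12, n=9)

alpha = 0.01
z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper-alpha point of N(0, 1)
reject_h0 = z < -z_alpha                   # lower-tailed test, H1: mu_A - mu_B < 0

print(f"z = {z:.3f}, -z_alpha = {-z_alpha:.3f}, reject H0: {reject_h0}")
```

With these inputs the statistic lies below $-z_{0.01}$, so the data support a higher mean mileage for brand B at the 1% level.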

NORMAL POPULATION + UNKNOWN σ + EQUAL VARIANCES (HYPOTHESIS TESTS ON THE DIFFERENCE IN MEANS)

EXTRA ASSUMPTIONS
The X and Y samples are normally distributed with the same variance ($\sigma_1^2$ and $\sigma_2^2$ may be unknown).

If both X and Y are normal then

$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{se} \sim t_{m+n-2},$$

where

$$se = \sqrt{s^2\left(\frac{1}{m} + \frac{1}{n}\right)} \quad\text{and}\quad s^2 = \frac{(m-1)s_1^2 + (n-1)s_2^2}{m+n-2}.$$

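First, compute the statistic

$$t = \frac{(\bar{x} - \bar{y}) - \Delta}{se}.$$

Then apply the following decision rule (t-test) for $H_0: \mu_1 - \mu_2 = \Delta$:

H1                            Rejection Region
$\mu_1 - \mu_2 \neq \Delta$    $|t| > t_{m+n-2,\alpha/2}$
$\mu_1 - \mu_2 < \Delta$       $t < -t_{m+n-2,\alpha}$
$\mu_1 - \mu_2 > \Delta$       $t > t_{m+n-2,\alpha}$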

EXAMPLE 3
The course coordinator wants to determine if two ways of taking
the course resulted in a significant difference in achievement as
measured by the final exam for the course. The following table gives
the scores on an examination with 45 possible points for two
groups.

Online      32  37  35  28  41  44  35  31  34
Classroom   35  31  29  25  34  40  27  32  31

Do these data present sufficient evidence to indicate that the
average grade for students who take the course online is
significantly higher than for those who attend a conventional class?
Assume that the populations are both normal and have the same
variance, and use the significance level α = 0.01.
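
The pooled t-test for Example 3 can be sketched as follows, treating the online group as X so that the alternative is $H_1: \mu_1 - \mu_2 > 0$. The function name `pooled_t` is illustrative, and the critical value $t_{16,0.01} = 2.583$ is an assumed value read from a standard t table.

```python
import math
import statistics

online = [32, 37, 35, 28, 41, 44, 35, 31, 34]
classroom = [35, 31, 29, 25, 34, 40, 27, 32, 31]


def pooled_t(x, y, delta=0.0):
    """Return (t statistic, degrees of freedom) for the equal-variance t-test."""
    m, n = len(x), len(y)
    s1_sq = statistics.variance(x)  # sample variances (divisor n - 1)
    s2_sq = statistics.variance(y)
    s_sq = ((m - 1) * s1_sq + (n - 1) * s2_sq) / (m + n - 2)  # pooled variance
    se = math.sqrt(s_sq * (1 / m + 1 / n))
    t = ((statistics.mean(x) - statistics.mean(y)) - delta) / se
    return t, m + n - 2


t_stat, df = pooled_t(online, classroom)

# Assumed table value: upper 1% point of t with 16 degrees of freedom.
T_CRIT = 2.583
reject_h0 = t_stat > T_CRIT  # upper-tailed test

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

Under these assumptions the statistic does not exceed the critical value, so at α = 0.01 the data do not show that the online mean is higher.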
NORMAL POPULATION + UNKNOWN σ + EQUAL VARIANCES (CONFIDENCE INTERVAL ON A DIFFERENCE IN MEANS)

If both X and Y are normal then

$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{se} \sim t_{m+n-2},$$

where

$$se = \sqrt{s^2\left(\frac{1}{m} + \frac{1}{n}\right)} \quad\text{and}\quad s^2 = \frac{(m-1)s_1^2 + (n-1)s_2^2}{m+n-2}.$$

A 100(1 − α)% confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{x} - \bar{y}) - t_{m+n-2,\alpha/2}\, se \;\le\; \mu_1 - \mu_2 \;\le\; (\bar{x} - \bar{y}) + t_{m+n-2,\alpha/2}\, se.$$

EXAMPLE 4
Ten samples of standard cement had an average weight percent
calcium of $\bar{x} = 90.0$ with a sample standard deviation of $s_1 = 5.0$,
and 15 samples of the lead-doped cement had an average weight
percent calcium of $\bar{y} = 87.0$ with a sample standard deviation of
$s_2 = 4.0$. Assume that weight percent calcium is normally
distributed with the same standard deviation in both populations. Find a 95% confidence
interval on the difference in means, $\mu_1 - \mu_2$, for the two types of
cement.
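
A minimal sketch of the pooled confidence interval for Example 4, working from the summary statistics only. The function name `pooled_ci` is illustrative, and $t_{23,0.025} = 2.069$ is an assumed value from a standard t table.

```python
import math


def pooled_ci(x_bar, s1, m, y_bar, s2, n, t_crit):
    """Equal-variance CI for mu1 - mu2 from summary statistics."""
    s_sq = ((m - 1) * s1**2 + (n - 1) * s2**2) / (m + n - 2)  # pooled variance
    se = math.sqrt(s_sq * (1 / m + 1 / n))
    diff = x_bar - y_bar
    return diff - t_crit * se, diff + t_crit * se


# Assumed table value: t with m + n - 2 = 23 degrees of freedom, alpha/2 = 0.025.
T_CRIT = 2.069
lower, upper = pooled_ci(90.0, 5.0, 10, 87.0, 4.0, 15, T_CRIT)

print(f"95% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
```

The resulting interval contains zero, so at the 95% level these data do not rule out equal mean calcium content for the two cements.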

NORMAL POPULATION + UNKNOWN σ + UNEQUAL VARIANCES

EXTRA ASSUMPTIONS
The X and Y samples are normally distributed with unequal variances ($\sigma_1^2$ and $\sigma_2^2$ may be unknown).

If both X and Y are normal then

$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}} \approx t_\nu,$$

where

$$\nu = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{1}{m-1}\left(\dfrac{s_1^2}{m}\right)^2 + \dfrac{1}{n-1}\left(\dfrac{s_2^2}{n}\right)^2}.$$

If ν is not an integer, round it down to the nearest integer.
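THE TWO-SAMPLE T TEST FOR TESTING $H_0: \mu_1 - \mu_2 = \Delta_0$

We can test hypotheses about this difference based on the statistic

$$T = \frac{(\bar{x} - \bar{y}) - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$$

and the following t-test decision rule:

H1                              Rejection Region
$\mu_1 - \mu_2 \neq \Delta_0$    $|t| > t_{\nu,\alpha/2}$
$\mu_1 - \mu_2 > \Delta_0$       $t > t_{\nu,\alpha}$
$\mu_1 - \mu_2 < \Delta_0$       $t < -t_{\nu,\alpha}$

where

$$\nu = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{1}{m-1}\left(\dfrac{s_1^2}{m}\right)^2 + \dfrac{1}{n-1}\left(\dfrac{s_2^2}{n}\right)^2}.$$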

EXAMPLE 5
Arsenic concentration in public drinking water supplies is a
potential health risk. The table below reports drinking water
arsenic concentrations (in ppb) for 10 metropolitan Phoenix
communities and 10 communities in rural Arizona. Determine if
there is any difference in mean arsenic concentrations between
metropolitan Phoenix communities and communities in rural
Arizona.

Metro Phoenix      (ppb)    Rural Arizona       (ppb)
Phoenix               3     Rimrock                48
Chandler              7     Goodyear               44
Gilbert              25     New River              40
Glendale             10     Apache Junction        38
Mesa                 15     Buckeye                33
Paradise Valley       6     Nogales                21
Peoria               12     Black Canyon City      20
Scottsdale           25     Sedona                 12
Tempe                15     Payson                  1
Sun City              7     Casa Grande            18
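
The unequal-variance t-test can be sketched on the arsenic data as follows. The example does not state a significance level, so α = 0.05 is an assumption here, as is the table value $t_{13,0.025} = 2.160$. The function name `welch_t` is illustrative.

```python
import math
import statistics

# Arsenic concentrations (ppb) from the table in Example 5.
phoenix = [3, 7, 25, 10, 15, 6, 12, 25, 15, 7]
rural = [48, 44, 40, 38, 33, 21, 20, 12, 1, 18]


def welch_t(x, y, delta0=0.0):
    """Return (t statistic, degrees of freedom nu rounded down)."""
    m, n = len(x), len(y)
    v1 = statistics.variance(x) / m  # s1^2 / m
    v2 = statistics.variance(y) / n  # s2^2 / n
    t = ((statistics.mean(x) - statistics.mean(y)) - delta0) / math.sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1**2 / (m - 1) + v2**2 / (n - 1))
    return t, math.floor(nu)


t_stat, nu = welch_t(phoenix, rural)

# Assumed: alpha = 0.05 (not given in the example) and the table value t_{13, 0.025}.
T_CRIT = 2.160
reject_h0 = abs(t_stat) > T_CRIT  # two-sided test, H1: mu1 != mu2

print(f"t = {t_stat:.3f}, nu = {nu}, reject H0: {reject_h0}")
```

Under these assumptions the two-sided test rejects $H_0$, which points to a difference in mean arsenic concentration between the two groups of communities.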

NORMAL POPULATION + UNKNOWN σ + UNEQUAL VARIANCES (CONFIDENCE INTERVAL ON A DIFFERENCE IN MEANS)

THE TWO-SAMPLE T CONFIDENCE INTERVAL FOR $\mu_1 - \mu_2$

$$\bar{x} - \bar{y} \pm t_{\nu,\alpha/2}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}},$$

where

$$\nu = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{1}{m-1}\left(\dfrac{s_1^2}{m}\right)^2 + \dfrac{1}{n-1}\left(\dfrac{s_2^2}{n}\right)^2}.$$

A one-sided CI can be calculated as described earlier.

EXAMPLE 6
The void volume within a textile fabric affects comfort,
flammability, and insulation properties. Permeability of a fabric
refers to the accessibility of void space to the flow of a gas or liquid.
An article gave summary information on air permeability
(cm³/cm²/sec) for a number of different fabric types. Consider the
following data on two different types of plain-weave fabric:

Fabric Type   Sample Size   Sample Mean   Sample Std
Cotton             10           51.71        0.79
Triacetate         10          136.14        3.59

Assuming that the porosity distributions for both types of fabric are
normal, let’s calculate a confidence interval for the difference
between true average porosity for the cotton fabric and that for the
triacetate fabric, using confidence level γ = 95%.
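
A sketch of the unequal-variance confidence interval for Example 6, computed from the summary table. The small lookup of $t_{\nu,0.025}$ values is copied from a standard t table and is an assumption of this sketch. The function name `welch_ci` is illustrative.

```python
import math

# Assumed table values t_{nu, 0.025} for a few degrees of freedom.
T_TABLE_025 = {8: 2.306, 9: 2.262, 10: 2.228}


def welch_ci(x_bar, s1, m, y_bar, s2, n, t_table):
    """Unequal-variance CI for mu1 - mu2; returns (nu, lower, upper)."""
    v1, v2 = s1**2 / m, s2**2 / n
    # Degrees of freedom, rounded down to the nearest integer.
    nu = math.floor((v1 + v2) ** 2 / (v1**2 / (m - 1) + v2**2 / (n - 1)))
    half_width = t_table[nu] * math.sqrt(v1 + v2)
    diff = x_bar - y_bar
    return nu, diff - half_width, diff + half_width


# Cotton is taken as sample 1 and triacetate as sample 2.
nu, lower, upper = welch_ci(51.71, 0.79, 10, 136.14, 3.59, 10, T_TABLE_025)

print(f"nu = {nu}, 95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

Both endpoints are negative, which is consistent with the much larger sample mean for triacetate.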

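LARGE SAMPLE SIZE

If m and n are large then, approximately,

$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}} \approx N(0, 1).$$

First, compute the statistic

$$z = \frac{(\bar{x} - \bar{y}) - \Delta}{se}, \qquad se = \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}.$$

Then apply the following decision rule (z-test) for $H_0: \mu_1 - \mu_2 = \Delta$:

H1                            Rejection Region
$\mu_1 - \mu_2 \neq \Delta$    $|z| > z_{\alpha/2}$
$\mu_1 - \mu_2 < \Delta$       $z < -z_{\alpha}$
$\mu_1 - \mu_2 > \Delta$       $z > z_{\alpha}$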
EXAMPLE 7
To compare the average life of two brands of 9-volt batteries, a
sample of 100 batteries from each brand is tested. The sample
selected from the first brand shows an average life of 47 hours and a
standard deviation of 4 hours. A mean life of 48 hours and a
standard deviation of 3 hours are recorded for the sample from the
second brand. Is the observed difference between the means of the
two samples significant at the 0.01 level?
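
The large-sample z-test for Example 7 can be sketched as below, with the sample standard deviations standing in for the unknown population values. The hypotheses are assumed to be $H_0: \mu_1 - \mu_2 = 0$ against the two-sided $H_1: \mu_1 - \mu_2 \neq 0$. The helper name `large_sample_z` is illustrative.

```python
import math
from statistics import NormalDist


def large_sample_z(x_bar, s1, m, y_bar, s2, n, delta=0.0):
    """Large-sample z statistic using sample standard deviations."""
    se = math.sqrt(s1**2 / m + s2**2 / n)
    return ((x_bar - y_bar) - delta) / se


# Brand 1: mean 47 h, sd 4 h; brand 2: mean 48 h, sd 3 h; 100 batteries each.
z = large_sample_z(47, 4, 100, 48, 3, 100)

alpha = 0.01
z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
reject_h0 = abs(z) > z_half_alpha

print(f"z = {z:.3f}, z_alpha/2 = {z_half_alpha:.3f}, reject H0: {reject_h0}")
```

Here |z| is smaller than the two-sided critical value, so under these assumptions the observed difference is not significant at the 0.01 level.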

DISTRIBUTION OF THE DIFFERENCE IN PROPORTIONS

PROPOSITION 2.1
Let $\hat{p}_1 = X/m$ and $\hat{p}_2 = Y/n$, where $X \sim B(m, p_1)$ and $Y \sim B(n, p_2)$ with $X \perp Y$. Then

$$E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2,$$

so $\hat{p}_1 - \hat{p}_2$ is an unbiased estimator of $p_1 - p_2$, and

$$\mathrm{Var}(\hat{p}_1 - \hat{p}_2) = \frac{p_1 q_1}{m} + \frac{p_2 q_2}{n}, \qquad q_i = 1 - p_i.$$

The following test statistic is distributed approximately as standard normal and is the basis of the test:

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{m} + \dfrac{p_2 q_2}{n}}}.$$

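LARGE-SAMPLE (HYPOTHESIS TESTS ON THE DIFFERENCE IN PROPORTION)

A LARGE-SAMPLE z TEST FOR $H_0: p_1 - p_2 = 0$

Test statistic (with $\Delta_0 = 0$):

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - \Delta_0}{\sqrt{\hat{p}\,\hat{q}\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}}, \qquad \hat{p} = \frac{m\hat{p}_1 + n\hat{p}_2}{m+n}, \qquad \hat{q} = 1 - \hat{p}.$$

H1                    Rejection Region
$p_1 - p_2 \neq 0$     $|Z| > z_{\alpha/2}$
$p_1 - p_2 > 0$        $Z > z_{\alpha}$
$p_1 - p_2 < 0$        $Z < -z_{\alpha}$

The test can safely be used as long as $m\hat{p}_1$, $m\hat{q}_1$, $n\hat{p}_2$, and $n\hat{q}_2$ are all at least 10.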

EXAMPLE 8
Extracts of St. John’s Wort are widely used to treat depression. An
article in the April 18, 2001, issue of the Journal of the American
Medical Association compared the efficacy of a standard extract of
St. John’s Wort with a placebo in 200 outpatients diagnosed with
major depression. Patients were randomly assigned to two groups;
one group received the St. John’s Wort, and the other received the
placebo. After eight weeks, 19 of the placebo-treated patients
showed improvement, and 27 of those treated with St. John’s Wort
improved. Is there any reason to believe that St. John’s Wort is
effective in treating major depression? Use α = 0.05.
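
A hedged sketch of the large-sample z-test for Example 8. The example does not give the group sizes, so an even split of the 200 patients (m = n = 100) is assumed here. The test is taken as upper-tailed, $H_1: p_1 - p_2 > 0$, with the St. John's Wort group as sample 1. The function name `two_proportion_z` is illustrative.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, m, x2, n):
    """Pooled z statistic for H0: p1 - p2 = 0."""
    p1_hat, p2_hat = x1 / m, x2 / n
    p_pool = (x1 + x2) / (m + n)  # equals (m*p1_hat + n*p2_hat) / (m + n)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / m + 1 / n))
    return (p1_hat - p2_hat) / se


# ASSUMPTION: the 200 outpatients were split evenly, 100 per group.
M = N = 100
improved_wort, improved_placebo = 27, 19

# Rule of thumb from the slide: all four expected counts are at least 10.
counts_ok = min(improved_wort, M - improved_wort,
                improved_placebo, N - improved_placebo) >= 10

z = two_proportion_z(improved_wort, M, improved_placebo, N)
z_alpha = NormalDist().inv_cdf(1 - 0.05)
reject_h0 = z > z_alpha  # upper-tailed test

print(f"z = {z:.3f}, z_alpha = {z_alpha:.3f}, counts ok: {counts_ok}, reject H0: {reject_h0}")
```

Under the assumed group sizes the statistic stays below the critical value, so these data alone do not show that St. John's Wort is effective at α = 0.05.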

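LARGE-SAMPLE (CONFIDENCE INTERVAL ON THE DIFFERENCE IN PROPORTION)

A CI for $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{m} + \frac{\hat{p}_2\hat{q}_2}{n}} \;\le\; p_1 - p_2 \;\le\; (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{m} + \frac{\hat{p}_2\hat{q}_2}{n}}.$$

This interval can safely be used as long as $m\hat{p}_1$, $m\hat{q}_1$, $n\hat{p}_2$, and $n\hat{q}_2$ are all at least 10.
A one-sided confidence bound results from retaining the relevant sign and replacing $z_{\alpha/2}$ by $z_{\alpha}$.
The estimated standard deviation of $\hat{p}_1 - \hat{p}_2$ is different here from what it was for hypothesis testing when $\Delta_0 = 0$: the interval uses the unpooled estimates, whereas the test uses the pooled $\hat{p}$.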

EXAMPLE 9
Consider the process of manufacturing crankshaft bearings.
Suppose that a modification is made in the surface finishing
process and that, subsequently, a second random sample of 85
bearings is obtained. The number of defective bearings in this
second sample is 8. Suppose that

$$m = 85,\quad \hat{p}_1 = 10/85 = 0.1176,\quad n = 85,\quad \hat{p}_2 = 8/85 = 0.0941,$$

where $\hat{p}_1$ is the proportion defective in the first sample. Obtain an approximate 95% confidence interval on the difference
in the proportion of defective bearings produced under the two
processes.
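
The interval for Example 9 can be sketched with the unpooled standard error, using the counts implied by the example (10 of 85 and 8 of 85). The function name `two_proportion_ci` is illustrative.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, m, x2, n, conf=0.95):
    """Large-sample CI for p1 - p2 with the unpooled standard error."""
    p1_hat, p2_hat = x1 / m, x2 / n
    se = math.sqrt(p1_hat * (1 - p1_hat) / m + p2_hat * (1 - p2_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


lower, upper = two_proportion_ci(10, 85, 8, 85)

print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```

The interval contains zero, so these samples do not show a change in the proportion of defective bearings. Note that $n\hat{p}_2 = 8$ falls just short of the "at least 10" guideline, so the normal approximation is only rough here.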
