Power and Sample Size

The document discusses sampling and sample size in randomized controlled trials (RCTs). It explains that estimates from multiple well-implemented RCTs will produce a distribution centered around the true effect, whereas a single study provides a single estimate that could fall anywhere in the distribution. Larger sample sizes increase precision but do not reduce bias, and randomization is key to reducing bias. The document also covers how statistical power is determined by sample size, effect size, and variability in outcomes.

Sampling and Sample Size

Rachel Glennerster
MIT
Which of these is most likely to describe
estimates from 8 well-implemented RCTs?

[Figure: two panels of estimates, labelled I and II]

A. I
B. II
C. Neither

[Audience poll chart; bar values shown: 70%, 15%, 15%]
J - PAL | SAMPLING AND SAMPLE SIZE 4
Which is the best description of II?

[Figure: two panels of estimates, labelled I and II]

A. Imprecise estimate
B. Biased estimate
C. Imprecise but unbiased

[Audience poll chart; bar values shown: 89%, 11%, 0%]
Bias and precision

[Figure: estimates plotted against the truth. One axis is precision (improved by sample size); the other is less bias (achieved by randomization).]


Outline

• Introduction
• Hypothesis testing
• What influences power?
• Power in clustered designs
• Calculating power in practice



We evaluate bringing tutors into schools



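Post-test: control & treatment

[Figure: post-test score distributions for the comparison and treatment groups, with the comparison mean and treatment mean marked]

The treatment mean is 6 percentage points higher than the comparison mean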


Is this impact statistically significant?
Average difference = 6 points

A. Yes
B. No
C. We can't tell

[Audience poll chart; bar values shown: 80%, 15%, 5%]


Difference between the sample means

[Figure: comparison mean and treatment mean; the gap between them is the estimated effect]


What if we ran a second experiment?

[Figure: comparison mean and treatment mean from a second experiment, with its own estimated effect]


Many experiments: a distribution of estimates

[Figure: histogram of estimated differences, built up over seven slides as more experiments are added. x-axis: Difference (-3 to 10); y-axis: Frequency (0 to 100)]
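The idea of "many experiments" can be sketched with a small simulation. This is only an illustration: the true effect of 6 points, the outcome standard deviation of 20, the control mean of 50, and the 100 pupils per arm are all assumed values chosen for the example, not figures from the slides.

```python
import random
import statistics

def run_experiment(rng, n_per_arm=100, true_effect=6.0, sd=20.0):
    """Simulate one RCT and return the difference in sample means."""
    # Assumed control-group mean of 50; only the difference matters here.
    control = [rng.gauss(50.0, sd) for _ in range(n_per_arm)]
    treatment = [rng.gauss(50.0 + true_effect, sd) for _ in range(n_per_arm)]
    return statistics.fmean(treatment) - statistics.fmean(control)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
estimates = [run_experiment(rng) for _ in range(2000)]

mean_estimate = statistics.fmean(estimates)  # should sit close to the true effect
spread = statistics.stdev(estimates)         # the standard error of one estimate
print(f"mean of estimates: {mean_estimate:.2f}")
print(f"standard deviation of estimates: {spread:.2f}")
```

Each run gives a different estimate, but the estimates pile up around the assumed true effect, which is what the histogram illustrates.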




Distribution of estimates if true effect=β

[Figure: bell curve centred on β. x-axis: estimated effect (-4 to 6); y-axis: density (0 to 0.5)]

Estimated effects are normally distributed around the true effect β


Distribution of estimates if true effect=0

[Figure: bell curve centred on 0, same axes]

Estimated effects are normally distributed around 0


Two distributions under two hypotheses

[Figure: the H0 curve (true effect = 0) and the Hβ curve (true effect = β) on the same axes]

Probability of getting a given estimate under the two hypotheses


If we run one study, we see one estimate β̂

[Figure: a single estimate β̂ marked on the x-axis (-4 to 6)]

How do we know if our estimated effect is significant?


Did our estimate come from Hβ or H0?

[Figure: H0 and Hβ curves with the estimate marked; points P1 and P2 are marked on the curves]

Which is more likely?


Can we rule out that the estimate comes from H0?
Impose significance level of 5%

[Figure: H0 and Hβ curves with two vertical critical-value lines]

Anything between the lines cannot be distinguished from 0


Critical value

• Definition: The estimated effect size that exactly
corresponds to the significance level.
• If testing whether:
– The effect is bigger than 0
– At the 5% significance level (95% confidence)
Then the critical value is the level of the estimate where exactly
5% of the area under the H0 curve lies to the right
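As a rough sketch of the definition above, the one-sided critical value can be read off the normal curve for H0. The function name and the default standard error of 1 are illustrative assumptions; with estimates measured in standard-error units, the 5% critical value is about 1.645.

```python
from statistics import NormalDist

def critical_value(alpha=0.05, se=1.0):
    """One-sided critical value: the estimate with `alpha` of the H0 curve to its right."""
    h0 = NormalDist(mu=0.0, sigma=se)  # distribution of estimates if the true effect is 0
    return h0.inv_cdf(1.0 - alpha)

cv = critical_value()
print(round(cv, 3))  # 1.645
```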


Is β̂ significantly different from 0, at 5%?

[Figure: H0 and Hβ curves with the critical-value lines and the estimate β̂ marked]

How do we know?
Hypothesis Testing

• In criminal law, most institutions follow the rule: “innocent
until proven guilty”
• The presumption is that the accused is innocent and the
burden is on the prosecutor to show guilt
– The jury or judge starts with the “null hypothesis” that the
accused person is innocent
– The prosecutor has a hypothesis that the accused person is
guilty


Hypothesis Testing

• Usually in program evaluation, instead of “presumption
of innocence,” the rule is: “presumption of zero”
• The “Null hypothesis” (H0) is then that there was no (zero)
impact of the program
• The burden of proof is on showing there was an impact
• Note, there are exceptions:
– e.g. the null for a cash plus training program might be that
its impact is the same as that of the training on its own


Hypothesis Testing: Conclusions

• If it is very unlikely (less than a 5% probability) that the
difference is solely due to chance:
– We “reject our null hypothesis”
• We may now say:
“our program has a statistically significant impact”
• What is statistically significant and what is most likely are
different concepts
– there may be cases where it’s more likely that the program
worked than that it didn’t, but we still say there is no
statistically significant impact


Significance and Type I errors

• Traditionally the significance level is set at 5%
• This means allowing a 5% chance of making a Type I error
when the program truly has no effect
• In that case, 5% of the time we will say the program had
an impact when in fact it didn’t

Source: Effect Size FAQs blog


If true effect was β

[Figure: H0 and Hβ curves with the critical-value lines]

How often would we get an estimated effect we can
distinguish from zero?


How often would we reject null, if Hβ true?

[Figure: H0 and Hβ curves with the critical value; the area under the Hβ curve beyond the critical value is shaded and labelled Power]

Shaded area shows power = % of time we would find the estimate
significantly different from 0 if the true effect was β
The power to avoid Type II errors

• Statistical power is the probability that, if the true effect is
of a given size, our proposed experiment will be able to
distinguish the estimated effect from zero
• Power is the probability of avoiding Type II errors
• Traditionally, we aim for 80% power (some aim for 90%)
• Low power means we may not find a significant effect even
though an effect exists

Source: Effect Size FAQs blog
Four results from hypothesis testing

                                        The underlying truth
Statistical test                        TREATMENT EFFECT (H0 false)         NO TREATMENT EFFECT (H0 true)
SIGNIFICANT (reject H0)                 True positive (power)               False positive (Type I error)
                                        Probability = 1-κ                   Probability = α
NOT SIGNIFICANT (fail to reject H0)     False zero (Type II error)          True zero
                                        Probability = κ                     Probability = 1-α

Source: Effect Size FAQs blog
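The four cells can be checked with a quick simulation. This is a sketch under assumed values: estimates are measured in standard-error units, the test is one-sided at 5%, and the "true effect" of 3 standard errors is a hypothetical choice.

```python
import random
from statistics import NormalDist

def rejection_rate(true_effect, rng, n_studies=20000, alpha=0.05):
    """Share of simulated studies whose estimate exceeds the one-sided critical value."""
    critical = NormalDist().inv_cdf(1.0 - alpha)  # in standard-error units
    rejections = sum(rng.gauss(true_effect, 1.0) > critical for _ in range(n_studies))
    return rejections / n_studies

rng = random.Random(1)  # fixed seed so the sketch is repeatable
false_positive_rate = rejection_rate(0.0, rng)  # H0 true: approximates alpha
true_positive_rate = rejection_rate(3.0, rng)   # H0 false: approximates power, 1 - kappa
false_zero_rate = 1.0 - true_positive_rate      # approximates kappa
print(false_positive_rate, true_positive_rate, false_zero_rate)
```

The first rate should land near 5% and the second near the power shown for a 3*SE effect.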




What influences power?

[Figure: H0 and Hβ curves with the critical value and the shaded power region]

More overlap of the curves means less power. What influences the overlap
of these curves?
Power: main ingredients

1. Effect Size



Effect Size: 1*SE

• Hypothesized effect size determines distance between means

[Figure: H0 and Hβ curves with means 1 standard error apart]


Effect Size = 1*SE

[Figure: same curves with the significance region shaded]


Power: 26%
If the true impact was 1*SE…

[Figure: same curves with the power region shaded]

The null hypothesis would be rejected only 26% of the time


Effect Size: 3*SE

[Figure: H0 and Hβ curves with means 3 standard errors apart]

Bigger hypothesized effect size → distributions farther apart


Effect size 3*SE: Power = 91%

[Figure: same curves with the power region shaded]

Bigger effect size means more power
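The 26% and 91% figures are consistent with a one-sided test at the 5% level on normally distributed estimates. The sketch below makes that assumption explicit; the function name is illustrative.

```python
from statistics import NormalDist

def power(effect_in_se, alpha=0.05):
    """Probability of rejecting H0 (one-sided) when the true effect is `effect_in_se` standard errors."""
    z = NormalDist()
    critical = z.inv_cdf(1.0 - alpha)        # critical value under H0
    return 1.0 - z.cdf(critical - effect_in_se)  # area of the H-beta curve beyond it

for k in (1, 2, 3):
    print(k, round(power(k), 2))
# 1 0.26
# 2 0.64
# 3 0.91
```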


Effect size and take-up

• Let’s say we believe the impact on our participants is “3”
• What happens if take-up is 1/3?
• Let’s show this graphically


Effect Size: 3*SE

[Figure: H0 and Hβ curves with means 3 standard errors apart]

Let’s say we believe the impact on our participants is “3”


Take-up is 33%. Effect size falls to 1/3rd of its original value

[Figure: H0 and Hβ curves with means now only 1 standard error apart]


Back to: Power = 26%

[Figure: same curves with the power region shaded]

Take-up is reflected in the effect size
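A minimal sketch of the take-up arithmetic, assuming that people who do not take up the program experience no effect, so the average effect over everyone assigned to treatment is the effect on participants times the take-up rate. The function name and the one-sided 5% test are assumptions for illustration.

```python
from statistics import NormalDist

def power_with_take_up(effect_on_participants_in_se, take_up, alpha=0.05):
    """One-sided power when only a share `take_up` of the treatment group participates."""
    # Effect averaged over everyone assigned to treatment (non-participants assumed unaffected).
    average_effect = effect_on_participants_in_se * take_up
    z = NormalDist()
    return 1.0 - z.cdf(z.inv_cdf(1.0 - alpha) - average_effect)

full = power_with_take_up(3.0, 1.0)
partial = power_with_take_up(3.0, 1.0 / 3.0)
print(round(full, 2), round(partial, 2))  # 0.91 0.26
```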


Power: main ingredients

1. Effect Size
2. Sample Size



By increasing sample size
you …

[Figure: curves on an x-axis from -4 to 6, labelled Power 91%]

A. Reduce bias
B. Increase precision
C. Both
D. Neither
E. Don’t know

[Audience poll chart; bar values shown: 89%, 11%, 0%, 0%, 0%]


Increasing sample size
will …

[Figure: curves on an x-axis from -4 to 6, labelled Power 91%]

A. Move curves further apart
B. Move curves closer together
C. Make curves fatter
D. Make curves narrower
E. Don’t know

[Audience poll chart; bar values shown: 83%, 6%, 6%, 6%, 0%]
Power: Effect size = 1 SE
Sample size = 1,000

[Figure: H0 and Hβ curves with the significance region shaded]


Power: Effect size = 1 SE
Sample size = 4,000

[Figure: narrower H0 and Hβ curves with the significance region shaded]


Power: 64%

[Figure: same curves with the power region shaded]


Power: Sample size = 9,000

[Figure: still narrower H0 and Hβ curves with the significance region shaded]


Power: 91%

[Figure: same curves with the power region shaded]
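The sample-size sequence can be reproduced under two assumptions: the effect equals 1 standard error when N = 1,000, and the standard error shrinks in proportion to 1/sqrt(N). Quadrupling the sample then halves the standard error, so the same effect is worth 2 standard errors at N = 4,000 and 3 at N = 9,000. The function name and one-sided 5% test are illustrative choices.

```python
import math
from statistics import NormalDist

def power_at_sample_size(n, baseline_n=1000, effect_in_se_at_baseline=1.0, alpha=0.05):
    """One-sided power when the standard error scales with 1/sqrt(n)."""
    # The same effect, re-expressed in units of the new (smaller) standard error.
    effect_in_se = effect_in_se_at_baseline * math.sqrt(n / baseline_n)
    z = NormalDist()
    return 1.0 - z.cdf(z.inv_cdf(1.0 - alpha) - effect_in_se)

for n in (1000, 4000, 9000):
    print(n, round(power_at_sample_size(n), 2))
# 1000 0.26
# 4000 0.64
# 9000 0.91
```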


Power: main ingredients

1. Effect Size
2. Sample Size
3. Variance



What will increased variation in the underlying
population do to our estimates?

A. Increase risk of bias
B. Reduce risk of bias
C. Increase precision of estimate
D. Reduce precision of estimate
E. Will not change estimates

[Audience poll chart; bar values shown: 72%, 11%, 6%, 6%, 6%]


What does increased variation in population
do to our distribution of estimates curves?

A. Move them further apart
B. Move them closer
C. Make them fatter
D. Make them thinner
E. Don’t know

[Audience poll chart; bar values shown: 89%, 6%, 6%, 0%, 0%]
Low variance sample

[Figure: narrow H0 and Hβ curves with the significance region shaded]

Estimates will be more tightly clustered


Low variance sample

[Figure: same curves with the power region shaded]

Estimates will be more tightly clustered → higher power


Higher variance sample

[Figure: wide H0 and Hβ curves with the significance region shaded]

Estimates will be more dispersed


Variance and power: intuition
• Our program seeks to increase child height
• There is a lot of variation in height in the population
• At endline children in treatment are taller than in control
• Is this because we happened to start with taller children,
or because the program worked?
• We need a big sample to sort this out

[Figure: a population with widely varying heights, split into treatment and control]


Variance and power: intuition

• If everyone in the underlying population was of similar
height at the start, it would be easy to sort this out
– we would need a smaller sample (or have more power
with a given sample)
– the variation we see at the end between treatment and
control must be due to the program

[Figure: a population of similar heights, split into treatment and control]
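The height intuition can be put into numbers with a small sketch. Everything here is assumed for illustration: a program effect of 2 cm, 100 children per arm, and outcome standard deviations of 5 cm (similar heights) versus 10 cm (varied heights). The standard error of a difference in means with equal arms is taken as sd * sqrt(2 / n_per_arm), and the test is one-sided at 5%.

```python
import math
from statistics import NormalDist

def power_for_sd(outcome_sd, effect=2.0, n_per_arm=100, alpha=0.05):
    """One-sided power for a difference in means, given the outcome's standard deviation."""
    se = outcome_sd * math.sqrt(2.0 / n_per_arm)  # standard error of the difference
    z = NormalDist()
    return 1.0 - z.cdf(z.inv_cdf(1.0 - alpha) - effect / se)

similar_heights = power_for_sd(5.0)   # low-variance population
varied_heights = power_for_sd(10.0)   # high-variance population
print(f"similar heights: {similar_heights:.2f}, varied heights: {varied_heights:.2f}")
```

With the same sample and the same effect, the more varied population gives markedly lower power.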


Power: main ingredients

1. Effect Size
2. Sample Size
3. Variance
4. Proportion of sample in T vs. C



Sample split: 50% C, 50% T

[Figure: H0 and Hβ curves of equal width with the significance region shaded]

Equal split gives distributions that are the same “fatness”


Power: 91%

[Figure: same curves with the power region shaded]


If it’s not 50-50 split?

• What happens to the relative fatness if the split is not 50-50?
• Say 25-75?


Sample split: 25% C, 75% T

[Figure: H0 and Hβ curves of unequal width with the significance region shaded]

Uneven distributions, not efficient, i.e. less power


Power: 83%

[Figure: same curves with the power region shaded]


Allocation ratio

• Definition: the fraction of the total sample allocated to
the treatment group is the allocation ratio
• Usually, for a given sample size, power is maximized
when half the sample is allocated to treatment and half to control
• There is a diminishing marginal benefit to precision from adding
sample to either group, so it is best to add equally


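Allocation to T v C

$$\mathrm{sd}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}$$

$$\mathrm{sd}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{1}{2} + \frac{1}{2}} = \sqrt{\frac{2}{2}} = 1$$

$$\mathrm{sd}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{1}{3} + \frac{1}{1}} = \sqrt{\frac{4}{3}} = 1.15$$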
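The allocation arithmetic can be checked directly. The sketch assumes σ = 1 and a total sample of 4 split either 2/2 or 3/1, as in the worked numbers; it then assumes an effect worth 3 standard errors under the even split and a one-sided 5% test, to show how the wider standard error lowers power. The function names are illustrative.

```python
import math
from statistics import NormalDist

def sd_of_difference(n1, n2, sigma=1.0):
    """Standard deviation of the difference between two sample means."""
    return math.sqrt(sigma**2 / n1 + sigma**2 / n2)

def power(effect_in_se, alpha=0.05):
    """One-sided power when the true effect is `effect_in_se` standard errors."""
    z = NormalDist()
    return 1.0 - z.cdf(z.inv_cdf(1.0 - alpha) - effect_in_se)

even = sd_of_difference(2, 2)    # 50-50 split of 4 units
uneven = sd_of_difference(3, 1)  # 75-25 split of 4 units
print(even, round(uneven, 2))    # 1.0 1.15

inflation = uneven / even        # how much wider the curves get with the uneven split
print(round(power(3.0), 2), round(power(3.0 / inflation), 2))  # 0.91 0.83
```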
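Power equation: MDE

$$\text{EffectSize} = \left(t_{(1-\kappa)} + t_{\alpha}\right) \sqrt{\frac{1}{P(1-P)}} \sqrt{\frac{\sigma^2}{N}}$$

where EffectSize is the minimum detectable effect (MDE), $t_{(1-\kappa)}$ reflects the power, $t_{\alpha}$ the significance level, $\sigma^2$ the variance, $P$ the proportion in treatment, and $N$ the sample size.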
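A minimal calculator for the MDE formula might look like the sketch below. It swaps the t critical values for standard normal ones, which is an approximation that is reasonable only for large samples, and it uses a one-sided significance level. The function name and default values are assumptions for illustration.

```python
import math
from statistics import NormalDist

def mde(sigma, n, p=0.5, alpha=0.05, power=0.80):
    """Minimum detectable effect, using normal quantiles in place of t quantiles."""
    z = NormalDist()
    t_alpha = z.inv_cdf(1.0 - alpha)  # significance term (one-sided, an assumption)
    t_power = z.inv_cdf(power)        # power term, t_(1 - kappa)
    return (t_power + t_alpha) * math.sqrt(1.0 / (p * (1.0 - p))) * math.sqrt(sigma**2 / n)

baseline = mde(sigma=1.0, n=1000)
larger_sample = mde(sigma=1.0, n=4000)    # four times the sample halves the MDE
uneven_split = mde(sigma=1.0, n=1000, p=0.25)
print(round(baseline, 3), round(larger_sample, 3), round(uneven_split, 3))
```

The MDE falls with larger N, rises with larger σ, and is smallest at P = 0.5.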
Outline

• Introduction
• Hypothesis testing
• What influences power?
– Effect size
– Sample size
– Variance
– Proportion of sample in treatment vs. control
• Power in clustered designs
• Calculating power in practice



Randomize individuals to T or C

[Figure: individuals assigned one by one to treatment or control]


Or randomize clusters: e.g. classes

[Figure: whole classes assigned to treatment or control]


Compared to an individual-level randomized
design, to achieve the same power, a
cluster-level RCT is likely to require...

A. A smaller sample size
B. A bigger sample size
C. The same sample size
D. Don’t know

[Audience poll chart; bar values shown: 80%, 15%, 5%, 0%]


Clustered design: intuition

• You want to know how close the upcoming national
elections will be
• Method 1: Randomly select 50 people from the entire US
population
• Method 2: Randomly select 10 families, and ask five
members of each family their opinion


Low intra-cluster correlation (ICC) or ρ (rho)

[Figure: population grouped into clusters, with treatment and control clusters marked]


High intra-cluster correlation (ρ)

[Figure: population grouped into clusters, with treatment and control clusters marked]
Intra-cluster correlation definition

• Total variance can be divided into within-cluster
variance ($\tau^2$) and between-cluster variance ($\sigma^2$)
• When variance within clusters is small, the within-cluster
correlation is high, i.e. the ICC is high (previous slide)
• Definition of ICC: the proportion of total variation
explained by between-cluster variance
– Note, when within-cluster variance is high, within-cluster
correlation is low and between-cluster correlation is high
• $\mathrm{ICC} = \rho = \frac{\sigma^2}{\sigma^2 + \tau^2}$
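The ICC formula is a simple ratio, sketched below with made-up variance components. The function and argument names are illustrative; `between_cluster_var` plays the role of σ² and `within_cluster_var` the role of τ² in the formula above.

```python
def icc(between_cluster_var, within_cluster_var):
    """Intra-cluster correlation: between-cluster share of total variance."""
    return between_cluster_var / (between_cluster_var + within_cluster_var)

# Hypothetical variance components, chosen only to show the direction of the effect.
print(icc(1.0, 3.0))  # 0.25: clusters differ little, people within a cluster differ a lot
print(icc(3.0, 1.0))  # 0.75: clusters differ a lot, people within a cluster are alike
```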


How does ICC impact power?

• For a given N we have less power when we randomize
by cluster (unless ICC is zero)
• There are diminishing returns to surveying more people
per cluster
• Usually the number of clusters is the key determinant of
power, not the number of people per cluster
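One common way to express these points is the design effect, 1 + ρ(m − 1), where m is the cluster size; dividing the total sample by it gives an "effective" sample size. The sketch below uses hypothetical values (ρ = 0.2, 50 or 100 clusters, 10 or 20 people per cluster) and assumes equal-sized clusters.

```python
def design_effect(rho, m):
    """Variance inflation from cluster randomization with clusters of size m."""
    return 1.0 + rho * (m - 1)

def effective_sample_size(n_clusters, m, rho):
    """Number of independent observations the clustered sample is worth."""
    return n_clusters * m / design_effect(rho, m)

base = effective_sample_size(50, 10, 0.2)            # 500 people in 50 clusters
more_people = effective_sample_size(50, 20, 0.2)     # double the people per cluster
more_clusters = effective_sample_size(100, 10, 0.2)  # double the clusters instead
print(round(base, 1), round(more_people, 1), round(more_clusters, 1))
```

Both changes double the total sample, but under these assumptions adding clusters doubles the effective sample while adding people per cluster raises it only modestly.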


All uneducated people live in one village.
People with only primary education live in
another. College grads live in a third, etc.
ICC (ρ) on education will be...

A. High
B. Low
C. No effect on rho
D. Don’t know

[Audience poll chart; bar values shown: 83%, 17%, 0%, 0%]


If ICC (ρ) is high, what is a more efficient
way of increasing power?

A. Include more clusters in the sample
B. Interview more people in each cluster
C. Both
D. Don’t know

[Audience poll chart; bar values shown: 47%, 26%, 16%, 11%]
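Power with clustering

$$\text{EffectSize} = \left(t_{(1-\kappa)} + t_{\alpha}\right) \sqrt{\frac{1}{P(1-P)}} \sqrt{1 + \rho(m-1)} \sqrt{\frac{\sigma^2}{N}}$$

where EffectSize is the minimum detectable effect, $t_{(1-\kappa)}$ reflects the power, $t_{\alpha}$ the significance level, $\sigma^2$ the variance, $P$ the proportion in treatment, $N$ the sample size, $\rho$ the ICC, and $m$ the average cluster size.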
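The clustered formula can be sketched the same way as the individual-level one, with the extra sqrt(1 + ρ(m − 1)) term. As before, this uses standard normal quantiles in place of t quantiles and a one-sided significance level, both of which are simplifying assumptions; the function name and example values are hypothetical.

```python
import math
from statistics import NormalDist

def mde_clustered(sigma, n, rho, m, p=0.5, alpha=0.05, power=0.80):
    """Minimum detectable effect under cluster randomization (normal approximation)."""
    z = NormalDist()
    multiplier = z.inv_cdf(power) + z.inv_cdf(1.0 - alpha)  # t_(1-kappa) + t_alpha
    allocation = math.sqrt(1.0 / (p * (1.0 - p)))
    clustering = math.sqrt(1.0 + rho * (m - 1))             # design-effect term
    return multiplier * allocation * clustering * math.sqrt(sigma**2 / n)

individual = mde_clustered(sigma=1.0, n=1000, rho=0.0, m=10)  # rho = 0: no clustering penalty
clustered = mde_clustered(sigma=1.0, n=1000, rho=0.2, m=10)
print(round(individual, 3), round(clustered, 3))
```

With ρ = 0 the result matches the individual-level formula; with ρ = 0.2 and clusters of 10, the MDE is larger by a factor of sqrt(2.8).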
