Statistics for Data Analysts

Summarized Cheat Sheet - Hypothesis Testing

HYPOTHESIS TESTING

• x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ

• s₁² = [1/(n−1)] Σᵢ₌₁ⁿ (xᵢ − x̄)²

• s² = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)²
• If E(statistic) = parameter

• then the statistic is said to be an Unbiased Estimate of the parameter.

• Sample mean is an unbiased estimate of the population mean.

• This means that the average of all sample means equals the population mean.

• E(x̄) = μ

• Also, E(s₁²) = σ² and E(s²) ≠ σ²

• Unknown parameters are estimated using sample observations.

• Parameter values are fixed.

• Values of statistics vary from sample to sample.

• Each sample has some probability of being chosen.

• Each value of a statistic is associated with probability.

• Thus, Statistic is a random variable.

• Distribution of a statistic is called a sampling distribution.

• Distribution of a statistic may not be the same as the distribution of the population.

• We saw in the previous example that E(x̄) = μ and Var(x̄) = σ²/n.

• This is always true and can be proved as below:

• E(x̄) = E[(1/n) Σᵢ₌₁ⁿ xᵢ] = (1/n) Σᵢ₌₁ⁿ E(xᵢ) = (1/n) Σᵢ₌₁ⁿ μ = μ

• Var(x̄) = Var[(1/n) Σᵢ₌₁ⁿ xᵢ] = (1/n²) Σᵢ₌₁ⁿ Var(xᵢ) = (1/n²) Σᵢ₌₁ⁿ σ² = (1/n²)·nσ² = σ²/n
• The square root of the variance is generally called the standard deviation.

• Here we shall call it Standard Error.

• Different samples of the same size from the same population yield different sample means.

• Standard Error of x is a measure of the variability in different values of sample mean.
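The result Standard Error(x̄) = σ/√n can be checked with a quick simulation (a sketch using only the Python standard library; the population N(0, 1) and the sample sizes are illustrative choices, not from the text):

```python
import random
import statistics

random.seed(42)

# Draw many samples of size n from a N(0, 1) population and compare the
# spread of the sample means with the theoretical standard error sigma/sqrt(n).
n = 25
num_samples = 20000
population_sigma = 1.0

means = [statistics.fmean(random.gauss(0, population_sigma) for _ in range(n))
         for _ in range(num_samples)]

observed_se = statistics.pstdev(means)           # spread of the sample means
theoretical_se = population_sigma / n ** 0.5     # sigma/sqrt(n) = 0.2
print(round(observed_se, 2), theoretical_se)     # both ≈ 0.2
```

The observed spread of the 20,000 sample means closely matches σ/√n, illustrating that the sample mean varies less as n grows.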

Central Limit Theorem


• When the population distribution is N(μ, σ),

• then x̄ ~ N(μ, σ/√n).

• When the population distribution is not normal,

• then also x̄ ~ N(μ, σ/√n), provided n → ∞.

• Practically, this result is true for n ≥ 30.

• The result may also be written as

• (x̄ − μ)/(σ/√n) ~ N(0, 1)
• Clearly, this result is valid when

• Sample comes out of a normal population, or

• Sample size is large (n ≥ 30).

• Suppose a population has mean μ = 8 and standard deviation σ = 3.


• Suppose a random sample of size n = 36 is selected.
• What is the probability that the sample mean is between 7.75 and 8.25?
• P ( 7.75< x <8.25 ) ?
• Since x̄ ~ N(μ, σ/√n) and σ/√n = 3/√36 = 0.5, we have x̄ ~ N(8, 0.5).
• Using Excel,
• P(7.75 < x̄ < 8.25) = NORM.DIST(8.25,8,0.5,1) - NORM.DIST(7.75,8,0.5,1)
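The same probability can be reproduced in Python with the standard library's `statistics.NormalDist`, which plays the role of Excel's cumulative NORM.DIST here:

```python
from statistics import NormalDist

mu, sigma, n = 8, 3, 36
se = sigma / n ** 0.5            # standard error = 3/6 = 0.5

# Sampling distribution of the mean: x̄ ~ N(8, 0.5)
xbar = NormalDist(mu=mu, sigma=se)
p = xbar.cdf(8.25) - xbar.cdf(7.75)
print(round(p, 4))               # 0.3829
```

So roughly a 38% chance that the sample mean lands within ±0.25 of μ.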

POPULATION & SAMPLE PROPORTIONS

• X and π are population parameters.


• x and p are sample statistics.
• p provides an estimate of π .
• Note that x ~ B(n, π)
• E(x) = nπ,
• Var(x) = nπ(1 − π).
• This implies that
• E(p) = E(x/n) = π,
• Var(p) = Var(x/n) = nπ(1 − π)/n² = π(1 − π)/n.
• Standard Error(p) = √[Var(p)] = √[π(1 − π)/n]

• When the sample size n is large enough, the binomial distribution approaches the normal distribution.
• So, for large n,

• (p − π)/√[π(1 − π)/n] ~ N(0, 1).

• This is a particular case of the central limit theorem.
• Practically, this result is true for n ≥ 30,
• or when nπ ≥ 5 as well as n(1 − π) ≥ 5.

• We have seen the following 2 results:


• (x̄ − μ)/(σ/√n) ~ N(0, 1)
• This result is valid:
• When sample size is 30 or more, or
• When the parent population has a normal distribution

• (p − π)/√[π(1 − π)/n] ~ N(0, 1)
• This result is valid:
• When sample size is 30 or more, or
• When nπ ≥ 5 as well as n(1 − π) ≥ 5

• Two types of error:


• Type I Error: Reject H0, when it is true
• Size of Type I Error = P(Type I Error)
• =P(Reject H0, when it is true)
• =α (Also called Producer’s risk)
• Type II Error: Accept H0, when it is wrong
• Size of Type II Error = P(Type II Error)
• =P(Accept H0, when it is wrong)
• =β (Also called Consumer’s risk)
• Size of Type I Error (α) is called the Level of Significance.
• α is set by the researcher in advance.

• The critical value divides the whole area under the probability curve into two regions:
• Critical (Rejection) region
• When the statistical outcome falls into this region, H0 is rejected.
• Size of this region is α.
• Acceptance Region
• When the statistical outcome falls into this region, H0 is accepted.
• Size of this region is (1 − α).
Testing of Statistical Hypothesis
(One-Sample Tests)

Testing of Hypothesis for µ (z-test)


• Conditions/ Assumptions:
• Population is normal or n ≥ 30
• σ is known or n ≥ 30
• Test Statistic: Zc = (x̄ − μ)/(σ/√n)
1. Obtain the Critical Values using Excel or the Statistical Table
• Excel Formula
• For TTT (two-tailed test): NORM.S.INV(α/2) and NORM.S.INV(1-α/2)
• For RTT (right-tailed test): NORM.S.INV(1-α)
• For LTT (left-tailed test): NORM.S.INV(α)
2. p-value Approach
• Let Zc be the computed value of the test statistic and Z ~ N(0, 1)
• Then the p-value is given by the following probability
• For two-tailed tests: 2P(Z > |Zc|)
• Excel Formula: 2*(1-NORM.S.DIST(ABS(Zc),1))
• For right-tailed tests: P(Z > Zc)
• Excel Formula: 1-NORM.S.DIST(Zc,1)
• For left-tailed tests: P(Z < Zc)
• Excel Formula: NORM.S.DIST(Zc,1)
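The three z-test p-value formulas can be sketched as one Python helper (the function name `z_test_p_value` is ours, not from the source; `NormalDist().cdf` stands in for Excel's NORM.S.DIST):

```python
from statistics import NormalDist

def z_test_p_value(zc: float, tail: str) -> float:
    """p-value for a z test, mirroring the Excel formulas above."""
    z = NormalDist()         # standard normal N(0, 1)
    if tail == "two":        # 2P(Z > |Zc|)
        return 2 * (1 - z.cdf(abs(zc)))
    if tail == "right":      # P(Z > Zc)
        return 1 - z.cdf(zc)
    if tail == "left":       # P(Z < Zc)
        return z.cdf(zc)
    raise ValueError("tail must be 'two', 'right' or 'left'")

print(round(z_test_p_value(1.96, "two"), 2))   # ≈ 0.05
```

The t-test analogues (T.DIST) need a t distribution, which the standard library lacks, so only the z case is shown.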

Testing of Hypothesis for µ (t-test)

• Conditions/ Assumptions:
• n < 30; Population is normal; σ is unknown
• Test Statistic: Tc = (x̄ − μ)/(s/√n)
1. Obtain the Critical Values using the t distribution with (n−1) degrees of freedom, t(n−1).
• Excel Formula
• For TTT: T.INV(α/2,n-1) and T.INV(1-α/2,n-1)
• For RTT: T.INV(1-α,n-1)
• For LTT: T.INV(α,n-1)
2. p-value Approach in t-test
• Let Tc be the computed value of the test statistic and T ~ t(n−1)
• Then the p-value is given by the following probability
• For two-tailed tests: 2P(T > |Tc|)
• Excel Formula: 2*(1-T.DIST(ABS(Tc),n-1,1))
• For right-tailed tests: P(T > Tc)
• Excel Formula: 1-T.DIST(Tc,n-1,1)
• For left-tailed tests: P(T < Tc)
• Excel Formula: T.DIST(Tc,n-1,1)
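A worked sketch of the one-sample t-test in Python (the data and H0: μ = 50 are made-up for illustration; 2.262 is the two-tailed critical value t(9, 0.975) from a standard t table):

```python
import statistics

# Hypothetical sample; test H0: mu = 50 vs H1: mu != 50 at alpha = 0.05
sample = [51.2, 49.8, 50.5, 52.1, 48.9, 50.7, 51.5, 49.4, 50.9, 51.8]
n = len(sample)
xbar = statistics.fmean(sample)
s = statistics.stdev(sample)           # sample sd, divisor (n - 1)

t_c = (xbar - 50) / (s / n ** 0.5)     # test statistic with (n-1) = 9 d.f.

# Two-tailed decision: reject H0 if |Tc| exceeds t(9, 0.975) = 2.262
reject = abs(t_c) > 2.262
print(round(t_c, 3), reject)           # 2.053 False
```

Here |Tc| ≈ 2.05 < 2.262, so H0 is not rejected at the 5% level.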

Testing of Statistical Hypothesis


(Two Samples Tests)
• Z test for two independent samples (σ₁, σ₂ known):
  Zc = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2) ~ N(0, 1)

• Z test for two independent samples (large samples, using sample variances):
  Zc = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2) ~ N(0, 1)
• t test for two independent samples assuming equal variances:
  Tc = (x̄1 − x̄2) / [S √(1/n1 + 1/n2)] ~ t(n1 + n2 − 2),
  where S² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
• Use the t(n1 + n2 − 2) distribution for the critical value/ p-value.

• t test for two independent samples assuming unequal variances:
  Tc = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2) ~ t(f),
  where f = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]

• Paired t test: Tc = d̄ / (sd/√n) ~ t(n − 1)
• Testing the Hypothesis for Difference of Proportions:
  Zc = (p1 − p2) / √[π̂(1 − π̂)(1/n1 + 1/n2)] ~ N(0, 1),
  where π̂ = (n1 p1 + n2 p2)/(n1 + n2)

• Thus, (p1 − p2) / √[π(1 − π)(1/n1 + 1/n2)] ~ N(0, 1).
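The difference-of-proportions test can be sketched directly from these formulas (the success counts below are hypothetical):

```python
# Two-sample z test for the difference of proportions.
# Hypothetical data: 40/200 successes in sample 1, 25/250 in sample 2.
x1, n1 = 40, 200
x2, n2 = 25, 250
p1, p2 = x1 / n1, x2 / n2                 # 0.20 and 0.10

# Pooled proportion: (n1*p1 + n2*p2)/(n1 + n2) = (x1 + x2)/(n1 + n2)
pi_hat = (x1 + x2) / (n1 + n2)
se = (pi_hat * (1 - pi_hat) * (1 / n1 + 1 / n2)) ** 0.5
z_c = (p1 - p2) / se
print(round(z_c, 2))                      # ≈ 3.0
```

Since |Zc| ≈ 3.0 > 1.96, the two proportions differ significantly at α = 0.05 (two-tailed).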
• In the given example, we have three populations.
• We wish to test
• H0: π1 = π2 = π3 (All the proportions are the same)
• H1: Not all π1, π2, π3 are equal
• The table of data shown in the example is called the Contingency Table.
• Contingency Tables are used to classify sample observations according to two or more characteristics.
• A Contingency Table is useful in situations involving multiple population proportions.
• Let a contingency table have r rows and c columns.
• Then it will have r × c cells.
Chi square tests are always right tailed.

• We always have Σ(observed frequencies) = Σ(expected frequencies).

• If we approximate some expected frequency, we must make sure that the above condition is satisfied.
• In these problems, data is of discrete type
• Chi – Square distribution is a continuous distribution.
• It loses its validity if any expected frequency is less than FIVE.
• In such a case, the expected frequency is pooled with the preceding or succeeding frequency.
• The degrees of freedom are reduced by one for each such pooling.
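A sketch of the chi-square test for equal proportions on a hypothetical 2 × 3 contingency table (all expected frequencies come out ≥ 5 here, so no pooling is needed; the critical value 5.991 is χ²(2, 0.95) from a standard table):

```python
# Test H0: pi1 = pi2 = pi3 using a 2 x 3 contingency table.
observed = [[30, 25, 20],   # successes in groups 1..3 (hypothetical)
            [70, 75, 80]]   # failures

row_totals = [sum(row) for row in observed]          # [75, 225]
col_totals = [sum(col) for col in zip(*observed)]    # [100, 100, 100]
grand = sum(row_totals)                              # 300

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand    # expected frequency
        chi2 += (o - e) ** 2 / e

# Right-tailed test with d.f. = (r-1)(c-1) = 2; chi2(2, 0.95) = 5.991
print(round(chi2, 3), chi2 > 5.991)                  # 2.667 False
```

Here χ² ≈ 2.67 < 5.991, so H0 is not rejected: the data are consistent with equal proportions.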
• We do not make any assumption about the distribution of the parent population.
• The difference between two means can be examined using a t-test or Z-test.
• If we have more than 2 samples,
• we wish to test the hypothesis that
• all the samples are drawn from populations having the same mean,
• or all population means are the same.
• We use ANOVA.

• ANOVA is essentially a procedure for testing the difference among various groups of data for homogeneity.
• At its simplest, ANOVA tests the following hypotheses:
 H0: The means of all the groups are equal.
 H1: Not all the means are equal.
• ANOVA doesn't say how or which ones differ.
• Can follow up with "multiple comparisons".

ANOVA IS ALWAYS RIGHT TAILED TOO
• If the observations are large, you can shift their origin and scale.
• This will not change the result.
• Shifting origin means adding or subtracting some constant.
• Shifting of scale means multiplying or dividing by some constant.
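A one-way ANOVA F statistic can be sketched from first principles, which also verifies the shift-of-origin-and-scale claim above (the three groups of data are hypothetical; `f_statistic` is our own helper name):

```python
from statistics import fmean

def f_statistic(groups):
    """One-way ANOVA: F = (between-group MS) / (within-group MS)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f1 = f_statistic(groups)

# Shift origin (subtract 5) and scale (divide by 2): F is unchanged
shifted = [[(x - 5) / 2 for x in g] for g in groups]
f2 = f_statistic(shifted)
print(round(f1, 3), round(f2, 3))   # 12.0 12.0
```

Both calls return the same F, confirming that linear changes of origin and scale do not affect the ANOVA result.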
• Two-way analysis of variance is an extension of one-way analysis of variance.
• The variation is controlled by two factors.
• The values of the random variable X are affected by different levels of two factors.
• Assumptions
 The populations are normally distributed.
 The samples are independent.
 The variances of the populations are equal.

• HA0: All levels of Factor A have the same effect
• HA1: Not all levels of Factor A have the same effect
• HB0: All levels of Factor B have the same effect
• HB1: Not all levels of Factor B have the same effect
• HAB0: There is no interaction effect
• HAB1: There is an interaction effect
