
Basic Statistical Test

1. The document discusses using analysis of variance (ANOVA) to test differences between three or more group means. 2. ANOVA can be used to analyze data from a study that investigated differences in math performance of students based on the age of their teacher (young, middle-aged, aging). 3. The study found variation in student performance within groups that could be due to chance or other uncontrolled factors, in addition to possible effects of teacher age.

Uploaded by

Aliah Joy Junio

TESTING FOR SIGNIFICANT DIFFERENCE
t-Test

1. A t-test is a type of inferential statistic used to determine whether there is a significant difference between the means of two groups.
2. Calculating a t-test requires three key data values: the mean difference, the standard deviation of each group, and the number of data values in each group.
3. There are several types of t-test: one-sample, independent-samples, and paired-samples.
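As a rough illustration of how the three key values in item 2 combine, the sketch below computes an independent-samples t statistic from summary values. It uses the unequal-variance (Welch) form, which is one common choice and not something the slides specify; the function name and the numbers in the usage lines are hypothetical.

```python
import math

def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic (Welch form) from summary values."""
    # Standard error of the difference between the two sample means.
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se

# Hypothetical summary values, chosen only to keep the arithmetic easy to follow.
t = t_from_summary(10.0, 2.0, 8, 8.0, 2.0, 8)
print(round(t, 3))
```

A larger mean difference, smaller standard deviations, or larger groups all push the statistic further from zero.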
t-Test Assumptions

1. Scale of Measurement (Ordinal or Continuous Data)


2. Simple Random Sample
3. Normally distributed data
4. Homogeneity of Variance
Problem: Is there a difference in the mathematics performances of boys and girls?
Objective: To determine if there is a difference
Hypothesis: I think boys and girls perform similarly in math


Statistical Hypothesis

An assertion, conjecture, belief, claim, allegation, contention, guess, supposition, or theory concerning one or more populations.

Most research works are hypothesis-driven.
Types of Hypothesis

■ Null Hypothesis: a hypothesis formulated with the hope that it will be rejected at the end of the study
■ Alternative Hypothesis: the hypothesis that is accepted when the null is rejected
Types of Error
■ Type I error: The error committed when one rejects the null hypothesis when it is true
■ Type II error: The error committed when one does not reject the null hypothesis when it is false
Level of significance (alpha level)

■ The probability of falsely rejecting the null hypothesis
■ The probability of committing a Type I error
■ Traditionally, the maximum level of alpha the researcher allows is set at 1% or 5%
p-value/significance

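■ The probability, computed assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed; equivalently, the smallest level of significance at which the null hypothesis would be rejected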
Steps in hypothesis testing
■ Formulate your hypothesis
■ State the level of significance
■ Compute the test statistic value (t, F, etc.)
■ Determine the p-value/sig
■ Decide whether or not to reject the null hypothesis
– If the p-value is smaller than the level of significance, reject the null hypothesis
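The final step can be written as a one-line rule. The sketch below is only an illustration; the function name and the p-values passed to it are hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject the null hypothesis when p < alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"

# Hypothetical p-values, for illustration only.
print(decide(0.03))              # smaller than alpha = 0.05
print(decide(0.20))              # larger than alpha = 0.05
print(decide(0.03, alpha=0.01))  # the same p-value under a stricter alpha
```

The same p-value can lead to different decisions under different alpha levels, which is why the level of significance is stated before the test statistic is computed.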
Examples of comparative-type problems
■ Is the average span of human life significantly shorter than 65 years? (one sample mean test)
■ Is there a significant difference in the work performance of regular and temporary employees? (two independent sample mean test)
■ Is there a significant difference in the performance of the students in the posttest compared to their earlier performance in the pretest? (paired sample mean test)
case age at death
1 71
2 76
3 19
4 18
5 19
6 72
7 62
8 42
9 60
10 76
11 60
12 42
13 66
Steps in hypothesis testing
■ Formulate your hypothesis
■ State the level of significance
■ Compute the test statistic value (t-value)
■ Determine the p-value/sig
■ Decide whether or not to reject the null hypothesis
– If the p-value is smaller than the level of significance, reject the null hypothesis
Countless factors affect life expectancy at the individual level: nutrition, exercise, income, education, risk-taking, and stress. Life expectancy differs among countries of varying levels of development. In the Philippines, average life expectancy is believed to be 65 years. Data were collected to serve as evidence in testing this hypothesis.
The table above shows the result of the one sample mean test for the average life span today, which was hypothesized to be 65 years. The evidence gathered gives an average life span of 52.54 years, which is 12.46 years lower than the hypothesized value of 65 years. The t-value of -2.032 (computed from the 13 cases listed) with a significance of 0.065 indicates that the observed difference is not significant at the 0.05 level. This means that the evidence gathered is not enough to dispute the hypothesis, and therefore the hypothesized average life span of 65 years cannot be rejected.
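The one sample mean test can be traced by hand with the 13 ages at death listed earlier. The sketch below uses only the Python standard library; the critical value is taken from a standard t table for a two-tailed test at the 0.05 level with 12 degrees of freedom, which is an assumption of this sketch and not a figure from the slides.

```python
import math
import statistics

# Ages at death for the 13 cases listed in the table.
ages = [71, 76, 19, 18, 19, 72, 62, 42, 60, 76, 60, 42, 66]
hypothesized_mean = 65

n = len(ages)
sample_mean = statistics.mean(ages)
sample_sd = statistics.stdev(ages)  # sample SD (n - 1 in the denominator)
standard_error = sample_sd / math.sqrt(n)

t_value = (sample_mean - hypothesized_mean) / standard_error
df = n - 1

# Two-tailed critical value for alpha = 0.05 and df = 12, from a standard
# t table (an assumption of this sketch).
T_CRITICAL = 2.179

print(f"mean = {sample_mean:.2f}, t = {t_value:.3f}, df = {df}")
print("reject H0" if abs(t_value) > T_CRITICAL else "do not reject H0")
```

The absolute t-value falls short of the critical value, which agrees with the reported significance of 0.065 being larger than 0.05.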
permanent temporary
89 93
69 84
77 92
69 89
86 81
67 80
83 87
73 82
67 72
77 72
77 94
81 92
67 88
83
86
Mean
Name pretest posttest
AQUINO, MANNY 14 20
ADLOC, JUSTINE 14 19
ANCHETA, RONALD 15 21
CABISON, MARK KEVIN 9 6
CABANGON, ROLLY 5 7
CABUCOY, ALVIN 1 21
CALIBOSO, CHRISTOPHER JESTER 13 13
CAJAS, GEORGE 5 13
CRUZ, JOEVERT 9 16
DEL ROSARIO, FREDDIE 0 9
ELEAZAR, JUDILYN 3 8
ETRATA, FRANKLIN 0 14
GABOT, LEONARD 7 23
GUTIERREZ, MARVIN 14 19
IBAY, JUNALYN 12 18
JARDINEZ, PETER ANGELO 6 21
LEE LEONG, MICHAEL ANGELO 9 21
MANIBOG, IRISH JEN YVETT 4 23
MANGAMPAT, AL 0 0
MENCIAS, PRECIOS 2 16
MERANDILA, MONICA MAE 5 25
NACES, MAUREEN 1 4
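A paired sample mean test on the pretest and posttest scores above works on each student's gain. The sketch below computes the t statistic only, using the standard library; it is an illustration of the technique, not output reproduced from the slides.

```python
import math
import statistics

# (pretest, posttest) pairs in the order the students are listed above.
scores = [
    (14, 20), (14, 19), (15, 21), (9, 6), (5, 7), (1, 21), (13, 13), (5, 13),
    (9, 16), (0, 9), (3, 8), (0, 14), (7, 23), (14, 19), (12, 18), (6, 21),
    (9, 21), (4, 23), (0, 0), (2, 16), (5, 25), (1, 4),
]

# A paired test uses the per-student differences, not the two columns separately.
differences = [post - pre for pre, post in scores]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)
t_value = mean_diff / (sd_diff / math.sqrt(n))

print(f"n = {n}, mean gain = {mean_diff:.2f}, t = {t_value:.2f}, df = {n - 1}")
```

The p-value would then be read from a t distribution with n - 1 degrees of freedom, as in the steps in hypothesis testing.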
Problem Set
Analysis of Variance (F-test)
Analysis of Variance (ANOVA)
1. ANOVA is also called Fisher's analysis of variance, and it is an extension of the t-test and z-test.
2. ANOVA is used for three or more groups of data.
3. ANOVA is helpful for comparing the means of three or more groups in a single test. It is similar to running multiple two-sample t-tests.
SEVERAL INDEPENDENT SAMPLE MEAN TEST
Analysis of Variance (ANOVA)
Sample Problem
■ Based on the result of the NAT, is there a significant difference in the Mathematics performance of High School students taught by young (below 30 years), middle-aged (30 to 49), and aging (50 and above) teachers?
AGE OF TEACHERS
Young Middle Aging
28 44 33
49 32 21
46 37 23
46 32 25
38 29 35
36 29 47
48 31 30
39 29 33
30 45 38
44 28 40
33 30 27
49 40 24
44 46 20
46 39 28
40 44 31
Mean 41.07 35.67 30.33
SD 6.88 6.75 7.56
■ The purpose of research is to investigate the cause of differences in scores or the variation.
■ As researcher of this study, you suspect that a certain factor is contributory to the said variation.
■ If only the age of the teacher is the cause of variation, then the NAT result could have been like this.
AGE OF TEACHERS
Young Middle Aging
30 23 19
30 23 19
30 23 19
30 23 19
30 23 19
30 23 19
30 23 19
30 23 19
30 23 19
30 23 19
30 23 19
30 23 19
30 23 19
30 23 19
30 23 19
Mean 30 23 19
SD 0 0 0
■ Despite the fact that the highest-performing group was taught by young teachers, the students still performed differently from one another.
■ What could have caused this variation?
Two sources of variation
1. Treatments applied (Between subject)
2. Sampling Error –
a. Chance
b. Factors not considered in the study
Analysis of Variance
Source of Variation   Sum of Squares   Degrees of freedom   Mean squares        F          p-value/sig
Treatment             TrSS             trdf                 TrMS = TrSS/trdf    TrMS/EMS
Error                 ESS              edf                  EMS = ESS/edf
Total                 TSS              Tdf

■ The null hypothesis here is about the equality of all the means
• The mean performance scores of the students under the teachers of varying ages are equal
$$F = \frac{TrMS}{EMS} \approx \frac{\text{Error} + \text{mean effect of treatments}}{\text{Error}}$$

• When F → 1, treatment effects are insignificant
• When F is high, treatment effects become apparent
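The quantities in the ANOVA table can be computed directly from the NAT scores in the sample problem. The sketch below follows the table's own names (TrSS, ESS, TrMS, EMS) and uses only the standard library; it is an illustration of the arithmetic, not the output of the statistical software behind the slides.

```python
from statistics import mean

# NAT scores by teacher age group, as listed in the sample problem.
groups = {
    "Young":  [28, 49, 46, 46, 38, 36, 48, 39, 30, 44, 33, 49, 44, 46, 40],
    "Middle": [44, 32, 37, 32, 29, 29, 31, 29, 45, 28, 30, 40, 46, 39, 44],
    "Aging":  [33, 21, 23, 25, 35, 47, 30, 33, 38, 40, 27, 24, 20, 28, 31],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Treatment (between-group) sum of squares: group means around the grand mean.
tr_ss = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
# Error (within-group) sum of squares: scores around their own group mean.
e_ss = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

tr_df = len(groups) - 1
e_df = len(all_scores) - len(groups)

tr_ms = tr_ss / tr_df
e_ms = e_ss / e_df
f_value = tr_ms / e_ms

print(f"TrSS = {tr_ss:.2f}, ESS = {e_ss:.2f}, F({tr_df}, {e_df}) = {f_value:.3f}")
```

Run on the table as transcribed here, the sketch gives an F of roughly 8.64 on (2, 42) degrees of freedom. The results slide reports 9.564, so the software output there may rest on slightly different data or settings.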
Analysis of Variance
Source of Variation   Sum of Squares   Degrees of freedom   Mean squares   F        p-value/sig
Treatment             1678.00          2.00                 839.00         55.146   0.0000012
Error                 213.00           14.00                15.21
Total                 1891             16

Analysis of Variance
Source of Variation   Sum of Squares   Degrees of freedom   Mean squares   F        p-value/sig
Treatment             89.00            2.00                 44.50          2.925    0.086817769
Error                 213.00           14.00                15.21
Total                 302              16


■ The ratio of the two variances (F) must be sufficiently high to reject the null hypothesis. A higher value of F leads to a smaller p-value.
Steps

■ Null Hypo: There is no significant difference in the students’ Math performance in NAT under teachers of different age levels
■ Alternative Hypo: At least two means are significantly different
■ Alpha/level of sig: 5% (whenever I reject the null, I am willing to assume a 5% risk)
Treatments       Categories   NAT Mean   SD     F       sig
Age of teacher   Young        41.07      6.88   9.564   0.000
                 Middle       35.67      6.75
                 Old          30.33      7.56

■ The table above shows that the highest mean was observed for students taught by young teachers, followed by students taught by middle-aged teachers. Students handled by aging teachers had the lowest mean. The computed F value is 9.564 with an associated p-value of 0.000. Since the p-value is less than the 5% level of significance, the null hypothesis is rejected, which means that there is sufficient evidence to conclude that at least two means are significantly different.
Multiple Mean Comparison Test
(Post hoc or After-ANOVA test)
If the F-test is found significant, which pair(s) of means of the categories of teacher’s age is (are) significantly different?
• Young and middle?
• Middle and aging?
• Young and aging?
Mean Comparison Test using Scheffe’s test
Compared Categories   Mean difference   sig
Young vs. Middle      32.00             0.661
Young vs. Aging       22.50             0.043
Middle vs. Aging      -9.50             0.021

■ Further comparison of means using Scheffe’s test reveals that the mean observed for classes handled by young teachers is not significantly different from the mean observed for classes handled by middle-aged teachers. Compared to classes handled by aging teachers, however, the performance of the students is significantly different. Likewise, students taught by middle-aged teachers performed significantly higher than students handled by aging teachers.
Testing for Significant Relationship
Statement of the Problem
■ Is there a significant relationship between the students' performance in Biology and the extent of parents' assistance to their children?
■ Is there a significant relationship between the students' performance in Algebra and the number of hours the students spend studying the subject?
■ Is there a significant relationship between the students' performance in Biology and their:
– Height
– Weight
■ Is there a significant relationship between level of blood pressure and age?
■ Is there a significant association between smoking habit and religion?
■ Is there a significant relationship between rate of fertilization and yield of rice?
■ Is there a significant relationship between drying rate and ambient temperature?
■ Is there a significant relationship between altitude and fruiting density of mango?
Graphical representation of relationship
■ If it is possible to fit a single straight line through the points, then the relationship is perfect
[Figure: scatter plot of Y against X]
■ If a straight line is unimaginable, then this is an indication of the absence of significant relationship (independence)
Graphical representation of relationship
[Figure: four scatter plots of Y against X, labelled Perfect, Strong, Moderate, and Weak/Poor]
■ The wider the distance between the two lines that enclose the scattered points, the weaker the relationship between the variables involved
Signed relationship
[Figure: scatter plot of Y against X]
■ Positive relationship: increase in x → increase in y
■ Negative relationship: increase in x → decrease in y
Non-linear relationship
[Figure: three scatter plots labelled Perfect but nonlinear (quadratic), Moderate nonlinear (quadratic), and Strong nonlinear (exponential); the two quadratic panels show r = -0.0582 and r = -0.12332]
Simple Correlation
■ Pearson r measures the linear relationship between two normally distributed ratio variables, two interval variables, or one ratio and one interval variable

$$r = \frac{\operatorname{cov}(x, y)}{SD_x \, SD_y}$$

■ Where cov(x, y) is the covariance between x and y, and SD_x and SD_y are the standard deviations of x and y, respectively
■ Mathematically, the absolute value of the numerator cannot exceed the denominator
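The definition of Pearson r translates into a few lines of code. The sketch below uses the sample (n - 1) versions of the covariance and standard deviations, which is an assumption since the slides do not say which version they intend (the choice cancels out in the ratio); the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson r as cov(x, y) / (SD_x * SD_y), using sample (n - 1) formulas."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov_xy / (sd_x * sd_y)

# Hypothetical data: every point lies exactly on a straight line.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # rising line
print(pearson_r(x, [10, 8, 6, 4, 2]))  # falling line
```

Points that lie exactly on a rising or falling line give the two extreme values of r, up to floating-point rounding.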
Important points about Pearson r
$$-1 \le r \le 1 \;\rightarrow\; 0 \le |r| \le 1$$

■ r = 0 implies the absence of a linear relationship between the variables
■ |r| = 1 implies a perfect relationship
■ as |r| → 0 the relationship becomes weaker
■ as |r| → 1 the relationship becomes stronger
To give a meaningful explanation to each computed correlation coefficient, you can use the suggestions of Guilford (1956).

Absolute r value/contingency   Interpretation
Less than 0.2                  Slight; almost negligible relationship
Above 0.2 to 0.4               Low: definite but small relationship
Above 0.4 to 0.7               Moderate: substantial relationship
Above 0.7 to 0.9               High: marked relationship
Above 0.9 to 1.0               Very high: very dependable relationship
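The table above can be applied mechanically with a small lookup. In the sketch below the function name is hypothetical, and the handling of values that fall exactly on a boundary (placed in the lower band) is an assumption, since the table does not say where exactly 0.2 belongs.

```python
def guilford_label(r):
    """Map a correlation or contingency coefficient to Guilford's (1956) labels."""
    size = abs(r)
    if size > 1:
        raise ValueError("coefficient must lie between -1 and 1")
    # Assumption: a value exactly on a boundary falls in the lower band.
    if size <= 0.2:
        return "Slight; almost negligible relationship"
    if size <= 0.4:
        return "Low: definite but small relationship"
    if size <= 0.7:
        return "Moderate: substantial relationship"
    if size <= 0.9:
        return "High: marked relationship"
    return "Very high: very dependable relationship"

# Illustrative coefficients; the sign is ignored because the table uses |r|.
for r in (-0.95, 0.75, 0.25):
    print(r, "->", guilford_label(r))
```

The sign of r is reported separately as the direction of the relationship (positive or negative).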


Other common tools to measure relationship
■ Spearman rho measures the directional relationship between:
– 2 ordinal variables;
– 1 interval and one ordinal;
– one ratio and one ordinal;
– two interval/ratio variables that are assumed non-normal
– Relationships between variables measured on a Likert scale are suggested to be expressed in terms of Spearman rho
■ Chi-square measures the association between:
– Two nominal variables
– One nominal and one ordinal
Chi-square is a measure of non-directional relationship
Sample
■ A study was conducted to determine variables
that are significantly associated with academic
performance. The following variables are
considered:
a. Daily_Allowance
b. HS_GPA
c. Religion
d. Mothers_Age
e. Mothers_Education
f. HS_Graduated_From

Null Hypothesis: There is no significant relationship between academic performance and the
following variables: Daily allowance, High School GPA, Religion, Mothers Age, Mothers
Education, HS Graduated From
Data
Daily Allowance   HS GPA   Religion   Mother's Age   Mother's Education   HS Graduated From   Academic Performance
10 98.000 1 29 1 2 76.200
21 99.840 3 26 1 1 78.520
30 99.160 3 34 1 1 79.600
29 98.880 1 32 1 1 81.480
14 98.400 3 25 1 1 78.680
18 98.320 2 28 1 1 83.160
43 97.640 3 30 1 1 83.160
20 96.760 2 32 1 1 77.400
49 96.400 3 27 1 2 80.880
66 95.960 1 30 1 1 83.920
45 95.920 2 27 1 2 84.400
29 95.280 1 24 1 2 78.480
14 95.240 2 32 1 2 82.680
33 95.040 3 23 1 2 78.960
50 94.760 1 29 1 2 84.000
36 94.600 3 33 1 1 85.320
46 94.520 3 29 1 2 80.520
30 94.400 2 26 1 2 81.600
48 94.280 3 26 1 2 81.760
74 94.160 1 30 1 2 85.880
42 94.000 2 38 1 2 82.040
44 93.760 2 46 1 1 84.280
78 93.000 2 42 1 2 86.360
13 92.840 1 45 1 2 81.560
64 92.560 1 40 1 1 84.680
64 92.440 1 41 1 1 83.680
57 92.440 1 45 1 2 84.840
67 92.360 3 44 1 2 83.040
34 92.320 3 45 1 1 85.080
59 92.120 3 46 1 2 82.080
60 91.800 2 45 1 1 85.200
36 91.760 3 46 1 1 81.320
87 91.280 2 47 1 1 87.440
54 90.920 3 46 1 1 85.480
47 90.400 1 36 1 2 84.640
47 90.280 1 39 1 1 82.640
74 90.000 2 39 1 2 84.880
66 89.960 3 43 1 1 82.920
108 89.880 3 37 1 2 87.960
50 89.560 1 47 1 1 85.000
62 89.520 1 43 1 2 85.440
30 89.200 3 45 1 2 84.600
47 89.000 2 36 1 2 83.640
115 89.000 2 45 1 2 88.800
52 88.800 2 38 1 2 85.240
94 88.680 2 47 1 1 89.280
70 88.360 2 38 1 2 86.400
46 88.200 2 42 1 1 86.520
121 87.760 3 38 1 2 89.520
80 87.640 3 43 1 2 86.600
122 87.600 1 49 1 2 89.640
123 87.520 1 60 1 1 89.760
110 87.520 3 53 2 2 90.200
103 87.400 3 48 2 1 89.360
114 87.280 2 60 2 2 89.680
65 87.240 1 51 2 2 88.800
125 86.800 3 55 2 1 91.000
75 86.640 2 56 2 2 89.000
85 86.600 3 55 2 2 89.200
96 86.440 2 49 2 2 87.520
63 86.000 3 53 2 1 85.560
99 85.960 3 52 2 2 89.880
83 85.920 1 51 2 2 90.960
100 85.880 1 50 2 2 89.000
69 85.640 3 57 2 1 88.280
70 85.640 1 60 2 1 86.400
116 85.480 1 60 2 1 92.920
119 85.440 1 51 2 1 92.280
148 85.320 2 48 2 1 93.760
115 85.200 3 56 2 1 88.800
76 85.080 3 51 2 2 90.120
86 85.080 1 53 2 2 90.320
128 85.040 2 55 2 2 90.360
112 84.840 1 59 2 2 91.440
112 84.680 3 60 2 1 92.440
63 84.680 1 49 2 2 88.560
132 84.560 1 50 2 2 90.840
150 84.360 3 49 2 2 93.000
73 84.280 2 48 3 2 89.760
125 84.040 3 50 3 1 94.000
118 83.880 1 58 3 2 94.160
144 83.760 2 48 3 2 93.280
120 83.600 3 51 3 1 92.400
121 83.520 2 58 3 2 94.520
130 83.320 1 58 3 1 91.600
123 83.000 2 49 3 2 93.760
117 82.960 3 50 3 2 95.040
127 82.680 1 53 3 2 91.240
144 82.480 2 53 3 1 97.280
141 82.400 2 48 3 1 97.920
133 81.920 3 48 3 2 96.960
145 81.880 2 59 3 2 97.400
148 81.400 1 50 3 2 93.760
122 81.160 1 48 3 1 94.640
136 81.160 1 49 3 2 96.320
145 80.680 3 55 3 1 95.400
149 80.480 2 54 3 2 94.880
143 77.600 1 55 3 2 95.160
132 77.520 1 51 3 2 96.840
130 77.200 3 52 3 1 96.600



Use the following information
Grade
Below 80    Poor
80 to 85    Fair
85 to 90    Good
Above 90    Very Good


Variable correlated with Academic Performance   Statistic      Value    Significance   Interpretation
Daily Allowance                                 Pearson r      -0.928   0.000*         Negative and very high
HS GPA                                          Pearson r      0.929    0.000*         Positive and very high
Religion                                        Chi-square     4.313    0.634ns        Low
                                                Contingency    0.203
Mother's Age                                    Pearson r      0.793    0.000*         Positive and high
Mother's Education                              Spearman rho   0.869    0.000*         Positive and high
HS Graduated From                               Chi-square     68.00    0.000*         Moderate
                                                Contingency    0.636
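The slides do not say how the contingency values were obtained. Assuming they are Pearson's contingency coefficient, C = sqrt(chi-square / (chi-square + N)), the sketch below recomputes them from the reported chi-square values. The sample size of 100 is an assumption based on the number of rows in the data listing.

```python
import math

def contingency_coefficient(chi_square, n):
    """Pearson's contingency coefficient: C = sqrt(chi2 / (chi2 + n))."""
    return math.sqrt(chi_square / (chi_square + n))

N_CASES = 100  # assumption: number of rows in the data listing

print(round(contingency_coefficient(4.313, N_CASES), 3))
print(round(contingency_coefficient(68.00, N_CASES), 3))
```

Under these assumptions the two results round to 0.203 and 0.636, the contingency values reported for the two chi-square tests.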
Problem Set
FORECASTING
Forecasting

Forecasting is predicting or estimating the future value of a variable.
Regression Methods
Regression methods deal with establishing a mathematical
relationship between independent and dependent variables. The
variable that is to be estimated is called the dependent variable
while the variable that helps in the estimation is called the
independent variable. Simple regression deals with a linear
relationship between one dependent and one independent
variable. Multiple regression deals with one dependent variable
and two or more independent variables.
Example 1:
The following data represent the relationship between the dependent variable, Sales Revenue in millions of dollars (y), and the independent variables, Number of Sales Representatives (𝑥1) and Product Price (𝑥2):
Year   Sales Revenue (millions of dollars)   Number of Sales Representatives   Product Price
1      1.2                                   25                                0.95
2      1.5                                   25                                0.93
3      2.0                                   25                                0.92
4      3.5                                   26                                0.90
5      4.1                                   28                                0.87
6      5.6                                   28                                0.85
Answer the following:
(a) If the company intends to increase the number of sales representatives to 30, use causal linear regression to forecast next year’s revenue.
(b) If the company plans to decrease the product price to $0.82 next year, forecast next year’s sales revenue using causal linear regression.
(c) Compare the results of (a) and (b) using the coefficient of determination.
(a) Causal linear regression based on number of sales representatives:
Number of Sales Representatives (x1)   Sales Revenue (y) (millions of dollars)
25                                     1.2
25                                     1.5
25                                     2.0
26                                     3.5
28                                     4.1
28                                     5.6
Sum:
Solution:
(b) Causal linear regression based on product price:
Product Price (x2)   Sales Revenue (y) (millions of dollars)
0.95                 1.2
0.93                 1.5
0.92                 2.0
0.90                 3.5
0.87                 4.1
0.85                 5.6
Sum:
Solution:
(c) Coefficient of determination for (a)
(c) Coefficient of determination for (b)
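The worked solutions for Example 1 did not survive in this copy. As a hedged sketch of the approach the questions describe, the code below fits a least-squares line of sales revenue on each predictor separately and compares the coefficients of determination. The helper name fit_line is hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x, plus r squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared

revenue = [1.2, 1.5, 2.0, 3.5, 4.1, 5.6]      # y, millions of dollars
sales_reps = [25, 25, 25, 26, 28, 28]         # x1
price = [0.95, 0.93, 0.92, 0.90, 0.87, 0.85]  # x2

# (a) forecast with 30 sales representatives
a1, b1, r2_reps = fit_line(sales_reps, revenue)
forecast_reps = a1 + b1 * 30

# (b) forecast with a product price of $0.82
a2, b2, r2_price = fit_line(price, revenue)
forecast_price = a2 + b2 * 0.82

# (c) the model with the larger r squared explains more of the variation in revenue
print(f"(a) y = {a1:.3f} + {b1:.3f} x1, forecast = {forecast_reps:.2f}, r2 = {r2_reps:.3f}")
print(f"(b) y = {a2:.3f} + {b2:.3f} x2, forecast = {forecast_price:.2f}, r2 = {r2_price:.3f}")
```

On these six data points the price model has the higher coefficient of determination, so its forecast rests on the better-fitting line. Both forecasts extrapolate beyond the observed range of the predictor.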
Example 2:
The following data represent the industry sales (x) and Corporation ABC’s annual sales (y) of toddler clothes:
Year   Industry Sales (x) ($ millions)   ABC’s Sales (y) ($ millions)
1      1103                              105
2      1250                              117
3      1097                              110
4      955                               101
5      945                               97
6      903                               92
7      1025                              104
8      1170                              116
Answer the following
(a) If the industry estimate of next year’s sales is $1,300 million, forecast ABC’s annual sales for next year using causal linear regression.
(b) Compute the correlation coefficient and interpret its meaning.
(c) How much of the variation in ABC’s sales is explained by industry sales?
(a) If the industry estimate of next year’s sales is $1,300 million, forecast ABC’s annual sales for next year using causal linear regression.
Industry Sales (x)   ABC’s Sales (y)
1103                 105
1250                 117
1097                 110
955                  101
945                  97
903                  92
1025                 104
1170                 116
Solution:
(b) Compute the correlation coefficient and interpret its meaning.
(c) How much of the variation in ABC’s sales is explained by industry sales?
Example 3:
(a) Using the data below, forecast next year’s sales revenue using time series linear regression with time as the independent variable;
(b) Find the coefficient of determination
Year (x)   Sales (y)
1          1.2
2          1.5
3          2
4          3.5
5          4.1
6          5.6
Sum:
Solution:
