KEMBAR78
Practical | PDF | Regression Analysis | Standard Deviation
0% found this document useful (0 votes)
9 views14 pages

Practical

Uploaded by

Sailesh Sailesh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views14 pages

Practical

Uploaded by

Sailesh Sailesh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

Tribhuvan University

Institute of Science and Technology


Mahendra Morang Adarsh Multiple Campus
B.Sc. CSIT

LAB REPORT
STATISTICS II

Submitted to
Mr. Bhupal Guragain

Submitted by
Name:
Symbol No.:
Date:
Semester: 3rd

1
Practical Record Page

S.N. Name of the Experiment Date Page Remarks


01 2081/10/02
02 2081/10/09
03 2081/11/06
04 2081/11/10
05 2081/11/15
06 2081/11/19

2
Lab No. 1: Confidence Interval
Find the confidence interval of the mean assuming a normal distribution for the following
data:

height 78 55 68 48 65 76 57 55 65 75 51 61 68 67 76
78 71 56 57 67 58 51 50 58 50 77 55 48 70 55 58
70 56 52 74 61 69 76 61 68 78 56 78 57 66 66 74
66 48 73 71 70 62 74 76 50 69 75 65 48

1. Software Used
• Microsoft Excel (2016 or newer)

• Excel functions: AVERAGE, STDEV.S, COUNTA, SQRT, T.INV.2T

2. Theory
A Confidence Interval (CI) is a range within which the true population mean is
expected to lie with a certain level of confidence. For a normally distributed population
with unknown standard deviation, we use the t-distribution:
s
CI = x̄ ± zα/2 · √
n
Where:

• x̄ = sample mean

• s = sample standard deviation

• n = sample size

• zα/2 = t-score for the given confidence level

3
3. Code (Excel Formulas)
Purpose Cell Formula
Sample Mean B2 =AVERAGE(A2:A61)
Sample Std. Deviation B3 =STDEV.S(A2:A61)
Sample Size B4 =COUNTA(A2:A61)
Standard Error B5 =B3/SQRT(B4)
z-score (95%) B6 =NORM.S.INV(1-0.05/2)
Margin of Error B7 =B6*B5
CI Lower Bound B8 =B2-B7
CI Upper Bound B9 =B2+B7

4. Result
• Sample Size (n): 60

• Sample Mean (x̄): 63.88

• Sample Std. Deviation (s): 9.55

• t-score (95% ): 1.96

• Margin of Error: 2.46

• Confidence Interval: (61.42, 66.35)

4
Lab No. 2: Two-Sample Z-Test for the Difference of
Means (Known σ)
Problem: Test whether two sample means are significantly different if they are selected
from populations with standard deviations σ1 = 840 and σ2 = 940, respectively. The
following data are given:

y 14 1 65 95 18 27 62 75 8 84 704 63 9 6 11
x 6 20 16 91 88 15 5 59 69 19 73 39 86 7 85
y 22 34 8 83 68 64 115 57 32 99 98 39 38 96 92
x 52 27 2 49 76 72 77 71 95 55 68 29 97 17 98

1. Software Used
• Microsoft Excel (2016 or newer)

• Excel Functions: AVERAGE, COUNT, SQRT

2. Theory
A two-sample Z-test is used to determine whether two population means are signif-
icantly different when population standard deviations are known. The formula for the
Z-test statistic is:
x̄1 − x̄2
Z=q 2
σ1 σ22
n1
+ n2

Where:

• x̄1 , x̄2 = sample means

• n1 , n2 = sample sizes

• σ1 , σ2 = population standard deviations

3. Code (Excel Formulas)


Purpose Cell Formula
Mean of y B2 =AVERAGE(range_of_y)
Mean of x B3 =AVERAGE(range_of_x)
Sample size y B4 =COUNT(range_of_y)
Sample size x B5 =COUNT(range_of_x)
Z-statistic B6 =(B2-B3)/SQRT((840^2/B4)+(940^2/B5))

5
4. Solution (Values)
• Sample Size of y (n1 ): 30

• Sample Size of x (n2 ): 30

• Mean of y (x̄1 ): 66.07

• Mean of x (x̄2 ): 54.93

• Population SDs: σ1 = 840, σ2 = 940

Hypotheses

• H0 : µ1 = µ2 (no significant difference)

• H1 : µ1 6= µ2 (significant difference exists)

Test statistics
66.07 − 54.93 11.14 11.14 11.14
Z= q =√ =√ = ≈ 0.0484
8402 9402
+ 30 23520 + 29413.33 52933.33 230.0
30

5. Result
• Z-calculated = 0.0484

• Z-critical (95% confidence, two-tailed): ±1.96

• Since |Z| < 1.96, we fail to reject the null hypothesis.

• Conclusion: The two sample means are not significantly different.

6
Lab No. 3: Paired t Test
Problem: Memory capacity of 10 students was tested before and after training. State
whether the training was effective or not using the following scores:

Roll No. 1 2 3 4 5 6 7 8 9 10
Before training 12 14 11 8 7 10 3 0 5 6
After training 15 16 10 7 5 12 10 2 3 8

1. Software Used
• Microsoft Excel (2016 or newer)

• Excel functions: AVERAGE, STDEV.S, SQRT, COUNT, basic arithmetic operations

2. Theory
A paired t-test compares two related samples to determine whether their means differ
significantly. The test statistic is calculated using:


t= √
sd / n
Where:

• d:
¯ mean of the differences (After - Before)

• sd : standard deviation of the differences

• n: number of pairs

The null hypothesis H0 : There is no significant difference in means (training is not


effective). The alternative hypothesis H1 : There is a significant difference in means
(training is effective).

3. Code (Excel Formulas)


Purpose Formula
Difference (di ) =After - Before
Mean of differences (d)
¯ =AVERAGE(d_range)
Std. dev. of differences (sd ) =STDEV.S(d_range)
Sample size (n) =COUNT(d_range)
Standard Error =s_d/SQRT(n)
t-statistic =mean_d / SE

7
4. Solution (Step-by-Step)
• Differences di = [3, 2, -1, -1, -2, 2, 7, 2, -2, 2]

• d¯ = 1.2

• sd = 2.74

• n = 10

• Standard Error = 2.74



10
= 0.867

• t= 1.2
0.867
≈ 1.384

Degrees of Freedom: df = n − 1 = 9

5. Result
• t-calculated = 1.384

• t-critical (two-tailed, 95% CI, df = 9) = 2.262

• Since |t| < tcritical , we fail to reject the null hypothesis.

• Conclusion: The training was not significantly effective.

8
Lab no. 4
The Following table represents the layout of CRD of four treatments.

A (9) B (14) D (11) C (10)


D (8) A (14) B (13) C (16)
B (7) C (12) D (5) A (11)
A (14) B (12) C (6) D (5)

At 5% level of significance test whether there is significant difference between mean


of 4 treatments.

Hypothesis:
• H0 : There is no significant difference between treatments.
• H1 : There is significant difference between treatments.
Alpha = 5%

Test statistics:
• Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
Row 1 4 48 12 6
9.66666
Row 2 4 46 11.5 7
17.3333
Row 3 4 44 11 3
Row 4 4 29 7.25 8.25

ANOVA
Source of Variation SS df MS F P-value F.crit
Treatment 56.1875 3 18.7291 1.81616 0.19784 3.49029
Error 123.75 12 10.3125
Total 179.937 15

Decision
Since Fcalc < Ftab so we accept H0 .
Hence we conclude that there is no significant difference between mean of 4 treatments.

9
Lab no: 5
The following table gives the result of the experiment on four varieties of a crop in 5
blocks of plot.

Block I Block II Block III Block IV Block V


A 32 B 33 D 30 A 35 C 36
B 34 C 34 C 35 C 32 D 29
C 31 A 34 B 36 B 37 A 37
D 29 D 26 A 33 D 28 B 35

Analyse the above result to test whether there is significant difference between yields
of four varieties and also test whether blocks are homogenous or not.

Hypothesis:
H0T : There is no significant difference between treatments.
H1T : There is significant difference between treatments.
H0B : There is no significant difference between blocks.
H1B : There is significant difference between blocks.

Level of significance:
α = 5%

Test statistics:
Anova: Two-Factor Without Replication

SUMMARY Count Sum Average Variance


A 5 171 34.2 3.7
B 5 175 35 2.5
C 5 168 33.6 4.3
D 5 148 29.6 11.3

Decision:
In both cases, fcal > ftab , so we reject H0 .
Hence we conclude that there is significant difference between treatments and blocks.

10
Count Sum Average Variance
I 4 126 31.5 4.33333
II 4 127 31.75 14.9166
III 4 134 33.5 7
IV 4 132 33 15.3333
V 4 143 35.75 0.91666

Source of Variation SS df MS F P-value F crit


Treatments 86.6 3 28.8666 8.46943 0.00272 3.49029
Blocks 46.3 4 11.575 3.40833 0.04456 3.25916
Error 40.9 12 3.40833
Total 173.8 19

Lab no: 6
22. The table shows the corresponding values of the three variables X1 , X2 , and X3 :

X1 5 7 8 6 10 9
X2 12 20 30 40 33 25
X3 51 55 58 60 70 66
Find the regression equation of X1 on X2 and X3 . Estimate X1 when X2 = 50 and
X3 = 100.
Where X1 represents pull strength, X2 represents wire length and X3 represents die height

Step 1: Computation Table


X1 X2 X3 X12 X22 X32 X1 X2 X 1 X3 X2 X3
5 12 51 25 144 2601 60 255 612
7 20 55 49 400 3025 140 385 1100
8 30 58 64 900 3364 240 464 1740
6 40 60 36 1600 3600 240 360 2400
10 33 70 100 1089 4900 330 700 2310
9 25 66 81 625 4356 225 594 1650
45 160 360 355 4758 21846 1235 2758 9812

11
Step 2: Normal Equations
We assume the multiple linear regression model:

X1 = a + bX2 + cX3

The normal equations using summation are:


X X X
X1 = na + b X2 + c X3
X X X X
X1 X2 = a X2 + b X22 + c X 2 X3
X X X X
X1 X3 = a X3 + b X 2 X3 + c X32
Substituting the values from the table:

45 = 6a + 160b + 360c (1)


1241 = 160a + 4268b + 9812c (2)
2713 = 360a + 9812b + 21746c (3)
Solving these three simultaneous equations (by matrix or substitution method), we
get approximate values:

a ≈ −2.5, b ≈ 0.2, c ≈ 0.1

Step 3: Regression Equation


X1 = −2.5 + 0.2X2 + 0.1X3

Step 4: Estimation
For X2 = 50 and X3 = 100:

X1 = −2.5 + 0.2(50) + 0.1(100) = −2.5 + 10 + 10 = 17.5


Estimated value of X1 is 17.5.

Using Microsoft Excel. First, the data were entered into three separate columns for X1 ,
X2 , and X3 . Then, the Data Analysis tool was accessed from the Data tab. The
Regression option was selected, and the input ranges for the dependent variable (X1 )
and the independent variables (X2 and X3 ) were specified. After appropriate options were
selected, including checking the Labels box and specifying an output range, the regression
output was generated. From the output, the coefficients of the regression equation were
extracted, and the estimation for X1 was calculated accordingly.

12
Lab No. 7
Following are the scores obtained by trainees in 3 different categories. Test whether 3
categories have performed equally.

Kruskal-Wallis H Test (Corrected Solution)


Given: Scores of 3 categories A, B, and C

• A: 67, 77, 79, 72, 64

• B: 55, 50, 58, 57, 54

• C: 66, 70, 69, 62, 68

Step 1: Combine all data and assign ranks


There are N = 15 total values. Ranks are assigned from smallest (1) to largest (15):

Score Rank
50 1
54 2
55 3
57 4
58 5
62 6
64 7
66 8
67 9
68 10
69 11
70 12
72 13
77 14
79 15

Step 2: Compute rank sums

• A (n = 5): 67(9), 77(14), 79(15), 72(13), 64(7)


R1 = 9 + 14 + 15 + 13 + 7 = 58

• B (n = 5): 55(3), 50(1), 58(5), 57(4), 54(2)


R2 = 3 + 1 + 5 + 4 + 2 = 15

13
• C (n = 5): 66(8), 70(12), 69(11), 62(6), 68(10)
R3 = 8 + 12 + 11 + 6 + 10 = 47

Step 3: Apply Kruskal-Wallis formula


 2
R1 R22 R32

12
H= + + − 3(N + 1)
N (N + 1) n1 n2 n3
 2
152 472

12 58
H= + + − 3 · 16
15 · 16 5 5 5
 
12 3364 225 2209
H= + + − 48
240 5 5 5
12 12
H= (672.8 + 45 + 441.8) − 48 = · 1159.6 − 48
240 240

H = 0.05 · 1159.6 − 48 = 57.98 − 48 = 9.98


Step 4: Conclusion
Degrees of freedom: k − 1 = 2 Chi-square critical value at α = 0.05: χ20.05,2 = 5.991
Since H = 9.98 > 5.991, we reject the null hypothesis.
Conclusion: There is a statistically significant difference between the three categories
at the 5% level.

14

You might also like