Simple Linear Regression 35
Problems
1. Consider a set of data $(x_i, y_i)$, $i = 1, 2, \ldots, n$, and the following two
regression models:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad (i = 1, 2, \ldots, n), \quad \text{Model A}$$
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i, \quad (i = 1, 2, \ldots, n), \quad \text{Model B}$$
Suppose both models are fitted to the same data. Show that
$$SS_{Res,A} \ge SS_{Res,B}.$$
If more higher order terms are added to Model B, i.e.,
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \cdots + \beta_k x_i^k + \varepsilon_i, \quad (i = 1, 2, \ldots, n),$$
show that the inequality $SS_{Res,A} \ge SS_{Res,B}$ still holds.
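A quick numerical check can make the claim in Problem 1 concrete before attempting the proof. The sketch below fits polynomials of degree 1, 2, and 3 by solving the normal equations and compares their residual sums of squares. The data values and the helper names `solve` and `ss_res_poly` are made up for illustration; they are not from the text.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def ss_res_poly(x, y, degree):
    """Residual sum of squares of a least squares polynomial fit."""
    p = degree + 1
    # Normal equations (X'X) b = X'y, where X has columns x^0, ..., x^degree.
    xtx = [[sum(xi ** (j + k) for xi in x) for k in range(p)] for j in range(p)]
    xty = [sum((xi ** j) * yi for xi, yi in zip(x, y)) for j in range(p)]
    b = solve(xtx, xty)
    fitted = [sum(b[j] * xi ** j for j in range(p)) for xi in x]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical data, chosen only to illustrate the inequality.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 7.4]

# Degree 1 is Model A, degree 2 is Model B, degree 3 adds one more term.
ss = [ss_res_poly(x, y, d) for d in (1, 2, 3)]
print(ss)
```

The residual sum of squares cannot increase as terms are added, because each smaller model is a special case of the larger one with the extra coefficients set to zero.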
2. Consider the zero intercept model given by
$$y_i = \beta_1 x_i + \varepsilon_i, \quad (i = 1, 2, \ldots, n)$$
where the $\varepsilon_i$'s are independent normal variables with constant variance
$\sigma^2$. Show that the $100(1-\alpha)\%$ confidence interval on $E(y|x_0)$ is given
by
$$b_1 x_0 \pm t_{\alpha/2,\, n-1}\; s \sqrt{\frac{x_0^2}{\sum_{i=1}^{n} x_i^2}}$$
where $s = \sqrt{\sum_{i=1}^{n} (y_i - b_1 x_i)^2/(n-1)}$ and $b_1 = \dfrac{\sum_{i=1}^{n} y_i x_i}{\sum_{i=1}^{n} x_i^2}$.
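The interval in Problem 2 can be evaluated directly once $b_1$ and $s$ are computed. The following is a minimal sketch, assuming an interval centered at $b_1 x_0$ with half-width $t_{\alpha/2,n-1}\, s \sqrt{x_0^2/\sum x_i^2}$. The function name `zero_intercept_ci` and the data are hypothetical, and the critical value is passed in from a standard t table rather than computed.

```python
import math


def zero_intercept_ci(x, y, x0, t_crit):
    """Confidence interval on E(y | x0) for the model y = beta1 * x + error."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)                      # sum of x_i^2
    b1 = sum(xi * yi for xi, yi in zip(x, y)) / sxx     # slope estimate
    ss_res = sum((yi - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_res / (n - 1))                     # n - 1 degrees of freedom
    half_width = t_crit * s * math.sqrt(x0 * x0 / sxx)
    center = b1 * x0
    return b1, s, (center - half_width, center + half_width)


# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# t_{0.025, 4} = 2.776 is taken from a standard t table (n - 1 = 4).
b1, s, (lo, hi) = zero_intercept_ci(x, y, x0=3.0, t_crit=2.776)
print(b1, s, lo, hi)
```

The half-width shrinks to zero as $x_0$ approaches 0, since the fitted line is forced through the origin.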
3. Derive and discuss the $(1-\alpha)100\%$ confidence interval on the slope
$\beta_1$ for the simple linear model with zero intercept.
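One possible outline for Problem 3 is sketched below. It assumes the same setup as Problem 2 (independent normal errors with constant variance $\sigma^2$, $b_1 = \sum x_i y_i / \sum x_i^2$, and $s^2$ computed with $n-1$ degrees of freedom); the intermediate distributional facts are stated without proof and still need to be justified in a full solution.

```latex
% Sketch only: each step below must be justified in a complete derivation.
\begin{align*}
b_1 &= \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}
      \sim N\!\left(\beta_1,\ \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2}\right), \\
\frac{(n-1)s^2}{\sigma^2} &\sim \chi^2_{n-1}
      \quad \text{independently of } b_1, \\
\frac{b_1 - \beta_1}{s \big/ \sqrt{\sum_{i=1}^{n} x_i^2}} &\sim t_{n-1}, \\
\text{so the interval is}\quad
b_1 &\pm t_{\alpha/2,\, n-1}\, \frac{s}{\sqrt{\sum_{i=1}^{n} x_i^2}} .
\end{align*}
```

Compared with the model that includes an intercept, the degrees of freedom are $n-1$ rather than $n-2$, and the denominator uses $\sum x_i^2$ rather than $\sum (x_i - \bar{x})^2$.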
4. Consider the fixed zero intercept regression model
$$y_i = \beta_1 x_i + \varepsilon_i, \quad (i = 1, 2, \ldots, n)$$
The appropriate estimator of $\sigma^2$ is given by
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-1}$$
Show that $s^2$ is an unbiased estimator of $\sigma^2$.
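A small simulation can illustrate (not prove) the unbiasedness claim in Problem 4. The sketch below repeatedly generates data from a zero intercept model and averages $s^2$ over the replications. The true slope, the error standard deviation, the design points, and the seed are all arbitrary choices made for this illustration.

```python
import random


def s2_zero_intercept(x, y):
    """Residual mean square with divisor n - 1 for the zero intercept model."""
    sxx = sum(xi * xi for xi in x)
    b1 = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    ss_res = sum((yi - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return ss_res / (len(x) - 1)


rng = random.Random(0)            # fixed seed so the run is reproducible
beta1, sigma = 2.0, 1.5           # hypothetical true slope and error s.d.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
reps = 20000

total = 0.0
for _ in range(reps):
    y = [beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    total += s2_zero_intercept(x, y)
mean_s2 = total / reps

# The average should be close to sigma^2 = 2.25 if s^2 is unbiased.
print(mean_s2, sigma ** 2)
```

Replacing the divisor `len(x) - 1` with `len(x) - 2` in this sketch would make the average drift above $\sigma^2$, which is one way to see why only one degree of freedom is lost here.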
36 Linear Regression Analysis: Theory and Computing
Table 2.10 Data for Two Parallel Regression Lines
x                  y
$x_1$              $y_1$
$\vdots$           $\vdots$
$x_{n_1}$          $y_{n_1}$
$x_{n_1+1}$        $y_{n_1+1}$
$\vdots$           $\vdots$
$x_{n_1+n_2}$      $y_{n_1+n_2}$
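5. Consider a situation in which the regression data set is divided into two
parts as shown in Table 2.10.
The regression model is given by
$$y_i = \begin{cases} \beta_0^{(1)} + \beta_1 x_i + \varepsilon_i, & i = 1, 2, \ldots, n_1; \\ \beta_0^{(2)} + \beta_1 x_i + \varepsilon_i, & i = n_1 + 1, \ldots, n_1 + n_2. \end{cases}$$
In other words, there are two regression lines with a common slope. Using
the centered regression model
$$y_i = \begin{cases} \beta_0^{(1)} + \beta_1 (x_i - \bar{x}_1) + \varepsilon_i, & i = 1, 2, \ldots, n_1; \\ \beta_0^{(2)} + \beta_1 (x_i - \bar{x}_2) + \varepsilon_i, & i = n_1 + 1, \ldots, n_1 + n_2, \end{cases}$$
where $\bar{x}_1 = \sum_{i=1}^{n_1} x_i / n_1$ and $\bar{x}_2 = \sum_{i=n_1+1}^{n_1+n_2} x_i / n_2$. Show that the least
squares estimate of $\beta_1$ is given by
$$b_1 = \frac{\sum_{i=1}^{n_1} (x_i - \bar{x}_1) y_i + \sum_{i=n_1+1}^{n_1+n_2} (x_i - \bar{x}_2) y_i}{\sum_{i=1}^{n_1} (x_i - \bar{x}_1)^2 + \sum_{i=n_1+1}^{n_1+n_2} (x_i - \bar{x}_2)^2}$$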
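The common slope estimator in Problem 5 pools the centered cross products and the centered sums of squares from the two groups. The sketch below computes it for two noise-free parallel lines, where the estimate should recover the shared slope exactly. The function name `common_slope`, the intercepts, and the slope 0.5 are hypothetical choices for illustration.

```python
def common_slope(x1, y1, x2, y2):
    """Pooled least squares slope for two parallel regression lines."""
    xbar1 = sum(x1) / len(x1)
    xbar2 = sum(x2) / len(x2)
    # Numerator: centered cross products, each group centered at its own mean.
    num = sum((xi - xbar1) * yi for xi, yi in zip(x1, y1))
    num += sum((xi - xbar2) * yi for xi, yi in zip(x2, y2))
    # Denominator: within-group sums of squares of x.
    den = sum((xi - xbar1) ** 2 for xi in x1)
    den += sum((xi - xbar2) ** 2 for xi in x2)
    return num / den


# Two noise-free lines with different intercepts and the same slope 0.5.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0]
y1 = [1.0 + 0.5 * xi for xi in x1]
y2 = [4.0 + 0.5 * xi for xi in x2]

b1 = common_slope(x1, y1, x2, y2)
print(b1)
```

Centering each group at its own mean removes the two intercepts from the normal equation for the slope, which is why the estimator does not depend on them.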
6. Consider two simple linear models
$$Y_{1j} = \alpha_1 + \beta_1 x_{1j} + \varepsilon_{1j}, \quad j = 1, 2, \ldots, n_1$$
and
$$Y_{2j} = \alpha_2 + \beta_2 x_{2j} + \varepsilon_{2j}, \quad j = 1, 2, \ldots, n_2$$
Assume that $\beta_1 \ne \beta_2$, so that the above two simple linear models intersect. Let
$x_0$ be the point on the x-axis at which the two linear models intersect.
Also assume that the $\varepsilon_{ij}$ are independent normal variables with a common variance
$\sigma^2$. Show that
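(a). $x_0 = \dfrac{\alpha_1 - \alpha_2}{\beta_2 - \beta_1}$.
(b). Find the maximum likelihood estimate (MLE) of $x_0$ using the
least squares estimators $\hat{\alpha}_1$, $\hat{\alpha}_2$, $\hat{\beta}_1$, and $\hat{\beta}_2$.
(c). Show that the distribution of $Z$, where
$$Z = (\hat{\alpha}_1 - \hat{\alpha}_2) + x_0(\hat{\beta}_1 - \hat{\beta}_2),$$
is the normal distribution with mean 0 and variance $A^2\sigma^2$, where
$$A^2 = \frac{\sum x_{1j}^2 - 2x_0 \sum x_{1j} + x_0^2 n_1}{n_1 \sum (x_{1j} - \bar{x}_1)^2} + \frac{\sum x_{2j}^2 - 2x_0 \sum x_{2j} + x_0^2 n_2}{n_2 \sum (x_{2j} - \bar{x}_2)^2}.$$
(d). Show that $U = N\hat{\sigma}^2/\sigma^2$ is distributed as $\chi^2(N)$, where $N = n_1 + n_2 - 4$.
(e). Show that $U$ and $Z$ are independent.
(f). Show that $W = Z^2/(A^2\hat{\sigma}^2)$ has the $F$ distribution with degrees of
freedom 1 and $N$.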
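(g). Let $S_1^2 = \sum (x_{1j} - \bar{x}_1)^2$ and $S_2^2 = \sum (x_{2j} - \bar{x}_2)^2$. Consider the
following quadratic equation in $x_0$, $q(x_0) = a x_0^2 + 2b x_0 + c = 0$:
$$\left[ (\hat{\beta}_1 - \hat{\beta}_2)^2 - \left( \frac{1}{S_1^2} + \frac{1}{S_2^2} \right) \hat{\sigma}^2 F_{\alpha,1,N} \right] x_0^2$$
$$+\; 2 \left[ (\hat{\alpha}_1 - \hat{\alpha}_2)(\hat{\beta}_1 - \hat{\beta}_2) + \left( \frac{\bar{x}_1}{S_1^2} + \frac{\bar{x}_2}{S_2^2} \right) \hat{\sigma}^2 F_{\alpha,1,N} \right] x_0$$
$$+ \left[ (\hat{\alpha}_1 - \hat{\alpha}_2)^2 - \left( \frac{\sum x_{1j}^2}{n_1 S_1^2} + \frac{\sum x_{2j}^2}{n_2 S_2^2} \right) \hat{\sigma}^2 F_{\alpha,1,N} \right] = 0.$$
Show that if $a > 0$ and $b^2 - ac \ge 0$, then the $1-\alpha$ confidence interval
on $x_0$ is
$$\frac{-b - \sqrt{b^2 - ac}}{a} \le x_0 \le \frac{-b + \sqrt{b^2 - ac}}{a}.$$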
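The interval in part (g) can be computed by forming the coefficients $a$, $b$, and $c$ from the two fitted lines and solving the quadratic. The sketch below is one way to do this under stated assumptions: $\hat{\sigma}^2$ is taken to be the pooled residual mean square with $N = n_1 + n_2 - 4$ degrees of freedom (the text does not define it), the critical value $F_{\alpha,1,N}$ is supplied from a table, and the data and function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Least squares fit; returns intercept, slope, xbar, Sxx, SS_Res."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, xbar, sxx, ss_res


def intersection_interval(x1, y1, x2, y2, f_crit):
    """Point estimate and interval for the x-coordinate of the intersection."""
    a1, b1, xb1, s1, r1 = fit_line(x1, y1)
    a2, b2, xb2, s2, r2 = fit_line(x2, y2)
    n1, n2 = len(x1), len(x2)
    # Assumption: sigma^2 is estimated by the pooled residual mean square.
    k = (r1 + r2) / (n1 + n2 - 4) * f_crit
    a = (b1 - b2) ** 2 - (1 / s1 + 1 / s2) * k
    b = (a1 - a2) * (b1 - b2) + (xb1 / s1 + xb2 / s2) * k
    sq1 = sum(v * v for v in x1) / (n1 * s1)
    sq2 = sum(v * v for v in x2) / (n2 * s2)
    c = (a1 - a2) ** 2 - (sq1 + sq2) * k
    x0_hat = (a1 - a2) / (b2 - b1)
    disc = b * b - a * c
    if a <= 0 or disc < 0:
        return x0_hat, None          # the conditions of part (g) fail
    root = math.sqrt(disc)
    return x0_hat, ((-b - root) / a, (-b + root) / a)


# Hypothetical data: lines near y = 1 + 2x and y = 7 + 0.5x, crossing near x = 4.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y1 = [3.1, 4.9, 7.1, 8.9, 11.1, 12.9]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y2 = [7.4, 8.1, 8.4, 9.1, 9.4, 10.1]

# F_{0.05, 1, 8} = 5.32 from a standard F table (N = 6 + 6 - 4 = 8).
x0_hat, interval = intersection_interval(x1, y1, x2, y2, f_crit=5.32)
print(x0_hat, interval)
```

When the slopes are not clearly different, $a$ can be zero or negative and the set of $x_0$ values satisfying $q(x_0) \le 0$ is no longer a bounded interval, which is why the sketch returns `None` in that case.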
7. Observations on the yield of a chemical reaction taken at various
temperatures were recorded in Table 2.11:
(a). Fit a simple linear regression and estimate $\beta_0$ and $\beta_1$ using the
least squares method.
(b). Compute 95% confidence intervals on $E(y|x)$ at the 4 temperature
levels in the data. Plot the upper and lower confidence limits
around the regression line.
Table 2.11 Chemical Reaction Data
Temperature (°C) Yield of Chemical Reaction (%)
150 77.4
150 77.4
150 77.4
150 77.4
150 77.4
150 77.4
150 77.4
150 77.4
150 77.4
150 77.4
150 77.4
150 77.4
Data Source: Raymond H. Myers, Classical and Modern Regression Analysis With Applications, p. 77.
(c). Plot a 95% confidence band on the regression line. Plot it on the
same graph as part (b) and comment on it.
8. The study "Development of LIFETEST, a Dynamic Technique to Assess
Individual Capability to Lift Material" was conducted at Virginia
Polytechnic Institute and State University in 1982 to determine whether
certain static arm strength measures have an influence on the dynamic lift
characteristics of individuals. Twenty-five individuals were subjected to strength
tests and then were asked to perform a weight-lifting test in which
weight was dynamically lifted overhead. The data are in Table 2.12:
(a). Find the linear regression line using the least squares method.
(b). Define the joint hypothesis $H_0$: $\beta_0 = 0$, $\beta_1 = 2.2$. Test this
hypothesis using a 95% joint confidence region on $\beta_0$ and
$\beta_1$ to draw your conclusion.
(c). Calculate the studentized residuals for the regression model. Plot
the studentized residuals against $x$ and comment on the plot.
Table 2.12 Weight-lifting Test Data
Individual Arm Strength (x) Dynamic Lift (y)
1 17.3 71.4
2 19.5 48.3
3 19.5 88.3
4 19.7 75.0
5 22.9 91.7
6 23.1 100.0
7 26.4 73.3
8 26.8 65.0
9 27.6 75.0
10 28.1 88.3
11 28.1 68.3
12 28.7 96.7
13 29.0 76.7
14 29.6 78.3
15 29.9 60.0
16 29.9 71.7
17 30.3 85.0
18 31.3 85.0
19 36.0 88.3
20 39.5 100.0
21 40.4 100.0
22 44.3 100.0
23 44.6 91.7
24 50.4 100.0
25 55.9 71.7
Data Source: Raymond H. Myers, Classical and Modern Regression Analysis With Applications, p. 76.
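The computations asked for in Problem 8 can be sketched with the Table 2.12 data as follows. This is an illustration under stated assumptions rather than a worked solution: the joint test uses the usual quadratic form based on $X'X$ and the residual mean square, the critical value $F_{0.05,2,23} = 3.42$ is taken from a standard F table, and "studentized" is interpreted as internally studentized residuals $e_i / (s\sqrt{1 - h_{ii}})$. Variable names are chosen for this sketch only.

```python
import math

# Table 2.12: arm strength (x) and dynamic lift (y) for 25 individuals.
x = [17.3, 19.5, 19.5, 19.7, 22.9, 23.1, 26.4, 26.8, 27.6, 28.1, 28.1, 28.7,
     29.0, 29.6, 29.9, 29.9, 30.3, 31.3, 36.0, 39.5, 40.4, 44.3, 44.6, 50.4, 55.9]
y = [71.4, 48.3, 88.3, 75.0, 91.7, 100.0, 73.3, 65.0, 75.0, 88.3, 68.3, 96.7,
     76.7, 78.3, 60.0, 71.7, 85.0, 85.0, 88.3, 100.0, 100.0, 100.0, 91.7, 100.0, 71.7]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# (a) Least squares estimates of slope and intercept.
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
s2 = sum(e * e for e in resid) / (n - 2)        # residual mean square

# (b) Joint test of H0: beta0 = 0, beta1 = 2.2.
# The point (0, 2.2) lies outside the 95% joint region exactly when f0 > f_crit.
d0, d1 = b0 - 0.0, b1 - 2.2
sum_x = sum(x)
sum_x2 = sum(xi * xi for xi in x)
f0 = (n * d0 ** 2 + 2 * sum_x * d0 * d1 + sum_x2 * d1 ** 2) / (2 * s2)
f_crit = 3.42                                    # F_{0.05, 2, 23}, table value
reject_h0 = f0 > f_crit

# (c) Internally studentized residuals, using leverage h_ii of each point.
leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
student = [e / math.sqrt(s2 * (1 - h)) for e, h in zip(resid, leverage)]

print(b0, b1, f0, reject_h0)
```

The list `student` can then be plotted against `x` with any plotting tool to look for patterns such as curvature or unusually large residuals.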