Training Data Analysis
Multivariate Linear Regression
BASIC PRINCIPLE
Novandri Kusuma Wardana

A. The Basic Principle
We consider the multivariate extension of multiple
linear regression: modeling the relationship
between m responses Y_1, ..., Y_m and a single set of r
predictor variables z_1, ..., z_r. Each of the m responses is
assumed to follow its own regression model, i.e.,

  Y_1 = β_01 + β_11 z_1 + β_21 z_2 + ... + β_r1 z_r + ε_1
  Y_2 = β_02 + β_12 z_1 + β_22 z_2 + ... + β_r2 z_r + ε_2
  ...
  Y_m = β_0m + β_1m z_1 + β_2m z_2 + ... + β_rm z_r + ε_m

where the error vector ε = [ε_1, ..., ε_m]' satisfies E[ε] = 0 and Var(ε) = Σ.
Conceptually, we can let

  z_j = [z_j0, z_j1, ..., z_jr]

denote the values of the predictor variables for the jth
trial and

  Y_j = [Y_j1, Y_j2, ..., Y_jm]',   ε_j = [ε_j1, ε_j2, ..., ε_jm]'

be the responses and errors for the jth trial. Thus we
have an n x (r + 1) design matrix

      | z_10  z_11  ...  z_1r |
  Z = | z_20  z_21  ...  z_2r |
      |  ...   ...  ...   ... |
      | z_n0  z_n1  ...  z_nr |
If we now set

      | Y_11  Y_12  ...  Y_1m |
  Y = | Y_21  Y_22  ...  Y_2m | = [y_1 | y_2 | ... | y_m]
      |  ...   ...  ...   ... |
      | Y_n1  Y_n2  ...  Y_nm |

      | β_01  β_02  ...  β_0m |
  β = | β_11  β_12  ...  β_1m | = [β_(1) | β_(2) | ... | β_(m)]
      |  ...   ...  ...   ... |
      | β_r1  β_r2  ...  β_rm |

      | ε_11  ε_12  ...  ε_1m |                                   | ε'_1 |
  ε = | ε_21  ε_22  ...  ε_2m | = [ε_(1) | ε_(2) | ... | ε_(m)] = | ε'_2 |
      |  ...   ...  ...   ... |                                   |  ... |
      | ε_n1  ε_n2  ...  ε_nm |                                   | ε'_n |

the multivariate linear regression model is

  Y = Zβ + ε

with

  E[ε_(i)] = 0  and  Cov(ε_(i), ε_(k)) = σ_ik I,   i, k = 1, ..., m

Note also that the m observed responses on the jth
trial have covariance matrix

      | σ_11  σ_12  ...  σ_1m |
  Σ = | σ_21  σ_22  ...  σ_2m |
      |  ...   ...  ...   ... |
      | σ_m1  σ_m2  ...  σ_mm |
The ordinary least squares estimates β̂ are found in a
manner analogous to the univariate case: we begin
by taking

  β̂_(i) = (Z'Z)^-1 Z'y_i

Collecting the univariate least squares estimates yields

  β̂ = [β̂_(1) | β̂_(2) | ... | β̂_(m)] = (Z'Z)^-1 Z'[y_1 | y_2 | ... | y_m] = (Z'Z)^-1 Z'Y
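As a quick numerical check (a NumPy sketch on synthetic data, not part of the original SAS analysis), column-by-column univariate OLS coincides with the single matrix formula (Z'Z)^-1 Z'Y:

```python
import numpy as np

# Synthetic data: n trials, r predictors plus intercept, m responses
rng = np.random.default_rng(0)
n, r, m = 20, 3, 2
Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])  # n x (r+1) design
Y = rng.normal(size=(n, m))                                 # n x m responses

B = np.linalg.solve(Z.T @ Z, Z.T @ Y)                       # all m fits at once
B_cols = np.column_stack([np.linalg.solve(Z.T @ Z, Z.T @ Y[:, i])
                          for i in range(m)])               # one response at a time
print(np.allclose(B, B_cols))
```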
Now for any choice of parameters

  B = [b_1 | b_2 | ... | b_m]

the resulting matrix of errors is

  Y - ZB

The resulting Error Sums of Squares and Crossproducts matrix is

                     | (y_1 - Zb_1)'(y_1 - Zb_1)  ...  (y_1 - Zb_1)'(y_m - Zb_m) |
  (Y - ZB)'(Y - ZB) = |            ...             ...             ...            |
                     | (y_m - Zb_m)'(y_1 - Zb_1)  ...  (y_m - Zb_m)'(y_m - Zb_m) |

We can show that the selection b_i = β̂_(i) minimizes the
ith diagonal sum of squares (y_i - Zb_i)'(y_i - Zb_i), so that

  tr[(Y - ZB)'(Y - ZB)]   and the generalized variance   |(Y - ZB)'(Y - ZB)|

are both minimized.
So we have the matrix of predicted values

  Ŷ = Zβ̂ = Z(Z'Z)^-1 Z'Y

and the resulting matrix of residuals

  ε̂ = Y - Ŷ = [I - Z(Z'Z)^-1 Z']Y

Note that the orthogonality conditions among
residuals, predicted values, and columns of the
design matrix which hold in the univariate case are
also true in the multivariate case, because

  Z'[I - Z(Z'Z)^-1 Z'] = Z' - Z' = 0

which means the residuals are perpendicular to the
columns of the design matrix,

  Z'ε̂ = Z'[I - Z(Z'Z)^-1 Z']Y = 0

and to the predicted values,

  Ŷ'ε̂ = β̂'Z'[I - Z(Z'Z)^-1 Z']Y = 0

Furthermore, because Y = Ŷ + ε̂, we have

      Y'Y        =        Ŷ'Ŷ        +        ε̂'ε̂
  total sums of      predicted sums      residual (error)
  squares and        of squares and      sums of squares
  crossproducts      crossproducts       and crossproducts
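These orthogonality conditions and the SSCP decomposition are easy to verify numerically; here is a NumPy sketch on synthetic data (not the notes' example):

```python
import numpy as np

# Synthetic data: intercept plus two predictors, two responses
rng = np.random.default_rng(1)
n, m = 12, 2
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = rng.normal(size=(n, m))

B = np.linalg.solve(Z.T @ Z, Z.T @ Y)  # least squares estimates
Y_hat = Z @ B                          # predicted values
resid = Y - Y_hat                      # residuals

# Residuals are orthogonal to the design columns and to the fits,
# and the total SSCP splits into predicted + residual SSCP
print(np.allclose(Z.T @ resid, 0),
      np.allclose(Y_hat.T @ resid, 0),
      np.allclose(Y.T @ Y, Y_hat.T @ Y_hat + resid.T @ resid))
```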
Example: suppose we had the following six sample
observations on two independent variables (palatability
and texture) and two dependent variables (purchase
intent and overall quality):

  Palatability  Texture  Overall Quality  Purchase Intent
            65       71               63               67
            72       77               70               70
            77       73               72               70
            68       78               75               72
            81       76               89               88
            73       87               76               77
Use these data to estimate the multivariate linear
regression model for which palatability and texture are
independent variables while purchase intent and
overall quality are the dependent variables.
We wish to estimate

  Y_1 = β_01 + β_11 z_1 + β_21 z_2

and

  Y_2 = β_02 + β_12 z_1 + β_22 z_2

jointly.
The design matrix is

      | 1  65  71 |
      | 1  72  77 |
  Z = | 1  77  73 |
      | 1  68  78 |
      | 1  81  76 |
      | 1  73  87 |

so

        |   6    436    462 |
  Z'Z = | 436  31852  33591 |
        | 462  33591  35728 |

and

             |  62.560597030  -0.378268027  -0.453330568 |
  (Z'Z)^-1 = |  -0.378268027   0.005988412  -0.000738830 |
             |  -0.453330568  -0.000738830   0.006584661 |
and

  Z'y_1 = [445, 32536, 34345]'

so

  β̂_(1) = (Z'Z)^-1 Z'y_1

         |  62.560597030  -0.378268027  -0.453330568 | |   445 |   | -37.501205460 |
       = |  -0.378268027   0.005988412  -0.000738830 | | 32536 | = |   1.134583728 |
         |  -0.453330568  -0.000738830   0.006584661 | | 34345 |   |   0.379499410 |
and

  Z'y_2 = [444, 32430, 34260]'

so

  β̂_(2) = (Z'Z)^-1 Z'y_2

         |  62.560597030  -0.378268027  -0.453330568 | |   444 |   | -21.432293350 |
       = |  -0.378268027   0.005988412  -0.000738830 | | 32430 | = |   0.940880634 |
         |  -0.453330568  -0.000738830   0.006584661 | | 34260 |   |   0.351449792 |

so

                         | -37.501205460  -21.432293350 |
  β̂ = [β̂_(1) | β̂_(2)] = |   1.134583728    0.940880634 |
                         |   0.379499410    0.351449792 |
This gives us the matrix of estimated values

           | 1  65  71 |                                    | 63.19119  64.67788 |
           | 1  72  77 |   | -37.501205460  -21.432293350 | | 73.41028  73.37275 |
  Ŷ = Zβ̂ = | 1  77  73 | x |   1.134583728    0.940880634 | = | 77.56520  76.67135 |
           | 1  68  78 |   |   0.379499410    0.351449792 | | 69.25144  69.96067 |
           | 1  81  76 |                                    | 83.24203  81.48922 |
           | 1  73  87 |                                    | 78.33986  77.82812 |

and the matrix of residuals

          | 63  67 |   | 63.19119  64.67788 |   | -0.191194960   2.322116943 |
          | 70  70 |   | 73.41028  73.37275 |   | -3.410277515  -3.372746244 |
  Y - Ŷ = | 72  70 | - | 77.56520  76.67135 | = | -5.565198512  -6.671350244 |
          | 75  72 |   | 69.25144  69.96067 |   |  5.748557985   2.039326498 |
          | 89  88 |   | 83.24203  81.48922 |   |  5.757968347   6.510777845 |
          | 76  77 |   | 78.33986  77.82812 |   | -2.339855345  -0.828124797 |

Note that each residual column sums to zero!
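We can reproduce these hand computations numerically; the following NumPy sketch (not part of the original SAS analysis) recovers β̂ and checks that the residual columns sum to zero:

```python
import numpy as np

# Worked example from the notes: 6 trials; predictors z1 (palatability)
# and z2 (texture); responses y1 (overall quality) and y2 (purchase intent)
Z = np.column_stack([np.ones(6),
                     [65, 72, 77, 68, 81, 73],    # palatability
                     [71, 77, 73, 78, 76, 87]])   # texture
Y = np.array([[63, 67], [70, 70], [72, 70],
              [75, 72], [89, 88], [76, 77]], dtype=float)

B = np.linalg.solve(Z.T @ Z, Z.T @ Y)  # beta_hat = (Z'Z)^-1 Z'Y
resid = Y - Z @ B                      # residual matrix

print(B)
print(resid.sum(axis=0))  # each residual column sums to (numerically) zero
```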
B. Inference in Multivariate Regression
The least squares estimators

  β̂ = [β̂_(1) | β̂_(2) | ... | β̂_(m)]

of the multivariate regression model have the
following properties:

  - E[β̂_(i)] = β_(i), i.e., β̂ is unbiased
  - Cov(β̂_(i), β̂_(k)) = σ_ik (Z'Z)^-1,  i, k = 1, ..., m
  - E[ε̂] = 0 and E[ε̂'ε̂ / (n - r - 1)] = Σ

if the model is of full rank, i.e., rank(Z) = r + 1 < n.
Note that ε̂ and β̂ are also uncorrelated.
This means that, for any observation z_0,

  z'_0 β̂ = [z'_0 β̂_(1) | z'_0 β̂_(2) | ... | z'_0 β̂_(m)]

is an unbiased estimator, i.e.,

  E[z'_0 β̂] = z'_0 β

We can also determine from these properties that the
estimation errors z'_0 β̂_(i) - z'_0 β_(i) have covariances

  E[z'_0 (β̂_(i) - β_(i))(β̂_(k) - β_(k))' z_0] = σ_ik z'_0 (Z'Z)^-1 z_0
Furthermore, we can easily ascertain that the forecast
vector Ŷ_0 = β̂'z_0 associated with the values of the
predictor variables z_0 is an unbiased estimator of Y_0,
i.e., E[Y_0 - β̂'z_0] = 0.
The forecast errors have covariances

  E[(Y_0i - z'_0 β̂_(i))(Y_0k - z'_0 β̂_(k))] = σ_ik (1 + z'_0 (Z'Z)^-1 z_0)
Thus, for the multivariate regression model with full
rank(Z) = r + 1, n ≥ r + 1 + m, and normally
distributed errors ε,

  β̂ = (Z'Z)^-1 Z'Y

is the maximum likelihood estimator of β, with

  Cov(β̂_(i), β̂_(k)) = σ_ik (Z'Z)^-1,  i, k = 1, ..., m

where the σ_ik are the elements of Σ.
Also, the maximum likelihood estimator β̂ is
independent of the maximum likelihood estimator of
the positive definite matrix Σ, given by

  Σ̂ = (1/n) ε̂'ε̂ = (1/n) (Y - Zβ̂)'(Y - Zβ̂)

and

  nΣ̂ = (Y - Zβ̂)'(Y - Zβ̂) ~ W_{m,n-r-1}(Σ)

That is, β̂ and n^-1 ε̂'ε̂ are the maximum likelihood
estimators of β and Σ, respectively, all of which
provides additional support for using the least squares
estimates when the errors are normally distributed.
These results can be used to develop likelihood ratio
tests for the multivariate regression parameters.
The hypothesis that the responses do not depend on the
predictor variables z_{q+1}, z_{q+2}, ..., z_r is

  H_0: β_(2) = 0,  where  β = | β_(1) |    ((q + 1) x m)
                              | β_(2) |    ((r - q) x m)

If we partition Z in a similar manner,

  Z = [Z_1 | Z_2]    (n x (q + 1)  and  n x (r - q))

we can write the general model as

  E[Y] = Zβ = [Z_1 | Z_2] | β_(1) | = Z_1 β_(1) + Z_2 β_(2)
                          | β_(2) |
The extra sum of squares and crossproducts associated with β̂_(2) is

  n(Σ̂_1 - Σ̂) = (Y - Z_1 β̂_(1))'(Y - Z_1 β̂_(1)) - (Y - Zβ̂)'(Y - Zβ̂)

where

  β̂_(1) = (Z'_1 Z_1)^-1 Z'_1 Y

and

  Σ̂_1 = n^-1 (Y - Z_1 β̂_(1))'(Y - Z_1 β̂_(1)),   Σ̂ = n^-1 (Y - Zβ̂)'(Y - Zβ̂)
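The extra SSCP matrix is easy to compute by fitting the full and reduced models and differencing their residual SSCP matrices. The sketch below (NumPy, not part of the original SAS analysis) does this for the notes' data, dropping texture (z2), and recovers the Type III SSCP matrix for z2 reported in the SAS output:

```python
import numpy as np

# Example data from the notes: intercept, palatability z1, texture z2
Z = np.column_stack([np.ones(6),
                     [65, 72, 77, 68, 81, 73],
                     [71, 77, 73, 78, 76, 87]])
Y = np.array([[63, 67], [70, 70], [72, 70],
              [75, 72], [89, 88], [76, 77]], dtype=float)

def resid_sscp(Z, Y):
    """Residual sums of squares and crossproducts for the model E[Y] = Z beta."""
    B = np.linalg.lstsq(Z, Y, rcond=None)[0]
    R = Y - Z @ B
    return R.T @ R

E_full = resid_sscp(Z, Y)        # n * Sigma_hat (full model)
E_red = resid_sscp(Z[:, :2], Y)  # drop z2: n * Sigma_hat_1 (reduced model)
H = E_red - E_full               # extra SSCP associated with z2
print(H)
```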
The likelihood ratio for the test of the hypothesis
H_0: β_(2) = 0 is given by a ratio of generalized variances,

  Λ = max_{β_(1),Σ} L(β_(1), Σ) / max_{β,Σ} L(β, Σ) = ( |Σ̂| / |Σ̂_1| )^{n/2}

which is often converted to the Wilks' lambda statistic

  Λ^{2/n} = |Σ̂| / |Σ̂_1|

Finally, for the multivariate regression model with
full rank(Z) = r + 1, n ≥ r + 1 + m, normally
distributed errors ε, and a true null hypothesis
(so that n(Σ̂_1 - Σ̂) ~ W_{m,r-q}(Σ)),

  -[n - r - 1 - (1/2)(m - r + q + 1)] ln(Λ^{2/n}) ~ χ²_{m(r-q)}

approximately, when n - r and n - m are both large.
If we again refer to the Error Sum of Squares and
Crossproducts as

  E = nΣ̂

and the Hypothesis Sum of Squares and Crossproducts
as

  H = n(Σ̂_1 - Σ̂)

then we can define Wilks' lambda as

  Λ^{2/n} = |E| / |E + H| = ∏_{i=1}^{s} 1 / (1 + η_i)

where η_1 ≥ η_2 ≥ ... ≥ η_s are the ordered eigenvalues of
HE^-1, with s = min(m, r - q).
There are other similar tests (as we have seen in our
discussion of MANOVA):

  Pillai's Trace:           ∑_{i=1}^{s} η_i / (1 + η_i) = tr[H(H + E)^-1]
  Hotelling-Lawley Trace:   ∑_{i=1}^{s} η_i = tr[HE^-1]
  Roy's Greatest Root:      η_1 / (1 + η_1)

Each of these statistics is an alternative to Wilks'
lambda, and they perform in a very similar manner
(particularly for large sample sizes).
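All four statistics come from the eigenvalues of HE^-1. The sketch below (NumPy, not part of the original SAS analysis) computes them from the E and H matrices for the palatability (z1) hypothesis in the example that follows; note it uses the η₁/(1+η₁) form of Roy's root given above, whereas SAS reports η₁ itself:

```python
import numpy as np

# E and H for the test of no palatability (z1) effect, from the worked example
E = np.array([[114.31302415, 99.335143683],
              [99.335143683, 108.5094298]])
H = np.array([[214.96186763, 178.26225891],
              [178.26225891, 147.82823253]])

# Ordered eigenvalues of H E^-1 (same spectrum as E^-1 H)
eta = np.sort(np.linalg.eigvals(np.linalg.solve(E, H)).real)[::-1]

wilks = np.prod(1.0 / (1.0 + eta))      # = |E| / |E + H|
pillai = np.sum(eta / (1.0 + eta))      # = tr[H (H + E)^-1]
hotelling_lawley = np.sum(eta)          # = tr[H E^-1]
roy = eta[0] / (1.0 + eta[0])           # largest-root form used in these notes

print(wilks, pillai, hotelling_lawley, roy)
```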
Example: for our previous data (six sample
observations on two independent variables, palatability
and texture, and two dependent variables, purchase
intent and overall quality):

  Palatability  Texture  Overall Quality  Purchase Intent
            65       71               63               67
            72       77               70               70
            77       73               72               70
            68       78               75               72
            81       76               89               88
            73       87               76               77

we wish to test the hypotheses that (i) palatability has no joint
relationship with purchase intent and overall quality,
and (ii) texture has no joint relationship with purchase
intent and overall quality.
We first test the hypothesis that palatability has no
joint relationship with purchase intent and overall
quality, i.e.,

  H_0: β_(1) = 0

The likelihood ratio for the test of this hypothesis is
given by the ratio of generalized variances

  Λ = max_{β_(2),Σ} L(β_(2), Σ) / max_{β,Σ} L(β, Σ) = ( |Σ̂| / |Σ̂_2| )^{n/2}

For ease of computation, we'll use the Wilks' lambda
statistic

  Λ^{2/n} = |E| / |E + H|
The error sum of squares and crossproducts matrix is

  E = | 114.31302415  99.335143683 |
      | 99.335143683  108.5094298  |

and the hypothesis sum of squares and crossproducts
matrix for this null hypothesis is

  H = | 214.96186763  178.26225891 |
      | 178.26225891  147.82823253 |

so the calculated value of the Wilks' lambda statistic is

  Λ^{2/n} = |E| / |E + H| = 2536.570299 / 7345.238098 = 0.34533534

The transformation to a chi-square distributed
statistic (which is actually valid only when n - r and
n - m are both large) is

  -[n - r - 1 - (1/2)(m - r + q + 1)] ln(Λ^{2/n})
     = -[6 - 2 - 1 - (1/2)(2 - 2 + 1 + 1)] ln(0.34533534)
     = 2.1265

At α = 0.01 and m(r - q) = 2 degrees of freedom, the
critical value is 9.210351, so we have a strong nonrejection. Also, the approximate p-value of this chi-square test is 0.3453; note that this is an extremely
gross approximation (since n - r = 4 and n - m = 4).
We next test the hypothesis that texture has no joint
relationship with purchase intent and overall quality,
i.e.,

  H_0: β_(2) = 0

The likelihood ratio for the test of this hypothesis is
given by the ratio of generalized variances

  Λ = max_{β_(1),Σ} L(β_(1), Σ) / max_{β,Σ} L(β, Σ) = ( |Σ̂| / |Σ̂_1| )^{n/2}

For ease of computation, we'll use the Wilks' lambda
statistic

  Λ^{2/n} = |E| / |E + H|
The error sum of squares and crossproducts matrix is

  E = | 114.31302415  99.335143683 |
      | 99.335143683  108.5094298  |

and the hypothesis sum of squares and crossproducts
matrix for this null hypothesis is

  H = | 21.872015222  20.255407498 |
      | 20.255407498  18.758286731 |

so the calculated value of the Wilks' lambda statistic is

  Λ^{2/n} = |E| / |E + H| = 2536.570299 / 3030.059055 = 0.837135598
The transformation to a chi-square distributed
statistic (which is actually valid only when n - r and
n - m are both large) is

  -[n - r - 1 - (1/2)(m - r + q + 1)] ln(Λ^{2/n})
     = -[6 - 2 - 1 - (1/2)(2 - 2 + 1 + 1)] ln(0.837135598)
     = 0.3555

At α = 0.01 and m(r - q) = 2 degrees of freedom, the
critical value is 9.210351, so we have a strong nonrejection. Also, the approximate p-value of this chi-square test is 0.8371; note that this is an extremely
gross approximation (since n - r = 4 and n - m = 4).
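A quick sketch of the chi-square transformation for both tests, using the natural logarithm as the asymptotic result requires (plain Python, not part of the original SAS analysis; for 2 degrees of freedom the chi-square survival function is exactly exp(-x/2)):

```python
import math

# n = 6 trials, r = 2 predictors, m = 2 responses, q = 1 predictor retained
n, r, m, q = 6, 2, 2, 1
multiplier = n - r - 1 - 0.5 * (m - r + q + 1)

for label, wilks in [("palatability (z1)", 0.34533534),
                     ("texture (z2)", 0.837135598)]:
    stat = -multiplier * math.log(wilks)  # approximate chi-square statistic
    df = m * (r - q)
    p = math.exp(-stat / 2)               # exact chi-square tail for df = 2
    print(f"{label}: chi2 = {stat:.4f}, df = {df}, p = {p:.4f}")
```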
SAS code for a Multivariate Linear Regression Analysis:

OPTIONS LINESIZE = 72 NODATE PAGENO = 1;
DATA stuff;
  INPUT z1 z2 y1 y2;
  LABEL z1='Palatability Rating'
        z2='Texture Rating'
        y1='Overall Quality Rating'
        y2='Purchase Intent';
  CARDS;
65 71 63 67
72 77 70 70
77 73 72 70
68 78 75 72
81 76 89 88
73 87 76 77
;
PROC GLM DATA=stuff;
  MODEL y1 y2 = z1 z2/;
  MANOVA H=z1 z2/PRINTE PRINTH;
  TITLE4 'Using PROC GLM for Multivariate Linear Regression';
RUN;
SAS output for a Multivariate Linear Regression Analysis:

Dependent Variable: y1   Overall Quality Rating

                                 Sum of
Source            DF            Squares    Mean Square   F Value   Pr > F
Model              2        256.5203092    128.2601546      3.37   0.1711
Error              3        114.3130241     38.1043414
Corrected Total    5        370.8333333

R-Square     Coeff Var      Root MSE      y1 Mean
0.691740      8.322973      6.172871     74.16667

Source   DF      Type I SS    Mean Square   F Value   Pr > F
z1        1    234.6482940    234.6482940      6.16   0.0891
z2        1     21.8720152     21.8720152      0.57   0.5037

Source   DF    Type III SS    Mean Square   F Value   Pr > F
z1        1    214.9618676    214.9618676      5.64   0.0980
z2        1     21.8720152     21.8720152      0.57   0.5037

Dependent Variable: y1   Overall Quality Rating

                                Standard
Parameter        Estimate          Error    t Value   Pr > |t|
Intercept    -37.50120546    48.82448511      -0.77     0.4984
z1             1.13458373     0.47768661       2.38     0.0980
z2             0.37949941     0.50090335       0.76     0.5037
SAS output for a Multivariate Linear Regression Analysis:

Dependent Variable: y2   Purchase Intent

                                 Sum of
Source            DF            Squares    Mean Square   F Value   Pr > F
Model              2        181.4905702     90.7452851      2.51   0.2289
Error              3        108.5094298     36.1698099
Corrected Total    5        290.0000000

R-Square     Coeff Var      Root MSE      y2 Mean
0.625830      8.127208      6.014134     74.00000

Source   DF      Type I SS    Mean Square   F Value   Pr > F
z1        1    162.7322835    162.7322835      4.50   0.1241
z2        1     18.7582867     18.7582867      0.52   0.5235

Source   DF    Type III SS    Mean Square   F Value   Pr > F
z1        1    147.8282325    147.8282325      4.09   0.1364
z2        1     18.7582867     18.7582867      0.52   0.5235

Dependent Variable: y2   Purchase Intent

                                Standard
Parameter        Estimate          Error    t Value   Pr > |t|
Intercept    -21.43229335    47.56894895      -0.45     0.6829
z1             0.94088063     0.46540276       2.02     0.1364
z2             0.35144979     0.48802247       0.72     0.5235
SAS output for a Multivariate Linear Regression Analysis:

The GLM Procedure
Multivariate Analysis of Variance

        E = Error SSCP Matrix
                y1              y2
y1    114.31302415    99.335143683
y2    99.335143683     108.5094298

Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|

DF = 3          y1          y2
y1        1.000000    0.891911
                        0.1081
y2        0.891911    1.000000
            0.1081
SAS output for a Multivariate Linear Regression Analysis:

The GLM Procedure
Multivariate Analysis of Variance

   H = Type III SSCP Matrix for z1
                y1              y2
y1    214.96186763    178.26225891
y2    178.26225891    147.82823253

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for z1
E = Error SSCP Matrix

Characteristic              Characteristic Vector  V'EV=1
          Root    Percent            y1             y2
    1.89573606     100.00    0.10970859    -0.01905206
    0.00000000       0.00   -0.17533407     0.21143084

MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of No Overall z1 Effect
H = Type III SSCP Matrix for z1
E = Error SSCP Matrix
S=1  M=0  N=0

Statistic                       Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda              0.34533534       1.90         2         2    0.3453
Pillai's Trace             0.65466466       1.90         2         2    0.3453
Hotelling-Lawley Trace     1.89573606       1.90         2         2    0.3453
Roy's Greatest Root        1.89573606       1.90         2         2    0.3453
SAS output for a Multivariate Linear Regression Analysis:

The GLM Procedure
Multivariate Analysis of Variance

   H = Type III SSCP Matrix for z2
                y1              y2
y1    21.872015222    20.255407498
y2    20.255407498    18.758286731

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for z2
E = Error SSCP Matrix

Characteristic              Characteristic Vector  V'EV=1
          Root    Percent            y1             y2
    0.19454961     100.00    0.06903935     0.02729059
    0.00000000       0.00   -0.19496558     0.21052601

MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of No Overall z2 Effect
H = Type III SSCP Matrix for z2
E = Error SSCP Matrix
S=1  M=0  N=0

Statistic                       Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda              0.83713560       0.19         2         2    0.8371
Pillai's Trace             0.16286440       0.19         2         2    0.8371
Hotelling-Lawley Trace     0.19454961       0.19         2         2    0.8371
Roy's Greatest Root        0.19454961       0.19         2         2    0.8371
We can also build confidence intervals for the
predicted mean value of Y_0 associated with z_0: if the
model

  Y = Zβ + ε

has normal errors, then

  β̂'z_0 ~ N_m( β'z_0, z'_0 (Z'Z)^-1 z_0 Σ )

and, independently,

  nΣ̂ ~ W_{n-r-1}(Σ)

so

  T² = ( (β̂'z_0 - β'z_0) / sqrt(z'_0 (Z'Z)^-1 z_0) )' ( (n/(n-r-1)) Σ̂ )^-1
       ( (β̂'z_0 - β'z_0) / sqrt(z'_0 (Z'Z)^-1 z_0) )

has Hotelling's T² distribution.
Thus the 100(1 - α)% confidence ellipsoid for the
predicted mean value of Y_0 associated with z_0 (i.e., for β'z_0)
is given by

  (β̂'z_0 - β'z_0)' ( (n/(n-r-1)) Σ̂ )^-1 (β̂'z_0 - β'z_0)
     ≤ z'_0 (Z'Z)^-1 z_0 [m(n - r - 1)/(n - r - m)] F_{m,n-r-m}(α)

and the 100(1 - α)% simultaneous confidence intervals
for the mean values of the Y_i associated with z_0 (the z'_0 β_(i)) are

  z'_0 β̂_(i) ± sqrt( [m(n - r - 1)/(n - r - m)] F_{m,n-r-m}(α) )
             x sqrt( z'_0 (Z'Z)^-1 z_0 (n/(n - r - 1)) σ̂_ii ),   i = 1, ..., m
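These simultaneous intervals can be sketched numerically for the example data. The NumPy sketch below (not part of the original SAS analysis) evaluates them at z_0 = (1, 72, 77), the second trial; the critical value F_{2,2}(0.05) = 19.00 is assumed from a standard F table rather than computed:

```python
import numpy as np

# Example data from the notes
Z = np.column_stack([np.ones(6),
                     [65, 72, 77, 68, 81, 73],
                     [71, 77, 73, 78, 76, 87]])
Y = np.array([[63, 67], [70, 70], [72, 70],
              [75, 72], [89, 88], [76, 77]], dtype=float)
n, r, m = 6, 2, 2

B = np.linalg.solve(Z.T @ Z, Z.T @ Y)            # beta_hat
Sigma_hat = (Y - Z @ B).T @ (Y - Z @ B) / n      # MLE of Sigma

z0 = np.array([1.0, 72.0, 77.0])
center = B.T @ z0                                # estimated mean responses at z0
quad = z0 @ np.linalg.solve(Z.T @ Z, z0)         # z0'(Z'Z)^-1 z0
F_crit = 19.00                                   # F_{m, n-r-m}(0.05) = F_{2,2}(0.05), from tables
scale = m * (n - r - 1) / (n - r - m) * F_crit
half = np.sqrt(scale * quad * (n / (n - r - 1)) * np.diag(Sigma_hat))

print(center - half, center + half)              # simultaneous 95% intervals
```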
Finally, we can build prediction intervals for the
predicted value of Y_0 associated with z_0. Here the
prediction error satisfies

  Y_0 - β̂'z_0 ~ N_m( 0, (1 + z'_0 (Z'Z)^-1 z_0) Σ )

independently of

  nΣ̂ ~ W_{n-r-1}(Σ)

so

  T² = ( (Y_0 - β̂'z_0) / sqrt(1 + z'_0 (Z'Z)^-1 z_0) )' ( (n/(n-r-1)) Σ̂ )^-1
       ( (Y_0 - β̂'z_0) / sqrt(1 + z'_0 (Z'Z)^-1 z_0) )

again has Hotelling's T² distribution.
Thus the 100(1 - α)% prediction ellipsoid associated
with z_0 is given by

  (Y_0 - β̂'z_0)' ( (n/(n-r-1)) Σ̂ )^-1 (Y_0 - β̂'z_0)
     ≤ (1 + z'_0 (Z'Z)^-1 z_0) [m(n - r - 1)/(n - r - m)] F_{m,n-r-m}(α)

and the 100(1 - α)% simultaneous prediction intervals
for the Y_0i associated with z_0 are

  z'_0 β̂_(i) ± sqrt( [m(n - r - 1)/(n - r - m)] F_{m,n-r-m}(α) )
             x sqrt( (1 + z'_0 (Z'Z)^-1 z_0) (n/(n - r - 1)) σ̂_ii ),   i = 1, ..., m