Multivariate Analysis Notes
Multivariate Analysis Notes
                                 1
Contents
                              2
0.4.6    The Procrustes Problem . . . . . . . . . . . .      49
0.4.7    Matrix Rank Reduction . . . . . . . . . . . .       50
0.4.8    Torgerson Metric Multidimensional Scaling . .       50
0.4.9    A Guttman Multidimensional Scaling Result .         52
0.4.10   A Few General MATLAB Routines to Know
         About . . . . . . . . . . . . . . . . . . . . . .   53
                           3
List of Figures
                               4
0.1     Necessary Matrix Algebra Tools
0.1.1 Preliminaries
                                   5
on the blackboard, a capital letter with a wavy line underneath to
indicate boldface):
                                                            
                     a11 a12 a13                   a1V   
                                                            
                    
                     a21 a22 a23                   a2V   
                                                             
                  A=
                    
                      ... ... ...               . . . ...   
                                                             
                                                             
                                                            
                                                            
                      aU1 aU2 aU3                   aUV
This matrix has U rows and V columns and is said to have order
U  V . An arbitrary element auv refers to the element in the uth
row and v th column, with the row index always preceding the column
index (and therefore, we might use the notation of A = {auv }UV
to indicate the matrix A as well as its order).
   A 1  1 matrix such as (4)11 is just an ordinary number, called a
scalar. So without loss of any generality, numbers are just matrices.
A vector is a matrix with a single row or column; we denote a column
vector by a lowercase boldface letter, e.g., x, y, z, and so on. The
vector
                                           
                                 
                                 
                                     x1     
                                            
                           x=    
                                 
                                      ...   
                                            
                                           
                                           
                                     xU         U1
                        x = (x1, . . . , xU )1U
with the prime indicating the transpose of x, i.e., the interchange of
row(s) and column(s). This transpose operation can be applied to
any matrix; for example,
                                     6
                                        
                               
                               
                                   1 1 
                                        
                               
                         A=    
                                  3 7 
                                        
                                       
                                   4 1 32
                                            
                                   1 3 4
                        A = 
                                       
                                  1 7 1 23
   If a matrix is square, dened by having the same number of rows
as columns, say U , and if the matrix and its transpose are equal, the
matrix is said to be symmetric. Thus, in A = {auv }UU , auv = avu
for all u and v. As an example,
                                                
                                      1 4 3
                                             
                     A = A =      
                                   
                                      4 7 1 
                                              
                                              
                                             
                                       3 1 3
For a square matrix AUU , the elements auu, 1  u  U , lie along
the main or principal diagonal. The sum of main diagonal entries
of a square matrix is called the trace; thus,
                                   7
Similarly, we might at times need an U  V matrix of all ones, say
E:
                                                    
                                   
                                   
                                       1  1 
                           E=      
                                   
                                       ... . . . ... 
                                                     
                                                     
                                                    
                                                    
                                       1  1
A diagonal matrix is square with zeros in all the o main-diagonal
positions:
                                                        
                                   a1    0           
                                                        
                     DUU =     
                                
                                     ... . . . ...       
                                                         
                                                        
                                                        
                                    0    aU               UU
Here, we again indicate the main diagonal entries with just one index
as a1, a2, . . . , aU . If all of the main diagonal entries in a diagonal
matrix are 1s, we have the identity matrix denoted by I:
                                                     
                             1         0        0
                                                       
                            
                             0         1        0  
                          I=
                             .
                             ..        ...   ...   ... 
                                                        
                                                        
                                                       
                                                       
                              0         0        1
  To introduce some useful operations on matrices, suppose we have
two matrices A and B of the same U  V order:
                                                     
                              a11    a1V          
                                                     
                      A=   
                           
                                ... . . . ...         
                                                      
                                                     
                                                     
                               aU1    aUV              UV
                                                     
                              b11    b1V          
                                                     
                      B=   
                           
                                ... . . . ...         
                                                      
                                                     
                                                     
                               bU1    bUV             UV
                                        8
As a denition for equality of two matrices of the same order (and
for which it only makes sense to talk about equality), we have:
   A = B if and only if auv = buv for all u and v.
Remember, the if and only if statement (sometimes abbreviated
as i) implies two conditions:
   if A = B, then auv = buv for all u and v;
   if auv = buv for all u and v, then A = B.
Any denition by its very nature implies an if and only if state-
ment.
   To add two matrices together, they rst have to be of the same
order (referred to as conformal for addition); we then do the addition
component by component:
                                                              
                           a11 + b11    a1V        + b1V   
                                                              
           A+B=         
                        
                                ...   ...              ...     
                                                               
                                                              
                                                              
                            aU1 + bU1    aUV        + bUV   UV
                               AB = C34 =
                                                             
     1(1) + 4(1) 1(2) + 4(0) 1(0) + 4(1) 1(1) + 4(4) 
                                                         
 
 
     3(1) + 1(1) 3(2) + 1(0) 3(0) + 1(1) 3(1) + 1(4)    =
                                                          
                                                         
     1(1) + 0(1) 1(2) + 0(0) 1(0) + 0(1) 1(1) + 0(4)
                                            
                             3 2 4 17 
                                       
                                       
                            2  6 1  7 
                                       
                                       
                              1 2 0 1
  Some properties of matrix addition and multiplication follow, where
the matrices are assumed conformal for the operations given:
  (A) matrix addition is commutative:
                           A+B=B+A
  (B) matrix addition is associative:
A + (B + C) = (A + B) + C
                                    10
  (C) matrix multiplication is right and left distributive over matrix
addition:
                      A(B + C) = AB + AC
                      (A + B)C = AC + BC
  (D) matrix multiplication is associative:
                         A(BC) = (AB)C
In general, AB =   BA even if both products are dened. Thus,
multiplication is not commutative as the following simple example
shows:
                                                              
          0 1            1 1          0 1          1 1 
A22   =
              ; B22 =       ; AB =       ; BA =      
          1 0              0 1            1 1            1 0
   In the product AB, we say that B is premultiplied by A and A
is postmultiplied by B. Thus, if we pre- or postmultiply a matrix
by the identity, the same matrix is retrieved:
(A) = A; (A + B) = A + B
                         x2     
               XNP     
                       =
                         ...
                                 
                                 
                                 
                                     = v1 v2    vP
                                
                                
                          xN
The inner product (also called the dot or scalar product) of two
vectors , xU1 and yU1 , is dened as
                                                 
                                       y1        
                                                        U
                                                          
                x y = (x1, . . . , xU ) ...
                                      
                                       
                                       
                                                  
                                                  
                                                     =         xuyu
                                                        u=1
                                        yU
Thus, the inner product of a vector with itself is merely the sum of
                                            
squares of the entries in the vector: x x = Uu=1 x2u . Also, because
                                      13
an inner product is a scalar and must equal it own transpose (i.e.,
x y = (x y) = y x), we have the end result that
                                      x y = y x
If there is an inner product, there should also be an outer product
dened as the U  U matrices given by xy or as yx . As indicated
by the display equations below, xy is the transpose of yx :
                                                                           
                
                
                    x1     
                           
                                                      
                                                      
                                                          x1y1    x1yU     
                                                                              
            
         xy =   
                
                     ...   
                              (y1, . . . , yN ) =    
                                                      
                                                            ... . . .   ...   
                                                                              
                                                                           
                                                                           
                    xU                                    xU y1    xU yU
                                                                           
                y1                     y1 x1    y1 xU 
                                                         
               ..                       .            .
                                         .
         yx =  .  (x1, . . . , xU ) =  . . . .        . 
                                                         .  
                                                            
                                                         
                  yU                      yU x1    yU xU
  A vector can be viewed as a geometrical vector in U dimensional
space. Thus, the two 2  1 vectors
                                                          
                                   3        4 
                               x=
                                    ; y =    
                                   4          1
can be represented in the two-dimensional Figure 1 below, with the
entries in the vectors dening the coordinates of the endpoints of the
arrows.
   The Euclidean distance between two vectors, x and y, is given
as:
                
                
                                 
                
U
                
                          (xu  yu     )2   = (x  y)(x  y)
                    u=1
and the length of any vector is the Euclidean distance between the
vector
     and the origin. Thus, in Figure 1, the
                                            distance between x and
y is 10 with respective lengths of 5 and 17.
                                                 14
        
                            (3,4)
                                    
                                   
   4
   3                
                  
                  ....
                     .....
                         ....
            
                            ....
   2                           ....
                                  ....
                                     ....
                                                    
                                        ...
                                          ...
             
                                            ...
                      
                        
                                              ...
                     (4,1)                      ...
                   
                                                  ...
                 
                                                    ...
   1
               
                                                      ..
        
                                                       ..
(0,0)
        
            1       2                               3          4
                                                            Figure 1: Two vectors plotted in two-dimensional space
          The cosine of the angle between the two vectors x and y if dened
        by:
                                            xy
                             cos() =  1/2  1/2
                                      (x x) (y y)
        Thus, in the gure we have
                                                                                              
                                                                         	        
                                           4
                                   3 4     
                                           1       16
                       cos() =               =  = .776
                                     5 17         5 17
        The cosine value of .776 corresponds to an angle of 39.1 degrees or
        .68 radians; these later values can be found with the inverse (or arc)
        cosine function (on, say, a hand calculator, or using MATLAB as we
        suggest in the next section).
           When the means of the entries in x and y are zero (i.e., deviations
        from means have been taken), then cos() is the correlation between
                                                                                          15
                                              x
                                             
                                              
                                            
                                          
               
                      c2                        b2
           
           
              ....
                 ....
                    ...
                      ...
                        ...
        
                          ...
                            ...
                              ...
                                ..
                                 ..
                                  ..
                                   ..
                                                  
                                    ..
                                     .                 y
              2
                             a
dy
the entries in the two vectors. Vectors at right angles have cos() = 0,
or alternatively, the correlation is zero.
   Figure 2 shows two generic vectors, x and y, where without loss
of any real generality, y is drawn horizontally in the plane and x
is projected at a right angle onto the vector y resulting in a point
dened as a multiple d of the vector y. The formula for d that
we demonstrate below is based on the Pythagorean theorem that
c2 = b2 + a2:
0 = 2dxy + 2d2y y
                                                       16
                                      x y
                                    d= 
                                      yy
The diagram in Figure 2 is somewhat constricted in the sense that the
angle between the vectors shown is less than 90 degrees; this allows
the constant d to be positive. Other angles might lead to negative d
when dening the projection of x onto y, and would merely indicate
the need to consider the vector y oriented in the opposite (negative)
direction. Similarly, the vector y is drawn with a larger length than
x which gives a value for d that is less than 1.0; otherwise, d would
be greater than 1.0 indicating a need to stretch y to represent the
point of projection onto it.
   There are other formulas possible based on this geometric infor-
mation: the length of the projection is merely d times the length of
y; and cos() can be given   as the length of dy divided by the length
                               
                                       
                                           
                                              
of x, which is d y y/ x x = x y/( x x y y).
0.1.4 Determinants
                                       17
Beyond a 3  3 we can use a recursive process illustrated below. This
requires the introduction of a few additional matrix terms that we
now give: for a square matrix AUU , dene Auv to be the (n 
1)  (n  1) submatrix of A constructed by deleting the uth row
and v th column of A. We call det(Auv ) the minor of the entry auv ;
the signed minor of (1)u+v det(Auv ) is called the cofactor of auv .
The recursive algorithm would chose some row or column (rather
arbitrarily), and nd the cofactors for the entries in it; the cofactors
would then be weighted by the relevant entries and summed.
   As an example, consider the 4  4 matrix
                                               
                                 1 1 3 1 
                                           
                                1      1
                                           
                               
                                     1 0    
                                            
                                           
                                 3  2 1  2 
                                           
                                           
                                  1 2 4 3
and choose the second row. The expression below involves the weighted
cofactors for 33 submatrices that can be obtained by formulas. Be-
yond a 4  4 there will be nesting of the processes:
                                                               
                  1 3 1 
                         
                           
                           
                                              1 3 1 
                                                    
                           
          2+1
 (1)((1) ) det(  2 1 2 ) + (1)((1) ) det( 3 1 2 
                         
                          
                           
                                      2+2    
                                                    )+
                                                     
                                                  
                   2 4 3                       1 4 3
                                                                
                       1 1 1                           1 1 3 
                                                               
           2+3                              2+4                
(0)((1)         ) det( 3 2 2 
                      
                              ) + (1)((1)     ) det(  3
                                                             2 1 ) =
                                                                  
                                                               
                        1 2 3                              1 2 4
                          5 + (15) + 0 + (29) = 39
   Another strategy to nd the determinant of a matrix is to reduce it
a form in which we might note the determinant more or less by simple
                                           18
inspection. The reductions could be carried out by operations that
have a known eect on the determinant; the form which we might
seek is a matrix that is either upper-triangular (all entries below the
main diagonal are all zero), lower-triangular (all entries above the
main diagonal are all zero), or diagonal. In these latter cases, the
determinant is merely the product of the diagonal elements. Once
found, we can note how the determinant might have been changed
by the reduction process and carry out the reverse changes to nd
the desired determinant.
   The properties of determinants that we could rely on in the above
iterative process are as follows:
(A) if one row of A is multiplied by a constant c, the new determinant
is c det(A); the same is true for multiplying a column by c;
(B) if two rows or two columns of a matrix are interchanged, the sign
of the determinant is changed;
(C) if two rows or two columns of a matrix are equal, the determinant
is zero;
(D) the determinant is unchanged by adding a multiple of some row
to another row; the same is true for columns;
(E) a zero row or column implies a zero determinant;
(F) det(AB) = det(A) det(B)
                         yU1 = AUV xV 1
or,
                                      23
                                                                  
                   
                   
                       y1     
                              
                                      
                                      
                                          a11    a1V    
                                                           
                                                                x1     
                                                                       
                   
                   
                        ...   
                                 =   
                                      
                                           ... . . . ...   
                                                           
                                                                 ...   
                                                                       
                                                                  
                                                                  
                       yU                 aU1    aUV         xV
where yu = au1x1 + au2x2 +    + auV xV . Alternatively, y can be
written as a linear combination of the columns of A with weights
given by x1, . . . , xV :
                                                                        
         y1          a11                  a12                   a1V     
                                                                        
         ..          .                     ...                   ...    
         .     = x1  ..             + x2          +    + xV          
                                                                        
                                                                        
          yU            aU1                    aU2                     aUV
   To indicate one common usage for matrix transformation in a data
context, suppose we consider our data matrix X = {xij }NP , where
xij represents an observation for subject i on variable j. We would
like to use matrix transformations to produce a standardized matrix
Z = {(xij  xj )/sj }NP , where xj is the mean of the entries in
the j th column and sj is the corresponding standard deviation; thus,
the columns of Z all have mean zero and standard deviation one. A
matrix expression for this transformation could be written as follows:
                                           1
              ZNP = (INN  ( )ENN )XNP DP P
                                          N
where I is the identity matrix, E contains all ones, and D is a diagonal
matrix containing s11 , s12 , . . . , s1 , along the main diagonal positions.
                                       P
Thus, (INN  ( N )ENN )XNP produces a matrix with columns
                   1
y y = (Tx) (Tx) = x T Tx = x Ix = x x
                                      25
                                                                        
                        	                        
            
                                                              
                                                                  a1v    
                                                                         
                ru   = au1    auV                ; cv =   
                                                              
                                                                   ...   
                                                                         
                                                                        
                                                                        
                                                                  aUv
and
                                      
                               r1    
                                      
                            
                               r2    
                                       
                                             	                           
                AUV =      
                            
                                ...   
                                       
                                       
                                           = c1 c2    cV
                                      
                                      
                                rU
The maximum number of linearly independent rows of A and the
maximum number of linearly independent columns is the same; this
common number is dened to be the rank of A. A matrix is said to
be of full rank is the rank is equal to the minimum of U and V .
   Matrix rank has a number of useful properties:
   (A) A and A have the same rank;
   (B) A A, AA, and A have the same rank;
   (C) the rank of a matrix is unchanged by a pre- or postmultipli-
cation by a nonsingular matrix;
   (D) the rank of a matrix is unchanged by what are called elemen-
tary row and column operations: (a) interchange of two rows or two
columns; (2) multiplication or a row or a column by a scalar; (3) ad-
dition of a row (or column) to another row (or column). This is true
because any elementary operation can be represented by a premul-
tiplication (if the operation is to be on rows) or a postmultiplication
(if the operation is to be on columns) of a nonsingular matrix.
   To give a simple example, suppose we wish to perform some ele-
mentary row and column operations on the matrix
                                           26
                                             
                                   1 1 1
                                         
                                
                                
                                   1 0 2
                                          
                                         
                                    3 2 4
To interchange the rst two rows of this latter matrix, interchange the
rst two rows of an identity matrix and premultiply; for the rst two
columns to be interchanged, carry out the operation on the identity
and post-multiply:
                                                 
                 
                 
                     0 1 0  1 1 1 
                                  
                                         1 0 2 
                                               
                 
                 
                    1 0 0 
                           
                                    
                                    
                                        
                                        
                                                
                            1 0 2  =  1 1 1 
                                                
                                           
                     0 0 1    3 2 4       3 2 4
                                                 
                 
                 
                     1 1 1  0 1 0 
                                  
                                         1 1 1 
                                               
                                           
                 
                    1 0 2  1 0 0  =  0 1 2 
                                            
                                           
                     3 2 4    0 0 1       2 3 4
To multiply a row of our example matrix (e.g., the second row by 5),
multiply the desired row of an identity matrix and premultiply; for
multiplying a specic column (e.g., the second column by 5), carry
out the operation of the identity and post-multiply:
                                                 
                
                
                     1 0 0
                              1 1 1 
                                     
                                          1 1 1 
                                                 
                
                
                    0 5 0
                                    
                                     
                                         
                                         
                                                  
                             1 0 2  =  5 0 10 
                                                  
                                             
                     0 0 1     3 2 4       3 2 4
                                                 
                
                
                     1 1 1  1 0 0 
                                  
                                         1 5 1 
                                                
                                            
                    1 0 2  0 5 0  =  1 0 2 
                                            
                                            
                     3 2 4    0 0 1       3 10 4
                                     27
To add one row to a second (e.g., the rst row to the second), carry
out the operation on the identity and premultiply; to add one column
to a second (e.g., the rst column to the second), carry out the
operation of the identity and post-multiply:
                                                                     
                  1 0 0  1 1 1     1 1 1 
                                         
                                         
                  1 1 0  1 0 2  =  2 1 3 
                                         
                                         
                   0 0 1    3 2 4       3 2 4
                                                                     
                 
                 
                     1 1 1  1 0 0 
                                  
                                         1 2 1 
                                               
                 
                 
                    1 0 2 
                           
                                    
                                    
                                        
                                        
                                                
                            1 1 0  =  1 1 2 
                                                
                                           
                     3 2 4    0 0 1       3 5 4
  In general, by performing elementary row and column operations,
any U  V matrix can be reduced to a canonical form:
                                                                   
                        
                        
                            1          0       0          0
                        
                        
                            ...     ...   ...     ...     ...     ... 
                                                                      
                                                                      
                                                                     
                                                                     
                        
                        
                        
                            0          1       0          0    
                                                                      
                                                                     
                        
                        
                        
                            0          0       0          0    
                        
                        
                              ...   ...     ...     ...   ...   ...  
                                                                      
                                                                     
                                                                     
                            0          0       0          0
The rank of a matrix can then be found by counting the number of
ones in the above matrix.
   Given an U  V matrix, A, there exist s nonsingular elementary
row operation matrices, R1 , . . . , Rs , and t nonsingular elementary
column operation matrices, C1, . . . , Ct such that Rs    R1AC1    Ct
is in canonical form. Moreover, if A is square (U  U ) and of full
rank (i.e., det(A) = 0), then there are s nonsingular elementary row
                                              28
operation matrices, R1, . . . , Rs, and t nonsingular elementary col-
umn operation matrices, C1, . . . , Ct, such that Rs    R1A = I or
AC1    Ct = I. Thus, A1 can be found either as Rs    R1 or as
C1    Ct. In fact, a common way in which an inverse is calculated
by hand starts with both A and I on the same sheet of paper;
when reducing A step-by-step, the same operations are then applied
to I, building up the inverse until the canonical form is reached in
the reduction of A.
                                          29
the set of equations is said to be consistent if a solution exists, i.e., a
linear combination of the column vectors of A can be used to dene
c:
                                                    
                   a11                 a1V      c1 
                                                    
               x1 
                  
                      .
                      .
                      .                
                          +    + xV 
                                           .
                                           .
                                           .       . 
                                               =  .. 
                                                    
                                                    
                    aU1                   aUV        cU
or the augmented matrix (A c) has the same rank as A; otherwise no
solution exists and the system of equations is said to be inconsistent.
                                    31
                  Y = b 1 X1 + b 2 X2 +    + b K XK
for some values b1, . . . , bK . We could always write for any values of
b1 , . . . , bK :
                Y = b 1 X1 + b 2 X2 +    + b K XK + e
where                                   
                                    e1 
                                   
                             e=      . 
                                    .. 
                                         
                                        
                                        
                                     eN
is an error vector. To formulate our task as an optimization problem
(least-squares), we wish to nd a good set of weights, b1, . . . , bK , so
the length of e is minimized, i.e., ee is made as small as possible.
   As notation, let
                    	           
       b1           
                                                     
                 X = X1 . . . XK ; b =   .
                                        ..
                                                      
                                                      
                                                     
                                                     
                                         bK
To minimize ee = (Y  Xb)(Y  Xb), we use the vector b that
satises what are called the normal equations:
                             X Xb = X Y
If XX is nonsingular (i.e., det(XX) = 0; or equivalently, X1, . . . , XK
are linearly independent), then
                                    32
                           b = (XX)1X Y
The vector that is closest to Y in our least-squares sense, is Xb;
this is a linear combination of the columns of X (or in other jargon,
Xb denes the projection of Y into the space dened by (all linear
combinations of) the columns of X.
   In statistical uses of multiple regression, the estimated variance-
covariance matrix of the regression coecients, b1, . . . , bK , is given as
   1
( NK  )ee(XX)1, where ( NK1
                                  )ee is an (unbiased) estimate of the
error variance for the distribution from which the errors are assumed
drawn. Also, in multiple regression instances that usually involve an
additive constant, the latter is obtained from a weight attached to
an independent variable dened to be identically one.
   In multivariate multiple regression where there are, say, T depen-
dent variables (each represented by an N  1 vector), the dependent
vectors are merely concatenated together into an N  T matrix,
YNT ; the solution to the normal equations now produces a matrix
BKT = (XX)1X Y of regression coecients. In eect, this gen-
eral expression just uses each of the dependent variables separately
and adjoins all the results.
Suppose we are given a square matrix, AUU , and consider the poly-
nomial det(A  I) in the unknown value , referred to as Laplaces
expansion:
det(A  I) = ()U + S1()U1 +    + SU1 ()1 + SU ()0
                                     33
where Su is the sum of all u  u principal minor determinants. A
principal minor determinant is obtained from a submatrix formed
from A that has u diagonal elements left in it. Thus, S1 is the trace
of A and SU is the determinant.
   There are U roots, 1, . . . , U , of the equation det(A  I) = 0,
given that the left-hand-side is a U th degree polynomial. The roots
are called the eigenvalues of A. There are a number of properties
of eigenvalues that prove generally useful:
                                         
   (A) det A = Uu=1 u ; trace(A) = Uu=1 u;
   (B) if A is symmetric with real elements, then all u are real;
   (C) if A is positive denite, then all u are positive (strictly greater
than zero); if A is positive semi-denite, then all u are nonnegative
(greater than or equal to zero);
   (D) if A is symmetric and positive semi-denite with rank R, then
there are R positive roots and U  R zero roots;
   (E) the nonzero roots of AB are equal to those of BA; thus, the
trace of AB is equal to the trace of BA;
   (F) eigenvalues of a diagonal matrix are the diagonal elements
themselves;
   (G) for any U  V matrix B, the ranks of B, BB, and BB
are all the same. Thus, because B B (and BB) are symmetric and
positive semi-denite (i.e., x(BB)x  0 because (Bx) (Bx) is a
sum-of-squares which is always nonnegative), we can use (D) to nd
the rank of B by counting the positive roots of BB.
   We carry through a small example below:
                                    34
                                          
                                     7 0 1
                                           
                                  
                             A=   
                                     0 7 2
                                            
                                           
                                      1 2 3
                         S1 = trace(A) = 17
                                           
           7 0        7 1        7 2 
S2 = det(
              )+det(     )+det(     ) = 49+20+17 = 86
           0 7          1 3          2 3
          S3 = det(A) = 147 + 0 + 0  7  28  0 = 112
Thus,
                             Axu = uxu
An eigenvector is determined up to a scale factor only, so typically
we normalize to unit length (which then gives a  option for the two
possible unit length solutions).
   We continue our simple example and to nd the corresponding
eigenvalues: when  = 2, we have the equations (for [A  I]x = 0)
                                      35
                                                          
                     
                     
                         5 0 1
                                  x1             
                                                  
                                                          
                                                          
                                                              0
                                                      
                     
                        0 5 2  x2
                                                 
                                                     =   
                                                             0
                                                                
                                                           
                         1 2 1     x3                         0
with an arbitrary solution of
                                                 
                                
                                
                                     15 a        
                                                  
                                                 
                                
                                    25 a        
                                                  
                                                 
                                             a
Choosing a to be + 530 to obtain one of the two possible normalized
solutions, we have as our nal eigenvector for  = 2:
                                                 
                                  30
                                    1            
                                                 
                                                 
                                  2            
                                    30           
                                                 
                                                 
                                        5
                                         30
                             Axv = v xv
lead to
                          xv Axu = xv uxu
                          xv xu u = xu xv v
Due to the equality of xv xu and xuxv , and by assumption, u = v ,
the inner product xv xu must be zero for the last displayed equality
to hold.
   In summary of the above discussion, for every real symmetric ma-
trix AUU , there exists an orthogonal matrix P (i.e., P P = PP =
I) such that P AP = D, where D is a diagonal matrix containing
the eigenvalues of A, and
                              	                   
                         P = p1 . . . pU
                                   37
where pu is a normalized eigenvector associated with u for 1  u 
U . If the eigenvalues are not distinct, it is still possible to choose the
eigenvectors to be orthogonal. Finally, because P is an orthogonal
matrix (and P AP = D  PP APP = PDP ), we can nally
represent A as
                                        A = PDP
   In terms of the small numerical example being used, we have for
P AP = D:
                                   
                                                                           
           130    230    5 
                               30    7 0 1      
                                                        130    25   1
                                                                          6   
                                                                           
                                                                         
       
           25       1        0 
                                     0 7 2      
                                                  
                                                        230     1    2    
                                                                                 =
                        5                                     5     6   
                                                                           
              1       2      1      1 2 3             5         0   1
                6        6       6                         30             6
                                                 
                                         2 0 0 
                                               
                                               
                                         0 7 0 
                                               
                                               
                                          0 0 8
and for PDP = A:
                                                                         
           130    25    1
                               6      2 0 0    
                                                       130    230   5 
                                                                         30 
                                              
                                                                         
       
           230     1
                        5
                             2
                               6
                                   
                                      0 7 0    
                                                 
                                                       25       1
                                                                    5
                                                                          0
                                                                            =
                                                                         
                                                                         
             5        0     1         0 0 8            1       2    1
               30              6                           6        6     6
                                                 
                                         7 0 1 
                                               
                                               
                                         0 7 2 
                                               
                                               
                                          1 2 3
  The representation of A as PDP leads to several rather nice
computational tricks. First, if A is p.s.d., we can dene
                                            38
                                  
                                                               
                                
                                
                                    1 . . . 0                  
                                                                
                     D   1/2    
                                 .
                                   .. . . .   ...               
                                                                
                                
                                                              
                                                                
                                   0 . . . U
and represent A as
                   1/2
          L = PD =          1 p 1 2 p 2 . . . U p U
Secondly, if A is p.d., we can dene
                                                           
                                        1
                                       1    ...     0     
                                                           
                         D   1
                                     
                                      
                                          ... . . .   ...   
                                                            
                                                           
                                                      1    
                                          0 ...       U
and represent A1 as
                               A1 = PD1P
To verify,
                  AA1 = (PDP )(PD1P ) = I
Thirdly, to dene a square root matrix, let A1/2  PD1/2P. To
verify, A1/2A1/2 = PDP = A.
   There is a generally interesting way to represent the multiplication
of two matrices considered as collections of column and row vectors,
respectively, where the nal answer is a sum of outer products of
vectors. This view will prove particularly useful in our discussion of
                                          39
principal component analysis. Suppose we have two matrices BUV ,
represented as a collection of its V columns:
                            	                    
                       B = b1 b2 . . . bV
and CV W , represented as a collection of its V rows:
                                       
                                   
                                c1     
                                       
                                      
                                c2     
                             C=
                                .
                                ..
                                        
                                        
                                        
                                       
                                       
                                 cV
The product BC = D can be written as
                                                    
                                          
                                       c1           
                                                    
                     	              
  c           
                                         2          
                 BC = b1 b2 . . . bV  .
                                       ..
                                                     
                                                     
                                                     
                                                         =
                                                    
                                                    
                                        cV
                   b1c1 + b2c2 +    + bV cV = D
   As an example, consider the spectral decomposition of A consid-
ered above as PDP , and where from now on, without loss of any gen-
erality, the diagonal entries in D are ordered as 1  2      U .
We can represent A as
                                                        
                                                        
                    	 
                                                   1 p 1 
                                            
                                              
                                                   .     
                                                          
           AUU =        1 p 1 . . . U p U      .
                                                    .      =
                                               
                                                          
                                                         
                                                   U p U
                      1p1p1 +    + U pU pU
If A is p.s.d. and of rank R, then the above sum obviously stops at
R components. In general, the matrix BUU that is a rank K ( R)
                                   40
least-squares approximation to A can be given by
                   B = 1p1p1 +    + k pK pK
and the value of the loss function:
              U 
               U                                          1
                    (auv  buv )2 = (2K+1 +    + 2U ) 2
             v=1 u=1
                                          44
and XQ+1, . . . , XP , and the observed covariance matrix AP P is par-
titioned into four parts:
                                                 
                                     A11 A12
                          AP P   =
                                   
                                                  
                                                  
                                     A12 A22
where A11 is Q  Q and represents the observed covariances among
the variables in the rst battery; A22 is (P  Q)  (P  Q) and
represents the observed covariances among the variables in the second
battery; A12 is Q  (P  Q) and represents the observed covariances
between the variables in the rst and second batteries. Consider the
following two equations in unknown vectors a and b, and unknown
scalar :
                         A1      1 
                          11 A12 A22 A12 a = a
                         A1     1
                          22 A12 A11 A12 b = b
                                      45
equivalent equation systems are typically used to obtain the solutions
to the original set of expressions.)
                            gB1g  aP P
or if we think correlations rather than merely covariances (so the
main diagonal of A consists of all ones):
                             g B1g  1
Given the correlation matrix B, the possible values the correlations
in g could have are in or on the ellipsoid dened in P  1 dimensions
by gB1g  1. The important point is that we do not have a box
                                  46
in P  1 dimensions containing the correlations with sides extending
the whole range of 1; instead, some restrictions are placed on the
observable correlations that gets dened by the size of the correlation
in B. For example, when P = 3, a correlation between variables
X1 and X2 of r12 = 0 gives the degenerate ellipse of a circle for
constraining the correlation values between X1 and X2 and the third
variable X3 (in a two-dimensional r13 versus r23 coordinate system);
for r12 = 1, the ellipse attens to a line in this same two-dimensional
space.
   Another algebraic restriction that can be seen immediately is based
on the formula for the partial correlation between two variables,
holding the third constant:
                              r12  r13r23
                          
                           (1  r13
                                 2 )(1  r 2 )
                                          23
                                   47
   The solution is to rst nd the singular value decomposition of A
as UDV , where U is n  r and has orthonormal columns, V is
m  r and has orthonormal columns, and D is r  r, diagonal, with
positive values d1  d2      dr > 0 along the main diagonal.
Then, B is dened as UDV, where we take the rst t columns of
U and V to obtain U and V, respectively, and the rst t values,
d1      dt, to form a diagonal matrix D.
   The approximation of A by a rank t matrix B, has been one mech-
anism for representing the row and column objects dening A in a
low-dimensional space of dimension t through what can be generi-
cally labeled as a biplot (the prex bi refers to the representation
of both the row and column objects together in the same space).
Explicitly, the approximation of A and B can be written as
                                                
           B = UDV = UDD(1)V = PQ ,
where  is some chosen number between 0 and 1, P = UD and
                           
is n  t, Q = (D(1)V ) and is m  t.
   The entries in P and Q dene coordinates for the row and column
objects in a t-dimensional space that, irrespective of the value of 
chosen, have the following characteristic:
   If a vector is drawn from the origin through the ith row point and
the m column points are projected onto this vector, the collection of
such projections is proportional to the ith row of the approximating
matrix B. The same is true for projections of row points onto vectors
from the origin through each of the column points.
                                  48
0.4.6   The Procrustes Problem
                            1
                    dij =  (d2ij  Ai  Bj + C);                    (2)
                            2
                                               n
                                               
                          Ai = (1/n)                d2ij ;
                                              j=1
                                               n
                                               
                          Bj = (1/n)                d2ij ;
                                              i=1
                                         n n
                                       2  
                         C = (1/n )                    d2ij .
                                             i=1 j=1
                                        51
   So, the Question: If I give you D = {dij }nn, nd me a set of
coordinates to do it. The Solution: Find D = {dij }, and take its
Spectral Decomposition. This is exact here.
   To use this result to obtain a spatial representation for a set of
n objects given any distance-like measure, pij , between objects i
and j, we proceed as follows:
   (a) Assume (i.e., pretend) the Euclidean model holds for pij .
   (b) Dene pij from pij using (1).
   (c) Obtain a spatial representation for pij using a suitable value
for r, the number of dimensions (at most, r can be no larger than
the number of positive eigenvalues for {pij }nn ):
                               {pij }  XX
  (d) Plot the n points in r dimensional space.
  where
                             
                              n
                                 k=1;k=i bik       (i = j)
                             
                    aij = 
                                   bij            (i = j)
If all elements of B are positive, then A is of rank n  1, and has
one smallest eigenvalue equal to zero with an associated eigenvector
                                      52
having all constant elements. Because all (other) eigenvectors must
be orthogonal to the constant eigenvector, the entries in these other
eigenvectors must sum to zero.
   This Guttman result can be used for a method of multidimensional
scaling (mds), and is one that seems to get reinvented periodically
in the literature. Generally, this method has been used to provide
rational starting points in iteratively-dened nonmetric mds.
                                       53
  sum(sum((X - repmat(mean(X,1), size(X,1), 1)).^2, 1))
X and Y are assumed to have the same number of points (rows), and
PROCRUSTES matches the ith point in Y to the ith point in X. Points
in Y can have smaller dimension (number of columns) than those in X.
In this case, PROCRUSTES adds columns of zeros to Y as necessary.
Examples:
54