
Principal Components Using SAS

Prof. Dr. Mudassir Uddin


Department of Statistics
University of Karachi

Principal Components Analysis

A. The Basic Principle

We wish to explain/summarize the underlying variance-covariance structure of a large set of variables through a few linear combinations of these variables. The objectives of principal components analysis are

- data reduction
- interpretation

The results of principal components analysis are often used as inputs to

- regression analysis
- cluster analysis
B. Population Principal Components

Suppose we have a population measured on p random variables X1,…,Xp. Note that these random variables represent the p axes of the Cartesian coordinate system in which the population resides. Our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability:

[Figure: a scatter of the population in the (X1, X2) plane, with the new axes drawn through the directions of greatest variability.]

This is accomplished by rotating the axes.


Consider our random vector

$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}$$

with covariance matrix $\boldsymbol{\Sigma}$ and eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$.

We can construct p linear combinations

$$\begin{aligned}
Y_1 &= \mathbf{a}_1'\mathbf{X} = a_{11}X_1 + a_{12}X_2 + \cdots + a_{1p}X_p \\
Y_2 &= \mathbf{a}_2'\mathbf{X} = a_{21}X_1 + a_{22}X_2 + \cdots + a_{2p}X_p \\
&\ \ \vdots \\
Y_p &= \mathbf{a}_p'\mathbf{X} = a_{p1}X_1 + a_{p2}X_2 + \cdots + a_{pp}X_p
\end{aligned}$$

It is easy to show that

$$\mathrm{Var}(Y_i) = \mathbf{a}_i'\boldsymbol{\Sigma}\mathbf{a}_i, \quad i = 1, \ldots, p$$

$$\mathrm{Cov}(Y_i, Y_k) = \mathbf{a}_i'\boldsymbol{\Sigma}\mathbf{a}_k, \quad i, k = 1, \ldots, p$$

The principal components are those uncorrelated linear combinations Y1,…,Yp whose variances are as large as possible.
Thus the first principal component is the linear combination of maximum variance, i.e., we wish to solve the nonlinear optimization problem

$$\max_{\mathbf{a}_1} \ \mathbf{a}_1'\boldsymbol{\Sigma}\mathbf{a}_1 \qquad \text{(the quadratic objective is the source of nonlinearity)}$$

$$\text{s.t. } \mathbf{a}_1'\mathbf{a}_1 = 1 \qquad \text{(restricts the search to coefficient vectors of unit length)}$$
The second principal component is the linear combination of maximum variance that is uncorrelated with the first principal component, i.e., we wish to solve the nonlinear optimization problem

$$\max_{\mathbf{a}_2} \ \mathbf{a}_2'\boldsymbol{\Sigma}\mathbf{a}_2$$

$$\text{s.t. } \mathbf{a}_2'\mathbf{a}_2 = 1, \qquad \mathbf{a}_1'\boldsymbol{\Sigma}\mathbf{a}_2 = 0 \quad \text{(restricts the covariance with } Y_1 \text{ to zero)}$$
The third principal component is the solution to the nonlinear optimization problem

$$\max_{\mathbf{a}_3} \ \mathbf{a}_3'\boldsymbol{\Sigma}\mathbf{a}_3$$

$$\text{s.t. } \mathbf{a}_3'\mathbf{a}_3 = 1, \qquad \mathbf{a}_1'\boldsymbol{\Sigma}\mathbf{a}_3 = 0, \qquad \mathbf{a}_2'\boldsymbol{\Sigma}\mathbf{a}_3 = 0 \quad \text{(restricts the covariances to zero)}$$
Generally, the ith principal component is the linear combination of maximum variance that is uncorrelated with all previous principal components, i.e., we wish to solve the nonlinear optimization problem

$$\max_{\mathbf{a}_i} \ \mathbf{a}_i'\boldsymbol{\Sigma}\mathbf{a}_i$$

$$\text{s.t. } \mathbf{a}_i'\mathbf{a}_i = 1, \qquad \mathbf{a}_k'\boldsymbol{\Sigma}\mathbf{a}_i = 0, \quad k < i$$
We can show that, for random vector X with covariance matrix $\boldsymbol{\Sigma}$ and eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$, the ith principal component is given by

$$Y_i = \mathbf{e}_i'\mathbf{X} = e_{i1}X_1 + e_{i2}X_2 + \cdots + e_{ip}X_p, \quad i = 1, \ldots, p$$

where $\mathbf{e}_i$ is the eigenvector associated with $\lambda_i$. Note that the principal components are not unique if some eigenvalues are equal.
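
A minimal PROC IML sketch of this result (an illustration, not part of the derivation), using as input the example covariance matrix Σ from the worked example below:

PROC IML;
   /* example population covariance matrix (the Sigma of the worked example below) */
   Sigma = {1.50 2.50 1.00,
            2.50 6.00 3.50,
            1.00 3.50 5.25};
   CALL EIGEN(lambda, E, Sigma);   /* eigenvalues in descending order; eigenvectors in the columns of E */
   PRINT lambda E;                 /* the ith principal component is Y_i = (ith column of E)` * X */
   prop = lambda / SUM(lambda);    /* proportion of total variance explained by each component */
   PRINT prop;
QUIT;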
We can also show for random vector X with covariance matrix $\boldsymbol{\Sigma}$ and eigenvalue-eigenvector pairs $(\lambda_1, \mathbf{e}_1), \ldots, (\lambda_p, \mathbf{e}_p)$, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$, that

$$\sigma_{11} + \cdots + \sigma_{pp} = \sum_{i=1}^{p} \mathrm{Var}(X_i) = \lambda_1 + \cdots + \lambda_p = \sum_{i=1}^{p} \mathrm{Var}(Y_i)$$

so we can assess how well a subset of the principal components Yi summarizes the original random variables Xi. One common method of doing so is

$$\frac{\lambda_k}{\sum_{i=1}^{p} \lambda_i} = \text{proportion of total population variance due to the kth principal component}$$
If a large proportion of the total population variance can be attributed to relatively few principal components, we can replace the original p variables with these principal components without much loss of information!
We can also easily find the correlations between the original random variables Xk and the principal components Yi:

$$\rho_{Y_i,X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}$$

where $e_{ik}$ is the kth element of $\mathbf{e}_i$. These values are often used in interpreting the principal components Yi.
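
A short PROC IML sketch of this formula (again assuming the example Σ from the next slides as input); rho[i,k] holds the correlation between Yi and Xk:

PROC IML;
   Sigma = {1.50 2.50 1.00,
            2.50 6.00 3.50,
            1.00 3.50 5.25};
   CALL EIGEN(lambda, E, Sigma);
   rho = J(3, 3, 0);                                     /* rho[i,k] = corr(Y_i, X_k) */
   DO i = 1 TO 3;
      DO k = 1 TO 3;
         rho[i,k] = E[k,i] * SQRT(lambda[i]) / SQRT(Sigma[k,k]);
      END;
   END;
   PRINT rho[FORMAT=8.5];
QUIT;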
Example: Suppose we have the following population of four observations made on three random variables X1, X2, and X3:

 X1    X2    X3
1.0   6.0   9.0
4.0  12.0  10.0
3.0  12.0  15.0
4.0  10.0  12.0

Find the three population principal components Y1, Y2, and Y3.

First we need the covariance matrix Σ:

$$\boldsymbol{\Sigma} = \begin{pmatrix} 1.50 & 2.50 & 1.00 \\ 2.50 & 6.00 & 3.50 \\ 1.00 & 3.50 & 5.25 \end{pmatrix}$$

and the corresponding eigenvalue-eigenvector pairs:


 0.2910381
 
λ1 = 9.9145474, e1 =  0.7342493
 0.6133309
 0.4150386
 
λ2 = 2.5344988, e2 =  0.4807165
-0.7724340
 0.8619976
 
λ3 = 0.3009542, e3 = -0.4793640 
 0.1648350
so the principal components are:

$$Y_1 = \mathbf{e}_1'\mathbf{X} = 0.2910381X_1 + 0.7342493X_2 + 0.6133309X_3$$

$$Y_2 = \mathbf{e}_2'\mathbf{X} = 0.4150386X_1 + 0.4807165X_2 - 0.7724340X_3$$

$$Y_3 = \mathbf{e}_3'\mathbf{X} = 0.8619976X_1 - 0.4793640X_2 + 0.1648350X_3$$

Note that

$$\sigma_{11} + \sigma_{22} + \sigma_{33} = 1.50 + 6.00 + 5.25 = 12.75 = 9.9145474 + 2.5344988 + 0.3009542 = \lambda_1 + \lambda_2 + \lambda_3$$

and the proportion of total population variance due to each principal component is

$$\frac{\lambda_1}{\sum_{i=1}^{p}\lambda_i} = \frac{9.9145474}{12.75} = 0.777611529$$

$$\frac{\lambda_2}{\sum_{i=1}^{p}\lambda_i} = \frac{2.5344988}{12.75} = 0.198784220$$

$$\frac{\lambda_3}{\sum_{i=1}^{p}\lambda_i} = \frac{0.3009542}{12.75} = 0.023604251$$

Note that the third principal component is relatively irrelevant!
Next we obtain the correlations between the original random variables Xk and the principal components Yi:

$$\rho_{Y_1,X_1} = \frac{e_{11}\sqrt{\lambda_1}}{\sqrt{\sigma_{11}}} = \frac{0.2910381\sqrt{9.9145474}}{\sqrt{1.50}} = 0.74824$$

$$\rho_{Y_1,X_2} = \frac{e_{12}\sqrt{\lambda_1}}{\sqrt{\sigma_{22}}} = \frac{0.7342493\sqrt{9.9145474}}{\sqrt{6.00}} = 0.94385$$

$$\rho_{Y_1,X_3} = \frac{e_{13}\sqrt{\lambda_1}}{\sqrt{\sigma_{33}}} = \frac{0.6133309\sqrt{9.9145474}}{\sqrt{5.25}} = 0.84285$$

$$\rho_{Y_2,X_1} = \frac{e_{21}\sqrt{\lambda_2}}{\sqrt{\sigma_{11}}} = \frac{0.4150386\sqrt{2.5344988}}{\sqrt{1.50}} = 0.53950$$

$$\rho_{Y_2,X_2} = \frac{e_{22}\sqrt{\lambda_2}}{\sqrt{\sigma_{22}}} = \frac{0.4807165\sqrt{2.5344988}}{\sqrt{6.00}} = 0.31243$$

$$\rho_{Y_2,X_3} = \frac{e_{23}\sqrt{\lambda_2}}{\sqrt{\sigma_{33}}} = \frac{-0.7724340\sqrt{2.5344988}}{\sqrt{5.25}} = -0.53670$$

$$\rho_{Y_3,X_1} = \frac{e_{31}\sqrt{\lambda_3}}{\sqrt{\sigma_{11}}} = \frac{0.8619976\sqrt{0.3009542}}{\sqrt{1.50}} = 0.38611$$

$$\rho_{Y_3,X_2} = \frac{e_{32}\sqrt{\lambda_3}}{\sqrt{\sigma_{22}}} = \frac{-0.4793640\sqrt{0.3009542}}{\sqrt{6.00}} = -0.10736$$

$$\rho_{Y_3,X_3} = \frac{e_{33}\sqrt{\lambda_3}}{\sqrt{\sigma_{33}}} = \frac{0.1648350\sqrt{0.3009542}}{\sqrt{5.25}} = 0.03947$$

We can display these results in a correlation matrix:

        X1        X2        X3
Y1   0.74824   0.94385   0.84285
Y2   0.53950   0.31243  -0.53670
Y3   0.38611  -0.10736   0.03947

Here we can easily see that

- the first principal component (Y1) is a mixture of all three random variables (X1, X2, and X3)
- the second principal component (Y2) is a trade-off between X1 and X3
- the third principal component (Y3) is a residual of X1
When the principal components are derived from an X ~ Np(μ, Σ) distributed population, the density of X is constant on the μ-centered ellipsoids

$$\left(\mathbf{x} - \boldsymbol{\mu}\right)'\boldsymbol{\Sigma}^{-1}\left(\mathbf{x} - \boldsymbol{\mu}\right) = c^2$$

which have axes

$$\pm c\sqrt{\lambda_i}\,\mathbf{e}_i, \quad i = 1, \ldots, p$$

where $(\lambda_i, \mathbf{e}_i)$ are the eigenvalue-eigenvector pairs of $\boldsymbol{\Sigma}$.
We can set μ = 0 without loss of generality; then, since $\boldsymbol{\Sigma}^{-1} = \sum_{i=1}^{p} \frac{1}{\lambda_i}\mathbf{e}_i\mathbf{e}_i'$ (the spectral decomposition), we can write

$$c^2 = \mathbf{x}'\boldsymbol{\Sigma}^{-1}\mathbf{x} = \frac{1}{\lambda_1}\left(\mathbf{e}_1'\mathbf{x}\right)^2 + \cdots + \frac{1}{\lambda_p}\left(\mathbf{e}_p'\mathbf{x}\right)^2$$

where the $\mathbf{e}_i'\mathbf{x}$ are the principal components of x.

Setting $y_i = \mathbf{e}_i'\mathbf{x}$ and substituting into the previous expression yields

$$c^2 = \frac{1}{\lambda_1}y_1^2 + \cdots + \frac{1}{\lambda_p}y_p^2$$

which defines an ellipsoid (note that $\lambda_i > 0$ for all i) in a coordinate system with axes y1,…,yp lying in the directions of e1,…,ep, respectively. The major axis lies in the direction determined by the eigenvector e1 associated with the largest eigenvalue λ1; the remaining minor axes lie in the directions determined by the other eigenvectors.
Example: For the principal components derived from the following population of four observations made on three random variables X1, X2, and X3:

 X1    X2    X3
1.0   6.0   9.0
4.0  12.0  10.0
3.0  12.0  15.0
4.0  10.0  12.0

plot the major and minor axes.

We will need the centroid μ:

$$\boldsymbol{\mu} = \begin{pmatrix} 3.0 \\ 10.0 \\ 11.5 \end{pmatrix}$$

The direction of the major axis is given by

$$\mathbf{e}_1'\mathbf{X} = 0.2910381X_1 + 0.7342493X_2 + 0.6133309X_3$$

while the directions of the two minor axes are given by

$$\mathbf{e}_2'\mathbf{X} = 0.4150386X_1 + 0.4807165X_2 - 0.7724340X_3$$

$$\mathbf{e}_3'\mathbf{X} = 0.8619976X_1 - 0.4793640X_2 + 0.1648350X_3$$
We first graph the centroid:

[Figure: the point (3.0, 10.0, 11.5) plotted in the (X1, X2, X3) coordinate system.]

…then use the first eigenvector to find a second point on the first principal axis. The line connecting these two points is the Y1 axis.

…then do the same thing with the second eigenvector: the line connecting these two points is the Y2 axis.

…and do the same thing with the third eigenvector: the line connecting these two points is the Y3 axis.

[Figures: the Y1, Y2, and Y3 axes drawn through the centroid in the (X1, X2, X3) coordinate system.]

What we have done is a rotation and a translation in p = 3 dimensions.

[Figure: the original (X1, X2, X3) axes together with the new (Y1, Y2, Y3) axes. Note that the rotated axes remain orthogonal!]
Note that we can also construct principal components for the standardized variables Zi:

$$Z_i = \frac{X_i - \mu_i}{\sqrt{\sigma_{ii}}}, \quad i = 1, \ldots, p$$

which in matrix notation is

$$\mathbf{Z} = \left(\mathbf{V}^{1/2}\right)^{-1}\left(\mathbf{X} - \boldsymbol{\mu}\right)$$

where $\mathbf{V}^{1/2}$ is the diagonal standard deviation matrix. Obviously

$$E\left(\mathbf{Z}\right) = \mathbf{0} \qquad \text{and} \qquad \mathrm{Cov}\left(\mathbf{Z}\right) = \left(\mathbf{V}^{1/2}\right)^{-1}\boldsymbol{\Sigma}\left(\mathbf{V}^{1/2}\right)^{-1} = \boldsymbol{\rho}$$

This suggests that the principal components for the standardized variables Zi may be obtained from the eigenvectors of the correlation matrix ρ! The operations are analogous to those used in conjunction with the covariance matrix.
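
A short PROC IML sketch of this relationship (assuming the example Σ from these notes as input): converting the covariance matrix to the correlation matrix and extracting its eigen pairs yields the principal components of the standardized variables.

PROC IML;
   Sigma = {1.50 2.50 1.00,
            2.50 6.00 3.50,
            1.00 3.50 5.25};
   D = SQRT(VECDIAG(Sigma));       /* standard deviations, i.e. the diagonal of V^(1/2) */
   rho = Sigma / (D * D`);         /* elementwise: sigma_ik / sqrt(sigma_ii * sigma_kk) */
   CALL EIGEN(lambda, E, rho);     /* eigen pairs of the correlation matrix */
   PRINT rho[FORMAT=8.4], lambda E;
QUIT;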

We can show that, for random vector Z of standardized variables with covariance matrix $\boldsymbol{\rho}$ and eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$, the ith principal component is given by

$$Y_i = \mathbf{e}_i'\mathbf{Z} = \mathbf{e}_i'\left(\mathbf{V}^{1/2}\right)^{-1}\left(\mathbf{X} - \boldsymbol{\mu}\right), \quad i = 1, \ldots, p$$

Note again that the principal components are not unique if some eigenvalues are equal.
We can also show for random vector Z with covariance matrix $\boldsymbol{\rho}$ and eigenvalue-eigenvector pairs $(\lambda_1, \mathbf{e}_1), \ldots, (\lambda_p, \mathbf{e}_p)$, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$,

$$\sum_{i=1}^{p} \mathrm{Var}(Z_i) = \lambda_1 + \cdots + \lambda_p = \sum_{i=1}^{p} \mathrm{Var}(Y_i) = p$$

and we can again assess how well a subset of the principal components Yi summarizes the original random variables Xi by using

$$\frac{\lambda_k}{p} = \text{proportion of total population variance due to the kth principal component}$$

If a large proportion of the total population variance can be attributed to relatively few principal components, we can replace the original p variables with these principal components without much loss of information!
Example: Suppose we have the following population of four observations made on three random variables X1, X2, and X3:

 X1    X2    X3
1.0   6.0   9.0
4.0  12.0  10.0
3.0  12.0  15.0
4.0  10.0  12.0

Find the three population principal components Y1, Y2, and Y3 for the standardized random variables Z1, Z2, and Z3.

We could standardize the variables X1, X2, and X3 and then work with the resulting covariance matrix, but it is much easier to proceed directly with the correlation matrix ρ:

$$\boldsymbol{\rho} = \begin{pmatrix} 1.0000 & 0.8333 & 0.3563 \\ 0.8333 & 1.0000 & 0.6236 \\ 0.3563 & 0.6236 & 1.0000 \end{pmatrix}$$
and the corresponding eigenvalue-eigenvector pairs:

$$\lambda_1 = 2.2294570, \quad \mathbf{e}_1 = \begin{pmatrix} 0.581128 \\ 0.645363 \\ 0.495779 \end{pmatrix}$$

$$\lambda_2 = 0.6621181, \quad \mathbf{e}_2 = \begin{pmatrix} -0.562643 \\ -0.121542 \\ 0.817717 \end{pmatrix}$$

$$\lambda_3 = 0.1084249, \quad \mathbf{e}_3 = \begin{pmatrix} 0.587982 \\ -0.754145 \\ 0.292477 \end{pmatrix}$$

Note that these results differ from the covariance-based principal components!
so the principal components are:

$$Y_1 = \mathbf{e}_1'\mathbf{Z} = 0.581128Z_1 + 0.645363Z_2 + 0.495779Z_3$$

$$Y_2 = \mathbf{e}_2'\mathbf{Z} = -0.562643Z_1 - 0.121542Z_2 + 0.817717Z_3$$

$$Y_3 = \mathbf{e}_3'\mathbf{Z} = 0.587982Z_1 - 0.754145Z_2 + 0.292477Z_3$$

Note that

$$\mathrm{Var}(Z_1) + \mathrm{Var}(Z_2) + \mathrm{Var}(Z_3) = 1.0 + 1.0 + 1.0 = 3.0 = 2.2294570 + 0.6621181 + 0.1084249 = \lambda_1 + \lambda_2 + \lambda_3$$

and the proportion of total population variance due to each principal component is

$$\frac{\lambda_1}{p} = \frac{2.2294570}{3.0} = 0.7432$$

$$\frac{\lambda_2}{p} = \frac{0.6621181}{3.0} = 0.2207$$

$$\frac{\lambda_3}{p} = \frac{0.1084249}{3.0} = 0.0361$$

Note that the third principal component is again relatively irrelevant!
Next we obtain the correlations between the standardized variables Zk and the principal components Yi (since each Zk has unit variance, the denominator √σkk = 1 drops out):

$$\rho_{Y_1,Z_1} = e_{11}\sqrt{\lambda_1} = 0.581128\sqrt{2.2294570} = 0.86770$$

$$\rho_{Y_1,Z_2} = e_{12}\sqrt{\lambda_1} = 0.645363\sqrt{2.2294570} = 0.96362$$

$$\rho_{Y_1,Z_3} = e_{13}\sqrt{\lambda_1} = 0.495779\sqrt{2.2294570} = 0.74027$$

$$\rho_{Y_2,Z_1} = e_{21}\sqrt{\lambda_2} = -0.562643\sqrt{0.6621181} = -0.45783$$

$$\rho_{Y_2,Z_2} = e_{22}\sqrt{\lambda_2} = -0.121542\sqrt{0.6621181} = -0.09890$$

$$\rho_{Y_2,Z_3} = e_{23}\sqrt{\lambda_2} = 0.817717\sqrt{0.6621181} = 0.66538$$

$$\rho_{Y_3,Z_1} = e_{31}\sqrt{\lambda_3} = 0.587982\sqrt{0.1084249} = 0.19361$$

$$\rho_{Y_3,Z_2} = e_{32}\sqrt{\lambda_3} = -0.754145\sqrt{0.1084249} = -0.24832$$

$$\rho_{Y_3,Z_3} = e_{33}\sqrt{\lambda_3} = 0.292477\sqrt{0.1084249} = 0.09631$$

We can display these results in a correlation matrix:

        Z1        Z2        Z3
Y1   0.86770   0.96362   0.74027
Y2  -0.45783  -0.09890   0.66538
Y3   0.19361  -0.24832   0.09631

Here we can easily see that

- the first principal component (Y1) is a mixture of all three standardized variables (Z1, Z2, and Z3)
- the second principal component (Y2) is a trade-off between Z1 and Z3
- the third principal component (Y3) is a trade-off between Z1 and Z2
SAS code for Principal Components Analysis:
OPTIONS LINESIZE=72 NODATE PAGENO=1;
DATA stuff;
INPUT x1 x2 x3;
LABEL x1='Random Variable 1'
x2='Random Variable 2'
x3='Random Variable 3';
CARDS;
1.0 6.0 9.0
4.0 12.0 10.0
3.0 12.0 15.0
4.0 10.0 12.0
;
PROC PRINCOMP DATA=stuff OUT=pcstuff N=3;
VAR x1 x2 x3;
RUN;
PROC CORR DATA=pcstuff;
VAR x1 x2 x3;
WITH prin1 prin2 prin3;
RUN;
PROC FACTOR DATA=stuff SCREE;
VAR x1 x2 x3;
RUN;
Note that the SAS default is to use the correlation matrix
to perform this analysis!
SAS output for Principal Components Analysis:
The PRINCOMP Procedure
Observations 4
Variables 3

Simple Statistics
x1 x2 x3
Mean 3.000000000 10.00000000 11.50000000
StD 1.414213562 2.82842712 2.64575131

Correlation Matrix
x1 x2 x3
x1 Random Variable 1 1.0000 0.8333 0.3563
x2 Random Variable 2 0.8333 1.0000 0.6236
x3 Random Variable 3 0.3563 0.6236 1.0000

Eigenvalues of the Correlation Matrix


Eigenvalue Difference Proportion Cumulative
1 2.22945702 1.56733894 0.7432 0.7432
2 0.66211808 0.55369318 0.2207 0.9639
3 0.10842490 0.0361 1.0000

Eigenvectors
Prin1 Prin2 Prin3
x1 Random Variable 1 0.581128 -0.562643 0.587982
x2 Random Variable 2 0.645363 -0.121542 -0.754145
x3 Random Variable 3 0.495779 0.817717 0.292477
SAS output for Correlation Matrix – Original Random
Variables vs. Principal Components:
The CORR Procedure

3 With Variables: Prin1 Prin2 Prin3


3 Variables: x1 x2 x3

Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
Prin1 4 0 1.49314 0 -2.20299 1.11219
Prin2 4 0 0.81371 0 -0.94739 0.99579
Prin3 4 0 0.32928 0 -0.28331 0.47104
x1 4 3.00000 1.41421 12.00000 1.00000 4.00000
x2 4 10.00000 2.82843 40.00000 6.00000 12.00000
x3 4 11.50000 2.64575 46.00000 9.00000 15.00000

Pearson Correlation Coefficients, N = 4


Prob > |r| under H0: Rho=0

x1 x2 x3

Prin1 0.86770 0.96362 0.74027


0.1323 0.0364 0.2597

Prin2 -0.45783 -0.09890 0.66538


0.5422 0.9011 0.3346

Prin3 0.19361 -0.24832 0.09631


0.8064 0.7517 0.9037
SAS output for Factor Analysis

PRINCIPAL COMPONENTS ANALYSIS
FOR QA 610
SPRING QUARTER 2001
Using PROC FACTOR to obtain a Scree Plot for Principal Components Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components

Prior Communality Estimates: ONE

Eigenvalues of the Correlation Matrix: Total = 3 Average = 1

       Eigenvalue    Difference    Proportion    Cumulative
   1   2.22945702    1.56733894    0.7432        0.7432
   2   0.66211808    0.55369318    0.2207        0.9639
   3   0.10842490                  0.0361        1.0000

(Note that this is consistent with the results from PCA.)

1 factor will be retained by the MINEIGEN criterion.


SAS output for Factor Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components

[Scree plot of the eigenvalues of the correlation matrix (2.229, 0.662, 0.108) against component number 1-3: the curve drops sharply after the first eigenvalue and flattens thereafter.]
SAS output for Factor Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components

Factor Pattern
                              Factor1
x1   Random Variable 1       0.86770
x2   Random Variable 2       0.96362
x3   Random Variable 3       0.74027

(These are the Pearson correlation coefficients of the first principal component with the three original variables X1, X2, and X3.)

Variance Explained by Each Factor

Factor1
2.2294570     (the first eigenvalue, λ1)

Final Communality Estimates: Total = 2.229457

x1           x2           x3
0.75291032   0.92855392   0.54799278
SAS code for Principal Components Analysis:
OPTIONS LINESIZE=72 NODATE PAGENO=1;
DATA stuff;
INPUT x1 x2 x3;
LABEL x1='Random Variable 1'
x2='Random Variable 2'
x3='Random Variable 3';
CARDS;
1.0 6.0 9.0
4.0 12.0 10.0
3.0 12.0 15.0
4.0 10.0 12.0
;
PROC PRINCOMP DATA=stuff OUT=pcstuff N=3 COV;
VAR x1 x2 x3;
RUN;
PROC CORR DATA=pcstuff;
VAR x1 x2 x3;
WITH prin1 prin2 prin3;
RUN;
PROC FACTOR DATA=stuff SCREE COV;
VAR x1 x2 x3;
RUN;
Note that here we use SAS to derive the covariance
matrix based principal components!
SAS output for Principal Components Analysis:
The PRINCOMP Procedure
Observations 4
Variables 3

Simple Statistics
x1 x2 x3
Mean 3.000000000 10.00000000 11.50000000
StD 1.414213562 2.82842712 2.64575131

Covariance Matrix
x1 x2 x3
x1 Random Variable 1 2.000000000 3.333333333 1.333333333
x2 Random Variable 2 3.333333333 8.000000000 4.666666667
x3 Random Variable 3 1.333333333 4.666666667 7.000000000

Total Variance 17

Eigenvalues of the Covariance Matrix


Eigenvalue Difference Proportion Cumulative
1 13.2193960 9.8400643 0.7776 0.7776
2 3.3793317 2.9780594 0.1988 0.9764
3 0.4012723 0.0236 1.0000

Eigenvectors
Prin1 Prin2 Prin3
x1 Random Variable 1 0.291038 0.415039 0.861998
x2 Random Variable 2 0.734249 0.480716 -.479364
x3 Random Variable 3 0.613331 -.772434 0.164835
SAS output for Correlation Matrix – Original Random
Variables vs. Principal Components:
The CORR Procedure

3 With Variables: Prin1 Prin2 Prin3


3 Variables: x1 x2 x3

Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
Prin1 4 0 3.63585 0 -5.05240 3.61516
Prin2 4 0 1.83830 0 -1.74209 2.53512
Prin3 4 0 0.63346 0 -0.38181 0.94442
x1 4 3.00000 1.41421 12.00000 1.00000 4.00000
x2 4 10.00000 2.82843 40.00000 6.00000 12.00000
x3 4 11.50000 2.64575 46.00000 9.00000 15.00000

Pearson Correlation Coefficients, N = 4


Prob > |r| under H0: Rho=0

x1 x2 x3

Prin1 0.74824 0.94385 0.84285


0.2518 0.0561 0.1571

Prin2 0.53950 0.31243 -0.53670


0.4605 0.6876 0.4633

Prin3 0.38611 -0.10736 0.03947


0.6139 0.8926 0.9605
SAS output for Factor Analysis

PRINCIPAL COMPONENTS ANALYSIS
FOR QA 610
SPRING QUARTER 2001
Using PROC FACTOR to obtain a Scree Plot for Principal Components Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components

Prior Communality Estimates: ONE

Eigenvalues of the Covariance Matrix: Total = 17 Average = 5.66666667

       Eigenvalue    Difference    Proportion    Cumulative
   1   13.2193960    9.8400643     0.7776        0.7776
   2    3.3793317    2.9780594     0.1988        0.9764
   3    0.4012723                  0.0236        1.0000

(Note that this is consistent with the results from PCA.)

1 factor will be retained by the MINEIGEN criterion.


SAS output for Factor Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components

[Scree plot of the eigenvalues of the covariance matrix (13.219, 3.379, 0.401) against component number 1-3: again a sharp drop after the first eigenvalue.]
SAS output for Factor Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components

Factor Pattern
                              Factor1
x1   Random Variable 1       0.74824
x2   Random Variable 2       0.94385
x3   Random Variable 3       0.84285

(These are the Pearson correlation coefficients of the first principal component with the three original variables X1, X2, and X3.)

Variance Explained by Each Factor

Factor     Weighted       Unweighted
Factor1    13.2193960     2.16112149

(The weighted value is the first eigenvalue, λ1.)

Final Communality Estimates and Variable Weights
Total Communality: Weighted = 13.219396  Unweighted = 2.161121

Variable    Communality    Weight
x1          0.55986257     2.00000000
x2          0.89085847     8.00000000
x3          0.71040045     7.00000000
Covariance matrices with special structures yield particularly interesting principal components:

- Diagonal covariance matrices — suppose Σ is the diagonal matrix

$$\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_{11} & 0 & \cdots & 0 \\ 0 & \sigma_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{pp} \end{pmatrix}$$

Since the eigenvector $\mathbf{e}_i$ has a value of 1 in the ith position and 0 in all other positions, we have

$$\boldsymbol{\Sigma}\mathbf{e}_i = \sigma_{ii}\mathbf{e}_i$$

so $(\sigma_{ii}, \mathbf{e}_i)$ is the ith eigenvalue-eigenvector pair, and the linear combination

$$Y_i = \mathbf{e}_i'\mathbf{X} = X_i$$

demonstrates that the set of principal components and the original set of (uncorrelated) random variables are the same! Note that this result is also true if we work with the correlation matrix.
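
A quick PROC IML check of this fact, using a hypothetical diagonal covariance matrix (the entries 4, 2, 1 are chosen only for illustration):

PROC IML;
   Sigma = DIAG({4 2 1});          /* hypothetical diagonal covariance matrix */
   CALL EIGEN(lambda, E, Sigma);
   PRINT lambda E;                 /* lambda = the diagonal entries; E = the identity (up to order and sign) */
QUIT;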
- Constant variances and covariances — suppose Σ is the patterned matrix

$$\boldsymbol{\Sigma} = \begin{pmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \cdots & \rho\sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho\sigma^2 & \rho\sigma^2 & \cdots & \sigma^2 \end{pmatrix}$$

Here the resulting correlation matrix

$$\boldsymbol{\rho} = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}$$

is also the covariance matrix of the standardized variables Z.
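
For this equicorrelation matrix the eigenstructure has a well-known closed form (the standard textbook result, stated here for completeness):

$$\lambda_1 = 1 + (p-1)\rho, \qquad \mathbf{e}_1 = \frac{1}{\sqrt{p}}\left(1, 1, \ldots, 1\right)'$$

with the remaining $p - 1$ eigenvalues all equal to $1 - \rho$. The first principal component is proportional to the simple average of the standardized variables, and for $\rho$ near 1 it accounts for nearly all of the total variance.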
C. Using Principal Components to Summarize Sample Variation

Suppose the data x1,…,xn represent n independent observations from a p-dimensional population with some mean vector μ and covariance matrix Σ. These data yield a sample mean vector x̄, a sample covariance matrix S, and a sample correlation matrix R.

As in the population case, our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability:

$$\begin{aligned}
y_1 &= \mathbf{a}_1'\mathbf{x} = a_{11}x_1 + a_{12}x_2 + \cdots + a_{1p}x_p \\
y_2 &= \mathbf{a}_2'\mathbf{x} = a_{21}x_1 + a_{22}x_2 + \cdots + a_{2p}x_p \\
&\ \ \vdots \\
y_p &= \mathbf{a}_p'\mathbf{x} = a_{p1}x_1 + a_{p2}x_2 + \cdots + a_{pp}x_p
\end{aligned}$$
Again it is easy to show that the linear combinations

$$\mathbf{a}_i'\mathbf{x}_j = a_{i1}x_{j1} + a_{i2}x_{j2} + \cdots + a_{ip}x_{jp}$$

have sample means $\mathbf{a}_i'\bar{\mathbf{x}}$ and

$$\widehat{\mathrm{Var}}\left(\mathbf{a}_i'\mathbf{x}\right) = \mathbf{a}_i'\mathbf{S}\mathbf{a}_i, \quad i = 1, \ldots, p$$

$$\widehat{\mathrm{Cov}}\left(\mathbf{a}_i'\mathbf{x}, \mathbf{a}_k'\mathbf{x}\right) = \mathbf{a}_i'\mathbf{S}\mathbf{a}_k, \quad i, k = 1, \ldots, p$$

The sample principal components are those uncorrelated linear combinations ŷ1,…,ŷp whose sample variances are as large as possible.
Thus the first sample principal component is the linear combination of maximum sample variance, i.e., we wish to solve the nonlinear optimization problem

$$\max_{\mathbf{a}_1} \ \mathbf{a}_1'\mathbf{S}\mathbf{a}_1 \qquad \text{(the quadratic objective is the source of nonlinearity)}$$

$$\text{s.t. } \mathbf{a}_1'\mathbf{a}_1 = 1 \qquad \text{(restricts the search to coefficient vectors of unit length)}$$
The second sample principal component is the linear combination of maximum sample variance that is uncorrelated with the first principal component, i.e., we wish to solve the nonlinear optimization problem

$$\max_{\mathbf{a}_2} \ \mathbf{a}_2'\mathbf{S}\mathbf{a}_2$$

$$\text{s.t. } \mathbf{a}_2'\mathbf{a}_2 = 1, \qquad \mathbf{a}_1'\mathbf{S}\mathbf{a}_2 = 0 \quad \text{(restricts the sample covariance to zero)}$$
The third sample principal component is the solution to the nonlinear optimization problem

$$\max_{\mathbf{a}_3} \ \mathbf{a}_3'\mathbf{S}\mathbf{a}_3$$

$$\text{s.t. } \mathbf{a}_3'\mathbf{a}_3 = 1, \qquad \mathbf{a}_1'\mathbf{S}\mathbf{a}_3 = 0, \qquad \mathbf{a}_2'\mathbf{S}\mathbf{a}_3 = 0 \quad \text{(restricts the sample covariances to zero)}$$
Generally, the ith sample principal component is the linear combination of maximum sample variance that is uncorrelated with all previous principal components, i.e., we wish to solve the nonlinear optimization problem

$$\max_{\mathbf{a}_i} \ \mathbf{a}_i'\mathbf{S}\mathbf{a}_i$$

$$\text{s.t. } \mathbf{a}_i'\mathbf{a}_i = 1, \qquad \mathbf{a}_k'\mathbf{S}\mathbf{a}_i = 0, \quad k < i$$
We can show that, for a random sample with sample covariance matrix S and eigenvalues $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_p \geq 0$, the ith sample principal component is given by

$$\hat{y}_i = \hat{\mathbf{e}}_i'\mathbf{x} = \hat{e}_{i1}x_1 + \hat{e}_{i2}x_2 + \cdots + \hat{e}_{ip}x_p, \quad i = 1, \ldots, p$$

Note that the principal components are not unique if some eigenvalues are equal.
We can also show for a random sample with sample covariance matrix S and eigenvalue-eigenvector pairs $(\hat{\lambda}_1, \hat{\mathbf{e}}_1), \ldots, (\hat{\lambda}_p, \hat{\mathbf{e}}_p)$, where $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_p$,

$$s_{11} + \cdots + s_{pp} = \sum_{i=1}^{p} s_{ii} = \hat{\lambda}_1 + \cdots + \hat{\lambda}_p = \sum_{i=1}^{p} \widehat{\mathrm{Var}}(y_i)$$

so we can assess how well a subset of the principal components yi summarizes the original sample. One common method of doing so is

$$\frac{\hat{\lambda}_k}{\sum_{i=1}^{p} \hat{\lambda}_i} = \text{proportion of total sample variance due to the kth principal component}$$

If a large proportion of the total sample variance can be attributed to relatively few principal components, we can replace the original p variables with these principal components without much loss of information!
We can also easily find the correlations between the original variables xk and the sample principal components yi:

$$r_{y_i,x_k} = \frac{\hat{e}_{ik}\sqrt{\hat{\lambda}_i}}{\sqrt{s_{kk}}}$$

These values are often used in interpreting the principal components yi.
Note that

- the approach for standardized data (i.e., principal components derived from the sample correlation matrix R) is analogous to the population approach
- when principal components are derived from sample data, the data are frequently centered,

$$\mathbf{x}_j - \bar{\mathbf{x}}$$

which has no effect on the sample covariance matrix S and yields the derived principal components

$$\hat{y}_i = \hat{\mathbf{e}}_i'\left(\mathbf{x} - \bar{\mathbf{x}}\right)$$
Under these circumstances, the mean value of the ith principal component taken across all n observations in the data set is

$$\bar{\hat{y}}_i = \frac{1}{n}\sum_{j=1}^{n} \hat{\mathbf{e}}_i'\left(\mathbf{x}_j - \bar{\mathbf{x}}\right) = \hat{\mathbf{e}}_i'\left[\frac{1}{n}\sum_{j=1}^{n}\left(\mathbf{x}_j - \bar{\mathbf{x}}\right)\right] = \hat{\mathbf{e}}_i'\mathbf{0} = 0$$
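
A minimal PROC IML sketch of the sample case, using the four observations from the examples in these notes: center the data, project onto the eigenvectors of S, and verify that every sample principal component has mean zero.

PROC IML;
   X = {1  6  9,
        4 12 10,
        3 12 15,
        4 10 12};
   n = NROW(X);
   xbar = X[:,];                        /* row vector of column means */
   Xc = X - REPEAT(xbar, n, 1);         /* centered data */
   S = Xc` * Xc / (n - 1);              /* sample covariance matrix */
   CALL EIGEN(lambda, E, S);
   Y = Xc * E;                          /* sample principal component scores */
   PRINT (Y[:,])[LABEL="mean of each sample principal component (all zero)"];
QUIT;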
Example: Suppose we have the following sample of four observations made on three random variables X1, X2, and X3:

 X1    X2    X3
1.0   6.0   9.0
4.0  12.0  10.0
3.0  12.0  15.0
4.0  10.0  12.0

Find the three sample principal components y1, y2, and y3 based on the sample covariance matrix S.

First we need the sample covariance matrix S:

$$\mathbf{S} = \begin{pmatrix} 2.00 & 3.33 & 1.33 \\ 3.33 & 8.00 & 4.67 \\ 1.33 & 4.67 & 7.00 \end{pmatrix}$$

and the corresponding eigenvalue-eigenvector pairs:


 0.291000
ˆ = 13.21944, e
ˆ  
λ 1 1 =  0.734253 
 0.613345
 0.415126
ˆ = 3.37916, eˆ  
λ 2 2 =  0.480690 
-0.772403
 0.861968
ˆ =
λ ˆ3 = -0.479385
0.40140, e
3  
 0.164927
so the sample principal components are:

$$\hat{y}_1 = \hat{\mathbf{e}}_1'\mathbf{x} = 0.291000x_1 + 0.734253x_2 + 0.613345x_3$$

$$\hat{y}_2 = \hat{\mathbf{e}}_2'\mathbf{x} = 0.415126x_1 + 0.480690x_2 - 0.772403x_3$$

$$\hat{y}_3 = \hat{\mathbf{e}}_3'\mathbf{x} = 0.861968x_1 - 0.479385x_2 + 0.164927x_3$$

Note that

$$s_{11} + s_{22} + s_{33} = 2.0 + 8.0 + 7.0 = 17.0 = 13.21944 + 3.37916 + 0.40140 = \hat{\lambda}_1 + \hat{\lambda}_2 + \hat{\lambda}_3$$

and the proportion of total sample variance due to each principal component is

$$\frac{\hat{\lambda}_1}{\sum_{i=1}^{p}\hat{\lambda}_i} = \frac{13.21944}{17.0} = 0.777613814$$

$$\frac{\hat{\lambda}_2}{\sum_{i=1}^{p}\hat{\lambda}_i} = \frac{3.37916}{17.0} = 0.198774404$$

$$\frac{\hat{\lambda}_3}{\sum_{i=1}^{p}\hat{\lambda}_i} = \frac{0.40140}{17.0} = 0.023611782$$

Note that the third principal component is relatively irrelevant!
Next we obtain the correlations between the observed values xk of the original variables and the sample principal components yi:

$$r_{y_1,x_1} = \frac{\hat{e}_{11}\sqrt{\hat{\lambda}_1}}{\sqrt{s_{11}}} = \frac{0.291000\sqrt{13.21944}}{\sqrt{2.0}} = 0.7481$$

$$r_{y_1,x_2} = \frac{\hat{e}_{12}\sqrt{\hat{\lambda}_1}}{\sqrt{s_{22}}} = \frac{0.734253\sqrt{13.21944}}{\sqrt{8.0}} = 0.9439$$

$$r_{y_1,x_3} = \frac{\hat{e}_{13}\sqrt{\hat{\lambda}_1}}{\sqrt{s_{33}}} = \frac{0.613345\sqrt{13.21944}}{\sqrt{7.0}} = 0.8429$$

$$r_{y_2,x_1} = \frac{\hat{e}_{21}\sqrt{\hat{\lambda}_2}}{\sqrt{s_{11}}} = \frac{0.415126\sqrt{3.37916}}{\sqrt{2.0}} = 0.5396$$

$$r_{y_2,x_2} = \frac{\hat{e}_{22}\sqrt{\hat{\lambda}_2}}{\sqrt{s_{22}}} = \frac{0.480690\sqrt{3.37916}}{\sqrt{8.0}} = 0.3124$$

$$r_{y_2,x_3} = \frac{\hat{e}_{23}\sqrt{\hat{\lambda}_2}}{\sqrt{s_{33}}} = \frac{-0.772403\sqrt{3.37916}}{\sqrt{7.0}} = -0.5367$$

$$r_{y_3,x_1} = \frac{\hat{e}_{31}\sqrt{\hat{\lambda}_3}}{\sqrt{s_{11}}} = \frac{0.861968\sqrt{0.40140}}{\sqrt{2.0}} = 0.3862$$

$$r_{y_3,x_2} = \frac{\hat{e}_{32}\sqrt{\hat{\lambda}_3}}{\sqrt{s_{22}}} = \frac{-0.479385\sqrt{0.40140}}{\sqrt{8.0}} = -0.1074$$

$$r_{y_3,x_3} = \frac{\hat{e}_{33}\sqrt{\hat{\lambda}_3}}{\sqrt{s_{33}}} = \frac{0.164927\sqrt{0.40140}}{\sqrt{7.0}} = 0.0395$$

We can display these results in a correlation matrix:

        X1       X2       X3
y1   0.7481   0.9439   0.8429
y2   0.5396   0.3124  -0.5367
y3   0.3862  -0.1074   0.0395

How would we interpret these results?

Note that results based on the sample correlation matrix R will not differ from results based on the population correlation matrix ρ (why?).
SAS code for Principal Components Analysis:

OPTIONS LINESIZE=72 NODATE PAGENO=1;
DATA stuff;
INPUT x1 x2 x3;
LABEL x1='Random Variable 1'
x2='Random Variable 2'
x3='Random Variable 3';
CARDS;
1.0 6.0 9.0
4.0 12.0 10.0
3.0 12.0 15.0
4.0 10.0 12.0
;
PROC PRINCOMP DATA=stuff COV OUT=pcstuff;
VAR x1 x2 x3;
TITLE4 'Using PROC PRINCOMP for Principal Components Analysis';
RUN;
PROC CORR DATA=pcstuff;
VAR x1 x2 x3;
WITH prin1 prin2 prin3;
RUN;

The COV option is used to instruct SAS to perform the principal components analysis on the sample covariance matrix rather than the default correlation matrix.
SAS output for Principal Components Analysis:
The PRINCOMP Procedure
Observations 4
Variables 3

Simple Statistics
x1 x2 x3
Mean 3.000000000 10.00000000 11.50000000
StD 1.414213562 2.82842712 2.64575131

Covariance Matrix
x1 x2 x3
x1 Random Variable 1 2.000000000 3.333333333 1.333333333
x2 Random Variable 2 3.333333333 8.000000000 4.666666667
x3 Random Variable 3 1.333333333 4.666666667 7.000000000

Total Variance 17

Eigenvalues of the Covariance Matrix


Eigenvalue Difference Proportion Cumulative
1 13.2193960 9.8400643 0.7776 0.7776
2 3.3793317 2.9780594 0.1988 0.9764
3 0.4012723 0.0236 1.0000

Eigenvectors
Prin1 Prin2 Prin3
x1 Random Variable 1 0.291038 0.415039 0.861998
x2 Random Variable 2 0.734249 0.480716 -0.479364
x3 Random Variable 3 0.613331 -0.772434 0.164835
SAS output for Correlation Matrix – Original Random Variables vs. Principal Components:

The CORR Procedure

3 With Variables: Prin1 Prin2 Prin3
3 Variables: x1 x2 x3

Simple Statistics
Variable  N  Mean      Std Dev  Sum       Minimum   Maximum
Prin1     4  0         3.63585  0         -5.05240  3.61516
Prin2     4  0         1.83830  0         -1.74209  2.53512
Prin3     4  0         0.63346  0         -0.38181  0.94442
x1        4  3.00000   1.41421  12.00000  1.00000   4.00000
x2        4  10.00000  2.82843  40.00000  6.00000   12.00000
x3        4  11.50000  2.64575  46.00000  9.00000   15.00000

Pearson Correlation Coefficients, N = 4
Prob > |r| under H0: Rho=0

            x1        x2        x3

Prin1    0.74824   0.94385   0.84285
         0.2518    0.0561    0.1571

Prin2    0.53950   0.31243  -0.53670
         0.4605    0.6876    0.4633

Prin3    0.38611  -0.10736   0.03947
         0.6139    0.8926    0.9605
