Stat 5311 - Multivariate Statistics and
Nonparametric Statistics
Dexter Cahoy, University of Houston-Downtown
1 / 27
Outline
• Multidimensional Scaling
• Chapter 4 in EH and Chapter 12 in JW
• Notation and terminology: JW uses p, DZ uses m, and EH
uses q.
2 / 27
Multidimensional Scaling
• Multidimensional scaling (MDS) is a method that uses
proximities between objects, subjects, or stimuli to produce a
spatial representation of these items.
• Proximities express the similarity or dissimilarity between data
objects. They often arise from judgements made by human raters
about how alike pairs of objects, stimuli, etc., of interest are.
• It is a dimension reduction technique which aims to find a set
of points in low dimension (typically 2 dimensions) that reflect
the configuration of the high-dimensional data.
3 / 27
Classical (Metric) MDS
• Deals with Euclidean distances, D, derived from a raw n × q
data matrix X with zero column means (the mean is subtracted
from each column).
• Find a low-dimensional representation of the data which mimics
the original distances.
• Let X be known (for now) and
B = XX^T .
• Elementwise, this gives bij = Σ_{l=1}^{q} xil xjl .
• The squared Euclidean distance between the ith and jth rows is
dij² = bii + bjj − 2bij .
• Under certain conditions (see p. 108 of EH), the elements of B,
and hence X, can be recovered from the squared Euclidean
distances dij².
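The recovery of B from the distances is a double-centering: B = −(1/2) J D⁽²⁾ J, where D⁽²⁾ holds the squared distances dij² and J = I − (1/n)11^T. A minimal NumPy sketch of this step (Python used here purely for illustration; the toy data and variable names are ours, not EH's):

```python
import numpy as np

# Toy centered data matrix X (n x q) and its squared Euclidean distances.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
X = X - X.mean(axis=0)                               # column-center X
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # d_ij^2 for all pairs

# Double-centering recovers B = X X^T from the squared distances alone.
n = D2.shape[0]
J = np.eye(n) - np.ones((n, n)) / n                  # centering matrix
B = -0.5 * J @ D2 @ J

assert np.allclose(B, X @ X.T)                       # b_ij = sum_l x_il * x_jl
```

This is why a zero-mean X is assumed: distances are translation-invariant, so B = XX^T is only recoverable once a centering convention is fixed.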
4 / 27
Classical (Metric) MDS
• By the spectral decomposition, if X has full rank then
B = V Λ V^T
where V contains the first q eigenvectors and Λ contains the q
non-zero eigenvalues λ1 , λ2 , . . . , λq .
• The required coordinate values are thus
X = V Λ^{1/2}
where Λ^{1/2} = diag( λ1^{1/2} , λ2^{1/2} , . . . , λq^{1/2} ).
• Classical MDS problem: given the proximities, find X.
• GOAL: Find adequate k-dimensional coordinates X̂n×k , k ≤ q.
5 / 27
Classical (Metric) MDS
• The adequacy of the k-dimensional representation can be
judged by the size of the criterion
Pk = ( Σ_{i=1}^{k} λi ) / ( Σ_{j=1}^{n} λj ).
• Pk > 0.8 is desirable (according to EH).
• If the proximity matrix is non-Euclidean, one can instead use
Pk = ( Σ_{i=1}^{k} |λi|^r ) / ( Σ_{j=1}^{n} |λj|^r ), r = 1, 2.
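As a check on the criterion, the five non-zero eigenvalues from the 10-point example later in the slides (75.19, 58.81, 49.61, 30.43, 10.37, rounded) give P3 ≈ 0.818, the same value cmdscale() reports as GOF for k = 3. A small sketch (Python, illustrative only):

```python
import numpy as np

def P_k(lam, k, r=1):
    """Adequacy criterion P_k; r = 1 or 2 gives the non-Euclidean variants."""
    a = np.abs(np.asarray(lam, dtype=float)) ** r
    return a[:k].sum() / a.sum()   # lam assumed sorted in decreasing magnitude

lam = [75.19, 58.81, 49.61, 30.43, 10.37]  # rounded from the cmdscale() output
print(round(P_k(lam, 3), 3))               # → 0.818, matching the reported GOF
```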
6 / 27
Example: Euclidean distances
X = c(3, 4, 4, 6, 1,
      5, 1, 1, 7, 3,
      6, 2, 0, 2, 6,
      1, 1, 1, 0, 3,
      4, 7, 3, 6, 2,
      2, 2, 5, 1, 0,
      0, 4, 1, 1, 1,
      0, 6, 4, 3, 5,
      7, 6, 5, 1, 4,
      2, 1, 4, 3, 1)
X = matrix(X, nrow = 10, byrow = TRUE)
7 / 27
Example: Euclidean distances
(D <- dist(X))
## 1 2 3 4 5 6 7 8
## 2 5.196152
## 3 8.366600 6.082763
## 4 7.874008 8.062258 6.324555
## 5 3.464102 6.557439 8.366600 9.273618
## 6 5.656854 8.426150 8.831761 5.291503 7.874008
## 7 6.557439 8.602325 8.185353 3.872983 7.416198 5.000000
## 8 6.164414 8.888194 8.366600 6.928203 6.000000 7.071068 5.744563
## 9 7.416198 9.055385 6.855655 8.888194 6.557439 7.549834 8.831761 7.416198
## 10 4.358899 6.164414 7.681146 4.795832 7.141428 2.645751 5.099020 6.708204
## 9
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10 8.000000
8 / 27
Example: Euclidean distances
hatX=cmdscale(D, k = 9, eig = TRUE)$points # \hat{X} solution
hatX
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -1.6038325 2.38060903 -2.2301092 -0.3656856 0.11536476 8.421598e-08
## [2,] -2.8246377 -2.30937202 -3.9523782 0.3419185 0.33169405 -2.312861e-08
## [3,] -1.6908272 -5.13970089 1.2880306 0.6503227 -0.05133897 1.074339e-08
## [4,] 3.9527719 -2.43233961 0.3833746 0.6863995 -0.03460933 3.254229e-08
## [5,] -3.5984894 2.75538195 -0.2551393 1.0783741 -1.26125237 -2.345830e-08
## [6,] 2.9520356 1.35475175 -0.1899027 -2.8211220 0.12385813 -3.319300e-08
## [7,] 3.4689928 0.76411068 0.3016531 1.6369166 -1.94209512 6.148806e-09
## [8,] 0.3545235 2.31408566 2.2161772 2.9240116 2.00450379 -6.654403e-09
## [9,] -2.9362323 -0.01279597 4.3117385 -2.5122743 -0.18911558 1.773836e-08
## [10,] 1.9256952 0.32526941 -1.8734445 -1.6188611 0.90299062 3.093294e-09
## [,7]
## [1,] 0.000000e+00
## [2,] 1.565503e-08
## [3,] 1.532955e-08
## [4,] 2.353591e-08
## [5,] 3.588394e-08
## [6,] 6.457705e-09
## [7,] 1.530651e-08
## [8,] 1.652824e-08
## [9,] 2.040312e-08
## [10,] 4.576136e-08
9 / 27
Example: Euclidean distances
hatX=cmdscale(D, k = 9, eig = TRUE)$points # \hat{X} solution
dist(hatX)
## 1 2 3 4 5 6 7 8
## 2 5.196152
## 3 8.366600 6.082763
## 4 7.874008 8.062258 6.324555
## 5 3.464102 6.557439 8.366600 9.273618
## 6 5.656854 8.426150 8.831761 5.291503 7.874008
## 7 6.557439 8.602325 8.185353 3.872983 7.416198 5.000000
## 8 6.164414 8.888194 8.366600 6.928203 6.000000 7.071068 5.744563
## 9 7.416198 9.055385 6.855655 8.888194 6.557439 7.549834 8.831761 7.416198
## 10 4.358899 6.164414 7.681146 4.795832 7.141428 2.645751 5.099020 6.708204
## 9
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10 8.000000
10 / 27
Example: Euclidean distances
cmdscale(D, k = 9, eig = TRUE)$eig
## [1] 7.518716e+01 5.880560e+01 4.960516e+01 3.042789e+01 1.037419e+01
## [6] 1.086006e-14 5.381235e-15 -5.316641e-15 -8.539862e-15 -1.050854e-14
#five-dimensional solution achieves complete
#recovery of the observed distance matrix.
max(abs(dist(X) - dist(cmdscale(D, k = 5))))
## [1] 1.065814e-14
#comparing the first 5 principal components with the 5-dimensional scaling solution
max(abs(prcomp(X)$x) - abs(cmdscale(D, k = 5)))
## [1] 2.663494e-14
cmdscale(D, k = 3, eig = TRUE)$GOF
## [1] 0.8181726 0.8181726
11 / 27
Example: Non-Euclidean (airline) distances of US cities.
library(psychTools)
data(cities)
cities
## ATL BOS ORD DCA DEN LAX MIA JFK SEA SFO MSY
## ATL 0 934 585 542 1209 1942 605 751 2181 2139 424
## BOS 934 0 853 392 1769 2601 1252 183 2492 2700 1356
## ORD 585 853 0 598 918 1748 1187 720 1736 1857 830
## DCA 542 392 598 0 1493 2305 922 209 2328 2442 964
## DEN 1209 1769 918 1493 0 836 1723 1636 1023 951 1079
## LAX 1942 2601 1748 2305 836 0 2345 2461 957 341 1679
## MIA 605 1252 1187 922 1723 2345 0 1092 2733 2594 669
## JFK 751 183 720 209 1636 2461 1092 0 2412 2577 1173
## SEA 2181 2492 1736 2328 1023 957 2733 2412 0 681 2101
## SFO 2139 2700 1857 2442 951 341 2594 2577 681 0 1925
## MSY 424 1356 830 964 1079 1679 669 1173 2101 1925 0
12 / 27
Example
city.loc2 <- cmdscale(cities, k = 2, eig = TRUE)
#cmdscale(cities, k = 2, eig = TRUE)$points
x <- -city.loc2$points[,1]
y <- -city.loc2$points[,2] # reflect so North is at the top
plot(x, y, xlab = "Coordinate 1", ylab = "Coordinate 2",
     xlim = range(x) * 1.2, type = "n")
text(x, y, labels = colnames(cities), cex = 0.7)
13 / 27
Example
[Figure: two-dimensional classical MDS solution for the airline distances (Coordinate 1 vs. Coordinate 2), plotted with North at the top; the city labels SEA, SFO, LAX, DEN, ORD, MSY, ATL, MIA, DCA, JFK, BOS roughly reproduce the geographic map of the US.]
14 / 27
Example
lam <- city.loc2$eig
lam
## [1] 1.097898e+07 1.972910e+06 1.335364e+04 1.579915e+03 6.352201e+02
## [6] 5.328605e+01 -1.236913e-10 -1.982622e+02 -1.054745e+03 -4.225182e+03
## [11] -4.352426e+04
cumsum(abs(lam)) / sum(abs(lam)) # non-Euclidean MDS
## [1] 0.8434654 0.9950352 0.9960611 0.9961825 0.9962313 0.9962354 0.9962354
## [8] 0.9962506 0.9963316 0.9966562 1.0000000
cumsum(lam^2) / sum(lam^2) # non-Euclidean MDS
## [1] 0.9687021 0.9999832 0.9999846 0.9999846 0.9999846 0.9999846 0.9999846
## [8] 0.9999846 0.9999846 0.9999848 1.0000000
15 / 27
Example
par(mfrow = c(2, 1))
plot(1:length(lam), cumsum(lam))
plot(1:length(lam), cumsum(abs(lam^2)))

[Figure: scree plots of cumsum(lam) (top) and cumsum(abs(lam^2)) (bottom) against the eigenvalue index 1, …, 11; both curves are essentially flat after the first two eigenvalues.]
16 / 27
Non-metric MDS
• The idea is to find the coordinates X̂n×k , k ≤ q, of a spatial
representation of the observed dissimilarities that minimize the
Stress function

Stress(k) = min { ( Σ_{i<j} ( d̂ij − dij )² ) / ( Σ_{i<j} dij² ) }^{1/2} ,

where d̂ij is a (monotone) function of the proximity data, and dij is the
Euclidean distance between the fitted coordinates.
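Evaluating the Stress formula itself is straightforward once the disparities d̂ij and configuration distances dij are in hand; the hard part, which isoMDS() handles, is the joint optimization over configurations and monotone transformations. A minimal sketch (Python, illustrative only):

```python
import numpy as np

def stress(d_hat, d):
    """Kruskal stress-1: sqrt( sum (d_hat - d)^2 / sum d^2 ) over pairs i < j."""
    d_hat = np.asarray(d_hat, dtype=float)
    d = np.asarray(d, dtype=float)
    return float(np.sqrt(((d_hat - d) ** 2).sum() / (d ** 2).sum()))

d = np.array([1.0, 2.0, 3.0])         # configuration distances for three pairs
assert stress(d, d) == 0.0            # a perfect fit gives zero stress
assert stress([1.1, 2.0, 3.0], d) > 0 # any mismatch increases stress
```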
17 / 27
House of Representatives Example
Romesburg (1984) shows the number of times 15 congressmen from NJ voted differently on 19 environmental bills.
## Hunt(R) Sandman(R) Howard(D) Thompson(D) Freylinghuysen(R)
## Hunt(R) 0 8 15 15 10
## Sandman(R) 8 0 17 12 13
## Howard(D) 15 17 0 9 16
## Thompson(D) 15 12 9 0 14
## Freylinghuysen(R) 10 13 16 14 0
## Forsythe(R) 9 13 12 12 8
## Widnall(R) 7 12 15 13 9
## Roe(D) 15 16 5 10 13
## Heltoski(D) 16 17 5 8 14
## Rodino(D) 14 15 6 8 12
## Minish(D) 15 16 5 8 12
## Rinaldo(R) 16 17 4 6 12
## Maraziti(R) 7 13 11 15 10
## Daniels(D) 11 12 10 10 11
## Patten(D) 13 16 7 7 11
## Forsythe(R) Widnall(R) Roe(D) Heltoski(D) Rodino(D) Minish(D)
## Hunt(R) 9 7 15 16 14 15
## Sandman(R) 13 12 16 17 15 16
## Howard(D) 12 15 5 5 6 5
## Thompson(D) 12 13 10 8 8 8
## Freylinghuysen(R) 8 9 13 14 12 12
## Forsythe(R) 0 7 12 11 10 9
## Widnall(R) 7 0 17 16 15 14
## Roe(D) 12 17 0 4 5 5
## Heltoski(D) 11 16 4 0 3 2
## Rodino(D) 10 15 5 3 0 1
## Minish(D) 9 14 5 2 1 0
## Rinaldo(R) 10 15 3 1 2 1
## Maraziti(R) 6 10 12 13 11 12
## Daniels(D) 6 11 7 7 4 5
## Patten(D) 10 13 6 5 6 5
## Rinaldo(R) Maraziti(R) Daniels(D) Patten(D)
## Hunt(R) 16 7 11 13
18 / 27
Voting Example
library("MASS")
data("voting", package = "HSAUR2")
#Kruskal's non-metric MDS, default: k=2
voting_mds <- isoMDS(voting)
## initial value 15.268246
## iter 5 value 10.264075
## final value 9.879047
## converged
#gives final stress achieved (in percent).
19 / 27
Voting Example
Kruskal (1964) guidelines for stress: Poor = 20%, Fair = 10%, Good = 5%, Excellent = 2.5%.
[Figure: "Stress vs. k" — stress (stress2, in percent) plotted against the number of dimensions k = 1, . . . , 7.]
20 / 27
Voting Example
[Figure: two-dimensional non-metric MDS solution for the voting data (Coordinate 1 vs. Coordinate 2); Democrats and Republicans fall largely on opposite sides, with Rinaldo(R) placed among the Democrats.]
Voting behaviour is essentially along party lines, although there is more variation
among Republicans. The voting behaviour of one Republican (Rinaldo) seems to
be closer to his Democratic colleagues.
21 / 27
Voting Example (Shepard Plot)
voting_sh <- Shepard(voting[lower.tri(voting)], voting_mds$points)
# x-axis: original proximity (similarity or dissimilarity) x(i,j)
# y-axis: distance d(i,j), plotted as asterisks
plot(voting_sh, pch = "*", cex = 1, col = "red", xlab = "Dissimilarity",
     ylab = "Distance", xlim = range(voting_sh$x), ylim = range(voting_sh$x))
# the monotone fit \hat{d}(i,j) is added as a step line
lines(voting_sh$x, voting_sh$yf, type = "S", lwd = 2, col = "blue")
22 / 27
Voting Example (Shepard Plot)
[Figure: Shepard diagram — the distances d(i,j) (asterisks) and the monotone fit (step line) plotted against the original dissimilarities; Distance on the y-axis, Dissimilarity on the x-axis.]
If proximities are similarities, the points should form a loose line from top left to bottom right. If proximities are
dissimilarities, then the data should form a line from bottom left to top right.
A good fit (zero stress) means that the fitted line (d̂ij) and the asterisks (dij) coincide.
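The fitted line in the Shepard diagram is a monotone (isotonic) least-squares fit of the distances to the dissimilarities, which Shepard()/isoMDS() compute internally via monotone regression. A bare-bones pool-adjacent-violators sketch (Python, purely illustrative):

```python
def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to the sequence y."""
    blocks = []                      # list of (sum, count) for merged blocks
    for v in y:
        s, c = float(v), 1
        # merge backwards while the running mean violates monotonicity
        while blocks and blocks[-1][0] / blocks[-1][1] > s / c:
            ps, pc = blocks.pop()
            s, c = s + ps, c + pc
        blocks.append((s, c))
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)      # each block is replaced by its mean
    return fit

# A single decreasing step gets pooled into its average:
assert pava([1, 3, 2, 4]) == [1.0, 2.5, 2.5, 4.0]
```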
According to EH, the plot shows some discrepancies between the original dissimilarities and the multidimensional scaling solution.
23 / 27
Voting Example (vegan package)
library(vegan)
set.seed(7634)
voting_mds2=metaMDS(voting)
## Run 0 stress 0.07328141
## Run 1 stress 0.09555359
## Run 2 stress 0.0768525
## Run 3 stress 0.06938356
## ... New best solution
## ... Procrustes: rmse 0.04121633 max resid 0.1105989
## Run 4 stress 0.0909891
## Run 5 stress 0.09777049
## Run 6 stress 0.07060562
## Run 7 stress 0.06628582
## ... New best solution
## ... Procrustes: rmse 0.04867576 max resid 0.1390413
## Run 8 stress 0.07060575
## Run 9 stress 0.09777044
## Run 10 stress 0.0768525
## Run 11 stress 0.0768525
## Run 12 stress 0.09777068
## Run 13 stress 0.07328141
## Run 14 stress 0.06628577
## ... New best solution
## ... Procrustes: rmse 0.0003259382 max resid 0.00105246
## ... Similar to previous best
## Run 15 stress 0.07328141
## Run 16 stress 0.06796735
## Run 17 stress 0.0768525
## Run 18 stress 0.07328141
## Run 19 stress 0.09777049
## Run 20 stress 0.0768525
## *** Solution reached
#voting_mds2
24 / 27
Voting Example (vegan package)
The closer the fit statistic R² is to 1, the better the fit.
stressplot(voting_mds2)
[Figure: stressplot() Shepard diagram — Ordination Distance vs. Observed Dissimilarity, with Non-metric fit R² = 0.996 and Linear fit R² = 0.983.]
25 / 27
Summary
• MDS finds a lower-dimensional representation of the data that
mimics the distances or minimizes the "stress" function.
• MDS starts with a proximity matrix D (n × n) consisting of
dissimilarities or distances.
• Classical MDS deals with distances (Euclidean and
non-Euclidean).
• Non-metric MDS deals with proximities that are not distances, using only their rank order.
• Use scree or Shepard plots to assess fit.
26 / 27
HW/Lab for this week
1 Explore the eurodist data from library(datasets).
How many coordinates do we need? What do the Shepard
and scree plots say? Plot the coordinates using k = 2.
2 Explore the WWIIleaders data
(http://stat.ethz.ch/Teaching/Datasets/WBL/) using
k = 2 and non-metric MDS. What do the Shepard and scree
plots say about k = 2?
27 / 27