R Data Analysis Examples: Canonical Correlation Analysis
Canonical correlation analysis is used to identify and measure the associations between two sets of variables. Canonical correlation is appropriate in the same situations where multiple regression would be, but where there are multiple intercorrelated outcome variables. Canonical correlation analysis determines a set of canonical variates, orthogonal linear combinations of the variables within each set that best explain the variability both within and between sets.
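In symbols, the idea can be sketched as follows. The notation (x and y for the two sets of variables, a_i and b_i for the weight vectors, rho_i for the i-th canonical correlation) is introduced here for illustration and does not appear in the original page.

```latex
\rho_i \;=\; \max_{a_i,\, b_i}\; \operatorname{corr}\!\left(a_i^{\top} x,\; b_i^{\top} y\right)
\quad \text{subject to} \quad
\operatorname{corr}\!\left(a_i^{\top} x,\, a_j^{\top} x\right)
= \operatorname{corr}\!\left(b_i^{\top} y,\, b_j^{\top} y\right)
= \operatorname{corr}\!\left(a_i^{\top} x,\, b_j^{\top} y\right) = 0
\quad \text{for } j < i .
```

The pair of linear combinations a_i'x and b_i'y is the i-th pair of canonical variates.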
This page uses the following packages. Make sure that you can load them before trying to run the examples on this page. If you do not have a package installed, run: install.packages("packagename"), or if you see the version is out of date, run: update.packages().
require(ggplot2)
require(GGally)
require(CCA)
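Before running the examples, it can help to check which of these packages are actually available. The following is a minimal sketch using base R only; the variable names (pkgs, available, missing_pkgs) are illustrative and not part of the original page.

```r
# Packages used on this page
pkgs <- c("ggplot2", "GGally", "CCA")

# TRUE for each package that can be loaded; nothing is attached or installed here
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that would still need install.packages()
missing_pkgs <- pkgs[!available]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```

Each name reported can then be passed to install.packages() as described above.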
Version info: Code for this page was tested in R Under development (unstable) (2012-11-16 r61126)
On: 2012-12-15
With: CCA 1.2; fields 6.7; spam 0.29-2; fda 2.3.2; RCurl 1.95-3; bitops 1.0-5; Matrix 1.0-10; lattice 0.20-10; zoo 1.7-9; GGally 0.4.2; reshape 0.8.4; plyr 1.8; ggplot2 0.9.3; knitr 0.9
Please Note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process which researchers are expected to do. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics and potential follow-up analyses.
Examples of canonical correlation analysis
Example 1. A researcher has collected data on three psychological variables, four academic variables (standardized test scores) and gender for 600 college freshmen. She is interested in how the set of psychological variables relates to the academic variables and gender. In particular, the researcher is interested in how many dimensions (canonical variables) are necessary to understand the association between the two sets of variables.
Example 2. A researcher is interested in exploring associations among factors from two multidimensional personality tests, the MMPI and the NEO. She is interested in what dimensions are common between the tests and how much variance is shared between them. She is specifically interested in finding whether the neuroticism dimension from the NEO can account for a substantial amount of shared variance between the two tests.
Description of the data
For our analysis example, we are going to expand example 1 about investigating the associations between psychological measures and academic achievement measures.
We have a data file, mmreg.dta, with 600 observations on eight variables. The psychological variables are locus_of_control, self_concept and motivation. The academic variables are standardized tests in reading (read), writing (write), math (math) and science (science). Additionally, the variable female is a zero-one indicator variable with the one indicating a female student.
mm <- read.csv("http://www.ats.ucla.edu/stat/data/mmreg.csv")
colnames(mm) <- c("Control", "Concept", "Motivation", "Read", "Write", "Math",
    "Science", "Sex")
summary(mm)
##     Control           Concept          Motivation         Read
##  Min.   :-2.2300   Min.   :-2.6200   Min.   :0.000   Min.   :28.3
##  1st Qu.:-0.3725   1st Qu.:-0.3000   1st Qu.:0.330   1st Qu.:44.2
##  Median : 0.2100   Median : 0.0300   Median :0.670   Median :52.1
##  Mean   : 0.0965   Mean   : 0.0049   Mean   :0.661   Mean   :51.9
##  3rd Qu.: 0.5100   3rd Qu.: 0.4400   3rd Qu.:1.000   3rd Qu.:60.1
##  Max.   : 1.3600   Max.   : 1.1900   Max.   :1.000   Max.   :76.0
##      Write           Math          Science           Sex
##  Min.   :25.5   Min.   :31.8   Min.   :26.0   Min.   :0.000
##  1st Qu.:44.3   1st Qu.:44.5   1st Qu.:44.4   1st Qu.:0.000
##  Median :54.1   Median :51.3   Median :52.6   Median :1.000
##  Mean   :52.4   Mean   :51.9   Mean   :51.8   Mean   :0.545
##  3rd Qu.:59.9   3rd Qu.:58.4   3rd Qu.:58.6   3rd Qu.:1.000
##  Max.   :67.1   Max.   :75.5   Max.   :74.2   Max.   :1.000
Analysis methods you might consider
Below is a list of some analysis methods you may have encountered. Some of the methods listed are quite reasonable while others have either fallen out of favor or have limitations.
Canonical correlation analysis, the focus of this page.
Separate OLS Regressions: You could analyze these data using separate OLS regression analyses for each variable in one set. The OLS regressions will not produce multivariate results and do not report information concerning dimensionality.
Multivariate multiple regression is a reasonable option if you have no interest in dimensionality.
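To make the difference between the last two alternatives concrete, here is a minimal sketch on simulated data. The variables x1, x2, y1 and y2 are hypothetical stand-ins and are not the mmreg variables.

```r
set.seed(1)
n <- 200

# Simulated stand-ins: two predictors and two correlated outcomes
x1 <- rnorm(n)
x2 <- rnorm(n)
shared <- rnorm(n)                      # makes the outcomes intercorrelated
y1 <- 0.5 * x1 + shared + rnorm(n)
y2 <- 0.3 * x2 + shared + rnorm(n)

# Separate OLS regressions, one per outcome
fit_y1 <- lm(y1 ~ x1 + x2)
fit_y2 <- lm(y2 ~ x1 + x2)

# Multivariate multiple regression: both outcomes in one model
fit_mv <- lm(cbind(y1, y2) ~ x1 + x2)

# The point estimates agree column by column
same_coef <- all.equal(unname(coef(fit_mv)[, "y1"]), unname(coef(fit_y1)))
same_coef

# Only the joint model gives multivariate tests across both outcomes
anova(fit_mv)
```

Neither approach reports how many dimensions link the two sets, which is what canonical correlation analysis adds.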
Canonical correlation analysis
Below we use the cc function from the CCA package to conduct a canonical correlation analysis. It takes two sets of variables as its two arguments. We specify our psychological variables as the first set of variables and our academic variables plus gender as the second set. For convenience, the variables in the first set are called "u" variables and the variables in the second set are called "v" variables.
Let's look at the data.
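As a rough illustration of passing two sets of variables to a canonical correlation routine, the sketch below uses cancor from base R's stats package as a stand-in for cc, applied to simulated data. The objects u_set and v_set are hypothetical; they only mimic the shape of the two sets (3 and 5 variables).

```r
set.seed(42)
n <- 300
latent <- rnorm(n)   # shared signal linking the two sets (simulation device)

# Hypothetical "u" set (3 variables) and "v" set (5 variables)
u_set <- sapply(1:3, function(j) 0.6 * latent + rnorm(n))
v_set <- sapply(1:5, function(j) 0.6 * latent + rnorm(n))

# stats::cancor takes the two sets as its first two arguments
fit <- cancor(u_set, v_set)

fit$cor            # canonical correlations, largest first
length(fit$cor)    # one per dimension: min(3, 5)
```

With 3 variables in one set and 5 in the other, the routine returns three canonical correlations, one per canonical dimension.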
xtabs(~Sex, data = mm)
## Sex
##   0   1
## 273 327
psych <- mm[, 1:3]
acad <- mm[, 4:8]
ggpairs(psych)
ggpairs(acad)
Next, we'll look at the correlations within and between the two sets of variables using the matcor function from the CCA package.
# correlations
matcor(psych, acad)
## $Xcor
##            Control Concept Motivation
## Control     1.0000  0.1712     0.2451
## Concept     0.1712  1.0000     0.2886
## Motivation  0.2451  0.2886     1.0000
##
## $Ycor
##            Read  Write    Math Science     Sex
## Read    1.00000 0.6286 0.67928  0.6907 0.04174
## Write   0.62859 1.0000 0.63267  0.5691 0.24433
## Math    0.67928 0.6327 1.00000  0.6495 0.04822
## Science 0.69069 0.5691 0.64953  1.0000 0.13819
## Sex     0.04174 0.2443 0.04822  0.1382 1.00000
##
## $XYcor
##            Control Concept Motivation    Read   Write    Math Science
## Control     1.0000 0.17119     0.2451 0.37357 0.35888 0.33727 0.32463
## Concept     0.1712 1.00000     0.2886 0.06066 0.01945 0.05360 0.06983
## Motivation  0.2451 0.28857     1.0000 0.21061 0.25425 0.19501 0.11567
## Read        0.3736 0.06066     0.2106 1.00000 0.62859 0.67928 0.69069
## Write       0.3589 0.01945     0.2542 0.62859 1.00000 0.63267 0.56915
## Math        0.3373 0.05360     0.1950 0.67928 0.63267 1.00000 0.64953
## Science     0.3246 0.06983     0.1157 0.69069 0.56915 0.64953 1.00000
## Sex         0.1134 0.12595     0.0981 0.04174 0.24433 0.04822 0.13819
##                Sex
## Control    0.11341
## Concept    0.12595
## Motivation 0.09810
## Read       0.04174
## Write      0.24433
## Math       0.04822
## Science    0.13819
## Sex        1.00000
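The three components in this output ($Xcor, $Ycor and $XYcor) correspond to ordinary correlation matrices. The sketch below reproduces the same three pieces with base R's cor on simulated data; the matrices x and y are hypothetical and unrelated to mmreg.

```r
set.seed(7)
n <- 150
x <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
y <- matrix(rnorm(n * 2), n, 2, dimnames = list(NULL, c("y1", "y2")))

Xcor  <- cor(x)             # within-set correlations, first set
Ycor  <- cor(y)             # within-set correlations, second set
XYcor <- cor(cbind(x, y))   # all variables from both sets together

# The off-diagonal block of XYcor holds the between-set correlations
between <- XYcor[colnames(x), colnames(y)]
all.equal(between, cor(x, y))
```

The between-set block is the part that canonical correlation analysis summarizes.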
R Canonical Correlation Analysis
Due to the length of the output, we will be making comments in several places along the way.
cc1 <- cc(psych, acad)
# display the canonical correlations
cc1$cor
## [1] 0.4641 0.1675 0.1040
# raw canonical coefficients
cc1[3:4]
## $xcoef
##              [,1]   [,2]   [,3]
## Control    1.2538 0.6215 0.6617
## Concept    0.3513 1.1877 0.8267
## Motivation 1.2624 2.0273 2.0002
##
## $ycoef
##             [,1]     [,2]     [,3]
## Read    0.044621 0.004910 0.021381
## Write   0.035877 0.042071 0.091307
## Math    0.023417 0.004229 0.009398
## Science 0.005025 0.085162 0.109835
## Sex     0.632119 1.084642 1.794647
The raw canonical coefficients are interpreted in a manner analogous to interpreting regression coefficients, i.e., for the variable read, a one unit increase in reading leads to a .0446 decrease in the first canonical variate of set 2 when all of the other variables are held constant. Here is another example: being female leads to a .6321 decrease in dimension 1 for the academic set with the other predictors held constant.
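The "one unit increase" reading follows from the fact that a canonical variate is a linear combination of the centered variables. The sketch below checks this on simulated data using cancor from base R as a stand-in for cc. Note that cancor may scale its coefficients differently from cc, so only the logic carries over, not the numbers. The helper variate1 is hypothetical.

```r
set.seed(3)
n <- 250
latent <- rnorm(n)
x <- cbind(latent + rnorm(n), rnorm(n))
y <- cbind(latent + rnorm(n), rnorm(n), rnorm(n))

fit <- cancor(x, y)

# Score on the first canonical variate of the y set for one observation
variate1 <- function(row) sum((row - fit$ycenter) * fit$ycoef[, 1])

obs <- y[1, ]
bumped <- obs
bumped[1] <- bumped[1] + 1   # raise the first y variable by one unit, others fixed

# The change in the variate equals that variable's raw coefficient
change <- variate1(bumped) - variate1(obs)
change
fit$ycoef[1, 1]
```

The sign of the coefficient therefore gives the direction of the change, and its size gives the amount per unit.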
Next, we'll use comput to compute the loadings of the variables on the canonical dimensions (variates). These loadings are correlations between variables and the canonical variates.
# compute canonical loadings
cc2 <- comput(psych, acad, cc1)
# display canonical loadings
cc2[3:6]
## $corr.X.xscores
##               [,1]   [,2]   [,3]
## Control    0.90405 0.3897 0.1756
## Concept    0.02084 0.7087 0.7052
## Motivation 0.56715 0.3509 0.7451
##
## $corr.Y.xscores
##           [,1]    [,2]    [,3]
## Read    0.3900 0.06011 0.01408
## Write   0.4068 0.01086 0.02647
## Math    0.3545 0.04991 0.01537
## Science 0.3056 0.11337 0.02395
## Sex     0.1690 0.12646 0.05651
##
## $corr.X.yscores
##                [,1]    [,2]    [,3]
## Control    0.419555 0.06528 0.01826
## Concept    0.009673 0.11872 0.07333
## Motivation 0.263207 0.05878 0.07749
##
## $corr.Y.yscores
##           [,1]    [,2]   [,3]
## Read    0.8404 0.35883 0.1354
## Write   0.8765 0.06484 0.2546
## Math    0.7639 0.29795 0.1478
## Science 0.6584 0.67680 0.2304
## Sex     0.3641 0.75493 0.5434
The above correlations are between observed variables and canonical variables which are known as the canonical loadings. These canonical variates are actually a type of latent variable.
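A loading is simply a correlation between an observed variable and a canonical variate, so it can be computed by hand once the variates are available. The following sketch does this on simulated data with cancor from base R as a stand-in for cc and comput; all object names are illustrative.

```r
set.seed(11)
n <- 300
latent <- rnorm(n)
x <- sapply(1:3, function(j) 0.7 * latent + rnorm(n))
y <- sapply(1:4, function(j) 0.7 * latent + rnorm(n))

fit <- cancor(x, y)
k <- length(fit$cor)   # number of canonical dimensions

# Canonical variates (scores): centered data times the coefficients
x_scores <- scale(x, center = fit$xcenter, scale = FALSE) %*% fit$xcoef[, 1:k]
y_scores <- scale(y, center = fit$ycenter, scale = FALSE) %*% fit$ycoef[, 1:k]

# Loadings: correlations between observed variables and the variates
load_x_own   <- cor(x, x_scores)   # analogous to corr.X.xscores
load_x_cross <- cor(x, y_scores)   # analogous to corr.X.yscores

# Cross-loadings equal own-set loadings times the canonical correlation
all.equal(load_x_cross, load_x_own %*% diag(fit$cor))
```

The same relationship is visible in the output above: 0.90405 times the first canonical correlation, 0.4641, is about 0.4196, which matches the corr.X.yscores entry for Control.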
In general, the number of canonical dimensions is equal to the number of variables in the smaller set; however, the number of significant dimensions may be even smaller. Canonical dimensions, also known as canonical variates, are latent variables that are analogous to factors obtained in factor analysis. For this particular model there are three canonical dimensions of which only the first two are statistically significant. (Note: I was not able to find a way to have R automatically compute the tests of the canonical dimensions in any of the packages so I have included some R code below.)
# tests of canonical dimensions
ev <- (1 - cc1$cor^2)
n <- dim(psych)[1]
p <- length(psych)
q <- length(acad)
k <- min(p, q)
m <- n - 3/2 - (p + q)/2
# Wilks' lambda for dimensions i through k
w <- rev(cumprod(rev(ev)))
# initialize
d1 <- d2 <- f <- vector("numeric", k)
for (i in 1:k) {
    s <- sqrt((p^2 * q^2 - 4)/(p^2 + q^2 - 5))
    si <- 1/s
    d1[i] <- p * q
    d2[i] <- m * s - p * q/2 + 1
    r <- (1 - w[i]^si)/w[i]^si
    f[i] <- r * d2[i]/d1[i]
    p <- p - 1
    q <- q - 1
}
pv <- pf(f, d1, d2, lower.tail = FALSE)
(dmat <- cbind(WilksL = w, F = f, df1 = d1, df2 = d2, p = pv))
##      WilksL      F df1  df2         p
## [1,] 0.7544 11.716  15 1635 7.498e-28
## [2,] 0.9614  2.944   8 1186 2.905e-03
## [3,] 0.9892  2.165   3  594 9.109e-02
As shown in the table above, the first test of the canonical dimensions tests whether all three dimensions are significant (they are, F = 11.72), the next test tests whether dimensions 2 and 3 combined are significant (they are, F = 2.94). Finally, the last test tests whether dimension 3, by itself, is significant (it is not). Therefore dimensions 1 and 2 must each be significant while dimension three is not.
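The quantities computed by the test code can be written out as follows. This is a sketch of what that code does, using r_j for the j-th canonical correlation and p_i = p - i + 1, q_i = q - i + 1 for the set sizes at step i; this subscript notation is introduced here and does not appear in the original page.

```latex
\Lambda_i = \prod_{j=i}^{k} \left(1 - r_j^{2}\right), \qquad
m = n - \tfrac{3}{2} - \tfrac{p + q}{2}, \qquad
s_i = \sqrt{\frac{p_i^{2} q_i^{2} - 4}{p_i^{2} + q_i^{2} - 5}}

\mathit{df}_{1,i} = p_i\, q_i, \qquad
\mathit{df}_{2,i} = m\, s_i - \frac{p_i\, q_i}{2} + 1, \qquad
F_i = \frac{1 - \Lambda_i^{1/s_i}}{\Lambda_i^{1/s_i}} \cdot \frac{\mathit{df}_{2,i}}{\mathit{df}_{1,i}}
```

For the first test, n = 600, p = 3 and q = 5 give m = 594.5, s_1 = sqrt(221/29), which is about 2.76, df1 = 15 and df2 of about 1634.7.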
When the variables in the model have very different standard deviations, the standardized coefficients allow for easier comparisons among the variables. Next, we'll compute the standardized canonical coefficients.
# standardized psych canonical coefficients diagonal matrix of psych sd's
s1 <- diag(sqrt(diag(cov(psych))))
s1 %*% cc1$xcoef
##        [,1]   [,2]   [,3]
## [1,] 0.8404 0.4166 0.4435
## [2,] 0.2479 0.8379 0.5833
## [3,] 0.4327 0.6948 0.6855
# standardized acad canonical coefficients diagonal matrix of acad sd's
s2 <- diag(sqrt(diag(cov(acad))))
s2 %*% cc1$ycoef
##         [,1]    [,2]    [,3]
## [1,] 0.45080 0.04961 0.21601
## [2,] 0.34896 0.40921 0.88810
## [3,] 0.22047 0.03982 0.08848
## [4,] 0.04878 0.82660 1.06608
## [5,] 0.31504 0.54057 0.89443
The standardized canonical coefficients are interpreted in a manner analogous to interpreting standardized regression coefficients. For example, consider the variable read: a one standard deviation increase in reading leads to a 0.45 standard deviation decrease in the score on the first canonical variate for set 2 when the other variables in the model are held constant.
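One way to see why multiplying by the standard deviations yields standardized coefficients is to compare it with running the analysis on standardized variables. The sketch below does both on simulated data, again with cancor from base R standing in for cc; the data and object names are hypothetical.

```r
set.seed(5)
n <- 300
latent <- rnorm(n)

# Hypothetical data with columns on very different scales
x <- cbind(10 * (latent + rnorm(n)), 0.1 * rnorm(n), rnorm(n))
y <- cbind(latent + rnorm(n), 5 * rnorm(n), rnorm(n))

raw <- cancor(x, y)
sds <- apply(x, 2, sd)

# Route 1: rescale the raw coefficients by each variable's standard deviation
std_from_raw <- diag(sds) %*% raw$xcoef

# Route 2: standardize the variables first, then run the analysis
std_direct <- cancor(scale(x), scale(y))$xcoef

# The two routes agree up to the arbitrary sign of each canonical variate
same_std <- all.equal(abs(std_from_raw), abs(std_direct))
same_std
```

Comparing absolute values sidesteps the sign indeterminacy of canonical variates, which can differ between two otherwise identical fits.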
Sample Write-Up of the Analysis
There is a lot of variation in the write-ups of canonical correlation analyses. The write-up below is fairly minimal, including only the tests of dimensionality and the standardized coefficients.
Table 1: Tests of Canonical Dimensions

Dimension   Canonical Corr.   Mult. F   df1   df2      p
1           0.46              11.72     15    1634.7   0.0000
2           0.17               2.94      8    1186     0.0029
3           0.10               2.16      3     594     0.0911
Table 2: Standardized Canonical Coefficients

                                   Dimension
                                   1       2
Psychological Variables
  locus of control                 0.84    0.42
  self-concept                     0.25    0.84
  motivation                       0.43    0.69
Academic Variables plus Gender
  reading                          0.45    0.05
  writing                          0.35    0.41
  math                             0.22    0.04
  science                          0.05    0.83
  gender (female=1)                0.32    0.54
Tests of dimensionality for the canonical correlation analysis, as shown in Table 1, indicate that two of the three canonical dimensions are statistically significant at the .05 level. Dimension 1 had a canonical correlation of 0.46 between the sets of variables, while for dimension 2 the canonical correlation was much lower at 0.17.
Table 2 presents the standardized canonical coefficients for the first two dimensions across both sets of variables. For the psychological variables, the first canonical dimension is most strongly influenced by locus of control (.84) and for the second dimension self-concept (.84) and motivation (.69). For the academic variables plus gender, the first dimension was comprised of reading (.45), writing (.35) and gender (.32). For the second dimension writing (.41), science (.83) and gender (.54) were the dominating variables.
Cautions, Flies in the Ointment
Multivariate normal distribution assumptions are required for both sets of variables.
Canonical correlation analysis is not recommended for small samples.
See Also
R Documentation
CCA Package