R Data Analysis Examples: Canonical Correlation Analysis


Canonical correlation analysis is used to identify and measure the associations between two sets of variables. Canonical correlation is appropriate in the same situations where multiple regression would be, but where there are multiple intercorrelated outcome variables. Canonical correlation analysis determines a set of canonical variates, orthogonal linear combinations of the variables within each set that best explain the variability both within and between sets.
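As a minimal sketch of what a canonical variate is, the snippet below uses base R's cancor function (from the stats package, not the CCA package used on this page) on simulated data. The data, variable names and seed are assumptions made purely for illustration.

```r
# Simulated stand-in data (an assumption; this is not the mmreg data).
set.seed(1)
n <- 200
z <- rnorm(n)                                   # shared latent signal
X <- cbind(x1 = z + rnorm(n), x2 = rnorm(n), x3 = z + rnorm(n))
Y <- cbind(y1 = z + rnorm(n), y2 = rnorm(n))

fit <- cancor(X, Y)                             # base R, stats package

# Canonical variates are the centered data times the coefficient matrices.
U <- scale(X, center = fit$xcenter, scale = FALSE) %*% fit$xcoef
V <- scale(Y, center = fit$ycenter, scale = FALSE) %*% fit$ycoef

fit$cor                 # canonical correlations, largest first
cor(U[, 1], V[, 1])     # correlation of the first pair of variates
cor(U[, 1], U[, 2])     # variates within a set are uncorrelated
```

The first pair of variates is correlated at the first canonical correlation, and variates within the same set are uncorrelated with each other, which is the sense in which they are orthogonal.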
This page uses the following packages. Make sure that you can load them before trying to run the examples on this page. If you do not have a package installed, run: install.packages("packagename"), or if you see the version is out of date, run: update.packages().

require(ggplot2)
require(GGally)
require(CCA)
Version info: Code for this page was tested in R Under development (unstable) (2012-11-16 r61126)
On: 2012-12-15
With: CCA 1.2; fields 6.7; spam 0.29-2; fda 2.3.2; RCurl 1.95-3; bitops 1.0-5; Matrix 1.0-10; lattice 0.20-10; zoo 1.7-9; GGally 0.4.2; reshape 0.8.4; plyr 1.8; ggplot2 0.9.3; knitr 0.9

Please Note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process that researchers are expected to carry out. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics and potential follow-up analyses.

Examples of canonical correlation analysis


Example 1. A researcher has collected data on three psychological variables, four academic variables (standardized test scores) and gender for 600 college freshmen. She is interested in how the set of psychological variables relates to the academic variables and gender. In particular, the researcher is interested in how many dimensions (canonical variables) are necessary to understand the association between the two sets of variables.
Example 2. A researcher is interested in exploring associations among factors from two multidimensional personality tests, the MMPI and the NEO. She is interested in what dimensions are common between the tests and how much variance is shared between them. She is specifically interested in finding whether the neuroticism dimension from the NEO can account for a substantial amount of shared variance between the two tests.

Description of the data


For our analysis example, we are going to expand example 1 about investigating the associations between psychological measures and academic achievement measures.
We have a data file, mmreg.dta, with 600 observations on eight variables. The psychological variables are locus_of_control, self_concept and motivation. The academic variables are standardized tests in reading (read), writing (write), math (math) and science (science). Additionally, the variable female is a zero-one indicator variable with the one indicating a female student.

mm <- read.csv("http://www.ats.ucla.edu/stat/data/mmreg.csv")
colnames(mm) <- c("Control", "Concept", "Motivation", "Read", "Write", "Math",
    "Science", "Sex")
summary(mm)
##     Control           Concept          Motivation         Read
##  Min.   :-2.2300   Min.   :-2.6200   Min.   :0.000   Min.   :28.3
##  1st Qu.:-0.3725   1st Qu.:-0.3000   1st Qu.:0.330   1st Qu.:44.2
##  Median : 0.2100   Median : 0.0300   Median :0.670   Median :52.1
##  Mean   : 0.0965   Mean   : 0.0049   Mean   :0.661   Mean   :51.9
##  3rd Qu.: 0.5100   3rd Qu.: 0.4400   3rd Qu.:1.000   3rd Qu.:60.1
##  Max.   : 1.3600   Max.   : 1.1900   Max.   :1.000   Max.   :76.0
##      Write           Math         Science          Sex
##  Min.   :25.5   Min.   :31.8   Min.   :26.0   Min.   :0.000
##  1st Qu.:44.3   1st Qu.:44.5   1st Qu.:44.4   1st Qu.:0.000
##  Median :54.1   Median :51.3   Median :52.6   Median :1.000
##  Mean   :52.4   Mean   :51.9   Mean   :51.8   Mean   :0.545
##  3rd Qu.:59.9   3rd Qu.:58.4   3rd Qu.:58.6   3rd Qu.:1.000
##  Max.   :67.1   Max.   :75.5   Max.   :74.2   Max.   :1.000

Analysis methods you might consider


Below is a list of some analysis methods you may have encountered. Some of the methods listed are quite reasonable while others have either fallen out of favor or have limitations.

Canonical correlation analysis, the focus of this page.
Separate OLS Regressions: You could analyze these data using separate OLS regression analyses for each variable in one set. The OLS regressions will not produce multivariate results and do not report information concerning dimensionality.
Multivariate multiple regression is a reasonable option if you have no interest in dimensionality.
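To make the contrast between the last two options concrete, here is a hedged sketch on simulated data (the data frame d, its column names and the effect sizes are all made up for illustration). In base R, passing a matrix of outcomes to lm fits a multivariate multiple regression; the coefficients match the separate OLS fits, but joint multivariate tests become available.

```r
# Simulated data (an assumption; not the mmreg data).
set.seed(5)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y1 <- 0.5 * d$x1 + rnorm(100)
d$y2 <- -0.3 * d$x2 + rnorm(100)

# Separate OLS regressions, one per outcome
fit_y1 <- lm(y1 ~ x1 + x2, data = d)
fit_y2 <- lm(y2 ~ x1 + x2, data = d)

# Multivariate multiple regression: same coefficients, one model object
fit_mv <- lm(cbind(y1, y2) ~ x1 + x2, data = d)
coef(fit_mv)

# Joint multivariate tests for each predictor (Pillai's trace by default)
anova(fit_mv)
```

Neither approach says anything about how many dimensions are needed to describe the association between the two sets, which is what canonical correlation analysis adds.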

Canonical correlation analysis


Below we use the cc function from the CCA package to conduct a canonical correlation analysis. It takes two sets of variables, each passed as a data frame or matrix. We specify our psychological variables as the first set of variables and our academic variables plus gender as the second set. For convenience, the variables in the first set are called "u" variables and the variables in the second set are called "v" variables.
Let's look at the data.

xtabs(~Sex, data = mm)
## Sex
##   0   1
## 273 327
psych <- mm[, 1:3]
acad <- mm[, 4:8]
ggpairs(psych)

ggpairs(acad)

Next, we'll look at the correlations within and between the two sets of variables using the matcor function from the CCA package.

# correlations
matcor(psych, acad)
## $Xcor
##            Control Concept Motivation
## Control     1.0000  0.1712     0.2451
## Concept     0.1712  1.0000     0.2886
## Motivation  0.2451  0.2886     1.0000
##
## $Ycor
##             Read  Write     Math Science      Sex
## Read     1.00000 0.6286  0.67928  0.6907 -0.04174
## Write    0.62859 1.0000  0.63267  0.5691  0.24433
## Math     0.67928 0.6327  1.00000  0.6495 -0.04822
## Science  0.69069 0.5691  0.64953  1.0000 -0.13819
## Sex     -0.04174 0.2443 -0.04822 -0.1382  1.00000
##
## $XYcor
##            Control  Concept Motivation     Read   Write     Math  Science
## Control     1.0000  0.17119     0.2451  0.37357 0.35888  0.33727  0.32463
## Concept     0.1712  1.00000     0.2886  0.06066 0.01945  0.05360  0.06983
## Motivation  0.2451  0.28857     1.0000  0.21061 0.25425  0.19501  0.11567
## Read        0.3736  0.06066     0.2106  1.00000 0.62859  0.67928  0.69069
## Write       0.3589  0.01945     0.2542  0.62859 1.00000  0.63267  0.56915
## Math        0.3373  0.05360     0.1950  0.67928 0.63267  1.00000  0.64953
## Science     0.3246  0.06983     0.1157  0.69069 0.56915  0.64953  1.00000
## Sex         0.1134 -0.12595     0.0981 -0.04174 0.24433 -0.04822 -0.13819
##                 Sex
## Control     0.11341
## Concept    -0.12595
## Motivation  0.09810
## Read       -0.04174
## Write       0.24433
## Math       -0.04822
## Science    -0.13819
## Sex         1.00000
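The three blocks in this output can be reproduced with base R's cor function, which may help if the CCA package is not available. The sketch below uses simulated matrices with made-up column names; the list element names simply mirror the output above and are not taken from the CCA source code.

```r
# Simulated stand-in data (an assumption; not the mmreg data).
set.seed(2)
X <- matrix(rnorm(300), ncol = 3, dimnames = list(NULL, c("a", "b", "c")))
Y <- matrix(rnorm(200), ncol = 2, dimnames = list(NULL, c("d", "e")))

within_between <- list(
  Xcor  = cor(X),              # correlations within the first set
  Ycor  = cor(Y),              # correlations within the second set
  XYcor = cor(cbind(X, Y))     # full matrix over both sets
)

# The between-set correlations alone are the off-diagonal block of XYcor.
cross <- cor(X, Y)
cross
```

The off-diagonal block (rows from one set, columns from the other) is the part that canonical correlation analysis summarizes.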

R Canonical Correlation Analysis


Due to the length of the output, we will be making comments in several places along the way.

cc1 <- cc(psych, acad)
# display the canonical correlations
cc1$cor
## [1] 0.4641 0.1675 0.1040
# raw canonical coefficients
cc1[3:4]
## $xcoef
##               [,1]    [,2]    [,3]
## Control    -1.2538 -0.6215 -0.6617
## Concept     0.3513 -1.1877  0.8267
## Motivation -1.2624  2.0273  2.0002
##
## $ycoef
##              [,1]      [,2]      [,3]
## Read    -0.044621 -0.004910  0.021381
## Write   -0.035877  0.042071  0.091307
## Math    -0.023417  0.004229  0.009398
## Science -0.005025 -0.085162 -0.109835
## Sex     -0.632119  1.084642 -1.794647
The raw canonical coefficients are interpreted in a manner analogous to interpreting regression coefficients. For example, for the variable read, a one unit increase in reading leads to a .0446 decrease in the first canonical variate of set 2 when all of the other variables are held constant. Here is another example: being female leads to a .6321 decrease in dimension 1 for the academic set with the other predictors held constant.
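This regression-like reading follows from the fact that a canonical variate score is a linear combination of the centered variables. The sketch below checks it on simulated data with base R's cancor; the data and the helper score are hypothetical, and cancor may scale its coefficients differently from cc in the CCA package, so only the logic carries over, not the numbers.

```r
# Simulated stand-in data (an assumption; not the mmreg data).
set.seed(3)
n <- 150
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n), x2 = rnorm(n))
Y <- cbind(y1 = z + rnorm(n), y2 = rnorm(n), y3 = rnorm(n))
fit <- cancor(X, Y)

# Hypothetical helper: score of one observation on the first Y variate
score <- function(y) sum((y - fit$ycenter) * fit$ycoef[, 1])

y_old <- Y[1, ]
y_new <- y_old
y_new[1] <- y_new[1] + 1        # raise y1 by one unit, others held constant

score(y_new) - score(y_old)     # change in the variate score
fit$ycoef[1, 1]                 # raw coefficient of y1 on the first variate
```

The two printed values agree: holding the other variables fixed, a one unit change in a variable shifts the variate score by exactly that variable's raw coefficient.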
Next, we'll use comput to compute the loadings of the variables on the canonical dimensions (variates). These loadings are correlations between variables and the canonical variates.

# compute canonical loadings
cc2 <- comput(psych, acad, cc1)
# display canonical loadings
cc2[3:6]
## $corr.X.xscores
##                [,1]    [,2]    [,3]
## Control    -0.90405 -0.3897 -0.1756
## Concept    -0.02084 -0.7087  0.7052
## Motivation -0.56715  0.3509  0.7451
##
## $corr.Y.xscores
##            [,1]     [,2]     [,3]
## Read    -0.3900 -0.06011  0.01408
## Write   -0.4068  0.01086  0.02647
## Math    -0.3545 -0.04991  0.01537
## Science -0.3056 -0.11337 -0.02395
## Sex     -0.1690  0.12646 -0.05651
##
## $corr.X.yscores
##                 [,1]     [,2]     [,3]
## Control    -0.419555 -0.06528 -0.01826
## Concept    -0.009673 -0.11872  0.07333
## Motivation -0.263207  0.05878  0.07749
##
## $corr.Y.yscores
##            [,1]     [,2]    [,3]
## Read    -0.8404 -0.35883  0.1354
## Write   -0.8765  0.06484  0.2546
## Math    -0.7639 -0.29795  0.1478
## Science -0.6584 -0.67680 -0.2304
## Sex     -0.3641  0.75493 -0.5434

The above correlations are between the observed variables and the canonical variates; they are known as the canonical loadings. These canonical variates are actually a type of latent variable.
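Because loadings are plain correlations, they can be computed by hand once the variate scores are available. The following sketch does this with base R's cancor on simulated data; the list element names are borrowed from the output above for readability, and the data are an assumption.

```r
# Simulated stand-in data (an assumption; not the mmreg data).
set.seed(4)
n <- 200
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n), x2 = rnorm(n), x3 = z + rnorm(n))
Y <- cbind(y1 = z + rnorm(n), y2 = rnorm(n))
fit <- cancor(X, Y)
k <- length(fit$cor)            # number of canonical dimensions

# Variate scores for each set
U <- scale(X, center = fit$xcenter, scale = FALSE) %*% fit$xcoef
V <- scale(Y, center = fit$ycenter, scale = FALSE) %*% fit$ycoef

# Loadings: correlations of the observed variables with the variates
loadings <- list(
  corr.X.xscores = cor(X, U[, 1:k]),
  corr.Y.xscores = cor(Y, U[, 1:k]),
  corr.X.yscores = cor(X, V[, 1:k]),
  corr.Y.yscores = cor(Y, V[, 1:k])
)
loadings$corr.X.xscores
```

One consequence is that each column of the cross-loadings equals the corresponding own-set loadings multiplied by that dimension's canonical correlation. In the output above, for instance, 0.90405 times 0.4641 is approximately 0.4196, the magnitude of the Control entry in corr.X.yscores.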
In general, the number of canonical dimensions is equal to the number of variables in the smaller set; however, the number of significant dimensions may be even smaller. Canonical dimensions, also known as canonical variates, are latent variables that are analogous to factors obtained in factor analysis. For this particular model there are three canonical dimensions, of which only the first two are statistically significant. (Note: I was not able to find a way to have R automatically compute the tests of the canonical dimensions in any of the packages, so I have included some R code below.)

# tests of canonical dimensions
ev <- (1 - cc1$cor^2)
n <- dim(psych)[1]
p <- length(psych)
q <- length(acad)
k <- min(p, q)
m <- n - 3/2 - (p + q)/2
# Wilks' lambda for dimensions i through k
w <- rev(cumprod(rev(ev)))
# initialize
d1 <- d2 <- f <- vector("numeric", k)
for (i in 1:k) {
    s <- sqrt((p^2 * q^2 - 4)/(p^2 + q^2 - 5))
    si <- 1/s
    d1[i] <- p * q
    d2[i] <- m * s - p * q/2 + 1
    r <- (1 - w[i]^si)/w[i]^si
    f[i] <- r * d2[i]/d1[i]
    p <- p - 1
    q <- q - 1
}
pv <- pf(f, d1, d2, lower.tail = FALSE)
(dmat <- cbind(WilksL = w, F = f, df1 = d1, df2 = d2, p = pv))
##      WilksL      F df1  df2         p
## [1,] 0.7544 11.716  15 1635 7.498e-28
## [2,] 0.9614  2.944   8 1186 2.905e-03
## [3,] 0.9892  2.165   3  594 9.109e-02
As shown in the table above, the first test of the canonical dimensions tests whether all three dimensions combined are significant (they are, F = 11.72), the next test tests whether dimensions 2 and 3 combined are significant (they are, F = 2.94). Finally, the last test tests whether dimension 3, by itself, is significant (it is not). Therefore dimensions 1 and 2 must each be significant while dimension three is not.
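The sequential tests depend only on the canonical correlations, the sample size and the number of variables in each set, so they can be wrapped in a small function. The helper below, cca_dim_tests, is hypothetical (it is not part of the CCA package). It follows the same Wilks' lambda and F approximation arithmetic as the code above, with one added guard for the case where the denominator under the square root is zero, where the scaling factor is conventionally set to 1.

```r
# Hypothetical helper: sequential tests of canonical dimensions computed
# from the canonical correlations rho, sample size n, and set sizes p, q.
cca_dim_tests <- function(rho, n, p, q) {
  k <- length(rho)
  m <- n - 3/2 - (p + q)/2
  # Wilks' lambda for dimensions i..k: product of (1 - rho_j^2) for j >= i
  w <- rev(cumprod(rev(1 - rho^2)))
  out <- matrix(NA_real_, k, 5,
                dimnames = list(NULL, c("WilksL", "F", "df1", "df2", "p")))
  for (i in seq_len(k)) {
    denom <- p^2 + q^2 - 5
    # Guard (an addition to the page's code): use s = 1 if denom is not positive
    s <- if (denom > 0) sqrt((p^2 * q^2 - 4) / denom) else 1
    df1 <- p * q
    df2 <- m * s - p * q / 2 + 1
    f <- (1 - w[i]^(1 / s)) / w[i]^(1 / s) * df2 / df1
    out[i, ] <- c(w[i], f, df1, df2, pf(f, df1, df2, lower.tail = FALSE))
    # Each later test drops one dimension from both sets
    p <- p - 1
    q <- q - 1
  }
  out
}

# Canonical correlations as printed earlier on this page (rounded to 4 digits)
tests <- cca_dim_tests(rho = c(0.4641, 0.1675, 0.1040), n = 600, p = 3, q = 5)
round(tests, 4)
```

Called with the three canonical correlations reported on this page, the function should reproduce the table above up to rounding, without needing the original data.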
When the variables in the model have very different standard deviations, the standardized coefficients allow for easier comparisons among the variables. Next, we'll compute the standardized canonical coefficients.

# standardized psych canonical coefficients: diagonal matrix of psych sd's
s1 <- diag(sqrt(diag(cov(psych))))
s1 %*% cc1$xcoef
##         [,1]    [,2]    [,3]
## [1,] -0.8404 -0.4166 -0.4435
## [2,]  0.2479 -0.8379  0.5833
## [3,] -0.4327  0.6948  0.6855
# standardized acad canonical coefficients: diagonal matrix of acad sd's
s2 <- diag(sqrt(diag(cov(acad))))
s2 %*% cc1$ycoef
##          [,1]     [,2]     [,3]
## [1,] -0.45080 -0.04961  0.21601
## [2,] -0.34896  0.40921  0.88810
## [3,] -0.22047  0.03982  0.08848
## [4,] -0.04878 -0.82660 -1.06608
## [5,] -0.31504  0.54057 -0.89443
The standardized canonical coefficients are interpreted in a manner analogous to interpreting standardized regression coefficients. For example, consider the variable read: a one standard deviation increase in reading leads to a 0.45 standard deviation decrease in the score on the first canonical variate for set 2 when the other variables in the model are held constant.
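Multiplying the raw coefficients by the standard deviations is just a change of units: the standardized coefficients applied to z-scores give the same variate scores as the raw coefficients applied to the centered data. A minimal check, using base R's cancor on simulated variables with deliberately different scales (all names and values are assumptions):

```r
# Simulated stand-in data (an assumption); columns are on different scales.
set.seed(6)
n <- 120
z <- rnorm(n)
X <- cbind(x1 = 10 * (z + rnorm(n)), x2 = 0.1 * rnorm(n), x3 = z + rnorm(n))
Y <- cbind(y1 = 5 * (z + rnorm(n)), y2 = rnorm(n))
fit <- cancor(X, Y)

# Standardized coefficients: diag(sd) %*% raw coefficients
sd_x <- apply(X, 2, sd)
std_coef <- diag(sd_x) %*% fit$xcoef

# Scores from raw coefficients on centered data ...
scores_raw <- scale(X, center = TRUE, scale = FALSE) %*% fit$xcoef
# ... and from standardized coefficients on z-scores
scores_std <- scale(X) %*% std_coef

all.equal(scores_raw, scores_std, check.attributes = FALSE)
```

Because the scores are identical, standardizing changes only how the coefficients are read (per standard deviation rather than per raw unit), not the canonical variates themselves.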

Sample Write-Up of the Analysis

There is a lot of variation in the write-ups of canonical correlation analyses. The write-up below is fairly minimal, including only the tests of dimensionality and the standardized coefficients.

Table 1: Tests of Canonical Dimensions

Dimension   Canonical Corr.   Mult. F   df1   df2      p
1           0.46              11.72     15    1634.7   0.0000
2           0.17               2.94      8    1186     0.0029
3           0.10               2.16      3     594     0.0911

Table 2: Standardized Canonical Coefficients

                                   Dimension
                                   1       2
Psychological Variables
  locus of control               -0.84   -0.42
  self-concept                    0.25   -0.84
  motivation                     -0.43    0.69
Academic Variables plus Gender
  reading                        -0.45   -0.05
  writing                        -0.35    0.41
  math                           -0.22    0.04
  science                        -0.05   -0.83
  gender (female=1)              -0.32    0.54

Tests of dimensionality for the canonical correlation analysis, as shown in Table 1, indicate that two of the three canonical dimensions are statistically significant at the .05 level. Dimension 1 had a canonical correlation of 0.46 between the sets of variables, while for dimension 2 the canonical correlation was much lower at 0.17.
Table 2 presents the standardized canonical coefficients for the first two dimensions across both sets of variables. For the psychological variables, the first canonical dimension is most strongly influenced by locus of control (-.84), and the second dimension by self-concept (-.84) and motivation (.69). For the academic variables plus gender, the first dimension is dominated by reading (-.45), writing (-.35) and gender (-.32). For the second dimension, writing (.41), science (-.83) and gender (.54) were the dominating variables.

Cautions, Flies in the Ointment


Multivariate normal distribution assumptions are required for both sets of variables.
Canonical correlation analysis is not recommended for small samples.

See Also

R Documentation
CCA Package

