MULTICOLLINEARITY
Nguyen Quang
quangn@ueh.edu.vn
OUTLINE
• Definition
• Sources of multicollinearity
• Detection
• Remedy
MULTICOLLINEARITY
• One of the assumptions of the classical linear regression model is that there is no perfect
linear relationship among the regressors.
• If there are one or more such relationships among the regressors, we call it
multicollinearity, or collinearity for short.
• Perfect collinearity: An exact linear relationship among the regressors (for example, 𝑋3 = 2𝑋2).
• Imperfect collinearity: The regressors are highly (but not perfectly) collinear.
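The difference between the two cases can be illustrated with a minimal sketch. The data below are made up for illustration, and the helper names `gram_matrix` and `det3` are hypothetical. With perfect collinearity the matrix X'X is singular, so the OLS estimates cannot be computed; with imperfect collinearity it is invertible but close to singular.

```python
def gram_matrix(columns):
    """Return X'X for a design matrix stored as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Made-up integer data, used only for illustration.
const = [1, 1, 1, 1, 1]               # intercept column
x2 = [1, 2, 3, 4, 5]
x3_perfect = [2 * v for v in x2]      # X3 = 2 * X2 exactly
x3_imperfect = [2, 4, 6, 8, 11]       # close to 2 * X2, but not exact

det_perfect = det3(gram_matrix([const, x2, x3_perfect]))
det_imperfect = det3(gram_matrix([const, x2, x3_imperfect]))

print("det(X'X), perfect collinearity:  ", det_perfect)
print("det(X'X), imperfect collinearity:", det_imperfect)
```

In the perfect case the determinant is exactly zero, so the inverse of X'X does not exist. In the imperfect case it is positive, so the estimates exist, but they are imprecise.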
𝑌 AND 𝑋𝑠: NO RELATIONSHIP AMONG 𝑋𝑠
[Figure: diagram labeled 𝒀, 𝑿𝟐, and 𝑿𝟑]
𝑌 AND 𝑋𝑠: RELATIONSHIP AMONG 𝑋𝑠
[Figure: diagram labeled 𝑿𝟐 and 𝑿𝟑]
SOURCES OF MULTICOLLINEARITY
• Constraints on the population being sampled
• Example: people with higher incomes tend to be wealthier
• The sample may contain too few individuals who are wealthy but have low
incomes, or who earn high incomes but are not wealthy
• Regressors are inherently collinear
• Example: people with higher education tend to have higher income
SOURCES OF MULTICOLLINEARITY
• Model specification
• Example: adding polynomial terms to a model, especially if the range of the 𝑋
variable is small.
• Economic Function
• Example: 𝑤𝑎𝑔𝑒 = 𝑓(𝑒𝑑𝑢𝑐𝑎𝑡𝑖𝑜𝑛, 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒, 𝑎𝑔𝑒)
• Experience may be correlated with age
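The polynomial-term example can be checked numerically. The sketch below uses made-up values of 𝑋 and a hypothetical helper `pearson`. It compares the correlation between 𝑋 and 𝑋² when 𝑋 covers a wide range (0 to 10) and a narrow range (10 to 11).

```python
from math import sqrt


def pearson(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


# Made-up values of X, used only for illustration.
x_wide = [float(v) for v in range(0, 11)]        # 0, 1, ..., 10
x_narrow = [10 + v / 10 for v in range(0, 11)]   # 10.0, 10.1, ..., 11.0

r_wide = pearson(x_wide, [v ** 2 for v in x_wide])
r_narrow = pearson(x_narrow, [v ** 2 for v in x_narrow])

print(f"corr(X, X^2), wide range:   {r_wide:.5f}")
print(f"corr(X, X^2), narrow range: {r_narrow:.5f}")
```

Over the narrow range, 𝑋² is almost an exact linear function of 𝑋, so the two regressors are nearly perfectly collinear.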
CONSEQUENCES
• Under imperfect multicollinearity the OLS estimators are still BLUE, but one or
more regression coefficients have large standard errors relative to their estimated
values, which makes the 𝑡 ratios small.
• One may therefore conclude (misleadingly) that the true values of these
coefficients are not different from zero.
• Even though some regression coefficients are statistically insignificant, the 𝑅²
value may be very high.
• The regression coefficients may also be very sensitive to small changes in the data,
especially if the sample is relatively small.
• In some cases, the estimated coefficients have signs opposite to those expected.
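VARIANCE INFLATION FACTOR
• For the following regression model:
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$
• It can be shown that:
$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2 \, (1 - r_{23}^2)} = \frac{\sigma^2}{\sum x_{2i}^2} \, VIF$$
and
$$\operatorname{var}(\hat{\beta}_3) = \frac{\sigma^2}{\sum x_{3i}^2 \, (1 - r_{23}^2)} = \frac{\sigma^2}{\sum x_{3i}^2} \, VIF$$
where $\sigma^2$ is the variance of the error term $u_i$, and $r_{23}$ is the coefficient of
correlation between $X_2$ and $X_3$.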
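As a rough numerical check, the sketch below compares the variance formula for the coefficient on 𝑋2 with the corresponding diagonal element of σ²(X'X)⁻¹. It assumes, as is standard for this formula, that the lowercase x terms are deviations from sample means. The data, the value σ² = 1, and the function names are made up for illustration.

```python
from math import isclose


def deviations(v):
    """Deviations of a list from its sample mean."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def var_beta2_formula(sigma2, x2, x3):
    """sigma^2 / (sum of squared deviations of X2 * (1 - r23^2))."""
    d2, d3 = deviations(x2), deviations(x3)
    s22 = sum(a * a for a in d2)
    s33 = sum(a * a for a in d3)
    s23 = sum(a * b for a, b in zip(d2, d3))
    r23_sq = s23 ** 2 / (s22 * s33)
    return sigma2 / (s22 * (1 - r23_sq))


def var_beta2_matrix(sigma2, x2, x3):
    """sigma^2 times the X2 diagonal element of the inverse of X'X."""
    cols = [[1.0] * len(x2), x2, x3]
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    (a, b, c), (d, e, f), (g, h, i) = xtx
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    cofactor_22 = a * i - c * g   # minor obtained by deleting the X2 row and column
    return sigma2 * cofactor_22 / det


# Made-up data and an assumed error variance, used only for illustration.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
sigma2 = 1.0

v_formula = var_beta2_formula(sigma2, x2, x3)
v_matrix = var_beta2_matrix(sigma2, x2, x3)
print(f"formula: {v_formula:.6f}   matrix: {v_matrix:.6f}")
```

The two numbers agree up to floating-point error, which is what the algebra predicts for a model with an intercept and two regressors.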
VARIANCE INFLATION FACTOR
$$VIF = \frac{1}{1 - r_{23}^2}$$
𝑉𝐼𝐹 is a measure of the degree to which the variance of the OLS estimator is
inflated because of multicollinearity.
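To give a feel for the magnitudes, the sketch below evaluates 𝑉𝐼𝐹 for several illustrative values of the correlation between the two regressors. The function name `vif_from_r` and the chosen values are assumptions, not part of the slides.

```python
def vif_from_r(r):
    """VIF = 1 / (1 - r^2) for the two-regressor case."""
    if abs(r) >= 1:
        return float("inf")   # perfect collinearity: the variance is unbounded
    return 1.0 / (1.0 - r ** 2)


# Illustrative correlation values between the two regressors.
for r in (0.0, 0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"r23 = {r:4.2f}   VIF = {vif_from_r(r):6.2f}")
```

The factor grows slowly at first and then very quickly. A correlation of 0.9 already gives a 𝑉𝐼𝐹 above 5, and a correlation of 0.95 gives a 𝑉𝐼𝐹 slightly above 10.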
DETECTION OF MULTICOLLINEARITY
• High 𝑅² but few significant 𝑡 ratios
• High pairwise correlations among the regressors
• Significant 𝐹 test in the auxiliary regressions (regressions of each regressor
on the remaining regressors), or an auxiliary-regression 𝑅² higher than the 𝑅²
of the regression of 𝑌 on the 𝑋𝑠
• Coefficients with unexpected signs despite a high 𝑅²
• High variance inflation factor: 𝑉𝐼𝐹 > 10 (or > 5 under a stricter rule of thumb)
• Estimates that change substantially when one more independent variable is added
EXAMPLE: HOUSEHOLD EXPENDITURE
SURVEY DATA OF MARRIED COUPLES 2020 IN HCM
• [DEP VAR] expense: household expenditure (mil. VND/month)
• income: household monthly income (mil. VND/month)
• age_wife: age of the wife (or female partner)
• age_husband: age of the husband (or male partner)
• hhsize: Household size (members)
• children: share of children in the household (%)
SUMMARY STATISTICS
HOUSEHOLD EXPENDITURE: OLS REGRESSION
DETECTING MULTICOLLINEARITY
• Correlation matrix
• Auxiliary regression
• Variance inflation factors
CORRELATION MATRIX
• High pairwise correlation coefficients (a common rule of thumb is an absolute
value above 0.8) suggest high multicollinearity.
• Low pairwise correlation coefficients do not imply the absence of
multicollinearity, because multicollinearity may involve more than two variables.
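The second point can be illustrated with a small constructed example. The variables `x1`, `x2`, `x3` and the noise pattern are made up. Here `x3` is almost exactly the sum of two uncorrelated regressors, so every pairwise correlation stays below 0.8 while the three variables together are nearly perfectly collinear.

```python
from math import sqrt


def pearson(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


# Made-up data: x1 and x2 form a 4x4 grid, so they are uncorrelated by construction.
x1 = [float(v) for _ in range(4) for v in (1, 2, 3, 4)]
x2 = [float(v) for v in (1, 2, 3, 4) for _ in range(4)]
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(16)]
x3 = [a + b + e for a, b, e in zip(x1, x2, noise)]   # nearly x1 + x2

r12, r13, r23 = pearson(x1, x2), pearson(x1, x3), pearson(x2, x3)

# R^2 from regressing x3 on x1 and x2, using the two-regressor formula
# that expresses it through the pairwise correlations.
r2_aux = (r13 ** 2 + r23 ** 2 - 2 * r12 * r13 * r23) / (1 - r12 ** 2)

print(f"r12 = {r12:.3f}, r13 = {r13:.3f}, r23 = {r23:.3f}")
print(f"R^2 of x3 on x1 and x2 = {r2_aux:.4f}")
```

A correlation matrix alone would not flag a problem here. Regressing `x3` on the other two regressors does.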
AUXILIARY REGRESSION
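A minimal sketch of the auxiliary-regression check follows. It uses made-up data (not the survey data) and hypothetical helper names. Each regressor is regressed on the remaining regressors, and the resulting 𝑅² is converted into the usual overall 𝐹 statistic, (𝑅²/m) / ((1 − 𝑅²)/(n − m − 1)), where m is the number of regressors in the auxiliary regression and n the number of observations.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, regressors):
    """R^2 from an OLS regression of y on a constant and the given regressors."""
    cols = [[1.0] * len(y)] + regressors
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(len(y))]
    mean_y = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - rss / tss


def auxiliary_f(r2, n_obs, n_regressors):
    """Overall F statistic of an auxiliary regression."""
    return (r2 / n_regressors) / ((1 - r2) / (n_obs - n_regressors - 1))


# Made-up data: x3 is close to x2, while x4 is only loosely related to them.
data = {
    "x2": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x3": [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.2, 7.9],
    "x4": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}

results = {}
for name, values in data.items():
    others = [v for k, v in data.items() if k != name]
    r2 = r_squared(values, others)
    results[name] = (r2, auxiliary_f(r2, len(values), len(others)))
    print(f"{name} on the rest: R^2 = {r2:.4f}, F = {results[name][1]:.2f}")
```

In this constructed example the auxiliary regressions for `x2` and `x3` have 𝑅² close to 1 and very large 𝐹 statistics, while the one for `x4` does not. That pattern points to `x2` and `x3` as the collinear pair.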
VARIANCE INFLATING FACTOR
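With more than two regressors, the 𝑉𝐼𝐹 of regressor j is 1 / (1 − 𝑅j²), where 𝑅j² comes from the auxiliary regression of that regressor on the others. An equivalent route, sketched below for three regressors, reads the 𝑉𝐼𝐹s off the diagonal of the inverse of the regressors' correlation matrix. The data are made up (not the survey data) and the function names are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def vifs_from_correlations(r):
    """Diagonal of the inverse of a 3x3 correlation matrix (one VIF per regressor)."""
    det = det3(r)
    diagonal_minors = [
        r[1][1] * r[2][2] - r[1][2] * r[2][1],
        r[0][0] * r[2][2] - r[0][2] * r[2][0],
        r[0][0] * r[1][1] - r[0][1] * r[1][0],
    ]
    return [minor / det for minor in diagonal_minors]


# Made-up data: x3 is close to x2, while x4 is only loosely related to them.
data = {
    "x2": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x3": [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.2, 7.9],
    "x4": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
}
names = list(data)
corr = [[pearson(data[a], data[b]) for b in names] for a in names]
vif = dict(zip(names, vifs_from_correlations(corr)))

for name in names:
    print(f"VIF({name}) = {vif[name]:.2f}")
```

Against the rule of thumb of 10 (or 5), `x2` and `x3` would be flagged in this example and `x4` would not.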
SOLUTIONS FOR MULTICOLLINEARITY
• General rule of thumb: do not worry if
• the coefficients are statistically significant
• the coefficients have the expected signs
RESTRUCTURING THE MODEL
• There may be alternative specifications or alternative functional forms
• Example: production function
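$$y = F(\text{labor},\ \text{land},\ \text{capital})$$
• Solution:
$$\frac{y}{\text{land}} = F\left(\frac{\text{labor}}{\text{land}},\ \text{land},\ \frac{\text{capital}}{\text{land}}\right)$$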
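The intuition behind the per-land specification can be sketched with made-up farm data. All numbers below are hypothetical. Labor and capital are strongly correlated in levels because both grow with farm size, while the per-land intensities are constructed to be essentially uncorrelated.

```python
from math import sqrt


def pearson(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical farms: inputs in levels are driven mostly by farm size (land).
land = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
labor_per_land = [2.0, 2.2, 1.8, 2.1, 1.9, 2.3, 1.7, 2.0]
capital_per_land = [5.3, 5.1, 5.1, 4.8, 4.8, 5.0, 5.0, 4.9]
labor = [a * b for a, b in zip(land, labor_per_land)]
capital = [a * b for a, b in zip(land, capital_per_land)]

# Correlation between the regressors in levels and after dividing by land.
r_levels = pearson(labor, capital)
r_ratios = pearson([l / a for l, a in zip(labor, land)],
                   [k / a for k, a in zip(capital, land)])

print(f"corr(labor, capital)           = {r_levels:.3f}")
print(f"corr(labor/land, capital/land) = {r_ratios:.3f}")
```

The transformation helps here because the collinearity comes from a common scale factor. With real data the reduction is not guaranteed and should be checked, for example with the 𝑉𝐼𝐹.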
TRANSFORMING REGRESSOR
DROPPING CORRELATED REGRESSORS