Lecture 4 - Multicollinearity

The document discusses multicollinearity in regression analysis, defining it as the presence of linear relationships among regressors that can affect the reliability of regression coefficients. It outlines sources of multicollinearity, its consequences, and methods for detection, including high R² values with insignificant t ratios and high variance inflation factors (VIF). The document also suggests potential remedies, such as restructuring the model or dropping correlated regressors.
MULTICOLLINEARITY

Nguyen Quang
quangn@ueh.edu.vn
OUTLINE
• Definition
• Sources of multicollinearity
• Detection
• Remedy
MULTICOLLINEARITY
• One of the assumptions of the classical linear regression model is that there is no perfect
linear relationship among the regressors.
• If there are one or more such relationships among the regressors, we call it
multicollinearity, or collinearity for short.
• Perfect collinearity: an exact linear relationship between two (or more) regressors.
• Imperfect collinearity: the regressors are highly (but not perfectly) collinear.
Y AND Xs: NO RELATIONSHIP AMONG Xs
[Figure: diagram of Y, X2, and X3 in which X2 and X3 do not overlap each other.]

Y AND Xs: RELATIONSHIP AMONG Xs
[Figure: diagram of Y, X2, and X3 in which X2 and X3 overlap each other as well as Y.]
SOURCES OF MULTICOLLINEARITY
• Constraints on the population being sampled
• Example: people with higher incomes tend to be wealthier
• We may not have enough observations on individuals who are wealthy but have low incomes, or who earn high incomes but are not wealthy
• Regressors are simply collinear
• Example: people with more education tend to have higher incomes
SOURCES OF MULTICOLLINEARITY
• Model specification
• Example: adding polynomial terms to a model, especially if the range of the 𝑋
variable is small.
• Economic function
• Example: wage = f(education, experience, age)
• Experience may be correlated with age
CONSEQUENCES
• The OLS estimators are still BLUE, but one or more regression coefficients have
large standard errors relative to the values of the coefficients, thereby making the 𝑡
ratios small.
• Even though some regression coefficients are statistically insignificant, the 𝑅2
value may be very high.
• Therefore, one may conclude (misleadingly) that the true values of these
coefficients are not different from zero.
• Also, the regression coefficients may be very sensitive to small changes in the data,
especially if the sample is relatively small.
• In some cases, estimated coefficients may have the wrong expected signs.
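These symptoms are easy to reproduce. The sketch below (a Python simulation added for illustration, not part of the lecture) generates two nearly identical regressors and fits by OLS: the fit is good, yet the individual standard errors are large.

```python
# Illustrative simulation: highly correlated regressors give a high
# R-squared but large standard errors on the individual coefficients.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
x2 = rng.normal(size=n)
x3 = x2 + 0.05 * rng.normal(size=n)          # x3 is almost identical to x2
y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()
print(res.rsquared)   # high overall fit
print(res.bse)        # large standard errors on x2 and x3
```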
VARIANCE INFLATION FACTOR
• For the following regression model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$

• It can be shown that:

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2 \left(1 - r_{23}^2\right)} = \frac{\sigma^2}{\sum x_{2i}^2}\,VIF$$

$$\operatorname{var}(\hat{\beta}_3) = \frac{\sigma^2}{\sum x_{3i}^2 \left(1 - r_{23}^2\right)} = \frac{\sigma^2}{\sum x_{3i}^2}\,VIF$$

where $\sigma^2$ is the variance of the error term $u_i$, and $r_{23}$ is the coefficient of correlation between $X_2$ and $X_3$.

• The variance inflation factor is

$$VIF = \frac{1}{1 - r_{23}^2}$$

• VIF is a measure of the degree to which the variance of the OLS estimator is inflated because of multicollinearity.
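To make the inflation concrete, here is a minimal numeric sketch of how VIF grows with the correlation between X2 and X3:

```python
# Minimal sketch: VIF = 1 / (1 - r23^2) for a few correlation values.
for r23 in [0.0, 0.5, 0.8, 0.9, 0.99]:
    vif = 1.0 / (1.0 - r23**2)
    print(f"r23 = {r23:.2f} -> VIF = {vif:.2f}")
# r23 = 0.90 gives VIF ≈ 5.26; r23 = 0.99 gives VIF ≈ 50.25,
# i.e. the coefficient variance is inflated about fifty-fold.
```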
DETECTION OF MULTICOLLINEARITY
• High R² but few significant t ratios
• High pair-wise correlations among regressors
• Significant F test for auxiliary regressions (regressions of each regressor on the remaining regressors), or an auxiliary regression whose R² exceeds the R² of the regression of Y on the Xs
• Wrong expected signs despite a high R²
• High variance inflation factor: VIF > 10 (or > 5 by a stricter rule)
• Coefficient estimates change noticeably when one more independent variable is added
EXAMPLE: HOUSEHOLD EXPENDITURE
SURVEY DATA OF MARRIED COUPLES 2020 IN HCM
• [DEP VAR] expense: household expenditure (mil. VND/month)
• income: household income (mil. VND/month)
• age_wife: age of the wife (or female partner)
• age_husband: age of the husband (or male partner)
• hhsize: household size (members)
• children: % children in the household
SUMMARY STATISTICS
HOUSEHOLD EXPENDITURE: OLS REGRESSION
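The slide presents estimation output (not reproduced in this extraction). Below is a hedged sketch of how such a regression could be run, using the variable names listed above; the file name household_2020.csv is a placeholder, since the lecture does not say how the data are stored.

```python
# Sketch: OLS regression of household expenditure on the listed regressors.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("household_2020.csv")   # placeholder file name
model = smf.ols(
    "expense ~ income + age_wife + age_husband + hhsize + children",
    data=df,
).fit()
print(model.summary())  # coefficients, t ratios, R-squared, F statistic
```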
DETECTING MULTICOLLINEARITY
• Correlation matrix
• Auxiliary regression
• Variance inflation factors
CORRELATION MATRIX
• High correlation coefficients (a common rule of thumb: above about ±0.8) suggest high multicollinearity.
• Low correlation coefficients do not imply the absence of multicollinearity…
• … as multicollinearity may involve more than two variables.
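Continuing the OLS sketch above (df as loaded there), the pairwise correlation matrix of the regressors can be inspected as follows:

```python
# Pairwise correlations among the regressors; values near +/-0.8 or beyond
# are the usual warning sign (a rule of thumb, not a formal test).
regressors = df[["income", "age_wife", "age_husband", "hhsize", "children"]]
print(regressors.corr().round(2))
```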
AUXILIARY REGRESSION
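As a sketch of one auxiliary regression (the choice of age_wife as the left-hand variable is illustrative), regress a regressor on the remaining ones and inspect its R² and overall F test; repeat for each regressor.

```python
# Auxiliary regression sketch, continuing from the earlier code (smf, df).
# A high auxiliary R-squared signals multicollinearity.
aux = smf.ols(
    "age_wife ~ income + age_husband + hhsize + children", data=df
).fit()
print(f"R-squared = {aux.rsquared:.3f}, F p-value = {aux.f_pvalue:.4f}")
# Note: the VIF for age_wife equals 1 / (1 - aux.rsquared).
```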
VARIANCE INFLATION FACTOR
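A sketch of computing VIFs with statsmodels, continuing from the correlation-matrix snippet (the regressors DataFrame defined there); a constant is added so each auxiliary regression has an intercept.

```python
# VIF for every regressor via statsmodels.
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(regressors)
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    print(f"{name:12s} VIF = {variance_inflation_factor(X.values, i):.2f}")
```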
SOLUTIONS FOR MULTICOLLINEARITY
• General rule of thumb: DO NOT WORRY if
• the coefficients are statistically significant, and
• the coefficients have the correct expected signs.
RESTRUCTURING THE MODEL
• There may be alternative specifications or alternative functional forms.
• Example: production function

$$y = F(labor,\ land,\ capital)$$

• Solution: divide through by land

$$\frac{y}{land} = F\!\left(\frac{labor}{land},\ land,\ \frac{capital}{land}\right)$$
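A minimal sketch of this ratio transformation, assuming a hypothetical DataFrame farm with columns y, labor, land, and capital (the names are illustrative, not from the lecture):

```python
# Sketch: re-express the variables per unit of land before estimating.
farm["y_per_land"] = farm["y"] / farm["land"]
farm["labor_per_land"] = farm["labor"] / farm["land"]
farm["capital_per_land"] = farm["capital"] / farm["land"]
ratio_model = smf.ols(
    "y_per_land ~ labor_per_land + land + capital_per_land", data=farm
).fit()
```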
TRANSFORMING REGRESSORS
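The slide's worked output is not reproduced here; one plausible transformation for the household data (an assumption, not necessarily the lecture's choice) is to replace the two highly correlated age variables with their mean.

```python
# Sketch: combine age_wife and age_husband into a single mean-age regressor.
df["age_mean"] = (df["age_wife"] + df["age_husband"]) / 2
transformed = smf.ols(
    "expense ~ income + age_mean + hhsize + children", data=df
).fit()
print(transformed.summary())
```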
DROPPING CORRELATED REGRESSORS
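A sketch of the dropping approach on the household data, removing age_husband (an illustrative choice) and re-estimating:

```python
# Sketch: drop one of the correlated regressors and re-fit the model.
dropped = smf.ols(
    "expense ~ income + age_wife + hhsize + children", data=df
).fit()
print(dropped.summary())
# Caveat: dropping a relevant regressor trades multicollinearity
# for possible omitted-variable bias.
```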
