
Linear regression

Gertraud Malsiner-Walli

Readings: ISLR Chapter 3



Outline

Visualization of multivariate relationships

Linear regression
Simple linear regression
Multiple linear regression
Model selection
Further topics



Advertising data set

▶ Goal: provide a marketing plan for a company that will improve sales for a particular product.
▶ The dataset contains information about the sales of a product
in 200 different markets together with advertising budgets in
each of these markets for different media channels: TV, Radio
and Newspaper.
▶ The Sales are in thousands of units.
▶ The budgets TV, Radio and Newspaper are in thousands of dollars.
▶ Additionally, the variable NewspaperCat is a categorization of
the newspaper budget into three categories: below 25,000,
between 25,000 and 50,000, above 50,000.



Data matrix

##       TV Radio Newspaper Sales NewspaperCat
## 1  230.1  37.8      69.2  22.1     (50,150]
## 2   44.5  39.3      45.1  10.4      (25,50]
## 3   17.2  45.9      69.3   9.3     (50,150]
## 4  151.5  41.3      58.5  18.5     (50,150]
## 5  180.8  10.8      58.4  12.9     (50,150]
## 6    8.7  48.9      75.0   7.2     (50,150]



Visualization of multivariate relationships



Scatterplots
[Three scatterplots of Sales against TV, Radio, and Newspaper budgets, respectively.]


Parallel boxplots
[Parallel boxplots of Sales for the three NewspaperCat groups (0,25], (25,50], and (50,150].]


Linear regression



Linear regression

▶ Linear regression assumes that the dependence of a response variable Y on covariates X1, X2, . . . , Xp is linear.
▶ Even though true relationships are rarely exactly linear, this simple approach is extremely useful both conceptually and practically.
▶ The response variable Y is also called the target variable, outcome, or dependent variable.
▶ The covariates X1, . . . , Xp are also referred to as explanatory variables, independent variables, predictors, or features.



Linear regression for the advertising data

Relevant questions:
▶ Is there a relationship between advertising budgets and sales?
▶ How strong is the relationship between advertising budget and
sales?
▶ Is the relationship linear?
▶ Which media contribute to sales?
▶ How accurately can we predict future sales?



Simple linear regression



Simple linear regression

▶ Goal: predict a quantitative variable Y on the basis of a single predictor X by assuming an approximately linear relationship.
▶ We assume:

Y = β0 + β1 X + ϵ,   E(ϵ | X) = 0,   (1)

where β0 + β1 X is the linear function, ϵ is the error term, and β0 and β1 are unknown parameters (coefficients).

▶ Given some estimates β̂0 and β̂1, we can predict future values of Y by using the regression line:

ŷ = β̂0 + β̂1 x,

where ŷ indicates a prediction of Y given X = x.



How do we obtain the best regression line?
▶ The goal is to find a line that lies “close” to the points in the scatterplot.
▶ Let
ŷi = β̂0 + β̂1 xi
be the prediction or fitted value for Y based on xi .
Then
ei = yi − ŷi
is called the i-th residual.
[Scatterplot of Sales against TV with the fitted least squares regression line.]
Least squares criterion
▶ One possibility to estimate the coefficients is to minimize the
sum of squared residuals:
SSR = Σ_{i=1}^n e_i² = Σ_{i=1}^n (y_i − ŷ_i)² = Σ_{i=1}^n (y_i − (β̂0 + β̂1 x_i))².

▶ The minimizing values are called the ordinary least squares (OLS) estimates and are given by:

β̂1 = r_{x,y} · s_y / s_x,
β̂0 = ȳ − β̂1 x̄,

where x̄ and ȳ denote the means of the variables, s_y and s_x the standard deviations of Y and X respectively, and r_{x,y} is the correlation coefficient.
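In R, these closed-form estimates can be computed directly; a minimal sketch, assuming the Advertising data has been loaded as a data frame named Advertising with columns TV and Sales:

# OLS estimates from the correlation, standard deviations and means
b1 <- cor(Advertising$TV, Advertising$Sales) * sd(Advertising$Sales) / sd(Advertising$TV)
b0 <- mean(Advertising$Sales) - b1 * mean(Advertising$TV)
c(intercept = b0, slope = b1)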
Least squares criterion for the advertising data

▶ We estimate the parameters for the simple regression where TV budget is used as X and Sales is used as Y:

Ŝales = 7.03259 + 0.04754 · TV   (2)

▶ Interpretation of the coefficients:
▶ β̂0 = 7.03259, the intercept, is the average value of Sales when the TV budget is equal to 0.
▶ β̂1 = 0.04754, the slope, is the marginal effect of TV on Sales. The expected value of Sales will increase by 0.04754 × 1000 ≈ 48 units for every additional 1000 dollars spent on TV advertising.
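This fit can be reproduced in R with lm(); a sketch, again assuming the data frame is named Advertising:

fit <- lm(Sales ~ TV, data = Advertising)     # simple regression of Sales on TV
coef(fit)                                     # intercept and slope as in Equation (2)
predict(fit, newdata = data.frame(TV = 100))  # predicted Sales for a TV budget of 100 (thousand dollars)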



Goodness-of-fit (I)

▶ The goodness-of-fit of a linear regression can be assessed by the residual standard error (RSE) and the R² statistic.
▶ RSE is an estimate of the standard deviation of the error terms ϵ:

RSE = √( (1/(n − 2)) Σ_{i=1}^n (y_i − ŷ_i)² ) = √( SSR/(n − 2) ).   (3)

▶ ⇒ It measures (roughly) the average amount by which the response will deviate from the regression line.
▶ The RSE for the simple regression Sales ∼ TV in Equation (2) is 3.26, meaning that the actual sales deviate from the regression line on average by about 3260 units.
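As a sketch in R (with fit as above), the RSE can be computed by hand or read off summary():

sqrt(sum(residuals(fit)^2) / (nrow(Advertising) - 2))  # RSE = sqrt(SSR / (n - 2))
summary(fit)$sigma                                     # the same value reported by summary()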



Goodness-of-fit (II)

▶ The coefficient of determination R²:

R² = (TSS − SSR)/TSS = 1 − SSR/TSS,

where TSS = Σ_{i=1}^n (y_i − ȳ)² is the total sum of squares.
▶ It lies between 0 and 1 and measures the proportion of the variation in Y that can be explained by the regression model.
▶ In the social sciences, low R² values in regression analysis are not uncommon.



Goodness of fit for the advertising data

▶ The R² for the simple regression Sales ∼ TV is 0.6119.
▶ The R² for the simple regression Sales ∼ Radio is 0.332.
▶ The R² for the simple regression Sales ∼ Newspaper is 0.05212.
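These values can be reproduced in R; a sketch, one simple regression per medium:

summary(lm(Sales ~ TV, data = Advertising))$r.squared
summary(lm(Sales ~ Radio, data = Advertising))$r.squared
summary(lm(Sales ~ Newspaper, data = Advertising))$r.squared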



Statistical properties of OLS estimators: Unbiasedness

▶ Question: Are the estimates β̂0, β̂1 equal to the true values?
▶ Answer: Yes (on average), provided certain assumptions on the error term hold.



Statistical properties of OLS estimators: standard error

▶ Question: How large is the difference between the OLS estimate and the true value?
▶ The standard error (SE) of an estimator reflects how the estimator varies under repeated sampling.
▶ Good news: We can estimate the SE of the regression coefficients under certain assumptions on the error terms:

SE(β̂1)² = σ² / Σ_{i=1}^n (x_i − x̄)²,   SE(β̂0)² = σ² ( 1/n + x̄² / Σ_{i=1}^n (x_i − x̄)² ),

where σ² is the variance of the error terms, which is typically unknown and can be estimated by the RSE in Equation (3).



Confidence intervals and hypothesis testing
▶ The standard errors can then be used to obtain confidence
intervals:

β̂1 ± 2 · SE (β̂1 ), β̂0 ± 2 · SE (β̂0 ).

▶ The standard errors can also be used in hypothesis testing. The most common test involves the following hypotheses:

H0: β1 = 0, i.e. there is no linear relationship between X and Y
HA: β1 ≠ 0, i.e. there is a linear relationship between X and Y

▶ The test statistic of this hypothesis test is given by t = β̂1/SE(β̂1) and follows a t-distribution under the null hypothesis.
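In R, summary() reports the standard errors, t-statistics, and p-values, and confint() gives exact confidence intervals (the ± 2 · SE rule above is an approximation); a sketch with the Sales ~ TV fit:

summary(fit)$coefficients   # estimate, SE, t-statistic and p-value per coefficient
confint(fit, level = 0.95)  # 95% confidence intervals for beta0 and beta1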



Incorporating non-linearities

▶ It is rather easy to incorporate many nonlinearities into the simple regression model by appropriately transforming the dependent variable Y and the independent variable X.



Incorporating non-linearities - Examples

✓ linear (after log transformation): Y = e^{β0} X^{β1} ϵ̃  ⇒  log(Y) = β0 + β1 log(X) + ϵ

✓ linear: Y = β0 + β1 X + ϵ

✗ nonlinear (in the parameters): Y = 1/(β0 + β1 X) + ϵ



Visual inspection of residuals for Sales ~ TV

[Residual plot: residuals against fitted values for the model Sales ~ TV.]

The plot should look random with constant variance, but this is not the case here.
Incorporating non-linearities in the model Sales ~ TV
▶ The linearity assumption is violated at the left end of the plot.
▶ Solution: Estimate the model log(Sales) ∼ log(TV ).
▶ Interpretation: a one percent increase in TV generates a β1 percent change in Sales.
▶ The residual plot has improved, and so has R².
[Residual plot: residuals against fitted values for the model log(Sales) ~ log(TV).]

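A sketch of the transformed model in R (valid because Sales and TV are positive, as the log transformation requires):

fit_log <- lm(log(Sales) ~ log(TV), data = Advertising)
coef(fit_log)                              # slope: percent change in Sales per one percent TV
plot(fitted(fit_log), residuals(fit_log))  # residual plot as in the figure above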


Multiple linear regression



Multiple linear regression

▶ The multiple linear regression model is:

Y = β0 + β1 X1 + β2 X2 + . . . + βp Xp + ϵ.

▶ In the advertising example, the model becomes:

Sales = β0 + β1 · TV + β2 · radio + β3 · newspaper + ϵ.

▶ We interpret βj as the average effect on Y of a one unit increase in Xj, holding all other predictors fixed (i.e., ceteris paribus).
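A sketch of the corresponding fit in R:

fit_all <- lm(Sales ~ TV + Radio + Newspaper, data = Advertising)
coef(fit_all)  # each slope is a ceteris paribus effect, holding the other budgets fixed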



More on the interpretation of the coefficients

▶ If the predictors were uncorrelated, the coefficient β̂1 of the simple linear regression Y ∼ X1 would equal the coefficient β̂1 of the multiple linear regression.
▶ With correlation amongst predictors come the following issues:
▶ The variance of all coefficients increases.
▶ Because correlated predictors usually change together, interpretations become hazardous.


Estimation and prediction
▶ Parameters are estimated using the least squares method, where the sum of squared residuals is minimized:
SSR = Σ_{i=1}^n (y_i − ŷ_i)² = Σ_{i=1}^n (y_i − β̂0 − β̂1 x_{i1} − · · · − β̂p x_{ip})²

▶ This minimization problem is implemented in all statistical software programs.
▶ Given estimates β̂0, β̂1, . . . , β̂p, we can predict future values of Y by using the regression model:

ŷ = β̂0 + β̂1 x1 + . . . + β̂p xp.



Figure 1: Linear regression with two covariates



Goodness of fit
▶ The coefficient of determination R² is, as in the simple linear regression case, equal to:

R² = (TSS − SSR)/TSS.

▶ Note: It never decreases, and it usually increases, when another independent variable is added to a regression.
▶ This makes it a poor tool for deciding whether one variable or several variables should be added to a model.
▶ Alternative: the adjusted R²:

R²_adj = 1 − (1 − R²)(n − 1)/(n − p − 1).
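Both quantities are reported by summary() in R; a sketch comparing a smaller and a larger model (with fit_all as above):

summary(lm(Sales ~ TV + Radio, data = Advertising))$adj.r.squared
summary(fit_all)$adj.r.squared  # adjusted R2 penalizes a predictor that adds little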



Hypothesis Testing - the t-test
▶ Question: Which of the predictors are useful in predicting the
response?
▶ To answer this question we can construct hypothesis tests for
each regression coefficient βj :

H0 : βj = 0

HA : βj ̸= 0.

▶ Under certain assumptions, and additionally assuming the errors are normally distributed, we have the following result:

β̂j / SE(β̂j) ∼ t_{n−p−1}.

▶ The ratio β̂j/SE(β̂j) is called the t-statistic.
▶ p-values can be computed based on the distribution under the null hypothesis.
Model selection



Model comparison: in-sample and out-of-sample approach

Comparison between models can be done based on
▶ in-sample (likelihood-based) measures such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC);
▶ out-of-sample measures (by measuring the prediction performance on a test data set).
▶ Typically the MSE (mean squared error) or the RMSE (root MSE) is employed as a goodness-of-fit measure both in-sample and out-of-sample:

MSE = (1/n) Σ_{i=1}^n (y_i − ŷ_i)²,   RMSE = √MSE
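A sketch of an out-of-sample evaluation in R, holding out some of the 200 markets as a test set (the 150/50 split is an arbitrary choice for illustration):

set.seed(1)                               # for a reproducible split
train  <- sample(nrow(Advertising), 150)  # indices of the training markets
fit_tr <- lm(Sales ~ TV + Radio, data = Advertising[train, ])
pred   <- predict(fit_tr, newdata = Advertising[-train, ])
sqrt(mean((Advertising$Sales[-train] - pred)^2))  # out-of-sample RMSE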



Information Criteria: AIC and BIC

▶ AIC, BIC: Model Fit + Penalty = n · log(MSE) + const(n) + m · (p + 1)

▶ MSE: mean squared error
▶ p: number of regressors
▶ m = 2: AIC (Akaike Information Criterion)
▶ m = log n: BIC (Bayesian Information Criterion)
▶ Choose the model that minimizes the criterion!
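In R, AIC() and BIC() compute these criteria for fitted lm objects (R's additive constants differ slightly from the formula above, but the model rankings agree); a sketch:

fit_tv  <- lm(Sales ~ TV, data = Advertising)
fit_tr2 <- lm(Sales ~ TV + Radio, data = Advertising)
AIC(fit_tv, fit_tr2)  # lower is better
BIC(fit_tv, fit_tr2)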



Variable selection: Irrelevant and omitted variables

▶ Adding irrelevant variables to a model does not affect unbiasedness but increases the variance of the estimators.
▶ Excluding a relevant variable will typically bias the estimates.



Deciding on the important variables

▶ The most direct approach is called best subset regression: we fit an OLS regression for each possible combination of the p predictors and then choose between them based on some criterion that balances MSE with model size.
▶ There are 2^p models under consideration, so computation quickly becomes infeasible.
▶ Alternative: automated approaches that search through a subset of all models, known as stepwise regression.



Forward selection

▶ Begin with the null model, a model that contains an intercept but no predictors.
▶ Fit p simple linear regressions, one per regressor, and add the variable that results in the lowest AIC to the null model.
▶ Continue adding variables as long as the AIC decreases. Stop if adding any variable would increase the AIC. (See the sketch below.)
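A sketch of forward selection with R's step(), which selects by AIC:

null_fit <- lm(Sales ~ 1, data = Advertising)  # intercept-only model
step(null_fit,
     scope = ~ TV + Radio + Newspaper,         # candidate predictors
     direction = "forward")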



Backward selection

▶ Start with all variables in the model.
▶ Remove the variable whose exclusion yields the smallest AIC.
▶ Continue removing variables as long as the AIC decreases. Stop if removing any included variable would increase the AIC.
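The backward counterpart, again sketched with step():

full_fit <- lm(Sales ~ TV + Radio + Newspaper, data = Advertising)
step(full_fit, direction = "backward")  # removes variables while the AIC decreases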



Further topics



Qualitative predictors (1)
▶ Categorical covariates are coded as dummy variables (which
take on 0 or 1 values) when used in a regression setting.
▶ If a categorical variable has K levels, K − 1 dummy variables
are needed to represent the variable.
▶ In our example, NewspaperCat has three levels, thus we need
two dummy variables:
D1 = 1 if the newspaper budget is in the range (25, 50], and 0 otherwise
D2 = 1 if the newspaper budget is in the range (50, 150], and 0 otherwise

▶ Note that we have not created any dummy variable for the first
level (0, 25] as this is considered to be the baseline.
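In R, factor covariates are expanded into such dummy variables automatically; model.matrix() shows the coding. A sketch, assuming NewspaperCat is stored as a factor with the three levels above:

head(model.matrix(~ NewspaperCat, data = Advertising))
# intercept column plus K - 1 = 2 dummy columns; the first level (0,25] is the baseline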
Qualitative predictors (2)

▶ The model Sales ~ NewspaperCat becomes:

Ŝales = β0 + β1 D1 + β2 D2 =
  β0        if NewspaperCat is (0, 25]
  β0 + β1   if NewspaperCat is (25, 50]
  β0 + β2   if NewspaperCat is (50, 150]
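Fitting this model in R recovers the baseline group mean and the two differences; a sketch:

fit_cat <- lm(Sales ~ NewspaperCat, data = Advertising)
coef(fit_cat)  # intercept: mean Sales for (0,25]; remaining coefficients: differences to the baseline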

