
Regression analysis

July 31, 2024

1 Introduction

1.1 The model

The origin of time series analysis is the linear regression model, which is concerned with the interdependence of time series observations. We want to explain the behaviour of a certain variable with the aid of a number of explanatory variables. Linear regression expresses a dependent variable as a linear function of independent variables, possibly random, and an error term. We consider the following model:

$$Y_i = \beta_0 + \sum_{j=1}^{k} \beta_j X_{j,i} + \varepsilon_i,$$

where $Y_i$ is known as the regressand, dependent variable or simply the left-hand-side variable. The $k$ variables $X_{1,i}, \ldots, X_{k,i}$ are known as the regressors, independent variables or right-hand-side variables. The coefficients $\beta_1, \beta_2, \ldots, \beta_k$ are the regression coefficients, $\varepsilon_i$ is known as the innovation, shock or error, and $i = 1, 2, \ldots, n$ indexes the observations. This representation makes the relationship between $Y_i$ and the regressors explicit.

- First, we consider the case of a single explanatory variable $x$: the simple linear regression model ($k = 1$).

- Then, we extend the model to include more variables: the multiple regression model ($k > 1$).

Regression analysis uses two principal types of data: cross-sectional and time-series. A cross-sectional regression involves many observations of $X$ and $Y$ for the same time period. These observations could come from different companies, asset classes, investment funds, countries, or other entities, depending on the regression model. For example, a cross-sectional model might use data from many companies to test whether predicted EPS growth explains differences in price-to-earnings ratios during a specific time period. Note that if we use cross-sectional observations in a regression, we usually denote the observations as $i = 1, 2, \ldots, n$. A time-series regression, in contrast, uses many observations from different time periods for the same company, asset class, investment fund, country, or other entity, depending on the regression model. For example, a time-series model might use monthly data from many years to test whether a country's inflation rate determines its short-term interest rates. If we use time-series data in a regression, we usually denote the observations as $t = 1, 2, \ldots, T$.

2 Simple linear regression model

2.1 Assumptions

The four key assumptions of a linear regression model are:

- Linearity: the relationship between the dependent variable and the independent variable is linear.

- Homoskedasticity: the variance of the regression residuals is the same for all the observations.

- Independence: the observations are independent of one another. This implies that the regression residuals are uncorrelated across observations.

- Normality: the regression residuals are normally distributed.

2.1.1 Linearity

We assume that the true underlying relationship between the dependent and the independent variables is linear; if not, the model will produce invalid results. For example, $Y_i = b_0 e^{b_1 X_i} + \epsilon_i$ is nonlinear in $b_1$, so we should not apply the linear regression model to it. The independent variable, $X$, is also assumed not to be random; otherwise there would be no meaningful linear relationship between the dependent and independent variables to estimate. The residuals of a fitted model should appear random, i.e. no pattern should be present when the residuals are plotted against the independent variable.
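
As a quick visual check of this assumption, one can plot the residuals of a fitted simple regression against the independent variable and look for systematic patterns. The following is a minimal sketch using NumPy and Matplotlib; the simulated data and variable names are illustrative and not taken from these notes.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Simulated data: a truly linear relationship plus noise (illustrative only)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)

# Fit y = b0 + b1 * x by ordinary least squares
b1, b0 = np.polyfit(x, y, 1)          # np.polyfit returns highest degree first
residuals = y - (b0 + b1 * x)

# Residuals plotted against the independent variable should show no pattern
plt.scatter(x, residuals)
plt.axhline(0, color="black", linewidth=0.8)
plt.xlabel("x")
plt.ylabel("residual")
plt.title("Residuals vs independent variable")
plt.show()
```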

2.1.2 Homoskedasticity

Observations exhibit homoskedasticity if the variance of the residuals is the same for all observations:

$$E[\epsilon_i^2] = \sigma_\epsilon^2, \qquad i = 1, \ldots, n.$$

If the residuals are not homoskedastic, we refer to this as heteroskedasticity.

2.1.3 Independence

In a linear regression model, we assume that the observations are uncorrelated with one another, implying that they are independent. If this assumption is violated, the residuals will be correlated. Independence is also necessary in order to correctly estimate the variances of the estimated parameters $b_0$ and $b_1$ that we use in hypothesis tests of the intercept and slope. Therefore, we need to examine the residuals of a regression model both visually and statistically.

2.1.4 Normality

This assumption requires that the residuals are normally distributed. It does not mean that the dependent and independent variables must be normally distributed; it only means that the residuals from the model are normally distributed. It is good practice, though, to examine the distribution of the variables in order to identify outliers, since an outlier can substantially influence the fitted line, so that the estimated model fits poorly for most of the other observations.

2.2 The model

Consider two random variables $X$ and $Y$, and assume that we have a sample of size $n$ on each variable. The sample correlation coefficient is defined as

$$r_{xy} = \frac{S_{xy}}{S_x S_y},$$

with

$$S_x = \sqrt{\frac{\sum_{i=1}^n x_i^2 - \frac{\left(\sum_{i=1}^n x_i\right)^2}{n}}{n-1}}, \qquad
S_y = \sqrt{\frac{\sum_{i=1}^n y_i^2 - \frac{\left(\sum_{i=1}^n y_i\right)^2}{n}}{n-1}},$$

and

$$S_{xy} = \frac{1}{n-1} \left( \sum_{i=1}^n x_i y_i - \frac{\sum_{i=1}^n x_i \sum_{i=1}^n y_i}{n} \right).$$

The correlation coefficient is an indicator of the existence of a linear relationship between the two variables.

- If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship).

- If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship).

- A coefficient close to zero indicates no straight-line relationship.

If the relationship between $y$ and $x$ is linear, then the variables are connected by the regression line

$$y = \beta_0 + \beta_1 x + \epsilon.$$

The estimated regression model is

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x,$$

where $\hat{y}$ is the estimated or predicted value of $y$ for a given value of $x$.
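
The sample quantities above can be computed directly. The following is a minimal NumPy sketch with simulated data (the data and variable names are illustrative); it also checks the result against np.corrcoef.

```python
import numpy as np

rng = np.random.default_rng(8)

# Simulated paired sample (illustrative only)
n = 30
x = rng.normal(0, 1, n)
y = 0.6 * x + rng.normal(0, 1, n)

# Sample standard deviations and covariance as defined above
s_x = np.sqrt((np.sum(x**2) - np.sum(x)**2 / n) / (n - 1))
s_y = np.sqrt((np.sum(y**2) - np.sum(y)**2 / n) / (n - 1))
s_xy = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / (n - 1)

r = s_xy / (s_x * s_y)
print(r, np.corrcoef(x, y)[0, 1])      # the two values should agree
```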

2.3 Estimates of the regression coefficients

The values of $\beta_0$ and $\beta_1$ are estimated using the least squares method. This method minimizes the sum of squared differences between the regression line and the data points, i.e., it minimizes $\sum_{i=1}^n (y_i - \hat{y}_i)^2$, where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value at the same $x$, obtained using $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$.

Proposition 2.1 The estimates of the regression coefficients are

$$\hat{\beta}_1 = \frac{\sum_i x_i y_i - n\bar{x}\bar{y}}{\sum_i x_i^2 - n\bar{x}^2} = \frac{S_{xy}}{S_x^2},$$

and

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$

Proof 2.1 Proof in class.

Proposition 2.2 An estimator of $\sigma^2$ is given by

$$\hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^n e_i^2}{n-2} = \frac{SSE}{n-2}.$$

Proof 2.2 Proof in class.
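
A minimal NumPy sketch of Propositions 2.1 and 2.2 with simulated data (the data and variable names are illustrative, not part of these notes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated sample (illustrative only)
n = 50
x = rng.uniform(0, 10, n)
y = 1.0 + 0.8 * x + rng.normal(0, 1.5, n)

# Proposition 2.1: least squares estimates
beta1_hat = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Proposition 2.2: estimator of sigma^2
e = y - (beta0_hat + beta1_hat * x)        # residuals
sse = np.sum(e**2)
s2 = sse / (n - 2)

print(beta0_hat, beta1_hat, s2)
```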

2.4 Properties of the estimators

The estimators of $\beta_0$, $\beta_1$, and $\sigma^2$ have the following properties.

Proposition 2.3 (Gauss-Markov Theorem) Assuming that the four linear regression assumptions hold, the OLS estimators of $\beta_0$ and $\beta_1$ are BLUE: best linear unbiased estimators.

The coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$ are estimates (i.e., random). Hence, it is possible to show that

$$V[\hat{\beta}_0] = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2} \right),$$

$$V[\hat{\beta}_1] = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2}.$$

Substituting the unbiased estimator $s^2$ for $\sigma^2$, the estimated variances of the regression coefficients are

$$\hat{V}[\hat{\beta}_0] = s^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2} \right),$$

$$\hat{V}[\hat{\beta}_1] = \frac{s^2}{\sum_i (x_i - \bar{x})^2}.$$

The variance estimator $s^2$ is unbiased and its variance is given by

$$V[s^2] = \frac{2\sigma^4}{n-2}.$$
Confidence intervals for the regression coefficients are given by

$$\hat{\beta}_0 - t_{n-2,\alpha/2}\,\widehat{SE}[\hat{\beta}_0] \le \beta_0 \le \hat{\beta}_0 + t_{n-2,\alpha/2}\,\widehat{SE}[\hat{\beta}_0],$$

$$\hat{\beta}_1 - t_{n-2,\alpha/2}\,\widehat{SE}[\hat{\beta}_1] \le \beta_1 \le \hat{\beta}_1 + t_{n-2,\alpha/2}\,\widehat{SE}[\hat{\beta}_1].$$

A confidence interval for the regression variance is

$$\frac{SSE}{\chi^2_{n-2,\,1-\alpha/2}} \le \sigma^2 \le \frac{SSE}{\chi^2_{n-2,\,\alpha/2}}.$$
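
A sketch of the estimated standard errors and the confidence intervals above; the sample is regenerated so the snippet stands alone, SciPy is assumed to be available for the t and chi-square quantiles, and the simulated data are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated sample (illustrative only), fitted as in the previous sketch
n = 50
x = rng.uniform(0, 10, n)
y = 1.0 + 0.8 * x + rng.normal(0, 1.5, n)
beta1_hat = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
beta0_hat = y.mean() - beta1_hat * x.mean()
e = y - (beta0_hat + beta1_hat * x)
sse = np.sum(e**2)
s2 = sse / (n - 2)

# Estimated variances and standard errors of the coefficients
sxx = np.sum((x - x.mean())**2)
se_b0 = np.sqrt(s2 * (1.0 / n + x.mean()**2 / sxx))
se_b1 = np.sqrt(s2 / sxx)

# 95% confidence intervals for the coefficients
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
ci_b0 = (beta0_hat - t_crit * se_b0, beta0_hat + t_crit * se_b0)
ci_b1 = (beta1_hat - t_crit * se_b1, beta1_hat + t_crit * se_b1)

# Confidence interval for the regression variance
ci_sigma2 = (sse / stats.chi2.ppf(1 - alpha / 2, df=n - 2),
             sse / stats.chi2.ppf(alpha / 2, df=n - 2))

print(ci_b0, ci_b1, ci_sigma2)
```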

3 Multiple linear regression model

3.1 The model

Recall the multiple linear regression model

$$Y_i = \beta_0 + \sum_{j=1}^{k} \beta_j X_{j,i} + \varepsilon_i,$$

for $i = 1, \ldots, n$. This is equivalent to

$$\begin{aligned}
y_1 &= \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_k x_{1k} + \varepsilon_1 \\
y_2 &= \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_k x_{2k} + \varepsilon_2 \\
&\;\;\vdots \\
y_n &= \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \cdots + \beta_k x_{nk} + \varepsilon_n.
\end{aligned}$$

It is more convenient to use matrix notation and write the model as

$$y = X\beta + \varepsilon,$$

where

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix},$$

and

$$X = \begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nk}
\end{pmatrix}.$$

3.2 Estimators

Estimation is based on ordinary least squares (OLS), which yields the following estimator of $\beta$.

Proposition 3.1 The OLS estimator of $\beta$ is unbiased and is given by

$$\hat{\beta} = \left(X^\top X\right)^{-1} X^\top y,$$

with

$$V[\hat{\beta}] = \sigma^2 \left(X^\top X\right)^{-1}.$$

Proof 3.1 In class.

The fitted values are given by

$$\hat{y} = X\hat{\beta} = \underbrace{X \left(X^\top X\right)^{-1} X^\top}_{H}\, y = Hy.$$

The residuals may also be expressed in terms of the matrix $H$:

$$e = (I - H)y = (I - H)\varepsilon.$$

The matrix $H$ has the following properties:

$$H^\top = H \qquad \text{and} \qquad H^2 = H.$$

We can show the following:

- The sum of the residuals is 0, i.e., $\sum_{i=1}^n e_i = 0$.

- The residuals and the fitted values are uncorrelated.

- An unbiased estimator of $\sigma^2$ is $s^2 = \dfrac{e^\top e}{n-k-1}$.

- The variance-covariance matrix of the estimator $\hat{\beta}$ is $V[\hat{\beta}] = \sigma^2 \left(X^\top X\right)^{-1}$.
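
A minimal NumPy sketch of the matrix formulas above, using a simulated design matrix (the dimensions and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated design: n observations, k regressors plus an intercept column (illustrative only)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 0.5, -0.3, 2.0])
y = X @ beta_true + rng.normal(0, 1, n)

# OLS estimator: beta_hat = (X'X)^{-1} X'y (solving the normal equations avoids an explicit inverse)
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)

# Hat matrix, fitted values and residuals
H = X @ np.linalg.solve(XtX, X.T)
y_hat = H @ y
e = y - y_hat

# Unbiased estimator of sigma^2 and variance-covariance matrix of beta_hat
s2 = (e @ e) / (n - k - 1)
V_beta_hat = s2 * np.linalg.inv(XtX)

print(beta_hat)
print(np.isclose(e.sum(), 0.0))        # the residuals sum to (numerically) zero
```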

4 Assessing the model

There are several goodness-of-fit measures:

- the coefficient of determination $R^2$,

- the F-statistic for the test of fit, and

- the standard error of the regression.

4.1 Coefficient of determination

This measure is also referred to as R-squared or $R^2$ and is the percentage of the variation of the dependent variable that is explained by the independent variable:

$$R^2 = \frac{\text{Sum of squares regression}}{\text{Sum of squares total}} = \frac{\sum_{i=1}^n \left(\hat{Y}_i - \bar{Y}\right)^2}{\sum_{i=1}^n \left(Y_i - \bar{Y}\right)^2}.$$

By construction, the coefficient of determination ranges from 0% to 100%. In simple linear regression, the square of the pairwise correlation is equal to the coefficient of determination:

$$r^2 = R^2.$$

The coefficient of determination is a descriptive measure, so in order to determine whether our regression model is statistically meaningful, we will need to construct an F-distributed test statistic.
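
A small sketch of this computation with simulated data (illustrative only), which also verifies that $R^2 = r^2$ in simple linear regression:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated simple regression (illustrative only)
n = 60
x = rng.uniform(0, 5, n)
y = 3.0 + 1.2 * x + rng.normal(0, 1, n)
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

# Coefficient of determination: SSR / SST
ssr = np.sum((y_hat - y.mean())**2)
sst = np.sum((y - y.mean())**2)
r_squared = ssr / sst

# In simple linear regression, R^2 equals the squared pairwise correlation
r = np.corrcoef(x, y)[0, 1]
print(r_squared, r**2)        # the two values should agree (up to rounding)
```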

4.2 F-statistic

We use an F-distributed test statistic to compare two variances. In regression analysis, we can use an F-distributed test statistic to test whether the slope coefficients in a regression are all equal to zero, against the alternative hypothesis that at least one slope is not equal to zero:

$$H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0$$
$$H_a: \text{at least one } \beta_j \text{ is not equal to zero.}$$

For simple linear regression, these hypotheses simplify to

$$H_0: \beta_1 = 0$$
$$H_a: \beta_1 \neq 0.$$

The F-distributed test statistic is constructed by using the sum of squares regression and the sum of squares error, each adjusted for degrees of freedom; in other words, it is the ratio of two variances. We divide the sum of squares regression by the number of independent variables, represented by $k$. In the case of simple linear regression, $k = 1$, so we arrive at the mean square regression (MSR), which is the same as the sum of squares regression:

$$MSR = \frac{\text{Sum of squares regression}}{k},$$

which can be rewritten as $MSR = \sum_{i=1}^n \left(\hat{Y}_i - \bar{Y}\right)^2$ for simple linear regression. The mean square error (MSE) is the sum of squares error divided by its degrees of freedom, which is $n - k - 1$. In simple linear regression, $n - k - 1$ becomes $n - 2$:

$$MSE = \frac{\text{Sum of squares error}}{n-k-1} = \frac{\sum_{i=1}^n \left(Y_i - \hat{Y}_i\right)^2}{n-2}.$$

Therefore, the F-distributed test statistic is

$$F = \frac{\dfrac{\text{Sum of squares regression}}{k}}{\dfrac{\text{Sum of squares error}}{n-k-1}} = \frac{MSR}{MSE},$$

which is distributed with 1 and $n - 2$ degrees of freedom in simple linear regression. The F-statistic in regression analysis is one-sided, with the rejection region on the right side, because we are interested in whether the variation in $Y$ explained (the numerator) is larger than the variation in $Y$ unexplained (the denominator). The sums of squares from a regression model are often presented in an analysis of variance (ANOVA) table. An example of an ANOVA table is below:

Source       Sum of Squares   Degrees of Freedom   Mean Square   F-Statistic
Regression   191.625          1                    191.625       16.0104
Error        47.875           4                    11.96875
Total        239.50           5
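
The F-statistic in the ANOVA table can be reproduced directly from the reported sums of squares; the sketch below takes the numbers from the table above and assumes SciPy is available for the p-value.

```python
from scipy import stats

# Numbers taken from the ANOVA table above
ss_regression, ss_error = 191.625, 47.875
k, n = 1, 6                      # one slope, six observations (degrees of freedom 1 and 4)

msr = ss_regression / k          # 191.625
mse = ss_error / (n - k - 1)     # 11.96875
f_stat = msr / mse               # about 16.0104

# One-sided test: compare with the critical value, or compute the p-value
p_value = stats.f.sf(f_stat, dfn=k, dfd=n - k - 1)
print(f_stat, p_value)
```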

4.3 Standard Error of the Regression

The standard error of the estimate ($s_e$) is also known as the standard error of the regression or the root mean square error. The $s_e$ is a measure of the distance between the observed values of the dependent variable and those predicted from the estimated regression; the smaller the $s_e$, the better the fit of the model. The $s_e$, along with the coefficient of determination and the F-statistic, is a measure of the goodness of fit of the estimated regression line. Unlike the coefficient of determination and the F-statistic, which are relative measures of fit, the standard error of the estimate is an absolute measure of the distance of the observed dependent variable from the regression line. Thus, the $s_e$ is an important statistic used to evaluate a regression model and is used in calculating prediction intervals and performing tests on the coefficients. The standard error of the estimate is

$$s_e = \sqrt{MSE}.$$
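
Using the MSE from the ANOVA table above, a quick check of the standard error of the estimate:

```python
import math

# Standard error of the estimate from the ANOVA table above: s_e = sqrt(MSE)
mse = 47.875 / 4                 # 11.96875
s_e = math.sqrt(mse)             # about 3.46, in the units of the dependent variable
print(s_e)
```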

5 Hypothesis Testing

We might want to perform hypothesis testing of the linear regression coefficients to determine the
significance of the coefficients. We will look at examples of testing:

- Slope coefficient(s)

- Intercept

- Independent variable is an indicator variable.

The choice of significance level in hypothesis testing is always a matter of judgment. Analysts often choose the 0.05 level of significance, which indicates a 5% chance of rejecting the null hypothesis when, in fact, it is true (a Type I error, or false positive). Of course, decreasing the level of significance from 0.05 to 0.01 decreases the probability of a Type I error, but it also increases the probability of a Type II error: failing to reject the null hypothesis when, in fact, it is false (a false negative). The p-value is the smallest level of significance at which the null hypothesis can be rejected. The smaller the p-value, the smaller the chance of making a Type I error (i.e., rejecting a true null hypothesis), and hence the stronger the evidence against the null hypothesis. For example, if the p-value is 0.005, we reject the null hypothesis that the true parameter is equal to zero at the 0.5% significance level (99.5% confidence). In most software packages, the p-values provided for regression coefficients are for a test of the null hypothesis that the true parameter is equal to zero against the alternative that the parameter is not equal to zero.
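
A minimal sketch of such a coefficient test for the slope of a simple regression, using the standard error formulas from Section 2.4; the simulated data are illustrative and SciPy is assumed for the t distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Simulated simple regression (illustrative only)
n = 40
x = rng.uniform(0, 10, n)
y = 0.5 + 0.9 * x + rng.normal(0, 2, n)

b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)
s2 = np.sum(e**2) / (n - 2)
se_b1 = np.sqrt(s2 / np.sum((x - x.mean())**2))

# Test H0: beta_1 = 0 against Ha: beta_1 != 0
t_stat = b1 / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(t_stat, p_value)         # reject H0 at the 5% level if p_value < 0.05
```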

6 Functional forms for simple linear regression

Not every set of independent and dependent variables has a linear relation. In fact, we often see
non-linear relationships in economic and financial data.

There are several functional forms that can be used to transform the data to enable their use in linear regression. These transformations include using the log (i.e., natural logarithm) of the dependent variable, the log of the independent variable, the reciprocal of the independent variable, the square of the independent variable, or the differencing of the independent variable. We illustrate and discuss three often-used functional forms, each of which involves a log transformation (a brief sketch of these transformations follows the list):

1. the log-lin model, in which the dependent variable is logarithmic but the independent variable
is linear;

2. the lin-log model, in which the dependent variable is linear but the independent variable is
logarithmic; and

3. the log-log model, in which both the dependent and independent variables are in logarithmic form.
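
A brief sketch of the three transformations; the simulated data are illustrative, and each model is fitted by ordinary least squares on the transformed variables.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated positive-valued variables (illustrative only)
n = 80
x = rng.uniform(1, 10, n)
y = np.exp(0.3 + 0.2 * x + rng.normal(0, 0.1, n))

# Log-lin model: ln(y) regressed on x
b1_loglin, b0_loglin = np.polyfit(x, np.log(y), 1)

# Lin-log model: y regressed on ln(x)
b1_linlog, b0_linlog = np.polyfit(np.log(x), y, 1)

# Log-log model: ln(y) regressed on ln(x); the slope can be read as an elasticity
b1_loglog, b0_loglog = np.polyfit(np.log(x), np.log(y), 1)

print(b1_loglin, b1_linlog, b1_loglog)
```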

7 CAPM and multifactor model

7.1 CAPM

The Capital Asset Pricing Model (CAPM) considers the equilibrium relation between the expected return of an asset or portfolio $\mu_i = E[y^i]$, the risk-free return $r_f$, and the expected return of the market portfolio $\mu_m = E[y^m]$. Based on various assumptions (e.g. quadratic utility or normality of returns), the CAPM states that

$$\mu_i - r_f = \beta_i (\mu_m - r_f).$$

This relation is also known as the security market line (SML). When we replace the expected returns in the CAPM with observed returns, we obtain the so-called market model

$$y_t^i = \alpha_i + \beta_i y_t^m + \epsilon_t^i.$$

If we write the regression equation in terms of (observed) excess returns $x_t^i = y_t^i - r_f$ and $x_t^m = y_t^m - r_f$, we obtain

$$x_t^i = \beta_i x_t^m + \epsilon_t^i.$$

A testable implication of the CAPM is that the constant term in a simple linear regression using excess returns should be equal to zero.

Example 7.1 Estimation of the CAPM for US industry indices compiled by French.
Using the excess returns on the market, the consumer goods portfolio, and the hi-tech portfolio, we estimate the CAPM. The data are available at https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.
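
A sketch of such a market-model regression; the excess returns below are simulated stand-ins for the French data rather than actual downloads, and statsmodels is assumed to be available.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)

# Simulated monthly excess returns (illustrative stand-ins for the French data)
T = 120
x_market = rng.normal(0.005, 0.04, T)                  # market excess return
x_asset = 0.9 * x_market + rng.normal(0, 0.02, T)      # industry portfolio excess return

# Market model with a constant: the CAPM implies the intercept (alpha) is zero
X = sm.add_constant(x_market)
results = sm.OLS(x_asset, X).fit()
print(results.params)          # [alpha_hat, beta_hat]
print(results.pvalues[0])      # p-value of the test that alpha = 0
```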

7.2 Multifactor model

The CAPM has been frequently challenged by empirical evidence indicating significant risk premia associated with factors other than the market portfolio. According to the Arbitrage Pricing Theory (APT) of Ross (1976), there exist several risk factors that are common to a set of assets. These risk factors (and not only the market risk) capture the systematic risk component. We consider one version of the multi-factor model using the so-called Fama-French benchmark factors SMB (small minus big) and HML (high minus low).

The factor SMB measures the difference in returns between portfolios of small and large stocks, and is intended to capture the so-called size effect. The factor HML measures the difference in returns between value stocks (having a high book value relative to their market value) and growth stocks (with a low book-to-market ratio).

Example 7.2 Estimation of the three-factor model for US industry indices compiled by French.
Using the excess returns on the market, the consumer goods portfolio, and the hi-tech portfolio, we estimate the three-factor model with the SMB and HML factors. The data are available at https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.
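
A sketch of the three-factor regression; again the factor series are simulated stand-ins for the actual Fama-French data, and statsmodels is assumed.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)

# Simulated monthly factor returns (illustrative stand-ins for the Fama-French factors)
T = 120
mkt = rng.normal(0.005, 0.04, T)       # market excess return
smb = rng.normal(0.002, 0.03, T)       # small minus big
hml = rng.normal(0.003, 0.03, T)       # high minus low
excess_ret = 1.0 * mkt + 0.3 * smb - 0.2 * hml + rng.normal(0, 0.02, T)

# Three-factor regression: excess return on market, SMB and HML (with a constant)
X = sm.add_constant(np.column_stack([mkt, smb, hml]))
results = sm.OLS(excess_ret, X).fit()
print(results.params)                   # [alpha, beta_mkt, beta_smb, beta_hml]
print(results.rsquared)
```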

