Linear Regression II

Linear regression finds the linear relationship between two variables by estimating the intercept and slope of the line that best fits the data. The intercept is the value of Y when X is 0, and the slope is the change in Y given a one-unit change in X. Residuals are the differences between actual Y values and predicted Y values from the regression line. The variance of the estimate quantifies the error in predictions as the average squared residual, with smaller variance indicating better fit. The coefficient of determination (r²) shows the proportion of variance in Y explained by the regression line.

Linear Regression II

Linear regression
• An estimate of the linear relationship
between two variables (X & Y) in
terms of the actual scale
– We find the equation for the line that
best fits the data
– This involves
• Finding the intercept– e.g., the value of Y
when X = 0
• Finding the slope– the change in Y given a
one point change in X
Equation of a line
• Y’ = a + bX
• The intercept (a) is the point at
which the line crosses the Y axis
– The value of Y when X = 0
• The slope (b) is the amount of
increase in Y given an increase in
one point of X
• Y’ means “predicted Y value”
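The prediction equation above can be sketched as a tiny function. The intercept and slope values in the usage line are made-up numbers for illustration, not from the lecture's example:

```python
def predict(a, b, x):
    """Predicted Y value (Y') on the line Y' = a + bX."""
    return a + b * x

# Hypothetical line with intercept a = 2.0 and slope b = 0.5:
print(predict(2.0, 0.5, 10))  # -> 7.0
```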
Making predictions
• We use the regression equation to
predict what Y will be given some value
of X
– E.g., how tall is someone who weighs 121
lbs?
• Last time, we focused on “perfect
predictions”
– Weight “perfectly predicted” height
because the correlation was one…
– That’s usually not the case in real life…
Making predictions
• One way to think about the
regression line is in terms of
“conditional” averages (means)
– Given some condition of X, what is the
mean of Y?
– So, given a GMAT score of 640, what is
the average income?
The line that “best fits”
• The method behind linear regression
involves finding the line that “best fits”
the data
– We won’t get into how this is computed in
this class
• Involves matrix algebra
– But conceptually, the goal is to find a line
that minimizes the total distance from all
the points
• Often called “Ordinary Least Squares” (OLS)
regression because you square the vertical
distance from each point to the line and find
the line with the smallest possible sum of
those squares– i.e., the “least” amount of
summed squares
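As a sketch of what "least squares" means in practice, here is the closed-form computation of the best-fitting slope and intercept; the toy data are made up for illustration:

```python
def ols_fit(xs, ys):
    """Fit Y' = a + bX by minimizing the sum of squared vertical distances."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope: co-deviation of X and Y over the squared deviation of X
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    # Intercept: the least-squares line passes through the point of means
    a = my - b * mx
    return a, b

# Perfectly linear toy data, Y = 2X, so we expect a = 0 and b = 2:
a, b = ols_fit([1, 2, 3], [2, 4, 6])
print(a, b)  # -> 0.0 2.0
```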
Residuals
• The regression line is what we would
predict for Y given some X…
– Regression equation gives us the straight
line that minimizes the error involved in
making predictions
• Residuals are what we call error
– Residuals are the differences between an
actual Y value and the predicted Y value
– The residual is Y – Y’
• The actual Y value minus the predicted Y value
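The residual definition Y − Y′ can be sketched directly; the line and data here are hypothetical:

```python
def residuals(a, b, xs, ys):
    """Residual Y - Y' for each data point: actual minus predicted."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical line Y' = 1 + 2X; predictions are 1, 3, 5:
print(residuals(1, 2, [0, 1, 2], [1, 4, 4]))  # -> [0, 1, -1]
```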
Variance of the estimate
• We can quantify the amount of error
in the prediction by finding the
average of all of the squared
residuals
– This is the “variance of the estimate”
– E.g., how much do the points vary
around the line?

σ²estY = Σ(Y − Y′)² / N

The closer the points are to the line, the
smaller the variance of the estimate will be
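The variance of the estimate, the average squared residual, can be sketched as follows (the line and points are made-up illustrations):

```python
def variance_of_estimate(a, b, xs, ys):
    """Average squared residual around the line: sum((Y - Y')**2) / N."""
    n = len(xs)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n

# Hypothetical line Y' = 1 + 2X with residuals 0, 1, -1; mean square = 2/3:
print(round(variance_of_estimate(1, 2, [0, 1, 2], [1, 4, 4]), 4))  # -> 0.6667
```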
Variance of the Estimate
• When r=0 (no correlation), this
means the best fitting line is a
horizontal one…
– Same predicted Y for all values of X
• The line is doing nothing for us..
– The variance of the estimate is largest
in this case
• The variance of the predictions around the
regression line is just the variance of Y

When r = 0, Y′ is the mean of Y, so:

σ²estY = Σ(Y − Y′)² / N = Σ(Y − Ȳ)² / N = σ²Y
Variance of the estimate
• For a sample, we use N-2 in the
denominator to get an unbiased
estimate
– We lose two degrees of freedom (one
for the slope, one for the intercept)


s²estY = Σ(Y − Y′)² / (N − 2)
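The sample version of the formula, dividing by N − 2 rather than N, can be sketched with hypothetical points around the line Y′ = 1 + 2X:

```python
def variance_of_estimate_sample(a, b, xs, ys):
    """Unbiased sample estimate: summed squared residuals divided by N - 2."""
    n = len(xs)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)

# Four hypothetical points; residuals are 0, 1, -1, 1, so 3 / (4 - 2) = 1.5:
print(variance_of_estimate_sample(1, 2, [0, 1, 2, 3], [1, 4, 4, 8]))  # -> 1.5
```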
Explained vs. unexplained
variance
• The difference between the total
amount of variance in Y and the
variance of the estimate is the
amount of variance explained by the
regression line
• Explained variance = total variance −
unexplained variance
– Equivalently, total variance = unexplained
variance + explained variance
Coefficient of determination
• This is the “proportion of the total
variance that is explained (or
determined) by the predictor
variable”
• It is the (explained variance)/(total
variance)
– This equals r²
– It is the proportion of the variance in Y
that is accounted for by X
Coefficient of non-
determination
• This is simply the reverse—the
amount of variance in Y that X does
not account for
– An estimate of how much the points
don’t fall on the line
• It is the (unexplained
variance)/(total variance), or (1 − r²)
The variance of the
estimate
• Remember that the variance of the
estimate is the unexplained variance
• An easier way to compute the
variance of the estimate is to use the
coefficient of non-determination
σ²estY / σ²Y = 1 − r²

Becomes…

σ²estY = σ²Y (1 − r²)
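This identity can be checked numerically on made-up data: fit the least-squares line, then compare the variance of the estimate with σ²Y(1 − r²). All data values below are arbitrary illustrations:

```python
import math

def fit_and_check(xs, ys):
    """Return (variance of estimate, var_Y * (1 - r**2)); the two should match."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    var_y = sum((y - my) ** 2 for y in ys) / n
    var_est = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(var_y)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)
    return var_est, var_y * (1 - r ** 2)

lhs, rhs = fit_and_check([1, 2, 3, 4], [2, 3, 5, 4])
print(abs(lhs - rhs) < 1e-9)  # -> True
```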
Example
• Relationship between age and verbal
comprehension
• We want to use age (in months) to
predict test scores on a verbal
comprehension test
Example
• In our sample of 100 kids from
grades 1-6, we have
• Mean age of 98.14 months (s = 21.0)
• Mean test score of 30.35 items correct
out of 50 (s = 7.25)
Why use regression?
• Our independent variable is age—
a continuous measure…
• We don’t have 2 groups to
compare, so we can’t use a t-test
• We want to look at how increases
in age relate to increases (or
decreases) in scores
Example
• In our sample of 100 kids from grades
1-6, we have
• Mean age of 98.14 months (s = 21.0)
• Mean test score of 30.35 items correct
out of 50 (s = 7.25)
• We find that the correlation between age
and test score in our sample is r = .72
• How can we make predictions for
verbal comprehension given an age?
1. Find the slope of the line
• X is age (the independent variable)
– Mx = 98.14, sx = 21.0
• Y is test score (the dependent
variable)
– My = 30.35, sy = 7.25
• r = .72

bYX = r (sY / sX) = (.72)(7.25 / 21.0) = .249
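The slope computation from the sample statistics above can be checked in a couple of lines:

```python
# Slope b = r * (s_Y / s_X), using the example's sample statistics
r, s_y, s_x = 0.72, 7.25, 21.0
b = r * (s_y / s_x)
print(round(b, 3))  # -> 0.249
```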
2. Find the intercept of the line
• X is age (the independent variable)
– Mx = 98.14, sx = 21.0
• Y is test score (the dependent
variable)
– My = 30.35, sy = 7.25
• b = .249

aYX = Ȳ − bYX X̄ = 30.35 − .249(98.14) = 5.91

For an age of 0 months (X = 0), we predict a
score of 5.91 on the test
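The intercept computation follows directly from the means and the slope found above:

```python
# Intercept a = mean(Y) - b * mean(X), using the example's values
b = 0.249
a = 30.35 - b * 98.14
print(round(a, 2))  # -> 5.91
```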
Making a prediction
• Y’ = a + bX
– a = 5.91
– b = .249
• Y’ means “predicted Y”
• A child is 10 years old (120 months)
– His predicted test result will be:
Y’ = 5.91 + .249(120) = 35.8 items
correct
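The prediction for the 120-month-old child can be verified by plugging into the equation:

```python
# Predicted score for a 120-month-old child: Y' = a + bX
a, b = 5.91, 0.249
print(round(a + b * 120, 1))  # -> 35.8
```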
Example
• In our sample of 100 kids from
grades 1-6, we have
• Mean age of 98.14 months (s = 21.0)
• Mean test score of 30.35 items correct
out of 50 (s = 7.25)
We predict a child at 120
months will get 35.8 items
correct
This child is older than the
average child in our sample,
so he does better than
average on the test
Interpreting: r vs. b
• b (the slope of the line) is the change
(amount of points) we predict in Y
based on a one point change in X
– For each month increase in age, test scores
go up .249 points
• r (the correlation) is the change (in
terms of standard deviations) we
predict in Y based on a one standard
deviation change in X
– For every one standard deviation increase
in age, test scores will increase by .72 of a
standard deviation
The residual
• Our equation is:
Test score = 5.91 + .249(age in months)
• We have a child who is 92 months old, and she
gets 40 questions correct
• We’d predict she would get
Y’ = 5.91 + .249(92) = 28.82 questions correct
• The residual is 40 − 28.82 = 11.18
– Positive because she did better than our
predicted value
• If another 92-month-old got 27 questions
correct, the residual would be 27 − 28.82 = −1.82
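The two residuals above can be reproduced with a small helper built from the example's fitted equation:

```python
def residual(actual_score, age_months):
    """Residual = actual Y minus predicted Y' = 5.91 + .249(age)."""
    return actual_score - (5.91 + 0.249 * age_months)

# The two 92-month-old children from the example:
print(round(residual(40, 92), 2))  # -> 11.18
print(round(residual(27, 92), 2))  # -> -1.82
```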
Example: Variance explained
• In our sample of 100 kids from
grades 1-6, we have
• Mean age of 98.14 months (s = 21.0)
• Mean test score of 30.35 items correct
out of 50 (s = 7.25)
• The total variance in test
scores is s² = 7.25² = 52.56
• How much is explained by
the regression line?
Unexplained variance
• If we went through each of our 100
data points, we could calculate the
residual– the value of Y we actually
got minus the value of Y we
predicted from the equation
– The sum of those squared deviations is
everything we didn’t explain

Σ(Y − Y′)²
Age (X)   Score (Y)   Predicted Score Y’ = 5.91 + .249(X)   Residual Y − Y’
92        25          5.91 + .249(92) = 28.82               25 − 28.82 = −3.82
100       30          5.91 + .249(100) = 30.81              30 − 30.81 = −.81
84        29          5.91 + .249(84) = 26.83               29 − 26.83 = 2.17
73        25          5.91 + .249(73) = 24.09               25 − 24.09 = .91
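The prediction-and-residual table can be regenerated from the fitted equation:

```python
# Reproduce the table: prediction and residual for each of the four children
rows = [(92, 25), (100, 30), (84, 29), (73, 25)]
for age, score in rows:
    pred = 5.91 + 0.249 * age
    print(age, score, round(pred, 2), round(score - pred, 2))
# -> 92 25 28.82 -3.82
# -> 100 30 30.81 -0.81
# -> 84 29 26.83 2.17
# -> 73 25 24.09 0.91
```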
Unexplained variance
– The average of those squared
deviations is the variance of the
estimate

σ²estY = Σ(Y − Y′)² / N
Explained variance
• The total variance is the variance of Y
• The unexplained variance is the
average squared deviation score
• Total variance = explained variance +
unexplained variance
– So all that’s left is what we explained by
the regression line
– Explained variance = total variance −
unexplained variance
Coefficient of determination
• We know from our example that the
correlation between age & test score
was .72
– We can compute the coefficient of
determination by squaring it
– r² = .72² = .52
• Age accounts for 52% of the
variance in test scores
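The squaring step is a one-liner:

```python
# Coefficient of determination from the example's correlation
r = 0.72
print(round(r ** 2, 2))  # -> 0.52
```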
Coefficient of non-
determination
• This is simply the reverse—the amount of
variance in Y that X does not account for
– An estimate of how much the points don’t fall
on the line
• It is the (unexplained
variance)/(total variance), or (1 − r²)
– So 1 − .72² = 1 − .52 = .48
• 48% of the variance in test scores is not
accounted for by age
– We cannot account for 48% of the variance in
test scores
• Next time: More regression & quiz
review
• Happy Spring!
