Regression

The document provides an overview of regression analysis, including its definitions, types (logistic and linear), and advantages in predicting relationships between variables. It outlines the steps involved in performing regression analysis, such as calculating correlation coefficients and regression coefficients, along with an example involving water usage and tomato yield. Additionally, it discusses the significance of the regression line, sources of variation, and the interpretation of results in statistical software like SPSS.

Uploaded by

mais.nayef

REGRESSION

Dr Hamza Alduraidi
What is regression?

 Fitting a line to the data using an equation in order
to describe and predict data
 Logistic Regression
 Binary (outcome is nominal with two categories)
 Multinomial (outcome is nominal with more than two categories)

 Linear Regression
 Multiple linear (outcome is continuous)
Regression Analysis
 Regression analysis is a mathematical measure of the
average relationship between two or more variables.
 Regression analysis is a very powerful tool in the field
of statistical analysis for predicting the value of one
variable, given the value of another variable, when
those variables are related to each other.
 Regression analysis is a statistical tool used to
predict the value of an unknown variable from a known
variable.
Advantages of Regression Analysis
 Regression analysis provides estimates of
values of the dependent variable from the
values of the independent variables.
 Regression analysis also helps to obtain a
measure of the error involved in using the
regression line as a basis for estimation.
 Regression analysis helps in obtaining a
measure of the degree of association or
correlation that exists between the two
variables.
Assumptions in Regression Analysis
 An actual linear relationship exists between the variables.
 Regression analysis is used to estimate values
only within the range for which it is valid.
 The relationship between the dependent and
independent variables remains unchanged until the
regression equation is calculated.
 The dependent variable takes any random value
but the values of the independent variables are
fixed.
 In regression, we have only one dependent
variable in our estimating equation. However, we
can use more than one independent variable.
Regression line
 Regression line is the line which gives the
best estimate of one variable from the value of
any other given variable.
 The regression line gives the average
relationship between the two variables in
mathematical form.
Regression line
 For two variables X and Y, there are always
two lines of regression
 Regression line of X on Y: gives the best
estimate for the value of X for any specific
given value of Y
 X = a + bY
a = X-intercept
b = Slope of the line
X = Dependent variable
Y = Independent variable
Simple Linear Regression
 In general, the simple linear regression equation is:
Ŷi = b0 + b1Xi
 Why do we need regression in addition to correlation?
 To predict a Y for a new value of X
 To answer questions regarding the slope. For example,
 With additional shelf space (X), what effect will there be on sales (Y)?
 If we raise prices by a particular amount or percentage, will it cause sales
to drop? (This measures elasticity.)
 It makes the scatter plot a better display (graph) of the data if we can
plot a line through it. It presents much more information on the
diagram.
 In correlation, on the other hand, we just want to know if
two variables are related. This is used a lot in social
science research.

Simple Linear Regression
 The regression equation Ŷi = b0 + b1Xi is a sample
estimator of the true population regression equation,
which we could build were we to take a census:

 Yi = β0 + β1Xi + εi
where,
β0 = true Y intercept for the population
β1 = true slope for the population
εi = random error in Y for observation i
 In regression analysis, we hypothesize that there is a
true regression line for the population. The b0 and b1
coefficients are estimates of the true population
coefficients, β0 and β1.
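
To make the distinction between the population line and its sample estimate concrete, here is a minimal Python sketch. The "true" values β0 = 2 and β1 = 0.5, the error spread, the range of X and the sample size are made-up numbers chosen only for illustration; the slope and intercept are computed with the usual least-squares formulas.

```python
import random

# Hypothetical population parameters (illustrative assumptions, not from the slides)
BETA0 = 2.0   # true Y-intercept
BETA1 = 0.5   # true slope
SIGMA = 1.0   # standard deviation of the random error

def draw_sample(n, seed=0):
    """Simulate n observations from Yi = β0 + β1*Xi + εi."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [BETA0 + BETA1 * x + rng.gauss(0, SIGMA) for x in xs]
    return xs, ys

def least_squares(xs, ys):
    """Return the sample estimates (b0, b1) of (β0, β1)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b0 = sy / n - b1 * sx / n
    return b0, b1

xs, ys = draw_sample(5000)
b0, b1 = least_squares(xs, ys)
print(f"b0 = {b0:.3f} (true β0 = {BETA0}), b1 = {b1:.3f} (true β1 = {BETA1})")
```

The estimates land near the true values but do not equal them, which is the sense in which b0 and b1 are sample estimators.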

Simple Linear Regression

 Ŷi is a point on the regression line
 Yi is an individual data value
 The deviations of the individual observations (the
points) from the regression line, (Yi - Ŷi), the
residuals, are denoted by ei, where ei = (Yi - Ŷi).
 Some deviations are positive (for the points above the line);
some are negative (for the points below the line). If a point
is on the line, its deviation = 0. Note that Σei = 0.

Steps in Regression
1- For Xi (independent variable) and Yi (dependent variable),
Calculate:
ΣYi
ΣXi
ΣXiYi
ΣXi²
ΣYi²

2- Calculate the correlation coefficient, r:

$$ r = \frac{n\sum X_i Y_i - (\sum X_i)(\sum Y_i)}{\sqrt{\left[n\sum X_i^2 - (\sum X_i)^2\right]\left[n\sum Y_i^2 - (\sum Y_i)^2\right]}} $$

-1 ≤ r ≤ 1
[This can be tested for significance. H0: ρ=0. If the correlation is not significant,
there is no evidence that X and Y are linearly related. You really should not be doing this regression!]

Steps in Regression
3- Calculate the coefficient of determination: r² = (r)²
0 ≤ r² ≤ 1
This is the proportion of the variation in the dependent variable (Yi) explained by
the independent variable (Xi)

4- Calculate the regression coefficient b1 (the slope):

$$ b_1 = \frac{n\sum X_i Y_i - (\sum X_i)(\sum Y_i)}{n\sum X_i^2 - (\sum X_i)^2} $$

Note that you have already calculated the numerator and the denominator for parts
of r. Other than a single division operation, no new calculations are required.
BTW, r and b1 are related. If a correlation is negative, the slope term must be
negative; a positive slope means a positive correlation.

5- Calculate the regression coefficient b0 (the Y-intercept, or constant):

$$ b_0 = \bar{Y} - b_1 \bar{X} $$

The Y-intercept (b0) is the predicted value of Y when X = 0.
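
The link between r and b1 noted in step 4 can be made explicit. The short derivation below uses only the formulas from steps 2 and 4; the shorthand symbols N, D_X and D_Y are introduced here for convenience and are not part of the original slides.

```latex
\begin{aligned}
N &= n\sum X_i Y_i - \Big(\sum X_i\Big)\Big(\sum Y_i\Big), \quad
D_X = n\sum X_i^2 - \Big(\sum X_i\Big)^2, \quad
D_Y = n\sum Y_i^2 - \Big(\sum Y_i\Big)^2 \\
r &= \frac{N}{\sqrt{D_X D_Y}}, \qquad b_1 = \frac{N}{D_X} \\
b_1 &= r\,\sqrt{\frac{D_Y}{D_X}}
\end{aligned}
```

Since the square root is positive whenever X and Y both vary, b1 and r always share the sign of N.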

Steps in Regression
6- The regression equation (a straight line) is:
Ŷi = b0 + b1Xi

7- [OPTIONAL] Then we can test the regression for statistical significance.

There are 3 ways to do this in simple regression:


(a) t-test for correlation:
H0: ρ=0
H1: ρ≠0

$$ t_{n-2} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$

(b) t-test for slope term


H0: β1=0
H1: β1≠0
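
As a rough illustration of test (a), the sketch below evaluates the t statistic for the correlation. The inputs r = .9903 and n = 5 come from the water-and-tomato example in this deck; the function name and the critical value, copied from a standard t table, are assumptions added for illustration.

```python
import math

def t_for_correlation(r, n):
    """t statistic with n - 2 degrees of freedom for H0: rho = 0."""
    if n <= 2:
        raise ValueError("need more than two pairs of observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Values taken from the water-and-tomato example in this deck
r, n = 0.9903, 5
t = t_for_correlation(r, n)

# 3.182 is the two-tailed 5% critical value of t with 3 degrees of freedom,
# copied from a standard t table (an assumed input, not computed here).
T_CRITICAL = 3.182
print(f"t = {t:.2f} with {n - 2} df; reject H0 at the 5% level: {abs(t) > T_CRITICAL}")
```

In simple regression the t-test for the slope in (b) leads to the same conclusion as this test.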

Example: Water and Tomato Yield
n = 5 pairs of X,Y observations
 Independent variable (X) is amount of water
(in gallons) used on crop
 Dependent variable (Y) is yield (bushels of
tomatoes).

Yi    Xi    XiYi    Xi²    Yi²
 2     1      2      1      4
 5     2     10      4     25
 8     3     24      9     64
10     4     40     16    100
15     5     75     25    225
40    15    151     55    418   (column totals)

Example: Water and Tomato Yield
Step 1-
ΣYi = 40
ΣXi = 15
ΣXiYi = 151
ΣXi² = 55
ΣYi² = 418

Step 2-
$$ r = \frac{(5)(151) - (15)(40)}{\sqrt{\left[(5)(55) - (15)^2\right]\left[(5)(418) - (40)^2\right]}} = \frac{155}{\sqrt{(50)(490)}} = .9903 $$

Example: Water and Tomato Yield
Step 3- r² = (.9903)² = 98.06%

Step 4- b1 = 155/50 = 3.1. The slope is positive. There is a positive relationship
between water and crop yield.

Step 5- b0 = (40/5) - 3.1(15/5) = -1.3

Step 6- Thus, Ŷi = -1.3 + 3.1Xi
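
Steps 1 to 6 of the example can be reproduced with a few lines of Python. This is only a sketch that follows the hand calculation; the variable names are chosen here for readability and the data are the five (X, Y) pairs of the example.

```python
import math

# Water (gallons) and tomato yield (bushels) from the example
X = [1, 2, 3, 4, 5]
Y = [2, 5, 8, 10, 15]
n = len(X)

# Step 1: the five sums
sum_y = sum(Y)
sum_x = sum(X)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)

# Step 2: correlation coefficient
numerator = n * sum_xy - sum_x * sum_y   # shared by r and b1
denom_x = n * sum_x2 - sum_x ** 2
denom_y = n * sum_y2 - sum_y ** 2
r = numerator / math.sqrt(denom_x * denom_y)

# Step 3: coefficient of determination
r_squared = r ** 2

# Step 4: slope (reuses the numerator and one denominator term from r)
b1 = numerator / denom_x

# Step 5: intercept from the two means
b0 = sum_y / n - b1 * (sum_x / n)

# Step 6: the fitted line
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
print(f"Y-hat = {b0:.1f} + {b1:.1f} X")
```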

Example: Water and Tomato Yield

Ŷi = -1.3 + 3.1Xi

 Ŷi: # bushels of tomatoes
 -1.3: Does no water result in a negative yield?
 3.1: Every gallon adds 3.1 bushels of tomatoes
 Xi: # gallons of water

Example: Water and Tomato Yield

Yi    Xi    Ŷi     ei     ei²
 2     1    1.8    .2     .04
 5     2    4.9    .1     .01
 8     3    8.0     0      0
10     4   11.1  -1.1    1.21
15     5   14.2    .8     .64
                Σei = 0   Σei² = 1.90

Σei² = 1.90. This is a minimum, since regression
minimizes Σei² (SSE).
Now we can answer a question like: How many
bushels of tomatoes can we expect if we use 3.5
gallons of water? -1.3 + 3.1 (3.5) = 9.55 bushels.
Notice the danger of predicting outside the range of X.
The more water, the greater the yield? No. Too much
water can ruin the crop.
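
The residual table and the 3.5-gallon prediction can be checked with a short Python sketch. The helper `predict_checked` and its range flag are illustrative additions, not something the slides define; they simply make the warning about predicting outside the observed range of X explicit.

```python
# Data and fitted coefficients from the water-and-tomato example
X = [1, 2, 3, 4, 5]
Y = [2, 5, 8, 10, 15]
B0, B1 = -1.3, 3.1

def predict(x):
    """Point on the regression line for a given amount of water."""
    return B0 + B1 * x

fitted = [predict(x) for x in X]
residuals = [y - f for y, f in zip(Y, fitted)]
sse = sum(e ** 2 for e in residuals)

X_MIN, X_MAX = min(X), max(X)

def predict_checked(x):
    """Return (prediction, inside_range); inside_range is False when
    x lies outside the observed X values, i.e. when extrapolating."""
    return predict(x), X_MIN <= x <= X_MAX

for x, y, f, e in zip(X, Y, fitted, residuals):
    print(f"X={x}  Y={y:>2}  Y-hat={f:5.1f}  e={e:5.1f}  e^2={e * e:.2f}")
print(f"sum of residuals = {sum(residuals):.2f}, SSE = {sse:.2f}")
print(predict_checked(3.5))   # inside the observed range
print(predict_checked(10))    # flagged as extrapolation
```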

Sources of Variation in Regression
If we did not have a significant regression (i.e., X
does not predict Y) we would use Ŷi = Ȳ as our
regression equation.

$$ (Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i) $$

$$ \sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2 $$

Total Variation in Y = Explained Variation + Unexplained Variation

Sources of Variation in Regression

Total Variation:
$$ \sum (Y_i - \bar{Y})^2 = \sum Y_i^2 - \frac{(\sum Y_i)^2}{n} $$

Explained Variation:
$$ \sum (\hat{Y}_i - \bar{Y})^2 = b_0 \sum Y_i + b_1 \sum X_i Y_i - \frac{(\sum Y_i)^2}{n} $$

Unexplained Variation:
$$ \sum (Y_i - \hat{Y}_i)^2 = \sum Y_i^2 - b_0 \sum Y_i - b_1 \sum X_i Y_i $$
From our previous problem,


Total variation in Y = 418 - (40)²/5 = 98
Explained variation (explained by X) = -1.3(40) + 3.1(151) - (40)²/5 = 96.10
Unexplained variation = 418 - (-1.3)(40) - 3.1(151) = 1.90
The coefficient of determination, r², is the proportion of the variation in Y explained by X.
r² = Explained Variation / Total Variation = 96.10 / 98 = .98
In other words, 98% of the total variation in crop yield is explained by
the linear relationship of yield with amount of water used on the crop.
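
The three sources of variation can be computed two ways, from their definitions and from the shortcut formulas, and the results should agree. The sketch below does both for the water-and-tomato data; the variable names (`sst`, `ssr`, `sse`) are conventional abbreviations assumed here, not notation used in the slides.

```python
# Data and fitted coefficients from the water-and-tomato example
X = [1, 2, 3, 4, 5]
Y = [2, 5, 8, 10, 15]
B0, B1 = -1.3, 3.1
n = len(Y)

y_bar = sum(Y) / n
fitted = [B0 + B1 * x for x in X]

# Definitional sums of squares
sst = sum((y - y_bar) ** 2 for y in Y)                # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained variation
sse = sum((y - f) ** 2 for y, f in zip(Y, fitted))    # unexplained variation

# Shortcut formulas based on the column sums
sum_y = sum(Y)
sum_y2 = sum(y * y for y in Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sst_short = sum_y2 - sum_y ** 2 / n
ssr_short = B0 * sum_y + B1 * sum_xy - sum_y ** 2 / n
sse_short = sum_y2 - B0 * sum_y - B1 * sum_xy

r_squared = ssr / sst
print(f"total = {sst:.2f}, explained = {ssr:.2f}, unexplained = {sse:.2f}")
print(f"r^2 = {r_squared:.4f}")
```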

The Explanation of Regression Line
 In case of perfect correlation (positive or
negative) the two lines of regression coincide.
 If the two regression lines are far from each other, then
the degree of correlation is lower, & vice versa.
 The mean values of X & Y can be obtained as
the point of intersection of the two regression
lines.
 The higher the degree of correlation between the
variables, the smaller the angle between the lines,
& vice versa.
Line & Method of Least Squares
 Regression Equation of y on x
Y = a + bx
In order to obtain the values of ‘a’ & ‘b’
∑y = na + b∑x
∑xy = a∑x + b∑x²
 Regression Equation of x on y
X = c + dy
In order to obtain the values of ‘c’ & ‘d’
∑x = nc + d∑y
∑xy = c∑y + d∑y²
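
Each pair of normal equations is a 2×2 linear system, so it can be solved directly. The sketch below uses Cramer's rule (one possible method, chosen here for brevity) with the column sums from the water-and-tomato example; the function name and argument names are hypothetical. It also checks two properties mentioned in this deck: the two lines cross at the means, and the product of the two slopes equals r².

```python
def solve_normal_equations(n, sum_u, sum_v, sum_uv, sum_u2):
    """Solve  Σv = n*a + b*Σu  and  Σuv = a*Σu + b*Σu²  for (a, b),
    where u is the predictor and v is the variable being predicted."""
    det = n * sum_u2 - sum_u ** 2
    if det == 0:
        raise ValueError("the predictor does not vary; no unique line")
    a = (sum_v * sum_u2 - sum_u * sum_uv) / det
    b = (n * sum_uv - sum_u * sum_v) / det
    return a, b

# Column sums from the water-and-tomato example
n, sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 5, 15, 40, 151, 55, 418

a, b = solve_normal_equations(n, sum_x, sum_y, sum_xy, sum_x2)   # y on x
c, d = solve_normal_equations(n, sum_y, sum_x, sum_xy, sum_y2)   # x on y

x_bar, y_bar = sum_x / n, sum_y / n
print(f"y on x: Y = {a:.4f} + {b:.4f} x")
print(f"x on y: X = {c:.4f} + {d:.4f} y")
print(f"both lines pass through the means ({x_bar}, {y_bar})")
print(f"b * d = {b * d:.4f}  (equals r squared)")
```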
Standard Error of Estimate
 Standard error of estimate is the measure of
variation around the computed regression line.
 Standard error of estimate (SE) of Y measures the
variability of the observed values of Y around the
regression line.
 Standard error of estimate gives us a measure
of the scatter of the observations about the
line of regression.
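
The slides do not give a formula for the standard error of estimate. A commonly used definition for simple linear regression, assumed in the sketch below, is the square root of the unexplained variation divided by n - 2. The data and coefficients are those of the water-and-tomato example, and the function name is hypothetical.

```python
import math

def standard_error_of_estimate(xs, ys, b0, b1):
    """sqrt(SSE / (n - 2)): typical distance of observed Y values
    from the fitted line (assumed textbook definition)."""
    n = len(xs)
    if n <= 2:
        raise ValueError("need more than two observations")
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))

# Water-and-tomato example: Y-hat = -1.3 + 3.1 X
se = standard_error_of_estimate([1, 2, 3, 4, 5], [2, 5, 8, 10, 15], -1.3, 3.1)
print(f"standard error of estimate = {se:.3f} bushels")
```

The divisor n - 2 reflects the two coefficients estimated from the data; a smaller value means the points sit closer to the line.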
Regression Equation:
Y = .823X - 4.239

We can predict a Y score from an X by
plugging a value for X into the
equation and calculating Y.
What would we expect a person to get on
quiz #4 if they got a 12.5 on quiz #3?

Y = .823(12.5) - 4.239 = 6.049


SPSS Regression Set-up
• “Criterion,”
• y-axis variable,
• what you’re trying to predict

• “Predictor,”
• x-axis variable,
• what you’re basing the prediction on

Note: Never refer to the IV or DV when doing regression


Test Procedure in SPSS
Output of Linear Regression Analysis
 SPSS will generate quite a few tables of output for a
linear regression.
 The first table of interest is the Model Summary
table. This table provides the R and R² values. The R
value is 0.873, which represents the simple
correlation. It indicates a high degree of correlation.
The R² value indicates how much of the variation in the dependent
variable, "price", can be explained by the
independent variable, "income". In this case, 76.2%
can be explained, which is very large.
 The next table is the ANOVA table. This table
indicates that the regression model predicts
the outcome variable significantly well. How
do we know this? Look at the "Regression"
row and go to the Sig. column. This indicates
the statistical significance of the regression
model that was applied. Here, p < 0.0005,
which is less than 0.05, and indicates that,
overall, the model applied can statistically
significantly predict the outcome variable.
Getting Regression Info from SPSS
(See correlation & regression worksheet)

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .777(a)    .603       .581                18.476
a. Predictors: (Constant), Distance from target

y’ = b(x) + a
y’ = -4.263(20) + 125.401
(a = 125.401 is the Constant and b = -4.263 is the slope, both read from column B of the Coefficients table)

Coefficients(a)
                          Unstandardized Coefficients   Standardized Coefficients
Model                     B          Std. Error         Beta                        t        Sig.
1  (Constant)             125.401    14.265                                         8.791    .000
   Distance from target   -4.263     .815               -.777                       -5.230   .000
a. Dependent Variable: Total ball toss points
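
The numbers in the SPSS output can be cross-checked with a short Python sketch. It evaluates the prediction at x = 20 and recomputes each t value as the coefficient divided by its standard error. All inputs are copied from the two tables; the variable and function names are chosen here for illustration, and small differences from the printed values are expected because SPSS works with unrounded figures.

```python
# Values copied from the SPSS Model Summary and Coefficients tables
CONSTANT, CONSTANT_SE = 125.401, 14.265   # a and its standard error
SLOPE, SLOPE_SE = -4.263, 0.815           # b and its standard error
R = 0.777

def predict_points(distance):
    """Predicted total ball toss points: y' = b(x) + a."""
    return SLOPE * distance + CONSTANT

predicted = predict_points(20)

# Each t value is the coefficient divided by its standard error
t_constant = CONSTANT / CONSTANT_SE
t_slope = SLOPE / SLOPE_SE

# With a single predictor, R Square is the square of R
r_square = R ** 2

print(f"y' at x = 20: {predicted:.3f}")
print(f"t (Constant) = {t_constant:.3f}, t (Distance from target) = {t_slope:.3f}")
print(f"R squared from R = {r_square:.3f}")
```

Because there is only one predictor, the standardized Beta (-.777) has the same magnitude as R, and its sign shows that points drop as distance from the target grows.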
