
Regression Analysis Guide

Regression analysis is a statistical method used to quantify the relationship between two or more quantitative variables and predict the value of a dependent variable from the independent variables. Simple linear regression involves using one independent variable X to predict the dependent variable Y based on the equation Y = a + bX, where a is the y-intercept and b is the regression coefficient. The document provides an example of using sales calls data to predict the number of copiers sold, derives the regression equation, uses it to make a prediction, and tests the significance of the regression model.

Uploaded by

shane naigal

UNIT 8

REGRESSION ANALYSIS

Mathematics Department
XAVIER UNIVERSITY-ATENEO DE CAGAYAN
Regression Analysis
■ Regression analysis is a statistical method that deals with finding the
best relationship between two or more quantitative variables, quantifying the
strength of that relationship, and using methods that allow for the
prediction of the dependent variable given the values of the independent
variable(s).

■ If two variables are significantly correlated, then we can predict the
value of one variable (Y) in terms of the other variable (X).
Simple Linear Regression

In this lesson, we want to express the linear relationship between X and Y
in terms of a mathematical equation and understand how this equation is used
in prediction. We want to find an equation that “best fits” the
relationship between X and Y.

A. Simple linear regression - one independent variable X is used to
predict the dependent variable Y. The following is the equation:

Y = a + bX
a = the y-intercept of the regression line
b = the regression coefficient (the slope of the regression line)
■ 1. When b > 0, this indicates that X and Y are directly linearly related.
■ 2. When b < 0, this indicates that X and Y are inversely linearly related.
■ 3. When b = 0, this indicates that X and Y are not linearly related.
$$a = \bar{y} - b\bar{x}, \qquad b = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$

Note: The regression coefficient b represents the change in the response variable
resulting from a unit change in the predictor variable.
Example on Simple Linear Regression: The relationship between sales calls
and number of copiers sold.

Based on the given data:
a) Determine the equation of the simple linear regression.
b) Predict the number of copier machines to be sold if a sales
representative makes 40 sales calls in a month.
c) Test for the significance of the regression equation.

No. of sales calls    No. of copiers sold
 9                    3
25                    6
15                    4
20                    6
 7                    3
10                    4
17                    4
20                    5
13                    3
30                    7
a) Determine the equation of the simple linear regression.

No. of sales calls (x)   No. of copiers sold (y)   x^2    y^2    xy
 9                       3                          81     9     27
25                       6                         625    36    150
15                       4                         225    16     60
20                       6                         400    36    120
 7                       3                          49     9     21
10                       4                         100    16     40
17                       4                         289    16     68
20                       5                         400    25    100
13                       3                         169     9     39
30                       7                         900    49    210
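
$$\sum_{i=1}^{10} x_i = 166, \quad \sum_{i=1}^{10} y_i = 45, \quad \sum_{i=1}^{10} x_i^2 = 3{,}238, \quad \sum_{i=1}^{10} y_i^2 = 221, \quad \sum_{i=1}^{10} x_i y_i = 835$$

$$b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{10(835) - (166)(45)}{10(3{,}238) - (166)^2} = \frac{880}{4{,}824} \approx 0.182$$

$$a = \bar{y} - b\bar{x} = 4.5 - (0.18242)(16.6) \approx 1.472$$
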
Therefore, the equation of the simple linear regression is Y = 1.472 + 0.182 X

Based on b: For each additional sales call, the model predicts that about
0.182 more copiers will be sold.
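
The computation in part (a) can be checked with a short script. This is a minimal sketch in plain Python (no libraries); the variable names are illustrative and not part of the original slides.

```python
# Sales-call data from the example (x = sales calls, y = copiers sold).
x = [9, 25, 15, 20, 7, 10, 17, 20, 13, 30]
y = [3, 6, 4, 6, 3, 4, 4, 5, 3, 7]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Slope: b = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - (sum(x))^2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept: a = mean(y) - b * mean(x)
a = sum_y / n - b * (sum_x / n)

print(f"sum x = {sum_x}, sum y = {sum_y}, sum x^2 = {sum_x2}, sum xy = {sum_xy}")
print(f"Y = {a:.3f} + {b:.3f} X")
```

The intercept is computed from the unrounded slope; substituting the rounded value 0.182 instead would give a slightly different intercept.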
EXCEL: How to display the simple linear regression equation in the
Scatter Plot

1. Open Microsoft Excel. Encode the bivariate data separately into two
columns in the spreadsheet. Highlight the data.
2. Go to Insert. Click Scatter in the toolbar.
3. Click the figure – Chart Tools: Design – Add Chart Element.
4. Add Chart Element – Trendline – More Trendline Options.
5. Check Linear.
6. Check Display Equation on Chart.
b) We predict the number of copier machines to be sold if a sales
representative makes 40 sales calls in a month by substituting X = 40
into the equation of the regression line:

Y = 1.472 + 0.182 X
Y = 1.472 + 0.182(40) = 8.752 ≈ 9

Therefore, approximately 9 copier machines are expected to be sold if a
sales representative makes 40 sales calls in a month.
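
The substitution in part (b) can be written as a small function. This is only a sketch: the function name `predict_copiers` is hypothetical, and the default coefficients are the rounded values from the fitted equation Y = 1.472 + 0.182 X.

```python
def predict_copiers(sales_calls, a=1.472, b=0.182):
    """Return the fitted value Y = a + b*X for a given number of sales calls."""
    return a + b * sales_calls


estimate = predict_copiers(40)
print(f"Predicted copiers sold for 40 calls: {estimate:.3f}")
print(f"Rounded to whole machines: {round(estimate)}")
```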
c) Testing for the significance of the linear relationship between the
dependent and independent variables

EXCEL
1. Open Microsoft Excel. Encode the bivariate data separately into two
columns in the spreadsheet.
2. Select Data – Data Analysis – Regression.
3. In the dialog box, enter the following:
Input Y Range: the cells containing the dependent variable (number of
copiers sold)
Input X Range: the cells containing the independent variable (number of
sales calls)
Check Labels if the first row of each range contains the column heading.
Confidence Level: 95%
Output Range: select any cell where you want to display the output
4. Click OK.
EXCEL OUTPUT
The regression output has three components:
1. Regression statistics table
2. ANOVA table
3. Regression coefficients table.
Interpretation
Multiple R (multiple correlation coefficient) - In simple regression, this
number is the absolute value of Pearson’s correlation coefficient.
Multiple R = 0.9315 indicates a strong correlation.

R Square - R² tells us how much of the variation in Y is accounted for by
the regression model from our sample.
Our model, which includes only sales calls, can explain approximately 86.77%
of the variation in the number of copiers sold. This means that 13.23% of
the variation in the number of copiers sold cannot be explained by sales
calls alone, which suggests that other variables may also have an influence.

Adjusted R-squared - a modified version of R-squared that has been adjusted
for the number of predictors in the model. The adjusted R² value tells us
how much variance in Y would be accounted for if the model had been derived
from the population from which the sample was taken.

Standard Error - an estimate of the variation of the observed dependent
variable about the regression line.
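
The entries of the regression statistics table can be reproduced from the sales-call data. The sketch below assumes the usual textbook definitions (R² = 1 − SSE/SST, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), standard error = sqrt(SSE/(n − k − 1)), with k = 1 predictor); the variable names are illustrative.

```python
import math

# Sales-call data from the example (x = sales calls, y = copiers sold).
x = [9, 25, 15, 20, 7, 10, 17, 20, 13, 30]
y = [3, 6, 4, 6, 3, 4, 4, 5, 3, 7]
n, k = len(x), 1  # k = number of predictors

mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least-squares fit (algebraically the same as the summation formulas).
b = sxy / sxx
a = mean_y - b * mean_x

# Total and residual (error) sums of squares.
sst = sum((yi - mean_y) ** 2 for yi in y)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_square = 1 - sse / sst
multiple_r = math.sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
standard_error = math.sqrt(sse / (n - k - 1))

print(f"Multiple R        = {multiple_r:.4f}")
print(f"R Square          = {r_square:.4f}")
print(f"Adjusted R Square = {adjusted_r_square:.4f}")
print(f"Standard Error    = {standard_error:.4f}")
```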
■ ANOVA Table – provides a test of whether the regression equation is
significant. It tells us whether the model, overall, results in a
significantly good degree of prediction of the outcome variable.

The F-test of overall significance indicates whether the regression model
overall predicts the dependent variable significantly well.
The F-test for overall significance has the following two hypotheses:

•The null hypothesis (H0) states that the model with no independent
variables fits the data as well as your model. (There is no significant
linear relationship between the independent variable X and the dependent
variable Y.)

•The alternative hypothesis (H1) says that your model fits the data better
than the intercept-only model. (There is a significant linear relationship
between the independent variable X and the dependent variable Y.)

Since the Significance F (the p-value of the F-test), 0.00008, is less than
0.05, H0 is rejected. The regression model overall predicts the dependent
variable significantly well.
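
The F statistic behind the Significance F value can be sketched from the sums of squares: F = MS regression / MS residual, with 1 and n − 2 degrees of freedom in simple regression. The code below computes only the statistic; turning it into a p-value needs the F distribution (for example `scipy.stats.f.sf`, which is not used here so the sketch stays dependency-free). Variable names are illustrative.

```python
# Sales-call data from the example (x = sales calls, y = copiers sold).
x = [9, 25, 15, 20, 7, 10, 17, 20, 13, 30]
y = [3, 6, 4, 6, 3, 4, 4, 5, 3, 7]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sst = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares

ss_regression = sxy ** 2 / sxx       # variation explained by the line
ss_residual = sst - ss_regression    # variation left unexplained

df_regression = 1                    # one predictor
df_residual = n - 2                  # n minus two estimated parameters

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

print(f"SS regression = {ss_regression:.4f}, SS residual = {ss_residual:.4f}")
print(f"F = {f_stat:.2f} with ({df_regression}, {df_residual}) degrees of freedom")
```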
Note: If a predictor has a significant impact on our ability to predict the
outcome, then its coefficient b should be significantly different from 0
(this is checked by performing a t-test on b).

Regression Coefficient Table

So the simple linear regression equation to predict the number of copiers sold
is

no. of copiers sold(Y) = 1.472 + 0.182*sales_calls(X)

Test for the significance of the regression coefficient b

H0: There is no significant linear relationship between X and Y. (H0: β = 0)
H1: There is a significant linear relationship between X and Y. (H1: β ≠ 0)

Since the p-value for sales calls (0.00008) is less than 0.05, H0 is rejected.
We can conclude that sales calls make a significant contribution to predicting
the number of copiers sold.
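
The t statistic for the slope can be sketched as t = b / SE(b), where SE(b) = sqrt(MS residual / Sxx). With a single predictor, t² equals the overall F statistic, which is why the two tests report the same p-value here. The names below are illustrative, and the p-value lookup itself (a t distribution with n − 2 degrees of freedom) is left out.

```python
import math

# Sales-call data from the example (x = sales calls, y = copiers sold).
x = [9, 25, 15, 20, 7, 10, 17, 20, 13, 30]
y = [3, 6, 4, 6, 3, 4, 4, 5, 3, 7]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = sxy / sxx
a = mean_y - b * mean_x

# Residual mean square (estimate of the error variance).
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
ms_residual = sse / (n - 2)

se_b = math.sqrt(ms_residual / sxx)  # standard error of the slope
t_stat = b / se_b                    # tests H0: beta = 0

# Overall F statistic, for comparison with t squared.
f_stat = (sxy ** 2 / sxx) / ms_residual

print(f"b = {b:.4f}, SE(b) = {se_b:.4f}, t = {t_stat:.3f}")
print(f"t^2 = {t_stat ** 2:.3f}, F = {f_stat:.3f}")
```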
Multiple Linear Regression
B. Multiple Linear Regression – an extension of the simple linear
regression in which more than one independent variable (X) is used to
predict a single dependent variable (Y).

Y = β0 + β1X1 + β2X2 + β3X3 + … + βKXK + ε


Example on Multiple Linear Regression: Suppose 4 independent
variables (number of sales calls X1, age of sales representative X2,
gender X3, and years of service X4) are included to predict the
number of copiers sold Y.

No. of copiers   No. of sales   Age of sales     Gender                   Years of
sold             calls          representative   (1 = Male, 0 = Female)   service
3                 9             24               0                         5
6                25             30               1                        10
4                15             27               1                         5
6                20             26               0                         7
3                 7             21               0                         6
4                10             25               0                         8
4                17             26               1                         7
5                20             24               0                        12
3                13             22               1                         5
7                30             25               1                        12

Perform regression analysis using 0.05 level of significance.


EXCEL OUTPUT

Regression Statistics
Multiple R          0.97705395
R Square            0.954634421
Adjusted R Square   0.918341958
Standard Error      0.409698233
Observations        10

Our model, which includes the 4 independent variables, can explain
approximately 95.46% of the variation in the number of copiers sold.
This means that only 4.54% of the variation in the number of copiers
sold cannot be explained by the model.

ANOVA
df SS MS F Significance F
Regression 4 17.66074 4.415184 26.30393 0.001484493
Residual 5 0.839263 0.167853
Total 9 18.5

Since the Significance F (the p-value of the F-test), 0.0015, is less than
0.05, H0 is rejected. The regression model overall predicts the dependent
variable significantly well.
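
The regression statistics and the F value can be recovered from the ANOVA table alone, which is a useful consistency check. The sketch below takes the SS values as printed in the Excel output and assumes the standard definitions; the variable names are illustrative.

```python
import math

n, k = 10, 4                 # observations and number of predictors
ss_regression = 17.66074     # from the ANOVA table
ss_residual = 0.839263       # from the ANOVA table
ss_total = ss_regression + ss_residual

df_regression = k
df_residual = n - k - 1
df_total = n - 1

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual
r_square = ss_regression / ss_total
adjusted_r_square = 1 - ms_residual / (ss_total / df_total)
standard_error = math.sqrt(ms_residual)

print(f"F                 = {f_stat:.4f}")
print(f"R Square          = {r_square:.4f}")
print(f"Adjusted R Square = {adjusted_r_square:.4f}")
print(f"Standard Error    = {standard_error:.4f}")
```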
Regression Coefficient Table

                   Coefficients  Standard Error  t Stat    P-value  Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept           0.0437       1.6063           0.0272   0.9794   -4.0856     4.1729    -4.0856       4.1729
No. of sales calls  0.2184       0.0450           4.8486   0.0047    0.1026     0.3342     0.1026       0.3342
Age                 0.0685       0.0685           1.0002   0.3631   -0.1076     0.2446    -0.1076       0.2446
Gender             -1.0124       0.3680          -2.7513   0.0403   -1.9583    -0.0665    -1.9583      -0.0665
Years of service   -0.0488       0.0969          -0.5037   0.6359   -0.2978     0.2002    -0.2978       0.2002

So, the general form of the equation to predict the number of copiers sold is:

no. of copiers sold = 0.04 + 0.22*sales calls + 0.07*age - 1.01*gender - 0.05*yrs of service
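
The coefficients that Excel reports can be reproduced by solving the normal equations (X'X)β = X'y for the data above. This is a minimal sketch in plain Python: the helper `solve` is a hypothetical name, and Gaussian elimination stands in for whatever numerical method Excel uses internally.

```python
# Columns: copiers sold (y), sales calls, age, gender (1 = male), years of service.
data = [
    (3, 9, 24, 0, 5), (6, 25, 30, 1, 10), (4, 15, 27, 1, 5),
    (6, 20, 26, 0, 7), (3, 7, 21, 0, 6), (4, 10, 25, 0, 8),
    (4, 17, 26, 1, 7), (5, 20, 24, 0, 12), (3, 13, 22, 1, 5),
    (7, 30, 25, 1, 12),
]

y = [row[0] for row in data]
X = [[1.0, *row[1:]] for row in data]  # leading 1.0 is the intercept column
p = len(X[0])

# Build the normal equations (X'X) beta = X'y.
xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]


def solve(a, b):
    """Solve the linear system a * beta = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    size = len(m)
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * size
    for r in reversed(range(size)):  # back substitution
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, size))
        beta[r] = (m[r][size] - tail) / m[r][r]
    return beta


coefficients = solve(xtx, xty)
labels = ["Intercept", "No. of sales calls", "Age", "Gender", "Years of service"]
for label, value in zip(labels, coefficients):
    print(f"{label:<20}{value:8.4f}")
```

The printed values can be compared with the Coefficients column of the Excel output.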

Note: Regression coefficients indicate how much the dependent variable varies
with an independent variable when all other independent variables are held
constant. Example: The regression coefficient, b1, for no. of sales calls is
equal to 0.2184. This means that for each additional sales call, the number of
copiers sold is predicted to increase by 0.2184, holding age, gender, and
years of service constant.
The t-tests measure whether each predictor is making a significant contribution
to the model. Therefore, if the t-test associated with a b-value is significant
(if the value in the P-value column is less than 0.05), then the predictor is
making a significant contribution to the model.

The individual t-tests of the coefficients show that the number of sales calls
(p = 0.0047) and the gender of the sales representative (p = 0.0403) make a
significant contribution to predicting the number of copiers sold, while age
(p = 0.3631) and years of service (p = 0.6359) do not.
Optional
SPSS: Analyze → Regression → Linear
