Advanced Regression in Excel
The Excel Statistical Master
By Mark Harmon
Copyright 2011 Mark Harmon
No part of this publication may be reproduced
or distributed without the express permission
of the author.
mark@ExcelMasterSeries.com
www.ExcelMasterSeries.com
ISBN: 978-0-9833070-6-8

Copyright 2011 http://ExcelMasterSeries.com/New_Manuals.php

Table of Contents

Using Dummy Variable Regression in Excel To Perform Conjoint Analysis
  Step-By-Step Video Showing How To Perform Conjoint Analysis Using Dummy Variable Regression in Excel In Order To Find Out Which Product Attributes Your Customers Value The Most
  The 6 Steps of Performing Conjoint Analysis
    Step 1) List All Product Attributes For 1 Product
    Step 2) Make a List of All Possible Combinations of Those Attributes
    Step 3) Have Consumer Rate Each Attribute Combination
    Step 4) Prepare Completed Survey for Regression
      Dummy Variables to Be Removed From Input Data To Prevent Collinearity
    Step 5) Run Regression in Excel
    Step 6) Derive Attribute Utilities From Regression Output
  An Example of Using a Dummy Variable
  The Problem of Collinearity - and How To Solve It
  The Product Utilities - The Measure of Customer Liking

How To Quickly Read the Output of Regression in Excel
  Step-By-Step Video About How To Quickly Read and Understand the Output of Excel Regression
  The 4 Most Important Parts of Regression Output
    1) Overall Regression's Accuracy
      R Square
      Adjusted R Square
    2) Probability That This Output Was Not By Chance
      Significance of F
    3) Individual Regression Coefficient Accuracy
      P-value of each coefficient and the Y-intercept
    4) Visual Analysis of Residuals
      Charting the Residuals
      The Residual Chart

Logistic Regression Analysis in Excel
  Customer Quality Scores Are Created With Logistic Regression
  Step-By-Step Video Showing How To Predict if a Prospect Will Buy Using Logistic Regression in Excel
  What is Logistic Regression?
  An Example of Logistic Regression In Action
    Create the Predictive Equation
    The Logit
    Calculating the Logit Variables - A, B, and Constant
    Optimizing the Logit Variables in the Excel Solver
    The Final, Most Accurate Predictive Equation
    You'll Have To Tweak the Constraints in the Excel Solver

The Four Steps of Regression in Excel (Including 2 Crucial Ones Always Skipped)
  Step-By-Step Video Showing How To Do All 4 Steps of Regression in Excel, Including the 2 Crucial Initial Steps That No One Does
  Crucial Step 1) Graphing the Data
  Crucial Step 2) Running Correlation Analysis on All Variables Simultaneously
    Remove Input Variables That Have Low Correlation With Output Variable
    Remove Input Variables Highly Correlated With Other Input Variables
    Adding New Input Variables To The Regression Analysis
  Step 3) Run the Regression in Excel
  Step 4) Analysis of Excel Output

How To Do Nonlinear Regression Using the Excel Solver
  The Solver dialogue box has the following 4 parameters that need to be set:
    Objective
    Decision Variables
    Constraints
    Selection of Solving Method: GRG Nonlinear
  Solver Tips
    Initial Solver Settings
      Show Iteration Results
      Use Automatic Scaling
      Assume Non-Negative
      Bypass Solver Reports

Using Dummy Variable Regression in Excel To Perform Conjoint Analysis
Dummy Variable Regression is a great tool for business managers. Dummy
Variable Regression, for example, provides the means to perform very useful
analysis such as Conjoint Analysis. Conjoint analysis quantifies how desirable
each product attribute choice is relative to the other available choices for a single
product. In other words, the marketer learns which product choices a consumer
values most and by how much. In this article and the linked video, you will learn
exactly how to perform Conjoint Analysis in Excel using Dummy Variable
Regression. That may sound like advanced stuff, but it's really quite a bit simpler
than you might imagine.

The following video will make the entire procedure of Dummy Variable
Regression in Excel to perform Conjoint Analysis much easier to understand:

Step-By-Step Video Showing How To Perform Conjoint Analysis Using Dummy Variable Regression in Excel In Order To Find Out Which Product Attributes Your Customers Value The Most

Instructional Video
Go to http://www.youtube.com/watch?v=EMbiGPGlBEM to view a video from Excel Master Series about how to use Dummy Variable Regression in Excel to perform Conjoint Analysis.
(Is Your Internet Connection and Sound Turned On?)

The ultimate objective of Conjoint Analysis is to quantify the consumer's degree of
liking for each of the choices for one product. The Utility of an attribute is the
value associated with the consumer's degree of liking for that choice.

The 6 Steps of Performing Conjoint Analysis


A brief explanation of how Conjoint Analysis and Dummy Variable Regression
are used together to arrive at the Utility for each product attribute is as follows
and also in the linked video above:

Step 1) List All Product Attributes For 1 Product


The marketer lists all of the available choices that a consumer has for one
product. The marketer starts by listing all of the overall attribute categories such
as color and add-ons. The marketer then lists all of the available choices within
each attribute category. For example, here the marketer would be listing all
available colors and add-ons.

List Of All Product Attributes

Step 2) Make a List of All Possible Combinations of Those Attributes
The marketer then creates a list of all possible combinations of choices available
to the consumer for that one product.
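
As a rough illustration of this step outside Excel, the sketch below enumerates every combination of choices with Python's itertools.product. The attribute categories and choices are hypothetical placeholders, not the ones used in the manual's survey.

```python
from itertools import product

# Hypothetical attribute categories and choices (illustrative only).
attributes = {
    "color": ["red", "white"],
    "add_on": ["none", "case", "strap"],
}

def all_combinations(attrs):
    """Return every possible combination as a dict of category -> choice."""
    categories = list(attrs)
    return [dict(zip(categories, choices))
            for choices in product(*(attrs[c] for c in categories))]

# Each combination corresponds to one card the consumer will rate.
cards = all_combinations(attributes)
for number, card in enumerate(cards, start=1):
    print(number, card)
```

The number of cards is the product of the number of choices in each category, which is why the list grows quickly as attributes are added.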

Step 3) Have Consumer Rate Each Attribute Combination
This list of all possible combinations is handed to the consumer. The consumer
rates each combination on a scale of 1 (least desirable) to 10 (most desirable).

Step 4) Prepare Completed Survey for Regression


The survey results are arranged so that Dummy Variable Regression can be run
on them. Each product choice is assigned its own Dummy Variable and one
Dummy Variable from each overall attribute category is removed. This will be
explained below and also in more detail in the linked video.
Dummy Variables in a regression are variables that can only assume two values.
One Dummy Variable must be created for each product choice.

Dummy Variables to Be Removed From Input Data To Prevent Collinearity

Step 5) Run Regression in Excel


Dummy Variable Regression is then run on the survey results data.

Step 6) Derive Attribute Utilities From Regression Output
The Utility for each product attribute is derived directly from the coefficients of the
resulting regression equation.

Excel Regression Output

How To Derive The Utilities From the Output

An Example of Using a Dummy Variable


For example, if the product comes only in the colors red and white, there will be
a Dummy Variable for red and one for white. The Dummy Variable for the color
red can take values of only 1 or 0 because the product will either be red or not.
The same applies for the white Dummy Variable, and all other dummy variables.
When the survey is returned, the survey data is converted into the proper layout
for the Regression function in Excel. Each Dummy Variable assigned to a
specific attribute will be assigned the value of 0 or 1, depending on whether that
attribute was an element of the combination that is currently being rated.
Watching this done in the linked video is probably the easiest way to understand
how to do it.
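
The conversion described above can be sketched in a few lines of Python. This is only an illustration: the categories, the choices, and the "category=choice" column naming are assumptions made for the example, not the layout used in the manual's spreadsheet.

```python
# Hypothetical categories and choices (illustrative only).
attributes = {
    "color": ["red", "white"],
    "add_on": ["none", "case"],
}

def dummy_columns(attrs):
    """One Dummy Variable name per product choice, e.g. 'color=red'."""
    return [f"{cat}={choice}"
            for cat, choices in attrs.items() for choice in choices]

def encode(card, attrs):
    """Return the 0/1 row for one rated combination (card)."""
    return [1 if card[cat] == choice else 0
            for cat, choices in attrs.items() for choice in choices]

columns = dummy_columns(attributes)
row = encode({"color": "red", "add_on": "case"}, attributes)
print(columns)
print(row)
```

Each row has exactly one 1 per attribute category, because a combination contains exactly one choice from each category.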

The Problem of Collinearity - and How To Solve It


One problem can occur when Dummy Variables are inputs to a regression. The
problem of Collinearity or Multicollinearity occurs when any independent variable
can be used to predict the value of any other independent variable. For example,
if the product comes in only red or white, you can predict whether the product is
red if you know whether or not the product is white. This is Collinearity.
Collinearity and Multicollinearity are corrected by removing one Dummy Variable
from each choice category. For example, if color choices are red or white, the
Dummy Variable for one of those colors would be removed. Collinearity is then
solved. You cannot predict whether or not the product is red if you do not know
whether the product is white (because the Dummy Variable for white has been
removed).
The data can now be run as a regular regression using Excel's regression tool.
The linked video shows how to do this in detail.
The regression is run and a regression equation is obtained.
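
A small Python sketch can make the collinearity visible. The 0/1 data is hypothetical, and dropping the first choice in each category is an arbitrary convention chosen for the example; any one choice per category could be removed instead.

```python
# Hypothetical 0/1 data for four rated combinations (illustrative only).
red = [1, 1, 0, 0]
white = [0, 0, 1, 1]

# With both dummies present, white is fully determined by red.
predicted_white = [1 - r for r in red]
print(predicted_white == white)

def drop_one_per_category(attrs):
    """Keep every choice except the first in each category as a Dummy Variable."""
    return {cat: choices[1:] for cat, choices in attrs.items()}

kept = drop_one_per_category({"color": ["red", "white"],
                              "add_on": ["none", "case"]})
print(kept)
```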

The Product Utilities - The Measure of Customer Liking


The Utilities of each of the product choices are set to equal the value of the
coefficients of the regression equation. The Utility is the degree of liking that the
consumer attached to that product choice.
For example, the marketer will find out how important the color red was
compared to each of the other product choices during the purchase decision.
Utilities of product choices that were associated with the Dummy Variables that
were removed to prevent collinearity will be assigned the value of 0.
We now have Utilities for each attribute. Now, the overall attractiveness of a
particular combination of choices can be calculated by adding up the individual
Utilities associated with each of the choices. The sum of the Utilities for each
combination is the regression's prediction of the consumer's degree of liking for that
combination of product choices.
The removal of the individual Dummy Variables does not affect the accuracy or
completeness of the answer. Adding up the Utilities for each combination will
produce a figure that will be very close to the consumer's actual rating for that
combination. An example of this is shown in the video.
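
The following Python sketch walks through the same idea on a tiny made-up survey. Everything in it is an assumption for illustration: the four ratings are invented, red and "none" are the removed Dummy Variables, and a plain least-squares solver stands in for Excel's regression tool. The sketch also adds the regression's Y-intercept to the summed Utilities, since a regression equation with an intercept includes it in every prediction.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

# Remaining Dummy Variables after removing "red" and "none" (hypothetical data).
columns = ["white", "case"]
rows = [[0, 0], [0, 1], [1, 0], [1, 1]]
ratings = [4, 7, 6, 10]

intercept, *coefficients = least_squares(rows, ratings)
utilities = dict(zip(columns, coefficients))
utilities.update({"red": 0.0, "none": 0.0})  # removed dummies get Utility 0

def predicted_rating(choices):
    """Intercept plus the Utilities of the choices in one combination."""
    return intercept + sum(utilities[c] for c in choices)

print(utilities)
print(predicted_rating(["white", "case"]))
```

With these invented ratings the prediction for the white product with a case comes out close to, but not exactly equal to, the rating of 10 that was entered, which mirrors the "very close" behavior described above.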

Showing the Regression Equation Predicts Nearly the Same Score as the
Customer's Ranking of Card 13, Even Though Dummy Variables Were
Removed

How To Quickly Read the Output of Regression Analysis Done in Excel
There is a lot more to the Excel Regression output than just the regression
equation. If you know how to quickly read the output of a Regression done in Excel,
you'll know right away the most important points of a regression: whether the overall
regression was a good fit, whether this output could have occurred by chance,
whether or not all of the independent input variables were good predictors, and
whether residuals show a pattern (which means there's a problem).

Excel Regression Output With Color-Coding Added

This video will illustrate exactly how to quickly and easily understand the output
of Regression performed in Excel:

Step-By-Step Video About How To Quickly Read and Understand the Output of Excel Regression

(Is Your Sound and Internet Connection Turned On?)

The 4 Most Important Parts of Regression Output


1) Overall Regression Equation's Accuracy (R Square and Adjusted R Square)
2) Probability That This Output Was Not By Chance (ANOVA Significance of F)
3) Individual Regression Coefficient and Y-Intercept Accuracy
4) Visual Analysis of Residuals

Some parts of the Excel Regression output are much more important than
others. The goal here is for you to be able to glance at the Excel Regression
output and immediately understand it, so we will focus our attention only on the
four most important parts of the Excel regression output.

1) Overall Regression's Accuracy

R Square
This is the most important number of the output. R Square tells how well the
regression line approximates the real data. This number tells you how much of
the output variable's variance is explained by the input variables.
Ideally we would like to see this be at least 0.6 (60%) or 0.7 (70%).

Adjusted R Square
This is quoted most often when explaining the accuracy of the regression
equation. Adjusted R Square is more conservative than R Square because it is
never greater than R Square. Another reason that Adjusted R Square is quoted
more often is that when new input variables are added to the Regression
analysis, Adjusted R Square increases only when the new input variable makes
the Regression equation more accurate (improves the Regression equation's
ability to predict the output). R Square never goes down when a new variable is
added, whether or not the new input variable improves the Regression equation's
accuracy.
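
For readers who want to see where these two numbers come from, here is a minimal Python sketch using the standard formulas R Square = 1 - SSresidual / SStotal and Adjusted R Square = 1 - (1 - R Square)(n - 1) / (n - k - 1), where n is the number of observations and k the number of input variables. The actual and predicted values are invented for the example and do not come from the manual.

```python
def r_square(actual, predicted):
    """Share of the output variable's variance explained by the regression."""
    mean = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_residual / ss_total

def adjusted_r_square(r2, n, k):
    """Penalize R Square for the number of input variables k (n observations)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical actual and predicted output values (illustrative only).
actual = [4.0, 7.0, 6.0, 10.0, 5.0, 8.0]
predicted = [4.5, 6.5, 6.5, 9.5, 5.0, 8.0]

r2 = r_square(actual, predicted)
adj = adjusted_r_square(r2, n=len(actual), k=2)
print(round(r2, 4), round(adj, 4))
```

The penalty term (n - 1) / (n - k - 1) grows with k, which is why adding an input variable that barely improves the fit lowers Adjusted R Square.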

2) Probability That This Output Was Not By Chance

Significance of F
This is the p-value of the overall F test: the probability of obtaining a fit at
least this strong by chance if the input variables actually had no relationship
with the output variable. A small Significance of F supports the validity of the
Regression output. For example, if Significance of F = 0.030, there would be only a 3%
chance of seeing output like this from unrelated data.

3) Individual Regression Coefficient Accuracy

P-value of each coefficient and the Y-intercept


The P-Value of each of these is the probability that a value at least this far
from zero would appear by chance if the true coefficient or Y-Intercept were zero.
The lower the P-Value, the stronger the evidence that the coefficient or
Y-Intercept is valid. For example, a P-Value of 0.016 for a regression coefficient
indicates that there would be only a 1.6% chance of seeing a coefficient this
large if that input variable had no real effect.

4) Visual Analysis of Residuals


Charting the Residuals

The Residual Chart

The residuals are the difference between the Regression's predicted value and
the actual value of the output variable. You can quickly plot the Residuals on a
scatterplot chart. Look for patterns in the scatterplot. The more random (without
patterns) and centered around zero the residuals appear to be, the more likely it
is that the Regression equation is valid.
There are many other pieces of information in the Excel regression output but the
above four items will give a quick read on the validity of your Regression.
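
A minimal Python sketch of the residual calculation follows. The values are invented, and the sign convention (actual minus predicted) is the usual one, stated here as an assumption about what the chart plots.

```python
# Hypothetical actual and predicted output values (illustrative only).
actual = [4.0, 7.0, 6.0, 10.0]
predicted = [3.75, 7.25, 6.25, 9.75]

# Residual = actual value minus the regression's predicted value.
residuals = [a - p for a, p in zip(actual, predicted)]
mean_residual = sum(residuals) / len(residuals)

# These are the y-values to place on the scatterplot, one per observation.
for observation, residual in enumerate(residuals, start=1):
    print(observation, residual)
print("mean residual:", mean_residual)
```

A mean residual near zero with no visible trend or curve across observations is the pattern-free picture described above.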

Hand Calculation of Regression Problems


Go To
http://excelmasterseries.com/Excel_Statistical_Master/Regression.php
To View How To Solve Regression Problems By Hand (No Excel)
(Is Your Internet Connection Turned On ?)
You'll Quickly See Why You Always Want To Use Excel To Solve Statistical
Problems !

Logistic Regression Analysis in Excel
Wouldn't it be great if there was a more accurate way to predict whether your
prospect will buy rather than just taking an educated guess? Well, there is, if
you have enough data on your previous prospects. The tool that makes this
possible is called Logistic Regression and can be easily implemented in Excel.

Customer Quality Scores Are Created With Logistic Regression
Marketers use Logistic Regression to rank their prospects with a quality score
which indicates that prospect's likelihood to buy. The more data you've collected
from previous prospects, the more accurately you'll be able to use Logistic
Regression in Excel to calculate your new prospect's probability of purchasing.

Step-By-Step Video Showing How To Predict if a Prospect Will Buy Using Logistic Regression in Excel:

Instructional Video
Go to http://www.youtube.com/watch?v=NHOO7iceJrw to view a video from Excel Master Series about how to use Logistic Regression in Excel to predict if your next prospect will buy (or not).
(Is Your Internet Connection and Sound Turned On?)

What is Logistic Regression?


Logistic Regression calculates the probability of the event occurring, such as the
purchase of a product. In general, the thing being predicted in a Regression
equation is represented by the dependent variable or output variable and is
usually labeled as the Y variable in the Regression equation. In the case of
Logistic Regression, this Y is binary. In other words, the output or dependent
variable can only take the values of 1 or 0. The predicted event either occurs or it
doesn't occur: your prospect either will buy or won't buy. Occasionally this type
of output variable is also referred to as a Dummy Dependent Variable.

An Example of Logistic Regression In Action


Here is a marketing example showing how Logistic Regression works. The
embedded video walks through this example in Excel as well:

Suppose that you have collected three pieces of data on each of your previous
prospects. The data you have collected on each prospect was:

1) The prospect's age
2) The prospect's gender (1 = Male and 0 = Female)
3) Whether the prospect purchased or not (Did purchase, Y = 1; Did not
purchase, Y = 0).

Create the Predictive Equation


With the above data, you could create a predictive equation that would calculate
a new prospect's probability of purchasing by inputting this new prospect's age
and gender. This predictive equation will be in the form of:
P(X) = e^L / (1 + e^L)

P(X) represents the probability of event X occurring.

The Logit
Event X is a purchase. In other words, P(X) is the probability that Y = 1.

P(X) has only one variable. That is L, which is called the Logit.

The Logit, L = Constant + A * Age + B * Gender

L, the Logit, has 3 variables: Constant, A, and B. They must be known before
P(X) can be calculated. Those 3 variables can be found in Excel by using the
Excel Solver. The Excel Solver will find the optimal combination of those 3
variables that causes the resulting P(X) to most accurately predict whether Y = 1
or 0 for all previous prospects.
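
To make the predictive equation concrete, here is a small Python sketch of the Logit and of P(X). The coefficient values passed in at the bottom are invented placeholders, not values fitted by the Solver in the manual. The probability function is written in a form that is algebraically the same as e^L / (1 + e^L) but avoids overflow when L is very large.

```python
import math

def logit(age, gender, constant, a, b):
    """L = Constant + A * Age + B * Gender."""
    return constant + a * age + b * gender

def probability(l):
    """P(X) = e^L / (1 + e^L), rearranged for large positive L."""
    if l >= 0:
        return 1 / (1 + math.exp(-l))
    e = math.exp(l)
    return e / (1 + e)

# Hypothetical Logit variables (illustrative only, not fitted values).
l = logit(age=40, gender=1, constant=-5.0, a=0.1, b=0.5)
p = probability(l)
print(l, p)
```

A Logit of 0 corresponds to a probability of exactly 0.5; positive Logits give probabilities above 0.5 and negative Logits give probabilities below it.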

Calculating the Logit Variables - A, B, and Constant
Here's how the optimal set of Logit variables (Constant, A, and B) is found
in Excel:

Using Excel, each recorded prospect has the following calculation performed:
P(X)^Y * [1 - P(X)]^(1-Y)

The Y refers to Y = 1 if the prospect bought and Y = 0 if the prospect didn't buy.

The P(X) is the probability of purchase that will be calculated using the equation
listed above. In Excel, the P(X) calculation is initially performed by the Excel
Solver using Logit variables (Constant, A, and B) which are not optimal. The
Excel Solver will then continuously try new combinations of these variables until
the optimal P(X) is found.

Optimizing the Logit Variables in the Excel Solver


Here's how the Excel Solver knows when it has found the correct combination
of these 3 variables so that the resulting P(X) equation most accurately predicts
whether Y = 1 or 0:
The expression P(X)^Y * [1 - P(X)]^(1-Y) is maximized when P(X) is most accurate. It
approaches its highest value (1) when Y = 1 and P(X) approaches 1. It also
approaches its highest value (1) when Y = 0 and P(X) approaches 0. When Y = 1
and P(X) = 1, that is a 100% correct prediction by P(X) that Y = 1. When Y = 0
and P(X) = 0, that is a 100% correct prediction by P(X) that Y = 0.
Each prospect has a separate P(X)^Y * [1 - P(X)]^(1-Y) value calculated for him or
her.

The sum of each P(X)^Y * [1 - P(X)]^(1-Y) calculation for all prospects is taken.
The only variables that exist when calculating P(X)^Y * [1 - P(X)]^(1-Y) are Y and
the variables of P(X), which are Constant, A, and B. Using the Excel Solver, these
variables are adjusted until their values maximize the sum of all
P(X)^Y * [1 - P(X)]^(1-Y) values.
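
The quantity the Solver is asked to maximize can be written as a short Python function. This sketch follows the text and sums the per-prospect values; standard maximum-likelihood fitting instead maximizes the product of these terms (equivalently, the sum of their logarithms), so treat this as an illustration of the manual's objective rather than of the textbook method. The prospect data and both sets of Logit variables are invented for the example.

```python
import math

def probability(l):
    """P(X) = e^L / (1 + e^L), rearranged to avoid overflow."""
    if l >= 0:
        return 1 / (1 + math.exp(-l))
    e = math.exp(l)
    return e / (1 + e)

def objective(params, prospects):
    """Sum over prospects of P(X)^Y * [1 - P(X)]^(1-Y)."""
    constant, a, b = params
    total = 0.0
    for age, gender, y in prospects:
        p = probability(constant + a * age + b * gender)
        total += (p ** y) * ((1 - p) ** (1 - y))
    return total

# Hypothetical prospects: (age, gender, purchased Y). Illustrative only.
prospects = [(25, 0, 0), (30, 1, 0), (45, 0, 1), (50, 1, 1)]

# All-zero Logit variables give P(X) = 0.5 for everyone, so the sum is 2.0.
poor = objective((0.0, 0.0, 0.0), prospects)
# A hypothetical better guess: older prospects are more likely to buy.
better = objective((-7.5, 0.2, 0.0), prospects)
print(poor, better)
```

An optimizer such as the Excel Solver repeatedly changes Constant, A, and B and keeps the combination with the largest value of this function.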

The Final, Most Accurate Predictive Equation


When the sum of P(X)^Y * [1 - P(X)]^(1-Y) is maximized, then the final resulting
P(X) equation is as accurate as possible at predicting whether Y will be 1 or 0.

The Excel Solver Dialogue Box

Stated another way, we now have a predictive equation P(X) which uses the
optimal combination of Constant, A, and B which most accurately calculates the
probability that Y = 1 given a prospect's age and gender.

The embedded video provides a clear picture of all of this in action in Excel.

The use of the Excel Solver does require some hand-tweaking to ensure that the
most accurate answer is obtained. The video shows an example of this.
Ultimately what the Solver is doing is adjusting variables Constant, A, and B to
maximize the sum of the column of
P(X)^Y * [1 - P(X)]^(1-Y) values. The answer obtained by the Solver should
maximize that sum and provide realistic answers for the probabilities of each
prospect, including the new one.

You'll Have To Tweak the Constraints in the Excel Solver
You'll probably find that you have to experiment by applying constraints to the
variables that Solver is adjusting in order to maximize the target sum. The
variables that Solver adjusts are called Decision Variables. Solver allows you to
create constraints on the value of any Decision Variable.

Adding a Constraint to the Solver

In the video, you will be able to watch how a Decision Variable is constrained to
make the final answer more accurate. The Decision Variable called Constant was
constrained to always remain above -25 during the Solver analysis. This resulted
in the most accurate and realistic maximization of the sum of the
P(X)^Y * [1 - P(X)]^(1-Y) values.

Conclusion: Logistic Regression in Excel Is an Incredible Predictor but Not the Simplest Analysis
Logistic Regression is not the simplest type of analysis to understand or perform.
Hopefully this article and video have provided a much clearer picture for you.

The Four Steps of Regression in Excel
(Including Two Crucial Steps That Most People Skip)

Running a Regression in Excel is fairly easy. So is running one incorrectly. There
are two crucial steps that should always be performed on the data before any
Regression is run. Fortunately these two steps are very quick and easy
to do in Excel. They are:
1) Graph the Data
2) Run Correlation Analysis On All Variables
Following is a video of this article showing how to perform all four steps to
Regression in Excel, including the above two crucial steps at the beginning:

Step-By-Step Video Showing How To Do All 4 Steps of Regression in Excel, Including the 2 Crucial Initial Steps That No One Does, But Should

(Is Your Sound and Internet Connection Turned On?)

Why You Need To Run The 2 Crucial Steps Before Doing Regression
Here's why you need to run the two crucial steps prior to regressing any data in
Excel:

Crucial Step 1) Graphing the Data


Whether or not you are using Excel to run a Regression, you should always
graph the data before doing anything else. Eyeballing the data will allow you to
quickly determine whether there is any relationship between the independent
(input) variables and the dependent (output) variable. You also want to evaluate
whether the graph generally appears to be linear or possibly quadratic. Excel's
Regression Tool works well only for reasonably linear data. Eyeballing the data
upfront will tell you very quickly whether Excel's Linear Regression is the right
tool for the job.

Graphing The Data To Check If It Is Linear

The input and output variables will be graphed together. The y-axis of the chart
will provide the scale for plotting of those values. The x-axis will provide a
measure of whatever continuum was used, e.g. time, to collect the values of all of
the variables. Excel's charting function is the way to go here. The above linked
video shows exactly how to chart all the data in Excel.

Crucial Step 2) Running Correlation Analysis on All Variables Simultaneously
There are two good reasons for doing this. First, we want to remove any input
variables which are clearly not good predictors of the output variable. Second, we
want to make sure that none of the input variables have a high correlation with
(are good predictors of) other input variables.

Running Correlation Analysis on the Data To Prevent Collinearity and also
To Remove Input Variables That Have Low Correlation With the Output
Variable

Correlation of multiple variables is easily done in Excel using the Correlation
Data Analysis tool. The linked video shows exactly how to do that.

Remove Input Variables That Have Low Correlation With Output Variable
After you have run Correlation Analysis on the data, you will want to remove any
input variables that have a low correlation with the output variable. A Correlation
Coefficient with an absolute value of less than 0.4 (between -0.4 and +0.4)
between the output variable and an input variable indicates that the input variable
is not a good predictor of the output. That input variable should be removed from
the Regression Analysis. The attached video provides an example of this.
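
As a rough illustration of this screening rule outside Excel, the following Python sketch computes each input's correlation with the output and keeps only the inputs that meet the 0.4 cutoff. The column names and numbers are made up for the example, and the helper names (pearson, screen_inputs) are hypothetical; only the 0.4 threshold comes from the text above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def screen_inputs(inputs, output, min_abs_corr=0.4):
    """Split input columns into kept/dropped by |correlation with output|."""
    kept, dropped = {}, {}
    for name, values in inputs.items():
        r = pearson(values, output)
        (kept if abs(r) >= min_abs_corr else dropped)[name] = r
    return kept, dropped

# Hypothetical data, not taken from the manual
output = [10, 12, 15, 19, 24, 30]
inputs = {
    "ad_spend": [1, 2, 3, 4, 5, 6],            # rises with the output
    "office_temp": [21, 19, 22, 20, 21, 20],   # unrelated to the output
}
kept, dropped = screen_inputs(inputs, output)
print("keep:", kept)
print("drop:", dropped)
```

In Excel the same numbers come from the Correlation tool's output matrix; the sketch only makes the keep-or-drop decision explicit.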

Data Columns Before Removing Input Variable With Low Correlation To Output

Data Columns After Removing Input Variables With Low Correlation To Output

Remove Input Variables Highly Correlated With Other Input Variables
After looking at the Correlation Coefficients between the input and output
variables, look at the Correlation Coefficients between the input variables
themselves. You do not want to use pairs of input variables that are good
predictors of each other in a Regression. This will cause a Regression error
known as Collinearity or Multicollinearity. One variable from any pair of highly
correlated input variables should be removed prior to running the Regression
Analysis. Variables can be considered highly correlated if the absolute value of
their Correlation Coefficient is greater than 0.7 (greater than +0.7 or less than -0.7).
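
The pairwise check can be sketched in Python as follows. This is only an illustration under assumed data: the variable names and values are invented, the helper names are hypothetical, and the 0.7 cutoff is the one given above.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def collinear_pairs(inputs, max_abs_corr=0.7):
    """Return (name_a, name_b, r) for every input pair with |r| above the cutoff."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(inputs.items(), 2):
        r = pearson(a, b)
        if abs(r) > max_abs_corr:
            pairs.append((name_a, name_b, r))
    return pairs

# Hypothetical input columns, not taken from the manual
inputs = {
    "tv_ads": [1, 2, 3, 4, 5, 6],
    "radio_ads": [2, 4, 5, 8, 10, 13],  # moves almost in lockstep with tv_ads
    "price": [9, 7, 10, 8, 9, 8],
}
flagged = collinear_pairs(inputs)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} are highly correlated (r = {r:.2f}); drop one")
```

Which member of a flagged pair to drop is a judgment call; a common choice is to keep the one with the stronger correlation to the output variable.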

Adding New Input Variables To The Regression Analysis


Here are a few hints about adding new input variables to a Regression Analysis.
First, build up a Regression by starting with a small number of input variables
and add any new ones one at a time. Second, good new input variables
noticeably increase Adjusted R Square and also lower Standard Error without
significantly changing the existing Regression Coefficients.
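
To see why Adjusted R Square is the figure to watch when adding inputs, here is a small Python sketch of the standard adjustment formula. The R Square values and the observation count are hypothetical numbers chosen for illustration.

```python
def adjusted_r_squared(r_squared, n_obs, n_inputs):
    """Adjusted R Square = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_inputs - 1)

# Hypothetical case: 11 observations; a third input nudges R Square from 0.900 to 0.902
before = adjusted_r_squared(0.900, n_obs=11, n_inputs=2)
after = adjusted_r_squared(0.902, n_obs=11, n_inputs=3)
new_input_helps = after > before
print(before, after, new_input_helps)
```

Plain R Square never goes down when an input is added, so a tiny gain like the one assumed here can still lower Adjusted R Square, which is a sign that the new input is not pulling its weight.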

Step 3) Run the Regression in Excel


When you are satisfied with the output of the data graph and the Correlation
Analysis, go ahead and run the Regression with Excel. An example of how to do
this is shown in the above video.

The Excel Regression Dialogue Box

Final Step 4) Analysis of Excel Output


The final step of Excel Regression is Analysis of the Excel output. Please refer to
the chapter of this manual that goes into detail about how to quickly read and
understand the output of regression done in Excel.

Excel Regression Output With Color Coding Added

Conclusion - Plotting the Data and Running Correlation Can Be BIG Time Savers
Plotting the data and running Correlation Analysis prior to running a Regression
can save you lots of time that you might otherwise have to spend making
adjustments to your Regression after running it.

How To Do Nonlinear Regression Using the Excel Solver
Excel Solver is one of the best and easiest curve-fitting devices in the world, if
you know how to use it. Its curve-fitting capabilities make it an excellent tool to
perform nonlinear regression. The Excel Solver will find the equation of the linear
or nonlinear curve which most closely fits a set of data points.
One very important caveat must be added: the user must first determine the
general type of the curve and input that information into Solver at the start. This
information is in the form of the general equation that defines the curve, such as
a0 + a1*x + a2*x^2 = c or a*ln(xb) = c. Solver then calculates the values of the
unknown coefficients that produce the equation which most closely fits the data
points. We will run through an example here.

In this problem we are going to show how to use the Excel Solver to calculate an
equation which most closely describes the relationship between sales and
number of ads being run. The purpose of this equation is to be able to predict the
number of sales based upon the number of ads that will be run.
A marketing manager has collected the following data on the company's sales
vs. the number of ads that were running at different times.

Number of Ads Running    Sales

50                       6700
55                       7500
59                       8700
62                       8900
75                       8800
95                       10900
110                      11200
125                      11400
140                      11500
180                      12300

Here is an Excel scatter plot of that data:

We would like to create an equation from this data that allows us to predict the
sales based upon the number of ads currently running.

The first step is to eyeball the data and estimate what general type of curve this
graph probably is. In this case it appears to be a graph in which the y value
increases at a diminishing rate as the x value increases. A formula for such a
curve would have the general form:
Y = A1 + A2 * X^B1
Sales = A1 + A2 * (Number of Ads Running)^B1
We can use the Excel Solver to solve for A1, A2, and B1. We need to arrange
the data in a form that can be input into the Excel Solver as follows:

This table shows the arrangement of data and the calculations. Here we have
created an Excel model based upon our model of:
Sales = A1 + A2 * (Number of Ads Running)^B1

One example of this formula in action is explained for Cell E16. We are listing the
variables that we are solving for (A1, A2, and B1) in cells B3 to B5. In Solver
language, these cells that we are changing are called Decision Variables.

We have arbitrarily set our Decision Variables to:


A1 = 100
A2 = 100
B1 = 0.05

We now take the difference between the actual number of sales and the number
of sales predicted by our model with our arbitrary settings for the Decision
Variables. The square of each difference is taken and then all squares are
summed up.

We are trying to find the settings for the Decision Variables that will minimize the
sum of the squares of the differences. In other words, we are trying to find A1,
A2, and B1 that will minimize the number in cell G13.
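
The quantity that Solver minimizes can be written out in a few lines of Python. This is a sketch of the same calculation as the worksheet, assuming the data pairs up as ads 50 to 180 against sales 6700 to 12300; the function names are hypothetical.

```python
# Data from the example: number of ads running and the matching sales
ads = [50, 55, 59, 62, 75, 95, 110, 125, 140, 180]
sales = [6700, 7500, 8700, 8900, 8800, 10900, 11200, 11400, 11500, 12300]

def predict(a1, a2, b1, x):
    """Model from the text: Sales = A1 + A2 * (Number of Ads Running)^B1."""
    return a1 + a2 * x ** b1

def sum_squared_errors(a1, a2, b1):
    """Sum of squared differences between actual and predicted sales."""
    return sum((y - predict(a1, a2, b1, x)) ** 2 for x, y in zip(ads, sales))

# Objective at the arbitrary starting values A1 = 100, A2 = 100, B1 = 0.05
sse_initial = sum_squared_errors(100, 100, 0.05)
print(sse_initial)
```

With the arbitrary starting values the predictions are far below the actual sales, so this sum is very large; Solver's job is to move A1, A2, and B1 until it is as small as possible.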

Once the Solver has been installed as an add-in (To add-in Solver: File /
Options / Add-Ins / Manage / Excel Add-Ins / Go / Solver Add-In), you can
access the Solver in Excel 2010 by: Data / Solver.

The following blank Solver dialogue box comes up:

The Solver dialogue box has the following 4 parameters that need to be set:

1) The Objective Cell: This is the target cell whose value we are trying to
maximize, minimize, or set to a certain value.

2) Minimize or Maximize the Target, or attempt to achieve a certain value in
the Objective cell.

3) Decision Variables: A set of variables that will be changed by the Excel
Solver in order to optimize the target cell.

4) Constraints: These are the limitations that the problem subjects the
Solver to during its calculations.

Once again, here is the data table for Solver inputs:

Objective:
We are trying to minimize Cell G13, the sum of the squared differences
between the actual and predicted sales.

Decision Variables:
We are changing A1, A2, and B1 (cells B3 to B5) to minimize our Objective, Cell
G13. The Decision Variables are therefore Cells B3 to B5.

Constraints:
There are none for this curve-fitting operation.

Selection of Solving Method: GRG Nonlinear


The GRG Nonlinear method is used when the equation producing the objective is
not linear but is smooth (continuous, with no breaks or sharp corners). Examples
of smooth nonlinear functions in Excel are:

=1/C1, =Log(C1), and =C1^2

These functions have graphs that are curved (nonlinear) but have no breaks
(smooth) over the ranges where they are defined.

Our sales equation appears to be smooth and non-linear:

Sales = A1 + A2 * (Number of Ads Running)^B1


Here is the completed Solver dialogue box:

Here is a close-up of the Solver Objective, Decision Variables, and Constraints:

If we now hit the Solve button, we get the following result:

Solver has optimized the Decision Variables to minimize the objective function as
follows:
A1 = -445,616
A2 = 437,247
B1 = 0.00911
The Objective is minimized to: 2,556,343

We can now create an Excel graph of the Actual Sales vs. the Predicted Sales as
follows:

Solver calculates that Sales can be predicted from Number of Ads Running by
the following equation:
Sales = -445616 + 437247 * (Number of Ads Running)^0.00911
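
To use the fitted equation for a forecast, plug in a planned number of ads. The Python sketch below does this with the rounded coefficients reported above; the 100-ad scenario and the function name are hypothetical, chosen only to show the calculation.

```python
# Rounded coefficients as reported by Solver in the example above
A1, A2, B1 = -445616, 437247, 0.00911

def predicted_sales(ads_running):
    """Sales = A1 + A2 * (Number of Ads Running)^B1."""
    return A1 + A2 * ads_running ** B1

forecast = predicted_sales(100)  # hypothetical plan: 100 ads running
print(round(forecast))
```

Because A1 and A2 are large numbers that nearly cancel each other, a forecast made from the rounded coefficients can differ a little from one made with the full-precision values in the worksheet. Forecasts are also most trustworthy inside the range of the observed data (50 to 180 ads).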

The trickiest part of this problem is the first step: eyeballing the data to
determine what kind of curve the data follows. You should take time to
evaluate whether you are fitting the correct curve type.

Solver Tips
You may notice that if you run this problem through the Solver multiple times, you
will get slightly different answers. Each time that you run Solver's GRG algorithm,
it may calculate different values for the Decision Variables. You are trying to find
the values for the Decision Variables that minimize the objective function (cell
G13) the most.

When the Solver runs the GRG algorithm, it starts its calculations from the values
currently in the Decision Variable cells. After a run, those cells hold the result of
that run, so the next run begins from a slightly different starting point. That is why
different answers can appear during each run. Choose the Decision Variable values
from the run which produces the lowest value of the Objective. Keep running the
Solver until the Objective stops decreasing. That should give you the best-fitting
values of the Decision Variables that Solver can find. That was done in the
example above.
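
One way to cross-check a Solver result without depending on a starting point is sketched below in Python. This is not the GRG method that Solver uses; it is a swapped-in technique. For any fixed B1 the model is linear in A1 and A2, so those two can be found with ordinary least squares, and B1 is then chosen by a simple grid search. The grid range (-2 to 2 in steps of 0.01) and all function names are assumptions made for this illustration.

```python
ads = [50, 55, 59, 62, 75, 95, 110, 125, 140, 180]
sales = [6700, 7500, 8700, 8900, 8800, 10900, 11200, 11400, 11500, 12300]

def fit_for_exponent(b1):
    """Best A1, A2 and the resulting SSE for a fixed B1 (least squares on u = x**b1)."""
    u = [x ** b1 for x in ads]
    n = len(u)
    mean_u, mean_y = sum(u) / n, sum(sales) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uy = sum((ui - mean_u) * (y - mean_y) for ui, y in zip(u, sales))
    a2 = s_uy / s_uu
    a1 = mean_y - a2 * mean_u
    sse = sum((y - (a1 + a2 * ui)) ** 2 for ui, y in zip(u, sales))
    return sse, a1, a2, b1

# Candidate exponents from -2.00 to 2.00 in steps of 0.01, skipping 0
# (x**0 is constant, so the slope A2 would be undefined there).
candidates = [k / 100 for k in range(-200, 201) if k != 0]
best_sse, best_a1, best_a2, best_b1 = min(fit_for_exponent(b) for b in candidates)
print(best_sse, best_a1, best_a2, best_b1)
```

The coefficients found this way may look quite different from Solver's, because very different combinations of A1, A2, and B1 can trace nearly the same curve through these points. The useful comparison is the sum of squared errors: if the value here is noticeably lower than the Objective that Solver reports, Solver probably stopped before reaching its best fit.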

Initial Solver Settings:


Here are some Solver settings that you want to configure prior to running the
Solver for most problems. These settings are found when you click the Options
button:

Show Iteration Results: Leave this unchecked. This stops the GRG Solver after
each iteration, displaying the result for that iteration. Very rarely is there a reason
for doing that.

Use Automatic Scaling: Leave this box unchecked. You would only use this
option if you had reason to believe that inputs of the Solver were measured using
different scales.

Assume Non-Negative: Only check this if you are sure that none of the
variables can ever be negative. That does not hold in this example, because the
solved value of A1 is negative.

Bypass Solver Reports: Leave this box unchecked. There is no advantage to
not having Solver reports for each Solver run.

Summary
Excel Solver is an easy-to-use and powerful nonlinear regression tool as a result
of its curve-fitting capacity. One use of this is to calculate predictive sales
equations for your company. It will work as long as you have properly determined
the correct general curve type in the beginning.

Meet the Author


Mark Harmon is a master number cruncher. Creating overloaded Excel spreadsheets
loaded with complicated statistical analysis is his idea of a good time. His profession as
an Internet marketing manager provides him with the opportunity and the need to
perform plenty of meaningful statistical analysis at his job.
Mark Harmon is also a natural teacher. As an adjunct professor, he spent five years
teaching more than thirty semester-long courses in marketing and finance at the
Anglo-American College in Prague and the International University in Vienna, Austria. During
that five-year time period, he also worked as an independent marketing consultant in the
Czech Republic and performed long-term assignments for more than one hundred clients.
His years of teaching and consulting have honed his ability to present difficult subject
matter in an easy-to-understand way.
Harmon received a degree in electrical engineering from Villanova University and an MBA
in marketing from the Wharton School.
