Using SPSS for Multiple Regression
UDP 520 Lab 7, Lin Lin, December 4th, 2007
Step 1 Define Research Question
What factors are associated with BMI? Predict BMI.
Step 2 Conceptualizing Problem (Theory)
[Diagram: Individual Behaviors, Individual Characteristics, and Environment, each linked to BMI]
Step 2 Conceptualizing Problem (Theory) (cont.)
Individual behaviors are associated with BMI.
Individual characteristics are associated with BMI.
Environment is associated with BMI.
Step 3 & 4 Operationalizing and Hypothesizing
Individual behaviors are associated with BMI.
Eating behavior: daily calorie intake is positively associated with BMI.
Exercising behavior: level of exercise is negatively associated with BMI.
Individual characteristics are associated with BMI.
Sex, income, education level, occupation
Environment is associated with BMI.
Physical environment, social environment
Step 5 Collecting Data
1000 adults aged 18+ (males and females) were recruited to study factors associated with BMI.
Variables
BMI (before WLTP)
Sex (female=1): individual characteristics
Calorie (daily calorie intake): individual behaviors
Exercise (minutes of exercise per week): individual behaviors
Income (monthly salary in dollars): individual characteristics
Expenditure on food (monthly food expense in dollars): individual behaviors
Education (education level in years): individual characteristics
Residential density (high, medium, low): physical environment
Step 6 Developing OLS Equation
Multiple regression
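$Y_{\text{BMI}} = \beta_0 + \beta_1 x_{\text{calorie}} + \beta_2 x_{\text{exercise}} + \beta_3 x_{\text{sex}} + \beta_4 x_{\text{income}} + \beta_5 x_{\text{education}} + \beta_6 x_{\text{built environment}} + \varepsilon$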
OLS Equation for SPSS
Multiple regression Model 1
$Y_{\text{BMI}} = \beta_0 + \beta_1 x_{\text{calorie}} + \beta_2 x_{\text{exercise}} + \beta_4 x_{\text{income}} + \beta_5 x_{\text{education}} + \varepsilon$
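To make the estimation step concrete, here is a minimal Python sketch of what an OLS fit computes: the coefficients that solve the normal equations $(X'X)b = X'y$. It is not how SPSS is operated and it does not use BMI.sav. The six observations, the two predictors, and the "true" coefficients (20, 0.002, -0.03) are all made up for illustration, and the helper names `solve_linear_system` and `fit_ols` are hypothetical.

```python
# Minimal OLS sketch: solve the normal equations (X'X) b = X'y.
# All data below are invented for illustration, not taken from BMI.sav.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations apply to both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Use the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical observations: (calorie, exercise) for six people.
predictors = [(1800, 30), (2000, 20), (2200, 25), (2500, 10), (1600, 35), (2100, 15)]
# BMI generated without noise from the assumed coefficients 20, 0.002, -0.03.
bmi = [20 + 0.002 * c - 0.03 * e for c, e in predictors]

b0, b_calorie, b_exercise = fit_ols(predictors, bmi)
print(round(b0, 3), round(b_calorie, 4), round(b_exercise, 3))
```

Because the made-up BMI values contain no error term, the fit recovers the assumed coefficients up to floating-point rounding. With real data the residuals are non-zero and the estimates come with standard errors, which is what the SPSS output reports.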
SPSS Output Tables
Descriptive Statistics

              Mean        Std. Deviation   N
BMI           24.0674     1.28663          1000
calorie       2017.7167   513.71981        1000
exercise      21.7947     7.66196          1000
income        2005.1981   509.49088        1000
education     19.95       3.820            1000

Correlations

                                  BMI     calorie   exercise   income   education
Pearson Correlation   BMI         1.000   .784      -.310      .033     .011
                      calorie     .784    1.000     -.193      -.009    .004
                      exercise    -.310   -.193     1.000      -.030    -.046
                      income      .033    -.009     -.030      1.000    .069
                      education   .011    .004      -.046      .069     1.000
Sig. (1-tailed)       BMI         .       .000      .000       .148     .361
                      calorie     .000    .         .000       .391     .451
                      exercise    .000    .000      .          .175     .072
                      income      .148    .391      .175       .        .014
                      education   .361    .451      .072       .014     .
N = 1000 for every pair of variables.

Variables Entered/Removed(b)

Model   Variables Entered                         Variables Removed   Method
1       education, calorie, income, exercise(a)   .                   Enter

a All requested variables entered.
b Dependent Variable: BMI

Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .801(a)   .642       .641                .77095

a Predictors: (Constant), education, calorie, income, exercise
b Dependent Variable: BMI

ANOVA(b)

Model 1      Sum of Squares   df    Mean Square   F         Sig.
Regression   1062.377         4     265.594       446.853   .000(a)
Residual     591.394          995   .594
Total        1653.771         999

a Predictors: (Constant), education, calorie, income, exercise
b Dependent Variable: BMI

Coefficients(a)

B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, Lower Bound and Upper Bound give the 95% confidence interval for B, and Tolerance and VIF are the collinearity statistics.

Model 1      B           Std. Error   Beta    t        Sig.   Lower Bound   Upper Bound   Tolerance   VIF
(Constant)   20.693      .208                 99.404   .000   20.285        21.102
calorie      .002        .000         .753    38.969   .000   .002          .002          .962        1.039
exercise     -.027       .003         -.163   -8.434   .000   -.034         -.021         .960        1.042
income       8.82E-005   .000         .035    1.837    .067   .000          .000          .994        1.006
education    -.001       .006         -.002   -.086    .932   -.013         .012          .993        1.007

a Dependent Variable: BMI

Collinearity Diagnostics(a)

                                           Variance Proportions
Dimension   Eigenvalue   Condition Index   (Constant)   calorie   exercise   income   education
1           4.778        1.000             .00          .00       .00        .00      .00
2           .110         6.584             .00          .10       .72        .02      .01
3           .060         8.924             .00          .41       .03        .56      .00
4           .041         10.842            .01          .21       .05        .26      .55
5           .011         21.197            .99          .28       .19        .16      .44

a Dependent Variable: BMI

Residuals Statistics(a)

                       Minimum    Maximum   Mean      Std. Deviation   N
Predicted Value        21.8115    26.9475   24.0674   1.03123          1000
Residual               -3.36145   4.91952   .00000    .76941           1000
Std. Predicted Value   -2.188     2.793     .000      1.000            1000
Std. Residual          -4.360     6.381     .000      .998             1000

a Dependent Variable: BMI
Step 7 Checking for Multicollinearity
Correlations

                                  BMI     calorie   exercise   income   education
Pearson Correlation   BMI         1.000   .784      -.310      .033     .011
                      calorie     .784    1.000     -.193      -.009    .004
                      exercise    -.310   -.193     1.000      -.030    -.046
                      income      .033    -.009     -.030      1.000    .069
                      education   .011    .004      -.046      .069     1.000
Sig. (1-tailed)       BMI         .       .000      .000       .148     .361
                      calorie     .000    .         .000       .391     .451
                      exercise    .000    .000      .          .175     .072
                      income      .148    .391      .175       .        .014
                      education   .361    .451      .072       .014     .
N = 1000 for every pair of variables.
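Check for multicollinearity among the independent variables. If the absolute value of the Pearson correlation between two independent variables is greater than 0.8, collinearity is very likely to exist. If the absolute value is close to 0.8 (such as 0.7 ± 0.1), collinearity is likely to exist. Here the largest absolute correlation between two independent variables is .193 (calorie and exercise), well below both thresholds.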
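The screening rule above can be sketched in a few lines of Python. The correlations are copied from the Correlations table. The function name `collinearity_flag` is hypothetical, and reading "0.7 ± 0.1" as the range 0.6 to 0.8 is an assumption about what the slide intends.

```python
# Pearson correlations between pairs of independent variables,
# copied from the Correlations table.
pairwise_r = {
    ("calorie", "exercise"): -0.193,
    ("calorie", "income"): -0.009,
    ("calorie", "education"): 0.004,
    ("exercise", "income"): -0.030,
    ("exercise", "education"): -0.046,
    ("income", "education"): 0.069,
}

def collinearity_flag(r, very_likely=0.8, likely=0.6):
    """Classify a correlation by its absolute value.

    The 0.6 lower bound is an assumed reading of "0.7 +/- 0.1".
    """
    size = abs(r)
    if size > very_likely:
        return "very likely"
    if size >= likely:
        return "likely"
    return "no flag"

flags = {pair: collinearity_flag(r) for pair, r in pairwise_r.items()}
for (first, second), flag in flags.items():
    print(f"{first} vs {second}: {flag}")
```

Only correlations among the independent variables are screened. The strong correlation between calorie and BMI (.784) is not a collinearity concern, because BMI is the dependent variable.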
Step 7 Checking for Multicollinearity (cont.)
Collinearity Diagnostics(a)

                                           Variance Proportions
Dimension   Eigenvalue   Condition Index   (Constant)   calorie   exercise   income   education
1           4.778        1.000             .00          .00       .00        .00      .00
2           .110         6.584             .00          .10       .72        .02      .01
3           .060         8.924             .00          .41       .03        .56      .00
4           .041         10.842            .01          .21       .05        .26      .55
5           .011         21.197            .99          .28       .19        .16      .44

a. Dependent Variable: BMI
A condition index greater than 15 indicates a possible problem. An index greater than 30 suggests a serious problem with collinearity. In this model the largest condition index is 21.197, which is above 15 but below 30.
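For readers who want to see where the condition index comes from, the following Python sketch recomputes it as the square root of the largest eigenvalue divided by each eigenvalue, then applies the 15 and 30 thresholds. The eigenvalues are the rounded values printed in the table, so the results differ slightly from the SPSS column. The function names are hypothetical.

```python
import math

# Eigenvalues as printed (rounded) in the Collinearity Diagnostics table.
eigenvalues = [4.778, 0.110, 0.060, 0.041, 0.011]

def condition_indices(eigs):
    """Condition index per dimension: sqrt(largest eigenvalue / eigenvalue)."""
    largest = max(eigs)
    return [math.sqrt(largest / e) for e in eigs]

def severity(index):
    """Apply the rule of thumb from the slide: > 30 serious, > 15 possible."""
    if index > 30:
        return "serious problem"
    if index > 15:
        return "possible problem"
    return "no problem indicated"

indices = condition_indices(eigenvalues)
for dimension, index in enumerate(indices, start=1):
    print(dimension, round(index, 3), severity(index))
```

Only the fifth dimension crosses the threshold of 15, which matches the reading of the SPSS table.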
Step 8 Statistics
Goodness of fit of model
Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .801(a)   .642       .641                .77095

a. Predictors: (Constant), education, calorie, income, exercise
b. Dependent Variable: BMI
$R^2 = 0.642$ means that 64.2% of the variation in BMI is explained by the model.
The adjusted $R^2$ adjusts for the number of explanatory terms (independent variables) in a model and increases only if a new independent variable improves the model more than would be expected by chance.
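As a check on these two statistics, the short Python sketch below recomputes them from the sums of squares in the ANOVA table of the SPSS output, using the standard formulas $R^2 = SS_{regression} / SS_{total}$ and adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$. The variable names are illustrative.

```python
# Values taken from the ANOVA table in the SPSS output.
ss_regression = 1062.377
ss_total = 1653.771
n = 1000  # number of cases
k = 4     # number of independent variables in Model 1

# Share of the total variation in BMI explained by the model.
r_squared = ss_regression / ss_total

# Penalize R squared for the number of independent variables.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(round(r_squared, 3), round(adjusted_r_squared, 3))
```

With 1000 cases and only four predictors the penalty is small, which is why the two values are so close.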
Step 8 Statistics (cont.)
Coefficient of each independent variable
Coefficients(a)

B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, Lower Bound and Upper Bound give the 95% confidence interval for B, and Tolerance and VIF are the collinearity statistics.

Model 1      B           Std. Error   Beta    t        Sig.   Lower Bound   Upper Bound   Tolerance   VIF
(Constant)   20.693      .208                 99.404   .000   20.285        21.102
calorie      .002        .000         .753    38.969   .000   .002          .002          .962        1.039
exercise     -.027       .003         -.163   -8.434   .000   -.034         -.021         .960        1.042
income       8.82E-005   .000         .035    1.837    .067   .000          .000          .994        1.006
education    -.001       .006         -.002   -.086    .932   -.013         .012          .993        1.007

a. Dependent Variable: BMI
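The two collinearity statistics in this table carry the same information, since VIF is the reciprocal of Tolerance. A small Python sketch using the printed Tolerance values shows the relationship. Because the Tolerance values are rounded to three decimals, the recomputed VIFs can differ from the SPSS column in the last digit.

```python
# Tolerance values as printed in the Coefficients table.
tolerance = {
    "calorie": 0.962,
    "exercise": 0.960,
    "income": 0.994,
    "education": 0.993,
}

# VIF is defined as 1 / Tolerance.
vif = {name: 1 / value for name, value in tolerance.items()}

for name, value in vif.items():
    print(name, round(value, 3))
```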
Unstandardized coefficients (B) are used for prediction and interpretation.
Standardized coefficients (Beta) are used for comparing the effects of independent variables.
Compare Sig. with alpha = 0.05. If Sig. < 0.05, the coefficient is statistically significantly different from zero.
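The decision rule can be written out as a short Python sketch. The Sig. values are copied from the Coefficients table. Note that SPSS prints .000 for any value below .0005, so the zeros below stand for "very small", not exactly zero.

```python
# Sig. values for the independent variables, as printed by SPSS.
sig = {
    "calorie": 0.000,   # printed as .000, meaning p < .0005
    "exercise": 0.000,  # printed as .000, meaning p < .0005
    "income": 0.067,
    "education": 0.932,
}
alpha = 0.05

# A coefficient is statistically significant when Sig. < alpha.
significant = [name for name, p in sig.items() if p < alpha]
not_significant = [name for name, p in sig.items() if p >= alpha]

print("significant:", significant)
print("not significant:", not_significant)
```

At alpha = 0.05, calorie and exercise are significant, while income and education are not.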
Step 9 Interpreting Estimated Coefficient
$Y_{\text{BMI}} = 20.693 + 0.002\, x_{\text{calorie}} + (-0.027)\, x_{\text{exercise}} + 0.0000882\, x_{\text{income}} + (-0.001)\, x_{\text{education}}$
Holding the other variables constant, if a person's daily calorie intake increases by 1 calorie, the person's predicted BMI increases by 0.002. Please explain the estimated coefficient of exercise.
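To illustrate how the estimated equation is used for prediction, here is a minimal Python sketch. The coefficients are the rounded B values from the Coefficients table. The function name `predict_bmi` and the example person (2000 calories, 20 minutes of exercise, income of 2000, 16 years of education) are hypothetical and not drawn from BMI.sav.

```python
# Rounded unstandardized coefficients (B) from the Coefficients table.
intercept = 20.693
coefficients = {
    "calorie": 0.002,
    "exercise": -0.027,
    "income": 0.0000882,
    "education": -0.001,
}

def predict_bmi(calorie, exercise, income, education):
    """Predicted BMI from the estimated OLS equation."""
    values = {
        "calorie": calorie,
        "exercise": exercise,
        "income": income,
        "education": education,
    }
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)

# Hypothetical person, for illustration only.
person = {"calorie": 2000, "exercise": 20, "income": 2000, "education": 16}
base = predict_bmi(**person)

# Same person with one more calorie per day, everything else held constant.
plus_one_calorie = predict_bmi(**{**person, "calorie": person["calorie"] + 1})

print(round(base, 4))
print(round(plus_one_calorie - base, 4))
```

The difference between the two predictions equals the calorie coefficient, which is what "holding the other variables constant" means in practice. Because SPSS displays B rounded to three decimals, predictions from these rounded values are approximate.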
Steps on Model Development and Model Selection
First, include the theoretically important variables.
Second, include variables that are strongly associated with the dependent variable. (To identify them, the Pearson r test can be used between each interval-ratio variable and the dependent variable.)
Third, compare the adjusted $R^2$ to determine whether the new independent variables improve the model. At the same time, multicollinearity needs to be checked.
Notes on Regression Model
It is VERY important to have a theory before starting to develop any regression model. If the theory tells you that certain variables are too important to exclude from the model, you should include them even though their estimated coefficients are not significant. (Of course, this is a more conservative way to develop a regression model.)
BMI data
http://courses.washington.edu/urbdp520/UDP520/BMI.sav
As an exercise, you can develop your own conceptual frameworks (theories), create different OLS models, and examine different independent variables.