LO4: Investigate a range of predictive analytic
techniques to discover new knowledge for
forecasting future events
• Agenda
• Linear regression
• Multiple linear regression
• Categorical regression
• Logistic regression
Simple Linear Regression and
Correlation
• Many problems in engineering and science involve
exploring the relationships between two or more
variables. Regression analysis is a statistical technique
that is very useful for these types of problems. For
example, in a chemical process, suppose that the yield
of the product is related to the process-operating
temperature. Regression analysis can be used to build a
model to predict yield at a given temperature level. This
model can also be used for process optimization, such
as finding the level of temperature that maximizes
yield, or for process control purposes.
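The yield-versus-temperature example can be sketched as a least-squares fit. This is a minimal illustration, not the course's worked example: the data values and the helper name fit_simple_linear are hypothetical.

```python
# Hypothetical data for illustration only (not from the source).
temperature = [100, 110, 120, 130, 140]   # process-operating temperature
yield_pct = [45, 52, 54, 63, 66]          # product yield

def fit_simple_linear(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = fit_simple_linear(temperature, yield_pct)

# Use the fitted model to predict yield at a temperature inside the data range.
predicted_yield = intercept + slope * 125
print(f"yield = {intercept:.2f} + {slope:.2f} * temperature")
print(f"predicted yield at 125: {predicted_yield:.2f}")
```

Predictions are most trustworthy inside the range of temperatures that was actually observed.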
HYPOTHESIS TESTS IN SIMPLE LINEAR
REGRESSION (Individual Coefficients)
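A minimal sketch of the t-tests on the individual coefficients, assuming the usual null hypotheses H0: slope = 0 and H0: intercept = 0 and a two-sided 5% test. The data are hypothetical, and the critical value is taken from a standard t-table rather than computed.

```python
import math

# Hypothetical data for illustration only (not from the source).
x = [100, 110, 120, 130, 140]   # temperature
y = [45, 52, 54, 63, 66]        # yield

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx            # slope estimate
b0 = y_bar - b1 * x_bar     # intercept estimate

# Estimate of the error variance: SSE / (n - 2)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sigma2_hat = sse / (n - 2)

# Standard errors and t statistics for H0: coefficient = 0
se_b1 = math.sqrt(sigma2_hat / s_xx)
se_b0 = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / s_xx))
t_slope = b1 / se_b1
t_intercept = b0 / se_b0

T_CRIT = 3.182   # tabulated t(0.025, n - 2 = 3)
print(f"slope:     t0 = {t_slope:.3f}, reject H0: {abs(t_slope) > T_CRIT}")
print(f"intercept: t0 = {t_intercept:.3f}, reject H0: {abs(t_intercept) > T_CRIT}")
```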
Analysis of Variance Approach to
Test Significance of Regression
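The analysis-of-variance approach splits the total variability as SST = SSR + SSE and compares the regression and error mean squares. A minimal sketch on hypothetical data, with the critical value taken from a standard F-table:

```python
# Hypothetical data for illustration only (not from the source).
x = [100, 110, 120, 130, 140]   # temperature
y = [45, 52, 54, 63, 66]        # yield

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# Partition of variability: SST = SSR + SSE
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_regression = ss_total - ss_error

# Mean squares: 1 degree of freedom for regression, n - 2 for error
ms_regression = ss_regression / 1
ms_error = ss_error / (n - 2)
f0 = ms_regression / ms_error

F_CRIT = 10.13   # tabulated f(0.05; 1, n - 2 = 3)
print(f"SSR = {ss_regression:.2f}, SSE = {ss_error:.2f}, SST = {ss_total:.2f}")
print(f"F0 = {f0:.2f}, regression significant: {f0 > F_CRIT}")
```

In simple linear regression this F statistic equals the square of the t statistic for the slope, so the two tests lead to the same conclusion.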
ADEQUACY OF THE REGRESSION
MODEL: Coefficient of Determination (R²)
A model may have a high R² but still be inadequate if
its assumptions are violated: linearity, normality of
residuals, homoscedasticity, independence of errors,
and no multicollinearity (Variance Inflation Factor:
VIF < 5 indicates low to moderate multicollinearity,
VIF > 5 moderate to high).
sample correlation coefficient
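A minimal sketch of the sample correlation coefficient r, computed from the same sums of squares used in the regression fit. The data are hypothetical. In simple linear regression, r squared equals R².

```python
import math

# Hypothetical data for illustration only (not from the source).
x = [100, 110, 120, 130, 140]   # temperature
y = [45, 52, 54, 63, 66]        # yield

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)   # same quantity as SST
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample correlation coefficient: r = Sxy / sqrt(Sxx * Syy)
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2   # equals R² of the simple linear regression of y on x

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```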
MULTIPLE LINEAR REGRESSION
Matrix Approach to Multiple
Linear Regression
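In matrix form the model is y = Xβ + ε, and the least-squares estimate solves the normal equations (X'X)β = X'y. A minimal NumPy sketch on hypothetical data with two predictors:

```python
import numpy as np

# Hypothetical data for illustration only (not from the source).
x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)
y = np.array([6.1, 6.9, 13.2, 13.8, 20.1, 20.9])

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Solve the normal equations (X'X) beta = X'y without forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat       # fitted values
residuals = y - y_hat      # e = y - X beta_hat

print("beta_hat:", np.round(beta_hat, 4))
print("residuals:", np.round(residuals, 4))
```

Solving the linear system directly is generally preferred to computing the inverse of X'X explicitly, since it is more stable numerically.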
Multicollinearity: Variance Inflation Factor (VIF)
VIF < 5: low to moderate
VIF > 5: moderate to high
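A minimal sketch of the VIF, assuming the standard definition VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors. The function name and the data are hypothetical.

```python
import numpy as np

def variance_inflation_factors(predictors):
    """Return VIF_j = 1 / (1 - R_j^2) for each column of `predictors`.

    R_j^2 is obtained by regressing column j on all the other
    columns, with an intercept.
    """
    n, k = predictors.shape
    vifs = []
    for j in range(k):
        target = predictors[:, j]
        others = np.delete(predictors, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        sse = np.sum((target - design @ coef) ** 2)
        sst = np.sum((target - target.mean()) ** 2)
        r2_j = 1 - sse / sst
        vifs.append(1 / (1 - r2_j))
    return np.array(vifs)

# Hypothetical predictors for illustration only (not from the source).
predictors = np.column_stack([[1, 2, 3, 4, 5, 6],
                              [2, 1, 4, 3, 6, 5]]).astype(float)

vifs = variance_inflation_factors(predictors)
for j, vif in enumerate(vifs, start=1):
    level = "low to moderate" if vif < 5 else "moderate to high"
    print(f"x{j}: VIF = {vif:.2f} ({level})")
```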
R-sqr: fit on the training data
R-sqr adj: guards against overfitting
R-sqr pred: estimates fit on new (test) data
PRESS (Prediction Sum of Squares): assesses
how well a regression model will predict new,
unseen data. It uses leave-one-out cross-validation
(LOOCV): each observation i is predicted from a
model fitted without it, and the squared prediction
errors are summed. Note:
R-sqr pred = 1 - PRESS/SST
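The leave-one-out idea behind PRESS can be sketched as an explicit loop that refits the model n times. The data and the helper name fit are hypothetical, and the sketch assumes the standard definition R-sqr pred = 1 - PRESS/SST.

```python
import numpy as np

# Hypothetical data for illustration only (not from the source).
x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)
y = np.array([6.1, 6.9, 13.2, 13.8, 20.1, 20.9])
X = np.column_stack([np.ones_like(x1), x1, x2])
n = len(y)

def fit(design, response):
    """Least-squares coefficients for the given design matrix."""
    return np.linalg.lstsq(design, response, rcond=None)[0]

# PRESS: leave out observation i, refit, predict y[i], sum the squared errors.
press = 0.0
for i in range(n):
    keep = np.arange(n) != i
    beta_without_i = fit(X[keep], y[keep])
    press += (y[i] - X[i] @ beta_without_i) ** 2

# Ordinary R² from the full fit, for comparison.
beta = fit(X, y)
sse = np.sum((y - X @ beta) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = 1 - sse / sst
r2_pred = 1 - press / sst

print(f"SSE = {sse:.4f}, PRESS = {press:.4f}")
print(f"R-sqr = {r2:.4f}, R-sqr pred = {r2_pred:.4f}")
```

PRESS is never smaller than SSE, so R-sqr pred is never larger than R-sqr. A large gap between the two suggests the model may be overfitting.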
Test for Significance of
Regression (Overall Model
significance)
• n: Total number of observations (samples)
• p: Total number of parameters estimated, including the intercept (p = k + 1)
• k: Number of predictor variables (independent variables)
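With n, p and k as defined above, the overall test of H0: all slope coefficients are zero uses F0 = (SSR/k) / (SSE/(n - p)). A minimal sketch on hypothetical data, with the critical value taken from a standard F-table:

```python
import numpy as np

# Hypothetical data for illustration only (not from the source).
x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)
y = np.array([6.1, 6.9, 13.2, 13.8, 20.1, 20.9])
X = np.column_stack([np.ones_like(x1), x1, x2])

n, p = X.shape   # n observations, p parameters including the intercept
k = p - 1        # number of predictor variables

beta = np.linalg.lstsq(X, y, rcond=None)[0]
sse = np.sum((y - X @ beta) ** 2)      # error sum of squares
sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = sst - sse                        # regression sum of squares

# Overall F statistic with k and n - p degrees of freedom
f0 = (ssr / k) / (sse / (n - p))

F_CRIT = 9.55   # tabulated f(0.05; k = 2, n - p = 3)
print(f"n = {n}, p = {p}, k = {k}")
print(f"F0 = {f0:.2f}, model significant: {f0 > F_CRIT}")
```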
R² and Adjusted R²
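For reference, the standard definitions are given here in terms of the regression, error and total sums of squares, with n observations and p estimated parameters (including the intercept):

```latex
R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T},
\qquad
R^2_{\text{adj}} = 1 - \frac{SS_E / (n - p)}{SS_T / (n - 1)}
```

R² never decreases when a predictor is added. Adjusted R² increases only if the new predictor reduces the error mean square.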
MODEL ADEQUACY CHECKING:
Residual Analysis
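A minimal sketch of a numerical residual check on hypothetical data. It computes the residuals and the standardized residuals d_i = e_i / sqrt(MSE), then flags values with |d_i| > 3. That cutoff is a common rule of thumb and is an assumption here, not a value stated in the source.

```python
import math

# Hypothetical data for illustration only (not from the source).
x = [100, 110, 120, 130, 140]   # temperature
y = [45, 52, 54, 63, 66]        # yield

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residuals e_i = y_i - fitted value
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Standardized residuals d_i = e_i / sqrt(MSE), with MSE = SSE / (n - 2)
mse = sum(e ** 2 for e in residuals) / (n - 2)
standardized = [e / math.sqrt(mse) for e in residuals]

OUTLIER_CUTOFF = 3.0   # rule-of-thumb threshold (assumption)
for xi, e, d in zip(x, residuals, standardized):
    flag = "possible outlier" if abs(d) > OUTLIER_CUTOFF else "ok"
    print(f"x = {xi}: residual = {e:+.2f}, standardized = {d:+.2f} ({flag})")
```

In practice these values are usually also plotted against the fitted values and on a normal probability plot to check constant variance and normality.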