MachineLearning Unit II

Uploaded by

aishwaryaslat

Machine Learning

Unit-II
Linear regression
• Regression is essentially finding a relationship (or association)
between the dependent variable (Y) and the independent
variable(s) (X), i.e. finding the function ‘f’ for the
association Y = f(X).
• Linear regression is a statistical model that is used to
predict a continuous dependent variable from one or more
independent variables.
• It is called "linear" because the model is based on the idea
that the relationship between the dependent and independent
variables is linear.
• In a linear regression model, the independent variables are
referred to as the predictors and the dependent variable is
the response/target.
Linear regression
• The goal is to find the "best" line that fits the data. The
"best" line is the one that minimizes the sum of the
squared differences between the observed responses in the
dataset and the responses predicted by the line.
• For example, if you were using linear regression to model
the relationship between the temperature outside and the
number of ice cream cones sold at an ice cream shop, you
could use the model to predict how many ice cream cones
you would sell on a hot day given the temperature
outside.
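The idea of a "best" line can be illustrated with a small sketch. The temperature and sales figures below are hypothetical, as are the two candidate lines; the code only compares their sums of squared differences and does not fit anything.

```python
# Hypothetical data: temperature (deg C) and ice cream cones sold.
temps = [20, 25, 30, 35]
cones = [40, 50, 60, 70]

def sum_squared_error(slope, intercept, xs, ys):
    """Sum of squared differences between observed and predicted responses."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Two candidate lines (assumed for illustration).
sse_line_a = sum_squared_error(2.0, 0.0, temps, cones)   # cones = 2 * temp
sse_line_b = sum_squared_error(1.5, 10.0, temps, cones)  # cones = 1.5 * temp + 10

# The line with the smaller sum of squared differences is the better fit.
print(sse_line_a, sse_line_b)
```

For this made-up data, the first line passes through every point, so its sum of squared differences is smaller than that of the second line.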
Linear regression Cont…
• Linear regression is a supervised learning technique.
The most common regression algorithms are:
1. Simple linear regression
2. Multiple linear regression
3. Polynomial regression
4. Multivariate adaptive regression splines
5. Logistic regression
6. Maximum likelihood estimation (least squares)


Simple linear regression
• Simple linear regression is the simplest regression model
which involves only one predictor. This model assumes a
linear relationship between the dependent variable and
the predictor variable.
• For example, you might use simple linear regression to
model the relationship between the temperature outside
and the number of ice cream cones sold at an ice cream
shop.
• The temperature would be the independent variable and
the number of ice cream cones sold would be the
dependent variable.
Simple linear regression
• The value of the intercept indicates the value of Y when X = 0. It is known
as ‘the intercept’ or ‘Y-intercept’ because it specifies where the straight
line crosses the vertical axis (Y-axis).
• The slope of a straight line represents how much the line in a graph changes
in the vertical direction (Y-axis) over a change in the horizontal
direction (X-axis):
Slope = Change in Y / Change in X
Simple linear regression Cont…

To fit a line to this data, we can use the following equation:
y = a + bx
Where:
➢ y is the dependent variable (the number of ice cream cones sold)
➢ x is the independent variable (the temperature)
➢ a is the y-intercept (the point at which the line crosses the y-axis)
➢ b is the slope of the line
Simple linear regression Cont…
• Example: If we take Price of a Property as the dependent
variable and the Area of the Property (in sq. m.) as the
predictor variable, we can build a model using simple
linear regression.

Price_Property = f(Area_Property)
• Assuming a linear association, we can reformulate the
model as
Price_Property = a + b × Area_Property

• where ‘a’ and ‘b’ are intercept and slope of the straight
line, respectively.
Slope of the simple linear regression model
• Slope of a straight line represents how much the line in a graph
changes in the vertical direction (Y-axis) over a change in the
horizontal direction (X-axis) as shown in Figure 8.2.
Slope = Change in Y/Change in X
• Rise is the change along the Y-axis ($Y_2 - Y_1$) and Run is the change along
the X-axis ($X_2 - X_1$). So, the slope is represented as:
$\text{Slope} = \frac{\text{Rise}}{\text{Run}} = \frac{Y_2 - Y_1}{X_2 - X_1}$
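As a minimal sketch of the rise-over-run calculation, the function below takes two points on a line. The function name and the sample points are assumptions made for illustration.

```python
def slope(x1, y1, x2, y2):
    """Rise over run between the points (x1, y1) and (x2, y2)."""
    run = x2 - x1
    if run == 0:
        raise ValueError("vertical line: slope is undefined")
    rise = y2 - y1
    return rise / run

# Hypothetical points (1, 3) and (4, 9): rise = 6, run = 3.
print(slope(1, 3, 4, 9))
```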
Loss functions
• Once the model is trained and produces predicted outputs, the loss
measures the difference between the predicted values and the actual
data values. Below, $y_i$ is an actual value, $\hat{y}_i$ the
corresponding predicted value, $\bar{y}$ the mean of the actual
values, and $n$ the number of observations.
Types of loss in a linear model
• MAE: The mean absolute error is the average of the absolute
differences between the predicted and actual values.
$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
• MSE: The mean squared error is the average of the squared
differences between the predicted and actual values. It is also
known as the L2 loss.
$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
• RMSE: The root mean squared error is the square root of the L2
loss, i.e. of the MSE, so it is expressed in the same units as Y.
$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
• R-squared: It tells how well the model-predicted line fits the
actual values of the data. For a least squares line evaluated on
its training data the value ranges from 0 to 1, and a value close
to 1 indicates a well-fitted line.
$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
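The four measures can be computed directly from their usual definitions. The following is a small sketch with made-up actual and predicted values; the function name `regression_metrics` is an assumption, not part of any library.

```python
import math

def regression_metrics(actual, predicted):
    """MAE, MSE, RMSE and R-squared for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n     # mean absolute error
    mse = sum(e * e for e in errors) / n      # mean squared error
    rmse = math.sqrt(mse)                     # root of the MSE
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)       # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                  # assumes actual values are not all equal
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# Hypothetical actual and predicted values.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(metrics)
```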
Slope Equation

• Least squares regression is a method which chooses the line in such
a way that the sum of all squared errors is minimized. Here are the
steps you use to calculate the least squares regression.
• First, the formula for calculating the slope m is
$m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
• The lower the error, the smaller the overall deviation of the line
from the original points.
Ordinary Least Squares (OLS) algorithm
Step 1: Calculate the mean of X and the mean of Y
Step 2: Calculate the deviations of X and Y from their means
Step 3: Get the product of the two deviations for each observation
Step 4: Get the summation of the products
Step 5: Square the deviation of X for each observation
Step 6: Get the sum of the squared deviations
Step 7: Divide the output of step 4 by the output of step 6 to
calculate ‘b’ (the slope)
Step 8: Calculate ‘a’ (the intercept) using the value of ‘b’:
a = mean of Y − b × mean of X
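The eight steps above can be sketched in plain Python as follows. The function name `ols_fit` and the sample data are assumptions for illustration; ‘a’ is the intercept and ‘b’ the slope, as in the steps.

```python
def ols_fit(xs, ys):
    """Simple linear regression by ordinary least squares; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n                                   # Step 1
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]                       # Step 2
    dev_y = [y - mean_y for y in ys]
    products = [dx * dy for dx, dy in zip(dev_x, dev_y)]   # Step 3
    sum_products = sum(products)                           # Step 4
    squared_dev_x = [dx ** 2 for dx in dev_x]              # Step 5
    sum_squared_dev_x = sum(squared_dev_x)                 # Step 6
    if sum_squared_dev_x == 0:
        raise ValueError("all X values are equal: slope is undefined")
    b = sum_products / sum_squared_dev_x                   # Step 7: slope
    a = mean_y - b * mean_x                                # Step 8: intercept
    return a, b

# Hypothetical data, not the exercise data from the slides.
a, b = ols_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(a, b)
```

For this made-up data, the sum of products is 6 and the sum of squared deviations of X is 10, so the slope is 0.6 and the intercept is 4 − 0.6 × 3 = 2.2 (up to floating-point rounding).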
Exercise Problem
• A college professor believes that if the grade for
internal examination is high in a class, the grade for
external examination will also be high. A random
sample of 15 students in that class was selected, and
the data is given below:
• Solution
Maximum and minimum points of curves
• Maximum and minimum points on a graph are found at points where
the slope of the curve is zero.
• A maximum point is a point on the curve with a slope of zero and
the highest y-coordinate among the neighbouring points.
• A minimum point is a point on the curve with a slope of zero and
the lowest y-coordinate among the neighbouring points.
[Figures: Maximum point; Minimum point]
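A quadratic curve makes the zero-slope condition easy to check. The sketch below assumes a curve of the form f(x) = ax² + bx + c (here a, b and c are just the quadratic's coefficients, unrelated to the regression parameters), whose slope 2ax + b is zero at x = −b / (2a).

```python
def quadratic_turning_point(a, b, c):
    """Turning point of f(x) = a*x**2 + b*x + c, where the slope 2*a*x + b is zero."""
    if a == 0:
        raise ValueError("not a quadratic: a straight line has no turning point")
    x = -b / (2 * a)                  # solve 2*a*x + b = 0
    y = a * x ** 2 + b * x + c
    kind = "minimum" if a > 0 else "maximum"  # opens upward vs. downward
    return x, y, kind

print(quadratic_turning_point(1, -4, 1))   # f(x) = x^2 - 4x + 1
print(quadratic_turning_point(-1, 2, 3))   # f(x) = -x^2 + 2x + 3
```

The same idea underlies least squares: the sum of squared errors is a quadratic function of the coefficients, and its minimum lies where the slope with respect to each coefficient is zero.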
Multiple Linear Regression
• In a multiple regression model, two or more independent
variables, i.e. predictors, are involved.
• Example: A model which can predict the value of a real estate
property from certain standard inputs such as area (sq. m.) of the
property, location, floor, number of years since purchase, and
amenities available, used as independent variables.
• We can form a multiple regression equation as shown below:
Price_Property = f(Area_Property, location, floor, Ageing, Amenities)
• The following expression describes the relationship with two
predictor variables, namely X1 and X2:
$\hat{Y} = a + b_1 X_1 + b_2 X_2$
Multiple Linear Regression
• The model describes a plane in the three-dimensional
space of Ŷ, X1 , and X2 . Parameter ‘a’ is the intercept of
this plane. Parameters ‘b1’ and ‘b2’ are referred to as
partial regression coefficients.
• Parameter b1 represents the change in the mean response
corresponding to a unit change in X1 when X2 is held
constant.
• Parameter b2 represents the change in the mean response
corresponding to a unit change in X2 when X1 is held
constant.
Multiple Linear Regression
• Consider the following example of a multiple linear
regression model with two predictor variables, namely
X1 and X2.
[Figure: Multiple regression plane]


Multiple Linear Regression
• The multiple regression estimating equation when
there are ‘n’ predictor variables is as follows:
$\hat{Y} = a + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$
• While finding the best fit, we can also fit a curve
instead of a straight line. This is known as
polynomial or curvilinear regression.
Use the following steps to fit a multiple
linear regression model
Step 1: Calculate X1², X2², X1y, X2y and X1X2 for each observation.
Step 2: Calculate the regression sums.
Step 3: Calculate b0, b1, and b2.
Step 4: Place b0, b1, and b2 in the estimated linear
regression equation.
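The four steps can be sketched for the two-predictor case as below. The regression-sum formulas used here are the standard hand-calculation formulas for two predictors and are an assumption about what the original worked slides showed; the function name and the data are hypothetical.

```python
def fit_two_predictors(x1, x2, y):
    """Least squares fit of y = b0 + b1*x1 + b2*x2 using regression sums."""
    n = len(y)
    sum_x1, sum_x2, sum_y = sum(x1), sum(x2), sum(y)
    # Step 1: squares and cross-products per observation, then summed.
    sum_x1_sq = sum(v * v for v in x1)
    sum_x2_sq = sum(v * v for v in x2)
    sum_x1y = sum(u * v for u, v in zip(x1, y))
    sum_x2y = sum(u * v for u, v in zip(x2, y))
    sum_x1x2 = sum(u * v for u, v in zip(x1, x2))
    # Step 2: regression sums (sums of squares and products about the means).
    s11 = sum_x1_sq - sum_x1 ** 2 / n
    s22 = sum_x2_sq - sum_x2 ** 2 / n
    s1y = sum_x1y - sum_x1 * sum_y / n
    s2y = sum_x2y - sum_x2 * sum_y / n
    s12 = sum_x1x2 - sum_x1 * sum_x2 / n
    # Step 3: coefficients.
    denom = s11 * s22 - s12 ** 2
    if denom == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / denom
    b2 = (s11 * s2y - s12 * s1y) / denom
    b0 = sum_y / n - b1 * sum_x1 / n - b2 * sum_x2 / n
    return b0, b1, b2

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 with no noise.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]
b0, b1, b2 = fit_two_predictors(x1, x2, y)

# Step 4: place the coefficients in the estimated regression equation.
def predict(v1, v2):
    return b0 + b1 * v1 + b2 * v2

print(b0, b1, b2, predict(2, 2))
```

Because the sample data lies exactly on a plane, the recovered coefficients match the generating values up to floating-point rounding.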
Assumptions in Regression Analysis
1. Linear relationship between the features and the target
2. Little or no multicollinearity between the features
3. Normal distribution of the error terms
4. Little or no autocorrelation among the residuals
5. Homoscedasticity of the errors, i.e. the variance of
the residuals must be constant across the predicted
values
Improving Accuracy of the Linear Regression Model

• Accuracy refers to how close the estimate is to
the actual value.
• Prediction refers to how consistently the value is
estimated, i.e. how close the estimates are to each other.
Bias and variance are analogous to accuracy and prediction
• High bias = low accuracy (not close to real value)
• High variance = low prediction (values are scattered)
• Low bias = high accuracy (close to real value)
• Low variance = high prediction (values are close to each
other)
Improving Accuracy of the Linear Regression Model

• When a regression model is both highly accurate and highly
predictive, its overall error is low, implying low bias
(high accuracy) and low variance (high prediction). This is
highly preferable.
• Conversely, if the variance increases (low prediction), the
spread of our data points increases, which results in less
accurate prediction. As the bias increases (low accuracy),
the error between our predicted values and the observed
values increases.
• Balancing out bias and variance is essential in a
regression model.
Improving Accuracy of the Linear Regression Model

• In the linear regression model, it is assumed that the
number of observations (n) is greater than the number
of parameters (k) to be estimated, i.e. n > k. In that
case, the least squares estimates tend to have low
variance and hence will perform well on test
observations.
• However, if the number of observations (n) is not much
larger than the number of parameters (k), there can be high
variability in the least squares fit, resulting in
overfitting and poor predictions.
• If k > n, the least squares estimates are no longer unique,
so ordinary linear regression is not usable.
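A tiny sketch shows why k > n is a problem: with fewer observations than parameters, different sets of coefficients can fit the data equally well. The two observations and both coefficient sets below are made up for illustration.

```python
# Hypothetical data: n = 2 observations, k = 3 parameters (a, b1, b2).
observations = [(1, 2, 5), (2, 1, 4)]  # each row is (x1, x2, y)

def sum_squared_error(a, b1, b2, rows):
    """Sum of squared errors of the plane y = a + b1*x1 + b2*x2."""
    return sum((y - (a + b1 * x1 + b2 * x2)) ** 2 for x1, x2, y in rows)

# Two different coefficient sets, both chosen by hand.
sse_first = sum_squared_error(0, 1, 2, observations)
sse_second = sum_squared_error(3, 0, 1, observations)

# Both reach zero error, so least squares cannot single out one answer.
print(sse_first, sse_second)
```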
Improving Accuracy of the Linear Regression Model

• Accuracy of linear regression can be improved using
the following three methods:
1. Shrinkage approach
2. Subset selection
3. Dimensionality (variable) reduction


Shrinkage (Regularization) approach
• This approach involves fitting a model involving all predictors.
However, the estimated coefficients are shrunken towards
zero relative to the least squares estimates.
• This shrinkage (also known as regularization) has the effect of
reducing the overall variance. Some of the coefficients may
also be estimated to be exactly zero, thereby indirectly
performing variable selection.
• The two best-known techniques for shrinking the regression
coefficients towards zero are
1. ridge regression
2. lasso (Least Absolute Shrinkage and Selection Operator)
Shrinkage (Regularization) approach
1. Ridge regression: It corrects over-fitted models by adding
a penalty equivalent to the sum of the squares of the
coefficients, scaled by a regularization parameter λ, to the
least squares objective:
$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{k} b_j^2$
Ridge regression performs regularization by shrinking the
coefficients towards zero.
Ridge Regression
• Ridge regression decreases the complexity of a model but does not
reduce the number of variables, since it never sets a coefficient to
exactly zero; it only shrinks it.
• As the regularization parameter increases, the values of the
coefficients tend towards zero. This lowers the variance (the
prediction depends less heavily on any particular variable) at the
cost of some increase in bias.
• Ridge regression is not good for feature reduction.
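The shrinking effect can be seen in the simplest case of one predictor. The sketch below assumes the intercept is left unpenalized, in which case the ridge slope is the least squares slope with the regularization parameter added to its denominator; the function name and data are hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Ridge slope for one predictor: minimizes sum((y - a - b*x)**2) + lam * b**2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / (sxx + lam)  # lam = 0 gives the ordinary least squares slope

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slopes = [ridge_slope(xs, ys, lam) for lam in (0, 10, 90)]
print(slopes)
```

The slope gets smaller as the regularization parameter grows, but for any finite value it stays non-zero, which is why ridge regression does not remove variables.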


Shrinkage (Regularization) approach
2. Lasso (Least Absolute Shrinkage and Selection Operator): It
corrects over-fitted models by adding a penalty equivalent to
the sum of the absolute values of the coefficients, scaled
by a regularization parameter.
Lasso Regression
• If the regularization parameter is sufficiently high, lasso
regression shrinks the coefficients of less important features to
exactly 0, so it can be used to select the important features of a
dataset.
• If the number of features (p) is greater than the number of
observations (n), lasso will pick at most n features as non-zero, even
if all features are relevant.
• The difference between ridge and lasso regression is that lasso
can set coefficients to exactly zero, whereas ridge
never sets the value of a coefficient to exactly zero.
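For a single predictor the lasso solution has a closed form known as soft-thresholding, which makes the "exactly zero" behaviour visible. The sketch assumes an unpenalized intercept and the objective written in the docstring; the function name and data are hypothetical.

```python
def lasso_slope(xs, ys, lam):
    """Lasso slope for one predictor: minimizes sum((y - a - b*x)**2) + lam * abs(b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # Soft-thresholding: shrink |sxy| by lam / 2 and clip at zero.
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * shrunk / sxx

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slopes = [lasso_slope(xs, ys, lam) for lam in (0, 4, 12, 20)]
print(slopes)
```

Unlike the ridge penalty, which only rescales a coefficient, the absolute-value penalty drives this slope to exactly zero once the regularization parameter is large enough.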
Subset selection
• Identify a subset of the predictors that is assumed to
be related to the response, and then fit a model
using OLS on the selected reduced subset of
variables.
• There are two methods by which the subset of
predictors can be selected:
1. Best subset selection (considers all 2^k possible subsets of the k predictors)
2. Stepwise subset selection
i. Forward stepwise selection (adds predictors one at a time, from 0 to k)
ii. Backward stepwise selection (removes predictors one at a time, from k to 0)
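The difference in search effort between the two methods can be sketched by counting candidate models. The predictor names below are hypothetical, and the code only enumerates subsets; it does not fit any model.

```python
from itertools import combinations

def all_subsets(predictors):
    """Every subset of the predictors, from the empty set to the full set."""
    subsets = []
    for size in range(len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets

def forward_stepwise_model_count(k):
    """Models examined by forward stepwise selection: the empty model,
    then k candidates, then k - 1, and so on down to 1."""
    return 1 + k * (k + 1) // 2

predictors = ["area", "location", "floor"]  # hypothetical names, k = 3
subsets = all_subsets(predictors)
print(len(subsets), forward_stepwise_model_count(len(predictors)))
```

Best subset selection grows as 2^k, so it quickly becomes impractical as k increases, while stepwise selection examines far fewer models at the risk of missing the best subset.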
Dimensionality reduction (Variable reduction)

• In dimensionality reduction, predictors (X) are
transformed, and the model is set up using the
transformed variables after dimensionality
reduction.
• The number of variables is reduced using the
dimensionality reduction method.
• Principal component analysis is one of the most
important dimensionality (variable) reduction
techniques.
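As a minimal sketch of principal component analysis, the code below reduces two correlated predictors to one transformed variable. It handles only the two-variable case, using the closed-form eigenvalue of a 2 × 2 covariance matrix, and assumes the data has non-zero variance; the function name and data are hypothetical.

```python
import math

def first_principal_component(points):
    """Direction, scores and explained-variance share of the first component (2-D data)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix.
    largest = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    # Matching eigenvector.
    if sxy != 0:
        vx, vy = sxy, largest - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)
    # Transformed variable: projection of each centered point onto the direction.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    explained = largest / (sxx + syy)
    return direction, scores, explained

# Hypothetical predictors that lie exactly on the line x2 = 2 * x1.
direction, scores, explained = first_principal_component([(1, 2), (2, 4), (3, 6), (4, 8)])
print(direction, scores, explained)
```

Because the two made-up predictors are perfectly correlated, a single component captures all of the variance, and a regression model could use `scores` in place of the two original variables.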
Thank You
