Machine Learning
Unit-II
               Linear regression
• Regression is essentially finding a relationship, or
  association, between the dependent variable (Y) and the
  independent variable(s) (X), i.e. finding the function ‘f’ in
  the association Y = f(X).
• Linear regression is a statistical model that is used to
  predict a continuous dependent variable from one or more
  independent variables.
• It is called "linear" because the model is based on the idea
  that the relationship between the dependent and independent
  variables is linear.
• In a linear regression model, the independent variables are
  referred to as the predictors and the dependent variable is
  the response/target.
• The goal is to find the "best" line that fits the data. The
  "best" line is the one that minimizes the sum of the
  squared differences between the observed responses in the
  dataset and the responses predicted by the line.
• For example, if you were using linear regression to model
  the relationship between the temperature outside and the
  number of ice cream cones sold at an ice cream shop, you
  could use the model to predict how many ice cream cones
  you would sell on a hot day given the temperature
  outside.
• Linear regression is a supervised learning technique.
The most common regression algorithms are:
  1. Simple linear regression
  2. Multiple linear regression
  3. Polynomial regression
  4. Multivariate adaptive regression splines
  5. Logistic regression
  6. Maximum likelihood estimation (equivalent to least squares
     when the errors are normally distributed)
          Simple linear regression
• Simple linear regression is the simplest regression model
  which involves only one predictor. This model assumes a
  linear relationship between the dependent variable and
  the predictor variable.
• For example, you might use simple linear regression to
  model the relationship between the temperature outside
  and the number of ice cream cones sold at an ice cream
  shop.
• The temperature would be the independent variable and
  the number of ice cream cones sold would be the
  dependent variable.
• The intercept is the value of Y when X = 0. It is known as
  ‘the intercept’ or ‘Y-intercept’ because it specifies where the
  straight line crosses the vertical (Y) axis.
To fit a line to the ice cream data, we can use the following equation:
                 y = a + bx
Where:
➢ y is the dependent variable (the number of ice cream cones sold)
➢ x is the independent variable (the temperature)
➢ a is the y-intercept (the point at which the line crosses the y-axis)
➢ b is the slope of the line
• Example: If we take Price of a Property as the dependent
  variable and the Area of the Property (in sq. m.) as the
  predictor variable, we can build a model using simple
  linear regression.
              Price_{Property} = f(Area_{Property})
• Assuming a linear association, we can reformulate the
  model as
              Price_{Property} = a + b \cdot Area_{Property}
• where ‘a’ and ‘b’ are intercept and slope of the straight
  line, respectively.
    Slope of the simple linear regression model
• Slope of a straight line represents how much the line in a graph
  changes in the vertical direction (Y-axis) over a change in the
  horizontal direction (X-axis) as shown in Figure 8.2.
                     Slope = Change in Y/Change in X
• Rise is the change along the Y-axis (Y2 − Y1) and Run is the change
  along the X-axis (X2 − X1). So, slope is represented as given below:
                 Slope = \frac{Rise}{Run} = \frac{Y_2 - Y_1}{X_2 - X_1}
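As a quick worked example, take two hypothetical points on a line, (X1, Y1) = (1, 3) and (X2, Y2) = (4, 9); the numbers are chosen only for illustration:

```latex
\text{Slope} = \frac{\text{Rise}}{\text{Run}} = \frac{Y_2 - Y_1}{X_2 - X_1} = \frac{9 - 3}{4 - 1} = \frac{6}{3} = 2
```

So Y rises by 2 units for every 1-unit increase in X along this line.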
                   Loss functions
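• Once the model is trained and produces predicted outputs,
  the loss measures the difference between the predicted
  values and the actual data values.
Types of loss in a linear model
MAE - the average of the absolute differences between the
  predicted and actual values. It is called the mean absolute
  error (MAE):
                 MAE = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|
  where y_i is the actual value, \hat{y}_i the predicted value
  and n the number of observations.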
MSE - the average of the squared differences between the
  predicted and actual values. It is also known as the mean
  squared error (MSE). The formula of MSE loss is shown below.
                 MSE = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
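RMSE - the root mean squared error (RMSE) is the square root
  of the L2 loss, i.e. of the MSE, so it expresses the error in
  the same units as the target. The formula of RMSE is shown below.
                 RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}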
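• R-squared (coefficient of determination): it measures how well
  the model's predicted line fits the actual values of the data.
  For a least squares line with an intercept, evaluated on the data
  it was fitted to, the value ranges from 0 to 1, and a value close
  to 1 indicates a well-fitted line. The formula is shown below.
                 R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
  where \bar{y} is the mean of the actual values.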
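The four loss measures above can be computed directly from paired actual and predicted values. This is a minimal plain-Python sketch; the function name `loss_metrics` and the sample values are hypothetical, chosen only for illustration.

```python
import math


def loss_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R-squared for paired actual/predicted values."""
    n = len(y_true)
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]  # y_i - y_hat_i

    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mse = sum(e * e for e in errors) / n           # mean squared error (L2 loss)
    rmse = math.sqrt(mse)                          # root mean squared error

    y_mean = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)    # total sum of squares
    r2 = 1 - ss_res / ss_tot                             # coefficient of determination

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical actual and predicted values
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 10.0]
print(loss_metrics(y_true, y_pred))
```

Here the errors are 0.5, 0, −0.5 and −1, so MAE = 0.5, MSE = 0.375 and R² = 1 − 1.5/20 = 0.925. Note how the single error of 1 dominates the MSE more than it does the MAE, because squaring penalizes large errors more heavily.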
                  Slope Equation
• Least squares regression is a method that fits the line by
  minimizing the sum of all the squared errors (residuals).
  Here are the steps used to calculate the least squares
  regression line.
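• First, the formula for calculating m = slope is
                 m = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - \left(\sum x\right)^2}
  where N is the number of data points.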
• The lower the error, the smaller the overall deviation of
  the line from the original data points.
  Ordinary Least Squares (OLS) algorithm
Step 1: Calculate the means of X and Y (X̄ and Ȳ)
Step 2: Calculate the deviations of X and Y from their means,
       (X − X̄) and (Y − Ȳ)
Step 3: Multiply the deviations: (X − X̄)(Y − Ȳ)
Step 4: Sum the products
Step 5: Square the deviations of X: (X − X̄)²
Step 6: Sum the squared deviations
Step 7: Divide the output of step 4 by the output of step 6 to
       calculate the slope ‘b’
Step 8: Calculate the intercept ‘a’ using the value of ‘b’:
       a = Ȳ − bX̄
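The eight OLS steps translate almost line for line into code. This is a minimal plain-Python sketch for one predictor; the function name `ols_simple` and the data are hypothetical, and the intercept formula in Step 8 assumes, as OLS guarantees, that the fitted line passes through the point of means.

```python
def ols_simple(xs, ys):
    """Fit y = a + b*x by ordinary least squares, following Steps 1-8."""
    n = len(xs)
    # Step 1: means of X and Y
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Step 2: deviations of X and Y from their means
    dx = [x - x_mean for x in xs]
    dy = [y - y_mean for y in ys]
    # Steps 3-4: multiply the deviations and sum the products
    sum_products = sum(p * q for p, q in zip(dx, dy))
    # Steps 5-6: square the deviations of X and sum them
    sum_squares = sum(d * d for d in dx)
    # Step 7: slope b
    b = sum_products / sum_squares
    # Step 8: intercept a, since the line passes through (x_mean, y_mean)
    a = y_mean - b * x_mean
    return a, b


# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = ols_simple(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")
```

For this data the sum of products is 6 and the sum of squared X deviations is 10, so the fitted line is y = 2.2 + 0.6x.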
                Exercise Problem
• A college professor believes that if the grade for
  internal examination is high in a class, the grade for
  external examination will also be high. A random
  sample of 15 students in that class was selected, and
  the data is given below:
• Solution
  Maximum and minimum point of curves
• Maximum and minimum points on a graph are
  found at points where the slope of the curve is zero.
• The maximum point is the point on the curve of the
  graph with the highest y-coordinate and a slope of
  zero.
• The minimum point is the point on the curve of the
  graph with the lowest y-coordinate and a slope of
  zero.
[Figure: curves showing a maximum point and a minimum point]
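A short worked example of the zero-slope condition, using two simple curves chosen only for illustration. The second derivative, which is not mentioned above, is one common way to tell the two cases apart:

```latex
f(x) = x^2 - 4x + 5,\quad f'(x) = 2x - 4 = 0 \;\Rightarrow\; x = 2,\quad f(2) = 1,\quad f''(x) = 2 > 0 \;\Rightarrow\; \text{minimum point } (2, 1)

g(x) = -x^2 + 6x,\quad g'(x) = -2x + 6 = 0 \;\Rightarrow\; x = 3,\quad g(3) = 9,\quad g''(x) = -2 < 0 \;\Rightarrow\; \text{maximum point } (3, 9)
```

The same idea underlies least squares: the slope and intercept are found at the minimum of the sum of squared errors, where its derivatives with respect to each of them are zero.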
           Multiple Linear Regression
• In a multiple regression model, two or more independent
  variables, i.e. predictors, are involved.
• Example: a model that predicts the value of a real estate
  property from standard inputs such as the area (sq. m.) of the
  property, location, floor, number of years since purchase and
  amenities available, used as independent variables.
• We can form a multiple regression equation as shown below:
   Price_{Property} = f(Area_{Property}, location, floor, Ageing, Amenities)
• The following expression describes the relationship with two
  predictor variables, namely X1 and X2:
                 \hat{Y} = a + b_1 X_1 + b_2 X_2
• The model describes a plane in the three-dimensional
  space of Ŷ, X1 and X2. Parameter ‘a’ is the intercept of
  this plane. Parameters ‘b1’ and ‘b2’ are referred to as
  partial regression coefficients.
• Parameter b1 represents the change in the mean response
  corresponding to a unit change in X1 when X2 is held
  constant.
• Parameter b2 represents the change in the mean response
  corresponding to a unit change in X2 when X1 is held
  constant.
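The "held constant" interpretation of b1 and b2 can be checked numerically. In this minimal sketch the coefficients a, b1, b2 and the input values are hypothetical, chosen only to illustrate the interpretation:

```python
# Hypothetical coefficients, chosen only to illustrate the interpretation
a, b1, b2 = 5.0, 2.0, -1.5


def predict(x1, x2):
    """Estimated mean response Y_hat = a + b1*X1 + b2*X2."""
    return a + b1 * x1 + b2 * x2


base = predict(10.0, 4.0)
change_x1 = predict(11.0, 4.0) - base  # X1 up by one unit, X2 held constant
change_x2 = predict(10.0, 5.0) - base  # X2 up by one unit, X1 held constant
print(change_x1, change_x2)
```

The printed changes are exactly b1 and b2 (2.0 and −1.5), whatever starting values are chosen, because the model is linear in each predictor.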
• Consider the following example of a multiple linear
  regression model with two predictor variables, namely
  X1 and X2
[Figure: Multiple regression plane]
• The estimating equation for multiple regression when there
  are ‘n’ predictor variables is as follows:
                 \hat{Y} = a + b_1 X_1 + b_2 X_2 + \dots + b_n X_n
• When a straight line does not fit the data well, we can
  instead fit a polynomial or another curved function. These
  are known as polynomial regression and curvilinear
  regression, respectively.
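Polynomial regression is still fitted by least squares; only the predictor is expanded into powers (x, x², ...). A minimal sketch using NumPy's `np.polyfit`, on hypothetical data generated exactly from y = 1 + 2x + 0.5x²:

```python
import numpy as np

# Hypothetical data that follow a curve rather than a straight line
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 1 + 2 * x + 0.5 * x**2

# Least squares fit of a degree-2 polynomial; coefficients are returned
# from the highest power down: [c2, c1, c0] for c2*x^2 + c1*x + c0
coeffs = np.polyfit(x, y, deg=2)
print(np.round(coeffs, 6))

# Compare with the best straight line (degree 1) on the same data
line = np.polyfit(x, y, deg=1)
sse_line = np.sum((np.polyval(line, x) - y) ** 2)
sse_quad = np.sum((np.polyval(coeffs, x) - y) ** 2)
print(sse_line, sse_quad)
```

The quadratic recovers the coefficients 0.5, 2 and 1 and leaves essentially zero squared error, while the best straight line cannot follow the curvature.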
  Use the following steps to fit a multiple
         linear regression model
Step 1: Calculate X1², X2², X1y, X2y and X1X2
Step 2: Calculate Regression Sums.
Step 3: Calculate b0, b1, and b2.
Step 4: Place b0, b1, and b2 in the estimated linear
  regression equation.
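The four steps can be sketched in plain Python for two predictors. The "regression sums" of Step 2 are taken here to be the sums of squares and cross-products about the means (e.g. Σx1² = ΣX1² − (ΣX1)²/n), which is one standard way to carry out this hand calculation; the function name and the data are hypothetical.

```python
def fit_two_predictors(x1, x2, y):
    """Estimate y_hat = b0 + b1*x1 + b2*x2 with the regression-sums method."""
    n = len(y)
    # Step 1: raw sums, including X1^2, X2^2, X1*y, X2*y and X1*X2
    s_x1, s_x2, s_y = sum(x1), sum(x2), sum(y)
    s_x1x1 = sum(v * v for v in x1)
    s_x2x2 = sum(v * v for v in x2)
    s_x1y = sum(u * v for u, v in zip(x1, y))
    s_x2y = sum(u * v for u, v in zip(x2, y))
    s_x1x2 = sum(u * v for u, v in zip(x1, x2))

    # Step 2: regression sums (sums of squares and cross-products about the means)
    r_x1x1 = s_x1x1 - s_x1**2 / n
    r_x2x2 = s_x2x2 - s_x2**2 / n
    r_x1y = s_x1y - s_x1 * s_y / n
    r_x2y = s_x2y - s_x2 * s_y / n
    r_x1x2 = s_x1x2 - s_x1 * s_x2 / n

    # Step 3: coefficients b1, b2 and intercept b0
    denom = r_x1x1 * r_x2x2 - r_x1x2**2
    b1 = (r_x2x2 * r_x1y - r_x1x2 * r_x2y) / denom
    b2 = (r_x1x1 * r_x2y - r_x1x2 * r_x1y) / denom
    b0 = s_y / n - b1 * s_x1 / n - b2 * s_x2 / n
    return b0, b1, b2


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [9, 8, 19, 18, 26]
# Step 4: place b0, b1 and b2 in the estimated regression equation
b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"y_hat = {b0:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")
```

Because the hypothetical y values were built from known coefficients, the method should recover b0 = 1, b1 = 2 and b2 = 3, which is a convenient check on a hand calculation.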
  Assumptions in Regression Analysis
1. Linear relationship between the features and target
2. Little or no multicollinearity between the features
3. Normal Distribution of error terms
4. Little or no autocorrelation among residuals
5. Homoscedasticity of the errors i.e., the variance of
   the residuals must be constant across the predicted
   values
  Improving Accuracy of the Linear Regression Model
• Accuracy refers to how close the estimate is to the actual
  value.
• Prediction refers to how consistently the value is estimated,
  i.e. how close repeated estimates are to each other.
Bias and variance are analogous to accuracy and prediction:
• High bias = low accuracy (not close to the real value)
• High variance = low prediction (values are scattered)
• Low bias = high accuracy (close to the real value)
• Low variance = high prediction (values are close to each
  other)
• For a regression model that is highly accurate and highly
  predictive, the overall error of the model will be low,
  implying low bias (high accuracy) and low variance (high
  prediction); this is highly preferable.
• If the variance increases (low prediction), the spread of
  the predicted values increases, which results in less
  accurate prediction. As the bias increases (low accuracy),
  the error between the predicted values and the observed
  values increases.
• Balancing bias and variance is essential in a
  regression model.
• In the linear regression model, it is assumed that the
  number of observations (n) is greater than the number
  of parameters (k) to be estimated, i.e. n > k. When n is
  much larger than k (n ≫ k), the least squares estimates
  tend to have low variance and hence perform well on test
  observations.
• However, if the number of observations (n) is not much
  larger than the number of parameters (k), there can be high
  variability in the least squares fit, resulting in overfitting
  and poor predictions.
• If k > n, the least squares estimates are no longer unique,
  so ordinary linear regression cannot be used.
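A small NumPy sketch of why k > n breaks ordinary least squares: the k × k matrix XᵀX that least squares must invert has rank at most n, so infinitely many coefficient vectors fit the data exactly. The matrix and target values are hypothetical.

```python
import numpy as np

# Hypothetical design: n = 3 observations, k = 5 predictors (k > n)
X = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [0.0, 1.0, 0.0, 1.0, 0.0],
              [2.0, 0.0, 1.0, 0.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

# Least squares needs to invert the k x k matrix X^T X, but its rank is at most n
gram = X.T @ X
print(gram.shape, np.linalg.matrix_rank(gram))

# One exact-fit solution (the minimum-norm one returned by lstsq) ...
b_min, *_ = np.linalg.lstsq(X, y, rcond=None)
# ... plus any vector v with X @ v = 0 gives another exact fit
_, _, Vt = np.linalg.svd(X)   # full 5 x 5 Vt; its last rows span the null space of X
v = Vt[-1]
b_other = b_min + 10.0 * v
print(np.allclose(X @ b_min, y), np.allclose(X @ b_other, y))
```

Both coefficient vectors reproduce y perfectly yet differ substantially, which is why the data alone cannot pin down a unique least squares solution here.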
• Accuracy of linear regression can be improved using
  the following three methods:
 1. Shrinkage Approach
 2. Subset Selection
 3. Dimensionality (Variable) Reduction
      Shrinkage (Regularization) approach
• This approach involves fitting a model involving all predictors.
  However, the estimated coefficients are shrunken towards
  zero relative to the least squares estimates.
• This shrinkage (also known as regularization) has the effect of
  reducing the overall variance. Some of the coefficients may
  also be estimated to be exactly zero, thereby indirectly
  performing variable selection.
• The two best-known techniques for shrinking the regression
  coefficients towards zero are
  1. ridge regression
  2. lasso (Least Absolute Shrinkage and Selection Operator)
1. Ridge Regression: It reduces over-fitting by adding to the
   least squares loss a penalty equal to the sum of the squares
   of the magnitudes of the coefficients, scaled by the
   regularization parameter λ ≥ 0:
                 \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{k} b_j^2
Ridge regression performs regularization by shrinking the
    coefficients.
                Ridge Regression
• Ridge regression decreases the complexity of a model but does not
 reduce the number of variables, since it never shrinks a coefficient
 to exactly zero; it only shrinks it towards zero.
• As the regularization parameter increases, the coefficients tend
 towards zero. This lowers the variance (coefficients with little
 effect on the prediction are shrunk, so the prediction depends less
 on any particular variable) at the cost of some increase in bias.
• Ridge regression is not good for feature reduction.
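A minimal NumPy sketch of ridge regression using its closed-form solution b = (XᵀX + λI)⁻¹Xᵀy on centred data, so that the intercept is not penalized. The function name `ridge_fit`, the λ values and the small data set are hypothetical; the data are built so that the ordinary least squares coefficients are exactly 3, 0.5 and −2.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Minimise sum((y - a - X @ b)**2) + lam * sum(b**2); a is not penalised."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centre so the intercept drops out
    k = X.shape[1]
    # Closed form: b = (Xc^T Xc + lam * I)^(-1) Xc^T yc, solved without an explicit inverse
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(k), Xc.T @ yc)
    a = y_mean - x_mean @ b
    return a, b


# Hypothetical data: y = 10 + 3*x1 + 0.5*x2 - 2*x3 exactly
X = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]])
y = np.array([11.5, 14.5, 9.5, 4.5])

for lam in [0.0, 4.0, 40.0]:
    a, b = ridge_fit(X, y, lam)
    print(f"lambda={lam:5.1f}  a={a:.3f}  b={np.round(b, 3)}")
```

With λ = 4 the coefficients halve to 1.5, 0.25 and −1, and they keep shrinking as λ grows, but none of them becomes exactly zero, which is why ridge alone does not remove features.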
2. lasso (Least Absolute Shrinkage and Selection Operator): It
  reduces over-fitting by adding a penalty equal to the sum of
  the absolute values of the coefficients, scaled by the
  regularization parameter λ:
                 \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{k} |b_j|
                      Lasso Regression
• In lasso regression, a sufficiently high regularization parameter
  shrinks the coefficients of less important features to exactly 0,
  so lasso can be used to select the important features of a dataset.
• If the number of features (p) is greater than the number of
  observations (n), lasso will pick at most n features as non-zero, even
  if all features are relevant.
• The difference between ridge and lasso regression is that lasso
  tends to set coefficients to exactly zero, whereas ridge never sets
  a coefficient to exactly zero.
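Lasso has no closed-form solution in general, so this NumPy sketch uses cyclic coordinate descent with soft-thresholding, one standard way to fit it (the slides do not name a solver). It assumes the squared-error term is scaled by 1/(2n), a common convention that changes the numerical meaning of λ compared with the unscaled penalty form. The function names and the data are hypothetical; the predictor columns are orthogonal so the results are easy to check by hand.

```python
import numpy as np


def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; anything inside [-lam, lam] becomes exactly 0."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)


def lasso_fit(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for
    (1 / (2n)) * sum((y - a - X @ b)**2) + lam * sum(|b|); a is not penalised."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    b = np.zeros(k)
    residual = yc.copy()                      # yc - Xc @ b, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(k):
            xj = Xc[:, j]
            z = xj @ xj / n
            if z == 0.0:                      # constant column: leave coefficient at 0
                continue
            rho = xj @ (residual + xj * b[j]) / n   # fit of feature j to the partial residual
            new_bj = soft_threshold(rho, lam) / z
            residual += xj * (b[j] - new_bj)        # keep residual = yc - Xc @ b
            b[j] = new_bj
    a = y_mean - x_mean @ b
    return a, b


# Hypothetical data: y = 10 + 3*x1 + 0.5*x2 - 2*x3 exactly
X = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]])
y = np.array([11.5, 14.5, 9.5, 4.5])

for lam in [0.0, 1.0, 10.0]:
    a, b = lasso_fit(X, y, lam)
    print(f"lambda={lam:5.1f}  a={a:.3f}  b={b}")
```

With λ = 1 the coefficients become 2, 0 and −1: the weakest feature is set to exactly zero, which is the feature-selection behaviour described above, and with λ = 10 all three coefficients are zero.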
                   Subset selection
• Identify a subset of the predictors that is assumed to
  be related to the response and then fit a model
  using OLS on the selected reduced subset of
  variables.
• There are two methods in which subset of the
  regression can be selected:
1. Best subset selection (considers all 2^k possible subsets)
2. Stepwise subset selection
   i. Forward stepwise selection (0 to k)
   ii. Backward stepwise selection (k to 0)
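A minimal NumPy sketch of forward stepwise selection, scoring each candidate model by its residual sum of squares (RSS) from an OLS fit. Unlike best subset selection, which fits all 2^k models, it fits at most k(k+1)/2 models. The function names and data are hypothetical, and in practice the final model size would be chosen with a validation criterion, which is omitted here.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)


def forward_stepwise(X, y):
    """Start from the intercept-only model (0 predictors) and, at each step,
    add the remaining predictor that lowers the RSS the most (up to k)."""
    selected, remaining, path = [], list(range(X.shape[1])), []
    while remaining:
        best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
        path.append((list(selected), rss(X, y, selected)))
    return path


# Hypothetical data: y = 10 + 3*x1 + 0.5*x2 - 2*x3 exactly (columns are x1, x2, x3)
X = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float)
y = np.array([11.5, 14.5, 9.5, 4.5])

for cols, value in forward_stepwise(X, y):
    print(cols, round(value, 6))
```

On this data the predictors enter in the order x1, x3, x2 (column indices 0, 2, 1), with the RSS falling from 17 to 1 and then to (numerically) 0.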
   Dimensionality reduction (Variable reduction)
• In dimensionality reduction, predictors (X) are
  transformed, and the model is set up using the
  transformed variables after dimensionality
  reduction.
• The number of variables is reduced using the
  dimensionality reduction method.
• Principal component analysis is one of the most
  important dimensionality (variable) reduction
  techniques.
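A minimal NumPy sketch of principal component regression (PCR), i.e. principal component analysis followed by OLS on the transformed variables. PCA is computed here with `np.linalg.svd` on centred predictors; the function names and data are hypothetical. In the data, the third predictor is the sum of the first two, so three predictors carry only two dimensions of information.

```python
import numpy as np


def pca_scores(X, n_components):
    """Centre X and project it onto its first principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # the transformed variables
    explained = S**2 / np.sum(S**2)              # share of variance per component
    return scores, explained[:n_components]


def pcr_fit_predict(X, y, n_components):
    """Principal component regression: OLS of y on the leading PCA scores."""
    Z, explained = pca_scores(X, n_components)
    A = np.column_stack([np.ones(len(y)), Z])    # intercept + components
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef, explained


# Hypothetical predictors: x3 = x1 + x2, so 3 variables carry only 2 dimensions
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
X = np.column_stack([x1, x2, x1 + x2])
y = 1 + 2 * x1 + 3 * x2

y_hat, explained = pcr_fit_predict(X, y, n_components=2)
print(np.round(explained, 4), np.allclose(y_hat, y))
```

Two components explain (numerically) all of the predictor variance here, and regression on them reproduces y exactly; with real data the number of components is a modelling choice, usually guided by explained variance or validation error.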
Thank You