ML Unit-III Notes

The document provides an overview of regression analysis, including key terminologies such as dependent and independent variables, outliers, multicollinearity, underfitting, and overfitting. It explains the purpose of regression in predicting continuous outcomes and highlights its applications in various fields like forecasting and market trends. Additionally, it discusses the concepts of linear relationships, covariance, and correlation coefficients, emphasizing their importance in understanding relationships between variables.

Uploaded by

Asfia Al Hera

ML Unit-III

Terminologies related to Regression


❑ Dependent variable / target variable: The main quantity in regression analysis that we want to predict or understand is called the dependent variable.

❑ Independent variable / predictor: The factors that affect the dependent variable, or that are used to predict its values, are called independent variables.

❑ Outliers: An outlier is an observation with a very low or very high value compared with the other observed values. Outliers can distort the result, so they should be identified and handled before fitting.

❑ Multicollinearity: If the independent variables are highly correlated with one another, the condition is called multicollinearity. It is undesirable in a dataset because it makes it hard to rank which variable affects the target most.

❑ Underfitting and overfitting: If the algorithm works well on the training dataset but not on the test dataset, the problem is called overfitting. If the algorithm does not perform well even on the training dataset, the problem is called underfitting.
Regression Analysis
Regression is a method for understanding the relationship between independent variables or features and
a dependent variable or outcome.

Regression analysis is a predictive modelling technique that investigates the relationship between a dependent variable and one or more independent variables.

Regression analysis is one of the most basic tools in machine learning used for prediction.

In regression analysis, an algorithm is used to predict a continuous outcome variable.

Regression analysis is an integral part of any forecasting or predictive model, so is a common method
found in machine learning powered predictive analytics.

Machine learning regression generally involves plotting a line of best fit through the data points.

Using this plot, the machine learning model can make predictions about the data.
Regression Analysis
In simple words, regression finds a line or curve through the data points on a target-predictor graph such that the vertical distance between the data points and the regression line is as small as possible. The line generally does not pass through every data point.

The distance between each point and the line is minimised to obtain the best-fit line.

The distance between the data points and the line indicates whether the model has captured a strong relationship or not.

Regression analysis is a statistical method for modelling the relationship between a dependent (target) variable and one or more independent (predictor) variables.

It predicts continuous (real) values such as temperature, age, salary, and price.
Regression Analysis
Regression is a supervised learning technique that helps find the correlation between variables and enables us to predict a continuous output variable from one or more predictor variables.

It is mainly used for prediction, forecasting, time series modelling, and determining cause-and-effect relationships between variables.

This approach requires labelled input and output training data to train models.

Example: Suppose a marketing company A runs various advertisements every year and records the resulting sales.

Now the company wants to spend $200 on advertising in 2023 and wants a prediction of its sales for that year. Prediction problems of this type are solved in machine learning with regression analysis.
Uses of Regression Analysis
Prediction of rain using temperature and other factors

Determining Market trends

Prediction of road accidents due to rash driving.

Determining the strength of predictors

Trend forecasting

Forecasting continuous outcomes such as house prices, stock prices, sales, salary changes, and weather conditions.

Predicting the success of future retail sales or marketing campaigns to ensure resources are used
effectively.
Uses of Regression Analysis
Predicting customer or user trends, such as on streaming services or ecommerce websites.

Analysing datasets to establish the relationships between variables and an output.

Predicting interest rates or stock prices from a variety of factors.

Creating time series visualizations.


Linear Relationship
As its name suggests, a linear relationship is any equation that, when graphed, gives a straight line.

Linear equations can be used to represent the relationship between two variables, most commonly x and y. To form the simplest linear relationship, we can make the two variables equal:

y = x

By plugging numbers into the equation, we can find some corresponding values of x and y:

x   y
0   0
1   1
2   2
3   3

If we plot those points in the xy-plane, they form a line.
Linear Relationship
Four Criteria for an Equation to Qualify as a Linear Relationship
The equation can have up to two variables, but no more than two.

All the variables in the equation are to the first power. None are squared, cubed, or raised to any other power.

None of the variables appears in a denominator.

The equation must graph as a straight line.

Equations that break one of these criteria, such as y = x² (a squared variable) or y = 1/x (a variable in the denominator), do not have a linear relationship.

Linear relationships such as y = 2 and y = x graph as straight lines. Graphing y = 2 gives a horizontal line at the 2 mark on the y-axis. Graphing y = x gives a diagonal line through the origin.
Linear Relationship
The concept of a linear relationship is used in the linear regression algorithm to show the relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression.

Linear regression finds how the value of the dependent variable changes with the value of the independent variable.
Examples of Linear Relationship
Linear relationships are very common in our everyday life, even if we aren't consciously aware of them.
Take, for example, how fast things such as cars and trains can go. Have you ever thought about how their
speeds are calculated? When a police officer gives someone a speeding ticket, how do they know for sure
if the person was speeding? Well, they use a simple linear relationship called the rate formula.
Speed of Object = Distance / Time

Another example is converting temperature from Fahrenheit to Celsius. If you live in the United States, you probably use Fahrenheit, but if you discuss the weather with a friend who lives in a different part of the world, you may need to convert the temperature to Celsius. The conversion formula is a linear relationship between the two scales:

$$C = \frac{5}{9}\,(F - 32)$$
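The Fahrenheit-to-Celsius conversion, C = (F − 32) × 5/9, can be checked with a short sketch. The function name and the sample readings below are illustrative choices, not part of the original notes. The point is that equal steps in Fahrenheit produce equal steps in Celsius, which is what makes the relationship linear.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# Illustrative readings spaced 18 degrees Fahrenheit apart.
readings_f = [32, 50, 68, 86, 104]
readings_c = [fahrenheit_to_celsius(f) for f in readings_f]

# Differences between consecutive Celsius values: a linear relationship
# gives the same step every time.
steps_c = [b - a for a, b in zip(readings_c, readings_c[1:])]

for f, c in zip(readings_f, readings_c):
    print(f"{f} F = {c:.1f} C")
print("Celsius change per 18 F step:", steps_c)
```

Plotting the Celsius values against the Fahrenheit values would give a straight line with slope 5/9.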
Measures of Linear Relationship
Numerical measures of linear relationship provide the direction (positive or negative) and the strength (strong or weak) of the linear relationship between two interval variables.
Measures of Linear Relationship
A linear relationship can be measured using the following:

Covariance

Coefficient of Correlation

Coefficient of Determination

Least Square Method / Line


Covariance
❑Covariance is a measure of how much two random variables vary together.

❑It’s similar to variance, but where variance tells you how a single variable varies, covariance tells you
how two variables vary together.

❑Covariance is a statistical tool that is used to determine the direction of the relationship between the
movements of two random variables.

❑When two stocks tend to move together, they are seen as having a positive covariance; when they move
inversely, the covariance is negative.

❑The covariance equation is used to determine the direction of the relationship between two variables

❑This nature of relationship is determined by the sign (positive or negative) of the covariance value.

❑In other words, whether they tend to move in the same or opposite directions.
Covariance
❑ The magnitude of the covariance reflects the strength of the association, but it also depends on the units of the two variables.

❑ For this reason the magnitude is difficult to judge. For example, if you are told that the covariance between two variables is 500, does this mean that there is a strong linear relationship? It is impossible to judge without additional statistics.

❑ When two variables move in the same direction (both increase or both decrease), the covariance is positive.

❑ When two variables move in opposite directions, the covariance is negative.

❑ When there is no particular pattern, the covariance is close to zero.


Types of Covariance
❑Positive Covariance

• A positive covariance between two variables indicates that these variables tend to be higher or lower
at the same time.

• In other words, a positive covariance between variables x and y indicates that x is higher than average at
the same times that y is higher than average, and vice versa.

• When charted on a two-dimensional graph, the data points will tend to slope upwards.

❑Negative Covariance

• When the calculated covariance is less than zero, this indicates that the two variables have an
inverse relationship.

• In other words, an x value that is lower than average tends to be paired with a y that is greater than
average, and vice versa.
Covariance Formula
❑Formula

$$\mathrm{Cov}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})$$

❑Where,
• xi = i-th data value of x
• yi = i-th data value of y
• x̄ = mean of x
• ȳ = mean of y
• N = number of data values.
Covariance
❑Below figure shows the covariance of X and Y.

❑If cov(X, Y) is greater than zero, the covariance is positive and the two variables tend to move in the same direction.

❑If cov(X, Y) is less than zero, the covariance is negative and the two variables tend to move in opposite directions.

❑If cov(X, Y) is zero, there is no linear relationship between the two variables.

❑The relationship between the correlation coefficient and the covariance is given by:

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$
❑Where:
• ρ(X,Y) = correlation between the variables X and Y
• Cov(X,Y) = covariance between the variables X and Y
• σX = standard deviation of the X variable
• σY = standard deviation of the Y variable
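A small sketch can show why the covariance is divided by the two standard deviations. It uses the X and Y values from the worked covariance question in these notes and assumes the population (divide by N) form of the formulas. Rescaling X by 100, as if its units had changed, multiplies the covariance by 100 but leaves the correlation unchanged. The helper names are illustrative.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    mean_x, mean_y = mean(xs), mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def std_dev(values):
    """Population standard deviation, the square root of the variance."""
    return math.sqrt(covariance(values, values))


def correlation(xs, ys):
    """Correlation = covariance divided by the product of standard deviations."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


x = [2, 8, 18, 20, 28, 30]
y = [5, 12, 18, 23, 45, 50]
x_rescaled = [100 * value for value in x]  # same data in different units

print("original:", covariance(x, y), correlation(x, y))
print("rescaled:", covariance(x_rescaled, y), correlation(x_rescaled, y))
```

This is why a covariance of 500 cannot be judged on its own, while a correlation always lies between -1 and 1.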
Covariance
Question:
Calculate the covariance for the following data:

X 2 8 18 20 28 30

Y 5 12 18 23 45 50

❑Solution:
Number of observations = 6
Mean of X = 17.67
Mean of Y = 25.5
Cov(X, Y) = (⅙) [(2 – 17.67)(5 – 25.5) + (8 – 17.67)(12 – 25.5) + (18 – 17.67)(18 – 25.5) + (20 – 17.67)(23 – 25.5)
+ (28 – 17.67)(45 – 25.5) + (30 – 17.67)(50 – 25.5)]
Cov(X, Y) = 157.83
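The same calculation can be reproduced in a few lines as a check on the arithmetic. This is a minimal sketch that divides by N = 6, as the solution above does. Variable names are illustrative.

```python
x = [2, 8, 18, 20, 28, 30]
y = [5, 12, 18, 23, 45, 50]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# One product of deviations per observation, as in the bracketed sum above.
products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
cov_xy = sum(products) / n

print(f"mean of X = {mean_x:.2f}, mean of Y = {mean_y:.2f}")
for xi, yi, product in zip(x, y, products):
    print(f"({xi} - {mean_x:.2f})({yi} - {mean_y:.2f}) = {product:.2f}")
print(f"Cov(X, Y) = {cov_xy:.2f}")
```

Dividing by N − 1 instead of N would give the sample covariance, which is slightly larger for the same data.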
Coefficient of Correlation
❑Correlation is used to test relationships between quantitative variables or categorical variables.

❑In other words, it’s a measure of how things are related.

❑The study of how variables are correlated is called correlation analysis.

❑Some examples of data that have a high correlation:

• Your caloric intake and your weight.

• Your eye color and your relatives’ eye colors.

• The amount of time you study and your GPA.

• Researchers have found a direct correlation between smoking and lung cancer.

❑Some examples of data that have a low correlation (or none at all):

• A dog’s name and the type of dog biscuit they prefer.

• The cost of a car wash and how long it takes to buy a soda inside the station.
Coefficient of Correlation
❑Correlations are useful because if you can find out what relationship variables have, you can
make predictions about future behaviour.

❑Knowing what the future holds is very important in the social sciences like government and healthcare.
Businesses also use these statistics for budgets and business plans.

❑The word Correlation is made of Co- (meaning "together"), and Relation

❑Correlation means association - more precisely it is a measure of the extent to which two variables are
related. There are three possible results of a correlational study:

• A positive correlation,

• A negative correlation, and

• No correlation.
Coefficient of Correlation
❑A Positive Correlation:

• It is a relationship between two variables in which both variables move in the same direction.

• That is, one variable increases as the other increases, or one variable decreases as the other decreases.

• An example of positive correlation would be height and weight. Taller people tend to be heavier.

❑A Negative Correlation:

• Relationship between two variables in which an increase in one variable is associated with a decrease
in the other.

• An example of negative correlation would be height above sea level and temperature. As you climb the
mountain (increase in height) it gets colder (decrease in temperature).

❑A zero Correlation exists when there is no relationship between two variables. For example there is no
relationship between the amount of tea drunk and level of intelligence.
Coefficient of Correlation
❑A correlation can be expressed visually by drawing a scattergram (also known as a scatterplot, scatter
graph, scatter chart, or scatter diagram).

❑A scattergram is a graphical display that shows the relationships or associations between two numerical variables (or co-variables), which are represented as points (or dots), one for each pair of scores.

❑A scattergram indicates the strength and direction of the correlation between the co-variables.

❑ Correlation can have a value:


• 1 is a perfect positive correlation
• 0 is no correlation (the values don't seem linked at all)
• -1 is a perfect negative correlation
Coefficient of Correlation
❑ Correlation coefficients are used to measure how strong the relationship between two variables is.

❑The correlation coefficient formula returns a value between -1 and 1.

1 indicates a perfect positive relationship.

-1 indicates a perfect negative relationship.

A result of zero indicates no linear relationship at all.

❑ There are several types of correlation coefficient, but the most popular is Pearson’s. Pearson’s correlation (also called Pearson’s r) is a correlation coefficient commonly used in linear regression.
Coefficient of Correlation
❑A correlation coefficient of 1 means that for every increase in one variable, there is an increase of a fixed proportion in the other. For example, shoe sizes go up in (almost) perfect correlation with foot length.

❑A correlation coefficient of -1 means that for every increase in one variable, there is a decrease of a fixed proportion in the other. For example, the amount of gas in a tank decreases in (almost) perfect correlation with speed.

❑Zero means that an increase in one variable is not associated with any consistent increase or decrease in the other. The two are not linearly related.

❑One of the most commonly used formulas is Pearson’s correlation coefficient formula.
Coefficient of Correlation
❑ Two other formulas are commonly used: the sample correlation coefficient and the population correlation coefficient.

Sample correlation coefficient

$$r_{xy} = \frac{s_{xy}}{s_x \, s_y}$$

sx and sy are the sample standard deviations, and sxy is the sample covariance.

Population correlation coefficient

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \, \sigma_y}$$

The population correlation coefficient uses σx and σy as the population standard deviations, and σxy as the population covariance.
Coefficient of Correlation
❑The drawback of the coefficient of correlation is that, except for the three values -1, 0, and +1, we cannot interpret it precisely.

❑For example, suppose that we calculated the coefficient of correlation to be -0.4. What does this tell us?

❑It tells us two things:

• The minus sign tells us the relationship is negative.

• Because 0.4 is closer to 0 than to 1, we judge that the linear relationship is weak.

❑In many applications, we need a better interpretation than “the linear relationship is weak”.
Correlation Example
❑The local ice cream shop keeps track of how much ice cream they sell versus the temperature on that day. Here are their figures for the last 12 days:

Ice Cream Sales vs Temperature

Temperature °C   Ice Cream Sales
14.2°            $215
16.4°            $325
11.9°            $185
15.2°            $332
18.5°            $406
22.1°            $522
19.4°            $412
25.1°            $614
23.4°            $544
18.1°            $421
22.6°            $445
17.2°            $408

❑And here is the same data as a scatter plot: [scatter plot of ice cream sales against temperature]

The correlation between temperature and ice cream sales is 0.9575.
Calculating Correlation (Pearson's Correlation)
❑Let us call the two sets of data "x" and "y" (in our case Temperature is x and Ice Cream Sales is y):

❑Step 1: Find the mean of x, and the mean of y

❑Step 2: Subtract the mean of x from every x value (call them "a"), and subtract the mean of y from every y
value (call them "b")

❑Step 3: Calculate ab, a² and b² for every value

❑Step 4: Sum up ab, sum up a² and sum up b²

❑Step 5: Divide the sum of ab by the square root of [(sum of a²) × (sum of b²)]

❑Here is how I calculated the first Ice Cream example (values rounded to 1 or 0 decimal places):
Calculating Correlation (Pearson's Correlation)
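❑ Formula

$$r = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2 \; \sum_{i} (y_i - \bar{y})^2}}$$

Where:
Σ is Sigma, the symbol for "sum up"
(xi − x̄) is each x-value minus the mean of x (called "a" above)
(yi − ȳ) is each y-value minus the mean of y (called "b" above)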
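The five steps for Pearson's correlation can be sketched directly in code. The example below applies them to the twelve temperature and ice cream sales figures from the correlation example. The function and variable names are illustrative, and this is a sketch of the steps rather than a library routine.

```python
import math

temperature = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1,
               19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522,
         412, 614, 544, 421, 445, 408]


def pearson_r(xs, ys):
    # Step 1: find the mean of x and the mean of y.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Step 2: subtract the means to get the deviations "a" and "b".
    a = [x - mean_x for x in xs]
    b = [y - mean_y for y in ys]
    # Steps 3 and 4: compute ab, a squared and b squared, then sum each.
    sum_ab = sum(ai * bi for ai, bi in zip(a, b))
    sum_a2 = sum(ai ** 2 for ai in a)
    sum_b2 = sum(bi ** 2 for bi in b)
    # Step 5: divide the sum of ab by sqrt((sum of a^2) * (sum of b^2)).
    return sum_ab / math.sqrt(sum_a2 * sum_b2)


r = pearson_r(temperature, sales)
print(f"r = {r:.4f}")
```

The result agrees with the correlation of about 0.9575 quoted for this data.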
Coefficient of Determination (R- Squared)
❑ It is calculated by squaring the coefficient of correlation.

❑ The coefficient of determination (R-squared) is used to analyse how differences in one variable can be explained by differences in a second variable. For example, when a lioness gets pregnant has a direct relation to when she gives birth.

❑ More specifically, R-squared gives the percentage of the variation in y explained by the x-variables. When it is computed as the square of the correlation coefficient, the range is 0 to 1 (i.e. 0% to 100% of the variation in y can be explained by the x-variables).

❑ Because it is the square of the correlation coefficient, it is denoted R² for a population and r² for a sample.

❑ The coefficient of determination measures the amount of variation in the dependent variable that is explained by the variation in the independent variable.


Coefficient of Determination (R- Squared)
❑ If the coefficient of correlation is -1 or +1:
The coefficient of determination is 1, which we interpret to mean that 100% of the variation in the dependent variable Y is explained by variation in the independent variable X.
❑ If the coefficient of correlation is 0:
There is no linear relationship between the two variables, R² = 0, and none of the variation in Y is explained by the variation in X.

❑ Finding R Squared / The Coefficient of Determination

Step 1: Find the correlation coefficient, r (it may be given to you in the question). Example: r = 0.543.
Step 2: Square the correlation coefficient. r² = (0.543)² ≈ 0.295
Step 3: Convert the result to a percentage. 0.295 × 100 = 29.5%
Coefficient of Determination (R- Squared)
❑ The coefficient of determination can be thought of as a percentage. It gives an idea of how much of the variation in the data is captured by the line formed by the regression equation.
❑ The higher the coefficient, the closer the data points lie to the line when the points and the line are plotted. It is not the percentage of points that the line passes through.
❑ If the coefficient is 0.80, then 80% of the variation in Y is explained by variation in X (the remaining 20% is unexplained).
❑ Values of 1 or 0 indicate that the regression line explains all or none of the variation in the data, respectively.
❑ It is used to evaluate the performance and strength of a linear regression model.
❑ It is the amount of the variation in the output (dependent) attribute that is predictable from the input (independent) variable(s).
❑ It is a direct indicator of how well a regression model fits the data. Accuracy, precision and recall are classification metrics and do not apply to continuous outputs.
❑ It is the most common way to measure the strength of the model.
Coefficient of Determination (R- Squared)
❑ Finding R Squared / The Coefficient of Determination by using another formula:
❑ The coefficient of determination is one minus the SSR (sum of squared residuals) divided by the SST (total sum of squares):

$$R^2 = 1 - \frac{SSR}{SST}, \qquad SSR = \sum_{i} (y_i - p_i)^2, \qquad SST = \sum_{i} (y_i - \bar{y})^2$$

❑ The SSR sums the squared differences between the observations of the dependent variable yi and the corresponding values predicted by the model, pi.
❑ The sum of squared residuals is also known as the sum of squared errors.

❑ The SST sums the squared differences between the observations of the dependent variable and their mean.
Coefficient of Determination (R- Squared)
❑ For example, assume R² = 0.68.
It can be inferred that 68% of the variability of the dependent output attribute is explained by the model, while the remaining 32% of the variability is still unaccounted for.
❑ R² indicates the proportion of the variation in the data that is captured by the line created by the regression equation. A higher value of R² is desirable as it indicates a better fit.
Case 1: Model gives accurate results

Actual (y)   Predicted (p)   E1 = y - p   (E1)²   E2 = y - Mean   (E2)²
10           10              0            0       -10             100
20           20              0            0       0               0
30           30              0            0       10              100

Mean = 20    SSR = sum of (E1)² = 0    SST = sum of (E2)² = 200

R² = 1 – (0/200) = 1
Coefficient of Determination (R- Squared)
Case 2: Model always gives the same result

Actual (y)   Predicted (p)   E1 = y - p   (E1)²   E2 = y - Mean   (E2)²
10           20              -10          100     -10             100
20           20              0            0       0               0
30           20              10           100     10              100

Mean = 20    SSR = sum of (E1)² = 200    SST = sum of (E2)² = 200

R² = 1 – (200/200) = 0
Coefficient of Determination (R- Squared)
Case 3: Model gives ambiguous results

Actual (y)   Predicted (p)   E1 = y - p   (E1)²   E2 = y - Mean   (E2)²
10           30              -20          400     -10             100
20           10              10           100     0               0
30           20              10           100     10              100

Mean = 20    SSR = sum of (E1)² = 600    SST = sum of (E2)² = 200

R² = 1 – (600/200) = -2

A negative R² means the model fits the data worse than always predicting the mean.
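The three cases can be reproduced with a short sketch of the formula R² = 1 − SSR/SST, where SSR is the sum of squared differences between actual and predicted values and SST is the sum of squared differences between actual values and their mean. The function name and the case labels are illustrative.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SSR / SST for paired actual and predicted values."""
    mean_y = sum(actual) / len(actual)
    # SSR: squared errors between each actual value and its prediction.
    ssr = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    # SST: squared deviations of the actual values from their mean.
    sst = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ssr / sst


actual = [10, 20, 30]
cases = {
    "Case 1 (accurate predictions)": [10, 20, 30],
    "Case 2 (always predicts the mean)": [20, 20, 20],
    "Case 3 (poor predictions)": [30, 10, 20],
}

for label, predicted in cases.items():
    print(f"{label}: R^2 = {r_squared(actual, predicted)}")
```

Note that this form of R² is not bounded below by zero, unlike the squared correlation coefficient.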
Least Square
❑ Least squares is a method for measuring how accurately a straight line depicts the data that was used to generate it. That is, it determines the line of best fit.
❑ The least squares criterion is met by minimizing the sum of the squares produced by a mathematical function.
❑ Each square is obtained by squaring the distance between a data point and the regression line or the mean value of the data set.
❑ The least squares criterion is used throughout finance, economics, and investing.
Line of Best Fit / Regression line / Trend line
❑ A line of best fit is a straight line through a scatter plot of data points that best expresses the relationship between those points.
❑ It is used to study the nature of the relationship between those points.
❑ It is an output of regression analysis and can be used as a prediction tool.
❑ The equation of the best-fitting line is:

Y` = A + bX where,

Y` denotes the predicted value
b denotes the slope of the line
X denotes the independent variable
A is the Y intercept
Line of Best Fit / Regression line / Trend line
❑ On a chart, a given set of data points appears as a scatter plot, which may or may not appear to be organized along a line.
❑ Many straight lines can be drawn through the data points in the chart. Finding the line of best fit, the one that minimizes the distance of the points from the line, is one of the most important outputs of regression analysis.
❑ So, how do we find a line of best fit using regression analysis?
❑ Usually, the predicted line of best fit is not perfectly correct, meaning it will have “prediction errors” or “residual errors”.
❑ [Residuals in a statistical or machine learning model are the differences between observed and predicted values of data.]
Line of Best Fit / Regression line / Trend line
❑ The prediction or residual error is the difference between the actual value and the predicted value for any data point.
❑ In general, when we use Y` = A + bX to predict the actual response Y, we make a prediction error (or residual error) of size:

E = Y – Y` where,

E denotes the prediction error or residual error
Y` denotes the predicted value
Y denotes the actual value
❑ A line that fits the data "best" will be one for which the prediction errors (one for each data point) are as small as possible.
❑ Regression analysis uses the “least squares method” to generate the best-fitting line.
❑ This method builds the line that minimizes the sum of the squared distances of the points from the line, either through manual calculations or regression analysis software.
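The idea of judging a line by its squared residuals can be sketched as follows. The data are the X and Y values from the covariance question earlier in these notes. The two candidate lines are arbitrary choices made for illustration, not fitted lines. For each candidate the sketch computes the residual E = Y − (A + bX) at every point and sums the squares.

```python
x = [2, 8, 18, 20, 28, 30]
y = [5, 12, 18, 23, 45, 50]


def residuals(intercept, slope):
    """Residual E = actual - predicted for every data point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def sum_squared_errors(intercept, slope):
    """Sum of squared residuals: the quantity least squares minimizes."""
    return sum(e ** 2 for e in residuals(intercept, slope))


# Two arbitrary candidate lines (intercept A, slope b), chosen for illustration.
candidates = [(0.0, 1.0), (0.0, 1.5)]

for intercept, slope in candidates:
    sse = sum_squared_errors(intercept, slope)
    print(f"A = {intercept}, b = {slope}: sum of squared errors = {sse}")
```

The candidate with the smaller sum is the better fit of the two. The least squares method finds the intercept and slope for which this sum is smallest over all possible lines.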
Coefficient Calculation with Least Square Method / Least Square
Regression
❑ The least squares method is a form of mathematical regression analysis used to determine the line of best fit for a set of data, providing a visual demonstration of the relationship between the data points.

❑ Each data point represents the relationship between a known independent variable and an unknown dependent variable.

❑ The least squares method is a statistical procedure that finds the best fit for a set of data points by minimizing the sum of the squared offsets (residuals) of the points from the plotted curve.

❑ Least squares regression is used to predict the behaviour of dependent variables.

❑ The least squares method provides the overall rationale for the placement of the line of best fit among the data points being studied.


Coefficient Calculation with Least Square Method / Least Square
Regression
❑ This method of regression analysis begins with a set of data points to be plotted on an x- and y-axis graph. An analyst using the least squares method generates a line of best fit that explains the potential relationship between the independent and dependent variables.

❑ An example of the least squares method is an analyst who wishes to test the relationship between a company’s stock returns and the returns of the index of which the stock is a component. In this example, the analyst seeks to test the dependence of the stock returns on the index returns.

❑ To achieve this, all of the returns are plotted on a chart. The index returns are designated as the independent variable, and the stock returns are the dependent variable. The line of best fit provides the analyst with coefficients explaining the level of dependence.


Coefficient Calculation with Least Square Method / Least Square
Regression
❑ The least squares method is used in a wide variety of fields, including finance and investing. For financial analysts, the method can help to quantify the relationship between two or more variables, such as a stock’s share price and its earnings per share (EPS). By performing this type of analysis, investors often try to predict the future behaviour of stock prices or other factors.

❑ To illustrate, consider an investor deciding whether to invest in a gold mining company. The investor might wish to know how sensitive the company’s stock price is to changes in the market price of gold. To study this, the investor could use the least squares method to trace the relationship between those two variables over time on a scatter plot. This analysis could help the investor predict the degree to which the stock’s price would likely rise or fall for any given increase or decrease in the price of gold.
Coefficient Calculation with Least Square Method / Least Square
Regression
❑ The equation of the best-fitting line is:

Y` = A + bX where,
Y` denotes the predicted value / dependent variable
b denotes the slope /gradient of the line
X denotes the independent variable
A is the Y intercept
Coefficient Calculation with Least Square Method / Least Square
Regression
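❑ The coefficients A and b are derived using calculus so that the sum of squared deviations is minimized. They are given by the following formulas:

$$b = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i} (x_i - \bar{x})^2}, \qquad A = \bar{y} - b\,\bar{x}$$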
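As a sketch, the standard closed-form least squares solution for a straight line sets the slope b to the sum of products of deviations divided by the sum of squared x-deviations, and the intercept A to the mean of y minus b times the mean of x. The example below applies it to the X and Y values from the covariance question earlier in these notes. The function name and the query value X = 25 are illustrative.

```python
def fit_line(xs, ys):
    """Return (intercept A, slope b) of the least squares line Y = A + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and sum of squared x-deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


x = [2, 8, 18, 20, 28, 30]
y = [5, 12, 18, 23, 45, 50]

intercept, slope = fit_line(x, y)
print(f"A = {intercept:.4f}, b = {slope:.4f}")

# Use the fitted line to predict Y for a new, illustrative X value.
new_x = 25
print(f"Predicted Y at X = {new_x}: {intercept + slope * new_x:.2f}")
```

The slope is the covariance of X and Y divided by the variance of X, so the sign of the covariance decides whether the fitted line rises or falls.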
Coefficient Calculation with Least Square Method / Least Square
Regression
❑ Least Square Method Graph
In linear regression, the line of best fit is a straight line as shown in the following diagram:

❑ The line is fitted by minimizing the residuals, or offsets, of each data point from the line. Vertical offsets are the ones generally minimized in practice, including in line, polynomial, surface and hyperplane fitting. Perpendicular offsets are used less often.
Coefficient Calculation with Least Square Method / Least Square
Regression
❑ Uses of Least Square Method
❑ The least squares method is used to study the relationship between variables in fields ranging from anthropology to zoology:

Medicine: Study of smoking and its effect on life expectancy.

Economy: Study of the relation between capital investment and sales.

Biology: Study of measured data such as the age and length of fish.

Agriculture: Study of the relation between the age and yield of a site.


Regression Analysis
Regression is a method for understanding the relationship between independent variables or features and
a dependent variable or outcome.

Regression analysis is a form of predictive modelling technique which investigates the relationship
between a dependent variable and independent variable

Regression analysis is one of the most basic tools in the area of machine learning used for prediction.

In Regression analysis, an algorithm is used to predict continuous outcomes / variable.

Regression analysis is an integral part of any forecasting or predictive model, so is a common method
found in machine learning powered predictive analytics.

Machine learning regression generally involves plotting a line of best fit through the data points.

Using this plot, the machine learning model can make predictions about the data.
Regression Analysis
In simple words, "Regression shows a line or curve that passes through all the datapoints on target-
predictor graph in such a way that the vertical distance between the datapoints and the regression
line is minimum."

The distance between each point and the line is minimised to achieve the best fit line

The distance between data points and line tells whether a model has captured a strong relationship or
not.

Regression analysis is a statistical method to model the relationship between a dependent (target) and
independent (predictor) variables with one or more independent variables.

It predicts continuous/real values such as temperature, age, salary, price, etc.

Regression is a supervised learning technique which helps in finding the correlation between variables
and enables us to predict a continuous output variable based on one or more predictor variables.

It is mainly used for prediction, forecasting, time series modelling, and investigating cause-and-effect
relationships between variables.

This approach requires labelled input and output training data to train models.

Example: Suppose there is a marketing company A that runs various advertisements every year and earns
sales from them.

Now the company wants to spend $200 on advertising in the year 2023 and wants a prediction of its sales
for that year. To solve this type of prediction problem in machine learning, we need regression
analysis.
Uses of Regression Analysis
Prediction of rain using temperature and other factors

Determining Market trends

Prediction of road accidents due to rash driving.

Determining the strength of predictors

Trend forecasting

Forecasting continuous outcomes like house prices, stock prices, sales, salary changes,
weather conditions, etc.

Predicting the success of future retail sales or marketing campaigns to ensure resources are used
effectively.
Predicting customer or user trends, such as on streaming services or ecommerce websites.

Analysing datasets to establish the relationships between variables and an output.

Predicting interest rates or stock prices from a variety of factors.

Creating time series visualizations.


Regression models a target prediction value based on independent variables.

It is mostly used for finding out the relationship between variables and forecasting.

Different regression models differ based on – the kind of relationship between dependent and
independent variables they are considering, and the number of independent variables getting used.
Simple Linear Regression
Linear regression is a statistical regression method which is used for predictive analysis.

It is one of the simplest regression algorithms and shows the relationship between continuous
variables.

It is used for solving the regression problem in machine learning.

Linear regression shows the linear relationship between the independent variable (X-axis) and the
dependent variable (Y-axis), hence called linear regression.

If there is only one input variable (x), then such linear regression is called simple linear regression. And
if there is more than one input variable, then such linear regression is called multiple linear regression.

Simple linear regression fits a straight line through the data points so as to minimise the error
between the line and the data points.

Points that lie far from the line (outliers) may be a common occurrence in simple linear regression,
because a single straight line of best fit cannot follow every data point.

While training and building a regression model, it is the coefficients of the line (the intercept and
the slope) that are learned and fitted to the training data.

The aim of training is to find the best-fit line such that the cost function is minimised. [The cost
function measures the error.] During training, we try to minimise the error between actual and
predicted values, and thereby minimise the cost function.
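
The training idea described above can be sketched in a few lines. This is a minimal illustration, not the method prescribed by these notes: it assumes the line has the form y = b0 + b1*x, uses the mean squared error as the cost function, and picks gradient descent as the optimizer. The function names, learning rate, epoch count and data are all made up for the example.

```python
def mse_cost(b0, b1, xs, ys):
    """Mean squared error of the line y = b0 + b1*x on the data."""
    n = len(xs)
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / n


def train(xs, ys, learning_rate=0.05, epochs=5000):
    """Fit b0 and b1 by gradient descent on the MSE cost (assumed optimizer)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors (predicted minus actual) for the current line.
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to b0 and b1.
        grad_b0 = 2 * sum(errors) / n
        grad_b1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        # Step against the gradient to reduce the cost.
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


# Made-up data that lies exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

b0, b1 = train(xs, ys)
print(round(b0, 3), round(b1, 3), mse_cost(b0, b1, xs, ys))
```

On this noise-free data the learned coefficients should come out very close to b0 = 1 and b1 = 2, with a cost near zero. The learning rate has to be small enough for the data at hand; too large a value makes the updates diverge.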

The relationship between variables in the linear regression model can be explained with the example of
predicting the salary of an employee on the basis of years of experience.

The mathematical equation for simple linear regression, which predicts the dependent variable (Y) from
the value of the independent variable (X), is:

Y = b0 + b1X + ϵ

where b0 is the intercept, b1 is the slope (the coefficient of X) and ϵ is the error term. It can be
used for the cases where we want to predict some continuous quantity.
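
As a rough illustration of fitting the line y = b0 + b1*x, the sketch below uses the standard closed-form least-squares estimates (slope = covariance of x and y divided by variance of x). The salary figures and the function name `fit_simple_linear` are hypothetical and only serve the example.

```python
def fit_simple_linear(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data: years of experience and salary (in thousands).
experience = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]

b0, b1 = fit_simple_linear(experience, salary)
predicted = b0 + b1 * 6  # predicted salary for 6 years of experience
print(b0, b1, predicted)  # 30.0 5.0 60.0
```

Here each extra year of experience adds 5 (thousand) to the predicted salary, which is how the slope b1 is usually read.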
Applications of Simple Regression
Predicting the house price based on the size of the house, availability of schools in the area, and other
essential factors

Predicting the sales revenue of a company based on data such as the previous sales of the company

Predicting the temperature of any day based on data such as wind speed, humidity, atmospheric pressure

Predicting age of a person

Analysing trends and sales estimates

Salary forecasting

Demand Forecasting – To predict demand for goods and services. For example, restaurant chains can
predict the quantity of food depending on weather.

Real estate prediction – To model residential home prices as a function of the home’s living area,
number of bathrooms, number of bedrooms and lot size
Arriving at ETAs in traffic.

Risk Analysis for disease – For example, to analyse the effect of a proposed radiation treatment on
reducing tumour sizes based on patient attributes such as age or weight

Economic Growth – Used to determine the Economic Growth of a country or a state in coming quarter; can
also be used to predict the GDP of a country.

Product Price – Can be used to predict what would be the price of a product in the future.

Score Prediction – To predict the no. of runs a player would score in the coming matches based on
previous performance
Simple Regression Use Case
Advantages & Disadvantages
Advantages:

Linear regression performs exceptionally well when the relationship between the variables is linear

It is easy to implement and interpret, and efficient to train

Overfitting can be handled using dimensionality reduction techniques, regularization and
cross-validation

It allows extrapolation beyond a specific data set

Disadvantages:

It assumes linearity between the dependent and independent variables

It is often quite prone to noise and overfitting

It is quite sensitive to outliers

It is affected by multicollinearity
Multiple Linear Regression
In Simple Linear Regression, a single Independent/Predictor(X) variable is used to model the dependent
variable (Y). But there may be various cases in which the dependent variable is affected by more than one
predictor variable; for such cases, the Multiple Linear Regression algorithm is used.

For MLR, the dependent or target variable (Y) must be continuous/real, but the predictor or
independent variables may be of continuous or categorical form.

Each feature (independent variable) must have a linear relationship with the dependent variable.

MLR tries to fit a regression plane (in general, a hyperplane) through a multidimensional space of
data points.

The technique enables analysts to determine the variation explained by the model and the relative
contribution of each independent variable to the total variance.

Moreover, Multiple Linear Regression is an extension of Simple Linear Regression, as it takes more than
one predictor variable to predict the dependent / response variable. We can define it as:

“Multiple Linear Regression is one of the important regression algorithms which models the linear
relationship between a single dependent continuous variable and more than one independent
variable.”

Regression can also be non-linear, where the dependent and independent variables do not follow a
straight line; multiple linear regression covers only the linear case.

You can use multiple linear regression when you want to know:

How strong the relationship is between two or more independent variables and one dependent
variable (e.g. how rainfall, temperature, and amount of fertilizer added affect crop growth).

The value of the dependent variable at a certain value of the independent variables (e.g. the expected
yield of a crop at certain levels of rainfall, temperature, and fertilizer addition).
In Multiple Linear Regression, the target variable (Y) is a linear combination of multiple predictor
variables x1, x2, x3, ..., xn. Since it is an extension of Simple Linear Regression, the same form
applies, and the equation becomes:

Y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn + ϵ

Where,

Y = Output/Response variable

b0 = the y-intercept

b1, b2, b3, ..., bn = Coefficients of the model

x1, x2, x3, ..., xn = The independent/feature variables

ϵ = the model’s error term (its estimates on the data are known as the residuals)
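
To make the equation concrete, here is a small sketch that estimates b0, b1, ..., bn from data. It assumes the ordinary least-squares approach via the normal equations (X^T X) b = X^T y, which these notes do not spell out, and solves them with a hand-written Gauss-Jordan routine so that only the standard library is needed. All names and data are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Use the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(rows, ys):
    """Return [b0, b1, ..., bn] minimising the squared error."""
    # Prepend a constant 1 to each row so that b0 acts as the intercept.
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    # Normal equations: (X^T X) b = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    """Evaluate Y = b0 + b1*x1 + ... + bn*xn for one observation."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))


# Made-up observations generated from Y = 1 + 2*x1 + 3*x2 (no error term).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coeffs = fit_mlr(rows, ys)
print([round(c, 6) for c in coeffs])
print(round(predict(coeffs, (6, 5)), 6))
```

Because the data were generated without noise, the recovered coefficients should be close to 1, 2 and 3. With strongly correlated predictors the normal equations become ill-conditioned, which is one practical reason for the multicollinearity assumption.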


Assumptions of MLR
A linear relationship should exist between the dependent and independent variables.

The regression residuals must be normally distributed. Multiple linear regression also assumes that the
amount of error in the residuals is similar at each point of the linear model (homoscedasticity).

In a normal distribution, data is symmetrically distributed with no skew. When plotted on a graph, the
data follows a bell shape, with most values clustering around a central region and tapering off as they
go further away from the center.

Normal distributions are also called Gaussian distributions or bell curves because of their shape.

MLR assumes little or no multicollinearity (correlation between the independent variables) in the data.
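
One simple, informal way to look for multicollinearity is to compute the Pearson correlation coefficient between pairs of predictors. The sketch below is only an illustration: the predictor values are invented, and how high a correlation counts as "too high" is a judgment call rather than a fixed rule.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical predictors for a car data set.
engine_size = [1.0, 1.6, 2.0, 3.0, 4.0]  # litres
cylinders = [3, 4, 4, 6, 8]
car_age = [5, 1, 4, 2, 3]                # years

# A value close to +1 or -1 suggests the two predictors carry
# largely the same information.
print(round(pearson(engine_size, cylinders), 3))
print(round(pearson(engine_size, car_age), 3))
```

In this made-up data, engine size and number of cylinders are strongly correlated, so using both as predictors would be a candidate multicollinearity problem, while engine size and car age are only weakly related.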
Multiple Linear Regression
Example:

Prediction of CO2 emission based on engine size and number of cylinders in a car.

You are a public health researcher interested in social factors that influence heart disease. You survey
500 towns and gather data on the percentage of people in each town who smoke, the percentage of
people in each town who bike to work, and the percentage of people in each town who have heart
disease. Because you have two independent variables and one dependent variable, and all your
variables are quantitative, you can use multiple linear regression to analyze the relationship between
them.

Applications of Multiple Linear Regression:

Measuring the effectiveness of each independent variable on the prediction

Predicting the impact of changes


Polynomial Regression
Polynomial Regression is a regression algorithm that models the relationship between a dependent
variable (y) and an independent variable (x) as an nth-degree polynomial.

The Polynomial Regression equation is given below:

y = b0 + b1x1 + b2x1^2 + b3x1^3 + ... + bnx1^n

It is also called a special case of Multiple Linear Regression in ML, because we add polynomial
terms to the Multiple Linear Regression equation to convert it into Polynomial Regression.

It is a linear model with some modification in order to increase the accuracy.

The dataset used for training in Polynomial Regression is of a non-linear nature.

It makes use of a linear regression model to fit complicated, non-linear functions and datasets.
Hence, "In Polynomial regression, the original features are converted into Polynomial features of
required degree (2,3,..,n) and then modeled using a linear model."
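
The quoted idea (convert the original feature into polynomial features, then fit a linear model) can be sketched as follows. This is an illustrative sketch under assumptions: the linear model is fitted by least squares through the normal equations, and the helper names (`polynomial_features`, `solve`, `fit_polynomial`) and the data are hypothetical.

```python
def polynomial_features(x, degree):
    """Expand one value x into the features [x, x**2, ..., x**degree]."""
    return [x ** d for d in range(1, degree + 1)]


def solve(a, b):
    """Solve the linear system a @ coeffs = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_polynomial(xs, ys, degree):
    """Fit y = b0 + b1*x + ... + bn*x**n with an ordinary linear model."""
    # Each row is [1, x, x**2, ...]: the powers of x are treated as
    # separate features, so the model stays linear in the coefficients.
    design = [[1.0] + polynomial_features(x, degree) for x in xs]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Made-up non-linear data generated from y = 1 - 2x + 0.5x^2.
xs = [-2, -1, 0, 1, 2, 3]
ys = [1 - 2 * x + 0.5 * x ** 2 for x in xs]

coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])
```

The fitted coefficients should be close to 1, -2 and 0.5. Note that nothing in the fitting step is specific to polynomials; only the feature expansion changes.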

If we apply a linear model to a linear dataset, it gives a good result, as we have seen in Simple
Linear Regression. But if we apply the same model without any modification to a non-linear dataset, it
will fit poorly: the loss function will increase, the error rate will be high, and accuracy will
decrease.

So for such cases, where data points are arranged in a non-linear fashion, we need the Polynomial
Regression model. This is easiest to see by comparing a linear model and a polynomial model on the
same non-linear dataset.

[Figure: comparison of a linear model and a polynomial model fitted to a non-linearly arranged dataset]

When the dataset is arranged non-linearly and we try to cover it with a linear model, the line hardly
covers any data point. On the other hand, a curve, as produced by the Polynomial model, is suitable to
cover most of the data points.

Hence, if the datasets are arranged in a non-linear fashion, then we should use the Polynomial Regression
model instead of Simple Linear Regression.

A Polynomial Regression algorithm is also called Polynomial Linear Regression because its linearity
refers to the coefficients rather than the variables: the model is non-linear in x but linear in the
coefficients b0, b1, ..., bn.
Equation of Polynomial Regression
Simple Linear Regression equation: y = b0 + b1x

Multiple Linear Regression equation: y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn

Polynomial Regression equation: y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n

When we compare the above three equations, we can see that all three are polynomial equations that
differ in the degree of the variables.

The Simple and Multiple Linear equations are polynomial equations of degree one.

The Polynomial Regression equation is a linear equation (in its coefficients) of the nth degree. So if
we add higher-degree terms to our linear equation, it is converted into a Polynomial Linear equation.
Metric for regression
Beginners, and often practitioners too, tend to pay little attention to model performance.

The goal is to build a well-generalized model. A machine learning model cannot be expected to have
100 per cent efficiency; a model that does is regarded as biased (it has most likely fitted its
training data too closely), which leads to the concepts of overfitting and underfitting.

It is necessary to obtain good accuracy on the training data, but it is also important to get genuine,
approximately correct results on unseen data; otherwise the model is of no use.

So to build and deploy a generalized model, we need to evaluate the model on different metrics, which
helps us to optimize the performance, fine-tune it, and obtain a better result.

No single metric is perfect; if one were, there would be no need for multiple metrics.


In regression problems, we map input variables to continuous output variables, for example predicting
the share price in the stock market or predicting atmospheric temperature. Because of these varied uses,
much research is going on in this area to build more accurate models. When we build a solution for any
regression problem, we compare its performance with existing work. But to compare two pieces of work
there must be a standard metric, just as distance is measured in meters and plot size in square feet.
Similarly, we need standard evaluation metrics to compare two regression models.

There are five metrics that are commonly used for evaluating and reporting the performance of a
regression model; they are:
Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
R-Squared
Adjusted R-Squared
Mean Absolute Error (MAE)
MAE is a very simple metric which calculates the average of the absolute differences between actual
and predicted values.

To understand it better, take an example where you have input data and output data and use Linear
Regression, which draws a best-fit line.

Now you have to find the MAE of your model. A mistake made by the model is known as an error. The
difference between the actual value and the predicted value, taken without its sign, is the absolute
error of one observation, but we need the mean absolute error over the complete dataset.

So, sum all the absolute errors and divide by the total number of observations; this is the MAE. We
aim for a minimum MAE because it is a loss.

MAE = (1/N) Σ |Yi - Ŷi|, summed over i = 1, ..., N

• Where,
• N = total number of data points
• Yi = actual value
• Ŷi = predicted value
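
A minimal sketch of the MAE computation described above, using only the standard library. The function name and the sample values are illustrative.

```python
def mean_absolute_error(actual, predicted):
    """Average of |Yi - Y_hat_i| over all N observations."""
    n = len(actual)
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / n


# Made-up actual and predicted values.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

# Absolute errors are 0.5, 0, 2 and 1, so MAE = 3.5 / 4.
print(mean_absolute_error(actual, predicted))  # 0.875
```

The result is in the same unit as the output variable, so an MAE of 0.875 here means predictions are off by 0.875 units on average.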
If we don’t take the absolute values, negative differences cancel out positive differences, and the
sum can come out close to zero even when the individual errors are large.
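
The cancellation effect is easy to see on a tiny made-up example, sketched here with illustrative variable names:

```python
# Made-up values where the model over-predicts once and under-predicts once.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]

errors = [y - p for y, p in zip(actual, predicted)]  # [-2.0, 2.0, 0.0]

# Without absolute values the -2 and +2 cancel each other out.
mean_signed_error = sum(errors) / len(errors)
# With absolute values both mistakes are counted.
mean_absolute_error = sum(abs(e) for e in errors) / len(errors)

print(mean_signed_error)    # 0.0
print(mean_absolute_error)
```

The signed average reports no error at all, while the MAE (4/3, about 1.33) reflects the two wrong predictions.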

A small MAE suggests the model is great at prediction, while a large MAE suggests that your model
may have trouble in certain areas. MAE of 0 means that your model is a perfect predictor of the
outputs.

Advantages of MAE

The MAE you get is in the same unit as the output variable.

It is more robust to outliers than MSE and RMSE.

Disadvantages of MAE

The graph of MAE is not differentiable at zero, so gradient-based optimizers such as gradient descent
cannot be applied to it as directly as to a differentiable loss.
Mean Squared Error (MSE)
This is the mean / average of the squared differences between the actual values in the dataset and
the values predicted by the model.

Here the error term is squared, and thus MSE is more sensitive to outliers than Mean Absolute Error
(MAE).

MSE uses the square operation to remove the sign of each error value and to punish large errors.

As we take the square of the error, the effect of larger errors becomes more pronounced than that of
smaller errors, hence the model can now focus more on the larger errors.

MSE = (1/N) Σ (Yi - Ŷi)^2, summed over i = 1, ..., N

• Where,
• N = total number of data points
• Yi = actual value
• Ŷi = predicted value
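
A minimal sketch of the MSE computation, using the same kind of made-up values as one might use for MAE. The function name is illustrative.

```python
def mean_squared_error(actual, predicted):
    """Average of (Yi - Y_hat_i)^2 over all N observations."""
    n = len(actual)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n


# Made-up actual and predicted values.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

# Squared errors are 0.25, 0, 4 and 1, so MSE = 5.25 / 4.
print(mean_squared_error(actual, predicted))  # 1.3125
```

The single error of 2 contributes 4 of the 5.25 total, which shows how squaring lets the largest error dominate the metric.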
The MSE will be large if there are outliers in the dataset; this is much less the case with MAE.

MSE focuses on larger errors, because squaring the error makes the effect of large errors more
prominent.

If the errors are small (lower than one), squaring makes them even smaller, which can lead to
underestimating the model’s error.
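
Both effects, inflation by outliers and shrinkage of errors below one, can be checked side by side. The sketch below uses short illustrative helpers `mae` and `mse` and invented numbers.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


actual = [10.0, 10.0, 10.0, 10.0]
small_errors = [10.5, 9.5, 10.5, 9.5]    # every prediction is off by 0.5
with_outlier = [11.0, 9.0, 11.0, 20.0]   # three errors of 1, one error of 10

# Errors below one shrink when squared: MSE is smaller than MAE.
print(mae(actual, small_errors), mse(actual, small_errors))  # 0.5 0.25

# One outlier dominates the MSE far more than the MAE.
print(mae(actual, with_outlier), mse(actual, with_outlier))  # 3.25 25.75
```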

Advantages of MSE

The graph of MSE is differentiable, so you can easily use it as a loss function.

Disadvantages of MSE

The value you get after calculating MSE is in the squared unit of the output. For example, if the
output variable is in meters (m), then the MSE is in meters squared.
Root Mean Squared Error (RMSE)
As is clear from the name itself, RMSE is simply the square root of the mean squared error.

It is the square root of the average squared difference between the real value and the predicted
value. By taking the square root of MSE, we get the Root Mean Squared Error.

We want the value of RMSE to be as low as possible: the lower the RMSE, the better the model's
predictions. A higher RMSE indicates that there are large deviations between the predicted and
actual values.

RMSE = sqrt( (1/N) Σ (Yi - Ŷi)^2 ), summed over i = 1, ..., N

• Where,
• N = total number of data points
• Yi = actual value
• Ŷi = predicted value
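
A minimal sketch of RMSE as the square root of the MSE. The function name and data are illustrative.

```python
from math import sqrt


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    n = len(actual)
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n
    return sqrt(mse)


# Made-up values where every prediction is off by exactly 2 units.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [3.0, 4.0, 1.0, 2.0]

# MSE is 4 (in squared units), so RMSE is 2 in the original unit.
print(root_mean_squared_error(actual, predicted))  # 2.0
```

Taking the square root brings the value back to the unit of the output variable, which is why RMSE is often easier to interpret than MSE.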
Advantages of RMSE
The output value you get is in the same unit as the required output variable which makes
interpretation of loss easy.
Disadvantages of RMSE
It is not as robust to outliers as MAE.
