
Sipna College of Engineering & Technology, Amravati.

Department of Computer Science and Engineering


Branch: Computer Science and Engineering    Class: III Year
Subject: Data Science and Statistics Lab    Sem: V

Student Manual
Practical No 2
Aim: Implementation of simple linear regression using Python code.
Software Required: RStudio / Anaconda
Theory: Simple linear regression is an approach for predicting a response using a single
feature. It assumes that the two variables are linearly related. Hence, we try to find a linear
function that predicts the response value (y) as accurately as possible as a function of the
feature, or independent variable (x). Let us consider a dataset where we have a value of the
response y for every feature value x:

x:  0  1  2  3  4  5  6  7  8  9
y:  1  3  2  5  7  8  8  9 10 12

For generality, we define:

x as the feature vector, i.e. x = [x_1, x_2, …, x_n],
y as the response vector, i.e. y = [y_1, y_2, …, y_n],
for n observations (in the example above, n = 10).
A scatter plot of the above dataset looks like this:

[Figure: scatter plot of the dataset, y against x]

1 CSE/SEM-VI/DSS Lab/PR02

Now, the task is to find the line that best fits the scatter plot above, so that
we can predict the response for any new feature value (i.e. a value of x not
present in the dataset).
This line is called the regression line.
The equation of the regression line is:

$$h(x_i) = b_0 + b_1 x_i$$

Here,
• h(x_i) represents the predicted response value for the ith observation.
• b_0 and b_1 are the regression coefficients; they represent the y-intercept
and the slope of the regression line, respectively.
To create our model, we must “learn”, or estimate, the values of the regression
coefficients b_0 and b_1. Once we have estimated these coefficients, we
can use the model to predict responses.
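Once b_0 and b_1 are known, prediction is just evaluating the line at a new x. The sketch below illustrates this; the function name `predict` and the coefficient values 1.0 and 2.0 are hypothetical, chosen only for illustration, and are not estimated from this practical's data.

```python
import numpy as np


def predict(x, b_0, b_1):
    """Evaluate the regression line h(x) = b_0 + b_1 * x.

    Accepts a single number or a NumPy array of feature values.
    """
    return b_0 + b_1 * np.asarray(x, dtype=float)


# Hypothetical coefficients (not fitted): intercept 1.0, slope 2.0
b_0, b_1 = 1.0, 2.0

# Predict responses for feature values that are not in the dataset
new_x = np.array([10.0, 12.5])
print(predict(new_x, b_0, b_1))  # -> [21. 26.]
```

The intercept b_0 is the prediction at x = 0, and each unit increase in x adds b_1 to the prediction.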
In this practical, we use the principle of least squares.
Now consider:

$$y_i = b_0 + b_1 x_i + e_i = h(x_i) + e_i \quad\Longrightarrow\quad e_i = y_i - h(x_i)$$

Here, e_i is the residual error in the ith observation.

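The residuals are straightforward to compute once a candidate line is chosen. This sketch uses the practical's dataset with a hypothetical candidate line (b_0 = 1, b_1 = 1) that is deliberately not the least-squares fit, so the residuals are easy to read off.

```python
import numpy as np

# Dataset used in this practical
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Hypothetical candidate line, for illustration only
b_0, b_1 = 1.0, 1.0

h = b_0 + b_1 * x   # predicted responses h(x_i)
e = y - h           # residuals e_i = y_i - h(x_i)

print(e)
print("sum of squared residuals:", np.sum(e ** 2))
```

For this candidate line the squared residuals sum to 18; the least-squares line is the one that makes this total as small as possible.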
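So, our aim is to minimize the total residual error.
We define the squared error, or cost function, J as:

$$J(b_0, b_1) = \sum_{i=1}^{n} e_i^2$$

and our task is to find the values of b_0 and b_1 for which J(b_0, b_1) is
minimum. Without going into the mathematical details, we present the result
here:

$$b_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

where \bar{x} and \bar{y} are the means of x and y.
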
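For readers who want the omitted details, the result follows from setting both partial derivatives of J to zero (the normal equations). This derivation is added for illustration and assumes J is the plain sum of squared residuals; a positive constant factor such as 1/(2n) would not change the minimizing b_0 and b_1.

```latex
\begin{align*}
J(b_0, b_1) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \\
\frac{\partial J}{\partial b_0} &= -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
  \;\Longrightarrow\; b_0 = \bar{y} - b_1 \bar{x} \\
\frac{\partial J}{\partial b_1} &= -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0 \\
&\Longrightarrow\; \sum_{i=1}^{n} x_i (y_i - \bar{y}) = b_1 \sum_{i=1}^{n} x_i (x_i - \bar{x})
  && \text{(substituting } b_0 = \bar{y} - b_1 \bar{x}\text{)} \\
&\Longrightarrow\; b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
  && \text{(since } \textstyle\sum (x_i - \bar{x}) = \sum (y_i - \bar{y}) = 0\text{)}
\end{align*}
```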

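where SS_xy is the sum of cross-deviations of y and x:

$$SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} y_i x_i - n\,\bar{x}\,\bar{y}$$

and SS_xx is the sum of squared deviations of x:

$$SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2$$
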
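These formulas can be checked numerically on the practical's dataset. The sketch below (an illustration, not part of the original manual) computes SS_xy and SS_xx both in deviation form and in the shortcut form used by the program that follows, then confirms that nudging either coefficient away from the least-squares values increases J. The helper name `cost` is hypothetical.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12], dtype=float)
n = x.size
m_x, m_y = x.mean(), y.mean()

# Deviation form of the sums
SS_xy_dev = np.sum((x - m_x) * (y - m_y))
SS_xx_dev = np.sum((x - m_x) ** 2)

# Shortcut form (the one used in estimate_coef)
SS_xy_short = np.sum(x * y) - n * m_x * m_y
SS_xx_short = np.sum(x * x) - n * m_x * m_x

# Least-squares coefficients
b_1 = SS_xy_dev / SS_xx_dev
b_0 = m_y - b_1 * m_x


def cost(b_0, b_1):
    """Hypothetical helper: sum of squared residuals J(b_0, b_1)."""
    e = y - (b_0 + b_1 * x)
    return np.sum(e ** 2)


print(SS_xy_dev, SS_xy_short)  # 96.5 96.5
print(SS_xx_dev, SS_xx_short)  # 82.5 82.5

# Each small perturbation of a coefficient should raise J
J_min = cost(b_0, b_1)
for db_0, db_1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert cost(b_0 + db_0, b_1 + db_1) > J_min
print("J at the least-squares solution:", J_min)
```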
Code:
import numpy as np
import matplotlib.pyplot as plt


def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vectors
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x

    return (b_0, b_1)


def plot_regression_line(x, y, b):
    # plotting the actual points as a scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1] * x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()


def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)


if __name__ == "__main__":
    main()

Output:
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
The graph obtained looks like this:

[Figure: data points plotted in magenta with the fitted regression line in green]

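As an optional cross-check (not part of the original practical), NumPy's `np.polyfit` fits a degree-1 polynomial by least squares and should reproduce the coefficients printed above. Note that `np.polyfit` returns coefficients from the highest degree down, so the slope comes first.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Degree-1 least-squares fit; returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(f"b_0 = {intercept:.4f}, b_1 = {slope:.4f}")  # b_0 = 1.2364, b_1 = 1.1697
```

Agreement between the two approaches is a quick way to catch a slip in the hand-written formulas.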