Assignment 4

The document outlines a process for creating a linear regression model using Python libraries such as Pandas, NumPy, and Matplotlib. It includes steps for data preparation, model creation using the polyfit function, predictions, and performance evaluation using R-squared. Additionally, it introduces the Boston Housing dataset, detailing its attributes and characteristics.

In [1]:

# Import libraries and create aliases for Pandas, NumPy and Matplotlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# Create arrays with independent variable x and dependent variable y.
x = np.array([95, 85, 80, 70, 60])
y = np.array([85, 95, 70, 65, 70])

In [3]:
# Create a linear regression model using the polyfit function
model = np.polyfit(x, y, 1)

In [4]:
# Observe the coefficients of the model.
model

Out[4]: array([ 0.64383562, 26.78082192])
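For intuition, the two numbers are the slope and intercept of the least-squares line. A minimal sketch (not in the original notebook) verifying them from the closed-form formulas, reusing np, x and y from the cells above:

    # Closed-form least-squares estimates for y = b0 + b1*x
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    print(b1, b0)  # ~0.64383562 26.78082192, matching np.polyfit(x, y, 1)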

In [5]:
# Predict the y value for x = 65 and observe the output.
predict = np.poly1d(model)
predict(65)

Out[5]: 68.63013698630137

In [6]:
# Predict y_pred for all values of x.
y_pred = predict(x)
y_pred

Out[6]: array([87.94520548, 81.50684932, 78.28767123, 71.84931507, 65.4109589 ])

In [7]:
# Evaluate the performance of the model (R-squared)
from sklearn.metrics import r2_score
r2_score(y, y_pred)

Out[7]: 0.4803218090889322
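r2_score computes the usual coefficient of determination, R^2 = 1 - SS_res/SS_tot. A minimal sketch of the same calculation by hand, reusing y and y_pred from the cells above:

    # R-squared from its definition
    ss_res = np.sum((y - y_pred) ** 2)    # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    print(1 - ss_res / ss_tot)            # ~0.4803, matching r2_score(y, y_pred)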

In [8]:
# Plotting the linear regression model
y_line = model[1] + model[0] * x
plt.plot(x, y_line, c='r')
plt.scatter(x, y_pred)
plt.scatter(x, y, c='r')

Out[8]: <matplotlib.collections.PathCollection at 0x1c17e8ab490>


Algorithm (Boston Dataset):

In [9]:
# Import libraries and create aliases for Pandas, NumPy and Matplotlib
import numpy as np

import pandas as pd
import matplotlib.pyplot as plt

In [16]:
# Import the Boston Housing dataset
from sklearn.datasets import load_boston
boston = load_boston()
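Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the cell above only runs on older versions. A possible fallback, assuming the OpenML copy named "boston" is reachable over the network (the variable names below are illustrative, not part of the original notebook):

    # Hypothetical alternative for scikit-learn >= 1.2
    from sklearn.datasets import fetch_openml
    boston_frame = fetch_openml(name="boston", version=1, as_frame=True)
    X_df = boston_frame.data    # 13 feature columns as a DataFrame
    y_sr = boston_frame.target  # MEDV target as a Series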

In [18]:
boston

Out[18]:
{'data': array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02, 4.9800e+00],
        [2.7310e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9690e+02, 9.1400e+00],
        [2.7290e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9283e+02, 4.0300e+00],
        ...,
        [6.0760e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02, 5.6400e+00],
        [1.0959e-01, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9345e+02, 6.4800e+00],
        [4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02, 7.8800e+00]]),
 'target': array([24. , 21.6, 34.7, ..., 23.9, 22. , 11.9]),
 'feature_names': array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7'),
 'DESCR': Boston house prices dataset.
   Number of Instances: 506
   Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.
   Attribute Information (in order):
     - CRIM     per capita crime rate by town
     - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
     - INDUS    proportion of non-retail business acres per town
     - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
     - NOX      nitric oxides concentration (parts per 10 million)
     - RM       average number of rooms per dwelling
     - AGE      proportion of owner-occupied units built prior to 1940
     - DIS      weighted distances to five Boston employment centres
     - RAD      index of accessibility to radial highways
     - TAX      full-value property-tax rate per $10,000
     - PTRATIO  pupil-teacher ratio by town
     - B        1000(Bk - 0.63)^2 where Bk is the proportion of black people by town
     - LSTAT    % lower status of the population
     - MEDV     median value of owner-occupied homes in $1000's
   Missing Attribute Values: None
   Creators: Harrison, D. and Rubinfeld, D.L.
   This is a copy of the UCI ML housing dataset (https://archive.ics.uci.edu/ml/machine-learning-databases/housing/), taken from the StatLib library maintained at Carnegie Mellon University. The Boston house-price data of Harrison, D. and Rubinfeld, D.L., 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression Diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. N.B. Various transformations are used in the table on pages 244-261 of the latter.
   References:
     - Belsley, Kuh & Welsch, 'Regression Diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980, 244-261.
     - Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.,
 'filename': 'boston_house_prices.csv',
 'data_module': 'sklearn.datasets.data'}

In [20]:
# Initialize the data frame
data = pd.DataFrame(boston.data)

# Add the feature names to the dataframe
data.columns = boston.feature_names
data.head()
Out[20]:
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  PTRATIO       B  LSTAT
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0     15.3  396.90   4.98
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0     17.8  396.90   9.14
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0     17.8  392.83   4.03
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0     18.7  394.63   2.94
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0     18.7  396.90   5.33
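A quick sanity check on the assembled frame (a minimal sketch; the 506 x 13 shape follows from the dataset description above):

    # 506 rows and 13 feature columns before the target is added
    print(data.shape)  # expected: (506, 13)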

In [21]:
# Add the target variable to the dataframe
data['PRICE'] = boston.target

In [22]:
# Perform data preprocessing (check for missing values)
data.isnull().sum()
Out[22]:
CRIM       0
ZN         0
INDUS      0
CHAS       0
NOX        0
RM         0
AGE        0
DIS        0
RAD        0
TAX        0
PTRATIO    0
B          0
LSTAT      0
PRICE      0
dtype: int64

In [23]:
# Split the independent variables (x) from the dependent variable (y)
x = data.drop(['PRICE'], axis=1)
y = data['PRICE']

In [25]:
# splitting data to training and testing dataset. from sklearn.model_selection
import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size =0.2,random_state = 0)
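With test_size=0.2 the split is roughly 80/20; a small sketch to confirm the resulting sizes (exact counts assume the 506-row dataset, since scikit-learn rounds the test set up):

    # 506 rows -> 404 train / 102 test
    print(xtrain.shape, xtest.shape)  # expected: (404, 13) (102, 13)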
In [27]:
# Use linear regression (train the machine) to create the model
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
model = lm.fit(xtrain, ytrain)
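Once fitted, a LinearRegression exposes its learned parameters as attributes. A short sketch for inspecting them (the printed values depend on the split):

    # One coefficient per feature, plus a scalar intercept
    print(lm.intercept_)
    print(dict(zip(x.columns, lm.coef_)))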

In [28]:
# Predict y_pred for all values of xtrain and xtest
ytrain_pred = lm.predict(xtrain)
ytest_pred = lm.predict(xtest)

In [29]:
# Tabulate predicted vs. actual values for the training and test sets
df_train = pd.DataFrame(ytrain_pred, ytrain)
df_test = pd.DataFrame(ytest_pred, ytest)

In [30]:
# Calculate Mean Square Paper for train_y and test_y from sklearn.metrics
import mean_squared_error, r2_score mse = mean_squared_error(ytest, ytest_pred)
print(mse)

Loading [MathJax]/extensions/Safe.js
mse = mean_squared_error(ytrain_pred,ytrain)
print(mse)

33.44897999767649
19.326470203585725
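Since r2_score is already imported in the cell above, the same predictions can also be scored with R-squared (a small sketch; values depend on the data and the random_state=0 split):

    # R-squared on the test and training sets
    print(r2_score(ytest, ytest_pred))
    print(r2_score(ytrain, ytrain_pred))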

In [33]:
# Plotting the linear regression model
plt.scatter(ytrain, ytrain_pred, c='blue', marker='o', label='Training data')
plt.scatter(ytest, ytest_pred, c='lightgreen', marker='s', label='Test data')
plt.xlabel('True values')
plt.ylabel('Predicted')
plt.title("True value vs Predicted value")
plt.legend(loc='upper left')
plt.plot()
plt.show()