2/28/22, 3:10 PM Linear regression
NAME:- PRIADARSHANA
ROLL NO:- 2019332
SIMPLE LINEAR REGRESSION
localhost:8888/notebooks/Machine learning/Linear regression.ipynb 1/7
In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import statsmodels.api as sm
data = pd.read_csv(r"Advertising.csv")
data
Out[1]:
Unnamed: 0 TV Radio Newspaper Sales
0 1 230.1 37.8 69.2 22.1
1 2 44.5 39.3 45.1 10.4
2 3 17.2 45.9 69.3 9.3
3 4 151.5 41.3 58.5 18.5
4 5 180.8 10.8 58.4 12.9
5 6 8.7 48.9 75.0 7.2
6 7 57.5 32.8 23.5 11.8
7 8 120.2 19.6 11.6 13.2
8 9 8.6 2.1 1.0 4.8
9 10 199.8 2.6 21.2 10.6
10 11 66.1 5.8 24.2 8.6
11 12 214.7 24.0 4.0 17.4
12 13 23.8 35.1 65.9 9.2
13 14 97.5 7.6 7.2 9.7
14 15 204.1 32.9 46.0 19.0
15 16 195.4 47.7 52.9 22.4
16 17 67.8 36.6 114.0 12.5
17 18 281.4 39.6 55.8 24.4
18 19 69.2 20.5 18.3 11.3
19 20 147.3 23.9 19.1 14.6
20 21 218.4 27.7 53.4 18.0
21 22 237.4 5.1 23.5 12.5
22 23 13.2 15.9 49.6 5.6
23 24 228.3 16.9 26.2 15.5
24 25 62.3 12.6 18.3 9.7
25 26 262.9 3.5 19.5 12.0
26 27 142.9 29.3 12.6 15.0
27 28 240.1 16.7 22.9 15.9
28 29 248.8 27.1 22.9 18.9
29 30 70.6 16.0 40.8 10.5
... ... ... ... ... ...
170 171 50.0 11.6 18.4 8.4
171 172 164.5 20.9 47.4 14.5
172 173 19.6 20.1 17.0 7.6
173 174 168.4 7.1 12.8 11.7
174 175 222.4 3.4 13.1 11.5
175 176 276.9 48.9 41.8 27.0
176 177 248.4 30.2 20.3 20.2
177 178 170.2 7.8 35.2 11.7
178 179 276.7 2.3 23.7 11.8
179 180 165.6 10.0 17.6 12.6
180 181 156.6 2.6 8.3 10.5
181 182 218.5 5.4 27.4 12.2
182 183 56.2 5.7 29.7 8.7
183 184 287.6 43.0 71.8 26.2
184 185 253.8 21.3 30.0 17.6
185 186 205.0 45.1 19.6 22.6
186 187 139.5 2.1 26.6 10.3
187 188 191.1 28.7 18.2 17.3
188 189 286.0 13.9 3.7 15.9
189 190 18.7 12.1 23.4 6.7
190 191 39.5 41.1 5.8 10.8
191 192 75.5 10.8 6.0 9.9
192 193 17.2 4.1 31.6 5.9
193 194 166.8 42.0 3.6 19.6
194 195 149.7 35.6 6.0 17.3
195 196 38.2 3.7 13.8 7.6
196 197 94.2 4.9 8.1 9.7
197 198 177.0 9.3 6.4 12.8
198 199 283.6 42.0 66.2 25.5
199 200 232.1 8.6 8.7 13.4
200 rows × 5 columns
In [2]: data.columns
Out[2]: Index(['Unnamed: 0', 'TV', 'Radio', 'Newspaper', 'Sales'], dtype='object')
In [4]: plt.figure(figsize=(16, 8))
plt.scatter(
data['TV'],
data['Sales']
)
plt.xlabel("TV ")
plt.ylabel("Sales ")
plt.show()
In [6]: X = data['TV'].values.reshape(-1,1)
y = data['Sales'].values.reshape(-1,1)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_stat
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
reg = LinearRegression()
reg.fit(X_train, y_train)
(140, 1)
(60, 1)
(140, 1)
(60, 1)
Out[6]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
normalize=False)
In [7]: print(reg.coef_[0][0])
print(reg.intercept_[0])
print("The linear model is: Y = {:.5} + {:.5}X".format(reg.intercept_[0], reg.coe
0.04581434217189623
7.310810165411681
The linear model is: Y = 7.3108 + 0.045814X
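The fitted line can be used directly for prediction: each additional unit of TV spend is associated with roughly 0.0458 additional units of Sales. A minimal sketch using the intercept and slope printed above; the helper name `predict_sales` and the budget of 100 are illustrative and not part of the notebook:

```python
# Coefficients copied from the output of In [7].
INTERCEPT = 7.310810165411681
SLOPE = 0.04581434217189623


def predict_sales(tv_budget):
    """Predicted Sales for a given TV budget (hypothetical helper)."""
    return INTERCEPT + SLOPE * tv_budget


# Example budget chosen only for illustration.
print(round(predict_sales(100.0), 4))
```

This should agree with `reg.predict([[100.0]])` on the fitted model, since both evaluate the same straight line.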
In [9]: predictions = reg.predict(X_test)
plt.figure(figsize=(16, 8))
plt.scatter(
data['TV'],
data['Sales']
)
plt.plot(
X_test,
predictions,
linewidth=2,
color='red'
)
plt.xlabel("TV ")
plt.ylabel("Sales ")
plt.show()
In [10]: # Refit on the training split with statsmodels to get standard errors and p-values
X = X_train
y = y_train
X2 = sm.add_constant(X)
est = sm.OLS(y, X2)
est2 = est.fit()
print(est2.summary())
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.555
Model: OLS Adj. R-squared: 0.552
Method: Least Squares F-statistic: 172.3
Date: Mon, 28 Feb 2022 Prob (F-statistic): 4.76e-26
Time: 14:28:59 Log-Likelihood: -371.64
No. Observations: 140 AIC: 747.3
Df Residuals: 138 BIC: 753.2
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 7.3108 0.611 11.957 0.000 6.102 8.520
x1 0.0458 0.003 13.125 0.000 0.039 0.053
==============================================================================
Omnibus: 1.727 Durbin-Watson: 1.908
Prob(Omnibus): 0.422 Jarque-Bera (JB): 1.452
Skew: -0.086 Prob(JB): 0.484
Kurtosis: 2.532 Cond. No. 366.
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
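Several figures in the summary above can be reproduced from the others, which is a quick way to see how they relate. A rough sketch using the rounded values printed in the table, so agreement is only to the printed precision; the variable names are illustrative, and the AIC/BIC formulas assume the parameter count includes the intercept:

```python
import math

# Values read from the OLS summary above (simple regression, training split).
n = 140            # No. Observations
p = 1              # Df Model (number of predictors)
r2 = 0.555         # R-squared
t_slope = 13.125   # t statistic of x1
log_lik = -371.64  # Log-Likelihood

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# With a single predictor, the F-statistic equals the squared slope t statistic.
f_from_t = t_slope ** 2

# Information criteria, with k counting the slope and the intercept.
k = p + 1
aic = 2 * k - 2 * log_lik
bic = k * math.log(n) - 2 * log_lik

print(round(adj_r2, 3), round(f_from_t, 1), round(aic, 1), round(bic, 1))
```

The printed values can be compared against the Adj. R-squared, F-statistic, AIC, and BIC entries of the table.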
In [11]: print('Train Score :', reg.score(X_train,y_train))
print('Test Score:', reg.score(X_test,y_test))
Train Score : 0.5552336104251212
Test Score: 0.725606346597073
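For a regressor, `reg.score` returns the coefficient of determination R², which is why the train score matches the R-squared of 0.555 in the statsmodels summary. A higher score on the test split than on the training split can happen with a single random split of this size, so it should not be read as evidence of a better fit. The sketch below shows how R² is computed; the function name and the four data points are made up for illustration:

```python
def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (illustrative re-implementation)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


# Toy values, not taken from the Advertising data.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]
print(r2_manual(y_true, y_pred))
```

Predicting the mean of `y_true` for every point gives a score of 0, and a perfect prediction gives 1.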
In [12]: from sklearn import metrics
print('MSE :', metrics.mean_squared_error(y_test,predictions))
print('RMSE :', np.sqrt(metrics.mean_squared_error(y_test,predictions)))
MSE : 7.497479593464674
RMSE : 2.7381525876883988
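RMSE is the square root of MSE and is expressed in the same units as Sales, so a typical test-set error here is about 2.74 units of Sales. A small sketch of both metrics without scikit-learn; the function names and toy values are illustrative only:

```python
import math


def mse_manual(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)


def rmse_manual(y_true, y_pred):
    """Square root of the MSE, in the units of the target."""
    return math.sqrt(mse_manual(y_true, y_pred))


# Toy values, not taken from the Advertising data.
actual = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 12.0]
print(mse_manual(actual, predicted), rmse_manual(actual, predicted))

# The RMSE reported above is the square root of the reported MSE.
reported_mse = 7.497479593464674
reported_rmse = 2.7381525876883988
print(math.isclose(math.sqrt(reported_mse), reported_rmse, rel_tol=1e-9))
```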
MULTIPLE LINEAR REGRESSION
In [14]: Xs = data.drop(['Sales', 'Unnamed: 0'], axis=1)
y = data['Sales'].values.reshape(-1,1)
reg = LinearRegression()
reg.fit(Xs, y)
print("The linear model is: Y = {:.5} + {:.5}*TV + {:.5}*radio + {:.5}*newspaper
The linear model is: Y = 2.9389 + 0.045765*TV + 0.18853*radio + -0.0010375*newspaper
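To see what the multiple-regression equation implies for a single observation, it can be evaluated by hand. A minimal sketch using the rounded coefficients printed above and the first row of the data (TV 230.1, Radio 37.8, Newspaper 69.2, Sales 22.1); the helper name `predict_sales_multi` is hypothetical:

```python
# Rounded coefficients copied from the printed model.
INTERCEPT = 2.9389
COEF_TV = 0.045765
COEF_RADIO = 0.18853
COEF_NEWSPAPER = -0.0010375


def predict_sales_multi(tv, radio, newspaper):
    """Evaluate the fitted linear model for one observation (hypothetical helper)."""
    return INTERCEPT + COEF_TV * tv + COEF_RADIO * radio + COEF_NEWSPAPER * newspaper


# Row 0 of the Advertising data as shown in Out[1].
predicted = predict_sales_multi(230.1, 37.8, 69.2)
residual = 22.1 - predicted  # actual Sales minus predicted Sales
print(round(predicted, 3), round(residual, 3))
```

Because the coefficients are rounded to five significant digits, the result may differ slightly from `reg.predict` on the same row.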
In [16]: X = np.column_stack((data['TV'], data['Radio'], data['Newspaper']))
y = data['Sales']
X2 = sm.add_constant(X)
est = sm.OLS(y, X2)
est2 = est.fit()
print(est2.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Sales R-squared: 0.897
Model: OLS Adj. R-squared: 0.896
Method: Least Squares F-statistic: 570.3
Date: Mon, 28 Feb 2022 Prob (F-statistic): 1.58e-96
Time: 15:01:01 Log-Likelihood: -386.18
No. Observations: 200 AIC: 780.4
Df Residuals: 196 BIC: 793.6
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 2.9389 0.312 9.422 0.000 2.324 3.554
x1 0.0458 0.001 32.809 0.000 0.043 0.049
x2 0.1885 0.009 21.893 0.000 0.172 0.206
x3 -0.0010 0.006 -0.177 0.860 -0.013 0.011
==============================================================================
Omnibus: 60.414 Durbin-Watson: 2.084
Prob(Omnibus): 0.000 Jarque-Bera (JB): 151.241
Skew: -1.327 Prob(JB): 1.44e-33
Kurtosis: 6.332 Cond. No. 454.
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
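In this summary x1, x2, and x3 correspond to TV, Radio, and Newspaper, following the column order passed to `np.column_stack`. One way to read the table is to check which 95% confidence intervals contain zero: a coefficient whose interval spans zero is not distinguishable from no effect at that level. A short sketch with the values copied from the table; the dictionary layout is an illustrative choice:

```python
# (coef, lower 2.5% bound, upper 97.5% bound) copied from the summary table.
intervals = {
    "const": (2.9389, 2.324, 3.554),
    "x1": (0.0458, 0.043, 0.049),     # TV
    "x2": (0.1885, 0.172, 0.206),     # Radio
    "x3": (-0.0010, -0.013, 0.011),   # Newspaper
}

# Collect the terms whose interval includes zero.
includes_zero = [
    name for name, (_, low, high) in intervals.items() if low <= 0.0 <= high
]
print(includes_zero)
```

Only the Newspaper term is flagged, which is consistent with its p-value of 0.860 in the table.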