4/25/25, 9:12 AM DL_LR_1.ipynb - Colab
1. Problem Statement: A real estate agent wants help predicting house prices for
regions in the USA. He gave you a dataset to work on, and you decided to use a
Linear Regression model. Create a model that will help him estimate what a house
would sell for. URL for the dataset:
https://github.com/huzaifsayed/Linear-Regression-Model-for-House-PricePrediction/blob/master/USA_Housing.csv
Linear Regression Machine Learning Project for House Price Prediction
Import Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
Importing the Data and Checking It Out
HouseDF = pd.read_csv('USA_Housing.csv')
HouseDF.head()
   Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  Avg. Area Number of Bedrooms  Area Population         Price  Address
0      79545.458574             5.682861                   7.009188                          4.09     23086.800503  1.059034e+06  208 Michael Ferry Apt. 674\nLaurabury, NE 3701...
1      79248.642455             6.002900                   6.730821                          3.09     40173.072174  1.505891e+06  188 Johnson Views Suite 079\nLake Kathleen, CA...
2      61287.067179             5.865890                   8.512727                          5.13     36882.159400  1.058988e+06  9127 Elizabeth Stravenue\nDanieltown, WI 06482...
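The cell above reads a local copy of USA_Housing.csv. If the file is to be read straight from GitHub instead, note that a github.com ".../blob/..." address returns an HTML page rather than the CSV itself; the raw file is served from raw.githubusercontent.com. The helper below is a minimal sketch of that URL conversion. The function name `github_blob_to_raw` and the example URL are hypothetical and not part of the original notebook.

```python
def github_blob_to_raw(blob_url: str) -> str:
    """Convert a github.com '/blob/' page URL to its raw.githubusercontent.com form."""
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix) or "/blob/" not in blob_url:
        raise ValueError("expected a https://github.com/<user>/<repo>/blob/... URL")
    # Drop the host prefix and the first '/blob/' path segment.
    path = blob_url[len(prefix):].replace("/blob/", "/", 1)
    return "https://raw.githubusercontent.com/" + path


# Hypothetical URL, used only to show the transformation.
example = github_blob_to_raw("https://github.com/some-user/some-repo/blob/master/data.csv")
print(example)
```

With network access, the converted address can be passed to pd.read_csv in place of the local file name.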
HouseDF.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Avg. Area Income 5000 non-null float64
1 Avg. Area House Age 5000 non-null float64
2 Avg. Area Number of Rooms 5000 non-null float64
3 Avg. Area Number of Bedrooms 5000 non-null float64
4 Area Population 5000 non-null float64
5 Price 5000 non-null float64
6 Address 5000 non-null object
dtypes: float64(6), object(1)
memory usage: 273.6+ KB
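The info() output shows no missing values and a single non-numeric column, Address. The same two checks can be made explicitly, as sketched below. The two-row frame is made up for illustration (only the column names come from the dataset); in the notebook the calls would be applied to HouseDF.

```python
import pandas as pd

# Two made-up rows with the same column layout as USA_Housing.csv (illustrative values only).
sample = pd.DataFrame({
    "Avg. Area Income": [60000.0, 75000.0],
    "Avg. Area House Age": [5.5, 6.2],
    "Avg. Area Number of Rooms": [6.8, 7.1],
    "Avg. Area Number of Bedrooms": [3.0, 4.1],
    "Area Population": [30000.0, 41000.0],
    "Price": [1.0e6, 1.4e6],
    "Address": ["1 Example St", "2 Example Ave"],
})

# Count missing values per column; all zeros means nothing needs imputing.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Keep only numeric columns; the free-text Address column drops out.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

This is why Address is left out of the feature list later on: LinearRegression needs numeric inputs, and a free-text address has no direct numeric meaning.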
HouseDF.describe()
       Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  Avg. Area Number of Bedrooms  Area Population         Price
count       5000.000000          5000.000000                5000.000000                   5000.000000      5000.000000  5.000000e+03
mean       68583.108984             5.977222                   6.987792                      3.981330     36163.516039  1.232073e+06
std        10657.991214             0.991456                   1.005833                      1.234137      9925.650114  3.531176e+05
min        17796.631190             2.644304                   3.236194                      2.000000       172.610686  1.593866e+04
25%        61480.562388             5.322283                   6.299250                      3.140000     29403.928702  9.975771e+05
50%        68804.286404             5.970429                   7.002902                      4.050000     36199.406689  1.232669e+06
75%        75783.338666             6.650808                   7.665871                      4.490000     42861.290769  1.471210e+06
max       107701.748378             9.519088                  10.759588                      6.500000     69621.713378  2.469066e+06
https://colab.research.google.com/drive/1XryAyE8KlmqLqHl2drS7VFsYKN47tyod#scrollTo=qTfL0zUSCRyn&printMode=true 1/5
HouseDF.columns
Index(['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
'Avg. Area Number of Bedrooms', 'Area Population', 'Price', 'Address'],
dtype='object')
sns.pairplot(HouseDF, diag_kind='kde')
plt.show()
Exploratory Data Analysis for House Price Prediction
sns.histplot(HouseDF['Price'], kde=True)
plt.show()
corr = HouseDF.select_dtypes(include='number').corr()['Price'].drop('Price')
corr.sort_values().plot(kind='barh', figsize=(8, 5), title='Correlation with Price')
plt.show()
Training a Linear Regression Model
Defining the Features (X) and Target (y)
X = HouseDF[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
'Avg. Area Number of Bedrooms', 'Area Population']]
y = HouseDF['Price']
Split Data into Train, Test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
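With test_size=0.4, 40% of the rows are held out for testing and 60% are used for training; random_state=101 makes the shuffle reproducible. The sketch below checks the resulting sizes on random stand-in data with the same shape as the feature matrix (5000 rows, 5 columns). The `_demo` names are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random stand-in data with the same shape as X and y in the notebook.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5000, 5))
y_demo = rng.normal(size=5000)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=101
)
print(X_tr.shape, X_te.shape)  # (3000, 5) (2000, 5)
```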
Creating and Training the LinearRegression Model
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train,y_train)
LinearRegression()
LinearRegression Model Evaluation
print(lm.intercept_)
-2640159.7968526953
coeff_df = pd.DataFrame(lm.coef_,X.columns,columns=['Coefficient'])
coeff_df
Coefficient
Avg. Area Income 21.528276
Avg. Area House Age 164883.282027
Avg. Area Number of Rooms 122368.678027
Avg. Area Number of Bedrooms 2233.801864
Area Population 15.150420
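Each coefficient is the estimated change in Price for a one-unit increase in that feature with the other features held fixed; for example, one extra dollar of Avg. Area Income is associated with roughly 21.53 dollars of additional price. A prediction is the intercept plus the sum of coefficient times feature value. The sketch below applies that formula by hand to the first row shown by HouseDF.head(), using the printed intercept and coefficients. Because the printed coefficients are rounded to six decimals, the result only approximates what lm.predict would return for that row.

```python
# Intercept and coefficients as printed in the notebook output.
intercept = -2640159.7968526953
coefficients = {
    "Avg. Area Income": 21.528276,
    "Avg. Area House Age": 164883.282027,
    "Avg. Area Number of Rooms": 122368.678027,
    "Avg. Area Number of Bedrooms": 2233.801864,
    "Area Population": 15.150420,
}

# Feature values of row 0 from the HouseDF.head() output.
row0 = {
    "Avg. Area Income": 79545.458574,
    "Avg. Area House Age": 5.682861,
    "Avg. Area Number of Rooms": 7.009188,
    "Avg. Area Number of Bedrooms": 4.09,
    "Area Population": 23086.800503,
}

# Linear model: y_hat = intercept + sum(coef_i * x_i)
manual_prediction = intercept + sum(coefficients[k] * row0[k] for k in coefficients)
print(round(manual_prediction, 2))
```

The printed value can be compared with the listed Price for that row (1.059034e+06) to see the size of the error on a single house.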
Predictions from our Linear Regression Model
predictions = lm.predict(X_test)
plt.scatter(y_test,predictions)
plt.show()
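The scatter plot gives a visual check: points close to a straight diagonal mean the predictions track the true prices. A numeric summary is a common complement. The sketch below computes MAE, MSE, RMSE and R² with sklearn.metrics on synthetic data; the data, the `_demo` names and the chosen coefficients are assumptions for illustration. In the notebook, the same metric calls would take y_test and predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic linear data with a little noise (stand-in for the housing data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5]) + 10.0 + rng.normal(scale=0.1, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=101
)
model = LinearRegression().fit(X_tr, y_tr)
preds = model.predict(X_te)

mae = mean_absolute_error(y_te, preds)   # average absolute error
mse = mean_squared_error(y_te, preds)    # average squared error
rmse = float(np.sqrt(mse))               # same units as the target
r2 = r2_score(y_te, preds)               # share of variance explained

print(f"MAE={mae:.4f}  MSE={mse:.4f}  RMSE={rmse:.4f}  R2={r2:.4f}")
```

RMSE is often the easiest of these to read for house prices because it is expressed in the same units as Price.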