Multiple Linear Regression
1. Multiple Linear Regression
Multiple Linear Regression models the relationship between a single continuous
dependent variable and two or more independent variables.
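In equation form, such a model can be written as below. The symbols y, x_i, b_0 and b_i are notation introduced here for illustration and are not taken from the original text.

```latex
y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n
```

Here y is the dependent variable, x_1 to x_n are the independent variables, b_0 is the intercept, and b_1 to b_n are the coefficients that the model learns from data.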
2. Problem statement
Suppose we are planning to buy a new house and need to predict its price.
Here the price depends on area (square feet), number of bedrooms, and age of the
home (in years).
Given a set of known home prices, we have to predict the prices of new homes based
on area, bedrooms, and age.
Specifically, find the price of a home that has:
3000 sq ft area, 3 bedrooms, 40 years old
2500 sq ft area, 4 bedrooms, 5 years old
3. Dataset
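We use homeprices1.csv, which contains the rows below. The programs in this
document access the columns by lowercase names (df.bedrooms, df.price), so the
header is shown in lowercase. NaN marks a missing value.
area  bedrooms  age  price
2600  3         20   550000
3000  4         15   565000
3200  NaN       18   610000
3600  3         30   595000
4000  5         8    760000
4100  6         8    810000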
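If the file is not at hand, it can be recreated from the rows listed above. The following is a minimal sketch, assuming the CSV uses lowercase column names (area, bedrooms, age, price), which is how the demo programs access them.

```python
import pandas as pd

# Rows copied from the dataset listing; None becomes NaN (the missing bedrooms value).
# Lowercase column names are an assumption based on how the demos access columns.
data = {
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3, 4, None, 3, 5, 6],
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
}
df = pd.DataFrame(data)

# Write the file the demo programs read; index=False keeps only the four columns.
df.to_csv("homeprices1.csv", index=False)

# Read it back to confirm the round trip.
check = pd.read_csv("homeprices1.csv")
print(check)
```

Because one bedrooms value is missing, pandas stores that column as floating point, so the values print as 3.0, 4.0, NaN and so on.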
4. Machine Learning Terminology
4.1 Features and label
Area, Bedrooms, Age → Independent variables (features)
Price → Dependent variable (label)
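In code, this split is usually expressed as a feature table and a label column. The sketch below builds the dataset inline so it runs on its own. The names X and y are a common convention and are introduced here only for illustration.

```python
import pandas as pd

# Same rows as the dataset section, built inline so the example is self-contained.
df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3, 4, None, 3, 5, 6],
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})

# Features: the independent variables, one column each.
X = df[["area", "bedrooms", "age"]]

# Label: the dependent variable we want to predict.
y = df["price"]

print(X.shape)  # rows x feature columns
print(y.shape)  # one label per row
```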
4.2 Models
A machine learning model is a formula that predicts a label from features.
4.3 Prediction
The prediction is the output of the model.
5. Programs
demo1.py – Loading dataset
import pandas as pd
df = pd.read_csv("homeprices1.csv")
print(df)
demo2.py – Finding median of bedrooms
import pandas as pd
df = pd.read_csv("homeprices1.csv")
print("Median of the bedrooms")
print(df.bedrooms.median())
# Output: 4.0
demo3.py – Fill NA with median
import pandas as pd
df = pd.read_csv("homeprices1.csv")
print("Filling missing value with median\n")
m = df.bedrooms.median()
df.bedrooms = df.bedrooms.fillna(m)
print(df)
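It can be worth confirming that no missing values remain before training, since LinearRegression does not accept NaN inputs. The following is a small sketch of such a check, with the dataset built inline rather than read from the CSV.

```python
import pandas as pd

# Same rows as homeprices1.csv, built inline so the example is self-contained.
df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3, 4, None, 3, 5, 6],
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})

# Count missing values per column before filling.
print(df.isna().sum())

# Fill the missing bedrooms value with the column median, as in demo3.py.
m = df.bedrooms.median()
df.bedrooms = df.bedrooms.fillna(m)

# Count again; every column should now report zero missing values.
print(df.isna().sum())
```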
demo4.py – Model training
import pandas as pd
from sklearn.linear_model import LinearRegression
df = pd.read_csv("homeprices1.csv")
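# Replace the missing bedrooms value with the column median
m = df.bedrooms.median()
df.bedrooms = df.bedrooms.fillna(m)
# Features: every column except the label (price)
a = df.drop('price', axis='columns')
reg = LinearRegression()
# Fit on the raw feature values so predict() can later take plain lists
reg.fit(a.values, df.price)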
print("Model trained")
demo5.py – Finding intercept
import pandas as pd
from sklearn.linear_model import LinearRegression
df = pd.read_csv("homeprices1.csv")
m = df.bedrooms.median()
df.bedrooms = df.bedrooms.fillna(m)
a = df.drop('price', axis='columns')
reg = LinearRegression()
reg.fit(a.values, df.price)
print("Intercept is:")
print(reg.intercept_)
# Output: 221323.00186540408
demo6.py – Finding coefficients
import pandas as pd
from sklearn.linear_model import LinearRegression
df = pd.read_csv("homeprices1.csv")
m = df.bedrooms.median()
df.bedrooms = df.bedrooms.fillna(m)
a = df.drop('price', axis='columns')
reg = LinearRegression()
reg.fit(a.values, df.price)
print("Coefficients are:")
print(reg.coef_)
# Output: [112.06244194 23388.88007794 -3231.71790863]
# The order follows the feature columns: area, bedrooms, age
demo7.py – Predict price for 3000 sq ft, 3 bedrooms, 40 years old
import pandas as pd
from sklearn.linear_model import LinearRegression
df = pd.read_csv("homeprices1.csv")
m = df.bedrooms.median()
df.bedrooms = df.bedrooms.fillna(m)
a = df.drop('price', axis='columns')
reg = LinearRegression()
reg.fit(a.values, df.price)
print("Price of home with 3000 sq ft area, 3 bedrooms, 40 years old")
print(reg.predict([[3000, 3, 40]]))
# Output: [498408.25158031]
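The problem statement also asks about a second home (2500 sq ft, 4 bedrooms, 5 years old). A sketch of predicting both homes in one call is shown here, with the dataset built inline instead of read from the CSV. The variable names houses and predictions are illustrative.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same rows as homeprices1.csv, built inline so the example is self-contained.
df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3, 4, None, 3, 5, 6],
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})
df.bedrooms = df.bedrooms.fillna(df.bedrooms.median())

# Train exactly as in the demos: all columns except price are features.
features = df.drop("price", axis="columns")
reg = LinearRegression()
reg.fit(features.values, df.price)

# predict() takes one row per home, in the order area, bedrooms, age.
houses = [[3000, 3, 40], [2500, 4, 5]]
predictions = reg.predict(houses)

for house, price in zip(houses, predictions):
    print(house, round(float(price), 2))
```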
demo8.py – Manual calculation of price
import pandas as pd
from sklearn.linear_model import LinearRegression
df = pd.read_csv("homeprices1.csv")
m = df.bedrooms.median()
df.bedrooms = df.bedrooms.fillna(m)
a = df.drop('price', axis='columns')
reg = LinearRegression()
reg.fit(a.values, df.price)
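print("Price of home with 3000 sq ft area, 3 bedrooms, 40 years old")
# price = coef_area*area + coef_bedrooms*bedrooms + coef_age*age + intercept
b = 112.06244194*3000 + 23388.88007794*3 + (-3231.71790863)*40 + 221323.00186540408
print(b)
# Output: approximately 498408.2516 (agrees with demo7.py up to rounding of the
# hard-coded coefficients)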
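Typing the coefficients by hand introduces small rounding differences. One alternative is to read them from the fitted model's coef_ and intercept_ attributes. The sketch below does so with the dataset built inline. The names house, manual and predicted are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same rows as homeprices1.csv, built inline so the example is self-contained.
df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000, 4100],
    "bedrooms": [3, 4, None, 3, 5, 6],
    "age": [20, 15, 18, 30, 8, 8],
    "price": [550000, 565000, 610000, 595000, 760000, 810000],
})
df.bedrooms = df.bedrooms.fillna(df.bedrooms.median())

reg = LinearRegression()
reg.fit(df.drop("price", axis="columns").values, df.price)

# Feature values in the same order as the training columns: area, bedrooms, age.
house = np.array([3000, 3, 40])

# Manual calculation: dot product of coefficients and features, plus the intercept.
manual = float(np.dot(reg.coef_, house) + reg.intercept_)

# The model's own prediction for the same home.
predicted = float(reg.predict(house.reshape(1, -1))[0])

print(manual)
print(predicted)
```

The two printed values should agree to within floating-point precision, which shows that predict() applies the same linear formula as the manual calculation.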