Difference Between Simple And Multiple Linear Regression
Simple linear regression has only one x and one y variable.
Multiple linear regression has one y and two or more x variables.
For instance, when we predict rent based on square feet alone, that is simple linear regression.
When we predict rent based on square feet and the age of the building, that is an example of multiple linear
regression.
An extension of simple linear regression
In simple linear regression there is a one-to-one relationship between the input variable and the output
variable. In multiple linear regression, as the name implies, there is a many-to-one relationship: instead of
using just one input variable, you use several.
Multiple Linear Regression
Until now, we have built the model using only one feature. Now we'll include multiple features and
create a model to see the relationship between those features and the label column. This is called Multiple
Linear Regression.
y = b0 + b1x1 + b2x2 + ... + bnxn
What do terms represent?
y is the response or the target variable
x1, x2, x3, ..., xn are the features (there are several of them)
b1, b2, ..., bn are the coefficients of x1, x2, ..., xn respectively
b0 is the intercept
Each x represents a different feature, and each feature has its own coefficient.
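As a quick illustration of the formula (the coefficient and feature values below are made up for illustration, not taken from any dataset), it can be evaluated directly in Python:

# Minimal sketch: evaluate y = b0 + b1*x1 + b2*x2 + b3*x3 with made-up numbers
b0 = 5.0                     # intercept
b = [2.0, 0.5, -1.0]         # coefficients b1, b2, b3
x = [3.0, 10.0, 4.0]         # feature values x1, x2, x3
y = b0 + sum(bi * xi for bi, xi in zip(b, x))
print(y)                     # 5.0 + 6.0 + 5.0 - 4.0 = 12.0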
Implementation
Step 1: Import Data
In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
In [2]:
dataset = pd.read_csv('data/50_Startups.csv')
In [3]:
dataset.head()
Out[3]:
R&D Spend Administration Marketing Spend State Profit
0 165349.20 136897.80 471784.10 New York 192261.83
1 162597.70 151377.59 443898.53 California 191792.06
2 153441.51 101145.55 407934.54 Florida 191050.39
3 144372.41 118671.85 383199.62 New York 182901.99
4 142107.34 91391.77 366168.42 Florida 166187.94
Here we have more than one feature, so this is multiple linear regression.
Profit is our target variable.
In [4]:
dataset.shape
Out[4]:
(50, 5)
Step 2: Visualize The Data
In [5]:
#sns.pairplot(dataset)
In [6]:
dataset = dataset.drop('State', axis=1)
Here I simply drop the State feature.
In an upcoming post I will show how to deal with categorical features; a brief sketch follows below.
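As a preview only (a hedged sketch, not part of this notebook's pipeline), one common way to handle a categorical column like State is one-hot encoding with pandas, applied before the drop above:

# Sketch only: one-hot encode State instead of dropping it (applied to the original data).
# drop_first=True keeps n-1 dummy columns to avoid redundant, perfectly correlated columns.
encoded = pd.get_dummies(pd.read_csv('data/50_Startups.csv'), columns=['State'], drop_first=True)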
In [7]:
dataset.head()
Out[7]:
R&D Spend Administration Marketing Spend Profit
0 165349.20 136897.80 471784.10 192261.83
1 162597.70 151377.59 443898.53 191792.06
2 153441.51 101145.55 407934.54 191050.39
3 144372.41 118671.85 383199.62 182901.99
4 142107.34 91391.77 366168.42 166187.94
In [8]:
corr = dataset.corr()
sns.heatmap(corr,annot=True)
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x2e6d767ad08>
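Since the heatmap image is not reproduced here, the same information can be read directly from the correlation matrix (a minimal sketch):

# Print each column's correlation with Profit, largest first.
print(corr['Profit'].sort_values(ascending=False))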
Prepare The Data
Scaling The Data
In [9]:
X = dataset.drop('Profit', axis=1)
y = dataset['Profit']
In [10]:
X.head() #before standardized data
Out[10]:
R&D Spend Administration Marketing Spend
0 165349.20 136897.80 471784.10
1 162597.70 151377.59 443898.53
2 153441.51 101145.55 407934.54
3 144372.41 118671.85 383199.62
4 142107.34 91391.77 366168.42
In [11]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X = sc.fit_transform(X)
The features span very different ranges, so we need to bring them onto the same scale.
Here I use StandardScaler, which puts all features on the same scale.
It uses the z-score to standardize the data.
Z-score formula
Z = (x − μ) / σ
Z = standard score
x = observed value
μ = mean of the sample
σ = standard deviation of the sample
Some algorithms, such as tree-based algorithms, do not require scaling.
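As a quick sanity check (a minimal sketch using this dataset's column names), the z-score of the first R&D Spend value can be computed by hand and compared with the scaled array below:

# Manual z-score for the first row of 'R&D Spend'; ddof=0 matches StandardScaler's population std.
col = dataset['R&D Spend']
z_manual = (col.iloc[0] - col.mean()) / col.std(ddof=0)
print(z_manual)   # ~2.016, same as X[0, 0] produced by StandardScaler
print(X[0, 0])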
In [12]:
X #after standardized/scalling data
Out[12]:
array([[ 2.01641149e+00, 5.60752915e-01, 2.15394309e+00],
[ 1.95586034e+00, 1.08280658e+00, 1.92360040e+00],
[ 1.75436374e+00, -7.28257028e-01, 1.62652767e+00],
[ 1.55478369e+00, -9.63646307e-02, 1.42221024e+00],
[ 1.50493720e+00, -1.07991935e+00, 1.28152771e+00],
[ 1.27980001e+00, -7.76239071e-01, 1.25421046e+00],
[ 1.34006641e+00, 9.32147208e-01, -6.88149930e-01],
[ 1.24505666e+00, 8.71980011e-01, 9.32185978e-01],
[ 1.03036886e+00, 9.86952101e-01, 8.30886909e-01],
[ 1.09181921e+00, -4.56640246e-01, 7.76107440e-01],
[ 6.20398248e-01, -3.87599089e-01, 1.49807267e-01],
[ 5.93085418e-01, -1.06553960e+00, 3.19833623e-01],
[ 4.43259872e-01, 2.15449064e-01, 3.20617441e-01],
[ 4.02077603e-01, 5.10178953e-01, 3.43956788e-01],
[ 1.01718075e+00, 1.26919939e+00, 3.75742273e-01],
[ 8.97913123e-01, 4.58678535e-02, 4.19218702e-01],
[ 9.44411957e-02, 9.11841968e-03, 4.40446224e-01],
[ 4.60720127e-01, 8.55666318e-01, 5.91016724e-01],
[ 3.96724938e-01, -2.58465367e-01, 6.92992062e-01],
[ 2.79441650e-01, 1.15983657e+00, -1.74312698e+00],
[ 5.57260867e-02, -2.69587651e-01, 7.23925995e-01],
[ 1.02723599e-01, 1.16918609e+00, 7.32787791e-01],
[ 6.00657792e-03, 5.18495648e-02, 7.62375876e-01],
[-1.36200724e-01, -5.62211268e-01, 7.74348908e-01],
[ 7.31146008e-02, -7.95469167e-01, -5.81939297e-01],
[-1.99311688e-01, 6.56489139e-01, -6.03516725e-01],
[ 3.53702028e-02, 8.21717916e-01, -6.35835495e-01],
[-3.55189938e-02, 2.35068543e-01, 1.17427116e+00],
[-1.68792717e-01, 2.21014050e+00, -7.67189437e-01],
[-1.78608540e-01, 1.14245677e+00, -8.58133663e-01],
[-2.58074369e-01, -2.05628659e-01, -9.90357166e-01],
[-2.76958231e-01, 1.13055391e+00, -1.01441945e+00],
[-2.26948675e-01, 2.83923813e-01, -1.36244978e+00],
[-4.01128925e-01, -6.59324033e-01, 2.98172434e-02],
[-6.00682122e-01, 1.31053525e+00, -1.87861793e-03],
[-6.09749941e-01, -1.30865753e+00, -4.54931587e-02],
[-9.91570153e-01, 2.05924691e-01, -8.17625734e-02],
[-6.52532310e-01, -2.52599402e+00, -1.15608256e-01],
[-1.17717755e+00, -1.99727037e+00, -2.12784866e-01],
[-7.73820359e-01, -1.38312156e+00, -2.97583276e-01],
[-9.89577015e-01, -1.00900218e-01, -3.15785883e-01],
[-1.00853372e+00, -1.32079581e+00, -3.84552407e-01],
[-1.10210556e+00, -9.06937535e-01, -5.20595959e-01],
[-1.28113364e+00, 2.17681524e-01, -1.44960468e+00],
[-1.13430539e+00, 1.20641936e+00, -1.50907418e+00],
[-1.60035036e+00, 1.01253936e-01, -1.72739998e+00],
[-1.59341322e+00, -1.99321741e-01, 7.11122474e-01],
[-1.62236202e+00, 5.07721876e-01, -1.74312698e+00],
[-1.61043334e+00, -2.50940884e+00, -1.74312698e+00],
[-1.62236202e+00, -1.57225506e-01, -1.36998473e+00]])
In [13]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
In [14]:
X_train.shape,X_test.shape,y_train.shape,y_test.shape
Out[14]:
((40, 3), (10, 3), (40,), (10,))
Build Model
In [15]:
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
Out[15]:
LinearRegression()
In [16]:
y_pred = regressor.predict(X_test).round(1)
In [17]:
calculation = pd.DataFrame(np.c_[y_test, y_pred], columns=["Original Profit", "Predicted Profit"])
calculation.head(5)
Out[17]:
Original Profit Predicted Profit
0 103282.38 103901.9
1 144259.40 132763.1
2 146121.95 133567.9
3 77798.83 72911.8
4 191050.39 179627.9
In [18]:
print("Training Accuracy :", regressor.score(X_train, y_train))
print("Testing Accuracy :", regressor.score(X_test, y_test))
Training Accuracy : 0.9499572530324031
Testing Accuracy : 0.9393955917820571
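Note that LinearRegression.score returns the R² (coefficient of determination), not a classification accuracy; a minimal sketch to confirm the same value with sklearn.metrics:

# Sanity check: r2_score on the test set should match regressor.score(X_test, y_test).
from sklearn.metrics import r2_score
print(r2_score(y_test, regressor.predict(X_test)))   # ~0.939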
In [19]:
regressor.intercept_
Out[19]:
111297.71256204927
In [20]:
regressor.coef_
Out[20]:
array([35391.2501208 , 815.21987542, 4202.06618916])
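Because the features were standardized, the coefficients are on comparable scales; a minimal sketch to pair each coefficient with its feature name (the column order of X before scaling):

# Pair each coefficient with its feature; R&D Spend has by far the largest weight here.
feature_names = ['R&D Spend', 'Administration', 'Marketing Spend']
for name, coef in zip(feature_names, regressor.coef_):
    print(name, ':', round(coef, 2))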
Test The Model
In [21]:
feature = [165349.20,136897.80,471784.10]
scale_feature = sc.transform([feature])
scale_feature
Out[21]:
array([[2.01641149, 0.56075291, 2.15394309]])
In [22]:
y_pred_test = regressor.predict(scale_feature)
y_pred_test #By Using Sklearn Library
Out[22]:
array([192169.18440985])
In [23]:
# Manual calculation using b1*x1 + b2*x2 + b3*x3 + b0
35391.2501208*2.01641149 + 815.21987542*0.56075291 + 4202.06618916*2.15394309 + 111297.71256204927
Out[23]:
192169.1843003897
Above you can see that the manual calculation and the sklearn prediction give the same result on the same
data; this is how linear regression produces its predictions.
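The same manual calculation can also be written more generally with the fitted coefficients and intercept (a minimal sketch), which should match regressor.predict up to rounding:

# Manual prediction: dot product of the coefficients with the scaled features, plus the intercept.
manual = np.dot(regressor.coef_, scale_feature[0]) + regressor.intercept_
print(manual)   # same value as regressor.predict(scale_feature) above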