Linear Regression

The document outlines the process of performing linear regression on a dataset of years of experience and corresponding salaries. It covers data collection, preprocessing, model building, testing, and evaluation, using pandas and scikit-learn. The reported R² score on the test set is approximately 0.986, and the fitted model is then used to predict the salary for a new input.

Uploaded by

Harshitha Kv

Linear Regression

Steps
Data collection
Data wrangling / pre-processing
Model building
Testing the model
Evaluate the model performance
Prediction with random input

In [1]:

# required libraries
import numpy as np
import pandas as pd
In [2]:

# Load the dataset into pandas dataframe

data = pd.read_csv('Salary_Data.csv')
data

Out[2]:

    YearsExperience    Salary
0               1.1   39343.0
1               1.3   46205.0
2               1.5   37731.0
3               2.0   43525.0
4               2.2   39891.0
5               2.9   56642.0
6               3.0   60150.0
7               3.2   54445.0
8               3.2   64445.0
9               3.7   57189.0
10              3.9   63218.0
11              4.0   55794.0
12              4.0   56957.0
13              4.1   57081.0
14              4.5   61111.0
15              4.9   67938.0
16              5.1   66029.0
17              5.3   83088.0
18              5.9   81363.0
19              6.0   93940.0
20              6.8   91738.0
21              7.1   98273.0
22              7.9  101302.0
23              8.2  113812.0
24              8.7  109431.0
25              9.0  105582.0
26              9.5  116969.0
27              9.6  112635.0
28             10.3  122391.0
29             10.5  121872.0
In [3]:

data.head()

Out[3]:

   YearsExperience   Salary
0              1.1  39343.0
1              1.3  46205.0
2              1.5  37731.0
3              2.0  43525.0
4              2.2  39891.0

In [4]:

data.head(10)

Out[4]:

   YearsExperience   Salary
0              1.1  39343.0
1              1.3  46205.0
2              1.5  37731.0
3              2.0  43525.0
4              2.2  39891.0
5              2.9  56642.0
6              3.0  60150.0
7              3.2  54445.0
8              3.2  64445.0
9              3.7  57189.0

In [5]:

data.tail()

Out[5]:

    YearsExperience    Salary
25              9.0  105582.0
26              9.5  116969.0
27              9.6  112635.0
28             10.3  122391.0
29             10.5  121872.0
In [6]:

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   YearsExperience  30 non-null     float64
 1   Salary           30 non-null     float64
dtypes: float64(2)
memory usage: 608.0 bytes

In [7]:

# YearsExperience (x) -> independent variable

# Salary (y) -> dependent variable

In [8]:

data.shape

Out[8]:

(30, 2)

In [9]:

data.describe()

Out[9]:

       YearsExperience         Salary
count        30.000000      30.000000
mean          5.313333   76003.000000
std           2.837888   27414.429785
min           1.100000   37731.000000
25%           3.200000   56720.750000
50%           4.700000   65237.000000
75%           7.700000  100544.750000
max          10.500000  122391.000000

Data Pre-processing

Step 1: Handle Missing Data

In [10]:

data.isnull().any()

Out[10]:

YearsExperience False
Salary False
dtype: bool
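
The check above finds no missing values, so nothing has to be handled for this dataset. For a dataset that did have gaps, two common options are sketched below. The toy frame and its values are made up for illustration and are not part of Salary_Data.csv.

```python
import numpy as np
import pandas as pd

# Toy frame with deliberately missing entries (assumed data, not the real CSV).
toy = pd.DataFrame({
    "YearsExperience": [1.1, np.nan, 3.0, 4.0],
    "Salary": [39343.0, 46205.0, np.nan, 55794.0],
})

# Count the missing values in each column.
missing_per_column = toy.isnull().sum()

# Option 1: drop every row that has at least one missing value.
dropped = toy.dropna()

# Option 2: fill each gap with the median of its own column.
filled = toy.fillna(toy.median())

print(missing_per_column)
print(dropped)
print(filled)
```

Dropping rows is the simpler choice but costs data, which matters with only 30 rows; filling keeps every row at the price of inventing values.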
Step 2: Convert text columns, if any, to numeric columns

This is not required here, as there are no non-numeric columns.
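
For a dataset that did contain a text column, the conversion could look like the sketch below. The Department column is hypothetical and does not exist in Salary_Data.csv; one-hot encoding with pd.get_dummies is only one of several possible encodings.

```python
import pandas as pd

# Toy frame with one text column (hypothetical; not part of the real dataset).
toy = pd.DataFrame({
    "Department": ["IT", "HR", "IT"],
    "YearsExperience": [1.1, 2.0, 3.2],
})

# One-hot encode the text column: one 0/1 column per distinct category.
encoded = pd.get_dummies(toy, columns=["Department"], dtype=int)

print(encoded)
```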

Step 3: Perform Data Visualization

In [11]:

import matplotlib.pyplot as plt

In [12]:

plt.scatter(data.YearsExperience, data.Salary)

Out[12]:

<matplotlib.collections.PathCollection at 0x23cae99ddf0>
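
The scatter plot is inspected by eye. As a rough numeric companion, the Pearson correlation between the two columns can be computed; this check is an addition and not part of the original notebook. The arrays below are copied from the table printed in Out[2].

```python
import numpy as np

# Values copied from the dataset table shown earlier in the notebook.
years = np.array([1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
                  3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
                  6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5])
salary = np.array([39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445,
                   64445, 57189, 63218, 55794, 56957, 57081, 61111, 67938,
                   66029, 83088, 81363, 93940, 91738, 98273, 101302, 113812,
                   109431, 105582, 116969, 112635, 122391, 121872], dtype=float)

# Pearson correlation coefficient; values near 1 indicate a strong
# positive linear relationship.
r = np.corrcoef(years, salary)[0, 1]
print(round(r, 3))
```

A coefficient close to 1 supports fitting a straight line to this data.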

Step 4: Split the data into independent and dependent variables

In [13]:

x = data.iloc[:,:1]
x

Out[13]:

    YearsExperience
0               1.1
1               1.3
2               1.5
3               2.0
4               2.2
5               2.9
6               3.0
7               3.2
8               3.2
9               3.7
10              3.9
11              4.0
12              4.0
13              4.1
14              4.5
15              4.9
16              5.1
17              5.3
18              5.9
19              6.0
20              6.8
21              7.1
22              7.9
23              8.2
24              8.7
25              9.0
26              9.5
27              9.6
28             10.3
29             10.5
In [14]:

y = data.iloc[:,1:2]
y

Out[14]:

      Salary
0    39343.0
1    46205.0
2    37731.0
3    43525.0
4    39891.0
5    56642.0
6    60150.0
7    54445.0
8    64445.0
9    57189.0
10   63218.0
11   55794.0
12   56957.0
13   57081.0
14   61111.0
15   67938.0
16   66029.0
17   83088.0
18   81363.0
19   93940.0
20   91738.0
21   98273.0
22  101302.0
23  113812.0
24  109431.0
25  105582.0
26  116969.0
27  112635.0
28  122391.0
29  121872.0

In [15]:

type(x)

Out[15]:

pandas.core.frame.DataFrame
In [16]:

type(y)

Out[16]:

pandas.core.frame.DataFrame

In [17]:

np.shape(x)

Out[17]:

(30, 1)

In [18]:

np.shape(y)

Out[18]:

(30, 1)

Step 5: Splitting the data into training and testing dataset

In [19]:

# The scikit-learn library provides a train_test_split function

In [20]:

from sklearn.model_selection import train_test_split

In [21]:

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

In [22]:

print(x_train.shape) # Training input

(24, 1)

In [23]:

print(x_test.shape) # testing input

(6, 1)

In [24]:

print(y_train.shape) # training output

(24, 1)

In [25]:

print(y_test.shape) # testing output

(6, 1)
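
The shapes above follow from test_size = 0.2 on 30 rows: 6 rows go to testing and 24 to training. Fixing random_state makes the shuffle repeatable. A small sketch on stand-in arrays (not the real data) illustrates both points:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same number of rows (30) as the salary dataset.
x_demo = np.arange(30).reshape(-1, 1)
y_demo = 2 * x_demo

# Two splits with the same random_state.
xa_train, xa_test, ya_train, ya_test = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0)
xb_train, xb_test, yb_train, yb_test = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0)

print(xa_train.shape, xa_test.shape)      # 24 training rows, 6 testing rows
print(np.array_equal(xa_test, xb_test))   # same seed gives the same test rows
```

Without a fixed random_state, each run may pick different test rows, so the evaluation score can change from run to run.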
Model Building - Linear Regression

In [26]:

from sklearn.linear_model import LinearRegression

Perform linear regression by fitting the model to the training data

In [27]:

lr = LinearRegression()
lr.fit(x_train, y_train)

Out[27]:

LinearRegression()
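
For a single feature, fitting amounts to finding the slope and intercept of the least-squares line. The sketch below computes them with the closed-form formulas in NumPy rather than with scikit-learn. It assumes the training rows are all rows except the six whose labels (2, 28, 13, 10, 26, 24) the notebook prints as y_test; in the notebook itself the same quantities would be available as lr.coef_ and lr.intercept_.

```python
import numpy as np

# Full dataset, copied from the table printed in Out[2].
years = np.array([1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
                  3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
                  6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5])
salary = np.array([39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445,
                   64445, 57189, 63218, 55794, 56957, 57081, 61111, 67938,
                   66029, 83088, 81363, 93940, 91738, 98273, 101302, 113812,
                   109431, 105582, 116969, 112635, 122391, 121872], dtype=float)

# Row labels of the test set, as printed in the notebook's y_test output.
test_idx = [2, 28, 13, 10, 26, 24]
train_mask = np.ones(len(years), dtype=bool)
train_mask[test_idx] = False
x_tr, y_tr = years[train_mask], salary[train_mask]

# Closed-form simple linear regression:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
# intercept = mean_y - slope * mean_x
slope = np.sum((x_tr - x_tr.mean()) * (y_tr - y_tr.mean())) / np.sum((x_tr - x_tr.mean()) ** 2)
intercept = y_tr.mean() - slope * x_tr.mean()

# Predictions for the six held-out rows.
test_pred = intercept + slope * years[test_idx]

print(round(slope, 2), round(intercept, 2))
print(test_pred)
```

The slope comes out at roughly 9312.58 salary units per year of experience and the intercept at roughly 26780.10, and the six predictions agree with the y_pred values the notebook prints.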

Perform Testing

In [28]:

y_pred = lr.predict(x_test)
y_pred

Out[28]:

array([[ 40748.96184072],
[122699.62295594],
[ 64961.65717022],
[ 63099.14214487],
[115249.56285456],
[107799.50275317]])

Compare Predicted value to actual value

In [29]:

y_pred # predicted y

Out[29]:

array([[ 40748.96184072],
[122699.62295594],
[ 64961.65717022],
[ 63099.14214487],
[115249.56285456],
[107799.50275317]])
In [30]:

y_test # actual y

Out[30]:

      Salary
2    37731.0
28  122391.0
13   57081.0
10   63218.0
26  116969.0
24  109431.0
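
Reading two separate outputs side by side is awkward. A small sketch that lines up the printed values and summarizes the errors is given below; the mean absolute error and root mean squared error are additions for illustration and are not computed in the original notebook.

```python
import numpy as np

# Actual and predicted salaries for the six test rows, copied from the outputs above.
actual = np.array([37731.0, 122391.0, 57081.0, 63218.0, 116969.0, 109431.0])
predicted = np.array([40748.96184072, 122699.62295594, 64961.65717022,
                      63099.14214487, 115249.56285456, 107799.50275317])

# Residual = actual - predicted (negative means the model over-predicted).
residuals = actual - predicted
for a, p, r in zip(actual, predicted, residuals):
    print(f"actual={a:10.1f}  predicted={p:12.2f}  residual={r:10.2f}")

mae = np.mean(np.abs(residuals))          # mean absolute error
rmse = np.sqrt(np.mean(residuals ** 2))   # root mean squared error
print(round(mae, 2), round(rmse, 2))
```

The largest miss is for the row with 4.1 years of experience, where the model over-predicts by about 7,900; the typical error is a few thousand.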

Model Evaluation

In [31]:

# importing r2score metric

from sklearn.metrics import r2_score

In [32]:

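# R² (coefficient of determination) on the test set; for regression this is not a classification accuracy.
# Note: r2_score is documented as r2_score(y_true, y_pred). The arguments here are in the
# reverse order, and R² is not symmetric, so the value differs slightly from r2_score(y_test, y_pred).
acc = r2_score(y_pred, y_test)
acc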

Out[32]:

0.986482673117654
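
r2_score takes the true values first and the predictions second, and R² is not symmetric in its arguments. The cell above passes the predictions first. The sketch below recomputes R² by hand from the values printed in this notebook, in both orders, to show the size of the difference; the helper r2 is written here for illustration and is not a scikit-learn function.

```python
import numpy as np

def r2(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Actual and predicted salaries for the six test rows, copied from the outputs above.
actual = np.array([37731.0, 122391.0, 57081.0, 63218.0, 116969.0, 109431.0])
predicted = np.array([40748.96184072, 122699.62295594, 64961.65717022,
                      63099.14214487, 115249.56285456, 107799.50275317])

r2_conventional = r2(actual, predicted)   # true values first
r2_swapped = r2(predicted, actual)        # order used in the cell above

print(round(r2_conventional, 4), round(r2_swapped, 4))
```

With the conventional order the score is about 0.9882; the swapped order reproduces the 0.9865 shown in Out[32]. The gap is small here because the fit is good, but it grows as the fit gets worse.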

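Let us predict the value of y for a new input x. The input used here, 15 years of experience, lies outside the 1.1 to 10.5 range covered by the data, so the result is an extrapolation.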

In [33]:

sal = lr.predict([[15]])
sal

Out[33]:

array([[166468.72605157]])
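
A prediction from a fitted line is just intercept + slope * x. As a sanity check, the line can be recovered from any two of the predictions printed earlier and evaluated at x = 15. The sketch below does this using the test rows with 1.5 and 10.3 years of experience; the variable names are chosen for illustration.

```python
# Two points known to lie on the fitted line: (years, predicted salary),
# taken from the y_pred output for the test rows with 1.5 and 10.3 years.
x1, p1 = 1.5, 40748.96184072
x2, p2 = 10.3, 122699.62295594

# Slope and intercept of the line through those two points.
slope = (p2 - p1) / (x2 - x1)
intercept = p1 - slope * x1

# Evaluate the line at 15 years of experience.
pred_15 = intercept + slope * 15
print(round(pred_15, 2))
```

The result agrees with the value returned by lr.predict, which confirms that the model is a single straight line applied well beyond the observed range.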

Draw the best-fit line

In [34]:

plt.scatter(x_train, y_train)                 # training points
plt.plot(x_train, lr.predict(x_train), 'r')   # fitted regression line, drawn in red

Out[34]:

[<matplotlib.lines.Line2D at 0x23cb6ab55b0>]
