Machine Learning

Transport Choice of Employees

Senthil Kumar M
22.Sep.2019
Machine Learning (PGP-BABI)
by Great Learning

Table of Contents

INTRODUCTION

Observation

Step by step approach

Exploratory Data Analysis
EDA Summary
Logistic Regression
KNN
Naive Bayes

REFERENCES

Great Learning PGP(BABI)

INTRODUCTION
This project aims to understand the determinants of the transport choice made by employees.
The given data contains each employee's mode of transport along with personal and
professional details such as age, salary and work experience. We need to predict
whether or not an employee will use a car as a mode of transport, and which variables
are significant predictors behind this decision.

We will use multiple models and performance metrics to arrive at a model that best
describes the variables influencing an employee's use of a car as a mode of transport. The
input variables include employee personal details such as Age, Salary and Work.Exp. We are
going to use

Process Map

The structure of the input variables is tabulated below:

Given the dataset, we are required to perform the following tasks to complete this
project successfully:

1. EDA
2. Data Preparation
3. Modeling
4. Actionable Insights & Recommendations

Observation
Employees use a 2 wheeler, public transport or a car to commute to their workplace.
We have been given 418 rows of data with 9 variables. The dataset needs to be cleaned
up and its variables converted to appropriate types before it is processed for
analysis.

The problem statement is to predict whether or not an employee will use a car
as a mode of transport, and which variables are significant predictors behind the
decision.

Step by step approach

We shall perform the analysis in the following steps and then conclude the project.

1. Exploratory Data Analysis
2. Logistic Regression
3. KNN
4. Naive Bayes
5. Performance Measurement
6. Conclusion

1. Exploratory Data Analysis

We start the EDA process by converting the categorical variables to factors.

The following graph gives an overview of how the variables are spread by volume of usage:

Structure of the dataset printed for reference.

Note that there is a missing value in the variable MBA. There are several ways to treat
it, but since there is only 1 missing value we will remove the whole record.
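
The record removal described above can be sketched as follows. This is a minimal
illustration in Python rather than the R code used in the project; the sample rows and
every column name other than MBA are made up for the example.

```python
# Hypothetical sample rows; None marks a missing value (NA in R).
records = [
    {"Age": 28, "MBA": 0, "Salary": 14.3},
    {"Age": 24, "MBA": None, "Salary": 8.3},
    {"Age": 29, "MBA": 1, "Salary": 13.4},
]


def drop_incomplete(rows):
    """Keep only the rows in which every field has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


complete = drop_incomplete(records)
print(len(records) - len(complete), "record(s) removed")
```

Dropping the record is reasonable here only because a single row is affected; with many
missing values, imputation would usually be considered instead.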

There are several automated packages in R for exploratory data analysis; we use one
such package, “dlookr”, in this project. The EDA report from the “dlookr” package gives a
detailed count of distinct values in each variable, along with normality tests,
correlation coefficients and other descriptive statistics, as elaborated below:

Normality Test of Numeric Variable:

The normality test statistics indicate that the Age and Distance variables are close to
normally distributed, while Work Exp and Salary are positively skewed. Each numeric
variable was tested individually for normality and skewness, and QQ plots for each
variable are printed for reference.
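
As a rough illustration of what positive skew means, the sketch below computes the
moment-based skewness coefficient m3 / m2^1.5 in plain Python. It is not the dlookr
output used in the report, and both sample lists are invented for the example.

```python
def skewness(values):
    """Moment-based skewness coefficient: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]           # evenly spread around the mean
right_tailed = [1, 1, 2, 2, 3, 15]    # one large value stretches the right tail

print(skewness(symmetric))     # zero for perfectly symmetric data
print(skewness(right_tailed))  # positive, i.e. a long right tail
```

A value near zero is consistent with a roughly symmetric distribution, while a clearly
positive value corresponds to the long right tail described for Work Exp and Salary.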

Univariate Distribution: Histogram

Churn Ratio by numerical predictors:

We can see that employees with a higher salary and age tend to use a car. There is a
clear indication that employees aged 30 and above, as well as those with a salary of 30k
and above, prefer a car as a mode of transport. It is also evident in this dataset that
employees who travel more than 15 miles and have a higher salary choose a car.

The above map shows that car usage among female employees is much lower than among male
employees, whereas qualification shows no correlation with car usage. For license, we
can assume that an employee without a license uses public transport.

Target based Analysis: (Categorical Variables)

Target based Analysis: (Numerical Variables)

AGE:

Work Exp:

Salary:

Distance:

Grouped Correlation Plot of Numerical Variables

EDA Summary:
1. There is 1 NA in the entire dataset.
2. Correlation was found between predictor variables, and the correlated variables
were removed from the dataset.
3. Some numeric variables were strongly positively correlated, so we removed Age and
Work.Exp, which reduced the numeric predictors to only 2 for the model. Other
methods such as PCA could have been used to address this, but since the correlation
is about 90% we retain only Salary from the personal details to train our model.

Data Preparation:
Our primary interest, as per the problem statement, is to understand the factors
influencing car usage. Hence we create a new column for car usage, which takes the value
0 for Public Transport and 2 Wheeler and 1 for Car. We then examine the proportion of
car users in the Transport mode variable.
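
The encoding described above can be sketched as follows. This is a Python illustration
rather than the project's R code; the exact category labels and the ten sample values
are assumptions made for the example and do not come from the dataset.

```python
# Hypothetical transport modes for ten employees (labels are assumed).
modes = [
    "Public Transport", "2Wheeler", "Car", "Public Transport", "2Wheeler",
    "Public Transport", "Public Transport", "Car", "2Wheeler", "Public Transport",
]


def car_usage(mode):
    """Return 1 when the employee commutes by car, otherwise 0."""
    return 1 if mode == "Car" else 0


flags = [car_usage(m) for m in modes]
car_share = sum(flags) / len(flags)  # proportion of car users
print(flags)
print(f"{car_share:.0%} of the sample uses a car")
```

Collapsing the three modes into a binary flag turns the task into a two-class problem,
which is what the models listed in the table of contents are trained on.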

Only 8% of employees in the dataset use a car as a mode of transport.

SMOTE the Data

(Figure: before SMOTE and after SMOTE)
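
SMOTE balances the classes by creating synthetic minority samples along the line
segments between a minority point and its nearest minority neighbours. The sketch below
is a simplified Python version of that idea, not the R routine used in the project; the
function name, its parameters and the sample points are all hypothetical.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a randomly
    chosen minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class points (salary, distance) for car users.
car_users = [(35.0, 16.0), (42.0, 18.5), (38.0, 15.2), (50.0, 20.1)]
new_points = smote(car_users, n_new=6)
print(len(car_users) + len(new_points), "minority samples after oversampling")
```

Because every synthetic point lies between two real minority points, oversampling adds
no values outside the observed range of the car users. Applying it to the training data
only avoids leaking synthetic points into the evaluation.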

Model Building:

Improving the model

VIF scores are used to check for multicollinearity. The Work.Exp score above 10 confirms
that multicollinearity exists in the dataset.

After dropping the Age and Work.Exp variables, the VIF results are significantly lower
and we can conclude that the data is free from multicollinearity. We can go ahead and
train the model with the remaining variables.
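
For reference, the VIF of a predictor j is 1 / (1 - R_j^2), where R_j^2 comes from
regressing that predictor on all the others. With only two predictors, R_j^2 equals the
squared Pearson correlation between them, which keeps the Python sketch below short. The
sample values are invented for illustration and are not taken from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def vif_two_predictors(x, y):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)


# Hypothetical values: age and work experience move together, distance does not.
age = [23, 25, 28, 31, 35, 40]
work_exp = [1, 2, 5, 8, 12, 18]
distance = [9.0, 14.2, 7.5, 12.1, 10.3, 11.0]

print(vif_two_predictors(age, work_exp))  # large: strongly collinear pair
print(vif_two_predictors(age, distance))  # close to 1: weakly related pair
```

A VIF above 10 is the threshold the report applies; a value close to 1 means the
predictor shares little linear information with the other one.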
