Mini Project Report

This document describes a machine learning model to predict software developer salaries based on factors like experience level, education, country, and developer type. It discusses collecting data from Stack Overflow surveys, cleaning the data, creating models using algorithms like decision trees and XGBoost, and deploying the best model (decision trees) via a Streamlit web app. The model aims to help developers and employers determine reasonable salary expectations.

GRAPHIC ERA DEEMED TO BE UNIVERSITY

DEHRADUN

MINI PROJECT REPORT

Machine Learning Based Application

VINAY KUMAR
2015401
CST
30
DATED - 14/12/2021.

Abstract –
Machine learning is a branch of artificial intelligence (AI) and
computer science which focuses on the use of data and algorithms to
imitate the way that humans learn, gradually improving its accuracy.
Machine learning is an important component of the growing field of
data science. Through the use of statistical methods, algorithms are
trained to make classifications or predictions, uncovering key insights
within data mining projects. These insights subsequently drive
decision making within applications and businesses, ideally impacting
key growth metrics. As big data continues to expand and grow, the
market demand for data scientists will increase, requiring them to
assist in the identification of the most relevant business questions and
subsequently the data to answer them.

Here I have used data science to predict the salaries of developers from
their experience level, college degree, country and developer type, and to
show the predicted average salary to the user. To minimize the error, I
trained several models and compared them to find the one whose predictions
are closest to the actual values.

This project helps us find out what salary we should expect from
other companies, so rather than guessing ourselves, we can take help
from this application.

Introduction –

Machine learning, as the name says, is all about machines learning
automatically, without being explicitly programmed and without any
direct human intervention. The process starts with feeding the machines
good-quality data and then training them by building various
machine learning models using the data and different algorithms. The choice of
algorithms depends on what type of data we have and what kind of task we are
trying to automate.

Nowadays, a major reason an employee switches companies is salary.
Employees keep switching companies to get the salary they expect, and this
causes a loss for the company. To overcome this loss, we came up with an
idea: what if the employee gets the desired or expected salary from the company
or organization? In this competitive world everyone has high expectations and
goals. But we cannot simply give everyone their expected salary; there
should be a system that measures the ability of the employee against the
expected salary. We cannot decide the exact salary, but we can predict it by
using certain data sets. A prediction is an assumption about a future event.

Linear regression is a supervised learning technique that approximates the
mapping function from inputs to outputs to get the best predictions. The
main goal of regression is to construct an efficient model that predicts the
dependent attribute from a set of attribute variables. A regression problem is
one where the output is a real or continuous value, such as a salary.
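
To make the idea of regression concrete, here is a minimal sketch of fitting a straight line to a single feature by ordinary least squares. It is plain Python for illustration only: the function name fit_line and the toy numbers are assumptions and do not come from the project, which works with several features.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: years of experience vs. salary in thousands.
years = [1, 2, 3, 4]
salary_k = [30, 50, 70, 90]

slope, intercept = fit_line(years, salary_k)
predicted = slope * 5 + intercept  # prediction for 5 years of experience
```

The fitted line is then used to predict the continuous output for an input that was not in the training data.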

To gain useful insights into job recruitment, we compare different
strategies and machine learning models. The methodology has different phases:
data collection, data cleaning, manual feature engineering, data set
description, automatic feature selection, model selection, model training and
validation, and model comparison. We focus on developing a system that
predicts the salary based on the different parameters used in a company and
the above-mentioned methodology phases. Some of the parameters we collected
from company data are:
1. Job Type: CFO, CEO, Senior, Vice President, Manager
2. Degree: Doctoral, Bachelors, Masters, High School
3. Years of Experience
4. Country
5. Type of Developer
6. Salary

Motivation –
Nowadays prediction engines have become popular because they generate
accurate and affordable predictions, much like a human would, and they are
being used in industry to solve many problems. Predicting a justified salary
for an employee has always been a challenging job for an employer. In this
project I propose a salary prediction model with a suitable algorithm, using
the key features required to predict the salary of an employee.

Many websites like Glassdoor and Indeed predict the salary of an employee
from the given attributes, and they need to be precise while doing this. I
have tried several models here to find the one that gives the most precise
predicted value.

Methodology –

1. I imported the libraries needed for the implementation:
   pandas, numpy, matplotlib.
2. Read the file and check all the columns and their values.
3. Take into account only the important columns.

Working on the salary column-
Remove the null values from the salary column.
Convert the float values into integers.
We also remove the null values from all the other columns.
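
The steps above can be sketched with pandas as follows. This is only an illustration under stated assumptions: the column names (Country, EdLevel, YearsCodePro, DevType, ConvertedComp) and the four made-up rows are hypothetical stand-ins for the survey file, which is not reproduced in this report.

```python
import io

import pandas as pd

# Tiny made-up stand-in for the survey CSV; the column names are assumed.
raw_csv = io.StringIO(
    "Country,EdLevel,YearsCodePro,DevType,ConvertedComp,Age\n"
    "India,Secondary school,3,back-end,12000.0,24\n"
    "Germany,Associate degree,10,full-stack,,35\n"
    "United States,Associate degree,,front-end,90000.0,29\n"
    "Canada,Secondary school,7,mobile,65000.0,31\n"
)

df = pd.read_csv(raw_csv)

# Keep only the columns the model needs and give the salary a clear name.
df = df[["Country", "EdLevel", "YearsCodePro", "DevType", "ConvertedComp"]]
df = df.rename(columns={"ConvertedComp": "Salary"})

# Drop rows without a salary, then rows with nulls in any other column.
df = df.dropna(subset=["Salary"])
df = df.dropna()

# Salaries are read as floats; store them as integers.
df["Salary"] = df["Salary"].astype(int)
```

Dropping the null salaries first makes it easy to check how many rows are lost because of the target column alone.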

Working on Experience –
We only need to convert the string values into numbers.
The answer for more than 50 years is converted to 50 and the answer for
less than 1 year is converted to 0.5.
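
A minimal sketch of this conversion is shown below. The function name clean_experience and the exact answer strings ("More than 50 years", "Less than 1 year") are assumptions about how the survey encodes these two cases; every other answer is assumed to be a plain number stored as text.

```python
def clean_experience(x):
    """Convert a years-of-experience survey answer into a number."""
    if x == "More than 50 years":  # assumed wording of the upper bucket
        return 50.0
    if x == "Less than 1 year":  # assumed wording of the lower bucket
        return 0.5
    return float(x)


converted = [clean_experience(v) for v in ["Less than 1 year", "7", "More than 50 years"]]
```

With pandas, a function like this would typically be applied to the experience column with the column's apply method.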

Working on Education Level –

We will remove the unnecessary detail from the degree values and keep just
these categories-
1. Bachelor’s degree
2. Master’s degree
3. Less than a Bachelors
4. Post grad

def clean_education(x):
    # Collapse the detailed degree text into one of four categories.
    if 'Bachelor’s degree' in x:
        return 'Bachelor’s degree'
    if 'Master’s degree' in x:
        return 'Master’s degree'
    if 'Professional degree' in x or 'Other doctoral' in x:
        return 'Post grad'
    # Everything else is treated as below a bachelor's degree.
    return 'Less than a Bachelors'

We will remove the users with all other types of values.

Working on the Developer Type-

We will take into account only the main types of developer-
1. Full-stack developer
2. Back-end developer
3. Front-end developer
4. Mobile developer
5. Game developer
6. Data scientist
7. Academic researcher

def clean_devtype(x):
    # Map the free-form developer type text to one main category.
    # The checks run in order, so the first matching keyword wins.
    if 'front-end' in x:
        return 'front-end developer'
    if 'back-end' in x:
        return 'back-end developer'
    if 'mobile' in x:
        return 'mobile developer'
    if 'academic' in x:
        return 'academic researcher'
    if 'game' in x:
        return 'game developer'
    if 'data' in x:
        return 'data scientist'
    if 'full-stack' in x:
        return 'full-stack developer'
    # Any other developer type falls through and returns None.

Working on Country –
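I don’t want the model to get confused, so I take into account only
the countries having more than 300 developers.
I remove all the other developers from the dataset.

def remove_countries(counts, bar):
    # counts is expected to behave like a pandas Series of per-country row
    # counts (it uses .values and .index); bar is the minimum count to keep.
    counts_map = {}
    for i in range(len(counts)):
        if counts.values[i] >= bar:
            counts_map[counts.index[i]] = counts.index[i]
        else:
            # Countries below the bar are grouped under "other".
            counts_map[counts.index[i]] = "other"
    return counts_map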
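
The same thresholding idea can be tried without pandas. The sketch below is not the code used in the project: it uses collections.Counter in place of a pandas Series, and the function name country_map, the sample responses, and the bar of 2 are made up for illustration.

```python
from collections import Counter


def country_map(countries, bar):
    """Map each country to itself if it has at least `bar` rows, else to "other"."""
    counts = Counter(countries)
    return {c: (c if n >= bar else "other") for c, n in counts.items()}


# Made-up responses: one country name per survey row.
responses = ["India"] * 3 + ["Germany"] * 2 + ["Fiji"]

mapping = country_map(responses, bar=2)
grouped = [mapping[c] for c in responses]

# Rows whose country was grouped under "other" are then dropped.
kept = [c for c in grouped if c != "other"]
```

Grouping first and filtering afterwards keeps the two decisions separate, so the bar can be changed without touching the filtering step.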

Removing Outliers –

We need to remove the outliers from all the countries. Countries like the
United States of America have some extremely high earners, which skews the
salaries a lot.
We plot a box plot to check for those outliers.
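import matplotlib.pyplot as plt

# One box per country; the y-axis is capped so the boxes stay readable.
fig, ax = plt.subplots(1, 1, figsize=(12, 7))
df.boxplot(column="Salary", by="Country", ax=ax)
plt.suptitle("Salary vs Country")
plt.ylabel("Salary")
plt.xticks(rotation=90)
plt.ylim(0, 308520)
plt.show()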

We can see that there are a lot of outliers here. After trying different
salary limits several times, we decided to remove the outliers by limiting
the salaries, while still keeping some higher and lower values. The maximum
salary is therefore 250000 and the minimum is 10000, and we remove all other
values.
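
A minimal sketch of this filter in plain Python follows. The limits are the ones stated above; treating both limits as inclusive is an assumption, and the function name and sample salaries are made up.

```python
MIN_SALARY = 10_000
MAX_SALARY = 250_000


def within_limits(salary):
    """Return True if the salary lies inside the chosen limits (inclusive, by assumption)."""
    return MIN_SALARY <= salary <= MAX_SALARY


# Made-up salaries, including one value below and one above the limits.
salaries = [4_000, 10_000, 72_500, 250_000, 900_000]
kept = [s for s in salaries if within_limits(s)]
```

On a pandas DataFrame the same condition would normally be written as a boolean mask on the Salary column.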
Creation of Models-

List of models I created-

Name                        Mean Absolute Error
Linear Regression           44035.000324853405
Decision Tree Regressor     26963.13126461602
Random Forest Regressor     27292.812864555683
Grid Search                 31128.610701331247
XG Boost                    29285.348949161387
Light Gradient Boost        42278.6891221509
TensorFlow Keras            44795.866636312276
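
For reference, the mean absolute error is the average of the absolute differences between the actual and the predicted salaries, so a lower value is better. The sketch below computes it in plain Python on made-up numbers; it is not the evaluation code of the project. scikit-learn provides a function of the same name in sklearn.metrics.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up salaries for three developers.
actual = [50_000, 80_000, 120_000]
predicted = [55_000, 70_000, 120_000]

mae = mean_absolute_error(actual, predicted)  # (5000 + 10000 + 0) / 3
```

Because the error is in the same unit as the salary, it can be read directly as the typical size of a prediction miss.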

To my surprise, the Decision Tree Regressor performs the best here, with the
lowest mean absolute error.

So we will go ahead with it.

We will save our model in a pickle file.
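
Saving and loading with pickle can be sketched as below. The dictionary is only a placeholder standing in for the trained model and anything saved with it; its keys, the file name saved_model.pkl, and the temporary directory are assumptions made so that the example runs on its own.

```python
import os
import pickle
import tempfile

# Placeholder for the trained model and the feature names it expects.
bundle = {
    "model": "decision-tree-placeholder",
    "features": ["Country", "EdLevel", "YearsCodePro", "DevType"],
}

# Write the object to disk in binary mode.
path = os.path.join(tempfile.mkdtemp(), "saved_model.pkl")
with open(path, "wb") as f:
    pickle.dump(bundle, f)

# Later, for example inside the web app, read it back.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

A pickle file should only be loaded if it comes from a trusted source, because loading it can execute arbitrary code.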

Now we are going to deploy our model on Streamlit.

So, we will make an app.py file, with predict_page for prediction and explore
for viewing some metrics in the form of graphs.

Working on Streamlit-
We create app.py and import the two pages, predict and explore.
In the explore page- We put all the code and functions to be executed,
and all the graphs we want to create.

In the predict page- We put the button and the input transformations.

We print the salary with the help of a Streamlit function.
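
A predict page along these lines could look like the sketch below. It is a hedged illustration, not the project's actual page: the function names, widget labels, and the predict_fn callback are hypothetical. Streamlit is imported inside the page function so that the formatting helper can be used and tested without Streamlit installed.

```python
def format_salary(value):
    """Format a predicted salary for display."""
    return f"The estimated salary is ${value:,.2f}"


def show_predict_page(predict_fn, countries, educations, dev_types):
    """Render the inputs, and show the prediction once the button is clicked.

    predict_fn is a hypothetical callback that takes the four inputs and
    returns a number; in the real app it would wrap the saved model.
    """
    import streamlit as st  # lazy import, see the note above

    st.title("Software Developer Salary Prediction")

    country = st.selectbox("Country", countries)
    education = st.selectbox("Education Level", educations)
    dev_type = st.selectbox("Type of Developer", dev_types)
    experience = st.slider("Years of Experience", 0, 50, 3)

    # st.button returns True only on the run triggered by the click.
    if st.button("Calculate Salary"):
        salary = predict_fn(country, education, experience, dev_type)
        st.subheader(format_salary(salary))
```

The page would be started from app.py by calling show_predict_page and running the app with the streamlit run command.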

Flowchart-
User enters the Country
  -> User enters the Education Level
  -> User enters the Experience
  -> User specifies which type of developer they are
  -> User clicks the calculate button
  -> Predicted Salary is shown

References –
- Andrew Ng course
- Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow
- https://www.geeksforgeeks.org/machine-learning
- Dataset: Stack Overflow developer survey, http://insights.stackoverflow.com/survey
