Import Libraries
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score
1. Data/Domain Understanding and Exploration
1.1. Meaning and Type of Features
# Load the dataset and inspect the features
# Load the dataset
data = pd.read_csv("adverts.csv")
# Display the first few rows and column information
data.head()
(output: first five rows of the 12-column dataframe: public_reference, mileage, reg_code, standard_colour, standard_make, standard_model, vehicle_condition, year_of_registration, price, body_type, crossover_car_and_van, fuel_type. Row 0 is a NEW Volvo XC90 with NaN reg_code and year_of_registration; rows 1-4 are USED vehicles, including a Land Rover Range Rover. The rightmost columns are truncated in this export.)
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 402005 entries, 0 to 402004
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 public_reference 402005 non-null int64
1 mileage 401878 non-null float64
2 reg_code 370148 non-null object
3 standard_colour 396627 non-null object
4 standard_make 402005 non-null object
5 standard_model 402005 non-null object
6 vehicle_condition 402005 non-null object
7 year_of_registration 368694 non-null float64
8 price 402005 non-null int64
9 body_type 401168 non-null object
10 crossover_car_and_van 402005 non-null bool
11 fuel_type 401404 non-null object
dtypes: bool(1), float64(2), int64(2), object(7)
memory usage: 34.1+ MB
data.describe()
public_reference mileage year_of_registration price
count 4.020050e+05 401878.000000 368694.000000 4.020050e+05
mean 2.020071e+14 37743.595656 2015.006206 1.734197e+04
std 1.691662e+10 34831.724018 7.962667 4.643746e+04
min 2.013072e+14 0.000000 999.000000 1.200000e+02
25% 2.020090e+14 10481.000000 2013.000000 7.495000e+03
50% 2.020093e+14 28629.500000 2016.000000 1.260000e+04
75% 2.020102e+14 56875.750000 2018.000000 2.000000e+04
max 2.020110e+14 999999.000000 2020.000000 9.999999e+06
data.shape
(402005, 12)
data.isnull().sum()
public_reference 0
mileage 127
reg_code 31857
standard_colour 5378
standard_make 0
standard_model 0
vehicle_condition 0
year_of_registration 33311
price 0
body_type 837
crossover_car_and_van 0
fuel_type 601
dtype: int64
Analysis of Distributions
# Analyze distributions of numerical features
numerical_features = ['mileage', 'year_of_registration', 'price']  # body_type is categorical, so it is excluded here
data[numerical_features].hist(bins=30, figsize=(10, 6))
plt.suptitle('Histograms of Numerical Features')
plt.show()
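The describe() output above shows price running from 120 to nearly 10 million, so a linear-scale histogram pushes almost all the mass into the first bin. A minimal sketch of a log-scale view of price (using the imports already loaded; illustrative only):
# Plot the log-transformed price; np.log1p handles the heavy right skew
plt.figure(figsize=(8, 4))
plt.hist(np.log1p(data['price']), bins=50)
plt.title('Distribution of log(1 + price)')
plt.xlabel('log(1 + price)')
plt.ylabel('Count')
plt.show()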
1.2. Analysis of Predictive Power of Features
# Correlation matrix
correlation_matrix = data[numerical_features].corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()
# Scatter plots for numerical features vs price
for feature in ['mileage', 'year_of_registration']:
    plt.figure(figsize=(8, 4))
    sns.scatterplot(x=data[feature], y=data['price'], color='green')
    plt.title(feature + ' vs Price')
    plt.xlabel(feature)
    plt.ylabel('Price')
    plt.show()
1.3. Data Processing for Data Exploration and Visualisation
# Check for missing values
print(data.isnull().sum())
# Dealing with missing values: drop any rows with a missing target
# (price has no missing values here, but this guards the pipeline)
data.dropna(subset=['price'], inplace=True)
public_reference 0
mileage 127
reg_code 31857
standard_colour 5378
standard_make 0
standard_model 0
vehicle_condition 0
year_of_registration 33311
price 0
body_type 837
crossover_car_and_van 0
fuel_type 601
dtype: int64
# Preprocessing the data
from sklearn.preprocessing import LabelEncoder
# Handle missing values: fill numeric gaps with a 0 sentinel and categorical gaps with 'Unknown'
# (note: a year_of_registration of 0 is not a real year and will distort any age-based feature downstream)
data.fillna({'mileage': 0, 'year_of_registration': 0, 'price': 0, 'standard_colour': 'Unknown',
             'standard_make': 'Unknown', 'standard_model': 'Unknown', 'vehicle_condition': 'Unknown',
             'body_type': 'Unknown', 'fuel_type': 'Unknown'}, inplace=True)
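Filling year_of_registration with 0 keeps the rows but injects an impossible year. As an illustrative alternative only (not used by the rest of this notebook), median imputation on a fresh copy keeps downstream age features plausible:
# Sketch: median imputation instead of the 0 sentinel, on a separate copy
alt = pd.read_csv("adverts.csv")
alt['year_of_registration'] = alt['year_of_registration'].fillna(alt['year_of_registration'].median())
alt['mileage'] = alt['mileage'].fillna(alt['mileage'].median())
print(alt[['mileage', 'year_of_registration']].isnull().sum())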
# Encode categorical variables
label_encoders = {}
categorical_features = ['standard_colour', 'standard_make', 'standard_model', 'vehicle_condition', 'body_type', 'fuel_type']
for feature in categorical_features:
    le = LabelEncoder()
    data[feature] = le.fit_transform(data[feature])
    label_encoders[feature] = le
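LabelEncoder maps categories to arbitrary integers, which implies an ordering that tree models tolerate but that linear and distance-based models can misread. A hedged sketch of the one-hot alternative on a fresh copy (illustrative; this notebook keeps the label-encoded columns):
# Sketch: one-hot encode the low-cardinality categoricals; standard_make and
# standard_model have hundreds of levels each, so they are left out here
raw = pd.read_csv("adverts.csv")
onehot_demo = pd.get_dummies(raw[['vehicle_condition', 'fuel_type', 'body_type']],
                             prefix=['cond', 'fuel', 'body'])
print(onehot_demo.shape)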
# Visualizing relationships
plt.figure(figsize=(10, 6))
sns.scatterplot(x='mileage', y='price', data=data)
plt.title('Mileage vs Price')
plt.show()
data.isnull().sum()
public_reference 0
mileage 0
reg_code 31857
standard_colour 0
standard_make 0
standard_model 0
vehicle_condition 0
year_of_registration 0
price 0
body_type 0
crossover_car_and_van 0
fuel_type 0
dtype: int64
2. Data Processing for Machine Learning
2.1. Dealing with Missing Values, Outliers, and Noise
# Data Processing for Machine Learning
# Check for missing values and handle them
missing_values = data.isnull().sum()
print("Missing values in each column:")
print(missing_values)
Missing values in each column:
public_reference 0
mileage 0
reg_code 31857
standard_colour 0
standard_make 0
standard_model 0
vehicle_condition 0
year_of_registration 0
price 0
body_type 0
crossover_car_and_van 0
fuel_type 0
dtype: int64
# Verify preprocessing
print("Data preprocessing complete. Here's the head of the processed dataframe:")
data.head()
Data preprocessing complete. Here's the head of the processed dataframe:
(output: first five rows after preprocessing; the categorical columns now hold integer codes, e.g. standard_colour is 8 rather than Grey in row 0, while reg_code is untouched and still NaN for the NEW vehicle. The rightmost columns are truncated in this export.)
# Side-by-side boxplots of price before and after outlier removal
plt.figure(figsize=(12, 5))
# Subplot for before outlier removal
plt.subplot(1, 2, 1)
sns.boxplot(y=data['price'])
plt.title('Price Distribution (Before Outlier Removal)')
plt.ylabel('Price')
# Identifying and removing outliers using quantile method
q_low = data["price"].quantile(0.01)
q_hi = data["price"].quantile(0.99)
data_filtered = data[(data["price"] < q_hi) & (data["price"] > q_low)]
# Subplot for after outlier removal
plt.subplot(1, 2, 2)
sns.boxplot(y=data_filtered['price'])
plt.title('Price Distribution (After Outlier Removal)')
plt.ylabel('Price')
# Adjust layout and show the plot
plt.tight_layout()
plt.show()
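The 1%/99% quantile filter above is one reasonable choice; Tukey's IQR rule is a common alternative, sketched here for comparison only (not used downstream):
# Sketch: IQR-based outlier rule for price
q1, q3 = data['price'].quantile([0.25, 0.75])
iqr = q3 - q1
iqr_filtered = data[(data['price'] >= q1 - 1.5 * iqr) & (data['price'] <= q3 + 1.5 * iqr)]
print("IQR rule keeps", len(iqr_filtered), "of", len(data), "rows")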
# Verify the shape of the dataset after removing outliers
# (data itself is unchanged at (402005, 12); data_filtered is what later cells use)
print("Shape of dataset after removing outliers:", data_filtered.shape)
2.2. Feature Engineering, Data Transformations, Feature Selection
# Creating a new feature: vehicle age
# Work on an explicit copy so the assignment does not trigger pandas'
# SettingWithCopyWarning (data_filtered is a slice of data)
data_filtered = data_filtered.copy()
data_filtered['vehicle_age'] = 2024 - data_filtered['year_of_registration']
# Note: rows whose missing year_of_registration was filled with 0 get an
# implausible vehicle_age of 2024
# Selecting features for the model
features = ['mileage', 'vehicle_age']
X = data_filtered[features]
y = data_filtered['price']
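Hard-coding 2024 ties the feature to when the notebook was run, and the 0-sentinel years become an age of 2024. A sketch that instead anchors age to the newest year present in the data and masks the sentinel (illustrative only):
# Sketch: data-derived reference year; sentinel years become NaN rather than huge ages
ref_year = data_filtered.loc[data_filtered['year_of_registration'] > 0, 'year_of_registration'].max()
age_demo = (ref_year - data_filtered['year_of_registration']).where(data_filtered['year_of_registration'] > 0)
print(age_demo.describe())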
3. Model Building
3.1. Algorithm Selection, Model Instantiation and Configuration
# Train-validation-test split: 70% train, 15% validation, 15% test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)
## Algorithm Selection
# Linear Regression
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
# Decision Tree
tree_reg = DecisionTreeRegressor()
tree_reg.fit(X_train, y_train)
# k-Nearest Neighbors
knn_reg = KNeighborsRegressor()
knn_reg.fit(X_train, y_train)
KNeighborsRegressor()
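KNN is distance-based, so features on very different scales (mileage in the tens of thousands versus vehicle_age in years) let mileage dominate the metric. A minimal sketch of a scaled KNN pipeline (illustrative; the evaluation below uses the unscaled knn_reg fitted above):
# Sketch: standardize features so both contribute to KNN distances
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
knn_scaled = make_pipeline(StandardScaler(), KNeighborsRegressor())
knn_scaled.fit(X_train, y_train)
print("Scaled KNN R² on validation:", knn_scaled.score(X_val, y_val))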
3.2. Grid Search, and Model Ranking and Selection
## 3.2. Grid Search for Hyperparameter Tuning
# Grid search for Decision Tree
param_grid = {'max_depth': [None, 5, 10, 20]}
grid_search = GridSearchCV(tree_reg, param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_tree = grid_search.best_estimator_
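The grid above tunes only max_depth and uses GridSearchCV's default scorer for regressors (R²). A slightly wider, explicitly scored search might look like this (parameter values are illustrative, not tuned):
# Sketch: tune depth and leaf size together, scoring by negative MSE
param_grid_wide = {'max_depth': [5, 10, 20, None], 'min_samples_leaf': [1, 10, 50]}
grid_wide = GridSearchCV(DecisionTreeRegressor(random_state=42), param_grid_wide,
                         cv=5, scoring='neg_mean_squared_error')
grid_wide.fit(X_train, y_train)
print("Best params:", grid_wide.best_params_)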
4. Model Evaluation and Analysis
4.1. Coarse-Grained Evaluation/Analysis
models = {
"Linear Regression": lin_reg,
"Best Decision Tree": best_tree,
"KNN": knn_reg
}
results = {}
for model_name, model in models.items():
    y_pred = model.predict(X_val)
    mse = mean_squared_error(y_val, y_pred)
    r2 = r2_score(y_val, y_pred)
    results[model_name] = {'MSE': mse, 'R²': r2}
    print(f"{model_name} - MSE: {mse:.2f}, R²: {r2:.2f}")
Linear Regression - MSE: 115341217.98, R²: 0.28
Best Decision Tree - MSE: 101868407.49, R²: 0.36
KNN - MSE: 123313134.73, R²: 0.23
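MSE values in the hundreds of millions are hard to read at a glance; RMSE is in the same units as price and easier to interpret. A quick conversion from the results dict above:
# RMSE = sqrt(MSE), expressed in price units
for model_name, metrics in results.items():
    print(f"{model_name} - RMSE: {np.sqrt(metrics['MSE']):.0f}")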
# Converting results to DataFrame for easier plotting
results_df = pd.DataFrame(results).T
# Plotting the MSE comparison
plt.figure(figsize=(10, 6))
sns.barplot(x=results_df.index, y='MSE', hue=results_df.index, data=results_df, palette='viridis', legend=False)
plt.title('Model Comparison: Mean Squared Error (MSE)')
plt.ylabel('Mean Squared Error (MSE)')
plt.xlabel('Models')
plt.xticks(rotation=15)
plt.show()
# Plotting the R² comparison
plt.figure(figsize=(10, 6))
ax = sns.barplot(x=results_df.index, y='R²', hue=results_df.index, data=results_df, palette='Blues', legend=False)
plt.title('Model Comparison: R² Score')
plt.ylabel('R² Score')
plt.xlabel('Models')
plt.xticks(rotation=15)
# Adding values on top of the bars
for p in ax.patches:
    ax.annotate(f'{p.get_height():.2f}',
                (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='center',
                fontsize=12, color='black',
                xytext=(0, 5), textcoords='offset points')
plt.show()
4.2. Feature Importance
# Feature Importance For Decision Tree
feature_importances = best_tree.feature_importances_
sns.barplot(x=features, y=feature_importances)
plt.title('Feature Importance for Best Decision Tree')
plt.show()
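Impurity-based importances from a single tree can be unstable; permutation importance on held-out data is a standard cross-check (a sketch using sklearn's inspection module on the validation split):
# Sketch: permutation importance of the two features on the validation set
from sklearn.inspection import permutation_importance
perm = permutation_importance(best_tree, X_val, y_val, n_repeats=5, random_state=42)
for name, imp in zip(features, perm.importances_mean):
    print(f"{name}: {imp:.3f}")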
4.3. Fine-Grained Evaluation (e.g., with instance-level errors)
# Plotting actual vs predicted prices for the best model
y_test_pred = best_tree.predict(X_test)
plt.figure(figsize=(10, 6))
sns.scatterplot(x=y_test, y=y_test_pred)
plt.xlabel('Actual Price')
plt.ylabel('Predicted Price')
plt.title('Actual vs Predicted Prices')
plt.show()
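A y = x reference line makes over- and under-prediction visible at a glance, and sorting residuals surfaces the worst instance-level errors directly (a sketch building on y_test_pred above):
# Sketch: reference diagonal plus the largest per-instance errors
plt.figure(figsize=(10, 6))
sns.scatterplot(x=y_test, y=y_test_pred, alpha=0.3)
lims = [min(y_test.min(), y_test_pred.min()), max(y_test.max(), y_test_pred.max())]
plt.plot(lims, lims, 'r--', label='Perfect prediction')
plt.xlabel('Actual Price')
plt.ylabel('Predicted Price')
plt.legend()
plt.show()
residuals = y_test - y_test_pred
print(residuals.abs().sort_values(ascending=False).head())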