Ml-Exp-1 - Jupyter Notebook

Uploaded by

34 Neha Galande

10/20/24, 10:42 PM ml-exp-1 - Jupyter Notebook

In [1]:  # Import necessary libraries

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
import os

localhost:8888/notebooks/Downloads/ml-exp-1.ipynb 1/8

In [2]:  # Check files in the dataset directory

directory_path = '/kaggle/input/uber-fares-dataset'
files = os.listdir(directory_path)
print("Files in the directory:", files)

# Assuming the correct file is found, set the file path accordingly
file_path = '/kaggle/input/uber-fares-dataset/uber.csv' # Update if th

# 1. Pre-process the dataset
def preprocess_data(file_path):
    # Load the dataset
    df = pd.read_csv(file_path)

    # Convert 'pickup_datetime' to datetime.
    # NOTE: the source line is cut off after "error"; errors='coerce' is an
    # assumption (unparseable values become NaT and are dropped below).
    df['pickup_datetime'] = pd.to_datetime(df['pickup_datetime'], errors='coerce')

    # Drop rows with missing or NaN values
    df.dropna(inplace=True)

    # Extract important date and time features from 'pickup_datetime'
    df['hour'] = df['pickup_datetime'].dt.hour
    df['day'] = df['pickup_datetime'].dt.day
    df['month'] = df['pickup_datetime'].dt.month
    df['year'] = df['pickup_datetime'].dt.year
    df['day_of_week'] = df['pickup_datetime'].dt.dayofweek  # Monday = 0

    # Drop unnecessary columns
    df = df.drop(columns=['pickup_datetime', 'key'])

    return df

# Call the preprocess function
df_processed = preprocess_data(file_path)

# Display the first few rows of the processed DataFrame
print(df_processed.head()) # Or use df_processed.head() to display in


Files in the directory: ['uber.csv']

   Unnamed: 0  fare_amount  pickup_longitude  pickup_latitude  \
0    24238194          7.5        -73.999817        40.738354
1    27835199          7.7        -73.994355        40.728225
2    44984355         12.9        -74.005043        40.740770
3    25894730          5.3        -73.976124        40.790844
4    17610152         16.0        -73.925023        40.744085

   dropoff_longitude  dropoff_latitude  passenger_count  hour  day  month  \
0         -73.999512         40.723217                1    19    7      5
1         -73.994710         40.750325                1    20   17      7
2         -73.962565         40.772647                1    21   24      8
3         -73.965316         40.803349                3     8   26      6
4         -73.973082         40.761247                5    17   28      8

   year  day_of_week
0  2015            3
1  2009            4
2  2009            0
3  2009            4
4  2014            3
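
The preprocessing step relies on unparseable timestamps becoming missing values, which `dropna` then removes. The sketch below shows that behaviour on a tiny hand-made frame. The rows are hypothetical and do not come from uber.csv, and `errors="coerce"` is assumed here because the corresponding notebook line is cut off in the printout.

```python
import pandas as pd

# Toy frame with one unparseable timestamp (hypothetical rows, not from uber.csv)
toy = pd.DataFrame({
    "pickup_datetime": ["2015-05-07 19:52:06", "not a date", "2009-07-17 20:04:56"],
    "fare_amount": [7.5, 3.0, 7.7],
})

# errors="coerce" turns values that cannot be parsed into NaT instead of raising
toy["pickup_datetime"] = pd.to_datetime(toy["pickup_datetime"], errors="coerce")

# dropna() removes the row whose timestamp became NaT
clean = toy.dropna().copy()

# The .dt accessor exposes calendar fields; dayofweek counts Monday as 0
clean["hour"] = clean["pickup_datetime"].dt.hour
clean["day_of_week"] = clean["pickup_datetime"].dt.dayofweek

print(clean)
```

Only two rows survive, with hours 19 and 20 and weekdays 3 (Thursday) and 4 (Friday).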


In [3]:  # 2. Identify outliers using Z-score

def identify_outliers(df):
    # Calculate Z-scores for the 'fare_amount' column
    z_scores = np.abs(stats.zscore(df['fare_amount']))
    # Identify outliers (more than 3 standard deviations from the mean)
    outliers = df[z_scores > 3]
    print(f"Number of outliers: {outliers.shape[0]}")
    return outliers

# Assuming df_processed is already defined and contains the preprocesse
outliers_found = identify_outliers(df_processed)
print(outliers_found)


Number of outliers: 5450

        Unnamed: 0  fare_amount  pickup_longitude  pickup_latitude  \
48        22405517        56.80        -73.993498        40.764686
84        25485719        49.57        -73.975058        40.788820
104       46435788        43.00        -73.862701        40.768959
204        6403066        45.00        -73.971663        40.757812
226       24085207        49.80        -73.992122        40.748577
...            ...          ...               ...              ...
199914    17686068        57.33        -73.776778        40.645427
199972    31236221        45.00        -73.786833        40.639842
199976     1780041        49.70        -73.978225        40.783318
199977    21117828        43.50        -73.996671        40.737483
199982    13096190        57.33        -73.969204        40.754771

        dropoff_longitude  dropoff_latitude  passenger_count  hour  day  \
48             -73.993498         40.764686                1    22    3
84             -73.975058         40.788820                1    10    7
104            -73.999092         40.741829                2    18   15
204            -73.789273         40.641790                1     7   13
226            -73.806072         40.665272                1    17   29
...                   ...               ...              ...   ...  ...
199914         -73.948572         40.789107                5     5   14
199972         -74.001215         40.722429                1    13   20
199976         -73.700963         40.705852                1    23   18
199977         -73.867758         40.897563                1    21   20
199982         -73.790351         40.643802                1    11    6

        month  year  day_of_week
48          1  2013            3
84          8  2009            4
104         5  2015            4
204        11  2010            5
226         7  2012            6
...       ...   ...          ...
199914     11  2014            4
199972      8  2010            4
199976     10  2011            1
199977     11  2012            1
199982      8  2014            2

[5450 rows x 12 columns]
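
To make the Z-score rule concrete, here is a minimal NumPy-only sketch of the same test on a hypothetical fare list. The helper name `zscore_outlier_mask` and the toy values are illustrative assumptions. The calculation uses the population standard deviation (`ddof=0`), which is also the default of `scipy.stats.zscore`.

```python
import numpy as np

def zscore_outlier_mask(values, threshold=3.0):
    """Return a boolean mask marking entries whose |z-score| exceeds threshold."""
    values = np.asarray(values, dtype=float)
    # Population standard deviation (ddof=0), matching scipy.stats.zscore's default
    z = (values - values.mean()) / values.std()
    return np.abs(z) > threshold

# Hypothetical fares: 29 ordinary rides at 10.0 and one extreme value
fares = np.array([10.0] * 29 + [200.0])
mask = zscore_outlier_mask(fares)

print("Outlier fares:", fares[mask])
print("Number of outliers:", int(mask.sum()))
```

Because the mean and standard deviation are themselves pulled upward by extreme fares, this rule tends to flag only the high end of a right-skewed column, which appears consistent with the rows printed above.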


In [8]:  def plot_outliers(df, outliers):
    plt.figure(figsize=(10, 6))

    # Scatter plot of all points.
    # NOTE: the alpha value is cut off in the source; 0.5 is an assumption.
    plt.scatter(df.index, df['fare_amount'], label="Normal Data", alpha=0.5)

    # Scatter plot of outliers.
    # NOTE: the label is cut off in the source; "Outliers" is an assumption.
    plt.scatter(outliers.index, outliers['fare_amount'], color='red', label="Outliers")

    # Add labels and title
    plt.title('Fare Amount with Outliers Highlighted')
    plt.xlabel('Index')
    plt.ylabel('Fare Amount')

    # Add legend
    plt.legend()

    # Show the plot
    plt.show()

plot_outliers(df_processed, outliers_found)


In [4]:  # 3. Check the correlation

def plot_correlation(df):
    plt.figure(figsize=(10, 6))
    sns.heatmap(df.corr(), annot=True, cmap='coolwarm', fmt='.2f')
    plt.title('Correlation Matrix')
    plt.show()

# plot_correlation() returns None, so call it directly instead of printing its result
plot_correlation(df_processed)
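
A heat map shows every pairwise correlation, but for a regression task the row for the target is usually the one of interest. The sketch below ranks features by the absolute Pearson correlation with `fare_amount` on synthetic data. The `distance` column and all generated values are hypothetical and only illustrate the technique; they are not part of the notebook's dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic columns: fare depends linearly on distance, hour is unrelated noise
distance = rng.uniform(0.5, 20.0, size=n)
hour = rng.integers(0, 24, size=n)
fare = 2.5 + 1.8 * distance + rng.normal(0.0, 1.0, size=n)

toy = pd.DataFrame({"fare_amount": fare, "distance": distance, "hour": hour})

# Pearson correlation of every feature with the target, strongest first
corr_with_fare = (
    toy.corr()["fare_amount"]
    .drop("fare_amount")
    .sort_values(key=np.abs, ascending=False)
)
print(corr_with_fare)
```

Pearson correlation only captures linear association, so a feature with a value near zero can still be useful to a non-linear model such as a random forest.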


In [5]:  # 4. Implement the models
def implement_models(df):
    X = df.drop(columns='fare_amount')
    y = df['fare_amount']

    # Split the data.
    # NOTE: the source line is cut off after "test_size"; test_size=0.2 and
    # random_state=42 are assumptions, not values recovered from the notebook.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Linear Regression
    lin_reg = LinearRegression()
    lin_reg.fit(X_train, y_train)
    y_pred_lin = lin_reg.predict(X_test)

    # Random Forest Regression
    rf_reg = RandomForestRegressor(n_estimators=100, random_state=42)
    rf_reg.fit(X_train, y_train)
    y_pred_rf = rf_reg.predict(X_test)

    return y_test, y_pred_lin, y_pred_rf, lin_reg, rf_reg

# Call implement_models and unpack the results.
# NOTE: the argument is cut off in the source ("df_pr"); df_processed is assumed.
y_test, y_pred_lin, y_pred_rf, lin_reg, rf_reg = implement_models(df_processed)

# 5. Evaluate the models
def evaluate_models(y_test, y_pred_lin, y_pred_rf):
    # Evaluation metrics: root mean squared error and R2 on the test split
    lin_rmse = np.sqrt(mean_squared_error(y_test, y_pred_lin))
    lin_r2 = r2_score(y_test, y_pred_lin)

    rf_rmse = np.sqrt(mean_squared_error(y_test, y_pred_rf))
    rf_r2 = r2_score(y_test, y_pred_rf)

    print(f"Linear Regression RMSE: {lin_rmse}, R2: {lin_r2}")
    print(f"Random Forest Regression RMSE: {rf_rmse}, R2: {rf_r2}")

# Call the evaluation function
evaluate_models(y_test, y_pred_lin, y_pred_rf)

Linear Regression RMSE: 10.113708439348311, R2: 0.016696500909208156
Random Forest Regression RMSE: 5.356064199348529, R2: 0.7242228535774927
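
To help read these two metrics, the sketch below computes RMSE and R2 by hand with NumPy on a few hypothetical values. The helper names `rmse` and `r2` and the toy arrays are illustrative assumptions. By definition, a model that always predicts the mean of the targets scores R2 = 0, so the linear regression's R2 of about 0.017 means it barely improves on that baseline, while the random forest explains roughly 72% of the variance in the test fares.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error, in fare units."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # error left after the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of the mean-only baseline
    return float(1.0 - ss_res / ss_tot)

# Hypothetical fares and two sets of predictions
y_true = np.array([5.0, 10.0, 15.0, 20.0])
mean_pred = np.full_like(y_true, y_true.mean())      # baseline: always predict the mean
close_pred = np.array([6.0, 9.0, 16.0, 19.0])        # each prediction is off by 1.0

print("Baseline RMSE:", rmse(y_true, mean_pred), "R2:", r2(y_true, mean_pred))
print("Close    RMSE:", rmse(y_true, close_pred), "R2:", r2(y_true, close_pred))
```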
