Assignment 9: How much for that car?
Raymond Guo
2020-04-13
Exercise 1
i. The other continuous variable is mileage.
cars %>%
  gather(Mileage, Liter, key = "category", value = "value") %>%
  ggplot() +
  geom_point(mapping = aes(x = value, y = Price)) +
  facet_wrap(~category, scales = "free_x") +
  labs(title = "Relationship Between Price with Liter and Mileage")
[Faceted scatterplots: Price vs. Liter and Price vs. Mileage, with free x scales.]
Exercise 2
continuous_model <- lm(Price ~ Mileage + Liter, data = cars)
continuous_model %>%
tidy()
term estimate std.error statistic p.value
(Intercept) 9426.6014688 1095.0777745 8.608157 0.0e+00
Mileage -0.1600285 0.0349084 -4.584237 5.3e-06
Liter 4968.2781155 258.8011436 19.197280 0.0e+00
continuous_model %>%
glance() %>%
select(r.squared)
r.squared
0.3291279
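For intuition, r.squared is one minus the ratio of residual variation to total variation. A minimal sketch on simulated data (the data frame and variable names here are illustrative, not the actual cars data):

```r
set.seed(1)
# Simulated data standing in for the cars data
df <- data.frame(x1 = runif(100), x2 = runif(100))
df$y <- 2 * df$x1 + rnorm(100)

fit <- lm(y ~ x1 + x2, data = df)

# R^2 = 1 - SS_residual / SS_total
r2_by_hand <- 1 - sum(residuals(fit)^2) / sum((df$y - mean(df$y))^2)
all.equal(r2_by_hand, summary(fit)$r.squared)  # TRUE
```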
The r.squared value is closer to 0 than to 1, which means this model does a poor job of capturing the variability of Price.
Exercise 3
# predict model plane over sensible grid of values
lit <- unique(cars$Liter)
mil <- unique(cars$Mileage)
grid <- with(cars, expand.grid(lit, mil))
d <- setNames(data.frame(grid), c("Liter", "Mileage"))
vals <- predict(continuous_model, newdata = d)
# form surface matrix and give to plotly
m <- matrix(vals, nrow = length(unique(d$Liter)), ncol = length(unique(d$Mileage)))
p <- plot_ly() %>%
add_markers(
x = ~cars$Mileage,
y = ~cars$Liter,
z = ~cars$Price,
marker = list(size = 1)
) %>%
add_trace(
x = ~mil, y = ~lit, z = ~m, type="surface",
colorscale=list(c(0,1), c("yellow","yellow")),
showscale = FALSE
) %>%
layout(
scene = list(
xaxis = list(title = "mileage"),
yaxis = list(title = "liters"),
zaxis = list(title = "price")
)
)
if (!is_pdf) {p}
The fitted plane is consistent with the data from Exercise 1, but it is hard to check the three regression assumptions from this 3D view alone. The 2D diagnostic plots are much easier to interpret than the 3D surface.
Exercise 4
continuous_df <- cars %>%
add_predictions(continuous_model) %>%
add_residuals(continuous_model)
ggplot(continuous_df) +
geom_point(mapping = aes(x = pred, y = Price)) +
geom_abline(
slope = 1,
intercept = 0,
color = "red",
size = 1
) +
labs(title="Observed vs Predicted of Price",
x = "Predicted Price",
y = "Observed Price")
[Scatterplot of observed vs. predicted Price with the y = x reference line in red.]
This graph shows only a weak linear relationship between the predicted values and the observed response.
ggplot(continuous_df) +
  geom_point(mapping = aes(pred, resid)) +
  geom_ref_line(h = 0) +
  labs(title = "Residual vs Predicted", x = "Predicted", y = "Residual")
[Residual vs. predicted plot for the two-variable model.]
The plot looks a bit odd: most points cluster along the lower edge, while a few points near the top look like outliers. Overall, though, the variability is roughly constant.
ggplot(data = continuous_df) +
geom_qq(mapping = aes(sample = resid)) +
geom_qq_line(mapping = aes(sample = resid)) +
labs(title="Theoretical Residuals vs Actual Residuals")
[Normal Q-Q plot of the residuals with reference line.]
The residuals clearly do not follow a bell-shaped (normal) curve.
Exercise 5
cars %>%
ggplot() +
geom_boxplot(aes(x = reorder(Make, Price, FUN=median), y = Price)) +
labs(x = "Make of car", title = "Effect of make of car on price")
[Boxplots of Price by Make, ordered by median: Saturn, Chevrolet, Pontiac, Buick, SAAB, Cadillac.]
Based on these box plots, about half of the makes have outliers only on the high side, and those high-side outliers pull the upper end of the price distribution up significantly.
i. Cadillac
ii. Cadillac
iii. Chevrolet
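The outliers flagged in these boxplots follow the usual 1.5 × IQR rule. A small sketch of how the upper fence is computed, using simulated right-skewed prices (illustrative only, not the cars data):

```r
set.seed(1)
# Simulated right-skewed prices standing in for one make's Price column
price <- exp(rnorm(200, mean = 9.8, sd = 0.4))

q <- quantile(price, c(0.25, 0.75))
iqr <- q[2] - q[1]
upper_fence <- q[2] + 1.5 * iqr

# Points above the fence are the boxplot's high-side outliers
sum(price > upper_fence)
```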
Exercise 6
cars %>%
gather(Model:Cylinder, Doors:Leather, key="original_column", value="value") %>%
ggplot() +
geom_boxplot(aes(x = reorder(value, Price, FUN=median), y = Price)) +
facet_wrap(~original_column, scales = "free_x") +
labs(title = "Boxplot of All Categorical Variables") +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 90, hjust = 1))
[Faceted boxplots of Price against each categorical variable (Cruise, Cylinder, Doors, Leather, Model, Sound, Trim, Type), with levels ordered by median Price.]
Exercise 7
cars_factor_df <- cars %>%
mutate(Cylinder = as.factor(Cylinder))
mixed_model <- lm(Price ~ Mileage + Liter + Cylinder + Make + Type,
                  data = cars_factor_df)
mixed_model %>%
tidy()
term estimate std.error statistic p.value
(Intercept) 1.885018e+04 892.4119413 21.122738 0.0000000
Mileage -1.861764e-01 0.0106433 -17.492387 0.0000000
Liter 5.697442e+03 342.7322419 16.623596 0.0000000
Cylinder6 -3.312544e+03 619.9683651 -5.343086 0.0000001
Cylinder8 -3.672597e+03 1246.2162662 -2.946998 0.0033032
MakeCadillac 1.450444e+04 517.9855224 28.001635 0.0000000
MakeChevrolet -2.270807e+03 355.9736337 -6.379145 0.0000000
MakePontiac -2.355468e+03 363.9063301 -6.472731 0.0000000
MakeSAAB 9.905074e+03 450.2011112 22.001443 0.0000000
MakeSaturn -2.090266e+03 470.8305609 -4.439529 0.0000103
TypeCoupe -1.163869e+04 464.7055454 -25.045297 0.0000000
TypeHatchback -1.172638e+04 545.3936364 -21.500769 0.0000000
TypeSedan -1.178618e+04 411.1021489 -28.669707 0.0000000
TypeWagon -8.156551e+03 500.6379995 -16.292312 0.0000000
Yes, there is an estimated coefficient for every non-baseline level of each categorical variable.
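These per-level terms come from R's default treatment (dummy) coding: each factor contributes one 0/1 indicator column per non-baseline level, and the first level alphabetically becomes the baseline. A minimal illustration with a made-up factor:

```r
# Treatment coding: the first level ("Buick" here) becomes the baseline,
# so only makeCadillac and makeSAAB get indicator columns
make <- factor(c("Buick", "Cadillac", "SAAB", "Buick"))
model.matrix(~ make)
```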
mixed_model %>%
  glance() %>%
  select(r.squared)
r.squared
0.9389165
Exercise 8
mixed_df <- cars_factor_df %>%
add_predictions(mixed_model) %>%
add_residuals(mixed_model)
ggplot(mixed_df) +
geom_point(mapping = aes(x = pred, y = Price)) +
geom_abline(
slope = 1,
intercept = 0,
color = "red",
size = 1
) +
labs(title="Observed vs Predicted of Price",
x = "Predicted Price",
y = "Observed Price")
[Observed vs. predicted Price for the mixed model, with the y = x reference line in red.]
ggplot(mixed_df) +
  geom_point(mapping = aes(pred, resid)) +
  geom_ref_line(h = 0) +
  labs(title = "Residual vs Predicted", x = "Predicted", y = "Residual")
[Residual vs. predicted plot for the mixed model.]
ggplot(data = mixed_df) +
geom_qq(mapping = aes(sample = resid)) +
geom_qq_line(mapping = aes(sample = resid)) +
labs(title="Theoretical Residuals vs Actual Residuals")
[Normal Q-Q plot of the mixed-model residuals with reference line.]
Exercise 9
i. The r.squared value is significantly closer to 1 than for the two-variable model. The observed vs. predicted graph shows a clear linear relationship, and the variability of points around the reference line is roughly constant. The two-variable model meets these conditions as well, but much more weakly. The Q-Q plot now clearly follows a bell-shaped (normal) pattern, whereas the first model's clearly did not.
ii. The second model is the best: there are three conditions that need to be satisfied for a model to be reliable, and the second model satisfies them far more convincingly than the first.