
Table of contents
• 1 Question 9
• 2 Question 10

Fall 2023-2024 IE 451 Homework 2 Solutions
Author

Deniz Şahin

1 Question 9
This exercise involves the Auto data set studied in the lab. Make sure that the
missing values have been removed from the data.

(If you check the data set description, the missing values have already been removed.)

Auto %>% head %>% pander

Table 1: The first six rows of Auto dataset


mpg  cylinders  displacement  horsepower  weight  acceleration  year  origin  name
18   8          307           130         3504    12            70    1       chevrolet chevelle malibu
15   8          350           165         3693    11.5          70    1       buick skylark 320
18   8          318           150         3436    11            70    1       plymouth satellite
16   8          304           150         3433    12            70    1       amc rebel sst
17   8          302           140         3449    10.5          70    1       ford torino
15   8          429           198         4341    10            70    1       ford galaxie 500

Auto %>% summary()

mpg cylinders displacement horsepower weight


Min. : 9.00 Min. :3.000 Min. : 68.0 Min. : 46.0 Min. :1613
1st Qu.:17.00 1st Qu.:4.000 1st Qu.:105.0 1st Qu.: 75.0 1st Qu.:2225
Median :22.75 Median :4.000 Median :151.0 Median : 93.5 Median :2804
Mean :23.45 Mean :5.472 Mean :194.4 Mean :104.5 Mean :2978
3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:275.8 3rd Qu.:126.0 3rd Qu.:3615
Max. :46.60 Max. :8.000 Max. :455.0 Max. :230.0 Max. :5140


acceleration year origin name


Min. : 8.00 Min. :70.00 Min. :1.000 amc matador : 5
1st Qu.:13.78 1st Qu.:73.00 1st Qu.:1.000 ford pinto : 5
Median :15.50 Median :76.00 Median :1.000 toyota corolla : 5
Mean :15.54 Mean :75.98 Mean :1.577 amc gremlin : 4
3rd Qu.:17.02 3rd Qu.:79.00 3rd Qu.:2.000 amc hornet : 4
Max. :24.80 Max. :82.00 Max. :3.000 chevrolet chevette: 4
(Other) :365

#?Auto

(a) Which of the predictors are quantitative, and which are qualitative?

If we check the Auto data set using ?Auto, we will see that mpg, cylinders,
displacement, horsepower, weight, acceleration and year are the quantitative
predictors. On the other hand, origin and name are the qualitative predictors.
Observe that even though origin takes integer values, each number codes the car's
origin, so it is a categorical predictor. We can convert it to a factor:
d1 <- as_tibble(Auto) %>%
mutate(origin=factor(origin))

(b) What is the range of each quantitative predictor? You can answer this
using the range() function.
d1 %>%
  summarise(range(mpg), range(cylinders), range(displacement), range(horsepower),
            range(weight), range(acceleration), range(year)) %>%
  pander()

Table 2: Ranges of variables


range(mpg)  range(cylinders)  range(displacement)  range(horsepower)  range(weight)  range(acceleration)  range(year)
9           3                 68                    46                 1613           8                     70
46.6        8                 455                   230                5140           24.8                  82

or
d1 %>%
  # calculate all relevant statistics for every numerical variable
  summarize(across(where(is.numeric), list(min = min, max = max))) %>%
  # collect variable names, statistic names, and values in three columns
  pivot_longer(cols = everything(), names_to = c("variable", "stat"),
               names_pattern = "(.*)_(.*)") %>%
  # place stat names across the columns
  pivot_wider(names_from = stat, values_from = value) %>%
  pander()

# Alternatively
d1 %>%
  # calculate all relevant statistics for every numerical variable
  summarize(across(where(is.numeric), list(min = min, max = max))) %>%
  # collect variable names, statistic names, and values in three columns
  pivot_longer(cols = everything(), names_to = c("variable", "stat"),
               names_pattern = "(.*)_(.*)") %>%
  # place variable names across the columns
  pivot_wider(names_from = variable, values_from = value) %>%
  pander()

Table 3: If the number of variables is large, we can apply the same function to all
variables at once


(a) Statistics are across the columns


variable min max
mpg 9 46.6
cylinders 3 8
displacement 68 455
horsepower 46 230
weight 1613 5140
acceleration 8 24.8
year 70 82
(b) Variables are across the columns
stat mpg cylinders displacement horsepower weight acceleration year
min 9 3 68 46 1613 8 70
max 46.6 8 455 230 5140 24.8 82

Let us extend the usage to several other statistics: calculate min, mean, median,
sd, and max for every numerical variable.
d1 %>%
  # calculate all relevant statistics for every numerical variable
  summarize(across(where(is.numeric),
                   list(min = min, mean = mean, median = median, sd = sd, max = max))) %>%
  # do not let more than two decimals appear in the table
  mutate(across(everything(), ~ round(.x, digits = 2))) %>%
  # collect variable names, statistic names, and values in three columns
  pivot_longer(cols = everything(), names_to = c("variable", "stat"),
               names_pattern = "(.*)_(.*)") %>%
  # place variable names across the columns
  pivot_wider(names_from = variable, values_from = value) %>%
  pander()

stat mpg cylinders displacement horsepower weight acceleration year


min 9 3 68 46 1613 8 70
mean 23.45 5.47 194.4 104.5 2978 15.54 75.98
median 22.75 4 151 93.5 2804 15.5 76
sd 7.81 1.71 104.6 38.49 849.4 2.76 3.68
max 46.6 8 455 230 5140 24.8 82

(c) What is the mean and standard deviation of each quantitative predictor?
d1 %>%
summarise(across(where(is.numeric),list(mean=mean,sd=sd))) %>%
pivot_longer(cols = everything(), names_to = c("variable","stat"), names_sep = "_") %>%
pivot_wider(names_from = variable, values_from = value) %>%
pander()

stat mpg cylinders displacement horsepower weight acceleration year


mean 23.45 5.472 194.4 104.5 2978 15.54 75.98
sd 7.805 1.706 104.6 38.49 849.4 2.759 3.684

## Alternatively

d1 %>%
  summarise(mean(mpg), mean(cylinders), mean(displacement), mean(horsepower),
            mean(weight), mean(acceleration), mean(year))


# A tibble: 1 x 7
`mean(mpg)` `mean(cylinders)` `mean(displace~` `mean(horsepow~` `mean(weight)`
<dbl> <dbl> <dbl> <dbl> <dbl>
1 23.4 5.47 194. 104. 2978.
# ... with 2 more variables: `mean(acceleration)` <dbl>, `mean(year)` <dbl>

d1 %>%
  summarise(sd(mpg), sd(cylinders), sd(displacement), sd(horsepower),
            sd(weight), sd(acceleration), sd(year))

# A tibble: 1 x 7
`sd(mpg)` `sd(cylinders)` `sd(displacement)` `sd(horsepower)` `sd(weight)`
<dbl> <dbl> <dbl> <dbl> <dbl>
1 7.81 1.71 105. 38.5 849.
# ... with 2 more variables: `sd(acceleration)` <dbl>, `sd(year)` <dbl>

(d) Now remove the 10th through 85th observations. What is the range,
mean, and standard deviation of each predictor in the subset of the data
that remains?
d1_new <- d1 %>%
filter(!row_number() %in% 10:85)

d1_new %>%
summarise(across(where(is.numeric), list(min=min, max=max, mean=mean, sd=sd))) %>%
pivot_longer(cols = everything(), names_to = c("variable","stat"), names_sep = "_") %>%
pivot_wider(names_from=stat, values_from = value) %>%
pander()

variable min max mean sd


mpg 11 46.6 24.4 7.867
cylinders 3 8 5.373 1.654
displacement 68 455 187.2 99.68
horsepower 46 230 100.7 35.71
weight 1649 4997 2936 811.3
acceleration 8.5 24.8 15.73 2.694
year 70 82 77.15 3.106

(e) Using the full data set, investigate the predictors graphically, using
scatterplots or other tools of your choice. Create some plots highlighting
the relationships among the predictors. Comment on your findings.
my_fn <- function(data, mapping, method="loess", ...){
p <- ggplot(data = data, mapping = mapping) +
geom_point() +
geom_smooth(formula = y ~ x,method=method,se=F, ...)
p
}

select(d1,c(1:7)) %>%
ggpairs(progress = FALSE,
upper = list(continuous = my_fn),
lower = list(continuous = "cor"))

Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
  pseudoinverse used at 6

Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
  neighborhood radius 2

Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
  reciprocal condition number 9.6355e-017


Observe that the correlations between displacement, horsepower and weight are
quite high. Moreover, their pairwise relationships appear to be linear.

As displacement, horsepower and weight increase, mpg decreases.

mpg, displacement and horsepower have right-skewed distributions.
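
To make the first observation concrete, the pairwise correlations can be computed directly; a quick sketch:

d1 %>%
  select(displacement, horsepower, weight) %>%  # the three strongly related predictors
  cor() %>%                                     # pairwise correlation matrix
  round(2)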

• How does mpg change with respect to weight?


d1 %>%
ggplot(aes(x=weight,y=mpg))+
geom_point(aes(color=origin))


• Does the number of cylinders have an effect on mpg?


d1 %>%
mutate(cylinders=factor(cylinders)) %>%
ggplot(aes(cylinders,mpg))+
geom_boxplot()


• Does acceleration decrease as weight increases?


d1 %>%
ggplot(aes(weight,acceleration))+
geom_point(aes(color=origin))+
geom_smooth(se=FALSE)

`geom_smooth()` using method = 'loess' and formula 'y ~ x'


• Is mpg lower for older cars?


d1 %>% ggplot(aes(year,mpg))+
geom_point(aes(color=origin))+
geom_smooth(se=FALSE,color="black")

`geom_smooth()` using method = 'loess' and formula 'y ~ x'


• How does the proportion of the origins change over the years?


d1 %>%
ggplot()+
geom_bar(aes(x=year,fill=origin))


(f) Suppose that we wish to predict gas mileage (mpg) on the basis of the
other variables. Do your plots suggest that any of the other variables might
be useful in predicting mpg? Justify your answer.

Almost every variable appears to be related to mpg, as seen from the graphs. We can
check this with linear models. However, be careful: the correlations between some of
these predictors are too high for them to be used together in the same model. A
variance-inflation-factor check of this is sketched right below; after that, we check
the predictors one at a time.
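
A quick multicollinearity sketch, assuming the car package is available (it is not used elsewhere in this solution):

# VIFs for a model containing all five quantitative predictors; values well
# above 5-10 flag predictors that are too collinear to be used together.
car::vif(lm(mpg ~ displacement + horsepower + weight + acceleration + I(year - 70), data = d1))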

summary(lm(mpg~displacement,d1))

Call:
lm(formula = mpg ~ displacement, data = d1)

Residuals:
Min 1Q Median 3Q Max
-12.9170 -3.0243 -0.5021 2.3512 18.6128

Coefficients:


Estimate Std. Error t value Pr(>|t|)


(Intercept) 35.12064 0.49443 71.03 <2e-16 ***
displacement -0.06005 0.00224 -26.81 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.635 on 390 degrees of freedom


Multiple R-squared: 0.6482, Adjusted R-squared: 0.6473
F-statistic: 718.7 on 1 and 390 DF, p-value: < 2.2e-16

summary(lm(mpg~horsepower,d1))

Call:
lm(formula = mpg ~ horsepower, data = d1)

Residuals:
Min 1Q Median 3Q Max
-13.5710 -3.2592 -0.3435 2.7630 16.9240

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.935861 0.717499 55.66 <2e-16 ***
horsepower -0.157845 0.006446 -24.49 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.906 on 390 degrees of freedom


Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16

summary(lm(mpg~weight,d1))

Call:
lm(formula = mpg ~ weight, data = d1)

Residuals:
Min 1Q Median 3Q Max
-11.9736 -2.7556 -0.3358 2.1379 16.5194

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 46.216524 0.798673 57.87 <2e-16 ***
weight -0.007647 0.000258 -29.64 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.333 on 390 degrees of freedom


Multiple R-squared: 0.6926, Adjusted R-squared: 0.6918
F-statistic: 878.8 on 1 and 390 DF, p-value: < 2.2e-16

summary(lm(mpg~acceleration,d1))

Call:
lm(formula = mpg ~ acceleration, data = d1)

Residuals:
Min 1Q Median 3Q Max
-17.989 -5.616 -1.199 4.801 23.239

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.8332 2.0485 2.359 0.0188 *
acceleration 1.1976 0.1298 9.228 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 7.08 on 390 degrees of freedom


Multiple R-squared: 0.1792, Adjusted R-squared: 0.1771
F-statistic: 85.15 on 1 and 390 DF, p-value: < 2.2e-16

summary(lm(mpg~I(year-70),d1))

Call:
lm(formula = mpg ~ I(year - 70), data = d1)

Residuals:
Min 1Q Median 3Q Max
-12.0212 -5.4411 -0.4412 4.9739 18.2088

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.09081 0.61331 26.24 <2e-16 ***
I(year - 70) 1.23004 0.08736 14.08 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.363 on 390 degrees of freedom


Multiple R-squared: 0.337, Adjusted R-squared: 0.3353
F-statistic: 198.3 on 1 and 390 DF, p-value: < 2.2e-16

summary(lm(mpg~displacement+horsepower+weight+acceleration+I(year-70),d1))

Call:
lm(formula = mpg ~ displacement + horsepower + weight + acceleration +
I(year - 70), data = d1)

Residuals:
Min 1Q Median 3Q Max
-8.5211 -2.3920 -0.1036 2.0312 14.2874

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.3527596 2.0617039 18.117 <2e-16 ***
displacement 0.0027817 0.0054617 0.509 0.611
horsepower 0.0010201 0.0137631 0.074 0.941
weight -0.0068738 0.0006653 -10.333 <2e-16 ***
acceleration 0.0903236 0.1019070 0.886 0.376
I(year - 70) 0.7541153 0.0526118 14.334 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.435 on 386 degrees of freedom


Multiple R-squared: 0.8088, Adjusted R-squared: 0.8063
F-statistic: 326.5 on 5 and 386 DF, p-value: < 2.2e-16

Observe that even though the separate linear models suggest that each of these
variables is significant on its own, when we put them all together in one model,
only weight and year remain significant. As stated earlier, this may be caused by
the high correlations among the predictors. Let's fit a final model:

summary(lm(mpg~weight+I(year-70),d1))

Call:
lm(formula = mpg ~ weight + I(year - 70), data = d1)

Residuals:
Min 1Q Median 3Q Max
-8.8505 -2.3014 -0.1167 2.0367 14.3555


Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.6650267 0.8015347 48.24 <2e-16 ***
weight -0.0066321 0.0002146 -30.91 <2e-16 ***
I(year - 70) 0.7573183 0.0494727 15.31 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.427 on 389 degrees of freedom


Multiple R-squared: 0.8082, Adjusted R-squared: 0.8072
F-statistic: 819.5 on 2 and 389 DF, p-value: < 2.2e-16

This model seems adequate: with only weight and year it achieves essentially the same R-squared (0.8082) as the five-predictor model (0.8088), and both terms are significant.
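
As a quick sanity check of this final model, the standard residual diagnostics can be inspected with base R; a minimal sketch:

final_model <- lm(mpg ~ weight + I(year - 70), data = d1)  # refit the chosen model
par(mfrow = c(2, 2))                                       # 2 x 2 grid of diagnostic plots
plot(final_model)  # residuals vs fitted, Q-Q, scale-location, residuals vs leverage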

2 Question 10
This exercise involves the Boston housing data set.

(a) How many rows are in this data set? How many columns? What do the
rows and columns represent?

head(Boston)

crim zn indus chas nox rm age dis rad tax ptratio lstat medv
1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 4.98 24.0
2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 9.14 21.6
3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 4.03 34.7
4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 2.94 33.4
5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 5.33 36.2
6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 5.21 28.7

#?Boston

d2 <- as_tibble(Boston) %>% mutate(chas=factor(chas))

ncol(d2) #number of columns

[1] 13

nrow(d2) #number of rows

[1] 506

crim: per capita crime rate by town.

zn: proportion of residential land zoned for lots over 25,000 sq.ft.

indus: proportion of non-retail business acres per town.

chas: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).

nox: nitrogen oxides concentration (parts per 10 million).

rm: average number of rooms per dwelling.

age: proportion of owner-occupied units built prior to 1940.

dis: weighted mean of distances to five Boston employment centres.

rad: index of accessibility to radial highways.


tax: full-value property-tax rate per $10,000.

ptratio: pupil-teacher ratio by town.

lstat: lower status of the population (percent).

medv: median value of owner-occupied homes in $1000s.

(b) Make some pairwise scatterplots of the predictors (columns) in this data set.
Describe your findings.
my_fn <- function(data, mapping, method="loess", ...){
p <- ggplot(data = data, mapping = mapping) +
geom_point() +
geom_smooth(formula = y ~ x,method=method,se=F, ...)
p
}

select(d2,-chas) %>%
ggpairs(progress = FALSE,
upper = list(continuous = my_fn),
lower = list(continuous = "cor"))

Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
  pseudoinverse used at -0.5

Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
  neighborhood radius 13

Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
  reciprocal condition number 2.9038e-031

Warning in simpleLoess(y, x, w, span, degree = degree, parametric = parametric, :
  There are other near singularities as well. 156.25


As the distance from employment areas increases, the nitrogen oxides concentration
and the crime rate decrease.


As the median value of owner-occupied homes increases, the percentage of
lower-status population decreases.

(c) Are any of the predictors associated with per capita crime rate? If so, explain
the relationship.

As the distance to the employment centres increases, the crime rate decreases.

As the median value of owner-occupied homes increases, the crime rate decreases.

Finally, as the percentage of lower-status population increases, the crime rate
also increases.
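
These claims can be backed up numerically; a quick sketch that sorts the correlation of every numeric predictor with crim:

crim_cor <- d2 %>%
  select(where(is.numeric)) %>%  # drop the factor chas
  cor()
sort(crim_cor[, "crim"], decreasing = TRUE)  # most positively correlated predictors first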

(d) Do any of the census tracts of Boston appear to have particularly high crime
rates? Tax rates? Pupil-teacher ratios? Comment on the range of each predictor.
par(mfrow=c(1,3))
boxplot(d2$crim, xlab="crim")
boxplot(d2$tax, xlab="tax")
boxplot(d2$ptratio, xlab="ptratio")


For crim: This variable has many outliers at the upper end. The data range from
about 0 to 89, while the outliers range from roughly 10 to 89; many of them lie
between 10 and 30, and a few extreme values exceed 70. This variable is strongly
right-skewed.

For tax: This variable has no extreme points. The data range from 187 to 711, and
the median is around 330. This variable is also right-skewed.

For ptratio: This variable has two outliers at the lower end. The data range from
12.6 to 22, and the median is around 19.
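
To put rough numbers on "particularly high", a small sketch that counts tracts above illustrative thresholds (the cut-offs 25, 600 and 21 are chosen here only for illustration):

d2 %>%
  summarise(n_crim_above_25    = sum(crim > 25),      # very high crime rate
            n_tax_above_600    = sum(tax > 600),      # very high property-tax rate
            n_ptratio_above_21 = sum(ptratio > 21))   # very high pupil-teacher ratio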

(e) How many of the census tracts in this data set bound the Charles river?

sum(d2$chas == 1)

[1] 35

(f) What is the median pupil-teacher ratio among the towns in this data set?


median(d2$ptratio)

[1] 19.05

(g) Which census tract of Boston has the lowest median value of owner-occupied
homes? What are the values of the other predictors for that census tract, and how
do those values compare to the overall ranges for those predictors? Comment on
your findings.
lwst <- d2 %>% filter(medv==min(medv))
lwst_ind <- which(d2$medv == min(d2$medv)) ##which index?
lwst %>% pander()

Census Tracts with the Lowest Median Value of Owner-occupied Homes


crim zn indus chas nox rm age dis rad tax ptratio lstat medv
38.35 0 18.1 0 0.693 5.453 100 1.49 24 666 20.2 30.59 5
67.92 0 18.1 0 0.693 5.683 100 1.425 24 666 20.2 22.98 5

d2 %>%
summarise(across(where(is.numeric), quantile)) %>%
pander()

Quantiles of variables
quantile  crim     zn    indus  nox    rm     age    dis    rad  tax  ptratio  lstat  medv
0%        0.00632  0     0.46   0.385  3.561  2.9    1.13   1    187  12.6     1.73   5
25%       0.08204  0     5.19   0.449  5.886  45.02  2.1    4    279  17.4     6.95   17.02
50%       0.2565   0     9.69   0.538  6.208  77.5   3.207  5    330  19.05    11.36  21.2
75%       3.677    12.5  18.1   0.624  6.623  94.07  5.188  24   666  20.2     16.96  25
100%      88.98    100   27.74  0.871  8.78   100    12.13  24   711  22       37.97  50

As seen above, two census tracts share the lowest median value of owner-occupied
homes (medv = 5).

If we check the quantiles:

In these tracts, the crime rate is higher than in 75% of all tracts.

These tracts are among the 25% of tracts closest to the employment centres.

The percentage of lower-status population in these tracts is higher than in 75% of
all tracts.

Neither of these tracts bounds the Charles River.

A direct percentile check is sketched below.
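
A quick sketch of that percentile check (numeric_cols and lowest are helper names introduced only for this illustration):

numeric_cols <- d2 %>% select(where(is.numeric))  # drop the factor chas
lowest <- numeric_cols %>% filter(d2$medv == min(d2$medv)) %>% slice(1)
# empirical percentile of each predictor value of this tract within the full data
sapply(names(numeric_cols), function(v) ecdf(numeric_cols[[v]])(lowest[[v]]))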

(h) In this data set, how many of the census tracts average more than seven rooms
per dwelling? More than eight rooms per dwelling? Comment on the census tracts
that average more than eight rooms per dwelling.

sum(d2$rm > 7) ## number of census tracts that average more than seven rooms per dwelling

[1] 64

sum(d2$rm > 8) ## number of census tracts that average more than eight rooms per dwelling


[1] 13

par(mfrow=c(2,3))
d2_upd <- d2 %>% filter(rm > 8)
boxplot(d2$crim,d2_upd$crim,xlab="crim", names=c("std","more_than_8"))
boxplot(d2$nox,d2_upd$nox,xlab="nox", names=c("std","more_than_8"))
boxplot(d2$dis,d2_upd$dis,xlab="dis", names=c("std","more_than_8"))
boxplot(d2$tax,d2_upd$tax,xlab="tax", names=c("std","more_than_8"))
boxplot(d2$lstat,d2_upd$lstat,xlab="lstat", names=c("std","more_than_8"))
boxplot(d2$medv,d2_upd$medv,xlab="medv", names=c("std","more_than_8"))

crim: As seen in the boxplot, crime rates in these census tracts are quite low
compared with the whole data set; no tract has a crime rate above 10.

nox: The median value of this variable is lower than in the whole data set.

dis: The median is similar to that of the whole data set; however, the largest dis
value is lower than in the original data.


tax: Tax is lower than in the original data, with the exception of two outliers;
the median values are similar.

lstat: The percentage of lower-status population is quite low in these census
tracts, with no extreme points, and the median value is lower than in the original
data.

medv: This variable is higher in this subset; the extreme values of the original
data set probably belong to these tracts.
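
A numerical counterpart to these boxplots; a quick sketch comparing the mean of each predictor in the rm > 8 tracts with the mean over all tracts:

bind_rows(
  d2 %>% summarise(across(where(is.numeric), mean)) %>% mutate(group = "all tracts"),
  d2 %>% filter(rm > 8) %>% summarise(across(where(is.numeric), mean)) %>% mutate(group = "rm > 8")
) %>%
  relocate(group) %>%  # put the group label in the first column
  pander()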

