20ITPW501 - Statistical Analysis using R Programming with Lab
Q No 1
The data for this example is the ages of male and female actors who won the Award for acting
for their work in a leading role. These award winners are from twelve consecutive years.(7)
Female 26,25, 33, 35, 35, 28, 30, 29, 61, 32, 33, 45
Male 46, 40, 36, 47, 29, 43, 37, 38, 45, 50, 48, 60
Write an R program for creating Parallel Box Plots for the above data. Draw the output
> female = c(26,25, 33, 35, 35, 28, 30, 29, 61, 32, 33, 45)
> male = c(46, 40, 36, 47, 29, 43, 37, 38, 45, 50, 48, 60)
> boxplot(male)
> boxplot(female)
> boxplot(male, female, names = c("Male", "Female"))
What is a Histogram? List the parameters used by the hist() function in R. Draw histogram for
the following data 19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39
> data = c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)
> hist(data)
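A histogram groups numeric values into adjacent intervals (bins) and draws one bar per bin whose height is the number of values in that bin. The sketch below lists the commonly used hist() arguments on the same data; the title, axis labels and colours are illustrative choices and are not part of the original answer.

```r
# Data from the question
data <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)

# Commonly used hist() arguments (labels and colours are illustrative)
h <- hist(data,
          breaks = 5,                    # suggested number of bins; R may adjust it
          freq   = TRUE,                 # TRUE plots counts, FALSE plots densities
          main   = "Histogram of data",  # plot title
          xlab   = "Value",              # x-axis label
          ylab   = "Frequency",          # y-axis label
          col    = "lightblue",          # fill colour of the bars
          border = "black")              # outline colour of the bars

# hist() invisibly returns the bin boundaries and the count in each bin
h$breaks
h$counts
```

The returned object is handy for "Draw the output" questions, because the bar heights can be read from h$counts instead of being estimated from the plot.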
Q No 2
The table below shows the hours of relief provided by two analgesic drugs in 12 patients
suffering from arthritis. Is there any evidence that one drug provides longer relief than the other?
Solution
Wilcoxon Signed Rank Test to be done
Sum of positive ranks (W+) = 71
Sum of negative ranks (W-) = 7
H0 : Null Hypothesis : Both drugs provide the same relief
H1 : Alternate Hypothesis : Drug B provides more relief than Drug A (hence the differences are
taken as Drug B - Drug A)
W(statistics) = smaller of W+ and W- = 7
W(critical) = from the Wilcoxon signed rank table for n = 12 pairs at alpha = 0.05: 17 (one-tailed), 13 (two-tailed). The test has no degrees of freedom.
W(statistics) <= W(critical) => H0 is rejected, H1 is accepted
R Code
> drugA = c(2,3.6,2.6,2.6,7.3,3.4,14.9,6.6,2.3,2.0,6.8,8.5)
> drugB = c(3.5,5.7,2.9,2.4,9.9,3.3,16.7,6.0,3.8,4.0,9.1,20.9)
> wilcox.test(drugA,drugB,paired=TRUE)
Wilcoxon signed rank test with continuity correction
data: drugA and drugB
V = 7, p-value = 0.01344
alternative hypothesis: true location shift is not equal to 0
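The rank sums quoted in the solution can be reproduced step by step. This is a minimal sketch that assumes the same two vectors; the names d, r, w_plus, w_minus and res are introduced here for illustration. The last call uses a one-sided alternative to match H1, which the original call does not do.

```r
drugA <- c(2, 3.6, 2.6, 2.6, 7.3, 3.4, 14.9, 6.6, 2.3, 2.0, 6.8, 8.5)
drugB <- c(3.5, 5.7, 2.9, 2.4, 9.9, 3.3, 16.7, 6.0, 3.8, 4.0, 9.1, 20.9)

# Differences Drug B - Drug A, rounded to guard against floating-point noise
d <- round(drugB - drugA, 1)
d <- d[d != 0]               # zero differences are dropped (there are none here)

r <- rank(abs(d))            # tied absolute differences share the average rank
w_plus  <- sum(r[d > 0])     # sum of ranks of the positive differences
w_minus <- sum(r[d < 0])     # sum of ranks of the negative differences
c(W_plus = w_plus, W_minus = w_minus)

# One-sided test matching H1: Drug B gives longer relief, so drugA - drugB shifts below 0.
# exact = FALSE requests the normal approximation, which R falls back to anyway with ties.
res <- wilcox.test(drugA, drugB, paired = TRUE, alternative = "less", exact = FALSE)
res$statistic                # V = sum of positive ranks of drugA - drugB
res$p.value
```

R reports V as the sum of positive ranks of the first argument minus the second, which is why the output shows V = 7 rather than 71.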
Q No 3
We have the data of potato yield from 12 different farms. We know that the standard potato yield
for the given variety is µ=20.
x = [21.5, 24.5, 18.5, 17.2, 14.5, 23.2, 22.1, 20.5, 19.4, 18.1, 24.1, 18.5]
Test if the potato yield from these farms is significantly better than the standard yield.
Solution
One sample T Test can be used
H0 : Null Hypothesis : Mu = 20
H1 : Alternate Hypothesis : Mu > 20
Summary of Given Data
Mean = (21.5+24.5+18.5+17.2+ 14.5+23.2+22.1+20.5+19.4+18.1+ 24.1+18.5) / 12 = 20.175
Standard Deviation = 3.021175
> yield = c(21.5, 24.5, 18.5, 17.2, 14.5, 23.2, 22.1, 20.5, 19.4, 18.1, 24.1, 18.5)
> mean(yield)
[1] 20.175
> sd(yield)
[1] 3.021175
Formula for ONE Sample T Test
t(statistics) = (sample mean - Mu) / (s / sqrt(n)) = (20.175 - 20) / (3.021175/sqrt(12)) = 0.2007
df = Degrees of Freedom = n-1 = 12-1 = 11
Confidence level = 0.95, alpha = 0.05 (one-tailed). For df = 11,
t(critical) = 1.796
t(statistics) <= t(critical) => Fail to reject H0 => No evidence that the mean yield is greater than 20
R Code
> t.test(yield,mu=20)
One Sample t-test
data: yield
t = 0.20066, df = 11, p-value = 0.8446
alternative hypothesis: true mean is not equal to 20
95 percent confidence interval:
18.25544 22.09456
sample estimates:
mean of x
20.175
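The call above runs the default two-sided test, while H1 is one-sided (Mu > 20). A minimal sketch of the one-sided version follows; the name res is introduced here for illustration. Because the t statistic is positive, the one-sided p-value is half of the two-sided 0.8446, so the conclusion does not change.

```r
yield <- c(21.5, 24.5, 18.5, 17.2, 14.5, 23.2, 22.1, 20.5, 19.4, 18.1, 24.1, 18.5)

# One-sided one-sample t-test: H0: Mu = 20 against H1: Mu > 20
res <- t.test(yield, mu = 20, alternative = "greater")

res$statistic   # same t value as the two-sided call
res$p.value     # compare with alpha = 0.05
```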
Q No 4
A professor wants to know if her introductory statistics class has a good grasp of basic math. Six
students are chosen at random from the class and given a math proficiency test. The professor
wants the class to be able to score above 70 on the test. The six students get scores of 62, 92, 75,
68, 83, and 95. Can the professor have 90 percent confidence that the mean score for the class on
the test would be above 70?
Solutions
One sample T Test can be used
H0 : Null Hypothesis : Mu = 70
H1 : Alternate Hypothesis : Mu > 70
Summary of Given Data
Mean = (62+92+75+68+83+95)/6 = 79.16667
Standard Deviation = S = 13.16688
> marks = c(62, 92, 75, 68, 83,95)
> mean(marks)
[1] 79.16667
> sd(marks)
[1] 13.16688
t(statistics) = (79.16667-70)/(13.16688/sqrt(6)) = 1.7053
df = Degrees of freedom = n-1 = 6-1 = 5
t(critical) at 90% confidence (alpha = 0.10, one-tailed) with df = 5 = 1.476
t(statistics) > t(critical) => Reject H0 => With 90 percent confidence, the class mean is above 70
R Code
> t.test(marks,mu=70)
One Sample t-test
data: marks
t = 1.7053, df = 5, p-value = 0.1489
alternative hypothesis: true mean is not equal to 70
95 percent confidence interval:
65.34888 92.98446
sample estimates:
mean of x
79.16667
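The printed p-value of 0.1489 is two-sided, whereas the question is one-sided at the 90 percent level. The sketch below recomputes the statistic, the table value and the one-sided p-value directly; n, t_stat, t_crit and p_one_sided are names introduced here for illustration.

```r
marks <- c(62, 92, 75, 68, 83, 95)
n <- length(marks)

# t statistic for H0: Mu = 70
t_stat <- (mean(marks) - 70) / (sd(marks) / sqrt(n))

# Upper 10% point of the t distribution with n - 1 degrees of freedom
t_crit <- qt(0.90, df = n - 1)

# One-sided p-value for H1: Mu > 70
p_one_sided <- pt(t_stat, df = n - 1, lower.tail = FALSE)

c(t_stat = t_stat, t_crit = t_crit, p_one_sided = p_one_sided)
```

H0 is rejected at the 90 percent level when t_stat exceeds t_crit, or equivalently when p_one_sided is below 0.10.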
Q No 5
One way to measure a person’s fitness is to measure their body fat percentage. Average body fat
percentages vary by age, but according to some guidelines, the normal range for men is 15-20%
body fat, and the normal range for women is 20-25% body fat. Our sample data is from a group
of men and women who did workouts at a gym three times a week for a year. Then, their trainer
measured the body fat. The table below shows Body fat percentage data grouped by gender
Test the significance of whether the body fat differs in the population between men and women
using two sample T test.
Solution
Two Sample T Test can be used
Null Hypothesis : H0 : Mu1 = Mu2 Both means are equal
Alternative Hypothesis : H1: Mu1 != Mu2 Both means are not equal
Summary of Given Data
> men = c(13.3, 6.0, 20.0, 8.0, 14.0, 19.0, 18.0, 25.0, 16.0, 24.0, 15.0, 1.0, 15.0)
> women = c(22.0, 16.0, 21.7, 21.0, 30.0,26.0, 12.0, 23.2, 28.0, 23.0)
> mean(men)
[1] 14.94615
> mean(women)
[1] 22.29
> sd(men)
[1] 6.842589
> sd(women)
[1] 5.31966
         Number of Samples   Mean       Standard Deviation
Men      13                  14.94615   6.842589
Women    10                  22.29      5.31966
Two Sample T Test Formula
t(statistics) = (22.29-14.94615) / sqrt((5.31966*5.31966)/10+(6.842589 * 6.842589)/13) = 2.8958
df = degrees of freedom = n1+n2-2 = 10+13-2 = 21
t(critical) value with α = 0.05 (two-tailed) and 21 degrees of freedom is = 2.080
t(statistics) > t(critical) => Reject H0 => The two means are statistically not equal
R Code
> t.test(women, men, alternative="two.sided")
Welch Two Sample t-test
data: women and men
t = 2.8958, df = 20.989, p-value = 0.00865
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
2.069692 12.618000
sample estimates:
mean of x mean of y
22.29000 14.94615
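R reports df = 20.989 rather than 21 because t.test() defaults to the Welch test, which does not pool the variances. The sketch below reproduces both numbers from the Welch-Satterthwaite formula; v_m, v_w, t_stat and df_welch are names introduced here for illustration.

```r
men   <- c(13.3, 6.0, 20.0, 8.0, 14.0, 19.0, 18.0, 25.0, 16.0, 24.0, 15.0, 1.0, 15.0)
women <- c(22.0, 16.0, 21.7, 21.0, 30.0, 26.0, 12.0, 23.2, 28.0, 23.0)

# Squared standard error contributed by each group: s^2 / n
v_m <- var(men) / length(men)
v_w <- var(women) / length(women)

# Welch t statistic (unpooled standard error)
t_stat <- (mean(women) - mean(men)) / sqrt(v_w + v_m)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_w + v_m)^2 /
  (v_w^2 / (length(women) - 1) + v_m^2 / (length(men) - 1))

c(t = t_stat, df = df_welch)
```

Here the Welch df is close to n1+n2-2, so the table value 2.080 used in the hand calculation is a reasonable approximation.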
Q No 6
Researchers want to examine the effect of perceived control on health complaints of geriatric
patients in a long-term care facility. Thirty patients are randomly selected to participate in the
study. Half are given a plant to care for and half are given a plant but the care is conducted by the
staff. Number of health complaints are recorded for each patient over the following seven days.
Compute the appropriate t-test for the data provided below.
Control over Plant No Control over Plant
23 35
12 21
6 26
15 24
18 17
5 23
21 37
18 22
34 16
10 38
23 23
14 41
19 27
23 24
8 32
Solution
Two Sample T Test can be used
Null Hypothesis : H0 : Mu1 = Mu2 Both means are equal
Alternative Hypothesis : H1: Mu1 != Mu2 Both means are not equal
Summary of Given Data
> control = c(23, 12, 6, 15, 18, 5, 21, 18, 34, 10, 23, 14, 19, 23, 8)
> no_control = c(35, 21, 26, 24, 17, 23, 37, 22, 16, 38, 23, 41, 27, 24, 32)
> length(control)
[1] 15
> length(no_control)
[1] 15
> mean(control)
[1] 16.6
> mean(no_control)
[1] 27.06667
> sd(control)
[1] 7.790104
> sd(no_control)
[1] 7.741047
             Number of Samples   Mean       Standard Deviation
Control      15                  16.6       7.790104
No Control   15                  27.06667   7.741047
Two Sample T Test Formula
t(statistics) = (27.06667 - 16.6) / sqrt((7.741047*7.741047)/15 + (7.790104*7.790104)/15)
t(statistics) = 3.69116
df = Degrees of freedom = n1 + n2 - 2 = 15+15-2 = 28
t(critical) at 28 degrees of freedom at alpha = 0.05 (two-tailed) = 2.048
t(statistics) > t(critical) => Reject H0 => The mean number of health complaints differs between the two groups
R Code
> t.test(no_control,control,alternative = "two.sided")
Welch Two Sample t-test
data: no_control and control
t = 3.6912, df = 27.999, p-value = 0.0009556
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
4.65819 16.27514
sample estimates:
mean of x mean of y
27.06667 16.60000
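The hand calculation uses df = n1+n2-2 = 28, which corresponds to the pooled-variance t-test rather than the Welch test that R runs by default. A minimal sketch, assuming equal variances is acceptable for these data (the two standard deviations are nearly identical); res and t_crit are names introduced here for illustration.

```r
control    <- c(23, 12, 6, 15, 18, 5, 21, 18, 34, 10, 23, 14, 19, 23, 8)
no_control <- c(35, 21, 26, 24, 17, 23, 37, 22, 16, 38, 23, 41, 27, 24, 32)

# Pooled-variance two-sample t-test: df is exactly n1 + n2 - 2
res <- t.test(no_control, control, var.equal = TRUE)
res$statistic
res$parameter

# Two-tailed critical value at alpha = 0.05 for 28 degrees of freedom
t_crit <- qt(0.975, df = 28)
t_crit
```

With equal group sizes the pooled and Welch statistics coincide, which is why the t value matches the output above.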
Q No 7
A study assessed the effectiveness of a new drug designed to reduce repetitive behaviors in
children affected with autism. A total of 8 children with autism enroll in the study and the
amount of time that each child is engaged in repetitive behavior during three hour observation
periods are measured both before treatment and then again after taking the new medication for a
period of 1 week. Conclude whether or not the treatment results in a statistically significant
improvement at α=0.05 using the Wilcoxon signed rank test. The data are shown below.
Solution
H0 : Null Hypothesis : No change in the median time before and after treatment
H1 : Alternate Hypothesis : The treatment reduces repetitive behavior (the time before treatment
is higher than the time after treatment)
Wilcoxon Signed Rank Test can be used
Child   Before Treatment   After 1 Week of Treatment   Difference (Before-After)
1       85                 75                          10
2       70                 50                          20
3       40                 50                          -10
4       65                 40                          25
5       80                 20                          60
6       75                 65                          10
7       55                 40                          15
8       20                 25                          -5

Observed Differences   Ordered Absolute Values of Difference Scores   Ranks   Signed Ranks
10                     -5                                             1       -1
20                     10                                             3       3
-10                    -10                                            3       -3
25                     10                                             3       3
60                     15                                             5       5
10                     20                                             6       6
15                     25                                             7       7
-5                     60                                             8       8
H0: The median difference is zero versus
H1: The median difference is positive, α=0.05
W+ = 32
W- = 4
W(statistics) = smaller of W+ and W- = 4
n = 8 pairs (the Wilcoxon signed rank test has no degrees of freedom)
W(critical) = 5 from the Wilcoxon signed rank table for n = 8 at alpha = 0.05 (one-tailed)
W(statistics) <= W(critical) => Reject H0
R Code
> before_treatment = c(85, 70, 40, 65, 80, 75, 55, 20)
> after_treatment = c(75, 50, 50, 40, 20, 65, 40, 25)
> wilcox.test(before_treatment, after_treatment, paired = TRUE, alternative = "two.sided")
Wilcoxon signed rank test with continuity correction
data: before_treatment and after_treatment
V = 32, p-value = 0.05747
alternative hypothesis: true location shift is not equal to 0
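The two-sided p-value of 0.05747 is just above 0.05, but the stated alternative is one-sided (the median difference is positive). The sketch below runs the one-sided version; the vectors are copied from the data table and res is a name introduced here for illustration.

```r
before_treatment <- c(85, 70, 40, 65, 80, 75, 55, 20)
after_treatment  <- c(75, 50, 50, 40, 20, 65, 40, 25)

# One-sided paired test: H1 says Before - After is shifted above 0.
# exact = FALSE requests the normal approximation, which R would fall back to
# anyway because the differences contain ties.
res <- wilcox.test(before_treatment, after_treatment,
                   paired = TRUE, alternative = "greater", exact = FALSE)

res$statistic   # V = sum of the positive signed ranks
res$p.value     # compare with alpha = 0.05
```

With the one-sided alternative the p-value is half of the two-sided one and falls below 0.05, which agrees with the table-based decision to reject H0.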
Q No 8
Check whether there is a difference between the median values for the following sets of
treatment data for the twelve groups using the Wilcoxon signed rank test.
Solution
H0 : Null Hypothesis : The median difference between the two treatments is zero
H1 : Alternate Hypothesis : The median difference between the two treatments is not zero
Wilcoxon signed rank test can be used
W- = 1 + 2 + 4 = 7
W+ = 3 + 5.5 + 5.5 + 7 + 8 + 9 + 10 + 11 + 12 = 71
W(statistics) = smaller of W+ and W- = 7
n = 12 pairs (the Wilcoxon signed rank test has no degrees of freedom)
W(critical) = 13 from the Wilcoxon signed rank table for n = 12 at alpha = 0.05 (two-tailed)
W(statistics) <= W(critical) => Reject H0 => The median values of the two treatments differ
R Code
> treatment1 = c(2.5,3.5,2.9,2.1,6.9,2.4,4.9,6.6,2.0,2.0,5.8,7.5)
> treatment2 = c(4.0,5.6,3.2,1.9,9.5,2.3,6.7,6.0,3.5,4.0,8.1,19.9)
> wilcox.test(treatment1,treatment2, paired = TRUE, alternative = "two.sided")
Wilcoxon signed rank test with continuity correction
data: treatment1 and treatment2
V = 7, p-value = 0.01344
alternative hypothesis: true location shift is not equal to 0
Q No 9
The percentage of juice lost after thawing for 19 different strawberry varieties appeared in the
article “Evaluation of Strawberry Cultivars with Different Degrees of Resistance to Red Scale”
(Fruit Varieties Journal [1991]: 12–17):
46 51 44 50 33 46 60 41 55 46 53 53 42 44 50 54 46 41 48
a. Are there any observations that are mild outliers or extreme outliers?
b. Construct a boxplot, and comment on the important features of the plot.
To analyze the data for mild and extreme outliers, we can use the interquartile range (IQR)
method and construct a boxplot. In R, we can achieve this with the following steps:
1. Input the data.
2. Calculate the five-number summary (minimum, first quartile, median, third quartile,
maximum).
3. Calculate the interquartile range (IQR). Mild outliers lie more than 1.5 * IQR below the first
quartile or above the third quartile; extreme outliers lie more than 3 * IQR beyond them.
4. Construct the boxplot.
5. Comment on the important features.
Here is the R code to achieve this:
# Data for percentage of juice lost
data <- c(46, 51, 44, 50, 33, 46, 60, 41, 55, 46, 53, 53, 42, 44, 50, 54, 46, 41, 48)
# Summary statistics
summary(data)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  33.00   44.00   46.00   47.53   52.00   60.00
# Calculate IQR
IQR_value <- IQR(data)
# Calculate mild and extreme outlier boundaries
lower_bound_mild <- quantile(data, 0.25) - 1.5 * IQR_value
upper_bound_mild <- quantile(data, 0.75) + 1.5 * IQR_value
lower_bound_extreme <- quantile(data, 0.25) - 3 * IQR_value
upper_bound_extreme <- quantile(data, 0.75) + 3 * IQR_value
# Identify mild outliers
mild_outliers <- data[data < lower_bound_mild | data > upper_bound_mild]
# Identify extreme outliers
extreme_outliers <- data[data < lower_bound_extreme | data > upper_bound_extreme]
# Construct boxplot
boxplot(data, main="Boxplot of Juice Loss After Thawing", ylab="Percentage", col="lightblue")
# Print mild and extreme outliers
list(mild_outliers = mild_outliers, extreme_outliers = extreme_outliers)
$mild_outliers
numeric(0)
$extreme_outliers
numeric(0)
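Both lists are empty because every observation lies inside the mild fences. The sketch below prints the fences so the comparison is explicit; q, iqr, mild_fences and extreme_fences are names introduced here for illustration.

```r
data <- c(46, 51, 44, 50, 33, 46, 60, 41, 55, 46, 53, 53, 42, 44, 50, 54, 46, 41, 48)

q   <- quantile(data, c(0.25, 0.75))   # first and third quartiles
iqr <- IQR(data)                       # interquartile range

# Fences: observations outside these limits count as mild or extreme outliers
mild_fences    <- c(lower = q[[1]] - 1.5 * iqr, upper = q[[2]] + 1.5 * iqr)
extreme_fences <- c(lower = q[[1]] - 3 * iqr,   upper = q[[2]] + 3 * iqr)

mild_fences
extreme_fences
range(data)    # smallest and largest observation, to compare with the fences
```

For the comment on the plot: the box runs from 44 to 52 with the median at 46, the whiskers reach the smallest (33) and largest (60) observations, and no point is drawn separately as an outlier.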
Q No 10
The discharge of industrial wastewater into rivers affects water quality. To assess the effect of a
particular power plant on water quality, 24 water specimens were taken 16 km upstream and 4
km downstream of the plant. Alkalinity (mg/L) was determined for each specimen, resulting in
the summary quantities in the accompanying table. Does the data suggest that the true mean
alkalinity is higher downstream than upstream by more than 50 mg/L? Use a .05 significance
level.
Solution
Null Hypothesis : H0 : Downstream Mean - Upstream Mean = 50
Alternate Hypothesis : H1 : Downstream Mean - Upstream Mean > 50
Since only the summary is given, the t test formula is
t(statistics) = (183.6 - 75.9 - 50) / sqrt((1.70*1.70)/24+(1.83*1.83)/24) = 113.17
df = Degrees of freedom = n1+n2-2 = 24+24-2 = 46
t(critical) at alpha = 0.05 (one-tailed) and df = 46 = 1.68 (approximately)
t(statistics) > t(critical) => Reject H0
Accept H1 => Mean alkalinity downstream is higher than upstream by more than 50 mg/L
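R Code
Only summary statistics are given, so the samples are simulated with rnorm() and the numbers
change from run to run. The call tests a zero difference with the default two-sided alternative.
> upstream = rnorm(24,75.9,1.83)
> downstream = rnorm(24,183.6,1.70)
> t.test(downstream,upstream)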
Welch Two Sample t-test
data: downstream and upstream
t = 207.91, df = 45.106, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
106.9680 109.0606
sample estimates:
mean of x mean of y
183.46077 75.44649
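The question asks whether downstream alkalinity exceeds upstream alkalinity by more than 50 mg/L, and this can be tested from the summary statistics alone without simulating data. A minimal sketch, assuming the means, standard deviations and sample sizes used in the solution; delta0, se, t_stat, t_crit and p_value are names introduced here for illustration.

```r
# Summary statistics used in the solution
n_up   <- 24; mean_up   <- 75.9;  sd_up   <- 1.83
n_down <- 24; mean_down <- 183.6; sd_down <- 1.70

delta0 <- 50   # hypothesised difference under H0 (mg/L)

# Standard error of the difference between the two sample means
se <- sqrt(sd_down^2 / n_down + sd_up^2 / n_up)

# t statistic for H0: difference = 50 against H1: difference > 50
t_stat <- (mean_down - mean_up - delta0) / se

df      <- n_down + n_up - 2
t_crit  <- qt(0.95, df)                        # one-tailed, alpha = 0.05
p_value <- pt(t_stat, df, lower.tail = FALSE)

c(t_stat = t_stat, t_crit = t_crit, p_value = p_value)
```

With raw data the equivalent call would be t.test(downstream, upstream, alternative = "greater", mu = 50).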
Q No 11
The Oregon Department of Health web site provides information on the cost-to-charge ratio (the
percentage of billed charges that are actual costs to the hospital). The cost-to-charge ratios for
both inpatient and outpatient care in 2002 for a sample of six hospitals in Oregon follow.
Is there evidence that the mean cost-to-charge ratio for Oregon hospitals is lower for outpatient
care than for inpatient care? Use a significance level of .05.
Solution
Null Hypothesis : H0 : There is no difference in mean between inpatient and outpatient
Alternative Hypothesis : H1 : Inpatient mean ratio is higher than the outpatient mean ratio
> inpatient = c(68,100,71,74,100,83)
> outpatient = c(54,75,53,56,74,71)
> mean(inpatient)
[1] 82.66667
> mean(outpatient)
[1] 63.83333
> sd(inpatient)
[1] 14.33411
> sd(outpatient)
[1] 10.53407
> t.test(inpatient,outpatient,alternative = "greater")
Welch Two Sample t-test
data: inpatient and outpatient
t = 2.5934, df = 9.1812, p-value = 0.0143
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
5.550828 Inf
sample estimates:
mean of x mean of y
82.66667 63.83333
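Since p-value = 0.0143 < 0.05, the Welch test rejects H0. Both ratios are measured on the same six hospitals, so a paired t-test may be the more appropriate analysis. The sketch below is an alternative to the solution's method, not a replacement stated in the source; it assumes the two vectors list the hospitals in the same order, and res is a name introduced here for illustration.

```r
inpatient  <- c(68, 100, 71, 74, 100, 83)
outpatient <- c(54, 75, 53, 56, 74, 71)

# Paired one-sided t-test on the per-hospital differences (inpatient - outpatient)
res <- t.test(inpatient, outpatient, paired = TRUE, alternative = "greater")

res$statistic   # t statistic based on the six differences
res$parameter   # degrees of freedom = number of pairs - 1
res$p.value     # compare with alpha = 0.05
```

Pairing removes the hospital-to-hospital variation, so the statistic is larger than the Welch value of 2.5934; the conclusion (reject H0) is the same.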
Q No 12
Implement a Linear Regression for GDP vs 4wheeler_passengar_vehicle_sale
Also predict the 4wheeler_passengar_vehicle_sale in the year 2023 if GDP is 7.5
year   GDP    4wheeler_passengar_vehicle_sale (in lakhs)
2017   6.2    26.3
2018   6.5    26.65
2019   5.48   25.03
2020   6.54   26.01
2021   7.18   27.9
2022   7.93   30.47
Solution
GDP is the Independent Variable = X
4 Wheeler sale is the Dependent Variable = Y
Y = mX + c
        GDP (X)   4wheeler_passengar_vehicle_sale (in lakhs) (Y)   X^2       Y^2        X*Y
        6.2       26.3                                             38.44     691.69     163.06
        6.5       26.65                                            42.25     710.2225   173.225
        5.48      25.03                                            30.0304   626.5009   137.1644
        6.54      26.01                                            42.7716   676.5201   170.1054
        7.18      27.9                                             51.5524   778.41     200.322
        7.93      30.47                                            62.8849   928.4209   241.6271
Total   39.83     162.36                                           267.9293  4411.764   1085.5039
m = (6*1085.5039 -39.83*162.36) / (6*267.9293 - 39.83*39.83) = 2.1858
c = (162.36 * 267.9293 - 39.83 * 1085.5039) / (6*267.9293 - 39.83*39.83) = 12.549
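The two formulas can be checked in R by building the column totals from the raw data. This is a sketch of the hand calculation only; sales, sx, sy, sxx, sxy, c0 and pred are names introduced here for illustration.

```r
gdp   <- c(6.2, 6.5, 5.48, 6.54, 7.18, 7.93)
sales <- c(26.3, 26.65, 25.03, 26.01, 27.9, 30.47)

n   <- length(gdp)
sx  <- sum(gdp)           # sum of X
sy  <- sum(sales)         # sum of Y
sxx <- sum(gdp^2)         # sum of X^2
sxy <- sum(gdp * sales)   # sum of X*Y

# Least-squares slope and intercept from the totals
m  <- (n * sxy - sx * sy) / (n * sxx - sx^2)
c0 <- (sy * sxx - sx * sxy) / (n * sxx - sx^2)

# Predicted sale (in lakhs) for GDP = 7.5
pred <- m * 7.5 + c0
c(slope = m, intercept = c0, prediction = pred)
```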
> gdp = c(6.2, 6.5, 5.48, 6.54, 7.18, 7.93)
> four_wheelersale = c(26.3, 26.65, 25.03, 26.01, 27.9, 30.47)
> relation=lm(four_wheelersale ~ gdp)
> print(relation)
Call:
lm(formula = four_wheelersale ~ gdp)
Coefficients:
(Intercept) gdp
12.549 2.186
> plot(relation)
> new_data = data.frame(gdp = c(7.5))
> predict(relation,new_data)
1
28.9435
Q No 13
Study the relationship between monthly e-commerce sales and online advertising
costs. You have the survey results for 7 online stores for the last year. Your task is to find
the equation of the straight line that best fits the data. The following table represents the
survey results from the 7 online stores. Write an R program for the analysis.
Solution
E-commerce sales is the Dependent Variable = Y
Ad cost is the Independent Variable = X
Y = mX + c
        X      Y      X^2       Y^2     XY
        368    1.7    135424    2.89    625.6
        340    1.5    115600    2.25    510
        665    2.8    442225    7.84    1862
        954    5      910116    25      4770
        331    1.3    109561    1.69    430.3
        556    2.2    309136    4.84    1223.2
        376    1.3    141376    1.69    488.8
Total   3590   15.8   2163438   46.2    9909.9
m = ((7* 9909.9) - (3590 * 15.8)) /((7*2163438)-(3590*3590)) = 0.00560615718
c = ((15.8*2163438) - (3590*9909.9)) / ((7*2163438)-(3590*3590)) = -0.61801489916
> ad_cost = c(368, 340, 665, 954, 331, 556, 376)
> ecom_sale = c(1.7, 1.5, 2.8, 5, 1.3, 2.2, 1.3)
> relation = lm(ecom_sale~ad_cost)
> print(relation)
Call:
lm(formula = ecom_sale ~ ad_cost)
Coefficients:
(Intercept) ad_cost
-0.618015 0.005606
> plot(relation)
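plot(relation) draws the regression diagnostic plots rather than the data. To see the fitted straight line over the observations, a scatter plot with abline() can be used. A minimal sketch; the axis labels are illustrative because the units are not stated in the solution.

```r
ad_cost   <- c(368, 340, 665, 954, 331, 556, 376)
ecom_sale <- c(1.7, 1.5, 2.8, 5, 1.3, 2.2, 1.3)

relation <- lm(ecom_sale ~ ad_cost)

# Scatter plot of the raw data with the fitted least-squares line on top
plot(ad_cost, ecom_sale,
     xlab = "Advertising cost", ylab = "E-commerce sales",
     main = "E-commerce sales vs advertising cost")
abline(relation, col = "blue")

coef(relation)   # intercept c and slope m of Y = mX + c
```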
Q No 14
A beverage company claims its soda cans contain 12 ounces. A researcher randomly samples
their cans and measures the amount of fluid in each one. Conduct a one-sample t-test using the
sample data to determine whether the entire population of soda cans differs from the
hypothesized value of 12 ounces.
soda =
c(11.78284,11.87099,11.97785,11.78375,11.60367,12.16584,11.75969,11.66779,11.77516)
Solution
Formula for ONE Sample T Test
t(statistics) = (sample mean - Mu) / (s / sqrt(n))
Null Hypothesis : H0 : Mu = 12
Alternate Hypothesis : H1 : Mu != 12
Summary of Data
> soda =
c(11.78284,11.87099,11.97785,11.78375,11.60367,12.16584,11.75969,11.66779,11.77516)
> mean(soda)
[1] 11.82084
> sd(soda)
[1] 0.1678634
t(statistics) = (11.82084 - 12) / (0.1678634/sqrt(9)) = -3.201889
df = Degrees of freedom = n-1 = 9-1 = 8
t(critical) at alpha = 0.05 (two-tailed) and df = 8 = 2.306
|t(statistics)| > t(critical), so t(statistics) is outside the acceptance range (-2.306, 2.306). Reject H0.
> t.test(soda,mu=12)
One Sample t-test
data: soda
t = -3.2018, df = 8, p-value = 0.01258
alternative hypothesis: true mean is not equal to 12
95 percent confidence interval:
11.69181 11.94987
sample estimates:
mean of x
11.82084
Q No 15
Imagine we have collected a random sample of 31 energy bars from a number of different stores
to represent the population of energy bars available to the general consumer. The labels on the
bars claim that each bar contains 20 grams of protein.
Energy Bar - Grams of Protein
20.70 27.46 22.15 19.85 21.29 24.75
20.75 22.91 25.34 20.33 21.54 21.08
22.14 19.56 21.10 18.04 24.12 19.95
19.72 18.28 16.26 17.46 20.53 22.12
25.06 22.44 19.08 19.88 21.39 22.33 25.79
Solution
Formula for ONE Sample T Test
t(statistics) = (sample mean - Mu) / (s / sqrt(n))
Null Hypothesis : H0 : Mu = 20
Alternate Hypothesis : H1 : Mu != 20
Summary of Data
> protein = c(20.70, 27.46, 22.15, 19.85, 21.29, 24.75, 20.75, 22.91, 25.34, 20.33, 21.54, 21.08,
+ 22.14, 19.56, 21.10, 18.04, 24.12, 19.95, 19.72, 18.28, 16.26, 17.46, 20.53, 22.12, 25.06, 22.44,
+ 19.08, 19.88, 21.39, 22.33, 25.79)
> mean(protein)
[1] 21.4
> sd(protein)
[1] 2.541669
t(statistics) = (21.4 - 20) / (2.541669/sqrt(31)) = 3.06683
df = degrees of freedom = n-1 = 31 - 1 = 30
t(critical) at 30 degrees of freedom at alpha = 0.05 (two-tailed) = 2.042
t(statistics) > t(critical) => H0 is rejected => The mean protein content differs from 20 grams
R Code
> t.test(protein,mu=20)
One Sample t-test
data: protein
t = 3.0668, df = 30, p-value = 0.004553
alternative hypothesis: true mean is not equal to 20
95 percent confidence interval:
20.46771 22.33229
sample estimates:
mean of x
21.4
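The 95 percent confidence interval in the output can be reproduced from the same summary statistics, which shows how it relates to the test: 20 lies outside the interval, so H0 is rejected. A minimal sketch; n, se, t_crit and ci are names introduced here for illustration.

```r
protein <- c(20.70, 27.46, 22.15, 19.85, 21.29, 24.75, 20.75, 22.91, 25.34, 20.33,
             21.54, 21.08, 22.14, 19.56, 21.10, 18.04, 24.12, 19.95, 19.72, 18.28,
             16.26, 17.46, 20.53, 22.12, 25.06, 22.44, 19.08, 19.88, 21.39, 22.33,
             25.79)

n  <- length(protein)
se <- sd(protein) / sqrt(n)          # standard error of the mean

# Two-tailed critical value for a 95% interval with n - 1 degrees of freedom
t_crit <- qt(0.975, df = n - 1)

# Confidence interval: mean plus or minus t_crit * standard error
ci <- mean(protein) + c(-1, 1) * t_crit * se
ci
```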
Q No 16
The following data represent the number of employees at various restaurants in New York
City. Using this data, create a histogram in R
22, 35, 15, 26, 40, 28, 18, 20, 25, 34, 39, 42, 24, 22, 19, 27, 22, 34, 40, 20, 38, and 28
Solution
# Input the data
employees <- c(22, 35, 15, 26, 40, 28, 18, 20, 25, 34, 39, 42, 24, 22, 19, 27, 22, 34, 40, 20, 38,
28)
# Create the histogram
hist(employees,
main="Histogram of Employees at Restaurants in NYC",
xlab="Number of Employees",
ylab="Frequency",
col="lightblue",
border="black",
breaks=10) # Adjust the number of breaks as needed