Unit 3 - Unit 4 Problems and Solutions

20ITPW501 - Statistical Analysis using R Programming with Lab

Q No 1
The data for this example are the ages of the male and female actors who won the award for acting
in a leading role in twelve consecutive years. (7)
Female 26, 25, 33, 35, 35, 28, 30, 29, 61, 32, 33, 45
Male 46, 40, 36, 47, 29, 43, 37, 38, 45, 50, 48, 60
Write an R program to create parallel box plots for the above data and draw the output.
> female = c(26,25, 33, 35, 35, 28, 30, 29, 61, 32, 33, 45)
> male = c(46, 40, 36, 47, 29, 43, 37, 38, 45, 50, 48, 60)
> boxplot(male)
> boxplot(female)
> boxplot(male,female,names=c("Male","Female"))
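The separate calls above can be combined into one labelled plot. The sketch below is one way to do it, assuming the default 1.5 * IQR whisker rule of boxplot(); the list names, title and axis label are illustrative choices and are not part of the original question.

```r
# Ages of award winners over twelve consecutive years (from the question)
female <- c(26, 25, 33, 35, 35, 28, 30, 29, 61, 32, 33, 45)
male   <- c(46, 40, 36, 47, 29, 43, 37, 38, 45, 50, 48, 60)

# A named list gives each box its own label on the x-axis
ages <- list(Female = female, Male = male)
boxplot(ages, ylab = "Age (years)", main = "Ages of award winners")

# Five-number summaries (min, lower hinge, median, upper hinge, max)
five_num <- sapply(ages, fivenum)
print(five_num)

# Points that boxplot() draws individually as outliers
outliers <- lapply(ages, function(x) boxplot.stats(x)$out)
print(outliers)
```

With these data the female group has two points beyond the upper whisker (45 and 61), while the male group has none.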

What is a histogram? List the parameters used by the hist() function in R. Draw a histogram for
the following data: 19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39
> data = c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)
> hist(data)
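hist() takes the data vector plus optional arguments; commonly used ones are breaks, freq, main, xlab, ylab, col, border, xlim and ylim. The sketch below illustrates a few of them. The class width of 10 and the labels are assumptions chosen for illustration only.

```r
data <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)

# breaks sets the class boundaries; by default each class is (lower, upper]
h <- hist(data,
          breaks = seq(0, 40, by = 10),
          main   = "Histogram of data",
          xlab   = "Value",
          ylab   = "Frequency",
          col    = "lightblue",
          border = "black")

# hist() invisibly returns the class boundaries and the counts it plotted
print(h$breaks)
print(h$counts)
```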

Q No 2
The table below shows the hours of relief provided by two analgesic drugs in 12 patients
suffering from arthritis. Is there any evidence that one drug provides longer relief than the other?

Solution
The Wilcoxon signed rank test is used because the same 12 patients received both drugs.
H0 : Null Hypothesis : Both drugs provide the same relief
H1 : Alternate Hypothesis : Drug B provides more relief than Drug A (differences taken as Drug B
- Drug A)

Sum of positive ranks = 71

Sum of negative ranks = 7

W(Statistics) = smaller of the two rank sums = 7
W(Critical) = from the Wilcoxon signed rank table at alpha = 0.05 (one-tailed) with n = 12 non-zero differences = 17
W(Statistics) <= W(Critical) => H0 is rejected, H1 is accepted
R Code
> drugA = c(2,3.6,2.6,2.6,7.3,3.4,14.9,6.6,2.3,2.0,6.8,8.5)
> drugB = c(3.5,5.7,2.9,2.4,9.9,3.3,16.7,6.0,3.8,4.0,9.1,20.9)
> wilcox.test(drugA,drugB,paired=TRUE)

Wilcoxon signed rank test with continuity correction

data: drugA and drugB

V = 7, p-value = 0.01344
alternative hypothesis: true location shift is not equal to 0
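The rank sums used in the hand calculation can be reproduced in R. This is a minimal sketch, assuming differences are taken as Drug B - Drug A; the names d, r, w_plus and w_minus are illustrative. It also runs the one-sided version of the test that matches H1.

```r
drugA <- c(2, 3.6, 2.6, 2.6, 7.3, 3.4, 14.9, 6.6, 2.3, 2.0, 6.8, 8.5)
drugB <- c(3.5, 5.7, 2.9, 2.4, 9.9, 3.3, 16.7, 6.0, 3.8, 4.0, 9.1, 20.9)

# Differences taken as Drug B - Drug A, as in the hand calculation
d <- drugB - drugA
d <- d[d != 0]                 # zero differences are dropped (none here)

# Rank the absolute differences; tied values share the average rank.
# Rounding guards against floating-point noise breaking exact ties.
r <- rank(round(abs(d), 10))

w_plus  <- sum(r[d > 0])       # sum of positive ranks
w_minus <- sum(r[d < 0])       # sum of negative ranks
print(c(W_plus = w_plus, W_minus = w_minus))

# One-sided test of H1: Drug B gives longer relief (drugA - drugB < 0).
# exact = FALSE requests the normal approximation, which R uses anyway
# when the data contain ties.
res <- wilcox.test(drugA, drugB, paired = TRUE,
                   alternative = "less", exact = FALSE)
print(res$p.value)
```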

Q No 3
We have the data of potato yield from 12 different farms. We know that the standard potato yield
for the given variety is µ=20.
x = [21.5, 24.5, 18.5, 17.2, 14.5, 23.2, 22.1, 20.5, 19.4, 18.1, 24.1, 18.5]
Test if the potato yield from these farms is significantly better than the standard yield.

Solution
A one sample t test can be used
H0 : Null Hypothesis : Mu = 20
H1 : Alternate Hypothesis : Mu > 20
Summary of Given Data
Mean = (21.5+24.5+18.5+17.2+ 14.5+23.2+22.1+20.5+19.4+18.1+ 24.1+18.5) / 12 = 20.175
Standard Deviation = 3.021175
> yield = c(21.5, 24.5, 18.5, 17.2, 14.5, 23.2, 22.1, 20.5, 19.4, 18.1, 24.1, 18.5)
> mean(yield)
[1] 20.175
> sd(yield)
[1] 3.021175

Formula for One Sample T Test

t(statistics) = (sample mean - Mu) / (s / sqrt(n))

t(statistics) = (20.175 - 20) / (3.021175/sqrt(12)) = 0.2007

df = Degrees of Freedom = n-1 = 12-1 = 11
Confidence level = 0.95, alpha = 0.05 (one-tailed, because H1 is Mu > 20). For df = 11,
t(critical) = 1.796.
t(statistics) <= t(critical) => Fail to reject H0 => No evidence that the mean yield is better than 20
R Code
> t.test(yield,mu=20)

One Sample t-test

data: yield
t = 0.20066, df = 11, p-value = 0.8446
alternative hypothesis: true mean is not equal to 20
95 percent confidence interval:
18.25544 22.09456
sample estimates:
mean of x
20.175
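The call above runs a two-sided test, while H1 here is one-sided (Mu > 20). A minimal sketch of the upper-tailed version follows; the variable names res, t_stat and t_crit are illustrative.

```r
yield <- c(21.5, 24.5, 18.5, 17.2, 14.5, 23.2, 22.1, 20.5, 19.4, 18.1, 24.1, 18.5)

# H1: Mu > 20, so request the upper-tailed test explicitly
res <- t.test(yield, mu = 20, alternative = "greater")

t_stat <- unname(res$statistic)               # same t as the two-sided test
t_crit <- qt(0.95, df = length(yield) - 1)    # one-tailed critical value, alpha = 0.05
print(c(t = t_stat, t_critical = t_crit, p_value = res$p.value))
```

The decision is the same either way: t is far below the critical value, so H0 is not rejected.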

Q No 4
A professor wants to know if her introductory statistics class has a good grasp of basic math. Six
students are chosen at random from the class and given a math proficiency test. The professor
wants the class to be able to score above 70 on the test. The six students get scores of 62, 92, 75,
68, 83, and 95. Can the professor have 90 percent confidence that the mean score for the class on
the test would be above 70
Solution
A one sample t test can be used
H0 : Null Hypothesis : Mu = 70
H1 : Alternate Hypothesis : Mu > 70
Summary of Given Data
Mean = (62+92+75+68+83+95)/6 = 79.16667
Standard Deviation = S = 13.16688
> marks = c(62, 92, 75, 68, 83,95)
> mean(marks)
[1] 79.16667
> sd(marks)
[1] 13.16688

t(statistics) = (79.16667-70)/(13.16688/sqrt(6)) = 1.7053

df = Degrees of freedom = n-1= 6-1 = 5
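t(critical) at 90% confidence (alpha = 0.10, one-tailed) with df = 5 = 1.476
t(statistics) > t(critical) => Reject H0 => The mean score is above 70 at the 90% confidence level
The R test below is two-sided (p-value = 0.1489); the one-sided p-value is half of this, 0.0745, which is below alpha = 0.10.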
R Code
> t.test(marks,mu=70)

One Sample t-test

data: marks
t = 1.7053, df = 5, p-value = 0.1489
alternative hypothesis: true mean is not equal to 70
95 percent confidence interval:
65.34888 92.98446
sample estimates:
mean of x
79.16667
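The question asks for 90 percent confidence that the mean is above 70, which corresponds to an upper-tailed test with conf.level = 0.90. The following is a sketch of that call; res and t_crit are illustrative names.

```r
marks <- c(62, 92, 75, 68, 83, 95)

# Upper-tailed test at the 90% confidence level (alpha = 0.10)
res <- t.test(marks, mu = 70, alternative = "greater", conf.level = 0.90)

t_crit <- qt(0.90, df = length(marks) - 1)    # one-tailed critical value
print(c(t = unname(res$statistic), t_critical = t_crit, p_value = res$p.value))

# Lower limit of the one-sided 90% confidence interval for the mean
print(res$conf.int[1])
```

If the printed lower limit is above 70, the professor can be 90 percent confident that the class mean exceeds 70.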

Q No 5
One way to measure a person’s fitness is to measure their body fat percentage. Average body fat
percentages vary by age, but according to some guidelines, the normal range for men is 15-20%
body fat, and the normal range for women is 20-25% body fat. Our sample data is from a group
of men and women who worked out at a gym three times a week for a year. Then, their trainer
measured their body fat. The table below shows the body fat percentage data grouped by gender.

Test the significance of whether the body fat differs in the population between men and women
using two sample T test.
Solution
Two Sample T Test can be used
Null Hypothesis : H0 : Mu1 = Mu2 Both means are equal
Alternative Hypothesis : H1 : Mu1 != Mu2 Both means are not equal

Summary of Given Data

> men = c(13.3, 6.0, 20.0, 8.0, 14.0, 19.0, 18.0, 25.0, 16.0, 24.0, 15.0, 1.0, 15.0)
> women = c(22.0, 16.0, 21.7, 21.0, 30.0, 26.0, 12.0, 23.2, 28.0, 23.0)
> mean(men)
[1] 14.94615
> mean(women)
[1] 22.29
> sd(men)
[1] 6.842589
> sd(women)
[1] 5.31966

         Number of Samples   Mean       Standard Deviation
Men      13                  14.94615   6.842589
Women    10                  22.29      5.31966

Two Sample T Test Formula

t(statistics) = (mean1 - mean2) / sqrt(s1*s1/n1 + s2*s2/n2)

t(statistics) = (22.29-14.94615) / sqrt((5.31966*5.31966)/10+(6.842589*6.842589)/13) = 2.8958

df = degrees of freedom = n1+n2-2 = 10+13-2 = 21
t(critical) value with α = 0.05 (two-tailed) and 21 degrees of freedom is = 2.080
t(statistics) > t(critical) => Reject the null hypothesis => The two means are statistically different
R Code
> t.test(women, men, alternative="two.sided")

Welch Two Sample t-test

data: women and men

t = 2.8958, df = 20.989, p-value = 0.00865
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
2.069692 12.618000
sample estimates:
mean of x mean of y
22.29000 14.94615
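The hand calculation uses df = n1+n2-2, which belongs to the pooled-variance (Student) test, whereas t.test() defaults to the Welch test with a fractional df. The sketch below runs both so the two can be compared; welch and pooled are illustrative names.

```r
men   <- c(13.3, 6.0, 20.0, 8.0, 14.0, 19.0, 18.0, 25.0, 16.0, 24.0, 15.0, 1.0, 15.0)
women <- c(22.0, 16.0, 21.7, 21.0, 30.0, 26.0, 12.0, 23.2, 28.0, 23.0)

# Welch test (R's default): separate variances, fractional df
welch <- t.test(women, men)

# Student's test: pooled variance, df = n1 + n2 - 2
pooled <- t.test(women, men, var.equal = TRUE)

print(c(welch_t = unname(welch$statistic), welch_df = unname(welch$parameter)))
print(c(pooled_t = unname(pooled$statistic), pooled_df = unname(pooled$parameter)))
```

Both versions reject H0 at alpha = 0.05 for these data, so the conclusion does not depend on the choice.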

Q No 6
Researchers want to examine the effect of perceived control on health complaints of geriatric
patients in a long-term care facility. Thirty patients are randomly selected to participate in the
study. Half are given a plant to care for and half are given a plant but the care is conducted by the
staff. Number of health complaints are recorded for each patient over the following seven days.
Compute the appropriate t-test for the data provided below.
Control over Plant No Control over Plant
23 35
12 21
6 26
15 24
18 17
5 23
21 37
18 22
34 16
10 38
23 23
14 41
19 27
23 24
8 32

Solution
Two Sample T Test can be used
Null Hypothesis : H0 : Mu1 = Mu2 Both means are equal
Alternative Hypothesis : H1 : Mu1 != Mu2 Both means are not equal
Summary of Given Data
> control = c(23, 12, 6, 15, 18, 5, 21, 18, 34, 10, 23, 14, 19, 23, 8)
> no_control = c(35, 21, 26, 24, 17, 23, 37, 22, 16, 38, 23, 41, 27, 24, 32)
> length(control)
[1] 15
> length(no_control)
[1] 15
> mean(control)
[1] 16.6
> mean(no_control)
[1] 27.06667
> sd(control)
[1] 7.790104
> sd(no_control)
[1] 7.741047

             Number of Samples   Mean       Standard Deviation
Control      15                  16.6       7.790104
No Control   15                  27.06667   7.741047

Two Sample T Test Formula

t(statistics) = (mean1 - mean2) / sqrt(s1*s1/n1 + s2*s2/n2)

t(statistics) = (27.06667 - 16.6) / sqrt((7.741047*7.741047)/15 + (7.790104*7.790104)/15)

t(statistics) = 3.69116
df = Degrees of freedom = n1 + n2 - 2 = 15+15-2 = 28
t(critical) at 28 degrees of freedom at alpha = 0.05 (two-tailed) = 2.048
t(statistics) > t(critical) => Reject H0 => The mean number of health complaints differs between the two groups
R Code
> t.test(no_control,control,alternative = "two.sided")

Welch Two Sample t-test

data: no_control and control

t = 3.6912, df = 27.999, p-value = 0.0009556

alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
4.65819 16.27514
sample estimates:
mean of x mean of y
27.06667 16.60000
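The t statistic and the critical value can also be computed step by step instead of being read from a table. This is a minimal sketch using the separate-variance standard error from the hand calculation; se, t_stat, df and t_crit are illustrative names.

```r
control    <- c(23, 12, 6, 15, 18, 5, 21, 18, 34, 10, 23, 14, 19, 23, 8)
no_control <- c(35, 21, 26, 24, 17, 23, 37, 22, 16, 38, 23, 41, 27, 24, 32)

# Standard error of the difference in means (separate variances)
se <- sqrt(var(no_control) / length(no_control) + var(control) / length(control))
t_stat <- (mean(no_control) - mean(control)) / se

# Two-tailed critical value at alpha = 0.05 with n1 + n2 - 2 degrees of freedom
df <- length(control) + length(no_control) - 2
t_crit <- qt(1 - 0.05 / 2, df)

print(c(t = t_stat, df = df, t_critical = t_crit))
```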

Q No 7
A study assessed the effectiveness of a new drug designed to reduce repetitive behaviors in
children affected with autism. A total of 8 children with autism enroll in the study, and the
amount of time that each child is engaged in repetitive behavior during three-hour observation
periods is measured both before treatment and again after taking the new medication for a
period of 1 week. Using the Wilcoxon signed rank test at α=0.05, conclude whether the treatment
results in a statistically significant improvement. The data are shown below.

Solution
H0 : Null Hypothesis : No change in repetitive behavior before and after treatment
H1 : Alternate Hypothesis : The treatment reduces repetitive behavior (Before - After is positive)

Wilcoxon Signed Rank Test can be used

Child   Before Treatment   After 1 Week of Treatment   Difference (Before-After)
1       85                 75                          10
2       70                 50                          20
3       40                 50                          -10
4       65                 40                          25
5       80                 20                          60
6       75                 65                          10
7       55                 40                          15
8       20                 25                          -5

Observed Differences   Differences Ordered by Absolute Value   Ranks   Signed Ranks
10                     -5                                      1       -1
20                     10                                      3       3
-10                    -10                                     3       -3
25                     10                                      3       3
60                     15                                      5       5
10                     20                                      6       6
15                     25                                      7       7
-5                     60                                      8       8

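H0: The median difference is zero versus

H1: The median difference is positive, α=0.05

W+ = 3 + 3 + 5 + 6 + 7 + 8 = 32

W- = 1 + 3 = 4

w(statistics) = smaller of W+ and W- = 4

n = number of non-zero differences = 8

w(critical) = from the Wilcoxon signed rank table at α=0.05 (one-tailed) with n = 8 = 5

w(statistics) <= w(critical) => Reject Null Hypothesis => The treatment results in a statistically significant improvement

The R test below is two-sided (p-value = 0.05747); the one-sided p-value is half of this, about 0.029, which is below 0.05.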

R Code

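> before_treatment = c(85, 70, 40, 65, 80, 75, 55, 20)
> after_treatement = c(75, 50, 50, 40, 20, 65, 40, 25)
> wilcox.test(before_treatment,after_treatement,paired = TRUE, alternative = "two.sided")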

Wilcoxon signed rank test with continuity correction

data: before_treatment and after_treatement

V = 32, p-value = 0.05747

alternative hypothesis: true location shift is not equal to 0
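The ranking table can be rebuilt in R, which makes it easy to check the signed ranks. The sketch below assumes the Before and After values from the data table; the data frame tab and its column names are illustrative, and after_treatment is spelled differently from the variable in the transcript above.

```r
before_treatment <- c(85, 70, 40, 65, 80, 75, 55, 20)
after_treatment  <- c(75, 50, 50, 40, 20, 65, 40, 25)

d <- before_treatment - after_treatment

# Rank the absolute differences (ties share the average rank), then attach the sign
tab <- data.frame(child = seq_along(d), difference = d, rank = rank(abs(d)))
tab$signed_rank <- sign(tab$difference) * tab$rank
print(tab[order(abs(tab$difference)), ], row.names = FALSE)

w_plus  <- sum(tab$signed_rank[tab$signed_rank > 0])
w_minus <- -sum(tab$signed_rank[tab$signed_rank < 0])
print(c(W_plus = w_plus, W_minus = w_minus))

# One-sided test: H1 says the median of (Before - After) is positive
res <- wilcox.test(before_treatment, after_treatment, paired = TRUE,
                   alternative = "greater", exact = FALSE)
print(res$p.value)
```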

Q No 8
Check whether there is a difference between the median values for the following sets of
treatment data for the twelve groups using the Wilcoxon signed rank test.

Solution
H0 : Null Hypothesis : The median difference between the two treatments is zero
H1 : Alternate Hypothesis : The median difference between the two treatments is not zero

Wilcoxon signed rank test can be used

W– = 1 + 2 + 4 = 7
W+ = 3 + 5.5 + 5.5 + 7 + 8 + 9 + 10 + 11 + 12 = 71
W(statistics) = smaller of W+ and W– = 7
n = number of non-zero differences = 12
w(critical) = from the Wilcoxon signed rank table at α=0.05 (two-tailed) with n = 12 = 13
W(statistics) <= W(critical) => Reject Null Hypothesis => The median values differ
R Code
> treatment1 = c(2.5,3.5,2.9,2.1,6.9,2.4,4.9,6.6,2.0,2.0,5.8,7.5)
> treatment2 = c(4.0,5.6,3.2,1.9,9.5,2.3,6.7,6.0,3.5,4.0,8.1,19.9)
> wilcox.test(treatment1,treatment2, paired = TRUE, alternative = "two.sided")
Wilcoxon signed rank test with continuity correction
data: treatment1 and treatment2
V = 7, p-value = 0.01344
alternative hypothesis: true location shift is not equal to 0
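Critical values for the signed rank statistic can be derived from R's exact null distribution, psignrank(), rather than looked up in a printed table. The helper below is a hypothetical sketch (signrank_critical is not a base R function), and the exact distribution it relies on assumes there are no ties.

```r
# Largest w such that P(W <= w) <= tail_prob under H0, for n non-zero differences.
# signrank_critical is a hypothetical helper name, not part of base R.
signrank_critical <- function(n, tail_prob) {
  w <- 0:(n * (n + 1) / 2)
  ok <- w[psignrank(w, n) <= tail_prob]
  if (length(ok) == 0) NA else max(ok)
}

crit_two_sided_12 <- signrank_critical(12, 0.05 / 2)  # two-tailed, alpha = 0.05
crit_one_sided_12 <- signrank_critical(12, 0.05)      # one-tailed, alpha = 0.05
print(c(two_sided = crit_two_sided_12, one_sided = crit_one_sided_12))
```

H0 is rejected when the smaller rank sum is less than or equal to the critical value returned for the chosen tail probability.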

Q No 9
The percentage of juice lost after thawing for 19 different strawberry varieties appeared in the
article “Evaluation of Strawberry Cultivars with Different Degrees of Resistance to Red Scale”
(Fruit Varieties Journal [1991]: 12–17):
46 51 44 50 33 46 60 41 55 46 53 53 42 44 50 54 46 41 48
a. Are there any observations that are mild outliers or extreme outliers?
b. Construct a boxplot, and comment on the important features of the plot.

To analyze the data for mild and extreme outliers, we can use the interquartile range (IQR)
method and construct a boxplot. In R, we can achieve this with the following steps:

1. Input the data.

2. Calculate the five-number summary (minimum, first quartile, median, third quartile,
maximum).
3. Calculate the interquartile range (IQR), then flag mild outliers (more than 1.5 * IQR but at most
3 * IQR beyond the nearest quartile) and extreme outliers (more than 3 * IQR beyond it).
4. Construct the boxplot.
5. Comment on the important features.

Here is the R code to achieve this:

# Data for percentage of juice lost
data <- c(46, 51, 44, 50, 33, 46, 60, 41, 55, 46, 53, 53, 42, 44, 50, 54, 46, 41, 48)
# Summary statistics
summary(data)
#  Min. 1st Qu. Median  Mean 3rd Qu.  Max.
# 33.00   44.00  46.00 47.53   52.00 60.00
# Calculate IQR
IQR_value <- IQR(data)
# Calculate mild and extreme outlier boundaries
lower_bound_mild <- quantile(data, 0.25) - 1.5 * IQR_value
upper_bound_mild <- quantile(data, 0.75) + 1.5 * IQR_value
lower_bound_extreme <- quantile(data, 0.25) - 3 * IQR_value
upper_bound_extreme <- quantile(data, 0.75) + 3 * IQR_value
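# Identify mild outliers (beyond the mild bounds but not beyond the extreme bounds)
mild_outliers <- data[(data < lower_bound_mild & data >= lower_bound_extreme) |
                      (data > upper_bound_mild & data <= upper_bound_extreme)]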
# Identify extreme outliers
extreme_outliers <- data[data < lower_bound_extreme | data > upper_bound_extreme]
# Construct boxplot
boxplot(data, main="Boxplot of Juice Loss After Thawing", ylab="Percentage", col="lightblue")
# Print mild and extreme outliers
list(mild_outliers = mild_outliers, extreme_outliers = extreme_outliers)
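$mild_outliers
numeric(0)

$extreme_outliers
numeric(0)

Comment: The quartiles are 44 and 52, so IQR = 8. The mild outlier limits are 44 - 12 = 32 and 52 + 12 = 64, and the extreme outlier limits are 20 and 76. All observations lie between 33 and 60, so there are no mild or extreme outliers. In the boxplot the median (46) lies closer to the lower quartile than to the upper quartile, and the whiskers extend to the minimum (33) and the maximum (60).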

Q No 10
The discharge of industrial wastewater into rivers affects water quality. To assess the effect of a
particular power plant on water quality, 24 water specimens were taken 16 km upstream and 4
km downstream of the plant. Alkalinity (mg/L) was determined for each specimen, resulting in
the summary quantities in the accompanying table. Does the data suggest that the true mean
alkalinity is higher downstream than upstream by more than 50 mg/L? Use a .05 significance
level.

Solution
Null Hypothesis : H0 : Downstream Mean - Upstream Mean = 50
Alternate Hypothesis : H1 : Downstream Mean - Upstream Mean > 50

Since only the summary quantities are given, the two sample t test formula is used

t(statistics) = ((mean1 - mean2) - 50) / sqrt(s1*s1/n1 + s2*s2/n2)

t(statistics) = ((183.6 - 75.9) - 50) / sqrt((1.70*1.70)/24+(1.83*1.83)/24) = 113.17

df = Degrees of freedom = n1+n2-2 = 24+24-2 = 46
t(critical) at alpha = 0.05 (one-tailed) with 46 degrees of freedom = 1.679

t(statistics) > t(critical) => Reject H0

Accept H1 => Mean alkalinity downstream is higher than upstream by more than 50 mg/L
R Code
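> # Raw data are not given, so samples are simulated from the summary quantities (output varies per run).
> # This call tests for a difference of 0, not 50.
> upstream = rnorm(24,75.9,1.83)
> downstream = rnorm(24,183.6,1.70)
> t.test(downstream,upstream)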
Welch Two Sample t-test
data: downstream and upstream
t = 207.91, df = 45.106, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
106.9680 109.0606
sample estimates:
mean of x mean of y
183.46077 75.44649
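Because only summary quantities are available, the test can be computed directly from them instead of simulating data. The sketch below assumes the means, standard deviations and sample sizes used in the hand calculation and tests against a hypothesized difference of 50 mg/L; all variable names are illustrative.

```r
# Summary quantities used in the hand calculation
n_down <- 24; mean_down <- 183.6; sd_down <- 1.70
n_up   <- 24; mean_up   <- 75.9;  sd_up   <- 1.83
delta0 <- 50    # hypothesized difference under H0 (mg/L)

se <- sqrt(sd_down^2 / n_down + sd_up^2 / n_up)
t_stat <- (mean_down - mean_up - delta0) / se

# Welch-Satterthwaite degrees of freedom, as t.test() would use
a <- sd_down^2 / n_down
b <- sd_up^2 / n_up
df_welch <- (a + b)^2 / (a^2 / (n_down - 1) + b^2 / (n_up - 1))

p_value <- pt(t_stat, df_welch, lower.tail = FALSE)   # upper-tailed test
print(c(t = t_stat, df = df_welch, p_value = p_value))
```

Unlike the simulated version, this gives the same result on every run.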

Q No 11
The Oregon Department of Health web site provides information on the cost-to-charge ratio (the
percentage of billed charges that are actual costs to the hospital). The cost-to-charge ratios for
both inpatient and outpatient care in 2002 for a sample of six hospitals in Oregon follow.

Is there evidence that the mean cost-to-charge ratio for Oregon hospitals is lower for outpatient
care than for inpatient care? Use a significance level of .05.
Solution
Null Hypothesis : H0 : There is no difference in mean ratio between inpatient and outpatient care
Alternative Hypothesis : H1 : Inpatient mean ratio is higher than the outpatient mean ratio
> inpatient = c(68,100,71,74,100,83)
> outpatient = c(54,75,53,56,74,71)
> mean(inpatient)
[1] 82.66667
> mean(outpatient)
[1] 63.83333
> sd(inpatient)
[1] 14.33411
> sd(outpatient)
[1] 10.53407
> t.test(inpatient,outpatient,alternative = "greater")

Welch Two Sample t-test
data: inpatient and outpatient
t = 2.5934, df = 9.1812, p-value = 0.0143
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
5.550828 Inf
sample estimates:
mean of x mean of y
82.66667 63.83333
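Both ratios are recorded for the same six hospitals, so a paired t test may be more appropriate than the independent-samples test shown above. The sketch below assumes that each position in the two vectors refers to the same hospital, which the source does not state explicitly.

```r
inpatient  <- c(68, 100, 71, 74, 100, 83)
outpatient <- c(54, 75, 53, 56, 74, 71)

# Paired test on the within-hospital differences (inpatient - outpatient)
res <- t.test(inpatient, outpatient, paired = TRUE, alternative = "greater")

print(c(t = unname(res$statistic), df = unname(res$parameter), p_value = res$p.value))
```

Under that pairing assumption the conclusion is unchanged (H0 is rejected), but the evidence is considerably stronger because hospital-to-hospital variation is removed.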

Q No 12
Implement a Linear Regression for GDP vs 4wheeler_passengar_vehicle_sale
Also predict the 4wheeler_passengar_vehicle_sale in the year 2023 if GDP is 7.5
year   GDP    4wheeler_passengar_vehicle_sale (in lakhs)
2017   6.2    26.3
2018   6.5    26.65
2019   5.48   25.03
2020   6.54   26.01
2021   7.18   27.9
2022   7.93   30.47

Solution
GDP is the independent variable = X
4 wheeler sale is the dependent variable = Y
Y = mX + c

      GDP (X)   4wheeler_passengar_vehicle_sale in lakhs (Y)   X^2        Y^2        X*Y
      6.2       26.3                                           38.44      691.69     163.06
      6.5       26.65                                          42.25      710.2225   173.225
      5.48      25.03                                          30.0304    626.5009   137.1644
      6.54      26.01                                          42.7716    676.5201   170.1054
      7.18      27.9                                           51.5524    778.41     200.322
      7.93      30.47                                          62.8849    928.4209   241.6271
Sum   39.83     162.36                                         267.9293   4411.764   1085.5039

m = (6*1085.5039 - 39.83*162.36) / (6*267.9293 - 39.83*39.83) = 2.1858

c = (162.36 * 267.9293 - 39.83 * 1085.5039) / (6*267.9293 - 39.83*39.83) = 12.549
> gdp = c(6.2, 6.5, 5.48, 6.54, 7.18, 7.93)
> four_wheelersale = c(26.3, 26.65, 25.03, 26.01, 27.9, 30.47)
> relation=lm(four_wheelersale ~ gdp)
> print(relation)
Call:
lm(formula = four_wheelersale ~ gdp)
Coefficients:
(Intercept) gdp
12.549 2.186
> plot(relation)

> new_data = data.frame(gdp = c(7.5))

> predict(relation,new_data)
1
28.9435
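The hand calculation of m and c can be checked against lm() with a few lines of R. This is a minimal sketch using the closed-form least-squares formulas; m, c0 and pred are illustrative names.

```r
gdp <- c(6.2, 6.5, 5.48, 6.54, 7.18, 7.93)
four_wheelersale <- c(26.3, 26.65, 25.03, 26.01, 27.9, 30.47)
n <- length(gdp)

# Closed-form least-squares slope, matching the hand calculation
m <- (n * sum(gdp * four_wheelersale) - sum(gdp) * sum(four_wheelersale)) /
     (n * sum(gdp^2) - sum(gdp)^2)
c0 <- (sum(four_wheelersale) - m * sum(gdp)) / n   # intercept via the means

# Predicted sale (in lakhs) for GDP = 7.5
pred <- c0 + m * 7.5
print(c(slope = m, intercept = c0, prediction = pred))

# The same coefficients from lm()
fit <- lm(four_wheelersale ~ gdp)
print(coef(fit))
```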

Q No 13
You want to study the relationship between monthly e-commerce sales and online advertising
costs. You have the survey results for 7 online stores for the last year. Your task is to find
the equation of the straight line that fits the data best. The following table represents the
survey results from the 7 online stores. Write an R program to do the analysis.

Solution
E-commerce sales is the dependent variable = Y
Ad cost is the independent variable = X
Y = mX + c

      X      Y      X^2       Y^2     XY
      368    1.7    135424    2.89    625.6
      340    1.5    115600    2.25    510
      665    2.8    442225    7.84    1862
      954    5      910116    25      4770
      331    1.3    109561    1.69    430.3
      556    2.2    309136    4.84    1223.2
      376    1.3    141376    1.69    488.8
Sum   3590   15.8   2163438   46.2    9909.9

m = ((7*9909.9) - (3590*15.8)) / ((7*2163438) - (3590*3590)) = 0.00560615718

c = ((15.8*2163438) - (3590*9909.9)) / ((7*2163438) - (3590*3590)) = -0.61801489916
> ad_cost = c(368, 340, 665, 954, 331, 556, 376)
> ecom_sale = c(1.7, 1.5, 2.8, 5, 1.3, 2.2, 1.3)
> relation = lm(ecom_sale~ad_cost)
> print(relation)
Call:
lm(formula = ecom_sale ~ ad_cost)

Coefficients:
(Intercept) ad_cost
-0.618015 0.005606

> plot(relation)
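plot(relation) draws regression diagnostic plots rather than the fitted line itself. To see the line against the data, one option is a scatter plot with abline(), sketched below; the axis labels and title are assumptions, since the source does not give units.

```r
ad_cost   <- c(368, 340, 665, 954, 331, 556, 376)
ecom_sale <- c(1.7, 1.5, 2.8, 5, 1.3, 2.2, 1.3)

relation <- lm(ecom_sale ~ ad_cost)

# Scatter plot of the data with the fitted line drawn on top
plot(ad_cost, ecom_sale, xlab = "Ad cost", ylab = "E-commerce sales",
     main = "Sales vs advertising cost")
abline(relation, col = "blue")

# Correlation coefficient and R-squared summarise how well the line fits
r <- cor(ad_cost, ecom_sale)
r_squared <- summary(relation)$r.squared
print(c(r = r, r_squared = r_squared))
```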

Q No 14
A beverage company claims its soda cans contain 12 ounces. A researcher randomly samples
their cans and measures the amount of fluid in each one. Conduct a one-sample t-test using the
sample data to determine whether the entire population of soda cans differs from the
hypothesized value of 12 ounces.

soda =
c(11.78284,11.87099,11.97785,11.78375,11.60367,12.16584,11.75969,11.66779,11.77516)

Solution
Formula for One Sample T Test

t(statistics) = (sample mean - Mu) / (s / sqrt(n))

Null Hypothesis : H0 : Mu = 12
Alternate Hypothesis : H1 : Mu != 12
Summary of Data
> soda =
c(11.78284,11.87099,11.97785,11.78375,11.60367,12.16584,11.75969,11.66779,11.77516)
> mean(soda)
[1] 11.82084
> sd(soda)
[1] 0.1678634
t(statistics) = (11.82084 - 12) / (0.1678634/sqrt(9)) = -3.2018
df = Degrees of freedom = n-1 = 9-1 = 8
t(critical) at alpha = 0.05 (two-tailed) with df = 8 = ±2.306
|t(statistics)| = 3.2018 > 2.306, so t(statistics) falls in the rejection region. Reject H0.
> t.test(soda,mu=12)
One Sample t-test
data: soda
t = -3.2018, df = 8, p-value = 0.01258
alternative hypothesis: true mean is not equal to 12
95 percent confidence interval:
11.69181 11.94987
sample estimates:
mean of x
11.82084
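The p-value reported by t.test() can be reproduced from the t distribution directly. The following is a sketch of that calculation for the two-sided case; mu0, t_stat, p_value and t_crit are illustrative names.

```r
soda <- c(11.78284, 11.87099, 11.97785, 11.78375, 11.60367,
          12.16584, 11.75969, 11.66779, 11.77516)
mu0 <- 12

n <- length(soda)
t_stat <- (mean(soda) - mu0) / (sd(soda) / sqrt(n))

# Two-sided p-value: probability of a |t| at least this large under H0
p_value <- 2 * pt(-abs(t_stat), df = n - 1)
t_crit <- qt(1 - 0.05 / 2, df = n - 1)
print(c(t = t_stat, t_critical = t_crit, p_value = p_value))
```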

Q No 15
Imagine we have collected a random sample of 31 energy bars from a number of different stores
to represent the population of energy bars available to the general consumer. The labels on the
bars claim that each bar contains 20 grams of protein.

Energy Bar - Grams of Protein

20.70 27.46 22.15 19.85 21.29 24.75

20.75 22.91 25.34 20.33 21.54 21.08

22.14 19.56 21.10 18.04 24.12 19.95

19.72 18.28 16.26 17.46 20.53 22.12

25.06 22.44 19.08 19.88 21.39 22.33 25.79

Solution
Formula for One Sample T Test

t(statistics) = (sample mean - Mu) / (s / sqrt(n))

Null Hypothesis : H0 : Mu = 20
Alternate Hypothesis : H1 : Mu != 20
Summary of Data
> protein = c(20.70, 27.46, 22.15, 19.85, 21.29, 24.75, 20.75, 22.91, 25.34, 20.33, 21.54, 21.08,
+ 22.14, 19.56, 21.10, 18.04, 24.12, 19.95, 19.72, 18.28, 16.26, 17.46, 20.53, 22.12, 25.06, 22.44,
+ 19.08, 19.88, 21.39, 22.33, 25.79)
> mean(protein)
[1] 21.4
> sd(protein)
[1] 2.541669
t(statistics) = (21.4 - 20) / (2.541669/sqrt(31)) = 3.06683
df = degrees of freedom = n-1 = 31 - 1 = 30
t(critical) at 30 degrees of freedom at alpha = 0.05 (two-tailed) = 2.042
t(statistics) > t(critical) => Null Hypothesis is rejected => The mean protein content differs from 20 grams
R Code
> t.test(protein,mu=20)
One Sample t-test
data: protein
t = 3.0668, df = 30, p-value = 0.004553
alternative hypothesis: true mean is not equal to 20
95 percent confidence interval:
20.46771 22.33229
sample estimates:
mean of x
21.4
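The 95 percent confidence interval in the output can be rebuilt by hand, which also gives a second way to read the test: H0 is rejected at alpha = 0.05 exactly when 20 lies outside the interval. A minimal sketch, with illustrative names se, margin and ci:

```r
protein <- c(20.70, 27.46, 22.15, 19.85, 21.29, 24.75, 20.75, 22.91, 25.34, 20.33,
             21.54, 21.08, 22.14, 19.56, 21.10, 18.04, 24.12, 19.95, 19.72, 18.28,
             16.26, 17.46, 20.53, 22.12, 25.06, 22.44, 19.08, 19.88, 21.39, 22.33, 25.79)

n <- length(protein)
se <- sd(protein) / sqrt(n)                 # standard error of the mean
margin <- qt(0.975, df = n - 1) * se        # half-width of the 95% interval

ci <- c(lower = mean(protein) - margin, upper = mean(protein) + margin)
print(ci)

# TRUE when the claimed value 20 falls outside the interval
print(20 < ci[["lower"]] || 20 > ci[["upper"]])
```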

Q No 16
The following data represent the number of employees at various restaurants in New York
City. Using this data, create a histogram in R

22, 35, 15, 26, 40, 28, 18, 20, 25, 34, 39, 42, 24, 22, 19, 27, 22, 34, 40, 20, 38, 28

Solution
# Input the data
employees <- c(22, 35, 15, 26, 40, 28, 18, 20, 25, 34, 39, 42, 24, 22, 19, 27, 22, 34, 40, 20, 38,
28)
# Create the histogram
hist(employees,
main="Histogram of Employees at Restaurants in NYC",
xlab="Number of Employees",
ylab="Frequency",
col="lightblue",
border="black",
breaks=10) # Adjust the number of breaks as needed
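A single number passed to breaks is only a suggestion, because hist() rounds the class boundaries to convenient values. To see exactly which observations fall in which class, a frequency table can be built with cut() and table(), as sketched below. The class width of 5 is an assumption chosen for illustration.

```r
employees <- c(22, 35, 15, 26, 40, 28, 18, 20, 25, 34, 39, 42,
               24, 22, 19, 27, 22, 34, 40, 20, 38, 28)

# Classes of width 5; right = FALSE makes each class [lower, upper)
classes <- cut(employees, breaks = seq(15, 45, by = 5), right = FALSE)
freq <- table(classes)
print(freq)
```

Passing the same boundaries to hist() through breaks, together with right = FALSE, should draw bars that match this table.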

Task 5
No ratings yet
Task 5
3 pages
MAT 2001 Experiment 6
100% (1)
MAT 2001 Experiment 6
3 pages
Lesson 4 TEST OF DIFFERENCE
No ratings yet
Lesson 4 TEST OF DIFFERENCE
26 pages
Two Independent Sample T-Test: Example
No ratings yet
Two Independent Sample T-Test: Example
4 pages
Hypothesis Testing and T-tests in R
No ratings yet
Hypothesis Testing and T-tests in R
16 pages
Unit4 R
No ratings yet
Unit4 R
21 pages
Paired T-Test for Cholesterol Levels
No ratings yet
Paired T-Test for Cholesterol Levels
7 pages
T Test
No ratings yet
T Test
11 pages
Assignment06 1
No ratings yet
Assignment06 1
4 pages
MH3511 Midterm 2017 Q
No ratings yet
MH3511 Midterm 2017 Q
4 pages
Research Methods for Statisticians
50% (2)
Research Methods for Statisticians
5 pages
STAT 1150 Worksheet 6
No ratings yet
STAT 1150 Worksheet 6
9 pages
R Commands New 2
No ratings yet
R Commands New 2
23 pages
Unit 2 Assignment SKELETON R spr18
No ratings yet
Unit 2 Assignment SKELETON R spr18
12 pages
T Test
No ratings yet
T Test
30 pages
App Stat (1) - 4
No ratings yet
App Stat (1) - 4
14 pages
What Statistical Analysis Should I Use?: Sunday, June 4, 2017 04:22 AM
No ratings yet
What Statistical Analysis Should I Use?: Sunday, June 4, 2017 04:22 AM
364 pages
Lab6 - Hypothesis Testing and Confidence Intervals in R
No ratings yet
Lab6 - Hypothesis Testing and Confidence Intervals in R
3 pages
T-Test Practical
No ratings yet
T-Test Practical
31 pages
Statistical Analysis Homework
No ratings yet
Statistical Analysis Homework
6 pages
Hypothesis Testing in R
No ratings yet
Hypothesis Testing in R
13 pages
Economics Exam Revision Guide
No ratings yet
Economics Exam Revision Guide
8 pages
Testing of Hypothesis
No ratings yet
Testing of Hypothesis
26 pages
The T Test Prepared by B.saikiran (12NA1E0036)
No ratings yet
The T Test Prepared by B.saikiran (12NA1E0036)
14 pages
T Test in R
No ratings yet
T Test in R
12 pages
5 Single Sample T JASP
No ratings yet
5 Single Sample T JASP
10 pages
BIOS576A W6 HW Key PDF
No ratings yet
BIOS576A W6 HW Key PDF
5 pages
Student T-Test
No ratings yet
Student T-Test
68 pages
Hinds Matthew ML4
No ratings yet
Hinds Matthew ML4
2 pages
Hns b308 Ib Biostatistics Supplementary Exam-May 2016
No ratings yet
Hns b308 Ib Biostatistics Supplementary Exam-May 2016
6 pages
Statistical Tests in R: t-Test & F-Test
No ratings yet
Statistical Tests in R: t-Test & F-Test
4 pages
Statistical Hypothesis Testing
No ratings yet
Statistical Hypothesis Testing
20 pages
Hypothesis Testing 5vPq3c2Gzw4HKVhV
No ratings yet
Hypothesis Testing 5vPq3c2Gzw4HKVhV
45 pages
Stat 362 UNIT 2
No ratings yet
Stat 362 UNIT 2
40 pages
The T Test Prepared by B.saikiran (12NA1E0036)
No ratings yet
The T Test Prepared by B.saikiran (12NA1E0036)
14 pages
T-Test Guide for Data Analytics Course
No ratings yet
T-Test Guide for Data Analytics Course
30 pages
Assignment STAT5002
No ratings yet
Assignment STAT5002
5 pages
2-Basic Statistics For Pharmacology Practicals
No ratings yet
2-Basic Statistics For Pharmacology Practicals
38 pages
Quantitative Data Analysis Guide
No ratings yet
Quantitative Data Analysis Guide
56 pages
7-Applying The T-Test For Independent and Dependent Samples-13
No ratings yet
7-Applying The T-Test For Independent and Dependent Samples-13
6 pages
Student S T Statistic: Test For Equality of Two Means Test For Value of A Single Mean
No ratings yet
Student S T Statistic: Test For Equality of Two Means Test For Value of A Single Mean
35 pages
Lab6 - HT and CI in R Some Solutions
No ratings yet
Lab6 - HT and CI in R Some Solutions
7 pages
Modern Regression Homework 5-1
No ratings yet
Modern Regression Homework 5-1
8 pages
Inbound 5502677004412826692
No ratings yet
Inbound 5502677004412826692
61 pages
Statprob Finals 4TH
No ratings yet
Statprob Finals 4TH
8 pages
Question 1
No ratings yet
Question 1
135 pages
Module 3 Hypothesis Testing Using R
No ratings yet
Module 3 Hypothesis Testing Using R
7 pages
UFS SW Module 5 - 6 Review KEY
No ratings yet
UFS SW Module 5 - 6 Review KEY
10 pages
Chapter 2
No ratings yet
Chapter 2
41 pages
Practical 8 PDF
No ratings yet
Practical 8 PDF
3 pages
Introduction To Statistical Hypothesis Testing in R
No ratings yet
Introduction To Statistical Hypothesis Testing in R
8 pages
Lab Test 2018 Answers PDF
No ratings yet
Lab Test 2018 Answers PDF
6 pages
ECMT1020 - Week 04 Workshop PDF
No ratings yet
ECMT1020 - Week 04 Workshop PDF
4 pages
Tutorial Chapter 3 & 4
No ratings yet
Tutorial Chapter 3 & 4
11 pages
T-Test For Thesis
100% (2)
T-Test For Thesis
4 pages
Agash P-1-1
No ratings yet
Agash P-1-1
1 page
SEC - EmbedUR Systems Eligible Students Eduamate Status
No ratings yet
SEC - EmbedUR Systems Eligible Students Eduamate Status
22 pages
Certificates 2025 -FINAL
No ratings yet
Certificates 2025 -FINAL
5 pages
DT20234856542 Application Form
No ratings yet
DT20234856542 Application Form
5 pages
824810016 Lesson Plan on Multiple Pregnancy
No ratings yet
824810016 Lesson Plan on Multiple Pregnancy
8 pages
Unit 1
No ratings yet
Unit 1
2 pages
Agash Resume
No ratings yet
Agash Resume
1 page
Cancer Facts For Women
No ratings yet
Cancer Facts For Women
11 pages
EmbedUR Campus Database Sheet - Sairam Institutions
No ratings yet
EmbedUR Campus Database Sheet - Sairam Institutions
32 pages
SIVA
No ratings yet
SIVA
44 pages
Sivakami 1
No ratings yet
Sivakami 1
1 page
3.9 Database and Java Servlets
No ratings yet
3.9 Database and Java Servlets
21 pages
20HSMC501 Universal Human Values 2 - Understanding Harmony
No ratings yet
20HSMC501 Universal Human Values 2 - Understanding Harmony
5 pages
QB - Cloud Computing - CAT I
No ratings yet
QB - Cloud Computing - CAT I
13 pages
UHV UNIT-1 & 2 Important Questions
No ratings yet
UHV UNIT-1 & 2 Important Questions
65 pages
Uhv 2marks
No ratings yet
Uhv 2marks
11 pages
3.1 Browsers and DOM
No ratings yet
3.1 Browsers and DOM
25 pages
R Programming Exam for IT Students
No ratings yet
R Programming Exam for IT Students
11 pages
Artificial Intelligence Catlog
No ratings yet
Artificial Intelligence Catlog
14 pages
Web Development
No ratings yet
Web Development
14 pages
MPMC Cat Questions
No ratings yet
MPMC Cat Questions
7 pages
Big Data Cat Questions
No ratings yet
Big Data Cat Questions
7 pages
Forest Fire Prediction Models
No ratings yet
Forest Fire Prediction Models
6 pages
Fin320 Simulation 2020 July
100% (1)
Fin320 Simulation 2020 July
40 pages
Worksheet - Confidence Interval
No ratings yet
Worksheet - Confidence Interval
2 pages
Six Sigma Measurement Analysis Guide
100% (1)
Six Sigma Measurement Analysis Guide
19 pages
Ss-Chapter 12: Sampling: Final and Initial Sample Size Determination
No ratings yet
Ss-Chapter 12: Sampling: Final and Initial Sample Size Determination
14 pages
hw2 2024spring Solution
No ratings yet
hw2 2024spring Solution
11 pages
Automatic 1-D Inversion of Magnetotelluric Data by The Method of Modelling
No ratings yet
Automatic 1-D Inversion of Magnetotelluric Data by The Method of Modelling
9 pages
Stressful Experiences of First Year Students of Selected Universities in South Africa
No ratings yet
Stressful Experiences of First Year Students of Selected Universities in South Africa
16 pages
Statistical Process Control (SPC) Tutorial
No ratings yet

20ITPW501 - Statistical Analysis using R Programming with Lab

Sum of positive ranks = 71

Wilcoxon signed rank test with continuity correction

data: drugA and drugB

Formula for ONE Sample T Test

t(statistics) = (20.175 - 20) / (3.021175/sqrt(12)) = 0.2006
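The one-sample formula above can be checked directly in R. This is a minimal sketch that reuses the summary values from the worked formula; the variable names are illustrative, and the two-sided p-value is an assumption because the alternative hypothesis is not stated here.

```r
# One-sample t statistic from summary values (numbers copied from the formula above)
xbar <- 20.175    # sample mean
mu0  <- 20        # hypothesised population mean
s    <- 3.021175  # sample standard deviation
n    <- 12        # sample size

t_stat <- (xbar - mu0) / (s / sqrt(n))   # t = (mean - mu0) / standard error
df     <- n - 1                          # degrees of freedom

# Assumption: two-sided alternative; use pt(t_stat, df, lower.tail = FALSE) for "greater"
p_two_sided <- 2 * pt(-abs(t_stat), df)

print(t_stat)
print(p_two_sided)
```

When the raw observations are available, t.test(x, mu = 20) reports the same statistic.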

One Sample t-test

t(statistics) = (79.16667-70)/(13.16688/sqrt(6)) = 1.7053

One Sample t-test

Summary of Given Data

         Number of Samples   Mean       Standard Deviation
Men      13                  14.94615   6.842589
Women    10                  22.29      5.31966

Two Sample T Test Formula

t(statistics) = (22.29-14.94615) / sqrt((5.31966*5.31966)/10+(6.842589 * 6.842589)/13) = 2.89
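The same statistic can be computed from the summary table alone. The sketch below uses a helper named welch_t (a hypothetical name, not part of base R) that also returns the Welch-Satterthwaite degrees of freedom used by the Welch two-sample t-test.

```r
# Welch two-sample t statistic and degrees of freedom from summary statistics.
# welch_t is an illustrative helper, not a base R function.
welch_t <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1                      # variance of the first sample mean
  v2 <- s2^2 / n2                      # variance of the second sample mean
  t_stat <- (m1 - m2) / sqrt(v1 + v2)  # unequal-variance t statistic
  # Welch-Satterthwaite approximation for the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  c(t = t_stat, df = df)
}

# Women first, men second, matching the order used in the formula above
res <- welch_t(22.29, 5.31966, 10, 14.94615, 6.842589, 13)
print(res)
```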

Welch Two Sample t-test

data: women and men

             Number of Samples   Mean       Standard Deviation
Control      15                  16.6       7.790104
No Control   15                  27.06667   7.741047

Two Sample T Test Formula

t(statistics) = (27.06667 - 16.6) / sqrt((7.741047*7.741047)/15 + (7.790104*7.790104)/15) = 3.6912

Welch Two Sample t-test

data: no_control and control

t = 3.6912, df = 27.999, p-value = 0.0009556
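As a quick check, the reported p-value can be reproduced from the t statistic and degrees of freedom in the output above. This sketch assumes the test was two-sided, which is the default for t.test().

```r
# Reproduce the p-value from the reported t and df (two-sided test assumed)
t_stat <- 3.6912
df     <- 27.999

# Two-sided p-value: twice the tail area beyond |t|
p_value <- 2 * pt(-abs(t_stat), df)
print(p_value)
```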

Wilcoxon Signed Rank Test can be used

Ordered Absolute Values of Difference Scores Signed

H0: The median difference is zero versus

H1: The median difference is positive α=0.05

df = degrees of freedom = n - 1 = 8 - 1 = 7

w(statistics) > w(critical) =>Reject Null Hypothesis
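The rank sums behind the Wilcoxon signed rank statistic can be computed by hand in R. The paired values below are hypothetical and chosen only to illustrate the steps (no zero differences, no tied absolute differences); they are not the data from this problem.

```r
# Hypothetical paired measurements, for illustration only
before <- c(12, 15,  9, 20, 17, 11, 14, 18)
after  <- c(10, 16,  5, 26, 12, 14,  7, 10)

d     <- before - after     # difference scores
ranks <- rank(abs(d))       # rank the absolute differences

w_plus  <- sum(ranks[d > 0])   # sum of positive ranks
w_minus <- sum(ranks[d < 0])   # sum of negative ranks

# wilcox.test reports V, the sum of positive ranks of (before - after)
res <- wilcox.test(before, after, paired = TRUE)

print(c(w_plus = w_plus, w_minus = w_minus))
print(res$statistic)
```

The two rank sums always add up to n(n + 1)/2, which is a useful check on a hand calculation.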

> wilcox.test(before_treatment,after_treatement,paired = TRUE, alternative = "two.sided")

Wilcoxon signed rank test with continuity correction

data: before_treatment and after_treatement

V = 32, p-value = 0.05747

alternative hypothesis: true location shift is not equal to 0

Wilcoxon signed rank test can be used

1. Input the data.

Here is the R code to achieve this:

# Calculate mild and extreme outlier boundaries
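A minimal sketch of this step, assuming the usual fence convention (mild outliers lie beyond 1.5 x IQR from the quartiles, extreme outliers beyond 3 x IQR). The vector x is illustrative and the quartiles come from R's default quantile() method, so a hand calculation with a different quartile rule may give slightly different fences.

```r
# Illustrative data; replace with the sample under study
x <- c(5, 11, 14, 16, 19, 19, 21, 23, 27, 32, 39)

q1  <- unname(quantile(x, 0.25))   # first quartile (default type 7)
q3  <- unname(quantile(x, 0.75))   # third quartile
iqr <- q3 - q1                     # interquartile range

# Mild outlier boundaries (inner fences): 1.5 * IQR beyond the quartiles
mild <- c(lower = q1 - 1.5 * iqr, upper = q3 + 1.5 * iqr)

# Extreme outlier boundaries (outer fences): 3 * IQR beyond the quartiles
extreme <- c(lower = q1 - 3 * iqr, upper = q3 + 3 * iqr)

# Values falling outside each pair of fences
mild_outliers    <- x[x < mild["lower"] | x > mild["upper"]]
extreme_outliers <- x[x < extreme["lower"] | x > extreme["upper"]]

print(mild)
print(extreme)
```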

Since summary is given, t test formula is

t(statistics) = (183.6 - 75.9) / sqrt((1.70*1.70)/24+(1.83*1.83)/24) = 211.23

t(statistics) > t(critical) => Reject H0

> t.test(inpatient,outpatient,alternative = "greater")

        GDP(X)   4wheeler_passenga(Y)   X^2        Y^2        X*Y
        6.2      26.3                   38.44      691.69     163.06
        6.5      26.65                  42.25      710.2225   173.225
        5.48     25.03                  30.0304    626.5009   137.1644
        6.54     26.01                  42.7716    676.5201   170.1054
        7.18     27.9                   51.5524    778.41     200.322
        7.93     30.47                  62.8849    928.4209   241.6271
Total   39.83    162.36                 267.9293   4411.764   1085.5039

m = (6*1085.5039 -39.83*162.36) / (6*267.9293 - 39.83*39.83) = 2.1858

> fourwheelersale = c(26.3, 26.65, 25.03, 26.01, 27.9, 30.47)

> new_data = data.frame(gdp = c(7.5))
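The slope formula and the prediction at gdp = 7.5 can also be sketched from the column totals alone. The intercept uses the standard least-squares relation b = (sum(Y) - m * sum(X)) / n; the variable names are illustrative.

```r
# Least-squares line from the column totals of the table above
n      <- 6          # number of observations
sum_x  <- 39.83      # total of GDP (X)
sum_y  <- 162.36     # total of sales (Y)
sum_xx <- 267.9293   # total of X^2
sum_xy <- 1085.5039  # total of X*Y

# Slope, as in the hand calculation
m <- (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x^2)

# Intercept: the fitted line passes through the point of means
b <- (sum_y - m * sum_x) / n

# Predicted sales for GDP = 7.5
pred <- b + m * 7.5

print(c(slope = m, intercept = b, prediction = pred))
```

With the raw vectors available, lm() followed by predict() on new_data gives the same fitted value.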

        X      Y      X^2       Y^2     X*Y
        368    1.7    135424    2.89    625.6
        340    1.5    115600    2.25    510
        665    2.8    442225    7.84    1862
        954    5      910116    25      4770
        331    1.3    109561    1.69    430.3
        556    2.2    309136    4.84    1223.2
        376    1.3    141376    1.69    488.8
Total   3590   15.8   2163438   46.2    9909.9

m = ((7* 9909.9) - (3590 * 15.8)) /((7*2163438)-(3590*3590)) = 0.00560615718
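The hand-computed slope can be cross-checked with lm() on the seven data rows. The names x and y are placeholders, since the variable names for this problem are not shown in the table.

```r
# First two columns of the table above (x and y are placeholder names)
x <- c(368, 340, 665, 954, 331, 556, 376)
y <- c(1.7, 1.5, 2.8, 5, 1.3, 2.2, 1.3)

# Fit the simple linear regression y = b + m * x
fit <- lm(y ~ x)

slope     <- unname(coef(fit)["x"])            # should agree with m from the formula
intercept <- unname(coef(fit)["(Intercept)"])

print(c(slope = slope, intercept = intercept))
```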

Energy Bar - Grams of Protein

20.70 27.46 22.15 19.85 21.29 24.75

20.75 22.91 25.34 20.33 21.54 21.08

22.14 19.56 21.10 18.04 24.12 19.95

19.72 18.28 16.26 17.46 20.53 22.12

25.06 22.44 19.08 19.88 21.39 22.33 25.79
