Case 4 - Tutorial 2

1. Descriptive statistics, including summary tables, graphs, and tests, are produced for a dataset to analyze its characteristics by province and ownership.
2. A one-way ANOVA finds no significant difference in total assets between provinces. The assumptions of normality and equal variances are satisfied.
3. A two-way ANOVA tests for differences in total assets by province, ownership, and their interaction. The distributions are approximately normal, but the variances are unequal. The ANOVA finds no significant differences.

Uploaded by

Hoàng Huế
Question 1: Produce descriptive statistics to summarize the data.

You are expected to generate as many relevant descriptive statistics as possible using ALL the relevant tools introduced in the
labs of this course. Remember to provide appropriate interpretations for the descriptive statistics.
Try not to include unnecessary or irrelevant descriptive statistics.
Firstly, import the dataset23.csv data frame into R and assign it to case4.
getwd()
setwd("")
case4 <- read.table("dataset23.csv", header = TRUE, sep = ",", quote = "/", stringsAsFactors = FALSE)
1. Some first rows of the data
head(case4)

2. Display the structure of case4 data frame


str(case4)

3. Convert character variables into factors


case4$X.province<-factor(case4$X.province, levels = c("Hanoi","Haiphong","TP HCM"))
case4$own<-factor(case4$own, levels = c("One-owner","Multi-owner"))
str(case4)
4. Summary data
summary(case4)

table(case4$X.province,case4$own)

5. Summary data by groups


by(case4$X.quantityproduct,list(case4$X.province,case4$own),summary)
by(case4$X.quantitysold,list(case4$X.province,case4$own),summary)

by(case4$totalass,list(case4$X.province,case4$own),summary)
# Descriptive statistics by group
install.packages("psych")
library("psych")
describeBy(case4["X.quantityproduct"],list(case4$X.province,case4$own))

describeBy(case4["X.quantitysold"],list(case4$X.province,case4$own))
describeBy(case4["totalass"],list(case4$X.province,case4$own))
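The grouped summaries above can also be produced with base R alone. The following is a minimal sketch on simulated data, not on dataset23.csv: the data frame `sim`, its values, and the object `cell_stats` are assumptions made for illustration, and only the column names mirror those used in this tutorial.

```r
# Simulated stand-in for case4 (hypothetical values, balanced 3 x 2 design)
set.seed(1)
sim <- data.frame(
  X.province = factor(rep(c("Hanoi", "Haiphong", "TP HCM"), each = 20),
                      levels = c("Hanoi", "Haiphong", "TP HCM")),
  own        = factor(rep(c("One-owner", "Multi-owner"), times = 30),
                      levels = c("One-owner", "Multi-owner")),
  totalass   = rlnorm(60, meanlog = 10, sdlog = 1)
)

# Count, mean and SD of total assets for every province x ownership cell
cell_stats <- aggregate(
  totalass ~ X.province + own, data = sim,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)

# aggregate() stores the three statistics in one matrix column;
# flatten it into ordinary columns totalass.n, totalass.mean, totalass.sd
cell_stats <- do.call(data.frame, cell_stats)
print(cell_stats)
```

One row per cell makes it easy to compare means and spreads side by side before moving on to the ANOVA assumptions.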

# Descriptive graphs
boxplot(X.quantityproduct ~ X.province + own, data = case4,
        xlab = "Specify address of firm and Ownership status",
        ylab = "Quantity produced for the most important product",
        col = c("red", "blue", "yellow", "pink", "grey", "green"))
boxplot(X.quantitysold ~ X.province + own, data = case4,
        xlab = "Specify address of firm and Ownership status",
        ylab = "Quantity sold based on quantity produced for the most important product",
        col = c("red", "blue", "yellow", "pink", "grey", "green"))
boxplot(totalass ~ X.province + own, data = case4,
        xlab = "Specify address of firm and Ownership status",
        ylab = "Total assets in 2014",
        col = c("red", "blue", "yellow", "pink", "grey", "green"))

install.packages("gplots")
library("gplots")
plotmeans(X.quantityproduct ~ interaction(X.province, own), data = case4,
          xlab = "Specify address of firm and Ownership status",
          ylab = "Quantity produced for the most important product",
          main = "Mean Plot with 95% CI")
plotmeans(X.quantitysold ~ interaction(X.province, own), data = case4,
          xlab = "Specify address of firm and Ownership status",
          ylab = "Quantity sold based on quantity produced for the most important product",
          main = "Mean Plot with 95% CI")
plotmeans(totalass ~ interaction(X.province, own), data = case4,
          xlab = "Specify address of firm and Ownership status",
          ylab = "Total assets in 2014",
          main = "Mean Plot with 95% CI")

Question 2: Use analysis of variance to test for any significant differences due to province. Use
a .05 level of significance, and for now, ignore the effect of types of ownership, quantity
produced and quantity sold. Check all the assumptions of the inference technique you use. Are
the assumptions satisfied? Explain.
Check assumptions:
1. All populations are normally distributed (Q-Q plot)
install.packages("car")
library(car)
qqPlot(lm(totalass ~ X.province, data = case4), simulate = TRUE, main = "Q-Q Plot", labels = FALSE)
2. Samples are independent simple random samples, and the sample sizes are equal
table(case4$X.province)

3. All population variances are equal (rule of thumb: $S_{largest} < 2\,S_{smallest}$)
by(case4$totalass, case4$X.province, sd)
$$\frac{S_{largest}}{S_{smallest}} = \frac{110745.9}{11162.1} = 9.921601 > 2$$
The ratio of the largest SD to the smallest SD is about 9.92, which is greater than 2, so the rule of thumb
does not clearly support pooling the variances. We therefore check again using Levene's test
(limitation: the ratio is too big).
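The ratio rule and a median-centered Levene test can both be computed in base R. This is a minimal sketch on simulated data: `sim`, `sd_ratio`, `abs_dev`, and `levene_p` are hypothetical names, and the values are not those of dataset23.csv. To my understanding, the one-way ANOVA on absolute deviations from the group medians is the same calculation that `leveneTest(..., center = median)` performs, but treat that equivalence as an assumption and compare against the package output.

```r
# Simulated stand-in for case4 (hypothetical values)
set.seed(2)
sim <- data.frame(
  X.province = factor(rep(c("Hanoi", "Haiphong", "TP HCM"), each = 20)),
  totalass   = rlnorm(60, meanlog = 10, sdlog = 1)
)

# Rule of thumb: largest sample SD divided by smallest sample SD
sds      <- tapply(sim$totalass, sim$X.province, sd)
sd_ratio <- max(sds) / min(sds)
cat("SD ratio:", round(sd_ratio, 3), "\n")

# Median-centered Levene test by hand:
# 1) absolute deviation of each observation from its group median
sim$abs_dev <- abs(sim$totalass - ave(sim$totalass, sim$X.province, FUN = median))
# 2) one-way ANOVA on those deviations; its F test is the Levene test
levene_tab <- anova(lm(abs_dev ~ X.province, data = sim))
levene_p   <- levene_tab[["Pr(>F)"]][1]
cat("Levene p-value:", round(levene_p, 4), "\n")
```

The two checks can disagree, as they do in this tutorial: the ratio reacts strongly to a few extreme firms, while the median-centered test is more robust to them.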
Hypothesis:
H0: All population variances are equal
Ha: At least two population variances are different
R code:
library(car)
leveneTest(case4$totalass, case4$X.province, center = median)

p-value = 0.077
Decision rule: Reject H0 if p-value < α
We have: p-value = 0.077 > 0.05
→ Do not reject H0
→ There is not enough evidence to conclude that the population variances differ
→ Assumption 3 is treated as satisfied
Use one-way ANOVA to test for any significant differences due to province
# One-way ANOVA
aovcase4.1 <- aov(totalass ~ X.province, data = case4)
summary(aovcase4.1)
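Because the SD ratio above is large, it may be worth comparing the classic one-way ANOVA with Welch's version, which does not assume equal variances. The sketch below is an optional robustness check on simulated data, not part of the original solution; `sim`, `p_classic`, and `p_welch` are hypothetical names.

```r
# Simulated stand-in for case4 (hypothetical values)
set.seed(3)
sim <- data.frame(
  X.province = factor(rep(c("Hanoi", "Haiphong", "TP HCM"), each = 20)),
  totalass   = rlnorm(60, meanlog = 10, sdlog = 1)
)
alpha <- 0.05

# Classic one-way ANOVA (assumes equal population variances)
fit       <- aov(totalass ~ X.province, data = sim)
p_classic <- summary(fit)[[1]][["Pr(>F)"]][1]

# Welch's one-way ANOVA (does not assume equal variances)
welch   <- oneway.test(totalass ~ X.province, data = sim, var.equal = FALSE)
p_welch <- welch$p.value

# Apply the same decision rule to both: reject H0 if p-value < alpha
cat("Classic ANOVA p-value:", round(p_classic, 4), "reject H0:", p_classic < alpha, "\n")
cat("Welch ANOVA p-value:  ", round(p_welch, 4), "reject H0:", p_welch < alpha, "\n")
```

If both tests lead to the same decision, the conclusion does not hinge on the equal-variance assumption.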

Question 3: At the .05 level of significance, test for any significant differences due to
province, types of ownership, and their interaction (ignore the effect of quantity produced and
quantity sold). Check all the assumptions of the inference technique you use. Are the assumptions
satisfied? Explain. Draw an interaction plot and interpret the plot. Is the plot consistent with the
conclusions?
I. Assumptions:
1) All populations are normally distributed
2) Samples were selected by using simple random sampling
3) Samples are independent
4) All population standard deviations are equal (Slargest <2Ssmallest )

Assumption 1: All populations are normally distributed
To check whether the populations are normally distributed, we use a Q-Q plot with the R commands:
install.packages("car")
library(car)
qqPlot(lm(totalass ~ X.province + own + own*X.province, data = case4), simulate = TRUE, main = "Q-Q Plot", labels = FALSE)
There are a few outliers → the data are still treated as normally distributed, and the outliers are noted under limitations.
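A Q-Q plot is a visual judgment, so a formal test on the model residuals can complement it. The sketch below applies the Shapiro-Wilk test to the residuals of the two-factor model on simulated data; it is an addition for illustration, not something the original solution ran. `sim`, `fit2`, and `sw` are hypothetical names, and the simulated values are deliberately right-skewed, so a small p-value here says nothing about dataset23.csv.

```r
# Simulated stand-in for case4 (hypothetical, right-skewed values)
set.seed(4)
sim <- data.frame(
  X.province = factor(rep(c("Hanoi", "Haiphong", "TP HCM"), each = 20)),
  own        = factor(rep(c("One-owner", "Multi-owner"), times = 30)),
  totalass   = rlnorm(60, meanlog = 10, sdlog = 1)
)

# Two-factor model with interaction, as in the Q-Q plot call
fit2 <- lm(totalass ~ X.province * own, data = sim)
res  <- residuals(fit2)

# Shapiro-Wilk test: H0 is that the residuals come from a normal distribution
sw <- shapiro.test(res)
cat("Shapiro-Wilk W =", round(sw$statistic, 4), " p-value =", signif(sw$p.value, 3), "\n")

# Base-R normal Q-Q plot of the same residuals
if (!interactive()) pdf(NULL)  # avoid writing a plot file when run as a script
qqnorm(res, main = "Q-Q Plot of residuals")
qqline(res)
```

With a few large outliers the test can reject normality even when the bulk of the points follow the line, which is why the outliers are better reported as a limitation than ignored.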
Assumptions 2 & 3: Samples were selected by simple random sampling and are independent
table(case4$own, case4$X.province)
Output:

Assumption 4: All population standard deviations are equal
by(case4$totalass, list(case4$X.province, case4$own), sd)

$$\frac{S_{largest}}{S_{smallest}} = \frac{148425.6}{7562.748} = 19.62588$$

→ The sample standard deviations are not all equal.
→ We continue to use ANOVA and record the unequal SDs as a limitation.
→ We still run Levene's test, although the ratio exceeds the threshold many times over.
RStudio:
install.packages("car")
library(car)
leveneTest(case4$totalass, case4$own, center = median)
Output:

1. Hypothesis
H0: The population variances are equal
Ha: The population variances are not all equal
2. P-value = 0.2179
3. Rejection rule: Reject H0 if p-value < α
We have: 0.2179 > 0.05
→ Do not reject H0
4. Conclusion
→ Assumption 4 is satisfied
II. Hypothesis
H0: µ1 = µ2
Ha: µ1 ≠ µ2 (the two population means are different)
aov2 <- aov(totalass ~ own, data = case4)
summary(aov2)
Output:

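III. Rejection rule: Reject H0 if p-value < α
We have: 0.152 > 0.05
→ Do not reject H0
Conclusion: At the .05 level of significance, there is not enough evidence that mean total assets differ between ownership types.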
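Question 3 also asks about province, the interaction, and an interaction plot, which the ownership-only model above does not cover. The following is a minimal sketch of a two-way ANOVA with interaction and a base-R interaction plot on simulated data. It is not the original solution's code or result: `sim`, `fit_two`, and `p_values` are hypothetical names, and the simulated design is balanced, so the order of the terms does not affect the sums of squares.

```r
# Simulated stand-in for case4 (hypothetical values, balanced 3 x 2 design)
set.seed(5)
sim <- data.frame(
  X.province = factor(rep(c("Hanoi", "Haiphong", "TP HCM"), each = 20),
                      levels = c("Hanoi", "Haiphong", "TP HCM")),
  own        = factor(rep(c("One-owner", "Multi-owner"), times = 30),
                      levels = c("One-owner", "Multi-owner")),
  totalass   = rlnorm(60, meanlog = 10, sdlog = 1)
)

# Two-way ANOVA: main effects of province and ownership plus their interaction
fit_two <- aov(totalass ~ X.province * own, data = sim)
tab     <- summary(fit_two)[[1]]
print(tab)

# One p-value per tested effect (row names are padded, so trim them)
p_values <- setNames(tab[["Pr(>F)"]][1:3], trimws(rownames(tab))[1:3])
print(p_values < 0.05)  # TRUE means: reject H0 for that effect at the .05 level

# Interaction plot: roughly parallel lines suggest no interaction
if (!interactive()) pdf(NULL)  # avoid writing a plot file when run as a script
with(sim, interaction.plot(X.province, own, totalass,
                           xlab = "Province", ylab = "Mean total assets",
                           trace.label = "Ownership"))
```

If the interaction p-value is above .05 and the lines in the plot are close to parallel, the plot and the test tell the same story.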
R INPUT
Q1
# import the .csv file “dataset23.csv”
getwd()
setwd("")
case4 <- read.table("dataset23.csv", header = TRUE, sep = ",", quote = "/", stringsAsFactors = FALSE)

# Some first rows of the data


head(case4)

# Display the structure of case4 data frame


str(case4)

# Convert character variables into factors


case4$X.province<-factor(case4$X.province, levels = c("Hanoi","Haiphong","TP HCM"))
case4$own<-factor(case4$own, levels = c("One-owner","Multi-owner"))
str(case4)

# Summary data
summary(case4)
table(case4$X.province,case4$own)

# Summary data by groups


by(case4$X.quantityproduct,list(case4$X.province,case4$own),summary)
by(case4$X.quantitysold,list(case4$X.province,case4$own),summary)
by(case4$totalass,list(case4$X.province,case4$own),summary)

# Descriptive statistics by group
install.packages("psych")
library("psych")
describeBy(case4["X.quantityproduct"],list(case4$X.province,case4$own))
describeBy(case4["X.quantitysold"],list(case4$X.province,case4$own))
describeBy(case4["totalass"],list(case4$X.province,case4$own))

# Descriptive graphs
boxplot(X.quantityproduct ~ X.province + own, data = case4,
        xlab = "Specify address of firm and Ownership status",
        ylab = "Quantity produced for the most important product",
        col = c("red", "blue", "yellow", "pink", "grey", "green"))
boxplot(X.quantitysold ~ X.province + own, data = case4,
        xlab = "Specify address of firm and Ownership status",
        ylab = "Quantity sold based on quantity produced for the most important product",
        col = c("red", "blue", "yellow", "pink", "grey", "green"))
boxplot(totalass ~ X.province + own, data = case4,
        xlab = "Specify address of firm and Ownership status",
        ylab = "Total assets in 2014",
        col = c("red", "blue", "yellow", "pink", "grey", "green"))

install.packages("gplots")
library("gplots")
plotmeans(X.quantityproduct ~ interaction(X.province, own), data = case4,
          xlab = "Specify address of firm and Ownership status",
          ylab = "Quantity produced for the most important product",
          main = "Mean Plot with 95% CI")
plotmeans(X.quantitysold ~ interaction(X.province, own), data = case4,
          xlab = "Specify address of firm and Ownership status",
          ylab = "Quantity sold based on quantity produced for the most important product",
          main = "Mean Plot with 95% CI")
plotmeans(totalass ~ interaction(X.province, own), data = case4,
          xlab = "Specify address of firm and Ownership status",
          ylab = "Total assets in 2014",
          main = "Mean Plot with 95% CI")

Q2
#Check assumptions
#Check independence and simple random sample and sample sizes are equal
table(case4$X.province)

#Check population are normally distributed


install.packages("car")
library(car)
qqPlot(lm(totalass ~ X.province, data = case4), simulate = TRUE, main = "Q-Q Plot", labels = FALSE)

#Check all population variances are equal


by(case4$totalass,case4$X.province,sd)
43451.78/110745.9

#levene test
library(car)
leveneTest(case4$totalass, case4$X.province, center=median)

# One-way ANOVA
aovcase4.1 <- aov(totalass ~ X.province, data = case4)
summary(aovcase4.1)

Q3
#Check assumptions
#Check independent and simple random sample and sample sizes are equal
table(case4$own,case4$X.province)
str(case4)

#Check population are normally distributed


library(car)
qqPlot(lm(totalass ~ X.province + own + own*X.province, data = case4), simulate = TRUE, labels = FALSE)

#Check all population variances are equal


by(case4$totalass, list(case4$X.province,case4$own),sd)
148425.6/7562.748

#levene test
library(car)
leveneTest(case4$totalass, case4$own, center = median)

# ANOVA on ownership (the model as run contains only the own factor)
aovcase4.2 <- aov(totalass ~ own, data = case4)
summary(aovcase4.2)
Box plot:

QQ plot:
