DEDAN KIMATHI UNIVERSITY OF TECHNOLOGY
University Examinations 2017/2018
Third Year First Semester Examination for the Degree of Bachelor of Science in Actuarial
Science
Date: 23rd August 2017 STA 2311: Statistical Programming II Time: 2 hours
INSTRUCTIONS: Answer question ONE (COMPULSORY) and any other TWO questions.
QUESTION ONE [30 MARKS]
(a) Briefly discuss some of the features that distinguish univariate data from bivariate
data. [3 marks]
(b) Distinguish between Chi-squared goodness of fit tests and Chi-squared tests for homogeneity. [4 marks]
(c) Explain the concept of the central limit theorem, and hence write the R-code that can justify this
concept by using randomly generated data from a normal distribution. [6 marks]
(d) Categorical variables can come from numeric variables by aggregating values. Consider the
following salary data (in thousands): {12, 0.4, 5, 2, 50, 8, 3, 1, 4, 0.25}. Write the R-code
that can input the data and create three groups of employees, namely poor, moderate and rich,
with salary ranges [0, 1], (1, 5] and (5, 50] respectively. [5 marks]
(e) Briefly discuss the role of analysis of variance (ANOVA) in statistical analysis and how it is
implemented in R. [5 marks]
(f) Suppose the manufacturer claims that the mean lifetime of a light bulb is more than 10,000
hours. In a sample of 30 light bulbs, it was found that they only last 9,900 hours on average.
Assume the population standard deviation is 120 hours. At 5% level of significance, can we
reject the claim by the manufacturer? Test this and hence give the R-code for each step. [7 marks]
QUESTION TWO [20 MARKS]
(a) Briefly discuss the concept of Exploratory Data Analysis as used in statistical analysis and
hence describe any three graphical representation tools. [5 marks]
(b) A local company conducts a survey where people watch an advertisement which contains
a phone number. Afterwards they are asked to pick the phone number from a short list of
numbers. In a simple random sample of 1512 respondents, 474 picked the right answer and
1038 did not. Write R-code that finds the 90% confidence interval for the proportion of all
people who could correctly choose the given phone number. [7 marks]
(c) Briefly discuss how model assumptions can be tested in regression analysis, and hence give
the R-code that can produce the diagnostic plots and interpret each. [8 marks]
QUESTION THREE [20 MARKS]
(a) Briefly explain the following statistical tests: z-test and t-test, and give their respective R-codes. [5 marks]
(b) Consider the following R-code and hence explain what each line of code does:
> simple.z.test = function(x,sigma,conf.level=0.95) {
+ n = length(x);
+ xbar=mean(x)
+ alpha = 1 - conf.level
+ zstar = qnorm(1-alpha/2)
+ SE = sigma/sqrt(n)
+ xbar + c(-zstar*SE,zstar*SE)
+ }
> simple.z.test(x,1.5)
[5 marks]
(c) Briefly describe a stem and leaf plot, and hence use the data 10, 15, 22, 25, 28, 23, 29, 31,
36, 45, 48 to generate a stem and leaf plot. [5 marks]
(d) Discuss the concept of resistant measures of center and spread by using the relevant R-code. [5 marks]
QUESTION FOUR [20 MARKS]
(a) Explain the underlying concepts of the functions fivenum() and quantile(), and hence give
an example for each using appropriate R commands. [4 marks]
(b) Explain the following model formulae as used in analysis of variance: (i) y ~ x1 (ii) y ~
x1 + x2 (iii) y ~ x1 + x2 + x3 (iv) y ~ x1 * x2. [4 marks]
(c) Describe any three ways to view multivariate data. [6 marks]
(d) Describe the steps to be followed when carrying out an ANOVA test during statistical analysis. [6 marks]
QUESTION FIVE [20 MARKS]
(a) Briefly discuss the concept of regression as used in statistical analysis. [3 marks]
(b) The cost of a home depends on the number of bedrooms in the house. Suppose the following
data is recorded for homes in Nyeri town.
Price (in thousands)   300  250  400  550  317  389  425  289  389  559
No. of bedrooms          3    3    4    5    4    3    6    3    4    5
i. Write the R-code that inputs the data and implements the appropriate commands to perform
linear regression analysis. [4 marks]
ii. Suppose that the linear regression analysis output was obtained from the data above.
Interpret the results. [7 marks]
> summary(lm(homes$Price ~ homes$`No. of bedrooms`))
Call:
lm(formula = homes$Price ~ homes$`No. of bedrooms`)
Residuals:
    Min      1Q  Median      3Q     Max
-108.00  -53.95   -5.75   59.77   99.10
Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)                94.40      97.98   0.963   0.3635
homes$`No. of bedrooms`    73.10      23.76   3.076   0.0152 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 75.15 on 8 degrees of freedom
Multiple R-squared:  0.5419, Adjusted R-squared:  0.4846
F-statistic: 9.462 on 1 and 8 DF,  p-value: 0.01521
iii. From the results above, estimate the cost of a 2-bedroom house and hence distinguish
between Type I and Type II errors. [4 marks]
(c) Explain the concept of bootstrap sampling and hence give some of the benefits of sampling. [5 marks]
END OF EXAM