Applied Statistics - Revision Exercise
Question 1
The volume of each box of "Healthy Milk" is supposed to be normally distributed with a mean of 1.05 liters
and a standard deviation of 0.02 liter. Each box is labeled as 1.00 liter to minimize the chance of underfilling.
(a) What is the probability that a randomly selected box of "Healthy Milk" contains less than 1.00 liter?
(b) There is a 5% chance that a randomly selected box of "Healthy Milk" contains more than K liters. What
is the value of K?
(c) Six randomly selected boxes of "Healthy Milk" are packed together as a value pack. What is the
probability that exactly one box of "Healthy Milk" in a randomly selected value pack contains less
than 1.00 liter?
(d) A new management team is taking over the production of "Healthy Milk". They suggest that the
average volume of milk per box can be reduced to 1.03 liters with the standard deviation
unchanged. After the change, would the probability that a randomly selected box of "Healthy Milk"
contains less than 1.00 liter increase or decrease? Support your answer with calculation.
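The probabilities in Question 1 can be checked numerically. The sketch below assumes Python's standard library (`statistics.NormalDist`, `math.comb`) and treats the six boxes in a value pack as independent; the variable names are illustrative only.

```python
from math import comb
from statistics import NormalDist

# Current process: volume ~ Normal(mean 1.05 L, sd 0.02 L), as stated in Question 1.
current = NormalDist(mu=1.05, sigma=0.02)

p_under = current.cdf(1.00)        # (a) P(volume < 1.00)
k_upper = current.inv_cdf(0.95)    # (b) K such that P(volume > K) = 0.05

# (c) Exactly one under-filled box among six independent boxes (binomial).
p_one_of_six = comb(6, 1) * p_under * (1 - p_under) ** 5

# (d) Proposed process: mean lowered to 1.03 L, same standard deviation.
proposed = NormalDist(mu=1.03, sigma=0.02)
p_under_new = proposed.cdf(1.00)

print(round(p_under, 4), round(k_upper, 4), round(p_one_of_six, 4), round(p_under_new, 4))
```

With a normal table, the same steps use z = (1.00 - 1.05) / 0.02 = -2.5 for part (a) and z = (1.00 - 1.03) / 0.02 = -1.5 for part (d).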
Question 2
The weights (in grams) of a random sample of 15 Japanese peaches were recorded as follows:
k 98 100 104 107 108 113 115
115 116 116 117 118 119 120
(a) The sample mean of these 15 peaches is 110 2/3 grams. What is the value of k?
(b) Find the first quartile, the median and the third quartile of the weights of the Japanese peaches.
(c) Describe the skewness of the distribution of the weights of the Japanese peaches. State the reason with
calculation.
(d) The price of the Japanese peach is $0.25 per gram. Find the sample mean, sample median, and sample
standard deviation of the price of a Japanese peach.
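Parts (a) and (d) of Question 2 rest on two facts: the sum of the observations equals n times the mean, and multiplying every value by a constant multiplies the mean, median and standard deviation by that constant. A minimal sketch, reading the sample mean as 110 2/3 grams and using exact fractions (variable names are illustrative):

```python
from fractions import Fraction
from statistics import median, stdev

known = [98, 100, 104, 107, 108, 113, 115, 115, 116, 116, 117, 118, 119, 120]
sample_mean = Fraction(110) + Fraction(2, 3)   # mean of all 15 weights, in grams

# (a) 15 * mean is the total weight; subtract the 14 known weights to get k.
k = 15 * sample_mean - sum(known)
weights = sorted(known + [int(k)])
sample_median = median(weights)

# (d) Price = 0.25 * weight, so each summary measure scales by 0.25.
price_per_gram = Fraction(1, 4)
mean_price = price_per_gram * sample_mean
median_price = price_per_gram * sample_median
sd_price = 0.25 * stdev(weights)               # sample standard deviation (n - 1 divisor)

print(k, sample_median, float(mean_price), float(median_price), round(sd_price, 4))
```

Quartiles are left out on purpose, because the result depends on which quartile convention the course uses.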
Question 3
The following data were obtained to test the effectiveness of an industrial safety program in
reducing the loss of labor hours due to accidents. The table below presents the number of labor hours lost
per week due to accidents before and after the program was put into operation in 10 randomly selected factories.
Factory 1 2 3 4 5 6 7 8 9 10
Before 45 73 46 124 33 57 83 34 26 17
After 36 60 44 119 35 51 77 29 24 11
(a) Test, at the 5% significance level, if there is any evidence to conclude that the weekly loss of labor
hours due to accidents after the implementation of the program is less than before. Give a full report
which must include (i) null hypothesis and alternative hypothesis of the test, (ii) rejection region, (iii)
calculation of the test statistic, and (iv) conclusion.
(b) Construct a 90% confidence interval for the population mean number of weekly losses of labor hours
due to accidents in the factories after the program was put into operation.
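Because each factory is measured before and after the program, one natural reading of part (a) is a paired t-test on the differences. The sketch below assumes that reading and takes the critical value t0.050 with 9 degrees of freedom (1.833) from Table II; it is a check on the arithmetic, not a full report.

```python
from math import sqrt
from statistics import mean, stdev

before = [45, 73, 46, 124, 33, 57, 83, 34, 26, 17]
after = [36, 60, 44, 119, 35, 51, 77, 29, 24, 11]
n = len(before)

# (a) Paired differences (before - after); H0: mean difference = 0, H1: mean difference > 0.
diffs = [b - a for b, a in zip(before, after)]
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
T_CRIT_05_DF9 = 1.833                  # t0.050 with 9 d.f., read from Table II
reject_h0 = t_stat > T_CRIT_05_DF9

# (b) 90% confidence interval for the mean weekly loss after the program
#     (two-sided 90% leaves 5% in each tail, so the same table value applies).
margin = T_CRIT_05_DF9 * stdev(after) / sqrt(n)
ci_low, ci_high = mean(after) - margin, mean(after) + margin

print(round(t_stat, 4), reject_h0, round(ci_low, 2), round(ci_high, 2))
```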
Question 4
The table shows the monthly salary and the education level of 1600 employees in an international company.
Monthly salary             Master-Degree holder    Non Master-Degree holder
$40000 or above            450                     150
$20000 to below $40000     230                     500
Below $20000               98                      172
(a) What is the probability that a randomly selected employee in this company earns below $40000 a
month?
(b) What is the probability that a randomly selected employee in this company is a Master-Degree holder?
(c) John is a Master-Degree holder working in this company. What is the probability that he earns below
$20000 a month?
(d) For a staff member who earns $20000 or above a month in this company, what is the probability that
he or she is a Non Master-Degree holder?
(e) It is known that Peter works in this company and earns $35000 a month. What is the probability that he
is a Master-Degree holder?
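Each part of Question 4 is a ratio of counts from the table, with the denominator restricted to the given condition for the conditional parts. A minimal sketch using exact fractions (the dictionary keys are illustrative labels, not part of the question):

```python
from fractions import Fraction

# Each salary band maps to (Master-Degree holders, Non Master-Degree holders).
table = {
    "40000_or_above": (450, 150),
    "20000_to_below_40000": (230, 500),
    "below_20000": (98, 172),
}
total = sum(m + n for m, n in table.values())     # 1600 employees
masters = sum(m for m, _ in table.values())

high, mid, low = table["40000_or_above"], table["20000_to_below_40000"], table["below_20000"]

p_a = Fraction(sum(mid) + sum(low), total)        # (a) P(salary below $40000)
p_b = Fraction(masters, total)                    # (b) P(Master-Degree holder)
p_c = Fraction(low[0], masters)                   # (c) P(below $20000 | Master)
p_d = Fraction(high[1] + mid[1], sum(high) + sum(mid))  # (d) P(Non Master | $20000 or above)
p_e = Fraction(mid[0], sum(mid))                  # (e) P(Master | $20000 to below $40000)

print(p_a, p_b, p_c, p_d, p_e)
```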
Question 5
A total of 350 overweight patients took a training program organized by a physical club. The table below
shows the frequency distribution of the patients by the weight lost within one month and the
center at which the patient took the training program.
Weight lost within                     Center
one month            Central    Wan Chai    Causeway Bay    Total
1 to 4 pounds        56         57          72              185
5 to 9 pounds        32         39          55              126
≥ 10 pounds          12         14          13              39
Total                100        110         140             350
The senior manager wants to know which of the three centers, Central, Wan Chai, or Causeway Bay, is the most
popular. Test, at the 5% significance level, whether the ratio of patients who took the training program at
"Central : Wan Chai : Causeway Bay" differs from "1 : 1 : 1". Give a full report which must include (i)
null hypothesis and alternative hypothesis of the test, (ii) rejection region, (iii) calculation of the test statistic,
and (iv) conclusion.
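The test in Question 5 uses only the column totals, compared with equal expected counts under the 1 : 1 : 1 ratio. The sketch below assumes a chi-square goodness-of-fit test with 3 - 1 = 2 degrees of freedom and takes the 5% critical value (5.991) from Table III.

```python
# Observed numbers of patients per center (column totals of the table).
observed = {"Central": 100, "Wan Chai": 110, "Causeway Bay": 140}

n = sum(observed.values())
expected = n / len(observed)           # equal share under H0: ratio is 1 : 1 : 1

# Chi-square statistic: sum of (O - E)^2 / E over the three centers.
chi2 = sum((o - expected) ** 2 / expected for o in observed.values())

CHI2_CRIT_05_DF2 = 5.991               # 5% critical value with 2 d.f., from Table III
reject_h0 = chi2 > CHI2_CRIT_05_DF2

print(round(chi2, 4), reject_h0)
```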
Question 6
The filling process dispenses cookies into bags so that the weight of a bag of cookies follows a normal
distribution with a mean of 510 grams and a standard deviation of 4 grams. A bag of cookies is classified as
underweight if it weighs less than 500 grams.
(a) What is the probability that a randomly selected bag of cookies is underweight?
(b) There is a 10% chance that a randomly selected bag of cookies weighs less than K grams. What is the
value of K?
(c) Two bags of cookies are selected randomly and independently. What is the probability that the total
weight of two bags of cookies is less than 1030 grams?
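Part (c) of Question 6 relies on the fact that the sum of two independent normal variables is normal, with the means added and the variances added. A minimal sketch using `statistics.NormalDist` from the standard library (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

bag = NormalDist(mu=510, sigma=4)      # weight of one bag, in grams

p_underweight = bag.cdf(500)           # (a) P(weight < 500)
k_low = bag.inv_cdf(0.10)              # (b) K such that P(weight < K) = 0.10

# (c) Total of two independent bags: mean 510 + 510, variance 4^2 + 4^2.
total = NormalDist(mu=2 * 510, sigma=sqrt(2) * 4)
p_total_below_1030 = total.cdf(1030)

print(round(p_underweight, 4), round(k_low, 2), round(p_total_below_1030, 4))
```

Note that the standard deviation of the total is 4√2, about 5.66 grams, not 8 grams.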
Question 7
A survey was conducted after students had graduated from the community college. The following data were
the grade point averages and the monthly incomes of six graduates of the associate degree in business
administration.
GPA, X Monthly Salary (000), Y
2.6 13
2.9 16
3.2 15
3.4 19
3.5 14
3.6 16
(a) Present the relationship between the grade point average and the monthly income by calculating the
correlation coefficient of the above data and comment on it.
(b) Present the relationship between the grade point average and the monthly income by finding the
values of a and b in the regression line y = a + bx.
(c) Interpret the values of a and b of the regression line in part (b).
(d) Use the regression line in part (b) to estimate the monthly income for an associate degree in business
administration graduate who has a GPA of 4.0. Comment on the reliability of the estimation with
reason.
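The correlation coefficient and the least-squares line in Question 7 can both be built from the sums of squares Sxx, Syy and Sxy. The sketch below computes them directly so each step matches a hand calculation; Y is kept in thousands, as in the table.

```python
from math import sqrt

gpa = [2.6, 2.9, 3.2, 3.4, 3.5, 3.6]       # X
income = [13, 16, 15, 19, 14, 16]          # Y, in thousands

n = len(gpa)
x_bar = sum(gpa) / n
y_bar = sum(income) / n

s_xx = sum((x - x_bar) ** 2 for x in gpa)
s_yy = sum((y - y_bar) ** 2 for y in income)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gpa, income))

r = s_xy / sqrt(s_xx * s_yy)               # (a) correlation coefficient
b = s_xy / s_xx                            # (b) slope
a = y_bar - b * x_bar                      # (b) intercept
y_hat_at_4 = a + b * 4.0                   # (d) estimate at GPA 4.0

print(round(r, 4), round(a, 4), round(b, 4), round(y_hat_at_4, 4))
```

A GPA of 4.0 lies outside the observed range of 2.6 to 3.6, so the estimate in part (d) is an extrapolation.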
Question 8
The personnel department of a company decides to investigate whether the computer training course offered
by the software company is effective. Among all employees who have taken the training course, 15
employees are selected at random and a test is given to them to evaluate their performances. The test scores
are given as follows:
38 47 88 66 44 47 52 67 68 41 55 77 62 66 57
It is reasonable to assume that the test scores are normally distributed with a population standard deviation of 13
marks.
Another sample of 31 employees is selected randomly. Among the 31 randomly selected employees, 25 of
them evaluate the training course positively.
(a) Test, at the 5% significance level, whether the population mean test score is lower than 60 marks. Give
a full report which must include (i) null hypothesis and alternative hypothesis of the test, (ii) rejection
region, (iii) calculation of the test statistic, and (iv) conclusion.
(b) Construct a 98% confidence interval estimate for the population proportion of employees who evaluate
the training course positively.
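Since the population standard deviation is given, part (a) of Question 8 can be treated as a one-tailed z-test, and part (b) as a large-sample interval for a proportion. The sketch below assumes those two methods and obtains the normal critical values from `statistics.NormalDist` rather than from Table I.

```python
from math import sqrt
from statistics import NormalDist, mean

scores = [38, 47, 88, 66, 44, 47, 52, 67, 68, 41, 55, 77, 62, 66, 57]
sigma = 13                                   # known population standard deviation
std_normal = NormalDist()

# (a) H0: mu = 60 against H1: mu < 60 at the 5% level (left-tailed z-test).
z_stat = (mean(scores) - 60) / (sigma / sqrt(len(scores)))
z_crit = std_normal.inv_cdf(0.05)            # about -1.645
reject_h0 = z_stat < z_crit

# (b) 98% confidence interval for the proportion of positive evaluations.
p_hat = 25 / 31
z_98 = std_normal.inv_cdf(0.99)              # about 2.326
margin = z_98 * sqrt(p_hat * (1 - p_hat) / 31)
ci_low, ci_high = p_hat - margin, p_hat + margin

print(round(z_stat, 4), reject_h0, round(ci_low, 4), round(ci_high, 4))
```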
Question 9
The distribution of the number of optical mice, X, bought by a customer in a single purchase at a computer
shop, is given in the following table:
x 1 2 3 4 5 6 >6
Probability 1/12 2/12 3/12 k 2/12 1/12 0
(a) Find the value of k.
(b) Find the expected value of X.
(c) Find the variance of X.
The shopkeeper charges $40 for each optical mouse, and the cost of each optical mouse is $10.
(d) Using the answers in (b) and (c) or otherwise, find the expected value and variance of the profit made
from each customer in a single purchase of optical mice.
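Question 9 uses three standard results: probabilities sum to 1, Var(X) = E(X²) - [E(X)]², and for a constant c, E(cX) = cE(X) and Var(cX) = c²Var(X). A minimal sketch with exact fractions, assuming the profit per mouse is the selling price minus the cost:

```python
from fractions import Fraction

# Known probabilities; P(X = 4) = k is the unknown.
probs = {1: Fraction(1, 12), 2: Fraction(2, 12), 3: Fraction(3, 12),
         5: Fraction(2, 12), 6: Fraction(1, 12)}

k = 1 - sum(probs.values())                  # (a) probabilities must sum to 1
probs[4] = k

mean_x = sum(x * p for x, p in probs.items())                  # (b) E(X)
var_x = sum(x ** 2 * p for x, p in probs.items()) - mean_x ** 2  # (c) Var(X)

# (d) Profit = (40 - 10) * X per purchase.
profit_per_mouse = 40 - 10
mean_profit = profit_per_mouse * mean_x
var_profit = profit_per_mouse ** 2 * var_x

print(k, mean_x, var_x, mean_profit, var_profit)
```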
Question 10
As part of a survey conducted by a travel agency, the following information is collected from a sample of
passengers who use different kinds of transportation to the airport:
                             Kinds of transportation
Purpose of the trip    Taxi    Bus    Railway    Total
On vacation            23      62     33         118
Business trip          47      28     7          82
Total                  70      90     40         200
The railway director once said that "30% of the passengers would take railway to the airport". Using the
information provided in the table, test, at the 5% significance level, if the statement made by the director is
correct. Give a full report which must include (i) null hypothesis and alternative hypothesis of the test, (ii)
rejection region, (iii) calculation of the test statistic, and (iv) conclusion.
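One way to approach Question 10 is a two-tailed z-test for a single proportion, using the railway column total (40 of 200 passengers). The sketch below assumes that test and the usual 5% two-tailed critical value of 1.960 (the "Inf." row of Table II).

```python
from math import sqrt

n = 200                     # passengers surveyed
railway = 40                # passengers who took the railway
p0 = 0.30                   # director's claimed proportion (H0: p = 0.30)

p_hat = railway / n
# The standard error uses p0 because it is computed under the null hypothesis.
z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

Z_CRIT_TWO_TAILED_05 = 1.960
reject_h0 = abs(z_stat) > Z_CRIT_TWO_TAILED_05

print(round(z_stat, 4), reject_h0)
```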
Question 11
A manager of a bank is studying the credit records of his clients. He finds that the number of mortgages
being held by each of his clients has the following probability distribution:
x 1 2 3 4 5 >5
Probability 0.7 4k 3k 2k k 0
Let X be the number of mortgages held by a client.
(a) Find the value of k.
(b) Find the expectation and standard deviation of X.
(c) The bank evaluates the risk of a client's portfolio by using the standard deviation of Y, where
Y = 3X – 5. Find the expectation and standard deviation of Y.
(d) For a randomly selected sample of 9 clients, what is the probability that not more than 2 of them have
exactly 5 mortgages?
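In Question 11, part (c) applies the rules E(aX + b) = aE(X) + b and SD(aX + b) = |a|·SD(X), and part (d) is a binomial probability with n = 9 and success probability P(X = 5). A minimal sketch, assuming the 9 clients are independent:

```python
from fractions import Fraction
from math import comb, sqrt

# Probabilities are 0.7, 4k, 3k, 2k, k for x = 1, ..., 5, so 0.7 + 10k = 1.
k = (1 - Fraction(7, 10)) / 10                                 # (a)
probs = {1: Fraction(7, 10), 2: 4 * k, 3: 3 * k, 4: 2 * k, 5: k}

mean_x = sum(x * p for x, p in probs.items())                  # (b) E(X)
var_x = sum(x ** 2 * p for x, p in probs.items()) - mean_x ** 2
sd_x = sqrt(var_x)

# (c) Y = 3X - 5: the shift changes the mean only; the scale factor 3 changes both.
mean_y = 3 * mean_x - 5
sd_y = 3 * sd_x

# (d) Number of clients (out of 9) with exactly 5 mortgages ~ Binomial(9, P(X = 5)).
p5 = float(probs[5])
p_at_most_2 = sum(comb(9, i) * p5 ** i * (1 - p5) ** (9 - i) for i in range(3))

print(k, mean_x, round(sd_x, 4), mean_y, round(sd_y, 4), round(p_at_most_2, 4))
```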
Question 12
A study is conducted to review the monthly spending of primary and secondary school students on
extra-curricular activities. Students are randomly selected from different schools to take part in the survey. The
summary of the survey results is as follows:
Total number of interviewees Average spending ($)
Primary school students 40 2500
Secondary school students 45 2700
(a) Find the combined sample mean monthly spending on extra-curricular activities of these 85 students.
(b) Referring to the Excel summary report below, prepare a report using the p-value approach, at the 5%
significance level, to discuss whether there is sufficient evidence to conclude that the population mean
monthly spending on extra-curricular activities by secondary school students is higher than that by primary
school students. (The test report must include (i) null hypothesis and alternative hypothesis, (ii)
calculated test statistic, (iii) p-value and (iv) conclusion with reason.)
t-Test: Two-Sample Assuming Equal Variances
                                Primary school student    Secondary school student
Mean                            2500                      2700
Variance                        56923.0769                68181.8182
Observations                    40                        45
Pooled Variance                 62891.5663
Hypothesized Mean Difference    0
df                              83
t Stat                          -3.6700
P(T<=t) one-tail                0.0002
t Critical one-tail             1.6634
P(T<=t) two-tail                0.0004
t Critical two-tail             1.9890
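The combined mean in part (a) is a weighted average of the two group means, and the pooled variance and t statistic in the Excel report can be reproduced from the summary figures alone. The sketch below recomputes them as a consistency check; it does not recompute the p-values, which are taken from the report.

```python
from math import sqrt

# Summary figures copied from the table and the Excel report.
n1, mean1, var1 = 40, 2500, 56923.0769     # primary school students
n2, mean2, var2 = 45, 2700, 68181.8182     # secondary school students

# (a) Combined sample mean: total spending divided by total number of students.
combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

# (b) Pooled variance and equal-variance two-sample t statistic (primary minus secondary).
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
t_stat = (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))

print(round(combined_mean, 2), round(pooled_var, 4), round(t_stat, 4))
```

The statistic is negative because Excel subtracts the second sample mean from the first; for the one-tailed alternative that secondary school students spend more, the one-tail p-value in the report is the one compared with 0.05.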
Question 13
(a) A convenience store company with 2000 convenience stores wants to conduct a survey to understand
the sales of frozen food products in its stores. Questionnaires are sent to a sample of 40
convenience stores, and the corresponding store managers are responsible for completing the
questionnaires. The numbers of convenience stores classified by store size as Large, Medium,
Small and Tiny are as follows:
Store size Large Medium Small Tiny
Number of convenience stores 750 600 550 100
(ai) What is the population of this study?
(aii) If the sample is selected by stratified sampling method, with stores of different sizes to form the 4
strata, how many (I) Large sized and (II) Medium sized convenience stores should be included in
the sample?
(b) Suppose the travelling time from Mong Kok to Sha Tin by bus is normally distributed with a mean of 45
minutes and a standard deviation of 22 minutes, while the travelling time from Mong Kok to Sha Tin by
MTR is normally distributed with a mean of 55 minutes and a standard deviation of 12 minutes. It is now
8:15 pm and you are in Mong Kok. Which travelling method gives you a higher chance of
arriving in Sha Tin before 9:00 pm?
I: by bus
II: by MTR
(c) In a bookstore, the spending of a customer on a comic book is known to follow a normal distribution
with a population mean of $120 and a population standard deviation of $25. Suppose each customer
gets a 5% discount today. Find the median spending on a comic book in the bookstore after the
discount.
(d) In a company with 900 employees, 400 of them have more than 5 years working experience. Among
all employees, 200 of them are university graduates. For those who have more than 5 years working
experience, 150 of them are university graduates. One employee is randomly selected from the
company, and he says that he has less than 5 years working experience. What is the probability that he
is a university graduate?
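Parts (aii), (b) and (d) of Question 13 each reduce to a short calculation. The sketch below assumes proportional allocation for the stratified sample (each stratum sampled in proportion to its size) and that departure is at 8:15 pm exactly, leaving 45 minutes of travelling time.

```python
from fractions import Fraction
from statistics import NormalDist

# (aii) Proportional allocation of a sample of 40 stores across the four strata.
strata = {"Large": 750, "Medium": 600, "Small": 550, "Tiny": 100}
sample_size = 40
population = sum(strata.values())                       # 2000 stores
allocation = {size: sample_size * count // population for size, count in strata.items()}

# (b) Probability of arriving within 45 minutes by each method.
p_bus = NormalDist(mu=45, sigma=22).cdf(45)
p_mtr = NormalDist(mu=55, sigma=12).cdf(45)

# (d) Restrict to employees without more than 5 years of experience.
less_experienced = 900 - 400
graduates_less_experienced = 200 - 150
p_graduate = Fraction(graduates_less_experienced, less_experienced)

print(allocation, round(p_bus, 4), round(p_mtr, 4), p_graduate)
```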
For parts (e) to (g)
FAST bus company is planning to raise its fares. The management team wants to know the current
spending patterns of its customers. Choose the appropriate hypothesis test from the list below for parts
(e) to (g).
I: One-tailed z-test for a mean
II: One-tailed t-test for a mean
III: Two-tailed z-test for a mean
IV: Two-tailed t-test for a mean
V: One-tailed z-test for a proportion
VI: Two-tailed z-test for a proportion
VII: ANOVA test
VIII: Chi-square test
(e) Test, at 1% significance level, if the population proportion of customers who spend more than $5000 in
travelling in a month is different from 0.1.
(f) Test, at 5% significance level, if the population mean monthly travelling expenditures of their
customers living in 4 different regions are all the same.
(g) Test, at 1% significance level, if the population mean monthly travelling expenditure of their customers
is higher than $1,000. (Assume the population standard deviation is known)
The entries in Table I are the probabilities that a random variable having the standard normal
distribution will take on a value between 0 and z. They are given by the area of the gray region
under the curve in the figure.
TABLE I NORMAL-CURVE AREAS
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4648 0.4656 0.4664 0.4671 0.4678 0.4685 0.4692 0.4699 0.4706
1.9 0.4713 0.4719 0.4725 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
Also, for z = 4.0, 5.0 and 6.0, the areas are 0.49997, 0.4999997, and 0.499999999.
The entries in Table II are values for which the area to their right under the t distribution with
given degrees of freedom (the gray area in the figure) is equal to α.
TABLE II VALUE OF t
d.f. t0.050 t0.025 t0.010 t0.005 d.f.
1 6.314 12.706 31.821 63.657 1
2 2.920 4.303 6.965 9.925 2
3 2.353 3.182 4.541 5.841 3
4 2.132 2.776 3.747 4.604 4
5 2.015 2.571 3.365 4.032 5
6 1.943 2.447 3.143 3.707 6
7 1.895 2.365 2.998 3.499 7
8 1.860 2.306 2.896 3.355 8
9 1.833 2.262 2.821 3.250 9
10 1.812 2.228 2.764 3.169 10
11 1.796 2.201 2.718 3.106 11
12 1.782 2.179 2.681 3.055 12
13 1.771 2.160 2.650 3.012 13
14 1.761 2.145 2.624 2.977 14
15 1.753 2.131 2.602 2.947 15
16 1.746 2.120 2.583 2.921 16
17 1.740 2.110 2.567 2.898 17
18 1.734 2.101 2.552 2.878 18
19 1.729 2.093 2.539 2.861 19
20 1.725 2.086 2.528 2.845 20
21 1.721 2.080 2.518 2.831 21
22 1.717 2.074 2.508 2.819 22
23 1.714 2.069 2.500 2.807 23
24 1.711 2.064 2.492 2.797 24
25 1.708 2.060 2.485 2.787 25
26 1.706 2.056 2.479 2.779 26
27 1.703 2.052 2.473 2.771 27
28 1.701 2.048 2.467 2.763 28
29 1.699 2.045 2.462 2.756 29
Inf. 1.645 1.960 2.326 2.576 Inf.
The entries in Table III are values for which the area to their right under the chi-square distribution with given
degrees of freedom (the gray area in the figure) is equal to α.
TABLE III VALUES OF χ²
d.f. χ²0.05 χ²0.01 d.f.
1 3.841 6.635 1
2 5.991 9.210 2
3 7.815 11.345 3
4 9.488 13.277 4
5 11.070 15.086 5
6 12.592 16.812 6
7 14.067 18.475 7
8 15.507 20.090 8
9 16.919 21.666 9
10 18.307 23.209 10
11 19.675 24.725 11
12 21.026 26.217 12
13 22.362 27.688 13
14 23.685 29.141 14
15 24.996 30.578 15
16 26.296 32.000 16
17 27.587 33.409 17
18 28.869 34.805 18
19 30.144 36.191 19
20 31.410 37.566 20
21 32.671 38.932 21
22 33.924 40.289 22
23 35.172 41.638 23
24 36.415 42.980 24
25 37.652 44.314 25
26 38.885 45.642 26
27 40.113 46.963 27
28 41.337 48.278 28
29 42.557 49.588 29
30 43.773 50.892 30