Practice Tutorial Pack

University of the Witwatersrand, Johannesburg
School of Statistics and Actuarial Science

Business Statistics
STAT1000/4/A

PRACTICE TUTORIAL PACK

PRACTICE TUTORIALS

1. DESCRIPTIVE STATISTICS
2. CORRELATION AND REGRESSION
3. TIME SERIES
4. PROBABILITY PART I
5. PROBABILITY PART II
6. DISCRETE DISTRIBUTIONS
7. CONTINUOUS VARIABLES AND EXPONENTIAL DISTRIBUTION
8. NORMAL DISTRIBUTION AND CENTRAL LIMIT THEOREM
9. CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
10. p-VALUES AND CHI-SQUARED DISTRIBUTION


1. DESCRIPTIVE STATISTICS

Question 1
In a household survey the respondents were asked how many children there are in the household. The
following data were recorded:

0 0 0 0 1 1 1 1 1 1 1 1 2 2 2
2 2 2 3 3 3 4 4 5 5

1.1 Calculate the mean, median, range and variance for the above data.
1.2 Construct a frequency table for the above data. List the frequency and relative frequency.
1.3 Recalculate the measures in Q1.1 for the grouped data from Q1.2. Compare the results from Q1.1
and Q1.3 and comment on the differences / similarities.
1.4 Plot a bar chart of the grouped data from Q1.2. Which is the best measure of central tendency for
this data? Justify your answer.

Question 2
At a meeting of information systems officers for regional offices of a national company, a survey was taken
to determine the number of employees the managers supervised in the operation of their departments. The
data are given as follows:

3 1 5 3 5
5 2 2 4 2
5 3 4 1 1
3 3 4 3 1
3 1 5 4 3

2.1 Calculate the mode and coefficient of variation for the above data.
2.2 Construct a frequency table for the above data. List the frequency, cumulative frequency and
cumulative frequency percentage in the table.
2.3 Which one of the following is true?
a) The highest number of employees supervised is 8
b) At least 25 managers were included in the survey
c) 32% of managers supervised at least 2 employees
d) 2 employees are supervised by 3 managers
e) None of the above are true

Question 3
A survey was conducted to determine the number of calls received by a crime stop call centre over a period
of 50 days. The bar chart of the data is as follows:

[Bar chart: frequency against number of calls (1 to 5); bar labels as extracted: 19, 16, 8, 3, 4]

3.1 Calculate the first decile and the interquartile range for this data.
3.2 What proportion of days received less than 3 calls?

Question 4
A financial institution is interested in the number of store cards a person uses. The data in the bar chart
below were collected from individuals whose total income exceeds R10 000 per month.

[Bar chart: frequency against number of store cards (0 to 5); bar labels as extracted: 53, 38, 30, 20, 12, 7]

Which of the following statements is / are false?
a) 30% of people do not use store cards
b) No more than 5 store cards are used
c) Since the data are positively skewed, the mean is larger than the median
d) About 12% of people use at least 4 store cards
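The comparison asked for in Q1.1 and Q1.3 can be checked numerically. The following is a minimal sketch, not a model answer: it assumes the 25 household values listed in Question 1 were recovered correctly from the page, and it assumes the sample variance (divisor n - 1) is the one intended. The variable names are illustrative only.

```python
from statistics import mean, median, variance

# Raw data from Question 1 (assumed to be the 25 values shown in the question).
children = [0] * 4 + [1] * 8 + [2] * 6 + [3] * 3 + [4] * 2 + [5] * 2

raw_mean = mean(children)
raw_median = median(children)
raw_range = max(children) - min(children)
raw_variance = variance(children)  # sample variance, divisor n - 1 (assumption)

# Frequency table (Q1.2) and the same measures recomputed from it (Q1.3).
freq = {0: 4, 1: 8, 2: 6, 3: 3, 4: 2, 5: 2}
n = sum(freq.values())
grouped_mean = sum(x * f for x, f in freq.items()) / n
grouped_variance = (sum(f * x * x for x, f in freq.items()) - n * grouped_mean ** 2) / (n - 1)

print(raw_mean, raw_median, raw_range, round(raw_variance, 4))
print(grouped_mean, round(grouped_variance, 4))
```

Because each class in the frequency table is a single value rather than an interval, no information is lost by grouping, so the two sets of measures agree up to floating-point rounding.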
Question 5
The data in the following table are the average maturities (in days) of a sample of money market funds.

11 20 22 24 26 28
13 20 22 24 26 28
17 21 23 24 27 30
18 21 23 25 27 32
19 21 24 25 28 34

5.1 Calculate the mean, variance and range for the above data.
5.2 Group the data into intervals of width 5, where the lower limit of the first class is 10 (inclusive). List
the frequency and cumulative frequency of each class.
5.3 Plot an ogive from the grouped data in Q5.2. What proportion of funds mature within 25 days?
5.4 Recalculate the measures in Q5.1 for the grouped data from Q5.2. Compare the results from Q5.1
and Q5.4 and comment on the differences / similarities.

Question 6
The following table depicts the speeds of 140 drivers at a certain point on a main road, where the speed limit
is 80 km/h.

Driver speed      Frequency percentage
60 ≤ x < 65       10
65 ≤ x < 70       13
70 ≤ x < 75       17
75 ≤ x < 80       25
80 ≤ x < 85       27
85 ≤ x < 90       7
90 ≤ x < 95       1

6.1 Calculate the mode and interquartile range for this data.
6.2 Construct a frequency polygon for the above data.
6.3 Comment on the shape of the distribution in terms of symmetry and modality. Which is the best
measure of variability for this data? Justify your answer.

Question 7
The data in the following table represent scores achieved by a number of job applicants on a personality
profile.

Score        Relative frequency
[10, 20)     0.04
[20, 30)     0.09
[30, 40)     0.16
[40, 50)     0.21
[50, 60)     0.24
[60, 70)     0.26

State, with reason, whether the following statements are true or false.
a) If there were 60 applicants, 30 of them would score not more than 50
b) If there were 25 applicants, approximately 7 would score in the top 25%
c) If there were 100 applicants, 25 of them would score at least 20 but less than 40
d) If approximately 38 applicants scored less than 40, then 131 were tested in total

Question 8
A study was conducted to determine the number of times a student logs on to Facebook in a two-week
period. The data obtained from a sample of 110 students are summarized as follows:

Class           Relative frequency
2 ≤ x < 4       0.21
4 ≤ x < 6       0.29
6 ≤ x < 8       0.26
8 ≤ x < 10      0.13
10 ≤ x < 12     0.08
12 ≤ x < 14     0.03

State, with reason, whether the following statements are true or false.
a) The best measure of central tendency for this data is the IQR
b) The median is contained in the second class interval
c) 24% of students log on to Facebook more than 8 times during a 2-week period
d) The second decile is in position 2.2
Question 9
An accountant for a retail store summarised the account balances of a random sample of customers. The
cumulative frequency percentage is given as follows:

Class                  Cumulative frequency percentage
0 to under 200         10
200 to under 400       26
400 to under 600       66
600 to under 800       88
800 to under 1000      100

9.1 If 90 customers were sampled, how many account balances are between 600 and 800?
a) 0.88
b) 20
c) 700
d) 79
9.2 If 150 customers were sampled, calculate the 68th percentile for the above data.

Question 10
The number of accounts evaluated during a company audit depends on the size of the company. The
following ogive is based on the number of accounts that require checking for 92 companies.

[Ogive: cumulative frequency against number of accounts; plotted points (25, 0), (30, 2), (35, 8), (40, 31), (45, 72), (50, 85), (55, 89), (60, 92)]

State, with reason, whether the following statements are true or false.
a) For 72 companies exactly 45 accounts must be checked
b) No more than 92 accounts must be checked
c) Between 35 and 40 accounts must be checked for 25% of companies
d) The modal number of accounts to check is 41
e) In the ogive above the midpoints are plotted against the cumulative frequencies

Question 11
The number of times a customer shops at her store during a 2-week period. The data collected are described
by the following ogive for a sample of 60 customers.

[Ogive: cumulative frequency against number of shopping trips; plotted points (0, 0), (3, 11), (6, 35), (9, 50), (12, 55), (15, 59), (18, 60)]

Which one of the following statements is true?
a) The median value is 9
b) 35% of customers shopped less than 6 times in the 2 weeks
c) 55 customers shopped exactly 12 times in the 2 weeks
d) None of the above are true

Question 12
The following stem-and-leaf display represents the times (to the nearest minute) taken by 25 Business
Statistics students to carry out a sequence of data analyses using Excel (leaf unit = 1). Before this test, all
students received the same training in keyboard skills and the use of that particular computer. During the
test, conditions were kept as similar as possible.

Stem | Leaf
0    | 9
1    | 4 5
2    | 0 0 0 3 4
3    | 1 1 3 4 5 8
4    | 2 3 5 5 7 8 8
5    | 0 1 2 2
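For Q9.2, one common way to read a percentile off a cumulative frequency table is linear interpolation within the class that contains it. The sketch below assumes that method, and assumes the first class starts at 0 as the table states; the function name `percentile_from_ogive` is hypothetical and the course may prescribe a slightly different positional rule.

```python
def percentile_from_ogive(upper_bounds, cum_pct, p, lower_start=0.0):
    """Interpolate the p-th percentile from class upper bounds and cumulative percentages."""
    prev_bound, prev_cum = lower_start, 0.0
    for bound, cum in zip(upper_bounds, cum_pct):
        if p <= cum:
            # Fraction of the way through this class at which p is reached.
            share = (p - prev_cum) / (cum - prev_cum)
            return prev_bound + share * (bound - prev_bound)
        prev_bound, prev_cum = bound, cum
    raise ValueError("p exceeds the final cumulative percentage")

# Account balance classes from Question 9.
upper_bounds = [200, 400, 600, 800, 1000]
cum_pct = [10, 26, 66, 88, 100]

p68 = percentile_from_ogive(upper_bounds, cum_pct, 68)
print(round(p68, 2))
```

Working in percentages means the result of this sketch does not depend on the sample size quoted in the question.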
12.1 Calculate the variance for this data.
12.2 Which one of the following statements is true?
a) The mean is greater than the median
b) The best measure of central tendency for this data is the interquartile range
c) The data are skewed to the right
d) Since the data are represented in a graph, the mean can only be estimated
e) Considering the shape of the distribution only, not the raw data, the mean is less than the mode

Question 13
The following stem-and-leaf display represents the number of vitamin supplements sold by a health food
store over a period of 16 days. Leaf unit = 1.

Stem | Leaf
1*   | 9 9
2    | 0 0 2 3
2*   | 5 6 7
3    | 0 3 4
3*   | 5 6 8
4    |
4*   | 5

13.1 Determine the sample size, mode and coefficient of variation for this data.
13.2 Describe the shape of the data in terms of symmetry.

Question 14
The following stem-and-leaf plot shows the number of units produced per day in a factory (leaf unit = 1).

Stem | Leaf
5    | 6 6
6    | 0 1 3 3 5 5 9
7    | 0 2 3 6 7 7 8
8    | 1 4 5 5 9
9    | 0 0 1 6
10   | 3 6

14.1 Which one of the following statements about the best measure of central tendency is true?
a) The mean is better than the median because the mean takes all readings into account
b) The median is better than the mean because the mean will be deflated by the large number of
relatively small readings
c) The mean is better than the median because the median will be inflated by the tail on the
right-hand side of the distribution
d) The median is better than the mean because the mean will be distorted by the relatively small
number of relatively large readings
e) The mean is better than the median because the median will be distorted by the large number of
relatively small readings
14.2 Calculate the range, variance and upper quartile for this data.

Question 15
A survey was conducted to determine how people rated the quality of programming available on television.
Respondents were asked to rate the overall quality from 0 (no quality at all) to 100 (extremely good quality).
The stem-and-leaf display of the data is as follows (leaf unit = 1):

Stem | Leaf
3    | 2 4
4    | 0 3 4 7 8 9 9 9
5    | 0 1 1 2 3 4 5
6    | 1 2 5 6 6
7    | 0 1
8    | 2

15.1 Calculate the mean, mode and 80th percentile for this data.
15.2 State, with reason, whether the following statements are true or false.
a) The best measure of variability for this data is the median
b) The median is the point that has exactly 50 values below it
c) The 35th percentile is a number between 9 and 10
d) The values calculated in Q15.1 are parameters
2. CORRELATION AND REGRESSION

Question 1
1.1 The scatter diagram below was obtained after plotting Y against X. The correlation coefficient for
this data:
a) Is close to minus one
b) Is close to zero
c) Is close to one
d) Indicates that the slope is positive
e) Indicates that the slope is negative

[Scatter diagram: Y (vertical axis, 20 to 80) against X (horizontal axis, 15 to 35)]

1.2 The scatter diagram below was obtained after plotting Y against X. Describe the relationship
between X and Y. Would it be appropriate to calculate a linear regression line for these data? Justify
your answer.

[Scatter diagram: Y (vertical axis, 0 to 120) against X (horizontal axis, 0 to 14)]

Question 2
A study was done to determine whether the test mark achieved by a student for a test written on Monday
morning was affected by the amount of television watched (in hours) during the weekend. The following
information was obtained.

Test Mark            96  85  82  74  95  68  84  58  65  75  50
Hours watching TV     0   1   2   3   3   5   5   6   7   7  10

2.1 Identify the independent and dependent variables and explain your choices.
2.2 The estimated linear model for this data is given by:
a) ŷ = 93.9700 [sign lost] 4.0674x
b) ŷ = 93.8197 [sign lost] 4.0820x
c) ŷ = 17.3558 [sign lost] 0.1699x
d) ŷ = 17.3123 [sign lost] 0.1699x
e) None of the above
2.3 Calculate the error term for i = 1.
2.4 State whether the following statements are true or false. Give reasons for your answers.
a) If a student watched TV for 12 hours over the weekend, we would expect their test mark to be
57.08%
b) If a student obtained a mark of 37%, they had watched 14 hours of TV over the weekend
c) If a student watched 6 hours of TV over the weekend, their predicted test mark would be 58%.
d) If a student watched 4 hours of TV over the weekend, their predicted test mark would be 77.5%
2.5 Calculate and interpret the coefficient of determination.

Question 3
The marketing manager of a group of supermarkets would like to assess the relationship between shelf space
(in feet) and sales (in Rands) of a dishwashing liquid. The following data were collected from a random
sample of equal-sized stores:

Shelf Space (A)      2    3    4    5    6    7    8    9
Weekly sales (B)   176  194  212  226  236  244  254  261

3.1 Identify the role of each variable. Justify your selection.
3.2 Determine the least squares regression line to predict the impact of shelf space on sales.
3.3 Calculate the error term for i = 3.
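The least squares estimates for the television data in Question 2 can be reproduced from the raw pairs. This is a minimal sketch under two assumptions: that hours of television is treated as the independent variable and test mark as the dependent variable, and that the eleven pairs in the table were recovered correctly. The variable names are illustrative.

```python
hours = [0, 1, 2, 3, 3, 5, 5, 6, 7, 7, 10]          # x: hours watching TV
marks = [96, 85, 82, 74, 95, 68, 84, 58, 65, 75, 50]  # y: test mark

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(marks) / n

# Corrected sums of squares and cross products.
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in marks)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, marks))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r_squared = sxy ** 2 / (sxx * syy)  # coefficient of determination

# Error term for the first observation (observed minus predicted).
residual_1 = marks[0] - (intercept + slope * hours[0])

print(round(intercept, 4), round(slope, 4), round(r_squared, 4), round(residual_1, 4))
```

Swapping the roles of the two lists would give the regression of hours on marks instead, which is a different line and should not be used to predict marks.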
3.4 The strength of the linear relationship between shelf space and weekly sales is given by:
a) –0.985
b) 0.985
c) 0.970
d) –0.970
e) 0.992
3.5 State whether the following statements are true or false. Give reasons for your answers.
a) It is statistically valid to use the least-squares regression equation calculated in Q3.2 to
predict the shelf space when the weekly sales value is 220
b) The coefficient of determination indicates that there is a strong positive linear relationship
between the weekly sales and shelf space
c) An increase in shelf space results in an increase in weekly sales
d) If a randomly selected store has 7.5 feet of shelf space, the store’s estimated weekly sales is
R249.2
e) It is statistically valid to use the least-squares regression equation calculated in Q3.2 to
predict the weekly sales value when the shelf space is 1 foot
3.6 Calculate the amount of variation in the weekly sales that is explained by shelf space in the
regression equation.

Question 4
In a chemical manufacturing company, it is thought that the yield (Y) of a chemical process is related to the
various fixed levels of temperature (T). The following summary statistics were obtained from a sample of 8
chemical processes:

ΣY = 33.5    ΣT = 26    ΣT² = 95    ΣY² = 160.75    ΣTY = 95
min(Y) = 1.5    max(Y) = 6.2    min(T) = 1.5    max(T) = 5.0

4.1 Calculate the appropriate least squares regression line of yield on temperature.
4.2 Consider the following four statements and state which are true or false, and justify your answers.
a) The regression line found in Q4.1 indicates that chemical yield decreases with an increase in
temperature
b) Using the regression line found in Q4.1, if the temperature is 7, the predicted yield is 0.76
c) It is statistically valid to use the regression line found in Q4.1 to predict the temperature when
the chemical yield is 6
d) It is statistically valid to use the regression line found in Q4.1 to predict the chemical yield when
the temperature is 3.6

Question 5
Given the following summary statistics for 10 pairs of observations of (X, Y):

Σx = 51    Σx² = 333    Σy = 70    Σy² = 516    Σxy = 386
min x = 4    max x = 22    min y = 3    max y = 19

5.1 The estimated equation for the regression of Y on X is:
a) Ŷ = 4.971 [sign lost] 0.398X
b) Ŷ = 4.971 [sign lost] 0.398X
c) Ŷ = 4.971 [sign lost] 0.398X
d) Ŷ = 0.398 [sign lost] 4.971X
e) Ŷ = 4.971 [sign lost] 0.398X
5.2 Which one of the following statements is true?
a) It is statistically valid to use the least-squares regression equation calculated in Q5.1 to
predict X for a Y value of 15.
b) The correlation coefficient is positive, which means that as X increases, Y tends to increase.
c) The correlation coefficient is negative, which means that as X increases, Y tends to decrease.
d) It is statistically valid to use the least-squares regression equation calculated in Q5.1 to
predict Y for an X value of 3.
e) If the correlation coefficient is 0 there is no linear relationship between the two variables.
5.3 What percentage of the total variation in Y is not explained by variation in X?
5.4 Would you predict Y for an X value of 15? Justify your answer.

Question 6
State, with reason, which of the following statements are true / false.
a) The coefficient of determination indicates the strength and direction of a linear relationship
b) The least squares regression model regresses variable X on variable Y
c) If Ŷ = 3.5 [sign lost] 0.5X, then for every one unit change in X there is a 3 unit change in Y.
d) The correlation coefficient can take on any numerical value
e) If r = 0.9, then 90% of the variation in Y is accounted for by variation in X in the model
f) The dependent variable influences the independent variable
g) The error term shows the difference between the observed and predicted values of the
independent variable
h) It is statistically valid to predict the value of Y, provided that the answer lies in the valid range of
the dependent variable
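When only summary statistics are given, as in Question 5, the slope and intercept follow from the computational forms of the corrected sums. The sketch below assumes the five sums were recovered correctly from the page (Σx = 51, Σx² = 333, Σy = 70, Σxy = 386, n = 10); the variable names are illustrative.

```python
n = 10
sum_x, sum_x2 = 51, 333
sum_y, sum_xy = 70, 386

# Corrected sums from the raw totals.
sxx = sum_x2 - sum_x ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

slope = sxy / sxx
intercept = sum_y / n - slope * sum_x / n

print(round(intercept, 3), round(slope, 3))
```

The same two formulas apply to Question 4 with T in place of x and Y in place of y.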
3. TIME SERIES

Question 1
The management of a chain of motor vehicle dealerships is interested in forecasting vehicle sales at their
dealerships on a quarterly basis. Management have sales data starting from the 4th quarter of 2009. The data
and some partial intermediate results are presented in Table 1 (all intermediate calculations have been
rounded to four decimal places). A graph of the data appears in Figure 1.

Quarter   Time   Yt    CMAVt
4         1      143   -
1         2      131   -
2         3      122   133.000
3         4      135   133.875
4         5      145   134.625
1         6      136   134.750
2         7      123   134.625
3         8      135   X
4         9      144   134.000
1         10     133   134.250
2         11     125   134.625
3         12     135   135.500
4         13     147   -
1         14     137   -
Table 1: Decomposition Analysis for Number of Car Sales

[Figure 1: Plot of car sales vs. time]

1.1 Which one of the following statements is correct?
a) An additive model is not appropriate since the amplitude of the series is constant
b) Winters model is appropriate since the amplitude of the series is constant
c) A multiplicative model is appropriate since the amplitude of the series is constant
d) An additive model is appropriate since the amplitude of the series is constant
e) A multiplicative model is not appropriate since there is seasonality in the series
1.2 The value of X in Table 1 is:
a) 134.000
b) 134.125
c) 134.250
d) 134.625
e) 134.750
1.3 The corrected seasonal estimate for the 4th quarter is (using the method of means and rounding
intermediate calculations to four decimal places):
a) –10.7343
b) 1.1250
c) 10.1719
d) 10.2031
e) 10.8848
1.4 If Tt = 132.2 [sign lost] 0.28t, what is the forecasted sales in the 3rd quarter of 2015 (to one decimal place)?

Question 2
A large toy manufacturer is analysing quarterly sales figures (in units sold) of one of their products. They
are particularly interested in forecasting sales for the 4th quarter of 2013 (which includes the Christmas
period). The data are given in Table 2 and depicted in Figure 2.

2.1 An inexperienced analyst uses Brown’s Simple Exponential Smoothing to obtain a forecast for the
4th quarter of 2013. If he uses a smoothing constant of 0.25 and finds that A12 = 228.1619, his
forecast for 2013/4 will be (to two decimal places):
a) 166.54
b) 207.62
c) 228.16
d) 287.66
e) 440.00
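The CMAVt column in Table 1 is a centred moving average over four quarters: the mean of two overlapping four-quarter averages. The sketch below assumes that definition, which reproduces the printed entries of Table 1; the function name `centred_moving_average` is hypothetical.

```python
def centred_moving_average(y, period=4):
    """Centred moving average for an even seasonal period; None where undefined."""
    half = period // 2
    cma = [None] * len(y)
    for t in range(half, len(y) - half):
        first = sum(y[t - half:t + half]) / period           # window ending one step ahead
        second = sum(y[t - half + 1:t + half + 1]) / period  # window shifted by one period
        cma[t] = (first + second) / 2
    return cma

# Quarterly car sales from Table 1 (Time = 1 to 14).
car_sales = [143, 131, 122, 135, 145, 136, 123, 135, 144, 133, 125, 135, 147, 137]
cma = centred_moving_average(car_sales)

# List index 0 corresponds to Time = 1, so Time = 8 is index 7.
print(cma[2], cma[3], cma[7])
```

The first two and last two entries are undefined because a full four-quarter window is not available on both sides.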
Date      Yt (Sales)   CMAVt     De-trended Yt   De-seasonalised Yt
2010/1    74           -         -               119.1
2010/2    102          -         -               127.3
2010/3    92           135.250   0.6802          138.9
2010/4    265          140.750   Y               138.4
2011/1    90           147.125   0.6117          144.9
2011/2    130          160.625   0.8093          162.2
2011/3    115          174.750   0.6581          173.6
2011/4    350          X         1.9231          182.8
2012/1    118          189.375   0.6231          189.9
2012/2    160          204.250   0.7834          199.7
2012/3    144          219.000   0.6575          217.4
2012/4    440          -         -               229.8
2013/1    146          -         -               235.0
Table 2: Quarterly Toy Sales

[Figure 2: Toy sales against quarter]

2.2 The missing value labelled X in Table 2 is:
a) 182.000
b) 185.750
c) 193.000
d) 191.188
e) 215.500
2.3 The missing value labelled Y in Table 2 is:
a) 0.2500
b) 0.8014
c) 1.7500
d) 1.8828
e) 1.9150
2.4 The corrected seasonal index for the 4th quarter (using the method of medians and rounding all
intermediate calculations to 4 decimal places) is:
a) 0.2500
b) 0.8014
c) 1.7500
d) 1.8828
e) 1.9150
2.5 The forecast for the 4th quarter of 2013 using the classical decomposition method is:
a) 234.8
b) 265.50
c) 445.9
d) 504.8
e) 521.4

Question 3
A manager of a grocery store believes that if he uses the sales data from his store to forecast for future sales
it would assist him in correctly estimating the quantity of stock that he should be ordering. The quarterly
sales data are given in Table ? and in Figure ?.

[Figure: Plot of grocery sales versus time]
Year   Quarter   Time   Sales   Estimated   De-trended   Corrected Seasonal   De-seasonalised
                                Trend       data         Indices              Data
2010   1         1      107     -           -            -                    189.302
2010   2         2      146     -           -            -                    135.393
2010   3         3      177     138.375     1.279        -                    128.010
2010   4         4      130     137.250     0.947        -                    133.509
2011   1         5      94      139.625     0.673        -                    166.302
2011   2         6      150     143.000     1.049        -                    139.102
2011   3         7      192     142.750     1.345        1.383                138.859
2011   4         8      142     141.500     1.004        0.974                145.833
2012   1         9      80      A           ?            B                    141.534
2012   2         10     154     147.500     1.044        -                    142.811
2012   3         11     220     145.000     1.517        -                    159.109
2012   4         12     130     144.750     0.898        -                    133.509
2013   1         13     72      149.250     0.482        -                    127.380
2013   2         14     160     144.000     1.111        -                    148.375
2013   3         15     250     -           -            -                    180.806
2013   4         16     58      -           -            -                    59.565
Table: Decomposition values of grocery sales data

3.1 The manager first tries to use Brown’s smoothing method with a smoothing constant of 0.3 and he
obtains A14 = 137.277. Thus the forecast for sales for quarter 2 in 2014 is (correct to 3 decimal
places):
a) 67.211
b) 96.102
c) 137.166
d) 139.277
e) 171.094
3.2 He then decides to check if a smoothing constant of 0.3 is appropriate. He does the calculations for a
smoothing constant of 0.2 and constructs the following table for his findings.

Smoothing constant           0.2       0.3
Mean error (ME)              8.122     4.399
Mean absolute error (MAE)    37.243    34.441
Mean square error (MSE)      2131.81   1757.75

Which one of the following statements is correct?
a) α = 0.2 is better than α = 0.3 since it has a higher ME.
b) The ME is the best value to use to determine which smoothing constant is best.
c) The smoothing constant producing the highest MSE is the best smoothing constant.
d) α = 0.3 is better than α = 0.2 since it has a lower MSE.
e) α = 0.2 is better than α = 0.3 since it has a higher MAE.
3.3 Refer to Figure ?. Which one of the following statements is correct?
a) Since the data exhibits a constant trend, the additive model is appropriate.
b) Since the data exhibits trend and seasonality, the multiplicative model is appropriate.
c) If one were to employ smoothing methods, Holts would be the most appropriate method to use.
d) A multiplicative model is appropriate since the amplitude of the series is increasing.
e) All of the above statements are correct.
3.4 The CMAV value indicated by X in Table 5 is:
a) 125.333
b) 131.667
c) 138.000
d) 141.000
e) 145.500
3.5 Use the method of medians to find the missing value indicated by B in Table 5.
3.6 Forecast sales for quarter 2 during

Question 4
A small cell phone retailer records the number of handsets that it sells of each brand on a monthly basis in
order to assist with stock control. The data in the following table are the records for one specific brand of
handset over the past three months.

Month   1    2    3
Yt      17   25   23

Using Brown’s simple exponential smoothing with a smoothing constant of 0.15, the forecasted handset
sales in month 4 (to one decimal place) is:
a) 16.3
b) 17.0
c) 17.4
d) 18.2
e) 18.9
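The handset forecast in Question 4 can be traced step by step. The sketch below assumes the usual textbook form of Brown’s simple exponential smoothing, A_t = αY_t + (1 − α)A_(t−1), with the initialisation A_1 = Y_1 and the next-period forecast taken as the last smoothed value; the course notes may use a different starting value. The function name `brown_ses_forecast` is hypothetical.

```python
def brown_ses_forecast(y, alpha):
    """One-step-ahead forecast from simple exponential smoothing."""
    smoothed = y[0]  # assumption: the first smoothed value equals the first observation
    for obs in y[1:]:
        smoothed = alpha * obs + (1 - alpha) * smoothed
    return smoothed  # forecast for the period after the last observation

handsets = [17, 25, 23]
forecast_month_4 = brown_ses_forecast(handsets, alpha=0.15)
print(round(forecast_month_4, 1))
```

The same function applies to Q3.1: once the last smoothed value is known, the forecast for any later period is that value, since simple exponential smoothing has no trend or seasonal component.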
Question 5
A hotel keeps data on the number of guests it has per quarter. The data are plotted in Figure 6 and appear in
Table 7.

[Figure 6: Plot of number of guests versus time]

Year   Quarter   t    Yt    CMAVt     Noisy Seasonality
2010   2         1    260   -         -
2010   3         2    475   -         -
2010   4         3    770   463.125   306.875
2011   1         4    330   476.25    -146.25
2011   2         5    295   A         -205.625
2011   3         6    545   523.125   21.875
2011   4         7    895   538.125   356.875
2012   1         8    385   550.625   B
2012   2         9    360   563.125   -203.125
2012   3         10   580   576.875   3.125
2012   4         11   960   -         -
2013   1         12   430   -         -
Table 7: Decomposition analysis for hotel guests

5.1 Which method of exponential smoothing would you recommend for this data? Justify your answer.
5.2 Which model is most appropriate for decomposition analysis? Justify your answer and give the
formula for this model.
5.3 Suppose Brown’s exponential smoothing with a smoothing constant of 0.1 is used to forecast guests.
If A10 = 415.55, what is the forecasted number of guests in the 3rd quarter of 2013?
5.4 Determine the value of A in Table 7.
a) 467.500
b) 485.000
c) 500.625
d) 516.250
e) 530.000
5.5 Determine the value of B in Table 7.
a) –221.250
b) –165.625
c) –161.250
d) 0.699
e) 4.375
5.6 Use the method of means to find the corrected seasonal indices to three decimal places.
5.7 Estimate the trend line and forecast the number of guests for t = 14. Round all intermediate and final
calculations to 3 decimal places.
4. PROBABILITY PART I

Question 1
A and B are events such that P(A) = 0.3, P(A ∩ B) = 0.2 and P(B) = 0.8. Which one of the following
statements is true? [overbars lost]
a) P(A ∪ B) = 0.8
b) P(A ∩ B) = 0.8
c) P(A ∩ B) = 0.4
d) P(A ∪ B) = 0.4
e) P(A ∩ B) = 0.75

Question 2
Let A and B be two events such that P(A) = 0.7, P(B) = 0.4 and P(A ∩ B) = 0.1. Which one of the
following statements is false? [overbars lost]
a) P(A ∩ B) = 0.5
b) P(A ∩ B) = 0.2
c) P(A ∪ B) = 0.5
d) P(A ∪ B) = 0.2
e) P(A ∩ B) = 0.2

Question 3
A six-sided die is biased in such a way that even numbers are three times as likely as odd numbers. What is
the probability (to three decimal places) of observing at least a four on one roll of this die?

Question 4
The employees of a company were asked questions regarding their educational background and marital
status. Of the 600 employees, 400 had college degrees, 100 were single, and 60 were single college
graduates. Which one of the following statements is false?
a) The probability that an employee of the company is single or has a college degree is 0.733
b) The probability that an employee of the company is married and has a college degree is 340/600
c) The probability that an employee of the company does not have a college degree is 0.75

Question 5
Consider a game of chance in which a fair coin is flipped and a fair six-sided die is rolled. A player wins if
the number on the die is smaller than or equal to 2 or if the coin is a tail and the number on the die is a 6.

5.1 How many outcomes are there in the relevant sample space?
5.2 What is the probability that a player loses any particular game?

Question 6
The probability that house sales will increase in the next 6 months is estimated to be 0.25. The probability
that the interest rates on housing loans will go up in the same period is estimated to be 0.74. The probability
that house sales or interest rates will go up during the next 6 months is estimated to be 0.89. Which one of
the following statements is false?
a) The probability that both house sales and interest rates will increase is 0.10
b) The probability that neither house sales nor interest rates will increase is 0.11
c) The probability that house sales will increase but interest rates will not is 0.15
d) The events of increase in house sales and no increase in house sales are mutually exclusive and
collectively exhaustive
e) The events of increase in house sales and increase in interest rates are mutually exclusive

Question 7
A lecturer notices that 36% of Business Statistics students attended lectures and tutorials. Three of the
students had not attended any lectures and had not attended any tutorials. If attending tutorials and not
attending lectures are mutually exclusive, and a total of 47 of the students attended lectures, then calculate
the probability that a randomly selected student attended lectures but not tutorials.

Question 8
If P(A) = 0.5, P(B) = 0.3 and P(A ∩ B) = 0.2, state whether the following statements are true or
false and give detailed reasons.
a) A and B are exhaustive events
b) A and B are mutually exclusive events
c) A and B are partitioning events

Question 9
Suppose two events A and B exist such that P(A) = 0.45 and P(B) = 0.32. Find the minimum and maximum
probabilities for P(A ∪ B).
Question 10
Consider the events A and B such that P(A) = 0.7 and P(B) = 0.35. Find the smallest and largest values of
P(A [symbol lost] B).

Question 11
A and B are events such that P(A) = 0.70, P(A [symbol lost] B) = 0.40 and P(B) = 0.345. Which one of the following
is false? [overbars lost]
a) P(A [symbol lost] B) = 0.045
b) P(A * B) = 0.700
c) P(A [symbol lost] B) = 0.255
d) P(A * B) = 0.645
e) P(A ∪ B) = 0.745

Question 12
A study reported that there were 67 000 purchasing managers who were male and 33 000 who were female.
There were also 245 000 financial managers who were male and 150 000 who were female. What is the
probability that an individual randomly selected from the 495 000 individuals is either a purchasing manager
or a male?

Question 13
A South African wine club has classified its last 200 customers’ orders according to the criteria in Table 2
below.

                        Age of customer
Type of wine bought     Under 30   30 to 50   Over 50
South African           99         28         16
French                  3          1          18
German                  15         3          9
Other                   2          5          1

13.1 What is the probability that a random customer from the above sample ordered French wine or is
between 30 and 50 years of age?
13.2 What is the probability that a random customer from the above sample did not order German wine
and is 30 years of age or more?

5. PROBABILITY PART II

Question 1
Let A and B denote events in the sample space of a random experiment where P(A) = 0.4 and P(B) = 0.5.
State whether the following statements are true or false:
a) 0.1 ≤ P(A ∩ B) ≤ 0.4
b) P(A ∪ B) ≤ 0.8
c) P(B | A) ≤ 1

Question 2
A and B are events such that P(A) = 0.6, P(A ∩ B) = 0.3 and P(B) = 0.45.

2.1 Calculate P(B | A)
2.2 Calculate P(A | B)

Question 3
B Com students at a university are split into two groups depending on whether they plan to become
chartered accountants (CA) or not (Other). The table below indicates the breakdown of students according to
career plan and gender.

              Other (A)   CA (B)
Female (F)    150         322
Male (M)      126         276

3.1 Calculate P(A | M)
3.2 Calculate P(B | M)
3.3 Calculate P(F | A)
3.4 If two students were selected sequentially at random, what is the probability that the first student does
not want to become a chartered accountant and the second student wants to become a chartered
accountant?
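Questions built on counts, such as the purchasing and financial managers in Part I Question 12, can be checked with the addition rule P(E ∪ F) = P(E) + P(F) − P(E ∩ F). The sketch below uses exact fractions; the dictionary layout and variable names are illustrative choices, not part of the tutorial.

```python
from fractions import Fraction

# Counts from Part I, Question 12, keyed by (role, gender).
counts = {
    ("purchasing", "male"): 67_000,
    ("purchasing", "female"): 33_000,
    ("financial", "male"): 245_000,
    ("financial", "female"): 150_000,
}
total = sum(counts.values())

purchasing = sum(n for (role, _), n in counts.items() if role == "purchasing")
male = sum(n for (_, gender), n in counts.items() if gender == "male")
both = counts[("purchasing", "male")]

# Addition rule: subtract the overlap so male purchasing managers are counted once.
p_purchasing_or_male = Fraction(purchasing + male - both, total)
print(p_purchasing_or_male, float(p_purchasing_or_male))
```

The same pattern, dividing a cell or margin count by the relevant total, gives the conditional probabilities asked for in Part II Question 3 when the total is restricted to the conditioning row or column.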
Question 4
A courier company has two vehicles available for local deliveries. Because of the demand on their time and
the chance of mechanical failure, the probability that a specific vehicle will be available when needed is
0.85. The availability of one vehicle is independent of the other.

4.1 In the event of large orders, what is the probability that both vehicles will be available?
4.2 What is the probability that none of the vehicles will be available?

Question 5
Buyers of large quantities of goods from a supplier often use sampling inspection schemes to judge the
incoming quality of the goods. Suppose that an inspector incorrectly rejects 2% of good shipments and it is
known that 5% of all shipments are of inferior quality. Calculate the probability that a shipment is good and
is rejected.

Question 6
In recent years affirmative action commitments by industrial organisations have led to an increase in the
number of women in executive positions. Suppose a company has vacancies that are to be filled by
randomly selecting two people from a list of seven people. On the list are three women and four men, all of
whom have worked for the company for a long period of time.

6.1 What is the probability that at least one of the women will be selected?
6.2 What is the probability that no woman will be selected?

Question 9
A certain article is visually inspected by two inspectors. When a defective article comes through, the
probability that it is overlooked by the first inspector is 0.05. Thirty percent of those overlooked by the first
inspector will also be overlooked by the second inspector. What fraction of defectives will be overlooked by
both inspectors?

Question 10
Two stores (1 and 2) each sell two brands of microwave ovens (A and B). These are the only shops selling
these ovens. The probability that someone shops at Store 1 is 0.6. The probability that someone shopping at
Store 1 buys Brand A is 0.4. The probability that someone shopping at Store 2 buys Brand A is 0.25. What
is the probability that the microwave was purchased at Store 2 if someone bought a Brand B microwave
oven?

Question 11
A study is conducted to determine the attitudes of nurses in a hospital to various administrative procedures
that are currently employed. If a sample of 10 nurses is selected from a total of 25, how many different
samples can be selected?
a) 2.5
b) 250
c) 3268760
d) 6537520
e) 118617000

Question 7 Question 12
Experience has shown that 65% of the time a particular union-management contract negotiation has led to a A manager needs to assign ten workmen to ten different jobs.
contract settlement within a week, 45% of the time the union strike fund has been adequate to support a
strike, and 35% of the time both conditions have been satisfied. If the union strike fund is adequate to 12.1 In how different ways can the ten men be assigned to ten different jobs?
support a strike, what is the probability of a contract settlement being awarded? Is settlement of a contract 12.2 Suppose there are only 6 different jobs available for the ten men. In how many different ways can six
within a week dependent on whether the union strike fund is adequate to support a strike? men be selected from the ten men to perform the different jobs?
a) 36
Question 8 b) 60
A man takes either a Metrobus or Metrorail to work, with probabilities 0.4 and 0.6 respectively. When he c) 210
takes the bus, he is late on 25% of the days. When he takes the train, he is late on 15% of the days. If the d) 720
man is late for work on a particular day, what is the probability he took the bus? e) 151200
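The counting questions (Questions 11 and 12) distinguish unordered selections from ordered assignments. As a minimal sketch, Python's standard `math` module covers both cases; the variable names are illustrative.

```python
import math

# Question 11: unordered sample of 10 nurses from 25 -> combinations.
nurse_samples = math.comb(25, 10)

# Question 12.1: assign 10 workmen to 10 distinct jobs -> permutations of all 10.
full_assignments = math.factorial(10)

# Question 12.2: choose 6 of the 10 men and assign them to 6 distinct jobs
# -> ordered selection (permutations of 6 out of 10).
six_job_assignments = math.perm(10, 6)

print(nurse_samples, full_assignments, six_job_assignments)
```

The order of the jobs matters in Question 12, which is why `math.perm` rather than `math.comb` is used there.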

Question 13
A team of 11 soccer players is to be selected from a squad of 23 people.

13.1 How many different teams are possible (before positions are allocated)?
13.2 Once the names of the 11 team members have been announced, the captain and the vice-captain are named. How many distinct choices of captain and vice-captain are there (from the 11 members)?

Question 14
A fast-food chain is opening four new outlets – one in each of Johannesburg, Cape Town, Durban and Port Elizabeth. Each of these new outlets requires a new manager. If the fast-food chain currently has seven employees who are eligible for these positions, in how many ways can the four new managers be chosen from the seven employees and assigned to the new outlets?
a) 24
b) 35
c) 840
d) 240
e) 5040

Question 15
A woman has logged on to her favourite sports betting website. She is going to try to pick the winner in a horserace, a soccer tournament and a bizarre Nordic version of male synchronised swimming which, for some reason, takes place in the fjords during winter. In how many different ways can she pick the three winners (i.e. winning horse, winning soccer team and winning synchronised swimming team) if there are 7 horses in the horserace, 12 soccer teams in the tournament and 6 teams of male synchronised swimmers?

Question 17
In how many unique ways can the letters in the word ‘FOOD’ be arranged?

Question 18
In how many unique ways can the letters in the word ‘STATISTICS’ be arranged?

Question 19
Student numbers at a certain university consist of 7 digits followed by a single letter. The first two digits are the year of first registration, followed by one digit, which is either a 0 or a 1, followed by four randomly selected digits and then the randomly selected letter. How many student numbers are possible between 2003 and 2006 inclusive?

6. DISCRETE DISTRIBUTIONS

Question 1
A car dealer calculates the proportion of new cars sold that have been returned for the correction of defects during the warranty period. The results are shown in the following table:

Number of returns    0       1       2       3       4
Proportion           0.20    0.36    0.33    0.07    0.04

1.1 Verify that the data in Table 1 represent a probability mass function
1.2 Find the mode, mean and standard deviation of the number of times new cars are returned for correction of defects

Question 2
2.1 Which one of the following is a valid probability mass function?
a) $p(x) = \begin{cases} \left(\frac{3}{5}\right)^{x} & x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}$
b) $p(x) = \begin{cases} \frac{x - 1}{6} & x = 1, 3, 5 \\ 0 & \text{otherwise} \end{cases}$
c) $p(x) = \begin{cases} \frac{1}{3} & x = 0, 1, 2, 3 \\ 0 & \text{otherwise} \end{cases}$
d) $p(x) = \begin{cases} \frac{3}{x} & x = 2, 4, 6, 8 \\ 0 & \text{otherwise} \end{cases}$
e) $p(x) = \begin{cases} \frac{x}{10} & x = 4, 5, 6 \\ 0 & \text{otherwise} \end{cases}$
2.2 Use the p.m.f. from Question 2.1 to find E(X), Var(X) and $P(4 \le X \le 5)$
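For Question 1 of the Discrete Distributions section, a table is a probability mass function when every entry is non-negative and the entries sum to 1. The sketch below checks this for the car-returns table and computes the mode, mean and standard deviation; it is an illustration under the usual definitions, not a model answer.

```python
import math

# Car-returns table from Question 1: number of returns -> proportion.
pmf = {0: 0.20, 1: 0.36, 2: 0.33, 3: 0.07, 4: 0.04}

# A valid p.m.f. has non-negative probabilities that sum to 1.
is_valid = all(p >= 0 for p in pmf.values()) and math.isclose(sum(pmf.values()), 1.0)

mode = max(pmf, key=pmf.get)                                   # most probable value
mean = sum(x * p for x, p in pmf.items())                      # E(X)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())    # Var(X)
std_dev = math.sqrt(variance)

print(is_valid, mode, round(mean, 4), round(std_dev, 4))
```

The same validity check applies to each candidate function in Question 2.1, with the infinite support in option a) handled by a geometric series instead of a finite sum.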
Question 3
Given the following probability mass function for the random variable X and the constant c:

$$p(x) = \begin{cases} \dfrac{c}{x - 2} & x = -2, -1, 0, 1 \\ 0 & \text{otherwise} \end{cases}$$

3.1 Calculate the value of c
3.2 Find $P(X \le 0)$

Question 4
A telemarketer for a company that sells home security systems contacts customers and finds that each contact will result in no sale with probability 0.85, a R5000 sale with a probability 0.10 and a R10000 sale with probability 0.05.

4.1 What are the expected daily sales for the telemarketer?
4.2 What is the standard deviation of her daily sales?

Question 5
An investor is planning to invest in a new company. She loses 25% with a probability of 0.2 and breaks even with a probability of 0.25. If she gains, her rate of return will be either 25% or 50% with probabilities of 0.35 and 0.2 respectively.

5.1 Find her expected rate of return
5.2 Calculate the probability that her rate of return in this investment will be more than 10%

Question 6
It is known that 90% of customers who purchase LCD televisions will not claim against the guarantee throughout its duration. Suppose that 4 customers buy an LCD television set from a certain appliance dealer.

6.1 Which of the following statements are true?
a) The probability that at least 3 customers will claim against guarantee is 0.0037
b) The probability that at least 2 but not more than 3 customers will claim is 0.0522
c) The probability that all customers will claim against guarantee is 1
d) The probability that only 3 customers will not claim against the guarantee is 0.2916
6.2 What is the expected number of customers who will not claim against the guarantee?

Question 7
The probability that a particular type of smoke alarm will function properly and sound an alarm in the presence of smoke is 0.8. You have 2 such alarms in your home and they operate independently.

Show whether each of the following statements is true or false.
a) The probability that both sound an alarm in the presence of smoke is 0.64
b) The probability that neither sounds an alarm in the presence of smoke is 0.04
c) The probability that at least one sounds an alarm in the presence of smoke is 0.96

Question 8
A certain type of new business succeeds 60% of the time. Suppose that 3 such businesses open where they do not compete with each other, so it is reasonable to believe that their relative successes would be independent.

8.1 Show that X = the number of businesses that succeed out of 3 follows a binomial distribution
8.2 Find E(X)
8.3 Find Var(X)
8.4 Which of the following statements are false?
a) The probability that all 3 businesses succeed is 0.216
b) The probability that exactly 1 business succeeds is 0.064
c) The probability that all 3 businesses fail is 0.784
d) The probability that at least 2 businesses fail is 0.352

Question 9
A university student visits her favourite website on average 2 times a day including weekends.

Which of the following statements are false?
a) The expected number of times she visits this website per week is 14
b) The probability that in any given week she fails to visit her favourite website is $e^{-14}$
c) The probability that in any given week she visits this website more than 10 times but not more than 12 times is 0.1828
d) The probability that she visits this website at most two times a day is 0.2907
e) The probability that she visits this website more than 2 times a week is 0.7093
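The statements in Question 8 (three independent businesses, each succeeding with probability 0.6) can be checked against the binomial probability mass function. The following is a minimal sketch; `binom_pmf` is a hypothetical helper written for this illustration.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 3, 0.6                      # Question 8: 3 businesses, success probability 0.6
probs = [binom_pmf(k, n, p) for k in range(n + 1)]

all_succeed = probs[3]
exactly_one = probs[1]
all_fail = probs[0]
at_least_two_fail = probs[0] + probs[1]   # at least 2 failures means at most 1 success

expected = n * p                   # E(X) = np
variance = n * p * (1 - p)         # Var(X) = np(1 - p)

print([round(q, 4) for q in probs], expected, round(variance, 2))
```

Comparing these values with options a) to d) identifies which statements are false. Question 6 follows the same pattern with n = 4, taking care over whether a "success" is a claim or a non-claim.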

Question 10
If electric power failures in a certain city occur according to a Poisson distribution with an average of 4 failures every 20 weeks, calculate the probability that there will be more than two failures in a particular week.

Question 11
A prominent small-scale tailor in town uses two sewing machines, a new machine and an old machine which regularly breaks down. On average this old machine breaks down three times during a given week.

11.1 What is the expected number of breakdowns during a 4-week time period?
a) 1.3
b) 3
c) 4
d) 12
e) 24
11.2 The probability that in any given month there will be more than 10 but fewer than 13 breakdowns is:
a) $2.7600 \times 10^{-4}$
b) 0.2287
c) 0.3335
d) 0.7713
e) 0.9997

Question 12
For each of the following scenarios, identify the distribution and its parameters. Justify your answer and comment on any assumptions you made.

12.1 The number of people entering the intensive care unit of a 24/7 private hospital in a certain city with an average of 14 patients per week.
12.2 For a certain life policy, an insurance company requests applicants to take a medical exam. Experience shows that only 65% of the applicants pass the medical exam. In the previous month only 8 people had applied for the life policy.
12.3 A manufacturer of a small electronic desk calculator knows from experience that 2% of all calculators manufactured are defective and have to be replaced under the warranty. The School of Commerce purchases 12 calculators for use by its accounting students.
12.4 The number of complaints that a laundromat receives, with an average of 3.5 complaints per day.

7. CONTINUOUS VARIABLES AND EXPONENTIAL DISTRIBUTION

Question 1
Unleaded 95 petrol is delivered to a garage every Sunday morning. The weekly demand (in thousands of litres) for this petrol blend has the following cumulative distribution function:

$$F(x) = \begin{cases} 0 & x < 2 \\ \frac{1}{18}\left(3x + x^{2}\right) & 2 \le x < 3 \\ 1 & x \ge 3 \end{cases}$$

Which of the following probabilities is / are false?
a) $P(X = 2) = 0$
b) $P(X \ge 1.5) = 1$
c) $P(X < 2.5) = 0.7639$
d) $P(1.5 < X < 2.5) = 0.2083$
e) $P(X > 3) = 0.4444$

Question 2
A continuous cumulative distribution function F(x) is defined as follows:

$$F(x) = \begin{cases} 0 & x < 1 \\ \frac{1}{8}(x - 1)^{3} & 1 \le x \le 3 \\ 1 & x > 3 \end{cases}$$

Which of the following probabilities is / are correct?
a) $P(1 \le X \le 2) = 0.125$
b) $P(2.5 < X < 3) = 0.5781$
c) $P(X = 2) = 0.125$
d) $P(X > 4) = 0$
e) $P(X \ge 1.5) = 0.9844$
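Interval probabilities for a continuous variable come from differences of the cumulative distribution function, $P(a < X \le b) = F(b) - F(a)$. The sketch below applies this to Question 2 of the Continuous Variables section, assuming the CDF there reads $F(x) = \frac{1}{8}(x-1)^3$ on $1 \le x \le 3$ (the formula is damaged in the source, so this reading is an assumption).

```python
def cdf(x):
    """Assumed CDF for Question 2: 0 below 1, (x - 1)^3 / 8 on [1, 3], 1 above 3."""
    if x < 1:
        return 0.0
    if x <= 3:
        return (x - 1) ** 3 / 8
    return 1.0

p_1_to_2 = cdf(2) - cdf(1)        # P(1 <= X <= 2)
p_25_to_3 = cdf(3) - cdf(2.5)     # P(2.5 < X < 3)
p_above_4 = 1 - cdf(4)            # P(X > 4)
p_at_least_15 = 1 - cdf(1.5)      # P(X >= 1.5)

print(round(p_1_to_2, 4), round(p_25_to_3, 4), p_above_4, round(p_at_least_15, 4))
```

For a continuous CDF with no jumps, the probability of any single point such as $P(X = 2)$ is zero, so it does not matter whether interval endpoints are included.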

Question 3
The length of time (in weeks) between sales made by a car salesman is modelled as an exponential random variable with a cumulative distribution function:

$$F(t) = 1 - e^{-2.5t} \quad \text{for } t > 0 \text{ and } \lambda > 0$$

3.1 Find the mean and variance of waiting time between car sales in days.
3.2 If the salesman just sold a car, what is the probability that he will make his next sale within a week?
3.3 What is the probability that the salesman goes for more than 2 weeks without a sale?

Question 4
The length of time (in hours) between accidents in a textile industry that operates 8 hours a day is modelled as an exponential random variable with a cumulative distribution function as follows:

$$F(t) = 1 - e^{-\frac{9}{100}t} \quad \text{for } t > 0 \text{ and } \lambda > 0$$

Which of the following statements is / are false?
a) The average number of accidents per day is 0.72
b) The average waiting time between accidents is approximately 11.11 hours
c) The variance of the waiting time between accidents is approximately 123 hours
d) If an accident just occurred, the probability that another will happen in the next 10 working hours is 0.5934
e) If an accident just occurred, the probability that another one will not happen in the next 15 working hours is 0.7408

Question 5
On examining a company’s computer network, a software engineer has determined that the number of faults on the network, per hour, follows a Poisson distribution with mean 3. A technician has just dealt with a reported fault. What is the probability (to four decimal places) that he will wait no more than 10 minutes for the next reported fault?

Question 6
In a busy car showroom, suppose the arrival of customers is Poisson distributed with an average arrival rate of 10.2 per hour.

6.1 What are the mean and standard deviation (in minutes) of the waiting time between arrivals of customers?
6.2 Which one of the following statements is incorrect?
a) The probability that at least 25 minutes will elapse between arrivals is 0.0143
b) The probability that at least 12 minutes will elapse between arrivals is 0.1300
c) The probability that at most 15 minutes will elapse between arrivals is 0.9219
d) The probability that more than 12 minutes but less than 25 minutes will elapse between arrivals is 0.8903
e) The probability that exactly 25 customers will arrive in a two hour period is 0.0490

Question 7
The lifetime of an energy saving bulb follows an exponential distribution with a mean lifetime of 8000 hours.

7.1 The probability that a bulb will last for more than 10 000 hours is:
a) 0.2865
b) 0.4493
c) 0.5507
d) 0.7135
e) 1
7.2 If the manufacturer wants to ensure that less than 1 in 1000 bulbs fail before 2500 hours, what is the approximate lowest mean lifetime he can allow his bulbs to have?

Question 8
Trains arrive at a station at 15 minute intervals starting from 3 A.M. according to a Poisson process.

8.1 What is the average number of trains that arrive at this station between 3 A.M. and 5 A.M.?
8.2 Find the variance of the waiting time between trains.
8.3 If a passenger arrives at the station at 09h00, find the probability that he has to wait for the train until after 09h06.
8.4 If a passenger arrives at the station at 09h00, find the probability that the train will come between 09h10 and 09h15.
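When arrivals follow a Poisson process with rate $\lambda$, the waiting time $T$ between arrivals is exponential with $P(T > t) = e^{-\lambda t}$ and mean and standard deviation both equal to $1/\lambda$. A minimal sketch for Question 6 (10.2 arrivals per hour, converted to minutes) follows; the function name `survival` is illustrative.

```python
import math

rate_per_min = 10.2 / 60            # Question 6: 10.2 customers per hour, in minutes

mean_wait = 1 / rate_per_min        # mean waiting time in minutes
sd_wait = 1 / rate_per_min          # for an exponential, sd equals the mean

def survival(t):
    """P(T > t): probability that more than t minutes elapse between arrivals."""
    return math.exp(-rate_per_min * t)

p_at_least_25 = survival(25)
p_at_least_12 = survival(12)
p_at_most_15 = 1 - survival(15)
p_between_12_and_25 = survival(12) - survival(25)

print(round(mean_wait, 2), round(p_at_least_25, 4), round(p_at_least_12, 4),
      round(p_at_most_15, 4), round(p_between_12_and_25, 4))
```

The same pattern covers Questions 3, 4, 5 and 7 once the rate is expressed in the same time unit as the interval in the question.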
Question 9
A specific make of aircraft engine is known to have lifetimes that are exponentially distributed. The average number of hours to failure is known to be 15 000. A twin-engine aircraft can still fly if at least one engine is operational. If the lifetimes of the engines are independent, what is the probability (to four decimal places) that an aircraft fitted with these engines will still be able to fly after 9 400 hours?

Question 10
Suppose that a company receives, on average, 150 telephone calls per hour in any given work-day according to a Poisson process.

10.1 What are the expectation and variance of the waiting time (in minutes) between calls?
10.2 If a call was received at 14:25, find the probability that the next call will be received after 14:27.
10.3 What is the probability that the telephone operator will wait between 2 and 5 minutes for the next call?

Question 11
At a small retailer the waiting time between customers follows an exponential distribution with a mean of 20 minutes.

11.1 What is the probability that, during a particular one-hour interval, he has more than two customers?
11.2 If a customer just arrived, what is the probability that another customer will arrive after 40 minutes of waiting?

Question 12
A local municipality installed electric tubes in various streets whose lifetime has a standard deviation of 1200 hours. What is the time beyond which eighty percent of the bulbs will continue to work?

Question 13
Two vendors sell food at a seaside resort during the summer season. One vendor runs a hot-dog stall and the other runs an ice-cream stall. Customers arrive at the hot-dog stall at the rate of 15 per hour. Customers arrive at the ice-cream stall at the rate of 35 per hour. The number of customers arriving per hour at either stand follows independent Poisson processes. At 14:00 one customer arrives at each vendor. What is the probability (to 4 decimal places) that both vendors will wait for less than 10 minutes for the next customer to arrive?

8. NORMAL DISTRIBUTION AND CENTRAL LIMIT THEOREM

Question 1
The scores on a national achievement test are normally distributed with mean and standard deviation equal to 500 and 100 respectively.

State whether the following statements are true or false
a) The proportion of students with scores exceeding 650 is 0.0668
b) The proportion of students with scores not exceeding a score of 550 is 0.6915
c) The proportion of students with scores greater than 550 but less than 650 is 0.7583
d) The probability that the average score for 35 students is between 550 and 560 is 0.00131
e) If 60% of the students could not beat John’s score, what was John’s score?

Question 2
A public utility noted that the monthly electricity usage (measured in kilowatt-hours) by certain residential users is normally distributed with a mean of 1040.

2.1 If the monthly electricity usage of a sample of 15 randomly selected users yielded a variance of 6500, what is the probability that their mean electrical usage exceeds 1100?
2.2 Suppose the public utility also noted that the monthly electricity usage by residential users has a population standard deviation of 78. Find the probability that the average monthly electricity usage for 35 randomly selected users does not exceed 1080.

Question 3
IQ scores are normally distributed with a mean of 100 and a variance of 225. Suppose that a country provides special education for students with IQ scores that are in the lowest 5% of the population and university education for students with IQ scores that are in the top 7% of the population.

3.1 What would the cut-off IQ score in this country be for students requiring special education?
3.2 What would the cut-off IQ score in this country be for students entering university?
3.3 Suppose that the variance of the IQ scores is not known so a sample of 28 students was taken and the standard deviation of their IQ scores was found to be 16. Calculate the probability that their average IQ is less than 105.
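Normal probabilities and percentiles such as those in Question 1 of the Normal Distribution section (mean 500, standard deviation 100) can be checked with `statistics.NormalDist` from the Python standard library instead of tables. This is a sketch for checking table work, not a replacement for showing the standardisation step.

```python
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=100)    # Question 1: national achievement test

p_above_650 = 1 - scores.cdf(650)                     # P(X > 650)
p_at_most_550 = scores.cdf(550)                       # P(X <= 550)
p_550_to_650 = scores.cdf(650) - scores.cdf(550)      # P(550 < X < 650)
johns_score = scores.inv_cdf(0.60)                    # 60th percentile

print(round(p_above_650, 4), round(p_at_most_550, 4),
      round(p_550_to_650, 4), round(johns_score, 1))
```

The cut-off scores in Question 3 use `inv_cdf` in the same way, with `sigma` set to the square root of the stated variance.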

Question 4
The switchboard at a company receives calls according to a Poisson process. The average waiting time between calls is 3 minutes. Approximate the probability that the average of 40 waiting times for calls does not exceed 2.8 minutes.

Question 5
Suppose monthly cell phone bills are normally distributed with a mean of R300 and standard deviation s. If the cellular phone company requires that the monthly cell phone bill in Rands is within R100 of R300 with 98% probability, determine the maximum value of s. HINT: Within R100 of R300 means that the bill could be R100 less than R300 or R100 more than R300. Hence we are told that P(200 < X < 400) = 0.98 where X = monthly cell phone bill.

Question 6
A company that sells retirement annuities bases the annual payout on the probability distribution of the length of life of its members. Suppose the lifetimes of the members are normally distributed with mean 68 years and variance 12.25 years².

6.1 Which of the following are incorrect?
a) The proportion of members that would receive payments beyond age 70 is 0.7157
b) The proportion of members that would receive payments beyond age 75 is 0.9772
c) The proportion of members that would receive payments between age 70 and 75 is 0.2615
d) The proportion of members that would receive payments before age 69 is 0.5753
e) The proportion of members that would receive payments between age 65 and 70 is 0.5208
6.2 Below what age would 15% of the members receive payment?
6.3 If a random sample of 50 members was selected, what is the probability that their mean age to receive payment exceeds 69?

Question 7
Which of the following statements about the sampling distribution of the sample mean is incorrect?
a) Regardless of the distribution that the parent population follows, the sampling distribution of the mean is approximately normal, provided that the sample size is sufficiently large ($n \ge 30$)
b) The sampling distribution of the mean is generated by repeatedly taking samples of size n from the same population and computing the sample means
c) The mean of the sampling distribution of the mean is equal to $\mu$
d) The variance of the sampling distribution of the mean is equal to $\sigma^{2}$

Question 8
Dry battery cells are known to have a mean lifetime of 12 hours. A sample of 100 such batteries was tested. What percentage of such cells is expected to have a mean lifetime exceeding 15 hours?

Question 9
In a grocery store, the mean expenditure per customer is R1000. A random sample of 25 customers was taken and yielded a mean of R950 and a standard deviation of R300. Assume that expenditure is approximately normally distributed.

9.1 Suppose 10% of the customers’ average expenditure exceeded a certain Rand value x. Find x.
9.2 Which of the following probabilities are correct?
a) The probability that the customers’ average expenditure is below R800 is 0.00048
b) The probability that the customers’ average expenditure is above R900 lies between 0.75 and 0.9
c) The probability that the customers’ average expenditure is between R900 and R1100 lies between 0.8 and 0.9
d) The probability that the customers’ average expenditure is between R850 and R1150 lies between 0.98 and 0.99
e) The probability that the customers’ average expenditure is less than R1100 lies between 0.94 and 0.96

Question 10
The product specification for 140g packets of sweets states that 90% of the packets must weigh below 145g. The packet weights are normally distributed with a variance of 4 g².

10.1 What is the mean weight of the packets of sweets?
10.2 Twenty-five such packets were randomly selected and yielded a mean of 135g and a variance of 3.80 g². Find the proportion of packets with an average weight below 141g.

Question 11
An investor believes that the annual return on the JSE All Share Index is adequately described by a normal distribution with mean 0.08 and variance 0.0049.

11.1 What is the probability that the index yields a positive return?
11.2 What is the probability that the return on the index is between –0.07 and 0.03?
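When the population is normal with known variance, the sample mean of $n$ observations is normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The sketch below applies this to Question 6.3 (mean 68, variance 12.25, n = 50); the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu = 68                      # population mean lifetime (years)
sigma = math.sqrt(12.25)     # population sd, from the stated variance
n = 50                       # sample size in Question 6.3

# Sampling distribution of the sample mean: Normal(mu, sigma / sqrt(n)).
standard_error = sigma / math.sqrt(n)
sample_mean_dist = NormalDist(mu, standard_error)

p_mean_exceeds_69 = 1 - sample_mean_dist.cdf(69)

print(round(standard_error, 4), round(p_mean_exceeds_69, 4))
```

Dividing the variance by $n$, rather than using $\sigma^{2}$ itself, is exactly the point tested by Question 7 on the sampling distribution of the mean.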

Question 12
A local bus service determined that the distance travelled annually by each bus is normally distributed with a mean of 400 000 km. In a particular year, 20 buses were selected and they yielded a mean of 380 000 km with a standard deviation of 10 000.

12.1 State whether the following statements are true/false. Justify your answer in each case.
12.2 The probability that the mean distance exceeds 405 000 km lies between 0.01 and 0.025
12.3 The probability that the mean distance is below 408 000 km lies between 0 and 0.005
12.4 Find the minimum distance travelled by the top 10% of the buses.

Question 13
The organising committee of an end-of-year party at a company decides to hire a juice dispensing machine to provide fresh juice for staff who do not drink alcohol. The dispensing machine can hold a maximum of 12 litres of juice when filled to capacity. The volume of juice dispensed by the machine per cup has a normal distribution with mean 250 ml and variance 225 ml².

If the machine is filled to capacity and 47 cups of juice are dispensed at the party, then the probability that the machine will run out of juice on that day is approximately (to 4 decimal places):
a) 0.0075
b) 0.4364
c) 0.4925
d) 0.5636
e) 0.9925

Question 14
The weight of luggage that aircraft passengers take with them is distributed with a mean of 30kg and a standard deviation of 5kg. A certain type of aircraft carries 100 passengers. What is the probability that the total weight of the passengers’ luggage exceeds 2850kg?

9. CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

Question 1
The daily circulation of a particular newspaper is known to be normally distributed with a mean of 192 000 and a standard deviation of 12 000. In the weeks leading up to a national election the newspaper runs special sections on issues surrounding the elections. The editor would like to know whether the special sections have changed readership. A sample of 7 days yielded a mean circulation of 183 450 and a standard deviation of 14 680. Test the appropriate hypothesis at the 10% level of significance.

Question 2
The amount of nicotine in cigarettes is assumed to follow a normal distribution with mean and standard deviation of 15 and 2 milligrams respectively. A cigarette manufacturer claims that the mean amount of nicotine in cigarettes sold under its brand is below the industry standard. A sample of 32 cigarettes from this manufacturer yielded a mean of 16.4 milligrams of nicotine.

2.1 The appropriate hypotheses for this test are:
a) $H_0: \mu = 16.4$ vs $H_1: \mu < 16.4$
b) $H_0: \bar{X} \ge 15$ vs $H_1: \bar{X} < 15$
c) $H_0: \mu \le 15$ vs $H_1: \mu > 15$
d) $H_0: \mu = 15$ vs $H_1: \bar{X} < 16.4$
e) $H_0: \mu = 15$ vs $H_1: \mu < 15$

2.2 If you are to test the hypothesis in Question 2.1 at the 5% level of significance, then the rejection region for this test will be:
a) $(1.6449;\ \infty)$
b) $(-\infty;\ 1.6449)$
c) $(-\infty;\ 1.960)$
d) $(1.6955;\ \infty)$
e) $(-\infty;\ 1.960)$
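For Question 2 (nicotine content, known population standard deviation), the test statistic is $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$. The sketch below assumes a one-sided, lower-tail alternative $H_1: \mu < 15$ to match the manufacturer's claim; that choice of tail is an assumption of this illustration.

```python
import math
from statistics import NormalDist

mu0 = 15        # industry-standard mean under H0 (mg)
sigma = 2       # known population standard deviation (mg)
n = 32          # sample size
x_bar = 16.4    # observed sample mean (mg)

# z statistic for a test with known sigma.
z = (x_bar - mu0) / (sigma / math.sqrt(n))

# Lower-tail critical value at the 5% level (assumed alternative: mu < 15).
alpha = 0.05
z_crit = NormalDist().inv_cdf(alpha)

reject_h0 = z < z_crit

print(round(z, 4), round(z_crit, 4), reject_h0)
```

A sample mean above 15 can never fall in a lower-tail rejection region, so the sign of the statistic already indicates the outcome before the critical value is looked up.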

Question 3
From historical data the scores for a university entrance examination in Mathematics are known to be normally distributed with a mean score of 54 and a standard deviation of 7.5. The university suspects that their current students will perform differently from those in previous years. A random sample of 48 current students obtained a mean score of 58.

3.1 Find the 95% confidence interval for the mean score for this university entrance examination.
3.2 Use the CI from Q3.1 to test if the current students perform differently to students from previous years.

Question 4
The manager of the branch of a particular insurance company believes that his branch processes claims significantly faster than the company in general. The average time taken to process a claim is, in general, 12 business days. The manager randomly selects 40 claims processed by his branch and finds a mean time of 8.82 business days with a standard deviation of 3.077 days. Assuming the time taken to process car insurance claims follows a normal distribution, perform the appropriate test at the 1% significance level.

Question 5
Profits per sale of pre-owned cars for the past week were R2 100, R3 000, R1 200, R6 200, R4 500 and R5 100. Assume profits are approximately normally distributed.

5.1 Rounding all calculations to 2 decimal places, find the 90% confidence interval for the mean profit.
5.2 Does the data present sufficient evidence to indicate that the average profit is more than R4 800? Test at a 1% level of significance.

Question 6
The average drying time of a manufacturer’s paint is advertised as less than 20 minutes after a chemical modification in the composition of the paint. A random sample of 20 tins of these paints is selected and the average drying time turns out to be 19 minutes with a standard deviation of 2.5 minutes. Assume that the drying times of the paint follow a normal distribution.

6.1 Find the 95% confidence interval for the average drying time of the manufacturer’s paint.
6.2 Test the manufacturer’s statement made in the advertisement at the 5% level of significance.

Question 7
A crime analyst suspects that the number of car thefts in City A is significantly greater than the number of car thefts in City B. He examines the number of thefts on 26 randomly selected days in City A and finds a mean of 32.8 and a standard deviation of 12.44. Based on 36 randomly selected days in City B he finds a mean of 26.5 thefts with a standard deviation of 14.62.

7.1 Perform the appropriate test at the 10% significance level. You may assume that the number of thefts on any given day in both cities is normally distributed.
7.2 The 99% confidence interval for the difference in average number of car thefts is ???, and it is interpreted as follows:
a) We are 95% confident that the true mean profit is between 0.52 and 0.66
b) 95% of the students get between 52% and 66% of their tuition paid for by financial aid
c) We are 95% confident that between 52% and 66% of the sampled students receive some sort of financial aid
d) We are 95% confident that 59% of the students are on some sort of financial aid

Question 8
The manager recorded the weekly production rates of 12 randomly selected flexi-time and 15 randomly selected full-time (standard 40 hour week) employees. All sampled employees performed an identical task.

                      Flexi-time employees    Full-time employees
Mean                          49.3                   43.8
Standard deviation             3.9                    3.7

8.1 Find the 95% confidence interval for the difference between the mean production rates for flexi-time and full-time employees. Let the Difference = Flexi-time – Full-time.
8.2 To test the hypothesis that the productivity rate of the flexi-time employees is different from that of the full-time employees, the appropriate hypotheses are:
a) $H_0: \mu_d = 0$ vs $H_1: \mu_d > 0$
b) $H_0: \mu_d \ge 0$ vs $H_1: \mu_d < 0$
c) $H_0: \mu_1 \ge \mu_2$ vs $H_1: \mu_1 < \mu_2$
d) $H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 \ne 0$
e) $H_0: \bar{x}_1 = \bar{x}_2$ vs $H_1: \bar{x}_1 \ne \bar{x}_2$

8.3 Use your confidence interval from Q8.1 to test if there is a difference in the mean production rate for the two methods.
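A confidence interval for the difference between two independent means, as in Question 8.1, is commonly built from a pooled variance. The sketch below assumes equal population variances (not stated in Question 8, so this is an assumption) and takes the critical value $t_{0.025,\,25} \approx 2.0595$ from a t-table, since the Python standard library has no t-distribution.

```python
import math

# Summary statistics from Question 8.
n1, mean1, s1 = 12, 49.3, 3.9     # flexi-time employees
n2, mean2, s2 = 15, 43.8, 3.7     # full-time employees

df = n1 + n2 - 2                  # degrees of freedom for the pooled estimate

# Pooled variance (assumes equal population variances).
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_crit = 2.0595                   # assumed t-table value for 95% confidence, 25 df

diff = mean1 - mean2              # Difference = Flexi-time - Full-time
ci = (diff - t_crit * std_error, diff + t_crit * std_error)

print(df, round(pooled_var, 4), round(std_error, 4), tuple(round(v, 3) for v in ci))
```

If the interval excludes zero, the two-sided test at the matching significance level rejects equality of the means, which is the link Question 8.3 asks for.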
Question 9
The operations manager of a hardware chain believes that their Parrow branch is performing differently from their Wynberg branch in terms of the average size of the orders received by each branch. The accountant was asked to verify his statement at a 5% level of significance. She therefore sampled 10 orders from Parrow and 13 orders from Wynberg and obtained the information summarised below. Assume that the populations have a normal distribution and that the variances are equal. The summary statistics are as follows:

                      Parrow      Wynberg
n                       10           13
Mean                 R 13.16      R 11.57
Standard deviation   R 2.58       R 3.23

9.1 Find the 98% confidence interval for the differences in average size of the orders received by the two branches.
9.2 Based on the confidence interval from Q9.1 the conclusion that the accountant arrived at using $\alpha = 0.02$ is:
a) Sufficient evidence exists to support the claim that the Parrow branch is performing better than the Wynberg branch
b) Sufficient evidence exists to support the claim that the Wynberg branch is performing better than the Parrow branch
c) Sufficient evidence exists to support the claim that the branches are performing equally well
d) Insufficient evidence exists to support the claim that the Parrow branch is performing differently from the Wynberg branch
e) Insufficient evidence exists to support the claim that the branches are performing equally well

Question 10
A group of four students taking a short course write a test at the beginning of the course and another test of similar standard at the end of the course. The two tests cover the same material. It is assumed that the results for each test are normally distributed. The test marks of the four students are given below. Let Difference = Test 2 – Test 1.

Student number:    01468W    00991N    01233R    97508L
Test 1               56        45        49        31
Test 2               76        64        54        41

10.1 The 90% confidence interval for the mean difference in the test results is:
a) (– 4.197; 31.197)
b) (1.990; 25.010)
c) (4.988; 22.012)
d) (6.411; 20.590)
e) (7.550; 19.450)
10.2 Interpret the confidence interval from Q10.1.

Question 11
An economist suspects that company profit warnings result in a significant decrease in the share price. It is assumed that this change in share price follows an approximate normal distribution. To test his idea, the economist performs an event study on the share prices of ten companies listed on the JSE. The event study consists of recording the share price the day before the profit warning is issued and then again the day after the profit warning is issued. The data are as follows:

Company               A      B      C      D      E      F      G      H      I      J
Share price before   1230   1690    890   1400   1120   1560   1850   1840   1210   1100
Share price after    1260   1640    860   1460   1020   1560   1810   1770   1180   1020

11.1 Perform the appropriate test using a 5% significance level. Let D = Share price before – Share price after.
11.2 Construct a 95% confidence interval for the difference
11.3 Use the confidence interval to test whether company profit warnings have an impact on the share price.

Question 12
An auditor is examining bad debts at the credit card division of a bank. In a sample of 128 accounts he finds 8 that have defaulted on payment (and become bad debts).

12.1 Determine a 95% confidence interval for the proportion of credit card accounts at the bank that are bad debts.
12.2 A bank manager claims that 4% of the accounts are bad debts. The auditor believes that the proportion of bad debts is, in fact, higher. Perform the appropriate test at the 10% level of significance. Be sure to check any assumptions.

Question 13
A cosmetic company just introduced a new product and wishes to assess the exposure of their product in
their target market. The marketing manager states that approximately 35% of potential users are aware of the
new product. A random sample of 50 potential users was taken and only 18 were aware of the product.

13.1 Find the 98% confidence interval for this data.
13.2 An employee at the company disagrees with the marketing manager's statement. Does the data
provide sufficient evidence to support the employee's claim at a 1% level of significance?

Question 14
A biologist has developed an insect spray designed to kill more than 60% of a certain type of insect.
Suppose that 200 such insects were sprayed and 130 of them were killed.

14.1 Find the 90% confidence interval for the proportion of insects killed.
14.2 Would you conclude that his insect spray was satisfactory? Test at the 1% level of significance.

Question 15
A confidence interval was used to estimate the proportion of statistics students that are females. A random
sample of 72 statistics students generated the following 90% CI: (0.438, 0.642).

Based on the constructed interval, is the population proportion of females different from 0.60?
a) No, and we are 90% sure of it
b) No. The proportion is 54.17%
c) No, 0.60 is a believable value of the population proportion based on the information above
d) Yes, and we are 90% sure of it
e) No, and we are 10% sure of it

10. p-VALUES AND CHI-SQUARED DISTRIBUTION

Question 1
A law firm believes that its staff works on average 8 hours per business day. The personnel manager doesn’t
believe that this is the case. Suppose that the number of hours employees work is normally distributed. She
randomly selects 6 employees and records the number of hours they worked:

Employee                 1    2    3    4    5    6
Number of hours worked   8.1  8.0  8.0  7.8  7.6  7.3

Perform the appropriate hypothesis test and use p-values to test the manager’s belief.

Question 2
A drug manufacturer has installed a machine which automatically fills 5 mg of the active ingredient of a
certain drug in each container. A random sample of 10 containers was taken and was found to contain 5.02
mg (of the active ingredient) on average with a standard deviation of 0.02 mg. Is there evidence that the
machine is dispensing more than 5 mg of active ingredient of the drug in each container? Use p-values.

Question 3
A clinical researcher is interested in the mean time to pain relief for a particular medicine. He selected a
random sample of 50 headache sufferers and their mean time to relief was 2.3 minutes with a standard
deviation of 55 seconds. Assume the time to pain relief is normally distributed with a standard deviation of
53 seconds. The manufacturer of this pain relief medicine claims that its product brings pain relief to
headache sufferers in less than 2.5 minutes, on average. Test the manufacturer’s belief using p-values.

Question 4
Six athletes ran a 400 m race at sea level and at a later meeting they ran another 400 m race at high altitude.
Their times in seconds were as follows:

Runner          1     2     3     4     5     6
Sea level       43.3  42.6  44.2  45.3  43.8  46.1
High altitude   45.4  42.3  45.8  47.3  42.7  49.5

Use p-values to test whether the athletes’ performances are affected by the altitude.
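As a rough check on Question 4, the paired t statistic can be computed directly from the table. This is only a sketch: the helper name `paired_t_statistic` is made up for illustration, and the p-value still has to be read from a t table with the resulting degrees of freedom.

```python
import math
import statistics

# Times in seconds from the Question 4 table (six runners).
sea_level = [43.3, 42.6, 44.2, 45.3, 43.8, 46.1]
high_altitude = [45.4, 42.3, 45.8, 47.3, 42.7, 49.5]


def paired_t_statistic(first, second):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (divisor n - 1)
    t_stat = mean_d / (sd_d / math.sqrt(n))
    return mean_d, t_stat, n - 1


mean_d, t_stat, df = paired_t_statistic(sea_level, high_altitude)
print(f"mean difference = {mean_d:.4f}, t = {t_stat:.4f}, df = {df}")
```

The differences are taken as sea level minus high altitude, so a negative statistic means slower times at altitude on average.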

Question 5
Measurements are made of the tensile strengths of 10 steel rods, half of which have received special
treatment and the others have not received any treatment. The results are as follows:

No treatment        2.8  2.4  2.9  2.4  1.6
Special treatment   3.1  3.3  3.8  2.2  2.9

5.1 Does the special treatment change the tensile strength? Use p-values.
5.2 State any assumptions made.

Question 6
A company manager wants to show that night-shift workers take longer, on average, to complete common
tasks than their more heavily supervised day shift counterparts. She observes a specific task being performed
at randomly selected times from the second hour in each shift. The summary statistics for the two shifts are
as follows:

                                  Day Shift (A)   Night Shift (B)
Number of observations            20              15
Mean (in minutes)                 25              35
Standard deviation (in minutes)   10              15

Assuming the true standard deviations are equal, use p-values to test whether night shift workers take longer
on average, to perform the task than day shift workers.

Question 7
A medical investigator believes that more than half the elderly patients given anesthetics for operations
suffer from complications. On examining medical records, he found that 26 out of a random sample of 50
patients in fact had complications. Is there evidence that his claim is justified? Use p-values.

Question 8
Employees of a large company are able to choose any one of three health care plans. The choices of plan and
job categories of a random sample of 200 employees are recorded as follows:

                  Health care plan
Job category      Plan A   Plan B   Plan C
Administration    6        10       4
Office            14       10       6
Line              40       40       70

The HR officer believes that there is an association between the choice of a health care plan and job
category.

8.1 Test the appropriate hypothesis at a 5% level of significance.
8.2 What is the p-value for this test?

Question 9
Two groups of people were asked what their opinion was of a television programme about farming. For a
sample of 60 city dwellers, 35% liked the programme while only 9 out of a sample of 40 rural dwellers liked
it. Is there evidence that their opinion of the programme and area of residence are related?

9.1 Construct a contingency table using the information provided.
9.2 Test the appropriate hypothesis at a 5% level of significance.
9.3 Determine the p-value for this test.
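For Question 6, where only summary statistics are given and equal variances are assumed, a pooled two-sample t statistic can be sketched as follows. The function name `pooled_t_from_summary` is illustrative only, and the p-value would still come from a t table.

```python
import math


def pooled_t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Pooled-variance t statistic for (mean1 - mean2), assuming equal population variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, pooled_var, df


# Question 6 summary statistics, taken as Night (B) minus Day (A).
t_stat, pooled_var, df = pooled_t_from_summary(15, 35, 15, 20, 25, 10)
print(f"pooled variance = {pooled_var:.4f}, t = {t_stat:.4f}, df = {df}")
```

The same helper applies to Question 5 once the sample means and standard deviations have been computed from the raw strengths.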

1. Descriptive Statistics
Question 1
1.1 Mean = 1.88; Median = 2; Range = 5; Variance = 2.193
1.2 Frequency Table:

Value   Frequency   Relative Frequency
0       4           0.16
1       8           0.32
2       6           0.24
3       3           0.12
4       2           0.08
5       2           0.08
Total   25          1

1.3 Mean = 1.88; Median = 2; Range = 5; Variance = 2.193. The values are exactly the same as this is
grouped discrete data, i.e. all the raw data are reflected in the table; no data is lost due to
grouping.

1.4 Bar Chart: [bar chart of Frequency against Value (0 to 5); bar heights 4, 8, 6, 3, 2, 2]

Question 2
6.1 Mode = 3; CV = 46.01%
6.2 Frequency Table:

Value   Frequency   Relative Frequency
1       5           0.20
2       3           0.12
3       8           0.32
4       4           0.16
5       5           0.20
Total   25          1

6.3 E
(D is false because different managers cannot supervise the same employees)

Question 3
3.1 D1 = 2; IQR = 1
3.2 22%

Question 4
A

Question 5
5.1 Mean = 23.43; Variance = 25.36; Range = 23 (Raw data)
5.2 Frequency Table:

Days       Frequency   Cumulative Frequency
[10; 15)   2           2
[15; 20)   3           5
[20; 25)   13          18
[25; 30)   9           27
[30; 35)   3           30
Total      30
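The summary measures quoted for the Question 1 and Question 2 frequency tables can be reproduced from the value/frequency pairs alone. The sketch below assumes the sample variance (divisor n - 1), which is what matches the stated answers; the function name is illustrative.

```python
import math


def frequency_table_stats(freqs):
    """Summary statistics for discrete data given as {value: frequency}."""
    n = sum(freqs.values())
    mean = sum(x * f for x, f in freqs.items()) / n
    # Sample variance: squared deviations weighted by frequency, divided by n - 1.
    variance = sum(f * (x - mean) ** 2 for x, f in freqs.items()) / (n - 1)
    mode = max(freqs, key=freqs.get)
    cv_percent = 100 * math.sqrt(variance) / mean
    return {"n": n, "mean": mean, "variance": variance, "mode": mode, "cv_percent": cv_percent}


q1 = frequency_table_stats({0: 4, 1: 8, 2: 6, 3: 3, 4: 2, 5: 2})
q2 = frequency_table_stats({1: 5, 2: 3, 3: 8, 4: 4, 5: 5})
print(f"Q1: mean = {q1['mean']:.2f}, variance = {q1['variance']:.3f}")
print(f"Q2: mode = {q2['mode']}, CV = {q2['cv_percent']:.2f}%")
```

Because every raw value appears in a discrete frequency table, these results equal the raw-data statistics, which is the point made in answer 1.3.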

5.3 Ogive: [ogive of cumulative frequency against Days; cumulative frequencies 0, 2, 5, 18, 27, 30]

5.4 Mean = 23.83; Variance = 25.75; Range = 25 (Grouped continuous data)

Question 6
6.1 Mode = 82.5; IQR = 11.83 (or 11.26 - rounding difference)
6.2 Frequency Polygon: [frequency polygon of Frequency Percentage against Driver Speed; class midpoints
57.5 to 97.5 in steps of 5]
6.3 Mean = 76.04 < Mode = 82.5, the distribution is negatively skewed OR
The distribution is tailed to the left; therefore, it is negatively skewed.
The best measure of variability is the Interquartile Range as it is not affected
by extreme values unlike the variance (or standard deviation)

Question 7
a) True. If n = 60, then 30 of the students scored less than 50 in their profile
(0.04+0.09+0.16+0.21 = 0.5; 0.5*60 = 30)
b) True. If n = 25, then the number of applicants in the top 25% is 0.25*25 = 6.25,
approximately 7.
c) True. If n = 100, then 25 applicants would score in that range.
(0.09+0.16 = 0.25; 0.25*100 = 25)
d) True. If n = 131, number of applicants who scored less than 40 is – 0.04+0.09+0.16 = 0.29;
0.29*131 = 37.99 – approximately 38.

Question 8
a) False: the data is skewed, best measure of central tendency is median, variability is IQR
b) False: median is in 3rd class interval, position = 55.5
c) False: 8 is included, i.e. they log on 8 or more times
d) False: the second decile is in position 22.2

Question 9
9.1 B
9.2 P68 = 622.303

Question 10
a) False: At most 45
b) False: No more than 60
c) True
d) False: Mode = 42.5 (in class 40-45)
e) False: The class endpoints (upper-limits) are plotted against the cumulative frequencies

Question 11
D

Question 12
12.1 181.5
12.2 E

Question 13
13.1 n = 16; Modes = 19 and 20; CV = 27.89%
13.2 Mean = 28.25 > Median = 26.5, the distribution is skewed to the right (positively
skewed) OR The data is tailed to the right; therefore, it is skewed to the right.

Question 14
14.1 D
14.2 Range = 50; Variance = 192.76; Upper quartile (75th percentile) = 89

Question 15
15.1 Mean = 53.76; Mode = 49; P80 = 65.8
15.2 True / False:
a) False: Data is skewed; best measure of variability is IQR, of central tendency is median
b) False: Median has 50% of the values below it
c) False: Position is between 9 and 10, P35 = 49
d) False: this is a sample, so the values are statistics
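The grouped-data figures in answer 5.4 of Question 5 differ from the raw-data figures in 5.1 because each observation is replaced by its class midpoint. A minimal sketch of that approximation, using the class limits and frequencies from the 5.2 frequency table (the function name is illustrative):

```python
# (lower limit, upper limit, frequency) for each class in the Question 5.2 table.
classes = [(10, 15, 2), (15, 20, 3), (20, 25, 13), (25, 30, 9), (30, 35, 3)]


def grouped_stats(class_table):
    """Approximate mean, sample variance and range using class midpoints."""
    n = sum(f for _, _, f in class_table)
    midpoints = [((lo + hi) / 2, f) for lo, hi, f in class_table]
    mean = sum(m * f for m, f in midpoints) / n
    variance = sum(f * (m - mean) ** 2 for m, f in midpoints) / (n - 1)
    # Range for grouped data: highest upper limit minus lowest lower limit.
    value_range = class_table[-1][1] - class_table[0][0]
    return mean, variance, value_range


mean, variance, value_range = grouped_stats(classes)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, range = {value_range}")
```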
2. Correlation and Regression

Question 1
1.1 B
1.2 A negative, nonlinear relationship between X and Y. The regression line is used to model linear
relationships and would not be appropriate.
(The relationship is almost exponential, not linear)

Question 2
2.1 Independent = hours watching TV; Dependent = Test mark; Reason – The more time you spend
studying (and the less time watching TV), the higher your test marks. Therefore, your test marks depend on the
time spent watching TV.
2.2 B
2.3 2.1803 (i = 1 refers to the first column; when x = 0, y = 96)

2.4 True / False
a) False: cannot predict for x values outside range (extrapolation)
b) False: cannot use y to predict x
c) False: cannot just take the corresponding value from the original data. The estimated regression
line is used and the correct answer is 69.33%
d) True

2.5 r² = 0.6939, i.e. 69.39% of the variation in test marks is explained by the hours of TV watched and
the other 30.61% is explained by other factors.

Question 3
3.1 Independent variable: shelf space, Dependent variable: weekly sales; Reason: The store with the
largest shelf space for the liquid is likely to have higher weekly sales and vice-versa.
3.2 Ŷ = 159.83 + 11.92 X
3.3 e₃ = 4.49
3.4 B

3.5 True / False
a) False: only predict y for x, not the other way around
b) False: It indicates the proportion of variation in weekly sales explained by the shelf space in the
regression model
c) True, positively correlated
d) True: Ŷ = 159.83 + 11.92 * 7.5 = 249.2
e) False: cannot predict y for x values outside the range (extrapolation)

3.6 96.98%

Question 4
4.1 Ŷ = 8.48 – 1.32T
4.2 True / False
a) True
b) False: cannot predict y for x values outside the range (extrapolation)
c) False: only predict y for x, not the other way around
d) True

Question 5
5.1 E
5.2 B
5.3 55.63%
5.4 Yes, Interpolation

Question 6
a) False: The coefficient of determination indicates the percentage variation in Y explained by X; the
correlation coefficient shows the strength and direction of a linear relationship
b) False: It regresses variable X on variable Y
c) False: For every one-unit change in X there is a 0.5 unit change in Y
d) False: The correlation coefficient is a number between -1 and +1, inclusive
e) False: 81% of the variation in Y is accounted for by the variation in X in the model
f) False: The independent variable influences the dependent variable
g) False: The error term shows the difference between the observed and predicted values of the
dependent variable
h) False: It is statistically valid to predict the value of Y, provided that the value of X lies in the valid
range of X
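Several of the True / False answers in this section turn on the same rule: the fitted line may be used to predict Y only for X values inside the observed range. A small sketch using the Question 3 line makes the rule explicit. The range `ASSUMED_X_RANGE` is hypothetical, since the observed shelf-space values are not listed in the answer key.

```python
# Fitted line from answer 3.2: Y-hat = 159.83 + 11.92 X.
INTERCEPT = 159.83
SLOPE = 11.92

# Hypothetical observed range of shelf space; the real limits come from the question's data.
ASSUMED_X_RANGE = (5.0, 20.0)


def predict_weekly_sales(shelf_space, x_range=ASSUMED_X_RANGE):
    """Predict weekly sales, refusing to extrapolate beyond the observed X range."""
    low, high = x_range
    if not low <= shelf_space <= high:
        raise ValueError("X is outside the observed range: extrapolation is not valid")
    return INTERCEPT + SLOPE * shelf_space


prediction = predict_weekly_sales(7.5)
print(f"Predicted weekly sales at X = 7.5: {prediction:.2f}")
```

The function deliberately has no inverse: the line regresses Y on X, so it is not used to predict X from Y.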

3. Time Series

Question 1
4. D - An additive model is appropriate since the amplitude of this series is constant
1.1 B
1.2 D
1.3 139.4

Question 2
2.1 B
2.2 A
2.3 D
2.4 E (Multiplicative model – scoped out)
2.5 D (Multiplicative model – scoped out)

Question 3
3.1 C
3.2 B
3.3 D
3.4 E
3.5 0.565 (Multiplicative model – scoped out)
3.6 133.190 (Multiplicative model – scoped out)

Question 4
E

Question 5
5.1 Winters' method would be a better exponential smoothing method to use to cater for the trend and
seasonality components
5.2 Additive model, since the amplitude of the time series remains constant. Yₜ = Sₜ + Tₜ + εₜ
5.3 465.996
5.4 C
5.5 B
5.6 Q1 = -151.953; Q2 = -200.390; Q3 = 16.485; Q4 = 335.860
5.7 Tₜ = 426.455 + 14.968t; Ȳ₁₄ = 652.492

4. Probability Part 1

Question 1
D

Question 2
E

Question 3
0.583

Question 4
a) True
b) True
c) False: The correct answer is 0.33

Question 5
5.1 12
5.2 0.583

Question 6
E

Question 7
0.58

Question 8
a) False: events A and B do not entirely make up the sample space (P(A ∪ B) = 0.8)
b) True: P(A ∩ B) = 0
c) False: the events are mutually exclusive, but not exhaustive

Question 9
0.55 ≤ P(A̅ ∪ B) ≤ 0.87

Question 10
0.05 ≤ P(A ∩ B) ≤ 0.35

Question 11
D

Question 12
0.697

Question 13
13.1 58/200
13.2 69/200
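Returning to Time Series Question 5 (answers 5.6 and 5.7): under an additive model a forecast is the trend value plus the seasonal component for that period. The sketch below assumes that period t = 14 falls in quarter 3. That is an inference, not something the answer key states; it is the only quarter whose seasonal component reproduces the quoted value of 652.492.

```python
# Seasonal components from answer 5.6 (additive model).
SEASONAL = {1: -151.953, 2: -200.390, 3: 16.485, 4: 335.860}


def trend(t):
    """Linear trend from answer 5.7: T_t = 426.455 + 14.968 t."""
    return 426.455 + 14.968 * t


def additive_forecast(t, quarter):
    """Forecast = trend + seasonal component (additive decomposition)."""
    return trend(t) + SEASONAL[quarter]


# Assumption: period 14 is a third quarter.
forecast_14 = additive_forecast(14, quarter=3)
print(f"trend = {trend(14):.3f}, forecast = {forecast_14:.3f}")
```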

5. Probability Part 2

Question 1
a) False: 0 ≤ P(A ∩ B) ≤ 0.4
b) False: 0.5 ≤ P(A ∪ B) ≤ 0.9
c) True: P(B|A) = P(A ∩ B)/P(A) = (0 to 0.4)/0.4 = 0 to 1; 0 ≤ P(B|A) ≤ 1

Question 2
2.1 0.5
2.2 0.333

Question 3
3.1 0.313
3.2 0.687
3.3 0.543
3.4 0.216

Question 4
4.1 0.7225
4.2 0.0225

Question 5
0.019

Question 6
6.1 0.714 – P(At least 1 woman) = (3C1×4C1+3C2×4C0)/ 7C2 = 0.714
= 1 – (3C0×4C2)/ 7C2 = 0.714 (complement)
6.2 0.286 – P(No woman selected) = (3C0×4C2)/ 7C2 = 0.286

Question 7
7.1 0.778 – P(Settlement | Adequate) = 0.778
7.2 Yes, P(Settlement ∩ Adequate) ≠ P(Settlement) × P(Adequate); (0.35 ≠ 0.65 × 0.45; 0.35 ≠
0.2925); Since the two events are not statistically independent, they must be dependent to some
degree.

Question 8
0.526
(NB: 25% of the bus days – 0.25×0.4 = 0.1; 15% of the train days – 0.15×0.6 = 0.09)

Question 9
0.015

Question 10
0.455
(NB: 40% Buy brand A at store 1 – 0.4×0.6; 25% Buy brand A at store 2 – 0.25×0.4)

Question 11
C

Question 12
12.1 3628800
12.2 E

Question 13
13.1 1352078
(NB: Order is not important; the order in which the players are chosen doesn’t matter – Combination)
13.2 110
(NB: Since there are defined/specific different positions, the order of selection does matter -
Permutation)

Question 14
C - (The outlets are different – PE, CT, DB and JB – Permutation)

Question 15
504

Question 17
12

Question 18
50400

Question 19
2080000
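The counting in Question 6 and the conditional probability in Question 8 of this section can be checked with a few lines. This is only a sketch: the group sizes (3 women, 4 others, 2 selected) are read off the nCr terms in answer 6.1, and the variable names for Question 8 are generic because the answer key does not name the event being conditioned on.

```python
from math import comb

# Question 6: choose 2 people from 3 women and 4 others.
p_no_woman = comb(3, 0) * comb(4, 2) / comb(7, 2)
p_at_least_one_woman = 1 - p_no_woman  # complement rule

# Question 8: Bayes' rule with the figures from the NB note.
p_bus, p_train = 0.4, 0.6
p_event_given_bus, p_event_given_train = 0.25, 0.15
p_event = p_event_given_bus * p_bus + p_event_given_train * p_train  # total probability
p_bus_given_event = p_event_given_bus * p_bus / p_event

print(f"P(at least one woman) = {p_at_least_one_woman:.3f}")
print(f"P(no woman) = {p_no_woman:.3f}")
print(f"P(bus | event) = {p_bus_given_event:.3f}")
```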

6. Discrete Distributions

Question 1
1.1 0 ≤ p(x) ≤ 1 for all the probabilities of x; Σ(all x) p(x) = 0.2 + 0.36 + 0.33 + 0.07 + 0.04 = 1
1.2 Mode = 1; Mean = 1.39; Standard deviation = 1.0089

Question 2
2.1 B
2.2 E(X) = 4.333; Var(X) = 0.889; P(4 ≤ X ≤ 5) = 2/[denominator missing]

Question 3
3.1 c = 0.48
3.2 P(X ≤ 0) = 0.52

Question 4
4.1 E(X) = R1 000
4.2 Standard deviation = R2 549.51

Question 5
5.1 E(X) = 13.75%
5.2 0.55

Question 6
6.1 True / False
a) True
b) True
c) False
d) True
6.2 3.6

Question 7
All three statements are true.

Question 8
8.1 All four conditions are satisfied
8.2 E(X) = 1.8
8.3 Var(X) = 0.72
8.4 Which of the following statements are false:
a) True
b) False
c) False
d) True

Question 9
a) True
b) True
c) True
d) False
e) False

Question 10
0.001148

Question 11
11.1 D
11.2 B

Question 12
12.1 X ~ P(14) per week; assume people enter the unit at a constant rate according
to a Poisson process
12.2 X ~ B(8, 0.65); assume applicants are independent from one another
12.3 X ~ B(12, 0.02); defects occur independently
12.4 X ~ P(3.5) per day; assume complaints occur at a constant rate according to a Poisson
process
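The checks in Question 1 of this section (valid probabilities, mode, mean, standard deviation) follow directly from the definition of a discrete distribution. In the sketch below the x values 0 to 4 are an assumption: the answer key lists only the probabilities, and 0 to 4 is the assignment that reproduces the stated mode, mean and standard deviation.

```python
import math

# Probabilities from answer 1.1; the x values 0..4 are assumed, not given in the key.
pmf = {0: 0.20, 1: 0.36, 2: 0.33, 3: 0.07, 4: 0.04}

# Conditions for a valid probability distribution.
is_valid = all(0 <= p <= 1 for p in pmf.values()) and abs(sum(pmf.values()) - 1) < 1e-9

mean = sum(x * p for x, p in pmf.items())                    # E(X)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # Var(X) = E[(X - mu)^2]
std_dev = math.sqrt(variance)
mode = max(pmf, key=pmf.get)                                 # value with the highest probability

print(f"valid = {is_valid}, mode = {mode}, mean = {mean:.2f}, sd = {std_dev:.4f}")
```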
7. Continuous Variables and Exponential Distribution

Question 1
a) True
b) True
c) True
d) False: P(1.5 < X < 2.5) = 0.7639
e) False: P(X > 3) = 0

Question 2
a) True
b) True
c) False: P(X = 2) = 0
d) True
e) True

Question 3
3.1 E(X) = 2.8; Var(X) = 7.84
3.2 0.9179
3.3 0.0067

Question 4
E

Question 5
0.3935

Question 6
6.1 Mean = 5.8824; Standard deviation = 5.8824

Question 7
7.1 A
7.2 362 (Scoped out – Lambda (Poisson or exponential) can be approximated by np (binomial))

Question 8
8.1 8 trains
8.2 225 minutes
8.3 0.6703
8.4 0.1455

Question 9
0.7832 : 1 – P(X1 < 9400) × P(X2 < 9400); 1 − (1 − e^(−9400/15000)) × (1 − e^(−9400/15000))

Question 10
10.1 Mean = 0.4; Variance = 0.16
10.2 0.0067
10.3 0.0067

Question 11
11.1 0.5768
11.2 0.1353

Question 12
268 hours

Question 13
0.91523
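The expression given for Question 9 of this section is the complement rule applied to two independent exponential variables. A short sketch of the arithmetic follows; it assumes the exponent 9400/15000 means each variable has mean 15000, which is inferred from the answer rather than stated in it.

```python
import math


def exponential_cdf(x, mean):
    """P(X < x) for an exponential variable with the given mean."""
    return 1 - math.exp(-x / mean)


# Assumed from the answer's exponent: both variables have mean 15000.
p_below = exponential_cdf(9400, 15000)

# P(at least one of X1, X2 is 9400 or more) = 1 - P(X1 < 9400) * P(X2 < 9400), by independence.
p_at_least_one = 1 - p_below * p_below

print(f"P(X < 9400) = {p_below:.4f}, answer = {p_at_least_one:.4f}")
```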

8. Normal Distribution and Central Limit Theorem

Question 1
a) True
b) True
c) False
d) True
e) 525

Question 2
2.1 0.005 ≤ P(X̄) ≤ 0.01
2.2 0.9988

Question 3
3.1 75.325
3.2 122.2
3.3 0.90 ≤ P(X̄) ≤ 0.95

Question 4
0.3372

Question 5
Approximately R42.9

Question 6
6.1 True / False
a) False
b) False
c) True
d) False
e) True
6.2 64.36 years
6.3 0.0217

Question 7
A – (This statement applies when the distribution is NOT normally distributed)

Question 8
0.0062

Question 9
9.1 R1 079.07
9.2 True / False
a) False
b) False
c) True
d) True
e) False

Question 10
10.1 142.44 grams
10.2 0.00016

Question 11
11.1 0.8729
11.2 0.2227

Question 12
12.1 True / False
12.2 True
12.3 False, probability is between 0.995 and 1
12.4 402 969 km

Question 13
A

Question 14
0.9987
9. Confidence Intervals and Hypothesis Testing

Question 1
H0: μ = 192 000 H1: μ ≠ 192 000; CV = +/-1.6449; Test statistic = -1.8851; Reject the null hypothesis;
We have sufficient evidence to support the alternative hypothesis at 10% level of significance

Question 2
2.1 E
2.2 B (error - in question 2, not 3)

Question 3
3.1 (55.8782; 60.1218)
3.2 μ = 54 – lies outside the CI; Therefore, we are 95% confident that the students performed
differently from previous years (the CI does not include the previous score - 54)

Question 4
H0: μ = 12 H1: μ < 12; CV = -2.4258; Test statistic = -6.5363; Reject the null hypothesis; Sufficient
evidence to support the alternative hypothesis at 1% level of significance

Question 5
5.1 (2116.10; 5250.56)
5.2 H0: μ = 4800 H1: μ > 4800; CV = 3.3649 (Rejection region ∈ (3.3649; ∞)); Test statistic = -1.4357;
Fail to reject the null hypothesis; Insufficient evidence to support the alternative hypothesis at 1%
level of significance that the average profit per sale is more than R4 800

Question 6
6.1 (17.83; 20.17)
6.2 H0: μ = 20 H1: μ < 20; CV = -1.7291 (Rejection Region ∈ (−∞; −1.7291)); Test statistic = -1.7889;
Reject the null hypothesis, since the test statistic lies in the rejection region; Sufficient evidence to
support the alternative hypothesis at 5% level of significance

Question 7
7.1 H0: μa – μb = 0 vs. H1: μa – μb > 0; CV = 1.2958 (Rejection Region ∈ (1.2958; ∞)); Test statistic =
1.7798 (City A – City B); Reject the null hypothesis; Sufficient evidence to support the alternative
hypothesis at 10% level of significance
7.2 (Error)

Question 8
8.1 (2.4775; 8.5225)
8.2 D
8.3 Zero (no difference) is NOT included in the confidence interval, so we reject H0 and conclude that
at a 5% level of significance, sufficient evidence exists to indicate that the weekly production rates
for the two methods are different

Question 9
9.1 (-1.554; 4.734)
9.2 D - (0 (μp - μw = 0) is in the confidence interval – hence not in the rejection region; fail to reject;
Insufficient evidence)

Question 10
10.1 C
10.2 We are 90% confident that the true mean difference lies in the interval (4.988; 22.012)

Question 11
11.1 H0: μd = 0 H1: μd > 0; CV = 1.8331 (Rejection Region ∈ (1.8331; ∞)); Test statistic = 1.9787;
Reject the null hypothesis; Sufficient evidence to support the alternative hypothesis at 5% level of
significance
11.2 (-4.438; 66.438)
11.3 Therefore, since 0 is contained in the 95% confidence interval, it is a possible value for the
difference between the 2 means. Thus, we fail to reject H0 (outside the rejection region; in the
confidence interval)

Question 12
12.1 (0.0206; 0.1044)
12.2 H0: p = 0.04 H1: p > 0.04; CV = 1.2816 (Rejection Region ∈ (1.2816; ∞)); Test statistic = 1.29904;
Reject the null hypothesis; Sufficient evidence to support the alternative hypothesis at 10% level of
significance that the proportion of bad debts is higher than 4%

Question 13
13.1 (0.2021; 0.5179)
13.2 H0: p = 0.35 H1: p ≠ 0.35; CV = +/- 2.5758 (Rejection Region ∈ (2.5758; ∞) and (−∞; −2.5758));
Test statistic = 0.1482; We fail to reject the null hypothesis; Insufficient evidence to support the
alternative hypothesis at 1% level of significance

Question 14
14.1 (0.5945; 0.7055)
14.2 H0: p = 0.6 H1: p > 0.6; CV = 2.3263 (Rejection Region ∈ (2.3263; ∞)); Test statistic = 1.4434;
We fail to reject the null hypothesis; Insufficient evidence to support the alternative hypothesis at
1% level of significance that his chemical is satisfactory

Question 15
C
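The proportion answers for Questions 12, 13 and 14 of this section all use the same two formulas: the interval is built around the sample proportion, while the test statistic uses the hypothesised proportion in its standard error. A sketch follows, with illustrative function names; the critical z values are supplied by hand from the normal table rather than computed.

```python
import math


def proportion_ci(successes, n, z_crit):
    """Confidence interval for a proportion: p-hat +/- z * sqrt(p-hat (1 - p-hat) / n)."""
    p_hat = successes / n
    margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def proportion_z(successes, n, p0):
    """z statistic for H0: p = p0, using p0 in the standard error."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


# (question, successes, n, z for the CI, hypothesised p) taken from Questions 12 to 14.
cases = [("Q12", 8, 128, 1.96, 0.04), ("Q13", 18, 50, 2.3263, 0.35), ("Q14", 130, 200, 1.6449, 0.60)]
results = {}
for name, x, n, z_crit, p0 in cases:
    results[name] = (proportion_ci(x, n, z_crit), proportion_z(x, n, p0))
    (low, high), z = results[name]
    print(f"{name}: CI = ({low:.4f}; {high:.4f}), z = {z:.4f}")
```

The decision then follows by comparing each z with the critical value quoted in the corresponding answer.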

10. p-Values and Chi-Squared Distribution

Question 1
a) H0: μ = 8 vs. H1: μ ≠ 8
b) Test statistic = −1.6151
c) p-value = 2P(T > 1.6151) where T ~ t₅; p-value is between 0.1 and 0.2
d) We have weak evidence in favour of H1 that the average number of hours worked differs from 8
hours.

Question 2
a) H0: μ = 5 vs. H1: μ > 5
b) Test statistic = t = 3.1623
c) p-value = P(T > 3.1623) where T ~ t₉; p-value is between 0.005 and 0.01
d) We have very strong evidence to conclude that the machine is dispensing more than 5 mg of
active ingredient of the drug in each container.

Question 3
a) H0: μ = 2.5 vs. H1: μ < 2.5
b) Test statistic = z = −1.6010
c) p-value = P(Z < -1.60) = 0.0548; p-value is between 0.05 and 0.1
d) We have moderate evidence to support the manufacturer’s claim.

Question 4
a) H0: μd = 0 vs. H1: μd ≠ 0
b) Test statistic = t = −1.8821
c) p-value = 2P(T > 1.8821) where T ~ t₅; p-value is between 0.1 and 0.2
d) We have weak evidence in favour of H1 that the athletes’ performances are affected by the
altitude.

Question 5
a) H0: μN = μS or μN − μS = 0
H1: μN ≠ μS or μN − μS ≠ 0
b) Test statistic = -1.8399 (s*² = 0.3025; s* = 0.55)
c) p-value = 2 × P(T₈ > 1.8399) is between 0.1 and 0.2
d) We have weak evidence to conclude that the special treatment changes the tensile strength.

Question 6
a) H0: μb – μa = 0
H1: μb > μa or μb – μa > 0
b) Test statistic = 2.3667 (Night – Day) (s*² = 153.0303; s* = 12.37054…)
c) p-value = P(T₃₃ > 2.3667) is between 0.01 and 0.025
d) We have strong evidence to support the manager’s suspicion

Question 7
a) H0: p = 0.5 vs. H1: p > 0.5
b) Test statistic = 0.2828
c) p-value = P(Z > 0.28) = 0.3897 – between 0.2 and 1
d) We have insufficient evidence to support the medical investigator’s claim

Question 8
8.1
a) H0: Health care plan and job category are not associated
H1: Health care plan and job category are associated
b) CV = χ²(0.05; 4) = 9.488 (Rejection region ∈ (9.488; ∞))
c) Test Statistic = 13.3333
d) We reject H0 in favour of H1 at 5% level of significance
e) We have sufficient evidence to conclude that health care plan and job category are associated.

8.2
a) H0: Health care plan and job category are not associated
H1: Health care plan and job category are associated
b) Test statistic = 13.333
c) p-value = P(χ²₄ > 13.333) is between 0.005 and 0.01
d) We have very strong evidence, in favour of H1, to conclude that health care plan and job category
are associated

Question 9
9.1 Contingency Table:

                  Like      Do not like   Total
City dwellers     21 (18)   39 (42)       60
Rural dwellers    9 (12)    31 (28)       40
Total             30        70            100

The expected values in parentheses (round brackets) are relevant for question 9.2 and 9.3

9.2
a) H0: Opinion of the programme and area of residence are unrelated
H1: Opinion of the programme and area of residence are related
b) CV = χ²(0.05; 1) = 3.841 (Rejection region ∈ (3.841; ∞))
c) Test Statistic = 1.7857
d) We fail to reject H0 at 5% level of significance
e) We have insufficient evidence to conclude that their opinion of the programme and area of
residence are related.

9.3
a) H0: Opinion of the programme and area of residence are unrelated
H1: Opinion of the programme and area of residence are related
b) Test statistic = 1.7857
c) p-value = P(χ²₁ > 1.7857) is between 0.1 and 0.25
d) We have weak evidence, in favour of H1, to conclude that the opinion of the programme and area
of residence are related.
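The chi-squared statistics quoted for Questions 8 and 9 of this section come from comparing each observed count with its expected count (row total × column total / grand total). A sketch of that calculation follows; the function name is illustrative, and the critical value or p-value range is still read from a chi-squared table.

```python
def chi_squared_statistic(observed):
    """Chi-squared test of association for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Question 8: job category (rows) by health care plan (columns).
stat8, df8 = chi_squared_statistic([[6, 10, 4], [14, 10, 6], [40, 40, 70]])
# Question 9: area of residence (rows) by like / do not like (columns).
stat9, df9 = chi_squared_statistic([[21, 39], [9, 31]])

print(f"Q8: chi-squared = {stat8:.4f}, df = {df8}")
print(f"Q9: chi-squared = {stat9:.4f}, df = {df9}")
```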
