
STAT 1520 Notes

This document provides a guide and notes for the STAT1520 Economic and Business Statistics final exam at the University of Western Australia. It covers key topics in four parts: topic summaries and exam tips; practice exam questions and answers; a sample final exam paper; and suggested solutions to the exam paper. The topic summaries review important statistical concepts like descriptive analytics methods, estimators, data types, numerical descriptive measures, basic probability rules, and data visualization techniques like scatter diagrams. The document aims to help students prepare and review for their STAT1520 final exam.

Uploaded by

nojnfo

lOMoARcPSD|6387688

University of Western Australia - STAT1520 Economic and Business Statistics
Economic And Business Statistics (University of Western Australia)

StuDocu is not sponsored or endorsed by any college or university


Downloaded by James acaster (jiimmya662@gmail.com)

UNIVERSITY OF WESTERN AUSTRALIA

STAT1520
ECONOMIC AND BUSINESS
STATISTICS

Final Examination Guide and Notes

- Premium Edition -

Exclusively Designed by

FIRST-CLASS HONOURS
- High Distinction -

TV


Table of Contents

STAT1520: Economic and Business Statistics

Part A – Topic Summary & Exam Tips

Part B – Exam Practice Questions & Answers

Part C – Sample Final Examination Paper

Part D – Suggested Solutions to Final Exam Paper


PART A

Topic Summary & Exam Tips

INTRODUCTION

BRANCHES | METHODS | TECHNIQUES
1. Descriptive Analytics | Collecting, summarizing, presenting and organizing data | Survey
2. Predictive Analytics | Use a model and data to make forecasts of outcomes | Forecasting
3. Inferential Analytics | Use data collected from a small group to draw conclusions about the population | Hypothesis

ESTIMATORS | SAMPLE | POPULATION
1. Mean | $\bar{X}$ | $\mu$
2. Variance | $s^2$ | $\sigma^2$
3. Standard Deviation | $s$ | $\sigma$
4. Proportion | $p$ | $\pi$

TYPES OF DATA | DESCRIPTION | EXAMPLES
Categorical: Nominal | Label, Name | Gender
Categorical: Ordinal | = Nominal Scale (+ Meaningful Order) | Likert Scale
Numerical: Interval | Ordered, Meaningful Difference, No True Zero | Celsius Scale
Numerical: Ratio | = Interval Scale (+ True Zero) | Age
Numerical: Discrete | Measuring How many, No Fractions | Number of children
Numerical: Continuous | Any numerical value is possible & meaningful | Weight


NUMERICAL DESCRIPTIVE MEASURES

MEASURES | EXPLANATION | FORMULA

Mean | The average value in a dataset | $\text{Mean} = \dfrac{\sum_{i=1}^{N} X_i}{N}$

Median | The middle value in a sorted dataset | Value at the $\dfrac{n+1}{2}$th position

Mode | The most common value in a dataset | Find the value with the highest frequency of occurrence

Standard Deviation (Population) | A measure of how spread out the data are around the centre of the distribution. N is the size of the population, $\sigma$ is the S.D. of the population. | $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$

Variance (Population) | Variance = (Standard Deviation)². Variance measures how the data distribute around the mean. | $\sigma^2 = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

First quartile | 25th percentile of the sorted sample data | Value at the $\dfrac{n+1}{4}$th position

Third quartile | 75th percentile of the sorted sample data | Value at the $\dfrac{3(n+1)}{4}$th position

Interquartile Range | Distance between the third quartile and the first quartile | $IQR = Q_3 - Q_1$

Standard Deviation (Sample) | Spread of the data observations around the Sample Mean. n is the Sample Size. | $s = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$

Variance (Sample) | Sample variance measures how far a set of data is spread out within that sample. | $s^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$

Upper Fence, Lower Fence | Upper and lower limits used to determine outliers in a Boxplot | Upper: $Q_3 + 1.5 \times IQR$; Lower: $Q_1 - 1.5 \times IQR$
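The difference between the population and sample formulas is only the divisor (N versus n - 1). The sketch below illustrates this on a small hypothetical dataset (the numbers are made up for illustration and do not come from the notes) and checks the hand-written formulas against Python's standard `statistics` module.

```python
import math
import statistics

# Hypothetical data, used only to illustrate the formulas.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in data)

pop_var = sum_sq_dev / n            # population variance: divide by N
sample_var = sum_sq_dev / (n - 1)   # sample variance: divide by n - 1
pop_sd = math.sqrt(pop_var)
sample_sd = math.sqrt(sample_var)

# The standard library implements the same two definitions.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(sample_var, statistics.variance(data))

print(f"mean={mean:.2f} pop_sd={pop_sd:.3f} sample_sd={sample_sd:.3f}")
```

The sample standard deviation is always slightly larger than the population version computed on the same numbers, because it divides by n - 1.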


BASIC PROBABILITY

Space | Sample Space (S) is the set of all possible outcomes | S1 = {Head, Tail}; S2 = {Mon, Tues, …, Sun}

Events | An event is an outcome or a set of outcomes of the Sample Space (S1, S2) | Event A1: Exactly (1 Head); Event A2: Exactly (Mon & Sun)

Probability | If A is any event, the probability of A is P(A), and it satisfies $0 \le P(A) \le 1$ | When A & B are disjoint: P(A or B) = P(A) + P(B)

Rules | General Addition Rule: P(A or B) = P(A) + P(B) − P(A and B) | For mutually exclusive events A & B: P(A or B) = P(A) + P(B)

Marginal & Joint Probability (A Contingency Table)

$P(B \mid A) = \dfrac{P(A \text{ and } B)}{P(A)}$, where $P(A)$ is the Marginal Probability of A

$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$, where $P(B)$ is the Marginal Probability of B

$P(A \text{ and } B) = P(A) \times P(B)$ when A and B are independent


DATA VISUALISATION

Type I

Numerical Variable | Numerical Variable | Scatter Diagram

The scatter diagram graphs pairs of numerical data, with one variable on each axis, to look for a relationship between them.

Type II

Numerical Variable | Categorical Variable | Comparative Summary Measures*, Multiple Boxplots**

*Comparative Summary Measures: In analytical work a frequently recurring operation is the verification of performance by comparison of data. For example, comparing the number of residential replacements, commercial, new residential, and other over a period from 2013 – 2024.

**Multiple Boxplots: A multiple box plot represents the ranges of values of multiple variables. A box plot is a standardised way of displaying the distribution of data based on: Lower Fence, 1st Quartile, Median, 3rd Quartile, Upper Fence.

Type III

Categorical Variable | Categorical Variable | Cross-tabulation*, Multiple Bar Chart**

*Cross-Tabulations: Cross tabulation is a statistical tool that is used to analyze categorical data. For example, eye color can be divided into categories (e.g., blue, brown, green), and each person's eye color belongs to exactly one category.

**Multiple Bar Chart: A multi-bar chart is a bar chart in which multiple data sets are represented by drawing the bars side by side in a cluster (e.g., on-campus versus off-campus).


COUNTING RULES

Factorials | The factorial of a non-negative integer n, denoted by n!, is the product of all positive integers less than or equal to n | $n! = n \times (n-1)!$. For example, $4! = 4 \times 3 \times 2 \times 1 = 24$

Permutations | A permutation is an arrangement in a particular order of r randomly sampled items from a group of n items and is denoted by $_nP_r$ | $_nP_r = \dfrac{n!}{(n-r)!}$

Combinations | A combination is an arrangement of r items chosen at random from n items where the order of the selected items is not important, for example XYZ is the same as ZYX. A combination is denoted by $_nC_r$ | $_nC_r = \dfrac{n!}{r!\,(n-r)!}$

Binomial Distribution | $b$: binomial probability; $x$: total number of successes (pass or fail, heads or tails, …); $P$: probability of a success on an individual trial; $n$: number of trials | $b(x; n, P) = {}_nC_x \times P^x \times (1-P)^{n-x}$

Poisson Distribution | The average number of successes $\mu$ that occurs in a specified region is known; $e$: a constant approximately equal to 2.718; $x$: the actual number of successes that occur in a specified region | $P(x; \mu) = \dfrac{e^{-\mu} \times \mu^x}{x!}$

Exam Tips 1: If you are given an exact probability and you want to find the probability of the event happening a certain number of times x (e.g. 10 times out of 100, or 99 times out of 1000), use the Binomial Distribution.

Exam Tips 2: If your question has an average number of events happening per unit (e.g. per unit of time, cycle, event) and you want to find the probability of a certain number of events happening in a period of time, then use the Poisson Distribution.
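The factorial, permutation and combination formulas above can be checked with a few lines of Python. This is a minimal sketch: the helper names `n_permute_r` and `n_choose_r` and the example of choosing 2 items from 5 are illustrative assumptions, not part of the notes.

```python
import math

def n_permute_r(n: int, r: int) -> int:
    """Ordered arrangements of r items from n: n! / (n - r)!"""
    return math.factorial(n) // math.factorial(n - r)

def n_choose_r(n: int, r: int) -> int:
    """Unordered selections of r items from n: n! / (r! * (n - r)!)"""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

four_factorial = math.factorial(4)   # 4 * 3 * 2 * 1, the example in the table
perms = n_permute_r(5, 2)            # hypothetical: order 2 items out of 5
combs = n_choose_r(5, 2)             # hypothetical: choose 2 items out of 5

# Python 3.8+ ships the same calculations as math.perm and math.comb.
assert perms == math.perm(5, 2)
assert combs == math.comb(5, 2)

print(four_factorial, perms, combs)
```

Because order matters for permutations but not for combinations, the permutation count is always r! times the combination count.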


CONFIDENCE INTERVAL ESTIMATES

A point estimate on its own does not show how precise it is. A confidence interval addresses this issue because it provides a range of values which is likely to contain the population parameter of interest. Confidence intervals are constructed at a confidence level, such as 90%, 95%, or 99%, selected by the user.

PROPORTION: Z-Score (always)

Information Given: $p$

Question 1: Of the 49 employees at Woolworth surveyed, 32 were satisfied with the survey they received. Find the proportion of ALL employees who are satisfied. Calculate the 90% confidence interval.

Step 1: Find the value of $p$:
$p = \dfrac{32}{49} = 0.653$
65.3% of the 49 employees surveyed were satisfied.

Step 2: Find the value of the Z score corresponding to the given confidence level.
90% Confidence Interval: Z-Score = −1.645, +1.645

Step 3: Apply Formula - Confidence Interval for Proportion
$p \pm Z \times \sqrt{\dfrac{p(1-p)}{n}} = 0.653 \pm 1.645 \times \sqrt{\dfrac{0.653 \times (1 - 0.653)}{49}} = 0.653 \pm 1.645 \times 0.068$
= (0.5411, 0.7649), or from 54.11% to 76.49%

Step 4 (Final): Conclusion – We are 90% confident that the proportion of all employees at Woolworth who are satisfied with the survey is somewhere between 54.11% and 76.49%.

MEAN: Z-Score when $\sigma$ known

Information Given: $\mu$, $\sigma$

Question 2: Records indicate that (annual) absenteeism of employees at Coles follows a Normal distribution, with $\mu = 10$ days and variance $\sigma^2 = 4$. Find the 95% interval estimate for the number of absent days.

Step 1: Find the value of the Population Mean $\mu$ and Population Standard Deviation $\sigma$:
$\mu = 10$
$\sigma = \sqrt{\text{Variance}} = \sqrt{4} = 2$

Step 2: Find the value of the Z score corresponding to the given confidence level.
95% Confidence Interval: Z-Score = −1.96, +1.96

Step 3: Apply Formula - Confidence Interval for Mean
When $\sigma$ is known, or the standard deviation of the population is given, we use the Z-Score formula:
$\mu \pm Z \times \sigma = 10 \pm 1.96 \times 2 = 10 \pm 3.92$
= (6.08, 13.92), or from 6 days to 14 days (rounded)

Step 4 (Final): Conclusion – We are 95% confident that the number of absent days of employees at Coles is somewhere between 6 days and 14 days.

MEAN: t-Score when $\sigma$ unknown

Information Given: $\bar{X}$, $s$, $n$

Question 3: A random sample of 81 houses was selected from all houses in Tasmania, and from the sample: $\bar{X} = 1175\ m^2$, $s = 373\ m^2$. Calculate the 99% confidence interval for the lot size of all houses.

Step 1: Find the value of the Sample Mean $\bar{X}$ and Sample Standard Deviation $s$:
$\bar{X} = 1175$, $s = 373$

Step 2: To find t-critical, we need the Degrees of Freedom
$n - 1 = 81 - 1 = 80$, and $\alpha$ = 1%. Use t-table: t-critical = 2.6387

Step 3: Apply Formula
We only know $s$ (sample standard deviation), NOT $\sigma$, so we use the t-statistic formula:
$\bar{X} \pm t \times \dfrac{s}{\sqrt{n}} = 1175 \pm 2.6387 \times \dfrac{373}{\sqrt{81}} = 1175 \pm 109.36$
= (1065.64, 1284.36), or from 1065.64 to 1284.36 square metres

Step 4 (Final): Conclusion – We are 99% confident that the mean Lot Size of all houses in Tasmania is somewhere between 1065.64 and 1284.36 $m^2$.
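The three worked intervals can be reproduced numerically. The sketch below is an illustration rather than the notes' own method: it takes the Z critical values from Python's `statistics.NormalDist` instead of a printed table, uses the standard deviation of 2 days from Step 1 of Question 2, and simply types in the t critical value 2.6387 quoted from the t-table, because the standard library has no t distribution. The helper name `z_critical` is an assumption.

```python
import math
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided Z critical value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Question 1: proportion, 32 satisfied out of 49, 90% confidence.
p_hat, n_emp = 32 / 49, 49
margin_p = z_critical(0.90) * math.sqrt(p_hat * (1 - p_hat) / n_emp)
ci_prop = (p_hat - margin_p, p_hat + margin_p)

# Question 2: mu = 10 days, sigma = 2 days, 95% interval mu +/- Z * sigma.
mu, sigma = 10, 2
margin_days = z_critical(0.95) * sigma
ci_days = (mu - margin_days, mu + margin_days)

# Question 3: sample mean 1175, s = 373, n = 81, 99% confidence.
x_bar, s, n_houses = 1175, 373, 81
t_crit = 2.6387  # t-table value for 80 degrees of freedom, as quoted in the notes
margin_lot = t_crit * s / math.sqrt(n_houses)
ci_lot = (x_bar - margin_lot, x_bar + margin_lot)

for name, ci in [("proportion", ci_prop), ("days", ci_days), ("lot size", ci_lot)]:
    print(f"{name}: {ci[0]:.4f} to {ci[1]:.4f}")
```

Small differences in the last decimal place compared with the hand calculation come from rounding the proportion and the Z value before multiplying.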


HYPOTHESIS TESTING

PROPORTION | MEAN | MEAN
Z-Score (always) | Z-Score when σ known | t-Score when σ unknown
Information Given: $p$ | Information Given: $\mu$, $\sigma$ | Information Given: $\bar{X}$, $s$, $n$
Use Z-Test in Hypothesis Testing | Use Z-Test in Hypothesis Testing | Use t-Test in Hypothesis Testing

A hypothesis test is a statistical test that is used to determine whether there is enough statistical evidence in a sample of data to infer that a certain condition is true for an entire population. A hypothesis test examines two opposing hypotheses about a population: the Null Hypothesis $H_0$ and the Alternative Hypothesis $H_1$.

EXAM TIPS & TRICKS

Keywords in Questions | Sign of Null Hypothesis & Alternative Hypothesis
Equal to | $H_0$: $=$
Not Equal to | $H_1$: $\ne$
No Change | $H_0$: $=$
Difference | $H_1$: $\ne$
No Difference | $H_0$: $=$
Exceeds (Greater than) | $H_1$: $>$
No Less than | $H_0$: $\ge$
Higher than or Equal | $H_0$: $\ge$
At Least | $H_0$: $\ge$
Less than | $H_1$: $<$
Less than or Equal to | $H_0$: $\le$
No More than | $H_0$: $\le$


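STEP 1: State Null Hypothesis and Alternative Hypothesis

Proportion | Mean
$H_0: \pi = a$ (Population proportion equals a) | $H_0: \mu = a$ (Population mean equals a)
$H_1: \pi \ne a$ (Population proportion differs from a) | $H_1: \mu \ne a$ (Population mean differs from a)
$H_0: \pi \le a$ (Population proportion less than or equal to a) | $H_0: \mu \le a$ (Population mean less than or equal to a)
$H_1: \pi > a$ (Population proportion higher than a) | $H_1: \mu > a$ (Population mean higher than a)
$H_0: \pi \ge a$ (Population proportion higher than or equal to a) | $H_0: \mu \ge a$ (Population mean higher than or equal to a)
$H_1: \pi < a$ (Population proportion less than a) | $H_1: \mu < a$ (Population mean less than a)

STEP 2: Determine tail test (DEPENDS ON THE SIGN OF THE ALTERNATIVE HYPOTHESIS)

STEP 3: Determine value of Z and t critical value

Case 1: Lower tail

[Figure: lower tail test, with a rejection region of area $\alpha$ in the lower tail of the Z or t distribution]

If level of significance = 1% → $\alpha$ = 0.01

If level of significance = 5% → $\alpha$ = 0.05

If level of significance = 10% → $\alpha$ = 0.10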





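LOWER TAIL TEST

Z-SCORE | T-SCORE
If $\alpha$ = 0.01 → Z = −2.33 | If $\alpha$ = 0.01 → Determine DF, Column [0.01]
If $\alpha$ = 0.05 → Z = −1.645 | If $\alpha$ = 0.05 → Determine DF, Column [0.05]
If $\alpha$ = 0.10 → Z = −1.28 | If $\alpha$ = 0.10 → Determine DF, Column [0.10]


Case 2: Upper tail

[Figure: upper tail test, with a rejection region of area $\alpha$ in the upper tail of the Z or t distribution]

If level of significance = 1% → $\alpha$ = 0.01

If level of significance = 5% → $\alpha$ = 0.05

If level of significance = 10% → $\alpha$ = 0.10

UPPER TAIL TEST
Z-SCORE | T-SCORE
• If $\alpha$ = 0.01 → Z = +2.33 | • If $\alpha$ = 0.01 → Determine DF, Column [0.01]
• If $\alpha$ = 0.05 → Z = +1.645 | • If $\alpha$ = 0.05 → Determine DF, Column [0.05]
• If $\alpha$ = 0.10 → Z = +1.28 | • If $\alpha$ = 0.10 → Determine DF, Column [0.10]

ONLY 1 VALUE OF Z-SCORE AND POSITIVE | ONLY 1 VALUE OF T-SCORE AND POSITIVE

Case 3: Two tails

[Figure: two tails test, with a rejection region of area $\alpha/2$ in each tail of the Z or t distribution]

If level of significance = 1% → $\alpha/2$ = 0.005 for each tail

If level of significance = 5% → $\alpha/2$ = 0.025 for each tail

If level of significance = 10% → $\alpha/2$ = 0.05 for each tail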





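TWO TAILS TEST
Z-SCORE | T-SCORE
• If $\alpha$ = 0.01 → Z = [−2.576, +2.576] | • If $\alpha$ = 0.01 → Determine DF, Column [0.005]
• If $\alpha$ = 0.05 → Z = [−1.96, +1.96] | • If $\alpha$ = 0.05 → Determine DF, Column [0.025]
• If $\alpha$ = 0.10 → Z = [−1.645, +1.645] | • If $\alpha$ = 0.10 → Determine DF, Column [0.05]

TWO VALUES OF Z-SCORE ( − + ) | TWO VALUES OF T-SCORE ( − + )


STEP 4: DECISION RULES

• Lower tail test: If Z calculate is less than Z critical, or t calculate is less than t critical, we reject the null hypothesis.
• Upper tail test: If Z calculate is higher than Z critical, or t calculate is higher than t critical, we reject the null hypothesis.
• Two tails test: If Z calculate is less than −Z critical or higher than +Z critical, we reject the null hypothesis. If t calculate is less than −t critical or higher than +t critical, we reject the null hypothesis.

STEP 5: Calculate Z calculate and t calculate

Z calculate (Using for Proportion) | Z calculate (Mean with σ given) | t calculate (Mean with σ NOT given)

Step 1: $\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$ | Step 1: $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$ | Step 1: $s_{\bar{X}} = \dfrac{s}{\sqrt{n}}$

$\pi$: get from the stated null hypothesis | $\sigma$: get from the population Standard Deviation | $s$: get from the sample SD

Step 2: $Z_{calc} = \dfrac{p - \pi}{\sigma_p}$ | Step 2: $Z_{calc} = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}}$ | Step 2: $t_{calc} = \dfrac{\bar{X} - \mu}{s_{\bar{X}}}$

$p$: get from the proportion of the sample | $\bar{X}$: get from the mean of the sample | $\bar{X}$: get from the mean of the sample

STEP 6: Decision and conclusion

Decision: depends on Step 4 → Reject or do not Reject $H_0$

Conclusion: We conclude that POPULATION PROPORTION / POPULATION MEAN …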
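The critical-value lookup and the decision rules for lower-tail, upper-tail and two-tail Z tests can be sketched in Python. This is a hedged illustration: the function names `z_critical` and `reject_null` are hypothetical, and the critical values come from `statistics.NormalDist` rather than from a printed Z-table. For t tests the same comparison applies, with the critical value read from a t-table instead.

```python
from statistics import NormalDist

def z_critical(alpha: float, tail: str):
    """Z critical value(s) for a 'lower', 'upper' or 'two' tail test."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "lower":
        return z.inv_cdf(alpha)
    if tail == "upper":
        return z.inv_cdf(1 - alpha)
    if tail == "two":
        upper = z.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
        return (-upper, upper)
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

def reject_null(statistic: float, alpha: float, tail: str) -> bool:
    """Apply the decision rule: True means reject the null hypothesis."""
    critical = z_critical(alpha, tail)
    if tail == "lower":
        return statistic < critical
    if tail == "upper":
        return statistic > critical
    lower, upper = critical
    return statistic < lower or statistic > upper

for tail in ("lower", "upper", "two"):
    print(tail, z_critical(0.05, tail))
```

For example, `reject_null(2.0, 0.05, "upper")` compares a calculated Z of 2.0 with the upper-tail critical value of about 1.645 and returns `True`.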





SIMPLE LINEAR REGRESSION MODEL


Least Squares Estimation

If $b_1$ and $b_2$ are the least squares estimates, then

$\hat{y}_i = b_1 + b_2 x_i$

$\hat{e}_i = y_i - \hat{y}_i = y_i - b_1 - b_2 x_i$

The Normal Equations

(Formula 1D) $N b_1 + \sum x_i \, b_2 = \sum y_i$

(Formula 2D) $\sum x_i \, b_1 + \sum x_i^2 \, b_2 = \sum x_i y_i$

Least Squares Estimators

(Formula 3D) $b_2 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \dfrac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2}$

(Formula 4D) $b_1 = \bar{y} - b_2 \bar{x}$

Assumptions of the Simple Linear Regression Model

SR1 The value of $y$, for each value of $x$, is $y = \beta_1 + \beta_2 x + e$

SR2 The average value of the random error $e$ is $E(e) = 0$, since we assume that $E(y) = \beta_1 + \beta_2 x$

SR3 The variance of the random error $e$ is $var(e) = \sigma^2 = var(y)$

SR4 The covariance between any pair of random errors, $e_i$ and $e_j$, is $cov(e_i, e_j) = cov(y_i, y_j) = 0$

SR5 The variable $x$ is not random and must take at least two different values

SR6 The values of $e$ are normally distributed about their mean: $e \sim N(0, \sigma^2)$

Gauss-Markov Theorem: Under the assumptions SR1 – SR5 of the linear regression model, the estimators $b_1$ and $b_2$ have the smallest variance of all linear and unbiased estimators of $\beta_1$ and $\beta_2$. They are the Best Linear Unbiased Estimators (BLUE) of $\beta_1$ and $\beta_2$.
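The least squares estimators can be computed directly from their formulas. The sketch below follows the convention stated at the start of this section, where the fitted line is y-hat = b1 + b2 x, so b1 is the intercept and b2 is the slope. The function name `least_squares` and the five data points are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def least_squares(x, y):
    """Return (b1, b2): intercept and slope of the fitted line y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    # Intercept: the fitted line passes through (x_bar, y_bar).
    b1 = y_bar - b2 * x_bar
    return b1, b2

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, b2 = least_squares(x, y)

# The estimates satisfy both normal equations.
n = len(x)
assert math.isclose(n * b1 + sum(x) * b2, sum(y))
assert math.isclose(sum(x) * b1 + sum(xi ** 2 for xi in x) * b2,
                    sum(xi * yi for xi, yi in zip(x, y)))

print(f"intercept b1 = {b1:.2f}, slope b2 = {b2:.2f}")
```

Checking the normal equations after fitting is a quick way to catch a slip in a hand calculation of the two estimates.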


PART B

Exam Practice Questions & Answers

PRACTICE QUESTIONS & ANSWERS

Question 1: Numerical Data Measures


A bank branch located in a commercial district of a city has developed a process to improve customer
service during the noon to 1 pm lunch period. The waiting time in minutes of all customers during this
hour is recorded over a period of one week.

The waiting time is defined as the time the customer enters the line to when he or she reaches the
teller window. A random sample of 15 customers is selected, and the results are as follows:

4.21 5.55 0.5 5.13 4.77 2.34 3.54 3.2 4.5 6.1 0.38 5.12 6.46 12.19 3.79

Use an appropriate technique to summarise the data and answer following questions.

Part a. What are the shortest and the longest waiting times?

Part b. What is the typical waiting time?

Part c. Around what values, if any, are the waiting times concentrated?

Part d. The standard deviation is approximately 2.8 minutes, what does this mean?

Part e. Calculate the Q1 and Q3

Part f. Calculate Inter-Quartile Range (IQR)

Part g. Comment on the shape of the waiting time data.

Part h. Determine if there are any unusual waiting times in the above dataset.

Part i. Use the results above to provide a summary of the waiting time in plain language.

Step-by-step Solutions

Part a.

Step 1: Remember to reorganize data in ascending order

0.38 0.5 2.34 3.2 3.54 3.79 4.21 4.5 4.77 5.12 5.13 5.55 6.1 6.46 12.19

Step 2: Find the value of shortest and longest waiting times

Shortest waiting time = 0.38 minutes

Longest waiting time = 12.19 minutes

Part b.



Calculate the typical waiting time:

$\text{Mean} = \dfrac{0.38 + 0.5 + 2.34 + \cdots + 6.1 + 6.46 + 12.19}{15} = \dfrac{67.78}{15} \approx 4.52$

Part c.

Calculate Median

$\text{Location of median value} = \dfrac{15 + 1}{2} = 8\text{th position}$

The median is the value at the 8th position in the sorted data array above, or 4.5

Generally, these values seem to be concentrated between 3.2 and 5.55.

Part d.

On average, each individual customer's waiting time deviates by about 2.8 minutes from the mean of 4.52 minutes.

Part e.

Calculate 1st Quartile

$\text{Location of 1st Quartile} = \dfrac{15 + 1}{4} = 4\text{th position}$

The 1st Quartile is the value at the 4th position in the sorted data array above, or 3.2.

Calculate 3rd Quartile

$\text{Location of 3rd Quartile} = \dfrac{3 \times (15 + 1)}{4} = \dfrac{48}{4} = 12\text{th position}$

The 3rd Quartile is the value at the 12th position in the sorted data array above, or 5.55

Part f.

$\text{Interquartile Range} = Q_3 - Q_1 = 5.55 - 3.2 = 2.35$

Part g.

Mean and Median are very similar → Symmetrical Data

Part h.

Calculate Limits - Using Empirical Rule (Symmetrical Data)

From $\text{Mean} - 3 \times \text{Standard Deviation}$ to $\text{Mean} + 3 \times \text{Standard Deviation}$

From $4.52 - 3 \times 2.8$ to $4.52 + 3 \times 2.8$

From −3.88 to 12.92

Since all of the data values lie between the lower limit and the upper limit, there seem to be no outliers in the data set.

Part i.

The average waiting time for these 15 customers was around 4.5 minutes.

The shortest waiting time was 0.4 minutes, while the longest waiting time was 12.2 minutes, resulting in a range of roughly 11.8 minutes.

The shortest 25% of waiting times did not exceed 3.2 minutes, while the longest 25% lasted at least 5.6 minutes. Thus, the middle half was spread across around 2.4 minutes.

The waiting time of 12.2 minutes seemed unusually long as it was almost twice the second longest waiting time.
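The summary measures for the 15 waiting times can be recomputed in Python. This sketch uses the position rules from Part A, (n + 1)/2, (n + 1)/4 and 3(n + 1)/4, and the boxplot fences Q1 - 1.5 IQR and Q3 + 1.5 IQR. The helper `value_at` is a hypothetical name, and its linear interpolation for fractional positions is an assumption that is not needed here because all three positions are whole numbers.

```python
import statistics

waiting = [4.21, 5.55, 0.5, 5.13, 4.77, 2.34, 3.54, 3.2,
           4.5, 6.1, 0.38, 5.12, 6.46, 12.19, 3.79]
data = sorted(waiting)
n = len(data)

def value_at(position: float) -> float:
    """Value at a 1-based position in the sorted data.

    Linear interpolation between neighbours for fractional positions is an
    assumption; with n = 15 every position used below is a whole number.
    """
    lower = int(position)
    frac = position - lower
    if frac == 0:
        return data[lower - 1]
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

mean = statistics.mean(data)
sd = statistics.stdev(data)           # sample standard deviation (n - 1)
median = value_at((n + 1) / 2)        # 8th position
q1 = value_at((n + 1) / 4)            # 4th position
q3 = value_at(3 * (n + 1) / 4)        # 12th position
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"mean={mean:.2f} sd={sd:.2f} median={median} Q1={q1} Q3={q3}")
print(f"IQR={iqr:.2f} fences=({lower_fence:.3f}, {upper_fence:.3f}) outliers={outliers}")
```

Under the boxplot fence rule the 12.19-minute wait falls above the upper fence, which agrees with the remark in Part i that it seemed unusually long, even though the three-standard-deviation limits in Part h do not flag it.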


Question 2 - Probability
A survey has been conducted of companies involved in software development. For the last 200 computer software packages released, the production expenditure and the profitability in the first year were as follows:

Production Expenditure ($'000) | Unprofitable | Profitable
Less than 100 | 75 | 15
100 to less than 300 | 40 | 20
300 or more | 25 | 25

Part 1: Contingency Table

Part a. For these data, construct the row percentage contingency table.

Part b. Construct the column percentage contingency table.

Part 2: Technical Analysis

Part a. Calculate the Probability of Production Expenditure of at least $100,000.

Part b. Calculate the Probability of Production Expenditure of at least $300,000 and Unprofitable.

Part c. Calculate the Probability of Production Expenditure of less than $100,000 given Profitable.

Part d. Test whether Production Expenditure & Profit are dependent on, or independent of, each other.

Part e. Without any further calculations, state whether you think production cost & profitability are dependent and give a brief explanation of your choice.

Step-by-step Solutions
Part 1. (a)
Step 1: Find Marginal Probability of each variable

Use contingency table:

Production Expenditure ($'000) | Unprofitable | Profitable | Total
Less than 100 | 75 | 15 | 90
100 to less than 300 | 40 | 20 | 60
300 or more | 25 | 25 | 50
Total | 140 | 60 | 200

Step 2: Row Percentage Contingency Table

Exam Tips: Values are summed in the rows for a total of 100%.

Production Expenditure ($'000) | Unprofitable (%) | Profitable (%) | Total (%)
Less than 100 | 83.3 | 16.7 | 100
100 to less than 300 | 66.7 | 33.3 | 100
300 or more | 50 | 50 | 100
Total | 70 | 30 | 100

Part 1. (b)

From the table in Step 1, we construct the Column Percentage Contingency Table

Production Expenditure ($'000) | Unprofitable (%) | Profitable (%) | Total (%)
Less than 100 | 53.6 | 25.0 | 45.0
100 to less than 300 | 28.6 | 33.3 | 30.0
300 or more | 17.9 | 41.7 | 25.0
Total (%) | 100.0 | 100.0 | 100.0

Part 2. (a)

From the table in Step 1, Production Expenditure ≥ $100,000 is a total of (60 + 50)

Exam Tips - Applying Simple Probability formula:

$P(\text{Expenditure} \ge 100) = \dfrac{60 + 50}{200} = 0.55 \text{ or } 55.0\%$

Part 2. (b)

$P(\text{Expenditure} \ge 300 \text{ and Unprofitable}) = \dfrac{25}{200} = 0.125 \text{ or } 12.5\%$

Part 2. (c)

$P(\text{Expenditure} < 100 \text{ and Profitable}) = \dfrac{15}{200} = 0.075 \text{ or } 7.5\%$

$P(\text{Profitable}) = \dfrac{60}{200} = 0.30 \text{ or } 30.0\%$

$P(\text{Expenditure} < 100 \text{ given Profitable}) = \dfrac{P(\text{Expenditure} < 100 \text{ and Profitable})}{P(\text{Profitable})} = \dfrac{0.075}{0.30} = 0.25 \text{ or } 25.0\%$

Part 2. (d)

$P(\text{Expenditure} < 100) = \dfrac{90}{200} = 0.45 \text{ or } 45.0\%$

$P(\text{Expenditure} < 100 \text{ given Profitable}) = 25.0\%$

Since P (Expenditure < 100) ≠ P (Expenditure < 100 given Profitable), we can conclude that Production Expenditure and Profit are dependent on each other.

Part 2. (e)

Yes, there seems to be a relationship between the variables.

If the production expenditure is less than $100,000, barely 17% of software packages are profitable. If the expenditure is increased to somewhere in the range of $100,000 to less than $300,000, around 33% are profitable. Finally, with expenditure of $300,000 or more, the chance of profitability is 50%.

In general, it seems that if production expenditure increases, so does the chance of being profitable.
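The probabilities in Part 2 can be derived from the raw counts in the contingency table. The sketch below is one possible layout, not the notes' method: the nested dictionary `counts` and the variable names are illustrative assumptions. Independence is checked with the rule P(A and B) = P(A) × P(B).

```python
import math

# Counts from the survey table: expenditure band -> first-year outcome.
counts = {
    "Less than 100": {"Unprofitable": 75, "Profitable": 15},
    "100 to less than 300": {"Unprofitable": 40, "Profitable": 20},
    "300 or more": {"Unprofitable": 25, "Profitable": 25},
}
total = sum(sum(row.values()) for row in counts.values())

# Marginal probabilities.
p_low = sum(counts["Less than 100"].values()) / total
p_profit = sum(row["Profitable"] for row in counts.values()) / total

# Joint and conditional probabilities.
p_low_and_profit = counts["Less than 100"]["Profitable"] / total
p_low_given_profit = p_low_and_profit / p_profit

# Independence would require P(A and B) = P(A) * P(B).
independent = math.isclose(p_low_and_profit, p_low * p_profit)

# Row percentages: share of profitable packages within each expenditure band.
profitable_pct = {
    band: round(100 * row["Profitable"] / sum(row.values()), 1)
    for band, row in counts.items()
}

print(p_low, p_profit, p_low_given_profit, independent)
print(profitable_pct)
```

The row percentages rise from the lowest expenditure band to the highest, which is the pattern described in Part 2 (e).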


Question 3 – Discrete Probability Distribution

This question has two parts:

Part a.

The probability that an individual packet of biscuits in a box is damaged has been found to be 0.2 over many years for a company. If a box consists of 10 packets of biscuits, what is the probability that:

i. Exactly two (2) packets of biscuits will be damaged in the box?

Binomial distribution with n = 10 and p = 0.2, so q = 1 − p = 0.8

$P(\text{exactly } 2) = {}_{10}C_2 \times p^2 \times q^8 = \dfrac{10!}{2!\,8!} \times 0.2^2 \times 0.8^8 = 45 \times 0.04 \times 0.1678 = 0.3020$

Hence there is a 30.20% chance that exactly 2 packets will be damaged.

ii. One (1) or more packets of biscuits will be damaged in the box?

$P(1 \text{ or more}) = 1 - P(0) = 1 - {}_{10}C_0 \times p^0 \times q^{10} = 1 - \dfrac{10!}{0!\,10!} \times 0.2^0 \times 0.8^{10} = 1 - 0.1074 = 0.8926$

Hence there is an 89.26% chance that one (1) or more packets will be damaged.

iii. How many packets of damaged biscuits would you expect in a box of 10 packets of biscuits?

We expect $\mu = np = 10 \times 0.2 = 2$ damaged packets in a box of 10 packets.

Part b.

Seriously injured people arrive at an emergency unit of a hospital at an average rate of 3 per hour. Assume that the number arriving has a Poisson distribution. What is the probability that:

i. Exactly three (3) seriously injured people arrive in the next hour?

Arrivals follow a Poisson distribution with $\mu = 3$ people expected per hour.

So $P(X = 3) = \dfrac{e^{-3} \times 3^3}{3!} = 0.2240$

Hence there is a 22.40% chance that exactly 3 people will arrive in the next hour.

ii. One (1) or more seriously injured people arrive in the next 30 minutes?

In $t = 0.5$ hours, we expect $\mu = 3 \times 0.5 = 1.5$ people to arrive.

So $P(1 \text{ or more people}) = 1 - P(0) = 1 - \dfrac{e^{-1.5} \times 1.5^0}{0!} = 1 - 0.2231 = 0.7769$

Hence there is a 77.69% chance that 1 or more people will arrive in the next 30 minutes.
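The binomial and Poisson calculations can be checked numerically. The sketch below uses the values stated in the question itself (a box of n = 10 packets with damage probability 0.2, and an average of 3 arrivals per hour, so 1.5 arrivals per 30 minutes); the function names `binomial_pmf` and `poisson_pmf` are illustrative assumptions.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(exactly x successes in n trials): nCx * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x: int, mu: float) -> float:
    """P(exactly x events) when mu events are expected: e^-mu * mu^x / x!."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

# Part a: box of 10 packets, each damaged with probability 0.2.
n_packets, p_damaged = 10, 0.2
p_exactly_two = binomial_pmf(2, n_packets, p_damaged)
p_one_or_more = 1 - binomial_pmf(0, n_packets, p_damaged)
expected_damaged = n_packets * p_damaged

# Part b: 3 arrivals per hour on average.
rate_per_hour = 3.0
p_three_in_hour = poisson_pmf(3, rate_per_hour)
p_any_in_30_min = 1 - poisson_pmf(0, rate_per_hour * 0.5)

print(f"{p_exactly_two:.4f} {p_one_or_more:.4f} {expected_damaged:.1f}")
print(f"{p_three_in_hour:.4f} {p_any_in_30_min:.4f}")
```

For "one or more" questions it is easier to subtract the probability of zero events from 1 than to add up every other case.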


Question 4 – Continuous Probability Distribution

Part a. The life (in hours) of a particular brand of light bulb is known to be uniformly distributed between three hundred (300) and eight hundred (800) hours.

i. Draw a simple graph below to illustrate the probability distribution

The total area under the curve must be 1, or 100%

Hence, $\text{area} = \text{height} \times \text{base}$

$1 = \text{height} \times (800 - 300)$

So, $\text{height} = \dfrac{1}{500} = 0.002$

ii. What is the probability that a light bulb will last less than five hundred (500) hours?

From the diagram, the shaded area is

$\text{Shaded Area} = 0.002 \times (500 - 300) = 0.4$

Part b. The life of a nickel-cadmium battery produced by a company is normally distributed with a mean of 20 hours and a standard deviation of 10 hours. The company is considering a warranty for the battery.

i. What proportion of batteries have a life of more than 30 hours?

The distribution of battery life is normal with $\mu = 20$ hours and $\sigma = 10$ hours

$Z = \dfrac{X - \mu}{\sigma} = \dfrac{30 - 20}{10} = 1$

For Z = 1, the area between the mean and Z given in the Z-table is 0.3413

Therefore, the area required is

$0.5 - 0.3413 = 0.1587$

Hence, 15.87% of batteries have a life of more than 30 hours.


ii. What proportion of batteries last between 15 and 30 hours?

$Z_1 = \dfrac{X_1 - \mu}{\sigma} = \dfrac{15 - 20}{10} = -0.5$

$Z_2 = \dfrac{X_2 - \mu}{\sigma} = \dfrac{30 - 20}{10} = 1$

For Z = −0.5, the area between the mean and Z given in the Z-table is 0.1915

For Z = 1, the area between the mean and Z given in the Z-table is 0.3413

Therefore, the area required is

$0.1915 + 0.3413 = 0.5328$

Hence, 53.28% of batteries have a life between 15 and 30 hours.

iii. If the company wants to replace less than 5% of all its batteries under a warranty, how many hours of use should the warranty cover?

From the Z-table, a lower-tail area of 0.05, or 5%, corresponds to Z = −1.645

From the diagram, we have

$-1.645 = \dfrac{X - 20}{10} \rightarrow X = -1.645 \times 10 + 20 = 3.55 \text{ hours}$

Hence the warranty should cover 3.55 hours or less, as 5% of batteries have lifetimes of 3.55 hours or less.
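The three normal-distribution answers can be verified without a Z-table. This is a small sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

# Battery life: normal with mean 20 hours and standard deviation 10 hours.
battery = NormalDist(mu=20, sigma=10)

# i. Proportion lasting more than 30 hours (upper-tail area).
p_more_than_30 = 1 - battery.cdf(30)

# ii. Proportion lasting between 15 and 30 hours.
p_between_15_and_30 = battery.cdf(30) - battery.cdf(15)

# iii. Lifetime below which only 5% of batteries fall (5th percentile).
warranty_hours = battery.inv_cdf(0.05)

print(f"{p_more_than_30:.4f} {p_between_15_and_30:.4f} {warranty_hours:.2f}")
```

`cdf` returns the area to the left of a value, so an upper-tail area is one minus the cdf and an area between two values is the difference of two cdf values.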

Part c. The game of Poker consists of dealing 5 cards to each player from a pack of 52 different cards without replacement. In Poker the order in which cards are dealt is not important. For example, one possible Poker hand is given below:

One possible Poker hand = 5 ♣, Ace ♥, 10 ♣, 3 ♦ and Queen ♠

How many different Poker hands of five cards are possible from a pack of 52 cards?

Order is not important, so use combinations. Hence the number of possible poker hands of 5 cards from a pack of 52 cards is

$\text{No. hands} = {}_{52}C_5 = \dfrac{52!}{5!\,47!} = \dfrac{52 \times 51 \times 50 \times 49 \times 48 \times 47!}{5 \times 4 \times 3 \times 2 \times 1 \times 47!} = 2{,}598{,}960 \text{ possible poker hands}$


Question 5 – Confidence Interval Estimation

We wish to estimate the mean Lot Size (square metres) of all houses in the Tasmania region. Assume the random sample of 120 houses sold is representative of all houses in Tasmania.

Part a.

Calculate the 95% confidence interval estimate of the mean lot size (square metres) given that there were n = 120 houses in the sample, sample mean = 1175 (square metres) and standard deviation s = 373 (square metres).

Step-by-step Solutions

First Step: Determine MEAN or PROPORTION?

Step 1 | $\mu$ or $\pi$? | Because we estimate the mean lot size, it is $\mu$.

Step 2 | $\sigma$ known? | Only the standard deviation of the sample ($s$) was given, so we use t-scores. Had the standard deviation of the population ($\sigma$) been given, we would use the Z-score instead.

Step 3 | Degrees of freedom? | Degrees of freedom = n − 1, where n is the sample size. Degrees of freedom = 120 − 1 = 119

Step 4 | 95% confidence interval | Upper-tail area = 2.5% = 0.025. Use t table: $t = 1.9799$

Step 5 | Applying formula | $\bar{x} \pm t \times \dfrac{s}{\sqrt{n}} = 1175 \pm 1.9799 \times \dfrac{373}{\sqrt{120}} = 1175 \pm 67.42$, giving [1107.58, 1242.42] square metres

Step 6 | Conclusion | We are 95% confident that the mean lot size of all houses in Tasmania is somewhere between 1107.58 $m^2$ and 1242.42 $m^2$.

Part b.

Suppose that the mean lot size for Sydney overall is 1,000 square metres. From your confidence interval in part (a), what can we say about the lot sizes of Tasmania houses compared to Sydney overall?

The whole confidence interval lies above 1,000 square metres, so it seems that the mean lot size for all Tasmania houses is higher than the mean lot size for Sydney houses.

Part c.

For the confidence interval calculated above in part (a), had you used 90% confidence instead of 95%, could you have come to a different conclusion in (b)? Explain your answer. (No calculations are required).

90% confidence instead of 95% would produce a narrower interval, which would still lie entirely above 1,000. Hence, this would not change the conclusion in (b).

Part d.

For the confidence interval calculated above in part (a), had the sample size of houses been n = 50 instead of n = 120, could you have come to a different conclusion in (b)? Explain your answer. (No calculations are required).

If we decrease the sample size, the Margin of Error would increase, in turn resulting in a wider confidence interval that could potentially include the Sydney mean of 1,000. Hence, this might change the conclusion in (b).

Part e.

If the sample size of houses was n = 20, what potential problems could there be in performing this type of analysis?

If the sample size was less than 30, we would not be able to simply assume that the distribution of the sample mean is normal. If, upon checking, it was not normal, we would not be able to construct a confidence interval in this way.
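Parts c and d are about how the margin of error reacts to the confidence level and the sample size. The sketch below illustrates this with the sample values from part (a). As a simplifying assumption it uses the standard normal critical value from `statistics.NormalDist` as a large-sample stand-in for the t-table value, so the numbers differ slightly from the t-based interval; the function name `margin_of_error` is hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(confidence: float, s: float, n: int) -> float:
    """Approximate margin of error z * s / sqrt(n), using z instead of t."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s / math.sqrt(n)

x_bar, s = 1175, 373  # sample mean and standard deviation from part (a)

scenarios = [
    ("95%, n = 120", 0.95, 120),
    ("90%, n = 120", 0.90, 120),
    ("95%, n = 50", 0.95, 50),
]
for label, confidence, n in scenarios:
    m = margin_of_error(confidence, s, n)
    print(f"{label}: {x_bar - m:.1f} to {x_bar + m:.1f} (margin {m:.1f})")
```

Lowering the confidence level narrows the interval and shrinking the sample widens it. Under this approximation the interval for n = 50 is wider but its lower limit is still above 1,000.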


Question 6 – Hypothesis Testing

At a recent Union meeting of Westfield staff, concern was expressed about the increasing number

of hours that stores were open. Staff felt that they we being made to work longer and longer hours.

One union official claimed that, on average, all Westfield stores were open (i.e. trading) for more

than 100 hours per week. Assume level of significance is 5%.

To test this claim, the Union took a survey of 100 stores where it was found that the average

opening hours (𝑥̅ ) was 104.35 with a standard deviation (s) of 23.677 hours.

Use the “Six Steps in Hypothesis Testing” to see if there are grounds for the Union’s claim.

Part a.

Set up Null Hypothesis and Alternative Hypothesis.

Part b.

Decide on the type of test

Part c.

Decide on a level of significance, 𝛼, and set critical value

Part d.

Write down the Decision Rule in terms of the critical value

Part e.

From the random sample, perform appropriate calculations

Part f.

Draw your conclusion


Step 1

H0: μ ≤ 100
The average number of hours ALL Westfield stores are open is less than or equal to 100.

H1: μ > 100
The average number of hours ALL Westfield stores are open is more than 100.

Step 2

Upper-tail test.

Exam Tip 1: How do we know? It depends on the sign in H1:
• H1 with > : upper-tail test
• H1 with < : lower-tail test
• H1 with ≠ : two-tail test

Step 3

Because we know only the standard deviation of the sample, s = 23.677, we apply the t-score calculation.

Exam Tip 2: We use the Z-score in hypothesis testing when:
• Test for Proportion
• Test for Mean + σ given

Step 4

Sample size is 100, therefore degrees of freedom (d.f.) = n − 1 = 100 − 1 = 99.

From the t table, critical value of t = +1.6604.

Rejection rule: we reject H0 if the t-statistic > 1.6604.

Step 5

We are given: s = 23.677, x̄ = 104.35, n = 100, and μ = 100.

t = (x̄ − μ) / (s/√n) = (104.35 − 100) / (23.677/√100) = 4.35 / 2.3677 = 1.8372

Step 6

Because the t-statistic = 1.8372 > t-critical value (1.6604), we reject H0.

Hence, there is sufficient evidence to conclude that the average number of hours all Westfield stores are open is more than 100 hours per week, and we agree with the Union’s claim.

Exam Tip: Students should provide a graph of the rejection region at Step 4 to gain full marks.
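
The Step 5 arithmetic can be checked with a short Python sketch. The function name `one_sample_t_statistic` is an illustrative choice, not something from the notes, and the critical value 1.6604 is simply copied from the t table reading in Step 4 rather than computed.

```python
import math


def one_sample_t_statistic(sample_mean, hypothesised_mean, sample_sd, n):
    """t = (x-bar - mu) / (s / sqrt(n)) for a one-sample test of a mean."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesised_mean) / standard_error


# Critical value taken from the t table in Step 4 (upper tail, 5%, d.f. = 99).
T_CRITICAL = 1.6604

t_stat = one_sample_t_statistic(104.35, 100, 23.677, 100)
reject_h0 = t_stat > T_CRITICAL  # upper-tail decision rule from Step 4

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```

Keeping the critical value as a named constant makes it clear that it comes from a table lookup and must be changed if the significance level or sample size changes.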


Question 7 – Simple Linear Regression Analysis


Foodmart Board is concerned about the variation in Sales ($ million) between individual

supermarkets. In particular, one Board member has suggested that a key factor in improving Sales is

for individual supermarkets to advertise more. You subsequently develop a simple regression model to

try and help explain the variation in sales. Below is the computer output of your simple regression

model. The dependent variable is Sales (measured in $million) and the independent variable is

Advertising (measured in $’000).

Model Summary

R        R Squared   Adjusted R Square   Std. Error of the Estimate
0.842a   0.709       0.707               1.9317

Coefficients

             Unstandardized Coefficients                    95.0% Confidence Interval for Beta
Model        Beta      Std. Error          t        Sig.    Lower Bound   Upper Bound
(Constant)   5.145     0.377               13.636   0.000   4.399         5.891
Adv. $’000   0.044     0.002               19.005   0.000   0.040         0.049

Use the above model to answer all of the following:

(a) How well does this model do in explaining the variation in sales? Explain fully.

(b) Write down the regression model equation and use it to predict sales for a store that spends

$100,000 on advertising.

(c) In the simple regression output above, the advertising variable has a 95% confidence interval

of 0.04 to 0.049. Give a practical explanation of this confidence interval.


Step-by-step Solutions

Part a.

R2 = 0.709

70.9% of the variation in sales can be explained by the variation in the amount spent on advertising.

The remaining 29.1% of variation would be explained by other factors, or variables, not in the model.

Thus, this is a fairly strong model in explaining the variation in sales.

Part b.

Sales = 5.145 + 0.044 ∗ Advertising

Advertising is measured in $’000, so $100,000 enters the equation as 100:

Sales = 5.145 + 0.044 ∗ 100

Sales = 5.145 + 4.4 = 9.545

Hence, predicted sales for a store that spends $100,000 on advertising would be $9.545 million.

Part c.

We are 95% confident that, on average, for every extra $1000 spent on advertising, sales will increase

by somewhere between $0.04m and $0.049m ($40,000 and $49,000).
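
As a rough check of Parts b and c, the fitted line and the slope interval can be evaluated in Python. The names `predict_sales_million` and `SLOPE_CI` are illustrative only; the coefficients are read straight from the output table, and the unit conversion (dollars to $’000) is the step most easily missed.

```python
# Coefficients read from the regression output (Sales in $ million,
# Advertising in $'000).
INTERCEPT = 5.145
SLOPE = 0.044
SLOPE_CI = (0.040, 0.049)  # 95% interval for the Advertising coefficient


def predict_sales_million(advertising_dollars):
    """Predicted sales in $ million for a given advertising spend in dollars."""
    advertising_thousands = advertising_dollars / 1000  # convert to $'000
    return INTERCEPT + SLOPE * advertising_thousands


predicted = predict_sales_million(100_000)

# Express the slope interval in dollars of sales per extra $1,000 of advertising.
ci_dollars = tuple(round(bound * 1_000_000) for bound in SLOPE_CI)

print(f"Predicted sales: ${predicted:.3f} million")
print(f"Extra sales per $1,000 of advertising: ${ci_dollars[0]:,} to ${ci_dollars[1]:,}")
```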


PART C

Sample Final Examination Paper


SAMPLE FINAL EXAMINATION


QUESTIONS AND SUGGESTED SOLUTIONS

Question 1

A bank branch located in a commercial district of a city has developed a process to improve

customer service during the noon to 1 pm lunch period. The waiting time in minutes of all

customers during this hour is recorded over a period of one week. The waiting time is

defined as the time the customer enters the line to when he or she reaches the teller window.

A random sample of 16 customers is selected, and the results are as follows:

Waiting Time (in minutes)

5.5 12.3 6.5 5.9

8.2 5.9 2.8 8.3

10.7 3.2 8.5 15.3

12.5 6.8 17.2 8.4

(a) Following is a table of key summary measures for the minutes of waiting time for

these 16 customers. Use the numbers above, and/or the summary measures given

below, to complete the table by including the eight (8) missing summary measures.


Number of Data Points 16

Minimum 2.8

Maximum 17.2

Total 138

Mean (Average) (i)

Median (ii)

Mode (iii)

First Quartile (iv)

Third Quartile (v)

Range (vi)

Interquartile Range (vii)

Variance (Sample) 16.47

Standard Deviation (Sample) (viii)

Standard Error 1.015

Skewness Coefficients (Pearson’s, Sample) 0.684

(b) Refer to the Interquartile Range. Using your result, explain in plain language how

this is useful for understanding data.


(c) Refer to the Standard Deviation. Using your result, explain in plain language how

this is useful for understanding data.

(d) We would describe this data set of 16 values as being skewed to the right (or having

a positive skew). Provide two sets of evidence from your table of summary measures

which confirm this and provide a brief explanation of each.

(3 + 3 + 3 + 2) = 11 marks

Question 2

Investigate whether or not Age of Store is dependent on Location. You have the following

cross-tabulations for the sample of 150 stores:

Frequency Count
                  Location
Age of Store      Country   Mall    Strip   Total
0 < 5 years       8         15      15      38
5 < 10 years      22        26      7       55
10 < 15 years     6         14      12      32
15 < 20 years     2         5       11      18
20 < 25 years     0         1       6       7
Total             38        61      51      150


% of Total
                  Location
Age of Store      Country   Mall    Strip   Total
0 < 5 years       5.3%      10.0%   10.0%   25.3%
5 < 10 years      14.7%     17.3%   ____    36.7%
10 < 15 years     4.0%      9.3%    8.0%    21.3%
15 < 20 years     1.3%      ____    7.3%    12.0%
20 < 25 years     0.0%      0.7%    4.0%    4.7%
Total             25.3%     40.7%   34.0%   100%

% of Row
                  Location
Age of Store      Country   Mall    Strip   Total
0 < 5 years       21.0%     39.5%   39.5%   100%
5 < 10 years      40.0%     ____    12.7%   100%
10 < 15 years     18.8%     43.8%   37.5%   100%
15 < 20 years     ____      27.8%   61.1%   100%
20 < 25 years     0.0%      14.3%   85.7%   100%
Total             25.3%     40.7%   34.0%   100%

% of Column
                  Location
Age of Store      Country   Mall    Strip   Total
0 < 5 years       21.0%     24.6%   29.4%   25.3%
5 < 10 years      57.9%     42.6%   13.7%   36.7%
10 < 15 years     15.8%     23.0%   23.5%   21.3%
15 < 20 years     5.3%      8.2%    21.6%   12.0%
20 < 25 years     0.0%      1.6%    11.8%   4.7%
Total             100%      100%    100%    100%

(a) Complete the four (4) missing boxes in the cross-tabulation table above.


(b) If you were to choose one store at random from the 150 in the group:

(i) What is the probability it would be a Mall store with an Age of less than 5

years?

(ii) What is the probability it would be aged 20 years or more?

(iii) What is the probability it would be aged from 10 to less than 15 years given

that it was a Strip store?

(iv) What is the probability it would be a Country store given that it was aged 20

years or more?

(4 + 8) = 12 marks

Question 3

(a) Wilson is interested in determining the true proportion of all customers who rank

the length of time they have to spend in queues as ‘excellent’. Given that 16.25% of

the 400 customers who were surveyed gave a rating of ‘excellent’, calculate a 90%

confidence interval for the true proportion.

(b) Suppose management want to know the true proportion of customers who rank the

length of time they have to spend in queues as ‘excellent’ to within 3% with 95%

confidence. How large a sample would need to be taken to achieve these

requirements?


(c) Below is output from a 90% confidence interval for the difference between two

population proportions. In this case the proportion of females who gave a rating of

‘excellent’ versus the same proportion of males. Based on the output can you

conclude that there is a difference between all male and all female customers?

Two Sample, Difference Proportions: Z

Details
  Confidence Interval: Two sided
  Categorical Variable
  Variable 1: Female
  Variable 2: Male
  Attribute Value: Excellent

Data Input
  Confidence Level %: 90.0%
  Sample 1: Size, n1: 269
  Sample 1: p1, %: 15.242
  Sample 1: Count 1: 41
  Sample 2: Size, n2: 131
  Sample 2: p2, %: 18.321
  Sample 2: Count 2: 24

Results
  Sample Proportion Difference %: -3.079
  Standard Error: 3.93
  Z: +/− 1.645
  90% Interval: π1 – π2
    From: -9.544
    To: 3.386

(3 + 2 + 2) = 7 marks

Question 4

We wish to estimate the mean Lot Size (square metres) of all houses in the Tasmania

region. Assume the random sample of 120 houses sold are representative of all houses in

Tasmania.


(a) Calculate the 95% confidence interval estimate of the mean lot size (square metres)

given that there were n = 120 houses in the sample, sample mean = 1175 (square

metres) and standard deviation s = 373 (square metres).

(b) Suppose that the mean lot size for Sydney overall is 1,000 square metres. From your

confidence interval in part (a), what can we say about the lot sizes of Tasmania

houses compared to Sydney overall?

(c) For the confidence interval calculated above in part (a), had you used 90%

confidence instead of 95%, could you have come to a different conclusion in (b)?

Explain your answer. (No calculations are required)

(d) For the confidence interval calculated above in part (a), had the sample size of

houses been n = 50 instead of n =120, could you have come to a different conclusion

in (b)? Explain your answer. (No calculations are required)

(e) If the sample size of houses was n =20 what potential problems could there be in

performing this type of analysis?

(3 + 2 + 2 + 1 + 1) = 9 marks


Question 5

The Foodmart Board is concerned about variation in Sales ($ million) between individual

supermarkets. In particular, one Board member has suggested that a key factor in

improving Sales is for individual supermarkets to advertise more. Other Board members

believe there would be several factors, such as Gender of the manager (where Male is coded

as 0 and Female is 1), or car parking spaces, that would influence Sales.

You now develop a multiple regression model to try and explain the variation in sales.

Below is your regression output:

Model Summary

R        R Squared   Adjusted R Square   Std. Error of the Estimate
0.854a   0.729       0.724               1.8770

Coefficients

             Unstandardized Coefficients                    95.0% Confidence Interval for B
Model        Beta      Std. Error          t        Sig.    Lower Bound   Upper Bound
(Constant)   4.660     0.408               11.407   0.000   3.852         5.467
Adv. $’000   0.039     0.003               14.614   0.000   0.034         0.045
Mng-Sex      -0.045    0.414               -0.109   0.913   -0.864        0.774
Car Spaces   0.027     0.008               3.278    0.001   0.011         0.043

Use the output to answer all of the following:


(a) Are all of the variables included in the model significant? Explain.

(b) Write down the regression equation.

(c) Give a practical interpretation of the coefficients b0 and b1 in the regression model.

(d) How well does this model do in explaining variation in Sales? Explain.

(e) What is the purpose of the Adjusted R² in the regression model above?

(f) Can you conclude the population coefficient (β1) for Advertising is not zero?

Explain.

(3 + 2 + 3 + 2 + 1 + 2) = 13 marks

Question 6

Six months ago, Cali supermarkets launched their new website which allows customers to

shop online. The web development company that created the website believes that 10% of

Cali’s customers will use the site for shopping. Cali’s management is interested in whether


the actual proportion of customers who would shop online is any different to that claimed

by the web development company.

(a) Write down the null and alternative hypotheses in both symbols and words for the

above situation.

(b) A sample of 400 randomly selected Cali customers was taken. Of the 400 customers,

28 said they would use the new website to shop online. Using this information,

complete the hypothesis test from part (a) using a level of significance of 5%.

(c) Based on your answer to part (b), write down if the p-value would be bigger or

smaller than α = 5%.

(2 + 4 + 2) = 8 marks

END OF EXAMINATION


PART D

Suggested Solutions to Final Exam Paper


SOLUTIONS
Question 1

Part a

Number of Data Points 16

Minimum 2.8

Maximum 17.2

Total 138

Arithmetic Mean 8.625 (i)

Median 8.25 (ii)

Mode 5.9 (iii)

First Quartile 5.9 (iv)

Third Quartile 12.3 (v)

Range 14.4 (vi)

Interquartile Range 6.4 (vii)

Variance (Sample) 16.47

Standard Deviation (Sample) 4.059 (viii)

Standard Error 1.015

Skewness Coefficients (Pearson’s, Sample) 0.684


Re-arrange data

Waiting Time (in minutes)

2.8 3.2 5.5 5.9

5.9 6.5 6.8 8.2

8.3 8.4 8.5 10.7

12.3 12.5 15.3 17.2

(i) Mean (Average)

X̄ = (2.8 + 3.2 + ⋯ + 12.5 + 15.3 + 17.2) / 16 = 138 / 16 = 8.625

(ii) Median

Position of Median = (Sample Size + 1) / 2 = (16 + 1) / 2 = 8.5th

Hence, the Median is the average of the values at the 8th and 9th positions.

Median = (8.2 + 8.3) / 2 = 8.25

(iii) Mode

The Mode is the value with the highest frequency of occurrence in the data table, or 5.9 (2 times – most occurring).

(iv) First Quartile

Position of 1st Quartile = 1/4 ∗ (Sample Size + 1) = 1/4 ∗ (16 + 1) = 1/4 ∗ 17 = 4.25th

Hence, the First Quartile is the value at the 4th position (rounded).

First Quartile = 5.9 (value at the 4th position)

(v) Third Quartile

Position of 3rd Quartile = 3/4 ∗ (Sample Size + 1) = 3/4 ∗ (16 + 1) = 3/4 ∗ 17 = 12.75th

Hence, the Third Quartile is the value at the 13th position (rounded).

Third Quartile = 12.3 (value at the 13th position)

(vi) Range

Range = Maximum Value − Minimum Value = 17.2 − 2.8 = 14.4

(vii) Interquartile Range

Interquartile Range = 3rd Quartile − 1st Quartile = 12.3 − 5.9 = 6.4

(viii) Standard Deviation

Standard Deviation = √Variance = √16.47 = 4.059
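
The eight summary measures can be reproduced from the raw data with Python's standard library. This is a sketch under one stated assumption: quartiles follow the position rule used in these notes, (n + 1) times the fraction, rounded to the nearest whole position. Library quantile functions generally interpolate instead, so they may give slightly different quartiles. The helper `value_at_rounded_position` is a hypothetical name introduced for illustration.

```python
import statistics

waiting_times = [5.5, 12.3, 6.5, 5.9, 8.2, 5.9, 2.8, 8.3,
                 10.7, 3.2, 8.5, 15.3, 12.5, 6.8, 17.2, 8.4]


def value_at_rounded_position(sorted_values, fraction):
    """Value at position fraction * (n + 1), rounded to the nearest position.

    Mirrors the rule in the notes (4.25th -> 4th, 12.75th -> 13th).
    """
    position = fraction * (len(sorted_values) + 1)
    nearest = int(position + 0.5)     # round half up to a whole position
    return sorted_values[nearest - 1]  # positions are 1-based


ordered = sorted(waiting_times)

mean = statistics.mean(ordered)
median = statistics.median(ordered)   # averages the 8th and 9th values
mode = statistics.mode(ordered)
q1 = value_at_rounded_position(ordered, 0.25)
q3 = value_at_rounded_position(ordered, 0.75)
data_range = max(ordered) - min(ordered)
iqr = q3 - q1
sample_sd = statistics.stdev(ordered)  # sample standard deviation (n - 1)

print(f"mean={mean:.3f} median={median:.2f} mode={mode}")
print(f"Q1={q1} Q3={q3} range={data_range:.1f} IQR={iqr:.1f} sd={sample_sd:.3f}")
```

Computing the standard deviation from the raw data rather than from the rounded variance of 16.47 avoids a small rounding difference in the third decimal place.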

Part b


The Interquartile Range measures the amount of variation in the data set. In this case, the middle

50% of “waiting time at the line till customers reach the teller window” lie within a range of 6.4

minutes.

Part c

The Standard Deviation also measures the amount of variation in the data set. It tells us how far,

on average, the data is away from the mean (average). In this case, waiting times differ from

the mean waiting time by about 4.059 minutes on average.

Part d

The Skewness Coefficient is 0.684. The positive value indicates positive skew. Because the value is fairly close to 0, the skewness is slight.

The mean of 8.625 minutes is higher than the median of 8.25 minutes. Positive skewness results in the mean being pulled to the right of the median. The mean is not far from the median, which again indicates slight skewness.

(11 marks)


Question 2

Part a

% of Total
                  Location
Age of Store      Country   Mall    Strip   Total
0 < 5 years       5.3%      10.0%   10.0%   25.3%
5 < 10 years      14.7%     17.3%   4.7%    36.7%
10 < 15 years     4.0%      9.3%    8.0%    21.3%
15 < 20 years     1.3%      3.3%    7.3%    12.0%
20 < 25 years     0.0%      0.7%    4.0%    4.7%
Total             25.3%     40.7%   34.0%   100%

Hints:

7 / 150 = 4.7% (rounded); 5 / 150 = 3.3% (rounded)

% of Row
                  Location
Age of Store      Country   Mall    Strip   Total
0 < 5 years       21.0%     39.5%   39.5%   100%
5 < 10 years      40.0%     47.3%   12.7%   100%
10 < 15 years     18.8%     43.8%   37.5%   100%
15 < 20 years     11.1%     27.8%   61.1%   100%
20 < 25 years     0.0%      14.3%   85.7%   100%
Total             25.3%     40.7%   34.0%   100%

Hints:

26 / 55 = 47.3% (rounded); 2 / 18 = 11.1% (rounded)


Part b

(i) Probability it would be a Mall store with an Age of less than 5 years

15 / 150 = 10%

(ii) Probability it would be aged 20 years or more

7 / 150 = 4.7%

(iii) Probability it would be aged from 10 to less than 15 years given that it was a Strip store

12 / 51 = 23.5%

(iv) Probability it would be a Country store given that it was aged 20 years or more

0 / 7 = 0%

(12 marks)
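
The four probabilities in Part b can be recomputed from the frequency counts as a check. In this sketch the oldest age band is labelled "20 < 25 years", on the assumption that the printed "25 < 25 years" is a typo; the dictionary layout and helper names are illustrative choices.

```python
# Frequency counts from the cross-tabulation (150 stores in total).
counts = {
    "0 < 5 years":   {"Country": 8,  "Mall": 15, "Strip": 15},
    "5 < 10 years":  {"Country": 22, "Mall": 26, "Strip": 7},
    "10 < 15 years": {"Country": 6,  "Mall": 14, "Strip": 12},
    "15 < 20 years": {"Country": 2,  "Mall": 5,  "Strip": 11},
    "20 < 25 years": {"Country": 0,  "Mall": 1,  "Strip": 6},
}

total = sum(sum(row.values()) for row in counts.values())


def row_total(age):
    """Number of stores in one age band."""
    return sum(counts[age].values())


def column_total(location):
    """Number of stores in one location type."""
    return sum(row[location] for row in counts.values())


# (i) joint probability: Mall AND aged under 5 years
p_mall_and_under_5 = counts["0 < 5 years"]["Mall"] / total
# (ii) marginal probability: aged 20 years or more
p_20_plus = row_total("20 < 25 years") / total
# (iii) conditional probability: aged 10 to under 15 GIVEN Strip
p_10_15_given_strip = counts["10 < 15 years"]["Strip"] / column_total("Strip")
# (iv) conditional probability: Country GIVEN aged 20 years or more
p_country_given_20_plus = counts["20 < 25 years"]["Country"] / row_total("20 < 25 years")

for label, p in [("(i)", p_mall_and_under_5), ("(ii)", p_20_plus),
                 ("(iii)", p_10_15_given_strip), ("(iv)", p_country_given_20_plus)]:
    print(f"{label} {p:.1%}")
```

Note that the conditional probabilities divide by a column or row total rather than by 150, which is the same distinction the "% of Column" and "% of Row" tables make.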


Question 3

Part a

Mean or Proportion → Proportion → Use Z Score

Sample Proportion

p = 16.25% = 0.1625

Sample Size

n = 400

90% Confidence Interval

Z-Score = ±1.645

Confidence Interval Formula (for lower & upper limits)

p ± Z ∗ √(p ∗ (1 − p) / n) = 0.1625 ± 1.645 ∗ √(0.1625 ∗ (1 − 0.1625) / 400) = 0.1625 ± 0.0303

= 0.1322 to 0.1928,

or 13.22% to 19.28%
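
A minimal Python sketch of the same interval follows. The function name `proportion_confidence_interval` is hypothetical, and the Z value of 1.645 is taken from the working above rather than computed.

```python
import math


def proportion_confidence_interval(p_hat, n, z):
    """Interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z * standard_error
    return p_hat - margin_of_error, p_hat + margin_of_error


# 16.25% of 400 customers rated 'excellent'; Z = 1.645 for 90% confidence.
lower, upper = proportion_confidence_interval(0.1625, 400, 1.645)

print(f"90% CI: {lower:.4f} to {upper:.4f}")
```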

Part b

Margin of Error (ME)

ME = 3% = 0.03 (given in the question)

95% Confidence Interval

Z = ±1.96

Estimation of Population Proportion (π)

p = 16.25% = 0.1625

Sample Size required

n = Z² ∗ π ∗ (1 − π) / ME² = 1.96² ∗ 0.1625 ∗ 0.8375 / 0.03² = 580.9, rounded up to 581

Hence, if management want to know the true proportion of customers who rank the length of time they have to spend in queues as ‘excellent’ to within 3% with 95% confidence, a sample of 581 customers would need to be taken to achieve these requirements.
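
The sample-size formula can be sketched in Python as below. The function name is an illustrative choice; the planning proportion 0.1625 is the earlier sample estimate used in the working, and the result is always rounded up because a fractional customer cannot be sampled.

```python
import math


def required_sample_size(z, planning_proportion, margin_of_error):
    """n = z^2 * pi * (1 - pi) / ME^2, returned raw and rounded up."""
    raw = z ** 2 * planning_proportion * (1 - planning_proportion) / margin_of_error ** 2
    return raw, math.ceil(raw)


# Z = 1.96 for 95% confidence, planning proportion 0.1625, margin of error 0.03.
raw_n, n_required = required_sample_size(1.96, 0.1625, 0.03)

print(f"raw n = {raw_n:.1f}, sample size required = {n_required}")
```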

Part c

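We cannot conclude that there is a difference between all male and all female customers who gave a rating of ‘excellent’.

This is because the 90% interval for π1 – π2 runs from a negative number (-9.544%) to a positive number (3.386%), so it includes zero; a difference of zero between the two population proportions is therefore plausible.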
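
The reasoning in Part c amounts to checking whether zero falls inside the reported interval. The sketch below rebuilds the interval from the difference, standard error and Z value printed in the output (all in percentage points) and then applies that check; the variable names are illustrative.

```python
# Values read from the two-sample output, in percentage points.
difference = -3.079      # p1 - p2 (Female minus Male)
standard_error = 3.93
z = 1.645                # 90% confidence

lower = difference - z * standard_error
upper = difference + z * standard_error

# If zero lies inside the interval, "no difference" remains plausible.
contains_zero = lower <= 0 <= upper

print(f"90% CI for the difference: {lower:.3f} to {upper:.3f}")
print("Evidence of a difference:", not contains_zero)
```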

(7 marks)


Question 4

Part a

INSTRUCTIONS

Step 1   Mean or Proportion?
         Mean

Step 2   Population standard deviation σ known?
         No → Use t scores

Step 3   Degrees of freedom (d.f.)
         d.f. = n − 1 = 120 − 1 = 119

Step 4   95% confidence interval
         Upper-tail area = 2.5% = 0.025
         Use t table: t critical value = 1.9799

Step 5   Applying formula
         x̄ ± t ∗ s/√n
         1175 ± 1.9799 ∗ 373/√120
         1175 ± 67.4158
         [1107.58, 1242.42] square metres

Step 6   Conclusion for 95% confidence interval
         We are 95% confident that the mean lot size for all Tasmania houses is somewhere between 1107.58 and 1242.42 square metres.
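
Step 5 can be checked numerically with the sketch below. The function name `mean_confidence_interval` is hypothetical, and the critical value 1.9799 is copied from the t table reading in Step 4 rather than computed. The final comparison with the Sydney mean of 1,000 anticipates the reasoning used in Part b.

```python
import math


def mean_confidence_interval(sample_mean, sample_sd, n, t_critical):
    """Interval x-bar +/- t * s / sqrt(n) when sigma is unknown."""
    margin_of_error = t_critical * sample_sd / math.sqrt(n)
    return sample_mean - margin_of_error, sample_mean + margin_of_error


# n = 120, sample mean = 1175, s = 373; t value taken from the notes.
lower, upper = mean_confidence_interval(1175, 373, 120, 1.9799)

SYDNEY_MEAN = 1000
interval_above_sydney = lower > SYDNEY_MEAN

print(f"95% CI: [{lower:.2f}, {upper:.2f}] square metres")
print("Whole interval above Sydney mean:", interval_above_sydney)
```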


Part b

Given the above confidence interval, it seems that the mean lot size for all

Tasmania houses is higher than the mean lot size for Sydney houses.

Part c

90% confidence instead of 95% would produce a narrower interval around the same sample mean.

The narrower interval would still lie entirely above 1,000. Hence, this would not change the conclusion in (b).

Part d

If we decrease the sample size, the Margin of Error would increase, in turn

resulting in a wider confidence interval – potentially including the Sydney mean

of 1,000.

Hence, this might change the conclusion in (b)

Part e

If the sample size was less than 30, we would not be able to simply assume that

the distribution of the sample mean is normal.

If, upon checking, it was not normal, we would not be able to construct a

confidence interval.


(9 marks)

Question 5

Part a

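No. The variable Mng-Sex (dummy variable – 0 is Male, 1 is Female) is NOT significant, as its p-value of 0.913 is higher than 0.05. Advertising (p-value 0.000) and Car Spaces (p-value 0.001) are significant, as their p-values are lower than 0.05.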

Part b

Predicted Sales = 4.660 + 0.039 ∗ Adv − 0.045 ∗ MngSex + 0.027 ∗ Car Spaces

Part c

b0 = 4.660

A supermarket will have predicted sales of $4.660 million when all other variables have a zero value.

b1 = 0.039

For every extra $1,000 in Advertising, sales increase by $39,000 ($0.039 million) on average, holding the other variables constant.

Part d

R² = 0.729, or 72.90%

The value of R² is high, suggesting a strong relationship between Sales and Advertising, Manager Sex and Car Spaces.

Approximately 72.90% of the variation in Sales can be explained by the variation in Advertising, Manager Sex and Car Spaces.

The remaining 27.10% of the variation in Sales would be explained by other factors not included in this model.

Part e

The adjusted R² is ONLY used to compare one regression model with another; the model with the higher adjusted R² is the preferred model.

Part f

Yes. The p-value for Advertising (0.000) is less than 0.05, so we can conclude that the population coefficient (β1) for Advertising is not zero and that Advertising is a significant variable in the model.
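
Parts a and f both rest on reading the Sig. column, and the 95% confidence bounds give a consistent second check: a coefficient is significant at the 5% level when its interval excludes zero. The sketch below applies both checks to the values in the output table; the dictionary layout and function names are illustrative, not part of the original output.

```python
ALPHA = 0.05

# Values read from the Coefficients table (p-values as printed, 3 d.p.).
coefficients = {
    "Adv. $'000": {"b": 0.039,  "p": 0.000, "ci": (0.034, 0.045)},
    "Mng-Sex":    {"b": -0.045, "p": 0.913, "ci": (-0.864, 0.774)},
    "Car Spaces": {"b": 0.027,  "p": 0.001, "ci": (0.011, 0.043)},
}


def is_significant(entry):
    """Significant at the 5% level when the p-value is below alpha."""
    return entry["p"] < ALPHA


def ci_excludes_zero(entry):
    """True when the 95% interval lies entirely on one side of zero."""
    lower, upper = entry["ci"]
    return lower > 0 or upper < 0


summary = {name: (is_significant(e), ci_excludes_zero(e))
           for name, e in coefficients.items()}

for name, (significant, excludes_zero) in summary.items():
    print(f"{name:12s} significant={significant} CI excludes 0={excludes_zero}")
```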

(13 marks)


Question 6

Step 1

H0: π = 10%

The proportion of customers shopping on the website is 10%.

H1: π ≠ 10%

The proportion of customers shopping on the website is different from 10%.

Step 2

H1 has a ≠ sign; hence, it is a two-tail test.

Step 3

α = 5% = 0.05

Critical value of Z = ±1.96

Step 4

If the sample produces a Z Score lower than −1.96 or higher than +1.96, we will reject H0.

Step 5

Standard Error (σp) = √(π ∗ (1 − π) / n) = √(0.1 ∗ (1 − 0.1) / 400) = 0.015

Sample Proportion p = 28 / 400 = 0.07

Z = (p − π) / σp = (0.07 − 0.1) / 0.015 = −2

Step 6

As the sample Z-Score is (-2), which is lower than the critical value of Z (-1.96), we reject H0.

Conclusion

The proportion of customers shopping on the website is different from 10%.

P-value

We know that:

• If the p-value is less than alpha (level of significance), we reject H0

• If the p-value is higher than alpha, we do not reject H0

• Given that we reject H0, the p-value must be smaller than alpha.
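
The test statistic in Step 5 and the p-value argument above can both be checked with Python's standard library. This is a sketch, with `one_proportion_z_test` as a hypothetical helper name; it uses the standard normal distribution from `statistics.NormalDist` and doubles the tail area because the test is two-tailed.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes, n, null_proportion):
    """Two-tailed Z test for a single proportion.

    The standard error uses the null proportion, as in Step 5.
    """
    p_hat = successes / n
    standard_error = math.sqrt(null_proportion * (1 - null_proportion) / n)
    z = (p_hat - null_proportion) / standard_error
    p_value = 2 * NormalDist().cdf(-abs(z))  # area in both tails
    return z, p_value


ALPHA = 0.05
z, p_value = one_proportion_z_test(28, 400, 0.10)
reject_h0 = p_value < ALPHA

print(f"Z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The p-value is only slightly below 5%, which matches the Z-Score of −2 falling just beyond the critical value of −1.96.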

(8 marks)
