Core Data Analysis Worksheet 6

This document contains 10 multiple choice questions about correlation, regression, and least squares regression lines. It provides scatter plots of bivariate data and equations of least squares regression lines to analyze relationships between variables and make predictions.

Uploaded by

chris brown

FURMATHS: CORE, Data Distributions
Correlation and Regression

Teacher: Dora Verrocchi

Exam Equivalent Time: 108 minutes (based on VCE allocation of 2.25 min/mark for FUR1, and 1.5 min/mark for FUR2)

Questions
1. CORE, FUR1 2006 VCAA 7 MC
For a set of bivariate data, involving the variables and ,

The equation of the least squares regression line is closest to

A.
B.
C.
D.
E.

2. CORE, FUR1 2006 VCAA 8 MC

The waist measurement (cm) and weight (kg) of 12 men are displayed in the table below.

Using this data, the equation of the least squares regression line that enables weight to be predicted from waist measurement is

When this equation is used to predict the weight of the man with a waist measurement of 80 cm, the residual value is closest to
A.
B.
C.
D.
E.

3. CORE, FUR1 2007 VCAA 7-8 MC

The lengths and diameters (in mm) of a sample of jellyfish selected were recorded and displayed in the scatterplot below. The least squares regression line for this data is shown.

The equation of the least squares regression line is
length = 3.5 + 0.87 × diameter
The correlation coefficient is

Part 1
Written as a percentage, the coefficient of determination is closest to
A.
B.
C.
D.
E.

Part 2
From the equation of the least squares regression line, it can be concluded that for these jellyfish, on average
A. there is a 3.5 mm increase in diameter for each 1 mm increase in length.
B. there is a 3.5 mm increase in length for each 1 mm increase in diameter.
C. there is a 0.87 mm increase in diameter for each 1 mm increase in length.
D. there is a 0.87 mm increase in length for each 1 mm increase in diameter.
E. there is a 4.37 mm increase in diameter for each 1 mm increase in length.
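
The slope interpretation tested in Question 3, Part 2 can be checked numerically. The sketch below is only an illustration: the function name `predicted_length` and the starting diameter of 10 mm are assumptions made for the example, while the coefficients 3.5 and 0.87 come from the equation given in the question.

```python
def predicted_length(diameter_mm: float) -> float:
    """Length (mm) predicted by the line length = 3.5 + 0.87 x diameter."""
    return 3.5 + 0.87 * diameter_mm


# Hypothetical starting diameter, chosen only for illustration.
diameter = 10.0

# Change in predicted length when the diameter grows by exactly 1 mm.
change = predicted_length(diameter + 1) - predicted_length(diameter)
print(f"Change in predicted length per 1 mm of diameter: {change:.2f} mm")
```

Because the line is straight, the same change results from any starting diameter: the slope is the average change in the response variable (length) for each one-unit increase in the explanatory variable (diameter).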

4. CORE, FUR1 2010 VCAA 10 MC


For a set of bivariate data that involves the variables and , with as the dependent
variable

The equation of the least squares regression line is closest to


A.
B.
C.
D.
E.

5. CORE, FUR1 2011 VCAA 6-8 MC

When blood pressure is measured, both the systolic (or maximum) pressure and the diastolic (or minimum) pressure are recorded.
Table 1 displays the blood pressure readings, in mmHg, that result from fifteen successive measurements of the same person's blood pressure.

Part 1
Correct to one decimal place, the mean and standard deviation of this person's systolic blood pressure measurements are respectively
A.
B.
C.
D.
E.

Part 2
Using systolic blood pressure (systolic) as the dependent variable, and diastolic blood pressure (diastolic) as the independent variable, a least squares regression line is fitted to the data in Table 1.
The equation of the least squares regression line is closest to
A.
B.
C.
D.
E.

Part 3
From the fifteen blood pressure measurements for this person, it can be concluded that the percentage of the variation in systolic blood pressure that is explained by the variation in diastolic blood pressure is closest to
A.
B.
C.
D.
E.

6. CORE, FUR1 2012 VCAA 8 MC

The maximum wind speed and maximum temperature were recorded each day for a month.
The data is displayed in the scatterplot below and a least squares regression line has been fitted. The dependent variable is temperature. The independent variable is wind speed.

The equation of the least squares regression line is closest to

A.
B.
C.
D.
E.

7. CORE, FUR1 2013 VCAA 11 MC

A least squares regression line is fitted to data in a scatterplot, as shown below.
The corresponding residual plot is closest to

8. CORE, FUR1 2014 VCAA 9 MC
The equation of a least squares regression line is used to predict the fuel consumption, in kilometres per litre of fuel, from a car’s weight, in kilograms.
This equation predicts that a car weighing 900 kg will travel 10.7 km per litre of fuel, while a car weighing 1700 kg will travel 6.7 km per litre of fuel.
The slope of this least squares regression line is closest to
A.
B.
C.
D.
E.
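
Question 8 gives two predictions made by the same line, so both points lie exactly on it and the slope is the rise over the run between them. The sketch below shows one way to carry out that arithmetic; the function name `slope_from_two_points` is an assumption made for the example, and only the two (weight, fuel consumption) pairs come from the question.

```python
def slope_from_two_points(x1: float, y1: float, x2: float, y2: float) -> float:
    """Slope of the straight line through (x1, y1) and (x2, y2)."""
    return (y2 - y1) / (x2 - x1)


# Predictions quoted in the question: (car weight in kg, km per litre).
slope = slope_from_two_points(900, 10.7, 1700, 6.7)
print(f"slope = {slope:.4f} km per litre per kg")
```

The slope is negative because the predicted distance per litre falls as weight rises: a drop of 4.0 km per litre over an 800 kg increase in weight.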

9. CORE, FUR1 2015 VCAA 10 MC


For a set of bivariate data that involves the variables and :
, , , ,

Given the information above, the least squares regression line predicting from is closest
to
A.
B.
C.
D.
E.
10. CORE, FUR1 2015 VCAA 9 MC

A least squares regression line has been fitted to the scatterplot above to enable distance, in kilometres, to be predicted from time, in minutes.
The equation of this line is closest to
A. distance time
B. time distance
C. distance time
D. time distance
E. distance time

11. CORE, FUR1 2016 VCAA 9-10 MC
The scatterplot below shows life expectancy in years (life expectancy) plotted against the Human Development Index (HDI) for a large number of countries in 2011.
A least squares line has been fitted to the data and the resulting residual plot is also shown.

The equation of this least squares line is

life expectancy = 43.0 + 0.422 × HDI

The coefficient of determination is r² = 0.875

Part 1
Given the information above, which one of the following statements is not true?
A. The value of the correlation coefficient is close to 0.94
B. 12.5% of the variation in life expectancy is not explained by the variation in the Human Development Index.
C. On average, life expectancy increases by 43.0 years for each 10-point increase in the Human Development Index.
D. Ignoring any outliers, the association between life expectancy and the Human Development Index can be described as strong, positive and linear.
E. Using the least squares line to predict the life expectancy in a country with a Human Development Index of 75 is an example of interpolation.

Part 2
In 2011, life expectancy in Australia was 81.8 years and the Human Development Index was 92.9
When the least squares line is used to predict life expectancy in Australia, the residual is closest to
A.
B.
C.
D.
E.
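
The numbers quoted in Question 11 are enough to check the residual for Australia and the quantities mentioned in Part 1. The sketch below is only one possible way to set out that arithmetic; the names `predict_life_expectancy`, `residual`, `r` and `unexplained_pct` are assumptions made for the example, while every numeric value comes from the question.

```python
import math

INTERCEPT = 43.0   # from life expectancy = 43.0 + 0.422 x HDI
SLOPE = 0.422
R_SQUARED = 0.875  # coefficient of determination given in the question


def predict_life_expectancy(hdi: float) -> float:
    """Life expectancy (years) predicted by the least squares line."""
    return INTERCEPT + SLOPE * hdi


# Residual = actual value - predicted value (Australia, 2011).
residual = 81.8 - predict_life_expectancy(92.9)

# The slope is positive, so r is the positive square root of r squared.
r = math.sqrt(R_SQUARED)

# Percentage of variation in life expectancy NOT explained by HDI.
unexplained_pct = (1 - R_SQUARED) * 100

print(f"residual = {residual:.2f} years")
print(f"r = {r:.3f}")
print(f"unexplained variation = {unexplained_pct:.1f}%")
```

A negative residual means the actual value lies below the line, that is, the line over-predicts life expectancy for that country.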
12. CORE, FUR1 2018 VCAA 10 MC
In a study of the association between a person’s height, in centimetres, and body surface area, in square metres, the following least squares line was obtained.
body surface area = –1.1 + 0.019 × height
Which one of the following is a conclusion that can be made from this least squares line?
A. An increase of 1 m² in body surface area is associated with an increase of 0.019 cm in height.
B. An increase of 1 cm in height is associated with an increase of 0.019 m² in body surface area.
C. The correlation coefficient is 0.019
D. A person’s body surface area, in square metres, can be determined by adding 1.1 cm to their height.
E. A person’s height, in centimetres, can be determined by subtracting 1.1 from their body surface area, in square metres.

13. CORE, FUR1 2010 VCAA 7-9 MC

The height (in cm) and foot length (in cm) for each of eight Year 12 students were recorded and displayed in the scatterplot below.
A least squares regression line has been fitted to the data as shown.

Part 1
By inspection, the value of the product-moment correlation coefficient for this data is closest to
A.
B.
C.
D.
E.

Part 2
The independent variable is foot length.
The equation of the least squares regression line is closest to
A. height = –110 + 0.78 × foot length.
B. height = 141 + 1.3 × foot length.
C. height = 167 + 1.3 × foot length.
D. height = 167 + 0.67 × foot length.
E. foot length = 167 + 1.3 × height.

Part 3
The plot of the residuals against foot length is closest to

14. CORE, FUR1 2017 VCAA 8-10 MC
The scatterplot below shows the wrist circumference and ankle circumference, both in centimetres, of 13 people. A least squares line has been fitted to the scatterplot with ankle circumference as the explanatory variable.

[Scatterplot: wrist circumference (cm) plotted against ankle circumference (cm)]

Part 1
The equation of the least squares line is closest to
A. ankle = 10.2 + 0.342 × wrist
B. wrist = 10.2 + 0.342 × ankle
C. ankle = 17.4 + 0.342 × wrist
D. wrist = 17.4 + 0.342 × ankle
E. wrist = 17.4 + 0.731 × ankle

Part 2
When the least squares line on the scatterplot is used to predict the wrist circumference of the person with an ankle circumference of 24 cm, the residual will be closest to
A.
B.
C.
D.
E.

Part 3
The residuals for this least squares line have a mean of 0.02 cm and a standard deviation of 0.4 cm.
The value of the residual for one of the data points is found to be – 0.3 cm.
The standardised value of this residual is
A.
B.
C.
D.
E.

15. CORE, FUR2 2018 VCAA 2

The congestion level in a city can be recorded as the percentage increase in travel time due to traffic congestion in peak periods (compared to non-peak periods).
This is called the percentage congestion level.
The percentage congestion levels for the morning and evening peak periods for 19 large cities are plotted on the scatterplot below.

a. Determine the median percentage congestion level for the morning peak period and the evening peak period.
Write your answers in the appropriate boxes provided below. (2 marks)

Median percentage congestion level for morning peak period ______ %

Median percentage congestion level for evening peak period ______ %

A least squares line is to be fitted to the data with the aim of predicting evening congestion level from morning congestion level.
The equation of this line is
evening congestion level = 8.48 + 0.922 × morning congestion level

b. Name the response variable in this equation. (1 mark)

c. Use the equation of the least squares line to predict the evening congestion level when the morning congestion level is 60%. (1 mark)

d. Determine the residual value when the equation of the least squares line is used to predict the evening congestion level when the morning congestion level is 47%.
Round your answer to one decimal place. (2 marks)

e. The value of the correlation coefficient is 0.92
What percentage of the variation in the evening congestion level can be explained by the variation in the morning congestion level?
Round your answer to the nearest whole number. (1 mark)

16. CORE, FUR2 2007 VCAA 3

The table below displays the mean surface temperature (in °C) and the mean duration of warm spell (in days) in Australia for 13 years selected at random from the period 1960 to 2005.

This data set has been used to construct the scatterplot below. The scatterplot is incomplete.
a. Complete the scatterplot below by plotting the bold data values given in the table above.
Mark the point with a cross (×). (1 mark)

b. Mean surface temperature is the independent variable.

i. Determine the equation of the least squares regression line for this set of data. Write the equation in terms of the variables mean duration of warm spell and mean surface temperature. Write the value of the coefficients correct to one decimal place. (2 marks)
ii. Plot the least squares regression line on Scatterplot 1. (1 mark)

The residual plot below was constructed to test the assumption of linearity for the relationship between the variables mean duration of warm spell and the mean surface temperature.

c. Explain why this residual plot supports the assumption of linearity for this relationship. (1 mark)

d. Write down the percentage of variation in the mean duration of a warm spell that is explained by the variation in mean surface temperature. Write your answer correct to the nearest per cent. (1 mark)
e. Describe the relationship between the mean duration of a warm spell and the mean surface temperature in terms of strength, direction and form. (2 marks)
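
Two of the calculations asked for above, the standardised residual in Question 14 (Part 3) and the prediction and percentage of explained variation in Question 15 (parts c and e), need only the numbers printed in the questions. The sketch below sets them out under that assumption; the function names `standardise` and `predict_evening` are invented for the example. Part d of Question 15 is not attempted because it needs an actual value read from the scatterplot.

```python
def standardise(value: float, mean: float, std_dev: float) -> float:
    """Standardised value (z-score): how many standard deviations from the mean."""
    return (value - mean) / std_dev


def predict_evening(morning_pct: float) -> float:
    """Evening congestion level (%) from evening = 8.48 + 0.922 x morning."""
    return 8.48 + 0.922 * morning_pct


# Question 14, Part 3: residual of -0.3 cm, residual mean 0.02 cm, sd 0.4 cm.
z = standardise(-0.3, 0.02, 0.4)

# Question 15, part c: prediction for a morning congestion level of 60%.
evening_at_60 = predict_evening(60)

# Question 15, part e: r = 0.92, so r squared gives the explained variation.
explained_pct = 0.92 ** 2 * 100

print(f"standardised residual = {z:.1f}")
print(f"predicted evening congestion level = {evening_at_60:.1f}%")
print(f"explained variation = {explained_pct:.0f}%")
```

Squaring the correlation coefficient, rather than quoting r itself as a percentage, is the step that converts the strength of the association into the proportion of variation explained.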
17. CORE, FUR2 2010 VCAA 2
In the scatterplot below, average annual female income, in dollars, is plotted against average annual male income, in dollars, for 16 countries. A least squares regression line is fitted to the data.

The equation of the least squares regression line for predicting female income from male income is
female income = 13 000 + 0.35 × male income

a. What is the independent variable? (1 mark)
b. Complete the following statement by filling in the missing information.
From the least squares regression line equation it can be concluded that, for these countries, on average, female income increases by ________ for each $1000 increase in male income. (1 mark)
c.
i. Use the least squares regression line equation to predict the average annual female income (in dollars) in a country where the average annual male income is $15 000. (1 mark)
ii. The prediction made in part c.i. is not likely to be reliable.
Explain why. (1 mark)

18. CORE, FUR2 2016 VCAA 3

The data in the table below shows a sample of actual temperatures and apparent temperatures recorded at a weather station. A scatterplot of the data is also shown.
The data will be used to investigate the association between the variables apparent temperature and actual temperature.

a. Use the scatterplot to describe the association between apparent temperature and actual temperature in terms of strength, direction and form. (1 mark)
b.
i. Determine the equation of the least squares line that can be used to predict the apparent temperature from the actual temperature.
Write the values of the intercept and slope of this least squares line in the appropriate boxes provided below.
Round your answers to two significant figures. (3 marks)

apparent temperature = ______ + ______ × actual temperature

ii. Interpret the intercept of the least squares line in terms of the variables apparent temperature and actual temperature. (1 mark)
c. The coefficient of determination for the association between the variables apparent temperature and actual temperature is 0.97
Interpret the coefficient of determination in terms of these variables. (1 mark)
d. The residual plot obtained when the least squares line was fitted to the data is shown below.

i. A residual plot can be used to test an assumption about the nature of the association between two numerical variables.
What is this assumption? (1 mark)
ii. Does the residual plot above support this assumption? Explain your answer. (1 mark)

19. CORE, FUR2 2009 VCAA 3
The scatterplot below shows the rainfall (in mm) and the percentage of clear days for each month of 2008.

An equation of the least squares regression line for this data set is
rainfall × percentage of clear days
a. Draw this line on the scatterplot. (1 mark)

b. Use the equation of the least squares regression line to predict the rainfall for a month with 35% of clear days. Write your answer in mm correct to one decimal place. (1 mark)
c. The coefficient of determination for this data set is 0.8081.
i. Interpret the coefficient of determination in terms of the variables rainfall and percentage of clear days. (1 mark)
ii. Determine the value of Pearson’s product moment correlation coefficient. Write your answer correct to three decimal places. (2 marks)
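
For Question 19, part c.ii, the correlation coefficient has the same magnitude as the square root of the coefficient of determination and takes its sign from the slope of the regression line. The marker's comment in the Worked Solutions notes that the slope here is negative, so the sketch below assumes a negative sign; the function name `correlation_from_r_squared` and its `slope_sign` parameter are invented for the example.

```python
import math


def correlation_from_r_squared(r_squared: float, slope_sign: float) -> float:
    """Return r with the magnitude sqrt(r_squared) and the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope_sign)


# Coefficient of determination from the question; slope taken as negative.
r = correlation_from_r_squared(0.8081, -1)
print(f"r = {r:.3f}")
```

Taking the square root alone always yields a positive number, so the sign has to be supplied separately from the direction of the association.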
20. CORE, FUR2 2014 VCAA 2
The scatterplot below shows the population and area (in square kilometres) of a sample of inner suburbs of a large city.

The equation of the least squares regression line for the data in the scatterplot is

population = 5330 + 2680 × area

a. Write down the dependent variable. (1 mark)

b. Draw the least squares regression line on the scatterplot above.
(Answer on the scatterplot above.) (1 mark)
c. Interpret the slope of this least squares regression line in terms of the variables area and population. (2 marks)
d. Wiston is an inner suburb. It has an area of 4 km² and a population of 6690.
The correlation coefficient, r, is equal to 0.668
i. Calculate the residual when the least squares regression line is used to predict the population of Wiston from its area. (1 mark)
ii. What percentage of the variation in the population of the suburbs is explained by the variation in area?
Write your answer, correct to one decimal place. (1 mark)

21. CORE, FUR2 2012 VCAA 2
The maximum temperature and the minimum temperature at this weather station on each of the 30 days in November 2011 are displayed in the scatterplot below.

The correlation coefficient for this data set is .

The equation of the least squares regression line for this data set is

maximum temperature = × minimum temperature

a. Draw this least squares regression line on the scatterplot above. (1 mark)

b. Interpret the vertical intercept of the least squares regression line in terms of maximum temperature and minimum temperature. (1 mark)
c. Describe the relationship between the maximum temperature and the minimum temperature in terms of strength and direction. (1 mark)
d. Interpret the slope of the least squares regression line in terms of maximum temperature and minimum temperature. (1 mark)
e. Determine the percentage of variation in the maximum temperature that may be explained by the variation in the minimum temperature.
Write your answer, correct to the nearest percentage. (1 mark)

On the day that the minimum temperature was 11.1 °C, the actual maximum temperature was 12.2 °C.
f. Determine the residual value for this day if the least squares regression line is used to predict the maximum temperature.
Write your answer, correct to the nearest degree. (2 marks)
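
The two calculations in Question 20, part d use only values printed in the question, so they can be sketched directly. The names `predict_population`, `wiston_residual` and `explained_pct` are assumptions made for the example. Question 21 is not attempted because the coefficients of its regression line are missing from this copy.

```python
def predict_population(area_km2: float) -> float:
    """Population predicted by the line population = 5330 + 2680 x area."""
    return 5330 + 2680 * area_km2


# Part d.i: residual = actual population - predicted population for Wiston.
wiston_residual = 6690 - predict_population(4)

# Part d.ii: r = 0.668, so r squared is the proportion of variation explained.
explained_pct = 0.668 ** 2 * 100

print(f"residual for Wiston = {wiston_residual:.0f}")
print(f"explained variation = {explained_pct:.1f}%")
```

The residual must be computed against the predicted value from the line, not by subtracting the explanatory value from the response value, which is the error highlighted in the marker's comment for Question 21.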

VCE Mathematics examination questions reproduced by permission, VCAA. VCE is a registered trademark of the VCAA. The
VCAA does not endorse or make any warranties regarding this study resource. Current and past VCE exams and related
content can be accessed directly at www.vcaa.vic.edu.au.
Worked Solutions

1. CORE, FUR1 2006 VCAA 7 MC

2. CORE, FUR1 2006 VCAA 8 MC

3. CORE, FUR1 2007 VCAA 7-8 MC

4. CORE, FUR1 2010 VCAA 10 MC

5. CORE, FUR1 2011 VCAA 6-8 MC

6. CORE, FUR1 2012 VCAA 8 MC

7. CORE, FUR1 2013 VCAA 11 MC

8. CORE, FUR1 2014 VCAA 9 MC

9. CORE, FUR1 2015 VCAA 10 MC

10. CORE, FUR1 2015 VCAA 9 MC

11. CORE, FUR1 2016 VCAA 9-10 MC

12. CORE, FUR1 2018 VCAA 10 MC

♦ Mean mark 51%.

13. CORE, FUR1 2010 VCAA 7-9 MC

14. CORE, FUR1 2017 VCAA 8-10 MC

♦♦ Mean mark 35%.


STRATEGY: An alternate but
less efficient strategy could be
to find 2 points and calculate
the gradient and then use the
point gradient formula.
15. CORE, FUR2 2018 VCAA 2
a.

Mean mark 56%.
COMMENT: This question was surprisingly poorly answered. Review carefully!

b.

c.

d.

Mean mark part (d) 53%.
COMMENT: Many students had problems at a number of stages in this part.

e.

16. CORE, FUR2 2007 VCAA 3

a.

b.i.

b.ii. MARKER'S COMMENT: A consistent error in these types of questions is taking 2 points that are too close together!

c.

d.

e.

17. CORE, FUR2 2010 VCAA 2
a.

b.

c.i.

c.ii. ♦♦ This part was poorly answered (exact data unavailable).
MARKER'S COMMENT: Many students offered "real world" explanations which did not gain a mark here.

18. CORE, FUR2 2016 VCAA 3
a.

b.i.

b.ii. ♦♦ Mean mark of part (b)(ii) - 28%.
MARKER'S COMMENT: "the predicted apparent temp is -1.7°C" also gained a mark.

c. ♦ Mean mark 49%.
IMPORTANT: Any mention of causality loses a mark!

d.i.

d.ii. ♦ Mean mark of both parts of (d) was 46%.
MARKER'S COMMENT: Students should refer to randomness or a lack of pattern explicitly here.
19. CORE, FUR2 2009 VCAA 3

a.

b.

c.i. MARKER'S COMMENT: Any answers that suggested causation with terms like "is due to" etc., did not receive a mark.

c.ii. MARKER'S COMMENT: Many students failed to include the negative sign as indicated by the negative slope of the graph.

20. CORE, FUR2 2014 VCAA 2

a.

b. ♦ Mean mark 36%.
MARKER'S COMMENT: Use the equation to draw the line and use points at the extremities.

c. ♦ Mean mark 41% (part (iii)).

d.i.

d.ii. ♦ Part (iv) in total had a mean mark 42%.

21. CORE, FUR2 2012 VCAA 2

♦ Parts (i) to (vi) have an average mean mark of 41%.

a.

b.

c.

d.

e.

f.

MARKER'S COMMENT: Students had particular difficulty with this part, with many using the incorrect calculation of 12.2 - 11.1 = 1.1.

Copyright © 2016-2019 M2 Mathematics Pty Ltd (SmarterMaths.com.au)
