Higher Level AA – 14.
Sampling and data statistics
Exercises for classes:

1. A scientific researcher conducting a study on lizards collected a sample of weights of 60 lizards by travelling to three different habitats and weighing the first 20 distinct lizards she spotted in each habitat.
a) Describe the difference between a sample and the population of lizards.
b) Is the collected data discrete or continuous?
c) Name the researcher’s method of sampling:
i. considering one particular habitat,
ii. in general, across the three habitats.
d) Name the sampling method for the whole sample if the lizards in each habitat were chosen randomly.
e) The weights of the lizards are shown below. Create a sample of 1/3 of the data using systematic sampling starting from the second data value.
26 36 26 34 32 26 36 32 32 15 16 30 22 28 30
34 32 28 28 36 25 32 13 25 27 37 19 22 23 37
26 26 22 24 35 22 12 25 34 39 26 20 20 20 16
28 21 21 27 28 18 23 31 24 33 30 16 31 15 34
f) Draw a histogram for the sample created in e) with class intervals of length 5.
g) Identify the modal class and describe the distribution of data.

2. Alex conducted a study asking all the people he met in a day how many cups of coffee they drink daily. The column chart below presents the results.
a) Is the number of coffee cups a discrete or continuous variable?
b) What type of sampling method did Alex use?
c) Can it be said that the sample is biased?
d) Describe the distribution of the data.
e) Write down the modal number of cups of coffee drunk daily.
f) How many people did Alex ask?
g) Find the percentage of people drinking 3 or more cups of coffee per day.
h) How many cups of coffee do all the people Alex met drink daily in total?
i) Calculate the mean number of cups of coffee drunk daily according to Alex’s study.

3. The heights of 10 students, to the nearest centimeter, are as follows:
172 169 156 175 182 170 165 176 177 169
a) Write down the mode and calculate the mean.
b) Find the median and the lower and upper quartiles.
c) Write down the range and find the interquartile range.
d) Identify the outlier(s).
e) Sketch the box and whisker diagram and comment on the result.
f) Calculate the variance and standard deviation.

4. A group of 100 IB students was given a maths test that was graded out of 20. The following is the distribution of the marks obtained:
Mark             9  10  11  12  13  14  15  16  17  18  19
No. of students  1   1   3   5   8  13  19  24  14  10   2
a) Write down the mode.
b) Calculate the mean.
c) Find the median.
d) Find the upper and lower quartiles.
e) Another group of 50 students had a mean mark of 17.16 on the same test. Calculate the mean of the entire group of 150 students.

5. For a mathematics project Eun-Kee timed the length of 30 popular Korean songs to the nearest second. His raw data is presented below:
185 230 205 215 217 206 192 187 207 245
205 181 216 227 239 214 242 248 193 222
217 219 204 234 227 236 234 217 186 236
a) Create the frequency and cumulative frequency distribution table using class intervals: [180,190[, [190,200[ etc.
b) Use your table to estimate the mean.
c) Calculate the mean using the raw data.
d) Find the mode of the sample and the modal class using the frequency distribution table. Is it possible for the mode not to belong to the modal class?

6. The Foodcity supermarket recorded the length of time, to the nearest minute, that a sample of 200 cars was parked in their car park. The results are shown in the table below.
Time (minutes)    Frequency
0 ≤ 𝑡 ≤ 15        13
15 < 𝑡 ≤ 30       23
30 < 𝑡 ≤ 45       32
45 < 𝑡 ≤ 60       33
60 < 𝑡 ≤ 75       27
75 < 𝑡 ≤ 90       20
90 < 𝑡 ≤ 105      12
105 < 𝑡 ≤ 120     11
120 < 𝑡 ≤ 135     10
135 < 𝑡 ≤ 150     11
150 < 𝑡 ≤ 165     8
a) Add the cumulative frequency column to the table. Then draw appropriate parts of the cumulative frequency graph to estimate the number of minutes for which the middle 50% of cars from this sample stayed in the car park.
b) Find the percentage of cars parking for more than 1 hour.
c) Estimate the mean length of time of parking for this sample of cars.

7. PE teachers from an elementary school collected the lengths of the ball throws of all 120 students and presented them in the cumulative frequency graph seen below.
[Cumulative frequency graph: cumulative frequency against throw’s length (m)]
a) How many throws of 32 m or less were there?
b) What were the shortest and the longest throws?
c) How many throws of 50 m or more were there?
d) What percentage of throws were less than 25 m?
e) Find the median length of the throw.
f) Find the interquartile range for the throws’ lengths.
g) Find the: i. 10th percentile ii. 90th percentile iii. 40th percentile
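The grouping asked for in exercise 5 can be checked with a short Python sketch. It is only an illustration under stated assumptions: the helper names `grouped_table` and `grouped_mean` are invented here, classes are taken as [lower, upper[ as in the exercise, and the estimated mean treats every value as if it sat at its class midpoint.

```python
# Song lengths in seconds, copied from exercise 5.
lengths = [
    185, 230, 205, 215, 217, 206, 192, 187, 207, 245,
    205, 181, 216, 227, 239, 214, 242, 248, 193, 222,
    217, 219, 204, 234, 227, 236, 234, 217, 186, 236,
]


def grouped_table(data, start, width, n_classes):
    """Return rows (lower, upper, frequency, cumulative frequency) for classes [lower, upper[."""
    rows, running = [], 0
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        freq = sum(1 for x in data if lower <= x < upper)
        running += freq
        rows.append((lower, upper, freq, running))
    return rows


def grouped_mean(rows):
    """Estimate the mean by placing every value at its class midpoint."""
    total = sum(freq for _, _, freq, _ in rows)
    return sum((lower + upper) / 2 * freq for lower, upper, freq, _ in rows) / total


table = grouped_table(lengths, start=180, width=10, n_classes=7)
estimated_mean = grouped_mean(table)      # mean estimated from the table
raw_mean = sum(lengths) / len(lengths)    # mean of the raw data

for lower, upper, freq, cum in table:
    print(f"[{lower},{upper}[  {freq:2d}  {cum:2d}")
print(round(estimated_mean, 1), round(raw_mean, 1))
```

The two means differ slightly because grouping discards the exact position of each value inside its class.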
8. The cumulative frequency table for the number of holiday destinations visited by the students from some class is given below.
No. of destinations    Cumulative frequency
𝑛 = 0                  2
𝑛 ≤ 1                  8
𝑛 ≤ 2                  24
𝑛 ≤ 3                  31
𝑛 ≤ 4                  34
a) Find the median and the lower and upper quartiles.
b) Construct the frequency table.
c) Find the mode of the data.
d) Two students were absent the day the data was collected. After taking into account their responses the mean of the data dropped by 3/68. How many destinations did each of these students visit during the holidays?

9. The set of integers: 𝑥, 4, 3, 7, 𝑦, 5, 0 has a mode of 3 and a mean of 4. Find the values of 𝑥 and 𝑦.

10. Sample #1 has a mean of 10.5, a standard deviation of 4.44 and the following five-number summary:
minimum  𝑄1   median  𝑄3  maximum
5        6.5  10      14  19
Sample #2 is created by adding 4 to each data value of Sample #1 and Sample #3 is created by multiplying each data value of Sample #1 by 3. For both new samples find:
a) the five-number summary,
b) mean,
c) standard deviation.

Homework exercises:

1. The 100 4th grade students from an elementary school participated in a math quiz. The quiz contained 10 questions marked 0 or 1 point each. The sample of 25 scores presented below was made by selecting every 4th score from the full set, beginning with the second data value.
7 6 10 6 5 6 4 8 7 6 8 8 9
5 6 7 5 8 7 9 7 10 9 8 7
a) Name the sampling method used.
b) Is the collected data discrete or continuous?
c) Draw a histogram for the results of the quiz.
d) Find the modal, median and mean score.
e) Describe the distribution of the data using the histogram made in c). Is it always true for this type of distribution that:
i. median and mode are equal?
ii. median and mean are approximately equal?

2. The scores of 100 students in a test are given in the table below:
score      3  4  5  6   7   8   9   10
frequency  1  2  5  16  30  24  15  7
a) Find the mode, median and mean of the data.
b) Write down the range.
c) Find the lower and upper quartiles and calculate the inter-quartile range.
d) Draw the box and whisker diagram.
e) Find the conditions for the outliers. Do you think the data values satisfying these conditions should be excluded from the data set?
f) Use technology to calculate the standard deviation of the data.
g) What percentage of the scores are the modal scores?
h) What percentage of the scores are the outliers?

3. The set of prices (PLN) of some book is given below.
28.79 33.13 31.99 23.16 27.17 26.59 26.99 27.99 23.95 24.98
25.39 24.37 26.30 25.30 28.79 24.75 27.17 29.99 23.13 23.96
24.94 23.92 32.39 30.52 24.60 32.47 28.79 24.64 26.77 24.28
a) Create a sample using the first 8 data values. Name the sampling method.
b) Create another sample of the same size using systematic sampling starting from the second data value.
c) Find the five-number summary for both samples.
d) Sketch two box and whisker diagrams on the same axis.
e) Using the box and whisker diagrams compare the samples’ ranges, inter-quartile ranges, medians and symmetry.

4. Consider the complete set of book prices from Q3:
a) Is the data set discrete or continuous?
b) Create the frequency and cumulative frequency table with class intervals: 22 ≤ 𝑥 < 24, 24 ≤ 𝑥 < 26 etc.
c) Use the table to estimate the mean price of the book.
d) Estimate the standard deviation of the book prices.
e) Find the modal book price using the complete data set and compare it to the modal class. What property of the data can explain the discovered inconsistency?

5. A biologist measures the lengths of 60 mature fern fronds to the nearest centimeter. The results are summarized in the table below.
Frond’s length (cm)    Frequency
10 ≤ 𝑥 ≤ 15            2
15 < 𝑥 ≤ 20            6
20 < 𝑥 ≤ 25            8
25 < 𝑥 ≤ 30            10
30 < 𝑥 ≤ 35            15
35 < 𝑥 ≤ 40            9
40 < 𝑥 ≤ 45            6
45 < 𝑥 ≤ 50            4
a) Write down the modal class.
b) Draw a histogram of the fronds’ lengths.
c) Estimate the mean length of the frond.
d) Using an appropriate part of the cumulative frequency graph, estimate the median length of the frond.
e) What percentage of the fronds are longer than 40 cm?

6. The masses of a sample of potatoes were measured to the nearest gram and are summarized below.
Mass (g)       Cumulative frequency
20 ≤ 𝑥 ≤ 30    12
30 < 𝑥 ≤ 40    31
40 < 𝑥 ≤ 50    88
50 < 𝑥 ≤ 60    129
60 < 𝑥 ≤ 70    157
70 < 𝑥 ≤ 80    172
80 < 𝑥 ≤ 90    176
a) Find the median.
b) Construct the frequency table.
c) Find the modal class.
d) Estimate the mean of the data.
e) Estimate the standard deviation.
f) Calculate the percentage of potatoes heavier than 70 g.
g) Around 89% of the potatoes weigh more than 𝑝 g. Find 𝑝.
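The two samples of exercise 3 and their five-number summaries can be reproduced with the Python sketch below. It rests on assumptions not stated in the exercise: a sampling interval of 4 (30 values, sample size 8, starting at the second value) and quartiles taken as the medians of the lower and upper halves of the sorted data, with the middle value left out of both halves when the sample size is odd. The function names are illustrative only.

```python
# Book prices (PLN) copied from exercise 3.
prices = [
    28.79, 33.13, 31.99, 23.16, 27.17, 26.59, 26.99, 27.99, 23.95, 24.98,
    25.39, 24.37, 26.30, 25.30, 28.79, 24.75, 27.17, 29.99, 23.13, 23.96,
    24.94, 23.92, 32.39, 30.52, 24.60, 32.47, 28.79, 24.64, 26.77, 24.28,
]


def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); quartiles are medians of the two halves."""
    s = sorted(values)
    half = len(s) // 2
    lower, upper = s[:half], s[len(s) - half:]
    return s[0], median(lower), median(s), median(upper), s[-1]


sample_1 = prices[:8]        # first 8 values
step = 4                     # assumed sampling interval, see the note above
sample_2 = prices[1::step]   # every 4th value, starting from the second one

summary_1 = five_number_summary(sample_1)
summary_2 = five_number_summary(sample_2)
print([round(v, 2) for v in summary_1])
print([round(v, 2) for v in summary_2])
```

Other quartile conventions, such as those built into spreadsheets or statistics libraries, can give slightly different values of 𝑄1 and 𝑄3 for such small samples.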
7. The following cumulative frequency graph displays the set of points scored by the Atlanta Hawks in one season.
[Cumulative frequency graph: cumulative frequency against points scored]
a) How many games are there in a season? Did the team play all of them?
b) Is the data continuous or discrete?
c) What was the range of the points scored?
d) In how many games did the Hawks score fewer than 100 points?
e) Estimate the percentage of the games in which the team scored between 120 and 130 points.
f) Estimate the 5th and 95th percentiles.
g) Estimate the median and the lower and upper quartiles of the data.
h) Check if the data contains outliers.

8. A dendrologist measured the circumference of 60 maple trees and presented the collected data in the cumulative frequency graph below.
[Cumulative frequency graph: cumulative frequency against circumference (cm)]
a) How many trees have a circumference of 125 cm or more?
b) What was the radius of the thickest tree?
c) Find the median circumference.
d) Estimate the 30th percentile.
e) Create the frequency table for the data with intervals: 100 < 𝑥 ≤ 110, 110 < 𝑥 ≤ 120 etc.
f) Find the modal class.
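Reading a median, a percentile or a count off a cumulative frequency graph amounts to linear interpolation between the plotted points. The Python sketch below illustrates this with the cumulative frequency table from homework exercise 6, because that data is given numerically. The function names are invented for the illustration, and the straight-line segments assume values are spread evenly inside each class.

```python
# Class boundaries and cumulative frequencies from homework exercise 6.
lower_bound = 20
upper_bounds = [30, 40, 50, 60, 70, 80, 90]
cumulative = [12, 31, 88, 129, 157, 172, 176]


def frequencies(cum):
    """Class frequencies are differences of consecutive cumulative frequencies."""
    return [c - p for p, c in zip([0] + cum[:-1], cum)]


def value_at(position, lower, uppers, cum):
    """Go across from a cumulative frequency, then down to the data value."""
    xs = [lower] + uppers
    ys = [0] + cum
    for i in range(1, len(xs)):
        if position <= ys[i]:
            fraction = (position - ys[i - 1]) / (ys[i] - ys[i - 1])
            return xs[i - 1] + fraction * (xs[i] - xs[i - 1])
    raise ValueError("position exceeds the total frequency")


def cumulative_at(x, lower, uppers, cum):
    """Go up from a data value, then across to the cumulative frequency."""
    xs = [lower] + uppers
    ys = [0] + cum
    for i in range(1, len(xs)):
        if x <= xs[i]:
            fraction = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + fraction * (ys[i] - ys[i - 1])
    raise ValueError("x lies above the last class boundary")


total = cumulative[-1]
freqs = frequencies(cumulative)
median_mass = value_at(total / 2, lower_bound, upper_bounds, cumulative)
heavier_than_70 = total - cumulative_at(70, lower_bound, upper_bounds, cumulative)
percent_over_70 = 100 * heavier_than_70 / total
print(freqs, median_mass, round(percent_over_70, 1))
```

The same two look-ups cover the graph questions in exercises 7 and 8 once points are read from the graph: `value_at` for medians, quartiles and percentiles, `cumulative_at` for counts below or above a given value.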
9. The following frequency and cumulative frequency table shows the marks of a math test of a group of 32 students. The mean mark was 5.25 and the lower quartile was 4.5. Copy and complete the table.
Mark                  3   4   5   6   7
Frequency             __  6   __  __  __
Cumulative frequency  __  __  19  27  __

10. Consider the frequency table below showing the ages of a group of 15 children.
Age        9  10  11
Frequency  𝑎  𝑏   𝑐
a) Knowing that the median of the children’s ages is 10, state the maximum values of the variables 𝑎 and 𝑐.
b) The mean age of the group is 10 1/3. Find the value of the expression 𝑏 + 2𝑐.
c) The modal age is 11. Find the values of 𝑎, 𝑏 and 𝑐.
d) The mean age of this group of children including the teacher is 12. How old is the teacher?

11. The set of integers: 8, 17, 11, 𝑎, 8, 𝑏, 9, 10 has a mean of 11, a median of 10.5 and no mode. Find 𝑎 and 𝑏.

12. Consider the set of positive integers: 3, 6, 9, 11, 𝑥, 𝑥 + 1, 𝑦. The mode and the mean of the set are both 6. Find 𝑥 and 𝑦.

13. Two samples have the following five-number summaries:
           minimum  𝑄1  median  𝑄3  maximum
Sample #1  7        8   10      13  15
Sample #2  12       13  15      18  20
a) Draw the box and whisker diagram for both samples on the same axis.
b) Sample #2 is created by adding 𝑘 to each data value of Sample #1. Find 𝑘.
c) Draw the box and whisker diagram for the joined samples.
d) Do any of the samples, separate or joined, contain outliers?
e) The number 𝑛 was added to each value of Sample #1, which was then joined with Sample #2. Find the possible values of 𝑛 if the range of the created sample was 10.

14. The following table shows the five-number summary, mean and standard deviation of three data samples. Sample #2 was created by subtracting 𝑥 from each data value of Sample #1 and Sample #3 was created by multiplying each data value of Sample #1 by 𝑦.
           minimum  𝑄1   median  𝑄3  maximum  mean  st.dev.
Sample #1  __       __   __      __  20       14    __
Sample #2  2        5.5  8.5     __  __       __    4.45
Sample #3  21       __   __      54  60       __    __
a) Find 𝑥 and 𝑦.
b) Complete the rest of the table.
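Exercises 13 and 14 rely on how summary statistics react when every data value is shifted or multiplied by a constant. The Python sketch below illustrates the rule, using the two five-number summaries given in exercise 13 and a small raw data set that is invented purely for illustration and is not the data behind either sample. The helper names `shift` and `scale` are also made up here.

```python
from math import isclose
from statistics import mean, pstdev


def shift(values, k):
    """Add the constant k to every value."""
    return [v + k for v in values]


def scale(values, factor):
    """Multiply every value by the constant factor."""
    return [v * factor for v in values]


# Five-number summaries as given in exercise 13.
summary_1 = [7, 8, 10, 13, 15]
summary_2 = [12, 13, 15, 18, 20]

# A shift moves all five numbers by the same amount.
k = summary_2[0] - summary_1[0]
shift_matches = shift(summary_1, k) == summary_2

# Hypothetical raw data, invented for this illustration only.
data = [7, 8, 8, 10, 11, 13, 15]
shifted = shift(data, k)
tripled = scale(data, 3)

# Shifting moves the mean by k and leaves the standard deviation unchanged.
mean_shift_ok = isclose(mean(shifted), mean(data) + k)
sd_shift_ok = isclose(pstdev(shifted), pstdev(data))

# Scaling multiplies both the mean and the standard deviation by the factor.
mean_scale_ok = isclose(mean(tripled), 3 * mean(data))
sd_scale_ok = isclose(pstdev(tripled), 3 * pstdev(data))

print(k, shift_matches, mean_shift_ok, sd_shift_ok, mean_scale_ok, sd_scale_ok)
```

In exercise 14 the same reasoning runs backwards: a statistic that stays the same between two samples points to a shift, and the ratio of two matching statistics gives the multiplier.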
Answers for classes:

1. a) A sample is the group of lizards which were weighed; the population is the whole group of lizards living in the visited and nearby habitats, about which the researcher will draw conclusions.
b) continuous
c) i. convenience sampling
ii. quota sampling
d) stratified sampling
e)
36 32 32 16 28 32 36 13 37 23
26 35 25 26 20 21 28 31 30 15
f) [histogram]
g) There is no modal class, the distribution is asymmetrical, negatively skewed.

2. a) discrete
b) convenience sampling
c) not really, the bias depends on the places he went that day.
d) asymmetrical, positively skewed
e) 1
f) 52
g) 23.1%
h) 96
i) 1.85

3. a) mode = 169, mean = 171.1
b) Median = 171, 𝑄1 = 169, 𝑄3 = 176
c) Range = 26, 𝐼𝑄𝑅 = 7
d) 156
e) [box and whisker diagram]
The distribution of the data is asymmetrical, positively skewed with one outlier which is a valid part of the sample (156 can be someone’s height).
f) 𝜎² = 46.9, 𝜎 = 6.85

4. a) 16
b) 15.23
c) 15.5
d) 𝑄1 = 14, 𝑄3 = 17
e) 15.87 (2 d.p.)

5. a)
Length     Frequency  Cumulative frequency
[180,190[  4          4
[190,200[  2          6
[200,210[  5          11
[210,220[  7          18
[220,230[  3          21
[230,240[  6          27
[240,250[  3          30
b) 216.0
c) 216.2
d) mode = 217, modal class: [210,220[

6. a) Cumulative frequencies: 13, 36, 68, 101, 128, 148, 160, 171, 181, 192, 200
middle 50%: from 𝑄1 ≈ 36.6 to 𝑄3 ≈ 92.5
b) 61%
c) 67.15

7. a) 25
b) 20 m, 60 m
c) 10
d) 4.17%
e) 38 m
f) 43 − 33 = 10 m
g) i. 28 m ii. 48 m iii. 36 m

8. a) Median = 2, 𝑄1 = 2, 𝑄3 = 3
b)
No. of destinations  Frequency
0                    2
1                    6
2                    16
3                    7
4                    3
c) mode: 2
d) 0 and 1

9. 3 and 6

10. a)
    min  𝑄1    med  𝑄3  max
#2  9    10.5  14   18  23
#3  15   19.5  30   42  57
b) #2 mean: 14.5, #3 mean: 31.5
c) #2 st.dev.: 4.44, #3 st.dev.: 13.32

Homework:

1. a) systematic sampling
b) discrete
c) [histogram]
d) mode = 7, median = 7, mean = 7.12
e) symmetrical
i. no, example: bimodal distributions (the two most common values lie on the left and right sides of the histogram)
ii. yes

2. a) mode = 7, median = 7, mean = 7.39
b) 7
c) 𝑄1 = 7, 𝑄3 = 8, 𝐼𝑄𝑅 = 1
d) [box and whisker diagram]
e) outliers < 5.5 or > 9.5. Outliers shouldn’t be excluded as they are valid scores obtained in the test and they aren’t distinct from the rest of the scores.
f) 1.42
g) 30%
h) 15%

3. a) convenience sampling
c) sample #1: min = 23.16, 𝑄1 = 26.79, median = 27.58, 𝑄3 = 30.39, max = 33.13
sample #2: min = 23.92, 𝑄1 = 24.63, median = 25.95, 𝑄3 = 31.23, max = 33.13
d) [two box and whisker diagrams on the same axis]
e) the samples have similar ranges, but the second sample has a significantly bigger 𝐼𝑄𝑅 (6.6 against 3.6). The median of the first sample is greater than that of the second one by around 1.6. Both samples are asymmetrical.

4. a) discrete
b)
Book price   Frequency  Cumulative frequency
22 ≤ 𝑥 < 24  5          5
24 ≤ 𝑥 < 26  9          14
26 ≤ 𝑥 < 28  7          21
28 ≤ 𝑥 < 30  4          25
30 ≤ 𝑥 < 32  2          27
32 ≤ 𝑥 < 34  3          30
c) mean = 26.87
d) 1.61
e) modal price = 28.79, modal class: 24 ≤ 𝑥 < 26
The modal class doesn’t contain the modal price; there are only a few repeated data values in the data set.

5. a) 30 < 𝑥 ≤ 35
b) [histogram]
c) 30.9
d) 28.8
e) 16.7%

6. a) 50 g
b)
Mass (g)     Frequency
20 ≤ 𝑥 ≤ 30  12
30 < 𝑥 ≤ 40  19
40 < 𝑥 ≤ 50  57
50 < 𝑥 ≤ 60  41
60 < 𝑥 ≤ 70  28
70 < 𝑥 ≤ 80  15
80 < 𝑥 ≤ 90  4
c) 40 < 𝑥 ≤ 50
d) 51.5 g
e) 4.05 g
f) 10.8%
g) 𝑝 = 60

7. a) 72, yes
b) discrete
c) 65
d) 8
e) 13/72 ≈ 18.1%
f) 5th: 91, 95th: 133
g) median = 116, 𝑄1 = 110, 𝑄3 = 121
h) yes, because there are scores less than 110 − 1.5 ∙ 11 = 93.5 and greater than 121 + 1.5 ∙ 11 = 137.5

8. a) 10
b) 23.9 cm
c) 114 cm
d) 110 cm
e)
Circumference (cm)  Frequency
100 < 𝑥 ≤ 110       17
110 < 𝑥 ≤ 120       26
120 < 𝑥 ≤ 130       11
130 < 𝑥 ≤ 140       4
140 < 𝑥 ≤ 150       2
f) 110 < 𝑥 ≤ 120

9.
Mark                  3  4  5   6   7
Frequency             2  6  11  8   5
Cumulative frequency  2  8  19  27  32

10. a) 𝑎 ≤ 7 and 𝑐 ≤ 7
b) 20
c) 𝑎 = 2, 𝑏 = 6, 𝑐 = 7
d) The teacher is 37 years old.

11. 11 and 14

12. 𝑥 = 5, 𝑦 = 2

13. a) [box and whisker diagrams for both samples]
b) 𝑘 = 5
c) [box and whisker diagram for the joined samples]
d) no
e) 𝑛 = 3 or 𝑛 = 7

14. a) 𝑥 = 5, 𝑦 = 3
b)
           minimum  𝑄1    median  𝑄3  maximum  mean  st.dev.
Sample #1  7        10.5  13.5    18  20       14    4.45
Sample #2  2        5.5   8.5     13  15       9     4.45
Sample #3  21       31.5  40.5    54  60       42    13.35
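
Answers that come from a discrete frequency table, such as homework exercise 2, can be checked with a few lines of Python. The sketch below makes two assumptions: quartiles are taken as the medians of the lower and upper halves of the ordered data, and the standard deviation is the population form (division by 𝑛). These choices reproduce the values quoted above for that exercise. The helper names are illustrative.

```python
from math import sqrt

# Frequency table from homework exercise 2.
scores = [3, 4, 5, 6, 7, 8, 9, 10]
freqs = [1, 2, 5, 16, 30, 24, 15, 7]


def expand(values, frequencies):
    """Turn a frequency table back into the full list of data values."""
    data = []
    for value, freq in zip(values, frequencies):
        data.extend([value] * freq)
    return data


def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


data = expand(scores, freqs)   # already sorted, since scores are ascending
n = len(data)
half = n // 2

mean_score = sum(data) / n
median_score = median(data)
q1 = median(data[:half])       # median of the lower half
q3 = median(data[n - half:])   # median of the upper half
iqr = q3 - q1

# Outlier boundaries: more than 1.5 IQR below Q1 or above Q3.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]
outlier_percent = 100 * len(outliers) / n

# Population standard deviation (divide by n, not n - 1).
sd = sqrt(sum((x - mean_score) ** 2 for x in data) / n)

print(round(mean_score, 2), median_score, q1, q3, low_fence, high_fence)
print(outlier_percent, round(sd, 2))
```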