NAME: ____________________
Assignment 5
Data Files needed for these problems are in the Attached Files.
Problems:
3.31
a. Calculate the population mean.
$\mu = \frac{7+5+11+8+3+6+2+1+9+8}{10} = \frac{60}{10} = 6$
b. Calculate the population standard deviation.
$\sigma = \sqrt{\frac{(7-6)^2 + (5-6)^2 + (11-6)^2 + (8-6)^2 + (3-6)^2 + (6-6)^2 + (2-6)^2 + (1-6)^2 + (9-6)^2 + (8-6)^2}{10}} = \sqrt{\frac{94}{10}} = \sqrt{\frac{47}{5}} \approx 3.0659$
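As a quick check of the two hand calculations above, here is a minimal Python sketch using only the standard library. The variable names are my own, and `pstdev` is the population version of the standard deviation (it divides by N, not N - 1).

```python
import statistics

# The ten population values from problem 3.31
values = [7, 5, 11, 8, 3, 6, 2, 1, 9, 8]

# Population mean: sum of the values divided by N
mu = statistics.mean(values)

# Population standard deviation: divides by N, not N - 1
sigma = statistics.pstdev(values)

print(mu)               # 6
print(round(sigma, 4))  # 3.0659
```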
3.33
a. Calculate the mean, variance, and standard deviation for this population.
Mean 276.4902
Variance 76508.84
Stand Dev 276.6023
b. What percentage of the 50 states have a number of McDonald’s stores
within ±1, ±2, or ±3 standard deviations of the mean?
Within one: 88%
Within two: 94%
Within three: 96%
c. Compare your findings with what would be expected on the basis of the empirical rule. Are
you surprised at the results in (b)?
The empirical rule predicts about 68%, 95%, and 99.7%. It is not a good estimate within one standard
deviation, where 88% of the data fall, but it is reasonably close for two and three standard deviations.
This is most likely because the data are not symmetric.
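The percentages in (b) come from counting how many values fall inside each interval. Below is a minimal sketch of that count. The list `stores` holds made-up placeholder numbers, not the actual McDonald's data file, and `percent_within` is a helper name I chose for illustration.

```python
import statistics

def percent_within(data, k):
    """Percentage of values within k population standard deviations of the mean."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    inside = [x for x in data if abs(x - mu) <= k * sigma]
    return 100 * len(inside) / len(data)

# Placeholder right-skewed values (NOT the actual data file)
stores = [30, 45, 60, 80, 100, 120, 150, 200, 400, 1200]

# Compare each result with the empirical rule's 68%, 95%, and 99.7%
for k in (1, 2, 3):
    print(k, percent_within(stores, k))
```

With skewed data like this, one very large value inflates the standard deviation, so the ±1 interval captures far more than 68% of the values.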
3.37
a. Calculate the mean and standard deviation of the market capitalization for this population of
30 companies.
Mean 185.8
Standard Dev (Pop) 131.4613
b. Interpret the parameters calculated in (a).
This means that the average company has a market cap of 185.8, and by the empirical rule we can expect
about 68% of companies to have a market cap between 54.3 and 317.3. Going more than one standard
deviation below the mean does not make sense in this context because a market cap cannot go below zero.
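The intervals behind this interpretation are simple arithmetic on the two values from part (a). A small sketch (the variable names are mine) shows that the interval for two standard deviations already has a negative lower end:

```python
# Values reported in part (a)
mean_cap = 185.8
sd_cap = 131.4613  # population standard deviation

# Interval of mean plus or minus k standard deviations, for k = 1 and k = 2
for k in (1, 2):
    low = mean_cap - k * sd_cap
    high = mean_cap + k * sd_cap
    print(k, round(low, 1), round(high, 1))
```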
3.39
a. Does the study suggest that perceived usefulness of smartphones in educational settings
and use of smartphones for class purposes are positively correlated or negatively
correlated?
Since the students who use smartphones more for class reported a higher perceived usefulness, the two
are positively correlated: as one rises, so does the other.
b. Do you think that there might be a cause-and-effect relationship between perceived
usefulness of smartphones in educational settings and use of smartphones for class
purposes? Explain.
Yes, I think there is a cause-and-effect relationship. The students who use their smartphones more for
class-related activities see firsthand how they benefit from them. They are also biased to believe
that the way they use their time is constructive.
3.41
a. Calculate the covariance between first weekend gross and U.S. gross, first weekend gross
and worldwide gross, and U.S. gross and worldwide gross.
First Vs US 779.0137275
First Vs World 3501.558968
US vs World 5289.87611
b. Calculate the coefficient of correlation between first weekend gross and U.S. gross, first
weekend gross and worldwide gross, and U.S. gross and worldwide gross.
First Vs US 0.728417481
First Vs World 0.823319135
US vs World 0.96419656
c. Which do you think is more valuable in expressing the relationship between first weekend
gross, U.S. gross, and worldwide gross—the covariance or the coefficient of correlation?
Explain.
The coefficient of correlation is the better indication at a glance because it is always on the interval
from -1 to 1, while the covariance depends on the units of the data and has no fixed scale.
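To illustrate why the scale matters, here is a minimal sketch using made-up grosses, not the movie data file. The function names are mine, and I am assuming the sample (n - 1) version of the covariance. Changing one variable from $ millions to $ thousands multiplies the covariance by 1000 but leaves the coefficient of correlation alone.

```python
import math

def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Coefficient of correlation: covariance divided by both standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

# Made-up grosses in $ millions (NOT the movie data file)
first_weekend = [20.0, 35.0, 50.0, 90.0, 120.0]
us_gross = [70.0, 90.0, 160.0, 250.0, 300.0]

# The same U.S. grosses expressed in $ thousands
us_gross_thousands = [g * 1000 for g in us_gross]

print(sample_covariance(first_weekend, us_gross))
print(sample_covariance(first_weekend, us_gross_thousands))  # about 1000 times larger
print(correlation(first_weekend, us_gross))
print(correlation(first_weekend, us_gross_thousands))        # same value as the line above
```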
d. Based on (a) and (b), what conclusions can you reach about the relationship between first
weekend gross, U.S. gross, and worldwide gross?
All three pairs are positively related, and U.S. gross and worldwide gross are the most strongly
correlated of the three (r ≈ 0.96).
3.44
What are the properties of a set of numerical data?
Numerical data sets have three properties: central tendency, variation, and shape.
3.45
What is meant by the property of central tendency?
Central tendency is a number that attempts to describe the center of a given set of data. It can be
measured in several ways, for example by the mean, median, or mode.
3.46
What are the differences among the mean, median, and mode, and what are the advantages and
disadvantages of each?
The mean is the average of the set of data: the sum divided by the total number of points. It uses every
value, but it is pulled toward extreme outliers. The median is the middle value of the ordered data set,
and it has the advantage that it is not influenced by extreme outliers. The mode is the most common value
in a data set, which is useful because it might indicate a bias in the data towards a particular outcome,
although a data set can have no mode or several.
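A small made-up example shows the difference. This is only a sketch with invented numbers, using Python's standard `statistics` module:

```python
import statistics

# Made-up values; the last one is an extreme outlier
data = [2, 3, 3, 4, 5, 6, 100]

print(statistics.mean(data))    # pulled upward by the outlier
print(statistics.median(data))  # 4, the middle value
print(statistics.mode(data))    # 3, the most frequent value
```

Six of the seven values are 6 or less, yet the mean lands above 17 because of the single value of 100. The median and mode do not move.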
3.47
How do you interpret the first quartile, median, and third quartile?
The first quartile is the value that 25% of the data fall at or below, the median is the value that 50%
fall at or below, and the third quartile is the value that 75% fall at or below.
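As a sketch, the quartiles of the ten values from problem 3.31 can be found with `statistics.quantiles`. There are several conventions for quartiles; the `"inclusive"` method used here is one of them, and other methods (or the textbook's hand rules) can give slightly different answers.

```python
import statistics

# The ten values from problem 3.31
values = [7, 5, 11, 8, 3, 6, 2, 1, 9, 8]

# n=4 splits the data into four parts and returns the three cut points
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

print(q1, median, q3)  # 3.5 6.5 8.0
```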
3.48
What is meant by the property of variation?
Variation refers to the overall spread of the data. The more spread out the data, the higher the variation,
and the more clustered the data, the lower the variation.
3.49
What does the Z score measure?
The number of standard deviations a value is above or below the mean.
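In symbols this is Z = (X - mean) / standard deviation. A minimal sketch using the ten values from problem 3.31 (the helper name `z_score` is mine):

```python
import statistics

# The ten population values from problem 3.31
values = [7, 5, 11, 8, 3, 6, 2, 1, 9, 8]
mu = statistics.mean(values)
sigma = statistics.pstdev(values)

def z_score(x):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

print(round(z_score(11), 2))  # 1.63
print(round(z_score(1), 2))   # -1.63
print(z_score(6))             # 0.0
```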
3.50
What are the differences among the various measures of variation, such as the range, interquartile
range, variance, standard deviation, and coefficient of variation, and what are the advantages and
disadvantages of each?
The range is the distance from the largest number in the set to the smallest. It is easy to compute and
gives a general sense of the kinds of numbers you are dealing with, but it depends only on the two extreme
values. The interquartile range is the spread of the middle 50% of the data set, which gives you a sense of
where "most" numbers in the set are and is not affected by outliers. The variance tells you about the
spread of the distribution, clustered or spread out, but it is in squared units. The standard deviation has
the correct units, and using the empirical rule you know generally how much of the data is within how many
standard deviations of the mean. The coefficient of variation is the standard deviation divided by the mean
of the data, expressed as a percentage, so it can more easily be used as a comparison to other data sets.
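Here is a sketch that computes all five measures for the ten values from problem 3.31. I am assuming the population formulas and the `"inclusive"` quartile convention; the sample formulas or a different quartile rule would change the numbers slightly.

```python
import statistics

# The ten population values from problem 3.31
values = [7, 5, 11, 8, 3, 6, 2, 1, 9, 8]

data_range = max(values) - min(values)         # 11 - 1 = 10
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1                                  # 8.0 - 3.5 = 4.5
variance = statistics.pvariance(values)        # 9.4, in squared units
std_dev = statistics.pstdev(values)            # about 3.07, in the original units
cv = 100 * std_dev / statistics.mean(values)   # about 51.1 percent, unit-free

print(data_range, iqr, variance, round(std_dev, 2), round(cv, 1))
```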
3.51
How does the empirical rule help explain the ways in which the values in a set of numerical data
cluster and distribute?
For data sets that are approximately bell-shaped, the empirical rule is a useful rule of thumb that says
about 68% of the data is within one standard deviation of the mean, 95% within two, and 99.7% within three.
So if you know both the mean and standard deviation, you can tell whether a given data point should be
considered expected or rare.
3.52
How do the empirical rule and the Chebyshev rule differ?
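The empirical rule is an estimate that applies only to data sets whose distribution is approximately
"normal" (bell-shaped). The Chebyshev rule works on any data set, regardless of shape, but it only
guarantees a minimum: at least (1 - 1/k²) × 100% of the values lie within k standard deviations of the mean.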
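A short sketch of the two rules side by side. The function name is mine, and the empirical-rule percentages are the usual approximate values.

```python
def chebyshev_minimum(k):
    """Minimum percentage of values within k standard deviations (k > 1), any shape."""
    return 100 * (1 - 1 / k**2)

# Approximate percentages from the empirical rule (bell-shaped data only)
empirical = {1: 68.0, 2: 95.0, 3: 99.7}

for k in (2, 3):
    print(k, round(chebyshev_minimum(k), 2), empirical[k])
# 2 75.0 95.0
# 3 88.89 99.7
```

The Chebyshev values are lower because they are guarantees for every possible shape, not estimates for one particular shape.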
3.53
What is meant by the property of shape?
The shape of a data set describes how the data is distributed over its range, such as a standard symmetrical
bell shape, a skewed shape, or even a uniform shape.
3.54
What is the difference between skewness and kurtosis?
Skewness refers to how symmetrical the data is, while kurtosis refers to whether the data has heavier or
lighter tails than a normal distribution.
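Both can be computed from the deviations around the mean. The sketch below uses the simple population-moment formulas, which is an assumption on my part. Spreadsheet functions usually apply a sample-size adjustment, so their values will differ a little.

```python
import statistics

def skewness(data):
    """Population skewness: third central moment over sigma cubed."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    return sum((x - mu) ** 3 for x in data) / (len(data) * sigma ** 3)

def excess_kurtosis(data):
    """Population kurtosis minus 3, so a normal distribution scores about 0."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    return sum((x - mu) ** 4 for x in data) / (len(data) * sigma ** 4) - 3

# Made-up values for illustration
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 20]

print(skewness(symmetric))            # 0.0
print(skewness(right_skewed))         # positive: long tail on the right
print(excess_kurtosis(right_skewed))  # compare with excess_kurtosis(symmetric)
```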
3.55
How do the boxplots for distributions of varying shapes differ?
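If the distribution is symmetric, the median will be in the middle of the box, the first and third quartiles
will be an equal distance away from it, and the two whiskers will be about the same length. If it is
positively (right) skewed, the median will be closer to the first quartile, the right whisker will be longer,
and the mean will be greater than the median. If it is negatively (left) skewed, the median will be closer to
the third quartile, the left whisker will be longer, and the mean will be less than the median.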
3.56
How do the covariance and the coefficient of correlation differ?
Covariance depends on the units of a given set of data, while the coefficient of correlation is standardized
to the interval from -1 to 1, so you can use it to compare to other data sets. For example: "Sets A and B
are more correlated than C and D."
3.61
a. Calculate the mean, median, range, and standard deviation for the call duration, which is
the amount of time spent speaking to customers on the phone. Interpret these measures of
central tendency and variability.
mean 232.78
median 228
range 1076
standard dev 158.6866
The mean is relatively close to the median, so the data is not strongly skewed. By the empirical rule, most
of the data (about 68%) would be found between 74 seconds and 391 seconds, so the data has a fairly wide spread.
b. List the five-number summary.
Five-number summary
min 65
first q 138.25
median 228
third q 276.75
max 1141
c. Construct a boxplot and describe its shape.
The data has a mean greater than the median, so it is skewed in the positive direction. There
is also an outlier in the data set on the positive extreme.
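The outlier can be checked from the five-number summary in part (b). The sketch below assumes the common 1.5 × IQR fence convention for boxplots, which may differ from the rule used in the course.

```python
# Five-number summary reported in part (b), in seconds
minimum, q1, median, q3, maximum = 65, 138.25, 228, 276.75, 1141

iqr = q3 - q1                  # 138.5
lower_fence = q1 - 1.5 * iqr   # -69.5
upper_fence = q3 + 1.5 * iqr   # 484.5

print(maximum > upper_fence)   # True: the longest call is flagged as an outlier
print(minimum < lower_fence)   # False: nothing is flagged on the low side

# Distance from the median to each extreme; a longer right side suggests positive skew
print(maximum - median, median - minimum)  # 913 163
```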
d. What can you conclude about call center performance if a call duration target of less than
240 seconds is set?
We know that both the mean (232.78) and the median (228) are under the 240-second target, so on average they
are meeting the goal. However, the third quartile is 276.75 seconds, so at least 25% of calls exceed the
target, and the high standard deviation shows that some calls miss it by a wide margin.
3.73
You are planning to study for your statistics examination with a group of classmates, one of whom
you particularly want to impress.
Box plots do not make sense for categorical data such as gender or major, while pie charts do not
make sense for numerical data such as height and grade point average. These data sets should be switched
to get the appropriate charts. The mean makes sense for height and grade point average, but it should not
be used for categorical data such as gender or major.