Measures of Variability
Range
Variance
Standard Deviation
Coefficient of Variation
Range
The range is the simplest measure of the spread of values in a data set:
    •   Found by subtracting the smallest value from the largest value in a data set
        Range = maximum value – minimum value
        For example, for the data 4, 2, 5, 8, 12, 15, the range is the highest number (15) minus
        the lowest number (2):
        Range = 15 – 2 = 13
    •   Illustration: Consider the data on home sales in a Cincinnati, Ohio, suburb
                                      Home Sale             Selling Price ($)
                                           1                138,000
                                           2                254,000
                                           3                186,000
                                           4                257,500
                                           5                108,000
                                           6                254,000
                                           7                138,000
                                           8                298,000
                                           9                199,500
                                          10                208,000
                                          11                142,000
                                          12                456,250
             •   Largest home sales price: $456,250
             •   Smallest home sales price: $108,000
             •   Range = Largest value – Smallest value
                       = $456,250 – $108,000
                       = $348,250
    •   Drawback: Range is based on only two of the observations and thus is highly influenced by
        extreme values
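The range calculation and its sensitivity to a single extreme value can be sketched in Python. This is only an illustration using the selling prices listed above; the variable names are assumptions made for the example.

```python
# Selling prices ($) from the home sales illustration.
prices = [138000, 254000, 186000, 257500, 108000, 254000,
          138000, 298000, 199500, 208000, 142000, 456250]

# Range = maximum value - minimum value
price_range = max(prices) - min(prices)
print(f"Range = {max(prices):,} - {min(prices):,} = {price_range:,}")

# Drawback: the range depends on only two observations.
# Dropping the single largest sale changes it substantially.
without_largest = sorted(prices)[:-1]
reduced_range = max(without_largest) - min(without_largest)
print(f"Range without the largest sale = {reduced_range:,}")
```

Removing one observation out of twelve cuts the range from $348,250 to $190,000, which is why the range is rarely used as the only measure of variability.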
Variance
    •   Measure of variability that utilizes all the data
    •    It is based on the deviation about the mean, which is the difference between the value of each
         observation (\(x_i\)) and the mean (\(\bar{x}\) for a sample, \(\mu\) for a population)
    •    The deviations about the mean are squared while computing the variance
                •   Sample variance, \(s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}\)
                •   Population variance, \(\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}\)
Table 2.12: Computation of Deviations and Squared Deviations about the Mean for the Class Size Data
Computation of Sample Variance:
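The deviations, squared deviations, and sample variance can be reproduced with a short Python sketch. It assumes the class size data are the five values 46, 54, 42, 46, 32 used in this section; the variable names are illustrative only.

```python
# Class size data (assumed to be the five values used in this section).
class_sizes = [46, 54, 42, 46, 32]
n = len(class_sizes)
mean = sum(class_sizes) / n

# Deviation about the mean and squared deviation for each observation.
deviations = [x - mean for x in class_sizes]
squared_deviations = [d ** 2 for d in deviations]

print(f"{'x_i':>5} {'x_i - mean':>12} {'(x_i - mean)^2':>16}")
for x, d, sq in zip(class_sizes, deviations, squared_deviations):
    print(f"{x:>5} {d:>12.0f} {sq:>16.0f}")

# Sample variance divides the sum of squared deviations by n - 1.
sample_variance = sum(squared_deviations) / (n - 1)
print(f"Sum of squared deviations = {sum(squared_deviations):.0f}")
print(f"Sample variance s^2 = {sample_variance:.0f}")
```

The deviations always sum to zero, which is why they are squared before being averaged.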
Standard Deviation
    •   Positive square root of the variance
    •   Measured in the same units as the original data
    •   For a sample, \(s = \sqrt{s^2}\)
    •   For a population, \(\sigma = \sqrt{\sigma^2}\)
Coefficient of Variation
    •   \(\left( \frac{\text{Standard deviation}}{\text{Mean}} \times 100 \right)\%\)
    •   Measures the standard deviation relative to the mean
    •   Expressed as a percentage
Illustration:
    •    Consider the class size data:
                            46 54 42 46 32
    •    Mean, \(\bar{x} = 44\)
    •    Standard deviation, \(s = 8\)
    •    Coefficient of variation = \(\left( \frac{8}{44} \times 100 \right)\% = 18.2\%\)
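As a quick check of this illustration, the same figures can be computed with Python's standard `statistics` module. This is a minimal sketch; `statistics.stdev` computes the sample standard deviation (dividing by n - 1), which is the version used here.

```python
import statistics

class_sizes = [46, 54, 42, 46, 32]

mean = statistics.mean(class_sizes)    # sample mean
s = statistics.stdev(class_sizes)      # sample standard deviation (n - 1 divisor)

# Coefficient of variation: standard deviation relative to the mean, in percent.
cv = s / mean * 100

print(f"Mean = {mean}")
print(f"Standard deviation = {s}")
print(f"Coefficient of variation = {cv:.1f}%")
```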
Analyzing Distributions
Percentiles
Quartiles
Z-Scores
Empirical Rule
Identifying Outliers
Box Plots
Percentiles
    •   Value of a variable at which a specified (approximate) percentage of observations are below
        that value
    •   The pth percentile tells us the point in the data where:
              •   Approximately p percent of the observations have values less than the pth percentile
              •   Approximately (100 – p) percent of the observations have values greater than the pth
                  percentile
    •   Steps to calculate the pth percentile:
              •   Arrange the data in ascending order (smallest to largest value)
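              •   Compute k = (n + 1) × p, where n is the number of observations and p is expressed as a
                  proportion (for example, p = 0.85 for the 85th percentile)
              •   Divide k into its integer component, i, and its decimal component, d
                        •   If d = 0, find the value in position k of the sorted data (the kth smallest
                            value); this is the pth percentile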
                        •   If d > 0, the percentile is between the values in positions i and i + 1 in the sorted
                            data; to find this percentile, we must interpolate between these two values:
                                •   Calculate the difference between the values in positions i and i + 1 in
                                    the sorted data set; we define this difference between the two values as
                                    m
                                •   Multiply this difference by d: t = m × d
                                •   To find the pth percentile, add t to the value in position i of the sorted
                                    data
    •   Illustration
    •   To determine the 85th percentile for the home sales data in Table 2.9:
    1. Arrange the data in ascending order
108,000 138,000 138,000 142,000 186,000 199,500
208,000 254,000 254,000 257,500 298,000 456,250
    2. Compute k = (n + 1) × p = (12 + 1) × 0.85 = 11.05
    3. Dividing 11.05 into the integer and decimal components gives us i = 11 and d = 0.05
    4. Because d > 0, interpolate between the values in the 11th and 12th positions in the sorted data
              •    The value in the 11th position is 298,000
              •    The value in the 12th position is 456,250
              m = 456,250 – 298,000 = 158,250
              t = m × d = 158,250 × 0.05 = 7,912.5
              85th percentile = 298,000 + 7,912.5 = 305,912.5
              $305,912.50 represents the 85th percentile of the home sales data
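The percentile steps above can be sketched as a small Python function. The function name `percentile` and the handling of positions that fall outside the data (clamping to the smallest or largest value) are assumptions made for this example; the steps in this section do not cover that case.

```python
def percentile(values, p):
    """Return the pth percentile using k = (n + 1) * p, with p as a proportion (0 < p < 1)."""
    data = sorted(values)            # step 1: ascending order
    n = len(data)
    k = (n + 1) * p                  # step 2: position in the sorted data
    i = int(k)                       # integer component
    d = k - i                        # decimal component

    # Assumption (not covered in the steps above): clamp positions outside 1..n.
    if i < 1:
        return data[0]
    if i >= n:
        return data[-1]

    if d == 0:
        return data[i - 1]           # value in position k (positions are 1-based)

    m = data[i] - data[i - 1]        # difference between positions i and i + 1
    t = m * d
    return data[i - 1] + t           # interpolate from the value in position i


prices = [138000, 254000, 186000, 257500, 108000, 254000,
          138000, 298000, 199500, 208000, 142000, 456250]

p85 = percentile(prices, 0.85)
print(f"85th percentile: {p85:,.2f}")
```

Because k is computed in floating point, the result can differ from 305,912.5 by a tiny rounding error, so it is formatted to two decimals when printed.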
Quartiles
    •     When the data is divided into four equal parts:
              •    Each part contains approximately 25% of the observations
              •    Division points are referred to as quartiles
    •     \(Q_1\) = first quartile, or 25th percentile
    •     \(Q_2\) = second quartile, or 50th percentile (also the median)
    •     \(Q_3\) = third quartile, or 75th percentile
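The three quartiles of the home sales data can be obtained with `statistics.quantiles` from Python's standard library. This is a sketch under one assumption worth checking against your own course material: the function's default `method="exclusive"` positions each quartile at (n + 1) × p and interpolates linearly, which matches the percentile steps described earlier.

```python
import statistics

prices = [138000, 254000, 186000, 257500, 108000, 254000,
          138000, 298000, 199500, 208000, 142000, 456250]

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(prices, n=4, method="exclusive")

print(f"Q1 (25th percentile) = {q1:,.0f}")
print(f"Q2 (50th percentile, median) = {q2:,.0f}")
print(f"Q3 (75th percentile) = {q3:,.0f}")
```

For example, Q1 sits at position (12 + 1) × 0.25 = 3.25, so it lies a quarter of the way between the 3rd value (138,000) and the 4th value (142,000), giving 139,000.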
Z-Scores
    •     Measures the relative location of a value in the data set
    •     Helps to determine how far a particular value is from the mean relative to the data set’s
          standard deviation
    •     Standardized value
    •     If \(x_1, x_2, \ldots, x_n\) is a sample of n observations:
                                     \(z_i = \frac{x_i - \bar{x}}{s}\)
                       •    \(z_i\) = z-score for \(x_i\)
                       •    \(\bar{x}\) = sample mean
                       •    s = sample standard deviation
    •     For class size data, \(\bar{x} = 44\) and s = 8
              •    For observations with a value > mean, z-score > 0
              •    For observations with a value < mean, z-score < 0
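The z-scores for the class size data can be computed directly from the formula. This is a minimal sketch assuming the five class sizes 46, 54, 42, 46, 32 used in this section.

```python
import statistics

class_sizes = [46, 54, 42, 46, 32]

mean = statistics.mean(class_sizes)   # 44
s = statistics.stdev(class_sizes)     # sample standard deviation, 8.0

# z_i = (x_i - mean) / s for every observation
z_scores = [(x - mean) / s for x in class_sizes]

for x, z in zip(class_sizes, z_scores):
    side = "above" if z > 0 else "below"
    print(f"x = {x}: z = {z:+.2f} ({abs(z):.2f} standard deviations {side} the mean)")
```

The class of 54 has the largest positive z-score (1.25), and the class of 32 has the most negative one (-1.50).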
Empirical Rule
              •   For data having a bell-shaped distribution:
                       •   Approximately 68% of the data values are within 1 standard deviation of the mean
                       •   Approximately 95% of the data values are within 2 standard deviations of the mean
                       •   Almost all the data values are within 3 standard deviations of the mean
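These percentages can be checked against an exactly normal distribution, which is the idealized bell shape. The sketch below uses `statistics.NormalDist` from the standard library; treating "bell-shaped" as exactly normal is an assumption, and real data will only match these figures approximately.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of values within k standard deviations of the mean: P(-k < Z < k)
shares = {}
for k in (1, 2, 3):
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"Within {k} standard deviation(s): {shares[k]:.2%}")
```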
Identifying Outliers
              •   Outliers: Extreme values in a data set
              •   They can be identified using standardized values (z-scores)
              •   Any data value with a z-score less than –3 or greater than +3 is an outlier
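The z-score rule can be applied to the home sales data with a short Python sketch. The threshold of 3 comes from the rule above; the variable names are illustrative.

```python
import statistics

prices = [138000, 254000, 186000, 257500, 108000, 254000,
          138000, 298000, 199500, 208000, 142000, 456250]

mean = statistics.mean(prices)
s = statistics.stdev(prices)          # sample standard deviation

# Pair each price with its z-score, then flag |z| > 3 as an outlier.
z_scores = [(x - mean) / s for x in prices]
outliers = [x for x, z in zip(prices, z_scores) if abs(z) > 3]

largest_z = max(z_scores)
print(f"Largest z-score: {largest_z:.2f}")
print(f"Outliers by the z-score rule: {outliers}")
```

Even the largest sale ($456,250) is roughly 2.5 standard deviations above the mean, so this rule flags no outliers in the data set.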
Box Plots
      •   Graphical summary of the distribution of data
      •   Developed from the quartiles for a data set
Figure 2.22: Box Plot for the Home Sales Data
Figure 2.23: Box Plots Comparing Home Sale Prices in Different Communities