Variation
• Variability: The extent to which numbers in a data set are
  dissimilar (different) from each other.
• When all elements measured receive the same
  scores (e.g., everyone in the data set is the same
  age, in years), there is no variability in the data set.
• As the scores in a data set become more
  dissimilar, variability increases.
               Variation: Range
• The range tells us the span over which the data are
  distributed, and is only a very rough measure of
  variability.
• Range: The difference between the maximum and
  minimum scores.
  Example: The youngest student in a class is 19 and the
   oldest is 46. Therefore, the age range of the class is 46
   – 19 = 27 years.
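The range calculation can be sketched in a few lines of Python. The list of ages is hypothetical; only the youngest (19) and oldest (46) values come from the example, and `value_range` is an illustrative helper name.

```python
# Hypothetical class ages; only the minimum (19) and maximum (46)
# are taken from the example above.
ages = [19, 22, 25, 31, 46]

def value_range(scores):
    """Return the difference between the maximum and minimum scores."""
    return max(scores) - min(scores)

age_range = value_range(ages)
print(age_range)  # 27
```

Because only the two extreme scores enter the calculation, the result says nothing about how the remaining scores are spread, which is why the range is only a rough measure.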
 This is an example of data with NO variability:
   $X$    $X - \bar{X}$
   5      0.00
   5      0.00
   5      0.00
   5      0.00
   5      0.00
 $\sum X = 25$, $n = 5$, $\bar{X} = 5$
 This is an example of data with low variability:
   $X$    $X - \bar{X}$
   6      +1.00
   4      -1.00
   6      +1.00
   5       0.00
   4      -1.00
 $\sum X = 25$, $n = 5$, $\bar{X} = 5$
 This is an example of data with higher variability:
   $X$    $X - \bar{X}$
   8      +3.00
   1      -4.00
   9      +4.00
   5       0.00
   2      -3.00
 $\sum X = 25$, $n = 5$, $\bar{X} = 5$
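As a rough illustration, the deviations from the mean for the three example data sets can be computed as follows. The dictionary labels and the `deviations` helper are illustrative names, not part of the original slides.

```python
from statistics import mean

# The three example data sets; each totals 25 and has a mean of 5.
datasets = {
    "no variability": [5, 5, 5, 5, 5],
    "low variability": [6, 4, 6, 5, 4],
    "higher variability": [8, 1, 9, 5, 2],
}

def deviations(scores):
    """Return each score's deviation from the mean of the scores."""
    m = mean(scores)
    return [x - m for x in scores]

for label, scores in datasets.items():
    print(label, deviations(scores))
```

The mean is identical in all three cases; only the size of the deviations changes as the scores become more dissimilar.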
                      Note:
• Let’s say we wanted to figure out the average
  deviation from the mean. Normally, we would want
  to sum all deviations from the mean and then divide
  by n, i.e.,
$$\frac{\sum (X - \bar{X})}{n}$$
• BUT: We have a problem. $\sum (X - \bar{X})$ will always add
  up to zero.
• However, if we square each of the deviations from
  the mean, every term is non-negative, so the sum is
  zero only when the data have no variability.
• This is the basis for the measures of variance and
  standard deviation, the two most common
  measures of variability of data.
      $X$      $X - \bar{X}$      $(X - \bar{X})^2$
      8        +3.00              9.00
      1        -4.00              16.00
      9        +4.00              16.00
      5         0.00              0.00
      2        -3.00              9.00
 $\sum X = 25$      $\sum (X - \bar{X}) = 0.00$      $\sum (X - \bar{X})^2 = 50.00$
 Note: $\sum (X - \bar{X})^2$ is called the Sum of Squares.
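A short sketch that reproduces these totals, using the same five scores. The variable names are illustrative.

```python
scores = [8, 1, 9, 5, 2]
n = len(scores)
mean_score = sum(scores) / n              # 25 / 5 = 5.0

# Deviation of each score from the mean.
devs = [x - mean_score for x in scores]

sum_of_deviations = sum(devs)               # positives and negatives cancel
sum_of_squares = sum(d ** 2 for d in devs)  # squaring removes the signs

print(sum_of_deviations, sum_of_squares)  # 0.0 50.0
```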
        Variance of a Population
• VARIANCE OF A POPULATION: the sum of
  squared deviations from the mean divided by the
  number of scores (sigma squared):
$$\sigma^2 = \frac{\sum (X - \mu)^2}{n}$$
  Population Standard Deviation
Square root of the variance $\sigma^2$:
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{n}}$$
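A minimal sketch of the population formulas, assuming the five example scores (8, 1, 9, 5, 2) are the entire population. `population_variance` is an illustrative helper; Python's standard `statistics` module offers `pvariance` and `pstdev` for the same calculations.

```python
from math import sqrt
from statistics import pstdev, pvariance

scores = [8, 1, 9, 5, 2]  # treated here as a complete population

def population_variance(xs):
    """Sum of squared deviations from the mean, divided by n."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

sigma_squared = population_variance(scores)  # 50 / 5
sigma = sqrt(sigma_squared)

print(sigma_squared, round(sigma, 2))  # 10.0 3.16
```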
               Sample Variance
• The sum of squared deviations from the mean
  divided by the number of degrees of freedom, $n - 1$
  (an estimate of the population variance):
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
    Sample Standard Deviation
• Square root of the variance $s^2$:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
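The sample versions differ only in the denominator. This sketch assumes the same five scores are now a sample drawn from a larger population. `sample_variance` is an illustrative helper; `statistics.variance` and `statistics.stdev` are the standard-library equivalents.

```python
from math import sqrt
from statistics import stdev, variance

scores = [8, 1, 9, 5, 2]  # treated here as a sample

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)

s_squared = sample_variance(scores)  # 50 / 4
s = sqrt(s_squared)

print(s_squared, round(s, 2))  # 12.5 3.54
```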
   Why Use Standard Deviation and Not Variance?
• Normally, you will only calculate variance in
  order to calculate standard deviation, as standard
  deviation is what we typically want.
• Why? Because standard deviation expresses
  variability in the same units as the data, whereas
  variance is expressed in squared units.
• Example: Standard deviation of ages in a class is
  3.7 years.
        Degrees of Freedom
• Degrees of Freedom: The number of
  independent observations, or, the number of
  observations that are free to vary.
• In our data example above, there are 5
  numbers that total 25 ($\sum X = 25$, $n = 5$).
• Many combinations of numbers can total 25, but only the
  first 4 can be any value.
• The 5th number cannot vary if $\sum X = 25$.
• This example has 4 degrees of freedom, as four of the
  five numbers are free to vary.
• A standard deviation computed from a sample with n in the
  denominator usually underestimates the population standard
  deviation.
• Using n-1 in the denominator corrects for this and gives
  us a better estimate of the population standard deviation.
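Both points can be checked on a small scale. The sketch below treats the five example scores as a population and enumerates every ordered sample of size 2 drawn with replacement. This sampling scheme is an assumption made for illustration, and the check is on the variance rather than the standard deviation. The helper name `variance_with` is hypothetical.

```python
from itertools import product
from statistics import mean

# Degrees of freedom: once the total is fixed at 25, four scores can be
# chosen freely but the fifth is forced.
total = 25
free_scores = [8, 1, 9, 5]
last_score = total - sum(free_scores)

population = [8, 1, 9, 5, 2]
mu = mean(population)
pop_var = sum((x - mu) ** 2 for x in population) / len(population)

def variance_with(sample, denominator):
    """Sum of squared deviations from the sample mean over a chosen denominator."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / denominator

# All 25 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

avg_div_n = mean([variance_with(s, len(s)) for s in samples])
avg_div_n_minus_1 = mean([variance_with(s, len(s) - 1) for s in samples])

print(last_score, pop_var, avg_div_n, avg_div_n_minus_1)  # 2 10.0 5.0 10.0
```

Averaged over all possible samples, dividing by n gives 5.0, well below the population variance of 10.0, while dividing by n-1 recovers 10.0.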
            Normal Distribution
• The normal distribution is a theoretical
  distribution.
• “Normal” does not mean typical or average; it is a
  technical term given to this mathematical
  function.
• The normal distribution is unimodal and
  symmetrical, and is often referred to as the Bell
  Curve.
[Figure: Normal Distribution, a bell curve with the mean, median, and mode marked at the same point.]
• We study the normal distribution because many
  naturally occurring events yield a distribution
  that approximates the normal distribution.
Properties of Area Under the Normal
             Distribution
• One of the properties of the Normal Distribution
  is the fixed area under the curve.
• If we split the distribution in half, 50% of the
  scores of the sample lie to the left of the mean (or
  median, or mode), and 50% of the scores lie to
  the right of the mean (or median, or mode).
• The mean, median, and mode always cut the
  Normal Distribution in half, and are equal since
  the Normal Distribution is unimodal and
  symmetrical.
[Figure: normal curve split at the mean, median, and mode, with 50% of scores on each side.]
• The entire area under the normal curve can be
  considered to be a proportion of 1.0000.
• Thus, half, or .5000 of the scores lie in the bottom
  half (i.e., left of the mean) of the distribution, and
  half, or .5000 of the scores lie in the top half (i.e.,
  right of the mean).
[Figure: normal curve split at the mean, median, and mode, with .5000 of scores on each side.]
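These area properties can be checked numerically with `statistics.NormalDist` from Python's standard library. The sketch assumes a standard normal distribution (mean 0, standard deviation 1), a choice made only for illustration.

```python
from statistics import NormalDist

# Standard normal distribution, chosen for illustration.
standard_normal = NormalDist(mu=0, sigma=1)

# cdf(x) is the proportion of the area lying to the left of x.
area_left = standard_normal.cdf(standard_normal.mean)
area_right = 1 - area_left

print(area_left, area_right)  # 0.5 0.5

# For a normal distribution the mean, median, and mode coincide.
print(standard_normal.mean, standard_normal.median, standard_normal.mode)
```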
                     Z-scores
• Z-Scores (or standard scores) are a way of
  expressing a raw score’s place in a distribution.
• Z-score formula:
$$z = \frac{X - \mu}{\sigma}$$
• The mean $\mu$ and standard deviation $\sigma$ are
  always notated in Greek letters.
• Z-scores only reflect each data point's position relative to
  the overall data set, so you are treating the data as a
  population rather than inferring to a greater population.
• This means you use the population formula for standard
  deviation rather than the sample formula whenever you
  calculate z.
• A z-score is a better indicator of where your score
  falls in a distribution than a raw score.
• A student could get a 75/100 on a test (75%) and
  consider this to be a very high score.
• If the average of the class marks is 89 and the
  (population) standard deviation is 5.2, then the z-score
  for a mark of 75 would be:
  $\mu = 89$, $\sigma = 5.2$
$$z = \frac{X - \mu}{\sigma} = \frac{75 - 89}{5.2} = \frac{-14}{5.2} = -2.69$$
• This means that a mark of 75% is actually 2.69
  standard deviations BELOW the mean.
• The student would have done poorly on this test,
  as compared to the rest of the class.
• z = 0 represents the mean score (which would be
  89 in this example).
• z < 0 represents a score less than the mean (which
  would be less than 89).
• z > 0 represents a score greater than the mean
  (which would be greater than 89).
• A z-score expresses the position of the raw score
  above or below the mean in standard deviation
  sized units.
• E.g.,
  z = +1.50 means that the raw score is 1 and one-half
   standard deviations above the mean.
  z = -2.00 means that the raw score is 2 standard
   deviations below the mean.
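The z-score calculation can be sketched as a small function. `z_score` is an illustrative name. The first call reuses the class example (mark 75, mean 89, population standard deviation 5.2); the second part applies the population formula to the five scores from the variability examples.

```python
from statistics import mean, pstdev

def z_score(x, mu, sigma):
    """Return how many standard deviations x lies above (+) or below (-) mu."""
    return (x - mu) / sigma

# Worked example: a mark of 75 in a class with mean 89 and SD 5.2.
z_mark = z_score(75, 89, 5.2)
print(round(z_mark, 2))  # -2.69

# For a whole data set, use the population standard deviation
# (pstdev divides by n, not n - 1).
scores = [8, 1, 9, 5, 2]
mu, sigma = mean(scores), pstdev(scores)
z_scores = [z_score(x, mu, sigma) for x in scores]
print([round(z, 2) for z in z_scores])
```

A score equal to the mean always yields z = 0, and the z-scores of a complete data set sum to zero.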
             Z-score Example
• If you write two exams, in Math and English, and
  get the following scores:
   Math 70% (class $\mu = 55$, $\sigma = 10$)
   English 60% (class $\mu = 50$, $\sigma = 5$)
• Which test mark represents the better performance
  (relative to the class)?
• Math mark:
  z = (70-55)/10
  z = +1.50
• English mark:
  z = (60-50)/5
  z = +2.00
  (using $z = \frac{X - \mu}{\sigma}$)
Z-score Example Illustration
[Figure: normal curve marking the mean (z = 0.00), the Math mark (z = 1.50), and the English mark (z = 2.00).]
                 The Answer
• Because z = +2.00 is greater than z = +1.50, the
  English class mark of 60% reflects a better
  performance relative to that class than does the
  Math class mark of 70%.
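The comparison can be reproduced with a short sketch. The dictionary layout and the `z_score` helper are illustrative choices; the marks, means, and standard deviations are those given in the example.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from mu."""
    return (x - mu) / sigma

# Marks and class statistics from the example.
marks = {
    "Math": {"mark": 70, "mu": 55, "sigma": 10},
    "English": {"mark": 60, "mu": 50, "sigma": 5},
}

z_by_course = {
    course: z_score(v["mark"], v["mu"], v["sigma"])
    for course, v in marks.items()
}

# The better relative performance is the course with the larger z-score.
best = max(z_by_course, key=z_by_course.get)
print(z_by_course, best)  # {'Math': 1.5, 'English': 2.0} English
```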
       Z-score: Solving for X
• The z-score formula can be rearranged to solve
  for X:
$$z = \frac{X - \mu}{\sigma} \quad\Rightarrow\quad X = (z)(\sigma) + \mu$$
• This formula is used when you know the z-score
  of a data point, and want to solve for the raw
  score.
                      Example
 • E.g., if a class midterm exam has $\mu = 65$ and $\sigma = 5$,
   what exam mark has a z-score value of 1.25?
$$X = (z)(\sigma) + \mu = (1.25)(5) + 65 = 6.25 + 65 = 71.25$$
 So, a person whose test is 1.25 standard deviations above the
   mean obtained a score of 71.25%.
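Solving for the raw score is the inverse of the z-score calculation, as this sketch shows. `raw_score` and `z_score` are illustrative helper names; the numbers are those of the midterm example.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from mu."""
    return (x - mu) / sigma

def raw_score(z, mu, sigma):
    """Return the raw score X that corresponds to a given z-score."""
    return z * sigma + mu

# Midterm example: mean 65, standard deviation 5, z = 1.25.
x = raw_score(1.25, 65, 5)
print(x)  # 71.25

# Converting back recovers the original z-score.
print(z_score(x, 65, 5))  # 1.25
```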
              Skewed Distributions
• Outliers skew distributions.
• If a group has a high outlier, the curve has a positive
  skew: the tail extends toward the high scores, while
  most scores are low.
• If a group has a low outlier, the curve has a negative
  skew: the tail extends toward the low scores, while
  most scores are high.
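The direction of skew can be illustrated numerically. The two data sets below are hypothetical, and the `skewness` helper uses a moment-based coefficient (the mean of the cubed z-scores), which is an assumption of this sketch rather than a measure defined in these notes.

```python
from statistics import mean, median, pstdev

def skewness(scores):
    """Moment-based skewness: the mean of the cubed z-scores."""
    mu, sigma = mean(scores), pstdev(scores)
    return mean([((x - mu) / sigma) ** 3 for x in scores])

# Hypothetical data sets, each with a single outlier.
high_outlier = [4, 5, 5, 6, 5, 20]
low_outlier = [15, 16, 16, 17, 16, 1]

for label, data in [("high outlier", high_outlier), ("low outlier", low_outlier)]:
    print(label, round(skewness(data), 2), mean(data), median(data))
```

A high outlier gives a positive coefficient and pulls the mean above the median; a low outlier gives a negative coefficient and pulls the mean below the median.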