Data and Overview Lecture 6

The document provides an overview of measures of central tendency, including definitions and properties of the arithmetic mean, median, and mode. It outlines the requirements for an ideal average and describes how to calculate these measures for both discrete and continuous data. Additionally, it briefly mentions geometric and harmonic means, although they are not required for the semester.

Data: An Overview

Lecture 6
Measures of Central Tendency
If we look at the frequency distributions that we come across in practice, we find that there is usually a tendency of the variate values to cluster around a central value; in other words, most of the values lie in a small interval about a central value. This characteristic is called the central tendency of a frequency distribution.

The central value, which is taken as representative of the entire data, is called a measure of central tendency or, simply, an average. In relation to a frequency distribution, an average is also termed a measure of location, because it helps to locate the position of the distribution on the axis of the variable. Note that an average is not necessarily one of the given values.

Requirements of an ideal average

• Rigidly defined.
• Based on all the observations.
• Capable of simple interpretation.
• Easy to compute.
• Readily amenable to algebraic treatment.
• More or less stable from sample to sample.
Common Measures of Central Tendency

An average may be of different forms and among them, the commonly used types are:

i. Arithmetic Mean (or, simply, mean)
ii. Median
iii. Mode
iv. Geometric Mean (not reqd. for semester)
v. Harmonic Mean (not reqd. for semester)
Arithmetic Mean:
The arithmetic mean is a moment-based measure and is defined as the first-order raw moment of the dataset. It is obtained by dividing the sum of the values by the number of values. If $x$ denotes the variable under consideration and $x_1, x_2, \ldots, x_n$ are its values, then the arithmetic mean of $x$, denoted by $\bar{x}$, is defined as:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

In an office, the monthly wages (in Rs.) of 8 employees for the month of May 1995 were found to be 4500, 4800, 4650, 4700, 4774, 4600, 4850 and 4530. Find the mean wage of the employees.
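A minimal Python sketch of the definition above, applied to the wage data of this exercise. The function name `arithmetic_mean` is our own choice for illustration, not something from the lecture.

```python
def arithmetic_mean(values):
    """Return (x_1 + x_2 + ... + x_n) / n for a non-empty list of numbers."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Monthly wages (Rs.) of the 8 employees for May 1995
wages = [4500, 4800, 4650, 4700, 4774, 4600, 4850, 4530]
mean_wage = arithmetic_mean(wages)
print(mean_wage)  # 4675.5
```

The result, Rs. 4675.5, is not one of the eight recorded wages, which illustrates the earlier remark that an average need not be one of the given values.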

If the frequency table corresponding to a discrete variable is given, then the mean can be obtained in the following way:

$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_r f_r}{f_1 + f_2 + \cdots + f_r} = \frac{1}{n}\sum_{i=1}^{r} x_i f_i$$

where $x_1, x_2, \ldots, x_r$ denote the distinct values of the variable $x$, $f_1, f_2, \ldots, f_r$ denote the respective frequencies and $\sum_{i=1}^{r} f_i = n$.
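As a sketch of the frequency-table formula, the snippet below groups the nine Economics scores from the median exercise in this lecture by distinct value (37 occurs twice) and checks that the weighted formula agrees with the plain mean of the raw scores. The helper name `frequency_mean` is hypothetical.

```python
def frequency_mean(values, freqs):
    """Mean of a discrete frequency table: sum(x_i * f_i) / sum(f_i)."""
    if len(values) != len(freqs):
        raise ValueError("each distinct value needs exactly one frequency")
    n = sum(freqs)  # total number of observations
    if n == 0:
        raise ValueError("total frequency must be positive")
    return sum(x * f for x, f in zip(values, freqs)) / n

# Distinct scores and their frequencies
scores = [31, 37, 38, 40, 41, 42, 44, 45]
counts = [1, 2, 1, 1, 1, 1, 1, 1]
# The same data as a raw list
raw = [40, 37, 41, 38, 31, 37, 44, 45, 42]

print(frequency_mean(scores, counts))  # 355/9, about 39.44
print(sum(raw) / len(raw))            # same value from the raw data
```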
For a continuous variable, the data are summarised in a frequency table showing the various class intervals and their corresponding class frequencies. In this case, the class mark of each class interval is taken to represent the interval, and on this assumption an approximate value of the mean may be obtained:

$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_r f_r}{f_1 + f_2 + \cdots + f_r} = \frac{1}{n}\sum_{i=1}^{r} x_i f_i$$

where $x_1, x_2, \ldots, x_r$ represent the class marks of the $r$ class intervals, $f_1, f_2, \ldots, f_r$ denote the corresponding class frequencies and $\sum_{i=1}^{r} f_i = n$.
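A minimal sketch of the grouped-data formula, reusing the class intervals and frequencies from the mode exercise later in this lecture. The class mark of each interval stands in for every observation in that class, so the result is only an approximation of the true mean. The helper name `grouped_mean` is our own.

```python
def grouped_mean(classes, freqs):
    """Approximate mean of grouped data, taking each class mark
    (midpoint of the class interval) as representative of its class."""
    marks = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)
    return sum(m * f for m, f in zip(marks, freqs)) / n

classes = [(25, 30), (30, 35), (35, 40), (40, 45), (45, 50),
           (50, 55), (55, 60), (60, 65), (65, 70), (70, 75)]
freqs = [0, 4, 4, 2, 8, 3, 7, 3, 4, 0]
print(grouped_mean(classes, freqs))  # 1762.5 / 35, about 50.36
```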

Properties of Arithmetic Mean:
1. If the observed values of a variable are all equal, then their mean is the common value.
2. The sum of the deviations of the values of a variable from its mean is 0, i.e., $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$.
3. Suppose $x$ is a linear function of $y$ in the form $x = a + by$; then the arithmetic means of $x$ and $y$ are related as $\bar{x} = a + b\bar{y}$.
4. If there are two groups of values of a variable $x$, one containing $n_1$ values with mean $\bar{x}_1$ and the other containing $n_2$ values with mean $\bar{x}_2$, then the mean of the combined data is given by
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$$
5. The arithmetic mean can be computed for interval and ratio scales of measurement.
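Properties 2 to 4 can be checked numerically. The sketch below uses exact `Fraction` arithmetic so that "the deviations sum to 0" holds exactly rather than up to rounding. The constants `a = 4000`, `b = 10` and the split of the wage data into two groups of four are hypothetical choices made only for illustration.

```python
from fractions import Fraction

def mean(values):
    """Exact arithmetic mean, using Fraction to avoid rounding error."""
    return sum(values, Fraction(0)) / len(values)

wages = [4500, 4800, 4650, 4700, 4774, 4600, 4850, 4530]
x_bar = mean(wages)

# Property 2: deviations from the mean sum to zero.
assert sum(x - x_bar for x in wages) == 0

# Property 3: if x = a + b*y, then mean(x) = a + b*mean(y).
a, b = 4000, 10                       # hypothetical: x = 4000 + 10*y
y = [Fraction(x - a, b) for x in wages]
assert mean(wages) == a + b * mean(y)

# Property 4: combined mean from two group means.
g1, g2 = wages[:4], wages[4:]         # arbitrary split into two groups
n1, n2 = len(g1), len(g2)
combined = (n1 * mean(g1) + n2 * mean(g2)) / (n1 + n2)
assert combined == x_bar
print(x_bar, mean(g1), mean(g2), combined)
```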
Median:
The median of a variable is defined as the middlemost value when its values are arranged in ascending or descending order of magnitude. This is a quantile-based measure; to be precise, the median is the quantile of order 0.5 of the dataset. The median divides the whole dataset into two parts such that at least half of the observations are less than or equal to it and at least half are greater than or equal to it.

If the total number of given values, $n$, is odd, then there is exactly one middlemost value, namely the $\left(\frac{n+1}{2}\right)$th value in the ordered arrangement, and it is the median of the values.

If $n$ is even, the median may not be uniquely determined. In fact, any value between the two middle values, namely the $\left(\frac{n}{2}\right)$th and the $\left(\frac{n}{2}+1\right)$th values in the ordered arrangement, may be taken as the median. But, in order to obtain a definite value, the arithmetic mean of the $\left(\frac{n}{2}\right)$th and the $\left(\frac{n}{2}+1\right)$th values is, by convention, regarded as the median of the set of values.

The scores of 9 students in Economics in a class test were found to be 40, 37, 41, 38, 31, 37, 44, 45, 42. Find the median. What would happen to the median if the last student had been absent from the class test?
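A small sketch of the odd/even rule above, applied to this exercise. The function `median` is written out by hand to mirror the text; Python's standard `statistics.median` follows the same convention of averaging the two middle values when n is even.

```python
def median(values):
    """Median by the convention in the text: the middle value for odd n,
    the mean of the (n/2)th and (n/2 + 1)th ordered values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the ((n+1)/2)th value, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2

scores = [40, 37, 41, 38, 31, 37, 44, 45, 42]
print(median(scores))       # 40: all nine students
print(median(scores[:-1]))  # 39.0: last student (score 42) absent
```

Dropping one score makes n even, so the median becomes the average of 38 and 40 and is no longer one of the observed scores.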

Properties of Median:
1. Suppose $x$ is a linear function of $y$ in the form $x = a + by$; then the medians of $x$ and $y$ are related as $Me(x) = a + b\,Me(y)$.
2. The median can be calculated for interval, ratio and ordinal scales of measurement.
Mode:
The mode of a variable is defined as the value of the variable which has the highest frequency or frequency density, according as the variable is discrete or continuous. In the latter case, if a unimodal distribution can be fitted with a smooth frequency curve, then the mode is the abscissa of the highest point on the curve.

A distribution can have more than one mode, but if all the values in a distribution have the same frequency or frequency density (as the case may be), then the mode is not defined.

For a frequency distribution of a discrete variable, the mode can be found immediately by inspection.

[Figure: Column diagram showing the frequency distribution of family size (horizontal axis: Family Size, 0 to 8; vertical axis: Frequency, 0 to 35)]
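Finding the mode "by inspection" amounts to counting frequencies and picking the largest. The sketch below, using `collections.Counter`, returns every value tied for the highest frequency (so multimodal data is handled) and returns an empty list when several distinct values all share the same frequency, matching the rule that the mode is then undefined. The function name `modes` is hypothetical; the data are the Economics scores and the wage figures from this lecture.

```python
from collections import Counter

def modes(values):
    """All values that occur with the highest frequency.

    Returns an empty list when the mode is undefined in the sense of the
    text: several distinct values that all share the same frequency.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # every value equally frequent: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([40, 37, 41, 38, 31, 37, 44, 45, 42]))              # [37]
print(modes([4500, 4800, 4650, 4700, 4774, 4600, 4850, 4530]))  # []
```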
The difficulty arises with a continuous variable. We can easily pick out the modal class, which is the class having the highest frequency density, but the position of the mode within the class is difficult to find. Sometimes the class mark is taken as the mode, but this is a poor approximation unless the histogram is more or less symmetrical. A somewhat better method is to use three consecutive classes with the modal class in the middle, provided these classes are of equal width.

Let $x_l$ and $x_u$ be respectively the lower and upper boundaries of the modal class, and let $f_{m-1}$, $f_m$ and $f_{m+1}$ be the frequencies of the three classes (for equal class widths, these are equivalent to frequency densities). Since, in practice, the class frequencies usually start from a low value, rise to a maximum and then gradually come down, it can be expected that

$$M_o - x_l \;<,\ =,\ >\; x_u - M_o \quad \text{according as} \quad f_m - f_{m-1} \;<,\ =,\ >\; f_m - f_{m+1},$$

where $M_o$ is the mode. It is assumed that

$$\frac{M_o - x_l}{x_u - M_o} = \frac{f_m - f_{m-1}}{f_m - f_{m+1}}.$$

From this, simple algebra (using $x_u = x_l + c$) leads to the formula for the mode:

$$M_o = x_l + c\,\frac{f_m - f_{m-1}}{2f_m - f_{m-1} - f_{m+1}}$$

Here, $c$ is the common width of the three classes.

[Figure: the modal class and its two neighbouring classes with frequencies $f_{m-1}$, $f_m$, $f_{m+1}$; the points $x_l$, $M_o$ and $x_u$ are marked on the horizontal axis]
Find the value of the mode for the following table.

Classes   Class mark   Frequency
25-30     27.5         0
30-35     32.5         4
35-40     37.5         4
40-45     42.5         2
45-50     47.5         8
50-55     52.5         3
55-60     57.5         7
60-65     62.5         3
65-70     67.5         4
70-75     72.5         0
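A sketch of the three-class formula for this table. It picks the class with the largest frequency (valid here because all classes have the same width, so frequency and frequency density rank the classes identically) and then interpolates within it. The helper name `grouped_mode` is our own; the assumption that a missing neighbour outside the table has frequency 0 is also ours, and ties for the modal class are not handled (the first maximum is used).

```python
def grouped_mode(lower_bounds, freqs, width):
    """Mode of grouped data with equal class widths:
    Mo = x_l + c * (f_m - f_{m-1}) / (2 f_m - f_{m-1} - f_{m+1})."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])  # modal class index
    f_m = freqs[m]
    f_prev = freqs[m - 1] if m > 0 else 0               # assumed 0 outside the table
    f_next = freqs[m + 1] if m + 1 < len(freqs) else 0
    return lower_bounds[m] + width * (f_m - f_prev) / (2 * f_m - f_prev - f_next)

lowers = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
freqs = [0, 4, 4, 2, 8, 3, 7, 3, 4, 0]
print(round(grouped_mode(lowers, freqs, 5), 2))  # 47.73
```

Here the modal class is 45-50 with $f_{m-1} = 2$ and $f_{m+1} = 3$, so $M_o = 45 + 5 \times 6/11 \approx 47.73$, slightly to the right of the class mark 47.5 because the right neighbour is the larger one.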

Properties of Mode:
1. Suppose $x$ is a linear function of $y$ in the form $x = a + by$; then the modes of $x$ and $y$ are related as $Mo(x) = a + b\,Mo(y)$.
2. If it exists, the mode can be calculated for interval, ratio, ordinal and nominal scales of measurement.
Geometric mean: (not required for semester)
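The geometric mean of a set of $n$ positive values of a variable is the $n$th root of their product. If a variable $x$ assumes $n$ values $x_1, x_2, \ldots, x_n$, then its geometric mean, denoted by $x_G$, is

$$x_G = (x_1 x_2 \cdots x_n)^{1/n} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$

For a frequency distribution with distinct values $x_1, x_2, \ldots, x_r$ and frequencies $f_1, f_2, \ldots, f_r$,

$$x_G = \left(\prod_{i=1}^{r} x_i^{f_i}\right)^{1/n}, \qquad n = \sum_{i=1}^{r} f_i$$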

Properties of Geometric Mean:
1. The observations must be positive for the geometric mean to be defined.
2. If the observed values of a variable are all equal, then their geometric mean is the common value.
3. The logarithm of the geometric mean of a set of values is the arithmetic mean of their logarithms.
4. If $y$ is a function of a variable $x$ in the form $y = ax$ (with $a > 0$), then the geometric mean of $y$ is related to that of $x$ in the same form, i.e., $y_G = a\,x_G$.
5. The geometric mean of the ratio of two variables is the ratio of their geometric means.
6. If there are two sets of values of a variable $x$, consisting of $n_1$ and $n_2$ values, and if $G_1$ and $G_2$ are their respective geometric means, then the geometric mean $G$ of the combined set is given by
$$G = \left(G_1^{n_1} G_2^{n_2}\right)^{1/(n_1 + n_2)}$$
7. The geometric mean is defined for the ratio scale of measurement.
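Properties 3 and 6 can be illustrated together. The sketch computes each group's geometric mean through the log identity of Property 3, combines them with Property 6, and compares the result with `statistics.geometric_mean` (available from Python 3.8) applied to the pooled data. The two groups `[2, 8]` and `[4, 16]` are hypothetical values chosen so the group means come out as 4 and 8.

```python
import math
from statistics import geometric_mean

def gm_from_logs(values):
    """Geometric mean via Property 3: log(GM) is the mean of the logs."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

group1, group2 = [2, 8], [4, 16]            # hypothetical data
g1, g2 = gm_from_logs(group1), gm_from_logs(group2)  # about 4 and 8
n1, n2 = len(group1), len(group2)

# Property 6: G = (G1^n1 * G2^n2)^(1/(n1+n2))
combined = (g1 ** n1 * g2 ** n2) ** (1 / (n1 + n2))
direct = geometric_mean(group1 + group2)
print(combined, direct)  # both about 5.657, i.e. 4 * sqrt(2)
```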
Harmonic Mean: (not required for semester)
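The harmonic mean of a set of $n$ non-zero values of a variable is the reciprocal of the arithmetic mean of the reciprocals of the values. If a variable $x$ assumes $n$ values $x_1, x_2, \ldots, x_n$, then its harmonic mean, denoted by $x_H$, is

$$x_H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

For a frequency distribution with distinct values $x_1, x_2, \ldots, x_r$ and frequencies $f_1, f_2, \ldots, f_r$,

$$x_H = \frac{n}{\sum_{i=1}^{r} \frac{f_i}{x_i}}, \qquad n = \sum_{i=1}^{r} f_i$$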
Properties of Harmonic Mean:
1. The observations must be non-zero for the harmonic mean to be defined.
2. If the observed values of a variable are all equal, then their harmonic mean is the common value.
3. The reciprocal of the harmonic mean of a set of values is the arithmetic mean of their reciprocals.
4. If $y$ is a function of a variable $x$ in the form $y = ax$, then the harmonic mean of $y$ is related to that of $x$ in the same form, i.e., $y_H = a\,x_H$.
5. If there are two sets of values of a variable $x$, consisting of $n_1$ and $n_2$ values, and if $H_1$ and $H_2$ are their respective harmonic means, then the harmonic mean $H$ of the combined set is given by
$$H = \frac{n_1 + n_2}{\frac{n_1}{H_1} + \frac{n_2}{H_2}}$$
6. The harmonic mean is defined for the ratio scale of measurement.
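A minimal sketch of the harmonic-mean formulas and of the combining rule in Property 5, using hypothetical groups `[40, 60]` and `[30]`. The hand-written `hm` and `hm_freq` helpers are our own; the comparison uses `statistics.harmonic_mean`, which (unlike the definition above) additionally rejects negative values.

```python
from statistics import harmonic_mean

def hm(values):
    """n divided by the sum of reciprocals; values must be non-zero."""
    if any(v == 0 for v in values):
        raise ValueError("harmonic mean needs non-zero values")
    return len(values) / sum(1 / v for v in values)

def hm_freq(values, freqs):
    """Harmonic mean of a frequency table: n / sum(f_i / x_i)."""
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

group1, group2 = [40, 60], [30]       # hypothetical data
h1, h2 = hm(group1), hm(group2)       # about 48 and 30
n1, n2 = len(group1), len(group2)

# Property 5: H = (n1 + n2) / (n1/H1 + n2/H2)
combined = (n1 + n2) / (n1 / h1 + n2 / h2)
print(combined, hm(group1 + group2), harmonic_mean(group1 + group2))  # all about 40
```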
Measures of Dispersion
The values of a variable are generally not all equal. In some cases, the values are very close to one another; in other cases, they are markedly different from one another. In order to get a proper idea about the overall nature of a given set of values, it is necessary to know, besides an average, the extent to which the given values differ among themselves or, equivalently, how they are scattered about the average. This feature of a frequency distribution, which reflects the variability or scatter of the given values, is called dispersion.

A device that is used to measure this characteristic, namely scatter, is referred to as a measure of dispersion. The common measures are:
a. Range
b. Mean deviation
c. Standard deviation
d. Quartile deviation (not required for the semester)
Range:
The range of a variable is the simplest measure of its dispersion, and it is defined as the difference between the
greatest and the least of its given set of values. It should be noted that if the data are in a grouped frequency
distribution, the range can be considered as the difference between the largest upper boundary and the smallest
lower boundary.

1. A variable takes the values 3, 5, -1, 8, 4. Find the range.

2. Compare the following data sets.

Data set 1:
56 59 60 49 44 41 56 52 57 49 50 63 52 56 57
47 54 52 47 53 45 68 54 51 66 52 62 61 48 54
57 54 47 59 54 54 56 60 61 57 52 64 49 54 57
64 46 68 57 67 70 53 66 49 62 49 47 65 68 63

Data set 2:
53 60 55 58 53 58 60 55 53 60 51 48 50 55 53
50 60 57 53 56 59 56 55 54 57 57 58 50 58 57
62 56 54 50 57 55 55 50 54 46 55 59 57 52 49
53 54 55 55 56 59 61 57 58 65 55 57 53 53 55
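A sketch of the range computation for both exercises. For exercise 2 we assume, as in the standard-deviation exercise later in this lecture, that the first four rows of the table form the first data set and the last four rows the second. The last lines also check the linear-transformation property with hypothetical constants $a = 2$, $b = -3$, showing why the absolute value of $b$ is needed.

```python
def data_range(values):
    """Range: the greatest value minus the least value."""
    if not values:
        raise ValueError("range of an empty data set is undefined")
    return max(values) - min(values)

# Exercise 1
print(data_range([3, 5, -1, 8, 4]))  # 9

# Exercise 2 (assumed split: first four rows / last four rows)
set1 = [
    56, 59, 60, 49, 44, 41, 56, 52, 57, 49, 50, 63, 52, 56, 57,
    47, 54, 52, 47, 53, 45, 68, 54, 51, 66, 52, 62, 61, 48, 54,
    57, 54, 47, 59, 54, 54, 56, 60, 61, 57, 52, 64, 49, 54, 57,
    64, 46, 68, 57, 67, 70, 53, 66, 49, 62, 49, 47, 65, 68, 63,
]
set2 = [
    53, 60, 55, 58, 53, 58, 60, 55, 53, 60, 51, 48, 50, 55, 53,
    50, 60, 57, 53, 56, 59, 56, 55, 54, 57, 57, 58, 50, 58, 57,
    62, 56, 54, 50, 57, 55, 55, 50, 54, 46, 55, 59, 57, 52, 49,
    53, 54, 55, 55, 56, 59, 61, 57, 58, 65, 55, 57, 53, 53, 55,
]
print(data_range(set1), data_range(set2))  # 29 19

# Linear transformation x = a + b*y with hypothetical a = 2, b = -3
y = [3, 5, -1, 8, 4]
x = [2 - 3 * v for v in y]
print(data_range(x), abs(-3) * data_range(y))  # 27 27
```

By this measure the first data set (range 29) is more dispersed than the second (range 19), although the range depends only on the two extreme values of each set.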

Property of Range
Suppose $x$ is a linear function of $y$ in the form $x = a + by$; then the ranges of $x$ and $y$ are related as $\text{Range}(x) = |b|\,\text{Range}(y)$.
Mean Deviation:
The mean deviation is the mean of the absolute differences of the given values of the variable from some average. Suppose $x_1, x_2, \ldots, x_n$ are the given values of a variable $x$ and $c$ is the chosen average. First we consider the differences (also known as the deviations) $x_i - c$ $(i = 1, 2, \ldots, n)$ of the values from $c$. The greater the differences, the greater the dispersion. To get a suitable measure, it is necessary to combine the differences. The simple arithmetic mean of the differences will not serve the purpose: differences of opposite signs cancel each other, so the sum of the deviations may be small even when they are individually large in magnitude. So we take only the magnitudes of the differences and then calculate the arithmetic mean of those absolute differences. This is termed the mean deviation of $x$ about $c$ and is denoted by $MD_c$.
Thus,
$$MD_c = \frac{1}{n}\sum_{i=1}^{n} |x_i - c|$$
In particular, when $c = \bar{x}$, the mean deviation about the mean is
$$MD_{\bar{x}} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$
Again, if $x_1, x_2, \ldots, x_r$ are the distinct values of a variable $x$ and $f_1, f_2, \ldots, f_r$ are the corresponding frequencies, then
$$MD_c = \frac{1}{n}\sum_{i=1}^{r} |x_i - c|\,f_i, \qquad n = \sum_{i=1}^{r} f_i$$
A similar definition can be given for a grouped frequency distribution of continuous data, taking $x_i$ to be the class mark of the $i$th class and substituting it into the above formula.
Property of Mean deviation
1. Suppose $x$ is a linear function of $y$ in the form $x = a + by$; then the mean deviations of $x$ and $y$ are related as
$$MD_{A(x)}(x) = |b|\,MD_{A(y)}(y)$$
where $A(x)$ and $A(y)$ are corresponding values of $x$ and $y$ satisfying the given relation, i.e., $A(x) = a + b\,A(y)$.
2. The mean deviation is least when measured about the median.

1. A variable takes the values 3, 5, 0, 8, 4. Find the mean deviations about 3, the mean, and the median.
2. Compare the following data sets.

Data set 1:
56 59 60 49 44 41 56 52 57 49 50 63 52 56 57
47 54 52 47 53 45 68 54 51 66 52 62 61 48 54
57 54 47 59 54 54 56 60 61 57 52 64 49 54 57
64 46 68 57 67 70 53 66 49 62 49 47 65 68 63

Data set 2:
53 60 55 58 53 58 60 55 53 60 51 48 50 55 53
50 60 57 53 56 59 56 55 54 57 57 58 50 58 57
62 56 54 50 57 55 55 50 54 46 55 59 57 52 49
53 54 55 55 56 59 61 57 58 65 55 57 53 53 55
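A sketch of exercise 1, computing the mean deviation about 3, about the mean and about the median with a hypothetical helper `mean_deviation`; `mean` and `median` come from Python's standard `statistics` module. The same function can be applied unchanged to the two data sets of exercise 2.

```python
from statistics import mean, median

def mean_deviation(values, c):
    """MD_c: the arithmetic mean of the absolute deviations |x_i - c|."""
    return sum(abs(x - c) for x in values) / len(values)

values = [3, 5, 0, 8, 4]
md_about_3 = mean_deviation(values, 3)                    # 11/5 = 2.2
md_about_mean = mean_deviation(values, mean(values))      # mean 4, MD 2.0
md_about_median = mean_deviation(values, median(values))  # median 4, MD 2.0
print(md_about_3, md_about_mean, md_about_median)
```

Here the mean and the median coincide (both 4), and the mean deviation about them, 2.0, is smaller than the 2.2 obtained about 3, in line with Property 2.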
Standard Deviation:
The root mean square deviation is defined as the positive square root of the arithmetic mean of the squares of the differences of the variable values from a chosen average. Thus, the root mean square deviation of $n$ values $x_1, x_2, \ldots, x_n$ of a variable $x$ about $c$ is
$$RMSD_c = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - c)^2}$$
Although the mean square deviation already indicates the scatter of the variable, the square root is taken so that the measure is expressed in the same unit as the variable.
If we put $c = \bar{x}$ in the expression above, then the measure is referred to as the standard deviation and is denoted by $s$. So the standard deviation may be defined as the positive square root of the mean of the squares of the differences of the variable values from their mean. Thus,
$$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Again, if $x_1, x_2, \ldots, x_r$ are the distinct values of a variable $x$ and $f_1, f_2, \ldots, f_r$ are the corresponding frequencies, then
$$s = \sqrt{\frac{1}{n}\sum_{i=1}^{r} (x_i - \bar{x})^2 f_i}, \qquad n = \sum_{i=1}^{r} f_i$$

A similar definition can be given for a grouped frequency distribution of continuous data, taking $x_i$ to be the class mark of the $i$th class and substituting it into the above formula.
Property of Standard Deviation
1. Suppose $x$ is a linear function of $y$ in the form $x = a + by$; then the standard deviations of $x$ and $y$ are related as $s(x) = |b|\,s(y)$.
2. The root mean square deviation is least when measured about the mean, i.e., the standard deviation is the smallest root mean square deviation.
3. If all the values of a variable are equal, its standard deviation is 0. The converse is also true.
4. Suppose two groups of values of a variable are given. If $\bar{x}_1$ and $s_1$ respectively denote the mean and the standard deviation of the $n_1$ values of the first group, and $\bar{x}_2$ and $s_2$ denote the same for the second group with $n_2$ observations, then the standard deviation $s$ of the combined data is given by
$$s^2 = \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2}{n_1 + n_2} + \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}$$
where $\bar{x} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$.
1. A variable takes the values 3, 5, -1, 8, 4. Find the standard deviation.

2. Find the standard deviation for each of these two groups. Hence find the combined standard deviation.

56 59 60 49 44 41 56 52 57 49 50 63 52 56 57
47 54 52 47 53 45 68 54 51 66 52 62 61 48 54
57 54 47 59 54 54 56 60 61 57 52 64 49 54 57
64 46 68 57 67 70 53 66 49 62 49 47 65 68 63

53 60 55 58 53 58 60 55 53 60 51 48 50 55 53
50 60 57 53 56 59 56 55 54 57 57 58 50 58 57
62 56 54 50 57 55 55 50 54 46 55 59 57 52 49
53 54 55 55 56 59 61 57 58 65 55 57 53 53 55
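A sketch of both exercises, using divisor $n$ as in the definition of $s$ above (the population form, which is also what Python's `statistics.pstdev` computes). The combined standard deviation is obtained from the group summaries via Property 4 and then compared with a direct computation on the pooled 120 values. The helper name `rms_deviation` is our own, and the two groups are taken as the rows above and below the blank line in the table.

```python
import math
from statistics import mean, pstdev

def rms_deviation(values, c=None):
    """Root mean square deviation about c (divisor n, as in the text).
    With c=None the deviation is taken about the mean, giving s."""
    if c is None:
        c = mean(values)
    return math.sqrt(sum((x - c) ** 2 for x in values) / len(values))

# Exercise 1
print(rms_deviation([3, 5, -1, 8, 4]))  # sqrt(8.56), about 2.93

# Exercise 2
group1 = [
    56, 59, 60, 49, 44, 41, 56, 52, 57, 49, 50, 63, 52, 56, 57,
    47, 54, 52, 47, 53, 45, 68, 54, 51, 66, 52, 62, 61, 48, 54,
    57, 54, 47, 59, 54, 54, 56, 60, 61, 57, 52, 64, 49, 54, 57,
    64, 46, 68, 57, 67, 70, 53, 66, 49, 62, 49, 47, 65, 68, 63,
]
group2 = [
    53, 60, 55, 58, 53, 58, 60, 55, 53, 60, 51, 48, 50, 55, 53,
    50, 60, 57, 53, 56, 59, 56, 55, 54, 57, 57, 58, 50, 58, 57,
    62, 56, 54, 50, 57, 55, 55, 50, 54, 46, 55, 59, 57, 52, 49,
    53, 54, 55, 55, 56, 59, 61, 57, 58, 65, 55, 57, 53, 53, 55,
]
n1, n2 = len(group1), len(group2)
m1, m2 = mean(group1), mean(group2)
s1, s2 = rms_deviation(group1), rms_deviation(group2)
m = (n1 * m1 + n2 * m2) / (n1 + n2)  # combined mean

# Property 4: combined SD from the group summaries
s_combined = math.sqrt(
    (n1 * (m1 - m) ** 2 + n2 * (m2 - m) ** 2) / (n1 + n2)
    + (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2)
)
print(s1, s2, s_combined)
print(pstdev(group1 + group2))  # direct computation on pooled data, for comparison
```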
Quartile Deviation:
(not required for the semester)
The quartile deviation is a measure of dispersion based on the quartiles. If the values of a variable differ much from one another, the differences between the quartiles will be large; on the other hand, when the values are close to one another, these differences will be small. As such, one can take the average of the difference between $Q_2$ and $Q_1$ and that between $Q_3$ and $Q_2$ as a measure of dispersion, called the quartile deviation (denoted by $Q$). Thus,
$$Q = \frac{(Q_2 - Q_1) + (Q_3 - Q_2)}{2} = \frac{Q_3 - Q_1}{2}$$
This is also called the semi-interquartile range. If the data are given in a frequency table with one or both of the terminal classes open, or with class intervals of unequal width, then the quartile deviation is an appropriate measure of dispersion.

Property of Quartile Deviation:
Suppose $x$ is a linear function of $y$ in the form $x = a + by$; then the quartile deviations of $x$ and $y$ are related as $Q(x) = |b|\,Q(y)$.
Find the quartile deviation for the following data:

Height (cm)      141-145  146-150  151-155  156-160  161-165  166-170  171-175
No. of persons   7        9        15       23       21       10       5

Height (cm)   Class Boundaries   Frequency   Relative Frequency   CRF
141-145       140.5-145.5        7           0.078                0.078
146-150       145.5-150.5        9           0.1                  0.178
151-155       150.5-155.5        15          0.167                0.345
156-160       155.5-160.5        23          0.256                0.6
161-165       160.5-165.5        21          0.233                0.833
166-170       165.5-170.5        10          0.111                0.944
171-175       170.5-175.5        5           0.056                1
Total                            90

(CRF: cumulative relative frequency)
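A sketch of how $Q_1$ and $Q_3$ might be located for this grouped table: find the class where the cumulative frequency first reaches $n/4$ (respectively $3n/4$), which is where the CRF column first reaches 0.25 (respectively 0.75), and interpolate linearly within it using $Q = l + \frac{pn - F}{f}\,c$. This interpolation rule is a common textbook convention that the lecture does not spell out, so treat it as an assumption; some texts use $(n+1)/4$ instead. The helper name `grouped_quantile` is hypothetical.

```python
def grouped_quantile(lower_bounds, freqs, width, p):
    """Quantile of order p for grouped data by linear interpolation
    (assumed convention): Q = l + (p*n - F) / f * c, where l is the lower
    class boundary, F the cumulative frequency below the class, f the
    class frequency and c the class width."""
    n = sum(freqs)
    target = p * n
    cum = 0
    for lower, f in zip(lower_bounds, freqs):
        if f > 0 and cum + f >= target:
            return lower + (target - cum) / f * width
        cum += f
    raise ValueError("p must lie between 0 and 1")

boundaries = [140.5, 145.5, 150.5, 155.5, 160.5, 165.5, 170.5]
persons = [7, 9, 15, 23, 21, 10, 5]
q1 = grouped_quantile(boundaries, persons, 5, 0.25)  # about 152.67
q3 = grouped_quantile(boundaries, persons, 5, 0.75)  # about 163.71
qd = (q3 - q1) / 2                                   # about 5.52 cm
print(q1, q3, qd)
```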
