3 Data Description

The document discusses descriptive statistics, including measures of central tendency (mean, median, mode) and dispersion, distinguishing between statistics (sample) and parameters (population). It explains how to calculate arithmetic and weighted means, geometric means, and the median, providing examples for clarity. The document emphasizes the importance of selecting appropriate measures based on data characteristics and the influence of extreme values on the mean.

Uploaded by

talaatemad666

Data collected and organized by the experimenters can be described by:

I. Measures of central tendency.
II. Measures of dispersion.
III. Both numerical summaries together, which give a complete description.

Descriptive measures may be computed from the data of a sample or the data of a population. To distinguish between them, we have the following definitions:
1. A descriptive measure computed from the data of a sample is called a statistic.
2. A descriptive measure computed from the data of a population is called a parameter.

Several types of descriptive measures of central tendency can be computed from a set of data:
1. The mean (arithmetic, geometric, harmonic, and weighted),
2. The median, and
3. The mode.
The arithmetic mean is often referred to simply as the average or the mean.

The arithmetic mean is calculated by summing all the observations in a set of data and then dividing the total by the number of items involved:

$$\text{Mean} = \frac{\text{Sum of the observations}}{\text{Number of the observations}}$$

or, in symbolic notation,

$$\mu = \frac{\sum X}{N}$$

where:
μ = the population mean,
ΣX = the sum of all the observations in the set of data,
N = the number of items involved.

N.B.: The population mean, being computed from a population, is called a parameter.

Parameter: A measurable characteristic of a population.



For a sample, the arithmetic mean is:

$$\bar{X} = \frac{\sum X}{n}$$

where:
$\bar{X}$ (read X-bar) is the arithmetic mean of a sample,
Σ is the Greek capital letter sigma (to sum or to add),
X refers to the individual values (observations) in the sample,
n is the number of observations in the sample.

N.B.: The arithmetic mean $\bar{X}$, that is, a number computed from a sample, is called a statistic.

Statistic: A measurable characteristic of a sample.


The arithmetic mean is not only simple, familiar, and easy to calculate, but it also has the following desirable properties:

1. It can be calculated for any set of interval or ratio-level data. Therefore, it always exists.
2. A set of data has only one arithmetic mean. Therefore, it is a unique value.
3. The mean is considered the most reliable, or precise, average because the means of several samples taken from a population will not fluctuate as widely as either the median or the mode.
4. All the data items are used in its calculation.

The arithmetic mean has one additional important characteristic, namely, the sum of the deviations from the mean will always be zero.
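
The zero-sum property of the deviations can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the observations are illustrative values chosen only for the demonstration, and exact fractions are used so that rounding cannot hide the result.

```python
from fractions import Fraction

# Illustrative observations (an assumption made for this sketch).
observations = [9, 13, 12, 7, 6, 11, 12]

# Arithmetic mean: sum of the observations divided by their number.
# Fraction keeps the arithmetic exact, so no rounding error is involved.
mean = Fraction(sum(observations), len(observations))

# Deviation of each observation from the mean.
deviations = [x - mean for x in observations]

print("mean:", mean)
print("sum of deviations:", sum(deviations))
```

Positive and negative deviations cancel exactly, whatever the data, because the mean is defined as the total divided by the number of observations.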

Since each and every value in a set of data enters into the computation of the mean, it is affected by each value. Extreme values, therefore, have an influence on the mean and, in some cases, can so distort it that it becomes undesirable as a measure of central tendency.
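
To see how strongly a single extreme value can pull the mean, consider the short Python sketch below. The data are hypothetical and serve only as an illustration.

```python
# Hypothetical observations (an assumption made for this sketch).
typical = [6, 7, 9, 11, 12, 12, 13]

# The same observations with one extreme value added.
with_extreme = typical + [130]

# Arithmetic mean: sum of the observations divided by their number.
mean_typical = sum(typical) / len(typical)
mean_with_extreme = sum(with_extreme) / len(with_extreme)

print("mean without the extreme value:", mean_typical)
print("mean with the extreme value:", mean_with_extreme)
```

With the extreme value included, the mean lies above every one of the seven typical observations, so it no longer describes the centre of the data.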
The weighted mean is used if one wants to combine average values from samples of the same population with different sample sizes:

$$\bar{X}_w = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i}$$

The weights W_i represent the sizes of the partial samples.
- How should the hatchability percentage of quails' eggs be calculated when the numbers of families differ?
- Compare the arithmetic and weighted means.

Hatchability % (X)       60   65   40   55   62   80   50   53
Number of families (W)   40   35   60   45   38   20   50   47

The arithmetic mean of the hatchability percentage is:

$$\bar{X} = \frac{60 + 65 + 40 + 55 + 62 + 80 + 50 + 53}{8} = \frac{465}{8} = 58.125\%$$

The weighted mean is:

$$\bar{X}_w = \frac{60 \times 40 + 65 \times 35 + \cdots + 50 \times 50 + 53 \times 47}{40 + 35 + 60 + 45 + 38 + 20 + 50 + 47} = \frac{18497}{335} = 55.21\%$$

Note that the two values are different, and the weighted mean is the more appropriate one because it takes into account the number of families used in the study.
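
The comparison can be reproduced with a short Python sketch. The function names `arithmetic_mean` and `weighted_mean` are illustrative choices, not part of the original notes; the data are the hatchability percentages and family counts from the table.

```python
def arithmetic_mean(values):
    """Sum of the values divided by their number."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


hatchability = [60, 65, 40, 55, 62, 80, 50, 53]  # X, in percent
families = [40, 35, 60, 45, 38, 20, 50, 47]      # W, number of families

simple = arithmetic_mean(hatchability)            # 465 / 8
weighted = weighted_mean(hatchability, families)  # 18497 / 335

print(f"arithmetic mean: {simple:.3f} %")
print(f"weighted mean:   {weighted:.2f} %")
```

The weighted mean is lower here because the largest groups of families (60 and 50) have the lowest hatchability percentages (40 % and 50 %).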
The geometric mean is the nth root of the product of n observations, provided that all observations are positive (greater than zero):

$$G = \sqrt[n]{\prod_{i=1}^{n} X_i} = \sqrt[n]{X_1 \cdot X_2 \cdot X_3 \cdots X_n}$$

The geometric mean is suitable for averaging ratios.

In the following example, the arithmetic mean is not appropriate, while the geometric mean is suitable:

Farm          Number of Baladi (B) cattle   Number of Friesian (F) cattle   Ratio of B to F   Ratio of F to B
A             800                           400                             200 %             50 %
B             150                           300                             50 %              200 %
Sum                                                                         250               250
Arith. mean                                                                 125 %             125 %

The arithmetic mean of the ratio of Baladi to Friesian cattle is calculated to be the same as that of the ratio of Friesian to Baladi cattle. However, this is obviously not logical: the two ratios are reciprocals of each other, so they cannot both average more than 100 %.

The appropriate mean is the geometric mean, which is:

$$G = \sqrt[2]{(200)(50)} = 100\%$$

for both ratios, B to F and F to B.
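
The cattle-ratio example can be checked with a minimal Python sketch. The helper `geometric_mean` below is an illustrative name, written directly from the nth-root definition; it is a sketch under that assumption, not a reference implementation.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive observations."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("all observations must be positive")
    return math.prod(values) ** (1 / len(values))


b_to_f = [200, 50]  # ratio of Baladi to Friesian on farms A and B, in percent
f_to_b = [50, 200]  # ratio of Friesian to Baladi on farms A and B, in percent

print("arithmetic mean, B to F:", sum(b_to_f) / len(b_to_f))
print("arithmetic mean, F to B:", sum(f_to_b) / len(f_to_b))
print("geometric mean,  B to F:", geometric_mean(b_to_f))
print("geometric mean,  F to B:", geometric_mean(f_to_b))
```

Both geometric means come out at about 100 %, which is consistent with the two ratios being reciprocals, whereas both arithmetic means are 125 %.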


The geometric mean is used extensively in microbiological and serological research, where observations are expressed as titers (i.e. the dilution of certain suspensions or reagents at which a specified phenomenon, e.g. agglutination of RBCs, first takes place).

As each titer actually represents a ratio, the geometric mean is the appropriate mean.

Log G = (log X1 + log X2 + ... + log Xn) / n
      = (Σ log Xi) / n

Thus, the logarithm of the geometric mean is the arithmetic mean of the logarithms of the values. The geometric mean G is the antilogarithm of the value (log G).

The following set of antibody titers was obtained in a sample of five sera. Calculate the geometric mean:

4, 8, 16, 32, 64 (n = 5)

Log G = (log 4 + log 8 + log 16 + log 32 + log 64) / 5
      = (0.6021 + 0.9031 + 1.2041 + 1.5051 + 1.8062) / 5
      = 6.0206 / 5
      = 1.2041
G = antilog 1.2041
  = 16

(It may be noticed that the arithmetic mean of the above set of titers would be 24.8, which is not typical of the observations, being exaggerated by the extreme value of 64.)
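
The logarithmic method can be traced step by step in Python. This is a minimal sketch using base-10 logarithms on the five titers 4, 8, 16, 32, 64; the variable names are illustrative.

```python
import math

titers = [4, 8, 16, 32, 64]

# Step 1: take the base-10 logarithm of each titer.
logs = [math.log10(x) for x in titers]

# Step 2: log G is the arithmetic mean of the logarithms.
log_g = sum(logs) / len(logs)

# Step 3: G is the antilogarithm of log G.
g = 10 ** log_g

# Arithmetic mean of the same titers, for comparison.
arithmetic = sum(titers) / len(titers)

print("logarithms:", [round(v, 4) for v in logs])
print("log G:", round(log_g, 4))
print("geometric mean G:", round(g, 2))
print("arithmetic mean:", arithmetic)
```

Because these titers form a doubling series, the geometric mean coincides with the middle titer, while the arithmetic mean is pulled upward by the largest value.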
We will now consider the application of the geometric mean to find the mean rate of increase in sales, production, and other data reported over a period of time. The formula for the geometric mean percent increase is:

$$GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at start of period}}} - 1.0$$
The number of shipments of home computers increased from 644,000 in 1986 to 6,600,000 in 1990. Compute the geometric mean annual rate of increase.

Solution:
- The first step is to determine n, the number of periods. It is 4, found by 1990 - 1986 = 4.
- The beginning and ending values are inserted into the formula and the average annual rate of increase is determined:

$$GM = \sqrt[4]{\frac{6{,}600{,}000}{644{,}000}} - 1.0 = \sqrt[4]{10.248447} - 1.0 = 1.7892 - 1.0 = 0.7892$$

The geometric mean annual rate of increase was 78.92 % over the four-year period. The fourth root of 10.248447 can be found via logarithms or an electronic hand calculator.
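
The same calculation can be written as a small Python function. The name `geometric_mean_rate` and its parameter names are illustrative assumptions; the formula is the one given above, the nth root of the end-to-start ratio minus 1.0.

```python
def geometric_mean_rate(start_value, end_value, periods):
    """Geometric mean rate of increase per period, as a fraction."""
    if start_value <= 0 or end_value <= 0:
        raise ValueError("start and end values must be positive")
    if periods <= 0:
        raise ValueError("the number of periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1.0


# Home computer shipments: 644,000 in 1986 and 6,600,000 in 1990.
periods = 1990 - 1986
rate = geometric_mean_rate(644_000, 6_600_000, periods)

print(f"geometric mean annual rate of increase: {rate:.2%}")

# Check: compounding the rate over the four periods recovers the end value.
print(round(644_000 * (1 + rate) ** periods))
```

The final line illustrates why the geometric mean is the right average for growth rates: applying it repeatedly to the starting value reproduces the ending value.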
The median is the middle value in a set of observations arranged in increasing order.

To calculate the median from a set of data collected in its raw form:
1. First arrange the data in rank order, from the smallest to the largest observation.
2. Then count the number of observations to find the positioning point of the median:
   a. If the number of observations in the sample is an odd number, the median is represented by the numerical value corresponding to the (n+1)/2 ordered observation.
   b. If the number of observations in the sample is an even number, the median is represented by the mean or average of the two middle values in the ordered array: the n/2 ordered observation and the (n/2)+1 ordered observation.
Compute the median for each of the two sets of data:
a) 9, 13, 12, 7, 6, 11, 12
b) 9, 13, 12, 7, 6, 11, 12, 10

Solution:
a) The ordered observations:
6, 7, 9, 11, 12, 12, 13
Position for an odd number of observations:
(n+1)/2 = (7+1)/2 = 4th ordered observation. The median = 11
b) The ordered observations:
6, 7, 9, 10, 11, 12, 12, 13
Positions for an even number of observations:
n/2 = 8/2 = 4th ordered observation.
(n/2)+1 = (8/2)+1 = 5th ordered observation.
The median = the mean of the two middle values in the ordered array = (10+11)/2 = 10.5
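
The positioning rule for odd and even numbers of observations can be sketched in Python as follows. The function name `median` is an illustrative choice; the positions in the comments are 1-based, as in the rule, while Python list indices start at 0.

```python
def median(values):
    """Median of raw data, following the (n+1)/2 and n/2, (n/2)+1 rule."""
    ordered = sorted(values)  # rank order, smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: the (n+1)/2-th ordered observation.
        return ordered[(n + 1) // 2 - 1]
    # Even n: mean of the n/2-th and the (n/2)+1-th ordered observations.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


set_a = [9, 13, 12, 7, 6, 11, 12]
set_b = [9, 13, 12, 7, 6, 11, 12, 10]

print("median of a):", median(set_a))
print("median of b):", median(set_b))
```

For set b) the two middle values are the 4th and 5th ordered observations, 10 and 11, so the function returns their mean.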
The median has the following properties:
1. The median is the central value; 50 % of the measurements lie above it and 50 % fall below it.
2. The median lies between the largest and the smallest measurements of the set.
3. The median is not influenced by extreme measurements, but it is affected by the number of observations.
4. There is only one median for a set of measurements.


The mode is the most frequent or most common value in a set of observations.

1. The mode is the most frequent measurement in the set.
2. The mode is not influenced by extreme values.
3. The mode is not always unique. A data set can have more than one mode, or the mode may not exist for a data set.

N.B. The mode can be used when the data are nominal, such as coat color, blood groups, or gender.
Compute the mode for each of the three sets of data:

a) Noontime temperature (°C) in Cairo the first week in January.
20, 21, 22, 23, 22, 25, 26, 22, 21

b) Noontime temperature (°C) in Alexandria the first week in January.
18, 20, 21, 22, 21, 24, 23, 20

c) Noontime temperature (°C) in Sharkia the first week in January.
21, 25, 24, 20, 24, 22, 20, 23, 26, 22
Solution:
a. Ordered array:
20, 21, 21, 22, 22, 22, 23, 25, 26
The mode is 22 - Unimodal.

b. Ordered array:
18, 20, 20, 21, 21, 22, 23, 24
The modes are 20 and 21 - Bimodal.

c. Ordered array:
20, 20, 21, 22, 22, 23, 24, 24, 25, 26
The modes are 20, 22, and 24 - Trimodal.
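
Finding the modes amounts to counting how often each value occurs. The Python sketch below uses `collections.Counter` from the standard library; the function name `modes` is illustrative, and returning an empty list when no value repeats is an assumption made here to reflect the case in which the mode does not exist.

```python
from collections import Counter


def modes(values):
    """All values that share the highest frequency, in increasing order."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Assumed convention: no value repeats, so no mode is reported.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


cairo = [20, 21, 22, 23, 22, 25, 26, 22, 21]
alexandria = [18, 20, 21, 22, 21, 24, 23, 20]
sharkia = [21, 25, 24, 20, 24, 22, 20, 23, 26, 22]

print("Cairo:", modes(cairo))
print("Alexandria:", modes(alexandria))
print("Sharkia:", modes(sharkia))
```

The length of the returned list tells whether the data set is unimodal, bimodal, or trimodal.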
For a data set, the mean, median, and mode can be quite different. Consider the following example.

A small company consists of the owner, the manager, the salesperson, and two technicians, all of whose annual salaries are listed here. (Assume that this is the entire population.)

Find the mean, median, and mode.

Hence, the mean is $20,000, the median is $12,000, and the mode is $9,000.

In this example, the mean is much higher than the median or the mode. This is because the extremely high salary of the owner tends to raise the value of the mean.

In this and similar situations, the median should be used as the measure of central tendency.

4. The Midrange:
