Chemometrics Lecture Note

This document provides an introduction to chemometrics and discusses types of errors in quantitative analysis. It describes random errors, which cause imprecision, and systematic errors, which result in inaccurate measurements. Random errors are unavoidable and reflected by precision. Systematic errors cause bias by deviating measurements too high or too low. The document also covers statistics used to analyze data from repeated measurements, including mean, standard deviation, and normal distributions. It introduces confidence intervals for defining a range that likely includes the true value.

Uploaded by

Bikila Belay
Copyright
© All Rights Reserved

Chemometrics, Data Processing and Validation (Chem 568) Lecture Note

2 Cr. hrs

Shimeles Addisu (PhD)

shimeles.addisu1@gmail.com, shimeles.addisu@ju.edu.et

Brief Introduction to Chemometrics
1. Analytical problems
Analytical chemists face both qualitative and quantitative problems.
Modern analytical chemistry is overwhelmingly a quantitative science: a quantitative result is much more valuable than a qualitative one.
For example:
It may be useful to have detected boron in a water sample, but it is much more useful to be able to say how much boron is present.
Errors in quantitative analysis
Measurements invariably involve errors and
uncertainties.
It is impossible to perform a chemical analysis
that is totally free of errors or uncertainties.
We can only hope to minimize errors and
estimate their size with acceptable accuracy.

Types of error
Experimental scientists make a fundamental distinction
between three types of error.
These are known as gross, random and systematic
errors.

Random (indeterminate) error
• causes data to be scattered more or less symmetrically around a mean value.

Figure: Absolute error in the micro-Kjeldahl determination of nitrogen.

Notice the scatter in the data:
• The random error for analysts 1 and 3 is significantly less than that for analysts 2 and 4.
• In general, the random error in a measurement is reflected by its precision.
Random, or indeterminate, errors:
• exist in every measurement,
• can never be totally eliminated,
• are often the major source of uncertainty in a determination, and
• are caused by the many uncontrollable variables that are an unavoidable part of every analysis.
Most contributors to random error cannot be positively identified.
Even if we can identify sources of uncertainty, it is usually impossible to measure them because most are so small that they cannot be detected individually.
Random errors affect measurement precision.

Systematic (or determinate) error:
• causes the mean of a data set to differ from the accepted value.
For example: the results of analysts 1 and 2 have little systematic error, but the data of analysts 3 and 4 show systematic errors of about -0.7% and -1.2%.
• In general, a systematic error in a series of replicate measurements causes all the results to be too high or too low.
• An example of a systematic error is the unsuspected loss of a volatile analyte while heating a sample.
• Determinate errors affect the accuracy of an analysis by a systematic deviation from the true value, i.e. all the individual measurements are either too large or too small.
• A positive determinate error results in a central value that is larger than the true value, and a negative determinate error leads to a central value that is smaller than the true value.

Sources of Systematic Errors

There are three types of systematic errors:
• Instrumental errors are caused by nonideal instrument behavior, by faulty calibrations, or by use under inappropriate conditions.
• Method errors arise from nonideal chemical or physical behavior of analytical systems.
• Personal errors result from the carelessness, inattention, or personal limitations of the experimenter.

Gross errors
Gross errors are so serious that there is no alternative to
abandoning the experiment and making a completely fresh
start.
Gross errors lead to outliers, results that appear to differ
significantly from all other data in a set of replicate
measurements.
Examples:
• a complete instrument breakdown,
• accidentally dropping or discarding a crucial sample,
• discovering during the course of the experiment that a supposedly pure reagent was in fact badly contaminated.
Reproducibility and Repeatability
• Suppose a student is asked to do five replicate titrations in rapid succession.
• The same set of solutions and the same glassware would be used throughout, the same preparation of indicator would be added to each titration flask, and the temperature, humidity and other laboratory conditions would remain much the same.
• In such cases the precision measured would be the within-run precision: this is called the repeatability.
• Suppose, however, that for some reason the titrations were performed by different staff on five different occasions in different laboratories, using different pieces of glassware and different batches of indicator.
• It would not be surprising to find a greater spread of the results in this case.
• The resulting data would reflect the between-run precision of the method, i.e. its reproducibility.

Statistics of repeated measurements
Mean and standard deviation
Consider the results of five replicate titrations done by four students.

A more useful measure, which utilizes all the values, is the standard deviation, s, which is defined as follows:

\[ s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}} \]

where x̄ is the mean of the n measurements x_i.

Example:

The distribution of repeated measurements
Let us consider the following table.
Suppose it shows the results of 50 replicate determinations of the level of nitrate ion in a particular water specimen.

The mean of these results is 0.500 μg mL⁻¹ and the standard deviation is 0.0165 μg mL⁻¹.

The above table can be summarized in a frequency table (Table below).
This table shows that the value 0.46 μg mL⁻¹ appears once, the value 0.47 μg mL⁻¹ appears three times, and so on.

The distribution of the results can most easily be appreciated by drawing a histogram.

In theory a concentration could take any value, so a continuous curve is needed to describe the form of the population from which the sample was taken.

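The mathematical model usually used is the normal or Gaussian distribution, which is described by the equation:

\[ y = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] \]

where x is the measured value, y the frequency with which it occurs, μ the population mean and σ the population standard deviation.
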
The normal distribution has the following properties:

Figure: Properties of the normal distribution: (i) approximately 68% of values lie within ±1s of the mean; (ii) approximately 95% of values lie within ±2s of the mean; (iii) approximately 99.7% of values lie within ±3s of the mean.

This would mean that, if the nitrate ion concentrations (in μg mL⁻¹) are normally distributed:
• about 68% should lie in the range 0.483–0.517,
• about 95% in the range 0.467–0.533, and
• about 99.7% in the range 0.450–0.550.
In fact 33 out of the 50 results (66%) lie between 0.483 and 0.517, 49 (98%) between 0.467 and 0.533, and all the results between 0.450 and 0.550, so the agreement with theory is fairly good.
For a normal distribution with known mean, μ, and standard deviation, σ, the exact proportion of values which lie within any interval can be found from tables, provided that the values are first standardized so as to give z-values.
This is done by expressing any value of x in terms of its deviation from the mean in units of the standard deviation, σ.
That is:

\[ z = \frac{x - \mu}{\sigma} \]

Example:
If repeated values of a titration are normally distributed with mean 10.15 mL and standard deviation 0.02 mL, find the proportion of measurements which lie between 10.12 mL and 10.20 mL.

Standardizing the two limits gives z = (10.12 - 10.15)/0.02 = -1.5 and z = (10.20 - 10.15)/0.02 = 2.5.

From Table, F(-1.5) = 0.0668 and F(2.5) = 0.9938.

Thus the proportion of values between x = 10.12 and x = 10.20 (corresponding to z = -1.5 to 2.5) is:
0.9938 – 0.0668 = 0.927

That is, 92.7% of values lie between 10.12 and 10.20 mL.

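The table lookup in the titration example can be cross-checked numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative and are not part of the lecture note.

```python
from statistics import NormalDist

mu, sigma = 10.15, 0.02        # mean and standard deviation of the titration values (mL)
lower, upper = 10.12, 10.20    # interval of interest (mL)

# Standardize the interval limits: z = (x - mu) / sigma
z_lower = (lower - mu) / sigma
z_upper = (upper - mu) / sigma

# F(z) is the cumulative distribution function of the standard normal distribution
std_normal = NormalDist()
proportion = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)

print(f"z = {z_lower:.2f} to {z_upper:.2f}")
print(f"proportion = {proportion:.3f}")
```

The same two-step pattern (standardize, then difference the cumulative probabilities) applies to any interval of a normal distribution.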
Confidence limits of the mean for large samples
Now that we know the form of the sampling distribution of the
mean we can return to the problem of using a sample to define a
range which we may reasonably assume includes the true value.
Such a range is known as a confidence interval and the
extreme values of the interval are called the confidence limits.
Figure below shows the sampling distribution of the mean for samples of size n. If we assume that this distribution is normal, then 95% of the sample means will lie in the range given by:

\[ \mu - 1.96\,\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 1.96\,\frac{\sigma}{\sqrt{n}} \]

Figure: The sampling distribution of the mean, showing the range within which 95% of sample means lie.

(The exact value 1.96 has been used in this equation rather than the approximate value, 2. We can use Table to check that the proportion of values between z = -1.96 and z = 1.96 is indeed 0.95.)

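Example:
Calculate the 95% and 99% confidence limits of the mean for the 50 nitrate ion concentration measurements considered earlier.

Solution:
We have x̄ = 0.500 μg mL⁻¹, s = 0.0165 μg mL⁻¹ and n = 50.

95% confidence limits:

\[ \bar{x} \pm 1.96\,\frac{s}{\sqrt{n}} = 0.500 \pm 1.96 \times \frac{0.0165}{\sqrt{50}} = 0.500 \pm 0.0046\ \mu\text{g mL}^{-1} \]

99% confidence limits:

\[ \bar{x} \pm 2.58\,\frac{s}{\sqrt{n}} = 0.500 \pm 2.58 \times \frac{0.0165}{\sqrt{50}} = 0.500 \pm 0.0060\ \mu\text{g mL}^{-1} \]
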
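The large-sample calculation can be sketched in a few lines of Python. This is only an illustration, assuming the mean (0.500), standard deviation (0.0165) and sample size (50) quoted for the nitrate data; the function name `confidence_limits` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def confidence_limits(mean, s, n, confidence):
    """Return (lower, upper) large-sample confidence limits of the mean."""
    # Two-sided z-value, e.g. about 1.96 for 95% and about 2.58 for 99%
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s / sqrt(n)
    return mean - half_width, mean + half_width

# Nitrate ion data quoted in the text: mean 0.500, s = 0.0165 (ug/mL), n = 50
low95, high95 = confidence_limits(0.500, 0.0165, 50, 0.95)
low99, high99 = confidence_limits(0.500, 0.0165, 50, 0.99)

print(f"95% confidence limits: {low95:.4f} to {high95:.4f} ug/mL")
print(f"99% confidence limits: {low99:.4f} to {high99:.4f} ug/mL")
```

Computing z from the inverse cumulative distribution, rather than hard-coding 1.96 and 2.58, lets the same function serve any confidence level.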
Confidence limits of the mean for small samples
For small samples the confidence limits of the mean are given by:

\[ \bar{x} \pm t_{n-1}\,\frac{s}{\sqrt{n}} \]

• The subscript (n - 1) indicates that t depends on this quantity, which is known as the number of degrees of freedom.
• For large n, the values of t(n-1) for confidence intervals of 95% and 99% respectively are very close to the values 1.96 and 2.58.

Example:
The sodium ion level in a urine specimen was measured using an ion-selective electrode. The following values were obtained: 102, 97, 99, 98, 101, 106 mM. What are the 95% and 99% confidence limits for the sodium ion concentration?
Solution:
• The mean and standard deviation of these values are 100.5 mM and 3.27 mM respectively.
• There are six measurements and therefore five degrees of freedom.
• From Table the value of t5 for calculating the 95% confidence limits is 2.57.
• The 95% confidence limits of the mean are therefore given by:

\[ 100.5 \pm 2.57 \times \frac{3.27}{\sqrt{6}} = 100.5 \pm 3.4\ \text{mM} \]

• Similarly, with t5 = 4.03, the 99% confidence limits are given by:

\[ 100.5 \pm 4.03 \times \frac{3.27}{\sqrt{6}} = 100.5 \pm 5.4\ \text{mM} \]

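The sodium example can be reproduced with the standard library alone. In this sketch the critical t-values are copied from a t-table (2.57 is quoted in the text; 4.03 is the standard two-sided 99% value for five degrees of freedom) rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

sodium = [102, 97, 99, 98, 101, 106]   # mM, values from the example

n = len(sodium)
x_bar = mean(sodium)
s = stdev(sodium)                      # sample standard deviation (n - 1 in the denominator)

# Two-sided critical t-values for n - 1 = 5 degrees of freedom,
# taken from a t-table rather than computed
t_crit = {"95%": 2.57, "99%": 4.03}

# Half-width of each confidence interval: t * s / sqrt(n)
limits = {level: t * s / sqrt(n) for level, t in t_crit.items()}
for level, half_width in limits.items():
    print(f"{level}: {x_bar:.1f} +/- {half_width:.1f} mM")
```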
Presentation of results
Errors are involved in all quantitative results, so the presentation of the results and their errors is important.

A common practice is to quote:
• the mean as the estimate of the quantity measured and the standard deviation as the estimate of the precision.

Less commonly:
• the standard error of the mean is quoted instead of the standard deviation,
• or the result is given in the form of the 95% confidence limits of the mean.
A related aspect of presenting results is the rounding-off of the answer.

• The number of significant figures given indicates the precision of the experiment.

For example, it would be absurd to give the result of a titrimetric analysis as 0.107846 M: no analyst could achieve the implied precision of 0.000001 in ca. 0.1, i.e. 0.001%.

In practice it is usual to quote as significant figures all the digits which are certain, plus the first uncertain one.

For example: the mean of the values 10.09, 10.11, 10.09, 10.10 and 10.12 is 10.102, and their standard deviation is 0.01304.
The standard deviation shows that the second decimal place is already uncertain, so the result is quoted as 10.10 ± 0.01 (n = 5).

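The mean, the standard deviation and the rounding rule can be illustrated with the five values 10.09, 10.11, 10.09, 10.10 and 10.12. This is a minimal sketch using the Python standard library; the choice of two decimal places follows from the size of the standard deviation and is made by hand here, not derived automatically.

```python
from statistics import mean, stdev

values = [10.09, 10.11, 10.09, 10.10, 10.12]

x_bar = mean(values)
s = stdev(values)    # sample standard deviation, n - 1 in the denominator

print(f"unrounded: mean = {x_bar:.3f}, s = {s:.5f}")
# Quote the result to the decimal place of the first uncertain digit
# (the second decimal place for these data)
reported = f"{x_bar:.2f} +/- {s:.2f} (n = {len(values)})"
print(f"reported:  {reported}")
```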
Significance tests
One of the most important properties of an analytical method
is that it should be free from bias, so that the value it gives for
the amount of the analyte should be the true value.
This property may be tested by applying the method to a
standard test portion containing a known amount of analyte.
Even if there are no systematic errors, random errors make it
most unlikely that the measured amount will exactly equal the
known amount in the standard.
To decide whether the difference between the measured and
standard amounts can be accounted for by random errors a
statistical test known as a significance test can be used.
Comparison of an experimental mean with a known value
• In making a significance test we are testing the truth of a hypothesis which is known as a null hypothesis, often denoted by H0.
• The term null is used to imply that there is no difference between the observed and known values apart from that due to random variation.
• The null hypothesis is usually rejected if the probability of such a difference occurring by chance is less than 1 in 20 (i.e. 0.05 or 5%).
• To decide whether the difference between the sample mean x̄ and the known value μ is significant, the statistic t is calculated:

\[ t = \frac{(\bar{x} - \mu)\sqrt{n}}{s} \]

where s is the sample standard deviation and n is the sample size.
• If |t| (i.e. the calculated value of t without regard to sign) exceeds a certain critical value then the null hypothesis is rejected.
• The critical value of t for a given significance level can be found from the t-table.

Example:
In a new method for determining selenourea in water the following values were obtained for tap water samples spiked with 50 ng mL⁻¹ of selenourea:
50.4, 50.7, 49.1, 49.0, 51.1 ng mL⁻¹
Is there any evidence of systematic error?

Solution:
• The mean of these values is 50.06 and the standard deviation is 0.956.
• Adopting the null hypothesis that there is no systematic error (i.e. μ = 50), the statistic t is:

\[ t = \frac{(50.06 - 50)\sqrt{5}}{0.956} = 0.14 \]

• From Table, the critical value is t4 = 2.78 (P = 0.05).
• Since the observed value of |t| is less than the critical value the null hypothesis is retained: there is no evidence of systematic error.
Note that this does not mean that there are no systematic errors, only that they have not been demonstrated.
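The selenourea test can be sketched as follows. The data and the critical value 2.78 are those given in the example; the variable names are illustrative, and the critical value is entered by hand rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

measured = [50.4, 50.7, 49.1, 49.0, 51.1]   # ng/mL, from the example
mu = 50.0                                   # known (spiked) concentration, ng/mL
t_critical = 2.78                           # t for 4 degrees of freedom, P = 0.05 (from the text)

n = len(measured)
x_bar = mean(measured)
s = stdev(measured)

# t = (x_bar - mu) * sqrt(n) / s
t = (x_bar - mu) * sqrt(n) / s
reject_h0 = abs(t) > t_critical

print(f"mean = {x_bar:.2f}, s = {s:.3f}, t = {t:.2f}")
print("reject H0" if reject_h0 else "retain H0")
```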
Comparison of two experimental means
In this case the two methods give two sample means, x̄1 and x̄2.

The null hypothesis is that the two methods give the same result, i.e. H0: μ1 = μ2 or μ1 - μ2 = 0, so we need to test whether (x̄1 - x̄2) differs significantly from zero.
The results from the two methods might have different sample sizes, n1 and n2, and we also have two different standard deviations, s1 and s2.

• If these standard deviations are not significantly different (see the F-test for a method of testing this assumption), a pooled estimate, s, of the standard deviation can first be calculated using the equation:

\[ s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]

• To decide whether the difference between the two means, x̄1 and x̄2, is significant, i.e. to test the null hypothesis, H0: μ1 = μ2, the statistic t is then calculated from

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \]

where t has n1 + n2 - 2 degrees of freedom.

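The pooled-standard-deviation t-test for two means can be sketched as a small function. The summary statistics passed to it below are hypothetical and serve only to exercise the formula; the function name `pooled_t` is likewise an assumption, and the calculation is valid only when the two standard deviations are not significantly different.

```python
from math import sqrt

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t, pooled s, degrees of freedom) for comparing two means."""
    dof = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    s_pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / dof)
    t = (mean1 - mean2) / (s_pooled * sqrt(1 / n1 + 1 / n2))
    return t, s_pooled, dof

# Hypothetical summary statistics, chosen only to exercise the function
t, s_pooled, dof = pooled_t(28.0, 0.30, 10, 26.25, 0.23, 10)
print(f"s = {s_pooled:.3f}, t = {t:.1f} with {dof} degrees of freedom")
```

The calculated |t| is then compared with the critical value for `dof` degrees of freedom, exactly as in the one-sample case.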
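Paired t-test
Two methods of analysis are compared by applying both of them to the same set of test materials, which contain different amounts of analyte.
To test whether n paired results are drawn from the same population, that is H0: μd = 0, we calculate the t-statistic from the equation:

\[ t = \frac{\bar{d}\sqrt{n}}{s_d} \]

where d̄ and s_d are the mean and standard deviation of the differences, d, between the paired values. The number of degrees of freedom of t is n - 1.
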
Table below shows the results of determining the paracetamol concentration (% m/m) in tablets by two different methods.
Tablets from ten different batches were analyzed to see whether the results obtained by the two methods differed.
Each batch is thus characterized by a pair of measurements, one value for each method.
Differences between the tablets, differences between the methods and random measurement errors all contribute to the variation between the measurements.

Test whether there is a significant difference between the results obtained by the two methods in Table above:
The differences between the pairs of values (subtracting the second value from the first value in each case) are:

These differences have mean d̄ and standard deviation s_d.
• Substituting in Eq. above, with n = 10, gives t = 0.88.
• The critical value is t9 = 2.26 (P = 0.05).
• Since the calculated value of |t| is less than this the null hypothesis is retained: the methods do not give significantly different results for the paracetamol concentration.

F-test for the comparison of standard deviations
• This is a test designed to indicate whether there is a significant difference between two methods based on their standard deviations.
For example, it can be applied to the results of two different analytical methods or to the results from two different laboratories.
• The F-test uses the ratio of the two sample variances, i.e. the ratio of the squares of the standard deviations.
• It is calculated from the equation:

\[ F = \frac{s_1^2}{s_2^2} \]

where the subscripts 1 and 2 are allocated so that s1² > s2², i.e. F is always greater than 1.
• There are two different degrees of freedom, ν1 = n1 - 1 and ν2 = n2 - 1.
• If the calculated value of F exceeds a certain critical value (obtained from tables), then the null hypothesis is rejected.
Example:
A proposed method for the determination of the chemical oxygen demand of wastewater was compared with the standard (mercury salt) method. The following results were obtained for a sewage effluent sample:

For each method eight determinations were made.

Is the precision of the proposed method significantly greater than that of the standard method?
Solution:
We have to decide whether the variance of the standard method is significantly greater than that of the proposed method.
F is given by the ratio of the variances:

\[ F = \frac{s_1^2}{s_2^2} \]

Both samples contain eight determinations so the number of degrees of freedom in each case is seven.

• The critical value is F7,7 = 3.787 (P = 0.05), where the first and second subscripts indicate the number of degrees of freedom of the numerator and denominator respectively.
• Since the calculated value of F (4.8) exceeds this, the null hypothesis of equal variances is rejected.
• The variance of the standard method is significantly greater than that of the proposed method at the 5% probability level, i.e. the proposed method is more precise.

Outliers
Every experimentalist is familiar with the situation in which one (or possibly more than one) measurement in a set of results appears to differ unexpectedly from the others.
In some cases the suspect result may be attributed to a human error.
For example, suppose the following results were given for a titration:
12.12, 12.15, 12.13, 13.14, 12.12 mL
The fourth value is much larger than the others.
Should such suspect values be retained, or should they be rejected as outliers?

An outlier is a data point whose value is much larger or smaller than the remaining data.
The most commonly used significance test for identifying outliers is Dixon's Q-test.
• The Q-test is a simple statistical test to determine if a data point that is very different from the other data points in a set can be rejected. Only one data point may be discarded using the Q-test.

\[ Q = \frac{|\text{suspect value} - \text{nearest value}|}{\text{highest value} - \text{lowest value}} \]

• If Q is larger than the critical value Qc, the outlier can be discarded.

Example:
The following data were reported for the chloride analysis of a sample: 103, 106, 107 and 114 meq/L. One value appears suspect.
Determine if it can be ascribed to accidental error, at the 95% confidence level.
Solution:
The suspect result is 114 meq/L.

\[ Q = \frac{|114 - 107|}{114 - 103} = 0.64 \]

The tabulated value for four observations is 0.829. Since the calculated Q is less than the tabulated Q, the suspect value should not be rejected.
Table: Critical values of Q at three confidence levels (CL).

N     Qcrit (CL: 90%)   Qcrit (CL: 95%)   Qcrit (CL: 99%)
3     0.941             0.970             0.994
4     0.765             0.829             0.926
5     0.642             0.710             0.821
6     0.560             0.625             0.740
7     0.507             0.568             0.680
8     0.468             0.526             0.634
9     0.437             0.493             0.598
10    0.412             0.466             0.568

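The Q-test for the chloride example can be sketched as follows. The critical values are the 95% column of the table of Q values; the function name `dixon_q` is an assumption, and the function assumes at least three values that are not all identical.

```python
# Critical Q values at the 95% confidence level, copied from the table of Q values
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
             7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}

def dixon_q(values):
    """Return (suspect value, Q) for the most extreme value in a small data set."""
    data = sorted(values)
    spread = data[-1] - data[0]      # highest value - lowest value
    gap_low = data[1] - data[0]      # gap between the lowest value and its neighbour
    gap_high = data[-1] - data[-2]   # gap between the highest value and its neighbour
    if gap_high >= gap_low:
        return data[-1], gap_high / spread
    return data[0], gap_low / spread

chloride = [103, 106, 107, 114]      # meq/L, from the example
suspect, q = dixon_q(chloride)
reject = q > Q_CRIT_95[len(chloride)]
print(f"suspect = {suspect}, Q = {q:.2f}, reject = {reject}")
```

The function tests whichever end of the sorted data has the larger gap, which corresponds to the value an analyst would normally regard as suspect.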
Grubbs' test
This test compares the deviation of the suspect value from the sample mean with the standard deviation of the sample. The suspect value is naturally the value that is furthest away from the mean.
In order to use Grubbs' test for an outlier, i.e. to test the null hypothesis, H0, that all measurements come from the same population, the statistic G is calculated:

\[ G = \frac{|\text{suspect value} - \bar{x}|}{s} \]

Note that the mean and the standard deviation are calculated with the suspect value included, as H0 presumes that there are no outliers.
The critical values for G for P = 0.05 are given in Table.
• If the calculated value of G exceeds the critical value, the suspect value is rejected.
• The values given are for a two-sided test, which is appropriate when it is not known in advance at which extreme of the data range an outlier may occur.

From G-Table, for sample size 4, the critical value of G is 1.481 (P = 0.05).
Since the calculated value of G does not exceed 1.481, the suspect measurement should be retained.
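Grubbs' test can be sketched in the same way. The data for the original worked example are not reproduced in these notes, so this illustration reuses the four chloride results from the Q-test example as an assumption; the critical value 1.481 for a sample of four (P = 0.05) is the one quoted in the text, and the function name `grubbs_g` is hypothetical.

```python
from statistics import mean, stdev

def grubbs_g(values):
    """Return (suspect value, G) for the value furthest from the mean."""
    x_bar = mean(values)
    s = stdev(values)   # calculated with the suspect value included
    suspect = max(values, key=lambda x: abs(x - x_bar))
    return suspect, abs(suspect - x_bar) / s

chloride = [103, 106, 107, 114]   # meq/L, reused from the Q-test example (an assumption)
G_CRITICAL = 1.481                # critical G for n = 4, P = 0.05 (from the text)

suspect, g = grubbs_g(chloride)
print(f"suspect = {suspect}, G = {g:.2f}, reject = {g > G_CRITICAL}")
```

For these data both tests lead to the same decision: the suspect value is retained.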
Analysis of variance
• In analytical work there are often more than two means to be compared.
Let us consider the following situations:
• comparing the mean concentration of protein in solution for samples stored under different conditions
• comparing the mean results obtained for the concentration of an analyte by several different methods
• comparing the mean titration results obtained by several different experimentalists using the same apparatus

In all these examples there are two possible sources of variation:
a. variation due to the random error in measurement;
b. variation due to what is known as a controlled or fixed-effect factor.
• For the examples above, the controlled factors are:
  • the conditions under which the solution was stored,
  • the method of analysis used, and
  • the experimentalist carrying out the titration.
• Analysis of variance (ANOVA) is a statistical technique which can be used to separate and estimate the different causes of variation.
• For the examples above, it can be used to separate any variation which is caused by changing the controlled factor from the variation due to random error.
• It can test whether altering the controlled factor leads to a significant difference between the mean values obtained.
• ANOVA can also be used in situations where there is more than one source of random variation.
Comparison of several means
Table below shows the results obtained in an investigation into the stability of a fluorescent reagent stored under different conditions.

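Within-sample variation
• For each sample a variance can be calculated by using the formula

\[ s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1} \]

Sample   x_i    Mean   x_i - mean   (x_i - mean)²
A        102    101     1           1
A        100    101    -1           1
A        101    101     0           0
                        Sum of squares = 2
B        101    102    -1           1
B        101    102    -1           1
B        104    102     2           4
                        Sum of squares = 6
C        97     97      0           0
C        95     97     -2           4
C        99     97      2           4
                        Sum of squares = 8
D        90     92     -2           4
D        92     92      0           0
D        94     92      2           4
                        Sum of squares = 8

Within-sample estimate of the variance:

\[ \frac{2 + 6 + 8 + 8}{8} = 3 \]

Degrees of freedom = 12 - 4 = 8

Between-sample variation

Sample   Sample mean   Overall mean   Deviation   Deviation²
A        101           98              3           9
B        102           98              4           16
C        97            98             -1           1
D        92            98             -6           36
                                       Sum of squares = 62
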
• Within-sample mean square = 3 with 8 d.f.
• Between-sample mean square = 62 with 3 d.f. (the variance of the sample means, 62/3, multiplied by the number of measurements per sample, 3).
• If the null hypothesis is correct, these two estimates of σ0² should not differ significantly.
• If it is incorrect, the between-sample estimate of σ0² will be greater than the within-sample estimate because of between-sample variation.
• The test statistic is F = 62/3 = 20.7.
• The critical value of F is 4.066 (P = 0.05).
• Since the calculated value of F is much greater than the critical value the null hypothesis is rejected: the sample means do differ significantly.
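The one-way ANOVA arithmetic for the four storage conditions can be checked with a short script. This is a sketch using only the standard library and the twelve readings listed in the within-sample tables; it assumes equal sample sizes, and the critical value 4.066 is entered by hand from the text.

```python
from statistics import mean, variance

# Readings for the four storage conditions, from the within-sample tables
samples = {
    "A": [102, 100, 101],
    "B": [101, 101, 104],
    "C": [97, 95, 99],
    "D": [90, 92, 94],
}

h = len(samples)                          # number of samples
n = len(next(iter(samples.values())))     # measurements per sample (equal sizes assumed)

# Within-sample mean square: average of the individual sample variances
within_ms = mean([variance(v) for v in samples.values()])
within_dof = h * (n - 1)

# Between-sample mean square: n times the variance of the sample means
sample_means = [mean(v) for v in samples.values()]
between_ms = n * variance(sample_means)
between_dof = h - 1

F = between_ms / within_ms
F_CRITICAL = 4.066                        # F(3, 8), P = 0.05, from the text

print(f"within-sample MS  = {within_ms:.1f} ({within_dof} d.f.)")
print(f"between-sample MS = {between_ms:.1f} ({between_dof} d.f.)")
print(f"F = {F:.1f}, reject H0: {F > F_CRITICAL}")
```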
The chi-squared test
• The chi-squared test is concerned with frequency, i.e. the number of times a given event occurs.
• The chi-squared test can be used to test whether the observed frequencies in a particular case differ significantly from those which would be expected on the null hypothesis.
• To test whether the observed frequencies, Oi, agree with those expected, Ei, according to some null hypothesis, the statistic χ² is calculated:

\[ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \]

• The null hypothesis is that there is no difference in reliability.
• Assuming that the workers use the laboratory for an equal length of time, we would thus expect the same number of breakages by each worker.
• Since the total number of breakages is 61, the expected number of breakages per worker is 61/4 = 15.25.

Solution

• For three degrees of freedom the critical value is 7.81.
• Since the calculated value is greater than this, the null hypothesis is rejected at the 5% significance level: there is evidence that the workers do differ in their reliability.

The quality of analytical measurements
Sampling
• In most analyses we rely on chemical samples to give us information about a whole object.
• Unless the sampling stages of an analysis are considered carefully, the statistical methods may be invalidated.
• For example, it is not possible to analyze all the water in a stream for a toxic pollutant.
• The sample studied must be taken in a way that ensures as far as possible that it is truly representative of the whole object.

• To illustrate some aspects of sampling we can study the situation in which we have a large batch of tablets and wish to obtain an estimate for the mean weight of a tablet.
• Rather than weigh all the tablets, we take a few of them (say ten) and weigh each one.
• In this example the batch of tablets forms the population and the ten weighed tablets form a sample from this population.
Therefore,
• Sampling is the process by which a sample population is reduced in size to an amount of homogeneous material that can be conveniently handled in the laboratory and whose composition is representative of the population.
Obtaining A Representative Sample
• The sampling process must ensure that the items chosen are representative of the bulk of material or population.
• The items chosen for analysis are often called sampling units or sampling increments.
• For example, our population might be 100 coins, and we might wish to know the average concentration of lead in the collection of coins.
• Our sample is to be composed of five coins.
• Each coin is a sampling unit or an increment.
• In the statistical sense, the sample corresponds to several small parts taken from different parts of the bulk material.
• To avoid confusion, chemists usually call the collection of sampling units or increments the gross sample.
Separation and estimation of variances using ANOVA
Table below shows the results of the purity testing of a barrelful of sodium chloride.
Five sample increments (A–E) were taken from different parts of the barrel chosen at random, and four replicate analyses were performed on each sample.

There are two possible sources of variation:
① that due to the random error in the measurement of purity, given by the measurement variance, σ0²
② that due to real variations in the sodium chloride purity at different points in the barrel, given by the sampling variance, σ1².
A test should be carried out to see whether σ1² differs significantly from 0.
This is done by comparing the within- and between-sample mean squares: if they do not differ significantly then σ1² = 0 and both mean squares estimate σ0².

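The one-way ANOVA shows that the between-sample mean square is greater than the within-sample mean square; the F-test shows that this difference is highly significant, i.e. σ1² does differ significantly from 0.

Between-sample mean square = 1.96

Within-sample mean square = 0.0653

The within-sample mean square estimates the measurement variance σ0², and the between-sample mean square estimates σ0² + nσ1², where n = 4 is the number of replicate analyses per sample increment. The sampling variance is therefore estimated as σ1² = (1.96 - 0.0653)/4 = 0.47.
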
Introduction to quality control methods
If a laboratory is to produce analytical results of a quality that is acceptable to its clients, and that allows it to perform well in proficiency tests or method performance studies, its results should show excellent consistency from day to day.
Checking for such consistency is complicated by the inevitable occurrence of random errors, so several statistical techniques have been developed to show whether or not time-dependent trends are occurring in the results, alongside the random errors.
These are referred to as quality control methods.
Analytical quality control (AQC): refers to all those
processes and procedures designed to ensure that the
results of laboratory analysis are consistent,
comparable, accurate and within specified limits of
precision.
Suppose that a laboratory uses a chromatographic
method for determining the level of a pesticide in fruits.
The results may be used to determine whether a large
batch of fruit is acceptable or not, and their quality is
thus of great importance.

The performance of the method will be checked at
regular intervals by applying it, with a small number of
replicate analyses, to a standard reference material
(SRM), in which the pesticide level is certified by a
regulatory authority.
A standard reference material is a highly purified
compound that is well characterized.
The quality and purity of reference standards are crucial
to determining scientifically valid results for many
analytical methods.

Alternatively an internal quality control (IQC) standard of
known composition and high stability can be used.
IQC is a valuable technique to ensure that the results
produced from any assay are reliable and reproducible.
IQC ensures that factors determining the magnitude of
uncertainty do not change during the routine use of an
analytical method over long periods of time.
IQC is conducted by inserting one or more control
materials into every run of analysis.
The control materials are treated by an analytical procedure identical to that performed on the test materials.
In practice z = 1.96 is often rounded to 2 for 95% confidence limits and z = 2.97 is rounded to 3 for 99.7% confidence.
Shewhart chart for mean values

These factors take values depending on the sample size, n.
The relevant equations are:

Shewhart chart for range

• The average run length (ARL) can be reduced significantly by using a different type of control chart, a CUSUM (cumulative sum) chart.
• A CUSUM chart is a type of control chart used to monitor small shifts in the process mean.
• It uses the cumulative sum of deviations from a target.
• The CUSUM chart plots the cumulative sum of deviations from the target for individual measurements or subgroup means.

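The quantity plotted on a CUSUM chart can be sketched in a few lines. The target value and the measurements below are hypothetical, chosen only to show how a small upward shift in the mean makes the cumulative sum climb steadily; the function name `cusum` is an assumption.

```python
from itertools import accumulate

def cusum(values, target):
    """Return the running (cumulative) sum of deviations from the target value."""
    return list(accumulate(x - target for x in values))

# Hypothetical control measurements with a small upward shift after the third value
target = 80
measurements = [80, 81, 79, 82, 81, 83, 82, 83]

cusum_values = cusum(measurements, target)
for i, (x, c) in enumerate(zip(measurements, cusum_values), start=1):
    print(f"observation {i}: value = {x}, CUSUM = {c}")
```

While the process is on target the cumulative sum wanders around zero; a sustained shift shows up as a change in the slope of the plotted line, which is why small shifts are detected sooner than on a Shewhart chart.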
5. Calibration methods: regression and correlation
Calibration graphs in instrumental analysis
• The analyst takes a series of samples in which the concentration of the analyte is known.
• These calibration standards are measured in the analytical instrument under the same conditions as those subsequently used for the test (the 'unknown') samples.
• The results are used to plot a calibration graph, which is then used to determine the analyte concentrations in test samples by interpolation.

Figure: Calibration procedure in instrumental analysis, using a reagent blank and a set of five standards: ○ calibration points; • test sample.

This general procedure raises several important statistical questions:
• Is the calibration graph linear? If it is a curve, what is the form of the curve?
• Since each of the points on the calibration graph is subject to errors, what is the best straight line (or curve) through these points?
• Assuming that the calibration plot is actually linear, what are the errors and confidence limits for the slope and the intercept of the line?
• When the calibration plot is used for the analysis of a test sample, what are the errors and confidence limits for the determined concentration?
• What is the limit of detection of the method? That is, what is the least concentration of the analyte that can be detected with a predetermined level of confidence?

• The calibration curve is always plotted with the instrument signals on the vertical (y) axis and the standard concentrations on the horizontal (x) axis.
• This is because it is assumed that:
  • all the errors are in the y-values and that the standard concentrations (x-values) are error-free;
  • the y-values obtained have a normal (Gaussian) error distribution, and the magnitude of the random errors in the y-values is independent of the analyte concentration.
The straight-line calibration graphs take the algebraic form:

\[ y = a + bx \]

where b is the slope of the line and a its intercept on the y-axis.
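The product–moment correlation coefficient
Is the calibration graph linear?
A common method of estimating how well the experimental points fit a straight line is to calculate the product–moment correlation coefficient, r.
• It is often referred to simply as the correlation coefficient.
• For n pairs of values (x_i, y_i) it is given by:

\[ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_i (x_i - \bar{x})^2\right]\left[\sum_i (y_i - \bar{y})^2\right]}} \]
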
, is called the covariance of the two variables x and y.

 Covariance measures their joint variation.


 If x and y are not related their covariance will be close to
zero.
 The correlation coefficient r equals the covariance of x and y
divided by the product of their standard deviations.
 so if x and y are not related r will also be close to zero.
 r can only take values in the range -1 ≤ r ≤ +1.

 r = -1 describes perfect negative correlation (negative slope), whereas r = +1 describes perfect positive correlation (positive slope).
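
The definition of r can be checked numerically. The following is a minimal sketch, assuming paired lists of concentrations and signals; the function name and the data are hypothetical and are not taken from these notes.

```python
import math

def correlation_coefficient(x, y):
    """Product-moment correlation coefficient r for paired data."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical calibration data (illustration only)
conc = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
signal = [0.1, 2.1, 3.9, 6.2, 7.8, 10.1]
r = correlation_coefficient(conc, signal)
print(f"r = {r:.4f}")
```

A value of r close to +1 does not by itself prove linearity, so the calibration plot should still be inspected.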

Figure: The product–moment
correlation coefficient, r.

The line of regression of y on x
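 The least-squares straight line is given by:

$$b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\bar{x}$$

 The line determined from these equations is known as the line of regression of y on x, i.e. the line indicating how y varies when x is set to chosen values.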
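
As an illustration, the slope and intercept of the line of regression of y on x can be computed as in the sketch below. The function name and the data are hypothetical; the formulas are the usual least-squares expressions b = Σ(xi − x̄)(yi − ȳ)/Σ(xi − x̄)² and a = ȳ − bx̄.

```python
def regression_line(x, y):
    """Return (a, b): intercept and slope of the least-squares line y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx               # slope
    a = y_mean - b * x_mean     # the line passes through the centroid
    return a, b

# Hypothetical calibration data (illustration only)
conc = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
signal = [0.1, 2.1, 3.9, 6.2, 7.8, 10.1]
a, b = regression_line(conc, signal)
print(f"y = {b:.4f}x + {a:.4f}")
```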

Example:

Calculate the slope and intercept of the regression line for the data

Errors in the slope and Intercept of the regression line
The line of regression calculated is used to estimate:
 the concentrations of test samples by interpolation
 the limit of detection of the analytical procedure
The random errors in the values for the slope and intercept are therefore important and should be calculated.
The random errors in the y-direction are estimated by the statistic sy/x:

$$s_{y/x} = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}}$$

This equation utilizes the y-residuals, $y_i - \hat{y}_i$, where the $\hat{y}_i$ values are the points on the calculated regression line corresponding to the individual x-values, i.e. the ‘fitted’ y-values.

The y-residuals of a regression line

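 Provided with a value for sy/x we can now calculate sb and sa, the standard deviations for the slope (b) and the intercept (a).
 These are given by:

$$s_b = \frac{s_{y/x}}{\sqrt{\sum_i (x_i - \bar{x})^2}}, \qquad s_a = s_{y/x}\sqrt{\frac{\sum_i x_i^2}{n\sum_i (x_i - \bar{x})^2}}$$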
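
The calculation of sy/x, sb and sa can be sketched as follows. This is an illustration under stated assumptions: the function names and data are hypothetical, sy/x uses n − 2 degrees of freedom, and the t-value (2.78 for 4 degrees of freedom at the 95% level) is read from a t-table rather than computed.

```python
import math

def regression_errors(x, y):
    """Return (a, b, s_yx, s_b, s_a) for the least-squares line y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    a = y_mean - b * x_mean
    # y-residuals: observed y minus fitted y
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_yx = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    s_b = s_yx / math.sqrt(sxx)
    s_a = s_yx * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))
    return a, b, s_yx, s_b, s_a

def confidence_limits(value, s, t):
    """Lower and upper limits value -/+ t*s for a tabulated t-value."""
    return value - t * s, value + t * s

# Hypothetical calibration data (illustration only)
conc = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
signal = [0.1, 2.1, 3.9, 6.2, 7.8, 10.1]
a, b, s_yx, s_b, s_a = regression_errors(conc, signal)
t_95 = 2.78  # two-tailed t for n - 2 = 4 degrees of freedom, from a t-table
print("slope limits:", confidence_limits(b, s_b, t_95))
print("intercept limits:", confidence_limits(a, s_a, t_95))
```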

Example
Calculate the standard deviations and confidence limits of the slope and intercept of the regression line calculated in the example above.

Limits of detection
The limit of detection (LOD) of an analyte is the concentration which gives an instrument signal (y) significantly different from the ‘blank’ or ‘background’ signal.
The LOD can be calculated as the analyte concentration giving a signal equal to the blank signal, yB, plus three standard deviations of the blank, sB:

$$y_{LOD} = y_B + 3s_B$$

 Curve A represents the normal distribution of measured values of the blank signal.
 It would be possible to identify a point, y = P, towards the upper edge of this distribution, and claim that a signal greater than this was unlikely to be due to the blank, whereas a signal less than P would be assumed to indicate a blank sample.
 However, for a sample giving an average signal P, 50% of the observed signals will be less than this, since the signal will have a normal distribution (of the same shape as that for the blank) extending below P (curve B).
 The probability of concluding that this sample does not differ from the blank when in fact it does is therefore 50%.
 Point P, which has been called the limit of decision, is thus unsatisfactory as a limit of detection: it guards against claiming the presence of the analyte when it is actually absent, but not against reporting that the analyte is absent when it is in fact present.
 A more suitable point is at y = Q, such that Q is twice as far as P from yB.
 If the distance from yB to Q is 3.28 times the standard deviation of the blank, sB, then the probability of each of the two kinds of error occurring is only 5%.
 If the distance from yB to Q is only 3sB, the probability of each error is about 7%: many analysts would consider that this is a reasonable definition of a limit of detection.
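
The 5% and 7% figures can be reproduced from the standard normal distribution. The sketch below assumes, as in the argument above, that the blank and the sample signals have the same standard deviation and that P lies midway between yB and Q; the function name is hypothetical.

```python
from statistics import NormalDist

def error_probability(k):
    """Probability of each kind of error when Q lies k blank standard
    deviations above yB and the decision point P lies midway (k/2)."""
    # A blank signal exceeds P, or a sample at Q falls below P, with the
    # same probability: the normal tail area beyond k/2 standard deviations.
    return 1.0 - NormalDist().cdf(k / 2.0)

for k in (3.28, 3.0):
    print(f"Q = yB + {k}sB: each error has probability {100 * error_probability(k):.1f}%")
```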

Limit of quantitation (LOQ)
The limit of quantitation (or limit of determination) is regarded as the lower limit for precise quantitative measurements, as opposed to qualitative detection.
LOQ = yB + 10sB
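
The signals yB + 3sB and yB + 10sB can be converted to concentrations through the calibration line y = a + bx. The following is a sketch only: the function name and all numerical values are hypothetical, and the blank signal and its standard deviation are assumed to be known.

```python
def detection_limits(y_blank, s_blank, slope, intercept):
    """Return (c_lod, c_loq): concentrations at the LOD and LOQ signals."""
    y_lod = y_blank + 3 * s_blank    # signal at the limit of detection
    y_loq = y_blank + 10 * s_blank   # signal at the limit of quantitation
    # Invert the calibration line y = intercept + slope * x
    c_lod = (y_lod - intercept) / slope
    c_loq = (y_loq - intercept) / slope
    return c_lod, c_loq

# Hypothetical values (illustration only)
c_lod, c_loq = detection_limits(y_blank=1.52, s_blank=0.43, slope=1.93, intercept=1.52)
print(f"LOD = {c_lod:.2f}, LOQ = {c_loq:.2f} (concentration units)")
```

When the intercept of the line is used as the estimate of yB, as here, the expressions reduce to 3sB/b and 10sB/b.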

The method of standard additions
The complication of matching the matrix of the standards to that of the sample can be avoided by conducting the standardization in the sample itself. This is known as the method of standard additions.

 Equal volumes of the sample solution are taken, each is separately ‘spiked’ with known and different amounts of the analyte, and all are then diluted to the same volume.

The signal is plotted on the y-axis and the x-axis is
graduated in terms of the amounts of analyte added.
The regression line is calculated in the normal way, but
space is provided for it to be extrapolated to the point on
the x-axis at which y = 0.
This negative intercept on the x-axis corresponds to the
amount of the analyte in the test sample.
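
The extrapolation can be sketched in code. For a line y = a + bx the intercept on the x-axis lies at x = −a/b, so the amount of analyte in the test sample is a/b. The function name and the data below are hypothetical.

```python
def standard_additions_amount(added, signal):
    """Amount of analyte in the test sample from a standard-additions plot."""
    n = len(added)
    x_mean = sum(added) / n
    y_mean = sum(signal) / n
    sxx = sum((x - x_mean) ** 2 for x in added)
    b = sum((x - x_mean) * (y - y_mean) for x, y in zip(added, signal)) / sxx
    a = y_mean - b * x_mean
    # The line crosses y = 0 at x = -a/b; its magnitude is the amount sought
    return a / b

# Hypothetical data: amount of analyte added and the measured signal
added = [0.0, 5.0, 10.0, 15.0, 20.0]
signal = [0.32, 0.41, 0.52, 0.60, 0.70]
print(f"analyte in test sample = {standard_additions_amount(added, signal):.1f} (same units as the additions)")
```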

Weighted regression lines
In any calibration analysis the overall random error of the result will arise from a combination of the error contributions from the several stages of the analysis.
A weighted regression is appropriate when the y-direction error in a regression calculation gets larger as the concentration increases.
In a weighted linear regression, each xy-pair’s contribution to the regression line is inversely proportional to the variance of yi;
 that is, the more precise the value of y, the greater its contribution to the regression.

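 If the individual points are denoted by (x1, y1), (x2, y2), etc. as usual, and the corresponding standard deviations are s1, s2, etc., then the individual weights, w1, w2, etc., are given by:

$$w_i = \frac{s_i^{-2}}{\sum_i s_i^{-2} / n}$$

 The slope and the intercept of the regression line are then given by:

$$b_w = \frac{\sum_i w_i x_i y_i - n\bar{x}_w\bar{y}_w}{\sum_i w_i x_i^2 - n\bar{x}_w^2}, \qquad a_w = \bar{y}_w - b_w\bar{x}_w$$

where $\bar{y}_w$ and $\bar{x}_w$ represent the co-ordinates of the weighted centroid:

$$\bar{y}_w = \frac{\sum_i w_i y_i}{n} \quad \text{and} \quad \bar{x}_w = \frac{\sum_i w_i x_i}{n}$$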
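
A weighted regression can be sketched as below, assuming weights proportional to 1/si² and scaled so that they sum to n, which is consistent with the division by n in the weighted centroid. The function name and the data are hypothetical.

```python
def weighted_regression(x, y, s):
    """Return (a_w, b_w) for a weighted line, given the standard deviation s_i of each y_i."""
    n = len(x)
    inv_var = [1.0 / si ** 2 for si in s]
    scale = sum(inv_var) / n
    w = [v / scale for v in inv_var]          # weights, scaled to sum to n
    x_w = sum(wi * xi for wi, xi in zip(w, x)) / n   # weighted centroid
    y_w = sum(wi * yi for wi, yi in zip(w, y)) / n
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) - n * x_w * y_w
    den = sum(wi * xi ** 2 for wi, xi in zip(w, x)) - n * x_w ** 2
    b_w = num / den
    a_w = y_w - b_w * x_w
    return a_w, b_w

# Hypothetical data in which the error grows with concentration
conc = [0.0, 2.0, 4.0, 6.0, 8.0]
absorbance = [0.01, 0.15, 0.31, 0.44, 0.61]
sd = [0.001, 0.004, 0.008, 0.012, 0.016]
a_w, b_w = weighted_regression(conc, absorbance, sd)
print(f"y = {b_w:.4f}x + {a_w:.4f}")
```

When all the si are equal, every weight is 1 and the result is the same as the unweighted line.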
Example:
Calculate the unweighted and weighted regression lines for
the following calibration data. For each line calculate also
the concentrations of test samples with absorbances of
0.100 and 0.600.

The slope and the intercept of the unweighted regression line are calculated as follows:

$$b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$

$$a = \bar{y} - b\bar{x}$$

Slope = 0.0725
Intercept = 0.0133

The regression line equation: y = 0.0725x + 0.0133

The concentrations corresponding to absorbances of
0.100 and 0.600 are then found to be 1.20 and 8.09
μgmL-1 respectively.

The weighted regression line can be calculated as follows.
In the absence of a suitable computer program it is usual to set up a table as follows.

$$\bar{y}_w = 0.1558 / 6 = 0.0260$$

$$\bar{x}_w = 1.372 / 6 = 0.229$$

With $b_w = 0.0738$:

$$a_w = 0.0260 - (0.0738 \times 0.229) = 0.0091$$

The regression line equation: y = 0.0738x + 0.0091

These values for aw and bw can be used to show that absorbance
values of 0.100 and 0.600 correspond to concentrations of 1.23
and 8.01 μgmL-1 respectively.
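
The concentrations quoted for the two lines can be checked by inverting y = a + bx. The sketch below uses the slopes and intercepts stated above (b = 0.0725, a = 0.0133 and bw = 0.0738, aw = 0.0091); the function name is hypothetical.

```python
def concentration(absorbance, slope, intercept):
    """Invert the calibration line y = intercept + slope * x."""
    return (absorbance - intercept) / slope

lines = {
    "unweighted": (0.0725, 0.0133),
    "weighted": (0.0738, 0.0091),
}
for name, (slope, intercept) in lines.items():
    for absorbance in (0.100, 0.600):
        c = concentration(absorbance, slope, intercept)
        print(f"{name}: absorbance {absorbance:.3f} -> {c:.2f}")
```

The two lines give very similar concentrations; the advantage of the weighted line appears mainly in the confidence limits, which are more realistic at low concentrations.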

The median: initial data analysis
The mean (average) is used as the ‘measure of central tendency’ or ‘measure of location’ of a set of results when the (symmetrical) normal distribution is assumed, but in non-parametric statistics the median is usually used instead.

The sign test
 The sign test is amongst the simplest of all non-parametric statistical methods.
 The sign test is used to test hypotheses concerning the median of a continuous distribution.
 Let’s use the symbol θ to represent the median of the distribution.
 Remember that in the case of a normal distribution the mean is equal to the median, and so the sign test can also be used to test hypotheses concerning the mean of a normal distribution.
① Form null and alternative hypotheses and choose a degree of
confidence.
 The null hypothesis is that the median of our population
distribution is equal to a specified median value, and the
alternative hypothesis is that it is different.
 The chosen degree of confidence determines the significance
level, which will be used when deciding whether or not to
reject the null hypothesis.
② Compute a test statistic.
 We count how many of the sample values are greater than or
less than the specified median value.
 We then compute the probability of getting this number (or a
more unlikely one) if the specified median was correct.
 This probability is our test statistic.
③ Compare the test statistic to a critical value.
 For the sign test, our critical value is the chosen significance
level.
 Therefore, if the probability is less than or equal to the
significance level, then we reject the null hypothesis.

Example:
Professor A wanted to test if the contaminant levels in the drug
were better than the government guideline of 50. Suppose that
Professor A has now produced a new batch of the drug and
noticed that the new contaminant level data do not appear to be
normally distributed. In this case, Professor A would need to use
a nonparametric hypothesis test.

The contaminant level data resulting from Professor A’s new
production of the drug are as follows:
45.344, 48.655, 36.199, 54.881, 49.287, 49.336, 53.492,
40.702, 46.318, 31.303
To perform the sign test, we first form our null and alternative hypotheses.
We are interested in whether the population median is less than 50.
 Our null hypothesis is that the median is not less than 50.
 The alternative hypothesis is that the median is less than 50.

Next, we compute our test statistic.
 To do this, we determine whether each sample value is
greater than or less than the specified median of 50.
 Denoting values greater than 50 by “+” and those less
than 50 by “-”, we have: (- - - + - - + - - -). Counting
these up, we have two pluses and eight minuses.
 If the null hypothesis were true (i.e. the median was not
less than 50), what would be the probability that we would
get this result (or a more unlikely one) by chance?
To answer this question, we must introduce the binomial
distribution.
 In probability theory, the binomial distribution indicates the probability of the number of “successes” in n trials, each of which has a probability p of “success”.
The binomial distribution specifies that the probability of r successes in n trials with a chance of success in a single trial of p is:

$$P(r) = \binom{n}{r} p^r (1 - p)^{n - r} = \frac{n!}{r!\,(n - r)!}\, p^r (1 - p)^{n - r}$$

For us, our “trial” consists of asking whether each sample value is greater than or less than the specified median value. We can define “success” as either “+” or “-” (the final result will be the same): we choose to define success as “-”.
What we want to know is the probability of getting r = 8 or more “-” results in n = 10 trials.
If our null hypothesis is true, then the chance of success in a single trial is p = 0.5. Therefore we compute:

$$P(r \geq 8) = \left[\binom{10}{8} + \binom{10}{9} + \binom{10}{10}\right] (0.5)^{10} = \frac{45 + 10 + 1}{1024} = 0.0547$$

This value 0.0547 is our test statistic.
This is telling us the probability of getting 8 or more values that are less than the specified median value of 50 if our null hypothesis were true.
The critical value for a 1-tailed sign test is simply the
significance level 0.05.
Comparing our test statistic with the critical value, we
find that 0.0547 > 0.05, so we cannot reject the null
hypothesis.
This means that Professor A has been unable to show
that the contaminant levels are significantly better than
the government guideline level.
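
The sign test for Professor A's data can be reproduced with a few lines of code. This is a minimal sketch: the function name is hypothetical, values equal to the specified median are ignored, and the probability is the one-tailed binomial tail for the number of “-” signs with p = 0.5.

```python
from math import comb

def sign_test(values, median0):
    """Return (plus, minus, p): sign counts and the one-tailed probability
    of at least `minus` minus signs if the true median were median0."""
    plus = sum(1 for v in values if v > median0)
    minus = sum(1 for v in values if v < median0)
    n = plus + minus   # values equal to median0 are not counted
    p = sum(comb(n, r) for r in range(minus, n + 1)) * 0.5 ** n
    return plus, minus, p

levels = [45.344, 48.655, 36.199, 54.881, 49.287,
          49.336, 53.492, 40.702, 46.318, 31.303]
plus, minus, p = sign_test(levels, 50)
print(plus, minus, round(p, 4))   # 2 8 0.0547
print("reject H0" if p <= 0.05 else "cannot reject H0")
```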

The Wald–Wolfowitz runs test
The Wald–Wolfowitz runs test is a non-parametric method for testing whether a sequence of two kinds of result (for example +ve and -ve signs) occurs in random order.
In some instances we are interested not merely in whether observations generate positive or negative signs, but also in whether these signs occur in a random sequence.
If a straight line is a good fit to a set of calibration points, positive and negative residuals will occur more or less at random.
If instead a straight line is fitted to points that actually lie on a curve, the residuals tend to give, for example, a sequence of +ve signs followed by a sequence of –ve signs, and then another sequence of +ve signs. Such sequences are technically known as runs.
The Wald–Wolfowitz method tests whether the number of runs
is small enough for the null hypothesis of a random
distribution of signs to be rejected.
The number of runs in the experimental data is compared
with the numbers in the Wald–Wolfowitz runs test table, which
refers to the P = 0.05 probability level.
The table is entered by using the appropriate values for N, the
number of +ve signs, and M, the number of -ve signs.
If the experimental number of runs is smaller than the
tabulated value, then the null hypothesis can be rejected.
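
Counting the runs, and the values of N and M needed to enter the table, is easily automated. The sketch below uses a hypothetical sequence of residual signs; the critical value itself must still be read from the runs test table.

```python
from itertools import groupby

def count_runs(signs):
    """Number of runs (maximal blocks of identical consecutive signs)."""
    return sum(1 for _ in groupby(signs))

# Hypothetical residual signs from a calibration plot (illustration only)
residual_signs = "+++----++"
n_plus = residual_signs.count("+")    # N, the number of +ve signs
n_minus = residual_signs.count("-")   # M, the number of -ve signs
runs = count_runs(residual_signs)
print(f"N = {n_plus}, M = {n_minus}, runs = {runs}")
```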

Example

The Wilcoxon signed rank test
 A disadvantage of the sign test is it uses so little of the
information provided.
 Important advances were made by Wilcoxon, and his
signed rank test has several applications. Its
mechanism is best illustrated by an example.
Example:
 The blood lead levels (in pgmL-1) of seven children
were found to be 104, 79, 98, 150, 87, 136 and 101.
Could such data come from a population, assumed to
be symmetrical, with a median/mean of 95 pgmL-1?

The first step of the Wilcoxon signed rank test is to subtract the reference value from each measurement and to calculate the absolute differences.
 On subtraction of the reference concentration (95) the data give the following values:

No.   Pb (pgmL-1)   Diff.   Abs. diff.
1     104             9      9
2      79           -16     16
3      98             3      3
4     150            55     55
5      87            -8      8
6     136            41     41
7     101             6      6

 The absolute differences are then arranged in order of magnitude without their signs.
The next step of the Wilcoxon signed rank test is to sign each rank.
If the original difference is negative then the rank is multiplied by -1; if the difference is positive the rank stays positive (i.e. the signs of the differences are restored to the ranks).

No.   Pb (pgmL-1)   Diff.   Abs. diff.
3      98             3      3
7     101             6      6
5      87            -8      8
1     104             9      9
2      79           -16     16
6     136            41     41
4     150            55     55
 For the Wilcoxon signed rank test we can ignore
cases where the difference is zero.
 For all other cases we assign their relative rank. In
case of tied ranks the average rank is calculated.
 That is if rank 10 and 11 have the same observed
differences both are assigned rank 10.5.

The numbers are then ranked: in this process they keep their signs but are assigned numbers indicating their order (or rank):

No.   Pb (pgmL-1)   Diff.   Abs. diff.   Rank
3      98             3      3            1
7     101             6      6            2
5      87            -8      8           -3
1     104             9      9            4
2      79           -16     16           -5
6     136            41     41            6
4     150            55     55            7

The next step is to calculate the sums of the +ve and –ve ranks.

Sum of +ve ranks = 1 + 2 + 4 + 6 + 7 = 20
Sum of -ve ranks = 3 + 5 = 8

We can double check this knowing that the ranks 1 to n sum to n(n+1)/2:

20 + 8 = 7(7+1)/2
28 = 28
 The lower of these two figures (8) is taken as the test statistic.
 For n = 7, the test statistic must be less than or equal to 2 before the null hypothesis – that the data do come from a population of median (mean) 95 – can be rejected at a significance level of P = 0.05.
 Since the test statistic is 8 the null hypothesis must be retained.
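
The rank sums for the blood lead example can be reproduced as follows. This is a sketch that assumes there are no tied absolute differences (true for these data) and that values equal to the reference are ignored; the function name is hypothetical.

```python
def signed_rank_sums(values, reference):
    """Return (positive_sum, negative_sum) of the ranks of the differences
    from `reference`. Assumes no tied absolute differences."""
    diffs = [v - reference for v in values if v != reference]
    ordered = sorted(diffs, key=abs)   # order of magnitude, signs kept
    positive = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    negative = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    return positive, negative

lead = [104, 79, 98, 150, 87, 136, 101]
positive, negative = signed_rank_sums(lead, 95)
statistic = min(positive, negative)
print(positive, negative, statistic)   # 20 8 8
```

The statistic is then compared with the tabulated critical value (2 for n = 7 at P = 0.05).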
Example:
The following table gives the percentage concentration of zinc, determined by two different methods, for each of eight samples of health food.

Is there any evidence for a systematic difference between the results of the two methods?

If there is no systematic difference between the two methods, then we would expect that the differences between the results for each sample, i.e. (titration result − spectrometry result), should be symmetrically distributed about zero.
The signed differences are:

Sample   EDTA titration   Atomic spectrometry   Signed diff.
1        7.2              7.6                   -0.4
2        6.1              6.8                   -0.7
3        5.2              4.6                    0.6
4        5.9              5.7                    0.2
5        9.0              9.7                   -0.7
6        8.5              8.7                   -0.2
7        6.6              7.0                   -0.4
8        4.4              4.7                   -0.3
Arranging these values in order of magnitude while retaining their signs, we have:

Sample   EDTA titration   Atomic spectrometry   Signed diff.
6        8.5              8.7                   -0.2
4        5.9              5.7                    0.2
8        4.4              4.7                   -0.3
1        7.2              7.6                   -0.4
7        6.6              7.0                   -0.4
3        5.2              4.6                    0.6
2        6.1              6.8                   -0.7
5        9.0              9.7                   -0.7

 The ranking of these results presents an obvious difficulty, that of tied ranks.
 There are two results with the numerical value 0.2, two with a numerical value of 0.4, and two with a numerical value of 0.7.
 This problem is resolved by giving the tied values average ranks, with appropriate signs.
 Thus the ranking for the present data is:

Sample   EDTA titration   Atomic spectrometry   Signed diff.   Tied rank
6        8.5              8.7                   -0.2           -1.5
4        5.9              5.7                    0.2            1.5
8        4.4              4.7                   -0.3           -3
1        7.2              7.6                   -0.4           -4.5
7        6.6              7.0                   -0.4           -4.5
3        5.2              4.6                    0.6            6
2        6.1              6.8                   -0.7           -7.5
5        9.0              9.7                   -0.7           -7.5

The sum of the ranks above (ignoring their signs) is 36, which is the same as the sum of the first eight integers (for the first n integers the sum is n(n+1)/2), and is therefore correct.
 The sum of the positive ranks is 7.5
 The sum of the negative ranks is 28.5
 Therefore, the test statistic is 7.5

For n = 8, the test statistic has to be ≤3 before the null hypothesis can be rejected at the level P = 0.05.
Therefore, the null hypothesis must be retained:
 there is no evidence that the median (mean) of the
difference is not zero, and
 no evidence for a systematic difference between the two
analytical methods.
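
The paired calculation, including the average ranks for tied values, can be sketched as follows. The function names are hypothetical, and the differences are rounded to six decimal places as an assumption, so that floating-point noise (e.g. 7.2 − 7.6 and 6.6 − 7.0) does not hide genuine ties.

```python
def average_ranks(values):
    """Ranks (1-based) of `values`, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1   # mean of positions pos..end, made 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def paired_signed_rank_sums(first, second):
    """Return (positive_sum, negative_sum) for the paired differences first - second."""
    diffs = [round(a - b, 6) for a, b in zip(first, second)]
    diffs = [d for d in diffs if d != 0]   # zero differences are ignored
    ranks = average_ranks([abs(d) for d in diffs])
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return positive, negative

titration = [7.2, 6.1, 5.2, 5.9, 9.0, 8.5, 6.6, 4.4]
spectrometry = [7.6, 6.8, 4.6, 5.7, 9.7, 8.7, 7.0, 4.7]
positive, negative = paired_signed_rank_sums(titration, spectrometry)
print(positive, negative, min(positive, negative))   # 7.5 28.5 7.5
```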
Mann–Whitney U test
A nonparametric test that is appropriate for two-sample unpaired data is the Mann–Whitney U test.
Following our checklist for hypothesis testing, we start
off by forming our hypotheses:
 Null hypothesis: the two populations have identical
distributions.
 Alternative hypothesis: the two populations have
different medians, but otherwise are identical.

To compute our test statistic, we start off by pooling both samples (which are of sizes nc and nt for the control and test data, respectively) into a single large sample.
We rank the data values in the large sample from 1,...,nc + nt and then calculate the sums of the ranks from each individual sample.
We use Rt to denote the sum of the ranks of the test sample and Rc the sum of the ranks of the control sample.
Ut and Uc are calculated using the following formulae:

$$U_c = n_c n_t + \frac{n_c(n_c + 1)}{2} - R_c$$

$$U_t = n_c n_t + \frac{n_t(n_t + 1)}{2} - R_t$$

The lower value of Ut or Uc is then compared with a critical
value from a table.
If the calculated value is lower than the critical value, then we
reject the null hypothesis.

Example:
A study is investigating potential links between diet and physical development. Height and weight data have been gathered from two cohorts: one of subjects who had suffered from malnutrition in childhood (cohort A) and one of subjects who had not (cohort B).

 The research team wishes to determine if the heights of the subjects in cohorts A and B are different.
 Based on other findings of the study, the team has a good reason to doubt that all the populations from which these samples were drawn are normally distributed.
 Therefore, as we have unpaired data (i.e. the subjects in cohorts A and B are different), we will use the Mann–Whitney U test.
 To perform the test, both samples are pooled and ranked from lowest to highest.

 Next, sum up the ranks for cohorts A and B.
 Denoting cohort B as the control sample and cohort A as the test sample, we use the computed rank sums Rc = 35.5 and Rt = 55.5, and the sample sizes nc = 6 and nt = 7:

$$U_c = (6 \times 7) + \frac{6 \times 7}{2} - 35.5 = 27.5$$

$$U_t = (6 \times 7) + \frac{7 \times 8}{2} - 55.5 = 14.5$$

As a check:
Uc + Ut = nc × nt
Uc + Ut = 27.5 + 14.5 = 42
nc × nt = 6 × 7 = 42
We take Ut = 14.5 as our test statistic (the lower of these two values).
From the table, we find that our critical value is 6.
As 14.5 is not less than 6, we cannot reject the null hypothesis,
 meaning that the team cannot conclude with 95% confidence that the heights of the two cohorts are different.
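
The U values can be reproduced from the rank sums quoted above. In the sketch below the function names are hypothetical; `rank_sums` shows how Rc and Rt could be obtained from raw data (with average ranks for ties), and `mann_whitney_u` uses U = nc nt + n(n + 1)/2 − R for each sample, the form that reproduces Uc = 27.5 and Ut = 14.5.

```python
def rank_sums(control, test):
    """Return (R_c, R_t): rank sums of each sample within the pooled data."""
    pooled = [(v, "c") for v in control] + [(v, "t") for v in test]
    pooled.sort(key=lambda item: item[0])
    sums = {"c": 0.0, "t": 0.0}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # average rank for tied values
        for k in range(i, j + 1):
            sums[pooled[k][1]] += mean_rank
        i = j + 1
    return sums["c"], sums["t"]

def mann_whitney_u(r_c, r_t, n_c, n_t):
    """Return (U_c, U_t) from the rank sums and sample sizes."""
    u_c = n_c * n_t + n_c * (n_c + 1) / 2 - r_c
    u_t = n_c * n_t + n_t * (n_t + 1) / 2 - r_t
    return u_c, u_t

# Rank sums and sample sizes quoted in the example above
u_c, u_t = mann_whitney_u(r_c=35.5, r_t=55.5, n_c=6, n_t=7)
print(u_c, u_t, min(u_c, u_t))   # 27.5 14.5 14.5
```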