NGUYEN Hong Diep, Ph.D. thesis, 2011
Speciality: Computer Science
18 January 2011
Contents

1 Introduction
1.1 Motivation
1.2 Description of the document
1.3 Interval arithmetic
1.3.1 Operations
1.3.2 Algebraic properties
1.3.3 Absolute value properties
1.3.4 Midpoint and radius properties
1.3.5 Variable dependency
1.3.6 Wrapping effect
1.3.7 Implementation
1.3.8 Implementation environment
1.4 Conclusion
4 Relaxation method
4.1 Introduction
4.2 Interval improvement: Jacobi and Gauss-Seidel iterations
4.2.1 Convergence rate
4.2.2 Accuracy
4.2.3 Initial solution of the relaxed system
4.3 Implementation and numerical experiments
4.3.1 Implementation
4.3.2 Numerical results
4.4 Conclusion
Chapter 1
Introduction
This chapter presents the general context and the motivation of my Ph.D.
preparation. It gives a brief introduction to interval arithmetic together with
its advantages as well as a discussion on the main obstacles to its application.
Interval arithmetic always provides reliable results but suffers from low perfor-
mance with respect to floating-point arithmetic. Moreover, a naive application
of interval arithmetic does not provide a sharp enclosure of the exact result, or
can even fail to provide guaranteed results. In this chapter, we present some
properties of interval arithmetic, which will serve as the basis to derive efficient
implementations of interval algorithms in the following chapters.
1.1 Motivation
Floating-point arithmetic is used for scientific computing purposes and allows accurate results to be obtained at very high speed. In many cases, floating-point arithmetic provides results whose quality is satisfactory for the user's needs. However, because of the well-known effect of rounding errors, computed results are rarely exact. Since each number is encoded
by a finite number of digits, the set of floating-point numbers is indeed discrete and thus
cannot represent all real values. Meanwhile, the result of a mathematical operation between real or floating-point numbers is a real value which, in many cases, is not representable by a floating-point number. Thus the computed result takes as its value a floating-point number close to the exact result. More precisely, the computed result is rounded to a nearby floating-point
value. The difference between the floating-point value and the exact value is called the
rounding error.
These computed results are indeed approximate results. To reduce the effect of round-
ing errors, one can increase the computing precision. The higher the computing precision
is, the smaller the rounding errors are. However, increasing the computing precision means increasing the storage space as well as significantly increasing the computing time. Therefore, a fixed finite precision is usually chosen to gain performance. The most common formats are IEEE single and double precision. Using IEEE double precision, which is encoded on 64 bits with 1 sign bit, 11 exponent bits, and 52 mantissa bits, the effect of rounding errors is negligible in many applications. Nevertheless, in some cases, it is necessary to assess how accurate the computed results are.
There exist a number of techniques for assessing the effect of rounding errors. Firstly,
an a priori error bound on the computed result can be obtained by performing numerical analysis. This error bound is based on perturbation theory [Hig02], and is usually expressed using the condition number, which measures the sensitivity of the result to small changes in the inputs. Nevertheless, when the problem is ill-conditioned, these condition-based error bounds are usually pessimistic and give only a rough idea of how accurate the computed results can be expected to be, regardless of which computation model is employed. In some cases, a rigorous assessment of the result accuracy is needed.
Another option is running error analysis, which yields an a posteriori error bound simultaneously with the computed approximate result [Hig02, Wil60, Lan04]. Running error computation is based on the rounding error model $|x \circ y - \mathrm{fl}(x \circ y)| \le \varepsilon\,|\mathrm{fl}(x \circ y)|$, where $\varepsilon$ is the machine relative error. Running error computation uses the actual computed intermediate quantities, therefore a running error bound is usually smaller than an a priori error bound. However, running error computation is specific to each problem: each individual problem needs its own model for computing running errors. This is a disadvantage when the problem in question is of high complexity.
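To make the running error model concrete, here is a minimal sketch (not taken from the thesis; the function and variable names are assumptions) of a dot product that accumulates a running error bound alongside its result, in MatLab:

function [s, mu] = dot_with_running_error(x, y)
    % Running error bound for a dot product, based on the model
    % |x o y - fl(x o y)| <= u * |fl(x o y)|, using only computed quantities.
    u = eps/2;                          % unit roundoff of IEEE double precision, 2^-53
    s = 0; mu = 0;
    for k = 1:numel(x)
        p  = x(k) * y(k);               % rounded product
        s  = s + p;                     % rounded partial sum
        mu = mu + u*abs(p) + u*abs(s);  % bound on the two roundings just performed
    end
end

On output, |s - exact dot product| is bounded by mu up to second-order terms; a fully rigorous bound would evaluate mu with upward rounding.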
Interval arithmetic is designed to automate this process of computing reliable results, as stated in the dissertation of R. E. Moore entitled "Interval Arithmetic and Automatic Error Analysis in Digital Computing" [Moo63], which is among the best-known pioneering works that introduced interval arithmetic in the early 1960s. Interval arithmetic does not aim directly at
improving the accuracy of computations, but at providing a certificate of accuracy. The
fundamental principle of interval arithmetic is to enclose all quantities which are not known
exactly during computations. Using intervals to represent data, and interval arithmetic to
perform computations, one obtains not an approximation but a range of values in which
lies the exact result. With this enclosure, one gets a certificate of accuracy, i.e. a bound
on the distance between the exact result and the computed result. This computed result
is said to be verified in the sense that a value which is not included in the computed range
cannot be the exact result.
Interval arithmetic always provides verified results. Nevertheless, if the computed en-
closure is too large, then the provided certificate is of no use. Therefore, accuracy is of
great concern when developing numerical methods that yield reliable results using interval
arithmetic.
Interval arithmetic makes it possible to certify computed results. Meanwhile, arbitrary precision makes it possible to increase the accuracy of computations by reducing the impact of rounding errors. The combination of these two components provides accurate results accompanied by a sharp error bound.
On the one hand, having to compute an enclosure of all possible values, interval arithmetic suffers from low performance. The theoretical ratio of execution time between interval arithmetic and floating-point arithmetic is 4, as can be seen from the formula for interval multiplication. In practice, this factor is much higher if care is not taken in the implementation. Moreover, using arbitrary precision also introduces an important overhead in execution time. Although interval arithmetic is able to provide reliable results endowed with a certificate of accuracy, it is not of much use if the computation time is too long.
On the other hand, for many problems which have been solved efficiently in floating-point arithmetic, simply replacing floating-point operations by their interval counterparts does not lead to an efficient algorithm: it usually does not provide results of high accuracy, and it can even fail to provide guaranteed results.
1.2 Description of the document
Therefore, the aim of my Ph.D. work was to obtain efficient implementations of interval arithmetic for some numerical problems. "Efficient" means computing enclosures of small relative width at the cost of a reasonable execution time, that is, within a modest constant factor with respect to computing approximate results using floating-point arithmetic. Additionally, increasing the accuracy is likely to increase the computation time; hence, in some cases, a trade-off between accuracy and speed is mandatory.
As a first step toward a general methodology for obtaining efficient implementations of interval arithmetic, the three years of my Ph.D. yielded techniques that improve the efficiency of interval arithmetic for several problems in numerical linear algebra, as well as some new trade-offs between accuracy and speed.
In the context of this Ph.D., we propose several accurate algorithms and efficient implemen-
tations in verified linear algebra using interval arithmetic. Two fundamental problems are
addressed, namely the multiplication of interval matrices and the verification of floating-
point solutions of linear systems.
For the multiplication of interval matrices, we propose two algorithms which are based
respectively on endpoints and midpoint-radius representations of intervals [NR10, Ngu].
These algorithms offer new trade-offs between speed and accuracy. Indeed, they are less accurate but much faster than the natural algorithm: the speedup is of order 10 for matrices of dimension 100 × 100 and may reach order 100 for matrices of dimension 1000 × 1000. On the other hand, they are slower than the fastest algorithm (proposed by S. M. Rump) by a factor of about 2 in execution time, but they provide sharper enclosures of the exact results, with an overestimation factor less than 1.18, compared to 1.5 for the fastest algorithm.
To verify the floating-point solution of a linear system, we adapt floating-point iterative
refinements to interval arithmetic [NR]. By using alternately floating-point and interval
refinements, our method provides an accurate solution together with a tight error bound on it. To reduce the slowdown due to the intensive use of interval operations, we intro-
duce a relaxation technique, which helps to reduce drastically the algorithm’s execution
time. Additionally, the use of extended precision for the approximate solution yields so-
lutions which are guaranteed to almost the last bit at a cost of an acceptable overhead of
execution time.
This document is organized as follows: the rest of this first chapter briefly introduces
interval arithmetic together with some of its properties. A more thorough introduction to
interval arithmetic and its applications can be found in [Moo63, MB79, MKC09, JKDW01,
Neu90, HKKK08]. The second chapter presents algorithms for implementing the multipli-
cation of interval matrices. The third chapter is devoted to the underlying approach to
verify a linear system of equations. Chapter 4 presents the relaxation technique to improve the performance of the algorithm presented in Chapter 3. Chapter 5 studies the effect of the computing precision on the accuracy of computed results.
1.3 Interval arithmetic
1.3.1 Operations
An interval algebraic operation is defined by taking all the possible results of the corre-
sponding operation, applied to all the possible pairs of real values taken from the input
intervals. Interval operations must always obey the fundamental property of isotonicity, regardless of representation or implementation:
$$\forall a \in \mathbf{a},\ \forall b \in \mathbf{b},\ \ldots \qquad f(a, b, \ldots) \in f(\mathbf{a}, \mathbf{b}, \ldots).$$
Convex hull

The convex hull of a set of real values is the smallest interval which includes all these values. Let $S$ be a set of real values; the convex hull of $S$, denoted by $\square S$, is defined by
$$\square S = [\min(S), \max(S)].$$
Additionally, let $\mathbf{a} = [\underline{a}, \overline{a}] \in \mathbb{IR}$ and $\mathbf{b} = [\underline{b}, \overline{b}] \in \mathbb{IR}$ be two intervals; the hull of these two intervals is defined as
$$\square(\mathbf{a}, \mathbf{b}) = [\min(\underline{a}, \underline{b}),\ \max(\overline{a}, \overline{b})].$$
Unary operations

For $\mathbf{a} = [\underline{a}, \overline{a}] \in \mathbb{IR}$, a unary operation $f$ on $\mathbf{a}$ is defined as
$$f(\mathbf{a}) = \{ f(a) \mid a \in \mathbf{a} \}.$$
Table 1.1 lists some unary interval operations; the formulae are established using the monotonicity of the operations whenever possible.

It is easy to prove the isotonicity of these operations. Let us take for example the case of the square operation. For all $a \in \mathbf{a}$, we have $\underline{a} \le a \le \overline{a}$:
• if $\underline{a} \ge 0$ then $0 \le \underline{a} \le a \le \overline{a} \Rightarrow 0 \le \underline{a}^2 \le a^2 \le \overline{a}^2$, hence $a^2 \in [\underline{a}^2, \overline{a}^2]$;
• if $\overline{a} \le 0$ then $\underline{a} \le a \le \overline{a} \le 0 \Rightarrow 0 \le \overline{a}^2 \le a^2 \le \underline{a}^2$, hence $a^2 \in [\overline{a}^2, \underline{a}^2]$;
• if $\underline{a} < 0 < \overline{a}$: because $\underline{a} \le a \le \overline{a}$, we have $|a| \le \max(|\underline{a}|, |\overline{a}|) \Rightarrow a^2 \le \max(\underline{a}^2, \overline{a}^2)$. It is always true that $a^2 \ge 0$, hence $a^2 \in [0, \max(\underline{a}^2, \overline{a}^2)]$. Furthermore, as $0$, $\underline{a}$ and $\overline{a}$ belong to $\mathbf{a}$, the values $0$, $\underline{a}^2$ and $\overline{a}^2$ belong to $\mathbf{a}^2$, hence $[0, \max(\underline{a}^2, \overline{a}^2)]$ is the tightest enclosure.
Binary operations

Let $\mathbf{a} = [\underline{a}, \overline{a}] \in \mathbb{IR}$ and $\mathbf{b} = [\underline{b}, \overline{b}] \in \mathbb{IR}$ be two intervals. A binary operation $\circ \in \{+, -, *, /\}$ between $\mathbf{a}$ and $\mathbf{b}$ is defined as
$$\mathbf{a} \circ \mathbf{b} = \{ a \circ b \mid a \in \mathbf{a},\ b \in \mathbf{b} \}.$$
Addition

Due to its monotonicity, the addition of two intervals can be computed by adding their corresponding endpoints:
$$\mathbf{a} + \mathbf{b} = [\underline{a} + \underline{b},\ \overline{a} + \overline{b}].$$
The isotonicity of the addition is directly deduced from its monotonicity.

Subtraction

Similarly to the addition, the subtraction of two intervals is computed by
$$\mathbf{a} - \mathbf{b} = [\underline{a} - \overline{b},\ \overline{a} - \underline{b}].$$
Multiplication

The multiplication of two intervals is formed by taking the minimum and maximum values of the four products of their endpoints:
$$\mathbf{a} * \mathbf{b} = [\min(\underline{a}\underline{b}, \underline{a}\overline{b}, \overline{a}\underline{b}, \overline{a}\overline{b}),\ \max(\underline{a}\underline{b}, \underline{a}\overline{b}, \overline{a}\underline{b}, \overline{a}\overline{b})].$$
Therefore, an interval multiplication requires four punctual products and six comparisons. Alternatively, one can reduce the number of punctual products and comparisons by inspecting the signs of the input intervals. For example, if $\underline{a} \ge 0$ then
$$\mathbf{a} * \mathbf{b} = [\min(\underline{a}\underline{b}, \overline{a}\underline{b}),\ \max(\underline{a}\overline{b}, \overline{a}\overline{b})].$$
The worst case is when $\underline{a} < 0 < \overline{a}$ and $\underline{b} < 0 < \overline{b}$; in this case we say that $\mathbf{a}$ and $\mathbf{b}$ contain $0$ in their interior. Then
$$\mathbf{a} * \mathbf{b} = [\min(\underline{a}\overline{b}, \overline{a}\underline{b}),\ \max(\underline{a}\underline{b}, \overline{a}\overline{b})];$$
all four punctual products still have to be computed, but only two comparisons are needed instead of six. This explains why the cost of interval arithmetic is about 4 times the cost of real arithmetic.
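As an illustration, a direct transcription of the endpoint formula (rounding ignored; the function name is an assumption) is:

function [cl, cu] = imul(al, au, bl, bu)
    % Product of a = [al, au] and b = [bl, bu]: min and max of the four
    % cross products of the endpoints.
    p  = [al*bl, al*bu, au*bl, au*bu];
    cl = min(p);
    cu = max(p);
end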
Note that, unlike for real numbers, the square of an interval is not equivalent to the multiplication of the interval by itself. For $\mathbf{a} \in \mathbb{IR}$, it is always true that $\mathbf{a}^2 \subseteq \mathbf{a} * \mathbf{a}$, and equality holds when $\mathbf{a}$ does not contain $0$ in its interior. For example, if $\mathbf{a} = [1, 2]$ then $\mathbf{a}^2 = [1, 4]$ and $\mathbf{a} * \mathbf{a} = [1, 4]$, hence $\mathbf{a}^2 = \mathbf{a} * \mathbf{a}$. Meanwhile, if $\mathbf{a} = [-1, 2]$ then by definition $\mathbf{a}^2 = [0, 4]$, whereas $\mathbf{a} * \mathbf{a} = [-2, 4]$.
Division

Interval division is defined via multiplication and the inverse (see Table 1.1). Let $\mathbf{a}, \mathbf{b} \in \mathbb{IR}$ be two intervals; then
$$\mathbf{a} / \mathbf{b} = \mathbf{a} * (1/\mathbf{b}).$$
Set operations

Intervals are connected closed sets of real values, hence some set operations apply to intervals.

Infimum/supremum

The infimum and supremum of an interval are respectively its lower and upper endpoints. For $\mathbf{a} = [\underline{a}, \overline{a}] \in \mathbb{IR}$,
$$\inf(\mathbf{a}) = \underline{a}, \qquad \sup(\mathbf{a}) = \overline{a}.$$
Maximal magnitude value, denoted by $\operatorname{mag}(\cdot)$ or $|\cdot|$, is the maximal absolute value of all real numbers taken from the interval. For $\mathbf{a} = [\underline{a}, \overline{a}] \in \mathbb{IR}$,
$$\operatorname{mag}(\mathbf{a}) = \max\{|a| : a \in \mathbf{a}\} = \max(|\underline{a}|, |\overline{a}|).$$
Alternatively, the maximal magnitude value can be computed as $\operatorname{mag}(\mathbf{a}) = \max(-\underline{a}, \overline{a})$.

Proof. We prove that $\max(-\underline{a}, \overline{a}) = \max(|\underline{a}|, |\overline{a}|)$. There are three possible cases:

Minimal magnitude value, denoted by $\operatorname{mig}(\cdot)$ or $\langle\cdot\rangle$, is the minimal absolute value of all real numbers taken from the interval. For $\mathbf{a} = [\underline{a}, \overline{a}] \in \mathbb{IR}$,
$$\operatorname{mig}(\mathbf{a}) = \min\{|a| : a \in \mathbf{a}\} = \begin{cases} 0 & \text{if } \underline{a} < 0 < \overline{a}, \\ \min(|\underline{a}|, |\overline{a}|) & \text{otherwise.} \end{cases}$$
We can also compute the minimal magnitude value by the following proposition.
The proof is by inspecting the same three cases as in the proof of Proposition 1.3.1.
Midpoint
$$\operatorname{mid}(\mathbf{a}) = \frac{1}{2}(\underline{a} + \overline{a}).$$

Radius
$$\operatorname{rad}(\mathbf{a}) = \frac{1}{2}(\overline{a} - \underline{a}).$$
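A minimal sketch of these descriptors for an interval a = [al, au] (anonymous functions with assumed names, rounding ignored) is:

imid = @(al, au) (al + au) / 2;            % midpoint
irad = @(al, au) (au - al) / 2;            % radius
imag = @(al, au) max(abs(al), abs(au));    % maximal magnitude, mag(a)
imig = @(al, au) max(max(al, -au), 0);     % minimal magnitude, mig(a)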
2. commutativity: $\mathbf{a} + \mathbf{b} = \mathbf{b} + \mathbf{a}$ and $\mathbf{a} * \mathbf{b} = \mathbf{b} * \mathbf{a}$,
4. sub-distributivity: $\mathbf{a} * (\mathbf{b} + \mathbf{c}) \subseteq \mathbf{a} * \mathbf{b} + \mathbf{a} * \mathbf{c}$.

Proof. (1), (2), (3), (5) can be deduced directly from the properties of the real absolute value. Let us prove (4). Let $\hat{a} \in \mathbf{a}$, $\hat{b} \in \mathbf{b}$ be such that $|\hat{a}| = \langle\mathbf{a}\rangle$ and $|\hat{b}| = \langle\mathbf{b}\rangle$. It is obvious that $\hat{a} \pm \hat{b} \in \mathbf{a} \pm \mathbf{b}$. Therefore $\langle\mathbf{a} \pm \mathbf{b}\rangle \le |\hat{a} \pm \hat{b}| \le |\hat{a}| + |\hat{b}| \le \langle\mathbf{a}\rangle + \langle\mathbf{b}\rangle$, which implies (4).

Note that the following inequality is not always true: $\langle\mathbf{a}\rangle - \langle\mathbf{b}\rangle \le \langle\mathbf{a} \pm \mathbf{b}\rangle$. For example, if $\mathbf{a} = [1, 2]$ and $\mathbf{b} = [-1, 2]$ then $\langle\mathbf{a}\rangle - \langle\mathbf{b}\rangle = 1 - 0 = 1$, while $\langle\mathbf{a} + \mathbf{b}\rangle = \langle[0, 4]\rangle = 0$; hence $\langle\mathbf{a}\rangle - \langle\mathbf{b}\rangle > \langle\mathbf{a} + \mathbf{b}\rangle$. On the other hand, if $\underline{a} \ge 0$ and $\underline{b} \ge 0$ then $\langle\mathbf{a} + \mathbf{b}\rangle = \underline{a} + \underline{b} = \langle\mathbf{a}\rangle + \langle\mathbf{b}\rangle \ge \langle\mathbf{a}\rangle - \langle\mathbf{b}\rangle$.
Proof.
• If $0 \in \mathbf{a}$ and $0 \notin \mathbf{b}$, then $\langle\mathbf{a}\rangle = 0$. Moreover, $(\operatorname{mid}(\mathbf{a}) + \operatorname{rad}(\mathbf{a})) * \operatorname{mid}(\mathbf{b}) \in \mathbf{a} * \mathbf{b}$ and $(\operatorname{mid}(\mathbf{a}) - \operatorname{rad}(\mathbf{a})) * \operatorname{mid}(\mathbf{b}) \in \mathbf{a} * \mathbf{b}$, hence
$$\operatorname{rad}(\mathbf{a} * \mathbf{b}) \ge \tfrac{1}{2}\,|(\operatorname{mid}(\mathbf{a}) + \operatorname{rad}(\mathbf{a})) * \operatorname{mid}(\mathbf{b}) - (\operatorname{mid}(\mathbf{a}) - \operatorname{rad}(\mathbf{a})) * \operatorname{mid}(\mathbf{b})| \ge |\operatorname{rad}(\mathbf{a}) * \operatorname{mid}(\mathbf{b})| = \operatorname{rad}(\mathbf{a}) * |\operatorname{mid}(\mathbf{b})|.$$
As $|\operatorname{mid}(\mathbf{b})| = \langle\mathbf{b}\rangle + \operatorname{rad}(\mathbf{b})$,
$$\operatorname{rad}(\mathbf{a} * \mathbf{b}) \ge \operatorname{rad}(\mathbf{a}) * (\langle\mathbf{b}\rangle + \operatorname{rad}(\mathbf{b})) \ge \operatorname{rad}(\mathbf{a}) * \operatorname{rad}(\mathbf{b}) + \operatorname{rad}(\mathbf{a}) * \langle\mathbf{b}\rangle + \langle\mathbf{a}\rangle * \operatorname{rad}(\mathbf{b}).$$
• The case $0 \notin \mathbf{a}$ and $0 \in \mathbf{b}$ is similar, as $\mathbf{a}$ and $\mathbf{b}$ play symmetric roles.
• If $0 \notin \mathbf{a}$ and $0 \notin \mathbf{b}$: again, $|\operatorname{mid}(\mathbf{a})| = \langle\mathbf{a}\rangle + \operatorname{rad}(\mathbf{a})$ and $|\operatorname{mid}(\mathbf{b})| = \langle\mathbf{b}\rangle + \operatorname{rad}(\mathbf{b})$. Both $(\operatorname{mid}(\mathbf{a}) + \alpha) * (\operatorname{mid}(\mathbf{b}) + \beta)$ and $(\operatorname{mid}(\mathbf{a}) - \alpha) * (\operatorname{mid}(\mathbf{b}) - \beta)$ are included in $\mathbf{a} * \mathbf{b}$, hence, thanks to the sign conditions imposed on $\alpha$ and $\beta$,
$$\begin{aligned}
\operatorname{rad}(\mathbf{a} * \mathbf{b}) &\ge \tfrac{1}{2}\,|(\operatorname{mid}(\mathbf{a}) + \alpha) * (\operatorname{mid}(\mathbf{b}) + \beta) - (\operatorname{mid}(\mathbf{a}) - \alpha) * (\operatorname{mid}(\mathbf{b}) - \beta)| \\
&\ge |\operatorname{mid}(\mathbf{a}) * \beta + \alpha * \operatorname{mid}(\mathbf{b})| \\
&\ge |\operatorname{mid}(\mathbf{a})| * |\beta| + |\alpha| * |\operatorname{mid}(\mathbf{b})| \\
&\ge (\langle\mathbf{a}\rangle + \operatorname{rad}(\mathbf{a})) * \operatorname{rad}(\mathbf{b}) + \operatorname{rad}(\mathbf{a}) * (\langle\mathbf{b}\rangle + \operatorname{rad}(\mathbf{b})) \\
&\ge \operatorname{rad}(\mathbf{a}) * \operatorname{rad}(\mathbf{b}) + \langle\mathbf{a}\rangle * \operatorname{rad}(\mathbf{b}) + \operatorname{rad}(\mathbf{a}) * \langle\mathbf{b}\rangle.
\end{aligned}$$
$$\mathbf{x}^2 - 2\mathbf{x}, \quad \text{where } \mathbf{x} = [1, 2].$$
Performing interval operations, we have $\mathbf{x}^2 = [1, 4]$, $2\mathbf{x} = [2, 4]$, and finally $\mathbf{x}^2 - 2\mathbf{x} = [-3, 2]$.
Meanwhile, if we rewrite the above expression as $(\mathbf{x} - 1)^2 - 1$, then interval evaluation gives $(\mathbf{x} - 1)^2 - 1 = [0, 1] - 1 = [-1, 0]$, which is exactly the range of the expression over $[1, 2]$.
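This dependency effect can be observed directly with IntLab (assuming the library is installed); the rewritten expression yields the exact range:

x  = infsup(1, 2);
r1 = x^2 - 2*x;        % naive form: [-3, 2], overestimated
r2 = (x - 1)^2 - 1;    % rewritten form: [-1, 0], the exact range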
Example 2. Let $\mathbf{x} = ([4, 6], [-1, 1]) \in \mathbb{IR}^2$ be a box whose coordinates are the two intervals $\mathbf{x}_1 = [4, 6]$ and $\mathbf{x}_2 = [-1, 1]$. This Cartesian product of intervals is depicted by the black box in Figure 1.1.
Let us now rotate $\mathbf{x}$ by the rotation matrix $A = \begin{pmatrix} 4/5 & -3/5 \\ 3/5 & 4/5 \end{pmatrix}$ and then rotate it back by the inverse matrix $A^{-1} = \begin{pmatrix} 4/5 & 3/5 \\ -3/5 & 4/5 \end{pmatrix}$.
[Figure 1.1: the original box x (black), the enclosure x′ of its image after the first rotation (gray), the exact image (dotted), and the final enclosure x′′ after rotating back (white); axes x₁ and x₂.]
In theory, $\mathbf{x}$ should not change after these two rotations. However, this computation performed in interval arithmetic gives as result a larger interval vector $\mathbf{x}''$ which includes $\mathbf{x}$ in its interior. After the first rotation, we get an interval vector $\mathbf{x}'$, represented by the gray box in Figure 1.1:
$$\mathbf{x}' = \begin{pmatrix} 4/5 & -3/5 \\ 3/5 & 4/5 \end{pmatrix} \begin{pmatrix} [4, 6] \\ [-1, 1] \end{pmatrix} = \begin{pmatrix} [13/5, 27/5] \\ [8/5, 22/5] \end{pmatrix}.$$
This box $\mathbf{x}'$ wraps the exact image of $\mathbf{x}$ (dotted lines) in a box whose sides are parallel to the axes. After the second rotation, we get the final result, represented by the white box:
$$\mathbf{x}'' = \begin{pmatrix} 4/5 & 3/5 \\ -3/5 & 4/5 \end{pmatrix} \begin{pmatrix} [13/5, 27/5] \\ [8/5, 22/5] \end{pmatrix} = \begin{pmatrix} [76/25, 174/25] \\ [-49/25, 49/25] \end{pmatrix}.$$
One can see in Figure 1.1 that $\mathbf{x}''$ is much larger than $\mathbf{x}$. The factor of overestimation of $\mathbf{x}''$ with respect to $\mathbf{x}$ is $(49/25, 49/25)$, which means that each side of the original box has been expanded by a factor of nearly 2.
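The two rotations of this example can be reproduced with IntLab (assuming the library is installed); the computed boxes match the values above up to the rounding of 3/5 and 4/5:

A  = [4/5 -3/5; 3/5 4/5];            % rotation matrix
Ai = [4/5 3/5; -3/5 4/5];            % its inverse
x  = [infsup(4, 6); infsup(-1, 1)];  % original box
xp  = A  * x;                        % roughly [13/5, 27/5] x [8/5, 22/5]
xpp = Ai * xp;                       % roughly [76/25, 174/25] x [-49/25, 49/25]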
1.3.7 Implementation
Interval arithmetic is implemented on computers using floating-point arithmetic. The most common representation of intervals uses their endpoints, each endpoint being encoded by one floating-point number. We therefore obtain a new set of intervals whose endpoints are floating-point numbers:
$$\mathbb{IF} = \{ [\underline{a}, \overline{a}] \in \mathbb{IR} : \underline{a}, \overline{a} \in \mathbb{F} \}.$$
Using floating-point arithmetic, we have to handle the problem of rounding errors.
In the presence of rounding errors, the exactness of endpoint computations is no longer
guaranteed. Let us consider, for example, the addition of two intervals. The result’s upper
bound is, by definition, the sum of the upper bounds of the two summands, which is a
real number and in most cases not representable by a floating-point number. Hence, to
guarantee the isotonicity property, we adapt the definition to operations on IF.
Definition 1.3.6. For $\mathbf{x} \in \mathbb{IF}$, $\mathbf{y} \in \mathbb{IF}$, the result of an operation $\mathbf{x} \circ \mathbf{y}$, $\circ \in \{+, -, *, /\}$, is the smallest interval $\mathbf{z} \in \mathbb{IF}$ containing
$$\{ x \circ y \mid x \in \mathbf{x},\ y \in \mathbf{y} \}.$$
For arithmetic operations, directed rounding modes allow a rigorous and tight enclosure of the exact result interval to be obtained. Let $\mathbf{a} = [\underline{a}, \overline{a}] \in \mathbb{IF}$ and $\mathbf{b} = [\underline{b}, \overline{b}] \in \mathbb{IF}$. Using directed rounding modes, the basic interval arithmetic operations can easily be implemented.

Addition
$$\mathbf{a} \oplus \mathbf{b} = [\nabla(\underline{a} + \underline{b}),\ \Delta(\overline{a} + \overline{b})].$$
The isotonicity of this operation is straightforward: for all $a \in \mathbf{a}$, $b \in \mathbf{b}$,
$$\nabla(\underline{a} + \underline{b}) \le \underline{a} + \underline{b} \le a + b \le \overline{a} + \overline{b} \le \Delta(\overline{a} + \overline{b}).$$
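A minimal sketch of this addition (assumed function name), using IntLab's setround to switch the rounding direction of the processor, is:

function [cl, cu] = iadd(al, au, bl, bu)
    setround(-1); cl = al + bl;    % lower bound, rounded downward
    setround(1);  cu = au + bu;    % upper bound, rounded upward
    setround(0);                   % restore rounding to nearest
end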
Implementation issues
In IEEE 754-1985 processors, i.e. on most processors including IA-32 [Int07], changing the
rounding mode of floating-point operations is done by changing the value of some global
register. This requires flushing the floating-point pipeline: all the floating-point operations
initiated beforehand must be terminated before performing the new ones with the new
rounding mode. Therefore changing rounding mode frequently as required to implement
interval arithmetic will lead to a dramatic slowdown in performance: in practice, the
computation is slowed down compared to its floating-point counterpart, by a factor between
20 and 50 [Rum99a].
This problem disappears with more recent instruction sets, which allow the rounding mode to be changed on a per-instruction basis without penalty [MBdD+10]. Unfortunately, this feature can be accessed only from low-level programming languages, by manipulating the instruction word directly, and not from current high-level languages, which are designed to comply with the IEEE 754-1985 [IEE85] standard. The new IEEE 754-2008 [IEE08] standard for floating-point arithmetic corrects this issue, but it may take some time before it is fully supported.
Moreover, the inclusion property of interval arithmetic usually requires obtaining the maximal and minimal values of intermediate results, e.g. for interval multiplication or square. These maximum and minimum functions themselves require at least a comparison followed by the selection of the right value. They are usually coded by a test and a branch or, on some particular processors such as the CELL processor, by a comparison and a bit selection. These max and min functions introduce a significant overhead in execution time.
With the obstacles mentioned above, a lot of work is needed in order to obtain an
efficient interval arithmetic implementation, which is well pipelined and parallelized. It may
even require redesigning and reimplementing everything from scratch. That is not desirable,
as there exists a number of efficient implementations for the floating-point versions of
the corresponding algorithms, which have been well optimized and parallelized.
1.4 Conclusion
Interval arithmetic provides a powerful tool to obtain verified results in scientific computing. Instead of giving an approximation, algorithms using interval arithmetic provide an enclosure of the result: the exact result is guaranteed to be included in the computed result. Conversely, a value not included in the computed result can never be the exact result. Therefore, using interval arithmetic provides some level of certification. The accuracy of computed results can also easily be checked by assessing the width of the computed intervals.
Nevertheless, several obstacles prevent interval arithmetic from being broadly used. Firstly, interval computations usually provide results of large width. Some important sources of overestimation are variable dependency and the wrapping effect. A naive application of interval arithmetic to existing floating-point algorithms does not, in the general case, provide sharp enclosures of the results: the width of the intervals grows very fast if care is not taken when implementing an algorithm in interval arithmetic.
Chapter 2

Interval matrix multiplication
This chapter presents two new algorithms for the multiplication of interval matrices, based respectively on the endpoints and on the midpoint-radius representations of intervals. These algorithms rely on a decomposition of intervals into two parts, one which is centered in zero, and one which captures the largest part not containing zero in its interior. They are implemented using 9 and 7 floating-point matrix products respectively. They are somewhat slower than the most efficient algorithm known to me, proposed by S. M. Rump, which costs only 4 floating-point matrix products. Nonetheless, the overestimation factor of the results computed by the two new algorithms is always bounded by 1.18, compared to 1.5 for Rump's algorithm. Both algorithms are implemented using IntLab, a BLAS-based MatLab library for interval arithmetic, to demonstrate their performance.
2.1 Introduction
In this chapter we study the case of matrix multiplication, which involves $O(n^3)$ operations on $O(n^2)$ data. The interval matrix product implemented by the natural algorithm provides good results in terms of accuracy but suffers from low performance in terms of execution time. An idea is to exploit existing libraries which are well optimized for floating-point matrix operations, such as BLAS, ATLAS, etc. Rump [Rum99a] proposed an efficient algorithm to compute the interval matrix product with a cost of 4 floating-point matrix products; the worst-case overestimation factor of this algorithm is 1.5. In this chapter, we propose a new algorithm which offers a trade-off between performance and accuracy: this algorithm costs 7 floating-point matrix products with a worst-case overestimation factor of 1.18.
Regarding the exactness of operations, it can easily be verified that $\mathbf{a} \oplus \mathbf{b} = \mathbf{a} + \mathbf{b}$. For the multiplication, except when one of the two multipliers is punctual or is centered in zero, (2.2) always provides overestimated results. The overestimation factor of this formula, studied in Proposition 2.2.2 [Rum99a], is
$$\rho = \frac{\operatorname{rad}(\mathbf{a} \odot \mathbf{b})}{\operatorname{rad}(\mathbf{a} * \mathbf{b})}.$$
Matrix operations can be defined in the same way as scalar operations. Suppose that each interval matrix is represented by two punctual matrices, one for the midpoints and one for the radii. Let $\mathbf{A} = \langle A, \Lambda \rangle \in \mathbb{IR}^{n \times n}$ and $\mathbf{B} = \langle B, \Sigma \rangle \in \mathbb{IR}^{n \times n}$; then
$$\mathbf{A} \oplus \mathbf{B} = \langle A + B,\ \Lambda + \Sigma \rangle, \qquad \mathbf{A} \odot \mathbf{B} = \langle A * B,\ |A| * \Sigma + \Lambda * |B| + \Lambda * \Sigma \rangle. \tag{2.3}$$
The main advantage of Rump's algorithm is that, using formula (2.3), an interval matrix product can be computed by performing only punctual matrix products. It means that the interval matrix product as defined by formula (2.3) can be implemented using existing optimized libraries for floating-point matrix operations, and can benefit from pipelined or parallelized implementations.
Nonetheless, care must be taken when implementing (2.3) in finite precision. Indeed, in the presence of rounding errors, one cannot compute exactly the product of two floating-point matrices. Therefore, firstly, we have to enclose the midpoint of the result by computing it twice following (2.3), once in downward and once in upward rounding mode. Secondly, to guarantee the isotonicity property, we must compute an upper bound of the radius. According to (2.3), the expression for the radius involves only products of non-negative matrices; thus an upper bound of the radius can be obtained by simply evaluating it in upward rounding mode. Additionally, (2.3) can be rewritten as $\mathbf{A} \odot \mathbf{B} = \langle A * B,\ (|A| + \Lambda) * \Sigma + \Lambda * |B| \rangle$. Therefore, in total, the implementation of Rump's algorithm in finite precision requires only 4 floating-point matrix products together with a few rounding mode changes.
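A minimal sketch of this scheme (the variable names mA, rA, mB, rB for the midpoint and radius matrices are assumptions, and this is not Rump's actual IntLab code) uses exactly 4 floating-point matrix products:

setround(1);                                  % upward rounding
Cu = mA * mB;                                 % upper bound of the midpoint product
rC = (abs(mA) + rA) * rB + rA * abs(mB);      % radius bound, evaluated upward
Cs = Cu + rC;                                 % upper endpoint of the result
setround(-1);                                 % downward rounding
Cd = mA * mB;                                 % lower bound of the midpoint product
Ci = Cd - rC;                                 % lower endpoint of the result
C  = infsup(Ci, Cs);                          % enclosure of the interval matrix product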
In regard to accuracy, the addition $\mathbf{A} \oplus \mathbf{B}$ is always exactly equal to $\mathbf{A} + \mathbf{B}$ in infinite precision. For the case of multiplication:
$$\begin{aligned}
\mathbf{A} * \mathbf{B} &= \left\{ \sum_{k=1}^{n} \mathbf{A}_{i,k} * \mathbf{B}_{k,j},\ i = 1:n,\ j = 1:n \right\} \\
&\subseteq \left\{ \sum_{k=1}^{n} \mathbf{A}_{i,k} \odot \mathbf{B}_{k,j},\ i = 1:n,\ j = 1:n \right\} \\
&\subseteq \left\{ \sum_{k=1}^{n} \langle A_{i,k} * B_{k,j},\ |A_{i,k}| * \Sigma_{k,j} + \Lambda_{i,k} * |B_{k,j}| + \Lambda_{i,k} * \Sigma_{k,j} \rangle,\ i = 1:n,\ j = 1:n \right\} \\
&\subseteq \left\{ \left\langle \sum_{k=1}^{n} A_{i,k} * B_{k,j},\ \sum_{k=1}^{n} |A_{i,k}| * \Sigma_{k,j} + \Lambda_{i,k} * |B_{k,j}| + \Lambda_{i,k} * \Sigma_{k,j} \right\rangle \right\} \\
&\subseteq \langle A * B,\ |A| * \Sigma + \Lambda * |B| + \Lambda * \Sigma \rangle \\
&\subseteq \mathbf{A} \odot \mathbf{B}.
\end{aligned}$$
The overestimation factor of the interval matrix product defined by (2.3) is also given in [Rum99a]:

Proposition 2.2.3. For $\mathbf{A}$ an interval matrix with all components of relative precision $e$, and $\mathbf{B}$ an interval matrix with all components of relative precision $f$, the overestimation of each component of the result of the multiplication is bounded by
$$1 + \frac{ef}{e + f}.$$
In any case, the overestimation is uniformly bounded by $1.5$.
Example 3. Let us compute the following interval matrix product by both the natural algorithm and Rump's algorithm:
$$\begin{pmatrix} [0, 4] & [0, 2] \\ [0, 2] & [0, 4] \end{pmatrix} * \begin{pmatrix} [0, 2] & [0, 2] \\ [0, 2] & [0, 2] \end{pmatrix} = \begin{pmatrix} [0, 12] & [0, 12] \\ [0, 12] & [0, 12] \end{pmatrix}.$$
With Rump's algorithm:
$$\begin{pmatrix} [0, 4] & [0, 2] \\ [0, 2] & [0, 4] \end{pmatrix} \odot \begin{pmatrix} [0, 2] & [0, 2] \\ [0, 2] & [0, 2] \end{pmatrix} = \begin{pmatrix} \langle 2, 2 \rangle & \langle 1, 1 \rangle \\ \langle 1, 1 \rangle & \langle 2, 2 \rangle \end{pmatrix} \odot \begin{pmatrix} \langle 1, 1 \rangle & \langle 1, 1 \rangle \\ \langle 1, 1 \rangle & \langle 1, 1 \rangle \end{pmatrix} = \begin{pmatrix} \langle 3, 9 \rangle & \langle 3, 9 \rangle \\ \langle 3, 9 \rangle & \langle 3, 9 \rangle \end{pmatrix} = \begin{pmatrix} [-6, 12] & [-6, 12] \\ [-6, 12] & [-6, 12] \end{pmatrix}.$$
In practice, the factor between results computed by (2.3) and those computed by (2.1) is, interestingly, much smaller than this bound, and in many cases close to 1. In some cases, when the majority of input elements are of small relative precision, i.e. their radius is small with respect to their midpoint, the matrix product computed by (2.3) is even sharper than that computed by (2.1).
On the other hand, if all components of $\mathbf{A}$ and $\mathbf{B}$ are of large relative precision $e \approx 1$, then the overestimation of each component of the result computed by (2.3) is close to 1.5. This might not be acceptable in some applications, for example in global optimization. In the following sections we introduce new algorithms that reduce this overestimation factor without significantly deteriorating the performance.
Centered-in-zero multiplier

Let $\mathbf{a} = [\underline{a}, \overline{a}] \in \mathbb{IR}$ be a centered-in-zero interval, meaning that $\underline{a} = -\overline{a}$ and $\overline{a} > 0$. Let $\mathbf{b} = [\underline{b}, \overline{b}]$ be an interval; then the product of $\mathbf{a}$ and $\mathbf{b}$ can be computed by
$$\mathbf{a} * \mathbf{b} = [\underline{a} * \max(|\underline{b}|, |\overline{b}|),\ \overline{a} * \max(|\underline{b}|, |\overline{b}|)] = [-\overline{a} * \operatorname{mag}(\mathbf{b}),\ \overline{a} * \operatorname{mag}(\mathbf{b})].$$
Note that this formula remains valid in the presence of rounding errors, with $\overline{a} * \operatorname{mag}(\mathbf{b})$ computed in upward rounding mode. Indeed, suppose that $c = \Delta(\overline{a} * \operatorname{mag}(\mathbf{b}))$; then $c \ge \overline{a} * \operatorname{mag}(\mathbf{b})$, which implies $-c \le -\overline{a} * \operatorname{mag}(\mathbf{b}) \le \overline{a} * \operatorname{mag}(\mathbf{b}) \le c$. Hence $\mathbf{a} * \mathbf{b} \subseteq [-c, c]$.
Due to the symmetry, the interval multiplication can be computed by one punctual multiplication for the upper bound, followed by one negation to compute the lower bound. The same holds for a matrix product.
Let $\mathbf{A} = [\underline{A}, \overline{A}] \in \mathbb{IR}^{n \times n}$ be a centered-in-zero interval matrix, i.e. $\underline{A} = -\overline{A}$ and $\overline{A} > 0$. Let $\mathbf{B} = [\underline{B}, \overline{B}]$ be an interval matrix; then the product of $\mathbf{A}$ and $\mathbf{B}$ can be computed by
$$\mathbf{A} * \mathbf{B} = [-\overline{A} * \operatorname{mag}(\mathbf{B}),\ \overline{A} * \operatorname{mag}(\mathbf{B})]. \tag{2.4}$$
The implementation in floating-point arithmetic simply sets the rounding mode to upward before computing $\overline{A} * \operatorname{mag}(\mathbf{B})$. The lower bound is the opposite of the upper bound, hence the product $\mathbf{A} * \mathbf{B}$ requires only one punctual matrix product.
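In MatLab-like form, this product is a three-line sketch (assumed variable names: Au is the upper bound of the centered-in-zero matrix, and B = [Bl, Bu]):

setround(1);                           % upward rounding
Cu = Au * max(abs(Bl), abs(Bu));       % upper bound = Au * mag(B)
C  = infsup(-Cu, Cu);                  % the lower bound is simply its opposite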
Non-negative multiplier

Let $\mathbf{a} = [\underline{a}, \overline{a}] \in \mathbb{IR}$ be an interval with $\overline{a} \ge \underline{a} \ge 0$, and let $\mathbf{b} = [\underline{b}, \overline{b}] \in \mathbb{IR}$ be an interval. Writing $x^+ = \max(x, 0)$ and $x^- = \min(x, 0)$ for a real number $x$, we have
$$\min(\underline{a} * \underline{b},\ \overline{a} * \underline{b}) = \underline{a} * \underline{b}^+ + \overline{a} * \underline{b}^-, \qquad \max(\underline{a} * \overline{b},\ \overline{a} * \overline{b}) = \underline{a} * \overline{b}^- + \overline{a} * \overline{b}^+.$$
Hence
$$\mathbf{a} * \mathbf{b} = [\underline{a} * \underline{b}^+ + \overline{a} * \underline{b}^-,\ \underline{a} * \overline{b}^- + \overline{a} * \overline{b}^+].$$
By induction on the dimension of the matrices, we can generalize this formula to matrix multiplication.
Proposition 2.3.1. For $\mathbf{A} = [\underline{A}, \overline{A}] \in \mathbb{IR}^{n \times n}$ with $\underline{A}_{i,j} \ge 0$ for all $1 \le i \le n$, $1 \le j \le n$,
$$\mathbf{A} * \mathbf{B} = [\underline{A} * \underline{B}^+ + \overline{A} * \underline{B}^-,\ \underline{A} * \overline{B}^- + \overline{A} * \overline{B}^+]. \tag{2.5}$$
Hence, in this case, an interval matrix product requires four punctual matrix products. Similarly, when $\mathbf{A}$ is non-positive, the multiplication is also computed by four punctual matrix products:
$$\mathbf{A} * \mathbf{B} = [\underline{A} * \overline{B}^+ + \overline{A} * \overline{B}^-,\ \underline{A} * \underline{B}^- + \overline{A} * \underline{B}^+]. \tag{2.6}$$
$$\mathbf{a}^+ = \{ a \in \mathbf{a} \mid a \ge 0 \}, \qquad \mathbf{a}^- = \{ a \in \mathbf{a} \mid a \le 0 \},$$
$$\mathbf{a} * \mathbf{b} = \mathbf{a}^+ * \mathbf{b} + \mathbf{a}^- * \mathbf{b}.$$
Proposition 2.3.2 means that the computation of the interval multiplication in this case requires eight punctual multiplications.
We can generalize Proposition 2.3.2 to matrix multiplication by induction. A matrix does not contain zero if none of its elements contains zero.
Proposition 2.3.3. For $\mathbf{A} \in \mathbb{IR}^{n \times n}$ not containing zero, and $\mathbf{B} \in \mathbb{IR}^{n \times n}$, the multiplication $\mathbf{A} * \mathbf{B}$ can be computed by eight punctual matrix multiplications:
$$\begin{aligned}
\mathbf{A} * \mathbf{B} &= \mathbf{A}^+ * \mathbf{B} + \mathbf{A}^- * \mathbf{B} \\
&= [\underline{A}^+ * \underline{B}^+ + \overline{A}^+ * \underline{B}^-,\ \underline{A}^+ * \overline{B}^- + \overline{A}^+ * \overline{B}^+] + [\underline{A}^- * \overline{B}^+ + \overline{A}^- * \overline{B}^-,\ \underline{A}^- * \underline{B}^- + \overline{A}^- * \underline{B}^+] \\
&= [\underline{A}^+ * \underline{B}^+ + \overline{A}^+ * \underline{B}^- + \underline{A}^- * \overline{B}^+ + \overline{A}^- * \overline{B}^-,\ \ \underline{A}^+ * \overline{B}^- + \overline{A}^+ * \overline{B}^+ + \underline{A}^- * \underline{B}^- + \overline{A}^- * \underline{B}^+].
\end{aligned}$$
Note that all the formulae for these special cases provide results without overestimation.
$$\begin{aligned}
\mathbf{A} &= A + \langle 0, \Lambda \rangle, \\
\mathbf{B} &= B + \langle 0, \Sigma \rangle \\
\Rightarrow\ \mathbf{A} * \mathbf{B} &= (A + \langle 0, \Lambda \rangle) * (B + \langle 0, \Sigma \rangle) \\
&\subseteq A * B + A * \langle 0, \Sigma \rangle + \langle 0, \Lambda \rangle * B + \langle 0, \Lambda \rangle * \langle 0, \Sigma \rangle \\
&\subseteq \langle A * B, 0 \rangle + \langle 0, |A| * \Sigma \rangle + \langle 0, \Lambda * |B| \rangle + \langle 0, \Lambda * \Sigma \rangle \\
&\subseteq \langle A * B,\ |A| * \Sigma + \Lambda * |B| + \Lambda * \Sigma \rangle.
\end{aligned}$$
With this decomposition, the overestimation in the worst case is 1.5. Our idea is to find another decomposition such that:
• sub-products can be efficiently computed, and
(3) $\mathbf{A}^0 + \mathbf{A}^* = \mathbf{A}$, and
$$\mathbf{C} = \mathbf{A}^0 * \mathbf{B} + \mathbf{A}^* * \mathbf{B}, \tag{2.8}$$
Moreover, since $\mathbf{A}^0$ is centered in zero and $\mathbf{A}^*$ does not contain zero, according to Section 2.3.1, $\mathbf{A}^0 * \mathbf{B}$ and $\mathbf{A}^* * \mathbf{B}$ can be computed exactly using respectively one and eight punctual matrix products. Hence, computing an enclosure of $\mathbf{A} * \mathbf{B}$ by Proposition 2.3.5 requires in total nine punctual matrix products. Nevertheless, because of sub-distributivity, the result computed by this proposition is always an overestimation of the exact result. The following theorem provides the overestimation factor of this algorithm in the case of scalar inputs.
Theorem 2.3.6. For $\mathbf{a} \in \mathbb{IR}$, $\mathbf{b} \in \mathbb{IR}$, with $\mathbf{a}$ decomposed, following Proposition 2.3.4, into two parts $\mathbf{a} = \mathbf{a}^0 + \mathbf{a}^*$, the overestimation of $\mathbf{a} \sim \mathbf{b} = \mathbf{a}^0 * \mathbf{b} + \mathbf{a}^* * \mathbf{b}$ with respect to $\mathbf{a} * \mathbf{b}$ is defined as
$$\rho_1 = \frac{\operatorname{diam}(\mathbf{a} \sim \mathbf{b})}{\operatorname{diam}(\mathbf{a} * \mathbf{b})}.$$
The following holds:
(1) $\rho_1$ is always bounded by $4 - 2\sqrt{2}$;
$$\mathbf{a} \sim \mathbf{b} = \mathbf{a}^0 * \mathbf{b} + \mathbf{a}^* * \mathbf{b} = [\underline{a}, -\underline{a}] * \mathbf{b} + [0, \overline{a} + \underline{a}] * \mathbf{b} = [\underline{a} * \operatorname{mag}(\mathbf{b}),\ -\underline{a} * \operatorname{mag}(\mathbf{b})] + [0, \overline{a} + \underline{a}] * \mathbf{b}.$$
There are four subcases for the value of $\mathbf{b}$: $\underline{b} \ge 0$; $\overline{b} \le 0$; $\overline{b} > |\underline{b}| > 0 > \underline{b}$; and $|\underline{b}| > \overline{b} > 0 > \underline{b}$.
1. Case $\underline{b} \ge 0$: then $\operatorname{mag}(\mathbf{b}) = \overline{b}$, so
$$\mathbf{a} \sim \mathbf{b} = [\underline{a} * \overline{b},\ -\underline{a} * \overline{b}] + [0,\ (\underline{a} + \overline{a}) * \overline{b}] = [\underline{a} * \overline{b},\ \overline{a} * \overline{b}].$$
Meanwhile, $\underline{b} \ge 0$ implies $\mathbf{a} * \mathbf{b} = [\underline{a} * \overline{b},\ \overline{a} * \overline{b}]$. Hence $\mathbf{a} \sim \mathbf{b} = \mathbf{a} * \mathbf{b}$: there is no overestimation in this case, i.e. $\rho_1 = 1$.

2. Case $\overline{b} \le 0$: this case is similar to the case $\underline{b} \ge 0$. It means that if $\mathbf{a}$ contains zero but $\mathbf{b}$ does not contain zero then $\rho_1 = 1$. Hence property (3) holds.

3. Case $\overline{b} > 0 > \underline{b}$ and $\overline{b} \ge |\underline{b}|$: then $\operatorname{mag}(\mathbf{b}) = \overline{b}$ and $(\underline{a} + \overline{a}) * \underline{b} < 0 < (\underline{a} + \overline{a}) * \overline{b}$. Hence
$$\mathbf{a} \sim \mathbf{b} = [\underline{a} * \overline{b},\ -\underline{a} * \overline{b}] + [(\underline{a} + \overline{a}) * \underline{b},\ (\underline{a} + \overline{a}) * \overline{b}] = [(\underline{a} + \overline{a}) * \underline{b} + \underline{a} * \overline{b},\ \overline{a} * \overline{b}],$$
$$\Rightarrow\ \operatorname{diam}(\mathbf{a} \sim \mathbf{b}) = \overline{a} * \overline{b} - ((\underline{a} + \overline{a}) * \underline{b} + \underline{a} * \overline{b}) = \overline{a} * \overline{b} * (1 - \underline{a}/\overline{a} - \underline{b}/\overline{b} - (\underline{a}/\overline{a}) * (\underline{b}/\overline{b})).$$
As $\overline{a} > 0 > \underline{a} > -\overline{a}$ and $\overline{b} > 0 > \underline{b} > -\overline{b}$, we have $\underline{a} * \overline{b} < 0$, and $\overline{a} * \underline{b} < 0 < \underline{a} * \underline{b} < \overline{a} * \overline{b}$, so
$$\mathbf{a} * \mathbf{b} = [\min(\underline{a} * \overline{b},\ \overline{a} * \underline{b}),\ \overline{a} * \overline{b}] \ \Rightarrow\ \operatorname{diam}(\mathbf{a} * \mathbf{b}) = \overline{a} * \overline{b} - \min(\underline{a} * \overline{b},\ \overline{a} * \underline{b}) = \overline{a} * \overline{b} * (1 - \min(\underline{a}/\overline{a},\ \underline{b}/\overline{b})).$$
4. Case $\overline{b} > 0 > \underline{b}$ and $\overline{b} < |\underline{b}|$: this case is similar to the case $\overline{b} > 0 > \underline{b}$ and $\overline{b} \ge |\underline{b}|$. Denoting $M = \max(|\underline{a}|/\overline{a},\ \overline{b}/|\underline{b}|)$ and $m = \min(|\underline{a}|/\overline{a},\ \overline{b}/|\underline{b}|)$, the factor of overestimation in this case is
$$\rho_1 = 1 + \frac{m(1 - M)}{1 + M},$$
which is also bounded by $\rho_1 \le 4 - 2\sqrt{2}$. Equality holds when $\underline{a}/\overline{a} = \underline{b}/\overline{b} = 1 - \sqrt{2}$.
Theorem 2.3.7. For $\mathbf{A} \in \mathbb{IR}^{n \times n}$, $\mathbf{B} \in \mathbb{IR}^{n \times n}$, the overestimation factor of each component of the multiplication defined by Proposition 2.3.5 is bounded by
$$\rho \le 4 - 2\sqrt{2}.$$
Moreover, if all the components of either $\mathbf{A}$ or $\mathbf{B}$ do not contain zero, then all the computed components are exact.
Proof. By induction from Theorem 2.3.6 and from the fact that

Hence, it is clear that the worst-case overestimation of our algorithm, which is $4 - 2\sqrt{2} \approx 1.18$, is smaller than the worst-case overestimation of Rump's algorithm, which is 1.5. For each individual pair of inputs, the following proposition gives a rough comparison between the two algorithms.
Proposition 2.3.8. For $\mathbf{a} = [\underline{a}, \overline{a}] = \langle a, \alpha \rangle \in \mathbb{IR}$ and $\mathbf{b} = [\underline{b}, \overline{b}] = \langle b, \beta \rangle \in \mathbb{IR}$,
$$\mathbf{a} \sim \mathbf{b} \subseteq \mathbf{a} \odot \mathbf{b}.$$
Equality holds when either one of the two inputs is punctual or centered in zero.
$$\mathbf{a} \sim \mathbf{b} \subseteq \mathbf{a} \odot \mathbf{b}.$$
2. Both $\mathbf{a}$ and $\mathbf{b}$ contain zero. There are four possible sub-cases:
We will consider here only the first sub-case. The proof of the other 3 sub-cases is
similar.
$[\underline{a}, \overline{a}]$ and $\langle a, \alpha \rangle$ represent the same interval, i.e.
$$\underline{a} = a - \alpha, \quad \overline{a} = a + \alpha \qquad \text{and} \qquad a = \tfrac{1}{2}(\underline{a} + \overline{a}), \quad \alpha = \tfrac{1}{2}(\overline{a} - \underline{a}).$$
Similarly,
$$\underline{b} = b - \beta, \quad \overline{b} = b + \beta \qquad \text{and} \qquad b = \tfrac{1}{2}(\underline{b} + \overline{b}), \quad \beta = \tfrac{1}{2}(\overline{b} - \underline{b}).$$
We have $\overline{a} \ge |\underline{a}| > 0 > \underline{a}$ and $\overline{b} \ge |\underline{b}| > 0 > \underline{b}$, hence $\alpha > a > 0$ and $\beta > b > 0$.
The product $\mathbf{a} \odot \mathbf{b}$ as defined by (2.2) is
$$\operatorname{mid}(\mathbf{a} \odot \mathbf{b}) = a * b, \qquad \operatorname{rad}(\mathbf{a} \odot \mathbf{b}) = |a| * \beta + \alpha * |b| + \alpha * \beta = a * \beta + \alpha * b + \alpha * \beta.$$
As mentioned in the proof of Theorem 2.3.6, when $\overline{a} \ge |\underline{a}| > 0 > \underline{a}$ and $\overline{b} \ge |\underline{b}| > 0 > \underline{b}$, $\mathbf{a} \sim \mathbf{b}$ is
$$\begin{aligned}
\mathbf{a} \sim \mathbf{b} &= [(\underline{a} + \overline{a}) * \underline{b} + \underline{a} * \overline{b},\ \overline{a} * \overline{b}] \\
&= [2a * (b - \beta) + (a - \alpha) * (b + \beta),\ (a + \alpha) * (b + \beta)] \\
&= [3ab - a\beta - \alpha b - \alpha\beta,\ ab + a\beta + \alpha b + \alpha\beta] \\
&= [2ab + \operatorname{mid}(\mathbf{a} \odot \mathbf{b}) - \operatorname{rad}(\mathbf{a} \odot \mathbf{b}),\ \operatorname{mid}(\mathbf{a} \odot \mathbf{b}) + \operatorname{rad}(\mathbf{a} \odot \mathbf{b})] \\
&\subseteq [\operatorname{mid}(\mathbf{a} \odot \mathbf{b}) - \operatorname{rad}(\mathbf{a} \odot \mathbf{b}),\ \operatorname{mid}(\mathbf{a} \odot \mathbf{b}) + \operatorname{rad}(\mathbf{a} \odot \mathbf{b})] \quad \text{as } ab > 0 \\
&\subseteq \mathbf{a} \odot \mathbf{b}.
\end{aligned}$$
2.3.3 Implementation
We will now detail the implementation of the formula given in Proposition 2.3.5. Since floating-point arithmetic is used to implement interval arithmetic, we have to deal with rounding errors.
For the matrix decomposition defined by Proposition 2.3.4, the only source of rounding error is the addition of the two bounds, as the rest can be expressed in terms of max and min functions, which are free of rounding errors. Hereafter, MatLab notation is used to describe algorithms: = means assignment. Operations take place between scalars, vectors or matrices as long as their dimensions agree. Indeed, to increase the parallelism of the algorithms, instead of using loops, we express the algorithms solely in terms of matrix operations, using MatLab-like syntax.
Let $\mathbf{A} = [\underline{A}, \overline{A}] \in \mathbb{IF}^{n \times n}$ be an interval matrix. The following algorithm decomposes $\mathbf{A}$ into two parts as defined by Proposition 2.3.4; all operations are componentwise:
$$A0 = \max(\min(-\underline{A}, \overline{A}), 0), \qquad A1 = \nabla(\underline{A} + A0), \qquad A2 = \Delta(\overline{A} - A0).$$
2. $\mathbf{A}^* \subseteq [A1, A2]$.
Proof. We inspect several subcases for the component $\mathbf{A}_{i,j}$, $1 \le i, j \le n$, of $\mathbf{A}$:
• If $0 \le \underline{A}_{i,j} \le \overline{A}_{i,j}$, then $\mathbf{A}^0_{i,j} = 0$ and $\mathbf{A}^*_{i,j} = \mathbf{A}_{i,j}$. For $A0_{i,j}$ we have
Hence $[-A0_{i,j}, A0_{i,j}] = 0 = \mathbf{A}^0_{i,j}$, and $[A1_{i,j}, A2_{i,j}] = [\underline{A}_{i,j}, \overline{A}_{i,j}] = \mathbf{A}^*_{i,j}$.
• If $\underline{A}_{i,j} \le \overline{A}_{i,j} \le 0$, then $\mathbf{A}^0_{i,j} = 0$ and $\mathbf{A}^*_{i,j} = \mathbf{A}_{i,j}$. For $A0_{i,j}$ we have
Therefore $A1_{i,j} = \underline{A}_{i,j}$ and $A2_{i,j} = \overline{A}_{i,j}$. Hence $[-A0_{i,j}, A0_{i,j}] = 0 = \mathbf{A}^0_{i,j}$, and $[A1_{i,j}, A2_{i,j}] = [\underline{A}_{i,j}, \overline{A}_{i,j}] = \mathbf{A}^*_{i,j}$.
• If $\underline{A}_{i,j} < 0 < |\underline{A}_{i,j}| < \overline{A}_{i,j}$, then $\mathbf{A}^0_{i,j} = [\underline{A}_{i,j}, -\underline{A}_{i,j}]$ and $\mathbf{A}^*_{i,j} = [0,\ \overline{A}_{i,j} + \underline{A}_{i,j}]$.
• If $\underline{A}_{i,j} < 0 < \overline{A}_{i,j} < |\underline{A}_{i,j}|$, then $\mathbf{A}^0_{i,j} = [-\overline{A}_{i,j}, \overline{A}_{i,j}]$ and $\mathbf{A}^*_{i,j} = [\underline{A}_{i,j} + \overline{A}_{i,j},\ 0]$.
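To make the decomposition step concrete, here is a minimal sketch (assumed variable names; not the thesis listing) of how A = [Al, Au] can be split using a single rounding mode:

setround(1);                   % upward rounding, kept for the whole computation
A0 = max(min(-Al, Au), 0);     % exact: only negation, min and max are involved
A2 = Au - A0;                  % upper bound of A*, rounded upward
A1 = -((-Al) - A0);            % lower bound of A*, i.e. Al + A0 rounded downward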
Now we have to compute the two sub-products, which are $[-A0, A0] * \mathbf{B}$ and $[A1, A2] * \mathbf{B}$. Following Section 2.3.1,
Hence to compute $[-A0, A0] * \mathbf{B}$, it is only needed to compute its upper bound, which is $A0 * \operatorname{mag}(\mathbf{B})$. In the presence of rounding errors, it must be computed in upward rounding mode.
To compute the multiplication $[A1, A2] * \mathbf{B}$ using Proposition 2.3.3, we need the negative and positive parts of the bounds of $[A1, A2]$ and $\mathbf{B}$.
For $a \in \mathbb{F}$, it holds that $a^- = \min(a, 0)$ and $a^+ = \max(a, 0)$.
For $\mathbf{a} = [\underline{a}, \overline{a}] \in \mathbb{IF}$ such that $\mathbf{a}$ does not contain zero, it holds that
$$\overline{a}^+ = \max(\overline{a}, 0), \quad \underline{a}^+ = \max(\underline{a}, 0), \quad \overline{a}^- = \min(\overline{a}, 0), \quad \underline{a}^- = \min(\underline{a}, 0).$$
Hence, using Proposition 2.3.3, we can compute the multiplication $[A1, A2] * \mathbf{B}$ by:
Finally, the full algorithm is implemented in MatLab as a function called igemm using
the IntLab library as follows.
8   A2a = max(A2, 0);
9   A2i = min(A2, 0);
10  A1a = max(A1, 0);
11  A1i = min(A1, 0);
13  B2 = B.sup;
14  B1 = B.inf;
15  B2a = max(B2, 0);
16  B2i = min(B2, 0);
17  B1a = max(B1, 0);
18  B1i = min(B1, 0);
20  zeroA = all(all(A0 == 0));
21  if (zeroA)
22      A0 = zeros(size(A0,1), size(B,2));
23  else
24      A0 = A0 * max(-B.inf, B.sup);
25  end
31  R = infsup(RI, RU);
32  end
The IntLab library uses the endpoint format to represent real intervals. Each real interval object (scalar, vector, or matrix) is represented by two floating-point objects of the same size, one for the lower bound and one for the upper bound. One can obtain the lower and upper bounds of an interval object by using the getter functions .inf and .sup respectively. Besides basic MatLab functions, we use here some additional functions:
• setround(rnd) changes the rounding mode according to the input parameter rnd: 1 means upward rounding, −1 means downward rounding, and 0 means rounding to nearest;
• infsup(P1, P2) constructs an interval matrix whose lower and upper bounds are respectively P1 and P2;
[Figure 2.1: ratio of execution times between interval matrix products (igemm, marked ×, and Rump's algorithm, marked +) and a floating-point matrix product, as a function of the matrix dimension.]
The rounding mode is set to upward at the beginning of the function and kept for the following lines: this has no impact on lines 8–22, and it is the required rounding mode for line 24, which computes A0 ∗ mag(B), and for line 27, which computes the upper bound of the result. Then we change to downward rounding mode to compute the lower bound of the result in line 29. In total, this implementation uses only 3 changes of rounding mode.
Practical performance
The igemm function uses in total 9 floating-point matrix products, along with some matrix operations of order $O(n^2)$ to manipulate data. Hence, in theory, the factor in execution time between igemm and a floating-point matrix product is 9.
In practice, when the matrix dimension is small, the difference between operations of order $O(n^2)$ and operations of order $O(n^3)$ is small: not only is the theoretical factor $n$ small, but operations of order $O(n^3)$ also better exploit memory locality and parallelism.
Test matrices for this set of experiments as well as other experiments in this section
are generated by function rand(n) from MatLab, given n the dimension of the matrix.
This function returns a matrix of size n × n whose components are uniformly distributed
random numbers between 0 and 1.
Since the igemm function uses only a small constant number of BLAS level 3 routines, the overhead in execution time introduced by MatLab interpretation is negligible. It means that there is almost no performance difference between a MatLab implementation and implementations in C or Fortran.
As shown in Figure 2.1, when the matrix dimension is small, i.e. smaller than 100, the execution time of igemm relative to a floating-point matrix multiplication, marked by the × symbol, is high. Nevertheless, as the matrix dimension increases, this factor decreases and gets close to the theoretical factor. The same phenomenon can be observed with Rump's algorithm, marked by the + symbol.
In the implementation, for some special cases, we can reduce the number of floating-point matrix products by inspecting the signs of the input matrices. For example, if $\mathbf{A}$ is a non-negative matrix then, following Proposition 2.3.1, the interval multiplication requires only four punctual multiplications. Furthermore, if the upper bounds of all the components of $\mathbf{B}$ have the same sign, then either $\overline{B}^-$ or $\overline{B}^+$ is zero; hence, in total, only three punctual matrix multiplications are required.
The best case is when all the components of A are of the same sign, and all the
components of B are of the same sign, though it is not a very common case. In this
case, we need only two punctual matrix multiplications to implement an interval matrix
multiplication.
Implementation issues

Following Proposition 2.3.8, results computed by this algorithm are always included in results computed by the IntLab library. In particular, when one of the multipliers does not contain zero, there is no overestimation with this algorithm. In the implementation using floating-point arithmetic, however, we face the problem of rounding errors. Therefore, when the input data are of small relative precision, results computed by the IntLab library can curiously be even tighter than results computed by this algorithm. This can be explained by the following example.
Denote by $\varepsilon$ the relative machine error for double precision, which is $2^{-53}$; then $1 + 2\varepsilon$, $1 + 4\varepsilon$ and $1 + 6\varepsilon$ are floating-point numbers, but $1 + \varepsilon$, $1 + 3\varepsilon$ and $1 + 5\varepsilon$ are not. Let us consider the interval multiplication $\mathbf{a} * \mathbf{b}$ where $\mathbf{a} = [1 - 2\varepsilon, 1 + 2\varepsilon] = \langle 1, 2\varepsilon \rangle$ and $\mathbf{b} = [1 + 2\varepsilon, 1 + 6\varepsilon] = \langle 1 + 4\varepsilon, 2\varepsilon \rangle$.
Using the natural algorithm,
$$\mathbf{a} * \mathbf{b} = [\nabla((1 - 2\varepsilon)(1 + 2\varepsilon)),\ \Delta((1 + 2\varepsilon)(1 + 6\varepsilon))] = [1 - \varepsilon,\ 1 + 10\varepsilon].$$
This example is somewhat contrived, but its purpose is to show that, when rounding errors are taken into account, the behavior of the algorithms differs from that of the mathematical model. Hence, although in theory our algorithm is always at least as accurate as Rump's algorithm, in practice this is only true when the intervals' widths are large enough.
In the following section, we explore another algorithm which takes advantage of the decomposition proposition, and which is less prone to rounding errors.
2.4 Algorithm 2 using midpoint-radius representation
Case $\mathbf{a} \ge 0$

When $\mathbf{a} \ge 0$, $a + \alpha \ge 0$ and $a - \alpha \ge 0$. Therefore
$$\sup(\mathbf{a} * \mathbf{b}) = a * (b + \beta) + \alpha * |b + \beta|, \qquad \inf(\mathbf{a} * \mathbf{b}) = a * (b - \beta) - \alpha * |b - \beta|.$$
Hence
$$\begin{aligned}
\operatorname{mid}(\mathbf{a} * \mathbf{b}) &= \frac{\sup(\mathbf{a} * \mathbf{b}) + \inf(\mathbf{a} * \mathbf{b})}{2} = \frac{a * (b + \beta) + \alpha * |b + \beta| + a * (b - \beta) - \alpha * |b - \beta|}{2} = a * b + \alpha * \frac{|b + \beta| - |b - \beta|}{2}, \\
\operatorname{rad}(\mathbf{a} * \mathbf{b}) &= \frac{\sup(\mathbf{a} * \mathbf{b}) - \inf(\mathbf{a} * \mathbf{b})}{2} = \frac{a * (b + \beta) + \alpha * |b + \beta| - (a * (b - \beta) - \alpha * |b - \beta|)}{2} = a * \beta + \alpha * \frac{|b + \beta| + |b - \beta|}{2}.
\end{aligned}$$
Case $\mathbf{a} < 0$

When $\mathbf{a} < 0$, $a + \alpha < 0$ and $a - \alpha < 0$. Therefore
$$\begin{aligned}
\sup(\mathbf{a} * \mathbf{b}) &= \max\{(a + \alpha) * (b - \beta),\ (a - \alpha) * (b - \beta)\} = a * (b - \beta) + \max\{\alpha * (b - \beta),\ -\alpha * (b - \beta)\} = a * (b - \beta) + \alpha * |b - \beta|, \\
\inf(\mathbf{a} * \mathbf{b}) &= \min\{(a + \alpha) * (b + \beta),\ (a - \alpha) * (b + \beta)\} = a * (b + \beta) + \min\{\alpha * (b + \beta),\ -\alpha * (b + \beta)\} = a * (b + \beta) - \alpha * |b + \beta|.
\end{aligned}$$
Hence
$$\begin{aligned}
\operatorname{mid}(\mathbf{a} * \mathbf{b}) &= \frac{\sup(\mathbf{a} * \mathbf{b}) + \inf(\mathbf{a} * \mathbf{b})}{2} = \frac{a * (b - \beta) + \alpha * |b - \beta| + a * (b + \beta) - \alpha * |b + \beta|}{2} = a * b - \alpha * \frac{|b + \beta| - |b - \beta|}{2}, \\
\operatorname{rad}(\mathbf{a} * \mathbf{b}) &= \frac{\sup(\mathbf{a} * \mathbf{b}) - \inf(\mathbf{a} * \mathbf{b})}{2} = \frac{a * (b - \beta) + \alpha * |b - \beta| - (a * (b + \beta) - \alpha * |b + \beta|)}{2} = -a * \beta + \alpha * \frac{|b + \beta| + |b - \beta|}{2}.
\end{aligned}$$
Denoting by $\operatorname{sign}(a)$ the sign of $a$, these two cases where $\mathbf{a}$ does not contain zero can be written with the same formula:
$$\operatorname{mid}(\mathbf{a} * \mathbf{b}) = a * b + \operatorname{sign}(a) * \alpha * \frac{|b + \beta| - |b - \beta|}{2}, \qquad \operatorname{rad}(\mathbf{a} * \mathbf{b}) = |a| * \beta + \alpha * \frac{|b + \beta| + |b - \beta|}{2}.$$
Moreover, since $\beta \ge 0$:
• if $b > \beta \ge 0$ then $|b + \beta| + |b - \beta| = 2b$ and $|b + \beta| - |b - \beta| = 2\beta$;
• if $\beta \ge b \ge 0$ then $|b + \beta| + |b - \beta| = 2\beta$ and $|b + \beta| - |b - \beta| = 2b$;
• if $0 \ge b \ge -\beta$ then $|b + \beta| + |b - \beta| = 2\beta$ and $|b + \beta| - |b - \beta| = 2b$;
• if $b < -\beta \le 0$ then $|b + \beta| + |b - \beta| = -2b$ and $|b + \beta| - |b - \beta| = -2\beta$.
It is easy to deduce that
$$\frac{|b + \beta| + |b - \beta|}{2} = \max(|b|, \beta), \qquad \frac{|b + \beta| - |b - \beta|}{2} = \operatorname{sign}(b) * \min(|b|, \beta).$$
Therefore
$$\operatorname{mid}(\mathbf{a} * \mathbf{b}) = a * b + \operatorname{sign}(a) * \alpha * \operatorname{sign}(b) * \min(|b|, \beta), \qquad \operatorname{rad}(\mathbf{a} * \mathbf{b}) = |a| * \beta + \alpha * \max(|b|, \beta). \tag{2.10}$$
Case $0 \le a < \alpha$

We have $a - \alpha < 0 < |a - \alpha| < a + \alpha$. Using the decomposition technique of Section 2.3, $\mathbf{a} = \langle a, \alpha \rangle = [a - \alpha, a + \alpha]$ is decomposed into two parts as follows:
$$\mathbf{a}^0 = [a - \alpha,\ \alpha - a] = \langle 0,\ \alpha - a \rangle, \qquad \mathbf{a}^* = [0,\ 2a] = \langle a,\ a \rangle.$$
As $\mathbf{a}^0$ is centered in zero, $\operatorname{mid}(\mathbf{a}^0 * \mathbf{b}) = 0$ and
$$\operatorname{rad}(\mathbf{a}^0 * \mathbf{b}) = (\alpha - a) * \operatorname{mag}(\mathbf{b}) = (\alpha - a) * (|b| + \beta).$$
Moreover, $\mathbf{a}^*$ does not contain zero in its interior, and following (2.10),
$$\operatorname{mid}(\mathbf{a}^* * \mathbf{b}) = a * b + a * \operatorname{sign}(b) * \min(|b|, \beta), \qquad \operatorname{rad}(\mathbf{a}^* * \mathbf{b}) = a * \beta + a * \max(|b|, \beta).$$
Case $0 \ge a > -\alpha$

We have $a - \alpha < 0 < a + \alpha < |a - \alpha|$. Using the decomposition technique of Section 2.3, $\mathbf{a}$ is decomposed into two parts as follows:
$$\mathbf{a}^0 = [-(a + \alpha),\ \alpha + a] = \langle 0,\ a + \alpha \rangle, \qquad \mathbf{a}^* = [2a,\ 0] = \langle a,\ -a \rangle.$$
As $\mathbf{a}^0$ is centered in zero, $\operatorname{mid}(\mathbf{a}^0 * \mathbf{b}) = 0$ and
$$\operatorname{rad}(\mathbf{a}^0 * \mathbf{b}) = (\alpha + a) * \operatorname{mag}(\mathbf{b}) = (\alpha + a) * (|b| + \beta).$$
Again, $\mathbf{a}^*$ does not contain zero in its interior; following (2.10),
$$\operatorname{mid}(\mathbf{a}^* * \mathbf{b}) = a * b + \operatorname{sign}(a) * (-a) * \operatorname{sign}(b) * \min(|b|, \beta) = a * b + a * \operatorname{sign}(b) * \min(|b|, \beta),$$
$$\operatorname{rad}(\mathbf{a}^* * \mathbf{b}) = |a| * \beta - a * \max(|b|, \beta).$$
$$\mathbf{a} * \mathbf{b} \subseteq \mathbf{a}^0 * \mathbf{b} + \mathbf{a}^* * \mathbf{b} \equiv \mathbf{a} \sim \mathbf{b},$$
where
$$\begin{aligned}
\operatorname{mid}(\mathbf{a} \sim \mathbf{b}) &= a * b + a * \operatorname{sign}(b) * \min(|b|, \beta), \\
\operatorname{rad}(\mathbf{a} \sim \mathbf{b}) &= |a| * \beta - a * \max(|b|, \beta) + (\alpha + a) * (|b| + \beta) \\
&= |a| * \beta + \alpha * |b| + \alpha * \beta + a * (|b| + \beta - \max(|b|, \beta)) \\
&= |a| * \beta + \alpha * |b| + \alpha * \beta - |a| * \min(|b|, \beta).
\end{aligned}$$
Hence, the last two cases, where $0 \le a < \alpha$ or $0 \ge a > -\alpha$, can be written with the same formula:
$$\operatorname{mid}(\mathbf{a} \sim \mathbf{b}) = a * b + a * \operatorname{sign}(b) * \min(|b|, \beta), \qquad \operatorname{rad}(\mathbf{a} \sim \mathbf{b}) = |a| * \beta + \alpha * |b| + \alpha * \beta - |a| * \min(|b|, \beta). \tag{2.11}$$
Let us now compare the two formulae (2.10) and (2.11) to deduce a single formula for the general case. Recall formula (2.10) for the case where $\mathbf{a}$ does not contain zero:
$$\operatorname{mid}(\mathbf{a} * \mathbf{b}) = a * b + \operatorname{sign}(a) * \alpha * \operatorname{sign}(b) * \min(|b|, \beta), \qquad \operatorname{rad}(\mathbf{a} * \mathbf{b}) = |a| * \beta + \alpha * \max(|b|, \beta).$$
If $\mathbf{a}$ does not contain zero, then $\min(|a|, \alpha) = \alpha$. Otherwise, if $\mathbf{a}$ contains zero, then $\min(|a|, \alpha) = |a|$, hence $a = \operatorname{sign}(a) * |a| = \operatorname{sign}(a) * \min(|a|, \alpha)$.
Therefore the midpoints of the two formulae (2.10) and (2.11) can be written in the same form. Finally, these two formulae can be unified, whether $\mathbf{a}$ contains zero or not:
$$\operatorname{mid}(\mathbf{a} \sim \mathbf{b}) = a * b + \operatorname{sign}(a) * \min(|a|, \alpha) * \operatorname{sign}(b) * \min(|b|, \beta), \qquad \operatorname{rad}(\mathbf{a} \sim \mathbf{b}) = |a| * \beta + \alpha * |b| + \alpha * \beta - \min(|a|, \alpha) * \min(|b|, \beta). \tag{2.12}$$
Denote
$$\operatorname{mmr}(\mathbf{a}) = \operatorname{sign}(a) * \min(|a|, \alpha), \qquad \operatorname{mmr}(\mathbf{b}) = \operatorname{sign}(b) * \min(|b|, \beta);$$
then (2.12) can be rewritten as:
$$\operatorname{mid}(\mathbf{a} \sim \mathbf{b}) = a * b + \operatorname{mmr}(\mathbf{a}) * \operatorname{mmr}(\mathbf{b}), \qquad \operatorname{rad}(\mathbf{a} \sim \mathbf{b}) = |a| * \beta + \alpha * |b| + \alpha * \beta - |\operatorname{mmr}(\mathbf{a})| * |\operatorname{mmr}(\mathbf{b})|. \tag{2.13}$$
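For a single pair of scalar intervals given in midpoint-radius form, formula (2.13) reads as follows (assumed variable names, rounding ignored):

mma = sign(ma) * min(abs(ma), ra);     % mmr(a)
mmb = sign(mb) * min(abs(mb), rb);     % mmr(b)
mc  = ma*mb + mma*mmb;                                       % midpoint of a ~ b
rc  = abs(ma)*rb + ra*abs(mb) + ra*rb - abs(mma)*abs(mmb);   % radius of a ~ b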
This formula differs from Rump's algorithm (2.2) only by the term $\operatorname{mmr}(\mathbf{a}) * \operatorname{mmr}(\mathbf{b})$, which can be considered as a correction term to Rump's algorithm. Using this correction term and the fact that $|\operatorname{mmr}(\mathbf{a}) * \operatorname{mmr}(\mathbf{b})| \le |\operatorname{mmr}(\mathbf{a})| * |\operatorname{mmr}(\mathbf{b})|$, it is easy to deduce that $\mathbf{a} \sim \mathbf{b}$ is always included in $\mathbf{a} \odot \mathbf{b}$ for all $\mathbf{a} \in \mathbb{IR}$ and $\mathbf{b} \in \mathbb{IR}$. The following proposition provides an estimation of the accuracy of $\mathbf{a} \sim \mathbf{b}$.

Proposition 2.4.1. Let $\mathbf{a}$, $\mathbf{b}$ be two intervals. The product of $\mathbf{a}$ and $\mathbf{b}$ as defined by (2.13) is:
• equal to the exact product when $\mathbf{a}$ or $\mathbf{b}$ does not contain zero,
• an enclosure of the exact product with an overestimation factor less than or equal to $4 - 2\sqrt{2}$ when both $\mathbf{a}$ and $\mathbf{b}$ contain zero in their interior.

Proof. Since the formula is obtained through the decomposition proposition, the overestimation factor follows easily from the previous section.
2.4.3 Implementation

To implement this algorithm, we first have to implement the function mmr. Getting the sign of a matrix is often costly, hence instead of using the original formula we use an alternative formula which employs only maximum and minimum functions.

Proposition 2.4.4. For $\mathbf{a} = \langle a, \alpha \rangle \in \mathbb{IR}$,
$$\operatorname{mmr}(\mathbf{a}) = \min(\max(a, -\alpha), \alpha).$$
Proof. The equivalence of this alternative formula with the original formula can be checked by inspecting all possible relative positions of $a$ with respect to $0$ and $\alpha$.
1. If $a \ge 0$ then $\operatorname{sign}(a) = 1$ and $a = |a|$. Hence $\min(\max(a, -\alpha), \alpha) = \min(|a|, \alpha) = \operatorname{sign}(a) * \min(|a|, \alpha)$.
2. If $a < 0$ then $\operatorname{sign}(a) = -1$ and $\max(a, -\alpha) = \max(-|a|, -\alpha) = -\min(|a|, \alpha) < 0 < \alpha$. Hence $\min(\max(a, -\alpha), \alpha) = -\min(|a|, \alpha) = \operatorname{sign}(a) * \min(|a|, \alpha)$.
The implementation in IntLab of the function mmr, using this proposition, exhibits a minor improvement: it is faster than the straightforward implementation using the original formula by a factor of about 4/3.
Although the IntLab library supports both the endpoint and the midpoint-radius representations of intervals, for real input data intervals are exposed to the user in endpoint format; the midpoint-radius format is only used internally for complex data. Hence, to implement our algorithm, we have to transform the input data from the endpoint representation to the midpoint-radius representation before performing the computations, and finally transform the results from the midpoint-radius representation back to the endpoint representation for output.
Since floating-point arithmetic is used to implement interval arithmetic, rounding errors must be taken care of when transforming interval data. Transforming from the midpoint-radius representation to the endpoint representation is simple:
$$\underline{a} = \nabla(a - \alpha), \qquad \overline{a} = \Delta(a + \alpha).$$
Conversely, an enclosing midpoint-radius representation $\langle a, \alpha \rangle$ of an interval $\mathbf{a} = [\underline{a}, \overline{a}]$ given by its endpoints can be computed as
$$a = \Delta((\underline{a} + \overline{a}) * 0.5), \qquad \alpha = \Delta(a - \underline{a}).$$
Indeed, $\alpha = \Delta(a - \underline{a}) \ge a - \underline{a}$, hence $a - \alpha \le \underline{a}$. Moreover, $a = \Delta((\underline{a} + \overline{a}) * 0.5) \ge (\underline{a} + \overline{a}) * 0.5$, hence $a + \alpha \ge a + (a - \underline{a}) = 2a - \underline{a} \ge (\underline{a} + \overline{a}) - \underline{a} = \overline{a}$. Combining this with $a - \alpha \le \underline{a}$, we get $a - \alpha \le \underline{a} \le \overline{a} \le a + \alpha$, and thus $\mathbf{a} \subseteq \langle a, \alpha \rangle$.
As in Rump's algorithm, because of rounding errors, the midpoint of the result cannot be computed exactly using formula (2.14). Hence, we have to compute an enclosure of the midpoint: it must be computed twice, once in downward rounding mode and once in upward rounding mode.
The following function, named igemm3, implements this algorithm in MatLab, using the IntLab library.
1   function C = igemm3(A, B)
2   setround(1);                 % switch to upward rounding mode
3   mA = (sup(A) + inf(A)) * 0.5;
4   mB = (sup(B) + inf(B)) * 0.5;
5   rA = mA - inf(A);
6   rB = mB - inf(B);
7   mmA = min(max(mA, -rA), rA);
8   mmB = min(max(mB, -rB), rB);
18  mCu = mCu + rC;
19  setround(-1);                % switch to downward rounding mode
20  mCd = mCd - rC;
22  C = infsup(mCd, mCu);
23  end
$$\Delta(1 + 2^{-54} - 2^{-53}) = \Delta(\Delta(1 + 2^{-54}) - 2^{-53}) = \Delta((1 + 2^{-52}) - 2^{-53}) = 1 + 2^{-52} > 1.$$
In the presence of rounding errors, computed results depend not only on the input data but also on the order of the computations. With the same example as above, if we change the order of the computations then the computed result is better.
One useful trick to reduce the effect of rounding errors when performing floating-point summation is to sum the smaller terms first. It is shown in [Hig02, chap. 4] to be the best strategy (in terms of the accuracy of the final result) when all summands are non-negative, and it is often a good heuristic otherwise. By definition of mmA and mmB, mmA ∗ mmB and abs(mmA) ∗ abs(mmB) should be the smallest among the sub-products. Because, in the implementation of A ~ B, we need to compute the addition and the subtraction of midpoint and radius to get the final result in endpoint representation, we first compute these two smallest products, then compute the other sub-products, hoping to reduce rounding errors. This leads to another implementation of formula (2.14).
1   function C = igemm4(A, B)
2   setround(1);
3   mA = (sup(A) + inf(A)) * 0.5;
4   mB = (sup(B) + inf(B)) * 0.5;
5   rA = mA - inf(A);
6   rB = mB - inf(B);
8   mmA = min(max(mA, -rA), rA);
9   mmB = min(max(mB, -rB), rB);
11  setround(-1);
16  setround(1);
20  mCu = mCu + rC;
21  mCu = mCu + mA * mB;
22  setround(-1);
23  mCd = mCd - rC;
24  mCd = mCd + mA * mB;
26  C = infsup(mCd, mCu);
27  end
With this new implementation, results computed by (2.14) are at least as accurate as results computed by the IntLab library when the input intervals are narrow. On the other hand, when the input intervals are thick, this implementation provides results tighter than those computed by the IntLab library.
Using 7 floating-point matrix multiplications, the theoretical factor of this algorithm in execution time with respect to a floating-point multiplication is 7. In practice, because of data manipulation of order $O(n^2)$, when the matrix dimension is small the practical factor is much higher than the theoretical factor; the theoretical factor is only reached when the matrix dimension is large enough, i.e. larger than 100 × 100, see Figure 2.2.
2.5 Conclusion
Using solely floating-point matrix operations to implement interval matrix products, the
performance of our algorithms depends strongly on the performance of the employed
floating-point matrix library. Moreover, these algorithms can be easily parallelized by
using parallelized versions of the floating-point matrix library.
We can use whatever optimized library is available on the target system, with only one requirement: the library must support upward and downward rounding modes. We can cite for example the ATLAS and GotoBLAS libraries. Unfortunately, this requirement prevents us from using Strassen-like algorithms or even faster ones [Str69, Pan84, Hig02] to reduce the complexity of the floating-point matrix product below $O(n^3)$.
Rump’s algorithm computes the interval matrix products very fast, with only four
floating-point matrix products, but suffers from a factor of overestimation in the worst
case of 1.5. Our algorithms provide a trade-off between performance and accuracy. With
five more floating-point matrix products for the first algorithm, and three more floating-
point matrix products for the second algorithm, we manage to reduce the overestimation
factor in the worst case to approximately 1.18. Moreover, when one multiplier does not
contain zero, then our algorithms provide exact results in exact arithmetic. In the presence
of rounding errors, the second proposed algorithm always provides results of smaller width
than those computed by Rump’s algorithm.
Chapter 3
Verification of the solution of a linear system
This chapter deals with the verification of the floating-point solutions of linear
systems of equations. Our approach is based on the floating-point iterative re-
finement, applied to the system pre-multiplied by an approximate inverse of the
coefficient matrix. In our method, interval arithmetic is used to compute the
pre-multiplication and the residual. Then interval iterative improvements are
applied, namely the interval Jacobi or Gauss-Seidel iterations. The computed
result is a tight enclosure of the correction term and not merely an approximation of it.
This method is implemented in MatLab using the IntLab library, as a function
called certifylss, to perform numerical experiments and to compare
our method with an algorithm proposed by S.M. Rump, which is available in the
IntLab library as the function verifylss. Numerical results show the
high accuracy of the two verified functions, at the cost of a longer execution time
in comparison with the MatLab non-verified built-in function. When the con-
dition number of the matrix of the system is small or moderate, certifylss is
faster than verifylss. Nevertheless, certifylss gets slower than verifylss
as the coefficient matrix is ill-conditioned.
3.1 Introduction
In this chapter, our approach is presented for solving a linear system in floating-point arith-
metic and at the same time verifying the computed solution. “Verify” means to compute
an enclosure of the error by switching from floating-point arithmetic to interval arithmetic
to solve the residual system: this yields a guaranteed enclosure of the error between the
exact result and the approximate floating-point result. This idea can be found in the
verifylss function of the IntLab library [Rum]. The function verifylss first computes a
floating-point approximation of the solution using iterative refinement, then it switches to
interval arithmetic to compute a “tight” interval error bound using a method due to Neu-
maier [Neu99]. Our proposal is to use alternately floating-point arithmetic and interval
arithmetic to refine simultaneously the approximation and the error bound.
The use of the residual is the basis of iterative refinement methods [Mol67, Ske80,
Wil63, Hig02]. An enclosure of the error can be computed, using interval arithmetic, by
adapting one of these iterative refinement methods. These two building blocks, i.e, the
floating-point solution of a linear system and the iterative refinement of the error bounds
using interval arithmetic, are combined to produce a more accurate solution along with a
tight enclosure of the error. In practice, the error bound is tight enough to indicate the
number of correct digits of the approximate solution.
where ε = 2^−53 is the machine relative error. Hence 1 − ε and 1 + 2ε are floating-point
numbers. Nevertheless, 1 + ε is not a floating-point number.
Using forward substitution to solve this lower triangular system in interval arithmetic,
it is obvious that
Note that, except for the computation of the first element which leads to an interval
of small width, all the other computations are exact, i.e. no rounding error occurs. By
induction, one can prove that, for i ≥ 3,

    x_i = [−2^{i−2}ε, 2^{i−2}ε].
Hence the width of the computed solution's components explodes exponentially fast
with respect to the row index. Meanwhile, it is easy to verify that the exact solution is
(1/(1−ε), −ε/(1−ε), 0, 0, 0, 0)^T. Moreover, the coefficient matrix is well-conditioned, since

    [ 1−ε                 ]^{−1}    [  1/(1−ε)                     ]
    [ 1   1               ]         [ −1/(1−ε)   1                 ]
    [ 1   1   1           ]     =   [       0   −1   1             ]
    [ 1   1   1   1       ]         [       0    0  −1   1         ]
    [ 1   1   1   1   1   ]         [       0    0   0  −1   1     ]
    [ 1   1   1   1   1  1]         [       0    0   0   0  −1   1 ] .
The main reason for that explosion is that there is no cancellation in interval arithmetic:
for any x ∈ IR of nonzero width, x − x ≠ 0. Addition and subtraction both increase the width of
intervals. Hence, a naive application of interval arithmetic does not provide good results.
One important trick when working with interval arithmetic is to use original data, in
punctual form, as long as possible [Rum05].
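To make this concrete, here is a small IntLab illustration (a sketch, assuming the IntLab function infsup is available) of the absence of cancellation:

x = infsup(1, 2);   % a thick interval
y = x - x           % yields [-1, 1], not 0: the width doubles instead of cancelling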
The verifylss function of the IntLab library [Rum] verifies the solution of a linear
system. Its signature is given below
x = verifylss(A, b)
    e^K_{i+1} = z + C · e^K_i + 0.1 · rad(e^K_i) · [−1, 1].
[HS81, Han92, Neu99]. The formula for the interval enclosure of the solution set of interval
linear systems of equations, in the case where the coefficient matrix is an H-matrix, is given
by the following theorem of Ning and Kearfott [NK97].
Theorem 3.2.2 (Ning-Kearfott [NK97]). Let A ∈ IR^{n×n} be an H-matrix, b ∈ IR^n a
right-hand side,

    u = ⟨A⟩^{−1}|b|,   d_i = (⟨A⟩^{−1})_{ii}

and

    α_i = ⟨A⟩_{ii} − 1/d_i,   β_i = u_i/d_i − |b_i|,

where ⟨A⟩ is the comparison matrix of A, which is defined by

    ⟨A⟩_{i,j} = mig(A_{i,j})   for i = j,
    ⟨A⟩_{i,j} = −mag(A_{i,j})  for i ≠ j.
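As a small illustration of this definition, the comparison matrix could be formed in MatLab/IntLab along the following lines; the function name comparison_matrix is ours, and mig and mag are assumed to be the IntLab mignitude and magnitude functions:

function C = comparison_matrix(A)
% comparison matrix <A> of an interval matrix A:
% diagonal entries mig(A_ii), off-diagonal entries -mag(A_ij)
C = -mag(A);
n = size(A, 1);
C(1:n+1:end) = mig(diag(A));
end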
[Hig97] gives a more thorough and general analysis for a generic solver and for both fixed
and mixed computing precision. In fixed precision iterative refinement, the working pre-
cision is used for all computations. In mixed precision iterative refinement, residuals are
computed in twice the working precision.
First, it is stated in [Mol67, Hig97] that the rates of convergence of mixed and fixed
precision iterative refinements are similar. It is also stated in [Mol67] that the computing
precision used for residual computations does not materially affect the rate of convergence.
However, the computing precision used for residual computations affects the accuracy of
results. In this section, only forward errors on x̃ are of concern, namely the normwise
relative error ‖x − x̃‖∞/‖x‖∞ and the componentwise relative error ‖(x_i − x̃_i)/x_i‖∞.
Indeed, given a matrix A of dimension n which is not too ill-conditioned, the relative
error, after convergence, of mixed precision iterative refinement is of order
‖x − x̃‖∞/‖x‖∞ ≈ ε, with ε being the relative machine error associated with the working
precision [Hig97]. For fixed precision iterative refinement, the relative error, after
convergence, is only of order ‖x − x̃‖∞/‖x‖∞ ≈ 2nε cond(A, x): a relative error of order ε
is no longer ensured. Here cond(A, x) denotes the condition number of the linear system when only A
is subject to perturbations [Ske79]. However, this relative error bound is the best that can
be obtained without using higher precision. Usually, fixed precision iterative refinement is
used to get a stable solution of linear systems, such as in [LD03, LLL+ 06]. Indeed, only
one iteration of iterative refinement with only fixed precision accumulation of the residual
suffices to make Gaussian elimination componentwise backward stable [Ske80].
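For illustration, mixed precision iterative refinement could be sketched as follows in MatLab; purely as an assumption for this sketch, the working precision is single and the residual is accumulated in double, which plays the role of "twice the working precision":

% A, b stored in the working precision (single in this sketch)
[L, U, p] = lu(A, 'vector');                 % factorization in the working precision
x = U \ (L \ b(p));                          % initial solution
for k = 1:5
    r = double(b) - double(A) * double(x);   % residual in twice the working precision
    d = U \ (L \ single(r(p)));              % correction in the working precision
    x = x + d;                               % refined solution
end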
The goal is to make the iteration contractant and thus to ensure its convergence. However,
the system now has an interval matrix and solving such a system is NP-hard [RK95, Roh05].
For A ∈ IR^{n×n} and b ∈ IR^n, the solution set of the interval linear system Ax = b, denoted
by Σ(A, b), is the set of all the solutions of all the possible punctual linear systems taken
from the interval system:

    Σ(A, b) = {x ∈ R^n | ∃Ã ∈ A, ∃b̃ ∈ b : Ãx = b̃}.
Usually, Σ(A, b) is not an interval vector. It is not even a convex set.
Example 5. Consider the following system:

    [ 2        [−1, 1] ] [ x1 ]   [ [−2, 2] ]
    [ [−1, 1]  2       ] [ x2 ] = [    0    ] .        (3.3)

The solution of the above system is given by

    Σ(A, b) = {x ∈ R^2 | 2|x2| ≤ |x1|, 2|x1| ≤ 2 + |x2|},

which is depicted in Figure 3.1 by the grayed out region.
If A is regular, i.e. if every Ã ∈ A is regular, then Σ(A, b) is bounded, since A and
b are bounded intervals. Hence, the hull of the solution set is defined as follows:

    A^H b = □Σ(A, b) = [inf(Σ(A, b)), sup(Σ(A, b))].        (3.4)

The hull of the solution set of the system (3.3) is depicted by the dashed rectangular
box in Figure 3.1. Let A^{−1} denote the matrix inverse of A, which is

    A^{−1} = □{Ã^{−1} | Ã ∈ A};

then we have the following inclusion [Neu90, Eq. (6), Section 3.4, p.93]:

    A^H b ⊆ A^{−1} b,   with equality if A is thin.
Algorithms for solving interval linear systems return a box containing the convex hull
of the solution set (3.4), which is thus not the minimal enclosure. Direct algorithms to
enclose the solution of an interval linear system exist [Han92, Roh93, Neu99]. Here we
prefer an iterative refinement method, which reuses some of the previous floating-point
computations. In our case, e is usually relatively small with respect to the approximate
solution of the original system x̃. Hence ideally, the enclosure of the error corresponds to
the last bits of x̃, and does not need to be too tight. Indeed, as it will be shown in the
next section, such an enclosure of that convex hull can be obtained at a cost of O(n2 )
operations. This complexity is asymptotically negligible compared to the cost of solving
the original system using floating-point arithmetic.
[Figure 3.1: the solution set Σ(A, b) of system (3.3) (grayed out region) and its hull (dashed box), in the (x1, x2) plane.]
Lemma 3.4.1. Let e∗ be the exact solution of the system Ae = mid(r). If ẽ is a solution
of the system Ae = r then 2e∗ − ẽ is also a solution of the system Ae = r.
Proof. e∗ is the exact solution of the system Ae = mid(r) hence Ae∗ = mid(r).
ẽ is a solution of the system Ae = r, which means that there exists a vector r̃ ∈ r such
that Aẽ = r̃.
Moreover, r̃ ∈ r ⇒ |r̃−mid(r)| ≤ rad(r) ⇒ mid(r)+(mid(r)− r̃) ∈ r ⇒ 2mid(r)− r̃ ∈ r.
Replacing mid(r) and r̃ by Ae∗ and Aẽ respectively, we get

    2Ae∗ − Aẽ ∈ r  ⇒  A(2e∗ − ẽ) ∈ r.
The above lemma means that the exact solution of the system Ae = mid(r) is the
midpoint of the hull of the solution set of the system Ae = r. Hence, if we can obtain a good
enclosure of the solution set of the system Ae = r, then

    mid(A \ r) = e∗ ≈ A \ mid(r).
x̃ = A \ b
while (stopping criterion not verified) do
r̂ = b − Ax̃
ê = A \ r̂
x̃ = x̃ + ê
end while
One can see that the numerical behavior of x̃ in the above algorithm is similar to that
of the floating-point iterative refinement.
x̃′ = x̃ + mid(e)
e = e − [x̃′ − x̃]
x̃ = x̃′
Using [x̃′ − x̃] instead of mid(e) seems to inflate the interval a little bit. In fact, if mid(e)
is small in magnitude with respect to x̃, i.e. |mid(e)| ≤ |x̃|/2, which is often the case, then
x̃ and x̃′ have the same sign and even satisfy x̃/2 ≤ x̃′ ≤ 2x̃; thus, following the Sterbenz
lemma [Ste74], x̃′ − x̃ is exact in floating-point arithmetic.
Moreover, since x∗ − x̃ ∈ e, the inclusion x∗ − x̃′ ∈ e − [x̃′ − x̃] holds when the subtraction
is evaluated in interval arithmetic. Hence the inclusion property is ensured by this updating process. This updated error
bound will serve as the starting point for the next step, or as the initial solution for the
next refinement: there is no need to recompute it.
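A sketch of this coupled update in MatLab/IntLab (x is the floating-point approximation and e an interval enclosure of the error x∗ − x; variable names are ours):

xnew = x + mid(e);        % floating-point update of the approximation
e    = e - (xnew - x);    % updated enclosure of x* - xnew; xnew - x is exact by Sterbenz
x    = xnew;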
Still, to start the refinement, an initial solution of the residual system is needed. In
the following section, we explain how to compute an initial solution of an interval linear
system.
Proposition 3.4.2 ([Neu90, Prop. 4.1.9, p.121]). Let A ∈ IR^{n×n} be an interval matrix and
let C, C′ ∈ R^{n×n}.
where A^H b is the convex hull of the solution set of the system Ax = b, ⟨A⟩ is the
comparison matrix of A, whose components are defined above, and ‖b‖_v is the scaled norm
with respect to v: ‖b‖_v = max_i(|b_i|/v_i), i ∈ 1, . . . , n.

    |A^{−1}r| ≤ ‖Rr‖_v u,
    A^{−1}r ⊆ ‖Rr‖_v · [−u, u].
What remains now is to find a positive vector u so that ⟨[RA]⟩u > 0. Using [Neu90,
Prop. 3.7.3, p.112], u is set to an approximate solution ũ of the linear system ⟨[RA]⟩u = e,
where e = (1, . . . , 1)^T. Because two important properties of an H-matrix A are that ⟨A⟩
is regular and ⟨A⟩^{−1}e > 0, ũ satisfies ũ ≈ ⟨[RA]⟩^{−1}e > 0 and ⟨[RA]⟩ũ ≈ e > 0, for
sufficiently good approximations ũ.
In our case, A is a floating-point matrix. If A is well preconditioned, or more precisely if
RA is close to the identity, or diagonally dominant, then it suffices to use the vector (1, . . . , 1)^T
as the value of u, and the product ⟨[RA]⟩u is positive. If the test of positivity of ⟨[RA]⟩u
fails with u = (1, . . . , 1)^T, then ⟨[RA]⟩ is not a diagonally dominant matrix. However, it
does not mean that ⟨[RA]⟩ is not an H-matrix. In this case, we apply some floating-point
Jacobi iterations in order to compute an approximate solution of the system ⟨[RA]⟩u = e ≡
(1, . . . , 1)^T. Let D, E be the diagonal and off-diagonal parts of ⟨[RA]⟩. Jacobi iterations
can be expressed by the following formula, where u is the previous iterate, u′ is the new
iterate, and v is the product between ⟨[RA]⟩ and u:
Note that, in order to ensure that u is a non-negative vector, we take the absolute value
of the updated solution after each refinement. Moreover, as it is not necessary to compute
an accurate solution, the refinement stops as soon as Ku > 0.
If the Jacobi iterative refinement fails to obtain a vector u ≥ 0 such that v = K ∗ u > 0
after 10 iterations, then our algorithm stops and issues a message of failure. Each Jacobi
refinement costs only O(n^2) floating-point operations, hence the overhead of execution time
introduced by this refinement is negligible.
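The following MatLab sketch summarizes this search for u; here K is assumed to hold the comparison matrix ⟨[RA]⟩ as a point matrix, and the loop bound of 10 matches the limit mentioned above:

n = size(K, 1);
u = ones(n, 1);                           % first trial: u = (1,...,1)^T
d = diag(K);
for it = 1:10
    v = K * u;
    if all(v > 0), break; end             % success: <[RA]>*u > 0
    u = abs(u + (ones(n,1) - v) ./ d);    % one Jacobi sweep towards <[RA]>u = e, kept non-negative
end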
Nevertheless, Proposition 3.4.3 does not yield a tight enclosure of the solution. The
next section presents methods to improve this interval enclosure.
Developing the i-th line and separating the i-th component, one gets:

    ẽ_i = ( z̃_i − Σ_{j=1}^{i−1} K̃_{i,j} ẽ_j − Σ_{j=i+1}^{n} K̃_{i,j} ẽ_j ) / K̃_{i,i},   with ẽ_i ∈ e_i.
Replacing punctual terms by the corresponding interval terms yields the formula of the
interval Jacobi iteration:

    ẽ_i ∈ e′_i := ( ( z_i − Σ_{j=1}^{i−1} K_{i,j} e_j − Σ_{j=i+1}^{n} K_{i,j} e_j ) / K_{i,i} ) ∩ e_i.
Taking into account the fact that, for the components that have already been refined, ẽ_j
belongs to both the original value e_j and the refined value e′_j, e_j can be replaced by e′_j for j < i
to obtain the Gauss-Seidel iteration:

    e′_i := ( ( z_i − Σ_{j=1}^{i−1} K_{i,j} e′_j − Σ_{j=i+1}^{n} K_{i,j} e_j ) / K_{i,i} ) ∩ e_i.        (3.5)
Taking the intersection with the former iterate ei implies the contracting property of
both Jacobi and Gauss-Seidel iterations. Hence, both iterations converge. Nevertheless,
making full use of the refined values, Gauss-Seidel iterations converge much more quickly
than Jacobi iterations. As mentioned in Section 3.4.3, our sufficient condition to compute
an initial solution of the interval residual system is that [RA] is an H-matrix. Under this
condition, Gauss-Seidel iterations converge very quickly.
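A minimal MatLab/IntLab sketch of one Gauss-Seidel sweep (3.5) is given below; K and z are assumed to be intval quantities, and intersect is the IntLab intersection of intervals:

function e = gauss_seidel_sweep(K, z, e)
n = length(z);
for i = 1:n
    idx = [1:i-1, i+1:n];
    s = z(i) - K(i, idx) * e(idx);        % uses already refined components e(1..i-1)
    e(i) = intersect(e(i), s / K(i,i));   % contract the i-th component
end
end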
Let us now detail the complexity of an iteration. For both Jacobi and Gauss-Seidel
iterations, the improvement of each component requires n − 1 interval multiplications,
n interval additions and one interval division. Thus, in total, each Jacobi/Gauss-Seidel
iteration costs O(n^2) interval operations. Hence, theoretically, the improvement stage
should not affect the overall cost of the method, which is O(n^3).
Complexity
The complexity of the whole algorithm is determined by counting the number of operations
for each step. To be more precise, we distinguish here several types of operations, namely:
• recomputing the residual system requires (2n^2 + O(n)) FLOP2s + (2n^2 + O(n)) FLOPs.
Stopping criteria
As mentioned in Section 3.4, the width of the interval error decreases after each step. So
it forms a non-negative, non-increasing sequence. This property leads to two stopping criteria (for
a study of stopping criteria, see [DHK+06]).
Firstly, we stop the computation when a required number of accurate bits is reached.
For example, working with double floating-point numbers, there is no point in getting a
result which is accurate to more than 52 bits. Using interval arithmetic, it is quite easy to
compute the number of exact bits in the result via componentwise relative errors:

    nb_bits = −log2 ( max_i ( rad(e_i) / |x̃_i| ) ).        (3.6)
Nevertheless, the program can end up without reaching the required number of exact
bits. Hence, we have to detect whether the iteration yields extra correct digits or stagnates.
Let e and e0 be two successive iterates, a second stopping criterion is that no more correct
digit is obtained in any component of the approximation:
    max_j ( ( rad(e_j) − rad(e′_j) ) / |x̃_j| ) < ε,        (3.7)

with ε being the machine relative error, which is 2^−53 for the IEEE 754 binary double precision
format.
In cases where neither of the two criteria above is met, it is necessary to stop
the computation when one has tried enough. Practically, in our implementations, the
maximum number of iterations is set to 10. Also, the maximum number of Gauss-Seidel
iterations is set to 5 due to its quick convergence property.
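As an illustration, the two criteria could be evaluated as follows in MatLab/IntLab, where e_prev and e are two successive error enclosures, x is the current approximation, and the variable names are ours:

eps_mach = 2^-53;
nb_bits  = -log2(max(rad(e) ./ abs(x)));                        % criterion (3.6)
stagnate = max((rad(e_prev) - rad(e)) ./ abs(x)) < eps_mach;    % criterion (3.7): improvement negligible
stop     = (nb_bits >= 52) || stagnate;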
precision which is based on error-free transforms, according to [Dek71, HLB01, LDB+02,
ORO05, LL08b, LL08a, Rum09b, Rum09a, NGL09, GNL09].
More precisely, it uses the error-free transforms of the sum and of the product
of two floating-point numbers. The error-free transform of the sum of two floating-point numbers,
called TwoSum, costs 6 FADDs. The error-free transform of the product of two
floating-point numbers, called TwoProd, costs 10 FADDs and 7 FMULs, unless a fused
multiply-add operation is available [BM05]. Hence the cost of computing the residual,
accounted for as 2n^2 + O(n) FLOP2s in the previous section, translates, when simulated
double-double precision is used, into 7n^2 FMULs + 16n^2 FADDs. In theory, this
cost can still be considered as O(n^2) FLOPs.
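For reference, a minimal MatLab sketch of the TwoSum error-free transform (assuming round-to-nearest arithmetic) is shown below; the residual function that follows uses the same operations inline:

function [s, e] = TwoSum(a, b)
% error-free transform of the sum: a + b = s + e exactly
s = a + b;
v = s - a;
e = (a - (s - v)) + (b - v);
end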
1 function res = residual(A, x, b)
2 setround(0)
3 factor = 2^27 + 1;
5 % split the vector -x into two parts x1 and x2
6 % x1 corresponds to the first half of the mantissa, of high weight
7 % x2 corresponds to the second half of the mantissa, of low weight
8 x = -x;
9 y = factor * x;
10 xbig = y - x;
11 x1 = y - xbig;
12 x2 = x - x1;
14 n = size(A, 1);
17 r = b;
19 er2 = zeros(n, 1);
20 er1 = zeros(n, 1);
21 for i = 1 : n
22 setround(0);
23 % split a column of A
24 a = A(:, i);
25 a2 = a * factor;
26 abig = a2 - a;
27 a1 = a2 - abig;
28 a2 = a - a1;
30 % EFT for the product
31 p_h = a * x(i);
32 p_l = a2 * x2(i) - (((p_h - a1 * x1(i)) - a2 * x1(i)) - a1 * x2(i));
34 % EFT for the sum of the high parts of the products
35 s = r + p_h;
36 v = s - r;
37 e = (r - (s - v)) + (p_h - v);
38 r = s;
40 setround(1);
41 er2 = er2 + e + p_l;
42 er1 = (er1 - e) - p_l;
43 end
45 res = r + infsup(-er1, er2);
46 end
47 end
Gauss-Seidel vs Jacobi
As mentioned in Section 3.4.5, each Gauss-Seidel iteration as well as each Jacobi iteration
costs n^2 IMULs + n^2 IADDs + O(n) IOPs. Nevertheless, making full use of refined values,
Gauss-Seidel iterations converge more quickly than Jacobi iterations. In our implementation,
this observation still holds in terms of the number of iterations. On the other hand, in terms
of execution time, it does not always hold, because of parallelism and memory access
issues in MatLab. Each Jacobi iteration can be translated directly into a
matrix-vector multiplication. Hence its performance relies totally on the underlying library.
Meanwhile, a Gauss-Seidel iteration is translated into a series of dot products, which
reduces the possibility of parallelism. Moreover, accessing rows individually in MatLab
is disadvantageous. Hence, in an implementation in MatLab, a Gauss-Seidel iteration is
much slower than a Jacobi iteration.
Therefore, for problems of not too large dimension, Jacobi iterations are preferable to
Gauss-Seidel iterations in MatLab. Their slower convergence with respect to Gauss-Seidel
iterations can be compensated by increasing the number of iterations, while still
gaining in execution time. This phenomenon has been observed in [BT89, chap.2, Sections
2.4 and 2.5, pp.130–142] for parallel implementation of classical iterative methods.
[Figure: accuracy in bits and execution time in seconds versus log2(condition number), for matlab, verifylss and certifylss.]
When the condition number increases, an accuracy of 52 bits cannot be obtained any
more. However, the execution time for certifylss becomes higher than the execution
time for verifylss. This is explained by the fact that verifylss uses only floating-point
iterations to refine the approximate solutions, while certifylss uses alternately floating-
point iterations and interval iterations to refine the approximate solutions, and at the
same time the error bounds. As the condition number increases, the number of iterations
also increases. Interval operations are slower than floating-point operations, hence, the
execution time of certifylss increases faster than the execution time of verifylss along
with the increase of the number of iterations.
When the coefficient matrix is not too ill-conditioned, the factor of execution time of
certifylss with respect to the non-verified function of MatLab is about 8. Nevertheless,
when the condition number increases, this factor is no longer observed. Not only are
interval operations slow, but as the condition number increases, the number of iterations
as well as the time spent recomputing the residual increase. As explained in the previous
section, residual computation is costly and introduces a significant overhead
of execution time to both certifylss and verifylss functions.
When the condition number gets close to 1/ε, both functions fail to verify the solution.
Indeed, a good approximation of the inverse of A cannot be obtained, thus the sufficient
condition that RA is an H-matrix does not hold.
3.6 Conclusion
Combining floating-point iterative refinement and interval improvement, we are able to
provide accurate results together with a tight error bound upon the approximate solutions.
This error bound, which is indeed an interval vector, enables us to state how many guar-
anteed bits there are in the approximate solutions. When the condition number of the
system is small to moderate, our method succeeds in verifying up to the last bit of the
approximate solution. It means that the computed solution is the exact solution rounded
to a floating-point vector in faithful rounding mode.
Using alternately floating-point iterative refinement and interval improvement also en-
ables us to detect easily when to terminate the refinements, by counting the number of
guaranteed bits as well as by checking improvement in the width of the error bound.
Moreover, our method also checks the feasibility before performing any further refinement.
Therefore our method exhibits better performance than Rump’s algorithm when the matrix
is either well-conditioned or too ill-conditioned.
On the other hand, when the matrix is ill-conditioned, the computation time of our
method increases very fast and it becomes much slower than Rump's algorithm, because
the proposed method makes intensive use of interval operations. The next chapter will
present a technique to overcome this problem.
Chapter 4
Relaxation method
4.1 Introduction
The interval improvement step is used to improve the error bound upon a computed ap-
proximation. Nevertheless, an intensive use of interval operations makes the interval im-
provement costly in terms of execution time. Especially when the condition number of the
system is high, the number of iterations is high. The overhead of execution time introduced
by the interval improvement becomes significant and even larger than the execution time
of computing a tight error bound enclosure according to [Neu99].
In this chapter, we introduce a technique, which we shall call the relaxation technique,
to reduce the overhead of execution time introduced by the interval improvement. By
nature, the error bound (i) is an enclosure of the difference between the approximation
and the exact solution and hence (ii) is used to update the approximation. This error
bound should correspond to lost bits in the approximation. Hence, it is not necessary to
compute an error bound with high accuracy. Thus we relax the tightness requirement on
this error bound to speed up the program.
    e′ = D^{−1}(b − (L + U)e).
Likewise, Gauss-Seidel iteration can also be expressed symbolically in the same way
Since D is a diagonal matrix, its inverse is also a diagonal matrix whose diagonal elements
are the inverses of the corresponding elements of D. Hence, the most expensive part of
Jacobi or Gauss-Seidel iterations is the interval matrix-vector multiplication by L and U.
Even if we use Rump's fast algorithm for interval matrix multiplication, the theoretical
factor in execution time between an interval matrix-vector product and a floating-point
matrix-vector product of the same size is 4. However, if we take into
account the execution time for transforming data, for example computing midpoint and
radius of the matrix, then the practical factor is even higher.
When applied to a preconditioned system, A is close to the identity, hence L, U are
close to zero. As it has been observed in Chapter 2 Eq. (2.4), an interval multiplication
between a centered-in-zero matrix and another interval matrix or vector is equivalent to
its floating-point counterpart in terms of execution time. Hence, to gain in performance,
the matrix of the system is enlarged, so as to have its off-diagonal elements centered in
0. Formulas for subsequent operations are thus simplified: the computed intervals are
symmetrical around 0 and only one endpoint is computed, using floating-point arithmetic.
More precisely, we will work with

    L̂ = [−|L|, |L|],   Û = [−|U|, |U|],        (4.2)

and with the relaxed matrix Â = L̂ + D + Û, which gives the relaxed system

    Âe = b.        (4.3)
Proposition 4.2.2. Let A, A′ ∈ IR^{n×n} be two regular matrices. Let b ∈ IR^n be an interval
vector. If A ⊆ A′ then Σ(A, b) ⊆ Σ(A′, b).
Proof. For all x̃ ∈ Σ(A, b), then
The solution set of the original system is depicted in Figure 4.1 by the grayed out
region. Meanwhile, the solution set of the relaxed system is depicted by the white region.
[Figure 4.1: solution set of the original system (grayed out region) and of the relaxed system (white region), with the enclosing boxes, in the (x1, x2) plane.]
One can see that the solution set of the relaxed system overestimates the solution set
of the original system. Nonetheless, if the off-diagonal elements of the original system are
almost centered in zero then the grayed out region is close to the white region.
Let us apply the interval improvement to this relaxed system.
The relaxed Jacobi iterations are

    e′ = D^{−1}(b − (L̂ + Û)e).

Since L̂ and Û are centered in zero, L̂ + Û is also centered in zero. According to Section
2.3.1, the product of L̂, Û or L̂ + Û with any interval vector can be computed by
just one punctual matrix-vector product: for x ∈ IR^n,

    (L̂ + Û) · x = (|L| + |U|) · mag(x) · [−1, 1].
Therefore the execution time of applying the interval improvement to the relaxed system
is much smaller than applying it to the original system. In theory, the factor of execution
time between an interval matrix-vector product with respect to a punctual matrix-vector
product is 4. The relaxed interval iterations also exhibit some advantages over the original
interval iterations regarding storage memory. As  is a matrix whose off-diagonal com-
ponents are centered in zero, to store Â, it is only needed to store its upper bound as a
floating-point matrix, and its diagonal as an interval vector.
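A sketch of this product in MatLab/IntLab, where M is assumed to store the magnitude |L| + |U| of the zero-centered off-diagonal part:

setround(1)              % upward rounding: the float product gives a rigorous upper bound
t = M * mag(e);          % one floating-point matrix-vector product
p = infsup(-t, t);       % p contains (Lhat + Uhat) * e
setround(0)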
    |D^{−1}_{ii}| = |1/D_{ii}| = 1/⟨D_{ii}⟩.
If ⟨A⟩ is a diagonally dominant matrix, then the Jacobi iterations converge very quickly.
Denote

    θ = max_{i=1,...,n} ( Σ_{k≠i} |A_{ik}| / ⟨A_{ii}⟩ ) = ‖⟨D⟩^{−1} |L + U|‖∞;

then the sequence ‖|e′| − u‖∞ converges more quickly than a geometric series of ratio
θ. Moreover, the Jacobi iterations converge even more quickly if the coefficient matrix is
close to the identity, since in this case θ is close to 0.
4.2.2 Accuracy
It is evident that using the relaxed system to improve the solution will produce a larger
enclosure of the solution than the original system. Nonetheless, in the context of our
method, the interval system is a preconditioned system, hence it is nearly centered in a
diagonal matrix if the original system is not too ill-conditioned.
This relaxation technique is not really a new idea. Although it is not explicitly stated so,
this relaxation technique is already used in [HS81, Han92, Neu99]. Let us recall Theorem
3.2.2: the solution of the original system (4.1) is bounded by

    Σ(A, b) ⊆ e^o,

where

    u = ⟨A⟩^{−1}|b|,   d_i = (⟨A⟩^{−1})_{ii},

and the solution of the relaxed system is bounded by

    Σ(Â, b) ⊆ ê,

where

    û = ⟨Â⟩^{−1}|b|,   d̂_i = (⟨Â⟩^{−1})_{ii}.
We have Â = L̂ + D + Û and thus Â_{ii} = D_{ii} = A_{ii}. Moreover, following Lemma
4.2.1, ⟨L̂⟩ = ⟨L⟩ and ⟨Û⟩ = ⟨U⟩, hence ⟨Â⟩ = ⟨L̂⟩ + ⟨D⟩ + ⟨Û⟩ = ⟨L⟩ + ⟨D⟩ + ⟨U⟩ = ⟨A⟩. It
means that û = u, d̂ = d, α̂ = α, β̂ = β. Hence e^o = ê. In other words, Theorem 3.2.2
computes an enclosure of the solution of (4.1) by computing the hull of the solution set of
the relaxed system (4.3).
Take Example 6 again. Applying Theorem 3.2.2 to compute an enclosure of the solution,
we have

    ⟨A⟩ = [  4  −1 ]        ⟨A⟩^{−1} = [ 4/15  1/15 ]
          [ −1   4 ]                   [ 1/15  4/15 ]

    u = ⟨A⟩^{−1}|b| = ( 18/15, 12/15 )^T,    d = ( 4/15, 4/15 )^T,

    α = ( 1/4, 1/4 )^T,    β = ( 1/2, 1 )^T,

    e^o_1 = (4 + [−1/2, 1/2]) / (4 + [−1/4, 1/4]) = [14/17, 6/5],
    e^o_2 = (2 + [−1, 1]) / (4 + [−1/4, 1/4]) = [4/17, 4/5].
The interval vector e^o is in fact the hull of the solution set of the relaxed system (4.4),
which is depicted by the outer dashed box in Figure 4.1. Meanwhile, the exact hull of the
solution set of the original system is

    e∗ = ( [14/15, 68/63], [4/15, 40/63] )^T,

which is depicted by the inner dotted-dashed box. It is obvious that e∗ is included in
e^o. Therefore, it is suggested by Neumaier [Neu99] that if the midpoint of the coefficient
matrix is not a diagonal matrix, then some Gauss-Seidel iterations should be applied in
order to obtain a tighter enclosure of the solution.
Nevertheless, the computed result of the interval improvement is not the hull of the
solution of the system Âx = b. Jacobi iterations converge to an interval vector which is
the solution of the following system of equations:

    e_i = ( b_i − Σ_{k≠i} Â_{ik} e_k ) / Â_{ii},   i = 1, . . . , n.
The uniqueness of this solution is provided by [Neu90, Theorem 4.4.4, p.145], which we recall
below.
Theorem 4.2.3. Let A ∈ IR^{n×n} be an H-matrix. Then, for all b ∈ IR^n, the system of
equations

    e_i = ( b_i − Σ_{k≠i} A_{ik} e_k ) / A_{ii},   i = 1, . . . , n,

has a unique solution z ∈ IR^n.
Denote A^F b = z. If A is thin then A^F b = A^{−1}b.
Applied to the relaxed system, the midpoint of Â is a diagonal matrix and the fixed
point of the Jacobi iterations can be computed by the following theorem [Neu90, Theorem
4.4.10, p.150].
Theorem 4.2.4. Let A ∈ IR^{n×n} be an H-matrix and suppose that mid(A) is diagonal.
Then

    |A^F b| = ⟨A⟩^{−1}|b|,

and z = A^F b can be expressed in terms of u = ⟨A⟩^{−1}|b| as
Consequently, we can compare the enclosure of the solution set of the system Âx = b
and the enclosure computed by the Jacobi iterations.
Corollary 4.2.5. Let A ∈ IR^{n×n} be an H-matrix and suppose that mid(A) is diagonal.
Then, for all b ∈ IR^n,
(1) |A^F b| = |A^H b|,

    (A^H b)_i = ( b_i + [−β_i, β_i] ) / ( A_{ii} + [−α_i, α_i] ).

Following Theorem 4.2.4, |A^F b| = ⟨A⟩^{−1}|b|. Therefore the first equality holds.
Because of the subdistributivity property, we have

    (A^F b)_i = ( b_i + (⟨A⟩_{ii} u_i − |b_i|)[−1, 1] ) / A_{ii}
              ⊆ b_i/A_{ii} + (⟨A⟩_{ii} u_i − |b_i|) [−1, 1]/A_{ii} ≡ y_i,

where

    |y_i| ≤ |b_i/A_{ii}| + (⟨A⟩_{ii} u_i − |b_i|) |[−1, 1]/A_{ii}|
          = |b_i|/⟨A⟩_{ii} + (⟨A⟩_{ii} u_i − |b_i|) / ⟨A⟩_{ii} = u_i,

    mid(y_i) = mid(b_i/A_{ii}) + mid( (⟨A⟩_{ii} u_i − |b_i|) [−1, 1]/A_{ii} ) = mid(b_i/A_{ii}).

Therefore

    rad(y_i) = |y_i| − |mid(y_i)| = u_i − |mid(b_i/A_{ii})|.

It is obvious that

    mid(b_i/A_{ii}) ∈ b_i/A_{ii} ⊆ ( b_i + [−β_i, β_i] ) / ( A_{ii} + [−α_i, α_i] ) = (A^H b)_i.

From the definition of the minimal magnitude value, we get

    |mid(b_i/A_{ii})| ≥ ⟨(A^H b)_i⟩.

Hence

    2 rad(A^H b)_i ≥ |A^H b|_i − ⟨A^H b⟩_i ≥ |A^F b|_i − |mid(b_i/A_{ii})|
                  ≥ |A^F b|_i − |mid(A^F b)|_i ≥ rad(A^F b)_i.
Meanwhile, ⟨Â⟩ = ⟨A⟩ implies ⟨Â⟩v ≥ w > 0. It means that the solution of the relaxed
system (4.3) is bounded by the same interval vector as the original system. Hence, the
initial solution of both the original system and the relaxed system is computed by

    ‖b‖_w · [−v, v] = (4/3) · ( [−1, 1], [−1, 1] )^T = ( [−4/3, 4/3], [−4/3, 4/3] )^T.

This initial solution is much larger than the hull of the solution set computed by
Theorem 3.2.2, which is ( [14/17, 6/5], [4/17, 4/5] )^T.
On the other hand, from Theorem 4.2.4, if we can obtain an initial solution e^0 = [−u, u],
with u = ⟨A⟩^{−1}|b|, then the interval improvement converges after only one Jacobi iteration:

    e^1_i = ( b_i − Σ_{k≠i} A_{ik} e^0_k ) / A_{ii}
          = ( b_i − Σ_{k≠i} [−|A_{ik}| u_k, |A_{ik}| u_k] ) / A_{ii}
          = ( b_i − ( Σ_{k≠i} |A_{ik}| u_k ) [−1, 1] ) / A_{ii}.
This initial enclosure is better than the one obtained with v = (1, . . . , 1)^T and makes the
iterations converge more quickly. Nonetheless, the computation of an approximation of the
solution of the system ⟨A⟩x = |b| costs O(n^3) floating-point operations and therefore introduces a
significant overhead of execution time to the overall algorithm. Thus we do not use it in
our algorithms.
4.3.1 Implementation
There are only some minor modifications in certifylss_relaxed with respect to its pre-
decessor certifylss. First, the preconditioned matrix is inflated, hence it is not necessary
to store the off-diagonal components of the preconditioned matrix as an interval but to
store only their maximal magnitude value. Consequently, to store the product of that
matrix with an interval vector, it is sufficient to store only the maximal magnitude of the
result.
Algorithm 4.1 describes the interval improvement using Jacobi iterations and the re-
laxation technique.
Note that D and K are computed only once, and need not to be recomputed each time
the interval improvement is applied.
Algorithm 4.1 Interval improvement function refine using relaxed Jacobi iterations
Require: e ∈ IF^n, an initial enclosure of the solution of the interval system Ae = z,
where A ∈ IF^{n×n} is an H-matrix and z ∈ IF^n
Ensure: e is improved using Jacobi iterations
D = diag(A)
K = mag(A)
Set the diagonal components of K to 0
while (termination criteria not satisfied) do
setround(1)
tmp = K ∗ mag(e)
e1 = (z + infsup(−tmp, tmp)) ./ D
e = intersect(e, e1)
end while
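A possible MatLab/IntLab transcription of Algorithm 4.1 is sketched below; the function name refine_relaxed and the fixed number of sweeps are assumptions of the sketch:

function e = refine_relaxed(A, z, e)
D = diag(A);                       % interval diagonal of A
K = mag(A);                        % magnitude of A as a point matrix
K(1:size(K,1)+1:end) = 0;          % zero the diagonal of K
for it = 1:5                       % a small, fixed number of Jacobi sweeps
    setround(1)                    % upward rounding for a rigorous bound
    tmp = K * mag(e);              % one floating-point matrix-vector product
    e1  = (z + infsup(-tmp, tmp)) ./ D;
    e   = intersect(e, e1);
end
end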
the approximate solution and to bound the error upon this solution. Hence it is not always
true that the solution computed by certifylss_relaxed is of larger relative precision
than that computed by certifylss.
Figure 4.2 depicts numerical results with matrices of dimension 1000 × 1000 and condition
number varying from 2^5 to 2^50, for the three verified functions, namely verifylss
of the IntLab library, certifylss and certifylss_relaxed, together with results computed
by MatLab. Again, the matrices used in the tests are generated by the MatLab function
gallery('randsvd', dim, cond).
One can see that, although it relaxes the tightness of the error bound enclosure, the
function certifylss_relaxed always provides results of the same accuracy as those com-
puted by the function certifylss. This is also the reason why we do not use the algo-
rithms presented in Chapter 2 for the interval matrix product. Indeed, in this case, it is
not necessary to compute a very accurate enclosure of the interval matrix vector product.
Moreover, if certifylss_relaxed fails to verify the solution then certifylss also
fails and vice versa. That is because the two functions share the same initial error bound
enclosure. Failure to obtain an initial error bound means failure to verify the solution.
Nevertheless, using the relaxation technique considerably reduces the overhead of execution
time introduced by the interval improvement, and therefore the overall
execution time of the function. As can be seen in Figure 4.2, certifylss_relaxed is
almost always faster than verifylss, except for too ill-conditioned systems.
When the condition number gets close to the inverse of the machine relative error, there is a drop in
execution time for both certifylss_relaxed and verifylss. It corresponds, for both,
to a failure to compute an initial error bound. In such a case, there is no need to perform
any further iterative refinement and the two functions terminate prematurely without any
certification on the computed approximate solution.
Even though the use of the relaxation technique makes the error bound improvement
phase become negligible in terms of execution time, when the condition number increases,
the execution time increases significantly. Indeed, when the condition number increases,
the number of iterations for refining the approximate solution increases. More precisely,
the dominant computation, in terms of execution time, is the computation of the residual,
as the computing precision is doubled, and thus the computation of the successive residuals
is no more negligible compared to the overall execution time.
Figure 4.2: Relaxation technique: MatLab implementation results for 1000×1000 matrices (accuracy in bits and execution time in seconds versus log2(condition number), for matlab, verifylss, certifylss and certifylss_relaxed).
4.4 Conclusion
By relaxing the exactness to exploit the symmetry of data as well as of operations, the
performance has been significantly improved. At the same time, the relaxed method still
allows to obtain good accuracy. Following Section 4.2.2, the hull of the solution set of
the relaxed system is exactly the enclosure of the solution computed by the Ning-Kearfott
method [NK97]. Moreover, Corollary 4.2.5 shows that the enclosure produced by applying
interval Jacobi iterations to the relaxed system is at worst 2 times larger than the exact
hull of the solution set of the relaxed system. It means that, in comparison with the
Ning-Kearfott method, the relaxed method for computing an enclosure of the solution set
of an interval linear system loses only 1 bit, whereas it costs only O(n^2) floating-point
operations, instead of O(n^3) floating-point operations for the Ning-Kearfott method.
In the context of our problem, the relaxation technique is applied to the residual system.
Therefore, although deteriorating the tightness of the error bound, in effect, it still allows
to recover the lost bits in the approximate solution. Therefore, it does not deteriorate
the accuracy of the main result. As shown in the Section 4.3, the relaxed method for the
verification of the solution of a linear system is able to guarantee the same number of exact
bits as the original method, which is explained in Chapter 3.
Additionally, using the relaxation technique, each interval improvement step in par-
ticular and the whole method in general can now be expressed almost solely by punctual
operations except for the vectorial componentwise division. Therefore, the performance
of the method relies entirely on the floating-point library employed. It also means that,
without having to use a dedicated interval library, the method can be implemented using
only a floating-point matrix library and appropriate rounding mode control. We can therefore
use existing libraries which are well optimized for floating-point matrix operations,
such as ATLAS [WP05], GotoBLAS [TAC], LAPACK, or the Intel Math Kernel Library,
to implement our method.
The relaxation technique works very well with matrices whose midpoint is nearly di-
agonal. Hence it can be applied to any algorithm that needs to precondition the original
system, for example the Krawczyk iterations, to improve the performance of the algorithm.
Chapter 5
Effects of the computing precision
The effect of the computing precisions used for each step of the function
certifylss on the accuracy, i.e. the width, of the final result is studied. The
error analysis shows that, not only the computing precision for the residual
computation, but also the computing precisions used for inverting the matrix,
for pre-multiplying the matrix and for storing the approximate solution influ-
ence the accuracy of the final result. Nonetheless, for the sake of performance,
we choose to increase the computing precision only for storing the approximate
solution. Each approximate solution is stored as an unevaluated sum of two
vectors in the working precision. By decoupling the computation of the resid-
ual and refining the two parts of the approximate solution successively, we can
still obtain results which are guaranteed to almost the last bit with an acceptable
overhead in execution time, even when the coefficient matrix is ill-conditioned.
5.1 Introduction
Classically, in the implementation of iterative refinement using floating-point arithmetic,
the residual is computed using twice the working precision in order to obtain a solution
accurate to the working precision. The working precision is used for all the other compu-
tations, as well as for storing program variables.
In this chapter, we will study the effect of computing precision for each step of our
algorithm. The use of a higher precision will increase the result quality but reduce the
performance of the program. Conversely, a lower precision will increase the performance
of the program at the cost of lower accuracy. Studying the effect of computing precision
at each step allows us to find a trade-off between performance and accuracy.
condition number. The coefficients of the generated matrices are IEEE double floating-point
numbers. The condition number varies between 2^5 and 2^50.
In order to clarify each experimental set, let us recall the algorithm.
a precision equal to the output precision to get a result which is correct to the last bit.
Experimental results are shown in figure 5.2.
For a fixed output precision, the needed working precision depends on the condition
number, and this dependency is again linear. As a rule of thumb [Hig02, chap.1, p.10],
the solution loses one digit of accuracy for every power of 10 in the condition number. On
Figure 5.2, this rule can be interpreted as follows: in order to be able to verify the solution
of the system, when the condition number increases by 1 bit, the computing precision
should be increased also by 1 bit. Nevertheless, the phenomenon is different when the
componentwise relative error is of concern. As revealed on Figure 5.2 by the white line,
when the condition number increases by 2 bits, the computing precision should be increased
by 1 bit in order to get a result which is correct to the last bit.
    |K − I| = |K − RA + (RA − I)|
            ≤ |K − RA| + |RA − I|
            ≤ γ_{n,p} |R| · |A| + c′_n ε_i |R| · |L̃| · |Ũ|.        (5.2)
Let us recall that ε_r and ε_r̄ denote the precisions used to compute and to store the residual,
respectively. Let r̃ ∈ F^n be the result of b − Ax̃ computed in floating-point arithmetic with
some rounding mode; then, following [Hig02, chap.12, Section 12.1, p.233],

    r̃ = b − Ax̃ + ∆r,   |∆r| ≤ [ε_r̄ + (1 + ε_r̄)γ_{n+1,r}] |A| · |x∗ − x̃| + 2(1 + ε_r̄)γ_{n+1,r} |A| · |x∗|,

where γ_{n+1,r} = (n + 1)ε_r / (1 − (n + 1)ε_r). Hence

    r ⊆ A(x∗ − x̃) + midrad(0, [ε_r̄ + (1 + ε_r̄)γ_{n+1,r}] |A| · |x∗ − x̃| + 2(1 + ε_r̄)γ_{n+1,r} |A| · |x∗|).

Ignoring rounding errors introduced by pre-multiplying the residual, we have

    z = Rr ⊆ RA(x∗ − x̃) + midrad(0, |R|([ε_r̄ + (1 + ε_r̄)γ_{n+1,r}] |A| · |x∗ − x̃| + 2(1 + ε_r̄)γ_{n+1,r} |A| · |x∗|)),

which gives

    |z| ≤ |RA(x∗ − x̃)| + |R|([ε_r̄ + (1 + ε_r̄)γ_{n+1,r}] |A| · |x∗ − x̃| + 2(1 + ε_r̄)γ_{n+1,r} |A| · |x∗|)
        ≤ |RA| · |x∗ − x̃| + [ε_r̄ + (1 + ε_r̄)γ_{n+1,r}] |R| · |A| · |x∗ − x̃| + 2(1 + ε_r̄)γ_{n+1,r} |R| · |A| · |x∗|.

Since x̃ is a floating-point approximate solution, the best bound on the difference
between x̃ and the exact solution x∗ that can be expected is

    |x̃ − x∗| ≤ (1/2) ε_x |x∗|.

Hence

    |z| ≤ (1/2) ε_x |RA| · |x∗| + [ (1/2) ε_x (ε_r̄ + (1 + ε_r̄)γ_{n+1,r}) + 2(1 + ε_r̄)γ_{n+1,r} ] |R| · |A| · |x∗|.        (5.3)
Let D denote the diagonal of K, and E the off-diagonal part of K.
Suppose that, with the interval improvement, i.e. Jacobi or Gauss-Seidel iterations, the
enclosure of the solution of the residual system converges to a fixed point e^f; then e^f
satisfies

    e^f = D^{−1}(z − E e^f)
    ⇒ |e^f| = |D^{−1}(z − E e^f)| = |D^{−1}| · |z − E e^f|.

The necessary condition for the Jacobi/Gauss-Seidel iterations to be convergent is that K is an
H-matrix. It implies that no diagonal element D_{ii} contains zero. Therefore we can suppose
here that D is positive; if it is not the case, we simply multiply by −1 the rows of K whose
diagonal element is negative, together with the corresponding elements of z. Hence

    (D^{−1})_{ii} = 1/D_{ii}
    ⇒ |D^{−1}|_{ii} = 1/⟨D_{ii}⟩
    ⇒ |e^f|_i = |z − E e^f|_i / ⟨D_{ii}⟩
    ⇒ |e^f|_i ⟨D_{ii}⟩ = |z − E e^f|_i.
    ‖x̃ − x∗‖∞ / ‖x∗‖∞
        ≤ ‖e^f‖∞ / ‖x∗‖∞
        ≤ (1/2) ε_x + d_n ‖|R| · |A| · |x∗|‖∞ / ‖x∗‖∞ + c′_n ε_i ε_x ‖|R| · |L̃| · |Ũ| · |x∗|‖∞ / ‖x∗‖∞
        ≤ (1/2) ε_x + d_n ‖R‖∞ · ‖A‖∞ · ‖x∗‖∞ / ‖x∗‖∞ + c′_n ε_i ε_x ‖R‖∞ · ‖|L̃| · |Ũ|‖∞ · ‖x∗‖∞ / ‖x∗‖∞
        ≤ (1/2) ε_x + d_n ‖R‖∞ · ‖A‖∞ + c′_n ε_i ε_x ‖R‖∞ · ‖|L̃| · |Ũ|‖∞.

Following Lemma 5.3.1 we get

    ‖x̃ − x∗‖∞ / ‖x∗‖∞
        ≤ (1/2) ε_x + d_n ‖R‖∞ · ‖A‖∞ + c′_n ε_i ε_x (1 + 2(n^2 − n)ρ_n) ‖R‖∞ · ‖A‖∞
        ≤ (1/2) ε_x + [ d_n + c′_n ε_i ε_x (1 + 2(n^2 − n)ρ_n) ] κ̃(A)
        ≤ (1/2) ε_x + (1/2) ε_x [ ε_r̄ + (1 + ε_r̄) γ_{n+1,r} + γ_{n,p} ] κ̃(A)
          + 2(1 + ε_r̄) γ_{n+1,r} κ̃(A) + c′_n ε_i ε_x (1 + 2(n^2 − n)ρ_n) κ̃(A).        (5.4)
Note that, here, we use κ̃(A) = ‖R‖∞ · ‖A‖∞ instead of κ(A), because by definition
κ(A) = ‖|A^{−1}| · |A|‖∞, whereas we only have that R is an approximate inverse of A.
Hopefully R ≈ A^{−1}, hence κ̃(A) ≈ κ(A).
(5.4) shows the dependence of the result quality on the condition number of A, as well
as on the different precisions employed in the algorithm. Note that, although the precision
ε_x is used to store x, the final result is still returned in the working precision ε_w. Hence,
in order to achieve a solution which is guaranteed to the last bit, the following inequality
must hold:

    ‖x̃ − x∗‖∞ / ‖x∗‖∞ ≤ ε_w.

Suppose that κ̃(A) varies between 1 and 1/ε_w. It is obvious that x must be stored in a
higher precision than the working precision, i.e. ε_x ≤ ε_w. Therefore the term
(1/2) ε_x [ε_r̄ + (1 + ε_r̄)γ_{n+1,r} + γ_{n,p}] κ̃(A) is of order O(ε_w) and does not affect
significantly the result accuracy.
Nevertheless, the term 2(1 + ε_r̄)γ_{n+1,r} κ̃(A) in (5.4) influences the result relative error.
This is in fact the effect of the precision used for the residual computation on the result
quality. When A is ill-conditioned, say κ̃(A) = k/ε_w with k > 1 but not too big, in order
that 2(1 + ε_r̄)γ_{n+1,r} κ̃(A) = O(ε_w), we must have γ_{n+1,r} = O(ε_w^2). It means that twice
the working precision must be used for the residual in order to get a result accurate to the last bit in the
working precision.
Nevertheless, doubling the working precision for the residual computation does not
suffice to obtain the result accurate to the last bit in the working precision. One can see
on Figure 4.2 that when the condition number is high, there is a drop in result accuracy.
The remaining term c′_n ε_x ε_i (1 + 2(n^2 − n)ρ_n) κ̃(A) may seem to be of order O(ε_w), and thus
not to affect the result quality much. This is true when the matrix dimension is small, but no
longer true for matrices of large dimension: for example, if n = 1000 then 2(n^2 − n) ≈ 2^21,
which makes this term significant and reduces the result accuracy.
In order to reduce the impact of this term on the result quality, one can either reduce
ε_x or ε_i, that is, either increase the precision used to store x or increase the precision used to
compute the inverse of A. Nevertheless, inverting A costs O(n^3) FLOPs; hence, increasing the
precision to invert A would introduce a significant overhead in execution time.
Therefore, we choose to increase the precision for storage format of x. As it will be
shown in the next section, doubling the working precision for the approximate solution will
yield results accurate to the last bit without deteriorating significantly the performance of
the overall algorithm.
    max_i |x̃ − x∗|_i / |x∗|_i
        ≤ max_i |e^f|_i / |x∗|_i
        ≤ (1/2) ε_x + d_n max_i (|R| · |A| · |x∗|)_i / |x∗|_i + c′_n ε_i ε_x max_i (|R| · |L̃| · |Ũ| · |x∗|)_i / |x∗|_i
        ≤ (1/2) ε_x + d_n ‖R‖∞ · ‖A‖∞ · ‖x∗‖∞ / min_i(|x∗|_i) + c′_n ε_i ε_x ‖R‖∞ · ‖|L̃| · |Ũ|‖∞ · ‖x∗‖∞ / min_i(|x∗|_i)
        ≤ (1/2) ε_x + ( d_n ‖R‖∞ · ‖A‖∞ + c′_n ε_i ε_x ‖R‖∞ · ‖|L̃| · |Ũ|‖∞ ) · max_i(|x∗|_i) / min_i(|x∗|_i).
One can see that the componentwise relative error bound differs from the normwise
relative error bound only by the factor max_i(|x∗|_i) / min_i(|x∗|_i), which corresponds to the
variation in absolute value of the components of the exact solution. If all the components of
x∗ are equal then the componentwise relative error is equal to the normwise relative error.
One problem occurs when using the componentwise relative error if there is a null
component in the exact solution: in that case the factor max_i(|x∗|_i) / min_i(|x∗|_i) is infinite,
and the relative error of the null component is also infinite. Hence we suppose that there is
no null component in the exact solution.
An arithmetic using twice the working precision and this representation of numbers as
sums of two floating-point numbers, is called a double-double arithmetic. Implementing
a fully supported package of simulated double-double precision operations, namely sum
and product, is very costly [Dek71, HLB01, LDB+ 02]. Instead, we reuse the function
for computing the residual in twice the working precision with a minor modification. A
new function called residualxx is implemented. Instead of returning an interval vector
in the working precision which encloses the exact residual, residualxx returns a triple
{r, er1, er2}, where r, er1, and er2 are vectors of double precision floating-point num-
ber: r is an approximation of the exact residual, and er1, er2 are respectively the lower
bound and the upper bound of the difference between r and the exact residual. The pair
{r, infsup(er1, er2)} is in fact the intermediate result of the function residual just be-
fore returning the guaranteed residual as r + infsup(er1, er2) = r + [er1, er2]. Thus,
b − A ∗ x̃ ∈ (r + [er1, er2]) − A ∗ x̃2 ⊆ [r − A ∗ x̃2 ] + [er1, er2] . Therefore, with this function,
we can compute the residual upon x̃ = {x̃1 , x̃2 } in twice the working precision by:
• Phase 1: compute the residual upon x̃1 as a pair of an approximation vector and an error
bound vector {r1, er1}. Use the system Ke = (r1 + er1) to improve e1.
• Phase 2: set x̃2 = mid(e1) and e2 = e1 − mid(e1). Compute the residual upon x̃1 + x̃2
and use it to refine e2.
Figure 5.4: Effect of doubling the working precision for the approximate solution (accuracy in bits and execution time in seconds versus log2(condition number), for matlab, verifylss, certifylss_relaxed and certifylssx).
For the low-order part x̃2 of the approximate solution, it is not really necessary to use twice
the working precision: a lower precision is indeed sufficient
to achieve results accurate to the last bit in the working precision. For example, x̃ can be
coded by a couple {x̃1, x̃2} where x̃1 is a vector of IEEE double floating-point numbers
and x̃2 is a vector of IEEE single floating-point numbers. Using a lower precision for
the solution can help to reduce the computation time of each interval improvement step.
Nevertheless, using a lower precision will increase the error upon the residual computation.
Hence, it will decrease the rate of convergence, or in other words, increase the number of
iterations and thus the number of times residual computation is performed.
Figure 5.5 shows a comparison between the above function certifylssx and another
version of it called certifylssxs which uses IEEE single precision for x̃2 . One can see
that certifylssxs still succeeds to obtain very accurate results. Moreover, there is not
much difference between the two functions in terms of execution time.
5.5 Conclusion
In contrast to the previous chapter, which consists in relaxing the exactness to improve the
speed, in this chapter, we relax the speed requirement in order to improve the accuracy
by studying the effect of the computing precision on both the accuracy and the speed of
the algorithm.
[Figure 5.5: accuracy in bits and execution time in seconds versus log2(condition number), for matlab, certifylssx and certifylssxs.]
Classically, twice the working precision is used for the residual computation
in order to obtain results which are accurate in the working precision. Nonetheless, in
our case where rigorous error bounds are needed or componentwise relative errors are in
question, doubling the working precision only for the residual computation does not suffice
to guarantee the last bits of the computed results when the matrix is too ill-conditioned.
By performing the error analysis as well as performing some experiments using arbi-
trary precision libraries, namely the MPFR and MPFI libraries, to study the effect of the
precision for each step on the final accuracy, we observed that in order to verify the last
bits of the approximate solution, it is needed to increase either the precision for inverting
the coefficient matrix or the precision for storing the intermediate approximate solution.
Nevertheless, an increase of precision for inverting the coefficient matrix will lead to a
drastic increase of execution time and memory usage. Therefore we chose to increase the
precision for storing the approximate solution. By reusing the procedure for computing the
residual in twice the working precision, the use of twice the working precision for storing
the approximate solution does not harm the algorithm’s speed when the condition number
is small to intermediate. Furthermore, when the matrix is too ill-conditioned, the use of
twice the working precision for the approximate solution helps to obtain very accurate
results, up to the last bit, at a cost of acceptable overhead of execution, as revealed in the
experimental section.
Conclusions and Perspectives
Conclusion
In this thesis, we concentrated on efficient implementations for verified linear algebra using
interval arithmetic. Following the fundamental property of inclusion, interval arithmetic
allows to compute an enclosure of the exact result, or a certificate of accuracy. Neverthe-
less, there are obstacles to the application of interval arithmetic. Firstly, interval arithmetic
suffers from low performance in comparison with floating-point arithmetic. Secondly, al-
gorithms using interval arithmetic usually provide results of large overestimation if care
is not taken when implementing them. Our main contribution is to find and implement
algorithms which run fast and at the same time provide sharp enclosures of the exact re-
sults. Two fundamental problems are addressed in this thesis, namely the multiplication of
interval matrices and the verification of the solution to a linear system. For each problem,
our contributions can be summarized as follows.
Perspectives
During my Ph.D. preparation, MatLab was used as the experimental environment. It allows
one to obtain quickly a "proof of concept" for the performance as well as the accuracy of the
proposed algorithms. Nevertheless, because of the overhead in execution time introduced
by interpretation, functions implemented in MatLab suffer from low performance when
employing loops such as for and while loops, or when accessing rows and
columns of matrices separately.
Nonetheless, an implementation in other programming languages, such as C or FORTRAN,
would not suffer from these problems. By making direct calls to BLAS level 3
routines and accessing the columns of the matrices directly, one can obtain a better
implementation of the Gauss-Seidel iterations, and especially of the residual computation.
Although the residual computation costs O(n^2) floating-point operations, an implementation
in MatLab is much more expensive in terms of execution time, as explained in Section 3.5.1.
Therefore, a better implementation of the residual computation would reduce significantly the
execution time of the method.
Our method for verifying the solution of a linear system allows to compute a very
accurate solution together with a tight enclosure of the error bound when the matrix is
not too ill-conditioned. Nevertheless, when the matrix is too ill-conditioned, i.e. when the
condition number of the matrix is close to 1/ε, with ε being the machine relative error,
then both our functions and the function verifylss of the IntLab library fail to verify the
solution. This is because a good approximation of the inverse of the coefficient matrix
cannot be obtained.
To extend the range of condition numbers for which a system can be verified, we can use a
technique proposed by Rump [Rum09b] to invert extremely ill-conditioned matrices. This
technique is based on multiplicative correction terms and K-fold precision. For example,
using 2-fold precision, i.e. twice the working precision, one can obtain a good approximate
inverse for matrices whose condition number falls between 1/ε and 1/ε². This technique has
also been used in [OOR09] to refine the solution of ill-conditioned linear equations, as well
as in [Ogi10] to factorize ill-conditioned matrices.
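The C sketch below illustrates the structure of one multiplicative correction step in its
2-fold variant. It is only a schematic rendering under the assumptions of a correctly
rounded fma and of LAPACK being available for the working-precision inversions; it is not
Rump’s actual algorithm, which accumulates the products in K-fold precision and keeps
the correction factors unevaluated.

    #include <math.h>
    #include <stdlib.h>
    #include <string.h>
    #include <lapacke.h>

    /* TwoSum (Knuth): a + b = s + e exactly, in round-to-nearest. */
    static double two_sum(double a, double b, double *e) {
        double s = a + b;
        double z = s - a;
        *e = (a - (s - z)) + (b - z);
        return s;
    }

    /* C <- A*B (n-by-n, row-major), each entry accumulated in roughly twice
       the working precision, then rounded back to double. */
    static void accurate_gemm(int n, const double *A, const double *B, double *C) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double s = 0.0, c = 0.0, e;
                for (int k = 0; k < n; k++) {
                    double p  = A[i*n + k] * B[k*n + j];
                    double pe = fma(A[i*n + k], B[k*n + j], -p);  /* exact error */
                    s  = two_sum(s, p, &e);
                    c += e + pe;
                }
                C[i*n + j] = s + c;
            }
    }

    /* Approximate inverse in working precision, overwriting M (0 on success).
       For extremely ill-conditioned matrices the factorization may need a
       small perturbation, which is omitted here. */
    static int invert(int n, double *M) {
        int *ipiv = malloc((size_t)n * sizeof(int));
        int info  = LAPACKE_dgetrf(LAPACK_ROW_MAJOR, n, n, M, n, ipiv);
        if (info == 0) info = LAPACKE_dgetri(LAPACK_ROW_MAJOR, n, M, n, ipiv);
        free(ipiv);
        return info;
    }

    /* One correction step: R0 ~ inv(A) in working precision, then
       R1 ~ inv(R0*A) where R0*A is evaluated accurately.  The pair (R1, R0)
       is used as the preconditioner v -> R1*(R0*v); evaluating the product
       R1*R0 in working precision would lose most of the gain. */
    int correction_step(int n, const double *A, double *R0, double *R1) {
        memcpy(R0, A, (size_t)n * (size_t)n * sizeof(double));
        if (invert(n, R0) != 0) return -1;   /* ordinary approximate inverse */
        accurate_gemm(n, R0, A, R1);         /* R1 <- R0*A, accurately       */
        return invert(n, R1);                /* R1 <- inv(R0*A)              */
    }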
A research direction that I am also interested in is the implementation of interval
arithmetic on new hardware architectures. To date, there is no processor dedicated to
interval arithmetic. There was in fact a hardware chip, the XPA 3233 developed by Ulrich
W. Kulisch in 1993 [Kul08], which supports long interval arithmetic through the exact dot
product; however, this chip has never been put into production. The implementation of
interval arithmetic on traditional processors suffers from limitations such as the cost of
changing the rounding mode, of testing and of branching. Meanwhile, new hardware
architectures, for example the Cell processor [GLRM08], the Graphics Processing Unit
(GPU) [CFD08], or the Field-Programmable Gate Array (FPGA), offer new computational
capacities as well as new optimization options. On these architectures, numerous
computations can be run simultaneously, for example through the SIMD instructions of the
Cell processor and of the GPU. However, they also have some restrictions. For example,
the Cell processor supports only the rounding mode toward zero in single precision, whereas
the GPU supports only the rounding modes toward nearest and toward zero. Therefore,
care must be taken to obtain an efficient implementation of interval arithmetic on these
architectures.
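As a small illustration of the kind of care required, the sketch below shows one classical
containment trick (not the technique developed in this thesis) for obtaining a guaranteed
upper bound of a sum when only rounding toward zero is available in single precision, as
on the Cell SPU; the function name is illustrative.

    #include <math.h>

    /* Upper bound of a + b when the hardware rounds only toward zero.
       Truncation never increases the magnitude of the exact result: a negative
       truncated sum is therefore already an upper bound, while a nonnegative
       one must be nudged up by one ulp (losing at most one ulp of tightness). */
    static float add_round_up(float a, float b) {
        float s = a + b;                  /* assumed rounded toward zero */
        if (s >= 0.0f)
            s = nextafterf(s, INFINITY);  /* compensate downward truncation */
        return s;
    }

A lower bound is obtained symmetrically, and error-free transformations in this rounding
mode, as studied in [GNL09, NGL09], recover the lost accuracy instead of merely widening
the result.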
Currently, the midpoint-radius representation is used less than the endpoint representation,
except for matrix operations, because of its lower accuracy. However, when the intervals are
of small width, the midpoint-radius representation is almost as accurate as the endpoint
representation and, as observed in Section 2.3.3, in some specific cases it is even more
accurate. When implementing interval arithmetic on new hardware architectures, the
midpoint-radius representation exhibits another advantage over the endpoint representation:
the midpoint and the radius can be stored in different precisions. For intervals of small
width, a smaller precision can be used to represent the radius, which reduces not only the
memory usage but also the cost of data transfer, as illustrated below.
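For instance, a mixed-precision midpoint-radius format could pair a double precision
midpoint with a single precision radius; the sketch below (the type and function names are
illustrative) builds such an interval from its endpoints, rounding the radius upward so that
the enclosure remains guaranteed. It assumes an IEEE-754 environment in which the
dynamic rounding mode also governs the double-to-float conversion.

    #include <fenv.h>
    #include <math.h>

    /* Mixed-precision midpoint-radius interval: midpoint in double,
       radius in float (an upper bound of the true radius). */
    typedef struct {
        double mid;
        float  rad;
    } mr_interval;

    /* Enclose [lo, hi]: the radius is computed and converted with upward
       rounding so that [mid - rad, mid + rad] contains [lo, hi]. */
    mr_interval mr_from_endpoints(double lo, double hi) {
        mr_interval x;
        x.mid = 0.5 * (lo + hi);                  /* its rounding error is absorbed below */
        int old = fegetround();
        fesetround(FE_UPWARD);
        double r = fmax(x.mid - lo, hi - x.mid);  /* upward-rounded distance to both ends */
        x.rad = (float)r;                         /* narrowing conversion, rounded upward */
        fesetround(old);
        return x;
    }

For narrow intervals the loss of accuracy in the float radius is negligible, while its storage
and transfer cost is halved.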
Bibliography
[BM05] Sylvie Boldo and Jean-Michel Muller, Some Functions Computable with a
Fused-MAC, Proceedings of the 17th Symposium on Computer Arithmetic
(Cape Cod, USA) (Paolo Montuschi and Eric Schwarz, eds.), 2005, pp. 52–58.
[BT89] D.P. Bertsekas and J.N. Tsitsiklis, Parallel and distributed computation, Pren-
tice Hall, 1989.
[CFD08] Sylvain Collange, Jorge Flóres, and David Defour, A GPU interval library
based on Boost interval, Real Numbers and Computers, July 2008, pp. 61–72.
[CK04] Martine Ceberio and Vladik Kreinovich, Fast Multiplication of Interval Matri-
ces (Interval Version of Strassen’s Algorithm), Reliable Computing 10 (2004),
no. 3, 241–243.
[Dek71] T.J. Dekker, A floating-point technique for extending the available precision,
Numer. Math. 18 (1971), 224–242.
[DHK+ 06] James Demmel, Yozo Hida, William Kahan, Xiaoye S. Li, Sonil Mukher-
jee, and E. Jason Riedy, Error bounds from extra-precise iterative refinement,
ACM Trans. Mathematical Software 32 (2006), no. 2, 325–351.
[FHL+ 07] Laurent Fousse, Guillaume Hanrot, Vincent Lefèvre, Patrick Pélissier, and
Paul Zimmermann, MPFR: A Multiple-Precision Binary Floating-Point Li-
brary with Correct Rounding, ACM Trans. Mathematical Software 33 (2007),
no. 2.
[GLRM08] Stef Graillat, Jean-Luc Lamotte, Siegfried M. Rump, and Svetoslav Markov,
Interval arithmetic on the Cell processor, 13th GAMM - IMACS International
Symposium on Scientific Computing, Computer Arithmetic and Verified Nu-
merical Computations SCAN’08, 2008, pp. 54–54.
[GNL09] Stef Graillat, Hong Diep Nguyen, and Jean-Luc Lamotte, Error-Free Trans-
formation in rounding mode toward zero, Lecture Notes in Computer Science
(LNCS) 5492 (2009), 217–229, Numerical Validation in Current Hardware
Architecture.
[Hig97] Nicholas J. Higham, Iterative refinement for linear systems and LAPACK,
IMA Journal of Numerical Analysis 17 (1997), no. 4, 495–509.
[Hig02] Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed., SIAM Press,
2002.
[HKKK08] Chenyi Hu, R. Baker Kearfott, Andre de Korvin, and Vladik Kreinovich,
Knowledge processing with interval and soft computing, 1 ed., Springer Pub-
lishing Company, Incorporated, 2008.
[HLB01] Yozo Hida, Xiaoye S. Li, and David H. Bailey, Algorithms for quad-double pre-
cision floating point arithmetic, 15th IEEE Symposium on Computer Arith-
metic, 2001, pp. 155–162.
[IEE85] IEEE Task P754, ANSI/IEEE 754-1985, Standard for Binary Floating-Point
Arithmetic, IEEE, New York, NY, USA, August 1985.
[LDB+ 02] Xiaoye S. Li, James W. Demmel, David H. Bailey, Greg Henry, Yozo Hida,
Jimmy Iskandar, William Kahan, Suh Y. Kang, Anil Kapur, Michael C. Mar-
tin, Brandon J. Thompson, Teresa Tung, and Daniel J. Yoo, Design, im-
plementation and testing of extended and mixed precision BLAS, ACM Trans.
Mathematical Software 28 (2002), 152–205.
[LL08a] Philippe Langlois and Nicolas Louvet, Accurate solution of triangular linear
system, 13th GAMM - IMACS International Symposium on Scientific Com-
puting, Computer Arithmetic, and Validated Numerics, El Paso (TX), USA,
September 2008.
[LLL+ 06] Julie Langou, Julien Langou, Piotr Luszczek, Jakub Kurzak, Alfredo Buttari,
and Jack Dongarra, Exploiting the Performance of 32 bit Floating Point Arith-
metic in Obtaining 64 bit Accuracy (Revisiting Iterative Refinement for Linear
Systems), Proc. of the 2006 ACM/IEEE Conference on Supercomputing, 2006,
Article 113 (17 pages).
[MB79] Ramon E. Moore and Fritz Bierbaum, Methods and Applications of Interval
Analysis, SIAM Studies in Applied Mathematics 2, Society for Industrial and
Applied Mathematics, 1979.
[Moo63] Ramon Edgar Moore, Interval arithmetic and automatic error analysis in
digital computing, Ph.D. thesis, Stanford University, Stanford, CA, USA, 1963.
[Neu90] Arnold Neumaier, Interval Methods for Systems of Equations, Cambridge Uni-
versity Press, 1990.
[NGL09] Hong Diep Nguyen, Stef Graillat, and Jean-Luc Lamotte, Extended precision
with a rounding mode toward zero environment. Application on the Cell pro-
cessor, Int. J. Reliability and Safety 3 (2009), no. 1/2/3, 153–173.
[NK97] S. Ning and R. B. Kearfott, A comparison of some methods for solving linear
interval equations, SIAM J. Numer. Anal. 34 (1997), no. 4, 1289–1305.
[NR] Hong Diep Nguyen and Nathalie Revol, Resolve and Certify a Linear System,
to appear in Reliable Computing (Special volume for SCAN 2008).
[Ogi10] Takeshi Ogita, Accurate matrix factorization: inverse LU and inverse QR fac-
torizations, SIAM J. Matrix Anal. Appl. 31 (2010), no. 5, 2477–2497.
[Ois98] Shin’ichi Oishi, Finding all solutions of nonlinear systems of equations using
linear programming with guaranteed accuracy, J. Universal Computer Science
4 (1998), no. 2, 171–177.
[OO05] Takeshi Ogita and Shin’ichi Oishi, Fast inclusion of interval matrix multipli-
cation, Reliable Computing 11 (2005), no. 3, 191–205.
[OOO10] K. Ozaki, Takeshi Ogita, and Shin’ichi Oishi, Tight and efficient enclosure of
matrix multiplication by using optimized BLAS, preprint available at
http://onlinelibrary.wiley.com/doi/10.1002/nla.724/pdf.
[OOR09] Shin’ichi Oishi, Takeshi Ogita, and Siegfried M. Rump, Iterative Refinement
for Ill-conditioned Linear Equations, Japan J. Indust. Appl. Math. 26 (2009),
no. 2, 465–476.
[ORO05] Takeshi Ogita, Siegfried M. Rump, and Shin’ichi Oishi, Accurate Sum and
Dot Product, SIAM J. Sci. Comput. 26 (2005), no. 6, 1955–1988.
[Pan84] Victor Pan, How to multiply matrices faster, Springer-Verlag New York, Inc.,
New York, NY, USA, 1984.
[RK95] Jiří Rohn and Vladik Kreinovich, Computing Exact Componentwise Bounds
on Solutions of Linear Systems with Interval Data is NP-Hard, SIAM J. Ma-
trix Anal. Appl. 16 (1995), no. 2, 415–420.
[Roh93] Jiří Rohn, Cheap and Tight Bounds: The Recent Result by E. Hansen Can Be
Made More Efficient, Interval Computations (1993), no. 4, 13–21.
[RR05] Nathalie Revol and Fabrice Rouillier, Motivations for an Arbitrary Precision
Interval Arithmetic and the MPFI Library, Reliable Computing 11 (2005),
no. 4, 275–290.
[vzGG03] J. von zur Gathen and J. Gerhard, Modern computer algebra, 2nd edition,
Cambridge University Press, 2003.