Mutation Testing

This document provides a comprehensive survey and analysis of the development of mutation testing over three decades. It summarizes key findings from the literature, including how mutation testing has been applied at both the software implementation and design levels. Trend analyses show that mutation testing techniques and tools are maturing. The document also describes the authors' creation of a large mutation testing publication repository containing over 390 papers from 1977 to 2009.


IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 1

An Analysis and Survey of the Development of Mutation Testing
Yue Jia, Student Member, IEEE, and Mark Harman, Member, IEEE

King’s College London, Centre for Research on Evolution, Search and Testing (CREST), Strand, London WC2R 2LS, UK
{yue.jia,mark.harman}@kcl.ac.uk

Abstract— Mutation Testing is a fault-based software testing technique that has been widely studied for over three decades. The literature on Mutation Testing has contributed a set of approaches, tools, developments and empirical results. This paper provides a comprehensive analysis and survey of Mutation Testing. The paper also presents the results of several development trend analyses. These analyses provide evidence that Mutation Testing techniques and tools are reaching a state of maturity and applicability, while the topic of Mutation Testing itself is the subject of increasing interest.

Index Terms— mutation testing, survey

I. INTRODUCTION

Mutation Testing is a fault-based testing technique which provides a testing criterion called the “mutation adequacy score”. The mutation adequacy score can be used to measure the effectiveness of a test set in terms of its ability to detect faults.

The general principle underlying Mutation Testing work is that the faults used by Mutation Testing represent the mistakes that programmers often make. By carefully choosing the location and type of mutant, we can also simulate any test adequacy criterion. Such faults are deliberately seeded into the original program, by simple syntactic changes, to create a set of faulty programs called mutants, each containing a different syntactic change. To assess the quality of a given test set, these mutants are executed against the input test set. If the result of running a mutant is different from the result of running the original program for any test case in the input test set, the seeded fault denoted by the mutant is detected. One outcome of the Mutation Testing process is the mutation score, which indicates the quality of the input test set. The mutation score is the ratio of the number of detected faults over the total number of the seeded faults.

The history of Mutation Testing can be traced back to 1971 in a student paper by Richard Lipton [144]. The birth of the field can also be identified in papers published in the late 1970s by DeMillo et al. [66] and Hamlet [107].

Mutation Testing can be used for testing software at the unit level, the integration level and the specification level. It has been applied to many programming languages as a white box unit test technique, for example, Fortran programs [3], [36], [40], [131], [145], [181], Ada programs [29], [192], C programs [6], [56], [97], [213], [214], [237], [239], Java programs [44], [45], [127]–[130], [150], [151], C# programs [69]–[73], SQL code [43], [212], [233], [234] and AspectJ programs [12], [13], [17], [90]. Mutation Testing has also been used for integration testing [54]–[56], [58]. Besides using Mutation Testing at the software implementation level, it has also been applied at the design level to test the specifications or models of a program. For example, at the design level Mutation Testing has been applied to Finite State Machines [20], [28], [88], [111], State charts [95], [231], [260], Estelle Specifications [222], [223], Petri Nets [86], Network protocols [124], [202], [216], [238], Security Policies [139], [154], [165], [166], [201] and Web Services [140], [142], [143], [193], [245], [259].

Mutation Testing has been increasingly and widely studied since it was first proposed in the 1970s. There has been much research work on the various kinds of techniques seeking to turn Mutation Testing into a practical testing approach. However, there is little survey work in the literature on Mutation Testing. The first survey work was conducted by DeMillo [62] in 1989. This work summarized the background and research achievements of Mutation Testing at this early stage of development of the field. A survey review of the (very specific) sub area of Strong, Weak, and Firm mutation techniques was presented by Woodward [253], [256]. An introductory chapter on Mutation Testing can be found in the book by Mathur [155] and also in the book by Ammann and Offutt [11]. The most recent survey work was conducted by Offutt and Untch [191] in 2000. They summarized the history of Mutation Testing and provided an overview of the existing optimization techniques for Mutation Testing. However, since then, there have been more than 230 new publications on Mutation Testing.

In order to provide a complete survey covering all the publications related to Mutation Testing since the 1970s, we constructed a Mutation Testing publication repository, which includes more than 390 papers from 1977 to 2009 [121]. We also searched for Master and PhD theses that have made a significant contribution to the development of Mutation Testing. These are listed in Table I. We took four steps to build this repository. First, we searched the online repositories of the main technical publishers, including IEEE Xplore, ACM Portal, Springer Online Library, Wiley InterScience and Elsevier Online Library, collecting papers which have any of the keywords “mutation testing”, “mutation analysis”, “mutants + testing”, “mutation operator + testing”, “fault injection” or “fault based testing” in their title or abstract. Then we went through the references for each paper in our repository, to find missing papers using the same keyword rules. In this way, we performed a ‘transitive closure’ on the literature. We then filtered out Mutation Testing work which was not concerned with software (for example, hardware) and also filtered out papers not written in English. Finally, we sent a draft of this paper to all cited authors asking them to check our citations. We have made the repository publicly available at http://www.dcs.kcl.ac.uk/pg/jiayue/repository/ [121]. The overall growth trend of all papers in Mutation Testing can be found in Figure 1.
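The mutant execution and scoring process outlined earlier in this Introduction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `original`, the three hand-written mutants and the two test sets are hypothetical and are not taken from any tool or study surveyed here. Real mutation systems generate mutants automatically from mutation operators.

```python
from typing import Callable, Dict, List, Tuple

Program = Callable[[int, int], int]
TestInput = Tuple[int, int]


def original(a: int, b: int) -> int:
    """Hypothetical program under test: returns 1 only if both inputs are positive."""
    return 1 if a > 0 and b > 0 else 0


# Hand-written mutants (assumed for illustration); each differs from
# `original` by exactly one small syntactic change.
MUTANTS: Dict[str, Program] = {
    "and -> or": lambda a, b: 1 if a > 0 or b > 0 else 0,
    "a > 0 -> a >= 0": lambda a, b: 1 if a >= 0 and b > 0 else 0,
    "b > 0 -> b >= 0": lambda a, b: 1 if a > 0 and b >= 0 else 0,
}


def is_detected(mutant: Program, tests: List[TestInput]) -> bool:
    """A seeded fault is detected if any test input makes the mutant's
    result differ from the original program's result."""
    return any(mutant(a, b) != original(a, b) for a, b in tests)


def mutation_score(mutants: Dict[str, Program], tests: List[TestInput]) -> float:
    """Ratio of detected faults to the total number of seeded faults."""
    detected = sum(1 for m in mutants.values() if is_detected(m, tests))
    return detected / len(mutants)


weak_tests: List[TestInput] = [(1, 1), (-1, -1)]
strong_tests: List[TestInput] = weak_tests + [(1, -1), (0, 1), (1, 0)]

print(mutation_score(MUTANTS, weak_tests))    # 0.0
print(mutation_score(MUTANTS, strong_tests))  # 1.0
```

The weak test set exercises only inputs on which every mutant agrees with the original, so no seeded fault is detected. Adding inputs near the boundaries of the two conditions detects all three faults, which is the sense in which the score measures test set quality.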
The rest of the paper is organized as follows. Section II

[Figure 1: plot of Mutation Testing Publications per year. x-axis: Year (77 to 09). y-axis: Number of Publications (0 to 55). Trend annotation: y = 1.1185 (0.9961 * x), r=0.7858, R2=0.7747.]
Fig. 1. Mutation Testing Publications from 1978-2009 (* indicates years in which a mutation workshop was held.)

TABLE I
A LIST OF PHD AND MASTER WORK ON MUTATION TESTING

Author | Title | Type | University | Year
Acree [2] | On Mutation | PhD | Georgia Institute of Technology | 1980
Hanks [108] | Testing Cobol Programs by Mutation | PhD | Georgia Institute of Technology | 1980
Budd [34] | Mutation Analysis of Program Test Data | PhD | Yale University | 1980
Tanaka [228] | Equivalence Testing for Fortran Mutation System Using Data Flow Analysis | PhD | Georgia Institute of Technology | 1981
Morell [164] | A Theory of Error-Based Testing | PhD | University of Maryland at College Park | 1984
Offutt [194] | Automatic Test Data Generation | PhD | Georgia Institute of Technology | 1988
Craft [48] | Detecting Equivalent Mutants Using Compiler Optimization Techniques | Master | Clemson University | 1989
Choi [46] | Software Testing Using High-performance Computers | PhD | Purdue University | 1991
Krauser [132] | Compiler-Integrated Software Testing | PhD | Purdue University | 1991
Fichter [91] | Parallelizing Mutation on a Hypercube | Master | Clemson University | 1991
Lee [141] | Weak vs. Strong: An Empirical Comparison of Mutation Variants | Master | Clemson University | 1991
Zapf [261] | A Distributed Interpreter for the Mothra Mutation Testing System | PhD | Clemson University | 1993
Delamaro [52] | Proteum - A Mutation Analysis Based Testing Environment | PhD | University of São Paulo | 1993
Wong [248] | On Mutation and Data Flow | PhD | Purdue University | 1993
Pan [197] | Using Constraints to Detect Equivalent Mutants | Master | George Mason University | 1994
Untch [236] | Schema-based Mutation Analysis: A New Test Data Adequacy Assessment Method | PhD | Clemson University | 1995
Ghosh [98] | Testing Component-Based Distributed Applications | PhD | Purdue University | 2000
Ding [74] | Using Mutation to Generate Tests from Specifications | Master | George Mason University | 2000
Okun [195] | Specification Mutation for Test Generation and Analysis | PhD | University of Maryland Baltimore | 2004
Ma [148] | Object-oriented Mutation Testing for Java | PhD | KAIST University in Korea | 2005
May [161] | Test Data Generation: Two Evolutionary Approaches to Mutation Testing | PhD | University of Kent | 2007
Hussain [116] | Mutation Clustering | Master | King’s College London | 2008
Adamopoulos [4] | Search Based Test Selection and Tailored Mutation | Master | King’s College London | 2009

introduces the fundamental theory of Mutation Testing, including the hypotheses, the process and the problems of Mutation Testing. Section III explains the techniques for the reduction of the computational cost. Section IV introduces the techniques for detecting equivalent mutants. The applications of Mutation Testing are introduced in Section V. Section VI summarises the empirical experiments of the research work on Mutation Testing. Section VII describes the development work on mutation tools. Section VIII discusses the evidence for the increasing importance of Mutation Testing. Section IX discusses the unresolved problems, barriers and the areas of success in Mutation Testing. The paper concludes in Section X.

II. THE THEORY OF MUTATION TESTING

This section will first introduce the two fundamental hypotheses of Mutation Testing. It then discusses the general process of Mutation Testing and the problems from which it suffers.

A. Fundamental Hypotheses

Mutation Testing promises to be effective in identifying adequate test data which can be used to find real faults [96]. However, the number of such potential faults for a given program is enormous; it is impossible to generate mutants representing all of them. Therefore, traditional Mutation Testing targets only a subset of these faults, those which are close to the correct version of the program, with the hope that these will be sufficient to simulate all faults. This theory is based on two hypotheses: the Competent Programmer Hypothesis (CPH) [3], [66] and the Coupling Effect [66].

The CPH was first introduced by DeMillo et al. in 1978 [66]. It states that programmers are competent, which implies that they tend to develop programs close to the correct version. As a result, although there may be faults in the program delivered by a competent programmer, we assume that these faults are merely a few simple faults which can be corrected by a few small syntactical changes. Therefore, in Mutation Testing, only faults constructed from several simple syntactical changes are applied, which represent the faults that are made by “competent programmers”. An example of the CPH can be found in Acree et al.’s work [3]. A theoretical discussion using the concept of program neighbourhoods can also be found in Budd et al.’s work [37].

The Coupling Effect was also proposed by DeMillo et al. in 1978 [66]. Unlike the CPH, which concerns a programmer’s behaviour, the Coupling Effect concerns the type of faults used in mutation analysis. It states that “Test data that distinguishes all programs differing from a correct one by only simple errors is so sensitive that it also implicitly distinguishes more complex errors”. Offutt [174], [175] extended this into the Coupling Effect Hypothesis and the Mutation Coupling Effect Hypothesis with a precise definition of simple and complex faults (errors). In his definition, a simple fault is represented by a simple mutant which is created by making a single syntactical change, while a complex fault is represented as a complex mutant which is created by making more than one change.

According to Offutt, the Coupling Effect Hypothesis is that “complex faults are coupled to simple faults in such a way that a test data set that detects all simple faults in a program will detect a high percentage of the complex faults” [175]. The Mutation Coupling Effect Hypothesis now becomes “Complex mutants are coupled to simple mutants in such a way that a test data set that detects all simple mutants in a program will also detect a large percentage of the complex mutants” [175]. As a result, the mutants used in traditional Mutation Testing are limited to simple mutants only.

There has been much research work on the validation of the coupling effect hypothesis [145], [164], [174], [175]. Lipton and Sayward [145] conducted an empirical study using a small program, FIND. In their experiment, a small sample of 2nd-order, 3rd-order and 4th-order mutants was investigated. The results suggested that an adequate test set generated from 1st-order mutants was also adequate for the samples of kth-order mutants (k = 2, ..., 4). Offutt [174], [175] extended this experiment using all possible 2nd-order mutants with two more programs, MID and TRITYP. The results suggested that test data developed to kill 1st-order mutants killed over 99% of the 2nd-order and 3rd-order mutants. This study implied that the mutation coupling effect hypothesis does, indeed, manifest itself in practice. Similar results were found in the empirical study by Morell [164].

The validity of the mutation coupling effect has also been considered in the theoretical studies of Wah [242]–[244] and Kapoor [125]. In Wah’s work [243], [244], a simple theoretical model, the q function model, was proposed, which considers a program to be a set of finite functions. Wah applied test sets to the 1st-order and the 2nd-order model. Empirical results indicated that the average survival ratio of 1st-order mutants and 2nd-order mutants is 1/n and 1/n² respectively, where n is the order of the domain [243]. This result is also similar to the estimated results of the empirical studies mentioned above. A formal proof of the coupling effect on boolean logic faults can also be found in Kapoor’s work [125].

B. The Process of Mutation Analysis

The traditional process of mutation analysis is illustrated in Figure 2. In mutation analysis, from a program p, a set of faulty programs p′, called mutants, is generated by a few single syntactic changes to the original program p. As an illustration, Table II shows the mutant p′, generated by changing the and operator (&&) of the original program p into the or operator (||).

Fig. 2. Generic Process of Mutation Analysis [191]

A transformation rule that generates a mutant from the original

program is known as a mutation operator¹. Table II contains only one example of a mutation operator; there are many others. Typical mutation operators are designed to modify variables and expressions by replacement, insertion or deletion operators. Table III lists the first set of formalized mutation operators for the Fortran programming language. These typical mutation operators were implemented in the Mothra mutation system [131].

¹ In the literature of Mutation Testing, mutation operators are also known as mutant operators, mutagenic operators, mutagens and mutation rules [191].

TABLE II
AN EXAMPLE OF MUTATION OPERATION

Program p:
    ...
    if ( a > 0 && b > 0 )
        return 1;
    ...

Mutant p′:
    ...
    if ( a > 0 || b > 0 )
        return 1;
    ...

TABLE III
THE FIRST SET OF MUTATION OPERATORS: THE 22 “MOTHRA” FORTRAN MUTATION OPERATORS (ADAPTED FROM [131])

Mutation Operator | Description
AAR | array reference for array reference replacement
ABS | absolute value insertion
ACR | array reference for constant replacement
AOR | arithmetic operator replacement
ASR | array reference for scalar variable replacement
CAR | constant for array reference replacement
CNR | comparable array name replacement
CRP | constant replacement
CSR | constant for scalar variable replacement
DER | DO statement alterations
DSA | DATA statement alterations
GLR | GOTO label replacement
LCR | logical connector replacement
ROR | relational operator replacement
RSR | RETURN statement replacement
SAN | statement analysis
SAR | scalar variable for array reference replacement
SCR | scalar for constant replacement
SDL | statement deletion
SRC | source constant replacement
SVR | scalar variable replacement
UOI | unary operator insertion

To increase the flexibility of Mutation Testing in practical applications, Jia and Harman introduced a scripting language, the Mutation Operator Constraint Script (MOCS) [123]. The MOCS provides two types of constraint: Direct Substitution Constraint and Environmental Condition Constraint. The Direct Substitution Constraint allows users to select a specific transformation rule that performs a simple change, while the Environmental Condition Constraint is used to specify the domain for applying mutation operators. Simao et al. [217] also proposed a transformation language, MUDEL, used to specify the description of mutation operators. Besides modifying program source, mutation operators can also be defined as rules to modify the grammar used to capture the syntax of a software artefact. A much more detailed account of these grammar-based mutation operators can be found in the work of Offutt et al. [177].

In the next step, a test set T is supplied to the system. Before starting the mutation analysis, this test set needs to be successfully executed against the original program p to check its correctness for each test case. If p is incorrect, it has to be fixed before running other mutants; otherwise each mutant p′ will then be run against this test set T. If the result of running p′ is different from the result of running p for any test case in T, then the mutant p′ is said to be ‘killed’; otherwise it is said to have ‘survived’.

After all test cases have been executed, there may still be a few ‘surviving’ mutants. To improve the test set T, the program tester can provide additional test inputs to kill these surviving mutants. However, there are some mutants that can never be killed, because they always produce the same output as the original program. These mutants are called Equivalent Mutants. They are syntactically different but functionally equivalent to the original program. Automatically detecting all equivalent mutants is impossible [35], [187], because program equivalence is undecidable. The equivalent mutant problem has been a barrier that prevents Mutation Testing from being more widely used. Several proposed solutions to the equivalent mutant problem are discussed in Section IV.

Mutation Testing concludes with an adequacy score, known as the Mutation Score, which indicates the quality of the input test set. The mutation score (MS) is the ratio of the number of killed mutants over the total number of non-equivalent mutants. The goal of mutation analysis is to raise the mutation score to 1, indicating the test set T is sufficient to detect all the faults denoted by the mutants.

C. The Problems of Mutation Analysis

Although Mutation Testing is able to effectively assess the quality of a test set, it still suffers from a number of problems. One problem that prevents Mutation Testing from becoming a practical testing technique is the high computational cost of executing the enormous number of mutants against a test set. The other problems are related to the amount of human effort involved in using Mutation Testing, for example, the human oracle problem [247] and the equivalent mutant problem [35].

The human oracle problem refers to the process of checking the original program’s output with each test case. Strictly speaking, this is not a problem unique to Mutation Testing. In all forms of testing, once a set of inputs has been arrived at, there remains the problem of checking output [247]. However, Mutation Testing is effective precisely because it is demanding, and this can lead to an increase in the number of test cases, thereby increasing oracle cost. This oracle cost is often the most expensive part of the overall test activity. Also, because of the undecidability of mutant equivalence, the detection of equivalent mutants typically involves additional human effort.

Although it is impossible to completely solve these problems, with existing advances in Mutation Testing, the process of Mutation Testing can be automated and the run-time can allow for reasonable scalability, as this survey will show. A lot of previous work has focused on techniques to reduce computational cost, a topic to which we now turn.

III. COST REDUCTION TECHNIQUES

Mutation Testing is widely believed to be a computationally expensive testing technique. However, this belief is partly based on the outdated assumption that all mutants in the traditional Mothra set need to be considered. In order to turn Mutation

Testing into a practical testing technique, many cost reduction techniques have been proposed. In the survey work of Offutt and Untch [191], cost reduction techniques are divided into three types: ‘do fewer’, ‘do faster’ and ‘do smarter’. In this paper, these techniques are classified into two types: reduction of the generated mutants (which corresponds to ‘do fewer’) and reduction of the execution cost (which combines ‘do faster’ and ‘do smarter’). Figure 3 provides an overview of the chronological development of published ideas for cost reduction.

To take a closer look at the cost reduction research work, we counted the number of publications for each technique (see Figure 4). From this figure, it is clear that Selective Mutation and Weak Mutation are the most widely studied cost reduction techniques. Each of the other techniques is studied in no more than five papers, to date. The rest of the section will introduce each cost reduction technique in detail. Section III-A will present work on mutant reduction techniques, while Section III-B will cover execution reduction techniques.

Fig. 4. Percentage of publications using each Mutant Reduction Technique

A. Mutant Reduction Techniques

One of the major sources of computational cost in Mutation Testing is the inherent running cost in executing the large number of mutants against the test set. As a result, reducing the number of generated mutants without significant loss of test effectiveness has become a popular research problem. For a given set of mutants, M, and a set of test data T, MS_T(M) denotes the mutation score of the test set T applied to mutants M. The mutant reduction problem can be defined as the problem of finding a subset of mutants M′ from M, where MS_T(M′) ≈ MS_T(M). This section will introduce four techniques used to reduce the number of mutants: Mutant Sampling, Mutant Clustering, Selective Mutation and Higher Order Mutation.

1) Mutant Sampling: Mutant Sampling is a simple approach that randomly chooses a small subset of mutants from the entire set. This idea was first proposed by Acree [2] and Budd [34]. In this approach, all possible mutants are generated first as in traditional Mutation Testing. x% of these mutants are then selected randomly for mutation analysis and the remaining mutants are discarded. There were many empirical studies of this approach. The primary focus was on the choice of the random selection rate (x). In Wong and Mathur’s studies [159], [248], the authors conducted an experiment using a random selection rate x% from 10% to 40% in steps of 5%. The results suggested that random selection of 10% of mutants is only 16% less effective than a full set of mutants in terms of mutation score. This study implied that Mutant Sampling is valid with an x% value higher than 10%. This finding also agreed with the empirical studies by DeMillo et al. [64] and King and Offutt [131]. Instead of fixing the sample rate, Sahinoglu and Spafford [207] proposed an alternative sampling approach based on the Bayesian sequential probability ratio test (SPRT). In their approach, the mutants are randomly selected until a statistically appropriate sample size has been reached. The result suggested that their model is more sensitive than the random selection because it is self-adjusting based on the available test set.

2) Mutant Clustering: The idea of Mutant Clustering was first proposed in Hussain’s master’s thesis [116]. Instead of selecting mutants randomly, Mutant Clustering chooses a subset of mutants using clustering algorithms. The process of Mutant Clustering starts from generating all first order mutants. A clustering algorithm is then applied to classify the first order mutants into different clusters based on the killable test cases. Each mutant in the same cluster is guaranteed to be killed by a similar set of test cases. Only a small number of mutants are selected from each cluster to be used in Mutation Testing; the remaining mutants are discarded. In Hussain’s experiment, two clustering algorithms, K-means and Agglomerative clustering, were applied and the result was compared with random and greedy selection strategies. Empirical results suggest that Mutant Clustering is able to select fewer mutants but still maintain the mutation score. A development of the Mutant Clustering approach can be found in the work of Ji et al. [120]. Ji et al. use a domain reduction technique to avoid the need to execute all mutants.

3) Selective Mutation: A reduction in the number of mutants can also be achieved by reducing the number of mutation operators applied. This is the basic idea underpinning Selective Mutation, which seeks to find a small set of mutation operators that generate a subset of all possible mutants without significant loss of test effectiveness. This idea was first suggested as “constrained mutation” by Mathur [156]. Offutt et al. [190] subsequently extended this idea, calling it Selective Mutation.

Mutation operators generate different numbers of mutants and some mutation operators generate far more mutants than others, many of which may turn out to be redundant. For example, two mutation operators of the 22 Mothra operators, ASR and SVR, were reported to generate approximately 30% to 40% of all mutants [131]. To effectively reduce the generated mutants, Mathur [156] suggested omitting the two mutation operators ASR and SVR, which generated most of the mutants. This idea was implemented as “2-selective mutation” by Offutt et al. [190].

Offutt et al. [190] have also extended Mathur and Wong’s work by omitting four mutation operators (4-selective mutation) and omitting six mutation operators (6-selective mutation). In their studies, they reported that 2-selective mutation achieved a mean mutation score of 99.99% with a 24% reduction in the number of mutants. 4-selective mutation achieved a mean mutation

score of 99.84% with a 41% reduction in the number of mutants. 6-selective mutation achieved a mean mutation score of 99.71% with a 60% reduction in the number of mutants.

Fig. 3. Overview of the Chronological Development of Mutant Reduction Techniques

Wong and Mathur adopted another type of selection strategy, selection based on test effectiveness [248], [252], known as constrained mutation. Wong and Mathur suggested using only two mutation operators: ABS and ROR. The motivation for the ABS operator is that killing the mutants generated from ABS requires test cases from different parts of the input domain. The motivation for the ROR operator is that killing the mutants generated from ROR requires test cases which ‘examine’ the mutated predicate [248], [252]. Empirical results suggest that these two mutation operators achieve an 80% reduction in the number of mutants and only a 5% reduction in the mutation score in practice.

Offutt et al. [182] extended their 6-selective mutation further using a similar selection strategy. Based on the type of the Mothra mutation operators, they divided them into three categories: statements, operands and expressions. They tried to omit operators from each class in turn. They discovered that 5 operators from the operands and expressions class became the key operators. These 5 operators are ABS, UOI, LCR, AOR and ROR. These key operators achieved a mutation score of 99.5%.

Mresa and Bottaci [167] proposed a different type of selective mutation. Instead of trying only to achieve a small loss of test effectiveness, they also took the cost of detecting equivalent mutants into consideration. In their work, each mutation operator is assigned a score which is computed from its value and cost. Their results indicated that it was possible to reduce the number of equivalent mutants while maintaining effectiveness.

Based on previous experience, Barbosa et al. [19] defined a guideline for selecting a sufficient set of mutation operators from all possible mutation operators. They applied this guideline to Proteum’s 77 C mutation operators [6] and obtained a set of 10 selected mutation operators, which achieved a mean mutation score of 99.6% with a 65.02% reduction in the number of mutants. They also compared their operators with Wong’s and Offutt et al.’s set. The results showed their operator set achieved the highest mutation score.

The most recent research work on selective mutation was conducted by Namin et al. [168]–[170]. They formulated the selective mutation problem as a statistical problem: the variable selection or reduction problem. They applied linear statistical approaches to identify a subset of 28 mutation operators from 108 C mutation operators. The results suggested that these 28 operators are sufficient to predict the effectiveness of a test suite and reduce the number of generated mutants by 92%. According to their results, this approach achieved the highest rate of reduction compared with other approaches.

4) Higher Order Mutation: Higher Order Mutation is a comparatively new form of Mutation Testing introduced by Jia and Harman [122]. The underlying motivation was to find those rare but valuable higher order mutants that denote subtle faults. In traditional Mutation Testing, mutants can be classified into first order mutants (FOMs) and higher order mutants (HOMs). FOMs are created by applying a mutation operator only once. HOMs are generated by applying mutation operators more than once.

In their work, Jia and Harman introduced the concept of subsuming HOMs. A subsuming HOM is harder to kill than the FOMs from which it is constructed. As a result, it may be preferable to replace FOMs with the single HOM to reduce the number of mutants. In particular, they also introduced the concept of a strongly subsuming HOM (SSHOM), which is only killed by a subset of the intersection of test cases that kill each FOM from which it is constructed.

This idea has been partly proved by Polo et al.’s work [199]. In their experiment, they focused on a specific order of HOMs,
the second order mutants. They proposed different algorithms to combine first order mutants to generate the second order ones. Empirical results suggest that applying second order mutants reduced test effort by approximately 50%, without much loss of test effectiveness. More recently, Langdon et al. have applied multi-objective genetic programming to the generation of higher order mutants [136], [137]. In their experiment, they have found realistic higher order mutants that are harder to kill than any first order mutant.

B. Execution Cost Reduction Techniques

In addition to reducing the number of generated mutants, the computational cost can also be reduced by optimizing the mutant execution process. This section will introduce the three types of techniques used to optimize the execution process that have been considered in the literature.
1) Strong, Weak and Firm Mutation: Based on the point in the execution process at which we decide whether a mutant is killed, Mutation Testing techniques can be classified into three types: Strong Mutation, Weak Mutation and Firm Mutation.
Strong Mutation is often referred to as traditional Mutation Testing. That is, it is the formulation originally proposed by DeMillo et al. [66]. In Strong Mutation, for a given program p, a mutant m of program p is said to be killed only if mutant m gives a different output from the original program p.
To optimize the execution of Strong Mutation, Howden [115] proposed Weak Mutation. In Weak Mutation, a program p is assumed to be constructed from a set of components C = {c_1, ..., c_n}. Suppose mutant m is made by changing component c_m; mutant m is said to be killed if any execution of component c_m gives a result that differs from the corresponding execution in mutant m. As a result, in Weak Mutation, instead of checking mutants after the execution of the entire program, the mutants need only to be checked immediately after the execution point of the mutant or mutated component.
In Howden’s work [115], the component C referred to one of the following five types: variable reference, variable assignment, arithmetic expression, relational expression and boolean expression. This definition of components was later refined by Offutt and Lee [183], [184]. Offutt and Lee defined four types of execution: evaluation after the first execution of an expression (Ex-Weak/1), the first execution of a statement (St-Weak/1), the first execution of a basic block (BB-Weak/1) and after N iterations of a basic block in a loop (BB-Weak/N).
The advantage of weak mutation is that each mutant does not require a complete execution process; once the mutated component is executed we can check for survival. Moreover, it might not even be necessary to generate each mutant, as the constraints for the test data can sometimes be determined in advance [253]. However, as different components of the original program may give different outputs from the original execution, weak mutation test sets can be less effective than strong mutation test sets. In this way, weak mutation sacrifices test effectiveness for improvements in test effort. This raises the question as to what kind of trade-off can be achieved.
There were many empirical studies on the Weak Mutation trade-off. Girgis and Woodward [103] implemented a weak mutation system for Fortran 77 programs. Their system is an analytical type of weak mutation system in which the mutants are killed by examining the program’s internal state. In their experiment, four of Howden’s five program components were considered. The results suggested that weak mutation is less computationally expensive than strong mutation. Marick [153] drew similar conclusions from his experiments.
A theoretical proof of Weak Mutation by Horgan and Mathur [113] showed that under certain conditions, test sets generated by weak mutation can also be expected to be as effective as strong mutation. Offutt and Lee [183], [184] presented a comprehensive empirical study using a weak mutation system named Leonardo. In their experiment, they used the 22 Mothra mutation operators as fault models instead of Howden’s five component set. The results from their experiments indicated that Weak Mutation is an alternative to Strong Mutation in most common cases, agreeing with the probabilistic results of Horgan and Mathur [113] and experimental results of Girgis and Woodward [103] and Marick [153].
Firm Mutation was first proposed by Woodward and Halewood [257]. The idea of Firm Mutation is to overcome the disadvantages of both weak and strong mutations by providing a continuum of intermediate possibilities. That is, the ‘compare state’ of Firm Mutation lies between the intermediate states after execution (Weak Mutation) and the final output (Strong Mutation). In 2001, Jackson and Woodward [119] proposed a parallel Firm Mutation approach for Java programs. Unfortunately, to date there is no publicly available firm mutation tool.
2) Run-time Optimization Techniques: The Interpreter-Based Technique is one of the optimization techniques used in the first generation of Mutation Testing tools [131], [181]. In traditional Interpreter-Based Techniques, the result of a mutant is interpreted from its source code directly. The main cost of this technique is determined by the cost of interpretation. To optimise the traditional Interpreter-Based approach, Offutt and King [131], [181] translated the original program into an intermediate form. Mutation and interpretation are performed at this intermediate code level. Interpreter-Based tools provide additional flexibility and are sufficiently efficient for mutating small programs. However, due to the nature of interpretation, it becomes slower as the scale of programs under test increases.
The Compiler-Based Technique is the most common approach to achieve program mutation [52], [53]. In a Compiler-Based Technique, each mutant is first compiled into an executable program; the compiled mutant is then executed by a number of test cases. Compared to source code interpretation techniques, this approach is much faster because execution of compiled binary code takes less time than interpretation. However, there is also a speed limitation, known as the compilation bottleneck, due to the high compilation cost for programs whose run-time is much shorter than the compilation/link time [47].
DeMillo et al. proposed the Compiler-Integrated Technique [65] to optimise the performance of the traditional Compiler-Based Techniques. Because there is only a minor syntactic difference between each mutant and the original program, compiling each mutant separately in the Compiler-Based technique will result in redundant compilation cost. In the Compiler-Integrated technique, an instrumented compiler is designed to generate and compile mutants.
The instrumented compiler generates two outputs from the original program: an executable object code for the original program and a set of patches for mutants. Each patch contains instructions which can be applied to convert the original executable object
code image directly to executable code for a mutant. As a result, this technique can effectively reduce the redundant cost from individual compilation. A much more detailed account can be found in Krauser’s PhD thesis [132].
The Mutant Schema Generation approach is also designed to reduce the overhead cost of the traditional interpreter-based techniques [235]–[237]. Instead of compiling each mutant separately, the mutant schema technique generates a metaprogram. Just like a ‘super mutant’, this metaprogram can be used to represent all possible mutants. Therefore, to run each mutant against the test set, only this metaprogram need be compiled. The cost of this technique is composed of a one-time compilation cost and the overall run-time cost. As this metaprogram is a compiled program, its running speed is faster than the interpreter-based technique. The results from Untch et al.’s work [237] suggest that the mutant schema prototype tool, TUMS, is significantly faster than Mothra using interpreter techniques. Much more extensive results are reported in detail in Untch’s PhD dissertation [236]. An idea similar to the Mutant Schemata technique, named the Mutant Container, was proposed by Mathur independently. The details can be found in a software engineering course ‘handout’ by Mathur [157].
The most recent work on reduction of the compilation cost is the Bytecode Translation Technique. This technique was first proposed by Ma et al. [151], [185]. In Bytecode Translation, mutants are generated from the compiled object code of the original program, instead of the source code. As a result, the generated ‘bytecode mutants’ can be executed directly without compilation. As well as saving compilation cost, Bytecode Translation can also handle off-the-shelf programs which do not have available source code. This technique has been adopted in the Java programming language [151], [152], [185], [208]. However, not all programming languages provide an easy way to manipulate intermediate object code. There are also some limitations for the application of Bytecode Translation in Java; for example, not all the mutation operators can be represented at the Bytecode level [208].
Bogacki and Walter introduced an alternative approach to reduce compilation cost, called Aspect-Oriented Mutation [26], [27]. In their approach, an aspect patch is generated to capture the output of a method on the fly. Each aspect patch will run programs twice. The first execution obtains the results and context of the original program and mutants are generated and executed in the second execution. As a result, there is no need to compile each mutant. Empirical evaluation between a prototype tool and Jester can be found in the work of Bogacki and Walter [26].
3) Advanced Platforms Support for Mutation Testing: Mutation Testing has also been applied to many advanced computer architectures to distribute the overall computational cost among many processors. In 1988, Mathur and Krauser [158] were the first to perform Mutation Testing on a vector processor system. Krauser et al. [133], [134] proposed an approach for concurrent execution of mutants under SIMD machines. Fleyshgakker and Weiss [92], [246] proposed an algorithm that significantly improved techniques for parallel Mutation Testing. Choi and Mathur [47] and Offutt et al. [189] have distributed the execution cost of Mutation Testing through MIMD machines. Zapf [261] extended this idea in a network environment, where each mutant is executed independently.

IV. EQUIVALENT MUTANT DETECTION TECHNIQUES

To detect if a program and one of its mutant programs are equivalent is undecidable, as proved in the work of Budd and Angluin [35]. As a result, the detection of equivalent mutants alternatively may have to be carried out by humans. This has been a source of much theoretical interest. For a given program p, m denotes a mutant of program p. Recall that m is an equivalent mutant if m is syntactically different from p, but has the same behaviour as p. Table IV shows an example of an equivalent mutant generated by changing the operator < of the original program into the operator !=. If the statements within the loop do not change the value of i, program p and mutant m will produce identical output.

TABLE IV
AN EXAMPLE OF EQUIVALENT MUTATION

Program p                        Equivalent Mutant m
for (int i = 0; i < 10; i++)     for (int i = 0; i != 10; i++)
{                                {
  ...(the value of i               ...(the value of i
  is not changed)                  is not changed)
}                                }

An equivalent mutant is created when a mutation leads to no possible observable change in behaviour; the mutant is syntactically different but semantically identical to the original program from which it is created. Grün et al. [106] manually investigated eight equivalent mutants generated from the JAXEN XPATH query engine program. They pointed out four common equivalent mutant situations: the mutant is generated from dead code, the mutant improves speed, the mutant only alters the internal states and the mutant cannot be triggered (i.e. no input test data can change the program’s behaviour at the mutation point). It is worth noticing that these four are not the only situations that lead to equivalent mutants. For example, none of them applies to the example in Table IV.
As the mutation score is counted based on non-equivalent mutants, without a complete detection of all equivalent mutants, the mutation score can never be 100%, which means the programmer will not have complete confidence in the adequacy of a potentially perfectly adequate test set. Empirical results indicate that 10% to 40% of mutants are equivalent [178], [187]. Fortunately, there has been much research work on the detection of equivalent mutants.
Baldwin and Sayward [18] proposed an approach that used compiler optimization techniques to detect equivalent mutants. This approach is based on the idea that the optimization procedure of source code will produce an equivalent program, so a mutant might be detected as an equivalent mutant by either an ‘optimization’ or a ‘de-optimization’ process. Baldwin and Sayward [18] proposed six types of compiler optimization rules that can be used for the detection of equivalent mutants. These six were implemented and empirically studied by Offutt and Craft [178]. The empirical results showed that, generally, 10% of all mutants were equivalent mutants for 15 subject programs.
Based on the work of constraint test data generation, Offutt and Pan [186], [187], [197] introduced a new equivalent mutant detection approach using constraint solving. In their approach,
the equivalent mutant problem is formulated as a constraint satisfaction problem by analysing the path condition of a mutant. A mutant is equivalent if and only if the input constraint is unsatisfiable. Empirical evaluation of a prototype has shown that this technique is able to detect a significant percentage of equivalent mutants (47.63% among 11 subject programs) for most of the programs. Their results suggest that the constraint satisfaction formulation is more powerful than the compiler optimization technique [178].
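The unsatisfiability criterion can be illustrated with a small sketch. This is not Offutt and Pan’s constraint solver: it is a brute-force stand-in that searches a small, hypothetical input domain for an input on which a program and its mutant differ. The functions `original`, `mutant_equivalent` and `mutant_killable`, and the helper `find_killing_input`, are invented for this illustration. A bounded search of this kind can refute equivalence by finding a killing input, but finding none only suggests equivalence within the searched domain.

```python
from itertools import product


def original(a, b):
    """Return the larger of two integers."""
    return a if a > b else b


def mutant_equivalent(a, b):
    """Relational operator mutation: '>' replaced by '>='.

    When a == b both branches return the same value, so no input
    can distinguish this mutant from the original.
    """
    return a if a >= b else b


def mutant_killable(a, b):
    """Relational operator mutation: '>' replaced by '<'."""
    return a if a < b else b


def find_killing_input(program, mutant, domain):
    """Search a bounded domain for an input on which the outputs differ.

    Returns the first such input pair, or None when the 'outputs
    differ' constraint has no solution inside the domain.
    """
    for args in product(domain, repeat=2):
        if program(*args) != mutant(*args):
            return args
    return None


if __name__ == "__main__":
    domain = range(-5, 6)  # hypothetical bounded input domain
    print(find_killing_input(original, mutant_equivalent, domain))
    print(find_killing_input(original, mutant_killable, domain))
```

A real constraint-based detector reasons symbolically over path conditions rather than enumerating inputs, which is what allows it to report unsatisfiability for unbounded domains.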
The program slicing technique has also been proposed to assist
in the detection of equivalent mutants [109], [110], [241]. Voas
and McGraw [241] were the first to suggest the application
of program slicing to Mutation Testing. Hierons et al. [110]
demonstrated an approach using slicing to assist the human
analysis of equivalent mutants. This is achieved by the generation
of a sliced program that denotes the answer to an equivalent
mutant. This work was later extended by Harman et al. [109]
using dependence analysis.
Adamopoulos et al. [5] proposed a co-evolutionary approach to detect possible equivalent mutants. In their work, a fitness function was designed to set a poor fitness value to an equivalent mutant. Using this fitness function, equivalent mutants are wiped out during the co-evolution process and only mutants that are hard to kill and test cases that are good at detecting mutants are selected.
Ellims et al. [83] reported that mutants with syntactic difference and the same output can also be semantically different in terms of running profile. These mutants often have the same output as the original programs but have different execution time or memory usage. Ellims et al. suggested that ‘resource-aware’ testing might be used to kill the potential mutants.
The most recent work on the equivalent mutants was conducted by Grün et al. [106] who investigated the impact of mutants. The impact of a mutant was defined as the difference in program behaviour between the original program and the mutant, and it was measured through the code coverage in their experiment. The empirical results suggested that there was a strong correlation between mutant ‘killability’ and its impact on execution, which indicates that if a mutant has higher impact, it is less likely to be equivalent.

V. THE APPLICATION OF MUTATION TESTING

Since Mutation Testing was proposed in the 1970s, it has been applied to test both program source code (Program Mutation) [60] and program specification (Specification Mutation) [105]. Program Mutation belongs to the category of white box based testing, in which faults are seeded into source code, while Specification Mutation belongs to black box based testing where faults are seeded into program specifications, but in which the source code may be unavailable during testing.
Figure 5 shows the chronological development of research work on Program Mutation and Specification Mutation. Figure 6 shows the percentage of the publications addressing each language to which Mutation Testing has been applied. As Figure 5 shows, there has been more work on Program Mutation than Specification Mutation. Notably more than 50% of the work has been applied to Java, Fortran and C. Fortran features highly because a lot of the earlier work on Mutation Testing was carried out on Fortran programs. In the following section, the applications of Program Mutation and Specification Mutation are summarized by the programming language targeted.

Fig. 6. Percentage of publications addressing each language to which Mutation Testing has been applied

A. Program Mutation

Program Mutation has been applied to both the unit level [66] and the integration level [55] of testing. For unit level Program Mutation, mutants are generated to represent the faults that programmers might have made within a software unit, while for the integration level Program Mutation, mutants are designed to represent the integration faults caused by the connection or interaction between software units [240]. Applying Program Mutation at the integration level is also known as Interface Mutation, which was first introduced by Delamaro et al. [55] in 1996. Interface Mutation has been applied to C Programs by Delamaro et al. [54]–[56] and also to CORBA Programs by Ghosh and Mathur [98], [100]–[102]. Empirical evaluations of Interface Mutation can be found in Vincenzi et al.’s work [240] and Delamaro et al.’s work [57], [58].
1) Mutation Testing for Fortran: In the earliest days of Mutation Testing, most of the experiments on Mutation Testing targeted Fortran. Budd et al. [36], [40] were the first to design mutation operators for Fortran IV in 1977. Based on these studies, a Mutation Testing tool named PIMS was developed for testing Fortran IV programs [3], [36], [145]. However, there were no formal definitions of mutation operators for Fortran until 1987. In 1987, Offutt and King [131], [181] summarized the results from previous work and proposed 22 mutation operators for Fortran 77. This set of mutation operators became the first set of formalized mutation operators and consequently had greater influence on later definitions of mutation operators for applying Mutation Testing to the other programming languages. These mutation operators are divided into three groups: the Statement analysis group, the Predicate analysis group and the Coincidental correctness group.
2) Mutation Testing for Ada: Ada mutation operators were first proposed by Bowser [29] in 1988. In 1997, based on previous
Fig. 5. Publications of the Applications of Mutation Testing

work of Bowser’s Ada mutation operators [29], Agrawal et al.’s C mutation operators [6] and the design of Fortran 77 mutation operators for Mothra [131], Offutt et al. [192] redesigned mutation operators for Ada programs to produce a proposed set of 65 Ada mutation operators. According to the semantics of Ada, this set of Ada mutation operators is divided into five groups: Operand Replacement Operators group, Statement Operators group, Expression Operators group, Coverage Operators group and Tasking Operators group.
3) Mutation Testing for C: In 1989, Agrawal et al. [6] proposed a comprehensive set of mutation operators for the ANSI C programming language. There were 77 mutation operators defined in this set, which was designed to follow the C language specification. These operators are classified into variable mutation, operator mutation, constant mutation and statement mutation. Delamaro et al. [54]–[56], [58] investigated the application of Mutation Testing at the integration level. They selected 10 mutation operators from Agrawal et al.’s 77 mutation operators to test interfaces of C programs. These mutation operators focus on injecting faults into the signature of public functions. More recently, Higher Order Mutation Testing has also been applied to C Programs by Jia and Harman [122].
There are also mutation operators that target specific C program defects or vulnerabilities. Shahriar and Zulkernine [214] proposed 8 mutation operators to generate mutants that represent Format String Bugs (FSBs). Vilela et al. [239] proposed 2 mutation operators representing faults associated with static and dynamic memory allocations, which were used to detect Buffer Overflows (BOFs). This work was subsequently extended by Shahriar and Zulkernine [213] who proposed 12 comprehensive mutation operators to support the testing of all BOF vulnerabilities, targeting vulnerable library functions, program statements and buffer size. Ghosh et al. [97] have applied Mutation Testing to an Adaptive Vulnerability Analysis (AVA) to detect BOFs.
4) Mutation Testing for Java: Traditional mutation operators are not sufficient for testing Object Oriented (OO) programming languages like Java [130], [151]. This is mainly because the faults represented by the traditional mutation operators are different to those in the OO environment, due to OO’s different programming structure. Moreover, there are new faults, introduced by OO-specific features, such as inheritance and polymorphism.
As a result, the design of Java mutation operators was not strongly influenced by previous work. Kim et al. [128] were the first to design mutation operators for the Java programming language. They proposed 20 mutation operators for Java using HAZOP (Hazard and Operability Studies). HAZOP is a safety technique which investigates and records the result of system deviations. In Kim et al.’s work, HAZOP was applied to the Java syntax definition to identify the plausible faults of the Java programming language. Based on these plausible faults, 20 Java mutation operators were designed, falling into six groups: Types/Variables, Names, Classes/interface declarations, Blocks, Expressions and others.
Based on their previous work on Java mutation operators, Kim et al. [127] introduced Class Mutation, which applies mutation to OO (Java) programs targeting faults related to OO-specific features. In Class Mutation, three mutation operators representing Java OO-features were selected from the 20 Java mutation operators. In 2000, Kim et al. [129] added another 10 mutation operators for Class Mutation. Finally, in 2001, the number of the Class mutation operators was extended to 15 and these mutation operators were classified into four types: polymorphic types, method overloading types, information hiding and exception handling types [130]. A similar approach was also
adopted by Chevalley and Thevenod-Fosse in their work [44], [45].
Ma et al. [150], [151] pointed out that the design of mutation operators should not start with the selected approach (Kim et al.’s approach [127]). They suggested that the selected mutation operators should be obtained from empirical results of the effectiveness of all mutation operators. Therefore, instead of continuing Kim et al.’s work [129], Ma et al. [150] proposed 24 comprehensive Java mutation operators based on previous studies of OO Fault models. These are classified into six groups: Information Hiding group, Inheritance group, Polymorphism group, Overloading group, Java Specific Features group and Common Programming Mistakes group. Ma et al. conducted an experiment to evaluate the usefulness of the proposed class mutation operators [149]. The results suggested that some class mutation model faults can be detected by traditional Mutation Testing. However, the mutants generated by the EOA class mutation (Reference assignment and content assignment replacement) and the EOC class mutation (Reference comparison and content comparison replacement) cannot be killed by a traditional mutation adequate test set.
There are also alternative approaches to the definition of the mutation operators for Java. For example, instead of applying mutation operators to the program source, Alexander et al. [9], [24] designed a set of mutation operators to inject faults into Java utility libraries, such as the Java container library and the iterator library. Based on work on traditional mutation operators, Bradbury et al. [31] introduced an extension to the concurrent Java environment.
5) Mutation Testing for C#: Based on previously proposed Java mutation operators, Derezińska introduced an extension to a set of C# specialized mutation operators [70], [71] and implemented them in a C# mutation tool named CREAM [72]. Empirical results for this set of C# mutation operators using CREAM were reported by Derezińska and Szustek [71], [73].
6) Mutation Testing for SQL: Mutation Testing has also been applied to SQL code to detect faults in database applications. The first attempt to design mutation operators for SQL was made by Chan et al. [43] in 2005. They proposed 7 SQL mutation operators based on the enhanced entity-relationship model. Tuya et al. [234] proposed another set of mutation operators for SQL query statements. This set of mutation operators is organized into four categories, including mutation of SQL clauses, mutation of operators in conditions and expressions, mutation handling NULL values and mutation of identifiers. They also developed a tool named SQLMutation that implements this set of SQL mutation operators and reported an empirical evaluation using SQLMutation [233]. A development of this work targeting Java database applications can be found in the work of Zhou and Frankl [264]. Shahriar and Zulkernine [212] have also proposed a set of mutation operators to handle the full set of SQL statements from connection to manipulation of the database. They introduced 9 mutation operators and implemented them in an SQL mutation tool called MUSIC.
7) Mutation Testing for Aspect-Oriented Programming: Aspect-Oriented Programming (AOP) is a programming paradigm that aids programmers in separation of crosscutting concerns. Ferrari et al. [90] proposed 26 mutation operators based on a generalization of faults for general Aspect-Oriented programs. These mutation operators are divided into three groups: pointcut expressions, aspect declarations and advice definitions and implementation. Empirical results from evaluation of this work using real world applications can also be found in their work [90]. A recent work from Delamare et al. introduced an approach to detect equivalent mutants in AOP programs using static analysis of aspects and base code [51].
AspectJ is a widely studied aspect-oriented extension of the Java language, which provides many special constructs such as aspects, advice, join points and pointcuts [13]. Baekken and Alexander [17] summarised previous research work on the fault model associated with AspectJ pointcuts. They proposed a complete AspectJ fault model based on the incorrect pointcut pattern, which was used as a set of mutation operators for AspectJ programs. Based on this work, Anbalagan and Xie [12], [13] proposed a framework to generate mutants for pointcuts and to detect equivalent mutants. To reduce the total number of mutants, a classification and ranking approach based on the strength of the pointcuts was also introduced in their framework.
8) Other Program Mutation Applications: Besides these programming languages, Mutation Testing has also been applied to Lustre programs [80], [81], PHP programs [215], Cobol programs [108], Matlab/Simulink [262] and spreadsheets [1]. There is also research work investigating the design of mutation operators for real-time systems [96], [171], [172], [227] and concurrent programs [8], [31], [41], [99], [147].

B. Specification Mutation

Although Mutation Testing was originally proposed as a white box testing technique at the implementation level, it has also been applied at the software design level. Mutation Testing at design level is often referred to as ‘Specification Mutation’, which was first introduced by Gopal and Budd in 1983 [38], [105]. In Specification Mutation, faults are typically seeded into a state machine or logic expressions to generate ‘specification mutants’. A specification mutant is said to be killed if its output condition is falsified. Specification Mutation can be used to find faults related to missing functions in the implementation or specification misinterpretation [195].
1) Mutation Testing for Formal Specifications: The formal specifications can be presented in many forms, for example calculus expressions, Finite State Machines (FSM), Petri Nets and Statecharts. The earlier research work on Specification Mutation considered specifications of simple logical expressions. Gopal and Budd [38], [105] considered specifications in predicate calculus targeting the predicate structure of the program under test. A similar work applied to the refinement calculus specification can be found in the work of Aichernig [7]. Woodward [254], [257] investigated mutation operators for algebraic specifications. In their experiment, they applied an optimization approach to compile a specification mutant into executable code and evaluated the approach to provide empirical results [255].
More recently, many formal techniques have been proposed to specify the dynamic aspects of a software system, for example, Finite State Machines (FSM), Petri Nets and Statecharts. Fabbri et al. [88] applied Specification Mutation to validate specifications presented as FSMs. They proposed 9 mutation operators, representing faults related to the states, events and outputs of an FSM. This set of mutation operators was later implemented as an extension of the C mutation tool Proteum [85]. An empirical evaluation of these mutation operators was reported by them [85]. Hierons and Merayo [111], [112] investigated the application of
Mutation Testing to Probabilistic Finite State Machines (PFSMs). They defined 7 mutation operators and provided an approach to avoid equivalent mutants. Other work on EFSM mutation can also be found in the work of Batth et al. [20], Bombieri et al. [28] and Belli et al. [23].
Statecharts are widely used for the formal specification of complex reactive systems. Statecharts can be considered as an extension of FSMs, so the first set of mutation operators for Statecharts was also proposed by Fabbri et al. [87], based on their previous work on FSM mutation operators. Using Fabbri et al.’s Statecharts mutation operators, Yoon et al. [260] introduced a new test criterion, the State-based Mutation Test Criterion (SMTC). In the work of Trakhtenbrot [231], the author proposed new mutations to assess the quality of tests for statecharts at the implementation level as well as the model level. Other work on Statechart mutation can be found in the work of Fraser et al. [95].
Besides FSMs and Statecharts, Specification Mutation has also been applied to a variety of specification languages. For example, Souza et al. [222], [223] investigated the application of Mutation Testing to the Estelle Specification language. Fabbri et al. [86] proposed mutation operators for Petri Nets. Srivatanakul et al. [225] performed an empirical study applying Specification Mutation to CSP Specifications. Olsson and Runeson [196] and Sugeta et al. [226] proposed mutation operators for SDL. Definitions of mutation operators for formal specification languages can be found in the work of Black et al. [25] and the work of Okun [195].
2) Mutation Testing for Running Environment: During the process of implementing specifications, bugs might be introduced by programmers due to insufficient knowledge of the final target environment. These bugs are called “environment bugs” and they can be hard to detect. Examples are the bugs caused by memory limitations, numeric limitations, value initialization, constant value interpretation, exception handling and system errors [224]. Mutation Testing was first applied to the detection of such bugs by Spafford [224] in 1990. In his work, environment mutants were generated to detect integer arithmetic environmental bugs.
The idea of environment bugs was extended in the 1990s by Du and Mathur, as many empirical studies suggested that “the environment plays a significant role in triggering security flaws that lead to security violations” [78]. As a result, Mutation Testing was also applied to the validation of security vulnerabilities. Du and Mathur [78] defined an EAI fault model for software vulnerability, and this model was applied to generate environmental mutants. Empirical results from the evaluation of their experiments are reported in [79].
3) Mutation Testing for Web Services: Lee and Offutt [142] were the first to apply Mutation Testing to Web Services. In 2001, they introduced an Interaction Specification Model to formalize the interactions between web components [142]. Based on this specification model, a set of generic mutation operators was proposed to mutate the XML data model. This work was later extended by Xu et al. [193], [259], targeting the mutation of XML data, and they renamed it XML perturbation. Instead of mutating XML data directly, they perturbed XML schemas to create invalid XML data using 7 XML schema mutation operators. A constraint-based test case generation approach was also proposed and the results of empirical studies were reported [259]. Another set of XML schema mutation operators was proposed by Li and Miller [143].
There is also Web Service mutation work targeting specific XML-based language features, for example, the OWL-S specification language [140], [245] and the WS-BPEL specification language [84]. Unlike the traditional XML specification language, OWL-S introduces semantics to workflow specification using an ontology specification language. In the work of Lee et al. [140], the authors propose mutation operators for the detection of semantic errors caused by the misuse of the ontology classes.
4) Mutation Testing for Networks: Protocol robustness is an important aspect of any network system. Sidhu and Leung [216] investigated fault coverage of network protocols. Based on this work, Probert and Guo proposed a set of mutation operators to test network protocols [202]. Vigna et al. [238] applied Mutation Testing to network-based intrusion detection signatures, which are used to identify malicious traffic. Jing et al. [124] built an NFSM model for protocol messages and applied Mutation Testing to this model using the TTCN-3 specification language. Other work on the application of Mutation Testing to state-based protocols can be found in the work of Zhang et al. [263].
5) Mutation Testing for Security Policy: Mutation Testing has also been applied to security policies [139], [154], [165], [166], [201]. Much of this research work sought to design mutation operators that inject common flaws into different types of security policies. For example, Xie et al. [154] applied mutation analysis to test XACML, an OASIS standard XML syntax for defining security policies. A similar approach has also been applied by Mouelhi et al. [166]. Le Traon et al. [139] introduced 8 mutation operators for the Organization Based Access Control (OrBAC) policy. Mouelhi et al. [165] proposed a generic meta-model for security policy formalisms. Based on this formalism, a set of mutation operators was introduced to apply to all rule-based formalisms. Hwang et al. proposed an approach that applies Mutation Testing to test firewall policies [117].

C. Other Testing Application
In addition to assessing the quality of test sets, Mutation Testing has also been used to support other testing activities, for example test data generation and regression testing, including test data prioritization and test data minimization. In this section, we summarise the main work on mutation as a support to these testing activities.
1) Test Data Generation: The main idea of mutation based test data generation is to generate test data that can effectively kill mutants. Constraint-based test data generation (CBT) is one of the automatic test data generation techniques using Mutation Testing. It was first proposed in Offutt’s PhD work [194]. Offutt suggested that there are three conditions for a test case to kill a mutant: reachability, necessity and sufficiency. In CBT, each condition for a mutant is turned into a constraint. Test data that is guaranteed to kill this mutant can be generated by finding input values that satisfy these constraints.
Godzilla is a test data generator that uses the CBT technique. It was implemented by DeMillo and Offutt [67] under the Mothra system. Godzilla applied control-flow analysis, symbolic evaluation and a constraint satisfaction technique to generate and solve constraints for each mutant. Empirical results suggest that 90% of mutants can be killed using the CBT technique for most programs [68]. However, the CBT technique also suffers from some of the drawbacks associated with symbolic evaluation. Offutt et al. [179], [180] addressed these problems by proposing the Dynamic Domain Reduction technique.
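The three kill conditions used by CBT can be illustrated with a small sketch. The Python example below is not Godzilla and not Offutt's implementation: the function `original`, its mutant (which replaces `+` with `-`) and the integer search domain are all hypothetical, and a brute-force search stands in for symbolic evaluation and constraint solving. Each condition is written as a predicate over the inputs, and any input that satisfies all three kills the mutant.

```python
from itertools import product


def original(x, y):
    """Hypothetical program under test."""
    if x > 0:
        z = x + y          # statement targeted by the mutant
        return z > 10
    return False


def mutant(x, y):
    """Hypothetical mutant: '+' replaced by '-' in the targeted statement."""
    if x > 0:
        z = x - y
        return z > 10
    return False


def reachability(x, y):
    # The mutated statement must be executed.
    return x > 0


def necessity(x, y):
    # The program state right after the mutated statement must differ.
    return (x + y) != (x - y)


def sufficiency(x, y):
    # The difference in state must propagate to the program output.
    return ((x + y) > 10) != ((x - y) > 10)


def find_killing_input(domain):
    """Brute-force stand-in for constraint solving: return the first input
    pair that satisfies all three conditions, or None if there is none."""
    for x, y in product(domain, repeat=2):
        if reachability(x, y) and necessity(x, y) and sufficiency(x, y):
            return x, y
    return None


if __name__ == "__main__":
    found = find_killing_input(range(-3, 12))
    print("killing input:", found)
    print("outputs differ:", original(*found) != mutant(*found))
```

In this sketch the input (1, 1) reaches the mutated statement and changes the intermediate value of z, yet both versions return the same result, so it satisfies reachability and necessity but not sufficiency. This is why all three constraints have to be solved together.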

Baudry et al. proposed an approach to automatically generate test data for components implemented by contract [22]. In this research work, a testing-for-trust methodology was introduced to keep the consistency of the three component artefacts: specification, implementation and test data. Baudry et al. applied a genetic algorithm to generate test data. The generated test data is then considered as a predator which is used to validate the program and the contract at the same time. Experimental results showed that 75% of mutants can be killed using this test data generation technique.
Besides generating test data directly, Mutation Testing has also been applied to improve the quality of test data. Baudry et al. [21] proposed an approach to improve the quality of test data using Mutation Testing with a Bacteriological Algorithm. Smith and Williams applied Mutation Testing as guidance for test data augmentation [219]. Le Traon et al. [138] used mutation analysis to improve component contracts. Xie et al. [258] applied Mutation Testing to assist programmers in writing parametrised unit tests.
2) Regression testing: Test case prioritization techniques are one way to assist regression testing. Mutation Testing has been applied as a test case prioritization technique by Do and Rothermel [75], [76]. Do and Rothermel measured how quickly a test suite detects the mutants in the testing process. Testing sequences are rescheduled based on the rate of mutant killing. Empirical studies suggested that this automated test case prioritization can effectively improve the rate of fault detection of test suites [76].
Mutation Testing has also been used to assist the test case minimization process. Test case minimization techniques aim to reduce the size of a test set without losing much test effectiveness. Offutt et al. [173] proposed an approach named Ping-Pong. The main idea is to generate mutants targeting a test criterion. A subset of test data with the highest mutation score is then selected. Empirical studies show that Ping-Pong can reduce a mutation adequate test set by a mean of 33% without loss of test effectiveness.
In addition to the previously mentioned applications, mutation analysis has also been applied to other application domains. For example, Serrestou et al. proposed an approach to evaluate and improve the functional validation quality of RTL in a hardware environment [210], [211]. Mutation analysis has also been used to assist the evaluation of software clone detection tools [204], [205].

VI. EMPIRICAL EVALUATION

Empirical study is an important aspect in the evaluation and dissemination of any technique. In the following sections, the subject programs used in empirical studies are first summarised. Empirical results on the evaluation of Mutation Testing are then reported in detail.

A. Subject Programs
In order to investigate the empirical studies on Mutation Testing, we have collected all the subject programs for each empirical experiment from our repository, as shown in Table IX (Table IX is located at the end of the paper). Table IX shows the name, size, description, the year when the subject program was first applied and the overall number of research papers that report results for this subject program. The table entries for some sizes and descriptions of the subject programs are shown as “not reported”. This occurs where the information is unavailable in the literature. Table IX is sorted by the number of papers that use the subject program, so the first ten programs are the most studied subject programs in the literature on Mutation Testing. These widely studied programs are all laboratory programs under 50 LoC, but we also noticed that the 11th program is SPACE, a non-trivial real program.
To provide an overview of the trend of empirical studies on Mutation Testing to attack more challenging programs, we calculated the size of the largest subject program for each year. For each year on the horizontal axis, the data point in Figure 7 shows the size of the largest program considered in a mutation study up to that point in time. Clearly the definition of “program size” can be problematic, so the figure is merely intended to be used as a rough indicator. There is evidence to indicate that the size of the subject programs that can be handled by Mutation Testing is increasing. However, caution is required. We found that although some empirical experiments were reported to handle large programs, some studies applied only a few mutation operators. We also counted the number of newly introduced subject programs for each year. The results are shown in Figure 8. The dashed line in the figure is the cumulative view of the results. The number of newly used subject programs is gradually increasing, which suggests a growth in practical work.
In the empirical studies, it may be more indicative to use a real world program rather than a laboratory program. To understand the relationship between the use of laboratory programs and real world programs in mutation experiments, we have counted each type by year. The results are shown in Figure 9. In this study, we consider a real world program to be either an open source or an industry program. In Figure 9, the cumulative view shows that the number of real world programs started increasing in 1992, while the number of laboratory programs had already started increasing by 1988. Figure 9 also shows the number of laboratory and real programs introduced into studies each year as bars. This clearly indicates that, while there are currently more laboratory programs overall, since 2002, far more new real programs than laboratory programs have been introduced. This finding provides some evidence to support the claim that the development of Mutation Testing is maturing.
In our study, we found that for each research area of Mutation Testing there is a different set of subject programs used as benchmarks. In Table V we have summarised these benchmark programs. We chose five active research areas based on our studies: Coupling effect, Selective Mutation, Weak, Strong and Firm Mutation, Equivalent Mutant Detection and experiments supporting testing, including the use of mutation analysis to select, minimise, prioritise and generate test data.

B. Empirical Results
Many researchers have conducted experiments to evaluate the effectiveness of Mutation Testing [14], [50], [61], [93], [94], [160], [188], [248]. These experiments can be divided into two types: comparing mutation criteria with data flow criteria such as “all-use” and comparing mutants with real faults. Table VI summarises the evaluation type and the subject programs used in each of these experiments.
Mathur and Wong have conducted experiments to compare the “all-use” criterion with mutation criteria [160], [248], [251]. In their experiment, Mathur and Wong manually generated 30 sets

[Figure 7: plot of reported program sizes (vertical axis, 0 to 120000) against year (horizontal axis, 1977 to 2008), with an inset for 1977 to 1991 on a 0 to 250 scale.]
Fig. 7. The largest program applied for each year

[Figure 8: plot of the number of new subject programs (vertical axis, 0 to 180) against year (horizontal axis, 1977 to 2008). Series: Number of new programs; Cumulative view.]
Fig. 8. New programs applied for each year.

of test cases satisfying each criterion for each subject program. Empirical results suggested that mutation adequate test sets more easily satisfy the “all-use” criteria than “all-use” test sets satisfy mutation criteria. This result indicates that mutation criteria “probsubsumes”² the “all-use” criteria in general.
Offutt et al. conducted a similar experiment using ten different programs [188]. The ‘cross scoring’ result also provides evidence for Mathur and Wong’s probsubsumes relationship [160], [248]. In addition to comparing the two criteria with each other, Offutt et al. also compared the two criteria in terms of the fault detection rate. This result showed that 16% more faults can be detected using mutation adequate test sets than “all-use” test sets, indicating that mutation criteria is “probbetter”³ than the “all-use” data flow criterion. This conclusion also agreed with the results of the experiment of Frankl et al. [93], [94].
² If a test criterion C1 probsubsumes a test criterion C2, a test set which is adequate to C1 is likely to be adequate to C2 [188].
³ If a test criterion C1 is probbetter than a test criterion C2, then a randomly selected test set which satisfies C1 is more likely to detect a fault than a randomly selected test set which satisfies C2 [188].
In addition to comparing mutation analysis with other testing criteria, there have also been empirical studies comparing real faults and mutants. In the work of Daran and Thévenod-Fosse [50], the authors conducted an experiment comparing real software errors with 1st order mutants. The experiment used a safety-critical program from the civil nuclear field as the subject program with 12 real faults and 24 generated mutants. Empirical results suggested that 85% of the errors caused by mutants were also produced by real faults, thereby providing evidence for the Mutation Coupling Effect Hypothesis. This result also agreed with DeMillo and Mathur’s experiment [61]. DeMillo and Mathur carried out an extensive study of the errors in TeX reported by Knuth [61] and they demonstrated how simple mutants could detect real complex errors from TeX.
Andrews et al. [14] conducted an experiment comparing manually instrumented faults generated by experienced developers with

[Figure 9: plot of the number of programs (vertical axis, 0 to 100) against year (horizontal axis, 1977 to 2008). Series: Laboratory Programs; Real World Programs; Cumulative view for Laboratory Programs; Cumulative view for Real World Programs.]
Fig. 9. Laboratory programs vs. Real Programs

TABLE V
SUBJECT PROGRAMS BY APPLICATION

Application: Coupling Effect
Subject Programs: Triangle, Find, MID
Reference: [174], [175]

Application: Selective Mutation
Subject Programs: Triangle, Find, Bubble, MID, Calendar, Euclid, Quad, Insert, Warshall, Pat, Totinfo, Schedule1, Schedule2, TCAS, Printtok1, Printtok2, Space, Replace, Banker, Sort, Areasg, Minv, Rpcalc, Seqstr, Streql, Tretrvi, Append, Archive, Change, Ckglob, Cmp, Command, Compare, Compress, Dodash, Edit, Entab, Expand, Getcmd, Getdef, Getfn, Getfns, Getlist, Getnum, Getone, Gtext, Makepat, Omatch, Optpat, Spread, Subst, Translit, Unrotate
Reference: [19], [167], [168], [170], [182], [190]

Application: Weak, Strong, Firm Mutation
Subject Programs: Triangle, Find, Bubble, MID, Calendar, Euclid, Quad, Insert, Warshall, Pat, Gcd, Sort, Max index
Reference: [183], [184], [257]

Application: Equivalent Mutant
Subject Programs: Triangle, Find, Bubble, MID, Calendar, Euclid, Quad, Insert, Warshall, Pat, Bsearch, Max, Banker, Deadlock, Count, Dead
Reference: [178], [186], [187]

Application: Testing (test case generation, prioritization, selection and reduction)
Subject Programs: Triangle, Find, Bubble, MID, Calendar, Euclid, Quad, Insert, Warshall, Pat, Space, Bsearch, Totinfo, Schedule1, Schedule2, TCAS, Printtok1, Printtok2, Replace, Gcd, Binom, Ant, Stats Twenty-four, Conversions, Operators, Xml-Security, Jmeter, JTopas, ATM, BOOK, VirtualMeeting, MinMax, NextDate, Finance
Reference: [16], [67], [68], [75], [76], [114], [146], [173], [179], [180], [250]

mutants automatically generated by 4 carefully selected mutation operators. In the experiment, the Siemens suite (Printtokens, Printtokens2, Replace, Schedule, Schedule2, Tcas and Totinfo) and the Space program were used as subjects. Empirical results suggested that, after filtering out equivalent mutants, the remaining non-equivalent mutants generated from the selected mutation operators were a good indication of the fault detection ability of a test suite. The results also suggested that the human generated faults are different from the mutants; both human and auto-generated faults are needed for the detection of real faults.
Do and Rothermel [75], [76] studied the effect of both hand seeded faults and machine generated mutants on fault detection ability and the test prioritization order. In the test data prioritization study, Do and Rothermel considered several prioritization techniques to improve the fault detection rate. Their analysis showed that for non-control test case prioritization, the use of mutation can improve fault detection rates. However, the results are affected by the number of mutation faults applied. In the fault detection ability studies, Do and Rothermel followed Andrews et al.’s experimental procedure [14]. Results from 4 out of the 6 subject programs revealed a similar data spread to the work of Andrews et al. The effect of test set minimization using mutation

TABLE VI
EMPIRICAL EVALUATION OF MUTATION TESTING

Research                        Evaluation Type                Subject Programs
DeMillo and Mathur [61]         real faults vs mutants         TeX
Mathur and Wong [160], [248]    all-use vs mutation criteria   Find, Strmat1, Strmat2 and Textfmt
Offutt et al. [188]             all-use vs mutation criteria   Bub, Cal, Euclid, Find, Insert, Mid, Pat, Quad, Trityp and Warshall
Daran and Thévenod-Fosse [50]   real faults vs mutants         Nuclear Reactor Safety Shutdown System
Frankl et al. [93], [94]        all-use vs mutation criteria   Determinant, Find1, Find2, Matinv1, Matinv2, Strmatch1, Strmatch2, Textformat.r and Transpose
Andrews et al. [14]             hand seeded faults vs mutants  Space, Printtokens, Printtokens2, Replace, Schedule, Schedule2, Tcas and Totinfo
Do and Rothermel [75], [76]     hand seeded faults vs mutants  Ant, Xml-security, Jmeter, Jtopas, galileo and nanoxml

can be found in the work of Wong et al. [249].
In addition to evaluating Mutation Testing against other testing approaches, there are also experiments that use mutation analysis to evaluate different testing approaches. For example, Andrews et al. [15] conducted an experiment to compare test data generation using control flow and data flow. Thevenod et al. [229] applied mutation analysis to compare random and deterministic input generation techniques. Bradbury et al. [32] used mutation analysis to evaluate traditional testing and model checking approaches on concurrent programs.

VII. TOOLS FOR MUTATION TESTING

The development of Mutation Testing tools is an important enabler for the transformation of Mutation Testing from the laboratory into a practical and widely used testing technique. Without a fully automated mutation tool, Mutation Testing is unlikely to be accepted by industry. In this section, we summarise development work on Mutation Testing tools.
Since the idea of Mutation Testing was first proposed in the 1970s, many mutation tools have been built to support automated mutation analysis. In our study, we have collected information concerning 36 implemented mutation tools, including the academic tools reported in our repository as well as the tools from the open source and the industrial domains. Table VII summarises the application, publication time and any notable characteristics for each tool. The detailed description of the tools can be found in the references cited in the final column of the table.

TABLE VII
SUMMARY OF PUBLISHED MUTATION TESTING TOOLS

Name            Application        Year  Character                                                              Available     Reference
PIMS            Fortran            1977  General                                                                No            [36], [40], [145]
EXPER           Fortran            1979  General                                                                No            [3], [34], [39]
CMS.1           Cobol              1980  General                                                                No            [2], [108]
FMS.3           Fortran            1981  General                                                                No            [228]
Mothra          Fortran            1987  General                                                                Yes           [63], [64]
Proteum 1.4     C                  1993  Interface Mutation, Finite State Machines                              No            [52], [53]
TUMS            C                  1995  Mutant Schemata Generation                                             No            [235]–[237]
Insure++        C/C++              1998  Source Code Instrumentation (Commercial)                               Commercially  [198]
Proteum/IM 2.0  C                  2001  Interface Mutation, Finite State Machines                              Yes           [59]
Jester          Java               2001  General (Open Source)                                                  Yes           [163]
Pester          Python             2001  General (Open Source)                                                  Yes           [163]
TDS             CORBA IDL          2001  Interface Mutation                                                     No            [100]
Nester          C#                 2002  General (Open Source)                                                  Yes           [220]
JavaMut         Java               2002  General                                                                Yes           [45]
MuJava          Java               2004  Mutant Schemata, Reflection Technique                                  Yes           [151], [152], [185]
Plextest        C/C++              2005  General (Commercial)                                                   Commercially  [118]
SQLMutation     SQL                2006  General                                                                Yes           [233]
Certitude       C/C++              2006  General (Commercial)                                                   Commercially  [42]
SESAME          C, Lustre, Pascal  2006  Assembler Injection                                                    No            [49]
ExMAn           C, Java            2006  TXL                                                                    Yes           [30]
MUGAMMA         Java               2006  Remote Monitoring                                                      Yes           [126]
MuClipse        Java               2007  Weak Mutation, Mutant Schemata, Eclipse plug-in                        Yes           [218]
CSAW            C                  2007  Variable type optimization                                             Yes           [82], [83]
Heckle          Ruby               2007  General (Open Source)                                                  Yes           [206]
Jumble          Java               2007  General (Open Source)                                                  Yes           [221]
Testooj         Java               2007  General                                                                Yes           [200]
ESPT            C/C++              2008  Tabular                                                                Yes           [89]
MUFORMAT        C                  2008  Format String Bugs                                                     No            [214]
CREAM           C#                 2008  General                                                                No            [73]
MUSIC           SQL(JSP)           2008  Weak Mutation, SQL Vulnerabilities                                     No            [212]
MILU            C                  2008  Higher Order Mutation, Search-based technique, Test harness embedding  Yes           [123]
Javalanche      Java               2009  Invariant and Impact analysis                                          Yes           [106], [208]
GAmera          WS-BPEL            2009  Genetic algorithm                                                      Yes           [77]
MutateMe        PHP                2009  General (Open Source)                                                  Yes           [33]
AjMutator       AspectJ            2009  General                                                                Yes           [51]
JDAMA           SQL(JDBC)          2009  Byte code translation                                                  Yes           [264]

Figure 10 shows the growth in the number of tools introduced. In Figure 10, the development work can be classified into three stages. The first stage was from 1977 to 1981. In this early stage, in which the idea of Mutation Testing was first proposed, four prototype experimental mutation tools were built and used to support the establishment of the fundamental theory of mutation analysis, such as the Competent Programmer Hypothesis [3] and the Coupling Effect Hypothesis [66]. The second stage was from 1982 to 1999. There were four tools built in this period: three academic tools (MOTHRA for Fortran [63], [64], and PROTEUM and TUMS for C [52], [53], [236]) and one industry tool called INSURE++. Engineering effort had been put into MOTHRA and PROTEUM so that they were able to handle small real programs, not just laboratory programs. As a result, these two academic tools were widely used. Most of the advanced mutation techniques were experimented on using these two tools, for example, Weak Mutation [183], [184], Selective Mutation [182], [190], Mutant Sampling [159], [248] and Interface Mutation [54], [55]. The third stage of Mutation Testing development appears to have started from the turn of the new millennium, when the first mutation workshop was held. There have been 28 tools implemented since this time. In Figure 10, the dashed line shows a cumulative view of this development work. We can see that tool development has been increasing rapidly since the year 2000, indicating that research work on Mutation Testing remains active and increasingly practical.
In order to explore the impact of Mutation Testing within the open source and industrial domains, we have classified tools into three classes: academic, open source and industrial. Table VIII shows the number of each class over two periods; one is before the year 2000, the other is from the year 2000 to the present. As can be seen, there are more open source and industrial tools implemented recently, indicating that Mutation Testing has gradually become a practical testing technique, embraced by both the open source and industrial communities.

TABLE VIII
CLASSIFICATION OF MUTATION TESTING TOOLS

Stage         Overall Tools  Academic Tools  Open Source Tools  Commercial Tools
1975-1999     8              7               0                  1
2000-present  28             19              7                  2

VIII. EVIDENCE FOR THE INCREASING IMPORTANCE OF MUTATION TESTING

To understand the general trend for the Mutation Testing research area, we analysed the number of publications by year from 1977 to 2009. Consider again the results in Figure 1; there are five apparent outliers in years 1994, 2001, 2006, 2007 and 2009. The reason for the last four is that there were four Mutation Testing workshops held in 2000 (with proceedings published in 2001), 2006, 2007 and 2009. However, there is no direct evidence to explain the spike in year 2004; this just appears to be an anomalously productive year for Mutation Testing. The reader will also notice that 1986 is unique as no publications were found. An interesting explanation was provided by Offutt [176]: “1986 was when we were maximally devoted to programming Mothra.”
We performed a regression analysis on these data and found there is a strong positive correlation between year and the number of publications (r = 0.7858). In order to predict the trend of publications in the future, we have tried to find a trend line for this data using several common regression models: Linear, Logarithmic, Polynomial, Power, Exponential and Moving average. The dashed line in Figure 1 is the best fit line we found. It uses a quadratic model, which achieves the highest coefficient of determination (R^2 = 0.7747). To put the Mutation Testing growth trend into a wider context, we also collected and plotted the publication data from DBLP for the subject of computer science as a whole [232]. According to DBLP, the general growth in computer science is also exponential. From this analysis it is clear that Mutation Testing remains at least as healthy as computer science itself.
In order to take a closer look at the growing trend of the research work on Mutation Testing, we have classified this work into theoretical work and practical work. The theoretical category includes the publications concerning the hypotheses supporting Mutation Testing, optimization techniques, techniques for reducing computational cost, techniques for the detection of equivalent mutants, and surveys. The practical category includes publications on applications of Mutation Testing, development work on Mutation Testing tools and related empirical studies.
The goal of this separation of papers into theoretical and practical work is to allow us to analyse the temporal relationship between the development of theoretical and practical research effort by the community. Figure 11 shows the overall cumulative result. It is clear that both theoretical and practical work is increasing. In 2006, for the first time, the total number of practical publications surpassed the number of theoretical publications. To take a closer look at this relationship, Figure 12 shows the number of publications per year. From 1977 to 2000, there were fewer practical publications than theoretical. From 2000 to 2009, most of the research work appears to shift to the application area. This provides some evidence to suggest that the field is starting to move from foundational theory to practical application, possibly a sign of increasing maturity.
In the Redwine-Riddle maturation model [203], there is a trend that indicates that a technology takes about 15 to 20 years to

[Figure 10: plot of the number of developed tools (vertical axis, 0 to 35) against year (horizontal axis, 1977 to 2009). Series: Number of new tools; Cumulative view.]
Fig. 10. The number of tools introduced for each year

[Figure 11: plot of the number of publications (vertical axis, 0 to 225) against year (horizontal axis, 1977 to 2009). Series: Cumulative view for Theoretical Work; Cumulative view for Practical Work.]
Fig. 11. Theoretical Publications vs. Practical Publications (Cumulative view)

reach a level of maturity at which time industrial uptake takes place. Suppose we cast our attention back by 15 years to the mid 1990s. We reach a point where only approximately 25% of the current volume of output had then been published in the literature (see Figure 12). The ideas found in this early Mutation Testing literature have now been implemented in practical commercial Mutation Testing tools, as shown in Table VII. This observation suggests that the development of Mutation Testing is in line with Redwine and Riddle’s findings.
Furthermore, the set of Mutation Testing systems developed in the laboratory now provides tooling for a great many different programming language paradigms (as shown in Table VII). This provides further evidence of maturity and offers hope that, as these tools mature, following the Redwine and Riddle model, we can expect a future state-of-practice in which a wide range of popular programming paradigms will be covered by real world Mutation Testing tools.
Finally, an increasing level of maturity can also be seen in the development of the empirical studies reported on Mutation Testing. For example, there is a noticeable trend for empirical studies to involve more programs and to also involve bigger and more realistic programs, as can be seen in the chronological data on empirical studies presented in Figures 7 and 8. However, it should also be noted that more work is required on real world programs and that much of our empirical evidence still rests on studies of what would now be regarded as ‘toy programs’. There also appears to be an increasing degree of corroboration and replication of the results reported (see Table VI).

IX. DISCUSSION OF UNRESOLVED PROBLEMS, BARRIERS AND AREAS OF SUCCESS

This section discusses some of the findings and conclusions that can be drawn from this survey of the literature concerning the current state of Mutation Testing. Naturally, this account is, to

[Figure 12: plot of the number of publications per year (vertical axis, 0 to 35) against year (horizontal axis, 1977 to 2009). Series: Theoretical Work; Practical Work.]
Fig. 12. Theoretical Publications vs. Practical Publications

some extent, influenced by the authors’ own position on Mutation
Testing. However, we have attempted to take a step back and to
summarize unresolved problems, barriers and areas of success
in an objective manner, based on the available literature and the
trends we have found within it.

A. Unresolved Problems
One barrier to wider application of Mutation Testing centres
on the problems associated with Equivalent Mutants. As the
survey shows, there has been a sustained interest in techniques
for reducing the impact of equivalent mutants. This remains an
unresolved problem. We see several possible developments along
this line. Past work has concentrated on techniques to detect
equivalent mutants once they have been produced. In future,
Mutation Testing approaches may seek to avoid their initial
creation or to reduce their likelihood. Mutation Testing may be
applied to languages that do not have equivalent mutants. Where
equivalent mutants are a possibility, there will be a focus on
designing operators and analyzing code so that their likelihood
is reduced. Of course, we should be careful not to ‘throw the
baby out with the bath water’; we seek to retain the highly
valuable, so-called stubborn mutants, while filtering out those
that are equivalent. However, behaviourally these two classes of
mutants are highly similar.
Most work on Mutation Testing has been concerned with the
generation of mutants. Comparatively less work has concentrated
on the generation of test cases to kill mutants. Though there are
existing tools for mutant generation that are mature enough for
commercial application, there is currently no tool that offers test
case generation to kill mutants at a similar level of maturity. The
state of the art is therefore one in which Mutation Testing has
provided a way to assess the quality of test suites, but there has
been comparatively little work on improving the test suites, based
on the associated mutation analysis. We expect that, in future,
there will be much more work that seeks to use high quality
mutants as a basis for generating high quality test data. However,
at present, practical software test data generation for mutation test
adequacy remains an unresolved problem.

B. Barriers to be overcome
There remains a perception (perhaps misplaced, but nonetheless
widely held) that Mutation Testing is costly and impractical.
This remains a barrier to wider academic interest in the
subject and also to a wider uptake within industry. We hope that
this survey will go some way towards addressing the remaining
doubts of academics. There is plenty of evidence in this survey
to show that Mutation Testing is on the cusp of a rising trend
of maturity and that it is making a transition from academic to
industrial application.
The barriers to industrial uptake are more significant and will
take longer to fully overcome. The primary barriers appear to be
those that apply to many other emergent software technologies
as they make their transition from laboratory to wider practical
application. That is, there is a need for reliable tooling and for
compelling evidence to motivate the necessary investment of time
and money in such tooling.
As the survey shows, there is an increasingly practical trend in
empirical work. That is, as shown in Section VI, empirical studies
are increasingly focussing on non-trivial industrial subjects, rather
than laboratory programs. In order to provide a compelling
body of evidence, sufficient to overcome remaining practitioner
doubts, this trend will need to continue. There is also evidence
that Mutation Testing tools are starting to emerge as practical
commercial products (see Section VII). However, more tooling
is required to ensure widespread industrial uptake. Furthermore,
there is a pressing need to address the currently unresolved
problem of test case generation. An automated practical tool that
offered test case generation would be a compelling facilitator
for industrial uptake of Mutation Testing. No such tool currently
exists for test data generation, but recent developments in dynamic
symbolic execution [104], [209], [230] and search-based test data
generation [10], [135], [162] indicate that such a tool cannot be
far off. The Mutation Testing community will need to ensure that
it does not lag behind in this trend.

C. Areas of Success
As this paper has shown (see Figures 1, 3, 11, 12 and
Tables VII, VIII), work on Mutation Testing is growing at a rapid
rate and tools and techniques are reaching a level of maturity not
previously witnessed in this field. There has also been a great deal
of work to extend Mutation Testing to new languages and paradigms,
and to find new domains of application (see Figures 5, 7, 8, 9 and
Tables V, IX). Based on this existing success, we can expect that
the future will bring many more applications. There may shortly
be few widely-used programming languages to which Mutation
Testing has yet to be applied.
In all aspects of testing there is a trade-off to be arrived at
that balances the cost of test effort and the value of fault finding
ability; a classic tension between effort and effectiveness.
Traditionally, Mutation Testing has been seen to be a rather expensive
technique that offers high value. However, more recently, authors
have started to develop techniques that reduce costs, without
over-compromising on quality. This has led to successful techniques
for reducing mutation effort without significant reduction in test
effectiveness (as described in Section III).
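The trade-off between effort and effectiveness can be illustrated with a small sketch. The fragment below assumes the usual definition of mutation score (killed mutants divided by non-equivalent mutants) and uses random mutant sampling as one simple example of a cost reduction technique. The program, the four mutants, the test inputs and all function names are hypothetical.

```python
import random


def original(a, b):
    """Program under test."""
    return a + b


# Hypothetical first-order mutants of `original`.
MUTANTS = [
    lambda a, b: a - b,       # '+' replaced by '-'
    lambda a, b: a * b,       # '+' replaced by '*'
    lambda a, b: a + abs(b),  # abs() inserted around b
    lambda a, b: b + a,       # operands swapped: equivalent mutant
]


def killed(mutant, tests):
    """A mutant is killed if any test input makes its output differ."""
    return any(mutant(a, b) != original(a, b) for a, b in tests)


def mutation_score(mutants, tests, n_equivalent=0):
    """Killed mutants divided by the number of non-equivalent mutants.

    The count of equivalent mutants is supplied by hand here, since
    deciding equivalence automatically is the unresolved problem.
    """
    n_killed = sum(killed(m, tests) for m in mutants)
    return n_killed / (len(mutants) - n_equivalent)


def sampled_mutants(mutants, fraction, seed=0):
    """Random mutant sampling: keep only a fraction of the mutants."""
    k = max(1, round(fraction * len(mutants)))
    return random.Random(seed).sample(mutants, k)
```

With the single test input `(2, 2)` only the first mutant is killed, whereas adding `(1, -3)` kills every non-equivalent mutant. Sampling half of the mutants halves the number of mutant executions, at the risk of missing mutants that would have exposed a weakness in the test suite.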

X. CONCLUSION AND FUTURE WORK


This paper has provided a detailed survey and analysis of
trends and results on Mutation Testing. The paper covers theories,
optimization techniques, equivalent mutant detection, applications,
empirical studies and mutation tools. There has been much
optimization to reduce the cost of the Mutation Testing process.
From the data we collected from and about the Mutation Testing
literature, our analysis reveals an increasingly practical trend in
the subject.
We also found evidence that there is an increasing number
of new applications. There are more, larger and more realistic
programs that can be handled by Mutation Testing. Recent trends
also include the provision of new open source and industrial tools.
These findings provide evidence to support the claim that the field
of Mutation Testing is now reaching a mature state.
Recent work has tended to focus on more elaborate forms
of mutation, rather than on the relatively simple faults that have
been previously considered. There is an interest in the semantic effects
of mutation, rather than the syntactic achievement of a mutation.
This migration from the syntactic achievement of mutation to the
desired semantic effect has raised interest in higher order mutation
to generate subtle faults and to find those mutations that denote
real faults. We hope the future will see a further coming of age,
with the generation of more realistic mutants and the test cases
to kill them and with the provision of practical tooling to support
both.
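The distinction between the syntactic achievement of a mutation and its semantic effect can be sketched with a toy example. The fragment below is an illustration under stated assumptions, not an example from the surveyed studies: the predicate, the two first-order mutants and the small input domain are all hypothetical. It combines two first-order mutants into a second-order mutant and compares the sets of inputs that kill each of them.

```python
def original(x):
    """Program under test: is x strictly positive?"""
    return x > 0


def fom_negate(x):
    """First-order mutant: logical negation inserted."""
    return not (x > 0)


def fom_relop(x):
    """First-order mutant: '>' replaced by '<'."""
    return x < 0


def hom(x):
    """Second-order mutant: both changes applied together."""
    return not (x < 0)


def kill_set(mutant, domain):
    """Inputs in the domain on which the mutant differs from the original."""
    return {x for x in domain if mutant(x) != original(x)}


DOMAIN = range(-5, 6)  # eleven integer inputs, chosen arbitrarily
```

Over this domain the first mutant is killed by every input and the second by every input except 0, so almost any test kills them. The second-order mutant is killed only by the boundary input 0: two large syntactic changes combine into a small semantic one, which is the kind of subtle fault that motivates higher order mutation.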

ACKNOWLEDGEMENTS
The authors benefitted from many discussions with researchers
and practitioners in the Mutation Testing community, approximately
fifty of whom kindly provided very helpful comments
and feedback on an earlier draft of this analytical survey. We
are very grateful to these colleagues for their time and expertise
though we are not able to name them all individually. This work
is part funded by EPSRC grants EP/G060525, EP/F059442 and
EP/D050863 and by EU grant IST-33742. Yue Jia is additionally
supported by a grant from the ORSA scheme. The authors are
also grateful to Lorna Anderson and Kathy Harman for additional
proof reading.

TABLE IX: Programs used in Empirical Studies

Name Size Description First Use No. of Uses


Triangle 30 Loc Return the type of a triangle 1978 25
Find 30 Loc Partition the input array by order using input index 1988 22
Bubble 10 Loc Bubble sort algorithm 1988 18
MID 15 Loc Return the mid value of three integers 1989 16
Calendar/Days 30 Loc Compute number of days between input days 1988 15
Euclid 10 Loc Euclid’s algorithm to find the greatest common divisor of two integers 1991 15
Quad 10 Loc Find the root of a quadratic equation 1991 14
Insert 15 Loc Insertion sort algorithm 1991 13
Warshall 10 Loc Calculates the transitive closure of a Boolean matrix 1991 12
Pat 20 Loc Decide if a pattern is in a subject 1991 10
SPACE 6000 Loc European Space Agency program 1997 9
Bsearch 20 Loc Binary search on an integer array 1992 6
Totinfo 350 Loc Information measure 1998 6
Schedule1 300 Loc Priority scheduler 1998 6
Schedule2 300 Loc Priority scheduler 1998 6
TCAS 140 Loc Altitude separation 1998 6
Printtok1 400 Loc Lexical analyzer 1998 6
Printtok2 480 Loc Lexical analyzer 1998 6
Replace 510 Loc Pattern replacement 1998 6
Max 5 Loc Return the greater from the inputs 1978 4
STRMAT 20 Loc Search String based on input pattern 1993 4
TEXTFMT 30 Loc Text formatting program 1993 4
Banker 40 Loc Deadlock avoidance algorithm 1994 4
Cal 160 Loc Print a calendar for a specified year or month 1994 4
Checkeq 90 Loc Report missing or unbalanced delimiters and .EQ / .EN pairs 1994 4
Comm 145 Loc Select or reject lines common to two sorted files 1994 4
Look 135 Loc Find words in the system dictionary or lines in a sorted list 1994 4
Uniq 85 Loc Report or remove adjacent duplicate lines 1994 4
Gcd 55 Loc Compute greatest common divisor of an array 1988 3
Sort 20 Loc Sort algorithm for an array 1988 3
Binom 6 Func Solves binomial equation 1994 3
Col 275 Loc Filter reverse paper motions from nroff output for display on a terminal 1994 3
Sort(Linux) 842 Loc Sort and merge files 1994 3
Spline 289 Loc Interpolate smooth curve based on given data 1994 3
Tr 100 Loc Translate characters 1994 3
Ant 21,000 Loc A build tool from Apache 2002 3
Determinant 60 Loc Matrix manipulation programs based on LU decomposition 1994 2
Matinv 30 Loc Matrix manipulation programs based on LU decomposition 1994 2
Transpose 80 Loc Transpose routine of a sparse-matrix package 1994 2
Deadlock 50 Loc Check for deadlock 1994 2
Stats 4 Func Not reported 1994 2
Twenty-four 2 Func Not reported 1994 2
Conversions 8 Func Not reported 1994 2
Operators 4 Func Not reported 1994 2
Crypt 120 Loc Encrypt and decrypt a file using a user supplied password 1994 2
Bisect 20 Loc Not reported 1996 2
NewTon 15 Loc Not reported 1996 2
MRCS Not reported Mars Robot Communication System 2004 2
Xml-Security 143 Class Implements security XML 2005 2
Jmeter 389 Class A Java desktop application designed to load test functional behavior and measure performance 2005 2
JTopas 50 Class A Java library used for parsing text data 2005 2
ATM 5500 Loc The ATM component are ValidatePin 2005 2
Tetris Not reported AspectJ benchmark 2006 2
Max index 15 Loc Find the max value in the input array 1988 1
NASA’s planetary lander control software Not reported NASA’s planetary lander control software 1992 1
QCK Not reported Non-recursive integer quicksort 1992 1
Gold Version G 2000 Loc A battle simulation software 1992 1
Count 10 Loc Not reported 1994 1
Dead 10 Loc Not reported 1994 1
TCAS Not reported Aircraft collision avoidance system 1994 1
STU 15 Func A part of a nuclear reactor safety shutdown system that periodically scans the position of the reactor’s control rods 1996 1
DIV/MOD Not reported Not reported 1996 1
EBC 10 Loc Not reported 1996 1
Search 14 Nod Not reported 1997 1
Secant 9 Nod Not reported 1997 1
State chart of Citizen watch Not reported State chart of Citizen watch 1999 1
Queue Not reported ADS class library 1999 1
Dequeue Not reported ADS class library, double-ended queue 1999 1
PriorityQueue Not reported ADS class library, priority queue 1999 1
Areasg 50 Loc Calculates the areas of the segments formed by a rectangle inscribed in a circle 1999 1
Minv 44 Loc Computes the inverse of the square N by N matrix A 1999 1
Rpcalc 55 Loc Calculates the value of a reverse polish expression using a stack 1999 1
Seqstr 70 Loc Locates sequences of integers within an input array and copies them to an output array 1999 1
Streql 45 Loc Compares two strings after replacing consecutive white space characters with a single space 1999 1
Tretrv 55 Loc Performs an in-order traversal of a binary tree of integers to produce a sequence of integers 1999 1
Alternating-bit protocol Not reported Estelle specification Alternating-bit protocol 2000 1
Append 15 Loc A component of a text editor 2001 1
Archive 15 Loc A component of a text editor 2001 1
Change 15 Loc A component of a text editor 2001 1
Ckglob 25 Loc A component of a text editor 2001 1
Cmp 15 Loc A component of a text editor 2001 1
Command 70 Loc A component of a text editor 2001 1
Compare 20 Loc A component of a text editor 2001 1
Compress 15 Loc A component of a text editor 2001 1
Dodash 15 Loc A component of a text editor 2001 1
Edit 25 Loc A component of a text editor 2001 1
Entab 20 Loc A component of a text editor 2001 1
Expand 15 Loc A component of a text editor 2001 1
Getcmd 30 Loc A component of a text editor 2001 1
Getdef 30 Loc A component of a text editor 2001 1
Getfn 10 Loc A component of a text editor 2001 1
Getfns 25 Loc A component of a text editor 2001 1
Getlist 20 Loc A component of a text editor 2001 1
Getnum 20 Loc A component of a text editor 2001 1
Getone 25 Loc A component of a text editor 2001 1
Gtext 15 Loc A component of a text editor 2001 1
Makepat 30 Loc A component of a text editor 2001 1
Omatch 35 Loc A component of a text editor 2001 1
Optpat 15 Loc A component of a text editor 2001 1
Spread 20 Loc A component of a text editor 2001 1
Subst 35 Loc A component of a text editor 2001 1
Translit 35 Loc A component of a text editor 2001 1
Unrotate 30 Loc A component of a text editor 2001 1
LogServiceProvider 230 Loc An abstract class which is extended by classes providing logging services 2001 1
Print Writer Log Service Provider 85 Loc Used for writing textual log messages to a print stream (for example, to the console) 2001 1
Logger 170 Loc Provides the central control for the PSK logging service such as registering multiple log service providers to be operative concurrently 2001 1
LogMessage 150 Loc A Message format to be logged by the logging service 2001 1
LogException 55 Loc Base exception class for exceptions thrown by the logger and log service providers 2001 1
Junit 1,500 Loc A unit testing framework 2002 1
GraphPath 150 Loc Finds the shortest path and distance between specified nodes in a directed graph 2002 1
Paint 330 Loc Calculates the amount of paint needed to paint a house 2002 1
MazeGame 1,600 Loc A game that involves finding and rescuing a hostage in a maze 2002 1
Specification of electronic purse Specification of electronic purse 2003 1
Parking Garage system 12 Class Java 2004 1
Video shop manager 17 Class Java 2004 1
EJB Trading Not reported An EJB trading Component 2004 1
RSDIMU Not reported The application was part of the navigation system in an aircraft or spacecraft 2005 1
Roots Not reported Determines whether a quadratic equation has real roots or not 2005 1
Calculate Not reported Calculates sum, product and average of the inputs 2005 1
BAMean Not reported Calculates mean of the input and both averages of numbers below and above mean 2005 1
SCMSA Not reported Application defined by the Web Services Interoperability Organization 2005 1
BOOK 250 Loc An application between the diagnosis accuracy and the DBB sizes 2006 1
VirtualMeeting 1500 Loc A server that simulates business meetings over network 2006 1
Nunit 20,000 Loc A .NET unit test application 2006 1
Nhibernate 100,000 Loc Library for object-relational mapping dedicated for .NET 2006 1
Nant 80,000 Loc .NET build tool 2006 1
System.XML 100,000 Loc The Mono class libraries 2006 1
Assign value Not reported A safety-critical software component of the DARTs 2006 1
Vending Machine 50L Loc A vending machine example 2006 1
Sudoku 3360 Loc A puzzle board game 2006 1
Polynomial Solver 450 Loc A Polynomial solver 2006 1
MinMax 10 Loc Return the maximum and minimum elements of an integer array 2006 1
Field 65 Loc org.apache.bcel.classfile 2006 1
BranchHandle 80 Loc org.apache.bcel.generic 2006 1
String Representation 190 Loc org.apache.bcel.verifier.statics 2006 1
Pass2Verifier 1000 Loc org.apache.bcel.verifier.statics 2006 1
ConstantPoolGen 405 Loc org.apache.bcel.generic 2006 1
LocalVariable 145 Loc org.apache.bcel.classfile 2006 1
ClassPath 250 Loc org.apache.bcel.util 2006 1
InstructionList 560 Loc org.apache.bcel.generic 2006 1
JavaClass 465 Loc org.apache.bcel.classfile 2006 1
CodeExceptionGen 120 Loc org.apache.bcel.generic 2006 1
LocalVariables 95 Loc org.apache.bcel.structurals 2006 1
NextDate 70 Loc Determines the date of the next input day 2007 1
TicketsOrderSim 75 Loc A simulation program in which agents sell airline tickets 2007 1
LinkedList 300 Loc A program that has two threads adding elements to a shared linked list 2007 1
BufWriter 213 Loc A simulation program that contains a number of threads that write to a buffer and one thread that reads from the buffer 2007 1
AccountProgram 145 Loc A banking simulation program where threads are responsible for managing accounts 2007 1
Finance 5500 Loc Reuses interfaces provided by an open source Java library, MoneyJar.jar 2007 1
iTrust 2630 Loc A web-based healthcare application 2007 1
Bean Not reported AspectJ benchmark suites 2008 1
NullCheck Not reported AspectJ benchmark suites 2008 1
Cona-sim Not reported AspectJ benchmark suites 2008 1
Spring.NET 100,000 Loc An environment for programs execution 2008 1
Castle.DynamicProxy 6,600 Loc A library for implementation of the Proxy design pattern 2008 1
Castle.Core 6,200 Loc Comprises the basic classes used in Castle projects 2008 1
Castle.ActiveRecord 21,000 Loc Implements the ActiveRecord design pattern 2008 1
Adapdev 68,000 Loc Extends the standard library of the .NET environment 2008 1
Ncover 4,300 Loc A tool for the quality analysis of the source code in .NET programs 2008 1
CruiseControl 31,300 Loc A server supporting a continuous integration of .NET programs 2008 1
Pprotection 220 Loc Password Protection controls a reserved area 2008 1
Hhorse MP3 170 Loc Manages MP3 audio files 2008 1
PHPP.Protect 1,300 Loc Protects files 2008 1
AmyQ 200 Loc Controls a FAQ system 2008 1
EasyPassword 490 Loc Manages passwords 2008 1
Show Pictures 1140 Loc A mini Web portal 2008 1
Administrator 1400 Loc Controls and administers reserved area 2008 1
Cmail 720 Loc Sends email 2008 1
Workflow 7500 Loc Manages a workflow system 2008 1

REFERENCES

[1] R. Abraham and M. Erwig, “Mutation Operators for Spreadsheets,” IEEE Transactions on Software Engineering, vol. 35, no. 1, pp. 94–108, January-February 2009.
[2] A. T. Acree, “On Mutation,” PhD Thesis, Georgia Institute of Technology, Atlanta, Georgia, 1980.
[3] A. T. Acree, T. A. Budd, R. A. DeMillo, R. J. Lipton, and F. G. Sayward, “Mutation Analysis,” Georgia Institute of Technology, Atlanta, Georgia, Technical Report GIT-ICS-79/08, 1979.
[4] K. Adamopoulos, “Search Based Test Selection and Tailored Mutation,” Masters Thesis, King’s College London, London, UK, 2009.
[5] K. Adamopoulos, M. Harman, and R. M. Hierons, “How to Overcome the Equivalent Mutant Problem and Achieve Tailored Selective Mutation Using Co-evolution,” in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO’04), ser. LNCS, vol. 3103. Seattle, Washington, USA: Springer, 26-30 June 2004, pp. 1338–1349.
[6] H. Agrawal, R. A. DeMillo, B. Hathaway, W. Hsu, W. Hsu, E. W. Krauser, R. J. Martin, A. P. Mathur, and E. Spafford, “Design of Mutant Operators for the C Programming Language,” Purdue University, West Lafayette, Indiana, Technical Report SERC-TR-41-P, March 1989.
[7] B. K. Aichernig, “Mutation Testing in the Refinement Calculus,” Formal Aspects of Computing, vol. 15, no. 2-3, pp. 280–295, November 2003.
[8] B. K. Aichernig and C. C. Delgado, “From Faults Via Test Purposes to Test Cases: On the Fault-Based Testing of Concurrent Systems,” in Proceedings of the 9th International Conference on Fundamental Approaches to Software Engineering (FASE’06), ser. LNCS, vol. 3922. Vienna, Austria: Springer, 27-28 March 2006, pp. 324–338.
[9] R. T. Alexander, J. M. Bieman, S. Ghosh, and B. Ji, “Mutation of Java Objects,” in Proceedings of the 13th International Symposium on Software Reliability Engineering (ISSRE’02). Annapolis, Maryland: IEEE Computer Society, 12-15 November 2002, pp. 341–351.
[10] S. Ali, L. C. Briand, H. Hemmati, and R. K. Panesar-Walawege, “A Systematic Review of the Application and Empirical Investigation of Search-Based Test-Case Generation,” IEEE Transactions on Software Engineering, To appear.
[11] P. Ammann and J. Offutt, Introduction to Software Testing. Cambridge University Press, 2008.
[12] P. Anbalagan and T. Xie, “Efficient Mutant Generation for Mutation Testing of Pointcuts in Aspect-Oriented Programs,” in Proceedings of the 2nd Workshop on Mutation Analysis (MUTATION’06). Raleigh, North Carolina: IEEE Computer Society, November 2006, p. 3.
[13] P. Anbalagan and T. Xie, “Automated Generation of Pointcut Mutants for Testing Pointcuts in AspectJ Programs,” in Proceedings of the 19th International Symposium on Software Reliability Engineering (ISSRE’08). Redmond, Washington: IEEE Computer Society, 11-14 November 2008, pp. 239–248.
[14] J. H. Andrews, L. C. Briand, and Y. Labiche, “Is Mutation an Appropriate Tool for Testing Experiments?” in Proceedings of the 27th International Conference on Software Engineering (ICSE’05), St Louis, Missouri, 15-21 May 2005, pp. 402–411.
[15] J. H. Andrews, L. C. Briand, Y. Labiche, and A. S. Namin, “Using Mutation Analysis for Assessing and Comparing Testing Coverage Criteria,” IEEE Transactions on Software Engineering, vol. 32, no. 8, pp. 608–624, August 2006.
[16] K. Ayari, S. Bouktif, and G. Antoniol, “Automatic Mutation Test Input Data Generation via Ant Colony,” in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO’07), London, England, 7-11 July 2007, pp. 1074–1081.
[17] J. S. Baekken and R. T. Alexander, “A Candidate Fault Model for AspectJ Pointcuts,” in Proceedings of the 17th International Symposium on Software Reliability Engineering (ISSRE’06). Raleigh, North Carolina: IEEE Computer Society, 7-10 November 2006, pp. 169–178.
[18] D. Baldwin and F. G. Sayward, “Heuristics for Determining Equivalence of Program Mutations,” Yale University, New Haven, Connecticut, Research Report 276, 1979.
[19] E. F. Barbosa, J. C. Maldonado, and A. M. R. Vincenzi, “Toward the determination of sufficient mutant operators for C,” Software Testing, Verification and Reliability, vol. 11, no. 2, pp. 113–136, May 2001.
[20] S. S. Batth, E. R. Vieira, A. R. Cavalli, and M. U. Uyar, “Specification of Timed EFSM Fault Models in SDL,” in Proceedings of the 27th IFIP WG 6.1 International Conference on Formal Techniques for Networked and Distributed Systems (FORTE’07), ser. LNCS, vol. 4574. Tallinn, Estonia: Springer, 26-29 June 2007, pp. 50–65.
[21] B. Baudry, F. Fleurey, J.-M. Jézéquel, and Y. Le Traon, “Genes and Bacteria for Automatic Test Cases Optimization in the .NET Environment,” in Proceedings of the 13th International Symposium on Software Reliability Engineering (ISSRE’02), Annapolis, Maryland, 12-15 November 2002, pp. 195–206.
[22] B. Baudry, V. Le Hanh, J.-M. Jézéquel, and Y. Le Traon, “Trustable Components: Yet Another Mutation-Based Approach,” in Proceedings of the 1st Workshop on Mutation Analysis (MUTATION’00), published in book form, as Mutation Testing for the New Century. San Jose, California, 6-7 October 2001, pp. 47–54.
[23] F. Belli, C. J. Budnik, and W. E. Wong, “Basic Operations for Generating Behavioral Mutants,” in Proceedings of the 2nd Workshop on Mutation Analysis (MUTATION’06). Raleigh, North Carolina: IEEE Computer Society, 2006, p. 9.
[24] J. Bieman, S. Ghosh, and R. T. Alexander, “A Technique for Mutation of Java Objects,” in Proceedings of the 16th IEEE International Conference on Automated Software Engineering (ASE’01), San Diego, California, 26-29 November 2001, p. 337.
[25] P. E. Black, V. Okun, and Y. Yesha, “Mutation of Model Checker Specifications for Test Generation and Evaluation,” in Proceedings of the 1st Workshop on Mutation Analysis (MUTATION’00), published in book form, as Mutation Testing for the New Century. San Jose, California, 6-7 October 2001, pp. 14–20.
[26] B. Bogacki and B. Walter, “Evaluation of Test Code Quality with Aspect-Oriented Mutations,” in Proceedings of the 7th International Conference on eXtreme Programming and Agile Processes in Software Engineering (XP’06), ser. LNCS, vol. 4044, Oulu, 17-22 June 2006, pp. 202–204.
[27] B. Bogacki and B. Walter, “Aspect-oriented Response Injection: an Alternative to Classical Mutation Testing,” in Software Engineering Techniques: Design for Quality, ser. IFIP, vol. 227, 2007, pp. 273–282.
[28] N. Bombieri, F. Fummi, and G. Pravadelli, “A Mutation Model for the SystemC TLM2.0 Communication Interfaces,” in Proceedings of the Conference on Design, Automation and Test in Europe (DATE’08), Munich, Germany, 10-14 March 2008, pp. 396–401.
[29] J. H. Bowser, “Reference Manual for Ada Mutant Operators,” Georgia Institute of Technology, Atlanta, Georgia, Technical Report GIT-SERC-88/02, 1988.
[30] J. S. Bradbury, J. R. Cordy, and J. Dingel, “ExMAn: A Generic and Customizable Framework for Experimental Mutation Analysis,” in Proceedings of the 2nd Workshop on Mutation Analysis (MUTATION’06). Raleigh, North Carolina: IEEE Computer Society, November 2006, pp. 57–62.
[31] J. S. Bradbury, J. R. Cordy, and J. Dingel, “Mutation Operators for Concurrent Java (J2SE 5.0),” in Proceedings of the 2nd Workshop on Mutation Analysis (MUTATION’06). Raleigh, North Carolina: IEEE Computer Society, November 2006, pp. 83–92.
[32] J. S. Bradbury, J. R. Cordy, and J. Dingel, “Comparative Assessment of Testing and Model Checking Using Program Mutation,” in Proceedings of the 3rd Workshop on Mutation Analysis (MUTATION’07), published with Proceedings of the 2nd Testing: Academic and Industrial Conference Practice and Research Techniques (TAIC PART’07). Windsor, UK: IEEE Computer Society, 2007, pp. 210–222.
[33] P. Brady, “MutateMe,” http://github.com/padraic/mutateme/tree/master, 2007.
[34] T. A. Budd, “Mutation Analysis of Program Test Data,” PhD Thesis, Yale University, New Haven, Connecticut, 1980.
[35] T. A. Budd and D. Angluin, “Two Notions of Correctness and Their Relation to Testing,” Acta Informatica, vol. 18, no. 1, pp. 31–45, March 1982.
[36] T. A. Budd, R. A. DeMillo, R. J. Lipton, and F. G. Sayward, “The Design of a Prototype Mutation System for Program Testing,” in Proceedings of the AFIPS National Computer Conference, vol. 74. Anaheim, New Jersey: ACM, 5-8 June 1978, pp. 623–627.
[37] T. A. Budd, R. A. DeMillo, R. J. Lipton, and F. G. Sayward, “Theoretical and Empirical Studies on Using Program Mutation to Test the Functional Correctness of Programs,” in Proceedings of the 7th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’80), Las Vegas, Nevada, 28-30 January 1980, pp. 220–233.
[38] T. A. Budd and A. S. Gopal, “Program Testing by Specification Mutation,” Computer Languages, vol. 10, no. 1, pp. 63–73, 1985.
[39] T. A. Budd, R. Hess, and F. G. Sayward, “EXPER Implementor’s Guide,” Yale University, New Haven, Connecticut, Technical Report, 1980.
[40] T. A. Budd and F. G. Sayward, “Users Guide to the Pilot Mutation System,” Yale University, New Haven, Connecticut, Technical Report 114, 1977.
[41] R. H. Carver, “Mutation-Based Testing of Concurrent Programs,” in Proceedings of the IEEE International Test Conference on Designing, Testing, and Diagnostics, Baltimore, Maryland, 17-21 October 1993, pp. 845–853.
[42] Certess, “Certitude,” http://www.certess.com/product/, 2006.
[43] W. K. Chan, S. C. Cheung, and T. H. Tse, “Fault-Based Testing of Database Application Programs with Conceptual Data Model,” in Proceedings of the 5th International Conference on Quality Software (QSIC’05), Melbourne, Australia, 19-20 September 2005, pp. 187–196.
[44] P. Chevalley, “Applying Mutation Analysis for Object-oriented Programs Using a Reflective Approach,” in Proceedings of the 8th Asia-Pacific Software Engineering Conference (APSEC 01), Macau, China, 4-7 December 2001, p. 267.
[45] P. Chevalley and P. Thévenod-Fosse, “A Mutation Analysis Tool for Java Programs,” International Journal on Software Tools for Technology Transfer, vol. 5, no. 1, pp. 90–103, November 2002.
[46] B. Choi, “Software Testing Using High Performance Computers,” PhD Thesis, Purdue University, West Lafayette, Indiana, July 1991.
[47] B. Choi and A. P. Mathur, “High-performance Mutation Testing,” Journal of Systems and Software, vol. 20, no. 2, pp. 135–152, February 1993.
[48] W. M. Craft, “Detecting Equivalent Mutants Using Compiler Optimization Techniques,” Masters Thesis, Clemson University, Clemson, South Carolina, September 1989.
[49] Y. Crouzet, H. Waeselynck, B. Lussier, and D. Powell, “The SESAME Experience: from Assembly Languages to Declarative Models,” in Proceedings of the 2nd Workshop on Mutation Analysis (MUTATION’06). Raleigh, North Carolina: IEEE Computer Society, November 2006, p. 7.
[50] M. Daran and P. Thévenod-Fosse, “Software Error Analysis: A Real Case Study Involving Real Faults and Mutations,” ACM SIGSOFT Software Engineering Notes, vol. 21, no. 3, pp. 158–177, May 1996.
[51] R. Delamare, B. Baudry, and Y. Le Traon, “AjMutator: A Tool For The Mutation Analysis Of AspectJ Pointcut Descriptors,” in Proceedings of the 4th International Workshop on Mutation Analysis (MUTATION’09), published with Proceedings of the 2nd International Conference on Software Testing, Verification, and Validation Workshops. Denver, Colorado: IEEE Computer Society, 1-4 April 2009, pp. 200–204.
[52] M. E. Delamaro, “Proteum - A Mutation Analysis Based Testing Environment,” Masters Thesis, University of São Paulo, São Paulo, Brazil, 1993.
[53] M. E. Delamaro and J. C. Maldonado, “Proteum - A Tool for the Assessment of Test Adequacy for C Programs,” in Proceedings of the Conference on Performability in Computing Systems (PCS’96), New Brunswick, New Jersey, July 1996, pp. 79–95.
[62] R. A. DeMillo, “Test Adequacy and Program Mutation,” in Proceedings of the 11th International Conference on Software Engineering (ICSE’89), Pittsburgh, Pennsylvania, 15-18 May 1989, pp. 355–356.
[63] R. A. DeMillo, D. S. Guindi, K. N. King, and W. M. McCracken, “An Overview of the Mothra Software Testing Environment,” Purdue University, West Lafayette, Indiana, Technical Report SERC-TR-3-P, 1987.
[64] R. A. DeMillo, D. S. Guindi, K. N. King, W. M. McCracken, and A. J. Offutt, “An Extended Overview of the Mothra Software Testing Environment,” in Proceedings of the 2nd Workshop on Software Testing, Verification, and Analysis (TVA’88). Banff, Alberta, Canada: IEEE Computer Society, July 1988, pp. 142–151.
[65] R. A. DeMillo, E. W. Krauser, and A. P. Mathur, “Compiler-Integrated Program Mutation,” in Proceedings of the 5th Annual Computer Software and Applications Conference (COMPSAC’91). Tokyo, Japan: IEEE Computer Society Press, September 1991, pp. 351–356.
[66] R. A. DeMillo, R. J. Lipton, and F. G. Sayward, “Hints on Test Data Selection: Help for the Practicing Programmer,” Computer, vol. 11, no. 4, pp. 34–41, April 1978.
[67] R. A. DeMillo and A. J. Offutt, “Constraint-Based Automatic Test Data Generation,” IEEE Transactions on Software Engineering, vol. 17, no. 9, pp. 900–910, September 1991.
[68] R. A. DeMillo and A. J. Offutt, “Experimental Results From an Automatic Test Case Generator,” ACM Transactions on Software Engineering and Methodology, vol. 2, no. 2, pp. 109–127, April 1993.
[69] A. Derezińska, “Object-oriented Mutation to Assess the Quality of Tests,” in Proceedings of the 29th Euromicro Conference, Belek, Turkey, 1-6 September 2003, pp. 417–420.
[70] A. Derezińska, “Advanced Mutation Operators Applicable in C# Programs,” Warsaw University of Technology, Warszawa, Poland, Technical Report, 2005.
[71] A. Derezińska, “Quality Assessment of Mutation Operators Dedicated for C# Programs,” in Proceedings of the 6th International Conference on Quality Software (QSIC’06), Beijing, China, 27-28 October 2006.
[72] A. Derezińska and A. Szustek, “CREAM - A System for Object-Oriented Mutation of C# Programs,” Warsaw University of Technology, Warszawa, Poland, Technical Report, 2007.
[73] A. Derezińska and A. Szustek, “Tool-Supported Advanced Mutation Approach for Verification of C# Programs,” in Proceedings of the 3rd International Conference on Dependability of Computer Systems (DepCoS-RELCOMEX’08), Szklarska Poręba, Poland, 26-28 June 2008, pp. 261–268.
[74] W. Ding, “Using Mutation to Generate Tests from Specifications,” Masters Thesis, George Mason University, Fairfax, VA, 2000.
[75] H. Do and G. Rothermel, “A Controlled Experiment Assessing Test Case Prioritization Techniques via Mutation Faults,” in Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM’05), Budapest, Hungary, 25-30 September 2005, pp. 411–420.
[54] M. E. Delamaro and J. C. Maldonado, “Interface Mutation: Assessing [76] H. Do and G. Rothermel, “On the Use of Mutation Faults in Empirical
Testing Quality at Interprocedural Level,” in Proceedings of the 19th Assessments of Test Case Prioritization Techniques,” IEEE Transac-
International Conference of the Chilean Computer Science Society tions on Software Engineering, vol. 32, no. 9, pp. 733–752, September
(SCCC’99), Talca, Chile, 11-13 November 1999, pp. 78–86. 2006.
[55] M. E. Delamaro, J. C. Maldonado, and A. P. Mathur, “Integration [77] J. J. Domı́nguez-Jiménez, A. Estero-Botaro, and I. Medina-Bulo, “A
Testing Using Interface Mutation,” in Proceedings of the seventh Framework for Mutant Genetic Generation for WS-BPEL,” in Proceed-
International Symposium on Software Reliability Engineering (ISSRE ings of the 35th Conference on Current Trends in Theory and Practice
’96), White Plains, New York, 30 October - 02 November 1996, pp. of Computer Science, ser. LNCS, vol. 5404. Spindleruv Mlyn, Czech
112–121. Republic: Springer, January 2009, pp. 229 – 240.
[56] M. E. Delamaro, J. C. Maldonado, and A. P. Mathur, “Interface [78] W. Du and A. P. Mathur, “Vulnerability Testing of Software System
Mutation: An Approach for Integration Testing,” IEEE Transactions Using Fault Injection,” Purdue University, West Lafayette, Indiana,
on Software Engineering, vol. 27, no. 3, pp. 228–247, May 2001. Technique Report COAST TR 98-02, 1998.
[57] M. E. Delamaro, J. C. Maldonado, A. Pasquini, and A. P. Mathur, “In- [79] W. Du and A. P. Mathur, “Testing for Software Vulnerability Using
terface Mutation Test Adequacy Criterion: An Empirical Evaluation,” Environment Perturbation,” in Proceeding of the International Confer-
State University of Maringá, Parana, Brasil, Technique Report, 2000. ence on Dependable Systems and Networks (DSN’00), New York, NY,
[58] M. E. Delamaro, J. C. Maldonado, A. Pasquini, and A. P. Mathur, “In- 25-28 June 2000, pp. 603–612.
terface Mutation Test Adequacy Criterion: An Empirical Evaluation,” [80] L. du Bousquet and M. Delaunay, “Mutation Analysis for Lustre pro-
Empirical Software Engineering, vol. 6, no. 2, pp. 111–142, June 2001. grams: Fault Model Description and Validation,” in Proceedings of the
[59] M. E. Delamaro, J. C. Maldonado, and A. Vincenzi, “Proteum/IM 2.0: 3rd Workshop on Mutation Analysis (MUTATION’07), published with
An Integrated Mutation Testing Environment,” in Proceedings of the 1st Proceedings of the 2nd Testing: Academic and Industrial Conference
Workshop on Mutation Analysis (MUTATION’00), published in book Practice and Research Techniques (TAIC PART’07). Windsor, UK:
form, as Mutation Testing for the New Century. San Jose, California, IEEE Computer Society, 10-14 September 2007, pp. 176–184.
6-7 October 2001, pp. 91–101. [81] L. du Bousquet and M. Delaunay, “Using Mutation Analysis to
[60] R. A. DeMillo, “Program Mutation: An Approach to Software Testing,” Evaluate Test Generation Strategies in a Synchronous Context,” in
Georgia Institute of Technology, Technical Report, 1983. Proceedings of the 2nd International Conference on Software Engi-
[61] R. A. DeMillo and A. P. Mathur, “On the Use of Software Artifacts neering Advances (ICSEA’07), Cap Esterel, French Riviera, France,
to Evaluate the Effectiveness of Mutation Analysis in Detecting Errors 25-31 August 2007, p. 40.
in Production Software,” Purdue University, West Lafayette, Indiana, [82] Ellims, “Csaw,” http://www.skicambridge.com/papers/Csaw v1 files.html,
Technique Report SERC-TR-92-P, 1992. 2007.
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 27

[83] M. Ellims, D. C. Ince, and M. Petre, “The Csaw C Mutation Tool: International Conference on Technology of Object-Oriented Languages
Initial Results,” in Proceedings of the 3rd Workshop on Mutation and Systems (TOOLS’00), Santa Barbara, California, 30 July - 4 August
Analysis (MUTATION’07), published with Proceedings of the 2nd 2000, p. 37.
Testing: Academic and Industrial Conference Practice and Research [102] S. Ghosh and A. P. Mathur, “Interface Mutation,” Software Testing,
Techniques (TAIC PART’07). Windsor, UK: IEEE Computer Society, Verification and Reliability, vol. 11, no. 3, pp. 227–247, March 2001.
10-14 September 2007, pp. 185–192. [103] M. R. Girgis and M. R. Woodward, “An Integrated System for Program
[84] A. Estero-Botaro, F. Palomo-Lozano, and I. Medina-Bulo, “Mutation Testing Using Weak Mutation and Data Flow Analysis,” in Proceed-
operators for WS-BPEL 2.0,” in Proceedings of the 21th International ings of the 8th International Conference on Software Engineering
Conference on Software and Systems Engineering and their Applica- (ICSE’85). London, England: IEEE Computer Society Press, August
tions (ICSSEA’08), Paris, France, 9-11 December 2008. 1985, pp. 313–319.
[85] S. C. P. F. Fabbri, J. C. Maldonado, P. C. Masiero, and M. E. Delamaro, [104] P. Godefroid, N. Klarlund, and K. Sen, “DART: Directed Automated
“Proteum/FSM: A Tool to Support Finite State Machine Validation Random Testing,” in Proceedings of the 2005 ACM SIGPLAN Con-
Based on Mutation Testing,” in Proceedings of the 19th International ference on Programming Language Design and Implementation, ser.
Conference of the Chilean Computer Science Society (SCCC’99), Talca, ACM SIGPLAN Notices, vol. 40, no. 6, June 2005, pp. 213–223.
Chile, 11-13 November 1999, p. 96. [105] A. S. Gopal and T. A. Budd, “Program Testing by Specification
[86] S. C. P. F. Fabbri, J. C. Maldonado, P. C. Masiero, M. E. Delamaro, Mutation,” University of Arizona, Tucson, Arizona, Technical Report
and W. E. Wong, “Mutation Testing Applied to Validate Specifications TR 83-17, 1983.
Based on Petri Nets,” in Proceedings of the IFIP TC6 8th International [106] B. J. M. Grün, D. Schuler, and A. Zeller, “The Impact of Equivalent
Conference on Formal Description Techniques VIII, vol. 43, 1995, pp. Mutants,” in Proceedings of the 4th International Workshop on Mu-
329–337. tation Analysis (MUTATION’09), published with Proceedings of the
[87] S. C. P. F. Fabbri, J. C. Maldonado, T. Sugeta, and P. C. Masiero, 2nd International Conference on Software Testing, Verification, and
“Mutation Testing Applied to Validate Specifications Based on Stat- Validation Workshops. Denver, Colorado: IEEE Computer Society,
echarts,” in Proceedings of the 10th International Symposium on 1-4 April 2009, pp. 192–199.
Software Reliability Engineering (ISSRE’99), Boca Raton, Florida, 1-4 [107] R. G. Hamlet, “Testing Programs with the Aid of a Compiler,” IEEE
November 1999, p. 210. Transactions on Software Engineering, vol. 3, no. 4, pp. 279–290, July
[88] S. P. F. Fabbri, M. E. Delamaro, J. C. Maldonado, and P. Masiero, “Mu- 1977.
tation Analysis Testing for Finite State Machines,” in Proceedings of [108] J. M. Hanks, “Testing Cobol Programs by Mutation,” PhD Thesis,
the 5th International Symposium on Software Reliability Engineering, Georgia Institute of Technology, Atlanta, Georgia, 1980.
Monterey, California, 6-9 November 1994, pp. 220–229. [109] M. Harman, R. Hierons, and S. Danicic, “The Relationship Between
[89] X. Feng, S. Marr, and T. O’Callaghan, “ESTP: An Experimental Soft- Program Dependence and Mutation Analysis,” in Proceedings of the 1st
ware Testing Platform,” in Proceedings of the 3rd Testing: Academic Workshop on Mutation Analysis (MUTATION’00), published in book
and Industrial Conference Practice and Research Techniques (TAIC form, as Mutation Testing for the New Century. San Jose, California,
PART’08), Windsor, UK, 29-31 August 2008, pp. 59–63. 6-7 October 2001, pp. 5–13.
[90] F. C. Ferrari, J. C. Maldonado, and A. Rashid, “Mutation Testing for [110] R. M. Hierons, M. Harman, and S. Danicic, “Using Program Slicing
Aspect-Oriented Programs,” in Proceedings of the 1st International to Assist in the Detection of Equivalent Mutants,” Software Testing,
Conference on Software Testing, Verification, and Validation (ICST Verification and Reliability, vol. 9, no. 4, pp. 233–262, December 1999.
’08). Lillehammer, Norway: IEEE Computer Society, 9-11 April 2008,
[111] R. M. Hierons and M. G. Merayo, “Mutation Testing from Probabilistic
pp. 52–61.
Finite State Machines,” in Proceedings of the 3rd Workshop on Muta-
[91] S. Fichter, “Parallelizing Mutation on a Hypercube,” Masters Thesis,
tion Analysis (MUTATION’07), published with Proceedings of the 2nd
Clemson University, Clemson, SC, 1991.
Testing: Academic and Industrial Conference Practice and Research
[92] V. N. Fleyshgakker and S. N. Weiss, “Efficient Mutation Analysis: A
Techniques (TAIC PART’07). Windsor, UK: IEEE Computer Society,
New Approach,” in Proceedings of the International Symposium on
10-14 September 2007, pp. 141–150.
Software Testing and Analysis (ISSTA’94). Seattle, Washington: ACM
[112] R. M. Hierons and M. G. Merayo, “Mutation Testing from Proba-
Press, August 1994, pp. 185–195.
bilistic and Stochastic Finite State Machines,” Journal of Systems and
[93] P. G. Frankl, S. N. Weiss, and C. Hu, “All-Uses Versus Mutation
Software, vol. 82, no. 11, pp. 1804–1818, November 2009.
Testing: An Experimental Comparison of Effectiveness,” Polytechnic
University, Brooklyn, New York, Technique Report, 1994. [113] J. R. Horgan and A. P. Mathur, “Weak Mutation is Probably Strong
[94] P. G. Frankl, S. N. Weiss, and C. Hu, “All-uses vs Mutation Testing: Mutation,” Purdue University, West Lafayette, Indiana, Technical Re-
an Experimental Comparison of Effectiveness,” Journal of Systems and port SERC-TR-83-P, 1990.
Software, vol. 38, no. 3, pp. 235–253, September 1997. [114] S.-S. Hou, L. Zhang, T. Xie, H. Mei, and J.-S. Sun, “Applying Interface-
[95] G. Fraser and F. Wotawa, “Mutant Minimization for Model-Checker Contract Mutation in Regression Testing of Component-Based Soft-
Based Test-Case Generation,” in Proceedings of the 3rd Workshop ware,” in Proceedings of the 23rd International Conference on Software
on Mutation Analysis (MUTATION’07), published with Proceedings Maintenance (ICSM’07), Paris, France, 2-5 October 2007, pp. 174–183.
of the 2nd Testing: Academic and Industrial Conference Practice and [115] W. E. Howden, “Weak Mutation Testing and Completeness of Test
Research Techniques (TAIC PART’07). Windsor, UK: IEEE Computer Sets,” IEEE Transactions on Software Engineering, vol. 8, no. 4, pp.
Society, 10-14 September 2007, pp. 161–168. 371–379, July 1982.
[96] R. Geist, A. J. Offutt, and F. C. Harris, “Estimation and Enhancement [116] S. Hussain, “Mutation Clustering,” Masters Thesis, King’s College
of Real-Time Software Reliability Through Mutation Analysis,” IEEE London, UK, 2008.
Transactions on Computers, vol. 41, no. 5, pp. 550–558, May 1992. [117] J. Hwang, T. Xie, F. Chen, and A. X. Liu, “Systematic Structural
[97] A. K. Ghosh, T. O‘Connor, and G. McGraw, “An Automated Approach Testing of Firewall Policies,” in Proceedings of the IEEE Symposium
for Identifying Potential Vulnerabilities in Software,” in Proceedings on Reliable Distributed Systems (SRDS ’08), Napoli, Italy, 6-8 October
of the IEEE Symposium on Security and Privacy (S&P’98), Oakland, 2008, pp. 105–114.
California, 3-6 May 1998, pp. 104–114. [118] Itregister, “Plextest,” http://www.itregister.com.au/products/plextest.htm,
[98] S. Ghosh, “Testing Component-Based Distributed Applications,” PhD 2007.
Thesis, Purdue University, West Lafayette, Indiana, 2000. [119] D. Jackson and M. R. Woodward, “Parallel firm mutation of Java
[99] S. Ghosh, “Towards Measurement of Testability of Concurrent Object- programs,” in Proceedings of the 1st Workshop on Mutation Analysis
oriented Programs Using Fault Insertion: a Preliminary Investigation,” (MUTATION’00), published in book form, as Mutation Testing for the
in Proceedings of the 2nd IEEE International Workshop on Source New Century. San Jose, California, 6-7 October 2001, pp. 55–61.
Code Analysis and Manipulation (SCAM’02), Los Alamitos, California, [120] C. Ji, Z. Chen, B. Xu, and Z. Zhao, “A Novel Method of Mutation
2002, p. 7. Clustering Based on Domain Analysis,” in Proceedings of the 21st
[100] S. Ghosh, P. Govindarajan, and A. P. Mathur, “TDS: a Tool for Testing International Conference on Software Engineering and Knowledge
Distributed Component-Based Applications,” in Proceedings of the 1st Engineering (SEKE’09). Boston, Massachusetts: Knowledge Systems
Workshop on Mutation Analysis (MUTATION’00), published in book Institute Graduate School, 1-3 July 2009.
form, as Mutation Testing for the New Century. San Jose, California, [121] Y. Jia, “Mutation Testing Repository,” http://www.dcs.kcl.ac.uk/pg
6-7 October 2001, pp. 103–112. /jiayue/repository/, 2009.
[101] S. Ghosh and A. P. Mathur, “Interface Mutation to Assess the Adequacy [122] Y. Jia and M. Harman, “Constructing Subtle Faults Using Higher Order
of Tests for Componentsand Systems,” in Proceedings of the 34th Mutation Testing,” in Proceedings of the 8th International Working
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 28

Conference on Source Code Analysis and Manipulation (SCAM’08), [142] S. C. Lee and A. J. Offutt, “Generating Test Cases for XML-Based Web
Beijing, China, 28-29 September 2008, pp. 249–258. Component Interactions Using Mutation Analysis,” in Proceedings of
[123] Y. Jia and M. Harman, “MILU: A Customizable, Runtime-Optimized the 12th International Symposium on Software Reliability Engineering
Higher Order Mutation Testing Tool for the Full C Language,” in (ISSRE’01), Hong Kong, China, November 2001, pp. 200–209.
Proceedings of the 3rd Testing: Academic and Industrial Conference [143] J. B. Li and J. Miller, “Testing the Semantics of W3C XML Schema,”
Practice and Research Techniques (TAIC PART’08). Windsor, UK: in Proceedings of the 29th Annual International Computer Software
IEEE Computer Society, 29-31 August 2008, pp. 94–98. and Applications Conference (COMPSAC’05), Turku, Finland, 26-28
[124] C. Jing, Z. Wang, X. Shi, X. Yin, and J. Wu, “Mutation Testing of July 2005, pp. 443–448.
Protocol Messages Based on Extended TTCN-3,” in Proceedings of the [144] R. Lipton, “Fault Diagnosis of Computer Programs,” Student Report,
22nd International Conference on Advanced Information Networking Carnegie Mellon University, 1971.
and Applications (AINA’08), Okinawa, Japan, 25-28 March 2008, pp. [145] R. J. Lipton and F. G. Sayward, “The Status of Research on Program
667–674. Mutation,” in Proceedings of the Workshop on Software Testing and
[125] K. Kapoor, “Formal Analysis of Coupling Hypothesis for Logical Test Documentation, December 1978, pp. 355–373.
Faults,” Innovations in Systems and Software Engineering, vol. 2, no. 2, [146] M.-H. Liu, Y.-F. Gao, J.-H. Shan, J.-H. Liu, L. Zhang, and J.-S.
pp. 80–87, July 2006. Sun, “An Approach to Test Data Generation for Killing Multiple
[126] S.-W. Kim, M. J. Harrold, and Y.-R. Kwon, “MUGAMMA: Mutation Mutants,” in Proceedings of the 22nd IEEE International Conference on
Analysis of Deployed Software to Increase Confidence and Assist Software Maintenance (ICSM’06), Philadelphia, Pennsylvania, USA,
Evolution,” in Proceedings of the 2nd Workshop on Mutation Analysis 24-27 September 2006, pp. 113–122.
(MUTATION’06). Raleigh, North Carolina: IEEE Computer Society, [147] B. Long, R. Duke, D. Goldson, P. Strooper, and L. Wildman,
November 2006, p. 10. “Mutation-based Exploration of a Method for Verifying Concurrent
[127] S. Kim, J. A. Clark, and J. A. McDermid, “Assessing Test Set Java Components,” in 18th International Parallel and Distributed
Adequacy for Object Oriented Programs Using Class Mutation,” in Processing Symposium (IPDPS’04), Santa Fe, New Mexico, 26-30
Proceedings of the 3rd Symposium on Software Technology (SoST’99), April 2004, p. 265.
Buenos Aires, Argentina, 8-9 September 1999. [148] Y.-S. Ma, “Object-Oriented Mutation Testing for Java,” PhD Thesis,
[128] S. Kim, J. A. Clark, and J. A. McDermid, “The Rigorous Generation KAIST University in Korea, 2005.
of Java Mutation Operators Using HAZOP,” in Proceedings of the 12th [149] Y.-S. Ma, M. J. Harrold, and Y.-R. Kwon, “Evaluation of Mutation
International Cofference Software and Systems Engineering and their Testing for Object-Oriented Programs,” in Proceedings of the 28th in-
Applications (ICSSEA 99), Paris, France, 29 November-1 December ternational Conference on Software Engineering (ICSE ’06), Shanghai,
1999. China, 20-28 May 2006, pp. 869–872.
[129] S. Kim, J. A. Clark, and J. A. McDermid, “Class Mutation: Mu- [150] Y.-S. Ma, Y.-R. Kwon, and A. J. Offutt, “Inter-class Mutation Operators
tation Testing for Object-oriented Programs,” in Proceedings of the for Java,” in Proceedings of the 13th International Symposium on
Net.ObjectDays Conference on Object-Oriented Software Systems, Software Reliability Engineering (ISSRE’02). Annapolis, Maryland:
2000. IEEE Computer Society, 12-15 November 2002, p. 352.
[130] S. Kim, J. A. Clark, and J. A. McDermid, “Investigating the ef- [151] Y.-S. Ma, A. J. Offutt, and Y.-R. Kwon, “MuJava: An Automated Class
fectiveness of object-oriented testing strategies using the mutation Mutation System,” Software Testing, Verification & Reliability, vol. 15,
method,” in Proceedings of the 1st Workshop on Mutation Analysis no. 2, pp. 97–133, June 2005.
(MUTATION’00), published in book form, as Mutation Testing for the [152] Y.-S. Ma, A. J. Offutt, and Y.-R. Kwon, “MuJava: a Mutation System
New Century. San Jose, California, 6-7 October 2001, pp. 207–225. for Java,” in Proceedings of the 28th international Conference on
[131] K. N. King and A. J. Offutt, “A Fortran Language System for Mutation- Software Engineering (ICSE ’06), Shanghai, China, 20-28 May 2006,
Based Software Testing,” Software:Practice and Experience, vol. 21, pp. 827–830.
no. 7, pp. 685–718, October 1991. [153] B. Marick, “The Weak Mutation Hypothesis,” in Proceedings of the 4th
[132] E. W. Krauser, “Compiler-Integrated Software Testing,” PhD Thesis, Symposium on Software Testing, Analysis, and Verification (TAV’91).
Purdue University, West Lafyette, 1991. Victoria, British Columbia, Canada: IEEE Computer Society, October
[133] E. W. Krauser, A. P. Mathur, and V. J. Rego, “High Performance 1991, pp. 190–199.
Software Testing on SIMD Machines,” IEEE Transactions on Software [154] E. E. Martin and T. Xie, “A Fault Model and Mutation Testing of
Engineering, vol. 17, no. 5, pp. 403–423, May 1991. Access Control Policies,” in Proceedings of the 16th International
[134] E. W. Krauser, A. P. Mathur, and V. J. Rego, “High Performance Conference on World Wide Web. Banff, Alberta, Canada: ACM, 8-12
Software Testing on SIMD Machines,” in Proceedings of the 2nd May 2007, pp. 667–676.
Workshop on Software Testing, Verification, and Analysis (TVA’88). [155] A. P. Mathur, Foundations of Software Testing. Pearson Education,
Banff Alberta: IEEE Computer Society, July 1988, pp. 171 – 177. 2008.
[135] K. Lakhotia, P. McMinn, and M. Harman, “Automated Test Data [156] A. P. Mathur, “Performance, Effectiveness, and Reliability Issues in
Generation for Coverage: Haven’t We Solved This Problem Yet?” in Software Testing,” in Proceedings of the 5th International Computer
Proceedings of Testing: Academia & Industry Conference - Practice Software and Applications Conference (COMPSAC’79), Tokyo, Japan,
And Research Techniques (TAIC-PART ’09). Windsor, UK: IEEE 11-13 September 1991, pp. 604–605.
Computer Society, 4-6 September 2009, pp. 95–104. [157] A. P. Mathur, “CS 406 Software Engineering I,” Course Project
[136] W. B. Langdon, M. Harman, and Y. Jia, “Multi Objective Higher Handout, August 1992.
Order Mutation Testing With Genetic Programming,” in Proceedings [158] A. P. Mathur and E. W. Krauser, “Mutant Unification for Improved
of the 4th Testing: Academic and Industrial Conference - Practice and Vectorization,” Purdue University, West Lafayette, Indiana, Technique
Research (TAIC PART’09). Windsor, UK: IEEE Computer Society, Report SERC-TR-14-P, 1988.
4-6 September 2009. [159] A. P. Mathur and W. E. Wong, “An Empirical Comparison of Mutation
[137] W. B. Langdon, M. Harman, and Y. Jia, “Multi Objective Mutation and Data Flow Based Test Adequacy Criteria,” Purdue University, West
Testing With Genetic Programming,” in Proceedings of the Genetic and Lafayette, Indiana, Technique Report, 1993.
Evolutionary Computation Conference 2009 (GECCO’09), Montréal, [160] A. P. Mathur and W. E. Wong, “An Empirical Comparison of Data
Canada, 8-12 July 2009. Flow and Mutation-based Test Adequacy Criteria,” Software Testing,
[138] Y. Le Traon, B. Baudry, and J.-M. Jézéquel, “Design by Contract to Verification and Reliability, vol. 4, no. 1, pp. 9 – 31, 1994.
Improve Software Vigilance,” IEEE Transactions of Software Engineer- [161] P. S. May, “Test Data Generation: Two Evolutionary Approaches to
ing, vol. 32, no. 8, pp. 571–586, August 2006. Mutation Testing,” PhD Thesis, University of Kent, Canterbury, Kent,
[139] Y. Le Traon, T. Mouelhi, and B. Baudry, “Testing Security Policies: 2007.
Going Beyond Functional Testing,” in The 18th IEEE International [162] P. McMinn, “Search-Based Software Test Data Generation: A Survey,”
Symposium on Software Reliability. Trollhättan, Sweden: IEEE Software Testing, Verification and Reliability, vol. 14, no. 2, pp. 105–
Computer Society, 5-9 November 2007, pp. 93–102. 156, 2004.
[140] S. Lee, X. Bai, and Y. Chen, “Automatic Mutation Testing and [163] I. Moore, “Jester and Pester,” http://jester.sourceforge.net/, 2001.
Simulation on OWL-S Specified Web Services,” in Proceedings of the [164] L. J. Morell, “A Theory of Error-Based Testing,” PhD Thesis, Univer-
41st Annual Simulation Symposium (ANSS’08), Ottawa, Canada., 14-16 sity of Maryland at College Park, College Park, Maryland, 1984.
April 2008, pp. 149–156. [165] T. Mouelhi, F. Fleurey, and B. Baudry, “A Generic Metamodel For
[141] S. D. Lee, “Weak vs. Strong: An Empirical Comparison of Mutation Security Policies Mutation,” in Proceedings of the IEEE International
Variants,” Masters Thesis, Clemson University, Clemson, SC, 1991. Conference on Software Testing Verification and Validation Workshop
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 29

(ICSTW’08). Lillehammer, Norway: IEEE Computer Society, 9-11 [187] A. J. Offutt and J. Pan, “Automatically Detecting Equivalent Mutants
April 2008, pp. 278–286. and Infeasible Paths,” Software Testing, Verification and Reliability,
[166] T. Mouelhi, Y. Le Traon, and B. Baudry, “Mutation Analysis for vol. 7, no. 3, pp. 165–192, September 1997.
Security Tests Qualification,” in Proceedings of the 3rd Workshop on [188] A. J. Offutt, J. Pan, K. Tewary, and T. Zhang, “An Experimental
Mutation Analysis (MUTATION’07), published with Proceedings of Evaluation of Data Flow and Mutation Testing,” Software:Practice and
the 2nd Testing: Academic and Industrial Conference Practice and Experience, vol. 26, no. 2, pp. 165–176, February 1996.
Research Techniques (TAIC PART’07). Windsor, UK: IEEE Computer [189] A. J. Offutt, R. P. Pargas, S. V. Fichter, and P. K. Khambekar, “Mutation
Society, 10-14 September 2007, pp. 233–242. Testing of Software Using a MIMD Computer,” in Proceedings of
[167] E. S. Mresa and L. Bottaci, “Efficiency of Mutation Operators and the International Conference on Parallel Processing, Chicago, Illinois,
Selective Mutation Strategies: An Empirical Study,” Software Testing, August 1992, pp. 255–266.
Verification and Reliability, vol. 9, no. 4, pp. 205–232, December 1999. [190] A. J. Offutt, G. Rothermel, and C. Zapf, “An Experimental Evaluation
[168] A. S. Namin and J. H. Andrews, “Finding Sufficient Mutation Operators of Selective Mutation,” in Proceedings of the 15th International Con-
via Variable Reduction,” in Proceedings of the 2nd Workshop on ference on Software Engineering (ICSE’93). Baltimore, Maryland:
Mutation Analysis (MUTATION’06). Raleigh, North Carolina: IEEE IEEE Computer Society Press, May 1993, pp. 100–107.
Computer Society, November 2006, p. 5. [191] A. J. Offutt and R. H. Untch, “Mutation 2000: Uniting the Orthogonal,”
[169] A. S. Namin and J. H. Andrews, “On Sufficiency of Mutants,” in Pro- in Proceedings of the 1st Workshop on Mutation Analysis (MUTA-
ceedings of the 29th International Conference on Software Engineering TION’00), published in book form, as Mutation Testing for the New
(ICSE COMPANION’07), Minneapolis, Minnesota, 20-26 May 2007, Century. San Jose, California, 6-7 October 2001, pp. 34–44.
pp. 73–74. [192] A. J. Offutt, J. Voas, and J. Payn, “Mutation Operators for Ada,” George
[170] A. S. Namin, J. H. Andrews, and D. J. Murdoch, “Sufficient Mutation Mason University, Fairfax, Virginia, Technique Report ISSE-TR-96-09,
Operators for Measuring Test Effectiveness,” in Proceedings of the 30th 1996.
International Conference on Software Engineering (ICSE’08), Leipzig, [193] A. J. Offutt and W. Xu, “Generating Test Cases for Web Services Using
Germany, 10-18 May 2008, pp. 351–360. Data Perturbation,” in Proceedings of the Workshop on Testing, Analysis
[171] R. Nilsson, A. J. Offutt, and S. F. Andler, “Mutation-based Testing and Verification of Web Services (TAV-WEB), Boston, Massachusetts,
Criteria for Timeliness,” in Proceedings of the 28th Annual Inter- 11-14 July 2004, pp. 1 – 10.
national Computer Software and Applications Conference (COMP- [194] A. J. Offutt, “Automatic Test Data Generation,” PhD Thesis, Georgia
SAC’04), Hong Kong, China, 28-30, September 2004, pp. 306–311. Institute of Technology, Atlanta, GA, USA, 1988.
[172] R. Nilsson, A. J. Offutt, and J. Mellin, “Test Case Generation for [195] V. Okun, “Specification Mutation for Test Generation and Analysis,”
Mutation-based Testing of Timeliness,” in Proceedings of the 2nd PhD Thesis, University of Maryland Baltimore County, Baltimore,
Workshop on Model Based Testing (MBT 2006), ser. ENTCS, vol. 164, Maryland, 2004.
no. 4, Vienna, Austria, 25-26 March 2006, pp. 97–114. [196] T. Olsson and P. Runeson, “System Level Mutation Analysis Applied
[173] A. J. Offutt, J. Pan, and J. M. Voas, “Procedures for Reducing the Size to a State-based Language,” in Proceedings of the 8th Annual IEEE
of Coverage-based Test Sets,” in Proceedings of the 12 International International Conference and Workshop on the Engineering of Com-
Conference on Testing Computer Software, Washington, DC, June puter Based Systems (ECBS’01), Washington DC, 17-20 April 2001,
1995, pp. 111–123. p. 222.
[174] A. J. Offutt, “The Coupling Effect: Fact or Fiction,” ACM SIGSOFT
[197] J. Pan, “Using Constraints to Detect Equivalent Mutants,” Masters
Software Engineering Notes, vol. 14, no. 8, pp. 131–140, December
Thesis, George Mason University, Fairfax VA, 1994.
1989.
[198] Parasoft, “Parasoft Insure++,” http://www.parasoft.com/jsp/products/
[175] A. J. Offutt, “Investigations of the Software Testing Coupling Effect,”
home.jsp?product=Insure, 2006.
ACM Transactions on Software Engineering and Methodology, vol. 1,
no. 1, pp. 5–20, January 1992. [199] M. Polo, M. Piattini, and I. Garcia-Rodriguez, “Decreasing the Cost
[176] A. J. Offutt, “Private Communication,” July 2008. of Mutation Testing with Second-Order Mutants,” Software Testing,
[177] A. J. Offutt, P. Ammann, and L. L. Liu, “Mutation Testing implements Verification and Reliability, vol. 19, no. 2, pp. 111 – 131, June 2008.
Grammar-Based Testing,” in Proceedings of the 2nd Workshop on [200] M. Polo, S. Tendero, and M. Piattini, “Integrating techniques and
Mutation Analysis (MUTATION’06). Raleigh, North Carolina: IEEE tools for testing automation: Research Articles,” Software Testing,
Computer Society, November 2006, p. 12. Verification and Reliability, vol. 17, no. 1, pp. 3–39, March 2007.
[178] A. J. Offutt and W. M. Craft, “Using Compiler Optimization Tech- [201] A. Pretschner, T. Mouelhi, and Y. Le Traon, “Model-Based Tests
niques to Detect Equivalent Mutants,” Software Testing, Verification for Access Control Policies,” in Proceedings of the 1st International
and Reliability, vol. 4, no. 3, pp. 131–154, September 1994. Conference on Software Testing, Verification, and Validation (ICST
’08). Lillehammer, Norway: IEEE Computer Society, 9-11 April 2008, pp. 338–347.
[179] A. J. Offutt, Z. Jin, and J. Pan, “The Dynamic Domain Reduction Approach for Test Data Generation: Design and Algorithms,” George Mason University, Fairfax, Virginia, Technical Report ISSE-TR-94-110, 1994.
[180] A. J. Offutt, Z. Jin, and J. Pan, “The Dynamic Domain Reduction Procedure for Test Data Generation,” Software: Practice and Experience, vol. 29, no. 2, pp. 167–193, February 1999.
[181] A. J. Offutt and K. N. King, “A Fortran 77 Interpreter for Mutation Analysis,” ACM SIGPLAN Notices, vol. 22, no. 7, pp. 177–188, July 1987.
[182] A. J. Offutt, A. Lee, G. Rothermel, R. H. Untch, and C. Zapf, “An Experimental Determination of Sufficient Mutant Operators,” ACM Transactions on Software Engineering and Methodology, vol. 5, no. 2, pp. 99–118, April 1996.
[183] A. J. Offutt and S. Lee, “An Empirical Evaluation of Weak Mutation,” IEEE Transactions on Software Engineering, vol. 20, no. 5, pp. 337–344, May 1994.
[184] A. J. Offutt and S. D. Lee, “How Strong is Weak Mutation?” in Proceedings of the 4th Symposium on Software Testing, Analysis, and Verification (TAV’91). Victoria, British Columbia, Canada: IEEE Computer Society, October 1991, pp. 200–213.
[185] A. J. Offutt, Y.-S. Ma, and Y.-R. Kwon, “An Experimental Mutation System for Java,” ACM SIGSOFT Software Engineering Notes, vol. 29, no. 5, pp. 1–4, September 2004.
[186] A. J. Offutt and J. Pan, “Detecting Equivalent Mutants and the Feasible Path Problem,” in Proceedings of the 1996 Annual Conference on Computer Assurance. Gaithersburg, Maryland: IEEE Computer Society Press, June 1996, pp. 224–236.
[202] R. Probert and F. Guo, “Mutation Testing of Protocols: Principles and Preliminary Experimental Results,” in Proceedings of the Workshop on Protocol Test Systems, Leidschendam, The Netherlands, 15-17 October 1991, pp. 57–76.
[203] S. T. Redwine and W. E. Riddle, “Software Technology Maturation,” in Proceedings of the 8th International Conference on Software Engineering, London, England, 1985, pp. 189–200.
[204] C. K. Roy and J. R. Cordy, “Towards a Mutation-based Automatic Framework for Evaluating Code Clone Detection Tools,” in Proceedings of the Canadian Conference on Computer Science and Software Engineering (C3S2E’08). Montreal, Quebec, Canada: ACM, 12-13 May 2008, pp. 137–140.
[205] C. K. Roy and J. R. Cordy, “A Mutation / Injection-based Automatic Framework for Evaluating Code Clone Detection Tools,” in Proceedings of the 4th International Workshop on Mutation Analysis (MUTATION’09), published with Proceedings of the 2nd International Conference on Software Testing, Verification, and Validation Workshops. Denver, Colorado: IEEE Computer Society, 1-4 April 2009, pp. 157–166.
[206] Rubyforge, “Heckle,” http://seattlerb.rubyforge.org/heckle/, 2007.
[207] M. Sahinoglu and E. H. Spafford, “A Bayes Sequential Statistical Procedure for Approving Software Products,” in Proceedings of the IFIP Conference on Approving Software Products (ASP’90). Garmisch-Partenkirchen, Germany: Elsevier Science, September 1990, pp. 43–56.
[208] D. Schuler, V. Dallmeier, and A. Zeller, “Efficient Mutation Testing by Checking Invariant Violations,” in Proceedings of the International
Symposium on Software Testing and Analysis (ISSTA’09), Chicago, Illinois, 19-23 July 2009.
[209] K. Sen, D. Marinov, and G. Agha, “CUTE: A Concolic Unit Testing Engine for C,” in Proceedings of the 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE’05), Lisbon, Portugal, 2005, pp. 263–272.
[210] Y. Serrestou, V. Beroulle, and C. Robach, “Functional Verification of RTL Designs Driven by Mutation Testing Metrics,” in Proceedings of the 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools, Lubeck, Germany, 29-31 August 2007, pp. 222–227.
[211] Y. Serrestou, V. Beroulle, and C. Robach, “Impact of Hardware Emulation on the Verification Quality Improvement,” in Proceedings of the IFIP WG 10.5 International Conference on Very Large Scale Integration of System-on-Chip (VLSI-SoC’07), Atlanta, GA, 15-17 October 2007, pp. 218–223.
[212] H. Shahriar and M. Zulkernine, “MUSIC: Mutation-based SQL Injection Vulnerability Checking,” in Proceedings of the 8th International Conference on Quality Software (QSIC’08), Oxford, UK, 12-13 August 2008, pp. 77–86.
[213] H. Shahriar and M. Zulkernine, “Mutation-Based Testing of Buffer Overflow Vulnerabilities,” in Proceedings of the 2nd Annual IEEE International Workshop on Security in Software Engineering, Turku, Finland, 28 July - 1 August 2008, pp. 979–984.
[214] H. Shahriar and M. Zulkernine, “Mutation-Based Testing of Format String Bugs,” in Proceedings of the 11th IEEE High Assurance Systems Engineering Symposium (HASE’08), Nanjing, China, 3-5 December 2008, pp. 229–238.
[215] H. Shahriar and M. Zulkernine, “MUTEC: Mutation-based Testing of Cross Site Scripting,” in Proceedings of the 5th International Workshop on Software Engineering for Secure Systems (SESS’09), Vancouver, Canada, 19 May 2009, pp. 47–53.
[216] D. P. Sidhu and T. K. Leung, “Fault Coverage of Protocol Test Methods,” in Proceedings of the 7th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM’88), New Orleans, Louisiana, 27-31 March 1988, pp. 80–85.
[217] A. Simao, J. C. Maldonado, and R. da Silva Bigonha, “A Transformational Language for Mutant Description,” Computer Languages, Systems & Structures, vol. 35, no. 3, pp. 322–339, October 2009.
[218] B. H. Smith and L. Williams, “An Empirical Evaluation of the MuJava Mutation Operators,” in Proceedings of the 3rd Workshop on Mutation Analysis (MUTATION’07), published with Proceedings of the 2nd Testing: Academic and Industrial Conference Practice and Research Techniques (TAIC PART’07). Windsor, UK: IEEE Computer Society, 10-14 September 2007, pp. 193–202.
[219] B. H. Smith and L. Williams, “On Guiding the Augmentation of an Automated Test Suite via Mutation Analysis,” Empirical Software Engineering, vol. 14, no. 3, pp. 341–369, 2009.
[220] SourceForge, “Nester,” http://nester.sourceforge.net/, 2002.
[221] SourceForge, “Jumble,” http://jumble.sourceforge.net/, 2007.
[222] S. D. R. S. D. Souza, J. C. Maldonado, S. C. P. F. Fabbri, and W. L. D. Souza, “Mutation Testing Applied to Estelle Specifications,” Software Quality Control, vol. 8, no. 4, pp. 285–301, December 1999.
[223] S. D. R. S. D. Souza, J. C. Maldonado, S. C. P. F. Fabbri, and W. L. D. Souza, “Mutation Testing Applied to Estelle Specifications,” in Proceedings of the 33rd Hawaii International Conference on System Sciences (HICSS’08), vol. 8, Maui, Hawaii, 4-7 January 2000, p. 8011.
[224] E. H. Spafford, “Extending Mutation Testing to Find Environmental Bugs,” Software: Practice and Experience, vol. 20, no. 2, pp. 181–189, February 1990.
[225] T. Srivatanakul, J. A. Clark, S. Stepney, and F. Polack, “Challenging Formal Specifications by Mutation: a CSP Security Example,” in Proceedings of the 10th Asia-Pacific Software Engineering Conference (APSEC’03), Chiang Mai, Thailand, 10-12 December 2003, pp. 340–350.
[226] T. Sugeta, J. C. Maldonado, and W. E. Wong, “Mutation Testing Applied to Validate SDL Specifications,” in Proceedings of the 16th IFIP International Conference on Testing of Communicating Systems, ser. LNCS, vol. 2978, Oxford, UK, 17-19 March 2004, p. 2741.
[227] A. Sung, J. Jang, and B. Choi, “Fault-Based Interface Testing Between Real-Time Operating System and Application,” in Proceedings of the 2nd Workshop on Mutation Analysis (MUTATION’06). Raleigh, North Carolina: IEEE Computer Society, November 2006, p. 8.
[228] A. Tanaka, “Equivalence Testing for Fortran Mutation System Using Data Flow Analysis,” Masters Thesis, Georgia Institute of Technology, Atlanta, Georgia, 1981.
[229] P. Thévenod-Fosse, H. Waeselynck, and Y. Crouzet, “An Experimental Study on Software Structural Testing: Deterministic versus Random Input Generation,” in Proceedings of the 25th International Symposium on Fault-Tolerant Computing (FTCS’91), Montréal, Canada, 25-27 June 1991, pp. 410–417.
[230] N. Tillmann and J. de Halleux, “Pex-White Box Test Generation for .NET,” in Proceedings of the 2nd International Conference on Tests and Proofs (TAP’08), Prato, Italy, 9-11 April 2008, pp. 134–153.
[231] M. Trakhtenbrot, “New Mutations for Evaluation of Specification and Implementation Levels of Adequacy in Testing of Statecharts Models,” in Proceedings of the 3rd Workshop on Mutation Analysis (MUTATION’07), published with Proceedings of the 2nd Testing: Academic and Industrial Conference Practice and Research Techniques (TAIC PART’07). Windsor, UK: IEEE Computer Society, 10-14 September 2007, pp. 151–160.
[232] U. Trier, “DBLP,” http://www.informatik.uni-trier.de/~ley/db/.
[233] J. Tuya, M. J. S. Cabal, and C. de la Riva, “SQLMutation: A Tool to Generate Mutants of SQL Database Queries,” in Proceedings of the 2nd Workshop on Mutation Analysis (MUTATION’06). Raleigh, North Carolina: IEEE Computer Society, November 2006, p. 1.
[234] J. Tuya, M. J. S. Cabal, and C. de la Riva, “Mutating Database Queries,” Information and Software Technology, vol. 49, no. 4, pp. 398–417, April 2007.
[235] R. H. Untch, “Mutation-based Software Testing Using Program Schemata,” in Proceedings of the 30th Annual Southeast Regional Conference (ACM-SE’92), Raleigh, North Carolina, 1992, pp. 285–291.
[236] R. H. Untch, “Schema-based Mutation Analysis: A New Test Data Adequacy Assessment Method,” PhD Thesis, Clemson University, Clemson, South Carolina, December 1995, adviser-Harrold, Mary Jean.
[237] R. H. Untch, A. J. Offutt, and M. J. Harrold, “Mutation Analysis Using Mutant Schemata,” in Proceedings of the International Symposium on Software Testing and Analysis (ISSTA’93), Cambridge, Massachusetts, 1993, pp. 139–148.
[238] G. Vigna, W. Robertson, and D. Balzarotti, “Testing Network-based Intrusion Detection Signatures using Mutant Exploits,” in Proceedings of the 11th ACM Conference on Computer and Communications Security, Washington DC, USA, 2004, pp. 21–30.
[239] P. Vilela, M. Machado, and W. E. Wong, “Testing for Security Vulnerabilities in Software,” in Software Engineering and Applications, 2002.
[240] A. M. R. Vincenzi, J. C. Maldonado, E. F. Barbosa, and M. E. Delamaro, “Unit and Integration Testing Strategies for C Programs Using Mutation,” Software Testing, Verification and Reliability, vol. 11, no. 4, pp. 249–268, November 2001.
[241] J. Voas and G. McGraw, Software Fault Injection: Inoculating Programs Against Errors. John Wiley & Sons, 1997.
[242] K. S. H. T. Wah, “Fault Coupling in Finite Bijective Functions,” Software Testing, Verification and Reliability, vol. 5, no. 1, pp. 3–47, 1995.
[243] K. S. H. T. Wah, “A Theoretical Study of Fault Coupling,” Software Testing, Verification and Reliability, vol. 10, no. 1, pp. 3–46, April 2000.
[244] K. S. H. T. Wah, “An Analysis of the Coupling Effect I: Single Test Data,” Science of Computer Programming, vol. 48, no. 2-3, pp. 119–161, August-September 2003.
[245] R. Wang and N. Huang, “Requirement Model-Based Mutation Testing for Web Service,” in Proceedings of the 4th International Conference on Next Generation Web Services Practices (NWeSP’08), Seoul, Republic of Korea, 20-22 October 2008, pp. 71–76.
[246] S. N. Weiss and V. N. Fleyshgakker, “Improved Serial Algorithms for Mutation Analysis,” ACM SIGSOFT Software Engineering Notes, vol. 18, no. 3, pp. 149–158, July 1993.
[247] E. J. Weyuker, “On Testing Non-Testable Programs,” The Computer Journal, vol. 25, pp. 456–470, 1982.
[248] W. E. Wong, “On Mutation and Data Flow,” PhD Thesis, Purdue University, West Lafayette, Indiana, 1993.
[249] W. E. Wong, J. R. Horgan, S. London, and A. P. Mathur, “Effect of Test Set Minimization on Fault Detection Effectiveness,” Software: Practice and Experience, vol. 28, pp. 347–369, 1998.
[250] W. E. Wong, J. R. Horgan, A. P. Mathur, and A. Pasquini, “Test Set Size Minimization and Fault Detection Effectiveness: A Case Study in a Space Application,” Journal of Systems and Software, vol. 48, no. 2, pp. 79–89, October 1999.
[251] W. E. Wong and A. P. Mathur, “Fault Detection Effectiveness of Mutation and Data Flow Testing,” Software Quality Journal, vol. 4, no. 1, pp. 69–83, March 1995.
[252] W. E. Wong and A. P. Mathur, “Reducing the Cost of Mutation Testing: An Empirical Study,” Journal of Systems and Software, vol. 31, no. 3, pp. 185–196, December 1995.
[253] M. R. Woodward, “Mutation Testing-An Evolving Technique,” in Proceedings of the IEE Colloquium on Software Testing for Critical Systems, London, UK, 19 June 1990, pp. 3/1–3/6.
[254] M. R. Woodward, “OBJTEST: an Experimental Testing Tool for Algebraic Specifications,” in Proceedings of the IEE Colloquium on Automating Formal Methods for Computer Assisted Prototyping, 14 January 1990, p. 2.
[255] M. R. Woodward, “Errors in Algebraic Specifications and an Experimental Mutation Testing Tool,” Software Engineering Journal, vol. 8, no. 4, pp. 221–224, July 1993.
[256] M. R. Woodward, “Mutation Testing - Its Origin and Evolution,” Journal of Information and Software Technology, vol. 35, no. 3, pp. 163–169, March 1993.
[257] M. R. Woodward and K. Halewood, “From Weak to Strong, Dead or Alive? an Analysis of Some Mutation Testing Issues,” in Proceedings of the 2nd Workshop on Software Testing, Verification, and Analysis (TVA’88). Banff, Alberta, Canada: IEEE Computer Society, July 1988, pp. 152–158.
[258] T. Xie, N. Tillmann, J. de Halleux, and W. Schulte, “Mutation Analysis of Parameterized Unit Tests,” in Proceedings of the 4th International Workshop on Mutation Analysis (MUTATION’09), published with Proceedings of the 2nd International Conference on Software Testing, Verification, and Validation Workshops. Denver, Colorado: IEEE Computer Society, 1-4 April 2009, pp. 177–181.
[259] W. Xu, A. J. Offutt, and J. Luo, “Testing Web Services by XML Perturbation,” in Proceedings of the 16th IEEE International Symposium on Software Reliability Engineering (ISSRE’05), Chicago, Illinois, 14-16 July 2005, pp. 257–266.
[260] H. Yoon, B. Choi, and J. O. Jeon, “Mutation-Based Inter-Class Testing,” in Proceedings of the 5th Asia Pacific Software Engineering Conference (APSEC’98), Taipei, Taiwan, 2-4 December 1998, p. 174.
[261] C. N. Zapf, “A Distributed Interpreter for the Mothra Mutation Testing System,” Masters Thesis, Clemson University, Clemson, South Carolina, 1993.
[262] Y. Zhan and J. A. Clark, “Search-based Mutation Testing for Simulink Models,” in Proceedings of the Conference on Genetic and Evolutionary Computation (GECCO’05), Washington DC, USA, 25-29 June 2005, pp. 1061–1068.
[263] S. Zhang, T. R. Dean, and G. S. Knight, “Lightweight State Based Mutation Testing for Security,” in Proceedings of the 3rd Workshop on Mutation Analysis (MUTATION’07), published with Proceedings of the 2nd Testing: Academic and Industrial Conference Practice and Research Techniques (TAIC PART’07). Windsor, UK: IEEE Computer Society, 10-14 September 2007, pp. 223–232.
[264] C. Zhou and P. Frankl, “Mutation Testing for Java Database Applications,” in Proceedings of the 2nd International Conference on Software Testing Verification and Validation (ICST’09), Denver, Colorado, 1-4 April 2009, pp. 396–405.

Yue Jia is a third year PhD student in the CREST centre at King’s College London, under the supervision of Prof. Harman. He holds a BSc from Beijing Union University, China and an MSc from King’s College London, UK. His MSc work resulted in the KClone tool for lightweight dependence-based clone detection. His PhD work concerns Higher Order Mutation Testing, a topic on which he has published several papers, one of which received the best paper award at the 8th IEEE International Working Conference on Source Code Analysis and Manipulation. Yue’s work on Higher Order Mutation is implemented in the Mutation Testing tool MiLu, a publicly available tool that offers configurable mutation testing for the C language. He also maintains the Mutation Testing Repository, in which the reader can find many resources, including papers and analysis of trends in research and practice of Mutation Testing. His interests are software testing and SBSE.

Mark Harman is professor of software engineering in the department of computer science at King’s College London. He is widely known for work on source code analysis and testing and he was instrumental in the founding of the field of search based software engineering, the topic of this special issue. He has given 15 keynote invited talks on SBSE and its applications in the past four years. Professor Harman is the author of over 150 refereed publications, on the editorial board of 6 international journals and has served on 90 programme committees. He is director of the CREST centre at the University of London.