Sufficient Mutant Operators
Offutt, Rothermel, Lee, Untch, and Zapf TOSEM, April 1996
Slide text copyright Robert W. Hasker, 2006 Material (and figures) from Determination of Sufficient Mutant Operators, ACM Transactions on Software Engineering and Methodology, Vol. 5, No. 2 (April 1996) http://portal.acm.org/citation.cfm?id=227610&coll=ACM&dl=GUIDE&CFID=73501362&CFTOKEN=28660440
Testing Programs by Random Changes
Mutation testing: randomly change code, run against test suite For example, change
int min(int a, int b) { int result = a; if ( b < a ) result = b; return result; } into int min(int a, int b) { int result = b; if ( b < a ) result = b; return result; }
Now:
min(6, 1)
gives 1, but
min(1, 6)
gives 6
What does this tell us?
If new program produces same output on all test data, then either
Test data not sufficient to capture all error Program contains dead code New program equivalent to the old
Thus: tool tests the test suite
Similar to structural coverage
Problem: potentially an infinite number of mutants!
Other Versions of min
Changing < to >:
int min(int a, int b) { int result = a; if ( b > a ) result = b; return result; }
Replacing a by result (equivalent mutant):
int min(int a, int b) { int result = a; if ( b < result ) result = b; return result; } Note this mutant could be eliminated by definition-use analysis
So why is mutation testing helpful?
Structural coverage is a very minimal criterion Consider
int max(int xs[], int len) { int res = -1; for(int i = 0; i < len; ++i) if ( xs[i] > res ) res = xs[i]; return res; } with int nums[3] = {-9, 5, 0}; Get coverage with max(nums, 3); Fails: max(nums, 1);
Mutation Testing, Described
Competent programmer hypothesis
Plan: generate code, generate tests, generate non-equivalent mutants, add to tests until all mutants killed What operators for generating mutants? Mothra: system developed in 90s which used 22 operators
Assumption: if program to be tested isnt correct, it contains at most a few small errors
Are all of these operators necessary? In general: do we need a huge number of operators to make this work?
Set of Mutant Operators (in Mothra) (for Fortran C would be similar)
AAR
ABS ACR AOR ASR CAR CNR
array reference for array reference replacement
absolute value insertion Array reference for constant replacement Arithmetic operator replacement Array reference for scalar variable replacement Constant for array reference replacement Comparable array name replacement
CRP
CSR DER
Constant replacement
Constant for scalar variable replacement DO statement end replacement
Mutant Operators, Continued
DSA DATA statement alterations
GLR
LCR ROR RSR SAN SAR SCR SDL SRC SVR UOI
GOTO label replacement
Logical connector replacement Relational operator replacement RETURN statement replacement Statement analysis - replace statements by traps, return, goto Scalar variable for array reference replacement Scalar for constant replacement Statement deletion Source constant replacement Scalar variable replacement Unary operator insertion
Musing on Mutant Operators
22 operators too many mutants! But note some operators produce more mutants than others:
~18% by SVR, ~13% by ASR; ~60% by top five Can we find a useful subset?
An experiment
10 Fortran programs (or subroutines):
Description Deadlock avoidance Bubble sort over ints Stmts 48 11 Mutants 2765 338
Program Banker Bub
Cal
Euclid Find Insert
Days between 2 dates
Greatest common divisor Partition array Insertion sort over ints
29
11 28 14
3010
196 1022 460
Mid
Quad Trityp Warshall
Median of three ints
Real roots of quadratic eqn Triangle classification Transitive closure of matrix
16
10 28 11
183
359 951 305
Experiment, contd.
Use tool to generate test data sets Augmented these with hand-generated data sets to kill remaining incorrect mutants
Small number of hand-generated cases compared to total test cases Also tried completely random data; results similar
Generated 5 sets of test cases for each program
Each set is then able to kill all non-equivalent mutants
Then scored each set of test cases against all mutants
The result: percentage of total mutants that are killed Any mutant killed by same test set is redundant!
Results
Tried different sets, found best results with five operators ABS, AOR, LCR, ROR, UOI Each test set generated using these mutant operators resulted in killing 98.5% (and an average of 99.5%) of all mutants: Conclusion: these operators sufficient
Results, Continued
Removing UOI, AOR, ABS would clearly weaken tests Evidence for removing LCR is weak; few such connectives in sample programs ROR provides branch coverage and so hard to justify removing
Minimum Set of Operators (for Fortran)
ABS:
insert calls to an absolute value function AOR: replace all arithmetic ops by every syntactically legal alternatives LCR: replaces AND, OR by all logical connectors ROR: replace (modify) relational operators UOI: insert unary operators
Conclusion
Five operators, responsible for ~17% of all mutants, sufficient for 10 test programs
In general: get O(Lines + References) mutants Constant for O is large! Above 10 programs (200 lines): 231,972 mutants
Result: a testing method with minimal user input:
1. 2. 3.
Write code Write tests, possibly using a tool to generate cases Evaluate tests using mutants; if some non-equivalent mutant not killed, extend test set
Would probably focus on critical routines (unit testing)