Dynamic Symbolic Database Application Testing

Overview
Alternative Methods
Details of the Method
Implementation
Ongoing and Future Work
Chengkai Li, Christoph Csallner
University of Texas at Arlington
June 7, 2010
DBTest 2010

Motivation
Maximizing code coverage is an important goal in testing.
In database applications, the input can be user-supplied queries.
Query results are used as program values in the program logic.
Different queries thus result in different execution paths.
To maximize code coverage, we need to enumerate queries in
an effective way.

Our Method
Generate queries dynamically by inverting branching conditions in
existing program execution paths.
1. Monitor the program’s execution paths by dynamic symbolic
   execution (e.g., Dart, Pex).
2. Invert a branching condition on some covered path to obtain a new
   test query.
3. Execute the query to bring in new tuples.
4. The new tuples cover new paths.
5. Repeat steps 1-4.

Illustration of the Idea
After the initial query
q1=c1 ∧ c2
Execution tree (maintained by dynamic symbolic engine):
each path to a leaf node represents an execution path, encountered
for tuples satisfying the branching conditions on the path.
[Figure: execution tree rooted at true, with nodes c1, c2, c3, c4, c5, !c5, and !c4; labeled (q1).]
if (z > 0) {        // c1
    if (z < 100)    // c2
        // ..
}
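As a rough illustration of how the two branches above produce an execution path, the following Python sketch records which of c1 and c2 (or their negations) a given value of z encounters. The function name `path_for` and the string labels are assumptions made for this example only; a dynamic symbolic engine would derive this information by instrumenting the program.

```python
def path_for(z):
    """Return the branch conditions encountered for value z (hypothetical helper)."""
    path = []
    if z > 0:
        path.append("c1")       # z > 0 holds
        if z < 100:
            path.append("c2")   # z < 100 holds
        else:
            path.append("!c2")  # z < 100 fails
    else:
        path.append("!c1")      # z > 0 fails
    return path


print(path_for(5), path_for(150), path_for(-1))
```

Each distinct returned list corresponds to one root-to-leaf path in the execution tree.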

After the initial query, the candidate queries
Each dashed edge represents an inverted branching condition and thus
a candidate query.
[Figure: the execution tree with nodes c1, c2, c3, c4, c5, !c5, and !c4, plus the inverted conditions !c3, !c2, and !c1; labeled (q1).]
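One way to picture how candidate queries fall out of the covered paths is the sketch below. It assumes that each path is a list of condition labels, that the covered paths after q1 are the three listed in `covered` (a reading of the figure, not something the slides state), and that a candidate is any covered prefix whose last condition is inverted and not yet covered. The helper names are hypothetical.

```python
def negate(cond):
    """Flip a condition label: 'c1' <-> '!c1'."""
    return cond[1:] if cond.startswith("!") else "!" + cond


def candidate_queries(covered_paths):
    """Invert the last condition of every covered prefix; keep uncovered results."""
    prefixes = set()
    for path in covered_paths:
        for i in range(1, len(path) + 1):
            prefixes.add(tuple(path[:i]))
    candidates = set()
    for prefix in prefixes:
        flipped = prefix[:-1] + (negate(prefix[-1]),)
        if flipped not in prefixes:
            candidates.add(flipped)
    # Shortest candidates first, for a stable and readable order.
    return sorted(candidates, key=lambda p: (len(p), p))


# Assumed covered paths after the initial query q1.
covered = [
    ["c1", "c2", "c3", "c4", "c5"],
    ["c1", "c2", "c3", "c4", "!c5"],
    ["c1", "c2", "c3", "!c4"],
]
for query in candidate_queries(covered):
    print(" AND ".join(query))
```

Under these assumptions the candidates end in !c1, !c2, and !c3, matching the three dashed edges.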

The second test query
q2=!c1
[Figure: execution tree with nodes c1, c2, c3, c4, c5, !c5, !c4, !c3, !c2, and !c1; labeled (q1) and (q2).]

After the second test query
q2=!c1
candidate queries are again dashed.
[Figure: execution tree extended below !c1 with nodes c6, c7, !c7, c11, !c11, c12, !c12, and !c6; labeled (q1) and (q2).]

The third test query
q3=!c1 ∧ c6 ∧ c7
[Figure: execution tree with nodes c1 through c7, c11, c12, and their negations; labeled (q1), (q2), and (q3).]

After the third test query
q3=!c1 ∧ c6 ∧ c7
[Figure: execution tree extended below c7 with nodes c8, c9, !c9, !c8, c10, and !c10; labeled (q1), (q2), and (q3).]

The fourth test query
q4=!c1 ∧ c6∧!c7∧!c11∧!c12
[Figure: execution tree with nodes c1 through c12 and their negations; labeled (q1), (q2), (q3), and (q4).]

After the fourth test query
q4=!c1 ∧ c6∧!c7∧!c11∧!c12
[Figure: execution tree with nodes c1 through c12 and their negations; labeled (q1), (q2), (q3), and (q4).]

Advantages of the Proposed Method
Real data, no mock database (which can be hard to generate).
No need to worry about whether a mock database is representative.
Given the large space of possible program paths, we test only those
that can be encountered for real data.
This is especially useful for applications that only read existing
data.

Alternative Method 1: Brute force
Test for every tuple in the database.
Too costly
Limited resources in testing.
Many tuples result in the same execution path, so effort is wasted.
May not be possible to get all the tuples
Security constraints.
Query capability constraints (e.g., deep-Web databases).

Alternative Method 2: Sample the existing database
Sample first, then test for every tuple in the sample.
A representative database sample may not trigger a set of program
execution paths that is representative of the paths encountered
in production use.
E.g., a column with 1 million distinct values, of which only a few
particular values trigger certain paths.
Our method can be viewed as a sampling technique that is aware of the
program structure.
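To give a feel for the sampling example, here is a small back-of-the-envelope sketch. It assumes, purely for illustration, that the column values are uniformly distributed and that sampled tuples are drawn independently; the sample size of 1,000 and the count of 3 special values are made-up numbers, not figures from the talk.

```python
def miss_probability(distinct_values, special_values, sample_size):
    """Probability that a uniform, independent sample contains no special value."""
    p_hit = special_values / distinct_values
    return (1.0 - p_hit) ** sample_size


# 1 million distinct values, 3 of which trigger particular paths (assumed numbers).
p = miss_probability(1_000_000, 3, 1_000)
print(f"chance that the sample misses all special values: {p:.4f}")
```

Under these assumptions the sample almost always misses the path-triggering values (the probability is about 0.997), which is the weakness a structure-aware method aims to avoid.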

Alternative Method 3: Generate custom mock databases
Generate a mock database such that its data will expose a bug in the
program.
Will expose potential program bugs.
But users may not care about them.
Because many “bugs” will never occur in practice.
Because the mock database generator typically cannot generate
fully realistic databases.

Alternative Method 4: Static Analysis
Static program analysis is typically:
(+) Fast
(-) Imprecise: misses bugs and gives false alarms
Our approach: Test = execute the program (dynamic analysis)
(+) Fully precise: no false alarms
(-) Resource-hungry, will still miss bugs
Our (dynamic) analysis reasons about program + existing database
contents. We are not aware of any static analysis that does that.

Assumptions/Limitations
Queries
Single-relation conjunctive selection queries.
Each conjunct is a ⊙ v, where a is an attribute, v is a constant
value, and ⊙ can be <, ≤, >, ≥, =, or ≠.
No grouping, aggregation, join, insertion, deletion, or update.
Programs
Follow tuple-wise semantics.
If a branching condition depends on a database tuple, the
condition can be rewritten in the same form as the query
conjuncts: a ⊙ v.
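The restricted query form makes inversion mechanical. The sketch below assumes that a conjunct is stored as an (attribute, operator, value) triple and rendered as a parameterized SQL string; the names `negate_conjunct` and `to_sql` and the table name `r` are hypothetical, and flipping the operator ignores SQL NULL semantics.

```python
# Each comparison operator and its logical inverse (SQL spelling of "not equal" is <>).
NEGATED = {"<": ">=", "<=": ">", ">": "<=", ">=": "<", "=": "<>", "<>": "="}


def negate_conjunct(conjunct):
    """Invert a single conjunct of the form (attribute, operator, value)."""
    attr, op, value = conjunct
    return (attr, NEGATED[op], value)


def to_sql(table, conjuncts):
    """Render a single-relation conjunctive selection as SQL text plus parameters."""
    if not conjuncts:
        return f"SELECT * FROM {table}", []
    where = " AND ".join(f"{attr} {op} ?" for attr, op, _ in conjuncts)
    params = [value for _, _, value in conjuncts]
    return f"SELECT * FROM {table} WHERE {where}", params


c1 = ("z", ">", 0)
c2 = ("z", "<", 100)
print(to_sql("r", [c1, c2]))                   # query for c1 AND c2
print(to_sql("r", [c1, negate_conjunct(c2)]))  # query for c1 AND !c2
```

Values are passed as parameters rather than spliced into the string, which keeps the sketch usable with any DB-API style driver.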

Iterative Testing Method
q ← define an initial test query; 𝒬 ← {q}
repeat
    𝒯 ← run q and get the first nq result tuples
    for each tuple t in 𝒯 do
        run the program over t and update the execution tree tree𝒬
        with encountered new execution paths
    \overline{tree𝒬} ← the complement tree of tree𝒬
    𝒬c ← get the candidate queries based on \overline{tree𝒬}
    q ← select a query from 𝒬c
    𝒬 ← 𝒬 ∪ {q}
until stopping criteria satisfied
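The loop above can be sketched in Python as follows. This is a toy model under several assumptions: the database is an in-memory list of dicts, conditions are named predicates, the program under test is the two-branch example on z, and the next query is simply the first candidate rather than one chosen by a score. All function names are hypothetical.

```python
def negate(cond):
    return cond[1:] if cond.startswith("!") else "!" + cond


def holds(label, t, conditions):
    """Evaluate a possibly negated condition label on tuple t."""
    if label.startswith("!"):
        return not conditions[label[1:]](t)
    return conditions[label](t)


def run_query(table, query, conditions, n_q):
    """Return the first n_q tuples satisfying every conjunct of the query."""
    result = [t for t in table if all(holds(c, t, conditions) for c in query)]
    return result[:n_q]


def candidates(paths, tried):
    """Inverted prefixes that are neither covered nor already tried."""
    prefixes = {p[:i] for p in paths for i in range(1, len(p) + 1)}
    out = []
    for prefix in sorted(prefixes, key=lambda p: (len(p), p)):
        flipped = prefix[:-1] + (negate(prefix[-1]),)
        if flipped not in prefixes and flipped not in tried:
            out.append(flipped)
    return out


def iterative_test(table, program, conditions, initial_query, n_q=10, max_queries=20):
    paths, tried, q = set(), [], initial_query
    while q is not None and len(tried) < max_queries:  # resource limit
        tried.append(q)
        for t in run_query(table, q, conditions, n_q):
            paths.add(program(t))                      # update the execution tree
        cands = candidates(paths, tried)
        q = cands[0] if cands else None                # no more candidate queries
    return paths, tried


conditions = {"c1": lambda t: t["z"] > 0, "c2": lambda t: t["z"] < 100}


def program(t):
    """Toy program under test: returns the path taken for tuple t."""
    if t["z"] > 0:
        return ("c1", "c2") if t["z"] < 100 else ("c1", "!c2")
    return ("!c1",)


table = [{"z": 5}, {"z": 50}, {"z": 150}, {"z": -3}]
paths, tried = iterative_test(table, program, conditions, ("c1", "c2"))
print(sorted(paths), tried)
```

On this toy table three queries suffice to reach all three paths, whereas brute force would run the program on every tuple.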

Challenges
How to
decide how many tuples to retrieve for a query?
choose the next test query?
design a stopping condition for testing?

Optimization Goals
Given program $\mathcal{P}$ and a set of test queries $\mathcal{Q} = \{q_i\}$
maximize coverage
\[ Path(\mathcal{P}, \mathcal{R}, \mathcal{Q}) = \{ Path_t \mid t \in \bigcup_i \mathcal{T}_i \} \]
where $\mathcal{T}_i$ is the first $n_i$ tuples for query $q_i$.
minimize cost
\[ cost(\mathcal{Q}) = \sum_i cost(q_i) \]
\[ cost(q_i) = q\_cost(q_i) + t\_cost(q_i) = w + c \times n_i + t \times n_i \]
t_cost: $t$ is the test cost per tuple.
q_cost: $w$ is the query cost to get the first result tuple, $c$ is the query cost to
get each additional tuple.
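A worked instance of the cost formula may help. The sketch below evaluates cost(q_i) = w + c × n_i + t × n_i exactly as written on the slide; the numeric values of w, c, and t are invented for illustration, and using the same w, c, and t for every query is a simplifying assumption.

```python
def query_cost(n, w, c, t):
    """cost(q_i) = q_cost + t_cost = (w + c * n_i) + (t * n_i)."""
    return w + c * n + t * n


def total_cost(ns, w, c, t):
    """cost(Q) = sum of cost(q_i) over all test queries."""
    return sum(query_cost(n, w, c, t) for n in ns)


# Assumed unit costs: w = 5, c = 1, t = 2; two queries retrieving 10 and 20 tuples.
print(query_cost(10, w=5, c=1, t=2))        # 5 + 10 + 20 = 35
print(total_cost([10, 20], w=5, c=1, t=2))  # 35 + 65 = 100
```

Because both the c and t terms grow linearly in n_i, retrieving fewer tuples per query lowers both the query cost and the testing cost.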

Why only ni tuples for a query qi?
Multiple tuples will result in the same program execution path. After a
certain number of initial tuples, most or all distinct paths may have
been encountered.
Fewer retrieved and tested tuples mean both lower testing cost and lower
query execution cost.

How to choose next q and n
Greedy Approach
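Given candidate query q,
\[ score(q) = \frac{cost'(q)}{|Path'(\mathcal{P}, \mathcal{R}, \mathcal{M}, \mathcal{Q} \cup \{q\})| - |Path(\mathcal{P}, \mathcal{R}, \mathcal{M}, \mathcal{Q})|} \]
$|Path'(\mathcal{P}, \mathcal{R}, \mathcal{M}, \mathcal{Q} \cup \{q\})|$: estimate of $|Path(\mathcal{P}, \mathcal{R}, \mathcal{M}, \mathcal{Q} \cup \{q\})|$
$cost'(q)$: estimate of $cost(q)$
(both are functions of n)
find q that minimizes score(q)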
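The greedy choice amounts to picking the candidate with the lowest estimated cost per newly covered path. The sketch below assumes the estimates are already available as plain numbers for a fixed n; the candidate names and values are invented, and treating a candidate with no expected gain as infinitely expensive is a choice made for this example.

```python
def score(est_cost, est_paths_with_q, current_paths):
    """Estimated cost per newly covered path; lower is better."""
    gain = est_paths_with_q - current_paths
    if gain <= 0:
        return float("inf")  # assumption: a query with no expected gain is never chosen
    return est_cost / gain


def pick_next(candidates, current_paths):
    """candidates maps a query name to (estimated cost, estimated path count with q)."""
    return min(candidates, key=lambda q: score(*candidates[q], current_paths))


# Hypothetical estimates with 5 paths covered so far.
candidates = {"qa": (30.0, 8), "qb": (20.0, 6), "qc": (50.0, 5)}
print(pick_next(candidates, current_paths=5))
```

Here qa wins: it costs more than qb in total, but only 10 per new path against 20 for qb.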

Estimating the Coverage and Cost
Estimating the Coverage
Estimate the query result size of leaf node (query).
The result sizes for intermediate nodes are accumulated.
c1 (100)
    c2 (20)
        c3 (10)
        !c3 (10)
    !c2 (80)
        c4 (40)
            c5 (30)
            !c5 (10)
        !c4 (40)
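The accumulation step can be sketched as a bottom-up sum over the tree. The code below assumes that the tree is given as a mapping from each node to its children and that estimated sizes are known only for leaves; the tree shape is a reading of the figure in which c2 and !c2 are the children of c1, and the function name is hypothetical.

```python
def accumulate(tree, leaf_sizes, root):
    """Return estimated result sizes for all nodes, summing children into parents."""
    sizes = {}

    def visit(node):
        children = tree.get(node, [])
        if children:
            sizes[node] = sum(visit(child) for child in children)
        else:
            sizes[node] = leaf_sizes[node]  # estimated query result size of a leaf
        return sizes[node]

    visit(root)
    return sizes


# Tree shape and leaf estimates as read from the figure.
tree = {"c1": ["c2", "!c2"], "c2": ["c3", "!c3"], "!c2": ["c4", "!c4"], "c4": ["c5", "!c5"]}
leaf_sizes = {"c3": 10, "!c3": 10, "c5": 30, "!c5": 10, "!c4": 40}
sizes = accumulate(tree, leaf_sizes, "c1")
print(sizes["c1"], sizes["c2"], sizes["!c2"], sizes["c4"])
```

The accumulated values reproduce the numbers in the figure: 100 at c1, 20 at c2, 80 at !c2, and 40 at c4.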
Estimating the Cost
Estimate both the initial-tuple cost and the total cost.
Use EXPLAIN (supported by major DBMSs).

Stopping Condition for Testing
testing resource limit reached
no more candidate queries
no candidate query can return non-empty result
total number of encountered tuples (associated with distinct
paths) equals the table size

Implementation: Overview
Fully automated tool
Analyze Java bytecode programs (any Java program, no need for
source code)
Rewrite application bytecode at load-time: after each application
bytecode instruction, insert a call to our dynamic symbolic engine
Use inserted calls to maintain an accurate symbolic
representation of program state
Treat calls to the database (e.g., via JDBC) differently: represent
returned values as symbolic variables and track how the program
uses them, i.e., in path conditions

Implementation: Details
Use Java 5 instrumentation facilities
Use the third-party open-source bytecode instrumentation framework
ASM
Implement on top of the new dynamic symbolic engine Dsc:
Allows handling of regular (non-query) program inputs
Solve constraints on regular program inputs with the powerful
third-party satisfiability modulo theories (SMT) constraint solver
Z3

Several directions
Finish prototype implementation
Evaluate on realistic applications
Compare with mock-database generation techniques and with
traditional database sampling techniques:
Can we achieve higher coverage of the application code that is
reachable with the existing database contents?
How to deal with database insert, update, delete?

Thank you!
Contact
cli@uta.edu, csallner@uta.edu

References
Dynamic Symbolic Execution Systems
Dart: C programs, by Godefroid et al. [PLDI’05]
jCute: Java programs, by Sen et al. [CAV’06]
Klee: C programs, by Cadar et al. [OSDI’08]
Pex: .Net programs (C#, etc.), by Tillmann et al. [TAP’08]
Database application testing via mock database generation
jCute extension: Java programs, by Emmi et al. [ISSTA’07]
Qex (Pex extension): .Net programs (C#, etc.), by Veanes et al. [ICFEM’09]

Main tools used by our prototype implementation
ASM: http://asm.ow2.org/
Z3: http://research.microsoft.com/en-us/um/redmond/projects/z3/
