Property-based testing an open-source compiler, pflua (FOSDEM 2015)
Property-based testing an
open-source compiler, pflua
A fast and easy way to find bugs
kbarone@igalia.com (luatime.org)
www.igalia.com
Katerina Barone-Adesi
Summary
● What is property-based testing?
● Why is it worth using?
● Property-based testing case study with pflua,
an open-source compiler
● How do you implement it in an afternoon?
● What tools already exist?
Why test?
● Reliability
● Interoperability
● Avoiding regressions
● … but this is the test room, so
hopefully people already think testing
is useful and necessary
Why property-based testing?
● Writing tests by hand is slow, boring,
expensive, and usually doesn't lead to
many tests being written
● Generating tests is cheaper, faster,
more flexible, and more fun
● Covers cases humans might not
Why is it more flexible?
● Have you ever written a set of unit
tests, then had to change them all by
hand as the code changes?
● It's a lot easier and faster to change
one part of test generation instead!
What is property-based testing?
● Choose a property (a statement that
should always be true), such as:
● somefunc(x, y) < 100
● sort(sort(x)) == sort(x) (for stable sorts)
● run(expr) == run(optimize(expr))
● our_app(input) == other_app(input)
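A property like sort-idempotence can be checked in a few lines of Lua. This is a generic sketch, not pflua code; `random_list`, `sorted_copy`, and `equal` are illustrative helpers:

```lua
-- Check sort(sort(x)) == sort(x) on random lists of random length.
-- Plain Lua sketch; helper names are illustrative, not from pflua.
local function random_list()
   local t = {}
   for i = 1, math.random(0, 10) do t[i] = math.random(0, 100) end
   return t
end

local function sorted_copy(t)
   local c = {}
   for i, v in ipairs(t) do c[i] = v end
   table.sort(c)
   return c
end

local function equal(a, b)
   if #a ~= #b then return false end
   for i = 1, #a do if a[i] ~= b[i] then return false end end
   return true
end

for i = 1, 100 do
   local once = sorted_copy(random_list())
   local twice = sorted_copy(once)
   assert(equal(once, twice), "sort is not idempotent")
end
```

If the assertion ever fires, the failing list is a counter-example; if not, that is evidence (not proof) of correctness.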
What is property-based testing not?
● A formal proof
● Exhaustive (except for very small types)
● What that means: property-based testing tries
to find counter-examples. If you find a counter-
example, something is wrong and must be
changed. If you don't, it's evidence (NOT proof)
towards that part of your program being correct.
Why not exhaustively test?
● Too difficult
● Too expensive
● Too resource-consuming (human and computer
time)
● Formal methods and state space reduction
have limitations
What is pflua?
● Pflua is a source-to-source compiler
● It takes libpcap's filter language (which we call
pflang) and emits Lua code
● Why? This lets us run the Lua code with LuaJIT
● Performance: better than libpcap, often by a
factor of two or more
● https://github.com/Igalia/pflua/
● Apache License, Version 2.0
What is pflang?
● The input for pflua, libpcap, and other tools
● Igalia's name for it, not an official name
● A language for defining packet filters
● Examples: “ip”, “tcp”, “tcp port 80”, …
● tcp port 80 and not host 192.168.0.1
● If you've used Wireshark or tcpdump,
you've used pflang
Case study: testing pflua
● Pflua already had two forms of testing,
and works in practice
● Andy Wingo and I implemented a
property-based checker in an
afternoon, with one property...
What was the test property?
● Lua code generated from optimized and
unoptimized IR has the same result on the
same random packet
● It compared two paths:
● Input → IR → optimize(IR) →
compile → run()
● Input → IR → (no change) →
compile → run()
What happened?
● We found 6 bugs (7, counting a LuaJIT
bug the same test found later)
● Some are ones we were unlikely to
find with testing by hand
● Remember: pflua is an already-tested,
working project
What were the bugs?
● Accidental comments: 8--2 is 8, not 10! (Lua)
● Invalid optimization: ntohs/ntohl
● Generating invalid Lua (return must end a block)
● Range analysis: range folding bug (→ inf)
● Range analysis: not setting range of len
● Range analysis: NaN (inf - inf is not your friend)
● + a LuaJIT bug, found later by the same test
Case study recap
● Property-based testing is useful even for
seemingly-working, seemingly-mature code
● We found 3 bugs in range analysis
● We were unlikely to have found all 3 bugs with
unit testing by hand
● This was code that appeared to work
● Typical use didn't cause any visible problem
● 4 of the 6 bugs were fixed that afternoon
Property-based testing: how?
● for i = 1, 100 do
     local g = generate_test_case()
     run_test_case(property, g)
  end
● Conceptually, it's that simple:
Generate and run tests (handling exceptions)
● With premade tools, you need a property,
and (sometimes) a random test generator
How to generate test cases
● The simplest version is unweighted choices:
function True() return { 'true' } end
function Comparison()
   return { ComparisonOp(), Arithmetic(), Arithmetic() }
end
...
function Logical()
   return choose({ Conditional, Comparison, True, False, Fail })()
end
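These generators rely on a `choose` helper that the slides don't show. A plausible minimal version (an assumption, not necessarily pflua's actual implementation) picks uniformly at random:

```lua
-- Illustrative 'choose': return one element of a list, uniformly at random.
-- In Logical() above, it picks one of the generator functions, and the
-- trailing () then calls the chosen generator to build a subtree.
local function choose(alternatives)
   return alternatives[math.random(#alternatives)]
end
```

Because `Logical` can choose `Conditional` or `Comparison`, which in turn call other generators, this recursion produces random expression trees of varying shape and depth.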
Are unweighted choices enough?
● math.random(0, 2^32-1)
● Property: 1/y <= y
● False iff y = 0
● Even 4 billion random test cases don't
guarantee this will be found...
● What are other common edge case numbers?
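The y = 0 failure is concrete in Lua, where dividing by zero yields inf rather than raising an error (a quick demonstration, not from the slides):

```lua
-- In Lua, 1/0 evaluates to math.huge (inf), so the property '1/y <= y'
-- is false at y = 0: inf <= 0 is false. A generator biased toward 0, 1,
-- and maximum values finds this immediately; uniform sampling over
-- 2^32 values almost never does.
assert(1/0 == math.huge)
assert(not (1/0 <= 0))
```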
Weighted choices
function Number()
   if math.random() < 0.2 then
      return math.random(0, 2^32 - 1)
   else
      return choose({ 0, 1, 2^32-1, 2^31-1 })
   end
end
Write your own checker!
for i = 1, iterations do
   local packet, packet_idx = choose(packets)
   local P, len = packet.packet, packet.len
   local random_ir = Logical()
   local unopt_lua = codegen.compile(random_ir)
   local optimized = optimize.optimize(random_ir)
   local opt_lua = codegen.compile(optimized)
   if unopt_lua(P, len) ~= opt_lua(P, len) then
      print_details_and_exit()
   end
end
Test generation problems
● Large, hard-to-analyze test cases
● Defaults to randomly searching the
solution space; spending 20% of your
1000 tests checking that plain 'false' is
still 'false' after optimization is a bit daft
What level to test?
● For a compiler: the front-end language? Various
levels of IR? Other?
● In general: input? Internal objects?
● Tradeoffs: whitebox testing with internals can
be useful, but may feed a system internal states
it could never create on its own
● Testing multiple levels is possible
● Tends to test edge cases of lower levels
Interaction with interface stability
● At any level, more flexible than hand unit
testing
● Interfaces change. Inputs hopefully change
rarely; internals may change often
● Property-based testing makes refactoring
cheaper and easier: less code to change when
internals change, more test coverage
It's still worth unit testing
● Use property-based testing to find bugs (and
classes of bugs)
● Use unit tests for avoiding regressions;
continue to routinely test code that has already
caused problems, to reduce the chances that
known bugs will be re-introduced
● Use unit testing if test generation is infeasible,
or for extremely rare paths
Reproducible tests
● There are some pitfalls to outputting a
random seed to re-run tests
● The RNG may not produce consistent
results across platforms or be stable
across upgrades
● (Rare) Bugs in your compiler / interpreter
/ libraries can hinder reproducibility
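One mitigation, with the caveats above still applying: seed the RNG explicitly and report the seed, so a failing run can at least be replayed on the same platform. A sketch; `seeded_run` is a hypothetical helper, not from pflua:

```lua
-- Seed explicitly and return the seed so failures can be replayed.
-- Caveat from the slides: math.random's sequence may still differ
-- across platforms and Lua/LuaJIT versions, so this is best-effort
-- reproducibility, not a guarantee.
local function seeded_run(test_fn, iterations, seed)
   seed = seed or os.time()
   math.randomseed(seed)
   for i = 1, iterations do test_fn() end
   return seed
end
```

Re-running with the returned seed replays the same random sequence, on the same platform and Lua version.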
Existing tools: QuickCheck
● Originally in Haskell; has been widely ported to
other languages
● Better tools for test case generation
● Allows filtering test cases
● Starts with small test cases
● QuickCheck2: test case minimization
The future of test generation
● Hypothesis, by David Ritchie MacIver (Python)
● https://github.com/DRMacIver/hypothesis
● Example database is better than saving seeds - it
propagates interesting examples between tests.
● Much smarter data generation
● Adapts to conditional tests better
● Blurs the lines between fuzz testing, conventional
unit testing, and property-based testing
Forward-looking Hypothesis
● The following are planned, but not implemented
● Using coverage information to drive example
generation
● Adding "combining rules" which allow you to
also express things like "set | set -> set" and
then it can test properties on those too.
● Better workflows around integrating into CI
● End-of-February 1.0 release predicted
Other stable tools
● ScalaCheck
● Quviq's QuickCheck for Erlang
● Have/inspired some of the benefits of
Hypothesis, but are already mature and widely
used
Conclusions
● Property-based testing finds tricky bugs and
saves time
● You can start it in an afternoon, with no tools
● There are some pretty helpful existing tools
(QuickCheck, Hypothesis, ScalaCheck, etc)
● Start property-based testing today!
● Or Monday, at least.
