Property-based testing an open-source compiler, pflua (FOSDEM 2015)
Property-based testing an
open-source compiler, pflua
A fast and easy way to find bugs
kbarone@igalia.com (luatime.org)
www.igalia.com
Katerina Barone-Adesi
Summary
● What is property-based testing?
● Why is it worth using?
● Property-based testing case study with pflua,
an open-source compiler
● How do you implement it in an afternoon?
● What tools already exist?
Why test?
● Reliability
● Interoperability
● Avoiding regressions
● … but this is the test room, so
hopefully people already think testing
is useful and necessary
Why property-based testing?
● Writing tests by hand is slow, boring,
expensive, and usually doesn't lead to
many tests being written
● Generating tests is cheaper, faster,
more flexible, and more fun
● Covers cases humans might not
Why is it more flexible?
● Have you ever written a set of unit
tests, then had to change them all by
hand as the code changes?
● It's a lot easier and faster to change
one part of test generation instead!
What is property-based testing?
● Choose a property (a statement that
should always be true), such as:
● somefunc(x, y) < 100
● sort(sort(x)) == sort(x) (for stable sorts)
● run(expr) == run(optimize(expr))
● our_app(input) == other_app(input)
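A property like sort-idempotence can be checked in a few lines of Lua. This is a generic sketch, not pflua code; `random_list`, `sorted_copy`, and `equal` are illustrative helpers:

```lua
-- Check sort(sort(x)) == sort(x) on random lists of random length.
-- Plain Lua sketch; helper names are illustrative, not from pflua.
local function random_list()
   local t = {}
   for i = 1, math.random(0, 10) do t[i] = math.random(0, 100) end
   return t
end

local function sorted_copy(t)
   local c = {}
   for i, v in ipairs(t) do c[i] = v end
   table.sort(c)
   return c
end

local function equal(a, b)
   if #a ~= #b then return false end
   for i = 1, #a do if a[i] ~= b[i] then return false end end
   return true
end

for i = 1, 100 do
   local once = sorted_copy(random_list())
   local twice = sorted_copy(once)
   assert(equal(once, twice), "sort is not idempotent")
end
```

If the assertion ever fires, the failing list is a counter-example; if not, that is evidence (not proof) of correctness.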
What is property-based testing not?
● A formal proof
● Exhaustive (except for very small types)
● What that means: property-based testing tries
to find counter-examples. If you find a counter-
example, something is wrong and must be
changed. If you don't, it's evidence (NOT proof)
towards that part of your program being correct.
Why not exhaustively test?
● Too difficult
● Too expensive
● Too resource-consuming (human and computer
time)
● Formal methods and state space reduction
have limitations
What is pflua?
● Pflua is a source-to-source compiler
● It takes libpcap's filter language (which we call
pflang) and emits Lua code
● Why? This lets us run the Lua code with LuaJIT
● Performance: better than libpcap, often by a
factor of two or more
● https://github.com/Igalia/pflua/
● Apache License, Version 2.0
What is pflang?
● The input for pflua, libpcap, and other tools
● Igalia's name for it, not an official name
● A language for defining packet filters
● Examples: “ip”, “tcp”, “tcp port 80”, …
● tcp port 80 and not host 192.168.0.1
● If you've used Wireshark or tcpdump,
you've used pflang
Case study: testing pflua
● Pflua already had two forms of testing,
and works in practice
● Andy Wingo and I implemented a
property-based checker in an
afternoon, with one property...
What was the test property?
● Lua code generated from optimized and
unoptimized IR has the same result on the
same random packet
● It compared two paths:
● Input → IR → optimize(IR) →
compile → run()
● Input → IR → (no change) →
compile → run()
What happened?
● We found 6 bugs (7, counting a LuaJIT
bug the same test found later)
● Some are ones we were unlikely to
find with testing by hand
● Remember: pflua is an already-tested,
working project
What were the bugs?
● Accidental comments: 8--2 is 8, not 10! (Lua)
● Invalid optimization: ntohs/ntohl
● Generating invalid Lua (return must end a block)
● Range analysis: range folding bug (→ inf)
● Range analysis: not setting range of len
● Range analysis: NaN (inf - inf is not your friend)
● + a LuaJIT bug, found later by the same test
Case study recap
● Property-based testing is useful even for
seemingly-working, seemingly-mature code
● We found 3 bugs in range analysis
● We were unlikely to have found all 3 bugs with
unit testing by hand
● This was code that appeared to work
● Typical use didn't cause any visible problem
● 4 of the 6 bugs were fixed that afternoon
Property-based testing: how?
● for i = 1, 100 do
     local g = generate_test_case()
     run_test_case(property, g)
  end
● Conceptually, it's that simple:
Generate and run tests (handling exceptions)
● With premade tools, you need a property,
and (sometimes) a random test generator
How to generate test cases
● The simplest version is unweighted choices:
function True() return { 'true' } end
function Comparison()
   return { ComparisonOp(), Arithmetic(), Arithmetic() }
end
...
function Logical()
   return choose({ Conditional, Comparison, True, False, Fail })()
end
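These generators rely on a `choose` helper that the slides don't show. A plausible minimal version (an assumption, not necessarily pflua's actual implementation) picks uniformly at random:

```lua
-- Illustrative 'choose': return one element of a list, uniformly at random.
-- In Logical() above, it picks one of the generator functions, and the
-- trailing () then calls the chosen generator to build a subtree.
local function choose(alternatives)
   return alternatives[math.random(#alternatives)]
end
```

Because `Logical` can choose `Conditional` or `Comparison`, which in turn call other generators, this recursion produces random expression trees of varying shape and depth.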
Are unweighted choices enough?
● math.random(0, 2^32-1)
● Property: 1/y <= y
● False iff y = 0
● Even 4 billion random test cases don't
guarantee this will be found...
● What are other common edge case numbers?
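The y = 0 failure is concrete in Lua, where dividing by zero yields inf rather than raising an error (a quick demonstration, not from the slides):

```lua
-- In Lua, 1/0 evaluates to math.huge (inf), so the property '1/y <= y'
-- is false at y = 0: inf <= 0 is false. A generator biased toward 0, 1,
-- and maximum values finds this immediately; uniform sampling over
-- 2^32 values almost never does.
assert(1/0 == math.huge)
assert(not (1/0 <= 0))
```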
Weighted choices
function Number()
   if math.random() < 0.2 then
      return math.random(0, 2^32 - 1)
   else
      return choose({ 0, 1, 2^32-1, 2^31-1 })
   end
end
Write your own checker!
for i = 1, iterations do
   local packet, packet_idx = choose(packets)
   local P, len = packet.packet, packet.len
   local random_ir = Logical()
   local unopt_lua = codegen.compile(random_ir)
   local optimized = optimize.optimize(random_ir)
   local opt_lua = codegen.compile(optimized)
   if unopt_lua(P, len) ~= opt_lua(P, len) then
      print_details_and_exit()
   end
end
Test generation problems
● Large, hard-to-analyze test cases
● Defaults to randomly searching the
solution space; spending 20% of your
1000 tests checking that plain 'false' is
still 'false' after optimization is a bit daft
What level to test?
● For a compiler: the front-end language? Various
levels of IR? Other?
● In general: input? Internal objects?
● Tradeoffs: whitebox testing with internals can
be useful, but may feed a system internal states
it could never create on its own
● Testing multiple levels is possible
● Tends to test edge cases of lower levels
Interaction with interface stability
● At any level, more flexible than hand unit
testing
● Interfaces change. Inputs hopefully change
rarely; internals may change often
● Property-based testing makes refactoring
cheaper and easier: less code to change when
internals change, more test coverage
It's still worth unit testing
● Use property-based testing to find bugs (and
classes of bugs)
● Use unit tests for avoiding regressions;
continue to routinely test code that has already
caused problems, to reduce the chances that
known bugs will be re-introduced
● Use unit testing if test generation is infeasible,
or for extremely rare paths
Reproducible tests
● There are some pitfalls to outputting a
random seed to re-run tests
● The RNG may not produce consistent
results across platforms or be stable
across upgrades
● (Rare) Bugs in your compiler / interpreter
/ libraries can hinder reproducibility
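One mitigation, with the caveats above still applying: seed the RNG explicitly and report the seed, so a failing run can at least be replayed on the same platform. A sketch; `seeded_run` is a hypothetical helper, not from pflua:

```lua
-- Seed explicitly and return the seed so failures can be replayed.
-- Caveat from the slides: math.random's sequence may still differ
-- across platforms and Lua/LuaJIT versions, so this is best-effort
-- reproducibility, not a guarantee.
local function seeded_run(test_fn, iterations, seed)
   seed = seed or os.time()
   math.randomseed(seed)
   for i = 1, iterations do test_fn() end
   return seed
end
```

Re-running with the returned seed replays the same random sequence, on the same platform and Lua version.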
Existing tools: QuickCheck
● Originally in Haskell; has been widely ported to
other languages
● Better tools for test case generation
● Allows filtering test cases
● Starts with small test cases
● QuickCheck2: test case minimization
The future of test generation
● Hypothesis, by David Ritchie MacIver (Python)
● https://github.com/DRMacIver/hypothesis
● Example database is better than saving seeds - it
propagates interesting examples between tests.
● Much smarter data generation
● Adapts to conditional tests better
● Blurs the lines between fuzz testing, conventional
unit testing, and property-based testing
Forward-looking Hypothesis
● The following are planned, but not implemented
● Using coverage information to drive example
generation
● Adding "combining rules" which allow you to
also express things like "set | set -> set" and
then it can test properties on those too.
● Better workflows around integrating into CI
● End-of-February 1.0 release predicted
Other stable tools
● ScalaCheck
● Quviq's QuickCheck for Erlang
● Have/inspired some of the benefits of
Hypothesis, but are already mature and widely
used
Conclusions
● Property-based testing finds tricky bugs and
saves time
● You can start it in an afternoon, with no tools
● There are some pretty helpful existing tools
(QuickCheck, Hypothesis, ScalaCheck, etc)
● Start property-based testing today!
● Or Monday, at least.
