art with code

2009-01-20

Low-boilerplate testing in OCaml

Continuing my search for a low-boilerplate way to write reliable code, I integrated quickcheck.ml (a QuickCheck clone adapted from Core, which is adapted from this) into Prelude.ml's test suite extractor. The test suite extractor parses the source code for tests embedded in special comments and creates an oUnit test suite from them. This way I can keep the tests near the code (where I think they should be) and compile and run them only when I want to: I run `omake -P test`, and it extracts and reruns the tests whenever it detects changes in the source files.

A further bonus of generating the tests is that I don't have to come up with names for them. The extractor automatically names the tests at extraction time (as test_filename_line_nnn.) This makes it a lot nicer to write tests, as you don't need to shift your focus from "what should the test do" to "what should the name be for what the test should do." The existence of the file name and line number in the auto-generated name also assures me that I _will_ find the failing test and _fast_.
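To make the extraction and naming step concrete, here is a minimal sketch of how such an extractor might work. It is not Prelude.ml's actual extractor: the function names (`starts_with`, `test_name`, `extract_tests`) are hypothetical, it only handles the `(**T` boolean test blocks, and it assumes the block markers sit on their own lines.

```ocaml
(* Hypothetical sketch, not the real extractor: collect boolean test blocks
   from source lines and name each one after the file and the line where
   the block opens. *)

(* True when [line], ignoring surrounding whitespace, starts with [prefix]. *)
let starts_with prefix line =
  let line = String.trim line in
  String.length line >= String.length prefix
  && String.sub line 0 (String.length prefix) = prefix

(* Build a name such as test_prelude_line_24 from a file name and a line. *)
let test_name filename line_no =
  let base = Filename.remove_extension (Filename.basename filename) in
  Printf.sprintf "test_%s_line_%d" base line_no

(* Return (name, body lines) for every boolean test block in [lines]. *)
let extract_tests filename lines =
  let rec scan line_no current acc = function
    | [] -> List.rev acc
    | line :: rest ->
      (match current with
       | None when starts_with "(**T" line ->
         (* Block opens here: remember its line number for the name. *)
         scan (line_no + 1) (Some (line_no, [])) acc rest
       | None -> scan (line_no + 1) None acc rest
       | Some (start, body) when starts_with "**)" line ->
         (* Block closes: emit the named test. *)
         let test = (test_name filename start, List.rev body) in
         scan (line_no + 1) None (test :: acc) rest
       | Some (start, body) ->
         scan (line_no + 1) (Some (start, String.trim line :: body)) acc rest)
  in
  scan 1 None [] lines
```

Each extracted body line could then be wrapped in an assertion and emitted into the generated suite; that code generation step is left out here.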

Previously I had embedded unit test bodies (which I haven't used much) and lists of boolean tests (which most of my tests are.) Now I have added embedded QuickCheck laws, so a single property check can perhaps replace a dozen manually written tests.

let reverse l =
  let rec aux res l = match l with
    | [] -> res
    | (h::t) -> aux (h::res) t in
  aux [] l
(***
  (* embedded test body *)
  assert_equal (reverse []) [];
  assert_equal (reverse [1]) [1];
  assert_equal (reverse (1--10)) (10--1)
**)
(**T
  (* embedded boolean test list *)
  reverse [] = []
  reverse [1] = [1]
  reverse (1--10) = (10--1)
**)
(**Q
  (* embedded QuickCheck law list *)
  Q.list ~size_gen:(fun _ -> Random.int 2) Q.uig (fun l -> reverse l = l)
  Q.list Q.uig (fun l -> reverse (reverse l) = l)
  Q.list Q.uig (fun l -> reverse l = list (areverse (array l)))
  Q.list Q.cg (fun l -> reverse l = explode (sreverse (implode l)))
**)

As you may notice, the QuickCheck laws are generally longer to write than the tests themselves. But you shouldn't need as many laws as tests.
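For readers who haven't used QuickCheck, this is roughly what checking a law amounts to. The sketch below is a self-contained stand-in, not the quickcheck.ml API: `gen_list`, `gen_int` and `check_law` are hypothetical names, and the size limits and the default of 100 runs are arbitrary choices.

```ocaml
(* Hypothetical stand-in for a QuickCheck-style law runner; this is not
   the quickcheck.ml API. *)

(* Generate a list of 0 to 19 elements, calling [gen] for each element. *)
let gen_list gen () =
  List.init (Random.int 20) (fun _ -> gen ())

(* Small non-negative integers. *)
let gen_int () = Random.int 1000

(* Run [law] on [count] generated inputs and return the first
   counterexample found, or None if every input satisfied the law. *)
let check_law ?(count = 100) gen law =
  let rec loop i =
    if i >= count then None
    else
      let input = gen () in
      if law input then loop (i + 1) else Some input
  in
  loop 0

(* The function under test, as defined in the post. *)
let reverse l =
  let rec aux res l = match l with
    | [] -> res
    | h :: t -> aux (h :: res) t in
  aux [] l

(* Reversing twice gives back the original list, so no counterexample. *)
let () =
  assert (check_law (gen_list gen_int) (fun l -> reverse (reverse l) = l) = None)
```

One law like this covers the empty list, the singleton and the longer cases that the boolean test list spells out one by one.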

I haven't yet used the embedded QuickCheck tests for anything, so I can't give any real-world statistics. Maybe by the end of the week? Though the code that I'm currently testing is IO, and that tends to require whole embedded test bodies (as you need setup for temp file creation and deletion and state change checks.) Hm, maybe there's a nice way to abstract all that fiddling into an unfold of some kind...
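One possible way to fold the temp file setup and deletion into a helper is a bracket-style wrapper rather than an unfold. This is only a sketch under assumptions: `with_temp_file`, `write_file` and `read_file` are hypothetical names, and `Fun.protect` requires OCaml 4.08 or later.

```ocaml
(* Hypothetical helper: run [f] with the path of a fresh temp file and
   always delete the file afterwards, even if [f] raises. *)
let with_temp_file ?(prefix = "test") ?(suffix = ".tmp") f =
  let path = Filename.temp_file prefix suffix in
  Fun.protect
    ~finally:(fun () -> if Sys.file_exists path then Sys.remove path)
    (fun () -> f path)

(* Write [contents] to [path], replacing whatever was there. *)
let write_file path contents =
  let oc = open_out_bin path in
  Fun.protect ~finally:(fun () -> close_out oc)
    (fun () -> output_string oc contents)

(* Read the whole file at [path] into a string. *)
let read_file path =
  let ic = open_in_bin path in
  Fun.protect ~finally:(fun () -> close_in ic)
    (fun () -> really_input_string ic (in_channel_length ic))

(* An IO test body then shrinks to the part that matters. *)
let () =
  with_temp_file (fun path ->
    write_file path "hello";
    assert (read_file path = "hello"))
```

With the cleanup hidden in the helper, an IO check becomes a single expression again, which might make it fit into a boolean test list.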

Future directions

The FsCheck port of QuickCheck contains a shrinking feature that reduces a failing test input to a smaller one that still fails. That would be nice to have.
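To illustrate the idea of shrinking, here is a minimal greedy shrinker for list inputs. It is not FsCheck's algorithm, only a sketch: `drop_one` and `shrink` are hypothetical names, and the strategy (delete one element at a time while the law keeps failing) is the simplest one I could think of.

```ocaml
(* Hypothetical greedy shrinker for list inputs; not the FsCheck algorithm. *)

(* All lists obtained from [l] by deleting exactly one element. *)
let rec drop_one = function
  | [] -> []
  | x :: rest -> rest :: List.map (fun r -> x :: r) (drop_one rest)

(* Starting from an input that fails [law], keep moving to a one-element-
   shorter list that still fails, until no such list exists. *)
let rec shrink law failing =
  match List.find_opt (fun l -> not (law l)) (drop_one failing) with
  | Some smaller -> shrink law smaller
  | None -> failing

(* A deliberately false law: "no list contains 7". *)
let no_sevens l = not (List.mem 7 l)

(* The failing input [1; 7; 3; 7; 5] shrinks to the minimal case [7]. *)
let () = assert (shrink no_sevens [1; 7; 3; 7; 5] = [7])
```

A report that shows `[7]` instead of the original random list points much more directly at the cause of the failure.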

My goal for the testing system is minimizing the amount of extraneous stuff I have to write to get the information I want from the code. To explain:

Tests give me information about the code. Specifically: does this function behave as expected when called with this argument.

I can only write a limited number of lines of code per day (physically I can sustain ~300 loc / day; 1000 will give me RSI by the end of the day.) The more boilerplate I have to write, the less real code I can write.

With a budget of 300 lines, a code:tests ratio of 1:4 means that I can write 300 / 5 = 60 lines of code per day. The rest of the budget is taken by the 240 lines of tests.

There are several ways to get at the testing information. Finding a way that expends less effort on my part will boost my productivity (and my hands will be thankful as well.)

Furthermore, I make mistakes when writing code. I also make mistakes when writing tests, such as leaving out tests that would have discovered mistakes in the code. The testing system should seek to find the coding mistakes and hence also minimize the number of testing mistakes (QuickCheck and other automated test generators help there.)

The more tests the computer writes, the more and better quality code I can write.

2 comments:

Anonymous said...

Did you also have a look at Kaputt? http://kaputt.x9c.fr

Ilmari Heikkinen said...

I looked at Kaputt now. It has more generators than quickcheck.ml and a cleaner design. The oUnit-clone side of it isn't really of interest to me, as I'm already using oUnit. I could use its generator.ml as a test generation backend, but it feels like less bother to refactor quickcheck.ml (hubris speaking.)

The Bisect code coverage tool by the same author looks interesting too. http://bisect.x9c.fr
