Introduction
This is a book about property-based testing. It is also a book about Hegel, our family of property-based testing libraries for a variety of languages.
We assume you’re familiar with software testing but not necessarily property-based testing. If you’ve used another property-based testing library, we expect this book will still be useful for you (and not just for the Hegel content), but much of it will be skippable.
This book is currently in an extremely alpha state, and is really more by way of a draft in public than something that is ready for readers. Feedback on it is extremely welcome.
What is property-based testing?
Property-based testing is just a generalisation of the sort of testing you’re already useful.1
You can think of a test as consisting of two parts:
- A scenario
- A set of checks
The scenario is “I did a thing”, and the checks are “here is what should have happened when I did that thing”.
Each of these can vary in how specific they are.
- Here is the exact thing I did, and here is the exact thing that should have happened.
- Here is the exact thing I did, and here are some things that should be true about the result.
- I did something like this, and this is the exact thing that should have happened.
- I did something like this, and here are some things that should be true about the result.
Concretely:
- Here is a series of interactions with my web-application, and here’s what the screenshot of the final result should look like.
- Here is a series of interactions with my web-application, and every fetch should have returned a 200 or a redirect.
- I inserted three unique keys into my dictionary, and the result should be that the dictionary should have this exact structure.
- I inserted three unique keys into my dictionary, and the dictionary should now have size 3.
These four categories roughly correspond to:
- Snapshot testing (AKA golden master testing AKA expect testing)
- Example-based testing (what you probably think of as “normal software testing”)
- Differential testing (comparing two implementations of the same API and asserting that they get the same result)2
- Property-based testing (what this book is about)
None of these are truly distinct categories, and all of them are great in some contexts. This is important to remember: When learning about property-based testing, it’s easy to get excited and think all of your tests should be property-based tests. Resist that urge. Property-based tests are part of a complete breakfast test suite, not the whole of it.
An example
Suppose we have an LRU cache, and we want to check that it never exceeds its configured capacity. We might write the following example-based test:
#[test]
fn test_respects_lru_capacity() {
let mut cache = MyLRUCache::<String, i64>::new(2);
cache.put("a".to_string(), 1);
cache.put("b".to_string(), 2);
cache.put("c".to_string(), 3);
assert!(cache.size() <= 2);
}
This is a perfectly reasonable test, but notice the difference between what it promises and what it actually does. The claim of the test is that the cache never exceeds its capacity, but the test actually only shows that after one very specific sequence of operations the cache has not exceeded its capacity.
In contrast, we might write the following property-based test:
use hegel::generators as gs;
use hegel::TestCase;
#[hegel::test]
fn test_respects_lru_capacity(tc: TestCase) {
let capacity = tc.draw(gs::integers::<usize>().min_value(0));
let mut cache = MyLRUCache::<String, i64>::new(capacity);
let entries = tc.draw(
gs::vecs(gs::tuples!(gs::text(), gs::integers::<i64>()))
);
for (key, value) in entries {
cache.put(key, value);
}
assert!(cache.size() <= capacity);
}
This test generates a fully general series of put operations against the cache, and asserts that the capacity is still respected at the end.
Now, this still doesn’t actually guarantee that the cache never exceeds its capacity. For starters (more on this problem later), this is only performing put operations. More importantly though, although the space this test applies to is logically infinite, in practice this test will run some number of test cases, mostly small, and will pass if each of those test cases pass.
To learn more about what actually happens when you run this test, read Lifecycle of a Property-Based Test
-
This is not how it’s usually presented though. Property-based testing is often sold as “lightweight formal verification” and as if it was a very different thing from normal testing and there were these profound mathematical properties of your software that you pluck out of the platonic realm and place within your test suite. I think this is an extremely unhelpful way to look at it. ↩
-
Arguably differential testing is a type of property-based testing, and certainly you can and sometimes should use a property-based testing library to do differential testing, but it’s a bit of a special category. ↩
What happens when you run a Hegel test
In the previous chapter we wrote a property-based test for an LRU cache. In this chapter we’ll go through it in more detail, and what happens when we run it.
In order to do this, let’s run it against a real, buggy, implementation of MyLRUCache. We’ll start with one that simply never evicts:
/// An "LRU cache" that never actually evicts anything: it just stores every
/// entry in a map. Because it ignores its capacity, its size grows without
/// bound.
pub struct MyLRUCache<K, V> {
capacity: usize,
entries: HashMap<K, V>,
}
impl<K: Hash + Eq, V> MyLRUCache<K, V> {
pub fn new(capacity: usize) -> Self {
Self {
capacity,
entries: HashMap::new(),
}
}
pub fn capacity(&self) -> usize {
self.capacity
}
pub fn put(&mut self, key: K, value: V) {
// BUG: a real LRU cache would evict the least-recently-used entry once
// it reached `self.capacity`. This one never does.
self.entries.insert(key, value);
}
pub fn get(&self, key: &K) -> Option<&V> {
self.entries.get(key)
}
pub fn size(&self) -> usize {
self.entries.len()
}
}
And here is the property-based test from the previous chapter, which we’ll run against it:
#[hegel::test]
fn test_respects_lru_capacity(tc: TestCase) {
let capacity = tc.draw(gs::integers::<usize>().min_value(0));
let mut cache = MyLRUCache::<String, i64>::new(capacity);
let entries = tc.draw(gs::vecs(gs::tuples!(gs::text(), gs::integers::<i64>())));
for (key, value) in entries {
cache.put(key, value);
}
assert!(cache.size() <= capacity);
}
Because the cache never evicts, its size grows without bound, so the property is false: as soon as we insert more distinct keys than the capacity, the cache is too big. As a result, the test fails:
running 1 test
test test_respects_lru_capacity ... FAILED
failures:
---- test_respects_lru_capacity stdout ----
let capacity = 0;
let entries = [("", 0)];
thread 'test_respects_lru_capacity' panicked at tests/lru.rs:16:5:
assertion failed: cache.size() <= capacity
failures:
test_respects_lru_capacity
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
The reported failure is very straightforward: We set the capacity to zero, then we insert a single key (an empty string, with inserted value 0), and then check that the cache has at most zero elements in it, which it does not so the test fails.
We might reasonably think that this is a bug with the capacity zero case, and maybe we’re not interested in capacity zero caches, so we could modify the test as follows:
#[hegel::test]
fn test_respects_lru_capacity(tc: TestCase) {
let capacity = tc.draw(gs::integers::<usize>().min_value(1));
let mut cache = MyLRUCache::<String, i64>::new(capacity);
let entries = tc.draw(gs::vecs(gs::tuples!(gs::text(), gs::integers::<i64>())));
for (key, value) in entries {
cache.put(key, value);
}
assert!(cache.size() <= capacity);
}
All we’ve changed is that the capacity is now at least one. This doesn’t help, though: the cache still never evicts, so the property is still false, and Hegel just finds the next-smallest failure instead:
running 1 test
test test_respects_lru_capacity ... FAILED
failures:
---- test_respects_lru_capacity stdout ----
let capacity = 1;
let entries = [("", 0), ("0", 0)];
thread 'test_respects_lru_capacity' panicked at tests/lru_nonzero.rs:16:5:
assertion failed: cache.size() <= capacity
failures:
test_respects_lru_capacity
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
This time the capacity is one, and we insert two distinct keys — the empty
string and the string "0" — giving a cache of size two, which is larger than
one. The bug was never really about capacity zero at all.
There are a couple of things worth observing about this test:
- It does, correctly, fail, in a way that demonstrates the problem. Given that the thing it is testing is completely broken, this is a pretty low bar to clear, but it’s worth noting explicitly.1
- When it fails it prints the failing test case. This is a big difference between property-based tests and typical example-based tests: Because the test involves generated data, you need to be able to show the actual concrete values that were chosen.
- The printed test case is quite simple. In this case it’s the simplest it could possibly be (according to some specific notion of “simplest”), but in general it will only be simplified.
These come from the basic lifecycle of a property-based test:
- We run the test function multiple times (in Hegel, 100 times by default), with different generated values.
- If any of them fail, we pick a failing test case and shrink it - running the test function many more times, with simpler variations of our current simplest failing test case.
- Finally, we print it.
Shrinking in action
In order to see shrinking at work, we can use the verbosity setting to print every test case tried as we run:2
#[hegel::test(verbosity = Verbosity::Verbose)]
fn test_respects_lru_capacity(tc: TestCase) {
let capacity = tc.draw(gs::integers::<usize>().min_value(1));
let mut cache = MyLRUCache::<String, i64>::new(capacity);
let entries = tc.draw(gs::vecs(gs::tuples!(gs::text(), gs::integers::<i64>())));
for (key, value) in entries {
cache.put(key, value);
}
assert!(cache.size() <= capacity);
}
This prints all intermediate test cases rather than just the final failing one. Here’s an excerpt of the most interesting bits:
running 1 test
Running test case
let capacity = 1;
let entries = [];
Running test case
let capacity = 3971;
let entries = [];
Running test case
let capacity = 46634;
let entries = [("úY\u{8b413}", -9223372036854739458)];
Running test case
let capacity = 142;
let entries = [("\u{94}ÈóA", -9223372036854751606), ("#¼", -9223372036854775631)];
[ ... dozens more cases, each one passing ... ]
Running test case
let capacity = 1;
let entries = [("\u{98}§\u{96}", -9223372036854726090), ("3\u{1f}\u{90}\u{3cbe9}\nË^W", -6634910996680964021), ("\u{dd1c7}\u{16}\u{b}\u{8b}", -2421911336807628733), ("ù\u{52662}7", 3450639943582724138), ("\u{1b}Pဋ\u{13}S\u{95}5ä", -9223372033209782582), ("𝑻𝒉𝒆 𝒒𝒖𝒊𝒄𝒌 𝒃𝒓𝒐𝒘𝒏 𝒇𝒐𝒙 𝒋𝒖𝒎𝒑𝒔 𝒐𝒗𝒆𝒓 𝒕𝒉𝒆 𝒍𝒂𝒛𝒚 𝒅𝒐𝒈", -5779172081946973126), ("\u{c0db0}\u{84}o\u{1d}\u{88}", -9223372036854733404), ("N> ü", -9223372036854712834), ("\u{8f}Á𘏐\u{92da9}", -9223372036854775600), ("\u{86251}\u{92}_´𧇎𑚨\u{8cc31}", -9223372036854757126), ("ä", -9223372036854745451), ("", -9223372036854738920), ("ýV\u{9a}", -3145832809244121282)];
thread 'test_respects_lru_capacity' panicked at tests/lru_verbose.rs:18:5:
assertion failed: cache.size() <= capacity
[ ... the same failure, simplified over and over ... ]
Running test case
let capacity = 1;
let entries = [("\u{98}§\u{96}", -9223372036854726090), ("3\u{1f}\u{90}\u{3cbe9}\nË^W", -6634910996680964021)];
thread 'test_respects_lru_capacity' panicked at tests/lru_verbose.rs:18:5:
assertion failed: cache.size() <= capacity
Running test case
let capacity = 1;
let entries = [("\u{98}§\u{96}", -9223372036854726090), ("", 0)];
thread 'test_respects_lru_capacity' panicked at tests/lru_verbose.rs:18:5:
assertion failed: cache.size() <= capacity
Running test case
let capacity = 1;
let entries = [("\u{98}§\u{96}", -9223372036854726090)];
Running test case
let capacity = 1;
let entries = [("", 0)];
[ ... some simplifications no longer fail, so they are discarded ... ]
Running test case
let capacity = 1;
let entries = [("", 0), ("000", 0)];
thread 'test_respects_lru_capacity' panicked at tests/lru_verbose.rs:18:5:
assertion failed: cache.size() <= capacity
Running test case
let capacity = 1;
let entries = [("", 0), ("00", 0)];
thread 'test_respects_lru_capacity' panicked at tests/lru_verbose.rs:18:5:
assertion failed: cache.size() <= capacity
Running test case
let capacity = 1;
let entries = [("", 0), ("0", 0)];
thread 'test_respects_lru_capacity' panicked at tests/lru_verbose.rs:18:5:
assertion failed: cache.size() <= capacity
Running test case
let capacity = 1;
let entries = [("0", 0)];
let capacity = 1;
let entries = [("", 0), ("0", 0)];
thread 'test_respects_lru_capacity' panicked at tests/lru_verbose.rs:18:5:
assertion failed: cache.size() <= capacity
test test_respects_lru_capacity ... FAILED
failures:
test_respects_lru_capacity
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
You can see both the generation and the shrinking at play here. Initially, Hegel tries a variety of different test cases, with various different capacities and entries, until it finds one that fails (e.g. capacity = 1, with more than one entry). Then it switches to a shrink mode, where it tries deleting those entries, simplifying keys and values within them, etc. Once it can no longer shrink any further it replays the final shrunk example one last time and lets the shrunk failure that we saw propagate to the test runner.
Replaying a saved failure
We’ll now see one other piece of the property-based testing lifecycle: Replay. Once Hegel has found this failure, subsequent runs will start from there (until the bug is fixed). So if we run the verbose test a second time without changing anything, we don’t see the long search and shrink from before:
running 1 test
Running test case
let capacity = 1;
let entries = [("", 0), ("0", 0)];
thread 'test_respects_lru_capacity' panicked at tests/lru_verbose.rs:18:5:
assertion failed: cache.size() <= capacity
let capacity = 1;
let entries = [("", 0), ("0", 0)];
thread 'test_respects_lru_capacity' panicked at tests/lru_verbose.rs:18:5:
assertion failed: cache.size() <= capacity
test test_respects_lru_capacity ... FAILED
failures:
failures:
test_respects_lru_capacity
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
As well as significantly speeding up the test, this feature is an important part of making Hegel part of your development loop. Although a Hegel test may sometimes pass erroneously (because it failed to find the bug in its 100 test case budget), once it has found a bug, you may reliably use it as part of your development process because the test will keep failing until the bug is fixed.
-
It’s also not completely true! If you run these tests yourself, they will only fail most of the time. More on that in a second. ↩
-
Or we can in languages that support this, which turns out to only be our rust implementation right now. Sorry, we’re on it. ↩
Stateful testing in Hegel
We’ll continue with our running example of an LRU cache. In this chapter we’ll show how to use stateful testing to completely specify the behaviour of the cache, and how this can help us develop it.
In classic property-based testing you have something that looks like a normal test, just with some parametrized sources of data. Stateful testing takes this up a notch by letting the property-based testing library assemble whole tests for you out of component parts.
The basic idea of stateful testing is that you have some sort of system under test (which might be e.g. a database, a data structure, the API for your site…) and you want to perform operations against it and assert that no matter what operations you perform, nothing breaks.
Hegel represents this with two basic components: rules, which are things you can do against the system under test, and invariants, which are things that should always be true.
Here is an example of how to use it with our running example of an LRUCache:
struct CacheModel {
// The real cache, which is the thing under test.
cache: MyLRUCache<String, i64>,
capacity: usize,
// The model: a deliberately simple but correct LRU cache, built out of two
// plain maps. `store` holds the current key/value entries, and `recency`
// records the "time" at which each key was last used.
store: HashMap<String, i64>,
recency: HashMap<String, u64>,
clock: u64,
// Every key we have ever inserted, so that `get` can also ask about keys
// the model has since evicted.
keys: Variables<String>,
}
#[hegel::state_machine]
impl CacheModel {
#[rule]
fn put(&mut self, tc: TestCase) {
let key = tc.draw(gs::text());
let value = tc.draw(gs::integers::<i64>());
self.cache.put(key.clone(), value);
// Update the model. If we are now over capacity, evict the
// least-recently-used key with a naive O(n) scan of the recency map.
self.store.insert(key.clone(), value);
self.recency.insert(key.clone(), self.clock);
self.clock += 1;
if self.store.len() > self.capacity {
let lru = self
.recency
.iter()
.min_by_key(|(_, &used)| used)
.map(|(k, _)| k.clone())
.unwrap();
self.store.remove(&lru);
self.recency.remove(&lru);
}
self.keys.add(key);
}
#[rule]
fn get(&mut self, _tc: TestCase) {
let key = self.keys.draw().clone();
// Reading a key counts as using it, so refresh its recency.
if self.store.contains_key(&key) {
self.recency.insert(key.clone(), self.clock);
self.clock += 1;
}
assert!(
self.cache.get(&key) == self.store.get(&key),
"get({:?}): cache has {:?}, model has {:?}",
key,
self.cache.get(&key),
self.store.get(&key)
);
}
#[invariant]
fn same_size(&mut self, _tc: TestCase) {
assert!(
self.cache.size() == self.store.len(),
"cache size {} != model size {}",
self.cache.size(),
self.store.len()
);
}
}
#[hegel::test]
fn test_cache_matches_model(tc: TestCase) {
let capacity = tc.draw(gs::integers::<usize>().min_value(1));
let machine = CacheModel {
cache: MyLRUCache::new(capacity),
capacity,
store: HashMap::new(),
recency: HashMap::new(),
clock: 0,
keys: variables(&tc),
};
hegel::stateful::run(machine, tc);
}
In this stateful test we are defining a model - essentially a bad implementation of the same API in this case - alongside our LRUCache, and asserting that the two always agree in their behaviour.
We define rules put and get. put puts keys into both the cache and the model, while get looks up a key in each and asserts that you get the same result (both present or both absent, and if both present then with the same value). The same_size invariant checks that they always have the same number of keys.
One thing worth noting is the keys object. This stores keys we’ve previously used for reuse in future rules. This is useful because it allows us to ensure that keys we check in get are ones we’ve previously used in put - if we just checked random keys there, get would typically be fairly trivial because we’d usually get cache misses in each.
Running it
We can run our stateful test inside a normal Hegel test:
running 1 test
let capacity = 1;
Initial invariant check.
Step 1: put
let draw_1 = "";
let draw_2 = 0;
Step 2: put
let draw_3 = "0";
let draw_4 = 0;
thread 'test_cache_matches_model' panicked at tests/stateful.rs:70:9:
cache size 2 != model size 1
test test_cache_matches_model ... FAILED
failures:
failures:
test_cache_matches_model
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
This is the same bug we found in the previous chapter — the cache never evicts, so after we’ve put two keys into a cache of size one, the model and the cache disagree. Suppose we “fix” this bug by just stopping cache writes when the cache gets too big:
pub fn put(&mut self, key: K, value: V) {
// BUG: once we are full we drop the write entirely, rather than evicting
// the least-recently-used entry to make room for the new one.
if self.entries.len() >= self.capacity && !self.entries.contains_key(&key) {
return;
}
self.entries.insert(key, value);
}
The same_size invariant will no longer fail, but now the get rule does:
running 1 test
let capacity = 1;
Initial invariant check.
Step 1: put
let draw_1 = "";
let draw_2 = 0;
Step 2: put
let draw_3 = "0";
let draw_4 = 0;
Step 3: get
Rule stopped early due to violated assumption.
thread 'test_cache_matches_model' panicked at tests/stateful_capped.rs:51:9:
get("0"): cache has None, model has Some(0)
test test_cache_matches_model ... FAILED
failures:
failures:
test_cache_matches_model
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
This is the flexibility of stateful testing. Unlike our previous property-based test, which captures a single property of the system, this test asserts that under any sequence of operations, the behaviour of the real implementation agrees with the model.
In some sense, any implementation that passes this test “has” to be correct, because the model fully specifies the range of allowed behaviours. The reality is a little weaker than that, because you’re limited by what sequences of operations Hegel will generate (e.g. in normal operation this will not find any bugs that require thousands of operations to trigger), and of course this doesn’t test any nonfunctional requirements like performance (you could pass it by replicating the same inefficient implementation that is in the model), but in practice we’ve found it’s often quite effective at flushing out problems in implementations.