Stateful testing in Hegel
We’ll continue with our running example of an LRU cache. In this chapter we’ll show how to use stateful testing to completely specify the behaviour of the cache, and how this can help us develop it.
In classic property-based testing you have something that looks like a normal test, just with some parametrized sources of data. Stateful testing takes this up a notch by letting the property-based testing library assemble whole tests for you out of component parts.
The basic idea of stateful testing is that you have some sort of system under test (which might be e.g. a database, a data structure, the API for your site…) and you want to perform operations against it and assert that no matter what operations you perform, nothing breaks.
Hegel represents this with two basic components: rules, which are things you can do against the system under test, and invariants, which are things that should always be true.
Here is an example of how to use it with our running example of an LRUCache:
struct CacheModel {
// The real cache, which is the thing under test.
cache: MyLRUCache<String, i64>,
capacity: usize,
// The model: a deliberately simple but correct LRU cache, built out of two
// plain maps. `store` holds the current key/value entries, and `recency`
// records the "time" at which each key was last used.
store: HashMap<String, i64>,
recency: HashMap<String, u64>,
clock: u64,
// Every key we have ever inserted, so that `get` can also ask about keys
// the model has since evicted.
keys: Variables<String>,
}
#[hegel::state_machine]
impl CacheModel {
#[rule]
fn put(&mut self, tc: TestCase) {
let key = tc.draw(gs::text());
let value = tc.draw(gs::integers::<i64>());
self.cache.put(key.clone(), value);
// Update the model. If we are now over capacity, evict the
// least-recently-used key with a naive O(n) scan of the recency map.
self.store.insert(key.clone(), value);
self.recency.insert(key.clone(), self.clock);
self.clock += 1;
if self.store.len() > self.capacity {
let lru = self
.recency
.iter()
.min_by_key(|(_, &used)| used)
.map(|(k, _)| k.clone())
.unwrap();
self.store.remove(&lru);
self.recency.remove(&lru);
}
self.keys.add(key);
}
#[rule]
fn get(&mut self, _tc: TestCase) {
let key = self.keys.draw().clone();
// Reading a key counts as using it, so refresh its recency.
if self.store.contains_key(&key) {
self.recency.insert(key.clone(), self.clock);
self.clock += 1;
}
assert!(
self.cache.get(&key) == self.store.get(&key),
"get({:?}): cache has {:?}, model has {:?}",
key,
self.cache.get(&key),
self.store.get(&key)
);
}
#[invariant]
fn same_size(&mut self, _tc: TestCase) {
assert!(
self.cache.size() == self.store.len(),
"cache size {} != model size {}",
self.cache.size(),
self.store.len()
);
}
}
#[hegel::test]
fn test_cache_matches_model(tc: TestCase) {
let capacity = tc.draw(gs::integers::<usize>().min_value(1));
let machine = CacheModel {
cache: MyLRUCache::new(capacity),
capacity,
store: HashMap::new(),
recency: HashMap::new(),
clock: 0,
keys: variables(&tc),
};
hegel::stateful::run(machine, tc);
}
In this stateful test we are defining a model - essentially a bad implementation of the same API in this case - alongside our LRUCache, and asserting that the two always agree in their behaviour.
We define rules put and get. put puts keys into both the cache and the model, while get looks up a key in each and asserts that you get the same result (both present or both absent, and if both present then with the same value). The same_size invariant checks that they always have the same number of keys.
One thing worth noting is the keys object. This stores keys we’ve previously used for reuse in future rules. This is useful because it allows us to ensure that keys we check in get are ones we’ve previously used in put - if we just checked random keys there, get would typically be fairly trivial because we’d usually get cache misses in each.
Running it
We can run our stateful test inside a normal Hegel test:
running 1 test
let capacity = 1;
Initial invariant check.
Step 1: put
let draw_1 = "";
let draw_2 = 0;
Step 2: put
let draw_3 = "0";
let draw_4 = 0;
thread 'test_cache_matches_model' panicked at tests/stateful.rs:70:9:
cache size 2 != model size 1
test test_cache_matches_model ... FAILED
failures:
failures:
test_cache_matches_model
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
This is the same bug we found in the previous chapter — the cache never evicts, so after we’ve put two keys into a cache of size one, the model and the cache disagree. Suppose we “fix” this bug by just stopping cache writes when the cache gets too big:
pub fn put(&mut self, key: K, value: V) {
// BUG: once we are full we drop the write entirely, rather than evicting
// the least-recently-used entry to make room for the new one.
if self.entries.len() >= self.capacity && !self.entries.contains_key(&key) {
return;
}
self.entries.insert(key, value);
}
The same_size invariant will no longer fail, but now the get rule does:
running 1 test
let capacity = 1;
Initial invariant check.
Step 1: put
let draw_1 = "";
let draw_2 = 0;
Step 2: put
let draw_3 = "0";
let draw_4 = 0;
Step 3: get
Rule stopped early due to violated assumption.
thread 'test_cache_matches_model' panicked at tests/stateful_capped.rs:51:9:
get("0"): cache has None, model has Some(0)
test test_cache_matches_model ... FAILED
failures:
failures:
test_cache_matches_model
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
This is the flexibility of stateful testing. Unlike our previous property-based test, which captures a single property of the system, this test asserts that under any sequence of operations, the behaviour of the real implementation agrees with the model.
In some sense, any implementation that passes this test “has” to be correct, because the model fully specifies the range of allowed behaviours. The reality is a little weaker than that, because you’re limited by what sequences of operations Hegel will generate (e.g. in normal operation this will not find any bugs that require thousands of operations to trigger), and of course this doesn’t test any nonfunctional requirements like performance (you could pass it by replicating the same inefficient implementation that is in the model), but in practice we’ve found it’s often quite effective at flushing out problems in implementations.