Stateful testing in Hegel

We’ll continue with our running example of an LRU cache. In this chapter we’ll show how to use stateful testing to completely specify the behaviour of the cache, and how this can help us develop it.

In classic property-based testing you have something that looks like a normal test, just with some parametrized sources of data. Stateful testing takes this up a notch by letting the property-based testing library assemble whole tests for you out of component parts.

The basic idea of stateful testing is that you have some sort of system under test (which might be e.g. a database, a data structure, the API for your site…) and you want to perform operations against it and assert that no matter what operations you perform, nothing breaks.

Hegel represents this with two basic components: rules, which are things you can do against the system under test, and invariants, which are things that should always be true.

Here is an example of how to use it with our running example of an LRUCache:

struct CacheModel {
    // The real cache, which is the thing under test.
    cache: MyLRUCache<String, i64>,
    capacity: usize,
    // The model: a deliberately simple but correct LRU cache, built out of two
    // plain maps. `store` holds the current key/value entries, and `recency`
    // records the "time" at which each key was last used.
    store: HashMap<String, i64>,
    recency: HashMap<String, u64>,
    clock: u64,
    // Every key we have ever inserted, so that `get` can also ask about keys
    // the model has since evicted.
    keys: Variables<String>,
}

#[hegel::state_machine]
impl CacheModel {
    #[rule]
    fn put(&mut self, tc: TestCase) {
        let key = tc.draw(gs::text());
        let value = tc.draw(gs::integers::<i64>());

        self.cache.put(key.clone(), value);

        // Update the model. If we are now over capacity, evict the
        // least-recently-used key with a naive O(n) scan of the recency map.
        self.store.insert(key.clone(), value);
        self.recency.insert(key.clone(), self.clock);
        self.clock += 1;
        if self.store.len() > self.capacity {
            let lru = self
                .recency
                .iter()
                .min_by_key(|(_, &used)| used)
                .map(|(k, _)| k.clone())
                .unwrap();
            self.store.remove(&lru);
            self.recency.remove(&lru);
        }

        self.keys.add(key);
    }

    #[rule]
    fn get(&mut self, _tc: TestCase) {
        let key = self.keys.draw().clone();
        // Reading a key counts as using it, so refresh its recency.
        if self.store.contains_key(&key) {
            self.recency.insert(key.clone(), self.clock);
            self.clock += 1;
        }
        assert!(
            self.cache.get(&key) == self.store.get(&key),
            "get({:?}): cache has {:?}, model has {:?}",
            key,
            self.cache.get(&key),
            self.store.get(&key)
        );
    }

    #[invariant]
    fn same_size(&mut self, _tc: TestCase) {
        assert!(
            self.cache.size() == self.store.len(),
            "cache size {} != model size {}",
            self.cache.size(),
            self.store.len()
        );
    }
}

#[hegel::test]
fn test_cache_matches_model(tc: TestCase) {
    let capacity = tc.draw(gs::integers::<usize>().min_value(1));
    let machine = CacheModel {
        cache: MyLRUCache::new(capacity),
        capacity,
        store: HashMap::new(),
        recency: HashMap::new(),
        clock: 0,
        keys: variables(&tc),
    };
    hegel::stateful::run(machine, tc);
}

type cacheModel struct {
	// The real cache, which is the thing under test.
	cache    *MyLRUCache[string, int]
	capacity int
	// The model: a deliberately simple but correct LRU cache, built out of two
	// plain maps. store holds the current key/value entries, and recency
	// records the "time" at which each key was last used.
	store   map[string]int
	recency map[string]int
	clock   int
	// Every key we have ever inserted, so that RuleGet can also ask about keys
	// the model has since evicted.
	keys []string
}

func (m *cacheModel) RulePut(tc hegel.TestCase) {
	key := hegel.Draw(tc, hegel.Text())
	value := hegel.Draw(tc, hegel.Integers(math.MinInt, math.MaxInt))

	m.cache.Put(key, value)

	// Update the model. If we are now over capacity, evict the
	// least-recently-used key with a naive O(n) scan of the recency map.
	m.store[key] = value
	m.recency[key] = m.clock
	m.clock++
	if len(m.store) > m.capacity {
		lru, oldest := "", math.MaxInt
		for k, used := range m.recency {
			if used < oldest {
				lru, oldest = k, used
			}
		}
		delete(m.store, lru)
		delete(m.recency, lru)
	}

	m.keys = append(m.keys, key)
}

func (m *cacheModel) RuleGet(tc hegel.TestCase) {
	tc.Assume(len(m.keys) > 0)
	key := hegel.Draw(tc, hegel.SampledFrom(m.keys))
	// Reading a key counts as using it, so refresh its recency.
	if _, ok := m.store[key]; ok {
		m.recency[key] = m.clock
		m.clock++
	}
	got, gotOK := m.cache.Get(key)
	want, wantOK := m.store[key]
	if gotOK != wantOK || got != want {
		panic(fmt.Sprintf("get(%q): cache has (%v, %v), model has (%v, %v)",
			key, got, gotOK, want, wantOK))
	}
}

func (m *cacheModel) InvariantSameSize(_ hegel.TestCase) {
	if m.cache.Size() != len(m.store) {
		panic(fmt.Sprintf("cache size %d != model size %d",
			m.cache.Size(), len(m.store)))
	}
}

func TestCacheMatchesModel(t *testing.T) {
	hegel.Test(t, func(ht *hegel.T) {
		capacity := hegel.Draw(ht, hegel.Integers(1, math.MaxInt))
		m := &cacheModel{
			cache:    NewMyLRUCache[string, int](capacity),
			capacity: capacity,
			store:    make(map[string]int),
			recency:  make(map[string]int),
		}
		hegel.RunStateful(ht, m)
	})
}

TODO: stateful testing is not yet available in hegel-cpp (v0.3.9). It is
planned for a future release.

TODO: stateful testing is not yet available in hegel-typescript. It is
planned for a future release.

In this stateful test we are defining a model - essentially a bad implementation of the same API in this case - alongside our LRUCache, and asserting that the two always agree in their behaviour.

We define rules put and get. put puts keys into both the cache and the model, while get looks up a key in each and asserts that you get the same result (both present or both absent, and if both present then with the same value). The same_size invariant checks that they always have the same number of keys.

One thing worth noting is the keys object. This stores keys we’ve previously used for reuse in future rules. This is useful because it allows us to ensure that keys we check in get are ones we’ve previously used in put - if we just checked random keys there, get would typically be fairly trivial because we’d usually get cache misses in each.

Running it

We can run our stateful test inside a normal Hegel test:

running 1 test
let capacity = 1;
Initial invariant check.
Step 1: put
  let draw_1 = "";
  let draw_2 = 0;
Step 2: put
  let draw_3 = "0";
  let draw_4 = 0;
thread 'test_cache_matches_model' panicked at tests/stateful.rs:70:9:
cache size 2 != model size 1

test test_cache_matches_model ... FAILED

failures:

failures:
    test_cache_matches_model

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

--- FAIL: TestCacheMatchesModel
    stateful_test.go:79: capacity := hegel.Draw(ht, hegel.Integers(1, math.MaxInt)) = 1
    stateful.go:93: Initial invariant check.
    stateful.go:120: Step 1: RulePut
    stateful.go:120: Step 2: RulePut
    stateful_test.go:78: property test failed: cache size 2 != model size 1
FAIL
FAIL	lru-example/stateful

TODO

TODO

This is the same bug we found in the previous chapter — the cache never evicts, so after we’ve put two keys into a cache of size one, the model and the cache disagree. Suppose we “fix” this bug by just stopping cache writes when the cache gets too big:

    pub fn put(&mut self, key: K, value: V) {
        // BUG: once we are full we drop the write entirely, rather than evicting
        // the least-recently-used entry to make room for the new one.
        if self.entries.len() >= self.capacity && !self.entries.contains_key(&key) {
            return;
        }
        self.entries.insert(key, value);
    }

func (c *CappedCache[K, V]) Put(key K, value V) {
	// BUG: once we are full we drop the write entirely, rather than evicting the
	// least-recently-used entry to make room for the new one.
	if _, present := c.entries[key]; len(c.entries) >= c.capacity && !present {
		return
	}
	c.entries[key] = value
}

TODO: stateful testing is not yet available in hegel-cpp (v0.3.9).

TODO: stateful testing is not yet available in hegel-typescript.

The same_size invariant will no longer fail, but now the get rule does:

running 1 test
let capacity = 1;
Initial invariant check.
Step 1: put
  let draw_1 = "";
  let draw_2 = 0;
Step 2: put
  let draw_3 = "0";
  let draw_4 = 0;
Step 3: get
Rule stopped early due to violated assumption.
thread 'test_cache_matches_model' panicked at tests/stateful_capped.rs:51:9:
get("0"): cache has None, model has Some(0)

test test_cache_matches_model ... FAILED

failures:

failures:
    test_cache_matches_model

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

--- FAIL: TestCacheMatchesModel
    stateful_capped_test.go:71: capacity := hegel.Draw(ht, hegel.Integers(1, math.MaxInt)) = 1
    stateful.go:93: Initial invariant check.
    stateful.go:120: Step 1: RulePut
    stateful.go:120: Step 2: RulePut
    stateful.go:120: Step 3: RuleGet
    stateful_capped_test.go:70: property test failed: get(""): cache has (0, true), model has (0, false)
FAIL
FAIL	lru-example/stateful_capped

TODO

TODO

This is the flexibility of stateful testing. Unlike our previous property-based test, which captures a single property of the system, this test asserts that under any sequence of operations, the behaviour of the real implementation agrees with the model.

In some sense, any implementation that passes this test “has” to be correct, because the model fully specifies the range of allowed behaviours. The reality is a little weaker than that, because you’re limited by what sequences of operations Hegel will generate (e.g. in normal operation this will not find any bugs that require thousands of operations to trigger), and of course this doesn’t test any nonfunctional requirements like performance (you could pass it by replicating the same inefficient implementation that is in the model), but in practice we’ve found it’s often quite effective at flushing out problems in implementations.

Keyboard shortcuts

Property-based testing with Hegel

Stateful testing in Hegel

Running it