Doing Something Actually Useful With Formal Verification

My friend Isaac is writing his own programming language in order to synchronize contacts between his phone and computer. I thought that was pretty crazy, but meanwhile I’m writing a long, formally-verified proof of Fenwick trees in order to learn Mandarin Chinese.

Kublai: You two are both complete fools!

OK, calm down Kublai. I’ll explain why.

In 2022, I decided I wanted to get better at speaking and reading Mandarin Chinese. Unfortunately, Chinese has the worst, most miserable, painful writing system ever designed so I needed a flash cards app to help me memorize thousands of Chinese characters. At that time, I was learning the Go programming language for contributing to Gitea and Woodpecker CI, so I wrote SD, a ridiculously overengineered flash cards app using Go, SQLite, and segment trees.

A while later, I rewrote the app using C and Fenwick trees (the superior data structure, obviously) for maximal performance (although it’s actually bottlenecked by SQLite so yeah I’m really overengineering things).

However, the C port had a pretty bad habit of segfaulting sometimes, and Fenwick trees are full of crazy bit magic and potential off-by-one errors, so: formal verification to the rescue!

Kublai: Oh nooooo, not formal verification! I still have a huge headache from your previous post!

Cool, you’re gonna love the rest of this post then! I only proved a few Fenwick tree operations last time, so there’s two more we need to prove for the flash cards app.

Proving range query

First, let’s prove the range query operation. This isn’t actually used by my flash cards app, but let’s do it anyways for funsies, since it was also one of the challenges I posed at the end of my previous post. That said, I think I did severely underestimate the difficulty of my challenge problems.

Difficulty	What I expected	In reality
★☆☆	15 minutes	1 hour
★★☆	1 hour	Eh it’ll take an afternoon
★★★	3 hours	DO NOT ATTEMPT

Anyways, here’s the function we’ll prove, which returns the sum between the indices i' and j' inclusive:

method range_query(i': int, j': int) returns (ret: int)
{
    ret := 0;
    var j := j';
    while j >= i'
    {
        ret := ret + F[j];
        j := j - lsb(j);
    }
    var i := i' - 1;
    while i > j
    {
        ret := ret - F[i];
        i := i - lsb(i);
    }
}

This function isn’t terribly difficult to prove, as in, we’ll only need one lemma instead of a gazillion. It’s a variant of lsb_add from last time.

lemma lsb_sub(i: int)
    requires i > 0
    requires i > lsb(i)
    ensures 2 * lsb(i) <= lsb(i - lsb(i))
{
    reveal lsb;
    if i % 2 == 0 {
        lsb_sub(i / 2);
    }
}

Now we’re ready for the proof! The only tricky thing is to prove that i doesn’t overshoot j in the second loop, which we do by repurposing the lsb_next lemma from the update proof.

method range_query(i': int, j': int) returns (ret: int)
    requires 0 < i' <= j' <= N
    requires valid()
    ensures ret == sum(A[i' - 1..j'])
{
    ret := 0;
    var j := j';
    ghost var k := j';
    assert ret == sum(A[j'..j']) by { reveal sum; }
    while j >= i'
        invariant j >= 0
        invariant ret == sum(A[j..j'])
        // k is the previous value of j
        invariant j == j' || j == k - lsb(k)
        invariant k >= i'
    {
        ret := ret + F[j];
        // Extend sum from j to j - lsb(j)
        split(A, j - lsb(j), j, j');
        // Save j
        k := j;
        j := j - lsb(j);
    }
    if k > lsb(k) {
        // This ensures i' <= j + lsb(j);
        lsb_sub(k);
    }
    // Now start from i' - 1 and remove intervals until we reach j
    var i := i' - 1;
    assert ret == sum(A[j..j']) - sum(A[i..i' - 1]) by { reveal sum; }
    while i > j
        invariant i >= j
        invariant ret == sum(A[j..j']) - sum(A[i..i' - 1])
    {
        ret := ret - F[i];
        split(A, i - lsb(i), i, i' - 1);
        if j > 0 {
            // This prevents us from overshooting
            lsb_next(j);
        }
        i := i - lsb(i);
    }
    // Finally, remove the overlapping interval
    split(A, i, i' - 1, j');
}

Kublai: Aaahhh now you’re just dumping tons of code at me, help!

Proving search

Next, let’s prove prefix search! We’ll need a new function, msb(i), which is the largest power of 2 less than or equal to i. You can also view this as the value of the leftmost bit in i. In C, we can compute that using 1 << (8 * sizeof(int) - __builtin_clz(i) - 1), but here we’ll write a recursive implementation in Dafny:

function msb(i: int): int
    requires i > 0
    ensures msb(i) > 0
    ensures 2 * msb(i) > i
{
    if i == 1 then 1 else 2 * msb(i / 2)
}

Alright, now here’s the function we’ll prove, which returns the largest index with prefix sum less than or equal to s':

method search(s': int) returns (ret: int)
{
    ret := 0;
    var s := s';
    var i := msb(N);
    while i > 0
    {
        if ret + i <= N && F[ret + i] <= s {
            s := s - F[ret + i];
            ret := ret + i;
        }
        i := i / 2;
    }
}

First, note that we need all array elements to be nonnegative, or else there could be multiple indices where the prefix sum increases from s' to greater than s'. An alternative way to see this is that this function is basically doing a binary search, and we need the prefix sums to be nondecreasing to binary search over them. Here’s the full specification for this function:

method search(s': int) returns (ret: int)
    requires N > 0
    requires s' >= 0
    requires valid()
    requires forall i :: 0 <= i < N ==> A[i] >= 0
    ensures forall i :: 0 <= i <= N ==> (sum(A[..i]) <= s' <==> i <= ret)
{
    ret := 0;
    var s := s';
    var i := msb(N);
    while i > 0
    {
        if ret + i <= N && F[ret + i] <= s {
            s := s - F[ret + i];
            ret := ret + i;
        }
        i := i / 2;
    }
}

Lots of lemmas

First, we’ll define a power of 2 as a number i such that lsb(i) == i. This is a pretty unconventional definition but it’ll be a lot easier to reason about.

Let’s check that msb always returns a power of 2:

lemma msb_is_pow2(i: int)
    requires i > 0
    ensures lsb(msb(i)) == msb(i)
{
    reveal lsb;
}

Pretty easy, right? Next, let’s show that if i is a power of 2, then i / 2 is also a power of 2:

lemma lsb_pow2(i: int)
    requires i > 1
    requires lsb(i) == i
    ensures lsb(i / 2) == i / 2
    ensures lsb(i) == 2 * lsb(i / 2)
{
    reveal lsb;
}

Here’s a useful lemma about the lsb of a sum of two numbers:

lemma lsb_sum(i: int, j: int)
    requires i > 0 && j > 0
    requires lsb(i) > lsb(j)
    ensures lsb(i + j) == lsb(j)
{
    reveal lsb;
    if j % 2 == 0 {
        lsb_sum(i / 2, j / 2);
    }
}

And finally, let’s confirm that nonnegative arrays have nondecreasing prefix sums.

lemma sum_nondecreasing(A: seq<int>, i: int, j: int)
    requires 0 <= i <= j <= |A|
    requires forall k :: 0 <= k < |A| ==> A[k] >= 0
    ensures sum(A[..i]) <= sum(A[..j])
{
    reveal sum;
    if j > i {
        sum_nondecreasing(A, i, j - 1);
        split(A, 0, j - 1, j);
    }
}

The proof

We’re ready to prove the main loop inside search! ret is a lower bound on the answer, while ret + 2 * i will be the upper bound for our binary search.

ret := 0;
var s := s';
msb_is_pow2(N);
var i := msb(N);
assert sum(A[..ret]) + s == s' by { reveal sum; }
while i > 0
    // Sorry for all the invariants
    // i is always a power of 2
    invariant i == 0 || (i > 0 && lsb(i) == i)
    invariant ret == 0 || lsb(ret) > i
    // A[..ret] is still s away from s'
    invariant s >= 0
    invariant ret <= N && sum(A[..ret]) == s' - s
    // A[..ret + 2 * i] is larger
    invariant i > 0 ==> ret + 2 * i > N || sum(A[..ret + 2 * i]) > s'
    // At the end of the loop our upper bound is ret + 1
    invariant i == 0 ==> ret == N || sum(A[..ret + 1]) > s'
{
    if ret + i <= N {
        if ret > 0 {
            // Ensure that lsb(ret + i) == lsb(i)
            lsb_sum(ret, i);
        }
        split(A, 0, ret, ret + i);
    }
    if ret + i <= N && F[ret + i] <= s {
        s := s - F[ret + i];
        // Move lower bound
        ret := ret + i;
    }
    // Otherwise, move upper bound
    if i > 1 {
        // i / 2 is still a power of 2
        lsb_pow2(i);
    }
    i := i / 2;
}

Kublai: Aaaaaahhhhhhh so many loop invariants!!!

Now we’ve proved sum(A[..ret]) <= s' && (ret == N || A[..ret + 1] > s'), so we just need to show everything to the left is smaller and everything to the right is larger using sum_nondecreasing.

forall j | 0 <= j <= N
    ensures sum(A[..j]) <= s' <==> j <= ret
{
    if j <= ret {
        sum_nondecreasing(A, j, ret);
    } else {
        sum_nondecreasing(A, ret + 1, j);
    }
}

And that’s it for all the proofs! You can relax now, Kublai.

Using our verified Fenwick tree

We have a nice formally verified Fenwick tree, now what? Dafny doesn’t support translating Dafny code to C, but it does support Rust. So, let’s use it for a Rust port of my flash cards app, which I’ll call SDRS. First, we’ll translate the Dafny code to Rust using dafny build -t rs fenwick.dfy (which you can find here). This produces two crates called dafny_runtime and fenwick which we’ll add to SDRS.

In the Rust code, first import it:

use fenwick::_module::fenwick;

The syntax for creating a new Fenwick tree from a Vec<i32> is a bit abstruse and undocumented (I had to trawl through the source code and examples in the Dafny repo to figure this out):

let obj = fenwick::_allocate_object();
let seq = a[1..=n]
    .iter()
    .map(|x| dafny_runtime::DafnyInt::from_i32(*x))
    .collect();
fenwick::_ctor(&obj, &seq);
let ft = dafny_runtime::rd!(obj);

Now we can use our various Fenwick tree operations:

let i = ft.search(&dafny_runtime::DafnyInt::from_i32(s)).as_usize() + 1;

ft.update(
    &dafny_runtime::DafnyInt::from_usize(i),
    &dafny_runtime::DafnyInt::from_i32(w - a[i]),
);

It’s a bit janky since dafny_runtime has its own types which we have to convert to and from. Also, note that it’s our responsibility to use the Fenwick tree correctly since we’re outside of Dafny now and can’t rely on it to check our work. For a full example, see this repo and admire my trashy Rust code.

Here’s the PyQt6 GUI for my flash cards app in its full glory, using the (partially) formally verified Rust backend!

A screenshot of sd-qt using sdrs as its backend

Conclusion

So was it worth it? Was it worth all this trouble so that I can have a lovely, formally verified flash cards app?

Kublai: No way!

Actually, that’s right. I don’t even use my flash cards app much anymore since I’ve progressed pretty far in learning Mandarin, so what’s the use of SDRS nowadays? And plus, this Fenwick library is pretty slow since it uses recursive implementations of the crazy bit hacks and is full of Dafny-generated boilerplate. In fact, it’s 1000 times slower compared to a pure Rust implementation which is pretty concerning.

Kublai: Well hey it’s 1000 times safer than Rust!

Luckily, we can get a 6x speedup by making our lsb function a ghost and defining a new lsb_fast function that uses bit hacks if i is small:

opaque function lsb_fast(i: int): int
    requires i > 0
    ensures 0 < lsb_fast(i) <= i
{
    if i <= 0xFFFFFFFF then ((i as bv32) & -(i as bv32)) as int
    else if i % 2 == 1 then 1 else 2 * lsb_fast(i / 2)
}

lemma lsb_correct(i: int)
    requires i > 0
    ensures lsb(i) == lsb_fast(i)
{
    reveal lsb;
    reveal lsb_fast;
}

Kublai: Nooooo you lied, I thought there wouldn’t be any more Dafny code!

We’ll still use lsb for proof purposes, but use lsb_fast instead for executable code, although we have to call lsb_correct each time so that Dafny knows they’re equivalent. With this optimization, our library is now only 200 times slower than pure Rust, so I suspect the Dafny-generated code is maybe just super inefficient? Or it’s because of the BigInts? If anyone can improve on this, I’d love to hear it.

Anyways, this is just a toy example, and it proves that formal verification is pretty hard, but it’s not that hard, especially with fancy tools like Dafny (it’s definitely easier than using dependent types). So if you liked this post, check out Dafny and you might find it actually useful someday!