Fenwick trees are really fast (although you can speed them up 10x using cache and SIMD tricks). But can we prove this using Dafny?

Kublai: Oh nooooo not Dafny again!

Don’t worry, we’ll only write a few lines of Dafny code this time. Still, you should read my previous Fenwick tree posts first, since otherwise this post probably won’t make much sense.

As far as I know, Dafny doesn’t have built-in support for reasoning directly about time complexity, but we can still prove bounds on the number of loop iterations our code performs. For instance, the update operation contains the following loop:

var i: u32 := i';
while i <= N
    modifies F
    decreases N - i
    // Interval i always covers i'
    invariant i - lsb(i) < i'
    // F beyond i hasn't been modified
    invariant forall j :: 0 < j < i' || i <= j <= N ==> F[j] == old(F[j])
    // Only intervals covering i' have been modified so far
    invariant forall j :: i' <= j < i && j <= N ==> if j - lsb(j) < i' then F[j] == old(F[j]) + v else F[j] == old(F[j])
{
    F[i] := F[i] + v;
    // Ensure lsb keeps getting bigger
    lsb_add(i);
    // Ensure the next interval covering i is at i + lsb(i)
    lsb_next(i);
    lsb_correct(i);
    i := i + lsb_fast(i);
}

Let’s keep track of how many loop iterations we’ve done using a ghost variable called t. Each iteration, we’ll multiply t by 2. How can we relate t to how much progress we’ve made so far in this loop?

The key idea is our best friend lsb_add, which I wrote about extensively here. This lemma shows that each time we do i + lsb(i), the lsb at least doubles! Since t starts at 1 <= lsb(i') and doubles exactly once per iteration, we’ll always have t <= lsb(i).
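In case that post is hazy, here’s a toy sketch of the fact lsb_add captures. The nat-based lsb below is just an illustration I made up for this aside; the real u32 version from the previous posts differs in the details:

function lsb(i: nat): nat
    requires i > 0
    ensures lsb(i) >= 1
{
    // Lowest set bit: 1 if i is odd, otherwise twice the lsb of i / 2
    if i % 2 == 1 then 1 else 2 * lsb(i / 2)
}

lemma lsb_add(i: nat)
    requires i > 0
    ensures lsb(i + lsb(i)) >= 2 * lsb(i)
{
    // Adding lsb(i) carries out of the lowest set bit, so the new lsb
    // sits at least one position higher, i.e. it's at least twice as big
    if i % 2 == 0 {
        lsb_add(i / 2);
    }
}

With that fact in hand, we just need to make a few small edits to our code from above.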

var i: u32 := i';
ghost var t := 1;
while i <= N
    modifies F
    decreases N - i
    // Interval i always covers i'
    invariant i - lsb(i) < i'
    // F beyond i hasn't been modified
    invariant forall j :: 0 < j < i' || i <= j <= N ==> F[j] == old(F[j])
    // Only intervals covering i' have been modified so far
    invariant forall j :: i' <= j < i && j <= N ==> if j - lsb(j) < i' then F[j] == old(F[j]) + v else F[j] == old(F[j])
    // For proving time complexity
    invariant t <= lsb(i)
    invariant t <= 2 * N
{
    F[i] := F[i] + v;
    // Ensure lsb keeps getting bigger
    lsb_add(i);
    // Ensure the next interval covering i is at i + lsb(i)
    lsb_next(i);
    lsb_correct(i);
    i := i + lsb_fast(i);
    t := 2 * t;
}
// Loop ran at most log(N) + 1 iterations
assert t <= 2 * N;

This proof gives us the bound log(N) + 1: after k iterations we have t == 2^k, so the invariant t <= 2 * N means 2^k <= 2 * N, i.e. k <= log(N) + 1. The bound is tight, since if N is a power of 2 and i = 1, the loop runs for exactly log(N) + 1 iterations. We need the t <= 2 * N invariant because after the loop, Dafny uses the invariants to infer the current state of the program. Without that invariant, Dafny would only know i > N and t <= lsb(i), which is not enough to conclude t <= 2 * N. Assuming that lsb_fast and array assignment take constant time, update runs in O(log N) time!
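To see the tightness concretely, here’s the worst-case trace of update for N = 8 and i' = 1 (each row shows the values after that iteration):

iteration    i updated    i after    t after
    1            1            2          2
    2            2            4          4
    3            4            8          8
    4            8           16         16

After the fourth iteration i = 16 > N, so the loop exits with t = 16 == 2 * N: exactly log(8) + 1 == 4 iterations.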

We can use a similar strategy for bounding the loop in query:

ret := 0;
var i: u32 := i';
assert ret == sum(A[i'..i']) by { reveal sum; }
while i > 0
    invariant i >= 0
    invariant ret == sum(A[i..i'])
{
    ret := ret + F[i];
    // Extend sum from i to i - lsb(i)
    split(A, i - lsb(i), i, i');
    lsb_correct(i);
    i := i - lsb_fast(i);
}

Again, we want t <= lsb(i). This time we’re doing i - lsb(i) instead of i + lsb(i), so we need lsb_add’s cousin, lsb_sub. One wrinkle: when i == lsb(i), the subtraction takes i all the way to 0, where lsb isn’t defined, so the invariant gets an i == 0 escape hatch and the lsb_sub call is guarded by i > lsb(i). Other than that, it’s basically the same as the previous proof we did.
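Here’s the matching toy sketch, reusing the illustrative nat-based lsb from before (again, the real lsb_sub from the earlier posts will look a bit different):

lemma lsb_sub(i: nat)
    requires i > 0 && i > lsb(i)
    ensures lsb(i - lsb(i)) >= 2 * lsb(i)
{
    // Subtracting lsb(i) clears the lowest set bit; since a higher bit
    // is still set (i > lsb(i)), the new lsb is at least twice as big
    if i % 2 == 0 {
        lsb_sub(i / 2);
    }
}

And here’s the query loop again with the ghost counter added: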

ret := 0;
var i: u32 := i';
assert ret == sum(A[i'..i']) by { reveal sum; }
ghost var t := 1;
while i > 0
    invariant i >= 0
    invariant ret == sum(A[i..i'])
    // For proving time complexity
    invariant i == 0 || t <= lsb(i)
    invariant t <= 2 * N
{
    ret := ret + F[i];
    // Extend sum from i to i - lsb(i)
    split(A, i - lsb(i), i, i');
    lsb_correct(i);
    if i > lsb(i) {
        lsb_sub(i);
    }
    i := i - lsb_fast(i);
    t := 2 * t;
}
// Loop ran at most log(N) + 1 iterations
assert t <= 2 * N;

Lastly, for the constructor, it’s kind of “obvious” that it runs in O(N) time, so we don’t need to prove anything. Or maybe I’m just lazy.
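If you did want the proof, a ghost counter that increments once per iteration would do the job. Here’s a minimal sketch, assuming the constructor is a single pass over i from 1 to N (the loop body is a placeholder, not the real construction code):

var i: u32 := 1;
ghost var t := 0;
while i <= N
    invariant 1 <= i <= N + 1
    invariant t == i - 1  // t counts iterations exactly
{
    // ... build F[i] and push it up to its covering interval ...
    i := i + 1;
    t := t + 1;
}
// Loop ran exactly N iterations
assert t == N;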