What’s the Derivative of a Data Type?

Kublai: BOO!!

SHL: Seriously? Trying to scare me? My fox ears could hear you sneaking up from a mile away!

Kublai: Hmph… how can you even hear when your ears are muffled by that ridiculous hat?

SHL: It’s not ridiculous! It’s my Halloween costume! I’m the Witch of Agnesi.

Kublai: Looks more like a pirate hat to me…

SHL: Well, what’s your costume?

Kublai: I’m Kublai Khan!

SHL: Looks more like the Emperor’s New Clothes…

Kublai: Anyways, I was walking around outside, admiring kitschy Halloween decorations, and I stumbled across this guy with the most aromatic, saliva-inducing, limited edition grass-flavored candy bars! And he told me I could have all these lovely amazing grassy candy bars, but with one condition: solve a riddle first. Oh and by the way, he absolutely loved my costume.

SHL: Sure, sure.

Kublai: So here’s the riddle. Actually, I’m terrified even to utter it. But here we go: find the derivative of the list data type.

SHL: So what was your answer?

Kublai: AAAAAAAAAAAAAAHHHH!!! My heart was pounding, knees collapsing, mind flooded with millions of flashbacks of failing high school calculus class…

SHL: Hey, calm down! That question isn’t hard at all! The derivative is just two lists!

Kublai: What is that even supposed to mean? Sounds like you just pulled some random answer out of your ridiculous hat!

SHL: No no no no, there are deep reasons for this! Well, first of all, what’s a list, Kublai?

Kublai: Uh… well, in Lean, it’s something like this, right?
inductive List x where
  | nil
  | cons (head : x) (tail : List x)

SHL: That’s right! So in some sense, a list corresponds to the equation \[\mathsf{List}(x) = 1+x\mathsf{List}(x)\] where $1$ represents the empty list and $x \mathsf{List}(x)$ represents concatenating an element of type $x$ with another list.

Kublai: Oh, so \[\mathsf{List}(x) = \frac{1}{1-x}\] and \[\frac{d\mathsf{List}(x)}{dx} = \frac{1}{(1-x)^2} = \mathsf{List}(x)^2?\] So that’s why the derivative is two lists?

SHL: Whoa whoa whoa, not so fast! How do you know that all that algebra is well-defined and legal?

Kublai: I don’t know! I have no idea what I just did!

SHL: Well, that’s OK. We can make this rigorous using combinatorial species! The equation \[\mathsf{List}(x) = 1+x \mathsf{List}(x)\] is simply the exponential generating function for the list combinatorial species, and technically the right-hand side should be a least fixed point but we can omit it due to the implicit species theorem.

Kublai: A combinatorial whaaaa??

SHL: Oh, a combinatorial species is just an endofunctor on the category of finite sets whose morphisms are bijections, what’s the problem?

Kublai: AAAAAAAAAAAAAAHHHH!!!

SHL: Sigh… let’s try a different strategy. Kublai, have you heard of automatic differentiation (AD)?

Kublai: Yeah, that’s how PyTorch automatically computes gradients so I don’t have to do it myself and relive high school calculus class, right?

SHL: Exactly! So one way to implement AD, specifically forward-mode AD, is using dual numbers. Here’s an example by Alan Edelman using the Julia programming language. So imagine we have an implementation of the Babylonian square root algorithm, and we’d like to take its derivative, which is $\frac{1}{2\sqrt{x}}$.
function BabylonianSqrt(x; N = 10)
    t = (1+x)/2
    for i = 2:N; t = (t+x/t)/2 end
    t
end

Kublai: Oh no, I’m scared already.

SHL: Don’t worry! All we need are these eight lines:

import Base: +, /, convert, promote_rule
struct D <: Number
    f::Tuple{Float64,Float64}
end
+(x::D, y::D) = D(x.f .+ y.f)
/(x::D, y::D) = D((x.f[1]/y.f[1], (y.f[1]*x.f[2]-x.f[1]*y.f[2])/y.f[1]^2))
convert(::Type{D}, x::Real) = D((x,zero(x)))
promote_rule(::Type{D}, ::Type{<:Number}) = D

And now we can compute $\sqrt{2} \approx 1.4142$ and the derivative at $x = 2$, which is $\frac{1}{\sqrt{2}} \approx 0.35355$

# Prints D((1.414213562373095, 0.35355339059327373))
BabylonianSqrt(D((2, 1)))

Kublai: What the heck? That’s insane!

SHL: Look closely. Those formulas should look familiar, right?

Kublai: Oh, that definition of division is the same as that rule from calculus! So that D tuple is storing both the value of the function and its derivative!

SHL: Exactly! Now can you implement - and * for D as well? (We didn’t need it above since BabylonianSqrt doesn’t use those operations.)

Kublai: Hmmm, maybe this?

import Base: -, *
-(x::D, y::D) = D(x.f .- y.f)
*(x::D, y::D) = D((x.f[1]*y.f[1], x.f[2]*y.f[1]+x.f[1]*y.f[2]))

SHL: Perfect! Now, look at this:
# Prints D((0.0, 0.0))
D((0,1)) * D((0,1))
In mathematical notation, we write D((a,b)) as $a+b\epsilon$. The code above shows that $\epsilon^2 = 0$. This is the key property of dual numbers, and the conventional way of defining them. It’s a bit like how we define $i$ as $i^2 = -1$.

Kublai: Wait, so if $\epsilon^2 = 0$, then does $\epsilon$ behave like an infinitesimal? Can we write \[\frac{df}{dx} = \frac{f(x+\epsilon)-f(x)}{\epsilon}?\]

SHL: Hold up! Dividing by $\epsilon$ is undefined since in our formula it would cause division by 0! However, we can salvage your formula by writing \[\frac{df}{dx}\epsilon = f(x+\epsilon)-f(x)\] which can be proven by Taylor expansion.

Kublai: That’s cool and all, but what does it have to do with differentiating data types?

SHL: OK, so back to data types. Here’s our list again, along with its mysterious power series:
inductive List x where
  | nil
  | cons (head : x) (tail : List x)
\[\mathsf{List}(x) = 1+x\mathsf{List}(x)\] Kublai, can you try explaining that equation?

Kublai: Uh… well, a list is a sum type where it’s either nil or a cons cell, so that kinda explains the $+$. And for the cons cell, it’s a product type, or tuple, of an element of type $x$ and another list.

SHL: Now can you think of a data type $\epsilon$ such that $\epsilon^2 = 0$?

Kublai: Hmmmm… that’s a really weird question. It would be like a value where if you make a tuple of it, it just annihilates itself.

SHL: So what would happen if you did \[\frac{\mathsf{List}(x+\epsilon)-\mathsf{List}(x)}{\epsilon}?\]

Kublai: Hey, isn’t that illegal? Well… I guess if we really tried to do it, we’d have a list where every element can either be type $x$ or $\epsilon$, but if you have a list with two or more $\epsilon$s, it’d annihilate itself! So you’re left with all lists that have at most one $\epsilon$, and then we subtract out $\mathsf{List}(x)$, so we get the lists with exactly one $\epsilon$ at some position. But then we divide by $\epsilon$, which removes the $\epsilon$ in our list and makes a hole! Wait, that’s kinda like two lists, one in front of the hole and one behind it!

SHL: Cool! That’s why the derivative of a data type is a “one-hole context”! Now the derivation we did was the complete opposite of rigorous, but it should hopefully give you some intuition for why the rules of calculus show up when dealing with something as discrete as a data type.
We can take the derivative of other data types too, not just lists. For instance, the derivative of a multiset is another multiset, so therefore it’s kinda like $e^x$. Similarly, a set is kinda like $2^x$. We can make all this rigorous and precise using the deep and beautiful theory of combinatorial species!

Kublai: Oh no, not those again! Anyways, thanks for the help, and I have to go hunt down that guy and demand my free grassy candy bars!

What’s the Derivative of a Data Type?

Further reading