Asian Bayesian

Recently, I noticed that “Bayesian”, as in Bayesian statistics or Bayesian inference, sounds very close to “Asian”. Pun potential!

“I prefer Bayesian models over Asian models.”

“All the countries are using so much game theory to strategize for the Asian Games… more like Bayesian games!”

“Consider a game where the players are $N$ Asians trying to get into MIT. Each Asian has a number randomly chosen uniformly between $0$ and $1$ , and each Asian only knows their own number. MIT will accept the $M$ Asians who apply with the highest numbers. If an Asian applies and gets rejected, they receive $-1$ utility. If they apply and get accepted, they receive $A-1$ utility. Otherwise, they receive $0$ utility. What’s the symmetric Bayesian Nash equilibrium, or should I say, Asian Nash equilibrium, in this game?”

Bad puns aside, let’s actually attempt this game theory problem. At a symmetric Bayesian Nash equilibrium, each player has the same strategy, and they don’t expect to get more utility by switching to a different strategy. Intuitively, it makes sense that this common strategy should be to apply if your number is above a certain cutoff $k$ , and otherwise don’t apply. It makes sense that the cutoff would probably be somewhere around $\frac{N-M}{N}$ .

Let’s say you are some player with number $x$ . Your expected utility increases as $x$ increases, because you will have a higher and higher probability $p(x)$ of getting accepted. You should apply if your expected utility is greater than $0$ .

$\begin{align} (A-1)p(x)+(-1)(1-p(x)) &> 0 \\ Ap(x) &> 1 \end{align}$

Thus, you should apply if the probability of getting accepted $p(x)$ is greater than $\frac{1}{A}$ . Thus, our cutoff $k$ is when $p(k) = \frac{1}{A}$ .

Now what function is $p(x)$ ? You will be accepted if you are in the top $M$ highest players who apply. If all players follow the common strategy of applying above the cutoff, then you will be accepted if your number is in the top $M$ numbers, because all players will numbers above you will apply. Thus, $p(x)$ is the probability that the $M$ th highest number among all $N$ numbers is less than or equal to $x$ .

Now time for some fun calculus! Assume that the $M$ highest number falls in a tiny interval $[y, y+dy]$ for some $y \le x$ . Then there are $M-1$ players with higher numbers, and $N-M$ players with lower numbers. Each of the $M-1$ high players have a probability roughly $1-y$ of having a higher number. Each of the $N-M$ lower players have a probability of roughly $y$ of having a lower number. Additionally, there are $\frac{(M-1)!(N-M)!}{N!}$ ways to choose the $M-1$ high players and $N-M$ low players among the $N$ players. Thus, the probability that the $M$ highest number falls in the interval $[y, y+dy]$ is roughly

$\begin{equation} \frac{(1-y)^{M-1}y^{N-M}N!}{(N-M)!(M-1)!} dy. \end{equation}$

We can now integrate this expression over all $y$ to compute $p(x)$ !

$\begin{equation} p(x) = \int_0^x \frac{(1-y)^{M-1}y^{N-M}N!}{(N-M)!(M-1)!} dy. \end{equation}$

Using the equation from above, $p(k) = \frac{1}{A}$ , we get

$\begin{equation} \frac{1}{A} = \int_0^k \frac{(1-y)^{M-1}y^{N-M}N!}{(N-M)!(M-1)!} dy. \end{equation}$

This integral can be computed because it’s just a polynomial, but then you have to find the root of a high-degree polynomial, so there’s probably no explicit formula for the cutoff $k$ . However, we can solve it numerically using SageMath!

xxxxxxxxxx
 
from sage.symbolic.integration.integral import definite_integral
​
def f(N, M, A):
    k = var('k')
    return find_root(factorial(N-M) * factorial(M-1) / factorial(N) / A == definite_integral(x^(N-M)*(1-x)^(M-1), x, 0, k), 0, 1)
​
def test(N, M, A):
    print(f(N, M, A), n((N-M)/N))
​
print('Actual, Estimate')
test(10, 4, 10)
test(10, 2, 10)
test(10, 8, 10)
test(100, 50, 10)

The first three answers seem reasonable, but the fourth one is totally wack! Only one third of the players applying when half of them can get in? What’s going on?

The culprit is loss-of-precision. The left side blows up to gigantic numbers, while the right side becomes a high-degree polynomial. If we move the factorials to inside the integral, we get much more reasonable results.

xxxxxxxxxx
 
from sage.symbolic.integration.integral import definite_integral
​
def f(N, M, A):
    k = var('k')
    return find_root(factorial(N-M) * factorial(M-1) / factorial(N) / A == definite_integral(x^(N-M)*(1-x)^(M-1), x, 0, k), 0, 1)
​
def g(N, M, A):
    k = var('k')
    p = definite_integral(x^(N-M)*(1-x)^(M-1) * factorial(N) / factorial(N-M) / factorial(M-1), x, 0, k)
    return find_root(1 / A == p, 0, 1)
​
def test(N, M, A):
    print(f(N, M, A), g(N, M, A), n((N-M)/N))
​
print('Wrong, Fixed, Estimate')
test(10, 4, 10)
test(10, 2, 10)
test(10, 8, 10)
test(100, 50, 10)

Much better! Try testing out some other values of $N, M, A$ to see what happens.

Also, using math lingo, $p(x)$ is actually the cumulative distribution for the beta distribution, so we can also use SageMath’s built-in functions to compute $k$ instead of writing out all the factorials and stuff.

xxxxxxxxxx
 
from sage.symbolic.integration.integral import definite_integral
​
def f(N, M, A):
    k = var('k')
    return find_root(factorial(N-M) * factorial(M-1) / factorial(N) / A == definite_integral(x^(N-M)*(1-x)^(M-1), x, 0, k), 0, 1)
​
def g(N, M, A):
    k = var('k')
    p = definite_integral(x^(N-M)*(1-x)^(M-1) * factorial(N) / factorial(N-M) / factorial(M-1), x, 0, k)
    return find_root(1 / A == p, 0, 1)
​
def h(N, M, A):
    return RealDistribution('beta', [N-M+1, M]).cum_distribution_function_inv(1/A)
​
def test(N, M, A):
    print(f(N, M, A), g(N, M, A), h(N, M, A), n((N-M)/N))
​
print('Wrong, Fixed, Using beta distribution, Estimate')
test(10, 4, 10)
test(10, 2, 10)
test(10, 8, 10)
test(100, 50, 10)

As you can see, our own implementation still has some precision issues, and doesn’t quite match the results from using SageMath’s beta distribution, but it’s better than our first attempt. Lesson learned: leave numerical computation to the experts and use the built-in functions whenever possible.

Fun story about SageMath: I participated in the Purple Comet math team contest a few years ago, and this contest has a bizarre rule where you’re allowed to use computers as long as you don’t access the internet. Thus, I just wrote a ton of SageMath code and blasted through the problems, and somehow one year, my school’s team got all the questions correct and tied for first place.

Anyways, that’s all for this post, which may have actually been an excuse to mess around with KaTeX and SageMathCell. I’ve been using Typst (less boilerplate!) and WolframAlpha a lot lately and nearly forgot how to use LaTeX and SageMath, so this blog post took a lot of trial and error. Oh, and if you want to see more fun with SageMathCell, check out this nice article about why 37 is the median value for the second prime factor of an integer!