Recently, I noticed that “Bayesian”, as in Bayesian statistics or Bayesian inference, sounds very close to “Asian”. Pun potential!

“I prefer Bayesian models over Asian models.”

“All the countries are using so much game theory to strategize for the Asian Games… more like Bayesian games!”

“Consider a game where the players are NN Asians trying to get into MIT. Each Asian has a number randomly chosen uniformly between 00 and 11, and each Asian only knows their own number. MIT will accept the MM Asians who apply with the highest numbers. If an Asian applies and gets rejected, they receive 1-1 utility. If they apply and get accepted, they receive A1A-1 utility. Otherwise, they receive 00 utility. What’s the symmetric Bayesian Nash equilibrium, or should I say, Asian Nash equilibrium, in this game?”

Bad puns aside, let’s actually attempt this game theory problem. At a symmetric Bayesian Nash equilibrium, each player has the same strategy, and they don’t expect to get more utility by switching to a different strategy. Intuitively, it makes sense that this common strategy should be to apply if your number is above a certain cutoff kk, and otherwise don’t apply. It makes sense that the cutoff would probably be somewhere around NMN\frac{N-M}{N}.

Let’s say you are some player with number xx. Your expected utility increases as xx increases, because you will have a higher and higher probability p(x)p(x) of getting accepted. You should apply if your expected utility is greater than 00.

(A1)p(x)+(1)(1p(x))>0Ap(x)>1\begin{align} (A-1)p(x)+(-1)(1-p(x)) &> 0 \\ Ap(x) &> 1 \end{align}

Thus, you should apply if the probability of getting accepted p(x)p(x) is greater than 1A\frac{1}{A}. Thus, our cutoff kk is when p(k)=1Ap(k) = \frac{1}{A}.

Now what function is p(x)p(x)? You will be accepted if you are in the top MM highest players who apply. If all players follow the common strategy of applying above the cutoff, then you will be accepted if your number is in the top MM numbers, because all players will numbers above you will apply. Thus, p(x)p(x) is the probability that the MMth highest number among all NN numbers is less than or equal to xx.

Now time for some fun calculus! Assume that the MM highest number falls in a tiny interval [y,y+dy][y, y+dy] for some yxy \leq x. Then there are M1M-1 players with higher numbers, and NMN-M players with lower numbers. Each of the M1M-1 high players have a probability roughly 1y1-y of having a higher number. Each of the NMN-M lower players have a probability of roughly yy of having a lower number. Additionally, there are (M1)!(NM)!N!\frac{(M-1)!(N-M)!}{N!} ways to choose the M1M-1 high players and NMN-M low players among the NN players. Thus, the probability that the MM highest number falls in the interval [y,y+dy][y, y+dy] is roughly

(1y)M1yNMN!(NM)!(M1)!dy.\begin{equation} \frac{(1-y)^{M-1}y^{N-M}N!}{(N-M)!(M-1)!} dy. \end{equation}

We can now integrate this expression over all yy to compute p(x)p(x)!

p(x)=0x(1y)M1yNMN!(NM)!(M1)!dy.\begin{equation} p(x) = \int_0^x \frac{(1-y)^{M-1}y^{N-M}N!}{(N-M)!(M-1)!} dy. \end{equation}

Using the equation from above, p(k)=1Ap(k) = \frac{1}{A}, we get

1A=0k(1y)M1yNMN!(NM)!(M1)!dy.\begin{equation} \frac{1}{A} = \int_0^k \frac{(1-y)^{M-1}y^{N-M}N!}{(N-M)!(M-1)!} dy. \end{equation}

This integral can be computed because it’s just a polynomial, but then you have to find the root of a high-degree polynomial, so there’s probably no explicit formula for the cutoff kk. However, we can solve it numerically using SageMath!

The first three answers seem reasonable, but the fourth one is totally wack! Only one third of the players applying when half of them can get in? What’s going on?

The culprit is loss-of-precision. The left side blows up to gigantic numbers, while the right side becomes a high-degree polynomial. If we move the factorials to inside the integral, we get much more reasonable results.

Much better! Try testing out some other values of N,M,AN, M, A to see what happens.

Also, using math lingo, p(x)p(x) is actually the cumulative distribution for the beta distribution, so we can also use SageMath’s built-in functions to compute kk instead of writing out all the factorials and stuff.

As you can see, our own implementation still has some precision issues, and doesn’t quite match the results from using SageMath’s beta distribution, but it’s better than our first attempt. Lesson learned: leave numerical computation to the experts and use the built-in functions whenever possible.

Fun story about SageMath: I participated in the Purple Comet math team contest a few years ago, and this contest has a bizarre rule where you’re allowed to use computers as long as you don’t access the internet. Thus, I just wrote a ton of SageMath code and blasted through the problems, and somehow one year, my school’s team got all the questions correct and tied for first place.

Anyways, that’s all for this post, which may have actually been an excuse to mess around with KaTeX and SageMathCell. I’ve been using Typst (less boilerplate!) and WolframAlpha a lot lately and nearly forgot how to use LaTeX and SageMath, so this blog post took a lot of trial and error. Oh, and if you want to see more fun with SageMathCell, check out this nice article about why 37 is the median value for the second prime factor of an integer!