Empirical Rabbit: Rating by Maximum Likelihood

In this article, I highlight some limitations of simple rating systems and investigate an improvement. I apply this improved method to Michael de la Maza’a last 19 games, and find that my previous calculation significantly underestimates his final performance. (See my earlier article Michael de la Maza Statistics.) Practical rating systems typically:

(1). Calculate the average rating of the opposition faced by a player.
(2). Add a difference based on the player’s score against that opposition.

This approach is likely to be reasonably accurate, if all the opponents have ratings close to that of the player concerned. It can, however, be inaccurate if there are large rating differences. This limitation can be avoided by using the standard technique of maximum likelihood estimation.

Simple Elo Calculation

The USCF version of the Elo rating system estimates the score s_A (as a fraction between 0 and 1) of player A with rating R_A against an opponent with a rating R_B using the formula:

s_A = 1 / (1 + 10^[(R_B-R_A)/400])

Equivalently:

R_B - R_A = 400*log₁₀(1/s_A - 1)

See: http://en.wikipedia.org/wiki/Elo_rating_system.

If our score as against opponent i is s_i, our overall score against N opponents is:

s = Σ_i {s_i} / N

If we have a rating R, and our opponents have ratings R_i, the average of these ratings is:

R_ave = Σ_i {R_i} / N

Our rating R can be estimated as:

R = R_ave - 400*log₁₀(1/s - 1)

Likelihood

If the outcome of a game was always a win or a loss, the Elo formula would give the probability of winning an individual game. If we can count a draw as one point, and a win as two points, we can interpret the Elo formula as giving the probability of winning a single point. With this assumption, the probability that we will win a single point is:

1 / (1 + 10^[(R_i-R)/400])

The probability that we will lose this point is then:

1 / (1 + 10^[(R-R_i)/400])

Let:

S_i = 1 -- if we win point i, and

S_i = -1 -- if we lose point i.

The probability of obtaining our result for point i is then:

P_i = 1 / (1 + 10^[S_i*(R_i-R)/400])

On the assumption that each point is statistically independent, the probability of obtaining our complete set of results is:

P = Π_i {Pi} = Π_i {1 / (1 + 10^[S_i*(R_i-R)/400])}

Maximum Likelihood

We can now find the value of our rating R that maximises P:

ln P = -Σ_i {ln(1+ 10^[S_i*(R_i-R)/400])}

(1/P) dP/dR = -Σ_i {(1 / (1 + 10^[S_i*(R_i-R)/400]) * ln10 * 10^[S_i*(R_i-R)/400]) * S_i / 400}

(N.B. if y = 10^x, ln y = x ln10, y = e^(x ln10) and dy/dx = ln10 * e^(x ln10) = ln 10 * 10^x.)

P is at its maximum when the derivative dP/dR is zero, i.e. when:

Σ_i {S_i * (1 / (1 + 10^[S_i*(R-R_i)/400])} = 0

What a remarkably simple result! Here is a typical graph of this function:

Numerical Solution

We can solve this equation numerically using Newton’s method. Let:

f(R) = Σ_i {S_i * (1 / (1 + 10^[S_i*(R-R_i)/400])}

f’(R) = Σ_i {S_i * (-1 / (1 + 10^[S_i*(R-R_i)/400])^2 * ln10 * 10^[S_i*(R-R_i)/400] * Si / 400}

= Σ_i {(-ln10 / [(1+ 10^[(R-R_i)/400])*(1+ 10^[(R_i-R)/400])] / 400}

Another simple result! We can use the approximate formula discussed at the beginning of this article to obtain a first approximation for R. Newton’s Method then gives the next approximation as:

R’ = R + h

f(R) + h * f’(R) = 0

h = -f(R) / f’(R)

We can use this formula successively until sufficient accuracy is obtained. (N.B. This method does not always converge when the initial approximation is poor, but it doubles the number of significant figures on each iteration near the solution.)

More General Calculation

For a more general calculation, where there are no reliable ratings at the outset, see: http://www.edochess.ca/Edo.explanation.html. For the Bradley-Terry algorithm to solve this problem, see: http://www.stat.berkeley.edu/~aldous/Colloq/lange-talk.pdf.

Michael de la Maza

Here are the results for Michael de la Maza’s last 19 games:

1873 1 1607 1
1810 1 2281 0
1854 1 1836 1
1936 1 1813 1
1925 1 1821 1
1878 0.5 1952 1
1960 1 1853 0.5
1977 1 1948 0.5
1991 0.5 2531 0
1629 1

The graph above shows f(R) for these results. Here are my results from Newton‘s Method:
1 2 3 4 5
R 2149.349349 2186.589424 2188.682595 2188.689059 2188.689059
h 37.24007498 2.093170523 0.006463885 6.15177E-08 -2.44907E-13

The initial approximation given by the simple calculation is 2149, and the value given by the maximum likelihood method is 2189, which is significantly higher. MDLM's final performance was very nearly at the US National Master level of 2200.

3 comments:

Temposchlucker1 September 2012 at 05:38
Unexpecting things can happen during a tournament. An opponent can resign at move 5 and decide to go to the dentist. Your opponent can get a seizure and lose on time. Or they are called away for family problems. An unreasonable share of your high rated opponents are so impressed by your smell that they want to get it over with. Causing you to be temporarily overrated. I guess something like that has happened. For some people being overrated is a good moment to stop :)
Geoff Fergusson1 September 2012 at 08:27
The chances of that happening to you for 19 games are astronomical! I do not believe his story.
AoxomoxoA wondering2 September 2012 at 16:31
Tempo said:
An unreasonable share of your high rated opponents are so impressed by your smell ...
The smell of MDLM had to be impressiv with a coat on at that heat: http://i47.tinypic.com/oszp1f.jpg

Saturday, 1 September 2012

Rating by Maximum Likelihood

3 comments: