Empirical Rabbit: Rating, Time and Score Revisited

Some of the formulae in my recent articles look reminiscent of those those in my earlier article Rating, Time and Score. It turns out to be possible to derive the results in that article using the formula that I have been using for a user's expected score. I have also proved several important new results.

Score vs. Time

In the previous article, Distribution of Solution Times, I used the notation:

s_U = expected user score.

x_U = 10^(R_U/400), where R_U = user rating.

x_P = 10^(R_P/400), where R_P/400 = problem rating.

t = time taken by the user.

T = target time at which the user and problem rating are valid.

k = K/(400*log(2)), where K = rating points gained for each doubling in the thinking time.

With this notation, the formula for the expected score is:

s_U = 1 / (1 + 10^[(R_P - R_U - K*log₂(t/T))/400])

As shown previously, this reduces to:

s_U = x_U / [x_U + x_P / (t/T)^k]

When the user's actual score is 0.5 we have:

x_U / [x_U + x_P / (t_0.5/T)^k] = 0.5

x_U = x_P /(t_0.5/T)^k

t_0.5 = T * (x_P / x_U)^1/k

What an interesting result! The formula for the expected score becomes:

s_U = 1 / [1 + (x_P / x_U) / (t/T)^k] = 1 / [1 + (t_0.5/t)^k]

(t_0.5/t)^k = (1/s_U) - 1 = (1-s_U) / s_U

t_0.5 = t * [(1-s_U) / s_U]^1/k

This is the same result as I got in the earlier article. Rearranging gives:

t = t_0.5 * [s_U*(1-s_U)]^1/k

This equation gives us the relationship between the time taken on a problem and the expected score.

As s_U→ 1, t → ∞.

What this result is telling us that is that no matter how much time we give the user, he will never be able to solve all the problems correctly. This result applies irrespective of the user and problem ratings. It telling us that even Magnus Carlsen cannot solve a large set of beginners' problems with perfect accuracy, no matter how much time we give him! This result is based on just two assumptions, the Elo formula and the K relationship.

There is good evidence that the K relationship is accurate for computers, with a K value of about 60. Nonetheless, computers can solve beginners' problems very quickly, with 100% accuracy. It is clear that the Elo formula is at fault here. It assigns a zero probability to perfection. Nonetheless, computers can achieve perfection.

Human chess players do make mistakes. If we set Magnus the task of solving an unlimited number of beginner's problems, he would eventually make a mistake. There is, however, worse to come! See mean Solution Time - Asymptotic behaviour below.

Score vs. Time Graph

The graph below shows the curves for K=100 (blue) and K=200 (red), with t=100 seconds:

We can easily check a few points on these curves.

(1). For both K values, when the score is 50%, no adjustment is needed to the time, which remains at 100 seconds.

(2). If K=100, and the player becomes 100 points stronger, his score s goes up to 64% (according to the Elo formula). He is also twice as fast, but we can restore his score to what it was (50%) by halving the time we allow him to 50 seconds.

(3). If K=200, and the player becomes 200 points stronger, his score goes up to 76%. He is also twice as fast, but we can again restore his score to 50% by halving his time to 50 seconds.

(4). We can also see that if K=100, the time has to be doubled to bring a score of 36% up to 50%, as we would expect.

(5). Similarly, we can see that if K=200, the time has to be doubled to bring a score of 24% up to 50%.

Probability Distribution of the Solution Time

The relationship between score and time provides another method for finding ds_U/dt. For brevity let s = s_U and h = t_0.5 so that:

s = 1 / (1 + (h/t)^k)

ds/dt = [-1 / (1 + h/t)^k)²] * k * (h/t)^k-1 * (-h/t²)

= [1 / (1 + h/t)^k)²] * (h/t)^k * k / t

= [1 / (1 + h/t)^k)] * [(h/t)^k / (1 + h/t)^k] * k / t

ds/dt = s * (1-s) * k / t

This is the same result as I obtained in Distribution of Solution Times. We can easily see that the distribution is normalised:

₀∫^∞ ds/dt dt = s(∞) - s(0) = 1

ds/dt = (t/h)^k / [(t/h)^k + 1)]² * k / t

When t << h:

ds/dt ≈ k * t^k-1 / h^k

When t → ∞, ds/dt → 0. When k > 1 and t → 0, ds/dt → 0, but when k < 1 and t → 0, ds/dt → ∞. This odd looking result can be understood by examining the graph for s against t where h=100 and k=0.5:

The curve is vertical at the point where it touches the vertical axis, which does not appear to be a problem. I will now find the maximum of ds/st. Let x = t / h, and P = ds/dt.

P = x^k / (x^k  + 1)² * k / (x*h) = x^k-1 / (x^k  + 1)² * k / h

dP/dx = (k/h) * [(x^k  + 1)² * (k-1) * x^k-2 - x^k-1 * 2 * (x^k  + 1) * k * x^k-1] / (x^k  + 1)⁴

For a maximum, dp/dx =0:

(x^k  + 1)² * (k-1) * x^k-2 - x^k-1 * 2 * (x^k  + 1) * k * x^k-1= 0

(x^k  + 1) * (k-1) - x * 2 * k * x^k-1= 0

(k-1) * x^k + k - 1 - 2 * k * x^k = 0

(-k-1) * x^k + k - 1 = 0

x = [(k-1)/(k+1)]^i/k

t_max = h * [(k-1)/(k+1)]^i/k

The graph below plots t_max/h against k:

The mode t_max is always less than the median h.

Mean Solution Time - Asymptotic Behaviour

The mean time taken taken to solve a problem with perfect checking is:
₀∫^∞ t * ds/dt dt.

t * ds/dt = (t/h)^k / [(t/h)^k  + 1)]² * k

Let x = t / h:

t * ds/dt = x^k / (x^k  + 1)² * k

For x >> 1:

t * ds/dt ≈ x^-k

For b > a >>1 and k ≠ 1:

_a∫^b t * ds/dt dt ≈ [b^-k+1 - a^-k+1] / (-k+1)

This is finite when b tends to infinity for k > 1, but becomes infinite as b tends to infinity for k < 1. Consider k = 1. Let y = t/h +1.

t * ds/dt = (y-1) / y² = 1/y - 1/y²

₀∫^b t * ds/dt dt =  1∫^b/h+1 (1/y - 1/y²) * h * dy

∫ h*(1/y - 1/y²) dy = h*(ln(y) + 1/y) + constant

It is clear that this integral also becomes infinite as b tends to infinity. If K is less than 120.412, the the problems that the user finds the most troublesome will take him so long that even his average solution time will be infinite. This result, again, applies irrespective of the user and problem ratings. This is very unrealistic for Magnus Carlsen solving beginners' problems. I expect that K is substantially larger than 120.412 for human players. In practice, chess problem server users typically achieve only about 75% accuracy:

See http://chess.emrald.net/ctsTactHome.php. A normal distribution has a sharper cut-off (lower kurtosis) than a logistic distribution, and may be more realistic. Nonetheless, a logistic distribution should be adequate in practice, and results in great mathematical simplification. (As far as I know, all the popular existing chess problem servers use the Glicko rating system, which is based on the logistic distribution, see http://en.wikipedia.org/wiki/Glicko_rating_system.)

Mean Solution Time - Calculation

There is a trick to calculate the mean solution time:

(d/dt) (t*s) = t * ds/dt + s

₀∫^L t * ds/dt dt = L*s - ₀∫^L s dt = ₀∫^L (1 - s) dt

= ₀∫^L (h/t)^k / (1 + (h/t)^k) dt = ₀∫^L 1 / ((t/h)^k + 1) dt

Let x = t/h:

₀∫^L t * ds/dt dt = h * ₀∫^L/h 1 / (x^k + 1) dx

₀∫^∞ t * ds/dt dt = h * ₀∫^∞ 1 / (x^k + 1) dx

Let:

Q(k) = ₀∫^∞ 1 / (x^k + 1) dx

₀∫^∞ t * ds/dt dt = h * Q(k) = T * (x_p / x_u)^1/k * Q(k)

∫ 1 / (x + 1) dx = ln(x+1) + constant

Q(1) is infinite, as noted previously.

∫ 1 / (x² + 1) dx = actan(x) + constant

Q(2) = π / 2

Who would have thought that the mean time to solve a set of chess problems would depend on the ratio of the circumference to the diameter of a circle? I searched for this integral on the web and found:

http://www.math10.com/en/university-math/definite-integrals/definite-integrals.html

One of the formulae on that site gives:

Q(k) = π / (k * sin(π / k)

The graph of this function is shown below:

(The link above does not make it clear whether this function is valid for non-integer values of k. I found it to be accurate for non-integer values, but my numerical integration was not good when k is near to 1.) As k → ∞, Q(k) → 1. The mean solution time is therefore always larger than the median solution time h.

2 comments:

Temposchlucker25 September 2012 at 05:09
What this result is telling us that is that no matter how much time we give the user, he will never be able to solve all the problems correctly

That is what I found too. No matter how much time I use on ct standaard mode, I continue to make errors.
Geoff Fergusson25 September 2012 at 08:51
Yes, for human players that is entirely realistic. I doubt whether anyone is capable of never making a mistake. An infinite average solution time with perfect checking is more worrying, but I cannot prove that it is unrealistic on general grounds. I expect k is more than 1 for humans anyway, and then the problem does not arise.

Sunday, 23 September 2012

Rating, Time and Score Revisited

2 comments: