In Rating, Tme and Score, I suggested that a chess player’s rating increases by K rating points for every doubling of his time on the clock. I was not the first person to make this suggestion, see:
This link concludes that K is about 120 Elo points, based on time handicap matches. It does not, however, include the clock simultaneous match between Topalov and the four player Irish team, see:
This match was drawn, with a rating difference of 367 Elo points. Including this result would have raised the estimated value of K a little.
In Rethinking Chess Problem Server Ratings, I suggested applying the K relationship to a set of problems of that are statistically representative of the moves of a chess game, rather than chess games themselves. In A Simpler Sever Rating Method, I suggested applying this relationship to a single problem or a set of problems at the same level of difficulty. I proposed the formula:
sU = 1/(1+10^[(RP - RU - K*log2(t/T))/400])
In this formula, sU is the user’s expected score, RP is the problem rating, RU is the user rating, t is the time taken on the problem, and T is the target time at which the user and problem ratings are valid. If the time that the user takes doubles, the rating of the problem is reduced by K rating points. This results in a K rating point reduction in the user’s rating. We can see this by solving the equation above for RU:
RU = RP - K*log2(t/T) - log(1/sU - 1)
The corresponding expected score for a problem is:
sP = 1 - sU = 1/(1+10^[(RU - RP + K*log2(t/T))/400])
If the user solves a set of problems with different ratings, and the time that he takes increases by a factor r, the ratings of these problems are all reduced by K*log2(r) rating points, in the formula for sU above. The user’s rating is reduced by K*log2(r) rating points as a result.
The graph below shows plots the expected score against the user rating for Michael de la Maza's last 19 games (in red), and a single problem approximation to this curve (in green):
(See Rating by Expected Score.) The curve for a single problem satisfies the Elo formula, but the curve for the last 19 games does not. If the Elo formula (without time adjustment) applies to single opponents or problems, it does not, in general, apply to sets of problems or opponents.
In Rating, Time and Score, I showed that if we assume both the K formula and the Elo formula, it follows that:
t = h*[s/(1-s)]1/k
In this formula, t is the time taken, h is the time to score 50%, s is the score, and k = K/(400*log(2)). This is a very useful formula. However, if it applies to individual problems, it does not (in general) apply to sets of problems. In particular, it does not apply to sets of problems that are representative of the positions in a chess game.
Applying the Elo formula individual problems, as I proposed in A Simpler Server Rating Method, provides a very simple, logical and mathematically sound rating method that takes time into account. All we need to do is:
(1). Subtract K*log2(t/T) from the problem ratings, when calculating user ratings.
(2). Add K*log2(t/T) to the user ratings, when calculating problem ratings.
We can then use a standard rating method that does not take time into account. If we calculate the user ratings from the problem ratings and then recalculate the problem ratings, we will get back to where we started; provided that the rating method used is mathematically consistent. Few rating systems are mathematically consistent, but my Rating by Expected Score method is very simple and does the job.
Existing chess problem servers adjust the score that a user receives for tackling a single problem according to that time that he took on that problem. This makes superficial sense if the user solves the problem correctly (the longer he takes the lower the score that he receives), but runs into difficulties if the user fails to solve the problem correctly. Here too, the longer he takes, the worse his result, and the lower the score that he deserves. If he instantly rejects a problem without even looking at it, he does not deserve to be penalised at all. On the other hand, if he spends all day failing to solve a simple problem, he deserves to be penalised heavily. The existing servers give him a zero score for a failure, however long he takes. This does not make sense, but we cannot give him a negative problem score. I believe that we have to cross this approach off our list.
In Rethinking Chess Problem Server Ratings, I tried to get round this problem by adjusting the users’ scores on sets of problems rather than on individual problems, and using these adjusted scores to calculate the user ratings. Specifically, I suggested adjusting a user’s score to what it would have been if his average time taken was equal to a fixed target time. I also suggested adjusting a problem‘s score to what it would have been if the average time taken by the users on that problem was equal to the fixed target time. I ran into difficulties, however, when I tried to implement this method. I found that successively calculating the user ratings from the problem ratings, and the problem ratings from the user ratings did not converge to a consistent solution. I believe that we also have to cross this approach off our list.
My scheduling method (see Rethinking Chess Problem Sever Ratings) remains an interesting possibility. This method is based in the idea of using the server scheduling to ensure that all the users take the same average time per problem, and that the same average time is taken on every problem. This method is potentially very convincing, because it does not rely on any assumptions concerning the relationship between time and score, but it needs reliable problem ratings calculated by another method.
We are left with the Simpler Server Rating method. The only question that remains is whether we can validly apply the Elo formula to individual problems, rather than sets of problems representative of the positions in a chess game. Nonetheless, if we want to calculate ratings for individual problems, we have little choice but to make that assumption.