Since the calculation only gives ratings relative to an arbitrary base value, I normalised the ratings for each K value to the average rating for K=200 (the value used in the simulation). The chart shows that when the K value is too high, it increases the ratings of the harder problems (which take more time to solve) relative to the easier problems. A K value that is too low does the opposite. We can understand this result by examining the formula for a problem's expected score:
sP = 1 / (1 + 10^[(RU - RP + K*log2(t/T))/400])
(See A Simpler Server Rating Method.)
Rearranging (with log taken to base 10):
(RU - RP + K*log2(t/T))/400 = log(1/sP - 1)
RP = RU + K*log2(t/T) - 400*log(1/sP - 1)
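As a rough illustration, solving the expected-score formula for RP gives RP = RU + K*log2(t/T) - 400*log10(1/sP - 1). The Python sketch below evaluates both directions. The function names and the sample numbers (ratings, times, scores) are made up for the example, and t/T is treated as a plain ratio; this is not the code used for the simulation.

```python
import math

def problem_expected_score(ru, rp, k, t, T):
    """sP = 1 / (1 + 10^[(RU - RP + K*log2(t/T))/400])."""
    exponent = (ru - rp + k * math.log2(t / T)) / 400
    return 1 / (1 + 10 ** exponent)

def problem_rating(ru, s_p, k, t, T):
    """Inverse of the formula above, solved for RP."""
    return ru + k * math.log2(t / T) - 400 * math.log10(1 / s_p - 1)

# Round trip: recover the problem rating from the expected score.
s_p = problem_expected_score(ru=1500, rp=1600, k=200, t=60, T=30)
recovered = problem_rating(ru=1500, s_p=s_p, k=200, t=60, T=30)

# Same user and score, three K values (illustrative numbers only).
# With t/T = 2 the K term is +K; with t/T = 1/2 it is -K.
slow = [problem_rating(1500, 0.5, k, t=2, T=1) for k in (100, 200, 300)]
fast = [problem_rating(1500, 0.5, k, t=1, T=2) for k in (100, 200, 300)]
print(recovered, slow, fast)
```

With these numbers, raising K pushes the estimated rating up when log2(t/T) is positive and down when it is negative, so the spread between the two cases widens as K grows.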
The larger K, the larger the problem's estimated rating, and vice versa. The graph below shows the user ratings, again calculated with K=100 (red), K=200 (green) and K=300 (blue):
The user ratings are very nearly independent of the value of K used to calculate them! Indeed, the result is not materially changed by setting K=0, when there is no time factor at all. The reason for this result is very simple (but it took me a while to find it). The distribution of solution times in the simulation was the same for all the users, and there was virtually no difference in their average solution times. Time was not a factor in calculating the user ratings, so they are independent of K.
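A small sketch may make this concrete. It solves the expected-score formula for RU on a single result, RU = RP - K*log2(t/T) + 400*log10(1/sP - 1), which is a simplification of the full calculation. The helper names, the scores and the time ratios are hypothetical, chosen only to show the effect.

```python
import math

def user_rating(rp, s_p, k, time_ratio):
    """Expected-score formula solved for RU, with time_ratio = t/T."""
    return rp - k * math.log2(time_ratio) + 400 * math.log10(1 / s_p - 1)

def rating_gap(k, ratio_a, ratio_b):
    """Rating difference between two hypothetical users on the same problem."""
    user_a = user_rating(1500, 0.4, k, ratio_a)
    user_b = user_rating(1500, 0.6, k, ratio_b)
    return user_a - user_b

# Same time ratio for both users: the K term cancels in the difference.
same_times = [rating_gap(k, 1.5, 1.5) for k in (0, 100, 200, 300)]

# User A twice as fast as the reference, user B twice as slow:
# the gap now grows with K.
different_times = [rating_gap(k, 0.5, 2.0) for k in (0, 100, 200, 300)]
print(same_times, different_times)
```

When both users have the same time ratio, the gap between them is the same for every K, including K=0. It only depends on K once their solution times differ.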
It became obvious to me at this point that my simulation was not realistic. I ran another simulation, in which I set the time limit for each user to 240 / 2^[(Ru-1500)/K], where Ru is the user rating. This adjustment made all the users' target success probabilities about 0.77, and gave a wide variation in their average solution times. (I also reduced the standard deviation in the perceived rating adjustment from 50 points to 25 points, and rounded the solution times up to the next tenth of a second, rather than the next second.) The graph below shows the real user ratings used in the simulation (in blue) and the calculated ratings (in red):
The calculation, again, faithfully reproduces the ratings used in the simulation. The graph below shows the user ratings, again calculated with K=100 (red), K=200 (green) and K=300 (blue):
The user ratings are strongly dependent on K here. The higher the K value, the greater the benefit to the stronger users who were solving the problems faster, and the greater the penalty for the weaker users.
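For reference, the per-user time limit in this second simulation, 240 / 2^[(Ru-1500)/K], and the rounding of solution times up to the next tenth of a second can be sketched as follows. The function and parameter names are my own, and the default K=200 is taken from the value used in the simulation.

```python
import math

def time_limit(user_rating, k=200.0, base_limit=240.0, base_rating=1500.0):
    """Time limit 240 / 2^[(Ru-1500)/K]: halves for every K rating points."""
    return base_limit / 2 ** ((user_rating - base_rating) / k)

def round_up_to_tenth(seconds):
    """Round a solution time up to the next tenth of a second."""
    return math.ceil(seconds * 10) / 10

# Illustrative ratings only.
limits = {ru: time_limit(ru) for ru in (1300, 1500, 1700)}
print(limits)
print(round_up_to_tenth(12.34))
```

With K=200, a user rated 200 points higher gets half the time limit, which is what spreads out the users' average solution times.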
I tried to implement my suggestion in A Simpler Server Rating Method for calculating K, but without success. Using only the user's faster solution times did not materially affect his rating, and neither did using only his slower solution times. The same was true of the problem solution times. The best suggestion that I can currently make is to use the value of K that gives the best match to the users' ratings for playing chess games.