Thursday 1 December 2011

An Important Discovery

While puzzling over the results of the Blue Coakley Experiment, I stumbled across an important and unexpected discovery.  When I plotted the cumulative distribution of (correct) solution times, aggregated over the first passes through all the Coakley problem batches, I noticed that it looked like an exponential curve.  (N.B. The cumulative distribution of solution times gives the probability that the solution time will be less than any chosen time value.)  I tried fitting an exponential curve to my results:

[Graph: cumulative distribution of correct solution times for the first passes through the Coakley problem batches, with the fitted exponential curve in green.]

I had not expected this!  I thought that the distribution of chess problem solution times would be highly dependent on the distribution of difficulty of the problems.  The problems at the beginning of Coakley’s book are significantly easier than those at the end, but that does not appear to have affected the result.  The green line on the graph is the exponential curve P = 0.764*(1 - exp(-t/21.0)), where t is the time in seconds.  I got much the same result with the Woolum Experiment:

[Graph: cumulative distribution of correct solution times for the Woolum Experiment, with the fitted exponential curve in green.]

For this graph, the green curve is P = 0.775*(1 - exp(-t/7.22)).  The problems in Woolum vary from mates in one to mates in three with more than one variation and the King in the middle of the board.  Again, the distribution of difficulty does not appear to have affected the result.

In general, the cumulative distribution of correct solution times appears to be of the form P = a*(1 - exp(-t/b)).  If I continue solving an infinite set of problems, in the same way, until I have (correctly or incorrectly) solved them all, a represents the fraction of the problems that I solve correctly, and b represents my average correct solution time.  (N.B. The mathematics of the exponential distribution is summarised by the Wikipedia article http://en.wikipedia.org/wiki/Exponential_distribution.)

I constructed a simple mathematical model to explain the graphs above.  I assumed that I had a fixed probability per unit time 1/b of solving a problem (irrespective of the time that I had already spent on that problem).  It follows that the probability that I will solve the problem (correctly or incorrectly) within time t is given by 1 - exp(-t/b).  I also assumed that I had a fixed probability a that my solution was correct.  The probability that I will find a correct solution within time t is then given by a*(1 - exp(-t/b)).
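
This model is easy to check by simulation.  Here is a minimal sketch in Python (numpy assumed; the parameter values are illustrative, taken from the Woolum fit above, and the variable names are mine):

    import numpy as np

    rng = np.random.default_rng(0)
    a, b = 0.775, 7.22        # illustrative values from the Woolum fit
    n = 100_000               # number of simulated problems

    solve_time = rng.exponential(b, n)   # memoryless solving at rate 1/b
    correct = rng.random(n) < a          # each solution correct with probability a

    # Empirical cumulative fraction of problems solved correctly by time t
    # (out of ALL problems), compared with the model a*(1 - exp(-t/b)).
    for t in (5, 10, 20, 40):
        empirical = np.mean(correct & (solve_time < t))
        model = a * (1 - np.exp(-t / b))
        print(f"t = {t:2d}s   empirical = {empirical:.3f}   model = {model:.3f}")

The two columns agree to within sampling error, which is all that the model claims.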

If I do not stop at the time limit, but continue until I have (correctly or incorrectly) solved all the problems, my average solution time will be b, but a fraction 1-a of the problems will have been solved incorrectly.  If I try again with the remaining problems, my average solution time will again be b, and a fraction (1-a)^2 of the original problems will remain to be solved correctly at the end.  If I repeat this process over and over again, until all the problems have been solved correctly, my average solution time will be:

b + b*(1-a) + b*(1-a)^2 + b*(1-a)^3 + ... = b/(1 - (1-a)) = b/a

This model therefore estimates my average time to find a solution that is always correct as b/a.  (N.B. I have assumed that I have perfect memory of my previous attempts to solve a problem, and that I do not benefit unfairly from knowing when my solution is wrong.)  I have also assumed that the tail of the distribution continues to follow the same exponential curve.  A more detailed study of my results shows that this assumption is over-optimistic.  Nonetheless, the model appears to be reasonably accurate for most of the problems in my problem sets.
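
The b/a estimate can be checked the same way (again a Python sketch with illustrative parameters): the number of passes needed for each problem is geometric with success probability a, and each pass costs an exponential time with mean b.

    import numpy as np

    rng = np.random.default_rng(1)
    a, b = 0.764, 21.0        # illustrative values from the Coakley fit
    n = 50_000                # number of simulated problems

    attempts = rng.geometric(a, n)    # passes needed until a correct solution
    total_time = np.array([rng.exponential(b, k).sum() for k in attempts])

    print(total_time.mean(), b / a)   # both close to 21.0 / 0.764 = 27.5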

The expression 1 - exp(-a*t/b) can be interpreted as the probability that I will find a solution that is always correct within t seconds.  Equivalently, it can be viewed as the cumulative probability distribution of the difficulty of the problems as measured by the time that it takes me to solve them correctly.  With this perspective, my failure to solve a problem is simply a consequence of not spending enough time on it.

If I work on a randomly selected fraction a of an infinite set of problems until I solve them all correctly, and ignore the others, the probability that I will solve a problem correctly within t seconds is again given by a*(1 - exp(-t/b)).  (N.B. Working on only a randomly chosen fraction a of the problems scales both the number of problems that I solve, and the time that I spend solving them, by a factor of a.)  This is the same result as I got when I tackled all the problems with a probability a of getting each one right.  Indeed, because there is a fixed probability per unit time that I will solve a problem, I get exactly the same result however I allocate my time between the problems.  Time management is futile here!  My time management does, however, affect the distribution of failures.  I can give up solving a problem whenever I choose, and greatly affect the failure distribution, without affecting the distribution of successful solutions.  The distribution of solutions (correct or incorrect) is the sum of the distribution of successes and the distribution of failures, so it too is affected by my time management.
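
The futility of time management can also be illustrated by simulation.  The sketch below (Python, under the model's memoryless assumption; the give-up policy is my own illustration) counts the solutions found in a fixed total working time when each problem is abandoned after a cutoff, and the count barely depends on the cutoff:

    import numpy as np

    rng = np.random.default_rng(2)
    b = 21.0           # mean solution time (illustrative)
    budget = 10_000.0  # total working time, in seconds

    def solutions_found(cutoff):
        # Work problems in sequence, abandoning each one after `cutoff`
        # seconds, and count solutions found within the total time budget.
        spent, solved = 0.0, 0
        while spent < budget:
            t = rng.exponential(b)   # time this problem would need
            if t <= cutoff:
                spent += t
                solved += 1
            else:
                spent += cutoff      # give up and move on to the next problem
        return solved

    # Whatever the cutoff, about budget/b = 476 solutions are found.
    for cutoff in (5.0, 21.0, 100.0, float("inf")):
        print(f"cutoff = {cutoff:6.1f}s   solved = {solutions_found(cutoff)}")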

[For a more detailed analysis see my later article: A Three Parameter Model.]

For computer programs, each doubling of the calculation speed increases the Elo rating by about 70 points, but I have not been able to find a corresponding number for human players.  My guess is that each doubling of the time that a human player has on the clock increases his rating by about 200 points (up to about the standard time limit, at least).  The only evidence that I was able to find is a clock simultaneous match between Topalov and the four-player Irish team: http://2seeitlive.co.uk/icu/   This match was drawn with a rating difference of 367 Elo points, so having a quarter of the time on the clock appears to have reduced Topalov’s rating by about 400 points.  This result is very interesting, but there were only four games, and data for 2000-2200 players playing clock simultaneous matches against four players rated 400 points lower would be more relevant to the average player.

If my time to solve a set of problems correctly goes down from t1 to t2, the number of times t2 needs to be doubled to get to t1 is log2(t1/t2).  This number can be fractional.  If each doubling corresponds to 200 rating points, my improvement is 200 * log2(t1/t2) rating points.  Let tx be the average time that players rated x take to solve the problem set correctly.  If I take time t, my performance is 200 * log2(tx/t) + x = -200 * log2(t) + 200 * log2(tx) + x.  (N.B. log2(t) = ln(t) / ln(2).)
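
In code, the performance estimate reads as follows (a sketch; the function name and the example numbers are hypothetical, and 200 points per doubling is only my guess):

    import math

    def performance(t, t_x, x, points_per_doubling=200):
        # Performance after solving the set correctly in time t, given that
        # players rated x take time t_x on average to solve it correctly.
        return points_per_doubling * math.log2(t_x / t) + x

    # Hypothetical example: if players rated 1800 average 40 minutes on the
    # set and I take 20, my estimated performance is 1800 + 200 = 2000.
    print(performance(20.0, 40.0, 1800))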

Consider two players who are identical, except that player 2 takes 20% less time than player 1.  Suppose that their distributions of (successful) solution times are given by P1 = 0.8*(1 - exp(-t/20)) and P2 = 0.8*(1 - exp(-t/16)) respectively.  The Elo formula gives the difference between their respective ratings and the rating of the problem set as -400*log(1/P1 - 1) and -400*log(1/P2 - 1).  (See my earlier article Rating Points Revisited.)  Here is a graph showing these rating point differences for solution times between 5 and 60 seconds:


[Graph: rating point differences relative to the problem set for the two players, for solution times between 5 and 60 seconds.]

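The curves can be reproduced in a few lines (a Python sketch; note that the log in the Elo formula is log base 10):

    import numpy as np

    t = np.arange(5, 61)
    P1 = 0.8 * (1 - np.exp(-t / 20.0))
    P2 = 0.8 * (1 - np.exp(-t / 16.0))

    # Elo difference between each player and the problem set.
    D1 = -400 * np.log10(1 / P1 - 1)
    D2 = -400 * np.log10(1 / P2 - 1)

    # The gap between the players is largest at intermediate times.
    for i in (0, 15, 25, 40, 55):   # t = 5, 20, 30, 45, 60 seconds
        print(f"t = {t[i]:2d}s   D1 = {D1[i]:6.1f}   D2 = {D2[i]:6.1f}"
              f"   gap = {D2[i] - D1[i]:5.1f}")
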
At first sight this graph looks barmy!  The rating difference between the players and the problem set increases as the time limit increases - but this makes sense, because the players’ scores increase when they have more time to think.  More surprisingly perhaps, the difference in the players’ ratings relative to the problem set is at a maximum at around 30 seconds, and smaller for both longer and shorter time limits.  Again, this makes sense.  If the players both have all day to think, but do not check their solutions any more thoroughly, they are both going to get 80% of the problems right.  It is also not too surprising that if they are both given very little time to think, their scores will be closer.  This method can give useful results, but it is less than ideal.

I used the method of least squares to fit exponential curves to my data points (P_i, t_i).  I tested the accuracy of the fit by calculating:

E = Sum_i { (P_i - a*x_i)^2 }

where x_i = 1 - exp(-t_i / b)

For a fixed value of b, E is at its minimum when:

dE/da = Sum_i { 2 * (P_i - a*x_i) * (-x_i) } = 0

Solving for a gives:

a = Sum_i { x_i * P_i } / Sum_i { x_i^2 }

I used a spreadsheet to calculate this value of a (and the corresponding value of E) whenever I entered a value for b.  I adjusted b manually until I found the value that minimised E.  This method worked better than the other curve fitting methods that I tried, and I used it to produce the graphs above.
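
The same procedure is straightforward to automate.  This sketch (Python with numpy; `times` and `n_total` are hypothetical inputs, namely the list of correct solution times and the total number of problems attempted) replaces my manual adjustment of b with a grid search:

    import numpy as np

    def fit_exponential(times, n_total):
        # Fit P = a*(1 - exp(-t/b)) to the empirical cumulative distribution
        # of correct solution times, as a fraction of all n_total problems
        # (so that a comes out as the overall success rate).
        t = np.sort(np.asarray(times, dtype=float))
        P = np.arange(1, len(t) + 1) / n_total

        best_E, best_a, best_b = np.inf, None, None
        for b in np.linspace(1.0, 120.0, 2000):  # grid search replaces hand tuning
            x = 1 - np.exp(-t / b)
            a = np.sum(x * P) / np.sum(x * x)    # optimal a for this b (dE/da = 0)
            E = np.sum((P - a * x) ** 2)
            if E < best_E:
                best_E, best_a, best_b = E, a, b
        return best_a, best_b

    # Example (hypothetical data): a, b = fit_exponential(correct_times, 120)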

19 comments:

  1. Interesting post, will take me some time to understand it.

    "For computer programs, each doubling of the calculation speed increases the Elo rating by about 70 points"

    The figure of 70 Elo points gained per doubling of calculation time is very old; it might not be precise with the new algorithms of our days.

    "but I have not been able to find a corresponding number for human players. My guess is that each doubling of the time that a human player has on the clock increases his rating by about 200 points (up to about the standard time limit at least). "

    I was looking for the speed-rating relationship too:

    http://aoxomoxoa-wondering.blogspot.com/2011/07/how-speed-influence-rating.html

    http://aoxomoxoa-wondering.blogspot.com/2011/07/tating-and-thinking-longer.html

  2. It is interesting that the 191 points from your Chess Tempo rating calculation is so close to my 200 point guess and the Topalov result. 200 points may be a little generous, but it feels about right. However, your calculation says that you have to think 5.5 times longer to gain 400 points, which I doubt. Successive doubling looks right to me.

  3. The different values are the result of the different sources of my "estimates".
    My Chesstempo value is empirical, but probably too low, because without the time pressure of standard problems you will get (on average) slower than necessary (you may do something else in between?). The other "number" is calculated from a rating system, but these rating systems have no hard empirical background (as far as I know). But both values are "close" to your ~200 points for twice as much thinking time (for tactics problems).

  4. I have a problem understanding your discovery. Somehow I have the impression that you are saying all problems are "equal":

    "I assumed that I had a fixed probability per unit time 1/b of solving a problem (irrespective of the time that I had already spent on that problem). It follows that the probability that I will solve the problem (correctly or incorrectly) within time t is given by 1 - exp(-t/b). I also assumed that I had a fixed probability a that my solution was correct. The probability that I will find a correct solution within time t is then given by a*(1 - exp(-t/b))."

    Is b = b(problem) = b(problem, tactician)? I think there are complicated problems and there are easy problems. (The same applies to a.)

    If there are different a's and b's, is your model still right?

  5. The values of b and a depend on the problem set, the player, and how much the player hurries. My accuracy, as measured by a, was a little higher for Woolum, and my speed, as measured by b, was almost three times faster. The value of b/a depends on the problem set and the player, but should be independent of how much the player hurries (to within the limits of accuracy of the model). I improved between Woolum and Coakley, so you could say that two different players tackled the two problem sets.

  6. How would you describe your discovery in plain English?

  7. Explain this article in plain English? That is a tricky one. If my readers understood arithmetic, but did not understand the concepts of random selection and probability, I could explain that the vertical axes on the first two graphs show the number of problems that I solved correctly within each time on the horizontal axis, divided by the total number of problems. I could show them how I constructed the graphs by sorting the solution times into ascending order, and setting up another column with the count divided by the total number of problems. I could set up a third column to show the green curve. If they asked what “EXP” meant, I could explain what happens if they accrue compound interest over increasingly short periods of time, but it would take me a long time to get to the end of the article! The article contains only what Americans would call High School Mathematics, but it is beyond the average school leaver.

    The article provides a method of assessing my progress at solving simple problems quickly. I believe that this method is an improvement on the previous methods that I have studied. If a computer based tactical trainer tells you when your solution is wrong, and makes you keep on trying until you get the right answer, the total time taken, averaged over the whole problem set, approximates to b/a. (The computer is helping you by telling you when you have got it wrong, but my method is also generous for the problems that I get wrong, as we will see next month.) I believe that I have cast doubt on the idea of treating problems as opponents and using a rating calculation. The simpler method of calculating ratios of average correct solution times and taking logs looks more convincing to me. What we need though is practical tests!

  8. There are two surprising statements in this article.

    1) The solving of a problem is memoryless, meaning that no matter how long you have thought about a problem, the chance of "solving" it in the next x (say 10) seconds is always the same. That is somehow strange, because during your analysis you "understand" a problem better and should come closer to a solution.

    2) Somehow the post says that all problems are of the same difficulty, at least all problems in one book, because a and b (they say how complicated "it" is) depend only on the set (book).
    With different difficulties among the problems, the distribution "should be" a hyper-exponential distribution.

    The solving of a problem is memoryless, meaning that no matter how long you have thought about a problem, the chance of "solving" it in the next x (say 10) seconds is always the same. That is somehow strange, because during your analysis you "understand" a problem better and should come closer to a solution.

    The longer you think on a particular problem, the nearer you are to the solution, but the chance (1 in 100, say) that you will find the answer in the next second is the same as it was in the first second. If there are 1,000 problems, you will solve 10 of them, on average, in the first second. (N.B. “On average” here assumes that you can keep rerunning the experiment, which is a mathematical idealisation.) In the second second there will be 990 problems left, so you will solve 9.9 problems, on average. By the time you get to the sixtieth second the number of problems will be seriously depleted (unless the dice roll in a strange way), and you will solve fewer of them in that second. This is the mathematics of radioactive decay. The problems that you solve in the first second will be obvious, and the problems that you solve in the sixtieth second will require thought, but the graph knows nothing about this. It just counts your successes, and happens to be exponential. (Perhaps the exponential graph can be explained by some underlying mental model, but I have not attempted to do this.)
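
    A couple of lines of Python make the depletion concrete (a sketch of the same arithmetic: 1,000 problems, each with a 1-in-100 chance of being solved per second):

        # Expected number of problems solved in second k, out of 1,000,
        # when each unsolved problem has a 1% chance of being solved per second.
        solved_in = lambda k: 1000 * 0.99 ** (k - 1) * 0.01
        for k in (1, 2, 60):
            print(f"second {k:2d}: {solved_in(k):5.2f} problems solved on average")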

    Somehow the post says that all problems are of the same difficulty, at least all problems in one book, because a and b (they say how complicated "it" is) depend only on the set (book).

    Clearly, the problems are not of the same level of difficulty. The value of a represents my overall success rate (the number I get right divided by the total number). The value of b represents the average time that I take to get a problem right (add up all the times and divide by the number solved).

  10. If you do a very simple problem (mate in one, KQ against K) over and over again (always as if for the first time: your memory will be erased after every trial by a superior outer-space creature ;) you will have a high "a", almost at 1, and a low "b" (maybe 3?).

    But there are complicated problems where you will get an "a" of, say, 0.5 and a b of 600.

    If each problem had an exponential distribution, then these two problems together would have a hyper-exponential distribution:
    http://en.wikipedia.org/wiki/Hyper-exponential_distribution

    I was unable to see how exponential distributions can "add up" to an exponential distribution if they do not all have identical a and b (but I have flu at the moment).

    (Maybe many exponential distributions can add up to one?)

    I made a plot of the cumulative distribution of all the blitz problems I did at CT (#=10278). I used the "av_sec of CT of today" (not my own), and the plot was not that close to an exponential distribution (f' was small and positive in the beginning, then rising quickly, somewhat more than here http://empiricalrabbit.blogspot.com/2011/04/tactics-performance-measurement_01.html, which is not very exponential either).

    I think the best method to measure improvement in tactics is to calculate the performance on CT blitz problems seen for the first time. (That's my method ;)

  11. My mathematics does not address why an exponential distribution arises; it is just an experimental fact for the problem batches in all my problem sets. An exponential fits my graph in Tactics Performance Measurement surprisingly well.

    You can make the value of a bigger by taking longer (i.e. by making the value of b bigger), but b/a should not depend on how careful you are (within reason).

    Why do I get an exponential? For these simple problems my strategy is to look around the board for tactical possibilities and try them one at a time. Perhaps it takes me a second to try each one, and everything I try has a fixed probability of working. Even if the problem is very hard, the first thing I try might work. If it is easy, it might still take me many tries to find the solution. There just is not time to look at the position properly! As I have said, the real stinkers take longer than the exponential predicts. One of Coakley’s endgame exercises is a related squares problem that he takes a whole page to solve, filling the whole board with letters!

  12. We discussed the matter here, too.
    http://chesstempo.com/chess-forum/chess_tactics_discussion/solving_time_rating-t3200.0.html;msg28120#new

    I think aox is right that if you do CT standard puzzles, you are under less time pressure and can relax and take your time (doing things in between). This would actually indicate that if you put yourself under pressure to solve a puzzle quickly, you gain even more rating points if you double the time.
    I can see the statistics, and see that both of you found 200 Elo points. However, even though I can't prove you wrong, come on: 200 Elo points seems far too much. I am usually always late. (This is part of my personality, and I found out that it can be caused by a genetic disorder; it is like somebody who has a bad sense of orientation, or can't remember faces well. There are also people who are always late, like me.) And because I am always late, I naturally came late for my tournament games too. I often had the situation where I started with almost half the time of my opponent. Though I believe this is a disadvantage, I very much doubt it made the chances of a 200-Elo lower rated opponent equal.
    - - - -
    Something different: like you (Bright Knight), I am very convinced that getting patterns into my memory is the key to success.
    Unlike you, I don't repeat all puzzles, but only those puzzles I don't know well. So I concentrate on puzzles I got wrong, or puzzles I could not do fast. That is "deliberate practice":
    http://en.wikipedia.org/wiki/Deliberate_practice#Deliberate_practice
    (You need to scroll down to "deliberate practice".)
    It does not make much sense to me to train on puzzles I can already do instantly. The patterns of the puzzles I am good at are already stored in my LTM. No need to train them.
    In everything else, "my" training is pretty much your training. I am not sure I can call it "my" training. However, I got the main ideas from Temposchlucker, who isn't following his own ideas, and I got tremendous help from aox, who doesn't train like this either (despite "knowing" better).
    I'd say aox pretty much formulated the ten commandments of the holy chess bible:
    http://aoxomoxoa-wondering.blogspot.com/2011/07/learning-improving-without-repetition.html

    Point 10 of aox's hypothesis is what I am trying to say: don't just repeat stuff; instead, concentrate on stuff you can't do. At the same time, I have to admit that this is very hard training, because it seems to demand a high tolerance for frustration. You feel stupid when you get easy puzzles wrong, or do them slowly, again and again.
    I train with Chess Tempo puzzles only, because there I can measure the time I needed to solve a puzzle.
    Nevertheless, from time to time I repeat everything (also the puzzles I could do fast) to ensure that I can still do them fast. If not, they automatically fall into the category of puzzles that need repetition.

  13. @Munich
    "At the same time, I need to admit, that this is a very hard training because it seem to demand a high tolerance for frustrations. You feel stupid when you do easy puzzles again and again wrong or slow."

    I think it's not necessary to do many hundreds of problems a day, over and over again.
    I think something like 50-100 a day is more than enough to improve (thanks to Empirical Rabbit!!).
    Mix in some other CT tactics (blitz) and/or games and/or endgames...

  14. Thank you very much for the link.

    Richard is right in saying that some players are better at blitz chess than standard time limit chess, and vice versa. What we want here is the improvement gained by having more time, averaged over many players. From a training point of view, it would be more useful to know how many real world points you are likely to gain from doubling your speed at the relevant time limit. For solving problems, what we would most like to know is how your percentage score is affected by the time you have, for problem sets that are statistically as alike as we can make them. (This is not the same as comparing your performance on the problems that happen to be easy for you in the problem set with your score on those that happen to be hard for you.)

    200 Elo points is only a guess, supported by some very shaky evidence. If you recruit four players rated 400 points lower from the same rating pool, put 60 minutes on their clocks and 15 minutes on yours, and tell us the score, that would be a big step forward. Playing four games against them would be more evidence than we have so far. If several people did that, we would really be making progress!

    I believe that pattern memory is important, but calculation ability is also important. I am currently training on harder problems without a formal time limit and writing my solutions down. I also write down my reason for my failure when I do not get the entire solution in the book.

    I have consistently found that repeating the problems that give me difficulty more often than those that cause me less difficulty does not work for me. Having a pattern in my LTM is not enough if I am too slow at finding it, or am not sufficiently reliable. Spending most of my time on problems that are too difficult for me to find over the board is not productive. Yes, I need to tackle some problems that stretch me, but I have not found it productive to repeat them after a short time interval.

    It is strange that Temposchlucker gave up what gave great results for him, which was solving 10,000 problems without repetition. If he went through those 10,000 again, I expect that he would improve.

  15. Didn't Tempo repeat these "10,000" 7 times?
    His improvement was very little.

    I believe Tempo had "bad" success in 7 x "10,000" because:
    - he did not check what he did wrong when he failed a puzzle.
    - he did not concentrate on his slow solves.
    - he did them too fast (the CTS tempo is too fast).
    - his set size was too big, so the first repetition set in way too late.
    - the puzzles were ultra-easy (a consequence of the very tight time constraints).
    - in the end he memorized very little (solving too quickly will not leave a big impression in your memory, and gives little understanding of the pattern).
    - his 10,000 set was not related by theme tags (with my mate_in_2 set I see a lot of similar typical patterns, making it easier for me to memorize a principle/pattern).
    - Tempo did not do them sorted by rating (similar patterns have similar ratings; also, it is better to know what difficulty to expect. It is not important that you challenge your STM to the maximum, only that you understand the concepts and memorize them. If you have the "hint" of the rating, and the "hint" of what tag to expect (for instance a fork), this won't harm the transfer into the LTM).

    So many things Tempo actually did wrong.
    And even despite all of this, I would still expect him to improve if he manages to memorize the patterns in the end.

    Tempo is doing difficult puzzles now too, the same way as you. He writes his solutions down and thinks afterwards about why he failed them.

    I discussed the matter thoroughly with Tempo.
    In the end I came to this conclusion:
    difficult puzzles consist of several easy patterns. Since there are side variations to check, where these "easy tactics" are hidden, it is enough to fail just one of the easy tactics to fail the whole puzzle. Furthermore, in a side variation at move 2, 3 or even 4 you will only spend a very few seconds per position. That is the nature of side variations: you have the main idea pretty soon, and then you do blunder checking and check for anti-tactics. But on these checks you usually spend only a few seconds. That is why you need to be able to spot easy tactics in side variations very fast.

    Training on difficult puzzles will transfer patterns into the LTM in the end, too. I just wonder which is the quicker way: doing a few hard puzzles, or tons of easy puzzles that are easy for others but not for you. I strongly recommend easy puzzles (puzzles you failed, or did slowly despite their being very easy).

    @aox:
    I do hundreds of puzzles. If I had a well assorted set, I could probably do fewer. However, I want to make sure I cover everything. So if CT offers 1000 discoveries in the range of 1150-1475, then I need to do several hundred per day; otherwise the repetition sets in way too late.

    It is much more fun to repeat what you know than to "chew" and "bite" on previously slowly solved puzzles or failures. So deliberate practice is frustrating. Nevertheless, I believe these are the most valuable puzzles to repeat.

  16. As I understand it, he improved by 170 points by solving 11,000 problems once:

    http://temposchlucker.blogspot.com/2005/03/what-i-did-and-what-i-am-doing.html

    He later went on to do many other things that did not work (indeed he went backwards), and he has offered endless speculations as to why. What he has not done is return to what worked for him.

  17. Tempo did much more tactics than only these 7 times 10,000 puzzles at CTS, but I think his training was way less effective than Empirical Rabbit's method. And Tempo is not trying to measure his progress at all exactly (which is not that easy; Rabbit and I did some "research" on this).


    @Munich: Search for a set of your preferred pattern and strength. Then give 100 of them a self-made tag, and create a set with all problems carrying this special tag; you will have the set you want and need. (Some handwork, but OK.)
    You can redo your small set, and create a new one after, say, 4 repetitions.
    I really think it's important that you continue doing problems at your strength. I suspect your mind will get lazy in calculation if you "only" do "silly" problems.

  18. Bright Knight: thanks for the link. I started to read the many blog posts Tempo wrote a long time ago. They are still a good read.

    aox: if you look at jws at CT, for instance, I can't see a sign of lazy calculation.
    But true, after I have done tons of silly problems I feel worn out, tired, and dull. Nevertheless, I still seem to be able to do somewhat more difficult stuff. Maybe I need a warm-up of a few hundred puzzles that are at my strength, but after that it should always be OK.
    The skill of calculation should not get worse even if I do not use it for some months. It is like riding a bike: even after a long winter, you will still be able to ride a bike in summer (disclaimer: of course only valid if you were able to ride a bike before).

    My preferred patterns to learn seem to be the patterns I did not find (= failed puzzles or slowly solved easy puzzles). Or are there any others?
    My preferred strength is of course that of a GM!
    Shall I now do rated puzzles at a CT blitz rating of 2300++?
