My earlier article, An Important Discovery, described a two parameter model for the distribution of correct solution times for solving chess problems. This article extends that model to three parameters, to enable the distribution of failures to be modelled accurately. I will illustrate this model using the results from my first pass through all ten problem batches in the Blue Coakley Experiment. The graph below shows the cumulative distributions of solution times (in blue), correct solution times (in green) and failed solution times (in red):

The least squares best fit to the cumulative distribution of solution times (correct or incorrect) is Psolved = 1 - exp(-t/27.1), where t is the time in seconds (this curve is shown in grey). 27.1 seconds represents my average time to find a solution, if we extend the curve to infinity. I will call this value c. The least squares best fit to the cumulative distribution of successful solution times is Pcorrect = 0.764*(1 - exp(-t/21.0)). This curve is also shown in grey. The values 0.764 and 21.0 represent my success rate and average correct solution time, if we extend the curve to infinity. I will call these values a and b respectively. The cumulative distribution of failures (shown in red) is obtained by subtracting the distribution of correct solutions from the distribution of all solutions, and its theoretical curve (shown in grey) is obtained in the same way from the two fitted curves.
In my earlier article, An Important Discovery, I proved that the two parameter model estimates my average time to find a solution that is always correct as b/a = 21.0 seconds / 0.764 = 27.5 seconds. However, according to the blue curve above, my average solution time, with failures, is 27.1 seconds, which is only marginally smaller than b/a. This small difference would not be enough to allow me to complete all my failed solutions correctly. There is clearly something wrong with the two parameter model.
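The arithmetic behind this comparison can be sketched in a few lines of Python. The variable names are illustrative, and the last figure (the margin spread over the failed fraction 1 - a) is an added illustration rather than a number taken from the experiment:

```python
# Fitted values quoted in the text.
a = 0.764   # success rate, extrapolated to infinite time
b = 21.0    # average correct solution time, seconds
c = 27.1    # average solution time (correct or incorrect), seconds

# Two parameter model: average time per problem if every solution were correct.
always_correct_time = b / a

# Extra time per problem that the two parameter model leaves for fixing failures.
margin = always_correct_time - c

# Assumption for illustration: share that margin among the failed fraction (1 - a).
margin_per_failure = margin / (1.0 - a)

print(f"b/a                = {always_correct_time:.1f} s")
print(f"c                  = {c:.1f} s")
print(f"margin per problem = {margin:.2f} s")
print(f"margin per failure = {margin_per_failure:.2f} s")
```

On these numbers the model leaves well under a second per problem, which is the sense in which the difference is too small to turn the failures into correct solutions.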
The average values of my solution times within the time limit give an important clue. My average time to solve a problem (correctly or incorrectly) within the time limit was 20.1 seconds. My average time to solve a problem correctly within the time limit was 18.1 seconds, and my average time to solve a problem incorrectly within the time limit was 29.2 seconds. (The values calculated from the theoretical (grey) curves on the graph above closely match these numbers.) I was clearly taking longer on the harder problems that I was getting wrong, than on the easier problems that I was getting right.
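One way to get averages of this kind from the fitted curves is to take the mean solution time conditional on finishing within the 60 second limit. The sketch below does this with the closed-form integral of t times an exponential density. It is only a plausible reconstruction: the original calculation is not shown, so the helper names and the method itself are assumptions.

```python
import math

a, b, c = 0.764, 21.0, 27.1   # fitted values from the text
LIMIT = 60.0                  # time limit in seconds

def cumulative(scale, mean_time, limit):
    """scale * (1 - exp(-limit/mean_time)): fraction finished by the limit."""
    return scale * (1.0 - math.exp(-limit / mean_time))

def time_integral(scale, mean_time, limit):
    """Integral of t * scale * exp(-t/mean_time) / mean_time from 0 to limit."""
    return scale * (mean_time - (mean_time + limit) * math.exp(-limit / mean_time))

solved_mass = cumulative(1.0, c, LIMIT)
correct_mass = cumulative(a, b, LIMIT)
failed_mass = solved_mass - correct_mass      # failures = solved - correct

solved_time = time_integral(1.0, c, LIMIT)
correct_time = time_integral(a, b, LIMIT)
failed_time = solved_time - correct_time

mean_solved = solved_time / solved_mass
mean_correct = correct_time / correct_mass
mean_failed = failed_time / failed_mass

print(f"mean time, any solution:       {mean_solved:.1f} s")
print(f"mean time, correct solution:   {mean_correct:.1f} s")
print(f"mean time, incorrect solution: {mean_failed:.1f} s")
```

Under these assumptions each of the three model averages comes out within about a second of the measured values quoted above, and in the same order: failures slowest, correct solutions fastest.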
I decided to divide the 60 second time limit into 5 second intervals, and work out the number of problems that I solved within each successive interval, and the corresponding number of successes:

Clearly, the number of problems that I solved within each successive time interval fell steadily. Overall, I got 660*100%/805 = 82% of my solutions right, but my success rate declined as the solution time increased. The next chart shows my success rate for each of the five second intervals:

It is very clear from this graph that my success rate fell with time. The average value for the 5 second bands is 75.5%, which is roughly the same as the value of a (76.4%) shown by the horizontal red line.
The mathematical form of my success rate curve is already determined by the two cumulative probability distributions:
Psolved = 1 - exp(-t/c)
Pcorrect = a*(1 - exp(-t/b))
The corresponding probability distributions are:
dPsolved/dt = exp(-t/c)/c
dPcorrect/dt = a*exp(-t/b)/b
My success rate is therefore given by:
(dPcorrect/dt) / (dPsolved/dt) = a*c*exp(-(1/b-1/c)*t)/b
(This calculation divides the 60 seconds into an infinite number of infinitely narrow intervals, rather than the twelve 5 second intervals above.) This calculation indicates that my success rate dropped off exponentially with time.
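To see how the instantaneous success rate relates to the 5 second bands, the sketch below evaluates both from the fitted curves. The function names are illustrative. The band value is the model's prediction (correct solutions in the band divided by all solutions in the band), not the measured data.

```python
import math

a, b, c = 0.764, 21.0, 27.1   # fitted values from the text

def success_rate(t):
    """Instantaneous success rate: (dPcorrect/dt) / (dPsolved/dt)."""
    return (a * c / b) * math.exp(-(1.0 / b - 1.0 / c) * t)

def band_success_rate(t0, t1):
    """Model success rate over the band [t0, t1]."""
    correct = a * (math.exp(-t0 / b) - math.exp(-t1 / b))
    solved = math.exp(-t0 / c) - math.exp(-t1 / c)
    return correct / solved

# One row per 5 second band, as in the chart above.
for start in range(0, 60, 5):
    end = start + 5
    print(f"{start:2d}-{end:2d} s: band {band_success_rate(start, end):.3f}, "
          f"rate at start {success_rate(start):.3f}, "
          f"rate at end {success_rate(end):.3f}")
```

Because the instantaneous rate falls steadily, each band value lies between the rate at the start of the band and the rate at its end.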
Why was b/a nearly as large as c in this experiment? This happened because the problems that I solved incorrectly took me almost as long as the two parameter model allows for me to solve them correctly. If I had continued with these problems, I would eventually have got them all right, and they would then appear on my cumulative distribution of successful solutions, but with longer solution times than this model implies. The tail end of this cumulative distribution must rise less rapidly than this model predicts. Nonetheless, I solved 73% of the 900 problems correctly within the time limit, and the simple model is a good match up until the time limit. The remaining 27% of the problems would take longer to solve than the two parameter model predicts.
The cumulative probability distributions Psolved and Pcorrect can be used to calculate the average time that I used in solving problems up to any time t. Let N be the number of problems. The expected time used on the problems that I had solved (correctly or incorrectly) up to time t is N times the integral of u*exp(-u/c)/c with respect to u from 0 to t. Integrating by parts gives:
N*[c - (c + t)*exp(-t/c)]
The expected time used on the problems that I had not solved by time t is:
N*t*(1 - (1 - exp(-t/c)))
The expected total time used on all N problems up to time t is the sum of these expressions:
N*[c - (c + t)*exp(-t/c) + t*(1 - (1 - exp(-t/c)))]
The average time T used per problem up to time t is therefore:
c - (c + t)*exp(-t/c) + t*(1 - (1 - exp(-t/c))) =
c - c*exp(-t/c) - t*exp(-t/c) + t - t + t*exp(-t/c) = c - c*exp(-t/c)
T = c*(1 - exp(-t/c))
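The simplification can be checked numerically. The sketch below compares the closed form with a direct evaluation of the two pieces: a midpoint-rule integral for the problems already solved, plus t times the unsolved fraction. The function names and the step count are arbitrary choices made for this check.

```python
import math

c = 27.1   # fitted average solution time from the text, seconds

def average_time_used(t):
    """Closed form: T = c*(1 - exp(-t/c))."""
    return c * (1.0 - math.exp(-t / c))

def average_time_used_numeric(t, steps=20_000):
    """Midpoint-rule integral of u*exp(-u/c)/c over [0, t],
    plus t times the fraction of problems still unsolved at time t."""
    h = t / steps
    solved_part = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        solved_part += u * math.exp(-u / c) / c
    solved_part *= h
    unsolved_part = t * math.exp(-t / c)
    return solved_part + unsolved_part

for t in (10.0, 30.0, 60.0):
    print(f"t = {t:4.0f} s: closed form {average_time_used(t):.4f}, "
          f"numeric {average_time_used_numeric(t):.4f}")
```

The same result also follows from integrating the unsolved fraction exp(-u/c) from 0 to t, which is the standard formula for the expected value of a time capped at t.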
The fraction of the problems that have been solved correctly by time t is:
s = a*(1 - exp(-t/b))
exp(-t/b) = exp[(-t/c)*(c/b)] = [exp(-t/c)]^(c/b) = (1 - T/c)^(c/b)
so that:
s = a*[1 - (1 - T/c)^(c/b)]
The graph below plots s against T (in green), and compares the results with the corresponding experimental values (in red, see my earlier article Rating vs. Time on the Clock):

s = 0 when T = 0, and s = a when T = c (T approaches c as t tends to infinity). a/c represents the average fraction of the problems solved correctly per unit of time used, over the whole range up to T = c.
ds/dT = -a*(c/b)*(1 - T/c)^(c/b - 1)*(-1/c) = (a/b)*(1 - T/c)^(c/b - 1)
When T = 0, this becomes a/b, which represents the fraction of the easier problems solved per unit time. The curve for s becomes horizontal when T = c. Solving the equation for s in terms of T for T gives:
T = c*[1 - (1- s/a)^(b/c)]
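The formulas for s, its slope and its inverse can be checked against one another. The sketch below uses the fitted values of a, b and c. The function names are illustrative, and the numerical derivative is only a sanity check on the algebra.

```python
import math

a, b, c = 0.764, 21.0, 27.1   # fitted values from the text

def fraction_correct(T):
    """s = a*[1 - (1 - T/c)^(c/b)], for 0 <= T <= c."""
    return a * (1.0 - (1.0 - T / c) ** (c / b))

def slope(T):
    """ds/dT = (a/b)*(1 - T/c)^(c/b - 1)."""
    return (a / b) * (1.0 - T / c) ** (c / b - 1.0)

def average_time_for(s):
    """Inverse: T = c*[1 - (1 - s/a)^(b/c)], for 0 <= s <= a."""
    return c * (1.0 - (1.0 - s / a) ** (b / c))

# End points of the curve.
print(fraction_correct(0.0), fraction_correct(c))   # 0 and a
print(slope(0.0), a / b)                            # initial slope equals a/b
print(slope(c))                                     # horizontal at T = c

# Round trip T -> s -> T, and a central-difference check on the slope.
for T in (5.0, 10.0, 20.0):
    s = fraction_correct(T)
    h = 1e-6
    numeric = (fraction_correct(T + h) - fraction_correct(T - h)) / (2 * h)
    print(f"T = {T:4.1f}: s = {s:.4f}, T back = {average_time_for(s):.4f}, "
          f"slope = {slope(T):.5f}, numeric slope = {numeric:.5f}")
```

Plugging in T = c*(1 - exp(-60/c)), the average time used at the 60 second limit, reproduces s = a*(1 - exp(-60/b)), so the curve in T is consistent with the original curve in t.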
The three parameter model based on a, b and c provides a more accurate picture than the simpler model based on a and b. It is clear from the three parameter model that b/a does not accurately represent the average time per problem to solve all the problems correctly.