## Wednesday, 1 June 2011

### A Pattern Matching Model

The basic idea behind this section is a simple one.  If tactics patterns are selected at random from a small collection in a bucket, returning them to the bucket as we use them, we will soon learn them all.  If the patterns are selected at random from a large collection, it will take us a long time to learn them all.  If we make some reasonable assumptions, it is possible to estimate the number of patterns from our actual rate of progress.  We are going to be able to make only a rough estimate here, but a rough estimate is better than none at all, or somebody’s wild guess!

I will assume that the patterns are sampled with replacement (see the previous section) from a fixed number of such patterns, and that there is one pattern per problem.  I will also assume that if I know a pattern to the required standard, I can solve p% of problems based on such patterns in under five seconds at the first attempt; and if I do not know the underlying pattern, I will not be able to solve any of the problems in under five seconds.  Clearly, the value of p is going to depend on the difficulty of the problem set.  In the Woolum Experiment, I was able to solve about 75% of the problems in under five seconds when I was practicing them intensively.  This dropped to 67% in some cases when the repetitions became further apart, and I would not do so well with new problems sharing the same underlying patterns.  I believe that a reasonable value for p here is 60%.

Suppose that I am able to solve q% of the problems in under five seconds at the first attempt.  The fraction of the patterns that I know is then q% / p%.  If M selections are made from a bucket containing N patterns, the previous section tells us that the fraction of the patterns that I know will also be given by:

$$1 - \left(1 - \frac{1}{N}\right)^{M}$$
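As a sanity check on this formula, here is a minimal sketch that compares it with a direct simulation of drawing from the bucket with replacement. The function names and the values N = 1000 and M = 650 are illustrative choices of mine, not figures from the experiment.

```python
import random


def expected_fraction_known(n_patterns: int, n_draws: int) -> float:
    """Expected fraction of N patterns seen at least once in M draws with replacement."""
    return 1.0 - (1.0 - 1.0 / n_patterns) ** n_draws


def simulated_fraction_known(n_patterns: int, n_draws: int,
                             trials: int = 200, seed: int = 0) -> float:
    """Average fraction of distinct patterns seen over several simulated runs."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = 0.0
    for _ in range(trials):
        # Each draw picks one pattern uniformly at random and puts it back.
        seen = {rng.randrange(n_patterns) for _ in range(n_draws)}
        total += len(seen) / n_patterns
    return total / trials


if __name__ == "__main__":
    # Illustrative values only (assumptions, not experimental data).
    print(expected_fraction_known(1000, 650))
    print(simulated_fraction_known(1000, 650))
```

The two printed numbers should agree to within sampling noise, which is all the model needs.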

On my first pass through batch A, I got 29.55% of the problems right, and on my first pass through batch F, I got 44.70% of the problems right.  Let M be the number of problems that I had already learned when I started batch A.  I would then have learned M + 5*132 = M + 660 problems when I started batch F.  We get two equations:

$$1 - \left(1 - \frac{1}{N}\right)^{M} = \frac{29.55\%}{60\%} = 0.4925$$

$$1 - \left(1 - \frac{1}{N}\right)^{M+660} = \frac{44.70\%}{60\%} = 0.7450$$
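These two equations can be solved in closed form: dividing the "unknown" fractions eliminates M and leaves an equation in N alone. The sketch below shows one way to do it; `solve_bucket` is a hypothetical helper name of mine, and the inputs are the Woolum percentages quoted above with p = 60%.

```python
import math


def solve_bucket(frac_start: float, frac_later: float,
                 extra_draws: int) -> tuple[float, float]:
    """Solve 1-(1-1/N)^M = frac_start and 1-(1-1/N)^(M+extra_draws) = frac_later.

    Returns (M, N) as unrounded floats.
    """
    unknown_start = 1.0 - frac_start   # (1 - 1/N)^M
    unknown_later = 1.0 - frac_later   # (1 - 1/N)^(M + extra_draws)
    # The ratio of the two is (1 - 1/N)^extra_draws, so take its log per draw.
    log_keep = math.log(unknown_later / unknown_start) / extra_draws  # ln(1 - 1/N)
    n = 1.0 / (1.0 - math.exp(log_keep))
    m = math.log(unknown_start) / log_keep
    return m, n


p = 0.60  # assumed solve rate for known patterns (from the text)
m, n = solve_bucket(0.2955 / p, 0.4470 / p, 5 * 132)

if __name__ == "__main__":
    print(round(m), round(n))
```

With the Woolum figures this gives M of about 650 and N of about 959.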

Solving these equations gives M = 650 and N = 959.  The fraction of the patterns that I knew after learning all the 6*132 problems in Woolum is given by:

$$1 - \left(1 - \frac{1}{N}\right)^{M+792} = 0.7778$$

The number of patterns in Woolum’s bucket that I knew at the start of the experiment is therefore 959 * 0.4925 ≈ 473, and the number that I knew at the end is 959 * 0.7778 ≈ 746, so Woolum taught me about 274 patterns (these figures are rounded, so the last digit is uncertain).  The number of distinct patterns in Woolum is given by:

$$N \left(1 - \left(1 - \frac{1}{N}\right)^{792}\right) = 539$$

The percentage of distinct patterns is 539 * 100% / 792 = 68%.  These numbers all look plausible, so this simple model is probably close to the truth.  However, it is possible that some of my improvement was not pattern specific, which would increase the estimated value of N.
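The Woolum figures above can be reproduced in a few lines. This is only a sketch under the same assumptions as the text (p = 60%, 6 batches of 132 problems); the variable names are mine. It works with the unrounded solution for M and N, which is why the results can differ by one from arithmetic done with the rounded values.

```python
import math

P = 0.60                      # assumed solve rate for known patterns
FRAC_A = 0.2955 / P           # fraction of patterns known at the start of batch A
FRAC_F = 0.4470 / P           # fraction of patterns known at the start of batch F
BATCH_SIZE, N_BATCHES = 132, 6
TOTAL = BATCH_SIZE * N_BATCHES            # 792 problems in Woolum

# Solve the two equations for ln(1 - 1/N), then N and M (unrounded).
log_keep = math.log((1 - FRAC_F) / (1 - FRAC_A)) / (5 * BATCH_SIZE)
n = 1.0 / (1.0 - math.exp(log_keep))
m = math.log(1 - FRAC_A) / log_keep


def fraction_known(draws: float) -> float:
    """Fraction of the N patterns seen after the given number of draws."""
    return 1.0 - (1.0 - 1.0 / n) ** draws


known_start = n * fraction_known(m)            # patterns known before Woolum
known_end = n * fraction_known(m + TOTAL)      # patterns known after Woolum
distinct = n * fraction_known(TOTAL)           # distinct patterns within Woolum
distinct_share = distinct / TOTAL

if __name__ == "__main__":
    print(round(known_start), round(known_end), round(known_end - known_start))
    print(round(distinct), round(100 * distinct_share))
```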

The problems in Bain are easier than those in Woolum, so I believe that p = 75% is more reasonable for Bain.  I got 17.19% right on my first pass through batches A+B, and 49.23% right on my first pass through batches E+F.  A calculation analogous to that above gives M = 74 and N = 332.  The numbers of patterns that I knew at the start and end of Bain come out at 74 and 248 respectively, so I learned 174 patterns.  The number of distinct patterns in Bain comes out at 226.  The percentage of distinct patterns is 226 * 100% / 388 = 58%.

It is reasonable to assume that all the patterns that I learned from Bain’s bucket are also present in Woolum’s bucket, and that I initially knew the same proportion of patterns in both buckets when I started Bain.  The Bain data then implies that I knew 394 patterns from Woolum’s bucket when I finished Bain, which compares reasonably well with the 473 patterns implied by the Woolum data, so the numbers are about as consistent as we can expect here.
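One reading of this cross-check that reproduces the 394 figure is sketched below: scale the fraction of patterns known at the start of Bain (17.19% / 75%) up to Woolum's bucket of 959 patterns, then add the 174 patterns learned from Bain. This is my reconstruction of the arithmetic, so treat the exact steps as an assumption; the variable names are illustrative.

```python
WOOLUM_BUCKET = 959              # N estimated from the Woolum data
WOOLUM_KNOWN_AT_START = 473      # patterns known at the start of Woolum

BAIN_P = 0.75                    # assumed solve rate for known Bain patterns
BAIN_FIRST_PASS = 0.1719         # first-pass score on Bain batches A+B
BAIN_LEARNED = 174               # patterns learned from Bain

# Assumption: the same proportion of patterns was known in both buckets
# at the start of Bain, and every Bain pattern is also in Woolum's bucket.
fraction_known_at_bain_start = BAIN_FIRST_PASS / BAIN_P
known_after_bain = WOOLUM_BUCKET * fraction_known_at_bain_start + BAIN_LEARNED

# Ratio of the Bain-based estimate to the Woolum-based one.
ratio = known_after_bain / WOOLUM_KNOWN_AT_START

if __name__ == "__main__":
    print(round(known_after_bain), round(ratio, 2))
```

The two estimates differ by roughly a sixth, which seems acceptable for a model this rough.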

The main conclusions of this section are that the Woolum results can be explained if the patterns underlying the problems were randomly selected from about 1,000 such patterns, and that the Bain results can be explained if the underlying patterns were randomly selected from about a third of these patterns.