[r-t] Predicting search size (was FKM)

Tue Jan 8 23:12:21 UTC 2008

Don asks,

> How's that work then? Is it comparing nodes visited versus some estimate
> of nodes that will need to be visited? If so, how do you construct the
> estimate? If not, what's it doing instead?

Not really. I think Elf and SMC32 do it two slightly different ways - you
can look at the source for the former yourself, it's on the website and is
pure Java. Elf is cleverer and copes with the search behaviour of the FKM
algorithm better.

However, the algorithm in SMC32 forms the basis of it:

1. Start with percent = 0 and range = 1.0.

2. Step through the current composition node by node.

3. For each node, work out N, which is how many child nodes there are
(number of possible calls - for instance 3 if plain, bob and single are all
possible), and C, which is where in the list of child nodes the actual call
used comes. For instance, a bob would be child node number 1 if plain, bob
and single are allowed; or could be 0 if plain isn't allowed from this node.

4. Divide your range by N (e.g. 1.0 goes to 0.5 if initial node has two
children, say plain and single).

5. Add on to the cumulative percent value range * C. So if first node has
two children, plain and single, and a single is taken in the composition, 
C=1, you'll add on 0.5*1 = 0.5. You add on 0 for a plain.

6. Repeat for each node in the composition. The range gets smaller and
smaller, so the C value calculated at each stage has less and less effect on
the percent value. You could stop at some point, but I don't think I
bother... isn't calculated very often so has no effect on runtime.

7. At the end, your percent value is the %age complete; except it's not
accurate for the FKM (rotational sort algorithm), since that gets faster and
faster towards the end. So for FKM, SMC32 does percent = sqrt(percent).
Still not at all accurate, but a bit better.

In practice, no search starts at 0% or ends at 100% (all plains or all
bobs/singles quickly runs false). So before the search starts, SMC32 finds
the first and last possible compositions up to a certain small length (a few
courses). It calculates the % complete of each of these, and uses the range
between the two as the starting value for "range" in the above algorithm
(instead of 1.0).

Elf does something similar, although the number of child nodes tends to be
much higher (spliced). To cope with the rotational sort, it treats the first
node differently - basically for each possible child of the first node, it
precalculates a set of "progress ratios". For a simple single-method search
with plains and bobs only, the progress ratio for a plain would be 100% and
for a bob 0% (by the time you get to a bob you only have one touch left to
search, the bob course, so 0% of the search time is spent on that). For 
searches with n first children, child i is given ratio n^(i+1)/n^i. Check 
the source code for more - it's in 
org.pealfactory.compose.halfleadspliced.Composer.calcProgressRatios() and 
getComposingProgress().

Treating only the first node as a special case is again a half-measure, but
works surprisingly well and gives much more accurate % complete metrics than
SMC32's simple sqrt() for the FKM search. The main trouble with Elf is that 
the search is also heuristic, so it tunes itself as it goes along, and gets 
faster and faster. The % complete algorithm doesn't allow for this heuristic 
pruning.

MBD