[r-t] Big searches

Sat Jan 5 22:04:54 UTC 2008

I suppose our perspectives of what is interesting is probably driven to a large 
extent (ahemb) by what we have put effort into. Anyway, I personally find the 
question of "how many nodes can we search" less interesting than "how few nodes 
can we search to get at a desired answer."

If anyone wishes to compare, here's a little benchmark problem:

Find all the bobs only, tenors together, three part 5088s of Yorkshire Major, 
with part heads 34256 and 42356, and that contain all 24 5678s at the back.

I have to confess to constructing a deliberately self-serving benchmark, as many 
of the properties described can be used to reduce the search space in various ways.

Anyway, for that problem, my current technology gets the search down to 7,125 
node visitations.

That's actually a bit of a cheat, however. One of the strategies I use for 
reducing search depth involves doing ten seconds worth of breadth first search 
up front as part of setting up the problem to be attacked by the depth first 
search. If I turn that off it requires 298,059 node visitations.

By the way, my software believes there are 72 unique solutions, where I am using 
"unique" to mean that just adding or removing a shunt doesn't make a different 
solution, but reversing the whole thing does contribute a distinct solution. If 
you come out with a different number I'd be interested in comparing specific 
results. While I have reasonably high confidence my technology isn't finding any 
false "solutions", I'd not be surprised if, given the aggressive attempts to 
reduce the search, I have one or more bugs that overlook candidates at some point.

Of course, part of the cost of reducing the size of the search may be doing more 
work at each visitation, so the time consumed may also be interesting. That's a 
lot harder to compare, since it's sensitive to processor speed, memory speed, 
bus speed for getting at memory, word size, bus width, caches, pipeline 
architecture, instruction set, compiler behavior, ad infinitum.  I'm guessing a 
reasonable first pass at a vaguely comparable number might be nodes per time 
unit divided by processor clock speed. To make the resulting number nice, I 
propose scaling the computation by multiplying by 10**6; that is, the division 
by processor speed is dividing it my the number of MHz.

The nominal clock speed for the machine I'm using (an early MacBook Pro, 32 bit 
processor, not 64 like the new ones) is 2.16 GHz.  Oh, and by the way, while the 
machine I'm running it on is dual-core, it's only using one thread right now. 
I've a vague ideas for using multiple, concurrent threads but haven't gotten to 
implementing them yet.

Anyway, here are the results for the two cases:

   7,125 nodes,  40 msec,   178,000 nodes/sec,  82 nodes/Mcycle

298,059 nodes, 280 msec, 1,064,000 nodes/sec, 493 nodes/Mcycle

BTW, I'm guessing (though it's just a guess) that the sharp difference in node 
rates between the two cases is not anything algorithmic. Rather it's an artifact 
of it being a JIT compiled Java implementation. In both cases you spend the same 
amount of time compiling the inner loop, but the larger the search the more 
iterations that time gets amortized across.

Based on some previous discussions of what we're doing, I'm guessing Mark will 
be searching far more nodes for this problem, but  but processing those nodes at 
a far faster rate. Including node rate measured in nodes/MCycle, though I 
suspect he's running a machine with a much faster clock rate, too. If there are 
other folks able to make similar measurements using other implementations of 
depth first searches, it would be interesting to see how other things compare. 
In particular, has anyone come up with even better ways of reducing the search 
space that needs to be traversed?

-- 
Don Morrison <dfm at ringing.org>
"I would often run the first part of a quarter through Abel; sometimes
the whole quarter. At least then you know what it should sound like,
before the bells and ringers get hold of it."
    -- A J Barnfield, on the change-ringers mailing list, 7 January 2007