[r-t] Scala, part three
Mark Davies
mark at snowtiger.net
Thu Jul 5 23:02:09 UTC 2012
Right, I'm not sure how many are still with me (hands up if you're
there?!) but here's the last article in my Scala-for-composers series.
Here I'll focus on the remaining component of the Helixoid Pipeline
described in previous emails: the search for splice-sisters. We'll see a
bit more Scala, including the promised "node tables without table-build"
optimisation.
The starting point for this stage in the pipeline is a list (or
Iterator) of 1200 Double Helixoid methods which have the correct
congruency property. Using the stage-2 TV expansion described in the
last article, this immediately expands to approximately 73,000 methods,
each of which we need to check for course-splices.
The simplest way to find those course-splices is to run the Helixoid
tree-search again, modified to prune off methods which deviate from the
set of rows in the plain course of the method under consideration.
Running 73,000 separate tree-searches sounds like a recipe for a long
wait, but in fact the pruning achieved by the course-splice restriction
is so severe that the whole thing runs quicker than the original,
unrestricted search.
The modification to the original search algorithm (given in article 1)
is very minor. Previously, our inner loop looked like this:
for (pn <- AllPN.filter{ isRightPlace(_, pnSoFar.head) })
{
  val newRow = row.apply(pn)
  if (!rowsThisTime.contains(newRow) &&
      !rowsUsed.contains(newRow))
  {
    ... recursive call ...
  }
}
All we need to do is add another check to the if() statement, to ensure
each new row of the new method exists in the plain course of the old
method:
if (!rowsThisTime.contains(newRow) &&
    !rowsUsed.contains(newRow) &&
    oldMethod.fullCourse.contains(newRow))
Well - not quite. Such methods will not be course splices unless the
leadhead group is the same. The tree search only produces the PN up to
the half-lead, so we need to try out different half-lead and leadhead
PNs to see if we can get the right kind of leadheads.
As we'd expect from Scala, filtering our results to find methods with
the right leadhead is straightforward. First we get the set of allowable
leadheads: given we have generated the entire plain course of the source
method anyway, this boils down to picking the 3rd, 6th, 9th and 12th
leadheads (these will have 123 home).
The quickest way to deal with sets of numbers like this is to use a
Range, another component of Scala's collections library. There are
several ways to create Ranges:
Range(0, 10)
0 until 10
0 to 9
All three of these create a Range object which, when iterated through
using any of the normal collections functions, will produce the numbers
0 to 9. You can also apply a step:
Range(0,10,3)
0 until 10 by 3
0 to 9 by 3
All three give 0, 3, 6, 9. Note that "until", "to" and "by" are just
ordinary functions, the first two operating on numbers, and here shown
using infix notation. We could equally well write
0.until(10)
0.to(9).by(3)
Anyway, using a Range, extracting the right leadheads from the full
course is a one-liner:
val goodLHs = (3 to 12 by 3).map{ n => helixoid.fullCourse(n*HalfLeadLength*2) }
The other part of the job is to generate all the possible leadheads for
the candidate method, given just the first half-lead. Again, it's a
single expression:
List("56","36","16","58","38","18").
map{ new Method(firstHalf++List(PN(_)), PN("14")}.
filter{ goodLHs.contains(_.fullCourse(HalfLeadLength*6)) }
This is simplified by the fact that the nature of my composition was
such that a 14 leadhead was required, so it was just the half-lead PN
possibilities which I needed to check.
There is one more job left to do. This stage of the pipeline produces
quite a lot of splice-sisters, but most of them are pretty poor, with
only very minor variations. It would be nice to output some kind of
score in order to spot examples of very different methods which still
splice. This simple function did the trick for me:
def methodDiff(h1: Method, h2: Method) =
{
  val pns1 = h1.pns.slice(0, HalfLeadLength-1)
  val pns2 = h2.pns.slice(0, HalfLeadLength-1)
  pns1.zip(pns2).filter{ pair => pair._1 != pair._2 }.size
}
This final piece of functional loveliness uses the zip() function, again
provided by the rich API of Scala's collections library. Zip takes two
collections and returns a single collection of pairs of elements. So we
take the two sets of place notation from two splice-sister candidates,
zip them together, and count how many times the PN elements in the
zipped pairs differ.
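If zip is new to you, a quick session in the Scala REPL shows the idea
(the lists here are just made-up examples):
scala> List(1, 2, 3).zip(List("a", "b", "c"))
res0: List[(Int, String)] = List((1,a), (2,b), (3,c))
scala> List("x2","x4").zip(List("x2","34")).filter{ p => p._1 != p._2 }.size
res1: Int = 1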
And there we have it - the final part of the pipeline, and, a few
seconds later, half-lead splice-sisters H and J popping out of the matrix.
However, I want to stop at this point and re-examine the search
algorithm. Compared to "production-quality" machine searches, such as
that implemented by SMC32, our code suffers from an obvious performance
deficit: we are generating permutations in the inner loop. The line of
code which does this doesn't look very significant:
val newRow = row.apply(pn)
However the implementation of the Row function "apply(pn: Pn)" is hidden
from us, and we can assume it is not very quick. My implementation holds
both the Row and the PN as arrays, and steps through the latter in order
to work out which bells in the former to swap. Whilst fast enough for
our purposes here, a proper machine-search would not wish to incur this
penalty. Instead, a table of nodes would normally be pre-built, one
for each row in the search space, each containing a list of pointers to
the next node for each allowable permutation. (For some search
algorithms, nodes are also a home for false-node pointers, music counts,
and the like.)
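To make that concrete, here's a rough sketch of what a pre-built node
table might look like, reusing the Row, PN and AllPN names from article
1. The Node class and buildNodeTable function are my inventions for
illustration, and assume every successor row is itself in the search
space:
// A sketch of a classic pre-built node table. One Node per row in the
// search space; next(i) points directly at the node reached by applying
// AllPN(i), so the search can step node-to-node without permuting rows.
class Node(val row: Row)
{
  val next = new Array[Node](AllPN.size)
}

def buildNodeTable(allRows: Seq[Row]): Map[Row, Node] =
{
  val table = allRows.map{ r => (r, new Node(r)) }.toMap
  for (node <- table.values; i <- 0 until AllPN.size)
    node.next(i) = table(node.row.apply(AllPN(i)))
  table
}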
Unfortunately when writing quick-and-dirty one-off searches it is quite
hard to motivate oneself to go to the trouble of writing the node-table
build. Scala offers another way. Built into the language is the concept
of a lazy variable, which is only initialised when referenced. I don't
use that here, but what if we treated the node table like that, and
built it as we needed it? Looked at this way, the node table is just a
cache of previous row.apply(pn) results.
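(In passing, a lazy val looks like this; buildTable() is just a made-up
stand-in for some expensive initialisation:)
// The right-hand side only runs the first time "table" is referenced
lazy val table = buildTable()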
We can implement a cache using a Map. Here's one suitable for our purposes:
import scala.collection.mutable
val nodeMap = mutable.Map[(Row,PN), Row]()
This declares a mutable Map data structure, keyed on a pair (Row,PN),
and having another Row as the lookup value. Now before applying a PN to
a Row, we'll first look it up in the nodeMap, and if it's there, use the
result. If not, we'll carry out the PN apply and store it back to the
map for future use.
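Spelled out by hand, that lookup-or-compute logic might look like this:
// Manual cache logic: check the map first, fall back to a real PN apply
val newRow = nodeMap.get((row, pn)) match
{
  case Some(cached) => cached            // seen this (Row, PN) pair before
  case None =>
    val r = row.apply(pn)                // compute the permutation...
    nodeMap((row, pn)) = r               // ...and remember it for next time
    r
}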
That still sounds like a bit too much coding for the hard-pressed
composer; but again Scala has something clever to help us. Using Maps as
caches is so common that the collections library provides a method,
getOrElseUpdate(key, defaultValue), which does all of that for us. Here
it is at work. Remember, previously we had:
val newRow = row.apply(pn)
Using the nodeMap we can change this to:
val newRow = nodeMap.getOrElseUpdate((row,pn), row.apply(pn))
The getOrElseUpdate method first tries to get the value out of the Map
using the key passed in the first parameter (here a pair). If this
fails, it then takes the second parameter, puts it into the Map with the
same key, and returns it to you. Exactly what you want a cache to do,
and you don't have to remember to code it yourself. (I have done so in
Java, many times, and once or twice got it wrong - if for instance you
forget to add the new value to the map, the code works, it's just
somewhat slower than it ought to be... very hard for the usual kind of
testing to find!)
But hold on - isn't the second parameter, row.apply(pn), going to be
executed every time? Normally, this would be the case, but Scala
supports the concept of pass-by-name parameters, which are only
evaluated if the function uses them. The getOrElseUpdate method marks
its second parameter as pass-by-name, hence we only incur the cost of
the PN permutation once.
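Here's the mechanism in isolation, as a toy example of my own; the =>
before the parameter type is what makes it pass-by-name:
// msg is pass-by-name: its body only runs if the function actually uses it
def logIf(enabled: Boolean, msg: => String) =
  if (enabled) println(msg)

logIf(false, { println("building message"); "noisy" })  // prints nothing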
So that's it - node table implemented. Well, perhaps not quite. We've
replaced one cost, that of PN permutation, with another, hash table
lookup. Hash tables are pretty clever these days, so it's likely we've
improved things, but maybe not enough. An old-school node table doesn't
do any lookups, but steps directly from node to node, with an array
index at most.
Now, I haven't implemented this yet, but I am pretty sure we could get
even cleverer, and implement "true" nodetables as a library function,
independent of our permutation type, and almost entirely transparent to
our search algorithm. I won't go into this in depth, but Scala supports
the concept of "traits", which are a bit like a combination of Java
interfaces and Ruby mixins. I think we should be able to code a library
trait called Node, which can be mixed in whenever we need true
nodetables. The search algorithm wouldn't change, but all of a sudden
our row.apply(pn) call would become nodetable-efficient, because we'd
mixed in the Node trait to the Row class. I shall get back to you when
I've coded this. :-D
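For anyone who hasn't met traits, here's a toy example of the mixin idea
(nothing to do with nodetables, just the syntax):
// A trait can carry implementation, like a Ruby mixin:
trait Shouty
{
  def shout(s: String) = println(s.toUpperCase + "!")
}

// Mix it in when defining a class...
class Ringer extends Shouty

// ...or even when creating a single object:
val r = new AnyRef with Shouty
r.shout("bob")   // prints BOB!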
In the meantime, the caching idea is pretty neat, eh?
Well, I think that'll do. I hope I haven't bored everyone to total
suicide. Indeed, I hope I've whetted some appetites for the possibilities
opened up by this excellent new language.
Are there any downsides to Scala? Yes, certainly. To make the most of it
you may have to get your head into a new way of working, especially if
you haven't previously experienced functional programming. The tooling
support has struggled at times; there are now pretty good plugins for
both Eclipse and IntelliJ (I'd recommend the latter), but such is the
power and complexity of the language that things like auto-complete
aren't yet as superbly slick as they are with Java. Compile times are a
bit slower, too. And debugging functional code can be tricky - sometimes
it's helpful to expand out a series of map() and filter() operations to
see what the hell is going on in there.
But overall, could I have coded up this Helixoid stuff in any other
language, in the late nights and lunch breaks which are, it seems, all
the spare time I have these days? The answer is certainly not. In Java,
a language I know inside-out after 15 years of working with it, I
wouldn't even have bothered trying some of the things I experimented
with during this project. With Scala they were easy, and fun.
So there we are - Scala, a language for late nights and lunch breaks.
Try it, let me know what you think. Here, again, are the links you'll need:
<http://www.scala-lang.org/>
<http://www.amazon.co.uk/Programming-In-Scala-2nd-Edition/dp/0981531644/ref=sr_1_1?s=books&ie=UTF8&qid=1340897990&sr=1-1>
MBD