[r-t] Scala, part two
Mark Davies
mark at snowtiger.net
Thu Jul 5 12:04:48 UTC 2012
In order to produce an extent of Major which would satisfy the stringent
demands of my client (Philip!) I needed at least two Helixoid methods
satisfying these requirements:
1. Double
2. Course splices of each other
3. Neither possessing any negative falseness with a given touch of PB8
4. As different as possible from one another, ideally with one being
right-place and one wrong-place.
I had by this stage proved that it was possible to get a single,
non-Double method which satisfied point (3), and hence could form an
extent. It was not at all obvious that there could exist two methods
which would meet the much higher bar set by the additional requirements.
Nor was it immediately clear how to go about searching for them:
although I had a list of all possible Double Helixoids, it was some
megabytes in size, and it was very time-consuming to apply the
negative-falseness (or, as I termed it, "Plain Bob congruency") and
course-splice checks to the methods within it. Furthermore, doing so
initially yielded nothing.
However, I had but scratched the surface, since each method in the (8MB)
Double file was merely one example of a set of trivial variations whose
size often approached two million, since place notations which merely
perturbed the (123) and (45678) bells within a given Helixoid "path" had
been pruned from the initial search (as in the Scala code demonstrated
in the previous article). I had a feeling that if I was able to expand
every possible Helixoid, I might have more luck finding my magic methods.
The trouble was, since it took hours to process the 62 thousand
unexpanded methods, it was clearly impossible to scale this up by a
factor of a million or more in order to deal with all the TVs.
Two breakthroughs helped me to tackle the problem, and in both cases
Scala allowed me to code the solution very rapidly, as I will show in
this article. The first was the idea of the pipeline: to do the
congruency-checking first, and the course-splice check afterwards, and
hence require only a subset of the TVs to be expanded at each stage. The
second was the discovery that all Helixoids with the correct
"congruency" property contained rows with the treble at lead which
formed the handstrokes and backstrokes of a specific touch of Plain Bob,
and that it was sufficient to check this in order to prove the lack of
negative falseness. This led to a very quick "congruency checker".
The first job was the very mundane task of reading the file of
previously-generated Helixoids. One of the beauties of Scala is that
boring jobs become quicker and easier by about the same magnitude as the
difficult stuff. My output file, which Philip recently uploaded to the
list, contained lines of the following format:
halflead PN possible half-leads
85476312 58.14....36.58.14 (56,36,16,58,38,18)
All I needed from that was the place notation string in the middle. It
would be easy to slice that out using some string processing, however
this is a good place to demonstrate Scala's powerful support for pattern
matching. Java's "switch" statement is extended to allow arbitrary
pattern-matching based on what are known as extractors, but are really
just ordinary methods one can define in a class, and which act a bit
like the opposite of constructors. This means pattern-matching is an
extensible part of the language (indeed, Scala is short for "Scalable
Language", and extensibility is a key part of its design).
Many library classes, in particular collections, provide such
extractors, and are ready-made for pattern-matching. A typical use is in
a recursive function which processes elements in a List. The
pattern-match statement is introduced by the expression to match on,
then the keyword "match":
def recurse[A](list: List[A]): Unit =
{
  list match
  {
    case head::tail => { process(head); recurse(tail) }
    case Nil => return
  }
}
Here 'Nil' is a constant which matches the empty list; on finding it we
terminate the recursion. The pattern "head::tail" matches a list with a
head element and a tail (which may be Nil, or may contain other
elements), and in this example we process the head and continue to
recurse on the tail. The pattern match works because :: is an extractor
function defined on the List class. We can use it to match more complex
patterns, such as head::next::tail, and so on.
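To make this concrete, here is a tiny self-contained sketch (using Int lists rather than my ringing classes, which I've not shown here) exercising both the head::tail and head::next::tail patterns:

```scala
object ListMatchDemo
{
  // Sum a list by matching head::tail, terminating on Nil
  def sum(list: List[Int]): Int =
    list match
    {
      case Nil => 0
      case head :: tail => head + sum(tail)
    }

  // head::next::tail matches any list with at least two elements
  def firstTwo(list: List[Int]): Option[(Int, Int)] =
    list match
    {
      case a :: b :: _ => Some((a, b))
      case _ => None
    }

  def main(args: Array[String]): Unit =
  {
    assert(sum(List(1, 2, 3)) == 6)
    assert(firstTwo(List(1, 2, 3)) == Some((1, 2)))
    assert(firstTwo(List(1)) == None)
  }
}
```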
[Note the syntax "=>" used to join the pattern with the expression to be
evaluated on a successful match. It is the same symbol as used in
function types and anonymous functions, and for good reason: a Scala
match statement is an instance of a *partial* function: that is, a
function which is defined for some, but perhaps not all, inputs. This
commonality of approach, in which fundamental building blocks are linked
and re-used, is one of the joys of the language.]
Alternatively, if we are looking for a List with a precise number of
members, the extractor patterns List(a), List(a,b), etc can be used. A
similar syntax is available for Arrays, the Scala type which corresponds
most closely to Java-style arrays. This leads to the parsing matcher I
needed for the Helixoid file:
def parseLine(line: String): Option[Method] =
{
line.split(" ") match
{
case Array(hl, pn, hlpns) =>
Some(new Method(pn))
case _ => None
}
}
Here we first split the input line on the space character, which gives an
Array[String]. We then match an Array with three elements only, and
extract the middle one, turning it into a Method object. If we encounter
any other kind of line, the "default" branch of the match statement will
be processed: Scala uses the _ character in this context to act as a
wildcard and represent any other pattern.
This code is equivalent to the following if() statement, but hopefully
you can see how pattern matching comes into its own with more complex
problems.
val splitArray = line.split(" ")
if (splitArray.size==3)
Some(new Method(splitArray(1)))
else
None
But what are these Somes and Nones? And what is the Option return type
of the method? Here is another Scala gem: a library class, called
Option, which can wrap any other type, and represent either the
existence of a value of that type, Some(value), or the lack of a value,
None. It is intended to provide a type-safe alternative to returning null.
Using Option allows the compiler's type-checker to ensure we have
handled the null case: the type of our function is not "Method", with
null being returned if the line does not parse, it is Option[Method], so
calling code is forced by the compiler to deal with it.
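As a sketch of the idea, with a plain String standing in for my Method class:

```scala
object OptionDemo
{
  // Returns Some(pn) for a well-formed three-column line, None otherwise
  def parsePn(line: String): Option[String] =
    line.split(" ") match
    {
      case Array(_, pn, _) => Some(pn)
      case _ => None
    }

  def main(args: Array[String]): Unit =
  {
    assert(parsePn("85476312 58.14....36.58.14 (56,36,16,58,38,18)")
      == Some("58.14....36.58.14"))
    assert(parsePn("not a well-formed line") == None)
    // The compiler makes us deal with the Option, e.g. via getOrElse:
    assert(parsePn("bad").getOrElse("?") == "?")
  }
}
```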
Oh, but we still have to check this case, I hear you cry! Well, yes and
no. The Option class is part of Scala's collection hierarchy, forming a
sequence with either zero or one members, and so we can do something
quite clever, like this:
Source.fromInputStream(getClass.getResourceAsStream(file))
  .getLines()
  .flatMap(parseLine(_).iterator)
In one line this carries out the following operations:
1. Opens an input stream on the resource "file". This is done using the
Java standard library calls Object.getClass() and
Class.getResourceAsStream(). Scala not only runs in the JVM, but
interoperates fairly seamlessly with Java code: you can use Java
classes, objects and methods directly from Scala.
2. Creates a Source object, which is part of the Scala standard library
and provides a nice wrapper for lots of common I/O operations, often
allowing us to avoid the tedium of closing streams and handling exceptions.
3. Calls getLines() on the Source object, which gives us an
Iterator[String] - an iterator over each line in the file.
4. Calls flatMap(), the higher-order function introduced in the last
article, in order to return a new iterator with our parseLine() method
applied to every element (line) in the first iterator.
The result is, with very little frill and mess, an iterator over the
place notation in the result file. It works because flatMap operates on
Iterators just as easily as Lists and other collection types: the
advantage of the Iterator in this case is that we get the elements one
by one, so can start processing them before the entire file is read. But
why flatMap(), and not just map()? Well, our parseLine() method didn't
return the Method type directly, but an Option[Method]. Some lines might
not match our pattern, and hence might not provide a Helixoid Method.
The flatMap() call flattens the Option instances out, getting rid of the
Nones, and turning the Somes into a single element of the right type for
the iterator, Method. No null checks required! Simply magic.
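A small stand-alone sketch of the flattening behaviour, again with Strings in place of my Method class:

```scala
object FlatMapDemo
{
  def parse(line: String): Option[String] =
    line.split(" ") match
    {
      case Array(_, pn, _) => Some(pn)
      case _ => None
    }

  def main(args: Array[String]): Unit =
  {
    val lines = Iterator("a 12.34 b", "garbage", "c 56.78 d")
    // flatMap flattens the Option results: Nones vanish,
    // Somes are unwrapped to their contents
    val pns = lines.flatMap(parse(_).iterator)
    assert(pns.toList == List("12.34", "56.78"))
  }
}
```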
Well, that's taken quite a long time to explain, but it took only a few
seconds to write, and there we have code to solve a simple file-parsing
problem with brevity and a degree of beauty. It's also allowed me to
show off some really nice features of the language.
But on to the meaty stuff. The first job is to carry out the expansion
of the TVs. In order to check negative cross-falseness with PB, it is
necessary only to consider (a) the positions of bells (123) and (b) the
sign of each row. Hence, we only need to expand place notations which
affect these factors. But then for the second phase we need to expand
all other PN types, which are those which (a) affect only the positions
of bells (45678) and (b) don't change the sign of the row.
It would be nice to write the code to carry out the expansion once, and
parameterise it by the type of PNs allowed in each phase. With Scala's
higher-order functional support, this is easy to do. First we define a
type alias which represents a function capable of determining which
place notation expansions are allowed in any one phase:
type PnFilter = (Row, PN, Set[PN]) => Set[PN]
This defines a function type which accepts as parameters a Row (the one
to which our expanded PNs apply), a PN (the "non-TV" one from the
original method), and a set of PNs, representing the alternatives to
try. The function returns another Set[PN], containing just those PNs
which are useful for this expansion.
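The shape of such a type alias can be sketched with Ints standing in for the Row and PN classes (which I've not shown here):

```scala
object AliasDemo
{
  // A function type alias, analogous in shape to PnFilter
  type IntFilter = (Int, Set[Int]) => Set[Int]

  // A value of the aliased type: keep candidates larger than the pivot
  val keepLarger: IntFilter = (pivot, candidates) => candidates.filter(_ > pivot)

  def main(args: Array[String]): Unit =
  {
    assert(keepLarger(2, Set(1, 2, 3, 4)) == Set(3, 4))
  }
}
```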
It is not strictly necessary to define this type alias; we could refer
to the function type directly. However, it's a pleasant and convenient
thing to be able to do. It allows us to craft a recursive expand()
function with the following signature:
def expand(startRow: Row, helixoid: Method, pnF: PnFilter):
Iterator[Method] =
{ ... }
In order to generate all the expansions, the usual functional solution
of recursion comes nicely into play. In Scala, it is possible to define
functions within the scope of another function, which is very pleasant
for this sort of job. We can create a recursive function like so, nested
inside the expand() function given above:
def recurse(row: Row, done: List[PN], todo: List[PN]):
Iterator[Method] =
{
  todo match
  {
    case Nil => Iterator(new DoubleMethod(done.reverse))
    case pn::tail =>
    {
      val next = row.apply(pn)
      val goodPns = pnF(row, pn, AllPN).iterator
      goodPns.flatMap{ alt => recurse(next, alt::done, tail) }
    }
  }
}
This example builds on several concepts already introduced, including
pattern matching on Lists, use of Iterators as a first-class citizen of
the collections library, and use of the flatMap higher-order function.
It works by examining one PN at a time, generating a list of all
allowable variants of the PN, and recursively calling itself on each.
The set of place notations within the quarter-lead already processed are
kept in one List ("done") and those yet to do in another ("todo"); as
each PN is expanded, it is shuttled off the end of the "todo" list onto
the start of "done". The latter list is built up in reverse,
but we can reverse it once we're finished, which occurs when "todo" is empty.
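The same recursive pattern can be exercised in miniature, with Chars in place of PNs and a Set of alternatives per position (an analogue only, not the real expansion):

```scala
object ExpandDemo
{
  // Expand every combination of alternatives, one position at a time;
  // "done" accumulates in reverse and is reversed when "todo" is empty
  def expand(done: List[Char], todo: List[Set[Char]]): Iterator[String] =
    todo match
    {
      case Nil => Iterator(done.reverse.mkString)
      case alts :: tail =>
        alts.iterator.flatMap{ c => expand(c :: done, tail) }
    }

  def main(args: Array[String]): Unit =
  {
    val results = expand(Nil, List(Set('a', 'b'), Set('x', 'y')))
    assert(results.toList.sorted == List("ax", "ay", "bx", "by"))
  }
}
```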
Note in particular how it is possible to refer directly to the
PnFilter-ing function "pnF", which is a parameter of the enclosing
expand() function, because we are in the latter's scope. We can finish
off the body of the expand() function by initiating the first recursive
call, as follows:
recurse(startRow, Nil, helixoid.pn.slice(0, HalfLeadLength/2))
Here I use the slice() method, again from the Scala collections library,
to extract just the first quarter-lead's worth of PN from the method.
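For anyone new to slice(), note that the second argument is exclusive; a quick sketch:

```scala
object SliceDemo
{
  def main(args: Array[String]): Unit =
  {
    // slice(from, until): "until" is exclusive
    assert(List(10, 20, 30, 40, 50, 60).slice(0, 3) == List(10, 20, 30))
    // It works on Strings too, via the collections library
    assert("12345678".slice(2, 5) == "345")
  }
}
```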
We end up with a parameterisable "expand" function which looks like this:
def expand(startRow: Row, helixoid: Method, pnF: PnFilter):
Iterator[Method] =
{
def recurse(...) {}
recurse(startRow, Nil, helixoid.pn.slice(0, HalfLeadLength/2))
}
If it has not been obvious before, Scala allows us to dispense with the
"return" keyword: the last expression evaluated in a function becomes
its return value.
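A trivial sketch of this implicit-return behaviour:

```scala
object ReturnDemo
{
  // No "return" keyword: the if() expression itself is the result
  def classify(n: Int): String =
    if (n % 2 == 0) "even" else "odd"

  def main(args: Array[String]): Unit =
  {
    assert(classify(4) == "even")
    assert(classify(7) == "odd")
  }
}
```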
What about the two implementations of the PnFilter function? The second
stage expansion is easy: if we use a masked startRow of "123.....", then
the check is simply that the alternative PN yields the same row with the
same sign:
def validPhase2PN(row: Row, pn: PN, altPn: PN): Boolean =
row.apply(pn)==row.apply(altPn) &&
pn.swapsSign==altPn.swapsSign
We can turn this into a PnFilter by passing it as the higher-order
function parameter to a call to filter() on the candidate PN set. Why
not roll this call to filter() into the main expand method, and have the
PnFilter type implement the simpler, once-per-PN, and Boolean-returning
function signature of the validPhase2PN method given here? The answer
will become clear when we consider the more difficult job of phase 1
expansion, below.
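The lifting of a per-element Boolean predicate into a set-to-set function via filter() can be sketched like this (Ints again standing in for my Row and PN classes):

```scala
object LiftDemo
{
  // A simple once-per-element Boolean predicate
  def sameParity(n: Int, alt: Int): Boolean = n % 2 == alt % 2

  // Lifted into a set-to-set function by a call to filter()
  def parityFilter(n: Int, candidates: Set[Int]): Set[Int] =
    candidates.filter{ sameParity(n, _) }

  def main(args: Array[String]): Unit =
  {
    assert(parityFilter(2, Set(1, 2, 3, 4)) == Set(2, 4))
  }
}
```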
Again, if not obvious before: braces {} are optional where a function
consists of a single result expression. Also note here that our function
returns Boolean, which is a Scala class. As in other modern languages
such as Ruby, everything is an object in Scala, and so it does not have
the Java distinction between primitive and Object types, such as
boolean/Boolean. A Scala Boolean is an object, on which we can call
methods, but is implemented under the covers as a primitive type where
possible. The same holds for numeric types. We get the benefits of
primitive speed, but do not ever use primitives; there is also none of
the unexpected behaviour which Java's boxing/unboxing can produce.
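A few one-liners showing methods being called directly on "primitive" values:

```scala
object PrimitiveDemo
{
  def main(args: Array[String]): Unit =
  {
    assert(42.toString == "42")   // Int is an object with methods
    assert(3.max(7) == 7)
    assert(1.to(3).sum == 6)      // even Range-building is a method on Int
  }
}
```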
The first phase expansion is somewhat more difficult. The allowed PN
fall into several distinct sets, and we need just one example from each
set. The sets are characterised by two factors: the precise arrangement
of (123) within the row produced, and the sign of the row as a whole. We
can capture both of these factors using another useful Scala library
type, known as a Tuple. Scala provides a number of tuple types, ranging
from Tuple2 (a pair), Tuple3 (a triplet) and so on, up to
Tuple22. There is also a nice syntax for tuples: (a, b)
defines a Tuple2, containing members a and b. The generic type of the
tuple will be Tuple2[A,B], where A and B are the types of the members.
We can also use (A,B) as a shortcut for the generic type.
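A quick sketch of the tuple syntax and types:

```scala
object TupleDemo
{
  def main(args: Array[String]): Unit =
  {
    // (String, Boolean) is shorthand for Tuple2[String, Boolean]
    val pair: (String, Boolean) = ("123.....", true)
    assert(pair._1 == "123.....")
    assert(pair._2)

    val triple = (1, "two", 3.0)   // a Tuple3[Int, String, Double]
    assert(triple._2 == "two")
  }
}
```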
It is really very convenient to have tuples in a language. For phase 1
PN filtering, it is just the job. Again, if we start from a masked row
"123.....", we can create a pair (Tuple2) to hold the two factors which
we need:
val pair = (row.apply(pn), pn.swapsSign)
Here, since the row is masked and hence its original sign not available,
we simply use the swapsSign() method on my PN class, which will work
just as well: we don't need the absolute sign of the row, just to
distinguish between PNs which produce rows of different signs.
We could also establish a Set to track unique (masked row, sign) pairs:
val uniquePairs = mutable.Set[(Row, Boolean)]()
And now we can determine whether a new PN is worth having simply by
seeing if it is already in our set:
if (uniquePairs.add(pair))
// It's good
Note that the add() method on mutable sets not only adds the element to
the set, but also returns true if the element was not already present.
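A self-contained sketch of this add-and-test behaviour, with Strings standing in for my Row class:

```scala
import scala.collection.mutable

object SetAddDemo
{
  def main(args: Array[String]): Unit =
  {
    val seen = mutable.Set[(String, Boolean)]()
    assert(seen.add(("12345678", true)))    // new pair: added, returns true
    assert(!seen.add(("12345678", true)))   // duplicate: returns false
    assert(seen.add(("12345678", false)))   // different sign: a new pair
    assert(seen.size == 2)
  }
}
```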
But does this really work? Sets are normally based on hash tables, and,
just as in Java, require correct implementation of equals() and
hashCode() methods on the objects being stored in the set. Very often in
Scala, as here, the answer is, no worries. The Tuple types define valid
equals() and hashCode() methods based on their contents; in our case,
one part of the tuple is a Boolean, a standard library type also with
valid equals/hashCode; the other is PN, a class which we have defined
ourselves. I have not shown my implementation here (I'm happy to post
all my tools later if people want, subject to a bit of code-cleanup
first...!) but it is defined as a "case class", a specialised kind of
class in Scala which offers a number of features, including ready-made
extractors for pattern matching and, crucially for us, automatic and
guaranteed-correct implementations of equals and hashCode. Job done,
with no "hash stress".
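To illustrate (with a toy PN, not my real implementation), a case class gets structural equality, a consistent hashCode and a pattern-matching extractor for free:

```scala
object CaseClassDemo
{
  case class PN(places: String)

  def main(args: Array[String]): Unit =
  {
    // Structural equality, not reference equality
    assert(PN("12") == PN("12"))
    // Consistent hashCode: duplicates collapse in a Set
    assert(Set(PN("12"), PN("12"), PN("18")).size == 2)
    // Ready-made extractor for pattern matching
    val matched = PN("14") match
    {
      case PN(p) => p
    }
    assert(matched == "14")
  }
}
```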
Here is the complete implementation of PnFilter for phase 1:
def phase1Filter(row: Row, originalPn: PN, candidates: Set[PN]) =
{
  val uniquePairs = mutable.Set[(Row,Boolean)]()
  val originalMasked = mask123(row.apply(originalPn))
  def validPhase1PN(altPn: PN) =
  {
    val row2 = row.apply(altPn)
    val pair = (row2, altPn.swapsSign)
    originalMasked==mask123(row2) && uniquePairs.add(pair)
  }
  candidates.filter{ validPhase1PN(_) }
}
Now we see why the function is not a totally straightforward application
of a Boolean predicate to the filter() method: it is necessary to keep
state, in the form of the "uniquePairs" set, in order to pick a single
(arbitrary) representative of each type of PN expansion. The
"validPhase1PN" method is not a true function, as was the case with
validPhase2PN, since it has a side effect, storing values in a Set at
the higher scope.
I'm sure there is a solution to this problem using a pure-functional
approach, but it is very pleasant that Scala allows us to mix and match
coding paradigms so we can do things in the easiest, or most obvious, way.
The only additional thing to spot in our implementation is the check on
the "fully-masked" rows. These are of the form XXX..... and I have
assumed a support function mask123() which produces them from the
half-masked 123..... type rows. The check is necessary because the
candidate PN array will include PNs that violate the original Helixoid
path, and hence could produce something that would not be a TV, and
probably not a valid Helixoid either.
OK, we have covered a lot of useful Scala concepts so far, including
pattern matching, iterators, nested functions and tuples. Let us apply
these to the real work of phase 1: finding Helixoids with no negative
falseness with our PB touch. Previous experience led me to believe that
it is possible to do this by considering just the rows with the treble
at lead, so let's start off by extracting these rows from the first
three leads of the Helixoid method's plain course (which contains all
positions for the front three). As we expect by now, this is blindingly
simple to do using Scala:
val threeLeads = helixoid.fullCourse.slice(0, HalfLeadLength*6)
val trebleAtLead = threeLeads.filter{ _.bellAt(1)==1 }
val maskedPairs = trebleAtLead.map{ (row)=> (mask45678(row), row.sign)}
Given a helixoid Method, we assume a support function to deliver us the
list of rows in the plain course, helixoid.fullCourse(). We take the
first 336 elements of this list. Then we simply apply a filter which
accepts rows where bellAt(1)==1, again using a support function, here on
my "Row" class, which I'll try and share later, but which is not of so
much interest for our current discussions.
Finally, we wish to ignore the position of bells 4-8, so we apply a map
function which masks these bells out. But, we do need to preserve the
sign of the overall row, so we calculate that and keep a pair
(masked-row, original-row-sign).
Now, the congruency property which we need is simply this: that all the
positive treble leads can be linked together via a halflead PN of 18 and
leadhead PN of 12; or in other words, that the leads fall into three
separate plain courses of Plain Bob. In fact, it seems that the positive
leads of a Helixoid can *always* be linked to the negative leads by
Plain Hunt (a 18 halflead), so it turns out we only need to check the
latter condition, that 12 leadheads will link the negative and positive
leads.
Using our maskedPairs list, defined above, this is now simple to
achieve. First we partition the list into positive and negative rows.
Here I can demonstrate two further Scala features: the richness of its
collections library, here providing the partition() function which we
need, and the neat ability to assign tuples to simultaneous sets of
variables:
val (positivePairs, negativePairs) = maskedPairs.toSet.partition(_._2)
// the sign is now implied by set membership, so project it away:
val (positive, negative) = (positivePairs.map(_._1), negativePairs.map(_._1))
The partition() function operates on a collection, and returns a pair of
collections of the original type (here Set: we don't care about the
order of the rows, so have converted from the original List). A Boolean
function must be provided which operates on the element type of the
collection; elements returning true go in the left side of the result,
those returning false in the right side. Here our predicate is _._2, a
shortcut for
(p:(Row,Boolean)) => p._2
Methods "_1", "_2" etc are provided on tuples to extract the individual
members, so p._2 simply returns the Boolean (sign) part of our pair.
Perhaps not the most elegant method names to be chosen by Scala's designers!
Since the partition() call returns a pair of sets, we could assign it to
a single variable of the corresponding pair type, but Scala allows the
syntax shown above, where we break the pair out into two variables at
once. Interestingly, this is actually an example of a pattern-match on
the left-hand-side of a variable assignment, and triggers an extractor
on the Tuple2 class. Very
often what looks like a core part of the syntax of the language is in
fact a library function, which we could emulate or extend in our own
code if we wanted to.
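A quick self-contained sketch of partition() together with the destructuring assignment:

```scala
object PartitionDemo
{
  def main(args: Array[String]): Unit =
  {
    // Elements satisfying the predicate go left, the rest go right;
    // the resulting pair is broken out into two variables at once
    val (evens, odds) = Set(1, 2, 3, 4, 5).partition(_ % 2 == 0)
    assert(evens == Set(2, 4))
    assert(odds == Set(1, 3, 5))
    // The related predicates forall() and exists():
    assert(evens.forall(_ % 2 == 0))
    assert(odds.exists(_ > 4))
  }
}
```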
We have now set up the data structures we need for the lead-linking
algorithm, using just a couple of lines of code. The linking algorithm
is almost as straightforward and concise. First we code a method which
decides whether a positive leadhead can be linked to a negative one:
def canBeLinked(positiveRow: Row, pn: PN) =
  negative.contains(positiveRow.apply(pn))
It is as simple as that: a positive row can be linked if, when permuted
by PN "12", we arrive at a row in the "negative" set. (You might be
thinking this is always the case, but not so! We are still dealing with
masked rows, so, in two different Helixoids, a row such as 1xx3x2xx may
occur with either a negative or positive sign. If the corresponding row
1x3x2xxx, which is linked by PN=12, has the same sign, the check fails
and the leads cannot be linked. This may be the case in one Helixoid,
but not in another.)
All that is left to do is to make sure this condition holds for all rows
in the positive set. The Scala collections library has two useful
methods for this kind of check: exists(), which, given a
Boolean-returning function, returns true if at least one element
satisfies the predicate; and forall(), which returns true if all
elements satisfy the predicate. The latter is what we need:
positive.forall{ canBeLinked(_, PN("12")) }
And that's it. Here is the full code for the congruency check:
def isCongruent(helixoid: Method): Boolean =
{
  val threeLeads = helixoid.fullCourse.slice(0, HalfLeadLength*6)
  val trebleAtLead = threeLeads.filter{ _.bellAt(1)==1 }
  val maskedPairs = trebleAtLead.map
    { (row)=> (mask45678(row), row.sign) }
  val (positivePairs, negativePairs) = maskedPairs.toSet.partition(_._2)
  val (positive, negative) = (positivePairs.map(_._1), negativePairs.map(_._1))
  def canBeLinked(positiveRow: Row, pn: PN) =
    negative.contains(positiveRow.apply(pn))
  positive.forall{ canBeLinked(_, PN("12")) }
}
Isn't that fabulous? A problem which on the surface appeared to be quite
difficult, that of checking cross-falseness against a certain structure
of PB touch, resolves itself into such a tiny, neat, efficient piece of
code. The rapidity with which Scala enabled me to solve problems like
this has contributed powerfully to my increasing love of the language. I
hope it will inspire others, too.
In my final article, due in the next mail, I'll cover the final part of
the pipeline, the "splice-sister" search. As promised in the first
article, I'll also cover ways in which Scala can provide optimisations
such as pre-built node tables; it turns out we can often get the
performance benefits of node tables without writing any table-building
code. The composer/programmer gets a free lunch!
MBD
More information about the ringing-theory
mailing list