[r-t] Scala, part two
Mark Davies
mark at snowtiger.net
Thu Jul 5 12:04:48 UTC 2012
In order to produce an extent of Major which would satisfy the stringent
demands of my client (Philip!) I needed at least two Helixoid methods
satisfying these requirements:
1. Double
2. Course splices of each other
3. Neither possessing any negative falseness with a given touch of PB8
4. As different as possible from one another, ideally with one being
right-place and one wrong-place.
I had by this stage proved that it was possible to get a single,
non-Double method which satisfied point (3), and hence could form an
extent. It was not at all obvious that there could exist two methods
which would meet the much higher bar set by the additional requirements.
Nor was it immediately clear how to go about searching for them:
although I had a list of all possible Double Helixoids, it was some
megabytes in size, and it was very time-consuming to apply the
negative-falseness (or, as I termed it, "Plain Bob congruency") and
course-splice checks to the methods within it. Furthermore, doing so
initially yielded nothing.
However, I had but scratched the surface, since each method in the (8MB)
Double file was merely one example of a set of trivial variations whose
size often approached two million, since place notations which merely
perturbed the (123) and (45678) bells within a given Helixoid "path" had
been pruned from the initial search (as in the Scala code demonstrated
in the previous article). I had a feeling that if I was able to expand
every possible Helixoid, I might have more luck finding my magic methods.
The trouble was, since it took hours to process the 62 thousand
unexpanded methods, it was clearly impossible to scale this up by a
factor of a million or more in order to deal with all the TVs.
Two breakthroughs helped me to tackle the problem, and in both cases
Scala allowed me to code the solution very rapidly, as I will show in
this article. The first was the idea of the pipeline: to do the
congruency-checking first, and the course-splice check afterwards, and
hence require only a subset of the TVs to be expanded at each stage. The
second was the discovery that all Helixoids with the correct
"congruency" property contained rows with the treble at lead which
formed the handstrokes and backstrokes of a specific touch of Plain Bob,
and that it was sufficient to check this in order to prove the lack of
negative falseness. This led to a very quick "congruency checker".
The first job was the very mundane task of reading the file of
previously-generated Helixoids. One of the beauties of Scala is that
boring jobs become quicker and easier by about the same magnitude as the
difficult stuff. My output file, which Philip recently uploaded to the
list, contained lines of the following format:
halflead PN possible half-leads
85476312 58.14....36.58.14 (56,36,16,58,38,18)
All I needed from that was the place notation string in the middle. It
would be easy to slice that out using some string processing, however
this is a good place to demonstrate Scala's powerful support for pattern
matching. Java's "switch" statement is extended to allow arbitrary
pattern-matching based on what are known as extractors, but are really
just ordinary methods one can define in a class, and which act a bit
like the opposite of constructors. This means pattern-matching is an
extensible part of the language (indeed, Scala is short for "Scalable
Language", and extensibility is a key part of its design).
Many library classes, in particular collections, provide such
extractors, and are ready-made for pattern-matching. A typical use is in
a recursive function which processes elements in a List. The
pattern-match statement is introduced by the expression to match on,
then the keyword "match":
def recurse[A](list: List[A]): Unit =
{
  list match
  {
    case head::tail => { process(head); recurse(tail) }
    case Nil => return
  }
}
Here 'Nil' is a constant which matches the empty list; on finding it we
terminate the recursion. The pattern "head::tail" matches a list with a
head element and a tail (which may be Nil, or may contain other
elements), and in this example we process the head and continue to
recurse on the tail. The pattern match works because :: is an extractor
function defined on the List class. We can use it to match more complex
patterns, such as head::next::tail, and so on.
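To make this concrete, here is a tiny self-contained sketch (using Int lists rather than my ringing classes, which I've not shown here) exercising both the head::tail and head::next::tail patterns:

```scala
object ListMatchDemo
{
  // Sum a list by matching head::tail, terminating on Nil
  def sum(list: List[Int]): Int =
    list match
    {
      case Nil => 0
      case head :: tail => head + sum(tail)
    }

  // head::next::tail matches any list with at least two elements
  def firstTwo(list: List[Int]): Option[(Int, Int)] =
    list match
    {
      case a :: b :: _ => Some((a, b))
      case _ => None
    }

  def main(args: Array[String]): Unit =
  {
    assert(sum(List(1, 2, 3)) == 6)
    assert(firstTwo(List(1, 2, 3)) == Some((1, 2)))
    assert(firstTwo(List(1)) == None)
  }
}
```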
[Note the syntax "=>" used to join the pattern with the expression to be
evaluated on a successful match. It is the same symbol as used in
function types and anonymous functions, and for good reason: a Scala
match statement is an instance of a *partial* function: that is, a
function which is defined for some, but perhaps not all, inputs. This
commonality of approach, in which fundamental building blocks are linked
and re-used, is one of the joys of the language.]
Alternatively, if we are looking for a List with a precise number of
members, the extractor patterns List(a), List(a,b), etc can be used. A
similar syntax is available for Arrays, the Scala type which corresponds
most closely to Java-style arrays. This leads to the parsing matcher I
needed for the Helixoid file:
def parseLine(line: String): Option[Method] =
{
line.split(" ") match
{
case Array(hl, pn, hlpns) =>
Some(new Method(pn))
case _ => None
}
}
Here we first split the input line on the space character, which gives an
Array[String]. We then match an Array with three elements only, and
extract the middle one, turning it into a Method object. If we encounter
any other kind of line, the "default" branch of the match statement will
be processed: Scala uses the _ character in this context to act as a
wildcard and represent any other pattern.
This code is equivalent to the following if() statement, but hopefully
you can see how pattern matching comes into its own with more complex
problems.
val splitArray = line.split(" ")
if (splitArray.size==3)
Some(new Method(splitArray(1)))
else
None
But what are these Somes and Nones? And what is the Option return type
of the method? Here is another Scala gem: a library class, called
Option, which can wrap any other type, and represent either the
existence of a value of that type, Some(value), or the lack of a value,
None. It is intended to provide a type-safe alternative to returning null.
Using Option allows the compiler's type-checker to ensure we have
handled the null case: the type of our function is not "Method", with
null being returned if the line does not parse, it is Option[Method], so
calling code is forced by the compiler to deal with it.
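As a sketch of the idea, with a plain String standing in for my Method class:

```scala
object OptionDemo
{
  // Returns Some(pn) for a well-formed three-column line, None otherwise
  def parsePn(line: String): Option[String] =
    line.split(" ") match
    {
      case Array(_, pn, _) => Some(pn)
      case _ => None
    }

  def main(args: Array[String]): Unit =
  {
    assert(parsePn("85476312 58.14....36.58.14 (56,36,16,58,38,18)")
      == Some("58.14....36.58.14"))
    assert(parsePn("not a well-formed line") == None)
    // The compiler makes us deal with the Option, e.g. via getOrElse:
    assert(parsePn("bad").getOrElse("?") == "?")
  }
}
```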
Oh, but we still have to check this case, I hear you cry! Well, yes and
no. The Option class is part of Scala's collection hierarchy, forming a
sequence with either zero or one members, and so we can do something
quite clever, like this:
Source.fromInputStream(getClass.getResourceAsStream(file))
  .getLines()
  .flatMap(parseLine(_).iterator)
In one line this carries out the following operations:
1. Opens an input stream on the resource "file". This is done using the
Java standard library calls Object.getClass() and
Class.getResourceAsStream(). Scala not only runs in the JVM, but
interoperates fairly seamlessly with Java code: you can use Java
classes, objects and methods directly from Scala.
2. Creates a Source object, which is part of the Scala standard library
and provides a nice wrapper for lots of common I/O operations, often
allowing us to avoid the tedium of closing streams and handling exceptions.
3. Calls getLines() on the Source object, which gives us an
Iterator[String] - an iterator over each line in the file.
4. Calls flatMap(), the higher-order function introduced in the last
article, in order to return a new iterator with our parseLine() method
applied to every element (line) in the first iterator.
The result is, with very little frill and mess, an iterator over the
place notation in the result file. It works because flatMap operates on
Iterators just as easily as Lists and other collection types: the
advantage of the Iterator in this case is that we get the elements one
by one, so can start processing them before the entire file is read. But
why flatMap(), and not just map()? Well, our parseLine() method didn't
return the Method type directly, but an Option[Method]. Some lines might
not match our pattern, and hence might not provide a Helixoid Method.
The flatMap() call flattens the Option instances out, getting rid of the
Nones, and turning the Somes into a single element of the right type for
the iterator, Method. No null checks required! Simply magic.
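A small stand-alone sketch of the flattening behaviour, again with Strings in place of my Method class:

```scala
object FlatMapDemo
{
  def parse(line: String): Option[String] =
    line.split(" ") match
    {
      case Array(_, pn, _) => Some(pn)
      case _ => None
    }

  def main(args: Array[String]): Unit =
  {
    val lines = Iterator("a 12.34 b", "garbage", "c 56.78 d")
    // flatMap flattens the Option results: Nones vanish,
    // Somes are unwrapped to their contents
    val pns = lines.flatMap(parse(_).iterator)
    assert(pns.toList == List("12.34", "56.78"))
  }
}
```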
Well, that's taken quite a long time to explain, but it took only a few
seconds to write, and there we have code to solve a simple file-parsing
problem with brevity and a degree of beauty. It's also allowed me to
show off some really nice features of the language.
But on to the meaty stuff. The first job is to carry out the expansion
of the TVs. In order to check negative cross-falseness with PB, it is
necessary only to consider (a) the positions of bells (123) and (b) the
sign of each row. Hence, we only need to expand place notations which
affect these factors. But then for the second phase we need to expand
all other PN types, which are those which (a) affect only the positions
of bells (45678) and (b) don't change the sign of the row.
It would be nice to write the code to carry out the expansion once, and
parameterise it by the type of PNs allowed in each phase. With Scala's
higher-order functional support, this is easy to do. First we define a
type alias which represents a function capable of determining which
place notation expansions are allowed in any one phase:
type PnFilter = (Row, PN, Set[PN]) => Set[PN]
This defines a function type which accepts as parameters a Row (the one
to which our expanded PNs apply), a PN (the "non-TV" one from the
original method), and a set of PNs, representing the alternatives to
try. The function returns another Set[PN], containing just those PNs
which are useful for this expansion.
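The shape of such a type alias can be sketched with Ints standing in for the Row and PN classes (which I've not shown here):

```scala
object AliasDemo
{
  // A function type alias, analogous in shape to PnFilter
  type IntFilter = (Int, Set[Int]) => Set[Int]

  // A value of the aliased type: keep candidates larger than the pivot
  val keepLarger: IntFilter = (pivot, candidates) => candidates.filter(_ > pivot)

  def main(args: Array[String]): Unit =
  {
    assert(keepLarger(2, Set(1, 2, 3, 4)) == Set(3, 4))
  }
}
```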
It is not strictly necessary to define this type alias; we could refer
to the function type directly. However, it's a pleasant and convenient
thing to be able to do. It allows us to craft a recursive expand()
function with the following signature:
def expand(startRow: Row, helixoid: Method, pnF: PnFilter):
Iterator[Method] =
{ ... }
In order to generate all the expansions, the usual functional solution
of recursion comes nicely into play. In Scala, it is possible to define
functions within the scope of another function, which is very pleasant
for this sort of job. We can create a recursive function like so, nested
inside the expand() function given above:
def recurse(row: Row, done: List[PN], todo: List[PN]):
Iterator[Method] =
{
  todo match
  {
    case Nil => Iterator(new DoubleMethod(done.reverse))
    case pn::tail =>
    {
      val next = row.apply(pn)
      val goodPns = pnF(row, pn, AllPN).iterator
      goodPns.flatMap{ alt => recurse(next, alt::done, tail) }
    }
  }
}
This example builds on several concepts already introduced, including
pattern matching on Lists, use of Iterators as a first-class citizen of
the collections library, and use of the flatMap higher-order function.
It works by examining one PN at a time, generating a list of all
allowable variants of the PN, and recursively calling itself on each.
The set of place notations within the quarter-lead already processed are
kept in one List ("done") and those yet to do in another ("todo"); as
each PN is expanded, it is shuttled off the end of the "todo" list onto
the start of "done". The latter list is built up in reverse,
but we can reverse it once we're finished, which occurs when "todo" is empty.
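The same recursive pattern can be exercised in miniature, with Chars in place of PNs and a Set of alternatives per position (an analogue only, not the real expansion):

```scala
object ExpandDemo
{
  // Expand every combination of alternatives, one position at a time;
  // "done" accumulates in reverse and is reversed when "todo" is empty
  def expand(done: List[Char], todo: List[Set[Char]]): Iterator[String] =
    todo match
    {
      case Nil => Iterator(done.reverse.mkString)
      case alts :: tail =>
        alts.iterator.flatMap{ c => expand(c :: done, tail) }
    }

  def main(args: Array[String]): Unit =
  {
    val results = expand(Nil, List(Set('a', 'b'), Set('x', 'y')))
    assert(results.toList.sorted == List("ax", "ay", "bx", "by"))
  }
}
```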
Note in particular how it is possible to refer directly to the
PnFilter-ing function "pnF", which is a parameter of the enclosing
expand() function, because we are in the latter's scope. We can finish
off the body of the expand() function by initiating the first recursive
call, as follows:
recurse(startRow, Nil, helixoid.pn.slice(0, HalfLeadLength/2))
Here I use the slice() method, again from the Scala collections library,
to extract just the first quarter-lead's worth of PN from the method.
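For anyone new to slice(), note that the second argument is exclusive; a quick sketch:

```scala
object SliceDemo
{
  def main(args: Array[String]): Unit =
  {
    // slice(from, until): "until" is exclusive
    assert(List(10, 20, 30, 40, 50, 60).slice(0, 3) == List(10, 20, 30))
    // It works on Strings too, via the collections library
    assert("12345678".slice(2, 5) == "345")
  }
}
```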
We end up with a parameterisable "expand" function which looks like this:
def expand(startRow: Row, helixoid: Method, pnF: PnFilter):
Iterator[Method] =
{
def recurse(...) {}
recurse(startRow, Nil, helixoid.pn.slice(0, HalfLeadLength/2))
}
If it has not been obvious before, Scala allows us to dispense with the
"return" keyword: the last expression evaluated in a function becomes
its return value.
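A trivial sketch of this implicit-return behaviour:

```scala
object ReturnDemo
{
  // No "return" keyword: the if() expression itself is the result
  def classify(n: Int): String =
    if (n % 2 == 0) "even" else "odd"

  def main(args: Array[String]): Unit =
  {
    assert(classify(4) == "even")
    assert(classify(7) == "odd")
  }
}
```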
What about the two implementations of the PnFilter function? The second
stage expansion is easy: if we use a masked startRow of "123.....", then
the check is simply that the alternative PN yields the same row with the
same sign:
def validPhase2PN(row: Row, pn: PN, altPn: PN): Boolean =
row.apply(pn)==row.apply(altPn) &&
pn.swapsSign==altPn.swapsSign
We can turn this into a PnFilter by passing it as the higher-order
function parameter to a call to filter() on the candidate PN set. Why
not roll this call to filter() into the main expand method, and have the
PnFilter type implement the simpler, once-per-PN, and Boolean-returning
function signature of the validPhase2PN method given here? The answer
will become clear when we consider the more difficult job of phase 1
expansion, below.
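The lifting of a per-element Boolean predicate into a set-to-set function via filter() can be sketched like this (Ints again standing in for my Row and PN classes):

```scala
object LiftDemo
{
  // A simple once-per-element Boolean predicate
  def sameParity(n: Int, alt: Int): Boolean = n % 2 == alt % 2

  // Lifted into a set-to-set function by a call to filter()
  def parityFilter(n: Int, candidates: Set[Int]): Set[Int] =
    candidates.filter{ sameParity(n, _) }

  def main(args: Array[String]): Unit =
  {
    assert(parityFilter(2, Set(1, 2, 3, 4)) == Set(2, 4))
  }
}
```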
Again, if not obvious before: braces {} are optional where a function
consists of a single result expression. Also note here that our function
returns Boolean, which is a Scala class. As in other modern languages
such as Ruby, everything is an object in Scala, and so it does not have
the Java distinction between primitive and Object types, such as
boolean/Boolean. A Scala Boolean is an object, on which we can call
methods, but is implemented under the covers as a primitive type where
possible. The same holds for numeric types. We get the benefits of
primitive speed, but do not ever use primitives; there is also none of
the unexpected behaviour which Java's boxing/unboxing can produce.
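A few one-liners showing methods being called directly on "primitive" values:

```scala
object PrimitiveDemo
{
  def main(args: Array[String]): Unit =
  {
    assert(42.toString == "42")   // Int is an object with methods
    assert(3.max(7) == 7)
    assert(1.to(3).sum == 6)      // even Range-building is a method on Int
  }
}
```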
The first phase expansion is somewhat more difficult. The allowed PN
fall into several distinct sets, and we need just one example from each
set. The sets are characterised by two factors: the precise arrangement
of (123) within the row produced, and the sign of the row as a whole. We
can capture both of these factors using another useful Scala library
type, known as a Tuple. Scala provides a number of tuple types, ranging
from Tuple2 (a pair), Tuple3 (a triplet) and so on, up to
Tuple22. There is also a nice syntax for tuples: (a, b)
defines a Tuple2, containing members a and b. The generic type of the
tuple will be Tuple2[A,B], where A and B are the types of the members.
We can also use (A,B) as a shortcut for the generic type.
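A quick sketch of the tuple syntax and types:

```scala
object TupleDemo
{
  def main(args: Array[String]): Unit =
  {
    // (String, Boolean) is shorthand for Tuple2[String, Boolean]
    val pair: (String, Boolean) = ("123.....", true)
    assert(pair._1 == "123.....")
    assert(pair._2)

    val triple = (1, "two", 3.0)   // a Tuple3[Int, String, Double]
    assert(triple._2 == "two")
  }
}
```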
It is really very convenient to have tuples in a language. For phase 1
PN filtering, it is just the job. Again, if we start from a masked row
"123.....", we can create a pair (Tuple2) to hold the two factors which
we need:
val pair = (row.apply(pn), pn.swapsSign)
Here, since the row is masked and hence its original sign not available,
we simply use the swapsSign() method on my PN class, which will work
just as well: we don't need the absolute sign of the row, just to
distinguish between PNs which produce rows of different signs.
We could also establish a Set to track unique (masked row, sign) pairs:
val uniquePairs = mutable.Set[(Row, Boolean)]()
And now we can determine whether a new PN is worth having simply by
seeing if it is already in our set:
if (uniquePairs.add(pair))
// It's good
Note that the add() method on mutable sets not only adds the element to
the set, but also returns true if the element was not already present.
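A self-contained sketch of this add-and-test behaviour, with Strings standing in for my Row class:

```scala
import scala.collection.mutable

object SetAddDemo
{
  def main(args: Array[String]): Unit =
  {
    val seen = mutable.Set[(String, Boolean)]()
    assert(seen.add(("12345678", true)))    // new pair: added, returns true
    assert(!seen.add(("12345678", true)))   // duplicate: returns false
    assert(seen.add(("12345678", false)))   // different sign: a new pair
    assert(seen.size == 2)
  }
}
```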
But does this really work? Sets are normally based on hash tables, and,
just as in Java, require correct implementation of equals() and
hashCode() methods on the objects being stored in the set. Very often in
Scala, as here, the answer is, no worries. The Tuple types define valid
equals() and hashCode() methods based on their contents; in our case,
one part of the tuple is a Boolean, a standard library type also with
valid equals/hashCode; the other is PN, a class which we have defined
ourselves. I have not shown my implementation here (I'm happy to post
all my tools later if people want, subject to a bit of code-cleanup
first...!) but it is defined as a "case class", a specialised kind of
class in Scala which offers a number of features, including ready-made
extractors for pattern matching and, crucially for us, automatic and
guaranteed-correct implementations of equals and hashCode. Job done,
with no "hash stress".
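To illustrate (with a toy PN, not my real implementation), a case class gets structural equality, a consistent hashCode and a pattern-matching extractor for free:

```scala
object CaseClassDemo
{
  case class PN(places: String)

  def main(args: Array[String]): Unit =
  {
    // Structural equality, not reference equality
    assert(PN("12") == PN("12"))
    // Consistent hashCode: duplicates collapse in a Set
    assert(Set(PN("12"), PN("12"), PN("18")).size == 2)
    // Ready-made extractor for pattern matching
    val matched = PN("14") match
    {
      case PN(p) => p
    }
    assert(matched == "14")
  }
}
```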
Here is the complete implementation of PnFilter for phase 1:
def phase1Filter(row: Row, originalPn: PN, candidates: Set[PN]) =
{
  val uniquePairs = mutable.Set[(Row,Boolean)]()
  val originalMasked = mask123(row.apply(originalPn))
  def validPhase1PN(altPn: PN) =
  {
    val row2 = row.apply(altPn)
    val pair = (row2, altPn.swapsSign)
    originalMasked==mask123(row2) && uniquePairs.add(pair)
  }
  candidates.filter{ validPhase1PN(_) }
}
Now we see why the function is not a totally straightforward application
of a Boolean predicate to the filter() method: it is necessary to keep
state, in the form of the "uniquePairs" set, in order to pick a single
(arbitrary) representative of each type of PN expansion. The
"validPhase1PN" method is not a true function, as was the case with
validPhase2PN, since it has a side effect, storing values in a Set at
the higher scope.
I'm sure there is a solution to this problem using a pure-functional
approach, but it is very pleasant that Scala allows us to mix and match
coding paradigms so we can do things in the easiest, or most obvious, way.
The only additional thing to spot in our implementation is the check on
the "fully-masked" rows. These are of the form XXX..... and I have
assumed a support function mask123() which produces them from the
half-masked 123..... type rows. The check is necessary because the
candidate PN array will include PNs that violate the original Helixoid
path, and hence could produce something that would not be a TV, and
probably not a valid Helixoid either.
OK, we have covered a lot of useful Scala concepts so far, including
pattern matching, iterators, nested functions and tuples. Let us apply
these to the real work of phase 1: finding Helixoids with no negative
falseness with our PB touch. Previous experience led me to believe that
it is possible to do this by considering just the rows with the treble
at lead, so let's start off by extracting these rows from the first
three leads of the Helixoid method's plain course (which contains all
positions for the front three). As we expect by now, this is blindingly
simple to do using Scala:
val threeLeads = helixoid.fullCourse.slice(0, HalfLeadLength*6)
val trebleAtLead = threeLeads.filter{ _.bellAt(1)==1 }
val maskedPairs = trebleAtLead.map{ (row)=> (mask45678(row), row.sign)}
Given a helixoid Method, we assume a support function to deliver us the
list of rows in the plain course, helixoid.fullCourse(). We take the
first 336 elements of this list. Then we simply apply a filter which
accepts rows where bellAt(1)==1, again using a support function, here on
my "Row" class, which I'll try and share later, but which is not of so
much interest for our current discussions.
Finally, we wish to ignore the position of bells 4-8, so we apply a map
function which masks these bells out. But, we do need to preserve the
sign of the overall row, so we calculate that and keep a pair
(masked-row, original-row-sign).
Now, the congruency property which we need is simply this: that all the
positive treble leads can be linked together via a halflead PN of 18 and
leadhead PN of 12; or in other words, that the leads fall into three
separate plain courses of Plain Bob. In fact, it seems that the positive
leads of a Helixoid can *always* be linked to the negative leads by
Plain Hunt (a 18 halflead), so it turns out we only need to check the
latter condition, that 12 leadheads will link the negative and positive
leads.
Using our maskedPairs list, defined above, this is now simple to
achieve. First we partition the list into positive and negative rows.
Here I can demonstrate two further Scala features: the richness of its
collections library, here providing the partition() function which we
need, and the neat ability to assign tuples to simultaneous sets of
variables:
val (positivePairs, negativePairs) = maskedPairs.toSet.partition(_._2)
// the sign is now implied by set membership, so project it away:
val (positive, negative) = (positivePairs.map(_._1), negativePairs.map(_._1))
The partition() function operates on a collection, and returns a pair of
collections of the original type (here Set: we don't care about the
order of the rows, so have converted from the original List). A Boolean
function must be provided which operates on the element type of the
collection; elements returning true go in the left side of the result,
those returning false in the right side. Here our predicate is _._2, a
shortcut for
(p:(Row,Boolean)) => p._2
Methods "_1", "_2" etc are provided on tuples to extract the individual
members, so p._2 simply returns the Boolean (sign) part of our pair.
Perhaps not the most elegant method names to be chosen by Scala's designers!
Since the partition() call returns a pair of sets, we could assign it to
a single variable of the corresponding pair type, but Scala allows the
syntax shown above, where we break the pair out into two variables at
once. Interestingly, this is actually an example of a pattern-match on
the left-hand-side of a variable assignment, and triggers an extractor
on the Tuple2 class. Very
often what looks like a core part of the syntax of the language is in
fact a library function, which we could emulate or extend in our own
code if we wanted to.
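A quick self-contained sketch of partition() together with the destructuring assignment:

```scala
object PartitionDemo
{
  def main(args: Array[String]): Unit =
  {
    // Elements satisfying the predicate go left, the rest go right;
    // the resulting pair is broken out into two variables at once
    val (evens, odds) = Set(1, 2, 3, 4, 5).partition(_ % 2 == 0)
    assert(evens == Set(2, 4))
    assert(odds == Set(1, 3, 5))
    // The related predicates forall() and exists():
    assert(evens.forall(_ % 2 == 0))
    assert(odds.exists(_ > 4))
  }
}
```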
We have now set up the data structures we need for the lead-linking
algorithm, using just a couple of lines of code. The linking algorithm
is almost as straightforward and concise. First we code a method which
decides whether a positive leadhead can be linked to a negative one:
def canBeLinked(positiveRow: Row, pn: PN) =
  negative.contains(positiveRow.apply(pn))
It is as simple as that: a positive row can be linked if, when permuted
by PN "12", we arrive at a row in the "negative" set. (You might be
thinking this is always the case, but not so! We are still dealing with
masked rows, so, in two different Helixoids, a row such as 1xx3x2xx may
occur with either a negative or positive sign. If the corresponding row
1x3x2xxx, which is linked by PN=12, has the same sign, the check fails
and the leads cannot be linked. This may be the case in one Helixoid,
but not in another.)
All that is left to do is to make sure this condition holds for all rows
in the positive set. The Scala collections library has two useful
methods for this kind of check: exists(), which, given a
Boolean-returning function, returns true if at least one element
satisfies the predicate; and forall(), which returns true if all
elements satisfy the predicate. The latter is what we need:
positive.forall{ canBeLinked(_, PN("12")) }
And that's it. Here is the full code for the congruency check:
def isCongruent(helixoid: Method): Boolean =
{
  val threeLeads = helixoid.fullCourse.slice(0, HalfLeadLength*6)
  val trebleAtLead = threeLeads.filter{ _.bellAt(1)==1 }
  val maskedPairs = trebleAtLead.map
    { (row)=> (mask45678(row), row.sign) }
  val (positivePairs, negativePairs) = maskedPairs.toSet.partition(_._2)
  val (positive, negative) = (positivePairs.map(_._1), negativePairs.map(_._1))
  def canBeLinked(positiveRow: Row, pn: PN) =
    negative.contains(positiveRow.apply(pn))
  positive.forall{ canBeLinked(_, PN("12")) }
}
Isn't that fabulous? A problem which on the surface appeared to be quite
difficult, that of checking cross-falseness against a certain structure
of PB touch, resolves itself into such a tiny, neat, efficient piece of
code. The rapidity with which Scala enabled me to solve problems like
this has contributed powerfully to my increasing love of the language. I
hope it will inspire others, too.
In my final article, due in the next mail, I'll cover the final part of
the pipeline, the "splice-sister" search. As promised in the first
article, I'll also cover ways in which Scala can provide optimisations
such as pre-built node tables; it turns out we can often get the
performance benefits of node tables without writing any table-building
code. The composer/programmer gets a free lunch!
MBD
More information about the ringing-theory
mailing list