[r-t] Similar compositions
andrew_johnson at uk.ibm.com
Mon Jan 22 13:05:06 UTC 2018
Composition Library, a.k.a. CompLib <https://www.complib.org/> is doing an
excellent job at accumulating compositions as people enter them.
With 23,659 compositions as of 22 January 2017, there are now two
problems, identifying duplicates and similar compositions, and also
choosing a suitable composition.
I'll just consider the first.
One important reference is:
THE CENTRAL COUNCIL OF CHURCH BELL RINGERS
Variation and Transposition By J. ARMIGER TROLLOPE
Issued under the Authority of the Council
which says "we see that any true peal can be begun from any lead-end and
still remain the same true peal; or it can be reversed and still be the
same, and the reversal can be begun from any lead-end and still be the
same. All these variations will give different leadends, and, of course,
He also explains about Q-sets and how plaining a bobbed Q-set can reduce a
peal of says 5376 to 5152. Also "In many cases if you bob a plained “Q
set” you will still get the same number of changes, but they will come in
a different order. "
There is also the case of converting a peal in one method to another where
all the calls are in Q-sets
"These two compositions look quite different; actually they are the same,
and they contain exactly the same lead-ends, but in different order. The
first by N. J. Pitstow is No. 88 in the Double Norwich Collection, and was
rung in 1892. The second, with a little alteration of the bobs which do
not affect the “part” bells is No. 87 in the Bob Major Collection, and was
composed in 1894 by Mr. Lindoff. According to the old standards. these are
distinct and original compositions, and, of course, they were produced
quite independently of each other "
He also considers the case of finding the 'key of the composition' by
omitting courses linked in by Q-sets.
It would also be nice to spot related compositions in different stages
e.g. WHW*3 in Plain Bob Minor and Yorkshire Surprise Major.
I would like to automated some of this, preferably in a way that doesn't
involve extensive computation comparing each composition against every
other as that could take a while.
A hash is a transform of a string of characters into a shorter fixed
length value. It is possible for two different strings to have the same
hash, but good hash functions spread out the hash values and minimise the
chance. We can hash all the rows of a composition and if two compositions
have the same hash they are probably the same.
Some hash calculations can be order dependent and some independent.
E.g. just adding the character codes of each character gives a hash, but
'ABC' will hash to the same value as 'CBA'.
Java uses the following order dependent hash.
for each character:
hash = hash * 31
hash = hash + character code of next character
Let us consider the following compositions of Cambridge Surprise Major:
5056 Cambridge Surprise Major Charles Middleton arr. Henry Johnson -
rotated version of #c10030
5056 Cambridge Surprise Major Charles Middleton
5152 Cambridge Surprise Major Charles Middleton (omit 3 homes)
5184 Cambridge Surprise Major Charles Middleton arr. Arthur P Heywood
5184 Cambridge Surprise Major Charles Middleton arr. James W Washbrook
5600 Cambridge Surprise Major Charles Middleton
5600 Cambridge Surprise Major Charles Middleton (rotated)
5600 Cambridge Surprise Major William Shipway
CompLib can generate a list of all the rows, including opening and closing
rounds, so we can use this for our calculations:
Hash all rows, order important: Use Java hash of row, and use Java style
hash to generate hash of all row hashes. This should work quite well to
detect identical compositions.
Hash all rows, order unimportant: Use Java hash of row, and add Java
hashes together (or exclusive or). This should detect compositions with
the same rows, but possibly in a different order. Clearly extents or
in-course extents will have the same hash value.
One interesting clash is the 5600 by Charles Middleton and William Shipway
have the same hash using 32-bit addition to combine the hash of each row:
but are different using exclusive-or (hash3) or 64-bit addition to total
the 32-bit hashes for each row. I don't know why this is the case - the
collision persists just considering the leadends and leadheads. There are
350 leadend and leadheads, of which 70 are common and 280 are different.
We now can consider rotations. I see two approaches here - one is to
rotate and renumber the rows (so we always start from rounds) and take the
rotation with the lowest (highest) etc. hash. The other is to rotate, and
take the rotation which comes first when sorting, then take the hash. The
second might be slightly better as there should be a better distribution
of hashes. This correctly spots #c10030 and #c14095 as the same and
#c10029 and #c38321 as the same.
To consider reversals, we can also do the same, consider all rotations and
find the lowest in sort order, then reverse and consider all those
rotations, and find the lowest, then choose the lowest of the two, then
find the hash. This additionally finds #c14493 and #c14494 as equivalent.
It is known that the Arthur P Heywood and James W Washbrook arrangements
We can also just consider leadend and leadheads (by looking for
consecutive rows with the treble leading) or leadheads (just the second
row) with the above hashes. This should spot similarities between
compositions in different methods for the same leadhead group. E.g.
5088 Yorkshire Surprise Major by Nathan J Pitstow
5088 Superlative Surprise major by Nathan J Pitstow
This might be useful in Stedman Triples peals by considering six-ends and
six-heads as although all peals contain the same rows, the types of the
sixes will vary. This could find peals the same apart from adding/removing
Q-sets of bobs. Automatically deciding for any composition what is a
leadend/leadhead or six-end six-head might be hard though.
It would not spot similarities where the peals are of different lengths,
for example where the methods have different leadhead groups:
5280 Rutland Surprise Major by Nathan J Pitstow
5088 Yorkshire Surprise Major by Nathan J Pitstow
This could be detected by only considering leadends/leadheads where there
is a call, which works for these two peals.
Unfortunately only considering those leadends/leadheads will mask
compositions which are otherwise similar, but one composition has an extra
neutral Q-set of calls which just rearranges the order. Perhaps a
preprocessing step which removes all neutral Q-sets could help - though I
don't know if this could be done in a unique fashion.
This also might be useful in plain minor where all the six-ends and
six-heads cover all the rows with the treble leading, but the ones
affected by calls might make an identifying characteristic.
There is also the case of identifying WHW*3 for plain minor and surprise
major being similar. One way is to look for leadends/leadheads affected by
a call, then remove the fixed bells from all such rows. Perhaps the
composition should be rotated / reversed to get the tenors as fixed bells
before they are stripped.
Matching peals in plain methods doubled by singles with treble bobs
methods is harder too.
I'm not sure how to detect that all the Cambridge peals above are similar.
Stripping out all Q-sets might identify the #c10029 and #c38330, #c38321
as similar but not the others.
The general idea of all these steps is to generate a list of hashes for
each composition, and if two compositions have the same hash then they are
related, rather than having to do a pairwise comparison of compositions.
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
More information about the ringing-theory