[r-t] Similar compositions

Andrew Johnson andrew_johnson at uk.ibm.com
Mon Jan 22 13:05:06 UTC 2018


Composition Library, a.k.a. CompLib <https://www.complib.org/> is doing an 
excellent job at accumulating compositions as people enter them.

With 23,659 compositions as of 22 January 2017, there are now two 
problems, identifying duplicates and similar compositions, and also 
choosing a suitable composition.

I'll just consider the first.

One important reference is:
THE CENTRAL COUNCIL OF CHURCH BELL RINGERS
Variation and Transposition By J. ARMIGER TROLLOPE
Issued under the Authority of the Council
http://www.ringing.info/v-and-t.html

which says "we see that any true peal can be begun from any lead-end and 
still remain the same true peal; or it can be reversed and still be the 
same, and the reversal can be begun from any lead-end and still be the 
same. All these variations will give different leadends, and, of course, 
different qualities."

He also explains about Q-sets and how plaining a bobbed Q-set can reduce a 
peal of says 5376 to 5152. Also "In many cases if you bob a plained “Q 
set” you will still get the same number of changes, but they will come in 
a different order. "

There is also the case of converting a peal in one method to another where 
all the calls are in Q-sets
"These two compositions look quite different; actually they are the same, 
and they contain exactly the same lead-ends, but in different order. The 
first by N. J. Pitstow is No. 88 in the Double Norwich Collection, and was 
rung in 1892. The second, with a little alteration of the bobs which do 
not affect the “part” bells is No. 87 in the Bob Major Collection, and was 
composed in 1894 by Mr. Lindoff. According to the old standards. these are 
distinct and original compositions, and, of course, they were produced 
quite independently of each other "

He also considers the case of finding the 'key of the composition' by 
omitting courses linked in by Q-sets.

It would also be nice to spot related compositions in different stages 
e.g. WHW*3 in Plain Bob Minor and Yorkshire Surprise Major.

I would like to automated some of this, preferably in a way that doesn't 
involve extensive computation comparing each composition against every 
other as that could take a while.

A hash is a transform of a string of characters into a shorter fixed 
length value. It is possible for two different strings to have the same 
hash, but good hash functions spread out the hash values and minimise the 
chance. We can hash all the rows of a composition and if two compositions 
have the same hash they are probably the same.

Some hash calculations can be order dependent and some independent.

E.g. just adding the character codes of each character gives a hash, but 
'ABC' will hash to the same value as 'CBA'.

Java uses the following order dependent hash.
hash=0
for each character:
    hash = hash * 31
    hash = hash + character code of next character

Let us consider the following compositions of Cambridge Surprise Major:

5056 Cambridge Surprise Major Charles Middleton arr. Henry Johnson - 
rotated version of #c10030
https://complib.org/composition/14095
5056 Cambridge Surprise Major Charles Middleton
https://complib.org/composition/10030
5152 Cambridge Surprise Major Charles Middleton (omit 3 homes)
https://complib.org/composition/38330?accessKey=06bdc66d4b3df9cfc401ea749a80f30db9d4eecd
5184 Cambridge Surprise Major   Charles Middleton arr. Arthur P Heywood
https://complib.org/composition/14494
5184 Cambridge Surprise Major Charles Middleton arr. James W Washbrook
https://complib.org/composition/14493
5600 Cambridge Surprise Major Charles Middleton
https://complib.org/composition/10029
5600 Cambridge Surprise Major Charles Middleton (rotated)
https://complib.org/composition/38321?accessKey=ee0978ed1aa8b92903917cb10adbc8b9e2cb1783
5600 Cambridge Surprise Major William Shipway
https://complib.org/composition/14496

CompLib can generate a list of all the rows, including opening and closing 
rounds, so we can use this for our calculations:
https://complib.org/composition/10029/rows

Hash all rows, order important: Use Java hash of row, and use Java style 
hash to generate hash of all row hashes. This should work quite well to 
detect identical compositions.

Hash all rows, order unimportant: Use Java hash of row, and add Java 
hashes together (or exclusive or). This should detect compositions with 
the same rows, but possibly in a different order. Clearly extents or 
in-course extents will have the same hash value.

One interesting clash is the 5600 by Charles Middleton and William Shipway 
have the same hash using 32-bit addition to combine the hash of each row:
hash2=1824995460 hash3=-1163601020
hash2=1824995460 hash3=1528932476

but are different using exclusive-or (hash3) or 64-bit addition to total 
the 32-bit hashes for each row. I don't know why this is the case - the 
collision persists just considering the leadends and leadheads. There are 
350 leadend and leadheads, of which 70 are common and 280 are different.

We now can consider rotations. I see two approaches here - one is to 
rotate and renumber the rows (so we always start from rounds) and take the 
rotation with the lowest (highest) etc. hash. The other is to rotate, and 
take the rotation which comes first when sorting, then take the hash. The 
second might be slightly better as there should be a better distribution 
of hashes. This correctly spots #c10030 and #c14095 as the same and 
#c10029 and #c38321 as the same.

To consider reversals, we can also do the same, consider all rotations and 
find the lowest in sort order, then reverse and consider all those 
rotations, and find the lowest, then choose the lowest of the two, then 
find the hash. This additionally finds #c14493 and #c14494 as equivalent. 
It is known that the Arthur P Heywood and James W Washbrook arrangements 
are reversals.
http://www.changeringing.co.uk/cambridgefm.htm#No.4

We can also just consider leadend and leadheads (by looking for 
consecutive rows with the treble leading) or leadheads (just the second 
row) with the above hashes. This should spot similarities between 
compositions in different methods for the same leadhead group. E.g.
5088 Yorkshire Surprise Major by Nathan J Pitstow
https://complib.org/composition/10864
5088 Superlative Surprise major by Nathan J Pitstow
https://complib.org/composition/25994

This might be useful in Stedman Triples peals by considering six-ends and 
six-heads as although all peals contain the same rows, the types of the 
sixes will vary. This could find peals the same apart from adding/removing 
Q-sets of bobs. Automatically deciding for any composition what is a 
leadend/leadhead or six-end six-head might be hard though.

It would not spot similarities where the peals are of different lengths, 
for example where the methods have different leadhead groups: 
5280 Rutland Surprise Major by Nathan J Pitstow
https://complib.org/composition/10864?substitutedmethodid=17150
compared to
5088 Yorkshire Surprise Major by Nathan J Pitstow
https://complib.org/composition/10864

This could be detected by only considering leadends/leadheads where there 
is a call, which works for these two peals.

Unfortunately only considering those leadends/leadheads will mask 
compositions which are otherwise similar, but one composition has an extra 
neutral Q-set of calls which just rearranges the order. Perhaps a 
preprocessing step which removes all neutral Q-sets could help - though I 
don't know if this could be done in a unique fashion.

This also might be useful in plain minor where all the six-ends and 
six-heads cover all the rows with the treble leading, but the ones 
affected by calls might make an identifying characteristic.

There is also the case of identifying WHW*3 for plain minor and surprise 
major being similar. One way is to look for leadends/leadheads affected by 
a call, then remove the fixed bells from all such rows. Perhaps the 
composition should be rotated / reversed to get the tenors as fixed bells 
before they are stripped.

Matching peals in plain methods doubled by singles with treble bobs 
methods is harder too.

I'm not sure how to detect that all the Cambridge peals above are similar. 
Stripping out all Q-sets might identify the #c10029 and #c38330, #c38321 
as similar but not the others.

The general idea of all these steps is to generate a list of hashes for 
each composition, and if two compositions have the same hash then they are 
related, rather than having to do a pairwise comparison of compositions.

Any comments?

Andrew Johnson






Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU



More information about the ringing-theory mailing list