[r-t] Similar compositions

John Goldthorpe john.m.goldthorpe at gmail.com
Mon Jan 22 14:12:27 UTC 2018


Hi Andrew,

Part of my job as Composition Secretary for the Yorkshire Association is to
check if compositions have been rung before.  To help with this I invented
a system which is very similar to the one you describe.

I have just tried to find my code but it's not where I thought it was.
Anyway, from what I can remember I generate six hashes per composition.
They are done by creating a text file (albeit in memory) which contains
each row of the composition, in order.  I then calculate the MD5 checksum
for this file.  Other options are available.  Another composition with the
same checksum is obviously the same.  I then sort the rows in to order and
calculate a new checksum to detect the same rows but in a different order.
Next I do the same things but only using the lead-ends and lead-heads, and
finally I do it using only the lead-ends and lead-heads which have calls.

Obviously this does not spot rotations or reversals.  Compositions which
have say a 3H omitted from one part I tend to save twice so there is a
longer one without it missing.  My philosophy tends to be that if it's not
the same - it's different.  Stripping Q sets out of compositions, (in the
YACR at least) would leave a lot of compositions with not much left, if
anything.

I had intended at some point to ask the developer(s) of complib if there
was a way in which I could automatically upload lots of YACR compositions
in the hope that it would make my life easier and also benefit the
collection.  My assumption was that a check for repeat compositions was
already part of it.  Sorry this email isn't more well thought out.  I'm at
work and supposed to be doing something else!

John Goldthorpe


On 22 January 2018 at 13:05, Andrew Johnson <andrew_johnson at uk.ibm.com>
wrote:

> Composition Library, a.k.a. CompLib <https://www.complib.org/> is doing an
> excellent job at accumulating compositions as people enter them.
>
> With 23,659 compositions as of 22 January 2017, there are now two
> problems, identifying duplicates and similar compositions, and also
> choosing a suitable composition.
>
> I'll just consider the first.
>
> One important reference is:
> THE CENTRAL COUNCIL OF CHURCH BELL RINGERS
> Variation and Transposition By J. ARMIGER TROLLOPE
> Issued under the Authority of the Council
> http://www.ringing.info/v-and-t.html
>
> which says "we see that any true peal can be begun from any lead-end and
> still remain the same true peal; or it can be reversed and still be the
> same, and the reversal can be begun from any lead-end and still be the
> same. All these variations will give different leadends, and, of course,
> different qualities."
>
> He also explains about Q-sets and how plaining a bobbed Q-set can reduce a
> peal of says 5376 to 5152. Also "In many cases if you bob a plained “Q
> set” you will still get the same number of changes, but they will come in
> a different order. "
>
> There is also the case of converting a peal in one method to another where
> all the calls are in Q-sets
> "These two compositions look quite different; actually they are the same,
> and they contain exactly the same lead-ends, but in different order. The
> first by N. J. Pitstow is No. 88 in the Double Norwich Collection, and was
> rung in 1892. The second, with a little alteration of the bobs which do
> not affect the “part” bells is No. 87 in the Bob Major Collection, and was
> composed in 1894 by Mr. Lindoff. According to the old standards. these are
> distinct and original compositions, and, of course, they were produced
> quite independently of each other "
>
> He also considers the case of finding the 'key of the composition' by
> omitting courses linked in by Q-sets.
>
> It would also be nice to spot related compositions in different stages
> e.g. WHW*3 in Plain Bob Minor and Yorkshire Surprise Major.
>
> I would like to automated some of this, preferably in a way that doesn't
> involve extensive computation comparing each composition against every
> other as that could take a while.
>
> A hash is a transform of a string of characters into a shorter fixed
> length value. It is possible for two different strings to have the same
> hash, but good hash functions spread out the hash values and minimise the
> chance. We can hash all the rows of a composition and if two compositions
> have the same hash they are probably the same.
>
> Some hash calculations can be order dependent and some independent.
>
> E.g. just adding the character codes of each character gives a hash, but
> 'ABC' will hash to the same value as 'CBA'.
>
> Java uses the following order dependent hash.
> hash=0
> for each character:
>     hash = hash * 31
>     hash = hash + character code of next character
>
> Let us consider the following compositions of Cambridge Surprise Major:
>
> 5056 Cambridge Surprise Major Charles Middleton arr. Henry Johnson -
> rotated version of #c10030
> https://complib.org/composition/14095
> 5056 Cambridge Surprise Major Charles Middleton
> https://complib.org/composition/10030
> 5152 Cambridge Surprise Major Charles Middleton (omit 3 homes)
> https://complib.org/composition/38330?accessKey=
> 06bdc66d4b3df9cfc401ea749a80f30db9d4eecd
> 5184 Cambridge Surprise Major   Charles Middleton arr. Arthur P Heywood
> https://complib.org/composition/14494
> 5184 Cambridge Surprise Major Charles Middleton arr. James W Washbrook
> https://complib.org/composition/14493
> 5600 Cambridge Surprise Major Charles Middleton
> https://complib.org/composition/10029
> 5600 Cambridge Surprise Major Charles Middleton (rotated)
> https://complib.org/composition/38321?accessKey=
> ee0978ed1aa8b92903917cb10adbc8b9e2cb1783
> 5600 Cambridge Surprise Major William Shipway
> https://complib.org/composition/14496
>
> CompLib can generate a list of all the rows, including opening and closing
> rounds, so we can use this for our calculations:
> https://complib.org/composition/10029/rows
>
> Hash all rows, order important: Use Java hash of row, and use Java style
> hash to generate hash of all row hashes. This should work quite well to
> detect identical compositions.
>
> Hash all rows, order unimportant: Use Java hash of row, and add Java
> hashes together (or exclusive or). This should detect compositions with
> the same rows, but possibly in a different order. Clearly extents or
> in-course extents will have the same hash value.
>
> One interesting clash is the 5600 by Charles Middleton and William Shipway
> have the same hash using 32-bit addition to combine the hash of each row:
> hash2=1824995460 hash3=-1163601020
> hash2=1824995460 hash3=1528932476
>
> but are different using exclusive-or (hash3) or 64-bit addition to total
> the 32-bit hashes for each row. I don't know why this is the case - the
> collision persists just considering the leadends and leadheads. There are
> 350 leadend and leadheads, of which 70 are common and 280 are different.
>
> We now can consider rotations. I see two approaches here - one is to
> rotate and renumber the rows (so we always start from rounds) and take the
> rotation with the lowest (highest) etc. hash. The other is to rotate, and
> take the rotation which comes first when sorting, then take the hash. The
> second might be slightly better as there should be a better distribution
> of hashes. This correctly spots #c10030 and #c14095 as the same and
> #c10029 and #c38321 as the same.
>
> To consider reversals, we can also do the same, consider all rotations and
> find the lowest in sort order, then reverse and consider all those
> rotations, and find the lowest, then choose the lowest of the two, then
> find the hash. This additionally finds #c14493 and #c14494 as equivalent.
> It is known that the Arthur P Heywood and James W Washbrook arrangements
> are reversals.
> http://www.changeringing.co.uk/cambridgefm.htm#No.4
>
> We can also just consider leadend and leadheads (by looking for
> consecutive rows with the treble leading) or leadheads (just the second
> row) with the above hashes. This should spot similarities between
> compositions in different methods for the same leadhead group. E.g.
> 5088 Yorkshire Surprise Major by Nathan J Pitstow
> https://complib.org/composition/10864
> 5088 Superlative Surprise major by Nathan J Pitstow
> https://complib.org/composition/25994
>
> This might be useful in Stedman Triples peals by considering six-ends and
> six-heads as although all peals contain the same rows, the types of the
> sixes will vary. This could find peals the same apart from adding/removing
> Q-sets of bobs. Automatically deciding for any composition what is a
> leadend/leadhead or six-end six-head might be hard though.
>
> It would not spot similarities where the peals are of different lengths,
> for example where the methods have different leadhead groups:
> 5280 Rutland Surprise Major by Nathan J Pitstow
> https://complib.org/composition/10864?substitutedmethodid=17150
> compared to
> 5088 Yorkshire Surprise Major by Nathan J Pitstow
> https://complib.org/composition/10864
>
> This could be detected by only considering leadends/leadheads where there
> is a call, which works for these two peals.
>
> Unfortunately only considering those leadends/leadheads will mask
> compositions which are otherwise similar, but one composition has an extra
> neutral Q-set of calls which just rearranges the order. Perhaps a
> preprocessing step which removes all neutral Q-sets could help - though I
> don't know if this could be done in a unique fashion.
>
> This also might be useful in plain minor where all the six-ends and
> six-heads cover all the rows with the treble leading, but the ones
> affected by calls might make an identifying characteristic.
>
> There is also the case of identifying WHW*3 for plain minor and surprise
> major being similar. One way is to look for leadends/leadheads affected by
> a call, then remove the fixed bells from all such rows. Perhaps the
> composition should be rotated / reversed to get the tenors as fixed bells
> before they are stripped.
>
> Matching peals in plain methods doubled by singles with treble bobs
> methods is harder too.
>
> I'm not sure how to detect that all the Cambridge peals above are similar.
> Stripping out all Q-sets might identify the #c10029 and #c38330, #c38321
> as similar but not the others.
>
> The general idea of all these steps is to generate a list of hashes for
> each composition, and if two compositions have the same hash then they are
> related, rather than having to do a pairwise comparison of compositions.
>
> Any comments?
>
> Andrew Johnson
>
>
>
>
>
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
> _______________________________________________
> ringing-theory mailing list
> ringing-theory at bellringers.org
> http://lists.ringingworld.co.uk/listinfo/ringing-theory
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ringingworld.co.uk/pipermail/ringing-theory/attachments/20180122/fb5d00d3/attachment-0001.html>


More information about the ringing-theory mailing list