# [r-t] Similar compositions

Andrew Johnson andrew_johnson at uk.ibm.com
Thu Feb 28 14:22:05 GMT 2019

```
Last year we considered similar compositions.

John Goldthorpe had the good idea of using 6 hashes, rather than just all
the rows (in order or sorted):

> Anyway, from what I can remember I generate six hashes per composition.
> They are done by creating a text file (albeit in memory) which contains
> each row of the composition, in order.  I then calculate the MD5 checksum
> for this file.  Other options are available.  Another composition with the
> same checksum is obviously the same.  I then sort the rows in to order and
> calculate a new checksum to detect the same rows but in a different
> order.  Next I do the same things but only using the lead-ends and
> [then only the lead-ends] which have calls.

That should be good for spotting identical compositions, possibly to
different methods.

I'd like a way of comparing two compositions to see just how similar they
are. My plan is to count the number of changes that are the same, i.e.
where row n and row n+1 in composition A match row m and row m+1 in
composition B.
Assuming A is the reference composition, I then rotate or reverse
composition B to maximise the count, as I assume rotations/reversals are
trivial variations which should not affect the 'similarity' figure.
I then calculate what percentage this count is of the total changes in B.
That way, if B is a shortened version of A, it can come out as nearly 100%
the same.

Composition A and B need to be in the same method; substituting between
methods can be considered a different problem, which CompLib makes a good
attempt at. Substituting with spliced methods is going to be harder.

We shouldn't reverse asymmetric methods (e.g. Grandsire) but the reversal
is unlikely to be more similar.

This similarity figure looks at all the rows, not just those affected by
calls, but I think that matches the effect outside the tower: a listener
could have written out all the rows of composition A, and would count a
change each time 2 successive rows rung from composition B match 2
successive rows in the list from composition A.

Some results:
Middleton's 5600 Cambridge (A) #c10029 compared to Johnson's variation of
5056 starting with 2H (B) #c14095:

rows A=5600 rows B=5056 rotate 0:>12345678 start=12345678 same rows=2944
58.23% same next row=2786; 94.63% of same rows; 55.10% of all rows
rows A=5600 rows B=5056 rotate 448:>13425678 start=14235678 same rows=5056
100.00% same next row=5055; 99.98% of same rows; 99.98% of all rows

so comparing without rotation the similarity is not great (55.10%), but
starting B after 2 courses (448 rows), or alternatively starting off with
the row 13425678 gives a 99.98% similarity (only 1 change is different,
jumping on with the before).

Middleton's versus Washbrook #c14493
rows A=5600 rows B=5184 rotate 0:>12345678 start=12345678 same rows=5184
100.00% same next row=5181; 99.94% of same rows; 99.94% of all rows

so only 3 changes different (a Q-set? of bobs)

Middleton's versus Washbrook #c14494
rows A=5600 rows B=5184 rotate 224:>14235678 start=13425678 same rows=5184
100.00% same next row=5181; 99.94% of same rows; 99.94% of all rows

very similar (with a start 1 course later)

Heywood versus Washbrook
rows A=5184 rows B=5184 rotate 4959:<14326587 start=14326587 same rows=5184
100.00% same next row=5184; 100.00% of same rows; 100.00% of all rows

so they are identical if Washbrook's is started at row 4959 going
backwards.

Let's try some other compositions:
5056 Yorkshire by Henry Dains #c14958
5056 Yorkshire by Arthur Knight #c14968

rows A=5056 rows B=5056 rotate 3936:>13254678 start=13254678 same rows=4512
89.24% same next row=4494; 99.60% of same rows; 88.88% of all rows

so with 88.88% they are different, but have similarities.

Instead of actually rotating composition B, I renumber it to have rounds
(the first row of A) at some other point in B. That's equivalent for round
blocks. For compositions like Brian Price's 5090 (#c10162) you can't
rotate without affecting the treble's path, but we can still do the compare.

rows A=5600 rows B=5090 rotate 224:>14235678 start=13425678 same rows=5056
99.33% same next row=5050; 99.88% of same rows; 99.21% of all rows

So that means ring Price's comp starting from 13425678, or else have a
rounds start from the end of the first course, and have the single, the
treble dodges 1/2 up, then that is the lead-head again, and the treble
does another dodge as the effect of wrapping around the composition.

It also works for triples. Compare Parker's 12-part (7-obs.) of Grandsire
Triples #c40921 with Bruerton #c22092

rows A=5040 rows B=5040 rotate 28:<1762453 start=1475632 same rows=5040
100.00% same next row=3267; 64.82% of same rows; 64.82% of all rows

This measure does show compositions with blocks inserted in different
places, e.g. 2 courses inserted with 3 bobs or a course between two
singles, as being quite different. Adding or removing a Q-set, which just
reorders parts of a composition, would appear as a small change.

Does this measure seem useful? I'm happy to compare any two compositions
in CompLib.

--
Andrew Johnson

```
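The comparison described in the post can be sketched roughly as follows. This is a minimal illustration, not the program that produced the figures above. It assumes that each composition is given as a list of rows written as strings (e.g. `"12345678"`), that both are treated as round blocks (so the change from the last row back to the first is counted), and that every row of B is tried in turn as the row to renumber to rounds. The names `changes` and `similarity` are hypothetical, and the brute-force search is quadratic in the length of B.

```python
ROUNDS = "1234567890ET"  # bell symbols, enough for up to 12 bells


def changes(rows, step=1):
    """Yield each (row, following row) pair of a round block.

    step=1 reads the block forwards, step=-1 reads it backwards;
    the index wraps around, so a block of n rows yields n pairs.
    """
    n = len(rows)
    for i in range(n):
        yield rows[i], rows[(i + step) % n]


def similarity(a, b):
    """Compare composition b against reference composition a.

    Returns (count, k, backwards, percent), where count is the largest
    number of changes of b that also occur in a, k is the index of the
    row of b that was renumbered to rounds, backwards says whether b
    was read in reverse, and percent is count as a percentage of the
    number of rows (= changes, for a round block) in b.
    """
    a_changes = set(changes(a))
    best = (-1, 0, False)
    for k, ref in enumerate(b):
        # Renumber the bells so that row `ref` becomes rounds. This
        # relabels bells, not positions, so the changes are preserved.
        table = str.maketrans(ref, ROUNDS[:len(ref)])
        renumbered = [row.translate(table) for row in b]
        for step in (1, -1):
            count = sum(pair in a_changes
                        for pair in changes(renumbered, step))
            if count > best[0]:
                best = (count, k, step == -1)
    count, k, backwards = best
    return count, k, backwards, 100.0 * count / len(b)
```

For example, taking plain hunt on three bells as A and the same six rows started from `231` as B, `similarity(["123", "213", "231", "321", "312", "132"], ["231", "321", "312", "132", "123", "213"])` reports all 6 changes matching. Because the pairs are taken cyclically, only the renumbering matters, not where B is actually started, which is the point made in the post about renumbering instead of rotating.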