From andrew_johnson at uk.ibm.com  Thu Feb 28 14:22:05 2019
From: andrew_johnson at uk.ibm.com (Andrew Johnson)
Date: Thu, 28 Feb 2019 14:22:05 +0000
Subject: [r-t] Similar compositions
In-Reply-To: <OF54FB64F2.B26DD62F-ON8025821B.0052BFD1-8025821D.0047E58E@notes.na.collabserv.com>
References: <OF54FB64F2.B26DD62F-ON8025821B.0052BFD1-8025821D.0047E58E@notes.na.collabserv.com>
Message-ID: <OF2F9E8B2F.37975628-ON802583AF.00351B88-802583AF.004EF033@notes.na.collabserv.com>

Last year we considered similar compositions.

John Goldthorpe had the good idea of using 6 hashes, rather than just
hashing all the rows (in order or sorted):

> Anyway, from what I can remember I generate six hashes per composition.
> They are done by creating a text file (albeit in memory) which contains
> each row of the composition, in order.  I then calculate the MD5 checksum
> for this file.  Other options are available.  Another composition with the
> same checksum is obviously the same.  I then sort the rows into order and
> calculate a new checksum to detect the same rows but in a different
> order.  Next I do the same things but only using the lead-ends and
> lead-heads, and finally I do it using only the lead-ends and lead-heads
> which have calls.
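As a rough illustration of the six-hash scheme quoted above, here is a
minimal Python sketch. It assumes each row is a string such as "12345678"
and that the caller has already picked out the lead-end/lead-head rows and
the rows at calls (finding those depends on the method and the calling, so
the function just takes the three lists). The names md5_of_rows and
six_hashes are made up for this example.

```python
import hashlib
from typing import Dict, List, Sequence


def md5_of_rows(rows: Sequence[str]) -> str:
    """MD5 of an in-memory 'text file' with one row per line."""
    text = "".join(row + "\n" for row in rows)
    return hashlib.md5(text.encode("ascii")).hexdigest()


def six_hashes(all_rows: List[str], lead_rows: List[str],
               call_rows: List[str]) -> Dict[str, str]:
    """In-order and sorted MD5 hashes of three views of a composition."""
    hashes: Dict[str, str] = {}
    for name, rows in (("all", all_rows), ("leads", lead_rows),
                       ("calls", call_rows)):
        hashes[name + "_ordered"] = md5_of_rows(rows)          # same rows, same order
        hashes[name + "_sorted"] = md5_of_rows(sorted(rows))   # same rows, any order
    return hashes


# Two toy "compositions" with the same rows in a different order.
a = ["1234", "2143", "2413", "4231"]
b = ["2413", "4231", "1234", "2143"]
ha, hb = six_hashes(a, a, a), six_hashes(b, b, b)
print(ha["all_sorted"] == hb["all_sorted"])    # True
print(ha["all_ordered"] == hb["all_ordered"])  # False
```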

That should be good for spotting identical compositions, even when they
are applied to different methods.

I'd like a way of comparing two compositions to see just how similar they
are. My plan is to count the number of changes that are the same, i.e.
where row n and row n+1 in composition A match row m and row m+1 in
composition B.
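A minimal sketch of this count, assuming rows are strings and each
composition is a round block whose last row leads back to its first, so n
rows give n changes. Putting A's changes in a set makes each lookup cheap.
The names changes and count_same_changes are illustrative only.

```python
from typing import List, Set, Tuple

Change = Tuple[str, str]  # (row, next row)


def changes(rows: List[str]) -> List[Change]:
    """Successive row pairs, including the wrap from the last row to the first."""
    n = len(rows)
    return [(rows[i], rows[(i + 1) % n]) for i in range(n)]


def count_same_changes(a: List[str], b: List[str]) -> int:
    """Number of changes in B that also occur somewhere in A."""
    a_changes: Set[Change] = set(changes(a))
    return sum(1 for change in changes(b) if change in a_changes)


# Plain hunt on four bells as a tiny round block.
plain_hunt = ["1234", "2143", "2413", "4231", "4321", "3412", "3142", "1324"]
print(count_same_changes(plain_hunt, plain_hunt))      # 8
print(count_same_changes(plain_hunt, plain_hunt[:4]))  # 3 (wrap 4231 -> 1234 is not in A)
```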
Taking A as the reference composition, I then rotate or reverse
composition B to maximise the count, as I regard rotations and reversals
as trivial variations which should not affect the 'similarity' figure.
I then express the count as a percentage of the total number of changes
in B. That way, if B is a shortened version of A, it comes out as nearly
100% the same.
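One way to sketch the search for the best rotation or reversal, assuming
round blocks of rows written as strings. Rotation is done here by
renumbering B's bells so that a chosen row becomes rounds (the approach
the post describes further down); because changes are counted cyclically,
where the list starts does not matter. Reversal reads B backwards. This
brute force is quadratic in the length of B, and best_alignment and BELLS
are assumptions for this example, not CompLib code.

```python
from typing import List, Tuple

BELLS = "1234567890ETABCD"  # usual symbols for bells 1-16


def changes(rows: List[str]) -> List[Tuple[str, str]]:
    """Successive row pairs of a round block, including the wrap-around."""
    n = len(rows)
    return [(rows[i], rows[(i + 1) % n]) for i in range(n)]


def best_alignment(a: List[str], b: List[str]) -> Tuple[int, int, str, float]:
    """Try every renumbering of B, forwards and backwards; keep the best.

    Returns (same changes, index in B of the row renumbered to rounds,
    '>' or '<', percentage of B's changes).
    """
    a_changes = set(changes(a))
    n = len(b)
    best = (-1, 0, ">")
    for direction, seq in ((">", b), ("<", b[::-1])):
        for k, pivot in enumerate(seq):
            # Relabel bells so that `pivot` reads as rounds.
            renumber = {bell: BELLS[i] for i, bell in enumerate(pivot)}
            relabelled = ["".join(renumber[c] for c in row) for row in seq]
            same = sum(1 for ch in changes(relabelled) if ch in a_changes)
            if same > best[0]:
                index = k if direction == ">" else n - 1 - k
                best = (same, index, direction)
    same, index, direction = best
    return same, index, direction, 100.0 * same / n


plain_hunt = ["1234", "2143", "2413", "4231", "4321", "3412", "3142", "1324"]
print(best_alignment(plain_hunt, plain_hunt))      # (8, 0, '>', 100.0)
print(best_alignment(plain_hunt, plain_hunt[:4]))  # (3, 0, '>', 75.0)
```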

Compositions A and B need to be in the same method; substituting between 
methods can be considered a different problem, which CompLib makes a good 
attempt at. Substituting with spliced methods is going to be harder.

We shouldn't reverse asymmetric methods (e.g. Grandsire), but for those
the reversal is unlikely to come out more similar anyway.

This similarity figure looks at all the rows, not just those affected by
calls, but I think that matches the effect outside the tower: a listener
could have written out all the rows of composition A, and then counts
each change where 2 successive rows rung from composition B match 2
successive rows in the list from composition A.

Some results:
Middleton's 5600 Cambridge (A) #c10029 compared to Johnson's variation of 
5056 starting with 2H (B) #c14095:

rows A=5600 rows B=5056 rotate 0:>12345678 start=12345678 same rows=2944 58.23% same next row=2786; 94.63% of same rows; 55.10% of all rows
rows A=5600 rows B=5056 rotate 448:>13425678 start=14235678 same rows=5056 100.00% same next row=5055; 99.98% of same rows; 99.98% of all rows

so comparing without rotation the similarity is not great (55.10%), but
starting B after 2 courses (448 rows), or alternatively starting off with
the row 13425678, gives a 99.98% similarity (only 1 change is different:
jumping on with the before).
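The three percentages in each output line look like plain ratios of the
printed counts: same rows over rows in B, same next rows over same rows,
and same next rows over rows in B. A small check, assuming that reading,
reproduces the figures in the first two output lines (the function name
summary is made up):

```python
def summary(rows_b: int, same_rows: int, same_next: int) -> tuple:
    """Percentages as printed: same rows, % of same rows, % of all rows."""
    return (round(100 * same_rows / rows_b, 2),
            round(100 * same_next / same_rows, 2),
            round(100 * same_next / rows_b, 2))


print(summary(5056, 2944, 2786))  # (58.23, 94.63, 55.1)
print(summary(5056, 5056, 5055))  # (100.0, 99.98, 99.98)
```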

Middleton's versus Washbrook #c14493
rows A=5600 rows B=5184 rotate 0:>12345678 start=12345678 same rows=5184 100.00% same next row=5181; 99.94% of same rows; 99.94% of all rows

so only 3 changes are different (perhaps a Q-set of bobs?)

Middleton's versus Washbrook #c14494
rows A=5600 rows B=5184 rotate 224:>14235678 start=13425678 same rows=5184 100.00% same next row=5181; 99.94% of same rows; 99.94% of all rows

also very similar (starting 1 course later)

Heywood versus Washbrook
rows A=5184 rows B=5184 rotate 4959:<14326587 start=14326587 same rows=5184 100.00% same next row=5184; 100.00% of same rows; 100.00% of all rows

so they are identical if Washbrook's is started at row 4959 going 
backwards.

Let's try some other compositions:
5056 Yorkshire by Henry Dains #c14958
5056 Yorkshire by Arthur Knight #c14968

rows A=5056 rows B=5056 rotate 3936:>13254678 start=13254678 same rows=4512 89.24% same next row=4494; 99.60% of same rows; 88.88% of all rows

so with 88.88% they are different, but have similarities.

Instead of actually rotating composition B, I renumber it so that rounds
(the first row of A) falls at some other point in B. That's equivalent
for round blocks. For compositions like Brian Price's 5090 (#c10162) you
can't rotate without affecting the treble's path, but we can still do the
comparison.
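Renumbering can be sketched as a relabelling of bells: map whichever bell
is in place i of the chosen row of B to bell i, and apply that map to
every row. B's original rounds then lands on the inverse of the chosen
row, which appears to be what the start= field reports in each result
above (e.g. rotate 448:>13425678 with start=14235678). The helper names
and the BELLS symbol string are assumptions for this example.

```python
BELLS = "1234567890ETABCD"  # usual symbols for bells 1-16


def renumber(row: str, pivot: str) -> str:
    """Relabel bells so that `pivot` would read as rounds; apply that to `row`."""
    mapping = {bell: BELLS[i] for i, bell in enumerate(pivot)}
    return "".join(mapping[bell] for bell in row)


def start_row(pivot: str) -> str:
    """Where B's original rounds ends up after renumbering (inverse of `pivot`)."""
    rounds = BELLS[:len(pivot)]
    return renumber(rounds, pivot)


print(start_row("13425678"))  # 14235678
print(start_row("1762453"))   # 1475632
```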

rows A=5600 rows B=5090 rotate 224:>14235678 start=13425678 same rows=5056 99.33% same next row=5050; 99.88% of same rows; 99.21% of all rows

So that means ringing Price's composition starting from 13425678, or else
starting rounds from the end of the first course: you then have the
single and the treble's 1-2 dodge up, then that lead-head comes up again,
and the treble does another dodge as an effect of wrapping around the
composition.

It also works for Triples. Compare Parker's 12-part (7-obs.) of Grandsire
Triples #c40921 with Bruerton's #c22092:

rows A=5040 rows B=5040 rotate 28:<1762453 start=1475632 same rows=5040 100.00% same next row=3267; 64.82% of same rows; 64.82% of all rows

This measure does show compositions with blocks inserted in different
places (e.g. 2 courses inserted with 3 bobs, or a course between two
singles) as being quite different. Adding or removing a Q-set, which just
reorders parts of a composition, would appear to be a small change.
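A toy check of the last point, using made-up labels r0 to r11 in place of
real rows: if B consists of the same three blocks as A but linked in a
different order (roughly what adding or removing a Q-set does), only the
changes at the joins are lost. With three joins, 9 of the 12 changes still
match. The function name change_set is illustrative only.

```python
from typing import List, Set, Tuple


def change_set(rows: List[str]) -> Set[Tuple[str, str]]:
    """Successive row pairs of a round block, including the wrap-around."""
    n = len(rows)
    return {(rows[i], rows[(i + 1) % n]) for i in range(n)}


a = [f"r{i}" for i in range(12)]   # stand-ins for 12 distinct rows
x, y, z = a[0:4], a[4:8], a[8:12]  # three blocks of A
b = x + z + y                      # same blocks, linked in another order
same = len(change_set(a) & change_set(b))
print(same, "of", len(b))          # 9 of 12
```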

Does this measure seem useful? I'm happy to compare any two compositions 
in CompLib.

--
Andrew Johnson


Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU