From: andrew_johnson at uk.ibm.com (Andrew Johnson)
Date: Thu, 28 Feb 2019 14:22:05 +0000
Subject: [r-t] Similar compositions

Last year we considered similar compositions. John Goldthorpe had the good idea of using 6 hashes, rather than just all the rows (in order or sorted):

>Anyway, from what I can remember I generate six hashes per composition. They are done by creating a text file (albeit in memory) which contains each row of the composition, in order. I then calculate the MD5 checksum for this file. Other options are available. Another composition with the same checksum is obviously the same. I then sort the rows in to order and calculate a new checksum to detect the same rows but in a different order. Next I do the same things but only using the lead-ends and lead-heads, and finally I do it using only the lead-ends and lead-heads which have calls.

That should be good for spotting identical compositions, possibly to different methods. I'd like a way of comparing two compositions to see just how similar they are.

My plan is to count the changes that are the same, i.e. the cases where row n and row n+1 in composition A match row m and row m+1 in composition B. Taking A as the reference composition, I then rotate or reverse composition B to maximise the count, as I assume rotations and reversals are trivial variations which should not affect the 'similarity' figure. I then express the count as a percentage of the total changes in B. That way, if B is a shortened version of A, it can come out as nearly 100% the same.

Compositions A and B need to be in the same method; substituting between methods can be considered a different problem, which CompLib makes a good attempt at. Substituting with spliced methods is going to be harder. We shouldn't reverse asymmetric methods (e.g. Grandsire), but the reversal is unlikely to be more similar anyway.
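As a minimal sketch of the counting step only (not the program used for the results), here is how it might look in Python. The function names and the representation of rows as strings are assumptions for illustration, and both compositions are assumed to be round blocks, so the last row is followed by the first.

```python
def changes(rows):
    """Return the set of (row, next row) pairs, wrapping round at the end."""
    n = len(rows)
    return {(rows[i], rows[(i + 1) % n]) for i in range(n)}


def similarity(a, b):
    """Count the changes of b that also occur in a.

    Returns (count, count as a percentage of the changes in b).
    """
    in_a = changes(a)
    n = len(b)
    shared = sum((b[i], b[(i + 1) % n]) in in_a for i in range(n))
    return shared, 100.0 * shared / n


# Plain hunt on 3 bells as a tiny round block, and the same rows rung backwards.
a = ["123", "213", "231", "321", "312", "132"]
b = ["123", "132", "312", "321", "231", "213"]

print(similarity(a, a))        # (6, 100.0): every change matches
print(similarity(a, b))        # (0, 0.0): same rows, no change in common
print(similarity(a, b[::-1]))  # (6, 100.0): reversing b recovers every change
```

The toy example shows why the reversal has to be tried: b contains exactly the rows of a, yet scores zero until it is reversed. For a block that does not come round, the wrap-around pair would need to be dropped.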
This similarity figure looks at all the rows, not just those affected by calls, but I think that matches the effect outside the tower: a listener could have written out all the rows of composition A, and would count a change each time 2 successive rows rung from composition B match 2 successive rows in the list from composition A.

Some results:

Middleton's 5600 Cambridge (A) #c10029 compared to Johnson's variation of 5056 starting with 2H (B) #c14095:

rows A=5600 rows B=5056 rotate 0:>12345678 start=12345678
same rows=2944 58.23%
same next row=2786; 94.63% of same rows; 55.10% of all rows

rows A=5600 rows B=5056 rotate 448:>13425678 start=14235678
same rows=5056 100.00%
same next row=5055; 99.98% of same rows; 99.98% of all rows

So comparing without rotation the similarity is not great (55.10%), but starting B after 2 courses (448 rows), or alternatively starting off with the row 13425678, gives a 99.98% similarity (only 1 change is different, jumping on with the before).

Middleton's versus Washbrook #c14493:

rows A=5600 rows B=5184 rotate 0:>12345678 start=12345678
same rows=5184 100.00%
same next row=5181; 99.94% of same rows; 99.94% of all rows

So only 3 changes are different (a Q-set? of bobs).

Middleton's versus Washbrook #c14494:

rows A=5600 rows B=5184 rotate 224:>14235678 start=13425678
same rows=5184 100.00%
same next row=5181; 99.94% of same rows; 99.94% of all rows

Very similar (with a start 1 course later).

Heywood versus Washbrook:

rows A=5184 rows B=5184 rotate 4959:<14326587 start=14326587
same rows=5184 100.00%
same next row=5184; 100.00% of same rows; 100.00% of all rows

So they are identical if Washbrook's is started at row 4959 going backwards.

Let's try some other compositions: 5056 Yorkshire by Henry Dains #c14958 and 5056 Yorkshire by Arthur Knight #c14968:

rows A=5056 rows B=5056 rotate 3936:>13254678 start=13254678
same rows=4512 89.24%
same next row=4494; 99.60% of same rows; 88.88% of all rows

So at 88.88% they are different, but have similarities.
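The "rotate" and "start" fields in the output above are consistent with relabelling the bells of B so that the chosen row becomes rounds, in which case B's original first row (rounds) turns into the reported start row. This is an inference from the figures, not a description of the program that produced them. A small Python sketch of that relabelling follows. The name renumber is hypothetical, and rows are assumed to be strings on at most 9 bells.

```python
def renumber(rows, pivot):
    """Relabel the bells in every row so that `pivot` becomes rounds.

    Assumes single-character bell symbols and at most 9 bells, so that
    each place number is also a single character.
    """
    relabel = {bell: str(place + 1) for place, bell in enumerate(pivot)}
    return ["".join(relabel[bell] for bell in row) for row in rows]


# Two of the (rotate row, start row) pairs quoted in the results above.
print(renumber(["12345678", "13425678"], "13425678"))
# ['14235678', '12345678']
print(renumber(["12345678", "14235678"], "14235678"))
# ['13425678', '12345678']
```

In both cases the pivot row maps to rounds and the original rounds maps to the quoted start row. A full search would apply this relabelling for each candidate row of B, in both directions, and keep the alignment with the highest count.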
Instead of actually rotating composition B I renumber it to have rounds (the first row of A) at some other point in B. That's equivalent for round blocks. For compositions like Brian Price's 5090 (#c10162) you can't rotate without affecting the treble's path. We can still do the comparison:

rows A=5600 rows B=5090 rotate 224:>14235678 start=13425678
same rows=5056 99.33%
same next row=5050; 99.88% of same rows; 99.21% of all rows

So that means ringing Price's composition starting from 13425678, or else having a rounds start from the end of the first course: there is the single, the treble dodges 1/2 up, then that is the lead-head again, and the treble does another dodge as the effect of wrapping around the composition.

It also works for triples. Compare Parker's 12-part (7-obs.) of Grandsire Triples #c40921 with Bruerton #c22092:

rows A=5040 rows B=5040 rotate 28:<1762453 start=1475632
same rows=5040 100.00%
same next row=3267; 64.82% of same rows; 64.82% of all rows

This measure does show compositions with blocks inserted in different places, e.g. 2 courses inserted with 3 bobs or a course inserted between two singles, as being quite different. Adding or removing a Q-set, which just reorders parts of a composition, would appear to be a small change.

Does this measure seem useful? I'm happy to compare any two compositions in CompLib.

--
Andrew Johnson