[r-t] Similar compositions

Alan Burlison alan.burlison at gmail.com
Mon Jan 22 15:49:08 UTC 2018


> I have just tried to find my code but it's not where I thought it was.
> Anyway, from what I can remember I generate six hashes per composition.
> They are done by creating a text file (albeit in memory) which contains each
> row of the composition, in order.  I then calculate the MD5 checksum for
> this file.  Other options are available.  Another composition with the same
> checksum is obviously the same.  I then sort the rows in to order and
> calculate a new checksum to detect the same rows but in a different order.
> Next I do the same things but only using the lead-ends and lead-heads, and
> finally I do it using only the lead-ends and lead-heads which have calls.

This may well be a dumb suggestion but couldn't you use place notation
form for the full composition? You still have the problem with
rotations etc but as PN is a condensed version of row notation, it
should be considerably smaller than storing each row in full.

As for spotting duplicates, perhaps some of these approaches could be
adapted, the problem seems similar.

https://en.wikipedia.org/wiki/Similarity_search
https://en.wikipedia.org/wiki/String_metric
https://en.wikipedia.org/wiki/Locality-sensitive_hashing

-- 
Alan Burlison
--



More information about the ringing-theory mailing list