<div dir="ltr">Place notation would save space, but memory isn't a constraint here.  Once a checksum has been generated there is no need to keep the composition, and I already have the compositions saved in a fairly compressed format anyway.<div><br></div><div>I think the problem with spotting duplicates is really a problem of deciding what a duplicate is.  It seems to me there is a spectrum which roughly goes: identical -> trivial rearrangement -> slight difference -> different.  Wherever you choose to draw the line, someone will argue that it should be further one way or the other.  That's why I choose to say that two compositions are either identical or different.  If a better argument can be made then I'll happily change my mind.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On 22 January 2018 at 15:49, Alan Burlison <span dir="ltr">&lt;<a href="mailto:alan.burlison@gmail.com" target="_blank">alan.burlison@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">> I have just tried to find my code but it's not where I thought it was.<br>

> Anyway, from what I can remember I generate six hashes per composition.<br>

> They are done by creating a text file (albeit in memory) which contains each<br>

> row of the composition, in order.  I then calculate the MD5 checksum for<br>

> this file.  Other options are available.  Another composition with the same<br>

> checksum is obviously the same.  I then sort the rows into order and<br>

> calculate a new checksum to detect the same rows but in a different order.<br>

> Next I do the same things but only using the lead-ends and lead-heads, and<br>

> finally I do it using only the lead-ends and lead-heads which have calls.<br>

<br>

</span>This may well be a dumb suggestion but couldn't you use place notation<br>

form for the full composition? You still have the problem with<br>

rotations etc but as PN is a condensed version of row notation, it<br>

should be considerably smaller than storing each row in full.<br>

<br>

As for spotting duplicates, perhaps some of these approaches could be<br>

adapted, the problem seems similar.<br>

<br>

<a href="https://en.wikipedia.org/wiki/Similarity_search" rel="noreferrer" target="_blank">https://en.wikipedia.org/wiki/<wbr>Similarity_search</a><br>

<a href="https://en.wikipedia.org/wiki/String_metric" rel="noreferrer" target="_blank">https://en.wikipedia.org/wiki/<wbr>String_metric</a><br>

<a href="https://en.wikipedia.org/wiki/Locality-sensitive_hashing" rel="noreferrer" target="_blank">https://en.wikipedia.org/wiki/<wbr>Locality-sensitive_hashing</a><br>

<span class="HOEnZb"><font color="#888888"><br>

--<br>

Alan Burlison<br>

--<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

______________________________<wbr>_________________<br>

ringing-theory mailing list<br>

<a href="mailto:ringing-theory@bellringers.org">ringing-theory@bellringers.org</a><br>

<a href="http://lists.ringingworld.co.uk/listinfo/ringing-theory" rel="noreferrer" target="_blank">http://lists.ringingworld.co.<wbr>uk/listinfo/ringing-theory</a><br>

</div></div></blockquote></div><br></div>
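<div>For anyone who wants to try the six-checksum scheme quoted above, here is a minimal sketch in Python. It is not the original code (which could not be found), only an illustration under assumptions: the names <code>checksum</code> and <code>composition_hashes</code> are hypothetical, rows are plain strings such as "123456", and the caller is assumed to have already picked out the lead-end/lead-head rows and the subset of those that have calls.</div>
<pre>
```python
import hashlib


def checksum(rows):
    """Return the MD5 hex digest of the rows written one per line."""
    # Mirrors the "text file in memory" idea: one row per line.
    text = "\n".join(rows) + "\n"
    return hashlib.md5(text.encode("ascii")).hexdigest()


def composition_hashes(rows, lead_rows, called_lead_rows):
    """Build six checksums: each list of rows hashed in order and sorted.

    rows             -- every row of the composition, in ringing order
    lead_rows        -- only the lead-ends and lead-heads (assumed input)
    called_lead_rows -- only the lead-ends and lead-heads with calls
                        (assumed input)
    """
    hashes = {}
    for name, subset in (
        ("all", rows),
        ("leads", lead_rows),
        ("called", called_lead_rows),
    ):
        # Same rows in the same order give the same "ordered" checksum.
        hashes[name + "_ordered"] = checksum(subset)
        # Same rows in any order give the same "sorted" checksum.
        hashes[name + "_sorted"] = checksum(sorted(subset))
    return hashes
```
</pre>
<div>Two compositions whose <code>all_ordered</code> values match are identical in the strict sense; a match only on <code>all_sorted</code> means the same rows rung in a different order. Only the six digests need to be kept, so the composition itself can be discarded once they are computed.</div>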