[r-t] Similar compositions
John Goldthorpe
john.m.goldthorpe at gmail.com
Mon Jan 22 16:22:18 UTC 2018
Place notation would save space, but lack of memory isn't a problem here.
Once a checksum has been generated there is no need to keep the
composition. I already have them saved in a fairly compressed format
anyway.
I think the problem with spotting duplicates is really a problem of
deciding what a duplicate is. It seems to me there is a spectrum which
roughly goes: identical -> trivial rearrangement -> slight difference ->
different. Where ever you choose to put the line, someone will argue that
it should be more one way or the other. That's why I choose to say that
two compositions are either identical or different. If a better argument
can be made then I'll happily change my mind.
On 22 January 2018 at 15:49, Alan Burlison <alan.burlison at gmail.com> wrote:
> > I have just tried to find my code but it's not where I thought it was.
> > Anyway, from what I can remember I generate six hashes per composition.
> > They are done by creating a text file (albeit in memory) which contains
> each
> > row of the composition, in order. I then calculate the MD5 checksum for
> > this file. Other options are available. Another composition with the
> same
> > checksum is obviously the same. I then sort the rows in to order and
> > calculate a new checksum to detect the same rows but in a different
> order.
> > Next I do the same things but only using the lead-ends and lead-heads,
> and
> > finally I do it using only the lead-ends and lead-heads which have calls.
>
> This may well be a dumb suggestion but couldn't you use place notation
> form for the full composition? You still have the problem with
> rotations etc but as PN is a condensed version of row notation, it
> should be considerably smaller than storing each row in full.
>
> As for spotting duplicates, perhaps some of these approaches could be
> adapted, the problem seems similar.
>
> https://en.wikipedia.org/wiki/Similarity_search
> https://en.wikipedia.org/wiki/String_metric
> https://en.wikipedia.org/wiki/Locality-sensitive_hashing
>
> --
> Alan Burlison
> --
>
> _______________________________________________
> ringing-theory mailing list
> ringing-theory at bellringers.org
> http://lists.ringingworld.co.uk/listinfo/ringing-theory
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ringingworld.co.uk/pipermail/ringing-theory/attachments/20180122/078da2bf/attachment.html>
More information about the ringing-theory
mailing list