[r-t] Similar compositions
Graham John
graham at changeringing.co.uk
Mon Jan 22 23:58:23 UTC 2018
On 22 January 2018 at 13:05, Andrew Johnson <andrew_johnson at uk.ibm.com> wrote:
> I would like to automate some of this, preferably in a way that doesn't
> involve extensive computation comparing each composition against every
> other as that could take a while.
I do plan to introduce into Complib checks that will help identify
duplicates including their reversals and rotations, also trivial
variations. This will be done as suggested by maintaining a set of
hashes for each composition in the database that can be compared when
new compositions are entered, or to compare compositions already
present.
One suggestion is to hash the generated rows (or the place notation
for them). However, in practice this is not that helpful. Firstly, if
the order of the rows is not taken into account, all compositions for
extents on a given number of bells would generate the same hash,
because they contain exactly the same rows. Secondly, hashes for the
same composition applied to a different method would not match, nor
would rotations or reversals using the same method.
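The first objection can be illustrated with a minimal Python sketch.
The function name unordered_row_hash is hypothetical, and the two
orderings below are simply lexicographic order and its reverse, not
ringable compositions; the point is only that an order-insensitive
hash cannot tell apart two different orderings of the same rows.

```python
import hashlib
from itertools import permutations


def unordered_row_hash(rows):
    """Hash a collection of rows, ignoring the order they occur in."""
    joined = "\n".join(sorted(rows))
    return hashlib.sha256(joined.encode("ascii")).hexdigest()


# Two different orderings of all 24 rows on four bells.
extent_a = ["".join(p) for p in permutations("1234")]
extent_b = list(reversed(extent_a))

# Different sequences of rows, yet the same order-insensitive hash.
assert extent_a != extent_b
assert unordered_row_hash(extent_a) == unordered_row_hash(extent_b)
```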
Complib already does one check for duplicates, and that is for
identical calling strings. Although this is a simple test that will
not pick up cases where the same composition is expressed using
different call symbols, it has already identified a significant number
of duplicates, particularly when importing the compositions from
ringing.org - even a spliced composition claimed by two composers in
one instance. Of around 15,000 compositions imported, over 1,250
duplicates were found and merged (crediting all composers).
The next step to improve the identity check is to ensure that
variations in entering the calling are eliminated. One way to do this
is to hash the place notation together with the ordered sequence of
calls and calling positions, then add further hashes of the same data
normalised to a consistent rotation and to a consistent reversal. From
these three hashes, Complib will be able to identify exact matches,
rotations and reversals of single-method compositions.
For Spliced, the Method Calling needs to be normalised in a similar
way to a pattern of method changes rather than specific method
mnemonics. Then simple substitutions could be matched.
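One possible way to do this, sketched in Python under the assumption
that the method calling is available as a sequence of mnemonics, is
to replace each mnemonic by the index of its first appearance. The
function name method_pattern and the mnemonics used are illustrative
only.

```python
def method_pattern(methods):
    """Replace each method mnemonic by the index of its first appearance."""
    seen = {}
    return tuple(seen.setdefault(m, len(seen)) for m in methods)


# The same pattern of method changes using different methods.
assert method_pattern("CYNC") == (0, 1, 2, 0)
assert method_pattern("BSLB") == (0, 1, 2, 0)

# A genuinely different pattern of method changes.
assert method_pattern("CYNY") == (0, 1, 2, 1)
```

The resulting tuple can be hashed in place of the mnemonics, so that
a composition with one method substituted for another produces the
same hash.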
Trivial variation checking is rather trickier. It could need
significantly more hashes to be stored using a range of additional
factors, which can then be compared using a scoring system, in a
similar way to spam checkers. This will require some experimentation
to refine, but is nevertheless feasible.
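As an illustration of what such a score might look like, the sketch
below hashes every run of k consecutive calls and scores two callings
by the proportion of hashes they share (Jaccard similarity). This is
one assumed choice of "additional factor" and scoring rule, not the
scheme planned for Complib; the names shingle_hashes and similarity
are hypothetical.

```python
import hashlib


def shingle_hashes(tokens, k=3):
    """Return the set of hashes of every run of k consecutive tokens."""
    tokens = list(tokens)
    if len(tokens) < k:
        grams = [tuple(tokens)]
    else:
        grams = [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]
    return {
        hashlib.sha256("|".join(g).encode("utf-8")).hexdigest()[:16]
        for g in grams
    }


def similarity(a, b, k=3):
    """Score from 0.0 (nothing shared) to 1.0 (identical hash sets)."""
    ha, hb = shingle_hashes(a, k), shingle_hashes(b, k)
    return len(ha & hb) / len(ha | hb)


base = list("pp-pps-pp-pp")
variant = list("pp-pps-pp-ps")      # differs only in the final call
unrelated = list("ssssssssssss")

assert similarity(base, base) == 1.0
assert 0.0 < similarity(base, variant) < 1.0
assert similarity(base, unrelated) == 0.0
```

A threshold on the score, found by experiment, would then decide when
a new entry is flagged as a possible trivial variation.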
My intention is that eventually Complib should hold every published
composition and become the de facto source for composers and other
contributors to check whether a composition is new or not, being
alerted to duplication, reversal, rotation or trivial variation upon
entering a composition. At that point no one would have an excuse for
not knowing that their contribution was not original, and anyone
deciding that a trivial variation is worthy of addition could at
least acknowledge the earlier composition and its composer.
Graham