[r-t] Similar compositions

Tue Jan 23 11:56:18 UTC 2018

On 22/01/18 23:58, Graham John wrote:

> One suggestion is to hash the generated rows (or the place notation
> for them). However in practice this is not that helpful. Firstly if
> the order of the rows is not taken into account, compositions for
> extents would generate the same hash.

As I understand it, the order of the rows was taken into account. From
John's description, he generates 6 hashes, which are used for
different purposes.
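
To illustrate the difference, here is a rough Python sketch. It is not
John's scheme (his six hashes are not described in this thread), and the
function names ordered_hash and unordered_hash are made up for the
example. It assumes each row is a plain string such as "123".

```python
import hashlib


def ordered_hash(rows):
    """Hash a sequence of rows so that their order affects the result."""
    h = hashlib.sha256()
    for row in rows:
        h.update(row.encode("ascii"))
        h.update(b"\n")  # separator so row boundaries are unambiguous
    return h.hexdigest()


def unordered_hash(rows):
    """Hash the same rows with order ignored, by sorting them first."""
    return ordered_hash(sorted(rows))


# Two different orderings of the extent on three bells.
forwards = ["123", "213", "231", "321", "312", "132"]
backwards = ["123", "132", "312", "321", "231", "213"]

same_rows = unordered_hash(forwards) == unordered_hash(backwards)
same_order = ordered_hash(forwards) == ordered_hash(backwards)
```

Here same_rows comes out true and same_order false: the order-insensitive
hash cannot tell two extents apart, which is Graham's objection, while the
order-sensitive one can.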

> Secondly, hashes for the same composition applied to a different
> method would not match, nor would rotations or reversals using the
> same method.

Would that not count as a different composition anyway?

Rotations and reversals seem tricky. Are there any standard ways of
'normalising' compositions?
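
I don't know of a ringing-specific convention, but the generic trick for
cyclic strings is to pick the lexicographically smallest rotation as the
canonical form. The sketch below assumes a composition can be written as
a string of call symbols, one per lead (P, B, S here are just
placeholders), and that a "reversal" is that string read backwards. Both
are simplifying assumptions, and the name canonical is made up.

```python
def rotations(seq):
    """Return every rotation of seq (a single entry for an empty seq)."""
    return [seq[i:] + seq[:i] for i in range(max(len(seq), 1))]


def canonical(calls, include_reversal=True):
    """Pick one representative for all rotations (and reversals) of calls.

    The representative is the lexicographically smallest candidate, so
    any two strings that differ only by rotation (or reversal, if
    enabled) map to the same value.
    """
    candidates = rotations(calls)
    if include_reversal:
        candidates += rotations(calls[::-1])
    return min(candidates)


# "PPBSP" is a rotation of "PBSPP"; "PPSBP" is "PBSPP" read backwards.
rotation_matches = canonical("PBSPP") == canonical("PPBSP")
reversal_matches = canonical("PBSPP") == canonical("PPSBP")
```

Hashing the canonical form rather than the raw string would then make
rotations and reversals collide deliberately. Whether reading the calls
backwards really corresponds to a reversed composition depends on the
method, so treat that part with caution.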

> Trivial variation checking is rather trickier. It could need
> significantly more hashes to be stored using a range of additional
> factors, which can then be compared using a scoring system, in a
> similar way to spam checkers. This will require some experimentation
> to refine, but is nevertheless feasible.

That's why I pointed to the various string similarity algorithms. I 
think the hard part will be getting good distance metrics.
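
As a concrete example of the kind of algorithm I mean, here is plain
Levenshtein edit distance in Python, applied to strings of call symbols.
The call strings and the similarity scaling are assumptions for
illustration only; whether edit distance is a good metric for
compositions is exactly the open question.

```python
def levenshtein(a, b):
    """Minimum number of single-symbol insertions, deletions or
    substitutions needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Scale the distance to a score from 0.0 (unrelated) to 1.0 (equal)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Two hypothetical call strings differing in a single call.
distance = levenshtein("PBPPB", "PBPSB")
score = similarity("PBPPB", "PBPSB")
```

A score like this could feed the sort of threshold-based scoring Graham
describes, but the threshold itself would need experimentation.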

-- 
Alan Burlison
--