#### Equality, Hashing and Canonicalization

I've just been reading "Type-Safe Modular Hash-Consing" (FilliĆ¢tre & Conchon, 2006, http://www.lri.fr/~filliatr/ftp/publis/hash-consing2.pdf) and it got me thinking about the relationship between finding a canonical element of a set via some equivalence and simply deciding the equivalence itself.

In order to hashcons, according to the paper, you need an equivalence predicate `T` × `T``Bool` and a hash function `T``Int`. They have to line up. This lining up seems interesting. The hash function is naming an equivalence class of `T`s - or at least it would be if it weren't permitted to spuriously identify things sometimes! So it's a rough equivalence class.

But my main idea is that say we were hashconsing append-lists, defined by

T ::= Nil | T ++ T | [T]

We'd want our equivalence predicate and our hash function to honour ```t ++ Nil````t`. But while it's doing so, there's also the opportunity for it to pick a canonical element: in this case it should choose just `t` rather than `t ++ Nil`, since it is structurally smaller.

So perhaps hashconsing should be expressed using a canonicalization function `T``T` and a hash function `T``Int`, rather than referring to an equivalence predicate at all?

In some sense a hashcons operator (per the paper taking `T``Hashconsed T`) is a kind of canonicalizing function, of course, but slightly weird in that it has links to the resource management facility of the underlying VM rather than simply expressing an object-level equivalence.

I wonder what would be a good way of writing a canonicalization function. It feels a lot like a standard reduction, with the notion of reduction being rules like `t ++ Nil``t`, `Nil ++ t``t` etc.

So a hashconsing system is a kind of VM-supported normal-form-finding interpreter. For data, rather than programs. Huh.