Equality, Hashing and Canonicalization
Tue 22 Nov 2011 13:15 EST
I’ve just been reading “Type-Safe Modular Hash-Consing” (Filliâtre & Conchon, 2006, http://www.lri.fr/~filliatr/ftp/publis/hash-consing2.pdf) and it got me thinking about the relationship between finding a canonical element of a set via some equivalence and simply deciding the equivalence itself.
In order to hashcons,
according to the paper, you need an equivalence predicate T ×
T → Bool and a hash function T → Int. They have to
line up. This lining up seems interesting. The hash function is naming
an equivalence class of Ts - or at least it would be if it weren’t
permitted to spuriously identify things sometimes! So it’s a rough
equivalence class.
But my main idea is that say we were hashconsing append-lists, defined by
| T ::= Nil | T ++ T | [T] |
We’d want our equivalence predicate and our hash function to honour t
++ Nil ≡ t. But while it’s doing so, there’s also the
opportunity for it to pick a canonical element: in this case it should
choose just t rather than t ++ Nil, since it is structurally
smaller.
So perhaps hashconsing should be expressed using a canonicalization
function T → T and a hash function T → Int, rather than
referring to an equivalence predicate at all?
In some sense a hashcons operator (per the paper taking T →
Hashconsed T) is a kind of canonicalizing function, of course, but
slightly weird in that it has links to the resource management
facility of the underlying VM rather than simply expressing an
object-level equivalence.
I wonder what would be a good way of writing a canonicalization
function. It feels a lot like a standard reduction, with the notion of
reduction being rules like t ++ Nil → t, Nil ++ t →
t etc.
So a hashconsing system is a kind of VM-supported normal-form-finding interpreter. For data, rather than programs. Huh.
