Hi David,
Thanks very much for the explanation.
My motivation for hash-consing is as follows. I want to add an id (an integer) that is (at least) unique for every distinct value of certain types in the HOL4 kernel. (The motivation for these ids is on-the-fly proof-recording by emitting a trace of kernel operations.) Hash-consing would allow me to re-use ids for new constructions of values that are equal to existing values.
From your explanation, I understand that Poly/ML's garbage collector is
already effectively doing hash-consing via its "sharing" phase. However, if I add ids to my values, they will no longer be equal from the garbage collector's perspective and I will lose the benefits of sharing. (An alternative perspective: if I add ids, I will not know when to re-use the same id because the garbage collector doesn't expose information about whether a structurally equal (modulo the id) value is reachable already.)
So this leaves me with two avenues to pursue:
1. Would it be easy to add an alternative interface to the garbage collector wherein weak pointers can point to immutable data? (So I could use that with my own hash-consing, given that I don't care about being precise about discovering whether some object is reachable or not.) 2. Or, could Poly/ML expose some primitives by which I could add unique ids to certain objects (or simply read existing ids if you already have them) while benefiting from the garbage collector's own hash-consing/sharing implementation?
Thanks, Ramana
On 7 October 2015 at 02:31, David Matthews <David.Matthews at prolingua.co.uk> wrote:
Hi Ramana, The reason for only allowing a weak reference to point to a reference has to do with the notion of equality in ML and the way the Poly/ML system and the garbage collector in particular deal with that. A reference has an identity and so it is possible to say whether a particular reference is pointed at from anywhere else. Equality for references is defined as pointer equality and Poly/ML respects that.
Equality for immutable data is by value and in the absence of PolyML.pointerEq it is not possible to distinguish two pointers to the same immutable data structure from two pointers to different structures with the same contents. The garbage collector exploits this both by allowing some duplication and also by compacting equal data structures when memory is short. All this means that if a weak pointer could point to an immutable data structure it would not be possible to give any assurance as to when the weak pointer would be modified by the garbage collector. That may not matter if the object is to implement some sort of speed-up but it wouldn't be acceptable if the idea was to discover whether some other entity was reachable or not.
I have to ask what your motivation behind implementing hash-consing is. The garbage collector already implements a "sharing" phase if memory is really short and this will effectively hash-cons all the shareable data.
Regards, David
On 04/10/2015 09:27, Ramana Kumar wrote:
Hello,
I am interested in using Poly/ML's Weak structure ( http://polyml.org/documentation/Reference/Weak.html) for hash-consing. The documentation would perhaps suggest that this is not an intended use of Weak, so I'd like to ask the list whether it is suitable and, if not, whether there are alternatives.
An abstract of the problem is as follows. For a certain datatype, t, we want there to be at most one copy of any value of type t created by applying a constructor to unique arguments. To achieve this, we implement "hash-consing" constructors for t, which check whether the desired value exists already and return it if so, and only otherwise create a new value. To be able to check which values exist already, we maintain a hash table of all existing values. To allow values to be garbage collected when they are no longer needed, we make the hash table store only weak references to the values.
The Weak structure allows weak references to be made to other references, but not to other non-reference values. Thus, the values in the hash table can be of type (t ref option ref), as produced by Weak.weak, but not of type (t option ref). Thus, on the face of it, it seems like using Poly/ML's Weak implementation will require unnecessary references and indirections. Is that indeed the case, or is Poly/ML smart about avoiding unnecessary indirections somehow?
For hash-consing, we are not using Weak in the way described in its documentation, wherein the contents of the reference don't matter and the reference itself is only used as a token to track an external resource. Is there, perhaps, an alternative interface to Poly/ML's garbage collector more suitable for hash-consing?
Cheers, Ramana
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml