Hi Thomas,

I can see what you're after; I just don't know exactly how to achieve it within the current Poly/ML system. Objects (cells) in Poly/ML don't carry type information. All they carry is the length and a few flags, which is enough for the garbage collector.
From what I understand, you have a situation where the live data is larger than you would expect, and you would like some sort of profile that tells you something about its distribution. There is a facility to profile allocations, so you can see which functions are allocating the most data, but that doesn't distinguish between cells that are allocated and immediately become garbage and those that have a long lifetime.
One possibility that occurs to me would be a compiler flag that causes each cell to be allocated with an extra word, with that word set to the function containing the allocation. At the same time it would set a bit in the cell's flags field to indicate that this was the case. Every function currently has space for a profile count, so there could be an RTS function (possibly integrated with the GC) that first clears all the counts and then scans over the heap, accumulating the counts for each function that allocated a currently live cell. This would provide a profile by allocating function name rather than, as you suggested, by type. That may or may not be useful to you, but since it might be generally useful I could probably get a minimal implementation together fairly quickly. Does anyone else have any ideas?
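To make that concrete, here is a rough C++ sketch of what the scanning pass might look like. It is only a sketch: the cell layout, the flag bit and the way the live cells are enumerated are invented for the example and are not the real RTS structures.

// Illustrative sketch of the proposed live-data profile pass.  The cell
// layout, the flag bit and the heap enumeration are assumptions made for
// this example, not Poly/ML's actual internals.
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>

// Assumed cell layout: one length word holding the size in words plus a
// few flag bits in the top byte, followed by the payload.  Under the
// proposed compiler flag, the first payload word would hold a pointer to
// the allocating function.
struct Cell {
    uintptr_t lengthWord;
};

constexpr unsigned WORD_BITS = sizeof(uintptr_t) * 8;
constexpr uintptr_t F_ALLOC_PROFILED = uintptr_t(1) << (WORD_BITS - 2); // illustrative flag bit
constexpr uintptr_t LENGTH_MASK = (uintptr_t(1) << (WORD_BITS - 8)) - 1;

struct FunctionInfo {
    std::string name;
    std::size_t liveWords = 0;  // stands in for the existing per-function profile count
};

// Phase 1: clear every function's count.  Phase 2: scan the live cells,
// crediting each profiled cell's size to the function that allocated it.
void profileLiveData(Cell* const* liveCells, std::size_t nCells,
                     std::unordered_map<void*, FunctionInfo>& funcs)
{
    for (auto& f : funcs) f.second.liveWords = 0;

    for (std::size_t i = 0; i < nCells; ++i) {
        Cell* c = liveCells[i];
        if (c->lengthWord & F_ALLOC_PROFILED) {
            // The extra word written at allocation time names the allocator.
            void* allocator = *reinterpret_cast<void* const*>(c + 1);
            funcs[allocator].liveWords += c->lengthWord & LENGTH_MASK;
        }
    }
}

int main() {
    // Fake a single two-word cell tagged as allocated by fakeFunction.
    static int fakeFunction;                     // stands in for compiled code
    uintptr_t storage[3] = { F_ALLOC_PROFILED | 2, 0, 0 };
    *reinterpret_cast<void**>(&storage[1]) = &fakeFunction;

    Cell* live[] = { reinterpret_cast<Cell*>(storage) };
    std::unordered_map<void*, FunctionInfo> funcs;
    funcs[&fakeFunction].name = "fakeFunction";

    profileLiveData(live, 1, funcs);
    for (const auto& f : funcs)
        std::printf("%s: %zu live words\n", f.second.name.c_str(), f.second.liveWords);
    return 0;
}

The real version would of course walk the heap segments directly and credit the function's existing profile-count slot rather than a hash table, but the two-phase clear-then-scan structure would be the same.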
Regards, David
On 23/09/2011 05:58, Thomas Sewell wrote:
Hello to the PolyML users mailing list.
I'm an Isabelle/HOL user from the L4.verified project at NICTA, and we're having some issues with Isabelle using far too much memory. I have a feature request that would be useful in diagnosing the problem.
The current implementation of PolyML provides an object profiling mechanism, PolyML.objProfile, which analyses the memory consumed by a given object, decomposed by object size. As a minor issue, there doesn't seem to be a good way to profile the whole heap, since threading may make some objects inaccessible from the current root function.
The big issue is that the profile doesn't tell me anything useful at all.
ML> PolyML.objProfile (Thy_Info.get_theory "Nat");
Immutable object sizes and counts
1 229872
2 2557083
3 756596
4 1283875
5 811230
6 24521
7 102914
8 9200
9 9452
10 20660
...
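(For scale: assuming a 32-bit word and ignoring headers, the size-2 bucket alone is 2557083 * 2 words, roughly 20 MB, and the size-4 and size-5 buckets are each of a similar order.)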
The development is too big for me to have a chance of identifying which objects of size 2, 4 and 5 are consuming all the memory. By comparison, an object profile could be really valuable if it were decomposed by type (or possibly by type and/or closure). This would hopefully make it straightforward to hunt down the worst-offending modules.
On a related note, Don Stewart recently gave a talk about his experience working with Haskell at Galois, and pointed out that profiling by type in their runtime effectively eliminated their problems with the famous Haskell space-leak issue.
It seems to me this would be roughly compatible with the existing objProfile algorithm. I had a look at the code and got as far as realising I had no hope of understanding how PolyML encodes types internally. Does this seem like a reasonable feature request?
Yours, Thomas.