Hi David.
Thanks for thinking about this one. I hadn't realised that type information was erased in PolyML's store. That makes the problem significantly more complicated than I had thought.
The approach you suggested sounds sensible. It could also scale to more general classifications than functions in the future. Functions just happen to be objects with profile space associated with them. You could also allocate profiling nodes for modules, types, etc. if that turned out to be useful. It occurs to me that this is probably how the Haskell statistics module works (I dimly recall choices about different ways to allocate profile nodes).
I only recommended types since I assumed that information was already available. I think that knowledge of the allocating function would be just as useful.
The situation is indeed that the final dataset is larger than expected. Isabelle does a lot of internal work checking proofs, but should then be able to discard the internal states if the proof is valid. There still seems to be a lot of data left lying around, and it would be nice to know what it is or where it came from. There are a lot of add-on packages which may be contributing, and no single author has an overall view of what they all do.
If you have time to implement something like this, it would certainly be appreciated. I have no idea how difficult it would be. At the moment we're toying with an alternative strategy: using the getLocalStats feature to profile the memory consumption of various operations against a series of development snapshots of Isabelle, to get an idea of which changesets are relevant. It's possible this might get us enough information.
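For what it's worth, here is a minimal sketch of that measurement loop. It assumes the record returned by PolyML.Statistics.getLocalStats includes a sizeHeap field; the exact field names would need to be checked against the PolyML.Statistics signature in each snapshot we test.

  (* Force a full collection so that mostly live data remains, then read
     the heap size from the statistics interface. *)
  fun liveHeapSize () =
    (PolyML.fullGC ();
     #sizeHeap (PolyML.Statistics.getLocalStats ()));

  (* Run an operation and report roughly how much extra live data it
     leaves behind.  Only a rough indication, since the heap figure also
     includes free space inside the allocated heap segments. *)
  fun measure name operation =
    let
      val heapBefore = liveHeapSize ()
      val result = operation ()
      val heapAfter = liveHeapSize ()
    in
      print (name ^ ": " ^ Int.toString (heapAfter - heapBefore) ^ "\n");
      result
    end;

Running the same operations under measure on successive snapshots should at least show which changesets make the retained data grow.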
Yours, Thomas.
________________________________________
From: polyml-bounces@inf.ed.ac.uk [polyml-bounces@inf.ed.ac.uk] On Behalf Of David Matthews [David.Matthews@prolingua.co.uk]
Sent: Saturday, September 24, 2011 12:14 AM
To: polyml@inf.ed.ac.uk
Subject: Re: [polyml] Memory consumption profiling in PolyML
Hi Thomas,
I can see what you're after; I just don't know exactly how to achieve it within the current Poly/ML system. Objects (cells) in Poly/ML don't carry type information. All they carry is the length and a few flags; enough for the garbage collector.
From what I understand, you have a situation where the size of the live data is larger than you would expect, and you would like to get some sort of profile that tells you something about the distribution of that live data. There is a facility to profile allocations so you can see which functions are allocating the most data, but that doesn't distinguish between cells that are allocated and then immediately become garbage and those which have a long lifetime.
One possibility that occurs to me would be a compiler flag that causes each cell to be allocated with an extra word, and have that word set to the function that contained the allocation. At the same time it would set a bit in the flags field to indicate that this was the case. Every function currently has space for a profile count. There could be an RTS function (possibly integrated with the GC) to first clear all the counts and then scan over the heap, accumulating the counts for each function that allocated currently live cells. This would provide a profile by allocating function name rather than, as you suggested, by type. That may or may not be useful. I could probably get a minimal implementation of this together fairly quickly since it might be generally useful. Does anyone else have any ideas?
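To make the accumulation step concrete, here is a toy model in ML of what such a heap scan would compute. The real pass would live in the RTS and read the allocating function from the extra word in each live cell; the cells and names below are invented purely for illustration.

  (* Model each live cell by the name of the function that allocated it
     plus its size in words. *)
  type cell = {allocator: string, sizeWords: int}

  (* Accumulate the total live words attributed to each allocating
     function - the figure the proposed RTS pass would report. *)
  fun accumulate (cells: cell list) =
    List.foldl
      (fn ({allocator, sizeWords}, totals) =>
          case List.partition (fn (name, _) => name = allocator) totals of
            ([(_, n)], rest) => (allocator, n + sizeWords) :: rest
          | (_, rest) => (allocator, sizeWords) :: rest)
      [] cells;

  (* Example with two made-up allocation sites. *)
  val profile = accumulate
    [{allocator = "Thm.assume", sizeWords = 5},
     {allocator = "Thm.assume", sizeWords = 2},
     {allocator = "Term.fold_atyps", sizeWords = 4}];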
Regards, David
On 23/09/2011 05:58, Thomas Sewell wrote:
Hello to the PolyML users mailing list.
I'm an Isabelle/HOL user from the L4.verified project at NICTA, and we're having some issues with Isabelle using far too much memory. I have a feature request that would be useful in diagnosing the problem.
The current implementation of PolyML provides an object profiling mechanism, PolyML.objProfile, which analyses the memory consumed by a given object, decomposed by size. As a trivial issue, there doesn't seem to be a good way to profile the whole heap, since threading may make some objects inaccessible from the current root function.
The big issue is that the profile doesn't tell me anything useful at all.
ML> PolyML.objProfile (Thy_Info.get_theory "Nat");
Immutable object sizes and counts
  1   229872
  2  2557083
  3   756596
  4  1283875
  5   811230
  6    24521
  7   102914
  8     9200
  9     9452
 10    20660
...
The development is too big for me to have a chance of identifying which objects of size 2, 4 and 5 are consuming all the memory. By comparison, an object profile could be really valuable if it were decomposed by type (or possibly by type and/or closure). This would hopefully make it straightforward to hunt down the worst-offending modules.
On a related note, Don Stewart recently gave a talk about his experience working with Haskell at Galois, and pointed out that profiling by type in their runtime effectively eliminated their problems with the famous Haskell space-leak issue.
It seems to me this would be roughly compatible with the existing objProfile algorithm. I had a look at the code and got as far as realising I had no hope of understanding how PolyML encodes types internally. Does this seem like a reasonable feature request?
Yours, Thomas.