New subject: [polyml] Storage management update

5 Jul 2012


      The version of Poly/ML in SVN is the result of over a year's work on the 
storage management.  I'm currently testing the final phase of that and 
I'd be interested in feedback.  The garbage collector has been almost 
completely rewritten and is now parallelised.  The minor GC is 
completely different and uses a copying collector rather than the 
mark-sweep scheme used for major collections.  The final phase is the 
heap sizing strategy.
From the user's point of view the main differences are the options that 
control the heap management.  The old --heap, --mutable and --immutable 
arguments have been removed.  In their place are a new set of arguments: 
--minheap, --maxheap, --gcpercent and --gcthreads.  I have retained the 
-H argument as a synonym for --minheap.
The storage management adjusts the heap size within the range --minheap 
(default 1/8th of the physical memory) to --maxheap (default 100% of the 
physical memory) to keep the garbage collection overhead close to 
--gcpercent (default 10%).  If it detects paging it may reduce the heap 
size to try to avoid paging since the effect of paging is to slow the 
whole execution down quite substantially.  In that case the GC overhead 
will rise above the target.  If necessary it may run an extra "sharing" 
pass in the GC to reduce the heap size.  This is a sort of cut-down 
version of PolyML.shareCommonData.  It adds a considerable overhead to 
the GC so is only run when absolutely necessary but reducing the live 
data size can allow a program to proceed when it is bumping up against 
the --maxheap limit or the paging limit.
The overall aim is to try to adjust the heap size for optimum 
performance with a wide range of programs on a wide range of memory 
sizes.  The larger the heap available the less frequent the GC needs to 
run so the faster the program runs.  This holds up to the point that 
paging sets in.  The available memory, though, depends on what else is 
running on the machine so in general the heap starts off small and grows 
while watching for paging.  If you know how much memory you have 
available you may get better performance by setting --minheap and --maxheap.
The --gcthreads option sets the number of threads to be used for the 
parallel GC.  The default, 0, uses as many threads as there are 
processors (cores) on the machine.  Setting this to 1 makes the GC 
single-threaded.  It's probably best left at the default unless you want 
to run other programs at the same time on other cores.
I have tested the latest version with Isabelle, primarily on 64-bit 
Linux.  The paging avoidance seems to work well and even large examples 
will run in 2Gbytes.  I haven't tested it much on other platforms.  I 
did notice occasional cases of poly being killed by the over-commit 
manager.  This was fixed by increasing the amount of swap space available.
I'd be interested in feedback, particularly if there are pathological 
cases where the performance is significantly worse.
David