The version of Poly/ML in SVN is the result of over a year's work on the
storage management. I'm currently testing the final phase of that and
I'd be interested in feedback. The garbage collector has been almost
completely rewritten and is now parallelised. The minor GC is
completely different and uses a copying collector rather than the
mark-sweep scheme used for major collections. The final phase is the
heap sizing strategy.

From the user's point of view the main differences are the options that
control the heap management. The old --heap, --mutable and --immutable
arguments have been removed. In their place is a new set of arguments:
--minheap, --maxheap, --gcpercent and --gcthreads. I have retained the
-H argument as a synonym for --minheap.

The storage management adjusts the heap size within the range --minheap
(default 1/8th of the physical memory) to --maxheap (default 100% of the
physical memory) to keep the garbage collection overhead close to
--gcpercent (default 10%). If it detects paging it may reduce the heap
size to avoid it, since paging slows the whole execution down quite
substantially. In that case the GC overhead will rise above the target.
If necessary it may run an extra "sharing" pass in the GC to reduce the
heap size. This is a sort of cut-down version of PolyML.shareCommonData.
It adds a considerable overhead to the GC so is only run when absolutely
necessary, but reducing the live data size can allow a program to proceed
when it is bumping up against the --maxheap limit or the paging limit.
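
The sizing behaviour described above can be pictured roughly as follows.
This is only an illustrative sketch in Python, not the actual Poly/ML
code: the names HeapLimits, default_limits and next_heap_size, the fixed
10% step and the simple grow/shrink rule are all assumptions made for the
example. Only the defaults (1/8th and 100% of physical memory, 10% GC
overhead) come from the description.

```python
from dataclasses import dataclass


@dataclass
class HeapLimits:
    min_heap: int    # corresponds to --minheap
    max_heap: int    # corresponds to --maxheap
    gc_percent: int  # target GC overhead, corresponds to --gcpercent


def default_limits(physical_memory: int) -> HeapLimits:
    """Defaults as described: 1/8th and 100% of physical memory, 10% overhead."""
    return HeapLimits(physical_memory // 8, physical_memory, 10)


def next_heap_size(current: int, gc_overhead: float, paging: bool,
                   limits: HeapLimits, step_percent: int = 10) -> int:
    """Hypothetical policy: choose the heap size for the next cycle.

    gc_overhead is the percentage of recent run time spent in the GC.
    The fixed proportional step is an assumption, not Poly/ML's rule.
    """
    step = current * step_percent // 100
    if paging:
        # Paging costs more than extra GC work, so shrink even though
        # the GC overhead will then rise above the target.
        proposed = current - step
    elif gc_overhead > limits.gc_percent:
        # Too much time in the GC: a larger heap means fewer collections.
        proposed = current + step
    else:
        proposed = current
    # Always stay within the [min_heap, max_heap] range.
    return max(limits.min_heap, min(limits.max_heap, proposed))
```

In this sketch paging always takes priority over the overhead target,
which matches the point that the overhead is allowed to rise when the
heap has to shrink.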

The overall aim is to adjust the heap size for optimum performance with
a wide range of programs on a wide range of memory sizes. The larger the
heap available, the less frequently the GC needs to run and so the faster
the program runs. This holds up to the point that paging sets in. The
available memory, though, depends on what else is running on the machine,
so in general the heap starts off small and grows while watching for
paging. If you know how much memory you have available you may get better
performance by setting --minheap and --maxheap.
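
As a rough illustration of fixing the range by hand, the following Python
sketch builds a poly command line with both limits set. The helper name
poly_heap_args is hypothetical, and the sketch assumes that the values
are given in megabytes and passed as a separate argument after each
option; check both against the version you are running.

```python
from typing import List


def poly_heap_args(min_heap_mb: int, max_heap_mb: int) -> List[str]:
    """Build a poly command line with an explicit heap range.

    Assumption: sizes are plain numbers interpreted as megabytes.
    """
    if not 0 < min_heap_mb <= max_heap_mb:
        raise ValueError("need 0 < min_heap_mb <= max_heap_mb")
    return ["poly",
            "--minheap", str(min_heap_mb),
            "--maxheap", str(max_heap_mb)]
```

Setting the two values equal would, under the same assumptions, pin the
heap to a fixed size and take the sizing strategy out of the picture.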

The --gcthreads option sets the number of threads to be used for the
parallel GC. The default, 0, uses as many threads as there are
processors (cores) on the machine. Setting this to 1 makes the GC
single-threaded. It's probably best left at the default unless you want
to run other programs at the same time on other cores.
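
The meaning of the --gcthreads value can be summarised in a few lines of
Python. This is a sketch of the rule as described, not the Poly/ML
source; the function name effective_gc_threads is hypothetical, and
os.cpu_count() stands in for however the run-time system counts cores.

```python
import os


def effective_gc_threads(gcthreads: int = 0) -> int:
    """Number of GC threads implied by a --gcthreads setting."""
    if gcthreads < 0:
        raise ValueError("gcthreads must not be negative")
    if gcthreads == 0:
        # Default: one thread per processor (core); fall back to 1
        # if the core count cannot be determined.
        return os.cpu_count() or 1
    # Any positive value is used as given; 1 means single-threaded GC.
    return gcthreads
```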

I have tested the latest version with Isabelle, primarily on 64-bit
Linux. The paging avoidance seems to work well and even large examples
will run in 2 Gbytes. I haven't tested it much on other platforms. I
did notice occasional cases of poly being killed by the over-commit
manager. This was fixed by increasing the amount of swap space available.

I'd be interested in feedback, particularly if there are pathological
cases where the performance is significantly worse.

David