The version of Poly/ML in SVN is the result of over a year's work on the storage management. I'm currently testing the final phase of that and I'd be interested in feedback. The garbage collector has been almost completely rewritten and is now parallelised. The minor GC is completely different and uses a copying collector rather than the mark-sweep scheme used for major collections. The final phase is the heap sizing strategy.
From the user's point of view the main differences are the options that control the heap management. The old --heap, --mutable and --immutable arguments have been removed. In their place is a new set of arguments: --minheap, --maxheap, --gcpercent and --gcthreads. I have retained the -H argument as a synonym for --minheap.
The storage management adjusts the heap size within the range --minheap (default 1/8th of the physical memory) to --maxheap (default 100% of the physical memory) to keep the garbage collection overhead close to --gcpercent (default 10%). If it detects paging it may reduce the heap size to try to avoid it, since paging slows the whole execution down quite substantially. In that case the GC overhead will rise above the target. If necessary it may run an extra "sharing" pass in the GC to reduce the heap size. This is a sort of cut-down version of PolyML.shareCommonData. It adds a considerable overhead to the GC, so it is only run when absolutely necessary, but reducing the live data size can allow a program to proceed when it is bumping up against the --maxheap limit or the paging limit.
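To make the policy above concrete, here is a rough sketch in Python. It is not the code in the Poly/ML runtime. The names HeapPolicy, default_policy and next_heap_size are hypothetical, and so are the 1.25 step factor and the rule of leaving the heap alone when the overhead is at or below target. Only the defaults (1/8th of physical memory, 100% of physical memory, 10%) and the general behaviour (stay within the range, back off when paging) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class HeapPolicy:
    min_heap: int      # bytes; plays the role of --minheap
    max_heap: int      # bytes; plays the role of --maxheap
    gc_percent: float  # target GC overhead in percent; plays the role of --gcpercent


def default_policy(physical_memory: int) -> HeapPolicy:
    """Defaults as described in the text: 1/8th of memory, all of memory, 10%."""
    return HeapPolicy(physical_memory // 8, physical_memory, 10.0)


def next_heap_size(policy: HeapPolicy, current: int,
                   gc_overhead_percent: float, paging: bool,
                   step: float = 1.25) -> int:
    """Hypothetical adjustment rule; `step` is an assumption, not a Poly/ML value."""
    if paging:
        # Shrink to avoid paging, accepting that GC overhead rises above target.
        proposed = current / step
    elif gc_overhead_percent > policy.gc_percent:
        # GC is costing more than the target, so ask for a larger heap.
        proposed = current * step
    else:
        # At or below target: leave the heap alone (an assumption of this sketch).
        proposed = current
    # Never leave the [min_heap, max_heap] range.
    return int(min(policy.max_heap, max(policy.min_heap, proposed)))
```

For a machine with 8 GiB of physical memory, default_policy(8 * 2**30) gives a range of 1 GiB to 8 GiB. A heap sitting at 1 GiB with 20% GC overhead and no paging would be grown to 1.25 GiB under this toy rule, and the same heap would stay at 1 GiB if paging were detected, because it is already at the minimum.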
The overall aim is to adjust the heap size for optimum performance with a wide range of programs on a wide range of memory sizes. The larger the available heap, the less frequently the GC needs to run and so the faster the program runs. This holds up to the point that paging sets in. The available memory, though, depends on what else is running on the machine, so in general the heap starts off small and grows while watching for paging. If you know how much memory you have available you may get better performance by setting --minheap and --maxheap.
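The relationship between heap size and GC frequency can be illustrated with a toy model. It assumes a constant amount of live data and that each collection frees everything else. Real programs are not this regular, and the function gc_cycles is purely illustrative.

```python
def gc_cycles(total_allocated: int, heap_size: int, live_data: int) -> int:
    """Toy estimate: roughly one collection per (heap_size - live_data) allocated."""
    free_per_cycle = heap_size - live_data
    if free_per_cycle <= 0:
        raise ValueError("heap must be larger than the live data")
    # Ceiling division without floating point.
    return -(-total_allocated // free_per_cycle)
```

With 500 MB of live data and 100,000 MB allocated over the run, a 1,000 MB heap needs about 200 collections in this model while a 2,000 MB heap needs about 67. The model ignores paging, which is exactly where the benefit of a larger heap stops.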
The --gcthreads option sets the number of threads to be used for the parallel GC. The default, 0, uses as many threads as there are processors (cores) on the machine. Setting this to 1 makes the GC single-threaded. It's probably best left at the default unless you want to run other programs at the same time on other cores.
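The meaning of the default can be sketched as follows. The function name effective_gc_threads is hypothetical, and the runtime itself is not written in Python; this only mirrors the rule that 0 means "one thread per processor".

```python
import os


def effective_gc_threads(gcthreads: int = 0) -> int:
    """Resolve a --gcthreads style setting: 0 means use every available core."""
    if gcthreads < 0:
        raise ValueError("thread count cannot be negative")
    if gcthreads == 0:
        # os.cpu_count() may return None, so fall back to a single thread.
        return os.cpu_count() or 1
    return gcthreads
```

So effective_gc_threads(1) gives a single-threaded GC, and effective_gc_threads(0) gives whatever the machine reports as its core count.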
I have tested the latest version with Isabelle, primarily on 64-bit Linux. The paging avoidance seems to work well and even large examples will run in 2 GB. I haven't tested it much on other platforms. I did notice occasional cases of poly being killed by the over-commit manager. This was fixed by increasing the amount of swap space available.
I'd be interested in feedback, particularly if there are pathological cases where the performance is significantly worse.
David