The version of Poly/ML in SVN is the result of over a year's work on the storage management. I'm currently testing the final phase of that and I'd be interested in feedback. The garbage collector has been almost completely rewritten and is now parallelised. The minor GC is completely different and uses a copying collector rather than the mark-sweep scheme used for major collections. The final phase is the heap sizing strategy.
From the user's point of view, the main differences are the options that control the heap management. The old --heap, --mutable and --immutable arguments have been removed. In their place is a new set of arguments: --minheap, --maxheap, --gcpercent and --gcthreads. I have retained the -H argument as a synonym for --minheap.
The storage management adjusts the heap size, within the range from --minheap (default: 1/8th of the physical memory) to --maxheap (default: 100% of the physical memory), to keep the garbage collection overhead close to --gcpercent (default: 10%). If it detects paging it may reduce the heap size, since paging slows the whole execution down quite substantially; in that case the GC overhead will rise above the target. If necessary it may run an extra "sharing" pass in the GC to reduce the heap size. This is a sort of cut-down version of PolyML.shareCommonData. It adds a considerable overhead to the GC so it is only run when absolutely necessary, but reducing the live data size can allow a program to proceed when it is bumping up against the --maxheap limit or the paging limit.
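(For what it's worth, the full sharing operation can still be called explicitly from ML; "largeStructure" below is just a placeholder for whatever value you want to compact.)

    (* Merge equal immutable substructures to reduce the live data
       size; the GC's sharing pass is a cut-down form of this. *)
    val () = PolyML.shareCommonData largeStructure;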
The overall aim is to adjust the heap size for optimum performance with a wide range of programs on a wide range of memory sizes. The larger the heap available, the less frequently the GC needs to run and so the faster the program runs. This holds up to the point that paging sets in. The available memory, though, depends on what else is running on the machine, so in general the heap starts off small and grows while watching for paging. If you know how much memory you have available you may get better performance by setting --minheap and --maxheap explicitly.
The --gcthreads option sets the number of threads to be used for the parallel GC. The default, 0, uses as many threads as there are processors (cores) on the machine. Setting this to 1 makes the GC single-threaded. It's probably best left at the default unless you want to run other programs at the same time on other cores.
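For example, on a machine where you are happy to give ML between roughly 1 and 4 Gbytes and to use four GC threads, something like the following should be reasonable (the sizes are in megabytes, as with the old -H option; --gcpercent 10 is in fact the default and is shown only for completeness):

    poly --minheap 1000 --maxheap 4000 --gcpercent 10 --gcthreads 4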
I have tested the latest version with Isabelle, primarily on 64-bit Linux. The paging avoidance seems to work well and even large examples will run in 2Gbytes. I haven't tested it much on other platforms. I did notice occasional cases of poly being killed by the Linux over-commit (OOM) mechanism; this was fixed by increasing the amount of swap space available.
I'd be interested in feedback, particularly if there are pathological cases where the performance is significantly worse.
David
David,
This looks really great - it should prove very valuable. I have a couple of questions about GC when performed by another thread:
1. Is PolyML.fullGC asynchronous, i.e. does it trigger a GC in the background and return immediately, or synchronous, i.e. does it wait until the GC completes? (Looking at profiling output for some GTK+ programs, GC is not occurring as often as I would perhaps want, so I was considering inserting explicit GC calls.)
2. Are finalizers run in separate threads, i.e. should we be making sure that our finalization code - whether C or SML via a callback - is thread-safe?
Thanks, Phil
Phil,
On 30/08/2012 17:45, Phil Clayton wrote:
This looks really great - it should prove very valuable. I have a couple of questions about GC when performed by another thread:
- Is PolyML.fullGC asynchronous, i.e. does it trigger a GC in the background and return immediately, or synchronous, i.e. does it wait until the GC completes? (Looking at profiling output for some GTK+ programs, GC is not occurring as often as I would perhaps want, so I was considering inserting explicit GC calls.)
When the heap has run out, or if PolyML.fullGC is called explicitly, all the ML threads are stopped, the GC is performed and then the threads are allowed to continue. The exception is that a thread running code through the FFI will continue to run and will only be stopped if it returns to ML before the GC is complete.
A fully asynchronous GC would require substantial changes to the ML code or some rather nasty handling of reading/writing protected memory. The GC itself will use multiple threads on a multi-core processor.
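So an explicit call is safe from any thread; it simply blocks until the collection has finished. If you do decide to insert explicit calls, something along these lines would do it (an untested sketch - the extra thread and the ten-second interval are arbitrary):

    (* Fork a thread that forces a full collection every ten seconds.
       PolyML.fullGC stops all the ML threads for the duration of the
       collection and returns once it is complete. *)
    val _ = Thread.Thread.fork (
        fn () =>
            while true do
            (
                OS.Process.sleep (Time.fromSeconds 10);
                PolyML.fullGC ()
            ),
        []);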
- Are finalizers run in separate threads, i.e. should we be making sure that our finalization code - whether C or SML via a callback - is thread-safe?
Currently, the only finalisers as such are associated with the foreign function interface. There are weak references and there is a mechanism for an ML thread to be woken up after a GC that has detected some unreferenced weak references. The FFI finalisers are currently executed by the "main" thread during the GC but, as we discussed in a private email, there is a problem with this if the finaliser tries to make a callback into ML. I think a better solution would be for the finalisers to be called some time after the GC by some ML thread. I don't quite know how to do that. One possibility would be to have an ML thread waiting to execute the finalisers and have it woken up in the same way as weak references are signalled. Perhaps the FFI and weak reference mechanisms could be combined in some way.
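Roughly the kind of thing I have in mind, purely as an illustration (completely untested and the names are invented): keep a registry of weak references paired with finalisers and have an ML thread wait on Weak.weakSignal, which is used to wake a thread after a GC that has detected unreferenced weak references, and then run the finalisers whose referents have gone.

    (* Untested sketch: each object is represented by a token ref;
       its finaliser is run some time after the token becomes
       unreachable and its weak reference has been cleared by a GC. *)
    local
        open Thread
        val registry: (unit ref option ref * (unit -> unit)) list ref = ref []
        val regLock = Mutex.mutex ()

        fun finaliserThread () =
        (
            Mutex.lock Weak.weakLock;
            while true do
            (
                (* Woken after a GC that has cleared some weak references. *)
                ConditionVar.wait (Weak.weakSignal, Weak.weakLock);
                let
                    val () = Mutex.lock regLock
                    (* Entries whose weak reference is now NONE are dead. *)
                    val (dead, live) =
                        List.partition (fn (w, _) => not (isSome (!w))) (!registry)
                in
                    registry := live;
                    Mutex.unlock regLock;
                    List.app (fn (_, f) => f ()) dead
                end
            )
        )
    in
        (* Associate a finaliser with a token. *)
        fun finalise (token: unit ref, f: unit -> unit) =
        (
            Mutex.lock regLock;
            registry := (Weak.weak (SOME token), f) :: !registry;
            Mutex.unlock regLock
        )
        val _ = Thread.fork (finaliserThread, [])
    end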
David
On 30/08/12 18:35, David Matthews wrote:
On 30/08/2012 17:45, Phil Clayton wrote:
This looks really great - it should prove very valuable. I have a couple of questions about GC when performed by another thread:
- Is PolyML.fullGC asynchronous, i.e. does it trigger a GC in the background and return immediately, or synchronous, i.e. does it wait until the GC completes? (Looking at profiling output for some GTK+ programs, GC is not occurring as often as I would perhaps want, so I was considering inserting explicit GC calls.)
When the heap has run out, or if PolyML.fullGC is called explicitly, all the ML threads are stopped, the GC is performed and then the threads are allowed to continue. The exception is that a thread running code through the FFI will continue to run and will only be stopped if it returns to ML before the GC is complete.
A fully asynchronous GC would require substantial changes to the ML code or some rather nasty handling of reading/writing protected memory. The GC itself will use multiple threads on a multi-core processor.
Understood. I should have read your (and Makarius') paper more carefully: "A more promising solution is to run the garbage collector as a distinct phase as at present but to parallelize the collector itself."
- Are finalizers run in separate threads, i.e. should we be making sure that our finalization code - whether C or SML via a callback - is thread-safe?
Currently, the only finalisers as such are associated with the foreign function interface. There are weak references and there is a mechanism for an ML thread to be woken up after a GC that has detected some unreferenced weak references. The FFI finalisers are currently executed by the "main" thread during the GC but, as we discussed in a private email, there is a problem with this if the finaliser tries to make a callback into ML.
Right. For some reason I was thinking that calling back from a finalizer was only an issue on the termination pass but I now understand that it is not possible generally. Thinking about it, applications should be able to work around this without too much difficulty - spotting where the work-around is needed is probably the hard bit.
I think a better solution would be for the finalisers to be called some time after the GC by some ML thread. I don't quite know how to do that. One possibility would be to have an ML thread waiting to execute the finalisers and have it woken up in the same way as weak references are signalled. Perhaps the FFI and weak reference mechanisms could be combined in some way.
That sounds like a good approach. Presumably this would work for ML values generally, not just vols. I note that the MLton implementation of finalizable values uses a weak reference to determine whether an ML value still exists. It appears to check all such weak references when a GC occurs.
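If I remember the MLton interface correctly, its usage looks roughly like this (the C-side names here are made up just to illustrate):

    (* Hypothetical example: attach a finalizer to a wrapped C pointer. *)
    val window = MLton.Finalizable.new cWindowPtr
    val () = MLton.Finalizable.addFinalizer (window, fn p => destroyWindow p)
    (* withValue keeps "window" alive while the function runs. *)
    val title = MLton.Finalizable.withValue (window, fn p => getWindowTitle p)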
Thanks, Phil