This looks more like a polyml issue than an Isabelle issue, so I'm reposting here.
Since writing the below I've run a few more experiments: the theory does run fine, with the image built in under 25min, using the current svn trunk version of polyml (5.4 Experimental) on x86_64-darwin.
However, with the polyml-3.1-fixes version, it is still running after 45min, with memory usage approaching 7.5GB and still rising.
I'm not quite sure how to proceed. I don't think I'll want to run our production proofs on an experimental polyml version. How experimental/far away from release is the current trunk?
Cheers, Gerwin
Begin forwarded message:
From: Gerwin Klein gerwin.klein@nicta.com.au Date: 23 July 2010 4:06:19 PM GMT+02:00 To: Isabelle Users ML cl-isabelle-users@lists.cam.ac.uk Subject: [isabelle] insufficient memory exception in polyml
I am currently updating a very large theory from Isabelle2009-1 to Isabelle2009-2. So far this update has been less painful than the previous one, but I seem to be stuck now:
I am getting various versions of insufficient memory exceptions from polyml-5.3 after the theory has been fully processed, at the point (I presume) where polyml is writing the image to disk, possibly while it is doing garbage collection.
A sample output of the log file is (the log file is btw gzipped and the error code is success, but the image is never written):
lemma ... [last lemma of thy]
val it = () : unit
val it = () : unit
ML> Exception- Fail "Insufficient memory" raised
ML>
The platform is polyml-5.3.0_x86-linux with ISABELLE_USEDIR_OPTIONS="-M 1 -p 0 -q 0 -v true" and unchanged standard ML settings.
The machine has 8G of RAM and otherwise works fine for other big developments.
On polyml-5.3.0_x86-darwin, I am getting the same with the same settings. With -M 2, I am getting a C level exception in polyml at the same point, with a message on stdout/err telling me to set a breakpoint in a malloc function (but the same exception in the log).
On polyml-5.3.0_x86_64-darwin, with -M 1 I am getting the same log, and this message on stdout:
Building CKERNEL ...
poly(5949,0xa0a3e500) malloc: *** mmap(size=134217728) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Finished CKERNEL (0:32:51 elapsed time, 0:32:47 cpu time, factor 0.99)
(possibly the same as the 32bit version, but I've lost the previous output. Can reproduce if needed)
With -M 3 and -M 2, on 64bit Darwin, I am instead getting exceptions about not being able to start another thread (at the same point). Again, the machine has 8GB of RAM and is otherwise rock solid.
The same image builds easily under 32bit and 64bit polyml 5.3.0 (same version) on machines with less than 4GB of memory in Isabelle2009-1.
Any ideas?
I've tried playing around with -H and --mutable/--immutable settings, without success, but I don't have a good guideline on what to try.
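For concreteness, the kind of invocation meant here, as a sketch only: heap sizes are in MB and the values are purely illustrative, not recommendations.

```shell
# Pass RTS options when starting poly directly; -H sets the initial
# heap size, --mutable/--immutable size those areas (values illustrative).
poly -H 500 --mutable 1000 --immutable 4000
```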
Cheers, Gerwin
Gerwin Klein wrote:
Since writing the below I've run a few more experiments: the theory does run fine, with the image built in under 25min, using the current svn trunk version of polyml (5.4 Experimental) on x86_64-darwin.
However, with the polyml-3.1-fixes version, it is still running after 45min, with memory usage approaching 7.5GB and still rising.
I'm not quite sure how to proceed. I don't think I'll want to run our production proofs on an experimental polyml version. How experimental/far away from release is the current trunk?
Just so I'm clear: the version that runs successfully is with the current SVN. Did you rebuild the compiler before you ran with that version (i.e. make compiler) or use the original compiler?
I suspect the problem has to do with the way the Poly/ML run-time system (RTS) manages memory. Currently, as the ML program uses more memory, the RTS gets more memory from the system, up to the various limits imposed. The limits are things like the swap size, the limit set by ulimit -d and, on 32-bit architectures, the size of the address space. The space the RTS uses for the heap has to be significantly more than the actual size that the ML program is currently using, otherwise it would be constantly garbage-collecting.
The trouble is that this can leave no space if the system needs to allocate memory for something else, such as for malloc or for the stack of a new thread. I think I need to look at some way of keeping some space in reserve for these. It's a bit difficult because I don't think there's a way to find out how much space is left after extending the heap.
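These limits can be inspected from the shell; a minimal sketch:

```shell
# Per-process data segment limit ("unlimited" = no cap), one of the
# limits that bounds how far the RTS can grow the heap:
ulimit -d
# All resource limits at a glance:
ulimit -a
```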
David
On 26/07/2010, at 1:35 PM, David Matthews wrote:
Gerwin Klein wrote:
Since writing the below I've run a few more experiments: the theory does run fine, with the image built in under 25min, using the current svn trunk version of polyml (5.4 Experimental) on x86_64-darwin. However, with the polyml-3.1-fixes version, it is still running after 45min, with memory usage approaching 7.5GB and still rising. I'm not quite sure how to proceed. I don't think I'll want to run our production proofs on an experimental polyml version. How experimental/far away from release is the current trunk?
Just so I'm clear: the version that runs successfully is with the current SVN.
Yes, that's right.
Did you rebuild the compiler before you ran with that version (i.e. make compiler) or use the original compiler?
I checked out the svn trunk version from scratch, and did run "make compiler".
I suspect the problem has to do with the way the Poly/ML run-time system (RTS) manages memory. Currently, as the ML program uses more memory, the RTS gets more memory from the system, up to the various limits imposed. The limits are things like the swap size, the limit set by ulimit -d and, on 32-bit architectures, the size of the address space. The space the RTS uses for the heap has to be significantly more than the actual size that the ML program is currently using, otherwise it would be constantly garbage-collecting.
The trouble is that this can leave no space if the system needs to allocate memory for something else, such as for malloc or for the stack of a new thread. I think I need to look at some way of keeping some space in reserve for these. It's a bit difficult because I don't think there's a way to find out how much space is left after extending the heap.
Can I do anything to help you debug this?
In the 64bit 5.3 version I noticed normal (< 2G) memory usage while it was processing everything, and then a huge spike at the very end going up to > 7G (presumably at the commit call at the very end of the session, but I can't really tell). Could there be a polyml reason for this, or is it an Isabelle problem after all? I'd have expected memory usage to go up slightly in Isabelle2009-2, but not by that much.
It's a bit difficult to provide good data at the moment, because I'm travelling and have only a low-bandwidth connection to my machine in the office, but I can produce more when I'm back mid Aug.
Cheers, Gerwin
Gerwin Klein wrote:
On 26/07/2010, at 1:35 PM, David Matthews wrote:
Did you rebuild the compiler before you ran with that version (i.e. make compiler) or use the original compiler?
I checked out the svn trunk version from scratch, and did run "make compiler".
OK. This is probably as good an opportunity as any to summarise the current state of the SVN version.
The current SVN has a number of changes, particularly to the compiler. The idea was to improve performance. The main changes are:
1. Major rewrite of the X86 code-generator, combining the 32 and 64-bit versions into a single module. It now supports the floating point instructions.
2. Changes to the way functions with polymorphic equality are handled, to eliminate the "structural equality" code.
3. Use of the GMP library if that is available when Poly/ML is built, otherwise falling back to the old Poly/ML code.
There are probably a few other things I've forgotten.
I haven't currently released this because, with Isabelle at least, there's no real improvement in performance and the extra compile-time just about balances out the improvement in run-times. The floating point performance is significantly better, but generally floating point isn't the performance bottleneck of ML programs. The use of the GMP library probably improves performance of multiplication and division for very long precision numbers but is unlikely to make a difference for numbers that are only slightly larger than single precision.
I'd be interested in feedback if anyone wants to try this out. One important point to make is that you should run "make compiler" TWICE, i.e.

  make
  make compiler
  make compiler

The reason for this is that the format of exception handlers has changed, and if it is only run once there is a danger that a raised exception could encounter code that uses the old format, resulting in a segfault.
It's a bit difficult to provide good data at the moment, because I'm travelling and have only a low-bandwidth connection to my machine in the office, but I can produce more when I'm back mid Aug.
Let's look into this more carefully when you get back.
David
On Mon, 26 Jul 2010, David Matthews wrote:
This is probably as good an opportunity as any to summarise the current state of the SVN version.
I'd be interested in feedback if anyone wants to try this out.
We have had the Poly/ML SVN version running in parallel with our regular Isabelle Tests for several months already -- updating from the repository occasionally. My impression is that it is pretty close to stable-release quality.
Makarius
On 26/07/2010, at 6:14 PM, David Matthews wrote:
It's a bit difficult to provide good data at the moment, because I'm travelling and have only a low-bandwidth connection to my machine in the office, but I can produce more when I'm back mid Aug.
Let's look into this more carefully when you get back.
Ok, will let this rest until I'm back.
Actually 5.4 (svn trunk) otherwise looks fine to me, so far no problems and similar performance to before. And one less crash ;-)
If you're close to releasing 5.4, I'd be happy just switching to it.
Cheers, Gerwin
David Matthews wrote:
One important point to make is that you should run "make compiler" TWICE, i.e.

  make
  make compiler
  make compiler

The reason for this is that the format of exception handlers has changed, and if it is only run once there is a danger that a raised exception could encounter code that uses the old format, resulting in a segfault.
The Poly/ML download page ( http://www.polyml.org/download.html ) says:
  ./configure
  make
  make compiler
  make
  make install
Should the third make command be 'make compiler' then? (I've always found just 'make' does nothing.)
Phil
Phil Clayton wrote:
The Poly/ML download page ( http://www.polyml.org/download.html ) says:
  ./configure
  make
  make compiler
  make
  make install
Should the third make command be 'make compiler' then? (I've always found just 'make' does nothing.)
I've updated the web page for the current SVN. The second "make" was included because a previous version of the Makefile required it; the current version includes a call to "make" as part of "make compiler".

The need for a second call to "make compiler" is specific to the current state of SVN for the X86. The version of the compiler in SVN uses a different format for exception handlers than the previous compiler. Calling "make compiler" builds a new compiler so that any code it generates uses the new format; the compiler itself, though, uses the old format. There are cases where handlers in the compiler get called from newly compiled code, and the mismatch can result in a segfault. The solution is to build the compiler itself with the new compiler so that everything uses the new format, and this is what the second call to "make compiler" does. It isn't generally necessary.
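Putting this together, a sketch of the full build sequence for the current SVN state (the doubled "make compiler" is, as explained, specific to this state of the trunk):

```shell
./configure
make            # build the RTS and Poly/ML using the bootstrap compiler
make compiler   # rebuild the compiler; code it generates uses the new handler format
make compiler   # rebuild again so the compiler itself also uses the new format
make install
```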
David
On Mon, 26 Jul 2010, Gerwin Klein wrote:
On 26/07/2010, at 1:35 PM, David Matthews wrote:
Did you rebuild the compiler before you ran with that version (i.e. make compiler) or use the original compiler?
Maybe you should also try to rebuild the official 5.3.0 version from source. Theoretically there could be a problem with the compilation from the Isabelle download site -- it also has some add-on library for SHA1 that is loaded into the RTS at some point.
In the 64bit 5.3 version I noticed normal (< 2G) memory usage while it was processing everything and then a huge spike at the very end going up to > 7G (presumably at the commit call at the very end of the session, but can't really tell).
BTW, the commit is no longer a primitive of Poly/ML 5.x but is defined in Isabelle/lib/scripts/run-polyml like this:
fun commit () =
  (PolyML.shareCommonData PolyML.rootFunction;
   TextIO.output (TextIO.stdOut, "Exporting $OUTFILE\n");
   PolyML.SaveState.saveState "$OUTFILE";
   true);
You can also experiment doing only part of the above.
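For example, one might try the saveState step on its own, without shareCommonData, to see which part drives the memory spike. A sketch only: the file names are illustrative and it assumes a poly binary on the PATH.

```shell
# Write an ML file performing only the saveState step (no shareCommonData).
cat > /tmp/save_only.ML <<'EOF'
PolyML.SaveState.saveState "/tmp/test.polyml-save";
EOF
# Then run it against the loaded heap, e.g.:  poly < /tmp/save_only.ML
```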
Makarius