Hi all,
When I build a large session in Isabelle2011-1 (to be precise, an extension of JinjaThreads from the Archive of Formal Proofs) with Poly/ML 5.4.1, the build takes 1:51h. While the session is running, the poly process requires 12GB of memory (VmSize in /proc/<PID>/status). The final call to PolyML.shareCommonData before writing the heap image consumes 17GB of memory.
I have now run the same session with SVN version 1352 of Poly/ML. With that version it takes 2:35h, with 16GB for the session and 21GB for sharing common data. What is happening there? Have other people experienced a similar surge in memory usage and runtime?
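For reference, the VmSize figure quoted above can be sampled like this. A minimal sketch: /proc is Linux-specific, and since the poly PID isn't shown in this thread, the shell's own PID ($$) stands in for it.

```shell
# Sketch: read the VmSize/VmPeak lines from /proc/<PID>/status as quoted
# above. /proc is Linux-specific; the shell's own PID ($$) is used here as
# a stand-in for the poly process, with a fallback for non-Linux systems.
if [ -r "/proc/$$/status" ]; then
  grep -E '^Vm(Size|Peak):' "/proc/$$/status"
else
  echo "VmSize: unavailable (no /proc on this system)"
fi
```

Note that VmSize is the virtual address-space size, not resident memory; VmRSS would give the latter.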
Thanks for any hints, Andreas
Andreas, How much real memory do you have on this machine? I would not expect the size of the live data or the CPU time to have changed much; indeed, I would have expected the CPU time to have gone down. There has, though, been a major change in the garbage collector and in the way the heap is allocated.
In languages with a GC, the size of the heap effectively determines the amount of time the program spends garbage-collecting. With a small heap the program needs to garbage-collect more frequently, and since the cost of each GC depends largely on the amount of live data, the total GC time goes up. Poly/ML increases the total size of the heap as the amount of live data grows, but working out how to do this is largely down to heuristics. The big problem occurs if the heap size reaches the size of the available main memory: at that point, GC causes a very significant increase in paging activity.
The GC in SVN has been extensively reworked to allow for multi-threading the GC, along with various other changes. Multi-threading is off by default, but if anyone wants to experiment with it there is a run-time option --gc-threads=N, which sets the number of GC threads; zero means use as many threads as there are processors. As part of these changes the way the -H option works has changed. If you are providing a -H option, try a smaller value than you used with 5.4.1.
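For readers driving Poly/ML through Isabelle, one plausible way to supply these flags is via the ML_OPTIONS variable in Isabelle's etc/settings, which forwards runtime options to the ML system. A sketch only; the values are illustrative, not tuned:

```shell
# Illustrative fragment for Isabelle's etc/settings: a smaller -H than was
# used under 5.4.1, plus the experimental parallel GC with one GC thread
# per processor (--gc-threads=0). Adjust values to your machine.
ML_OPTIONS="-H 500 --gc-threads=0"
```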
Determining the optimal heap size is an open research topic, and I'm planning to look into this in more detail. Let me know how you get on; if there's still a problem, I'll investigate.
Regards, David
On 16/11/2011 09:04, Andreas Lochbihler wrote:
David,
Thanks for the hint about -H. I reduced --immutable and --mutable from 2000 each to 1000. Now the session runs in 1:55h, using 10GB of memory and 13GB for sharing common data.
Regards, Andreas
Andreas, I'm glad that worked. I'd suggest reducing the value of --mutable very considerably and possibly providing a value of, say, 1000 for --allocation. The default values are 50% of the heap for immutable, 5% for mutable and 45% for allocation. The mutable area used to hold thread stacks as well as initial allocation. Stacks are now handled completely differently: there is no need for them to be in the GC'd heap, because the run-time system can allocate and deallocate them itself. Initial allocation now happens in the "allocation" area.
It may be that these changes won't make a big difference because the RTS is going to grow the heap anyway. Eventually, I hope to be able to avoid having to set these things manually and have it all work automatically.
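Combining the suggestions above into the same kind of settings fragment as before — again a sketch with illustrative numbers, using the --mutable and --allocation spellings quoted in this thread:

```shell
# Illustrative only: shrink the mutable area considerably (stacks no
# longer live there) and give the allocation area an explicit size, per
# the suggestion above. Not tuned; measure before adopting.
ML_OPTIONS="--mutable 200 --allocation 1000"
```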
Regards, David
On 17/11/2011 12:25, Andreas Lochbihler wrote: