On Tue, Sep 29, 2020, at 10:00 AM, David Matthews wrote:
I've had a look at the log and it definitely is odd. I wonder if it is attempting to allocate a very large object (cell) on the heap due to a bug somewhere. Allocating a very large vector or array would cause this. Probably the only way to find out would be to force a core dump.
Forcing a core dump turned out to be a poor choice, since gdb cannot call functions when examining a core dump, and it needs to do so to inspect C++ objects. After replacing the abort() with a sleep(999999) I was able to attach to a process in the problematic state, and found the following:
* This is immediately after a full GC which failed to completely evacuate the allocation spaces because the non-allocation spaces were full. (I am not sure whether this is intended to be possible.)
* wordsRequiredToAllocate = 864471. This is a large vector or array, but not unreasonably so given the heap limit, and I think ML vectors with fewer than a million elements should be supported.
* There are 467 allocation spaces, none with freeSpace larger than 130512 words. (This seems to be a consequence of defaultSpaceSize.)
* currentAllocSpace = 61210624, spaceBeforeMinorGC = 22074672, so AllocHeapSpace refuses to create a new allocation space.
* highWaterMark = 177949696, currentHeapSize = 139361280, spaceForHeap = 355793852.
So far I have not managed to reproduce the condition (currentAllocSpace > spaceBeforeMinorGC immediately after a full GC) under controlled conditions. I am currently running the flaky program several times with the attached patch applied and will report the results, although I suspect the patch does not address the root cause.
-s