Thanks for the detailed explanation! It's great to know that this has been resolved.
Best wishes, Michael
On 04/03/15 00:45, David Matthews wrote:
An update on this. I've committed some changes that seem to have solved the problem. It's probably of general interest so I'll provide a bit of background.
It has to do with the optimisation of closures. In general an ML function needs to be represented by a closure, an item on the heap consisting of a pointer to the code and copies of the free variables needed by the function. A naive implementation would build a closure for each function as it was declared but it's possible to do much better than that. If we can be sure that a function is only ever called and not returned or assigned to a reference we don't need to put the closure on the heap. Poly/ML has used a couple of different techniques but more recently it has handled these cases by building the closure on the stack. A stack closure looks just like a heap closure so the calling conventions are the same. That means it can be used both when we can identify all the call sites of the function but also if it is passed into a function such as map or fold. (Actually, List.map and List.foldl/r are small enough that they're inlined but the principle is the same). The disadvantage is that a call to a function with a stack-closure cannot be tail-recursive because the closure needs to stay on the stack until the function returns. This is what went wrong with this example.
Another solution, which works when all the call sites of a function can be identified but not when a function is passed as an argument, is lambda-lifting. This involves adding the free variables as extra arguments to the function. The resulting function does not need a closure. I've implemented this for local functions where all the call sites can be found and left the stack closures for the other cases. It is this change that has fixed the problem since lambda-lifted functions can be tail-recursive.
It may be that lambda-lifted functions are more efficient for other reasons. Functions without closures can be code-generated using calls or jumps directly to the function rather than requiring an indirection through the closure. This may fit better with the pre-fetching of modern hardware.
David
On 05/02/2015 05:00, Michael Norrish wrote:
(I'll attempt to attach the tgz file to this message, but if 817KB is too big and this bounces or the attachment is dropped, I'll make it available separately.)
If compiled with Poly/ML, the attached program performs abysmally on the provided testcase.sml input file. Within seconds of beginning, it attempts to allocate more memory than my machine has (16GB), and basically brings it to its knees. If compiled with mlton, the testcase doesn't seem to cause any obvious memory allocation, and finishes in less than a second.
You can also compile with Moscow ML, though I haven?t included the build instructions in the Makefile. The execution of the mosml executable doesn't seem to allocate any memory either, and it terminates cleanly (after 17s). Though slow this is still preferable to Poly/ML's behaviour.
This is with Poly/ML 5.5.2 on
Linux telemachus 3.2.0-75-generic #110-Ubuntu SMP Tue Dec 16 19:11:55 UTC 2014
x86_64 x86_64 x86_64 GNU/Linux
The Makefile generates two executables (if you have mlton and poly in your PATH): pholdeptool and mholdeptool. You can run them on the testcase, e.g.:
pholdeptool testcase.sml
The lexer code looks to me to be tail-recursive, and the program only accumulates a binary tree containing 15 strings. So it really does seem as if Poly/ML is doing something buggy.
(The program is lexing the test-case (which uncompresses to a 30MB file) in an extremely simple way, but performing a useful analysis.)
Michael
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml