Rob,
I've had a look at this and found the problem. It turned out to be much simpler to fix than I'd feared, and applying the diffs below reduces the compilation time from around 45 minutes to around 7 seconds! The problem had to do with the way Poly/ML inlines "small" functions. In order to get reasonably efficient code, small functions are inserted inline, and if they are tail-recursive they can be converted into something like while-loops. I changed and improved this quite a bit in recent releases, but it seems there was something left over from a previous attempt which was getting in the way. It was causing recursive calls to be expanded inline when they shouldn't have been, leading to a massive code blow-up.
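To illustrate what "converted into something like while-loops" means, here is a minimal sketch in Standard ML. The names sumTo and sumToLoop are made up for the illustration, and the second function is only a hand-written approximation of the kind of loop a compiler might produce, not actual Poly/ML output.

```sml
(* Tail-recursive: the call to loop is the last thing loop does,
   so no work remains in the caller once it returns. *)
fun sumTo (n : int) : int =
  let
    fun loop (i, acc) = if i > n then acc else loop (i + 1, acc + i)
  in
    loop (1, 0)
  end

(* Hand-written equivalent using a while-loop and references.
   This is an assumed illustration of the transformation. *)
fun sumToLoop (n : int) : int =
  let
    val i = ref 1
    val acc = ref 0
  in
    while !i <= n do (acc := !acc + !i; i := !i + 1);
    !acc
  end
```

Both functions compute the same result. The point is that the tail call can be replaced by a jump back to the top of the loop rather than by another copy of the function body.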
It was actually the fact that you had functions of the form val rec f = fn x => fn y => ... rather than the equivalent "fun" declaration that meant the problem only showed up in your code and hadn't appeared before. Poly/ML treats fun declarations specially, so that a function of the form fun f a b c = ... gets its arguments passed on the stack. That means we save the cost of creating a closure except when the function is partially applied. That doesn't apply to the equivalent val rec declaration. Perhaps it could, but that would require the front end to look quite closely at the body of the outer function to see whether it was simply another function.
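For reference, the two forms look like this side by side. This is a small sketch with made-up names (powV, powF, twoTo), not code taken from ClawZ. Both declarations have the same type and meaning; the difference described above lies only in how the compiler treats them.

```sml
(* Curried function written as val rec with nested fn expressions. *)
val rec powV =
  fn base => fn n => if n = 0 then 1 else base * powV base (n - 1)

(* The equivalent fun declaration with curried arguments. *)
fun powF (base : int) (n : int) : int =
  if n = 0 then 1 else base * powF base (n - 1)

(* Partial application: supplying only the first argument yields
   a function value, which is the case where a closure is needed. *)
val twoTo = powF 2
```

For example, powV 2 10 and powF 2 10 both evaluate to 1024, and twoTo 5 evaluates to 32.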
Let me know how you get on with this.
Regards, David.
RCS file: /usr/cvsroot/mlsource/MLCompiler/CodeTree/G_CODE.ML,v
retrieving revision 1.22
diff -r1.22 G_CODE.ML
1122,1126c1122
< case pt of
< Lambda _ => true
< | Declar {value, ...} => isSmall (value, sizeOptVal)
< | Newenv [c] => isSmall (c, sizeOptVal)
< | _ => size pt < !maxInlineSize
---
> size pt < !maxInlineSize
On Saturday, March 2, 2002, at 01:01 , Rob Arthan wrote:
David,
Last month, I ported another tool of ours called ClawZ to Poly/ML. ClawZ is much smaller than the ML parts of ProofPower. Somewhat to my surprise, it contains some code which provokes a performance problem in the compiler, which ends up taking about 45 minutes to compile the last 2,400 or so lines of code on a 450 MHz Pentium Xeon with 512 MB of RAM. This is the only case I've encountered where Poly/ML underperforms SML/NJ.