On Tue, Apr 2, 2013 at 8:51 AM, Makarius makarius@sketis.net wrote:
> On Tue, 2 Apr 2013, Gergely Buday wrote:
>> An ML script could be just tailored rewriting those critical parts in ML
>> itself and compiled with mlton if necessary.
>
> I hear that part about Mlton occasionally, and wonder if it is really significant. Do you have concrete performance figures at hand that show that the extra time for Mlton compilation is worth waiting? (Real applications, not just micro-benchmarks.)
There are some (admittedly, outdated in terms of compiler versions) performance comparisons of ML compilers at:
  http://mlton.org/Performance
In general, though, if MLton does better than Poly/ML on micro-benchmarks, then I would imagine that it would tend to do better than Poly/ML on larger programs, where there are more opportunities for whole-program optimization.

Of course, it also depends on the "real application" itself. No amount of compiler optimization can help if your application is I/O bound. Also, Poly/ML supports some kinds of applications such as Isabelle that require the ability to dynamically enter new code, which isn't compatible with MLton's compilation strategy.
For larger stand-alone programs, MLton's compile time gets better relative to Poly/ML's; for example, here are Poly/ML 5.5 and MLton 20100608 compiling MLton (on a 2009 MacPro: 2.66GHz Quad-Core Intel Xeon; 6GB 1066MHz DDR3; MacOSX 10.7 (Lion)):
[mtf@fenrir mlton]$ /usr/bin/time make polyml-mlton
...
      202.07 real       241.34 user         8.13 sys
[mtf@fenrir mlton]$ /usr/bin/time make mlton-compile
...
      305.59 real       285.22 user        17.18 sys
So, paying about 1.5X compile time to use MLton instead of Poly/ML. Watching the build, it appears that Poly/ML is spending quite a bit of time in the final 'PolyML.export'.
Now, here are the resulting executables compiling (a slightly old) version of HaMLet:
[mtf@fenrir tests]$ /usr/bin/time ../../build.polyml/bin/mlton -verbose 1 hamlet.sml
MLton starting
...
MLton finished in 9.63 + 57.56 (86% GC)
       17.99 real        65.64 user         1.67 sys
[mtf@fenrir tests]$ /usr/bin/time ../../build.mlton/bin/mlton -verbose 1 hamlet.sml
MLton starting
...
MLton finished in 5.95 + 2.52 (30% GC)
        8.59 real         6.80 user         1.77 sys
So, paying about 2.0X run time to use Poly/ML instead of MLton (looking at wall-clock time). Note that things would be a bit worse on a single-core machine --- Poly/ML's parallel GC does a good job of utilizing this machine's 8 cores (technically, 4 cores w/ 2-way hyper threading), yielding a wall-clock time that is about a quarter of the total processor time.
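The ratios quoted for the HaMLet runs can be rechecked from the /usr/bin/time lines. The following is only a small Python sketch of that arithmetic; the function names are made up for illustration, and the single-core figure rests on a crude assumption (that the same total processor work would simply run serially), which is not something measured in this thread.

```python
# Timings in seconds, copied from the /usr/bin/time lines of the HaMLet runs.
polyml_built = {"real": 17.99, "user": 65.64, "sys": 1.67}
mlton_built = {"real": 8.59, "user": 6.80, "sys": 1.77}


def cpu_time(t):
    """Total processor time: user plus system."""
    return t["user"] + t["sys"]


def wall_ratio(slow, fast):
    """How many times longer `slow` takes than `fast`, in wall-clock time."""
    return slow["real"] / fast["real"]


def wall_over_cpu(t):
    """Wall-clock time as a fraction of total processor time."""
    return t["real"] / cpu_time(t)


# ASSUMPTION (not measured in the thread): on a single core, wall-clock time
# is approximated by total processor time, ignoring any parallel-GC overhead.
single_core_ratio = cpu_time(polyml_built) / cpu_time(mlton_built)

print(f"wall-clock ratio:       {wall_ratio(polyml_built, mlton_built):.2f}X")
print(f"Poly/ML wall/CPU:       {wall_over_cpu(polyml_built):.2f}")
print(f"single-core estimate:   {single_core_ratio:.2f}X")
```

The wall/CPU fraction for the Poly/ML-built binary is what backs the "about a quarter" remark; the single-core estimate should be read as a rough bound, not a prediction.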
And, here are the resulting executables compiling MLton:
[mtf@fenrir mlton]$ /usr/bin/time ../build.polyml/bin/mlton -verbose 2 mlton.mlb
MLton starting
...
MLton finished in 728.89 + 49105.93 (99% GC)
     9615.53 real     49693.05 user       142.17 sys

[mtf@fenrir mlton]$ /usr/bin/time ../build.mlton/bin/mlton -verbose 2 mlton.mlb
MLton starting
...
MLton finished in 209.28 + 52.13 (20% GC)
      262.58 real       254.47 user         7.06 sys
So, paying about 36.6X run time to use Poly/ML instead of MLton (looking at wall-clock time). Of course, it's clear that the Poly/ML-compiled MLton is essentially GC bound when compiling MLton. Also, I didn't do anything special with adjusting Poly/ML's heap parameters --- I'm sure one could do better, if not much better (but, default behavior is the one that makes the first impression). In any case, one is still paying about 3.5X run time to use Poly/ML instead of MLton (looking at mutator time, as reported by Timer.checkCPUTimes).
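To separate the GC effect from the mutator effect, one can pull the two numbers out of the "MLton finished in A + B (N% GC)" lines. The sketch below assumes A is non-GC (mutator) time and B is GC time, which is consistent with the printed percentages; the regular expression and helper names are illustrative only and are not part of MLton.

```python
import re

# Matches summary lines such as "MLton finished in 209.28 + 52.13 (20% GC)".
# ASSUMPTION: first number is mutator (non-GC) time, second is GC time.
FINISHED = re.compile(r"MLton finished in ([\d.]+) \+ ([\d.]+) \((\d+)% GC\)")


def parse_finished(line):
    """Return (mutator_seconds, gc_seconds) from a summary line."""
    m = FINISHED.search(line)
    if m is None:
        raise ValueError(f"not a 'MLton finished' line: {line!r}")
    return float(m.group(1)), float(m.group(2))


def gc_fraction(mutator, gc):
    """Share of reported time spent in GC."""
    return gc / (mutator + gc)


# Summary lines copied from the two self-compile runs.
poly_mut, poly_gc = parse_finished("MLton finished in 728.89 + 49105.93 (99% GC)")
ml_mut, ml_gc = parse_finished("MLton finished in 209.28 + 52.13 (20% GC)")

wall_ratio = 9615.53 / 262.58      # "real" times from /usr/bin/time
mutator_ratio = poly_mut / ml_mut  # GC time excluded

print(f"GC share, Poly/ML-built: {gc_fraction(poly_mut, poly_gc):.1%}")
print(f"GC share, MLton-built:   {gc_fraction(ml_mut, ml_gc):.1%}")
print(f"wall-clock ratio:        {wall_ratio:.1f}X")
print(f"mutator-only ratio:      {mutator_ratio:.1f}X")
```

Comparing the mutator-only ratio with the wall-clock ratio shows how much of the gap comes from garbage collection under default heap settings.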
I'm sure that there are some aspects of the MLton code base that make it more suitable for compilation by MLton than by other SML compilers, but I'd guess that this gives a reasonable estimate: pay a (one-time) 1.5X-2.0X compile time to use MLton instead of Poly/ML to gain a (many-time) 0.33X-0.5X run time. Maybe not the right trade off for development, but quite possibly the right trade off for deployment --- which is precisely the kind of scenario that Gergely had in mind.
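One way to make the one-time versus many-time trade off concrete is a break-even count: how many runs of the resulting executable it takes before the saved run time exceeds the extra build time. This is a minimal Python sketch using the wall-clock figures from this message; `break_even_runs` is a hypothetical helper, and the result only applies to these particular workloads on this machine.

```python
import math

# Extra one-time cost of building MLton with MLton rather than Poly/ML
# (wall-clock seconds from the two `make` runs).
extra_build = 305.59 - 202.07


def break_even_runs(extra_build_seconds, slow_run_seconds, fast_run_seconds):
    """Smallest number of runs after which the faster binary has paid back
    its extra build time."""
    saving = slow_run_seconds - fast_run_seconds
    if saving <= 0:
        raise ValueError("the 'fast' build is not faster; no break-even point")
    return math.ceil(extra_build_seconds / saving)


# Workload 1: compiling HaMLet (17.99 s vs 8.59 s wall-clock).
hamlet_runs = break_even_runs(extra_build, 17.99, 8.59)

# Workload 2: compiling MLton itself (9615.53 s vs 262.58 s wall-clock).
self_compile_runs = break_even_runs(extra_build, 9615.53, 262.58)

print("runs to break even (HaMLet):      ", hamlet_runs)
print("runs to break even (self-compile):", self_compile_runs)
```

For a tool that is built once and run many times, a small break-even count is what favors the slower compiler at deployment time.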
-Matthew