On Tue, 2 Apr 2013, Matthew Fluet wrote:
> There are some (admittedly, outdated in terms of compiler versions) performance comparisons of ML compilers at: http://mlton.org/Performance
> In general, though, if MLton does better than Poly/ML on micro-benchmarks, then I would imagine that it would tend to do better than Poly/ML on larger programs, where there are more opportunities for whole-program optimization.
> Of course, it also depends on the "real application" itself. No amount of compiler optimization can help if your application is I/O bound.
> Also, Poly/ML supports some kinds of applications such as Isabelle that require the ability to dynamically enter new code, which isn't compatible with MLton's compilation strategy.
I don't want to say anything inappropriate about MLton -- we cannot use it in Isabelle, due to the inherent alternation of compilation and execution that is never really finished, so we will never know its performance there. The way Isabelle and similar theorem provers from the HOL family work violates the basic assumptions of whole-program optimization.
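To make concrete what "dynamically enter new code" means here, below is a minimal sketch of my own (not part of the discussion above), assuming only the optional Basis top-level function use : string -> unit that the Poly/ML and SML/NJ interactive systems provide; MLton omits it, since whole-program optimization requires all code to be present at compile time. The generated fib source merely stands in for freshly produced prover code.

  (* write some new SML source at runtime and compile + run it immediately;
     assumes the optional top-level  use : string -> unit  (Poly/ML, SML/NJ) *)
  val src =
    "fun fib 0 = 0 | fib 1 = 1 | fib n = fib (n - 1) + fib (n - 2); \
    \val _ = print (Int.toString (fib 25) ^ \"\\n\");"

  val () =
    let
      val path = OS.FileSys.tmpName ()
      val out = TextIO.openOut path
    in
      TextIO.output (out, src);
      TextIO.closeOut out;
      use path   (* compilation happens now, long after this program started *)
    end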
Incidentally, the main "benchmark" for Isabelle/ML is building the Isabelle/HOL image, and that also includes a lot of compile time. It needs both online compilation, and *fast* online compilation. (Presently Isabelle/HOL requires about 1:30 min on recent consumer CPUs such as an i7 with 4 cores * hyperthreading; historically it took up to 25 min, although the image was much smaller back then.)
Note that the extrapolation from micro-benchmarks to real applications was already done by the SML/NJ guys many years ago. According to that, Isabelle on SML/NJ should have been much faster than on Poly/ML, but historically it was always a factor of 1.2 .. 2 slower, even at its best. Now SML/NJ is approx. 40..100 times slower. Two things proved deadly for SML/NJ:
* Poor scalability of heap management (anything beyond approx. 100 MB gets really slow). So "I/O" should also include data moved between the CPU and the memory subsystem -- see the small sketch after these two points.
* Lack of support for multicore systems.
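To give the flavour of the heap point, here is a tiny self-made sketch (not a benchmark that was actually run) that times a large allocation with the standard Basis Timer structure; on implementations whose heap management scales poorly, the measured time degrades badly once the live data reaches hundreds of MB.

  (* allocate roughly n list cells, keep them live, report the elapsed time *)
  fun time_alloc n =
    let
      val timer = Timer.startRealTimer ()
      val xs = List.tabulate (n, fn i => i)   (* large heap-resident structure *)
      val elapsed = Timer.checkRealTimer timer
    in
      (List.length xs, Time.toReal elapsed)   (* (cells, seconds) *)
    end

  (* e.g. time_alloc 10000000 builds a list occupying a few hundred MB *)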
Any benchmark these days should routinely include parallel processing, but we should be glad that a few surviving implementations of Standard ML support multicore hardware at all. (OCaml is really in a tight spot there -- maybe some users can escape to F#.)
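As an aside, here is roughly what such multicore support looks like at the language level: a hedged sketch of a naive parallel map using Poly/ML's Thread structure (Thread.fork plus a mutex and condition variable for the join). These names are Poly/ML-specific rather than portable Standard ML, and actual Isabelle/ML code goes through its Future library rather than raw threads; the sketch merely forks one thread per list element and waits for all of them.

  (* naive parallel map: fork one Poly/ML thread per element, then wait for all *)
  fun par_map f xs =
    let
      open Thread  (* Poly/ML-specific: substructures Thread, Mutex, ConditionVar *)
      val lock = Mutex.mutex ()
      val finished = ConditionVar.conditionVar ()
      val pending = ref (length xs)
      val results = map (fn _ => ref NONE) xs
      fun run (x, r) () =
        (r := SOME (f x);
         Mutex.lock lock;
         pending := !pending - 1;
         ConditionVar.signal finished;
         Mutex.unlock lock)
      val () =
        ListPair.app (fn (x, r) => ignore (Thread.fork (run (x, r), [])))
          (xs, results)
      val () =
        (Mutex.lock lock;
         while !pending > 0 do ConditionVar.wait (finished, lock);
         Mutex.unlock lock)
    in
      map (fn r => valOf (!r)) results
    end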
Makarius