Hi David and Matthew,
thanks for your encouraging answers. The SML code directly generated from Isabelle/HOL is available from the entry [ADb] in the paper bibliography, at http://www.unirioja.es/cu/jodivaso/Isabelle/Gauss-Jordan/ under the link "View the generated code".
The matrices used for the benchmarks are in a zip file (9 MB compressed, 160 MB uncompressed) under the link "Download the benchmarks" on the same website. Uncompressing the archive produces three folders (RAT, REAL, Z2). Each of these folders contains a "Readme" file describing how we carried out the computations (we would be very glad to hear if there is a more appropriate way, in both MLton and Poly/ML).
The segmentation fault was raised by Poly/ML 5.5 while trying to compute the matrix "Benchmarks/REAL/800_real.sml". There is also the performance difference between Poly/ML 5.5 and Poly/ML 5.2. As an example, the matrix "Benchmarks/REAL/200_real.sml" takes 3.7 (processing) + 0.7 (executing) seconds with Poly/ML 5.5 and 31.0 (processing) + 0.4 (executing) seconds with Poly/ML 5.2 (the version shipped as an Ubuntu package); the same matrix took 9145.0 (processing) + 0.5 (executing) seconds with MLton.
Thanks in advance for your advice; do not hesitate to contact us with further questions,
Jesus
On Thu, Jul 25, 2013 at 4:59 PM, Matthew Fluet <matthew.fluet at gmail.com> wrote:
On Thu, Jul 25, 2013 at 8:03 AM, David Matthews <David.Matthews at prolingua.co.uk> wrote:
On 25/07/2013 11:29, Jesus Aransay wrote:
José Divasón and I wrote a short paper for a Spanish workshop on functional programming ("Taller de Programación Funcional 2013", http://babel.ls.fi.upm.es/tpf2013/) in which we test an SML program, obtained with the Isabelle code generator tool, that performs the Gauss-Jordan elimination algorithm over matrices.
That is interesting work. Is it possible to get a copy of the code you ran for the various tests so I can check this out?
Seconded! I would love to investigate why MLton is performing so poorly on the input processing.
-Matthew
-- Jesús María Aransay Azofra Universidad de La Rioja Dpto. de Matemáticas y Computación tlf.: (+34) 941299438 fax: (+34) 941299460 mail: jesus-maria.aransay at unirioja.es ; web: http://www.unirioja.es/cu/jearansa Edificio Luis Vives, c/ Luis de Ulloa s/n, 26004 Logroño, España
On Fri, Jul 26, 2013 at 6:07 AM, Jesus Aransay <jesus-maria.aransay at unirioja.es> wrote:
The matrices used for the benchmarks are in a zip file (9 MB compressed, 160 MB uncompressed) under the link "Download the benchmarks" on the same website. Uncompressing the archive produces three folders (RAT, REAL, Z2). Each of these folders contains a "Readme" file describing how we carried out the computations (we would be very glad to hear if there is a more appropriate way, in both MLton and Poly/ML).
Jesus,
Thanks for the link to the benchmarks (and sorry I didn't find them by following the links in your paper).
I wasn't expecting your benchmarks to "load" the matrices by including a (very large) SML expression in the program. For MLton, at least, this explains the large "processing" times --- which are really "compile" times. MLton isn't doing very well with this very large constant data; actually, it seems that one of the intermediate-representation type checkers is very slow on this kind of large constant data, as are a couple of optimization passes. I was expecting that the matrices would be loaded by opening and parsing a text or binary file, in which case the "processing" time would have been the I/O time. Note that by including the matrices in the program as static data, a compiler could choose to completely evaluate the Gauss-Jordan elimination at compile-time.
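For concreteness, a run-time loader of the kind described above might look roughly like the following sketch (the file-name argument and the whitespace-separated, one-row-per-line layout are assumptions for illustration, not the benchmarks' actual format):

    (* Sketch: parse a matrix of reals from a text file at run time, so
       that "processing" becomes I/O plus parsing rather than compilation
       of a huge constant expression. *)
    fun loadMatrix (fileName : string) : real list list =
      let
        val ins = TextIO.openIn fileName
        fun parseLine line =
          List.mapPartial Real.fromString
            (String.tokens Char.isSpace line)
        fun loop acc =
          case TextIO.inputLine ins of
            NONE => List.rev acc
          | SOME line => loop (parseLine line :: acc)
        val rows = loop []
      in
        TextIO.closeIn ins;
        rows
      end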
For MLton, compiling with "-profile time" incurs some cost; some optimizations are not as effective in the presence of profiling and there will be a runtime cost to sampling the program during execution. It would be better to simply use "time" to measure the run time, as you do to measure the compile time. I don't know if "PolyML.timing true" incurs a significant overhead. Another option would be to use the SML "Timer" and/or "Time" structures to do the timing; that might have the experimental advantage of using the same code to measure the time for both MLton and Poly/ML.
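As an illustration of the last option, a small timing wrapper built only on the Basis Library's Timer and Time structures could be shared verbatim between the MLton and Poly/ML runs (the measured function and its argument in the usage comment are placeholders):

    (* Sketch: measure user, system and wall-clock time of one call. *)
    fun timeIt (f : 'a -> 'b) (x : 'a) : 'b =
      let
        val cpu  = Timer.startCPUTimer ()
        val wall = Timer.startRealTimer ()
        val result = f x
        val {usr, sys} = Timer.checkCPUTimer cpu
        val elapsed = Timer.checkRealTimer wall
      in
        print ("usr " ^ Time.toString usr ^ "s  sys " ^ Time.toString sys ^
               "s  real " ^ Time.toString elapsed ^ "s\n");
        result
      end
    (* e.g.  val reduced = timeIt gauss_jordan m   (placeholder names) *)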
-Matthew
Hi Jesus,
Thank you for sending that. I've now had a chance to have a look at it.
Building the large matrices by a single top-level val binding blows up very badly, particularly with Poly/ML 5.5. The good news is that this has been fixed in SVN. Even compiling 800_real.sml, which is over 20 Mbytes of ML source, takes only around 2 minutes with the compiler from SVN trunk.
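Purely as a schematic illustration of what "a single top-level val binding" means here (this is not the actual Isabelle-generated code, and the concrete element type and container may differ), the benchmark files embed each matrix roughly like this:

    (* Schematic only: one huge literal constant bound at top level. *)
    val matrix : real list list =
      [[1.0, 2.0, 3.0 (* ... one literal per column ... *)],
       [4.0, 5.0, 6.0 (* ... *)]
       (* ... one literal row per matrix row ... *)]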
I haven't been able to reproduce the seg fault with Poly/ML 5.5 so far, at least with 5.5 from the SVN fixes branch. I eventually killed it after several hours of running. Did this happen only once or repeatedly?
I've looked at the difference between 5.2 and 5.5 to see if I could see what was happening. A significant difference between these two versions is that since Poly/ML 5.4 the X86 code-generator has generated floating point operations in-line. This makes a very significant difference to functions that do floating point calculations because it means that intermediate results are retained in floating point registers, whereas the older versions stored away the result of every floating point operation in the heap ("boxed" them). It's not clear whether your code would benefit from this and it's possible that there might be some extra cost. Certainly the profiling I did showed the same amount of heap allocation with 5.2 and 5.5.
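To make the boxing point concrete, consider a toy function like the one below (not taken from the benchmarks): under Poly/ML 5.4 and later the intermediate products a * x and b * y can stay in floating point registers, whereas 5.2 allocated ("boxed") each intermediate result on the heap.

    (* Toy example: two intermediate floating point results. *)
    fun axpy2 (a : real, x : real, b : real, y : real) : real =
      a * x + b * y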
A major difficulty with measuring timing when the runs are only a few seconds is that garbage-collection can have a significant effect. Depending on what has happened before the run starts it is possible that a GC may happen during the run, distorting the figures. With short runs there will probably be only a single GC if it happens at all. Runs of only a few seconds are a useful indication but to get consistent figures it's probably necessary for a run to take several minutes and involve multiple garbage collections.
I hope that's useful feedback. Thanks for posting your results.
Best regards, David
On Tue, 2013-07-30 at 13:39 +0100, David Matthews wrote:
A major difficulty with measuring timing when the runs are only a few seconds is that garbage-collection can have a significant effect. Depending on what has happened before the run starts it is possible that a GC may happen during the run, distorting the figures. With short runs there will probably be only a single GC if it happens at all. Runs of only a few seconds are a useful indication but to get consistent figures it's probably necessary for a run to take several minutes and involve multiple garbage collections.
It may be worth calling "PolyML.fullGC ()" immediately before short benchmarks. Together with suitable memory parameters, this could perhaps avoid GC during short runs.
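A sketch of that idea, with a placeholder computation standing in for the real benchmark (PolyML.fullGC is Poly/ML specific; MLton.GC.collect () is the MLton counterpart):

    (* Sketch: force a full collection just before the timed region so a
       short run is less likely to include a GC. *)
    fun timeAfterFullGC (compute : 'a -> 'b) (x : 'a) : 'b * Time.time =
      let
        val () = PolyML.fullGC ()
        val t0 = Timer.startRealTimer ()
        val result = compute x
      in
        (result, Timer.checkRealTimer t0)
      end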
Best, Tjark