I've now merged a major piece of development work into GitHub master. This is a variant of 64-bit mode that uses 32-bit values. It was triggered by the observation that Isabelle often peformed better with the 32-bit version than the 64-bit despite the 64-bit version allowing a much larger heap and having more registers. It appears that Isabelle, at any rate, is limited more by the bandwidth between the processors and main memory than by the processors themselves. This may well be the result of the work on parallelising the application and the garbage collector.
In effect there are now three versions on the X86: native 64-bit, native 32-bit and 32-bits on 64-bits. The new version can be built with the --enable-compact32bit option to configure. It allows a heap size of up to 16Gbytes by using 31-bit "object ids", essentially indexes into an array of 8-byte cells. This heap contains only the data. Code, thread stacks and other support structures are outside this.
It has been extensively tested with Isabelle (thanks to Makarius) and seems to perform better than either of the native address versions. I started the development about a year ago; development of the final stages were supported by Data61 and CSIRO through Gerwin Klein.
I would be interested in feedback on this and also the native address versions. Whether it well help other applications may depend on whether the limitation is processor speed or memory bandwidth. It's likely that many applications will continue to perform best with native addressing.
The update also includes changes to the X86 code-generator that affect all the variants. These changes improve the way boolean values and conditionals are handled. There also other changes to the IO system. Garbage-collection of streams has been removed and overlapped IO is now used on Windows which should improve performance and responsiveness. There have been changes to the handling of floating point and Real32 has been added.
Let me have any feedback.
Regards, David
Hello. I got one error with compact32bit on 64bit:
(gdb) bt #0 0x00000000821f5400 in ?? () #1 0x00000000ff9ab2b0 in ?? () #2 0x00000000ff9ab2b2 in ?? () #3 0x0000000d021fff10 in ?? () #4 0x0000000d821f9e1c in ?? () #5 0x0000000d821f9f00 in ?? () #6 0x0000000d021fff38 in ?? () #7 0x00000000fffc3c10 in ?? () #8 0x00000000fffc3c14 in ?? () #9 0x00000000fffc3c26 in ?? () #10 0x0000000d821f9a25 in ?? () #11 0x0000000d021fff98 in ?? () #12 0x00000000fffc3c2e in ?? () #13 0x00000000fffc04b2 in ?? () #14 0x00000000fffbf892 in ?? () #15 0x0000000d821ea5b2 in ?? () #16 0x00000000fffc3c58 in ?? () #17 0x00000000fffc04b2 in ?? () #18 0x00000000fffbf928 in ?? () #19 0x00000000fffbf916 in ?? () #20 0x0000000d821e9646 in ?? () #21 0x0000000d821e29c8 in ?? () #22 0x0000000d821e29d7 in ?? () #23 0x0000000d021fffc0 in ?? () #24 0x00000000fffbf4a0 in ?? () #25 0x00000000fffbf4aa in ?? () #26 0x0000000d821e21ac in ?? () #27 0x0000000d821e21b6 in ?? () #28 0x0000000d021fffd8 in ?? () #29 0x0000000d821e203a in ?? () #30 0x0000000d821e204a in ?? () #31 0x0000000d021ffff0 in ?? () #32 0x00000008008acab5 in X86AsmRaiseException () from /home/nick/polyml_compact32bit_debug/lib/libpolyml.so.9 #33 0x00000008008acab5 in X86AsmRaiseException () from /home/nick/polyml_compact32bit_debug/lib/libpolyml.so.9 #34 0x0000000000000001 in ?? () Backtrace stopped: Cannot access memory at address 0xd02200000
??, 20 ???. 2019 ?. ? 13:21, David Matthews <David.Matthews at prolingua.co.uk>:
I've now merged a major piece of development work into GitHub master. This is a variant of 64-bit mode that uses 32-bit values. It was triggered by the observation that Isabelle often peformed better with the 32-bit version than the 64-bit despite the 64-bit version allowing a much larger heap and having more registers. It appears that Isabelle, at any rate, is limited more by the bandwidth between the processors and main memory than by the processors themselves. This may well be the result of the work on parallelising the application and the garbage collector.
In effect there are now three versions on the X86: native 64-bit, native 32-bit and 32-bits on 64-bits. The new version can be built with the --enable-compact32bit option to configure. It allows a heap size of up to 16Gbytes by using 31-bit "object ids", essentially indexes into an array of 8-byte cells. This heap contains only the data. Code, thread stacks and other support structures are outside this.
It has been extensively tested with Isabelle (thanks to Makarius) and seems to perform better than either of the native address versions. I started the development about a year ago; development of the final stages were supported by Data61 and CSIRO through Gerwin Klein.
I would be interested in feedback on this and also the native address versions. Whether it well help other applications may depend on whether the limitation is processor speed or memory bandwidth. It's likely that many applications will continue to perform best with native addressing.
The update also includes changes to the X86 code-generator that affect all the variants. These changes improve the way boolean values and conditionals are handled. There also other changes to the IO system. Garbage-collection of streams has been removed and overlapped IO is now used on Windows which should improve performance and responsiveness. There have been changes to the handling of floating point and Real32 has been added.
Let me have any feedback.
Regards, David
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
Thanks for the report. Did you see anything similar with the native code version? It could be a code-generator problem which would affect other versions or it could be something specific to 32-in-64. Unfortunately the stack trace doesn't tell me much other than that it looks as though it failed in ML code. Anything you can do to narrow down the problem and make it reproducible would be helpful.
Regards, David
On 22/01/2019 09:32, Kostirya wrote:
Hello. I got one error with compact32bit on 64bit:
(gdb) bt #0 0x00000000821f5400 in ?? () #1 0x00000000ff9ab2b0 in ?? () #2 0x00000000ff9ab2b2 in ?? () #3 0x0000000d021fff10 in ?? () #4 0x0000000d821f9e1c in ?? () #5 0x0000000d821f9f00 in ?? () #6 0x0000000d021fff38 in ?? () #7 0x00000000fffc3c10 in ?? () #8 0x00000000fffc3c14 in ?? () #9 0x00000000fffc3c26 in ?? () #10 0x0000000d821f9a25 in ?? () #11 0x0000000d021fff98 in ?? () #12 0x00000000fffc3c2e in ?? () #13 0x00000000fffc04b2 in ?? () #14 0x00000000fffbf892 in ?? () #15 0x0000000d821ea5b2 in ?? () #16 0x00000000fffc3c58 in ?? () #17 0x00000000fffc04b2 in ?? () #18 0x00000000fffbf928 in ?? () #19 0x00000000fffbf916 in ?? () #20 0x0000000d821e9646 in ?? () #21 0x0000000d821e29c8 in ?? () #22 0x0000000d821e29d7 in ?? () #23 0x0000000d021fffc0 in ?? () #24 0x00000000fffbf4a0 in ?? () #25 0x00000000fffbf4aa in ?? () #26 0x0000000d821e21ac in ?? () #27 0x0000000d821e21b6 in ?? () #28 0x0000000d021fffd8 in ?? () #29 0x0000000d821e203a in ?? () #30 0x0000000d821e204a in ?? () #31 0x0000000d021ffff0 in ?? () #32 0x00000008008acab5 in X86AsmRaiseException () from /home/nick/polyml_compact32bit_debug/lib/libpolyml.so.9 #33 0x00000008008acab5 in X86AsmRaiseException () from /home/nick/polyml_compact32bit_debug/lib/libpolyml.so.9 #34 0x0000000000000001 in ?? () Backtrace stopped: Cannot access memory at address 0xd02200000
Hello. I localized the error finally. Whew! Yes, it is *only* on 32-in-64 and it is reproducible with 100% on FreeBSD and CentOS. I prepared simple code for reproduce the error. It is FFI callback.
cat callback.c
void foo ( int n, void (*fn) (int) ) { int i = 1; while (n--) { (*fn)(i++); } return; }
cat cb-poly.sml
open Foreign val mylib = loadLibrary "callback.so" val foo = buildCall2 ((getSymbol mylib "foo"), (cInt, cFunction), cVoid) val cb = buildClosure1 ((fn v => print ((Int.toString v) ^ "\n")), cInt, cVoid) val _ = foo(3, cb)
cc -shared -o callback.so callback.c env LD_LIBRARY_PATH=. poly --script cb-poly.sml
??, 22 ???. 2019 ?. ? 17:51, David Matthews <David.Matthews at prolingua.co.uk>:
Thanks for the report. Did you see anything similar with the native code version? It could be a code-generator problem which would affect other versions or it could be something specific to 32-in-64. Unfortunately the stack trace doesn't tell me much other than that it looks as though it failed in ML code. Anything you can do to narrow down the problem and make it reproducible would be helpful.
Regards, David
On 22/01/2019 09:32, Kostirya wrote:
Hello. I got one error with compact32bit on 64bit:
(gdb) bt #0 0x00000000821f5400 in ?? () #1 0x00000000ff9ab2b0 in ?? () #2 0x00000000ff9ab2b2 in ?? () #3 0x0000000d021fff10 in ?? () #4 0x0000000d821f9e1c in ?? () #5 0x0000000d821f9f00 in ?? () #6 0x0000000d021fff38 in ?? () #7 0x00000000fffc3c10 in ?? () #8 0x00000000fffc3c14 in ?? () #9 0x00000000fffc3c26 in ?? () #10 0x0000000d821f9a25 in ?? () #11 0x0000000d021fff98 in ?? () #12 0x00000000fffc3c2e in ?? () #13 0x00000000fffc04b2 in ?? () #14 0x00000000fffbf892 in ?? () #15 0x0000000d821ea5b2 in ?? () #16 0x00000000fffc3c58 in ?? () #17 0x00000000fffc04b2 in ?? () #18 0x00000000fffbf928 in ?? () #19 0x00000000fffbf916 in ?? () #20 0x0000000d821e9646 in ?? () #21 0x0000000d821e29c8 in ?? () #22 0x0000000d821e29d7 in ?? () #23 0x0000000d021fffc0 in ?? () #24 0x00000000fffbf4a0 in ?? () #25 0x00000000fffbf4aa in ?? () #26 0x0000000d821e21ac in ?? () #27 0x0000000d821e21b6 in ?? () #28 0x0000000d021fffd8 in ?? () #29 0x0000000d821e203a in ?? () #30 0x0000000d821e204a in ?? () #31 0x0000000d021ffff0 in ?? () #32 0x00000008008acab5 in X86AsmRaiseException () from /home/nick/polyml_compact32bit_debug/lib/libpolyml.so.9 #33 0x00000008008acab5 in X86AsmRaiseException () from /home/nick/polyml_compact32bit_debug/lib/libpolyml.so.9 #34 0x0000000000000001 in ?? () Backtrace stopped: Cannot access memory at address 0xd02200000
Hi, Thank you for narrowing the error down. It should now be fixed.
There were a lot of cases where code that worked correctly on the native address versions failed in the 32-in-64 version and had to be modified. I'm sure there are still some cases that haven't been tested.
Regards, David
On 24/01/2019 08:08, Kostirya wrote:
Hello. I localized the error finally. Whew! Yes, it is *only* on 32-in-64 and it is reproducible with 100% on FreeBSD and CentOS. I prepared simple code for reproduce the error. It is FFI callback.
cat callback.c
void foo ( int n, void (*fn) (int) ) { int i = 1; while (n--) { (*fn)(i++); } return; }
cat cb-poly.sml
open Foreign val mylib = loadLibrary "callback.so" val foo = buildCall2 ((getSymbol mylib "foo"), (cInt, cFunction), cVoid) val cb = buildClosure1 ((fn v => print ((Int.toString v) ^ "\n")), cInt, cVoid) val _ = foo(3, cb)
cc -shared -o callback.so callback.c
I observed about 10% performance improvements in HOL4, between 5.7.1 release (x86_64) and latest master with the --enable-compact32bit option.
?Chun
Il giorno 20 gen 2019, alle ore 19:17, David Matthews <David.Matthews at prolingua.co.uk> ha scritto:
I've now merged a major piece of development work into GitHub master. This is a variant of 64-bit mode that uses 32-bit values. It was triggered by the observation that Isabelle often peformed better with the 32-bit version than the 64-bit despite the 64-bit version allowing a much larger heap and having more registers. It appears that Isabelle, at any rate, is limited more by the bandwidth between the processors and main memory than by the processors themselves. This may well be the result of the work on parallelising the application and the garbage collector.
In effect there are now three versions on the X86: native 64-bit, native 32-bit and 32-bits on 64-bits. The new version can be built with the --enable-compact32bit option to configure. It allows a heap size of up to 16Gbytes by using 31-bit "object ids", essentially indexes into an array of 8-byte cells. This heap contains only the data. Code, thread stacks and other support structures are outside this.
It has been extensively tested with Isabelle (thanks to Makarius) and seems to perform better than either of the native address versions. I started the development about a year ago; development of the final stages were supported by Data61 and CSIRO through Gerwin Klein.
I would be interested in feedback on this and also the native address versions. Whether it well help other applications may depend on whether the limitation is processor speed or memory bandwidth. It's likely that many applications will continue to perform best with native addressing.
The update also includes changes to the X86 code-generator that affect all the variants. These changes improve the way boolean values and conditionals are handled. There also other changes to the IO system. Garbage-collection of streams has been removed and overlapped IO is now used on Windows which should improve performance and responsiveness. There have been changes to the handling of floating point and Real32 has been added.
Let me have any feedback.
Regards, David
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
Thanks, that's interesting. Have you tried the latest master with native 64-bits? It would be useful to know whether it's the 32-in-64 that makes the difference for HOL4 or the changes to the code-generator.
Regards, David
On 22/01/2019 11:42, Chun Tian (binghe) wrote:
I observed about 10% performance improvements in HOL4, between 5.7.1 release (x86_64) and latest master with the --enable-compact32bit option.
?Chun
Hi David,
the short answer is, comparing with 5.7.1 release, both the latest master and the 32-in-64 option make the difference, but actually not that big as I reported yesterday.
Here are the test steps (of each round):
1. I do a full build of HOL4 in my working branch (in which the probability theories are greatly changed, also with one single theorem of 1500 lines of scripts). 2. I touched a base file (util_probScript.sml) in HOL?s ?src/probability?, then re-do the incremental build using single processor (but without generating theory html files at the end), thus all computation were used on compiling proof scripts in ?src/probability?.
I used Unix ?time? command to record the ?real" time spent on step 2, using three PolyML versions: 1) 5.7.1 release, 2) latest master (x64), 3) latest master (32-in-64). All tests were repeated 3 times, then I take the average time as the calculating basis. Below is the record:
[1] Poly/ML 5.7.1 Release 2m13.589s (133.589s) 2m11.944s (131.944s) 2m12.155s (132.155s) (average: 132.563s)
[2] Poly/ML 5.7.1 Release (Git version v5.7.1-481-g0a6ebca4) 2m11.569s (131.569s) 2m9.437s (129.437s) 2m9.820s (129.820s) (average: 130.275s)
[3] Poly/ML 5.7.1 32-in-64 (Git version v5.7.1-481-g0a6ebca4) 2m6.087s (126.087s) 2m7.180s (127.180s) 2m7.077s (127.077s) (average: 126.781s)
Thus, from 5.7.1 release to latest master (x64), I observed about 1.7% performance increments; from latest master (x64) to latest master (32-in-64), I observed another 2.7% performance increments; in total, from 5.7.1 release to latest master (32-in-64) there?re about 4.3% performance increments.
My observation yesterday (10%) was based on the time spent on the best single proof script. But actually from one version of PolyML to another, not all proof scripts get compiled faster. But I believe the total time spent on building the whole HOL4 should follow the same trends.
P. S. I think PolyML is already fast enough for HOL4, especially when you see how slow the Moscow ML builds are:) In Isabelle, every time when the user click inside the JEdit editor, it may trigger some computation at the PolyML side, I guess the performance issue is more urgent there.
Hope this helps,
Chun
Il giorno 22 gen 2019, alle ore 23:53, David Matthews <David.Matthews at prolingua.co.uk> ha scritto:
Thanks, that's interesting. Have you tried the latest master with native 64-bits? It would be useful to know whether it's the 32-in-64 that makes the difference for HOL4 or the changes to the code-generator.
Regards, David
On 22/01/2019 11:42, Chun Tian (binghe) wrote:
I observed about 10% performance improvements in HOL4, between 5.7.1 release (x86_64) and latest master with the --enable-compact32bit option. ?Chun
On 23/01/2019 08:42, Chun Tian (binghe) wrote:
P. S. I think PolyML is already fast enough for HOL4, especially when you see how slow the Moscow ML builds are:) In Isabelle, every time when the user click inside the JEdit editor, it may trigger some computation at the PolyML side, I guess the performance issue is more urgent there.
BTW: jEdit is just a plain text editor from http://www.jedit.org -- what you mean is generally called Prover IDE or PIDE, the specific product "Isabelle/jEdit", you also call it just "Isabelle" according to the application title.
The most dire need for Poly/ML performance is actually in batch builds, e.g. when maintaining the ever growing Archive of Formal Proofs (lets say after changing some notation or theorem names).
The Prover IDE seemingly wastes a lot of resources, but these are turned into direct prover interaction for the benefit of the user: it helps to do proofs efficiently. This interaction model has also motivated users to trim down slow proofs to what is really required: in recent years it had a beneficial impact on batch builds as well.
Makarius