This weekend, I attempted to build polyml 5.8 for Fedora (Rawhide for now, hopefully Fedora 30 later). I encountered issues with the 32-bit ARM build and the s390x build.
On s390x, immediately after the C++ code is compiled, polyimport is executed for the first time:
./polyimport polytemp.txt -I . < ./exportPoly.sml
/bin/sh: line 1: 33283 Segmentation fault (core dumped) ./polyimport polytemp.txt -I . < ./exportPoly.sml
GDB says:
(gdb) bt
#0 0x000003fffde56ce6 in PolyObject::Get (this=0x8017c8fcff030000, i=0) at globals.h:311
#1 0x000003fffdebbfce in IntTaskData::SwitchToPoly (this=0x10336d0) at interpret.cpp:761
#2 0x000003fffdec2830 in IntTaskData::EnterPolyCode (this=0x10336d0) at interpret.cpp:2261
#3 0x000003fffde92eea in NewThreadFunction (parameter=0x10336d0) at processes.cpp:1307
#4 0x000003fffdb89da6 in start_thread () from /usr/lib64/libpthread.so.0
#5 0x000003fffdd01b06 in thread_start () from /usr/lib64/libc.so.6
Notice the "this" parameter in frame 0. That is an invalid address ... unless one reverses the bytes to get 0x000003fffcc81780, which is a valid address. The code that produces the argument to indirect_0 has apparently done so in little endian byte order. If you can tell me where to look, I can help debug this further. The presence or absence of --enable-compact32bit makes no difference; the segfault happens either way.
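The byte reversal described above can be checked mechanically. A minimal Python sketch follows; the helper name byteswap64 is made up for illustration, and only the two addresses come from the backtrace and the paragraph above:

```python
def byteswap64(value: int) -> int:
    """Reverse the byte order of a 64-bit unsigned integer."""
    # Serialise as 8 big-endian bytes, then reinterpret them as little-endian.
    return int.from_bytes(value.to_bytes(8, "big"), "little")


bad_this = 0x8017C8FCFF030000  # "this" in frame 0 of the backtrace
swapped = byteswap64(bad_this)

# Print both with a fixed width so the leading zero bytes stay visible.
print(f"{bad_this:#018x} -> {swapped:#018x}")
```

The swapped value matches the 0x000003fffcc81780 quoted above, which is consistent with the pointer having been assembled in little-endian byte order on a big-endian machine.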
On 32-bit ARM, the build succeeds, but one test fails: Tests/Succeed/Test174.ML. The first two failures are on lines 34 and 35:
check(Real32.fromLarge IEEEReal.TO_ZERO pm < p32);
check(Real32.fromLarge IEEEReal.TO_ZERO mp > m32);
Entering those values by hand at the polyml command prompt I get:
Real32.fromLarge IEEEReal.TO_ZERO pm;
val it = 3.141592741: Real32.real
p32;
val it = 3.141592741: ?.Math.real
Real32.fromLarge IEEEReal.TO_ZERO mp;
val it = ~3.141592741: Real32.real
Then the tests at lines 40 and 42 fail:
check(Real32.fromLarge IEEEReal.TO_POSINF pp > p32);
check(Real32.fromLarge IEEEReal.TO_POSINF mp > m32);
Real32.fromLarge IEEEReal.TO_POSINF pp;
val it = 3.141592741: Real32.real
Real32.fromLarge IEEEReal.TO_POSINF mp;
val it = ~3.141592741: Real32.real
Likewise, the tests at lines 48 and 50 fail:
check(Real32.fromLarge IEEEReal.TO_NEGINF pm < p32);
check(Real32.fromLarge IEEEReal.TO_NEGINF mm < m32);
Real32.fromLarge IEEEReal.TO_NEGINF pm;
val it = 3.141592741: Real32.real
Real32.fromLarge IEEEReal.TO_NEGINF mm;
val it = ~3.141592741: Real32.real
The output of configure on ARM seems okay, although I will note an entirely different problem:
configure:17740: checking whether as supports .note.GNU-stack
configure:17753: armv7hl-redhat-linux-gnueabi-gcc -c -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -march=armv7-a -mfpu=vfpv3-d16 -mtune=generic-armv7-a -mabi=aapcs-linux -mfloat-abi=hard -fno-strict-aliasing -D_GNU_SOURCE conftest.c >&5
{standard input}: Assembler messages:
{standard input}:552: Error: junk at end of line, first unrecognized character is `,'
configure:17753: $? = 1
configure: failed program was:
| /* confdefs.h */
[ skip lots of #defines that don't matter]
| /* end confdefs.h. */
| __asm__(".section .note.GNU-stack,"",@progbits");
| int
| main ()
| {
|
| ;
| return 0;
| }
Taking a clue from the output of gcc -S conftest.c -fverbose-asm after removing the __asm__ line above, I changed @progbits to %progbits, and then the test worked.
Advice on how to proceed w.r.t the floating point issues with the ARM build is much appreciated. Regards,
Hi, Thanks for looking at this. The focus before the release of 5.8 was on the X86 and in particular the 32/64 version. The interpreted version wasn't really tested to the same extent. Since then I've fixed a couple of bugs that affected the interpreted version when I tested it on ARM and Mips development boards. One of these caused a segfault and the other affected some of the Real32.real arithmetic. These fixes have been pushed to git master and I'm planning to add them to the fixes-5.8 branch.
It would be very helpful if you could rerun your tests with master so I know whether the bugs you've encountered have actually been fixed.
The test for .note.GNU-stack is only relevant on the X86 so configure should probably be changed so that this test is omitted in the interpreted version.
Regards, David
On 24/03/2019 23:37, Jerry James wrote:
[...]
On Mon, Mar 25, 2019 at 10:00 AM David Matthews <David.Matthews at prolingua.co.uk> wrote:
Hi, Thanks for looking at this. The focus before the release of 5.8 was on the X86 and in particular the 32/64 version. The interpreted version wasn't really tested to the same extent. Since then I've fixed a couple of bugs that affected the interpreted version when I tested it on ARM and Mips development boards. One of these caused a segfault and the other affected some of the Real32.real arithmetic. These fixes have been pushed to git master and I'm planning to add them to the fixes-5.8 branch.
Sure, probably nearly all polyml installations are x86_64, so it makes sense to focus your efforts there.
It would be very helpful if you could rerun your tests with master so I know whether the bugs you've encountered have actually been fixed.
I just did a test build. The logs will be available here for a few days:
https://koji.fedoraproject.org/koji/taskinfo?taskID=33786018
Don't be fooled that it is still called version 5.8. I swapped out the 5.8 tarball for my own tarball made from git master. Both the s390x endianness issue and the ARM floating point issue appear to still be present in master. Note that Fedora uses these flags when building for 32-bit ARM:
-march=armv7-a -mfpu=vfpv3-d16 -mtune=generic-armv7-a -mabi=aapcs-linux -mfloat-abi=hard
Regards,
Hi Jerry, Thanks for looking at this. I've tried building on the ARM using those flags and it seems that converting a double (Real.real) to a float (Real32.real) ignores the rounding setting and always uses the nearest.
Real32.fromLarge IEEEReal.TO_NEAREST mp == m32;
val it = true: bool
Real32.fromLarge IEEEReal.TO_ZERO mp == m32;
val it = true: bool
Real32.fromLarge IEEEReal.TO_POSINF mp == m32;
val it = true: bool
Real32.fromLarge IEEEReal.TO_NEGINF mp == m32;
val it = true: bool
I don't know what to do about that. The test is useful on other platforms so it would be a shame to remove it.
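For reference, the results the failing checks expect can be reproduced independently of the hardware rounding mode. The following Python sketch emulates a directed double-to-float conversion by bit manipulation. The names from_large and f32_step are invented here, and pp, pm, mp and mm are assumed to be doubles lying just above or just below the float values p32 and m32, which is what the comparisons in Test174 suggest; this is not how Poly/ML itself implements the conversion.

```python
import math
import struct


def to_f32_nearest(x: float) -> float:
    """Round a double to the nearest float32 (struct's 'f' format does this)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_step(f: float, up: bool) -> float:
    """Adjacent float32 above (up=True) or below f; f must be finite, nonzero."""
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    # The magnitude grows with the bit pattern, so moving away from zero
    # is +1 and moving towards zero is -1, whatever the sign bit is.
    bits += 1 if (f > 0) == up else -1
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def from_large(mode: str, x: float) -> float:
    """Emulate Real32.fromLarge for finite, nonzero, in-range doubles."""
    n = to_f32_nearest(x)
    if mode == "TO_NEAREST" or n == x:
        return n
    if mode == "TO_POSINF":
        return n if n > x else f32_step(n, up=True)
    if mode == "TO_NEGINF":
        return n if n < x else f32_step(n, up=False)
    if mode == "TO_ZERO":
        return n if abs(n) < abs(x) else f32_step(n, up=(n < 0))
    raise ValueError(f"unknown rounding mode: {mode}")


p32 = to_f32_nearest(math.pi)   # float32 value of pi, held in a double
m32 = -p32
eps = 2.0 ** -40                # far below one float32 ulp at this magnitude
pp, pm = p32 + eps, p32 - eps   # assumed stand-ins for the test's pp and pm
mp, mm = m32 + eps, m32 - eps   # assumed stand-ins for the test's mp and mm

for mode in ("TO_NEAREST", "TO_ZERO", "TO_POSINF", "TO_NEGINF"):
    print(mode, from_large(mode, mp) == m32)
```

With a conversion that honours the rounding mode, only TO_NEAREST and TO_NEGINF should give m32 back for mp; the other two modes should give the next float32 above m32.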
So the s390x is big-endian? I don't have access to any big-endian hardware but I managed to find an image for a big-endian Mips machine under Qemu and confirm that there was a problem. Actually finding out what the problem is looks like quite a bit of work especially as it takes a long time to run anything.
Regards, David
On 27/03/2019 02:50, Jerry James wrote:
[...]
* David Matthews:
So the s390x is big-endian?
Yes.
I don't have access to any big-endian hardware but I managed to find an image for a big-endian Mips machine under Qemu and confirm that there was a problem.
The GCC compile farm has big endian machines, and so does Debian:
https://cfarm.tetaneutral.net/
https://dsa.debian.org/doc/guest-account/
https://wiki.debian.org/PorterBoxHowToUse
IBM also offers s390x community machines, but reportedly, setting up access is rather difficult, and I can't find the web page detailing the process right now.
Thanks for that. I'll bear it in mind. I've actually managed to fix the problem with 5.8 on big-endian machines and it now builds and runs the test suite correctly on the big-endian Mips.
Jerry: Can you test again with the S390x and confirm if it works now? This is in master. I'll add the fixes to fixes-5.8 if it works.
Regards, David
On 29/03/2019 12:03, Florian Weimer wrote:
[...]
On Fri, Mar 29, 2019 at 7:29 AM David Matthews <David.Matthews at prolingua.co.uk> wrote:
Thanks for that. I'll bear it in mind. I've actually managed to fix the problem with 5.8 on big-endian machines and it now builds and runs the test suite correctly on the big-endian Mips.
Jerry: Can you test again with the S390x and confirm if it works now? This is in master. I'll add the fixes to fixes-5.8 if it works.
It doesn't crash anymore, but "polyimport polytemp.txt -I . < ./exportPoly.sml" exits with exit code 1 without printing any error messages. I have tried the various --debug options without seeing anything alarming. Is there some way of figuring out the reason for the exit code?
On 31/03/2019 23:31, Jerry James wrote:
[...]
That's odd because that was the problem I had on the Mips BEFORE I fixed the bug. This is the commit: https://github.com/polyml/polyml/commit/195a36f7a587df9b391755668867d31af911... Before I investigate further can you check that you have actually tested it with that commit applied?
Regards, David
On 02/04/2019 11:43, David Matthews wrote:
[...]
I've now managed to get the S390x running in qemu and discovered that there was still a bug. I've pushed a fix to master ( https://github.com/polyml/polyml/commit/fec83e0fdb616509be10518727a8198281b7... ).
Regards, David
Hi,
please allow me a few words on ARM.
I just tried 'polyml 5.8 git master' on a Raspberry Pi3 B. uname -a is:
Linux raspberrypi 4.14.98-v7+ #1200 SMP Tue Feb 12 20:27:48 GMT 2019 armv7l GNU/Linux
OS is Raspbian 'Stretch' (Debian). Please find the results below.
It's been quite a while since we ported code from Sun and x86 to ARM. The main problem is, you first have to figure out what you've really got. ARM may run in little or big endian, it may have a certain feature, or not, depending on the implementation. Hence gcc compiler switches may work as expected, not at all, or anything in between.
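As a rough illustration of "figuring out what you've really got", a short Python sketch can report what the kernel and the userland each claim. The function name describe_target is hypothetical, and the script only reports what the Python interpreter itself sees, not which floating point features the CPU implements:

```python
import platform
import struct
import sys


def describe_target() -> dict:
    """Collect basic facts about the machine this interpreter runs on."""
    return {
        # Machine name as reported by the kernel (same source as uname -m).
        "machine": platform.machine(),
        # Byte order the interpreter was built for: "little" or "big".
        "byteorder": sys.byteorder,
        # Pointer width of the userland, which may be narrower than the CPU.
        "pointer_bits": struct.calcsize("P") * 8,
    }


for key, value in describe_target().items():
    print(f"{key}: {value}")
```

A 64-bit core running a 32-bit userland would be expected to show a 32-bit pointer width here even though the silicon is 64-bit capable.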
For example, the above 'armv7l' really is a 64 bit armv8 running in 32 bit mode. However, the Programmers Guide for ARMv8 says it can do the following conversions on floating point values:
FCVT Sd, Hn // half-precision to single-precision
FCVT Dd, Hn // half-precision to double-precision
FCVT Hd, Sn // single-precision to half-precision
FCVT Hd, Dn // double-precision to half-precision
But only if it is implemented. An implementation is allowed to leave floating point out, or to implement it with or without exception trapping.
Back then porting the code was difficult since a lot of maths was involved. It's not without cause the Programmers Guide says you are well advised to do away with each and every compiler warning. Anyway, the solution was to leave the compiler with no choice regarding data types.
Regards Michael
./configure
make compiler
realconv.cpp: In function ‘char* rv_alloc(int)’:
realconv.cpp:3648:58: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
sizeof(Bigint) - sizeof(ULong) - sizeof(int) + j <= i;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
elfexport.cpp: In member function ‘virtual void ELFExport::ScanConstant(PolyObject*, byte*, ScanRelocationKind)’:
elfexport.cpp:361:18: warning: unused variable ‘offset’ [-Wunused-variable]
POLYUNSIGNED offset = (char*)a - (char*)memTable[aArea].mtOriginalAddr;
^~~~~~
make check-local
Failed Tests: Succeed/Test174.ML
use "Tests/Succeed/Test174.ML";
val check = fn: bool -> unit
val p32 = 3.141592741: ?.Math.real
val m32 = ~3.141592741: ?.Math.real
val p = 3.141592741: real
val m = ~3.141592741: real
val pp = 3.141592741: real
val it = (): unit
val pm = 3.141592741: real
val it = (): unit
val mp = ~3.141592741: real
val it = (): unit
val mm = ~3.141592741: real
val it = (): unit
infix 4 ==
val == = fn: Real32.real * Real32.real -> bool
val it = (): unit
val it = (): unit
val it = (): unit
val it = (): unit
val it = (): unit
Exception- Fail "incorrect" raised
Exception- Fail "incorrect" raised
On Thu, 28 Mar 2019, David Matthews wrote:
[...]
Thanks for testing it. I had actually tested it on my Pi3 B both in 32-bit and 64-bit modes. The issue was the fact that Test174 failed and the question was why and whether there was a serious bug. Having traced it to the rounding in the conversion from Real.real to Real32.real I have to say that I don't see that it is worth spending any more time on this.
Regards, David
On 30/03/2019 00:04, mmoel wrote:
[...]
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml