Hi, I've been trying to port Poly/ML to mips and IBM's S/390 (the 64-bit version, often referred to as s390x). For both, I tried just adding an extra case in configure.ac, along with corresponding HOSTARCHITECTURE macros and cases in libpolyml/elfexport.cpp. However, these all seem to segfault when polyimport is run when building (both with 5.5.2 and git commit ee26375, "Merge branch 'PICTest'"). I can't seem to get a meaningful stack trace out of the mips segfault, but it crashes just after "Use: basis/Socket.sml" is printed. However, on s390x, it crashes before anything is printed, and valgrind gave me the following (with no errors before this point) when running ee26375's polyimport:
==16138== Thread 3:
==16138== Invalid read of size 8
==16138==    at 0x489EA50: Offset (globals.h:315)
==16138==    by 0x489EA50: GetConstSegmentForCode (globals.h:344)
==16138==    by 0x489EA50: GetConstSegmentForCode (globals.h:350)
==16138==    by 0x489EA50: ConstPtrForCode (globals.h:355)
==16138==    by 0x489EA50: buildStackList(TaskData*, PolyWord*, PolyWord*) (run_time.cpp:413)
==16138==    by 0x489EC87: exceptionToTraceException(TaskData*, SaveVecEntry*) (run_time.cpp:471)
==16138==    by 0x48AC9ED: IntTaskData::SwitchToPoly() (interpret.cpp:877)
==16138==    by 0x48ACC33: IntTaskData::EnterPolyCode() (interpret.cpp:1428)
==16138==    by 0x489324D: NewThreadFunction(void*) (processes.cpp:1128)
==16138==    by 0x48E591D: start_thread (pthread_create.c:335)
==16138==    by 0x4C8CEA9: ??? (in /lib/s390x-linux-gnu/libc-2.21.so)
==16138==  Address 0xe000000005ab5b38 is not stack'd, malloc'd or (recently) free'd
==16138==
==16138==
==16138== Process terminating with default action of signal 11 (SIGSEGV)
==16138==  Access not within mapped region at address 0xE000000005AB5000
==16138==    at 0x489EA50: Offset (globals.h:315)
==16138==    by 0x489EA50: GetConstSegmentForCode (globals.h:344)
==16138==    by 0x489EA50: GetConstSegmentForCode (globals.h:350)
==16138==    by 0x489EA50: ConstPtrForCode (globals.h:355)
==16138==    by 0x489EA50: buildStackList(TaskData*, PolyWord*, PolyWord*) (run_time.cpp:413)
==16138==    by 0x489EC87: exceptionToTraceException(TaskData*, SaveVecEntry*) (run_time.cpp:471)
==16138==    by 0x48AC9ED: IntTaskData::SwitchToPoly() (interpret.cpp:877)
==16138==    by 0x48ACC33: IntTaskData::EnterPolyCode() (interpret.cpp:1428)
==16138==    by 0x489324D: NewThreadFunction(void*) (processes.cpp:1128)
==16138==    by 0x48E591D: start_thread (pthread_create.c:335)
==16138==    by 0x4C8CEA9: ??? (in /lib/s390x-linux-gnu/libc-2.21.so)
==16138==  If you believe this happened as a result of a stack
==16138==  overflow in your program's main thread (unlikely but
==16138==  possible), you can try to increase the size of the
==16138==  main thread stack using the --main-stacksize= flag.
==16138==  The main thread stack size used in this run was 8388608.
(the ??? for libc is because valgrind does not yet understand compressed debug info; I removed a whole load of warnings to that effect)
Have you ever come across anything like this? Do you have any thoughts for where to start with hunting this down?
Regards, James Clarke
I wish I could help but there's not much I can suggest. The only idea that occurs to me is that there is some endian-ness issue that has crept in. Are these little-endian or big-endian? In theory the interpreter should work on both big-endian and little-endian but I've only tested the most recent version on X86. Have a look at an earlier version of Poly/ML and see if you have any more success with that.
David
[In reply to James Clarke's message of 12/01/2016 14:52.]
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
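A quick way to answer the byte-order question on a given build host is a small probe like the one below. This is only a sketch: `hostIsLittleEndian` is a hypothetical helper written for illustration and is not part of the Poly/ML runtime.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Poly/ML code): reports whether the host stores
// the least significant byte of an integer at the lowest address.
bool hostIsLittleEndian()
{
    const std::uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];
    // memcpy inspects the object representation without aliasing problems.
    std::memcpy(bytes, &probe, sizeof probe);
    // Anything other than 0x04 or 0x01 first would mean a mixed-endian
    // layout, which this sketch does not try to handle.
    assert(bytes[0] == 0x04 || bytes[0] == 0x01);
    return bytes[0] == 0x04;
}
```

Calling `hostIsLittleEndian()` from a test program would be expected to return false on big-endian mips and s390x, and true on x86 and mipsel.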
They are all big-endian. I haven't tried mipsel; that could help narrow it down. One thing making me not so sure it's an endianness issue is that you support 32-bit PowerPC, and that runs properly. Also the mips builds are broken by GCC's optimisations; adding -fno-omit-frame-pointer made it work for some reason, if I remember correctly.
James
[In reply to the message from David Matthews <David.Matthews at prolingua.co.uk> of 15 Jan 2016, 11:29.]
Hi David, I just tried building on mipsel, and that compiles and passes the test suite with the same compiler flags. Endianness is looking like a strong candidate, given that the only architectures it fails on are big-endian, although compiler optimisations are 'responsible'. I shall see if a very old version works on big-endian mips; if so, I will try and do a git bisect, otherwise it might have to be some painful debugging.
Regards, James
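To illustrate the kind of bug that passes on mipsel and fails on big-endian mips, here is a minimal sketch contrasting a byte-order-independent read with a host-order read. Both functions are hypothetical and written only for illustration; nothing here is taken from the Poly/ML interpreter, and the thread does not establish that this is the actual fault.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: decode a 16-bit value stored low byte first.
// The result is the same on every host, whatever its byte order.
std::uint16_t readU16LowFirst(const unsigned char *p)
{
    assert(p != nullptr);
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Hypothetical helper: copy the two bytes straight into an integer.
// The result depends on the host byte order, so code that relies on it
// can work on little-endian machines and misbehave on big-endian ones.
std::uint16_t readU16HostOrder(const unsigned char *p)
{
    assert(p != nullptr);
    std::uint16_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

Given the bytes `{0x34, 0x12}`, `readU16LowFirst` yields 0x1234 everywhere, whereas `readU16HostOrder` yields 0x1234 on a little-endian host and 0x3412 on a big-endian one.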
[In reply to the message from James Clarke <jrtc27 at jrtc27.com> of 15 Jan 2016, 11:59.]
James, I've managed to set up a big-endian mips Debian virtual machine using QEMU inside a virtual Debian machine in VirtualBox on Windows. Despite all the layers of virtualisation it works and, more importantly, Poly/ML actually builds successfully. It does crash with some larger examples, such as Tests/Succeed/Test133.ML, and I've seen some other crashes in the garbage collector. I suspect that there is a problem with endian-ness somewhere but it may be possible to narrow this down with gdb.
Regards, David
[In reply to James Clarke's message of 16/01/2016 17:01.]
Hi David, Is this on Debian Jessie (current stable)? I've been doing all my work on Sid (new package versions are always built on unstable unless they are backports), which has GCC 5 (I believe it built fine under GCC 4.x, though I never ran the test suite); I had forgotten to mention this important fact, sorry about that! As this is for the Debian package, it's using their default compiler flags, although I think it breaks with either. Thanks for taking time to look at this though; I will continue to experiment as well.
Regards, James
[In reply to David Matthews' message of 18 Jan 2016, 18:54.]
James, It's Wheezy (old stable?) with gcc version 4.6.
I've just been running Poly/ML under the debugger and I've found something that might account for the crashes. I need to look more closely at this.
Regards, David
On 18/01/2016 19:34, James Clarke wrote:
Hi David, Is this on Debian Jessie (current stable)? I've been doing all my work on Sid (new package versions are always built on unstable unless they are backports), which has GCC 5 (I believe it built fine under GCC 4.x, though I never ran the test suite); I had forgotten to mention this important fact, sorry about that! As this is for the Debian package, it's using their default compiler flags, although I think it breaks with either. Thanks for taking time to look at this though; I will continue to experiment as well.
Regards, James
On 18 Jan 2016, at 18:54, David Matthews <David.Matthews at prolingua.co.uk> wrote:
James, I've managed to set up a big-endian mips debian virtual machine using qemu inside a virtual debian machine in virtualbox on Windows. Despite all the layers of virtualisation it works and more importantly Poly/ML actually builds successfully. It does crash with some larger examples, such as Tests/Succeed/Test133.ML, and I've seen some other crashes in the garbage-collector. I suspect that there is a problem with endian-ness somewhere but it may be possible to narrow this down with gdb.
Regards, David
On 16/01/2016 17:01, James Clarke wrote: Hi David, I just tried building on mipsel, and that compiles and passes the test suite with the same compiler flags. Endianness is looking like a strong candidate, given that the only architectures it fails on are big-endian, although compiler optimisations are ?responsible?. I shall see if a very old version works on big-endian mips; if so, I will try and do a git bisect, otherwise it might have to be some painful debugging.
Regards, James
On 15 Jan 2016, at 11:59, James Clarke <jrtc27 at jrtc27.com> wrote:
They are all big-endian. I haven't tried mipsel; that could help narrow it down. One thing making me not so sure it's an endianness issue is that you support 32-bit PowerPC, and that runs properly. Also the mips builds are broken by GCC's optimisations; adding -fno-omit-frame-pointer made it work for some reason, if I remember correctly.
James
On 15 Jan 2016, at 11:29, David Matthews <David.Matthews at prolingua.co.uk> wrote:
I wish I could help but there's not much I can suggest. The only idea that occurs to me is that there is some endian-ness issue that has crept in. Are these little-endian or big-endian? In theory the interpreter should work on both big-endian and little-endian but I've only tested the most recent version on X86. Have a look at an earlier version of Poly/ML and see if you have any more success with that.
David
On 12/01/2016 14:52, James Clarke wrote:
Hi, I've been trying to port Poly/ML to mips and IBM's S/390 (the 64-bit version, often referred to as s390x). For both, I tried just adding an extra case in configure.ac, along with corresponding HOSTARCHITECTURE macros and cases in libpolyml/elfexport.cpp. However, these all seem to segfault when polyimport is run when building (both with 5.5.2 and git commit ee26375, "Merge branch 'PICTest'"). I can't seem to get a meaningful stack trace out of the mips segfault, but it crashes just after 'Use: basis/Socket.sml' is printed. However, on s390x, it crashes before anything is printed, and valgrind gave me the following (with no errors before this point) when running ee26375's polyimport:
==16138== Thread 3:
==16138== Invalid read of size 8
==16138==    at 0x489EA50: Offset (globals.h:315)
==16138==    by 0x489EA50: GetConstSegmentForCode (globals.h:344)
==16138==    by 0x489EA50: GetConstSegmentForCode (globals.h:350)
==16138==    by 0x489EA50: ConstPtrForCode (globals.h:355)
==16138==    by 0x489EA50: buildStackList(TaskData*, PolyWord*, PolyWord*) (run_time.cpp:413)
==16138==    by 0x489EC87: exceptionToTraceException(TaskData*, SaveVecEntry*) (run_time.cpp:471)
==16138==    by 0x48AC9ED: IntTaskData::SwitchToPoly() (interpret.cpp:877)
==16138==    by 0x48ACC33: IntTaskData::EnterPolyCode() (interpret.cpp:1428)
==16138==    by 0x489324D: NewThreadFunction(void*) (processes.cpp:1128)
==16138==    by 0x48E591D: start_thread (pthread_create.c:335)
==16138==    by 0x4C8CEA9: ??? (in /lib/s390x-linux-gnu/libc-2.21.so)
==16138==  Address 0xe000000005ab5b38 is not stack'd, malloc'd or (recently) free'd
==16138==
==16138==
==16138== Process terminating with default action of signal 11 (SIGSEGV)
==16138==  Access not within mapped region at address 0xE000000005AB5000
==16138==    at 0x489EA50: Offset (globals.h:315)
==16138==    by 0x489EA50: GetConstSegmentForCode (globals.h:344)
==16138==    by 0x489EA50: GetConstSegmentForCode (globals.h:350)
==16138==    by 0x489EA50: ConstPtrForCode (globals.h:355)
==16138==    by 0x489EA50: buildStackList(TaskData*, PolyWord*, PolyWord*) (run_time.cpp:413)
==16138==    by 0x489EC87: exceptionToTraceException(TaskData*, SaveVecEntry*) (run_time.cpp:471)
==16138==    by 0x48AC9ED: IntTaskData::SwitchToPoly() (interpret.cpp:877)
==16138==    by 0x48ACC33: IntTaskData::EnterPolyCode() (interpret.cpp:1428)
==16138==    by 0x489324D: NewThreadFunction(void*) (processes.cpp:1128)
==16138==    by 0x48E591D: start_thread (pthread_create.c:335)
==16138==    by 0x4C8CEA9: ??? (in /lib/s390x-linux-gnu/libc-2.21.so)
==16138==  If you believe this happened as a result of a stack
==16138==  overflow in your program's main thread (unlikely but
==16138==  possible), you can try to increase the size of the
==16138==  main thread stack using the --main-stacksize= flag.
==16138==  The main thread stack size used in this run was 8388608.
(the ??? for libc is because valgrind does not yet understand compressed debug info; I removed a whole load of warnings to that effect)
Have you ever come across anything like this? Do you have any thoughts for where to start with hunting this down?
Regards, James Clarke
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
James, I've committed a fix that solves the problem I found. Poly/ML now runs all the regression tests and will rebuild the compiler. I was going to try it in jessie (stable) but installing that is proving more difficult than I'd expected.
I've added configure and elfexport entries for Mips. Please feel free to update it if you like and add any other machines you manage to get working.
Regards, David
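As an illustration of the kind of per-architecture entry discussed here, the following is a minimal sketch and not the code committed to Poly/ML. The `HostArch` enum, `ElfIdent` struct and `identFor` function are hypothetical names invented for this example; only the numeric constants are taken from the ELF specification (they match the `EM_*`, `ELFCLASS*` and `ELFDATA*` values in `<elf.h>`). It shows that an exporter has to choose the machine code, word size and byte order together, which is why big-endian mips and little-endian mipsel need distinct handling even though they share `EM_MIPS`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Constants from the ELF specification, repeated here so the sketch
// builds on systems without <elf.h>.
constexpr std::uint16_t kEM_386    = 3;
constexpr std::uint16_t kEM_MIPS   = 8;
constexpr std::uint16_t kEM_S390   = 22;
constexpr std::uint16_t kEM_X86_64 = 62;
constexpr std::uint8_t  kELFCLASS32  = 1;
constexpr std::uint8_t  kELFCLASS64  = 2;
constexpr std::uint8_t  kELFDATA2LSB = 1;  // little-endian
constexpr std::uint8_t  kELFDATA2MSB = 2;  // big-endian

// Hypothetical stand-in for the HOSTARCHITECTURE macros mentioned in the thread.
enum class HostArch { X86, X86_64, Mips, S390X };

// The three header fields that vary with the target.
struct ElfIdent {
    std::uint16_t machine;   // e_machine
    std::uint8_t  elfClass;  // e_ident[EI_CLASS]
    std::uint8_t  data;      // e_ident[EI_DATA]
};

// Detect the byte order of the machine this code is running on.
inline bool hostIsBigEndian() {
    const std::uint16_t probe = 0x0102;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 0x01;
}

// Pick the header fields for a given architecture, byte order and pointer size.
inline ElfIdent identFor(HostArch arch, bool bigEndian, unsigned pointerBytes) {
    assert(pointerBytes == 4 || pointerBytes == 8);
    ElfIdent id{};
    switch (arch) {
        case HostArch::X86:    id.machine = kEM_386;    break;
        case HostArch::X86_64: id.machine = kEM_X86_64; break;
        case HostArch::Mips:   id.machine = kEM_MIPS;   break;
        case HostArch::S390X:  id.machine = kEM_S390;   break;
    }
    id.elfClass = (pointerBytes == 8) ? kELFCLASS64 : kELFCLASS32;
    id.data     = bigEndian ? kELFDATA2MSB : kELFDATA2LSB;
    return id;
}
```

A caller building for the machine it runs on could use `identFor(arch, hostIsBigEndian(), sizeof(void*))`, so that mips and mipsel differ only in the `data` field.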
On 18/01/2016 21:16, David Matthews wrote:
James, It's Wheezy (old stable?) with gcc version 4.6.
I've just been running Poly/ML under the debugger and I've found something that might account for the crashes. I need to look more closely at this.
Hi David, I had thought it might be something like that, but a quick glance through pexport didn't enlighten me that much as I'm not familiar enough with the same codebase.
Those were the same elfexport properties I had. I will try building on Sid and see if it still crashes.
Thanks, James
Hi David, I added the patch (locally) to the Debian 5.5.2 package; it now builds successfully on sid on mips (and still works on mipsel!) and passes the test suite! Thanks for tracking this down; I shall check it still works with the latest master. I'm also waiting on my S/390 5.5.2 build to finish; hopefully the patch fixes that crash too.
Unrelated: I filed https://github.com/polyml/polyml/pull/13 a few days ago; could you please review it? I noticed that my Debian package build script was in fact not running the test suite, and found out this was because it was running the check rule, which does nothing for Poly/ML. While I can override this default behaviour in the package build script, I thought it would be a good idea if Poly/ML followed the convention.
Thanks, James
Hi David, Unfortunately the 5.5.2 S/390 build was not successful. It appears the segfault has been fixed, but now polyimport just exits with exit code 1 (via a POLY_SYS_exit instruction in thread 4, so polytemp.txt does get loaded) and doesn't print anything. I'm now trying the latest master; if that still has the same problem, I will try and find out why it's exiting.
I have filed https://github.com/polyml/polyml/pull/14 to allow building on mipsel (I was using my own patch I wrote a while ago to add MIPS support and just applied your endian fix when I said it worked). This also includes a bit of a cleanup which is potentially subjective, so if you disagree with any of the changes I will happily change the pull request.
Regards, James
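The thread reports that mipsel kept working while big-endian mips needed the endian fix, which is the usual signature of code that reads multi-byte values with one byte order baked in. The actual fix lives in the Poly/ML repository and is not reproduced here. The following is only a generic sketch of the technique, with hypothetical helper names `loadLE32` and `loadBE32`: assembling a value from individual bytes gives the same result on every host, whereas casting a byte pointer to an integer pointer does not.

```cpp
#include <cassert>
#include <cstdint>

// Read a 32-bit value stored least-significant byte first.
// Works identically on little-endian and big-endian hosts.
inline std::uint32_t loadLE32(const unsigned char* p) {
    assert(p != nullptr);
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Read a 32-bit value stored most-significant byte first.
inline std::uint32_t loadBE32(const unsigned char* p) {
    assert(p != nullptr);
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) << 8)
         |  static_cast<std::uint32_t>(p[3]);
}
```

Given the bytes `78 56 34 12`, `loadLE32` yields `0x12345678` and `loadBE32` yields `0x78563412` regardless of the machine running the code, so a file written on one architecture can be read on another as long as its byte order is known.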
Hi James, I've applied your pull request. I've included your clean-up since I don't have any strong feelings about it.
As a result of the discussions about the Mips I've just bought a small development board for the Mips, a Ci20 that is essentially a Mips-based Raspberry Pi. In the early 1990s Prolingua had a Mips Decstation and I wrote a code-generator for Poly when Poly/ML was still written in Poly. Unlike the Sparc and x86 code-generators it never made it into the ML version of Poly/ML. It would be nice to resurrect this and at least put a simple Mips code-generator into Poly/ML if it can be done without a lot of work. Maintaining it in the mainline code is probably too much work but it would be nice to have it available somewhere.
Regards, David