I have now pushed a major update to git master which is the result of work going back to the beginning of the year. It covers a number of areas but largely the code-generator and the run-time system interface.
The basic design of much of the low-level parts of Poly/ML dated back to the early days. While it has changed a lot in detail the overall structure has remained the same. The idea was to bring the whole thing up to date.
The run-time system interface has been redesigned and instead of a vector of entries to the run-time system the ML code now uses a form of foreign-function interface to call individual functions. The advantage is that it is much easier to add new functions. In addition when building an executable it is possible for the linker to select only the parts of libpolyml that are actually required for the application, at least if the static library version is used. It should be possible to adapt the foreign-function interface it uses to speed up calls made using the Foreign structure, although that's not done at the moment.
The code-generator has been rewritten particularly the register assignment. The previous version had been the result of a series of adaptations to new architectures over the years. The new version focusses solely on the X86. I'm still working on this. Doing this has needed a temporary, slow, code-generator which is why it has taken until now to push this to git master.
The representation of strings has been simplified. Previously, single character strings were represented by the character itself as a tagged value. This was largely a relic of SML 90 which didn't have a separate char type. That has been removed and all strings are now represented as a vector of bytes. This speeds up string operations since the code no longer has to consider the special case.
SSE2 instructions are now used for floating point on the X86/64. Floating point support in the new code-generator is rudimentary at the moment and not to the same extent as the previous code-generator.
The handling of return addresses from functions has been simplified. A consequence of this is that when pieces of code are compiled they are stored in a separate area of memory rather than in the general heap and not garbage-collected. It is no longer possible to get an exception trace so PolyML.exception_trace has no effect. The debugger handles this much better anyway.
Although the focus has been on the X86 the portable byte-code interpreter has been improved and is significantly faster.
The system has had some basic testing but there are bound to be bugs in something as complex as this. I'm also continuing to work on improvements. When testing this it is essential to run "make compiler" at least once and generally twice to build the new compiler and then build the compiler itself with the new compiler.
David
David,
I have been having a go at building ProofPower with the latest updates. Here is some miscellaneous feedback:
1) Are these two errors in the configure output harmless? (This is on Mac OS X using Apple?s Xcode tools.)
checking for __attribute__((visibility("hidden")))... no clang: error: unsupported option '-print-multi-os-directory' clang: error: no input files checking that generated files are newer than configure? done
2) SML90.explode crashes. The following patch fixes this:
--- a/basis/SML90.sml +++ b/basis/SML90.sml @@ -81,9 +81,8 @@ struct fun ord "" = raise Ord | ord s = Char.ord(String.sub(s, 0))
fun chr i = str(Char.chr i) - (* Because single character strings are represented by the characters - themselves we just need to coerce String.explode. *) - val explode: string -> string list = RunCall.unsafeCast(String.explode) + + val explode: string -> string list = map str o String.explode val implode = String.concat
type instream = TextIO.instream and outstream = TextIO.outstream
3) If I configure with ?enable-intinf-as-int, make compiler fails:
Use: basis/IMPERATIVE_IO.sml Use: basis/ImperativeIO.sml Use: basis/TextIO.sml Exception- InternalError: findEntry - not found raised while compiling
make[3]: *** [polyexport.o] Error 1 make[2]: *** [all-recursive] Error 1 make[1]: *** [all] Error 2 make: *** [compiler] Error 2
Not being able to build with ?enable-intinf-as-int isn?t a stopper for me, but when I press on, I am getting a segfault somewhere in the ProofPower parser generator. I will report again when I have isolated that.
Regards,
Rob.
On 16 Sep 2016, at 13:24, David Matthews <David.Matthews at prolingua.co.uk> wrote:
I have now pushed a major update to git master which is the result of work going back to the beginning of the year. It covers a number of areas but largely the code-generator and the run-time system interface.
The basic design of much of the low-level parts of Poly/ML dated back to the early days. While it has changed a lot in detail the overall structure has remained the same. The idea was to bring the whole thing up to date.
The run-time system interface has been redesigned and instead of a vector of entries to the run-time system the ML code now uses a form of foreign-function interface to call individual functions. The advantage is that it is much easier to add new functions. In addition when building an executable it is possible for the linker to select only the parts of libpolyml that are actually required for the application, at least if the static library version is used. It should be possible to adapt the foreign-function interface it uses to speed up calls made using the Foreign structure, although that's not done at the moment.
The code-generator has been rewritten particularly the register assignment. The previous version had been the result of a series of adaptations to new architectures over the years. The new version focusses solely on the X86. I'm still working on this. Doing this has needed a temporary, slow, code-generator which is why it has taken until now to push this to git master.
The representation of strings has been simplified. Previously, single character strings were represented by the character itself as a tagged value. This was largely a relic of SML 90 which didn't have a separate char type. That has been removed and all strings are now represented as a vector of bytes. This speeds up string operations since the code no longer has to consider the special case.
SSE2 instructions are now used for floating point on the X86/64. Floating point support in the new code-generator is rudimentary at the moment and not to the same extent as the previous code-generator.
The handling of return addresses from functions has been simplified. A consequence of this is that when pieces of code are compiled they are stored in a separate area of memory rather than in the general heap and not garbage-collected. It is no longer possible to get an exception trace so PolyML.exception_trace has no effect. The debugger handles this much better anyway.
Although the focus has been on the X86 the portable byte-code interpreter has been improved and is significantly faster.
The system has had some basic testing but there are bound to be bugs in something as complex as this. I'm also continuing to work on improvements. When testing this it is essential to run "make compiler" at least once and generally twice to build the new compiler and then build the compiler itself with the new compiler.
David _______________________________________________ polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
David,
On 17 Sep 2016, at 14:49, Rob Arthan <rda at lemma-one.com> wrote:
?
? I am getting a segfault somewhere in the ProofPower parser generator. I will report again when I have isolated that.
To the list of three issues in my previous e-mail, I can now add:
4) In some circumstances a function call can lead to a segfault.
I?ve attached a short extract from the parser generator code that demonstrates the problem. If you execute the body of the function empty_non_terminals interactively with the parameter bound to the test data, then nothing goes wrong. If you call the function with the test data as parameter (as on the last line of the attached file) you get a segfault.
Regards,
Rob.
Rob, Thanks for that and for cutting it down to something manageable. As a result I've been able to track down and fix the problem. Let me know as and when you find anything else.
Regards, David
On 17/09/2016 16:20, Rob Arthan wrote:
David,
On 17 Sep 2016, at 14:49, Rob Arthan <rda at lemma-one.com> wrote:
?
? I am getting a segfault somewhere in the ProofPower parser generator. I will report again when I have isolated that.
To the list of three issues in my previous e-mail, I can now add:
- In some circumstances a function call can lead to a segfault.
I?ve attached a short extract from the parser generator code that demonstrates the problem. If you execute the body of the function empty_non_terminals interactively with the parameter bound to the test data, then nothing goes wrong. If you call the function with the test data as parameter (as on the last line of the attached file) you get a segfault.
Regards,
Rob.
David,
Many thanks for the quick turn round on these issues. I have got quite a bit further. The ProofPower build is now failing as a result of the following issue:
5) In string comparisons, the leading character is being treated as negative if it has character code greater than 127:
E.g.,
"\128" < "\000";
val it = true: bool
"\128a" < "\000a";
val it = true: bool
Character comparisons are OK:
#"\128" < #"\000";
val it = false: bool
(This is on Mac OS X after building Poly/ML with Apple's Xcode tool chain.)
Regards,
Rob.
On 18 Sep 2016, at 16:36, David Matthews <David.Matthews at prolingua.co.uk> wrote:
Rob, Thanks for that and for cutting it down to something manageable. As a result I've been able to track down and fix the problem. Let me know as and when you find anything else.
Regards, David
On 17/09/2016 16:20, Rob Arthan wrote:
David,
On 17 Sep 2016, at 14:49, Rob Arthan <rda at lemma-one.com> wrote:
?
? I am getting a segfault somewhere in the ProofPower parser generator. I will report again when I have isolated that.
To the list of three issues in my previous e-mail, I can now add:
- In some circumstances a function call can lead to a segfault.
I?ve attached a short extract from the parser generator code that demonstrates the problem. If you execute the body of the function empty_non_terminals interactively with the parameter bound to the test data, then nothing goes wrong. If you call the function with the test data as parameter (as on the last line of the attached file) you get a segfault.
Regards,
Rob.
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
Rob, Thanks for that. It was using signed rather than unsigned comparisons. I've pushed a fix.
Regards, David
On 18/09/2016 22:35, Rob Arthan wrote:
Many thanks for the quick turn round on these issues. I have got quite a bit further. The ProofPower build is now failing as a result of the following issue:
- In string comparisons, the leading character is being treated as negative
if it has character code greater than 127:
E.g.,
"\128" < "\000";
val it = true: bool
"\128a" < "\000a";
val it = true: bool
David,
I have just pulled get rev d7b9234f2793aef14b984ad808bbdfc6e1c59403 and get the following when I do make compiler (after configure with no options other than ?prefix=/usr/local/poly/dev):
Making all in libpolymain make[3]: Nothing to be done for `all'. Making all in . ./polyimport polytemp.txt -I . < ./exportPoly.sml Assertion failed: (ch == '\n'), function DoImport, file pexport.cpp, line 728. /bin/sh: line 1: 98398 Abort trap: 6 ./polyimport polytemp.txt -I . < ./exportPoly.sml make[3]: *** [polyexport.o] Error 134 make[2]: *** [all-recursive] Error 1 make[1]: *** [all] Error 2 make: *** [compiler] Error 2
Regards,
Rob.
Rob, Thanks for your feedback. I've managed to fix some of the issues. I haven't yet looked at your crash but that's next on my list.
On 17/09/2016 14:49, Rob Arthan wrote:
David,
I have been having a go at building ProofPower with the latest updates. Here is some miscellaneous feedback:
- Are these two errors in the configure output harmless? (This is on
Mac OS X using Apple?s Xcode tools.)
checking for __attribute__((visibility("hidden")))... no clang: error: unsupported option '-print-multi-os-directory' clang: error: no input files checking that generated files are newer than configure? done
This seems to be something that autoconf has put in for libtool. I would assume it's harmless and I'd guess it would show up in lots of projects.
- SML90.explode crashes. The following patch fixes this:
Thanks. That code should have been fixed a long time ago but really the SML90 structure should be retired.
- If I configure with ?enable-intinf-as-int, make compiler fails:
Use: basis/IMPERATIVE_IO.sml Use: basis/ImperativeIO.sml Use: basis/TextIO.sml Exception- InternalError: findEntry - not found raised while compiling
I've fixed that and it now builds. This may also fix the similar failure that Michael reported in HOL4.
Regards, David
On 16/09/16 14:24, David Matthews wrote:
I have now pushed a major update to git master which is the result of work going back to the beginning of the year. It covers a number of areas but largely the code-generator and the run-time system interface.
This all sounds very interesting and ambitious. After 30 years it will inevitably require subtle adjustments for Isabelle, but I am confident that we can once again get to a new stage of performance and robustness.
The handling of return addresses from functions has been simplified. A consequence of this is that when pieces of code are compiled they are stored in a separate area of memory rather than in the general heap and not garbage-collected.
Does that mean that run-time (re)compilation is a now potential memory leak?
I've heard that the JVM has only recently learned to do proper garbage collection of dynamically loaded modules.
It is no longer possible to get an exception trace so PolyML.exception_trace has no effect. The debugger handles this much better anyway.
Luckily, we have already ML_exception_debugger to replace the old ML_exception_trace.
The system has had some basic testing but there are bound to be bugs in something as complex as this. I'm also continuing to work on improvements. When testing this it is essential to run "make compiler" at least once and generally twice to build the new compiler and then build the compiler itself with the new compiler.
I am presently testing Poly/ML 38879127481c and Isabelle 9aed2da07200 and ran into a compiler problem in src/Pure/unify.ML:
Exception- Fail "Exception- InternalError: chooseReg raised while compiling" raised
To reproduce this, you need to update a local Isabelle repository clone as usual:
$ cd isabelle-repos
$ hg pull -u
$ ./bin/isabelle components -a
Use a local $HOME/.isabelle/etc/settings as usual like this:
ML_PLATFORM=x86-linux ML_HOME="$HOME/lib/polyml/${ML_PLATFORM}" ML_SYSTEM=polyml-5.6.1 ML_OPTIONS="-H 1500 --gcthreads 6"
It is important that the ML_HOME directory does *not* contain the old libsha1.so that we usually include: it does not work with the new Poly/ML and can be ignored for the moment.
Now you can invoke "./bin/isabelle build Pure" and it should run until the above compiler failure.
Makarius
On 19/09/2016 21:15, Makarius wrote:
The handling of return addresses from functions has been simplified. A consequence of this is that when pieces of code are compiled they are stored in a separate area of memory rather than in the general heap and not garbage-collected.
Does that mean that run-time (re)compilation is a now potential memory leak?
I've heard that the JVM has only recently learned to do proper garbage collection of dynamically loaded modules.
Yes. I can't see a way round it at the moment. It would be difficult to produce an example where the only reference to the code of a function was through the return address but it could happen if a thread started to execute a function contained in a reference and then the reference was overwritten.
Currently, there is a leak because each top-level expression is compiled down to machine code even though the code is only executed once. My plan is to avoid that by interpreting the top-level rather than fully compiling it.
The system has had some basic testing but there are bound to be bugs in something as complex as this. I'm also continuing to work on improvements. When testing this it is essential to run "make compiler" at least once and generally twice to build the new compiler and then build the compiler itself with the new compiler.
I am presently testing Poly/ML 38879127481c and Isabelle 9aed2da07200 and ran into a compiler problem in src/Pure/unify.ML:
Exception- Fail "Exception- InternalError: chooseReg raised while compiling" raised
I'll have a look at that.
Regards, David
On 20/09/16 14:15, David Matthews wrote:
On 19/09/2016 21:15, Makarius wrote:
The handling of return addresses from functions has been simplified. A consequence of this is that when pieces of code are compiled they are stored in a separate area of memory rather than in the general heap and not garbage-collected.
Does that mean that run-time (re)compilation is a now potential memory leak?
I've heard that the JVM has only recently learned to do proper garbage collection of dynamically loaded modules.
Yes. I can't see a way round it at the moment. It would be difficult to produce an example where the only reference to the code of a function was through the return address but it could happen if a thread started to execute a function contained in a reference and then the reference was overwritten.
Currently, there is a leak because each top-level expression is compiled down to machine code even though the code is only executed once. My plan is to avoid that by interpreting the top-level rather than fully compiling it.
That reminds me a bit of OCaml.
I have always considered it one of the big assets of Poly/ML that fully native compilation can be performed at run-time, without any difference to off-line compiled code.
Just a few weeks ago, I explained that to an audience that only knew boring static compilation in the manner of cc.
My understanding of an LCF-style proof assistant is that compilation and execution of the program can happen continuously all the time: new code is added (to implement tools and packages), then new definitions and proofs are performed, this is repeated indefinitely.
This is also the deeper reason, why we could never use Mlton. (Another reason is that compilation needs to be fast, in order to be able to develop all these fine tools interactively.)
Since Isabelle manages the invocation of the Poly/ML compiler itself, we can probably easily cope with that either way: keeping full compilation or degrading to interpretation. Although, I would probably just keep compilation let memory fill up in regular applications: not so much code is loaded after the main HOL image is finished.
I do wonder how the classic LCF-style proof assistants would cope with that, notably ProofPower and HOL4.
Makarius
On 17/10/2016 20:38, Makarius wrote:
On 20/09/16 14:15, David Matthews wrote:
On 19/09/2016 21:15, Makarius wrote:
The handling of return addresses from functions has been simplified. A consequence of this is that when pieces of code are compiled they are stored in a separate area of memory rather than in the general heap and not garbage-collected.
Does that mean that run-time (re)compilation is a now potential memory leak?
I've heard that the JVM has only recently learned to do proper garbage collection of dynamically loaded modules.
Yes. I can't see a way round it at the moment. It would be difficult to produce an example where the only reference to the code of a function was through the return address but it could happen if a thread started to execute a function contained in a reference and then the reference was overwritten.
Currently, there is a leak because each top-level expression is compiled down to machine code even though the code is only executed once. My plan is to avoid that by interpreting the top-level rather than fully compiling it.
That reminds me a bit of OCaml.
I have always considered it one of the big assets of Poly/ML that fully native compilation can be performed at run-time, without any difference to off-line compiled code.
I think there may be some misunderstanding here. I am not suggesting changing the way functions are compiled or making a distinction between code compiled interactively or through "use".
The result of compiling each function is a piece of memory containing the machine code for the function. However a piece of machine code is also produced for each top-level "program" to perform whatever side-effects or evaluations are necessary. It is this "extra" code that I am suggesting could be interpreted. It is only executed once, usually immediately after it has been compiled. Generally, the cost of compiling it to fully optimised machine code totally outweighs the execution time.
The reason for actually compiling the top-level is simplicity: it can be treated just like a function. Up till now it has simply been garbage collected away once it has been executed but that's no longer the case. With the latest changes every piece of code is kept whether it is reachable or not.
Although the lack of garbage collection of code would mean that repeatedly defining the same function would be a memory leak I would be surprised if it was a serious problem. Is it likely that one would repeatedly redefine the same function within a particular session?
David
On 17/10/16 23:58, David Matthews wrote:
Although the lack of garbage collection of code would mean that repeatedly defining the same function would be a memory leak I would be surprised if it was a serious problem. Is it likely that one would repeatedly redefine the same function within a particular session?
This happens all the time in IDE interaction: things are compiled, edited, re-compiled; thus old this become inaccessible.
You introduced that principle yourself many years ago, by providing the very nice PolyML.Compiler interface.
That is one of the big assets of Poly/ML and consequently of Isabelle/ML.
Makarius
On 18/10/2016 13:43, Makarius wrote:
On 17/10/16 23:58, David Matthews wrote:
Although the lack of garbage collection of code would mean that repeatedly defining the same function would be a memory leak I would be surprised if it was a serious problem. Is it likely that one would repeatedly redefine the same function within a particular session?
This happens all the time in IDE interaction: things are compiled, edited, re-compiled; thus old this become inaccessible.
You introduced that principle yourself many years ago, by providing the very nice PolyML.Compiler interface.
That is one of the big assets of Poly/ML and consequently of Isabelle/ML.
Yes, but the actual memory needed for the function code is not going to be large compared with the total heap size. We have heaps in the orders of gigabytes but the whole of Isabelle is just tens of megabytes.
Perhaps I should explain why I made this change and what could be done to mitigate the effects. There were two reasons. The first was to simplify the code and avoid the contortions that were necessary and the second was for security and long-term stability.
For the garbage collector to be able to compact the heap it has to be able to find and modify all the addresses of heap cells. To do that values are distinguished by a tag bit. If the bottom bit is set the value is an integer and is ignored by the GC. Other values are addresses. An address points always to the start of a cell so will be word aligned, either XXXX00 (32-bit) or XXX000 (64-bit) in the bottom bits. Before the start of a cell is a word containing the length of the cell and some bits that indicate whether the cell is a tuple/vector containing values (i.e. either tagged integers or addresses) or is byte data, typically a string.
This is extended to cope with cells containing machine code. This is just another type of cell. A code cell is not quite byte data because it can contain addresses if there are values that are compile-time constants.
Provided we're dealing with the entry point addresses of code cells this all works fine. The complication comes when the code is actually executed. If one function calls another, or even itself recursively, it uses the X86 CALL instruction. It is important to use this instruction and not try to do the function call any other way because among other things the prefetching hardware recognises CALL/RET pairs. The CALL instruction pushes the return address, the address of the next instruction, to the stack.
This, though, causes problems for the GC. Return addresses are inherently addresses into the middle of cells. They are also on an arbitrary alignment since X86 instructions are not aligned in any way. For the GC to be able to compact the heap it has to be able to find and update the return addresses. If Poly/ML used "stack frames" it might be possible to find return addresses using the frame pointer register but using frame pointers requires an extra register and increases the cost of every function call. Instead the code-generator added no-op instructions before each CALL such that the return address after the instruction was on a word+2 byte alignment i.e. XXX10 in the bottom bits. This is neither an integer nor a word address so the GC can recognise these as return addresses. There is still the problem of finding the actual start of the code cell and to do this there is a zero word at the end of each code cell. That requires that the code-generator never generates a full word of zeros anywhere else in the code, or at least not on a word boundary, so there are a few constraints on the code to ensure that is the case.
Removing the code from the normal heap and using a separate, non-garbage collected area means that these contortions are no longer needed. The length word is retained since it is needed when copying the code to an object file or saved state and when loading from a saved state.
The other reason for making the change is that having the code cells in the normal heap requires the heap to be given read, write and execute permissions. This is a problem for security and I was concerned that a future operating system update might ban the use of both write and execute permissions on the same area of memory. I think this is already an issue with SELinux. Using a separate "code" area avoids this. Although the code area needs to be writable to add new code cells to it there are tricks that can be used to get round this.
It might be possible to use a non-compacting GC on the code area i.e. to mark code cells that are no longer reachable and then reuse the space. That can lead to fragmentation but would reduce the memory leak. In almost all cases a code cell that is reachable will have at least one address in the heap that points at the start. It is, though, possible to construct pathological cases where the only reference is through a return address. We're not going to change the return address since we're not compacting so it might be possible to get round that by assuming that any value on the stack that looks like it might be a return address is a sufficient reason to keep the code cell, even if it is actually an integer.
Sorry that's been rather long but I wanted to put on record the thinking behind the change and maybe get some other ideas.
David
On 17 Oct 2016, at 20:38, Makarius <makarius at sketis.net> wrote:
On 20/09/16 14:15, David Matthews wrote:
... Currently, there is a leak because each top-level expression is compiled down to machine code even though the code is only executed once. My plan is to avoid that by interpreting the top-level rather than fully compiling it.
... I do wonder how the classic LCF-style proof assistants would cope with that, notably ProofPower and HOL4.
Thanks for thinking of us!
The ProofPower model is of incremental development of a database containing both data and functions to process it held in a persistent object store as provided in Poly/ML by the PolyML.SaveState structure. A great deal of the data will comprise syntax trees (proved theorems) that have each been calculated by evaluating a large ML expression (a proof) that is executed just once.
Most users will keep all the source files they used to build a database and will be prepared to rebuild from scratch quite frequently. I suspect that this use case won?t be badly affected by the leak - presumably the leak will be roughly proportional to the size of the proof scripts (and so it will be megabytes rather than gigabytes).
During interactive development of a proof script it is usual to try things out repeatedly with frequent evaluations of minor variants of an attempted proof step until you find one that works. It is ls also perfectly possible for a user to develop a ProofPower database the way people develop spreadsheets and SQL databases, incrementally adding things over a prolonged period of time. It is not clear to me how bad the impact would be in these use cases. Is there a way to find out how much memory is occupied by compiled code? If so, then I could try some experiments to quantify the impact.
On 19 Oct 2016, at 12:34, David Matthews <David.Matthews at prolingua.co.uk> wrote: The new mechanism for handling pieces of code deals with most of the issues apart from the question of garbage collection. I was really trying to get a feeling for how serious this was. From the comments on the mailing list it looks as though this is something that is wanted so I'll try and see if I can find a solution.
And thank you for thinking of us!
If it makes the solution easier, I don?t think there is any need to include garbage collection for code in the on-the-fly garbage collection. I think it would be fine to implement it either as an ML function on its own or built into PolyML.SaveState.saveChild (and friends) and PolyML.export.
Regards,
Rob.
On 20/10/2016 12:14, Rob Arthan wrote:
During interactive development of a proof script it is usual to try things out repeatedly with frequent evaluations of minor variants of an attempted proof step until you find one that works. It is ls also perfectly possible for a user to develop a ProofPower database the way people develop spreadsheets and SQL databases, incrementally adding things over a prolonged period of time. It is not clear to me how bad the impact would be in these use cases. Is there a way to find out how much memory is occupied by compiled code? If so, then I could try some experiments to quantify the impact.
On 19 Oct 2016, at 12:34, David Matthews <David.Matthews at prolingua.co.uk> wrote: The new mechanism for handling pieces of code deals with most of the issues apart from the question of garbage collection. I was really trying to get a feeling for how serious this was. From the comments on the mailing list it looks as though this is something that is wanted so I'll try and see if I can find a solution.
And thank you for thinking of us!
If it makes the solution easier, I don?t think there is any need to include garbage collection for code in the on-the-fly garbage collection. I think it would be fine to implement it either as an ML function on its own or built into PolyML.SaveState.saveChild (and friends) and PolyML.export.
Just one point. There is only a leak for redefinitions of functions while they are in local memory. If you load a saved state, redefine a function in it and then save a new state then the old code will be replaced with the new code, just as before.
The only difficulty is with garbage collecting code that might possibly still be executing. Once it has been written to disc there's no problem.
David
On 20 Oct 2016, at 18:01, David Matthews <David.Matthews at prolingua.co.uk> wrote:
On 20/10/2016 12:14, Rob Arthan wrote:
During interactive development of a proof script it is usual to try things out repeatedly with frequent evaluations of minor variants of an attempted proof step until you find one that works. It is ls also perfectly possible for a user to develop a ProofPower database the way people develop spreadsheets and SQL databases, incrementally adding things over a prolonged period of time. It is not clear to me how bad the impact would be in these use cases. Is there a way to find out how much memory is occupied by compiled code? If so, then I could try some experiments to quantify the impact.
On 19 Oct 2016, at 12:34, David Matthews <David.Matthews at prolingua.co.uk> wrote: The new mechanism for handling pieces of code deals with most of the issues apart from the question of garbage collection. I was really trying to get a feeling for how serious this was. From the comments on the mailing list it looks as though this is something that is wanted so I'll try and see if I can find a solution.
And thank you for thinking of us!
If it makes the solution easier, I don?t think there is any need to include garbage collection for code in the on-the-fly garbage collection. I think it would be fine to implement it either as an ML function on its own or built into PolyML.SaveState.saveChild (and friends) and PolyML.export.
Just one point. There is only a leak for redefinitions of functions while they are in local memory. If you load a saved state, redefine a function in it and then save a new state then the old code will be replaced with the new code, just as before.
That sounds very promising. Is it also the case that space for code that was only generated to calculate the value of a top level binding will be reclaimed when you save state or export an object file? E.g., the code generated for the right-hand side of the following binding.
val fact10 = let fun f i = if i <= 0 then 1 else i * f(i-1) in f 10 end;
If so, then the only one of my use cases that might have a problem is interactive development. I strongly suspect this is not going to be significant in practice. As I said, if there is a way to see how much memory is occupied by compiled code, I could do some experiments to see if my suspicion is correct.
Regards,
Rob.
I've now pushed the first phase of garbage collection of the code area to git master. This marks unreachable code segments and fills them with HLT instructions so that any attempt to execute a segment that is supposedly garbage will produce an illegal instruction fault. I've dealt with return addresses by considering any value that could be an address into a code segment as a potential return address. This works for my previous example. Finding the start of a code segment from a return address is done by having a bitmap to indicate the start address. This is a small overhead but much less than would be needed if we had to cover the whole heap.
The next step is to reuse code space that has been identified as garbage.
David
On 20/10/2016 21:57, Rob Arthan wrote:
That sounds very promising. Is it also the case that space for code that was only generated to calculate the value of a top level binding will be reclaimed when you save state or export an object file? E.g., the code generated for the right-hand side of the following binding.
val fact10 = let fun f i = if i <= 0 then 1 else i * f(i-1) in f 10 end;
If so, then the only one of my use cases that might have a problem is interactive development. I strongly suspect this is not going to be significant in practice. As I said, if there is a way to see how much memory is occupied by compiled code, I could do some experiments to see if my suspicion is correct.
Regards,
Rob.
Makarius,
On 19/09/2016 21:15, Makarius wrote:
I am presently testing Poly/ML 38879127481c and Isabelle 9aed2da07200 and ran into a compiler problem in src/Pure/unify.ML:
Exception- Fail "Exception- InternalError: chooseReg raised while compiling" raised
I've pushed a fix and it no longer fails here. It appears to finish compiling and then hang.
Regards, David
On 22/09/16 16:50, David Matthews wrote:
On 19/09/2016 21:15, Makarius wrote:
I am presently testing Poly/ML 38879127481c and Isabelle 9aed2da07200 and ran into a compiler problem in src/Pure/unify.ML:
Exception- Fail "Exception- InternalError: chooseReg raised while compiling" raised
I've pushed a fix and it no longer fails here. It appears to finish compiling and then hang.
Somehow our ML threads get entangled or deadlocked.
For now, it can be built like this:
isabelle build -o threads=1 Pure
Maybe I should make this the default for the bootstrap: with that already running, it is much easier to work with Isabelle/ML using the Prover IDE.
I will look again later, if I can find anything on my side of the multithreading implementation.
Makarius
On 23/09/2016 17:35, Makarius wrote:
On 22/09/16 16:50, David Matthews wrote:
On 19/09/2016 21:15, Makarius wrote:
I am presently testing Poly/ML 38879127481c and Isabelle 9aed2da07200 and ran into a compiler problem in src/Pure/unify.ML:
Exception- Fail "Exception- InternalError: chooseReg raised while compiling" raised
I've pushed a fix and it no longer fails here. It appears to finish compiling and then hang.
Somehow our ML threads get entangled or deadlocked.
For now, it can be built like this:
isabelle build -o threads=1 Pure
Maybe I should make this the default for the bootstrap: with that already running, it is much easier to work with Isabelle/ML using the Prover IDE.
I will look again later, if I can find anything on my side of the multithreading implementation.
I've run tests with the 64-bit version and also the byte-code interpreted version. They both completed successfully so I suspect a problem with the code-generator. It would be very useful to be able to pin this down. I had a look with gdb at the hanging state and it looks as though there are various threads waiting on a mutex.
Regards, David
On 16/09/16 14:24, David Matthews wrote:
The representation of strings has been simplified. Previously, single character strings were represented by the character itself as a tagged value. This was largely a relic of SML 90 which didn't have a separate char type. That has been removed and all strings are now represented as a vector of bytes. This speeds up string operations since the code no longer has to consider the special case.
We have actually exploited this singleton string representation systematically since 1998, when we explicitly decided against Unicode (which was 16-bit widechar at that time) and adopted a variable-length character representation of so-called "Isabelle symbols". UTF-8 did not exist yet, but today it fits nicely into the same scheme -- it has room for special characters of arbitrary length.
Thus SML90.explode only had to be upgraded to our Symbol.explode, which mostly produces singleton strings and sometimes longer ones. Type char is never used, because it is not sufficient to represent a text character.
Note that today, the variable-length nature of Unicode is generally acknowledged (see http://utf8everywhere.org) and arrays of fixed-length characters considered legacy.
It should be actually easy for Isabelle/ML to cope with the new representation of Poly/ML, using a table of pre-allocated singleton strings. I have done something similar for the Isabelle/Scala version of Symbol.explode, since the JVM is based on legacy UTF-16 widestring representation.
I can do the same for the Isabelle/ML library, but there might be a general benefit if basis/String.sml would do it uniformly in charAsstring and thus in String.str etc.
Instead of creating a new singleton string each time, it would refer to a vector of 256 pre-allocated values.
Makarius
On 19/09/2016 21:54, Makarius wrote:
On 16/09/16 14:24, David Matthews wrote:
The representation of strings has been simplified. Previously, single character strings were represented by the character itself as a tagged value. This was largely a relic of SML 90 which didn't have a separate char type. That has been removed and all strings are now represented as a vector of bytes. This speeds up string operations since the code no longer has to consider the special case.
It should be actually easy for Isabelle/ML to cope with the new representation of Poly/ML, using a table of pre-allocated singleton strings. I have done something similar for the Isabelle/Scala version of Symbol.explode, since the JVM is based on legacy UTF-16 widestring representation.
I can do the same for the Isabelle/ML library, but there might be a general benefit if basis/String.sml would do it uniformly in charAsstring and thus in String.str etc.
Instead of creating a new singleton string each time, it would refer to a vector of 256 pre-allocated values.
That seems a very good idea. I've implemented it.
Regards, David
On 20/09/16 14:16, David Matthews wrote:
Instead of creating a new singleton string each time, it would refer to a vector of 256 pre-allocated values.
That seems a very good idea. I've implemented it.
I've tried to compile this version d7b9234f2793 using configure --enable-intinf-as-int, but it fails towards the end as follows:
./polyimport --intIsIntInf polytemp.txt -I . < ./exportPoly.sml polyimport: pexport.cpp:728: bool PImport::DoImport(): Assertion `ch == '\n'' failed. /bin/bash: line 1: 515 Aborted (core dumped) ./polyimport --intIsIntInf
Makarius
David, Makarius,
I think this patch fixes it:
diff --git a/libpolyml/pexport.cpp b/libpolyml/pexport.cpp index b03b1da..a9ebd2e 100644 --- a/libpolyml/pexport.cpp +++ b/libpolyml/pexport.cpp @@ -158,7 +158,7 @@ void PExport::printObject(PolyObject *p) for (unsigned i = 0; i < ps->length; i++) { char ch = ps->chars[i]; - fprintf(exportFile, "%02x", ch); + fprintf(exportFile, "%02x", ch & 0xff); } } else
Regards,
Rob.
On 20 Sep 2016, at 15:04, Makarius <makarius at sketis.net> wrote:
On 20/09/16 14:16, David Matthews wrote:
Instead of creating a new singleton string each time, it would refer to a vector of 256 pre-allocated values.
That seems a very good idea. I've implemented it.
I've tried to compile this version d7b9234f2793 using configure --enable-intinf-as-int, but it fails towards the end as follows:
./polyimport --intIsIntInf polytemp.txt -I . < ./exportPoly.sml polyimport: pexport.cpp:728: bool PImport::DoImport(): Assertion `ch == '\n'' failed. /bin/bash: line 1: 515 Aborted (core dumped) ./polyimport --intIsIntInf
Makarius
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
David,
After using the patch in my last post to enable me to build poly, ProofPower now builds and behaves as I would expect it to when compiled without ?enable-intinf-as-int. When I compile with ?enable-intinf-as-int I get a problem because the type of PolyML.PrettyBlock has a FixedInt in it where it had just in in version 5.6.
PolyML.PrettyBlock;
val it = fn: FixedInt.int * bool * PolyML.context list * PolyML.pretty list -> PolyML.pretty
Is that intended? I can?t understand why it has happening because the datatype declaration in PRETTYSIG.sml uses int. If this is intended then I can work round it, but it would be nice not to have to.
Regards,
Rob.
On 20 Sep 2016, at 15:50, Rob Arthan <rda at lemma-one.com> wrote:
David, Makarius,
I think this patch fixes it:
diff --git a/libpolyml/pexport.cpp b/libpolyml/pexport.cpp index b03b1da..a9ebd2e 100644 --- a/libpolyml/pexport.cpp +++ b/libpolyml/pexport.cpp @@ -158,7 +158,7 @@ void PExport::printObject(PolyObject *p) for (unsigned i = 0; i < ps->length; i++) { char ch = ps->chars[i];
fprintf(exportFile, "%02x", ch);
fprintf(exportFile, "%02x", ch & 0xff); } } else
Regards,
Rob.
On 20 Sep 2016, at 15:04, Makarius <makarius at sketis.net> wrote:
On 20/09/16 14:16, David Matthews wrote:
Instead of creating a new singleton string each time, it would refer to a vector of 256 pre-allocated values.
That seems a very good idea. I've implemented it.
I've tried to compile this version d7b9234f2793 using configure --enable-intinf-as-int, but it fails towards the end as follows:
./polyimport --intIsIntInf polytemp.txt -I . < ./exportPoly.sml polyimport: pexport.cpp:728: bool PImport::DoImport(): Assertion `ch == '\n'' failed. /bin/bash: line 1: 515 Aborted (core dumped) ./polyimport --intIsIntInf
Makarius
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
Rob, The complication is that the pretty print datatype is essentially shared between the compiler and the compiled code. If the compiler has been compiled with a different length of int from the user code there's the potential for confusion. The compiler builds default print functions for types and I was trying to ensure that both sides had the same understanding.
It may be that it isn't a problem. I'll think about this some more.
The "datatype" definition in PRETTYSIG.sml is commented out. The "constructors" actually use FixedInt.int.
Regards, David
On 20/09/2016 16:40, Rob Arthan wrote:
David,
After using the patch in my last post to enable me to build poly, ProofPower now builds and behaves as I would expect it to when compiled without ?enable-intinf-as-int. When I compile with ?enable-intinf-as-int I get a problem because the type of PolyML.PrettyBlock has a FixedInt in it where it had just in in version 5.6.
PolyML.PrettyBlock;
val it = fn: FixedInt.int * bool * PolyML.context list * PolyML.pretty list -> PolyML.pretty
Is that intended? I can?t understand why it has happening because the datatype declaration in PRETTYSIG.sml uses int. If this is intended then I can work round it, but it would be nice not to have to.
Regards,
Rob.
David,
On 20 Sep 2016, at 19:00, David Matthews <David.Matthews at prolingua.co.uk> wrote:
Rob, The complication is that the pretty print datatype is essentially shared between the compiler and the compiled code. If the compiler has been compiled with a different length of int from the user code there's the potential for confusion. The compiler builds default print functions for types and I was trying to ensure that both sides had the same understanding.
It may be that it isn't a problem. I'll think about this some more.
I have now tried a ProofPower build with a work-around for this and everything is working. It would be nice not to have to work around this. As things stand, the PolyML structure contains a rather odd mixture of int and FixedInt.int when you compile with ?enable-intinf-as-int. E.g., compare prettyPrint with addPrettyPrinter.
The "datatype" definition in PRETTYSIG.sml is commented out. The "constructors" actually use FixedInt.int.
I didn?t notice that!
Regards,
Rob.
Regards, David
On 20/09/2016 16:40, Rob Arthan wrote:
David,
After using the patch in my last post to enable me to build poly, ProofPower now builds and behaves as I would expect it to when compiled without ?enable-intinf-as-int. When I compile with ?enable-intinf-as-int I get a problem because the type of PolyML.PrettyBlock has a FixedInt in it where it had just in in version 5.6.
PolyML.PrettyBlock;
val it = fn: FixedInt.int * bool * PolyML.context list * PolyML.pretty list -> PolyML.pretty
Is that intended? I can?t understand why it has happening because the datatype declaration in PRETTYSIG.sml uses int. If this is intended then I can work round it, but it would be nice not to have to.
Regards,
Rob.
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
Rob, I've had a look at this and it does appear to be more complicated than I thought. The pretty type is embedded fairly deeply in the compiler.
The difference between PolyML.prettyPrint and PolyML.addPrettyPrinter is that PolyML.addPrettyPrinter is an infinitely overloaded function, like PolyML.print and PolyML.makestring, and the FixedInt.int argument here is the the depth of the printing before "..." is used. The compiler builds pretty printing functions for types as they are defined and the code decrements and tests the argument using fixed precision arithmetic. All pretty printers have to use fixed precision arithmetic for the depth for everything to be safe.
PolyML.prettyPrint is defined in the basis library and the int argument is the line width. It uses FixedInt.toInt and FixedInt.fromInt where necessary so works with either fixed or arbitrary precision int.
Regards, David
On 21/09/2016 11:46, Rob Arthan wrote:
David,
On 20 Sep 2016, at 19:00, David Matthews <David.Matthews at prolingua.co.uk> wrote:
Rob, The complication is that the pretty print datatype is essentially shared between the compiler and the compiled code. If the compiler has been compiled with a different length of int from the user code there's the potential for confusion. The compiler builds default print functions for types and I was trying to ensure that both sides had the same understanding.
It may be that it isn't a problem. I'll think about this some more.
I have now tried a ProofPower build with a work-around for this and everything is working. It would be nice not to have to work around this. As things stand, the PolyML structure contains a rather odd mixture of int and FixedInt.int when you compile with ?enable-intinf-as-int. E.g., compare prettyPrint with addPrettyPrinter.
The "datatype" definition in PRETTYSIG.sml is commented out. The "constructors" actually use FixedInt.int.
I didn?t notice that!
Regards,
Rob.
Regards, David
On 20/09/2016 16:40, Rob Arthan wrote:
David,
After using the patch in my last post to enable me to build poly, ProofPower now builds and behaves as I would expect it to when compiled without ?enable-intinf-as-int. When I compile with ?enable-intinf-as-int I get a problem because the type of PolyML.PrettyBlock has a FixedInt in it where it had just in in version 5.6.
PolyML.PrettyBlock;
val it = fn: FixedInt.int * bool * PolyML.context list * PolyML.pretty list -> PolyML.pretty
Is that intended? I can?t understand why it has happening because the datatype declaration in PRETTYSIG.sml uses int. If this is intended then I can work round it, but it would be nice not to have to.
Regards,
Rob.
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
On 20/09/16 16:50, Rob Arthan wrote:
I think this patch fixes it:
diff --git a/libpolyml/pexport.cpp b/libpolyml/pexport.cpp index b03b1da..a9ebd2e 100644 --- a/libpolyml/pexport.cpp +++ b/libpolyml/pexport.cpp @@ -158,7 +158,7 @@ void PExport::printObject(PolyObject *p) for (unsigned i = 0; i < ps->length; i++) { char ch = ps->chars[i];
fprintf(exportFile, "%02x", ch);
fprintf(exportFile, "%02x", ch & 0xff); } } else
It seems to work, but it is unclear to me why.
A few lines before there is the following text:
/* See if the first word is a possible length. The length cannot be one because single character strings are represented by the character. */ /* This is not infallible but it seems to be good enough to detect the strings. */ POLYUNSIGNED bytes = length * sizeof(PolyWord); if (length >= 2 && ...)
It looks like it requires further update.
Makarius
On 20 Sep 2016, at 16:42, Makarius <makarius at sketis.net> wrote:
On 20/09/16 16:50, Rob Arthan wrote:
I think this patch fixes it:
diff --git a/libpolyml/pexport.cpp b/libpolyml/pexport.cpp index b03b1da..a9ebd2e 100644 --- a/libpolyml/pexport.cpp +++ b/libpolyml/pexport.cpp @@ -158,7 +158,7 @@ void PExport::printObject(PolyObject *p) for (unsigned i = 0; i < ps->length; i++) { char ch = ps->chars[i];
fprintf(exportFile, "%02x", ch);
fprintf(exportFile, "%02x", ch & 0xff); } } else
It seems to work, but it is unclear to me why.
char can be either signed or unsigned. When passed to fprintf, it will be promoted to either int or unsigned int (based on char?s signedness). Thus, if char is signed, and ch is 0xf0 (for example), with a 32-bit int it will be sign-extended to 0xfffffff0, and printed as fffffff0, rather than f0. Then whatever is reading it will think that?s 4 bytes (3 ff?s and 1 f0). By anding with 0xff, ch gets promoted to int, but then only the lowest byte is selected, so it will stay as 0xf0.
Perhaps a clearer fix would be to make ch an unsigned char (or cast)?
James
Makarius,
On 20 Sep 2016, at 16:42, Makarius <makarius at sketis.net> wrote:
On 20/09/16 16:50, Rob Arthan wrote:
I think this patch fixes it:
diff --git a/libpolyml/pexport.cpp b/libpolyml/pexport.cpp index b03b1da..a9ebd2e 100644 --- a/libpolyml/pexport.cpp +++ b/libpolyml/pexport.cpp @@ -158,7 +158,7 @@ void PExport::printObject(PolyObject *p) for (unsigned i = 0; i < ps->length; i++) { char ch = ps->chars[i];
fprintf(exportFile, "%02x", ch);
fprintf(exportFile, "%02x", ch & 0xff); } } else
It seems to work, but it is unclear to me why.
A few lines before there is the following text:
/* See if the first word is a possible length. The length cannot be one because single character strings are represented by the character. */ /* This is not infallible but it seems to be good enough to detect the strings. */ POLYUNSIGNED bytes = length * sizeof(PolyWord); if (length >= 2 && ...)
It looks like it requires further update.
I think length is the length of the PolyObject representing the string while ps->length is the length of the string and will have been 1 in the call that caused the problem. I suspect the comments and possibly the test on length are redundant.
Regards,
Rob.
Rob and Makarius, First, thanks Rob for providing the patch. I've pushed the fix. I think the reason it works is that it was printing a full 32 or 64-bit value when the value was negative but the scan that was reading it back in again was expecting only a single byte.
What is actually happening is an attempt to see whether a byte cell is a string or something else, such as an arbitrary precision number. There is no precise way of distinguishing the two and for almost all purposes it doesn't matter. The only reason the code here is trying to distinguish them is that strings have a length word that is in host byte order. If the exporting and importing machines have the same endianness the byte cell could just be exported and imported as byte data but it's possible that the interpreted version could be exported on a little-endian machine and run on a big-endian machine. This is such a rare situation that I would not want to reserve a bit in the header of a cell to get it completely correct. It just needs to be good enough to get the compiler running so it can compile in the basis library.
The comment was misleading so I've removed it.
Regards, David
On 20/09/2016 16:53, Rob Arthan wrote:
Makarius,
On 20 Sep 2016, at 16:42, Makarius <makarius at sketis.net> wrote:
On 20/09/16 16:50, Rob Arthan wrote:
I think this patch fixes it:
diff --git a/libpolyml/pexport.cpp b/libpolyml/pexport.cpp index b03b1da..a9ebd2e 100644 --- a/libpolyml/pexport.cpp +++ b/libpolyml/pexport.cpp @@ -158,7 +158,7 @@ void PExport::printObject(PolyObject *p) for (unsigned i = 0; i < ps->length; i++) { char ch = ps->chars[i];
fprintf(exportFile, "%02x", ch);
fprintf(exportFile, "%02x", ch & 0xff); } } else
It seems to work, but it is unclear to me why.
A few lines before there is the following text:
/* See if the first word is a possible length. The length cannot be one because single character strings are represented by the character. */ /* This is not infallible but it seems to be good enough to detect the strings. */ POLYUNSIGNED bytes = length * sizeof(PolyWord); if (length >= 2 && ...)
It looks like it requires further update.
I think length is the length of the PolyObject representing the string while ps->length is the length of the string and will have been 1 in the call that caused the problem. I suspect the comments and possibly the test on length are redundant.
Regards,
Rob. _______________________________________________ polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml