I have recently downloaded the PolyML 5.2.1 archive and built it on my 4GB Vista x64 SP2 box, feeding the supplied Visual Studio workspace to my VS 2008 SP1 dev environment.
This went quite well, with only 2 build errors and one execution problem stopping the entire bootstrapping process! :)
The 2 errors are just from using the newer dev environment: the _osver variable is no longer available. So, the 2 usages (line 1434 in basicio.cpp and line 168 in process_env.cpp) need to change. As a minimal change, I would suggest
if (_osver & 0x8000) => if (GetVersion() & 0x80000000)
as something that will still work in older and newer environments.
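For illustration, the replacement test can be exercised standalone. Note that fakeGetVersion below is a stub invented for this sketch; the real GetVersion() comes from <windows.h>, and only the bit test itself is the proposed change:

```cpp
#include <cstdint>

// Stub standing in for the Win32 GetVersion() call (normally from
// <windows.h>). On Windows 9x/Me the high bit of the returned DWORD is
// set; on NT-based systems it is clear. The values are hypothetical.
static uint32_t fakeGetVersion(bool win9x)
{
    return win9x ? 0xC0000A04u : 0x0A280105u;
}

// The proposed replacement test, `GetVersion() & 0x80000000`, expressed
// as a predicate over the version DWORD:
static bool isWindows9x(uint32_t version)
{
    return (version & 0x80000000u) != 0;
}
```

The point of the mask change is that GetVersion() reports the 9x/NT distinction in bit 31 of a 32-bit value, where the old CRT _osver used bit 15.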
The execution issue happens early in the "Custom Build Step" use of PolyImport.exe: during startup, the polymain(...) function obtains the size of installed physical memory, and then defaults to using half of it.
IMHO, this logic could use a new look, given modern system memory levels (4GB and up is becoming common, especially in today's dev environments)... in any case, the first actual call to VirtualAlloc is for something like 1.5 GB, and fails. Since I was willing to believe that finding that much contiguous virtual address space could be hard - or maybe the demand to commit all of it was the actual cause - I just forcibly cut the memory usage back to 256M, and the rest of the build proceeded flawlessly, creating a functioning PolyML system.
I built all 4 of the pre-defined targets - the only one that appeared seriously slow was the "IntDebug". ;)
So, possibly adding some logic to (in the absence of command-line or other override) "cap" the default memory usage at 256 MB or 512 MB might be in order?
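A minimal sketch of what such a cap might look like (defaultHeapSize and its parameters are illustrative names for this sketch, not Poly/ML's actual code; a -H-style override would simply bypass it):

```cpp
#include <cstddef>

const size_t MB = 1024 * 1024;

// Hypothetical sizing rule: default to half of physical memory, but
// never more than a fixed cap. A command-line option such as -H would
// override this default entirely.
size_t defaultHeapSize(size_t physicalMemory, size_t cap = 256 * MB)
{
    size_t half = physicalMemory / 2;
    return half < cap ? half : cap;
}
```

On a 4 GB machine this yields the 256 MB cap rather than a 2 GB request, while a 256 MB machine still gets half of its real memory.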
Robert
Robert, Thanks for your comments. I'm replying to your comments about memory size separately since that applies as much to Unix as to Windows.
Robert Roessler wrote:
The 2 errors are just from using the newer dev environment: the _osver variable is no longer available. So, the 2 usages (line 1434 in basicio.cpp and line 168 in process_env.cpp) need to change. As a minimal change, I would suggest
if (_osver & 0x8000) => if (GetVersion() & 0x80000000)
as something that will still work in older and newer environments.
Thanks, I've made that change. You can test it in SVN (I use TortoiseSVN on Windows). Actually, the test for _osver was just to provide compatibility with Win 95/98/Me and that's probably no longer relevant.
I built all 4 of the pre-defined targets - the only one that appeared seriously slow was the "IntDebug". ;)
Well, as I'm sure you realised, this builds the version of Poly/ML that interprets a byte-code rather than using native machine code and builds the interpreter without optimisation. So...
Regards, David
David Matthews wrote:
Robert, Thanks for your comments. I'm replying to your comments about memory size separately since that applies as much to Unix as to Windows.
Robert Roessler wrote:
The 2 errors are just from using the newer dev environment: the _osver variable is no longer available. So, the 2 usages (line 1434 in basicio.cpp and line 168 in process_env.cpp) need to change. As a minimal change, I would suggest
if (_osver & 0x8000) => if (GetVersion() & 0x80000000)
as something that will still work in older and newer environments.
Thanks, I've made that change. You can test it in SVN (I use TortoiseSVN on Windows). Actually, the test for _osver was just to provide compatibility with Win 95/98/Me and that's probably no longer relevant.
Thanks for the quick update - I have built from the SVN trunk (including hacking the memsize value, of course) without incident - yes, I am an old Tortoise{CVS,SVN} user myself. ;)
I am the author of the OCaml plugin for the Scintilla editing component, and would like to adapt it for Poly/ML (ML97)... while the changes needed are largely obvious (different keywords, changed/added token types, etc), I have two really basic questions:
1) where is a good place to look for an official ML97 grammar? Actually, I would like to be able to easily distinguish between that and the apparent(?) superset that Poly/ML accepts (I say this because when browsing the [S]ML sources, there seem to be anachronisms present - although this could just be me learning to read ML97 vs OCaml ;) ).
2) what is/are the preferred file extension(s)? The main ones in OCaml are [of course] .ml for source and .mli for interface files, but it isn't clear how this plays out for Poly etc...
After this, I may well be interested in the IDE work mentioned for 5.3 (if there is still anything to do) - I have read the "IDE Communications Protocol" doc, so that at least suggests where you may be going on this.
Robert Roessler
Robert Roessler wrote:
... After this, I may well be interested in the IDE work mentioned for 5.3 (if there is still anything to do) - I have read the "IDE Communications Protocol" doc, so that at least suggests where you may be going on this.
That didn't come out right - I am not suggesting that you need my help for your 5.3 release! :|
I do think the approach described for communications between an IDE and the Poly/ML toplevel looks interesting, particularly since a simple IDE I built for OCaml always struggled in the absence of clearly defined paths for this sort of integration.
Robert
Robert Roessler wrote:
I am the author of the OCaml plugin for the Scintilla editing component, and would like to adapt it for Poly/ML (ML97)... while the changes needed are largely obvious (different keywords, changed/added token types, etc), I have two really basic questions:
That sounds really interesting and if you're doing that I'd be happy to provide any help you need.
The main focus for 5.3 is in improving various aspects of the user interface. That has included improving error messages and changing various aspects of printing values. A major change, though, is in providing mechanisms for development environments to hook into Poly/ML in order to be able to provide users with much more information about their programs. I've been working on a short project at Edinburgh University with Lucas Dixon to see if we can get a little way with this. Much of it will have to wait until we have time and money to do it properly but we've managed to get some way with it in the last few months.
Lucas has been working on a jEdit plug-in while I've been working on adding this IDE support to my "MLShell" mini-IDE for Windows. The latter isn't (currently) open-source so I'm not distributing it at the moment, but the jEdit plug-in is in the Poly/ML SVN repository. In either of these it's possible to compile some ML source code and then browse the source and ask for the type of an expression or go to the declaration of an identifier. This works even if there are type errors in the ML source code that prevent it from fully compiling; it just needs to parse.
The hooks have been designed to work at two different levels. Part of the aim was to allow Isabelle to have access to this information and in Isabelle the ML code is wrapped within Isabelle code. There is therefore a functional interface available that allows ML code to capture an abstracted view of the parse-tree, navigate over it and extract type and declaration location information. Makarius has been involved in this part of the work.
Support for programming directly in ML is provided through the IDE protocol and this is the way that the jEdit plug-in and MLShell work. The IDE starts a background Poly/ML process that can compile ML code and keeps the last parse-tree. The IDE protocol defined in the documentation allows the IDE to extract information from the parse-tree. The user sees only the source code and navigates on that while the IDE and background Poly/ML process map between positions in the source code and nodes in the parse-tree.
We're not yet at the point of releasing this although much of the code is in SVN so it is possible to try it out. A major issue we're struggling with is exactly what constitutes a "project". For other languages and IDEs there's a clear distinction between compilation and run-time. With ML, though, compilation involves execution. Typically, a project will involve calling "use" on various files but "use" is actually being executed. Creating an ML structure also involves execution.
To answer your specific questions:
- where is a good place to look for an official ML97 grammar? Actually, I would like to be able to easily distinguish between that and the apparent(?) superset that Poly/ML accepts (I say this because when browsing the [S]ML sources, there seem to be anachronisms present - although this could just be me learning to read ML97 vs OCaml ;) ).
The official definition of Standard ML is "The Definition of Standard ML (Revised)" by Milner, Tofte, Harper and MacQueen. I don't know whether there is a version of the grammar on the net somewhere. As far as possible Poly/ML accepts only the standard grammar although there may be one or two minor extensions. There are a few cases that are very difficult to parse exactly.
- what is/are the preferred file extension(s)? The main ones in OCaml are [of course] .ml for source and .mli for interface files, but it isn't clear how this plays out for Poly etc...
Frequently ".ML" in upper case is used for ML source although case isn't significant on Windows. ".sml" is also widely used.
Regards, David
On Fri, 19 Jun 2009, David Matthews wrote:
We're not yet at the point of releasing this although much of the code is in SVN so it is possible to try it out. A major issue we're struggling with is exactly what constitutes a "project". For other languages and IDEs there's a clear distinction between compilation and run-time. With ML, though, compilation involves execution. Typically, a project will involve calling "use" on various files but "use" is actually being executed. Creating an ML structure also involves execution.
Even in more mainstream IDEs, the "project management" often does not work as smoothly as it should. For example, I often find myself "rebooting" Netbeans/Scala just to get in sync with external changes to sources or jars. Whatever we will do for Isabelle (which integrates theory sources with ML programs), we intend to do better :-)
Anyway, concerning the most basic "use" functionality in Poly/ML, for us the main question is about global effects. Since Poly/ML already allows the client (Isabelle) to provide an implementation of the ML environment (for toplevel types, vals, structures etc.), the remaining question is how to handle updates on references.
In your own IDE implementations I have seen the use of saved heap images to achieve undoable global state. I wonder if this could be somehow internalized, e.g. heap snapshots as plain ML values?
Anyway, even then I am unsure of what exactly will be required for the Isabelle/ML IDE. Right now, we say that "regular" ML text has to evaluate in a pure manner, without any side effects (toplevel bindings are handled as pure context update). In practice, ref updates are often used for runtime configuration (setting global flags), and that is intentionally not undoable.
Frequently ".ML" in upper case is used for ML source although case isn't significant on Windows. ".sml" is also widely used.
Isabelle/ML always uses .ML; some years ago many people started to use .sml and .sig (e.g. in SML/NJ and Moscow ML).
Makarius
Makarius wrote:
On Fri, 19 Jun 2009, David Matthews wrote:
We're not yet at the point of releasing this although much of the code is in SVN so it is possible to try it out. A major issue we're struggling with is exactly what constitutes a "project". For other languages and IDEs there's a clear distinction between compilation and run-time. With ML, though, compilation involves execution. Typically, a project will involve calling "use" on various files but "use" is actually being executed. Creating an ML structure also involves execution.
Even in more mainstream IDEs, the "project management" often does not work as smoothly as it should. For example, I often find myself "rebooting" Netbeans/Scala just to get in sync with external changes to sources or jars. Whatever we will do for Isabelle (which integrates theory sources with ML programs), we intend to do better :-)
I've not had problems with the IDEs I've used, such as Visual Studio for C++ and Eclipse for Java.
The issue for an IDE for ML is that traditionally there has been a lot more freedom about what an ML file can contain and what it may expect than is true for a C or Java file. A C source file has to be self-contained; the only context is provided by include files. That means the IDE can compile files independently.
The model we've traditionally used for ML, or at least Poly/ML, is that of a sequence of files compiled in a defined order with each file providing some context for its successors. The context generally includes declarations in the global name space but can include other context such as the current working directory. This makes it more difficult for the IDE to know exactly what context to use for a particular source file. The reason this is important is that we want to be able to compile a source file when the user edits it so that the IDE can provide information about internal types and that's only possible if all the top-level declarations that the file expects are available.
Anyway, concerning the most basic "use" functionality in Poly/ML, for us the main question is about global effects. Since Poly/ML already allows the client (Isabelle) to provide an implementation of the ML environment (for toplevel types, vals, structures etc.), the remaining question is how to handle updates on references.
In your own IDE implementations I have seen the use of saved heap images to achieve undoable global state. I wonder if this could be somehow internalized, e.g. heap snapshots as plain ML values?
The current mechanism we're using is that there is a project file that lists the source files and one of those files is marked as the root. That file can be compiled in the standard Poly/ML compiler and it typically contains "use" calls that compile the remainder of the files in some order determined by the user. There is a special "project build" phase that runs this but with a modified version of "use" that saves the state of Poly/ML just before each file is compiled in a project directory. By loading the appropriate saved state the IDE can re-establish the context if the user subsequently starts editing one of the files. Using a saved state means that both declarations and ref values are saved and restored to the appropriate values. The current working directory still needs to be saved and restored separately since that's not in the saved state.
Fundamentally, the issue is how far should the user be required to change existing source code and manner of working in order to be able to use the IDE. Must each file contain just one top-level declaration? Is a file allowed to call OS.FileSys.chDir to change the current directory? Can a file declare its own version of "use" by wrapping up PolyML.compiler and then compiling some files with it? This is less of an issue for new code but there's a lot of existing code around.
Regards, David.
On Sat, 20 Jun 2009, David Matthews wrote:
The model we've traditionally used for ML, or at least Poly/ML, is that of a sequence of files compiled in a defined order with each file providing some context for its successors. The context generally includes declarations in the global name space but can include other context such as the current working directory. This makes it more difficult for the IDE to know exactly what context to use for a particular source file. The reason this is important is that we want to be able to compile a source file when the user edits it so that the IDE can provide information about internal types and that's only possible if all the top-level declarations that the file expects are available.
This traditional ML model of "incremental compilation", i.e. compilation that is intertwined with execution, is exactly what is required for applications like Isabelle. It is one of the reasons why we could never use separate-compilation versions of ML like MLton or MLKit.
An "Isabelle project" (which is called "session" in our terminology) consists of a collection of theory files that may also "use" ML text, either as separate files or embedded into our source format. This works by our own runtime invocation of the Poly/ML compiler.
For our ongoing Isabelle IDE efforts, we essentially assume that ML compilation can be invoked frequently and works more or less in real time (Poly/ML compilation is quite fast, especially compared to the time for checking specifications and proofs in Isabelle).
Fundamentally, the issue is how far should the user be required to change existing source code and manner of working in order to be able to use the IDE. Must each file contain just one top-level declaration? Is a file allowed to call OS.FileSys.chDir to change the current directory? Can a file declare its own version of "use" by wrapping up PolyML.compiler and then compiling some files with it? This is less of an issue for new code but there's a lot of existing code around.
In Isabelle/ML we now more or less assume that there are no chdirs and no side effects on refs. We are lucky enough to be able to afford this in our specific framework, which provides its own ways to manage context data without side effects.
Makarius
Robert Roessler wrote:
The execution issue happens early in the "Custom Build Step" use of PolyImport.exe: during startup, the polymain(...) function obtains the size of installed physical memory, and then defaults to using half of it.
IMHO, this logic could use a new look, given modern system memory levels (4GB and up is becoming common, especially in today's dev environments)... in any case, the first actual call to VirtualAlloc is for something like 1.5 GB, and fails. Since I was willing to believe that finding that much contiguous virtual address space could be hard - or maybe the demand to commit all of it was the actual cause - I just forcibly cut the memory usage back to 256M, and the rest of the build proceeded flawlessly, creating a functioning PolyML system.
This issue comes up with both Unix and Windows. Makarius has just sent me an email about this issue in a virtual machine with Cygwin.
There's a question about what value to use for the initial heap size and it's not clear what the answer is. I'm raising it here in case people have some ideas. The issue is that the larger the heap the less frequently the garbage-collector has to run so in terms of getting performance big is better. This applies at least up to the stage where the heap reaches the size of the physical memory. Once it exceeds the physical memory and has to use the page file/swap space then the cost of garbage collection, at least in terms of real time if not CPU time, can rise dramatically.
The current code gets the physical memory size on the machine, up to a maximum of 4G in 32-bit mode, and sets the default heap size to half of that. If the -H option is given that is used instead.
Generally this will work although there are circumstances in which it may not. In particular, on a shared machine or if there are other processes running Poly/ML there may be insufficient swap space. One possibility might be to have the code try allocating a large heap and progressively reduce the size if that fails.
Regards, David
David Matthews wrote:
Robert Roessler wrote:
The execution issue happens early in the "Custom Build Step" use of PolyImport.exe: during startup, the polymain(...) function obtains the size of installed physical memory, and then defaults to using half of it.
This issue comes up with both Unix and Windows. Makarius has just sent me an email about this issue in a virtual machine with Cygwin. Generally this will work although there are circumstances in which it may not. In particular, on a shared machine or if there are other processes running Poly/ML there may be insufficient swap space. One possibility might be to have the code try allocating a large heap and progressively reduce the size if that fails.
Following up on my last message, I've now updated the code to try to allocate a smaller heap if the requested size fails. Currently, this works whether the heap size is given by -H option or by the default of half the real memory. I've tested it on Windows but not on Unix.
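For readers following along, the progressive-reduction idea might be sketched like this. The names tryAllocate and allocateHeap are illustrative rather than Poly/ML's, and tryAllocate is a stub standing in for the real VirtualAlloc/mmap call:

```cpp
#include <cstddef>
#include <cstdlib>

// Stand-in for the platform allocation call (VirtualAlloc on Windows,
// mmap on Unix); this stub refuses anything above a fixed limit so the
// retry logic below has something to fail against.
static const size_t kLimit = 64u * 1024 * 1024;
static void *tryAllocate(size_t bytes)
{
    return bytes > kLimit ? nullptr : malloc(bytes);
}

// Ask for the requested heap size, halving the request on failure until
// it either succeeds or falls below a minimum useful size. Returns the
// size actually obtained, or 0 if even the minimum could not be had.
static size_t allocateHeap(size_t requested, size_t minimum, void **result)
{
    for (size_t size = requested; size >= minimum; size /= 2)
    {
        void *p = tryAllocate(size);
        if (p != nullptr)
        {
            *result = p;
            return size;
        }
    }
    *result = nullptr;
    return 0;
}
```

With this shape the same fallback applies whether the initial request came from -H or from the half-of-physical-memory default.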
David
David Matthews wrote:
David Matthews wrote:
Robert Roessler wrote:
The execution issue happens early in the "Custom Build Step" use of PolyImport.exe: during startup, the polymain(...) function obtains the size of installed physical memory, and then defaults to using half of it.
This issue comes up with both Unix and Windows. Makarius has just sent me an email about this issue in a virtual machine with Cygwin. Generally this will work although there are circumstances in which it may not. In particular, on a shared machine or if there are other processes running Poly/ML there may be insufficient swap space. One possibility might be to have the code try allocating a large heap and progressively reduce the size if that fails.
Following up on my last message, I've now updated the code to try to allocate a smaller heap if the requested size fails. Currently, this works whether the heap size is given by -H option or by the default of half the real memory. I've tested it on Windows but not on Unix.
This does work now on my 4GB Vista x64 box... in particular, on the case encountered while Poly/ML is bootstrapping itself.
Once the [Release non-interpreted] executable is running, it seems to have 2 "private" memory pools of about 210 MB and 60 MB - with a working set of 10 MB.
WRT your comment above on how you got here, I still question the strategy of grabbing such a huge chunk of memory, i.e., an amount based on a large fraction of the [currently possibly large] physical memory size - especially if all of it is committed at the time of allocation.
Whether from the perspective of being a well-behaved task in a shared execution environment, or even in the "it's all about Poly/ML" case, this could lead to trouble.
The former case (based solely on the size of physical memory) appears to ignore the current memory loading, while even in the latter case, things may not be optimal if you want to run a number of Poly/ML instances.
Perhaps the advantage of Poly's current allocation method (over capping and/or gradual commitment) is simplicity... both of these alternatives, while allowing for better resource sharing, do potentially require more advance planning. As in "can I go with the default [smaller] heaps?" or "what happens if a Poly instance attempts to use delayed commit - but finds itself unable to get the space when it needs it?"
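To make the "gradual commitment" alternative concrete, here is a toy model of reserve-then-commit. In the real Win32 API these steps would be VirtualAlloc calls with MEM_RESERVE and then MEM_COMMIT; the Heap struct and function names here are invented for illustration and are not Poly/ML code:

```cpp
#include <cstddef>

// Toy model: reserving claims address space only; committing is the
// step that can actually fail when memory runs short.
struct Heap
{
    size_t reserved;   // address space claimed up front
    size_t committed;  // portion actually backed by memory
};

static Heap reserveHeap(size_t bytes)
{
    return Heap{bytes, 0};
}

// Commit more of the reservation. In this model it fails only when the
// request exceeds the reservation; in reality a commit can also fail
// for lack of backing store - exactly the "unable to get the space when
// it needs it" case mentioned above.
static bool commitMore(Heap &h, size_t bytes)
{
    if (h.committed + bytes > h.reserved)
        return false;
    h.committed += bytes;
    return true;
}
```

The attraction is that a large contiguous range is claimed cheaply up front while physical memory and page-file space are consumed only as the heap actually grows; the cost is that a later commit can fail at an awkward moment.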
Robert