First a feature request: could we please have a version of the basic function for adding a string that takes an optional argument indicating how wide the Oppen algorithm should treat the string? At the moment, the infrastructure just seems to count the bytes emitted, and this causes premature line breaks if I'm emitting UTF-8 characters. Of course, if the infrastructure could magically become aware of UTF-8 or wide characters that would solve this particular use-case, but I'd still ask for this feature because I also want to emit byte-sequences that are terminal control codes. (On appropriate terminals, I emit vt100 codes to colour certain substrings, for example.)
Secondly, I'm a bit concerned that the new, recommended approach to pretty-printing requires me to translate into the Pretty structure before anything happens. I'd feel more comfortable if the structure could be produced and consumed lazily so that I don't end up with a large (strictly unnecessary) Pretty structure in memory.
Michael
________________________________
The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.
On Fri, 23 Aug 2013, Michael Norrish wrote:
I'm a bit concerned that the new, recommended approach to pretty-printing requires me to translate into the Pretty structure before anything happens. I'd feel more comfortable if the structure could be produced and consumed lazily so that I don't end up with a large (strictly unnecessary) Pretty structure in memory.
What is "large" here? 64KB? 64MB? 64GB? As far as I know, the original Oppen paper was optimizing for some situation < 64KB. BTW Poly/ML does implicit sharing of subtree structures whenever you run short of heap space.
For my part, I welcome the way David Matthews has brushed up the toplevel pretty printing some time ago, to get away from this implicit stateful streaming. What is particularly nice is PolyML.ContextProperty, which allows to inject uninterpreted proper lists into the tree structure, and pick them up later for user-space interpretation. Thus it is possible to roll-your-own ML pretty printing -- the system merely passes certain tree structures through the toplevel infrastructure.
The only bottle neck encountered here is the 64 MB limit of strings: Isabelle insists in shipping the final result as one blob, and at that point it might crash with exception Size. This is just a historical accident of the Isabelle output functions to produce a single string in the end. It would be better to retain the pretty trees until the data is shipped to the external consumer, i.e. the Isabelle/Scala process that does the physical pretty printing wrt. font metrics and window sizes.
Makarius
On 26/08/2013, at 9:08 PM, Makarius <makarius at sketis.net> wrote:
On Fri, 23 Aug 2013, Michael Norrish wrote:
I'm a bit concerned that the new, recommended approach to pretty-printing requires me to translate into the Pretty structure before anything happens. I'd feel more comfortable if the structure could be produced and consumed lazily so that I don't end up with a large (strictly unnecessary) Pretty structure in memory.
What is "large" here? 64KB? 64MB? 64GB? As far as I know, the original Oppen paper was optimizing for some situation < 64KB. BTW Poly/ML does implicit sharing of subtree structures whenever you run short of heap space.
It's certainly true that I may be wincing unnecessarily about memory. Nonetheless, there will be both a memory *and* time cost with this approach that Haskell, for example, probably wouldn't incur. (I believe Haskell's laziness (and deforestation optimisations?) should in theory ensure that the intermediate data structure doesn't get built or really even traversed. With what we have in SML, we will end up traversing the original term, building something almost isomorphic to that original, and then traversing that copy to actually do the printing. Two traversals, a whole bunch of allocations eventually followed by collections, and then the actual printing. Probably insignificant but it makes me wince.
Anyway, being able to lie about a string's size for the benefit of the Oppen algorithm is definitely more important.
Michael
________________________________
The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.
On Mon, 26 Aug 2013, Michael Norrish wrote:
It's certainly true that I may be wincing unnecessarily about memory. Nonetheless, there will be both a memory *and* time cost with this approach that Haskell, for example, probably wouldn't incur. (I believe Haskell's laziness (and deforestation optimisations?) should in theory ensure that the intermediate data structure doesn't get built or really even traversed. With what we have in SML, we will end up traversing the original term, building something almost isomorphic to that original, and then traversing that copy to actually do the printing. Two traversals, a whole bunch of allocations eventually followed by collections, and then the actual printing. Probably insignificant but it makes me wince.
Maybe. David Matthews could say more about how the Poly/ML runtime system really works, and what matters and what not.
I've myself done a lot of "empirical" performance tuning for Isabelle on Poly/ML in recent years, and the above considerations are rather alien to me now. The real performance happens elsewhere.
Makarius
On Mon, 26 Aug 2013, Michael Norrish wrote:
Anyway, being able to lie about a string's size for the benefit of the Oppen algorithm is definitely more important.
Or telling the true size of the string ...
It is up to David Matthews to comment on that, to say if he wants to add it to Poly/ML. If not, I can show how it is done in user-space, but it needs some digging into the Isabelle/ML sources.
Anyway, here are some more hints about "tree size optimization", or rather the irrelevance of it according to the following isabelle-dev thread from 31-Jul-2013: https://mailmanbroy.informatik.tu-muenchen.de/pipermail/isabelle-dev/2013-Ju...
The main extract is this:
Making a few performance tests recently revealed that fully immutable theory values merely increase the stamp size by a factor of 5, leading to approx. 100000 in main HOL and 300000 in JinjaThreads, with hardly any measurable impact of performance. It leads to approx. 20 table lookups in every kernel inference, just as in ancient times.
So this issue is now on the historic list of "premature optimizations that complicate code", namely that of the kernel. Even just practically, the "stale theory error" has probably wasted more time of tool developers that it saved CPU time.
It other words, the Isabelle inference kernel carries around large immutable tables of "stamps" at the order of 10^5, checks them in every single inference, and updates them in every elementary definition or declaration of theory content. Poly/ML 5.5.0 is able to digest that without problems, and hardly measurable performance impact compared to the "optimized" version that would re-use old stamp collections (basically a linear type implemented manually).
One motivation to get rid of this premature optimization was to make it easier for other prover guys to imitate the explicit theory context of Isabelle: it is very relevant for robustness and performance of the kernel. A bit like a hardware management unit to implement virtual memory, something you take for granted in mainstream computing, while most theorem provers still poke around physically in the context.
Makarius
On 23/08/2013 13:39, Michael Norrish wrote:
First a feature request: could we please have a version of the basic function for adding a string that takes an optional argument indicating how wide the Oppen algorithm should treat the string? At the moment, the infrastructure just seems to count the bytes emitted, and this causes premature line breaks if I'm emitting UTF-8 characters. Of course, if the infrastructure could magically become aware of UTF-8 or wide characters that would solve this particular use-case, but I'd still ask for this feature because I also want to emit byte-sequences that are terminal control codes. (On appropriate terminals, I emit vt100 codes to colour certain substrings, for example.)
Can you explain exactly how you use the resulting "pretty" structure. Do you have your own pretty printer or printing functions? If so you can always use PolyML.ContextProperty to pass your own data around. That's probably the way to do colouring rather than to use control codes within the strings.
Secondly, I'm a bit concerned that the new, recommended approach to pretty-printing requires me to translate into the Pretty structure before anything happens. I'd feel more comfortable if the structure could be produced and consumed lazily so that I don't end up with a large (strictly unnecessary) Pretty structure in memory.
Exactly how big is the Pretty structure likely to be? Is it very likely to survive a garbage collection? If you build the Pretty structure and then immediately display it, it is very likely that it will be collected away. There's then no real difference when compared with processing it lazily. The cost of allocation will be the same either way.
David
On 26/08/2013, at 9:58 PM, David Matthews <David.Matthews at prolingua.co.uk> wrote:
On 23/08/2013 13:39, Michael Norrish wrote:
First a feature request: could we please have a version of the basic function for adding a string that takes an optional argument indicating how wide the Oppen algorithm should treat the string? At the moment, the infrastructure just seems to count the bytes emitted, and this causes premature line breaks if I'm emitting UTF-8 characters. Of course, if the infrastructure could magically become aware of UTF-8 or wide characters that would solve this particular use-case, but I'd still ask for this feature because I also want to emit byte-sequences that are terminal control codes. (On appropriate terminals, I emit vt100 codes to colour certain substrings, for example.)
Can you explain exactly how you use the resulting "pretty" structure. Do you have your own pretty printer or printing functions? If so you can always use PolyML.ContextProperty to pass your own data around. That's probably the way to do colouring rather than to use control codes within the strings.
Well, if I want to exploit the ContextProperty, don't I then have to write my own versions of PolyML.prettyPrint and PolyML.addPrettyPrinter, implementing Oppen all over again? Of course, in the case of addPrettyPrinter, I can't write that at all because of its magic polymorphism. I'm keen to have the standard REPL give me back pretty-printed values, complete with suitable contextual annotations, but I then have to rely on the built-in implementation of Oppen.
Secondly, I'm a bit concerned that the new, recommended approach to pretty-printing requires me to translate into the Pretty structure before anything happens. I'd feel more comfortable if the structure could be produced and consumed lazily so that I don't end up with a large (strictly unnecessary) Pretty structure in memory.
Exactly how big is the Pretty structure likely to be? Is it very likely to survive a garbage collection? If you build the Pretty structure and then immediately display it, it is very likely that it will be collected away. There's then no real difference when compared with processing it lazily. The cost of allocation will be the same either way.
Yes. I think Makarius has persuaded me that I don't need to worry too much about this.
Michael
________________________________
The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.
On 26/08/2013 13:09, Michael Norrish wrote:
On 26/08/2013, at 9:58 PM, David Matthews <David.Matthews at prolingua.co.uk> wrote:
On 23/08/2013 13:39, Michael Norrish wrote:
First a feature request: could we please have a version of the basic function for adding a string that takes an optional argument indicating how wide the Oppen algorithm should treat the string? At the moment, the infrastructure just seems to count the bytes emitted, and this causes premature line breaks if I'm emitting UTF-8 characters. Of course, if the infrastructure could magically become aware of UTF-8 or wide characters that would solve this particular use-case, but I'd still ask for this feature because I also want to emit byte-sequences that are terminal control codes. (On appropriate terminals, I emit vt100 codes to colour certain substrings, for example.)
Can you explain exactly how you use the resulting "pretty" structure. Do you have your own pretty printer or printing functions? If so you can always use PolyML.ContextProperty to pass your own data around. That's probably the way to do colouring rather than to use control codes within the strings.
Well, if I want to exploit the ContextProperty, don't I then have to write my own versions of PolyML.prettyPrint and PolyML.addPrettyPrinter, implementing Oppen all over again? Of course, in the case of addPrettyPrinter, I can't write that at all because of its magic polymorphism. I'm keen to have the standard REPL give me back pretty-printed values, complete with suitable contextual annotations, but I then have to rely on the built-in implementation of Oppen.
You would have to write your own version of PolyML.prettyPrint and your own REPL to use it. Neither of these are very complicated and I can point you at the existing code as a basis. PolyML.print and PolyML.makestring, if you actually use them, would use the original pretty-printer. You would not need your own PolyML.addPrettyPrinter. That just adds a function that generates a "pretty" data structure to a particular type and doesn't care whether the data structure includes your own ContextProperty values.
I can see the advantage of changing PrettyString to include an explicit length; it's just that doing that would involve changes to nearly all the existing pretty-print functions. That's not just in Poly/ML code but any user code as well. It might be simpler to add a new constructor to the pretty datatype. I'll give it some thought but just at the moment I want to get the current version released before making any more changes.
David
On 26/08/13 22:34, David Matthews wrote:
You would have to write your own version of PolyML.prettyPrint and your own REPL to use it. Neither of these are very complicated and I can point you at the existing code as a basis. PolyML.print and PolyML.makestring, if you actually use them, would use the original pretty-printer. You would not need your own PolyML.addPrettyPrinter. That just adds a function that generates a "pretty" data structure to a particular type and doesn't care whether the data structure includes your own ContextProperty values.
Yes, I?d certainly be interested in seeing how to write my own REPL. I?ve long meant to try this as it would allow us to use a custom lexer as well. Please let me know where to look for the relevant docs/sample code. I assume that without an addPrettyPrinter function, this custom REPL would not be extensible in the same way that the default one is. I don?t think this would be a big deal in practice.
I can see the advantage of changing PrettyString to include an explicit length; it's just that doing that would involve changes to nearly all the existing pretty-print functions. That's not just in Poly/ML code but any user code as well. It might be simpler to add a new constructor to the pretty datatype. I'll give it some thought but just at the moment I want to get the current version released before making any more changes.
I agree that you?d want to keep the existing constructor without the size information. With luck, adding a new constructor with explicit size information would be minimally intrusive.
Thanks, Michael
On 27/08/2013 06:39, Michael Norrish wrote:
On 26/08/13 22:34, David Matthews wrote:
You would have to write your own version of PolyML.prettyPrint and your own REPL to use it. Neither of these are very complicated and I can point you at the existing code as a basis. PolyML.print and PolyML.makestring, if you actually use them, would use the original pretty-printer. You would not need your own PolyML.addPrettyPrinter. That just adds a function that generates a "pretty" data structure to a particular type and doesn't care whether the data structure includes your own ContextProperty values.
Yes, I?d certainly be interested in seeing how to write my own REPL. I?ve long meant to try this as it would allow us to use a custom lexer as well. Please let me know where to look for the relevant docs/sample code. I assume that without an addPrettyPrinter function, this custom REPL would not be extensible in the same way that the default one is. I don?t think this would be a big deal in practice.
The REPL is just a few lines in basis/FinalPolyML.sml . It's the readEvalPrint and handledLoop functions. The core boils down to
let val code = PolyML.compiler(readin, [CPNameSpace nameSpace, CPOutStream printOut]) handle Fail s => ( (* Compilation failed *) raise Fail s ) in code () (* Report exceptions in running code. *) handle exn => ( (* Run failed *) raise exn ) end
The real work is in PolyML.compiler. This takes a function that returns the next character in the input and a list of options. It returns a function that, when called, runs the compiled code and performs any side-effects. That includes printing the results. Actually, nearly everything, including printing can be overridden by adding specific options. The current list is in the code in FinalPolyML.sml with comments explaining what the options do and how they default.
User-defined pretty printing is done at a lower level than the REPL. Whenever a type identifier (a "type name" in the Definition) is created the compiler creates at run-time a ref cell and fills it in with a default function that, when called, builds a pretty tree for a value of that type. Type identifiers are typically created by datatype bindings but can also be created by opaque signature matches and functor applications. What addPrettyPrint does is to set the ref for a particular type to some user-defined function. It has to be treated specially by the compiler because it needs to get the run-time ref from type information that is only present at compile-time. If you need to post-process the output from the compiler there are options such as CPOutStream that is used for all output or CPNameSpace that puts values, types etc into the name-space and deals with printing.
Setting this up to do what you want may be a bit of an effort initially but the idea is that the interface to PolyML.compiler should remain stable with any changes being limited to adding new options to allow for additional features.
I can see the advantage of changing PrettyString to include an explicit length; it's just that doing that would involve changes to nearly all the existing pretty-print functions. That's not just in Poly/ML code but any user code as well. It might be simpler to add a new constructor to the pretty datatype. I'll give it some thought but just at the moment I want to get the current version released before making any more changes.
I agree that you?d want to keep the existing constructor without the size information. With luck, adding a new constructor with explicit size information would be minimally intrusive.
OK. It's not trivial because there are two versions of the pretty datatype, one inside the compiler and one in user code, and these have to be kept in step. I'll look into this once the current version is released.
David
On Tue, 27 Aug 2013, Michael Norrish wrote:
I?d certainly be interested in seeing how to write my own REPL.
BTW, here is a worked example of "eval" in Poly/ML 5.5: http://stackoverflow.com/questions/9555790/does-sml-poly-have-a-cl-like-repl...
Makarius