Philip Clayton wrote:
> I have found a performance issue when using TextIO.StreamIO.input1 to read a functional stream. Looking at gc/non-gc times and using PolyML.profiling, it appears that garbage collection accounts for most of the time. There is some code below to demonstrate with stats that include comparison with SML/NJ.
> The profiling shows that readFromReader in basis/BasicStreamIO.sml is responsible for creating values that are being garbage collected. Looking at this code, I can see various things that would contribute to this garbage collection but nothing that is obviously problematic. Is it simply the case that overheads in the implementation mean that it is not suitable for a large number of small reads?
I think I need to look again at the functional IO part of Poly/ML's basis library.
The idea of functional IO is that a stream should be repeatable, i.e. if a stream, f, has returned some data, then re-reading from f should return the same data. The definition of functional IO in the basis library that I used when implementing this in Poly/ML contained a number of program snippets implying that not just the content but also the way the content was broken down had to be repeatable. So if the stream was read using "input1" to return a single character, then a subsequent call to "input" on that same functional stream must return precisely one character:
val str = TextIO.getInstream (TextIO.openIn "/tmp/abc");
val str = ? : TextIO.StreamIO.instream
TextIO.StreamIO.input1 str;
val it = SOME (#"0", ?) : (TEXT_STREAM_IO.elem * TextIO.StreamIO.instream) option
TextIO.StreamIO.input str;
val it = ("0", ?) : TEXT_STREAM_IO.vector * TextIO.StreamIO.instream
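A self-contained sketch of the repeatability property described above. It assumes a string-backed stream created with TextIO.openString in place of /tmp/abc, and the helper name `first` is hypothetical. Reading twice from the same functional stream f yields the same character, while the stream returned alongside that character continues from the next position.

```sml
(* Hypothetical helper: read one element from a functional stream or fail. *)
fun first s =
  case TextIO.StreamIO.input1 s of
      SOME (c, rest) => (c, rest)
    | NONE => raise Fail "unexpected end of stream"

(* Assumption: a string-backed stream stands in for the file /tmp/abc. *)
val f = TextIO.getInstream (TextIO.openString "0123456789")

(* Two reads from the same value f: reading does not consume f itself. *)
val (c1, rest) = first f
val (c2, _) = first f

(* Reading from rest, the stream returned by the first read, moves on. *)
val (next, _) = first rest
```

Here c1 and c2 are both #"0" and next is #"1".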
However, I think that when the Basis Library book was published many of these examples were left out, and although it doesn't seem to be stated formally, the idea appears to be that only the content needs to be repeatable. So "input" should return a string whose first character is the same as the one returned by "input1", but whose length is unspecified. This seems to be what SML/NJ, at least, does. It looks like the problem in your example is that Poly/ML is building up an enormous stream of single-character elements, and this is overwhelming the storage management.
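A minimal sketch of the content-only reading of repeatability, assuming a string-backed stream from TextIO.openString. It checks only what this interpretation requires: a chunk read with "input" from the same stream starts with the character "input1" returned. The chunk length is deliberately not checked, since it is implementation-dependent.

```sml
(* Assumption: a string-backed stream stands in for a file. *)
val g = TextIO.getInstream (TextIO.openString "0123456789")

(* Read one character from g ... *)
val c =
  case TextIO.StreamIO.input1 g of
      SOME (ch, _) => ch
    | NONE => raise Fail "unexpected end of stream"

(* ... then read a chunk from the very same stream g. *)
val (v, _) = TextIO.StreamIO.input g

(* Content-only repeatability: the chunk must begin with c. *)
val contentAgrees = size v > 0 andalso String.sub (v, 0) = c

(* Not fixed by this interpretation: an implementation that preserves
   element boundaries may return 1 here, others may return more. *)
val chunkLength = size v
```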
Although the basis library defines imperative IO in terms of functional IO, Poly/ML implements imperative IO differently, so it doesn't suffer from these problems.
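For a large number of small reads, a sketch of two possible alternatives, offered as assumptions rather than a documented recommendation: reading character by character through the imperative TextIO.input1, which the paragraph above says avoids the problem in Poly/ML, or keeping functional streams but consuming them in chunks with TextIO.StreamIO.input, which plausibly creates far fewer stream elements than repeated input1 calls. The function names countImperative and countChunks are hypothetical.

```sml
(* Imperative reading: each call advances the shared stream in place. *)
fun countImperative ins =
  let
    fun loop n =
      case TextIO.input1 ins of
          SOME _ => loop (n + 1)
        | NONE => n
  in
    loop 0
  end

(* Functional reading in chunks: input returns "" only at end of stream. *)
fun countChunks s =
  let
    fun loop (s, n) =
      let
        val (v, s') = TextIO.StreamIO.input s
      in
        if size v = 0 then n else loop (s', n + size v)
      end
  in
    loop (s, 0)
  end

(* Assumption: string-backed streams stand in for a large input file. *)
val text = CharVector.tabulate (100000, fn i => Char.chr (Char.ord #"a" + i mod 26))
val n1 = countImperative (TextIO.openString text)
val n2 = countChunks (TextIO.getInstream (TextIO.openString text))
```

Both counts are 100000; the difference lies only in how many intermediate stream values each approach allocates.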
David