Let me start with a disclaimer and explanation. I wanted to have font metrics for a set of Unicode fonts available to some SML code. The metrics are extracted from the font by a cascade of programs and I already had code to inject them into C projects by creating files that went static font_info[] = { ... ... ... }; with MANY lines of simple static data. Doing things that way puts the metric data instantly available within my program. I modified things to create the same in SML and ended up with val metrics = Vector.from List [ Vector.fromList [a,b,c,d,e,f], Vector.fromList [a,b,c,d,e,f], .. ]; with around 10000 entries in the top level list. This was simple to do and in particular simpler both now and for when/if the code gets used by people. In particular simpler than anything that has SML code that reads the metric data from a resource file. And I am only interested in the metrics I have right now and am not concerned about future expansion there.
Processing the input there works, but takes around 10 minutes of a reasonably fast desktop machine. That felt pretty bad.
If I change the code to go (as it were) [[x,x,x,x] @ [x,x,x,x] @ [x,x,x,x]] at the top level with say 200 "x"s in each segment and 50 segments then the code is processed in more like 10 seconds rather than 10 minutes.
This feels as if it is probably a really easy case where some aspect of processing lists in the input has been coded with quadratic growth rate in a way that is probably not necessary (the fact that the version segmented using "@" gets handled faster reassures me of that).
A really easy demo if this is just val l = length [ Vector.fromList [0], ... ]; with 10000 entries in the list.
I understand very happily that this is pathological code that nobody would ever write by hand and that as a way of handling data can be viewed as inflexible and crude. I am also happy that I can work round it using "@" and so it is not holding me up - but it felt as if it might be worth reporting and my expectation is that a very small change to the code would address it. I had looked and the code getList in SKIPS_.ML might have been the issue, but when I recode that to build the list backwards and then reverse at the end that does not unclog things, so the issue is at least one step deeper.
Arthur
I suspect that this is a problem with the intermediate code optimiser. List cells are converted into tuples (pairs) and there are lots of cases where it is important to optimise tuples. Unfortunately this case isn't one of them and it would be better to leave it alone.
Thanks for reporting this. I'm not sure when I'll have a chance to look at this in detail.
David
On 15/07/2016 10:45, Arthur Norman wrote:
Let me start with a disclaimer and explanation. I wanted to have font metrics for a set of Unicode fonts available to some SML code. The metrics are extracted from the font by a cascade of programs and I already had code to inject them into C projects by creating files that went static font_info[] = { ... ... ... }; with MANY lines of simple static data. Doing things that way puts the metric data instantly available within my program. I modified things to create the same in SML and ended up with val metrics = Vector.from List [ Vector.fromList [a,b,c,d,e,f], Vector.fromList [a,b,c,d,e,f], .. ]; with around 10000 entries in the top level list. This was simple to do and in particular simpler both now and for when/if the code gets used by people. In particular simpler than anything that has SML code that reads the metric data from a resource file. And I am only interested in the metrics I have right now and am not concerned about future expansion there.
Processing the input there works, but takes around 10 minutes of a reasonably fast desktop machine. That felt pretty bad.
If I change the code to go (as it were) [[x,x,x,x] @ [x,x,x,x] @ [x,x,x,x]] at the top level with say 200 "x"s in each segment and 50 segments then the code is processed in more like 10 seconds rather than 10 minutes.
This feels as if it is probably a really easy case where some aspect of processing lists in the input has been coded with quadratic growth rate in a way that is probably not necessary (the fact that the version segmented using "@" gets handled faster reassures me of that).
A really easy demo if this is just val l = length [ Vector.fromList [0], ... ]; with 10000 entries in the list.
I understand very happily that this is pathological code that nobody would ever write by hand and that as a way of handling data can be viewed as inflexible and crude. I am also happy that I can work round it using "@" and so it is not holding me up - but it felt as if it might be worth reporting and my expectation is that a very small change to the code would address it. I had looked and the code getList in SKIPS_.ML might have been the issue, but when I recode that to build the list backwards and then reverse at the end that does not unclog things, so the issue is at least one step deeper.
Arthur _______________________________________________ polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
Thank you. So far as I am concerned this is an oddity to pass on to you rather than a show-stopper. Even if I had not modified my code to parse/compile faster I could survive even an overight run to deal with the big table I have because I can then use saveState to make a snapshot, and I sort of only have to do that about once. So stick it somewher eon your list but do not panic about it! Arthur
On Sat, 16 Jul 2016, David Matthews wrote:
I suspect that this is a problem with the intermediate code optimiser. List cells are converted into tuples (pairs) and there are lots of cases where it is important to optimise tuples. Unfortunately this case isn't one of them and it would be better to leave it alone.
Thanks for reporting this. I'm not sure when I'll have a chance to look at this in detail.
David
Hello,
On 7/15/16 11:45 AM, Arthur Norman wrote:
A really easy demo if this is just val l = length [ Vector.fromList [0], ... ]; with 10000 entries in the list.
I tried this. With 1000 entries, this is evaluated in about 17s on my old laptop.
But the following, equivalent, returns almost instantly:
val l = length (map Vector.fromList [ [0], ... ]);
I'll let David find out what happens, but it looks like building lists from their elements is very slow if the elements are created by application of some function.
Bernard.