ML compiler

ldixon＠inf.ed.ac.uk

3 Feb 2009

6:09 p.m.

Hi PolyML people,

I'm interested in trying to shrink the size of my binary/heap file.

I have 18604 lines of library code; the first 6922 lines give a 2.6 MB heap; the next 1200 lines on top of this give a 15 MB heap. I then have 7500 lines of program that push the heap up to 138 MB. (On some Linux systems this becomes over 200MB!)

I'm doing garbage collection before writing the heap (although it sounds like this is now done automatically), and I can't quite tell how I'm getting this rather nasty blow-up in heap size.

I suspect my use of substructures; I often have things like:

    signature N1 ...
    signature N2 = sig structure n1 : N1 ... end

    functor N2Fun(n1 : N1 and ...) : N2 = struct structure n1 = n1; ... end;

    structure n1 : N1 = ...

    structure n2a = functor(n1 : N1 and ...) : N2;
    structure n2b = functor(n1 : N1 and ...) : N2;

is the code for the substructure n1 literally getting copied each time?
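For readers who want to reproduce the pattern, here is a minimal well-formed sketch of the kind of functor application described above. The members `base` and `twice` are hypothetical and stand in for the elided parts; the sketch only shows the shape of the code and says nothing about whether the compiler copies `n1`.

```sml
(* Hypothetical contents: base and twice are placeholders for illustration. *)
signature N1 = sig val base : int end;
signature N2 = sig structure n1 : N1  val twice : int end;

(* The functor re-exports its argument as a substructure of the result. *)
functor N2Fun (structure n1 : N1) : N2 =
struct
  structure n1 = n1
  val twice = 2 * n1.base
end;

structure n1 : N1 = struct val base = 21 end;

(* Two applications of the same functor to the same argument. *)
structure n2a = N2Fun (structure n1 = n1);
structure n2b = N2Fun (structure n1 = n1);
```

Both `n2a.n1` and `n2b.n1` refer to the same source-level structure `n1`; how that is represented in the heap is exactly the question being asked.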

Even with this I find it hard to imagine how I'd generate 120 MB of machine code from 7500 lines of ML...

I noticed some forget functions in the PolyML compiler; is there some clever way to say all I care about is stuff reachable from calling some specified function?

suggestions very welcome,

best, lucas


pclayton＠taz.qinetiq.com

3 Feb

6:24 p.m.

New subject: [polyml] ML compiler

Have you tried applying PolyML.shareCommonData to the function you are exporting, as in the following example from http://www.polyml.org/docs/Version5ReleaseNotes.html ?

    $ poly
    Poly/ML 5.0 Release
    > fun f () = print "Hello World\n";
    val f = fn : unit -> unit
    > PolyML.shareCommonData f;
    val it = () : unit
    > PolyML.export("hello", f);
    val it = () : unit
    > ^D

Phil


David.Matthews＠prolingua.co.uk

4 Feb

11:16 a.m.


Lucas Dixon wrote:

> is the code for the substructure n1 literally getting copied each time?
>
> Even with this I find it hard to imagine how I'd generate 120 MB of machine code from 7500 lines of ML...
>
> I noticed some forget functions in the PolyML compiler; is there some clever way to say all I care about is stuff reachable from calling some specified function?

There are several things you can do. To be honest I don't know why there is such an enormous blow-up.

If your compilation produces a single function and you are exporting it, then all you need, as Phil says, is to run PolyML.shareCommonData on the function before you export it.

If you are building the function but still want to retain it within Poly/ML it's a little more complicated.

If you're using saveState to save the state, it is always a good idea to run PolyML.shareCommonData PolyML.rootFunction; before PolyML.SaveState.saveState.

What this does is go over the whole reachable memory and merge immutable values that are identical. In particular if the compiler has had to copy some signatures it will be able to merge those parts that can be merged and if they are in an existing parent state it will set the pointers to use the already saved data.
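As a rough sketch of that ordering, the two calls can be wrapped in one helper. The helper name `saveCompact` and the file name are hypothetical; only `PolyML.shareCommonData`, `PolyML.rootFunction` and `PolyML.SaveState.saveState` come from the advice above.

```sml
(* Hypothetical helper: merge identical immutable values reachable from the
   root function first, then write the saved state to the given file. *)
fun saveCompact (path : string) : unit =
  ( PolyML.shareCommonData PolyML.rootFunction
  ; PolyML.SaveState.saveState path
  );

(* Hypothetical file name, for illustration only. *)
val () = saveCompact "mystate.polystate";
```

Doing the sharing pass first matters because the saved file reflects whatever is in memory at the moment of the save.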

You can use PolyML.objSize to see the number of words used by everything reachable from a particular value. This could be useful to see if the blow-up is due to code in your function or information Poly/ML is keeping about structures and signatures. For example,

    > val x = ([1,2,3], [1,2,3]);
    val x = ([1, 2, 3], [1, 2, 3]) : int list * int list
    > PolyML.objSize x;
    val it = 21 : int
    > PolyML.shareCommonData x;
    val it = () : unit
    > PolyML.objSize x;
    val it = 12 : int

PolyML.objSize PolyML.rootFunction; can be used to see the total size of everything reachable from the root.
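One way to see how much a sharing pass saves is to take the size before and after. The helper below is only a sketch; the name `shareAndMeasure` is an assumption, and the sizes are in words, as reported by PolyML.objSize.

```sml
(* Hypothetical helper: returns (words before sharing, words after sharing)
   for everything reachable from v. *)
fun shareAndMeasure v =
  let
    val wordsBefore = PolyML.objSize v
    val () = PolyML.shareCommonData v
    val wordsAfter = PolyML.objSize v
  in
    (wordsBefore, wordsAfter)
  end;

(* Measure everything reachable from the root. *)
val (rootBefore, rootAfter) = shareAndMeasure PolyML.rootFunction;
val () =
  print ("words before: " ^ Int.toString rootBefore
         ^ ", after: " ^ Int.toString rootAfter ^ "\n");
```

The same helper can be applied to a single exported function to check whether the bulk of the heap is reachable from it or only from the global name space.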

You can use PolyML.Compiler.structureNames() and similar to get a list of the structures and remove those you don't want with PolyML.Compiler.forgetStructure. You need to do this before saving state.
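A small sketch of that clean-up step follows. The helper `forgetStructures` and the structure `Scratch` are hypothetical; the helper checks each name against PolyML.Compiler.structureNames() first, because the behaviour of forgetStructure on an undeclared name is not described here.

```sml
(* Forget each named structure that is currently declared in the global
   name space; names that are not declared are skipped. *)
fun forgetStructures (names : string list) : unit =
  let
    val declared = PolyML.Compiler.structureNames ()
    fun isDeclared n = List.exists (fn d => d = n) declared
  in
    List.app
      (fn n => if isDeclared n then PolyML.Compiler.forgetStructure n else ())
      names
  end;

(* Hypothetical scratch structure that is no longer needed by name. *)
structure Scratch = struct val x = 1 end;
val () = forgetStructures ["Scratch"];
```

Passing an explicit list, rather than everything structureNames() returns, avoids also forgetting library structures that later top-level commands still need to refer to by name.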

Let us know if this works.

David.

ldixon＠inf.ed.ac.uk

3:46 p.m.


Thanks for the suggestions.

Here are my results:

Sharing common data made a good difference: I had a 141 MB binary, and it gets down to 44 MB.

I tried forgetting signatures, structures, and types, and even values; this made another 24MB of difference - so now I know I had 20MB of structure/signature defs. :)

It turns out that forgetting types and values makes almost no difference.

So now I seem to have 20MB of code left - still much more than I would expect, but perhaps avoiding inlining functors will improve on this...

Now I have a couple of questions...

I was thinking about calling PolyML.shareCommonData on every defined value - would this do more than calling it on PolyML.rootFunction? Or does Poly/ML already ignore unused functions?

What happens to exceptions which are raised after PolyML.Compiler.forgetXXX, where the exception would normally contain type names? Are they also forgotten?

thanks, lucas


ldixon＠inf.ed.ac.uk

5 Feb

6:40 p.m.


Some answers to my own questions...

Lucas Dixon wrote:

> So now I seem to have 20MB of code left - still much more than I would expect, but perhaps avoiding inlining functors will improve on this...

That saves only 1MB, despite the fact that I have many deeply nested functors in my code. Changing the max inline depth also seems to make pretty much no difference to the size of the generated code... so now I still wonder how I got 20MB of compiled code, but at least it's not 200 :)

I'm surprised that shareCommonData was so effective, as I had almost only functions... does shareCommonData share functions that are the same? (David: I think I remember you telling me it did not - but I can't remember why...?)

> I was thinking about calling PolyML.shareCommonData on every defined value - would this do more than on PolyML.rootFunction ? or does polyml already ignore unused functions?

I still don't know about this... I was wondering if it's easy to call shareCommonData for all values... I guess I can write a function that generates a file which, when run, does this... will try that some other time... or does PolyML.rootFunction include all data?

> What happens to exceptions which are raised after PolyML.Compiler.forgetXXX where the exception would normally contain type-names ? are they also forgotten?

It appears that the data associated with exceptions is simply not printed.

best, lucas

David.Matthews＠prolingua.co.uk

6 Feb

9:08 a.m.


Lucas Dixon wrote:

> Some answers to my own questions...
>
> That saves only 1MB, despite the fact that I have many deeply nested functors in my code. Changing the max inline depth also seems to have pretty much no change on the size of generated code.... so now I still wonder how I got 20MB of compiled code, but at least it's not 200 :)

I'd be interested to know why this is so large, and whether this is really code or some other data.

> I'm surprised that shareCommonData was so effective as I had almost only functions... does shareCommonData share functions that are the same? (David: I think I remember you telling me it did not - but I can't remember why...?)

No, it doesn't. The reason is that, on the i386 at least, code segments contain PC-relative calls and jumps. That means that two pieces of code that look the same may well be calling different functions and so can't be merged and equally two pieces that actually call the same function wouldn't appear to be the same.

>> I was thinking about calling PolyML.shareCommonData on every defined value - would this do more than on PolyML.rootFunction ? or does polyml already ignore unused functions?

PolyML.shareCommonData detects any sharing between values that are reachable from its argument. PolyML.rootFunction is the function that is called when Poly/ML starts up and includes references to the global name space. So calling PolyML.shareCommonData PolyML.rootFunction will detect all the possible sharing within declarations you have made. Crucially, it detects sharing within the data structures the compiler uses to represent type-information so it reduces the space needed to store information about structures, signatures and functors.

Calling PolyML.shareCommonData again on particular values won't help. If something is completely unreachable then it will be thrown away by the garbage-collector.

>> What happens to exceptions which are raised after PolyML.Compiler.forgetXXX where the exception would normally contain type-names ? are they also forgotten?
>
> it appears that the data associated with exceptions is simply not printed.

Exception packets contain an identifier for the exception and the name of the exception but do not otherwise contain any type information. When an exception is raised the printing mechanism searches in the global name-space to see if the exception has been declared there or in a global structure. If it finds a matching identifier it uses the type-information to print the arguments to the exception otherwise it just prints the name.
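That a packet carries its own name can be checked with the Basis Library function General.exnName, which needs no type information about the argument. A small sketch, using a hypothetical exception `Oops`:

```sml
(* Hypothetical exception, used only for illustration. *)
exception Oops of int * string;

(* General.exnName returns the name of the exception as a string;
   it does not render the argument. *)
val name = General.exnName (Oops (1, "details"));

val () = print (name ^ "\n");
```

Printing the argument as well is the part that depends on the declaration still being found in the global name space, as described above.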

Regards, David
