[Polyml] Re: ML Basis

3 Sep 2025


      On 01/09/2025 15:18, David Matthews wrote:
...
On 01/09/2025 12:20, Phil Clayton wrote:
...
On 31/08/2025 13:54, vqn wrote:
...
As far as I can understand, this requires being able to
1. extract free identifiers from a module;
2. compare the content of two interfaces (not just their exported
identifiers);
3. link old compiled code to new code it depends on.
While (1) could probably be implemented through namespaces, I'm not sure
how to go about (2) and (3), especially since I am only wrapping the
compiler API, i.e. I am limited to a single '(source code * compiled env) ->
fully compiled and linked code' operation.
Finding the free identifiers when compiling code is fairly easy.  This 
is what PolyML.make does, although it only looks at functors, structures 
and signatures.  There's no reason that other kinds of identifier 
couldn't be included if required.
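As a rough illustration of the idea (not a description of how PolyML.make
is actually implemented), lookups can be observed by wrapping the name
space handed to the compiler.  The sketch below assumes the
PolyML.NameSpace.nameSpace record layout of recent Poly/ML releases;
recordingNameSpace is a hypothetical helper, not part of Poly/ML.  Names
declared by earlier top-level declarations in the same source would also
be recorded unless the enter functions were tracked as well.

```sml
(* Hypothetical helper: wrap a name space so that every structure,
   signature and functor lookup is recorded.  Returns the wrapped name
   space and a function listing the names seen so far, in first-seen order. *)
fun recordingNameSpace (ns : PolyML.NameSpace.nameSpace) =
  let
    val seen = ref ([] : string list)
    fun note name =
      if List.exists (fn n => n = name) (!seen) then ()
      else seen := name :: !seen
    (* Record the name, then defer to the original lookup. *)
    fun traced lookup name = (note name; lookup name)
    val wrapped : PolyML.NameSpace.nameSpace =
      { lookupVal    = #lookupVal ns,
        lookupType   = #lookupType ns,
        lookupFix    = #lookupFix ns,
        lookupStruct = traced (#lookupStruct ns),
        lookupSig    = traced (#lookupSig ns),
        lookupFunct  = traced (#lookupFunct ns),
        enterVal     = #enterVal ns,
        enterType    = #enterType ns,
        enterFix     = #enterFix ns,
        enterStruct  = #enterStruct ns,
        enterSig     = #enterSig ns,
        enterFunct   = #enterFunct ns,
        allVal       = #allVal ns,
        allType      = #allType ns,
        allFix       = #allFix ns,
        allStruct    = #allStruct ns,
        allSig       = #allSig ns,
        allFunct     = #allFunct ns }
  in
    (wrapped, fn () => List.rev (!seen))
  end

val (ns, freeNames) = recordingNameSpace PolyML.globalNameSpace
(* Compile with PolyML.compiler (getChar, [PolyML.Compiler.CPNameSpace ns]) ()
   and call freeNames () afterwards to see which names were looked up. *)
```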
...
...
Though for now the problem is more how to properly (de)serialize
compiled code so that it can be reused for subsequent compilations. :)
Yes.  This issue seems (sort of) related to linking names in old and 
new code, though not for the object code itself (where types are, 
presumably, long since eliminated) but for the SML types associated with 
certain entities in the object code.  Clearly I'm not familiar with 
the internals of Poly/ML compilation but I may take a closer look.
Serialising the result of the compilation and loading the serialised 
data into a subsequent computation are the difficult part.  When 
anything is compiled in Poly/ML the result is a graph in memory.  Some 
of this is a data structure that describes the types and/or signatures 
and some of it is what might be described as the "value".  Generally 
both the "type" and the "value" will involve the addresses of memory 
cells that were present before this particular computation.  These might 
be the cells that make up the type "int", say, or the cells that make up 
the "print" function and link to other cells for "stdOut".  Once the 
compilation is complete there's no way to go back from the graph and 
unpick it to work out which bits came from where.
This presents a problem for serialising if we want to be able to write 
out only part of the graph and then read it into a subsequent 
computation.  PolyML.export, used to create object files, writes out the 
whole graph so there's no need to recreate from a partial graph.
It is possible to distinguish cells by whether they came from the 
executable, say.  Newly created cells are created in the local heap but 
the cells in the executable are permanent and never garbage collected. 
PolyML.SaveState.saveState writes out new cells to the saved state.  The 
addresses of cells in the parent executable are written as offsets in 
the parent.  There's no way to know anything more about them so it's 
only possible to read the saved state back into the same executable. 
PolyML.saveModule does something similar.
Thank you for the high-level explanation - very helpful.
...
I'm not sure what this implies for CM/MLB since I'm not familiar with 
them.  I can see that you might want to avoid unnecessary recompilation 
but is it also necessary to avoid duplication of the compiled code?
I would have thought it is necessary to avoid duplication of code where 
mutable state is involved but perhaps I have misunderstood.  Still, I 
doubt the performance of CM could be matched if binary files contain 
multiple copies of the same code, so it is probably necessary, more so 
for large code bases where this is useful.  (Also, I think users would 
expect incremental compilation to be an optimization, giving something 
equivalent to full compilation although not identical due to e.g. 
loading modules in a different order.)
...
If one module depends on another, is the idea to avoid storing the 
compiled code for the dependencies with it?
Yes, because this wouldn't scale up for large code bases.  In my case, 
the final binary (heap) from 32-bit SML/NJ is 45 MB and there are 
hundreds of modules.
Considering solutions to support cut-off incremental recompilation for 
MLB files, I wondered whether a checkpoint mechanism could allow only 
cells introduced after the checkpoint is declared to be written out to a 
file.  References to cells in the base executable would be stored as 
offsets, as currently done, but references to other cells created before 
the checkpoint would not be stored as offsets but as ML names and types, 
along with their constructor and infix status.  The thinking is that 
such a file could be loaded on top of new code that provided the same ML 
names and types with the same constructor/fixity status.  This would 
introduce a slight overhead when a module is compiled for the first time 
but would decrease subsequent compilation time.
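To make the proposal a little more concrete, here is a toy model in
plain SML.  Everything in it (entity, reference, environment, resolve,
Mismatch) is hypothetical and only models the idea; none of it is an
existing Poly/ML interface, and types are represented as text purely for
simplicity.

```sml
(* How an entity created before the checkpoint is described in the file. *)
type entity =
  { name : string,                 (* ML long identifier, e.g. "A.get" *)
    ty : string,                   (* its type, as text for simplicity *)
    isConstructor : bool,
    infixStatus : string option }  (* e.g. SOME "infix 6"; NONE if nonfix *)

(* A reference stored in the file. *)
datatype reference =
    BaseOffset of int     (* cell in the base executable, as done currently *)
  | External of entity    (* cell created before the checkpoint *)

(* What the loading session provides: each entity with its cell address. *)
type environment = (entity * int) list

exception Mismatch of string

(* Turn a stored reference into an address in the loading session.
   Loading fails if a name is missing or its description has changed. *)
fun resolve (base : int, env : environment) (r : reference) : int =
  case r of
    BaseOffset off => base + off
  | External e =>
      (case List.find (fn (e' : entity, _) => #name e' = #name e) env of
         NONE => raise Mismatch (#name e ^ " is not provided")
       | SOME (e', addr) =>
           if e' = e then addr
           else raise Mismatch (#name e ^ " has changed"))

(* Usage: the new code provides A.get with the same type, so it resolves. *)
val aGet = { name = "A.get", ty = "unit -> int",
             isConstructor = false, infixStatus = NONE }
val env = [(aGet, 4096)]
val addr = resolve (0, env) (External aGet)
```

If the new code provided A.get at a different type, resolve would raise
Mismatch and the cached file would have to be recompiled instead.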
Currently vqn is trying to get a simpler incremental compilation scheme 
to work:
...
I have been trying to implement incremental compilation by caching
compiled .mlb files (i.e. compiler namespaces) and exporting and reimporting them
through {save,load}ModuleBasic.
Roughly speaking, an MLB file (http://mlton.org/MLBasis) defines a basis 
in terms of a list of SML files and other MLB files, evaluated in order. 
An MLB file is evaluated only once, so multiple references to the same 
MLB file reuse the result of its evaluation.  I think this requires 
{save,load}Module and their basic variants to work hierarchically but I 
don't see how this is supported.  I am guessing a saved module has its 
own copy of every dependency not in the (immutable) executable.  This 
appears to be an issue for e.g. mutable state, as shown in the example 
below.  (Note that `loadModule` seems to fail for Poly/ML built with 
compact32bit, so a non-compact32bit version is required.)  Is there a 
way to make {save,load}Module give the expected behavior below?
Phil
(* Suppose we have a module A with state and a module B that depends on 
A. *)
structure A =
   struct
     val r = ref 0
     fun set x = r := x
     fun get () = ! r
   end
structure B =
   struct
     fun get () = A.get () + 1
   end
;
A.set 5;
A.get ();  (* expect 5, ok *)
B.get ();  (* expect 6, ok *)
PolyML.SaveState.saveModule ("/tmp/a", {sigs = [], structs = ["A"], 
functors = [], onStartup = NONE});
PolyML.SaveState.saveModule ("/tmp/b", {sigs = [], structs = ["B"], 
functors = [], onStartup = NONE});
(* **** Fresh Poly/ML session **** *)
PolyML.SaveState.loadModule "/tmp/a";
PolyML.SaveState.loadModule "/tmp/b";
A.set 10;
A.get ();  (* expected 10, ok *)
B.get ();  (* expected 11, got 6: module B not using the same state as A! *)
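
If a saved module really does carry its own copy of every reachable cell
outside the executable, the result above follows directly.  The plain SML
model below imitates that copying for a single ref cell; copyCell is a
hypothetical stand-in for saving and reloading, and no Poly/ML interface
is used.

```sml
(* Hypothetical stand-in for save followed by load: the cell is copied,
   so the result is a distinct cell with the same contents. *)
fun copyCell (r : int ref) : int ref = ref (!r)

val r = ref 0
val () = r := 5                      (* A.set 5 in the first session *)

val cellInA = copyCell r             (* copy stored with module A *)
val cellInB = copyCell r             (* separate copy stored with module B *)

(* Fresh session: A and B now refer to different cells. *)
val () = cellInA := 10               (* A.set 10 *)
val fromA = !cellInA                 (* A.get () gives 10 *)
val fromB = !cellInB + 1             (* B.get () gives 6, not 11 *)
```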
