On 01/09/2025 12:20, Phil Clayton wrote:
On 31/08/2025 13:54, vqn wrote:
As far as I can understand, this requires being able to
1. extract free identifiers from a module;
2. compare the content of two interfaces (not just their exported identifiers);
3. link old compiled code to new code it depends on.
While (1) could probably be implemented through namespaces, I'm not sure how to go about (2) and (3), especially since I am only wrapping the compiler API, i.e. I am limited to a single '(source code * compiled env) -> fully compiled and linked code' operation.
Finding the free identifiers when compiling code is fairly easy. This is what PolyML.make does, although it only looks at functors, structures and signatures. There's no reason that other kinds of identifier couldn't be included if required.
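As a rough illustration of the idea only (a toy model in Python, not Poly/ML code): if every identifier lookup made during compilation goes through a name space, then wrapping that name space and recording the lookups that fall through to the enclosing environment yields the free identifiers. The class, its methods and the sample environment below are all hypothetical.

```python
class RecordingNameSpace:
    """Toy name space: local definitions first, then the enclosing environment."""

    def __init__(self, outer):
        self.outer = outer   # environment that existed before this compilation
        self.local = {}      # identifiers defined by the unit being compiled
        self.free = set()    # (kind, name) pairs resolved in the outer environment

    def enter(self, kind, name, value):
        """Record a definition made by the unit being compiled."""
        self.local[(kind, name)] = value

    def lookup(self, kind, name):
        """Resolve an identifier, noting it as free if it comes from outside."""
        key = (kind, name)
        if key in self.local:
            return self.local[key]
        value = self.outer.get(key)
        if value is not None:
            self.free.add(key)
        return value


# Hypothetical pre-existing environment and a unit that defines "Local".
outer_env = {("structure", "List"): "<List>", ("signature", "ORD"): "<ORD>"}
ns = RecordingNameSpace(outer_env)
ns.enter("structure", "Local", "<Local>")
ns.lookup("structure", "Local")   # defined locally, so not free
ns.lookup("structure", "List")    # resolved outside, so free
ns.lookup("signature", "ORD")     # resolved outside, so free
print(sorted(ns.free))
```

Keying on (kind, name) pairs is what would let other kinds of identifier, such as values or types, be tracked in the same way.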
Though for now the problem is more how to properly (de)serialize compiled code so that it can be reused for subsequent compilations. :)
Yes. This issue seems (sort of) related to linking names in old and new code, though it concerns not the object code itself (where types are, presumably, long since eliminated) but the SML types associated with certain entities in the object code. Clearly I'm not familiar with the internals of Poly/ML compilation, but I may take a closer look.
Serialising the result of the compilation and loading the serialised data into a subsequent computation are the difficult parts. When anything is compiled in Poly/ML the result is a graph in memory. Some of this is a data structure that describes the types and/or signatures and some of it is what might be described as the "value". Generally both the "type" and the "value" will involve the addresses of memory cells that were present before this particular computation. These might be the cells that make up the type "int", say, or the cells that make up the "print" function and link to other cells for "stdOut". Once the compilation is complete there's no way to go back from the graph and unpick it to work out which bits came from where.
This presents a problem for serialising if we want to be able to write out only part of the graph and then read it into a subsequent computation. PolyML.export, used to create object files, writes out the whole graph so there's no need to recreate from a partial graph.
It is possible to distinguish cells by whether they came from the executable, say. Newly created cells are created in the local heap but the cells in the executable are permanent and never garbage collected. PolyML.SaveState.saveState writes out new cells to the saved state. The addresses of cells in the parent executable are written as offsets in the parent. There's no way to know anything more about them so it's only possible to read the saved state back into the same executable. PolyML.saveModule does something similar.
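To make the offset scheme concrete, here is a toy model in Python. It is a sketch of the principle only and says nothing about the actual saved-state format: cells are modelled as lists, the function names are hypothetical, and the parent_id check is my own stand-in for the restriction that a saved state can only be read back into the same executable.

```python
import json


def save_state(local_cells, parent_cells, parent_id):
    """Write only the local cells; references to parent cells become offsets."""
    parent_offset = {id(c): i for i, c in enumerate(parent_cells)}
    local_index = {id(c): i for i, c in enumerate(local_cells)}

    def encode(field):
        if isinstance(field, list):                 # a pointer to another cell
            if id(field) in parent_offset:
                return {"parent": parent_offset[id(field)]}
            return {"local": local_index[id(field)]}
        return {"word": field}                      # plain data

    cells = [[encode(f) for f in cell] for cell in local_cells]
    return json.dumps({"parent_id": parent_id, "cells": cells})


def load_state(text, parent_cells, parent_id):
    """Rebuild the local cells, resolving parent offsets against parent_cells."""
    state = json.loads(text)
    if state["parent_id"] != parent_id:
        raise ValueError("saved state belongs to a different executable")
    cells = [[] for _ in state["cells"]]            # allocate first, fill later

    def decode(field):
        if "parent" in field:
            return parent_cells[field["parent"]]
        if "local" in field:
            return cells[field["local"]]
        return field["word"]

    for cell, encoded in zip(cells, state["cells"]):
        cell.extend(decode(f) for f in encoded)
    return cells


parent = [["int"], ["print"]]      # permanent cells in the executable
pair = [parent[0], 42]             # new cell pointing at the parent's "int"
top = [pair, parent[1]]            # new cell pointing at a new cell and at "print"
text = save_state([pair, top], parent, "poly-A")
restored = load_state(text, parent, "poly-A")
```

The saved text holds nothing about the parent cells except their offsets, which is why the offsets are meaningless against any other parent.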
I'm not sure what this implies for CM/MLB since I'm not familiar with them. I can see that you might want to avoid unnecessary recompilation but is it also necessary to avoid duplication of the compiled code? If one module depends on another is the idea to avoid storing the compiled code for the dependencies with it?
David