Hello,
Over the past few months, I have been working on an implementation of the ML Basis system for Poly/ML. It supports the complete mlb spec and is available both as a library and a cli tool:
https://github.com/vqns/polymlb
On a side note, I have been trying to implement incremental compilation by caching compiled .mlb files (i.e. compiler namespaces) and exporting and reimporting them through {save,load}ModuleBasic. I have however been running into the issue that exporting each .mlb to a different on-disk module causes opaque type mismatches when reimporting them. What would be the reason for that?
I have managed to work around it by exporting everything to the same file, but that results in a fairly large cache, e.g. ~45 MB for smlfmt (1.2 MB executable, 15k LOC). I suppose the reason for that is that I export the basis library as well, but I'm not sure there is a solution for that.
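For illustration, a minimal sketch of the kind of split that seems to trigger it (using saveModule rather than the Basic variant for brevity; paths are illustrative):

(* First session: A and a structure B built on A's opaque type are saved as two separate modules. *)
structure A :> sig type t val mk : int -> t val get : t -> int end =
  struct type t = int fun mk x = x fun get x = x end;
PolyML.SaveState.saveModule ("/tmp/a", {sigs = [], structs = ["A"], functors = [], onStartup = NONE});
structure B = struct val v = A.mk 1 end;
PolyML.SaveState.saveModule ("/tmp/b", {sigs = [], structs = ["B"], functors = [], onStartup = NONE});

(* Fresh session: reload both and try to use them together. *)
PolyML.SaveState.loadModule "/tmp/a";
PolyML.SaveState.loadModule "/tmp/b";
A.get B.v; (* rejected: the A.t in B.v's type is a separate copy from the reloaded A.t *)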
Hi,
This sounds like an interesting project. Have you included it in the Poly/ML software directory at https://www.polyml.org/software/index.php ?
The modules mechanism was added quite a long time ago and as far as I'm aware isn't used much. The original idea was that it would enable a packager to package up Poly/ML along with a set of modules from various sources. This model is quite common for other languages. The problem is that modules can only be imported into the same executable. There is also the problem that you've noted where opaque types in different modules don't match. This comes ultimately from the fact that types are identified by ML references and that the references are different in different modules.
It might be possible to fix the opaque type problem but I'm not sure how useful it would be. I get the impression that most users of Poly/ML build their own executables from source and so would need to compile the modules they need anyway.
Regards, David
On 30/07/2025 22:07, vqn wrote:
Hello,
Over the past few months, I have been working on an implementation of the ML Basis system for Poly/ML. It supports the complete mlb spec and is available both as a library and a cli tool:
https://github.com/vqns/polymlb
On a side note, I have been trying to implement incremental compilation by caching compiled .mlb files (i.e. compiler namespaces) and exporting and reimporting them through {save,load}ModuleBasic. I have however been running into the issue that exporting each .mlb to a different on-disk module causes opaque type mismatches when reimporting them. What would be the reason for that?
I have managed to work around it by exporting everything to the same file, but that results in a fairly large cache, e.g. ~45 MB for smlfmt (1.2 MB executable, 15k LOC). I suppose the reason for that is that I export the basis library as well, but I'm not sure there is a solution for that.
This sounds like an interesting project. Have you included it in the Poly/ML software directory at https://www.polyml.org/software/index.php ?
I just did: https://polyml.org/software/details.php?id=46.
It might be possible to fix the opaque type problem but I'm not sure how useful it would be. I get the impression that most users of Poly/ML build their own executables from source and so would need to compile the modules they need anyway.
My intention was to enable recompilation speed-up during a development cycle by caching modules locally to the project, rather than a system wide cache. Similar to PolyML.make but with on disk persistence so that one would not need to keep the building process running at all times.
On 30/07/2025 22:07, vqn wrote:
Over the past few months, I have been working on an implementation of the ML Basis system for Poly/ML. It supports the complete mlb spec and is available both as a library and a cli tool:
This looks great! Thanks for making this available. I have various comments which I'll report via Github.
On a side note, I have been trying to implement incremental compilation by caching compiled .mlb files (i.e. compiler namespaces) and exporting and reimporting them through {save,load}ModuleBasic. I have however been running into the issue that exporting each .mlb to a different on-disk module causes opaque type mismatches when reimporting them. What would be the reason for that?
I have managed to work around it by exporting everything to the same file, but that results in a fairly large cache, e.g. ~45 MB for smlfmt (1.2 MB executable, 15k LOC). I suppose the reason for that is that I export the basis library as well, but I'm not sure there is a solution for that.
I am interested in incremental compilation as this could allow Poly/ML to be used on a large project which already has MLB files for another compiler, so I would like to see whether I can help get this working.
I have found SML/NJ's CM to be effective at avoiding recompilation. This is not surprising as it is the result of considerable research - see the references in the CM User Manual (https://smlnj.org/doc/CM/new.pdf). This makes a difference in practice: for example, adding a new function to a utilities module on which every other module depends does not cause everything to be rebuilt. Although I have not used MLKit, it appears that its incremental compilation of MLB files would behave similarly, according to the documentation (https://elsman.com/mlkit/mlbasisfiles.html#managing-compilation-and-recompil...). Out of interest, did you consider implementing cut-off incremental recompilation for MLB files, in particular as described in Elsman's paper (https://elsman.com/pdf/sepcomp_tr.pdf)?
Phil
I have found SML/NJ's CM to be effective at avoiding recompilation. This is not surprising as it is the result of considerable research - see the references in the CM User Manual (https://smlnj.org/doc/CM/new.pdf). This makes a difference in practice: for example, adding a new function to a utilities module on which every other module depends does not cause everything to be rebuilt. Although I have not used MLKit, it appears that its incremental compilation of MLB files would behave similarly, according to the documentation (https://elsman.com/mlkit/mlbasisfiles.html#managing-compilation-and-recompil...). Out of interest, did you consider implementing cut-off incremental recompilation for MLB files, in particular as described in Elsman's paper (https://elsman.com/pdf/sepcomp_tr.pdf)?
At a glance, both CM's and MLKit's incremental recompilation seem to rely on the concept of an "exported interface" and only recompile a module when at least one of the free identifiers it depends on belongs to an interface that has changed.
As far as I can understand, this requires being able to
1. extract free identifiers from a module;
2. compare the content of two interfaces (not just their exported identifiers);
3. link old compiled code to new code it depends on.
While (1) could probably be implemented through namespaces, I'm not sure how to go about (2) and (3), especially since I am only wrapping the compiler API, i.e. I am limited to a single '(source code * compiled env) -> fully compiled and linked code' operation.
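For (1), roughly the kind of thing I mean by going through namespaces: wrapping a PolyML.NameSpace.nameSpace so that looked-up module-level names are recorded (the field names are those of the PolyML structure; the recording logic itself is only a sketch):

fun recordingNameSpace (ns : PolyML.NameSpace.nameSpace) =
  let
    (* structures/signatures/functors looked up while compiling against ns *)
    val free = ref ([] : string list)
    fun wrap lookup name = (free := name :: !free; lookup name)
  in
    ( { lookupVal = #lookupVal ns, lookupType = #lookupType ns,
        lookupFix = #lookupFix ns,
        lookupStruct = wrap (#lookupStruct ns),
        lookupSig = wrap (#lookupSig ns),
        lookupFunct = wrap (#lookupFunct ns),
        enterVal = #enterVal ns, enterType = #enterType ns,
        enterFix = #enterFix ns, enterStruct = #enterStruct ns,
        enterSig = #enterSig ns, enterFunct = #enterFunct ns,
        allVal = #allVal ns, allType = #allType ns, allFix = #allFix ns,
        allStruct = #allStruct ns, allSig = #allSig ns, allFunct = #allFunct ns },
      fn () => !free )
  end;

The wrapped namespace would be handed to the compiler in place of the original one, and the second component then reports which names the compiled file actually depended on.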
Though for now the problem is more how to properly (de)serialize compiled code so that it can be reused for subsequent compilations. :)
On 31/08/2025 13:54, vqn wrote:
I have found SML/NJ's CM to be effective at avoiding recompilation. This is not surprising as it is the result of considerable research - see the references in the CM User Manual (https://smlnj.org/doc/CM/new.pdf). This makes a difference in practice: for example, adding a new function to a utilities module on which every other module depends does not cause everything to be rebuilt. Although I have not used MLKit, it appears that its incremental compilation of MLB files would behave similarly, according to the documentation (https://elsman.com/mlkit/mlbasisfiles.html#managing-compilation-and-recompil...). Out of interest, did you consider implementing cut-off incremental recompilation for MLB files, in particular as described in Elsman's paper (https://elsman.com/pdf/sepcomp_tr.pdf)?
At a glance, both CM's and MLKit's incremental recompilation seem to rely on the concept of an "exported interface" and only recompile a module when at least one of the free identifiers it depends on belongs to an interface that has changed.
As far as I can understand, this requires being able to
1. extract free identifiers from a module;
2. compare the content of two interfaces (not just their exported identifiers);
3. link old compiled code to new code it depends on.
While (1) could probably be implemented through namespaces, I'm not sure how to go about (2) and (3), especially since I am only wrapping the compiler API, i.e. I am limited to a single '(source code * compiled env) -> fully compiled and linked code' operation.
I think that is a nice summary of what would be needed. I strongly suspect that the interface provided by the PolyML structure does not allow this to be implemented, which would be a perfectly good reason for not doing it!
Though for now the problem is more how to properly (de)serialize compiled code so that it can be reused for subsequent compilations. :)
Yes. This issue seems (sort of) related to linking names in old and new code, though not for the object code itself (where types are, presumably, long since eliminated) but for the SML types associated with certain entities in the object code. Clearly I'm not familiar with the internals of Poly/ML compilation but I may take a closer look.
Phil
On 01/09/2025 12:20, Phil Clayton wrote:
On 31/08/2025 13:54, vqn wrote:
As far as I can understand, this requires being able to
1. extract free identifiers from a module;
2. compare the content of two interfaces (not just their exported identifiers);
3. link old compiled code to new code it depends on.
While (1) could probably be implemented through namespaces, I'm not sure how to go about (2) and (3), especially since I am only wrapping the compiler API, i.e. I am limited to a single '(source code * compiled env) -> fully compiled and linked code' operation.
Finding the free identifiers when compiling code is fairly easy. This is what PolyML.make does, although it only looks at functors, structures and signatures. There's no reason that other kinds of identifier couldn't be included if required.
Though for now the problem is more how to properly (de)serialize compiled code so that it can be reused for subsequent compilations. :)
Yes. This issue seems (sort of) related to linking names in old and new code, though not for the object code itself (where types are, presumably, long since eliminated) but for the SML types associated with certain entities in the object code. Clearly I'm not familiar with the internals of Poly/ML compilation but I may take a closer look.
Serialising the result of the compilation and loading the serialised data into a subsequent computation are the difficult part. When anything is compiled in Poly/ML the result is a graph in memory. Some of this is a data structure that describes the types and/or signatures and some of it is what might be described as the "value". Generally both the "type" and the "value" will involve the addresses of memory cells that were present before this particular computation. These might be the cells that make up the type "int", say, or the cells that make up the "print" function and link to other cells for "stdOut". Once the compilation is complete there's no way to go back from the graph and unpick it to work out which bits came from where.
This presents a problem for serialising if we want to be able to write out only part of the graph and then read it into a subsequent computation. PolyML.export, used to create object files, writes out the whole graph so there's no need to recreate from a partial graph.
It is possible to distinguish cells by whether they came from the executable, say. Newly created cells are created in the local heap but the cells in the executable are permanent and never garbage collected. PolyML.SaveState.saveState writes out new cells to the saved state. The addresses of cells in the parent executable are written as offsets in the parent. There's no way to know anything more about them so it's only possible to read the saved state back into the same executable. PolyML.saveModule does something similar.
I'm not sure what this implies for CM/MLB since I'm not familiar with them. I can see that you might want to avoid unnecessary recompilation but is it also necessary to avoid duplication of the compiled code? If one module depends on another is the idea to avoid storing the compiled code for the dependencies with it?
David
On 01/09/2025 15:18, David Matthews wrote:
On 01/09/2025 12:20, Phil Clayton wrote:
On 31/08/2025 13:54, vqn wrote:
As far as I can understand, this requires being able to
1. extract free identifiers from a module;
2. compare the content of two interfaces (not just their exported identifiers);
3. link old compiled code to new code it depends on.
While (1) could probably be implemented through namespaces, I'm not sure how to go about (2) and (3), especially since I am only wrapping the compiler API, i.e. I am limited to a single '(source code * compiled env) -> fully compiled and linked code' operation.
Finding the free identifiers when compiling code is fairly easy. This is what PolyML.make does, although it only looks at functors, structures and signatures. There's no reason that other kinds of identifier couldn't be included if required.
Though for now the problem is more how to properly (de)serialize compiled code so that it can be reused for subsequent compilations. :)
Yes. This issue seems (sort of) related to linking names in old and new code, though not for the object code itself (where types are, presumably, long since eliminated) but for the SML types associated with certain entities in the object code. Clearly I'm not familiar with the internals of Poly/ML compilation but I may take a closer look.
Serialising the result of the compilation and loading the serialised data into a subsequent computation are the difficult part. When anything is compiled in Poly/ML the result is a graph in memory. Some of this is a data structure that describes the types and/or signatures and some of it is what might be described as the "value". Generally both the "type" and the "value" will involve the addresses of memory cells that were present before this particular computation. These might be the cells that make up the type "int", say, or the cells that make up the "print" function and link to other cells for "stdOut". Once the compilation is complete there's no way to go back from the graph and unpick it to work out which bits came from where.
This presents a problem for serialising if we want to be able to write out only part of the graph and then read it into a subsequent computation. PolyML.export, used to create object files, writes out the whole graph so there's no need to recreate from a partial graph.
It is possible to distinguish cells by whether they came from the executable, say. Newly created cells are created in the local heap but the cells in the executable are permanent and never garbage collected. PolyML.SaveState.saveState writes out new cells to the saved state. The addresses of cells in the parent executable are written as offsets in the parent. There's no way to know anything more about them so it's only possible to read the saved state back into the same executable. PolyML.saveModule does something similar.
Thank you for the high-level explanation - very helpful.
I'm not sure what this implies for CM/MLB since I'm not familiar with them. I can see that you might want to avoid unnecessary recompilation but is it also necessary to avoid duplication of the compiled code?
I would have thought it is necessary to avoid duplication of code where mutable state is involved but perhaps I have misunderstood. Still, I doubt the performance of CM could be matched if binary files contain multiple copies of the same code, so it is probably necessary, more so for large code bases where this is useful. (Also, I think users would expect incremental compilation to be an optimization, giving something equivalent to full compilation although not identical due to e.g. loading modules in a different order.)
If one module depends on another is the idea to avoid storing the compiled code for the dependencies with it?
Yes, because this wouldn't scale up for large code bases. In my case, the final binary (heap) from 32-bit SML/NJ is 45 MB and there are hundreds of modules.
Considering solutions to support cut-off incremental recompilation for MLB files, I wondered whether a checkpoint mechanism could allow only cells introduced after the checkpoint is declared to be written out to a file. References to cells in the base executable would be stored as offsets, as currently done, but references to other cells created before the checkpoint would not be stored as offsets but as ML names and types, along with their constructor and infix status. The thinking is that such a file could be loaded on top of new code that provided the same ML names and types with the same constructor/fixity status. This would introduce a slight overhead when a module is compiled for the first time but would decrease subsequent compilation time.
Currently vqn is trying to get a simpler incremental compilation scheme to work:
I have been trying to implement incremental compilation by caching compiled .mlb files (i.e. compiler namespaces) and exporting and reimporting them through {save,load}ModuleBasic
Roughly speaking, an MLB file (http://mlton.org/MLBasis) defines a basis in terms of a list of SML files and other MLB files, evaluated in order. An MLB file is evaluated only once, so multiple references to the same MLB file reuse the result of its evaluation. I think this requires {save,load}Module and their basic variants to work hierarchically but I don't see how this is supported. I am guessing a saved module has its own copy of every dependency not in the (immutable) executable. This appears to be an issue for e.g. mutable state, as shown in the example below. (Note that `loadModule` seems to fail for Poly/ML built with compact32bit, so a non-compact32bit version is required.) Is there a way to make {save,load}Module give the expected behavior below?
Phil
(* Suppose we have a module A with state and a module B that depends on A. *)
structure A = struct val r = ref 0 fun set x = r := x fun get () = ! r end
structure B = struct fun get () = A.get () + 1 end ;
A.set 5; A.get (); (* expect 5, ok *) B.get (); (* expect 6, ok *)
PolyML.SaveState.saveModule ("/tmp/a", {sigs = [], structs = ["A"], functors = [], onStartup = NONE});
PolyML.SaveState.saveModule ("/tmp/b", {sigs = [], structs = ["B"], functors = [], onStartup = NONE});
(* **** Fresh Poly/ML session **** *)
PolyML.SaveState.loadModule "/tmp/a"; PolyML.SaveState.loadModule "/tmp/b";
A.set 10; A.get (); (* expected 10, ok *) B.get (); (* expected 11, got 6: module B not using the same state as A! *)
Roughly speaking, an MLB file (http://mlton.org/MLBasis) defines a basis in terms of a list of SML files and other MLB files, evaluated in order.  An MLB file is evaluated only once, so multiple references to the same MLB file reuse the result of its evaluation. I think this requires {save,load}Module and their basic variants to work hierarchically but I don't see how this is supported. I am guessing a saved module has its own copy of every dependency not in the (immutable) executable. This appears to be an issue for e.g. mutable state, as shown in the example below. (Note that `loadModule` seems to fail for Poly/ML built with compact32bit, so a non-compact32bit version is required.) Is there a way to make {save,load}Module give the expected behavior below?
The module mechanism was added to allow for modules to be added to the system but the idea was that they would be independent of each other. They were based on the code that creates object files (PolyML.export) but rather than exporting everything to make a completely independent file a module does not contain data that is present in the parent executable. When they are loaded into memory they are simply copied into the local heap and the system does not record anything about a loaded module.
Over the last few days I've done some experiments to change this and there is a branch on GitHub (DependentModules) that tracks it. In this version loadModule behaves more like a saved state. Loading a module, creating some values that use this and saving a new module creates a new module that is dependent on the first. This means that it is possible to create modules with dependencies and sharing in the way you describe. It also, incidentally, allows sharing of opaque types between modules.
There are, at least at the moment, some restrictions. saveModule creates a new module but this is not recorded in the exporting process. So exporting two modules from the same Poly/ML session will not work as you might expect. The answer is to call saveModule as the last action in a session. Then start a new session, load the module just saved along with any others, add some more declarations and create a new, dependent, module. The dependencies must be loaded explicitly before a dependent module can be loaded. There is no mechanism to load dependencies automatically.
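To make that concrete, the sequence for the earlier A/B example would be something like this (paths are illustrative):

(* Session 1: compile and save A, then end the session. *)
structure A = struct val r = ref 0 fun set x = r := x fun get () = ! r end;
PolyML.SaveState.saveModule ("/tmp/a", {sigs = [], structs = ["A"], functors = [], onStartup = NONE});

(* Session 2: load A, compile B against it and save B, again as the last action. *)
PolyML.SaveState.loadModule "/tmp/a";
structure B = struct fun get () = A.get () + 1 end;
PolyML.SaveState.saveModule ("/tmp/b", {sigs = [], structs = ["B"], functors = [], onStartup = NONE});

(* Session 3: load the dependency explicitly before the dependent module. *)
PolyML.SaveState.loadModule "/tmp/a";
PolyML.SaveState.loadModule "/tmp/b";
A.set 10; B.get (); (* expect 11 now that B refers to A's module rather than a copy *)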
From what you've said I think this should be sufficient for MLB. I would assume that the implementation can keep track of the dependencies. I'd like to come up with a mechanism for loading dependencies automatically for the more general case. This situation is very similar to the way hierarchical saved states work and automatically loading parent states has always been a bit problematic.
David
On 09/09/2025 09:13, David Matthews wrote:
Roughly speaking, an MLB file (http://mlton.org/MLBasis) defines a basis in terms of a list of SML files and other MLB files, evaluated in order.   An MLB file is evaluated only once, so multiple references to the same MLB file reuse the result of its evaluation. I think this requires {save,load}Module and their basic variants to work hierarchically but I don't see how this is supported. I am guessing a saved module has its own copy of every dependency not in the (immutable) executable. This appears to be an issue for e.g. mutable state, as shown in the example below. (Note that `loadModule` seems to fail for Poly/ML built with compact32bit, so a non-compact32bit version is required.) Is there a way to make {save,load}Module give the expected behavior below?
The module mechanism was added to allow for modules to be added to the system but the idea was that they would be independent of each other. They were based on the code that creates object files (PolyML.export) but rather than exporting everything to make a completely independent file a module does not contain data that is present in the parent executable. When they are loaded into memory they are simply copied into the local heap and the system does not record anything about a loaded module.
Over the last few days I've done some experiments to change this and there is a branch on GitHub (DependentModules) that tracks it. In this version loadModule behaves more like a saved state. Loading a module, creating some values that use this and saving a new module creates a new module that is dependent on the first. This means that it is possible to create modules with dependencies and sharing in the way you describe.  It also, incidentally, allows sharing of opaque types between modules.
Thank you for this. I have been doing some simple tests (without MLB files) and it looks promising.
There are, at least at the moment, some restrictions. saveModule creates a new module but this is not recorded in the exporting process. So exporting two modules from the same Poly/ML session will not work as you might expect. The answer is to call saveModule as the last action in a session. Then start a new session, load the module just saved along with any others, add some more declarations and create a new, dependent, module. The dependencies must be loaded explicitly before a dependent module can be loaded. There is no mechanism to load dependencies automatically.
The attached example works even though, in t1.sml, saveModule is immediately followed by loadModule for the module just saved _without_ restarting a new session. Can we avoid restarting the session generally, like this? (This would certainly simplify matters.)
Also, I have noticed something odd. The attached example can be run with
poly < t1.sml
poly < t2.sml
However, if the system call to "sleep 1" is removed from t1.sml, the second command using t2.sml produces the error:
Exception- Fail "Segment already exists" raised
Is there a race condition somewhere?
Phil
On 10/09/2025 23:44, Phil Clayton wrote:
There are, at least at the moment, some restrictions. saveModule creates a new module but this is not recorded in the exporting process. So exporting two modules from the same Poly/ML session will not work as you might expect. The answer is to call saveModule as the last action in a session. Then start a new session, load the module just saved along with any others, add some more declarations and create a new, dependent, module. The dependencies must be loaded explicitly before a dependent module can be loaded. There is no mechanism to load dependencies automatically.
The attached example works even though, in t1.sml, saveModule is immediately followed by loadModule for the module just saved _without_ restarting a new session. Can we avoid restarting the session generally, like this? (This would certainly simplify matters.)
Yes, using loadModule immediately after saveModule should work because it replaces the entries in the name look-up tables with references to the loaded module.
Also, I have noticed something odd. The attached example can be run with
poly < t1.sml
poly < t2.sml
However, if the system call to "sleep 1" is removed from t1.sml, the second command using t2.sml produces the error:
Exception- Fail "Segment already exists" raised
Is there a race condition somewhere?
I haven't tried your specific example but I'm fairly certain that this has to do with the way that module identifiers are currently generated. Each module has to have a different identifier so that if a module depends on others all the dependencies can be sorted out. Currently this just uses a time-stamp and so if two modules are generated in quick enough succession they could have the same time-stamp. Although this has been used for saved states for a long time it isn't the best way to do it and won't work if someone wants reproducible builds. It would be better to create a hash of the contents and use that.
There are a few things like this that will need to be sorted out before this code is ready for wider use. At this stage I really want to be sure that this general scheme will meet the needs of the MLB code.
David
I've been testing the new dependent modules branch and managed to get saving / loading modules working correctly with opaque types.
There are, at least at the moment, some restrictions. saveModule creates a new module but this is not recorded in the exporting process. So exporting two modules from the same Poly/ML session will not work as you might expect. The answer is to call saveModule as the last action in a session. Then start a new session, load the module just saved along with any others, add some more declarations and create a new, dependent, module. The dependencies must be loaded explicitly before a dependent module can be loaded.
That is unfortunately not applicable here, but reloading the module instead of restarting into a new session seems to work.
There is no mechanism to load dependencies automatically.
From what you've said I think this should be sufficient for MLB. I would assume that the implementation can keep track of the dependencies.
Dependencies are already tracked, so loading them before a dependent module is fairly simple.
I have however run into a single problem, which is that a new module seems to depend on all previously loaded modules, even if nothing seems to be shared. Attempting to load a module fails with "Mismatch for existing memory space". Attached is an example in which two modules, each containing a single value, are exported sequentially. Trying to load the second one without the first fails. Interestingly, it works fine if the modules are exported from different threads.
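In outline, the attached program does the following (names and paths are illustrative):

structure A = struct val a = 1 end;
PolyML.SaveState.saveModule ("/tmp/a", {sigs = [], structs = ["A"], functors = [], onStartup = NONE});
PolyML.SaveState.loadModule "/tmp/a"; (* reload rather than restart, as above *)
structure B = struct val b = 2 end;
PolyML.SaveState.saveModule ("/tmp/b", {sigs = [], structs = ["B"], functors = [], onStartup = NONE});

(* Fresh session: loading B on its own fails even though it shares nothing with A. *)
PolyML.SaveState.loadModule "/tmp/b"; (* fails with "Mismatch for existing memory space" *)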
On 13/09/2025 18:20, vqn wrote:
I've been testing the new dependent modules branch and managed to get saving / loading modules working correctly with opaque types.
There are, at least at the moment, some restrictions. saveModule creates a new module but this is not recorded in the exporting process. So exporting two modules from the same Poly/ML session will not work as you might expect. The answer is to call saveModule as the last action in a session. Then start a new session, load the module just saved along with any others, add some more declarations and create a new, dependent, module. The dependencies must be loaded explicitly before a dependent module can be loaded.
That is unfortunately not applicable here, but reloading the module instead of restarting into a new session seems to work.
There is no mechanism to load dependencies automatically.
From what you've said I think this should be sufficient for MLB. I would assume that the implementation can keep track of the dependencies.
Dependencies are already tracked, so loading them before a dependent module is fairly simple.
I have however run into a single problem, which is that a new module seems to depend on all previously loaded modules, even if nothing seems to be shared. Attempting to load a module fails with "Mismatch for existing memory space". Attached is an example in which two modules, each containing a single value, are exported sequentially. Trying to load the second one without the first fails.
As I understand it, an MLB file is compiled in its own, initially empty, namespace. Only the modules for MLB dependencies would be loaded, so the saved module for the MLB file would depend only on the modules of its dependencies. Is the issue that loadModule/saveModule are tracking module dependencies independently of namespaces?
Interestingly, it works fine if the modules are exported from different threads.
For me this still failed but possibly there are race conditions here.
Phil
As I understand it, an MLB file is compiled in its own, initially empty, namespace. Only the modules for MLB dependencies would be loaded, so the saved module for the MLB file would depend only on the modules of its dependencies. Is the issue that loadModule/saveModule are tracking module dependencies independently of namespaces?
Essentially, yes. I could be wrong, but from what I've seen (and tried), it seems to require all previously exported modules, regardless of whether values are actually shared. The program I attached in my previous mail shows the problem: the second module does not load without the first.
Also, my apologies for the delay, but I've finally found the time to push the cache branch on GitHub. For now it is still missing the actual export-based cache, which I'll also push when I can finish cleaning it up.
I've now pushed an update to the DependentModules branch that largely deals with the outstanding issues.
It is no longer necessary to load a module immediately after saving it. saveModule puts the saved module into the same storage as loaded modules.
When a module is saved it is only dependent on other modules if it actually uses something from them. This needs a bit more work since it is currently possible for a module to inadvertently share immutable data particularly if PolyML.shareCommonData has been called.
The signature for a module is now derived from a hash of the contents rather than a time-stamp. It may be a good idea to combine the two to ensure that the signatures are properly unique.
There are some new functions, including PolyML.SaveState.showLoadedModules to list the currently loaded modules and PolyML.SaveState.getModuleInfo to get the signature and dependencies of a module in the file system.
David
My apologies for the empty mail; it would seem that my client somehow did not send my reply.
It is no longer necessary to load a module immediately after saving it. saveModule puts the saved module into the same storage as loaded modules.
When a module is saved it is only dependent on other modules if it actually uses something from them. This needs a bit more work since it is currently possible for a module to inadvertently share immutable data particularly if PolyML.shareCommonData has been called.
The signature for a module is now derived from a hash of the contents rather than a time-stamp.
Thank you, this is great! I have indeed met some extra dependencies, particularly when strings are involved, but it can be fixed by copying the strings, though I am not sure how that holds up if a GC happens at the same time. (By the way, is there a more efficient way than CharVector.tabulate?)
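For reference, by copying I mean something along these lines:

fun copyString s = CharVector.tabulate (String.size s, fn i => String.sub (s, i));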
It may be a good idea to combine the two to ensure that the signatures are properly unique.
I think this is needed. Most signature collisions I've found happened with very small exports; those have gone away since I started copying strings, but I'm still finding a few for completely unrelated values.
I've pushed some updates to the DependentModules branch. Trying to figure out dependencies automatically seemed to be too complicated so now the responsibility has been moved to the caller. That has required some changes to the types of the functions and some new functions. The relevant part of the signature of PolyML.SaveState is now:
type moduleId = WordVector.vector
val saveDependentModule: string * {structs: string list, sigs: string list, functors: string list, onStartup: (unit -> unit) option} * (moduleId * string) list -> moduleId
val saveDependentModuleBasic: string * Universal.universal list * (moduleId * string) list -> moduleId
val saveModule: string * {structs: string list, sigs: string list, functors: string list, onStartup: (unit -> unit) option} -> moduleId
val saveModuleBasic: string * Universal.universal list -> moduleId
val loadModule: string -> moduleId
val loadModuleBasic: string -> Universal.universal list * moduleId
val showLoadedModules: unit -> moduleId list
val releaseModule: moduleId -> unit
val getModuleInfo: string -> moduleId * (moduleId * string) list
The moduleId is the 8-byte signature of the module, generated from a combination of a time-stamp and a hash of the contents.
The main difference from the previous version is that saveModule and saveModuleBasic create independent modules without any dependencies. To create a module with dependencies it is necessary to use saveDependentModule or saveDependentModuleBasic. These have an additional argument which is a list of the moduleIds to be included as dependencies. They must be currently loaded or saved in the current session.
The additional string value for each dependency is intended to be used when the module is loaded. When loadModule or loadModuleBasic is called to load a module which has dependencies, a check is made for each dependency to see whether it has already been loaded. If it has not, the string is used as the argument to a recursive call to loadModule/loadModuleBasic to load the dependency. If the string is empty this will not happen and, since the dependency is not loaded and cannot be located, the attempt to load the dependent module will fail.
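As a small illustration of how these fit together (names and paths are illustrative):

structure A = struct val x = 1 end;
val idA = PolyML.SaveState.saveModule
    ("/tmp/a", {sigs = [], structs = ["A"], functors = [], onStartup = NONE});

structure B = struct fun f () = A.x + 1 end;
val idB = PolyML.SaveState.saveDependentModule
    ("/tmp/b", {sigs = [], structs = ["B"], functors = [], onStartup = NONE},
     [(idA, "/tmp/a")]);

(* In a later session a single call is enough: the (idA, "/tmp/a") entry lets
   loadModule load the dependency first if it is not already present. *)
PolyML.SaveState.loadModule "/tmp/b";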
releaseModule has been added and this moves the module from permanent store into local, garbage-collected memory. It does not remove any structures, signatures or functors from the name space but allows for garbage collection if they are redefined. It can be used to get the same effect as loadModule and saveModule previously had.
I'm hoping this will solve most of the outstanding problems and the requirement to specify the dependencies won't cause problems for the ML Basis code. The mechanism essentially provides an alternative to the long-standing saved state system.
David
That looks great, thank you. I think that should indeed solve everything so far; but it seems that PolyML.export was broken somewhere along the way and now causes Poly/ML to crash (segfault) when calling it. The change happened somewhere between 3b9b3b0 and 3e0f720.
Here's a backtrace from gdb:
#0 CopyScan::ScanAddress(PolyObject**)
#1 CopyScan::ScanAddressAt(PolyWord*)
#2 ScanAddress::ScanAddressesInObject(PolyObject*, unsigned long)
#3 CopyScan::ScanCodeAddressAt(PolyObject**)
#4 ScanAddress::ScanAddressesInObject(PolyObject*, unsigned long)
#5 CopyScan::ScanObjectAddress(PolyObject*)
#6 Exporter::RunExport(PolyObject*)
#7 Processes::BeginRootThread(PolyObject*)
#8 polymain
On 07/10/2025 00:27, vqn wrote:
That looks great, thank you. I think that should indeed solve everything so far; but it seems that PolyML.export was broken somewhere along the way and now causes Poly/ML to crash (segfault) when calling it. The change happened somewhere between 3b9b3b0 and 3e0f720.
Thanks. That should now be fixed. David
Hi,
I have found a few more issues on commit c50fe06, though some may be due to misuse on my part.
1. PolyML.SaveState.loadModuleBasic and getModuleInfo sometimes raise SysErr ENOENT despite the actual file existing and being a valid module. It seems to happen when the given path is absolute, though not always. Bisecting points to 6b4d491.
2. Compiling code that depends on values from an exported module sometimes fails and raises `InternalError: BICEval address not code raised`.
3. Exporting may segfault: `exporter.cpp:1036: unsigned int Exporter::findArea(void*): Assertion `0' failed.` Gdb stacktrace:
Exporter::findArea           exporter.cpp:1036
ELFExport::createRelocation  elfexport.cpp:340
Exporter::createRelocation   exporter.cpp:1023
Exporter::relocateValue      exporter.cpp:1017
Exporter::relocateObject     exporter.cpp:1072
ELFExport::exportStore       elfexport.cpp:750
Exporter::RunExport          exporter.cpp:821
ExportRequest::Perform       exporter.cpp:728
Processes::BeginRootThread   processes.cpp:1416
polymain                     mpoly.cpp:440
main                         polystub.c:42
I unfortunately did not manage to find a simple reproducer for #2 and #3, but those happened while compiling the various programs I use as tests: smlfmt for the internal error, apltail and aplc for the segfault.
I've pushed a couple more fixes which should address these problems.
On 11/10/2025 22:49, vqn wrote:
- PolyML.SaveState.loadModuleBasic and getModuleInfo sometimes raise SysErr ENOENT despite the actual file existing and being a valid module.
Should be fixed by 3145a94.
- Compiling code that depends on values from an exported module sometimes fails and raises `InternalError: BICEval address not code raised`.
I haven't seen this although it could be related to the next problem.
- Exporting may segfault: `exporter.cpp:1036: unsigned int Exporter::findArea(void*): Assertion `0' failed.`
Should be fixed by fb8e56c.
Unless any more problems show up I will merge this branch into master and remove the DependentModules branch.
David
Thanks for the fixes. I can confirm that those errors are indeed gone. I am however now facing an opaque type mismatch when building smlfmt. I have once again been unable to find a smaller reproducer, but it happens both when exporting for the first time and when loading from disk in later runs.
Here's the error:
src/smlfmt.sml:227.0-233.0: error: Type error in function application.
   Function: TabbedTokenDoc.prettyJustComments {ribbonFrac = ribbonFrac, maxWidth = maxWidth, indentWidth = indentWidth, tabWidth = ..., ...} : ?.TabbedTokenDoc.Token.t t -> TabbedTokenDoc.CustomString.t
   Argument: cs : Token.t t
   Reason: Can't unify ?.TabbedTokenDoc.Token.t = token (*Created from opaque signature*)
      with Token.t = token (*Created from opaque signature*)
      (Different type constructors)
If that is of any help:
TabbedTokenDoc is declared in src/prettier-print/TabbedTokenDoc.sml from the TabbedTokenDoc functor (src/base/PrettyTabbedDoc.sml). The Token substructure is passed in as argument to the functor and contains the opened original Token structure (src/ast/Token.sml) as well as a few extra functions.
The original Token structure and the TabbedTokenDoc functor are exported in a first module, the TabbedTokenDoc structure is exported in another, and those two are imported into the final namespace which is used to compile src/smlfmt.sml.
On 17/10/2025 23:23, vqn wrote:
Thanks for the fixes. I can confirm that those errors are indeed gone. I am however now facing an opaque type mismatch when building smlfmt. I have once again been unable to find a smaller reproducer, but it happens both when exporting for the first time and when loading from disk in later runs.
The original Token structure and the TabbedTokenDoc functor are exported in a first module, the TabbedTokenDoc structure is exported in another, and those two are imported into the final namespace which is used to compile src/smlfmt.sml.
I suspect the problem is that some of the original modules that contained the types have not been included as dependencies in subsequent modules. As a result the types will have been duplicated and won't match.
When you export a module are you listing only the immediate children as dependencies in saveDependentModule or are you including the full dependency tree? To be safe, the whole tree needs to be included.
David
When you export a module are you listing only the immediate children as dependencies in saveDependentModule or are you including the full dependency tree? To be safe, the whole tree needs to be included.
I was indeed adding only the direct dependencies. It now works correctly when including all transitive dependencies as well.
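For reference, the change was essentially to expand the direct dependencies into the full tree before calling saveDependentModule, along these lines (the representation of the dependency graph here is hypothetical):

(* deps maps an .mlb path to the paths of its direct dependencies. *)
fun transitiveDeps (deps : string -> string list) (roots : string list) =
  let
    fun visit (m, seen) =
      if List.exists (fn s => s = m) seen
      then seen
      else List.foldl visit (m :: seen) (deps m)
  in
    List.foldl visit [] roots
  end;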
I've been doing more testing and so far have not met any new problems. Thank you for all the work you've done.
On 23/10/2025 17:36, vqn wrote:
When you export a module are you listing only the immediate children as dependencies in saveDependentModule or are you including the full dependency tree? To be safe, the whole tree needs to be included.
I was indeed adding only the direct dependencies. It now works correctly when including all transitive dependencies as well.
I am testing the latest cache branch, a8849f2, and am seeing an opaque type mismatch. Does the cache branch have the latest fixes?
Phil
On 30/07/2025 22:07, vqn wrote:
Over the past few months, I have been working on an implementation of the ML Basis system for Poly/ML. It supports the complete mlb spec and is available both as a library and a cli tool:
https://github.com/vqns/polymlb
On a side note, I have been trying to implement incremental compilation by caching compiled .mlb files (i.e. compiler namespaces) and exporting and reimporting them through {save,load}ModuleBasic.
Is your branch with this incremental compilation support publicly available somewhere? (I can see only "main" on Github.)
Phil
It is not public at the moment. It's based on a fairly old revision and honestly a bit of a mess. I'll push it once I've rebased on top of the latest main and cleaned it up.