I am trying to create an SML binding to a C function that returns an array that the caller must free. Usually, there is a free function that takes just the array (as a pointer) which can be attached as a finalizer with CInterface.setFinal. I have encountered a case [1] where the caller must also pass the size of the array, returned when the array is created, to the free function.
Simplifying the example, we have, for some C type Elem:
    Elem *new (..., int *n_elems);   /* n_elems is an 'out' parameter */
    void free (Elem *elems, int n_elems);
and want an SML function like
val new : ... -> elem vector
Unfortunately, the function given to CInterface.setFinal is called with only one argument, the vol that is being finalized. Therefore this free function cannot be used. Does the current FFI architecture allow a variant of setFinal that passes extra arguments to the finalization function? For example:
val setFinal1 : sym -> 'a Conversion -> vol -> 'a -> unit
This isn't particularly common so is probably not a show-stopper. Another benefit could be enabling use of the functions g_slice_alloc and g_slice_free1, which need the number of bytes to free: https://developer.gnome.org/glib/stable/glib-Memory-Slices.html
Thanks, Phil
1. The C function in question is gtk_target_table_new_from_list, which returns an array and its size. The array should be freed with gtk_target_table_free, which should be passed the size. See:
- https://developer.gnome.org/gtk3/stable/gtk3-Selections.html#gtk-target-table-new-from-list
- https://developer.gnome.org/gtk3/stable/gtk3-Selections.html#gtk-target-table-free
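To make the request concrete: a setFinal1-style variant would need the runtime to keep the extra argument next to the pointer it finalizes. The following is a minimal C sketch under that assumption. It is not Poly/ML's runtime code, and every name in it (Elem, elems_free, Final1, set_final1, run_final1) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int value; } Elem;   /* hypothetical element type */

static int last_freed_count = -1;     /* records the size the free function saw */

/* Hypothetical two-argument free, same shape as free (elems, n_elems). */
void elems_free(Elem *elems, int n_elems)
{
    last_freed_count = n_elems;
    free(elems);
}

/* Finalizer record: the function, the pointer, and one extra argument. */
typedef struct {
    void (*fn)(Elem *, int);
    Elem *ptr;
    int extra;
} Final1;

Final1 set_final1(void (*fn)(Elem *, int), Elem *ptr, int extra)
{
    Final1 f = { fn, ptr, extra };
    return f;
}

/* What a collector would do once ptr is unreachable; runs at most once. */
void run_final1(Final1 *f)
{
    if (f->fn != NULL) {
        f->fn(f->ptr, f->extra);
        f->fn = NULL;
    }
}
```

The record is what a one-argument finalizer lacks: without somewhere to store extra, the size returned at creation time is gone by the time the pointer is collected.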
On 14/09/2015 22:36, Phil Clayton wrote:
I am trying to create an SML binding to a C function that returns an array that the caller must free. Usually, there is a free function that takes just the array (as a pointer) which can be attached as a finalizer with CInterface.setFinal. I have encountered a case [1] where the caller must also pass the size of the array, returned when the array is created, to the free function.
Phil, I wonder if this is a case for the use of a weak reference ( http://www.polyml.org/documentation/Reference/Weak.html ) ? Have a look at this link and see whether it would work for you.
Essentially, what weak references do is allow you to run finalisers in "ML space". CInterface.setFinal sets up a C function to be run as a finaliser and the GC runs these immediately when it detects an unreferenced "vol". It can't run an ML function at that point because the ML heap is still being collected. Changes to weak references are notified at some point after ML is resumed after the GC.
If you've got any questions please ask. I'd also be interested to know if it solves your problem.
Regards, David
David,
I think weak references could do the job. Better still, I may be able to adapt (shamelessly copy) MLtonFinalizable: https://github.com/MLton/mlton/blob/master/basis-library/mlton/finalizable.s... https://github.com/MLton/mlton/blob/master/basis-library/mlton/finalizable.s... This would have the added bonus of a common interface for finalizable values between the compilers.
The main question is when to check the weak references. Is there some way to register a function to be called immediately after a GC? I'll investigate using a separate thread and the mutex which may be better anyway.
Finalizers should also be called when the ML session exits. It appears that functions registered with OS.Process.atExit are always run before Poly/ML exits (whether or not there is an explicit call to OS.Process.exit). Can you confirm that?
I was wondering how to implement the 'touch' function of MLTON_FINALIZABLE that forces a weak reference to stay alive. The expression ignore (PolyML.pointerEq (x, x) orelse raise Fail "touch"; print ""); seems to prevent Poly/ML optimizing the dependence on x away and works for any type x. Bizarrely, I found that without print "", the weak reference stayed alive. Can you think of something simpler?
Finally, a couple of general issues:
1. I was getting some unexpected behaviour with weak references using Poly/ML 5.5.0 - see the following example. Poly/ML 5.5.2 behaved as expected though. Does that mean 5.5.0 should be avoided?
    local
      val w = Weak.weak (SOME (ref ()))
    in
      fun check () = isSome (!w)
    end;
    check ();
    val () = PolyML.fullGC ();
    check (); (* false for 5.5.2, ok; true for 5.5.0 - issue? *)
2. The function weakArray confuses me although I doubt I will need to use it. It's not clear why it would be called with a reference, i.e. non-NONE argument because that is duplicated for every array element. Furthermore, if it is called with a non-NONE argument and the array size is more than 1, then this can crash Poly/ML. For example, entering the following in the top-level:
    val wa = Weak.weakArray (2, SOME (ref ()));
    PolyML.fullGC ();
    wa; (* seg fault *)
Regards, Phil
On 15/09/2015 22:18, Phil Clayton wrote:
I think weak references could do the job. Better still, I may be able to adapt (shamelessly copy) MLtonFinalizable: https://github.com/MLton/mlton/blob/master/basis-library/mlton/finalizable.s...
https://github.com/MLton/mlton/blob/master/basis-library/mlton/finalizable.s...
This would have the added bonus of a common interface for finalizable values between the compilers.
The main question is when to check the weak references. Is there some way to register a function to be called immediately after a GC? I'll investigate using a separate thread and the mutex which may be better anyway.
There's no way to register a function. Because of the way the thread system works I think the only way to do the finalisation is through a separate thread.
Finalizers should also be called when the ML session exits. It appears that functions registered with OS.Process.atExit are always run before Poly/ML exits (whether or not there is an explicit call to OS.Process.exit). Can you confirm that?
The intention is that that should be the case. If OS.Process.terminate is called the functions aren't run. I have just done a test and it appears that exiting by calling Thread.Thread.exit() doesn't run the atExit functions either.
I was wondering how to implement the 'touch' function of MLTON_FINALIZABLE that forces a weak reference to stay alive. The expression ignore (PolyML.pointerEq (x, x) orelse raise Fail "touch"; print ""); seems to prevent Poly/ML optimizing the dependence on x away and works for any type x. Bizarrely, I found that without print "", the weak reference stayed alive. Can you think of something simpler?
I looked at the MLton documentation and couldn't understand what "touch" was trying to achieve. Could you explain it?
My idea with weak variables in Poly was that the "token" and the item to be finalised would be linked in such a way that they would have the same lifetime e.g. the "token" would be paired with the item. Is the issue that a global optimising compiler such as MLton could work out that the token was not referenced even though the item was and remove it from the pair at the last reference? Poly/ML only does that in very limited circumstances. Is the idea of "touch" that this counts as a reference to the token and that you add a call to it after each reference to the item so that the lifetime of the token is no less than the lifetime of the item? Since the token is a reference either assigning or dereferencing it should work. Even if the result is always () that should still count as a reference.
Finally, a couple of general issues:
- I was getting some unexpected behaviour with weak references using Poly/ML 5.5.0 - see the following example. Poly/ML 5.5.2 behaved as expected though. Does that mean 5.5.0 should be avoided?
I can't remember any specific changes in that area but it's perfectly possible there has been a bug fix. Certainly I would avoid older releases.
- The function weakArray confuses me although I doubt I will need to use it. It's not clear why it would be called with a reference, i.e. non-NONE argument because that is duplicated for every array element. Furthermore, if it is called with a non-NONE argument and the array size is more than 1, then this can crash Poly/ML. For example, entering the following in the top-level:
    val wa = Weak.weakArray (2, SOME (ref ()));
    PolyML.fullGC ();
    wa; (* seg fault *)
There's obviously a bug in that area. It appears to be associated with the fact that the same token is being used at two different places in the array.
Regards, David
16/09/15 12:40, David Matthews wrote:
On 15/09/2015 22:18, Phil Clayton wrote:
I think weak references could do the job. Better still, I may be able to adapt (shamelessly copy) MLtonFinalizable: https://github.com/MLton/mlton/blob/master/basis-library/mlton/finalizable.s...
https://github.com/MLton/mlton/blob/master/basis-library/mlton/finalizable.s...
This would have the added bonus of a common interface for finalizable values between the compilers.
The main question is when to check the weak references. Is there some way to register a function to be called immediately after a GC? I'll investigate using a separate thread and the mutex which may be better anyway.
There's no way to register a function. Because of the way the thread system works I think the only way to do the finalisation is through a separate thread.
Understood. I think a separate thread is better anyway.
Finalizers should also be called when the ML session exits. It appears that functions registered with OS.Process.atExit are always run before Poly/ML exits (whether or not there is an explicit call to OS.Process.exit). Can you confirm that?
The intention is that that should be the case. If OS.Process.terminate is called the functions aren't run. I have just done a test and it appears that exiting by calling Thread.Thread.exit() doesn't run the atExit functions either.
I think the desirable behaviour is for finalizers to be run whenever Poly/ML is ending its own process. For example, if SIGTERM causes Poly/ML to exit, finalizers should be run because they are typically performing clean-up operations. Is there a way to do that?
Also, even using OS.Process.atExit and exiting via OS.Process.exit, I am finding that remaining finalizers aren't being run. After a call to fullGC in the at-exit function, the weakly referenced values haven't been garbage-collected, so their finalizers aren't run. Has the lifespan of those values not ended at that point? (See my other email for the example code.)
I was wondering how to implement the 'touch' function of MLTON_FINALIZABLE that forces a weak reference to stay alive. The expression ignore (PolyML.pointerEq (x, x) orelse raise Fail "touch"; print ""); seems to prevent Poly/ML optimizing the dependence on x away and works for any type x. Bizarrely, I found that without print "", the weak reference stayed alive. Can you think of something simpler?
I looked at the MLton documentation and couldn't understand what "touch" was trying to achieve. Could you explain it?
"touch t" prevents finalization of t until the call has been evaluated. "touch t" is an operation that uses t to do nothing, so t cannot be garbage-collected until the operation has finished.
My idea with weak variables in Poly was that the "token" and the item to be finalised would be linked in such a way that they would have the same lifetime e.g. the "token" would be paired with the item.
With these finalizable values, no "item" is required, there is only a "token". The token can be a reference to anything so we make it a reference to the value that finalizers will be run on. All access to the finalizable value is by dereferencing the token, so the token is referenced as long as the finalizable value is used. This is ensured by having an abstract type requiring Finalizable.withValue to be used to access the value.
Is the issue that a global optimising compiler such as MLton could work out that the token was not referenced even though the item was and remove it from the pair at the last reference?
I don't think that is the issue here (as we don't have an item).
Poly/ML only does that in very limited circumstances. Is the idea of "touch" that this counts as a reference to the token and that you add a call to it after each reference to the item so that the lifetime of the token is no less than the lifetime of the item?
Yes, but as there is no item, the purpose of touch is to delay finalization until some other event has occurred. One example use of touch is in the implementation of Finalizable.finalizeBefore. The 'other event' is finalization of another finalizable value.
Since the token is a reference either assigning or dereferencing it should work. Even if the result is always () that should still count as a reference.
I find that ignore (!value) doesn't keep the value alive whereas value := !value does. That was a useful suggestion - much more preferable to my earlier idea. The assignment is doing work whose effect isn't required but is that an overhead worth worrying about?
Phil
On Thu, Sep 17, 2015 at 6:48 AM, Phil Clayton <phil.clayton at lineone.net> wrote:
16/09/15 12:40, David Matthews wrote:
On 15/09/2015 22:18, Phil Clayton wrote:
I was wondering how to implement the 'touch' function of MLTON_FINALIZABLE that forces a weak reference to stay alive. The expression ignore (PolyML.pointerEq (x, x) orelse raise Fail "touch"; print ""); seems to prevent Poly/ML optimizing the dependence on x away and works for any type x. Bizarrely, I found that without print "", the weak reference stayed alive. Can you think of something simpler?
I looked at the MLton documentation and couldn't understand what "touch" was trying to achieve. Could you explain it?
"touch t" prevents finalization of t until the call has been evaluated. "touch t" is an operation that uses t to do nothing, so t cannot be garbage-collected until the operation has finished.
Essentially, yes.
"MLton.Finalizable.touch v" is equivalent to "MLton.Finalizable.withValue (v, fn _ => ())", because "withValue" guarantees that the finalizers of v will not run before the function completes. Of course, internally, both "MLton.Finalizable.touch" and "MLton.Finalizable.withValue" are implemented using a more primitive "touch" with the more general type 'a -> unit. This primitive is treated by the compiler as an unoptimizable reference to the object; essentially, it is a future proof method of maintaining a strong reference to an object and its main use case is to maintain that strong reference to an object for which there is a weak reference.
For example, suppose one had:
    val token = ref ()
    val wtoken = Weak.new token
    ...
    val v1 = Weak.get wtoken
    ...
    !token (* and no subsequent uses of token *)
    ...
    val v2 = Weak.get wtoken
    ...
One might expect that v1 is SOME and v2 is NONE. But, MLton's optimizations are somewhat tuned towards SML without extensions such as weak references, so it would be reasonable to replace the "!token" with "()". And then v1 could be NONE. And replacing "!token" with "token := !token" might be optimized away in a future version of the compiler. The "touch" primitive, which perhaps should also be exported somewhere else in "structure MLton", provides a robust method instead of the fragile method of crafting a sophisticated nop that isn't optimized away. But, thus far, the main use case has been for finalizers.
Is the issue that a global optimising compiler such as MLton could work out that the token was not referenced even though the item was and remove it from the pair at the last reference?
I don't think that is the issue here (as we don't have an item).
In the context of MLton (and now in Poly/ML), the token is a Weak reference to the item and the check for when to run the finalizers is whether or not the weak reference has gone NONE. So, it is something like the example above that could cause a finalizer to run earlier than expected if there weren't a touch in each use of the finalized value.
Poly/ML only does that in very limited circumstances. Is the idea of "touch" that this counts as a reference to the token and that you add a call to it after each reference to the item so that the lifetime of the token is no less than the lifetime of the item?
Yes, but as there is no item, the purpose of touch is to delay finalization until some other event has occurred. One example use of touch is in the implementation of Finalizable.finalizeBefore. The 'other event' is finalization of another finalizable value.
Right. The use case for things like "Finalizable.finalizeBefore" is when one has a number of C pointers (which to SML look like simple Word64.word-s), where one knows that it is important to free the pointed-to C data structures in the right order. MLton can be a little more aggressive about "Word64.word ref"s (like flattening them into a containing data structure), so the "Word64.word ref" and the Weak reference to it can get a little out of sync as above. Again, a "touch" in the right place ensures that the reference is not flattened and the Weak reference properly tracks the object.
-Matthew
On 17/09/2015 19:31, Matthew Fluet wrote:
On Thu, Sep 17, 2015 at 6:48 AM, Phil Clayton <phil.clayton at lineone.net> wrote:
16/09/15 12:40, David Matthews wrote:
On 15/09/2015 22:18, Phil Clayton wrote:
I looked at the MLton documentation and couldn't understand what "touch" was trying to achieve. Could you explain it?
"touch t" prevents finalization of t until the call has been evaluated. "touch t" is an operation that uses t to do nothing, so t cannot be garbage-collected until the operation has finished.
Essentially, yes.
Thank you both for your explanations. I was gradually getting the idea as I wrote my original question but it's good to have it confirmed.
It looks as though the most future-proof solution is to add "touch" as a primitive as you've done.
David
Hi Phil,
Can't you wrap the array and size in a struct? For example in https://github.com/polyml/polyml/tree/cb1b36caa242fc6ea9f74b015158466efac68d... we have :
---------------------------------
//ForeignTest.c
typedef struct _tree {
    struct _tree *left, *right;
    int nValue;
} *tree;

int SumTree(tree t)
{
    if (t == NULL) return 0;
    else return t->nValue + SumTree(t->left) + SumTree(t->right);
}
---------------------------------
and
---------------------------------
(*ForeignTest.sml*)
val sumTree = CInterface.call1 (CInterface.load_sym mylib "SumTree") TREE INT;
---------------------------------
So you just write a FreeTree function, which is modelled after the FreeIt function in ForeignTest.c :
void FreeIt(void *p)
{
    printf("Freed object at %p\n", p);
    fflush(stdout);
    free(p);
}
, which then reads nValue and passes it on to the relevant gtk function?
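A minimal C sketch of the struct-wrapping idea, assuming a hypothetical library pair elem_array_new / elem_array_free with the same shape as the new / free functions in the question. None of these names come from GTK or Poly/ML. The point is only that a one-argument free function can recover the size from the wrapper, which is the form a pointer-only finalizer can call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type and library functions, standing in for
   the new/free pair from the original question. */
typedef struct { int value; } Elem;

static int arrays_freed = 0;   /* lets a caller observe that the free ran */

Elem *elem_array_new(int wanted, int *n_elems)
{
    Elem *elems = malloc((size_t)wanted * sizeof *elems);
    if (elems == NULL) { *n_elems = 0; return NULL; }
    for (int i = 0; i < wanted; i++) elems[i].value = i;
    *n_elems = wanted;         /* the 'out' parameter */
    return elems;
}

void elem_array_free(Elem *elems, int n_elems)
{
    (void)n_elems;             /* a real library would use the size here */
    free(elems);
    arrays_freed++;
}

/* Wrapper that keeps the array and its size together. */
typedef struct {
    Elem *elems;
    int n_elems;
} ElemArray;

ElemArray *elem_array_wrap_new(int wanted)
{
    ElemArray *h = malloc(sizeof *h);
    if (h == NULL) return NULL;
    h->elems = elem_array_new(wanted, &h->n_elems);
    if (h->elems == NULL) { free(h); return NULL; }
    return h;
}

/* One-argument free: reads the stored size and passes it on. */
void elem_array_wrap_free(void *p)
{
    ElemArray *h = p;
    if (h == NULL) return;
    elem_array_free(h->elems, h->n_elems);
    free(h);
}
```

The cost of the wrapper is that any C function expecting the bare array now has to be given h->elems rather than the handle itself.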
Hi Artella,
Thanks - I wondered about that but it is quite a bit of effort by comparison because I am creating a framework for automatically generating bindings rather than writing some bindings by hand. When calling a C function that takes the array pointer, the struct pointer would be passed from ML, so it would be necessary to wrap the C function to dereference the struct pointer to get the array pointer. I would want to do that dereferencing on the ML side (by manipulating vols) to avoid having to generate C function wrappers. That's not too much effort. Also, as you say, a free function would be needed, but that would need to be generated and I don't really want to extend the framework to do that - currently I don't generate any C code.
In MLton, creating a finalizable value from the pointer and size is simple. Roughly as follows:
    fun fromNewPtr p n =
      let
        val array = Finalizable.new p
      in
        Finalizable.addFinalizer (array, fn p => free_ (p, n));
        array
      end
where free_ is the C free function. If Poly/ML can use the same or similar code, that would be much easier!
Regards, Phil
On 15/09/2015 23:06, Phil Clayton wrote:
In MLton, creating a finalizable value from the pointer and size is simple. Roughly as follows:
    fun fromNewPtr p n =
      let
        val array = Finalizable.new p
      in
        Finalizable.addFinalizer (array, fn p => free_ (p, n));
        array
      end
where free_ is the C free function. If Poly/ML can use the same or similar code, that would be much easier!
This is my quick attempt to create something similar. As long as the result of the call to makeFinal is reachable the finalisation function will not be called. Once it becomes unreachable the finalisation function will be called soon after the GC. I've only tested it with explicit calls to PolyML.fullGC but it should work equally when the GC is activated automatically. However it does require a full GC and that can occur relatively infrequently. David
local
    open Weak
    val finals = ref []

    fun threadFn () =
    let
        val () = Thread.Mutex.lock weakLock
        fun doFilter (ref (SOME _), _) = true
          | doFilter (ref NONE, f) = (f(); false)
    in
        while true do
        (
            finals := List.filter doFilter (!finals);
            Thread.ConditionVar.wait(weakSignal, weakLock)
        )
    end

    val _ = Thread.Thread.fork(threadFn, [])
in
    fun makeFinal (f : unit -> unit) =
    let
        val r = ref ()
        val () = ThreadLib.protect weakLock
                    (fn () => finals := (weak(SOME r), f) :: ! finals) ()
    in
        r
    end
end;
Thanks for the example - it was useful to see how to use the Thread structure. And to see how easy it was. That's a nice library.
Attached is a first stab at implementing the FINALIZABLE signature shown below, which is identical to MLTON_FINALIZABLE. The code is based on the MLton implementation but with a number of differences. As mentioned in the other email, there seems to be an issue with finalizers not being run when exiting via OS.Process.exit.
Phil
signature FINALIZABLE =
  sig
    type 'a t
    val new : 'a -> 'a t
    val addFinalizer : 'a t * ('a -> unit) -> unit
    val finalizeBefore : 'a t * 'b t -> unit
    val touch : 'a t -> unit
    val withValue : 'a t * ('a -> 'b) -> 'b
  end
On 17/09/2015 11:50, Phil Clayton wrote:
Thanks for the example - it was useful to see how to use the Thread structure. And to see how easy it was. That's a nice library.
Attached is a first stab at implementing the FINALIZABLE signature shown below, which is identical to MLTON_FINALIZABLE. The code is based on the MLton implementation but with a number of differences. As mentioned in the other email, there seems to be an issue with finalizers not being run when exiting via OS.Process.exit.
Phil, I've begun by adding your code to a Finalizable branch in the github repository. The Finalizable structure and signature are included in the basis library when "poly" is built. It looks good and there's just a few points.
Your definition of "touch" should be fine. I think I now understand what's happening. At the moment at any rate the Poly/ML optimiser views the "store" as a single entity so it won't try to reorder or remove assignments just because they occur to locations that a more sophisticated analysis might show to be independent. By the way, I think perhaps the reason you had that rather bizarre behaviour with your previous definition of "touch" was that the address of the token was still in a register. Calling "print" before calling PolyML.fullGC will have replaced the address with something else.
I was wondering how finalisation should interact with saving state. Should the state of the finalizers be restored when loadState is called? That is what will happen with the code as it currently is, but it is possible to change that so that the state is not affected by loadState by using "non-overwritable" references.
I've altered the initialisation code slightly. It now uses PolyML.onEntry to start the thread and install the atExit handler. This ensures that these are run on every run. Have a look at basis/BasicStreamIO.sml which also contains a "non-overwritable" reference for the list of streams that must be closed when Poly/ML exits.
I wonder if the finalizers should be called outside updatePendingList i.e. after "mutex" has been released? The problem at the moment is that if a finalizer calls "new" it will deadlock. A finalizer should probably not be creating a new finalizer but equally we don't want deadlock.
David
17/09/15 16:12, David Matthews wrote:
On 17/09/2015 11:50, Phil Clayton wrote:
Thanks for the example - it was useful to see how to use the Thread structure. And to see how easy it was. That's a nice library.
Attached is a first stab at implementing the FINALIZABLE signature shown below, which is identical to MLTON_FINALIZABLE. The code is based on the MLton implementation but with a number of differences. As mentioned in the other email, there seems to be an issue with finalizers not being run when exiting via OS.Process.exit.
Phil, I've begun by adding your code to a Finalizable branch in the github repository. The Finalizable structure and signature are included in the basis library when "poly" is built. It looks good and there's just a few points.
Great!
Your definition of "touch" should be fine. I think I now understand what's happening. At the moment at any rate the Poly/ML optimiser views the "store" as a single entity so it won't try to reorder or remove assignments just because they occur to locations that a more sophisticated analysis might show to be independent. By the way, I think perhaps the reason you had that rather bizarre behaviour with your previous definition of "touch" was that the address of the token was still in a register. Calling "print" before calling PolyML.fullGC will have replaced the address with something else.
I was wondering how finalisation should interact with saving state. Should the state of the finalizers be restored when loadState is called? That is what will happen with the code as it currently is but it is possible to change that so that the state is not affected by loadState by using "non-overwritable" references.
That's an interesting question. If there is a case for using a finalizable value on resources whose state is entirely captured in the mutable and immutable store, then the finalizable state should probably be restored. Most uses I've seen are for C pointers which wouldn't be available in a new session but this is already a limitation of vols. At the moment I can't think of a reason to use non-overwritable refs.
I've altered the initialisation code slightly. It now uses PolyML.onEntry to start the thread and install the atExit handler. This ensures that these are run on every run. Have a look at basis/BasicStreamIO.sml which also contains a "non-overwritable" reference for the list of streams that must be closed when Poly/ML exits.
I spotted this too after building a test case as a stand-alone executable and nothing happened...
I wonder if the finalizers should be called outside updatePendingList i.e. after "mutex" has been released? The problem at the moment is that if a finalizer calls "new" it will deadlock. A finalizer should probably not be creating a new finalizer but equally we don't want deadlock.
Good point. That was easily fixed by making the function clean return a list of functions to run instead of a flag. See attached patch (created with git format-patch). (I can't do pull requests at the moment because Github no longer works properly with Firefox 16.)
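The shape of that change can be sketched as follows. This is an illustration under assumptions, not the attached patch: the names pending, register and runPending are hypothetical, and a plain bool ref stands in for the weak-reference liveness test.

```sml
(* Sketch only: `pending`, `register`, `clean` and `runPending` are
   hypothetical names, and `bool ref` stands in for the weak-reference test. *)
val mutex = Thread.Mutex.mutex ()
val pending : (bool ref * (unit -> unit)) list ref = ref []

(* Add an entry under the mutex, as "new" would. *)
fun register (alive, f) =
  ( Thread.Mutex.lock mutex
  ; pending := (alive, f) :: !pending
  ; Thread.Mutex.unlock mutex )

(* Split entries into those still alive and the finalizers that are due.
   Nothing is run here; the functions are only returned to the caller. *)
fun clean entries =
  let
    val (live, dead) = List.partition (fn (alive, _) => !alive) entries
  in
    (live, map (fn (_, f) => f) dead)
  end

fun runPending () =
  let
    val () = Thread.Mutex.lock mutex
    val (live, toRun) = clean (!pending)
    val () = pending := live
    val () = Thread.Mutex.unlock mutex
  in
    (* The mutex has been released, so a finalizer may call register. *)
    List.app (fn f => f ()) toRun
  end
```

Because the finalizers run only after the unlock, a finalizer that registers a new entry takes the mutex normally instead of blocking on a lock its own thread already holds.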
Regards, Phil
Thanks for your patch. I've integrated it and pushed it to github.
I was wondering how finalisation should interact with saving state. Should the state of the finalizers be restored when loadState is called? That is what will happen with the code as it currently is but it is possible to change that so that the state is not affected by loadState by using "non-overwritable" references.
That's an interesting question. If there is a case for using a finalizable value on resources whose state is entirely captured in the mutable and immutable store, then the finalizable state should probably be restored. Most uses I've seen are for C pointers which wouldn't be available in a new session but this is already a limitation of vols. At the moment I can't think of a reason to use non-overwritable refs.
I thought some more about this and ran some tests. I've now made it a non-overwritable reference. I would expect that finalisation would only make sense for external state so it is better to have the finaliser list unaffected by SaveState.loadState. The following examples now do what I would expect:
Session 1:
    let
        open Finalizable
        val z = new ()
    in
        addFinalizer(z, fn () => print "Saver final\n");
        PolyML.SaveState.saveState "saved";
        touch z
    end;
Prints "Saver final" on exit.
Session 2:
    let
        open Finalizable
        val z = new ()
    in
        addFinalizer(z, fn () => print "Loader final\n");
        PolyML.SaveState.loadState "saved";
        touch z
    end;
Prints "Loader final" on exit.
The only other changes I'm thinking about are:
1. Introducing a "touch" primitive for long-term security in case a future update to the optimiser means that the current "touch" becomes a no-op.
2. Simplifying the Weak structure by removing everything except the "weak" function. weakSignal and weakLock were really intended to allow users to write their own finalisation code but if we have finalisers provided they're probably unnecessary. There would obviously have to be a way to wake up the finaliser thread. Currently it's integrated with the signal handler thread in a non-obvious way.
David
18/09/15 11:53, David Matthews wrote:
Thanks for your patch. I've integrated it and pushed it to github.
I was wondering how finalisation should interact with saving state. Should the state of the finalizers be restored when loadState is called? That is what will happen with the code as it currently is but it is possible to change that so that the state is not affected by loadState by using "non-overwritable" references.
That's an interesting question. If there is a case for using a finalizable value on resources whose state is entirely captured in the mutable and immutable store, then the finalizable state should probably be restored. Most uses I've seen are for C pointers which wouldn't be available in a new session but this is already a limitation of vols. At the moment I can't think of a reason to use non-overwritable refs.
I thought some more about this and ran some tests. I've now made it a non-overwritable reference. I would expect that finalisation would only make sense for external state so it is better to have the finaliser list unaffected by SaveState.loadState. The following examples now do what I would expect:
Session 1:
    let
        open Finalizable
        val z = new ()
    in
        addFinalizer(z, fn () => print "Saver final\n");
        PolyML.SaveState.saveState "saved";
        touch z
    end;
Prints "Saver final" on exit.
Session 2:
    let
        open Finalizable
        val z = new ()
    in
        addFinalizer(z, fn () => print "Loader final\n");
        PolyML.SaveState.loadState "saved";
        touch z
    end;
Prints "Loader final" on exit.
That makes sense to me. The function cleanAtExit seems to be working now too. Was it the use of non-overwritable refs that fixed that?
The only other changes I'm thinking about are:
- Introducing a "touch" primitive for long-term security in case a future update to the optimiser means that the current "touch" becomes a no-op.
I think that would be a very good idea.
- Simplifying the Weak structure by removing everything except the "weak" function. weakSignal and weakLock were really intended to allow users to write their own finalisation code but if we have finalisers provided they're probably unnecessary. There would obviously have to be a way to wake up the finaliser thread. Currently it's integrated with the signal handler thread in a non-obvious way.
Also, is there any point in Weak.weak taking an optional value? (Would anyone ever call it with NONE?) Presumably weak could add the required SOME to give val weak : 'a ref -> 'a ref option ref
weakSignal and weakLock could be avoided by adding Signal.onWeakMark to allow functions to be called when weak references are marked as NONE. An on-weak-mark handler thread could be started by Signal. I really don't understand the overall architecture though.
Phil
On 18/09/2015 13:50, Phil Clayton wrote:
The only other changes I'm thinking about are:
- Introducing a "touch" primitive for long-term security in case a future update to the optimiser means that the current "touch" becomes a no-op.
I think that would be a very good idea.
I've done that now. It's a new run-time system call which just returns 0. RTS calls are used for all built-in functions and it's possible for the code-generator to process simple ones directly inline. I haven't done that at the moment so it goes as far as a piece of assembly code.
- Simplifying the Weak structure by removing everything except the "weak" function. weakSignal and weakLock were really intended to allow users to write their own finalisation code but if we have finalisers provided they're probably unnecessary. There would obviously have to be a way to wake up the finaliser thread. Currently it's integrated with the signal handler thread in a non-obvious way.
I think I'll leave that for the moment.
Also, is there any point in Weak.weak taking an optional value? (Would anyone ever call it with NONE?) Presumably weak could add the required SOME to give val weak : 'a ref -> 'a ref option ref
My thinking was that you might have a finite set of resources, such as file descriptors, and a list of weak references or a weak array with an entry for each resource. When a resource was allocated a SOME entry would be made in the weak ref/array. If the resource was no longer reachable the corresponding entry would be set to NONE and the resource could be reclaimed and later reallocated.
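A rough sketch of that idea, assuming the Weak.weakArray function of the Poly/ML Weak structure. The pool size, the use of an int ref as the token and the name allocate are all hypothetical choices for illustration.

```sml
(* Sketch only: a hypothetical pool of 4 resource slots.  Each slot holds
   SOME token while its resource is in use and NONE when it is free. *)
val poolSize = 4
val slots : int ref option array = Weak.weakArray (poolSize, NONE)

(* Claim the first free slot.  The caller must keep the returned token for
   as long as the resource is in use; NONE means the pool is exhausted. *)
fun allocate () : int ref option =
  case Array.findi (fn (_, entry) => not (isSome entry)) slots of
    NONE => NONE
  | SOME (i, _) =>
      let
        val token = ref i
      in
        Array.update (slots, i, SOME token);
        SOME token
      end
```

Once a token is no longer reachable, the collector resets its slot to NONE, and a later call to allocate can hand that slot out again. This reuse is the case where passing NONE to the constructor is needed. The sketch ignores locking, so it assumes allocation happens from a single thread.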
David
18/09/15 17:38, David Matthews wrote:
On 18/09/2015 13:50, Phil Clayton wrote:
The only other changes I'm thinking about are:
- Introducing a "touch" primitive for long-term security in case a future update to the optimiser means that the current "touch" becomes a no-op.
I think that would be a very good idea.
I've done that now. It's a new run-time system call which just returns 0. RTS calls are used for all built-in functions and it's possible for the code-generator to process simple ones directly inline. I haven't done that at the moment so it goes as far as a piece of assembly code.
Much appreciated. For now I'd like to get a version working on top of 5.5.2, and I assume assignment will suffice there. Is there a way to create a non-overwritable ref in a normal session? I don't seem to be able to access LibrarySupport.noOverwriteRef (and these seem to be needed for cleanAtExit to work).
Also, is there any point in Weak.weak taking an optional value? (Would anyone ever call it with NONE?) Presumably weak could add the required SOME to give val weak : 'a ref -> 'a ref option ref
My thinking was that you might have a finite set of resources, such as file descriptors, and a list of weak references or a weak array with an entry for each resource. When a resource was allocated a SOME entry would be made in the weak ref/array. If the resource was no longer reachable the corresponding entry would be set to NONE and the resource could be reclaimed and later reallocated.
I see now. I had overlooked the possibility of reusing one of these weak refs after it is NONE.
Phil
On 19/09/2015 12:30, Phil Clayton wrote:
Much appreciated. For now I'd like to get a version working on top of 5.5.2, and I assume assignment will suffice there. Is there a way to create a non-overwritable ref in a normal session? I don't seem to be able to access LibrarySupport.noOverwriteRef (and these seem to be needed for cleanAtExit to work).
Are you sure that LibrarySupport.noOverwriteRef is actually required for the finalisers to be run on exit? It should only make a difference if you're saving state. I've just tried editing basis/Finalizable.sml so that it will compile under 5.5.2-fixes and tested it with a simple example and it does what I'd expect.
David
19/09/15 13:11, David Matthews wrote:
On 19/09/2015 12:30, Phil Clayton wrote:
Much appreciated. For now I'd like to get a version working on top of 5.5.2, and I assume assignment will suffice there. Is there a way to create a non-overwritable ref in a normal session? I don't seem to be able to access LibrarySupport.noOverwriteRef (and these seem to be needed for cleanAtExit to work).
Are you sure that LibrarySupport.noOverwriteRef is actually required for the finalisers to be run on exit? It should only make a difference if you're saving state. I've just tried editing basis/Finalizable.sml so that it will compile under 5.5.2-fixes and tested it with a simple example and it does what I'd expect.
My mistake - it seems I was testing the two versions on slightly different examples.
What I am actually observing is that finalizers are not run on exit for finalizable values that are in scope in the top-level environment. On exit, the REPL has finished, so shouldn't such values be garbage collected and therefore finalized?
Phil
On 21/09/2015 16:08, Phil Clayton wrote:
What I am actually observing is that finalizers are not run on exit for finalizable values that are in scope in the top-level environment. On exit, the REPL has finished, so shouldn't such values be garbage collected and therefore finalized?
I noticed that but it's very difficult to fix. The problem is that the top level environment is an array contained in the executable. It is treated as "initialised data" by the linker and loader. The garbage collector treats mutable data in the executable as roots for garbage collection which means it never has to look at the immutable data in the executable. The only way I can see of fixing the problem is to copy all the data, both mutable and immutable, from the executable into the heap.
As I wrote that it occurred to me that there might be some way of processing the current finaliser list during the shut-down process so that only weak refs that were referenced by other finalisers were preserved. This would allow the detection of dependencies, which is what is really needed at that point. This would be much cheaper and could be done instead of the current call to PolyML.fullGC.
I've merged the current Finalizer branch into master on github.
David
21/09/15 18:41, David Matthews wrote:
On 21/09/2015 16:08, Phil Clayton wrote:
What I am actually observing is that finalizers are not run on exit for finalizable values that are in scope in the top-level environment. On exit, the REPL has finished, so shouldn't such values be garbage collected and therefore finalized?
I noticed that but it's very difficult to fix. The problem is that the top level environment is an array contained in the executable. It is treated as "initialised data" by the linker and loader. The garbage collector treats mutable data in the executable as roots for garbage collection which means it never has to look at the immutable data in the executable. The only way I can see of fixing the problem is to copy all the data, both mutable and immutable, from the executable into the heap.
I see what you mean. I was thinking only about an interactive/batch session. For an executable, it's not clear that it makes sense to have a finalizable value at the top-level. Already something odd is happening: in the attached example toplevel_eg1.sml, the executable has a finalizer depending on whether PolyML.fullGC was called when building.
As I wrote that it occurred to me that there might be some way of processing the current finaliser list during the shut-down process so that only weak refs that were referenced by other finalisers were preserved. This would allow the detection of dependencies, which is what is really needed at that point. This would be much cheaper and could be done instead of the current call to PolyML.fullGC.
This reminded me of a similar discussion a while ago: http://lists.inf.ed.ac.uk/pipermail/polyml/2012-August/001050.html
Matthew Fluet suggested that just working through finalizers, ignoring garbage collection, could break certain invariants. I think I was only in favour of ignoring garbage collection given the limitations of setFinal, which don't apply here. My inclination is to see how we get on with iterating PolyML.fullGC on exit.
I've merged the current Finalizer branch into master on github.
I've been testing Finalizable and found a few things.
1. A finalizer that raises an exception causes the cleaning thread to terminate. Attached patch 0001 handles exceptions from finalizers and reports them.
2. I have an FFI-related test suite for checking finalization that produces an output log. Now that finalizers are called asynchronously, the log entries are out of place and the order is not deterministic. Even carefully ordering text output and always flushing does not entirely solve the problem: writing to stdOut from different threads seems to lead to some corruption. What I needed was a function to synchronously run all finalizers (like on exit) in the main thread. I have added a function doGCAndFinalize in the attached patch 0002. I'm not convinced by this patch but it resolves this issue.
3. For 5.5.2, when finalizeBefore is used, it appears that not all finalizers are run by repeatedly calling PolyML.fullGC. This occurs on exit and, with patch 0002 applied, when doGCAndFinalize is called. See attached example test1.sml.gz. It appears that this has been fixed in the current development by commit d9ca031dd99161c93ba03c42af396dafbf8c8482 so I mention that just for information. I have no immediate need for finalizeBefore, so this should not hold me up.
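For point 1, the general technique can be sketched like this. It is not the content of patch 0001, and the names runFinalizer and runAll are hypothetical.

```sml
(* Sketch only: `runFinalizer` and `runAll` are hypothetical names. *)

(* Run one finalizer, reporting any exception on stdErr instead of letting
   it propagate and terminate the calling thread. *)
fun runFinalizer (f : unit -> unit) : unit =
  f () handle exn =>
    TextIO.output
      (TextIO.stdErr, "Finalizer raised: " ^ General.exnMessage exn ^ "\n")

(* Run every finalizer in turn; a failure in one does not stop the rest. *)
fun runAll (fs : (unit -> unit) list) : unit = List.app runFinalizer fs
```

With this wrapper, the thread that runs the finalizers carries on to the next one after a failure, and the failure is still reported.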
Please feel free to mangle/ignore patches as you see fit.
Phil
On 22/09/2015 00:52, Phil Clayton wrote:
I see what you mean. I was thinking only about an interactive/batch session. For an executable, it's not clear that it makes sense to have a finalizable value at the top-level. Already something odd is happening: in the attached example toplevel_eg1.sml, the executable has a finalizer depending on whether PolyML.fullGC was called when building.
I've merged the current Finalizer branch into master on github.
I've been testing Finalizable and found a few things.
- A finalizer that raises an exception causes the cleaning thread to terminate. Attached patch 0001 handles exceptions from finalizers and reports them.
- I have an FFI-related test suite for checking finalization that produces an output log. Now that finalizers are called asynchronously, the log entries are out of place and the order is not deterministic. Even carefully ordering text output and always flushing does not entirely solve the problem: writing to stdOut from different threads seems to lead to some corruption. What I needed was a function to synchronously run all finalizers (like on exit) in the main thread. I have added a function doGCAndFinalize in the attached patch 0002. I'm not convinced by this patch but it resolves this issue.
- For 5.5.2, when finalizeBefore is used, it appears that not all finalizers are run by repeatedly calling PolyML.fullGC. This occurs on exit and, with patch 0002 applied, when doGCAndFinalize is called. See attached example test1.sml.gz. It appears that this has been fixed in the current development by commit d9ca031dd99161c93ba03c42af396dafbf8c8482 so I mention that just for information. I have no immediate need for finalizeBefore, so this should not hold me up.
Phil, I've been thinking about this for a while and reluctantly I've come to the conclusion that the only option is to remove Finalizable from the basis library. The problem is with trying to give a clear definition of what it does in the presence of all these variables.
At the heart of the difficulty is the view of what the garbage collector does. The idea of the garbage collector is that it should not have an observable effect on the program and so it can be run in various forms, minor collections or major collections, at various times. It guarantees not to remove data that may be required in the future.
Adding finalisers in this way turns this on its head. It makes the GC observable and it requires, to some extent, that the collector should guarantee to remove objects that are no longer required.
I really don't think this is achievable or even desirable. Trying to achieve finalisation by using reachability makes the program unpredictable and subject to changes in quite unexpected ways. The effect of commit d9ca031dd99161c93ba03c42af396dafbf8c8482 is a specific example. This commit changed the way some run-time system calls were handled on the X86 so that registers were not preserved across RTS calls. The reason for this was to try to speed up the calls. The fact that it had an observable effect on the program is worrying to say the least.
I've added "touch" to the Weak structure so you can implement the original Finalizable structure yourself and that's probably easiest for you.
Best regards, David