Hi,
Is there an update in the works for the Poly FFI?
I may not be looking in the correct place, but I am seeing what may be a needed feature: let's say I am happy with Poly managing the lifetimes of "volatile" C-data values, and handling the freeing of malloced memory.
What about C++? One would need some way to [at least] arrange for a delete operation (so as to invoke any needed destructor(s))... and this is assuming we wrap C++ objects created with new[] inside a "simple" new-created object - otherwise we need to be able to issue a delete[].
This could still potentially be handled by a C-style interface, if we could optionally associate a C "finalize" function with a volatile - this function could be called by the Poly memory reclamation when the volatile is truly unreachable; upon its return, Poly can go ahead and complete the free operation.
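A minimal C++ sketch of why a C-style hook is enough, assuming the runtime accepts a finalizer of type `void (*)(void*)` (the shape proposed later in this thread). `Shape`, `Circle`, `ShapeArray` and the `make_`/`finalize_` functions are hypothetical: a C-linkage function can hide a `delete` that runs virtual destructors, and, via a new-created wrapper, a `delete[]`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical library type with a virtual destructor.
struct Shape {
    static int live;                 // Shape objects not yet destroyed
    Shape() { ++live; }
    virtual ~Shape() { --live; }
};
int Shape::live = 0;

struct Circle : Shape {
    ~Circle() override {}            // reached through virtual ~Shape()
};

// Wrapper for arrays made with new[]: one new-created object owns the array,
// so the finalizer only ever needs a plain delete.
struct ShapeArray {
    Shape* items;
    std::size_t count;
    explicit ShapeArray(std::size_t n) : items(new Shape[n]), count(n) {}
    ShapeArray(const ShapeArray&) = delete;
    ShapeArray& operator=(const ShapeArray&) = delete;
    ~ShapeArray() { delete[] items; }
};

// C-linkage finalizers: the host runtime sees only void(*)(void*).
extern "C" void finalize_shape(void* p) {
    delete static_cast<Shape*>(p);        // runs ~Circle() then ~Shape()
}
extern "C" void finalize_shape_array(void* p) {
    delete static_cast<ShapeArray*>(p);   // ~ShapeArray() issues delete[]
}

// Factories exported with C linkage, as DLL entry points might be.
extern "C" void* make_circle() { return static_cast<Shape*>(new Circle()); }
extern "C" void* make_shape_array(std::size_t n) { return new ShapeArray(n); }
```

The cast to `Shape*` before returning `void*` matters: the finalizer casts back to exactly that type.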
This issue comes up both in the general case of talking to extra-Poly code (where we may have some control of the interface presented), but also in the case of talking to an existing library that is C++ rather than C based... in this case, we really need to allow the foreign code base to perform its own lifetime-management related operations.
Like I said, this functionality may be available already... but if it isn't, what are others' thoughts on the subject? Is this facility needed?
Robert
Robert Roessler wrote:
This could still potentially be handled by a C-style interface, if we could optionally associate a C "finalize" function with a volatile - this function could be called by the Poly memory reclamation when the volatile is truly unreachable; upon its return, Poly can go ahead and complete the free operation.
This issue comes up both in the general case of talking to extra-Poly code (where we may have some control of the interface presented), but also in the case of talking to an existing library that is C++ rather than C based... in this case, we really need to allow the foreign code base to perform its own lifetime-management related operations.
Like I said, this functionality may be available already... but if it isn't, what are others' thoughts on the subject? Is this facility needed?
It seems like it could be useful. It isn't possible to run ML code while actually doing the garbage-collection, so it would be necessary to record that a vol was due to be deleted and then call the finaliser at some point later on. That, of course, raises the question of which thread runs the finaliser. From a bit of searching it appears that in Java finalisers can be run on any thread, which doesn't seem a good idea.
It may be possible to get the effect by using a weak reference (structure Weak). That requires a thread to monitor Weak.weakSignal and use that to then check which values have been deleted.
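The "record now, finalise later" scheme can be sketched as a queue: the collector only records pending finalisers, and some designated thread drains the queue afterwards. This is a hypothetical model (`PendingFinalizers`, `record`, `drain` are invented names), not Poly/ML's implementation; which thread calls `drain()` is exactly the open question raised above.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

using Finalizer = void (*)(void*);

// Hypothetical queue: the collector records, it never runs finalisers.
class PendingFinalizers {
public:
    // Called from inside the (simulated) garbage collector.
    void record(Finalizer f, void* arg) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.emplace_back(f, arg);
    }

    // Called later on whichever thread the runtime designates.
    // Returns the number of finalisers that were run.
    std::size_t drain() {
        std::vector<std::pair<Finalizer, void*>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);           // run them outside the lock
        }
        for (auto& entry : batch) entry.first(entry.second);
        return batch.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::pair<Finalizer, void*>> pending_;
};
```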
David
David Matthews wrote:
It seems like it could be useful. It isn't possible to run ML code while actually doing the garbage-collection, so it would be necessary to record that a vol was due to be deleted and then call the finaliser at some point later on. That, of course, raises the question of which thread runs the finaliser. From a bit of searching it appears that in Java finalisers can be run on any thread, which doesn't seem a good idea.
It may be possible to get the effect by using a weak reference (structure Weak). That requires a thread to monitor Weak.weakSignal and use that to then check which values have been deleted.
Probably because I already had a solution for this in my head (from what OCaml does), I wasn't clear...
The finalizer is intended to be *foreign* code, presumably C - and it is *never* able to do anything which affects the Poly heap(s). This is not as restrictive as it may sound, since the idea of the finalizer (as discussed here, anyway) is to manage the extra-Poly environment in any case.
I will look at this (in foreign.cpp?)... I will of course check the volatile implementation again, but aren't they handled specially WRT Poly saved state? As in, they do NOT persist?
That would be ideal, as the changes to support the foreign finalizer would be more contained... basically just an extra field per volatile value (or something like that), and API to set/get its value.
And the GC code to call the foreign function when the volatile is about to be reclaimed. ;)
A detail - finalizers are NOT allowed for Poly-owned volatiles.
Robert
Robert Roessler wrote:
Probably because I already had a solution for this in my head (from what OCaml does), I wasn't clear...
The finalizer is intended to be *foreign* code, presumably C - and it is *never* able to do anything which affects the Poly heap(s). This is not as restrictive as it may sound, since the idea of the finalizer (as discussed here, anyway) is to manage the extra-Poly environment in any case.
I will look at this (in foreign.cpp?)... I will of course check the volatile implementation again, but aren't they handled specially WRT Poly saved state? As in, they do NOT persist?
The term "vol" was introduced in the days when Poly/ML had a persistent store and the idea was that these were values that could not be saved into the store. The concept is still there: it's possible to save the ML object into a saved state but any attempt to use it in another session will raise an exception.
That would be ideal, as the changes to support the foreign finalizer would be more contained... basically just an extra field per volatile value (or something like that), and API to set/get its value.
That was how I saw it when you described it.
And the GC code to call the foreign function when the volatile is about to be reclaimed. ;)
I was thinking that this would be an ML function (which might call a C function) but if it's a C function then it could indeed be called during the GC. In that case the get/set functions return/take a "vol" to denote the finaliser and adding it wouldn't be too difficult.
David
David Matthews wrote:
... I was thinking that this would be an ML function (which might call a C function) but if it's a C function then it could indeed be called during the GC. In that case the get/set functions return/take a "vol" to denote the finaliser and adding it wouldn't be too difficult.
... maybe for *you*. ;)
It was fairly easy to make the requisite changes to foreign.cpp to support a "set_finalize" function (well, at least something I would like to test)... but getting everything else set up properly in ML-land to actually get it *used* is proving elusive for the moment.
I am (for now) going with a simple
val set_finalize: vol -> sym -> unit
which sets a new
void (*C_finalizer)(void*);
element in the Volatile struct of the vol with the C function pointer from the sym. This value is checked when a NON-owned vol is being collected, and if != NULL, the C function is called with the C_pointer of the vol passed to it.
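The proposal can be modelled as follows. This is a simplified, hypothetical `Volatile` (the real struct in foreign.cpp has other fields, and `Own_C_space` is modelled here as a flag, as the message treats it); `set_finalize` stores the raw function pointer, and `collect` applies the stated rule that only NON-owned vols get their finalizer called with `C_pointer`.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical, simplified model of the Volatile record described above.
struct Volatile {
    void* C_pointer;                 // the C data this vol refers to
    bool Own_C_space;                // modelled as "runtime malloc'd C_pointer"
    void (*C_finalizer)(void*);      // NULL unless set_finalize was called
};

// Store the function pointer taken from the sym, so nothing has to be
// looked up through the sym while the collector is running.
void set_finalize(Volatile& v, void (*f)(void*)) { v.C_finalizer = f; }

// What the collector might do when this vol becomes unreachable.
void collect(Volatile& v) {
    if (!v.Own_C_space && v.C_finalizer != nullptr) {
        v.C_finalizer(v.C_pointer);  // foreign code releases its own data
    } else if (v.Own_C_space) {
        std::free(v.C_pointer);      // runtime-owned memory is simply freed
    }
    v.C_pointer = nullptr;
    v.C_finalizer = nullptr;
}
```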
Note that I am at least for the moment not trying to do a "get_finalize"... I don't have a use for it, and I am not sure it really is needed in general - due at least in part to the somewhat transient, or "volatile", lifespan of the FFI linkages. Also, it would be more complicated, since instead of just grabbing the C function pointer from the sym and saving *that* value in the Volatile struct, I would need to save a reference to the sym itself and then dereference that every time I need to invoke a finalizer in the middle of a GC.
But after that, I am a bit adrift - without fully understanding the model used by Poly to add a new primitive *and* have the compiler actually link to my function in foreign.cpp, I am not achieving the desired results.
While I am of course attempting to use some of the existing FFI code as examples of how to do this (the "trees"), I don't fully grasp the underlying idea(s) (the "forest")... and so am adding various definitions of "set_finalize" into a number of extra/CInterface/*.ML files - with "mixed results". :|
Could you outline the *minimum* changes to SML structures and signatures to connect a
CInterface set_finalize vol -> sym -> unit
function to (I think)
static Handle set_finalize (TaskData *taskData, Handle h)
in foreign.cpp (assuming it will go in at the end of the "handlers" array)?
Thanks!
Robert
Robert Roessler wrote:
... While I am of course attempting to use some of the existing FFI code as examples of how to do this (the "trees"), I don't fully grasp the underlying idea(s) (the "forest")... and so am adding various definitions of "set_finalize" into a number of extra/CInterface/*.ML files - with "mixed results". :|
Could you outline the *minimum* changes to SML structures and signatures to connect a
CInterface set_finalize vol -> sym -> unit
function to (I think)
static Handle set_finalize (TaskData *taskData, Handle h)
in foreign.cpp (assuming it will go in at the end of the "handlers" array)?
If it's helpful, I can be more specific about what I have tried... while I still really want to know how in general to add primitives to Poly (and *why* it is done that way), if it is easier for you to point out where I am going wrong, here is where our story stands so far:
CInterfaceSig (after val mapSym...) val set_finalize : vol -> sym -> unit
LowerLevelSig (after val mapSym...) val set_finalize : vol -> sym -> unit
LowerLevel (after fun get_sym...) fun set_finalize vol sym = Volatile.set_finalize vol sym; (* Use Volatile. or not? *)
DispatchSig (after val toPascalfunction...) val set_finalize : rawvol -> rawvol -> unit
Dispatch (after val toPascalfunction...) val set_finalize = next(two);
VolatileSig (after val toPascalfunction...) val set_finalize : vol -> vol -> unit
VOLS_THAT_HOLD_REFS (after fun toPascalfunction...)
... and here is where I am stuck (semi-independently of how close any of the previous is or isn't). I am not at all sure of what to do here, but I *think* this is where I [finally] get down to my function in foreign.cpp. ;)
Again, thanks for any light you can shed.
Robert
Robert Roessler wrote:
If it's helpful, I can be more specific about what I have tried... while I still really want to know how in general to add primitives to Poly (and *why* it is done that way), if it is easier for you to point out where I am going wrong, here is where our story stands so far:
CInterfaceSig (after val mapSym...) val set_finalize : vol -> sym -> unit
LowerLevelSig (after val mapSym...) val set_finalize : vol -> sym -> unit
LowerLevel (after fun get_sym...) fun set_finalize vol sym = Volatile.set_finalize vol sym; (* Use Volatile. or not? *)
OK, this is now just
fun set_finalize vol sym = set_finalize vol sym
DispatchSig (after val toPascalfunction...) val set_finalize : rawvol -> rawvol -> unit
Dispatch (after val toPascalfunction...) val set_finalize = next(two);
VolatileSig (after val toPascalfunction...) val set_finalize : vol -> vol -> unit
VOLS_THAT_HOLD_REFS (after fun toPascalfunction...)
OK, this is now
fun set_finalize vol sym = Underlying.set_finalize (#thevol vol) (#thevol sym)
???
Progress, of sorts - this builds (including the ML-land bootstrapping) - and I can do all the DLL loading, sym grabbing, and creation of C++ vols using one of my DLL entry points. Finally, I can even manually invoke the "finalizer" on this vol, and the proper delete is done, including all of the virtual destructors.
BUT, if I try to do an actual set_finalize to set the new field in the Volatile structure, Poly just goes into an (interruptible) hard loop - without hitting my breakpoint in foreign.cpp:set_finalize(...).
Robert
Robert, I thought that it would be quicker and easier to implement this myself rather than try to explain what to do. Try it out and let me know if it does what you want. I've extended the example in the mlsource/extra/CInterface/Examples directory to include a test. Regards, David
David Matthews wrote:
I thought that it would be quicker and easier to implement this myself rather than try to explain what to do. Try it out and let me know if it does what you want. I've extended the example in the mlsource/extra/CInterface/Examples directory to include a test.
Thanks, David - this certainly does what I need (and more) - and you get to spell finalize your way! :)
On the ML side, I only have one question: why did you change the typing of your setFinal from that in my set_finalize? Besides the fact that the curried form appears to be the "style" of CInterface, it just looks more pleasing - and it reminds one that this *isn't* C/C++. ;)
Who can look at
val setFinal : vol * sym -> unit
and think it looks cooler than
val setFinal : vol -> sym -> unit
??? And of course, the lost opportunities for partial application (whether that is useful in this specific instance or not)...
On the foreign.cpp side of things, I notice 3 differences vs my version:
1) I had it so that "finalisation" was only possible with NON-owned vols, while your approach potentially allows Poly to use this feature now also.
2) My field ordering was different in Volatile - since pointers can be larger than ints, I slipped this in BEFORE the
Bool Own_C_space;
I prefer to keep structs "packed" for alignment and aesthetic reasons.
3) Your form of "return unit" is obviously canonical - I was just thrown by a couple of instances of
return h; /* to be ignored */
Again, thanks - and if you DON'T have a hardcore reason for using the non-curried form and want me to "fix" setFinal, let me know. :)
Robert
Robert Roessler wrote:
Thanks, David - this certainly does what I need (and more) - and you get to spell finalize your way! :)
On the ML side, I only have one question: why did you change the typing of your setFinal from that in my set_finalize? Besides the fact that the curried form appears to be the "style" of CInterface, it just looks more pleasing - and it reminds one that this *isn't* C/C++. ;)
Who can look at
val setFinal : vol * sym -> unit
and think it looks cooler than
val setFinal : vol -> sym -> unit
??? And of course, the lost opportunities for partial application (whether that is useful in this specific instance or not)...
Personally, I prefer not to curry functions if they require all their arguments to work and there's no obvious requirement for partial application, but that's very much a matter of personal taste. Actually, as I think about it, this could usefully be curried if its type were

val setFinal: sym -> vol -> unit

It's quite likely that the same finalisation function would be used on different vols, whereas I can't see why one would apply a different finaliser to the same vol.
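The benefit of the sym -> vol -> unit order is that `setFinal s` alone is a reusable "attach this finaliser" function. A C++ analogue of that partial application, with a hypothetical `Vol` that only records its finaliser:

```cpp
#include <cassert>
#include <functional>

using Finalizer = void (*)(void*);

// Hypothetical stand-in for a vol: records which finaliser it carries.
struct Vol {
    Finalizer finalizer = nullptr;
};

// Curried in the suggested order: give the finaliser first and get back a
// function that attaches it to any number of vols.
std::function<void(Vol&)> setFinal(Finalizer f) {
    return [f](Vol& v) { v.finalizer = f; };
}
```

With the arguments the other way round (vol -> sym -> unit), the partially applied function would fix a single vol, which is the less useful direction.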
On the foreign.cpp side of things, I notice 3 differences vs my version:
- I had it so that "finalisation" was only possible with NON-owned vols, while your approach potentially allows Poly to use this feature now also.
Well, the vol that has the finaliser actually is an owned vol. That's because the C_pointer field contains the address of a malloc'd area that holds the C value. So when a C function returns a pointer value a new "vol" is allocated to contain the returned value. This extra level of indirection means that a vol can be a char, an int or a struct.
- My field ordering was different in Volatile - since pointers can be larger than ints, I slipped this in BEFORE the
Bool Own_C_space;
I prefer to keep structs "packed" for alignment and aesthetic reasons.
Actually, this "Bool" is an int (really an unsigned or size_t) that contains the size in bytes of the area. Calling it Bool looks like a piece of legacy code.
- Your form of "return unit" is obviously canonical - I was just thrown by a couple of instances of
return h; /* to be ignored */
That's actually wrong. (I didn't write the FFI code!) It is theoretically possible for a unit value to be compared for equality with another unit value and if that is done with the fall-back structure equality code it will yield the wrong answer if the same value is not used everywhere to represent unit.
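A toy model of the problem: if a type-agnostic fall-back equality compares representations directly, two unit results only compare equal when every primitive returns the same canonical unit value. `Word`, `kUnit` and the encoding are invented for illustration; Poly/ML's actual representation isn't described in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Toy model: an ML value is a machine word, and the fall-back equality
// compares the words without knowing their type.
using Word = std::uintptr_t;

constexpr Word kUnit = 1;            // hypothetical canonical encoding of ()

bool fallbackEqual(Word a, Word b) { return a == b; }

// Two ways a primitive might "return unit".
Word returnCanonicalUnit(Word /*handle*/) { return kUnit; }
Word returnHandle(Word handle) { return handle; }   // like "return h; /* to be ignored */"
```

Two canonical units compare equal; two "ignored" handles generally do not.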
Again, thanks - and if you DON'T have a hardcore reason for using the non-curried form and want me to "fix" setFinal, let me know. :)
I think I prefer it curried but as sym->vol->unit. I'll even accept "finalize" since my dictionary says it's acceptable and perhaps even preferred! It looks like there are a few things that need to be cleaned up anyway so I could make these changes.
Regards, David
David Matthews wrote:
... Personally, I prefer not to curry functions if they require all their arguments to work and there's no obvious requirement for partial application, but that's very much a matter of personal taste. Actually, as I think about it, this could usefully be curried if its type were

val setFinal: sym -> vol -> unit

It's quite likely that the same finalisation function would be used on different vols, whereas I can't see why one would apply a different finaliser to the same vol.
Indeed.
On the foreign.cpp side of things, I notice 3 differences vs my version:
- I had it so that "finalisation" was only possible with NON-owned vols, while your approach potentially allows Poly to use this feature now also.
Well, the vol that has the finaliser actually is an owned vol. That's because the C_pointer field contains the address of a malloc'd area that holds the C value. So when a C function returns a pointer value a new "vol" is allocated to contain the returned value. This extra level of indirection means that a vol can be a char, an int or a struct.
Ahhh... this goes along with the "there are always more levels of indirections than you think" comments in the code. ;)
I did, however, make the assumption that the "ownedness" field referred to whatever the C_pointer field was pointing to, rather than the vol holding that field itself... unless I am still confused (quite possible), this distinction is not very useful, since ALL vols themselves are actually owned by Poly?
Digging my hole deeper, what is the possible reason for vols to "be a char, an int or a struct"? I thought vols are there to manage foreign *memory* - implying pointers to [presumably but not necessarily] dynamically allocated storage... why would one want them to hold a value type?
- My field ordering was different in Volatile - since pointers can be larger than ints, I slipped this in BEFORE the
Bool Own_C_space;
I prefer to keep structs "packed" for alignment and aesthetic reasons.
Actually, this "Bool" is an int (really an unsigned or size_t) that contains the size in bytes of the area. Calling it Bool looks like a piece of legacy code.
Naturally, I did check to see what the base type of 'Bool' actually is before posting this... as it is an int, my comment stands - the pointers preceding this field could be 64 bits, while this would still be 32 in some 64-bit models (e.g., Windows x64) and preserving the 8-byte alignment in that case certainly wouldn't hurt.
... I think I prefer it curried but as sym->vol->unit. I'll even accept "finalize" since my dictionary says it's acceptable and perhaps even preferred! It looks like there are a few things that need to be cleaned up anyway so I could make these changes.
Sweet (on the curried part)! But I was kidding about the finalize / finalise thing... I know how you types like your "colour", "analyse", etc. :)
Robert
Robert Roessler wrote:
... Sweet (on the curried part)! But I was kidding about the finalize / finalise thing... I know how you types like your "colour", "analyse", etc. :)
The "you types" referring to people on your side of the pond, of course.
Robert
- I had it so that "finalisation" was only possible with NON-owned vols, while your approach potentially allows Poly to use this feature now also.
Well, the vol that has the finaliser actually is an owned vol. That's because the C_pointer field contains the address of a malloc'd area that holds the C value. So when a C function returns a pointer value a new "vol" is allocated to contain the returned value. This extra level of indirection means that a vol can be a char, an int or a struct.
Ahhh... this goes along with the "there are always more levels of indirections than you think" comments in the code. ;)
I did, however, make the assumption that the "ownedness" field referred to whatever the C_pointer field was pointing to, rather than the vol holding that field itself... unless I am still confused (quite possible), this distinction is not very useful, since ALL vols themselves are actually owned by Poly?
Digging my hole deeper, what is the possible reason for vols to "be a char, an int or a struct"? I thought vols are there to manage foreign *memory* - implying pointers to [presumably but not necessarily] dynamically allocated storage... why would one want them to hold a value type?
A vol is a VALUE in C-space. It can be any of the possible C values. To be honest I still get confused about the way vols work and much of this is based on experimentation. Generally it's not necessary to understand them in order to use the FFI since the conversions (e.g. CInterface.INT) deal with a lot of this but if you need to write your own conversions then you need to understand the details. There's an example of a tree structure and building a conversion in Examples/ForeignTest.sml.
A vol can be an int, a char, a pointer, etc. There are low-level conversion functions such as toCint that make vols from ML values and inverse functions such as fromCint that get ML values out. When using the (very) low-level interface, CInterface.call_sym, to call a foreign function you have to build a list of vols and you get back a vol. This, though, has largely been superseded by call_sym_and_convert and the higher-level call1, call2, etc. functions.
A vol is always implemented as a pointer to a piece of memory that contains the value, so toCint mallocs a piece of memory of size sizeof(int) and puts the ML value into it. In most contexts that means that the FFI has to load the value out of the memory in order to use it. It does, though, mean that a vol can be updated using CInterface.assign and that's particularly important for structs.
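The representation described above can be modelled in C++: a vol is a pointer to a malloc'd cell, so "assign" overwrites the cell in place and every holder of that vol sees the change. `ModelVol` and the `model_` functions are hypothetical analogues of toCint, fromCint and assign, not the FFI's code.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical model: a vol points at a malloc'd cell holding the C value.
struct ModelVol {
    void* C_pointer;      // address of the cell containing the value
    std::size_t size;     // bytes owned (0 would mean "not owned")
};

// Analogue of toCint: malloc sizeof(int) and store the value in it.
ModelVol model_toCint(int value) {
    ModelVol v{std::malloc(sizeof(int)), sizeof(int)};
    assert(v.C_pointer != nullptr);
    std::memcpy(v.C_pointer, &value, sizeof(int));
    return v;
}

// Analogue of fromCint: load the value back out of the cell.
int model_fromCint(const ModelVol& v) {
    int out;
    std::memcpy(&out, v.C_pointer, sizeof(int));
    return out;
}

// Analogue of assign: overwrite dst's cell in place with src's value.
void model_assign(ModelVol& dst, const ModelVol& src) {
    assert(dst.size == src.size);
    std::memcpy(dst.C_pointer, src.C_pointer, dst.size);
}

void model_free(ModelVol& v) {
    if (v.size != 0) std::free(v.C_pointer);
    v.C_pointer = nullptr;
}
```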
CInterface.alloc allocates a piece of memory and returns a vol that refers to the memory as its value. This isn't the same as having a vol that contains the address of the memory. If you use alloc to create an array or a struct and then pass the vol to a function you're passing the array or struct by value not by reference. Typically, you need to use CInterface.address to create a new vol that contains the address of the memory.
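From the C side, the alloc/address distinction is the difference between these two hypothetical foreign functions: passing the vol made by alloc corresponds to the by-value call, and passing the vol made by address corresponds to the pointer call.

```cpp
#include <cassert>

// Hypothetical foreign functions taking the same struct two ways.
struct Point { int x; int y; };

// By value: the callee gets a copy; the caller's struct is untouched.
extern "C" int bump_by_value(Point p) { p.x += 1; return p.x; }

// By reference: the callee updates the caller's memory.
extern "C" void bump_by_address(Point* p) { p->x += 1; }
```

Given `Point s{1, 2}`, `bump_by_value(s)` returns 2 while `s.x` stays 1; `bump_by_address(&s)` changes `s.x` to 2.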
Normally vols own the piece of the memory that contain their C value. The exceptions are vols created by CInterface.offset and CInterface.deref. "offset" creates a vol that refers to a field of a struct or an array. Like any vol the value is actually a pointer to the C value but because no new memory has been allocated this doesn't "own" any memory. Similarly "deref" doesn't allocate any new memory; it assumes the memory has been allocated elsewhere.
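Ownership can be modelled by a byte count that is zero for vols made by offset or deref, so reclaiming them never frees anything. All names are hypothetical analogues, not the FFI's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical model: where the value lives, and how many bytes are owned.
struct ModelVol {
    void* C_pointer;
    std::size_t owned_bytes;   // 0 => memory belongs to someone else
};

ModelVol model_alloc(std::size_t n) { return {std::calloc(1, n), n}; }

// Analogue of offset: points into the parent's memory, allocates nothing.
ModelVol model_offset(const ModelVol& parent, std::size_t byte_offset) {
    return {static_cast<char*>(parent.C_pointer) + byte_offset, 0};
}

// Analogue of deref: follows the stored pointer; that memory was allocated
// elsewhere, so the new vol does not own it.
ModelVol model_deref(const ModelVol& v) {
    return {*static_cast<void**>(v.C_pointer), 0};
}

// Reclaiming a vol frees memory only if it owns some. Returns true if freed.
bool model_reclaim(ModelVol& v) {
    bool freed = v.owned_bytes != 0;
    if (freed) std::free(v.C_pointer);
    v.C_pointer = nullptr;
    return freed;
}
```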
I'm not exactly sure how this interacts with finalisation but I think it will work as expected. If a foreign function returns a value that needs finalisation it will be necessary to retain the vol it came in or create a new one and then attach the finalisation function to it. The important thing is that the ML data structure needs to keep a reference to the vol rather than, as in most cases, turning it into an ML value.
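One consequence of the extra level of indirection: when a foreign function returns a pointer, the vol's C_pointer addresses a cell that *holds* that pointer, so the object to destroy is the cell's contents, not the cell. The thread doesn't say whether Poly/ML dereferences this before calling the finaliser; this hypothetical model does the dereference explicitly, then frees the runtime-owned cell.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical library object with a C-linkage constructor and destructor.
struct Widget { int id; };
static int widgets_alive = 0;
extern "C" void* make_widget(int id) { ++widgets_alive; return new Widget{id}; }
extern "C" void destroy_widget(void* p) {
    --widgets_alive;
    delete static_cast<Widget*>(p);
}

// Model of the vol a call returns: C_pointer addresses a malloc'd cell that
// holds the returned pointer.
struct ResultVol {
    void* C_pointer;
    void (*finalizer)(void*);
};

ResultVol wrap_result(void* returned) {
    ResultVol v{std::malloc(sizeof(void*)), nullptr};
    assert(v.C_pointer != nullptr);
    *static_cast<void**>(v.C_pointer) = returned;
    return v;
}

// Reclaim: pass the *stored* pointer to the finalizer, then free the cell.
void reclaim(ResultVol& v) {
    if (v.finalizer) v.finalizer(*static_cast<void**>(v.C_pointer));
    std::free(v.C_pointer);
    v.C_pointer = nullptr;
}
```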
- My field ordering was different in Volatile - since pointers can be larger than ints, I slipped this in BEFORE the
Bool Own_C_space;
I prefer to keep structs "packed" for alignment and aesthetic reasons.
Actually, this "Bool" is an int (really an unsigned or size_t) that contains the size in bytes of the area. Calling it Bool looks like a piece of legacy code.
Naturally, I did check to see what the base type of 'Bool' actually is before posting this... as it is an int, my comment stands - the pointers preceding this field could be 64 bits, while this would still be 32 in some 64-bit models (e.g., Windows x64) and preserving the 8-byte alignment in that case certainly wouldn't hurt.
That's a good point. I had forgotten about 64-bit systems. I still think, though, that the field should be a size_t value since it could be the size of an array.
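To check what field order does in practice, one can compare layouts directly. These structs are hypothetical (`ML_pointer` is an invented placeholder for the vol's other pointer field); the compiler pads so that every pointer member is correctly aligned whatever the order, and only the position of the padding changes.

```cpp
#include <cassert>
#include <cstddef>

using Finalizer = void (*)(void*);

struct IntInMiddle {              // int-sized field between pointers
    void* ML_pointer;
    void* C_pointer;
    int Own_C_space;
    Finalizer C_finalizer;
};

struct IntAtEnd {                 // pointers first, int last
    void* ML_pointer;
    void* C_pointer;
    Finalizer C_finalizer;
    int Own_C_space;
};

struct SizeTField {               // a size_t byte count instead of an int
    void* ML_pointer;
    void* C_pointer;
    std::size_t Own_C_space;
    Finalizer C_finalizer;
};

// Padding keeps pointer members aligned in every ordering.
static_assert(offsetof(IntInMiddle, C_finalizer) % alignof(Finalizer) == 0, "aligned");
static_assert(offsetof(IntAtEnd, C_finalizer) % alignof(Finalizer) == 0, "aligned");
static_assert(offsetof(SizeTField, C_finalizer) % alignof(Finalizer) == 0, "aligned");
```

On common ILP32, LP64 and LLP64 targets both int orderings have the same total size, because a trailing int picks up tail padding; ordering tends to matter more once a struct has several small fields. The size_t version avoids internal padding and can hold an array's full size.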
... I think I prefer it curried but as sym->vol->unit. I'll even accept "finalize" since my dictionary says it's acceptable and perhaps even preferred! It looks like there are a few things that need to be cleaned up anyway so I could make these changes.
Sweet (on the curried part)! But I was kidding about the finalize / finalise thing... I know how you types like your "colour", "analyse", etc. :)
Well, I thought I'd avoid the issue and call it setFinal (perhaps to be changed to set_final for consistency)!
Regards, David
David Matthews wrote:
... A vol is a VALUE in C-space. It can be any of the possible C values. To
Would it be accurate / better to say "a vol represents a VALUE in C-space"?
... I'm not exactly sure how this interacts with finalisation but I think it will work as expected. If a foreign function returns a value that needs finalisation it will be necessary to retain the vol it came in or create a new one and then attach the finalisation function to it. The important thing is that the ML data structure needs to keep a reference to the vol rather than, as in most cases, turning it into an ML value.
I agree that it seems like finalisation will work as expected.
I do think that the FFI is extremely complex (both on the data representation / marshaling and on the actual execution linkage) compared to, say, OCaml's.
http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html
And yes, I know you didn't write it. ;)
OK, it *is* true that they "cheated" and added a keyword or two... but its "low level" implementation really *is* - and you can always add the typing-and-conversion layers like Poly's on top. In fact, the "LablGTK" package has something very similar for dealing with GTK.
... That's a good point. I had forgotten about 64-bit systems. I still think, though, that the field should be a size_t value since it could be the size of an array.
For all the reasons the somewhat ugly (depending on how you look at it) size_t exists, yes... and it doesn't even look like there will be many new warnings from C++ compilers by changing from "Bool" (i.e., "int").
... Well, I thought I'd avoid the issue and call it setFinal (perhaps to be changed to set_final for consistency)!
Don't give up - insist on
val set_finalise: sym -> vol -> unit
Accept no substitute! :)
Robert