There have been a few changes to the Foreign structure. The callN functions have been renamed as buildCallN and the way functions are passed as arguments has been changed.
The reason for the changes is to make clear that the expensive operations are creating the C functions and closures and that calling a C function or passing a constructed closure are comparatively cheap.
It is important to call the buildXXX functions at the top-level e.g. in a structure, so that the C function is created once. The old callN functions were curried and it wasn't apparent that the partial application to the conversions was quite different to the application of this to the arguments. For that reason the buildXXX take a tuple.
David
Hi, so if my interpretation of the above is correct the timings shouldn't be changing (i.e. one shouldn't be expecting the timings to improve)? The benchmarks seem to indicate this. I modified the second benchmark in http://lists.inf.ed.ac.uk/pipermail/polyml/2015-October/001673.html [the one with "(* This uses new FFI *)" at the top] to the following, and the time to compute was roughly the same :
***************************************************************************
open Foreign;

val mylib = loadLibrary "./intArray.so";

val c1 = buildCall1((getSymbol mylib "createIntArray"),cInt,cPointer)
val c2 = buildCall1((getSymbol mylib "destroyIntArray"),cPointer,cVoid)
val c3 = buildCall3((getSymbol mylib "setIntArray"),(cPointer,cInt,cInt),cVoid)
val c4 = buildCall2((getSymbol mylib "getIntArray"),(cPointer,cInt),cInt)
val c5 = buildCall1((getSymbol mylib "getSumIntArray"),(cPointer),cInt)

fun c_createIntArray (size) = c1 (size);
fun c_destroyIntArray (p) = c2 (p);
fun c_setIntArray (p,elem,value) = c3 (p,elem,value);
fun c_getIntArray (p,elem) = c4 (p,elem);
fun c_getSumIntArray (p) = c5 (p);

val size:int = 50000;
val loops:int = 30;
val cap:int = 50000;

fun loop (pData2) =
  let
    fun loopI i =
      if i = size then
        let val _ = () in
          c_setIntArray(pData2,0,c_getIntArray(pData2,size-1));
          ()
        end
      else
        let
          val previous = c_getIntArray(pData2,i-1);
          val use = if previous > cap then 0 else previous
        in
          c_setIntArray(pData2,i,use+1);
          loopI (i+1)
        end
  in
    loopI 1
  end

fun benchmarkRun (pData2) =
  let
    fun bench i =
      if i = loops then ()
      else
        let val _ = () in
          loop (pData2);
          bench (i+1)
        end
  in
    bench 1
  end

fun main () =
  let
    val pData = c_createIntArray(size);
  in
    benchmarkRun(pData);
    print (Int.toString (c_getSumIntArray (pData)));
    print "\n"
  end
***************************************************************************
Thanks
On Wed, Dec 16, 2015 at 6:05 PM, David Matthews < David.Matthews at prolingua.co.uk> wrote:
> There have been a few changes to the Foreign structure. The callN functions have been renamed as buildCallN and the way functions are passed as arguments has been changed.
> The reason for the changes is to make clear that the expensive operations are creating the C functions and closures and that calling a C function or passing a constructed closure are comparatively cheap.
> It is important to call the buildXXX functions at the top-level e.g. in a structure, so that the C function is created once. The old callN functions were curried and it wasn't apparent that the partial application to the conversions was quite different to the application of this to the arguments. For that reason the buildXXX take a tuple.
> David
> polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
On 16/12/2015 14:00, Artella Coding wrote:
> Hi, so if my interpretation of the above is correct the timings shouldn't be changing (i.e. one shouldn't be expecting the timings to improve)? The benchmarks seem to indicate this. I modified the second benchmark in http://lists.inf.ed.ac.uk/pipermail/polyml/2015-October/001673.html [the one with "(* This uses new FFI *)" at the top] to the following, and the time to compute was roughly the same :
> open Foreign;
> val mylib = loadLibrary "./intArray.so";
> val c1 = buildCall1((getSymbol mylib "createIntArray"),cInt,cPointer)
> val c2 = buildCall1((getSymbol mylib "destroyIntArray"),cPointer,cVoid)
> val c3 = buildCall3((getSymbol mylib "setIntArray"),(cPointer,cInt,cInt),cVoid)
> val c4 = buildCall2((getSymbol mylib "getIntArray"),(cPointer,cInt),cInt)
> val c5 = buildCall1((getSymbol mylib "getSumIntArray"),(cPointer),cInt)
> fun c_createIntArray (size) = c1 (size);
> fun c_destroyIntArray (p) = c2 (p);
> fun c_setIntArray (p,elem,value) = c3 (p,elem,value);
> fun c_getIntArray (p,elem) = c4 (p,elem);
> fun c_getSumIntArray (p) = c5 (p);
Correct. You're defining the functions at the top level so there's no problem. The problem with the old, curried, form was the temptation to write:
    call3 (getSymbol mylib "setIntArray") (cPointer,cInt,cInt) cVoid (pData2,0,c_getIntArray(pData2,size-1))
inside the loop.
It's even more of a problem with passing functions because the old mechanism of passing the ML function as an argument resulted in the creation of a LibFFI closure for the argument each time the C function was called.
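The difference between building once and building per call can be seen in a toy model written in plain SML, with no FFI involved. Here `build` is a hypothetical stand-in for a buildCallN-style function (it is not part of Foreign), and the counter only records how often the expensive setup step runs.

```sml
(* Toy model: `build` stands in for an expensive buildCallN-style setup.
   It is NOT part of Foreign; `setups` just counts how often setup runs. *)
val setups = ref 0

fun build (name : string) : int -> int =
  let
    val () = setups := !setups + 1      (* the expensive part *)
  in
    fn x => x + String.size name        (* the cheap per-call part *)
  end

(* Built once at top level, then called many times. *)
val inc = build "a"
fun sumGood n = if n = 0 then 0 else inc n + sumGood (n - 1)

(* Setup repeated on every iteration, as the curried form invited. *)
fun sumBad n = if n = 0 then 0 else build "a" n + sumBad (n - 1)

val good = sumGood 100
val setupsAfterGood = !setups
val bad = sumBad 100
val setupsDuringBad = !setups - setupsAfterGood
```

Both sums agree, but the second version pays for one setup per iteration, which is the cost the tupled buildXXX interface is meant to make visible.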
David
I've been looking at the new Foreign structure. It is good to see that the family of callNretM functions has been eliminated by introducing cStar. I've updated a few examples to use the buildCall<N> functions and they appear to work.
With CInterface, I was avoiding the family of call<N> functions by implementing a single call function based on call_sym or call_sym_and_convert. I am trying to do something equivalent using Foreign.LowLevel based on the implementation in buildCall<N>withAbi. There is a problem: the field "updateML" is not accessible because 'a conversion is abstract. Am I right in thinking "updateML" is required for conversions involving cStar? If so, is there a way I can implement my own variant of a buildCall function?
Also - just an observation - in the functions buildCall<N>withAbi, couldn't the argument offsets (and the amount of memory to malloc) be calculated once, rather than for each foreign call? E.g. in buildCall2withAbi:
    let
      val callF = callwithAbi abi [arg1Type, arg2Type] resType fnAddr
      val arg1Offset = alignUp(#size resType, #align arg1Type)
      val arg2Offset = alignUp(arg1Offset + #size arg1Type, #align arg2Type)
      val argSize = arg2Offset + #size arg2Type
    in
      fn (a, b) =>
        let
          val rMem = malloc argSize
          ...
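As a rough, self-contained illustration of the offset arithmetic, the sketch below computes the layout once from size and alignment records. The `ctype` record, the integer `alignUp` and the `layout2` helper are assumptions made for this example; they are not the definitions used in basis/Foreign.sml.

```sml
(* Hypothetical stand-in for a C type descriptor: size and alignment in bytes. *)
type ctype = {size : int, align : int}

(* Round s up to the next multiple of align (assumed definition). *)
fun alignUp (s, align) = ((s + align - 1) div align) * align

(* Result first, then the two arguments, each at its aligned offset. *)
fun layout2 (resType : ctype, arg1Type : ctype, arg2Type : ctype) =
  let
    val arg1Offset = alignUp (#size resType, #align arg1Type)
    val arg2Offset = alignUp (arg1Offset + #size arg1Type, #align arg2Type)
    val argSize = arg2Offset + #size arg2Type
  in
    {arg1Offset = arg1Offset, arg2Offset = arg2Offset, argSize = argSize}
  end

(* Example: a 4-byte result, a 1-byte argument and an 8-byte argument. *)
val lay =
  layout2 ({size = 4, align = 4}, {size = 1, align = 1}, {size = 8, align = 8})
```

Because `layout2` depends only on the types, its result can be bound outside the `fn (a, b) => ...` and reused on every call.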
Phil
On 04/01/2016 20:39, Phil Clayton wrote:
> I've been looking at the new Foreign structure. It is good to see that the family of callNretM functions has been eliminated by introducing cStar. I've updated a few examples to use the buildCall<N> functions and they appear to work.
> With CInterface, I was avoiding the family of call<N> functions by implementing a single call function based on call_sym or call_sym_and_convert. I am trying to do something equivalent using Foreign.LowLevel based on the implementation in buildCall<N>withAbi. There is a problem: the field "updateML" is not accessible because 'a conversion is abstract. Am I right in thinking "updateML" is required for conversions involving cStar? If so, is there a way I can implement my own variant of a buildCall function?
My thinking was that anyone programming at the level of Foreign.LowLevel would want to build their own conversions. I can see it's a problem if you want to make use of the existing higher level conversions. Originally "conversion" was completely transparent but it meant that constructing any new "conversion" required all the functions to be provided even though most of the time the "update" functions aren't needed. I did think about having extended versions of "makeConversion" and "breakConversion" which would allow the "update" functions to be included or extracted. I'd like to keep "conversion" abstract because I can see that in the future it might be helpful to include extra functions or fields and having it completely transparent makes it difficult to provide backwards compatibility.
> Also - just an observation - in the functions buildCall<N>withAbi, couldn't the argument offsets (and the amount of memory to malloc) be calculated once, rather than for each foreign call? E.g. in buildCall2withAbi:
>     let
>       val callF = callwithAbi abi [arg1Type, arg2Type] resType fnAddr
>       val arg1Offset = alignUp(#size resType, #align arg1Type)
>       val arg2Offset = alignUp(arg1Offset + #size arg1Type, #align arg2Type)
>       val argSize = arg2Offset + #size arg2Type
>     in
>       fn (a, b) =>
>         let
>           val rMem = malloc argSize
>           ...
True. The idea is that these functions should all be inlined in which case all these would be compile-time constants. Making them inlined means setting maxInlineSize to a very large value while compiling "basis/Foreign.sml". The way the basis library is compiled that wasn't possible until the 5.6 compilers were built.
I'm trying to avoid making any changes in git "master" until the release is complete but it would certainly be worth doing in the development version.
David
On 04/01/2016 21:51, David Matthews wrote:
> On 04/01/2016 20:39, Phil Clayton wrote:
>> I've been looking at the new Foreign structure. It is good to see that the family of callNretM functions has been eliminated by introducing cStar. I've updated a few examples to use the buildCall<N> functions and they appear to work.
>> With CInterface, I was avoiding the family of call<N> functions by implementing a single call function based on call_sym or call_sym_and_convert. I am trying to do something equivalent using Foreign.LowLevel based on the implementation in buildCall<N>withAbi. There is a problem: the field "updateML" is not accessible because 'a conversion is abstract. Am I right in thinking "updateML" is required for conversions involving cStar? If so, is there a way I can implement my own variant of a buildCall function?
> My thinking was that anyone programming at the level of Foreign.LowLevel would want to build their own conversions. I can see it's a problem if you want to make use of the existing higher level conversions.
In case you're interested in an example of using the conversions, see the attached code. This defines a single 'call' function that avoids a family of 'call<N>' functions by taking a function signature that is constructed using existing conversions. For example:
    open Foreign Call
    val f1 = call sym1 (cInt && cInt --> cInt) : (int, int) pair -> int
    val f2 = call sym2 (cFloat --> cInt) : real -> int
    ...
    ... f1 (2 & 3) : int
    ... f2 Math.e : int
Here, the existing conversions are convenient for capturing the characteristics of a C type even when not using the existing buildCall<N> functions. I don't have any objection to creating my own conversions - easily done by wrapping existing ones - it would just be nice if all the existing ones worked with the above.
> Originally "conversion" was completely transparent but it meant that constructing any new "conversion" required all the functions to be provided even though most of the time the "update" functions aren't needed. I did think about having extended versions of "makeConversion" and "breakConversion" which would allow the "update" functions to be included or extracted. I'd like to keep "conversion" abstract because I can see that in the future it might be helpful to include extra functions or fields and having it completely transparent makes it difficult to provide backwards compatibility.
I take your point about abstraction. "updateML" and "updateC" seem reasonable in concept - they are variants of load and store that are applied to the arguments after a call - but perhaps their names could be more meaningful.
Hiding the "updateX" functions is not a problem for me. To avoid the family of callNretM functions in CInterface, I already have a framework in place for dealing with arguments passed by reference so I don't need to use "cStar", though it would have allowed a little more code to be shared with MLton. I could prevent "cStar" from being used accidentally by making my own conversion type.
>> Also - just an observation - in the functions buildCall<N>withAbi, couldn't the argument offsets (and the amount of memory to malloc) be calculated once, rather than for each foreign call? E.g. in buildCall2withAbi:
>>     let
>>       val callF = callwithAbi abi [arg1Type, arg2Type] resType fnAddr
>>       val arg1Offset = alignUp(#size resType, #align arg1Type)
>>       val arg2Offset = alignUp(arg1Offset + #size arg1Type, #align arg2Type)
>>       val argSize = arg2Offset + #size arg2Type
>>     in
>>       fn (a, b) =>
>>         let
>>           val rMem = malloc argSize
>>           ...
> True. The idea is that these functions should all be inlined in which case all these would be compile-time constants. Making them inlined means setting maxInlineSize to a very large value while compiling "basis/Foreign.sml". The way the basis library is compiled that wasn't possible until the 5.6 compilers were built.
> I'm trying to avoid making any changes in git "master" until the release is complete but it would certainly be worth doing in the development version.
Another observation for future consideration: there is no conversion for the boolean type. I believe that the existing CInterface.BOOL is a convenience for an int treated as a boolean (indicating a non-zero value) and can easily be created as a derived conversion. It's worth noting that C99 has a type "_Bool" (with a macro to refer to it as "bool") that "is large enough to store the values 0 and 1". Depending on the platform, I believe that sizeof(_Bool) could be larger than one byte, so it could be useful to have a conversion for the _Bool type.
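To illustrate what "a derived conversion" for an int treated as a boolean might look like, here is a toy model in plain SML. The `mem` and `conv` types are hypothetical stand-ins for Memory.voidStar and the abstract 'a conversion; with the real library the same shape would presumably go through breakConversion and makeConversion, which are mentioned earlier in the thread. Treating the unit -> unit result of store as a cleanup step is also an assumption of this sketch.

```sml
(* Toy model only: `mem` and `conv` are hypothetical stand-ins for
   Memory.voidStar and Foreign's abstract 'a conversion. *)
type mem = int ref
type 'a conv = {load : mem -> 'a, store : mem * 'a -> unit -> unit}

(* An int conversion over the toy memory. The function returned by store
   is modelled as a cleanup step that does nothing here (an assumption). *)
val cIntModel : int conv =
  {load = fn m => !m,
   store = fn (m, v) => (m := v; fn () => ())}

(* Derived bool conversion: non-zero loads as true, true stores as 1. *)
val cBoolModel : bool conv =
  let
    val {load, store} = cIntModel
  in
    {load = fn m => load m <> 0,
     store = fn (m, b) => store (m, if b then 1 else 0)}
  end

val cell : mem = ref 0
val release = #store cBoolModel (cell, true)
val () = release ()
val stored = !cell
val () = cell := 42
val loaded = #load cBoolModel cell
```

This only covers the int-as-boolean convention; a conversion for C99 `_Bool` would additionally need the right size and alignment for the platform.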
Phil
On 05/01/2016 11:45, Phil Clayton wrote:
> Another observation for future consideration: there is no conversion for the boolean type. I believe that the existing CInterface.BOOL is a convenience for an int treated as a boolean (indicating a non-zero value) and can easily be created as a derived conversion. It's worth noting that C99 has a type "_Bool" (with a macro to refer to it as "bool") that "is large enough to store the values 0 and 1". Depending on the platform, I believe that sizeof(_Bool) could be larger than one byte, so it could be useful to have a conversion for the _Bool type.
At the lowest level the Poly/ML conversions make use of LibFFI types to provide size and alignment information. There's no "bool" type there and I would guess that this is because there's no consensus about exactly what it should be in C, at least historically.
David
I notice that the buildClosure<N>withAbi functions take the SML function to call back to in the same tuple as the type of the callback arguments and return value. Consequently the interface for constructing closures doesn't allow the same CIF to be used for multiple callbacks to different SML functions (of the same type). Is that deliberate or could there be some advantage in allowing that?
For example, we could have:
    val buildClosure2withAbi:
      LibFFI.abi * ('a conversion * 'b conversion) * 'c conversion
        -> ('a * 'b -> 'c) -> ('a * 'b -> 'c) closure

    fun buildClosure2withAbi
      (
        abi: abi,
        (arg1Conv: 'a conversion, arg2Conv: 'b conversion),
        resConv: 'c conversion
      ) : ('a * 'b -> 'c) -> ('a * 'b -> 'c) closure =
      let
        fun callback f (args, res) = ...

        val argTypes = [#ctype arg1Conv, #ctype arg2Conv]
        and resType = #ctype resConv

        val makeCallback = cFunctionWithAbi abi argTypes resType
      in
        fn f => Memory.memoise (fn () => makeCallback(callback f)) ()
      end
Then, at the top-level, an application could define e.g.
    val binOpClosure = buildClosure2 ((cDouble, cDouble), cDouble)
    val addClosure = binOpClosure Real.+
    val subtractClosure = binOpClosure Real.-
    ...
I don't know enough about libffi to know whether that is useful or would be problematic.
Phil
On 11/01/2016 08:34, Phil Clayton wrote:
> I notice that the buildClosure<N>withAbi functions take the SML function to call back to in the same tuple as the type of the callback arguments and return value. Consequently the interface for constructing closures doesn't allow the same CIF to be used for multiple callbacks to different SML functions (of the same type). Is that deliberate or could there be some advantage in allowing that?
For the X86 at least (I can't say for the other architectures) the CIF is very simple, consisting really of the types and a few flags. I had expected that there would be some code there but there isn't. The complicated part is when the closure is actually built: this involves allocating memory with mmap so that it can be executable and building the code that is passed as the C function. There's little to be gained from reusing the CIF.
More importantly I wanted to steer users away from trying to create closures on-the-fly because of the cost. Before the recent change to use "buildClosureN" the calling mechanism implied that it was possible to define a function that took, say an integer and an ML function, and that the cost of passing the integer and the function were similar. In practice that's not the case. They're only similar if the ML function has already been wrapped as a C closure.
I have a long-term plan to turn the buildCallN and buildClosureN functions into something that is almost a mini-compiler that will produce an interface function when it is applied. It may only be possible to do that in certain cases and fall back to libffi in the other cases. It's complicated because there are different ABIs for the various combinations of X86-32/X86-64 and Unix/Windows.
David
On 11/01/2016 12:10, David Matthews wrote:
> On 11/01/2016 08:34, Phil Clayton wrote:
>> I notice that the buildClosure<N>withAbi functions take the SML function to call back to in the same tuple as the type of the callback arguments and return value. Consequently the interface for constructing closures doesn't allow the same CIF to be used for multiple callbacks to different SML functions (of the same type). Is that deliberate or could there be some advantage in allowing that?
> For the X86 at least (I can't say for the other architectures) the CIF is very simple, consisting really of the types and a few flags. I had expected that there would be some code there but there isn't. The complicated part is when the closure is actually built: this involves allocating memory with mmap so that it can be executable and building the code that is passed as the C function. There's little to be gained from reusing the CIF.
I looked at the code for FFI runtime call 55 (create a CIF) and there's not a huge amount in there so it's not much overhead. I couldn't see where the memory allocated for a CIF is freed though. (Probably not an issue as the number of callback functions in an application is likely to be bounded, assuming that they are built only once by correct use of buildClosure<N>.)
I have an interface where the CIF can be reused and was concerned that doing so would be problematic, given that Foreign doesn't allow this. It sounds like that isn't a problem though.
> More importantly I wanted to steer users away from trying to create closures on-the-fly because of the cost. Before the recent change to use "buildClosureN" the calling mechanism implied that it was possible to define a function that took, say an integer and an ML function, and that the cost of passing the integer and the function were similar. In practice that's not the case. They're only similar if the ML function has already been wrapped as a C closure.
Yes, I see, hence the type 'a closure to represent a function whose closure has been constructed.
> I have a long-term plan to turn the buildCallN and buildClosureN functions into something that is almost a mini-compiler that will produce an interface function when it is applied. It may only be possible to do that in certain cases and fall back to libffi in the other cases. It's complicated because there are different ABIs for the various combinations of X86-32/X86-64 and Unix/Windows.
Interesting - is that to make the FFI faster? I wonder if the experiences of MLton would be useful for creating such an interface.
Phil
On 13/01/2016 14:19, Phil Clayton wrote:
> I looked at the code for FFI runtime call 55 (create a CIF) and there's not a huge amount in there so it's not much overhead. I couldn't see where the memory allocated for a CIF is freed though. (Probably not an issue as the number of callback functions in an application is likely to be bounded, assuming that they are built only once by correct use of buildClosure<N>.)
The CIF is built the first time it is used and then cached for the rest of the execution.
>> I have a long-term plan to turn the buildCallN and buildClosureN functions into something that is almost a mini-compiler that will produce an interface function when it is applied. It may only be possible to do that in certain cases and fall back to libffi in the other cases. It's complicated because there are different ABIs for the various combinations of X86-32/X86-64 and Unix/Windows.
> Interesting - is that to make the FFI faster? I wonder if the experiences of MLton would be useful for creating such an interface.
The idea is that the current buildCallN/buildClosureN functions together with the conversions would be sufficient for the interface. The only difference would be what would happen when they were called. The aim would be to improve the efficiency of calling a foreign function at the cost of the extra time "compiling" the function call.
A case in point is the mechanism for calling in X64/Unix. It looks from the code and a quick glance at the ABI documentation that this involves putting the first 6 fixed point values into general registers and the first 8 floating point values into SSE registers. There are some complicated rules for small structures. LibFFI works all this out on every call so there is a significant overhead. I would rather have expected that it might have encoded some of this in the CIF but it doesn't seem to. By compiling an interface function the arguments could be moved directly from ML into the appropriate registers.
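The kind of decision that could be made once, when the interface function is "compiled", can be sketched in plain SML. This is a deliberately simplified model of System V x86-64 argument placement covering scalar arguments only (no structures, no varargs); the datatypes and the `assign` function are invented for this example and are not part of Poly/ML or libffi.

```sml
(* Simplified sketch of System V x86-64 argument placement:
   scalars only, no structs, no varargs. *)
datatype argClass = Integer | Float
datatype location = GPR of string | SSE of string | Stack of int

val gprs = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
val sses = List.tabulate (8, fn i => "xmm" ^ Int.toString i)

(* Assign each argument to the next free register of its class, or to
   the next stack slot (an ordinal, not a byte offset) once that class
   of registers is exhausted. *)
fun assign (args : argClass list) : location list =
  let
    fun go ([], _, _, _) = []
      | go (Integer :: rest, g :: gs, ss, slot) =
          GPR g :: go (rest, gs, ss, slot)
      | go (Float :: rest, gs, s :: ss, slot) =
          SSE s :: go (rest, gs, ss, slot)
      | go (_ :: rest, gs, ss, slot) =
          Stack slot :: go (rest, gs, ss, slot + 1)
  in
    go (args, gprs, sses, 0)
  end

val plan = assign [Integer, Float, Integer]
val sevenInts = assign (List.tabulate (7, fn _ => Integer))
```

The plan depends only on the argument types, so it could be computed when the call is built and then reused, rather than being recomputed on each call.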
David
I've started using the Foreign structure and it seems to be working well in tests. (It will probably be a week until I can test it with larger applications though.) My comments below are just for interest, nothing significant to report.
I have a minor comment about the type of the store function in conversions:
    Memory.voidStar * 'a -> unit -> unit
Every time I used a store function, I found that it would be more convenient to have
    'a -> Memory.voidStar -> unit -> unit
There were two reasons:
1. To easily compose store functions, e.g.
    val cX : x conversion =
      makeConversion {
        load = toX o load,
        store = store o fromX,   <--- not valid
        ctype = ctype
      }
2. To apply just the value being stored to get a type that contains no type variables, e.g.
    storeInt 3 : Memory.voidStar -> unit -> unit
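Both points can be checked with a toy model in plain SML. Here `mem` is a hypothetical stand-in for Memory.voidStar, and the store functions and the `colour` type are invented for this example; none of them come from Foreign.

```sml
(* Toy model: `mem` is a hypothetical stand-in for Memory.voidStar. *)
type mem = int ref

(* Tupled form, matching the shape of the current store type. *)
fun storeIntTupled (m : mem, v : int) : unit -> unit =
  (m := v; fn () => ())

(* Curried, value-first form, as suggested above. *)
fun storeIntCurried (v : int) (m : mem) : unit -> unit =
  (m := v; fn () => ())

datatype colour = Red | Green
fun fromColour Red = 0
  | fromColour Green = 1

(* Reason 1: with the curried form, plain composition works. *)
val storeColourCurried : colour -> mem -> unit -> unit =
  storeIntCurried o fromColour

(* With the tupled form an explicit wrapper is needed instead. *)
val storeColourTupled : mem * colour -> unit -> unit =
  fn (m, c) => storeIntTupled (m, fromColour c)

(* Reason 2: applying just the value leaves no type variables. *)
val storeThree : mem -> unit -> unit = storeIntCurried 3
```

The tupled wrapper is only a line longer, so this is a matter of convenience rather than expressiveness.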
I'm not actually using Foreign directly but via a module PolyMLFFI to provide a suitable interface. This gives a type-safe interface for creating calls, callback closures and struct conversions without using an infinite family of functions for each, e.g.
    open PolyMLFFI
    val aCall = call aSym (cInt &&> cInt --> cDouble)
The "price to pay" for this is that arguments or struct fields are joined using an infix pair constructor in ML, e.g.
    aCall (2 & 3)
(This would be nicer if SML allowed user-defined infix type constructors...) This module also provides a low-level interface that is slightly higher-level than Foreign.LowLevel because it takes care of storing the arguments in memory. This can be used with the ctype, load and store components of a conversion, e.g.
    fun f m n =
      LowLevel.call fSym [cTypeInt, cTypeInt] cTypeInt [storeInt m, storeInt n] loadInt
I've attached this interface module, PolyMLFFI, in case it is useful to anyone. Also, I am interested in any comments, particularly regarding efficiency. (Currently it doesn't support cStar because updateC and updateML fields are not exposed by Foreign. It could be updated but I didn't need those.)
Phil
On 13/01/2016 14:25, Phil Clayton wrote:
> I've started using the Foreign structure and it seems to be working well in tests. (It will probably be a week until I can test it with larger applications though.)
I've replaced all use of the old CInterface structure by Foreign in a large library and so far test applications are all working as before. I'm glad to see the back of vols. I don't have any timing measurements, so can't comment on improvements there.
Phil