On 08/02/2011 00:31, Yue Li wrote:
> I'm trying to get better understanding about the code tree of Poly/ML
> function definition. I would appreciate if I can gain some insights from
> you.
>
> For instance, given the following definition:
>
> > fun f a b = a + b;
The first point to appreciate is that this is syntactic sugar for
val rec f = fn a => fn b => op+ (a, b);
However this general form isn't the best way of compiling the function.
Frequently a curried function is applied to several or all of its
arguments e.g.
val x = f y z
Using the general form would be inefficient. Applying f to y would
result in a closure being created on the heap which would then be
applied to z. Instead Poly/ML processes "fun" declarations specially.
fun f a b = a + b
is rewritten as
val rec f' = fn <a,b> => a + b
and rec f = fn_inline a => fn_inline b => f'<a,b>
i.e. a pair of mutually recursive functions, because f' could contain
recursive applications of f. The <> here denote multiple arguments in
registers or on the stack, something that doesn't have an exact
equivalent in ML since every function takes exactly one argument.
> The compiler prints out 3 sections of code tree: a block, a lambda and
> another block:
>
> BLOCK(MUTUAL(DECL #2{0 uses} =
> LAMBDA(
> f(2)
> CL=false CR=0 LEV=0 LOCALS=0
> ARGS=G G
> ARGLIVES=
> RES=G CLOS=()
> BLOCK(BLOCK(DECL #1{0 uses} = RECCONSTR(PARAM(0,1), PARAM(0,2));
> DECL #3{0 uses} = LOCAL(0,1);
> DECL #4{0 uses} = INDIRECT(0, LOCAL(0,3));
> DECL #5{0 uses} = INDIRECT(1, LOCAL(0,3));
> GLOBAL (FUN "run_call2(1)",
> LAMBDAINLINE(
> run_call2(1)
> CL=false CR=0 LEV=0 LOCALS=0
> ARGS=G
> ARGLIVES=
> RES=G CLOS=()
> LOCAL(1,1) G
> $(INDIRECT(0, PARAM(0,1)) G, INDIRECT(1, PARAM(0,1)) G)){LAMBDA}
> ) (*GLOBAL*){early} G $(RECCONSTR(LOCAL(0,4), LOCAL(0,5)) G)
> )
> )){LAMBDA} AND
This is the body of "f'". The name "f(2)" represents that it takes two
arguments but the name is only used for exception tracing and profiling.
It has no significance to ML.
>
> DECL #1{0 uses} =
> LAMBDAINLINE(
> f(1)
> CL=false CR=0 LEV=0 LOCALS=0
> ARGS=G
> ARGLIVES=
> RES=G CLOS=()
> LAMBDAINLINE(
> f(1)(1)
> CL=false CR=0 LEV=0 LOCALS=0
> ARGS=G
> ARGLIVES=
> RES=G CLOS=()
> LOCAL(2,2) G $(PARAM(0,1) G, PARAM(1,1) G)){LAMBDA}){LAMBDA}
> );
This is the body of "f". It consists of two inline functions, one
inside the other.
> RECCONSTR(LOCAL(0,1))
> )
This is the result of a top-level declaration. The final result is a
1-tuple containing a reference to the function "f". Actually because
these are all small functions it's likely that the result will be the
reference to the code-tree itself so that subsequent uses of "f" at the
top-level will expand the code in-line and reduce down to a use of the
addition operation. Top-level declarations can be simple function
declarations such as this example but just as often they involve
extensive computation and return the results as multiple declarations.
> I have three questions regarding this block:
>
> (1) First I'm wondering what do LOCAL(2, 2), and LOCAL(1, 1) refer to?
> My understanding so far is that the second number of the LOCAL operator
> means the number of DECL, but what does the first number refers to?
This is the nesting depth. Zero means local to the function, one means
local to the immediately containing function etc. So
LOCAL(2,2) G $(PARAM(0,1) G, PARAM(1,1) G))
means: apply the function declared by a DECL #2 two levels out to the
first or only parameter of the current function and the first or only
parameter of the immediately containing function. (Oh, there's some
oddity about the order of multiple parameters).
I hope this answers your other questions as well but let me know if
you'd like any more clarification.
Regards,
David