Dear Yue, You've obviously made quite a lot of progress.
Yue Li wrote:
- In the printed IR of a program, seems the print out has two
sections, for example:
2 * 3;
BLOCK(DECL #1{0 uses} = GLOBAL (FUN "run_call2(1)", LAMBDAINLINE( run_call2(1) CL=false CR=0 LEV=0 LOCALS=0 ARGS=G ARGLIVES= RES=G CLOS=() LOCAL(1,1) G $(INDIRECT(0, PARAM(0,1)) G, INDIRECT(1, PARAM(0,1)) G)){LAMBDA} ) (*GLOBAL*){early} G $(LIT <long> G); DECL #2{0 uses} = LOCAL(0,1); MUTUAL(); RECCONSTR(LOCAL(0,2)) ) BLOCK(DECL #1{0 uses} = GLOBAL (FUN "run_call2(1)", LAMBDAINLINE( run_call2(1) CL=false CR=0 LEV=0 LOCALS=0 ARGS=G ARGLIVES= RES=G CLOS=() LOCAL(1,1) G $(INDIRECT(0, PARAM(0,1)) G, INDIRECT(1, PARAM(0,1)) G)){LAMBDA} ) (*GLOBAL*){early} G $(LIT <long> G); DECL #2{0 uses} = LOCAL(0,1); MUTUAL(); RECCONSTR(LOCAL(0,2)) ) POLY_SYS_amul{early} G $(LIT2 G, LIT3 G)
I've had to go back to the code to try to understand what's going on here. It is actually printing the tree twice and that's a mistake. I added code to print the tree in the CODETREE functor, I think because in a very few cases it wasn't being printed, and omitted to remove the original printing code. I'll fix that.
The first section contains two same blocks, and always the block is duplicated in the first section for every input ML program. The second section contains an expression. It seems that the second expression is included in the blocks of the first section, and I'm a little bit confused here, so why the first section duplicates the blocks, and why the expression
POLY_SYS_amul{early} G $(LIT2 G, LIT3 G)
is separately printed out?
This is actually the output of the first phase of processing of the generated code-tree. In this case the original code contained some inline functions which have been stripped out leaving this code. It will actually be code-generated and evaluated by the compiler to reduce it to a constant. That's not going to provide any saving in this particular case since it's going to be evaluated only once but it may well do if this occurred inside a more complex function.
- TAGTEST(x, y, z) is used in several different places such as
generation of the boolean constants true and false, or testing whether a list is empty, but I still couldn't find documentation of TagTest from the source code, and I'm wondering what is the semantics of TAGTEST?
TAGTEST is really just an equality operation. It occurs primarily with datatypes but that does include the bool type which is essentially datatype bool = false | true TAGTEST(c1, c2, exp) is essentially the same as "c1 = exp". c2 is the maximum value of the tag (i.e. one less than the number of tags since they're numbered from zero) and is used if a sequence of tests is converted into an indexed jump.
- When I printed the IR of a signature, say:
signature Set =
# sig # type element # type set # val empty : set # end; BLOCK(LIT0) BLOCK(LIT0)
I suppose here again BLOCK(LIT0) is duplicated, and I'm wondering why this signature is represented like this simple instead of using a tuple whose elements are the decls, and its internal representations?
Declaring a signature doesn't generate any code so it's putting in zero here as a place-holder. Try declaring a structure.
- I found some documentation about the members of LAMBDA, but still
couldn't figure out how CL, CR and LOCALS are determined. In all my testing cases, always CL=false, and CR=0. Also for LOCALS, I thought it indicates the times that local variables are used. For instance:
Here I don't quite understand why LOCALS in the first Block is always 0, and why the LOCALS below is 3.
There are some values that are only used in later phases of the code-generator. It's simpler to have a single datatype and printer for the code-tree for all the phases and have the early phases just set the unused values to zero. CL, CR, LEV, LOCALS, CLOS and ARGLIVES are all of this sort. ARGS and RES are set in the initial phase. ARGS defines the the number and type of the arguments: F for Floating point and G for General (i.e. everything else) and RES the type of the result, again either F or G. ML functions are all expected to have precisely one argument and one result but the front end generates LAMBDAs that take more arguments for curried functions.
David