The build system uses PolyML.make, not the usual make system. There is a short description of it here: https://polyml.org/documentation/Reference/PolyMLMake.html . The backends are built as a result of this line in mlsource/MLCompiler/CodeTree/ml_bind:

    structure GCode = GCode

This looks for a file called GCode.xxxx, where xxxx is an extension such as .ML or .sml. An undocumented feature is that it first looks for files named after the current architecture, as returned from the RTS by PolyML.architecture(), after converting this to lower case. So on the X86_64 it finds GCode.x86_64.ML and uses that. There are various GCode.xxxx files for the different architectures. To add a new architecture you need to add a new string to the poly_dispatch_c and the appropriate GCode.foo.ML file.
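The search order described above can be sketched in a few lines of Standard ML. This is only an illustration of the idea, not the PolyML.make source: the names `extensions`, `candidates` and `findSource` are invented here, the extension list holds just the two extensions mentioned above (the real list may differ), and the fallback to the plain GCode.xxxx names after the architecture-specific ones is an assumption.

```sml
(* Hypothetical sketch of the search order; not the real PolyML.make code. *)

(* Only the two extensions named above; the real list may be longer. *)
val extensions = [".ML", ".sml"]

(* Candidate file names for a structure, architecture-specific names first.
   arch is the string returned by PolyML.architecture(), e.g. "X86_64". *)
fun candidates (name: string, arch: string) : string list =
    let
        val archLower = String.map Char.toLower arch
        val specific = map (fn ext => name ^ "." ^ archLower ^ ext) extensions
        val generic  = map (fn ext => name ^ ext) extensions
    in
        specific @ generic
    end

(* Return the first candidate in dir that is a readable file, if any. *)
fun findSource (dir: string, name: string, arch: string) : string option =
    let
        fun readable file =
            OS.FileSys.access (OS.Path.concat (dir, file), [OS.FileSys.A_READ])
            handle OS.SysErr _ => false
    in
        List.find readable (candidates (name, arch))
    end
```

With these definitions, candidates ("GCode", "X86_64") evaluates to ["GCode.x86_64.ML", "GCode.x86_64.sml", "GCode.ML", "GCode.sml"]. Under Poly/ML one could try findSource (".", "GCode", PolyML.architecture()) from the CodeTree directory.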
A further complication is that Poly/ML does not work in the same way as a more conventional compiler, which compiles a file and writes the object code for that file to the file system. Instead, the compilation process builds a data structure, much of which is executable code but which also includes other values. At the end of the build process the structure is "exported" and that writes the object file. As PolyML.make runs, each expression is compiled and immediately evaluated, meaning that some of the code produced by the compiler is run immediately. Obviously, if the evaluation produces a function, that function is only evaluated when it is actually called. This makes conventional cross-compilation difficult or impossible.

The bootstrap process, which starts with interpreted code and ends up with machine code, has to work around this. It does so by first building a version of the interpreted code that has additional instructions in the code. These instructions are machine instructions on the target architecture that switch to the interpreter, but they are treated as no-ops by the interpreter itself. In this way, during the next stage of bootstrap, machine code functions can call interpreted code functions and vice versa. When the bootstrap is complete, all the interpreted code is discarded.
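The difference between code that runs during the build and code that only runs later can be seen in a tiny example. This is an illustrative sketch, not taken from the Poly/ML sources: the names `squares` and `main` and the output name "demo" are invented. PolyML.export is the Poly/ML function that writes an object file for a given root function.

```sml
(* Illustrative only; none of these names come from the Poly/ML sources. *)

(* Compiled and evaluated at once: the list is built while this
   declaration is processed and becomes part of the in-memory state. *)
val squares = List.tabulate (5, fn i => i * i)

(* Compiled at once, but the body runs only when main is called. *)
fun main () =
    List.app (fn n => print (Int.toString n ^ "\n")) squares

(* Writes an object file containing main and everything reachable
   from it, including the already-computed squares list. *)
val () = PolyML.export ("demo", main)
```

The exported object file therefore carries the result of evaluating List.tabulate on the build machine, which is why the compiler has to be able to run the code it has just generated.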
David
On 23/11/2023 15:07, Andrei Formiga wrote:
Hi David,
Thank you for your answer. You're right - I have to understand a lot more in order to be able to create a new backend. I may have many more questions.
I guess the first one is: for a rebuild of the compiler (from the last bootstrap stage, not from scratch), how does the build system find out which files to compile, and in what order?
On Thu, Nov 23, 2023 at 4:12 AM David Matthews <David.Matthews at prolingua.co.uk> wrote:
Hi Andrei,
It would be interesting to have another back-end but I really don't think what you are suggesting is feasible. There are currently three back-ends: native code for the X86(32/64), native code for the ARM64 and byte code. The byte code is interpreted by part of the run-time system and is used on architectures other than the X86 and ARM64 but it is also used during the initial bootstrap on the X86 and ARM64.
Apart from a small amount of architecture-specific code, and of course the interpreter in C++ for the byte code, all these back-ends make use of the same run-time system support. The run-time system is intimately bound up with the ML part of the system. They share a common view of how values are represented: short integers are tagged, addresses are not tagged, strings have a length word followed by byte data etc. Any new back-end has to maintain these representations. Before you even think about writing a new back-end you need to understand how all this works.
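The tagging of short integers can be illustrated in ML itself. The sketch below assumes the common scheme in which a short integer n is stored as 2n+1, so that its low bit distinguishes it from a word-aligned address, whose low bit is 0. The exact encoding should be checked against the run-time system sources. The function names are invented, and Word.word simply stands in for a raw machine word.

```sml
(* Sketch under the assumption that a short integer n is stored as 2n+1
   and that addresses are word-aligned (low bit 0). Names are invented. *)

(* Shift left by one and set the low bit as the tag. *)
fun tagInt (n: int) : word =
    Word.orb (Word.<< (Word.fromInt n, 0w1), 0w1)

(* A word is a tagged integer exactly when its low bit is set. *)
fun isTaggedInt (w: word) : bool =
    Word.andb (w, 0w1) = 0w1

(* Arithmetic shift right so that negative values survive the round trip. *)
fun untagInt (w: word) : int =
    Word.toIntX (Word.~>> (w, 0w1))
```

Under this assumption tagInt 5 is 0w11, untagInt (tagInt ~3) gives back ~3, and isTaggedInt 0w8 is false because 8 looks like an aligned address. A new back-end would have to generate code that respects whatever encoding the run-time system actually uses.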
David
polyml mailing list polyml at inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml