Now that 5.8.2 has been released I've updated Git master with some
changes that have been in the pipeline for some time. These affect a
number of issues so they probably need a bit of explanation. These are:
a basic code-generator for 64-bit ARM, position-independent executables
and a new bootstrap process.
This is quite a long message since each of these requires some explanation.
Bootstrap
The updated bootstrap process affects all architectures. Because
Poly/ML is written in ML there needs to be an ML compiler to compile the
source. The solution up to now has been to have pre-built compilers for
each architecture. The bootstrap process then compiles the basis
library and builds the final binary from that. The problem is that this
requires a compiler for each architecture; for 5.8.2 that is seven in
all: X86/32, 32-bit interpreted, 64-bit interpreted, X86/64 and
X86/64/32, with different versions of the last two for Windows and Unix
because of the different ABIs. Adding ARM64 would have increased this
further.
The solution was to bootstrap from the interpreted version. This
requires only two pre-built compilers, for 32-bit and 64-bit. However,
building a final native-code compiler requires the whole system to be
compiled several times. This doesn't take long on reasonable hardware
but can be slow on under-powered machines or with debugging turned on.
The final binary is as fast as before; it's just the bootstrap that is slow.
A consequence of this is that it is no longer necessary to run "make
compiler" when building from Git. Previously the compiler itself was
not rebuilt with a simple "make" so changes that involved the compiler
itself needed "make compiler" in order to be incorporated. That is no
longer the case since the bootstrap process recompiles the compiler.
"make compiler" has been retained for compatibility but it may actually
be better not to use it, particularly if --enable-intinf-as-int has been
included. --enable-intinf-as-int builds the basis library and any
subsequent code with int as arbitrary precision. Running "make
compiler" would build the compiler itself with arbitrary precision
rather than fixed precision, which makes it bulkier even if it doesn't
noticeably affect the speed.
Position-independent code
The new version now generates position-independent executables on X86/64
and ARM64. This has been in the pipeline for a while but was spurred
on by the fact that Mac OS requires it for ARM code. What this means is
that the code segments in object files created by PolyML.export no
longer contain absolute addresses. The "constant area" associated with
the code for each function is pulled out and placed in a read-only,
non-executable area. It's too complicated to do this on X86/32, so
32-bit programs will continue to need special treatment on platforms
that have problems with non-PIC code, but for the majority of code on
64-bit platforms there should no longer be problems in this area. It
doesn't apply to Windows,
which doesn't need this, or to compact 32-bit or interpreted code where
the "code" is not marked as executable.
ARM64
Last but not least there is now a basic code-generator for the ARM64.
This has been tested on a wide range of hardware and systems including
Windows 10, Debian under Windows-subsystem-for-Linux, Mac OS X, PiOS on
various 64-bit Raspberry Pis and even big-endian NetBSD on a Raspberry
Pi. It is complete including compiled FFI and compact 32-bit. However,
at this stage the code-generator has no optimisation or proper register
allocation and treats the machine as essentially single-register plus
stack. That greatly simplifies the code-generation at the expense of
bulky and slow code. On a Mac Mini, X86/64 code translated by Rosetta is
still roughly 1.5 to 2 times faster than the native ARM64 code. Rosetta
code is reported to be
about 80% of the speed of optimised ARM code so things should improve
with optimisation.
There is one point about the ARM code-generator that is worth making.
The ARM has a weaker memory model than the X86 and that can affect
multi-threaded code with shared references. Code that uses mutexes to
protect all accesses to shared references is not affected, since the
mutex access includes the appropriate memory barriers, but during testing
I came across a problem with futures in Isabelle. Since a future can
never change once it has been evaluated, it is generally possible to
access it without a lock and only use a lock if it appears to be
unevaluated.
Unlike the X86, the ARM does not guarantee that another thread will see
updates to different addresses in the same order as they are made by the
thread making the assignments. In this particular case a thread was
creating a value on the heap and then assigning the address of this heap
cell to a shared reference. On the X86 the update to the heap would
always be seen before the update to the shared reference but on the ARM
it was possible for another thread to cache that part of the heap and so
read values in the heap that were completely random. Since the
consequence of this is completely unpredictable behaviour I took the
decision to implement '!' and ':=' using instructions that incorporate
memory barriers giving read-acquire/store-release semantics. Currently
this does not apply to other mutable structures such as arrays but for
consistency it probably should. There doesn't seem to be any measurable
difference in speed by using these instructions compared with the ones
without the memory barriers although the code is slightly longer.
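To make the ordering problem concrete, here is a minimal sketch in C11
of the publication pattern described above. It is an illustration only
and is not taken from the Poly/ML sources: cell_t, shared_ref, producer,
consumer and run_publish_demo are all hypothetical names, and the
atomics merely stand in for what ':=' and '!' now do. One thread fills
in a heap cell and then publishes its address with a store-release; the
other reads the reference with a load-acquire, so once it sees the
address it is also guaranteed to see the initialised contents.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ML heap cell holding a future's result. */
typedef struct { int value; } cell_t;

/* Stand-in for the shared ML reference; NULL means "not yet evaluated". */
static _Atomic(cell_t *) shared_ref = NULL;

/* Writer: initialise the heap cell, then publish its address.
   The store-release plays the role of ':=' with a barrier: the write
   to cell->value cannot become visible after the pointer does. */
static void *producer(void *arg) {
    (void)arg;
    cell_t *cell = malloc(sizeof *cell);
    assert(cell != NULL);
    cell->value = 42;
    atomic_store_explicit(&shared_ref, cell, memory_order_release);
    return NULL;
}

/* Reader: poll the reference without taking a lock.
   The load-acquire plays the role of '!' with a barrier: having seen
   the pointer, this thread also sees the cell's initialised contents. */
static void *consumer(void *arg) {
    cell_t *cell;
    while ((cell = atomic_load_explicit(&shared_ref,
                                        memory_order_acquire)) == NULL) {
        /* busy-wait until the value has been published */
    }
    *(int *)arg = cell->value;
    return NULL;
}

/* Runs one writer and one reader; returns the value the reader saw. */
int run_publish_demo(void) {
    pthread_t prod, cons;
    int seen = -1;
    int rc;

    rc = pthread_create(&cons, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&prod, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);

    /* Reset the shared state so the demo can be run again. */
    free(atomic_load(&shared_ref));
    atomic_store(&shared_ref, NULL);
    return seen;
}
```

Build with -pthread. If both accesses were changed to
memory_order_relaxed the program would still be valid C, but on a
weakly ordered machine such as ARM64 the reader would no longer be
guaranteed to see the initialised value, which is the situation
described above. On ARM64 compilers typically implement the acquire
load and release store with the LDAR and STLR instructions.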
David