On 15/05/2021 14:10, Florian Weimer wrote:
> * David Matthews:
>
>> ARM64
>> There doesn't seem to be any measurable difference in speed by using these instructions compared with the ones without the memory barriers although the code is slightly longer.
>
> Did you benchmark this on the M1 only, or on other AArch64 implementations as well? This result is very surprising.

Actually, I tried it on the Microsoft SQ2. However, this was a test with ML, not with an imperative language that uses assignment and dereferencing extensively. The point was to see whether the cost of using instructions with memory barriers would outweigh the problems of having random failures in code. Memory barriers are only required for references; stores and loads of immutable data in the heap don't require them.
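
To make the distinction concrete, here is a minimal sketch in C11 rather than in the actual code generator. It assumes that "instructions with memory barriers" means load-acquire and store-release, which AArch64 compilers typically emit as LDAR and STLR. The type and function names (immutable_cell, ref_cell, read_immutable, ref_assign, ref_deref) are hypothetical and chosen only for illustration.

```c
#include <stdatomic.h>

/* Hypothetical heap cell layouts, for illustration only. */
typedef struct { long value; } immutable_cell;      /* never modified after creation */
typedef struct { _Atomic long contents; } ref_cell; /* mutable, like an ML ref */

/* Immutable data: an ordinary load with no barrier. */
static long read_immutable(const immutable_cell *c)
{
    return c->value;
}

/* Assignment to a reference: a store with release ordering
   (assumed to correspond to STLR on AArch64). */
static void ref_assign(ref_cell *r, long v)
{
    atomic_store_explicit(&r->contents, v, memory_order_release);
}

/* Dereference of a reference: a load with acquire ordering
   (assumed to correspond to LDAR on AArch64). */
static long ref_deref(ref_cell *r)
{
    return atomic_load_explicit(&r->contents, memory_order_acquire);
}
```

In a program that mostly builds and reads immutable values, nearly every heap access goes through the equivalent of read_immutable, so the ordered variants are executed comparatively rarely. That would be consistent with seeing no measurable difference in an ML test, though it is only a plausible reading and not something measured here.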
David