On Thu, 18 Oct 2007, David Matthews wrote:
> Makarius wrote:
>>> It might be possible to chain together stacks to increase the maximum stack size, but there are a couple of problems that would need to be addressed. Maybe it's something to think about for version 5.2.
>>
>> Using reasonably large stacks and multiple threads already works quite well in the forthcoming 5.1 version (i.e. the current CVS of Poly/ML). My impression is that for present-day commodity hardware the total CPU utilization is as high as 75-85% (4 cores) or 60-70% (8 cores). So maybe you only need to worry after 1-2 more years :-)
>>
>> How would increased memory bandwidth work in the Poly/ML runtime system? Stacks are already local to each thread, and chaining them in smaller pieces might give a small improvement. On the other hand, I'd guess that there is more potential for tuning heap access, maybe by giving each thread a local view of the mutable area?
>
> I don't really understand what you're getting at.
I am just musing about some observations made when running many threads in parallel.
* Using the stack is probably just fine as it is now. Each thread has its own (reasonably large) segment. No extra losses for synchronization. (Is this correct?)
* Heap usage incurs extra penalties, due both to the synchronization required when multiple threads allocate on the shared heap and to the single-threaded GC that runs from time to time.
Here the question is whether some reorganization of the existing two-stage heap model (small mutable area, large immutable area) could help to reduce the synchronization overhead; the sketch below illustrates the kind of allocation pattern I have in mind.
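To make this concrete, here is a minimal sketch of such an allocation-heavy workload. It assumes the Thread.Thread.fork, Thread.Mutex, and Thread.ConditionVar interfaces of Poly/ML 5.1; allocWork and runThreads are just names made up for this example:

  (* Fork k threads that each build (and immediately discard) a list of
     n cells.  Every :: allocates a cell in the shared heap, so timing
     runThreads for k = 1, 2, 4, ... hints at the allocation/GC overhead. *)
  val lock = Thread.Mutex.mutex ();
  val cond = Thread.ConditionVar.conditionVar ();
  val finished = ref 0;

  fun allocWork n =
    let
      fun build 0 acc = acc
        | build i acc = build (i - 1) (i :: acc)  (* one heap cell per :: *)
      val _ = build n []  (* result dropped: garbage for the next GC *)
    in
      Thread.Mutex.lock lock;
      finished := !finished + 1;
      Thread.ConditionVar.signal cond;
      Thread.Mutex.unlock lock
    end;

  fun runThreads k n =
    (finished := 0;
     List.tabulate (k, fn _ => Thread.Thread.fork (fn () => allocWork n, []));
     Thread.Mutex.lock lock;
     while !finished < k do Thread.ConditionVar.wait (cond, lock);
     Thread.Mutex.unlock lock);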
Anyway, you probably know much better how things really work internally. I was just trying to understand where the lost CPU cycles with more than 8 parallel threads go. I've recently tried Poly/ML pre-5.1 on a 16-core machine, but failed to get more than 45% total CPU usage. (Since most people are still using 2-4 cores right now, there is probably no need for immediate action.)
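In case anyone wants to reproduce such figures from within ML, here is a rough sketch using the standard Timer structure. It reuses the hypothetical runThreads from above and assumes that the CPU timer accumulates over all threads of the process (as getrusage does on most Unix systems):

  (* Compare wall-clock time with accumulated CPU time; cpu / (wall * k)
     then approximates the per-core utilization for k threads. *)
  fun measure k n =
    let
      val rt = Timer.startRealTimer ()
      val ct = Timer.startCPUTimer ()
      val () = runThreads k n
      val wall = Time.toReal (Timer.checkRealTimer rt)
      val {usr, sys} = Timer.checkCPUTimer ct
      val cpu = Time.toReal usr + Time.toReal sys
    in
      print ("threads=" ^ Int.toString k ^
             ": wall=" ^ Real.toString wall ^
             "s, cpu=" ^ Real.toString cpu ^
             "s, utilization=" ^ Real.toString (cpu / (wall * Real.fromInt k)) ^ "\n")
    end;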
Makarius