Dear all, I am experiencing inefficiency while using the Futures library within PolyML. Parallel threads are being forced to 'sleep' for several seconds at a time for no clear reason, and I suspect it may be due to memory management between multiple threads. The test machine has 80GB RAM and 24 available cores. The parallel process works in roughly the same way as the Bag of Tasks Model with Replacements: (I'll refer to the distributor of tasks as the Controller, and other threads as Workers) 1. The Controller constructs an initial list of tasks.2. It sends the tasks to Workers in parallel, distributing across cores appropriately. Tasks are spawned by Future calls. They return a mixture of new tasks and results.3. It synchronises on all of the spawned tasks, organises them and repeats from (2) if any tasks remain. Tasks require a modest amount of memory and significant time to run, but there are 100,000s of them. The memory consumed can rise by 10GB in a few seconds but does not exceed the available RAM, and is held almost entirely by the Controller. The process can take minutes to run, but I see that spawned Futures in the program are required to sleep for the majority (~75%) of the time. The turnover in memory is very high as well, especially in the Controller thread, so calls to the garbage collector can reduce the RAM required dramatically. Initially, as the original problem size grows the speedup via parallelisation rises to a peak of around 6 times, requiring only a few GBs of RAM. However, as the problem size continues to rise and the problem should be easier to parallelise, the speedup falls to the point where a sequential algorithm evaluating tasks one at a time is more efficient. Artificial tests show that when virtually no memory is required, a speedup of over 10 times is quite easy to achieve. What I believe might be causing this is the allocation of segment sizes to threads in the heap. I believe that calls to the common memory pool to change the segment size are causing all threads to sleep unnecessarily. Does each thread currently have its own independent adaptive segment size, or are all threads required to use the same segment size? Do all threads have to pause if one segment size is being changed? I have timed the synchronisation phase above and it requires 10-100s of microseconds typically, so not accounting for the regular sleep times of several seconds. I know the threads are being paused mid-task computation, and I have tried increasing the heap size as well to no effect. Does anyone know what might be causing this behaviour?Best regards,Michael