On Thu, 13 May 2010, David Matthews wrote:
Makarius wrote:
On Linux on the same hardware class the result is like that:
ML> elapsed_time (fn () => join (fork "true")) (); 0.102
These 100ms are somehow intrinsic to the way the Poly/ML runtime system invokes external processes. David can explain that better. (I wonder if the built-in delay loop is still required these days.)
The reason has to do with the limitations of the "waitpid" system call. The "select" system call allows a thread to block until any of a set of file descriptors becomes ready or until a time-out expires, whichever comes first. There is no equivalent when waiting for a process: the only choices are to block indefinitely or to pass the WNOHANG option, which checks the current status and returns immediately. We don't want a thread to block indefinitely, because if it receives an interrupt through Thread.Thread.interrupt() it needs to be woken up. That means the only option is polling. The current code blocks for 100ms and then checks again. I guess an alternative would be to use an extra internal (blocking) thread and a pipe, and have the ML thread use "select" on the pipe. I could do that if this is a significant problem. Another possibility would be for your code to install a signal handler for SIGCHLD.
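To make the trade-off concrete, here is a minimal sketch of the polling scheme described above. This is not the actual processes.cpp code; the names waitForChild and checkForInterrupt are made up for illustration. The point is only that waitpid with WNOHANG never blocks, so the loop needs an explicit bounded delay, which is where the 100ms comes from.

// Hypothetical sketch of the polling approach, not the real Poly/ML code.
#include <sys/types.h>
#include <sys/wait.h>
#include <poll.h>
#include <cerrno>

// Wait for 'child' to terminate without ever blocking indefinitely,
// so that the thread stays responsive to Thread.Thread.interrupt.
// checkForInterrupt() stands in for the runtime's interrupt test.
bool waitForChild(pid_t child, bool (*checkForInterrupt)())
{
    for (;;)
    {
        int status;
        pid_t res = waitpid(child, &status, WNOHANG); // returns immediately
        if (res == child) return true;                // child has exited
        if (res < 0 && errno != EINTR) return false;  // genuine error
        if (checkForInterrupt()) return false;        // interrupted: give up
        poll(NULL, 0, 100); // bounded delay: 100ms in the current code
    }
}

The alternative mentioned above would hand the blocking waitpid() call to a helper thread that writes a byte to a pipe when the child exits; the ML thread could then include that pipe in its normal "select" set and the delay loop would disappear.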
This sounds like a considerable complication to address such a minor issue. What I have done locally is to modify processes.cpp as follows:
Index: polyml/libpolyml/processes.cpp
===================================================================
--- polyml/libpolyml/processes.cpp    (revision 1165)
+++ polyml/libpolyml/processes.cpp    (working copy)
@@ -994,8 +994,8 @@
 void Waiter::Wait(unsigned maxMillisecs)
 {
     // Since this is used only when we can't monitor the source directly
-    // we set this to 100ms so that we're not waiting too long.
-    if (maxMillisecs > 100) maxMillisecs = 100;
+    // we set this to 10ms so that we're not waiting too long.
+    if (maxMillisecs > 10) maxMillisecs = 10;
 #ifdef WINDOWS_PC
     /* We seem to need to reset the queue before calling
        MsgWaitForMultipleObjects otherwise it frequently returns
With the following result:
ML> elapsed_time (fn () => join (fork "true")) (); 0.012
This is perfectly OK for our purposes -- we distribute Isabelle with our own compilation of Poly/ML anyway (it also includes a fast libsha1.so now, so people will need to use our distribution in any case).
Is 10ms adequate for the Wait above? Can anything bad happen here? I would like to avoid the debianistic mistake of "improving" existing systems without any clue about the consequences.
Makarius