On Thu, 13 May 2010, David Matthews wrote:
Makarius wrote:
On Linux on the same hardware class the result is like that:
ML> elapsed_time (fn () => join (fork "true")) (); 0.102
These 100ms are somehow intrinsic to the way the Poly/ML runtime system invokes external processes. David can explain that better. (I wonder if the built-in delay loop is still required these days.)
The reason has to do with the limitations of the "waitpid" system call. The "select" system call allows a thread to block until any of a set of file descriptors becomes ready or until a time-out expires, whichever comes first. There is no equivalent when waiting for a process: the only choices are to block indefinitely or to pass the WNOHANG option, which checks the current status and returns immediately. We don't want a thread to block indefinitely, because if it receives an interrupt through Thread.Thread.interrupt() it needs to be woken up. That means the only option is polling. The current code blocks for 100ms and then checks again. I guess an alternative would be to use an extra internal (blocking) thread and a pipe, and have the ML thread use "select" on the pipe. I could do that if this is a significant problem. Another possibility would be for your code to install a signal handler for SIGCHLD.
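To make the trade-off concrete, here is a minimal sketch of the polling scheme described above. This is not the actual processes.cpp code; the names waitForChild and checkForInterrupt are made up for illustration. The point is only that waitpid with WNOHANG never blocks, so the loop needs an explicit bounded delay, which is where the 100ms comes from.

// Hypothetical sketch of the polling approach, not the real Poly/ML code.
#include <sys/types.h>
#include <sys/wait.h>
#include <poll.h>
#include <cerrno>

// Wait for 'child' to terminate without ever blocking indefinitely,
// so that the thread stays responsive to Thread.Thread.interrupt.
// checkForInterrupt() stands in for the runtime's interrupt test.
bool waitForChild(pid_t child, bool (*checkForInterrupt)())
{
    for (;;)
    {
        int status;
        pid_t res = waitpid(child, &status, WNOHANG); // returns immediately
        if (res == child) return true;                // child has exited
        if (res < 0 && errno != EINTR) return false;  // genuine error
        if (checkForInterrupt()) return false;        // interrupted: give up
        poll(NULL, 0, 100); // bounded delay: 100ms in the current code
    }
}

The alternative mentioned above would hand the blocking waitpid() call to a helper thread that writes a byte to a pipe when the child exits; the ML thread could then include that pipe in its normal "select" set and the delay loop would disappear.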
This sounds like a considerable complication to address such a minor issue. What I have done locally is to modify processes.cpp as follows:
Index: polyml/libpolyml/processes.cpp
===================================================================
--- polyml/libpolyml/processes.cpp    (revision 1165)
+++ polyml/libpolyml/processes.cpp    (working copy)
@@ -994,8 +994,8 @@
 void Waiter::Wait(unsigned maxMillisecs)
 {
     // Since this is used only when we can't monitor the source directly
-    // we set this to 100ms so that we're not waiting too long.
-    if (maxMillisecs > 100) maxMillisecs = 100;
+    // we set this to 10ms so that we're not waiting too long.
+    if (maxMillisecs > 10) maxMillisecs = 10;
 #ifdef WINDOWS_PC
     /* We seem to need to reset the queue before calling
        MsgWaitForMultipleObjects otherwise it frequently returns
With the following result:
ML> elapsed_time (fn () => join (fork "true")) (); 0.012
This is perfectly OK for our purposes -- we distribute Isabelle with our own compilation of Poly/ML anyway (it also includes a fast libsha1.so now, so people will need to use our distribution in any case).
Is 10ms adequate for the Wait above? Can anything bad happen here? I would like to avoid the debianistic mistake of "improving" existing systems without any clue about the consequences.
Makarius