5.3 variability in Posix.Process.exec

Michael.Norrish＠nicta.com.au

9 May 2010

12:19 p.m.

On my Intel MacBook Pro running Mac OS X 10.5.8, I get

SysErr ("Operation not supported", SOME ENOTSUP)

On a Linux machine (Linux michaeln-desktop 2.6.31-21-generic #59-Ubuntu SMP Wed Mar 24 07:28:56 UTC 2010 i686 GNU/Linux), I get:

--
Poly/ML 5.3 Release
...
Posix.Process.exec ("/bin/ls", []);
Warning-The type of (it) contains a free type variable. Setting it to a unique monotype.
Segmentation fault
--

On a slightly different Linux machine (Linux atp-login1 2.6.26-2-686 #1 SMP Wed Aug 19 06:06:52 UTC 2009 i686 GNU/Linux), I get the expected behaviour...

Is there anything I should have done differently when building these Poly instances?

Michael

Show replies by date

David.Matthews＠prolingua.co.uk

10 May 2010

4:37 p.m.

New subject: [polyml] 5.3 variability in Posix.Process.exec

Michael Norrish wrote:

> ...
> Posix.Process.exec ("/bin/ls", []);
>
> Warning-The type of (it) contains a free type variable. Setting it to a unique monotype. Segmentation fault
>
> On a slightly different Linux atp-login1 2.6.26-2-686 #1 SMP Wed Aug 19 06:06:52 UTC 2009 i686 GNU/Linux
>
> I get the expected behaviour...

I tried this out under the debugger and found it was actually "ls" itself that was crashing. I suspect the difference in behaviour comes from differences in the implementation of "/bin/ls" on different platforms. When calling Posix.Process.exec you have to pass all the arguments, including the first item, which is conventionally the name of the program. So you should always be calling it as Posix.Process.exec("/bin/ls", ["ls"]);

I seem to recall that this changed in the 5.3 release: previous releases included the program name in the argument list automatically. However, the Standard Basis Library documentation clearly says that the new implementation is correct. I guess one of the reasons is that some programs actually do different things when invoked by different names.
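In C terms, Posix.Process.exec (path, args) behaves roughly like execv(2): in 5.3 the list is handed over unchanged as the new program's argv, so argv[0] has to be supplied by the caller. The sketch below assumes exactly that mapping and is not Poly/ML's runtime code; make_argv and run_argv are hypothetical helper names. Calling run_argv("/bin/ls", {"ls"}) mirrors Posix.Process.exec("/bin/ls", ["ls"]) run in a child, whereas an empty list gives the program argc == 0, which some programs do not cope with, as the ls crash above suggests.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Build the NULL-terminated argv array that execv(2) expects, passing the list
// through unchanged: args[0] becomes argv[0], the name the program sees for
// itself. The pointers refer into `args`, which must outlive the result.
std::vector<char*> make_argv(const std::vector<std::string>& args) {
    std::vector<char*> argv;
    for (const std::string& a : args)
        argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);  // execv requires a terminating null pointer
    return argv;
}

// Run `path` in a child process with exactly `args` as its argv (nothing is
// prepended) and return its exit status, or -1 if it did not exit normally.
int run_argv(const std::string& path, const std::vector<std::string>& args) {
    std::vector<char*> argv = make_argv(args);  // built before fork()
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                              // child
        execv(path.c_str(), argv.data());
        _exit(127);                              // reached only if execv failed
    }
    int status = 0;                              // parent
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```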

David

Michael.Norrish＠nicta.com.au

11 May 2010

1:22 a.m.

> ... When calling Posix.Process.exec you have to pass all the arguments including the first item which is conventionally the name of the program. So you should always be calling it as Posix.Process.exec("/bin/ls", ["ls"]);
>
> I seem to recall that this changed in the 5.3 release from previous releases which included the program name in the argument list automatically ...

This gives me the right behaviour on the Linux machine that had been dumping core, but on the Mac I still get SysErr ("Operation not supported", SOME ENOTSUP).

Michael

Michael.Norrish＠nicta.com.au

5:01 a.m.

On 11/05/10 00:37, David Matthews wrote:

> ... When calling Posix.Process.exec you have to pass all the arguments including the first item which is conventionally the name of the program. So you should always be calling it as Posix.Process.exec("/bin/ls", ["ls"]);

Incidentally, on the machine where Posix.Process.exec("/bin/ls", []) segfaults, Unix.execute("/bin/ls", []) works correctly. The Basis Library documentation I have for the Unix structure doesn't discuss how the argv component should be set up, but attempting

Unix.execute("/bin/ls", ["ls"])

prompts a message from ls saying that there is no file called ls.

This doesn't seem consistent.

Curiously, on the MacBook where Posix.Process.exec gives ENOTSUP, Unix.execute does work.

Michael

David.Matthews＠prolingua.co.uk

11:59 a.m.

Michael Norrish wrote:

> Incidentally, on the machine where Posix.Process.exec("/bin/ls", []) seg. faults, Unix.execute("/bin/ls", []) works correctly. The Basis library documentation I have for the Unix structure doesn't discuss how the argv component should be setup, but attempting
>
> Unix.execute("/bin/ls", ["ls"])
>
> prompts a message from ls saying that there is no file called ls.
>
> This doesn't seem consistent.
>
> Curiously, on the Macbook where Posix.Process.exec gives ENOTSUP, Unix.execute does work.

It turns out that on Mac OS X execv returns ENOTSUP if the process is multi-threaded. You need to use Posix.Process.fork first to start a new process:

    case Posix.Process.fork () of
        SOME _ => OS.Process.exit OS.Process.success
      | NONE => Posix.Process.exec ("/bin/ls", ["ls"]);

This isn't documented in Apple's man page for execv and I found the explanation at http://factor-language.blogspot.com/2007/07/execve-returning-enotsup-on-mac-...

Unix.execute is implemented in terms of fork and exec, with the exec called in the child process, roughly as in the code above.

The arguments for Posix.Process.exec and Unix.execute are different. The Basis Library book (Gansner and Reppy 2004) says for Unix.execute "execute(cmd, args) asks the operating system to execute the program named by the string cmd with the argument list args." For Posix.Process.exec it says "The args argument is a list of string arguments to be passed to the new program. By convention, the first item in args is some form of the filename of the new program, usually the last arc in the path or filename." Like any informal specification this is subject to interpretation, but I think the current behaviour conforms to this definition.
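The two conventions quoted above can be contrasted in a few lines of C++. This is only a model: it assumes that Unix.execute supplies argv[0] itself and that it uses the last arc of cmd, which the thread does not confirm (Poly/ML might pass the full path instead). All function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Last arc of a path, e.g. "/bin/ls" -> "ls".
std::string last_arc(const std::string& path) {
    std::string::size_type slash = path.rfind('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

// exec-style: the caller provides the complete argument vector, argv[0]
// included, and it is passed on unchanged.
std::vector<std::string> exec_style_argv(const std::vector<std::string>& args) {
    return args;
}

// execute-style: the caller provides only the real arguments; a program name
// is prepended automatically (assumed here to be the last arc of cmd).
std::vector<std::string> execute_style_argv(const std::string& cmd,
                                            const std::vector<std::string>& args) {
    std::vector<std::string> argv{last_arc(cmd)};
    argv.insert(argv.end(), args.begin(), args.end());
    return argv;
}
```

Under this model an execute-style call given {"ls"} produces the argv {"ls", "ls"}, so ls is asked to list a file called ls, which matches the message reported above, while an exec-style call with an empty list produces an empty argv.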

David

Michael.Norrish＠nicta.com.au

1:05 p.m.

On 11/05/10 19:59, David Matthews wrote:

> It turns out that on Mac OS X execv returns ENOTSUP if the process is multi-threaded. You need to use Posix.Process.fork first to start a new process: case Posix.Process.fork() of SOME _ => OS.Process.exit OS.Process.success | NONE => Posix.Process.exec("/bin/ls", ["ls"]);

Hmm. Thanks for going to the trouble of figuring this out. Is it clear what exit code the invoker will get in this situation? It looks to me as if it will get success, regardless of what happens in the forked child. If that's what happens, my feeling is that it's a bug (in Apple's design), and I'll have to have the parent hang around waiting for the child to finish. Of course, this is exactly what I was hoping to avoid by using exec in the first place. And it is important to get the right error code because it's all part of a build system that needs to know when to stop and when to continue.

(I just did a very simple-minded test which seemed to confirm that the exec-ed process's code is indeed lost if you use the idiom above. Boo hiss.)

Michael

David.Matthews＠prolingua.co.uk

4:35 p.m.

Michael Norrish wrote:

> Is it clear what exit code the invoker will get in this situation? It looks to me as if it will get success, regardless of what happens in the forked child. ...
>
> (I just did a very simple-minded test which seemed to confirm that the exec-ed process's code is indeed lost if you use the idiom above. Boo hiss.)

In the above code the invoker will return immediately with "success". It ought to be possible to modify the code to use Posix.Process.waitpid to wait for the child process, possibly something along these lines (waitpid returns a pair of the child's pid and its exit status):

    case Posix.Process.fork () of
        SOME pid =>
          (case Posix.Process.waitpid (Posix.Process.W_CHILD pid, []) of
               (_, Posix.Process.W_EXITED) => OS.Process.exit OS.Process.success
             | (_, Posix.Process.W_EXITSTATUS n) => Posix.Process.exit n
             | (_, _) => OS.Process.exit OS.Process.failure (* ??? signalled or stopped: pick a policy *))
      | NONE => Posix.Process.exec ("/bin/ls", ["ls"]);

However you do this, you probably want to avoid using Posix.Process.fork on Cygwin (if that's relevant) since it's very expensive, so it might be a good idea to try Posix.Process.exec first and only use "fork" if you get ENOTSUP. It would certainly be worth looking at Makarius' code.
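The same idea at the level of the C system calls: fork, exec in the child, and have the parent wait and pass the child's exit status on instead of reporting success unconditionally. This is a minimal sketch assuming a POSIX system with /bin/sh; fork_exec_wait and exit_code_of are hypothetical names, and reporting a signal death as 128 plus the signal number is borrowed from shell convention rather than from the Basis library.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Turn a waitpid status into one exit code that a launcher can pass on.
// A child killed by a signal is reported as 128 + signal number (an assumed
// convention, taken from shells).
int exit_code_of(int status) {
    if (WIFEXITED(status)) return WEXITSTATUS(status);
    if (WIFSIGNALED(status)) return 128 + WTERMSIG(status);
    return 1;  // stopped children are not reported without WUNTRACED
}

// Fork, exec `path` with `args` (argv[0] included) in the child, and wait for
// it. Returns -1 if fork or waitpid fails and 127 if the exec failed.
int fork_exec_wait(const std::string& path, const std::vector<std::string>& args) {
    std::vector<char*> argv;
    for (const std::string& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                       // child: becomes the new program
        execv(path.c_str(), argv.data());
        _exit(127);                       // only reached if execv failed
    }
    int status = 0;                       // parent: wait for the child
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;    // retry only after signal interruptions
    }
    return exit_code_of(status);
}
```

A launcher that wants exec-like semantics can then finish with _exit(fork_exec_wait(...)), which is the C counterpart of passing the status on with Posix.Process.exit.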

This does look like something non-standard in Apple's code. The detailed POSIX page for exec simply says, "A call to any exec function from a process with more than one thread shall result in all threads being terminated and the new executable image being loaded and executed. No destructor functions or cleanup handlers shall be called."
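One way to check which behaviour a given system has is to exec from a process that really has a second thread. The C++ sketch below is a small test written for illustration, not code from this thread, and exec_from_threaded_child is a hypothetical name. On Linux it should report 0 (the exec succeeded); on the Mac OS X release discussed here the exec would reportedly fail with ENOTSUP, giving 100.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <thread>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork a child, start a second thread in it, then call execv from the child's
// main thread while that thread is still running. Under the POSIX wording the
// exec succeeds and the extra thread vanishes with the old process image.
// Child exit status: 0 if /bin/sh ran, 100 if execv failed with ENOTSUP,
// 101 for any other exec failure. Returns -1 on fork/wait problems.
// Assumes the calling process is single-threaded: creating threads in a child
// forked from a multi-threaded parent is not safe.
int exec_from_threaded_child() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        std::thread sleeper([] { std::this_thread::sleep_for(std::chrono::seconds(10)); });
        sleeper.detach();                 // still sleeping when execv is called
        char a0[] = "sh", a1[] = "-c", a2[] = "exit 0";
        char* argv[] = {a0, a1, a2, nullptr};
        execv("/bin/sh", argv);
        _exit(errno == ENOTSUP ? 100 : 101);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```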

David

Michael.Norrish＠nicta.com.au

12 May 2010

1:28 a.m.

> In the above code the invoker will return immediately with "success". It ought to be possible to modify the code to use Posix.Process.waitpid to wait for the child process. ...
>
> However you do this you probably want to avoid using Posix.Process.fork on Cygwin (if that's relevant) since it's very expensive so it might be a good idea to try using Posix.Process.exec first and only use "fork" if you get ENOTSUP. It would certainly be worth looking at Makarius' code.

Well, it seems as if the problem with exec is just on Mac OS X, and I'm happy to fork there, even if it's annoying to have to do so. Cygwin is not really relevant, but I suppose that, out of idle curiosity, it would be nice to know if Posix.Process.exec worked there. My application really only wants to exec, not fork.

> This does look like something non-standard in Apple's code. The detailed page for Posix "exec" simply says, "A call to any exec function from a process with more than one thread shall result in all threads being terminated and the new executable image being loaded and executed. No destructor functions or cleanup handlers shall be called."

A spec that we know is implementable because Linux manages it...

Michael

David.Matthews＠prolingua.co.uk

2:55 p.m.

Michael Norrish wrote:

> ... Cygwin is not really relevant, but I suppose that, out of idle curiosity, it would be nice to know if Posix.Process.exec worked there. My application really only wants to exec, not fork.

It works fine on Cygwin. I did a further experiment on Mac OS X to see whether exec would work if it was invoked from the "main" thread (i.e. the thread that initially starts the poly executable). On Mac OS X that is the only thread that receives asynchronous signals, so it has some special status. Unfortunately that didn't work either, so it doesn't look as though there's any alternative to using "fork".
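The earlier suggestion in this thread (try exec first, fork only if it fails with ENOTSUP) can be sketched in C++ as follows. This is an illustration under assumptions, not Poly/ML or Isabelle code: exec_or_fork_fallback is a hypothetical name, and mapping a signal death to 128 plus the signal number is borrowed from shell convention.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Replace the current process with `path` when the OS allows it. If execv
// refuses with ENOTSUP (as reported for multi-threaded processes on Mac OS X),
// fall back to fork + exec and make this process exit with the child's status,
// so whoever started us still sees the right exit code. Returns only on
// failure, with the errno value of the failing call.
int exec_or_fork_fallback(const std::string& path, const std::vector<std::string>& args) {
    std::vector<char*> argv;
    for (const std::string& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    execv(path.c_str(), argv.data());     // cheap path: no fork needed
    if (errno != ENOTSUP) return errno;   // a real failure such as ENOENT

    pid_t pid = fork();                   // the child is single-threaded,
    if (pid < 0) return errno;            // so exec is expected to work there
    if (pid == 0) {
        execv(path.c_str(), argv.data());
        _exit(127);
    }
    int status = 0;
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR) _exit(1);
    // Without WUNTRACED the child either exited or was killed by a signal.
    _exit(WIFEXITED(status) ? WEXITSTATUS(status) : 128 + WTERMSIG(status));
}
```

On Linux and Cygwin the first execv normally succeeds, so the fork (expensive on Cygwin) only happens where it is actually needed.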

David

makarius＠sketis.net

9:20 p.m.

On Wed, 12 May 2010, Michael Norrish wrote:

> Cygwin is not really relevant, but I suppose that, out of idle curiosity, it would be nice to know if Posix.Process.exec worked there.

I have checked the (long) mail exchange with David from two years ago, when we sorted out many issues to make it work the way it has worked from Isabelle2008 until today.

Attached is an earlier experiment involving straightforward fork/exec with join/kill operations. Here is an example on Cygwin:

ML> elapsed_time (fn () => join (fork "true")) ();
0.719

That's almost a second penalty. The delay depends on various factors, including the size of the running ML process. The best I can get here is approx. 0.4 seconds.

On Linux, on the same class of hardware, the result is:

ML> elapsed_time (fn () => join (fork "true")) ();
0.102

These 100ms are somehow intrinsic to the way the Poly/ML runtime system invokes external processes. David can explain that better. (I wonder if the builtin delay loop is still required these days.)
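To separate the cost of the system calls themselves from the waiting strategy of a runtime, one can time the same round trip in C with a blocking waitpid. A minimal sketch, assuming /bin/sh is present; time_spawn_and_join is a hypothetical name, and the figures it returns depend entirely on the machine.

```cpp
#include <cassert>
#include <chrono>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Time one fork + exec + blocking waitpid round trip of `/bin/sh -c true`,
// a rough C-level counterpart of elapsed_time (fn () => join (fork "true")).
// Returns elapsed seconds, or a negative value if anything fails.
double time_spawn_and_join() {
    auto start = std::chrono::steady_clock::now();
    pid_t pid = fork();
    if (pid < 0) return -1.0;
    if (pid == 0) {
        char a0[] = "sh", a1[] = "-c", a2[] = "true";
        char* argv[] = {a0, a1, a2, nullptr};
        execv("/bin/sh", argv);
        _exit(127);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1.0;  // blocks, no polling
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1.0;
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Because waitpid blocks here, the result contains no polling delay; comparing it with the ML-level figures above gives a rough idea of how much of the 0.1 s comes from the runtime's wait loop.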

Makarius

David.Matthews＠prolingua.co.uk

13 May 2010

2:45 p.m.

Makarius wrote:

> ... Here is an example on Cygwin:
>
> ML> elapsed_time (fn () => join (fork "true")) ();
> 0.719
>
> That's almost a second penalty. The delay depends on various factors, including the size of the running ML process. The best I can get here is approx. 0.4 seconds.

The problem is that there's no equivalent to "fork" on Windows so Cygwin has to simulate it. I don't know how it does it but it's bound to be expensive. Ironically, the CreateProcess call in Windows does exactly what's required most of the time by providing a combined "fork" and "exec".

> On Linux on the same hardware class the result is like that:
>
> ML> elapsed_time (fn () => join (fork "true")) ();
> 0.102
>
> These 100ms are somehow intrinsic to the way the Poly/ML runtime system invokes external processes. David can explain that better. (I wonder if the builtin delay loop is still required these days.)

The reason has to do with the limitations of the "waitpid" system call. The "select" system call allows a thread to block until any of a set of file descriptors becomes available or until a time-out, whichever comes first. There's no equivalent when waiting for a process: the only choices are to block indefinitely, or to include the WNOHANG option bit, which polls the current status and returns immediately. We don't want a thread to block indefinitely because, if it receives an interrupt through Thread.Thread.interrupt(), it needs to be woken up. That means the only option is polling. The current code blocks for 100ms and then checks again. I guess an alternative would be to use an extra internal (blocking) thread and a pipe, and have the ML thread use "select" on the pipe. I could do that if this is a significant problem. Another possibility would be for your code to install a signal handler for SIGCHLD.
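The polling scheme described above can be modelled in a few lines of C++; the 100ms interval corresponds to the poll_ms parameter. This is a simplified sketch under stated assumptions, not the actual Waiter code in processes.cpp: an atomic flag stands in for Thread.Thread.interrupt, and spawn_sh and wait_polling are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Start `/bin/sh -c script` and return the child's pid (hypothetical helper).
pid_t spawn_sh(const char* script) {
    pid_t pid = fork();
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", script, static_cast<char*>(nullptr));
        _exit(127);
    }
    return pid;
}

// Wait for `pid` by polling: waitpid with WNOHANG returns at once, so between
// polls the thread sleeps for `poll_ms` milliseconds and then checks an
// interrupt flag. A longer interval wastes fewer CPU cycles; a shorter one
// notices the exit sooner.
// Returns the exit status, -2 if interrupted (the child is left running and
// unreaped), or -1 on error or abnormal termination.
int wait_polling(pid_t pid, int poll_ms, const std::atomic<bool>& interrupted) {
    for (;;) {
        int status = 0;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0) return -1;
        if (interrupted.load()) return -2;
        std::this_thread::sleep_for(std::chrono::milliseconds(poll_ms));
    }
}
```

With this structure the worst-case latency between the child exiting and the waiter noticing is about one poll interval, which is why lowering the cap from 100ms to 10ms shortens a short-lived child's round trip.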

David

makarius＠sketis.net

14 May 2010

11:10 a.m.

On Thu, 13 May 2010, David Matthews wrote:

> ... The current code blocks for 100ms then checks again. I guess an alternative would be to use an extra internal (blocking) thread and pipe and have the ML thread use "select" on the pipe. I could do that if this is a significant problem. Another possibility would be for your code to install a signal handler for SIGCHLD.

This sounds like a considerable complication to address such a minor issue. What I have done locally is to modify processes.cpp as follows:

Index: polyml/libpolyml/processes.cpp
===================================================================
--- polyml/libpolyml/processes.cpp	(revision 1165)
+++ polyml/libpolyml/processes.cpp	(working copy)
@@ -994,8 +994,8 @@
 void Waiter::Wait(unsigned maxMillisecs)
 {
     // Since this is used only when we can't monitor the source directly
-    // we set this to 100ms so that we're not waiting too long.
-    if (maxMillisecs > 100) maxMillisecs = 100;
+    // we set this to 10ms so that we're not waiting too long.
+    if (maxMillisecs > 10) maxMillisecs = 10;
 #ifdef WINDOWS_PC
     /* We seem to need to reset the queue before calling
        MsgWaitForMultipleObjects otherwise it frequently returns

With the following result:

ML> elapsed_time (fn () => join (fork "true")) ();
0.012

This is perfectly OK for our purposes -- we distribute Isabelle with our own compilation of Poly/ML anyway (it also includes a fast libsha1.so now, so people will really need to use our distribution anyway).

Are 10ms adequate for the Wait above? Can anything bad happen here? I would like to avoid this debianistic mistake of "improving" existing systems without any clue about the consequences.

Makarius

David.Matthews＠prolingua.co.uk

12:47 p.m.

Makarius wrote:

> On Thu, 13 May 2010, David Matthews wrote:
>
>> I guess an alternative would be to use an extra internal (blocking) thread and pipe and have the ML thread use "select" on the pipe.
>
> This sounds like a considerable complication to address such a minor issue.

Well, there's code in the Windows-specific part to do this for pipes in Windows, which suffer from a similar problem. It's probably more important in that situation because it affects each read from a pipe, not just the detection of when the process has completed. I don't think it would be hard to do the same for process termination in Unix.
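A rough sketch of the helper-thread-plus-pipe approach for process termination on Unix, written for illustration only: it is not the Windows pipe code mentioned above, ChildWatcher is a hypothetical name, and it assumes one watcher per child on a POSIX system. The helper thread is the only one that blocks in waitpid; the waiting thread uses select() with a timeout, so it can wake up to check for interrupts without polling the child.

```cpp
#include <cassert>
#include <thread>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// A helper thread blocks in waitpid and then writes the exit code to a pipe;
// the waiting thread selects on the read end with a timeout.
struct ChildWatcher {
    int fds[2] = {-1, -1};   // fds[0]: read end, fds[1]: write end
    std::thread helper;

    bool start(pid_t pid) {
        if (pipe(fds) != 0) return false;
        int wfd = fds[1];
        helper = std::thread([pid, wfd] {
            int status = 0;
            int code = -1;
            if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
                code = WEXITSTATUS(status);
            ssize_t n = write(wfd, &code, sizeof code);  // makes the read end readable
            (void)n;
        });
        return true;
    }

    // Returns true and sets `code` once the child has finished; returns false
    // if nothing arrived within `timeout_ms`, so the caller can check for an
    // interrupt and call wait_for again.
    bool wait_for(int timeout_ms, int& code) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(fds[0], &readable);
        timeval tv;
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        if (select(fds[0] + 1, &readable, nullptr, nullptr, &tv) <= 0) return false;
        return read(fds[0], &code, sizeof code) == static_cast<ssize_t>(sizeof code);
    }

    ~ChildWatcher() {
        if (helper.joinable()) helper.join();  // waits until the child has exited
        if (fds[0] >= 0) close(fds[0]);
        if (fds[1] >= 0) close(fds[1]);
    }
};
```

The cost is one extra thread per running child; in exchange, the waiter is woken as soon as the child exits rather than at the next poll.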

> ...
>
> Are 10ms adequate for the Wait above? Can anything bad happen here?

I can't see any reason why there should be a problem. Essentially, it's a matter of avoiding wasting too many CPU cycles with unnecessary polling. I could make the same change in Poly/ML SVN.

David

makarius＠sketis.net

11 May 2010

1:16 p.m.

On Tue, 11 May 2010, David Matthews wrote:

> It turns out that on Mac OS X execv returns ENOTSUP if the process is multi-threaded. You need to use Posix.Process.fork first to start a new process: case Posix.Process.fork() of SOME _ => OS.Process.exit OS.Process.success | NONE => Posix.Process.exec("/bin/ls", ["ls"]);
>
> This isn't documented in Apple's man page for execv and I found the explanation at http://factor-language.blogspot.com/2007/07/execve-returning-enotsup-on-mac-...
>
> ...
>
> The arguments for Posix.Process.exec and Unix.execute are different. ... Like any informal specification this is subject to interpretation but I think the current behaviour conforms to this definition.

Since these are once again the typical multi-platform issues that I know only too well, I would like to point out how it is done in Isabelle/ML: http://isabelle.in.tum.de/repos/isabelle/file/Isabelle2009-1/src/Pure/ML-Sys...

The code might look scary, but it works well on Linux, Mac OS and Cygwin, which are the three official platform families supported by Isabelle. Extra complexity is caused by our multithreading setup, because several ML threads could invoke external processes independently, and we expect internal Thread.interrupt propagation, just as for any other ML execution.

Multiplatform support essentially works by ignoring official "standards" and using things that are known to behave uniformly on all systems:

* OS.Process.system, which David adjusted at some point to prevent surprises with fork/exec on the Windows platform, where fork/exec would otherwise copy the whole process memory space.

* Perl plumbing to get nested process groups right, to allow signals sent to sub-shell scripts, pipes etc.

* GNU bash to escape from the unknown standard /bin/sh -- there are many different versions of that on different systems, and they hardly agree on essentials like signal propagation.

Thus Isabelle/ML users can do strange things reliably, e.g. on Cygwin invoke system "notepad.exe" from an ML thread with some ML timeout wrapping, and still get the expected behaviour:

ML> TimeLimit.timeLimit (Time.fromSeconds 3) system "notepad.exe";
Exception- TimeOut raised

Here is another example:

ML> val x = Future.fork (fn () => system "notepad.exe");
ML> Future.cancel x;
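Timeouts like the ones above rely on being able to stop a subprocess together with everything it started. A minimal C++ sketch of one ingredient, assuming a POSIX system: run the child in its own process group and, when the time limit expires, signal the whole group. This is not Isabelle's implementation (which, as described above, goes through bash and Perl), and run_with_time_limit is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run `/bin/sh -c script` in a new process group and allow it `limit_ms`
// milliseconds. On timeout, SIGKILL goes to the whole group (kill with a
// negative pid), so commands started by the shell are stopped as well.
// Returns the exit status, -2 if the time limit was hit, -1 on error.
int run_with_time_limit(const char* script, int limit_ms) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        setpgid(0, 0);                    // child becomes leader of a new group
        execl("/bin/sh", "sh", "-c", script, static_cast<char*>(nullptr));
        _exit(127);
    }
    setpgid(pid, pid);                    // also from the parent, avoiding a race
    auto deadline = std::chrono::steady_clock::now() + std::chrono::milliseconds(limit_ms);
    for (;;) {
        int status = 0;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0) return -1;
        if (std::chrono::steady_clock::now() >= deadline) {
            kill(-pid, SIGKILL);          // signal every process in the group
            waitpid(pid, &status, 0);     // reap the shell itself
            return -2;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}
```

Here -2 plays the role of the TimeOut exception in the transcript above; a real implementation would also need to react to interrupts coming from other threads.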

Makarius

polyml@lists.polyml.org

13 comments

3 participants

David.Matthews＠prolingua.co.uk
makarius＠sketis.net
Michael.Norrish＠nicta.com.au