Fork Safety

Mike Ash Friday Q&A, Chinese translation: Fork Safety

Author: TommyWu

Translation · Original: Friday Q&A 2012-01-20: Fork Safety · Author: Mike Ash

Original: https://www.mikeash.com/pyblog/friday-qa-2012-01-20-fork-safety.html · Published: 2012-01-20 · Author: Mike Ash · Translator: MiMo (mimo-v2.5-pro); code blocks are left in the original English


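It's time once again to explore some bizarre programming arcana. In today's article, I want to dig into the details of fork-safe code, why these restrictions exist, and why you might need to care about them, a topic suggested by Ben Mitchell.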

Fork/Exec

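The fork call is the standard UNIX way to create a new process. It is an unusual call because it returns twice for each call: once in the original process, and once in the new one.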

If you aren't already familiar with the concept, it can be hard to wrap your head around at first. A small example may help:

```c
pid_t processID = fork();
// if successful, fork returns twice here
// in the parent, it returns with processID set to the pid of the child
// in the child, it returns with processID set to 0
// check for errors first
if(processID == -1)
{
    // handle the error here
}
else if(processID == 0)
{
    // this code only runs in the child process
}
else
{
    // this code only runs in the parent process, and processID
    // contains the child's process identifier
}
```

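Unlike most operating systems, UNIX separates creating a new process from starting a new program. The fork call creates a new process that runs the same program as before. The exec family of calls is its companion: it runs a new program without creating a new process. To start a new program in a new process, the standard UNIX approach is to call fork and then exec.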

For example, code that starts a shell in a child process and passes it a command looks roughly like this:

```c
pid_t processID = fork();
if(processID == 0)
{
    execl("/bin/sh", "/bin/sh", "-c", "some shell command here", (char *)NULL);
    // if successful, exec never returns, because the new program
    // begins executing, replacing the old one
    // any code which runs at this point is therefore due to failure
    char errstr[] = "Error calling execl, exiting the child.\n";
    // sizeof includes the trailing NUL, so leave it out of the write
    write(STDERR_FILENO, errstr, sizeof(errstr) - 1);
    _exit(1);
}
// parent is the only one that could be running here
// proceed from here with the new process running
```

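This technique is falling out of favor as newer, more efficient APIs such as posix_spawn come along. Still, it remains a common way to do things.

Safe Code in the Child Process

You may have noticed that the error-handling code in the example above is a little odd. Normally, error handling in a case like this would call fprintf and then exit.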

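The environment in the child process is harsh, and writing correct code to run there is difficult. Anything beyond a direct call to an exec function has to be done with great care. That is why the error-handling code is written the way it is.

To understand why, it is best to think about how the kernel implements fork and what that implies for the child process.

At the most basic, conceptual level, fork simply makes a copy of the calling process. The kernel copies everything associated with the process, such as open file descriptors, memory, and execution state, into a new process. (Memory is usually handled with a copy-on-write scheme for efficiency.) The copy is tweaked slightly so that fork returns a different value, and then the new process starts running.

In the early days, that was all there was to it. The new process is identical to the old one, and the two can go their separate ways.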

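Then multithreading came along and messed everything up. In the early days of UNIX, a process's execution state contained only a single thread, which was naturally the one calling fork. Once a process has multiple threads, though, what should happen to them when the process is copied?

If all threads are copied, the child process is in serious trouble as soon as it starts. Imagine, for example, that one thread in the parent calls fork while another thread is writing data to a file. After the fork completes, the parent continues writing to the file. Meanwhile, the child starts up and also continues writing to the file, producing a pile of corrupted data. Not good!

If all threads were copied, every thread would need to be aware of any call to fork and prepare itself accordingly. That would destroy any hope of modularity in any program that calls fork.

The solution is to copy only the thread that called fork; all other threads are left behind in the parent. (In practice, the other threads' stack memory is still copied, in case something elsewhere refers to it, but their execution state is not.) From the child's point of view, it looks as if all other threads were killed by the call to fork.

This is better, but still bad. Killing threads is violent and dangerous. You can often get away with it, but if a thread was in the middle of something important, it will never finish (from the child's point of view). If that thread held a lock, the lock will never be released.

That last point is important. Locks are used all over the place to make code safe to call from multiple threads. For example, malloc uses locks. So, occasionally, does objc_msgSend, which is called every time you write a [] message send expression in Objective-C.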

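Imagine that another thread was in the middle of a call to malloc, holding a lock, when you called fork. Afterwards, in the child process, you call malloc (or call something that calls something that calls malloc), and it tries to take the same lock. It sees that the lock is already held and waits. It will wait forever, because the thread that would have released it is now gone.

Thus, you can only safely call code that is guaranteed not to suffer from this problem. As it happens, the allowed APIs are the same ones that can be called from a signal handler. See the sigaction man page for the full list.

You'll notice that this list is really small. fprintf is not on it, and neither is exit. However, write is allowed, as is _exit. That is why the error-handling code above is written the way it is. (If you're wondering, _exit is a shortcut way to exit the process that skips a lot of cleanup.)

Working Around the Limits

You can make very few calls in the child process after a fork. Yet you often have setup you want to do before calling exec to start a new executable. How can you reconcile these two opposing forces?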

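In general, the answer is to do as much as possible before calling fork. This may mean doing some cleanup in the parent afterwards, which is annoying but ultimately necessary for correct code. For example, you might write this incorrect code to get the exec path from an NSString: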

```objc
pid_t processID = fork();
if(processID == 0)
{
    execl([path fileSystemRepresentation], ...);
}
```

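This is incorrect because it runs unsafe code in the child process. The call to -fileSystemRepresentation invokes objc_msgSend, probably allocates memory, and may make any number of other unsafe calls. The fix is easy, though: just fetch the path beforehand: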

```objc
const char *pathCStr = [path fileSystemRepresentation];
pid_t processID = fork();
if(processID == 0)
{
    execl(pathCStr, ...);
}
```

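No cleanup code is needed in the parent, because pathCStr is effectively autoreleased. There is a slight performance cost, since pathCStr has to be deallocated in the parent, but it is negligible and a small price to pay for correctness.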

As another example, you may have a list of file descriptors that need to be closed in the child. Here is an incorrect example that fetches those descriptors from an NSArray:

```objc
pid_t processID = fork();
if(processID == 0)
{
    for(NSNumber *fdObj in fdArray)
        close([fdObj intValue]);
}
```

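Enumerating the array and calling intValue are both unsafe. However, this code can't simply be moved before the fork, because we don't want to close these file descriptors in the parent, only in the child. The answer is to convert the array into a data structure that can be safely accessed in the child, such as a C array: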

```objc
NSUInteger fdArrayCount = [fdArray count];
int *fdArrayC = malloc(fdArrayCount * sizeof(*fdArrayC));
int *fdArrayCursor = fdArrayC;
for(NSNumber *fdObj in fdArray)
    *fdArrayCursor++ = [fdObj intValue];
pid_t processID = fork();
if(processID == 0)
{
    for(NSUInteger i = 0; i < fdArrayCount; i++)
        close(fdArrayC[i]);
    ...
}
free(fdArrayC);
```

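One place where this gets trickier is customizing the child's environment. The most natural approach would be to call setenv to set the customized environment variables. However, there is no good place to call it: you can't call it after the fork, because it is not a safe API, and you can't call it before the fork, because the environment is shared state and another thread may overwrite your change before you reach the fork.

Fortunately, you can work around this by skipping setenv altogether and instead setting the environment at the same moment you start the new executable, by calling execve. This call takes a path, an array of arguments, and an array of environment variables. Of course, you must make sure to allocate and fill in these arrays before calling fork, but that can be done without too much trouble.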

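Writing a Safe API

If for some reason you want to create an API that needs to work on the child side of a fork, you need to exercise the same caution as when writing child-side code directly. Your use of APIs must be extremely limited. As a result, this is usually not practical.

However, if you really want to do it, you may find that you need to do some extra preparation around the fork call. It is impractical to require every call to fork to include your API's preparation code, but fortunately the system provides a built-in facility for this. You can register callbacks with pthread_atfork: one callback runs before the fork, one runs in the parent after the fork, and a third runs in the child after the fork. You could use this to make the fork wait until locks are released and data structures are consistent, or simply to put the child side into a special mode that avoids unsafe operations.

Overall, though, you generally want to fork and then immediately exec, so designing an API to be used safely in between is mostly pointless.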

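Conclusion

There is little call to use fork these days (and you can't use it at all on iOS), but it does occasionally pop up. Unsafe code after a fork is extremely common, so beware. Even if you never write a fork call yourself, the history and constraints behind it offer an interesting perspective on how the system as a whole is put together.

That's it for today. Come back next time for another exciting dive into weird programming stuff. In case this is your first time reading: Friday Q&A is driven by reader ideas, so if you have a topic you'd like to see covered, send it in!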


# Original (English)

Source: https://www.mikeash.com/pyblog/friday-qa-2012-01-20-fork-safety.html

It’s once again time to dive into bizarre programming arcana. In today’s article, I want to look at the details of fork-safe code, why the restrictions are present, and why you might care, a topic suggested by Ben Mitchell.

Fork/Exec

The fork call is the standard UNIX way to create a new process. It's an unusual call in that it returns twice for each call: once in the original process, and once in the new one.

If you aren’t already familiar with the concept, it can be tough to wrap your head around it. A little example may help:

```c
pid_t processID = fork();
// if successful, fork returns twice here
// in the parent, it returns with processID set to the pid of the child
// in the child, it returns with processID set to 0
// check for errors first
if(processID == -1)
{
    // handle the error here
}
else if(processID == 0)
{
    // this code only runs in the child process
}
else
{
    // this code only runs in the parent process, and processID
    // contains the child's process identifier
}
```

Unlike most OSes, UNIX separates the concept of creating a new process from starting a new program. The fork call creates a new process which runs the same program as before. The exec family of calls is the companion which runs a new program, without creating a new process. To start a new program in a new process, the standard way to do it in UNIX is to call fork, then exec.

For example, the code to start a shell in a subprocess and pass it a command would look something like this:

```c
pid_t processID = fork();
if(processID == 0)
{
    execl("/bin/sh", "/bin/sh", "-c", "some shell command here", (char *)NULL);
    // if successful, exec never returns, because the new program
    // begins executing, replacing the old one
    // any code which runs at this point is therefore due to failure
    char errstr[] = "Error calling execl, exiting the child.\n";
    // sizeof includes the trailing NUL, so leave it out of the write
    write(STDERR_FILENO, errstr, sizeof(errstr) - 1);
    _exit(1);
}
// parent is the only one that could be running here
// proceed from here with the new process running
```

This technique is losing favor these days, as newer, more efficient APIs like posix_spawn come along. Still, this is a common way to do things.
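
For comparison, here is a minimal sketch of the same shell launch done with posix_spawn, assuming a POSIX system where `/bin/sh` exists. The helper name `run_shell_command` is hypothetical. Because posix_spawn creates the process and starts the new program in a single call, none of our own code runs in the child, so the child-side restrictions discussed below never come into play.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;   // the current process's environment

// Hypothetical helper: runs `command` with /bin/sh -c and returns the
// shell's exit status, or -1 if it could not be started or did not exit
// normally.
int run_shell_command(const char *command) {
    char *argv[] = {"/bin/sh", "-c", (char *)command, NULL};
    pid_t pid;
    int status;

    // posix_spawn creates the child and starts /bin/sh in one call, so no
    // code of ours runs in the child between process creation and exec.
    if (posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, `run_shell_command("exit 3")` returns 3, the shell's exit status.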

Safe Code in the Child Process

You’ll notice that the error handling code in the example above is a little weird. Normal error handling code in a case like that would call fprintf and then exit.

The environment in the child process is harsh and makes it difficult to write correct code to run there. Anything you do beyond a direct call to an exec function has to be done with great care. That’s why the error handling code is written the way it is.

To understand why this is, it’s best to think of how fork would be implemented in the kernel and what implications that has for the child process.

At the most basic, conceptual level, fork simply makes a duplicate of the calling process. The kernel just takes all of the stuff associated with the process, like open file descriptors, memory, and execution state, and replicates it into a new process. (Memory is typically done with a copy-on-write scheme for efficiency.) The copy is modified slightly to give a different return value from fork, and then the new process is started.
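
To make the duplication concrete, here is a small sketch (the names `counter` and `compare_child_and_parent` are hypothetical) in which the child changes a global variable and reports the new value through its exit status, while the parent's copy stays unchanged. Whether the kernel copies the page eagerly or through copy-on-write is invisible at this level; either way, each process ends up with its own copy.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical global: ordinary process memory that fork duplicates.
static int counter = 1;

// Forks a child that changes its copy of `counter` and reports the new
// value through its exit status. Returns the child's value (or -1 on
// error) and stores the parent's value, read after the child exits, in
// *parent_value.
int compare_child_and_parent(int *parent_value) {
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        counter = 42;      // changes only the child's copy
        _exit(counter);    // exit status carries the child's value (0-255)
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    *parent_value = counter;   // still 1: the parent's copy was never touched
    return WEXITSTATUS(status);
}
```

The function returns 42 (the child's view) and sets `*parent_value` to 1 (the parent's view).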

Back in the early days, that was it. The new process is the same as the old, and they can go their separate ways.

Then multithreading came along and messed everything up. In the early days of UNIX, a process’s execution state contained only a single thread, which was, of course, the one calling fork. Once a process has multiple threads, though, what do you do with them when you copy the process?

If you copy all of the threads, then you’re in serious trouble when the child process starts up. Imagine, for example, that one thread in the parent calls fork while another one is writing some data to a file. After the fork completes, the parent continues writing to the file. Meanwhile, the child starts up and it also continues writing to the file, resulting in a bunch of corrupted data. Not good!

If all threads were copied, then all threads would need to be aware of any calls to fork and prepare themselves accordingly. This would defeat any hope of modularity in any program that called fork.

The solution is to only copy the thread that called fork. All others get left behind in the parent. (In practice, the other threads’ stacks would be copied, in case there are any references to them from elsewhere, but not their execution state.) From the point of view of the child, it looks like all other threads were killed by the call to fork.

This is better, but still bad. Killing threads is violent and dangerous. You can often get away with it, but if the thread was in the middle of something important, it will never finish (from the point of view of the child). If the thread held a lock, that lock will never be unlocked.

That last part is important. Locks are used all over the place to make code safe when called from multiple threads. For example, malloc uses locks. So, occasionally, does objc_msgSend, which is called every time you write a [] message send expression in Objective-C.

Imagine that another thread was in the middle of a call to malloc, with a lock held, when you call fork. Afterwards, in the child process, you call malloc (or call something which calls something which calls malloc) and it tries to take the same lock. It will see that the lock is already held, and wait. It will wait forever, since the thread that was going to release it is now dead.
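
The malloc scenario above can be reproduced with an ordinary mutex. This is an illustrative sketch, not production code, and every name in it is hypothetical: a helper thread takes a lock and keeps holding it, the main thread forks, and the child checks the lock with `pthread_mutex_trylock` instead of `pthread_mutex_lock`, so that it reports the problem rather than hanging forever. trylock is itself not on the async-signal-safe list, so it appears in the child purely for demonstration. On older glibc versions the program may need to be built with `-pthread`.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical names throughout; demo_lock stands in for malloc's lock.
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int held_pipe[2];     // holder thread -> main thread: "lock is held"
static int release_pipe[2];  // main closes the write end to say "unlock now"

// Plays the role of "another thread in the middle of malloc".
static void *hold_lock(void *unused) {
    char byte = 0;
    (void)unused;
    pthread_mutex_lock(&demo_lock);
    if (write(held_pipe[1], &byte, 1) == 1) {
        // Block until the main thread closes release_pipe[1] (read returns 0).
        while (read(release_pipe[0], &byte, 1) > 0) {
        }
    }
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

// Returns 1 if the forked child finds demo_lock held with no thread left to
// release it, 0 if the child can take the lock, or -1 on error.
int child_sees_orphaned_lock(void) {
    pthread_t holder;
    char byte;
    int status;
    int result = -1;

    if (pipe(held_pipe) != 0)
        return -1;
    if (pipe(release_pipe) != 0 ||
        pthread_create(&holder, NULL, hold_lock, NULL) != 0)
        return -1;   // descriptor cleanup omitted in this error path
    if (read(held_pipe[0], &byte, 1) == 1) {   // the lock is now held
        pid_t pid = fork();
        if (pid == 0) {
            // Only the forking thread was copied, so nothing in the child
            // will ever unlock demo_lock. pthread_mutex_lock would block
            // forever; trylock reports EBUSY instead (demo only).
            _exit(pthread_mutex_trylock(&demo_lock) == EBUSY ? 1 : 0);
        }
        if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            result = WEXITSTATUS(status);
    }
    close(release_pipe[1]);   // lets the holder thread unlock and finish
    pthread_join(holder, NULL);
    close(release_pipe[0]);
    close(held_pipe[0]);
    close(held_pipe[1]);
    return result;
}
```

A result of 1 means the child saw a lock that no remaining thread will ever release, which is exactly the situation that would make a child-side malloc hang.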

Thus, you can only safely call code that’s guaranteed not to suffer from this problem. As it happens, the allowed APIs are the same as those that can be called from a signal handler. See the sigaction man page for the full list.

You’ll notice that this list is really small. fprintf is not on it, and neither is exit. However, write is allowed, as is _exit. Therefore you can see why I wrote the error handling code the way I did. (If you’re wondering, _exit is a sort of shortcut way to exit your process which skips a lot of cleanup.)

Working Around the Limits

You can make very few calls in the child process after a fork. Yet, you often have setup that you want to do before calling exec to start a new executable. How can you reconcile these two opposing forces?

In general, the answer is to do as much as possible before the call to fork. This may entail doing some cleanup in the parent afterwards, which is annoying but ultimately necessary for correct code. For example, you might write this incorrect code to get the exec path from an NSString:

```objc
pid_t processID = fork();
if(processID == 0)
{
    execl([path fileSystemRepresentation], ...);
}
```

This is incorrect due to running unsafe code in the child process. The call to -fileSystemRepresentation invokes objc_msgSend, probably allocates memory, and may make any number of other unsafe calls. The fix here is easy, though. Just fetch the path beforehand:

```objc
const char *pathCStr = [path fileSystemRepresentation];
pid_t processID = fork();
if(processID == 0)
{
    execl(pathCStr, ...);
}
```

We don’t need any cleanup code in the parent, since pathCStr is effectively autoreleased. There’s a slight performance penalty here, since pathCStr has to be deallocated in the parent, but it’s negligible and a small price to pay for correctness.

As another example, you may have a list of file descriptors that need to be closed in the child. Here’s an incorrect example of fetching those descriptors from an NSArray:

```objc
pid_t processID = fork();
if(processID == 0)
{
    for(NSNumber *fdObj in fdArray)
        close([fdObj intValue]);
}
```

Enumerating over the array and calling intValue are both unsafe. However, this code can’t simply be moved earlier, since we don’t want to close these file descriptors in the parent, only the child. The answer here is to convert the array into a data structure we can safely access in the child, like a C array:

```objc
NSUInteger fdArrayCount = [fdArray count];
int *fdArrayC = malloc(fdArrayCount * sizeof(*fdArrayC));
int *fdArrayCursor = fdArrayC;
for(NSNumber *fdObj in fdArray)
    *fdArrayCursor++ = [fdObj intValue];
pid_t processID = fork();
if(processID == 0)
{
    for(NSUInteger i = 0; i < fdArrayCount; i++)
        close(fdArrayC[i]);
    ...
}
free(fdArrayC);
```

One place where this gets trickier is when customizing the child’s environment. The most natural way to do this would be to call setenv to set the customized environment variables. However, there’s no good place to call that function. You can’t do it after the fork, as it’s not a safe API. You can’t do it before the fork, as the environment is shared state and another thread may overwrite your change before you get to the fork.

Fortunately, this can be worked around by skipping setenv altogether, and instead setting the environment simultaneously with starting the new executable by calling execve. This call takes a path, an array of arguments, and an array of environment variables. Of course, you must be sure to allocate and fill out these arrays before calling fork, but that can be done without too much trouble.
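
A minimal sketch of that approach in C, assuming `/bin/sh` exists: the argument and environment arrays are complete before fork, so the child does nothing but call execve and, if that fails, write and _exit. The helper name `run_with_environment` is hypothetical, and the caller supplies the environment as a NULL-terminated array of `NAME=value` strings.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs `/bin/sh -c command` with exactly the variables
// in `envp` (a NULL-terminated array of "NAME=value" strings), without ever
// calling setenv. Returns the shell's exit status, or -1 on failure.
int run_with_environment(const char *command, char *const envp[]) {
    // Everything the child needs is prepared here, before fork.
    char *const argv[] = {"/bin/sh", "-c", (char *)command, NULL};
    static const char failure[] = "execve failed\n";

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        // Child: only async-signal-safe calls from here on.
        execve("/bin/sh", argv, envp);
        // Reached only if execve failed.
        write(STDERR_FILENO, failure, sizeof(failure) - 1);
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, with `char *env[] = {"DEMO_CODE=5", NULL};`, calling `run_with_environment("exit \"$DEMO_CODE\"", env)` returns 5, because the shell sees only that one variable. The parent's own environment is never modified, so other threads are unaffected.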

Writing a Safe API

If for some reason you want to create an API that needs to work on the child side of a fork, you need to exercise the same caution as when writing child-side code directly. Your use of APIs must be extremely limited. As such, it’s usually not practical to do this.

However, if you really want to do it, you may find yourself needing to do some extra work around the fork call to prepare. It’s impractical to require every call to fork to also contain your API’s preparation code, but fortunately there’s a built-in facility for this. You can register callbacks using pthread_atfork. You register one callback to run before the fork, then one to run afterwards in the parent, and a third to run afterwards in the child. You could use this to force the fork to wait for locks to be released and data structures to be consistent, or you could simply use it to put the child side into a special mode which avoids unsafe operations.

Overall, though, you generally want to fork and then immediately exec, so designing an API to be safely used in between is mostly pointless.

Conclusion

There’s little call to use fork these days (and you can’t use it at all on iOS), but it does occasionally pop up. Unsafe code after a fork is extremely common, so beware. Even if you never write a fork call yourself, the history and constraints behind it offer an interesting perspective on how the system as a whole is put together.

That’s it for today. Come back next time for another exciting dive into weird programming stuff. In case this is your first time reading, it just so happens that Friday Q&A is driven by reader ideas, so if you have an idea for a topic that you’d like to see covered, send it in!