Just fork(), they say.
For those of you not in the know, the Python has had a long-running debate over what’s known as the Global Interpreter Lock. I say “debate”, though from what I can see it usually -aside from the odd serious debate – amounts to newbies popping up and saying “why can’t I multi-thread properly on multi-core systems”, and the veterans telling them the way things are. The argument usually goes: multi-threading is how I’m supposed to take advantage of multi-cores. Not around here, replies the Python community; around here, we use the tried and tested methods like forking child processes.
So, I’ve been reading about this for a while, but yesterday was the first time that I’ve actually run across it in my own work. I’ve been writing a genetic algorithm for some modeling that I’m involved in, and with the recent construction of frankenputer, I’ve finally run into the need to parallelize my code to get some speed ups. I wrote this GA in python because a). I’ve come to loathe C++, and b). it’s just so damned easy to code in. Thus, I needed to parallelize a GA written in Python. Thankfully, genetic algorithms are embarrassingly parallel, so I didn’t need too much in the way of inter-process communication, but here was where I ran into a few bumps. For the veterans, my troubles are going to seem incredibly banal, but for other newbies, I thought that I might document my steps.
First: why not threads? If you don’t know, the Global Interpreter Lock (which applies to the C implementation of Python for sure; I’m not sure if other implementations have a GIL or not) is essentially a limit which means that for every instance of the Python interpreter, only one thread is in the driving seat at a time, so running multiple threads at once does nothing to boost speed even on multi-core systems. Thanks to the GIL, the only way to utilize multiple processors is to spawn multiple interpreters, because each interpreter has its own GIL and the processes will run separately.
At least on Unix-like systems (e.g. Linux, OS X, etc), one of the recommended ways to handle this is to create child processes with fork(); the parent processes acts as a bookkeeper, while the child processes do the work.
The general form of it is this:
def child(): # Actions to be completed in the child process go here. def parent(): # Actions to be completed in the parent process go here. .... # In the main section of your code: newpid = os.fork() if newpid == 0: child() else: parent()
Why does this work? It takes a bit of a mental adjustment to understand. (Note: because fork() is a Unix and look-alike OS system command, this basic paradigm will work for any programming langauge that provides an interface to the command, of which Python is only one. It just happens to be the one that I’m using).
Just before the os.fork() call, there is one process in memory. After the os.fork() call, there are two processes in memory: one is the parent, one is the child, and the child is an exact copy of the parent. When I say exact copy, I mean it. From the point of view of the programmer, each process has all the same variables, with the same values, and the point of execution starts from immediately after the os.fork() call in both processes.
How does each process know which one they’re in? The
newpid variable is the key.
os.fork() will return one of two values: to the parent process, it will return the process id (PID) of the child process, while to the child process it will return 0. That’s it! Easy, eh?
Usually, yes. There are some subtleties to account for, certainly. For instance, remembering that each process has the same state (variables + values) going into life after the
fork() process, you have to make sure not to do things like have both processes try to work with the same file at the same time (though that’s not the same as intentionally using an object to communicate, like with a pipe). But for my purposes, this trick was almost complete.
Almost, anyways. The situation was complicated by the fact that I needed to spawn more than one process – in fact, what I wanted to do was spawn a total of eight to max out the processors. Getting the list of child processes up to eight wasn’t that hard, since it just meant keeping track of the child process ids and counting them to know whether I’d hit eight yet. But what happens when the children are finished?
Well, the child processes die, but how does the parent know? Herein lied the rub, and I ended up meandering down the wrong path for a while. There’s a few ways to keep track of child processes. Two of the main ones are signals and
os.waitpid(), and I started with signals.
In the Unix world, when child processes die they send a signal, SIGCHLD. In Python, you can install a handler for that signal, essentially a function which is called when the parent process receives SIGCHLD. I thought that this sounded like a great way to keep all eight cores humming: just spawn eight child processes, then prune the list of child pids by dropping them when SIGCHLD is caught. However, this turns out to be problematic, because the signal itself wreaked havoc with my child processes in ways I don’t even fully understand; the biggest problem was that SIGCHLD aborted the subprocess calls that my other children were making. Googling around, I discovered that the use of SIGCHLD is generally discouraged (though the reasons for this are usually vague), so I switched to the other method.
The other method is
os.waitpid(). This call allows the parent to determine the status of child processes, and it takes two arguments. The first is the process id of the child to poll, and the second is the option. I’ve leave it to the python docs to explain all the subtleties. The important point is that by passing a pid of -1 to
waitpid(), the function will report on the status of all the child processes that the parent has spawned, which allowed me to poll the children while I did other processing with a call to
waitpid() that looked like this:
(pid,status) = os.waitpid(0,os.WNOHANG) if pid: # One of the children has died, let's get another going. ....
waitpid() returns two numbers: a process id, and a status code (see the documentation for
wait() in the os module). If none of the children have finished, the return from
waitpid() is (0,0); otherwise, the first element of the tuple is the process id of the child that was finished.
This turned out to be a much better mechanism for tracking the children that my parent process had spawned, and meant that the rest of the effort involved in getting my GA to run in parallel was minimal. So the main point of this particular screed, besides gratifying myself for the time I spent debugging, was to hopefully prevent someone else from making the same mistake that I did. So, there you go!
UPDATE: I was just re-reading this post, and I realized that I may have seemed like I was being a little hard on the GIL. From what I’ve read, it does seem like it has some serious advantages to go with the potential downside, and I’m fairly convinced that it was a good idea overall. I also didn’t mention that there are packages that will help you parallelize your code: for simpler jobs, you could try things like Parallel Python (which, from what I understand, does essentially what I did above with more finesse and the ability to do some inter-process communication), or you could scale up to the big guns and use a python binding to MPI – the Message Passing Interface – like mpi4py or pypar. I ended up writing my own because my problem was actually too simple for both methods, and the extra functionality just got in the way.