Lisp – wow, that’s a weird feeling.

October 25, 2008

I’ve been holed up in my office lately, teaching myself Common Lisp for an upcoming part of my Ph.D., and I have to tell ya, it’s a weird feeling to write code in Lisp.  I read through Peter Seibel’s great online book Practical Common Lisp, which is an absolutely brilliant resource for first-time Lispers, especially those coming from other languages.  (I will admit to a bit of irony, though:  I downloaded the book and wrote a Python script to HTMLize the footnotes into links so that I could jump back and forth without having to continually figure out where the hell I had been when I went to read a note.)

In any case, I just finished my first “real” program in Lisp – real in that it was done entirely on my own and required more than 5 lines of code, though the problem I “solved” was an imaginary one.  It’s one of those throwaway efforts to acquaint yourself with a new way of thinking.  I went into it thinking that Lisp was overblown, one of those things that people looked back on fondly with rose-colored glasses while muttering about how “real programmers used to do it”, but by the time I was finished writing the thing, I found that I was actually enjoying myself.  I can’t even explain why, but by the end of writing the program, I was actively looking for new ways to extend the idea so that I could keep writing code!  And now, I find myself looking forward to my next program in Lisp, a far cry from where I started.

(If you’re wondering what I was writing, it’s kind of embarrassing, but here it is:  I wrote a program to quantify the cost of typing words on a keyboard from the perspective of a single-finger, one-handed typist.  The “cost” is distance – i.e. how far the finger would have to travel – defined by adjacent keys, so that ‘T’ and ‘E’ are two units apart (‘T’ -> ‘R’ -> ‘E’).  I did this using cl-graph to map the keys onto a graph, and wrote an implementation of Dijkstra’s algorithm to calculate the distances between key pairs.  I could then calculate the distance between each adjacent pair of letters in a word and sum those distances to get the total cost, which I averaged over the length of the word so that long words weren’t unduly penalized.  After that, it was a snap to write a couple of functions to do things like take in a word list and write out each word with its associated cost, or get the total cost of a string.  See?  Told you it was trivial. 🙂 )
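(For the curious, the original was Common Lisp built on cl-graph, but the idea translates easily. Here’s a rough Python sketch of the same approach – my own toy reconstruction, not the original code – with a made-up adjacency map covering just a corner of a QWERTY board:)

import heapq

# Each key lists its physical neighbours; every edge costs one unit.
# (A tiny sample of the top-left of a QWERTY board, not a full layout.)
ADJACENT = {
    'q': ['w', 'a'], 'w': ['q', 'e', 's'], 'e': ['w', 'r', 'd'],
    'r': ['e', 't', 'f'], 't': ['r', 'g'],
    'a': ['q', 's'], 's': ['w', 'a', 'd'], 'd': ['e', 's', 'f'],
    'f': ['r', 'd', 'g'], 'g': ['t', 'f'],
}

def key_distance(start, goal):
    # Dijkstra's algorithm over the key graph (unit edge weights).
    heap = [(0, start)]
    seen = set()
    while heap:
        cost, key = heapq.heappop(heap)
        if key == goal:
            return cost
        if key in seen:
            continue
        seen.add(key)
        for neighbour in ADJACENT.get(key, []):
            heapq.heappush(heap, (cost + 1, neighbour))
    return None

def word_cost(word):
    # Total finger travel for a word, averaged over its length.
    total = sum(key_distance(a, b) for a, b in zip(word, word[1:]))
    return total / float(len(word))

print word_cost('test')   # 2.0 with this sample map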


New parallel programming tool in Python 2.6

October 3, 2008

For one reason or another, I’m getting a lot of long-term hits to a post that I wrote a few months ago on parallel programming with Python.  Thus, I thought I might share the news with people who have been looking for solutions to this problem:  Python 2.6, which was just released, contains a new multiprocessing library which at first glance seems to formalize the procedure I outlined in my previous post and make it more robust.  From the description:

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
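To give a flavour of that threading-style API, here’s a minimal sketch (my own toy example, not something from the docs):

from multiprocessing import Process

def work(name):
    # Each call runs in its own process, with its own interpreter and GIL.
    print 'hello from', name

if __name__ == '__main__':
    procs = [Process(target=work, args=('worker-%d' % i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()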

If you’re interested in that, go grab 2.6 and check out the documentation for the package here!


Python and parallel programming: a nice step forward.

June 7, 2008

Via jessenoller.com, I note that after I posted my little discussion on parallel programming in Python, a PEP (Python Enhancement Proposal) was accepted for the addition of the “multiprocessing” package to the Python standard library. I have to say, given the amount of parallel code that I’m going to be fiddling with over the next little while, I’m definitely excited about this one!


Python and parallel programming.

May 31, 2008

Just fork(), they say.

For those of you not in the know, the Python community has had a long-running debate over what’s known as the Global Interpreter Lock. I say “debate”, though from what I can see it usually – aside from the odd serious debate – amounts to newbies popping up and asking “why can’t I multi-thread properly on multi-core systems?”, and the veterans telling them the way things are. The argument usually goes: multi-threading is how I’m supposed to take advantage of multiple cores. Not around here, replies the Python community; around here, we use tried and tested methods like forking child processes.

So, I’ve been reading about this for a while, but yesterday was the first time that I actually ran across it in my own work. I’ve been writing a genetic algorithm for some modeling that I’m involved in, and with the recent construction of frankenputer, I’ve finally run into the need to parallelize my code to get some speedups. I wrote this GA in Python because a) I’ve come to loathe C++, and b) it’s just so damned easy to code in. Thus, I needed to parallelize a GA written in Python. Thankfully, genetic algorithms are embarrassingly parallel, so I didn’t need much in the way of inter-process communication, but this was where I ran into a few bumps. To the veterans, my troubles are going to seem incredibly banal, but for other newbies, I thought I might document my steps.

First: why not threads? If you don’t know, the Global Interpreter Lock (which applies to the C implementation of Python for sure; I’m not sure whether other implementations have a GIL) essentially means that within a single instance of the Python interpreter, only one thread is in the driving seat at a time, so running multiple threads does nothing to boost speed even on multi-core systems. Thanks to the GIL, the only way to utilize multiple processors is to spawn multiple interpreters, because each interpreter has its own GIL and the processes will run separately.
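You can see this for yourself with a quick experiment (timings are machine-dependent, of course): two CPU-bound threads take roughly as long as doing the same work serially. A throwaway sketch:

import threading
import time

def burn():
    # Pure CPU work; no I/O, so the threads spend their lives
    # contending for the GIL rather than running in parallel.
    count = 0
    for _ in xrange(10 ** 7):
        count += 1

start = time.time()
threads = [threading.Thread(target=burn) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print 'two threads took %.2f seconds' % (time.time() - start)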

At least on Unix-like systems (e.g. Linux, OS X, etc.), one of the recommended ways to handle this is to create child processes with fork(); the parent process acts as a bookkeeper, while the child processes do the work.

The general form of it is this:

import os

def child():
    # Actions to be completed in the child process go here.
    print 'child: pid %d' % os.getpid()
    os._exit(0)  # end the child here so it never falls through to parent code

def parent():
    # Actions to be completed in the parent process go here.
    print 'parent: pid %d, child: pid %d' % (os.getpid(), newpid)

# In the main section of your code:
newpid = os.fork()

if newpid == 0:
    child()
else:
    parent()

Why does this work? It takes a bit of a mental adjustment to understand. (Note: because fork() is a system call on Unix and Unix-like OSes, this basic paradigm will work in any programming language that provides an interface to it, of which Python is only one. It just happens to be the one that I’m using.)

Just before the os.fork() call, there is one process in memory. After the os.fork() call, there are two processes in memory: one is the parent, one is the child, and the child is an exact copy of the parent. When I say exact copy, I mean it. From the point of view of the programmer, each process has all the same variables, with the same values, and the point of execution starts from immediately after the os.fork() call in both processes.

How does each process know which one they’re in? The newpid variable is the key. os.fork() will return one of two values: to the parent process, it will return the process id (PID) of the child process, while to the child process it will return 0. That’s it! Easy, eh?

Usually, yes. There are some subtleties to account for, certainly. For instance, remembering that each process has the same state (variables + values) going into life after the fork() call, you have to make sure not to do things like have both processes try to work with the same file at the same time (though that’s not the same as intentionally using an object to communicate, like a pipe). But for my purposes, this trick was almost complete.

Almost, anyways. The situation was complicated by the fact that I needed to spawn more than one process – in fact, what I wanted to do was spawn a total of eight to max out the processors. Getting the list of child processes up to eight wasn’t that hard, since it just meant keeping track of the child process ids and counting them to know whether I’d hit eight yet. But what happens when the children are finished?
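(Before getting to that, the spawning side looked roughly like this – a sketch with illustrative names; do_work() is a stand-in for the real GA workload, not an actual function from my code:)

import os

NUM_WORKERS = 8
children = []   # pids of live children, maintained by the parent

def spawn_worker():
    pid = os.fork()
    if pid == 0:
        do_work()       # stand-in for the real child workload
        os._exit(0)     # make sure the child never returns to this loop
    return pid

while len(children) < NUM_WORKERS:
    children.append(spawn_worker())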

Well, the child processes die, but how does the parent know? Herein lay the rub, and I ended up meandering down the wrong path for a while. There are a few ways to keep track of child processes. Two of the main ones are signals and os.waitpid(), and I started with signals.

In the Unix world, when a child process dies it sends a signal, SIGCHLD. In Python, you can install a handler for that signal – essentially a function which is called whenever the parent process receives SIGCHLD. I thought that this sounded like a great way to keep all eight cores humming: just spawn eight child processes, then prune the list of child pids by dropping them when SIGCHLD is caught. However, this turned out to be problematic, because the signal itself wreaked havoc with my child processes in ways I don’t even fully understand; the biggest problem was that SIGCHLD aborted the subprocess calls that my other children were making. Googling around, I discovered that the use of SIGCHLD is generally discouraged (though the reasons given are usually vague), so I switched to the other method.
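For reference, the handler approach looked something like this (a sketch of what I tried, with illustrative bookkeeping, not my exact code):

import os
import signal

children = []   # pids of live children, maintained by the parent

def reap_child(signum, frame):
    # Called in the parent whenever a child exits.
    pid, status = os.waitpid(-1, os.WNOHANG)
    if pid in children:
        children.remove(pid)

signal.signal(signal.SIGCHLD, reap_child)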

The other method is os.waitpid(). This call allows the parent to determine the status of child processes, and it takes two arguments. The first is the process id of the child to poll, and the second is an options flag. I’ll leave it to the Python docs to explain all the subtleties. The important point is that by passing a pid of -1 to waitpid(), the function will report on the status of any of the child processes that the parent has spawned, which allowed me to poll the children while I did other processing with a call to waitpid() that looked like this:

(pid, status) = os.waitpid(-1, os.WNOHANG)
if pid:
    # One of the children has died; let's get another going.
    children.remove(pid)
    children.append(spawn_worker())   # spawn_worker() as sketched above

waitpid() returns two numbers: a process id and a status code (see the documentation for wait() in the os module). If none of the children have finished, the WNOHANG option makes waitpid() return (0, 0) instead of blocking; otherwise, the first element of the tuple is the process id of a child that has finished.

This turned out to be a much better mechanism for tracking the children that my parent process had spawned, and meant that the rest of the effort involved in getting my GA to run in parallel was minimal. So the main point of this particular screed, besides gratifying myself for the time I spent debugging, was to hopefully prevent someone else from making the same mistake that I did. So, there you go!

UPDATE: I was just re-reading this post, and I realized that it may have seemed like I was being a little hard on the GIL. From what I’ve read, it does seem to have some serious advantages to go with the potential downsides, and I’m fairly convinced that it was a good idea overall. I also didn’t mention that there are packages that will help you parallelize your code: for simpler jobs, you could try something like Parallel Python (which, from what I understand, does essentially what I did above with more finesse and the ability to do some inter-process communication), or you could scale up to the big guns and use a Python binding to MPI – the Message Passing Interface – like mpi4py or pypar. I ended up writing my own because my problem was actually too simple for both, and the extra functionality just got in the way.
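(If you’re curious what the MPI route looks like, here’s a minimal mpi4py sketch – just the canonical hello-world, nothing from my GA – launched with something like mpirun -n 8 python hello.py:)

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
print 'process %d of %d reporting in' % (rank, size)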


My programming literacy…

May 2, 2008

[“Name That Code” quiz badge, created by OnePlusYou]

I’ve actually written non-trivial amounts of code in about 4 of those…