Background
As an interpreted language, Python cannot compete directly with C or Fortran in execution speed. As I have written elsewhere, there are ways to make Python code run as fast as you need it to run. However, in scientific computing there is often no single processor that can execute a program in a reasonable amount of time. Computationally intensive problems require the use of parallel computers. A message-passing interface such as MPI is used to communicate between processes running on multiple CPUs. There is a Python interface to MPI called Pypar, which allows your Python programs to run in parallel on a system with an implementation of MPI installed. My experience is mainly with Open MPI, but there are several others. Below are some resources for learning about MPI and parallel programming in general.
Using MPI with Python works around one of Python’s more serious limitations: the Global Interpreter Lock (GIL). Basically, Python uses the GIL to prevent errors that may occur when two threads try to access the same data. What this means is that only one thread can execute Python code at a time, so a multithreaded Python program cannot spread its Python code across more than one CPU core. Note that this is not a fundamental limitation of the language, but of the standard C implementation of the interpreter (CPython). New interpreter implementations may be thread-safe, but for now, Python is stuck in a single thread. The GIL will become more of a problem as ordinary desktop CPUs gain speed by adding cores rather than increasing clock speed. Rather than exposing my ignorance of Python’s internals, here are some links to posts written by people who are more knowledgeable about this subject than I am.
- Python Threads and the Global Interpreter Lock
- A post by a critic of the GIL
- Guido van Rossum, creator of Python, defends the GIL
MPI works around the GIL by spawning multiple processes, each with its own instance of the Python interpreter and therefore its own GIL. While there is typically more overhead associated with creating a process than creating a thread, scientific applications are usually so processor-intensive that the time gained by running in parallel far outweighs the cost of starting the extra processes.
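To make the one-interpreter-per-process idea concrete, here is a minimal sketch that uses only the standard library. It is not MPI and it is not Pypar: it uses `subprocess` as a stand-in to start several separate Python interpreters, and it collects results through standard output rather than through message passing. The names `WORKER`, `parallel_sum_of_squares`, `rank`, and `size` are my own choices for illustration, borrowed loosely from MPI vocabulary. Like an MPI program, every worker runs the same code and uses its rank to pick its share of the work.

```python
import subprocess
import sys

# Worker program, run once per process. Every copy runs the same code and
# uses its rank to decide which slice of the work belongs to it.
WORKER = """
import os, sys
rank, size, n = (int(arg) for arg in sys.argv[1:4])
# This rank sums the squares of rank, rank + size, rank + 2*size, ...
partial = sum(i * i for i in range(rank, n, size))
print(os.getpid(), partial)
"""


def parallel_sum_of_squares(n, size=4):
    """Sum i*i for i in range(n) using `size` separate Python interpreters."""
    # Start every worker before waiting on any of them, so they run concurrently.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER, str(rank), str(size), str(n)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for rank in range(size)
    ]
    pids, total = set(), 0
    for proc in procs:
        out, _ = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError("worker exited with an error")
        pid, partial = out.split()
        pids.add(int(pid))
        total += int(partial)
    return total, pids


if __name__ == "__main__":
    total, pids = parallel_sum_of_squares(100_000)
    print("sum of squares:", total)
    print("distinct worker processes:", len(pids))
```

Each worker reports a different process ID, which shows that each one is a separate interpreter with its own GIL. A real MPI program would be started by the MPI launcher rather than by a parent script, and the processes would exchange data with send and receive calls instead of printing to a pipe.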
Resources
- IBM Redbook: Practical MPI Programming is a good overview of how MPI works
- Pypar is a Python interface to MPI
- The MPI Standards
Examples
I have an example, but I haven’t had time to write it up yet.