UPDATE 1/2016: It appears that Pypar is no longer maintained, although it probably still works. For new projects, I advise using mpi4py.
The following is an online copy of the documentation of Pypar. I am not the author of the text below. I have merely posted it online to make it more accessible to the community. The latest version of Pypar can be found at:
The Pypar page on Sourceforge
PyPAR – Parallel Python, no-frills MPI interface
Author: Ole Nielsen (2001, 2002, 2003)
Version: See pypar.__version__
Date: See pypar.__date__
CONTENTS
TESTING THE INSTALLATION
GETTING STARTED
FIRST PARALLEL PROGRAM
PROGRAMMING FOR EFFICIENCY
MORE PARALLEL PROGRAMMING WITH PYPAR
PYPAR REFERENCE
DATA TYPES
STATUS OBJECT
EXTENSIONS
PERFORMANCE
PROBLEMS
REFERENCES
TESTING THE INSTALLATION
After having installed pypar, run
mpirun -np 2 testpypar.py
to verify that everything works.
If it doesn’t, try to run a C/MPI program
(for example ctiming.c) to see whether the problem lies with
the local MPI installation or with pypar.
GETTING STARTED
Assuming that the pypar C extension compiled, you should be able to
write
>>> import pypar
from the Python prompt.
(This may not work with MPICH under SunOS. It doesn’t matter;
just skip this interactive bit and go to FIRST PARALLEL PROGRAM.)
Then you can write
>>> pypar.size()
to get the number of parallel processors in play.
(When running pypar interactively like this, i.e. not under mpirun,
this number will be 1.)
Processors are numbered from 0 to pypar.size()-1.
Try
>>> pypar.rank()
to get the processor number of the current processor.
(When running interactively, this number will be 0.)
Finally, try
>>> pypar.get_processor_name()
to get the host name of the current node.
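Putting these together: the following sketch (not part of the pypar
distribution; the file name is made up) lets every processor identify
itself when run under mpirun:

# hello.py - each processor reports its identity (illustrative sketch)
import pypar

print 'I am processor %d of %d on node %s' % \
      (pypar.rank(), pypar.size(), pypar.get_processor_name())

pypar.finalize()

Running mpirun -np 2 hello.py should produce one line of output
per processor.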
FIRST PARALLEL PROGRAM
Take a look at the file demo.py supplied with this distribution.
To execute it in parallel on four processors, say, run
mpirun -np 4 demo.py
from the UNIX command line.
This will start four copies of the program, each on its own processor.
The program demo.py makes a message on processor 0 and
sends it around in a ring, with each processor adding a bit to it,
until it arrives back at processor 0.
All parallel programs must find out what processor they are running on.
This is accomplished by the call
myid = pypar.rank()
The total number of processors is obtained from
proc = pypar.size()
One can then have different codes for different processors by branching
as in
if myid == 0:
…
To send a general Python structure A to processor p, write
pypar.send(A, p)
and to receive something from processor q, write
X = pypar.receive(q)
This caters for any picklable Python structure and makes parallel
programs very simple and readable. While reasonably fast, it does not
achieve the full bandwidth obtainable by MPI.
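To make this concrete, here is a sketch of such a ring program (the
actual demo.py shipped with the distribution may differ in detail);
run it with at least two processors:

import pypar

myid = pypar.rank()
numprocs = pypar.size()

if myid == 0:
    msg = 'P0'
    pypar.send(msg, 1)                      # start the ring
    msg = pypar.receive(numprocs - 1)       # ring comes back to us
    print 'Processor 0 got message: %s' % msg
else:
    msg = pypar.receive(myid - 1)           # receive from left neighbour
    pypar.send(msg + ' P%d' % myid,         # add our bit and pass it on
               (myid + 1) % numprocs)

pypar.finalize()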
PROGRAMMING FOR EFFICIENCY
Really low-latency communication can be achieved by sticking
to Numeric arrays and specifying receive buffers whenever possible.
To send a Numeric array A to processor p, write
pypar.send(A, p, use_buffer=True)
and to receive the array from processor q, write
X = pypar.receive(q, buffer=X)
Note that X acts as a buffer and must be pre-allocated prior to the
receive statement as in Fortran and C programs using MPI.
These forms supersede the raw forms present in pypar
prior to version 1.9. The raw forms have been recast in terms of the
above and retained for backwards compatibility.
See the script pytiming for an example of communication of Numeric arrays.
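As a further illustration, here is a sketch of buffered communication
between processors 0 and 1 (Numeric is the array package of that era;
run with at least two processors):

import Numeric
import pypar

N = 1000
myid = pypar.rank()

if myid == 0:
    A = Numeric.arange(N).astype('d')   # array of N doubles to send
    pypar.send(A, 1, use_buffer=True)
elif myid == 1:
    X = Numeric.zeros(N, 'd')           # pre-allocated receive buffer
    X = pypar.receive(0, buffer=X)      # data is placed directly in X
    print 'Processor 1 received %d elements' % len(X)

pypar.finalize()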
MORE PARALLEL PROGRAMMING WITH PYPAR
When you are ready to move on, have a look at the supplied
demos and the script testpypar.py, which all give examples
of parallel programming with pypar.
You can also look at a standard MPI reference (C or Fortran)
to get some ideas. The principles are the same in pypar except
that many calls have been simplified.
PYPAR REFERENCE
Here is a list of functions provided by pypar:
(See the section on DATA TYPES for an explanation of ‘vanilla’.)
Identification
size() — Number of processors
rank() — ID of the current processor
get_processor_name() — Host name of the current node
Basic send forms
send(x, destination)
Sends data x of any type to destination with default tag.
send(x, destination, tag=t)
Sends data x of any type to destination with tag t.
send(x, destination, use_buffer=True)
Sends data x of any type to destination,
assuming that the recipient will specify a suitable buffer.
send(x, destination, bypass=True)
Sends a Numeric array of any type to destination, assuming
that a suitable buffer has been specified and that the
recipient also specifies bypass=True.
Basic receive forms
y=receive(source)
Receives data y of any type from source with default tag.
y=receive(source, tag=t)
Receives data y of any type from source with tag t.
y,status=receive(source, return_status=True)
Receives data y and a status object from source.
y=receive(source, buffer=x)
Receives data y from source and puts
it in x (which must be of compatible size and type).
It also returns a reference to x.
(Although it accepts all types, this form is intended
mainly for Numeric arrays.)
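For example, tags can keep different kinds of messages apart; a sketch
(the tag values are arbitrary integers chosen for illustration):

import pypar

WORK_TAG = 1      # arbitrary, for illustration
RESULT_TAG = 2

myid = pypar.rank()

if myid == 0:
    pypar.send('some work', 1, tag=WORK_TAG)
    result = pypar.receive(1, tag=RESULT_TAG)
elif myid == 1:
    work = pypar.receive(0, tag=WORK_TAG)
    pypar.send(work + ': done', 0, tag=RESULT_TAG)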
Collective Communication:
broadcast(x, root):
Broadcasts x from root to all other processors.
All processors must issue the same broadcast call.
gather(x, root):
Gathers all elements in x to a buffer of
size len(x) * numprocs
created by this function.
If x is multidimensional, the buffer will have
the size of the zeroth dimension multiplied by numprocs.
A reference to the created buffer is returned.
gather(x, root, buffer=y):
Gathers all elements in x into the specified buffer y
on root.
The buffer must have size len(x) * numprocs and
shape[0] == x.shape[0] * numprocs.
A reference to the buffer y is returned.
scatter(x, root):
Scatters all elements in x from root to all processors,
placing them in a buffer created by this function.
A reference to the created buffer is returned.
scatter(x, root, buffer=y):
Scatters all elements in x from root to all processors,
using the specified buffer y.
A reference to the buffer y is returned.
reduce(x, op, root):
Reduces all elements in x at root,
applying operation op elementwise, and returns the result
in a buffer created by this function.
A reference to the created buffer is returned.
reduce(x, op, root, buffer=y):
Reduces all elements in x into the specified buffer y
(of the same size as x)
at root, applying operation op elementwise.
A reference to the buffer y is returned.
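A sketch combining gather and reduce (the name of the summation
operation constant is an assumption here; consult pypar.py for the
constants actually exported):

import Numeric
import pypar

myid = pypar.rank()
numprocs = pypar.size()

x = Numeric.array([float(myid)] * 2)   # two elements per processor

y = pypar.gather(x, 0)   # root gets len(x) * numprocs elements
if myid == 0:
    print 'Gathered:', y

# Elementwise sum over all processors, result at root.
# NOTE: pypar.sum is an assumed name for the reduction operation;
# check pypar.py for the exact spelling.
s = pypar.reduce(x, pypar.sum, 0)
if myid == 0:
    print 'Sum:', s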
Other functions:
time() — MPI wall time
barrier() — Synchronisation point. Makes processors wait until all
processors have reached this point.
abort() — Terminate all processes.
finalize() — Cleanup MPI. No parallelism can take place after this point.
initialized() — True if MPI has been initialised
See pypar.py for doc strings on individual functions.
DATA TYPES
Pypar automatically handles different data types differently.
There are three protocols:
‘array’: Numeric arrays of type Int (‘i’, ‘l’), Float (‘f’, ‘d’),
or Complex (‘F’, ‘D’) can be communicated
with the underlying mpiext.send_array and mpiext.receive_array.
This is the fastest mode.
Note that even though the underlying C implementation does not
support Complex as a native datatype, pypar handles them
efficiently and seamlessly by transmitting them as arrays of
floats of twice the size.
‘string’: Text strings can be communicated with mpiext.send_string and
mpiext.receive_string.
‘vanilla’: All other types can be communicated using the functions
send_vanilla and receive_vanilla, provided that the objects
can be serialised using
pickle (or cPickle). This mode is less efficient than the
first two, but it can handle general structures.
Rules:
If the keyword argument vanilla == 1, the vanilla protocol is chosen
regardless of x’s type.
Otherwise, if x is a string, the string protocol is chosen.
If x is an array, the ‘array’ protocol is chosen provided that x has one
of the admissible typecodes.
Functions that take vanilla as a keyword argument can force vanilla mode
for any datatype.
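For instance, a dictionary automatically falls through to the vanilla
protocol, and vanilla mode can be forced explicitly. A sketch, assuming
at least two processors:

import pypar

myid = pypar.rank()

if myid == 0:
    # Neither a string nor a Numeric array, so the vanilla
    # (pickle) protocol is chosen automatically:
    pypar.send({'step': 1, 'data': [1.0, 2.0]}, 1)
    # Forcing vanilla mode for a string:
    pypar.send('hello', 1, vanilla=1)
elif myid == 1:
    d = pypar.receive(0)
    s = pypar.receive(0)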
STATUS OBJECT
A status object can be optionally returned from receive by
specifying return_status=True in the call.
The status object can subsequently be queried for information about the communication.
The fields are:
status.source: The origin of the received message (use e.g. with pypar.any_source)
status.tag: The tag of the received message (use e.g. with pypar.any_tag)
status.error: Error code generated by underlying C function
status.length: Number of elements received
status.size: Size (in bytes) of one element
status.bytes(): Number of payload bytes received (excl control info)
The status object is essential when used together with any_source or any_tag.
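For example, a sketch of receiving from any source and inspecting the
status object:

import pypar

myid = pypar.rank()

if myid == 0:
    # Accept messages from whichever processors send first
    for i in range(pypar.size() - 1):
        x, status = pypar.receive(pypar.any_source, return_status=True)
        print 'Got %d element(s) from processor %d with tag %d' \
              % (status.length, status.source, status.tag)
else:
    pypar.send('greetings from P%d' % myid, 0)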
EXTENSIONS
At this stage only a subset of MPI is implemented. However,
most real-world MPI programs use only these simple functions.
If you find that you need other MPI calls, please feel free to
add them to the C extension. Alternatively, drop me a note and I’ll
use that as an excuse to update pypar.
PERFORMANCE
If you are passing simple Numeric arrays around, you can reduce
the communication time by using the ‘buffer’ keyword arguments
(see PYPAR REFERENCE above). These versions are closer to the underlying MPI
implementation in that one must provide receive buffers of the right size.
However, you will find that these versions have lower latency and
can be somewhat faster, as they bypass
pypar’s mechanism for automatically transferring the needed buffer size.
Using simple Numeric arrays also bypasses pypar’s pickling of general
structures.
This will mainly be of interest if you are using fine-grained
parallelism, i.e. if you communicate frequently.
Try both versions and see for yourself whether there is any noticeable difference.
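A sketch of such a comparison using pypar.time() (timings will of
course depend on your MPI installation and network; run with at least
two processors):

import Numeric
import pypar

myid = pypar.rank()
N = 100000
A = Numeric.zeros(N, 'd')

pypar.barrier()                 # synchronise before timing
t0 = pypar.time()

if myid == 0:
    pypar.send(A, 1, use_buffer=True)
elif myid == 1:
    A = pypar.receive(0, buffer=A)

t1 = pypar.time()
if myid < 2:
    print 'Processor %d: %f seconds' % (myid, t1 - t0)

pypar.finalize()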
PROBLEMS
If you encounter any problems with this package please drop me a line.
I cannot guarantee that I can help you, but I will try.
Ole Nielsen
REFERENCES
To learn more about MPI in general, see web sites:
- The Message Passing Interface (MPI) standard
- http://www.redbooks.ibm.com/pubs/pdfs/redbooks/sg245380.pdf
- MPI – Introduction, History and Resources
or books
- “Using MPI, 2nd Edition”, by Gropp, Lusk, and Skjellum
- “The LAM companion to ‘Using MPI…'” by Zdzislaw Meglicki
- “Parallel Programming With MPI”, by Peter S. Pacheco
- “RS/6000 SP: Practical MPI Programming”, by Yukiya Aoyama and Jun Nakano
To learn more about Python, see the web site
http://www.python.org