Python is a very-high-level language. That makes it easy to write code quickly, but the program may not be as fast as a program compiled from a lower-level language. For this reason, many scientific programs are written in Fortran or C++. However, it has always been my experience that the majority of time on a project is spent in writing, modifiying, and debugging code, rather than executing. Fortunately, if written correctly, the time-critical parts of Python code can execute almost as fast as compiled software. Here is an example of a collision-detection algorithm which achieved almost a ten-fold increase in speed when written to use Numpy.

```
# Oldest, slowest method
for j in range(0,i):
if ((x-self.x[j])*(x-self.x[j]) + (y-self.y[j])*(y-self.y[j]) + (z-self.z[j])*(z-self.z[j])) < R_squared:
return True
for j in range(i+1,len(self.x)):
if ((x-self.x[j])*(x-self.x[j]) + (y-self.y[j])*(y-self.y[j]) + (z-self.z[j])*(z-self.z[j])) < R_squared:
return True
return False
```

This code starts with lists of the coordinates (self.x, self.y, self.z) of identical spheres in three-dimensional space. A sphere “i” is moved to a new location (x,y,z), and we need to know if it collides with any of the other spheres. This algorithm uses the Pythagorean Theorem to compute the center-to-center distance between spheres. If this distance is less than twice the sphere radius, then the new sphere intersects with one of the old ones, so we return “True,” otherwise return “False.” We loop through every sphere in the list (except for the current one) and test for a collision. Now, we will perform the same operation using Numpy:

```
if i>0:
d2 = (x-self.x[0:i])*(x-self.x[0:i]) + (y-self.y[0:i])*(y-self.y[0:i]) + (z-self.z[0:i])*(z-self.z[0:i])
if min(d2) < R_squared:
return True
if i+1 < len(self.x):
d2 = (x-self.x[i+1:len(self.x)])*(x-self.x[i+1:len(self.x)]) + (y-self.y[i+1:len(self.x)])*(y-self.y[i+1:len(self.x)]) + (z-self.z[i+1:len(self.x)])*(z-self.z[i+1:len(self.x)])
if min(d2) < R_squared:
return True
```

The basic collision detection test remains the same, but now the math is implemented using Numpy operaters on Numpy arrays. Notice that the loops are gone–Numpy operators can operate element-by-element on an array so that an entire “for” loop can be written in one line. Not only is the syntax cleaner, but it in this case, it sped up the code by almost a factor of ten! Numpy is written in C and carefully optimized to perform **fast** operations on arrays. When you need to operate on a whole array, the operations take place at the speed of compiled C. Note that I had to add some “if” statements to avoid creating zero-element arrays for the cases i=0 and i=len(array)-1.

I’m going to speculate on another reason why Numpy might be so much faster: the compiled Numpy code can take advantage of parallel execution in the CPU. Modern CPUs have some level of instruction-level parallel execution, even within a single core. A good compiler will find sections of code that don’t have to run sequentially, and compile them in a way that takes advantage of this. The collision detection is great example of code that can be executed in parallel–whether or not sphere “i” collides with sphere “i-1” has nothing to do with whether or not it collides with sphere “i+1.”

Here is an extensive example of how to optimize numerical operations in Python.

craighttp://en.wikipedia.org/wiki/Superscalar

Here is a link explaining how single-core CPU’s take advantage of parallel execution at the instruction level.