Optimizing Python code for fast math

I spent some time today profiling a Brownian dynamics simulation written in Python to see how I could make it faster before starting some long runs on a Linux cluster. In the sections below, I have attempted to quantify the speed improvements due to various changes. Keep in mind that the actual speed improvement in your code will vary, depending on where the actual bottlenecks occur. See my post about profiling Python code. Another caveat: I am running Python 2.4.4 because it’s installed on our cluster.

These books are helpful if you’re working with NumPy:

Use SciPy to implement fast lookup tables

Please see my previous post about how to implement fast lookup tables in Python using interp1d and UnivariateSpline. interp1d seems to be faster when operating on a NumpyArray, but UnivariateSpline is faster when operating on individual floating-point numbers. cipy.interpolate.UnivariateSpline wraps a compiled Fortran library (FITPACK), and it’s pretty much a drop-in replacement for interp1d. The one thing it can’t do is return a default value for out-of-range x values.

For scalar math, Python functions are faster than numpy functions

In one method, I realized that I had accidentally imported the sqrt() function from numpy, but I was only using it to compute the square root of scalar floating point numbers. Finding roots is computationally intensive, and from previous experience I suspected that numpy might be slower than the build-in Python math functions, so I knew this might be an easy place to find some speed. To quantify how much speed, I wrote this little test script:

from math import sqrt
from time import time

num_repeats = 10000000
start_time = time()
for i in range(num_repeats):
    sqrt(1012.1234215)
print "math.sqrt() Elapsed time: " + str(time()-start_time)

from numpy import sqrt
start_time = time()
for i in range(num_repeats):
    sqrt(1012.1234215)
print "numpy.sqrt() Elapsed time: " + str(time()-start_time)

It turns out that the sqrt() function from the standard Python math module is about seven times faster than the corresponding sqrt() function from numpy. As a side note, I learned that it is slightly faster (5-10%) to use the form “from math import sqrt” than it is to use “import math” and “math.sqrt()”. Because of this, you may not want to use “from numpy import *” if you are really concerned about speed.

Use Numpy for generating random numbers

Another significant improvement in speed came when I switched from the gauss() function of the built-in random library to the normal() function from numpy.random. Using a similar script to the one above, I found that the numpy version is about four times faster than random.gauss(). However, generating a random number from a uniform distribution (random.uniform) is very fast. In the source code of random, you will note that drawing one random number from a Gaussian distribution requires computing a square root, a log, a cosine, and a sine in Python. Numpy is faster because it does the math in C.

Function calls are expensive

A minor, but still useful, speed improvement came when I eliminated some unnecessary calls to the built-in functions max(), min(), and abs(). max(float1, float2) can be replaced with

if num1 &gt; num2:
    a = num1
else:
    a = num2

which is about 35% faster. min() and abs() can be similarly replaced with if/then/else statements.

Eliminate unnecessary import statements

When you are wrapping up development of a module, go back and delete all the “from some_module import some_unnecessary_function” statements. This is marginally helpful with speed, but it’s a good programming practice.

Pingback: A lookup table for fast Python math | as through a mirror dimly

...

March 19, 2013 at 5:08 pm

I’m really surprised about the if statement vs min/max/abs. How can this be?

Lewis
March 31, 2013 at 10:22 pm

I guess the built-ins must result in similar run-time code but with the additional overhead of the function all..

Craig

March 31, 2013 at 11:09 pm

Lewis,
I came to the same conclusion. The built-in function abs() must use an if/then internally, so you are using a branch either way, except that abs() adds the overhead of a function call. I just re-tested this on Python 2.7 on a Mac, and got the same results.

John

May 25, 2016 at 2:38 pm

Just FYI, scipy.interpolate.UnivariateSpline is not the same as interp1d. I think you meant scipy.interpolate.InterpolatedUnivariateSpline.