Numerical programming frequently requires the use of look-up tables. A look-up table is a collection of pre-computed values. When given an “x” value, the table returns a pre-computed “y” value. Look-up tables can be used to speed up numerical codes, when it is faster to look up a value in the table than it is to compute the value. They are also used when the data in the table cannot be computed–for example, experimental data or averaged results from an ensemble of Monte Carlo simulations. Another application is to compute a value when a function cannot be solved algebraically. Assume that you have a formula for a function q(h). You need the value of h for a given value of q, but the formula cannot be algebraically solved to get h(q). Instead, choose a range of h values, compute the function q(h), and store each value in a look-up table. Now you can get h(q) for any value stored in the table. The major limitation of a look-up table is that it cannot return valid results for any value of q which is outside the range of those stored in the table. Depending on its implementation, the table may be able to interpolate to return values between known points.
Scipy includes two handy lookup table classes, but they are sort of hidden. Here is an example of how to create a lookup table using interp1d and UnivariateSpline:
from scipy.interpolate import interp1d, UnivariateSpline from scipy.stats import norm from numpy import arange from numpy.random import uniform import matplotlib.pyplot as plt from time import time deltax = 0.1 xmin = -5.0 xmax = 5.0 # Set up samples x_samples = arange(xmin, xmax, deltax) pdf_samples = norm.pdf(x_samples) fig = plt.plot(x_samples, pdf_samples, 'ro', label='Sampled') # Interpolate on a finer grid pdf_interp = interp1d(x_samples, pdf_samples, kind='linear') x_interp = arange(xmin+deltax, xmax-deltax, 0.001) plt.plot(x_interp, pdf_interp(x_interp), 'b-', label='Interpolated') # Do the same thing with UnivariateSpline u = UnivariateSpline(x_samples, pdf_samples, k=1, s=0.0) plt.plot(x_interp, u(x_interp), 'k-', label='Spline') plt.xlabel('x') plt.ylabel('pdf') plt.legend() plt.show() # Time tests start_time = time() for i in range(100000): pdf_interp(uniform(xmin+deltax, xmax-deltax)) print("interp1d run time: {}".format(time() - start_time)) start_time = time() for i in range(100000): u(uniform(xmin+deltax, xmax-deltax)) print("UnivariateSpline run time: {}".format(time() - start_time)) x_fine = arange(xmin + deltax, xmax - deltax - 0.001, 1e-6) start_time = time() pdf_interp(x_fine) print("interp1d run time: {}".format(time() - start_time)) start_time = time() u(x_fine) print("UnivariateSpline run time: {}".format(time() - start_time))
First, create a Numpy array to store the x values for the lookup table. I’m using the pdf of the normal distribution as an example of a function to interpolate. The table uses linear interpolation to compute values between the known points. Higher-order interpolations can be used, but I don’t need them in this case. Higher-order interpolations can introduce distortions for certain types of data, so they are best avoided unless you really understand the data being fitted. Below is a plot illustrating the results of a look-up table:
The red dots are the values stored in the lookup table, the blue line is interpolated using interp1d, and the black line is interpolated using UnivariateSpline (the blue and black lines are indistinguishable). The function was approximated with a lookup table that was created with an evenly spaced array of x values. Notice how the points stored in the table are spaced more closely together in the y dimension for values of x further from the origin. Thus, the approximation becomes less accurate as x approaches 0, which is a consequence of using a linear “x” spacing with a nonlinear function. You could work around this by using a non-uniform array of “x” values to create the lookup table. This is another reason to understand the function you are approximating, instead of blindly placing values into the table.
Peformance
Like all operations in NumPy, interpolation runs much faster if you pass in a Numpy array instead of operating on individual values. On my older Mac laptop, UnivariateSpline is about twice as fast as interp1d when operating on individual floating point numbers, but interp2d is about twice as fast when operating on NumPy arrays:
interp1d run time: 3.5505311489105225 UnivariateSpline run time: 1.5044260025024414 interp1d run time: 0.1413729190826416 UnivariateSpline run time: 0.27219295501708984
Alternatives to SciPy
If you want to understand more about how lookup tables work, or can’t/won’t install SciPy, check out this interpolated lookup table class. I wouldn’t use it for serious math, since it doesn’t use Numpy and is probably not very fast.
Pingback: Optimizing Python code for fast math | as through a mirror dimly
Pingback: Lookup tables and spline fitting in Python | as through a mirror dimly