“Speed” is a complicated term when used in the context of software. Does it mean raw speed of execution, or reducing the amount of time until a correct result is obtained? Python is not the first language that comes to mind when people think of “fast software.” It is true that pure Python will usually not execute as quickly as the same algorithm directly coded in C or Fortran. However, when you define speed as “least amount of time until you get the right answer,” then Python is pretty fast. It is so easy to develop correct code in Python, compared to low-level compiled languages, that Python is often the fastest route to a correct answer, even if the execution time is longer. Having said that, there are times when code has to execute quickly, and that’s why I will introduce you to profiling Python code.
Like all things Python, profiling is easier than you think. Python 2.4 has the profile module, and Python 2.5 has both profile and cProfile. cProfile is written in C for lower overhead, and it’s the recommended version. I am stuck with profile, because the cluster that I am working with still uses Python 2.4. You can read the docs for more details, but I will quickly outline what I find to be the most helpful usage. For example, say I want to profile the file run_sim.py. I use the following command line:
python -m profile -o profile_file run_sim.py
The argument -m tells Python to run the library module profile as a script. The argument -o is an option that tells profile to write the profile data to the specified file name. The last argument, of course, is the name of the script to analyze. Once the run is finished, you can analyze the results. I prefer to do this interactively, and I highly recommend the IPython shell. The following is an example of an interactive session in IPython.
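If you would rather drive the profiler from inside Python instead of the command line, the same workflow can be sketched in a few lines. Here slow_sum and the output file name are just illustrative; I am using cProfile, which requires Python 2.5 or later, but the profile module has the same interface if you are stuck on 2.4 like I am:

```python
import cProfile
import pstats

def slow_sum(n):
    # A deliberately inefficient function to give the profiler something to see.
    total = 0
    for i in range(n):
        total += sum(range(i))
    return total

# Collect the profile and dump it to a file, mirroring
# "python -m profile -o profile_file run_sim.py".
profiler = cProfile.Profile()
profiler.enable()
slow_sum(500)
profiler.disable()
profiler.dump_stats('profile_file')

# Load the saved stats and show the ten most expensive entries,
# sorted by cumulative time.
stats = pstats.Stats('profile_file')
stats.sort_stats('cumulative').print_stats(10)
```

The dump_stats/pstats.Stats round trip is exactly what the -o flag and the interactive session do for you; the programmatic form is handy when you only want to profile one function instead of a whole script.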
In : import pstats
In : p = pstats.Stats('profile_file')
In : p.sort_stats('cumulative').print_stats(10)
Fri Oct 24 17:18:37 2008    profile_file

         278914972 function calls (277313577 primitive calls) in 4618.950 CPU seconds

   Ordered by: cumulative time
   List reduced from 550 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000 4618.950 4618.950 profile:0(execfile('run_sim.py'))
        1   12.740   12.740 4618.950 4618.950 run_sim.py:3(?)
        1    0.000    0.000 4618.950 4618.950 :0(execfile)
    129/1    0.000    0.000 4618.950 4618.950 <string>:1(?)
   100000 1416.240    0.014 4513.100    0.045 ~/BrownianDynamics.py:212(timeStep)
 12130610 1144.580    0.000 1620.700    0.000 ~/Brownian_Dynamics/trunk/BrownianDynamics.py:105(hasXYZCollision)
 36484167  497.980    0.000  779.550    0.000 /usr/lib64/python2.4/random.py:506(gauss)
   200000   22.760    0.000  673.340    0.003 ~/lib64/python/scipy/integrate/quadrature.py:182(simps)
   300000  545.730    0.002  637.640    0.002 ~/lib64/python/scipy/integrate/quadrature.py:152(_basic_simps)
This is the profile of a Brownian Dynamics simulation. I have sorted the results by cumulative time, to see what parts of the code take longest to run. The first four “calls” are not that useful: they show the profiler calling the “master” script, which in turn calls a “slave” script that actually does the work. The fifth call is where things get interesting: it shows the runtime of “timeStep”, the method that actually does the computations. The sixth call is to the method “hasXYZCollision” within that script. Clearly, this is the most time-consuming part of the code, which is why I have spent the past few posts optimizing it! The next call is to a random number generator; not much I can do about that right now. The final two calls demonstrate that you have to read the profile carefully. The function “simps” calls the function “_basic_simps”, so the total time spent doing numerical integration is actually 673 seconds, not 673+637 seconds.
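That caller/callee relationship is easy to reproduce with a toy pair of functions. Here outer and inner are made-up stand-ins for simps and _basic_simps: the wrapper’s cumulative time already contains the worker’s, so adding the two cumtime columns would double-count the work.

```python
import cProfile
import pstats

def inner():
    # Does all of the real work, like _basic_simps.
    return sum(i * i for i in range(200000))

def outer():
    # A thin wrapper, like simps: almost no time of its own (tottime),
    # but its cumulative time (cumtime) includes everything inner does.
    return inner()

profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()
profiler.dump_stats('toy_profile')

pstats.Stats('toy_profile').sort_stats('cumulative').print_stats()
```

In the printed table, outer’s cumtime is at least as large as inner’s, while its tottime is close to zero. Reading cumtime down a call chain and summing the entries would count the same seconds more than once; tottime is the column that partitions the runtime without overlap.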
I have found the Python profiler to be extremely useful. Since “premature optimization is the root of all evil,” the profiler is a necessary tool for finding and improving those bits of code that are truly inefficient.