Back in 2005, Andy Wingo wrote a neat little statistical profiler named statprof that promptly disappeared into obscurity. It has since languished almost unknown, with a handful of people writing semi-private forks that themselves seem to be dead.
Statistical profiling (also known as sampling profiling) is simple and sweet: the profiler periodically wakes up and samples the stack, then when all is done, it prints a simple report of which lines showed up most often in the profile.
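To make the idea concrete, here is a minimal sketch of a sampling profiler. It is not statprof's code: the class name SamplingProfiler, its methods, and the choice of a background thread that peeks at the main thread's stack via sys._current_frames() are all assumptions made for illustration.

```python
import collections
import sys
import threading
import time


class SamplingProfiler:
    """Toy sampling profiler (hypothetical, for illustration only)."""

    def __init__(self, interval=0.001):
        self.interval = interval              # seconds between samples
        self.counts = collections.Counter()   # (file, line, function) -> hits
        self._target_id = None
        self._stop = threading.Event()
        self._thread = None

    def _sample_loop(self):
        # Wake up every `interval` seconds until stop() sets the event.
        while not self._stop.wait(self.interval):
            frame = sys._current_frames().get(self._target_id)
            if frame is not None:
                code = frame.f_code
                key = (code.co_filename, frame.f_lineno, code.co_name)
                self.counts[key] += 1

    def start(self):
        # Profile whichever thread calls start().
        self._target_id = threading.get_ident()
        self._stop.clear()
        self._thread = threading.Thread(target=self._sample_loop, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def report(self, limit=5):
        # Print the lines that showed up most often in the samples.
        total = sum(self.counts.values())
        for (filename, lineno, name), hits in self.counts.most_common(limit):
            print(f"{100.0 * hits / total:6.2f}%  {filename}:{lineno}:{name}")


def busy():
    total = 0
    for i in range(2_000_000):
        total += i * i    # most samples should land on or near this line
    return total


if __name__ == "__main__":
    profiler = SamplingProfiler()
    profiler.start()
    busy()
    profiler.stop()
    profiler.report()
```

Because the profiler only looks at the stack every so often, its overhead stays small, and the lines that eat the most time are the ones most likely to be caught in the act.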
Why would this matter, though? Python already has two built-in profilers: lsprof (exposed as the cProfile module) and the long-deprecated hotshot. The trouble with lsprof is that it only tracks function calls, so all the time spent inside a function is lumped together. If you have a few hot loops within a function, lsprof is nearly worthless for figuring out which ones are actually important.
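A small sketch of that limitation, using the standard library's cProfile module (the front end to lsprof). The function hot_function and its two loops are made up for the example. The report attributes all of the time to the function as a whole and gives no hint as to which loop is responsible.

```python
import cProfile
import io
import pstats


def hot_function():
    cheap = 0
    for i in range(1_000):        # cheap loop
        cheap += i
    costly = 0
    for i in range(200_000):      # expensive loop
        costly += i * i
    return cheap + costly


profiler = cProfile.Profile()
profiler.enable()
hot_function()
profiler.disable()

# Collect the report in a string so it can be inspected as well as printed.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats()
report = out.getvalue()
print(report)
```

Each entry in the output is keyed by file, the line where the function is defined, and the function name, so both loops collapse into a single row for hot_function.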
A few days ago, I found myself in exactly the situation in which lsprof fails: it was telling me that I had a hot function, but the function was unfamiliar to me, and long enough that it wasn’t immediately obvious where the problem was.
After a bit of begging on Twitter and Google+, someone pointed me at statprof. But there was a problem: although it was doing statistical sampling (yay!), it was only tracking the first line of a function when sampling (wtf!?). So I fixed that, spiffed up the documentation, and now it’s both usable and not misleading. Here’s an example of its output, locating the offending line in that hot function more accurately:
  %   cumulative      self
 time    seconds   seconds  name
 68.75      0.14      0.14  scmutil.py:546:revrange
  6.25      0.01      0.01  cmdutil.py:1006:walkchangerevs
  6.25      0.01      0.01  revlog.py:241:__init__
[...blah blah blah...]
  0.00      0.01      0.00  util.py:237:__get__
---
Sample count: 16
Total time: 0.200000 seconds
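The difference between the misleading behaviour and the useful one comes down to which line number gets recorded for a sampled frame. The sketch below shows the two candidates on a Python frame object. It illustrates the distinction only and is not taken from statprof's source; the function where_am_i is hypothetical.

```python
import sys


def where_am_i():
    frame = sys._getframe()                    # the frame running this call
    first_line = frame.f_code.co_firstlineno   # where the function is defined
    current_line = frame.f_lineno              # the line executing right now
    return first_line, current_line


first_line, current_line = where_am_i()
print(f"function starts at line {first_line}, sampled at line {current_line}")
```

Recording co_firstlineno makes every sample in a function point at its def line, which is no more informative than a per-function profiler. Recording f_lineno is what lets a report single out a specific line, as with scmutil.py:546 above.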
I have uploaded statprof to the Python package index, so it’s almost trivial to install: “easy_install statprof” and you’re up and running.
Since the code is up on GitHub, please feel welcome to contribute bug reports and improvements. Enjoy!