(re)announcing statprof, a statistical profiler for Python

Back in 2005, Andy Wingo wrote a neat little statistical profiler named statprof that promptly disappeared into obscurity. It has since languished almost unknown, with a handful of people writing semi-private forks that themselves seem to be dead.

Statistical profiling (also known as sampling profiling) is simple and sweet: the profiler periodically wakes up and samples the stack, then when all is done, it prints a simple report of which lines showed up most often in the profile.
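To make the idea concrete, here is a minimal sketch of a sampling profiler. It is not statprof's implementation: the TinySampler class and the busy helper are hypothetical names invented for this illustration, and it samples from a background thread because that is the easiest thing to show in a few lines. It wakes up every millisecond or so, notes which line the main thread is executing, and counts how often each line turns up.

```python
import collections
import sys
import threading
import time


class TinySampler:
    """Toy sampling profiler (illustrative only, not statprof's code)."""

    def __init__(self, interval=0.001):
        self.interval = interval
        self.counts = collections.Counter()
        # Sample the thread that creates the sampler.
        self._target_id = threading.get_ident()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Wake up every `interval` seconds until stop() is called.
        while not self._stop.wait(self.interval):
            frame = sys._current_frames().get(self._target_id)
            if frame is not None:
                code = frame.f_code
                # Record the line being executed, not just the function.
                key = (code.co_filename, frame.f_lineno, code.co_name)
                self.counts[key] += 1

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def report(self, limit=5):
        """Return the most frequently sampled lines as formatted strings."""
        total = sum(self.counts.values())
        return [
            "%6.2f  %s:%d:%s" % (100.0 * n / total, filename, lineno, name)
            for (filename, lineno, name), n in self.counts.most_common(limit)
        ]


def busy(seconds):
    """Spin for roughly `seconds` so the sampler has something to catch."""
    deadline = time.perf_counter() + seconds
    total = 0
    while time.perf_counter() < deadline:
        total += sum(i * i for i in range(1000))
    return total


if __name__ == "__main__":
    sampler = TinySampler()
    sampler.start()
    busy(0.5)
    sampler.stop()
    print("\n".join(sampler.report()))
```

Because the profiler only looks at the program every so often, the counts are estimates: lines that run for a long time show up in many samples, and lines that barely run may not show up at all.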

Why would this matter, though? Python already has two built-in profilers: lsprof (exposed in the standard library as cProfile) and the long-deprecated hotshot. The trouble with lsprof is that it only tracks function calls. If you have a few hot loops within a function, lsprof is nearly worthless for figuring out which ones actually matter.
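A small sketch of the limitation, using the standard library's cProfile module (the interface to lsprof). The two_loops and profile_report functions are hypothetical, written only for this illustration. The report has a single row for two_loops, keyed by the line of its def statement, so it cannot tell you which of the two loops is the expensive one.

```python
import cProfile
import io
import pstats


def two_loops(n):
    """One function, two loops of very different cost."""
    cheap = 0
    for i in range(n // 100):   # cheap loop
        cheap += i
    costly = 0
    for i in range(n):          # costly loop
        costly += i * i
    return cheap + costly


def profile_report(func, *args):
    """Run func under cProfile and return the textual report."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats()
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(two_loops, 1000000))
```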

A few days ago, I found myself in exactly the situation in which lsprof fails: it was telling me that I had a hot function, but the function was unfamiliar to me, and long enough that it wasn’t immediately obvious where the problem was.

After a bit of begging on Twitter and Google+, someone pointed me at statprof. But there was a problem: although it was doing statistical sampling (yay!), it was only tracking the first line of a function when sampling (wtf!?). So I fixed that, spiffed up the documentation, and now it’s both usable and not misleading. Here’s an example of its output, locating the offending line in that hot function more accurately:

  %   cumulative      self          
 time    seconds   seconds  name    
 68.75      0.14      0.14  scmutil.py:546:revrange
  6.25      0.01      0.01  cmdutil.py:1006:walkchangerevs
  6.25      0.01      0.01  revlog.py:241:__init__
  [...blah blah blah...]
  0.00      0.01      0.00  util.py:237:__get__
---
Sample count: 16
Total time: 0.200000 seconds

I have uploaded statprof to the Python package index, so it’s almost trivial to install: “easy_install statprof” and you’re up and running.
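Once it is installed, basic use looks roughly like the sketch below. This assumes the module-level start, stop, and display functions described in statprof's README; check the documentation of the version you installed. The sum_of_squares and run_profiled functions are hypothetical, made up for this example, and the import is guarded so that the snippet still runs (without profiling) when statprof is not available.

```python
try:
    import statprof  # third-party package; must be installed separately
except ImportError:
    statprof = None


def sum_of_squares(n):
    """Stand-in for the code you actually want to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_profiled(func, *args):
    """Call func under statprof if it is installed, else just call it."""
    if statprof is None:
        return func(*args)
    statprof.start()
    try:
        return func(*args)
    finally:
        statprof.stop()
        statprof.display()  # prints a report like the one shown earlier


if __name__ == "__main__":
    run_profiled(sum_of_squares, 2000000)
```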

Since the code is up on github, please feel welcome to contribute bug reports and improvements. Enjoy!

Posted in open source, python
6 comments on “(re)announcing statprof, a statistical profiler for Python”
  1. Zooko says:

    Yay! This sounds like a promising addition to the toolkit.

    By the way, there is also http://packages.python.org/line_profiler, which is a deterministic (100% samples) per-line profiler.

  2. I’d tried line_profiler but quickly gave up, IIRC because I couldn’t get it to even compile, never mind work.

  3. Ehm says:

    That simple?

    $ sudo easy_install statprof
    install_dir /usr/local/lib/python2.6/dist-packages/
    Searching for statprof
    Reading http://pypi.python.org/simple/statprof/
    Reading http://packages.python.org/statprof
    No local packages or download links found for statprof
    error: Could not find suitable distribution for Requirement.parse(‘statprof’)

  4. Could you try the download again, please?

  5. david says:

    Bryan, could you detail what did not work when building line_profiler? It is used quite often in the scipy community, and should definitely work.

  6. Zooko says:

    David:

    I bet Bryan did the same thing I did:

    $ hg clone https://bitbucket.org/robertkern/line_profiler
    $ cd line_profiler
    $ python setup.py build
    Could not import Cython. Using pre-generated C file if available.
    running build
    running build_py
    warning: build_py: byte-compiling is disabled, skipping.

    running build_ext
    building ‘_line_profiler’ extension
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c _line_profiler.c -o build/temp.linux-x86_64-2.7/_line_profiler.o
    gcc: error: _line_profiler.c: No such file or directory
    gcc: fatal error: no input files
    compilation terminated.
    error: command ‘gcc’ failed with exit status 4

    Oh, reading the error message more closely, I see that I need to have Cython installed for this to work…
