python – teideal glic deisbhéalach

(re)announcing statprof, a statistical profiler for Python

Bryan O'Sullivan — Mon, 09 Apr 2012 18:51:52 +0000

Back in 2005, Andy Wingo wrote a neat little statistical profiler named statprof that promptly disappeared into obscurity. It has since languished almost unknown, with a handful of people writing semi-private forks that themselves seem to be dead.

Statistical profiling (also known as sampling profiling) is simple and sweet: the profiler periodically wakes up and samples the stack, then when all is done, it prints a simple report of which lines showed up most often in the profile.
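To make the idea concrete, here is a minimal sketch of a sampling profiler. It is not statprof's implementation: the `Sampler` class, its method names, and the use of a background thread with `sys._current_frames()` are all assumptions chosen for illustration.

```python
import collections
import sys
import threading


class Sampler:
    """Toy sampling profiler (illustrative only, not statprof's code).

    A background thread wakes up periodically and records which line
    the profiled thread is executing at that moment.
    """

    def __init__(self, interval=0.001):
        self.interval = interval
        self.counts = collections.Counter()
        self._stop = threading.Event()
        self._thread = None
        self._target_id = None

    def _run(self):
        # Event.wait() returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.interval):
            frame = sys._current_frames().get(self._target_id)
            if frame is not None:
                code = frame.f_code
                key = (code.co_filename, frame.f_lineno, code.co_name)
                self.counts[key] += 1

    def start(self):
        # Profile the thread that calls start().
        self._target_id = threading.get_ident()
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def report(self):
        # One line per sampled location, most frequent first.
        total = sum(self.counts.values())
        return [
            "%6.2f  %s:%d:%s" % (100.0 * n / total, filename, lineno, name)
            for (filename, lineno, name), n in self.counts.most_common()
        ]


def busy(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


sampler = Sampler()
sampler.start()
busy(500000)
sampler.stop()
report_lines = sampler.report()
```

The report is only as good as the number of samples taken, so a short run gives a coarse picture and a longer run a finer one.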

Why would this matter, though? Python already has two built-in profilers: lsprof and the long-deprecated hotshot. The trouble with lsprof is that it only tracks function calls. If you have a few hot loops within a function, lsprof is nearly worthless for figuring out which ones are actually important.
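A small experiment shows the limitation. The sketch below uses `cProfile`, the standard library module built on lsprof; the function `two_loops` is a made-up example, not code from the profiled application.

```python
import cProfile
import pstats


def two_loops(n):
    cheap = 0
    for i in range(n):            # cheap loop
        cheap += i
    costly = 0
    for i in range(n * 20):       # costly loop, 20 times the iterations
        costly += i * i
    return cheap + costly


profiler = cProfile.Profile()
profiler.enable()
two_loops(10000)
profiler.disable()

stats = pstats.Stats(profiler)
# Each key is (filename, first line of the function, function name), so
# the whole function collapses into one entry and nothing in the profile
# tells the two loops apart.
entries = [key for key in stats.stats if key[2] == "two_loops"]
```

However long the function is, it contributes exactly one row to the profile.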

A few days ago, I found myself in exactly the situation in which lsprof fails: it was telling me that I had a hot function, but the function was unfamiliar to me, and long enough that it wasn’t immediately obvious where the problem was.

After a bit of begging on Twitter and Google+, someone pointed me at statprof. But there was a problem: although it was doing statistical sampling (yay!), it was only tracking the first line of a function when sampling (wtf!?). So I fixed that, spiffed up the documentation, and now it’s both usable and not misleading. Here’s an example of its output, locating the offending line in that hot function more accurately:

  %   cumulative      self          
 time    seconds   seconds  name    
 68.75      0.14      0.14  scmutil.py:546:revrange
  6.25      0.01      0.01  cmdutil.py:1006:walkchangerevs
  6.25      0.01      0.01  revlog.py:241:__init__
  [...blah blah blah...]
  0.00      0.01      0.00  util.py:237:__get__
---
Sample count: 16
Total time: 0.200000 seconds
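The distinction between a function's first line and the line currently executing is visible on CPython frame objects. This is only a sketch of that distinction, under the assumption that a sampler works from frame objects; `where_am_i` is a hypothetical helper.

```python
import sys


def where_am_i():
    frame = sys._getframe()
    first_line = frame.f_code.co_firstlineno   # line of the "def" statement
    current_line = frame.f_lineno              # line being executed right now
    return first_line, current_line


first, current = where_am_i()
```

A sampler that records `co_firstlineno` attributes every sample to the top of the function, while one that records `f_lineno` can point at the line that is actually hot.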

I have uploaded statprof to the Python package index, so it’s almost trivial to install: “easy_install statprof” and you’re up and running.

Since the code is up on github, please feel welcome to contribute bug reports and improvements. Enjoy!

What’s in a text API?

Bryan O'Sullivan — Tue, 30 Jun 2009 06:00:44 +0000

Now that I’ve got the DEFUN 2009 schedule sorted out (you are coming, aren’t you?), I’ve had time to take a breath and think about the Haskell text library again. Its API is currently a clone of the ancient and venerable Haskell list API. If you’ve used the list API to do much text processing, you’ve probably spilled more than a few tears into your whiskey. The bytestring library also mostly clones the list API, albeit with a few improvements. This state of affairs makes me somewhat sad: here we are with a fabulous language, but a 1991-era API for mangling text.

To put this state of affairs into perspective, here is a function-by-function comparison of the string manipulation APIs of Python 2.6 and Haskell. This is intentionally somewhat pessimistic: I focus on aspects of the Python API that are either absent from or not trivially reimplemented in Haskell, but not the reverse. (If the details that follow make your eyes glaze over, skip them and read on after the table below.)

Python	Haskell
`x + y`	x `append` y
`x in y`	x `isInfixOf` y
`x < y`	`x < y`
`x <= y`	`x <= y`
`x == y`	`x == y`
`x != y`	`x /= y`
`x > y`	`x > y`
`x >= y`	`x >= y`
`x % (...)`
`x[i]`	x `index` i
`x[i:j]`	(j-i) `take` (i `drop` x)
`hash(x)`
`len(x)`	`length x`
`x * y`	y `replicate` x
`x.capitalize()`
`x.center(y)`
`x.count()`
`x.decode()`	`decode...` family
`x.encode()`	`encode...` family
`x.endswith(y)`	y `isSuffixOf` x
`x.expandtabs()`
`x.find(y)`
`x.format(...)`
`x.index(y)`
`x.isalnum()`	`all isAlphaNum x`
`x.isalpha()`	`all isAlpha x`
`x.isdigit()`	`all isDigit x`
`x.islower()`	`all isLower x`
`x.isspace()`	`all isSpace x`
`x.istitle()`
`x.isupper()`	`all isUpper x`
`x.join(y)`	`intercalate x y`
`x.ljust(w)`
`x.lower()`	`toLower x`
`x.lstrip()`	`dropWhile isSpace x`
`x.partition(y)`	`break (==y) x`
`x.replace(y,z)`
`x.rfind(y)`
`x.rindex(y)`
`x.rjust(y)`
`x.rpartition(y)`
`x.rsplit(y)`
`x.rstrip(y)`
`x.split(y)`
`x.splitlines()`	`lines x`
`x.startswith(y)`	y `isPrefixOf` x
`x.strip()`
`x.swapcase()`
`x.title()`
`x.translate(y)`
`x.upper()`	`toUpper x`
`x.zfill()`

For now, I’m intentionally not looking at Python’s unicodedata or string packages, even though each contains a handful of additional useful functions.

How would I broadly categorise what’s missing from the current Haskell APIs?

Formatting. The format method that’s new in Python 2.6 is well designed and extremely useful. While there are a few formatting libraries on Hackage, each has flaws which I think are substantial enough to make them undesirable for wide use. As examples of those shortcomings, I’m thinking of a lack of static type safety or a poor fit for automated translation tools.
Searching and splitting text. The Haskell APIs are based on predicates over individual characters, whereas what’s usually needed is predicates over strings. In other words, don’t just find me a character; find me a substring.
Parsing. I’m not overly concerned about this, since Haskell’s libraries far outshine those of Python in this area. Although they currently lack support for the text library, the Parsec and attoparsec libraries will acquire it, I’m sure, as soon as there’s demand. What would be welcome is a decent Unicode-capable regular expression engine, for those times when you just have to get yourself into trouble in the name of expediency.
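To illustrate the searching and splitting gap from the Python side, the sketch below contrasts a character-predicate split with Python's substring-based methods. The helper `break_on` is hypothetical and only imitates the list-style `break`; the sample text is made up.

```python
import itertools

text = "key=>value=>more"


def break_on(pred, s):
    """Split at the first character satisfying pred (list-API style)."""
    head = "".join(itertools.takewhile(lambda c: not pred(c), s))
    return head, s[len(head):]


# Character-predicate style: the separator "=>" cannot be expressed
# directly, so the split lands on "=" and leaves ">" behind.
by_char = break_on(lambda c: c == "=", text)

# Substring style: search for and split on a whole string.
by_substring = text.partition("=>")
pieces = text.split("=>")
position = text.find("value")

# The format method new in Python 2.6.
formatted = "{0} has {1} pieces".format(text, len(pieces))
```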

I intend to address each of these areas over the coming months, and I’ll write up the APIs I intend to flesh out here before I actually implement them, to solicit feedback from the community. One step that I think I’ll probably take, for instance, is to move a few of the functions in the Data.Text module that clone the list API into a new module, Data.Text.Legacy, so that I can use the same function names in Data.Text, but with more useful types. As an example of what I have in mind, I’d be inclined to move split :: Char -> Text -> [Text] into the legacy module, and replace it with split :: Text -> Text -> [Text].

There’s something of a tension between the goals of providing a small, focused text library and getting all the API details right in a way that will make it truly useful. I find the proliferation of tiny libraries on Hackage, each providing a few little pieces of missing functionality, to be pretty dispiriting from the point of view of getting dug in and producing useful application code quickly, so I intend for the text and text-icu libraries to be broadly useful from the get-go.

If you have opinions, or better yet patches, to contribute, let’s get things rolling!

Why you should not use pyinotify

Bryan O'Sullivan — Sat, 05 Jan 2008 01:18:50 +0000

A while ago, I had a need to monitor filesystem modifications, and I looked around for Python bindings for the Linux kernel’s inotify subsystem. At the time, the only existing library was pyinotify, so being a lazy sort, I naturally tried to use it.

On first glance, the documentation seems impressive, and the API looks reasonable. Effective use of inotify is a subtle affair, however, and pyinotify is not, shall we say, the best tool for the job. It’s difficult to tell what those problems might be from external inspection, though, so here are a few notes from my experience.

Correctness

A program using pyinotify can easily lose track of parts of its directory hierarchy. The library doesn’t raise an OSError exception if the inotify_add_watch system call fails: instead, it propagates the -1 error result up to the caller as a value in a dict, but without the value of errno to tell the caller why the error occurred.

It’s thus trivial to miss errors entirely, because the usual mechanism of raising exceptions isn’t used. Almost as bad, it’s impossible to distinguish between recoverable (tried to add a watch on a directory that no longer exists) and fatal (hit the system max_user_watches limit) errors.
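A binding that raised exceptions would let the caller tell these cases apart. The following is a minimal sketch, not code from any existing binding: `check_add_watch` and `is_recoverable` are hypothetical names, and the result and errno values are passed in explicitly so that the example runs anywhere. It assumes the kernel reports ENOENT for a vanished path and ENOSPC when the watch limit is exhausted.

```python
import errno
import os


def check_add_watch(result, err, path):
    """Turn a C-style -1 return value into an OSError carrying errno.

    result and err stand in for the return value of inotify_add_watch
    and the errno observed immediately after the call.
    """
    if result == -1:
        raise OSError(err, os.strerror(err), path)
    return result


def is_recoverable(exc):
    # ENOENT: the directory vanished before it could be watched; skip it.
    # ENOSPC: the per-user watch limit is exhausted; the caller cannot go on.
    return exc.errno == errno.ENOENT


outcomes = {}
for name, err in [("missing", errno.ENOENT), ("limit", errno.ENOSPC)]:
    try:
        check_add_watch(-1, err, "/some/dir")
    except OSError as exc:
        outcomes[name] = is_recoverable(exc)
```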

Performance

To a regular Python hacker, the interface that pyinotify provides will probably look reasonable. If you want to handle some kind of event, just write a method that will get invoked with an Event object when that event occurs. How reassuringly normal.

Under the hood, though, the implementation is terrible. On every event, the library scans every event that the inotify interface could possibly report, and checks to see if your class implements one of several possible appropriately named methods. This means it’s traversing a 20-element dict, and performing up to 60 attribute lookups (of which up to 40 are based on %-formatted names), for every reported event.

This has disastrous performance implications. If you write a simple monitoring tool that uses pyinotify, use it to monitor activity in a Linux kernel source tree, and then start a build in that tree, try running top while your build runs. When I did this, I found that pyinotify was consuming an entire CPU trying to keep up with the flood of notification events.
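The cost pattern described above can be sketched in a few lines. This is not pyinotify's code: the event list, the `Handler` class and both dispatch functions are hypothetical, and the lookup table is just one possible way to avoid repeating the work on every event.

```python
EVENT_NAMES = ["IN_CREATE", "IN_DELETE", "IN_MODIFY"]  # hypothetical subset


class Handler:
    def process_IN_CREATE(self, event):
        return "created " + event


def dispatch_slow(handler, name, event):
    # Scans every known event and formats a method name on every call.
    for candidate in EVENT_NAMES:
        method = getattr(handler, "process_%s" % candidate, None)
        if candidate == name and method is not None:
            return method(event)
    return None


def build_table(handler):
    # The same lookups, performed once up front.
    table = {}
    for candidate in EVENT_NAMES:
        method = getattr(handler, "process_%s" % candidate, None)
        if method is not None:
            table[candidate] = method
    return table


def dispatch_fast(table, name, event):
    method = table.get(name)
    return method(event) if method is not None else None


handler = Handler()
table = build_table(handler)
```

With the table built once, each event costs a single dict lookup rather than a scan with string formatting.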

Locking

All that needless attribute lookup churn isn’t the only problem: pyinotify uses a threading.RLock to protect every access to every attribute of its Watch class, by providing its own __getattribute__ and __setattr__ methods.

I can’t guess what the author thinks he’s protecting himself from, but he’s got a solid defence mounted against both correctness and performance there. (Blindly locking individual attributes isn’t going to protect the consistency of an entire data structure, and delegating responsibility for locking out to callers, who are probably all single-threaded anyway, might help to recover a bit of the execrable performance. Watch isn’t often on the fast path, thank goodness.)

Is it possible to do better?

A potential rejoinder to my performance criticisms is that Python isn’t a fast language. However, this doesn’t bear up in general: I’ve written plenty of nippy Python code. In this particular case, in response to my mounting horror at reading and fixing the pyinotify source, I wrote bindings of my own. In contrast to pyinotify consuming an entire CPU during moderately heavy filesystem activity, an app using my bindings consumes about 5% of a CPU, even in the face of intensive activities like untarring a big file archive.

In part, this is because my bindings are less abstracted than those of pyinotify. I don’t dispatch out to user methods at all; the caller is responsible for checking a bitmask instead. The readability of application code isn’t really affected by this, but stripping out all the cruft massively improves performance.
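Checking a bitmask looks roughly like the sketch below. The constant values are the ones defined in the Linux `<sys/inotify.h>` header; the `describe` function is a hypothetical caller, not part of any binding.

```python
# Event bits as defined in <sys/inotify.h>.
IN_MODIFY = 0x00000002
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200
IN_ISDIR = 0x40000000


def describe(mask):
    """Decode an event mask with plain bit tests, no method dispatch."""
    kinds = []
    if mask & IN_CREATE:
        kinds.append("create")
    if mask & IN_DELETE:
        kinds.append("delete")
    if mask & IN_MODIFY:
        kinds.append("modify")
    if mask & IN_ISDIR:
        kinds.append("dir")
    return kinds


new_directory = describe(IN_CREATE | IN_ISDIR)
```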

In addition, the application itself is also responsible for using the library in an informed way. To get decent performance with inotify, you must delay calls to read so that the kernel has a chance to aggregate multiple notifications into a single buffer write. In other words, if a call to poll says “you’ve got events”, you have to wait a good fraction of a second before seeing what they are. I provide a Threshold class to help with this.
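The delaying logic can be sketched as follows. This is not the Threshold class mentioned above, whose interface is not shown here: `ReadDelay`, its parameters and its default values are hypothetical. The caller is assumed to know how many bytes are pending, for example from a FIONREAD ioctl on the inotify file descriptor.

```python
import time


class ReadDelay:
    """Hypothetical helper: postpone read() until events have had time
    to accumulate, or until enough bytes are already pending."""

    def __init__(self, delay=0.25, max_pending=4096, clock=time.monotonic):
        self.delay = delay
        self.max_pending = max_pending
        self.clock = clock
        self._first_seen = None

    def ready(self, pending_bytes):
        """Return True when the caller should go ahead and read."""
        if pending_bytes <= 0:
            self._first_seen = None
            return False
        now = self.clock()
        if self._first_seen is None:
            self._first_seen = now
        waited_long_enough = now - self._first_seen >= self.delay
        if pending_bytes >= self.max_pending or waited_long_enough:
            self._first_seen = None
            return True
        return False
```

Passing the clock in as a parameter keeps the timing logic testable without sleeping.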

While it is certainly possible to call into pyinotify in a similarly informed way, I suspect that all its flab and abstraction will gull the unwary coder into thinking that maybe they’re not writing performance-critical code after all, when in fact they are.

There are other Python inotify interfaces available. One is, like mine, named python-inotify, but a quick glance at its source code revealed some of the same silliness with unnecessary locking that plagues pyinotify, so I quickly averted my eyes. There’s also a Python API to gamin. I have no opinion about it, beyond not wanting to run another daemon if I can avoid it.

My general advice would be to avoid writing code that involves monitoring filesystem activity. It’s all too easy to write code that looks sensible, but is actually racy, usually under circumstances that are difficult to reproduce. Tuning performance without introducing more races or bugs is tough. You’re getting the idea now: hard! scary! find something fun instead!

The corollary to this is, of course, that as a user, you ought to be suspicious of any programs you use that monitor filesystem activity. I bet the Beagle and Google Desktop teams have armloads of horror stories.

How to build safe, clean Python 2.5 RPMs for Fedora Core 6

Bryan O'Sullivan — Sat, 23 Dec 2006 00:18:54 +0000

Since FC6 ships with Python 2.4, you’re a bit stuck if you want to play with the new features of Python 2.5. Here’s a quick and easy way to build and install a cleanly-packaged version of Python 2.5 for FC6. First, you must ensure that you have a sufficient development environment available. Fortunately, you can do this in one step. Note: this is the only command you’ll need to run with root privileges until the time comes for you to install the Python RPM that you’ve built.

$ sudo yum install autoconf bzip2-devel db4-devel \
  expat-devel findutils gcc-c++ gdbm-devel glibc-devel gmp-devel \
  libGL-devel libX11-devel libtermcap-devel ncurses-devel \
  openssl-devel pkgconfig readline-devel sqlite-devel tar \
  tix-devel tk-devel zlib-devel

(That’s one long line of input.) This will trundle along for a few minutes, after which you’ll have all of the bits you need installed. Except for Python itself, that is. Simply grab this, in source RPM form, from your nearest friendly Rawhide repository.

lftp ftp://mirrors.kernel.org/fedora/core/development/source/SRPMS

mget python-2*.src.rpm

Next, install the Python source RPM into a temporary build directory of your choice. In this example, I’ll use “/tmp/mypy”.

mkdir -p /tmp/mypy/{BUILD,RPMS,SOURCES,SPECS}

rpm --define '_topdir /tmp/mypy' -ivh python-2*.src.rpm

Now you’ll need to go into the SOURCES directory and frob a single file:

cd /tmp/mypy/SOURCES

sed -i -e 's/DBLIBVER=4.5/DBLIBVER=4.3/' python-2.5-config.patch

This tells the bsddb module to link against Berkeley DB 4.3 (the default on FC6), rather than 4.5 (which will presumably ship with Fedora 7). The next step is to build the Python RPM.

cd /tmp/mypy/SPECS

rpmbuild --define '_topdir /tmp/mypy' --define '__python_ver 25' -bb python.spec

This takes just a few minutes on my laptop, so it shouldn’t take long for you, either. Once you’re done, the binary RPMs will be present somewhere under /tmp/mypy/RPMS. On a 32-bit x86 machine, they’ll be in the i386 subdirectory, and on an x86_64 machine, they’ll be in the x86_64 subdirectory. You’ll have to become root to install them:

sudo rpm -ivh /tmp/mypy/RPMS/*/*.rpm

A nice aspect of this way of building is that the packages it builds should not conflict with the system’s default Python, so you ought not to have any peculiar explosions in one of the many system packages that expect a specific Python version. Your new “python” package will be named “python25”, for example, and the interpreter will be named “python25”, too.

Delicious Python

Bryan O'Sullivan — Mon, 13 Jun 2005 05:47:48 +0000

Or why I love popular scripting languages, reason number one zillion.

I use Sage with Firefox to keep up with various blogs, and del.icio.us as a URL dumping ground.

It took me approximately five minutes to find a Python interface to del.icio.us and write a script that turns sets of tagged URLs into an OPML file that I can either drop into Sage or post to my blog:

    import delicious
    deli = delicious.DeliciousNOTAPI()
    blogs = deli.get_posts_by_user('bos', 'blog')
    print ''' del.icio.us OPML Export '''
    for blog in blogs:
        blog['description'] = blog['description'].lower()
    blogs.sort(lambda a, b: cmp(a['description'], b['description']))
    for blog in blogs:
        print '' % \
            (blog['description'], blog.get('extended', ''), blog['url'])
    print ''
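For readers who want something runnable today, here is a sketch of the same idea in modern Python using only the standard library. It is an assumption-laden reconstruction, not the original script: the `to_opml` function, the OPML element layout and the `htmlUrl` attribute are guesses, and the posts are supplied as plain dicts rather than fetched from del.icio.us.

```python
import xml.etree.ElementTree as ET


def to_opml(title, posts):
    """Render post dicts as an OPML document (assumed layout)."""
    opml = ET.Element("opml", version="1.1")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    # Lowercase the descriptions and sort by them, as the script above does.
    for post in sorted(posts, key=lambda p: p["description"].lower()):
        ET.SubElement(
            body,
            "outline",
            text=post["description"].lower(),
            description=post.get("extended", ""),
            htmlUrl=post["url"],
        )
    return ET.tostring(opml, encoding="unicode")


sample_posts = [
    {"description": "Zed", "url": "http://z.example/"},
    {"description": "alpha & co", "extended": "", "url": "http://a.example/?x=1&y=2"},
]
document = to_opml("del.icio.us OPML Export", sample_posts)
```

Building the tree with ElementTree rather than string formatting means that characters such as `&` in titles and URLs are escaped automatically.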

Why Python is useless for serious XML processing

Bryan O'Sullivan — Fri, 22 Oct 2004 08:08:06 +0000

I have a Python application in which, for my sins, I decided to use XML as an on-disk storage format. Unfortunately, when I made this decision, I neglected to measure the performance of the available Python XML processing implementations.

Bad, bad, bad mistake. I expected that I was going to trade a little saved work for some performance, but when I finally got around to profiling my app today, to see why it was so slow, I was shocked.

Using the xml.sax module, I am able to process a 2.5MB document in 2.5 seconds on a reasonably fast Pentium 4 system. That gives me one megabyte per second of emphysema-wheezing parsing power. This number is so spectacularly, laughably bad that I actually spent several hours rechecking my measurements to see if I was doing something heinously stupid. I wasn’t–that is, beyond naïvely hoping for decent performance in the first place.
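A measurement of this kind can be sketched as follows. The document generator, the handler and the `measure` function are hypothetical stand-ins for the real application and its data, so the numbers they produce say nothing about the figures quoted above.

```python
import io
import time
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Counts start tags so the parser has some work to report."""

    def __init__(self):
        super().__init__()
        self.elements = 0

    def startElement(self, name, attrs):
        self.elements += 1


def make_document(records):
    # Synthetic input: a root element wrapping `records` small rows.
    rows = "".join('<row id="%d">value %d</row>' % (i, i) for i in range(records))
    return ("<rows>%s</rows>" % rows).encode("utf-8")


def measure(data):
    """Return (elements seen, megabytes parsed per second)."""
    handler = CountingHandler()
    started = time.perf_counter()
    xml.sax.parse(io.BytesIO(data), handler)
    elapsed = time.perf_counter() - started
    return handler.elements, (len(data) / 1e6) / elapsed


elements, rate = measure(make_document(1000))
```

The throughput is simply size divided by time, which is how 2.5MB in 2.5 seconds works out to one megabyte per second.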

Now, I could use PyRXP, and I have before, but it’s only about three times faster than xml.sax. I can chew through vastly more data using fp.write(repr(obj));eval(fp.read())!
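The repr and eval round trip looks roughly like this sketch. One deliberate substitution: it uses `ast.literal_eval` in place of `eval`, because `eval` will execute arbitrary code found in the file. The record and the in-memory file are made up for the example.

```python
import ast
import io

record = {"name": "example", "sizes": [1, 2, 3], "ratio": 0.5, "tags": ("a", "b")}

fp = io.StringIO()           # stands in for a file on disk
fp.write(repr(record))       # serialise as a Python literal
fp.seek(0)
restored = ast.literal_eval(fp.read())   # parse literals only, run no code
```

This only works for data made of plain literals (numbers, strings, tuples, lists, dicts and the like), which is a much narrower format than XML.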

I really need something that can parse tens of megabytes of data per second, so as far as I can tell, I simply can’t mix XML and Python at all. Sigh.