<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>teideal glic deisbhéalach &#187; python</title>
	<atom:link href="http://www.serpentine.com/blog/category/software/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.serpentine.com/blog</link>
	<description>Bryan O&#039;Sullivan&#039;s blog</description>
	<lastBuildDate>Thu, 01 Dec 2011 16:53:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>What&#8217;s in a text API?</title>
		<link>http://www.serpentine.com/blog/2009/06/30/python-and-haskell-text-apis-compare/</link>
		<comments>http://www.serpentine.com/blog/2009/06/30/python-and-haskell-text-apis-compare/#comments</comments>
		<pubDate>Tue, 30 Jun 2009 06:00:44 +0000</pubDate>
		<dc:creator>Bryan O'Sullivan</dc:creator>
				<category><![CDATA[haskell]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.serpentine.com/blog/?p=397</guid>
		<description><![CDATA[Now that I&#8217;ve got the DEFUN 2009 schedule sorted out (you are coming, aren&#8217;t you?), I&#8217;ve had time to take a breath and think about the Haskell text library again. Its API is currently a clone of the ancient and venerable Haskell list API. If you&#8217;ve used the list API to do much text processing, [...]]]></description>
			<content:encoded><![CDATA[<p>Now that I&#8217;ve got the <a href="http://www.defun2009.info/blog/tutorial-schedule/">DEFUN 2009 schedule</a> sorted out (you <i>are</i> coming, aren&#8217;t you?), I&#8217;ve had time to take a breath and think about the <a href="http://hackage.haskell.org/package/text">Haskell text library</a> again. Its API is currently a clone of the ancient and venerable Haskell list API. If you&#8217;ve used the list API to do much text processing, you&#8217;ve probably spilled more than a few tears into your whiskey. The <a href="http://hackage.haskell.org/package/bytestring">bytestring library</a> also mostly clones the list API, albeit with a few improvements. This state of affairs makes me somewhat sad: here we are with a fabulous language, but a 1991-era API for mangling text.</p>

<p>To put this state of affairs into perspective, here is a function-by-function comparison of the string manipulation APIs of Python 2.6 and Haskell. This is intentionally somewhat pessimistic: I focus on aspects of the Python API that are either absent from or not trivially reimplemented in Haskell, but not the reverse. (If the details that follow make your eyes glaze over, skip them and <a href="#continue">read on after the table</a> below.)</p>

<table>
<tr><td><b>Python</b></td><td><b>Haskell</b></td></tr>
<tr><td><tt>x + y</tt></td><td><tt>x `append` y</tt></td></tr>
<tr><td><tt>x in y</tt></td><td><tt>x `isInfixOf` y</tt></td></tr>
<tr><td><tt>x &lt; y</tt></td><td><tt>x &lt; y</tt></td></tr>
<tr><td><tt>x &lt;= y</tt></td><td><tt>x &lt;= y</tt></td></tr>
<tr><td><tt>x == y</tt></td><td><tt>x == y</tt></td></tr>
<tr><td><tt>x != y</tt></td><td><tt>x /= y</tt></td></tr>
<tr><td><tt>x &gt; y</tt></td><td><tt>x &gt; y</tt></td></tr>
<tr><td><tt>x &gt;= y</tt></td><td><tt>x &gt;= y</tt></td></tr>
<tr><td><tt>x % (...)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x[i] </tt></td><td><tt>x `index` i</tt></td></tr>
<tr><td><tt>x[i:j]</tt></td><td><tt>(j-i) `take` (i `drop` x)</tt></td></tr>
<tr><td><tt>hash(x)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>len(x)</tt></td><td><tt>length x</tt></td></tr>
<tr><td><tt>x * y</tt></td><td><tt>y `replicate` x</tt></td></tr>
<tr><td><tt>x.capitalize()</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.center(w)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.count(y)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.decode()</tt></td><td><tt>decode...</tt> family</td></tr>
<tr><td><tt>x.encode()</tt></td><td><tt>encode...</tt> family</td></tr>
<tr><td><tt>x.endswith(y)</tt></td><td><tt>y `isSuffixOf` x</tt></td></tr>
<tr><td><tt>x.expandtabs()</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.find(y)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.format(...)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.index(y)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.isalnum()</tt></td><td><tt>all isAlphaNum x</tt></td></tr>
<tr><td><tt>x.isalpha()</tt></td><td><tt>all isAlpha x</tt></td></tr>
<tr><td><tt>x.isdigit()</tt></td><td><tt>all isDigit x</tt></td></tr>
<tr><td><tt>x.islower()</tt></td><td><tt>all isLower x</tt></td></tr>
<tr><td><tt>x.isspace()</tt></td><td><tt>all isSpace x</tt></td></tr>
<tr><td><tt>x.istitle()</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.isupper()</tt></td><td><tt>all isUpper x</tt></td></tr>
<tr><td><tt>x.join(y)</tt></td><td><tt>intercalate x y</tt></td></tr>
<tr><td><tt>x.ljust(w)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.lower()</tt></td><td><tt>toLower x</tt></td></tr>
<tr><td><tt>x.lstrip()</tt></td><td><tt>dropWhile isSpace x</tt></td></tr>
<tr><td><tt>x.partition(y)</tt></td><td><tt>break (==y) x</tt></td></tr>
<tr><td><tt>x.replace(y,z)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.rfind(y)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.rindex(y)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.rjust(w)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.rpartition(y)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.rsplit(y)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.rstrip(y)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.split(y)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.splitlines()</tt></td><td><tt>lines x</tt></td></tr>
<tr><td><tt>x.startswith(y)</tt></td><td><tt>y `isPrefixOf` x</tt></td></tr>
<tr><td><tt>x.strip()</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.swapcase()</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.title()</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.translate(y)</tt></td><td><tt></tt></td></tr>
<tr><td><tt>x.upper()</tt></td><td><tt>toUpper x</tt></td></tr>
<tr><td><tt>x.zfill(w)</tt></td><td><tt></tt></td></tr>
</table>
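<p>The <tt>x.partition(y)</tt> row above is only an approximation, and a small model makes the gap visible. In this Python 3 sketch, <tt>break_</tt> and <tt>partition_via_break</tt> are hypothetical helpers: <tt>break_</tt> mimics Haskell&#8217;s <tt>break</tt> on a string, and <tt>partition_via_break</tt> rebuilds <tt>str.partition</tt> from it. That only works when the separator is a single character; a per-character predicate cannot express a separator such as <tt>"::"</tt>.</p>

```python
def break_(pred, s):
    """Model of Haskell's break: split s before the first char satisfying pred."""
    for i, ch in enumerate(s):
        if pred(ch):
            return s[:i], s[i:]
    return s, ""


def partition_via_break(sep, s):
    """Rebuild str.partition from break_; only valid for a one-character sep."""
    if len(sep) != 1:
        raise ValueError("a per-character predicate cannot match %r" % sep)
    before, rest = break_(lambda ch: ch == sep, s)
    # break_ leaves the separator at the front of rest; peel it off.
    return before, rest[:1], rest[1:]


print(partition_via_break("=", "a=b=c"))  # ('a', '=', 'b=c')
print("a=b=c".partition("="))             # ('a', '=', 'b=c')
print("a::b".partition("::"))             # ('a', '::', 'b')
```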
<a name="continue"/>
<p>For now, I&#8217;m intentionally not looking at Python&#8217;s <a href="http://docs.python.org/library/unicodedata.html"><tt>unicodedata</tt></a> or <a href="http://docs.python.org/library/string.html"><tt>string</tt></a> modules, even though each contains a handful of additional useful functions.</p>

<p>How would I broadly categorise what&#8217;s missing from the current Haskell APIs?</p>
<ol>
	<li>Formatting. The <a href="http://docs.python.org/library/string.html#formatstrings"><tt>format</tt></a> method that&#8217;s new in Python 2.6 is well designed and extremely useful. While there are a few formatting libraries on Hackage, each has flaws that I think are substantial enough to make it undesirable for wide use. The shortcomings I have in mind are a lack of static type safety and a poor fit with automated translation tools.</li>
	<li>Searching and splitting text. The Haskell APIs are based on predicates over individual characters, whereas what&#8217;s usually needed is predicates over <i>strings</i>. In other words, don&#8217;t just find me a character; find me a substring.</li>
	<li>Parsing. I&#8217;m not overly concerned about this, since Haskell&#8217;s libraries far outshine those of Python in this area. Although they currently lack support for the text library, the <a href="http://hackage.haskell.org/package/parsec">Parsec</a> and <a href="http://hackage.haskell.org/package/attoparsec">attoparsec</a> libraries will acquire it, I&#8217;m sure, as soon as there&#8217;s demand. What <em>would</em> be welcome is a decent Unicode-capable regular expression engine, for those times when you just <em>have</em> to get yourself into trouble in the name of expediency.</li>
</ol>
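<p>To make the formatting point concrete, here is a short Python 3 sketch of what <tt>str.format</tt> offers: positional and named fields, alignment and width, and format specifications such as percentages. Numbered fields also let a translated template reorder its arguments, which matters when a translation needs a different word order. The <tt>report</tt> function is a hypothetical example; because every positional field is explicitly numbered, Python 2.6 should accept the same templates.</p>

```python
def report(name, count, ratio):
    # {0:<8}: first positional argument, left-aligned in 8 columns.
    # {count:>5}: named field, right-aligned in 5 columns.
    # {ratio:6.1%}: multiplied by 100, one decimal place, then a percent sign.
    return "{0:<8}|{count:>5}|{ratio:6.1%}".format(name, count=count, ratio=ratio)


print(report("text", 42, 0.125))  # text    |   42| 12.5%

# Numbered fields let a translated template put arguments in a different order.
english = "{0} of {1}"
reordered = "{1}: {0}"
print(english.format("page 3", "chapter 2"))    # page 3 of chapter 2
print(reordered.format("page 3", "chapter 2"))  # chapter 2: page 3
```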

<p>I plan to address each of these areas over the coming months, and before I implement anything, I&#8217;ll write up here the APIs I intend to flesh out, to solicit feedback from the community. One step I&#8217;ll probably take, for instance, is to move a few of the functions in the <tt>Data.Text</tt> module that clone the list API into a new module, <tt>Data.Text.Legacy</tt>, so that I can reuse the same function names in <tt>Data.Text</tt>, but with more useful types. For example, I&#8217;d be inclined to move <tt>split :: Char -&gt; Text -&gt; [Text]</tt> into the legacy module, and replace it with <tt>split :: Text -&gt; Text -&gt; [Text]</tt>.</p>
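<p>The difference between the two <tt>split</tt> types is easiest to see side by side. This Python 3 sketch is only an illustration, and both function names are hypothetical: <tt>split_when</tt> splits wherever a single character passes a test, in the style of the current character-based API, while <tt>split_on</tt> splits on a whole separator string, roughly what <tt>split :: Text -&gt; Text -&gt; [Text]</tt> is meant to provide. For non-empty separators, <tt>split_on</tt> should agree with Python&#8217;s own <tt>str.split(sep)</tt>.</p>

```python
def split_when(pred, text):
    """Split wherever a single character satisfies pred (character-level API)."""
    pieces, current = [], []
    for ch in text:
        if pred(ch):
            pieces.append("".join(current))
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current))
    return pieces


def split_on(sep, text):
    """Split at each non-overlapping occurrence of the substring sep."""
    if not sep:
        raise ValueError("empty separator")
    pieces, start = [], 0
    while True:
        i = text.find(sep, start)
        if i == -1:
            pieces.append(text[start:])
            return pieces
        pieces.append(text[start:i])
        start = i + len(sep)


print(split_when(lambda c: c == ",", "x, y,z"))  # ['x', ' y', 'z']
print(split_on(", ", "x, y,z"))                   # ['x', 'y,z']
print(split_on("\r\n", "a\r\nb\r\n"))             # ['a', 'b', '']
```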

<p>There&#8217;s something of a tension between the goals of providing a small, focused text library and getting all the API details right in a way that will make it truly useful. I find the proliferation of tiny libraries on Hackage, each providing a few little pieces of missing functionality, pretty dispiriting when I want to dig in and produce useful application code quickly, so I intend for the text and <a href="http://hackage.haskell.org/package/text-icu">text-icu</a> libraries to be broadly useful from the get-go.</p>

<p>If you have opinions, or better yet patches, to contribute, let&#8217;s get things rolling!</p>]]></content:encoded>
			<wfw:commentRss>http://www.serpentine.com/blog/2009/06/30/python-and-haskell-text-apis-compare/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Why you should not use pyinotify</title>
		<link>http://www.serpentine.com/blog/2008/01/04/why-you-should-not-use-pyinotify/</link>
		<comments>http://www.serpentine.com/blog/2008/01/04/why-you-should-not-use-pyinotify/#comments</comments>
		<pubDate>Sat, 05 Jan 2008 01:18:50 +0000</pubDate>
		<dc:creator>Bryan O'Sullivan</dc:creator>
				<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.serpentine.com/blog/2008/01/04/why-you-should-not-use-pyinotify/</guid>
		<description><![CDATA[A while ago, I had a need to monitor filesystem modifications, and I looked around for Python bindings for the Linux kernel&#8217;s inotify subsystem. At the time, the only existing library was pyinotify, so being a lazy sort, I naturally tried to use it. On first glance, the documentation seems impressive, and the API looks [...]]]></description>
			<content:encoded><![CDATA[<p>A while ago, I had a need to monitor filesystem modifications, and I looked around for Python bindings for the Linux kernel&#8217;s <a href="http://en.wikipedia.org/wiki/Inotify">inotify</a> subsystem. At the time, the only existing library was <a href="http://pyinotify.sourceforge.net/">pyinotify</a>, so being a lazy sort, I naturally tried to use it.</p>
<p>At first glance, the documentation seems impressive, and the API looks reasonable. Effective use of inotify is a subtle affair, however, and pyinotify is not, shall we say, the best tool for the job. Its problems are hard to spot from the outside, though, so here are a few notes from my experience.</p>
<h3>Correctness</h3>
<p>A program using pyinotify can easily lose track of parts of its directory hierarchy. The library doesn&#8217;t raise an <code>OSError</code> exception if the <code>inotify_add_watch</code> system call fails: instead, it propagates the <code>-1</code> error result up to the caller as a value in a <code>dict</code>, but without the value of <code>errno</code> to tell the caller why the error occurred.</p>
<p>It&#8217;s thus trivial to miss errors entirely, because the usual mechanism of raising exceptions isn&#8217;t used. Almost as bad, it&#8217;s impossible to distinguish between recoverable (tried to add a watch on a directory that no longer exists) and fatal (hit the system <code>max_user_watches</code> limit) errors.</p>
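<p>Here is a minimal Python 3 sketch of error handling that avoids both problems, assuming a Linux system where <tt>ctypes</tt> can load the C library. The helper names (<tt>check</tt>, <tt>add_watch</tt>, <tt>is_recoverable</tt>, <tt>watch_all</tt>) are hypothetical, not taken from pyinotify or any other binding. A failed <tt>inotify_add_watch</tt> becomes an <tt>OSError</tt> carrying <tt>errno</tt>, so a caller can skip a path that vanished (<tt>ENOENT</tt>, <tt>ENOTDIR</tt>) while letting <tt>ENOSPC</tt>, which the kernel reports when the watch limit is reached, propagate as fatal.</p>

```python
import ctypes
import ctypes.util
import errno
import os


def check(ret, path):
    """Turn a C-style -1 return value into an OSError that carries errno."""
    if ret == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return ret


def add_watch(fd, path, mask):
    """Call inotify_add_watch(2) through ctypes; Linux only."""
    # use_errno=True makes ctypes save errno so get_errno() can read it.
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    wd = libc.inotify_add_watch(fd, os.fsencode(path), ctypes.c_uint32(mask))
    return check(wd, path)


# A path that disappeared is worth skipping; anything else (e.g. ENOSPC) is fatal.
RECOVERABLE = frozenset([errno.ENOENT, errno.ENOTDIR])


def is_recoverable(exc):
    return exc.errno in RECOVERABLE


def watch_all(fd, paths, mask):
    """Add watches, skipping vanished paths and re-raising fatal errors."""
    watches = {}
    for path in paths:
        try:
            watches[path] = add_watch(fd, path, mask)
        except OSError as exc:
            if not is_recoverable(exc):
                raise
    return watches
```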
<h3>Performance</h3>
<p>To a regular Python hacker, the interface that pyinotify provides will probably look reasonable. If you want to handle some kind of event, just write a method that will get invoked with an Event object when that event occurs. How reassuringly normal.</p>
<p>Under the hood, though, the implementation is terrible. On every event, the library scans every event type that inotify could possibly report, and checks whether your class implements one of several possible appropriately named methods. This means it&#8217;s traversing a 20-element <code>dict</code>, and performing up to 60 attribute lookups (of which up to 40 are based on <code>%</code>-formatted names), for every reported event.</p>
<p>This has disastrous performance implications. Write a simple monitoring tool that uses pyinotify, point it at a Linux kernel source tree, start a build in that tree, and run <code>top</code> while the build runs. When I did this, I found that pyinotify was consuming an <em>entire CPU</em> trying to keep up with the flood of notification events.</p>
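<p>Name-based dispatch need not cost this much per event. One alternative, sketched below in Python 3, is to resolve handler methods once, when the handler is registered, so that each event costs only a few integer tests. The <tt>process_</tt> naming scheme and the <tt>Dispatcher</tt> class are hypothetical, not pyinotify&#8217;s API; the mask values are the Linux <tt>IN_*</tt> constants.</p>

```python
# Linux inotify mask bits (values from sys/inotify.h).
IN_MODIFY = 0x00000002
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200

EVENT_NAMES = {IN_MODIFY: "modify", IN_CREATE: "create", IN_DELETE: "delete"}


class Dispatcher(object):
    def __init__(self, handler):
        # Look up handler methods once, at registration time.
        self._table = []
        for bit, name in sorted(EVENT_NAMES.items()):
            method = getattr(handler, "process_" + name, None)
            if method is not None:
                self._table.append((bit, method))

    def dispatch(self, mask, path):
        # Per event: a few bitwise ANDs, with no string formatting or getattr.
        for bit, method in self._table:
            if mask & bit:
                method(path)


class Printer(object):
    def process_create(self, path):
        print("created", path)


Dispatcher(Printer()).dispatch(IN_CREATE | IN_MODIFY, "foo.c")  # created foo.c
```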
<h3>Locking</h3>
<p>All that needless attribute lookup churn isn&#8217;t the only problem: pyinotify uses a <code>threading.RLock</code> to protect every access to every attribute of its <code>Watch</code> class, by providing its own <code>__getattribute__</code> and <code>__setattr__</code> methods.</p>
<p>I can&#8217;t guess what the author thinks he&#8217;s protecting himself from, but he&#8217;s got a solid defence mounted against both correctness and performance there. (Blindly locking individual attributes isn&#8217;t going to protect the consistency of an entire data structure, and delegating responsibility for locking out to callers, <em>who are probably all single-threaded anyway</em>, might help to recover a bit of the execrable performance. <code>Watch</code> isn&#8217;t often on the fast path, thank goodness.)</p>
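<p>To see why per-attribute locking does not protect an object&#8217;s consistency, consider this Python 3 sketch of the pattern. It is a hypothetical class, not pyinotify&#8217;s actual <code>Watch</code> code. Every attribute read and write takes the lock separately, so a compound update such as <code>w.mask |= bits</code> is a locked read followed by a separately locked write, and another thread can run in between. The acquisition counter exists only to make that visible.</p>

```python
import threading


class PerAttributeLocked(object):
    """Takes a lock around each individual attribute access (illustration only)."""

    def __init__(self):
        object.__setattr__(self, "_lock", threading.RLock())
        object.__setattr__(self, "_acquisitions", 0)

    def _acquire(self):
        lock = object.__getattribute__(self, "_lock")
        lock.acquire()
        # Count acquisitions so the separate lock/unlock steps are visible.
        count = object.__getattribute__(self, "_acquisitions")
        object.__setattr__(self, "_acquisitions", count + 1)
        return lock

    def __getattribute__(self, name):
        if name.startswith("_"):
            # Internal names bypass the lock to avoid infinite recursion.
            return object.__getattribute__(self, name)
        lock = self._acquire()
        try:
            return object.__getattribute__(self, name)
        finally:
            lock.release()

    def __setattr__(self, name, value):
        lock = self._acquire()
        try:
            object.__setattr__(self, name, value)
        finally:
            lock.release()


w = PerAttributeLocked()
w.mask = 0
before = w._acquisitions
w.mask |= 0x100  # a locked read, the lock released, then a locked write
print(w._acquisitions - before)  # 2
```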
<h3>Is it possible to do better?</h3>
<p>A potential rejoinder to my performance criticisms is that Python isn&#8217;t a fast language. However, this doesn&#8217;t hold up in general: I&#8217;ve written plenty of nippy Python code. In this particular case, in response to my mounting horror at reading and fixing the pyinotify source, I wrote <a href="http://hg.serpentine.com/python-inotify/">bindings of my own</a>. Where pyinotify consumes an entire CPU during moderately heavy filesystem activity, an app using my bindings consumes about 5% of a CPU, even in the face of intensive activity such as untarring a big file archive.</p>
<p>In part, this is because my bindings are less abstracted than those of pyinotify. I don&#8217;t dispatch out to user methods at all; the caller is responsible for checking a bitmask instead. The readability of application code isn&#8217;t really affected by this, but stripping out all the cruft massively improves performance.</p>
<p>In addition, the application itself is also responsible for using the library in an informed way. To get decent performance with inotify, you <em>must</em> delay calls to <code>read</code> so that the kernel has a chance to aggregate multiple notifications into a single buffer write. In other words, if a call to <code>poll</code> says &#8220;you&#8217;ve got events&#8221;, you have to wait a good fraction of a second before seeing what they are. I provide a <code>Threshold</code> class to help with this.</p>
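<p>A minimal Python 3 sketch of that delay uses a plain fixed pause; the <code>Threshold</code> class mentioned above may well do something smarter. It waits until <code>poll</code> reports the descriptor readable, sleeps briefly so more events can pile up, then fetches everything in a single <code>read</code>. The function name and default values are hypothetical. Since it works on any pollable descriptor, it is easy to try with a pipe.</p>

```python
import os
import select
import time


def read_batched(fd, delay=0.1, bufsize=65536, timeout_ms=None):
    """Return bytes from fd, pausing after readiness so events can accumulate.

    Returns b"" if nothing arrives within timeout_ms (None waits forever).
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    if not poller.poll(timeout_ms):
        return b""
    # Readiness means at least one event is queued; waiting a fraction of a
    # second lets more events queue up, so one read returns many of them.
    time.sleep(delay)
    return os.read(fd, bufsize)


r, w = os.pipe()
os.write(w, b"event-1 ")
os.write(w, b"event-2")
print(read_batched(r, delay=0.01))  # b'event-1 event-2'
```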
<p>While it is certainly possible to call into pyinotify in a similarly informed way, I suspect that all its flab and abstraction will gull the unwary coder into thinking that maybe they&#8217;re not writing performance-critical code after all, when in fact they are.</p>
<p>There are other Python inotify interfaces available. One is, like mine, named python-inotify, but a quick glance at its source code revealed some of the same silliness with unnecessary locking that plagues pyinotify, so I averted my eyes. There&#8217;s also a Python API to <a href="http://www.gnome.org/~veillard/gamin/">gamin</a>. I have no opinion about it, beyond not wanting to run another daemon if I can avoid it.</p>
<p>My general advice would be to avoid writing code that involves monitoring filesystem activity. It&#8217;s all too easy to write code that looks sensible, but is actually racy, usually under circumstances that are difficult to reproduce. Tuning performance without introducing more races or bugs is tough. You&#8217;re getting the idea now: <em>hard! scary! find something fun instead!</em></p>
<p>The corollary to this is, of course, that as a user, you ought to be suspicious of any programs you use that monitor filesystem activity. I bet the Beagle and Google Desktop teams have <em>armloads</em> of horror stories.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.serpentine.com/blog/2008/01/04/why-you-should-not-use-pyinotify/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>How to build safe, clean Python 2.5 RPMs for Fedora Core 6</title>
		<link>http://www.serpentine.com/blog/2006/12/22/how-to-build-safe-clean-python-25-rpms-for-fedora-core-6/</link>
		<comments>http://www.serpentine.com/blog/2006/12/22/how-to-build-safe-clean-python-25-rpms-for-fedora-core-6/#comments</comments>
		<pubDate>Sat, 23 Dec 2006 00:18:54 +0000</pubDate>
		<dc:creator>Bryan O'Sullivan</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.serpentine.com/blog/2006/12/22/how-to-build-safe-clean-python-25-rpms-for-fedora-core-6/</guid>
		<description><![CDATA[Since FC6 ships with Python 2.4, you&#8217;re a bit stuck if you want to play with the new features of Python 2.5. Here&#8217;s a quick and easy way to build and install a cleanly-packaged version of Python 2.5 for FC6. First, you must ensure that you have a sufficient development environment available. Fortunately, you can [...]]]></description>
			<content:encoded><![CDATA[Since FC6 ships with Python 2.4, you&#8217;re a bit stuck if you want to play with the new features of Python 2.5. Here&#8217;s a quick and easy way to build and install a cleanly packaged version of Python 2.5 for FC6.

First, you must ensure that you have a sufficient development environment available. Fortunately, you can do this in one step. Note: this is the <em>only</em> command you&#8217;ll need to run with root privileges until the time comes for you to install the Python RPM that you&#8217;ve built.
<blockquote>
<pre><code>$ sudo yum install autoconf bzip2-devel db4-devel \
  expat-devel findutils gcc-c++ gdbm-devel glibc-devel gmp-devel \
  libGL-devel libX11-devel libtermcap-devel ncurses-devel \
  openssl-devel pkgconfig readline-devel sqlite-devel tar \
  tix-devel tk-devel zlib-devel</code></pre>
</blockquote>
(That&#8217;s one long line of input.) This will trundle along for a few minutes, after which you&#8217;ll have all of the bits you need installed. Except for Python itself, that is. Simply grab this, in source RPM form, from your nearest friendly Rawhide repository.
<blockquote>
<pre><code>lftp ftp://mirrors.kernel.org/fedora/core/development/source/SRPMS
mget python-2*.src.rpm</code></pre>
</blockquote>
Next, install the Python source RPM into a temporary build directory of your choice. In this example, I&#8217;ll use &#8220;/tmp/mypy&#8221;.
<blockquote>
<pre><code>mkdir -p /tmp/mypy/{BUILD,RPMS,SOURCES,SPECS}
rpm --define '_topdir /tmp/mypy' -ivh python-2*.src.rpm</code></pre>
</blockquote>
Now you&#8217;ll need to go into the SOURCES directory and frob a single file:
<blockquote>
<pre><code>cd /tmp/mypy/SOURCES
sed -i -e 's/DBLIBVER=4.5/DBLIBVER=4.3/' python-2.5-config.patch</code></pre>
</blockquote>
This tells the bsddb module to link against Berkeley DB 4.3 (the default on FC6), rather than 4.5 (which will presumably ship with Fedora 7).

The next step is to build the Python RPM.
<blockquote>
<pre><code>cd /tmp/mypy/SPECS
rpmbuild --define '_topdir /tmp/mypy' --define '__python_ver 25' -bb python.spec</code></pre>
</blockquote>
This takes just a few minutes on my laptop, so it shouldn&#8217;t take long for you, either. Once you&#8217;re done, the binary RPMs will be present somewhere under /tmp/mypy/RPMS. On a 32-bit x86 machine, they&#8217;ll be in the i386 subdirectory, and on an x86_64 machine, they&#8217;ll be in the x86_64 subdirectory. You&#8217;ll have to become root to install them:
<blockquote>
<pre><code>sudo rpm -ivh /tmp/mypy/RPMS/*/*.rpm </code></pre>
</blockquote>
A nice aspect of this way of building is that the packages it builds <em>should not conflict with</em> the system&#8217;s default Python, so you ought not to have any peculiar explosions in one of the many system packages that expect a specific Python version. Your new &#8220;python&#8221; package will be named &#8220;python25&#8221;, for example, and the interpreter will be named &#8220;python25&#8221;, too.]]></content:encoded>
			<wfw:commentRss>http://www.serpentine.com/blog/2006/12/22/how-to-build-safe-clean-python-25-rpms-for-fedora-core-6/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>Delicious Python</title>
		<link>http://www.serpentine.com/blog/2005/06/12/delicious-python/</link>
		<comments>http://www.serpentine.com/blog/2005/06/12/delicious-python/#comments</comments>
		<pubDate>Mon, 13 Jun 2005 05:47:48 +0000</pubDate>
		<dc:creator>Bryan O'Sullivan</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://home.serpentine.com/blog/2005/06/12/delicious-python/</guid>
		<description><![CDATA[Or why I love popular scripting languages, reason number one zillion. I use Sage with Firefox to keep up with various blogs, and del.icio.us as a URL dumping ground. It took me approximately five minutes to find a Python interface to del.icio.us and write a script that turns sets of tagged URLs into an OPML [...]]]></description>
			<content:encoded><![CDATA[<p>Or why I love popular scripting languages, reason number one zillion.</p>
<p>I use <a href="http://sage.mozdev.org/">Sage</a> with <a href="http://www.getfirefox.com/">Firefox</a> to keep up with various blogs, and <a href="http://del.icio.us"><tt>del.icio.us</tt></a> as a URL dumping ground.</p>
<p>It took me approximately five minutes to find a Python interface to <tt>del.icio.us</tt> and write a script that turns sets of tagged URLs into an OPML file that I can either drop into Sage or post to my blog:</p>
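<pre>import delicious
from xml.sax.saxutils import quoteattr

deli = delicious.DeliciousNOTAPI()
blogs = deli.get_posts_by_user('bos', 'blog')

print('&lt;?xml version="1.0"?&gt;')
print('&lt;opml version="1.0"&gt;&lt;body&gt;')
# Sort case-insensitively by lowercasing each description first.
for blog in blogs:
    blog['description'] = blog['description'].lower()
blogs.sort(key=lambda blog: blog['description'])
for blog in blogs:
    # quoteattr() adds the quotes and escapes &amp;, &lt; and " in attribute values.
    # The outline attribute names are assumed; adjust them for your reader.
    print('&lt;outline text=%s description=%s xmlUrl=%s/&gt;' %
          (quoteattr(blog['description']),
           quoteattr(blog.get('extended', '')),
           quoteattr(blog['url'])))
print('&lt;/body&gt;&lt;/opml&gt;')</pre>]]></content:encoded>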
			<wfw:commentRss>http://www.serpentine.com/blog/2005/06/12/delicious-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why Python is useless for serious XML processing</title>
		<link>http://www.serpentine.com/blog/2004/10/22/why-python-is-useless-for-serious-xml-processing/</link>
		<comments>http://www.serpentine.com/blog/2004/10/22/why-python-is-useless-for-serious-xml-processing/#comments</comments>
		<pubDate>Fri, 22 Oct 2004 08:08:06 +0000</pubDate>
		<dc:creator>Bryan O'Sullivan</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://home.serpentine.com/blog/2004/10/22/why-python-is-useless-for-serious-xml-processing/</guid>
		<description><![CDATA[I have a Python application in which, for my sins, I decided to use XML as an on-disk storage format. Unfortunately, when I made this decision, I neglected to measure the performance of the available Python XML processing implementations. Bad, bad, bad mistake. I expected that I was going to trade a little saved work [...]]]></description>
			<content:encoded><![CDATA[<p>I have a Python application in which, for my sins, I decided to use XML as an on-disk storage format.  Unfortunately, when I made this decision, I neglected to measure the performance of the available Python XML processing implementations.</p>
<p>Bad, bad, <i>bad</i> mistake.  I expected that I was going to trade a little saved work for some performance, but when I finally got around to profiling my app today, to see why it was so slow, I was shocked.</p>
<p>Using the <a href="http://docs.python.org/lib/module-xml.sax.html"><code>xml.sax</code></a> module, I am able to process a 2.5MB document in 2.5 seconds on a reasonably fast Pentium 4 system.  That gives me one megabyte per second of emphysema-wheezing parsing power.  This number is so spectacularly, laughably bad that I actually spent several hours rechecking my measurements to see if I was doing something heinously stupid.  I wasn&#8217;t, unless you count naïvely hoping for decent performance in the first place.</p>
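<p>A throughput figure like this is easy to measure for yourself. The Python 3 sketch below uses hypothetical helper names: it generates a synthetic document with <code>XMLGenerator</code>, parses it with <code>xml.sax</code> and a handler that does nothing but count elements, and reports megabytes per second. The rate it prints depends entirely on the machine and the Python version.</p>

```python
import io
import time
import xml.sax
from xml.sax.saxutils import XMLGenerator


def make_doc(n):
    """Build a document with n small item elements under one root."""
    out = io.StringIO()
    gen = XMLGenerator(out, "utf-8")
    gen.startDocument()
    gen.startElement("root", {})
    for i in range(n):
        gen.startElement("item", {"id": str(i)})
        gen.characters("some text")
        gen.endElement("item")
    gen.endElement("root")
    gen.endDocument()
    return out.getvalue().encode("utf-8")


class CountingHandler(xml.sax.ContentHandler):
    """Does the least possible work per element."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.elements = 0

    def startElement(self, name, attrs):
        self.elements += 1


def measure(doc):
    """Parse doc once; return (element count, megabytes per second)."""
    handler = CountingHandler()
    start = time.perf_counter()
    xml.sax.parseString(doc, handler)
    elapsed = time.perf_counter() - start
    return handler.elements, len(doc) / 1e6 / max(elapsed, 1e-9)


doc = make_doc(20000)
count, rate = measure(doc)
print("%d elements, %.2f MB, %.1f MB/s" % (count, len(doc) / 1e6, rate))
```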
<p>Now, I could use <a href="http://www.reportlab.org/pyrxp.html">PyRXP</a>, and I have before, but it&#8217;s only about three times faster than <code>xml.sax</code>.  I can chew through vastly more data using <code>fp.write(repr(<i>obj</i>));eval(fp.read())</code>!</p>
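<p>For the record, the <code>repr</code>-then-<code>eval</code> round trip in that quip looks like this in Python 3. This sketch (with hypothetical helper names) reads the data back with <code>ast.literal_eval</code>, which accepts only Python literals such as strings, numbers, tuples, lists, dicts, booleans and <code>None</code>, instead of <code>eval</code>, which would execute any expression it found in the file. Either way, it only works for objects whose <code>repr</code> is itself a literal.</p>

```python
import ast
import os
import tempfile


def save(path, obj):
    with open(path, "w") as fp:
        fp.write(repr(obj))


def load(path):
    with open(path) as fp:
        # literal_eval refuses function calls, names and other arbitrary code.
        return ast.literal_eval(fp.read())


record = {"name": "serpentine", "tags": ["python", "xml"], "size": (2.5, "MB"), "ok": True}
path = os.path.join(tempfile.mkdtemp(), "record.txt")
save(path, record)
print(load(path) == record)  # True
```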
<p>I really need something that can parse tens of megabytes of data per second, so as far as I can tell, I simply can&#8217;t mix XML and Python at all.  Sigh.</p>]]></content:encoded>
			<wfw:commentRss>http://www.serpentine.com/blog/2004/10/22/why-python-is-useless-for-serious-xml-processing/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
	</channel>
</rss>

