The great Python vs ld.so smackdown

At work, I’m building an application in a combination of Python and C++. Working in Python is sheer pleasure, but trying to build a “shrink wrap” application in Python is a horrible task.

The problem is that in order to have any hope of shipping a single package that should work on a number of versions of a given Linux distribution, you must ship your own Python interpreter, and your own versions of any other packages that either have licensing issues or suffer from version drift.

In our case, we’re using PyQt (a Python interface to the Qt toolkit) for our user interface, and we have both licensing issues and version drift to contend with. We have to ship our own non-GPLed Qt library and PyQt modules, and statically link them into a single giant sack of bloat.

Yesterday, I made substantial progress on building the sack-o’-bloat, to the point where I had an actual binary that I could run. My excitement didn’t last long, as the program segfaulted during startup.

The problem turned out to be interesting. Because I like to live on the bleeding edge, I’m building with Python 2.4. Python has shipped with an interface module for Expat (an XML parser) for a long time, but I hadn’t realised that it actually ships with its own embedded version of Expat. The Expat module is implemented as a shared object, pyexpat.so, which contains both the Python binding code and Python’s copy of Expat.

Qt also links against Expat. Combine Python and Qt into a single binary, and you end up with a binary that has a dynamic dependency on libexpat.so. The dynamic linker, ld.so, starts the program happily enough. For each symbol it’s asked to resolve, it checks every mapped shared object, in load order, and uses the first definition it finds.
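To make the "first match wins" rule concrete, here is a toy model of the lookup in Python. It is only a sketch of the idea, not how ld.so is implemented: the `resolve` function, the object labels, and the symbol `XML_HypotheticalNewCall` are all made up for illustration.

```python
# Toy model of first-match symbol lookup across shared objects.
# Object labels and XML_HypotheticalNewCall are illustrative assumptions.

def resolve(symbol, load_order):
    """Return the label of the first object, in load order, that defines symbol."""
    for name, symbols in load_order:
        if symbol in symbols:
            return name
    return None  # nothing loaded defines the symbol


# The system library is mapped at startup, long before pyexpat.so is imported.
load_order = [
    ("libexpat.so (system, older)", {"XML_ParserCreate", "XML_Parse"}),
    ("pyexpat.so (bundled, newer)",
     {"XML_ParserCreate", "XML_Parse", "XML_HypotheticalNewCall"}),
]

# Symbols that both objects define are taken from the earlier, older library.
print(resolve("XML_Parse", load_order))
# Only symbols the older library lacks are taken from the bundled copy.
print(resolve("XML_HypotheticalNewCall", load_order))
```

The uncomfortable part is the mix: calls that exist in both copies land in the old library, while anything new lands in the bundled one.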

Since the dynamic linker opens libexpat.so long before pyexpat.so, it resolves Expat routines to the system library’s definitions instead of Python’s. Couple this with the fact that Python 2.4 bundles a newer version of Expat than even Fedora Core 3 ships with, and you can see the problem coming.

The upshot is that Python ends up calling Expat functions in the system’s older copy of the library, which has fewer features than the version Python was built against. Python never checks the return values of those functions, because it “knows” that they’re defined to always return a correct result, so nothing notices the mismatch before the crash.
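One quick way to check for this kind of mismatch is to ask the Expat module which library version it thinks it is talking to. The sketch below assumes a modern Python 3, and it relies on my understanding that CPython fills in `EXPAT_VERSION` and `version_info` by querying Expat at runtime rather than at build time. The helper name `expat_runtime_version` is my own.

```python
# Report the Expat version seen by Python's expat module.
# Assumption: these values reflect the Expat actually bound at runtime.
from xml.parsers import expat


def expat_runtime_version():
    """Return (version string, version tuple) as reported by the expat module."""
    return expat.EXPAT_VERSION, expat.version_info


version_string, version_tuple = expat_runtime_version()
print("Expat in use:", version_string, version_tuple)
```

If the version printed is older than the one bundled with your Python source tree, something other than Python’s own copy is answering the calls.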

And after all that, I wasn’t even the first to find this problem.

