New year, new library releases, new levels of speed

Posted on 2014-01-09 by Bryan O'Sullivan — 1 Comment ↓

I just released new versions of the Haskell text, attoparsec, and aeson libraries on Hackage, and there’s a surprising amount to look forward to in them.

The summary for the impatient: some core operations in text and aeson are now much more efficient. With text, UTF-8 encoding is up to four times faster, while with aeson, encoding and decoding of JSON bytestrings are both up to twice as fast.

attoparsec 0.11.1.0

Perhaps the least interesting release is attoparsec. It adds a new dependency on Bas Van Dijk’s scientific package to allow efficient and more accurate parsing of floating point numbers, a longstanding minor weakness. It also introduces two new functions for single-token lookahead, which are used by the new release of aeson; read on for more details.

text 1.1.0.0

The new release of the text library has much better support for encoding to a UTF-8 bytestring via the encodeUtf8 function. The new encoder is up to four times faster than in the previous major release.

Simon Meier contributed a pair of UTF-8 encoding functions that can encode to the new Builder type in the latest version of the bytestring library. These functions are slower than the new encodeUtf8 implementation, but still twice as fast as the old encodeUtf8.

Not only are the new Builder encoders admirably fast, they’re more flexible than encodeUtf8, as Builders can be used to efficiently glue together from many small fragments. Once again, read on for more details about how this helped with the new release of aeson. (Note: if you don’t have the latest version of bytestring in your library environment, you won’t get the new Builder encoders.)

The second major change to the text library came about when I finally decided to expose all of the library’s internal modules. The newly exposed modules can be found in the Data.Text.Internal hierarchy. Before you get too excited, please understand that I can’t make guarantees of release-to-release stability for any functions or types that are documented as internal.

aeson 0.7.0.0

Finally, the new release of the aeson library focuses on improved performance and accuracy. We parse floating point numbers more accurately thanks once again to Bas van Dijk’s scientific library. And for performance, both decoding and encoding of JSON bytestrings are up to twice as fast as in the previous release.

On the decoding side, I used the new lookahead primitives from attoparsec to make parsing faster and less memory intensive (by avoiding backtracking, if you’re curious). Meanwhile, Simon Meier contributed a patch that uses his new Builder based UTF-8 encoder from the text library to double encoding performance. (Encoding performance is improved even if you don’t have the necessary new version of bytestring, but only by about 10%.)

On my crummy old Mac laptop, I can decode at 30-40 megabytes per second, and encode at 100-170 megabytes per second. Not bad!

Thanks

I'd particularly like to thank Bas van Dijk and Simon Meier for their excellent contributions during this most recent development cycle. It's really a pleasure to work with such smart, friendly people.

Simon and Bas deserve some kind of an additional medal for being forgiving of my sometimes embarrassingly long review latencies: some of Simon's patches against the text library are almost two years old! (Please pardon me while I grasp at straws in my slightly shamefaced partial defence here: the necessary version of bytestring wasn't released until three months ago, so I'm not the only person in the Haskell community with long review latencies...)

Posted in haskell, open source

One comment on “New year, new library releases, new levels of speed”

Simon Meier says:

2014-01-09 at 02:11

Hi Brian thanks for the kind words and the hard work you put into your libraries. They are a significant part of the reason for why we’re confidently using Haskell to build our backend services here at Erudify

best regards,
Simon