Subscribe to
Posts
Comments

I spent some time recently using my criterion benchmark suite to measure the performance of the Haskell text package. I've reproduced the 45 microbenchmarks that I've put together so far below.

The attention deficit summary

  • The performance of the text package is generally good, and there are some obvious areas for improvement.
  • In many cases, simply waiting for the release of GHC HEAD as 6.14 will yield huge (e.g. 100x) speedups "for free."
  • There are a few places where HEAD is noticeably slower than 6.10.

"Which type should I use?"

The text package is generally fast, and in many cases has a more refined and usefully text-oriented API than Haskell lists, so if you need to manipulate text, you should use this package.

If you are accustomed to using the bytestring package to work with text, you should probably switch to using the text package.

Lists are very slow in general, but they do win for a few trivial operations.

What I measured

When benchmarking, I tried two different versions of GHC:

  • The current stable release, 6.10.4.

  • The HEAD (aka the in-progress development code), which will become 6.14 late next year.

For each microbenchmark, I measured the performance of several different data types:

  • Plain old lists, the workhorse of Haskell. You'll see this in the charts below as "list".

  • The popular ByteString type, in both strict ("strict BS") and lazy ("lazy BS") variants.

  • Strict and lazy variants of the Text type—"strict T" and "lazy T", respectively.

I measured the performance of HEAD because it has a completely rewritten inliner. The performance of the text package is extremely sensitive to the quality of the inliner, as the numbers below indicate. In many cases, the inliner gives HEAD more than a factor of 100 speed advantage over 6.10!

When measuring, I evaluted results to head normal form, so operations that amortise as cheap (e.g. consing an element onto the front of a list) appear with their full unamortised costs.

How to read the charts

Here's a handy key for reading colours and labels:

  • Blue is GHC 6.10.4

  • Red is GHC HEAD

  • "lazy BS" is lazy bytestring

  • "strict BS" is strict bytestring

  • "list" is ... list

  • "lazy T" is lazy text

  • "strict T" is strict text

  • "utf8-string" is utf8-string

  • When you see a chart titled "length/foo", this indicates that I computed the length of a result, in an attempt to tickle the text package's stream fusion machinery into doing its thing.

Some of the charts below use linear Y axes, while those with large differences between fastest and slowest code use logarithmic axes. The kind of chart in use should always be clear. In every case, larger bars mean slower code. Unfortunately, using the Chart package, I couldn't figure out how to centre the label for a column beneath the column itself, so each label is slightly offset to the left. I also couldn't figure out how to turn off the tick marks or other chartjunk, so I apologize for the visual clutter.

The charts

I've added notes after whichever charts seem to deserve them. Charts are listed alphabetically.

time for append

time for concatMap

time for concat

time for cons

time for decode

time for drop

time for filter.filter

time for filter

Ouch! Something in HEAD is clobbering the performance of filter.

time for foldl'

time for foldr

time for head

time for init

time for intercalate

time for intersperse

time for isInfixOf

time for last

time for length-cons

time for length-decode

time for length-drop

time for length-filter.filter

time for length-filter

time for length-init

time for length-intercalate

time for length-intersperse

time for length-map.map

time for length-map

time for length-replicate-char

time for length-replicate-string

time for length-tail

time for length-take

time for length-toLower

time for length-toUpper

time for length-words

time for length-zipWith

time for map.map

time for map

time for replicate-char

time for replicate-string

time for reverse

time for tail

time for take

time for toLower

time for toUpper

time for words

time for zipWith

5 Responses to “The performance of Data.Text”

  1. on 11 Dec 2009 at 08:49Ivan Lazar Miljenovic

    Is the “replicate string” chart correct? Not only does it not have any bytestring values (presumably because there’s no corresponding BS.replicate function), but the list value seems to be one big red bar and strictT has no red bar.

  2. on 11 Dec 2009 at 08:50Ivan Lazar Miljenovic

    Oh, and why mention utf8-string at the top if none of the charts actually contain utf8-string information?

  3. on 11 Dec 2009 at 14:00Tom Tobin

    All the charts seem to have the X-axis labels shifted too far over to the left.

  4. on 13 Dec 2009 at 22:04Tim Docker

    In the chart library distribution:

    * example test1a/minimal shows how to strip all of the so called “chartjunk”
    * example test9c shows bar charts with correctly centred text labels

    Feel free to ask for assistance on the mailing list:

    http://groups.google.com/group/haskell-charts

  5. on 15 Dec 2009 at 01:11Bryan O'Sullivan

    Thank you for the pointers, Tim!

    I’d avoided the mailing list because of the huge quantity of spam, but I see that you’ve fixed that. Thanks.

Leave a Reply