The performance of Data.Text
December 10th, 2009 by Bryan O'Sullivan
I spent some time recently using my criterion benchmark suite to measure the performance of the Haskell text package. I've reproduced the 45 microbenchmarks that I've put together so far below.
The attention deficit summary
- The performance of the text package is generally good, and there are some obvious areas for improvement.
- In many cases, simply waiting for the release of GHC HEAD as 6.14 will yield huge (e.g. 100x) speedups "for free."
- There are a few places where HEAD is noticeably slower than 6.10.
"Which type should I use?"
The text package is generally fast, and in many cases has a more refined and usefully text-oriented API than Haskell lists, so if you need to manipulate text, you should use this package.
If you are accustomed to using the bytestring package to work with text, you should probably switch to using the text package.
Lists are very slow in general, but they do win for a few trivial operations.
What I measured
When benchmarking, I tried two different versions of GHC:
The current stable release, 6.10.4.
The HEAD (aka the in-progress development code), which will become 6.14 late next year.
For each microbenchmark, I measured the performance of several different data types:
Plain old lists, the workhorse of Haskell. You'll see this in the charts below as "list".
The popular ByteString type, in both strict ("strict BS") and lazy ("lazy BS") variants.
Strict and lazy variants of the Text type—"strict T" and "lazy T", respectively.
I measured the performance of HEAD because it has a completely rewritten inliner. The performance of the text package is extremely sensitive to the quality of the inliner, as the numbers below indicate. In many cases, the inliner gives HEAD more than a factor of 100 speed advantage over 6.10!
When measuring, I evaluted results to head normal form, so operations that amortise as cheap (e.g. consing an element onto the front of a list) appear with their full unamortised costs.
How to read the charts
Here's a handy key for reading colours and labels:
Blue is GHC 6.10.4
Red is GHC HEAD
"lazy BS" is lazy bytestring
"strict BS" is strict bytestring
"list" is ... list
"lazy T" is lazy text
"strict T" is strict text
"utf8-string" is utf8-string
When you see a chart titled "length/foo", this indicates that I computed the length of a result, in an attempt to tickle the text package's stream fusion machinery into doing its thing.
Some of the charts below use linear Y axes, while those with large differences between fastest and slowest code use logarithmic axes. The kind of chart in use should always be clear. In every case, larger bars mean slower code. Unfortunately, using the Chart package, I couldn't figure out how to centre the label for a column beneath the column itself, so each label is slightly offset to the left. I also couldn't figure out how to turn off the tick marks or other chartjunk, so I apologize for the visual clutter.
The charts
I've added notes after whichever charts seem to deserve them. Charts are listed alphabetically.








Ouch! Something in HEAD is clobbering the performance of filter.






































Is the “replicate string” chart correct? Not only does it not have any bytestring values (presumably because there’s no corresponding BS.replicate function), but the list value seems to be one big red bar and strictT has no red bar.
Oh, and why mention utf8-string at the top if none of the charts actually contain utf8-string information?
All the charts seem to have the X-axis labels shifted too far over to the left.
In the chart library distribution:
* example test1a/minimal shows how to strip all of the so called “chartjunk”
* example test9c shows bar charts with correctly centred text labels
Feel free to ask for assistance on the mailing list:
http://groups.google.com/group/haskell-charts
Thank you for the pointers, Tim!
I’d avoided the mailing list because of the huge quantity of spam, but I see that you’ve fixed that. Thanks.