What’s in a parsing library? (1/2)

Posted on 2010-03-03 by Bryan O'Sullivan

My goal in working on the new GHC I/O manager has been to get the Haskell network stack into a state where it could be used to attack high-performance and scalable networking problems, domains in which it has historically been weak.

While it's encouraging to have an excellent networking stack (Johan and I now have this thoroughly in hand), the next thing I'd look for is libraries to help build networked applications. One of the fundamental things that such apps need to do well is parse data, be it received from the network or read from files.

The Haskell parsing library of first resort has for years been Parsec. While other capable libraries exist (e.g. polyparse and uu-parsinglib), they don't appear to see much use.

As appealing as Parsec's API is, it has a few problems:

Parsec 2 is slow, and it has high memory overhead, due to its use of Haskell's String type for tokens. Parsec 3 can use the more efficient ByteString type (which is in any case much more appropriate for networked applications that deal in octets), but it achieves this flexibility at the cost of being even slower than Parsec 2.
Parsec's API demands that all of a parser's input be available at once. People usually work around this by feeding a Parsec parser with lazily read data, but lazy I/O is at odds with my goal of writing solid networked code.

What properties should a parsing library for networked applications ideally possess? There are a few obvious desiderata that have been well known for years. For example, it's important to have an appealing API and programming model. Parsec squarely fits this desire.

Performance is also a big consideration. Ideally, a parsing library would be fast enough that you wouldn't feel any real need for either of the following:

A few weeks to write an insane hand-bummed parser.
Mechanical parser generators or lexers (e.g. happy or alex).

There are some additional important constraints on a realistic library: it must fit well into a highly concurrent networked world full of unreliable, hostile and incompetent clients.

High concurrency levels demand a low per-connection memory footprint.
The need to cope with poorly behaved clients requires that applications be able to throttle connections that are too busy, and to kill connections that are too slow or that attempt to consume too many server resources. A good parsing library will not get in the way of these needs.
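To make the resource-limit point concrete, here is a minimal sketch of a driver that feeds chunks to an incremental parser and gives up once a client has sent more than a fixed number of bytes without completing a parse. Everything in it is an assumption made for illustration: the `Step` and `Outcome` types, `runLimited`, and the toy `untilNewline` parser are hypothetical names, not part of Parsec or Attoparsec, and a plain list of chunks stands in for reads from a socket.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical state of an incremental parser after being fed some input.
data Step a
  = NeedMore (B.ByteString -> Step a)  -- parser wants another chunk
  | Finished a                         -- parser produced a value

-- What the driver reports back to the server loop.
data Outcome a = Parsed a | TooBig | Closed
  deriving (Eq, Show)

-- Drive a parser over incoming chunks, giving up once the client has sent
-- more than `limit` bytes without completing a parse.
runLimited :: Int -> Step a -> [B.ByteString] -> Outcome a
runLimited _ (Finished a) _ = Parsed a
runLimited _ (NeedMore _) [] = Closed   -- input ended before the parse did
runLimited limit (NeedMore k) (c:cs)
  | B.length c > limit = TooBig
  | otherwise          = runLimited (limit - B.length c) (k c) cs

-- Toy parser: collect bytes up to (not including) the first newline.
untilNewline :: Step B.ByteString
untilNewline = NeedMore (go B.empty)
  where
    go acc c
      | B.elem '\n' c = Finished (B.append acc (B.takeWhile (/= '\n') c))
      | otherwise     = NeedMore (go (B.append acc c))
```

Because the parser hands control back after every chunk, the caller decides when to stop reading. A server could apply the same idea to timeouts or per-connection rate limits without any cooperation from the parser itself.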

A few years ago, I made a few half-hearted attempts to write a specialised version of Parsec, which I eventually named Attoparsec.

I began with a stripped-down Parsec that was specialised to accept ByteString input. I then extended the API to allow a parser to consume small chunks of input at a time.
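As a rough illustration of what consuming input in small chunks can look like, the sketch below parses a single newline-terminated line regardless of how the input is split across chunks. It is not Attoparsec's actual API: the `Result` type, `line`, and `feedAll` are hypothetical names chosen for this example, and treating an empty chunk as end of input is an assumption.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical result type for an incremental parser (names are illustrative).
data Result a
  = Fail String                         -- parse failed, with a message
  | Partial (B.ByteString -> Result a)  -- needs more input: feed the next chunk
  | Done B.ByteString a                 -- finished: leftover input and the value

-- Parse one newline-terminated line, however the input is chunked.
-- Assumption: an empty chunk signals end of input.
line :: B.ByteString -> Result B.ByteString
line = go []
  where
    go acc chunk
      | B.null chunk = Fail "unexpected end of input"
      | otherwise =
          case B.elemIndex '\n' chunk of
            Nothing -> Partial (go (chunk : acc))  -- remember chunk, ask for more
            Just i  ->
              let (before, rest) = B.splitAt i chunk
              in Done (B.drop 1 rest) (B.concat (reverse (before : acc)))

-- Feed a list of chunks to a result, as a network read loop might.
feedAll :: Result a -> [B.ByteString] -> Result a
feedAll (Partial k) (c:cs) = feedAll (k c) cs
feedAll r _ = r
```

Feeding `"GET / HT"` followed by `"TP/1.1\nHost"` yields a `Done` carrying the line `"GET / HTTP/1.1"`, with `"Host"` left over for the next parse. The caller never needs the whole input in memory, and never relies on lazy I/O.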

Because I wasn't using Attoparsec "in anger" at the time, I made sure that my library worked (more or less), but I was not measuring its performance.

In late January of this year, I began to think about using Attoparsec as the parser for a simple HTTP server that I could use to benchmark our new GHC I/O manager code. Clearly, I'd want the parser to perform well, or it would distort my numbers rather badly.

By coincidence, John MacFarlane emailed me around the same time, with disturbing findings: he'd tried Attoparsec, and found its performance to be terrible! In fact, it was 4 to 20 times slower than plain Parsec with his experimental parser and test data. Clearly, I had some hard work to look forward to.

Happily, that work is now almost complete, and I am pleased with the results. In the next post, I'll have some details of what this all entails.

Posted in haskell, Uncategorized

10 comments on “What’s in a parsing library? (1/2)”

michalt says:

2010-03-03 at 10:08

Sounds great! Can’t wait for the next post. 🙂

Jason Dusek says:

2010-03-03 at 17:14

Oh, man — that’s surprising news about Attoparsec. I guess it still saves memory.

I appreciated that Attoparsec had straightforward Applicative and Alternative instances; many parsers have ended up with specialized versions of the operators or whole new shadow classes (I believe the Utrecht parser combinators have their own Applicative class).

Rob MacAulay says:

2010-04-11 at 10:04

Just thinking about the speed vs error handling issue.

Most of the Haskell parsing libraries return the failing string if an error occurs. So you could pass this to the slower but more friendly parser to give you more information.

Obviously, this bloats the code hugely, but that may be acceptable in some circumstances.

You might even extend the technique so that you have a very fast parser that works for 95% of the code, and hand over to a slower parser for other cases.