Sometimes, the old ways are the best

Over the past few months, the Sigma engineering team at Facebook has rolled out a major Haskell project: a rewrite of Sigma, an important weapon in our armory for fighting spam and malware.

Sigma has a mission-critical job, and it needs to scale: its growing workload currently sees it handling tens of millions of requests per minute.

The rewrite of Sigma in Haskell, using the Haxl library that Simon Marlow developed, has been a success. Throughput is higher than under its predecessor, and CPU usage is lower. Sweet!

Nevertheless, success brings with it surprises, and even though I haven’t worked on Sigma or Haxl, I’ve been implicated in one such surprise. To understand my accidental bit part in the show, let's begin by mentioning that Sigma uses JSON internally for various purposes. These days, the Haskell-powered Sigma uses aeson, the JSON library I wrote, to handle JSON data.

A few months ago, the Haxl rewrite of Sigma was going through an episode of crazytown, in which it would intermittently and unpredictably use huge amounts of CPU and memory. The culprit turned out to be JSON strings containing zillions of backslashes. (I have no idea why. If you’ve worked with large volumes of data for a long time, you won’t even bat an eyelash at the idea that a data store somewhere contains some really weird records.)

The team quickly mitigated the problem, and gave me a nudge that I might want to look into the problem. On Sunday evening, with a glass of red wine in hand, I finally dove in to see what was wrong.

Since the Sigma developers had figured out what was causing these time and space explosions, I immediately had a test case to work with, and the results were grim: decoding a mere megabyte of continuous backslashes took over a second, consumed over a gigabyte of memory, and killed concurrency by causing the runtime system to spend almost 90% of its time in the garbage collector. Yikes!

Whatever was going on? If you look at the old implementation of aeson’s unescape function, it seems quite efficient and innocuous. It’s reasonably tightly optimized low-level Haskell.

Trouble is, unescape uses an API (a bytestring builder) that is intended for streaming a result incrementally. Unfortunately the unescape function can’t hand any data back to its caller until it has processed an entire string.

The result is as you’d expect: we build a huge chain of thunks. In this case, the thunks will eventually write data efficiently into buffers. Alas, the thunks have nobody demanding the evaluation of their contents. This chain consumes a lot (a lot!) of memory and incurs a huge amount of GC overhead (long chains of thunks are expensive). Sadness ensues.

The “old ways” in the title refer to the fix: in place of a fancy streaming API, I simply allocate a single big buffer and blast the bytes straight into it.

For that pathological string with almost a megabyte of consecutive backslashes, the new implementation is 27x faster and uses 42x less memory, all for the cost of perhaps an hour of Sunday evening hacking (including a little enabling work that incidentally illustrates just how easy it is to work with monad transformers). Not bad!

Posted in haskell
27 comments on “Sometimes, the old ways are the best
  1. Gabriel says:

    Nice, Haskell FTW

  2. joe potato says:

    who pissed in Asil’s cornflakes this morning?

  3. Andrew says:

    Wow ! Impressive bug and an impressive fix 😀
    Btw. who is this guy Asil? why so abusive? calm down man.

  4. Todd says:

    Nice write up, thanks. Reading about code optimization is interesting, and usually fun because I get to sit back and read instead of trying to unravel my own code for once.

  5. D says:

    Please please please write the 2nd edition of RWH.

  6. スーパーコピー時計販売はコピーガガミラノ通販専門店です . 0.106072179 レプリカガガミラノの私は(私がオンラインで見つける大丈夫、ランダマイザ)大きなふわふわサンタの帽子の中にすべてのaBlogtoWatchチームメンバーの名前を入れて、ランダムに彼らは匿名で2014年からガガミラノの時計を選ぶだろう誰のために相互に各チームメンバーをペアに名前を描いた私はその後、それぞれに通知彼らは「買い物」のためだった、と彼らに “贈り物”を選択し、彼らはその人のためにその時計を選んだ理由について書くための期限を与えた人物 . スーパーコピーガガミラノ時計豊富な品揃えで最新作も随時入荷致しております のでごゆっくりとご覧ください。☆ ガガミラノ時計税関の没収する商品は再度無料にして発送します☆ 送料無料 .

  7. Adrian says:

    Hi Bryan, what is your email address please? thanks vm! Ak

  8. The birth of a boxers calvin klein child is indeed a wondrous moment.

  9. Mind blowing. Thank you for the post.

  10. Nice write up, thanks.

  11. Vikalp Saxena says:

    A Fraud company which hired me saying it needed some basic typing work from me is demanding a huge some of money from me. It didn’t sent me work, after 10 days from hiring it did and its mail landed in my spam folder. The workload was of 10 days. I saw it on the 9th day. I could only complete 1 day workload.
    Now they’re saying I have done a loss of the company so I must pay an amount more than my due salary. Here’s why I think it is fraud, the work was to look at the text in an image and fill up the form below. It could be easily done by a software so why were they making me do it. Don’t they want to save money? I need your help. I don’t have software degree, but if you give me in writing that it is indeed possible through software. Then I will be saved and I would be grateful to you. I am a jobless person not very qualified. And a fraud company is trying to scam me. Please help.

  12. softsoldier says:

    Well it makes sense

  13. Roman Alexeev says:

    You have not written any word about rebase – A you mad?

  14. Roman Alexeev says:

    i mean your definiteve guid book

  15. Peter says:

    no thanks

  16. Peter says:

    All the best

  17. Sam says:

    Stay Home

  18. saqibkhan says:

    you can easily download free software

  19. saqibkhan says:

    you can download free software

  20. Aamir Khan says:

    free books download on, in pdf epub both formats

  21. Aamir Khan says:

    download alll boos free

  22. Aamir Khan says:

    download all books

  23. shawaiz says:

    Perfect writing style, perfect research and attention to detail,
    not only to events but to emotions felt. Thank you so much sir for your great post on the great blog.

  24. sawaiz says:

    Perfect writing style, perfect research and attention to detail,
    not only to events but to emotions felt. Thank you so much sir for your great post on the great blog.

  25. BlackVuefqj says:

    Of his works, he is especially famous

2 Pings/Trackbacks for "Sometimes, the old ways are the best"
  1. […] Sometimes, the old ways are the best ::: teideal glic deisbhéalach […]

  2. […] in aeson, the Haskell JSON parsing library. Bryan O’Sullivan, the author of aeson, wrote a nice blog post about how he fixed it. It turns out that when you do things at Facebook scale, those […]

Leave a Reply

Your email address will not be published. Required fields are marked *