Sometimes, the old ways are the best

Over the past few months, the Sigma engineering team at Facebook has rolled out a major Haskell project: a rewrite of Sigma, an important weapon in our armory for fighting spam and malware.

Sigma has a mission-critical job, and it needs to scale: its growing workload currently sees it handling tens of millions of requests per minute.

The rewrite of Sigma in Haskell, using the Haxl library that Simon Marlow developed, has been a success. Throughput is higher than under its predecessor, and CPU usage is lower. Sweet!

Nevertheless, success brings with it surprises, and even though I haven’t worked on Sigma or Haxl, I’ve been implicated in one such surprise. To understand my accidental bit part in the show, let’s begin by mentioning that Sigma uses JSON internally for various purposes. These days, the Haskell-powered Sigma uses aeson, the JSON library I wrote, to handle JSON data.

A few months ago, the Haxl rewrite of Sigma was going through an episode of crazytown, in which it would intermittently and unpredictably use huge amounts of CPU and memory. The culprit turned out to be JSON strings containing zillions of backslashes. (I have no idea why. If you’ve worked with large volumes of data for a long time, you won’t even bat an eyelash at the idea that a data store somewhere contains some really weird records.)

The team quickly mitigated the problem and nudged me to take a look at the underlying cause. On Sunday evening, with a glass of red wine in hand, I finally dove in to see what was wrong.

Since the Sigma developers had figured out what was causing these time and space explosions, I immediately had a test case to work with, and the results were grim: decoding a mere megabyte of continuous backslashes took over a second, consumed over a gigabyte of memory, and killed concurrency by causing the runtime system to spend almost 90% of its time in the garbage collector. Yikes!

Whatever was going on? If you look at the old implementation of aeson’s unescape function, it seems quite efficient and innocuous. It’s reasonably tightly optimized low-level Haskell.

Trouble is, unescape uses an API (a bytestring builder) that is intended for streaming a result incrementally. Unfortunately, unescape can’t hand any data back to its caller until it has processed the entire string.

The result is as you’d expect: we build a huge chain of thunks. In this case, the thunks will eventually write data efficiently into buffers. Alas, the thunks have nobody demanding the evaluation of their contents. This chain consumes a lot (a lot!) of memory and incurs a huge amount of GC overhead (long chains of thunks are expensive). Sadness ensues.
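To make the shape of that pattern concrete, here is a minimal sketch of an unescape-style function written against the bytestring builder API. It is not the old aeson code: the names unescapeViaBuilder and decodeEscape are made up for illustration, \uXXXX escapes are ignored, and malformed input is handled leniently. The point is only that the result is assembled from one tiny builder piece per input byte, and the caller gets nothing back until every piece has been run.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Hypothetical, lenient escape table (an assumption of this sketch):
-- unknown escapes pass through unchanged.
decodeEscape :: Word8 -> Word8
decodeEscape c = case c of
  98  -> 8    -- \b
  102 -> 12   -- \f
  110 -> 10   -- \n
  114 -> 13   -- \r
  116 -> 9    -- \t
  _   -> c    -- \\, \", \/ and anything else

-- Builds the output as a chain of single-byte Builder pieces, then runs
-- the whole chain at once to produce a strict ByteString.
unescapeViaBuilder :: B.ByteString -> B.ByteString
unescapeViaBuilder = BL.toStrict . BB.toLazyByteString . go
  where
    go :: B.ByteString -> BB.Builder
    go bs = case B.uncons bs of
      Nothing -> mempty
      Just (92, rest) -> case B.uncons rest of   -- 92 is a backslash
        Nothing -> mempty                        -- trailing backslash dropped
        Just (c, rest') -> BB.word8 (decodeEscape c) <> go rest'
      Just (c, rest) -> BB.word8 c <> go rest
```

How much of that chain actually sits in memory as unevaluated thunks depends on the surrounding code and on what GHC's optimizer does with it, so treat this as a picture of the structure rather than a reproduction of the slowdown.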

The “old ways” in the title refer to the fix: in place of a fancy streaming API, I simply allocate a single big buffer and blast the bytes straight into it.
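As a rough illustration of the single-buffer idea, here is a sketch that assumes only the one-character JSON escapes need handling. It is not the code that went into aeson: unescapeSimple and decodeEscape are hypothetical names, and \uXXXX escapes are simply rejected. The observation it relies on is that unescaping never makes a string longer, so a buffer the size of the input is always big enough.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import qualified Data.ByteString.Unsafe as BU
import Data.IORef (newIORef, readIORef, writeIORef)
import Data.Word (Word8)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeByteOff)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical table of the single-character JSON escapes.
-- \uXXXX is deliberately left out of this sketch.
decodeEscape :: Word8 -> Maybe Word8
decodeEscape c = case c of
  34  -> Just 34   -- \"
  47  -> Just 47   -- \/
  92  -> Just 92   -- \\
  98  -> Just 8    -- \b
  102 -> Just 12   -- \f
  110 -> Just 10   -- \n
  114 -> Just 13   -- \r
  116 -> Just 9    -- \t
  _   -> Nothing

-- Allocate one buffer as long as the input, write the decoded bytes
-- straight into it, then trim it to the number of bytes written.
unescapeSimple :: B.ByteString -> Maybe B.ByteString
unescapeSimple input = unsafePerformIO $ do
  failed <- newIORef False
  let len = B.length input
      -- i indexes the input, j counts bytes written; returns the final j.
      go :: Ptr Word8 -> Int -> Int -> IO Int
      go dst i j
        | i >= len = pure j
        | c /= 92 = pokeByteOff dst j c >> go dst (i + 1) (j + 1)
        | i + 1 >= len = writeIORef failed True >> pure j  -- lone trailing backslash
        | otherwise = case decodeEscape (BU.unsafeIndex input (i + 1)) of
            Just d  -> pokeByteOff dst j d >> go dst (i + 2) (j + 1)
            Nothing -> writeIORef failed True >> pure j    -- unsupported escape
        where
          c = BU.unsafeIndex input i
  out <- BI.createAndTrim len (\dst -> go dst 0 0)
  bad <- readIORef failed
  pure (if bad then Nothing else Just out)
{-# NOINLINE unescapeSimple #-}
```

For example, unescapeSimple (B.replicate 1000000 92) decodes a megabyte of backslashes in one pass over one buffer. The unsafePerformIO is reasonable here only because the buffer never escapes before it is complete and the result depends on nothing but the input.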

For that pathological string with almost a megabyte of consecutive backslashes, the new implementation is 27x faster and uses 42x less memory, all for the cost of perhaps an hour of Sunday evening hacking (including a little enabling work that incidentally illustrates just how easy it is to work with monad transformers). Not bad!

Posted in haskell