A little care and feeding can go a long way

Sometimes, when a software package meets a certain level of maturity (or the desire to hack on it fades sufficiently), it's tempting to consider it "done". Here's a little tale of when done isn't really done.

About a week ago, I received a message from Finlay Thompson asking about my Haskell statistics package: he wanted to know how to generate pseudo-random variables using it. I redirected him from that to my mwc-random package, where my pseudo-random number generation code lives.

The mwc-random package currently provides generators for two widely used distributions: uniform and normal. When I was originally writing it, I paid particular attention to making it high quality, fast, and easy to use.

"High quality" sounds a little nebulous, but in the world of pseudo-random number generation, it's actually pretty well defined: a good PRNG should have a large period (the number of samples you need to pull out of it before it repeats itself, assuming a good seed), and the numbers it generates should withstand stringent tests of apparent independence (simply put, given one datum, you shouldn't be able to predict the next).

One algorithm that satisfies these criteria of quality is George Marsaglia's multiply-with-carry algorithm MWC256 (also known as MWC8222). It has a period of about 28222 (huge enough for all conceivable practical purposes), and stands up well to the "testu01", "diehard" and "big crush" statistical tests.

Due to its simplicity, MWC256 is also very fast, and under appropriate circumstances (e.g. on a 64-bit machine) it can be even faster than the well known Mersenne Twister algorithm (which also fails some statistical tests that MWC256 passes).

The Mersenne Twister is itself available for Haskellers to use, in the form of the mersenne-random package. This package is a wrapper around the Mersenne Twister library, and unfortunately it imposes on its users the underlying library's typically horrible constraints borne of too much Fortran programming: you can only have one PRNG per application, and it can only be used from a single thread! The mwc-random package is less restrictive: fire up as many PRNGs in different threads as you like, and they'll all operate independently. You can also use the PRNGs in either the ST or the IO monad, for further convenience.

When generating normally distributed random variables, the mwc-random package uses an algorithm known as the "modified ziggurat". One of the more popular algorithms for generating normally distributed variables is called the ziggurat, but its popularity belies an ill-understood quality problem: the numbers it generates aren't independent enough! It turns out that they are noticeably correlated. The modified ziggurat is almost as fast, and it sacrifices a little speed in the name of improved independence.

The base-level performance of the random number generators looks like this on my Mac using 32-bit GHC 6.12.3, where the time quoted is to generate a single double-precision floating point number:

  • Uniform: 142.6 nanoseconds

  • Normal: 15149 nanoseconds

Where does the question of being "done" or not come in? Well, while poking around tonight, I was a little surprised at the large difference in speed between the uniform and normal PRNGs, so I investigated. The ziggurat algorithm gets its name from the precomputed lookup tables it uses to gain its speed. It turns out that GHC's inliner was being too aggressive with the table-related code, causing the ziggurat tables to be regenerated over and over instead of precomputed just once. Ouch!

One small and very quick change, and the performance of the PRNG for normally distributed variables changed dramatically:

  • Before: 15149 nanoseconds

  • After: 246.8 nanoseconds

That's a little over 61 times faster. Not bad for a couple of lines of changes!

As a final note, now that GHC can build 64-bit programs on a Mac, you might wonder how it performs. Here's a comparison between 32-bit and 64-bit versions of GHC 7.0.2 (times in nanoseconds):

type32-bit64-bitspeedup
uniform Double148285.3
uniform Int325316.73.2
normal Int32252624.1

Those are some pretty nice performance improvements! Of course, not all applications come in the form of nice tight numeric kernels, so don't take it as given that you'll see improvements like this in your code.

Posted in haskell, open source
19 comments on “A little care and feeding can go a long way
  1. aleator says:

    Kudos for making mwc-random! Thanks to this what used to take me a several days to run now does so in an afternoon.

    Ville

  2. Paolo Losi says:

    Hi Brian,

    sorry for raising up an older post…

    Do you have any pointer on how to debug / optimize haskell code especially
    WRT inlining issues? It quite scary that a couple of pragma could make such
    a big difference!

    Would the same (old) code compiled with ghc 7.4.1 be better performing?

    Thanks!

  3. Marlon says:

    this is amazing. i denliitefy feel like there is a dearth of information about and attention to the time and space behavior of haskell programs. this is on purpose of course but i worry about using haskell for my microcontroller, i never know if that one badly programmed evaluation will start crunching and stalling.

  4. THX that’s a great answer!

  5. I’ve been looking for a post like this forever (and a day)

  6. It is strange, but interesting about how this film keeps the viewer’s attention. There isn’t a plotline, we just run through the alphabet over and over again by seeing these words. It’s hard to keep track of what words we’ve seen, or what letters we missed. At the same time, however, we know what’s coming – what letter the next word should start with. This makes the film easy to just sit back and watch, because we as viewers are not trying to figure it out.

  7. Ya learn something new everyday. It’s true I guess!

  8. I¡¦m not sure where you are getting your info, but good topic. I must spend a while finding out more or understanding more. Thank you for excellent info I was on the lookout for this info for my mission.

  9. by India’s rich color palette, Nicole Richie‘s spring 2012 Winter Kate collection is a warm weather fantasy wardrobe. ”I’ve always loved the possibility of being able to

  10. a10 says:

    I appreciate your working style at the end just my request is please share with us some more great post…

  11. colorswitch says:

    I think your post is very interesting and mostly i keep looking for like this type of websites where i learn or get new concept.

  12. roblox says:

    I am happy to visit on your site. Thanks to share it.

  13. I really like the dear information you offer in your articles.

  14. พูดว่า:Howdy! This is my 1st comment here so I just wanted to give a quick shout out and tell you I genuinely enjoy reading your posts. Can you suggest any other blogs/websites/forums that deal with the same subjects? Thanks a lot!

  15. Thanks for putting your gafas oakley holbrook effort in publishing this article.

  16. Kiyra Rani says:

    Great stuff, I like your appearance.

  17. Nisha Roy says:

    Strikingly Awesome These are a touch of the strong choices, saw for your use. Your blog web page page site page diffusing in radiantly baffling for saying to. thanks

  18. roblox hack says:

    Hello guys really such a amazing post dear and friends here is the best way to get robux for free

  19. Your share is the great knowledge I have gathered, you are an important person I admire, thank you

Leave a Reply

Your email address will not be published. Required fields are marked *

*