
Johan and I are hard at work getting the new I/O manager for GHC into shape. He published some numbers earlier today describing the dramatic performance difference between GHC's current timeout support and ours. Here, I'd like to talk about another aspect of performance: sending and receiving data.

We have a very simple benchmark that exercises our event and timeout APIs. It creates a number of pipes. Every time it is told that a pipe is ready for writing, it sends a 1-byte message over that pipe; when it is told that the other end of a pipe is ready for reading, it waits for a configurable delay and then reads the message. This continues until a specified number of messages have been sent and received.
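A rough Python analogue of this benchmark may make the loop more concrete. It is a sketch, not our Haskell code. It uses Python's standard `selectors` module, whose `DefaultSelector` picks the best mechanism the platform offers (epoll on Linux, kqueue on OS X), and a heap of read deadlines stands in for the timeout API. The function name `run_benchmark` and the one-message-in-flight-per-pipe design are assumptions made for illustration, and it only runs on Unix, where pipes can be handed to a selector.

```python
import heapq
import itertools
import os
import selectors
import time


def run_benchmark(num_pipes, num_messages, delay=0.0):
    """Hypothetical sketch: push num_messages 1-byte messages through num_pipes
    pipes, reading each one `delay` seconds after its read end becomes ready.
    Returns (sent, received, elapsed_seconds)."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on OS X
    pipes = [os.pipe() for _ in range(num_pipes)]
    for r, w in pipes:
        sel.register(w, selectors.EVENT_WRITE, ("write", r))
        sel.register(r, selectors.EVENT_READ, ("read", w))

    sent = received = 0
    timers = []  # min-heap of (deadline, tiebreak, read_fd, write_fd)
    tiebreak = itertools.count()
    start = time.monotonic()
    try:
        while received < num_messages:
            # Block no longer than the earliest pending read deadline.
            timeout = None
            if timers:
                timeout = max(0.0, timers[0][0] - time.monotonic())
            for key, _ in sel.select(timeout):
                kind, peer = key.data
                # Stop watching this end until its message has been consumed
                # (assumption: at most one message in flight per pipe).
                sel.unregister(key.fd)
                if kind == "write":
                    if sent < num_messages:
                        os.write(key.fd, b"x")  # the 1-byte message
                        sent += 1
                else:
                    deadline = time.monotonic() + delay
                    heapq.heappush(timers, (deadline, next(tiebreak), key.fd, peer))
            # Fire every expired timeout: read the message, then re-arm both ends.
            now = time.monotonic()
            while timers and timers[0][0] <= now:
                _, _, r, w = heapq.heappop(timers)
                os.read(r, 1)
                received += 1
                sel.register(r, selectors.EVENT_READ, ("read", w))
                sel.register(w, selectors.EVENT_WRITE, ("write", r))
    finally:
        sel.close()
        for r, w in pipes:
            os.close(r)
            os.close(w)
    return sent, received, time.monotonic() - start
```

For example, `run_benchmark(100, 1_000_000)` mirrors the smallest configuration in the charts; much larger pipe counts need a raised `ulimit -n`, since each pipe costs two descriptors.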

In the charts below, we send a million 1-byte messages through an increasing number of pipes, starting at 100 and growing by a factor of 12/10 each time, ending at 83,898 pipes. Since each pipe consists of two file descriptors, we exceed the number of file descriptors that regular GHC can handle after just nine iterations. As the numbers below indicate, the new I/O manager happily deals with hundreds of thousands of file descriptors and active timeouts at one time. Note that all charts below use log-log scales.
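The pipe counts are easy to reproduce. This sketch assumes each step truncates with integer division (`n * 12 // 10`), which happens to land exactly on 83,898. To show where "nine iterations" could come from, it also assumes that the old I/O manager is limited by `select()` and its usual `FD_SETSIZE` of 1024 (no descriptor number may reach 1024), and that the pipes take consecutive descriptor numbers right after stdin, stdout and stderr. Both assumptions are labeled in the code.

```python
FD_SETSIZE = 1024  # assumption: the usual select() limit on Linux and OS X


def pipe_counts(start=100, stop=83_898):
    """Pipe counts from `start`, growing by 12/10 (truncated) until `stop`."""
    counts = [start]
    while counts[-1] < stop:
        counts.append(counts[-1] * 12 // 10)
    return counts


def highest_fd(num_pipes):
    """Largest descriptor number in use, assuming fds 0-2 are the standard
    streams and each pipe takes the next two numbers."""
    return 2 + 2 * num_pipes


counts = pipe_counts()
first_too_many = next(i for i, n in enumerate(counts) if highest_fd(n) >= FD_SETSIZE)
print(len(counts), counts[-1], first_too_many, counts[first_too_many])
# prints: 38 83898 9 511
```

Under these assumptions, the ninth growth step reaches 511 pipes, whose highest descriptor number is 1024, one past what a 1024-entry `fd_set` can hold.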

I measured the first set of numbers on my Linux laptop running 64-bit Fedora 12 (2.4GHz Core 2 Duo), using GHC 6.10.4. The back end used here is epoll.

The second set of numbers comes from my Mac laptop running OS X 10.6.2 (also a 2.4GHz Core 2 Duo), using GHC 6.12.1. The back end used here is kqueue.

And finally, a comparison between Linux and Mac performance numbers.

For small numbers of file descriptors and with no timeouts in use, epoll gives about three times the throughput of kqueue. With 160,000+ file descriptors in use, epoll's advantage declines to a factor of two. Under Linux, throughput drops from 440,000 messages per second (nice!) with 100 pipes to 25,000 per second with 83,898 pipes.
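A little arithmetic puts the Linux figures in perspective. The helper below is only a worked example; its inputs are the two throughput numbers and the two pipe counts quoted above.

```python
def per_message_us(msgs_per_sec):
    """Average wall-clock cost of one message, in microseconds."""
    return 1e6 / msgs_per_sec


small, large = 440_000, 25_000      # messages/second on Linux (epoll), from above
pipes_small, pipes_large = 100, 83_898

print(f"{per_message_us(small):.2f} us/msg at {pipes_small} pipes")  # 2.27 us/msg at 100 pipes
print(f"{per_message_us(large):.2f} us/msg at {pipes_large} pipes")  # 40.00 us/msg at 83898 pipes
print(f"throughput falls {small / large:.1f}x")                      # throughput falls 17.6x
print(f"while pipes grow {pipes_large / pipes_small:.0f}x")          # while pipes grow 839x
print(f"and descriptors reach {2 * pipes_large:,}")                  # and descriptors reach 167,796
```

So the per-message cost grows by less than 18x while the number of pipes grows by roughly 839x, and the largest run uses 167,796 descriptors.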

When we introduce timeouts, the performance gap narrows: epoll's advantage shrinks from 2x at low file descriptor counts to parity at high counts. Throughput also takes a hit, one that grows large as the number of file descriptors (and presumably timeouts) in use grows. I haven't checked why this is, but when the delay between being notified and reading from a file descriptor is long, there can easily be over 100,000 timeouts active at one time; perhaps the expense of managing them has something to do with it?
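To give a feel for what managing 100,000 active timeouts involves, here is a minimal timeout queue built on a binary heap. It is an illustrative stand-in, not necessarily the data structure our I/O manager uses, and the class and method names are hypothetical. Registering a timeout costs O(log n), cancellation is lazy (the entry stays in the heap until it reaches the top), and expiring k timeouts costs O(k log n). With n = 100,000 the heap is about 17 levels deep, so every insertion and expiry pays for a walk of up to that many levels.

```python
import heapq
import itertools


class TimeoutQueue:
    """Hypothetical min-heap of (deadline, id) pairs with lazy cancellation."""

    def __init__(self):
        self._heap = []
        self._live = {}  # id -> callback, for timeouts not yet fired or cancelled
        self._ids = itertools.count()

    def register(self, deadline, callback):
        tid = next(self._ids)
        self._live[tid] = callback
        heapq.heappush(self._heap, (deadline, tid))  # O(log n)
        return tid

    def cancel(self, tid):
        # Lazy deletion: drop the callback; the stale heap entry is skipped later.
        self._live.pop(tid, None)

    def expire(self, now):
        """Run callbacks for every live timeout whose deadline is <= now,
        in deadline order, and return their results."""
        fired = []
        while self._heap and self._heap[0][0] <= now:
            _, tid = heapq.heappop(self._heap)  # O(log n) per entry
            callback = self._live.pop(tid, None)
            if callback is not None:
                fired.append(callback())
        return fired

    def __len__(self):
        return len(self._live)


q = TimeoutQueue()
ids = [q.register(d, lambda d=d: d) for d in range(100_000)]
q.cancel(ids[5])
print(len(q))           # 99999
print(q.expire(9)[:3])  # [0, 1, 2]
print(len(q))           # 99990
```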

In any case, with hundreds of thousands of file descriptors and timeouts active at once (oh, and a respectably small memory footprint, too), I am satisfied that our code is working extremely well!

3 Responses to “New GHC I/O manager, first sets of benchmark numbers”

  1. solrize on 22 Jan 2010 at 09:44

    I wonder what the OS kernels are doing as they deliver those messages. Maybe they could use some better data structures too.

  2. Luke Hoersten on 22 Jan 2010 at 12:49

    Well done! This looks awesome and your PR job has been great, too.

  3. Sergey Miryanov on 14 Feb 2010 at 11:22

    Can you compare new IO with old one?
