Book review: Parallel and Concurrent Programming in Haskell

It's time someone finally wrote a proper review of Simon Marlow's amazing book, Parallel and Concurrent Programming in Haskell.

I am really not the right person to tackle this job objectively, because I have known Simon for 20 years and I currently happen to be his boss at Facebook. Nevertheless, I fly my flag of editorial bias proudly, and in any case a moment's glance at Simon's book will convince you that the absurdly purple review I am about to write is entirely justified.

Moreover, this book is sufficiently clear, and introduces so many elegant ideas and beautiful abstractions, that you would do well to learn the minimal amount of Haskell necessary to absorb its lessons, simply so that you can become enriched in the reading.

Simon's book makes an overdue departure from the usual Haskell literature (including my own book, which in case you didn't know is fully titled "Real World Haskell Of Six Years Ago Which We Should Have Edited A Little More Carefully") in assuming that you already have a modest degree of understanding of the language. This alone is slightly pathetically refreshing! I can't tell you how glad I am that functional programming has finally reached the point where we no longer have to start every bloody book by explaining what it is.

Actually, there's a second reason that I might not be an ideal person to review this book: I have only skimmed most of the first half, which concerns itself with parallel programming. Just between you and me, I will confess that parallel programming in Haskell hasn't lit my internal fire of enthusiasm. I used to do a lot of parallel programming in a previous life, largely using MPI, and the experience burned me out. While parallel programming in Haskell is far nicer than grinding away in MPI ever was, I do not love the subject enough that I want to read about it.

So what I'm really reviewing here is the second part of Simon's book, which if issued all by itself at the same price as the current entire tome, would still be a bargain. Let's talk about just how good it is.

The second half of the book concerns itself with concurrent programming, an area where Haskell particularly shines, and which happens to be the bread-and-butter of many a working programmer today. The treatment of concurrency does not depend in any way on the preceding chapters, so if you're so inclined, you can read chapter one and then skip to the second half of the book without missing any necessary information.

Chapter 7 begins by introducing some of the basic components of concurrent Haskell, threads (familiar to all) and a data type called an MVar. An MVar acts a bit like a single-item box: you can put one item into it if it's empty, otherwise you must wait; and you can take an item out if it's full, otherwise you must wait.

As humble as the MVar is, Simon uses it as a simple communication channel with which he builds a simple concurrent logging service. He then deftly identifies the performance problem that a concurrent service will have when an MVar acts as a bottleneck. Not content with this bottleneck, he illustrates how to construct an efficient unbounded channel using MVar as the building block, and clearly explains how this more complex structure works safely.

This is the heart of Simon's teaching technique: he presents an idea that is simple to grasp, then pokes a hole in it. With this hole as motivation, he presents a slightly more complicated approach that corrects the weaknesses of the prior step, without sacrificing that clarity.

For instance, the mechanism behind unbounded channels is an intricate dance of two MVars, where Simon clearly explains how they ensure that a writer will not block, while a reader will block only if the channel is empty. He then goes on to show how this channel type can be extended to support multicast, such that one writer can send messages to several readers. His initial implementation is subtly incorrect, which he once again explains and uses as a springboard to a final version. By this time, you've accumulated enough lessons from the progression of examples that you can appreciate the good design taste and durability of these unbounded channels.

Incidentally, this is a good time to talk about the chapter on parallel computing that I made sure not to skip: chapter 4, which covers dataflow parallelism using an abstraction called Par. Many of the types and concerns in this chapter will be familiar to you if you're used to concurrent programming with threads, which makes this the most practical chapter to start with if you want to venture into parallel programming in Haskell, but don't know where to begin. Par is simply wonderfully put together, and is an inspiring example of tasteful, parsimonious API design. So put chapter 4 on your must-read list.

Returning to the concurrent world, chapter 8 introduces exceptions, using asynchronous operations as the motivation. Simon builds a data type called Async, which is similar to "futures" or "promises" from other languages (and to the IVar type from chapter 4), and proceeds to make Async operations progressively more robust in the face of exceptions, then more powerful so that we can wait on the completion of one of several Async operations.

Chapter 9 resumes the progress up the robustness curve, by showing how we can safely cancel Async operations that have not yet completed, how to deal with the trouble that exceptions can cause when thrown at an inopportune time (hello, resource leaks!), and how to put an upper bound on the amount of time that an operation can run for.

Software transactional memory gets an extended treatment in chapters 10 and 11. STM has gotten a bad rap in the concurrent programming community, mostly because the implementations of STM that target traditional programming languages have drawbacks so huge that they are deeply unappealing. In the same way that the Java and C++ of 10-15 years ago ruined the reputation of static type systems when there were vastly better alternatives out there, STM in Haskell might be easy to consign to the intellectual dustbin by association, when in fact it's a much more interesting beast than its relatives.

A key problem with traditional STM is that its performance is killed stone dead by the amount of mutable state that needs to be tracked during a transaction. Haskell sidesteps much of this need for book-keeping with its default stance that favours immutable data. Nevertheless, STM in Haskell does have a cost, and Simon shows how to structure code that uses STM to make its overheads acceptable.

Another huge difficulty with traditional STM lies in the messy boundary between transactional code and code that has side effects (and which hence cannot be safely called from a transaction). Haskell's type system eliminates these difficulties, and in fact makes it easier to construct sophisticated combinations of transactional operations. Although we touched on STM having some overhead, Simon revisits the Async API and uses some of the advanced features of Haskell STM to build a multiple-wait implementation that is more efficient than its MVar-based predecessor.

In chapter 14, Simon covers Cloud Haskell, a set of fascinating packages that implement Erlang-style distributed message passing, complete with monitoring and restart of remote nodes. I admire Cloud Haskell for its practical willingness to adopt wholesale the very solid ideas of the Erlang community, as they have a quarter of a century of positive experience with their distinctive approach to constructing robust distributed applications.

If you don't already know Haskell, this book offers two significant gifts. The first is a vigorous and compelling argument for why Haskell is an uncommonly good language for the kind of concurrent programming that is fundamental to much of today's computing. The second is an eye-opening illustration of some beautiful and powerful APIs that transcend any particular language. Concise, elegant design is worth celebrating wherever you see it, and this book is brimful of examples.

On the other hand, if you're already a Haskell programmer, it is very likely that this book will awaken you to bugs you didn't know your concurrent code had, abstractions that you could be building to make your applications cleaner, and practical lessons in how to start simple and then refine your code as you learn more about your needs.

Finally, for me as a writer of books about computing, this book has lessons too. It is understated, letting the quality of its examples and abstractions convince more deeply than bombast could reach. It is minimalist, revisiting the same few initially simple ideas through successive waves of refinement and teaching. And it is clear, with nary a word out of place.

In short, if you care about Haskell, if you are interested in concurrency, if you appreciate good design, if you have an ear for well-crafted teaching, Parallel and Concurrent Programming in Haskell is a book that you simply must read. We simply do not see books of this quality very often, so treasure 'em when you see 'em.

Posted in haskell, reading
One comment on “Book review: Parallel and Concurrent Programming in Haskell
  1. Ersin Er says:

    Bryan, thanks for the review. Happy to have such high quality books about Haskell and FP. I think your Haskell book also deserves the credit as it’s truly a real world book. But as you said six years passed I would like to catch up with Haskell’s state of practice with a new edition of RWH. Is it planned for any time soon?

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>