Applicative functors are gorgeous and versatile creatures, but as is common in Haskell, they lack a little in documentation. The paper that Conor and Ross wrote introducing them is good, but dense. What if we were to skip all the scene-setting kerfuffle, and plunge into using them by example?
I won’t attempt to describe what applicative functors actually are, because the idea is easy to absorb: we’ll pick it up as an incidental product of figuring out how to use them. Bear with me.
To begin with, we need a motivating example, and we’ll choose something simple, yet chewy: parsing an
application/x-www-form-urlencoded string. Remember, they look like this:
While we’ll use Parsec as our parsing library, we’ll give it a completely new skin.
module ApplicativeParsec ( -- Re-export the contents of these modules, for convenience. module Control.Applicative , module Text.ParserCombinators.Parsec ) where import Control.Applicative import Control.Monad (MonadPlus(..), ap) -- Hide Parsec's definitions of some Applicative functions. import Text.ParserCombinators.Parsec hiding (many, optional, (<|>)) -- Every Monad is an Applicative. instance Applicative (GenParser s a) where pure = return (<*>) = ap -- Every MonadPlus is an Alternative. instance Alternative (GenParser s a) where empty = mzero (<|>) = mplus
We’ve made two claims in the code above. We’ve said that a Parsec parser is an applicative functor (you know, whatever that is), and that we can chain parsers into a series of alternatives.
There doesn’t seem to be general agreement about whether it’s okay for a key in an urlencoded string to have no associated value, so we’ll assume that it is by wrapping the value up in
Maybe. If a key has a value (including an empty string), we’ll use
p_query :: CharParser () [(String, Maybe String)]
I’ll give the complete definition of the parser here, then walk through what the various bits mean.
p_query = pair `sepBy` char '&' where pair = liftA2 (,) (many1 safe) (optional (char '=' *> many safe)) safe = oneOf urlBaseChars <|> char '%' *> liftA2 diddle hexDigit hexDigit <|> ' ' <$ char '+' diddle a b = toEnum . fst . head . readHex $ [a,b] urlBaseChars = ['a'..'z']++['A'..'Z']++['0'..'9']++"$-_.!*'(),"
First of all, notice the complete lack of variables in the code. This is the reason behind calling these beasts applicative: we construct our code by combining applications of functions.
Now for a piecewise decomposition of the code.
pair `sepBy` char '&'
sepBy combinator takes two parsers as its arguments. It applies the left parser, then the right, then the first, accumulating the results of each left-hand parse in a list.
The definition of
pair is where things begin to become interesting.
pair = liftA2 (,) (many1 safe) (optional (char '=' *> many safe))
We’ll dissect this from the inside out, which leads us to consider this first.
char '=' *> many safe
*> combinator applies the parser on the left, throws away its result, then applies the parser on the right. You can think of the
> as “pointing at” the parser whose value is really to be returned. This implies that there’s probably a
<* combinator which throws away the result on the right, and indeed there is.
As for the parser on the right,
many applies a parser repeatedly, building up a list of results as it goes. It terminates as soon as the parser fails.
This snippet amounts to “match an equals sign, then run the
safe parser repeatedly until it fails, accumulating each result into a list”.
optional combinator turns a parser that must succeed into one that might fail.
optional :: (Alternative f) => f a -> f (Maybe a)
By wrapping our earlier snippet with
optional, we’ve turned it into a parser that will return
Nothing if our key isn’t followed by a value, or
Just wrapping the value if there is one.
The meaning of the
many1 safe snippet should be easy to guess:
many1 applies the
safe parser repeatedly, like
many, but it fails if its parser doesn’t succeed at least once. (If you’re used to thinking in terms of regular expressions,
many is like
liftA2 (,). This takes two parsers as arguments. It applies the first parser, then the second, then applies a function to their results. Here, the function is
(,), which constructs a tuple. Considered as a whole, this gives us a parser that applies two other parsers, and tuples up their results, returning the tuple as its result.
safe parser handles a single character of a key or value. Now that we’re beginning to become familiar with the notation, we can gloss a little over its details and concentrate on what’s new.
safe = oneOf urlBaseChars <|> char '%' *> liftA2 diddle hexDigit hexDigit <|> ' ' <$ char '+'
Most obvious is the
<|> combinator. It takes two parsers as arguments. If the left succeeds, its result becomes the result of
<|>. If it fails, the result becomes the result of the right parser. As the above code implies, we can chain uses of
The first “arm” of
safe parses a single character, taken from a set of acceptable characters. The second consumes a percent-encoded character, using the
liftA2 combinator that we saw earlier: it consumes a “%” character, followed by two hex digits, then applies
diddle to the digits to turn them into a single Haskell character.
diddle a b = toEnum . fst . head . readHex $ [a,b]
readHex function lives in the
Numeric module, and parses a hex string as an integer.
readHex :: (Num a) => String -> [(a, String)]
We’re now faced with a new combinator,
' ' <$ char '+'
This applies the parser on the right: if it succeeds, it throws away its result and instead returns the value on the left. In other words, if we match “+”, we return a space.
Here’s another example: here, we parse a series of RFC2822 or HTTP (or MIME, or what have you) headers into a list of name/value pairs, including the proper handling of continuation lines.
p_headers :: CharParser st [(String, String)] p_headers = header `manyTill` crlf where header = liftA2 (,) fieldName (char ':' *> spaces *> contents) fieldName = liftA2 (:) letter (many fieldChar) fieldChar = letter <|> digit <|> oneOf "-_" contents = liftA2 (++) (many1 notEOL <* crlf) (continuation <|> pure ) continuation = liftA2 (:) (' ' <$ many1 (oneOf " \t")) contents crlf :: CharParser st () crlf = (() <$ string "\r\n") <|> (() <$ newline) notEOL :: CharParser st Char notEOL = noneOf "\r\n"
Notice our use of the
<* combinator that we mentioned earlier, and once again the lack of let- or lambda-bound variables.
It can take a little while to get used to constructing and reading parsers in a wholly applicative style, but it does quickly start to feel both natural and appealing.
Compared to writing a parser in monadic style, the notation of the
Control.Applicative module leads to significantly more compact code. Some of the big wins in expressivity come from the attention of the authors to surprisingly small details, like the
<$ operators, and their precedences and associativities. These let us write chains of “pointy” functors, where we visually follow the arrows to see what’s really being used.
By the way, the complete name for applicative functors is a mouthful: they’re strong lax monoidal functors. These structures come to us from abstract algebra and category theory. If you’re a functional programmer and you’re not following the work of people like Conor McBride and Jeremy Gibbons, you really should: it’s a whole lot of fun when the seemingly ethereal world of abstract mathematics comes home to roost in Haskell.