Applicative functors are gorgeous and versatile creatures, but as is common in Haskell, they lack a little in documentation. The paper that Conor and Ross wrote introducing them is good, but dense. What if we were to skip all the scene-setting kerfuffle, and plunge into using them by example?
I won’t attempt to describe what applicative functors actually are, because the idea is easy to absorb: we’ll pick it up as an incidental product of figuring out how to use them. Bear with me.
To begin with, we need a motivating example, and we’ll choose something simple, yet chewy: parsing an
application/x-www-form-urlencoded string. Remember, they look like this:
While we’ll use Parsec as our parsing library, we’ll give it a completely new skin.
module ApplicativeParsec ( -- Re-export the contents of these modules, for convenience. module Control.Applicative , module Text.ParserCombinators.Parsec ) where import Control.Applicative import Control.Monad (MonadPlus(..), ap) -- Hide Parsec's definitions of some Applicative functions. import Text.ParserCombinators.Parsec hiding (many, optional, (<|>)) -- Every Monad is an Applicative. instance Applicative (GenParser s a) where pure = return (<*>) = ap -- Every MonadPlus is an Alternative. instance Alternative (GenParser s a) where empty = mzero (<|>) = mplus
We’ve made two claims in the code above. We’ve said that a Parsec parser is an applicative functor (you know, whatever that is), and that we can chain parsers into a series of alternatives.
There doesn’t seem to be general agreement about whether it’s okay for a key in an urlencoded string to have no associated value, so we’ll assume that it is by wrapping the value up in
Maybe. If a key has a value (including an empty string), we’ll use
p_query :: CharParser () [(String, Maybe String)]
I’ll give the complete definition of the parser here, then walk through what the various bits mean.
p_query = pair `sepBy` char '&' where pair = liftA2 (,) (many1 safe) (optional (char '=' *> many safe)) safe = oneOf urlBaseChars <|> char '%' *> liftA2 diddle hexDigit hexDigit <|> ' ' <$ char '+' diddle a b = toEnum . fst . head . readHex $ [a,b] urlBaseChars = ['a'..'z']++['A'..'Z']++['0'..'9']++"$-_.!*'(),"
First of all, notice the complete lack of variables in the code. This is the reason behind calling these beasts applicative: we construct our code by combining applications of functions.
Now for a piecewise decomposition of the code.
pair `sepBy` char '&'
sepBy combinator takes two parsers as its arguments. It applies the left parser, then the right, then the first, accumulating the results of each left-hand parse in a list.
The definition of
pair is where things begin to become interesting.
pair = liftA2 (,) (many1 safe) (optional (char '=' *> many safe))
We’ll dissect this from the inside out, which leads us to consider this first.
char '=' *> many safe
*> combinator applies the parser on the left, throws away its result, then applies the parser on the right. You can think of the
> as “pointing at” the parser whose value is really to be returned. This implies that there’s probably a
<* combinator which throws away the result on the right, and indeed there is.
As for the parser on the right,
many applies a parser repeatedly, building up a list of results as it goes. It terminates as soon as the parser fails.
This snippet amounts to “match an equals sign, then run the
safe parser repeatedly until it fails, accumulating each result into a list”.
optional combinator turns a parser that must succeed into one that might fail.
optional :: (Alternative f) => f a -> f (Maybe a)
By wrapping our earlier snippet with
optional, we’ve turned it into a parser that will return
Nothing if our key isn’t followed by a value, or
Just wrapping the value if there is one.
The meaning of the
many1 safe snippet should be easy to guess:
many1 applies the
safe parser repeatedly, like
many, but it fails if its parser doesn’t succeed at least once. (If you’re used to thinking in terms of regular expressions,
many is like
liftA2 (,). This takes two parsers as arguments. It applies the first parser, then the second, then applies a function to their results. Here, the function is
(,), which constructs a tuple. Considered as a whole, this gives us a parser that applies two other parsers, and tuples up their results, returning the tuple as its result.
safe parser handles a single character of a key or value. Now that we’re beginning to become familiar with the notation, we can gloss a little over its details and concentrate on what’s new.
safe = oneOf urlBaseChars <|> char '%' *> liftA2 diddle hexDigit hexDigit <|> ' ' <$ char '+'
Most obvious is the
<|> combinator. It takes two parsers as arguments. If the left succeeds, its result becomes the result of
<|>. If it fails, the result becomes the result of the right parser. As the above code implies, we can chain uses of
The first “arm” of
safe parses a single character, taken from a set of acceptable characters. The second consumes a percent-encoded character, using the
liftA2 combinator that we saw earlier: it consumes a “%” character, followed by two hex digits, then applies
diddle to the digits to turn them into a single Haskell character.
diddle a b = toEnum . fst . head . readHex $ [a,b]
readHex function lives in the
Numeric module, and parses a hex string as an integer.
readHex :: (Num a) => String -> [(a, String)]
We’re now faced with a new combinator,
' ' <$ char '+'
This applies the parser on the right: if it succeeds, it throws away its result and instead returns the value on the left. In other words, if we match “+”, we return a space.
Here’s another example: here, we parse a series of RFC2822 or HTTP (or MIME, or what have you) headers into a list of name/value pairs, including the proper handling of continuation lines.
p_headers :: CharParser st [(String, String)] p_headers = header `manyTill` crlf where header = liftA2 (,) fieldName (char ':' *> spaces *> contents) fieldName = liftA2 (:) letter (many fieldChar) fieldChar = letter <|> digit <|> oneOf "-_" contents = liftA2 (++) (many1 notEOL <* crlf) (continuation <|> pure ) continuation = liftA2 (:) (' ' <$ many1 (oneOf " \t")) contents crlf :: CharParser st () crlf = (() <$ string "\r\n") <|> (() <$ newline) notEOL :: CharParser st Char notEOL = noneOf "\r\n"
Notice our use of the
<* combinator that we mentioned earlier, and once again the lack of let- or lambda-bound variables.
It can take a little while to get used to constructing and reading parsers in a wholly applicative style, but it does quickly start to feel both natural and appealing.
Compared to writing a parser in monadic style, the notation of the
Control.Applicative module leads to significantly more compact code. Some of the big wins in expressivity come from the attention of the authors to surprisingly small details, like the
<$ operators, and their precedences and associativities. These let us write chains of “pointy” functors, where we visually follow the arrows to see what’s really being used.
By the way, the complete name for applicative functors is a mouthful: they’re strong lax monoidal functors. These structures come to us from abstract algebra and category theory. If you’re a functional programmer and you’re not following the work of people like Conor McBride and Jeremy Gibbons, you really should: it’s a whole lot of fun when the seemingly ethereal world of abstract mathematics comes home to roost in Haskell.
Instead of liftA2, you can also use another nice set of Applicative combinators, and .
liftA2 foo x y
foo x y
One of the nice things about this is that it generalizes easily to liftA3, liftA4, … that is,
liftAn foo x1 x2 … xn
foo x1 x2 … xn.
(For those who might care, note that is just a synonym for fmap! And applies a function “inside” an applicative to a value inside another applicative.)
Argh! it ate all my angle brackets! ok, take two, fingers crossed…
Instead of liftA2, you can also use another nice set of Applicative combinators, <$> and <*>.
liftA2 foo x y
foo <$> x <*> y
One of the nice things about this is that it generalizes easily to liftA3, liftA4, â€¦ that is,
liftAn foo x1 x2 â€¦ xn
foo <$> x1 <*> x2 <*> â€¦ <*> xn.
(For those who might care, note that <$> is just a synonym for fmap! And <*> applies a function â€œinsideâ€ an applicative to a value inside another applicative.)
It’s interesting how the point free style of parser combinators are coming back into vogue after being displaced by monadic combinators for a while.
The point free combinators was how I started (as pioneered(?) by Burge in “Recursive Programming Techniques” in the 70s).
Thanks for this example, which I found through Brent Yorgey’s ‘The Typeclassopedia’ article in ‘The Monad Reader’, Issue 13.
For those not too familiar with Parsec (like myself), here are some examples, using Hugs:
ApplicativeParsec> parseTest p_query “name=bryan+o%27sullivan&city=san+francisco”
[(“name”,Just “bryan o’sullivan”),(“city”,Just “san francisco”)]
You need the parseTest function to run the example interactively.
ApplicativeParsec> parseTest p_headers “Subject: RFC2822 or HTTP (or MIME, or what have you\n\n”
[(“Subject”,”RFC2822 or HTTP (or MIME, or what have you”)]
Note the 2 newlines at the end, which terminate the input in this text format.
I replaced the liftA2 in the source code with and , as explained above.
and forgot all about the < and > escape sequences, also as above. Tss,Tss…sorry!
But to parse the non-context free string of n ‘a’ followed by n ‘b’ followed by n ‘c’ characters you do need the monadic structure. Example taken from:
‘Combinator Parsing: A Short Tutorial’ by S.Doaitse Swierstra
p_abc :: CharParser st String
p_abc = do as <- many (char ‘a’)
let n = length as
bs <- pnTimes n (char ‘b’)
cs <- pnTimes n (char ‘c’)
return (as ++ bs ++ cs)
pnTimes :: Int -> CharParser st a -> CharParser st [a]
pnTimes 0 p = return 
pnTimes n p = (:) <$> p <*> pnTimes (n-1) p
ApplicativeParsec> parseTest p_abc “aaabbbccc”
Harkening back to the original premise of this entry, I would like to plug the versatility aspect of the applicatives on functors as opposed to using the monadic machinations.
The problem with using monads and bind comes in the type situation. The monads have to match if outside of bind for the next level monad down, as in this situation:
catMaybes $ [1..12] >>= ((\x -> if x >> return) >>= (liftM (* 8) >>> return)
Using applicative operators along with arrows you get the above with the catMaybe on the inside of the computational unit which is not possible as far as I can see with a single monad construction, without resorting to using the ($) outside of the application. Here is the same thing using the () from Control.Applicative and a new operator that I built, () that just extends down another functor layer:
:: forall a b (f1 :: * -> *) (f :: * -> *).
(Functor f, Functor f1) =>
(a -> b) -> f1 (f a) -> f1 (f b)
*Big3> ((\x -> if x < 5 then Nothing else Just x) ) >>> ((* 8) ) >>> catMaybes $ [1..12]
I will have to delve into Parsec module someday, but been having WAY to much fun, mucking about with other things.
(* 8) that is supposed to be times 8 — eight and came out a smiley on my monitor in my comment above.. oh well.
Very informative article. I like is especially b’coz it has contained all the key points very efficiently. By going through the info, I can see the research behind the information.
Thanks a lot for sharing such a good stuff.