Even though I wrote my Haskell blog helper tool purely for my own use, I don’t want to store hard-coded strings in it, lest my username and password escape into the wild.
This suggests that I need a small config file of some kind. I’m going to walk through the parser I wrote for this config file, not as a tutorial, but as an example of how to solve a simple pratical problem in Haskell.
The simplest kind of config file format that’s of any real use tends to look like this:
# Haskell blog config. xmlrpc = http://www.example.com/wordpress/xmlrpc.php editpost = http://www.example.com/wordpress/wp-admin/post.php?action=edit&post= user = blogdude # comments flow to the end of a line password = h8x
From inspection, we can see a few informal rules. Config items are name/value pairs. Empty lines are okay, and comments start with #, spanning to the end of a line.
This is a format that’s easy to parse by hand or with regular expressions, but I lately prefer to use Parsec for these kinds of jobs. Regexps are difficult to read and debug, and often slower than Parsec parsers. Compared to a handwritten parser, a Parsec parser sacrifices a little performance, but I win in considerably faster development, clearer code, and excellent error messages “for free”.
Here follows the boilerplate for the beginning of the module. We’ll
only be exporting two names from
> module ConfigFile ( > Config, > readConfig > ) where
We’ll be needing a few modules.
> import Char > import Control.Monad > import qualified Data.Map as Map > import Text.ParserCombinators.Parsec > import Data.Either > import Data.Maybe
The natural result of parsing a config file, at least in my mind, is a finite map from keys to values, where both are represented as strings.
> type Config = Map.Map String String
In my case, I’ll start with the left-hand-side of a config item, which I’d like to be a “C-like” identifier. The first character should be an alphabetic letter or underscore character. There might be only one character in an identifier, but if there are more, I’ll be expansive, and allow them to be digits, too.
> ident :: Parser String > ident = do c <- letter <|> char '_' > cs <- many (letter <|> digit <|> char '_') > return (c:cs) > <?> "identifier"
This definition provides a clear illustration of how readable Parsec
code can be. (By the way, the
<?> means “this is the name to use
when printing an error message”.)
When I’m developing a parser, I like to work from the bottom up,
starting with simple productions and moving “up the food chain” as I
debug each production. I’ll typically use Parsec’s
function repeatedly from within
ghci to test each production as I
go. Testing interactively from within Emacs makes the process even
more efficient; I can reload a module and start testing it with just a
The result of a successful match is as we’d expect:
*ConfigFile> parseTest ident "ok" "ok"
"([a-aA-Z_]\w*)" in Perl-like regexp notation, the
Parsec description of
ident is much longer. But if a match fails,
we get a useful error message, which is a good reason to prefer
Parsec. (When I use a normal Parsec entry point instead of
parseTest, Parsec will give me a file name in its error message,
*ConfigFile> parseTest ident "7" parse error at (line 1, column 1): unexpected "7" expecting identifier
Comments are easily dealt with.
> comment :: Parser () > comment = do char '#' > skipMany (noneOf "\r\n") > <?> "comment"
We’d like to be agnostic about line endings.
> eol :: Parser () > eol = do oneOf "\n\r" > return () > <?> "end of line"
Now to parse an actual config item.
> item :: Parser (String, String) > item = do key <- ident > skipMany space > char '=' > skipMany space > value <- manyTill anyChar (try eol <|> try comment <|> eof) > return (key, rstrip value) > where rstrip = reverse . dropWhile isSpace . reverse
manyTill combinator builds up a result from each match of the
parser that is its first argument, until the parser that is its second
argument successfully matches. After matching a config item, we
return it as a pair, stripping any trailing white space from the
A line can either be empty, or contain a comment or a config item.
This makes it a good candidate for using the
Maybe class. If we
match a comment, we’ll return
Nothing; if we match a config item, we
Just that item.
> line :: Parser (Maybe (String, String)) > line = do skipMany space > try (comment >> return Nothing) <|> (item >>= return . Just)
skipMany space above will happily consume newlines, so we
don’t need to explicitly check for empty lines. It also consumes
Finally, we need to parse an entire file. This is an example of how
it helps to spend some time browsing the standard Haskell libraries.
In this case, we’ll use
Data.Maybe to turn our list
Maybe values into a list of the
Just config items (think of it
as dropping all of the
Nothing entries from the list, and stripping
Just from every other entry).
> file :: Parser [(String, String)] > file = do lines <- many line > return (catMaybes lines)
readConfig action parses a config file, so it must run in the
IO monad. If the parse fails, it returns a parse error; on success,
it returns a
> readConfig :: SourceName -> IO (Either ParseError Config) > readConfig name = > parseFromFile file name >>= > return . fmap (foldr (uncurry Map.insert) Map.empty . reverse)
This is a dauntingly dense definition. Rather than jumping in to explain it piecewise, let’s first turn the code into something more like “beginner Haskell”.
> readConf2 :: SourceName -> IO (Either ParseError Config) > readConf2 name = > do result <- parseFromFile file name > return $ case result of > Left err -> Left err > Right xs -> Right (listToMap (reverse xs))
Okay! We get the result of a parse; if the parse was an error, we pass the error along unmodified. Otherwise, we turn the list into a map, and return it.
> listToMap :: [(String, String)] -> Config > listToMap ((k,v):xs) = Map.insert k v (listToMap xs) > listToMap  = Map.empty
A Haskell programmer with a little bit of experience will notice that
the above function looks almost like a fold from the right of a list.
The only problem is the
(k,v) pattern match that we used to “pick
apart” the arguments to
Map.insert, so that we could pass it the
three arguments it wants.
Map.insert :: (Ord k) => k -> v -> Map.Map k v -> Map.Map k v
It would be great if
Map.insert took arguments, instead of three.
We can fix this problem using
uncurry Map.insert :: (Ord k) => (k, v) -> Map.Map k v -> Map.Map k v
So now we can rewrite
listToMap in terms of
> listToMap3 :: [(String, String)] -> Config > listToMap3 = foldr (uncurry Map.insert) Map.empty
We can eliminate the
case expression in
readConf2 by noticing that
Either is an instance of Haskell’s
Functor class. This means that
we can use the
fmap function. Calling
fmap f on
Left l will return
Left l, but
fmap f (Right r) will return
Right (f r), which is exactly what we want.
foldr, it’s now easy to go backwards from
the relatively chatty definition of
readConf2 to the more austere
It’s possible to pare
readConfig back even further, so that it’s
entirely point-free, but this renders the code more confusing, at
least to my eyes.
> readConf3 = > (return . fmap (foldr (uncurry Map.insert) Map.empty . reverse) =<<) . > parseFromFile file
I’m in no way claiming that this parser is the be-all and end-all of config file parsers. It doesn’t care if a value is redefined (it will use the last definition); it doesn’t impose any logical structure or constraints on the contents of a file; and it turns everything into strings. But it was quick to write (about an hour, not including the time to write this article); it doesn’t have any dependencies on third-party libraries; and it perfectly fits the needs of my trivial blog helper.