Every language’s standard library has its weak spots. In C, for example, the
stdio functions don’t have a consistent notion of where the
FILE * belongs in the argument list. For
fwrite, it goes at the end; for
fseek, it’s at the beginning. This makes it harder to abstract away the details of the API. Instead of remembering “the
FILE * always goes at the front”, I have to memorise the order in which each specific function takes its arguments. Of such niggles are programmer annoyance born.
Haskell isn’t immune to poor API design. Partial function application is ubiquitous, as is the use of maps (the equivalent of hash tables, dicts, or what have you in imperative languages). Unfortunately, although the
Data.Map API is admirably thorough, it firmly resists partial application.
Yesterday, I wrote a quick program to find out, for a given GHC RPM, what versions of various libraries it’s bundled with. The easiest way to do this is by parsing the output of “
rpm -ql ghc682“, looking for lines like this (library name and version in bold):
This information is easy to pull out using a regexp.
type PkgName = String type PkgVersion = String packageInfo :: FilePath -> Maybe (PkgName, PkgVersion) packageInfo path = do ([_, name, version]:_) <- path =~~ ".*/lib/([^/]+)-([0-9][.0-9]*)" return (name, version)
If I slurp the entire output of
rpm into a single string, I can build a map of name to version quite cleanly. (I’m importing the
Data.Map module under the name
buildMap :: String -> M.Map PkgName PkgVersion buildMap = foldl' updateMap M.empty . map packageInfo . lines
The problem is that
updateMap isn’t as tidy as I’d like, because
Data.Map‘s functions are generally not friendly to partial application.
updateMap :: PkgMap -> Maybe PkgInfo -> PkgMap updateMap map (Just (name, version)) = M.insert name version map updateMap map _ = map
foldl' wants the accumulator (my map) as its first argument, but
M.insert for some reason puts the map argument at the end of its list. As a result, I had to write the above bulky definition for
updateMap to fix up the argument ordering.
Here’s an alternative order of arguments for
insertM :: (Ord a) => M.Map a b -> a -> b -> M.Map a b insertM map key value = M.insert key value map
As you can see, it just moves the map argument to the front. If the API looked like this, my
updateMap function would be considerably shorter. I could get rid of the pattern matching and use
updateMap' m = maybe m (uncurry (insertM m))
It’s a happy accident that
foldl' wants its arguments in what I think of as the “natural” order for partial application. Just so we don’t conflate the two considerations, a comparable example that has nothing to do with folds is
M.findWithDefault, which has the following type.
findWithDefault :: (Ord k) => a -> k -> Map k a -> a
This takes a default value as its first argument, which is returned if the key isn’t present in the map. Viewed through the “does it make sense for partial application?” lens, this is good:
findWithDefault "foo" gives us a function that always returns
"foo" if a lookup fails.
Weirdly, the second argument is the key to look up, not the map to perform the lookup in. So
findWithDefault "foo" "bar" gives me a function that will look for the key
"bar" in a given map. In practice, I’ve found that this is almost never the order of arguments I want. Just about always, I find myself wanting “given this map, look up some key” instead.
There is, to be sure, an element of judgment and experience in deciding on the order of arguments to a function. However, unlike in C, where argument ordering is merely an annoyance, making an API harder to learn, in Haskell this decision has practical consequences: it directly affects how usable your API is.
If you find yourself having to write noddy adapter functions or use
flip too frequently, chances are you’re using an API that didn’t have enough thought put into partial application.
The problem is that lots of folks are going to want the value at the end. Say I want a map with something inserted, but I don’t know what’s inserted until later. Or say I want a foldr, which is a -> b -> b instead if a foldl’ which is a -> b-> a ? Then the map as the first argument makes a great deal more sense. Anyway, looks like a job for a catMaybes in the pipeline rather than messing with the insert function at all, and as a bonus, you don’t even need to write a helper function!
Using foldr to build a map gives you a gigantic space leak. Try it. If you’re lucky, your compiler’s strictness analyser might rescue you from the leak, but I wouldn’t count on it.
Regarding catMaybes, there’s no real difference in the overall shape of the code. It just moves the case analysis to another place in the pipeline.
What if I want to write:
insert k1 v1 $ insert k2 v2 $ insert k3 v3 map
The order for insert is the same used for filter, map & friends, with the container in the end. Searching on the mailing lists there’s quite a few discussions on which order of arguments is the best. Usually the library’s author use cases favor one order while some users prefer a different one. There’s no golden rule, other than following the most usual patterns. For collections it’s usual to place the container last (otherwise it ends different than List).
Foldr should also get you fusion in this case though, no?
In any case, is
buildMap = foldl’ (flip $ uncurry M.insert) M.empty . catMaybes . map packageInfo . lines
really all *that* bad? Especially since catMaybes should get you some fusion as well.
Daniel, filter and map follow my general prescription by placing the less frequently varying argument first.
As for your suggested use of insert, it just doesn’t arise much in practice, so why design the API with that in mind?
Regarding your broader point, you’ll notice that where you say there’s no golden rule, I mentioned the element of judgment and experience. I believe we’re actually somewhat in agreement.
sclv, fusion doesn’t really have anything to do with the space leakage issue. The problem is that using foldr builds up a big unevaluated chain of thunks like so:
M.insert k1 v1 (M.insert k2 v2 (M.insert k3 v3 … M.empty)))
Using foldl’ instead causes the accumulator to be evaluated to WHNF at each step.
Yes, having the map as the final argument to map-“mutating” functions means you’ll have to use a flip now and then to make things foldl’ friendly, but is that really too high a price to pay for logical consistency with the other common container interfaces?
In the case of Data.Map’s interface, you’ll almost never want to reuse a partially applied “mutating” function, such as insert; they are effectively one-time-use calls for each map-key-value triple. (The only common exception is to specialize on the value, such as when using (flip (insertWith (+)) 1) to count keys.) Thus every call is likely to require new values for all three arguments, and no argument ordering is going to be optimal for all scenarios. For foldl’-driven scenarios, having the map as the first argument is certainly convenient, but then again it’s pretty easy to define:
ffoldl’ f = foldl’ (flip f)
and use it whenever you want foldl’ semantics but the accumulator as the 2nd argument. To use your scenario, for example:
buildMap = ffoldl’ (maybe id (uncurry M.insert)) M.empty . map packageInfo . lines
OCaml solves this with labeled arguments; do these interact badly with Haskell’s features? I’m not sure. Perhaps extensible record encodings allow you to fake it somewhat.
Incidentally, “insert” also follows the least-varying-argument-first rule (what I call “de Bruijn-like ordering”). After all, it works by induction on the dictionary data structure. That’s the argument that varies the most.
Putting the ‘target’ element last makes them easier to use with higher order state-manipulating functions like ‘gets’ and ‘modify’. Suppose you have a class of monads with a stack state; then you get mpush = modify . push or in pointy style, mpush x = modify (push x). If you put the target first, you need a flip in both cases.
Just my 2p… (or maybe that should be 1p at today’s exchange rates 😉