First steps with Haskell text API improvement

Posted in haskell
17 comments on “First steps with Haskell text API improvement”
  1. Stephen Blackheath says:

    Bryan – This is great stuff! Thank you for your good works!

    I’m an extreme disliker of helper functions, and yet (contrary to what I said on #haskell) even I am starting to think that ‘strip’ is a good idea. Some observations:

    – I don’t like ‘dropAfter’ because it sounds like it drops everything after the first occurrence of the delimiter (from the left). I thought about ‘dropFinal’ but I’m starting to think ‘dropLast’ might be better because it’s consistent with ‘Prelude.last’.

    – I can’t think of anything better than ‘dropAround’. It seems good in that it is clear and memorable.

    – ‘stripLeft’ and ‘stripRight’ assume that left == start. I realize there is a precedent in Haskell: foldl, etc. However, this is a Unicode library and there are 600 million speakers of Arabic, Farsi and Hebrew world-wide. If Haskell does take over the world in spite of itself it would be nice not to annoy/confuse people.

    Making the names consistent with drop* doesn’t work. ‘head’ and ‘last’ (inspired by Prelude) don’t work either because it sounds like they work on a single character only. stripHeads is just plain clunky. So here’s an idea – how about some new terms Start and End? (They’re nouns – ‘Begin’ is a verb). These are conceptually consistent with ‘head’ and ‘last’:

    heads == start
    lasts == end

    – Instead of ‘strip’ you could consider ‘trim’, which is what Java uses – shorter and possibly clearer.

    Here they all are together:

    dropWhile, trimStart
    dropEnd, trimEnd
    dropAround, trim (or trimAround?)
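    Here is a minimal sketch of how the proposed trim family could behave. It assumes plain String input and uses Data.Char.isSpace to decide what gets trimmed. The names trimStart, trimEnd and trim are the proposals above. A Text version would operate on Text rather than [Char].

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Hypothetical String versions of the proposed names.

-- Drop leading whitespace (from the start of the string).
trimStart :: String -> String
trimStart = dropWhile isSpace

-- Drop trailing whitespace (from the end of the string).
trimEnd :: String -> String
trimEnd = dropWhileEnd isSpace

-- Drop whitespace from both ends by composing the one-sided versions.
trim :: String -> String
trim = trimEnd . trimStart

main :: IO ()
main = do
  print (trimStart "  hello  ")  -- "hello  "
  print (trimEnd   "  hello  ")  -- "  hello"
  print (trim      "  hello  ")  -- "hello"
```

    Since trim is just the composition of the two one-sided functions, only that pair needs its own implementation.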

    Less than perfect, I’m afraid, but hopefully there’s something useful in it. — Steve

  2. Nicolas Pouillard says:

    Great progress, thanks!

    About dropAfter: if I understand correctly, the following equation holds:
    dropAfter p = reverse . dropWhile p . reverse

    If so I propose using the words reverse or backward in the name:
    revDropWhile
    dropWhileBackward
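    The equation for dropAfter can be checked directly on lists. In this sketch, dropAfter is the proposed name, defined exactly as in the equation. Data.List.dropWhileEnd is an existing list function that computes the same result.

```haskell
import Data.List (dropWhileEnd)

-- dropAfter as given by the equation, over plain lists.
-- This name is the post's proposal, not something Data.List exports.
dropAfter :: (a -> Bool) -> [a] -> [a]
dropAfter p = reverse . dropWhile p . reverse

main :: IO ()
main = do
  let s = "hello!!!"
  -- Both drop the trailing run of '!' characters.
  print (dropAfter (== '!') s)     -- "hello"
  print (dropWhileEnd (== '!') s)  -- "hello"
```

    The two agree on finite lists. dropWhileEnd gets there without the two reversals.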

    I’m also in favor of trim{Start,End,}.

  3. This is all looking pretty good. I totally agree on the ubiquity of “chunksOf”. I always end up recreating it by some name, “groupsOf”, “breaklist”, etc etc. I wonder if just “chunk” would be a good name?

    I don’t really follow the logic of Stephen’s argument about Unicode. Surely the left and right in stripLeft and stripRight are referring to the underlying Haskell lists, which are always written x:y:z:[]. Stripping the left means stripping the head elements of the list. If writing Hebrew with Haskell allows one to write []:z:y:x I’d be very surprised!
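    Here is a small sketch of the point about list order, using a hypothetical String version of stripLeft. Text is stored in logical order, so stripLeft always removes characters from the head of the list. In a right-to-left script such as Hebrew, that head is the character a renderer draws rightmost. That is exactly the case behind Stephen's naming concern.

```haskell
import Data.Char (isSpace)

-- A hypothetical String version of the proposed stripLeft:
-- it drops whitespace from the head of the list.
stripLeft :: String -> String
stripLeft = dropWhile isSpace

-- Hebrew "shalom", spelled with escapes so the storage order is visible.
-- SHIN (U+05E9) is the first element of the list, although a
-- right-to-left renderer draws it as the rightmost letter.
shalom :: String
shalom = "\x05E9\x05DC\x05D5\x05DD"

main :: IO ()
main = do
  let padded = ' ' : shalom
  -- stripLeft removes the space stored before SHIN; on screen,
  -- that space appears to the right of the word.
  print (stripLeft padded == shalom)  -- True
  print (take 1 shalom == "\x05E9")   -- True
```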

  4. Mark Wotton says:

    I’d prefer “chomp” to either “strip” or “trim”, if a perlism isn’t considered too filthy…

  5. Arthur van Leeuwen says:

    I know for a fact that dropAfter is useful. However, why not name it in line with spanEnd and breakEnd in Data.ByteString.Strict, i.e. dropWhileEnd?

  6. Duncan Coutts says:

    The ‘split’ function was only in the Data.ByteString[.Lazy].Char8 modules, not in Data.List. So there’s no great history or existing standard practice that needs preserving. I’m not sure I’d bother with the Compat module.

  7. Programmer says:

    Chomp? What a horrid name. My vote is for strip or trim, in that order.

  8. brian says:

    stripLeft and stripRight seem fine to me. If they were named stripStart and stripEnd, I’d agree that they were badly named because of the language issue.

  9. Ian Taylor says:

    ‘trimLeft’, ‘trimRight’, ‘trim’ sound good to me.

    I always liked the sound of ‘join’ when dealing with text rather than ‘intercalate’. It goes well with split.
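    Here is a sketch of why join pairs naturally with split: for a non-empty delimiter d and a finite string s, join d (splitOn d s) gives back s. In this sketch, join is a hypothetical alias for Data.List.intercalate. splitOn is a hypothetical standalone helper written for illustration. The split package on Hackage provides a Data.List.Split.splitOn with the same shape.

```haskell
import Data.List (intercalate, isPrefixOf)

-- Hypothetical alias: "join" is just intercalate under another name.
-- (Prelude does not export a join, so there is no clash here.)
join :: [a] -> [[a]] -> [a]
join = intercalate

-- Hypothetical helper: split a list on every occurrence of a
-- non-empty delimiter, keeping empty fields between adjacent delimiters.
splitOn :: Eq a => [a] -> [a] -> [[a]]
splitOn []    _     = error "splitOn: empty delimiter"
splitOn delim input = go [] input
  where
    -- acc holds the current field in reverse order.
    go acc [] = [reverse acc]
    go acc s@(c:cs)
      | delim `isPrefixOf` s = reverse acc : go [] (drop (length delim) s)
      | otherwise            = go (c : acc) cs

main :: IO ()
main = do
  let parts = splitOn ", " "red, green, blue"
  print parts                 -- ["red","green","blue"]
  putStrLn (join ", " parts)  -- red, green, blue
```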

  10. I would like to see strip* included. I write these helpers in every Haskell project (as strip, lstrip, rstrip).

    You didn’t mention the split library on Hackage. Did you see it? Are there any more good ideas to be harvested from there?

    Great stuff.

  11. solrize says:

    strip/stripLeft/stripRight are in the spirit of the Python names for those functions, and Python probably has more in common with Haskell than Perl or Java do. So I’d stay with them.

  12. Greg says:

    chunksOf

    in Ruby this is Enumerable#each_slice.

    # Ruby
    (1..10).each_slice(3) {|a| p a} # [1,2,3] …

    -- Haskell
    eachSlice 4 "haskell.org" == ["hask","ell.","org"]

    I like chunksOf, groupsOf, or slicesOf.

  13. nbloomf says:

    IIRC the chunksOf function was implemented as groupBy in “On Lisp”.

  14. Keith says:

    I think chunksOf is generally useful enough to be in Data.List. Is it possible (in Haskell’?) to add functions like this that turn out to be general enough?

  15. Stephen Blackheath says:

    I second Arthur van Leeuwen’s “dropWhileEnd” suggestion.

  16. Johan Tibell says:

    I agree with Duncan that a Compat module is unnecessary. The number of modules listed at http://hackage.haskell.org/package/text is already rather intimidating. I also wouldn’t bother with splitChar unless it has serious performance benefits.

    I also prefer join to intercalate but I guess that boat already sailed. I don’t remember what the original argument was but if intercalate was chosen over join because of name clashes with monads I think that’s a poor reason since we have namespaces.

    A parting comment: Beware of the potential combinatorial explosion that comes from creating a helper function for every common case rather than relying on composition. Haskell’s lack of keyword arguments makes libraries prone to export lots of fooByBar functions for lots of different “Bars”. If lots of possible “configuration” parameters are absolutely needed for a function consider passing them in a record instead of creating separate functions.
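    Here is a sketch of the record-of-options style suggested here, applied to stripping. StripOptions, defaultStrip and stripWith are hypothetical names chosen for illustration. A single function configured by a record stands in for a family of stripStart/stripEnd/stripBy variants.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Hypothetical options record replacing separate fooByBar functions.
data StripOptions = StripOptions
  { fromStart :: Bool          -- drop matching characters at the start
  , fromEnd   :: Bool          -- drop matching characters at the end
  , predicate :: Char -> Bool  -- which characters count as strippable
  }

-- Defaults: strip whitespace from both ends.
defaultStrip :: StripOptions
defaultStrip = StripOptions { fromStart = True, fromEnd = True, predicate = isSpace }

-- One function whose behaviour is selected by the record.
stripWith :: StripOptions -> String -> String
stripWith opts = atEnd . atStart
  where
    p       = predicate opts
    atStart = if fromStart opts then dropWhile p else id
    atEnd   = if fromEnd opts then dropWhileEnd p else id

main :: IO ()
main = do
  print (stripWith defaultStrip "  hi  ")                            -- "hi"
  print (stripWith (defaultStrip { fromEnd = False }) "  hi  ")      -- "hi  "
  print (stripWith (defaultStrip { predicate = (== '-') }) "--hi--") -- "hi"
```

    Callers override only the fields they care about with record update syntax. Adding a new option then does not require a new exported function.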

  17. Andrew says:

    Good stuff.

    chunksOf :: Int -> [a] -> [[a]]
    please.
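    Here is a minimal list version with the requested signature, offered as a sketch. It assumes the last chunk may be shorter than n and treats a non-positive n as an error. Other libraries make different choices for that case. Greg's eachSlice example above corresponds to chunksOf 4.

```haskell
-- A sketch of chunksOf over plain lists.
-- The final chunk may be shorter than n; n <= 0 is rejected here.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n _ | n <= 0 = error "chunksOf: chunk size must be positive"
chunksOf _ [] = []
chunksOf n xs = chunk : chunksOf n rest
  where
    -- Take the next n elements; the remainder is chunked recursively.
    (chunk, rest) = splitAt n xs

main :: IO ()
main = do
  print (chunksOf 4 "haskell.org")     -- ["hask","ell.","org"]
  print (chunksOf 3 [1 .. 10 :: Int])  -- [[1,2,3],[4,5,6],[7,8,9],[10]]
```

    Each chunk is emitted before the recursive call is evaluated, so the function also works lazily on infinite lists, e.g. take 2 (chunksOf 3 [1 ..]).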
