Subscribe to
Posts
Comments

With Jens Petersen’s blessing, I’ve packaged GHC 6.6.1 for Fedora Extras. If you use FC6, it’s available via yum as of a few days ago. It will be a part of Fedora 7 as soon as that comes out, too.

The upgrade to 6.6.1 necessitated a bump of the release number of the Fedora Gtk2Hs package, too. There are no changes in the new release; it’s just built against the newer compiler release.

A few weeks ago, I spent a little time porting Peter Norvig’s Python code from his article How to Write a Spelling Corrector to Haskell. It’s a concise vehicle for a few Haskelly topics that don’t get much airing in front of a general audience: what idiomatic Haskell looks like, and the dreaded space leak.

Bear in mind as you read this that I’m porting Norvig’s algorithm, not implementing the approach that would be best suited to Haskell (dynamic programming). And I’m trying to do so idiomatically, so that the resulting code looks like decent Haskell. This isn’t an attempt to “golf” the code for either performance or terseness. Neither am I trying to show that either language is better than the other. Whew, enough with the disclaimers.

A Haskell module starts with a series of import statements that make names from other modules available in our namespace. A typical import section is likely to explicitly list every name it imports from a module, as here.

> import Control.Monad (liftM)
> import Data.Char (isAlpha, toLower)
> import Data.List (maximumBy)

There are some good reasons to list every name we import. It’s a helpful piece of self-documentation. By naming everything we import, we can later see which potentially unfamiliar module we’re importing a name from, without needing to search documentation or scratch our heads.

Explicit naming also reduces the likelihood of namespace collisions. It’s extremely common for Haskell modules to export functions with identical names but different types. If we import three modules that each export the name foo, but we only import the name foo from one of those modules, it’s clear to both the compiler and our reader which one we must want to use.

Another good way to avoid namespace collisions while preserving readability is to import all names from a module en masse, but in such a way that we have to qualify those names when we refer to them. Here’s what I mean by this.

> import qualified Data.Map as M

Given the above qualified import, if we want to refer to the name empty as exported by Data.Map, we’ll need to write M.empty.

Now for our first mention of performance. Haskell’s built-in String type is well known to be death to performance, as it’s no more than a linked list of boxed Chars. The effect this has on performance is easily measurable on data sets as small as a few hundred kilobytes, even on a fast CPU. The modern response is to use the newer, much faster ByteString types.

There are two kinds of ByteString. The “lazy” variant follows the Haskell norm of deferring work until it’s actually needed, while the “strict” variant completes its work immediately. To give you an idea of what this means, the lazy ByteString version of the readFile function reads a file in small chunks, on demand, while the strict variant reads the entire file into memory in one go. Strict ByteStrings can beat their lazy cousins on performance metrics, but this is fairly rare. In this case, when I ran some time measurements, they showed lazy beating strict by a factor of two.

> import qualified Data.ByteString.Lazy.Char8 as B

Let’s return to namespace clashes for a moment. The ByteString modules each contain dozens of names that clash with names defined in Prelude, the module that is always implicitly imported into every Haskell module. This is why we import them using the qualified mechanism. (And by the way, switching from lazy to strict ByteStrings throughout this module is a simple matter of deleting the .Lazy string from the imported module name above.)

In his Python spellchecker, Norvig uses a Python defaultdict, in other words a hash table, to hold his model data. Hash tables aren’t suitable for use in pure functional code, so we use a Haskell Map instead.

> type Model = M.Map B.ByteString Int

Norvig’s words function has a similar form in Haskell.

> -- def words(text): return re.findall('[a-z]+', text.lower()) 

> mkWords :: B.ByteString -> [B.ByteString]

> mkWords = filter (not . B.null) . B.splitWith (not . isAlpha) . B.map toLower

Our mkWords is a composed pipeline of functions, which we read from right to left.

  • B.map toLower returns a lower-cased copy of its input string.
  • B.splitWith (not . isAlpha) returns a list of strings, split everywhere that isAlpha returns False.
  • filter (not . B.null) eliminates any empty strings from the result of the call to B.splitWith.

The mkWords function is written in “point free” style, meaning that although it takes an argument, the argument isn’t referred to in the function definition. (To make matters confusing, the word “point” in “point free” really means “value”, not any kind of punctuation symbol.) Point free style can make one-line functions clearer, but it makes longer definitions much harder to understand and modify.

Notice that my mkWords function has a type signature. None of the type signatures in this module is actually necessary; the compiler can infer correct types for every value without ambiguity. But each one makes the code more readable to me, which is a huge benefit. If I were trying to golf this code, I’d have killed off all of the type signatures, and along with it about half of the code’s grokkability. For example, I’d have to work out for myself that the point-free function definition above is in fact a function of only one argument, and what its type is.

Where there’s a clear similarity between the Python and Haskell words functions, the train function has a radically different form in Haskell.

> -- def train(features):
> --     model = collections.defaultdict(lambda: 1)
> --     for f in features:
> --         model[f] += 1
> --     return model

> train :: [B.ByteString] -> Model

> train = foldl' updateMap M.empty
>     where updateMap model word = M.insertWith' (+) word 1 model

The observation here is that building the model is actually a “reduction” or “fold” operation: we start with an empty model, and fold over the input. Each fold/reduction step constructs a new model from the previous model and the next piece of input.

The Python code is a fold, too; there’s just not a clean way to write a fold that augments a dict in Python. So instead we have a loop that somewhat obfuscates the fact that we’re folding.

Folding isn’t an especially hard thing for neophytes to grasp, but here lurk a few demons. It is in the definition of train that Haskell’s laziness holds the power to bite us, in the form of the dreaded space leak. There are two potential pitfalls here, one of which nearly bit me as I was writing this article.

Our first pitfall is that there are two ways we can fold over a list. We can consume it from left to right, using foldl', or from right to left, using foldr. The choice of which order to use is not at all arbitrary.

The strict left-to-right fold is best for cases when an intermediate result is of no use to our caller. This is one of these cases: the caller will use the final complete Model; it can’t see or do anything with incremental pieces of it.

Note that if we didn’t use the strict left-to-right fold foldl', but instead used the lazy variant (foldl) we’d potentially get a space leak. Instead of constructing a new value for the next step in the fold, each step would construct a lazily deferred expression (called a thunk) that wouldn’t be evaluated until the entire fold had completed. As a result, we’d construct and return a huge thicket of thunks instead of a single tidy value.

A lazy right-to-left fold is frequently used in lazy functional programs, as it’s suitable for a caller that will lazily consume a result. However, if the caller needs a complete result (as in our case), foldr will generate a pile of thunks as it recurses down its input list, and they won’t be consumed until foldr reaches the end of the list.

Understanding when to use foldr or foldl' isn’t actually difficult; it’s just not something that newcomers to Haskell will naturally think about. Worse, the compiler can hide some of the effects of the choice of fold under some circumstances. When I first wrote this article, I was using foldl', but there seemed to be no difference in performance or space usage compared to foldl (lazy), so I switched to that. GHC’s strictness analysis was preventing me from noticing the problem with foldl at high optimisation levels, because it turned the lazy fold into a strict one for me.

Our second potential problem is that the choice of fold direction isn’t the only possible source of a space leak above. The expression M.insertWith' (+) word 1 model does one of two things when adding a key to the Model map.

  • If no such key exists in the map, it adds a new entry, with a key of word and value of 1.
  • If the key is already present, it creates a new map with the entry’s value updated, incremented by one.

The potential problem here is that in the “update an existing key” case, nothing is immediately going to inspect the new value. Thus a Haskell implementation will construct a thunk to defer its evaluation. Repeated updates of the same entry will thus construct a series of thunks of the form 1 + 1 + 1 + ..., instead of a single Int, and this pile of thunks won’t be evaluated until the first time someone tries to use that value.

Here, we’ve avoided this possible space leak by using the M.insertWith' function, which is deliberately strict: it forces the new value to be evaluated immediately. Haskell functions that are stricter than the default usually have a tick (') character at the ends of their names.

We now have two examples—in a simple-looking two-line function!—to demonstrate that avoiding space leaks in Haskell can be a subtle business for the uninitiated.

Space leaks are bad not merely for the memory they consume, but also because of their time effects. Build a big data structure in a leaky way, and the garbage collector will expend a lot of futile cycles traversing it.

And now, on we go to our next stop. The edits1 function in Python generates a set of all single-character edits of a word.

> -- def edits1(word):
> --     n = len(word)
> --     return set([word[0:i]+word[i+1:] for i in range(n)] +
> --                [word[0:i]+word[i+1]+word[i]+word[i+2:] for i in range(n-1)]
> --                [word[0:i]+c+word[i+1:] for i in range(n) for c in alphabet]
> --                [word[0:i]+c+word[i:] for i in range(n+1) for c in alphabet])

> edits1 :: B.ByteString -> [B.ByteString]

> edits1 word = concat
>     [[h `B.append` B.tail t | (h, t) <- hts'],
>      [B.concat [h, (B.take 1 . B.drop 1) t, B.take 1 t, B.drop 2 t] | (h, t) <- hts],
>      [B.concat [h, B.singleton c, B.tail t] | (h, t) <- hts', c <- alpha],
>      [B.concat [h, B.singleton c, t] | (h, t) <- hts, c <- alpha]]
>   where hts = [B.splitAt n word | n <- [0..fromIntegral len]]
>         hts' = take (len - 1) hts
>         len = fromIntegral (B.length word)
>         alpha = ['a'..'z']

If you find the above Haskell code less readable than its Python counterpart, this is not especially unusual. String manipulation code looks particularly clean in Python, at least to my eyes, while the corresponding Haskell is often more clunky.

The Haskell version of edits1 dispenses with sets, instead returning a list of strings. This list can contain duplicate entries, while the set returned by the Python version cannot. It’s surprisingly expensive to try to eliminate the duplicates in Haskell, but having them present has no effect on correctness, as far as I can tell.

Our penultimate stop is the correct function. The Python version uses the short-circuiting or operator to avoid the potentially expensive work of searching for single-character edits if the word is already present in the model, and it avoids the even more expensive search for two-character edits if it finds a suitable single-character edit.

> -- def known_edits2(word):
> --     return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

> -- def known(words): return set(w for w in words if w in NWORDS)

> -- def correct(word):
> --     candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
> --     return max(candidates, key=lambda w: NWORDS[w])

Our Haskell counterpart looks horribly inefficient by comparison. Oh noes! It constructs a complete list of all single- and two-character edits!

> correct :: Model -> B.ByteString -> B.ByteString

> correct model word = (maximumBy cmpFreq . head . filter (not . null))
>                      (map known [[word], edits1 word, edits2 word] ++ [[word]])
>     where known = filter (`M.member` model)
>           edits2 = concatMap edits1 . edits1
>           findFreq word = M.findWithDefault 1 word model
>           cmpFreq a b = compare (findFreq a) (findFreq b)

However, lazy evaluation works to our advantage here. What actually happens is that the call to head causes only the first non-null candidate (from filter) to be consumed. The construction of the rest of the list is deferred as a thunk that will never subsequently be evaluated. We thus get the short-circuiting behaviour of Python’s or operator for free: our expression could compute all of these expensive matches, but it doesn’t actually do so unless we really need them.

Finally, we separate the reading of data and interaction from our referentially transparent code above with a little bit of “lifting”. Our model construction and correction functions are completely pure; they have no side effects. But we need to read data from a file, which is an effectful operation. The liftM function takes the pure code on the left, and hooks it up to the impure action on the right, so that the result of the pure code is available within the impure IO monad.

> readModel :: FilePath -> IO Model

> readModel name = (train . mkWords) `liftM` B.readFile name

> main :: IO ()

(Hark back to the discussion of lazy ByteStrings earlier. The B.readFile function reads the entire model in small chunks, so the memory usage of this function is proportional to the size of the model we construct, with a constant factor for I/O that’s independent of the size of the input file.)

> main = do
>     model <- readModel "big.txt"
>     interact (unlines . map (B.unpack . correct model . B.pack) . lines)

The definition of main tersely says that we should construct our model, then use it to correct words fed to us on stdin, one per line. The argument to interact is once again a pure function; interact handles all the dirty work of reading input and writing output.

That’s us about done. Separation between pure and impure code; space leaks partly demystified; and no cheerleading.

I just released version 0.1 of FileManip, a Haskell library I put together to make it easier to futz about with files in the filesystem.

There are a few different components to the package.

The Find module lets you search the filesystem for files, after the manner of the Unix find program. It provides a nice embedded language for building filters and controlling recursion, so you can write readable expressions like this:

find always (fileType ==? RegularFile &&? extension ==? ".hs") "myPath"

This will return a list of all Haskell source files under myPath. The list is built lazily, so if you have a million files in your tree, but you only examine the first ten elements of the list, no extra work is done.

The obvious counterparts to the Find module in other languages would be Perl’s File::Find module, and Python’s os.walk function.

The Manip module provides some handy functions for manipulating files. For example, renameWith provides procedural renaming: given a file name, it applies a function to it, then renames the file to that name. Here’s how you might use it to change your naming convention for C++ source files:

find always (extension ==? ".cpp") >>=
    mapM_ (renameWith (replaceExtension "C"))

The modifyInPlace and modifyWithBackup functions edit a file in place (the latter saves the original copy to a procedurally-renamed backup file), so that you can more easily write sed-like expressions:

modifyInPlace (map toUpper) "shouting.txt"

Least interesting, but nonetheless useful, is GlobPattern, a glob pattern matching module. I wrote this because Haskell doesn’t provide a standard one, and the only other one I know of (in the MissingH library) goes through the regexp machinery. Ew.

Online docs are published here. The source tarball is available here, at HackageDB, the Haskell package database. There’s also a Darcs repository available here:

darcs get http://darcs.serpentine.com/filemanip/

Enjoy!

I don’t intend for this blog to become a dumping ground for code snippets, but sometimes the temptation is too strong. Here’s a simple but fast function for computing a factorial, using the binary splitting algorithm from this page.

factorial :: Integral a => a -> a

factorial n = split n 0
where split a b = let d = a - b
          in if d < 0
             then 0
             else case d of
                0 -> 1
                1 -> a
                2 -> a * (a - 1)
                3 -> a * (a - 1) * (a - 2)
                _ -> let m = (a + b) `div` 2
                     in split a m * split m b

Simon Peyton Jones will be giving two Haskell-related talks at OSCON in July! As far as I know, this will be the first time that Haskell gets an airing at a general-interest conference. Simon is a fantastic speaker, so if you were going to OSCON anyway, you absolutely should not miss his talks.

One of Simon’s talks is to be a three-hour Haskell tutorial. It’s explicitly not aimed at “pointy-headed academics”, so if you like to get practical things done quickly and safely, this should be just the job.

The other talk will be about nested data parallelism, which is a long-dormant subject that holds great promise. The idea here is that we’d like to be able to write parallel programs that are composable and modular, but that retain good performance characteristics, while not doing great violence to our source as we attempt to scale.

This is a hard problem, as it requires solving a number of not obviously related subproblems. A close imperative-land approximation I’ve seen to the work that the NPH people are doing is Charm++, which is itself very cool. And you can see from how long the Charm++ has been around that this isn’t a short-term campaign.

This year, the Haskell.org community has had nine projects funded by Google, out of 64 student applications. We had mentoring capacity for over 20 projects, so the final number was decided by Google, not by our ability to deal with them. Congratulations to those who were accepted! And to those who were not, take heart; the competition was tight.

Most of this year’s successful applications are centered around library and infrastructure improvements, so I’m hopeful that we’ll see quite a lot of generally useful code come out at the end of this summer.

One Haskell project is being sponsored by Portland State University: Spencer Janssen’s XHSB: an X binding for Haskell, based on XCB, mentored by James Sharp. So really, we have ten projects this year, not nine.

We plan to have at least one “backup” mentor for each project.

Malcolm Wallace has done a sterling job so far as the coordinator for Haskell.org. Thanks, Malcolm!

I recently bought a Thinkpad X60 that came with Windows preinstalled. I never actually use Windows, but it’s convenient to have around for the occasional BIOS update.

My Linux distribution of choice has long been Fedora. Unfortunately, unlike other modern Linux distros, Fedora doesn’t provide a simple graphical means to shrink an NTFS partition in its installer.

Since the X60 doesn’t have an optical drive, and I don’t have any USB storage like a memory stick or CD-ROM drive, I couldn’t easily boot into something like Knoppix. If I’d had such gear available, I could easily have shrunk the Windows partition using qtparted. So here are my notes on how to get dual-booting working, by hand, in a bare-bones environment.

Really, if there’s some way you can use a sane all-in-one GUI tool like qtparted to do all of this messy work, you should. You should think of these as the instructions of last resort for the terminally optimistic.

Since I had no suitable media handy, I installed Fedora onto the X60 over the network using PXE. (Getting a PXE server configured sounds scary, but I had one up and serving in less than five minutes on my older laptop. Granted, I’ve done this many times.)

When the Fedora installer (Anaconda) is running, it runs you through a series of questions in text mode, and then starts up an X-based GUI. Don’t run through any of the GUI setup steps yet. Behind the scenes, Anaconda also sets up some virtual terminals, one of which contains a root shell prompt. That root prompt is the key to shrinking your Windows partition.

To get to the virtual terminal with the root prompt, you’ll first need to press the key combination of Control+Alt+F1 to switch to the first virtual terminal. After that, you can press Alt+F2 to get to the second, Alt+F3 the third (I think this is the one where the root prompt lives), and so on. Alt+F6 will bring you back to the GUI.

Once you have a root prompt, you’ll need to find another system, such as the one you’re installing from, and download an ntfsprogs source tarball. Unpack it, and build static binaries. (You must have static binaries, because the install environment doesn’t provide shared libraries.)

tar zxf ntfsprogs-1.13.1.tar.gz
cd ntfsprogs-1.13.1
./configure --enable-really-static
make

(Don’t forget to make sure that you’re building binaries that can actually run on the machine you’re installing on! If you’re installing an x86_64 version of Fedora, for example, it doesn’t matter whether you build a 32-bit or 64-bit version of ntfsprogs. But the converse is not the case.)

Once the build has finished, find the ntfsresize program in the ntfsprogs subdirectory. This is what you’ll need to get onto the machine you’re installing on.

Go back to the root shell on the system you’re installing onto. Recent versions of Anaconda make ssh and scp commands available from the shell, so you can simply scp the ntfsresize program from the system where you built it into /tmp on the machine you’re installing on.

Before I continue, I must point out what should already be obvious: if you are foolhardy enough to follow any directions after this point, you could completely destroy all data on your disk. Proceed at your own risk! If you have any important data on the system, back it up.

You’ll need to use the fdisk command to find out the name of your Windows partition. If you have a newish machine, it probably contains a SATA drive, so you’ll probably need to invoke it thus:

fdisk /dev/sda

Just type p to print the partition table, and note the name of the big partition (there’s probably only one) that’s marked as HPFS/NTFS. Once you’ve done that, type q to quit out of fdisk.

Let’s assume that /dev/sda1 is your NTFS partition. The next thing to do is use ntfsresize to see what size it is:

/tmp/ntfsresize -i /dev/sda1

This will run for a few seconds, and spit out lots of information. The pertinent line to look for in the output begins with “You might resize at …”. Record that number: it’s the minimum you’re likely to get away with.

Now try a dry run of resizing. Let’s say you’ve chosen ten gigabytes as your target Windows partition size.

/tmp/ntfsresize -n -s 10G

Again, ntfsresize will run for a few seconds, and print lots of information. If it prints an error message, don’t try to follow any directions beyond this point. You’re on your own.

However, if the dry run completes successfully, you ought to be safe to resize the partition. This is the first point at which you can destroy things, so it’s not for the faint of heart. Simply run the same command as before, only without the -n option.

/tmp/ntfsresize -s 10G

This will take a few minutes to run. If it exits with an error message, this might be a good time to take up religion and prayer, or at least to look for your Windows install CD. That is, if your system shipped with one.

Assuming ntfsresize did its job, the next step is even more scary: you’ll have to fiddle with the disk’s partition table so that the NTFS partition’s size matches the size of the actual filesystem. Messing with the partition table is even more scary than resizing filesystems, so dust off that rabbit’s foot your well-meaning aunt gave you.

It’s time to run fdisk again, this time with prejudice.

fdisk /dev/sda

Once again, type p to print the partition table. This time, make a note of the start offset of the NTFS partition.

Next, you’ll need to delete that partition table entry, with d. It will ask you for the number of the partition to delete; if your NTFS partition is on /dev/sda1, you’ll be deleting partition 1.

With that partition table entry gone, you’ll be creating a new, smaller entry to replace it. To get started on this process, use n.

  1. fdisk will ask you whether to create a primary or extended partition. Choose p for primary.
  2. It will prompt you for the start offset of the partition. You must use the same start offset as the old partition had; this is likely to be the default.
  3. When fdisk asks you for an end offset, give it a relative offset that’s the exact same size as you gave earlier to ntfsresize. For example, if you resized to ten gigabytes, type +10G as the end offset.

The new shrunken partition you created will not have an NTFS system ID. Use t from the main fdisk prompt to change its ID. The number to change it to is 7.

Once you’re done, use p to take a look at the new partition table. This won’t actually be written to disk yet. Once you’re sure it looks correct (has the same start offset as before, the correct smaller size, and the right system ID), use w to write out the new partition table and quit fdisk.

In case I haven’t been successful in stating the risks of using fdisk by hand, here are a few charming possibilities for you to consider.

  • fdisk will make no attempt to ensure that your partition table is correct or consistent. Neither, most likely, will your operating system. It’s thus quite possible to accidentally cause your partitions to overlap. This can result in data corruption problems that might not surface until months later.
  • If you hork your partition table badly enough, it can be a pain to subsequently install anything on the system at all. I’ve been forced once or twice to use dd to splatter zeroes all over a partition table in order to get it back to a state where I could do anything with the disk.
  • Don’t say I didn’t warn you!

At this point, you can return to the Anaconda GUI installer by typing Alt+F6. Step through a few install screens. When Anaconda reaches the point of asking you about partitioning, you might be okay to let it partition automatically for you; I haven’t tried this, so I don’t know. I simply go with the custom disk layout option instead, and find to my satisfaction that I now have a big chunk of empty space to fill with swap and ext3 partitions. As you continue through the installation process, Anaconda should automatically set up the boot loader to let you boot into Windows or Linux.

One final thing to bear in mind: the next time you boot into Windows after your install is complete, Windows will probably run chkdsk during boot, to verify that its filesystem is still sane.

From diffbavis on #haskell:
> sum (map ord "al gore")
666

Someone showed up on #haskell yesterday, asking how to use regular expressions. This isn’t a completely straightforward question to answer. While Haskell’s regexp libraries provide the same functionality you’ll find in Perl, Python, and Java, they provide a rich and fairly abstract interface that can be daunting to newcomers.

So let’s fix that, and strip away the abstractions to show how we might actually use regexps in practice. I’m assuming that you’re already familiar with regexps from using them in some other language, and simply want to find your feet in Haskell. The standard library that implements regexps is Text.Regex.Posix. As the name suggests, this wraps the system’s native POSIX extended regexp library.

If you’re coming from a language like Perl, Python, or Java, you may not have encountered POSIX regular expressions before. Perl-style regexps are much more expressive, and hence far more widely used. POSIX regexps look superficially similar to Perl-style regexps, but they’re not as expressive, and they have different matching behaviour. There aren’t any concise online descriptions of the syntactic and semantic differences between the two regexp families, but the Wikipedia entry on regexps will help you to understand some of the differences. The only advantage of the POSIX regexp library is that it’s bundled with GHC; you don’t need to fetch any extra bits to get it to work.

Before we continue, let’s start up GHC’s interpreter, ghci.

  $ ghci
  Loading package base ... linking ... done.

To use the Text.Regex.Posix module, we use the :mod (or simply :m) command to load it and bring it into scope.

  Prelude> :mod +Text.Regex.Posix
  Prelude Text.Regex.Posix>

The simplest way to use regexps is via the “=~” function, which we normally use as an infix operator. And, of course, it’s exactly the same operator that Perl uses for regexp matching.

What’s interesting is that this function is polymorphic in both its arguments and its return type; this is why its documentation can be hard to read if you’re new to the language. Here’s a simplified type signature, to give you the idea.

  (=~) :: string -> pattern -> result

(I’ve left out the constraints on these type variables. The real type signature of =~ is huge, and only obscures early understanding of what on earth is going on, so I’m not going to reproduce it here.)

You can use either a normal Haskell string or a much more efficient ByteString as the pattern or string. You can use them in whatever combination you like; a String for one, a ByteString for the other, or whatever suits your needs.

If you try to use =~ interactively at the ghci prompt, ghci gives a fearsome error message.

  > "bar" =~ "(foo|bar)"

  <interactive>:1:0:
  No instance for (Text.Regex.Base.RegexLike.RegexContext Regex
                              [Char]
                              result)
    arising from use of `=~' at <interactive>:1:0-19
  Possible fix:
    add an instance declaration for
    (Text.Regex.Base.RegexLike.RegexContext Regex [Char] result)
  In the expression: "bar" =~ "(foo|bar)"
  In the definition of `it': it = "bar" =~ "(foo|bar)"

Yikes! What’s happened here is that we haven’t given a type for the result of the =~ operator. Since the result type is polymorphic, ghci has no way to infer what type we might actually want. We can easily fix this, by suffixing the expression with a type signature.

  > "bar" =~ "(foo|bar)" :: Bool
  True

  > "quux" =~ "(foo|bar)" :: Bool
  False

By constraining the type of the result to Bool, we get a simple “yes” or “no” answer when we ask if the match has succeeded.

But we can also use String as the result type, which gives us the first string that matches, or an empty string if the match fails.

  > let pat = "(foo[a-z]*bar|quux)"

  > "foodiebar, fooquuxbar" =~ pat :: String
  "foodiebar"

  > "nomatch" =~ pat :: String
  ""

If we use [String], we get a list of every string that matches.

  > "foodiebar, fooquuxbar" =~ pat :: String
  ["foodiebar","fooquuxbar"]

It’s also possible to get more detail about the context in which a match occurred. The 3-tuple in this result consists of the text before a match; the matched text; and the text after the match.

  > "before foodiebar after" =~ pat :: (String,String,String)
  ("before ","foodiebar"," after")

The 4-tuple below adds the text of all subgroups of the match.

  > "before foodiebar after" =~ pat :: (String,String,String,[String])
  ("before ","foodiebar"," after",["foodiebar"])

But wait, there’s more! You can get plain, simple offset information, either singly or in a list.

  > :mod +Text.Regex.Base

  > "the foodiebar" =~ pat :: (MatchOffset,MatchLength)
  (4,9)

  > "no match" =~ pat :: [(MatchOffset,MatchLength)]
  []

All of these different result types are instances of the RegexContext type class. By this point, you should have enough examples that you can read the complete documentation for the other RegexContext instances at Text.Regex.Base.Context without it seeming completely overwhelming. There are many more instances, some of which give you a lot of detailed information about a match.

There’s a monadic variant of the =~ operator, called =~~. You can use this to perform matches inside a monad, for example as follows. This binds a to the number of matches.

  > a <- "foo" =~~ "foo":: IO Int
  1

You can also use the =~~ operator outside of a monad in some cases. For example, you can try a match in the Maybe monad to tidily handle the possibility of a failure.

  > "foo" =~~ "bar":: Maybe String
  Nothing

  > "foo" =~~ "foo" :: Maybe String
  Just "foo"

Here’s an interesting pitfall to watch out for. There’s a small chance you could shoot yourself in the foot if you use a list as a RegexContext. Let me show you what I mean. This expression returns a list of all matches, which is what I’d normally expect.

  > "foo foo foo" =~ "foo" :: [String]
  ["foo","foo","foo"]

But this expression, which differs in only one character, runs the match in the list monad! It’s only ever going to return an empty or single-entry list. Eeek!

  > "foo foo foo" =~~ "foo" :: [String]
  ["foo"]

There’s normally no need to compile regular expression patterns. A pattern will be compiled the first time it’s used, and your Haskell runtime should memoise the compiled representation for you.

This is necessarily only a basic introduction to the Haskell regexp API. In practice, you will want to avoid the Text.Regex.Posix implementation; it’s terribly slow, and too strict besides. When I have time, I’ll talk about these problems in more detail, and what alternatives you can use. Until then, happy regexp matching!

I spent a while this evening reading through the documentation for the beta release of NVIDIA’s CUDA GPGPU system. My motivation for this was that nvcc, the CUDA compiler, is based on a code drop of the EkoPath compiler, which I’ve worked on intermittently over the past few years.

The programming model that these GPUs enforce is incredibly complex. It’s more than a little reminiscent of the Connection Machine (for a blast from the past, see a collection of scanned CM-5 documentation.

The idea is that the GPU executes a “kernel” of compute-intensive, highly parallelisable, code on behalf of the CPU. Data is transferred to the GPU when a kernel starts to execute, and back to the host when it completes. The GPU may execute multiple kernels simultaneously, if it is capable of it.

A kernel is executed as a collection of threads, with threads organised into “blocks”. The blocks are further organised into “grids”. Each thread has thread-local storage, and a block of threads shares a chunk read-write memory. This is how they communicate and synchronise with each other. Threads can also access the GPU’s global memory, although blocks of threads cannot synchronise with each other via global memory.

Global memory has a pile of constraints. There’s not one single kind of global memory; instead, there are three. Unlike the thread-local and block-shared memories, global memories can be accessed by the host (although accesses are expensive).

  • Plain old read-write memory. Accesses to this memory are not cached. When a thread block is scheduled to execute, and accesses global memory, its performance is heavily penalised if threads don’t access global memory in a suitable pattern (see section 6.1.2.1 of the Programming Guide).

  • “Constant” memory is read-only, and reads are cached. Again, there are restrictions on access patterns to constant memory that affect performance.

  • “Texture” memory is also read-only, and cached. Cached texture data is laid out to give best performance for 2D access patterns.

Block-shared memory isn’t immune to peculiar performance constraints, either. It’s organised into “banks”, to allow parallel access to multiple banks in a single operation. In other words, n threads can access n banks in one cycle. However, concurrent accesses to a single bank are forcibly serialised, so n threads hitting a single bank will take n cycles.

So at every level of the GPU memory hierarchy, there are multiple factors that you have to keep in mind if you want to achieve good performance.

The extensions that NVIDIA has made to the C language are fairly minimal. There are some new keywords to control layout of data, to determine where variables should live in the memory hierarchy. Not surprisingly, there are new vector types that look very much like OpenGL vector types.

A few similar keywords control function layout. Functions can be marked as device accessible only, resident on the device but called from the host (i.e. a kernel), or host-only. A weird piece of syntax `<<< Dg, Db, Ns >>>’ is required when calling a kernel, to control the execution parameters for the kernel: grid size, block size (and number of threads), and memory allocation.

With hardware of this complexity, a good optimising compiler can make a substantial difference to application performance. The heritage of NVCC is unimpeachable. It’s based on the EkoPath compiler, which has been the most frequent compiler used for SPEC benchmark submissions by AMD64 hardware vendors for several years. The EkoPath compiler was in turn based on Open64, a compiler that SGI GPLed when it dropped out of the compiler business.

Among the memory hierarchy related optimisations in the Open64 compiler family that are available to NVCC are the following (this is just a short list of highlights; the real number is big). I don’t know how many of these are enabled in NVCC, nor do I know enough about individual cache sizes or miss penalties to have a clue as to which ones are likely to be effective.

  • Loop nest optimisation. For a set of nested loops, this can change the order in which the inner and outer loops are executed, to improve the pattern of access to memory.

  • Vectorised intrinsics. If application code is, for example, computing sin(x[i]) for all i in a vector, the compiler can replace this with a single call to a highly optimised sin specialised for vectors.

  • Cache blocking. Replace a single loop over a large vector with smaller loops that operate on cache-sized chunks of the vector.

Given the degree of manual control over variable placement that NVIDIA’s C extensions seem to enforce, it’s not clear to me that their compiler team has had a chance to automate any of the transfer of data between levels of the memory hierarchy yet.

I also find it telling that the programmer’s guide includes specific guidelines on how to avoid bank conflicts in block-shared memory, where in at least some of these cases it’s clear that the compiler could be automating the job.

It’s worth reading the programmer’s guide in its entirety to get a sense of just how complex CUDA is, and how many different constraints the determined application programmer will have to keep in mind at a time. There’s not a lot of abstraction going on here (vendor-provided BLAS and FFT libraries notwithstanding).

We have plenty of previous examples of hardware that failed to live up to their early marketing promise, from the i860 to the PS3. CUDA looks set to follow in their footsteps: I expect that it will take vast amounts of work for programmers to get halfway decent performance out of a CUDA application, and that few will achieve more than 10% of theoretical peak performance.

People with the expertise, persistence, and bloody-mindedness to keep slogging away will undoubtedly see phenomenal speedups for some application kernels. I’m sure that the DOE and NSA, in particular, are drooling over this stuff, as are the quants on Wall Street. But those groups have a tolerance for pain that is fairly unique. This technology is a long way from anything like true accessibility, even to those already versed with parallel programming using environments like MPI or OpenMP. Still, it’s a great first step.

“Now, imagine a Beowulf cluster of these!”

« Prev - Next »