Subscribe to
Posts
Comments
Marc Ambinder comments on the peculiar cadence of the jacket blurbs for Douglas Feith’s new book, to which some wag responds with a suggestion for a similar endorsement.
Feith’s book is perfectly rectangular. Its page numbers progress in a pleasing upward sequence. Its evident shortcomings in terms of accuracy are offset by its usefulness in balancing wobbly furniture.
Snipped from a small mailing list.
You can use fruit and vegetables’ as a voltaic battery but they could only power something like a small LED bulb, no way would there be enough power to charge an iPod :)

Depends on how many you use… you need to have enough in series to give a high enough voltage, and then a heap in parallel for more current.

This reminds me my “electric snail fence” when I was a child… I was making batteries using lemons, brass screws, and nails to power a small electric fence using a pair of copper wires wrapped around matchstick “fenceposts” to keep snails in an enclosure (I was studying them).

At first, a single lemon seemed to be enough to discourage them from attempting to cross both wires… on touching both they would quickly retract and turn back.

However, snails kept leaving my enclosure… somehow when I was not looking they seemed to be able to cross the fence. So I started bumping up the voltage with more lemons in series. This clearly had an impact on the snails… now when they touched both wires of the fence they would rapidly fully retract into their shells and take some time before they even considered moving again.

But somehow, they still kept leaving… after adding more and more and more lemons and having no success, I decided to pull out the big guns and swapped the lemons for a 9V battery. Now the snails seemed to nearly have a heart-attack when they touched the fence; they would snap back into their shells and spew out a green froth, taking ages to work up the courage to come out again.

But they still kept leaving the enclosure… I was at a loss trying to figure out how these cunning snails were defeating my high-powered fence when I was not looking… and then I noticed all the bird tracks inside my fence.

I’ve been following Ian Lance Taylor’s updates on the status of gold, the new binutils linker, for a while, so when he announced that he’d added it to the binutils tree, I decided to make a little time to try it out.

I have a fairly large C++ application handy, so I tried linking it on my dual core Thinkpad with 2GB of memory, running Fedora 8.

When I compile this application with debugging symbols enabled, it’s not exactly fast to link: it takes about a minute. Because the amount of data the linker has to chew through is impressively large, the actual time varies by up to 20% from one run to the next: all of those object files don’t fit into memory at once. This also has the unpleasant effect of paging out parts of apps I’m trying to use, such as Emacs and Firefox. Hmph.

In this situation, gold does indeed improve linking performance, bringing the linking time down to about 40 seconds. The poor laptop’s disk still spends a lot of this time grinding away due to the sheer volume of data, so once again the number isn’t very repeatable.

To eliminate the effect of memory pressure, I used “strip -g” to drop the debug information from the main app’s object files. (The libraries it links against still contain debug info, but they’re much smaller than the app itself.)

Stripping the object files speeds up the regular binutils linker by a factor of four, reducing linking time to 14 seconds. The laptop’s disk is no longer being touched. I can still edit text files after a link without big initial stutters. Whew!

When I switched to using gold as the linker, I was at first a little surprised to find that it actually works at all. This isn’t especially common for a complicated program that’s just been committed to a source tree. Better yet, it’s as fast as Ian claims: my app now links in 2.6 seconds, almost 5.4 times faster than with the old binutils linker!

When I ran “ld –help”, I noticed the tantalising options “–threads” and “–thread-count”, so I tried those. It turns out that thread support is not enabled by default: I reconfigured, rebuilt, and tried again. Running with “–thread-count 2″, the linking time drops slightly, to 2.3 seconds: that’s nearly 6.1 times faster than before.

Does the application actually work when linked by gold? Yes, so far with no signs of problems. It’s about 3% larger than the version produced by the old linker. I haven’t looked into why this might be.

Is gold ready for prime time? It’s not too far off. It doesn’t support the –build-id flag, which my copy of gcc 4.3 is convinced it wants (perhaps because I’m running Fedora). It doesn’t link the Linux kernel properly, because the kernel tortures the linker in twisted and innovative ways. Also, I have run into a few non-reproducible crashes when running in threaded mode, such as the following:

internal error in find_runnable_or_wait, at ../../gold/workqueue.cc:262

Overall, though, I’m very impressed, and I’m looking forward to the point where gold entirely supplants the existing binutils linker. I expect that won’t take too long, once Mozilla and KDE developers find out about the performance boost. Great work, Ian!

For a little while, I’ve been curious about which of the packages people in the vast wasteland of CPAN actually use and care about. Here’s an attempt to answer that question with fifty popular Perl packages for your entertainment.

Before we begin, a brief overview of the column layout.

  • The mean vote from CPAN ratings.
  • The number of votes for that package registered at CPAN ratings.
  • The number of “vote” entries registered at the Debian popularity contest. This is approximately the number of people who currently use a package.

My ranking attempts to balance people’s votes against a package’s practical utility using a metric composed of equal parts baffling complexity and utter nonsense.

4.7  29  20962  DBI
5.0   2  45321  Text-Iconv
5.0   6  11507  HTML-Tree
4.8   5  14372  HTML-Parser
5.0   2  15132  Gtk2
4.8   4   9125  XML-Twig
5.0   1  17281  URI
4.8  10   1880  XML-Simple
4.4  18   2295  HTML-Template
4.7   3   5679  Net-DNS
5.0   4   2113  BerkeleyDB
4.0   4  19845  DBD-mysql
5.0   3   2586  Error
4.3   3   9365  TimeDate
5.0   3   2134  Crypt-SSLeay
4.4   9   2324  XML-LibXML
4.7   3   3464  XML-SAX
4.8  12    654  WWW-Mechanize
4.5   2   6297  IO-Zlib
4.8   5   1321  Config-IniFiles
4.6   8    796  Mail-Sendmail
3.8   6   6032  XML-Parser
4.7   7    641  Params-Validate
4.7   3   1457  HTML-TableExtract
4.0   7   2509  File-Temp
5.0   1   2479  XML-LibXML-Common
3.7   6   6322  Archive-Tar
5.0   5    488  Email-Valid
4.6  11    427  MIME-Lite
5.0   1   2371  Date-Manip
4.0   2   8022  Compress-Zlib
4.8   5    614  Parse-Yapp
5.0   1   2225  IO-Socket-SSL
4.0   1  15146  Cairo
5.0   3    707  Text-Template
5.0   1   2108  Net-CIDR-Lite
4.8   8    360  Path-Class
4.9   9    269  HTML-Mason
4.7  19    175  Spreadsheet-WriteExcel
5.0   2    911  HTML-Format
3.8   6   3167  Archive-Zip
4.1   8   1142  Parse-RecDescent
4.3   3   1937  IO-String
4.0   3   3585  Net-Server
4.3   3   1838  Class-Accessor
5.0   3    539  Cache-Cache
4.1   9    868  DBD-Pg
5.0   1   1555  AppConfig
5.0   2    773  Time-modules
4.4  12    318  File-Find-Rule

Here’s a simple program I wrote on a whim tonight, to take a very basic look at GHC’s low-level threading performance.

module Main where

import Control.Applicative
import Control.Concurrent.MVar
import Control.Concurrent
import Data.Time
import System.Environment

main = do
    mv <- newEmptyMVar
    start <- getCurrentTime
    loop mv =<< read . head <$> getArgs
    end <- getCurrentTime
    putStrLn $ "creation time: " ++ show (diffUTCTime end start)
    putMVar mv 0
    takeMVar mv
    fin <- getCurrentTime
    putStrLn $ "message time: " ++ show (diffUTCTime fin end)

loop :: MVar Int -> Int -> IO ()
loop mv n | n <= 0 = return ()
          | otherwise = do forkIO $ do
                              m <- takeMVar mv
                              putMVar mv $! m+1
                           loop mv (n-1)

This measures two things: how long it takes to fork a given number of threads, and how long it takes to send a tiny message through this ring of threads.

On my 32-bit desktop, I can fork 175,000 threads in one second, and send a message through the ring in 0.09 seconds. This was with a thread stack size of 172 bytes. GHC’s runtime will grow a thread’s stack dynamically if it overflows, so it’s safe to use such a small stack size.

During these runs, the memory consumed by the process maxed out at a slender 45MB.

I find all of these numbers to be very impressive, but they don’t grow linearly. Memory consumption stays low if I create 1.75 million threads (just 354MB), but the time to fork them leaps up to 95 seconds. Messaging through the ring also takes a big jump, to 5.6 seconds.

Applicative functors are gorgeous and versatile creatures, but as is common in Haskell, they lack a little in documentation. The paper that Conor and Ross wrote introducing them is good, but dense. What if we were to skip all the scene-setting kerfuffle, and plunge into using them by example?

I won’t attempt to describe what applicative functors actually are, because the idea is easy to absorb: we’ll pick it up as an incidental product of figuring out how to use them. Bear with me.

To begin with, we need a motivating example, and we’ll choose something simple, yet chewy: parsing an application/x-www-form-urlencoded string. Remember, they look like this:

name=bryan+o%27sullivan&city=san+francisco

While we’ll use Parsec as our parsing library, we’ll give it a completely new skin.

module ApplicativeParsec
    (
      -- Re-export the contents of these modules, for convenience.
      module Control.Applicative
    , module Text.ParserCombinators.Parsec
    ) where

import Control.Applicative
import Control.Monad (MonadPlus(..), ap)
-- Hide Parsec's definitions of some Applicative functions.
import Text.ParserCombinators.Parsec hiding (many, optional, (<|>))

– Every Monad is an Applicative.
instance Applicative (GenParser s a) where
    pure = return
    (<*>) = ap

– Every MonadPlus is an Alternative.
instance Alternative (GenParser s a) where
    empty = mzero
    (<|>) = mplus

We’ve made two claims in the code above. We’ve said that a Parsec parser is an applicative functor (you know, whatever that is), and that we can chain parsers into a series of alternatives.

There doesn’t seem to be general agreement about whether it’s okay for a key in an urlencoded string to have no associated value, so we’ll assume that it is by wrapping the value up in Maybe. If a key has a value (including an empty string), we’ll use Just, otherwise Nothing.

p_query :: CharParser () [(String, Maybe String)]

I’ll give the complete definition of the parser here, then walk through what the various bits mean.

p_query = pair `sepBy` char '&'
  where pair = liftA2 (,) (many1 safe)
                          (optional (char '=' *> many safe))
        safe = oneOf urlBaseChars
           <|> char '%' *> liftA2 diddle hexDigit hexDigit
           <|> ' ' <$ char '+'
        diddle a b = toEnum . fst . head . readHex $ [a,b]

urlBaseChars = ['a'..'z']++['A'..'Z']++['0'..'9']++”$-_.!*’(),”

First of all, notice the complete lack of variables in the code. This is the reason behind calling these beasts applicative: we construct our code by combining applications of functions.

Now for a piecewise decomposition of the code.

pair `sepBy` char '&'

The sepBy combinator takes two parsers as its arguments. It applies the left parser, then the right, then the first, accumulating the results of each left-hand parse in a list.

The definition of pair is where things begin to become interesting.

pair = liftA2 (,) (many1 safe)
                  (optional (char '=' *> many safe))

We’ll dissect this from the inside out, which leads us to consider this first.

char '=' *> many safe

The *> combinator applies the parser on the left, throws away its result, then applies the parser on the right. You can think of the > as “pointing at” the parser whose value is really to be returned. This implies that there’s probably a <* combinator which throws away the result on the right, and indeed there is.

As for the parser on the right, many applies a parser repeatedly, building up a list of results as it goes. It terminates as soon as the parser fails.

This snippet amounts to “match an equals sign, then run the safe parser repeatedly until it fails, accumulating each result into a list”.

The optional combinator turns a parser that must succeed into one that might fail.

optional :: (Alternative f) => f a -> f (Maybe a)

By wrapping our earlier snippet with optional, we’ve turned it into a parser that will return Nothing if our key isn’t followed by a value, or Just wrapping the value if there is one.

The meaning of the many1 safe snippet should be easy to guess: many1 applies the safe parser repeatedly, like many, but it fails if its parser doesn’t succeed at least once. (If you’re used to thinking in terms of regular expressions, many is like *, and many1 like +.)

Now to liftA2 (,). This takes two parsers as arguments. It applies the first parser, then the second, then applies a function to their results. Here, the function is (,), which constructs a tuple. Considered as a whole, this gives us a parser that applies two other parsers, and tuples up their results, returning the tuple as its result.

The safe parser handles a single character of a key or value. Now that we’re beginning to become familiar with the notation, we can gloss a little over its details and concentrate on what’s new.

safe = oneOf urlBaseChars
   <|> char '%' *> liftA2 diddle hexDigit hexDigit
   <|> ' ' <$ char '+'

Most obvious is the <|> combinator. It takes two parsers as arguments. If the left succeeds, its result becomes the result of <|>. If it fails, the result becomes the result of the right parser. As the above code implies, we can chain uses of <|>.

The first “arm” of safe parses a single character, taken from a set of acceptable characters. The second consumes a percent-encoded character, using the liftA2 combinator that we saw earlier: it consumes a “%” character, followed by two hex digits, then applies diddle to the digits to turn them into a single Haskell character.

diddle a b = toEnum . fst . head . readHex $ [a,b]

The readHex function lives in the Numeric module, and parses a hex string as an integer.

readHex :: (Num a) => String -> [(a, String)]

We’re now faced with a new combinator, <$.

' ' <$ char '+'

This applies the parser on the right: if it succeeds, it throws away its result and instead returns the value on the left. In other words, if we match “+”, we return a space.

Here’s another example: here, we parse a series of RFC2822 or HTTP (or MIME, or what have you) headers into a list of name/value pairs, including the proper handling of continuation lines.

p_headers :: CharParser st [(String, String)]
p_headers = header `manyTill` crlf
  where header = liftA2 (,) fieldName (char ‘:’ *> spaces *> contents)
        fieldName = liftA2 (:) letter (many fieldChar)
        fieldChar = letter <|> digit <|> oneOf “-_”
        contents = liftA2 (++) (many1 notEOL <* crlf)
                               (continuation <|> pure [])
        continuation = liftA2 (:) (’ ‘ <$ many1 (oneOf ” \t”)) contents

crlf :: CharParser st ()
crlf = (() <$ string “\r\n”) <|> (() <$ newline)

notEOL :: CharParser st Char
notEOL = noneOf “\r\n”

Notice our use of the <* combinator that we mentioned earlier, and once again the lack of let- or lambda-bound variables.

It can take a little while to get used to constructing and reading parsers in a wholly applicative style, but it does quickly start to feel both natural and appealing.

Compared to writing a parser in monadic style, the notation of the Control.Applicative module leads to significantly more compact code. Some of the big wins in expressivity come from the attention of the authors to surprisingly small details, like the <* and <$ operators, and their precedences and associativities. These let us write chains of “pointy” functors, where we visually follow the arrows to see what’s really being used.

By the way, the complete name for applicative functors is a mouthful: they’re strong lax monoidal functors. These structures come to us from abstract algebra and category theory. If you’re a functional programmer and you’re not following the work of people like Conor McBride and Jeremy Gibbons, you really should: it’s a whole lot of fun when the seemingly ethereal world of abstract mathematics comes home to roost in Haskell.

LLVM for Fedora

I’ve just packaged up LLVM 2.1 for Fedora. It hasn’t hit the testing repository yet, but when it does, you’ll be able to install it in straightforward fashion:

yum --enablerepo=testing install llvm llvm-devel llvm-docs

Until the packages are pushed out (within a few days), you can download them directly.

The Monad Squad

Tired of imperative programmers kicking sand in your face? Send some cut-out lambdas and a postal order for 5 frobs to haskell.org, and we’ll send you a handy-dandy poster of those famous superheroes, the Monad Squad!

State helps functional programmers cross the imperative street!

Identity can blend in anywhere!

Writer never forgets a detail!

Cont can travel through time!

List can do many things at once!

Reader can tell what you’re thinking!

Together, they battle the evil genius unsafePerformIO, who is always trying to destroy the world!

This is, I must say, very clever. In my latest round of inbound spam, I’ve noticed that some senders have begun sending valid links to http://google.com/ in their messages. The technique they’re using is to obfuscate a target URL inside a Google “I’m feeling lucky” query: this means that the domain near the left of the URL really is google.com and doesn’t need to be faked, but it immediately reroutes a click to the spammer’s target, which is difficult to read due to some escaping. This is a cute social engineering attack, riding on Google’s brand and domain name to gull the unwary into clicking.

An obvious variant of this technique would be to seed a link farm with statistically improbable phrases, such that an “I’m feeling lucky” search for some innocuous but unlikely term, e.g. “woozy numbat playing kazoo”, would end up with a spammer’s site advertising something rather less wholesome as the number one hit. A spammer could even extend the use of SIPs to provide a canary trap to validate email addresses:if the inbound search term is “feral pet smells linux”, and we only sent that combination to user@domain.com, then the address must be valid.

POPL 08 takes place next week, so San Francisco will be flooded with an army of burly and menacing programming language researchers and type theorists. Kids, don’t say you haven’t been warned.

On Wednesday evening, January 9, Phil Wadler will be repeating for the public his talk “Well-typed programs can’t be blamed”, based on joint work with Robby Findler. Here’s the abstract from their eponymous paper.

We show how contracts with blame fit naturally with recent work on hybrid types and gradual types. Unlike hybrid types or gradual types, we require casts in the source code, in order to indicate where type errors may occur. Two (perhaps surprising) aspects of our approach are that refined types can provide useful static guarantees even in the absence of a theorem prover, and that type dynamic should not be regarded as a supertype of all other types. We factor the well-known notion of subtyping into new notions of positive and negative subtyping, and use these to characterise where positive and negative blame may arise. Our approach sharpens and clarifies some recent results in the literature.

The talk has been organised by BayFP (join up!), and will take place at 7:30pm in the Stanford Room of the Stanford Court Hotel, at 905 California Street.

If you’re a researcher attending POPL, you ought to come to Phil’s talk even if you’ve already seen it. It will be a great, and rare, opportunity to meet Silicon Valley hackers who use functional languages in startups and established companies.

Next »