Subscribe to
Posts
Comments

I’ve spent a few spare hours here and there working on a pure Haskell interface to MySQL recently. On the principle that perhaps someone else might want to join in the fun, I’ve published a darcs repository already (see the link above for more details):

darcs get http://darcs.serpentine.com/mysql

There are a few reasons I’m doing this. One is licensing-related: the existing hsql-mysql bindings claim to be BSD-licensed, but they’re just an FFI wrapper around the MySQL client library, so they (and any program that uses them) are subject to the GPL. My library speaks the MySQL wire protocol in pure Haskell, and I’ve BSD-licensed it.

The code isn’t actually usable for applications yet, as I’ve mainly done the work of getting login (a major pain) and querying working. I’ve spent a lot of time with the following very handy references:

This has also been a great opportunity to see how well Hackage, Haskell’s equivalent of CPAN, is working out in practice. When I wanted a library that would help me to decode and encode the wire protocol efficiently and flexibly, I was able to grab binary. When I subsequently needed SHA-1 hash support to implement the password hashing protocol, I downloaded the crypto package. No muss, no fuss.

Some months ago, I wrote a Python implementation of Daniel Lemire’s Weighted Slope One collaborative filtering algorithm.

Steve Jenson sent me a pointer to his Scala implementation last week, but his code is a straight port of the Python version, not using any of Scala’s tasty functional crunchiness. In an idle period the other evening, I ported my Python code to Haskell.

module SlopeTwo (Rating, SlopeOne, empty, predict, update) where

import Data.List (foldl')
import qualified Data.Map as M

newtype SlopeOne a = SlopeOne (M.Map a (M.Map a (Int, Double))) deriving (Show)
type Rating a = M.Map a Double

empty = SlopeOne M.empty
addT (a,b) (c,d) = let (l,r) = (a+c, b+d) in l `seq` r `seq` (l, r)

update :: Ord a => SlopeOne a -> [Rating a] -> SlopeOne a
update (SlopeOne s) = SlopeOne . M.map (M.map norm) . foldl' update' s
  where norm (a,b) = (a, b / fromIntegral a)
        update' s rm = foldl' (\m (k, v) -> insert m k v) s prod
          where prod = [((a, b), (1, m - n)) | (a, m) <- rs, (b, n) <- rs]
                rs = M.toList rm
        insert m (a, b) v = M.alter foo a m
          where foo = Just . maybe (M.singleton b v) (M.insertWith' addT b v)

predict :: Ord a => SlopeOne a -> Rating a -> Rating a
predict (SlopeOne s) rm =
  let freqm = foldl' insert M.empty
              [(a,b,r) | (a, m) <- M.keys s, (b, r) <- M.toList rm]
      insert n (a,b,r) = let (f, d) = find s (a, b) in
          M.insertWith' addT a (f, fromIntegral f * (r + d)) n
      norm = M.map (\(a, b) -> b / fromIntegral a) freqm
      find m (a,b) = M.findWithDefault (0,0) b (M.findWithDefault M.empty a m)
  in M.filterWithKey (\k v -> v > 0 && M.notMember k rm) norm

The Haskell code uses essentially the same approach as the Python code, though of course it has to use purely functional maps instead of dicts (hash tables), because hash tables don’t make much sense in a functional language. I also cut down the number of maps from two to one, which simplified the code a little.

Also, instead of doubly nested loops, the Haskell code uses single folds. How can we replace nested loops with a single unnested equivalent? The answer lies in treating lists as control flow: we use a list comprehension to generate the cartesian product of two lists, after the following manner.

ghci> [(x,y) | x <- [1,2,3], y <- ['a','b']]
[(1,'a'),(1,'b'),(2,'a'),(2,'b'),(3,'a'),(3,'b')]

Here’s a bit of test code, again ported from my Python version (yes, it gives the same answer).

userData :: [Rating String]
userData = map M.fromList [
 [("squid", 1.0), ("cuttlefish", 0.5), ("octopus", 0.2)],
 [("squid", 1.0), ("octopus", 0.5), ("nautilus", 0.2)],
 [("squid", 0.2), ("octopus", 1.0), ("cuttlefish", 0.4), ("nautilus", 0.4)],
 [("cuttlefish", 0.9), ("octopus", 0.4), ("nautilus", 0.5)]
 ]

prediction = predict (update empty userData) (M.fromList [("squid", 0.4)])

A few days ago, I wrote a Haskell library for building and working with suffix trees. It builds a suffix tree lazily, so even though its performance is O(n log n) on large input strings, it often has linear performance for many applications.

The implementation is simple and easy to read; the API is well-documented and easy to use; and the internals of the tree structure are exposed, so you can build your own traversal, inspection, and manipulation functions on top of it. The code is liberally licensed (BSD3), to encourage wide use.

You can download the stable version from HackageDB, or fetch the source repository:

darcs get http://darcs.serpentine.com/suffixtree

I’ve had an account on del.icio.us for several years, but I only started using it heavily perhaps a year ago. While it’s a wonderful site in many respects, I’ve been surprised and disappointed by what’s happened since Yahoo acquired the company: nothing at all.

Clearly, Yahoo has had no idea what to do with their acquisition: almost its first reaction to the grafted del.icio.us antibodies was to roll out a homegrown NIH tagging site, the name of which nobody now remembers. Since then, Yahoo hasn’t even bothered to switch users over to its unified login system as it did with Flickr. Whether you liked that move or not, it was a clear sign that Yahoo had some interest in Flickr.

Prior to the acquisition, I figure that del.icio.us had fulfilled perhaps 5% of its initial promise. In order to grow its user base beyond hardcore power users, the site clearly needed a less hostile user interface; nothing has happened. They could have increased existing user loyalty and retention by making it easier for people to find others with similar interests; nothing has happened. They could have made the site more useful to individuals by helping people understand the evolution of their own interests. Nothing has happened.

It’s a shame to see that potential go unfulfilled. Even from a business perspective, if Yahoo could quite reasonably not figure out how to squeeze some money out of its acquisition, it could at very little cost to itself be improving its image among the nerdiest of the nerdy by investing in del.icio.us and blowing its geek cred trumpet. Instead, I have a notion that Josh Schachter is languishing in some dungeon in Santa Clara, watching the grains of sand fall through his stock option hourglass, and waiting to make good his escape. What a waste.

I suspect that the same thing is also happening with Flickr, though not to quite the same extent. Take a look at the numbers of pre- and post-acquisition changes to the site and you’ll see what I mean. Sure, geotagging is cute, but it’s almost the only significant change in over two years. This sclerosis of small acquisitions doesn’t make me sanguine for Yahoo’s future.

Irish television hasn’t changed much since I left the country twelve years ago. The main national broadcaster, RTÉ, mainly serves up imported American and British shows, with locally originated fare dominated by vast quantities of sports coverage, a few soaps and talk shows, and the occasional documentary. It’s a tepid, uninspiring mix.

Except for one show, that is. Soupy Norman is a Polish soap opera (Pierwsza miłość), dubbed into English and utterly twisted out of recognition by a team of Irish comedians. The writing, air of absurdity, and comic timing are pitch-perfect. (Mind you, the Cork accents might be a bit tough for non-native ears to decipher. My wife kept asking me “can you really understand what they’re saying?”)

The craic team of bloggers at National Disgrace have kindly provided an index to every Soupy Norman episode on YouTube. Watch them right away.

Here’s a great quote from Yaron Minsky about the use of types in functional programs.

[...] most of the advantage of types in a language like ML comes from completely vanilla uses of the type system. One of our programming maxims at Jane Street is to “make illegal states unrepresentable”. Algebraic datatypes, parametric polymorphism and abstraction can be used to great effect in this regard.

I think that people who are new to ML think of the type system as merely a constraint, that is to say, you write code as you did before, and then you see whether the compiler accepts it. As people get more experienced, they start to use the type system as a tool in its own right, and one of the things that is constantly on one’s mind is how to pass off as much of the proof burden as possible to the compiler. Most of this work can and should be done using boring old vanilla ML types.

This is a wonderful point. From the outside, the type systems of languages like Haskell and ML tend to look like a sort of archaic ecstatic religious rite. The bleeding mendicants pause as they shuffle past, sing a verse in praise of the purifying pain of strong typing, then prod themselves with pointy sticks and progress along their lonely road.

In practice, though, the type system is a powerful tool that helps to prevent mistakes, by forcing you to do some thinking up front. It’s not something you put up with; it’s something you value, albeit after a while.

One of the difficulties of crossing the Rubicon from type system outsider to insider is that it’s hard to find motivating examples in small, easily understood chunks of code, the sort of code that one writes as a beginner. The value doesn’t really begin to accrue until you’re either writing something moderately complex or refactoring existing code.

Here’s a fabulous way to start a research paper.
In 1995, McArthur Wheeler walked into two Pittsburgh banks and robbed them in broad daylight, with no visible attempt at disguise. He was arrested later that night, less than an hour after videotapes of him taken from surveillance cameras were broadcast on the 11 o’clock news. When police later showed him the surveillance tapes, Mr. Wheeler stared in incredulity. “But I wore the juice,” he mumbled. Apparently, Mr. Wheeler was under the impression that rubbing one’s face with lemon juice rendered it invisible to videotape cameras.
The paper in question, Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments is a great read. Personally, I have my nose rubbed in the evidence of my own ineptitude often enough that I hope that I’m not in the category of people the authors study. That said, “but I wore the juice!” has now become my catch-all excuse for error.

I’ve been running test releases of Fedora 7, and lately the final release, for a number of months on my fairly new Lenovo Thinkpad X60. Here’s a brief summary of my experiences.

I’ve been using Fedora since 0.92 (Taroon), and while I’ve generally been happy with it, F7 is not exactly a model of stability or sleekness. Here’s a list of problems I’ve run into. My biggest problems so far have been with suspend and resume, and network drivers.

Suspend to RAM doesn’t work by default; when I resume, the screen is blank, and can’t be coaxed back to life save through a reboot. This is bug 243759. I came up with a fix (for my hardware, at least) in bug 237849. Richard Hughes’s fabulous HAL quirk site was the key to figuring out that one; thanks, Richard!

Suspend to disk (aka “hibernate”) works, some of the time. But it interacts poorly with the crummy quality of the current e1000 ethernet driver.

The e1000 driver is flaky in F7. If I boot with a live ethernet cable plugged in, it works. But if I plug one in after booting, it fails to detect the carrier signal, and simply never notices the cable. Bug 236956.

There is a workaround, of sorts. If I manually rmmod, then modprobe the driver again, it will detect the carrier signal. Unfortunately, after I do this, the next time that I try to suspend to disk, all of my network-related processes get stuck in an unkillable state, and suspend throws in the towel. Bug 245107.

If I try to use the screen backlight brightness control buttons, the screen goes black instead. Bug 220466.

If I disable the internal modem in the BIOS, sound no longer works. Bug 205603.

The new iwl3945 wireless driver simply falls over after a while, depending on the network configuration it’s talking to. Bug 240826.

By the way, I’m not claiming that these problems are in any way the fault of Dave Jones, Chuck Ebbert, the HAL guys, or other Fedora maintainers. They have a thankless job trying to clean up the endless flood of new bugs in upstream kernels, not to mention cascades of junk hardware and buggy BIOSes from vendors. I’m frankly relieved every time the kernel boots reliably at all.

On top of these stability and reliability problems, performance has taken a bit of a beating. F7 seems to take quite a bit longer than earlier releases to boot, something Dave Jones has quantified. The post-boot bloat also rises: there are more kernel threads and userspace daemons running than ever before.

On the positive side, 3D graphics works quite well. I can actually run OpenGL apps like Google Earth and Second Life without crashes, and get decent frame rates out of them. And of course I can turn on wobbly windows and induce vertigo and nausea within minutes, where previously I had to resort to using Windows to get that sensation.

I’ve been sitting on this for a while, so I’m very excited to announce that Don Stewart, John Goerzen and I are collaborating on an upcoming book for O’Reilly, the working title of which is “Real-World Haskell”. Better yet, O’Reilly has agreed to publish the title under a Creative Commons license!

You can find more details, and follow our progress, over at the web site we’ve set up.

With Jens Petersen’s blessing, I’ve packaged GHC 6.6.1 for Fedora Extras. If you use FC6, it’s available via yum as of a few days ago. It will be a part of Fedora 7 as soon as that comes out, too.

The upgrade to 6.6.1 necessitated a bump of the release number of the Fedora Gtk2Hs package, too. There are no changes in the new release; it’s just built against the newer compiler release.

« Prev - Next »