Some months ago, I wrote a Python implementation of Daniel Lemire‘s Weighted Slope One collaborative filtering algorithm.

Steve Jenson sent me a pointer to his Scala implementation last week, but his code is a straight port of the Python version, not using any of Scala’s tasty functional crunchiness. In an idle period the other evening, I ported my Python code to Haskell.

module SlopeTwo (Rating, SlopeOne, empty, predict, update) where import Data.List (foldl') import qualified Data.Map as M newtype SlopeOne a = SlopeOne (M.Map a (M.Map a (Int, Double))) deriving (Show) type Rating a = M.Map a Double empty = SlopeOne M.empty addT (a,b) (c,d) = let (l,r) = (a+c, b+d) in l `seq` r `seq` (l, r) update :: Ord a => SlopeOne a -> [Rating a] -> SlopeOne a update (SlopeOne s) = SlopeOne . M.map (M.map norm) . foldl' update' s where norm (a,b) = (a, b / fromIntegral a) update' s rm = foldl' (\m (k, v) -> insert m k v) s prod where prod = [((a, b), (1, m - n)) | (a, m) <- rs, (b, n) <- rs] rs = M.toList rm insert m (a, b) v = M.alter foo a m where foo = Just . maybe (M.singleton b v) (M.insertWith' addT b v) predict :: Ord a => SlopeOne a -> Rating a -> Rating a predict (SlopeOne s) rm = let freqm = foldl' insert M.empty [(a,b,r) | (a, m) <- M.keys s, (b, r) <- M.toList rm] insert n (a,b,r) = let (f, d) = find s (a, b) in M.insertWith' addT a (f, fromIntegral f * (r + d)) n norm = M.map (\(a, b) -> b / fromIntegral a) freqm find m (a,b) = M.findWithDefault (0,0) b (M.findWithDefault M.empty a m) in M.filterWithKey (\k v -> v > 0 && M.notMember k rm) norm

The Haskell code uses essentially the same approach as the Python code, though of course it has to use purely functional maps instead of dicts (hash tables), because hash tables don’t make much sense in a functional language. I also cut down the number of maps from two to one, which simplified the code a little.

Also, instead of doubly nested loops, the Haskell code uses single folds. How can we replace nested loops with a single unnested equivalent? The answer lies in treating lists as control flow: we use a list comprehension to generate the cartesian product of two lists, after the following manner.

ghci> [(x,y) | x <- [1,2,3], y <- ['a','b']] [(1,'a'),(1,'b'),(2,'a'),(2,'b'),(3,'a'),(3,'b')]

Here's a bit of test code, again ported from my Python version (yes, it gives the same answer).

userData :: [Rating String] userData = map M.fromList [ [("squid", 1.0), ("cuttlefish", 0.5), ("octopus", 0.2)], [("squid", 1.0), ("octopus", 0.5), ("nautilus", 0.2)], [("squid", 0.2), ("octopus", 1.0), ("cuttlefish", 0.4), ("nautilus", 0.4)], [("cuttlefish", 0.9), ("octopus", 0.4), ("nautilus", 0.5)] ] prediction = predict (update empty userData) (M.fromList [("squid", 0.4)])

Because of the “norm” operation, your “update” function only makes sense if applied to “empty :: SlopeOne a” as its first parameter. So I would suggest that initial empty value by hard coded instead of a parameter to “update”. And in this case “update” should be renamed to something more like “slopeOneFromRatings”

If you actually want “updateSlopeOne” that adds novel user preferences to a non-empty slopeOne value then you need to reverse the “norm” operation before adding in novel information (and finish with the “norm” operation).

Or you could never apply “norm” to the slopeOne information and always do the “norm” division when looking up values.

A more stylistic change would be to make slopeOne into the type “M.Map (a,a) (Int,Double)” which takes the item pair as a single key. Your current code with nested maps could be made slightly more efficient by grouping the insertions so that each insertion did not always involve a lookup into both the outer and inner map (by grouping by the outer Map’s key either per user (easy given the way you construct prod) or sorted across all the users). Also the (item1,item2)

keys are irrelevant since the value is always zero.

Hi, Chris. Regarding your suggested stylistic change, I originally wrote the Haskell code to use the kind of map you suggest, but it was both longer and less readable than the current code, the reason being that the left-hand item in a key pair was overrepresented in proportion to the number of times the right-hand item showed up. Accounting for this was annoying. Using nested maps avoids the problem.

Ah, I have now read your “predict” as well as the “update”. I had to rewrite your variable names to be sure I could read your code correctly. Reading the original paper I see your algorithms look correct. The default (0,0) does nothing since the first 0 multiples the rating and so the running total is unchanged.

A possibly clearer version of predict:

predict’ :: Ord a => SlopeOne a -> Rating a -> Rating a

predict’ (SlopeOne matrixIn) userRatings = M.mapMaybeWithKey calcItem matrixIn

where calcItem item1 innerMap | M.member item1 userRatings = Nothing

| M.null combined = Nothing

| norm_rating

The blog ate my code, so I posted all my code at http://haskell.org/haskellwiki/CollaborativeFiltering

The edited update function ensures there are no “diagonal elements”.

The predict’ function makes the relationship of the SlopeOne matrix to the predicted value a bit clearer.

And before I run for dinner I changed the code on the wiki so that update2 produces an un-normalized SlopeOne’ matrix and changed my predict’ to consume this instead.

You are right — the sparse matrices as nested maps are easy to work with. And I got to use Data.Map.mapMaybeWithKey and Data.Map.intersectionWith which I have not had call to employ before.

Cheers,

Chris