Weighted Slope One in Haskell: collaborative filtering in 29 lines of code

Some months ago, I wrote a Python implementation of Daniel Lemire‘s Weighted Slope One collaborative filtering algorithm.

Steve Jenson sent me a pointer to his Scala implementation last week, but his code is a straight port of the Python version, not using any of Scala’s tasty functional crunchiness. In an idle period the other evening, I ported my Python code to Haskell.

module SlopeTwo (Rating, SlopeOne, empty, predict, update) where

import Data.List (foldl')
import qualified Data.Map as M

newtype SlopeOne a = SlopeOne (M.Map a (M.Map a (Int, Double))) deriving (Show)
type Rating a = M.Map a Double

empty = SlopeOne M.empty
addT (a,b) (c,d) = let (l,r) = (a+c, b+d) in l `seq` r `seq` (l, r)

update :: Ord a => SlopeOne a -> [Rating a] -> SlopeOne a
update (SlopeOne s) = SlopeOne . M.map (M.map norm) . foldl' update' s
  where norm (a,b) = (a, b / fromIntegral a)
        update' s rm = foldl' (\m (k, v) -> insert m k v) s prod
          where prod = [((a, b), (1, m - n)) | (a, m) <- rs, (b, n) <- rs]
                rs = M.toList rm
        insert m (a, b) v = M.alter foo a m
          where foo = Just . maybe (M.singleton b v) (M.insertWith' addT b v)

predict :: Ord a => SlopeOne a -> Rating a -> Rating a
predict (SlopeOne s) rm =
  let freqm = foldl' insert M.empty
              [(a,b,r) | (a, m) <- M.keys s, (b, r) <- M.toList rm]
      insert n (a,b,r) = let (f, d) = find s (a, b) in
          M.insertWith' addT a (f, fromIntegral f * (r + d)) n
      norm = M.map (\(a, b) -> b / fromIntegral a) freqm
      find m (a,b) = M.findWithDefault (0,0) b (M.findWithDefault M.empty a m)
  in M.filterWithKey (\k v -> v > 0 && M.notMember k rm) norm

The Haskell code uses essentially the same approach as the Python code, though of course it has to use purely functional maps instead of dicts (hash tables), because hash tables don’t make much sense in a functional language. I also cut down the number of maps from two to one, which simplified the code a little.

Also, instead of doubly nested loops, the Haskell code uses single folds. How can we replace nested loops with a single unnested equivalent? The answer lies in treating lists as control flow: we use a list comprehension to generate the cartesian product of two lists, after the following manner.

ghci> [(x,y) | x <- [1,2,3], y <- ['a','b']]
[(1,'a'),(1,'b'),(2,'a'),(2,'b'),(3,'a'),(3,'b')]

Here's a bit of test code, again ported from my Python version (yes, it gives the same answer).

userData :: [Rating String]
userData = map M.fromList [
 [("squid", 1.0), ("cuttlefish", 0.5), ("octopus", 0.2)],
 [("squid", 1.0), ("octopus", 0.5), ("nautilus", 0.2)],
 [("squid", 0.2), ("octopus", 1.0), ("cuttlefish", 0.4), ("nautilus", 0.4)],
 [("cuttlefish", 0.9), ("octopus", 0.4), ("nautilus", 0.5)]
 ]

prediction = predict (update empty userData) (M.fromList [("squid", 0.4)])
Posted in haskell
6 comments on “Weighted Slope One in Haskell: collaborative filtering in 29 lines of code
  1. Chris K says:

    Because of the “norm” operation, your “update” function only makes sense if applied to “empty :: SlopeOne a” as its first parameter. So I would suggest that initial empty value by hard coded instead of a parameter to “update”. And in this case “update” should be renamed to something more like “slopeOneFromRatings”

    If you actually want “updateSlopeOne” that adds novel user preferences to a non-empty slopeOne value then you need to reverse the “norm” operation before adding in novel information (and finish with the “norm” operation).

    Or you could never apply “norm” to the slopeOne information and always do the “norm” division when looking up values.

    A more stylistic change would be to make slopeOne into the type “M.Map (a,a) (Int,Double)” which takes the item pair as a single key. Your current code with nested maps could be made slightly more efficient by grouping the insertions so that each insertion did not always involve a lookup into both the outer and inner map (by grouping by the outer Map’s key either per user (easy given the way you construct prod) or sorted across all the users). Also the (item1,item2)
    keys are irrelevant since the value is always zero.

  2. Hi, Chris. Regarding your suggested stylistic change, I originally wrote the Haskell code to use the kind of map you suggest, but it was both longer and less readable than the current code, the reason being that the left-hand item in a key pair was overrepresented in proportion to the number of times the right-hand item showed up. Accounting for this was annoying. Using nested maps avoids the problem.

  3. Chris K says:

    Ah, I have now read your “predict” as well as the “update”. I had to rewrite your variable names to be sure I could read your code correctly. Reading the original paper I see your algorithms look correct. The default (0,0) does nothing since the first 0 multiples the rating and so the running total is unchanged.

    A possibly clearer version of predict:

    predict’ :: Ord a => SlopeOne a -> Rating a -> Rating a
    predict’ (SlopeOne matrixIn) userRatings = M.mapMaybeWithKey calcItem matrixIn
    where calcItem item1 innerMap | M.member item1 userRatings = Nothing
    | M.null combined = Nothing
    | norm_rating

  4. Chris K says:

    The blog ate my code, so I posted all my code at http://haskell.org/haskellwiki/CollaborativeFiltering

    The edited update function ensures there are no “diagonal elements”.

    The predict’ function makes the relationship of the SlopeOne matrix to the predicted value a bit clearer.

  5. Chris K says:

    And before I run for dinner I changed the code on the wiki so that update2 produces an un-normalized SlopeOne’ matrix and changed my predict’ to consume this instead.

    You are right — the sparse matrices as nested maps are easy to work with. And I got to use Data.Map.mapMaybeWithKey and Data.Map.intersectionWith which I have not had call to employ before.

    Cheers,
    Chris

  6. Kulwinder says:

    A FISHER SAVED MY LIFE! After my mother’s death in 1987 I was very deesesrpd. I had lost faith in myself and in life and ended up farming my 2 teenage kids out to friends and going homeless myself. Home, as it were, became the woods of Phippsburg, Maine, where I pitched a tent and furnished it with a cot, a lantern and a camping stove. I kept my clothes in plastic bags. Well, I had been there through the summer and fall, and now it was late November. It was getting very cold and icy rains were falling hard. In fact, I was woken up about midnight one night and the rain was coming down so hard I thought it would flatten the tent. I thought I’d better get up and check the tent poles. So I flung my foot over the side of my cot and, to my utter shock, it plunged into frigid water about half way up my calf! So I yanked it back and grabbed my flashlight. When I swept the inside of the tent with the beam I saw a strange site all of my plastic bags of clothes were floating here and there. The tent was filling with water! I burst into tears and probably added another few inches of water to the flood. But incredibly despite how awful things were, the next day I STILL couldn’t bring myself to rent an apartment even though I had $10,000 in the bank! That’s how messed up I was. It was going to take something even worse to motivate me. Here’s what finally levered me out of my emotional morass: it was the middle of the night, just a few nights later Thanksgiving eve I recall and I was jerked awake by the most god-awful sound. It sounded like a women being murdered and shrieking in the most utter agony! Could it really be that? But then I remembered that there was a local legend about The Phippsburg Shrieker, which was described as a yeti-like monster. And that scared me even more! In any case, it sounded so close that I thought for sure it whether it was a murderer or a monster was going to rip my tent open and kill me! I lay wide awake and shaking, huddled in my sleeping bag, clutching my open jack knife, all night. But that did it. I had finally had enough. So when, the dawn eventually broke an eternity later I took a cautious peek outside, and seeing that the coast was clear, I dove into action: broke camp; threw my gear in my truck; barreled to the nearby town of Bath and rented the first place I looked at which was a half of a duplex on Elm Street. Then I got my stuff out of storage and my kids from friends and we moved in. The place was a dump (but with good-bones ) and my kids dubbed it The Nightmare on Elm Street. But I was so happy to be inside where it was warm and dry, and to have my children with me again that I didn’t care. And besides, I love to do extreme make-overs. (This turned out to be very extreme but worth it.) So, on the wings of my new found gratitude and abundance attitude I #1 turned the dump into into a palace, #2 started a new successful business, and #3 transformed my nightmare life into a dream. However, it wasn’t until the following June, when I went back to Phippsburg for my birthday celebration, that I discovered exactly what it was that I had heard that terrifying night the November before. My family and I were seated in Spinney’s restaurant ordering a lobster dinner and looking out at the sunset over the water. While we were waiting for our meal to come I got talking with some folks at the next table. They were local people I knew slightly. And in the course of conversation I shared the story of my encounter with the Phippsburg Shrieker. As I wrapped it up, I noticed the family exchanging knowing glances and grins. Then, after hemming and hawing a bit the father explained, Ayuh, we git rid of a lot of folks-from-away with thet monst-ah myth. But since you ahn’t from too-o-o far away, I’ll let you in on the truth. The Shriek-ah is really just a FISH-AH. Ayuh, ayuh, ayuh. Well, we all had a good laugh about that and ever since I have loved fishers, because if it hadn’t been for The Phippsburg Shrieker they’d have probably have found my frozen body in that tent come spring.

2 Pings/Trackbacks for "Weighted Slope One in Haskell: collaborative filtering in 29 lines of code"
  1. […] this implementation, I’m going to tackle the Weighted Slope One.  Using the Haskell implementation by Bryan O’Sullivan as my source of inspiration.  I find that going between Haskell and F# to be […]

  2. […] this implementation, I’m going to tackle the Weighted Slope One.  Using the Haskell implementation by Bryan O’Sullivan as my source of inspiration.  I find that going between Haskell and F# to be […]

Leave a Reply

Your email address will not be published. Required fields are marked *

*