Steve Jenson sent me a pointer to his Scala implementation last week, but his code is a straight port of the Python version, not using any of Scala’s tasty functional crunchiness. In an idle period the other evening, I ported my Python code to Haskell.
module SlopeTwo (Rating, SlopeOne, empty, predict, update) where import Data.List (foldl') import qualified Data.Map as M newtype SlopeOne a = SlopeOne (M.Map a (M.Map a (Int, Double))) deriving (Show) type Rating a = M.Map a Double empty = SlopeOne M.empty addT (a,b) (c,d) = let (l,r) = (a+c, b+d) in l `seq` r `seq` (l, r) update :: Ord a => SlopeOne a -> [Rating a] -> SlopeOne a update (SlopeOne s) = SlopeOne . M.map (M.map norm) . foldl' update' s where norm (a,b) = (a, b / fromIntegral a) update' s rm = foldl' (\m (k, v) -> insert m k v) s prod where prod = [((a, b), (1, m - n)) | (a, m) <- rs, (b, n) <- rs] rs = M.toList rm insert m (a, b) v = M.alter foo a m where foo = Just . maybe (M.singleton b v) (M.insertWith' addT b v) predict :: Ord a => SlopeOne a -> Rating a -> Rating a predict (SlopeOne s) rm = let freqm = foldl' insert M.empty [(a,b,r) | (a, m) <- M.keys s, (b, r) <- M.toList rm] insert n (a,b,r) = let (f, d) = find s (a, b) in M.insertWith' addT a (f, fromIntegral f * (r + d)) n norm = M.map (\(a, b) -> b / fromIntegral a) freqm find m (a,b) = M.findWithDefault (0,0) b (M.findWithDefault M.empty a m) in M.filterWithKey (\k v -> v > 0 && M.notMember k rm) norm
The Haskell code uses essentially the same approach as the Python code, though of course it has to use purely functional maps instead of dicts (hash tables), because hash tables don’t make much sense in a functional language. I also cut down the number of maps from two to one, which simplified the code a little.
Also, instead of doubly nested loops, the Haskell code uses single folds. How can we replace nested loops with a single unnested equivalent? The answer lies in treating lists as control flow: we use a list comprehension to generate the cartesian product of two lists, after the following manner.
ghci> [(x,y) | x <- [1,2,3], y <- ['a','b']] [(1,'a'),(1,'b'),(2,'a'),(2,'b'),(3,'a'),(3,'b')]
Here's a bit of test code, again ported from my Python version (yes, it gives the same answer).
userData :: [Rating String] userData = map M.fromList [ [("squid", 1.0), ("cuttlefish", 0.5), ("octopus", 0.2)], [("squid", 1.0), ("octopus", 0.5), ("nautilus", 0.2)], [("squid", 0.2), ("octopus", 1.0), ("cuttlefish", 0.4), ("nautilus", 0.4)], [("cuttlefish", 0.9), ("octopus", 0.4), ("nautilus", 0.5)] ] prediction = predict (update empty userData) (M.fromList [("squid", 0.4)])