Collaborative filtering made easy

Posted on 2006-12-12 by Bryan O'Sullivan — 18 Comments ↓

During a lunchtime conversation the other day, a coworker mentioned that he was hacking in his spare time on an entry for the Netflix Prize. This got me to thinking about collaborative filtering: why had I never seen a good description of how to do it? I suspect that people who might ordinarily have a casual interest in the subject hear that there are some statistics involved, whereupon they immediately freeze in the mathematical headlights, and turn the conversation to something else, anything else. In early 2005, a researcher named Daniel Lemire published, with Anna Maclachlan, a paper with the jazzy title of “Slope One Predictors for Online Rating-Based Collaborative Filtering“. This is an important paper, because it presents a family of really simple collaborative filtering schemes. I mean really simple: there are no statistics involved, just a little bit of linear algebra. Unfortunately, because the Lemire/Maclachlan paper is aimed at an academic audience, it’s not trivial to tell by merely reading it how simple the technique it presents is, so what follows is my attempt to translate its ideas into my native language of Attention Deficit Programmerese. To make things more concrete, I’m going to present an implementation in less than 40 lines of Python (and I’ll try to explain any obscure bits of Python as I go). Cool, huh? Regardless of the underlying implementation, collaborative filters tend to try to solve the same broad problem using much the same data.

You have your crowd of users.
You have your big pile of items.
Some of your users have been generous enough to tell you what they think of some of your items.
You want to hawk more items to a user, and you’d prefer to make your suggestions relevant to their likely interests.

Items 1 through 3 suggest that you could use the opinions that people have recorded about shiny doodads they have seen, to give a good guess as to which doodads they haven’t seen, but might like. I am not going to explain the mathematics behind the Slope One predictor, because Lemire and Maclachlan’s paper pretty much pulls the numerical justification out of a hat. I’m not immersed in the literature of the field, so I’ll be lazy and take them at their word when they say their approach works; and hey, that’s easy code to write, so let’s get hacking! (The scheme I’m going to talk about is referred to as “Weighted Slope One” in the Lemire and Maclachlan paper.) My Slope One implementation stores its data in two 2D matrices. The X and Y axes are identical in each matrix. If the items to be rated are “cuttlefish”, “octopus”, and “squid”, we end up with two 3×3 matrices, each with the following rows and columns:

The coloured cells above indicate which elements of the matrix contain useful information. I’ll describe why most of the matrix is empty later. To keep things simple, let’s represent our matrices as two levels of Python dict objects (a dict is simply a hash table, if you’re not familiar with Python). The key of each dict is an item ID, so to get the matrix entry for “squid vs cuttlefish”, we look in the first-level dict for squid, then the second-level dict for cuttlefish.

class SlopeOne(object):

    def __init__(self):

        self.diffs = {}

        self.freqs = {}

The self.diffs matrix will hold differential ratings (I’ll describe what these are soon), and the self.freqs matrix will contain the number of times we’ve computed a differential rating for each pair of items. We’ll fill in the first and second levels of the dicts once we know what our data looks like, so let’s talk about the shape of the data. We need to be able to express ratings such as “Alice rated a squid as 1, and a cuttlefish as 4″. This is easy to express with two levels of dicts; the first maps from a user to all of that user’s ratings, and the second maps items to their ratings. In other words, our user data looks like this in Python:

userdata = dict(alice=dict(squid=1, cuttlefish=4)),

                bob=dict(squid=3, octopus=2))

The time has come to load the data into our matrices; let’s call the method update.

    def update(self, data):

To fill in the matrices, we must look at each rating from every user.

        for user, ratings in data.iteritems():

For each user and rating they have submitted, we will be iterating over every rating they have submitted a second time. The first thing we do is make sure that the necessary second-level dicts in each matrix exist.

            for item, rating in ratings.iteritems():

                self.freqs.setdefault(item, {})

                self.diffs.setdefault(item, {})

The setdefault call is an unfortunately ugly Pythonic idiom that means “if there’s no value for the given key (the first parameter) in the dict, assign it the given value (the second parameter)”. Inside this loop, we iterate over the user’s ratings again, to compare each rating against every other.

                for item2, rating2 in ratings.iteritems():

                    self.freqs[item].setdefault(item2, 0)

                    self.diffs[item].setdefault(item2, 0.0)

Here, we ensure that there are zeroes present in the matrix for every pair of ratings. (Notice that we’re using an integer zero to count frequencies, and a floating point zero for ratings.) For every pair of ratings, we modify the appropriate cell in self.freqs by adding one to the number of times we’ve seen that pair of ratings.

                    self.freqs[item][item2] += 1

And to update the self.diffs matrix, we compute the difference between the ratings for the two items.

                    self.diffs[item][item2] += rating - rating2

This is why I called self.diffs the “differential rating” matrix. Every time a user rates an item, we update the appropriate cells in this matrix to reflect the accumulated difference between that rating and every other rating the user has made. Once we’re done, we scale each differential rating based on the number of times we’ve seen that pair of items.

        for item, ratings in self.diffs.iteritems():

            for item2, rating in ratings.iteritems():

            ratings[item2] /= self.freqs[item][item2]

Now let’s pull the whole thing together, so you can see how simple this update method is.

    def update(self, data):

        for user, prefs in data.iteritems():

            for item, rating in prefs.iteritems():

                self.freqs.setdefault(item, {})

                self.diffs.setdefault(item, {})

                for item2, rating2 in prefs.iteritems():

                    self.freqs[item].setdefault(item2, 0)

                    self.diffs[item].setdefault(item2, 0.0)

                    self.freqs[item][item2] += 1

                    self.diffs[item][item2] += item - item2

        for item, ratings in self.diffs.iteritems():

            for item2, rating in ratings.iteritems():

                ratings[item2] /= self.freqs[item][item2]

That’s a measly 13 lines of code, 4 of which are due to us using dict objects to represent matrices. Nice! (By the way, notice that the update method made no use of the user IDs we passed in. This technique doesn’t care who rated what, just how an item has been rated. Collaborative filtering techniques that don’t use user information directly are called “item-based”, while those that do are called “user-based”.) It’s easy to fill out our matrices:

s = SlopeOne()

s.update(dict(alice=dict(squid=1.0, cuttlefish=4.0)),

              bob=dict(squid=1.0, octupus=3.0)))

With a little data in place, we’d like to make a prediction. To do this, we’ll pass the ratings a user has made to the predict method; it will return predictions of ratings for items the user hasn’t rated.

    def predict(self, userprefs):

We begin with two empty dicts; preds will accumulate prediction data, and freqs will accumulate counts.

        preds, freqs = {}, {}

We iterate over every item the user has rated, for items that have been rated by anyone (the try/except clause below lets us skip over unrated item pairs).

        for item, rating in userprefs.iteritems():

            for diffitem, diffratings in self.diffs.iteritems():

                try:

                    freq = self.freqs[diffitem][item]

                except KeyError:

                    continue

We make sure there are zeroes present in our dicts before we accumulate anything into them.

                preds.setdefault(diffitem, 0.0)

                freqs.setdefault(diffitem, 0)

The accumulation step is simple.

                preds[diffitem] += freq * (diffratings[item] + rating)

                freqs[diffitem] += freq

We must filter the result a little before we return it. We return a new dict, containing weighted predictions for every item, but omitting items that the user has rated (no point in predicting those), and omitting items for which we have a cumulative frequency of zero.

        return dict([(item, value / freqs[item])

                     for item, value in preds.iteritems()

                     if item not in userprefs and freqs[item] > 0])

And now let’s see the predict method without all of those intrusive comments, again to see how simple it is.

    def predict(self, userprefs):

        preds, freqs = {}, {}

        for item, rating in userprefs.iteritems():

            for diffitem, diffratings in self.diffs.iteritems():

                try:

                    freq = self.freqs[diffitem][item]

                except KeyError:

                    continue

                preds.setdefault(diffitem, 0.0)

                freqs.setdefault(diffitem, 0)

                preds[diffitem] += freq * (diffratings[item] + rating)

                freqs[diffitem] += freq

        return dict([(item, value / freqs[item])

                     for item, value in preds.iteritems()

                     if item not in userprefs and freqs[item] > 0])

Our proof-of-concept Weighted Slope One predictor clocks in at 33 lines of Python. If we had a sparse 2D matrix class handy, we could trim six of those lines. That’s not bad! It wouldn’t be difficult to go from this trivial implementation to something truly efficient. From the way we construct the matrices in the update method, we can make a few potentially space saving observations.

Some portions of each matrix are entirely empty. They don’t contain zeroes; they contain nothing, because nobody has rated those pairs of items. If Alice rates squid and cuttlefish, while Bob rates squid and octopus, there will be no entries for cuttlefish vs octopus in either matrix.
The diagonal of the self.diffs matrix will always be either empty or zero, so there’s no need to store values on the diagonal of the self.freqs matrix.
Each entry in the upper right of the self.diffs matrix is the negative of the symmetric entry in the bottom left. The upper right of the self.freqs matrix is symmetric to the bottom left (corresponding entries have identical values).

We could thus represent each matrix as a strictly lower triangular matrix. Because much of a matrix is going to be empty for real databases of items and ratings, we would save a lot by storing it in sparse form. If you want to build a web site that uses collaborative filtering, and you’re not a specialist, the Slope One scheme has a lot going for it. It is simple to implement, and the arithmetic is easy to understand (in stark contrast to techniques that use singular value decomposition, cosine correlation or Pearson’s correlation coefficient, or other gymnastic statistical approaches). It’s also easy to see that updates and predictions can both be performed in data parallel style (predictions are easy to express in mapreduce form). Better yet, it’s easy and cheap to update the Slope One data structures on the fly, instead of taking everything offline and performing a heavyweight nightly thrashing of a database. You can download a copy of my sample Weighted Slope One implementation as slope_one.py. Have fun!

Posted in software

18 comments on “Collaborative filtering made easy”

Justin Mason says:

2006-12-13 at 03:39

Wow, that’s simple. Thanks for posting it!
john o'donovan says:

2007-03-13 at 16:00

good work!! i’ve implemented a few different CF algorithms. the smallest one being about 10 times what you have done 😉
Ray Waterman says:

2007-04-20 at 02:45

Nice clear explaination
PB says:

2007-05-08 at 12:45

I think there is a typo in the article:

self.diffs[item][item2] += item – item2

should be:

self.diffs[item][item2] += rating – rating2

Thanks for the fantastic article!
mohit says:

2007-08-22 at 07:05

This is a excellent post .Thanks for it.
FeepingCreature says:

2007-08-28 at 10:43

Just for the sake of comparison, here’s an implementation of the same algorithm in D.

http://pastebin.ca/673772

It’s not quite as neat, but it comes surprisingly close for a strong-typed, C-derivative language.
veritas says:

2008-01-25 at 15:14

Thanks for the great explanation and tutorial

I’m a little curious about implementing this using a sparse matrix. Would I need a special sparse matrix class or could I just pass in dicts that aren’t full (dense) matrices.

I.E, can I just pass in a matrix (formed using dicts) that just contains non-zero ratings.

I’m guessing when I do this I could also remove these two lines:

self.freqs[item].setdefault(item2, 0)

self.diffs[item].setdefault(item2, 0.0)

Looking through the code, I can’t see a reason why it wouldn’t work, but I’m new to this so if you can chime in, I’d appreciate it
Doug says:

2008-02-20 at 13:16

Hi,
I run this algorithm against the MovieLens 100K data set.
When I try to get recommendation for a movie sometimes the predicted recommendation is far above 5. Notice that the valid rating should be in between 1 and 5.

Is this an issue with this SlopeOne algo?

Thanks
Jason DeFontes says:

2008-10-31 at 13:28

@Doug,

Yes, that is expected. If you put the algorithm into words (in the simplest case of only two items) it would be something like:

“On average, everyone else rates Octopus with two more points than they rate Squid. Therefore, your predicted rating for Octopus should be two points more than your actual rating for Squid.”

If your actual rating for Squid is 4, then your predicted rating for Octopus will be calculated as 6. If the acceptable range of ratings is 1 to 5, then you just have to truncate the prediction to make it fit.
Arayana Bilgi says:

2010-04-24 at 05:57

Great code here; could implement at the site flawlessly.
Nan Zheng says:

2010-04-28 at 14:09

Althogh I am not familiar with Python, I still feel this blog great as it make an easy way to applicate slope one. However, I wonder if you considered the input data scale. As if the input rating data is quite huge, can the dic still works?
anonymous says:

2010-10-21 at 10:18

Hi, shouldn’t the line
freq = self.freqs[diffitem][item]
actually be
freq += self.freqs[diffitem][item]

?
The problems mentioned by PB are also not fixed yet 😉
anonymous says:

2010-10-21 at 10:27

(forget about the post above)
Matt says:

2012-01-25 at 08:31

Hey! Your link to `slope_one.py` is dead
Matt says:

2012-01-25 at 09:36

Happens to be a copy of slope_one.py here http://code.google.com/p/citylife1/source/browse/trunk/slope_one.py?r=2 and a line for line Ruby port https://github.com/ashleyw/Slope-One
Matt says:

2012-01-25 at 10:26

Thanks for the code Bryan. A warning to anyone using it, as explained in a comment in the Haskell port, the `update` method is wrong from the second time you use it.

http://www.serpentine.com/blog/2007/08/27/weighted-slope-one-in-haskell-collaborative-filtering-in-29-lines-of-code/#comment-75536
Lhfcws says:

2013-06-05 at 02:30

I notice that you have 3 nested for-loop in def update(). I used your code example , and tried running data with 5000 * 820. It seems that the time cost is not so accepted with 3 nested for-loop( time cost is O(NMM) ).
Onefly says:

2014-03-17 at 02:07

in the last line “print s.predict(dict(squid=0.4))”
why should be dict(squid=0.4)?
Thank you!