*anything*else. In early 2005, a researcher named Daniel Lemire published, with Anna Maclachlan, a paper with the jazzy title of “Slope One Predictors for Online Rating-Based Collaborative Filtering“. This is an important paper, because it presents a family of

*really simple*collaborative filtering schemes. I mean

*really*simple: there are

*no*statistics involved, just a little bit of linear algebra. Unfortunately, because the Lemire/Maclachlan paper is aimed at an academic audience, it’s not trivial to tell by merely reading it how simple the technique it presents is, so what follows is my attempt to translate its ideas into my native language of Attention Deficit Programmerese. To make things more concrete, I’m going to present an implementation in less than 40 lines of Python (and I’ll try to explain any obscure bits of Python as I go). Cool, huh? Regardless of the underlying implementation, collaborative filters tend to try to solve the same broad problem using much the same data.

- You have your crowd of users.
- You have your big pile of items.
- Some of your users have been generous enough to tell you what they think of some of your items.
- You want to hawk more items to a user, and you’d prefer to make your suggestions relevant to their likely interests.

*have*seen, to give a good guess as to which doodads they

*haven’t*seen, but

*might like*. I am not going to explain the mathematics behind the Slope One predictor, because Lemire and Maclachlan’s paper pretty much pulls the numerical justification out of a hat. I’m not immersed in the literature of the field, so I’ll be lazy and take them at their word when they say their approach works; and hey, that’s easy code to write, so let’s get hacking! (The scheme I’m going to talk about is referred to as “Weighted Slope One” in the Lemire and Maclachlan paper.) My Slope One implementation stores its data in two 2D matrices. The X and Y axes are identical in each matrix. If the items to be rated are “cuttlefish”, “octopus”, and “squid”, we end up with two 3×3 matrices, each with the following rows and columns: The coloured cells above indicate which elements of the matrix contain useful information. I’ll describe why most of the matrix is empty later. To keep things simple, let’s represent our matrices as two levels of Python

`dict`objects (a dict is simply a hash table, if you’re not familiar with Python). The key of each dict is an item ID, so to get the matrix entry for “squid vs cuttlefish”, we look in the first-level dict for squid, then the second-level dict for cuttlefish.

`class SlopeOne(object):`

` def __init__(self):`

` self.diffs = {}`

` self.freqs = {}`

The `self.diffs`matrix will hold differential ratings (I’ll describe what these are soon), and the

`self.freqs`matrix will contain the number of times we’ve computed a differential rating for each pair of items. We’ll fill in the first and second levels of the dicts once we know what our data looks like, so let’s talk about the shape of the data. We need to be able to express ratings such as “Alice rated a squid as 1, and a cuttlefish as 4”. This is easy to express with two levels of dicts; the first maps from a user to all of that user’s ratings, and the second maps items to their ratings. In other words, our user data looks like this in Python:

`userdata = dict(alice=dict(squid=1, cuttlefish=4)),`

` bob=dict(squid=3, octopus=2))`

The time has come to load the data into our matrices; let’s call the method update.
` def update(self, data):`

To fill in the matrices, we must look at each rating from every user.
` for user, ratings in data.iteritems():`

For each user and rating they have submitted, we will be iterating over every rating they have submitted a second time. The first thing we do is make sure that the necessary second-level `dict`s in each matrix exist.

` for item, rating in ratings.iteritems():`

` self.freqs.setdefault(item, {})`

` self.diffs.setdefault(item, {})`

The `setdefault`call is an unfortunately ugly Pythonic idiom that means “if there’s no value for the given key (the first parameter) in the

`dict`, assign it the given value (the second parameter)”. Inside this loop, we iterate over the user’s ratings again, to compare each rating against every other.

` for item2, rating2 in ratings.iteritems():`

` self.freqs[item].setdefault(item2, 0)`

` self.diffs[item].setdefault(item2, 0.0)`

Here, we ensure that there are zeroes present in the matrix for every pair of ratings. (Notice that we’re using an integer zero to count frequencies, and a floating point zero for ratings.)
For every pair of ratings, we modify the appropriate cell in `self.freqs`by adding one to the number of times we’ve seen that pair of ratings.

` self.freqs[item][item2] += 1`

And to update the `self.diffs`matrix, we compute the difference between the ratings for the two items.

` self.diffs[item][item2] += rating - rating2`

This is why I called `self.diffs`the “differential rating” matrix. Every time a user rates an item, we update the appropriate cells in this matrix to reflect the accumulated difference between

*that*rating and

*every other*rating the user has made. Once we’re done, we scale each differential rating based on the number of times we’ve seen that pair of items.

` for item, ratings in self.diffs.iteritems():`

` for item2, rating in ratings.iteritems():`

` ratings[item2] /= self.freqs[item][item2]`

Now let’s pull the whole thing together, so you can see how simple this `update`method is.

` def update(self, data):`

` for user, prefs in data.iteritems():`

` for item, rating in prefs.iteritems():`

` self.freqs.setdefault(item, {})`

` self.diffs.setdefault(item, {})`

` for item2, rating2 in prefs.iteritems():`

` self.freqs[item].setdefault(item2, 0)`

` self.diffs[item].setdefault(item2, 0.0)`

` self.freqs[item][item2] += 1`

` self.diffs[item][item2] += item - item2`

` for item, ratings in self.diffs.iteritems():`

` for item2, rating in ratings.iteritems():`

` ratings[item2] /= self.freqs[item][item2]`

That’s a measly 13 lines of code, 4 of which are due to us using `dict`objects to represent matrices. Nice! (By the way, notice that the

`update`method made no use of the user IDs we passed in. This technique doesn’t care

*who*rated

*what*, just

*how*an item has been rated. Collaborative filtering techniques that don’t use user information directly are called “item-based”, while those that do are called “user-based”.) It’s easy to fill out our matrices:

`s = SlopeOne()`

`s.update(dict(alice=dict(squid=1.0, cuttlefish=4.0)),`

` bob=dict(squid=1.0, octupus=3.0)))`

With a little data in place, we’d like to make a prediction. To do this, we’ll pass the ratings a user has made to the `predict`method; it will return predictions of ratings for items the user hasn’t rated.

` def predict(self, userprefs):`

We begin with two empty `dict`s;

`preds`will accumulate prediction data, and

`freqs`will accumulate counts.

` preds, freqs = {}, {}`

We iterate over every item the user has rated, for items that have been rated by anyone (the try/except clause below lets us skip over unrated item pairs).
` for item, rating in userprefs.iteritems():`

` for diffitem, diffratings in self.diffs.iteritems():`

` try:`

` freq = self.freqs[diffitem][item]`

` except KeyError:`

` continue`

We make sure there are zeroes present in our `dict`s before we accumulate anything into them.

` preds.setdefault(diffitem, 0.0)`

` freqs.setdefault(diffitem, 0)`

The accumulation step is simple.
` preds[diffitem] += freq * (diffratings[item] + rating)`

` freqs[diffitem] += freq`

We must filter the result a little before we return it. We return a new `dict`, containing weighted predictions for every item, but omitting items that the user has rated (no point in predicting those), and omitting items for which we have a cumulative frequency of zero.

` return dict([(item, value / freqs[item])`

` for item, value in preds.iteritems()`

` if item not in userprefs and freqs[item] > 0])`

And now let’s see the `predict`method without all of those intrusive comments, again to see how simple it is.

` def predict(self, userprefs):`

` preds, freqs = {}, {}`

` for item, rating in userprefs.iteritems():`

` for diffitem, diffratings in self.diffs.iteritems():`

` try:`

` freq = self.freqs[diffitem][item]`

` except KeyError:`

` continue`

` preds.setdefault(diffitem, 0.0)`

` freqs.setdefault(diffitem, 0)`

` preds[diffitem] += freq * (diffratings[item] + rating)`

` freqs[diffitem] += freq`

` return dict([(item, value / freqs[item])`

` for item, value in preds.iteritems()`

` if item not in userprefs and freqs[item] > 0])`

Our proof-of-concept Weighted Slope One predictor clocks in at 33 lines of Python. If we had a sparse 2D matrix class handy, we could trim six of those lines. That’s not bad!
It wouldn’t be difficult to go from this trivial implementation to something truly efficient. From the way we construct the matrices in the update method, we can make a few potentially space saving observations.
- Some portions of each matrix are entirely empty. They don’t contain zeroes; they contain
*nothing*, because nobody has rated those pairs of items. If Alice rates squid and cuttlefish, while Bob rates squid and octopus, there will be no entries for cuttlefish vs octopus in either matrix. - The diagonal of the
`self.diffs`matrix will always be either empty or zero, so there’s no need to store values on the diagonal of the`self.freqs`matrix. - Each entry in the upper right of the
`self.diffs`matrix is the negative of the symmetric entry in the bottom left. The upper right of the`self.freqs`matrix is symmetric to the bottom left (corresponding entries have identical values).

Wow, that’s simple. Thanks for posting it!

good work!! i’ve implemented a few different CF algorithms. the smallest one being about 10 times what you have done 😉

Nice clear explaination

I think there is a typo in the article:

self.diffs[item][item2] += item – item2

should be:

self.diffs[item][item2] += rating – rating2

Thanks for the fantastic article!

This is a excellent post .Thanks for it.

Just for the sake of comparison, here’s an implementation of the same algorithm in D.

http://pastebin.ca/673772

It’s not quite as neat, but it comes surprisingly close for a strong-typed, C-derivative language.

Thanks for the great explanation and tutorial

I’m a little curious about implementing this using a sparse matrix. Would I need a special sparse matrix class or could I just pass in dicts that aren’t full (dense) matrices.

I.E, can I just pass in a matrix (formed using dicts) that just contains non-zero ratings.

I’m guessing when I do this I could also remove these two lines:

self.freqs[item].setdefault(item2, 0)

self.diffs[item].setdefault(item2, 0.0)

Looking through the code, I can’t see a reason why it wouldn’t work, but I’m new to this so if you can chime in, I’d appreciate it

Hi,

I run this algorithm against the MovieLens 100K data set.

When I try to get recommendation for a movie sometimes the predicted recommendation is far above 5. Notice that the valid rating should be in between 1 and 5.

Is this an issue with this SlopeOne algo?

Thanks

@Doug,

Yes, that is expected. If you put the algorithm into words (in the simplest case of only two items) it would be something like:

“On average, everyone else rates Octopus with two more points than they rate Squid. Therefore, your predicted rating for Octopus should be two points more than your actual rating for Squid.”

If your actual rating for Squid is 4, then your predicted rating for Octopus will be calculated as 6. If the acceptable range of ratings is 1 to 5, then you just have to truncate the prediction to make it fit.

Great code here; could implement at the site flawlessly.

Althogh I am not familiar with Python, I still feel this blog great as it make an easy way to applicate slope one. However, I wonder if you considered the input data scale. As if the input rating data is quite huge, can the dic still works?

Hi, shouldn’t the line

freq = self.freqs[diffitem][item]

actually be

freq += self.freqs[diffitem][item]

?

The problems mentioned by PB are also not fixed yet 😉

(forget about the post above)

Hey! Your link to `slope_one.py` is dead

Happens to be a copy of slope_one.py here http://code.google.com/p/citylife1/source/browse/trunk/slope_one.py?r=2 and a line for line Ruby port https://github.com/ashleyw/Slope-One

Thanks for the code Bryan. A warning to anyone using it, as explained in a comment in the Haskell port, the `update` method is wrong from the second time you use it.

http://www.serpentine.com/blog/2007/08/27/weighted-slope-one-in-haskell-collaborative-filtering-in-29-lines-of-code/#comment-75536

I notice that you have 3 nested for-loop in def update(). I used your code example , and tried running data with 5000 * 820. It seems that the time cost is not so accepted with 3 nested for-loop( time cost is O(NMM) ).

in the last line “print s.predict(dict(squid=0.4))”

why should be dict(squid=0.4)?

Thank you!