Placeholder Image

Subtitles section Play video

  • Anant has already talked to you about recommender systems, and so far we have

  • learned how to do, collaborative filtering recommender systems and

  • how to do, profile based recommender systems.

  • What we will do today in this lecture,

  • we will look into latent factor recommender models.

  • So basically the idea will be that we will adopt this

  • machine learning perspective where we'll be using optimization in order to build.

  • Very accurate recommender systems.

  • And in particular, our lecture today will be moderated by, by,

  • by the Netflix price, right?

  • So, Netflix is, is a company that is renting out, DVDs, and in order

  • to increase customer experience, Netflix wants to be able to recommend which

  • customers will like which movies, and this is called the movie recommender system.

  • Right, and a few years ago, actually, Netflix prepared some data and

  • created The Grand Challenge, where the idea was who can build the best

  • recommender system and the person who does that gets a one million dollars prize.

  • So here is how the.

  • Competition was structured.

  • So the idea was that first we would get, scientists would get training data.

  • This is a set of data of 100 million ratings from 480,000, users.

  • And around 17,000 movies.

  • And this is six years' worth of data.

  • From 2000 to 2005.

  • Right?

  • So based on this data the idea is that we build.

  • Or create a recommender system and then we would send these

  • recommendations to Netflix that would check our predicted recommendation.

  • Basically how much does a given user like the movie.

  • We did their test data set which is a set of Ratings that only Netflix has, and

  • this way we would want to predict or, the, and make sure that the predicted ratings

  • are correspond to the actual true ratings of the people who like those movies.

  • So that's basically the idea.

  • So, we are given hundred million movie ratings, and

  • we want to build a recommender system out of this.

  • So, the way we can think of our ratings data in,

  • is in terms of the utility matrix R.

  • And, basically, the way this matrix is composed is that

  • rows corresponds to movies and columns corresponds to users.

  • And this matrix,

  • most of it is empty in a sense that most users haven't seen most of the movies.

  • Right? But whenever a mo, a user sees a movie and

  • rates that movie, that represents an entry in this matrix, right?

  • So for example, this entry, in this particular en,

  • element of the matrix, number 5, that would mean that, a user in the,

  • in the fourth column liked the third movie a lot and rated it given five stars.

  • Right? So now given such a,

  • such a matrix, we want to formulate.

  • The recommendation problem.

  • The way we think of the recommendation problem is in a sense that we want to

  • predict how much will a given user like some movie that

  • they will see in the future.

  • So in this sense, we can use the historical rating data to build our model.

  • This is what we'll call the training data.

  • And then using this strange model we want to predict how much the users like

  • movies that they haven't yet seen and we will call this the test data.

  • And now when I say that we want to predict how much will a user like a given movie

  • the way we want to quantify how accurate is our prediction we will use what is

  • called the root mean square data.

  • So, basically the idea will be that our goal is to make predictions about how

  • much users are going to like the movies, then we will go and ask these users hey,

  • how much do you really like that movie?

  • And then we want these two, the prediction and

  • the true value, to be as close as possible.

  • So, the root mean square error, the way the formula works, is follows.

  • Right?

  • For a given user x in movie i I want to predict how much will that

  • user-movie pair what will be the rating of user x to movie i.

  • Of course I also know the true rating for movie i of user x.

  • And what we want to do is we want the discrepancy between the predicted rating

  • and the true rating.

  • To be, to be small in a sense that we want to take the difference

  • square this difference.

  • The reason we want to

  • square is because the difference could be either positive of negative.

  • So if we square it up, everything is positive.

  • Now we sum all these errors across all the users and all the movies.

  • We take the square root and divide by the total number of ratings, and

  • this is called root-mean-square error.

  • So in some sense, our goal is to in some sense complete this matrix R.

  • Wherever the values for ratings are unknown.

  • And our goal is to complete this matrix as accurately as possible.

  • So in some sense,

  • we want to predict how much is a given user going to like a given movie.

  • So what is the input and what is the output, right?

  • So what Netflix gives us is the training data of hun, 100 million ratings.

  • So half a million people, 20,000 movies, and, and ratings.

  • Our goal is to predict the, a few ratings for every user.

  • Where the evaluation criteria or the good, the way, how we

  • evaluate how good that our predictions, are based on the root mean squared error.

  • And, what we also know, for example, is that, that the root mean square error of

  • the Netflix system that was used in house and at that point in time was zero, 0.95.

  • Okay?

  • So in some sense, on average this, this in,

  • in-house recommender system was about one star away from the, from the true rating.

  • Right?

  • So, this competition that was taking place in 2009 and 2010.

  • More than 2,700 teams joined the competition.

  • And what matrix also said was whoever is able to

  • improve their in-house recommended system by 10% gets $1 million prize.

  • Right? So the idea is how can

  • we beat the in-house recommended system and get.

  • . The root brings square error from .9 to

  • .85 that's the whole idea.

  • Now the structure of how to do this is the following.

  • Basically a modern recommender system today is composed of

  • several different components.

  • And, the idea is that basically we want to adopt this kind of

  • multi-scale modeling approach to the, to the rating data.

  • And the idea is that we want to combine several different views of the data from

  • this type of global top level view of the data to some kind of regional modeling of

  • the data, all the way down to very local interactions between movies and the users.

  • Right? And what the,

  • the goal of today's lecture will be,

  • that we basically focus on the, on the, original or the,

  • you know, the middle level, modeling of the data through matrix factorization.

  • So the methodology that we'll be using to, to achieve, our modeling will be

  • through matrix factorization which relates very nicely to the singular value,

  • the composition and dimensionality reduction.

  • Lecture that we talked about previously.

  • But before we kind of go to the main body of today's lecture, let me talk to you

  • about global modeling and also about collaborative filtering which is very

  • a local way of modeling the data So what do I mean by global modeling of the data.

  • The global modeling of the data is basically that you want to kind of

  • compute some average, average rating over the movies and, and also over the users.

  • Right so if you would use a global way to model how much will

  • a given user like a given movie then we could say the min movie rating in the,

  • in our data set on Netflix is 3.7 stars.

  • And then we could say oh but for example the Sixth Sense movie.

  • Is, is rated half a star above the average rating.

  • And then we could say, oh, but this given user is very critical.

  • So they tend to, they tend to rate things

  • about 0.2 stars below the average rating, right.

  • So we could now say, okay, so 3.7 is our baseline.

  • Now we have a movie that is above average for half a star, so that is 4.2, but this

  • person, Joe, is a critical person, so they tend to be .2 stars below the average.

  • So, 4.2 minus, minus .2 equals 4, right?

  • So, in some sense our prediction how much will Joe like the Sixth Sense,

  • Sense would be 4 stars, as I said because of 3.7 plus .5.

  • Minus 0.2 equals 4.0, right?

  • So this is how basically, just based on the average,

  • average of the user rating and

  • average of the movie rating and overall average rating throughout the data set,

  • we can already make a prediction how much will a user like a given movie.

  • Of course this is a very crude, very global prediction.

  • But it turns out to be very useful.

  • So this is at the end, one end of the spectrum.

  • On the other end of the spectrum we can use very kind of local neighborhood

  • properties of a given user and a given movie to make recommendations.

  • And this is what collaborative filtering does.

  • Right? So in some sense the idea is,

  • would be following, right?

  • So I would like to,

  • to estimate what are other movies that are similar to the Sixth Sense that Joe

  • also rated and based on this I would then make a recommendation.

  • Right?

  • And based on these kinds of signals I would then make a final estimate that Joe

  • is like, going to like our movie with, or rate it with 3.8 stars.

  • Okay?

  • So how does collaborative filtering do what I just said?

  • The idea is basically the following.

  • Is that we want to derive unknown ratings from those,

  • from the ratings of similar movies.

  • Right? This is the item-item base

  • collaborative filtering.

  • So now the, the big question with collaborative filtering.

  • Is how do we operationalize the notion of similar?

  • Right? So, I basically won't need

  • to define a similarity measure that tells me how similar are movies i and j.

  • And now that I, that for every pair of movies for

  • example, I'm able to measure the similarity.

  • Then the idea is the following, I given that I want to make up a recommendation.

  • For movie I, the way I will do this is I would go find K nearest neighbors,

  • basically K most similar movies to movie I, and then I would ask,

  • ask the ratings and, and add them together in a weighted combination where movies

  • that are more similar to my movie I, they have a higher weight.

  • So, this is the formula that I have.

  • At the bottom, So let me explain it.

  • Right. So

  • what we want to do is we want to predict how much will user x like movie i.

  • The way we do this is that for a.

  • For a given movie.

  • And for we go and find let's say k-nearest neighbors.

  • Meaning k most similar movies j to this query movie i.

  • And then we say what was the rating that this user X gave to that given movie I,

  • and then we also multiply this with the similarity to movie I and J right.

  • So the idea basically is that if movies I and J are very similar, then SIJ will be

  • will be high so the rating that user X gave to movie J will have a high weight.

  • If I, if the movie J is of close similarity to I, then what will happen is

  • that, that rating of user X to the movie J will have no weight in this summation.

  • And then, at the bottom, we just normalize, right?

  • And this is one way how we can compute,

  • the rating based on the collaborative filtering.

  • Now how do we put together the local effects and

  • the global effects right how do we combine collaborative filtering with glo,

  • with the global effects model that I described described two slides ago.

  • The way we do this is.

  • That basically we just add the two together.

  • Right? So here's,

  • here's the formula that I will explain again.

  • All right so my recommendation or

  • prediction of how much will user i user x like movie i.

  • Is the, the baseline predictor.

  • Which is what we talked before.

  • Right. So the, the baseline predictor for

  • user x and movie i is simply the average movie rating plus the average

  • deviation for user x plus the average deviation for movie i.

  • Right. So in our previous case v of x was

  • minus point two because our user, Joe, was.

  • Very critical, and in our case the 6th Sense was a good movie, so

  • the bias of that movie was .5, and the overall rating was 3.7.

  • Right?

  • So our baseline recommendation for Joe at that point in time was 4.0.

  • And now, of course, this is our baseline recommendation.

  • And now we can take the part from the collaborative filtering, where we take k

  • near, closest movie is, j, to the given movie, I.

  • And ask, what is the deviation of the,

  • of the user's rating with regard to the, to their baseline for that movie.

  • And take away the combination of this and compute the recommendation right so

  • here the the first part is the is the global part of the model and

  • the the second part is the collaborative filtering part of the model.

  • Of course here are a few caveats a few things to to be careful about.

  • First, first one is that finding coming up with the good similarity measure.

  • Basically the good SIJ is very hard and sometimes it's, it's arbitrary or

  • one has to try many different metrics before one starts to work.

  • But pairwise, the problem with pairwise similarities is that basically they,

  • they, they Consider the similarities between the movies but

  • they kind of ignore the similarities between the users.

  • And another thing here we could say is that taking this

  • weighted average sometimes can be, can be restricted.

  • But these are kind of, if there are some issues with this model.

  • These are the three to talk about.

  • But in practice this already works quite well.

  • So let me show you how I do these different models work, okay?

  • So what I'm showing you with this yellow arrow is,

  • what is the performance of various methods on this Netflix price.

  • Right, for example, if for a given user we would just go and

  • predict the global average, so we would just compute.

  • what is the global average star rating on the whole.

  • Netflix data set and what whenever a user comes you could just say 3.7.

  • That's our prediction.

  • If you do that our root and square add on would be 1.12.

  • Right so if we are kind of very naive and