Data sources commonly used in data mining

Keep in mind the data sources they have to mine…

  1. Purchased shopping carts = real money from real people spent on real items = powerful data and a lot of it.
  2. Items added to carts but abandoned.
  3. Pricing experiments online (A/B testing, etc.) where they offer the same products at different prices and see the results
  4. Packaging experiments (A/B testing, etc.) where they offer different products in different “bundles” or discount various pairings of items
  5. Wishlists – what’s on them specifically for you – and in aggregate it can be treated similarly to another stream of basket analysis data
  6. Referral sites (identification of where you came in from can hint at other items of interest)
  7. Dwell times (how long before you click back and pick a different item)
  8. Ratings by you or those in your social network/buying circles – if you rate things you like, you get more of what you like, and if you confirm with the “I already own it” button, they build a very complete profile of you
  9. Demographic information (your shipping address, etc.) – they know what is popular in your general area for your kids, yourself, your spouse, etc.
  10. User segmentation = did you buy 3 books in separate months for a toddler? You likely have a kid (or more), etc.
  11. Direct marketing click through data – did you get an email from them and click through? They know which email it was and what you clicked through on and whether you bought it as a result.
  12. Click paths in session – what did you view regardless of whether it went in your cart
  13. Number of times viewed an item before final purchase
  14. If you’re dealing with a brick-and-mortar store, they might have your physical purchase history to go off of as well (e.g. Toys R Us or another retailer that is both online and a physical store)
  15. etc. etc. etc.

 

The four big takeaways you should have are:

  1. Amazon (or any retailer) is looking at aggregate data for tons of transactions and tons of people… this allows them to even recommend pretty well for anonymous users on their site.
  2. Amazon (or any sophisticated retailer) is keeping track of behavior and purchases of anyone that is logged in and using that to further refine on top of the mass aggregate data.
  3. Often there is a means of overriding the accumulated data and giving “editorial” control of suggestions to the product managers of specific lines (like the person who owns the ‘digital cameras’ vertical or the ‘romance novels’ vertical or similar) where they truly are experts
  4. There are often promotional deals (e.g. Sony, Panasonic, Nikon, Canon, Sprint, or Verizon pays additional money to the retailer, or gives a better discount at larger quantities, or similar arrangements in those lines) that will cause certain “suggestions” to rise to the top more often than others – there is always some reasonable business logic behind this, targeted at making more on each transaction or reducing wholesale costs, etc.

 

On Hacker News

What is a Good Recommendation Algorithm? (acm.org)
70 points by Anon84 2230 days ago | 15 comments

 

Greg nails something that seems to be passing the academic world of recommendations by: you can’t measure recommendation quality with RMSE. It’s just not a good metric. User happiness is the goal, not the ability to predict ratings of unrated items. I’m glad to have someone with a little more clout than me saying this. Some ask, “What’s the difference?” If I tell you about 5 albums that you’ve already heard of, are the recommendations good? Even if we’re pretty certain you’ll like them? If you’re buying an obscure jazz album and you get “Kind of Blue” as a recommendation (probably the most popular jazz album in history, and one any jazz fan would know of), is that a good recommendation?

How do users build trust of recommendations? How does that factor into the algorithms? It turns out you need a mix of obvious and surprising results. All obvious and they don’t discover anything; all surprising and they don’t trust them.

Those are the right questions. A good algorithm for recommendations is one that people interact with and discover things with.

This is an awesome read (in fact, I uhm, submitted it a few minutes before at Greg’s blog, but it’s good enough that I upvoted it here too). As soon as I ran across it I immediately blogged, tweeted, and submitted here. I’d had a draft of an essay along these lines kicking around for ages.

—–

I think they use RMSE because it’s easy, not because it’s ideal. BellKor, a participating team in the Netflix challenge, discussed this in the paper describing the method that won the progress prize; they calculated whether minute differences in RMSE improved the quality of top-10 results, and they did, pretty significantly.

—–

Just fished it out — paper is here for the curious: http://public.research.att.com/~volinsky/netflix/RecSys08tut…

It’s one, amusingly, that I’d skipped because it seemed to be less technical. 🙂 Good stuff.

—–

This hasn’t been passing us by! Netflix were the ones who decided to make RMSE the criterion for their contest, and put up a million dollars to ride on it for good measure, so it’s hardly a surprise all the papers are focused on it. Of course, RMSE doesn’t measure user satisfaction; that’s why we write papers describing the techniques that seem to work, and it’s up to Netflix (and other recommendation service providers) to pick which of those they want to use given that they’re maximizing something slightly different.

—–

It’s true that, not being in academia, I don’t hear the conversations that fill the gaps between publications. But if one’s simply going from the published output on collaborative filtering at the moment, there has been some convergence on RMSE as a benchmark. That’s understandable, since it’s easily measurable, and as you say, there are some folks throwing $1mm at it (which really isn’t much considering what it’d do for their sales).

—–

Still, wouldn’t predicting how well somebody likes something form a good basis for running a recommendation engine on top of it? Maybe it is a waste of effort for many scenarios, but if you can do it well, you can still add all sorts of algorithms to pick the best recommendations from the predictions?

—–

Well, that’s the question underlying the article. Consider the hypothetical case of a movie that is very controversial: all 1’s or 5’s. Even if your system can tell that a user is quite likely to fall in the ‘5’ camp, the only safe prediction for a high-variance movie is something close to the middle. Even if you are pretty sure the user would give this movie a 5, the squared error for the small chance of a 1 is enormous.

But a rating close to the middle is never going to be chosen as a recommendation if the algorithm involves recommending the movies with the highest predicted scores. Instead, an RMSE-based system is always going to prefer safe 4’s over risky 5’s. This doesn’t mean that improved predictions can’t yield improved recommendations, but I don’t see truly great ones ever coming from a prediction-based approach.
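To make that variance argument concrete, here is a small worked sketch (the 80/20 split is a made-up probability for illustration, not a figure from the thread) comparing the expected squared error of a bold 5 against a hedged prediction near the mean:

```python
# Hypothetical polarizing movie: 80% of users rate it 5, 20% rate it 1.
# (Illustrative numbers only, not from any real dataset.)
p_five, p_one = 0.8, 0.2

def expected_squared_error(prediction):
    """Expected squared error of one prediction under the 1-or-5 rating split."""
    return p_five * (prediction - 5) ** 2 + p_one * (prediction - 1) ** 2

print(expected_squared_error(5.0))  # bold guess: 3.20
print(expected_squared_error(4.2))  # hedged guess at the mean (0.8*5 + 0.2*1): 2.56
# RMSE rewards the hedged 4.2, even though surfacing this movie as a
# top-N recommendation requires committing to the bold 5.
```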

Personally, I want a recommendation system that maximizes the percentage of recommendations that I would rate as 5’s, and don’t much care if the misses are 1’s, 2’s, or 3’s.

—–

And beyond that, it’s somewhat domain-specific as to what the tolerance for misses is. In something like recommending online (non-paid) content, it doesn’t matter much. It’s worth more to take a gamble on something a user will really like than to give them something you’re sure they won’t hate. If you get two great hits and three bad misses, it’s probably still a net win for the user. On the other hand, if you’re, say, doing online dating recommendations, you probably want to avoid the polarized cases, since you could lose a paid customer with one horrible recommendation.

—–

I’d argue that “user happiness” isn’t the goal for Netflix; long-term revenue is. That’s relatively easy to measure, and certainly easier than something nebulous like “user happiness.” You can even test different recommendation algorithms and see which maximizes long-term revenue.

Presumably Netflix knows that the recommendation algorithm has a significant impact on their bottom line, which is why they launched the Netflix Prize to outsource new algorithm development.

Now, Netflix can’t give revenue data to third parties, and they also don’t want to let third-party recommendation algorithms run on their system because an “average” algorithm will hurt their bottom line.

The question then becomes: which well-understood metric correlates best with long-term revenue?

Perhaps the answer is RMSE, which is why Netflix chose it. That doesn’t seem totally implausible to me.

—–

You’d expect that. In the recommendations world that’s called “business rules” and includes everything from skewing results based on margins to not showing inappropriate recommendations (say, women’s clothing to men).

However, I’m pretty sure that Amazon’s recommendations don’t do that, or don’t do it much, anyway. Their “similar product” recommendations seem to be based on a very simple (and often mediocre-quality) pure counting correlation between two items’ purchases. It’s much harder to guess which algorithms are at work for personalized recommendations.

At the end of the day, profit margins aside, there’s a lot that goes into optimizing recommendations that can’t be easily measured. How do you measure customer loyalty based on good recommendations? There have been a number of market research studies that indicate that recommendations do drive customer loyalty, but it’s hard to say where the sweet spot is between skewing things toward higher margins vs. skewing things towards customer utility. About 80% of Amazon’s visitors aren’t there to buy stuff — and that’s great for them! They’ve become an information portal / window shopping location that happens to also sell stuff. Which is a great position to be in when somebody does think of buying stuff.

That Netflix uses RMSE for their contest doesn’t bother me. What I think Greg is reacting to (and certainly my sentiment; again, this is really similar to something I’d been writing) is that a blurring between stimulus and response is emerging here, and there’s the assumption, if not in this subfield then certainly among those casually tracking recommendations advances, that RMSE is a good way of measuring a recommendations algorithm, not just “the metric Netflix is using,” when in fact it’s a much more inexact science.

—–

A simple item-based algorithm which has been reported to work quite well is Slope One. The advantages are that it is easy to implement, can be updated on the fly, and it works well (enough) for sparse data sets: http://www.daniel-lemire.com/fr/abstracts/SDM2005.html

There are also examples using Python, Java, and PHP/SQL.
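For the curious, here is a minimal, unweighted Slope One sketch in Python; it is my own illustration of the scheme described in Lemire’s paper, not code taken from the examples linked above:

```python
from collections import defaultdict

def build_deviations(ratings):
    """ratings: {user: {item: rating}} -> average pairwise deviations and co-rating counts."""
    dev = defaultdict(lambda: defaultdict(float))
    freq = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                if i != j:
                    dev[i][j] += ri - rj
                    freq[i][j] += 1
    for i in dev:
        for j in dev[i]:
            dev[i][j] /= freq[i][j]
    return dev, freq

def predict(user_ratings, item, dev, freq):
    """Unweighted Slope One prediction for `item` from one user's existing ratings."""
    estimates = [rating + dev[item][other]
                 for other, rating in user_ratings.items()
                 if other != item and freq[item].get(other)]
    return sum(estimates) / len(estimates) if estimates else None

# Toy data: two users, three items.
ratings = {"alice": {"a": 5, "b": 3}, "bob": {"a": 4, "b": 2, "c": 5}}
dev, freq = build_deviations(ratings)
print(predict({"a": 4, "b": 2}, "c", dev, freq))  # 5.0
```

Because the deviation table only needs incrementing when a new rating arrives, the on-the-fly update property mentioned above falls out naturally.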

—–

A friend of mine made a Rails plugin called acts_as_recommendable (a plugin for collaborative filtering): http://github.com/maccman/acts_as_recommendable/tree/master

—–

1) Getting more preference-defining data from the user trumps algorithm improvements at this point. Netflix would have improved RMSE even more by turning over additional data like queue-adds, page views, user age/gender, etc. 2) Use caution criticizing RMSE as overly blunt. It may seem so, but it is not obvious that an algorithm can be improved for top-N prediction simply because you declare that as the focus.

—–

Netflix needed a formal measure for their contest, so RMSE is a useful one while “making people happy” is not. A business that relies on recommendations can plug in a new algorithm with better RMSE and get improved results immediately; it is an important part of the puzzle.

—–

“Making people happy” is hard to define, but you can pick better concrete metrics than RMSE, and this article offers suggestions on how. An important part of solving any problem is defining success correctly.

—–

 

 

 

On the blog blog.echen.me

Item-to-Item Collaborative Filtering with Amazon’s Recommendation System

Introduction

In making its product recommendations, Amazon makes heavy use of an item-to-item collaborative filtering approach. This essentially means that for each item X, Amazon builds a neighborhood of related items S(X); whenever you buy/look at an item, Amazon then recommends you items from that item’s neighborhood. That’s why when you sign in to Amazon and look at the front page, your recommendations are mostly of the form “You viewed… Customers who viewed this also viewed…”.

Other approaches.

The item-to-item approach can be contrasted to:

  • A user-to-user collaborative filtering approach. This finds users similar to you (e.g., it could find users who bought a lot of items in common with you), and suggests items that they’ve bought but you haven’t.
  • A global, latent factorization approach. Rather than looking at individual items in isolation (in the item-to-item approach, if you and I both buy a book X, Amazon will make essentially the same recommendations based on X, regardless of what we’ve bought in the past), a global approach would look at all the items you’ve bought, and try to detect properties that characterize what you like. For example, if you buy a lot of science fiction books and also a lot of romance books, a global-approach algorithm might try to recommend you books with both science fiction and romance elements.

Pros/cons of the item-to-item approach:

  • Pros over the user-to-user approach: Amazon (and most applications) has many more users than items, so it’s computationally simpler to find similar items than it is to find similar users. Finding similar users is also a difficult algorithmic task, since individual users often have a very wide range of tastes, but individual items usually belong to relatively few genres.
  • Pros over the factorization approach: Simpler to implement. Faster to update recommendations: as soon as you buy a new book, Amazon can make a new recommendation in the item-to-item approach, whereas a factorization approach would have to wait until the factorization has been recomputed. The item-to-item approach can also be more easily leveraged in several areas, not only in the recommendations made to you, but also in the “similar items/other customers also bought” section when you look at a particular item.
  • Cons of the item-to-item approach: You don’t get very much diversity or surprise in item-to-item recommendations, so recommendations tend to be kind of “obvious” and boring.

How to find similar items

Since the item-to-item approach makes crucial use of similar items, here’s a high-level view of how to do it. First, associate each item with the set of users who have bought/looked at it. The similarity between any two items could then be a normalized measure of the number of users they have in common (i.e., the Jaccard index) or the cosine similarity between the two items (imagine each item as a vector, with a 1 in the ith element if user i has bought it, and 0 otherwise).
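As a rough illustration of that idea (my own sketch with toy data, not Amazon’s actual code), both measures can be computed directly from per-item buyer sets:

```python
import math

def jaccard(buyers_a, buyers_b):
    """|intersection| / |union| of the two items' buyer sets."""
    union = len(buyers_a | buyers_b)
    return len(buyers_a & buyers_b) / union if union else 0.0

def cosine(buyers_a, buyers_b):
    """Cosine similarity of the binary bought-it vectors, without building the vectors."""
    denom = math.sqrt(len(buyers_a) * len(buyers_b))
    return len(buyers_a & buyers_b) / denom if denom else 0.0

# item -> set of users who bought/viewed it (toy data)
bought = {
    "kind_of_blue": {"u1", "u2", "u3"},
    "a_love_supreme": {"u2", "u3", "u4"},
    "thriller": {"u5"},
}

def neighborhood(item, k=2, sim=jaccard):
    """The k most similar items to `item`: a toy version of the S(X) neighborhood above."""
    scores = [(other, sim(bought[item], bought[other])) for other in bought if other != item]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

print(neighborhood("kind_of_blue"))  # [('a_love_supreme', 0.5), ('thriller', 0.0)]
```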

 

 

 

 

 

On the article “What is a Good Recommendation Algorithm?”

Netflix is offering one million dollars for a better recommendation engine.  Better recommendations clearly are worth a lot.

But what are better recommendations?  What do we mean by better?

In the Netflix Prize, the meaning of better is quite specific.  It is the root mean squared error (RMSE) between the actual ratings Netflix customers gave the movies and the predictions of the algorithm.
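Concretely, the contest metric over a set of (predicted, actual) rating pairs is just the following (a minimal sketch, not Netflix’s actual scoring code):

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    assert len(predicted) == len(actual) and predicted
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted))

# e.g. always predicting 3.5 stars for movies actually rated 4, 2, and 5:
print(rmse([3.5, 3.5, 3.5], [4, 2, 5]))  # ~1.26
```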

Let’s say we build a recommender that wins the contest.  We reduce the error between our predictions and what people actually will rate by 10% over what Netflix used to be able to do.  Is that good?

Depending on what we want, it might be very good.  If what we want to do is show people how much they might like a movie, it would be good to be as accurate as possible on every possible movie.

However, this might not be what we want.  Even in a feature that shows people how much they might like any particular movie, people care a lot more about misses at the extremes.  For example, it could be much worse to say that you will be lukewarm (a prediction of 3 1/2 stars) on a movie you love (an actual of 4 1/2 stars) than to say you will be slightly less lukewarm (a prediction of 2 1/2 stars) on a movie you are lukewarm about (an actual of 3 1/2 stars).

Moreover, what we often want is not to make a prediction for any movie, but find the best movies.  In TopN recommendations, a recommender is trying to pick the best 10 or so items for someone. It does not matter if you cannot predict what people will hate or shades of lukewarm.  The only thing that matters is picking 10 items someone will love.

A recommender that does a good job predicting across all movies might not do the best job predicting the TopN movies.  RMSE penalizes errors on movies you do not care about seeing just as much as errors on great movies, but perhaps what we really care about is minimizing the error when predicting great movies.
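One way to phrase that alternative goal as a concrete metric is precision of the top N, sketched below (my own illustration of the idea, not a metric the post formally defines):

```python
def precision_at_n(ranked_items, loved, n=10):
    """Fraction of the top-n ranked items the user would actually love (e.g. rate 5 stars)."""
    top = ranked_items[:n]
    return sum(1 for item in top if item in loved) / len(top) if top else 0.0

# Toy example: only the ranking order and the set of loved items matter;
# prediction errors on movies outside the top n are never penalized.
print(precision_at_n(["m1", "m7", "m3", "m9", "m2"], loved={"m1", "m3", "m4"}, n=5))  # 0.4
```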

There are parallels here with web search.  Web search engines primarily care about precision (relevant results in the top 10 or top 3).  They only care about recall when someone would notice something they need missing from the results they are likely to see.  Search engines do not care about errors scoring arbitrary documents, just their ability to find the top N documents.

Aggravating matters further, in both recommender systems and web search, people’s perception of quality is easily influenced by factors other than the items shown.  People hate slow websites and perceive slowly appearing results to be worse than fast appearing results.  Differences in the information provided about each item (especially missing data or misspellings) can influence perceived quality.  Presentation issues, even color of the links, can change how people focus their attention and which recommendations they see.  People trust recommendations more when the engine can explain why it made them.  People like recommendations that update immediately when new information is available.  Diversity is valued; near duplicates disliked.  New items attract attention, but people tend to judge unfamiliar or unrecognized recommendations harshly.

In the end, what we want is happy, satisfied users.  Will a recommendation engine that minimizes RMSE make people happy?


Comments


Andrei Lopatenko

I believe that the main point of this post is correct: the best RMSE is not equal to the best user satisfaction, but I am not sure that TopN is the only relevant metric for a movie recommendation system. For example, TopN does not say anything about diversity (if I LOVE French comedies with Pierre Richard, it does not mean that I want to watch only them this week; I want more suggestions in different genres), novelty, etc.
I would expect a good movie recommendation system to be a good ‘exploration’ interactive system, which could tell me why I may like this movie and why it is similar to/different from the movies I like/dislike (http://www.clerkdogs.com/ is a good example).


Eric Schwarzkopf

The movie rating example reminds me of utility theory – I really have to brush up on that but there might be some fitting models of utility that could be used to derive an improved quality measure of recommendations in certain domains.
I think the domain or user-need specificity of the quality measure is key here. I’ve got different requirements on a news filtering system than a movie recommendation system.
The former should keep me informed while consuming a minimum of my time, and I don’t really need an explanation of why something was recommended to me – except for when it’s so far off that I have to figure out what corrective action to take.
The latter should assist me in figuring out in which movie to invest time and money, and I’m willing to invest some time up front to make a good decision. Here, diversity in the set of recommended movies and an explanation of the reasons for recommending a movie are welcome.


The account that made this comment no longer exists.

What makes a recommendation system great? In my mind the answer is simple. The best recommendation systems are the ones that engage the user and drive customer loyalty.

Things like RMSE over a test data set given a training set are at best crude proxies for this, and at worst completely miss the mark. Even metrics like click through rate, order size and conversion rate that just consider session-level behavior can be misleading. In my experience they tend to drive you towards recommendations that are not globally optimal in the long term.

The delicate balance is to be reactive to short-term trends in the market, but to do so with an eye towards driving long-term value via deep relationships with your customers.

I have this conversation with richrelevance’s customers all the time, and I’m pleased that they share my commitment to building long-lasting relationships with their customers.


Ian Soboroff

Beyond how you interpret RMSE (or whatever metric you decide on), you really do have to consider the user’s task and the cost of a bad recommendation.

For a Netflix user, the cost of a bad recommendation is not so great. The risk of that bad recommendation (how bad does the recommendation have to be such that you still rent the movie and it still ruins your evening?) is also not so great.

I have long thought this is a perennial barrier for recommender research — beyond how commercializable it might or might not be, there’s only so far you can get trying to recommend movies. Recommenders are in use in lots of other domains, not all in product or media recommendation, but no research is being done there. Well, not a lot.


Jeremy Pickens

While I agree that users generally want more 5-star movies and fewer 1-star movies, I disagree that this means recommendation is similar to TopN web search. Web search assumes very little interactivity, and once the user has found the one item/link he is looking for, he is done with the search activity.

With recommendations, on the other hand, people are more exploratory- and recall-oriented. I’ll bet people don’t just have 3 or 10 items in their Netflix queue. We would have to ask Netflix what that average queue length is, but anecdotal evidence (http://www.geeksugar.com/1865307) places that number in the dozens to hundreds range. That’s much more recall-oriented than top3 or top10 web search.

Another example is music recommendation, à la Pandora. You seed Pandora with a few songs or artists that you like, and it then sets up a personalized, recommendation-oriented radio station for you, and streams the music to you at a rate of approximately 20 songs per hour. A couple of hours, over a couple of days, puts the number of recommendations in the hundreds. After a few weeks or months of using Pandora, this number moves to the thousands.

So unlike web search, where people want to find the one answer and be done, Pandora’s music recommendation is a longer-term, recall-oriented process. And I’ll bet people are even more willing to put up with some bad, and even more lukewarm, songs in the mix — because they’re more interested in getting as many good, different, interesting songs (dozens? hundreds?) as possible. Picking the 10 items that someone will love is not the only thing that matters to them. Recall trumps precision.


eric chaves

I think that the 5 star recommendation system is fundamentally flawed as a preference rating system. The five star system was meant to be a democratic rating system, and should have been used to measure individual preference. Netflix should have posed the challenge to develop a better rating system, not a better algorithm. Read more here:

http://www.thinksketchdesign.com/2009/03/25/web/media/netflix-on-facebook-the-slow-revolution-of-recommendation-engines


Scott Wheeler

Another thing that seems to be often overlooked is how you get users to trust recommendations. When I first started playing with recommendation algorithms I was trying to produce novel results — things that the user didn’t know about and would be interesting to them, rather than using some of the more basic counting algorithms that are used e.g. for Amazon’s related products. What I realized pretty quickly is that even I didn’t trust the recommendations. They seemed disconnected, even if upon clicking on them I’d realize they were, in fact, interesting and related.

What I came to from that was that in a set of recommendations you usually want to scale them such that you slip in a couple of obvious results to establish trust — things the user almost certainly knows of, and probably won’t click on, but they establish, “Ok, yeah, these are my taste.” Then you apply a second ranking scheme and jump to things they don’t know about. Once you’ve established trust of the recommendations they’re much more likely to follow up on the more novel ones.

This differs somewhat from search, where the catch phrase is “authoritative sources” (stemming back to Kleinberg’s seminal paper on graph-based search) — you want to hit the right mix of novelty and identity, rather than just finding high degrees of correlation.
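A rough sketch of the two-stage ranking described above: slip in a couple of well-known items first to establish trust, then fill the remaining slots with novel ones (the field names and scores here are made up for illustration):

```python
def blend_recommendations(candidates, n=10, n_obvious=2):
    """candidates: dicts with hypothetical 'item', 'relevance', and 'popularity' fields."""
    by_popularity = sorted(candidates, key=lambda c: c["popularity"], reverse=True)
    obvious = by_popularity[:n_obvious]              # items the user almost certainly knows
    rest = [c for c in candidates if c not in obvious]
    novel = sorted(rest, key=lambda c: c["relevance"], reverse=True)[:n - n_obvious]
    return [c["item"] for c in obvious + novel]

candidates = [
    {"item": "Kind of Blue",  "relevance": 0.4, "popularity": 0.99},
    {"item": "Obscure Gem A", "relevance": 0.9, "popularity": 0.05},
    {"item": "Obscure Gem B", "relevance": 0.8, "popularity": 0.03},
]
print(blend_recommendations(candidates, n=3, n_obvious=1))
# ['Kind of Blue', 'Obscure Gem A', 'Obscure Gem B']
```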


Phoebe Spanier

Perhaps, for the best of both worlds, the way to go is to focus on improving both search and recommendations (precision and recall), offering people two options for discovering media.

http://www.jinni.com


Aleks Jakulin

I’ve posted on this topic at

http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/netflix_prize_s.html

RMSE doesn’t reward a system that’s aware of its own uncertainty, and distinguishing between mediocrity and controversy does require a model of uncertainty.

 

https://kunuk.wordpress.com/2012/03/04/how-does-the-amazon-recommendation-system-work-analyze-the-algorithm-and-make-a-prototype-that-visualizes-the-algorithm/
