Personalizing the Web:Building effective recommender systems : Personalizing the Web: Building effective recommender systems Bamshad Mobasher
Center for Web Intelligence
School of Computer Science, Telecommunication, and Information Systems
DePaul University, Chicago, Illinois, USA
Outline : Outline Web Personalization & Recommender systems
Basic Approaches & Algorithms
Special focus on collaborative filtering
Extending Traditional Approaches
Hybrid models
Personalization Based on Data Mining
Vulnerability of Collaborative Filtering to Attacks
Web Personalization : Web Personalization The Problem
Dynamically serve customized content (pages, products, recommendations, etc.) to users based on their profiles, preferences, or expected interests
Common Approaches
Collaborative Filtering
Give recommendations to a user based on preferences of “similar” users
Preferences on items may be explicit or implicit
Content-Based Filtering
Give recommendations to a user based on items with “similar” content in the user’s profile
Rule-Based (Knowledge-Based) Filtering
Provide recommendations to users based on predefined (or learned) rules
age(x, 25-35) and income(x, 70-100K) and childred(x, >=3) recommend(x, Minivan)
Slide4 : Content-Based Recommender Systems
Content-Based Recommenders: Personalized Search Agents : Content-Based Recommenders: Personalized Search Agents How can the search engine determine the “user’s context”? Query: “Madonna and Child” ? ? Need to “learn” the user profile:
User is an art historian?
User is a pop music fan?
Collaborative Recommender Systems : Collaborative Recommender Systems
Collaborative Recommender Systems : Collaborative Recommender Systems
Slide8 : Collaborative Recommender Systems
Collaborative Recommender Systems : Collaborative Recommender Systems http://movielens.umn.edu
Slide10 : Hybrid Recommender Systems
Other Combined to Hybrid Recommenders : Other Combined to Hybrid Recommenders
Other Forms of Collaborative Filtering : Other Forms of Collaborative Filtering Social Tagging (Folksonomy)
people add free-text tags to their content
where people happen to use the same terms then their content is linked
frequently used terms floating to the top to create a kind of positive feedback loop for popular tags.
Examples:
Del.icio.us
Flickr
QLoud & iTunes
Social / Collaborative Tags : Social / Collaborative Tags
Social / Collaborative Tags : Social / Collaborative Tags
Social / Collaborative Tags : Social / Collaborative Tags
The Recommendation Task : The Recommendation Task Basic formulation as a prediction problem
Typically, the profile Pu contains preference scores by u on some other items, {i1, …, ik} different from it
preference scores on i1, …, ik may have been obtained explicitly (e.g., movie ratings) or implicitly (e.g., time spent on a product page or a news article)
Content-Based Recommenders : Content-Based Recommenders Predictions for unseen (target) items are computed based on their similarity (in terms of content) to items in the user profile.
E.g., user profile Pu contains
recommend highly: and recommend “mildly”:
Content-Based Recommenders :: more examples : Content-Based Recommenders :: more examples Music recommendations
Play list generation Example: Pandora
Basic Collaborative Filtering Process : Basic Collaborative Filtering Process Neighborhood Formation Phase Recommendations Neighborhood
Formation Recommendation
Engine Current User Record Historical
User Records user item rating Nearest
Neighbors Combination
Function Recommendation Phase
Collaborative Recommender Systems : Collaborative Recommender Systems Collaborative filtering recommenders
Predictions for unseen (target) items are computed based the other users’ with similar interest scores on items in user u’s profile
i.e. users with similar tastes (aka “nearest neighbors”)
requires computing correlations between user u and other users according to interest scores or ratings
k-nearest-neighbor (knn) strategy
Can we predict Karen’s rating on the unseen item Independence Day?
Collaborative Filtering: Measuring Similarities : Collaborative Filtering: Measuring Similarities Pearson Correlation
weight by degree of correlation between user U and user J
1 means very similar, 0 means no correlation, -1 means dissimilar
Works well in case of user ratings (where there is at least a range of 1-5)
Not always possible (in some situations we may only have implicit binary values, e.g., whether a user did or did not select a document)
Alternatively, a variety of distance or similarity measures can be used
Average rating of user J
on all items.
Slide22 : Collaborative filtering recommenders
Predictions for unseen (target) items are computed based the other users’ with similar interest scores on items in user u’s profile
i.e. users with similar tastes (aka “nearest neighbors)
requires computing correlations between user u and other users according to interest scores or ratings
prediction Correlation to Karen Predictions for Karen on Indep. Day based on the K nearest neighbors Collaborative Recommender Systems
Collaborative Filtering: Making Predictions : Collaborative Filtering: Making Predictions When generating predictions from the nearest neighbors, neighbors can be weighted based on their distance to the target user
To generate predictions for a target user a on an item i:
ra = mean rating for user a
u1, …, uk are the k-nearest-neighbors to a
ru,i = rating of user u on item I
sim(a,u) = Pearson correlation between a and u
This is a weighted average of deviations from the neighbors’ mean ratings (and closer neighbors count more)
Example Collaborative System : Example Collaborative System Best match Prediction
Using k-nearest neighbor with k = 1
Collaborative Recommenders :: problems of scale : Collaborative Recommenders :: problems of scale
Item-based Collaborative Filtering : Item-based Collaborative Filtering Find similarities among the items based on ratings across users
Often measured based on a variation of Cosine measure
Prediction of item I for user a is based on the past ratings of user a on items similar to i.
Suppose:
Predicted rating for Karen on Indep. Day will be 7, because she rated Star Wars 7
That is if we only use the most similar item
Otherwise, we can use the k-most similar items and again use a weighted average sim(Star Wars, Indep. Day) > sim(Jur. Park, Indep. Day) > sim(Termin., Indep. Day)
Item-based collaborative filtering : Item-based collaborative filtering
Item-Based Collaborative Filtering : Item-Based Collaborative Filtering Bestmatch Prediction
Collaborative Filtering: Evaluation : Collaborative Filtering: Evaluation split users into train/test sets
for each user a in the test set:
split a’s votes into observed (I) and to-predict (P)
measure average absolute deviation between predicted and actual votes in P
MAE = mean absolute error
average over all test users
Semantically Enhanced Collaborative Filtering : Semantically Enhanced Collaborative Filtering Basic Idea:
Extend item-based collaborative filtering to incorporate both similarity based on ratings (or usage) as well as semantic similarity based on domain knowledge
Semantic knowledge about items
Can be extracted automatically from the Web based on domain-specific reference ontologies
Used in conjunction with user-item mappings to create a combined similarity measure for item comparisons
Singular value decomposition used to reduce noise in the semantic data
Semantic combination threshold
Used to determine the proportion of semantic and rating (or usage) similarities in the combined measure
Semantically Enhanced Hybrid Recommendation : Semantically Enhanced Hybrid Recommendation An extension of the item-based algorithm
Use a combined similarity measure to compute item similarities:
where,
SemSim is the similarity of items ip and iq based on semantic features (e.g., keywords, attributes, etc.); and
RateSim is the similarity of items ip and iq based on user ratings (as in the standard item-based CF)
is the semantic combination parameter:
= 1 only user ratings; no semantic similarity
= 0 only semantic features; no collaborative similarity
Semantically Enhanced CF : Semantically Enhanced CF Movie data set
Movie ratings from the movielens data set
Semantic info. extracted from IMDB based on the following ontology
Semantically Enhanced CF : Semantically Enhanced CF Used 10-fold x-validation on randomly selected test and training data sets
Each user in training set has at least 20 ratings (scale 1-5)
Semantically Enhanced CF : Semantically Enhanced CF Dealing with new items and sparse data sets
For new items, select all movies with only one rating as the test data
Degrees of sparsity simulated using different ratios for training data
Web Mining Approach to Personalization : Web Mining Approach to Personalization Basic Idea
generate aggregate user models (usage profiles) by discovering user access patterns through Web usage mining (offline process)
Clustering user transactions
Clustering items
Association rule mining
Sequential pattern discovery
match a user’s active session against the discovered models to provide dynamic content (online process)
Advantages
no explicit user ratings or interaction with users
helps preserve user privacy, by making effective use of anonymous data
enhance the effectiveness and scalability of collaborative filtering
Web Usage Mining : Web Usage Mining Web Usage Mining
discovery of meaningful patterns from data generated by user access to resources on one or more Web/application servers
Typical Sources of Data:
automatically generated Web/application server access logs
e-commerce and product-oriented user events (e.g., shopping cart changes, product clickthroughs, etc.)
user profiles and/or user ratings
meta-data, page content, site structure
User Transactions
sets or sequences of pageviews possibly with associated weights
a pageview is a set of page files and associated objects that contribute to a single display in a Web Browser
Personalization Based on Web Usage Mining : Personalization Based on Web Usage Mining Offline Process
Personalization Based on Web Usage Mining: : Personalization Based on Web Usage Mining: Online Process
Conceptual Representation of User Transactions or Sessions : Conceptual Representation of User Transactions or Sessions Session/user data Pageview/objects Raw weights are usually based on time spent on a page, but in practice, need to normalize and transform.
Web Usage Mining: clustering example : Web Usage Mining: clustering example Transaction Clusters:
Clustering similar user transactions and using centroid of each cluster as a usage profile (representative for a user segment) Sample cluster centroid from CTI Web site (cluster size =330)
Using Clusters for Personalization : Using Clusters for Personalization PROFILE 0 (Cluster Size = 3)
--------------------------------------
1.00 C.html
1.00 D.html
PROFILE 1 (Cluster Size = 4)
--------------------------------------
1.00 B.html
1.00 F.html
0.75 A.html
0.25 C.html
PROFILE 2 (Cluster Size = 3)
--------------------------------------
1.00 A.html
1.00 D.html
1.00 E.html
0.33 C.html Original Session/user data Result of
Clustering Given an active session A B, the best matching profile is Profile 1. This may result in a recommendation for page F.html, since it appears with high weight in that profile.
Clustering and Collaborative Filtering :: clustering based on ratings: movielens : Clustering and Collaborative Filtering :: clustering based on ratings: movielens
Clustering and Collaborative Filtering :: tag clustering example : Clustering and Collaborative Filtering :: tag clustering example
Profile Injection Attacks : Profile Injection Attacks Consist of a number of "attack profiles"
added to the system by providing ratings for various items
engineered to bias the system's recommendations
Two basic types:
“Push attack” (“Shilling”): designed to promote an item
“Nuke attack”: designed to demote a item
Prior work has shown that CF recommender systems are highly vulnerable to such attacks
Attack Models
strategies for assigning ratings to items based on knowledge of the system, products, or users
examples of attack models: “random”, “average”, “bandwagon”, “segment”, “love-hate”
A Successful Push Attack : A Successful Push Attack Prediction
Best Match “user-based” algorithm using k-nearest neighbor with k = 1
Amazon blushes over sex link gaffeBy Stefanie Olsen : Amazon blushes over sex link gaffe By Stefanie Olsen http://news.com.com/Amazon+blushes+over+sex+link+gaffe/2100-1023_3-976435.html
Story last modified Mon Dec 09 13:46:31 PST 2002
In a incident that highlights the pitfalls of online recommendation systems, Amazon.com on Friday removed a link to a sex manual that appeared next to a listing for a spiritual guide by well-known Christian televangelist Pat Robertson.
The two titles were temporarily linked as a result of technology that tracks and displays lists of merchandise perused and purchased by Amazon visitors. Such promotions appear below the main description for products under the title, "Customers who shopped for this item also shopped for these items.”
Amazon's automated results for Robertson's "Six Steps to Spiritual Revival” included a second title by Robertson as well as a book about anal sex for men….
Amazon conducted an investigation and determined … “hundreds of customers going to the same items while they were shopping on the site”….
Profile Injection Attacks : Profile Injection Attacks
A Generic Attack Profile : A Generic Attack Profile Attack models differ based on ratings assigned to filler and selected items Ratings for k selected items Rating for the target item IS IF IÆ Ratings for l filler items Unrated items in the attack profile
Average and Random Attack Models : Random ratings for l filler items Average and Random Attack Models Random Attack: filler items are assigned random ratings drawn from the overall distribution of ratings on all items across the whole DB
Average Attack: ratings each filler item drawn from distribution defined by average rating for that item in the DB
The percentage of filler items determines the amount knowledge (and effort) required by the attacker Rating for the target item IF IÆ Unrated items in the attack profile
Bandwagon Attack Model : Bandwagon Attack Model What if the system's rating distribution is unknown?
Identify products that are frequently rated (e.g., “blockbuster” movies)
Associate the pushed product with them
Ratings for the filler items centered on overall system average rating (Similar to Random attack)
frequently rated items can be guessed or obtained externally Ratings for k frequently rated items Rating for the target item IS IF IÆ Random ratings for l filler items Unrated items in the attack profile
Segment Attack Model : Segment Attack Model Assume attacker wants to push product to a target segment of users
those with preference for similar products
fans of Harrison Ford
fans of horror movies
like bandwagon but for semantically-similar items
originally designed for attacking item-based CF algorithms
maximize sim(target item, segment items)
minimize sim(target item, non-segment items) Ratings for k favorite items in user segment Rating for the target item IS IF IÆ Ratings for l filler items Unrated items in the attack profile
Nuke Attacks: Love/Hate Attack Model : Nuke Attacks: Love/Hate Attack Model
Min rating for the target item IF IÆ Max rating for l filler items Unrated items in the attack profile A limited-knowledge attack in its simplest form
Target item given the minimum rating value
All other ratings in the filler item set are given the maximum rating value
Note:
Variations of this (an the other models) can also be used as a push or nuke attacks, essentially by switching the roles of rmin and rmax.
How Effective Can Attacks Be? : How Effective Can Attacks Be? First A Methodological Note
Using MovieLens 100K data set
50 different "pushed" movies
selected randomly but mirroring overall distribution
50 users randomly pre-selected
Results were averages over all runs for each movie-user pair
K = 20 in all experiments
Evaluating results
prediction shift
how much the rating of the pushed movie differs before and after the attack
hit ratio
how often the pushed movie appears in a recommendation list before and after the attack
Example Results: Average Attack : Example Results: Average Attack Average attack is very effective against user based algorithm (Random not as effective)
Item-based CF more robust (but vulnerable to other attack types such as “segment attack” [Burke & Mobasher, 2005]
Example Results: Bandwagon Attack : Example Results: Bandwagon Attack Only a small profile needed (3%-7%)
Only a few (< 10) popular movies needed
As effective as the more data-intensive average attack (but still not effective against item-based algorithms)
Results: Impact of Profile Size : Results: Impact of Profile Size Only a small number of filler items need to be assigned ratings. An attacker, therefore, only needs to use part of the product space to make the attack effective. In the item-based algorithm we don’t see the same drop-off, but prediction shift shows a logarithmic behavior – near maximum at about 7% filler size.
Example Results: Segmented Attack Against Item-Based CF : Example Results: Segmented Attack Against Item-Based CF Very effective against targeted group
Best against item-based
Also effective against user-based
Low knowledge
Possible Solutions : Possible Solutions Explicit trust calculation?
select peers through network of trust relationships
law of large numbers
hard to achieve numbers needed for CF to work well
Hybrid recommendation
Some indications that some hybrids may be more robust
Model-based recommenders
Certain recommenders using clustering are more robust, but generally at the cost of less accuracy
But a probabilistic approach has been shown to be relatively accurate [See: Model-Based Collaborative Filtering as a Defense Against Profile Injection Attacks, B. Mobasher, R. Burke, JJ Sandvig. AAAI 2006, Boston.]
Detection and Response
Approaches to Detection & Response : Approaches to Detection & Response Profile Classification
Classification model to identify attack profiles and exclude these profiles in computing predictions
Uses the characteristic features of most successful attack models
Designed to increase cost of attacks by detecting most effective attacks
Anomaly Detection
Classify Items (as being possibly under attack)
Not dependent on known attack models
Can shed some light on which type of items are most vulnerable to which types of attacks But, what if the attack does not closely correspond to known attack signature In Practice: need a comprehensive framework combining both approaches
Conclusions : Conclusions Why recommender systems?
Many algorithmic advances more accurate and reliable systems more confidence by users
Assist users in
Finding more relevant information, items, products
Give users alternatives broaden user knowledge
Building communities
Help companies to
Better engage users and customers building loyalty
Increase sales (on average 5-10%)
Problems and challenges
More complex Web-based applications more complex user interactions need more sophisticated models
Need to further explore the impact of recommendations on (a) user behavior and (b) on the evolution of Web communities
Privacy, security, trust
Slide61 :
?
Results: Semantically Enhanced Hybrid : Results: Semantically Enhanced Hybrid Alpha 0.0 = 100% semantic item-based similarity
Alpha 1.0 = 100% collaborative item-based similarity Semantic features extracted for movies: top actors, director, genre, synopsis (top keywords), etc.
Anomaly Detection: Using Control Charts : Anomaly Detection: Using Control Charts Upper and lower boundaries on average ratings of items used as signal thresholds for push and nuke attacks, respectively. A new item’s average rating Observations: avg. ratings on training items in a particular category, assuming no biased ratings
Anomaly Detection: Using Time Series : Anomaly Detection: Using Time Series A sudden change in an item’s mean rating may indicate a suspicious pattern
Anomaly Detection Results : Anomaly Detection Results SPC can be effective in identifying items under attack
Time series effective in long-term monitoring of items
Detection performance highly affected by the rating density and popularity of items For more on the anomaly detection approach see:
Securing Collaborative Filtering Against Malicious Attacks Through Anomaly Detection. R. Bhaumik, C. Williams, B. Mobasher, R. Burke
In Proceedings of the 4th Workshop on Intelligent Techniques for Web Personalization (ITWP'06), held at AAAI 2006, Boston, July 2006.
Classification-Based Approach to Detection : Classification-Based Approach to Detection
Profile Classification
Automatically identify attack profiles and exclude them from predictions
Reverse-engineered profiles likely to be most damaging
Increase cost of attacks by detecting most effective attacks
Characteristics of known attack models are likely to appear in other effective attacks as well
Basic Approach
Create attributes that capture characteristics of suspicious profiles
Use attributes to build classification models
Apply model to user profiles to identify and discount potential attacks
Two Types of Detection Attributes
Generic – Focus on overall profile characteristics
Model-specific – based on characteristics of specific attack models
Partition profile to maximize similarity to known models
Generate attributes related to partition characteristics
Methodological Note for Detection Results : Methodological Note for Detection Results Data set
Using MovieLens 100K data set
Data split 50% training, 50% test
Profile classifier - Supervised training approach
kNN classifier, k=9
Training data
Half of actual data labeled as “Authentic”
Insert a mix of attack profiles built from several attack models labeled as “Attack”
Test data
Start with second half of actual data
Insert test attack profiles targeting different movies than targeted in training data
Recommendation Algorithm
User based kNN, k = 20
Evaluating results
50 different target movies
selected randomly but mirroring overall distribution
50 users randomly pre-selected
Results were averaged over all runs for each movie-user pair
Evaluation Metrics : Evaluation Metrics Detection attribute value:
Information Gain – attack profile vs. authentic profile
Classification performance:
True positive = # of attack profiles correctly identified
False positive = # of authentic profiles misclassified as attacks
False negatives = # of attack profiles misclassified as authentic
Precision = true positives / (true pos. + false pos.)
Percent of profiles identified as attacks that are attacks
Recall = true positives / (true pos. + false negatives)
Percent of attack profiles that were identified correctly
Recommender robustness:
Prediction shift – change in recommender’s prediction resulting from the attack
Classification Effectiveness: Average and Random Push Attacks : Classification Effectiveness: Average and Random Push Attacks Note: As a baseline we compared our classifier with the ad hoc approach for attack detection by Chirita et al., WIDM 2005, which does not use all of the proposed attributes and does not build a classification model.
Robustness:Impact of Detection on Prediction Shift Due to Attacks : Robustness: Impact of Detection on Prediction Shift Due to Attacks
Attacks in Collaborative Recommenders: Summary : Attacks in Collaborative Recommenders: Summary Collaborative spam (clam?)
Worse than we thought; common algorithms vulnerable; targeting quite easy to achieve
Attacks, if designed correctly, can require very limited system- or user-specific knowledge
Need to understanding properties of attack models
Can help in designing more robust algorithms
E.g., hybrid and model-based algorithms
Needed fro effective detection and response
Most effective attacks are those that mimic known attack models
A Push Attack Against Item-Based Algorithm : A Push Attack Against Item-Based Algorithm Prediction
BestMatch
Examples of Generic Attributes : Examples of Generic Attributes Weighted Deviation from Mean Agreement (WDMA)
Average difference in profile’s rating from mean rating on each item weighted by the item’s inverse rating frequency squared
Weighted Degree of Agreement (WDA)
Sum of profile’s rating agreement with mean rating on each item weighted by inverse rating frequency
Average correlation of the profile's k nearest neighbors
Captures rogue profiles that are part of large attacks with similar characteristics
Variance in the number of ratings in a profile compared to the average number of ratings per user
Few real users rate a large # of items
Model Specific Attributes : Model Specific Attributes Partition profile to maximize similarity to known models
Generate attributes related to partition characteristics that would stand out if the profile was that type of attack
Examples of Model Specific Attributes : Examples of Model Specific Attributes Average attack detection model
Partition profile to minimize variance in ratings in Pu,F from mean rating for each item
For average attack, the mean variance of the filler partition is likely less than an authentic user
Segment attack detection model
Partition profile into items with high ratings and low ratings
For segment attack, the difference between the average rating of these two groups is likely greater than that of an authentic user
Target focus detection model (TMF)
Use the identified Pu,T partitions to identify concentrations of items under attack across all profiles