logging in or signing up proposal1 Gavril Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 38 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: November 20, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Adaptive Information Retrieval: Adaptive Information Retrieval Eren Manavoglu1 Advisor Dr. C. Lee Giles1,2 1Department of Computer Science and Engineering The Pennsylvania State University, University Park, PA, 16802 2School of Information Sciences and Technology The Pennsylvania State University, University Park, PA, 16802Information Retrieval over the Web Today: Information Retrieval over the Web Today Main tools: Search engines User input: query terms/questions Result presentation: ranked list of matching documents Ranking algorithm: a combination of topical relevancy, citations and document popularity (probably)Why Do We Need Adaptive Information Retrieval?: Why Do We Need Adaptive Information Retrieval? Different users use the same query to access different information. Jaguar: animal or automobile? Even the same user may have different information needs at different points in time about the same topic. Dick Cheney: Scooter scandal last month, Hunting accident today.Adaptive Information Retrieval Techniques : Adaptive Information Retrieval Techniques Personalization* Personalization on server side: Relies on explicit user input mainly (MyYahoo, Personalized Google) and the parameters of the ranking algorithm is personalized Personalization on client side: Uses both implicit and explicit user feedback. The user model is usually in the form of interest profile(s) Recommender Systems Will be discussed in more detail later Clustering search results Will be discussed in more detail later * A survey can be found in Manavoglu, Giles, Sping, Wang, “The Power of One - Gaining Business Value from Personalization Technologies”, Chapter 9What this Proposal is about: What this Proposal is about Learn the user behavior models to capture individual differences Group the documents at query time to capture the current state of available information sources Use the user model to highlight or recommend groups of documents of interest to him/her Work Done So Far: Work Done So Far Recommender Systems User Behavior Modeling Clustering Search ResultsRecommender Systems: Recommender Systems Goal: To automate the “word-of-mouth” recommendations within a community. Approaches: Content-based filtering: Recommendations are done solely based on the content of the items (e.g. topic, author, etc.) Collaborative filtering: Similar users are identified based on the items they view and then the items viewed by like-minded users are recommended to a given user [Resnick 94] Hybrid: Both content of the items and the similarity of the users are used in making recommendations [Shani 2005]Recommender Systems -A Probablistic Approach: Recommender Systems - A Probablistic Approach Represent the data as a collection of ordered sequences of document requests Assume that we are given a dataset consisting of ordered sequences in some fixed alphabet. The elements of the alphabet are documents. For each document define its history H as a so-far observed subsequence of the current sequence Our goal is to predict the next document Dnext given the history H Learn a probabilistic model P(Dnext | H, Data)Recommender Systems - Probabilistic Models: Recommender Systems - Probabilistic Models A mixture model based approach could be taken for learning the probabilistic model P(Dnext | H, Data) Nc is the number of components, P(Dnext |H,Data,k) is the component model Component distribution, P(Dnext |H,Data,k), should make use of the sequence information and the learning should be scalable Mixture of multinomials, mixture of markov modelsProposed Recommendation Model - Maximum Entropy (maxent): Proposed Recommendation Model - Maximum Entropy (maxent) Maximum entropy provides a framework to combine information from different knowledge sources. Each knowledge source imposes a set of constraints. Intersection of all constraints contains a set of probability functions. Maximum entropy principle chooses the one with the highest entropy, i.e. most flat function. Advantage of maxent approach Combine 1st-order Markov model features and long term dependencies among the data. Representation for Long Term Dependence - Triggers: Representation for Long Term Dependence - Triggers Long term dependence of Dnext on H is represented with triggers. The pair of actions (a,b) is a trigger iff is substantially different from P(Dnext=b). Triggers are pairs with high mutual information scores.Maximum Entropy Model Definition: Maximum entropy objective function leads to the following model: Fs(D,H) are the feature indicator functions. S is the total number of bigrams+triggers for document D s are maxent model parameters for component k If s corresponds to bigram (a,b) then Fs(D,H)=1 iff D=a and Dprev=b, and Fs(D,H)=0 otherwise Zλ,k(H) is a normalization constant Maximum Entropy Model DefinitionMaximum Entropy Model -A Complication and its Solution: Maximum Entropy Model - A Complication and its Solution Maxent computation is expensive if the number of items is large. On a dataset of 500K items it could take months This problem can be solved if the items are clustered before applying the maxent model We employed a user navigation based clustering algorithm using this simple idea: documents that are requested consecutively are related Compute the document bigrams Apply greedy clustering using the bigram counts* * Details of the algorithm can be found in Pavlov, Manavoglu, Giles, Pennock, Giles 2004Experimental Evaluation: Experimental Evaluation Competing Models Maxent 1st Order Markov model Mixture of Markov models (30-components) Mixture of Multinomial (60-components, with the similarity based CiteSeer predictors as its feature set) Correlation Individual CiteSeer similarity-based predictors CiteSeer merged-similarity-based predictor We call ResearchIndex Merge the recommender that is obtained by pulling all similarity-based recommenders together—essentially, this corresponds to the currently available recommending system in CiteSeer.Current Recommendation Algorithms in CiteSeer: Current Recommendation Algorithms in CiteSeer Similarity recommenders Content-based: text or citation similarity Sentence Similarity, Text Similarity, Active Bibliography, Cited By, Co-citations, Cites* Collaborative: user similarity Users Who Viewed* Based on source similarity On Same Site* * For detailed information on CiteSeer recommenders please see Lawrence, Giles, Bollacker, 1999.Experimental Setup: Experimental Setup Conducted on 6 months worth of CiteSeer data in the form of < time, user, request > Chronologically partitioned the data into 5 million training and 1.7 million test requests Document requests were aggregated by userID and sessionized. An inactivity time of 300 seconds was used to break the requests into sessions Robots were identified and eliminated based on the number of requests submitted Documents that appeared less than 3 times were discarded (~250,000 documents remained)Evaluation Metrics: Evaluation Metrics Hit ratio: the ratio of correct predictions to the total number of predictions made Average height of prediction: Predictions are a list of ranked actions, where the ranking is done by ordering the actions by their probability values, thus average height of prediction is the average height of the hit within this list First N predictions in the list are considered and the performance is evaluated based on the success of these N predictions, for N=1,…,5,10 (called bins hereafter)Results: Results Among the individual CiteSeer recommenders Active Bibliography was the best predictor. However the merged predictor performed better than Active Bibliography hence it will represent the CiteSeer recommender in the remainder of the experiments.Results: Results Although merged CiteSeer predictor does better for longer recommendation lists, maxent is the best for small number of guesses. Correlation improves as the recommendation list gets longer Also presented is the average height results of the competing models. With respect to average height maxent outperformed all the other recommendersConclusions - Discussion: Conclusions - Discussion Maxent proved to be one of the best models especially when the number of recommendations to be made was limited But clustering the documents is a crucial point for its feasibility. Most of the experiments done in the field of recommender systems have been offline experiments. However this could be misleading, the effect of the recommendation on the user should also be taken into account. We plan to setup a medium for online experiments.User Behavior Models: User Behavior ModelsMotivation: Motivation In the previous section we focused on only the items users viewed. But users do much more than just viewing items. They browse, query, look at recommendations, etc. Can we improve user experience by looking at these other actions as well? Can we detect what the users are trying to do in a given session?User Behavior Modeling: User Behavior Modeling What data do we have about users? Web logs, in clickstream format How can we make use of this data? By applying web usage mining techniquesPrevious Work on Web Usage Mining: Previous Work on Web Usage Mining Frequent item set identification by association rule extraction methods1 Collaborative Filtering for recommender systems2,3 Probabilistic graphical models (e.g. Bayesian nets) to discover and represent dependencies among different variables4 Sequential pattern analysis to discover patterns in time-ordered sessions5 1Liu et al, Integrating Classification and Association Rule Mining. Fourth International Conference on Knowledge Discovery and Data Mining, 1998 2Resnick et al, GroupLens. An open architecture for Collaborative Filtering of Netnews. ACM 1994 Conference on Computer Supported Cooperative Work. 3Pavlov, Manavoglu, Pennock, Giles, Collborative Filtering with Maximum Entropy. IEEE Intelligent Systens 2004 4D. Heckerman, Bayesian Networks for Data Mining. Data Mining and Knowledge Discovery 1997 5Cadez et al, Predictive Profiles for Transaction Data Using Finite Mixture Models. TR, UC Irvine 2001Problem Definition: Problem Definition We propose the following sequential framework for modeling user behavior. Each user session is represented as a sequence, labeled with a user ID. Each individual item in the sequence represents a user action. For each action, history H(U) is defined as the so-far observed ordered sequence of actions. P(Anext|H(U),Data) is the behavior model for user U that predicts the next action Anext, given the history H(U). User U, time t Action At: Query H(U): empty User U, time t+1 H(U): At Action At+1: Browse User U, time t+2 H(U): AtAt+1 Action At+2: Help Problem Definition: Problem Definition Problem: To infer P(Anext |H(U),Data) for each individual given the training data. Method Proposed: Mixture Models Why Mixture Models?: Why Mixture Models? Mixture models can handle heterogeneous data and provide a clustering framework. Mixture models are weighted combinations of component models. Each component represents a dominant pattern in the data. A global mixture model captures the general patterns among users. A mixture model can be personalized to capture individual behavior patterns. Buyer Model Browser Model Customer A (Frequently visits, but rarely buys) = 0.2*Buyer Model +0.8*Browser Model McLachlan & Basford 1988, Cadez 2001Mixture Models: Mixture Models Global Mixture Models Parameters are same for all individuals. An Nc- component mixture model: Personalized Mixture Models: Personalized Mixture Models Learn individual component probabilities, αU,k’s for each user. Resulting model is therefore specific to each user. For component distribution P(Anext|H(U),Data,k) Markov and maximum entropy distributions are used.Parameter Estimation: Parameter Estimation Learn the parameters of the global model by expectation-maximization (EM) algorithm. Fix the component distributions. Learn the individual component weights, αU,k’s, for known users by EM. The αU,k’s for users who do not appear in the training data will be the global αk’s values.Visualization: Visualization Each component of a mixture model can be viewed as a cluster. Each session is a weighted combination of these clusters. Probability of user session Su belonging to cluster k is Each session is assigned to the cluster that maximizes P(k|SU,Data). Sample ClusterVisualization - Graphs: Visualization - Graphs If number of actions is large, viewing and interpreting sample sessions becomes impossible. Build a graph for each cluster. Triggers and bigrams are computed and sorted for each cluster Pairs of actions > threshold value are used to build the graph The sequence of actions defines the direction of the edgesExperimental Setup: Experimental Setup CiteSeer1 log files of approximately 2 months. Log files are series of transactions in the form of <time, action, user ID, related info>. <1014151912.455 event authorhomepage 3491213 197248 - - ip: 42779 agent: Mozilla/2.0 (compatible; MSIE 3.02; Update a; Windows NT)> Returning users are identified by cookies. Actions are broken into sessions based on time heuristics (300 seconds of inactivity indicates the end of a session). User1 Seesion1 37 31 34 31 32 User1 Session2 37 31 34 31 34 33 34 User2 Session1 31 34 33 20 User2 Session2 33 20 Robots are identified based on the number of requests per session and are removed. http://citeseer.ist.psu.eduSlide34: CiteSeer user actions and descriptions used during experimentsExperiments: Experiments User behavior models are evaluated based on the accuracy of their predictions. User behavior clusters are visualized to demonstrate the descriptive ability of the models. For maxent models the history was empirically set to the last 5 actions.Hit Ratio Results – Returning Users: Hit Ratio Results – Returning Users Hit ratio results on known users for 3 component mixture model Hit ratio results on known users for 10 component mixture model Personalized models outperformed global models. Personalization significantly improved for maxent model. Personalized Markov mixture is better for returning users.Hit Ratio Results – All Users: Hit Ratio Results – All Users Hit ratio results on all users for 3 component mixture model Hit ratio results on all users for 10 component mixture model Maxent performs better than Markov for all users. Maxent chooses the most flat function and has more parameters. Mixture of Markov models can better fit the training data (for returning users).User Behavior Clusters - Maxent Model Graphs: User Behavior Clusters - Maxent Model Graphs Cluster 4: Users Querying CiteSeer through another search engine Cluster 6: Users who use CiteSeer query interface Cluster 9: Users who look at recommendations and citationsConclusions: Conclusions Both mixture of maximum entropy and Markov models are able to generate strong predictive models. Markov models are better for predicting future actions of returning users. Maximum Entropy models perform better for predicting the global model and first time users. Predicting future actions could be useful in recommending shortcuts and personalizing the user interface. CiteSeer users significantly differ in the way they use the engine. Mixture models can be used to discover these different patterns. Identifying the cluster a session belongs to can be used to recommend certain functionalities of CiteSeer (e.g. bibtex, recommendations, etc). Manavoglu, Pavlov, Giles ICDM 2003Navigation Graph: Navigation Graph Query Behavior Analysis(Work in Progress): Query Behavior Analysis (Work in Progress)Query Behavior Patterns - Motivation: Query Behavior Patterns - Motivation Goal: To investigate how the users interact with the query interface To identify dominant patterns of querying behavior, which could then be used in recommendations If the user is submitting the same query over and over should we show the same recommendations? If the user is modifying the original query string persistently should we change the recommendation criteria to give more priority to diversity of the recommendations (because modifying the query could be an indication of unsatisfactory results/recommendations)?Query Types: Query Types Query Types New Query String: none the terms in the query (excluding stop words and logical operators) were seen in the previous query Modified Query String: a partially new query, some terms are eliminated and/or added Same Query String: same as the most previous query string Document Query: Document database is queried Citation Query: Citation database is queriedQuery Behavior Patterns - Methodology: Query Behavior Patterns - Methodology Sequential Modeling Sessionize log files Consider the queries in a session as an ordered sequence Identify the query types within each session Investigate the patterns for the ordered sequence of query types Algorithm Mixture of 1st order Markov Experimental Setup Same as the previous sectionQuery Types - Example: Query Types - Example Input: <sessionID queryEvent userID queryString> 1 documentquery uid1 “online routing” 1 documentquery uid1 “online routing lipmann” 1 documentquery uid1 “lipmann” 1 documentquery uid1 “Maarten Lipmann” 1 documentquery uid1 “oltsp” 1 citationquery uid1 “oltsp” Output: uid1 docNew docModified docModified docModified docNew citeOldQuery Behavior - Results: Query Behavior - Results Query Behavior Analysis - Observations: Query Behavior Analysis - Observations Users tend to submit the same query more than once in the same session Only a small portion of CiteSeer users are submitting citation queries (15%) CiteSeer users are trying to modify their queries instead of giving up Can query modification be viewed as an indication to change document recommendation or ranking strategy?Online Clustering of Search Results: Online Clustering of Search ResultsOnline Clustering of Search Results -Motivation: Online Clustering of Search Results - Motivation So far we looked at user’s behavior to improve user experience and retrieval quality. Is it also possible to organize the data adaptively to improve the quality of the results? Grouping topically related items together in the result page may improve coverage and diversity of the displayed resultsWhy Online?: Why Online? An offline clustering/classification of search results would not be query dependent. Two documents could be related for the query “machine learning” but they may not be as relevant for a rather more specific one “machine learning for recommender systems”. The time of query changes the set of available documents. A news search on “hurricane” query today and “hurricane” query a week ago would result in very different set of documents Online Clustering of Search Results - Test Case: Online Clustering of Search Results - Test Case To evaluate the performance of clustering on a frequently updated index (with deletions as well as addition of documents) we chose to work on News articles. Due to architecture of the underlying news search engine (Yahoo! News) the implemented solution had to satisfy the following conditions: Access to 100 documents per query Work in a stateless/memoryless environment Fetch original results, cluster them and return the clustered results in less than 300 milliseconds This work was done during an internship at Yahoo! The paper is in review for confidentiality requirements. Please ask about the details.Online Clustering of Search Results - Algorithm: Online Clustering of Search Results - Algorithm Clustering Algorithm Hierarchical Agglomerative Clustering, Mixture of Multinomials, Cluto, KNN, Kmeans were implemented and compared. Hierarchical Aglomerative Clustering (HAC) algorithm outperformed others in news domain. HAC algorithm works on the pairwise similarities of news articles. Pairwise similarity matrix is computed at the time of query The similarity of a cluster to another one is computed as a factor of the pairwise document similarities. Similarity metric is an approximation of cosine similarity measure. Stopping criteria is a threshold on the similarity of the clusters. The optimum threshold value is discovered with an exhaustive search on the hold-out dataset. For a survey on Clustering algorithms see Berkhin, 2002Online Clustering of Search Results - Clustering Algorithm Evaluation: Online Clustering of Search Results - Clustering Algorithm Evaluation HAC clustering algorithm was evaluated against the aforementioned algorithms and Google News Clusters by an editorial review. 300 queries were chosen in random from that day’s user logs. Reviewers were shown 4 documents per cluster. Results of 2 different algorithms were displayed side by side and reviewers were asked to chose between the 2. HAC was shown to outperform the rest of the algorithms (the difference was statistically significant) Online Clustering of Search Results - Naming the Clusters: Online Clustering of Search Results - Naming the Clusters Users may not realize how and why the documents were grouped together. Having names for clusters serving as short descriptions would help users process and navigate through the result set easier. Common naming approaches: Choose the most representative article for each cluster and use its title as the cluster label (Google news search http://news.google.com) Find the most representative phrases within the documents belonging to each cluster (Clusty http://news.clusty.com/) For Keyphrase Extraction see Zha, SIGIR 2002Online Clustering of Search Results - Naming the Clusters: Online Clustering of Search Results - Naming the Clusters Observations: News articles are usually formulated to answer more than one of the following questions: where, when, who, what and how. Articles belonging to the same cluster may be sharing answers to only some of these questions. Goal: We want to partition the titles into substrings corresponding to one (or more) of these questions. And then choose among these substrings the most common one.Online Clustering of Search Results - Naming the Clusters: Online Clustering of Search Results - Naming the Clusters Methodology: Use POS tags to find the boundaries between compact substrings. Break the sentence into smaller parts at these points and generate a candidate set composed of these shorter substrings and their continuous concatenations. Example: Title: “Wilma forces Caribbean tourists to flee” Substrings: “Wilma”, “Caribbean tourists”, “flee” Candidates: “Wilma”, “Wilma forces Caribbean tourists”, “Caribbean tourists”, “Wilma forces Caribbean tourists to flee”, “flee”, “Caribbean tourists to flee” Score the candidates based on coverage, frequency or length or a combination of these metrics. Choose the candidate with the highest score to be the cluster label.Online Clustering of Search Results - Naming the Clusters: Online Clustering of Search Results - Naming the Clusters Scoring Algorithms: Coverage Score: A metric to measure how well the candidate covers the information in all titles. Frequency Score: Is the frequency score for the full string. Length normalized frequency Score: Candidates with less number of words will have more frequencies. To avoid this bias towards shorter candidates we normalize the frequency scores based on the number of words they includeOnline Clustering of Search Results - Naming Evaluation: Online Clustering of Search Results - Naming Evaluation 160 queries were chosen at random from the daily query logs. The results of these queries were clustered and saved. Candidate names for each cluster were generated offline. Candidates included titles as well. 8 users were asked to either select from the list of candidates the best label or type in their own choice if they could not find an appropriate candidate. At least 3 judgments were collected per query. 55% of the names chosen by reviewers were titles. Experimental results showed that length normalized frequency score outperforms the others (60% match with user labels). In the case of ties, coverage scores were found to be the best tie-breaker.Work In Progress: Work In Progress Further query analysis: What are the users searching for: Titles, abstracts, references, publication dates, venues? Are the users viewing/downloading on linked documents, or not? Investigating community based adaptation strategies* Designing a framework for online evaluation of recommender systems Such a platform for CiteSeer was designed by Cosley et al (called REFEREE) in its early days. Our goal is to make it scale to and be available in the next generation CiteSeer. *Recently published paper on discovering E-communities: Zhou, Manavoglu, Li, Giles, Zha in WWW 2006My Proposal: My Proposal MyCiteSeer A unified adaptive information retrieval engine where document, action and query behaviors are all factored inHow?: How? Action Model selects the modules and shortcuts to be displayed user U is interested in only users who viewed document recommendations => only this document recommendation module will be shown on document page Query Model and Action Model selects the query interface user U searches for titles only and looks at more than 20 results => the default query field is title and results are clustered Document model selects the content with input from Action and Query models. Action and Query models act as implicit feedback Evaluation?: Evaluation? A user study would give the most reliable and comprehensive assessment of the system. Individual components as well as the overall system can be evaluated. Some form of implicit feedback can also be used for evaluation. Clickthrough rate on results page, number of actions taken to access a given document/page, time spent on completing certain tasks. If used together with some explicit user feedback (through random questionnaires, ratings, or feedback via email, blogs or forums) the appropriate metrics could be identified and used. You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
proposal1 Gavril Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 38 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: November 20, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Adaptive Information Retrieval: Adaptive Information Retrieval Eren Manavoglu1 Advisor Dr. C. Lee Giles1,2 1Department of Computer Science and Engineering The Pennsylvania State University, University Park, PA, 16802 2School of Information Sciences and Technology The Pennsylvania State University, University Park, PA, 16802Information Retrieval over the Web Today: Information Retrieval over the Web Today Main tools: Search engines User input: query terms/questions Result presentation: ranked list of matching documents Ranking algorithm: a combination of topical relevancy, citations and document popularity (probably)Why Do We Need Adaptive Information Retrieval?: Why Do We Need Adaptive Information Retrieval? Different users use the same query to access different information. Jaguar: animal or automobile? Even the same user may have different information needs at different points in time about the same topic. Dick Cheney: Scooter scandal last month, Hunting accident today.Adaptive Information Retrieval Techniques : Adaptive Information Retrieval Techniques Personalization* Personalization on server side: Relies on explicit user input mainly (MyYahoo, Personalized Google) and the parameters of the ranking algorithm is personalized Personalization on client side: Uses both implicit and explicit user feedback. The user model is usually in the form of interest profile(s) Recommender Systems Will be discussed in more detail later Clustering search results Will be discussed in more detail later * A survey can be found in Manavoglu, Giles, Sping, Wang, “The Power of One - Gaining Business Value from Personalization Technologies”, Chapter 9What this Proposal is about: What this Proposal is about Learn the user behavior models to capture individual differences Group the documents at query time to capture the current state of available information sources Use the user model to highlight or recommend groups of documents of interest to him/her Work Done So Far: Work Done So Far Recommender Systems User Behavior Modeling Clustering Search ResultsRecommender Systems: Recommender Systems Goal: To automate the “word-of-mouth” recommendations within a community. Approaches: Content-based filtering: Recommendations are done solely based on the content of the items (e.g. topic, author, etc.) Collaborative filtering: Similar users are identified based on the items they view and then the items viewed by like-minded users are recommended to a given user [Resnick 94] Hybrid: Both content of the items and the similarity of the users are used in making recommendations [Shani 2005]Recommender Systems -A Probablistic Approach: Recommender Systems - A Probablistic Approach Represent the data as a collection of ordered sequences of document requests Assume that we are given a dataset consisting of ordered sequences in some fixed alphabet. The elements of the alphabet are documents. For each document define its history H as a so-far observed subsequence of the current sequence Our goal is to predict the next document Dnext given the history H Learn a probabilistic model P(Dnext | H, Data)Recommender Systems - Probabilistic Models: Recommender Systems - Probabilistic Models A mixture model based approach could be taken for learning the probabilistic model P(Dnext | H, Data) Nc is the number of components, P(Dnext |H,Data,k) is the component model Component distribution, P(Dnext |H,Data,k), should make use of the sequence information and the learning should be scalable Mixture of multinomials, mixture of markov modelsProposed Recommendation Model - Maximum Entropy (maxent): Proposed Recommendation Model - Maximum Entropy (maxent) Maximum entropy provides a framework to combine information from different knowledge sources. Each knowledge source imposes a set of constraints. Intersection of all constraints contains a set of probability functions. Maximum entropy principle chooses the one with the highest entropy, i.e. most flat function. Advantage of maxent approach Combine 1st-order Markov model features and long term dependencies among the data. Representation for Long Term Dependence - Triggers: Representation for Long Term Dependence - Triggers Long term dependence of Dnext on H is represented with triggers. The pair of actions (a,b) is a trigger iff is substantially different from P(Dnext=b). Triggers are pairs with high mutual information scores.Maximum Entropy Model Definition: Maximum entropy objective function leads to the following model: Fs(D,H) are the feature indicator functions. S is the total number of bigrams+triggers for document D s are maxent model parameters for component k If s corresponds to bigram (a,b) then Fs(D,H)=1 iff D=a and Dprev=b, and Fs(D,H)=0 otherwise Zλ,k(H) is a normalization constant Maximum Entropy Model DefinitionMaximum Entropy Model -A Complication and its Solution: Maximum Entropy Model - A Complication and its Solution Maxent computation is expensive if the number of items is large. On a dataset of 500K items it could take months This problem can be solved if the items are clustered before applying the maxent model We employed a user navigation based clustering algorithm using this simple idea: documents that are requested consecutively are related Compute the document bigrams Apply greedy clustering using the bigram counts* * Details of the algorithm can be found in Pavlov, Manavoglu, Giles, Pennock, Giles 2004Experimental Evaluation: Experimental Evaluation Competing Models Maxent 1st Order Markov model Mixture of Markov models (30-components) Mixture of Multinomial (60-components, with the similarity based CiteSeer predictors as its feature set) Correlation Individual CiteSeer similarity-based predictors CiteSeer merged-similarity-based predictor We call ResearchIndex Merge the recommender that is obtained by pulling all similarity-based recommenders together—essentially, this corresponds to the currently available recommending system in CiteSeer.Current Recommendation Algorithms in CiteSeer: Current Recommendation Algorithms in CiteSeer Similarity recommenders Content-based: text or citation similarity Sentence Similarity, Text Similarity, Active Bibliography, Cited By, Co-citations, Cites* Collaborative: user similarity Users Who Viewed* Based on source similarity On Same Site* * For detailed information on CiteSeer recommenders please see Lawrence, Giles, Bollacker, 1999.Experimental Setup: Experimental Setup Conducted on 6 months worth of CiteSeer data in the form of < time, user, request > Chronologically partitioned the data into 5 million training and 1.7 million test requests Document requests were aggregated by userID and sessionized. An inactivity time of 300 seconds was used to break the requests into sessions Robots were identified and eliminated based on the number of requests submitted Documents that appeared less than 3 times were discarded (~250,000 documents remained)Evaluation Metrics: Evaluation Metrics Hit ratio: the ratio of correct predictions to the total number of predictions made Average height of prediction: Predictions are a list of ranked actions, where the ranking is done by ordering the actions by their probability values, thus average height of prediction is the average height of the hit within this list First N predictions in the list are considered and the performance is evaluated based on the success of these N predictions, for N=1,…,5,10 (called bins hereafter)Results: Results Among the individual CiteSeer recommenders Active Bibliography was the best predictor. However the merged predictor performed better than Active Bibliography hence it will represent the CiteSeer recommender in the remainder of the experiments.Results: Results Although merged CiteSeer predictor does better for longer recommendation lists, maxent is the best for small number of guesses. Correlation improves as the recommendation list gets longer Also presented is the average height results of the competing models. With respect to average height maxent outperformed all the other recommendersConclusions - Discussion: Conclusions - Discussion Maxent proved to be one of the best models especially when the number of recommendations to be made was limited But clustering the documents is a crucial point for its feasibility. Most of the experiments done in the field of recommender systems have been offline experiments. However this could be misleading, the effect of the recommendation on the user should also be taken into account. We plan to setup a medium for online experiments.User Behavior Models: User Behavior ModelsMotivation: Motivation In the previous section we focused on only the items users viewed. But users do much more than just viewing items. They browse, query, look at recommendations, etc. Can we improve user experience by looking at these other actions as well? Can we detect what the users are trying to do in a given session?User Behavior Modeling: User Behavior Modeling What data do we have about users? Web logs, in clickstream format How can we make use of this data? By applying web usage mining techniquesPrevious Work on Web Usage Mining: Previous Work on Web Usage Mining Frequent item set identification by association rule extraction methods1 Collaborative Filtering for recommender systems2,3 Probabilistic graphical models (e.g. Bayesian nets) to discover and represent dependencies among different variables4 Sequential pattern analysis to discover patterns in time-ordered sessions5 1Liu et al, Integrating Classification and Association Rule Mining. Fourth International Conference on Knowledge Discovery and Data Mining, 1998 2Resnick et al, GroupLens. An open architecture for Collaborative Filtering of Netnews. ACM 1994 Conference on Computer Supported Cooperative Work. 3Pavlov, Manavoglu, Pennock, Giles, Collborative Filtering with Maximum Entropy. IEEE Intelligent Systens 2004 4D. Heckerman, Bayesian Networks for Data Mining. Data Mining and Knowledge Discovery 1997 5Cadez et al, Predictive Profiles for Transaction Data Using Finite Mixture Models. TR, UC Irvine 2001Problem Definition: Problem Definition We propose the following sequential framework for modeling user behavior. Each user session is represented as a sequence, labeled with a user ID. Each individual item in the sequence represents a user action. For each action, history H(U) is defined as the so-far observed ordered sequence of actions. P(Anext|H(U),Data) is the behavior model for user U that predicts the next action Anext, given the history H(U). User U, time t Action At: Query H(U): empty User U, time t+1 H(U): At Action At+1: Browse User U, time t+2 H(U): AtAt+1 Action At+2: Help Problem Definition: Problem Definition Problem: To infer P(Anext |H(U),Data) for each individual given the training data. Method Proposed: Mixture Models Why Mixture Models?: Why Mixture Models? Mixture models can handle heterogeneous data and provide a clustering framework. Mixture models are weighted combinations of component models. Each component represents a dominant pattern in the data. A global mixture model captures the general patterns among users. A mixture model can be personalized to capture individual behavior patterns. Buyer Model Browser Model Customer A (Frequently visits, but rarely buys) = 0.2*Buyer Model +0.8*Browser Model McLachlan & Basford 1988, Cadez 2001Mixture Models: Mixture Models Global Mixture Models Parameters are same for all individuals. An Nc- component mixture model: Personalized Mixture Models: Personalized Mixture Models Learn individual component probabilities, αU,k’s for each user. Resulting model is therefore specific to each user. For component distribution P(Anext|H(U),Data,k) Markov and maximum entropy distributions are used.Parameter Estimation: Parameter Estimation Learn the parameters of the global model by expectation-maximization (EM) algorithm. Fix the component distributions. Learn the individual component weights, αU,k’s, for known users by EM. The αU,k’s for users who do not appear in the training data will be the global αk’s values.Visualization: Visualization Each component of a mixture model can be viewed as a cluster. Each session is a weighted combination of these clusters. Probability of user session Su belonging to cluster k is Each session is assigned to the cluster that maximizes P(k|SU,Data). Sample ClusterVisualization - Graphs: Visualization - Graphs If number of actions is large, viewing and interpreting sample sessions becomes impossible. Build a graph for each cluster. Triggers and bigrams are computed and sorted for each cluster Pairs of actions > threshold value are used to build the graph The sequence of actions defines the direction of the edgesExperimental Setup: Experimental Setup CiteSeer1 log files of approximately 2 months. Log files are series of transactions in the form of <time, action, user ID, related info>. <1014151912.455 event authorhomepage 3491213 197248 - - ip: 42779 agent: Mozilla/2.0 (compatible; MSIE 3.02; Update a; Windows NT)> Returning users are identified by cookies. Actions are broken into sessions based on time heuristics (300 seconds of inactivity indicates the end of a session). User1 Seesion1 37 31 34 31 32 User1 Session2 37 31 34 31 34 33 34 User2 Session1 31 34 33 20 User2 Session2 33 20 Robots are identified based on the number of requests per session and are removed. http://citeseer.ist.psu.eduSlide34: CiteSeer user actions and descriptions used during experimentsExperiments: Experiments User behavior models are evaluated based on the accuracy of their predictions. User behavior clusters are visualized to demonstrate the descriptive ability of the models. For maxent models the history was empirically set to the last 5 actions.Hit Ratio Results – Returning Users: Hit Ratio Results – Returning Users Hit ratio results on known users for 3 component mixture model Hit ratio results on known users for 10 component mixture model Personalized models outperformed global models. Personalization significantly improved for maxent model. Personalized Markov mixture is better for returning users.Hit Ratio Results – All Users: Hit Ratio Results – All Users Hit ratio results on all users for 3 component mixture model Hit ratio results on all users for 10 component mixture model Maxent performs better than Markov for all users. Maxent chooses the most flat function and has more parameters. Mixture of Markov models can better fit the training data (for returning users).User Behavior Clusters - Maxent Model Graphs: User Behavior Clusters - Maxent Model Graphs Cluster 4: Users Querying CiteSeer through another search engine Cluster 6: Users who use CiteSeer query interface Cluster 9: Users who look at recommendations and citationsConclusions: Conclusions Both mixture of maximum entropy and Markov models are able to generate strong predictive models. Markov models are better for predicting future actions of returning users. Maximum Entropy models perform better for predicting the global model and first time users. Predicting future actions could be useful in recommending shortcuts and personalizing the user interface. CiteSeer users significantly differ in the way they use the engine. Mixture models can be used to discover these different patterns. Identifying the cluster a session belongs to can be used to recommend certain functionalities of CiteSeer (e.g. bibtex, recommendations, etc). Manavoglu, Pavlov, Giles ICDM 2003Navigation Graph: Navigation Graph Query Behavior Analysis(Work in Progress): Query Behavior Analysis (Work in Progress)Query Behavior Patterns - Motivation: Query Behavior Patterns - Motivation Goal: To investigate how the users interact with the query interface To identify dominant patterns of querying behavior, which could then be used in recommendations If the user is submitting the same query over and over should we show the same recommendations? If the user is modifying the original query string persistently should we change the recommendation criteria to give more priority to diversity of the recommendations (because modifying the query could be an indication of unsatisfactory results/recommendations)?Query Types: Query Types Query Types New Query String: none the terms in the query (excluding stop words and logical operators) were seen in the previous query Modified Query String: a partially new query, some terms are eliminated and/or added Same Query String: same as the most previous query string Document Query: Document database is queried Citation Query: Citation database is queriedQuery Behavior Patterns - Methodology: Query Behavior Patterns - Methodology Sequential Modeling Sessionize log files Consider the queries in a session as an ordered sequence Identify the query types within each session Investigate the patterns for the ordered sequence of query types Algorithm Mixture of 1st order Markov Experimental Setup Same as the previous sectionQuery Types - Example: Query Types - Example Input: <sessionID queryEvent userID queryString> 1 documentquery uid1 “online routing” 1 documentquery uid1 “online routing lipmann” 1 documentquery uid1 “lipmann” 1 documentquery uid1 “Maarten Lipmann” 1 documentquery uid1 “oltsp” 1 citationquery uid1 “oltsp” Output: uid1 docNew docModified docModified docModified docNew citeOldQuery Behavior - Results: Query Behavior - Results Query Behavior Analysis - Observations: Query Behavior Analysis - Observations Users tend to submit the same query more than once in the same session Only a small portion of CiteSeer users are submitting citation queries (15%) CiteSeer users are trying to modify their queries instead of giving up Can query modification be viewed as an indication to change document recommendation or ranking strategy?Online Clustering of Search Results: Online Clustering of Search ResultsOnline Clustering of Search Results -Motivation: Online Clustering of Search Results - Motivation So far we looked at user’s behavior to improve user experience and retrieval quality. Is it also possible to organize the data adaptively to improve the quality of the results? Grouping topically related items together in the result page may improve coverage and diversity of the displayed resultsWhy Online?: Why Online? An offline clustering/classification of search results would not be query dependent. Two documents could be related for the query “machine learning” but they may not be as relevant for a rather more specific one “machine learning for recommender systems”. The time of query changes the set of available documents. A news search on “hurricane” query today and “hurricane” query a week ago would result in very different set of documents Online Clustering of Search Results - Test Case: Online Clustering of Search Results - Test Case To evaluate the performance of clustering on a frequently updated index (with deletions as well as addition of documents) we chose to work on News articles. Due to architecture of the underlying news search engine (Yahoo! News) the implemented solution had to satisfy the following conditions: Access to 100 documents per query Work in a stateless/memoryless environment Fetch original results, cluster them and return the clustered results in less than 300 milliseconds This work was done during an internship at Yahoo! The paper is in review for confidentiality requirements. Please ask about the details.Online Clustering of Search Results - Algorithm: Online Clustering of Search Results - Algorithm Clustering Algorithm Hierarchical Agglomerative Clustering, Mixture of Multinomials, Cluto, KNN, Kmeans were implemented and compared. Hierarchical Aglomerative Clustering (HAC) algorithm outperformed others in news domain. HAC algorithm works on the pairwise similarities of news articles. Pairwise similarity matrix is computed at the time of query The similarity of a cluster to another one is computed as a factor of the pairwise document similarities. Similarity metric is an approximation of cosine similarity measure. Stopping criteria is a threshold on the similarity of the clusters. The optimum threshold value is discovered with an exhaustive search on the hold-out dataset. For a survey on Clustering algorithms see Berkhin, 2002Online Clustering of Search Results - Clustering Algorithm Evaluation: Online Clustering of Search Results - Clustering Algorithm Evaluation HAC clustering algorithm was evaluated against the aforementioned algorithms and Google News Clusters by an editorial review. 300 queries were chosen in random from that day’s user logs. Reviewers were shown 4 documents per cluster. Results of 2 different algorithms were displayed side by side and reviewers were asked to chose between the 2. HAC was shown to outperform the rest of the algorithms (the difference was statistically significant) Online Clustering of Search Results - Naming the Clusters: Online Clustering of Search Results - Naming the Clusters Users may not realize how and why the documents were grouped together. Having names for clusters serving as short descriptions would help users process and navigate through the result set easier. Common naming approaches: Choose the most representative article for each cluster and use its title as the cluster label (Google news search http://news.google.com) Find the most representative phrases within the documents belonging to each cluster (Clusty http://news.clusty.com/) For Keyphrase Extraction see Zha, SIGIR 2002Online Clustering of Search Results - Naming the Clusters: Online Clustering of Search Results - Naming the Clusters Observations: News articles are usually formulated to answer more than one of the following questions: where, when, who, what and how. Articles belonging to the same cluster may be sharing answers to only some of these questions. Goal: We want to partition the titles into substrings corresponding to one (or more) of these questions. And then choose among these substrings the most common one.Online Clustering of Search Results - Naming the Clusters: Online Clustering of Search Results - Naming the Clusters Methodology: Use POS tags to find the boundaries between compact substrings. Break the sentence into smaller parts at these points and generate a candidate set composed of these shorter substrings and their continuous concatenations. Example: Title: “Wilma forces Caribbean tourists to flee” Substrings: “Wilma”, “Caribbean tourists”, “flee” Candidates: “Wilma”, “Wilma forces Caribbean tourists”, “Caribbean tourists”, “Wilma forces Caribbean tourists to flee”, “flee”, “Caribbean tourists to flee” Score the candidates based on coverage, frequency or length or a combination of these metrics. Choose the candidate with the highest score to be the cluster label.Online Clustering of Search Results - Naming the Clusters: Online Clustering of Search Results - Naming the Clusters Scoring Algorithms: Coverage Score: A metric to measure how well the candidate covers the information in all titles. Frequency Score: Is the frequency score for the full string. Length normalized frequency Score: Candidates with less number of words will have more frequencies. To avoid this bias towards shorter candidates we normalize the frequency scores based on the number of words they includeOnline Clustering of Search Results - Naming Evaluation: Online Clustering of Search Results - Naming Evaluation 160 queries were chosen at random from the daily query logs. The results of these queries were clustered and saved. Candidate names for each cluster were generated offline. Candidates included titles as well. 8 users were asked to either select from the list of candidates the best label or type in their own choice if they could not find an appropriate candidate. At least 3 judgments were collected per query. 55% of the names chosen by reviewers were titles. Experimental results showed that length normalized frequency score outperforms the others (60% match with user labels). In the case of ties, coverage scores were found to be the best tie-breaker.Work In Progress: Work In Progress Further query analysis: What are the users searching for: Titles, abstracts, references, publication dates, venues? Are the users viewing/downloading on linked documents, or not? Investigating community based adaptation strategies* Designing a framework for online evaluation of recommender systems Such a platform for CiteSeer was designed by Cosley et al (called REFEREE) in its early days. Our goal is to make it scale to and be available in the next generation CiteSeer. *Recently published paper on discovering E-communities: Zhou, Manavoglu, Li, Giles, Zha in WWW 2006My Proposal: My Proposal MyCiteSeer A unified adaptive information retrieval engine where document, action and query behaviors are all factored inHow?: How? Action Model selects the modules and shortcuts to be displayed user U is interested in only users who viewed document recommendations => only this document recommendation module will be shown on document page Query Model and Action Model selects the query interface user U searches for titles only and looks at more than 20 results => the default query field is title and results are clustered Document model selects the content with input from Action and Query models. Action and Query models act as implicit feedback Evaluation?: Evaluation? A user study would give the most reliable and comprehensive assessment of the system. Individual components as well as the overall system can be evaluated. Some form of implicit feedback can also be used for evaluation. Clickthrough rate on results page, number of actions taken to access a given document/page, time spent on completing certain tasks. If used together with some explicit user feedback (through random questionnaires, ratings, or feedback via email, blogs or forums) the appropriate metrics could be identified and used.