trt 11


Presentation Transcript

Text Retrieval and Mining # 12 Recommendation System, Recommender System: 

Text Retrieval and Mining #12: Recommendation System, Recommender System
Lecture by Young Hwan CHO, Ph.D. (Youngcho@gmail.com)

Recommendation Systems: 

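Definition
- A system that selects suitable items from among those the user has not yet experienced and suggests them.
- Input: the user's past experience, other users' choices, and information about the items.
Applications
- E-commerce: Amazon.com, CDNOW, Media Unbound
- Information services: Pandora (Music Genome Project)
Methods
- User-usage side: starting from other users' preferences, find the user or user group whose tastes are closest to the current user's, then suggest items the current user has not yet chosen.
- Content side: analyze the tendencies of the content the user prefers, group the content accordingly, and suggest items from those groups that the user has not chosen before.
General problems
- Lack of information: there are many users and items but few experience records, so the matrix is sparse.
- New items: no user has experienced them yet.
- Privacy: there are concerns about leakage of personal information.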

Methods: 

- Use item-to-item similarity: content-based
- Use item-to-item similarity: association
[Diagram labels: items A, B, C; "like"; "similar contents"; "Recommend"]

Methods: 

- Use people-to-people similarity: demographic
- Use people-to-people similarity: collaborative

What do RSs achieve?: 

- Help people make decisions
  - Where to spend attention
  - Where to spend money
- Help maintain awareness
  - New products
  - New information
[Diagram labels: Demographic features, Item features, Sales history, Purchase history, Customer, Recommend items]

Sample Applications: 

- E-commerce: product recommendations (Amazon)
- Corporate intranets: recommendation, finding domain experts, …
- Digital libraries: finding pages/books people will like
- Medical applications: matching patients to doctors, clinical trials, …
- Customer relationship management: matching customer problems to internal experts

Well-known recommender systems: Amazon and Netflix: 

Well-known recommender systems: Amazon and Netflix

Corporate intranets - document recommendation: 

Corporate intranets - document recommendation

Corporate intranets - “expert” finding: 

Corporate intranets - “expert” finding

Recommender System Composition: 

[Diagram: the four methods (collaborative, demographic, association, content-based) arranged along two axes: content analysis required vs. not required, and other users' information required vs. not required]

Recommender System =:= Clustering Model: 

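Content clustering
- Documents: vectors of keywords and their weights
- Products: vectors of values for category, maker, price, features, product reviews, etc.
People clustering
- Behavior: preferences for documents or products
- Profile: age, address, occupation, marital status, hobbies, etc.
Problems unique to recommender systems
- There are far too many products, and each user buys only one or two of them.
- The [user × product] matrix is severely sparse.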

Short History: 

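[Timeline diagram; recoverable labels]
- Focus: item information (content) vs. user information
- IR: Information Retrieval; IF: Information Filter
- (Pure) content-based recommendation; (pure) collaborative recommendation; hybrid systems
- Years shown: 1980, 1990, 1992, 1999
- Systems shown: GroupLens (Minnesota), Fab (Stanford), Agent, NetPerceptions, Firefly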

Collaborative filtering (CF): 

Collaborative Filtering (CF): a promising recommender system technology, used in many of the most successful recommender systems on the web.

Simplest Algorithm: Naïve k Nearest Neighbors: 

- U viewed d1, d2, d5.
- Look at who else viewed d1, d2 or d5.
- Recommend to U the doc(s) most “popular” among these users.
[Diagram labels: users U, V, W; docs d1, d2, d5]
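The three steps of the naive algorithm can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's implementation: the function name `recommend_naive`, the extra user X, and every viewing record other than U's (d1, d2, d5) are made up for the example.

```python
from collections import Counter

def recommend_naive(target, views, top_n=3):
    """Recommend the docs most viewed among users who share a viewed doc with `target`."""
    seen = views[target]
    counts = Counter()
    for user, docs in views.items():
        if user == target or not (docs & seen):
            continue  # skip the target and users with no overlap
        counts.update(docs - seen)  # count only docs the target has not viewed
    return [doc for doc, _ in counts.most_common(top_n)]

# Hypothetical viewing log; only U's row (d1, d2, d5) comes from the slide.
views = {
    "U": {"d1", "d2", "d5"},
    "V": {"d1", "d3", "d4"},
    "W": {"d2", "d3"},
    "X": {"d6"},
}
print(recommend_naive("U", views))  # ['d3', 'd4']
```

Here "popular" is taken to mean a raw view count among overlapping users, which is one possible reading of the slide.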

Single evidence CF: 

Single evidence CF

The Auction case – service personalization: 

The Auction case – service personalization

Recommender System Techniques: 

- Collaborative Filtering (CF)
- Singular Value Decomposition (SVD)
- An SVD-CF approach in the recommender systems domain

The RS Space: 

User-User Links: links derived from similar attributes, explicit connections

Link types: 

- User attributes-based recommendation: "Male, 18-35: recommend The Matrix"
- Content similarity: "You liked The Matrix: recommend The Matrix Reloaded"
- Collaborative filtering: "People with interests like yours also liked Kill Bill"

Collaborative Method: 

Advantages
- No need for content analysis: items whose content is difficult to analyze (e.g., movies, music) can still be recommended.
- No need for user information.
- High precision.
Method
- Find similar users.
- Predict preferences from the similar users' preferences.

Collaborative Method: 

Computing similarity
- Pearson correlation coefficient
  - r_{a,i}: user a's rating of item i
  - r̄_a: user a's average rating
Example
- User a: (1, 9, 10)
- User b: (2, 10, 9)
- User c: (10, 1, 2)
- User a is more similar to user b than to user c.
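The slide's formula did not survive extraction, so the sketch below assumes the standard Pearson definition, w_{a,b} = Σ_i (r_{a,i} − r̄_a)(r_{b,i} − r̄_b) / sqrt(Σ_i (r_{a,i} − r̄_a)² · Σ_i (r_{b,i} − r̄_b)²), with each mean taken over the three ratings shown. The function name `pearson` is chosen for this example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two users' ratings of the same items."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    norm_x = sqrt(sum((xi - mean_x) ** 2 for xi in x))
    norm_y = sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (norm_x * norm_y)

# Rating vectors from the slide's example.
a, b, c = (1, 9, 10), (2, 10, 9), (10, 1, 2)
print(pearson(a, b))  # strongly positive: a and b rate alike
print(pearson(a, c))  # strongly negative: a and c rate in opposite ways
```

Under these assumptions, the correlation of a with b is close to +1 and the correlation of a with c is close to −1, which matches the slide's conclusion.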

Collaborative Method: 

Prediction of preferences
- Weighted sum of similar users' preferences
  - w_{a,u}: similarity between users a and u
Example
- Average rating of user a: 5
- User b: (2, 8, 8), w_{a,b} = 0.5
- User c: (4, 4, 7), w_{a,c} = 0.1
- Preferences of user a = (5, 5, 5) + (-3, 3, 3) × 0.5 + (-1, -1, 2) × 0.1 = (3.4, 6.4, 6.7)
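The arithmetic of the example can be reproduced with a short sketch. Two points are assumptions rather than facts from the slide: each neighbor's mean rating is taken to be 5 (inferred from the deviation vectors (-3, 3, 3) and (-1, -1, 2)), and the weighted deviations are added without dividing by the sum of the weights, as the example's numbers imply. Many formulations of this prediction do include that normalization.

```python
def predict(mean_a, neighbors):
    """Add similarity-weighted rating deviations of neighbors to user a's mean."""
    n_items = len(neighbors[0]["ratings"])
    prediction = [mean_a] * n_items
    for nb in neighbors:
        for i, rating in enumerate(nb["ratings"]):
            prediction[i] += nb["weight"] * (rating - nb["mean"])
    return [round(p, 2) for p in prediction]

# Ratings and weights are from the slide; each neighbor's mean of 5 is an
# assumption inferred from the deviation vectors (-3, 3, 3) and (-1, -1, 2).
neighbors = [
    {"ratings": (2, 8, 8), "mean": 5, "weight": 0.5},
    {"ratings": (4, 4, 7), "mean": 5, "weight": 0.1},
]
print(predict(5, neighbors))  # [3.4, 6.4, 6.7]
```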

Data Sparseness Problem: 

Data Sparseness Problem Example data

Data Sparseness Problem: 

- Available data are usually very sparse: a user buys only 2 to 3 items out of thousands.
- Cosine similarity often cannot be computed, because two users rarely have any items in common.
- Remedy: reduce the dimensionality.

Dimensionality Reduction: 

Using category information
- Represent the user preference vector with item categories.
- Monster Co., Lion King, Pocahontas → animation
- Halloween, Scream → horror
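A rough sketch of the idea: replacing each title with its category shrinks the preference vector from one dimension per item to one per category, so users with no title in common can still be compared. The two users and the helper name `category_vector` are hypothetical; only the title-to-category mapping comes from the slide.

```python
from collections import Counter

# Category mapping taken from the slide's examples.
CATEGORY = {
    "Monster Co.": "animation",
    "Lion King": "animation",
    "Pocahontas": "animation",
    "Halloween": "horror",
    "Scream": "horror",
}

def category_vector(items):
    """Collapse a list of item titles into counts per category."""
    return Counter(CATEGORY[item] for item in items)

# Hypothetical users who share no title but overlap in both categories.
u1 = category_vector(["Lion King", "Scream"])
u2 = category_vector(["Pocahontas", "Monster Co.", "Halloween"])
print(dict(u1))  # one animation title, one horror title
print(dict(u2))  # two animation titles, one horror title
```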

Dimensionality Reduction: 

Singular Value Decomposition (SVD)
- Decompose the user-item matrix $A_{m \times n}$:
  $A_{m \times n} = U_{m \times m} \, S_{m \times n} \, (V_{n \times n})^{T}$
- $S$: diagonal matrix that contains the singular values of $A$ in descending order
- $U$, $V$: orthogonal matrices

Dimensionality Reduction: 

Dimensionality Reduction SVD example

Dimensionality Reduction: 

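Approximation of $A$
- Select the largest $k$ singular values:
  $A'_{m \times n} = U_{m \times k} \, S_{k \times k} \, (V_{n \times k})^{T}$
Computing user similarity
- $A A^{T} = U S V^{T} (U S V^{T})^{T} = U S V^{T} V S^{T} U^{T} = (U S)(U S)^{T}$
Projection of $A$ into $k$ dimensions
- $A'_{m \times n} \, V_{n \times k} = U_{m \times k} \, S_{k \times k}$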
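The identities above can be checked numerically with NumPy. The 4 × 5 matrix below is invented for illustration and is not the deck's example; the variable names (`U_k`, `S_k`, `V_k`, `users_k`) are likewise chosen here.

```python
import numpy as np

# Hypothetical 4-user x 5-item 0/1 purchase matrix (not from the slides).
A = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in descending order
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

A_approx = U_k @ S_k @ V_k.T  # rank-k approximation A'
users_k = U_k @ S_k           # each row is one user in k dimensions

# Projection identity: A' V_k = U_k S_k
assert np.allclose(A_approx @ V_k, users_k)
# User-user similarity of A' needs only the k-dimensional user rows.
assert np.allclose(A_approx @ A_approx.T, users_k @ users_k.T)
print(users_k.shape)  # (4, 2)
```

Working with the rows of `users_k` instead of the full rows of `A` is what makes the similarity computation cheaper and less sensitive to sparsity.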

An Example: 

An Example User-item matrix

An Example: 

An Example Reduction, k = 2

An Example: 

An Example User-user similarity

What is the SVD doing: 

[Figure; recoverable labels: Users, Items, Type 1, Type 2, …, Type k, Atypical users, Samples]

An Example: 

User vectors in 2-D space
[Scatter plot of users u1, u2, u3, u4, u5, u6]

Experiments : 

Dataset: MovieLens
- 943 users, 1628 movies, ratings from 1 to 5, 6.4% rated
- Ratings changed to 0/1 → 3.6% rated
Experiments
- Compare the performance of plain collaborative (CF) and reduced-dimension (SVD) recommendation.
  - CF: 60 neighbors
  - SVD: rank 20
- Change sparseness to 2.0%, 1.0%, 0.5%.

Experiments: 

Metric: hit ratio
- Remove 1 rating from each user → test data
- Recommend 10 items for each user
- If the test item is among the recommended items → hit
- $\text{Hit ratio} = \dfrac{\text{total number of hits}}{\text{total number of test data}}$
Result
- Sparseness 3.6% → SVD improves hit ratio by x %
- Sparseness 0.5% → SVD improves hit ratio by x %
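The metric itself is simple to compute once each user has one held-out item and a recommendation list. The sketch below uses made-up users, items, and shortened lists of 3 instead of 10; the function name `hit_ratio` is an assumption for this example.

```python
def hit_ratio(held_out, recommendations):
    """Fraction of users whose held-out item appears in their recommendation list."""
    hits = sum(1 for user, item in held_out.items()
               if item in recommendations.get(user, []))
    return hits / len(held_out)

# Hypothetical data: one held-out item per user and a shortened top-N list.
held_out = {"u1": "m7", "u2": "m3", "u3": "m9", "u4": "m1"}
recommendations = {
    "u1": ["m2", "m7", "m5"],
    "u2": ["m4", "m8", "m6"],
    "u3": ["m9", "m1", "m2"],
    "u4": ["m3", "m5", "m6"],
}
print(hit_ratio(held_out, recommendations))  # 0.5
```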

Experiments: 

Experiments Results

Results from SIGIR 2004 Paper: 

- Predicts top movies much better.
- The cost is that it tends to predict blockbuster movies often.
- A serendipity/trust trade-off.

Case-Based Reasoning (CBR): 

- Use people-to-people similarity.
- Find customers (cases) with similar attributes and recommend the products those similar customers bought.
- Automatic, ephemeral.
[Diagram labels: A, B, C; "like"; "same feature"]

Slide39: 

Moviegoer Survey

Slide40: 

[Figure: Nearest Neighbor. Moviegoers plotted by age and source, labeled by the movie they saw (Independence Day, Courage Under Fire, Birdcage, Nutty Professor); three unknown points marked "?" are each linked to their nearest neighbor.]
All the data points closest to this point saw “Independence Day”.

Case-Based Reasoning (CBR): 

Case-Based Reasoning (CBR) Example Customers with sales history

Case-Based Reasoning (CBR): 

Distance to neighbors
- $d_{\text{gender}}(A, B) = |A - B|$
- $d_{\text{age}}(A, B)$ and $d_{\text{salary}}(A, B)$: $|A - B| \,/\, \text{max difference}$
- $d_{\text{sum}} = d_{\text{gender}} + d_{\text{age}} + d_{\text{salary}}$
Prediction is based on a distance-weighted sum.
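A small sketch of the distance and the distance-weighted prediction follows. The customer records are hypothetical, gender is assumed to be coded as 0/1, and the weight 1 / (1 + d) is only one possible choice, since the slide does not say how distances are turned into weights.

```python
def distance(a, b, max_age_diff, max_salary_diff):
    """d_sum = d_gender + d_age + d_salary, with age and salary scaled to [0, 1]."""
    d_gender = abs(a["gender"] - b["gender"])  # gender coded as 0/1 (assumption)
    d_age = abs(a["age"] - b["age"]) / max_age_diff
    d_salary = abs(a["salary"] - b["salary"]) / max_salary_diff
    return d_gender + d_age + d_salary

def predict_purchase(new, cases, max_age_diff, max_salary_diff):
    """Distance-weighted vote; the 1 / (1 + d) weight is an assumption."""
    num = den = 0.0
    for case in cases:
        w = 1.0 / (1.0 + distance(new, case, max_age_diff, max_salary_diff))
        num += w * case["bought"]
        den += w
    return num / den

# Hypothetical customers with sales history (bought: 1 = yes, 0 = no).
cases = [
    {"gender": 0, "age": 25, "salary": 30, "bought": 1},
    {"gender": 0, "age": 30, "salary": 35, "bought": 1},
    {"gender": 1, "age": 50, "salary": 80, "bought": 0},
]
new = {"gender": 0, "age": 27, "salary": 32}
score = predict_purchase(new, cases, max_age_diff=25, max_salary_diff=50)
print(round(score, 2))  # closer to 1 than to 0: the near neighbors bought
```

Dividing age and salary by their maximum difference keeps each attribute's contribution between 0 and 1, so no single attribute dominates the sum.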

Decision Trees: 

- Use a decision tree to classify customers.
- Build a decision tree that classifies customers from the past purchase data of many customers, then use it to classify new customers.
[Diagram labels: attributes; Class A, Class C, Class B]

Decision Trees: 

Constructing a tree
- Easy way: one path for each example
- Better way: make it as simple as possible
Example
- (a=0, b=0) → Class A
- (a=0, b=1) → Class B
- (a=1, b=0) → Class A
[Diagram: a tree that tests a and then b, with one leaf per example (Class A, Class B, Class A), vs. a tree that tests only b (0 → Class A, 1 → Class B)]
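For the three examples on this slide, the "simple as possible" tree can be found by checking whether a single attribute already separates the classes. The sketch below does only that one-level check; it is not a general tree learner, and the name `single_split` is made up for the example.

```python
# The slide's three examples: attributes (a, b) and class label.
EXAMPLES = [
    ({"a": 0, "b": 0}, "A"),
    ({"a": 0, "b": 1}, "B"),
    ({"a": 1, "b": 0}, "A"),
]

def single_split(examples):
    """Return (attribute, {value: label}) if one attribute alone separates the classes."""
    for attr in examples[0][0]:
        mapping = {}
        consistent = True
        for features, label in examples:
            if mapping.setdefault(features[attr], label) != label:
                consistent = False  # same value, different labels
                break
        if consistent:
            return attr, mapping
    return None

print(single_split(EXAMPLES))  # ('b', {0: 'A', 1: 'B'})
```

Attribute a fails because a=0 occurs with both classes, while b alone reproduces every label, which is why the one-node tree on b is preferred.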

Neural Networks: 

- Use a neural network to classify customers.
- Train a neural network that classifies customers on the past purchase data of many customers, then use it to classify new customers.
[Diagram labels: attributes; Class A, Class C, Class B]

Conclusion: 

Future issues
- Integrating various methods
  - Attributes of people (demographic info.)
  - Attributes of products
  - Purchase data
  - People's ratings
- Fully automatic recommendation
  - Implicit negative data?
- Producing marketing information
  - Grouping customers
  - Sales prediction