2 dt

Uploaded from authorPOINTLite
Views:
 
Category: Education
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts: 

Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts Paper By: Source: International Journal of Electronic Commerce / Spring 2001, VoL 5, No.3, pp. 45-62. Copyright © 2001 M.E. Sharpe, Inc. All rights reserved. Jong Woo Kim, Byung Hun Lee, Michael J. Shaw, Hsin-Lu Chang, Matthew Nelson

Paper Authors:: 

Paper Authors: JONG WOO KIM is an assistant professor in the depart­ment of statistics at Chungnam National University in Teajon, Korea. BYUNG HUN LEE is a researcher in marketing at MPC LTD. in Korea High-level timing goals. MICHAEL J. SHAW is director of the Center for Information Systems and Technology Management at the University of Illinois at Urbana­Champaign. HSIN-LU CHANG is a Ph.D. student in information systems at the School of Commerce, University of Illinois at Urbana-Champaign. MATTHEW NELSON currently is a second-year Ph.D. student in information systems at the University of Illinois.

Paper References:: 

Paper References: Agrawal, R.; Aming, A.; Bollinger, T.; Mehta, M.; Schafer, J.; and Srikant, R. The Quest data mining system. In Proceedings of the 2nd International Conference on Knowledge Discovery in Databases and Data Mining. Portland, OR, 1996, pp. 244-249. Allen, c.; Kania, D.; and Yaeckel, B. Internet World Guide to One-to-One Web Marketing. New York: John Wiley, 1998. Berry, M.J .A., and Linoff, G. Data Mining Techniques for Marketing, Sales, and Customer Support. New York: Wiley Computer Publishing, 1997. BroadVision. A BroadVision one-ta-one white paper. White paper, BroadVision (1996). BroadVision. http://broadvision.com (2000). Business Week Graphic: On-line sales are soaring. Business Week (June 22, 1998) (based on data from Forrester Research). Chaturvedi, A.R; Hutchinson, G.K.; and Nazareth, D.L. Supporting complex real-time decision making through machine learning. Decision Support Systems, 10, 2 (September 1993),213-233. CLIPS. CLIPS Riference Manual (Version 6.05). CLIPS, 1997. Garbonell, J.G., and Michalski, R.S. Machine learning: A historical and methodological analysis. Al Magazine (fall 1983), 69-79. Gupta, 0.;, Digiovanni, M.; Norita, H.; and Goldberg, K. Jester 2.0: Evaluation of a new linear time collaborative filtering algorithm. 22nd International ACM SIGIR Conference on Research and Development in Informa­tion Retrieval (August 1999), pp. 291-292. Johnson, R.A., and Wichern, D.W. Applied Multivariate Statistical Analysis. Englewood-Cliffs, NJ: Prentice Hall International, 1992. Kim, ]., Lee, K., Shaw, M. J., Chang, H., and Nelson, M. A preference scoring tedmique to personalized advertisements on Internet storefront. Working paper, University of lllinois at Urbana-Champaign (2000). Lee, K. Personalized advertisement techniques for one-to-one marketing on Internet stores. Master's thesis, Chungnam National University, 2000.

Paper References: Cont.: 

Paper References: Cont. LikeMinds. LikeMinds'·personalization server 3.0 technology backgrounder. White paper, Andromedia (1999). Mehta, M.; Agrawal, R,: and Rissanen, J. SLIQ: A fast scalable classifier for data mining. In Proceeding of the Fifth International Conference on Extending Database Technology. Avignon (March 1996). Net Perceptions. www.netperceptions.com (2000). Peppers, D., and Rogers, M. The One to One Future: Building Relationships One Customer at a Time. New York: Currency and Doubleday, 1993. Quinlan, J.R. Induction of decision-trees. In J .W. Chavlik, W. Jude, and T.G. Dietterich (eds.), Readings in Machine Learning. San Mateo, CA: Morgan Kaufmann, 1990, pp. 57-69. Quinlan, J.R. Leaming decision-tree classifiers. ACM Computing Surveys, 28, 1 (March 1996), 71-72. Schafer, J.B.; Konstan, J.;: and Riedl, J. Recommendation system in e­commerce. In Proceedings of the ACM Conference on Electronic Commerce, Denver (November 3-5,1999), pp. 158-166. Shaw, M.J Machine leaming methods for intelligent decision support. Decision Support Systems, 10,2 (September 1993), 79-83. Shaw, M.J.; Blanning, R; Strader, T.; and Whinston, A. Handbook on Electronic Commerce. Berlin and Heidelberg: Springer-Verlag, 2000. Tsai, L.-H., and Koehler, GJ The accuracy of concepts learned from induction. Decision Support Systems, 10,2 (September 1993), 161-172. Weiss, S.M., and Indurkhya, Nitin. Predictive Data Mining: A Practical Guide

Reference for the paper:: 

Reference for the paper: http://an5qy7ag4q.scholar.serialssolutions.com/?sid=google&auinit=JW&aulast=Kim&atitle=Application+of+Decision-Tree+Induction+Techniques+to+Personalized+Advertisements+on+Internet+Storefronts&title=International+journal+of+electronic+commerce&volume=5&issue=3&date=2001&spage=45&issn=1086-4415 FOR MORE INFO... STONY BROOK UNIVERSITY LIBRARIES

Introduction: 

Introduction This paper studies personalized recommendation techniques that suggest products or services to the customers of Internet storefronts based on their demographics or past purchasing behavior

Currently available techniques: 

Currently available techniques Currently available recommendation techniques are primarily based: Collaborative filtering Preference scoring Rule-based approaches

Currently available techniques cont.: 

Currently available techniques cont. Collaborative filtering selects advertisements for customers based on the opinions of other customers with similar past preferences. The preference-scoring approach to personalized recommendation uses a preference-score concept to select personalized advertisements based on initial customer profile, purchase history, and behavior in Internet stores In the rule-based approach, marketing rules from marketing experts are a core component in providing personalized advertisements.

Rule based approach: 

Rule based approach Based on the marketing rule that “undergraduates in Indiana whose income is greater than $80000 prefer to own Luxury cars”, car advertisements are displayed whenever customers who live in Indiana access the Internet storefront.

Rule based approach: 

Rule based approach

Issues in Rule based approach: 

Issues in Rule based approach The marketing rules for personalized recommendation usually come from marketing experts in that domain. The acquisition of marketing rules, however, is unsystematic, time-consuming, and dependent on the intuition of marketing experts. The marketing rules in a knowledge base need to be continuously updated and changed, but manual change management is a difficult task.

Proposed Solution: 

Proposed Solution To overcome the defects of the rule-based approach in personalized advertisements, the authors propose decision-tree induction techniques that extract marketing rules from databases in Internet stores. Rule-based personalization has two phases: marketing-rule extraction, and real-time advertisement selection

Proposed Solution: 

Proposed Solution In order to obtain valuable marketing rules, decision tree induction technique is used to analyze purchase-transaction histories, customer profiles, and product information. The extracted marketing rules are stored in a marketing-rule base and are used for real-time personalized-advertisement selection when customers visit the Internet store

Proposed Solution Schematic: 

Proposed Solution Schematic

Marketing Rule Extraction: 

Marketing Rule Extraction The extraction of useful marketing rules using decision-tree induction techniques begins by defining a hierarchy tree of product categories to extract marketing rules at various abstraction levels of product categories MP3 Files Sporting goods Pop Music Classical Music Symphony Concert Rap Dance Metal Shakira Britany Kim Rolla

Marketing Rule Extraction: 

Marketing Rule Extraction The rule-extraction phase has four steps: (1) selecting learning data, (2) generating target variables, (3) constructing a decision tree, and (4) selecting a marketing rule

Marketing Rule Ext. – Phase 1: 

Marketing Rule Ext. – Phase 1 Step 1 Training data sets and test data sets are selected from customer records, making it possible to begin with a data set of manageable size. The size of a data set is determined by such factors as number of customer-profile records, the volume of disk storage, and the computing capacity of the inductive learning tools.

Marketing Rule Ext. – Phase 1: 

Marketing Rule Ext. – Phase 1 Step 2 In the next step, values of target variables are generated for the selected data records. In the present approach, target variables are whether or not a customer prefers a specific product category in the hierarchy tree. Since the target variables do not exist in the databases, they must be generated. This is done using the purchase-transaction database.

Marketing Rule Ext. – Phase 1: 

Marketing Rule Ext. – Phase 1 For example, customer profile records can have the following attributes: CUSTOMER = <ID, age, gender, hobby, location, married, job, ..., education>   After the target variables are generated, CUSTOMER relation is extended as follows. CUSTOMER' = <ID, age, gender, hobby, location, married, job, ..., education, t_MP3_file, t_Sporting_Goods/Leisure, t_Pop_Music, t_Classical_Music, ....., t_Fitness_Accessories>       where t_MP3_file, Etc. are binary variables to specify a customer's preference for the specific product categories. That is, the binary variable is assigned to 1 when the customer prefers the corresponding product category.

Marketing Rule Ext. – Phase 1: 

Marketing Rule Ext. – Phase 1 Step 3 The third step is decision-tree construction. Using inductive learning tools and the training data set, decision trees are inducted for all the target variables. Step 4 After the decision tree is constructed, useful marketing rules are filtered from constructed decision-trees based on the validation results from test data sets and accuracy of the rules.

Marketing Rule Ext. – Phase 1: 

Marketing Rule Ext. – Phase 1

TARGET-VARIABLE GENERATION : 

TARGET-VARIABLE GENERATION There are several possible ways to generate target variables based on the purchase-transaction database: (1) the counting-based method, (2) the expected-value-based method, (3) the statistics-based method, and (4) the subcategory-based method.

Counting-based method: 

Counting-based method The counting-based method, based on the number of purchases in a specific product category, makes it possible to decide whether or not the customer prefers the product category. That is,     PRE[subij] = 1 if NP[subij] >= Omega where, NP[subij] = the number of purchase of customer i for product category j Omega is a minimum threshold, which is determined by analysts.     PRE[subij] = 0 otherwise    

Expected-value-based method : 

Expected-value-based method N[subj] is the number of products in product category j. Alpha is a multiplier for the expected value. Usually alpha is equal to or greater than 1. When alpha is 1, if customer i purchased more items in product category j than the expected number of purchases of product category j, this means that customer i prefers product category j

Statistics-based method : 

Statistics-based method Statistical values like the mean, the median, the first quartile, and the third quartile are used to generate target variables

Subcategory based method: 

Subcategory based method In the case of non-leaf-node product categories, the corresponding target variables can be generated based on subcategory target variable values (when the target variables for subcategories are already generated using one of the above methods). PRE[subij] = 1 if PRE[subik] = 1 for some subcategory k of j     PRE[subij] = 0 otherwise

MARKETING-RULE SELECTION : 

MARKETING-RULE SELECTION A decision tree includes marketing rules that link customer demographics with preferences for product categories. Since marketing rules may have low predictability or may not be accurate, the valuable marketing rules have to be selected from constructed decision trees. Two phase heuristics are used for marketing-rule selection.     1. Choose decision trees whose predictability (1 - misclassification rate), with respect to a training data set, is greater than a certain threshold.     2. Select nodes from the filtered decision trees whose accuracy (i.e., purity) is greater than a certain threshold.      The selected nodes in the decision trees are translated in the rule structure. Here is an example of a translated rule. If age < 30 and gender = male then Ballad, with accuracy = .9 and level = 3.

ADVERTISEMENT SELECTION: 

ADVERTISEMENT SELECTION Personalized advertisements are selected on a real-time basis. Let M be the number of advertisements to be displayed in an Internet storefront, and L the depth of the product-category hierarchy tree. The advertisement-selection algorithm is as follows. STEP 1 m: = 0,l: = L, generate fact set using the specific customer's profile.

Advertisement Selection Algorithm Continued.: 

Advertisement Selection Algorithm Continued. STEP 2 Until m < M and 1 > 0 2.1 Fire marketing rules of level 1 with customer's facts. 2.2 If there are matching rules, matched rules are fired by accuracy order. The product categories appeared in consequent parts of the rules are selected as preferred product categories for the specific customer. Choose advertisements for that category from the advertisement database. The number of selected advertisements should be less than M - m. 2.3 If m = M, exit. 2.4 Update m to reflect the additionally selected advertisements, and update l to l-1.

Advertisement Selection Algorithm Continued.: 

Advertisement Selection Algorithm Continued. STEP 3 If m < M, randomly select M - m advertisements from the advertisement database

EXPERIMENT: 

EXPERIMENT EXPERIMENTAL DATA COLLECTION Data from 330 respondents were gathered. At first, the respondents were asked to select preferred product categories in MP3 music files and sporting goods or leisure equipment. Then they were asked to choose five MP3 music files among the 32 displayed products. The displayed products were equally selected from all MP3 leaf nodes. The respondents were also asked to purchase five sporting goods or leisure equipment items among the 24 displayed products. The displayed products were equally selected from all sporting goods or leisure equipment leaf nodes (three products from each of the eight leaf nodes). On the next two pages, respondents were asked to rank MP3 music files and sporting goods/leisure equipment items on a five-point personal-interest scale (1 = low interest to 5 = high interest). 16 advertisements for MP3 music files were displayed to rank interests about the advertisements, which were selected one item from each leaf node. In addition, 16 advertisements for sporting goods or leisure equipment items were displayed to rank interests about those, which were two selected items from each leaf node. Finally, the respondents were asked to specify their profiles, including age, gender, job, and so on.

MARKETING-RULE EXTRACTION: 

MARKETING-RULE EXTRACTION The 330 respondents were divided into two data sets, one for marketing-rule generation, and the other for testing the effectiveness of the proposed approach. For marketing-rule generation, 198 responses (60% of the total data set) were used. Among the 198 data items, 70 percent were used to construct decision trees, and 30 percent to validate constructed decision trees. For target-variable generation, an expected value-based approach of alpha= 1 was used. All told, 37 decision trees were constructed (26 for MP3 music files and 11 for sporting goods or leisure equipment). Decision trees with predictability greater than 65 percent were selected in the marketing-rule selection step. Decision rules with accuracy greater than 65 percent were selected as valuable marketing rules.

EFFECTIVENESS TEST: 

EFFECTIVENESS TEST To test the effectiveness of the proposed decision-tree induction approach, the results were compared to the preference-scoring approach and a random selection. The data of 132 respondents (40% of the total data set) were isolated for effectiveness testing. Three advertisements were selected for each respondent: that is, one advertisement using each of the three approaches for each respondent. In the case of the decision-tree induction approach, the marketing rules generated earlier were used to select personalized advertisements. In the case of the preference-scoring approach, advertisements were selected based on the respondents' own interest-manifestation behavior (purchase history, profile, preferred product categories). In the case of random selection, advertisements were selected on a simple random basis. The means of the five-point-scale personal-interest scores for the selected advertisements were compared statistically

RESULT: 

RESULT

Conclusion: 

Conclusion Statistically, there were differences among the five-point-scale scored means of customer' interest in advertisements from the three different approaches: decision-tree induction, preference scoring, and random selection. Thus, the effectiveness of personalized-recommendation techniques is affected by the decision algorithm utilized in the personalized- advertisement selection process. The proposed decision-tree induction approach is effective in generating marketing rules as a means to assist rule-based personalized recommendations. In the case of sporting goods or leisure equipment, decision-tree induction gave the best results. But in the case of MP3 music files, preference scoring gave the best results. This indicates that the appropriateness of personalized-advertisement techniques depends on the characteristics of product categories. Thus it is necessary to study how various personalized-advertisement techniques can be used cooperatively.

ThankYou: 

ThankYou Next Presentation – 5/2/2006 Topic - New Advances in Data Mining By: Group 14 Madhavarapu,Chidroop Sandhuria,Deepanshu DON’T FORGET TO COME TO CLASS!