TakagiBISC2004 (uploaded by ozturk, added March 14, 2008)

Presentation Transcript

Conceptual Fuzzy Sets and Context Sensitive Information Retrieval
Tomohiro Takagi, Meiji University, UC Berkeley

Outline:
Coping with context dependent meanings
Toward conceptual fuzzy sets
From IR point of view
Trial 1: TREC Novelty Track
Trial 2: TREC Web track
Trial 3: Enhancing Google Image Search
Trial 4: Detection of illegal websites
Sparse Coding Model

Coping with context dependent meanings: The ordinary approach enumerates cases × cases × cases × …, which is impossible. Ex) heavy: (elephant or human or dog or cat or mouse or …) × (old or middle or young or child or baby …) × (Europe or Asia or Africa or …) × … Humans do not memorize things in that way.

Slide4: Our approach: fusion of fractions of knowledge.
Fractions of knowledge:
“Heavy” means bigger weight than usual.
Usually “middle” and “young” are bigger than “child”.
Usually “baby” is smaller than “child”.
…

“Meaning representation from use” proposed by Wittgenstein: According to Wittgenstein, the various meanings of a label (word) can be represented by other labels (words) in its use.
In this spirit, conceptual fuzzy sets are proposed, in which the meaning of a word is represented by the distribution of the activation of other words depending on context.

Toward conceptual fuzzy sets: [Figure: the word “Heavy” linked to Word A, Word B, Word C, Word D, Word E and Word F, each with a grade.] Conceptual fuzzy set: the possibility distribution supporting the concept “heavy”. But how to generate a possibility distribution reflecting context?

Meanings of JAVA in three different contexts: coffee, island, programming language.

Activated Fraction of Knowledge: coffee, island.

Slide12: Topics (concepts): clustering words (word-document matrix) + optimization using corpus; Dmoz (ODP). [Figure: artificial brain; IN: word vector and context; OUT: word vector.]

Slide13: [Figure: relational matrix, CFS with 3 prototype vectors, and CFS with 15 prototype vectors, for “Java & Mocha” and “Java & Windows”; categories: coffee, S/W, travel, H/W.]

Simulations using actual home pages:
Randomly selected 45 home pages.
Extracted 247 words from the pages.
Built a 247 x 247 relational matrix based on co-occurrence.

Slide15: Results expanded from the keyword input “java & application”. [Figure: co-occurrence, 3 iterations, and 10 iterations; categories: coffee, computer, travel.]

Clustering 60 web pages: [Figure: co-occurrence compared with CFSs; clusters: Movie, Music, Travel, Cooking.]

From IR point of view: Exact word matching: un-match. Expansion: soft match ..
but low precision. Context aware focused expansion: better quality match.

From both points of view: Fuzzy sets and information retrieval. Information retrieval based on meaning representation using CFS, which is a possibility distribution of words reflecting context.

TRIAL 1:
10,000 words
800 fractions = 800 clusters
Optimized weights

Slide22: [Figure: an input x is compared with each fraction c1, c2, …, cm, giving Similarity(x, c1), Similarity(x, c2), …, Similarity(x, cm); weights a11, a12, …, a1n, …, am1, …, amn.]

Examples of expansion: WORLD SPORTS AT 0000 GMT WORLD CUP. PARIS _ FIFA bans Laurent Blanc for two games, confirming that the French defender is out of Sunday's World Cup final against Brazil.

TREC Novelty Track:
Tasks: relevancy detection, novelty detection
Learning data: Reuters (TREC 2002) corpus, 810,000 documents
Indexed words: 10,000
Prototypes: 800

Relevancy Detection System

Result of Task 1 and Task 3

Task 1, Relevant and Novel F Scores

TRIAL 2:
Case 1: 120,000 fractions = docs
Case 2: 70,000 fractions = clusters of docs

TREC Web track Topic Distillation Task: [Figure: document, modified vector; query, modified vector; matching; output.] Gov collection (1.2 million HTML docs.)
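The slides describe comparing an input with each fraction (prototype) and turning the document or query into a "modified vector", but they do not give the formulas. The following is a minimal sketch of one way such context-focused expansion could look, not the authors' implementation. Cosine similarity, the `top_k` cut-off, and every name and number in it (`VOCAB`, `PROTOTYPES`, `expand`, `query_vector`) are assumptions made for illustration.

```python
import math

# Hypothetical toy vocabulary and prototype vectors ("fractions of knowledge").
# The weights play the role of the a_ij values on Slide22; the numbers are invented.
VOCAB = ["java", "coffee", "mocha", "windows", "software"]
PROTOTYPES = [
    [0.6, 0.9, 0.8, 0.0, 0.0],  # assumed "coffee" fraction
    [0.6, 0.0, 0.0, 0.8, 0.9],  # assumed "software" fraction
]


def cosine(u, v):
    """Cosine similarity; an assumed choice for Similarity(x, c_i)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand(x, prototypes, top_k=1):
    """Add the top_k most similar prototypes to x, scaled by their similarity."""
    sims = [cosine(x, c) for c in prototypes]
    best = sorted(range(len(prototypes)), key=lambda i: sims[i], reverse=True)[:top_k]
    expanded = list(x)
    for i in best:
        for j, a_ij in enumerate(prototypes[i]):
            expanded[j] += sims[i] * a_ij
    return expanded, sims


def query_vector(words):
    """Binary bag-of-words vector over VOCAB."""
    return [1.0 if w in words else 0.0 for w in VOCAB]


expanded_coffee, _ = expand(query_vector({"java", "mocha"}), PROTOTYPES)
expanded_sw, _ = expand(query_vector({"java", "windows"}), PROTOTYPES)

for label, vec in (("java & mocha", expanded_coffee), ("java & windows", expanded_sw)):
    print(label, {w: round(v, 3) for w, v in zip(VOCAB, vec)})
```

In this toy setting the same word "java" is expanded toward "coffee" when it appears with "mocha" and toward "software" when it appears with "windows", which is the context-dependent behaviour the slides aim for. Restricting the expansion to the best-matching prototypes is what keeps it focused.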
Example of Expansion:

Physical fitness (0.0392 → 0.1362)
0.111806  fit
0.107622  physic
0.031421  sport
0.023926  exercis
0.020036  aerob
0.018505  heart
0.018082  obes
0.017206  particl
0.015366  walk

computer virus (0.0105 → 0.0982)
0.098488  viru
0.086169  comput
0.036659  softwar
0.031507  encrypt
0.029903  vulner
0.027442  hacker
0.026835  virus
0.024170  intrus
0.024154  secur
0.022238  password

Results (R-Precision):
Case 2              0.1733
The best last year  0.1636
Case 1              0.1612
2nd best last year  0.1485

TRIAL 3: Enhancing Google Image Search - 20,000 index words - 60,000 prototypes

Slide33: Experimental results. “gates”: Bill Gates.

Slide34: [Figure: query; user relevance feedback; meaning representation using CFS; query refinement; focus reflecting context.]

Slide35: Experimental result - 1. Query = cat. [Figure: with feedback, without feedback.]

Slide36: Experimental result - 2. Query = apple. [Figure: with feedback, without feedback.]

Slide38: Next Step of Image Search: text based image search, content (image) based search, enhanced image search.

TRIAL 4: Detection of illegal websites

Illegal sites:
Warez: Illegal distribution and sale of commercial software
Emulation: Illegal distribution of software, such as video games
Music: Distribution of music data that infringes on copyrights
Adult: Pornographic depictions and expressions
Hacking & Cracking: Distribution of illegal hacking and cracking software; sharing of technical know-how
Drugs & Guns: Sale of drugs and guns; sharing of acquisition routes
Killing: Descriptions of murder and other violent acts

Illustration of illegal site:
Many suspicious words
Many commercial software names
High link rate to compressed files
Looks like illegal distribution and sale of commercial software

Concept Description of “Warez”, “Music” and “Emulation”: Warez, Music,
Emulator. [Figure labels: suspicious words, commercial software, software maker, suspicious words, compressed file types, URL list.]

CFS System: [Figure: HTML document; TF-IDF values; types of linked files and URLs; names (software, makers, music, artists); Support Vector Machine.]

Evaluation:
Randomly selected 300 actual Web sites (including 85 illegal sites)
Compared CFS system with plain TF-IDF system

Results (300 pages):

               CFS system   TF-IDF system
illegal pages
  precision    0.9878       1.0000
  recall       0.9529       0.8706
  E measure    0.0299       0.0692
legal pages
  precision    0.9817       0.9556
  recall       0.9953       1.0000
  E measure    0.0115       0.0227

(The E measure values are consistent with E = 1 − F, where F is the harmonic mean of precision and recall; lower is better.)

CFS based on Sparse Coding: Training corpus: 200,000 Reuters news articles (1996/08/20 - 1997/08/19)

Sparse Coding: In the human brain:
One information : one neuron (grandmother cell)
One information : several neurons (cell assembly)
[Figure: Information A, Information B and Information C under each scheme.]

Interconnection based on Mutual Information: Term layer, context layer.

Slide49:
Meaning of a word is encoded as an activation pattern of neurons.
Fractions of knowledge are encoded as interconnections of neurons.
Get the most appropriate word as a result.

Operating cell assembly:
1. term input
2. propagating activation to related context
3. detecting the context
4. propagating activation to related word
5. term output

Examples of expansion: Input “child” + “seat”; input “child”; input “seat”.
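The five steps of "Operating cell assembly" can be sketched as a small two-layer network. This is an illustrative sketch under stated assumptions, not the model from the slides. The slides only say that interconnections are based on mutual information; the use of positive pointwise mutual information, the summation of activations, the winner-take-all context detection, and all names and counts (`COUNTS`, `CONTEXTS`, `ppmi_weights`, `operate`) are hypothetical.

```python
import math

# Hypothetical term-by-context co-occurrence counts (toy data, not from the slides).
CONTEXTS = ["family", "car", "furniture"]
COUNTS = {
    "child":  {"family": 6, "car": 4},
    "seat":   {"car": 4, "furniture": 6},
    "belt":   {"car": 3},
    "sofa":   {"furniture": 10},
    "school": {"family": 10},
}


def ppmi_weights(counts):
    """Term-context weights as positive pointwise mutual information (assumed)."""
    total = sum(sum(row.values()) for row in counts.values())
    term_total = {t: sum(row.values()) for t, row in counts.items()}
    ctx_total = {c: sum(row.get(c, 0) for row in counts.values()) for c in CONTEXTS}
    weights = {}
    for term, row in counts.items():
        weights[term] = {}
        for ctx in CONTEXTS:
            n = row.get(ctx, 0)
            if n == 0:
                weights[term][ctx] = 0.0
            else:
                pmi = math.log(n * total / (term_total[term] * ctx_total[ctx]))
                weights[term][ctx] = max(pmi, 0.0)
    return weights


def operate(input_terms, weights):
    """Run the five steps once and return (detected context, output term)."""
    # Steps 1-2: term input, then propagate activation to the context layer.
    ctx_act = {c: sum(weights[t][c] for t in input_terms) for c in CONTEXTS}
    # Step 3: detect the context (winner-take-all, an assumed rule).
    context = max(CONTEXTS, key=lambda c: ctx_act[c])
    # Step 4: propagate activation from the context back to the term layer.
    term_act = {t: w[context] for t, w in weights.items() if t not in input_terms}
    # Step 5: output the most strongly activated term.
    output = max(term_act, key=term_act.get)
    return context, output


W = ppmi_weights(COUNTS)
for terms in (["child"], ["seat"], ["child", "seat"]):
    print(terms, operate(terms, W))
```

With these toy counts, "child" alone leads to the family context and "seat" alone to the furniture context, while the two together activate the car context most strongly. That mirrors the idea behind the “child” + “seat” example: the combination of inputs, not either word alone, selects the context.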