Constructing Fuzzy Thesaurus for WWW: Application to BeMySearch
M. De Cock, S. Guadarrama and M. Nikravesh
2003 BISC FLINT-CIBI
BISC, UC Berkeley

Content
Introduction
Basic Concepts
Term Weighting
The $W^T W$ approach
Association Rules
Fuzzy Terms
Examples
Conclusion

Introduction
During the 1970s-1980s: small text collections, structured databases, information retrieval methods.
Now: huge multimedia collections, the unstructured Web, fuzzy retrieval methods.

Fuzzy Thesaurus
A fuzzy thesaurus is a pair (T, R) consisting of a set T of terms and a set R of binary fuzzy relations on T.
Examples of binary fuzzy relations: similarity, broader, narrower, part of, instance of, …

Basic Concepts
Document-term relation:
  Crisp: $W: D \times T \to \{0, 1\}$
  Fuzzy: $W: D \times T \to [0, 1]$
Term-term relation R:
  Man-made: dictionaries, synonyms, ontologies, …
  Computer-made: $W^T W$, association rules, similarity and inclusion measures, …

Term Weighting
The weight of term i in document j is the product of a local weight $l_{ij}$, a global weight $g_i$ and a document normalization factor. Here $f_{ij}$ is the frequency of term i in document j, n is the number of documents, and $\chi(f_{ij})$ is 1 if $f_{ij} > 0$ and 0 otherwise.
Local term weights $l_{ij}$:
  Binary: $\chi(f_{ij})$
  Logarithmic: $\log(1 + f_{ij})$
  Normalized: $\left(\chi(f_{ij}) + f_{ij} / \max_k f_{kj}\right) / 2$
  Term frequency: $f_{ij}$
Global term weights $g_i$:
  None: $1$
  Entropy: $1 + \sum_j p_{ij} \log(p_{ij}) / \log(n)$, with $p_{ij} = f_{ij} / \sum_j f_{ij}$
  IDF: $\log\left(n / \sum_j \chi(f_{ij})\right)$
  GfIdf: $\sum_j f_{ij} / \sum_j \chi(f_{ij})$
  Normal: $1 / \sqrt{\sum_j f_{ij}^2}$
  Probabilistic inverse: $\log\left(\left(n - \sum_j \chi(f_{ij})\right) / \sum_j \chi(f_{ij})\right)$
Document normalization:
  None: $1$
  Cosine: $\left(\sum_i (g_i l_{ij})^2\right)^{-1/2}$

Document-Term Matrix W
Binary:
[ 1 0 0 0 0 1 0 0 0 1 ]
[ 0 1 0 1 0 0 0 1 1 1 ]
[ 1 1 0 0 1 0 0 0 1 0 ]
[ 0 0 1 0 0 0 1 0 0 1 ]
[ 0 0 1 0 1 1 0 0 1 0 ]
[ 0 0 0 0 0 0 1 0 1 0 ]
[ 1 0 0 1 0 0 0 1 0 1 ]
[ 0 1 0 0 1 1 1 0 0 0 ]
[ 0 0 0 1 0 0 0 1 0 0 ]
[ 1 0 0 1 1 0 0 1 1 0 ]
TF-IDF:
[ 0.6 0 0 0 0 0.6 0 0 0 0.6 ]
[ 0 0.5 0 0.5 0 0 0 0.5 0.5 0.5 ]
[ 0.5 0.5 0 0 0.5 0 0 0 0.5 0 ]
[ 0 0 0.6 0 0 0 0.6 0 0 0.6 ]
[ 0 0 0.5 0 0.5 0.5 0 0 0.5 0 ]
[ 0 0 0 0 0 0 0.7 0 0.7 0 ]
[ 0.5 0 0 0.5 0 0 0 0.5 0 0.5 ]
[ 0 0.5 0 0 0.5 0.5 0.5 0 0 0 ]
[ 0 0 0 0.7 0 0 0 0.7 0 0 ]
[ 0.5 0 0 0.5 0.5 0 0 0.5 0.5 0 ]

Crisp Document-Term Matrix [image]

Fuzzy Document-Term Matrix [image]

The $W^T W$ approach [image]

Term-Term Matrix $W^T W$
[ 0.1033 0.0250 0 0.0450 0.0450 0.0333 0 0.0450 0.0450 0.0583 ]
[ 0.0250 0.0700 0 0.0200 0.0500 0.0250 0.0250 0.0200 0.0450 0.0200 ]
[ 0 0 0.0583 0 0.0250 0.0250 0.0333 0 0.0250 0.0333 ]
[ 0.0450 0.0200 0 0.1150 0.0200 0 0 0.1150 0.0400 0.0450 ]
[ 0.0450 0.0500 0.0250 0.0200 0.0950 0.0500 0.0250 0.0200 0.0700 0 ]
[ 0.0333 0.0250 0.0250 0 0.0500 0.0833 0.0250 0 0.0250 0.0333 ]
[ 0 0.0250 0.0333 0 0.0250 0.0250 0.1083 0 0.0500 0.0333 ]
[ 0.0450 0.0200 0 0.1150 0.0200 0 0 0.1150 0.0400 0.0450 ]
[ 0.0450 0.0450 0.0250 0.0400 0.0700 0.0250 0.0500 0.0400 0.1400 0.0200 ]
[ 0.0583 0.0200 0.0333 0.0450 0 0.0333 0.0333 0.0450 0.0200 0.1117 ]

$W^T W$ Term-Term Matrix [image]

Association Rules
The rows correspond to documents.
The columns correspond to terms.
We want to find association rules between terms.
Rules A ⇒ B are defined by:

Confidence or Relative Cardinality [image]

Compositional Approach [image]

Sup-Prod Composition [image]

Fuzzy Terms
The meaning of a term t is a fuzzy set of documents, e.g. $\mu(t) = 0.8/d_1 + 0.2/d_2 + 0.0/d_3 + \dots$
The meaning of a document d is a fuzzy set of terms, e.g. $0.1/t_1 + 0.0/t_2 + 0.8/t_3 + \dots$
This gives another interpretation of the document-term matrix: its columns are the meanings of the terms, $W = [\mu(t_1)\ \mu(t_2)\ \mu(t_3) \dots]$, and the columns of $W^T$ are the meanings of the documents $d_1, d_2, d_3, \dots$

Fuzzy Sets
Inclusion measures: [image]
Similarity measures: [image]

Term-Document Matrix $W^T$
Fuzzy:
[0.6 0.0 0.5 0.0 0.0 0.0 0.5 0.0 0.0 0.4]
[0.0 0.4 0.5 0.0 0.0 0.0 0.0 0.5 0.0 0.0]
[0.0 0.0 0.0 0.6 0.5 0.0 0.0 0.0 0.0 0.0]
[0.0 0.4 0.0 0.0 0.0 0.0 0.5 0.0 0.7 0.4]
[0.0 0.0 0.5 0.0 0.5 0.0 0.0 0.5 0.0 0.4]
[0.6 0.0 0.0 0.0 0.5 0.0 0.0 0.5 0.0 0.0]
[0.0 0.0 0.0 0.6 0.0 0.7 0.0 0.5 0.0 0.0]
[0.0 0.4 0.0 0.0 0.0 0.0 0.5 0.0 0.7 0.4]
[0.0 0.4 0.5 0.0 0.5 0.7 0.0 0.0 0.0 0.4]
[0.6 0.4 0.0 0.6 0.0 0.0 0.5 0.0 0.0 0.0]
Crisp:
[ 1 0 0 0 0 1 0 0 0 1 ]
[ 0 1 0 1 0 0 0 1 1 1 ]
[ 1 1 0 0 1 0 0 0 1 0 ]
[ 0 0 1 0 0 0 1 0 0 1 ]
[ 0 0 1 0 1 1 0 0 1 0 ]
[ 0 0 0 0 0 0 1 0 1 0 ]
[ 1 0 0 1 0 0 0 1 0 1 ]
[ 0 1 0 0 1 1 1 0 0 0 ]
[ 0 0 0 1 0 0 0 1 0 0 ]
[ 1 0 0 1 1 0 0 1 1 0 ]

2-D Projection of W [image]

Fuzzy Terms 1, 9 [image]

Similarity using Min [image]

Similarity using Prod [image]

Fuzzy Terms 2
The meaning of a term is a fuzzy set of terms, e.g. $\mu(t) = 0.5/t_1 + 1.0/t_2 + 0.2/t_3 + \dots$
The meaning of a document is a fuzzy set of documents, e.g. $0.1/d_1 + 0.5/d_2 + 0.1/d_3 + \dots$
This gives another interpretation of the term-term matrix T and the document-document matrix D: the columns of T are the meanings of the terms, $T = [\mu(t_1)\ \mu(t_2)\ \mu(t_3) \dots]$, and the columns of D are the meanings of the documents $d_1, d_2, d_3, \dots$

Application to BeMySearch
Query expansion.
Query refinement.
Re-ranking.
Navigation.
User profile.
…

Conclusions
General framework: $W^T W$, association rules, fuzzy relation composition, fuzzy terms.
Relies on a fuzzy document-term relation.
Traditionally, this relation is obtained with a probabilistic approach; a truly fuzzy approach is needed.

Future Work
More relations: sentence-term, paragraph-sentence, document-paragraph, document-document.
Clustering techniques: clusters of documents, paragraphs or sentences; clusters of terms.

Questions & Comments
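The local/global/normalization scheme on the Term Weighting slides can be sketched in Python as follows. This is a minimal sketch, not the authors' code: it implements only the logarithmic local weight, the IDF and entropy global weights and cosine normalization. It assumes a terms × documents frequency matrix `freq` (so `freq[i][j]` is $f_{ij}$), at least two documents, and that every term occurs in at least one document. All function names and the toy data are hypothetical.

```python
import math


def chi(f):
    """Binary indicator chi(f_ij): 1 if the term occurs in the document, else 0."""
    return 1.0 if f > 0 else 0.0


def local_log(f):
    """Logarithmic local weight l_ij = log(1 + f_ij)."""
    return math.log(1 + f)


def global_idf(row, n):
    """IDF global weight g_i = log(n / sum_j chi(f_ij)); assumes the term occurs somewhere."""
    return math.log(n / sum(chi(f) for f in row))


def global_entropy(row, n):
    """Entropy global weight g_i = 1 + sum_j p_ij log(p_ij) / log(n), p_ij = f_ij / sum_j f_ij."""
    gf = sum(row)
    return 1 + sum((f / gf) * math.log(f / gf) for f in row if f > 0) / math.log(n)


def weight_matrix(freq, local=local_log, global_weight=global_idf, cosine=True):
    """Return w[i][j] = l_ij * g_i, optionally cosine-normalized per document (column)."""
    n_terms, n_docs = len(freq), len(freq[0])
    g = [global_weight(row, n_docs) for row in freq]
    w = [[local(f) * g[i] for f in row] for i, row in enumerate(freq)]
    if cosine:
        for j in range(n_docs):
            norm = math.sqrt(sum(w[i][j] ** 2 for i in range(n_terms)))
            if norm > 0:  # leave all-zero documents untouched
                for i in range(n_terms):
                    w[i][j] /= norm
    return w


# Hypothetical toy data: 3 terms (rows) x 3 documents (columns).
freq = [[1, 0, 0],
        [2, 2, 2],
        [0, 3, 1]]
w = weight_matrix(freq)
W = [list(col) for col in zip(*w)]  # transpose to the documents x terms layout of W
```

With the default IDF weighting, the second term, which occurs in every document, gets global weight $\log(3/3) = 0$ and drops out of every document vector; entropy weighting gives it a (numerically) zero weight as well.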
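Both the $W^T W$ approach and the sup-prod composition relate two terms through the documents they share. Assuming W is stored as a documents × terms list of rows (rows are documents, columns are terms, as on the Association Rules slide), entry (s, t) of $W^T W$ is $\sum_d W(d, s)\,W(d, t)$; replacing the sum by a supremum (a maximum, for finitely many documents) gives the sup-prod composition of $W^T$ with W. The pure-Python sketch below uses hypothetical function names and toy data and does not attempt to reproduce the numbers on the slides.

```python
def wtw(W):
    """Term-term matrix W^T W: entry (s, t) = sum over documents d of W[d][s] * W[d][t]."""
    n_terms = len(W[0])
    return [[sum(row[s] * row[t] for row in W) for t in range(n_terms)]
            for s in range(n_terms)]


def sup_prod(W):
    """Sup-prod composition of W^T and W: entry (s, t) = max over d of W[d][s] * W[d][t]."""
    n_terms = len(W[0])
    return [[max(row[s] * row[t] for row in W) for t in range(n_terms)]
            for s in range(n_terms)]


# Hypothetical fuzzy document-term matrix: 2 documents (rows) x 3 terms (columns).
W = [[0.6, 0.0, 0.6],
     [0.5, 0.5, 0.0]]
S = wtw(W)       # e.g. S[0][0] = 0.6*0.6 + 0.5*0.5 = 0.61
R = sup_prod(W)  # e.g. R[0][0] = max(0.6*0.6, 0.5*0.5) = 0.36
```

Both results are symmetric. The sup-prod result stays in [0, 1] whenever W does, so it is again a fuzzy relation on terms, whereas entries of $W^T W$ can exceed 1 in larger collections.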
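The Confidence or Relative Cardinality slide and the Fuzzy Sets slide survive only as titles, so the formulas in this sketch are an assumption, not the authors' definitions. Reading each column of W as the fuzzy set of documents $\mu(t)$ from the Fuzzy Terms slide, a common choice is the relative cardinality $|A \cap B| / |A|$ with sigma-count cardinality, which serves both as the confidence of a rule A ⇒ B and as an inclusion measure, and the ratio $|A \cap B| / |A \cup B|$ as a similarity measure. Intersection and union are taken either with min/max or with the product and the probabilistic sum, matching the "Similarity using Min" and "Similarity using Prod" slide titles. Function names and data are hypothetical.

```python
def prod(x, y):
    """Product t-norm."""
    return x * y


def prob_sum(x, y):
    """Probabilistic sum, the t-conorm dual to the product."""
    return x + y - x * y


def relative_cardinality(a, b, tnorm=min):
    """|A ∩ B| / |A| (sigma-count): confidence of A => B, or degree of inclusion of A in B."""
    card_a = sum(a)
    if card_a == 0:
        return 1.0  # convention: the empty set is included in every set
    return sum(tnorm(x, y) for x, y in zip(a, b)) / card_a


def similarity(a, b, tnorm=min, tconorm=max):
    """|A ∩ B| / |A ∪ B| (sigma-count)."""
    union = sum(tconorm(x, y) for x, y in zip(a, b))
    if union == 0:
        return 1.0  # convention: two empty sets are fully similar
    return sum(tnorm(x, y) for x, y in zip(a, b)) / union


def column(W, t):
    """mu(t): the fuzzy set of documents for term t, i.e. column t of W."""
    return [row[t] for row in W]


# Hypothetical crisp document-term matrix: 4 documents x 2 terms.
W = [[1, 0], [1, 1], [0, 1], [1, 1]]
a, b = column(W, 0), column(W, 1)
conf_ab = relative_cardinality(a, b)  # 2 shared documents / 3 documents with term 0
sim_min = similarity(a, b)            # 2 shared documents / 4 documents with either term
sim_prod = similarity([0.8, 0.2], [0.4, 0.6], prod, prob_sum)
```

On crisp data these reduce to the usual rule confidence and the Jaccard coefficient; on fuzzy data the choice of t-norm changes the values, which is what comparing the Min and Prod variants explores.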
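Reading each column of a term-term matrix T as the meaning of a term, a fuzzy set of terms (Fuzzy Terms 2 slide), one simple way a fuzzy thesaurus could support the Query Expansion item is to add every term that belongs to the meaning of a query term to at least a threshold degree. The slides do not describe how BeMySearch expands queries, so this is a hypothetical sketch: the threshold, the function name and the toy vocabulary are illustrative assumptions, and T is assumed to hold degrees in [0, 1].

```python
def expand_query(query, terms, T, threshold=0.5):
    """Return {term: degree} for every term related to some query term with degree >= threshold.

    Column j of T is mu(t_j), so T[i][j] is the degree to which term i belongs
    to the meaning of term j. Query terms keep degree 1.0; a term reached from
    several query terms keeps its highest degree.
    """
    index = {t: i for i, t in enumerate(terms)}
    expanded = {q: 1.0 for q in query}
    for q in query:
        qi = index[q]
        for i, t in enumerate(terms):
            degree = T[i][qi]  # membership of term t in mu(q), the column of T for q
            if degree >= threshold:
                expanded[t] = max(expanded.get(t, 0.0), degree)
    return expanded


# Hypothetical vocabulary and term-term relation.
terms = ["car", "automobile", "engine", "fruit"]
T = [[1.0, 0.9, 0.6, 0.0],
     [0.9, 1.0, 0.5, 0.0],
     [0.6, 0.5, 1.0, 0.1],
     [0.0, 0.0, 0.1, 1.0]]
print(expand_query(["car"], terms, T, threshold=0.7))  # {'car': 1.0, 'automobile': 0.9}
```

The returned degrees can be used as query-term weights, so expanded terms count less than the terms the user actually typed; lowering the threshold trades precision for recall.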