Constructing Fuzzy Thesaurus for WWW

Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Constructing Fuzzy Thesaurus for WWW: Application to BeMySearch : 

Constructing Fuzzy Thesaurus for WWW: Application to BeMySearch M. De Cock, S. Guadarrama and M. Nikravesh 2003 BISC FLINT-CIBI BISC, UC Berkeley

Content: 

Content Introduction Basic Concepts Term Weighting The WTW-approach Association Rules Fuzzy Terms Examples Conclusion

Introduction: 

Introduction During 70s-80s: Small text collections. Structured Databases. Information Retrieval methods. Now: Huge multimedia collections. Unstructured Web. Fuzzy Retrieval methods.

Fuzzy Thesaurus: 

Fuzzy Thesaurus Is a couple (T, R) consisting of a set T of terms and a set R of binary fuzzy relations. Examples of binary fuzzy relations are: Similarity Broader Narrower Part Of Instance Of …

Basic Concepts: 

Basic Concepts Document-Term relation: Crisp W: D x T -> {0,1} Fuzzy W: D x T -> [0,1] Term-Term Relation R: Man-Made: Dictionaries, Synonyms, Ontologies,… Computer-Made: WTW, Association Rules, Similarity and Inclusion Measures,…

Term Weighting: 

Term Weighting Local terms weights (lij): Binary (fij) Logarithmic log(1+fij) Normalized ((fij)+(fij/maxkfkj))/2 Term frequency. fij Global terms weights (gi): None 1 Entropy 1+( j(pijlog(pij))/log(n))

Term Weighting: 

Term Weighting IDF log(n/j(fij)) GfIdf (j fij)/(j(fij)) Normal 1/(jf2ij)0.5 Probabilistic Inverse log((n-j(fij))/j(fij)) Document normalization None 1 Cosine (j(gilij)2)-0.5

Document-Term Matrix W: 

Document-Term Matrix W Binary [ 1 0 0 0 0 1 0 0 0 1 ] [ 0 1 0 1 0 0 0 1 1 1 ] [ 1 1 0 0 1 0 0 0 1 0 ] [ 0 0 1 0 0 0 1 0 0 1 ] [ 0 0 1 0 1 1 0 0 1 0 ] [ 0 0 0 0 0 0 1 0 1 0 ] [ 1 0 0 1 0 0 0 1 0 1 ] [ 0 1 0 0 1 1 1 0 0 0 ] [ 0 0 0 1 0 0 0 1 0 0 ] [ 1 0 0 1 1 0 0 1 1 0 ] TF-IDF [ 0.6 0 0 0 0 0.6 0 0 0 0.6 ] [ 0 0.5 0 0.5 0 0 0 0.5 0.5 0.5 ] [ 0.5 0.5 0 0 0.5 0 0 0 0.5 0 ] [ 0 0 0.6 0 0 0 0.6 0 0 0.6 ] [ 0 0 0.5 0 0.5 0.5 0 0 0.5 0 ] [ 0 0 0 0 0 0 0.7 0 0.7 0 ] [ 0.5 0 0 0.5 0 0 0 0.5 0 0.5 ] [ 0 0.5 0 0 0.5 0.5 0.5 0 0 0 ] [ 0 0 0 0.7 0 0 0 0.7 0 0 ] [ 0.5 0 0 0.5 0.5 0 0 0.5 0.5 0 ]

Crisp Document-Term Matrix: 

Crisp Document-Term Matrix

Fuzzy Document-Term Matrix: 

Fuzzy Document-Term Matrix

The WT.W approach: 

The WT.W approach

Term-Term Matrix WTW: 

Term-Term Matrix WTW [ 0.1033 0.0250 0 0.0450 0.0450 0.0333 0 0.0450 0.0450 0.0583 ] [ 0.0250 0.0700 0 0.0200 0.0500 0.0250 0.0250 0.0200 0.0450 0.0200 ] [ 0 0 0.0583 0 0.0250 0.0250 0.0333 0 0.0250 0.0333 ] [ 0.0450 0.0200 0 0.1150 0.0200 0 0 0.1150 0.0400 0.0450 ] [ 0.0450 0.0500 0.0250 0.0200 0.0950 0.0500 0.0250 0.0200 0.0700 0 ] [ 0.0333 0.0250 0.0250 0 0.0500 0.0833 0.0250 0 0.0250 0.0333 ] [ 0 0.0250 0.0333 0 0.0250 0.0250 0.1083 0 0.0500 0.0333 ] [ 0.0450 0.0200 0 0.1150 0.0200 0 0 0.1150 0.0400 0.0450 ] [ 0.0450 0.0450 0.0250 0.0400 0.0700 0.0250 0.0500 0.0400 0.1400 0.0200 ] [ 0.0583 0.0200 0.0333 0.0450 0 0.0333 0.0333 0.0450 0.0200 0.1117 ]

WTW Term-Term Matrix: 

WTW Term-Term Matrix

Association Rules: 

Association Rules The Rows correspond to documents. The Columns correspond to terms. We want to find association rules between terms. Rules A=>B, are defined by:

Confidence or Relative Cardinality: 

Confidence or Relative Cardinality

Compositional Approach: 

Compositional Approach

Sup-Prod Composition: 

Sup-Prod Composition

Fuzzy Terms: 

Fuzzy Terms Meaning of term is a fuzzy set of documents. µ(t)= 0.8/d1+ 0.2/d2+ 0.0/d3+… Meaning of a document is a fuzzy set of terms. (d)= 0.1/t1+ 0.0/t2+ 0.8/t3+… Another interpretation of the document-term matrix: W = [µ(t1) µ(t2) µ(t3) …] WT = [(d1) (d2) (d3) …]

Fuzzy Sets: 

Fuzzy Sets Inclusion measures: Similarity measures:

Term-Document Matrix WT: 

Term-Document Matrix WT Fuzzy [0.6 0.0 0.5 0.0 0.0 0.0 0.5 0.0 0.0 0.4] [0.0 0.4 0.5 0.0 0.0 0.0 0.0 0.5 0.0 0.0] [0.0 0.0 0.0 0.6 0.5 0.0 0.0 0.0 0.0 0.0] [0.0 0.4 0.0 0.0 0.0 0.0 0.5 0.0 0.7 0.4] [0.0 0.0 0.5 0.0 0.5 0.0 0.0 0.5 0.0 0.4] [0.6 0.0 0.0 0.0 0.5 0.0 0.0 0.5 0.0 0.0] [0.0 0.0 0.0 0.6 0.0 0.7 0.0 0.5 0.0 0.0] [0.0 0.4 0.0 0.0 0.0 0.0 0.5 0.0 0.7 0.4] [0.0 0.4 0.5 0.0 0.5 0.7 0.0 0.0 0.0 0.4] [0.6 0.4 0.0 0.6 0.0 0.0 0.5 0.0 0.0 0.0] Crisp [ 1 0 0 0 0 1 0 0 0 1 ] [ 0 1 0 1 0 0 0 1 1 1 ] [ 1 1 0 0 1 0 0 0 1 0 ] [ 0 0 1 0 0 0 1 0 0 1 ] [ 0 0 1 0 1 1 0 0 1 0 ] [ 0 0 0 0 0 0 1 0 1 0 ] [ 1 0 0 1 0 0 0 1 0 1 ] [ 0 1 0 0 1 1 1 0 0 0 ] [ 0 0 0 1 0 0 0 1 0 0 ] [ 1 0 0 1 1 0 0 1 1 0 ]

2-D Projection of W: 

2-D Projection of W

Fuzzy terms 1, 9: 

Fuzzy terms 1, 9

Similarity using Min: 

Similarity using Min

Similarity using Prod: 

Similarity using Prod

Fuzzy Terms 2: 

Fuzzy Terms 2 Meaning of term is a fuzzy set of terms. µ(t)= 0.5/t1+ 1.0/t2+ 0.2/t3+… Meaning of a document is a fuzzy set of documents. (d)= 0.1/d1+ 0.5/d2+ 0.1/d3+… Another interpretation of the term-term and document-document matrix: T = [µ(t1) µ(t2) µ(t3) …] D = [(d1) (d2) (d3) …]

Application to BeMySearch: 

Application to BeMySearch Query Expansion. Query Refinement. Re-Ranking. Navigation. User Profile. …

Conclusions: 

Conclusions General Framework: WTW Association Rules Fuzzy relation composition Fuzzy terms Relies on a fuzzy document-term relation. Traditionally probabilistic approach. Necessity of really fuzzy approach.

Future Work: 

Future Work More Relations: Sentence - Term Paragraph - Sentence Document - Paragraph Document – Document Clustering techniques Cluster of documents or paragraphs or sentences. Cluster of terms.

Questions & Comments: 

Questions & Comments

authorStream Live Help