# Constructing Fuzzy Thesaurus for WWW

Views:

Category: Entertainment

## Presentation Description

No description available.

## Presentation Transcript

### Constructing Fuzzy Thesaurus for WWW: Application to BeMySearch :

Constructing Fuzzy Thesaurus for WWW: Application to BeMySearch M. De Cock, S. Guadarrama and M. Nikravesh 2003 BISC FLINT-CIBI BISC, UC Berkeley

### Content:

Content Introduction Basic Concepts Term Weighting The WTW-approach Association Rules Fuzzy Terms Examples Conclusion

### Introduction:

Introduction During 70s-80s: Small text collections. Structured Databases. Information Retrieval methods. Now: Huge multimedia collections. Unstructured Web. Fuzzy Retrieval methods.

### Fuzzy Thesaurus:

Fuzzy Thesaurus Is a couple (T, R) consisting of a set T of terms and a set R of binary fuzzy relations. Examples of binary fuzzy relations are: Similarity Broader Narrower Part Of Instance Of …

### Basic Concepts:

Basic Concepts Document-Term relation: Crisp W: D x T -> {0,1} Fuzzy W: D x T -> [0,1] Term-Term Relation R: Man-Made: Dictionaries, Synonyms, Ontologies,… Computer-Made: WTW, Association Rules, Similarity and Inclusion Measures,…

### Term Weighting:

Term Weighting Local terms weights (lij): Binary (fij) Logarithmic log(1+fij) Normalized ((fij)+(fij/maxkfkj))/2 Term frequency. fij Global terms weights (gi): None 1 Entropy 1+( j(pijlog(pij))/log(n))

### Term Weighting:

Term Weighting IDF log(n/j(fij)) GfIdf (j fij)/(j(fij)) Normal 1/(jf2ij)0.5 Probabilistic Inverse log((n-j(fij))/j(fij)) Document normalization None 1 Cosine (j(gilij)2)-0.5

### Document-Term Matrix W:

Document-Term Matrix W Binary [ 1 0 0 0 0 1 0 0 0 1 ] [ 0 1 0 1 0 0 0 1 1 1 ] [ 1 1 0 0 1 0 0 0 1 0 ] [ 0 0 1 0 0 0 1 0 0 1 ] [ 0 0 1 0 1 1 0 0 1 0 ] [ 0 0 0 0 0 0 1 0 1 0 ] [ 1 0 0 1 0 0 0 1 0 1 ] [ 0 1 0 0 1 1 1 0 0 0 ] [ 0 0 0 1 0 0 0 1 0 0 ] [ 1 0 0 1 1 0 0 1 1 0 ] TF-IDF [ 0.6 0 0 0 0 0.6 0 0 0 0.6 ] [ 0 0.5 0 0.5 0 0 0 0.5 0.5 0.5 ] [ 0.5 0.5 0 0 0.5 0 0 0 0.5 0 ] [ 0 0 0.6 0 0 0 0.6 0 0 0.6 ] [ 0 0 0.5 0 0.5 0.5 0 0 0.5 0 ] [ 0 0 0 0 0 0 0.7 0 0.7 0 ] [ 0.5 0 0 0.5 0 0 0 0.5 0 0.5 ] [ 0 0.5 0 0 0.5 0.5 0.5 0 0 0 ] [ 0 0 0 0.7 0 0 0 0.7 0 0 ] [ 0.5 0 0 0.5 0.5 0 0 0.5 0.5 0 ]

### Crisp Document-Term Matrix:

Crisp Document-Term Matrix

### Fuzzy Document-Term Matrix:

Fuzzy Document-Term Matrix

### The WT.W approach:

The WT.W approach

### Term-Term Matrix WTW:

Term-Term Matrix WTW [ 0.1033 0.0250 0 0.0450 0.0450 0.0333 0 0.0450 0.0450 0.0583 ] [ 0.0250 0.0700 0 0.0200 0.0500 0.0250 0.0250 0.0200 0.0450 0.0200 ] [ 0 0 0.0583 0 0.0250 0.0250 0.0333 0 0.0250 0.0333 ] [ 0.0450 0.0200 0 0.1150 0.0200 0 0 0.1150 0.0400 0.0450 ] [ 0.0450 0.0500 0.0250 0.0200 0.0950 0.0500 0.0250 0.0200 0.0700 0 ] [ 0.0333 0.0250 0.0250 0 0.0500 0.0833 0.0250 0 0.0250 0.0333 ] [ 0 0.0250 0.0333 0 0.0250 0.0250 0.1083 0 0.0500 0.0333 ] [ 0.0450 0.0200 0 0.1150 0.0200 0 0 0.1150 0.0400 0.0450 ] [ 0.0450 0.0450 0.0250 0.0400 0.0700 0.0250 0.0500 0.0400 0.1400 0.0200 ] [ 0.0583 0.0200 0.0333 0.0450 0 0.0333 0.0333 0.0450 0.0200 0.1117 ]

### WTW Term-Term Matrix:

WTW Term-Term Matrix

### Association Rules:

Association Rules The Rows correspond to documents. The Columns correspond to terms. We want to find association rules between terms. Rules A=>B, are defined by:

### Confidence or Relative Cardinality:

Confidence or Relative Cardinality

### Compositional Approach:

Compositional Approach

### Sup-Prod Composition:

Sup-Prod Composition

### Fuzzy Terms:

Fuzzy Terms Meaning of term is a fuzzy set of documents. µ(t)= 0.8/d1+ 0.2/d2+ 0.0/d3+… Meaning of a document is a fuzzy set of terms. (d)= 0.1/t1+ 0.0/t2+ 0.8/t3+… Another interpretation of the document-term matrix: W = [µ(t1) µ(t2) µ(t3) …] WT = [(d1) (d2) (d3) …]

### Fuzzy Sets:

Fuzzy Sets Inclusion measures: Similarity measures:

### Term-Document Matrix WT:

Term-Document Matrix WT Fuzzy [0.6 0.0 0.5 0.0 0.0 0.0 0.5 0.0 0.0 0.4] [0.0 0.4 0.5 0.0 0.0 0.0 0.0 0.5 0.0 0.0] [0.0 0.0 0.0 0.6 0.5 0.0 0.0 0.0 0.0 0.0] [0.0 0.4 0.0 0.0 0.0 0.0 0.5 0.0 0.7 0.4] [0.0 0.0 0.5 0.0 0.5 0.0 0.0 0.5 0.0 0.4] [0.6 0.0 0.0 0.0 0.5 0.0 0.0 0.5 0.0 0.0] [0.0 0.0 0.0 0.6 0.0 0.7 0.0 0.5 0.0 0.0] [0.0 0.4 0.0 0.0 0.0 0.0 0.5 0.0 0.7 0.4] [0.0 0.4 0.5 0.0 0.5 0.7 0.0 0.0 0.0 0.4] [0.6 0.4 0.0 0.6 0.0 0.0 0.5 0.0 0.0 0.0] Crisp [ 1 0 0 0 0 1 0 0 0 1 ] [ 0 1 0 1 0 0 0 1 1 1 ] [ 1 1 0 0 1 0 0 0 1 0 ] [ 0 0 1 0 0 0 1 0 0 1 ] [ 0 0 1 0 1 1 0 0 1 0 ] [ 0 0 0 0 0 0 1 0 1 0 ] [ 1 0 0 1 0 0 0 1 0 1 ] [ 0 1 0 0 1 1 1 0 0 0 ] [ 0 0 0 1 0 0 0 1 0 0 ] [ 1 0 0 1 1 0 0 1 1 0 ]

### 2-D Projection of W:

2-D Projection of W

Fuzzy terms 1, 9

### Similarity using Min:

Similarity using Min

### Similarity using Prod:

Similarity using Prod

### Fuzzy Terms 2:

Fuzzy Terms 2 Meaning of term is a fuzzy set of terms. µ(t)= 0.5/t1+ 1.0/t2+ 0.2/t3+… Meaning of a document is a fuzzy set of documents. (d)= 0.1/d1+ 0.5/d2+ 0.1/d3+… Another interpretation of the term-term and document-document matrix: T = [µ(t1) µ(t2) µ(t3) …] D = [(d1) (d2) (d3) …]

### Application to BeMySearch:

Application to BeMySearch Query Expansion. Query Refinement. Re-Ranking. Navigation. User Profile. …

### Conclusions:

Conclusions General Framework: WTW Association Rules Fuzzy relation composition Fuzzy terms Relies on a fuzzy document-term relation. Traditionally probabilistic approach. Necessity of really fuzzy approach.

### Future Work:

Future Work More Relations: Sentence - Term Paragraph - Sentence Document - Paragraph Document – Document Clustering techniques Cluster of documents or paragraphs or sentences. Cluster of terms.