Presentation Transcript
Constructing Fuzzy Thesaurus for WWW: Application to BeMySearch : Constructing Fuzzy Thesaurus for WWW: Application to BeMySearch M. De Cock, S. Guadarrama and M. Nikravesh
2003 BISC FLINT-CIBI
BISC, UC Berkeley
Content : Content Introduction
Basic Concepts
Term Weighting
The WTW-approach
Association Rules
Fuzzy Terms
Examples
Conclusion
Introduction : Introduction During 70s-80s:
Small text collections.
Structured Databases.
Information Retrieval methods.
Now:
Huge multimedia collections.
Unstructured Web.
Fuzzy Retrieval methods.
Fuzzy Thesaurus : Fuzzy Thesaurus Is a couple (T, R) consisting of a set T of terms and a set R of binary fuzzy relations.
Examples of binary fuzzy relations are:
Similarity
Broader
Narrower
Part Of
Instance Of
…
Basic Concepts : Basic Concepts Document-Term relation:
Crisp W: D x T -> {0,1}
Fuzzy W: D x T -> [0,1]
Term-Term Relation R:
Man-Made: Dictionaries, Synonyms, Ontologies,…
Computer-Made: WTW, Association Rules, Similarity and Inclusion Measures,…
Term Weighting : Term Weighting Local terms weights (lij):
Binary (fij)
Logarithmic log(1+fij)
Normalized ((fij)+(fij/maxkfkj))/2
Term frequency. fij
Global terms weights (gi):
None 1
Entropy 1+( j(pijlog(pij))/log(n))
Term Weighting : Term Weighting IDF log(n/j(fij))
GfIdf (j fij)/(j(fij))
Normal 1/(jf2ij)0.5
Probabilistic Inverse log((n-j(fij))/j(fij))
Document normalization
None 1
Cosine (j(gilij)2)-0.5
Document-Term Matrix W : Document-Term Matrix W Binary
[ 1 0 0 0 0 1 0 0 0 1 ]
[ 0 1 0 1 0 0 0 1 1 1 ]
[ 1 1 0 0 1 0 0 0 1 0 ]
[ 0 0 1 0 0 0 1 0 0 1 ]
[ 0 0 1 0 1 1 0 0 1 0 ]
[ 0 0 0 0 0 0 1 0 1 0 ]
[ 1 0 0 1 0 0 0 1 0 1 ]
[ 0 1 0 0 1 1 1 0 0 0 ]
[ 0 0 0 1 0 0 0 1 0 0 ]
[ 1 0 0 1 1 0 0 1 1 0 ] TF-IDF
[ 0.6 0 0 0 0 0.6 0 0 0 0.6 ]
[ 0 0.5 0 0.5 0 0 0 0.5 0.5 0.5 ]
[ 0.5 0.5 0 0 0.5 0 0 0 0.5 0 ]
[ 0 0 0.6 0 0 0 0.6 0 0 0.6 ]
[ 0 0 0.5 0 0.5 0.5 0 0 0.5 0 ]
[ 0 0 0 0 0 0 0.7 0 0.7 0 ]
[ 0.5 0 0 0.5 0 0 0 0.5 0 0.5 ]
[ 0 0.5 0 0 0.5 0.5 0.5 0 0 0 ]
[ 0 0 0 0.7 0 0 0 0.7 0 0 ]
[ 0.5 0 0 0.5 0.5 0 0 0.5 0.5 0 ]
Crisp Document-Term Matrix : Crisp Document-Term Matrix
Fuzzy Document-Term Matrix : Fuzzy Document-Term Matrix
The WT.W approach : The WT.W approach
Term-Term Matrix WTW : Term-Term Matrix WTW
[ 0.1033 0.0250 0 0.0450 0.0450 0.0333 0 0.0450 0.0450 0.0583 ]
[ 0.0250 0.0700 0 0.0200 0.0500 0.0250 0.0250 0.0200 0.0450 0.0200 ]
[ 0 0 0.0583 0 0.0250 0.0250 0.0333 0 0.0250 0.0333 ]
[ 0.0450 0.0200 0 0.1150 0.0200 0 0 0.1150 0.0400 0.0450 ]
[ 0.0450 0.0500 0.0250 0.0200 0.0950 0.0500 0.0250 0.0200 0.0700 0 ]
[ 0.0333 0.0250 0.0250 0 0.0500 0.0833 0.0250 0 0.0250 0.0333 ]
[ 0 0.0250 0.0333 0 0.0250 0.0250 0.1083 0 0.0500 0.0333 ]
[ 0.0450 0.0200 0 0.1150 0.0200 0 0 0.1150 0.0400 0.0450 ]
[ 0.0450 0.0450 0.0250 0.0400 0.0700 0.0250 0.0500 0.0400 0.1400 0.0200 ]
[ 0.0583 0.0200 0.0333 0.0450 0 0.0333 0.0333 0.0450 0.0200 0.1117 ]
WTW Term-Term Matrix : WTW Term-Term Matrix
Association Rules : Association Rules The Rows correspond to documents.
The Columns correspond to terms.
We want to find association rules between terms.
Rules A=>B, are defined by:
Confidence or Relative Cardinality : Confidence or Relative Cardinality
Compositional Approach : Compositional Approach
Sup-Prod Composition : Sup-Prod Composition
Fuzzy Terms : Fuzzy Terms Meaning of term is a fuzzy set of documents.
µ(t)= 0.8/d1+ 0.2/d2+ 0.0/d3+…
Meaning of a document is a fuzzy set of terms.
(d)= 0.1/t1+ 0.0/t2+ 0.8/t3+…
Another interpretation of the document-term matrix:
W = [µ(t1) µ(t2) µ(t3) …]
WT = [(d1) (d2) (d3) …]
Fuzzy Sets : Fuzzy Sets Inclusion measures: Similarity measures:
Term-Document Matrix WT : Term-Document Matrix WT Fuzzy
[0.6 0.0 0.5 0.0 0.0 0.0 0.5 0.0 0.0 0.4]
[0.0 0.4 0.5 0.0 0.0 0.0 0.0 0.5 0.0 0.0]
[0.0 0.0 0.0 0.6 0.5 0.0 0.0 0.0 0.0 0.0]
[0.0 0.4 0.0 0.0 0.0 0.0 0.5 0.0 0.7 0.4]
[0.0 0.0 0.5 0.0 0.5 0.0 0.0 0.5 0.0 0.4]
[0.6 0.0 0.0 0.0 0.5 0.0 0.0 0.5 0.0 0.0]
[0.0 0.0 0.0 0.6 0.0 0.7 0.0 0.5 0.0 0.0]
[0.0 0.4 0.0 0.0 0.0 0.0 0.5 0.0 0.7 0.4]
[0.0 0.4 0.5 0.0 0.5 0.7 0.0 0.0 0.0 0.4]
[0.6 0.4 0.0 0.6 0.0 0.0 0.5 0.0 0.0 0.0] Crisp
[ 1 0 0 0 0 1 0 0 0 1 ]
[ 0 1 0 1 0 0 0 1 1 1 ]
[ 1 1 0 0 1 0 0 0 1 0 ]
[ 0 0 1 0 0 0 1 0 0 1 ]
[ 0 0 1 0 1 1 0 0 1 0 ]
[ 0 0 0 0 0 0 1 0 1 0 ]
[ 1 0 0 1 0 0 0 1 0 1 ]
[ 0 1 0 0 1 1 1 0 0 0 ]
[ 0 0 0 1 0 0 0 1 0 0 ]
[ 1 0 0 1 1 0 0 1 1 0 ]
2-D Projection of W : 2-D Projection of W
Fuzzy terms 1, 9 : Fuzzy terms 1, 9
Similarity using Min : Similarity using Min
Similarity using Prod : Similarity using Prod
Fuzzy Terms 2 : Fuzzy Terms 2 Meaning of term is a fuzzy set of terms.
µ(t)= 0.5/t1+ 1.0/t2+ 0.2/t3+…
Meaning of a document is a fuzzy set of documents.
(d)= 0.1/d1+ 0.5/d2+ 0.1/d3+…
Another interpretation of the term-term and document-document matrix:
T = [µ(t1) µ(t2) µ(t3) …]
D = [(d1) (d2) (d3) …]
Application to BeMySearch : Application to BeMySearch Query Expansion.
Query Refinement.
Re-Ranking.
Navigation.
User Profile.
…
Conclusions : Conclusions General Framework:
WTW
Association Rules
Fuzzy relation composition
Fuzzy terms
Relies on a fuzzy document-term relation.
Traditionally probabilistic approach.
Necessity of really fuzzy approach.
Future Work : Future Work More Relations:
Sentence - Term
Paragraph - Sentence
Document - Paragraph
Document – Document
Clustering techniques
Cluster of documents or paragraphs or sentences.
Cluster of terms.
Questions & Comments : Questions & Comments
Catch the
buzz on authorSTREAM
Copyright © 2002-2008 authorSTREAM. All rights reserved.