Wu Zixin Defense

Presentation Transcript

TagSense - Marrying Folksonomy and Ontology: 

TagSense - Marrying Folksonomy and Ontology By: Zixin Wu Advisor: Amit P. Sheth Committee: John A. Miller Prashant Doshi

Outline: 

Outline
- Background and Motivation
- Approach Overview
- Tag Normalization
- Sense Indexing
- Utilizing Ontologies
- Semantic Search and Ranking
- Implementation and Evaluations
- Conclusions
- Demo


Folksonomy: 

Folksonomy
Examples: a Web page and photos from Flickr.com, and a Web page from del.icio.us.

Folksonomy Definitions: 

Folksonomy Definitions
- The behavior of mass tagging in a social context and its product: tags for Web resources. It is collaborative metadata extraction and annotation.
- From Thomas Vander Wal: "Folksonomy is the result of personal free tagging of information and objects (anything with a URL) for one's own retrieval. The tagging is done in a social environment (usually shared and open to others)." [1]
- From Tom Gruber: "the emergent labeling of lots of things by people in a social context." [2]

Features of Folksonomy: 

Features of Folksonomy
- Makes metadata extraction from multimedia Web resources easier.
- Extracts information from the perspective of the information consumer, e.g. tags about the house in a photo but not the dog in it.
- Popular tags prevail, and the tags for a Web resource converge over time.

The Long Tail: 

The Long Tail

Power Law Distribution of Tags [3]: 

Power Law Distribution of Tags [3]

Folksonomy Triad [4,5]: 

Folksonomy Triad [4,5]
- The person tagging
- The Web resource being tagged
- The tag(s) being used on that Web resource
We can use two of the elements to find the third, e.g. find persons with similar interests by comparing the Web resources they tagged and the tags they used.

Motivation Scenarios – Ambiguous Words: 

Motivation Scenarios – Ambiguous Words Search for “apple” Search for “turkey”

Disambiguation: 

Disambiguation
What people usually do: add more keywords to disambiguate. This is a trade-off between precision and recall.

Motivation Scenarios – Background Knowledge: 

Motivation Scenarios – Background Knowledge
Task: find photos about cities in Europe.
- Solution 1: search "city Europe"
- Solution 2: try the names of European cities one by one
Both could be improved if the system knew which term/concept is a city and which city is in Europe.

Significant Drawbacks of Folksonomy: 

Significant Drawbacks of Folksonomy Keyword ambiguity Lack of background knowledge

Ontology: 

Ontology
- Ontology is an important term in Knowledge Representation and the key enabler of the Semantic Web.
- "A formal specification of a conceptualization" [6]
- Ontologies state knowledge explicitly by using URIs and relationships, e.g. "#Paris #is_located_in #Europe".
- Current specifications: RDF(S) [7,8], OWL [9], etc.

Semantic Annotation: 

Semantic Annotation Figure from [10]

Multiple Ontologies: 

Multiple Ontologies
- One ontology cannot always be comprehensive enough.
- Ontologies may be incompatible.
- If multiple ontologies are used, we need to select and rank ontologies for a query.

Objectives: 

Objectives
Shorten the time and effort of information retrieval in a folksonomy:
- improve recall by considering synonyms and enabling semantic search
- improve result ranking by putting the most appropriate items at the top of query results

Outline: 

Outline
- Background and Motivation
- Approach Overview
- Tag Normalization
- Sense Indexing
- Utilizing Ontologies
- Semantic Search and Ranking
- Implementation and Evaluations
- Conclusions
- Demo

Approach Overview: 

Approach Overview
- Do not add any burden to our users: they should be able to use only tags to describe and search Web resources.
- Do not expect our users to have a Semantic Web background.
- Utilize ontologies as background knowledge in information retrieval.

Approach Overview: 

Approach Overview
[Diagram combining Ontologies and Folksonomy]

Some Terms: 

Some Terms
- Web resource: anything with a URL.
- Label: one or more keywords, e.g. "air ticket".
- Tag: a label tagged to a Web resource. Two different tags may have the same label.
- Sense cluster (or cluster): a grouping of tags with similar meanings. Ideally a cluster corresponds to one meaning, but oftentimes a meaning is represented by multiple clusters together.
- Semantic annotation: associating a cluster with ontological concepts.

Approach Overview: 

Approach Overview
[Diagram: a dot is a tag, a blue circle is a sense cluster, a yellow circle is an actual meaning; clusters are mapped to Ontology 1 and Ontology 2]

Outline: 

Outline
- Background and Motivation
- Approach Overview
- Tag Normalization
- Sense Indexing
- Utilizing Ontologies
- Semantic Search and Ranking
- Implementation and Evaluations
- Conclusions
- Demo

Data Cleanup: “Dirty” Tags: 

Data Cleanup: "Dirty" Tags
- "bird" and "birds"
- "ebook" and "e-book"; "air-ticket", "airticket", and "air ticket"
- "freephotos" should be "free", "photo"
- "travelagent" should be "travel agent"
- "sculture" should be "sculpture"
- "@pub-travel", "Europe2005"

Tag Normalization: 

Tag Normalization
Check two online dictionaries: Webster.com and Dict.cn.
- Webster.com: stemming and misspelling correction ("swimming" -> "swim", "dogs" -> "dog", "sculture" -> "sculpture")
- Dict.cn: more words and compound words ("ibm" is not in Webster.com but is in Dict.cn; "open source")
- Try to split tags: "freephotos" -> "free" and "photo"
- Ignore pure numbers, such as "2005" and "07_01_2005"
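The normalization steps above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the real system queries Webster.com and Dict.cn, whereas here a small in-memory word set stands in for the dictionaries, and singularization stands in for full stemming and spell correction.

```python
# Stand-in dictionary; the real system would query Webster.com / Dict.cn.
DICTIONARY = {"free", "photo", "photos", "travel", "agent",
              "bird", "birds", "air", "ticket"}

def normalize(word):
    """Crude singularization standing in for a real stemming service."""
    if word.endswith("s") and word[:-1] in DICTIONARY:
        return word[:-1]
    return word

def split_tag(tag):
    """Greedily split a concatenated tag (e.g. 'freephotos') into
    dictionary words; return None if no full split exists."""
    if not tag:
        return []
    for end in range(len(tag), 0, -1):
        head = tag[:end]
        if head in DICTIONARY:
            rest = split_tag(tag[end:])
            if rest is not None:
                return [normalize(head)] + rest
    return None

def clean_tag(tag):
    tag = tag.strip().lower().replace("-", "").replace("_", "")
    if tag.isdigit():                   # ignore pure numbers such as "2005"
        return []
    if tag in DICTIONARY:
        return [normalize(tag)]
    return split_tag(tag) or [tag]      # unknown tags pass through unchanged
```

With this sketch, `clean_tag("freephotos")` yields `["free", "photo"]` and `clean_tag("air-ticket")` yields `["air", "ticket"]`, mirroring the examples on the slide.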

Outline: 

Outline
- Background and Motivation
- Approach Overview
- Tag Normalization
- Sense Indexing
- Utilizing Ontologies
- Semantic Search and Ranking
- Implementation and Evaluations
- Conclusions
- Demo

Sense Indexing: 

Sense Indexing
[Diagram: mapping between keywords and senses, e.g. "ticket" can mean an access permit or a fine for an offender.]

Sense Indexing: 

Sense Indexing
- The mappings between keywords and senses are n:m.
- Index Web resources by senses instead of keywords.
- Put tags with similar meanings into the same cluster.
- Each tag must be disambiguated when indexing.

Differences from Word Sense Disambiguation [11-15]: 

Differences from Word Sense Disambiguation [11-15]
- No sentences: no sentence structure and no part-of-speech analysis. The order of the labels on a Web resource is not necessarily relevant.
- Produced in a social context: a significant number of terms are not in lexicons, and terms change more frequently. That means we need to create senses for those terms.
- Relatively less noise.

Why Clustering (1): 

Why Clustering (1)
Since we will match the clusters to ontological concepts, why not annotate each tag directly?
- Some terms are not in any ontology.
- By aggregating the contexts of the tags in the same cluster, we learn which context is important and which is noise (especially for a narrow folksonomy).
Example tag sets: "apple mac powerbook light paint long" vs. "apple mac powerbook ajax web design".

Why Clustering (2): 

Why Clustering (2)
We get more context for semantic annotation. Example: does the tag set "Athens University" belong with "Georgia" or with "Greece"?

Synonym: 

Synonym
It seems impossible to automatically detect synonyms based ONLY on the context of tags. Reason: contexts being similar enough does not imply synonymy. Solution: use WordNet's [16] synsets as synonym lists.

Polysemy: 

Polysemy
Cluster tags that have the same label (or synonymous labels) into "sense clusters" based on the similarity of their contexts.

Context of Tags: 

Context of Tags
The context of a tag T consists of:
- the other tags that co-occur with T on a Web resource
- the co-occurrence frequencies
Example: User1 tags "turkey, istanbul, mosque"; User2 tags "turkey, istanbul, tour". The co-occurrence graph then links turkey-istanbul with frequency 2, and turkey-mosque, turkey-tour, istanbul-mosque, and istanbul-tour each with frequency 1. In a narrow folksonomy, all co-occurrence frequencies are 1.

Relatedness of Tags: 

Relatedness of Tags
Basic idea: TF-IDF over the co-occurrence graph. The TF of a pair is its co-occurrence frequency normalized by the target tag's total co-occurrence count: in the turkey/istanbul example, TF(turkey, istanbul) = 2/4, TF(istanbul, turkey) = 2/4, TF(mosque, turkey) = 1/4, and TF(turkey, mosque) = 1/2. The TF is then multiplied by IDF.
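The TF-IDF relatedness above can be sketched in code using the two-user example from the previous slide. This is an illustrative reading of the slide's numbers, not the thesis code: TF(t1, t2) is taken as cooc(t1, t2) divided by t2's total co-occurrence count, and IDF is computed from how many resources carry t1.

```python
import math
from collections import Counter
from itertools import combinations

# Toy dataset from the slide: two photos, each with a tag set.
resources = [
    ["turkey", "istanbul", "mosque"],   # User1's photo
    ["turkey", "istanbul", "tour"],     # User2's photo
]

cooc = Counter()        # directed co-occurrence counts, symmetric
doc_freq = Counter()    # how many resources a tag appears on
for tags in resources:
    for t in set(tags):
        doc_freq[t] += 1
    for a, b in combinations(sorted(set(tags)), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

def tf(t1, t2):
    """Co-occurrence of t1 with t2, normalized by t2's total."""
    total = sum(c for (a, b), c in cooc.items() if b == t2)
    return cooc[(t1, t2)] / total if total else 0.0

def relatedness(t1, t2, n_resources=len(resources)):
    idf = -math.log(doc_freq[t1] / n_resources) if doc_freq[t1] else 0.0
    return tf(t1, t2) * idf
```

Running this on the toy data reproduces the slide's fractions, e.g. TF(turkey, istanbul) = 2/4 and TF(mosque, turkey) = 1/4.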

Context of a Cluster: 

Context of a Cluster
- Other clusters whose tags connect (co-occur) with the tags in this cluster.
- The co-occurrence frequency of two clusters is the sum of the co-occurrence frequencies of the tags in the clusters, e.g. tag-level frequencies of 2 and 3 aggregate to a cluster-level frequency of 5.

Relatedness of Clusters: 

Relatedness of Clusters
The same calculation as the relatedness of tags.

Important Context of a Cluster: 

Important Context of a Cluster
[Diagram: the context clusters, ordered by relatedness, are partitioned into Important Context Level 1, Level 2, and Level 3.]

Motivation for Building Senses: 

Motivation for Building Senses
To find photos about the turkey bird, some people use "bird" besides "turkey", some use "animal", and some use "food", "wild", etc. Can we collect all these tags and use them to build a sense? The clue for recognizing these tags is that they co-occur with each other more often than with other tags (which are also in the context of "turkey").

Tag Disambiguation Process: 

Tag Disambiguation Process
Put all tags with the same label (or synonymous labels) into one cluster, then run the following three phases to build senses.

Tag Disambiguation Phase 1: 

Tag Disambiguation Phase 1
- Identify Important Context Level 1.
- Create an undirected weighted graph called the Context Graph. Each node in the graph is a cluster in Important Context Level 1; the weight of an edge is the relatedness of the two clusters (relatedness is asymmetric, so we take the larger value).
- Apply a threshold to the edges of the Context Graph, so that the graph breaks into one or more disconnected components.
- Create a sense for each component, using the clusters in the component as the context of that sense.
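Phase 1 is essentially edge thresholding followed by connected-component extraction, which can be sketched as below. The nodes, edges, and relatedness weights are invented for illustration; only the algorithm mirrors the slide.

```python
def connected_components(nodes, edges, threshold):
    """Drop edges below `threshold`, then return the connected
    components of what remains. edges: {(a, b): weight}."""
    adj = {n: set() for n in nodes}
    for (a, b), w in edges.items():
        if w >= threshold:               # keep only strong edges
            adj[a].add(b)
            adj[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:                     # iterative depth-first search
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        components.append(comp)
    return components

# Made-up context of "turkey": after thresholding, each component
# becomes one candidate sense (country vs. bird).
nodes = ["istanbul", "mosque", "bird", "food"]
edges = {("istanbul", "mosque"): 0.8, ("bird", "food"): 0.7,
         ("mosque", "bird"): 0.1}
senses = connected_components(nodes, edges, threshold=0.5)
```

Here the weak mosque-bird edge is cut, leaving two components that correspond to the two senses of "turkey".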

Tag Disambiguation Phase 1: 

Tag Disambiguation Phase 1 We are disambiguating “turkey”, so the cluster “turkey” is hidden for better illustration.

Tag Disambiguation Phase 2: 

Tag Disambiguation Phase 2
The purpose of this phase is to find senses missed in Phase 1 because they are not used often in the dataset.
- Identify Important Context Level 2.
- For each cluster in Important Context Level 2, find the most related sense built in Phase 1 (with relatedness above a threshold).
- If such a sense exists, merge the cluster into that sense's context; otherwise, build a new sense with the cluster as its context.

Tag Disambiguation Phase 2: 

Tag Disambiguation Phase 2 The red clusters are newly discovered in Phase 2

Tag Disambiguation Phase 3: 

Tag Disambiguation Phase 3
Identify Important Context Level 3. Similar to Phase 2, but do not create any new senses; just enrich the context of the senses built in Phases 1 and 2.

Tag Disambiguation Process - continue: 

Tag Disambiguation Process (continued)
- Compare each tag under consideration with the senses.
- Select the best-matched sense and assign the tag to it.
- Repeat the comparison and assignment whenever the number of tags under consideration has grown by a certain percentage.
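The assignment step can be sketched as follows: a tag's context is compared against each sense's context, and the match score is the sum of the relatedness weights of the context clusters they share (the x + y of the next slide). The sense identifiers, context labels, and weights below are invented for illustration.

```python
def match_score(tag_context, sense_context):
    """Both arguments map context-cluster labels to relatedness weights;
    the score sums the tag-side weights of the shared labels."""
    shared = tag_context.keys() & sense_context.keys()
    return sum(tag_context[label] for label in shared)

def best_sense(tag_context, senses):
    """senses: {sense_id: context dict}; returns the best-matching id."""
    return max(senses, key=lambda s: match_score(tag_context, senses[s]))

# A "turkey" tag whose photo also carries "istanbul" and "turkish":
tag_ctx = {"istanbul": 0.5, "turkish": 0.3}
senses = {"turkey/country": {"istanbul": 1, "turkish": 1, "mosque": 1},
          "turkey/bird": {"bird": 1, "food": 1}}
```

For this tag, the country sense wins with a match score of 0.5 + 0.3 = 0.8, while the bird sense scores 0.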

Tag Disambiguation Process: 

Tag Disambiguation Process
[Diagram: a "turkey" tag co-occurring with "istanbul" and "turkish" matches a sense whose context has relatedness x and y for those clusters; MatchScore = x + y.]

Outline: 

Outline
- Background and Motivation
- Approach Overview
- Tag Normalization
- Sense Indexing
- Utilizing Ontologies
- Semantic Search and Ranking
- Implementation and Evaluations
- Conclusions
- Demo

Utilizing Ontologies: 

Utilizing Ontologies
- Match each cluster to ontological concepts where appropriate.
- There are no named relationships between tags, so we cannot compare by relationship names.
- We need the relatedness of ontological concepts.
- We also need the similarity of ontological concepts for semantic search.

Relatedness of Ontological Concepts: 

Relatedness of Ontological Concepts
Basic idea: TF-IDF.
- Relatedness is 0 for any pair of concepts without a relationship.
- TF-IDF(c1, c2) = TF(c1, c2) * IDF(c1)

Relatedness of Ontological Concepts: 

Relatedness of Ontological Concepts
TF (c1 to c2):
- Issue the query "c1 c2" to the Yahoo! search engine and get the hit count h.
- Issue the query "cx c2" for each concept cx connected to c2 and get the hit counts hx.
- TF(c1, c2) = h / Σ hx
IDF(c):
- Issue the query "c" to the Yahoo! search engine and get the hit count h.
- Yahoo!'s current index size: 20 billion pages.
- IDF(c) = -log(h / 20 billion)
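The hit-count computation above can be sketched as below. The `hit_count` function and all the counts in `HITS` are stand-ins: a real system would call a Web search engine's API, and the concept names and numbers here are invented for illustration (base-10 log is used, matching the next slide's figures).

```python
import math

# Mirrors the 20-billion-page figure on the slide.
INDEX_SIZE = 20_000_000_000

# Invented hit counts, keyed by the sorted query terms.
HITS = {("car", "sedan"): 40_000_000,
        ("coupe", "sedan"): 30_000_000,
        ("sedan", "vehicle"): 10_000_000,
        ("car",): 1_040_000_000}

def hit_count(*terms):
    """Stand-in for issuing a query to a search engine."""
    return HITS.get(tuple(sorted(terms)), 0)

def tf(c1, c2, neighbors_of_c2):
    """Hit count of "c1 c2" over the hit counts of c2 with each
    concept connected to it."""
    h = hit_count(c1, c2)
    total = sum(hit_count(cx, c2) for cx in neighbors_of_c2)
    return h / total if total else 0.0

def idf(c):
    h = hit_count(c)
    return -math.log10(h / INDEX_SIZE) if h else 0.0

def relatedness(c1, c2, neighbors_of_c2):
    return tf(c1, c2, neighbors_of_c2) * idf(c1)
```

With these toy counts, TF(car, sedan) over the neighbors {car, coupe, vehicle} is 40M / 80M = 0.5, which is then scaled by IDF(car).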

Similarity of Ontological Concepts: 

Similarity of Ontological Concepts
First, consider only the taxonomy in the ontology.
- Information Content [17]: IC(c) = -log(prob(c))
- Sim(c1, c2) = 2 * IC(common ancestor) / (IC(c1) + IC(c2)) [18]
Worked example (taxonomy: Car with subclasses Sedan and Coupe):
- Hit counts: Sedan 58 M, Coupe 76 M, Car 1040 M
- Sum over concept and descendants: Sedan 58 M, Coupe 76 M, Car 1174 M
- Probability (sum / 20 billion): Sedan 0.0029, Coupe 0.0038, Car 0.0587
- Information Content: Sedan 2.54, Coupe 2.42, Car 2.23
- Sim(Sedan, Coupe) = 2 * 2.23 / (2.54 + 2.42) = 0.899

Similarity of Ontological Concepts: 

Similarity of Ontological Concepts
Also consider other types of relationships by using the Jaccard (or cosine) similarity coefficient. Example: Athens and Atlanta are both Is_located_in Georgia.
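One way to read the neighborhood-overlap idea above is the following sketch: two concepts are similar when they share (relationship, target) pairs, measured with the Jaccard coefficient. The triples are illustrative (only the Athens/Atlanta Is_located_in Georgia pair comes from the slide); the thesis may weight or combine relationships differently.

```python
# Illustrative triples; only Athens/Atlanta is_located_in Georgia is
# from the slide, the rest are invented.
triples = [("Athens", "is_located_in", "Georgia"),
           ("Atlanta", "is_located_in", "Georgia"),
           ("Athens", "type", "City"),
           ("Atlanta", "type", "City"),
           ("Atlanta", "is_capital_of", "Georgia")]

def neighborhood(concept):
    """The set of (relationship, target) pairs leaving a concept."""
    return {(rel, obj) for s, rel, obj in triples if s == concept}

def jaccard(c1, c2):
    a, b = neighborhood(c1), neighborhood(c2)
    return len(a & b) / len(a | b) if a | b else 0.0
```

Here Athens and Atlanta share 2 of their 3 combined neighborhood pairs, giving a Jaccard similarity of 2/3.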

Matching Clusters to Ontologies: 

Matching Clusters to Ontologies
- Compare the important context of a cluster with the context (neighboring concepts) of an ontological concept.
- Sum up the relatedness of the matched context clusters.
- Select the ontological concept with the best matching score that is also above a threshold.

Matching Clusters to Ontologies: 

Matching Clusters to Ontologies
A context cluster x is considered matched to a context concept y if:
- they have the same label (or synonymous labels), or
- x is matched to y' and the relatedness (or similarity) of y' to y is above a threshold, or
- the relatedness of x' (which is matched to y) to x is above a threshold.

Matching Clusters to Ontologies - example: 

Matching Clusters to Ontologies - example
[Diagram: three cases of semantically annotating a "turkey" cluster. Case 1: context cluster "bird" matches concept "bird" by label. Case 2: "bird" matches concept "animal" when Rel(bird, animal) or Sim(bird, animal) is above the threshold. Case 3: context cluster "animal" matches concept "bird" when Rel(bird, animal) is above the threshold.]

Outline: 

Outline
- Background and Motivation
- Approach Overview
- Tag Normalization
- Sense Indexing
- Utilizing Ontologies
- Semantic Search and Ranking
- Implementation and Evaluations
- Conclusions
- Demo

Semantic Search [19,20]: 

Semantic Search [19,20]
- Search by ontological relationships; currently only the "subclass" and "type" relationships are considered.
- Map concepts to the corresponding clusters via their semantic annotations.
- Expand the corresponding clusters by including other clusters with the same label, because some clusters that should have a semantic annotation do not.

Semantic Search: 

Semantic Search
[Diagram: a semantic search maps concepts such as Ottawa, Madrid, and Seoul from a geography domain ontology and a politics domain ontology to their photo clusters.]

Most-Desired Senses Ranking: 

Most-Desired Senses Ranking
- We need to rank the candidate clusters.
- The system shows one photo for each candidate cluster.
- The user selects the best photo from the samples.
- The system ranks the other clusters based on that selection.

Most-Desired Senses Ranking: 

Most-Desired Senses Ranking
- The basic idea is a single-source shortest-path computation in a graph.
- Put a constant amount of energy on the source cluster and distribute it to the other clusters.
- The weight of an edge is the similarity of the two clusters.
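One plausible reading of the energy-distribution idea above is a Dijkstra-style propagation where energy decays multiplicatively along the most similar path from the selected cluster, sketched below. The graph and its similarity weights are invented; the thesis may use a different decay rule.

```python
import heapq

def spread_energy(graph, source):
    """graph: {node: {neighbor: similarity in (0, 1]}}.
    Starting with energy 1.0 on `source`, returns the maximum
    path-product energy reaching each node (single-source
    best-path computation via a max-heap Dijkstra)."""
    energy = {source: 1.0}
    heap = [(-1.0, source)]              # max-heap via negated energies
    while heap:
        neg_e, node = heapq.heappop(heap)
        e = -neg_e
        if e < energy.get(node, 0.0):
            continue                     # stale heap entry
        for nbr, sim in graph[node].items():
            new_e = e * sim              # energy decays by edge similarity
            if new_e > energy.get(nbr, 0.0):
                energy[nbr] = new_e
                heapq.heappush(heap, (-new_e, nbr))
    return energy

# Invented cluster-similarity graph; the user picked "Ottawa".
graph = {"Ottawa": {"Madrid": 0.5, "Seoul": 0.3},
         "Madrid": {"Ottawa": 0.5, "Seoul": 0.4},
         "Seoul": {"Ottawa": 0.3, "Madrid": 0.4}}
ranks = spread_energy(graph, "Ottawa")
```

With these weights, Madrid (energy 0.5) ranks above Seoul (energy 0.3), because the direct Ottawa-Seoul edge beats the indirect path through Madrid (0.5 * 0.4 = 0.2).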

Slide62: 

Most-Desired Senses Ranking - example
[Diagram: energy 1 placed on the selected "Ottawa" cluster propagates to the Ottawa, Madrid, and Seoul clusters of the geography and politics domain ontologies, yielding energies between 0.05 and 0.46.]

Cluster Similarity: 

Cluster Similarity
- If the semantic annotations of two clusters refer to the same ontology, use the similarity of the corresponding ontological concepts.
- Otherwise, calculate the cluster similarity from the contexts of the two clusters.

Cluster Similarity by Context: 

Cluster Similarity by Context
A modified version of Dice similarity. Say we are comparing cluster1 and cluster2:
- compare only the important contexts of cluster1 and cluster2
- calculate the percentage of overlapping context
- decide whether context cluster c1 of cluster1 and context cluster c2 of cluster2 match using the same rules as in matching clusters to ontologies.
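The Dice-style overlap above can be sketched as follows. This simplifies the matching test to label equality; the full system also matches context clusters via relatedness thresholds and synonyms, and the two Athens context sets are invented for illustration.

```python
def dice_similarity(ctx1, ctx2):
    """Dice coefficient over the important-context labels of two
    clusters: twice the overlap over the total context size."""
    if not ctx1 or not ctx2:
        return 0.0
    overlap = ctx1 & ctx2
    return 2 * len(overlap) / (len(ctx1) + len(ctx2))

# Invented important contexts for the two "athens" clusters.
athens_georgia = {"georgia", "university", "bulldogs", "downtown"}
athens_greece = {"greece", "university", "acropolis", "parthenon"}
```

The two clusters share only "university", so their similarity is 2 * 1 / (4 + 4) = 0.25, correctly keeping the two Athens senses apart.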

Ontology Ranking [21,23]: 

Ontology Ranking [21,23]
- Ontologies come from a repository.
- If multiple ontologies are used for a query, we need to give a weight to each ontology.
- The ontology with the higher weight has more "power" to decide the similarity/relatedness of two ontological concepts.
- Ontologies are ranked using the 4 most recent queries of the same user.

Ontology Ranking: 

Ontology Ranking
Centrality Measure [22]
[Diagram: a concept C in the taxonomy under Thing, with its depth D(c) and height H(c)]

Ontology Ranking: 

Ontology Ranking Density Measure [22]

Ontology Ranking: 

Ontology Ranking

Outline: 

Outline
- Background and Motivation
- Approach Overview
- Tag Normalization
- Sense Indexing
- Utilizing Ontologies
- Semantic Search and Ranking
- Implementation and Evaluations
- Conclusions
- Demo

System overview: 

System overview
[Architecture diagram with components: Photos with Tags, Queries, Tag Cleanup Module, Sense Indexing Module, Sense Index, Ontology Mapping Module, Ontology Mapping, Ontology Measuring Module, Ontology Measures, Ontology Ranking Module, Ontology Ranks, Query History, Semantic Query Module, Search Engine, Ontologies, Query Result]

Evaluation Measures: 

Evaluation Measures
Compared with Google Desktop on the same datasets:
- how much time a user has to spend to find the required photos
- how many mouse clicks a user needs to find the required photos
- how many different queries a user has to issue to find the required photos
The user may change the query at any time if they feel it necessary.

Evaluations (1): 

Evaluations (1)
Experiment set 1: disambiguation. Datasets:
- 500 photos with the tag "apple"
- 500 photos with the tag "turkey"

Use Case 1: 

Use Case 1 Task 1: find 50 photos about Apple electronic products

Use Case 2: 

Use Case 2 Task 2: find 30 photos about the fruit apple

Use Case 3: 

Use Case 3 Task 3: find 50 photos about the country Turkey

Use Case 4: 

Use Case 4 Task 4: find 10 photos about turkey birds

Evaluations (2): 

Evaluations (2)
Experiment set 2: semantic search.
Datasets: about 300 photos for each of the following tags: Beijing, Madrid, Ottawa, Rome, Seoul, Tokyo, Baltimore, New York, Pittsburgh, Washington D.C., Amsterdam, Florence, Venice, Athens (Greece), Athens (Georgia).
Ontologies:
- an ontology in the travel domain (partially from Realtravel.com)
- a modified AKTiveSA [24] project ontology in the geography domain
- an ontology in the politics domain (partially from SWETO [25])

Use Case 5: 

Use Case 5 Task 5: find up to 5 photos for 5 cities in Europe

Evaluation: 

Evaluation
- The Most-Desired Senses Ranking approach may add time overhead for selecting the most wanted photo sense.
- Changing a query adds time overhead for thinking and typing.
- Overall, users spent significantly less time and effort finding the information they wanted.

Outline: 

Outline
- Background and Motivation
- Approach Overview
- Tag Normalization
- Sense Indexing
- Utilizing Ontologies
- Semantic Search and Ranking
- Implementation and Evaluations
- Conclusions
- Demo

Conclusions: 

Conclusions
We proposed an approach that combines folksonomies and ontologies:
- index Web resources by senses, using sense clusters
- match sense clusters to ontological concepts
- semantic search based on ontological relationships
- the Most-Desired Senses Ranking approach
- ranking of multiple ontologies
Evaluation: users spent significantly less time and effort finding the information they wanted.

Slide82: 

Demo

Slide83: 

Questions and comments

References (1): 

References (1)
[1] Wal, T.V. Folksonomy Coinage and Definition. 2004. Available from: http://vanderwal.net/folksonomy.html.
[2] Gruber, T. Ontology of Folksonomy: A Mash-up of Apples and Oranges. International Journal on Semantic Web and Information Systems, 2007. 3(1).
[3] Halpin, H., V. Robu, and H. Shepherd. The Complex Dynamics of Collaborative Tagging. In WWW '07: Proceedings of the 16th International Conference on World Wide Web. 2007: ACM.
[4] Wal, T.V. Folksonomy Definition and Wikipedia. 2005. Available from: http://www.vanderwal.net/random/entrysel.php?blog=1750.
[5] Mika, P. Ontologies are us: A unified model of social networks and semantics. Journal of Web Semantics, 2007. 5(1): p. 5-15.
[6] Gruber, T.R. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 1993. 5(2): p. 199-220.
[7] Resource Description Framework (RDF). Available from: http://www.w3.org/RDF/.
[8] RDF Vocabulary Description Language 1.0: RDF Schema. 2004. Available from: http://www.w3.org/TR/rdf-schema/.

References (2): 

References (2)
[9] McGuinness, D.L. and F. van Harmelen. OWL Web Ontology Language. 2004. Available from: http://www.w3.org/TR/owl-features/.
[10] Kiryakov, A., et al. Semantic annotation, indexing, and retrieval. Web Semantics: Science, Services and Agents on the World Wide Web, 2004. 2(1): p. 49-79.
[11] Ide, N. and J. Véronis. Word sense disambiguation: The state of the art. Computational Linguistics, 1998. 24(1): p. 1-40.
[12] Wilks, Y. and M. Stevenson. Sense Tagging: Semantic Tagging with a Lexicon. In the SIGLEX Workshop "Tagging Text with Lexical Semantics: What, Why and How?" 1997. Washington, D.C.
[13] Diab, M. and P. Resnik. An Unsupervised Method for Word Sense Tagging using Parallel Corpora. In the 40th Annual Meeting of the Association for Computational Linguistics. 2002. Philadelphia, Pennsylvania.
[14] Molina, A., et al. Word Sense Disambiguation using Statistical Models and WordNet. In the 3rd International Conference on Language Resources and Evaluation. 2002. Las Palmas de Gran Canaria, Spain.
[15] Banerjee, S. and B.P. Mullick. Word Sense Disambiguation and WordNet Technology. Literary and Linguistic Computing, 2007. 22(1): p. 1-15.
[16] Fellbaum, C. WordNet: An Electronic Lexical Database. 1998: The MIT Press.

References (3): 

References (3)
[17] Resnik, P. Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence Research, 1999. 11: p. 95-130.
[18] Lin, D. An Information-Theoretic Definition of Similarity. In International Conference on Machine Learning (ICML). 1998: Madison, Wisconsin, USA.
[19] Sheth, A., et al. Managing Semantic Content for the Web. IEEE Internet Computing, 2002. 6(4): p. 80-87.
[20] Guha, R., R. McCool, and E. Miller. Semantic search. In the 12th International Conference on World Wide Web. 2003.
[21] Arumugam, M., A. Sheth, and I.B. Arpinar. Towards Peer-to-Peer Semantic Web: A Distributed Environment for Sharing Semantic Knowledge on the Web. In International Workshop on Real World RDF and Semantic Web Applications. 2002: Hawaii, USA.
[22] Alani, H. and C. Brewster. Ontology ranking based on the analysis of concept structures. In the 3rd International Conference on Knowledge Capture. 2005.
[23] Zhang, Y., W. Vasconcelos, and D. Sleeman. OntoSearch: An Ontology Search Engine. In the Twenty-fourth SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence. 2004. Cambridge, UK.
[24] AKTiveSA. Available from: http://sa.aktivespace.org/.
[25] Aleman-Meza, B., et al. SWETO: Large-Scale Semantic Web Test-bed. In 16th Int'l Conf. Software Eng. & Knowledge Eng., Workshop on Ontology in Action, Knowledge Systems Inst. 2004.
