logging in or signing up s1350 nambiar Manlio Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 27 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: February 06, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Slide1: Answering Imprecise Queries over Web Databases Ullas Nambiar and Subbarao Kambhampati Department of CS & Engineering Arizona State University VLDB , Aug 30 – Sep 02, 2005, Trondheim, NorwayWhy Imprecise Queries ?: Why Imprecise Queries ? The Imprecise Query Answering Problem: The Imprecise Query Answering Problem Problem Statement: Given a conjunctive query Q over a relation R, find a ranked set of tuples of R that satisfy Q above a threshold of similarity Tsim. Ans(Q) ={x|x Є R, Similarity(Q,x) >Tsim} Constraints: Autonomous Database Data accessible only by querying Data model, operators etc not modifiable Supports boolean model (relation) Existing Approaches: Existing Approaches Similarity search over Vector space Data must be stored as vectors of text WHIRL, W. Cohen, 1998 Enhanced database model Add ‘similar-to’ operator to SQL. Distances provided by an expert/system designer VAGUE, A. Motro, 1998 Support similarity search and query refinement over abstract data types Binderberger et al, 2003 User guidance Users provide information about objects required and their possible neighborhood Proximity Search, Goldman et al, 1998 Limitations: User/expert must provide similarity measures New operators to use distance measures Not applicable over autonomous databasesMotivation & Challenges what we did !: Motivation & Challenges what we did ! Objectives Minimal burden on the end user No changes to existing database Domain independent Motivation Mimic relevance based ranked retrieval paradigm of IR systems Can we learn relevance statistics from database ? Use the estimated relevance model to improve the querying experience of users Challenges Estimating Query-Tuple Similarity Weighted summation of attribute similarities Syntactic similarity inadequate Need to estimate semantic similarity Not enough Ontologies Measuring Attribute Importance Not all attributes equally important Users cannot quantify importanceThe AIMQ approach: The AIMQ approachQuery-Tuple Similarity: Query-Tuple Similarity Tuples in extended set show different levels of relevance Ranked according to their similarity to the corresponding tuples in base set using n = Count(Attributes(R)) and Wimp is the importance weight of the attribute Euclidean distance as similarity for numerical attributes e.g. Price, Year VSim – semantic value similarity estimated by AIMQ for categorical attributes e.g. Make, ModelDeciding Attribute Order: Deciding Attribute Order Mine AFDs and Approximate Keys Create dependence graph using AFDs Strongly connected hence a topological sort not possible Using Approximate Key with highest support partition attributes into Deciding set Dependent set Sort the subsets using dependence and influence weights Measure attribute importance as Attribute relaxation order is all non-keys first then keys Greedy multi-attribute relaxationEmpirical Evaluation: Empirical Evaluation Goal Test robustness of learned dependencies Evaluate the effectiveness of the query relaxation and similarity estimation Database Used car database CarDB based on Yahoo Autos CarDB( Make, Model, Year, Price, Mileage, Location, Color) Populated using 100k tuples from Yahoo Autos Algorithms AIMQ RandomRelax – randomly picks attribute to relax GuidedRelax – uses relaxation order determined using approximate keys and AFDs ROCK: RObust Clustering using linKs (Guha et al, ICDE 1999) Compute Neighbours and Links between every tuple Neighbour – tuples similar to each other Link – Number of common neighbours between two tuples Cluster tuples having common neighbours Robustness of Dependencies: Robustness of Dependencies Attribute dependence order & Key quality is unaffected by sampling Robustness of Value Similarities: Robustness of Value SimilaritiesEfficiency of Relaxation: Efficiency of Relaxation Random Relaxation Guided RelaxationAccuracy over CarDB: Accuracy over CarDB 14 queries over 100K tuples Similarity learned using 25k sample Mean Reciprocal Rank (MRR) estimated as Overall high MRR shows high relevance of suggested answersAIMQ - Summary: AIMQ - Summary An approach for answering imprecise queries over Web database Mine and use AFDs to determine attribute order Domain independent semantic similarity estimation technique Automatically compute attribute importance scores Empirical evaluation shows Efficiency and robustness of algorithms Better performance than current approaches High relevance of suggested answers Domain independence You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
s1350 nambiar Manlio Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 27 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: February 06, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Slide1: Answering Imprecise Queries over Web Databases Ullas Nambiar and Subbarao Kambhampati Department of CS & Engineering Arizona State University VLDB , Aug 30 – Sep 02, 2005, Trondheim, NorwayWhy Imprecise Queries ?: Why Imprecise Queries ? The Imprecise Query Answering Problem: The Imprecise Query Answering Problem Problem Statement: Given a conjunctive query Q over a relation R, find a ranked set of tuples of R that satisfy Q above a threshold of similarity Tsim. Ans(Q) ={x|x Є R, Similarity(Q,x) >Tsim} Constraints: Autonomous Database Data accessible only by querying Data model, operators etc not modifiable Supports boolean model (relation) Existing Approaches: Existing Approaches Similarity search over Vector space Data must be stored as vectors of text WHIRL, W. Cohen, 1998 Enhanced database model Add ‘similar-to’ operator to SQL. Distances provided by an expert/system designer VAGUE, A. Motro, 1998 Support similarity search and query refinement over abstract data types Binderberger et al, 2003 User guidance Users provide information about objects required and their possible neighborhood Proximity Search, Goldman et al, 1998 Limitations: User/expert must provide similarity measures New operators to use distance measures Not applicable over autonomous databasesMotivation & Challenges what we did !: Motivation & Challenges what we did ! Objectives Minimal burden on the end user No changes to existing database Domain independent Motivation Mimic relevance based ranked retrieval paradigm of IR systems Can we learn relevance statistics from database ? Use the estimated relevance model to improve the querying experience of users Challenges Estimating Query-Tuple Similarity Weighted summation of attribute similarities Syntactic similarity inadequate Need to estimate semantic similarity Not enough Ontologies Measuring Attribute Importance Not all attributes equally important Users cannot quantify importanceThe AIMQ approach: The AIMQ approachQuery-Tuple Similarity: Query-Tuple Similarity Tuples in extended set show different levels of relevance Ranked according to their similarity to the corresponding tuples in base set using n = Count(Attributes(R)) and Wimp is the importance weight of the attribute Euclidean distance as similarity for numerical attributes e.g. Price, Year VSim – semantic value similarity estimated by AIMQ for categorical attributes e.g. Make, ModelDeciding Attribute Order: Deciding Attribute Order Mine AFDs and Approximate Keys Create dependence graph using AFDs Strongly connected hence a topological sort not possible Using Approximate Key with highest support partition attributes into Deciding set Dependent set Sort the subsets using dependence and influence weights Measure attribute importance as Attribute relaxation order is all non-keys first then keys Greedy multi-attribute relaxationEmpirical Evaluation: Empirical Evaluation Goal Test robustness of learned dependencies Evaluate the effectiveness of the query relaxation and similarity estimation Database Used car database CarDB based on Yahoo Autos CarDB( Make, Model, Year, Price, Mileage, Location, Color) Populated using 100k tuples from Yahoo Autos Algorithms AIMQ RandomRelax – randomly picks attribute to relax GuidedRelax – uses relaxation order determined using approximate keys and AFDs ROCK: RObust Clustering using linKs (Guha et al, ICDE 1999) Compute Neighbours and Links between every tuple Neighbour – tuples similar to each other Link – Number of common neighbours between two tuples Cluster tuples having common neighbours Robustness of Dependencies: Robustness of Dependencies Attribute dependence order & Key quality is unaffected by sampling Robustness of Value Similarities: Robustness of Value SimilaritiesEfficiency of Relaxation: Efficiency of Relaxation Random Relaxation Guided RelaxationAccuracy over CarDB: Accuracy over CarDB 14 queries over 100K tuples Similarity learned using 25k sample Mean Reciprocal Rank (MRR) estimated as Overall high MRR shows high relevance of suggested answersAIMQ - Summary: AIMQ - Summary An approach for answering imprecise queries over Web database Mine and use AFDs to determine attribute order Domain independent semantic similarity estimation technique Automatically compute attribute importance scores Empirical evaluation shows Efficiency and robustness of algorithms Better performance than current approaches High relevance of suggested answers Domain independence