Challenges in Querying Autonomous Databases:
Imprecise Queries
User’s needs are not clearly defined, hence:
Queries may be too general
Queries may be too specific
General Solution: “Expected Relevance Ranking”
Challenge: Automated & non-intrusive assessment of the Relevance and Density functions
Incomplete Data
Databases are often populated by:
Lay users entering data
Automated extraction
However, how can we retrieve similar/incomplete tuples in the first place?
Challenge: Rewriting a user’s query to retrieve highly relevant similar/incomplete tuples
Once the similar/incomplete tuples have been retrieved, why should users believe them?
Challenge: Providing explanations for the uncertain answers in order to gain the user’s trust
Expected Relevance Ranking Model:
Problem:
How to automatically and non-intrusively assess the Relevance & Density functions?
Estimating Relevance (R):
Learn relevance for the user population as a whole in terms of value similarity
Sum of weighted similarity for each constrained attribute
Content Based Similarity (mined from probed sample using SuperTuples)
Co-click Based Similarity (Yahoo Autos recommendations)
Co-occurrence Based Similarity (GoogleSets)
Estimating Density (P):
Learn density for each attribute independent of the other attributes
AFDs used for feature selection
AFD-Enhanced NBC Classifiers
AFDs play a role in:
Attribute Importance
Feature Selection
Query Rewriting
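The two estimates on this slide can be sketched together as follows. This is a minimal illustration under stated assumptions, not QUIC's implementation: the toy car sample, the attribute weights, the value-similarity numbers, and the rule that an incomplete tuple is scored by its relevance averaged over the predicted values of its missing attribute are all made up for the example. The density estimate reads "NBC" as a Naive Bayes classifier with Laplace smoothing, and the hand-picked feature list stands in for AFD-based feature selection.

```python
from collections import Counter

# Hypothetical sample of complete tuples (stand-in for a probed sample).
SAMPLE = [
    {"Make": "Honda", "Body": "sedan", "Model": "Civic"},
    {"Make": "Honda", "Body": "sedan", "Model": "Accord"},
    {"Make": "Honda", "Body": "coupe", "Model": "Civic"},
    {"Make": "Honda", "Body": "sedan", "Model": "Civic"},
    {"Make": "Toyota", "Body": "sedan", "Model": "Camry"},
    {"Make": "Toyota", "Body": "sedan", "Model": "Corolla"},
]

# Assumed value similarities and attribute weights (illustrative numbers only).
VALUE_SIM = {("Civic", "Accord"): 0.6, ("Civic", "Corolla"): 0.7, ("Civic", "Camry"): 0.4}
WEIGHTS = {"Model": 1.0}


def value_similarity(a, b):
    """Look up a symmetric similarity between two attribute values."""
    if a == b:
        return 1.0
    return VALUE_SIM.get((a, b), VALUE_SIM.get((b, a), 0.0))


def relevance(query, tup):
    """Weighted sum of value similarities over the constrained attributes."""
    total = sum(WEIGHTS[attr] for attr in query)
    score = sum(WEIGHTS[attr] * value_similarity(want, tup[attr])
                for attr, want in query.items())
    return score / total


def density(target, features, tup, sample=SAMPLE):
    """Naive Bayes estimate of P(target = v | features of tup), Laplace-smoothed."""
    classes = Counter(row[target] for row in sample)
    scores = {}
    for value, count in classes.items():
        p = count / len(sample)  # class prior
        rows = [r for r in sample if r[target] == value]
        for f in features:
            n_values = len({r[f] for r in sample})
            matches = sum(1 for r in rows if r[f] == tup[f])
            p *= (matches + 1) / (count + n_values)
        scores[value] = p
    norm = sum(scores.values())
    return {v: s / norm for v, s in scores.items()}


def expected_relevance(query, tup, features):
    """Relevance averaged over likely values of a missing constrained attribute."""
    missing = [a for a in query if tup.get(a) is None]
    if not missing:
        return relevance(query, tup)
    attr = missing[0]  # this sketch handles a single missing attribute only
    dist = density(attr, features, tup)
    return sum(p * relevance(query, {**tup, attr: v}) for v, p in dist.items())


query = {"Model": "Civic"}
incomplete = {"Make": "Honda", "Body": "coupe", "Model": None}
print(expected_relevance(query, incomplete, ["Make", "Body"]))
```

With these toy numbers, a Honda coupe whose Model is missing scores higher for Model=Civic than a complete Accord tuple does, which is the kind of ordering an expected-relevance ranking is meant to produce.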
Retrieving Relevant Answers via Query Rewriting:
Problem:
How to rewrite a query to retrieve answers which are highly relevant to the user?
Given a query Q: (Model=Civic), retrieve all the relevant tuples
Retrieve certain answers, namely tuples t1 and t6
Given an AFD, rewrite the query using the determining set attributes in order to retrieve possible answers
Q1’: Make=Honda ∧ Body Style=coupe
Q2’: Make=Honda ∧ Body Style=sedan
Thus we retrieve:
Certain Answers
Incomplete Answers
Similar Answers
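The rewriting step on this slide can be sketched roughly as follows, assuming an AFD {Make, Body Style} ~> Model and a small in-memory table in place of an autonomous web database. Only the query Q, the idea of rewriting on the determining set, the certain answers t1 and t6, and the two rewritten queries come from the slide. The table contents, the `select` helper, and the precision-based ordering of the rewritten queries are hypothetical choices made for illustration.

```python
# Hypothetical table; None marks a missing value.
# Only t1 and t6 match Model=Civic exactly.
CARS = {
    "t1": {"Make": "Honda", "Model": "Civic", "Body Style": "coupe"},
    "t2": {"Make": "Honda", "Model": None, "Body Style": "coupe"},
    "t3": {"Make": "Toyota", "Model": "Camry", "Body Style": "sedan"},
    "t4": {"Make": "Honda", "Model": None, "Body Style": "sedan"},
    "t5": {"Make": "Honda", "Model": "Accord", "Body Style": "sedan"},
    "t6": {"Make": "Honda", "Model": "Civic", "Body Style": "sedan"},
}
DETERMINING_SET = ("Make", "Body Style")  # assumed AFD: {Make, Body Style} ~> Model


def select(table, constraints):
    """Return ids of tuples whose values equal every constraint."""
    return [tid for tid, row in table.items()
            if all(row.get(a) == v for a, v in constraints.items())]


def rewrite(table, query, determining_set):
    """Build one rewritten query per distinct determining-set combination
    found among the certain answers."""
    certain = select(table, query)
    rewritten = []
    for tid in certain:
        q = {a: table[tid][a] for a in determining_set}
        if q not in rewritten:
            rewritten.append(q)
    return certain, rewritten


def precision(table, rewritten_query, query):
    """Fraction of complete tuples matching the rewritten query that also
    satisfy the original (single-attribute) query."""
    attr, want = next(iter(query.items()))
    known = [tid for tid in select(table, rewritten_query)
             if table[tid][attr] is not None]
    if not known:
        return 0.0
    return sum(table[tid][attr] == want for tid in known) / len(known)


def possible_answers(table, query, determining_set):
    """Issue rewritten queries in order of estimated precision and collect
    the tuples that were not already certain answers."""
    certain, rewritten = rewrite(table, query, determining_set)
    ranked = sorted(rewritten, key=lambda q: precision(table, q, query), reverse=True)
    extra = []
    for q in ranked:
        for tid in select(table, q):
            if tid not in certain and tid not in extra:
                extra.append(tid)
    return certain, ranked, extra


certain, ranked, extra = possible_answers(CARS, {"Model": "Civic"}, DETERMINING_SET)
print("certain:", certain)
print("rewritten queries:", ranked)
print("possible answers:", extra)
```

In this toy table the extra tuples include rows with a missing Model (incomplete answers) and an Accord row (a similar answer), mirroring the three answer types listed on the slide.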
Explaining Results to Users:
Problem:
How to gain users’ trust when showing them similar/incomplete tuples?
View Live QUIC Demo
Empirical Evaluation:
2 User Studies (10 users, data extracted from Yahoo Autos)
Ranking Order User Study:
14 queries & ranked lists of uncertain tuples
Asked to mark the relevant tuples
R-Metric used to determine ranking quality
Similarity Metric User Study:
Each user shown 30 lists
Asked which list is most similar
Users found Co-click to be the most similar to their personal relevance function
Query Rewriting Evaluation:
Measure inversions between rank of query and actual rank of tuples
By ranking the queries, we are able to (with relatively good accuracy) retrieve tuples in order of their relevance to the user
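The inversion measure used in the query rewriting evaluation can be illustrated with a short sketch. It assumes that each retrieved tuple carries the rank of the rewritten query that produced it, plus a separately obtained "actual" relevance rank. The function name and the sample ranks are hypothetical, and the slide does not give the study's exact protocol.

```python
from itertools import combinations


def count_inversions(query_ranks, actual_ranks):
    """Count tuple pairs whose order by query rank contradicts
    their order by actual rank."""
    if len(query_ranks) != len(actual_ranks):
        raise ValueError("rank lists must have the same length")
    inversions = 0
    for i, j in combinations(range(len(query_ranks)), 2):
        by_query = query_ranks[i] - query_ranks[j]
        by_actual = actual_ranks[i] - actual_ranks[j]
        if by_query * by_actual < 0:  # strictly opposite order; ties are not counted
            inversions += 1
    return inversions


# Hypothetical example: five tuples retrieved by three ranked queries.
query_ranks = [1, 1, 2, 2, 3]
actual_ranks = [1, 3, 2, 5, 4]
total_pairs = len(query_ranks) * (len(query_ranks) - 1) // 2
print(count_inversions(query_ranks, actual_ranks), "inversions out of", total_pairs, "pairs")
```

Fewer inversions relative to the number of pairs means the order in which rewritten queries are issued tracks the tuples' actual relevance order more closely.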
Conclusion:
QUIC is able to handle both imprecise queries and incomplete data over autonomous databases
Through automatic and non-intrusive assessment of the relevance and density functions, QUIC is able to rank tuples in order of their expected relevance to the user
By rewriting the original user query, QUIC is able to efficiently retrieve both similar and incomplete answers to a query
By providing users with an explanation as to why they are being shown answers which do not exactly match the query constraints, QUIC is able to gain the user’s trust
http://styx.dhcp.asu.edu:8080/QUICWeb