Share PowerPoint. Anywhere!

icde00

Uploaded from authorPOINT Lite
Download as Download Not Available PPT
Presentation Description

No description available

Like authorSTREAM?


You can vote once a day till December
10th, Vote Now!
Views: 2
Like it  ( Likes) Dislike it  ( Dislikes)
Added: December 18, 2007 This presentation is Public
Presentation Category :Entertainment
Presentation StatisticsNew!
Views on authorSTREAM: 2
Presentation Transcript

Query Planning with Limited Source Capabilities : Query Planning with Limited Source Capabilities Chen Li Stanford University Edward Y. Chang University of California, Santa Barbara


Motivation : Heterogeneous information sources on the WWW Information-integration systems Limited query capabilities Music stores: amazon.com, cdnow.com. Must specify a value of Artist or Title. The sources do not answer queries such as “Give me all your information about CDs.” Motivation


Example : Query: “Find the prices of CDs containing a song titled Friends.” Example


Source tuples : Source tuples Not all the tuples could be retrieved from the sources due to the restrictions.


Slide5 : Traditional approach: consider each join at a time.


Slide6 : Our approach: retrieve as many tuples as possible. X X X X This approach could save the user $15 - $10 = $5!


Observations : Access views not in a join to retrieve bindings; Recursive process; Some tuples in the answer cannot be retrieved. Observations


Questions : How to compute the maximal answer? When should we access sources not in a query? What sources should be accessed? Questions


Source views : Source views A set of source views V with binding patterns: b: a value must be specified for the attribute f: free Each view schema uses a set of global attributes


Queries : A query Q includes: Input attributes: I; Output attributes: O. Queries


Connections : Connection: a set of views that connect I and O in Q. Meaning: natural join of the views. Universal-relation-like assumptions, but connections can be generated in various ways. Connections


Question 1: Computing the maximal answer : Question 1: Computing the maximal answer Translate a query and source views into a Datalog program. Borrowed the idea from Duschka and Levy [IJCAI-97]. We eliminate useless source accesses. Why Datalog programs? Recursion.


Constructing program (Q,V) : Constructing program (Q,V) Connection rules: ans(P) :- V1(s1, C) & V2 (C, A, P) ans(P) :- V1(s1, C) & V3 (C, A, P) Fact rule: song(s1) :-


Slide14 : Binding assumptions: A binding for an attribute is from the attribute’s domain; Do not allow the “strategy” of trying all the possible strings to “test” the source (may not terminate); Any binding is either obtained from the query, or from a tuple returned by a source query. The program (Q,V) computes the maximal answer.


Question 2: when to access off-query sources? : Query: Input: A = a1 Output: D = ? Connections: T1 = {v1,v3}, T2 = {v2,v3} Not all the views need to accessed. Question 2: when to access off-query sources?


Independent connections : Independent connections A connection T is independent if all the views in T can be queried starting from the input attributes as the initial bindings and using only the views in T. Theorem: off-connection source accesses are only necessary for nonindependent connections.


Question 3: what sources should be accessed? : A view v is relevant to connection T if we may miss some answers to T when v is not used. How to find all the relevant views of a nonindependent connection? Question 3: what sources should be accessed?


Kernel : Kernel A kernel of a connection is a minimal set of attributes that need to be initially bound in addition to the input attributes to query the full connection. A connection may have multiple kernels.


Algorithm FIND_REL: Finding relevant views of a connection : Algorithm FIND_REL: Finding relevant views of a connection Find all the relevant views of connection T2 = {v2,v3}: (1) Compute queryable views: {v1,v2 ,v3,v4,v5}; (2) Find a kernel K of T2 : K = {C}; (4) Return R  T2 = {v1,v2 ,v3 ,v4}. (3) Compute all the views that can help produce bindings for the attributes in K: R = {v1,v2 ,v4} ;


Constructing an efficient program : Constructing an efficient program Compute the relevant views for each connection; Take the union of all these relevant source views; Use these views to construct a new program; Remove useless rules.


Conclusions : Conclusions A query-planning framework to compute the maximal answer to a query (Duschka and Levy [IJCAI-97]). Techniques for telling when to access off-query views; Algorithms: finding all the relevant sources for a query; constructing an efficient program.


Other related work : Other related work Rajaraman, Sagiv, and Ullman [PODS-95]: Shows how to find an equivalent query rewriting using views with binding restrictions; We give the maximal rewriting of a query. Optimizing conjunctive queries with binding restrictions: Yerneni, Li, Garcia-Molina, and Ullman [ICDT-99]; Florescu et al. [SIGMOD-99]. Testing connection containment: Li [Stanford-CS-TR 2000], using results of monadic programs to prove the problem is decidable.


Predicates : Predicates EDB predicates IDB predicates v1(S, C) V1 (S, C) v2(C, A,P) V2 (C, A, P) v3(C, A, P) V3 (C, A, P) cd(C) song(S) artist(A) price(P) ans(P)


Evaluating program (Q,V) : Evaluating program (Q,V) Assume the right side of an -rule or a domain rule is: domA1(A1), …, domAp(Ap), vi(A1,…, Am) Once we have bindings for domA1(A1), …, domAp(Ap), evaluate the rule and populate the domain predicates and -predicate. Repeat until no more facts can be derived. Compute the maximal answer to the query.


Forward-closure : Forward-closure Given views W  V, and attributes X, the forward-closure of X given W, denoted f-closure(X,W), is the the set of views in W that can be eventually queried by using the views in W, starting from the initial bindings X.


Backward-closure : Backward-closure of a set of attributes X: b-closure(X), is the set of views that can help retrieve bindings for X. Backward-closure Lemma: All backward-closures of a connection are the same. b-closure(C) = {v1,v2,v4}


BF-chain, backward-closure : BF-chain: Backward-closure: BF-chain, backward-closure b-closure(C) = {v1,v2,v4}


Other possibilities of obtaining bindings : Other possibilities of obtaining bindings Cached data: For a cached tuple ti(a1,a2) for view vi(A1,A2), add the following rules to the program (Q, V): vi(a1,a2) :- domA1(a1) :- domA2(a2) :- Domain knowledge: student(name, dept, GPA). dept = CS, Physics, Chemistry, etc.


Computing a partial answer : Computing a partial answer Independent connections: complete answers are computable. Nonindependent connections: access some relevant views. May terminate evaluating the program after some results are computed.