Workshop WebDB 2000b

Uploaded from authorPOINTLite
Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Partial Query-Evaluation in Internet Query Engines: 

Partial Query-Evaluation in Internet Query Engines Jayavel Shanmugasundaram Kristin Tufte David DeWitt David Maier Jeffrey Naughton University of Wisconsin & Oregon Graduate Institute

Outline: 

Outline Motivation Desired Operator Properties Implementation Alternatives Performance Evaluation Conclusion

Querying the WWW: The Present: 

Querying the WWW: The Present Who won the Nobel prize for Physics in 1999?

Querying the WWW: The Present: 

Querying the WWW: The Present Want 1998 Red BMW No accidents 20% < avg. model price HTML File HTML File HTML File HTML File HTML File HTML File HTML File

Querying the WWW: The Future?: 

Querying the WWW: The Future? Want 1998 Red BMWNo accidents 20% < avg. model price

Inside the Internet Query Engine: 

Inside the Internet Query Engine (carId, model, price, otherinfo) Red Used BMW Cars (carId, model, price, otherinfo)

The Problem: 

The Problem Return results to users as soon as possible “Results so far” for queries with blocking operators Arbitrary blocking operators Not exists, Average, Nest … Blocking operators occurring anywhere in the query Potentially intermixed with non-blocking operators

Outline: 

Outline Motivation Desired Operator Properties Implementation Alternatives Performance Evaluation Conclusion

What is a Partial Result of a Query?: 

What is a Partial Result of a Query? Let Full Result of Query Q on Inputs A and B be: Q(A, B) Then Partial Result of Query Q on Inputs A and B is: Q(PA, PB) PA  A PB  B

Maximal Output Property: 

Maximal Output Property Produce “correct” results as soon as possible Why? If query is non-blocking Produces results soon If query is blocking Return “non-blocking parts” soon (e.g., outer join)

Inside the Internet Query Engine: 

Inside the Internet Query Engine (carId, model, price, otherinfo) Red 1998 BMW Cars Accident Reports (carId) (carId, model, price, otherinfo)

Anytime Property: 

Anytime Property Blocking operators should be able to return the “result so far” at any time Why? User can request partial results at any time

Inside the Internet Query Engine: 

Inside the Internet Query Engine (carId, model, price, otherinfo) Red 1998 BMW Cars Accident Reports (carId) (carId, model, price, otherinfo)

Non-Monotonic Input/Output Property: 

Non-Monotonic Input/Output Property Operators should handle “changes”, not just additions to input Similarly, operators should produce “changes”, not just additions to output Both blocking and non-blocking operators Why? Partial results may represent “wrong” answers Need to be corrected later

Inside the Internet Query Engine: 

Inside the Internet Query Engine (carId, model, price, otherinfo) Red 1998 BMW Cars Accident Reports (carId) (carId, model, price, otherinfo)

Flexible Input Property: 

Flexible Input Property Should be able to process data from any input at any time Processes data as it becomes available Why? If query is non-blocking: Can return results soon If query is blocking Faster partial result response time

A Note on Partial Result Accuracy: 

A Note on Partial Result Accuracy Focus is on producing partial results Architecture is general enough to exploit existing techniques Online aggregation [Hellerstein et. al.] Nested aggregates [Tan et. al.] Accuracy for general blocking operators?

Outline: 

Outline Motivation Desired Operator Properties Implementation Alternatives Performance Evaluation Conclusion

Where do we start?: 

Where do we start? Use known flexible input, maximal output operator implementations Non-blocking: select, symmetric hash join, Xjoin Blocking: group-by, symmetric outer join Blocking operator implementations should satisfy anytime property All operator implementations should satisfy non-monotonic input/output property

Non-Monotonic Input/Output: 

Non-Monotonic Input/Output Re-evaluation Approach: On partial result request, compute results “so far” Then forget all “potentially incorrect” inputs Differential Approach: On partial result request, compute results “so far” “Update” incorrect inputs for future result computation

Inside the Internet Query Engine: 

Inside the Internet Query Engine (carId, model, price, otherinfo) Red 1998 BMW Cars Accident Reports (carId) (carId, model, price, otherinfo)

Re-evaluation Join: 

Re-evaluation Join (1, Z3, 10000) (Z3, 15000) (19, Z3, 20000) (5, 400i, 30000) (400i, 25000) (1, Z3, 10000) (19, Z3, 20000) (5, 400i, 30000) (Z3, 15000) (400i, 25000) (3, 400i, 20000) (3, 400i, 20000)

Re-evaluation Join: 

Re-evaluation Join (1, Z3, 10000) (19, Z3, 20000) (5, 400i, 30000) (3, 400i, 20000) (8, 400i, 20000) (8, 400i, 20000) (Z3, 15000) (Z3, 15000) (400i, 23333) (400i, 23333)

Differential Join: 

Differential Join (1, Z3, 10000) (Z3, 15000) (19, Z3, 20000) (5, 400i, 30000) (400i, 25000) (1, Z3, 10000) (19, Z3, 20000) (5, 400i, 30000) (Z3, 15000) (400i, 25000) (3, 400i, 20000) (3, 400i, 20000)

Differential Join: 

Differential Join (1, Z3, 10000) (19, Z3, 20000) (5, 400i, 30000) (Z3, 15000) (400i, 25000) (3, 400i, 20000) update (400i, 23333)

Differential Join: 

Differential Join (1, Z3, 10000) (19, Z3, 20000) (5, 400i, 30000) (Z3, 15000) (400i, 23333) (3, 400i, 20000) (8, 400i, 20000) (8, 400i, 20000)

Re-evaluation vs. Differential: 

Re-evaluation vs. Differential Re-evaluation Approach: Simple – just “forget” partial inputs Easier to extend (no changes to tuple structure) Unnecessary computation Differential Approach: Need to handle deletions/updates of inputs Changes to tuple structure Re-computes only what is necessary

Outline: 

Outline Motivation Desired Operator Properties Implementation Alternatives Performance Evaluation Conclusion

Response Time: 

Response Time

Outline: 

Outline Motivation Desired Operator Properties Implementation Alternatives Performance Evaluation Conclusion

Conclusion: 

Conclusion New properties for query engine operators Operator implementation alternatives Re-evaluation Differential Evaluation Partial results improve response time Re-evaluation approach is simpler Differential approach is more efficient

Future Work: 

Future Work General GUI Partial result accuracy for general blocking operators Changes at finer granularities Consistent partial results

Related Work: 

Related Work Online aggregation [Hellerstein et. al.] Nested aggregates [Tan et. al.] Online reordering [Raman et. al.] Symmetric hash join [Wilschut et. al.] Adaptive operators [Ives et. al.] XJoin [Urhan et. al.]