21 LookingGlass

Uploaded from authorPOINTLite
Views:
 
Category: Education
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Adaptive Query Processing in the Looking Glass: 

Adaptive Query Processing in the Looking Glass Shivnath Babu (Stanford Univ.) Pedro Bizarro (Univ. of Wisconsin, Madison)

Adaptive Query Processing (AQP) Systems: Publication Timeline: 

Adaptive Query Processing (AQP) Systems: Publication Timeline … 1976 1977 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Parametric opt. RedBrick DEC-Rdb Query Scrambling Re-Opt Tukwila River DQE Conquest Expected cost opt. Pipeline sch. Memory adap. POP CAPE Corrective processing Eddies NiagaraCQ STREAM Ingres Introduction

Motivation: 

Motivation Plenty of recent work on Adaptive Query Processing (AQP) in different contexts Conventional DBMS query processing, data integration, continuous queries in stream systems No exhaustive, in-depth categorization and comparison of AQP systems to date Difficult to answer questions like: Will techniques from one system work on another? What are the shortcomings of each system? Which system is best for a new application domain? Introduction

Our Contributions: 

Our Contributions Detailed study of current AQP systems Classification of AQP systems into 3 families Comparison across families in terms of AQP tasks Identification of shortcomings & new approaches to address them Introduction

Roadmap: 

Roadmap Introduction to AQP The three AQP system families Comparison across families in terms of AQP tasks Summary of what we learned

Primer on Traditional Query Processing: 

Primer on Traditional Query Processing Optimizer: Chooses best plan Query Introduction

Need for Adaptive Query Processing: 

Need for Adaptive Query Processing Introduction Correlated & skewed data distributions Errors in stats estimates, optimizer mistakes Detect plan suboptimality, re-optimize Stats & system conditions may change while query is running Monitor for changes, re-optimize Continuous queries, long-running queries AQP is integral to the current CS-wide push towards autonomic computing

Our Focus: AQP for a Single Query: 

Our Focus: AQP for a Single Query Introduction AQP System: A system that interleaves the optimization and execution aspects of query processing, possibly multiple times, during the processing of a single query

Roadmap: 

Roadmap Introduction to AQP The three AQP system families Comparison across families in terms of AQP tasks Summary of what we learned

AQP System Families: 

AQP System Families Plan-based AQP systems AQP for traditional plan-based DBMSs Continuous-Query-based (CQ-based) AQP systems AQP for long-running continuous queries over data streams Routing-based AQP systems AQP for DBMSs and continuous queries based on adaptive tuple routing AQP Families

AQP in Plan-based Systems: 

AQP in Plan-based Systems Optimizer: Chooses best plan Query Executor: Runs chosen plan Chosen plan Statistics Tracker: Creates/updates stats Runstats AQP Families

AQP in Plan-based Systems: 

AQP in Plan-based Systems Optimizer: Chooses best plan Query Executor: Runs chosen plan Chosen plan Statistics Tracker: Creates/updates stats Runstats AQP Families

Example Plan-based AQP Systems: 

Example Plan-based AQP Systems … 1976 1977 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Parametric opt. RedBrick DEC-Rdb Query Scrambling Re-Opt Tukwila River DQE Conquest Expected cost opt. Pipeline sch. Memory adap. POP CAPE Corrective processing Eddies NiagaraCQ STREAM Ingres AQP Families

Primer on Continuous Query Processing: 

Primer on Continuous Query Processing Continuous Queries (CQs) are long-running queries usually over data streams Example CQ: Filtering packet streams Stream properties or system conditions may change while query is running  best plan may change σ1 σ2 σ3 Packets Chosen packets AQP Families

AQP in CQ-based Systems: 

AQP in CQ-based Systems Optimizer: Chooses best plan Query AQP Families Statistics Tracker: Creates/updates stats Runstats

AQP in CQ-based Systems: 

AQP in CQ-based Systems Optimizer: Chooses best plan Continuous Query AQP Families Catalog (stream rates, data distr.) Statistics Tracker: Monitors stream stats and system conditions

AQP in CQ-based Systems: 

AQP in CQ-based Systems Optimizer: Ensures that plan is best for current stats Continuous Query AQP Families Catalog (stream rates, data distr.) Statistics Tracker: Monitors stream stats and system conditions Uses stats to cost plans

AQP in CQ-based Systems: 

AQP in CQ-based Systems Continuous Query AQP Families Catalog (stream rates, data distr.) Statistics Tracker: Monitors stream stats and system conditions Stats to track Re-optimize Uses stats to cost plans Optimizer: Ensures that plan is best for current stats

Example CQ-based AQP Systems: 

… 1976 1977 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Parametric opt. RedBrick DEC-Rdb Query Scrambling Re-Opt Tukwila River DQE Conquest Expected cost opt. Pipeline sch. Memory adap. POP CAPE Corrective processing Eddies NiagaraCQ STREAM Ingres Example CQ-based AQP Systems AQP Families

Primer on Routing-based Processing: 

Primer on Routing-based Processing Non-plan-based architecture where tuples are routed individually through operators No optimizer Exemplified by Eddies [AH00] AQP Families

AQP in Routing-based Systems: 

AQP in Routing-based Systems AQP Families Optimizer: Chooses best plan Query

AQP in Routing-based Systems: 

AQP in Routing-based Systems Tuple Router: Integrated Optimizer & Stats Tracker Query or Continuous Query AQP Families

Slide23: 

… 1976 1977 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Parametric opt. RedBrick DEC-Rdb Query Scrambling Re-Opt Tukwila River DQE Conquest Expected cost opt. Pipeline sch. Memory adap. POP CAPE Corrective processing Eddies NiagaraCQ STREAM Ingres Example Routing-based AQP Systems AQP Families

Roadmap: 

Roadmap Introduction to AQP The three AQP system families Comparison across families in terms of AQP tasks Summary of what we learned

Comparison Across AQP System Families: 

Comparison Across AQP System Families Goal: To bring out AQP algorithms and features, not performance numbers Comparison Models, assumptions, and approach Techniques for tracking statistics Re-optimization subtasks When and how to re-optimize Switching between plans Pros & cons of using a conventional optimizer Performance issues Quality of re-optimization Run-time overhead & thrashing Scalability

Comparison Across AQP System Families: 

Comparison Across AQP System Families Goal: To bring out AQP algorithms and features, not performance numbers Comparison Models, assumptions, and approach Techniques for tracking statistics Re-optimization subtasks When and how to re-optimize Switching between plans Pros & cons of using a conventional optimizer Performance issues Quality of re-optimization Run-time overhead & thrashing Scalability

Techniques for Tracking Statistics: 

Techniques for Tracking Statistics Observation Mostly in Plan-based systems Competition Mostly in Plan-based systems Profiling Mostly in CQ-based systems Exploration In Routing-based systems Comparison

Tracking Statistics: Observation [KD98]: 

Tracking Statistics: Observation [KD98] Collect statistics on operator behavior or intermediate subexpressions in a plan Comparison

Tracking Statistics: Competition [A93]: 

Tracking Statistics: Competition [A93] Extra processing to collect statistics Comparison

Tracking Statistics: Profiling [BMM+04]: 

Tracking Statistics: Profiling [BMM+04] Extra processing on a fraction of the input tuples (e.g., a random sample) to collect statistics Builds a “statistical profile” that can be used to estimate many individual statistics Comparison σ1 σ2 σ3 Profiled tuples

Tracking Statistics: Exploration [AH00]: 

Tracking Statistics: Exploration [AH00] A fraction of tuples are routed along routes different from the current best route to track statistics along those routes No redundant processing Comparison σ1 σ2 σ3 Packets Chosen packets Tuple Router

Comparing Statistics-Tracking Techniques: Extra Overhead Introduced: 

Comparing Statistics-Tracking Techniques: Extra Overhead Introduced Comparison Increasing overhead Observation Exploration (inefficient routes for some tuples) Profiling (extra processing on some tuples) Competition (lots of extra work)

Comparing Statistics-Tracking Techniques: Coverage of Different Statistics: 

Comparing Statistics-Tracking Techniques: Coverage of Different Statistics Comparison Increasing coverage Observation & Competition (limited by plan) Exploration (limited by large number of routes) Profiling (highest since it builds statistics profile)

Comparing Statistics-Tracking Techniques: Accuracy of Estimation: 

Comparing Statistics-Tracking Techniques: Accuracy of Estimation Comparison Increasing accuracy Observation & Competition Exploration (but, susceptible to routing bias) Profiling (depends on sampling fraction)

Roadmap: 

Roadmap Introduction to AQP The three AQP system families Comparison across families in terms of AQP tasks Summary of what we learned

What have we learned? (1): 

What have we learned? (1) Many similarities in internals of different AQP families Can re-use many current (and new) AQP techniques across families Ex: Profiling from CQ-based systems Enables, e.g., faster detection of plan suboptimality in Plan-based systems Generates more accurate statistics at lower cost in Routing-based systems New Ideas

What have we learned? (2): 

What have we learned? (2) Current AQP systems are reactive E.g., do not consider sensitivity to errors/changes in stats New Ideas Proactive Re-optimization

What have we learned? (3): 

What have we learned? (3) Challenging meta problems in AQP for continuous queries need to be addressed Larger and more complex plan spaces  higher costs for statistics tracking and re-optimization Tracking “Return-of-Investment” on AQP Avoiding thrashing, e.g., on bursty changes in statistics New Ideas Proposal: Plan Logging for Continuous Queries

Plan Logging for Continuous Queries: 

Plan Logging for Continuous Queries Log the statistics and re-optimization history Query is long-running Example view over log for R S T ⋈ ⋈ New Ideas Plans lying in a high-dimensional space of statistics time

Summary: 

Summary AQP is becoming important: New data and application trends CS-wide push towards Autonomic Computing Significant amount of work on AQP in recent years Our contributions: In-depth categorization and comparison of AQP systems and techniques Identified current shortcomings and new approaches to AQP Conclusions