logging in or signing up 21 LookingGlass Regina1 Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 64 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: January 09, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Adaptive Query Processing in the Looking Glass: Adaptive Query Processing in the Looking Glass Shivnath Babu (Stanford Univ.) Pedro Bizarro (Univ. of Wisconsin, Madison) Adaptive Query Processing (AQP) Systems:Publication Timeline: Adaptive Query Processing (AQP) Systems: Publication Timeline … 1976 1977 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Parametric opt. RedBrick DEC-Rdb Query Scrambling Re-Opt Tukwila River DQE Conquest Expected cost opt. Pipeline sch. Memory adap. POP CAPE Corrective processing Eddies NiagaraCQ STREAM Ingres IntroductionMotivation: Motivation Plenty of recent work on Adaptive Query Processing (AQP) in different contexts Conventional DBMS query processing, data integration, continuous queries in stream systems No exhaustive, in-depth categorization and comparison of AQP systems to date Difficult to answer questions like: Will techniques from one system work on another? What are the shortcomings of each system? Which system is best for a new application domain? IntroductionOur Contributions: Our Contributions Detailed study of current AQP systems Classification of AQP systems into 3 families Comparison across families in terms of AQP tasks Identification of shortcomings & new approaches to address them IntroductionRoadmap: Roadmap Introduction to AQP The three AQP system families Comparison across families in terms of AQP tasks Summary of what we learnedPrimer on Traditional Query Processing: Primer on Traditional Query Processing Optimizer: Chooses best plan Query IntroductionNeed for Adaptive Query Processing: Need for Adaptive Query Processing Introduction Correlated & skewed data distributions Errors in stats estimates, optimizer mistakes Detect plan suboptimality, re-optimize Stats & system conditions may change while query is running Monitor for changes, re-optimize Continuous queries, long-running queries AQP is integral to the current CS-wide push towards autonomic computingOur Focus: AQP for a Single Query: Our Focus: AQP for a Single Query Introduction AQP System: A system that interleaves the optimization and execution aspects of query processing, possibly multiple times, during the processing of a single queryRoadmap: Roadmap Introduction to AQP The three AQP system families Comparison across families in terms of AQP tasks Summary of what we learnedAQP System Families: AQP System Families Plan-based AQP systems AQP for traditional plan-based DBMSs Continuous-Query-based (CQ-based) AQP systems AQP for long-running continuous queries over data streams Routing-based AQP systems AQP for DBMSs and continuous queries based on adaptive tuple routing AQP FamiliesAQP in Plan-based Systems: AQP in Plan-based Systems Optimizer: Chooses best plan Query Executor: Runs chosen plan Chosen plan Statistics Tracker: Creates/updates stats Runstats AQP FamiliesAQP in Plan-based Systems: AQP in Plan-based Systems Optimizer: Chooses best plan Query Executor: Runs chosen plan Chosen plan Statistics Tracker: Creates/updates stats Runstats AQP FamiliesExample Plan-based AQP Systems: Example Plan-based AQP Systems … 1976 1977 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Parametric opt. RedBrick DEC-Rdb Query Scrambling Re-Opt Tukwila River DQE Conquest Expected cost opt. Pipeline sch. Memory adap. POP CAPE Corrective processing Eddies NiagaraCQ STREAM Ingres AQP FamiliesPrimer on Continuous Query Processing: Primer on Continuous Query Processing Continuous Queries (CQs) are long-running queries usually over data streams Example CQ: Filtering packet streams Stream properties or system conditions may change while query is running best plan may change σ1 σ2 σ3 Packets Chosen packets AQP FamiliesAQP in CQ-based Systems: AQP in CQ-based Systems Optimizer: Chooses best plan Query AQP Families Statistics Tracker: Creates/updates stats RunstatsAQP in CQ-based Systems: AQP in CQ-based Systems Optimizer: Chooses best plan Continuous Query AQP Families Catalog (stream rates, data distr.) Statistics Tracker: Monitors stream stats and system conditionsAQP in CQ-based Systems: AQP in CQ-based Systems Optimizer: Ensures that plan is best for current stats Continuous Query AQP Families Catalog (stream rates, data distr.) Statistics Tracker: Monitors stream stats and system conditions Uses stats to cost plansAQP in CQ-based Systems: AQP in CQ-based Systems Continuous Query AQP Families Catalog (stream rates, data distr.) Statistics Tracker: Monitors stream stats and system conditions Stats to track Re-optimize Uses stats to cost plans Optimizer: Ensures that plan is best for current statsExample CQ-based AQP Systems: … 1976 1977 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Parametric opt. RedBrick DEC-Rdb Query Scrambling Re-Opt Tukwila River DQE Conquest Expected cost opt. Pipeline sch. Memory adap. POP CAPE Corrective processing Eddies NiagaraCQ STREAM Ingres Example CQ-based AQP Systems AQP FamiliesPrimer on Routing-based Processing: Primer on Routing-based Processing Non-plan-based architecture where tuples are routed individually through operators No optimizer Exemplified by Eddies [AH00] AQP FamiliesAQP in Routing-based Systems: AQP in Routing-based Systems AQP Families Optimizer: Chooses best plan QueryAQP in Routing-based Systems: AQP in Routing-based Systems Tuple Router: Integrated Optimizer & Stats Tracker Query or Continuous Query AQP FamiliesSlide23: … 1976 1977 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Parametric opt. RedBrick DEC-Rdb Query Scrambling Re-Opt Tukwila River DQE Conquest Expected cost opt. Pipeline sch. Memory adap. POP CAPE Corrective processing Eddies NiagaraCQ STREAM Ingres Example Routing-based AQP Systems AQP FamiliesRoadmap: Roadmap Introduction to AQP The three AQP system families Comparison across families in terms of AQP tasks Summary of what we learnedComparison Across AQP System Families: Comparison Across AQP System Families Goal: To bring out AQP algorithms and features, not performance numbers Comparison Models, assumptions, and approach Techniques for tracking statistics Re-optimization subtasks When and how to re-optimize Switching between plans Pros & cons of using a conventional optimizer Performance issues Quality of re-optimization Run-time overhead & thrashing ScalabilityComparison Across AQP System Families: Comparison Across AQP System Families Goal: To bring out AQP algorithms and features, not performance numbers Comparison Models, assumptions, and approach Techniques for tracking statistics Re-optimization subtasks When and how to re-optimize Switching between plans Pros & cons of using a conventional optimizer Performance issues Quality of re-optimization Run-time overhead & thrashing ScalabilityTechniques for Tracking Statistics: Techniques for Tracking Statistics Observation Mostly in Plan-based systems Competition Mostly in Plan-based systems Profiling Mostly in CQ-based systems Exploration In Routing-based systems ComparisonTracking Statistics: Observation [KD98]: Tracking Statistics: Observation [KD98] Collect statistics on operator behavior or intermediate subexpressions in a plan ComparisonTracking Statistics: Competition [A93]: Tracking Statistics: Competition [A93] Extra processing to collect statistics ComparisonTracking Statistics: Profiling [BMM+04]: Tracking Statistics: Profiling [BMM+04] Extra processing on a fraction of the input tuples (e.g., a random sample) to collect statistics Builds a “statistical profile” that can be used to estimate many individual statistics Comparison σ1 σ2 σ3 Profiled tuplesTracking Statistics: Exploration [AH00]: Tracking Statistics: Exploration [AH00] A fraction of tuples are routed along routes different from the current best route to track statistics along those routes No redundant processing Comparison σ1 σ2 σ3 Packets Chosen packets Tuple RouterComparing Statistics-Tracking Techniques: Extra Overhead Introduced: Comparing Statistics-Tracking Techniques: Extra Overhead Introduced Comparison Increasing overhead Observation Exploration (inefficient routes for some tuples) Profiling (extra processing on some tuples) Competition (lots of extra work)Comparing Statistics-Tracking Techniques: Coverage of Different Statistics: Comparing Statistics-Tracking Techniques: Coverage of Different Statistics Comparison Increasing coverage Observation & Competition (limited by plan) Exploration (limited by large number of routes) Profiling (highest since it builds statistics profile)Comparing Statistics-Tracking Techniques: Accuracy of Estimation: Comparing Statistics-Tracking Techniques: Accuracy of Estimation Comparison Increasing accuracy Observation & Competition Exploration (but, susceptible to routing bias) Profiling (depends on sampling fraction)Roadmap: Roadmap Introduction to AQP The three AQP system families Comparison across families in terms of AQP tasks Summary of what we learnedWhat have we learned? (1): What have we learned? (1) Many similarities in internals of different AQP families Can re-use many current (and new) AQP techniques across families Ex: Profiling from CQ-based systems Enables, e.g., faster detection of plan suboptimality in Plan-based systems Generates more accurate statistics at lower cost in Routing-based systems New IdeasWhat have we learned? (2): What have we learned? (2) Current AQP systems are reactive E.g., do not consider sensitivity to errors/changes in stats New Ideas Proactive Re-optimizationWhat have we learned? (3): What have we learned? (3) Challenging meta problems in AQP for continuous queries need to be addressed Larger and more complex plan spaces higher costs for statistics tracking and re-optimization Tracking “Return-of-Investment” on AQP Avoiding thrashing, e.g., on bursty changes in statistics New Ideas Proposal: Plan Logging for Continuous QueriesPlan Logging for Continuous Queries: Plan Logging for Continuous Queries Log the statistics and re-optimization history Query is long-running Example view over log for R S T ⋈ ⋈ New Ideas Plans lying in a high-dimensional space of statistics timeSummary: Summary AQP is becoming important: New data and application trends CS-wide push towards Autonomic Computing Significant amount of work on AQP in recent years Our contributions: In-depth categorization and comparison of AQP systems and techniques Identified current shortcomings and new approaches to AQP Conclusions You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
21 LookingGlass Regina1 Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 64 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: January 09, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Adaptive Query Processing in the Looking Glass: Adaptive Query Processing in the Looking Glass Shivnath Babu (Stanford Univ.) Pedro Bizarro (Univ. of Wisconsin, Madison) Adaptive Query Processing (AQP) Systems:Publication Timeline: Adaptive Query Processing (AQP) Systems: Publication Timeline … 1976 1977 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Parametric opt. RedBrick DEC-Rdb Query Scrambling Re-Opt Tukwila River DQE Conquest Expected cost opt. Pipeline sch. Memory adap. POP CAPE Corrective processing Eddies NiagaraCQ STREAM Ingres IntroductionMotivation: Motivation Plenty of recent work on Adaptive Query Processing (AQP) in different contexts Conventional DBMS query processing, data integration, continuous queries in stream systems No exhaustive, in-depth categorization and comparison of AQP systems to date Difficult to answer questions like: Will techniques from one system work on another? What are the shortcomings of each system? Which system is best for a new application domain? IntroductionOur Contributions: Our Contributions Detailed study of current AQP systems Classification of AQP systems into 3 families Comparison across families in terms of AQP tasks Identification of shortcomings & new approaches to address them IntroductionRoadmap: Roadmap Introduction to AQP The three AQP system families Comparison across families in terms of AQP tasks Summary of what we learnedPrimer on Traditional Query Processing: Primer on Traditional Query Processing Optimizer: Chooses best plan Query IntroductionNeed for Adaptive Query Processing: Need for Adaptive Query Processing Introduction Correlated & skewed data distributions Errors in stats estimates, optimizer mistakes Detect plan suboptimality, re-optimize Stats & system conditions may change while query is running Monitor for changes, re-optimize Continuous queries, long-running queries AQP is integral to the current CS-wide push towards autonomic computingOur Focus: AQP for a Single Query: Our Focus: AQP for a Single Query Introduction AQP System: A system that interleaves the optimization and execution aspects of query processing, possibly multiple times, during the processing of a single queryRoadmap: Roadmap Introduction to AQP The three AQP system families Comparison across families in terms of AQP tasks Summary of what we learnedAQP System Families: AQP System Families Plan-based AQP systems AQP for traditional plan-based DBMSs Continuous-Query-based (CQ-based) AQP systems AQP for long-running continuous queries over data streams Routing-based AQP systems AQP for DBMSs and continuous queries based on adaptive tuple routing AQP FamiliesAQP in Plan-based Systems: AQP in Plan-based Systems Optimizer: Chooses best plan Query Executor: Runs chosen plan Chosen plan Statistics Tracker: Creates/updates stats Runstats AQP FamiliesAQP in Plan-based Systems: AQP in Plan-based Systems Optimizer: Chooses best plan Query Executor: Runs chosen plan Chosen plan Statistics Tracker: Creates/updates stats Runstats AQP FamiliesExample Plan-based AQP Systems: Example Plan-based AQP Systems … 1976 1977 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Parametric opt. RedBrick DEC-Rdb Query Scrambling Re-Opt Tukwila River DQE Conquest Expected cost opt. Pipeline sch. Memory adap. POP CAPE Corrective processing Eddies NiagaraCQ STREAM Ingres AQP FamiliesPrimer on Continuous Query Processing: Primer on Continuous Query Processing Continuous Queries (CQs) are long-running queries usually over data streams Example CQ: Filtering packet streams Stream properties or system conditions may change while query is running best plan may change σ1 σ2 σ3 Packets Chosen packets AQP FamiliesAQP in CQ-based Systems: AQP in CQ-based Systems Optimizer: Chooses best plan Query AQP Families Statistics Tracker: Creates/updates stats RunstatsAQP in CQ-based Systems: AQP in CQ-based Systems Optimizer: Chooses best plan Continuous Query AQP Families Catalog (stream rates, data distr.) Statistics Tracker: Monitors stream stats and system conditionsAQP in CQ-based Systems: AQP in CQ-based Systems Optimizer: Ensures that plan is best for current stats Continuous Query AQP Families Catalog (stream rates, data distr.) Statistics Tracker: Monitors stream stats and system conditions Uses stats to cost plansAQP in CQ-based Systems: AQP in CQ-based Systems Continuous Query AQP Families Catalog (stream rates, data distr.) Statistics Tracker: Monitors stream stats and system conditions Stats to track Re-optimize Uses stats to cost plans Optimizer: Ensures that plan is best for current statsExample CQ-based AQP Systems: … 1976 1977 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Parametric opt. RedBrick DEC-Rdb Query Scrambling Re-Opt Tukwila River DQE Conquest Expected cost opt. Pipeline sch. Memory adap. POP CAPE Corrective processing Eddies NiagaraCQ STREAM Ingres Example CQ-based AQP Systems AQP FamiliesPrimer on Routing-based Processing: Primer on Routing-based Processing Non-plan-based architecture where tuples are routed individually through operators No optimizer Exemplified by Eddies [AH00] AQP FamiliesAQP in Routing-based Systems: AQP in Routing-based Systems AQP Families Optimizer: Chooses best plan QueryAQP in Routing-based Systems: AQP in Routing-based Systems Tuple Router: Integrated Optimizer & Stats Tracker Query or Continuous Query AQP FamiliesSlide23: … 1976 1977 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Parametric opt. RedBrick DEC-Rdb Query Scrambling Re-Opt Tukwila River DQE Conquest Expected cost opt. Pipeline sch. Memory adap. POP CAPE Corrective processing Eddies NiagaraCQ STREAM Ingres Example Routing-based AQP Systems AQP FamiliesRoadmap: Roadmap Introduction to AQP The three AQP system families Comparison across families in terms of AQP tasks Summary of what we learnedComparison Across AQP System Families: Comparison Across AQP System Families Goal: To bring out AQP algorithms and features, not performance numbers Comparison Models, assumptions, and approach Techniques for tracking statistics Re-optimization subtasks When and how to re-optimize Switching between plans Pros & cons of using a conventional optimizer Performance issues Quality of re-optimization Run-time overhead & thrashing ScalabilityComparison Across AQP System Families: Comparison Across AQP System Families Goal: To bring out AQP algorithms and features, not performance numbers Comparison Models, assumptions, and approach Techniques for tracking statistics Re-optimization subtasks When and how to re-optimize Switching between plans Pros & cons of using a conventional optimizer Performance issues Quality of re-optimization Run-time overhead & thrashing ScalabilityTechniques for Tracking Statistics: Techniques for Tracking Statistics Observation Mostly in Plan-based systems Competition Mostly in Plan-based systems Profiling Mostly in CQ-based systems Exploration In Routing-based systems ComparisonTracking Statistics: Observation [KD98]: Tracking Statistics: Observation [KD98] Collect statistics on operator behavior or intermediate subexpressions in a plan ComparisonTracking Statistics: Competition [A93]: Tracking Statistics: Competition [A93] Extra processing to collect statistics ComparisonTracking Statistics: Profiling [BMM+04]: Tracking Statistics: Profiling [BMM+04] Extra processing on a fraction of the input tuples (e.g., a random sample) to collect statistics Builds a “statistical profile” that can be used to estimate many individual statistics Comparison σ1 σ2 σ3 Profiled tuplesTracking Statistics: Exploration [AH00]: Tracking Statistics: Exploration [AH00] A fraction of tuples are routed along routes different from the current best route to track statistics along those routes No redundant processing Comparison σ1 σ2 σ3 Packets Chosen packets Tuple RouterComparing Statistics-Tracking Techniques: Extra Overhead Introduced: Comparing Statistics-Tracking Techniques: Extra Overhead Introduced Comparison Increasing overhead Observation Exploration (inefficient routes for some tuples) Profiling (extra processing on some tuples) Competition (lots of extra work)Comparing Statistics-Tracking Techniques: Coverage of Different Statistics: Comparing Statistics-Tracking Techniques: Coverage of Different Statistics Comparison Increasing coverage Observation & Competition (limited by plan) Exploration (limited by large number of routes) Profiling (highest since it builds statistics profile)Comparing Statistics-Tracking Techniques: Accuracy of Estimation: Comparing Statistics-Tracking Techniques: Accuracy of Estimation Comparison Increasing accuracy Observation & Competition Exploration (but, susceptible to routing bias) Profiling (depends on sampling fraction)Roadmap: Roadmap Introduction to AQP The three AQP system families Comparison across families in terms of AQP tasks Summary of what we learnedWhat have we learned? (1): What have we learned? (1) Many similarities in internals of different AQP families Can re-use many current (and new) AQP techniques across families Ex: Profiling from CQ-based systems Enables, e.g., faster detection of plan suboptimality in Plan-based systems Generates more accurate statistics at lower cost in Routing-based systems New IdeasWhat have we learned? (2): What have we learned? (2) Current AQP systems are reactive E.g., do not consider sensitivity to errors/changes in stats New Ideas Proactive Re-optimizationWhat have we learned? (3): What have we learned? (3) Challenging meta problems in AQP for continuous queries need to be addressed Larger and more complex plan spaces higher costs for statistics tracking and re-optimization Tracking “Return-of-Investment” on AQP Avoiding thrashing, e.g., on bursty changes in statistics New Ideas Proposal: Plan Logging for Continuous QueriesPlan Logging for Continuous Queries: Plan Logging for Continuous Queries Log the statistics and re-optimization history Query is long-running Example view over log for R S T ⋈ ⋈ New Ideas Plans lying in a high-dimensional space of statistics timeSummary: Summary AQP is becoming important: New data and application trends CS-wide push towards Autonomic Computing Significant amount of work on AQP in recent years Our contributions: In-depth categorization and comparison of AQP systems and techniques Identified current shortcomings and new approaches to AQP Conclusions