logging in or signing up GBO Breezy Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 48 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: November 15, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Federating Scientific Data : Federating Scientific Data Tanu Malik Dept. of Computer Science Johns Hopkins UniversityNew trends in scientific research : New trends in scientific research Explore the same scientific phenomena in related sub-disciplines In Astronomy, the same sky is observed in different electromagnetic spectra In Biology, cures for the same disease is searched in genetics, physiology and molecular biology Trend is important for new discoveries to take place Correlation between new discoveries and the number of connections between fundamental properties Number of connections increase by federating data from sub-disciplinesIssues in building federations : Issues in building federations Systems in scientific sub-disciplines are autonomous and thus heterogeneous Sub-disciplines make independent choices in hardware and software Federations can become non-inter-operable Scientific sub-disciplines are data-intensive Data acquisition governed by Moore’s law Federated tasks (queries) compare, merge and join large data-sets from multiple sub-disciplines Efficient execution of tasks By optimizing individual tasks before execution By minimizing volume of data transfer from all tasks Collection of task-specific statistics might be essential for efficient execution Need for effective solutions to allow for efficient federation, management, and correlation of dataAstronomy: Our representative science : Astronomy: Our representative science Astronomy needs federations Currently, more than 100 independent surveys published on the Internet Typically each contains 10-100 million objects Different wavelengths, different properties, but same objects Several questions for finding similar objects Real, high-dimensional data Astronomy data has no commercial value No privacy concerns. Free for sharing Great for experimenting Our contributions : Our contributions A federation of astronomy archives: The SkyQuery Architecture, specification of federated queries, algorithms for optimizing federated queries A caching framework Consists of algorithms that minimize network traffic. Algorithms build upon economic principles Estimating query result size A model that exploits data mining techniques to estimate result size of federated queriesThe SkyQuery: An astronomy federation : The SkyQuery: An astronomy federation Ability to query data across archives. Typical Query: Find all objects near the Pole star that were observed by archives measuring infra-red wavelengths but not by archives measuring radio wavelengths This is a distributed probabilistic spatial join query as there is error associated with the recorded location of an object. Federation is necessary to run this query as infra-red and radio archives are independently managed Queries are to be optimized as the “near” criteria can potentially transfer thousands of objects over the Internet SkyQuery Architecture : SkyQuery Architecture SkyQuery status : SkyQuery status Uses Web services standard for federation Currently a federation of 30 sites Is expected to grow to 120 sites soon Actively used by astronomers all over the world The spin-off is OpenSkyQuery that provides intuitive interfaces and fully-developed language for cross-match specification http://www.openskyquery.net (User site) http://www.skyquery.net (Research and Development site)Bottleneck in SkyQuery : Bottleneck in SkyQuery In January 2004, a participating site of SkyQuery resulted in: Total number of queries: 26,667 Total number of object accesses: 45,154 Total bytes on the network: 1200 GB ( More than 1 TB!) Total number of unique object accesses: 3890 Total size of objects that were accessed: 90 GB SkyQuery does not respect good network citizenship Locality in object accesses can be translated into effective caching methods Naïve caching is dangerous : Naïve caching is dangerous Risks of caching Caching may transfer unfiltered data: May transfer columns or tables across network Caching moves data to programs/queries Caching may reduce parallelism May move queries from many servers to few caches Benefits of a federation Filters data prior to transfer: Selection and aggregation queries reduce data size Moves the program (fewer bytes) to the data (more bytes) Executes queries in parallel: Experiment scales with the number of federated sites Bypass-Yield: An effective caching method : Bypass-Yield: An effective caching method Distinguish between queries To bypass To satisfy in cache Bypass queries Sent directly to database servers Caching them does not save network traffic Allows for benefits of federation: moving-queries-not-data, parallelism, and data filtering Queries satisfied in cache Save network traffic Defines yield of a query to Distinguish queries Helps decide accessed data objects to load from servers to cache Caching these data objects saves network traffic Measures the network savings rate per unit of cache space usedNetwork flow in a Bypass-Yield Cache : Network flow in a Bypass-Yield Cache DL: load cost, DB: bypass cost Minimize WAN cost on (DL + DB)The BYC metric : The BYC metric Let si be the size and fi be the fetch cost of object oi Let yi,j be the size of the query-result of the jth query that accesses oi Let pi,j be the probability of occurrence of the jth query above Byte-Yield Hit Rate (BYHR) for object oi defined as: BYHR has two components: Expected benefit of caching oi due to yield: Scaled by the cost/size ratio:BYC algorithms make economic decisions : BYC algorithms make economic decisions Rate-Profile: An optimistic algorithm Predicts object access patterns from workload Combines recency and frequency to measure probability of access for objects in and outside cache Evicts objects with lowest rate of savings. Optimistically, loads objects whose rate of savings are increasing OnlineBY: A defensive algorithm Makes no assumptions about workload Loads an object into the cache when network traffic equal to the size of object has been incurred Maintains the cache according to an web caching algorithm (GDS) Is lg2(k) competitive, k is the ratio of size of cache, to the smallest object in cache SpaceEffBY: A randomized algorithm Maintains no state about workload Loads objects with probability equal to yield/size of objectCurrent research:Yield estimation : Current research:Yield estimation Selectivity Estimation Estimating the number of tuples that satisfy a query Accurate estimates can help in For choosing best plans in query optimization As feedback to users Resource consumption, load balancing For BYC to achieve maximum network savings Accurate selectivity estimation is crucial to BYC performance Governs load and evict decisionsSelectivity estimation in BYC : Selectivity estimation in BYC Common methods to estimate selectivity Random sampling, histogram, wavelet based compressions A caching environment introduces new requirements on the selectivity estimation problem Caches are close to clients not servers; often have no access to databases Cache is a constrained resource in terms of storage Caching systems are adaptive and onlineTemplate example : Template example Assume there are large no of queries to the following query template SELECT order date,required date FROM Order WHERE freight [OP] [VALUE] VALUE varies over the domain [A,B]. A,B are real values. Operator OP comes from the set {<,>} Freight distribution : Freight distribution Cumulative frequency distribution for freight attribute for OP = ‘<‘ Needs to known precisely to estimate selectivity of all queriesTemplate example : Template example Assume there are large no of queries to the following query template SELECT order date,required date FROM Order WHERE freight [OP] [VALUE] VALUE varies in the range [M,N]. M,N are real values. Operator OP comes from the set {<,>} The feature vector consists of {OP,VALUE} Consider feature vectors over all such queries. Assume selectivity is known for some queries (say 25%), in addition, say the selectivity can be classified into 3 classes: low, medium and highEstimation example : Estimation example Decision tree learns u, and v from query attributes Use linear regression in each class to obtain actual yield valuesProperties of our solution : Properties of our solution Uses a query based approach Current methods use a data-based approach Detect templates in the workload Most scientific experiments adhere to query templates Very fast and compact Use decision trees and regression over templates Crude estimates for the bypass phase, accurate estimates for loading data Able to process complex queries Extracts feature vectors from templatesFuture directions : Future directions To improve query execution performance in the cache In the absence of indices and materialized views Dynamic physical design which reorganizes itself according to workload Open questions Finding the best file organization How to best represent queries in terms of database schema, without actually changing the physical organization? How to update changes to schema when the workload changes? Plan to work on the above problem till Spring of 2006.Acknowledgements : Acknowledgements Inside: Comp. Science: Randal Burns, Members of HSSL Phy & Astro: Alex Szalay, The SDSS Team JHU: The GBO Committee Outside: Microsoft Research: Jim Gray Univ. of Notre Dame: Amitabh Chaudhary, Nitesh Chawla Carnegie Mellon: Anastassia Ailamaki, Stratos PapadomanolakisWeb services: Web services SkyQuery uses Web services for interoperability Use of Internet standards Communication Protocol : HTTP Message Exchange Model: Simple Object Access Protocol(SOAP) with eXtensible Markup Language(XML) encoding Describing, Defining and Discovering Web Services: Web Services Description Language (WSDL), Uniform Discovery, Description and Integration (UDDI) of Web Services The object cross-match : The object cross-match Matching objects across archives, if they correspond to the same astronomical body. Given: A set of observations, one from each archive. Question: Do these observations refer to the same astronomical body? Answer: Easy, if positions can be measured precisely; the X Match answer is deterministic. Probabilistic Cross Match: Probabilistic Cross Match Measured positions have an error due to instrument inaccuracies. This error follows a Gaussian distribution (and is known for each archive) Given: A set of observations, one from each archive Question: What is the probability that these observations refer to the same astronomical body? Algorithm: ? Probabilistic Cross MatchComputing the probability of a X Match: Computing the probability of a X Match To compute the probability of a X Match, we assume a position for the real body. The probability of a X Match varies with this considered position of the real body. We consider that position for which probability is highest. This position is a kind of weighted mean. Probability at this position can be represented by a circular region about mean. Our X Match algorithm reports those sets of observations: probability > thresholdThe probabilistic cross-match : The probabilistic cross-match Background: Page Caching: Background: Page Caching Fixed size objects/pages, different fetch cost Cache hit is equivalent to an entire page being accessed Caches pages that have high fetch cost Used in operating systems Goal: Minimize total fetch cost of pages P6 P6Background: Object Caching: Background: Object Caching Variable size objects, different fetch cost Cache hit is equivalent to accessing an entire object Caches objects that have high cost/size ratio. Used in proxy web caching Goal: Minimize total fetch cost of objects 08 O8Bypass-Yield Caching: Bypass-Yield Caching Objects are of variable size For each object, cost of access varies Cache hit implies fetching an entire object, part of an object, or an aggregate computed over an object Goal: Minimize total network cost (load and bypass) Client Requests Q8Rate-Profile: Rate-Profile Estimating probability of object access to estimate savings By characterizing workload For objects in cache Estimates probability as frequency-counts over cache lifetime Computes a rate of network savings Every in-cache access increase the rate in proportion to query yield Every time step (query access) decays rate Rate-Profile: Rate-Profile For objects outside cache Separates queries accesses for an object into clusters (episodes) For each episode maintains frequency-counts over episode length Maximum rate in each episode is the maximum benefit of caching; An optimistic decision Ages maximum benefits in each episode by weight Computes the expected rate of network savings Rate-Profile: Algorithm: Rate-Profile: Algorithm Discount object load cost from expected savings Penalize object for the network cost incurred in loading an object into cache Load Vs Bypass Compares discounted expected rate of network savings of an object not in the cache with the rate of network savings of all objects in the cache If greater, load. Else, bypass Eviction Evict object(s) with lowest current savingsOnlineBY: OnlineBY Formulates the bypass-yield caching problem as a combination of Rent-or-buy problem, and Object caching problem The bypass Vs. load decision: A defensive decision Bypasses queries (rents) until total bypass traffic exceeds fetch cost Then, loads (buys) an object Eviction decisions and in-cache policies Cache is maintained according to an algorithm for object caching problem This is O(lg2k) competitive k = size of cache / size of smallest objectRandomized Ski Rental: Randomized Ski Rental Randomized version of OnlineBY Does not store the amount of traffic bypassed Load decision: A random decision Load an object with probability equal to the ratio of the yield of the current query to the fetch cost Requires no metadata for making load Vs bypass decision Reduces total amount of meta data required Performs well in practice ( no bounds yet )Network Cost of a Trace (columns) : Network Cost of a Trace (columns) Rate-Profile dynamically chooses between no cache and inline cache Rate-Profile compares well with an optimal static cache Other algorithms show similar resultsAlgorithm Cost Analysis : Algorithm Cost Analysis Rate-Profile is better than OnlineBY in column caching SpaceEffBY always lags behind In all algorithms, column caching is better than table cachingSchema Locality : Schema Locality Schema locality implies reuse of schema elements than data elements Both tables and columns show heavy and long lasting periods of reuseContainment : Containment It is imperative for workload to show both containment and locality for query caching to be viable (Luo et al) Determining exact query containment is NP complete (Chandra and Merlin) Few objects experience reuseArchitecture : Architecture Status We are currently implementing the above system within BYC. We currently have preliminary experiments prove the efficacy of the above model. Results(1): Range Query: Results(1): Range QueryResult(2): User-defined Function Queries: Result(2): User-defined Function QueriesResult(3): Index Queries: Result(3): Index Queries You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
GBO Breezy Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 48 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: November 15, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Federating Scientific Data : Federating Scientific Data Tanu Malik Dept. of Computer Science Johns Hopkins UniversityNew trends in scientific research : New trends in scientific research Explore the same scientific phenomena in related sub-disciplines In Astronomy, the same sky is observed in different electromagnetic spectra In Biology, cures for the same disease is searched in genetics, physiology and molecular biology Trend is important for new discoveries to take place Correlation between new discoveries and the number of connections between fundamental properties Number of connections increase by federating data from sub-disciplinesIssues in building federations : Issues in building federations Systems in scientific sub-disciplines are autonomous and thus heterogeneous Sub-disciplines make independent choices in hardware and software Federations can become non-inter-operable Scientific sub-disciplines are data-intensive Data acquisition governed by Moore’s law Federated tasks (queries) compare, merge and join large data-sets from multiple sub-disciplines Efficient execution of tasks By optimizing individual tasks before execution By minimizing volume of data transfer from all tasks Collection of task-specific statistics might be essential for efficient execution Need for effective solutions to allow for efficient federation, management, and correlation of dataAstronomy: Our representative science : Astronomy: Our representative science Astronomy needs federations Currently, more than 100 independent surveys published on the Internet Typically each contains 10-100 million objects Different wavelengths, different properties, but same objects Several questions for finding similar objects Real, high-dimensional data Astronomy data has no commercial value No privacy concerns. Free for sharing Great for experimenting Our contributions : Our contributions A federation of astronomy archives: The SkyQuery Architecture, specification of federated queries, algorithms for optimizing federated queries A caching framework Consists of algorithms that minimize network traffic. Algorithms build upon economic principles Estimating query result size A model that exploits data mining techniques to estimate result size of federated queriesThe SkyQuery: An astronomy federation : The SkyQuery: An astronomy federation Ability to query data across archives. Typical Query: Find all objects near the Pole star that were observed by archives measuring infra-red wavelengths but not by archives measuring radio wavelengths This is a distributed probabilistic spatial join query as there is error associated with the recorded location of an object. Federation is necessary to run this query as infra-red and radio archives are independently managed Queries are to be optimized as the “near” criteria can potentially transfer thousands of objects over the Internet SkyQuery Architecture : SkyQuery Architecture SkyQuery status : SkyQuery status Uses Web services standard for federation Currently a federation of 30 sites Is expected to grow to 120 sites soon Actively used by astronomers all over the world The spin-off is OpenSkyQuery that provides intuitive interfaces and fully-developed language for cross-match specification http://www.openskyquery.net (User site) http://www.skyquery.net (Research and Development site)Bottleneck in SkyQuery : Bottleneck in SkyQuery In January 2004, a participating site of SkyQuery resulted in: Total number of queries: 26,667 Total number of object accesses: 45,154 Total bytes on the network: 1200 GB ( More than 1 TB!) Total number of unique object accesses: 3890 Total size of objects that were accessed: 90 GB SkyQuery does not respect good network citizenship Locality in object accesses can be translated into effective caching methods Naïve caching is dangerous : Naïve caching is dangerous Risks of caching Caching may transfer unfiltered data: May transfer columns or tables across network Caching moves data to programs/queries Caching may reduce parallelism May move queries from many servers to few caches Benefits of a federation Filters data prior to transfer: Selection and aggregation queries reduce data size Moves the program (fewer bytes) to the data (more bytes) Executes queries in parallel: Experiment scales with the number of federated sites Bypass-Yield: An effective caching method : Bypass-Yield: An effective caching method Distinguish between queries To bypass To satisfy in cache Bypass queries Sent directly to database servers Caching them does not save network traffic Allows for benefits of federation: moving-queries-not-data, parallelism, and data filtering Queries satisfied in cache Save network traffic Defines yield of a query to Distinguish queries Helps decide accessed data objects to load from servers to cache Caching these data objects saves network traffic Measures the network savings rate per unit of cache space usedNetwork flow in a Bypass-Yield Cache : Network flow in a Bypass-Yield Cache DL: load cost, DB: bypass cost Minimize WAN cost on (DL + DB)The BYC metric : The BYC metric Let si be the size and fi be the fetch cost of object oi Let yi,j be the size of the query-result of the jth query that accesses oi Let pi,j be the probability of occurrence of the jth query above Byte-Yield Hit Rate (BYHR) for object oi defined as: BYHR has two components: Expected benefit of caching oi due to yield: Scaled by the cost/size ratio:BYC algorithms make economic decisions : BYC algorithms make economic decisions Rate-Profile: An optimistic algorithm Predicts object access patterns from workload Combines recency and frequency to measure probability of access for objects in and outside cache Evicts objects with lowest rate of savings. Optimistically, loads objects whose rate of savings are increasing OnlineBY: A defensive algorithm Makes no assumptions about workload Loads an object into the cache when network traffic equal to the size of object has been incurred Maintains the cache according to an web caching algorithm (GDS) Is lg2(k) competitive, k is the ratio of size of cache, to the smallest object in cache SpaceEffBY: A randomized algorithm Maintains no state about workload Loads objects with probability equal to yield/size of objectCurrent research:Yield estimation : Current research:Yield estimation Selectivity Estimation Estimating the number of tuples that satisfy a query Accurate estimates can help in For choosing best plans in query optimization As feedback to users Resource consumption, load balancing For BYC to achieve maximum network savings Accurate selectivity estimation is crucial to BYC performance Governs load and evict decisionsSelectivity estimation in BYC : Selectivity estimation in BYC Common methods to estimate selectivity Random sampling, histogram, wavelet based compressions A caching environment introduces new requirements on the selectivity estimation problem Caches are close to clients not servers; often have no access to databases Cache is a constrained resource in terms of storage Caching systems are adaptive and onlineTemplate example : Template example Assume there are large no of queries to the following query template SELECT order date,required date FROM Order WHERE freight [OP] [VALUE] VALUE varies over the domain [A,B]. A,B are real values. Operator OP comes from the set {<,>} Freight distribution : Freight distribution Cumulative frequency distribution for freight attribute for OP = ‘<‘ Needs to known precisely to estimate selectivity of all queriesTemplate example : Template example Assume there are large no of queries to the following query template SELECT order date,required date FROM Order WHERE freight [OP] [VALUE] VALUE varies in the range [M,N]. M,N are real values. Operator OP comes from the set {<,>} The feature vector consists of {OP,VALUE} Consider feature vectors over all such queries. Assume selectivity is known for some queries (say 25%), in addition, say the selectivity can be classified into 3 classes: low, medium and highEstimation example : Estimation example Decision tree learns u, and v from query attributes Use linear regression in each class to obtain actual yield valuesProperties of our solution : Properties of our solution Uses a query based approach Current methods use a data-based approach Detect templates in the workload Most scientific experiments adhere to query templates Very fast and compact Use decision trees and regression over templates Crude estimates for the bypass phase, accurate estimates for loading data Able to process complex queries Extracts feature vectors from templatesFuture directions : Future directions To improve query execution performance in the cache In the absence of indices and materialized views Dynamic physical design which reorganizes itself according to workload Open questions Finding the best file organization How to best represent queries in terms of database schema, without actually changing the physical organization? How to update changes to schema when the workload changes? Plan to work on the above problem till Spring of 2006.Acknowledgements : Acknowledgements Inside: Comp. Science: Randal Burns, Members of HSSL Phy & Astro: Alex Szalay, The SDSS Team JHU: The GBO Committee Outside: Microsoft Research: Jim Gray Univ. of Notre Dame: Amitabh Chaudhary, Nitesh Chawla Carnegie Mellon: Anastassia Ailamaki, Stratos PapadomanolakisWeb services: Web services SkyQuery uses Web services for interoperability Use of Internet standards Communication Protocol : HTTP Message Exchange Model: Simple Object Access Protocol(SOAP) with eXtensible Markup Language(XML) encoding Describing, Defining and Discovering Web Services: Web Services Description Language (WSDL), Uniform Discovery, Description and Integration (UDDI) of Web Services The object cross-match : The object cross-match Matching objects across archives, if they correspond to the same astronomical body. Given: A set of observations, one from each archive. Question: Do these observations refer to the same astronomical body? Answer: Easy, if positions can be measured precisely; the X Match answer is deterministic. Probabilistic Cross Match: Probabilistic Cross Match Measured positions have an error due to instrument inaccuracies. This error follows a Gaussian distribution (and is known for each archive) Given: A set of observations, one from each archive Question: What is the probability that these observations refer to the same astronomical body? Algorithm: ? Probabilistic Cross MatchComputing the probability of a X Match: Computing the probability of a X Match To compute the probability of a X Match, we assume a position for the real body. The probability of a X Match varies with this considered position of the real body. We consider that position for which probability is highest. This position is a kind of weighted mean. Probability at this position can be represented by a circular region about mean. Our X Match algorithm reports those sets of observations: probability > thresholdThe probabilistic cross-match : The probabilistic cross-match Background: Page Caching: Background: Page Caching Fixed size objects/pages, different fetch cost Cache hit is equivalent to an entire page being accessed Caches pages that have high fetch cost Used in operating systems Goal: Minimize total fetch cost of pages P6 P6Background: Object Caching: Background: Object Caching Variable size objects, different fetch cost Cache hit is equivalent to accessing an entire object Caches objects that have high cost/size ratio. Used in proxy web caching Goal: Minimize total fetch cost of objects 08 O8Bypass-Yield Caching: Bypass-Yield Caching Objects are of variable size For each object, cost of access varies Cache hit implies fetching an entire object, part of an object, or an aggregate computed over an object Goal: Minimize total network cost (load and bypass) Client Requests Q8Rate-Profile: Rate-Profile Estimating probability of object access to estimate savings By characterizing workload For objects in cache Estimates probability as frequency-counts over cache lifetime Computes a rate of network savings Every in-cache access increase the rate in proportion to query yield Every time step (query access) decays rate Rate-Profile: Rate-Profile For objects outside cache Separates queries accesses for an object into clusters (episodes) For each episode maintains frequency-counts over episode length Maximum rate in each episode is the maximum benefit of caching; An optimistic decision Ages maximum benefits in each episode by weight Computes the expected rate of network savings Rate-Profile: Algorithm: Rate-Profile: Algorithm Discount object load cost from expected savings Penalize object for the network cost incurred in loading an object into cache Load Vs Bypass Compares discounted expected rate of network savings of an object not in the cache with the rate of network savings of all objects in the cache If greater, load. Else, bypass Eviction Evict object(s) with lowest current savingsOnlineBY: OnlineBY Formulates the bypass-yield caching problem as a combination of Rent-or-buy problem, and Object caching problem The bypass Vs. load decision: A defensive decision Bypasses queries (rents) until total bypass traffic exceeds fetch cost Then, loads (buys) an object Eviction decisions and in-cache policies Cache is maintained according to an algorithm for object caching problem This is O(lg2k) competitive k = size of cache / size of smallest objectRandomized Ski Rental: Randomized Ski Rental Randomized version of OnlineBY Does not store the amount of traffic bypassed Load decision: A random decision Load an object with probability equal to the ratio of the yield of the current query to the fetch cost Requires no metadata for making load Vs bypass decision Reduces total amount of meta data required Performs well in practice ( no bounds yet )Network Cost of a Trace (columns) : Network Cost of a Trace (columns) Rate-Profile dynamically chooses between no cache and inline cache Rate-Profile compares well with an optimal static cache Other algorithms show similar resultsAlgorithm Cost Analysis : Algorithm Cost Analysis Rate-Profile is better than OnlineBY in column caching SpaceEffBY always lags behind In all algorithms, column caching is better than table cachingSchema Locality : Schema Locality Schema locality implies reuse of schema elements than data elements Both tables and columns show heavy and long lasting periods of reuseContainment : Containment It is imperative for workload to show both containment and locality for query caching to be viable (Luo et al) Determining exact query containment is NP complete (Chandra and Merlin) Few objects experience reuseArchitecture : Architecture Status We are currently implementing the above system within BYC. We currently have preliminary experiments prove the efficacy of the above model. Results(1): Range Query: Results(1): Range QueryResult(2): User-defined Function Queries: Result(2): User-defined Function QueriesResult(3): Index Queries: Result(3): Index Queries