SEMINAR ON: SEMINAR ON Data Mining Strategy for Database Buffer Management Presented By: Ajay Kumar Priyadarshi B.Tech Final Yr,CSE 0604210004 Madan Mohan Malviya Engineering College Gorakhpur (U.P.) OUTLINE: OUTLINE Introduction to Buffer management Buffer management Approaches Clock and G-Clock. LRD. FBRD. LRU-K 2Q. Proposal for DATA MINING Conclusion References Buffer Management In Database: Buffer Management In Database Efficient main memory allocation and replacement for database queries. Reduce disk operation,enhance system throughput by utlilising dedicated buffer pool for caching databases pages. DBMS uses main-memory as a buffer to reduce disk access. Buffer is divided into frames of same size and each frame contains the page of secondary storage file. When page request occurs that is not in the buffer pool,the buffer pool manager selects the appropriate frame from list of frames.If frame is not free than Buffer Manager determines the page currently in buffer pool should be replaced. Buffer Management Approaches: Buffer Management Approaches Least Recently Used(LRU). Clock And G-Clock Strategy. Least Reference Density Strategy. Frequency based Replacement Strategy. LRU-K Strategy. 2Q Strategy. Contd…: Contd… Least Recently Used - When a new frame Buffer is needed,the page in the buffer that has not been accessed for the longest time is replaced. Clock and G-Clock Strategy - Clock Strategy Adds a Use-bit to every buffer page. Use-bit indicates whether page referenced or not. 0-bit page is victim for replacement and reset to 1. Encounting 1-bit page is reset to 0 and causes move to next page. Slide 6: Contd… G-Clock Strategy Replaces Use-bit of a buffer page Pi with a reference counter RCi (initialized to a weight equal to 1 or larger than 1). References to Pi increments the corresponding RCi . When a Buffer Fault occurs Victim page Pi is found by decrementing RCi by 1 until 0 value is found Similar to G-Clock,except that it relates actual no. of Pi referenced in RCi to Page Age(total no. of elasped references to all pages since Pi’s first reference) . Least Reference Density Strategy - Slide 7: Contd… Least Reference Density Strategy -- Age Of Page = GRC – FC(i). Where GRC is the total no.of references and FC(i ) is the reference at its first references. RD(i)= RC(i)/GRC(i)/FC(i), where RD(i ) is the reference density of a page. LRD Assumes equidistant arrival of page references. LRD algorithm always chooses the buffer page with lowest RD(i). Problem : High Reference Density at the beginning of interval keeps pages in the buffer much longer the desired. Slide 8: Contd… Frequency Based Replacement Strategy - Similar to LRD,except that it considers factoring out locality from Reference Counts( RCi’s ) through whole page age. FBR frame contains three refinements : New Section : controls increments of reference counts. Old Section : confine the replacement choices that have not been too recently referenced. Adjust section : periodically adjusting reference counts so as too approximate references frequencies. FBR sets portion of frame stack as new section When page hits,its RCi’s not increment,,otherwise incremented Slide 9: Contd… It sets portion of bottom frame stack as old section. All replacement candidates come from this section. The remaining part sets as middle section. Allows frequently occurred pages to build up their references counts. Problem : count overflow. Certain pages build up their high reference counts and never being replaced. Slide 10: Contd… LRU-K Strategy - Problem with LRU is that it gaurantees a long buffer life,even if the page will never been refrenced again. LRU-K solves this problem by considering access history for each page. Keep tracks of time of last K-refreances to all pages in presence of all accessing patterns. When a buffer frame is needed for a new page being read,the page P to be dropped is one with Backward K-distance bt(p,k) is maximum for all pages in buffer. Strength : LRU-K has significant cost/performance advantages over conventional algorithms like LRU,sometimes as high as 40%. Slide 11: Contd… 2Q Strategy - Based on LRU-K,except ,It test pages for their last second refrence time. 2Q divides buffer B into three queues : Am Queue : LRU queue, It keeps frequently accessed pages. Alin Queue : FIFO queue,It retains newly correlated reference pages. Alout Queue : FIFO queue,used to detect pages that have high long term access rates. On the first reference to a page, 2Q places it into Alin. If a page in Alin is accessed, it isn’t promoted to Am queue because the second access may simply be correlated refrence. Slide 12: Contd… When a page fault occurs, if the length of Alin exceeds a certain threshold, the older first accessed page is thrown out of Alin but remembered in Alout. At this time, if Alout also exceeds its length threshold, then discard the oldest page from alout. If Alin queue has not yet achieved its predetermined length, then replace the least recently used page in Am queue. Slide 13: Proposal for Data Mining for Buffer Management The proposed approach discovers knowledge from database access sequences and uses it to guide buffer management . Figure shows the components Page Request Access Information Collector Monitor & Controller Access Knowledge Miner Buffer Manager Database buffer Access Knowledge database Access Information Database Slide 14: Contd… Buffer manager : Manages all operations on the buffer. Monitor & Controller : Controls mining time and frequency by monitoring pre-specified performance metrics of buffer management. Information Collector : collects and stores page requests of each transaction in an access information database. Access Knowledge Miner : mines access knowledge from the access Information database off-line. On performing a series of Experiments to assess the performance. Contd….: Contd…. From the test results presented in table1, we found that the proposed data mining based approach can achieve much higher buffer hit ratio than LRU. Utilizing knowledge discovered from page access history is important for managing databse buffer. Buffer size 2.5% 5.0% 7.5% 10.0% 2.5% DMBMA BUFFER HIT RATIO 11.34% 16.9% 20.65% 22.04% 22.62% LRU BUFFER HIT RATIO 3.49% 7.24% 11.09% 14.43% 17.48% Summary: Summary LRU-K : – Performs better then other approaches by access k page references instead of using 1 page reference. – vulnerable to coordinator failure at certain times causes blocking. Data mining based buffer managemant approach : – Applying knowledge discovered from database access history to database buffer management. – Achieve better System performance . References: References C.M. Chen and N. Roussopoulos. Adaptive database buffer allocation using query feedback.In Proc. of the 19th Conference on Very Large Data Bases, pages 342-353, Dublin, Ireland,August 1993. H. Chen, J.X. Yu, K. Yamaguchi, H. Kitagawa,N. Ohbo and Y. Fujiwara. Lru-s: A new buffer allocation approach for oodbms. In Proc. of the1990 ACM SIGMOD International Conference on management of data. J. Rodriguez-Rosell. Empirical data reference behavior in data base systems. IEEE Computer, Volume 9 , Number 11 , November 1976. M. Stonebraker. Operating system support for database management. Communications of the ACM, Volume 24 , Number 7, pages 412-417, July 1981. References: References T. Johnson and D. Shasha. 2Q: A low overhead high performance buffer management replacement algorithm. In Proc. of the 20th Conference on Very Large Data Bases, pages 439-450,Santiago, Chile, September 1994. W. Effelsberg and T. Haerder. Principles of database buffer management. AGM Transaction on Database Systems, Volume 9, Number 4, pages 560-595, December 1984. L. Feng, H. Lu, Y.C. Tay and K.H. Tung. Buffer management in distributed database systems:A data mining based approach. In Proc. Of the 6th International Conference on Extending Database Technology, Valencia, Spain, March 1998. A. Dan, P.S. Yu and J-Y. Chung. Characterization of database access pattern for analytic prediction of buffer hit probability. VLDB Journal, Volume 4, Number 1, pages 127-155, January 1995. Queries….: Queries….