logging in or signing up R13 2 Sigfrid Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 94 Category: Education License: All Rights Reserved Like it (2) Dislike it (0) Added: February 07, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Answering Top-k Queries with Multi-Dimensional Selections: The Ranking Cube Approach: Answering Top-k Queries with Multi-Dimensional Selections: The Ranking Cube Approach Dong Xin, Jiawei Han, Hong Cheng, Xiaolei Li Department of Computer Science University of Illinois at Urbana-Champaign VLDB 2006Outline : Outline Introduction Ranking cube Answering top-k queries by Ranking Cube Ranking Fragments Performance study Discussion and Conclusions Multi-Dimensional Ranking Analysis: Multi-Dimensional Ranking Analysis Consider an online used car database R Type (e.g., sedan, convertible) Maker (e.g., Ford, Hyundai) Color (e.g., red, silver) Price Mileage select top 10 * from R where type = “convertible” order by price + mileage asc select top 10 * from R where type = “convertible” and color = “red” order by price + mileage asc select top 10 * from R where type = “convertible” and color = “red” and maker = “Ford” order by price + mileage asc Roll Up Drill DownOLAP with Ranking?: OLAP with Ranking? Data cube revisited Pre-compute multi-dimensional group-bys Traditional measures: SUM, COUNT, AVG Materializing all top-k results is not feasible Different k values Various ranking functions e.g., order by (price-20k)^2 + (mileage-10k)^2 asc Our Proposal: Ranking Cube Semi-online computation with semi-offline materialization Support a broad class of ranking functionsMore on Rank-Aware Data Cube: More on Rank-Aware Data Cube Given a relation R A1, A2, …, As are selection dimensions N1,N2,…,Nr are ranking dimensions {Ai} and {Nj} are not exclusive Our goal: efficiently answering top-k queries in a multi-dimensional space Ranking function f(x) satisfies: Given the sub-domain of x, the extreme point x* can be computed Given a sub-domain of x, the upper and lower bounds of f(x) can be computedRank-Aware Query Processing: Rank-Aware Query Processing Rank-aware materialization for linear functions Onion [Chang et al, SIGMOD’00], PREFER [Hristidis et al. SIGMOD’01], Robust Indexing [Xin et al. VLDB’06] Rank-aware query transformation Map rank query to range query [Chaudhuri et al. VLDB’99, Bruno et al. TODS’02] Rank-aware query optimization TA [Fagin et al. PODS’ 01], RankSQL [Li et al. SIGMOD’05], Boolean+Ranking [Zhang et al. SIGMOD’06] Rank aggregate RankAgg [Li et al. SIGMOD’06], ObjectFinder [Kaushik et al. SIGMOD’06] Rank query with Joins Ranked Join indices [Tsaparas et al. ICDE’03], Rank-Join [Ilyas et al, VLDB’03, SIGMOD ’04] And more…What’s New with Ranking Cube: What’s New with Ranking Cube An effort made to enrich the data cube Inherit the power of multi-dimensional analysis A new rank-aware materialization without assuming particular (e.g., linear) function Top-k query processing based on Rank Cube Transfer a top-k query to a sequence of selection queries Block-level access instead of tuple-level access No modification needed in DBMSOutline : Outline Introduction Ranking cube Answering top-k queries by Ranking Cube Ranking Fragments Performance study Discussion and Conclusions Ranking Cube: Ranking Cube Intuition Given a ranking function, the ranking cube should be able to: Quickly locate the most promising data region How many tuples are there, and which tuples? Efficient data retrieval Approach Step 1: Create logical block space for rank analysis Group geometrically closed tuples into blocks by data partitioning Equi-depth R-tree Clustering Compute (logical) block ID for each block The logical block space constitutes the basis for data cubingRanking Cube (cont.): Ranking Cube (cont.) Approach (cont.) Step 1: Create logical block space for rank analysis Each tuple is associated with a (logical) block ID Step 2: Compute measures in ranking cube Group-by with selection dimensions Straight-forward measure: logical block IDs, as well as the list of tuple ID (TID) inside Alternative measure: Compressed version (will discuss later) Step 3: Create physical block space for efficient I/O The size of the logical block differs in each cuboids due to the multi-dimensional selections Group nearby logical blocks into a physical block for efficient data retrievalConstructing Ranking Cube: Constructing Ranking Cube Expected Logical Block Size P Measure in Ranking Cube A cell in ranking cube Generating Logical Block Dimension A1,A2: Selection Dimensions N1,N2: Ranking Dimensions Create Logical Block Space N1 N2 Data Cubing Table for data cubing Block tableConstructing Ranking Cube (cont.): Constructing Ranking Cube (cont.) The sizes of TID list in different cuboids are not balanced due to the different cardinality of each dimension Physical block: Merge nearby logical blocks Logical block: Original block partitions Physical BlockOutline : Outline Introduction Ranking cube Answering top-k queries by Ranking Cube Ranking Fragments Performance study Discussion and Conclusions Query Processing (1): Query Processing (1) Data access methods Get physical block from Ranking Cube: Clustered index on Cell identifiers (A1, A2, B’) Get logical block from Block Table: Clustered index on BQuery Processing (2): Query Processing (2) Locate the first logical block (b1) The target physical block (t1, t3, t4) is retrieved (t1,t4) is returned, t3 is buffered Query processing works on logical block space Data accessing works on physical block space Ranking Cube maintains the mapping between logical block and physical block S list: maintains the current top answers H list: maintains the best possible unseen answers Locate the second logical block (b5) The target physical block (t1, t3, t4) is identified t3 was buffered, thus is directed returned Select top 2 * from R where A1=1 and A2=1 order by N1+N2 ascQuery Processing (3): Query Processing (3) Determine next logical blocks to be retrieved First logical block: analyzed by ranking function Continuing logical blocks Found in neighboring blocks (for convex functions) Decompose the space and analyze each of them (for other functions) Ranking Function: N1+N2 First Block Second Block Ranking Function: (N1-0.5)^2+(N2-0.5)^2 First Block Second BlockOutline : Outline Introduction Ranking cube Answering top-k queries by Ranking Cube Ranking Fragments Performance study Discussion and Conclusions Ranking Fragments (1): Ranking Fragments (1) Curse of dimensionality? ABCD ABC ABD ACD BCD AC BC AD BD CD A D B C AB Partition dimensions into several groups Materialize low dimensional cuboids offline Assembly high dimensional cuboids online Mining Cube Approach [Li et al, VLDB’04]Ranking Fragments (2): Ranking Fragments (2) Materialized: Cuboids A1, Cuboids A2 To Assemble: Cuboids A1A2 Do not assemble the whole cuboids Assemble the required cells only Requested Cell in Cuboids A1A2 Cuboids A1: Request (A1=1, B=b1), return { t1, t4, t10,…} Cuboids A2: Request (A2=1, B=b1), return {t1, t4, t3,…} Merge two lists for cuboids A1A2: Request (A1=1,A2=1, B=b1), return { t1, t4,} Ranking Cube: Beyond the index: Decouple the cubing and partitioning modules Unified logical block space for all cuboids Reduces computation and space comparing with m-dim indices Makes the online fragment assembly easier Advanced partitioning methods for high-dimensional and structural data Compressing TID list Lossless compression: e.g., dictionary encoding, null suppression Lossy compression: e.g., bloom filter High-level summary: e.g., count (mean) in each logical block Compressing across cuboids: e.g., correlation between cells Block-level data access instead of tuple-level access Ranking Cube: Beyond the indexOutline : Outline Introduction Ranking cube Answering top-k queries by Ranking Cube Ranking Fragments Performance study Discussion and Conclusions Experimental Results: Experimental Results Performance study Baseline: SQL Server Index on each dimension Query transformation [Chaudhuri et al. VLDB’99] Transform a ranked query to a range selection query Multi-dimensional index Ranking cube (fragment) Index on cube cells (cuboids) Index on block ids (block tables)Experiment Setting: Experiment Setting Implementation details C# + MS SQL Server 2005 Store all ranking cubes, block tables in SQL Server Synthetic data sets Real data set: Forest CoverType 12 selection dimensions, 3 ranking dimensions, 3.5M tuples Execution Time w.r.t. K: Execution Time w.r.t. K Synthetic data Default parameter setting# Dimensions in Ranking Function: # Dimensions in Ranking Function Synthetic data with 4 ranking dimensions Partitioning was built on all 4 ranking dimensions The number of ranking attributes in queries are varied from 2 to 4 Number of Data Tuples: Number of Data Tuples Vary the size of the database from 1M to 10M Very promising performance on large datasets Ranking Fragments: Ranking Fragments Forest CoverType data Partition selection dimensions into groups with size 3 Build ranking fragments on each group Vary the fragment sizeSpace Usage: Space Usage 1. Space usage grows linearly with number of selection dimensions 2. Most space is used to store the cell identifiers in the relational table 3. The space usage can be greatly reduced by storing the ranking cube out of the relational table Build ranking fragments with group size 2Conclusions and Future work: Conclusions and Future work OLAP with Ranking Ranking Cube as semi-offline materialization Ranked query processing by semi-online computation Current status Extended to multi-relational ranked queries using multi-rank-cube Future work Apply compression techniques Exploit and compare different partitioning strategies Support more query types You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
R13 2 Sigfrid Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 94 Category: Education License: All Rights Reserved Like it (2) Dislike it (0) Added: February 07, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Answering Top-k Queries with Multi-Dimensional Selections: The Ranking Cube Approach: Answering Top-k Queries with Multi-Dimensional Selections: The Ranking Cube Approach Dong Xin, Jiawei Han, Hong Cheng, Xiaolei Li Department of Computer Science University of Illinois at Urbana-Champaign VLDB 2006Outline : Outline Introduction Ranking cube Answering top-k queries by Ranking Cube Ranking Fragments Performance study Discussion and Conclusions Multi-Dimensional Ranking Analysis: Multi-Dimensional Ranking Analysis Consider an online used car database R Type (e.g., sedan, convertible) Maker (e.g., Ford, Hyundai) Color (e.g., red, silver) Price Mileage select top 10 * from R where type = “convertible” order by price + mileage asc select top 10 * from R where type = “convertible” and color = “red” order by price + mileage asc select top 10 * from R where type = “convertible” and color = “red” and maker = “Ford” order by price + mileage asc Roll Up Drill DownOLAP with Ranking?: OLAP with Ranking? Data cube revisited Pre-compute multi-dimensional group-bys Traditional measures: SUM, COUNT, AVG Materializing all top-k results is not feasible Different k values Various ranking functions e.g., order by (price-20k)^2 + (mileage-10k)^2 asc Our Proposal: Ranking Cube Semi-online computation with semi-offline materialization Support a broad class of ranking functionsMore on Rank-Aware Data Cube: More on Rank-Aware Data Cube Given a relation R A1, A2, …, As are selection dimensions N1,N2,…,Nr are ranking dimensions {Ai} and {Nj} are not exclusive Our goal: efficiently answering top-k queries in a multi-dimensional space Ranking function f(x) satisfies: Given the sub-domain of x, the extreme point x* can be computed Given a sub-domain of x, the upper and lower bounds of f(x) can be computedRank-Aware Query Processing: Rank-Aware Query Processing Rank-aware materialization for linear functions Onion [Chang et al, SIGMOD’00], PREFER [Hristidis et al. SIGMOD’01], Robust Indexing [Xin et al. VLDB’06] Rank-aware query transformation Map rank query to range query [Chaudhuri et al. VLDB’99, Bruno et al. TODS’02] Rank-aware query optimization TA [Fagin et al. PODS’ 01], RankSQL [Li et al. SIGMOD’05], Boolean+Ranking [Zhang et al. SIGMOD’06] Rank aggregate RankAgg [Li et al. SIGMOD’06], ObjectFinder [Kaushik et al. SIGMOD’06] Rank query with Joins Ranked Join indices [Tsaparas et al. ICDE’03], Rank-Join [Ilyas et al, VLDB’03, SIGMOD ’04] And more…What’s New with Ranking Cube: What’s New with Ranking Cube An effort made to enrich the data cube Inherit the power of multi-dimensional analysis A new rank-aware materialization without assuming particular (e.g., linear) function Top-k query processing based on Rank Cube Transfer a top-k query to a sequence of selection queries Block-level access instead of tuple-level access No modification needed in DBMSOutline : Outline Introduction Ranking cube Answering top-k queries by Ranking Cube Ranking Fragments Performance study Discussion and Conclusions Ranking Cube: Ranking Cube Intuition Given a ranking function, the ranking cube should be able to: Quickly locate the most promising data region How many tuples are there, and which tuples? Efficient data retrieval Approach Step 1: Create logical block space for rank analysis Group geometrically closed tuples into blocks by data partitioning Equi-depth R-tree Clustering Compute (logical) block ID for each block The logical block space constitutes the basis for data cubingRanking Cube (cont.): Ranking Cube (cont.) Approach (cont.) Step 1: Create logical block space for rank analysis Each tuple is associated with a (logical) block ID Step 2: Compute measures in ranking cube Group-by with selection dimensions Straight-forward measure: logical block IDs, as well as the list of tuple ID (TID) inside Alternative measure: Compressed version (will discuss later) Step 3: Create physical block space for efficient I/O The size of the logical block differs in each cuboids due to the multi-dimensional selections Group nearby logical blocks into a physical block for efficient data retrievalConstructing Ranking Cube: Constructing Ranking Cube Expected Logical Block Size P Measure in Ranking Cube A cell in ranking cube Generating Logical Block Dimension A1,A2: Selection Dimensions N1,N2: Ranking Dimensions Create Logical Block Space N1 N2 Data Cubing Table for data cubing Block tableConstructing Ranking Cube (cont.): Constructing Ranking Cube (cont.) The sizes of TID list in different cuboids are not balanced due to the different cardinality of each dimension Physical block: Merge nearby logical blocks Logical block: Original block partitions Physical BlockOutline : Outline Introduction Ranking cube Answering top-k queries by Ranking Cube Ranking Fragments Performance study Discussion and Conclusions Query Processing (1): Query Processing (1) Data access methods Get physical block from Ranking Cube: Clustered index on Cell identifiers (A1, A2, B’) Get logical block from Block Table: Clustered index on BQuery Processing (2): Query Processing (2) Locate the first logical block (b1) The target physical block (t1, t3, t4) is retrieved (t1,t4) is returned, t3 is buffered Query processing works on logical block space Data accessing works on physical block space Ranking Cube maintains the mapping between logical block and physical block S list: maintains the current top answers H list: maintains the best possible unseen answers Locate the second logical block (b5) The target physical block (t1, t3, t4) is identified t3 was buffered, thus is directed returned Select top 2 * from R where A1=1 and A2=1 order by N1+N2 ascQuery Processing (3): Query Processing (3) Determine next logical blocks to be retrieved First logical block: analyzed by ranking function Continuing logical blocks Found in neighboring blocks (for convex functions) Decompose the space and analyze each of them (for other functions) Ranking Function: N1+N2 First Block Second Block Ranking Function: (N1-0.5)^2+(N2-0.5)^2 First Block Second BlockOutline : Outline Introduction Ranking cube Answering top-k queries by Ranking Cube Ranking Fragments Performance study Discussion and Conclusions Ranking Fragments (1): Ranking Fragments (1) Curse of dimensionality? ABCD ABC ABD ACD BCD AC BC AD BD CD A D B C AB Partition dimensions into several groups Materialize low dimensional cuboids offline Assembly high dimensional cuboids online Mining Cube Approach [Li et al, VLDB’04]Ranking Fragments (2): Ranking Fragments (2) Materialized: Cuboids A1, Cuboids A2 To Assemble: Cuboids A1A2 Do not assemble the whole cuboids Assemble the required cells only Requested Cell in Cuboids A1A2 Cuboids A1: Request (A1=1, B=b1), return { t1, t4, t10,…} Cuboids A2: Request (A2=1, B=b1), return {t1, t4, t3,…} Merge two lists for cuboids A1A2: Request (A1=1,A2=1, B=b1), return { t1, t4,} Ranking Cube: Beyond the index: Decouple the cubing and partitioning modules Unified logical block space for all cuboids Reduces computation and space comparing with m-dim indices Makes the online fragment assembly easier Advanced partitioning methods for high-dimensional and structural data Compressing TID list Lossless compression: e.g., dictionary encoding, null suppression Lossy compression: e.g., bloom filter High-level summary: e.g., count (mean) in each logical block Compressing across cuboids: e.g., correlation between cells Block-level data access instead of tuple-level access Ranking Cube: Beyond the indexOutline : Outline Introduction Ranking cube Answering top-k queries by Ranking Cube Ranking Fragments Performance study Discussion and Conclusions Experimental Results: Experimental Results Performance study Baseline: SQL Server Index on each dimension Query transformation [Chaudhuri et al. VLDB’99] Transform a ranked query to a range selection query Multi-dimensional index Ranking cube (fragment) Index on cube cells (cuboids) Index on block ids (block tables)Experiment Setting: Experiment Setting Implementation details C# + MS SQL Server 2005 Store all ranking cubes, block tables in SQL Server Synthetic data sets Real data set: Forest CoverType 12 selection dimensions, 3 ranking dimensions, 3.5M tuples Execution Time w.r.t. K: Execution Time w.r.t. K Synthetic data Default parameter setting# Dimensions in Ranking Function: # Dimensions in Ranking Function Synthetic data with 4 ranking dimensions Partitioning was built on all 4 ranking dimensions The number of ranking attributes in queries are varied from 2 to 4 Number of Data Tuples: Number of Data Tuples Vary the size of the database from 1M to 10M Very promising performance on large datasets Ranking Fragments: Ranking Fragments Forest CoverType data Partition selection dimensions into groups with size 3 Build ranking fragments on each group Vary the fragment sizeSpace Usage: Space Usage 1. Space usage grows linearly with number of selection dimensions 2. Most space is used to store the cell identifiers in the relational table 3. The space usage can be greatly reduced by storing the ranking cube out of the relational table Build ranking fragments with group size 2Conclusions and Future work: Conclusions and Future work OLAP with Ranking Ranking Cube as semi-offline materialization Ranked query processing by semi-online computation Current status Extended to multi-relational ranked queries using multi-rank-cube Future work Apply compression techniques Exploit and compare different partitioning strategies Support more query types