logging in or signing up s970 burdick Stella Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 66 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: February 05, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript OLAP Over Uncertain and Imprecise Data: OLAP Over Uncertain and Imprecise Data T.S. Jayram (IBM Almaden) with Doug Burdick (Wisconsin), Prasad Deshpande (IBM), Raghu Ramakrishnan (Wisconsin), Shivakumar Vaithyanathan (IBM)Dimensions in OLAP: CA MA NY TX East West All Location Civic Sierra F150 Camry Truck Sedan All Automobile Dimensions in OLAPMeasures, Facts, and Queries: Auto = Truck Loc = East SUM(Repair) = ? Measures, Facts, and Queries MA NY TX CA West East ALL Civic Sierra F150 Camry Truck Sedan ALL Automobile p1 Auto = F150 Loc = NY Repair = $200 LocationSlide4: Extend the OLAP model to handle data ambiguity Imprecision UncertaintyImprecision: MA NY TX CA West East ALL Location Civic Sierra F150 Camry Truck Sedan ALL Automobile p1 p2 p3 p4 p5 p6 p7 p8 Auto = F150 Loc = East Repair = $200 p9 p10 Imprecision p11Representing Imprecision using Dimension Hierarchies : Representing Imprecision using Dimension Hierarchies Dimension hierarchies lead to a natural space of “partially specified” objects Sources of imprecision: incomplete data, multiple sources of dataMotivating Example: Sierra F150 Truck MA NY East p5 Motivating Example Query: COUNTDesideratum I: Consistency: Desideratum I: Consistency Consistency specifies the relationship between answers to related queries on a fixed data set Sierra F150 Truck MA NY East p5 Desideratum II: Faithfulness: Desideratum II: Faithfulness Faithfulness specifies the relationship between answers to a fixed query on related data sets Sierra F150 MA NY Data Set 1 Data Set 2 Data Set 3Slide10: Formal definitions of both Consistency and Faithfulness depend on the underlying aggregation operator Can we define query semantics that satisfy these desiderata?Slide11: p1 p2 Query Semantics Possible Worlds [Kripke63,…] p4 p1 p3 p5 p2 p1 p3 p4 p5 p2 p4 p1 p3 p5 p2 w1 w2 w3 w4Possible Worlds Query Semantics: Possible Worlds Query Semantics Given all possible worlds together with their probabilities, queries are easily answered (using expected values) But number of possible worlds is exponential!Allocation: Allocation Allocation gives facts weighted assignments to possible completions, leading to an extended version of the data Size increase is linear in number of (completions of) imprecise facts Queries operate over this extended version Key contributions: Appropriate characterization of the large space of allocation policies Designing efficient allocation policies that take into account the correlations in the dataStoring Allocations using Extended Data Model: Storing Allocations using Extended Data Model p1 p2 Truck EastClassifying Allocation Policies: Classifying Allocation Policies Ignored Used Ignored Used Uniform EM Count Measure Correlation Dimension CorrelationResults on Query Semantics: Results on Query Semantics Evaluating queries over extended version of data yields expected value of the aggregation operator over all possible worlds intuitively, the correct value to compute Efficient query evaluation algorithms for SUM, COUNT consistency and faithfulness for SUM, COUNT are satisfied under appropriate conditions Dynamic programming algorithm for AVERAGE Unfortunately, consistency does not hold for AVERAGEAlternative Semantics for AVERAGE: Alternative Semantics for AVERAGE APPROXIMATE AVERAGE E[SUM] / E[COUNT] instead of E[SUM/COUNT] simpler and more efficient satisfies consistency extends to aggregation operators for uncertain measuresUncertainty: Uncertainty Measure value is modeled as a probability distribution function over some base domain e.g., measure Brake is a pdf over values {Yes,No} sources of uncertainty: measures extracted from text using classifiers Adapt well-known concepts from statistics to derive appropriate aggregation operators Our framework and solutions for dealing with imprecision also extend to uncertain measuresSummary: Summary Consistency and faithfulness desiderata for designing query semantics for imprecise data Allocation is the key to our framework Efficient algorithms for aggregation operators with appropriate guarantees of consistency and faithfulness Iterative algorithms for allocation policiesCorrelation-based Allocation: Correlation-based Allocation Involves defining an objective function to capture some underlying correlation structure a more stringent requirement on the allocations solving the resulting optimization problem yields the allocations EM-based iterative allocation policy interesting highlight: allocations are re-scaled iteratively by computing appropriate aggregations You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
s970 burdick Stella Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 66 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: February 05, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript OLAP Over Uncertain and Imprecise Data: OLAP Over Uncertain and Imprecise Data T.S. Jayram (IBM Almaden) with Doug Burdick (Wisconsin), Prasad Deshpande (IBM), Raghu Ramakrishnan (Wisconsin), Shivakumar Vaithyanathan (IBM)Dimensions in OLAP: CA MA NY TX East West All Location Civic Sierra F150 Camry Truck Sedan All Automobile Dimensions in OLAPMeasures, Facts, and Queries: Auto = Truck Loc = East SUM(Repair) = ? Measures, Facts, and Queries MA NY TX CA West East ALL Civic Sierra F150 Camry Truck Sedan ALL Automobile p1 Auto = F150 Loc = NY Repair = $200 LocationSlide4: Extend the OLAP model to handle data ambiguity Imprecision UncertaintyImprecision: MA NY TX CA West East ALL Location Civic Sierra F150 Camry Truck Sedan ALL Automobile p1 p2 p3 p4 p5 p6 p7 p8 Auto = F150 Loc = East Repair = $200 p9 p10 Imprecision p11Representing Imprecision using Dimension Hierarchies : Representing Imprecision using Dimension Hierarchies Dimension hierarchies lead to a natural space of “partially specified” objects Sources of imprecision: incomplete data, multiple sources of dataMotivating Example: Sierra F150 Truck MA NY East p5 Motivating Example Query: COUNTDesideratum I: Consistency: Desideratum I: Consistency Consistency specifies the relationship between answers to related queries on a fixed data set Sierra F150 Truck MA NY East p5 Desideratum II: Faithfulness: Desideratum II: Faithfulness Faithfulness specifies the relationship between answers to a fixed query on related data sets Sierra F150 MA NY Data Set 1 Data Set 2 Data Set 3Slide10: Formal definitions of both Consistency and Faithfulness depend on the underlying aggregation operator Can we define query semantics that satisfy these desiderata?Slide11: p1 p2 Query Semantics Possible Worlds [Kripke63,…] p4 p1 p3 p5 p2 p1 p3 p4 p5 p2 p4 p1 p3 p5 p2 w1 w2 w3 w4Possible Worlds Query Semantics: Possible Worlds Query Semantics Given all possible worlds together with their probabilities, queries are easily answered (using expected values) But number of possible worlds is exponential!Allocation: Allocation Allocation gives facts weighted assignments to possible completions, leading to an extended version of the data Size increase is linear in number of (completions of) imprecise facts Queries operate over this extended version Key contributions: Appropriate characterization of the large space of allocation policies Designing efficient allocation policies that take into account the correlations in the dataStoring Allocations using Extended Data Model: Storing Allocations using Extended Data Model p1 p2 Truck EastClassifying Allocation Policies: Classifying Allocation Policies Ignored Used Ignored Used Uniform EM Count Measure Correlation Dimension CorrelationResults on Query Semantics: Results on Query Semantics Evaluating queries over extended version of data yields expected value of the aggregation operator over all possible worlds intuitively, the correct value to compute Efficient query evaluation algorithms for SUM, COUNT consistency and faithfulness for SUM, COUNT are satisfied under appropriate conditions Dynamic programming algorithm for AVERAGE Unfortunately, consistency does not hold for AVERAGEAlternative Semantics for AVERAGE: Alternative Semantics for AVERAGE APPROXIMATE AVERAGE E[SUM] / E[COUNT] instead of E[SUM/COUNT] simpler and more efficient satisfies consistency extends to aggregation operators for uncertain measuresUncertainty: Uncertainty Measure value is modeled as a probability distribution function over some base domain e.g., measure Brake is a pdf over values {Yes,No} sources of uncertainty: measures extracted from text using classifiers Adapt well-known concepts from statistics to derive appropriate aggregation operators Our framework and solutions for dealing with imprecision also extend to uncertain measuresSummary: Summary Consistency and faithfulness desiderata for designing query semantics for imprecise data Allocation is the key to our framework Efficient algorithms for aggregation operators with appropriate guarantees of consistency and faithfulness Iterative algorithms for allocation policiesCorrelation-based Allocation: Correlation-based Allocation Involves defining an objective function to capture some underlying correlation structure a more stringent requirement on the allocations solving the resulting optimization problem yields the allocations EM-based iterative allocation policy interesting highlight: allocations are re-scaled iteratively by computing appropriate aggregations