logging in or signing up dw olap FunSchool Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 2357 Category: Education License: All Rights Reserved Like it (2) Dislike it (0) Added: June 18, 2007 This Presentation is Public Favorites: 2 Presentation Description No description available. Comments Posting comment... By: atabari (43 month(s) ago) Dear Sir, . I'm doing a research on the structure of OLAP systems . So I want to know if I can have a copy of your useful presentation about OLAP . My email is : ali.tabari_univer@yahoo.com Thank you in advance. Saving..... Post Reply Close Saving..... Edit Comment Close Premium member Presentation Transcript Data Warehousing andOLAP: Data Warehousing and OLAP Hector Garcia-Molina Stanford University Warehousing: Warehousing Growing industry: $8 billion in 1998 Range from desktop to huge: Walmart: 900-CPU, 2,700 disk, 23TB Teradata system Lots of buzzwords, hype slice andamp; dice, rollup, MOLAP, pivot, ... Outline: Outline What is a data warehouse? Why a warehouse? Models andamp; operations Implementing a warehouse Future directions What is a Warehouse?: What is a Warehouse? Collection of diverse data subject oriented aimed at executive, decision maker often a copy of operational data with value-added data (e.g., summaries, history) integrated time-varying non-volatile What is a Warehouse?: What is a Warehouse? Collection of tools gathering data cleansing, integrating, ... querying, reporting, analysis data mining monitoring, administering warehouse Warehouse Architecture: Warehouse Architecture Metadata Why a Warehouse?: Why a Warehouse? Two Approaches: Query-Driven (Lazy) Warehouse (Eager) Query-Driven Approach: Query-Driven Approach Advantages of Warehousing: Advantages of Warehousing High query performance Queries not visible outside warehouse Local processing at sources unaffected Can operate when sources unavailable Can query data not stored in a DBMS Extra information at warehouse Modify, summarize (store aggregates) Add historical information Advantages of Query-Driven: Advantages of Query-Driven No need to copy data less storage no need to purchase data More up-to-date data Query needs can be unknown Only query interface needed at sources May be less draining on sources OLTP vs. OLAP: OLTP vs. OLAP OLTP: On Line Transaction Processing Describes processing at operational sites OLAP: On Line Analytical Processing Describes processing at warehouse OLTP vs. OLAP: OLTP vs. OLAP Mostly updates Many small transactions Mb-Tb of data Raw data Clerical users Up-to-date data Consistency, recoverability critical Mostly reads Queries long, complex Gb-Tb of data Summarized, consolidated data Decision-makers, analysts as users OLTP OLAP Data Marts: Data Marts Smaller warehouses Spans part of organization e.g., marketing (customers, products, sales) Do not require enterprise-wide consensus but long term integration problems? Warehouse Models & Operators: Warehouse Models andamp; Operators Data Models relations stars andamp; snowflakes cubes Operators slice andamp; dice roll-up, drill down pivoting other Star: Star Star Schema: Star Schema Terms: Terms Fact table Dimension tables Measures Dimension Hierarchies: Dimension Hierarchies store sType city region è snowflake schema è constellations Cube: Cube Fact table view: Multi-dimensional cube: dimensions = 2 3-D Cube: 3-D Cube dimensions = 3 Multi-dimensional cube: Fact table view: ROLAP vs. MOLAP: ROLAP vs. MOLAP ROLAP: Relational On-Line Analytical Processing MOLAP: Multi-Dimensional On-Line Analytical Processing Aggregates: Aggregates Add up amounts for day 1 In SQL: SELECT sum(amt) FROM SALE WHERE date = 1 81 Aggregates: Aggregates Add up amounts by day In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date Another Example: Another Example Add up amounts by day, product In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date, prodId drill-down rollup Aggregates: Aggregates Operators: sum, count, max, min, median, ave 'Having' clause Using dimension hierarchy average by region (within store) maximum by month (within date) Cube Aggregation: Cube Aggregation day 2 day 1 129 . . . Example: computing sums Cube Operators: Cube Operators day 2 day 1 129 . . . sale(c1,*,*) sale(*,*,*) sale(c2,p2,*) Extended Cube: Extended Cube day 2 day 1 * sale(*,p2,*) Aggregation Using Hierarchies: Aggregation Using Hierarchies customer region country (customer c1 in Region A; customers c2, c3 in Region B) Pivoting: Pivoting Multi-dimensional cube: Fact table view: Implementing a Warehouse: Implementing a Warehouse Monitoring: Sending data from sources Integrating: Loading, cleansing,... Processing: Query processing, indexing, ... Managing: Metadata, Design, ... Monitoring: Monitoring Source Types: relational, flat file, IMS, VSAM, IDMS, WWW, news-wire, … Incremental vs. Refresh Monitoring Techniques: Monitoring Techniques Periodic snapshots Database triggers Log shipping Data shipping (replication service) Transaction shipping Polling (queries to source) Screen scraping Application level monitoring è Advantages andamp; Disadvantages!! Monitoring Issues: Monitoring Issues Frequency periodic: daily, weekly, … triggered: on 'big' change, lots of changes, ... Data transformation convert data to uniform format remove andamp; add fields (e.g., add date to get history) Standards (e.g., ODBC) Gateways Integration: Integration Data Cleaning Data Loading Derived Data Data Cleaning: Data Cleaning Migration (e.g., yen ð dollars) Scrubbing: use domain-specific knowledge (e.g., social security numbers) Fusion (e.g., mail list, customer merging) Auditing: discover rules andamp; relationships (like data mining) Loading Data: Loading Data Incremental vs. refresh Off-line vs. on-line Frequency of loading At night, 1x a week/month, continuously Parallel/Partitioned load Derived Data: Derived Data Derived Warehouse Data indexes aggregates materialized views (next slide) When to update derived data? Incremental vs. refresh Materialized Views: Materialized Views Define new warehouse relations using SQL expressions Processing: Processing ROLAP servers vs. MOLAP servers Index Structures What to Materialize? Algorithms ROLAP Server: ROLAP Server Relational OLAP Server tools Special indices, tuning; Schema is 'denormalized' MOLAP Server: MOLAP Server Multi-Dimensional OLAP Server multi-dimensional server M.D. tools could also sit on relational DBMS Index Structures: Index Structures Traditional Access Methods B-trees, hash tables, R-trees, grids, … Popular in Warehouses inverted lists bit map indexes join indexes text indexes Inverted Lists: Inverted Lists . . . age index inverted lists data records Using Inverted Lists: Using Inverted Lists Query: Get people with age = 20 and name = 'fred' List for age = 20: r4, r18, r34, r35 List for name = 'fred': r18, r52 Answer is intersection: r18 Bit Maps: Bit Maps . . . age index bit maps data records Using Bit Maps: Using Bit Maps Query: Get people with age = 20 and name = 'fred' List for age = 20: 1101100000 List for name = 'fred': 0100000001 Answer is intersection: 010000000000 Good if domain cardinality small Bit vectors can be compressed Join: Join 'Combine' SALE, PRODUCT relations In SQL: SELECT * FROM SALE, PRODUCT Join Indexes: Join Indexes join index What to Materialize?: What to Materialize? Store in warehouse results useful for common queries Example: day 2 day 1 129 . . . total sales materialize Materialization Factors: Materialization Factors Type/frequency of queries Query response time Storage cost Update cost Cube Aggregates Lattice: Cube Aggregates Lattice city, product, date city, product city, date product, date city product date all 129 use greedy algorithm to decide what to materialize Dimension Hierarchies: Dimension Hierarchies all state city Dimension Hierarchies: Dimension Hierarchies city, product city, product, date city, date product, date city product date all state, product, date state, date state, product state not all arcs shown... Interesting Hierarchy: Interesting Hierarchy all years quarters months days weeks conceptual dimension table Design: Design What data is needed? Where does it come from? How to clean data? How to represent in warehouse (schema)? What to summarize? What to materialize? What to index? Tools: Tools Development design andamp; edit: schemas, views, scripts, rules, queries, reports Planning andamp; Analysis what-if scenarios (schema changes, refresh rates), capacity planning Warehouse Management performance monitoring, usage patterns, exception reporting System andamp; Network Management measure traffic (sources, warehouse, clients) Workflow Management 'reliable scripts' for cleaning andamp; analyzing data Current State of Industry: Current State of Industry Extraction and integration done off-line Usually in large, time-consuming, batches Everything copied at warehouse Not selective about what is stored Query benefit vs storage andamp; update cost Query optimization aimed at OLTP High throughput instead of fast response Process whole query before displaying anything You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
dw olap FunSchool Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 2357 Category: Education License: All Rights Reserved Like it (2) Dislike it (0) Added: June 18, 2007 This Presentation is Public Favorites: 2 Presentation Description No description available. Comments Posting comment... By: atabari (43 month(s) ago) Dear Sir, . I'm doing a research on the structure of OLAP systems . So I want to know if I can have a copy of your useful presentation about OLAP . My email is : ali.tabari_univer@yahoo.com Thank you in advance. Saving..... Post Reply Close Saving..... Edit Comment Close Premium member Presentation Transcript Data Warehousing andOLAP: Data Warehousing and OLAP Hector Garcia-Molina Stanford University Warehousing: Warehousing Growing industry: $8 billion in 1998 Range from desktop to huge: Walmart: 900-CPU, 2,700 disk, 23TB Teradata system Lots of buzzwords, hype slice andamp; dice, rollup, MOLAP, pivot, ... Outline: Outline What is a data warehouse? Why a warehouse? Models andamp; operations Implementing a warehouse Future directions What is a Warehouse?: What is a Warehouse? Collection of diverse data subject oriented aimed at executive, decision maker often a copy of operational data with value-added data (e.g., summaries, history) integrated time-varying non-volatile What is a Warehouse?: What is a Warehouse? Collection of tools gathering data cleansing, integrating, ... querying, reporting, analysis data mining monitoring, administering warehouse Warehouse Architecture: Warehouse Architecture Metadata Why a Warehouse?: Why a Warehouse? Two Approaches: Query-Driven (Lazy) Warehouse (Eager) Query-Driven Approach: Query-Driven Approach Advantages of Warehousing: Advantages of Warehousing High query performance Queries not visible outside warehouse Local processing at sources unaffected Can operate when sources unavailable Can query data not stored in a DBMS Extra information at warehouse Modify, summarize (store aggregates) Add historical information Advantages of Query-Driven: Advantages of Query-Driven No need to copy data less storage no need to purchase data More up-to-date data Query needs can be unknown Only query interface needed at sources May be less draining on sources OLTP vs. OLAP: OLTP vs. OLAP OLTP: On Line Transaction Processing Describes processing at operational sites OLAP: On Line Analytical Processing Describes processing at warehouse OLTP vs. OLAP: OLTP vs. OLAP Mostly updates Many small transactions Mb-Tb of data Raw data Clerical users Up-to-date data Consistency, recoverability critical Mostly reads Queries long, complex Gb-Tb of data Summarized, consolidated data Decision-makers, analysts as users OLTP OLAP Data Marts: Data Marts Smaller warehouses Spans part of organization e.g., marketing (customers, products, sales) Do not require enterprise-wide consensus but long term integration problems? Warehouse Models & Operators: Warehouse Models andamp; Operators Data Models relations stars andamp; snowflakes cubes Operators slice andamp; dice roll-up, drill down pivoting other Star: Star Star Schema: Star Schema Terms: Terms Fact table Dimension tables Measures Dimension Hierarchies: Dimension Hierarchies store sType city region è snowflake schema è constellations Cube: Cube Fact table view: Multi-dimensional cube: dimensions = 2 3-D Cube: 3-D Cube dimensions = 3 Multi-dimensional cube: Fact table view: ROLAP vs. MOLAP: ROLAP vs. MOLAP ROLAP: Relational On-Line Analytical Processing MOLAP: Multi-Dimensional On-Line Analytical Processing Aggregates: Aggregates Add up amounts for day 1 In SQL: SELECT sum(amt) FROM SALE WHERE date = 1 81 Aggregates: Aggregates Add up amounts by day In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date Another Example: Another Example Add up amounts by day, product In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date, prodId drill-down rollup Aggregates: Aggregates Operators: sum, count, max, min, median, ave 'Having' clause Using dimension hierarchy average by region (within store) maximum by month (within date) Cube Aggregation: Cube Aggregation day 2 day 1 129 . . . Example: computing sums Cube Operators: Cube Operators day 2 day 1 129 . . . sale(c1,*,*) sale(*,*,*) sale(c2,p2,*) Extended Cube: Extended Cube day 2 day 1 * sale(*,p2,*) Aggregation Using Hierarchies: Aggregation Using Hierarchies customer region country (customer c1 in Region A; customers c2, c3 in Region B) Pivoting: Pivoting Multi-dimensional cube: Fact table view: Implementing a Warehouse: Implementing a Warehouse Monitoring: Sending data from sources Integrating: Loading, cleansing,... Processing: Query processing, indexing, ... Managing: Metadata, Design, ... Monitoring: Monitoring Source Types: relational, flat file, IMS, VSAM, IDMS, WWW, news-wire, … Incremental vs. Refresh Monitoring Techniques: Monitoring Techniques Periodic snapshots Database triggers Log shipping Data shipping (replication service) Transaction shipping Polling (queries to source) Screen scraping Application level monitoring è Advantages andamp; Disadvantages!! Monitoring Issues: Monitoring Issues Frequency periodic: daily, weekly, … triggered: on 'big' change, lots of changes, ... Data transformation convert data to uniform format remove andamp; add fields (e.g., add date to get history) Standards (e.g., ODBC) Gateways Integration: Integration Data Cleaning Data Loading Derived Data Data Cleaning: Data Cleaning Migration (e.g., yen ð dollars) Scrubbing: use domain-specific knowledge (e.g., social security numbers) Fusion (e.g., mail list, customer merging) Auditing: discover rules andamp; relationships (like data mining) Loading Data: Loading Data Incremental vs. refresh Off-line vs. on-line Frequency of loading At night, 1x a week/month, continuously Parallel/Partitioned load Derived Data: Derived Data Derived Warehouse Data indexes aggregates materialized views (next slide) When to update derived data? Incremental vs. refresh Materialized Views: Materialized Views Define new warehouse relations using SQL expressions Processing: Processing ROLAP servers vs. MOLAP servers Index Structures What to Materialize? Algorithms ROLAP Server: ROLAP Server Relational OLAP Server tools Special indices, tuning; Schema is 'denormalized' MOLAP Server: MOLAP Server Multi-Dimensional OLAP Server multi-dimensional server M.D. tools could also sit on relational DBMS Index Structures: Index Structures Traditional Access Methods B-trees, hash tables, R-trees, grids, … Popular in Warehouses inverted lists bit map indexes join indexes text indexes Inverted Lists: Inverted Lists . . . age index inverted lists data records Using Inverted Lists: Using Inverted Lists Query: Get people with age = 20 and name = 'fred' List for age = 20: r4, r18, r34, r35 List for name = 'fred': r18, r52 Answer is intersection: r18 Bit Maps: Bit Maps . . . age index bit maps data records Using Bit Maps: Using Bit Maps Query: Get people with age = 20 and name = 'fred' List for age = 20: 1101100000 List for name = 'fred': 0100000001 Answer is intersection: 010000000000 Good if domain cardinality small Bit vectors can be compressed Join: Join 'Combine' SALE, PRODUCT relations In SQL: SELECT * FROM SALE, PRODUCT Join Indexes: Join Indexes join index What to Materialize?: What to Materialize? Store in warehouse results useful for common queries Example: day 2 day 1 129 . . . total sales materialize Materialization Factors: Materialization Factors Type/frequency of queries Query response time Storage cost Update cost Cube Aggregates Lattice: Cube Aggregates Lattice city, product, date city, product city, date product, date city product date all 129 use greedy algorithm to decide what to materialize Dimension Hierarchies: Dimension Hierarchies all state city Dimension Hierarchies: Dimension Hierarchies city, product city, product, date city, date product, date city product date all state, product, date state, date state, product state not all arcs shown... Interesting Hierarchy: Interesting Hierarchy all years quarters months days weeks conceptual dimension table Design: Design What data is needed? Where does it come from? How to clean data? How to represent in warehouse (schema)? What to summarize? What to materialize? What to index? Tools: Tools Development design andamp; edit: schemas, views, scripts, rules, queries, reports Planning andamp; Analysis what-if scenarios (schema changes, refresh rates), capacity planning Warehouse Management performance monitoring, usage patterns, exception reporting System andamp; Network Management measure traffic (sources, warehouse, clients) Workflow Management 'reliable scripts' for cleaning andamp; analyzing data Current State of Industry: Current State of Industry Extraction and integration done off-line Usually in large, time-consuming, batches Everything copied at warehouse Not selective about what is stored Query benefit vs storage andamp; update cost Query optimization aimed at OLTP High throughput instead of fast response Process whole query before displaying anything