logging in or signing up Chapter 13 - Data Warehousing sumit621 Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT lite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 481 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: September 25, 2010 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... By: saudqu (18 month(s) ago) very usefull presentation.. thanks Saving..... Post Reply Close Saving..... Edit Comment Close By: renuc8 (20 month(s) ago) cool ppt Saving..... Post Reply Close Saving..... Edit Comment Close Premium member Presentation Transcript Chapter 13 – Data Warehousing : Chapter 13 – Data Warehousing Databases : Databases Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age Information, which is created by data, becomes the bases for decision making Decision Support Systems : Decision Support Systems Created to facilitate the decision making process So much information that it is difficult to extract it all from a traditional database Need for a more comprehensive data storage facility Data Warehouse Decision Support Systems : Decision Support Systems Extract Information from data to use as the basis for decision making Used at all levels of the Organization Tailored to specific business areas Interactive Ad Hoc queries to retrieve and display information Combines historical operation data with business activities 4 Components of DSS : 4 Components of DSS Data Store – The DSS Database Business Data Business Model Data Internal and External Data Data Extraction and Filtering Extract and validate data from the operational database and the external data sources 4 Components of DSS : 4 Components of DSS End-User Query Tool Create Queries that access either the Operational or the DSS database End User Presentation Tools Organize and Present the Data Differences with DSS : Differences with DSS Operational Stored in Normalized Relational Database Support transactions that represent daily operations (Not Query Friendly) 3 Main Differences Time Span Granularity Dimensionality Time Span : Time Span Operational Real Time Current Transactions Short Time Frame Specific Data Facts DSS Historic Long Time Frame (Months/Quarters/Years) Patterns Granularity : Granularity Operational Specific Transactions that occur at a given time DSS Shown at different levels of aggregation Different Summary Levels Decompose (drill down) Summarize (roll up) Dimensionality : Dimensionality Most distinguishing characteristic of DSS data Operational Represents atomic transactions DSS Data is related in Many ways Develop the larger picture Multi-dimensional view of data DSS Database Requirements : DSS Database Requirements DSS Database Scheme Support Complex and Non-Normalized data Summarized and Aggregate data Multiple Relationships Queries must extract multi-dimensional time slices Redundant Data DSS Database Requirements : DSS Database Requirements Data Extraction and Filtering DSS databases are created mainly by extracting data from operational databases combined with data imported from external source Need for advanced data extraction & filtering tools Allow batch / scheduled data extraction Support different types of data sources Check for inconsistent data / data validation rules Support advanced data integration / data formatting conflicts DSS Database Requirements : DSS Database Requirements End User Analytical Interface Must support advanced data modeling and data presentation tools Data analysis tools Query generation Must Allow the User to Navigate through the DSS Size Requirements VERY Large – Terabytes Advanced Hardware (Multiple processors, multiple disk arrays, etc.) Data Warehouse : Data Warehouse DSS – friendly data repository for the DSS is the DATA WAREHOUSE Definition: Integrated, Subject-Oriented, Time-Variant, Nonvolatile database that provides support for decision making Integrated : Integrated The data warehouse is a centralized, consolidated database that integrated data derived from the entire organization Multiple Sources Diverse Sources Diverse Formats Subject-Oriented : Subject-Oriented Data is arranged and optimized to provide answer to questions from diverse functional areas Data is organized and summarized by topic Sales / Marketing / Finance / Distribution / Etc. Time-Variant : Time-Variant The Data Warehouse represents the flow of data through time Can contain projected data from statistical models Data is periodically uploaded then time-dependent data is recomputed Nonvolatile : Nonvolatile Once data is entered it is NEVER removed Represents the company’s entire history Near term history is continually added to it Always growing Must support terabyte databases and multiprocessors Read-Only database for data analysis and query processing Data Marts : Data Marts Small Data Stores More manageable data sets Targeted to meet the needs of small groups within the organization Small, Single-Subject data warehouse subset that provides decision support to a small group of people OLAP : OLAP Online Analytical Processing Tools DSS tools that use multidimensional data analysis techniques Support for a DSS data store Data extraction and integration filter Specialized presentation interface 12 Rules of a Data Warehouse : 12 Rules of a Data Warehouse Data Warehouse and Operational Environments are Separated Data is integrated Contains historical data over a long period of time Data is a snapshot data captured at a given point in time Data is subject-oriented 12 Rules of Data Warehouse : 12 Rules of Data Warehouse Mainly read-only with periodic batch updates Development Life Cycle has a data driven approach versus the traditional process-driven approach Data contains several levels of detail Current, Old, Lightly Summarized, Highly Summarized 12 Rules of Data Warehouse : 12 Rules of Data Warehouse Environment is characterized by Read-only transactions to very large data sets System that traces data sources, transformations, and storage Metadata is a critical component Source, transformation, integration, storage, relationships, history, etc Contains a chargeback mechanism for resource usage that enforces optimal use of data by end users OLAP : OLAP Need for More Intensive Decision Support 4 Main Characteristics Multidimensional data analysis Advanced Database Support Easy-to-use end-user interfaces Support Client/Server architecture Multidimensional Data Analysis Techniques : Multidimensional Data Analysis Techniques Advanced Data Presentation Functions 3-D graphics, Pivot Tables, Crosstabs, etc. Compatible with Spreadsheets & Statistical packages Advanced data aggregations, consolidation and classification across time dimensions Advanced computational functions Advanced data modeling functions Advanced Database Support : Advanced Database Support Advanced Data Access Features Access to many kinds of DBMS’s, flat files, and internal and external data sources Access to aggregated data warehouse data Advanced data navigation (drill-downs and roll-ups) Ability to map end-user requests to the appropriate data source Support for Very Large Databases Easy-to-Use End-User Interface : Easy-to-Use End-User Interface Graphical User Interfaces Much more useful if access is kept simple Client/Server Architecture : Client/Server Architecture Framework for the new systems to be designed, developed and implemented Divide the OLAP system into several components that define its architecture Same Computer Distributed among several computer OLAP Architecture : OLAP Architecture 3 Main Modules GUI Analytical Processing Logic Data-processing Logic Slide 30: OLAP Client/Server Architecture Relational OLAP : Relational OLAP Relational Online Analytical Processing OLAP functionality using relational database and familiar query tools to store and analyze multidimensional data Multidimensional data schema support Data access language & query performance for multidimensional data Support for Very Large Databases Multidimensional Data Schema Support : Multidimensional Data Schema Support Decision Support Data tends to be Nonnormalized Duplicated Preaggregated Star Schema Special Design technique for multidimensional data representations Optimize data query operations instead of data update operations Star Schemas : Star Schemas Data Modeling Technique to map multidimensional decision support data into a relational database Current Relational modeling techniques do not serve the needs of advanced data requirements Star Schema : Star Schema 4 Components Facts Dimensions Attributes Attribute Hierarchies Facts : Facts Numeric measurements (values) that represent a specific business aspect or activity Stored in a fact table at the center of the star scheme Contains facts that are linked through their dimensions Can be computed or derived at run time Updated periodically with data from operational databases Dimensions : Dimensions Qualifying characteristics that provide additional perspectives to a given fact DSS data is almost always viewed in relation to other data Dimensions are normally stored in dimension tables Attributes : Attributes Dimension Tables contain Attributes Attributes are used to search, filter, or classify facts Dimensions provide descriptive characteristics about the facts through their attributed Must define common business attributes that will be used to narrow a search, group information, or describe dimensions. (ex.: Time / Location / Product) No mathematical limit to the number of dimensions (3-D makes it easy to model) Attribute Hierarchies : Attribute Hierarchies Provides a Top-Down data organization Aggregation Drill-down / Roll-Up data analysis Attributes from different dimensions can be grouped to form a hierarchy Star Schema for Sales : Star Schema for Sales Fact Table Dimension Tables Star Schema Representation : Star Schema Representation Fact and Dimensions are represented by physical tables in the data warehouse database Fact tables are related to each dimension table in a Many to One relationship (Primary/Foreign Key Relationships) Fact Table is related to many dimension tables The primary key of the fact table is a composite primary key from the dimension tables Each fact table is designed to answer a specific DSS question Star Schema : Star Schema The fact table is always the larges table in the star schema Each dimension record is related to thousand of fact records Star Schema facilitated data retrieval functions DBMS first searches the Dimension Tables before the larger fact table Data Warehouse Implementation : Data Warehouse Implementation An Active Decision Support Framework Not a Static Database Always a Work in Process Complete Infrastructure for Company-Wide decision support Hardware / Software / People / Procedures / Data Data Warehouse is a critical component of the Modern DSS – But not the Only critical component Data Mining : Data Mining Discover Previously unknown data characteristics, relationships, dependencies, or trends Typical Data Analysis Relies on end users Define the Problem Select the Data Initial the Data Analysis Reacts to External Stimulus Data Mining : Data Mining Proactive Automatically searches Anomalies Possible Relationships Identify Problems before the end-user Data Mining tools analyze the data, uncover problems or opportunities hidden in data relationships, form computer models based on their findings, and then user the models to predict business behavior – with minimal end-user intervention Data Mining : Data Mining A methodology designed to perform knowledge-discovery expeditions over the database data with minimal end-user intervention 3 Stages of Data Data Information Knowledge Extraction of Knowledge from Data : Extraction of Knowledge from Data 4 Phases of Data Mining : 4 Phases of Data Mining Data Preparation Identify the main data sets to be used by the data mining operation (usually the data warehouse) Data Analysis and Classification Study the data to identify common data characteristics or patterns Data groupings, classifications, clusters, sequences Data dependencies, links, or relationships Data patterns, trends, deviation 4 Phases of Data Mining : 4 Phases of Data Mining Knowledge Acquisition Uses the Results of the Data Analysis and Classification phase Data mining tool selects the appropriate modeling or knowledge-acquisition algorithms Neural Networks Decision Trees Rules Induction Genetic algorithms Memory-Based Reasoning Prognosis Predict Future Behavior Forecast Business Outcomes 65% of customers who did not use a particular credit card in the last 6 months are 88% likely to cancel the account. Data Mining : Data Mining Still a New Technique May find many Unmeaningful Relationships Good at finding Practical Relationships Define Customer Buying Patterns Improve Product Development and Acceptance Etc. Potential of becoming the next frontier in database development You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
Chapter 13 - Data Warehousing sumit621 Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT lite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 481 Category: Education License: All Rights Reserved Like it (0) Dislike it (0) Added: September 25, 2010 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... By: saudqu (18 month(s) ago) very usefull presentation.. thanks Saving..... Post Reply Close Saving..... Edit Comment Close By: renuc8 (20 month(s) ago) cool ppt Saving..... Post Reply Close Saving..... Edit Comment Close Premium member Presentation Transcript Chapter 13 – Data Warehousing : Chapter 13 – Data Warehousing Databases : Databases Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age Information, which is created by data, becomes the bases for decision making Decision Support Systems : Decision Support Systems Created to facilitate the decision making process So much information that it is difficult to extract it all from a traditional database Need for a more comprehensive data storage facility Data Warehouse Decision Support Systems : Decision Support Systems Extract Information from data to use as the basis for decision making Used at all levels of the Organization Tailored to specific business areas Interactive Ad Hoc queries to retrieve and display information Combines historical operation data with business activities 4 Components of DSS : 4 Components of DSS Data Store – The DSS Database Business Data Business Model Data Internal and External Data Data Extraction and Filtering Extract and validate data from the operational database and the external data sources 4 Components of DSS : 4 Components of DSS End-User Query Tool Create Queries that access either the Operational or the DSS database End User Presentation Tools Organize and Present the Data Differences with DSS : Differences with DSS Operational Stored in Normalized Relational Database Support transactions that represent daily operations (Not Query Friendly) 3 Main Differences Time Span Granularity Dimensionality Time Span : Time Span Operational Real Time Current Transactions Short Time Frame Specific Data Facts DSS Historic Long Time Frame (Months/Quarters/Years) Patterns Granularity : Granularity Operational Specific Transactions that occur at a given time DSS Shown at different levels of aggregation Different Summary Levels Decompose (drill down) Summarize (roll up) Dimensionality : Dimensionality Most distinguishing characteristic of DSS data Operational Represents atomic transactions DSS Data is related in Many ways Develop the larger picture Multi-dimensional view of data DSS Database Requirements : DSS Database Requirements DSS Database Scheme Support Complex and Non-Normalized data Summarized and Aggregate data Multiple Relationships Queries must extract multi-dimensional time slices Redundant Data DSS Database Requirements : DSS Database Requirements Data Extraction and Filtering DSS databases are created mainly by extracting data from operational databases combined with data imported from external source Need for advanced data extraction & filtering tools Allow batch / scheduled data extraction Support different types of data sources Check for inconsistent data / data validation rules Support advanced data integration / data formatting conflicts DSS Database Requirements : DSS Database Requirements End User Analytical Interface Must support advanced data modeling and data presentation tools Data analysis tools Query generation Must Allow the User to Navigate through the DSS Size Requirements VERY Large – Terabytes Advanced Hardware (Multiple processors, multiple disk arrays, etc.) Data Warehouse : Data Warehouse DSS – friendly data repository for the DSS is the DATA WAREHOUSE Definition: Integrated, Subject-Oriented, Time-Variant, Nonvolatile database that provides support for decision making Integrated : Integrated The data warehouse is a centralized, consolidated database that integrated data derived from the entire organization Multiple Sources Diverse Sources Diverse Formats Subject-Oriented : Subject-Oriented Data is arranged and optimized to provide answer to questions from diverse functional areas Data is organized and summarized by topic Sales / Marketing / Finance / Distribution / Etc. Time-Variant : Time-Variant The Data Warehouse represents the flow of data through time Can contain projected data from statistical models Data is periodically uploaded then time-dependent data is recomputed Nonvolatile : Nonvolatile Once data is entered it is NEVER removed Represents the company’s entire history Near term history is continually added to it Always growing Must support terabyte databases and multiprocessors Read-Only database for data analysis and query processing Data Marts : Data Marts Small Data Stores More manageable data sets Targeted to meet the needs of small groups within the organization Small, Single-Subject data warehouse subset that provides decision support to a small group of people OLAP : OLAP Online Analytical Processing Tools DSS tools that use multidimensional data analysis techniques Support for a DSS data store Data extraction and integration filter Specialized presentation interface 12 Rules of a Data Warehouse : 12 Rules of a Data Warehouse Data Warehouse and Operational Environments are Separated Data is integrated Contains historical data over a long period of time Data is a snapshot data captured at a given point in time Data is subject-oriented 12 Rules of Data Warehouse : 12 Rules of Data Warehouse Mainly read-only with periodic batch updates Development Life Cycle has a data driven approach versus the traditional process-driven approach Data contains several levels of detail Current, Old, Lightly Summarized, Highly Summarized 12 Rules of Data Warehouse : 12 Rules of Data Warehouse Environment is characterized by Read-only transactions to very large data sets System that traces data sources, transformations, and storage Metadata is a critical component Source, transformation, integration, storage, relationships, history, etc Contains a chargeback mechanism for resource usage that enforces optimal use of data by end users OLAP : OLAP Need for More Intensive Decision Support 4 Main Characteristics Multidimensional data analysis Advanced Database Support Easy-to-use end-user interfaces Support Client/Server architecture Multidimensional Data Analysis Techniques : Multidimensional Data Analysis Techniques Advanced Data Presentation Functions 3-D graphics, Pivot Tables, Crosstabs, etc. Compatible with Spreadsheets & Statistical packages Advanced data aggregations, consolidation and classification across time dimensions Advanced computational functions Advanced data modeling functions Advanced Database Support : Advanced Database Support Advanced Data Access Features Access to many kinds of DBMS’s, flat files, and internal and external data sources Access to aggregated data warehouse data Advanced data navigation (drill-downs and roll-ups) Ability to map end-user requests to the appropriate data source Support for Very Large Databases Easy-to-Use End-User Interface : Easy-to-Use End-User Interface Graphical User Interfaces Much more useful if access is kept simple Client/Server Architecture : Client/Server Architecture Framework for the new systems to be designed, developed and implemented Divide the OLAP system into several components that define its architecture Same Computer Distributed among several computer OLAP Architecture : OLAP Architecture 3 Main Modules GUI Analytical Processing Logic Data-processing Logic Slide 30: OLAP Client/Server Architecture Relational OLAP : Relational OLAP Relational Online Analytical Processing OLAP functionality using relational database and familiar query tools to store and analyze multidimensional data Multidimensional data schema support Data access language & query performance for multidimensional data Support for Very Large Databases Multidimensional Data Schema Support : Multidimensional Data Schema Support Decision Support Data tends to be Nonnormalized Duplicated Preaggregated Star Schema Special Design technique for multidimensional data representations Optimize data query operations instead of data update operations Star Schemas : Star Schemas Data Modeling Technique to map multidimensional decision support data into a relational database Current Relational modeling techniques do not serve the needs of advanced data requirements Star Schema : Star Schema 4 Components Facts Dimensions Attributes Attribute Hierarchies Facts : Facts Numeric measurements (values) that represent a specific business aspect or activity Stored in a fact table at the center of the star scheme Contains facts that are linked through their dimensions Can be computed or derived at run time Updated periodically with data from operational databases Dimensions : Dimensions Qualifying characteristics that provide additional perspectives to a given fact DSS data is almost always viewed in relation to other data Dimensions are normally stored in dimension tables Attributes : Attributes Dimension Tables contain Attributes Attributes are used to search, filter, or classify facts Dimensions provide descriptive characteristics about the facts through their attributed Must define common business attributes that will be used to narrow a search, group information, or describe dimensions. (ex.: Time / Location / Product) No mathematical limit to the number of dimensions (3-D makes it easy to model) Attribute Hierarchies : Attribute Hierarchies Provides a Top-Down data organization Aggregation Drill-down / Roll-Up data analysis Attributes from different dimensions can be grouped to form a hierarchy Star Schema for Sales : Star Schema for Sales Fact Table Dimension Tables Star Schema Representation : Star Schema Representation Fact and Dimensions are represented by physical tables in the data warehouse database Fact tables are related to each dimension table in a Many to One relationship (Primary/Foreign Key Relationships) Fact Table is related to many dimension tables The primary key of the fact table is a composite primary key from the dimension tables Each fact table is designed to answer a specific DSS question Star Schema : Star Schema The fact table is always the larges table in the star schema Each dimension record is related to thousand of fact records Star Schema facilitated data retrieval functions DBMS first searches the Dimension Tables before the larger fact table Data Warehouse Implementation : Data Warehouse Implementation An Active Decision Support Framework Not a Static Database Always a Work in Process Complete Infrastructure for Company-Wide decision support Hardware / Software / People / Procedures / Data Data Warehouse is a critical component of the Modern DSS – But not the Only critical component Data Mining : Data Mining Discover Previously unknown data characteristics, relationships, dependencies, or trends Typical Data Analysis Relies on end users Define the Problem Select the Data Initial the Data Analysis Reacts to External Stimulus Data Mining : Data Mining Proactive Automatically searches Anomalies Possible Relationships Identify Problems before the end-user Data Mining tools analyze the data, uncover problems or opportunities hidden in data relationships, form computer models based on their findings, and then user the models to predict business behavior – with minimal end-user intervention Data Mining : Data Mining A methodology designed to perform knowledge-discovery expeditions over the database data with minimal end-user intervention 3 Stages of Data Data Information Knowledge Extraction of Knowledge from Data : Extraction of Knowledge from Data 4 Phases of Data Mining : 4 Phases of Data Mining Data Preparation Identify the main data sets to be used by the data mining operation (usually the data warehouse) Data Analysis and Classification Study the data to identify common data characteristics or patterns Data groupings, classifications, clusters, sequences Data dependencies, links, or relationships Data patterns, trends, deviation 4 Phases of Data Mining : 4 Phases of Data Mining Knowledge Acquisition Uses the Results of the Data Analysis and Classification phase Data mining tool selects the appropriate modeling or knowledge-acquisition algorithms Neural Networks Decision Trees Rules Induction Genetic algorithms Memory-Based Reasoning Prognosis Predict Future Behavior Forecast Business Outcomes 65% of customers who did not use a particular credit card in the last 6 months are 88% likely to cancel the account. Data Mining : Data Mining Still a New Technique May find many Unmeaningful Relationships Good at finding Practical Relationships Define Customer Buying Patterns Improve Product Development and Acceptance Etc. Potential of becoming the next frontier in database development