NINF 5205: Database Design, Development & Implementation, :NINF 5205: Database Design, Development & Implementation, Chapter 13
Business Intelligence and Data Warehouses
Objectives :Objectives In this chapter, you will learn:
How business intelligence is a comprehensive framework to support business decision making
How operational data and decision support data differ
What a data warehouse is, how to prepare data for one, and how to implement one
What star schemas are and how they are constructed 2 Database Systems, 8th Edition
Objectives (continued): :Objectives (continued): In this chapter, you will learn: (continued)
What data mining is and what role it plays in decision support
About online analytical processing (OLAP)
How SQL extensions are used to support OLAP-type data manipulations 3 Database Systems, 8th Edition
The Need for Data Analysis :The Need for Data Analysis Managers track daily transactions to evaluate how the business is performing
Strategies should be developed to meet organizational goals using operational databases
Data analysis provides information about short-term tactical evaluations and strategies 4 Database Systems, 8th Edition
Business Intelligence :Business Intelligence Comprehensive, cohesive, integrated tools and processes
Capture, collect, integrate, store, and analyze data
Generate information to support business decision making
Framework that allows a business to transform:
Data into information
Information into knowledge
Knowledge into wisdom 5 Database Systems, 8th Edition
Business Intelligence Architecture :Business Intelligence Architecture Composed of data, people, processes, technology, and management of components
Focuses on strategic and tactical use of information
Key performance indicators (KPI)
Measurements that assess company’s effectiveness or success in reaching goals
Multiple tools from different vendors can be integrated into a single BI framework 6 Database Systems, 8th Edition
Slide 7 :Database Systems, 8th Edition 7
Decision Support Data :Decision Support Data Operational data
Mostly stored in relational database
Optimized to support transactions representing daily operations
Decision support data differs from operational data in three main areas:
Time span
Granularity
Dimensionality 8 Database Systems, 8th Edition
Slide 9 :Database Systems, 8th Edition 9
Decision Support Database Requirements :Decision Support Database Requirements Specialized DBMS tailored to provide fast answers to complex queries
Four main requirements:
Database schema
Data extraction and loading
End-user analytical interface
Database size 10 Database Systems, 8th Edition
Decision SupportDatabase Requirements (continued) :Decision SupportDatabase Requirements (continued) Database schema
Complex data representations
Aggregated and summarized data
Queries extract multidimensional time slices
Data extraction and filtering
Supports different data sources
Flat files
Hierarchical, network, and relational databases
Multiple vendors
Checking for inconsistent data 11 Database Systems, 8th Edition
Decision SupportDatabase Requirements (continued) :Decision SupportDatabase Requirements (continued) End-user analytical interface
One of most critical DSS DBMS components
Permits user to navigate through data to simplify and accelerate decision-making process
Database size
In 2005, Wal-Mart had 260 terabytes of data in its data warehouses
DBMS must support very large databases (VLDBs) 12 Database Systems, 8th Edition
The Data Warehouse :The Data Warehouse Integrated, subject-oriented, time-variant, and nonvolatile collection of data
Provides support for decision making
Usually a read-only database optimized for data analysis and query processing
Requires time, money, and considerable managerial effort to create 13 Database Systems, 8th Edition
The Data Warehouse (continued) :The Data Warehouse (continued) Data mart
Small, single-subject data warehouse subset
More manageable data set than data warehouse
Provides decision support to small group of people
Typically lower cost and lower implementation time than data warehouse 14 Database Systems, 8th Edition
Twelve Rules that Define a Data Warehouse :Twelve Rules that Define a Data Warehouse Data warehouse and operational environments are separated
Data warehouse data are integrated
Data warehouse contains historical data over long time
Data warehouse data are snapshot data captured at given point in time
Data warehouse data are subject-oriented 15 Database Systems, 8th Edition
Twelve Rules that Define a Data Warehouse (continued) :Twelve Rules that Define a Data Warehouse (continued) Data warehouse data are mainly read-only
Periodic batch updates from operational data
No online updates allowed
Data warehouse development life cycle differs from classical systems development
Data warehouse contains data with several levels of detail:
Current detail data, old detail data, lightly summarized data, and highly summarized data 16 Database Systems, 8th Edition
Twelve Rules that Define a Data Warehouse (continued) :Twelve Rules that Define a Data Warehouse (continued) Read-only transactions to very large data sets
Data warehouse environment traces data sources, transformations, and storage
Data warehouse’s metadata are critical component of this environment
Data warehouse contains chargeback mechanism for resource usage
Enforces optimal use of data by end users 17 Database Systems, 8th Edition
Decision Support Architectural Styles :Decision Support Architectural Styles Provide advanced decision support features
Some capable of providing access to multidimensional data analysis
Complete data warehouse architecture supports:
Decision support data store
Data extraction and integration filter
Specialized presentation interface 18 Database Systems, 8th Edition
Online Analytical Processing :Online Analytical Processing Advanced data analysis environment that supports:
Decision making
Business modeling
Operations research
Four main characteristics:
Use multidimensional data analysis techniques
Provide advanced database support
Provide easy-to-use end-user interfaces
Support client/server architecture 19 Database Systems, 8th Edition
Multidimensional Data Analysis Techniques :Multidimensional Data Analysis Techniques Data are processed and viewed as part of a multidimensional structure
Augmented by the following functions:
Advanced data presentation functions
Advanced data aggregation, consolidation, and classification functions
Advanced computational functions
Advanced data modeling functions 20 Database Systems, 8th Edition
Slide 21 :Database Systems, 8th Edition 21
Advanced Database Support :Advanced Database Support Advanced data access features include:
Access to many different kinds of DBMSs, flat files, and internal and external data sources
Access to aggregated data warehouse data
Advanced data navigation
Rapid and consistent query response times
Maps end-user requests to appropriate data source and to proper data access language
Support for very large databases 22 Database Systems, 8th Edition
Easy-to-Use End-User Interface :Easy-to-Use End-User Interface Advanced OLAP features more useful when access is simple
Many interface features are “borrowed” from previous generations of data analysis tools
Already familiar to end users
Makes OLAP easily accepted and readily used 23 Database Systems, 8th Edition
Client/Server Architecture :Client/Server Architecture Provides framework for design, development, implementation of new systems
Enables OLAP system to be divided into several components that define its architecture
OLAP is designed to meet ease-of-use as well as system flexibility requirements 24 Database Systems, 8th Edition
OLAP Architecture :OLAP Architecture Operational characteristics’ three main modules:
Graphical user interface (GUI)
Analytical processing logic
Data-processing logic
Designed to use both operational and data warehouse data
In most implementations, data warehouse and OLAP are interrelated and complementary
OLAP systems merge data warehouse and data mart approaches 25 Database Systems, 8th Edition
Slide 26 :Database Systems, 8th Edition 26
Relational OLAP :Relational OLAP Uses relational databases and relational query tools
Stores and analyzes multidimensional data
Adds following extensions to traditional RDBMS:
Multidimensional data schema support within RDBMS
Data access language and query performance optimized for multidimensional data
Support for very large databases 27 Database Systems, 8th Edition
Multidimensional OLAP :Multidimensional OLAP Extends OLAP functionality to multidimensional database management systems (MDBMSs)
MDBMS end users visualize stored data as a 3D data cube
Data cubes can grow to n dimensions, becoming hypercubes
To speed access, data cubes are held in memory in a cube cache 28 Database Systems, 8th Edition
Slide 29 :Database Systems, 8th Edition 29
Relational vs. Multidimensional OLAP :Relational vs. Multidimensional OLAP Selection of one or the other depends on evaluator’s vantage point
Proper evaluation must include supported hardware, compatibility with DBMS, etc.
ROLAP and MOLAP vendors working toward integration within unified framework
Relational databases use star schema design to handle multidimensional data 30 Database Systems, 8th Edition
Star Schema :Star Schema Data modeling technique
Maps multidimensional decision support data into relational database
Creates near equivalent of multidimensional database schema from relational data
Easily implemented model for multidimensional data analysis
Preserves relational structures on which operational database is built
Four components: facts, dimensions, attributes, and attribute hierarchies 31 Database Systems, 8th Edition
Facts :Facts Numeric measurements that represent specific business aspect or activity
Normally stored in fact table that is center of star schema
Fact table contains facts linked through their dimensions
Metrics are facts computed at run time 32 Database Systems, 8th Edition
Dimensions :Dimensions Qualifying characteristics provide additional perspectives to a given fact
Decision support data almost always viewed in relation to other data
Study facts via dimensions
Dimensions stored in dimension tables 33 Database Systems, 8th Edition
Attributes :Attributes Use to search, filter, and classify facts
Dimensions provide descriptions of facts through their attributes
No mathematical limit to the number of dimensions
Slice and dice: focus on slices of the data cube for more detailed analysis 34 Database Systems, 8th Edition
Attribute Hierarchies :Attribute Hierarchies Provide top-down data organization
Two purposes:
Aggregation
Drill-down/roll-up data analysis
Determine how the data are extracted and represented
Stored in the DBMS’s data dictionary
Used by OLAP tool to access warehouse properly 35 Database Systems, 8th Edition
Star Schema Representation :Star Schema Representation Facts and dimensions represented in physical tables in data warehouse database
Many fact rows related to each dimension row
Primary key of fact table is a composite primary key
Fact table primary key formed by combining foreign keys pointing to dimension tables
Dimension tables smaller than fact tables
Each dimension record related to thousands of fact records 36 Database Systems, 8th Edition
Performance-Improving Techniques for the Star Schema :Performance-Improving Techniques for the Star Schema Four techniques to optimize data warehouse design:
Normalizing dimensional tables
Maintaining multiple fact tables to represent different aggregation levels
Denormalizing fact tables
Partitioning and replicating tables 37 Database Systems, 8th Edition
Performance-Improving Techniques for the Star Schema (continued) :Performance-Improving Techniques for the Star Schema (continued) Dimension tables normalized to:
Achieve semantic simplicity
Facilitate end-user navigation through the dimensions
Denormalizing fact tables improves data access performance and saves data storage space
Partitioning splits table into subsets of rows or columns
Replication makes copy of table and places it in different location 38 Database Systems, 8th Edition
Implementing a Data Warehouse :Implementing a Data Warehouse Numerous constraints, including:
Available funding
Management’s view of role played by an IS department
Extent and depth of information requirements
Corporate culture
No single formula can describe perfect data warehouse development 39 Database Systems, 8th Edition
The Data Warehouse as an Active Decision Support Framework :The Data Warehouse as an Active Decision Support Framework Data warehouse:
Is not a static database
Is a dynamic framework for decision support that is always a work in progress
Data warehouse is critical component of modern BI environment
Design and implementation must be examined as part of entire infrastructure 40 Database Systems, 8th Edition
A Company-Wide Effort That Requires User Involvement :A Company-Wide Effort That Requires User Involvement Data warehouse data cross departmental lines and geographical boundaries
Building a data warehouse requires the designer to:
Involve end users in process
Secure end users’ commitment from beginning
Create continuous end-user feedback
Manage end-user expectations
Establish procedures for conflict resolution 41 Database Systems, 8th Edition
Satisfy the Trilogy: Data, Analysis, and Users :Satisfy the Trilogy: Data, Analysis, and Users Data warehouse designer must satisfy:
Data integration and loading criteria
Data analysis capabilities with acceptable query performance
End-user data analysis needs 42 Database Systems, 8th Edition
Apply Database Design Procedures :Apply Database Design Procedures Company-wide effort requiring many resources
Quantity of data requires latest hardware and software
Detailed procedures to orchestrate flow of data from operational databases to data warehouse
People with advanced database design, software integration, and management skills 43 Database Systems, 8th Edition
Slide 44 :Database Systems, 8th Edition 44
Data Mining :Data Mining Data-mining tools do the following:
Analyze data
Uncover problems or opportunities hidden in data relationships
Form computer models based on their findings
Use models to predict business behavior
Requires minimal end-user intervention 45 Database Systems, 8th Edition
SQL Extensions for OLAP :SQL Extensions for OLAP Proliferation of OLAP tools fostered development of SQL extensions
Many innovations have become part of standard SQL
All SQL commands will work in data warehouse as expected
Most queries include many data groupings and aggregations over multiple columns 46 Database Systems, 8th Edition
The ROLLUP Extension :The ROLLUP Extension Used with GROUP BY clause to generate aggregates by different dimensions
GROUP BY generates only one aggregate for each new value combination of attributes
ROLLUP extension enables subtotal for each column listed except for the last one
Last column gets grand total
Order of column list important 47 Database Systems, 8th Edition
The CUBE Extension :The CUBE Extension CUBE extension used with GROUP BY clause to generate aggregates by listed columns
Includes the last column
Enables subtotal for each column in addition to grand total for last column
Useful when you want to compute all possible subtotals within groupings
Cross-tabulations good application of CUBE extension 48 Database Systems, 8th Edition
Materialized Views :Materialized Views A dynamic table that contains SQL query command to generate rows
Also contains the actual rows
Created the first time query is run and summary rows are stored in table
Automatically updated when base tables are updated 49 Database Systems, 8th Edition
Summary :Summary Business intelligence generates information used to support decision making
BI covers a range of technologies, applications, and functionalities
Decision support systems were the precursor of current generation BI systems
Operational data not suited for decision support 50 Database Systems, 8th Edition
Summary (continued) :Summary (continued) Four categories of requirements for decision support DBMS:
Database schema
Data extraction and loading
End-user analytical interface
Database size requirements
Data warehouse provides support for decision making
Usually read-only
Optimized for data analysis, query processing 51 Database Systems, 8th Edition
Summary (continued) :Summary (continued) OLAP systems have four main characteristics:
Use of multidimensional data analysis
Advanced database support
Easy-to-use end-user interfaces
Client/server architecture
ROLAP provides OLAP functionality with relational databases
MOLAP provides OLAP functionality with MDBMSs 52 Database Systems, 8th Edition
Summary (continued) :Summary (continued) Star schema is a data-modeling technique
Maps multidimensional decision support data into a relational database
Star schema has four components:
Facts
Dimensions
Attributes
Attribute hierarchies 53 Database Systems, 8th Edition
Summary (continued) :Summary (continued) Four techniques optimize data warehouse design:
Normalize dimensional tables
Maintain multiple fact tables
Denormalize fact tables
Partition and replicate tables
Data mining automates analysis of operational data
SQL extensions support OLAP-type processing and data generation 54 Database Systems, 8th Edition