Presentation Transcript
Data Warehousing/Mining Introduction :Data Warehousing/Mining Introduction
Outline of Lecture :Outline of Lecture Brief History of Data Warehousing
What is a Data Warehouse?
Need For Strategic Information
Information Crisis
Operational and Decision Support System
Difference between a standard DB and a data warehouse
Data Warehouse Evolution :Data Warehouse Evolution [Timeline figure; time axis marked 1960, 1975, 1980, 1985, 1990, 1995, 2000]
Eras: “Prehistoric Times”, “Middle Ages”, Data Revolution, Information-Based Management
Milestones: Relational Databases; PCs and Spreadsheets; End-user Interfaces; Data Replication Tools; 1st DW Article; “Building the DW”, Inmon (1992); DW Confs.; Vendor DW Frameworks; Company DWs
Escalating Need For Strategic Information :Escalating Need For Strategic Information Organizations need information to formulate business strategies, establish goals, and set objectives
e.g.
Increase the customer base by 10% over the next 5 years
Gain market share by 15% in the next 2 years
Increase product quality levels in the top five product groups
The Information Crisis :The Information Crisis Information is said to double every 18 months
Organizations have tons of data available
Then why is there an information crisis?
Why can't organizations convert the data into useful information for strategic decision making?
Problem: Heterogeneous Information Sources :Problem: Heterogeneous Information Sources “Heterogeneities are everywhere”
Different interfaces
Different data representations
Diverse structure of databases
Duplicate and inconsistent information
[Figure: example sources: Personal Databases, Digital Libraries, Scientific Databases, World Wide Web]
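The "different data representations" and "duplicate and inconsistent information" problems can be illustrated with a minimal sketch. The two source systems, their field names, and their formats below are all hypothetical, chosen only to show one way the same customer might be mapped onto a common representation; it is not a method given in the lecture.

```python
from datetime import date, datetime

# Hypothetical rows describing the SAME customer in two source systems.
# Every field name and format here is an assumption for illustration.
crm_rows = [{"cust_id": "C-001", "name": "Ann Lee", "joined": "2001-03-15"}]
billing_rows = [{"CustomerNo": 1, "FullName": "LEE, ANN", "Since": "15/03/2001"}]


def from_crm(row):
    """Map a CRM row onto the common representation."""
    return {
        "customer_key": int(row["cust_id"].split("-")[1]),
        "name": row["name"],
        "joined": date.fromisoformat(row["joined"]),
    }


def from_billing(row):
    """Map a billing row (different key, name, and date formats)."""
    last, first = [part.strip().title() for part in row["FullName"].split(",")]
    return {
        "customer_key": row["CustomerNo"],
        "name": f"{first} {last}",
        "joined": datetime.strptime(row["Since"], "%d/%m/%Y").date(),
    }


def integrate(*sources):
    """Merge normalized rows, keeping one record per customer key."""
    merged = {}
    for rows in sources:
        for row in rows:
            merged.setdefault(row["customer_key"], row)
    return list(merged.values())


unified = integrate(
    [from_crm(r) for r in crm_rows],
    [from_billing(r) for r in billing_rows],
)
```

Once both rows are normalized they compare equal, so the duplicate collapses into a single record. Real sources rarely reconcile this cleanly; conflicting values would need an explicit resolution rule.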
About Some Definitions :About Some Definitions What is data?
What is information?
What is a warehouse?
What is a Data Warehouse? A Practitioner's Viewpoint :What is a Data Warehouse? A Practitioner's Viewpoint “A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”
-- Barry Devlin, IBM Consultant
A Data Warehouse is... :A Data Warehouse is... Stored collection of diverse data
A solution to data integration problem
Single repository of information
Subject-oriented
Organized by subject, not by application
Used for analysis, data mining, etc.
Large volume of data (GB, TB)
Non-volatile
Historical
Time attributes are important
A Data Warehouse is... (continued) :A Data Warehouse is... (continued) Updates infrequent
Examples
All transactions EVER at WalMart
Complete client histories at insurance firm
Stockbroker financial information and portfolios
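The "non-volatile", "historical", and "time attributes are important" properties can be sketched as an append-only store in which every row carries a load date. The class name `HistoryStore`, its methods, and the sample data are hypothetical; this is an illustration under those assumptions, not a description of any particular warehouse product.

```python
from datetime import date


class HistoryStore:
    """Append-only store: rows are added with a date and never updated."""

    def __init__(self):
        self._rows = []  # each row: (subject, key, value, as_of)

    def load(self, subject, key, value, as_of):
        # Non-volatile: a new fact is appended; old facts stay untouched.
        self._rows.append((subject, key, value, as_of))

    def value_as_of(self, subject, key, when):
        """Return the latest value loaded on or before `when`, else None."""
        candidates = [
            row for row in self._rows
            if row[0] == subject and row[1] == key and row[3] <= when
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda row: row[3])[2]


store = HistoryStore()
store.load("customer", 1, "Bronze", date(1996, 1, 1))
store.load("customer", 1, "Gold", date(1997, 6, 1))
```

Because nothing is overwritten, a question such as "what tier was customer 1 in at the end of 1996?" stays answerable after later loads. An operational system that keeps only current values would have lost that history.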
Summary :Summary [Figure: Operational Systems -> Data Warehouse Population -> Data Warehouse -> Business Information Interface]
What is Operational and Decision Support System :What is Operational and Decision Support System Operational Systems
Making the wheels of Business Turn
Take an order
Process a claim
Make shipment
Generate an invoice
Receive cash
Reserve an airline seat
What is Operational and Decision Support System (Contd…) :What is Operational and Decision Support System (Contd…) Decision Support System
Watching the wheels of business turn
Show the top selling products
Show the problem regions
Tell me why (drill down)
Let me see other data (drill across)
Alert me when a district sells below target
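The decision-support requests listed above can be sketched as queries over a small sales table. The table layout, the sample figures, and the target of 100 are all hypothetical, and Python's built-in `sqlite3` module stands in for a real warehouse DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, district TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("East", "E1", "Widget", 120.0),
        ("East", "E2", "Widget", 30.0),
        ("East", "E2", "Gadget", 20.0),
        ("West", "W1", "Gadget", 200.0),
        ("West", "W1", "Widget", 90.0),
    ],
)

# "Show the top selling products": aggregate and rank.
top_products = conn.execute(
    "SELECT product, SUM(amount) AS total FROM sales "
    "GROUP BY product ORDER BY total DESC"
).fetchall()

# "Show the problem regions": totals per region, weakest first.
by_region = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY total"
).fetchall()


def drill_down(region):
    """'Tell me why': break one region's total down by district."""
    return conn.execute(
        "SELECT district, SUM(amount) FROM sales WHERE region = ? "
        "GROUP BY district ORDER BY district",
        (region,),
    ).fetchall()


def districts_below(target):
    """'Alert me': list districts whose total sales fall below target."""
    rows = conn.execute(
        "SELECT district FROM sales GROUP BY district "
        "HAVING SUM(amount) < ? ORDER BY district",
        (target,),
    )
    return [district for (district,) in rows]
```

With this sample data, `by_region` puts East first as the weakest region, `drill_down("East")` shows that district E2 accounts for the shortfall, and `districts_below(100)` flags E2.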
Difference :Difference
Feature | Operational | Informational
Data Content | Current values | Archived, derived, summarized
Data Structure | Optimized for transactions | Optimized for complex queries
Access Frequency | High | Medium to low
Access Type | Read, update, delete | Read
Usage | Predictable, repetitive | Ad hoc, random, heuristic
Response Time | Sub-seconds | Several seconds to minutes
Users | Large number | Relatively small number
Warehouse is a Specialized DB :Warehouse is a Specialized DB
Standard DB | Warehouse
Mostly updates | Mostly reads
Many small transactions | Queries are long and complex
MB to GB of data | GB to TB of data
Current snapshot | History
Index/hash on p.k. | Lots of scans
Raw data | Summarized, reconciled data
Thousands of users (e.g., clerical users) | Hundreds of users (e.g., decision-makers, analysts)
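The "index/hash on p.k." versus "lots of scans" contrast can be observed in a small sketch using Python's built-in `sqlite3` module. The `orders` table and its contents are hypothetical, and the exact wording of SQLite's query-plan output varies between versions, so the sketch only looks for the keywords SEARCH and SCAN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (status, amount) VALUES (?, ?)",
    [("open", 10.0), ("shipped", 25.0), ("open", 40.0)],
)


def plan(sql):
    """Return the detail text of each step in SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Standard-DB style: a small transaction touching one row by primary key.
conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = 1")
lookup_plan = plan("SELECT status FROM orders WHERE order_id = 2")

# Warehouse style: a read-only query that summarizes every row.
summary_sql = "SELECT status, SUM(amount) FROM orders GROUP BY status"
scan_plan = plan(summary_sql)
totals = conn.execute(summary_sql + " ORDER BY status").fetchall()
```

The lookup plan reports a primary-key search, while the summary has to scan the whole table. At the GB to TB volumes mentioned above, that difference is one reason warehouses are built and tuned separately from operational databases.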
Warehousing and Industry :Warehousing and Industry Warehousing is big business
$2 billion in 1995
$3.5 billion in early 1997
About $8 billion in 1998 [Metagroup]
WalMart has largest warehouse
900-CPU, 2,700 disk, 23 TB Teradata system
~7TB in warehouse
40-50GB per day
Data Warehousing: Two Distinct Issues :Data Warehousing: Two Distinct Issues (1) How to get information into warehouse
“Data warehousing”
(2) What to do with data once it’s in warehouse
“Warehouse DBMS”
Both rich research areas
Industry has focused on (2)
Thank You Very Much :Thank You Very Much