Data Warehouse Architecture Lecture 1

Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Data Warehouse Architecture : 

Data Warehouse Architecture Lecture 1

Lecture Objectives : 

Lecture Objectives Define Data Warehouse Architecture Define Data Warehouse and Data Mart Discuss: The Great Debate Present a Data Warehouse Architectural Framework

Information Systems Architecture : 

Information Systems Architecture Information Systems Architecture is the process of making the key choices that are essential to the development of an information system. Architecture includes: Guiding Principles: Approaches/philosophies “Logical” representations of a system Hardware/Operating System Computing model: client/server vs traditional vs Web-based Tools and technologies It is key, when making these choices that they are: Requirements driven Take into consideration operational, technical and financial feasibility Made within an architectural framework

Architecture Drivers : 

Architecture Drivers There are a lot of Drivers of Architecture Architecture

Architecture to Design to Implementation : 

Architecture to Design to Implementation Business Strategy Line of Code / Process Step Employee / Computer Architecture Design Implementation Removal of Choices Looking at the Work of Developing a System as a set of Choices, Architecture can be Described as Highest Level Choices

How is Architecture Different from Design? : 

How is Architecture Different from Design? Its not – Architecture can be considered ‘high-level’ design Architecture includes those aspects of the design that are essential to the information system Architecture Example: Users must be able to self-serve (guiding principle) “We will use a “hub and spoke” design where data will be placed in a central data warehouse, then be propagated to one or more data marts. (approach) We will normalize data in the central warehouse and use a dimensional design in the data marts (approach) We will use Oracle 8i as our DBMS (technical architecture)

Architecture vs Design : 

Architecture vs Design Not Architecture: The Order subject area will be composed of the following tables: order_fact, customer_dim, product_dim and time_dim The customer_dim table will have the following attributes…….

The Value of Architecture : 

The Value of Architecture Communication: To business sponsors, and business users Between members of the project team Planning: Cross Check for Project Plan Ensure that all important components of the data warehouse are accounted for Flexibility and Growth Thinking about overall architecture will reduce risk associated with the ‘success’ of the data warehouse Learning Productivity and Reuse

What’s different about DW Architecture? : 

What’s different about DW Architecture? Transaction processing systems – growth is (relatively) predictable Example: A company uses SAP for order processing They are opening a new retail store They predict (based on experience) 2000 transactions per week To process this volume, we need 3 workstations to capture the transactions Peak time each day is 11-2 when 50% of transactions occur

What’s Different About Data Warehouse Architecture? : 

What’s Different About Data Warehouse Architecture? Success drives explosive growth More users More (complex) queries More data Performance is unpredictable Unpredictable queries Unpredictable use patterns Growth Time Siebel SAP R/3 Data Warehouse

The Great Data Warehouse Architecture Debate : 

The Great Data Warehouse Architecture Debate Bill Inmon: “The enterprise data warehouse” Ralph Kimball: “data marts” The compromise: “Hub and Spoke” or “Federated” models If you build it, They will come

What is a Data Mart? : 

What is a Data Mart? A data mart is a collection of subject areas organized for decision support based on the specific needs of a given user group. Each mart may widely different from others (as we will see) Typically, data marts are built on the dimensional data model: Facts – things that the organization wants to measure: revenue, orders, shipments, purchases, etc. Dimensions – the means by which the organization wants to analyze the measures (facts) – by customer, by time, by product – BY ANY COMBINATION!!

What is a Data Mart? : 

What is a Data Mart? There are two kinds of data marts--dependent and independent. A dependent data mart is one whose source is a data warehouse. An independent data mart is one whose source is the legacy applications environment. All dependent data marts are fed by the same source--the data warehouse. Each independent data mart is fed uniquely and separately by the legacy applications environment. Dependent data marts are architecturally and structurally sound. Independent data marts have a number of significant issues

Data Warehouse Definition #3 : 

Data Warehouse Definition #3 Provides the infrastructure to feed data marts, which supply data to users. Responsible for: Acquiring data Cleaning data Transforming data Managing granular data Managing ALL data required for a specific subject Can be store data in a number of ways: Flat files RDBMS Other technologies (compression)

Data Warehouse Definition #3 : 

Data Warehouse Definition #3 Ensures that: Impact on source systems is minimized Consistent data definitions across data marts Conformed dimensions!! Appropriate history is maintained

Architectural Framework : 

Architectural Framework Three Views Data (content of warehouse) Technical (functions to be performed by the warehouse) Infrastructure (Hardware, Communications) Four levels of detail for each view Business Requirements Architecture Models Detailed Models Implementation

Data Architecture : 

Data Architecture Business Requirements What information do we need to make better business decisions? What data assets are available? Architecture Models and Documents - The Dimensional Model What are the major entities (the facts and dimensions) that make up this information? How do they relate to each other? How should these entities be structured? Detailed Models and Specs - The Logical and Physical Models What are the individual elements, their definitions, domains, and rules for derivation? What are the sources and how do they map to the targets? Implementation Create and document the databases, indexes, backup procedures, etc.

Technical Architecture – Back Room : 

Technical Architecture – Back Room Business Requirements and Audit How will we get at the data, transform it, and make it available to our users? How is this done today? Architecture Models and Documents What are the specific capabilities needed to get the data into a usable form in the desired locations at the appropriate times? What are the major data stores and where should they be located? Detailed Models and Specs What standards and products provide the needed capabilities? How will we hook them together? What are our development standards for code management., naming, etc.? Implementation Write the extracts and loads. Automate the process. Document the process.

Technical Architecture – Front Room : 

Technical Architecture – Front Room Business Requirements and Audit What are the major business issues we face? How will we measure these issues? How do we want to analyze the data? Architecture Models and Documents What will users need to get the information out in a usable form? What major classes of analysis and reporting do we need to provide? What are the priorities? Detailed Models and Specs What tools do we select to provide capabilities What are the specifics for the report templates? Who needs them? How often? How do we distribute them?

Technical Architecture – Front Room : 

Technical Architecture – Front Room Implementation Implement the reporting and analysis environment. Build the initial report set. Train the users. Document.

Infrastructure : 

Infrastructure Business Requirements and Audit What hardware and network capabilities do we need to be successful? What do we currently have in place? Architecture Models and Documents Where is the data coming from and going to? Do we have enough calculation and storage capacity? What are the specific capabilities we are counting on? Do they exist? Who is responsible for them? Detailed Models and Specs How do we interact with these capabilities? What are the system utilities, calls, APIs, etc.? Implementation Install and test new infrastructure components. Connect the sources to the targets to the desktop. Document.

Technical Data Warehouse Framework : 

Technical Data Warehouse Framework

Technical Data Warehouse Framework : 

Technical Data Warehouse Framework Access Engineering Acquisition Delivery Extracted Data Data Marts Re-engd. Data Metadata Web Server Client/ Server Metadata Management Storage Workflow and System Management Data Staging Services “Back Room” “Front Room”

authorStream Live Help