Building Steps of Data Warehouse By Jyotshna

Steps To Build The Data Warehouse:

Steps To Build The Data Warehouse. By: KUMARI JYOTSHNA, Roll No: 09513

Slide 2:

Definition
A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site. Data warehouses are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.

Slide 3:

The Purpose of Data Warehousing
- Realize the value of data
- Data/information is an asset
- Methods to realize the value (reporting, analysis, etc.)
- Make better decisions
- Turn data into information
- Create competitive advantage
- Methods to support the decision-making process, such as EIS (executive information systems) and DSS (decision support systems)

Slide 4:

Data Warehouse Components
- Staging Area: a preparatory repository where transaction data can be transformed for use in the data warehouse.
- Data Mart: a traditional, dimensionally modeled set of dimension and fact tables. A data warehouse is the union of its data marts.
- Operational Data Store (ODS): modeled to support near real-time reporting needs.
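To make the data mart idea concrete, here is a minimal sketch of a dimensionally modeled set of tables using Python's built-in `sqlite3` module. It assumes a hypothetical library data mart: the table names (`dim_book`, `dim_date`, `fact_loan`), the columns, and the sample rows are invented for illustration, and the staging area and ODS are not modeled.

```python
import sqlite3

def build_data_mart(conn):
    """Create a tiny, hypothetical library data mart: two dimension tables and one fact table."""
    conn.executescript(
        """
        CREATE TABLE dim_book (
            book_key INTEGER PRIMARY KEY,
            title    TEXT NOT NULL,
            subject  TEXT NOT NULL
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,  -- e.g. 20240115 for 2024-01-15
            year     INTEGER NOT NULL,
            month    INTEGER NOT NULL
        );
        -- Fact table: a numeric measure plus a foreign key into each dimension
        CREATE TABLE fact_loan (
            book_key   INTEGER NOT NULL REFERENCES dim_book (book_key),
            date_key   INTEGER NOT NULL REFERENCES dim_date (date_key),
            loan_count INTEGER NOT NULL
        );
        """
    )

def loans_by_subject(conn):
    """A typical dimensional query: sum a fact measure, grouped by a dimension attribute."""
    rows = conn.execute(
        """
        SELECT b.subject, SUM(f.loan_count)
        FROM fact_loan AS f
        JOIN dim_book AS b ON b.book_key = f.book_key
        GROUP BY b.subject
        ORDER BY b.subject
        """
    ).fetchall()
    return dict(rows)

conn = sqlite3.connect(":memory:")
build_data_mart(conn)
conn.executemany("INSERT INTO dim_book VALUES (?, ?, ?)", [
    (1, "Database Systems", "Computing"),
    (2, "Data Mining", "Computing"),
    (3, "World History", "History"),
])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(20240115, 2024, 1), (20240210, 2024, 2)])
conn.executemany("INSERT INTO fact_loan VALUES (?, ?, ?)", [
    (1, 20240115, 3), (2, 20240115, 2), (3, 20240210, 4), (1, 20240210, 1),
])
print(loans_by_subject(conn))  # {'Computing': 6, 'History': 4}
```

The fact table holds the measure (`loan_count`) and one foreign key per dimension, while descriptive attributes such as `subject` live in the dimension tables; that split is what lets a query group the facts by any dimension attribute.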

Slide 5:

[Figure: Data Warehouse Functionality. Labeled elements: relational databases, ERP system, purchased data, and legacy data (sources); extraction; cleansing; metadata repository; data warehouse optimizer; loader; data warehouse engine; analyze; query.]

Slide 6:

Evolution of data warehouse architecture

Slide 7:

Steps in Developing a Data Warehouse

Slide 8:

Identify the data source
The first step in developing a data warehouse is to identify the data sources: we need to figure out which data must be put into the data warehouse. Two types of data sources need to be considered, internal and external. Internal data sources hold data that already exists within our own systems; external data sources hold data that does not exist within those systems.
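As a minimal sketch of this step, assuming a hypothetical library setting, the fields the warehouse needs can be checked against an inventory of candidate sources tagged as internal or external. The source names, field names, and the `DataSource` and `plan_sources` helpers are illustrative inventions, not part of the original presentation.

```python
from dataclasses import dataclass, field

@dataclass
class DataSource:
    """A candidate data source (all names below are hypothetical)."""
    name: str
    internal: bool  # True if the data already exists within our own systems
    fields: set = field(default_factory=set)

def plan_sources(required, sources):
    """Pick a source for each required warehouse field, preferring internal sources.
    Returns (plan, missing): plan maps field -> source name; missing lists uncovered fields."""
    plan, missing = {}, []
    for f in sorted(required):
        # Sort key is False for internal sources, so they come first (the sort is stable)
        candidates = sorted((s for s in sources if f in s.fields), key=lambda s: not s.internal)
        if candidates:
            plan[f] = candidates[0].name
        else:
            missing.append(f)
    return plan, missing

sources = [
    DataSource("circulation_db", internal=True, fields={"book_id", "member_id", "loan_date"}),
    DataSource("catalog_db", internal=True, fields={"book_id", "title", "subject"}),
    DataSource("publisher_feed", internal=False, fields={"title", "publisher", "price"}),
]
required = {"book_id", "loan_date", "title", "publisher", "member_age"}
plan, missing = plan_sources(required, sources)
print(plan)     # {'book_id': 'circulation_db', 'loan_date': 'circulation_db', 'publisher': 'publisher_feed', 'title': 'catalog_db'}
print(missing)  # ['member_age']
```

Fields left in `missing` point to data that no known internal or external source provides yet, which is the kind of gap this step is meant to surface before any extraction work starts.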

Slide 9:

Build a customized ETL tool
Each data warehouse has different requirements, so a customized ETL (extract, transform, load) tool is the better way to fulfill them. For the library data warehouse, we chose to write our own extraction program, handled the inconsistency issues with our own transformation method, and finally loaded the data into the data warehouse database.
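A customized ETL tool can be sketched as three small, replaceable stages composed into one pipeline. The sketch below assumes a hypothetical CSV export from a library system and an SQLite target; the `extract`, `transform`, `load`, and `run_etl` functions, the sample data, and the DD/MM/YYYY date rule are illustrative assumptions, not the presenter's actual program.

```python
import csv
import io
import sqlite3

# Hypothetical export from a library system (in practice a file or a database query)
SOURCE_CSV = """book_id,title,loan_date
1, Database Systems ,2024-01-15
2,data mining,15/01/2024
"""

def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: resolve simple inconsistencies (stray whitespace, casing, date format)."""
    out = []
    for r in rows:
        date = r["loan_date"].strip()
        if "/" in date:  # assumption: this source writes some dates as DD/MM/YYYY
            d, m, y = date.split("/")
            date = f"{y}-{m}-{d}"
        out.append((int(r["book_id"]), r["title"].strip().title(), date))
    return out

def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS loans (book_id INTEGER, title TEXT, loan_date TEXT)")
    conn.executemany("INSERT INTO loans VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(conn, text):
    """Compose the three stages into one pipeline run."""
    load(conn, transform(extract(text)))

conn = sqlite3.connect(":memory:")
run_etl(conn, SOURCE_CSV)
print(conn.execute("SELECT * FROM loans ORDER BY book_id").fetchall())
```

Keeping the stages as separate functions is one way to make such a tool "customized": a different warehouse can swap in its own extraction program or transformation rules without touching the rest of the pipeline.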

Slide 10:

Extraction
This can be the most time-consuming part, where we need to grab the data from various data sources and store it in the staging database. Much of the time and effort goes into writing a custom program to transfer the data from the sources into the staging database. Before extracting, we therefore need to determine which database system will be used for the staging area and figure out which data are actually needed.

Slide 11 (contd.):

The decline in the cost of hardware and storage has eased earlier concerns about data duplication and about running short of storage when excessive or unnecessary data are kept. However, there is still no reason to store data that has been identified as not useful in the decision-making process. Therefore, only the relevant data should be extracted before being brought into the data warehouse.
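A minimal sketch of extracting only the relevant data into a staging database, assuming both the operational source and the staging area are SQLite databases. The table `loans`, the `stg_` prefix, the column list in `RELEVANT_COLUMNS`, and the `extract_to_staging` helper are hypothetical; in a real setup the source would usually be a different database system reached through its own driver.

```python
import sqlite3

# Columns judged relevant to decision making (a hypothetical choice for illustration)
RELEVANT_COLUMNS = ("loan_id", "book_id", "member_id", "loan_date")

def extract_to_staging(source, staging, table="loans", columns=RELEVANT_COLUMNS):
    """Copy only the relevant columns of `table` from the source DB into a staging table."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    # Identifiers are interpolated into SQL, so they must come from trusted configuration
    staging.execute(f"CREATE TABLE IF NOT EXISTS stg_{table} ({col_list})")
    rows = source.execute(f"SELECT {col_list} FROM {table}").fetchall()
    staging.executemany(f"INSERT INTO stg_{table} VALUES ({placeholders})", rows)
    staging.commit()
    return len(rows)

# Hypothetical operational table with extra columns that are not useful for analysis
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE loans (loan_id INTEGER, book_id INTEGER, member_id INTEGER,"
    " loan_date TEXT, desk_terminal TEXT, staff_note TEXT)"
)
source.executemany("INSERT INTO loans VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 10, 100, "2024-01-15", "T3", "renewed twice"),
    (2, 11, 101, "2024-01-16", "T1", ""),
])
staging = sqlite3.connect(":memory:")
n = extract_to_staging(source, staging)
print(n, staging.execute("SELECT * FROM stg_loans ORDER BY loan_id").fetchall())
```

The columns `desk_terminal` and `staff_note` never reach the staging area, which is the point of deciding what is needed before grabbing it.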

Slide 12:

Transformation
After the data are extracted from the various data sources, transformation is needed to ensure data consistency. To transform the data for the data warehouse properly, you need a way of mapping the fields of the external data sources to the data warehouse fields. Transformation can be performed during data extraction or while loading the data into the data warehouse. This integration becomes more complex as the number of data sources grows.
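Mapping source fields to warehouse fields can be sketched as a per-source dictionary plus a table of code conversions. Everything here is hypothetical: the two branch sources, their field names, and the member-type codes were invented to illustrate the idea.

```python
# Hypothetical per-source mappings: source field name -> warehouse field name
FIELD_MAPS = {
    "branch_a": {"BookNo": "book_id", "BorrowerID": "member_id", "Type": "member_type"},
    "branch_b": {"item_no": "book_id", "patron": "member_id", "category": "member_type"},
}

# Hypothetical inconsistent codes for the same concept, unified to one warehouse value
MEMBER_TYPE_CODES = {"S": "student", "STU": "student", "F": "faculty", "FAC": "faculty"}

def to_warehouse(source, record):
    """Rename a source record's fields to warehouse fields, then standardize values."""
    row = {wh_field: record[src_field] for src_field, wh_field in FIELD_MAPS[source].items()}
    # Unify member-type codes; anything unrecognized is flagged rather than guessed
    row["member_type"] = MEMBER_TYPE_CODES.get(str(row["member_type"]).strip().upper(), "unknown")
    # Member IDs arrive as numbers or padded strings; store them as trimmed text
    row["member_id"] = str(row["member_id"]).strip()
    return row

print(to_warehouse("branch_a", {"BookNo": 42, "BorrowerID": " 007", "Type": "S"}))
# {'book_id': 42, 'member_id': '007', 'member_type': 'student'}
print(to_warehouse("branch_b", {"item_no": 42, "patron": "007", "category": "fac"}))
# {'book_id': 42, 'member_id': '007', 'member_type': 'faculty'}
```

Each new source adds another entry to `FIELD_MAPS` (and possibly new codes to reconcile), which is one way to see why integration grows more complex as the number of sources increases.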

Slide 13:

Loading
Once extraction, transformation, and cleansing have been done, the data are loaded into the data warehouse. Loading can be categorized into two types: loading the data currently contained in the operational database (the initial load), and loading updates to the data warehouse from changes that have since occurred in the operational database (incremental loads). To guarantee the freshness of the data, the data warehouse needs to be refreshed periodically.

Slide 14 (contd.):

Many issues need to be considered, especially when loading updates into the data warehouse. While updating the data warehouse, we need to ensure that no data are lost and that the overhead of scanning existing files is kept to a minimum.
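One common way to load updates with little scanning overhead is an incremental load driven by a high-water mark, sketched below with SQLite. It assumes each source row carries an `updated_at` timestamp in one consistent ISO-8601 text format that increases whenever the row changes; the table names, the `load_state` bookkeeping table, and `incremental_load` are hypothetical. The first run, with no stored watermark, acts as the initial load.

```python
import sqlite3

def incremental_load(source, warehouse):
    """Load source rows changed since the last run, tracked by a watermark on updated_at."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS dw_loans (loan_id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
    )
    warehouse.execute("CREATE TABLE IF NOT EXISTS load_state (tbl TEXT PRIMARY KEY, watermark TEXT)")
    row = warehouse.execute("SELECT watermark FROM load_state WHERE tbl = 'loans'").fetchone()
    watermark = row[0] if row else ""  # "" sorts before any timestamp, so the first run loads everything
    changed = source.execute(
        "SELECT loan_id, status, updated_at FROM loans WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # One transaction: the rows and the new watermark commit together, or neither does
    with warehouse:
        warehouse.executemany("INSERT OR REPLACE INTO dw_loans VALUES (?, ?, ?)", changed)
        if changed:
            warehouse.execute("INSERT OR REPLACE INTO load_state VALUES ('loans', ?)", (changed[-1][2],))
    return len(changed)

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)")
source.executemany("INSERT INTO loans VALUES (?, ?, ?)", [
    (1, "on_loan", "2024-01-15T09:00"),
    (2, "on_loan", "2024-01-15T10:00"),
])
source.commit()
warehouse = sqlite3.connect(":memory:")

first = incremental_load(source, warehouse)   # initial load: both rows

# Changes in the operational database: one update and one new row
source.execute("UPDATE loans SET status = 'returned', updated_at = '2024-01-20T14:00' WHERE loan_id = 1")
source.execute("INSERT INTO loans VALUES (3, 'on_loan', '2024-01-21T11:00')")
source.commit()

second = incremental_load(source, warehouse)  # only the two changed rows
third = incremental_load(source, warehouse)   # nothing new
rows = warehouse.execute("SELECT * FROM dw_loans ORDER BY loan_id").fetchall()
print(first, second, third, rows)
```

Writing the rows and the watermark in one transaction means a failed run leaves both unchanged, so a retry picks up the same updates. This sketch does not capture deletions in the source, and it can miss rows that commit late with a timestamp at or below the stored watermark; an index on `updated_at` in the source would keep the filter from scanning the whole table.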

Slide 15:

THANK YOU