ETL Queues for Active Data Warehousing-Presentation-Final

Presentation Transcript

ETL Queues for Active Data Warehousing:

Alexandros Karakasidis, Panos Vassiliadis, Evaggelia Pitoura
Univ. of Ioannina
QIS 2005, June 17, 2005, Baltimore, MD, USA

Presented By: 

Sidra Sarwar Rana Aafia Kamal

Agenda:

Few Concepts
Problem Statement
Proposed Solution
Experimental Results
Related Work
Summary

Few Concepts: 

What is data warehousing?
What is ETL?

ETL: 

ETL

Problem Statement: 

Traditionally, the refreshment of the data warehouse is performed offline
Data are extracted, transformed, cleaned and loaded into the warehouse
These activities take place during a load window, usually at night, to avoid overloading the source production systems
There is a demand for a higher level of freshness

Proposed Solution: 

Active data warehousing
The data warehouse is updated as frequently as possible
Challenging for various reasons
Source

Proposed Solution: 

Propose a framework for the implementation of active data warehousing, with the following goals:
Maximum freshness of data
Smooth upgrade of the software at the source
Minimal overhead at the source
Stable interface at the warehouse side

Architecture Overview: 

Architecture Overview

Queue Theory for ETL Activities: 

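Model each ETL activity as a queue in a queueing architecture
Number of customers in the system (Little's law): $N = \lambda T$
Mean response time of the system: $W = \frac{1}{\mu - \lambda}$
Queue length: $L = \frac{\rho}{1 - \rho}$
Here $\lambda$ is the arrival rate, $\mu$ the service rate, $T$ the mean time a customer spends in the system, and $\rho = \lambda / \mu$ the utilization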
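The response-time formula on this slide is the standard M/M/1 result, so the metrics of a single ETL activity can be estimated from two rates. The following is a minimal sketch under that assumption; the function name `mm1_metrics` and its parameter names are illustrative and do not come from the presentation.

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state metrics of one ETL activity modeled as an M/M/1 queue.

    Assumption: Poisson arrivals and exponential service times (M/M/1).
    Rates are in tuples per unit of time.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative, service rate positive")
    if arrival_rate >= service_rate:
        # Without this condition the queue grows without bound.
        raise ValueError("unstable queue: arrival rate must be below service rate")

    utilization = arrival_rate / service_rate             # rho = lambda / mu
    mean_response_time = 1.0 / (service_rate - arrival_rate)  # W = 1 / (mu - lambda)
    mean_in_system = arrival_rate * mean_response_time    # Little's law: N = lambda * W
    return {
        "utilization": utilization,
        "mean_response_time": mean_response_time,
        "mean_in_system": mean_in_system,
    }


if __name__ == "__main__":
    print(mm1_metrics(arrival_rate=8.0, service_rate=10.0))
```

Applying Little's law to the M/M/1 response time gives the same value as rho / (1 - rho), which is a quick consistency check on the formulas.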

Queue Theory for ETL Activities: 

The taxonomy of activities consists of the following categories:
Filters
Transformers
Binary operators
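To make the three categories concrete, here is a small sketch of each as a Python generator over tuples. The function names, the tuple-based row layout, and the choice of a key-based join as the binary operator are assumptions made for illustration; the slide only names the categories.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Tuple  # a row is modeled as a plain tuple (assumption)


def filter_activity(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Filter: passes on only the rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def transformer_activity(rows: Iterable[Row], fn: Callable[[Row], Row]) -> Iterator[Row]:
    """Transformer: emits one modified row per input row."""
    for row in rows:
        yield fn(row)


def binary_activity(left: Iterable[Row], right: Iterable[Row], key_index: int = 0) -> Iterator[Row]:
    """Binary operator: combines two inputs, here as a hash join on one column."""
    lookup: Dict[object, List[Row]] = {}
    for r in right:
        lookup.setdefault(r[key_index], []).append(r)
    for l in left:
        for r in lookup.get(l[key_index], []):
            # Append the right row without repeating the join key.
            yield l + r[:key_index] + r[key_index + 1:]


if __name__ == "__main__":
    source = [(1, "a"), (2, "b"), (3, "c")]
    kept = filter_activity(source, lambda row: row[0] != 2)
    upper = transformer_activity(kept, lambda row: (row[0], row[1].upper()))
    print(list(binary_activity(upper, [(1, "x"), (3, "y")])))
```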

Experiment on Data Freshness of Online ETL: 

Scenario (a): Transfer data inserted into the legacy application to the DW using various service rates.
Scenario (b):
1. Filter 10% of the incoming data through a selection predicate
2. Apply a transformation to the first column of the filtered data
3. Cumulative aggregation
4. Finally, the data are fed to the data warehouse
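The steps of scenario (b) can be sketched as a single streaming function. This is only an illustration: the `(row_id, amount)` row layout, the modulo predicate, the `K` prefix, and the running sum are hypothetical choices, and the sketch assumes that the filter rejects about 10% of the rows, which the slide does not state explicitly.

```python
from typing import Iterable, Iterator, Tuple


def scenario_b(rows: Iterable[Tuple[int, float]]) -> Iterator[Tuple[str, float, float]]:
    """Filter, transform the first column, then aggregate cumulatively.

    Each yielded tuple stands for a row that would be fed to the warehouse.
    """
    running_total = 0.0
    for row_id, amount in rows:
        # Step 1: hypothetical selection predicate rejecting every 10th id.
        if row_id % 10 == 0:
            continue
        # Step 2: hypothetical transformation of the first column.
        key = f"K{row_id}"
        # Step 3: cumulative aggregation (running sum of the second column).
        running_total += amount
        # Step 4: hand the row over to the warehouse loader.
        yield key, amount, running_total


if __name__ == "__main__":
    incoming = [(i, 1.0) for i in range(1, 21)]
    for loaded_row in scenario_b(incoming):
        print(loaded_row)
```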

Experiment on Data Freshness of Online ETL: 

Scenario (c):
1. Filter 10% of the incoming data
2. Additionally, filter another 2% of the remaining data
3. A key operation is applied to the first column of the data; the stream is replicated along two branches

Experiment on Data Freshness of Online ETL: 

Scenario (d):
1. Filter 10% of the incoming data
2. values of the first field to simulate value computation through functions
3. A transformation is applied; the stream is replicated along two branches
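Scenarios (c) and (d) both end by replicating the stream along two branches. One way to picture this is a step that copies every row into one queue per branch. The sketch below uses Python's standard `queue.Queue` in a single thread; the helper names `replicate` and `drain` are hypothetical and not taken from the presentation.

```python
import queue
from typing import Iterable, List, Tuple


def replicate(rows: Iterable[Tuple], branches: List[queue.Queue]) -> None:
    """Put a copy of every incoming row on each branch queue."""
    for row in rows:
        for branch in branches:
            branch.put(row)


def drain(branch: queue.Queue) -> List[Tuple]:
    """Read everything currently waiting on a branch (single-threaded use only)."""
    items = []
    while not branch.empty():
        items.append(branch.get())
    return items


if __name__ == "__main__":
    branch_a: queue.Queue = queue.Queue()
    branch_b: queue.Queue = queue.Queue()
    replicate([(1, "x"), (2, "y")], [branch_a, branch_b])
    print(drain(branch_a))
    print(drain(branch_b))
```

Each branch then behaves as an independent queue with its own consumer, so the two branches can be served at different rates.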

Queues for Scenarios: 

Queues for Scenarios

Data Freshness for each Scenario: 

Data Freshness for each Scenario

Related Work: 

Qingchun Jiang, Sharma Chakravarthy. Queueing analysis of relational operators for continuous data streams. In Proc. CIKM, New Orleans, Louisiana, USA, November 2003, 271-278
Daniel J. Abadi, Don Carney, Ugur Çetintemel, et al. Aurora: a new model and architecture for data stream management. The VLDB Journal, 12(2), 120-139, 2003
S. Babu, J. Widom. Continuous Queries over Data Streams. SIGMOD Record, 30(3), 109-120, 2001

Related Work: 

D. Lomet, J. Gehrke. Special Issue on Data Stream Processing. Data Engineering Bulletin, 26(1), 2003
On-Time Data Warehousing with Oracle10g – Information at the Speed of your Business. An Oracle White Paper, August 2003

Summary: 

In terms of architecture, isolating the ETL tasks at the warehouse guarantees minimum performance overhead at the source
Queue theory can be successfully employed to estimate the response time of the Active Staging Area
Satisfactory data freshness is achieved as a result of implementing the proposed architecture

Thank You!