ETL in High Data Volume and High Usage Environment


Presentation Description

ETL in High Data Volume and High Usage Environment in Active Data Warehousing


Presentation Transcript

ETL in High Data Volume and High Usage Environment of Active Data Warehouse:

Presented by Aafia Kamal Sidra Sarwar Rana


Agenda: Abstract, Introduction, Work, Conclusion and Future Work, Questions


Introduction: What is a data warehouse? What is the ETL process?


Who are my revenue-generating customers who are making complaint calls? What network quality is offered to my biggest corporate customer, which has 5,000 subscribers? What are the card-loading habits of my new customers? What is my total revenue?


[Figure: ETL process, with three Extract steps feeding a Transform step, followed by Load]

Problem Statement:

Traditionally, the data warehouse is refreshed offline: data are extracted, transformed, cleaned, and loaded into the warehouse. These activities take place during a load window, usually at night, to avoid overloading the source production systems. However, there is demand for a higher level of freshness, with data updated as frequently as possible.
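
The traditional offline refresh can be sketched as a single batch job that only runs inside its load window. This is a minimal illustration, assuming in-memory lists of dictionaries stand in for the source systems and the warehouse; the window boundaries (01:00 to 05:00), the field names `customer_id` and `amount`, and all function names are hypothetical.

```python
from datetime import datetime, time
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical nightly load window, when source systems are lightly used.
LOAD_WINDOW_START = time(1, 0)
LOAD_WINDOW_END = time(5, 0)


def in_load_window(now: datetime) -> bool:
    """True if `now` falls inside the nightly load window."""
    return LOAD_WINDOW_START <= now.time() < LOAD_WINDOW_END


def extract(sources: List[List[Row]]) -> List[Row]:
    """Full batch extract: copy every row from every source system."""
    return [dict(row) for source in sources for row in source]


def transform(rows: List[Row]) -> List[Row]:
    """Normalise formats, e.g. trim and upper-case customer identifiers."""
    return [{**row, "customer_id": row["customer_id"].strip().upper()} for row in rows]


def clean(rows: List[Row]) -> List[Row]:
    """Drop rows with a missing amount, then drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if row.get("amount") is None:
            continue
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


def load(warehouse: List[Row], rows: List[Row]) -> None:
    """Append the prepared rows to the warehouse table."""
    warehouse.extend(rows)


def run_nightly_etl(sources: List[List[Row]], warehouse: List[Row], now: datetime) -> int:
    """Run extract -> transform -> clean -> load only inside the window.

    Returns the number of rows loaded (0 if the job was skipped)."""
    if not in_load_window(now):
        return 0
    rows = clean(transform(extract(sources)))
    load(warehouse, rows)
    return len(rows)
```

The freshness problem follows directly from the window check: a change committed at the source during the day is not visible in the warehouse until the next night's run.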

Proposed Solution:

Active data warehousing: the data warehouse is updated as frequently as possible, which is challenging for various reasons.

Related Work:

We compare two works (their techniques and architectures) and their results in terms of data freshness. Alexandros applies a queuing architecture to the ETL process to effectively enable the active data warehouse. Ricardo presents a data warehouse loading methodology, with ETL loading procedures, that provides efficient data integration and fast response times for OLAP queries.
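
The general idea of queue-based ETL can be illustrated with a toy pipeline: source changes are pushed onto a queue as they happen, and an ETL worker drains the queue through a chain of operations, recording how stale each change was when it reached the warehouse. This is a simplified sketch under our own assumptions, not the architecture from Alexandros's work; the `Change` record, the `QueuedETL` class, the simulated clock, and the example operations are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, replace
from typing import Callable, Deque, Dict, List


@dataclass(frozen=True)
class Change:
    key: str
    value: float
    captured_at: float  # simulated time at which the change was committed at the source


ETLOp = Callable[[Change], Change]


class QueuedETL:
    """Toy queue-based ETL: changes wait in a FIFO queue until a worker drains them."""

    def __init__(self, ops: List[ETLOp]) -> None:
        self.queue: Deque[Change] = deque()
        self.ops = ops
        self.warehouse: Dict[str, float] = {}
        self.freshness: List[float] = []  # delay between source commit and load

    def enqueue(self, change: Change) -> None:
        self.queue.append(change)

    def drain(self, now: float, max_items: int) -> int:
        """Process up to max_items queued changes at simulated time `now`."""
        processed = 0
        while self.queue and processed < max_items:
            change = self.queue.popleft()
            for op in self.ops:  # apply the ETL operations in order
                change = op(change)
            self.warehouse[change.key] = change.value
            self.freshness.append(now - change.captured_at)
            processed += 1
        return processed


# Example ETL operations (hypothetical): convert cents to currency units, then round.
def cents_to_units(c: Change) -> Change:
    return replace(c, value=c.value / 100)


def round_two(c: Change) -> Change:
    return replace(c, value=round(c.value, 2))
```

Each change is loaded as soon as the worker reaches it, so freshness depends on how quickly the queue is emptied rather than on a nightly window.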

Related Work:

Both works conduct a set of experiments to evaluate data freshness. Alexandros evaluates data freshness time with respect to the queue emptying rate and the number of ETL operations. Ricardo evaluates data freshness time with respect to the data warehouse loading strategy as well as the OLAP query response time. Our conclusion: Ricardo presents the better methodology, because data freshness time cannot be reported completely without considering the data warehouse load.
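
Why the queue emptying rate matters for freshness can be seen in a deterministic toy model: a fixed number of changes arrives per tick, and the ETL worker empties at most a fixed number per tick in FIFO order. This is our own illustration of the qualitative relationship, not a reproduction of either work's experiments; the function name and the tick-based time model are assumptions.

```python
from collections import deque
from typing import Deque, List


def simulate_freshness(arrivals_per_tick: int, emptying_rate: int, ticks: int) -> float:
    """Average delay, in ticks, between a change arriving and being loaded.

    Only changes loaded within the simulated horizon are counted."""
    queue: Deque[int] = deque()  # each entry is the tick at which the change arrived
    delays: List[int] = []
    for t in range(ticks):
        for _ in range(arrivals_per_tick):
            queue.append(t)
        for _ in range(min(emptying_rate, len(queue))):
            arrived = queue.popleft()
            delays.append(t - arrived)
    return sum(delays) / len(delays) if delays else 0.0
```

With two arrivals and one load per tick over four ticks, the average delay is 1.0 tick, and the backlog keeps growing; whenever the emptying rate keeps up with arrivals, the delay stays at 0. The model leaves out the data warehouse load, which the conclusion above argues must be considered for a complete freshness figure.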

Related Work:

We highlight two active data warehousing techniques: ETL queues for active data warehousing, and the Change Data Capture (CDC) technique for active data warehousing, whose variants include log-based CDC, audit columns, and snapshot differential.
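
The three CDC variants differ in where changes are detected. This sketch contrasts them on in-memory data: audit columns filter on a last-modified timestamp, snapshot differential compares two full snapshots keyed by primary key, and log-based CDC replays a change log in commit order. The column name `last_modified`, the `(op, key, row)` log format with `"I"`/`"U"`/`"D"` codes, and all function names are hypothetical; real log-based CDC reads the database's own transaction log, whose format is product-specific.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def audit_column_extract(rows: List[Row], last_extracted: datetime) -> List[Row]:
    """Audit columns: keep rows modified after the previous extraction time."""
    return [row for row in rows if row["last_modified"] > last_extracted]


def snapshot_differential(old: Dict[Any, Row], new: Dict[Any, Row]) -> Dict[str, List[Any]]:
    """Snapshot differential: compare two full snapshots keyed by primary key."""
    return {
        "inserted": [k for k in new if k not in old],
        "deleted": [k for k in old if k not in new],
        "updated": [k for k in new if k in old and new[k] != old[k]],
    }


# Hypothetical log entry: (operation code, primary key, new row or None for deletes).
LogEntry = Tuple[str, Any, Optional[Row]]


def apply_change_log(target: Dict[Any, Row], log: List[LogEntry]) -> Dict[Any, Row]:
    """Log-based: replay insert/update/delete entries in commit order."""
    for op, key, row in log:
        if op in ("I", "U"):
            target[key] = row
        elif op == "D":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown log operation: {op!r}")
    return target
```

The trade-offs show up even at this scale: audit columns cannot see hard deletes, because a deleted row leaves no timestamp to filter on; snapshot differential does catch deletes but must compare both snapshots in full; log-based capture avoids both costs at the price of depending on the database's log format.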

Conclusion and Future Work:

Active data warehousing is the latest requirement for data warehouses, as fresh data for querying and analysis is becoming important for customers who need up-to-date and accurate results. Additional techniques need to be developed for the needs of active OLAP analysis, allowing users to continuously update their analyses and the corresponding results.

Thank You!

Questions?
