ETL Queues for Active Data Warehousing
Added: January 30, 2012. Category: Education. License: All Rights Reserved.

Presentation Transcript

ETL Queues for Active Data Warehousing:
Alexandros Karakasidis, Panos Vassiliadis, Evaggelia Pitoura. Univ. of Ioannina. QIS 2005, June 17, 2005, Baltimore, MD, USA.

Presented By: Sidra Sarwar Rana Aafia Kamal

Agenda:
- Few Concepts
- Problem Statement
- Proposed Solution
- Experimental Results
- Related Work
- Summary

Few Concepts:
- What is data warehousing
- What is ETL

ETL:

Problem Statement:
- Traditionally, the data warehouse is refreshed offline.
- Data are extracted, transformed, cleaned, and loaded into the warehouse.
- These activities take place during a load window, usually at night, to avoid overloading the source production systems.
- There is demand for a higher level of data freshness.

Proposed Solution:
- Active data warehousing: the data warehouse is updated as frequently as possible.
- This is challenging for various reasons.

Proposed Solution (continued):
A framework for implementing active data warehousing, with the following goals:
- Maximum freshness of data
- Smooth upgrade of the software at the source
- Minimal overhead at the source
- Stable interface at the warehouse side

Architecture Overview:

Queue Theory for ETL Activities:
- Model each ETL activity as a queue in a queueing architecture.
- Little's law relates the mean number of customers in the system, N, to the arrival rate λ and the mean time T a customer spends in the system: $N = \lambda T$.
- For a queue with service rate μ and utilization $\rho = \lambda / \mu$ (the standard M/M/1 results), the mean response time of the system is $W = 1/(\mu - \lambda)$ and the mean queue length is $L = \rho/(1 - \rho)$.

Queue Theory for ETL Activities (continued):
The taxonomy of activities consists of the following categories:
- Filters
- Transformers
- Binary operators

Experiment on Data Freshness of Online ETL:
Scenario (a): Transfer data inserted into the legacy application to the DW using various service rates.
Scenario (b):
1. Filter 10% of the incoming data through a selection predicate.
2. Apply a transformation to the first column of the filtered data.
3. Cumulative aggregation.
4. Finally, the data are fed to the data warehouse.

Experiment on Data Freshness of Online ETL (continued):
Scenario (c):
1. Filter 10% of the incoming data.
2. Additionally filter another 2% of the remaining data.
3. A key operation is applied to the first column of the data; the stream is replicated along two branches.

Experiment on Data Freshness of Online ETL (continued):
Scenario (d):
1. Filter 10% of the incoming data.
2. [...] values of the first field, to simulate value computation through functions.
3. A transformation is applied; the stream is replicated along two branches.

Queues for Scenarios:

Data Freshness for each Scenario:

Related Work:
- Qingchun Jiang, Sharma Chakravarthy. Queueing analysis of relational operators for continuous data streams. In Proc. CIKM, New Orleans, Louisiana, USA, November 2003, 271-278.
- Daniel J. Abadi, Don Carney, Ugur Çetintemel, et al. Aurora: a new model and architecture for data stream management. The VLDB Journal, 12(2), 120-139, 2003.
- S. Babu, J. Widom. Continuous Queries over Data Streams. SIGMOD Record 30(3), 109-120, 2001.

Related Work (continued):
- D. Lomet, J. Gehrke. Special Issue on Data Stream Processing.
  Data Engineering Bulletin, 26(1), 2003.
- On-Time Data Warehousing with Oracle10g – Information at the Speed of your Business. An Oracle White Paper, August 2003.

Summary:
- In terms of architecture, isolating the ETL tasks in the warehouse guarantees minimal performance overhead at the source.
- Queue theory can be successfully employed to estimate the response of the Active Staging Area.
- Satisfactory data freshness is achieved as a result of implementing the proposed architecture.

Thank You!
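
The queueing formulas from the Queue Theory slides (W = 1/(μ − λ), L = ρ/(1 − ρ) with ρ = λ/μ, and Little's law N = λT) can be checked numerically. The following is only a minimal sketch, not the authors' implementation. It assumes that each ETL activity behaves like an M/M/1 queue, that activities are chained one after another, and that a filter passes a fixed fraction of tuples downstream. The names `Stage`, `mm1_metrics`, `chain_metrics`, the `selectivity` parameter, and all rates are hypothetical and chosen for illustration; whether "filter 10%" means keeping or dropping 10% is left to the caller through `selectivity`.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One ETL activity, modelled (by assumption) as an M/M/1 queue."""
    name: str
    service_rate: float        # mu: tuples per second the activity can process
    selectivity: float = 1.0   # fraction of tuples passed downstream (hypothetical)


def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state metrics of a single M/M/1 queue."""
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    rho = arrival_rate / service_rate             # utilization, rho = lambda / mu
    w = 1.0 / (service_rate - arrival_rate)       # mean response time, W = 1/(mu - lambda)
    length = rho / (1.0 - rho)                    # mean number in system, L = rho/(1 - rho)
    return {"rho": rho, "W": w, "L": length}


def chain_metrics(source_rate: float, stages: list) -> tuple:
    """Per-stage metrics and the summed mean response time of a linear chain.

    Assumes each stage sees Poisson arrivals whose rate is the source rate
    thinned by the selectivities of the stages before it.
    """
    rate = source_rate
    total_w = 0.0
    rows = []
    for stage in stages:
        metrics = mm1_metrics(rate, stage.service_rate)
        rows.append((stage.name, rate, metrics))
        total_w += metrics["W"]
        rate *= stage.selectivity                 # only this fraction moves on
    return rows, total_w


# Illustrative numbers only; they are not taken from the experiments.
pipeline = [
    Stage("filter", service_rate=100.0, selectivity=0.1),
    Stage("transform", service_rate=20.0),
]
rows, total_w = chain_metrics(source_rate=80.0, stages=pipeline)
for name, rate, metrics in rows:
    print(name, "arrival rate:", rate, metrics)
print("estimated mean delay through the chain (s):", total_w)
```

The summed response time is a rough proxy for data freshness: the closer any activity's arrival rate gets to its service rate, the faster W grows, which is why the service rate is the parameter varied in scenario (a).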