Presentation Transcript
Added: January 14, 2008

On Large Data-Flow Scientific Workflows: An Astrophysics Case Study
Integration of Heterogeneous Datasets using Scientific Workflow Engineering
Presenter: Mladen A. Vouk

Team (Scientific Process Automation - SPA):
- Sangeeta Bhagwanani (MS student, GUI interfaces)
- John Blondin (NCSU faculty, TSI PI)
- Zhengang Cheng (PhD student, services, V&V)
- Dan Colonnese (MS student, graduated, workflow grid and reliability issues)
- Ruben Lobo (PhD student, packaging)
- Pierre Moualem (MS student, fault-tolerance)
- Jason Kekas (PhD student, Technical Support)
- Phoemphun Oothongsap (NCSU, Postdoc, high-throughput flows)
- Elliot Peele (NCSU, Technical Support)
- Mladen A. Vouk (NCSU faculty, SPA PI)
- Brent Marinello (NCSU, workflow extensions)
- Others …

NC State researchers are simulating the death of a massive star leading to a supernova explosion. Of particular interest is the dynamics of the shock wave generated by the initial implosion of the star, which ultimately destroys the star as a highly energetic supernova.

Key Current Task:
- Emulating “live” workflows

Key Issue:
- It is very important to distinguish between a custom-made workflow solution and a more canonical set of operations, methods, and solutions that can be composed into a scientific workflow.
- Complexity, skill level needed to implement, usability, maintainability, “standardization”
- E.g., sort, uniq, grep, ftp, ssh on Unix boxes vs. SAS (which can do sorting), home-made sort, SABUL, bbcp (free, but not standard), etc.

Topic - Computational Astrophysics:
Dr. Blondin is carrying out research in the field of Circumstellar Gas-Dynamics. The numerical hydrodynamical code VH-1 is used on supercomputers to study a vast array of objects observed by astronomers both from ground-based observatories and from orbiting satellites. The two primary subjects under investigation are interacting binary stars (including normal stars like the Algol binary, and compact object systems like the high-mass X-ray binary SMC X-1) and supernova remnants (from very young, like SNR 1987A, to older remnants like the Cygnus Loop). Other astrophysical processes of current interest include radiatively driven winds from hot stars, the interaction of stellar winds with the interstellar medium, the stability of radiative shock waves, the propagation of jets from young stellar objects, and the formation of globular clusters.

Slide 7 (data-flow diagram, labels only):
- Input Data
- Highly Parallel Compute
- Output: ~500x500 files
- Aggregate to ~500 files (< 10+ GB each)
- HPSS archive
- Data Depot
- Logistic Network L-Bone
- Local Mass Storage (14+ TB)
- Aggregate to one file (~1 TB each)
- Viz Wall
- Viz Client
- Local 44 Proc. Data Cluster (data sits on local nodes for weeks)
- Viz Software

Workflow - Abstraction (diagram labels):
- Model, SendData, Merge & Backup, To VizWall
- Parallel Computation, RecvData, Parallel Visualization
- Data Mover Channel (e.g., LORS, BCC, SABUL, FC over SONET)
- Split & Viz
- Web or Client GUI, Web Services
- Head Node Services, Head Node Services, Mass Storage
- Fiber C. or Local NFS
- Model, Merge, Backup, Move, Split, Viz
- Control: Construct, Orchestrate, Monitor/Steer, Change, Stop/Start

Current and Future Bottlenecks:
- Computing Resources and Computational Speed (1000+ Cray X1 processors, compute times of 30+ hrs, wait time)
- Storage and Disks (14+ TB, reliable and sustainable transfer speeds of 300+ MB/s)
- Automation
- Reliable and Sustainable Network Transfer Rates (300+ MB/s)

Bottlenecks (B-specific):
- Supercomputer, Storage, HPSS, Ensight Memory
- Average per-job wait time is 24-48 hrs (could be longer if more processors are requested or more time slices are calculated).
- One run (6 hrs of run time) on the Cray X1 currently uses 140 processors and produces 10 time steps.
- Each time step has 140 Fortran-binary files (28 GB total). Hence, currently, this is 280 GB per 6-hr run.
- A full visualization takes about 300 to 500 slices (30 to 50 runs), i.e., about 280 GB x (30 to 50) = roughly 8.4 to 14 TB of space.
- The 140 files of a time step are merged into one (1) netCDF file (takes about 10 min).
- BBCP the file to NCSU at about 30 MB/s, or about 15 min per time slice (this can be done in parallel with the next time-slice computation).
- In the future, network transfer speeds and disk access speeds may be an issue.

B-specific Top-Level W/F Operations:
- Operators: Create W/F (reserve resources), Run Model, Backup Output, PostProcess Output (e.g., Merge, Split), MoveData, AnalyzeData (Viz, other?), Monitor Progress (state, audit, backtrack, errors, provenance), Modify Parameters
- States: Modeling, Backup, Postprocessing (A, .. Z), MovingData, Analyzing Remotely
- Creators: CreateWF, Model?, Expand
- Modifiers: Merge, Split, Move, Backup, Start, Stop, ModifyParameters
- Behaviors: Monitor, Audit, Visualize, Error/Exception Handling, Data Provenance, …

Goal: Ubiquitous Canonical Operations for Scientific W/F Support:
- Fast data transfer from A to B (e.g., LORS, SABUL, GridFTP, BBCP?, other …)
- Database access
- Stream merging and splitting
- Flow monitoring
- Tracking, auditing, provenance
- Verification and validation
- Communication service (web services, grid services, xmlrpc, etc.)
- Other …

Issues (1):
- Communication Coupling (loose, tight, v. tight, code-level) and Granularity (fine, medium?, coarse)
- Communication Methods (e.g., ssh tunnels, xmlrpc, snmp, web/grid services, etc.); e.g., apparently poor support for Cray
- Storage issues (e.g., p-netcdf support, bandwidth)
- Direct and Indirect Data Flows (functionality, throughput, delays, other QoS parameters)
- End-to-end performance
- Level of abstraction
- Workflow description language(s) and exchange issues (interoperability)
- “Standard” scientific computing “W/F functions”

Issues (2):
- The problem is currently similar to old-time punched-card job submissions (long turn-around time, can be expensive due to the front-end computational resource I/O bottleneck): up-front verification and validation are needed; things will change.
- Back-end bottleneck due to hierarchical storage issues (e.g., retrieval from HPSS)
- Long-term workflow state preservation: needed
- Recovery (transfers, other failures): more needed
- Tracking data and files: needed
- Who maintains equipment, storage, data, scripts, workflow elements?
- Elegant solutions may not be good solutions from the perspective of autonomy.
- EXTREMELY IMPORTANT!!! We are trying to get out of the business of totally custom-made solutions.
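
The data volumes quoted under Bottlenecks (B-specific) can be cross-checked with a short calculation. The sketch below is only an illustration, not part of the presentation: it starts from the quoted 28 GB per time step, 10 time steps per run, and 30 MB/s BBCP rate, and it assumes decimal units (1 GB = 1000 MB, 1 TB = 1000 GB). The constant and function names are made up for this example.

```python
# Back-of-the-envelope check of the B-specific data volumes.
# Assumptions (not from the slides): decimal units, quoted figures taken as exact.

GB_PER_TIME_STEP = 28      # 140 Fortran-binary files per time step, 28 GB total
TIME_STEPS_PER_RUN = 10    # one 6-hr run on the Cray X1
TRANSFER_MB_PER_S = 30     # quoted BBCP rate to NCSU


def gb_per_run() -> int:
    """Output volume of a single run, in GB."""
    return GB_PER_TIME_STEP * TIME_STEPS_PER_RUN


def total_tb(slices: int) -> float:
    """Space needed for a given number of time slices, in TB."""
    return slices * GB_PER_TIME_STEP / 1000


def transfer_minutes_per_slice() -> float:
    """Time to move one merged time slice at the quoted rate, in minutes."""
    return GB_PER_TIME_STEP * 1000 / TRANSFER_MB_PER_S / 60


if __name__ == "__main__":
    print(f"per run: {gb_per_run()} GB")
    for slices in (300, 500):
        print(f"{slices} slices: {total_tb(slices):.1f} TB")
    print(f"transfer: {transfer_minutes_per_slice():.1f} min per slice")
```

Under these assumptions one run produces 280 GB, 300 to 500 slices come to about 8.4 to 14 TB, and moving a 28 GB time slice at 30 MB/s takes roughly 15.6 minutes, which is in line with the quoted 15 min per time slice.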

Workflow - Abstraction (diagram labels):
- Model, SendData, Merge & Backup, To VizWall
- Parallel Computation, RecvData, Parallel Visualization
- Data Mover Channel (e.g., LORS, SABUL, FC over SONET)
- Split & Viz
- Web or Client GUI, Web Services
- Head Node Services, Head Node Services, Mass Storage
- Fiber C. or Local NFS
- Model, Merge, Backup, Move, Split, Viz
- Control: Construct, Orchestrate, Monitor/Steer, Change, Stop/Start
- Goal: 2-3 Gbps transfer rates end-to-end
- Goal: 1 TB per night

Communications:
- Web/Java-based GUI
- Web Services for orchestration: overall and less-than-tightly-coupled sub-workflows
- LSF and MPI for parallel computation
- Scripts (in this example csh/sh based, could be Perl, Python, etc.) on local machines: interpreted language
- High-level programming language for simulations, complex data movement algorithms, and similar: compiled language

B-specific Top-Level W/F Operations:
- Operators: Create W/F (reserve resources), Run Model, Backup Output, PostProcess Output (e.g., Merge, Split), MoveData, AnalyzeData (Viz, other?), Monitor Progress (state, audit, backtrack, errors, provenance), Modify Parameters
- States: Modeling, Backup, Postprocessing (A, .. Z), MovingData, Analyzing Remotely
- Constructor: CreateWF, Model?, Expand
- Modifiers: Merge, Split, Move, Backup, Start, Stop, ModifyParameters
- Behaviors: Monitor, Audit, Visualize, …
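
One way to picture the operators, states, and behaviors listed above is as a small pipeline that moves through named states and keeps an audit trail. The following is a minimal sketch under stated assumptions, not the SPA team's implementation: the `Workflow` class, the `Created`, `Done`, and `Failed` states, and the placeholder steps are all hypothetical, and the step order simply follows the Model, Merge, Backup, Move, Split, Viz sequence from the abstraction slide.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Tuple


class State(Enum):
    CREATED = "Created"                  # hypothetical bookkeeping state
    MODELING = "Modeling"
    BACKUP = "Backup"
    POSTPROCESSING = "Postprocessing"
    MOVING_DATA = "MovingData"
    ANALYZING = "Analyzing Remotely"
    DONE = "Done"                        # hypothetical bookkeeping state
    FAILED = "Failed"                    # hypothetical bookkeeping state


# A step takes a description of the current data and returns the next one.
Step = Callable[[str], str]


@dataclass
class Workflow:
    steps: List[Tuple[State, Step]] = field(default_factory=list)
    state: State = State.CREATED
    audit: List[str] = field(default_factory=list)

    def add(self, state: State, step: Step) -> "Workflow":
        """Append a step and the state the workflow is in while it runs."""
        self.steps.append((state, step))
        return self

    def run(self, payload: str) -> str:
        """Run all steps in order, recording each transition in the audit log."""
        for state, step in self.steps:
            self.state = state
            self.audit.append(f"enter {state.value}: {payload}")
            try:
                payload = step(payload)
            except Exception as exc:
                self.state = State.FAILED
                self.audit.append(f"error in {state.value}: {exc}")
                raise
        self.state = State.DONE
        self.audit.append(f"done: {payload}")
        return payload


def build_demo() -> Workflow:
    """Placeholder steps: each one only relabels the payload string."""
    return (
        Workflow()
        .add(State.MODELING, lambda p: f"model({p})")
        .add(State.POSTPROCESSING, lambda p: f"merge({p})")
        .add(State.BACKUP, lambda p: f"backup({p})")
        .add(State.MOVING_DATA, lambda p: f"move({p})")
        .add(State.POSTPROCESSING, lambda p: f"split({p})")
        .add(State.ANALYZING, lambda p: f"viz({p})")
    )


if __name__ == "__main__":
    wf = build_demo()
    print(wf.run("input"))
    print("\n".join(wf.audit))
```

In a real deployment each placeholder step would wrap an existing tool (a batch submission, a merge script, a bulk transfer), which is the point of composing a workflow from canonical operations instead of writing a custom solution each time.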