Testing Efficiency of Parallel I/O Software
Weikuan Yu, Jeffrey Vetter
December 6, 2006

Testing Parallel IO at ORNL
- Earlier analysis of scientific codes running at ORNL/LCF
  - Most users have limited I/O capability in their applications because they have not had access to portable, widespread PIO
  - Seldom direct use of MPI-IO
  - Little use of high-level IO middleware: PnetCDF or HDF5
  - Large variance in performance of parallel IO using different software stacks (as demonstrated in VH1 experiments)
- Ongoing work
  - Collect application IO access patterns and Lustre server IO traces with tau, craypat, mpiP, etc.
  - Testing other parallel IO components over Lustre
  - Analysis, benchmarking and optimization of data-intensive scientific codes

Parallel IO Optimization at ORNL
- Parallel IO over Lustre
  - A new file system that still relies on a generic ADIO implementation
  - Generations of platforms at ORNL demand efficient parallel IO
- Performance with Jaguar
  - Good read/write bandwidth for a large shared single file
  - Not scalable for small reads/writes and parallel IO management operations (metadata)
- Approaches for optimization
  - Providing a specific ADIO implementation well tuned for Lustre
  - Investigating parameters for adjusting the striping pattern
- Exploited Lustre file joining
  - Regular files can be joined in place
  - Split writing and hierarchical striping
  - Developed a prototype on an 80-node Linux cluster
  - Paper submitted to CCGrid 2006, available if interested

Some Characteristics of Lustre IO Performance
- Performance can be significantly affected by stripe width
- Need to introduce flexibility in the striping pattern
- Exploit file joining to grow stripe width with increasing file size

Explore Lustre File Joining
- Split writing
  - Create/write a shared file as multiple small files, aka subfiles
  - Temporary structure to hold file attributes
  - Subfiles joined at closing time
[Diagram of Split Writing: open, read/write, close; file attributes, subfiles, joined file]

Hierarchical Striping
- Hierarchical striping
  - Create another level of striping pattern for subfiles
  - Allow maximum coverage of Lustre storage targets
  - Mitigate the impact of striping overhead
[Diagram of Hierarchical Striping (HS): HS width: N+1; HS size: S*w. Subfiles 0, 1, ..., n, each with stripe width 2 and stripe size w, placed on ost0/ost1, ost2/ost3, ..., ost2n/ost2n+1. Stripe units are labeled 0, 1, 2, 3, ..., S-1 for subfile 0; S, S+1, ..., 2S-1 for subfile 1; and nS, nS+1, nS+2, nS+3, ... for subfile n.]

Evaluation
- Table 1: Scalability of Management Operations
  - Scalability of file open and file resize improved dramatically
- Table 2: Performance of Collective Read/Write
  - Write/read performance improved dramatically for new files
  - Read/write of an existing joined file performs poorly because the IO path for joined files in Lustre is not optimized

Results on Scientific Benchmarks – MPI-Tile-IO and BT/IO
- The IO pattern represented by BT-IO can be improved if the number of iterations is small. It may help if an arbitrary number of files can be joined.
- Write performance in MPI-Tile-IO can be improved dramatically
- Read performance in MPI-Tile-IO cannot be improved by file joining because reading an existing joined file does not perform well

Conclusions
- Parallel IO over Lustre
  - Split writing can improve metadata management operations
  - Stripe overhead can be mitigated with careful augmentation of stripe width
- Lustre file joining
  - Race conditions when multiple processes join files
  - Low read/write performance on an existing file
  - Arbitrary hierarchical striping is not possible because only a limited number of files can be joined
  - Needs improvement before production use in parallel IO
- Next steps
  - Continue optimization of parallel IO at ORNL
  - Adapt the earlier techniques to liblustre on XT3/XT4
  - Develop/exploit other features: group locks and dynamic stripe width
  - Adapt parallel I/O and parallel file systems to wide-area collaborative science with other IO protocols such as pNFS and Logistical Networking
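
The split-writing idea from the "Explore Lustre File Joining" slide can be sketched with ordinary files. This is only an illustration, not the authors' prototype: the function names (`split_write`, `join_at_close`, `demo`) and the `.subN` naming are hypothetical, and the join step copies bytes, whereas the slides describe Lustre joining regular files in place.

```python
import os
import shutil
import tempfile


def split_write(directory, name, chunks):
    """Write each chunk to its own subfile; return the subfile paths in order."""
    paths = []
    for rank, chunk in enumerate(chunks):
        # Hypothetical naming scheme: one subfile per writer.
        path = os.path.join(directory, f"{name}.sub{rank}")
        with open(path, "wb") as f:
            f.write(chunk)
        paths.append(path)
    return paths


def join_at_close(paths, target):
    """Concatenate the subfiles into `target`, then delete them.

    Stand-in only: this copies data, while the Lustre join in the slides
    combines regular files in place.
    """
    with open(target, "wb") as out:
        for path in paths:
            with open(path, "rb") as f:
                shutil.copyfileobj(f, out)
            os.remove(path)
    return target


def demo():
    """Four hypothetical writers each contribute one 4-byte chunk."""
    chunks = [bytes([rank]) * 4 for rank in range(4)]
    with tempfile.TemporaryDirectory() as d:
        subfiles = split_write(d, "shared.dat", chunks)
        joined = join_at_close(subfiles, os.path.join(d, "shared.dat"))
        with open(joined, "rb") as f:
            return f.read()
```

Calling `demo()` returns the joined contents, with the chunks in writer order. In this sketch the writers never touch a common file until close time, which is the property the slides credit for the improved scalability of management operations.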
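
The layout in the hierarchical striping diagram can be expressed as a small address calculation. The sketch below rests on one reading of that diagram, stated here as an assumption: subfile k holds the contiguous stripe units k*S through (k+1)*S - 1, and stripes them round-robin over its own pair of OSTs (2k and 2k+1). The names `Location` and `locate` are hypothetical, and offsets past the last subfile are rejected rather than wrapped.

```python
from typing import NamedTuple


class Location(NamedTuple):
    subfile: int         # index of the subfile holding the byte
    subfile_offset: int  # byte offset inside that subfile
    ost: int             # global OST index (assumes subfile k uses OSTs 2k, 2k+1)


def locate(offset, w, S, num_subfiles, stripe_width=2):
    """Map a byte offset in the shared file to its subfile and OST.

    w is the stripe size in bytes and S the number of stripe units per
    subfile, so one hierarchical stripe covers S*w bytes.
    """
    if offset < 0 or w <= 0 or S <= 0 or num_subfiles <= 0:
        raise ValueError("offset must be >= 0; w, S, num_subfiles must be > 0")
    unit = offset // w            # global stripe-unit number
    subfile = unit // S           # assumed: S consecutive units per subfile
    if subfile >= num_subfiles:
        raise ValueError("offset lies beyond the last subfile")
    local_unit = unit % S         # stripe-unit number inside the subfile
    ost = subfile * stripe_width + local_unit % stripe_width
    return Location(subfile, local_unit * w + offset % w, ost)
```

For example, with 1 MiB stripes, S = 4 and three subfiles, `locate(5 * 2**20, w=2**20, S=4, num_subfiles=3)` gives `Location(subfile=1, subfile_offset=1048576, ost=3)`: unit 5 is the second unit of subfile 1, which lands on that subfile's second OST.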