logging in or signing up 42 3 CoolDude26 Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 46 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: September 19, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Slide1: Implicitly Parallel Programming Models: For Thousand-Core Microprocessors Hwu, Ryoo, Ueng, Kelm, Gelado, Stone, Kidd, Barghsorkhi, Mahesri, Tsao, Navarro, Lumetta, Frank, Patel University of Illinois, Urbana-Champaign Universtat Politecnica de Catalunya Implicitly Parallel Programming Models The Next Software Challenge: The Next Software Challenge Today, TLP models potentially make more effective use of area and power than large ILP CPU’s Multi-core, Many-core, Cell BE™, GPGPU, PPU, etc. Scaling from 4-core to 1000-core chips could happen in the next 15 years All semiconductor market domains converging to concurrent system platforms PCs, game consoles, mobile handsets, servers, supercomputers, networking, etc. We need to make these systems effectively execute valuable, demanding apps. One Plausible View: One Plausible View One might be able to stretch sequential programming models to small scale multi-core systems As for large scale, many-core systems, explicitly parallel programming models must be used I would like to convince you that it is the other way around. Implicitly Parallel Programming Models: Implicitly Parallel Programming Models Programmers choose or create algorithms that have desired amount of parallelism The parallel algorithms are expressed in traditional sequential programming languages With assertions of design properties and assumptions The tools, compilers, runtime, and HW jointly reconstruct parallelism and arrange for parallel execution of the program One Important Disclaimer: One Important Disclaimer Code change needed whether we use implicitly or explicitly parallel programming models Legacy code was not developed to give tools deep knowledge about the computation being performed. E.g., source code does not express high-level properties and assumptions Implicitly Parallel Programming Model ≠ Legacy sequential code Another Important Disclaimer: Another Important Disclaimer When there is no parallelism in the algorithm, all is lost The question is how to go from parallel algorithms to parallel machine code Implicitly Parallel Programming Model ≠ Sequential Algorithms Human vs. Machine-level Programming Models: Human vs. Machine-level Programming Models Pentium Itanium NVida GPU Cost of Tuning ExplicitlyParallel Programming: Cost of Tuning Explicitly Parallel Programming Developing an explicitly parallel app is an expensive proposition Understanding and performing all optimizations needed is hard; transactional memory might help Optimized parallelization will likely involve finessing libraries and infrastructure software (lack of composability) The larger the number of cores, the less intuitive the optimization process will be Cost of Explicitly Parallel Program Verification and Support: Cost of Explicitly Parallel Program Verification and Support Verification and support of a hand parallelized program is even more expensive Application-level testing needs to cover a much larger state space Should contain the verification complexity through guarantees at the interface level Reproducing bugs can be very challenging, deterministic execution semantics help Some of the points in 'Software and the Concurrency Revolution.' Sutter and Larus, ACM Queue 3(7), Sep. 2005. Cost of Scaling Explicitly Parallel Programs: Cost of Scaling Explicitly Parallel Programs Programs developed in explicitly parallel form are unlikely to scale with Moore’s Law Performance comes from optimizations that craft the computation and data accesses to best get around limitations Every new generation or new platform will likely require app developers to redo their application Value and Complexity Magnified: Value and Complexity Magnified Implicitly Parallel Programming Flow: Stylized C/C++ w/ assertions Concurrency discovery Visualizable concurrent form Human Code-gen space exploration Visualizable sequential assembly code with parallel annotations Parallel HW w/sequential state gen Deep analysis w/ feedback assistance Systematic search for best/correct code gen Debugger parallel execution w/ sequential semantics Implicitly Parallel Programming Flow For increased composability For increased scalability For increased supportability Application Scaling : Application Scaling Traditional GP compilers cover legacy apps (pit) based on sequential algorithms Explicitly parallel programming models, OpenMP, and MPI cover simpler, earlier parallel apps (peach skin) Implicitly parallel programming models extend the coverage into more sophisticated parallel apps (peach flesh) 'application scaling' Parallelism in Algorithms(H.263 motion estimation example): Parallelism in Algorithms (H.263 motion estimation example) Different algorithms may expose different levels of parallelism while achieving desired result In motion estimation, can use previous vectors (either from space or time) as guess vectors Concurrency Discovery: Concurrency Discovery Reconstruction of concurrency in a parallel algorithm expressed in a traditional sequential language Program analysis techniques have advanced sufficiently in the past decade to do this job Programmers need to avoid cryptic programming styles Programmer annotations on high-level design properties and assumptions helpful This needs to be done when composing an explicitly parallel program from third-party components anyway MPEG-4 H.263 EncoderParallelism Rediscovery: MPEG-4 H.263 Encoder Parallelism Rediscovery (a) (b) (c) (d) (e) Code Gen Space Exploration: Code Gen Space Exploration Conclusion: Conclusion Implicitly Parallel Programming Models Increase code re-use through parallelism rediscovery Reduce tedious programming chores through code-gen space exploration Control software support cost through sequential verification and debugging interfaces Use of appropriate application algorithms, coding style and compiler analysis techniques is the key to success You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
42 3 CoolDude26 Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 46 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: September 19, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Slide1: Implicitly Parallel Programming Models: For Thousand-Core Microprocessors Hwu, Ryoo, Ueng, Kelm, Gelado, Stone, Kidd, Barghsorkhi, Mahesri, Tsao, Navarro, Lumetta, Frank, Patel University of Illinois, Urbana-Champaign Universtat Politecnica de Catalunya Implicitly Parallel Programming Models The Next Software Challenge: The Next Software Challenge Today, TLP models potentially make more effective use of area and power than large ILP CPU’s Multi-core, Many-core, Cell BE™, GPGPU, PPU, etc. Scaling from 4-core to 1000-core chips could happen in the next 15 years All semiconductor market domains converging to concurrent system platforms PCs, game consoles, mobile handsets, servers, supercomputers, networking, etc. We need to make these systems effectively execute valuable, demanding apps. One Plausible View: One Plausible View One might be able to stretch sequential programming models to small scale multi-core systems As for large scale, many-core systems, explicitly parallel programming models must be used I would like to convince you that it is the other way around. Implicitly Parallel Programming Models: Implicitly Parallel Programming Models Programmers choose or create algorithms that have desired amount of parallelism The parallel algorithms are expressed in traditional sequential programming languages With assertions of design properties and assumptions The tools, compilers, runtime, and HW jointly reconstruct parallelism and arrange for parallel execution of the program One Important Disclaimer: One Important Disclaimer Code change needed whether we use implicitly or explicitly parallel programming models Legacy code was not developed to give tools deep knowledge about the computation being performed. E.g., source code does not express high-level properties and assumptions Implicitly Parallel Programming Model ≠ Legacy sequential code Another Important Disclaimer: Another Important Disclaimer When there is no parallelism in the algorithm, all is lost The question is how to go from parallel algorithms to parallel machine code Implicitly Parallel Programming Model ≠ Sequential Algorithms Human vs. Machine-level Programming Models: Human vs. Machine-level Programming Models Pentium Itanium NVida GPU Cost of Tuning ExplicitlyParallel Programming: Cost of Tuning Explicitly Parallel Programming Developing an explicitly parallel app is an expensive proposition Understanding and performing all optimizations needed is hard; transactional memory might help Optimized parallelization will likely involve finessing libraries and infrastructure software (lack of composability) The larger the number of cores, the less intuitive the optimization process will be Cost of Explicitly Parallel Program Verification and Support: Cost of Explicitly Parallel Program Verification and Support Verification and support of a hand parallelized program is even more expensive Application-level testing needs to cover a much larger state space Should contain the verification complexity through guarantees at the interface level Reproducing bugs can be very challenging, deterministic execution semantics help Some of the points in 'Software and the Concurrency Revolution.' Sutter and Larus, ACM Queue 3(7), Sep. 2005. Cost of Scaling Explicitly Parallel Programs: Cost of Scaling Explicitly Parallel Programs Programs developed in explicitly parallel form are unlikely to scale with Moore’s Law Performance comes from optimizations that craft the computation and data accesses to best get around limitations Every new generation or new platform will likely require app developers to redo their application Value and Complexity Magnified: Value and Complexity Magnified Implicitly Parallel Programming Flow: Stylized C/C++ w/ assertions Concurrency discovery Visualizable concurrent form Human Code-gen space exploration Visualizable sequential assembly code with parallel annotations Parallel HW w/sequential state gen Deep analysis w/ feedback assistance Systematic search for best/correct code gen Debugger parallel execution w/ sequential semantics Implicitly Parallel Programming Flow For increased composability For increased scalability For increased supportability Application Scaling : Application Scaling Traditional GP compilers cover legacy apps (pit) based on sequential algorithms Explicitly parallel programming models, OpenMP, and MPI cover simpler, earlier parallel apps (peach skin) Implicitly parallel programming models extend the coverage into more sophisticated parallel apps (peach flesh) 'application scaling' Parallelism in Algorithms(H.263 motion estimation example): Parallelism in Algorithms (H.263 motion estimation example) Different algorithms may expose different levels of parallelism while achieving desired result In motion estimation, can use previous vectors (either from space or time) as guess vectors Concurrency Discovery: Concurrency Discovery Reconstruction of concurrency in a parallel algorithm expressed in a traditional sequential language Program analysis techniques have advanced sufficiently in the past decade to do this job Programmers need to avoid cryptic programming styles Programmer annotations on high-level design properties and assumptions helpful This needs to be done when composing an explicitly parallel program from third-party components anyway MPEG-4 H.263 EncoderParallelism Rediscovery: MPEG-4 H.263 Encoder Parallelism Rediscovery (a) (b) (c) (d) (e) Code Gen Space Exploration: Code Gen Space Exploration Conclusion: Conclusion Implicitly Parallel Programming Models Increase code re-use through parallelism rediscovery Reduce tedious programming chores through code-gen space exploration Control software support cost through sequential verification and debugging interfaces Use of appropriate application algorithms, coding style and compiler analysis techniques is the key to success