42 3

Uploaded from authorPOINT
Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Slide1: 


Implicitly Parallel Programming Models: 

For Thousand-Core Microprocessors Hwu, Ryoo, Ueng, Kelm, Gelado, Stone, Kidd, Barghsorkhi, Mahesri, Tsao, Navarro, Lumetta, Frank, Patel University of Illinois, Urbana-Champaign Universtat Politecnica de Catalunya Implicitly Parallel Programming Models

The Next Software Challenge: 

The Next Software Challenge Today, TLP models potentially make more effective use of area and power than large ILP CPU’s Multi-core, Many-core, Cell BE™, GPGPU, PPU, etc. Scaling from 4-core to 1000-core chips could happen in the next 15 years All semiconductor market domains converging to concurrent system platforms PCs, game consoles, mobile handsets, servers, supercomputers, networking, etc. We need to make these systems effectively execute valuable, demanding apps.

One Plausible View: 

One Plausible View One might be able to stretch sequential programming models to small scale multi-core systems As for large scale, many-core systems, explicitly parallel programming models must be used I would like to convince you that it is the other way around.

Implicitly Parallel Programming Models: 

Implicitly Parallel Programming Models Programmers choose or create algorithms that have desired amount of parallelism The parallel algorithms are expressed in traditional sequential programming languages With assertions of design properties and assumptions The tools, compilers, runtime, and HW jointly reconstruct parallelism and arrange for parallel execution of the program

One Important Disclaimer: 

One Important Disclaimer Code change needed whether we use implicitly or explicitly parallel programming models Legacy code was not developed to give tools deep knowledge about the computation being performed. E.g., source code does not express high-level properties and assumptions Implicitly Parallel Programming Model ≠ Legacy sequential code

Another Important Disclaimer: 

Another Important Disclaimer When there is no parallelism in the algorithm, all is lost The question is how to go from parallel algorithms to parallel machine code Implicitly Parallel Programming Model ≠ Sequential Algorithms

Human vs. Machine-level Programming Models: 

Human vs. Machine-level Programming Models Pentium Itanium NVida GPU

Cost of Tuning ExplicitlyParallel Programming: 

Cost of Tuning Explicitly Parallel Programming Developing an explicitly parallel app is an expensive proposition Understanding and performing all optimizations needed is hard; transactional memory might help Optimized parallelization will likely involve finessing libraries and infrastructure software (lack of composability) The larger the number of cores, the less intuitive the optimization process will be

Cost of Explicitly Parallel Program Verification and Support: 

Cost of Explicitly Parallel Program Verification and Support Verification and support of a hand parallelized program is even more expensive Application-level testing needs to cover a much larger state space Should contain the verification complexity through guarantees at the interface level Reproducing bugs can be very challenging, deterministic execution semantics help Some of the points in 'Software and the Concurrency Revolution.' Sutter and Larus, ACM Queue 3(7), Sep. 2005.

Cost of Scaling Explicitly Parallel Programs: 

Cost of Scaling Explicitly Parallel Programs Programs developed in explicitly parallel form are unlikely to scale with Moore’s Law Performance comes from optimizations that craft the computation and data accesses to best get around limitations Every new generation or new platform will likely require app developers to redo their application

Value and Complexity Magnified: 

Value and Complexity Magnified

Implicitly Parallel Programming Flow: 

Stylized C/C++ w/ assertions Concurrency discovery Visualizable concurrent form Human Code-gen space exploration Visualizable sequential assembly code with parallel annotations Parallel HW w/sequential state gen Deep analysis w/ feedback assistance Systematic search for best/correct code gen Debugger parallel execution w/ sequential semantics Implicitly Parallel Programming Flow For increased composability For increased scalability For increased supportability

Application Scaling : 

Application Scaling Traditional GP compilers cover legacy apps (pit) based on sequential algorithms Explicitly parallel programming models, OpenMP, and MPI cover simpler, earlier parallel apps (peach skin) Implicitly parallel programming models extend the coverage into more sophisticated parallel apps (peach flesh) 'application scaling'

Parallelism in Algorithms(H.263 motion estimation example): 

Parallelism in Algorithms (H.263 motion estimation example) Different algorithms may expose different levels of parallelism while achieving desired result In motion estimation, can use previous vectors (either from space or time) as guess vectors

Concurrency Discovery: 

Concurrency Discovery Reconstruction of concurrency in a parallel algorithm expressed in a traditional sequential language Program analysis techniques have advanced sufficiently in the past decade to do this job Programmers need to avoid cryptic programming styles Programmer annotations on high-level design properties and assumptions helpful This needs to be done when composing an explicitly parallel program from third-party components anyway

MPEG-4 H.263 EncoderParallelism Rediscovery: 

MPEG-4 H.263 Encoder Parallelism Rediscovery (a) (b) (c) (d) (e)

Code Gen Space Exploration: 

Code Gen Space Exploration

Conclusion: 

Conclusion Implicitly Parallel Programming Models Increase code re-use through parallelism rediscovery Reduce tedious programming chores through code-gen space exploration Control software support cost through sequential verification and debugging interfaces Use of appropriate application algorithms, coding style and compiler analysis techniques is the key to success