logging in or signing up parallel algorithms paweld Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT lite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 738 Category: Science & Tech.. License: All Rights Reserved Like it (0) Dislike it (0) Added: May 08, 2010 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Parallel algorithms : Parallel algorithms Parallel and Distributed Computing Wrocław, 07.05.2010 Paweł Duda Parallel algorithm – definition : Parallel algorithm – definition A parallel algorithm is an algorithm that has been specifically written for execution on a computer with two or more processing units. Parallel algorithms : Parallel algorithms can be run on computers with single processor (multiple functional units, pipelined functional units, pipelined memory systems) Modelling algorithms 1 : Modelling algorithms 1 when designing algorithm, take into account the cost of communication, the number of processors (efficiency) designer usually uses an abstract model of computation called parallel random-access machine (PRAM) each CPU operation = one step model’s advantages Modelling algorithms 2 - PRAM : Modelling algorithms 2 - PRAM neglects such isses as synchronisation and communication no limit on the number of processors in the machine any memory location is uniformely accessible from any processor no limit on the amount of shared memory in the system Modelling algorithms 3 - PRAM : Modelling algorithms 3 - PRAM no conflict in accessing resources generally the programs written on those machines are MIMD Multiprocessor model : Multiprocessor model MultiPROCESSOR MODEL PARALLEL RANDOM-ACCESS MACHINE MODULAR MEMORY MACHINE MODELS LOCAL MEMORY MACHINE MODEL Parallel Algorithms : Parallel Algorithms Multiprocessor model Work-depth model : Work-depth model How the cost of the algorithm can be calculated? Work - W Depth - D P = W/D – PARALLELISM of the algorithm Picture: Summing 16 numbers on a tree.The total depth (longest chain of dependencies) is 4 and The total work (number of operations) is 15. Mergesort : Mergesort Conceptually, a merge sort works as follows: input: sequence of n keys output: sorted sequence of n keys If the list is of length 1, then it is already sorted. Otherwise: Divide the unsorted list into two sublists of about half the size. Sort each sublist recursively by re-applying merge sort. Merge the two sublists back into one sorted list. Mergesort : Mergesort General-purpose computing on graphics processing units (GPGPU) : General-purpose computing on graphics processing units (GPGPU) - recent trend GPUs co-processors linear algebra matrix operations General-purpose computing on graphics processing units (GPGPU) Nvidia's Tesla GPGPU card Matrix multiplication : Algorithm: MATRIX_MULTIPLY(A,B) 1 (l,m) := dimensions (A) 2 (m,n) := dimensions (B) 3 in parallel for i ∊ [o..l) do 4 in parallel for j ∊ [0..n) do 5 Rij := sum( { Aik * Bkj : k ∊ [0..m) } ) We need log n matrix multiplications, each taking time O(n3) The serial complexity of this procedure is O(n3log n). Matrix multiplication Search : Search Dynamic creation of tasks and channels during program execution Looking for nodes coresponding to ‘solutions’ Initially a task created for the root of the tree procedure search(A) begin if(solution(A)) then score = eval(A); report solution and score else foreach child A(i) of A search (A(i)) endfor endif end Shortest-Path Algorithms : Shortest-Path Algorithms The all-pairs shortest-path problem involves finding the shortest path between all pairs of vertices in a graph. A graph G=(V,E) comprises a set V of N vertices {vi} , and a set E ⊆ V x X of edges. For (vi, vj) and (vi,vj), i ≠ j Floyd’s algorithm : Floyd’s algorithm Floyd’s algorithm is a graph analysis algorithm for finding shortest paths in a weighted graph. A single execution of the algorithm will find the shortest paths between all pairs of vertices. parallel Floyd’s algorithm 1 : parallel Floyd’s algorithm 1 Parallel Floyd’s algorithm 1 The first parallel Floyd algorithm is based on a one-dimensional, rowwise domain decomposition of the intermediate matrix I and the output matrix S. the algorithm can use at most N processors. Each task has one or more adjacent rows of I and is responsible for performing computation on those rows. parallel Floyd’s algorithm 1 : parallel Floyd’s algorithm 1 Parallel version of Floyd's algorithm based on a one-dimensional decomposition of the I matrix. In (a), the data allocated to a single task are shaded: a contiguous block of rows. In (b), the data required by this task in the k th step of the algorithm are shaded: its own block and the k th row. parallel Floyd’s algorithm 2 : parallel Floyd’s algorithm 2 Parallel Floyd’s algorithm 2 An alternative parallel version of Floyd's algorithm uses a two-dimensional decomposition of the various matrices. This version allows the use of up to N2 processors parallel Floyd’s algorithm 2 : parallel Floyd’s algorithm 2 Parallel Floyd 2 Parallel version of Floyd's algorithm based on a two-dimensional decomposition of the I matrix. In (a), the data allocated to a single task are shaded: a contiguous submatrix. In (b), the data required by this task in the k th step of the algorithm are shaded: its own block, and part of the k th row and column. : Thank you for attention You do not have the permission to view this presentation. In order to view it, please contact the author of the presentation.
parallel algorithms paweld Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINT lite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 738 Category: Science & Tech.. License: All Rights Reserved Like it (0) Dislike it (0) Added: May 08, 2010 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Parallel algorithms : Parallel algorithms Parallel and Distributed Computing Wrocław, 07.05.2010 Paweł Duda Parallel algorithm – definition : Parallel algorithm – definition A parallel algorithm is an algorithm that has been specifically written for execution on a computer with two or more processing units. Parallel algorithms : Parallel algorithms can be run on computers with single processor (multiple functional units, pipelined functional units, pipelined memory systems) Modelling algorithms 1 : Modelling algorithms 1 when designing algorithm, take into account the cost of communication, the number of processors (efficiency) designer usually uses an abstract model of computation called parallel random-access machine (PRAM) each CPU operation = one step model’s advantages Modelling algorithms 2 - PRAM : Modelling algorithms 2 - PRAM neglects such isses as synchronisation and communication no limit on the number of processors in the machine any memory location is uniformely accessible from any processor no limit on the amount of shared memory in the system Modelling algorithms 3 - PRAM : Modelling algorithms 3 - PRAM no conflict in accessing resources generally the programs written on those machines are MIMD Multiprocessor model : Multiprocessor model MultiPROCESSOR MODEL PARALLEL RANDOM-ACCESS MACHINE MODULAR MEMORY MACHINE MODELS LOCAL MEMORY MACHINE MODEL Parallel Algorithms : Parallel Algorithms Multiprocessor model Work-depth model : Work-depth model How the cost of the algorithm can be calculated? Work - W Depth - D P = W/D – PARALLELISM of the algorithm Picture: Summing 16 numbers on a tree.The total depth (longest chain of dependencies) is 4 and The total work (number of operations) is 15. Mergesort : Mergesort Conceptually, a merge sort works as follows: input: sequence of n keys output: sorted sequence of n keys If the list is of length 1, then it is already sorted. Otherwise: Divide the unsorted list into two sublists of about half the size. Sort each sublist recursively by re-applying merge sort. Merge the two sublists back into one sorted list. Mergesort : Mergesort General-purpose computing on graphics processing units (GPGPU) : General-purpose computing on graphics processing units (GPGPU) - recent trend GPUs co-processors linear algebra matrix operations General-purpose computing on graphics processing units (GPGPU) Nvidia's Tesla GPGPU card Matrix multiplication : Algorithm: MATRIX_MULTIPLY(A,B) 1 (l,m) := dimensions (A) 2 (m,n) := dimensions (B) 3 in parallel for i ∊ [o..l) do 4 in parallel for j ∊ [0..n) do 5 Rij := sum( { Aik * Bkj : k ∊ [0..m) } ) We need log n matrix multiplications, each taking time O(n3) The serial complexity of this procedure is O(n3log n). Matrix multiplication Search : Search Dynamic creation of tasks and channels during program execution Looking for nodes coresponding to ‘solutions’ Initially a task created for the root of the tree procedure search(A) begin if(solution(A)) then score = eval(A); report solution and score else foreach child A(i) of A search (A(i)) endfor endif end Shortest-Path Algorithms : Shortest-Path Algorithms The all-pairs shortest-path problem involves finding the shortest path between all pairs of vertices in a graph. A graph G=(V,E) comprises a set V of N vertices {vi} , and a set E ⊆ V x X of edges. For (vi, vj) and (vi,vj), i ≠ j Floyd’s algorithm : Floyd’s algorithm Floyd’s algorithm is a graph analysis algorithm for finding shortest paths in a weighted graph. A single execution of the algorithm will find the shortest paths between all pairs of vertices. parallel Floyd’s algorithm 1 : parallel Floyd’s algorithm 1 Parallel Floyd’s algorithm 1 The first parallel Floyd algorithm is based on a one-dimensional, rowwise domain decomposition of the intermediate matrix I and the output matrix S. the algorithm can use at most N processors. Each task has one or more adjacent rows of I and is responsible for performing computation on those rows. parallel Floyd’s algorithm 1 : parallel Floyd’s algorithm 1 Parallel version of Floyd's algorithm based on a one-dimensional decomposition of the I matrix. In (a), the data allocated to a single task are shaded: a contiguous block of rows. In (b), the data required by this task in the k th step of the algorithm are shaded: its own block and the k th row. parallel Floyd’s algorithm 2 : parallel Floyd’s algorithm 2 Parallel Floyd’s algorithm 2 An alternative parallel version of Floyd's algorithm uses a two-dimensional decomposition of the various matrices. This version allows the use of up to N2 processors parallel Floyd’s algorithm 2 : parallel Floyd’s algorithm 2 Parallel Floyd 2 Parallel version of Floyd's algorithm based on a two-dimensional decomposition of the I matrix. In (a), the data allocated to a single task are shaded: a contiguous submatrix. In (b), the data required by this task in the k th step of the algorithm are shaded: its own block, and part of the k th row and column. : Thank you for attention