PAPI
Uploaded by Haggrid. Added: September 19, 2007. Category: News & Reports. License: All Rights Reserved.

PAPI 3.0.8.1 on Blue Gene/L
Using network performance counters to lay out tasks for improved performance

Presentation overview
- Project objectives
- PAPI explanation
- Blue Gene/L explanation
- Current state of research

Project objectives
- Upgrade PAPI on BG/L
- Provide an interface for the network counters
- Give Lawrence Livermore National Lab users access to PAPI as well
- Use network counters to place tasks optimally on BG/L

PAPI – Intro
(Figure courtesy of http://icl.cs.utk.edu/papi/)

PAPI – Intro
- PAPI is useful for profiling your own programs.
- Many tools are built on PAPI:
  - PapiEx: command-line measurement tool
  - PerfSuite: aggregate measurement and statistical profiling package and API
  - HPCToolkit: statistical profiling package
  - Many more!
PAPI – Supported platforms
- IBM: POWER3, 604, 604e, POWER4
- Cray: T3E, X1
- AMD: Athlon, Opteron
- Intel: P1 to P4, Itanium I and II
- Sun: UltraSPARC I, II & III
- MIPS: R10K, R12K, R14K
- Alpha

PAPI – Generic interface
Call sequence for the generic interface:
1. PAPI_library_init: initialize memory for PAPI's data structures
2. PAPI_create_eventset: create an empty list of events
3. PAPI_add_event: add events to be counted
4. PAPI_start: begin counting all events in the specified eventset
5. PAPI_stop: stop all counters and read their current values

PAPI – Events: Presets
- Presets: a list of predefined events, implemented on every system where they can be supported
- Not all presets are available on every architecture (e.g., BG/L exposes no cache events below L3, so the L1 cache hit preset is not applicable)
- Native events form the basic building blocks for PAPI presets

PAPI – Events: Presets
(Figure courtesy of http://icl.cs.utk.edu/papi/)

PAPI – Events: Native
- In addition to the predefined preset events, the PAPI library exposes most of the events native to each platform
- Native events can be added to eventsets in the same way as presets

PAPI – Internals
- The main internal data structure is an array of eventsets

PAPI – Other features
- Multiplexing: count more events than there are hardware counters
- Thread safety: profiling is thread-safe
- Overflow detection: hardware counters have limited width

PAPI – PAPI2 vs PAPI3
- PAPI3 significantly reduced the overhead of starting, stopping, and reading the counters
(Figure courtesy of http://icl.cs.utk.edu/papi/)

PAPI – PAPI2 vs PAPI3
Improvements in PAPI3:
- Better native event support
- Better thread support
- Overflow and profiling enhancements
- Myriad bug fixes and code cleanup

PAPI – PAPI2 vs PAPI3
- Overlapping eventsets were supported in PAPI2 but are not in PAPI3
- Minor changes in the API, mostly in how variables are dereferenced

Blue Gene/L – Intro
- 65,536 nodes connected in a 64 x 32 x 32 3D torus
- Nodes built from PowerPC 440 embedded processors
- Physically smaller than most supercomputers
- Consumes less power

Blue Gene/L – Networks
- 3D torus network (node-to-node communication)
- Tree network (broadcasts)

Blue Gene/L – HW counters
- 48 universal performance counters
- 4 floating-point unit counters
- Counters are 32 bits wide, so virtual counters must be used to prevent overflow

Research – Overall goals
- Network hardware counters are new
- Use network counters to determine the traffic between tasks
- Optimize the placement of tasks to minimize communication latency
- Given counts and distances: cost = counts * distance, totaled over all nodes; minimize this total

Research – Counting
- The first goal is to determine what is actually being counted

Research – Networks
- For each MPI call, determine which network counters are used
- The tree is supposed to be used for broadcasts
- The torus is supposed to be used for point-to-point communication
- The specification contains ambiguities

Research – Future decisions
How to profile a target application:
- Manually insert PAPI instrumentation: a lot of work
- Instrument binaries with counting code
What information to store:
- All counts on each node: a lot of data
- A sample of all nodes: less accurate (what if the tasks behave or communicate differently?)

Research – Future decisions
How to use the collected information:
- Profile an application to obtain counter feedback and determine an optimized static task layout
- Dynamically migrate tasks in response to counter values