P105 017 Gallard
Added: October 29, 2007

Presentation Transcript

Prediction Model for Evaluation of Reconfigurable Interconnects in Distributed Shared-Memory Systems:
Wim Heirman
Ghent University, Belgium

Outline:
Introduction
Reconfigurable Optical Networks
Prediction Model
Results
Future work & conclusions

Architecture of a distributed shared-memory system:
Nodes: processor, caches, main memory, network interface
Interconnection network: packet switched
‘Remote’ memory access: handled by the network interfaces, requires use of the interconnection network
[Diagram: two nodes, each with CPU, cache, MEM and Net IF, joined by the interconnection network]

Interconnect requirements:
Network latency is a major bottleneck: instruction (0.5 ns) << local memory access (50 ns) << remote memory access (500 ns)
Non-uniform network traffic in space and time => Reconfigurable network?

Reconfigurable Optical Networks:
WDM (wavelength division multiplexing)
Tunable lasers / detectors
Passive star coupler (PSC)
Photonic crystal components (crossbar). Source: D. Prather, University of Delaware

Reconfiguration in shared-memory machines:
Reconfiguration speed: up to 1 ms
One memory access: < 1 µs
Locality needed in address streams! (Traffic Temporal Analysis for Reconfigurable Interconnects in Shared-Memory Systems, W. Heirman et al., Reconfigurable Architectures Workshop, April 4-5, 2005, Denver, CO)
[Diagram: traffic ‘burst’ between CPU/MEM nodes over time]
[Diagram: CPU/MEM nodes joined by a base network (fixed) plus extra links (reconfigurable)]
Requirement: Reconfiguration time << reconfiguration interval << burst duration

Evaluating network performance:
Full-system simulations are needed:
  Current statistical traffic models don’t exhibit the ‘bursty behavior’ exploited here
  ‘Application speedup’ cannot be derived from network performance alone
  The simulation needs to model tens of processors, caches, and the interconnection network
  Different benchmarks
Evaluating just one set of network parameters takes hours of simulations… How can we do this faster?
Derive performance for several sets of network parameters from one simulation!

Predicting network performance:
One full-system simulation -> network packets, memory accesses -> our prediction model (applied for each parameter set)
Estimate extra link placement. Parameters: reconfiguration interval (Δt), number of extra links (n), link placement algorithm
[Diagram: extra link placement over time, for Δt = 1, n = 2 and for Δt = 2, n = 4]
Estimate new memory access latency for each transaction
[Diagram: example transactions labeled no change, reduced access time, no change (!)]
Predict application speedup:
  Original execution time = computation time (constant) + unchanged fraction of memory latency + reduced fraction of memory latency
  New execution time = computation time (constant) + unchanged fraction of memory latency + (reduced fraction of memory latency) / 2.13
  Application speedup = original execution time / new execution time

Results:

Assumptions:
Access latency is not hidden by out-of-order execution
Average reduction factor is used for all improved memory accesses (2.13 for 4x4 torus network)
Memory accesses require only 2 nodes
Computation time remains constant
Congestion is not modeled
Any combination of extra links can be made
Extra links are not used as part of a path

Results: application variability:
Correlation between computation time variability and prediction error is high; this could explain the larger errors in some benchmarks

Results: different parameters:
FFT benchmark, results for different reconfiguration intervals and numbers of extra links: good relative prediction

Future work:
Access latency is not hidden by out-of-order execution
Average reduction factor is used for all improved memory accesses (2.13 for 4x4 torus network)
Memory accesses require only 2 nodes
Computation time remains constant
Congestion is not modeled
Any combination of extra links can be made
Extra links are not used as part of a path

Conclusions:
Using our technique, good predictions can be made using much less time-consuming simulations
Good relative accuracy over a range of parameters allows for quick design-space exploration
Further refinements can be made by including application variability and congestion
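
The prediction steps named in the transcript (place extra links per reconfiguration interval, reduce the latency of the accesses that benefit, then recompute execution time) can be illustrated with a minimal Python sketch. It is not the author's implementation. The names `Access`, `place_links` and `predict_speedup` are hypothetical, the placement rule (give the links for one interval to the busiest node pairs of the previous interval) is an assumption because the slides do not specify the algorithm, and all latencies are simply summed into one timeline, in line with the stated assumptions that latency is not hidden and congestion is not modeled. Only the reduction factor 2.13 comes from the slides.

```python
from collections import Counter
from dataclasses import dataclass

REDUCTION_FACTOR = 2.13  # average factor quoted on the slides for a 4x4 torus


@dataclass(frozen=True)
class Access:
    time: float     # issue time of the remote memory access
    src: int        # requesting node
    dst: int        # node that owns the memory
    latency: float  # latency measured in the baseline simulation


def place_links(accesses, interval, n_links):
    """Return {interval index: set of node pairs that get an extra link}.

    Assumption: links for interval k go to the n_links busiest pairs
    observed in interval k - 1; the slides leave the algorithm open.
    """
    traffic = {}
    for a in accesses:
        k = int(a.time // interval)
        traffic.setdefault(k, Counter())[frozenset((a.src, a.dst))] += 1
    return {
        k + 1: {pair for pair, _ in counts.most_common(n_links)}
        for k, counts in traffic.items()
    }


def predict_speedup(accesses, computation_time, interval, n_links,
                    factor=REDUCTION_FACTOR):
    """Predict application speedup from one baseline trace."""
    links = place_links(accesses, interval, n_links)
    unchanged = reduced = 0.0
    for a in accesses:
        k = int(a.time // interval)
        if frozenset((a.src, a.dst)) in links.get(k, set()):
            reduced += a.latency    # access travels over an extra link
        else:
            unchanged += a.latency  # access keeps its baseline latency
    original = computation_time + unchanged + reduced
    new = computation_time + unchanged + reduced / factor
    return original / new


# Made-up trace: times in ms, latencies in ns-like units, for illustration only.
demo = [
    Access(0.2, 0, 1, 500.0),
    Access(0.4, 0, 1, 500.0),
    Access(0.6, 2, 3, 500.0),
    Access(1.1, 0, 1, 500.0),
    Access(1.3, 2, 3, 500.0),
]
print(f"predicted speedup: {predict_speedup(demo, 1000.0, 1.0, 1):.3f}")
```

Because the trace is collected once, calling `predict_speedup` again with a different `interval` or `n_links` reuses the same data, which is the point of deriving several parameter sets from one simulation.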