P105 017
Added: October 29, 2007
Presentation Transcript

Prediction Model for Evaluation of Reconfigurable Interconnects in Distributed Shared-Memory Systems
Wim Heirman, Ghent University, Belgium


Outline
- Introduction
- Reconfigurable Optical Networks
- Prediction Model
- Results
- Future work & conclusions


Architecture of a distributed shared-memory system
- Nodes: processor, caches, main memory, network interface
- Interconnection network: packet switched


Architecture of a distributed shared-memory system
- 'Remote' memory access: handled by the network interfaces, requires use of the interconnection network
[Figure: two nodes, each with CPU, cache, memory (MEM) and network interface (Net IF), connected by the interconnection network]


Interconnect requirements
- Network latency is a major bottleneck: instruction (0.5 ns) << local memory access (50 ns) << remote memory access (500 ns)
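To make the latency gap concrete, here is a minimal sketch using the three figures quoted on the slide. The function names and the idea of a single "remote fraction" are illustrative assumptions, not part of the presentation.

```python
# Latencies quoted on the slide, in nanoseconds.
INSTRUCTION_NS = 0.5
LOCAL_ACCESS_NS = 50.0
REMOTE_ACCESS_NS = 500.0


def average_access_ns(remote_fraction: float) -> float:
    """Mean memory access latency for a given fraction of remote accesses.

    remote_fraction is a hypothetical workload parameter (0 = all local,
    1 = all remote); the slides do not give such a figure.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be in [0, 1]")
    local_part = (1.0 - remote_fraction) * LOCAL_ACCESS_NS
    remote_part = remote_fraction * REMOTE_ACCESS_NS
    return local_part + remote_part


def instructions_per_access(latency_ns: float) -> float:
    """How many instructions could have executed during one access."""
    return latency_ns / INSTRUCTION_NS
```

With these numbers, one remote access costs as much time as `instructions_per_access(REMOTE_ACCESS_NS)` instructions, which is why even a small remote fraction dominates the average.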


Interconnect requirements
- Non-uniform network traffic in space and time => Reconfigurable network?




Reconfigurable Optical Networks
- WDM (wavelength division multiplexing)
- Tunable lasers / detectors
- Passive star coupler (PSC)




Reconfigurable Optical Networks
- Photonic crystal components (crossbar)
Source: D. Prather, University of Delaware


Reconfiguration in shared-memory machines
- Reconfiguration time: up to 1 ms
- One memory access: < 1 µs
- Locality needed in address streams!
(Traffic Temporal Analysis for Reconfigurable Interconnects in Shared-Memory Systems, W. Heirman et al., Reconfigurable Architectures Workshop, April 4-5, 2005, Denver, CO)


Reconfiguration in shared-memory machines
[Figure: traffic between CPU/MEM nodes plotted against time, showing a traffic 'burst']


Reconfiguration in shared-memory machines
[Figure: nine CPU/MEM nodes connected by a base network (fixed) and by extra links (reconfigurable)]


Reconfiguration in shared-memory machines
- Requirement: reconfiguration time << reconfiguration interval << burst duration
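The requirement can be read as a simple feasibility check. The sketch below is an illustration only: the function name and the `margin` parameter (how many times smaller counts as "<<") are assumptions, since the slides do not quantify "<<".

```python
def reconfiguration_is_worthwhile(reconfig_time_s: float,
                                  interval_s: float,
                                  burst_s: float,
                                  margin: float = 10.0) -> bool:
    """Check reconfiguration time << reconfiguration interval << burst duration.

    '<<' is interpreted as 'at least `margin` times smaller', which is an
    assumption made for this sketch. All durations are in seconds.
    """
    if min(reconfig_time_s, interval_s, burst_s) <= 0 or margin <= 0:
        raise ValueError("durations and margin must be positive")
    time_fits_interval = reconfig_time_s * margin <= interval_s
    interval_fits_burst = interval_s * margin <= burst_s
    return time_fits_interval and interval_fits_burst
```

For example, a 1 ms reconfiguration time only pays off under this reading if traffic bursts last well beyond the reconfiguration interval, which is itself well beyond 1 ms.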


Evaluating network performance
Full-system simulations are needed:
- Current statistical traffic models don't exhibit the 'bursty behavior' exploited here
- 'Application speedup' cannot be derived from network performance alone
- The simulation needs to model tens of processors, caches, and the interconnection network
- Different benchmarks


Evaluating network performance
- Evaluating just one set of network parameters takes hours of simulations…
- How can we do this faster?
- Derive performance for several sets of network parameters from one simulation!




Predicting network performance
- One full-system simulation yields a trace of network packets and memory accesses
- Our prediction model is then applied to this trace for each parameter set


Predicting network performance
Estimate extra link placement:
- Parameters: reconfiguration interval (delta t), number of extra links (n), link placement algorithm
[Figure: time axis showing two example parameter sets, delta t = 1 with n = 2 and delta t = 2 with n = 4]
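The slide names the parameters but not the placement algorithm itself. As a minimal sketch, the code below assumes a greedy policy: the n busiest node pairs of one interval receive the extra links for the next interval. The trace format `(time, src, dst)`, the function name, and this policy are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def place_extra_links(packets, delta_t, n):
    """Return {interval_index: set of node pairs holding an extra link}.

    packets : iterable of (time, src, dst) tuples (hypothetical trace format)
    delta_t : reconfiguration interval, same time unit as the trace
    n       : number of extra links available
    Assumed policy: links for interval k+1 go to the n pairs that
    exchanged the most packets during interval k.
    """
    traffic = defaultdict(Counter)
    for time, src, dst in packets:
        if src != dst:  # local accesses never use the network
            interval = int(time // delta_t)
            traffic[interval][frozenset((src, dst))] += 1

    placement = {}
    for interval, counts in traffic.items():
        busiest = [pair for pair, _ in counts.most_common(n)]
        placement[interval + 1] = set(busiest)
    return placement
```

Pairs are stored as `frozenset` so that traffic from node a to b and from b to a counts toward the same link. Running the function once per (delta t, n) combination on the same trace is what makes a single simulation reusable.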


Predicting network performance
Estimate new memory access latency for each transaction:
- No change
- Reduced access time
- No change(!)
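A per-transaction estimate along these lines could look like the sketch below. It assumes that a transaction benefits only when an extra link connects its two nodes during the interval in which it occurs, and that the benefit is the average reduction factor of 2.13 quoted in the assumptions for a 4x4 torus. The function signature and the `placement` mapping (interval index to a set of `frozenset` node pairs) are hypothetical.

```python
REDUCTION_FACTOR = 2.13  # average factor quoted in the slides for a 4x4 torus


def predicted_latency(time, src, dst, latency, placement, delta_t,
                      factor=REDUCTION_FACTOR):
    """Estimate the latency of one memory transaction with extra links.

    placement maps an interval index to the set of node pairs
    (frozensets) that hold an extra link during that interval.
    """
    links = placement.get(int(time // delta_t), set())
    if src != dst and frozenset((src, dst)) in links:
        return latency / factor  # reduced access time
    return latency               # local access or no extra link: no change
```

Summing the results over the whole trace separates memory latency into a part that is unchanged and a part that is reduced.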


Predicting network performance
Predict application speedup:
- Original execution time = computation time + memory latency
- New execution time = computation time (constant) + unchanged fraction of memory latency + reduced fraction of memory latency (divided by 2.13)
- Application speedup = original execution time / new execution time
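The speedup calculation can be written out as a short function. This is a sketch of the arithmetic as read from the slide, assuming execution time splits cleanly into computation time, unchanged memory latency, and memory latency that benefits from extra links; the function and parameter names are illustrative.

```python
def predicted_speedup(computation, unchanged_latency, reduced_latency,
                      factor=2.13):
    """Predicted application speedup from the three time components.

    computation       : time spent computing (assumed constant)
    unchanged_latency : memory latency not helped by extra links
    reduced_latency   : memory latency (before reduction) that is helped
    factor            : average latency reduction factor (2.13 in the slides)
    All times share one unit; only their ratio matters.
    """
    original = computation + unchanged_latency + reduced_latency
    if original <= 0 or factor <= 0:
        raise ValueError("total time and factor must be positive")
    new = computation + unchanged_latency + reduced_latency / factor
    return original / new
```

The result can never exceed `factor`, and it approaches 1 as computation time or unchanged latency comes to dominate.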




Results


Assumptions
- Access latency is not hidden by out-of-order execution
- Average reduction factor is used for all improved memory accesses (2.13 for 4x4 torus network)
- Memory accesses require only 2 nodes
- Computation time remains constant
- Congestion is not modeled
- Any combination of extra links can be made
- Extra links are not used as part of a path
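The slides do not say where the factor 2.13 comes from. One observation: the average hop count between two distinct nodes of a 4x4 torus is 32/15, or about 2.13, so the factor is consistent with replacing an average multi-hop path by a single-hop extra link. That reading is an assumption; the sketch below only verifies the hop-count arithmetic.

```python
from itertools import product


def torus_distance(a, b, size):
    """Minimal hop count between nodes a and b on a size x size torus."""
    return sum(min(abs(x - y), size - abs(x - y)) for x, y in zip(a, b))


def average_hops(size=4):
    """Average hop count over all ordered pairs of distinct nodes."""
    nodes = list(product(range(size), repeat=2))
    total = sum(torus_distance(a, b, size)
                for a in nodes for b in nodes if a != b)
    return total / (len(nodes) * (len(nodes) - 1))
```

Calling `average_hops(4)` gives the 4x4 value; other sizes can be tried the same way if a different base network is of interest.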


Results: application variability
- Correlation between computation time variability and prediction error is high; this could explain the larger errors in some benchmarks


Results: different parameters
- FFT benchmark, results for different reconfiguration intervals and numbers of extra links: good relative prediction




Future work
Revisit the assumptions of the model:
- Access latency is not hidden by out-of-order execution
- Average reduction factor is used for all improved memory accesses (2.13 for 4x4 torus network)
- Memory accesses require only 2 nodes
- Computation time remains constant
- Congestion is not modeled
- Any combination of extra links can be made
- Extra links are not used as part of a path


Conclusions
- Using our technique, good predictions can be made using much less time-consuming simulations
- Good relative accuracy over a range of parameters allows for quick design-space exploration
- Further refinements can be made by including application variability and congestion