Prediction Model for Evaluation of Reconfigurable Interconnects in Distributed Shared-Memory Systems: 

Wim Heirman, Ghent University, Belgium

Outline: 

Introduction
Reconfigurable Optical Networks
Prediction Model
Results
Future work & conclusions

Architecture of a distributed shared-memory system: 

Nodes: processor, caches, main memory, network interface
Interconnection network: packet switched

Architecture of a distributed shared-memory system: 

‘Remote’ memory access is handled by the network interfaces and requires use of the interconnection network.
[Figure: two nodes, each with a CPU, cache, memory (MEM), and network interface (Net IF), attached to the interconnection network]

Interconnect requirements: 

Network latency is a major bottleneck:
instruction (0.5 ns) << local memory access (50 ns) << remote memory access (500 ns)
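To make the latency gap concrete, the sketch below computes the mean memory access latency for a given share of remote accesses, using the three latencies quoted on the slide. The `remote_fraction` parameter and both function names are illustrative assumptions, not values or names from the talk.

```python
# Latencies quoted on the slide, in nanoseconds.
INSTRUCTION_NS = 0.5
LOCAL_ACCESS_NS = 50.0
REMOTE_ACCESS_NS = 500.0


def average_access_latency_ns(remote_fraction: float) -> float:
    """Mean memory access latency for a given share of remote accesses.

    remote_fraction is a hypothetical workload parameter (0.0 to 1.0).
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    local_part = (1.0 - remote_fraction) * LOCAL_ACCESS_NS
    remote_part = remote_fraction * REMOTE_ACCESS_NS
    return local_part + remote_part


def instructions_per_access(latency_ns: float) -> float:
    """Number of 0.5 ns instructions that fit in one access of this latency."""
    return latency_ns / INSTRUCTION_NS
```

With these numbers, one remote access costs as much time as 1000 instructions, so even a small share of remote accesses dominates the average.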

Interconnect requirements: 

Non-uniform network traffic in space and time => Reconfigurable network?

Reconfigurable Optical Networks: 

WDM (wavelength division multiplexing)
Tunable lasers / detectors
Passive star coupler (PSC)

Reconfigurable Optical Networks: 

Photonic crystal components (crossbar)
Source: D. Prather, University of Delaware

Reconfiguration in shared-memory machines: 

Reconfiguration speed: up to 1 ms
One memory access: < 1 µs
Locality needed in address streams!
(W. Heirman et al., "Traffic Temporal Analysis for Reconfigurable Interconnects in Shared-Memory Systems", Reconfigurable Architectures Workshop, April 4-5, 2005, Denver, CO)

Reconfiguration in shared-memory machines: 

[Figure: CPU/MEM nodes and a time axis showing a traffic ‘burst’]

Reconfiguration in shared-memory machines: 

[Figure: CPU/MEM nodes connected by a base network (fixed), with extra links (reconfigurable) added on top]

Reconfiguration in shared-memory machines: 

Requirement: reconfiguration time << reconfiguration interval << burst duration
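The requirement can be checked mechanically once "<<" is given a concrete meaning. The sketch below treats "<<" as "smaller by at least a factor `margin`"; the default factor of 10 and the function name are assumptions made for illustration, since the talk does not quantify the relation.

```python
def satisfies_timing_requirement(
    reconfig_time: float,
    reconfig_interval: float,
    burst_duration: float,
    margin: float = 10.0,
) -> bool:
    """Check reconfig_time << reconfig_interval << burst_duration.

    "<<" is interpreted as "smaller by at least a factor `margin`".
    The default margin of 10 is an illustrative assumption.
    All three durations must be given in the same time unit.
    """
    if margin <= 1.0:
        raise ValueError("margin must be greater than 1")
    return (
        reconfig_time * margin <= reconfig_interval
        and reconfig_interval * margin <= burst_duration
    )
```

For example, with a reconfiguration time of 1 ms, an interval of 10 ms and bursts of 100 ms (hypothetical values), the check passes; shortening the bursts to 50 ms makes it fail.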

Evaluating network performance: 

Full-system simulations are needed:
  Current statistical traffic models don’t exhibit the ‘bursty behavior’ exploited here
  ‘Application speedup’ cannot be derived from network performance alone
  The simulation needs to model tens of processors, caches, and the interconnection network
  Different benchmarks

Evaluating network performance: 

Evaluating just one set of network parameters takes hours of simulations…
How can we do this faster?
Derive performance for several sets of network parameters from one simulation!

Predicting network performance: 

One full-system simulation => network packets, memory accesses => our prediction model (applied for each parameter set)

Predicting network performance: 

Estimate extra link placement:
Parameters: reconfiguration interval (delta t), number of extra links (n), link placement algorithm
[Figure: time axis with two example settings, delta t = 1, n = 2 and delta t = 2, n = 4]
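A minimal sketch of this step is given below, assuming a simple placement heuristic: the extra links active in interval k go to the n node pairs that exchanged the most packets in interval k - 1. The talk only names "link placement algorithm" as a parameter, so this heuristic, the trace format, and all identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Packet = Tuple[float, int, int]  # (timestamp, source node, destination node)
Link = Tuple[int, int]           # unordered node pair, stored as (low, high)


def estimate_link_placement(
    packets: Iterable[Packet], delta_t: float, n: int
) -> Dict[int, List[Link]]:
    """Return, per reconfiguration interval, the node pairs given an extra link.

    Assumed heuristic: interval k uses the n busiest pairs of interval k - 1.
    Intervals without traffic history are absent from the result.
    """
    if delta_t <= 0:
        raise ValueError("delta_t must be positive")
    if n < 0:
        raise ValueError("n must be non-negative")

    # Count packets per node pair within each interval of length delta_t.
    traffic: Dict[int, Counter] = {}
    for timestamp, src, dst in packets:
        if src == dst:
            continue  # a node talking to itself generates no network traffic
        interval = int(timestamp // delta_t)
        pair = (min(src, dst), max(src, dst))
        traffic.setdefault(interval, Counter())[pair] += 1

    placement: Dict[int, List[Link]] = {}
    for interval, counts in traffic.items():
        # Busiest pairs first; ties are broken by pair for determinism.
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        placement[interval + 1] = [pair for pair, _ in ranked[:n]]
    return placement
```

Choosing the top n pairs without further constraints relies on the assumption, listed later in the talk, that any combination of extra links can be made.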

Predicting network performance: 

Estimate new memory access latency for each transaction:
[Figure: example transactions labeled ‘No change’, ‘Reduced access time’, and ‘No change(!)’]
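The per-transaction estimate can be sketched as follows, using two assumptions stated later in the talk: each memory access involves only two nodes, and every improved access is shortened by the same average factor (2.13 for a 4x4 torus). The transaction format, the `placement` mapping, and the function name are hypothetical.

```python
from typing import Collection, Iterable, List, Mapping, Tuple

REDUCTION_FACTOR = 2.13  # average factor quoted in the talk for a 4x4 torus

# (timestamp, requesting node, remote node, measured latency)
Transaction = Tuple[float, int, int, float]
Link = Tuple[int, int]  # unordered node pair, stored as (low, high)


def predict_latencies(
    transactions: Iterable[Transaction],
    placement: Mapping[int, Collection[Link]],
    delta_t: float,
    factor: float = REDUCTION_FACTOR,
) -> List[float]:
    """Predict the latency of each transaction under a given link placement.

    A transaction is sped up only if an extra link connects its two nodes
    during the reconfiguration interval in which it occurs.
    """
    if delta_t <= 0 or factor <= 0:
        raise ValueError("delta_t and factor must be positive")

    predicted: List[float] = []
    for timestamp, src, dst, latency in transactions:
        interval = int(timestamp // delta_t)
        pair = (min(src, dst), max(src, dst))
        if src != dst and pair in placement.get(interval, ()):
            predicted.append(latency / factor)  # reduced access time
        else:
            predicted.append(latency)           # no change
    return predicted
```

In this sketch the same node pair can be fast in one interval and slow in the next, because the extra link may have been moved elsewhere in the meantime.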

Predicting network performance: 

Predict application speedup:
[Figure: original and new execution time as stacked bars. Components: computation time (constant), unchanged fraction of memory latency, reduced fraction of memory latency (divided by 2.13). Labels: original execution time, new execution time, application speedup.]
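The arithmetic behind this step can be sketched as below. It assumes that speedup is defined as original execution time divided by new execution time, and that only the reduced fraction of memory latency shrinks, by the 2.13 factor quoted in the talk. The function and parameter names are illustrative.

```python
def predict_speedup(
    computation_time: float,
    unchanged_latency: float,
    reduced_latency: float,
    factor: float = 2.13,
) -> float:
    """Predict application speedup from the three execution-time components.

    computation_time  : time spent computing, assumed constant
    unchanged_latency : memory latency of accesses that are not improved
    reduced_latency   : memory latency of accesses that use an extra link
    factor            : average latency reduction for improved accesses
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if min(computation_time, unchanged_latency, reduced_latency) < 0:
        raise ValueError("time components must be non-negative")

    original = computation_time + unchanged_latency + reduced_latency
    if original == 0:
        raise ValueError("total execution time must be positive")
    new = computation_time + unchanged_latency + reduced_latency / factor
    return original / new
```

The two extremes bound the result: a run with no improved accesses yields a speedup of 1, and a run consisting only of improved accesses yields the full factor of 2.13.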

Assumptions: 

Access latency is not hidden by out-of-order execution
Average reduction factor is used for all improved memory accesses (2.13 for 4x4 torus network)
Memory accesses require only 2 nodes
Computation time remains constant
Congestion is not modeled
Any combination of extra links can be made
Extra links are not used as part of a path

Results: application variability: 

The correlation between computation time variability and prediction error is high; this could explain the larger errors in some benchmarks.

Results: different parameters: 

FFT benchmark, results for different reconfiguration intervals and numbers of extra links: good relative prediction

Future work: 

Access latency is not hidden by out-of-order execution
Average reduction factor is used for all improved memory accesses (2.13 for 4x4 torus network)
Memory accesses require only 2 nodes
Computation time remains constant
Congestion is not modeled
Any combination of extra links can be made
Extra links are not used as part of a path

Conclusions: 

Using our technique, good predictions can be made using much less time-consuming simulations
Good relative accuracy over a range of parameters allows for quick design-space exploration
Further refinements can be made by including application variability and congestion