Presentation Transcript

Tornado: Maximizing Locality and Concurrency in an SMMP (shared-memory multiprocessor) OS

Contents:

- Types of Locality
- Locality: A closer look
- Requirements for Locality
- Design Basics for Tornado
- Test Results
- Conclusion

Types of Locality*:

- Temporal locality: “The concept that a resource that is referenced at one point in time will be referenced again sometime in the near future.”
- Spatial locality: “The concept that the likelihood of referencing a resource is higher if a resource near it has been referenced.”
- Sequential locality: “The concept that memory is accessed sequentially.”

*Source: Wikipedia
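To make spatial and sequential locality concrete, here is a minimal sketch that sums the same matrix in two traversal orders. The `Matrix` type and both function names are hypothetical and not part of the presentation. The row-major walk touches consecutive addresses, while the column-major walk jumps by a full row on every access, so on typical hardware the first order tends to use each fetched cache line more fully.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a matrix stored row by row in one contiguous vector.
struct Matrix {
    std::size_t rows, cols;
    std::vector<long> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0) {}
    long& at(std::size_t r, std::size_t c) {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }
};

// Sequential walk: consecutive iterations touch neighbouring addresses.
long sum_row_major(const Matrix& m) {
    long total = 0;
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            total += m.data[r * m.cols + c];
    return total;
}

// Strided walk: consecutive iterations are `cols` elements apart.
long sum_column_major(const Matrix& m) {
    long total = 0;
    for (std::size_t c = 0; c < m.cols; ++c)
        for (std::size_t r = 0; r < m.rows; ++r)
            total += m.data[r * m.cols + c];
    return total;
}
```

Both functions return the same sum; only the order of memory accesses differs.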

Locality: A closer look, Read only case:

Code:

    bool x = true;
    while (x) {
        // Do some work,
        // reading but not
        // writing x…
    }

Diagram: Processor # 1 and Processor # 2 each have their own cache. Initially x resides only in memory. After the first reads, each cache holds a copy of x alongside the copy in memory.

Notes:
- After the initial loads there are no accesses on the bus, because the reads are satisfied in the local caches and no invalidations are sent.

Locality: A closer look, Read/Write case:

Processor # 1 and Processor # 2 each run:

    bool x = true;
    while (x) {
        x = false;
        // Do other
        // work…
    }

Diagram: x starts in memory. After the first reads, both caches and memory hold a copy of x. When one processor writes x, the block containing x is invalidated in the other cache. The other processor's next access then goes through these steps:

1. Cache miss
2. Read request
3. Data
4. Write
5. Invalidate block containing x

Notes:
- x becomes a bottleneck: the valid copy keeps jumping from one cache to the other.
- Every write access causes an invalidation.
- Almost every read causes a read miss and a bus read.
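The cost of write sharing can be illustrated with a minimal sketch, assuming a simple counting workload that is not part of the presentation. In `count_shared` every increment writes the same variable, so the cache line holding it keeps moving between caches. In `count_local` each thread works on a private variable and writes the shared one only once. Both function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Write sharing: every increment from every thread hits the same variable.
std::uint64_t count_shared(unsigned threads, std::uint64_t iters) {
    assert(threads > 0);
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&total, iters] {
            for (std::uint64_t i = 0; i < iters; ++i)
                total.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    return total.load();
}

// Reduced sharing: each thread counts privately and publishes once.
std::uint64_t count_local(unsigned threads, std::uint64_t iters) {
    assert(threads > 0);
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&total, iters] {
            std::uint64_t mine = 0;  // private to this thread
            for (std::uint64_t i = 0; i < iters; ++i) ++mine;
            total.fetch_add(mine, std::memory_order_relaxed);  // single shared write
        });
    }
    for (auto& th : pool) th.join();
    return total.load();
}
```

Both versions compute the same total. The second simply performs one shared write per thread instead of one per increment, which is the kind of restructuring the notes above motivate.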

Locality: A closer look, Effect of Cache Line Length:

Processor # 1 runs:

    bool x = true;
    while (x) {
        x = false;
        // Do other
        // work…
    }

Processor # 2 runs:

    bool y = true;
    while (y) {
        y = false;
        // Do other
        // work…
    }

Diagram: x and y sit at adjacent addresses in memory (0x0 and 0x4) and are loaded into the caches together as one block (x,y).

Notes:
- x & y have different addresses but fall into the same cache line (block)!
- Reads don't cause any problem.
- Remember: invalidations are per cache line/block, not per word! A write by either processor invalidates the block containing x & y, so we have pretty much the same behavior as the read/write case on a single variable.
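One common way to avoid this false sharing is to align each independently written variable to its own cache line. The sketch below assumes a 64-byte line, which is typical but not universal. The struct names, `kAssumedLineSize`, and `same_line` are hypothetical and used only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. Check the target hardware before relying on it.
constexpr std::size_t kAssumedLineSize = 64;

// x and y are adjacent and, thanks to the struct alignment, share one line.
struct alignas(kAssumedLineSize) SharedLine {
    std::atomic<bool> x{true};
    std::atomic<bool> y{true};
};

// Each member starts on its own line, so a write to x cannot invalidate y.
struct SeparateLines {
    alignas(kAssumedLineSize) std::atomic<bool> x{true};
    alignas(kAssumedLineSize) std::atomic<bool> y{true};
};

static_assert(sizeof(SeparateLines) >= 2 * kAssumedLineSize,
              "each member should occupy its own line");

// True if both addresses fall into the same assumed cache line.
inline bool same_line(const void* a, const void* b) {
    auto line = [](const void* p) {
        return reinterpret_cast<std::uintptr_t>(p) / kAssumedLineSize;
    };
    return line(a) == line(b);
}
```

The padded layout trades memory for isolation: `SeparateLines` is much larger than `SharedLine`, but a processor writing `x` no longer invalidates the line that another processor uses for `y`.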

Requirements for Locality:

- Spatial and temporal locality
- Minimize read/write and write sharing
- Minimize false sharing
- Minimize the distance between the accessing processor and the target memory module

Design Basics for Tornado:

- Individual resources are individual objects
- Clustered objects
- Protected procedure calls (PPC)
- Semi-automatic garbage collection

Clustered Objects:

- A clustered object appears as a single object from the outside but is internally split into representatives (reps)
- Each rep handles requests from one or more processors
- This design has several advantages, among them high data locality and a diminished need for locking
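The idea can be sketched with a counter that looks like one object to its callers but keeps one rep per processor. This is an illustrative sketch only, not Tornado's implementation: the class name, the one-rep-per-processor mapping, the explicit `cpu` argument, and the assumed 64-byte line size are all hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: 64-byte cache lines, so each rep is padded to its own line.
constexpr std::size_t kAssumedLineSize = 64;

class ClusteredCounter {
public:
    explicit ClusteredCounter(std::size_t processors) : reps_(processors) {
        assert(processors > 0);
    }

    // Common case: touch only the calling processor's rep.
    void inc(std::size_t cpu) {
        rep_for(cpu).count.fetch_add(1, std::memory_order_relaxed);
    }

    // Rare case: combine all reps to produce the object's overall value.
    long value() const {
        long total = 0;
        for (const Rep& r : reps_) total += r.count.load(std::memory_order_relaxed);
        return total;
    }

    // Value held by a single rep, exposed here only for inspection.
    long rep_value(std::size_t cpu) const {
        assert(cpu < reps_.size());
        return reps_[cpu].count.load(std::memory_order_relaxed);
    }

private:
    struct alignas(kAssumedLineSize) Rep {
        std::atomic<long> count{0};
    };

    Rep& rep_for(std::size_t cpu) {
        assert(cpu < reps_.size());
        return reps_[cpu];
    }

    std::vector<Rep> reps_;
};
```

Callers see a single `ClusteredCounter`, while increments from different processors land on different cache lines. Only the less frequent `value()` call has to visit every rep.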

Clustered Objects (cont.):

- Per-processor translation tables
- Partitioned global translation table
- Default “miss” handlers
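How these three pieces might fit together is sketched below, under loud simplifying assumptions: the global table is a single unpartitioned map, every processor gets its own rep, and nothing is thread-safe. All names (`TranslationTables`, `Rep`, `RepFactory`, `register_object`, `lookup`) are hypothetical and do not come from Tornado.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical rep: records which processor and object it serves.
struct Rep {
    std::size_t cpu;
    int object_id;
};

// Hypothetical factory that builds a rep for a given processor.
using RepFactory = std::function<std::shared_ptr<Rep>(std::size_t cpu)>;

class TranslationTables {
public:
    explicit TranslationTables(std::size_t processors) : local_(processors) {}

    // Record in the global table how to create reps for this object.
    void register_object(int id, RepFactory factory) {
        global_[id] = std::move(factory);
    }

    // Fast path: hit in the processor's own table. Otherwise run the miss handler.
    std::shared_ptr<Rep> lookup(std::size_t cpu, int id) {
        assert(cpu < local_.size());
        auto& table = local_[cpu];
        auto hit = table.find(id);
        if (hit != table.end()) return hit->second;
        return handle_miss(cpu, id);
    }

    std::size_t misses() const { return misses_; }

private:
    // Default miss handler: consult the global table, create a rep for this
    // processor, and install it so later lookups hit locally.
    std::shared_ptr<Rep> handle_miss(std::size_t cpu, int id) {
        ++misses_;
        auto entry = global_.find(id);
        if (entry == global_.end()) return nullptr;  // unknown object
        std::shared_ptr<Rep> rep = entry->second(cpu);
        local_[cpu][id] = rep;
        return rep;
    }

    std::vector<std::unordered_map<int, std::shared_ptr<Rep>>> local_;
    std::unordered_map<int, RepFactory> global_;
    std::size_t misses_ = 0;
};
```

Because reps are created on the first miss, a processor that never touches an object never pays for a rep of it, and repeated accesses stay within the processor's own table.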

Protected Procedure Calls:

- Microkernel: relies on servers to carry out part of the OS's job
- As many server threads as there are clients
- A request is handled on the same processor where it was issued

*Image source: Wikipedia

Garbage Collection:

- Semi-automatic
- Distinguishes between temporary and persistent references to objects
- Removes the need for a second lock whose only purpose is to guarantee an object's existence, and removes locking altogether for read-only data

Test Results: Effect of rep Count (1):

[Figure not included in transcript]

Test Results: Effect of rep Count (2):

[Figure not included in transcript]

Test Results: Effect of Cache Associativity:

[Figure not included in transcript]

Test Results: Tornado vs. Commercial OSes:

[Figure not included in transcript]

Conclusion:

- Tornado performs much better than many commercial OSes
- The concept of clustered objects gives it a lot of advantages:
  - High locality of data
  - Diminished need for locking
  - Higher degree of sharing, concurrency and modularity