planet 03

Views:
 
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Exercising DHash with Distributed Backup: 

Exercising DHash with Distributed Backup Robert Morris Frans Kaashoek, David Karger, Russ Cox, Frank Dabek, Emil Sit, Josh Cates, Jacob Strauss, James Robertson http://pdos.lcs.mit.edu/chord MIT Laboratory for Computer Science

Backup Characteristics: 

Backup Characteristics Full dump only once ever Then just incremental snapshots Dump reads a lot, to verify old blocks Data caching isn’t very useful Mostly bulk data movement Archived snapshots are on-line So interactive read also matters

What do we deserve?: 

What do we deserve? Throughput: Same as one TCP? N TCPs? Use window of RPCs to cover latency Lookup latency: very cheap w/ proximity Read latency (w/ k copies): max / 2k Write latency: max

Initial Performance: 

Initial Performance Bulk read: 50 kbytes/second Bulk write: 10 kbytes/second 90 nodes, Planetlab + RON 5 replicas of each block

What went wrong?: 

What went wrong? Forced to talk to the predecessor Fetch from replica closest to predecessor Replicas are large, so puts were slow Unified congestion window Mimiced single TCP increase/decrease Worst-case timeouts, no fast retransmit

Old Chord/DHash: 

Old Chord/DHash (key) predecessor, returns successor list replicas on successors 1. 4. 3. 2. originator

Solutions: 

Solutions 7-of-14 coding w/ IDA, not replication Maintain proximity to originator Avoid predecessor, or any particular node Fetch data from nodes close to originator Predict better RPC timeouts Key tool: synthetic coordinates

Vivaldi Synthetic Coordinates: 

Vivaldi Synthetic Coordinates Each node estimates its own position Position = (x,y): 'synthetic coordinates' x and y units are time (milliseconds) Distance predicts network latency Key point: predict w/o pinging first Like GNP, but no landmarks

Vivaldi synthetic coordinates: 

Vivaldi synthetic coordinates Each node starts with a random incorrect position 0,1 1,2 2,3 3,0

Vivaldi synthetic coordinates: 

Vivaldi synthetic coordinates Each node starts with a random incorrect position 2 ms 1 ms 2 ms 1 ms Each node 'pings' a few other nodes to measure network latency (distance) A B

Vivaldi synthetic coordinates: 

Vivaldi synthetic coordinates Each node starts with a random incorrect position Each node 'pings' a few other nodes to measure network latency (distance) 2 1 2 1 Each nodes 'moves' to cause measured distances to match coordinates

Vivaldi synthetic coordinates: 

Vivaldi synthetic coordinates Each node starts with a random incorrect position Each node 'pings' a few other nodes to measure network latency (distance) 2 1 2 1 Each nodes 'moves' to cause measured distance to match coordinates

Vivaldi in action: 

Vivaldi in action Execution on 86 PlanetLab Internet hosts Each host only pings a few other random hosts Most hosts find useful coordinates after a few dozen pings

Vivaldi predicts latency well: 

Vivaldi predicts latency well

New lookup+fetch: 

New lookup+fetch 1. 3. 2. API: lookup(k, m) returns first m successors of k Stops when successor list overlaps m C’s successor list A G F E D C B

Vivaldi helps decrease DHash fetch times: 

Vivaldi helps decrease DHash fetch times 77 PlanetLab + 18 RON nodes Fundamental limit about 100 ms?

DHash throughput: 

DHash throughput Berkeley Throughput (kbytes/second) Mazu NYU Australia

Conclusions: 

Conclusions Unclear what 'fair' throughput is vs TCP Main latency costs: Data is not likely to be close Last steps of lookup are expensive Need latency predictions to as-yet unknown nodes Synthetic coordinates are useful in many ways http://pdos.lcs.mit.edu/chord

Vivaldi vs. network coordinates: 

Vivaldi vs. network coordinates Vivaldi’s coordinates match geography well over-sea distances shrink (faster than over-land) orientation of Australia and Europe is 'wrong'

authorStream Live Help