Tapestry_quals

Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Qualifying Examination : 

Qualifying Examination A Decentralized Location and Routing Infrastructure for Fault-tolerant Wide-area Applications John Kubiatowicz (Chair) Satish Rao Anthony Joseph John Chuang

Outline : 

Outline Motivation Tapestry Overview Examining Alternatives Preliminary Evaluation Research Plan

New Challenges in Wide-Area : 

New Challenges in Wide-Area Trends: Moore’s Law growth in CPU, b/w, storage Network expanding in reach and bandwidth Applications poised to leverage network growth Scalability: # of users, requests, traffic Expect failures everywhere 106’s of components  MTBF decreases geometrically Self-management Intermittent resources  single centralized management policy not enough Proposal: solve these issues at infrastructure level so applications can inherit properties transparently

Clustering for Scale : 

Clustering for Scale Pros: Easy monitor of faults LAN communication Low latency High bandwidth Shared state Simple load-balancing Cons: Centralized failure: Single outgoing link Single power source Geographic locality Limited scalability Outgoing bandwidth Power Space / ventilation

Global Computation Model : 

Global Computation Model Leverage proliferation of cheap computing resources: cpu’s, storage, b/w Global self-adaptive system Utilize resources wherever possible Localize effects of single failures No single point of vulnerability Robust, adaptive, persistent

Global Applications? : 

Global Applications? Fully distributed share of resources Storage: OceanStore, Freenet Computation: SETI, Entropia Network bandwidth: multicast, content distribution Deployment: application-level protocol Redundancy at every level Storage Network bandwidth Computation

Key: Routing and Location : 

Key: Routing and Location Network scale  stress on location / routing layer Wide-area decentralized location and routing on an overlay Properties abstracted in such a layer Scalability: million nodes, billion objects Availability: survive routine faults Dynamic Operation: self-configuring, adaptive Locality: minimize system-wide operations Load balanced operation

Research Issues : 

Research Issues Tradeoffs in performance vs. overhead costs Overlay routing efficiency vs. routing pointer storage Location locality vs. location pointer storage Fault-tolerance and availability vs. storage, bandwidth used Performance stability via redundancy Not: Application consistency issues Application level load partitioning

Outline : 

Outline Motivation Tapestry Overview Examining Alternatives Preliminary Evaluation Research Plan

What is Tapestry : 

What is Tapestry A prototype of a dynamic, scalable, fault-tolerant location and routing infrastructure Suffix-based hypercube routing Core system inspired by PRR97 Publish by reference Core API: publishObject(ObjectID, [serverID]) msgToObject(ObjectID) msgToNode(NodeID)

Design Goals : 

Design Goals Scalability Millions of nodes, billions of objects Self-configuration / Dynamic operation Dynamic insertion Surrogate Routing Fault-resilience / Self-maintenance Transparent operation despite failures Exploit redundancy for performance stability

Plaxton, Rajamaran, Richa ‘97 : 

Plaxton, Rajamaran, Richa ‘97 Overlay network with randomly distributed IDs Server (where objects are stored) Client (which want to search/contact objects) Router (which forwards messages from other nodes) Combined location and routing Servers “publish / advertise” objects they maintain Messages route to nearest server given object ID Assume global network knowledge

Basic Routing : 

Basic Routing Example: Octal digits, 218 namespace, 005712  627510

Publish / Location : 

Publish / Location Each object has associated root node, e.g. identity f( ) Root keeps a pointer to object’s location Object O stored at server S S routes to Root(O) Each hop keeps <O,S>in index database Client routes to Root(O), route to S when <O,S> found R C S

What’s New : 

What’s New PRR97 Benefits: Simple fault-handling Scalable: state: bLogb(N), hops: Logb(N)b=digit base, N= |namespace| Exploits locality Proportional route distance Limitations Global knowledge algorithms Root node vulnerability Lack of adaptability Tapestry Inherited: Scalability Exploits locality Proportional route distance New: Distributed algorithms Redundancy for fault-tolerance Redundancy for performance Self-configuring / adaptive

Fault-resilience : 

Fault-resilience Minimized soft-state vs. explicit fault-recovery Routing Redundant backup routing pointers Soft-state neighbor probe packets Location Soft-state periodic republish 50 million files/node, daily republish, b = 16, N = 2160 , 40B/msg, worst case update traffic  156 kb/s, expected traffic for network w/ 240 nodes  39 kb/s Hash objectIDs for multiple roots P(findingReference w/ partition) = 1 – (1/2)nwhere n = # of roots

Dynamic “Surrogate” Routing : 

Dynamic “Surrogate” Routing Real networks much smaller than namespace sparseness in the network Routing to non-existent node (or, defining f: (N)(n), where N = namespace, n = set of nodes in network ) Example:Routing to root node of object ONeed mapping from N n

PRR97 Approach to f(Ni) : 

PRR97 Approach to f(Ni) Given desired ID Ni, Find set S of nodes in existing network nodes n matching most # of suffix digits with Ni Choose Si = node in S with highest valued ID Issues: Mapping must be generated statically using global knowledge Must be kept as hard state in order to operate in changing environment Mapping is not well distributed, many nodes in n get no mappings

Tapestry Approach to f(Ni) : 

Tapestry Approach to f(Ni) Globally consistent distributed algorithm: Attempt to route to desired ID Ni Whenever null entry encountered, choose next “higher” non-null pointer entry If current node S is only non-null pointer in rest of route map, terminate route, f(Ni) = S Assumes: Routing maps across network are up to date Null/non-null properties identical at all nodes sharing same suffix

Analysis of Tapestry Algorithm : 

Analysis of Tapestry Algorithm Globally consistent deterministic mapping Null entry  no node in network with suffix consistent map  identical null entries across same route maps of nodes w/ same suffix Additional hops compared to PRR solution: Reduce to coupon collector problemAssuming random distribution With n  ln(n) + cn entries, P(all coupons)= 1-e-c For n=b, c=b-ln(b), P(b2 nodes left) = 1-b/eb = 1.8 10-6 # of additional hops  Logb(b2) = 2 Distributed algorithm with minimal additional hops

What if Not consistent? : 

What if Not consistent? Two cases A. If a node disappeared, and some node did not detect it. Routing proceeds on invalid link, fails No backup router, so proceed to surrogate routing B. If a node entered, has not been detected, then go to surrogate node instead of existing node New node checks with surrogate after all such nodes have been notified Route info at surrogate is moved to new node

Properties of Overlay : 

Properties of Overlay Logical hops through overlay per route Routing state per overlay node Overlay routing distance vs. underlying network Relative Delay Penalty (RDP) Messages for insertion Load balancing

Alternatives: P2P Indices : 

Alternatives: P2P Indices Current Solutions: DNS server redirection, DNS peering Content Addressable Networks Ratnasamy et al., ACIRI / UCB Chord Stoica, Morris, Karger, Kaashoek, BalakrishnanMIT Pastry Druschel and RowstronMicrosoft Research

Comparing the Alternatives : 

Comparing the Alternatives Properties Parameter Logical Path Length Neighbor-state Routing Overhead (RDP) Messages to insert Consistency Load-balancing TAP. Chord CAN Pastry Logb(N) Log2(N) O(d*N1/d) O(d) Base b bLogb(N) O(1) O(1) O(1) ? O(1)? App-dep. Eventual Epoch ??? Good Logb(N) Log2(N) None Dimen d Base b bLogb(N)+O(b) O(Log22(N)) O(d*N1/d) Good Good Good O(Logb2(N)) O(Logb(N)) Designed for P2P Systems

Outline : 

Outline Motivation Tapestry Overview Examining Alternatives Preliminary Evaluation Research Plan

Evaluation Metrics : 

Evaluation Metrics Routing distance overhead (RDP) Routing redundancy  fault-tolerance Availability of objects and references Message delivery under link/router failures Overhead of fault-handling Locality vs. storage overhead Optimality of dynamic insertion Performance stability via redundancy

Results: Location Locality : 

Results: Location Locality Measuring effectiveness of locality pointers (TIERS 5000)

Results: Stability via Redundancy : 

Results: Stability via Redundancy Impact of parallel queries on multiple roots on response time and its variability. Aggregate bandwidth measures b/w used for softstate republish 1/day and b/w used by requests at rate of 1/s.

Example Application: Bayeux : 

Application-level multicast Leverages Tapestry for scale fault-tolerance Optimizations Self-configuringinto sub-trees Group ID clustersfor lower b/w Example Application: Bayeux Root --10 -010 --00 ---0 -110 -100 -000 0010 1110

Research Scope : 

Research Scope Effectiveness as application infrastructure Build new novel apps Port existing apps to scale to wide-area Use simulations to better understand parameters effects on overall performance Explore further stability via statistics Understand / map out research space Outside scope: DoS resiliency Streaming media, P2P, content-distribution apps

Timeline 0-5 months : 

Timeline 0-5 months Simulation/analysis of parameters impact on performance Quantify approaches to exploit routing redundancy, analyze via simulation Finish deployment of real dynamic Tapestry Consider alternate mechanisms Learn from consistent hashing

Timeline 5-10 months : 

Timeline 5-10 months Extend deployment to wide-area networks Nortel, EMC, academic institutions Evaluate real world performance Design and implement network-embedded SDS (w/ T. Hodes) Optimizing routing by fault prediction Integrate link-characterization work (Konrad) Start writing dissertation

Timeline 10-13 months : 

Timeline 10-13 months Finish writing dissertation Travel / Interviews Graduate

Chord : 

Chord Distributed hashtable State: Log2(N) ptrs Hops: Log2(N) Key: Correlation between overlay distance and physical distance occurs over time 0 16 8 4 2 1

CAN : 

CAN Distributed hashtable addressed in d dimension coordinate space State: O(d) Hops: expected O(dn1/d) Key: Lack of network distance correlation partially offset by heuristics & redundancy Soft-state immutable objects, consistency by epochs?

Pastry : 

Pastry Incremental digit routing like Plaxton Object replicated at x nodes closest to object’s ID State: b(LogbN)+O(b) Hops: O(LogbN) Key: Does not exploit locality well Infrastructure controls replicates and their placement Questions of security, confidentiality, consistency? (Immutable objects?)

Algorithm: Dynamic Insertion : 

Algorithm: Dynamic Insertion New node desires integration w/ ID xxxxx Step 1: copy route map recursively ----x ---xx -xxxx --xxx

Dynamic Insertion cont. : 

Dynamic Insertion cont. Step 2: Get relevant data from surrogateinform edge nodes of your existence Step 3: Notify neighbors to optimize paths surrogate --xxx .. ... ... ..