squirrel-podc

Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Squirrel: A peer-to-peer web cache:

Squirrel: A peer-to-peer web cache Sitaram Iyer (Rice University) Joint work with Ant Rowstron (MSR Cambridge) Peter Druschel (Rice University) PODC 2002 / Sitaram Iyer / Tuesday July 23 / Monterey, CA

Web Caching:

Web Caching Latency, External traffic, Load on web servers and routers. Deployed at: Corporate network boundaries, ISPs, Web Servers, etc.

Web Cache:

Centralized Web Cache Web Cache Browser Browser Cache Web Server Browser Browser Cache Client Client Internet Corporate LAN

Cooperative Web Cache:

Internet Corporate LAN Cooperative Web Cache Browser Browser Cache Web Server Browser Browser Cache Client Client Web Cache Web Cache Web Cache Web Cache Web Cache

Decentralized Web Cache:

Internet Decentralized Web Cache Browser Web Server Browser Browser Cache Client Client Corporate LAN Browser Cache Squirrel

Distributed Hash Table:

Distributed Hash Table Peer-to-peer location service: Pastry Completely decentralized and self-organizing Fault-tolerant, scalable, efficient Operations: Insert(k,v) Lookup(k) k6,v6 k1,v1 k5,v5 k2,v2 k4,v4 k3,v3 nodes <key,value> Peer-to-peer routing and location substrate

Why peer-to-peer?:

Why peer-to-peer ? Cost of dedicated web cache No additional hardware Administrative effort Self-organizing network Scaling implies upgrading Resources grow with clients

Setting:

Setting Corporate LAN 100 - 100,000 desktop machines Located in a single building or campus Each node runs an instance of Squirrel Sets it as the browser’s proxy

Mapping Squirrel onto Pastry:

Mapping Squirrel onto Pastry Two approaches: Home-store Directory

Home-store model:

Home-store model client home LAN Internet URL hash

Home-store model:

Home-store model client home …that’s how it works!

Directory model:

Directory model Client nodes always cache objects locally. Home-store: home node also stores objects. Directory: the home node only stores pointers to recent clients, and forwards requests.

Directory model:

Directory model client home Internet LAN

Directory model:

Directory model client home Randomly choose entry from table

Directory: Advantages:

Directory: Advantages Avoids storing unnecessary copies of objects. Rapidly changing directory for popular objects seems to improve load balancing. Home-store scheme can incur hotspots.

Directory: Disadvantages:

Directory: Disadvantages Cache insertion only happens at clients, so: active clients store all the popular objects, inactive clients waste most of their storage. Implications: Reduced cache size. Load imbalance.

Directory: Load spike example:

Directory: Load spike example Web page with many embedded images, or Periods of heavy browsing. Many home nodes point to such clients! Evaluate …

Trace characteristics:

Trace characteristics Microsoft in : Redmond Cambridge Total duration 1 day 31 days Number of clients 36,782 105 Number of HTTP requests 16.41 million 0.971 million Peak request rate 606 req/sec 186 req/sec Number of objects 5.13 million 0.469 million Number of cacheable objects 2.56 million 0.226 million Mean cacheable object reuse 5.4 times 3.22 times

Total external traffic:

Total external traffic 85 90 95 100 105 0.001 0.01 0.1 1 10 100 Directory Home-store No web cache Centralized cache Redmond [lower is better] Per-node cache size (in MB) Total external traffic (GB)

Total external traffic:

Total external traffic 5.5 5.6 5.7 5.8 5.9 6 6.1 0.001 0.01 0.1 1 10 100 Total external traffic (GB) [lower is better] Per-node cache size (in MB) Directory Home-store No web cache Centralized cache Cambridge

LAN Hops:

LAN Hops 0% 20% 40% 60% 80% 100% 0 1 2 3 4 5 6 Total hops within the LAN Redmond Centralized Home-store Directory % of cacheable requests

LAN Hops:

LAN Hops 0% 20% 40% 60% 80% 100% 0 1 2 3 4 5 % of cacheable requests Centralized Home-store Directory Cambridge Total hops within the LAN

Load in requests per sec:

Load in requests per sec 1 10 100 1000 10000 100000 0 10 20 30 40 50 Number of times observed Max objects served per-node / second Home-store Directory Redmond

Load in requests per sec:

Load in requests per sec 1 10 100 1000 10000 100000 1e+06 1e+07 0 10 20 30 40 50 Number of times observed Max objects served per-node / second Home-store Directory Cambridge

Load in requests per min:

Load in requests per min 1 10 100 0 50 100 150 200 250 300 350 Number of times observed Max objects served per-node / minute Home-store Directory Redmond

Load in requests per min:

Load in requests per min 1 10 100 1000 10000 0 20 40 60 80 100 120 Number of times observed Max objects served per-node / minute Home-store Directory Cambridge

Fault tolerance:

Fault tolerance Sudden node failures result in partial loss of cached content . Home-store: Proportional to failed nodes. Directory: More vulnerable.

Fault tolerance:

Fault tolerance Home-store Directory Redmond Mean 1% Max 1.77% Mean 1.71% Max 19.3% Cambridge Mean 1% Max 3.52% Mean 1.65% Max 9.8% If 1% of Squirrel nodes abruptly crash, the fraction of lost cached content is:

Conclusions:

Conclusions Possible to decentralize web caching. Performance comparable to a centralized web cache, Is better in terms of cost, scalability, and administration effort, and Under our assumptions, the home-store scheme is superior to the directory scheme.

Other aspects of Squirrel:

Other aspects of Squirrel Adaptive replication Hotspot avoidance Improved robustness Route caching Fewer LAN hops

Thanks.:

Thanks.

(backup) Storage utilization:

(backup) Storage utilization Redmond Home-store Directory Total 97641 MB 61652 MB Mean per-node 2.6 MB 1.6 MB Max per-node 1664 MB 1664 MB

(backup) Fault tolerance:

(backup) Fault tolerance Home-store Directory Equations Mean H/O Max H max /O Mean (H+S)/O Max max(H max ,S max )/O Redmond Mean 0.0027% Max 0.0048% Mean 0.198% Max 1.5% Cambridge Mean 0.95% Max 3.34% Mean 1.68% Max 12.4%

(backup) Full home-store protocol:

(backup) Full home-store protocol server client other other req home req req a : object or notmod from home b : object or notmod from origin 3 1 b 2 (WAN) (LAN) origin b : req

(backup) Full directory protocol:

(backup) Full directory protocol dir server server e : cGET req origin origin other other req home req client req 2 b : not-modified 3 e 3 2 1 c ,e : req c ,e : object 1 4 a , d 2 a , d : req 1 a : no dir, go to origin. Also d 2 3 1 not-modified object or dele- gate

(backup) Peer-to-peer Computing:

(backup) Peer-to-peer Computing Decentralize a distributed protocol: Scalable Self-organizing Fault tolerant Load balanced Not automatic!!

Decentralized Web Cache:

Decentralized Web Cache Browser Browser Browser Cache Browser Cache Web Server LAN Internet

Challenge:

Challenge Decentralized web caching algorithm: Need to achieve those benefits in practice! Need to keep overhead unnoticeably low. Node failures should not become significant.

Peer-to-peer routing, e.g., Pastry:

Peer-to-peer routing, e.g., Pastry Peer-to-peer object location and routing substrate = Distributed Hash Table. Reliably maps an object key to a live node. Routes in log 16 (N) steps (e.g. 3-4 steps for 100,000 nodes)

Home-store is better!:

Home-store is better! Simpler home-store scheme achieves load balancing by hash function randomization. Directory scheme implicitly relies on access patterns for load distribution.

Directory scheme seems better…:

Directory scheme seems better… Avoids storing unnecessary copies of objects. Rapidly changing directory for popular objects results in load balancing.

Interesting difference:

Interesting difference Consider: Web page with many images, or Heavily browsing node Directory: many pointers to some node. Home-store: natural load balancing. Evaluate …

Fault tolerance:

Fault tolerance Home-store Directory Redmond Mean 0.0027% Max 0.0048% Mean 0.2% Max 1.5% Cambridge Mean 0.95% Max 3.34% Mean 1.7% Max 12.4% When a single Squirrel node crashes, the fraction of lost cached content is: