
The CoDeeN Content Distribution Network: 

Vivek S. Pai, Limin Wang, KyoungSoo Park, Ruoming Pang, Larry Peterson
Princeton University
August 12, 2003

Content Distribution Networks: 

- Replicates Web content broadly
- Redirects clients to the 'best' copy
  - Load, locality, proximity
- Offloads work from origin servers
  - Multiplexes load spikes
  - Reduces overprovisioning
- Ex: Akamai, Mirror Image, Speedera
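The redirection criteria named on this slide (load, locality, proximity) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Replica` fields, the `max_load` threshold, and the ordering of the criteria are hypothetical and do not describe how Akamai, Mirror Image, Speedera, or CoDeeN actually choose a copy.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    load: float        # hypothetical normalized load, 0.0 (idle) to 1.0 (saturated)
    rtt_ms: float      # proximity proxy: round-trip time from the client
    has_content: bool  # locality: replica is likely to hold the object already


def pick_replica(replicas, max_load=0.8):
    """Pick a 'best' copy: skip overloaded replicas, then prefer
    locality, then proximity, then lower load (assumed ordering)."""
    candidates = [r for r in replicas if r.load <= max_load]
    if not candidates:
        # Every replica is overloaded; fall back to considering all of them.
        candidates = list(replicas)
    return min(candidates, key=lambda r: (not r.has_content, r.rtt_ms, r.load))
```

For example, given a nearby but overloaded replica and a more distant one with spare capacity, `pick_replica` returns the distant one, because the load filter runs before the proximity comparison.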

What Does It Do?: 

- An Academic Content Distribution Network
  - Redirects/caches HTTP requests
  - Based on our OSDI 2002 paper on CDN performance
- An Open Proxy Network
  - Probably the largest in existence

Who Is The Target Audience?: 

- Now
  - Users wanting better performance
  - People seeking 'anonymity'
- Next
  - Content providers seeking load sharing
- Later
  - General support for absorbing flash crowds
  - Avoid the 'Slashdot Effect'

How Does It Work?: 

- Server surrogates (proxies) on most North American sites
  - Originally everywhere, but we cut back
- Clients specify proxy to use
- Cache hits served locally
- Cache misses forwarded to CoDeeN nodes
  - Maybe forwarded to origin servers
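The request path on this slide (local hit, otherwise forward to another CoDeeN node, which may in turn go to the origin) can be sketched as below. This is a toy in-memory model, not the CoDeeN implementation: the `Node` class, the `fetch_origin` callable, and the hash-based peer choice are all assumptions introduced for illustration.

```python
import hashlib


class Node:
    """Toy proxy node: local cache, a list of peer nodes, and an origin fetcher."""

    def __init__(self, name, fetch_origin):
        self.name = name
        self.cache = {}
        self.peers = []
        self.fetch_origin = fetch_origin  # hypothetical callable: url -> body

    def _peer_for(self, url):
        # Assumption: hash the URL so a given URL always maps to the same peer.
        digest = hashlib.sha256(url.encode()).digest()
        return self.peers[int.from_bytes(digest[:4], "big") % len(self.peers)]

    def handle_client(self, url):
        """Serve a client request; returns (body, where_it_came_from)."""
        if url in self.cache:
            return self.cache[url], "local-hit"      # cache hit served locally
        if self.peers:
            body, how = self._peer_for(url).handle_peer(url)  # miss: ask a peer
        else:
            body, how = self.fetch_origin(url), "origin"
        self.cache[url] = body
        return body, how

    def handle_peer(self, url):
        """Serve a request forwarded by another node; never forwards again."""
        if url in self.cache:
            return self.cache[url], "peer-hit"
        body = self.fetch_origin(url)                 # maybe forwarded to origin
        self.cache[url] = body
        return body, "origin"
```

In this model a forwarded request is never forwarded a second time, so each client request touches at most two nodes before reaching the origin server.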

Request Forwarding: 

(Diagram slide; image not captured in the transcript.)

When Will It Be Ready?: 

- January – development started
  - Reliability & stability major concerns
- March – stable enough for daily use
- April – security problems begin
  - Shut down for one month
- June – restarted as 'beta'
  - Expecting 'production' soon

Decisions – Good & Bad: 

- Use commercial proxy with API [USITS 2003]
  - Good – mostly layer 7 concerns
  - Bad – limits deployment size (donated licenses)
- Deployment on PlanetLab
  - Good – otherwise impossible
  - 'Bad' – vulnerable to other experiments
- Allow open access
  - Good – generates real traffic
  - Bad – some traffic just plain mean

Lots of Malicious Traffic: 

- Spammers
  - SMTP tunnels, POST forms, IRC channels
- Bandwidth hogs
  - Google crawls, steganographers, X-Pacific
- Hackers & Spreaders
  - Yahoo dictionary attacks, IIS vuln tests
- Content thieves
  - E-journals/databases, local content
- Restrict ports & HTTP methods
- Multi-scale req & bw accounting
- Signature database & Robot test
- Determine location & privilege
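Two of the countermeasures listed here, restricting ports and HTTP methods and multi-scale request accounting, can be sketched together. The allowed methods, allowed ports, window sizes, and caps below are hypothetical values chosen for illustration; the slides do not state the actual CoDeeN policy, and bandwidth accounting is left out to keep the sketch short.

```python
from collections import defaultdict, deque

ALLOWED_METHODS = {"GET", "HEAD"}  # assumption: no CONNECT, so no SMTP tunnels
ALLOWED_PORTS = {80, 8080}         # assumption: plain HTTP ports only

# Hypothetical (window in seconds, max requests) pairs, shortest to longest.
LIMITS = [(60, 100), (3600, 2000), (86400, 20000)]


class Accountant:
    """Per-client request accounting over several time scales at once."""

    def __init__(self, limits=LIMITS):
        self.limits = limits
        self.history = defaultdict(deque)  # client IP -> accepted timestamps

    def allow(self, ip, method, port, now):
        # Policy check first: reject disallowed methods and ports outright.
        if method not in ALLOWED_METHODS or port not in ALLOWED_PORTS:
            return False
        times = self.history[ip]
        longest = max(window for window, _ in self.limits)
        # Drop timestamps older than the longest window.
        while times and now - times[0] >= longest:
            times.popleft()
        # The request must fit under the cap at every time scale.
        for window, cap in self.limits:
            recent = sum(1 for t in times if now - t < window)
            if recent >= cap:
                return False
        times.append(now)
        return True
```

Checking several windows at once means a short burst and a slow, sustained crawl are both caught: the first trips the per-minute cap, the second the per-day cap.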

Protecting Privilege: 

(Diagram slide; image not captured in the transcript.)

Attempted SMTP Tunnels/Day: 

(Chart: attempted SMTP tunnels per day; image not captured in the transcript.)

By The Numbers…: 

- Restarted in late May
  - In continuous operation
- Stats from first 8 weeks
  - Over 59,000 unique IPs as clients
  - Over 24 million requests serviced
  - Valid rates up to 15K reqs/hour
- Roughly 1 million reqs/day aggregate

More Production Info: 

- About 2000 lines of code
  - About ¼ is actual decision logic
- Uptimes limited by upgrades
  - Generally 1-2 times/week
  - Downtimes of 20 seconds/node
- Currently on ~40 nodes

Daily Requests (Serviced): 

(Chart: daily requests serviced; image not captured in the transcript.)

Welcome: 

(Slide image not captured in the transcript.)

Avoiding: 

Sorted by # avoiding

Load: 

Sorted by # load average

Total: 

Sorted by # total req rate

Users: 

Sorted by # users

The Troubles We’ve Caused: 

- Routinely trigger open proxy alerts
  - Educating sysadmins, others
- Resource checks generate noise
  - Got onto planetlab-support
- Really good honeypots
  - 6000 SMTP flows/minute at CMU
  - Spammers do ~1M HTTP ops/day

What We’ve Learned: 

- Parallel ssh is a must
  - General commands/queries
  - Basis for parallel scp
  - Used to detect out-of-date files
- Monitoring is a must
  - Too hard to see anomalies in 40+ nodes
  - Almost looks like a demo
- Be careful accepting outside requests
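The "parallel ssh" idea, including its use for spotting out-of-date files, can be sketched with the Python standard library. This is not the authors' tool: the function names, the `md5sum` comparison, and the majority-vote rule are assumptions made for illustration. The `runner` parameter exists so the logic can be exercised without real hosts.

```python
import shlex
import subprocess
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def ssh_run(host, command, timeout=30):
    """Run one command on one host over ssh; return stdout, or None on failure."""
    try:
        done = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, command],
            capture_output=True, text=True, timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    return done.stdout.strip() if done.returncode == 0 else None


def run_everywhere(hosts, command, runner=ssh_run, workers=16):
    """Run the same command on every host concurrently; returns {host: output}."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = pool.map(lambda h: runner(h, command), hosts)
        return dict(zip(hosts, outputs))


def out_of_date(hosts, path, runner=ssh_run):
    """Hosts whose checksum of `path` differs from the majority, or that failed."""
    sums = run_everywhere(hosts, "md5sum " + shlex.quote(path), runner)
    good = [s for s in sums.values() if s is not None]
    if not good:
        return sorted(sums)  # nothing answered; treat every host as suspect
    majority, _ = Counter(good).most_common(1)[0]
    return sorted(h for h, s in sums.items() if s != majority)
```

Treating the majority checksum as the current version is a simplification; comparing against a checksum computed on a known-good master copy would be more robust when an upgrade is only half deployed.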

What We Still Need: 

- Better layer 4 tools
  - Hard to tell why things die
  - Building complete heartbeats isn’t fun
- Better isolation on most resources
  - CPU/OS: Java, VServers, ???
  - Others: FD exhaustion, disk space

What We Wouldn’t Mind…: 

- Customizable DNS mapping
  - Map project.planet-lab.org to some node
- Projects could provide feedback
  - Node availability, utility, etc.
- Most IP geolocation seems locked up

More Info: 

http://codeen.cs.princeton.edu