BD-Phenix1

Presentation Transcript

Building Reliable Services Using Backdoors: 

Building Reliable Services Using Backdoors
Stephen Smaldone
Department of Computer Science
Rutgers University

Frustration Scalability: 

Frustration Scalability Service.com

Planetary-Scale Services: 

Planetary-Scale Services
Human operators, phone calls and emails are hard to scale
Cost of ownership dramatically exceeds cost of systems
[Diagram labels: Internet, Failure, Attacks; 9:00pm EST, 2:00am GMT, 11:00am JST]

The Dream: A Defensive Architecture: 

The Dream: A Defensive Architecture
[Diagram labels: Internet, Failure, Attacks; 9:00pm EST, 2:00am GMT, 11:00am JST; Gateway, BD, Private Network]

Possible Healing Actions: 

Possible Healing Actions
Refresh the state (reboot)
  Destructive and disruptive
Repair the state (continue)
Recover the state (transfer)
How to access the memory of the failed system when the OS is “hung”?

The Motivating Philosophy: 

The Motivating Philosophy
Something is better than nothing
  Save application state if possible
Faster is better than slower
  Repairing state is faster than repairing software
It is hard to corrupt or stop an outsider
  Remote healing is better than self-healing
Attackers and faults are becoming “smarter”
  Try a “holistic” approach if nothing else

The Backdoor (BD): 

The Backdoor (BD)
Backdoor: a hidden software or hardware mechanism, usually created for testing and troubleshooting
-- American National Standard for Telecommunications

Backdoor Design Principles: 

Backdoor Design Principles
1. Availability
  BD must be highly available (even when OS is not)
2. Non-intrusiveness
  BD operations must not involve local OS (zero-overhead monitoring)
3. Integrity
  OS cannot alter BD execution or modify the result of a BD operation
4. Responsiveness
  A BD operation cannot be delayed indefinitely

Possible Backdoor Implementations: 

Possible Backdoor Implementations
A programmable network interface (I-NIC)
  Our current prototype is on Myrinet
A virtual machine over a VMM
  Work in progress over Xen
IBM’s Remote Supervisor Adapter?
HP’s Remote Management Adapter?

Backdoor as building block: 

Backdoor as building block
Remote Healing Systems
  A computer system monitors/repairs/recovers the state of a remote system through the backdoor
  Backdoor is controlled by the remote OS
Defensive Architectures
  Backdoors are programmed to execute defensive tasks, stand-alone or cooperatively over a private network
  Standalone backdoor

Outline: 

Outline
Introduction
Backdoor Idea
Remote Healing
Defensive Architectures
Conclusions

Remote Healing: 

Remote Healing
Backdoor prototyped on I-NIC (Myrinet)
Remote Repair of OS State
Remote Recovery for Cluster-Based Internet Servers

Backdoor on I-NIC: 

Backdoor on I-NIC
[Diagram labels: Mem, NIC, CPU, I-NIC, Backdoor, Private Network, “Front door”]
Backdoor provides an alternative access to system memory without involving local CPU/OS
Private network over a specialized interconnect, VPN, or even over a phone link!

A Remote Healing Architecture: 

A Remote Healing Architecture
[Diagram labels: Target System and Monitor System, each with Mem, I/O, CPU and BD]

Backdoors use Remote Memory Communication: 

Backdoors use Remote Memory Communication
[Diagram labels: Monitor and Target, each with CPU, Memory and BD; NIC]
Monitor (Remote-Read)
Recovery/Repair (Remote-Read/Write)

Remote OS Locking: 

Remote OS Locking
Implemented by a BD-OS protocol
Two functions
  Provides exclusive access to target OS data for state repairing
  Enforces fail-stop model in the recovery case to avoid the consequences of false positives in failure detection
Can be avoided?
  Yes for monitoring

OS Support for Remote Healing: 

OS Support for Remote Healing
Monitoring and Failure Detection
  Sensor Box: system health indicators (sensors) provided by the target OS in its local memory
  Sensors: <UniqueID, Type, Threshold, Value>
Repairing
  Externalized State: OS state data that the BD can read
  Remote Access Hooks: OS control data that the BD can write to perform repairing actions
Recovery
  Continuation Box: fine-grain OS and application checkpoint state that the BD can transfer between systems to migrate running applications

Sensor Box (SB): 

Sensor Box (SB)
Collection of health indicators (sensors) in the target OS memory
<ID, Type, Threshold, Value>

Sensor Type    Threshold
Progress       Update deadline
Level          Max/Min value
Pressure       Max number of events
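The sensor tuple and the three sensor types can be sketched as a small data structure. This is only an illustration: the class names, the `violates` helper, the units, and the sample values are assumptions and do not come from the prototype.

```python
from dataclasses import dataclass
from enum import Enum


class SensorType(Enum):
    PROGRESS = "progress"   # threshold: update deadline (seconds is an assumed unit)
    LEVEL = "level"         # threshold: (min, max) allowed value
    PRESSURE = "pressure"   # threshold: max number of events


@dataclass
class Sensor:
    """One <ID, Type, Threshold, Value> entry of a Sensor Box."""
    sensor_id: int
    type: SensorType
    threshold: object
    value: float


def violates(sensor: Sensor, seconds_since_update: float = 0.0) -> bool:
    """Return True when the sensor reading breaches its threshold."""
    if sensor.type is SensorType.PROGRESS:
        return seconds_since_update > sensor.threshold
    if sensor.type is SensorType.LEVEL:
        low, high = sensor.threshold
        return not (low <= sensor.value <= high)
    return sensor.value > sensor.threshold  # PRESSURE


# Hypothetical Sensor Box contents, as a monitor might see them after a remote read.
sensor_box = [
    Sensor(1, SensorType.PROGRESS, 2.0, 1042),     # e.g. a timer-interrupt counter
    Sensor(2, SensorType.LEVEL, (64, 4096), 32),   # e.g. free pages, below the minimum
    Sensor(3, SensorType.PRESSURE, 100, 17),       # e.g. allocation failures
]
alarms = [s.sensor_id for s in sensor_box if violates(s, seconds_since_update=0.5)]
print(alarms)
```

Only the Level sensor is outside its bounds in this sample, so it is the only one reported.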

Failure Detection using Sensor Box: 

Failure Detection using Sensor Box
Target OS updates progress sensors in SB continuously
  <Timer interrupts> <Context switches> <NIC interrupts> …
Monitoring thread reads SB periodically and checks counters
Failure = counter stalled beyond its deadline
False positive rate vs. detection latency tradeoff
[Diagram labels: Target OS, Monitor, Sensor Box, Backdoor]
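The rule "failure = counter stalled beyond its deadline" can be sketched as a polling loop. The sketch below simulates the target's memory with a dictionary; `ProgressMonitor`, the counter names, and the deadlines are hypothetical, and the callable passed in merely stands in for a backdoor remote read.

```python
from typing import Callable, Dict, List


class ProgressMonitor:
    """Flags a sensor whose counter has not advanced within its deadline."""

    def __init__(self, read_sensor_box: Callable[[], Dict[str, int]],
                 deadlines: Dict[str, float]) -> None:
        self.read_sensor_box = read_sensor_box   # stands in for a BD remote read
        self.deadlines = deadlines               # seconds allowed without progress
        self.last_value: Dict[str, int] = {}
        self.last_change: Dict[str, float] = {}

    def poll(self, now: float) -> List[str]:
        """Read the Sensor Box once and return the names of stalled sensors."""
        stalled = []
        for name, value in self.read_sensor_box().items():
            if self.last_value.get(name) != value:
                # The counter moved (or is seen for the first time): record progress.
                self.last_value[name] = value
                self.last_change[name] = now
            elif now - self.last_change[name] > self.deadlines[name]:
                stalled.append(name)
        return stalled


# Simulated target memory: the "OS" stops updating its counters after t = 2.
target_memory = {"timer_interrupts": 0, "context_switches": 0}
monitor = ProgressMonitor(lambda: dict(target_memory),
                          {"timer_interrupts": 1.5, "context_switches": 3.0})

history = []
for t in range(6):
    if t <= 2:
        target_memory["timer_interrupts"] += 10
        target_memory["context_switches"] += 3
    history.append(monitor.poll(float(t)))
print(history)
```

The shorter deadline on `timer_interrupts` raises an alarm at t = 4, while the longer one on `context_switches` has not fired by t = 5. This is the tradeoff between detection latency and false positive rate named on the slide.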

Monitoring and Detection Using BD: 

Monitoring and Detection Using BD
[Diagram labels: CPU, Mem, BD (both systems); Sensor Box; Remote view; Detection]

Diagnosis and Repairing: 

Diagnosis and Repairing
Diagnosis
  Inspect live OS data structures in target’s memory (through the externalized state)
  Identify damaged OS state (e.g. resource exhaustion due to memory-hogging processes)
Repairing
  Modify target OS memory (through remote access hooks) to correct damaged state (e.g. remove a memory-hogging process by “injecting” a kill signal in its process control block)

Diagnosis Using BD: 

Diagnosis Using BD
[Diagram labels: CPU, Mem, BD (both systems); Externalized state; Fine grained view; Diagnosis]

Repair Using BD: 

Repair Using BD
[Diagram labels: CPU, Mem, BD (both systems); Repair Hook; Repair; Correct state]

Case Study: Repairing OS State: 

Case Study: Repairing OS State
Damaged OS state: resource exhaustion, corrupted data structures, compromised OS, etc.
Resource exhaustion
  Attack, overload, system misconfiguration, programming error
  Repairing cannot rely on local resources
Two examples
  Fork bomb
  Memory hog

Case Study: Memory Hog: 

Case Study: Memory Hog
Program allocates memory in an infinite loop
Both memory and swap space are occupied by the memory hog
System is inaccessible from console or the network
  Cannot spawn new processes
  Cannot handle interrupts
Local daemons cannot repair system

Remote Repairing in case of Memory Hogging: 

Remote Repairing in case of Memory Hogging
Monitoring
  Pressure sensor signals when severe low memory condition is detected
Diagnosis
  Target externalizes process table and process memory usage statistics
  Monitoring thread identifies the culprit
Repairing
  Monitoring thread kills culprit by remotely posting a SIGKILL
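The monitor, diagnose, repair sequence above can be sketched over a simulated process table. Everything here is an assumption made for illustration: the `ProcEntry` layout, the "largest resident set" heuristic for picking the culprit, the protected-PID set, and the threshold value. In the prototype the table would be read and the signal posted through the backdoor rather than through local Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

SIGKILL = 9  # POSIX signal number


@dataclass
class ProcEntry:
    """Simplified externalized process-table entry (hypothetical layout)."""
    pid: int
    name: str
    resident_pages: int
    pending_signals: List[int] = field(default_factory=list)


def find_culprit(table: Dict[int, ProcEntry], protected: Set[int]) -> ProcEntry:
    """Diagnosis: pick the unprotected process using the most memory."""
    candidates = [p for p in table.values() if p.pid not in protected]
    return max(candidates, key=lambda p: p.resident_pages)


def post_kill(table: Dict[int, ProcEntry], pid: int) -> None:
    """Repair: mark SIGKILL pending in the entry (stands in for a BD remote write)."""
    table[pid].pending_signals.append(SIGKILL)


# Simulated view of the target, as the monitoring thread might see it.
externalized_table = {
    1: ProcEntry(1, "init", 120),
    417: ProcEntry(417, "httpd", 9_000),
    902: ProcEntry(902, "hog", 480_000),
}
low_memory_events = 37      # value of a pressure sensor
PRESSURE_THRESHOLD = 20     # max number of events before the sensor signals

if low_memory_events > PRESSURE_THRESHOLD:
    culprit = find_culprit(externalized_table, protected={1})
    post_kill(externalized_table, culprit.pid)
    print(f"posted SIGKILL to pid {culprit.pid} ({culprit.name})")
```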

Prototype: 

Prototype
BD implemented on Myrinet LanaiX NIC
  Modified firmware and low level GM library
Modified FreeBSD 4.8 kernel
Experimental setup
  Dell PowerEdge 2600 servers with 2.4 GHz dual Intel Xeon, 1GB RAM, 2GB swap, Myrinet LanaiX NIC
  Benchmark: simple counting program with fixed number of iterations

Effectiveness of Remote Repairing: 

Effectiveness of Remote Repairing

Repairing Timeline: 

Repairing Timeline

Remote Healing: 

Remote Healing
Backdoor prototype using Myrinet
Remote Repair of OS State
Remote Recovery for Cluster-based Internet Servers

Clusters with BD Network: 

Clusters with BD Network
[Diagram labels: four nodes, each with P, M, I/O and BD; Interconnect; M and T markers]

Cluster-based Internet Services with BD network: 

Cluster-based Internet Services with BD network
[Diagram labels: three Server/Monitor pairs; three Clients]

Continuation Box (CB): 

Continuation Box (CB)
Idea
  Define per client-session state (OS and application)
  Transfer client sessions from the failed system to other systems in the cluster running the same server application
CB encapsulates the state of a client session associated with a server application (possibly multi-process)
  OS state (data in transit through IPC channels)
  application-specific state (periodically exported/checkpointed by the application)

Continuation Box Extraction: 

Continuation Box Extraction
[Diagram labels: Victim machine (crashed); Recovery machine (healthy); Memory, CPU, BD, OS; Continuation Box; Recovered State]

Client-Session Continuation Box for Multi-Process Servers: 

Client-Session Continuation Box for Multi-Process Servers
[Diagram labels: Client 1, Client 2; CB1, CB2; Process 1, Process 2; TCP/IP; IPC; App. state; Comm. state]

Continuation Box API: 

Continuation Box API
create_cb for a client session
export application state to CB
associate I/O channel with the CB
open_cb given an I/O channel
import application state from CB

Changes to make Server Recoverable: 

Changes to make Server Recoverable

    while (cid = accept()) {
        cbid = create_cb(cid)
        if (import(cbid, &{file_name, offset}) == NULL) {
            receive(cid, file_name)
            offset = 0
        }
        fd = open(file_name)
        seek(fd, offset)
        while (read(fd, block, size) != EOF) {
            send(cid, block, size)
            offset += size
            export(cbid, {file_name, offset})
        }
    }
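The server loop above can be approximated with a runnable, in-memory sketch. It is not the prototype's API: `ContinuationBoxStore`, `export_state`, `import_state`, and `serve` are hypothetical names (`import` is a reserved word in Python). The store is shared between the two calls, standing in for a Continuation Box that the backdoor has transferred to a healthy node.

```python
from typing import Any, Dict, List, Optional, Tuple


class ContinuationBoxStore:
    """In-memory stand-in for Continuation Boxes keyed by client session."""

    def __init__(self) -> None:
        self._boxes: Dict[str, Dict[str, Any]] = {}

    def create_cb(self, session_id: str) -> str:
        # Reuse the box if the session already has one (e.g. after migration).
        self._boxes.setdefault(session_id, {"app_state": None})
        return session_id

    def export_state(self, cbid: str, state: Dict[str, Any]) -> None:
        self._boxes[cbid]["app_state"] = dict(state)

    def import_state(self, cbid: str) -> Optional[Dict[str, Any]]:
        state = self._boxes[cbid]["app_state"]
        return dict(state) if state is not None else None


def serve(store: ContinuationBoxStore, session_id: str, requested_file: str,
          files: Dict[str, bytes], block_size: int,
          fail_after: Optional[int] = None) -> Tuple[List[bytes], bool]:
    """Send a file in blocks, checkpointing (file_name, offset) after each block."""
    cbid = store.create_cb(session_id)
    # A fresh session has no exported state, so start from offset 0.
    state = store.import_state(cbid) or {"file_name": requested_file, "offset": 0}
    data, sent, blocks = files[state["file_name"]], [], 0
    while state["offset"] < len(data):
        if fail_after is not None and blocks == fail_after:
            return sent, False          # simulated server freeze
        block = data[state["offset"]:state["offset"] + block_size]
        sent.append(block)
        state["offset"] += len(block)
        store.export_state(cbid, state)
        blocks += 1
    return sent, True


files = {"index.html": b"0123456789"}
store = ContinuationBoxStore()
first, done1 = serve(store, "client-1", "index.html", files, 4, fail_after=1)
second, done2 = serve(store, "client-1", "index.html", files, 4)
print(first, second)
```

The second call resumes at the exported offset, so the client receives each block exactly once across the failure.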

State Synchronization Problem: 

State Synchronization Problem
Application state (SB_APP) updated only upon export
OS state (SB_IO) updated continuously by the OS kernel
How to synchronize the two components of the CB?
[Diagram labels: OS, Application, SB_IO, SB_APP, SB; export, import, RECV]

CB-based Recovery: 

CB-based Recovery
Log-based rollback recovery
  restores server state with respect to a client
OS keeps communication logs (send/receive)
  0-copy using the communication buffers
After migration, OS replays send/receive operations from logs
  transparent to server and client applications
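One way to picture log-based rollback is a toy session in which the application state dates from the last export while the OS log records every receive and send since then. The sketch below is an interpretation, not the prototype's mechanism: it assumes a deterministic server that sends exactly one reply per request, in order, and all names (`SessionLog`, `handle`, `recover`) are hypothetical.

```python
from typing import List, Tuple


class SessionLog:
    """Per-session communication log kept by the (simulated) OS."""

    def __init__(self) -> None:
        self.received: List[bytes] = []   # every message delivered to the server
        self.sent_count = 0               # number of replies already on the wire

    def record_receive(self, msg: bytes) -> None:
        self.received.append(msg)

    def record_send(self) -> None:
        self.sent_count += 1


def handle(total: int, msg: bytes) -> Tuple[int, bytes]:
    """Toy deterministic server: keep a running sum and reply with it."""
    total += int(msg)
    return total, str(total).encode()


def recover(checkpoint_total: int, checkpoint_index: int,
            log: SessionLog) -> Tuple[int, List[bytes]]:
    """Replay logged receives after the checkpoint; suppress replies already sent."""
    total, new_replies = checkpoint_total, []
    for i in range(checkpoint_index, len(log.received)):
        total, reply = handle(total, log.received[i])
        if i >= log.sent_count:           # reply i never reached the client
            new_replies.append(reply)
    return total, new_replies


log = SessionLog()
total = 0
log.record_receive(b"5")
total, _ = handle(total, b"5")
log.record_send()
checkpoint = (total, len(log.received))   # application exports its state here
log.record_receive(b"7")
total, _ = handle(total, b"7")
log.record_send()
log.record_receive(b"1")                  # crash before this request is handled
recovered_total, replies = recover(*checkpoint, log)
print(recovered_total, replies)
```

Replaying from the checkpoint rebuilds the state the failed server had reached, while the client sees only the one reply it was still waiting for.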

Backdoors Prototype: 

Backdoors Prototype
Myrinet LanaiX NIC as backdoor
  in-kernel remote read/write operations
Modified FreeBSD kernel
  Sensor Box, Continuation Box
Modified server applications
  Apache, Flash, Icecast, JBoss

Case Study: A Multi-tier Auction Service: 

Case Study: A Multi-tier Auction Service
Back-End: MySQL DB server
Front-End (FE): Apache web server
Middle Tier (MT): JBoss app. server

Recoverable RUBiS: 

Recoverable RUBiS

Experimental Evaluation: 

Experimental Evaluation
Experimental setup
  Dell PowerEdge 2600 servers, 2.4 GHz dual Intel Xeon, 1GB RAM, 1Gb Ethernet
  Workload modeled after TPC-W
Fault injection in FE and MT nodes
  synthetic freeze, emulated freeze by remote OS locking, bugs inserted in network drivers
Evaluation
  Low overhead under load
  Recovery is fast

Low Overhead under Load: 

Low Overhead under Load

Recovery is Fast: 

Recovery is Fast

Outline: 

Outline
Introduction
Backdoor Idea
Remote Healing Experience
Defensive Architectures
Conclusions

Autonomous Backdoor: 

Autonomous Backdoor
BD is programmed to execute defensive tasks, then “sealed”

Defensive Architecture Hierarchy: 

Defensive Architecture Hierarchy
Defensive Computer Architecture (DCA)
  Individual computers equipped with BD
  BD performs local defensive tasks (e.g. OS state inspection)
Defensive Network Architecture (DNA)
  Cluster nodes equipped with BDs connected over high-speed private network
  BDs perform defensive tasks cooperatively (e.g. OS integrity checking, continuous remote logging)
Defensive Inter-Network Architecture (DINA)
  Loosely coupled DNAs connected over the Internet or other networks
  DNAs cooperate (e.g. early warnings of virus attacks)

Defensive Inter-Network Architecture over PlanetLab (new project): 

Defensive Inter-Network Architecture over PlanetLab (new project)
[Diagram labels: Internet, Failure, Attacks; 9:00pm EST, 2:00am GMT, 11:00am JST; Gateway, BD, Private Network]

Local Memory Inspection (Work in Progress): 

Local Memory Inspection (Work in Progress)
Orion - Holistic Approach to System Failure Prediction
  Identify kernel memory update patterns and correlate them to predict unstable system states

Related Work: 

Related Work
DEC WRL Titan system [’86]
Recoverable OS subsystems
  Rio reliable file cache [Chen ’96]
  Recovery Box [Baker ’92]
Defensive Programming [Qie ’03]
Nooks [Swift ’04]
Recovery Oriented Computing [Patterson ’02]
Microreboot [Candea ’04]
TCP Connection Failover [Snoeren ’01, Sultan ’01, Alvisi ’01, Koch ’03, Mishra ’03, Zagorodnov ’03]
Automatic repair of data structures [Demsky ’03]
K42 [Soules ’03]
Hypervisor-based fault tolerance [Bressoud ’95]

Conclusions: 

Conclusions
The Backdoor is a promising building block for remote healing and defensive architectures
Feasibility studies for Remote Repairing and Remote Recovery using I-NIC-based Backdoor prototype
Current work includes Defensive Architectures and Orion

People and Money Behind Backdoors: 

People and Money Behind Backdoors
Liviu Iftode
Florin Sultan
Aniruddha Bohra
Pascal Gallard (INRIA/IRISA, France)
Iulian Neamtiu (University of Maryland)
Yufei Pan
Arati Baliga
Tzvika Chumash
NSF CAREER CCR-0133366

Thank You!: 

Thank You!
http://discolab.rutgers.edu/bda

Yes, BD Security! (work in progress): 

Yes, BD Security! (work in progress)
BD under OS control
  Access to remote memory controlled through memory registration (established at initialization time)
  Voting scheme for remote writes (delayed writes)
  BDs monitor each other and their OSes' integrity
Autonomous BD
  OS cannot access BD memory after initialization (possible with PCI Express)

Local Memory Inspection (Work in Progress): 

Local Memory Inspection (Work in Progress)
Kernel Integrity Monitoring & Healing
  Search for kernel rootkits
    individual kernel functions
    kernel tables, e.g. syscall
    dynamic structures, e.g. the process table, etc.
  Repair the kernel when compromised
    Replace tampered tables with clean versions
    Replace corrupt versions of kernel functions with clean ones
Holistic Approach to System Failure Prediction
  Identify kernel memory update patterns and correlate them to predict unstable system states
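The "replace tampered tables with clean versions" step can be sketched as a digest comparison against a known-good copy. The slide does not say how tampering is detected, so the use of SHA-256 digests is an assumption, as are the table name, the addresses, and the `check_and_repair` helper. Kernel memory is simulated with a dictionary.

```python
import hashlib
from typing import Dict, List


def digest(table: List[int]) -> str:
    """Hash a table of handler addresses (8 bytes each, little-endian)."""
    raw = b"".join(addr.to_bytes(8, "little") for addr in table)
    return hashlib.sha256(raw).hexdigest()


def check_and_repair(memory: Dict[str, List[int]],
                     clean_copies: Dict[str, List[int]],
                     clean_digests: Dict[str, str]) -> List[str]:
    """Compare each monitored table with its clean digest; restore on mismatch."""
    repaired = []
    for name, expected in clean_digests.items():
        if digest(memory[name]) != expected:
            memory[name] = list(clean_copies[name])   # stands in for a BD write
            repaired.append(name)
    return repaired


# Clean reference captured while the system was known to be good (hypothetical values).
clean = {"syscall_table": [0x1000, 0x1040, 0x1080]}
clean_digests = {name: digest(table) for name, table in clean.items()}

# Simulated kernel memory in which one entry was redirected by a rootkit.
memory = {"syscall_table": [0x1000, 0xDEAD, 0x1080]}

first_pass = check_and_repair(memory, clean, clean_digests)
second_pass = check_and_repair(memory, clean, clean_digests)
print(first_pass, second_pass)
```

The first pass detects and restores the modified table; the second pass finds nothing left to repair.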