BD-Phenix1. Added: November 18, 2011. Presentation Transcript

Building Reliable Services Using Backdoors: Stephen Smaldone, Department of Computer Science, Rutgers University

Frustration Scalability: Service.com

Planetary-Scale Services: Human operators, phone calls, and emails are hard to scale. Cost of ownership dramatically exceeds the cost of the systems. [Diagram labels: Internet, Failure, Attacks; 9:00pm EST, 2:00am GMT, 11:00am JST]

The Dream: A Defensive Architecture: [Diagram labels: Internet, Failure, Attacks; 9:00pm EST, 2:00am GMT, 11:00am JST; Gateways and BDs; Private Network]

Possible Healing Actions: Refresh the state (reboot): destructive and disruptive. Repair the state (continue). Recover the state (transfer). How to access the memory of the failed system when the OS is “hung”?

The Motivating Philosophy: Something is better than nothing: save application state if possible. Faster is better than slower: repairing state is faster than repairing software. It is hard to corrupt or stop an outsider: remote healing is better than self-healing. Attackers and faults are becoming “smarter”: try a “holistic” approach if nothing else.

The Backdoor (BD): Backdoor: a hidden software or hardware mechanism, usually
created for testing and troubleshooting. Source: American National Standard for Telecommunications.

Backdoor Design Principles:
1. Availability: the BD must be highly available (even when the OS is not).
2. Non-intrusiveness: BD operations must not involve the local OS (zero-overhead monitoring).
3. Integrity: the OS cannot alter BD execution or modify the result of a BD operation.
4. Responsiveness: a BD operation cannot be delayed indefinitely.

Possible Backdoor Implementations: A programmable network interface (I-NIC); our current prototype is on Myrinet. A virtual machine over a VMM; work in progress over Xen. IBM’s Remote Supervisor Adapter? HP’s Remote Management Adapter?

Backdoor as building block: Remote Healing Systems: a computer system monitors, repairs, or recovers the state of a remote system through the backdoor; the backdoor is controlled by the remote OS. Defensive Architectures: backdoors are programmed to execute defensive tasks, stand-alone or cooperatively over a private network. Standalone backdoor.

Outline: Introduction; Backdoor Idea; Remote Healing; Defensive Architectures; Conclusions

Remote Healing: Backdoor prototyped on I-NIC (Myrinet); Remote Repair of OS State; Remote Recovery for Cluster-Based Internet Servers

Backdoor on I-NIC: [Diagram labels: Mem, NIC, CPU, I-NIC Backdoor, Private Network, “Front door”] The backdoor provides alternative access to system memory without involving the local CPU/OS. The private network can run over a specialized interconnect, a VPN, or even a phone link!

A Remote Healing Architecture: [Diagram: a Target System and a Monitor System, each with Mem, I/O, CPU, and BD]

Backdoors use Remote Memory Communication: [Diagram: Monitor and Target, each with CPU, Memory, and BD] Monitoring uses remote read; recovery and repair use remote read/write.

Remote OS Locking: Implemented by a BD-OS protocol. Two functions: provides exclusive access to
target OS data for state repairing; enforces a fail-stop model in the recovery case to avoid the consequences of false positives in failure detection. Can it be avoided? Yes, for monitoring.

OS Support for Remote Healing:
Monitoring and Failure Detection. Sensor Box: system health indicators (sensors) provided by the target OS in its local memory. Sensors: <UniqueID, Type, Threshold, Value>.
Repairing. Externalized State: OS state data that the BD can read. Remote Access Hooks: OS control data that the BD can write to perform repairing actions.
Recovery. Continuation Box: fine-grain OS and application checkpoint state that the BD can transfer between systems to migrate running applications.

Sensor Box (SB): Collection of health indicators (sensors) in the target OS memory, each <ID, Type, Threshold, Value>.
  Sensor Type    Threshold
  Progress       Update deadline
  Level          Max/Min value
  Pressure       Max number of events

Failure Detection using Sensor Box: The target OS updates progress sensors in the SB continuously. A monitoring thread reads the SB periodically and checks the counters. Failure = counter stalled beyond its deadline. There is a tradeoff between false positive rate and detection latency. [Diagram labels: Target OS, Monitor, Sensor Box, Backdoor; <Timer interrupts>, <Context switches>, <NIC interrupts>, …]

Monitoring and Detection Using BD: [Diagram: two systems, each with CPU, Mem, and BD; labels: Sensor Box, Remote view, Detection]

Diagnosis and Repairing: Diagnosis: inspect live OS data structures in the target’s memory (through the externalized state); identify damaged OS state (e.g. resource exhaustion due to memory-hogging processes). Repairing: modify target OS memory (through remote access hooks) to correct damaged state (e.g.
remove a memory-hogging process by “injecting” a kill signal into its process control block).

Diagnosis Using BD: [Diagram: two systems, each with CPU, Mem, and BD; labels: Externalized state, Fine grained view, Diagnosis]

Repair Using BD: [Diagram: two systems, each with CPU, Mem, and BD; labels: Repair Hook, Repair, Correct state]

Case Study: Repairing OS State: Damaged OS state: resource exhaustion, corrupted data structures, compromised OS, etc. Resource exhaustion: attack, overload, system misconfiguration, programming error. Repairing cannot rely on local resources. Two examples: fork bomb, memory hog.

Case Study: Memory Hog: A program allocates memory in an infinite loop. Both memory and swap space are occupied by the memory hog. The system is inaccessible from the console or the network. Cannot spawn new processes. Cannot handle interrupts. Local daemons cannot repair the system.

Remote Repairing in case of Memory Hogging: Monitoring: a pressure sensor signals when a severe low-memory condition is detected. Diagnosis: the target externalizes the process table and process memory usage statistics; the monitoring thread identifies the culprit. Repairing: the monitoring thread kills the culprit by remotely posting a SIGKILL.

Prototype: BD implemented on Myrinet LanaiX NIC: modified firmware and low-level GM library. Modified FreeBSD 4.8 kernel. Experimental setup: Dell PowerEdge 2600 servers with 2.4 GHz dual Intel Xeon, 1 GB RAM, 2 GB swap, Myrinet LanaiX NIC. Benchmark: simple counting program with a fixed number of iterations.

Effectiveness of Remote Repairing: [figure only]

Repairing Timeline: [figure only]

Remote Healing: Backdoor prototype using Myrinet; Remote Repair of OS State; Remote Recovery for Cluster-based Internet Servers

Clusters with BD Network: [Diagram: four nodes, each with P, M, I/O, and BD, joined by an Interconnect; node labels M and T]

Cluster-based Internet Services with BD network: Cluster-based Internet Services
with BD network [Diagram labels: three Server/Monitor pairs and three Clients]

Continuation Box (CB): Idea: define per-client-session state (OS and application), and transfer client sessions from the failed system to other systems in the cluster running the same server application. The CB encapsulates the state of a client session associated with a server application (possibly multi-process): OS state (data in transit through IPC channels) and application-specific state (periodically exported/checkpointed by the application).

Continuation Box Extraction: [Diagram labels: Victim machine (crashed), Recovery machine (healthy), Memory, CPU, OS, BD, Continuation Box, Recovered State]

Client-Session Continuation Box for Multi-Process Servers: [Diagram labels: Client 1, Client 2, CB1, CB2, TCP/IP, IPC, App. state, Comm. state, Process 1, Process 2]

Continuation Box API:
  create_cb for a client session
  export application state to CB
  associate I/O channel with the CB
  open_cb given an I/O channel
  import application state from CB

Changes to make Server Recoverable:
  while (cid = accept()) {
      cbid = create_cb(cid)
      // resume from exported state if this session was migrated
      if (import(cbid, &{file_name, offset}) == NULL) {
          // fresh session: read the request and start at the beginning
          receive(cid, file_name)
          offset = 0
      }
      fd = open(file_name)
      seek(fd, offset)
      while (read(fd, block, size) != EOF) {
          send(cid, block, size)
          offset += size
          // checkpoint progress after every block sent
          export(cbid, {file_name, offset})
      }
  }

State Synchronization Problem: Application state (SB_APP) is updated only upon export. OS state (SB_IO) is updated continuously by the OS kernel. How to synchronize the two components of the CB?
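The recoverable server loop from the "Changes to make Server Recoverable" slide can be sketched in Python as a small in-memory simulation. This is only an illustration under stated assumptions: `ContinuationBoxStore`, `serve`, `ServerCrash`, and `run_demo` are hypothetical names, the store is an ordinary dictionary rather than memory read through a backdoor, and the emulated crash happens only between `export` calls, so the sketch sidesteps the synchronization question raised on this slide.

```python
from typing import Dict, List, Optional, Tuple

AppState = Tuple[str, int]  # (file_name, offset), as exported in the slide


class ContinuationBoxStore:
    """Hypothetical stand-in for per-client Continuation Boxes."""

    def __init__(self) -> None:
        self._state: Dict[int, AppState] = {}

    def export_state(self, cid: int, state: AppState) -> None:
        self._state[cid] = state

    def import_state(self, cid: int) -> Optional[AppState]:
        return self._state.get(cid)


class ServerCrash(Exception):
    """Raised to emulate a server freeze in the middle of a transfer."""


def serve(cid: int, requested_file: str, files: Dict[str, bytes],
          cbs: ContinuationBoxStore, sent: List[bytes],
          block_size: int = 4, crash_after: Optional[int] = None) -> None:
    # Resume from the exported state if one exists, else start fresh.
    state = cbs.import_state(cid)
    file_name, offset = state if state is not None else (requested_file, 0)
    data = files[file_name]
    blocks = 0
    while offset < len(data):
        if crash_after is not None and blocks == crash_after:
            raise ServerCrash()
        block = data[offset:offset + block_size]
        sent.append(block)                          # send(cid, block, size)
        offset += len(block)
        cbs.export_state(cid, (file_name, offset))  # export(cbid, ...)
        blocks += 1


def run_demo() -> bytes:
    files = {"index.html": b"0123456789abcdef"}
    cbs = ContinuationBoxStore()
    sent: List[bytes] = []
    try:
        serve(1, "index.html", files, cbs, sent, crash_after=2)  # victim
    except ServerCrash:
        pass
    serve(1, "index.html", files, cbs, sent)  # recovery machine resumes
    return b"".join(sent)
```

In this simulation the client receives each block exactly once because the exported offset always matches what was sent. A crash between `send` and `export` would break that property, which is the gap that the communication logs are meant to close.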
[Diagram: three panels showing OS and Application with SB_IO and SB_APP; panel labels: export, import, RECV]

CB-based Recovery: Log-based rollback recovery restores server state with respect to a client. The OS keeps communication logs (send/receive), zero-copy using the communication buffers. After migration, the OS replays send/receive operations from the logs, transparently to server and client applications.

Backdoors Prototype: Myrinet LanaiX NIC as backdoor: in-kernel remote read/write operations. Modified FreeBSD kernel: Sensor Box, Continuation Box. Modified server applications: Apache, Flash, Icecast, JBoss.

Case Study: A Multi-tier Auction Service: Front-End (FE): Apache web server. Middle Tier (MT): JBoss app. server. Back-End: MySQL DB server.

Recoverable RUBiS: [figure only]

Experimental Evaluation: Experimental setup: Dell PowerEdge 2600 servers, 2.4 GHz dual Intel Xeon, 1 GB RAM, 1 Gb Ethernet. Workload modeled after TPC-W. Fault injection in FE and MT nodes: synthetic freeze, emulated freeze by remote OS locking, bugs inserted in network drivers. Evaluation: low overhead under load; recovery is fast.

Low Overhead under Load: [figure only]

Recovery is Fast: [figure only]

Outline: Introduction; Backdoor Idea; Remote Healing Experience; Defensive Architectures; Conclusions

Autonomous Backdoor: The BD is programmed to execute defensive tasks, then “sealed”.

Defensive Architecture Hierarchy: Defensive Computer Architecture (DCA): individual computers equipped with a BD; the BD performs local defensive tasks (e.g. OS state inspection). Defensive Network Architecture (DNA): cluster nodes equipped with BDs connected over a high-speed private network; BDs perform defensive tasks cooperatively (e.g.
OS integrity checking, continuous remote logging). Defensive Inter-Network Architectures (DINA): loosely coupled DNAs connected over the Internet or other networks; DNAs cooperate (e.g. early warnings of virus attacks).

Defensive Inter-Network Architecture over PlanetLab (new project): [Diagram labels: Internet, Failure, Attacks; 9:00pm EST, 2:00am GMT, 11:00am JST; Gateways and BDs; Private Network]

Local Memory Inspection (Work in Progress): Orion, a holistic approach to system failure prediction: identify kernel memory update patterns and correlate them to predict unstable system states.

Related Work:
  DEC WRL Titan system [’86]
  Recoverable OS subsystems
  Rio reliable file cache [Chen ’96]
  Recovery Box [Baker ’92]
  Defensive Programming [Qie ’03]
  Nooks [Swift ’04]
  Recovery Oriented Computing [Patterson ’02]
  Microreboot [Candea ’04]
  TCP Connection Failover [Snoeren ’01, Sultan ’01, Alvisi ’01, Koch ’03, Mishra ’03, Zagorodnov ’03]
  Automatic repair of data structures [Demsky ’03]
  K42 [Soules ’03]
  Hypervisor-based fault tolerance [Bressoud ’95]

Conclusions: The Backdoor is a promising building block for remote healing and defensive architectures. Feasibility studies for Remote Repairing and Remote Recovery used an I-NIC-based Backdoor prototype. Current work includes Defensive Architectures and Orion.

People and Money Behind Backdoors: Liviu Iftode, Florin Sultan, Aniruddha Bohra, Pascal Gallard (INRIA/IRISA, France), Iulian Neamtiu (University of Maryland), Yufei Pan, Arati Baliga, Tzvika Chumash. NSF CAREER CCR-0133366.

Thank You!: http://discolab.rutgers.edu/bda

Yes, BD Security! (work in progress): Yes, BD Security!
(work in progress) BD under OS control: access to remote memory is controlled through memory registration (established at initialization time); voting scheme for remote writes (delayed writes); BDs monitor each other and the integrity of their OSes. Autonomous BD: the OS cannot access BD memory after initialization (possible with PCI Express).

Local Memory Inspection (Work in Progress): Kernel Integrity Monitoring & Healing: search for kernel rootkits in individual kernel functions, kernel tables (e.g. syscall), and dynamic structures (e.g. the process table); repair the kernel when compromised by replacing tampered tables with clean versions and corrupt versions of kernel functions with clean ones. Holistic Approach to System Failure Prediction: identify kernel memory update patterns and correlate them to predict unstable system states.
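The failure-detection rule from the Sensor Box slides (failure = counter stalled beyond its deadline) can be sketched in Python as follows. This is a minimal illustration, not the prototype's code: the `Sensor` and `Monitor` names are hypothetical, the sensor box is a plain list rather than memory read over a backdoor, timestamps are passed in explicitly, and only the Progress and Pressure sensor types are handled (Level sensors are omitted).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Sensor:
    sensor_id: int
    kind: str         # "progress" or "pressure" (assumed type names)
    threshold: float  # update deadline in seconds, or max number of events
    value: float      # counter maintained by the target OS


class Monitor:
    """Hypothetical monitoring thread that polls a sensor box."""

    def __init__(self) -> None:
        # sensor id -> (last observed value, time the value last changed)
        self._last: Dict[int, Tuple[Optional[float], float]] = {}

    def check(self, sensor_box: List[Sensor], now: float) -> List[int]:
        """Return the ids of sensors that indicate a failure at time `now`."""
        failed: List[int] = []
        for s in sensor_box:
            if s.kind == "progress":
                last_value, last_change = self._last.get(s.sensor_id, (None, now))
                if s.value != last_value:
                    # The counter advanced, so the target is making progress.
                    self._last[s.sensor_id] = (s.value, now)
                elif now - last_change > s.threshold:
                    # Counter stalled beyond its update deadline.
                    failed.append(s.sensor_id)
            elif s.kind == "pressure":
                if s.value > s.threshold:
                    failed.append(s.sensor_id)
        return failed
```

A shorter deadline detects a hung system sooner but flags a merely slow one more often, which is the false positive rate versus detection latency tradeoff named on the slide.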
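The memory-hog case study (diagnose from the externalized process table, then repair by posting a kill signal) can be sketched in Python in the same spirit. Everything here is an assumption made for illustration: `ProcEntry`, `find_culprit`, and `repair` are hypothetical names, the table is a local list rather than remote memory, the `pending_signal` field stands in for the remote access hook, and "largest memory consumer" is only one possible culprit policy.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

SIGKILL = 9  # POSIX signal number for SIGKILL


@dataclass
class ProcEntry:
    pid: int
    name: str
    resident_kb: int
    pending_signal: Optional[int] = None  # stand-in for the repair hook


def find_culprit(table: List[ProcEntry],
                 protected: FrozenSet[int] = frozenset({0, 1})) -> Optional[ProcEntry]:
    """Pick the unprotected process using the most memory (assumed policy)."""
    candidates = [p for p in table if p.pid not in protected]
    return max(candidates, key=lambda p: p.resident_kb, default=None)


def repair(table: List[ProcEntry]) -> Optional[ProcEntry]:
    """Mark the culprit for termination, as a remote write would."""
    culprit = find_culprit(table)
    if culprit is not None:
        culprit.pending_signal = SIGKILL
    return culprit
```

In the architecture described in the slides, both steps run on the monitor system, so they do not depend on the exhausted target being able to spawn a process or handle an interrupt.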