Fault-Tolerance Issues for Communicating Mobile Agents
Keith Marzullo, University of California, San Diego, Department of Computer Science and Engineering … and the TACOMA group
6 October 1999

Fault-Tolerance: Fault-tolerance can mean different things. Masking: ensuring that a failure will not be visible to the application. Detection and recovery: detecting when a failure has occurred. Atomic transactions: ensuring that a failure will not cause an inconsistent application state to arise.

Roadmap: Review some ideas from fault-tolerance. Present some issues associated with masking and active replication in mobile agent computations. Present a protocol for detection and recovery in mobile agent computations based on primary-backup. Discuss some issues associated with transactional support. Mention a programming issue with detection and recovery.

Masking: Uses sufficient replication and voting so that (independent) failures of components do not result in an incorrect state. It can be supplied as a wrapper that hides the replication of the service from the clients.
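Masking as a voting wrapper can be sketched as follows. This is a minimal illustration, not the TACOMA implementation: it assumes deterministic replicas (as in the state machine approach), so every correct replica returns the same value for a request, and it models a replica as a plain function. The names `replicas_needed` and `masked_call` are hypothetical. `replicas_needed` encodes the standard replication bounds for detecting versus masking f failures, and `masked_call` is the wrapper: the client gets one answer and never sees the individual replicas.

```python
from collections import Counter
from collections.abc import Callable, Hashable, Sequence


def replicas_needed(f: int, failure_model: str, goal: str) -> int:
    """Replicas needed to detect or to mask up to f faulty components.

    failure_model is "failstop" or "arbitrary"; goal is "detect" or "mask".
    """
    bounds = {
        ("failstop", "detect"): 1,
        ("failstop", "mask"): f + 1,
        ("arbitrary", "detect"): f + 1,
        ("arbitrary", "mask"): 2 * f + 1,
    }
    return bounds[(failure_model, goal)]


def masked_call(replicas: Sequence[Callable[[int], Hashable]], request: int, f: int) -> Hashable:
    """Hypothetical voting wrapper that hides replication from the client.

    Replicas are assumed deterministic, so all correct replicas agree. With at
    least 2f + 1 replicas and at most f faulty ones, the correct value gets at
    least f + 1 votes and no wrong value can reach f + 1.
    """
    if len(replicas) < replicas_needed(f, "arbitrary", "mask"):
        raise ValueError("need at least 2f + 1 replicas to mask f arbitrary failures")
    votes = Counter(replica(request) for replica in replicas)
    value, count = votes.most_common(1)[0]
    if count < f + 1:
        raise RuntimeError("no value reached f + 1 votes: more than f replicas failed")
    return value


if __name__ == "__main__":
    double = lambda x: x * 2
    liar = lambda x: -1  # an arbitrarily faulty replica
    print(masked_call([double, double, liar], 21, f=1))  # prints 42
```

Majority voting is only one way to mask failures; with fail-stop replicas, any single response would do, which is why masking f fail-stop crashes needs only f + 1 replicas.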
Different approaches are appropriate for different failure models, performance requirements, and underlying communication systems.

State Machine Approach: (figure not recovered)

Primary-Backup Approach: (figure not recovered)

Detection and Recovery: Detection requires less replication than masking: 1 vs. f+1 replicas for detecting vs. masking f fail-stop crashes, and f+1 vs. 2f+1 replicas for detecting vs. masking f arbitrary failures. Recovery can be rollback, roll-forward, or a more specific approach.

Roadmap: Review some ideas from fault-tolerance. Present some issues associated with masking and active replication in mobile agent computations. Present a protocol for detection and recovery in mobile agent computations based on primary-backup. Discuss some issues associated with transactional support. Mention a programming issue with detection and recovery.

Replicated Agents with Voting, slides 1 to 4: (figures not recovered; the diagrams are labelled S and D)

Replicated Agents with Voting (5): Implements an architecture that can tolerate maliciously faulty landing pads. Rather complex and expensive. Perhaps best solved by the landing pad.

Roadmap: Review some ideas from fault-tolerance. Present some issues associated with masking and active replication in mobile agent computations. Present a protocol for detection and recovery in mobile agent computations based on primary-backup. Discuss some issues associated with transactional support. Mention a programming issue with detection and recovery.

Primary-Backup by Application: Places can crash, causing local agents to become lost. Agent code can be faulty, causing an agent to fail repeatedly.
Communications can break, causing an agent's plan to be unattainable.

Norwegian Army Protocol: The protocol uses the places an agent has visited as a set of potential places for recovery code to execute. The linear structure of a trajectory defines a monitoring strategy. (Figure: agent versions 1 (oldest), 2, 3 (youngest) and 4, labelled "current agent" and "rear guards".)

Application Interaction: An agent executes a fault-tolerant action at a place; the action completes with a move or an exit. Regular actions have an attribute failure; failure actions have attributes failedCode and failedAt. If a regular action r fails, then there is exactly one completed failure action f such that f.code = r.failure, f.failedCode = r.code, f.failedAt = r.place, and f.bc = r.bc.

Fail-Stop Reliable Broadcast: (figure not recovered)

Failure-Free Execution: (figure not recovered)

Failure Execution: (figure not recovered)

NAP Details: Spawn and checkpoint operations also terminate a fault-tolerant action. Additional complexity arises from a mobile computation visiting the same place multiple times. Support for NAP can be carried along with the mobile agent. Scalability with respect to administrative domains.

Roadmap: Review some ideas from fault-tolerance. Present some issues associated with masking and active replication in mobile agent computations. Present a protocol for detection and recovery in mobile agent computations based on primary-backup. Discuss some issues associated with transactional support. Mention a programming issue with detection and recovery.

Transactions: Atomicity is based on an atomic commit protocol and on stable storage associated with each landing pad. This appears to be simple. Additional power comes from code mobility.
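The slide does not say which atomic commit protocol is meant. As a minimal sketch, assuming the textbook two-phase commit protocol and modelling each landing pad's stable storage as an in-memory list, atomicity across landing pads looks like this. `LandingPad`, `two_phase_commit` and the log-entry format are hypothetical names chosen for illustration. Only the failure-free path is shown: timeouts, coordinator crashes and recovery by replaying the logs are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class LandingPad:
    """Hypothetical participant: a landing pad with its own stable storage."""
    name: str
    can_commit: bool = True
    # A plain list stands in for the landing pad's stable storage.
    stable_log: list[str] = field(default_factory=list)

    def prepare(self, txid: str) -> bool:
        # Phase 1: record the vote durably before replying to the coordinator.
        self.stable_log.append(f"{txid}:prepared:{'yes' if self.can_commit else 'no'}")
        return self.can_commit

    def finish(self, txid: str, decision: str) -> None:
        # Phase 2: record the coordinator's global decision.
        self.stable_log.append(f"{txid}:{decision}")


def two_phase_commit(txid: str, pads: list[LandingPad], coordinator_log: list[str]) -> str:
    """Commit only if every landing pad votes yes; otherwise abort everywhere."""
    votes = [pad.prepare(txid) for pad in pads]
    decision = "commit" if all(votes) else "abort"
    coordinator_log.append(f"{txid}:{decision}")  # decision logged before phase 2
    for pad in pads:
        pad.finish(txid, decision)
    return decision


if __name__ == "__main__":
    pads = [LandingPad("store1"), LandingPad("store2", can_commit=False)]
    print(two_phase_commit("tx1", pads, []))  # prints abort
```

Because every vote and the decision are written to (simulated) stable storage before the next step, a restarted landing pad could in principle look up its last entry to learn whether to commit or abort; that recovery logic is where the real complexity lies.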
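The claim that additional power comes from code mobility is illustrated on the slides by two "Transactions and Code Mobility" figures that survive only as labels: locks of $100, $200 and $160 at three stores, and an account lock that goes $100, $300, $460 in the first figure but $100, $200, $200 in the second. One reading consistent with those numbers (an interpretation, not stated on the slides) is that without mobile code the account must reserve the running sum of every quote, whereas an agent carrying code that will buy X at only one store needs to reserve only the running maximum. The function names below are hypothetical.

```python
from itertools import accumulate


def account_locks_without_mobility(quotes: list[int]) -> list[int]:
    # Each store's hold is independent, so the account must reserve
    # the running total of all quotes seen so far.
    return list(accumulate(quotes))


def account_locks_with_mobility(quotes: list[int]) -> list[int]:
    # The agent carries the decision "buy X at exactly one store", so the
    # account only needs to cover the largest quote seen so far.
    return list(accumulate(quotes, max))


if __name__ == "__main__":
    quotes = [100, 200, 160]  # dollar amounts from the figure labels
    print(account_locks_without_mobility(quotes))  # prints [100, 300, 460]
    print(account_locks_with_mobility(quotes))     # prints [100, 200, 200]
```

Both outputs reproduce the account-lock sequences on the two figures, which is what suggests this sum-versus-maximum reading.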
Transactions and Code Mobility: (figure; labels: account, store 3, store 2, store 1; locks $100, $200, $160; buy X; locks $100, $300, $460)

Transactions and Code Mobility (2): (figure; labels: account, store 3, store 2, store 1; locks $100, $200, $160; buy X; locks $100, $200, $200)

Roadmap: Review some ideas from fault-tolerance. Present some issues associated with masking and active replication in mobile agent computations. Present a protocol for detection and recovery in mobile agent computations based on primary-backup. Discuss some issues associated with transactional support. Mention a programming issue with detection and recovery.

Programming for Fault-Tolerance: The kinds of problems we have been considering (so far) for NAP have to do with software installation and system maintenance: synchronized installation of a new version of a package; software license checking and upgrade; specialized tool installation for distributed monitoring and testing. All are built around some variation of agreement or reliable broadcast.

Programming for Fault-Tolerance (2): A plus seems to be the separation of mobility from function: trajectory, synchronization, security and authentication are handled by mobility. But writing fault-tolerant actions to implement the particular version of agreement or reliable broadcast is awkward … this seems to be a good place to use a higher-level programming language, e.g., Sage.

Observations: It's hard to do fault-tolerance without knowing the failure model! Detection and recovery is more appropriate for mobile agent computations than masking. The fault-tolerance community needs to work on detection and recovery for arbitrary failures. System management and maintenance seems to be a very rich field for problems involving fault-tolerant mobile agent computations.

Bibliography:
F. B. Schneider. Towards fault-tolerant and secure agentry. In Proceedings of the 11th International Workshop on Distributed Algorithms (WDAG '97), Saarbrücken, Germany, 24-26 September 1997, pp. 1-14.
D. Johansen et al. NAP: practical fault-tolerance for itinerant computations. In Proceedings of the 19th IEEE International Conference on Distributed Computing Systems, Austin, TX, USA, 31 May-4 June 1999, pp. 180-189.
M. Strasser and K. Rothermel. Reliability concepts for mobile agents. International Journal of Cooperative Information Systems, 7(4):355-382, December 1998.
A. Ricciardi. The Sage Project: Software Engineering for Distributed Applications. Technical Report TR-1996-007, Department of Electrical and Computer Engineering, The University of Texas. Available at http://www.bell-labs.com/user/aleta/TR-PDS-1996-007.ps.gz.