Structured Approach to System Availability and Continuity

Presentation Transcript

Structured Approach to IT Business System Availability and Continuity Planning, Analysis and Design:

Alan McSweeney

Objectives:

January 11, 2011

- To provide details on a structured approach to analyse and define availability and continuity requirements for IT systems
- To provide background information on the changing landscape of availability and continuity

Agenda:

- Availability and Continuity Overview
- Availability Management
- Continuity Management
- Summary

Availability and Continuity:

- Availability is the ability of a system or service to perform its required function at a stated instant or over a stated period of time
  - Availability is expressed as the availability ratio: the proportion of time that the service is actually available for use by the customers within the agreed service hours
- Continuity is concerned with preparing to address unwanted occurrences
  - May relate to the recovery of IT systems or of entire business processes
  - Continuity is concerned with ensuring that IT services are recovered within agreed timescales
- Availability is a superset of continuity and encompasses the continued operation of systems in the event of a disaster
  - Continuity ensures availability in extreme circumstances
  - Availability defines what is to be available in these extreme circumstances
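
The availability ratio can be illustrated with a minimal sketch, assuming downtime is counted only when it falls inside the agreed service hours (an outage outside those hours does not reduce availability). The function name and the 50-hour service week in the example are illustrative assumptions, not taken from the presentation.

```python
def availability_ratio(agreed_service_hours: float, downtime_hours: float) -> float:
    """Proportion of the agreed service hours during which the service was available.

    Assumes downtime_hours counts only outages inside the agreed service hours.
    """
    if agreed_service_hours <= 0:
        raise ValueError("agreed_service_hours must be positive")
    if not 0 <= downtime_hours <= agreed_service_hours:
        raise ValueError("downtime_hours must lie between 0 and agreed_service_hours")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours


# Hypothetical example: 5 days x 10 hours of agreed service, 2 hours of downtime
print(f"{availability_ratio(50, 2):.2%}")  # 96.00%
```

Measuring against agreed service hours rather than elapsed time is what lets planned maintenance outside those hours leave the ratio untouched.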

Availability and Continuity Relationship:

[Figure: Availability and Continuity Relationship]
- Continuity provides Business Impact Analysis to Availability
- Availability provides Availability Criteria to Continuity

Availability and Continuity Relationships with Other IT Management Processes:

[Figure: Availability and Continuity shown with their relationships to other IT management processes]
- Continuity provides Business Impact Analysis to Availability
- Availability provides Availability Criteria to Continuity
- Capacity Planning and Management: defines the capacity required for continuity and availability
- IT Architecture: ensures systems and infrastructure are designed to incorporate continuity and availability
- Change Management: controls change that may impact availability or require continuity to be invoked
- Service Planning and Management: ensures that continuity and availability are incorporated into service agreements and provisions
- Security Management: controls security that may impact continuity and availability
- Finance Management: puts a cost on lack of availability and controls expenditure on availability and continuity

Availability and Continuity:

- Availability
  - Defines availability of service during operating hours
    - Under normal circumstances
    - Under extraordinary circumstances
- Continuity
  - Defines continued operations of critical services and their availability
  - Time until services are available and state of service after recovery
    - Under extraordinary circumstances

Availability and Continuity:

[Figure: Primary IT Facilities host Service 1 (Components 1, 2, 3), Service 2 (Components 1, 4, 5), Service 3 (Components 1, 5, 6) and Service 4 (Components 1, 2, 7), labelled Availability of Services During Normal Operations. Through Continuity of Operations, Recovery IT Facilities host Service 1 (Components 1, 2, 3) and Service 3 (Components 1, 5, 6), labelled Availability of Services After Continuity.]
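
The figure implies a simple rule: a service can run at a facility only if every component it depends on is available there. A minimal sketch using the service-to-component mapping from the figure, assuming no component-level redundancy and assuming the recovery facility provides exactly Components 1, 2, 3, 5 and 6, which is why only Services 1 and 3 survive continuity. The function name is hypothetical.

```python
# Service-to-component mapping at the primary IT facilities, as shown in the figure
SERVICES: dict[str, set[int]] = {
    "Service 1": {1, 2, 3},
    "Service 2": {1, 4, 5},
    "Service 3": {1, 5, 6},
    "Service 4": {1, 2, 7},
}


def available_services(services: dict[str, set[int]], available_components: set[int]) -> list[str]:
    """Return the services whose components are all available.

    Assumes each service needs every one of its components (no redundancy).
    """
    return sorted(
        name for name, components in services.items() if components <= available_components
    )


# Assumed: components provisioned at the recovery IT facilities
recovery_components = {1, 2, 3, 5, 6}
print(available_services(SERVICES, {1, 2, 3, 4, 5, 6, 7}))  # all four services
print(available_services(SERVICES, recovery_components))  # ['Service 1', 'Service 3']
```

Running the same check against the component list of a proposed recovery site shows directly which services would be lost if continuity were invoked.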

Availability and Continuity:

[Figure: the same primary and recovery IT facilities, linked by Continuity of Operations; together they form the Full View of Availability]

Availability and Continuity:

[Figure: Continuous Operation, Disaster Recovery, High Availability and Business Continuity]

Availability and Continuity:

[Figure: a progression from Availability During Normal Operations, through Availability During Housekeeping and Maintenance Operations and Availability After Some Component Failures, to Availability After Complete Failure of Primary Facility, spanning Availability and Continuity]

Availability and Continuity Heat Map:

[Figure: heat map with Recovery Point Objective (RPO), the amount of data loss tolerable after recovery, on one axis (Last Transaction, Minutes, Hours, Days) and Recovery Time Objective (RTO), the time to recover service or time by which service needs to be recovered, on the other (Days, Hours, Minutes, Seconds, Instantly); availability (and continuity) requirements increase towards Last Transaction and Instantly]

RTO and RPO:

- Recovery Point Objective (RPO): amount of data loss tolerable after recovery
  - Either the amount of data immediately available after recovery or the amount of data available some time after recovery
  - These can be different:
    - Provide some data for minimal operations initially
    - Then provide more or all of the data
- Recovery Time Objective (RTO): time to recover service, or time by which service needs to be recovered
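
A minimal sketch of checking a recovery (for example, one timed during a continuity rehearsal) against agreed objectives, assuming both objectives are expressed in minutes and that data loss is measured as the time between the last recoverable transaction and the outage. The class, function and example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryObjectives:
    rto_minutes: float  # maximum time allowed to restore the service
    rpo_minutes: float  # maximum tolerable data loss, as a window of time before the outage


def meets_objectives(
    obj: RecoveryObjectives, recovery_minutes: float, data_loss_minutes: float
) -> dict[str, bool]:
    """Compare an actual or rehearsed recovery against the agreed RTO and RPO."""
    return {
        "rto_met": recovery_minutes <= obj.rto_minutes,
        "rpo_met": data_loss_minutes <= obj.rpo_minutes,
    }


# Hypothetical objectives: restore within 4 hours, lose at most 15 minutes of data
objectives = RecoveryObjectives(rto_minutes=240, rpo_minutes=15)
print(meets_objectives(objectives, recovery_minutes=180, data_loss_minutes=30))
# {'rto_met': True, 'rpo_met': False}
```

Reporting the two results separately matters because they are usually fixed by different mechanisms: RTO by recovery procedures and standby capacity, RPO by backup or replication frequency.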

RTO and RPO With Cost of Lack of Availability:

[Figure: the heat map extended with a third dimension, Cost of Lack of Availability of Service/Cost Benefit of Providing High Availability and High Continuity, alongside Recovery Point Objective (RPO), the amount of data loss tolerable after recovery, and Recovery Time Objective (RTO), the time to recover service or time by which service needs to be recovered]
- Business critical services require immediate access with very limited or no data loss and require continued operation in the event of a disaster
- Add an extra dimension to the availability and continuity heat map to allow explicit identification of those systems that need to be continuously available

What is a Business Critical Application?:

- Applications deemed business/mission critical:
  - 2006: 16%
  - 2007: 36%
  - 2008: 56%
  - 2009: 60%
- Availability and continuity are merging as most applications are being deemed mission critical

How Often Have You had to Invoke Continuity Plan in Last Five Years?:

27% of organisations have declared at least one disaster in the last five years

What Were the Causes of Having to Invoke Continuity Plans?:

Continuity Testing Seen as Disruptive:

- 40% of organisations state that continuity testing impacts customers
- 32% of organisations state that continuity testing impacts sales
- Reasons for lack of testing:
  - Lack of time/resources
  - Lack of technology
  - Disruption to employees
  - Budget
  - Disruption to customers
  - Disruption to sales
  - Disruption to production systems
  - Not seen as a priority

Business Impact of Lack of Availability and Continuity Increases Exponentially Over Time:

Availability Design and Management:

- Availability design optimises the capability of the IT infrastructure, services and supporting organisation to deliver a cost-effective and sustained level of availability that enables the business to satisfy its objectives
- Ensures IT systems and infrastructure are designed to deliver the levels of availability required by the business
- Provides a range of availability reporting to ensure that agreed levels of availability are continuously measured and monitored
- Optimises the availability of the IT infrastructure to deliver cost-effective improvements that deliver real benefits to the business
- Ensures shortfalls in availability are recognised and corrective actions are identified and performed
- Reduces problems and incidents that impact availability
- Creates and maintains an Availability Plan aimed at improving the overall availability of services and infrastructure components to ensure business availability requirements can be satisfied

Continuity Design and Management:

- Continuity design is concerned with responding to and recovering business operations in the event of an outage or disaster that has a significant impact on the organisation
- Supports the business by ensuring that the required IT facilities can be recovered within required and agreed business timescales
- Provides the strategic and operational framework to review the way the organisation continues to provide its services while increasing its ability to recover from disruption, interruption or loss
- Depends on both management and operations
- Requires management commitment

People, Process, Technology:

- Start availability and continuity design with a business impact analysis and risk assessment
- Technology exists to support availability and continuity design, but technology does not constitute a plan
- Focus on prevention before investing in technology
- However, availability and continuity are seen as the preserve of IT
  - The business frequently does not have the required project focus or experience
- Embed availability and continuity into IT architectures

Questions:

- Do you have adequate control over prevention of business process or IT infrastructure downtime?
- Do you have adequate IT capabilities to ensure continuous operations?
- Do you know the risks your business and its business systems face?
- What would the cost and impact of downtime be to your business?
- Is your current continuity plan sufficient to meet your RPO and RTO?
- Do you know how much business continuity will cost?
- What business problems will implementing availability and continuity solve even if you do not experience an unplanned IT outage?
- What is the overall value of availability and continuity to the business?
- How should we define what level of business continuity we really need?

Availability Design and Management:

Availability Design and Management Process:

[Figure: Availability Design and Management consists of two parallel sub-processes]
- Availability Process Design and Management:
  1. Availability Requirements Analysis
  2. Document System and Application Architecture
  3. Gap Analysis and Recommendations
  4. Availability Review
- Availability Process Quality Control:
  1. Availability Reporting
  2. Availability Report Evaluation and Improvement
  3. Management Escalations of Service Availability Violations

Structured Approach to Availability Design and Management:

- Can be used for an individual system or application, for a service that consists of a number of systems or applications, or for the entire IT landscape
- Scope is to define a plan to implement agreed availability

Scope of Availability Design and Management:

- Planning for service availability
- Designing for service availability by anticipating disruptions and by estimating and measuring reliability and maintainability
- Planning for availability within SLAs and reporting on it
- Ensuring cost-effectiveness of availability solutions
- Reducing the duration of problems and incidents affecting availability
- Ensuring that security requirements are defined and incorporated within the overall availability design

Availability Design and Management Driven by Requirements:

- Availability requirements are based on the needs of the business
- Requirements are gathered, defined and validated by the key users and business management
- Includes hours of uptime as well as planned and unplanned downtime
- Includes ongoing support and procedures to address service disruptions

Benefits of a Structured Approach to Availability Design and Management:

- Reduced Risks
  - SLAs will incorporate an availability design based on the architecture
  - Reduced risk of violating SLAs
- Cost Reduction
  - A defined and agreed acceptable level of service prevents over-delivery
  - Unnecessary expenditure on maintenance and resilience building is avoided
- Improved Service Agility
  - Changing business availability requirements are addressed quickly
  - The cost of changing between levels of availability is defined or can be assessed quickly
- Improved Service Quality
  - Improvement in service quality results from fewer incidents as well as reduced time to restore service

Structured Approach to Availability Design and Management:

Step 1 - Availability Requirements Analysis:

1. Availability Requirements Analysis
   Scope: Determine availability requirements related to supporting the needs of the business; validate with other IT management processes; create a draft service agreement and assess it for feasibility from an availability perspective
   Inputs: Request for new service or changes to existing service; request for change to availability
   Outputs: Documented and agreed availability requirements
1.1 Understand Service Goals
   Scope: Document business goals for the service
   Inputs: Service design specification
   Outputs: Documented and agreed business goals
1.2 Document Availability Requirements
   Scope: Produce draft availability requirements based on understanding of business goals
   Inputs: Draft service level agreement
   Outputs: Documented and agreed availability requirements
1.3 Validate with Service Level Management Function
   Scope: Validate draft availability requirements against service level agreements and the overall service management plan
   Inputs: Overall service management plan
   Outputs: Validated availability requirements
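
When drafting availability requirements (step 1.2), it can help business users to see a percentage target translated into the downtime it permits. A minimal sketch, assuming the target applies to a fixed block of agreed service hours; the function name and the 99.5% over 720 hours example are hypothetical.

```python
def allowed_downtime_hours(target_availability: float, agreed_service_hours: float) -> float:
    """Maximum downtime within the agreed service hours that still meets the target."""
    if not 0 < target_availability <= 1:
        raise ValueError("target_availability must be in (0, 1]")
    if agreed_service_hours <= 0:
        raise ValueError("agreed_service_hours must be positive")
    return (1 - target_availability) * agreed_service_hours


# Hypothetical draft requirement: 99.5% availability, 24x7 service over a 30-day month (720 hours)
print(round(allowed_downtime_hours(0.995, 720), 2))  # 3.6
```

Presenting 3.6 hours a month rather than "99.5%" makes it easier for key users to judge whether a draft requirement reflects real business needs before it is validated in step 1.3.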

Step 2 - Document System and Application Architecture:

2. Document System and Application Architecture
   Scope: Analyse the operating environment of the individual components that comprise the service
   Inputs: Service design specification; configurations of individual components that comprise the service; service level agreement
   Outputs: Documented and agreed existing architecture for service delivery
2.1 Define Service Critical Components
   Scope: Define the configurations of individual components that comprise the service
   Inputs: Service design specification; configurations of individual components that comprise the service
   Outputs: Documented and agreed list of individual components that comprise the service
2.2 Document Service Critical Components and Their Relationships
   Scope: Document the structure of the service breakdown: the individual components, and their relationships, that deliver the service
   Inputs: Configurations of individual components, their attributes and relationships
   Outputs: Representation of individual components, their attributes and relationships
2.3 Document and Review Components Monitoring Capability
   Scope: Review existing service monitoring facilities and update or replace them if required
   Inputs: Existing service monitoring procedures
   Outputs: Defined service monitoring criteria
2.4 Document System and Application Architecture
   Scope: Complete an architecture document that describes how the service is delivered according to the service level agreement
   Inputs: Representation of individual components, their attributes and relationships; defined service monitoring criteria
   Outputs: Architecture document

Step 3 - Gap Analysis and Recommendations:

3. Gap Analysis and Recommendations
   Scope: Perform gap analysis and recommend a suitable approach; create specifications and cost justification
   Inputs: Validated availability requirements; architecture document; service problem and incident history
   Outputs: Availability design
3.1 Perform Gap and Risk Analysis
   Scope: Based on knowledge derived from incident and problem data, identify gaps in current services
   Inputs: Problem and incident data; availability requirements; architecture document
   Outputs: Gaps analysed and risks identified and documented
3.2 Identify Single Points of Failure
   Scope: Identify individual components whose failure can cause service disruption
   Inputs: Component attributes and relationships
   Outputs: Identified points of failure
3.3 Evaluate Alternative Approaches and Costs
   Scope: Explore various options within the approved range and identify a suitable approach based on requirements and cost justification
   Inputs: IT strategy and architecture; gaps analysed and risks identified and documented
   Outputs: Approach for required availability
3.4 Produce Gap Closure Recommendation and Specification
   Scope: Decide how the gap closure should be implemented based on financial and business reasons; develop specifications for the availability design and architecture
   Inputs: Approach for required availability; cost information
   Outputs: Decision on design and implementation; specifications for the availability design and architecture
3.5 Plan and Summarise Downtime
   Scope: Plan downtime for components and aggregate downtime across services
   Inputs: Decision on design and implementation
   Outputs: Planned downtime
3.6 Create Statement of Work to Implement
   Scope: Initiate a project for implementing changes to address availability issues
   Inputs: Specifications for the availability design and architecture
   Outputs: Statement of work for project
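
Step 3.2 can be approached mechanically once step 2.2 has recorded which components each service depends on. A minimal sketch, assuming the architecture document records an instance count per component and that any component with two or more instances can fail over; it ignores shared dependencies such as power, network and site, which a real review must also check. All service and component names are hypothetical.

```python
def single_points_of_failure(
    services: dict[str, set[str]], instances: dict[str, int]
) -> dict[str, list[str]]:
    """Map each single-instance component to the services its failure would disrupt.

    Assumes a component with two or more instances can fail over to a survivor,
    and treats a component missing from `instances` as having a single instance.
    """
    spofs: dict[str, list[str]] = {}
    for service, components in services.items():
        for component in components:
            if instances.get(component, 1) < 2:
                spofs.setdefault(component, []).append(service)
    return {component: sorted(hit) for component, hit in sorted(spofs.items())}


# Hypothetical services and component instance counts
services = {
    "Order entry": {"web server", "application server", "database"},
    "Reporting": {"application server", "data warehouse"},
}
instances = {"web server": 2, "application server": 2, "database": 1, "data warehouse": 1}
print(single_points_of_failure(services, instances))
# {'data warehouse': ['Reporting'], 'database': ['Order entry']}
```

Treating undocumented components as single instances is a deliberately conservative choice: a gap in the architecture document shows up as a risk rather than being silently assumed redundant.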

Step 4 - Availability Review:

4. Availability Review
   Scope: Assess, review and update the availability design if required
   Inputs: Incident, problem and fault reports
   Outputs: Identified availability concerns and amended design if required
4.1 Define Availability Measurement Model
   Scope: Define the availability measurement model
   Inputs: Documented and agreed availability requirements
   Outputs: Defined data sources for availability measurement
4.2 Perform Trend Analysis
   Scope: Analyse incident and problem data to arrive at a high-level view of availability
   Inputs: Incident and problem trend reports
   Outputs: Identified availability concerns
4.3 Analyse Expanded Incident Lifecycle
   Scope: Analyse the expanded incident lifecycle, including the breakdown of incident resolution, to validate and update design considerations
   Outputs: Identified specific areas that need improvement
4.4 Investigate Major Outages
   Scope: Investigate large outages and update the availability design if required
   Inputs: Detailed incident analysis for specific incidents; fault, problem and performance reports
   Outputs: Identified availability concerns
4.5 Analyse Availability Reports
   Scope: Review availability reports and update infrastructure if required
   Inputs: Availability reports
   Outputs: Identified availability concerns; statement of work for identified changes
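
The presentation does not specify the measurement model for step 4.1. One common choice, assumed here, derives availability, mean time to restore service (MTRS) and mean time between failures (MTBF) from the outage records used in steps 4.2 to 4.4. In this sketch MTBF is simplified to total uptime divided by the number of failures; the function name and example outages are hypothetical.

```python
from datetime import datetime, timedelta


def availability_measures(
    period_start: datetime,
    period_end: datetime,
    outages: list[tuple[datetime, datetime]],
) -> dict[str, object]:
    """Availability, MTRS and MTBF for one service over a measurement period.

    Assumes `outages` are non-overlapping (start, end) pairs inside the period.
    """
    total = period_end - period_start
    downtime = sum((end - start for start, end in outages), timedelta())
    uptime = total - downtime
    failures = len(outages)
    return {
        "availability": uptime / total,  # timedelta / timedelta gives a float
        "mtrs": downtime / failures if failures else timedelta(),  # mean time to restore service
        "mtbf": uptime / failures if failures else None,  # mean uptime between failures
    }


start = datetime(2011, 1, 1)
outages = [
    (datetime(2011, 1, 5, 9), datetime(2011, 1, 5, 11)),  # 2 hours
    (datetime(2011, 1, 20, 14), datetime(2011, 1, 20, 15)),  # 1 hour
]
m = availability_measures(start, start + timedelta(days=30), outages)
print(f"{m['availability']:.4f}", m["mtrs"], m["mtbf"])  # 0.9958 1:30:00 14 days, 22:30:00
```

Tracking MTRS separately from MTBF shows whether an availability shortfall comes from failing too often or from taking too long to restore, which is the distinction the expanded incident lifecycle analysis in step 4.3 draws out.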

Core Principles:

- Core principles ensure consistency of work and outputs
  - Ensure processes will meet the requirements of the business
  - Work will be of a high quality
- Core principles should serve as a checklist against which all work is assessed

Availability Design and Management Core Principles:

1. Availability requirements are based on the agreed and defined needs of the business
2. The IT function will determine the overall requirement of availability, performance and recoverability of systems under the terms of a service agreement with the business
3. Infrastructure needs to be designed to routinely incorporate availability requirements
4. The availability design and management process must adhere to security policies and procedures
5. An availability plan will be used to track and manage availability requirements and the information collected
6. Data on service reliability, maintainability and resiliency must be collected and monitored
7. The IT function will use continuous process improvement to achieve and maintain the level of service availability
8. Planned downtime must be minimised for business-critical functions, and unplanned downtime is handled by service management processes including Incident Management, Service Request Management and Continuity Management

Core Principle 1 - Availability Requirements Are Based On The Agreed And Defined Needs Of The Business:

Elements
- Conditions for availability must be aligned with the needs of the business
- Relevant availability data must be gathered and analysed
- Input and validation of requirements must be solicited from the business
- Availability requirements must be documented and distributed for agreement and approval
Benefits
- Expectations are clearly defined and accepted
- User satisfaction is increased
- Growth can be forecast more easily
- Problem areas can be identified

Core Principle 2 - The IT Function Determines The Overall Requirement Of Availability, Performance And Recoverability Of Systems:

Elements
- Requirements are met under defined and agreed service agreements
- Good working relationships need to exist with key suppliers and vendors
- Changes to the environment must be reflected in service agreements
Benefits
- There is a structure of supporting contracts in place from suppliers and vendors to meet business availability requirements

Core Principle 3 - Infrastructure Needs To Be Designed To Routinely Incorporate Availability Requirements:

Elements
- Changes in infrastructure and business needs must be reflected in availability planning and design
- Availability and recovery requirements need to be explicitly incorporated at the design stage
Benefits
- Availability requirements and expectations are clearly defined and accepted

Core Principle 4 - Availability Design And Management Process Must Adhere To Security Policies And Procedures:

Elements
- Access to IT services must be provided in a secure environment
- Availability processes must be aligned with security policies
Benefits
- Security measures will be followed
- There will be an ability to differentiate between security problems and availability problems

Core Principle 5 - Availability Plan Will Be Used To Track And Manage Availability Requirements And Information Collected:

Elements
- An availability plan must be developed and distributed
- Availability planning must be defined and outlined
- The availability plan must define the details of the data to be collected: what, how often, analysis, reporting, distribution and responses required
Benefits
- Availability management goals are clearly defined and documented
- There will be a clearly communicated process for availability planning and reporting
- Data is provided for availability reporting, analysis and forecasting

Core Principle 6 - Data On Service Reliability, Maintainability, Resiliency Must Be Collected And Monitored:

Elements
- The data to be collected and monitored must be defined, documented and communicated
- A supporting procedure to collect and monitor data, including the response to potential problems, must be defined
- Data needs to be reviewed on a regular and consistent basis
Benefits
- Availability management will be proactive and responsive rather than reactive
- The expectations of the business can be set accurately
- There will be an ability to prepare for potentially increased future requirements
- Availability trends can be identified and addressed

Core Principle 7 - IT Function Will Use Continuous Process Improvement To Achieve And Maintain Level Of Service Availability:

Elements
- Collected availability data will be used to identify areas requiring improvement
- Implementation of any availability process improvement must be controlled by the change management process to control its impact
Benefits
- The business is enabled to make recommendations on availability improvements

Core Principle 8 - Planned Downtime Must Be Minimised For Business-Critical Functions And Unplanned Downtime Is Handled By Service Management Processes:

Elements
- Planned and unplanned downtime must be clearly notified to the business
- Acceptable versus unacceptable unplanned downtime for business-critical functions must be defined
- Escalation procedures will be developed and distributed
Benefits
- Expectations are set with the business
- IT demonstrates commitment to supporting business-critical functions

Use Core Principles as Checklist for Independent Verification of Availability Design and Processes:

☑ 1 Availability requirements are based on the agreed and defined needs of the business
  ☑ 1.1 Conditions for availability must be aligned with the needs of the business
  ☑ 1.2 Relevant availability data must be gathered and analysed
  ☑ 1.3 Input and validation of requirements must be solicited from the business
  ☑ 1.4 Availability requirements must be documented and distributed for agreement and approval
☑ 2 The IT function will determine the overall requirement of availability, performance and recoverability of systems under the terms of a service agreement with the business
  ☑ 2.1 Requirements are met under defined and agreed service agreements
  ☑ 2.2 Good working relationships need to exist with key suppliers and vendors
  ☑ 2.3 Changes to the environment must be reflected in service agreements
☑ 3 Infrastructure needs to be designed to routinely incorporate availability requirements
  ☑ 3.1 Changes in infrastructure and business needs must be reflected in availability planning and design
  ☑ 3.2 Availability and recovery requirements need to be explicitly incorporated at the design stage
☑ 4 The availability design and management process must adhere to security policies and procedures
  ☑ 4.1 Access to IT services must be provided in a secure environment
  ☑ 4.2 Availability processes must be aligned with security policies
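
When the core principles are used as a checklist for independent verification, sub-item results can be rolled up to show which principles are fully met. A minimal sketch, assuming each numbered sub-item (1.1, 1.2, and so on) is simply marked satisfied or not, and that a principle counts as met only if all of its sub-items are; the review outcome shown is invented for illustration.

```python
def principle_status(results: dict[str, bool]) -> dict[str, bool]:
    """Roll checklist results for sub-items (e.g. '1.1') up to their core principle ('1').

    A principle counts as satisfied only if every one of its sub-items is satisfied.
    """
    status: dict[str, bool] = {}
    for ref, satisfied in results.items():
        principle = ref.split(".")[0]
        status[principle] = status.get(principle, True) and satisfied
    return status


# Hypothetical review outcome for principles 1 and 2
results = {"1.1": True, "1.2": True, "1.3": False, "1.4": True, "2.1": True, "2.2": True, "2.3": True}
print(principle_status(results))  # {'1': False, '2': True}
print([ref for ref, ok in results.items() if not ok])  # ['1.3']
```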

Continuity Design and Management:

Continuity Design and Management Process:

[Figure: Continuity Design and Management consists of two parallel sub-processes]
- Continuity Process Design and Management:
  1. Conduct Risk and Disaster Avoidance Assessment
  2. Conduct Business Impact Analysis
  3. Determine Data Backup and Recovery Options
  4. Form Continuity and Disaster Recovery Team
  5. Design and Develop Disaster Recovery Plan
  6. Continuity Processing for Critical Service Components
  7. Conduct Continuity and Disaster Recovery Rehearsal
  8. Maintain Continuity and Disaster Recovery Plan
- Continuity Process Quality Control:
  1. Continuity Reporting
  2. Continuity Report Evaluation and Improvement
  3. Management Escalations of Service Continuity Violations

Structured Approach to Continuity Design and Management:

- Can be used for an individual system or application, for a service that consists of a number of systems or applications, or for the entire IT landscape
- Scope is to define a plan to implement agreed continuity

Scope of Continuity Design and Management:

- Conducting impact analyses on loss of business systems
- Designing for service continuity by anticipating disruptions and by estimating and measuring reliability and maintainability
- Supporting business critical functions
- Designing and developing a Disaster Recovery Plan
- Designing and developing Disaster Recovery Training
- Planning for and performing disaster mitigation and avoidance
- Assessing and managing risk

Structured Approach to Continuity Design and Management:

Step 1 - Conduct Risk and Disaster Avoidance Assessment:

1. Conduct Risk and Disaster Avoidance Assessment
   Scope: Identify and quantify risks and vulnerabilities to the organisation
   Inputs: Risks and threats; historical data; current environment; current policies, processes and procedures
   Outputs: Risk assessment report with recommendations for improvements
1.1 Identify Potential Threats
   Scope: Identify potential threats, internal and external, including weaknesses in the organisation that could cause failure of IT systems
   Inputs: Agreement on scope of continuity recovery plan
   Outputs: Identified potential threats affecting IT systems
1.2 Assess Probability of Threats
   Scope: Assess the probability of the identified potential threats affecting IT systems
   Inputs: Identified potential threats affecting IT systems
   Outputs: Assessment of probability of identified potential threats
1.3 Evaluate Current Disaster Avoidance Measures
   Scope: Evaluate current disaster avoidance measures
   Inputs: Identified potential threats affecting IT systems and their probability
   Outputs: Evaluation of current disaster avoidance measures
1.4 Assess Risk Controls to Mitigate Threats
   Scope: Determine the effectiveness of controls in deterring threats
   Inputs: Current avoidance measures
   Outputs: Assessment of risk controls to reduce threats
1.5 Determine Impact of Reduced Controls
   Scope: Determine how effective a control would be in deterring the threat, limiting the cost of the risk and minimising the impact of threats
   Inputs: Assessment of risk controls to reduce threats
   Outputs: Impact to organisation without adequate disaster recovery controls
1.6 Determine Value of Additional Controls
   Scope: Determine which risks the organisation is willing to accept and which are to be controlled
   Inputs: Assessment of risk controls to reduce threats; impact to organisation
   Outputs: Value to organisation of additional controls
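
Steps 1.2 to 1.6 weigh the probability and impact of each threat against the cost of further controls. The presentation does not prescribe a formula; this sketch swaps in one common quantitative approach, expected annual loss (probability multiplied by impact), and treats an additional control as worthwhile when the loss it is expected to avoid exceeds its cost. The class, field names and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    annual_probability: float     # estimated chance of occurring in a year (0 to 1)
    impact_cost: float            # estimated cost to the organisation if it occurs
    control_annual_cost: float    # cost per year of an additional control
    control_effectiveness: float  # fraction of the expected loss the control removes (0 to 1)

    @property
    def expected_annual_loss(self) -> float:
        return self.annual_probability * self.impact_cost

    @property
    def control_value(self) -> float:
        """Expected loss avoided by the control, minus what the control costs."""
        return self.expected_annual_loss * self.control_effectiveness - self.control_annual_cost


def accept_or_control(threats: list[Threat]) -> dict[str, str]:
    """Control a threat when the additional control has positive value; otherwise accept the risk."""
    return {t.name: "control" if t.control_value > 0 else "accept" for t in threats}


threats = [
    Threat("Power failure at primary site", 0.10, 500_000, 20_000, 0.9),
    Threat("Flooding of primary site", 0.01, 2_000_000, 50_000, 0.8),
]
print(accept_or_control(threats))
# {'Power failure at primary site': 'control', 'Flooding of primary site': 'accept'}
```

A purely financial rule like this is only a starting point for step 1.6; risks with low probability but unacceptable consequences (for example regulatory or safety impacts) may still need controls even when the expected-loss arithmetic says otherwise.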

Step 2 - Conduct Business Impact Analysis:

2. Conduct Business Impact Analysis
   Scope: Conduct a business impact analysis to determine which functions are the most critical to the organisation's survival
   Inputs: Risk and disaster avoidance assessment; critical function categorisation
   Outputs: List of recovery requirements for processing critical functions
2.1 Define Business Impact Analysis Methodology
   Scope: Define the methodology and process to be used in the Business Impact Analysis, based on the risk and disaster avoidance assessment
   Inputs: Business systems
   Outputs: Agreed methodologies and processes to be used in Business Impact Analysis
2.2 Identify Business Functions to be Analysed
   Scope: Identify business functions to be analysed for risks and disasters
   Inputs: Agreed methodologies and processes to be used in Business Impact Analysis
   Outputs: Business functions identified for analysis
2.3 Define Business Function Criticality Categorisation
   Scope: Define categorisation criteria for each business function
   Inputs: Identified business functions
   Outputs: Criteria for categorising business functions
2.4 Design Questions and Conduct Interviews
   Scope: Design and validate questions and conduct interviews
   Inputs: Defined criteria for categories of business functions
   Outputs: Validation of business losses
2.5 Analyse Results of Interviews
   Scope: Analyse the data and validate findings if necessary
   Inputs: Validation of business losses
   Outputs: Analysis of data
2.6 Summarise and Present Results
   Scope: Develop conclusions and present the final report on the Business Impact Analysis
   Inputs: Analysis of data
   Outputs: Conclusions and final report of Business Impact Analysis
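
Step 2.3 defines criticality categories and steps 2.4 and 2.5 gather the data to apply them. A minimal sketch, assuming the categorisation criterion is the maximum tolerable downtime reported in interviews; the category names, hour bounds and interview results are hypothetical and would in practice come from the methodology agreed in step 2.1.

```python
# Hypothetical criticality criteria: upper bound on maximum tolerable downtime (hours) per category
CRITICALITY_CRITERIA = [
    (4, "Critical"),
    (24, "Vital"),
    (72, "Important"),
]


def categorise(max_tolerable_downtime_hours: float) -> str:
    """Return the first category whose downtime bound the business function fits within."""
    for bound, category in CRITICALITY_CRITERIA:
        if max_tolerable_downtime_hours <= bound:
            return category
    return "Deferrable"


# Hypothetical interview results: maximum tolerable downtime per business function
interview_results = {"Payments": 2, "Payroll": 48, "Marketing website": 120}
for function, hours in sorted(interview_results.items(), key=lambda item: item[1]):
    print(f"{function}: {categorise(hours)}")
```

Keeping the criteria in a single ordered table means the categories can be revised when the methodology changes without touching the classification logic.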

Step 3 - Determine Data Backup and Recovery Options:

3. Determine Data Backup and Recovery Options
Scope: Determine data backup and recovery options based on the requirements for recovering critical functions and the type of disaster or interruption being catered for
Inputs: Available time to back up and recover; acceptable downtime; recovery requirements; recovery objectives
Outputs: List of backup options; supporting procedures

3.1 Identify Backup and Recovery Options for Critical Functions
Scope: Work with business units to identify possible backup options for critical business functions
Inputs: Conclusions and final report of Business Impact Analysis
Outputs: Backup options for critical functions

3.2 Evaluate Operation of Backup and Recovery Options
Scope: Evaluate the previously identified backup options against various scenarios
Inputs: Backup options for critical functions
Outputs: Evaluated backup options for critical business functions

3.3 Determine Backup and Recovery Options for Critical Functions
Scope: Determine backup options for those critical business functions that currently have no backup options or whose options do not work correctly
Inputs: Evaluated backup options for critical business functions
Outputs: Backup options for all critical business functions

3.4 Design Backup and Recovery Procedures
Scope: Design backup procedures for all critical business functions
Inputs: Backup options for critical business functions
Outputs: Backup procedures for critical business functions
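The step 3 inputs (acceptable downtime, recovery objectives) can be used to filter candidate backup options. A minimal sketch follows, assuming each critical function has a recovery time objective (RTO) and a recovery point objective (RPO), and that the cheapest option meeting both is preferred. The option names, costs and objectives are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackupOption:
    name: str
    recovery_hours: float   # time to restore service with this option
    data_loss_hours: float  # worst-case data lost (e.g. interval between backups)
    annual_cost: float


@dataclass
class CriticalFunction:
    name: str
    rto_hours: float  # recovery time objective: acceptable downtime
    rpo_hours: float  # recovery point objective: acceptable data loss


def choose_option(fn: CriticalFunction, options: List[BackupOption]) -> Optional[BackupOption]:
    """Cheapest option that meets both objectives, or None if there is a gap (step 3.3)."""
    feasible = [o for o in options
                if o.recovery_hours <= fn.rto_hours and o.data_loss_hours <= fn.rpo_hours]
    return min(feasible, key=lambda o: o.annual_cost, default=None)


# Hypothetical options and objectives
options = [
    BackupOption("Nightly tape to offsite store", 48, 24, 5_000),
    BackupOption("Disk snapshots with offsite copy", 8, 4, 20_000),
    BackupOption("Replicated standby site", 1, 0.25, 150_000),
]
functions = [
    CriticalFunction("Payroll", rto_hours=72, rpo_hours=24),
    CriticalFunction("Order processing", rto_hours=2, rpo_hours=0.5),
    CriticalFunction("Trading", rto_hours=0.5, rpo_hours=0),
]
for fn in functions:
    choice = choose_option(fn, options)
    print(fn.name, "->", choice.name if choice else "no option meets objectives")
```

A `None` result marks a function that step 3.3 would need to address.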

Step 4 - Form Continuity and Disaster Recovery Team:

4. Form Continuity and Disaster Recovery Team
Scope: Establish recovery teams and specify what each team is to do across a broad range of possible events
Inputs: Business needs; recovery requirements
Outputs: Recovery team structure; recovery team charter and members; recovery procedures

4.1 Define Recovery Team Structure
Scope: Define the structure of the disaster recovery team
Inputs: Decision to proceed
Outputs: Structure of disaster recovery team

4.2 Define Recovery Team Functions
Scope: Define the function of each individual disaster recovery team for each business unit
Inputs: Structure of disaster recovery team
Outputs: Functions for recovery team

4.3 Define Team Leaders and Members
Scope: Define the team leader, alternate leader and other team members for each type of disaster and each business unit
Inputs: Functions for recovery team
Outputs: Recovery team leader, alternate team leader and members

4.4 Define Team Charter
Scope: Define a charter for each team, with defined roles and responsibilities, and define recovery procedures for each team relevant to its role and charter
Inputs: Recovery team leader, alternate team leader and members
Outputs: Charter and recovery procedures, with roles and responsibilities, for each recovery team
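The team definitions from steps 4.3 and 4.4 can be checked mechanically before the plan is written. A minimal sketch follows, assuming three checks: every team has a distinct leader and alternate leader, every team has a charter, and every agreed disaster type is covered by at least one team. The `RecoveryTeam` structure, role names and disaster types are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RecoveryTeam:
    name: str
    business_unit: str
    disaster_types: Set[str]
    leader: str
    alternate_leader: str
    members: List[str] = field(default_factory=list)
    charter: str = ""


def validate_teams(teams: List[RecoveryTeam], disaster_types: List[str]) -> List[str]:
    """Return a list of problems with the team structure; empty if none are found."""
    problems = []
    for team in teams:
        if not team.leader or not team.alternate_leader:
            problems.append(f"{team.name}: leader and alternate leader must both be named")
        elif team.leader == team.alternate_leader:
            problems.append(f"{team.name}: alternate leader must differ from leader")
        if not team.charter:
            problems.append(f"{team.name}: no charter defined")
    # Union of all disaster types that at least one team is responsible for
    covered = set().union(*(team.disaster_types for team in teams))
    for disaster in sorted(set(disaster_types) - covered):
        problems.append(f"No team assigned for disaster type: {disaster}")
    return problems


# Hypothetical teams
teams = [
    RecoveryTeam("Infrastructure recovery", "IT Operations", {"site loss", "power failure"},
                 "Operations manager", "Senior engineer", ["Network engineer"],
                 "Restore core infrastructure at the recovery site"),
    RecoveryTeam("Finance applications", "Finance", {"site loss"},
                 "Finance systems lead", "Finance systems lead"),
]
for problem in validate_teams(teams, ["site loss", "power failure", "cyber attack"]):
    print(problem)
```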

Step 5 - Design and Develop Disaster Recovery Plan:

5. Design and Develop Disaster Recovery Plan
Scope: Develop and validate processes and procedures to support the critical business functions
Inputs: Recovery objectives; scope of plan; business function classification; disaster definitions and classification; recovery team organisation
Outputs: Recovery Plan

5.1 Determine DRP Structure and Methodology
Scope: Determine the structure of the plan and the methodology by which it will be developed
Inputs: Structure of disaster recovery team
Outputs: Structure and methodology of developing DRP

5.2 Define DRP Notification Schedule and Process
Scope: Define the notification schedule and the recovery process
Inputs: Structure and methodology of developing DRP
Outputs: Notification schedule and recovery process

5.3 Define DRP Escalation Process
Scope: Define the DRP escalation criteria and procedure
Inputs: Notification schedule and recovery process
Outputs: Escalation procedure

5.4 Define Key Recovery Objectives
Scope: Take account of the organisation’s key recovery objectives and policies when designing the DRP
Inputs: Escalation procedure
Outputs: Consideration of key recovery objectives and policies

5.5 Define Recovery Steps
Scope: Define the framework for disaster recovery to ensure it contains the required recovery steps
Inputs: Consideration of key recovery objectives and policies
Outputs: Disaster recovery steps

5.6 Define Critical Function Restoration Process
Scope: Discuss the DRP with business units to gain acceptance of the final restoration process, and define the training to be provided
Inputs: Disaster recovery steps
Outputs: Accepted restoration process
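The notification schedule and escalation criteria in steps 5.2 and 5.3 are often expressed as time thresholds. A minimal sketch follows, assuming escalation is driven purely by elapsed time since the incident started. The ladder of roles and minutes is hypothetical and must be sorted by ascending threshold.

```python
from typing import List, Optional, Tuple

# Hypothetical escalation ladder: (minutes since incident start, role to notify).
# Must be sorted by ascending threshold. Real values belong in the DRP (steps 5.2, 5.3).
ESCALATION_LADDER: List[Tuple[int, str]] = [
    (0, "Service desk duty manager"),
    (30, "Disaster recovery team leader"),
    (120, "IT director"),
    (240, "Executive crisis management team"),
]


def roles_to_notify(elapsed_minutes: int, ladder=ESCALATION_LADDER) -> List[str]:
    """Every role whose escalation threshold has been reached."""
    return [role for threshold, role in ladder if elapsed_minutes >= threshold]


def next_escalation(elapsed_minutes: int, ladder=ESCALATION_LADDER) -> Optional[Tuple[int, str]]:
    """Minutes until the next escalation and who receives it; None once the top is reached."""
    for threshold, role in ladder:
        if elapsed_minutes < threshold:
            return threshold - elapsed_minutes, role
    return None


print(roles_to_notify(45))   # ['Service desk duty manager', 'Disaster recovery team leader']
print(next_escalation(45))   # (75, 'IT director')
```

Real escalation criteria usually also depend on impact and not only on elapsed time. This sketch deliberately keeps only the time dimension.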

Step 6 - Alternate Processing for Critical Service Components:

6. Alternate Processing for Critical Service Components
Scope: Evaluate critical business function components to determine whether alternate processing procedures are necessary and feasible for the period between a disaster and recovery, and how recovery should be achieved
Inputs: Critical business function components; alternatives for processing critical components; critical business function component timelines
Outputs: Alternate procedures

6.1 Identify Critical Components for Continuity
Scope: Work with business units to identify critical components that need alternate processing
Inputs: Accepted restoration process
Outputs: Critical components identified

6.2 Develop Options for Continuity
Scope: Develop options for alternate processing of critical components in coordination with business units
Inputs: Critical components identified
Outputs: Options for alternate processing

6.3 Develop Continuity Processing Steps
Scope: Develop processing steps based on the options for alternate processing of critical components
Inputs: Options for alternate processing
Outputs: Alternate processing steps

6.4 Develop Return from Continuity Process
Scope: Develop the procedure to return from alternate processing to normal processing
Inputs: Alternate processing steps
Outputs: Steps to return critical components from alternate processing to normal processing
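Steps 6.3 and 6.4 describe a component moving into alternate processing and later returning to normal. A minimal sketch follows, modelling that cycle as a small state machine. The modes, event names and the "return failed" path are hypothetical assumptions. The point is that an unplanned transition is rejected rather than silently accepted.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal processing"
    ALTERNATE = "alternate processing"
    RETURNING = "returning to normal processing"


# Allowed transitions: (current mode, event) -> next mode (hypothetical event names)
TRANSITIONS = {
    (Mode.NORMAL, "disaster_declared"): Mode.ALTERNATE,
    (Mode.ALTERNATE, "primary_restored"): Mode.RETURNING,
    (Mode.RETURNING, "return_verified"): Mode.NORMAL,
    (Mode.RETURNING, "return_failed"): Mode.ALTERNATE,
}


class CriticalComponent:
    def __init__(self, name: str):
        self.name = name
        self.mode = Mode.NORMAL
        self.history = [Mode.NORMAL]  # audit trail for the post-incident review

    def handle(self, event: str) -> Mode:
        """Apply an event; raise ValueError if the plan does not allow it in the current mode."""
        key = (self.mode, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{self.name}: '{event}' not allowed during {self.mode.value}")
        self.mode = TRANSITIONS[key]
        self.history.append(self.mode)
        return self.mode


order_entry = CriticalComponent("Order entry")
for event in ["disaster_declared", "primary_restored", "return_verified"]:
    print(event, "->", order_entry.handle(event).value)
```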

Step 7 - Conduct Continuity and Disaster Recovery Rehearsal:

7. Conduct Continuity and Disaster Recovery Rehearsal
Scope: Conduct rehearsals to validate the organisation’s ability to respond to and recover from a disaster
Inputs: Rehearsal plan; recovery procedures; alternate procedures; rehearsal objectives
Outputs: Lessons learned; rehearsal report

7.1 Design Rehearsal
Scope: Design programmes for rehearsals
Inputs: Disaster Recovery Plan
Outputs: Programmes for rehearsals

7.2 Develop Rehearsal Scenarios
Scope: Develop rehearsal scenarios based on the design of rehearsals
Inputs: Programmes for rehearsals
Outputs: Rehearsal scenarios

7.3 Plan and Schedule Rehearsals
Scope: Plan and schedule rehearsals, both planned and unannounced
Inputs: Rehearsal scenarios
Outputs: Rehearsal schedule

7.4 Develop Rehearsal Evaluation Criteria
Scope: Develop evaluation techniques and criteria for each rehearsal scenario
Inputs: Rehearsal schedule
Outputs: Evaluation techniques and criteria

7.5 Conduct Rehearsals
Scope: Conduct rehearsals in coordination with all other members
Inputs: Rehearsal schedule
Outputs: Conducted rehearsals

7.6 Review and Analyse Rehearsals
Scope: Document and distribute the outcomes of the rehearsals to all members, along with lessons learned and review reports
Inputs: Conducted rehearsals
Outputs: Reports on conducted rehearsals
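The evaluation criteria in step 7.4 and the reports in step 7.6 can be reduced to a comparison of each rehearsal's outcome against its objectives. A minimal sketch follows, assuming a hypothetical pass rule: the rehearsal must meet its target recovery time and complete every planned recovery step. Scenario names and timings are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RehearsalResult:
    scenario: str
    target_recovery_minutes: int
    actual_recovery_minutes: int
    steps_planned: int
    steps_completed: int
    unannounced: bool = False

    def passed(self) -> bool:
        """Hypothetical criterion: recovered within target and every planned step completed."""
        return (self.actual_recovery_minutes <= self.target_recovery_minutes
                and self.steps_completed == self.steps_planned)


def rehearsal_report(results: List[RehearsalResult]) -> List[str]:
    """One summary line per rehearsal, for distribution with the lessons learned."""
    lines = []
    for r in results:
        status = "PASS" if r.passed() else "FAIL"
        overrun = max(r.actual_recovery_minutes - r.target_recovery_minutes, 0)
        kind = "unannounced" if r.unannounced else "planned"
        lines.append(f"{r.scenario} ({kind}): {status}, overrun {overrun} min, "
                     f"{r.steps_completed}/{r.steps_planned} steps completed")
    return lines


results = [
    RehearsalResult("Loss of primary data centre", 240, 210, 12, 12),
    RehearsalResult("Database corruption", 60, 95, 8, 7, unannounced=True),
]
for line in rehearsal_report(results):
    print(line)
```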

Step 8 - Maintain Continuity and Disaster Recovery Plan:

8. Maintain Continuity and Disaster Recovery Plan
Scope: Conduct scheduled reviews of the contents of the continuity plan, and update the plan through the change management process and in line with other related changes
Inputs: Disaster recovery plan; review schedule; list of reviewers; review criteria and objectives
Outputs: Recommendations for improvements or changes; approval list from reviewers

8.1 Assign Responsibility for DRP Maintenance
Scope: Identify the reviewers responsible for plan maintenance and assign responsibility
Inputs: Rehearsal review reports; DRP; review criteria and objectives
Outputs: Assigned responsibilities for review and maintenance of DRP

8.2 Establish DRP Review and Maintenance Procedures and Schedule
Scope: Establish review and maintenance procedures and schedules
Inputs: Assigned responsibilities for review and maintenance of DRP
Outputs: Procedure for review and maintenance of DRP

8.3 Integrate DRP Maintenance into Change Management
Scope: Integrate the maintenance process with change management processes so that changes are assessed for their potential impact on the continuity plans
Inputs: Review feedback and inputs
Outputs: Updated DRP

8.4 Agree and Maintain DRP Distribution List
Scope: After updating the DRP, create a list of the recipients to whom it must be distributed
Inputs: Updated DRP
Outputs: Distribution list

Continuity Design and Management Core Principles:

1. Scope of continuity plan must contain clear and realistic recovery objectives and recovery timeframes
2. Risk management and disaster avoidance measures should be in place and practised
3. Continuity plan including disaster recovery should be designed and developed to support recovery of agreed critical business functions
4. Continuity plan should be rehearsed regularly
5. Continuity and recovery strategies or plans should be integrated into design and deployment of changes to infrastructure
6. Continuity and recovery processes or plans should be reviewed and updated on a regular basis

Core Principle 1 - Scope Of Continuity Plan Must Contain Clear And Realistic Recovery Objectives And Recovery Timeframes:

Elements:
- Recovery process must be aligned to support business objectives
- Business impact and recovery investments must have a direct relationship
- Recovery times and objectives need to be communicated and validated
- The disasters that the continuity plan will and will not address must be defined
- Scope of planning efforts must be stated
Benefits:
- Clear objectives
- Defined scope of efforts
- Expectations are agreed and defined
- Coordinated recovery efforts

Core Principle 2 - Risk Management And Disaster Avoidance Measures Should Be In Place And Practised:

Elements:
- Ensure that the environment is constructed and operated to prevent potential disasters
- As infrastructure and business needs change, ensure that risks and exposures are addressed
Benefits:
- Control of preventable, predictable disasters
- Minimising and deterring potential disasters

Core Principle 3 - Continuity Plan Including Disaster Recovery Should Be Designed And Developed To Support Recovery Of Agreed Critical Business Functions:

Elements:
- Invest in adequate preventative, proactive and recovery methods for critical business functions
- All business functions and their criticality must be defined and communicated to the organisation
- Key customers must be reassured about the continuity management process
Benefits:
- Expectations are set and agreed upon
- Significant financial, legal and operational losses to the organisation are minimised

Core Principle 4 - Continuity Plan Should Be Rehearsed Regularly:

Elements:
- Regular rehearsals must be conducted, both planned and unannounced
- Partial and full rehearsals must be conducted
- A variety of rehearsal techniques must be used
- Rehearsal objectives and success criteria must be clearly defined
Benefits:
- Higher likelihood of successful recovery
- Reinforces learning and commitment
- Demonstrates value to organisation
- Identifies potential weaknesses in the plan

Core Principle 5 - Continuity And Recovery Strategies Or Plans Should Be Integrated Into Design And Deployment Of Changes To Infrastructure:

Elements:
- Plans for changes to infrastructure must be considered with continuity in mind
- Recovery procedures must be requested for new applications, systems and networks
Benefits:
- Continuity is a critical component of the operating environment
- Continuity strategies and plans play an important role in design and deployment decisions and plans

Core Principle 6 - Continuity And Recovery Processes Or Plans Should Be Reviewed And Updated On A Regular Basis:

Elements:
- Regular reviews of continuity plans must be defined and scheduled
- Reviewers must be objective and must not have been involved in developing the plan
- Plan updates must be integrated into the change management process
- Revision tracking and distribution lists must be defined and documented
Benefits:
- Keeps the continuity plan a living document
- Ensures the plan is kept current
- Reminds the organisation of the continuing purpose of the plan and its benefits

Use Core Principles as Checklist for Independent Verification of Continuity Design and Processes:

1 Scope Of Continuity Plan Must Contain Clear And Realistic Recovery Objectives And Recovery Timeframes R
1.1 Recovery process must be aligned to support business objectives R
1.2 Business impact and recovery investments must have a direct relationship R
1.3 Recovery times and objectives need to be communicated and validated R
1.4 The disasters that the continuity plan will and will not address must be defined R
2 Risk Management And Disaster Avoidance Measures Should Be In Place And Practised R
2.1 Ensure that the environment is constructed and operated to prevent potential disasters R
2.2 As infrastructure and business needs change, ensure that risks and exposures are addressed R
3 Continuity Plan Including Disaster Recovery Should Be Designed And Developed To Support Recovery Of Agreed Critical Business Functions R
3.1 Invest in adequate preventative, proactive and recovery methods for critical business functions R
3.2 All business functions and their criticality must be defined and communicated to the organisation R
3.3 Key customers must be reassured about the continuity management process R
4 Continuity Plan Should Be Rehearsed Regularly R
4.1 Regular rehearsals must be conducted, both planned and unannounced R
4.2 Partial and full rehearsals must be conducted R
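An independent reviewer working through the checklist needs a summary of where the design falls short. A minimal sketch follows. It assumes the reviewer rates each numbered item as "met", "partial" or "not met". These ratings are hypothetical, and the sketch does not attempt to model the checklist's "R" column, whose meaning is not stated.

```python
from collections import Counter
from typing import Dict, List, Tuple

STATUSES = ("met", "partial", "not met")  # hypothetical reviewer ratings


def summarise_checklist(assessments: Dict[str, str]) -> Tuple[Dict[str, Counter], List[str]]:
    """Count ratings per core principle and list the items that are not fully met.

    assessments maps a checklist item id such as "1.2" to the reviewer's rating.
    """
    by_principle: Dict[str, Counter] = {}
    for item_id, status in assessments.items():
        if status not in STATUSES:
            raise ValueError(f"unknown status {status!r} for item {item_id}")
        principle = item_id.split(".")[0]  # "1.2" -> principle "1"
        by_principle.setdefault(principle, Counter())[status] += 1
    gaps = [item_id for item_id, status in assessments.items() if status != "met"]
    return by_principle, gaps


assessments = {"1.1": "met", "1.2": "not met", "1.3": "met", "2.1": "partial", "4.1": "met"}
summary, gaps = summarise_checklist(assessments)
print(summary["1"])  # Counter({'met': 2, 'not met': 1})
print(gaps)          # ['1.2', '2.1']
```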

Process Quality Control:


Common Process Quality Control Procedures for Availability and Continuity:

Continuity Process Quality Control
1. Continuity Reporting
2. Continuity Report Evaluation and Improvement
3. Management Escalations of Service Continuity Violations

Availability Process Quality Control
1. Availability Reporting
2. Availability Report Evaluation and Improvement
3. Management Escalations of Service Availability Violations

Structured Approach to Availability and Continuity Process Quality Control:


Step 1 - Generate Report Metrics and Reports:

1. Generate Report Metrics and Reports
Scope: Generate report metrics and periodic and ad hoc reports as required or planned
Inputs: Report schedule; requests for ad hoc reports
Outputs: Generated and distributed reports

1.1 Develop Management Reports Based on Agreed Metrics
Scope: Report to management the contributions made by this process to overall service management
Inputs: Report requirements
Outputs: Accepted reports, frequency and costs

1.2 Schedule Report
Scope: Update the report schedule
Inputs: Report schedule
Outputs: Updated report schedule

1.3 Generate Reports
Scope: Generate reports according to the schedule or in response to ad hoc requests
Inputs: Collected metrics
Outputs: Generated reports

1.4 Distribute Reports
Scope: Distribute the generated reports to the target recipients
Inputs: Generated reports
Outputs: Distributed reports

1.5 Review Report Schedule
Scope: Regularly review the report requirements
Inputs: Report schedule; report details
Outputs: Review results

1.6 Update Reporting Schedule
Scope: Update the report schedule with the new reports
Inputs: Report schedule
Outputs: Updated report schedule
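A core metric for availability reporting is the availability ratio: the proportion of the agreed service hours during which the service was actually available. A minimal sketch follows that computes it per service and flags services below target as candidates for management escalation. It assumes a single hypothetical target and a month of 22 working days of 10 agreed service hours, which gives 13,200 minutes. The service names and downtime figures are illustrative.

```python
from typing import Dict, List, Tuple


def availability_percent(agreed_service_minutes: float, downtime_minutes: float) -> float:
    """Availability ratio as a percentage of the agreed service time."""
    if agreed_service_minutes <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_minutes <= agreed_service_minutes:
        raise ValueError("downtime must lie within the agreed service time")
    return 100 * (agreed_service_minutes - downtime_minutes) / agreed_service_minutes


def availability_report(services: Dict[str, Tuple[float, float]],
                        target_percent: float) -> Tuple[List[str], List[str]]:
    """services: name -> (agreed service minutes, downtime minutes within service hours).

    Returns report lines and the services below target (candidates for escalation).
    """
    lines, violations = [], []
    for name, (agreed, down) in services.items():
        pct = availability_percent(agreed, down)
        flag = ""
        if pct < target_percent:
            violations.append(name)
            flag = "  << below target"
        lines.append(f"{name}: {pct:.2f}% (target {target_percent}%){flag}")
    return lines, violations


# 22 working days x 10 agreed service hours x 60 minutes = 13,200 minutes (hypothetical)
services = {"Order processing": (13_200, 66), "Customer portal": (13_200, 264)}
lines, violations = availability_report(services, target_percent=99.0)
print("\n".join(lines))
```

Only downtime that falls within the agreed service hours is counted, in keeping with the definition of the availability ratio.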

Step 2 - Evaluation and Improvement:

2. Evaluation and Improvement
Scope: Perform periodic reviews to improve process performance
Inputs: Process metrics; future directives; service level expectations; review schedule
Outputs: Improvement plan; implemented improvements; reduced costs; improved process efficiency and effectiveness

2.1 Evaluate Process for Improvement
Scope: Regularly review the effectiveness and efficiency of the continuity management process
Inputs: Improvement plan
Outputs: Gap analysis report

2.2 Develop Improvements and Implementation Plan
Scope: Develop and review proposed process improvements
Inputs: Improvement plan; gap analysis report; revised business requirements
Outputs: Improvement strategy

2.3 Create and Submit Improvement Implementation Plan
Scope: Create and submit the improvement implementation plan
Inputs: Improvement strategy
Outputs: Submitted improvement implementation plan

2.4 Implement Improvement Plan
Scope: Manage and coordinate the implementation of the process improvement plan
Inputs: Approved improvement implementation plan; improvement strategy
Outputs: Implemented improvements; reduced costs; improved process efficiency and effectiveness

2.5 Review Implementation
Scope: Monitor implementation to ensure that the process is not disrupted and that the changes are working as intended
Inputs: Implemented improvements
Outputs: Closed improvement implementation plan; review results

2.6 Update Process Improvement Plan
Scope: Update the process improvement plan with any changes
Inputs: Process improvement plan; review cycle
Outputs: Updated process improvement plan
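The gap analysis report in step 2.1 can start from a comparison of current process metrics against their targets. A minimal sketch follows. It assumes each metric is marked as either "higher is better" or "lower is better". The metric names, values and targets are hypothetical.

```python
from typing import Dict, List, Tuple


def gap_analysis(current: Dict[str, float], targets: Dict[str, float],
                 higher_is_better: Dict[str, bool]) -> List[Tuple[str, float, float]]:
    """Return (metric, current value, target) for every metric that misses its target."""
    gaps = []
    for metric, target in targets.items():
        value = current[metric]
        meets = value >= target if higher_is_better[metric] else value <= target
        if not meets:
            gaps.append((metric, value, target))
    return gaps


# Hypothetical process metrics and targets
current = {"Availability %": 98.9, "Mean time to restore (min)": 95,
           "DRP sections reviewed on time %": 100}
targets = {"Availability %": 99.5, "Mean time to restore (min)": 60,
           "DRP sections reviewed on time %": 90}
higher_is_better = {"Availability %": True, "Mean time to restore (min)": False,
                    "DRP sections reviewed on time %": True}

for metric, value, target in gap_analysis(current, targets, higher_is_better):
    print(f"{metric}: {value} against target {target}")
```

Each gap found would then feed the improvement strategy developed in step 2.2.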

Summary:

- Availability and continuity are merging into a single unbroken requirement
- Availability and continuity can be a significant overhead to an organisation, so their cost should yield benefits elsewhere
- Most business systems and processes are defined as business critical
- Management commitment is needed to ensure that availability and continuity receive the required attention and resources
- Use the core principles for availability and continuity for independent verification of processes and designs
- Availability and continuity should be embedded into system architectures and designs rather than treated as an afterthought

More Information:

Alan McSweeney
alan@alanmcsweeney.com