Demystifying Business Continuity & Disaster Recovery
This guide has been produced by Probrand together with Oval Insurance Broking to help demystify the jargon of business continuity & disaster recovery. Some, such as Denis Goulet, an internationally recognised expert in Business Continuity Management, would argue that business continuity is a necessary part of doing business. Indeed, he says, “business continuity is part of the cost of doing business. We won’t throw millions at this if we don’t have to, but if we have to, we will.”
Whilst Goulet espouses an extreme point of view, it is clearly every Director's responsibility, under Corporate Governance, to manage a business competently. This includes working to ensure that the business thrives and survives, which demands a Business Continuity Plan (BCP). Indeed, many financial auditors are starting to refuse sign-off on annual accounts unless a BCP is in place.
This guide therefore aims to assist that thought process by offering a template and tips for producing your own business continuity plan. Following this guide will not necessarily enable you to meet any standard or make you a disaster recovery expert, but it should make you think about the issues and give you some pointers on how to protect your business in the future. After all, you know your business, and how much it means to you, far better than we do.
Definitions or “Okay, what’s this malarkey all about then?”
Business continuity is the activity performed by an organisation to ensure that critical business functions will be available to customers, suppliers, regulators, and other entities that must have access to those functions. These activities include many daily chores such as project management, system backups, change control, and help desk. Business continuity is not something implemented at the time of a disaster; business continuity refers to those activities performed daily to maintain service, consistency, and recoverability.
The foundations of business continuity are the standards, program development and supporting policies, guidelines and procedures needed to ensure a firm can continue to function effectively, irrespective of adverse circumstances or events. All system design, implementation, support and maintenance must be based on this foundation in order to have any hope of achieving business continuity, disaster recovery or, in some cases, system support. Business continuity is sometimes confused with disaster recovery, but they are separate entities: disaster recovery is a small subset of business continuity. Business continuity is also sometimes confused with Work Area Recovery (recovering from loss of the physical building in which the business is conducted), which is another part of business continuity.
The term Business Continuity describes a mentality or methodology of conducting day-to-day business, whereas business continuity planning is an activity of determining what that methodology should be. The business continuity plan may be thought of as the incarnation of a methodology that is followed by everyone in an organisation on a daily basis to ensure normal operations.
Business Continuity Planning (BCP) “identifies an organisation’s exposure to internal and external threats and synthesizes hard and soft assets to provide effective prevention and recovery for the organization, while maintaining competitive advantage and value system integrity”. It is also called business continuity and resiliency planning (BCRP). A business continuity plan is a roadmap for continuing operations under adverse conditions (i.e. interruption from natural or man-made hazards). BCP is an ongoing state or methodology governing how business is conducted.
BCP is working out how to continue operations, or the delivery of services, during disruptions or interruptions resulting from events such as fires, floods, power outages, theft, vandalism, earthquakes and pandemics. In fact, any event that could impact operations should be considered, such as supply chain interruption or loss of, or damage to, critical infrastructure (major machinery or computing/network resources). As such, BCP is the result of applying risk management thinking to your organisation.
Disaster recovery (DR) is a subset of business continuity and comprises the processes, policies and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster. Explicitly, disaster recovery focuses on the IT or technology systems that support business functions. Note – the same DR principles can also be applied to resources other than IT.
Applicable Standards or “Surely some government bod has got into this big-time?”
The British Standards Institution (BSI) has produced BS 25999, a business continuity management (BCM) standard formed of two parts. The first, “BS 25999-1:2006 Business Continuity Management. Code of Practice”, takes the form of general guidance and seeks to establish processes, principles and terminology for business continuity management.
The second, “BS 25999-2:2007 Specification for Business Continuity Management”, specifies requirements for implementing, operating and improving a documented Business Continuity Management System (BCMS), describing only requirements that can be objectively and independently audited.
With the advent of ISO 22301, it has been agreed that BS 25999-2 will be withdrawn by the end of 2012; however, no decision has yet been made on the future of BS 25999-1.
So where do we start?
Where we always should, with a plan.
Business Impact Analysis
When business is disrupted, it can cost money. Lost revenues plus extra expenses mean reduced profits. Insurance does not cover all costs and cannot replace customers that defect to the competition, so a plan to keep the business running is essential. Development of a business continuity plan includes four steps:
- Conduct a business impact analysis to identify time-sensitive or critical business functions and processes and the resources that support them.
- Identify, document and implement strategies to recover critical business functions and processes.
- Organise a business continuity team and compile a business continuity plan to manage a business disruption.
- Train the business continuity team, and conduct testing and exercises to evaluate the recovery strategies and the plan.
Information technology (IT) includes many components such as networks, servers, desktop and laptop computers and wireless devices. The ability to run both office productivity and enterprise software is critical.
Therefore, recovery strategies for information technology should be developed so technology can be restored in time to meet the needs of the business. Manual workarounds should be part of the IT plan so business can continue while computer systems are being restored.
Business impact analysis identifies the effects resulting from disruption of business functions and processes. The information it gathers is then used to make decisions about recovery priorities and strategies.
The business impact analysis worksheet can be used to capture this information. The worksheet should be completed by business function and process managers with sufficient knowledge of the business. Once all worksheets are completed, they can be tabulated to summarise:
- the operational and financial impacts resulting from the loss of individual business functions and processes
- the point in time when loss of a function or process would result in the identified business impacts.
Those functions or processes with the highest potential operational and financial impacts become priorities for restoration. The point in time when a function or process must be recovered, before unacceptable consequences could occur, is often referred to as the “Recovery Time Objective.”
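If your worksheets are kept as spreadsheets, the tabulation can be scripted. The Python sketch below is illustrative only: it assumes each worksheet row records, for one business function, an estimated cumulative financial impact after several outage durations, plus a tolerance figure agreed by management (the class and field names, such as `impact_by_hours` and `tolerance`, and all the figures are hypothetical). It treats the first assessed duration at which impact exceeds tolerance as the point by which the function must be recovered, and orders functions for restoration by that time, breaking ties by the larger potential impact.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WorksheetRow:
    """One row of a hypothetical BIA worksheet: a single business function."""
    function: str
    # Estimated cumulative impact (GBP) after each assessed outage duration (hours).
    impact_by_hours: dict[int, float]
    # Impact (GBP) beyond which management considers the loss unacceptable.
    tolerance: float


def recovery_time_objective(row: WorksheetRow) -> int | None:
    """Earliest assessed duration (hours) at which impact exceeds tolerance."""
    for hours in sorted(row.impact_by_hours):
        if row.impact_by_hours[hours] > row.tolerance:
            return hours
    return None  # impact stays tolerable for every period assessed


def prioritise(rows: list[WorksheetRow]) -> list[tuple[str, int | None]]:
    """Restoration order: shortest RTO first; ties broken by largest worst-case impact."""
    def sort_key(row: WorksheetRow) -> tuple[float, float]:
        rto = recovery_time_objective(row)
        worst = max(row.impact_by_hours.values())
        return (rto if rto is not None else float("inf"), -worst)

    return [(row.function, recovery_time_objective(row)) for row in sorted(rows, key=sort_key)]


# Illustrative figures only, not real data.
rows = [
    WorksheetRow("Order processing", {4: 2_000, 24: 15_000, 72: 60_000}, tolerance=10_000),
    WorksheetRow("Payroll", {4: 0, 24: 500, 72: 5_000, 168: 40_000}, tolerance=20_000),
    WorksheetRow("Customer support", {4: 1_000, 24: 12_000, 72: 30_000}, tolerance=10_000),
]
print(prioritise(rows))
# [('Order processing', 24), ('Customer support', 24), ('Payroll', 168)]
```

Here order processing and customer support both become unacceptable at 24 hours, so the larger worst-case impact puts order processing first. In practice you would set the RTO with some margin before the point the worksheet identifies, and the tolerance figures remain a management decision rather than something a script can work out.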
Resources Required to Support Recovery Strategies
Following an incident that disrupts business operations, resources will be needed to carry out recovery strategies and to restore normal business operations. Resources can come from within the business or be provided by third parties. Resources include:
- Office space, furniture and equipment
- Technology (computers, peripherals, communication equipment, software and data)
- Vital records (electronic and hard copy)
- Production facilities, machinery and equipment
- Inventory including raw materials, finished goods and goods in production.
- Utilities (power, natural gas, water, sewer, telephone, internet, wireless)
- Third party services
Since not all resources can be replaced immediately following a loss, managers should estimate the resources that will be needed in the hours, days and weeks following an incident.
Conducting the Business Impact Analysis
The enclosed worksheets should be distributed to business process managers along with instructions about the process and how the information will be used. After all managers have completed their worksheets, the information should be reviewed and any gaps or inconsistencies identified. Meetings with individual managers should then be held to clarify information and obtain anything missing.
After all worksheets have been completed and validated, the priorities for restoration of business processes should be identified. Primary and dependent resource requirements should also be identified. This information will be used to develop recovery strategies.
What’s the difference between a Risk Assessment & a Business Impact Analysis?
A risk assessment is a process to identify potential hazards and analyze what could happen if a hazard occurs.
A business impact analysis (BIA) is the process for determining the potential impacts resulting from the interruption of time sensitive or critical business processes.
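There are numerous hazards to consider. For each hazard there are many possible scenarios that could unfold depending on the timing, magnitude and location of the hazard. Consider hurricanes: the damage a hurricane causes depends on when it strikes, how strong it is and where it makes landfall.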
There are many “assets” at risk from hazards. Injuries to people should be the first consideration of the risk assessment. Hazard scenarios that could cause significant injuries should be highlighted to ensure that appropriate emergency plans are in place. Many physical assets may also be at risk, including buildings, information technology, utility systems, machinery, raw materials and finished goods. The potential for environmental impact should also be considered. Consider the impact an incident could have on your relationships with customers, the surrounding community and other stakeholders, and consider situations that would cause customers to lose confidence in your organization and its products or services.
As you conduct the risk assessment, look for vulnerabilities: weaknesses that would make an asset more susceptible to damage from a hazard. Vulnerabilities include deficiencies in building construction, process systems, security, protection systems and loss prevention programs. They contribute to the severity of damage when an incident occurs. For example, a building without a fire sprinkler system could burn to the ground, while a building with a properly designed, installed and maintained fire sprinkler system would suffer limited fire damage.
The impacts from hazards can be reduced by investing in mitigation. If there is a potential for significant impacts, then creating a mitigation strategy should be a high priority.
Use the Risk Assessment Table to undertake your own risk assessment. Instructions for completing it are included within that document.
Note – avoid spending effort on issues that can be managed by normal controls. The BIA looks for low-probability, high-impact issues that would need major effort to recover from.
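The Risk Assessment Table's own instructions are not reproduced in this guide, so the sketch below simply assumes a 1-to-5 scale for likelihood and impact (a common convention, not one this guide prescribes) and a hypothetical cut-off, `impact_threshold=4`, separating issues for the BIA from those left to normal controls. The hazards and ratings are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hazard:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # A common (not mandated) way to rank risks: likelihood multiplied by impact.
        return self.likelihood * self.impact


def needs_continuity_planning(hazard: Hazard, impact_threshold: int = 4) -> bool:
    """High-impact hazards go to the BIA however unlikely; the rest are left to normal controls."""
    return hazard.impact >= impact_threshold


# Hypothetical register entries for illustration.
register = [
    Hazard("Printer failure", likelihood=4, impact=1),
    Hazard("Office fire", likelihood=1, impact=5),
    Hazard("Flood at main site", likelihood=2, impact=5),
    Hazard("Short staff absence", likelihood=5, impact=2),
]

for hazard in sorted(register, key=lambda h: h.score, reverse=True):
    route = "continuity plan" if needs_continuity_planning(hazard) else "normal controls"
    print(f"{hazard.name:<20} score={hazard.score:>2} -> {route}")
```

Ranking by score alone puts a routine staff absence level with a flood at the main site (both score 10); filtering on impact is what keeps the continuity effort focused on low-probability, high-impact events such as the office fire.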
If a facility is damaged, production machinery breaks down, a supplier fails to deliver or information technology is disrupted, business is impacted and the financial losses can begin to grow. Recovery strategies are alternate means to restore business operations to a minimum acceptable level following a business disruption and are prioritized by the recovery time objectives (RTO) developed during the business impact analysis.
Recovery strategies require resources including people, facilities, equipment, materials and information technology. An analysis of the resources required to execute recovery strategies should be conducted to identify gaps. For example, if a machine fails but other machines are readily available to make up lost production, then there is no resource gap. However, if all machines are lost due to a flood and there is insufficient undamaged inventory to meet customer demand until production is restored, there is a gap; lost production might then be made up by machines at another facility, whether owned or contracted.
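At its simplest, the gap analysis compares what a recovery strategy needs with what would survive a given incident. A minimal Python sketch, using the machine example above with hypothetical quantities (a plant assumed to need four machines and eight operators to meet demand); the function name `resource_gaps` is illustrative:

```python
def resource_gaps(required: dict[str, int], available: dict[str, int]) -> dict[str, int]:
    """Shortfall per resource where what survives the incident is less than what recovery needs."""
    return {
        name: need - available.get(name, 0)
        for name, need in required.items()
        if available.get(name, 0) < need
    }


# Hypothetical plant that needs 4 machines and 8 operators running to meet demand.
required = {"production machines": 4, "machine operators": 8}

one_machine_fails = {"production machines": 4, "machine operators": 8}  # 5 machines, 1 lost
site_flooded = {"machine operators": 8}                                  # all machines lost

print(resource_gaps(required, one_machine_fails))  # {}
print(resource_gaps(required, site_flooded))       # {'production machines': 4}
```

A non-empty result is a gap that the recovery strategy has to close, for instance with machines at another facility, whether owned or contracted.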
Strategies may involve contracting with third parties, entering into partnership or reciprocal agreements or displacing other activities within the company. Staff with in-depth knowledge of business functions and processes are in the best position to determine what will work. Possible alternatives should be explored and presented to management for approval and to decide how much to spend.
Depending upon the size of the company and resources available, there may be many recovery strategies that can be explored. Utilization of other owned or controlled facilities performing similar work is one option. Operations may be relocated to an alternate site, assuming both are not impacted by the same incident. This strategy also assumes that the surviving site has the resources and capacity to assume the work of the impacted site. If capacity at the second site is inadequate, production or service levels would need to be prioritized, additional staff and resources provided, or other action taken.
Telecommuting is a strategy employed when staff can work from home through remote connectivity. It can be used in combination with other strategies to reduce alternate site requirements. This strategy requires ensuring telecommuters have a suitable home work environment and are equipped with or have access to a computer with required applications and data, peripherals, and a secure broadband connection.
In an emergency, space at another facility can be put to use. Cafeterias, conference rooms and training rooms can be converted to office space or to other uses when needed. Converted space would need to be equipped with furnishings, equipment, power, connectivity and other resources to meet the needs of workers.
Partnership or reciprocal agreements can be arranged with other businesses or organizations that can support each other in the event of a disaster. Assuming space is available, issues such as the capacity and connectivity of telecommunications and information technology, protection of privacy and intellectual property, the impacts to each other’s operation and allocating expenses must be addressed. Agreements should be negotiated in writing and documented in the business continuity plan. Periodic review of the agreement is needed to determine if there is a change in the ability of each party to support the other.
There are many vendors that support business continuity and information technology recovery strategies. External suppliers can provide a full business environment including office space and live data centres ready to be occupied. Other options include provision of technology equipped office trailers, replacement machinery and other equipment. The availability and cost of these options can be affected when a regional disaster results in competition for these resources.
There are multiple strategies for recovery of manufacturing operations. Many of these strategies include use of existing owned or leased facilities. Manufacturing strategies include:
- Shifting production from one facility to another
- Increasing manufacturing output at operational facilities
- Retooling production from one item to another
- Prioritization of production—by profit margin or customer relationship
- Maintaining higher raw materials or finished goods inventory
- Reallocating existing inventory, repurchase or buyback of inventory
- Limiting orders (e.g., maximum order size or unit quantity)
- Contracting with third parties
- Purchasing business interruption insurance
There are many factors to consider in manufacturing recovery strategies:
- Will a facility be available when needed?
- How much time will it take to shift production from one product to another?
- How much will it cost to shift production from one product to another?
- How much revenue would be lost when displacing other production?
- How much extra time will it take to receive raw materials or ship finished goods to customers? Will the extra time impact customer relationships?
- Are there any regulations that would restrict shifting production?
- What quality issues could arise if production is shifted or outsourced?
- Are there any long-term consequences associated with a strategy?
IT Disaster Recovery Plan
Businesses use information technology to quickly and effectively process information. Employees use electronic mail and Voice Over Internet Protocol (VOIP) telephone systems to communicate. Electronic data interchange (EDI) is used to transmit data including orders and payments from one company to another. Servers process information and store large amounts of data. Desktop computers, laptops and wireless devices are used by employees to create, process, manage and communicate information. What do you do when your information technology stops working?
An information technology disaster recovery plan (IT DRP) should be developed in conjunction with the business continuity plan. Priorities and recovery time objectives for information technology should be developed during the business impact analysis. Technology recovery strategies should be developed to restore hardware, applications and data in time to meet the needs of the business recovery.
Businesses large and small create and manage large volumes of electronic information or data. Much of that data is important. Some data is vital to the survival and continued operation of the business. The impact of data loss or corruption from hardware failure, human error, hacking or malware could be significant. A plan for data backup and restoration of electronic information is essential.
IT Recovery Strategies
Recovery strategies should be developed for Information technology (IT) systems, applications and data. This includes networks, servers, desktops, laptops, wireless devices, data and connectivity. Priorities for IT recovery should be consistent with the priorities for recovery of business functions and processes that were developed during the business impact analysis. IT resources required to support time-sensitive business functions and processes should also be identified. The recovery time for an IT resource should match the recovery time objective for the business function or process that depends on the IT resource.
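Because one IT resource often supports several business functions, its recovery time has to meet the tightest RTO among them. The sketch below, with hypothetical functions, systems and hours, works out that requirement for each IT resource and flags any whose expected recovery time (taken from your IT recovery strategy) is too slow. A resource with no recovery estimate is flagged deliberately, since an unknown recovery time cannot be shown to meet the RTO.

```python
import math


def required_it_rto(functions: dict[str, tuple[float, list[str]]]) -> dict[str, float]:
    """For each IT resource, the tightest RTO (hours) of any business function that depends on it."""
    required: dict[str, float] = {}
    for rto, dependencies in functions.values():
        for resource in dependencies:
            required[resource] = min(rto, required.get(resource, math.inf))
    return required


def rto_shortfalls(
    functions: dict[str, tuple[float, list[str]]],
    it_recovery_hours: dict[str, float],
) -> dict[str, tuple[float, float]]:
    """IT resources whose expected recovery time exceeds the RTO they must meet.

    Returns {resource: (expected recovery hours, required RTO hours)}. A resource
    with no recovery estimate is treated as infinitely slow so that it is flagged.
    """
    return {
        resource: (it_recovery_hours.get(resource, math.inf), needed)
        for resource, needed in required_it_rto(functions).items()
        if it_recovery_hours.get(resource, math.inf) > needed
    }


# Hypothetical BIA output: function -> (RTO in hours, IT resources it depends on).
functions = {
    "Order processing": (24, ["ERP server", "Email"]),
    "Payroll": (168, ["Sage accounts"]),
    "Customer support": (8, ["Email", "VoIP"]),
}
# Hypothetical recovery-time estimates from the IT recovery strategy.
it_recovery_hours = {"ERP server": 48, "Email": 4, "Sage accounts": 72, "VoIP": 12}

print(rto_shortfalls(functions, it_recovery_hours))
# {'ERP server': (48, 24), 'VoIP': (12, 8)}
```

Email is shared by order processing (24 hours) and customer support (8 hours), so it must be recoverable within 8 hours; at an estimated 4 hours it passes, whereas the ERP server and VoIP would need a faster strategy or a manual workaround.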
Information technology systems require hardware, software, data and connectivity. Without any one component of the “system”, the system may not run. Therefore, recovery strategies should be developed to anticipate the loss of one or more of the following system components:
- Computer room environment (secure computer room with climate control, conditioned and backup power supply, etc.)
- Hardware (networks, servers, desktop and laptop computers, wireless devices and peripherals)
- Connectivity to a service provider (fibre, cable, wireless, etc.)
- Software applications (electronic data interchange, e-mail, ERP systems, financial systems such as Sage, etc.)
- Data and restoration
Some business applications cannot tolerate any downtime. They use dual data centres capable of handling all data processing needs, which run in parallel with data mirrored or synchronised between the two centres. This is a very expensive solution that only larger companies can afford. However, there are other solutions available for small to medium-sized businesses with critical business applications and data to protect. These include services whereby each of your critical servers is imaged, so that a replica of its exact configuration is stored off-site at the service provider’s location. At the onset of an emergency, the “replica” server can be brought online and, with copies of the latest tapes or with data stored in the Cloud, critical business applications can be back up and functioning within four hours of the provider receiving that data, securely available to nominated members of staff with internet access.
Internal Recovery Strategies
Many businesses have access to more than one facility. Hardware at an alternate facility can be configured to run the same software applications as the primary site when needed. Assuming data is backed up off-site or mirrored between the two sites, data can be restored at the alternate site and operations can continue.
Vendor Supported Recovery Strategies
There are vendors that can provide “hot sites” for IT disaster recovery. These sites are fully configured data centres with commonly used hardware and software products. Subscribers may provide unique equipment or software either at the time of disaster or store it at the hot site ready for use.
Data streams, data security services and applications can be hosted and managed by vendors. This information can be accessed at the primary business site or any alternate site using a web browser. If an outage is detected at the client site by the vendor, the vendor automatically holds data until the client’s system is restored. These vendors can also provide data filtering and detection of malware threats, which enhance cyber security.
Developing an IT Disaster Recovery Plan
Businesses should develop an IT disaster recovery plan. It begins by compiling an inventory of hardware (e.g. servers, desktops, laptops and wireless devices), software applications and data. The plan should include a strategy to ensure that all critical information is backed up.
Identify critical software applications and data and the hardware required to run them. Using standardized hardware will help to replicate and reimage new hardware. Ensure that copies of program software are available to enable re-installation on replacement equipment. Prioritize hardware and software restoration.
Document the IT disaster recovery plan as part of the business continuity plan. Test the plan periodically to make sure that it works.
Data Backup
Businesses generate large amounts of data and data files are changing throughout the workday. Data can be lost, corrupted, compromised or stolen through hardware failure, human error, hacking and malware. Loss or corruption of data could result in significant business disruption.
Data backup and recovery should be an integral part of the business continuity plan and information technology disaster recovery plan. Developing a data backup strategy begins with identifying what data to back up, selecting and implementing backup hardware and software, scheduling and conducting backups, and periodically validating that data has been accurately backed up.
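The last step, validating that data has been accurately backed up, can be partly automated by comparing checksums of the source files with those of the backup copy. The sketch below assumes the backup is a plain file copy reachable as a directory (for example a mounted USB drive or a test restore); proprietary backup formats would instead need the vendor's own verification tools. It uses only the Python standard library, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> list[str]:
    """Relative paths of source files that are missing from, or differ in, the backup copy."""
    problems = []
    for source_file in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_dir)
        backup_file = backup_dir / relative
        if not backup_file.is_file() or sha256_of(backup_file) != sha256_of(source_file):
            problems.append(relative.as_posix())
    return problems
```

Run it immediately after a backup completes, or against a test restore, because files edited since the backup will legitimately show as different. A checksum match shows the copy is intact; it does not prove that an application can be restored from it, so periodic test restores are still needed.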
Developing the Data Backup Plan
Identify data on network servers, desktop computers, laptop computers and wireless devices that needs to be backed up along with other hard copy records and information. The plan should include regularly scheduled backups from wireless devices, laptop computers and desktop computers to a network server. Data on the server can then be backed up. Backing up hard copy vital records can be accomplished by scanning paper records into digital formats and allowing them to be backed up along with other digital data.
Options for Data Backup
Tapes, cartridges and large capacity USB drives with integrated data backup software are effective means for businesses to backup data. The frequency of backups, security of the backups and secure off-site storage should be addressed in the plan. Backups should be stored with the same level of security as the original data.
Many vendors offer online data backup services, including storage in the “cloud”. This is a cost-effective solution for businesses with an internet connection: software installed on the client server or computer backs up data automatically.
Data should be backed up as frequently as necessary to ensure that, if data is lost, the amount lost is acceptable to the business. The business impact analysis should evaluate the potential for lost data and define the “recovery point objective” (RPO): the maximum period of data loss, measured back from the moment of disruption, that the business can tolerate. Data restoration times should be confirmed and compared with the IT and business function recovery time objectives.
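The comparison described here reduces to two checks: the backup interval against the RPO (in the worst case, a failure just before the next backup loses everything since the last one) and the measured restore time against the RTO. A minimal sketch with hypothetical figures; it simplifies by ignoring how long the backup job itself takes to run and to reach off-site storage, which in practice adds to the potential data loss.

```python
from datetime import timedelta


def check_backup_plan(
    backup_interval: timedelta,
    rpo: timedelta,
    restore_time: timedelta,
    rto: timedelta,
) -> list[str]:
    """Return the ways a backup schedule fails the RPO/RTO set in the BIA (empty list = OK)."""
    issues = []
    # Worst case: failure just before the next backup loses everything since the last one.
    # (Simplification: ignores how long the backup job itself takes to complete.)
    if backup_interval > rpo:
        issues.append(f"up to {backup_interval} of data could be lost; RPO is {rpo}")
    if restore_time > rto:
        issues.append(f"restore takes {restore_time}; RTO is {rto}")
    return issues


# Hypothetical figures: nightly tape backup against a 4-hour RPO and 8-hour RTO.
print(check_backup_plan(timedelta(hours=24), timedelta(hours=4),
                        timedelta(hours=6), timedelta(hours=8)))
# Hourly cloud backup with the same restore time passes both checks.
print(check_backup_plan(timedelta(hours=1), timedelta(hours=4),
                        timedelta(hours=6), timedelta(hours=8)))  # []
```

The restore time fed into the check should come from an actual test restore rather than a vendor's estimate.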
You now need to pull together all the information from the BIA and the recovery strategy phase of the project to produce a document outlining a summary of the resource requirements, comparison of alternatives, recommended alternatives, and the timetable, estimated resources and action steps required to implement the recommended alternatives.
You can then use this report to develop the business continuity plan to assist in the delivery of an efficient and timely recovery of your business following a business interruption or disaster.
The detailed plan should be time-sliced (organised by phase) and should comprise three key sections:
- Emergency response - Stabilise the incident
- Crisis Management - Mitigate or reduce the effects of a crisis
- Business Recovery - To full recovery
In addition to the three key sections, the plan needs to contain detailed procedures and tasks required to recover the critical processes within the agreed recovery time frame, for example:
Emergency Response Plan
The actions taken in the initial minutes of an emergency are critical. A prompt warning to employees to evacuate, shelter or lockdown can save lives. A call for help to public emergency services that provides full and accurate information will help the dispatcher send the right responders and equipment. An employee trained to administer first aid or perform CPR can be lifesaving. Action by employees with knowledge of building and process systems can help control a leak and minimize damage to the facility and the environment.
The first step when developing an emergency response plan is to conduct a risk assessment to identify potential emergency scenarios. An understanding of what can happen will enable you to determine resource requirements and to develop plans and procedures to prepare your business. The emergency plan should be consistent with your performance objectives.
At the very least, every facility should develop and implement an emergency plan for protecting employees, visitors, contractors and anyone else in the facility. This part of the emergency plan is concerned with developing plans for the safety of life and includes building evacuation (“fire drills”).
When an emergency occurs, the first priority is always life safety. The second priority is the stabilization of the incident. There are many actions that can be taken to stabilize an incident and minimize potential damage. First aid and CPR by trained employees can save lives. Use of fire extinguishers by trained employees can extinguish a small fire. Containment of a small chemical spill and supervision of building utilities and systems can minimize damage to a building and help prevent environmental damage.
Crisis Communications Plan
When an emergency occurs, the need to communicate is immediate. If business operations are disrupted, customers will want to know how they will be impacted. Regulators may need to be notified and local government officials will want to know what is going on in their community.
Employees and their families will be concerned and want information. Neighbours living near the facility may need information - especially if they are threatened by the incident. All of these “audiences” will want information before the business has a chance to begin communicating.
An important component of the preparedness program is the crisis communications plan. A business must be able to respond promptly, accurately and confidently during an emergency and in the hours and days that follow.
Many different audiences must be reached with information specific to their interests and needs. The image of the business can be positively or negatively impacted by public perceptions of the handling of the incident.
Understanding the audiences that a business needs to reach during an emergency is one of the first steps in the development of a crisis communications plan. There are many potential audiences that will want information during and following an incident and each has its own needs for information.
The challenge is to identify potential audiences, determine their need for information and then identify who within the business is best able to communicate with that audience.
The following is a list of potential audiences.
- Survivors impacted by the incident and their families
- Employees and their families
- News media
- Community—especially neighbours living near the facility
- Company management, directors and investors
- Government elected officials, regulators and other authorities
During and following an incident, each audience will seek information that is specific to them. “How does the incident affect my order, job, safety, community…?” These questions need to be answered when communicating with each audience.
After identifying the audiences and the spokesperson assigned to communicate with each audience, the next step is to script messages. Writing messages during an incident can be challenging due to the pressure caused by “too much to do” and “too little time.” Therefore, it is best to script message templates in advance if possible.
Pre-scripted messages should be prepared using information developed during the risk assessment. The risk assessment process should identify scenarios that would require communications with stakeholders. There may be many different scenarios but the need for communications will relate more to the impacts or potential impacts of an incident:
- accidents that injure employees or others
- property damage to company facilities
- liability associated with injury to, or damage sustained by, others
- production or service interruptions
- chemical spills or releases with potential off-site consequences, including environmental
- product quality issues
Messages should be scripted to address the specific needs of each audience, which may include:
Customer - “When will I receive my order?” “What will you give me to compensate for the delay?”
Employee - “When should I report to work?” “Will I have a job?” “Will I get paid during the shutdown or can I collect unemployment?” “What happened to my co-worker?” “What are you going to do to address my safety?” “Is it safe to go back to work?”
Government Regulator - “When did it happen?” “What happened (details about the incident)?” “What are the impacts (injuries, deaths, environmental contamination, safety of consumers, etc.)?”
Elected Official - “What is the impact on the community (hazards and economy)?” “How many employees will be affected?” “When will you be back up and running?”
Suppliers - “When should we resume deliveries and where should we ship to?”
Management - “What happened?” “When did it happen?” “Was anyone injured?” “How bad is the property damage?” “How long do you think production will be down?”
Neighbours in the Community - “How can I be sure it’s safe to go outside?” “What are you going to do to prevent this from happening again?” “How do I get paid for the loss I incurred?”
News Media - “What happened?” “Who was injured?” “What is the estimated loss?” “What caused the incident?” “What are you going to do to prevent it from happening again?” “Who is responsible?”
Messages can be pre-scripted as templates with blanks to be filled in when needed. Pre-scripted messages can be developed, approved by the management team and stored on a remotely accessible server for quick editing and release when needed.
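As a minimal illustration of a template with blanks, the sketch below uses Python's standard `string.Template`; the message wording and placeholder names are hypothetical, and in practice the approved text would come from your own plan. `substitute` is used rather than `safe_substitute` so that a blank left unfilled raises an error instead of going out to customers as a stray `$placeholder`.

```python
from string import Template

# Hypothetical pre-approved holding statement for customers; $placeholders are the blanks.
CUSTOMER_UPDATE = Template(
    "On $incident_date an incident at our $site site affected $service. "
    "We expect to resume normal service by $expected_recovery "
    "and will send a further update by $next_update."
)


def fill(template: Template, **details: str) -> str:
    """Fill every blank; raises KeyError if any blank is left unfilled."""
    return template.substitute(**details)


message = fill(
    CUSTOMER_UPDATE,
    incident_date="12 March",
    site="main",
    service="order dispatch",
    expected_recovery="Friday",
    next_update="5pm today",
)
print(message)
```

Because the approved wording is fixed and only the blanks change, this also supports the consistency of message across audiences that the plan aims for.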
Another important element of the crisis communications plan is the need to coordinate the release of information. When there is an emergency or a major impact on the business, there may be limited information about the incident or its potential impacts. The “story” may change many times as new information becomes available.
One of the aims of the crisis communication plan is to ensure consistency of message. If you tell one audience one story and another audience a different story, it will raise questions of competency and credibility. Protocols need to be established to ensure that the core of each message is consistent while addressing the specific questions from each audience.
Another important goal of the crisis communications plan is to move from reacting to the incident to managing a strategy to overcome it. Management needs to develop the strategy, and the crisis communications team needs to implement it by allaying the concerns of each audience and positioning the organization to emerge from the incident with its reputation intact.
Finally, once the plan is developed and implemented, the organisation will gain several benefits, including:
- Compliance with legal and regulatory requirements, e.g. the Civil Contingencies Act, Basel II, FSA, Sarbanes-Oxley, OFCOM and Corporate Governance
- Robust Health & Safety
- Positive public image
- Protection of human life, the environment and company assets
- Protection of the interests of all stakeholders, including customers, employees, shareholders and suppliers
- Resilience against competition
- Cost-effective insurance and risk management practices
Each member of the business continuity planning team should hold a copy of the plan and one electronic copy should be held for maintenance purposes.
Plan Exercising and Maintenance
It is essential that, once the plan is developed, it is tested and maintained so that it remains valid and current at all times. This may be via desktop testing, scenario testing, specific test cases and full-scale rehearsal.
By testing the plan you will identify areas for improvement, verify that the plan is viable and practical, and provide documented evidence of the test results. Additionally, you will have trained staff in the use of the BCP.
You should also incorporate maintenance procedures within the plan so that it is regularly tested and the plan updated accordingly.