Monday, August 27, 2018

Resilient Businesses Move Their People To The Cloud

         
Every year, as the Atlantic hurricane season approaches, many businesses have a nagging realization that they are at risk from a catastrophic "Black Swan" event. Black Swan events are a constant source of risk in states like Florida, where many communities are subject to disruption from coastal storms. The risk is particularly acute for businesses that depend on stored on-line data, because that critical data could be lost or corrupted. But the threat from Black Swan events isn't limited to Florida, nor is it limited to large-scale disruptive events like hurricanes.

The black swan theory, or theory of black swan events, describes a disruptive event that comes as a surprise, has a major effect, and is often inappropriately rationalized after the fact with the benefit of hindsight. The term is based on an ancient saying that presumed black swans did not exist. The saying's meaning changed after black swans were discovered in the wild. Consider the following scenario...
       
"We tend to think of disasters in terms of the attacks on the World Trade Center, Hurricane Katrina, or other mega events. Sometimes, however, less notable events occur that can have a catastrophic effect on a business. In February 1981, an electrical fire in the basement of the State Office Building in Binghamton, New York, spread throughout the basement of the building setting fire to a transformer containing over a thousand gallons of toxin-laden oil. Originally thought to be PCBs, the toxins were soon determined to contain dioxin and dibenzofuran, two of the most dangerous chemicals ever created.

The fire was smoky and quickly filled the 18-story building with smoke. As the transformer burned, the soot entered the buildings ventilation shafts and quickly spread toxic soot throughout the building. The building was so badly contaminated that it took 13 years and over $47 million to clean before the building could be reentered or used. Because of the nature of the fire, the building and its contents, including all paper records, computers, and personal effects of the people who worked there, were not recoverable. This type of event would be irrecoverable for many businesses." - Operations Due Diligence, Published by McGraw Hill
       
What effect would a catastrophic hurricane affecting an entire region, or a localized disruptive event like a fire, have on the operation of your business? Could you survive that kind of interruption or loss? As dependence on on-line data has grown in virtually every type of business, so has the risk that losing that data could disrupt the business's operation or even cause it to fail completely. In response, the approaches used to mitigate these risks have evolved as the volume of on-line data has continued to grow. Disaster Recovery (DR) emerged first as a mitigation strategy focused on recovering critical data after a disruptive event, giving the business the ability to restore disrupted IT operations.
         
Disaster Recovery (DR) involves a set of policies and procedures that enable the restoration of critical business data and allow the IT infrastructure to be restored to a prior state. DR was originally seen as the domain of the IT department, which was given responsibility for mitigating the risk. To minimize the risk, system backups were scheduled frequently, and aggressive DR plans that included server cold-start procedures and data backups were implemented.
         
The goal was to restore the infrastructure to the last point where the data had been backed up (at the time, typically on tape). Accepted DR practice at the time allowed the IT system to be rebooted once facility power was finally restored, unless the facility was in a flood zone or the off-site backup storage facility had also been affected. In either case, the facility's operation could be disrupted for some period of time. Data restoration could also be at risk, depending on where the backups were stored.
         
Now let's roll the calendar ahead... As technology evolved, so did Disaster Recovery strategies, which led to the requirements for a Business Continuity solution as a means of mitigating risk. Business Continuity was still seen as the domain of IT as technology moved toward solutions like shadow servers, distributed data locations, and high-speed bulk data transmission with hyper-connectivity. Data no longer had to be "recovered." It just had to be available in distributed locations where it could be accessed remotely. Business Continuity mitigated the risk of data loss and allowed a business to recover much more quickly and efficiently from a Black Swan event because its servers never went completely down.
         
Business Continuity originally encompassed the planning and preparation needed to keep an organization's IT infrastructure intact. This enabled the business to return to an operational state within a reasonably short period following a Black Swan event. Technology has since evolved toward cloud solutions that put both the data and the applications in remote "cloud" locations. So it might seem that IT's responsibility for mitigating the risk of on-line data loss or corruption has been solved. With highly connected, fully distributed solutions, some people feel business continuity is becoming less critical. Nothing could be further from the truth...
         
The fact is, the risk was never solely the loss of data but the loss of the business's ability to operate. Some businesses cannot tolerate any disruption to their operations. These include healthcare, insurance, and communications companies, critical logistics suppliers, transportation providers, and local governments. It is during Black Swan events that the services and products these businesses provide may be most needed. Even less critical businesses, whose operations could be interrupted for days or even weeks, may face financial risk significant enough to make continued operation a matter of corporate survival.
         
Today's technology has completely abstracted business processing and data from the user by moving critical IT infrastructure into the cloud. Cloud technology enables users to work from remote locations, but using the cloud doesn't fully mitigate operational risk. Instead, people have replaced computers as the critical path to continued operations. The business is now more likely to be interrupted because key personnel aren't prepared to sustain operations during a Black Swan event. They also don't have a facility that has been proactively planned to support operations during disruptions that could last hours, days, or weeks. Particularly in areas like Florida, where large natural disasters such as hurricanes can disrupt services to entire communities, resilient businesses need to prepare in advance for sustained operations during a disruptive event. A business's ability to continue operating during times of distress is a measure of its resiliency.

Business Resiliency: 
       
Business resiliency takes business continuity to another level because it makes continuity the domain of operations management rather than leaving it solely to the IT department. When planning for disaster recovery or business continuity, the critical link is now the people needed to operate critical systems remotely. There are occasions when staff can work from home or from other facilities the business operates. However, this is not always a satisfactory answer. Even when it is, businesses often find themselves scrambling to catch up, trying to figure out who does what and how to get it done under the circumstances. During Black Swan events, including regional disruptions like hurricanes or local disruptions such as fires, many of the people the business relies on may not have the power, internet access, or even phone service they need to work from home. Because you can't put people in the cloud, Business Resiliency requires planning, training, and practice so that your staff knows how and when to mobilize.
         
Resilient businesses integrate Black Swan response into their ongoing operations. That way, when it is needed, at a time when the business and its people are under stress, everyone knows how to respond efficiently and effectively and where to go to provide that response. Business resiliency requires a dedicated facility that has been hardened to withstand Black Swan events and designed to provide the support services that both the people and the IT infrastructure will need. It also requires proactive planning that builds remote operations into the business's standard operating plans, with trained critical staff mobilized to respond during disruptive events. Finally, it requires proactive practice to ensure that when remote operations are needed, the people are ready.
