Falling down and getting up: Disaster Recovery Made Easy
It's best to open with an anecdote: once upon a time there was a weight loss candy called “Ayds”. This was in the days before the AIDS pandemic – the name was trademarked in 1937 and peaked in popularity in the 1970s. But then AIDS came to the world, and the makers of Ayds had a full-blown crisis on their hands. They decided not to deal with it, believing the whole thing would blow over – and now they are nothing more than an anecdote, a joke used to demonstrate how important one’s name and reputation really are.
Here is the most important thing you need to learn from “Ayds”: a company and a product are never what they were at their best – they are only what they are at their worst. Customers feel entitled to the best service and goods, and will see any kind of faulty item, disturbance of service or reputation problem as a personal slight, a malicious act against them that will not be forgotten. Even the most innocuous of problems, such as failed or overcrowded servers, will draw hot vitriol from enraged clients. And while these rants often bring to mind the well-known “First World Problems” meme, it's best to remember that those are all paying customers – and angry customers leave. This is where the art of Crisis Management comes into play.
Crisis Management is the art of dealing with unexpected and highly damaging situations that have the potential to cripple or even topple a business by causing its customers to abandon it. There are two types of crises:
1. A disruption in the chain of supply – anything that disrupts the availability of services and goods. This can be a problem in any part of the chain, from under-production, to failure of supply, and up to faulty products. In the digital age, this also covers any form of online service denial – a server going down (e.g. as a result of a DDoS attack) or a slow connection can be just as disruptive to your business as the loss of merchandise in an earthquake or a tsunami.
2. A problem of reputation – anything that causes customers to decide to take their business elsewhere. This can range from faulty business practices to unpopular public statements and up to bad customer service. In a digital context this can occur as a result of a data breach or another, similarly trust-dissolving, scenario.
There are two very important things to note here. The first is that any kind of problem in the chain of supply will eventually become a reputational problem if it is not dealt with swiftly and efficiently. The second is that in the digital age social media makes a bad reputation spread like wildfire.
The Four Golden Rules
There are four golden rules when dealing with a crisis:
Rule #1: It is better to prevent a crisis than it is to manage it. This is the simplest rule of them all – think ahead and make sure no problems arise. Produce and distribute ahead of time to ensure no faults in the chain. Make use of load balancing services to make sure your site is always available and responsive. Don’t say anything you’d regret.
Rule #2: The goal of crisis management is to minimize damage – not suppress it and not undo it. Damage has been done, and all you can do now is prevent further damage from being done. Once something has happened, it is to be treated like a cancer – excise it and make sure it doesn’t spread. Take the hit, bite the bullet, and deal with it.
Rule #3: Be ready. It is never a matter of “If” – it is always a matter of “When”. No matter how good you are at preventing crises, something will eventually happen. Be ready and have a plan – back up everything on a regular schedule, keep extra deliverable goods in stock, hire a good PR company and donate money in advance. Be ready.
Rule #4: Faster is better. In every kind of crisis, time is always of the essence. You need to respond fast and fix the problem swiftly. Make sure servers come back up as soon as possible. Deliver more goods on time. Apologize right away. The longer you take to deal with something, the further it spreads.
Four Phased Response
After you’ve prepared and understood the theory, this is how to actually deal with any kind of crisis. Follow these four simple steps, and you will be able to deal with anything.
Phase 1 – Acknowledge and apologize. In the digital age, there can be no room for lies or avoidance – if a problem exists, the thousands of people encountering it will share and tweet about it ceaselessly (note that “thousands” is the best case scenario – the worst case involves millions). Make a public statement that says you are aware of the problem and explains how you are dealing with it. Give those thousands (or millions) a word to spread: your acknowledgement and apology.
Phase 2 – Offer alternatives, if possible. Whenever the service allows it, offer customers an alternative while the problem is being fixed. Route users to servers located in other parts of the world, exchange faulty items for other, temporary items, and so on.
Phase 3 – Fix the problem. This is the important part, but could take some time. Like we said, faster is better, and the better prepared you are, the faster the problem will be fixed.
What counts as reasonable speed can differ depending on what your service is – servers need to go back up in minutes, not hours, but physical products can be delivered over the course of days. The faster, the better.
Phase 4 – Offer incentives to return. Once the crisis is over and services are back, you need to make sure your reputation is intact – do this by offering incentives to returning customers. This will make sure none of them find other suppliers for their needs. Offer free premium services on your site or a discount on products. If something politically charged happened, donate a large sum of money to a worthy cause. This will entice your customers to return, and allow you to resume the regular course of business.
IT Disaster Recovery
The first rule of IT disaster recovery is, like we said, to be prepared. In the digital age, there are new tools that allow you to prepare, ahead of time, for any kind of server-based crisis. These are called local or global load-balancing services, and local or global failover services.
Basically, the load balancing services help organizations distribute the amount of incoming traffic between different servers, both locally (in the same data center) and globally (between remotely located servers). By doing so, load balancing services lessen the load on each of the servers, dramatically minimizing the chance that one of them would crash as a result of an overload.
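To make the idea concrete, here is a minimal sketch of the simplest distribution strategy, round robin, where each incoming request goes to the next server in a fixed rotation. It is only an illustration: the class name and the server names are made up for this example, and real load-balancing services also weigh server health, capacity and location.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation so no single one takes all the traffic."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._rotation = cycle(self._servers)

    def next_server(self):
        # Each call returns the next server in the rotation.
        return next(self._rotation)


# Hypothetical server names: two in a local data center, one remote.
balancer = RoundRobinBalancer(["local-a", "local-b", "remote-eu"])

# Simulate nine incoming requests and count where each one lands.
load = Counter(balancer.next_server() for _ in range(9))
print(dict(load))  # {'local-a': 3, 'local-b': 3, 'remote-eu': 3}
```

With nine requests spread over three servers, each one handles three instead of a single server handling all nine.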
However, server overload is often the least of your concerns. From prolonged power outages to natural disasters and the ever-present chance of human error, many things can take a server down, and server downtime is the worst case scenario. This is where failover services come into play: leveraging their ability to automatically detect server failure, these tools reroute traffic to a backup location, preferably with minimal interruption of service to your customers – “minimal” being the key word here. Until recently, everyone had to rely on the DNS option: publishing new server IP addresses each time you wanted to reroute traffic. Such a setup is exposed to various issues, most of which stem from the fact that DNS updates are not picked up immediately by ISPs. In fact, most ISPs serve cached DNS records and only refresh them once every 30 minutes or more. While this is fine for day-to-day operations, it is unacceptable for IT disaster recovery incidents.
Source: Incapsula Global Load Balancing
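The detect-and-reroute behavior of a failover service can be sketched in a few lines. This is an assumption-laden illustration, not how any particular product works: the class, the server names and the health status table are all hypothetical, and the health check is simulated with a dictionary where a real service would probe the server over the network.

```python
class FailoverRouter:
    """Routes traffic to the first healthy server in priority order."""

    def __init__(self, servers, is_healthy):
        # servers: ordered list, primary first, backups after it.
        # is_healthy: callable(server) -> bool, supplied by the caller.
        self._servers = list(servers)
        self._is_healthy = is_healthy

    def pick(self):
        for server in self._servers:
            if self._is_healthy(server):
                return server
        raise RuntimeError("no healthy server available")


# Simulated health state (hypothetical names); True means the server is up.
status = {"primary-us": True, "backup-eu": True}
router = FailoverRouter(["primary-us", "backup-eu"], lambda s: status[s])

before = router.pick()
status["primary-us"] = False  # simulate the primary going down
after = router.pick()
print(before, "->", after)  # primary-us -> backup-eu
```

Because the check runs on every routing decision, traffic moves to the backup as soon as the primary is reported down, with no record to republish.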
Today, new cloud-based services are working to overcome these DNS-related limitations. With a quick adoption rate of these new technologies, more and more organizations are now introducing seamless failover and load balancing technologies, offering instant response to real world and online crisis scenarios.
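To see why the DNS-related limitation matters during an outage, consider this toy model of a caching resolver. It is a simplified sketch: the class, the hostname and the IP addresses (taken from ranges reserved for documentation) are invented for the example, time is passed in as plain seconds, and the 30-minute cache lifetime simply mirrors the figure quoted above.

```python
class CachingResolver:
    """Toy resolver that serves a cached answer until its lifetime expires."""

    def __init__(self, authoritative, ttl_seconds):
        self._authoritative = authoritative  # dict: hostname -> currently published IP
        self._ttl = ttl_seconds
        self._cache = {}  # hostname -> (ip, expiry_time)

    def resolve(self, hostname, now):
        cached = self._cache.get(hostname)
        if cached and now < cached[1]:
            return cached[0]  # still fresh: the published record is not consulted
        ip = self._authoritative[hostname]
        self._cache[hostname] = (ip, now + self._ttl)
        return ip


records = {"shop.example": "192.0.2.10"}
resolver = CachingResolver(records, ttl_seconds=30 * 60)

first = resolver.resolve("shop.example", now=0)
records["shop.example"] = "198.51.100.20"              # failover: new IP published
stale = resolver.resolve("shop.example", now=10 * 60)  # 10 minutes later
fresh = resolver.resolve("shop.example", now=30 * 60)  # cache entry has expired
print(first, stale, fresh)  # 192.0.2.10 192.0.2.10 198.51.100.20
```

Ten minutes after the new address is published, the resolver still sends customers to the old, failed server; only once the cached entry expires do they reach the backup.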
Crisis Management and Disaster Recovery are important aspects of any business, and it is important to be ready. It is important not only to deal with anything that might harm you, but also to prepare in advance for different types of crises and prevent them whenever possible. Always remember that a prevented crisis costs a lot less than even a well-handled one.