I am really surprised at how often I see questions like “What is the best Disaster Recovery Plan?” I have been answering questions on a few different forums, and I see this one more and more often. Here is the quick answer: I don’t know. There really is no textbook answer to your question. Disaster Recovery Plans (DRPs) can be very deep or very cheap. I would hope that most people here have a basic DRP in one fashion or another. That doesn’t mean it will work, though. Heck, it doesn’t even mean it will cover your database for more than one or two specific disasters. I think a lot of people don’t realize that even RAID 5 can be considered part of your DRP. Let’s think about this: a disaster is anything that could happen to your server that would disrupt it. By this simple definition, anything from a failed hard drive or a bad power supply to an F5 tornado ripping through your datacenter counts as a disaster. Fair enough? Is the impact on the database’s end users any greater or any smaller depending on the cause of the outage? The end result is the same: the database is either down or it is not.
When you consider that something as simple as a bad power supply can be a disaster for your database, your perspective may change. Have you done anything to protect yourself from a bad hard drive or a bad power supply? Consider the class of server that your SQL Server runs on. Does it have multiple power supplies? Is your data on storage that will protect you from a single hard drive going bad? If yes, then you have at least started off on the right foot. There are levels of disasters. Some are small and easy to recover from; others are all-out, life-changing events. Yet when I see this question as often as I do, I think we may have lost a bit of common sense. If you are being asked to implement a Disaster Recovery Plan, then you should have some sort of requirements around it. More often than not, you didn’t get any requirements. What you got was a task, and the people and/or companies that rely on you expect you to make the right decisions to help them out of a sticky situation. To have an effective DRP, you need to understand what kinds of disasters you are being asked to recover the database from, so that you can make a plan.
Your company needs to define “disaster.”
Let’s look at an example. Let’s say that your company contracts with a disaster recovery company. This hypothetical company (we will call it DR Vendor) charges your company to be “on call,” and in return provides a location, separate from your datacenter, that is ready to be turned up with your data in just a few hours. The only thing DR Vendor requires is that you show up at the door with an authorized person and a backup in your hand. Sounds pretty simple, doesn’t it? But what happens if your internet connection goes offline? One would think this could be really bad for a database that is serving up data to the internet. The website would be down, but is it worth turning up DR Vendor for something that may be fixed in a short period of time? Then again, how much time counts as a short period of time? A small web store that does a few hundred thousand dollars a year in sales will not feel the same financial impact from an hour offline that a huge website like Amazon would.
If your company does not know how to define a disaster, then it is your job to ask. Present them with questions that will get you the information you need. How long can you be down? How much data can you lose? What is considered down (if users can read from the database but not write to it, is that considered down)? Then there is the most important question: what is the budget for your DRP?
So I present you with the question that we started with: What is the best Disaster Recovery Plan? I don’t know. You tell me: what do you require? Once you have the requirements, you can start to design the best plan for the disasters you have defined. Once you have that documented, the rest becomes a matter of research to understand the potential solutions. A recovery plan can be something as simple as a backup or as complex as a hot standby site. Common answers to the previous questions sound like “we can never lose data,” “99.999% uptime is the requirement,” or “we need to be able to bring our site up in seconds if a natural disaster destroys our data center.” But those answers are often altered once the costs are reviewed. This brings me to the next discussion point.
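One way to see why “99.999% uptime” tends to change once costs come up is to turn the percentage into minutes. The sketch below is only an illustration, assuming a 365-day year and no excluded maintenance windows; the function name `allowed_downtime_minutes` is hypothetical.

```python
# Minutes in a 365-day year (assumption: leap years and maintenance windows ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def allowed_downtime_minutes(uptime_percent: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return how many minutes of downtime an uptime target permits over the period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    # The downtime budget is simply the fraction of the period that is NOT uptime.
    return period_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows {minutes:,.1f} minutes of downtime per year")
```

At five nines, the entire yearly budget is about 5.3 minutes, so swapping a single failed power supply that takes half an hour would use up that budget several times over.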
Does it meet common business logic?
Some databases have strict uptime or zero-data-loss requirements. Think of the implications of a bank losing 15 minutes’ worth of data. Are these the same requirements that a local convenience store has? I was sitting in a local coffee shop the other day, and they were helping about one customer every 3 or 4 minutes. Let’s say the average cup of joe is $4.00. With an endless line of steady customers at one every 3 minutes, they would help about 20 people an hour, so at $4.00 a coffee they are looking at losing about $80.00 an hour, and that is only if they could not serve any coffee at all. (This crosses the line into business continuity planning.) If they were to try to protect the business from 3 hours’ worth of downtime, we are talking about $240.00 worth of sales. Sure, they may upset some customers to the point that they never come back, or they may miss the one customer that hour who is ordering for everyone in the office, but in general numbers, how does the potential loss compare to the potential cost? However, what if that coffee shop is part of a national chain, and the corporate office goes offline and causes every store to stop taking orders for 3 hours? At 1,000 stores, that is $240,000.00, almost a quarter of a million dollars. Make sure the loss that you want to protect against makes sense to the business.
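The coffee-shop arithmetic can be written as a small calculation that puts the potential loss next to the potential cost. This is a sketch under stated assumptions: a steady one customer every 3 minutes, $4.00 per sale, sales stopping completely for the whole outage, and a comparison against a single outage of that length. The names `outage_loss` and `worth_protecting` and the $1,000.00 protection cost are hypothetical, and the check ignores harder-to-measure losses such as customers who never come back.

```python
from decimal import Decimal


def outage_loss(minutes_per_customer: int, price: Decimal, outage_hours: int, stores: int = 1) -> Decimal:
    """Estimate lost sales if every store stops selling for the whole outage."""
    if minutes_per_customer <= 0:
        raise ValueError("minutes_per_customer must be positive")
    customers_per_hour = Decimal(60) / Decimal(minutes_per_customer)
    return customers_per_hour * price * outage_hours * stores


def worth_protecting(loss: Decimal, protection_cost: Decimal) -> bool:
    """Business-logic check: the protection should cost less than the loss it prevents."""
    return protection_cost < loss


if __name__ == "__main__":
    single_store = outage_loss(3, Decimal("4.00"), 3)
    chain = outage_loss(3, Decimal("4.00"), 3, stores=1000)
    hypothetical_cost = Decimal("1000.00")  # hypothetical price of a protection option
    print(single_store, worth_protecting(single_store, hypothetical_cost))  # 240.00 False
    print(chain, worth_protecting(chain, hypothetical_cost))                # 240000.00 True
```

`Decimal` is used instead of `float` so the dollar amounts stay exact; the same hypothetical cost fails the check for one store and passes easily for the 1,000-store chain.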
I once worked with a client who told me about a DRP that another consultant had recommended. The total cost was going to be close to $100,000. Well, depending on the company or organization, that can be pretty affordable. But the more I thought about it, the more questions I had. When he told me the requirements were that his 2 GB database could be offline for up to a week without impacting revenue, all of a sudden this $100,000 plan did not meet the business logic rule. The database requirements were so light that a laptop could host the database. This problem was solved by a simple backup schedule with tests that proved the database could be restored.
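For a database like that, one restore test might look like the sketch below. It is not the script used for that client; it only builds the T-SQL text for a full backup, a restore of that backup under a different name, and a `DBCC CHECKDB` on the restored copy. The database name `SalesDb`, its logical file names, the paths, and the Python helper names are all hypothetical, and scheduling (for example, as a SQL Server Agent job) is left out.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing brackets."""
    return "[" + name.replace("]", "]]") + "]"


def quote_str(value: str) -> str:
    """Quote a T-SQL Unicode string literal, doubling any single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def backup_and_restore_test(db: str, backup_file: str, data_logical: str,
                            log_logical: str, test_dir: str) -> str:
    """Build a T-SQL script: full backup, restore as a separate copy, consistency check."""
    test_db = db + "_RestoreTest"  # hypothetical naming convention for the test copy
    data_path = f"{test_dir}\\{test_db}.mdf"
    log_path = f"{test_dir}\\{test_db}_log.ldf"
    return "\n".join([
        # Full backup; CHECKSUM verifies page checksums and writes a backup checksum.
        f"BACKUP DATABASE {quote_ident(db)} TO DISK = {quote_str(backup_file)} WITH INIT, CHECKSUM;",
        # Restore under a different name and file paths so production is untouched.
        f"RESTORE DATABASE {quote_ident(test_db)} FROM DISK = {quote_str(backup_file)}",
        f"    WITH MOVE {quote_str(data_logical)} TO {quote_str(data_path)},",
        f"         MOVE {quote_str(log_logical)} TO {quote_str(log_path)},",
        "         REPLACE, CHECKSUM;",
        # A restore that completes and passes CHECKDB shows the backup is usable.
        f"DBCC CHECKDB ({quote_str(test_db)}) WITH NO_INFOMSGS;",
    ])


if __name__ == "__main__":
    print(backup_and_restore_test("SalesDb", r"D:\Backups\SalesDb.bak",
                                  "SalesDb", "SalesDb_log", r"D:\RestoreTest"))
```

Restoring to a separate name is the design choice that matters here: a backup that has never been restored has not proved anything.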
SQL Server offers a variety of options to recover your data or to make another copy of it somewhere else. These options come with different costs, not all of them financial, and some of those costs are tied directly to performance. The more information you can supply, or the more information you can get as a DBA, the better off you are going to be when you are faced with that disaster.