Archive for May, 2015

This article is part 7 in a series on the top 10 most common mistakes that I have seen impacting SQL Server Performance and Resiliency. This post is not all-inclusive.

Most common mistake #7: Disaster Recovery Plans

Often when people hear "disaster recovery plan," the first reaction is to start worrying about the cost. Disaster recovery plans don't have to be expensive; expensive disaster recovery plans come from strict requirements.

About 10 years ago, when I started as an independent consultant, one of my first clients contacted me to help build out a disaster recovery plan. After our initial discussion I learned some consulting firms had proposed solutions forecasted at one hundred thousand dollars. Many large companies would look at that number and consider it a bargain; however, this client's company made less than $50,000 a year. The data changed about once a year, and if the database were down a week or two it was questionable whether anyone would even notice. It was easy to see that the hundred-thousand-dollar solution was extremely over-engineered.

Don’t ignore the basics

Disaster recovery solutions should start with two basic questions: what is the recovery point objective, and what is the recovery time objective?

  • RPO – Recovery Point Objective – To what point must the database be restored after a disaster? Another way to ask this question: how much data can be lost?
  • RTO – Recovery Time Objective – How much time can elapse after the disaster has occurred? Or, how long can your system be down?

Depending on the answers, additional questions will arise; however, these two questions help determine which potential solutions will work. SQL Server offers a number of options, from transaction log shipping to AlwaysOn Availability Groups.
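
To put a number behind the RPO question, a quick look at the backup history shows how much data you would lose if the server failed right now. The following query is a minimal sketch using the standard msdb backup history tables; the threshold you compare the result against is whatever your RPO requires:

  -- Rough RPO check: minutes since the last transaction log backup per database.
  -- A NULL result means no log backup was found in msdb's history.
  SELECT d.name AS database_name,
         MAX(b.backup_finish_date) AS last_log_backup,
         DATEDIFF(MINUTE, MAX(b.backup_finish_date), GETDATE()) AS minutes_exposed
  FROM sys.databases AS d
  LEFT JOIN msdb.dbo.backupset AS b
         ON b.database_name = d.name
        AND b.type = 'L' -- 'L' = transaction log backup
  WHERE d.recovery_model_desc = 'FULL'
  GROUP BY d.name
  ORDER BY minutes_exposed DESC;

If minutes_exposed regularly exceeds your stated RPO, the backup schedule and the requirement are out of sync.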

Pay Attention to the Details

Whenever I visit a client's datacenter, I make sure to take some time to review how the cages are wired. On more than one occasion I have seen servers with redundant power supplies that had both power cords plugged into the same circuit. This configuration will protect you if one of the power supplies goes bad; however, if the circuit goes down, the redundant power supply isn't any help.

When executing a disaster recovery plan, ensure all the small details are checked. If there is a single point of failure in the system, Murphy is going to find it.

Test

The most common mistake I see on a regular basis with disaster recovery solutions is the lack of testing. Some testing is better than no testing, but the best testing mimics actual disasters. If there is a power outage for your servers and you have 5 minutes to get everything moved, do you know the steps to complete before the uninterruptible power supply loses its charge? What steps must you take if you don't have the 5 minutes? I was working with the chief technology officer of a major education facility whose vendor kept telling him he was safe and didn't have to worry about it; his contract was for a 15-minute recovery point. So we reached out to the vendor and asked them to prove it.

The lesson here is to perform regular, realistic tests; if they don't work, find out why and fix it.
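
A scheduled test restore is no substitute for a full failover exercise, but it at least proves the backup files are usable. Here is a minimal sketch; the file paths, database name, and logical file names are hypothetical:

  -- Quick check: confirm the backup media is readable without restoring it.
  RESTORE VERIFYONLY FROM DISK = N'X:\Backups\Sales_Full.bak';

  -- Better: restore into a scratch database and check its integrity.
  RESTORE DATABASE Sales_RestoreTest
  FROM DISK = N'X:\Backups\Sales_Full.bak'
  WITH MOVE 'Sales' TO N'X:\RestoreTest\Sales_Test.mdf',
       MOVE 'Sales_log' TO N'X:\RestoreTest\Sales_Test.ldf',
       REPLACE;

  DBCC CHECKDB ('Sales_RestoreTest') WITH NO_INFOMSGS;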

Top 10 Tips for SQL Server Performance and Resiliency

  1. Improper Backups
  2. Improper Security
  3. Improper Maintenance
  4. Not having a Baseline
  5. SQL Server Max Memory
  6. Change History

You might have been following along with my recent series of posts on resiliency tips for SQL Server. I have compiled these tips based on my observations over the last 20 or so years. Over the summer I have the opportunity to present a session based on these tips. This week I am in New York to present at SQL Saturday #380. Attendance is free, with a small fee to help cover the cost of lunch.

On June 6th I will be in Colorado Springs, CO, presenting the same session at SQL Saturday #415.

I believe there are a couple more SQL Saturdays I will be at this summer. I hope to see many old friends and meet many new ones. Tip #7 should be posted Friday, 5/29. If you are looking for a list of the tips that have already been posted, you can find them here:

Top 10 Tips for SQL Server Performance and Resiliency

  1. Improper Backups
  2. Improper Security
  3. Improper Maintenance
  4. Not having a Baseline
  5. SQL Server Max Memory
  6. Change History

Tip #6 – Change History

Top 10 Tips for SQL Server Performance and Resiliency

This article is part 6 in a series on the top 10 most common mistakes that I have seen impacting SQL Server Performance and Resiliency. This post is not all-inclusive.

Most common mistake #6: Not Keeping a Change History

Business today moves fast. Companies change interfaces to keep content relevant and continue to offer new services to their client base. In a perfect world, code is tested ten ways to Sunday and is never released with bugs or design issues; however, we don't live in that world. I mean no disrespect to the developers who spend hours upon hours trying to provide the best changes possible. There are just so many variables that making a release perfect every time is simply not a reality.

There is inherent risk in any change, whether to software, hardware, or configuration, and SQL Server is no exception.

When a change is being planned, no matter how small, it should be documented. Chances are most DBAs have heard this before; however, simple changes are often overlooked. Documenting the change is a great start, but keep in mind that it is critical that the documented changes are easy to research. In my experience, most issues with SQL Server can be traced back to the date a problem started. Because of this, I recommend storing the documents so they can be searched by date.
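
A change log doesn't need to be elaborate; a simple table with the date as the first thing you search on will do. The sketch below is only illustrative, and the table and column names are my own, not a standard:

  -- Illustrative change-history table, searchable by date.
  CREATE TABLE dbo.ChangeHistory
  (
      ChangeID    int IDENTITY(1,1) NOT NULL PRIMARY KEY,
      ChangeDate  datetime2(0) NOT NULL DEFAULT (SYSDATETIME()),
      ChangedBy   sysname NOT NULL DEFAULT (SUSER_SNAME()),
      Description nvarchar(1000) NOT NULL,
      ScriptPath  nvarchar(260) NULL -- where the executed script is saved
  );

  CREATE INDEX IX_ChangeHistory_Date ON dbo.ChangeHistory (ChangeDate);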

Stability increases when you embrace change history and take change management a couple of steps further. Consider setting up a change process and implementing policies such as not allowing changes to be made to a SQL Server unless they are made via scripts. There are some great benefits to allowing changes only via script. Consider a small table change and a few of the benefits you will see by using a script rather than the user interface (a short example follows the list):

  • When changes are made via script and the script has been saved, it is easy to see exactly what was executed. When changes are made via the user interface, you have to assume you know what the exact change was.
  • Changes made via the user interface are susceptible to the "fat finger," otherwise known as a typo. You can review your change with your own eyes, but having others review it is more difficult. If you mistype a name, you are left having to make a second change to correct your mistake.
  • When using a script, you can share what the change is going to look like before it is made. You can send the new table layout to other teams in the company, and they can see the totality of the change.
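
For example, a small scripted table change might look like the following; the table, column, and file path are hypothetical, and dbo.ChangeHistory is the illustrative table sketched above:

  -- The change itself: saved, reviewable, and repeatable.
  ALTER TABLE dbo.Customer
      ADD PreferredContact varchar(20) NULL;

  -- Record the change so it can be found by date later.
  INSERT dbo.ChangeHistory (Description, ScriptPath)
  VALUES (N'Add PreferredContact to dbo.Customer',
          N'\\fileshare\changes\2015-05-27_AddPreferredContact.sql');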

When making changes it is also important to have a rollback script. This is probably the most overlooked part of change management: a rollback script is rarely needed, and when everything is rolling out as planned there is no thought of having to undo the changes. But when a change is made and isn't going as well as planned, someone may make the decision to cancel it. A restore can lead to extended hours of downtime and unexpected impacts to customers; these are the times when the extra effort put into a rollback script proves priceless.
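
Continuing the hypothetical example above, the rollback script is written and reviewed before the deployment, not improvised during the outage:

  -- Rollback for the change above; the IF guard makes it safe to re-run.
  IF COL_LENGTH('dbo.Customer', 'PreferredContact') IS NOT NULL
      ALTER TABLE dbo.Customer DROP COLUMN PreferredContact;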


Top 10 Tips for SQL Server Performance and Resiliency

  1. Improper Backups
  2. Improper Security
  3. Improper Maintenance
  4. Not having a Baseline
  5. SQL Server Max Memory