
Tip #4 – Not having a Baseline

Top 10 Tips for SQL Server Performance and Resiliency

This article is part 4 in a series on the top 10 most common mistakes I have seen impact SQL Server performance and resiliency. The series covers the errors I encounter most often; it is not all-inclusive. If you like this post, check out the other tips already posted:

  1. Improper Backups
  2. Improper Security
  3. Improper Maintenance

Most common mistake #4: Not having a Baseline

Baselines can come in many different forms, but none of them are overly complex to understand. The core principle of a baseline is a known set of metrics that can be used as a point of comparison. A common and effective troubleshooting technique when something goes wrong is to review what has changed; however, if you don't have a starting point showing what the system looked like before the change, finding what changed is much more difficult.

Performance Baseline

I often review systems after they have started to exhibit performance issues. My goal in these types of engagements is to return the server to normal, or better than normal, performance as quickly as I can. Without a performance baseline showing how the system runs under normal circumstances, whoever is troubleshooting the performance issue is missing a valuable tool.

At the most basic level, SQL Server performance issues fall into three categories: hardware, the operating system, and SQL Server itself. Consider running a Performance Monitor collection that includes some basic counters relevant to each category:

Memory\Available Bytes
Memory\Pages/sec
PhysicalDisk(*)\Avg. Disk sec/Read
PhysicalDisk(*)\Avg. Disk sec/Write
PhysicalDisk(*)\Current Disk Queue Length
PhysicalDisk(*)\Avg. Disk Queue Length
Processor(_Total)\% Processor Time
SQLServer:Access Methods\Full Scans/sec
SQLServer:Access Methods\Index Searches/sec
SQLServer:Buffer Manager\Page life expectancy
SQLServer:General Statistics\User Connections
SQLServer:Latches\Average Latch Wait Time (ms)
SQLServer:Locks(_Total)\Lock Wait Time (ms)
SQLServer:Locks(_Total)\Lock Waits/sec
SQLServer:Locks(_Total)\Average Wait Time (ms)
SQLServer:Memory Manager\Memory Grants Pending
SQLServer:Memory Manager\Target Server Memory (KB)
SQLServer:Memory Manager\Total Server Memory (KB)
SQLServer:Plan Cache(_Total)\Cache Hit Ratio
SQLServer:SQL Statistics\Batch Requests/sec
SQLServer:SQL Statistics\SQL Compilations/sec
SQLServer:SQL Statistics\SQL Re-Compilations/sec
System\Processor Queue Length
System\Context Switches/sec

When you leave tasks to be run manually, such as taking a baseline each month, there is a risk that someone forgets or the task gets put aside for higher priorities. Consider scheduling your Performance Monitor data collections.
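If you want the SQL Server portion of that collection to run on a schedule without relying on Performance Monitor, the counters prefixed with SQLServer: are also exposed through the sys.dm_os_performance_counters view. Below is a minimal sketch, assuming a hypothetical table named dbo.CounterBaseline; the INSERT could be scheduled as a SQL Server Agent job. Note that the OS-level counters (Memory, PhysicalDisk, Processor, System) are not in this view, so those still need a scheduled Performance Monitor data collector set.

    -- Minimal sketch: snapshot a few SQL Server counters into a baseline table.
    -- dbo.CounterBaseline is a hypothetical name.
    IF OBJECT_ID(N'dbo.CounterBaseline', N'U') IS NULL
        CREATE TABLE dbo.CounterBaseline
        (
            CollectedAt  datetime      NOT NULL DEFAULT GETDATE(),
            ObjectName   nvarchar(128) NOT NULL,
            CounterName  nvarchar(128) NOT NULL,
            InstanceName nvarchar(128) NULL,
            CounterValue bigint        NOT NULL
        );

    INSERT INTO dbo.CounterBaseline (ObjectName, CounterName, InstanceName, CounterValue)
    SELECT RTRIM([object_name]), RTRIM(counter_name), RTRIM(instance_name), cntr_value
    FROM sys.dm_os_performance_counters
    WHERE RTRIM(counter_name) IN (N'Page life expectancy',
                                  N'User Connections',
                                  N'Memory Grants Pending',
                                  N'Batch Requests/sec');

Keep in mind that the per-second counters, such as Batch Requests/sec, are cumulative in this view; you compare two snapshots to turn them into a rate.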

Configuration Baseline

Your baseline shouldn't stop at Performance Monitor. Capture the configuration of your SQL Server as well, along with a document explaining any non-standard settings.
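A simple way to start is to script out the instance-level and database-level settings and keep the output with that document. A rough sketch of the kind of queries I mean; both views ship with SQL Server:

    -- Instance-level settings, including anything changed away from the defaults.
    SELECT name, value, value_in_use, description
    FROM sys.configurations
    ORDER BY name;

    -- Database-level settings worth recording.
    SELECT name,
           compatibility_level,
           recovery_model_desc,
           is_auto_create_stats_on,
           is_auto_update_stats_on
    FROM sys.databases;

Saving that output on a schedule, or before and after any change, gives you the "what did it look like before" answer almost for free.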

Recently I was working on a performance problem for a client whose SQL Server had been online for years. The server had been slowing down over time, and the client needed the performance to be better. When we looked at the database configuration, we noticed the compatibility level on the database was set to SQL Server 2000, even though the instance was SQL Server 2008. Everyone who worked in the IT department when the server was configured had moved on a few years prior. The server had a number of settings configured away from the defaults, and some of them were not best practice. The only way we could determine whether a setting was intentional or simply something someone overlooked was to change it back to the default and complete regression testing to see the impact. This added a lot of time and delay to correcting the overall performance issue. A configuration baseline would have saved a lot of time and money. I recommend doing a health check/baseline on your SQL Servers once a year.
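To make the compatibility level example concrete, the check is a one-line query and the fix is a one-line change; YourDatabase below is a placeholder, and the change should only be made after regression testing:

    -- Find databases running under an older compatibility level.
    -- On a SQL Server 2008 instance the current level is 100;
    -- 80 means the database still behaves like SQL Server 2000.
    SELECT name, compatibility_level
    FROM sys.databases
    WHERE compatibility_level < 100;

    -- The fix, only after regression testing (YourDatabase is a placeholder):
    -- ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 100;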

Top 10 Tips for SQL Server Performance and Resiliency

  1. Improper Backups
  2. Improper Security
  3. Improper Maintenance


We started work with a new client today; it appears that their SQL Server performance is not what they expect it to be. I thought it would be interesting to use this client as an example on my blog, because the problem they are seeing is not isolated to them. We have had many clients with the same issues at SQL on Call, and we have proven that performance problems can be fixed. Throughout this series I will refer to them only as "the client", first and foremost for their privacy; chances are you would know who this client is, since they run a pretty large website. I can still describe the issues they are seeing so that you can understand where we are coming from and what we are looking at.

Issue: The client has one production clustered database system with plenty of hardware behind it. They host a website off this server, and the website has thousands of pages. The original complaint was that page-to-page navigation was slow, yet when the server was monitored for bottlenecks, the hardware came up clean. The client stated that the slowness was not occurring all the time, but more like 70% of the time. When I logged on to the website to reproduce the issue, I was not able to reproduce it at first. I then tried two or three times throughout the day to see if there was a time when the server was being hit harder than others. I was only able to reproduce and validate the problem a couple of times; it was enough to see that there was an issue, but the issue was not related to time, load, dates, or the pages being called.

Step One: The first thing I did, as described in the Issue section of this entry, was to validate the issue. Even when I was not able to validate it, there was a concern on the client's side that there was a problem, so I let them know we have seen this in the past and have had successful results.

I had to have facts on how bad "bad" really was. Look at it this way: if you are an accountant working in an office, and you are the end user of a database where a search for a client's billing information takes 10 seconds, then 10 seconds may not be that bad if you know that is how the system behaves. Now look at it from a different angle: you are the same accountant, but your job is to talk to a client who wants to know why his bill is messed up. The client on the other side of the phone is not very happy; he is screaming and yelling because he was double billed. You are trying to pull up his information, and now there are going to be 10 seconds of silence on the phone. The end result is that the same 10 seconds can feel like hours in one situation, yet in another situation it may be no issue at all. (As an FYI, the client we are working with is a call center; this is just an example that came to mind.)

An interesting side note here: I once worked with a company that had a call center and was trying to understand how dead air feels to someone on the phone. I ran a test on my mother without letting her know. I called her, talked for a few minutes, and then paused, keeping a stopwatch to see how long the silence would last before she felt awkward and started talking. The average was about 3 seconds. So the 10-second delay could be a big issue.

My Actions: Since the issue was not consistent and we were not able to pinpoint any sort of correlation, we needed to get answers to a few questions.

  • What changes were made, from two weeks before the problem was noticed up to now? If we can get a list of the changes, we may be able to speed up the process by looking at them individually.
    • We requested this list from the client and are expecting it soon.
  • Since the performance problems are not consistent, the performance counters the client already had in place may have missed the issues they were experiencing.
    • We set up a Perfmon collection to watch disk, memory, processor, and cache hit ratio, along with the number of users connected (as a baseline number). We set it to poll and record once every 60 seconds so we can leave it running for an extended period of time.
  • When was the database last re-indexed?
    • We were told last weekend, but we went in and validated this ourselves, along with making sure the statistics were being updated (see the sketch after this list).
  • Have there been any major data inserts since the database was last re-indexed?
    • The answer was no; nothing out of the ordinary, and the inserts were about the same as they have been for years. If there were an index problem, we would have expected the performance problem across the board. Since the hardware was in good shape, though, it is possible that when the database was running well, the hardware was simply powering through the problem.
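For anyone who wants to run the same validation, here is a rough sketch of the two checks: index fragmentation from sys.dm_db_index_physical_stats, and statistics age from STATS_DATE. Run it in the database in question; the 30% threshold is a common rule of thumb, not something the client's numbers dictated.

    -- Fragmentation check: did the weekend re-index actually take effect?
    SELECT OBJECT_NAME(ips.[object_id]) AS TableName,
           i.name                       AS IndexName,
           ips.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.[object_id] = ips.[object_id]
     AND i.index_id    = ips.index_id
    WHERE ips.avg_fragmentation_in_percent > 30  -- common rebuild rule of thumb
    ORDER BY ips.avg_fragmentation_in_percent DESC;

    -- Statistics age: when was each statistic on a user table last updated?
    SELECT OBJECT_NAME(s.[object_id])            AS TableName,
           s.name                                AS StatName,
           STATS_DATE(s.[object_id], s.stats_id) AS LastUpdated
    FROM sys.stats AS s
    JOIN sys.objects AS o ON o.[object_id] = s.[object_id]
    WHERE o.[type] = 'U'
    ORDER BY LastUpdated;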

At this point we are going to wait 24 hours and see what the Perfmon picks up. The client was made aware that we are just collecting data for now. In other cases in the past, the client could not wait; in those cases we take more aggressive action. The caution with being more aggressive is that you can hurt performance even more by tacking too much onto an already taxed system. We will take a look at the Perfmon results in a day or two and validate that we do not have any issues with the hardware.