Over the last 6 months or so I have spent more time trying to find the cause of a problem more than anything else. The results are interesting, and I am starting to see a pattern that is proving to be successful. It doesn’t matter if I am working on a performance issue, or just trying to figure out why something is not working. The most common problem that I see time and time again is performance. For the most part as soon as you get the system up and running or features configured, they pretty much stay that way. However, with performance it changes. It changes when the data changes; new releases can be one cause; another common cause is business. When I say business, I am referring to growing your data, or the changes in services your company is offering (say for example your company has decided to see to a new market share, and the sales doubled). The short of it is when a business is not growing and it is just staying static there isn’t much long term hope for it. Stock prices alone take drastic up turns and down turns just based on growth percentages.
So you have been presented with a problem, let’s say it is a performance problem and you are hearing about it from someone at the help desk. The complaint in the ticket says that the database is not performing like it should. The rest of the information you receive from the ticket is not very helpful. You have not been given the complete picture, you don’t even know what part of the database is slow, so how do you treat it as professional you are? I have a few tips for you that I hope will help. I have developed these tips over years of learning the things the hard way.
Always get the description of the problem from the source. Skip the help desk and the ticket system. If you can, go sit down with the person who is reporting the problem. Let them show you where they see the performance problem. Let them show you how this problem creates problems in their life. A query that takes 3 seconds to return to a customer service person may not be considered a big deal. But wait until you are on the phone with a customer screaming their head off wanting to know why they were charged an extra $3.00. The 3 seconds is pure hell for the person answering the phone. If you can make this a lot faster, then you can make the job a bit easier. By learning what else can be done, and then fixing these problems you are now the hero.
I think it was 6th or 7th grade where I learned all about the scientific process, the part where you hypothesize, then run your tests, document your results, and then draw your conclusion. I remember when I was learning about this, thinking when the heck I am going to use this. I had no plans on being a scientist, so I just didn’t see where this was relevant. Now, my work does not include cutting up frogs, but I have found a way to use those old science class skills. When you are troubleshooting a problem make sure that you know what you are testing, what you want to test, run your test, measure your test and then record it. Once you are done document your results and determine if you were right or wrong. If you’re wrong, start again. But make sure that you are only testing one thing at a time.
Remove many potential issues in a test as quickly as you can. Let’s reflect back to poor performance – you need to determine as quickly as you can if it is the database. If it is, then where is that problem in the database? One of the first places that you are going to want to look is your wait times. Run a query against the sys.dm_os_wait_stats DMV. If you can isolate a process like returning a client, and you can isolate that on a specific server, stop all the other traffic , measure the wait stats, record them, run your process and then run them again. The delta between the two runs will give you a good head start on where you should look
One thing to keep in mind, when you use this method to eliminate many potential problems, or even identify problems, be sure that you understand the complete problem. Let’s say you have a high number of disk waits. These could be caused by many reasons. Many DBA’s may start to point at the storage as part of the problem; however you may be setting yourself up to fail. A lot of disk reads could indicate poor indexing, or poorly written queries.
The way that you trouble shoot problems is going to say a lot about you. If you keep calm and document well developed thoughts, I think your problem will be behind you in no time! But if you sit around pointing fingers at everyone else and then freaking out when it might be a database issue, then you are going to have a stress filled day. If you don’t change your process, it will be filled with a lot of stress for your career. I would love to hear what trouble shooting technics you use, drop one of two in the comments section for me.