I made a huge mistake that, to be honest, I am embarrassed to admit. Yet for some crazy reason I have decided it is OK to blog about it. What did I do? I didn’t test a new piece of software before installing it on a server where it could affect my monitoring tool.
The whole mess started when a company contacted me about testing some new software they had released about six weeks earlier. The software is an auditing product meant to provide something somewhat close to the feature set offered with SQL Server. It had a few extra features that made it interesting enough to test. I installed the product on a server that is not critical to production operations: my monitoring server. Within a few hours I started receiving alerts about slow I/O. The good news is that I was able to put two and two together and determine that the audit product I had installed was causing the I/O issue.
Finding this behavior completely unacceptable, I took the action I thought would resolve the issue: I uninstalled the product. I should be all good, right? For good measure I sent the company an email. I was going to write a product review, and I wanted them to know what I was seeing. The email was short and clear.
Do you have any added information on the impact that the tool had on the server?
At the time I could not say exactly what caused the performance issue, but I knew it was the new software for a couple of reasons. First, all the issues stopped the second I removed the tool. Second, and most telling, my SQL Server was running fine before the install; after the install I was looking at 10-second write times.
I received an email back that said:
I have talked to my technical team about this issue. They told me that our tool doesn’t impact server’s performance.
I knew immediately that this was not going to work. There was nothing more to the message, and it was obvious to me that they had not done enough testing. I planned to send them a few more emails to reach across the table and see if I could help, but as life goes, life got busy. As time went on I forgot about the problem, that is, until last week. A different issue led me to open my performance tool to see what clues I could gather, and that is when I found that my monitoring tool was struggling to do anything. I had completely forgotten about the audit tool I had removed earlier; I had removed it, so why would I even consider it…
One would think there were some obvious signs that the tool was either not ready for prime time or did not meet the standards of the tools I want to use:
- The tool wanted elevated security access.
- The supporting documentation consisted of all of about five pages of screenshots. I asked about this and was told more was coming; however, I don’t see it on the site yet.
- I had never heard of the tool and had not seen any other reviews of it.
Now let’s move a few weeks forward…
Performance issues struck again.
Well, to sum up a couple of really frustrating days, not only for me but also for the poor guys who run the support desk for my monitoring tool: we stumbled onto a few things.
First of all, the support guys at SQL Sentry thought we should take a look just to verify that there were no traces running. Easy enough…
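For anyone who wants to run the same check, a query along these lines lists the traces defined on an instance. It is a minimal sketch against the `sys.traces` catalog view, not necessarily the exact query the SQL Sentry team used, and the trace id in the commented-out cleanup lines is only a placeholder.

```sql
-- List every trace defined on the instance.
-- is_default = 1 marks SQL Server's built-in default trace;
-- status = 1 means the trace is running, 0 means it is stopped.
SELECT id,
       status,
       is_default,
       path,
       start_time,
       last_event_time
FROM sys.traces
ORDER BY id;

-- To stop and remove an unexpected trace, substitute its id
-- (3 below is a placeholder, not a real value from this server):
-- EXEC sp_trace_setstatus @traceid = 3, @status = 0;  -- stop the trace
-- EXEC sp_trace_setstatus @traceid = 3, @status = 2;  -- close it and delete its definition
```

Stopping a trace this way only lasts until the next restart if something on the server creates it again at startup.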
The results showed two traces I expected: a filtered performance trace and the default trace. But for the life of me I could not figure out what the third one was. I admit I am reaching the age where most days I can’t remember what I ate for breakfast, but I had no clue where or when I had created a trace called mytrace.trc. On top of that, judging by its creation date, the trace had started only the night before we found it; granted, the server had been rebooted. I looked for anything I thought could start this trace, and then it occurred to me to look for a startup procedure. I found a great post here with this quick query.
And there it was…
The audit tool I installed did not remove its startup procedures from the database. So when I uninstalled the tool and more or less considered the issue closed, items were left behind that caused problems weeks later. My point is: be careful what you install. I tried to be, but when I decided I needed to remove the product, I found that the company had put far more effort into rolling it out than into removing it. I should not have had to learn this lesson the hard way, but we all make mistakes. I can tell you that I will learn from this mistake, and it won’t happen again.
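If you want to check your own servers for leftovers like this, a query along the following lines is one way to do it. It is a minimal sketch, not the query from the post referenced above. It relies on the fact that startup procedures can only live in `master` and uses `OBJECTPROPERTY(..., 'ExecIsStartup')` to find them; `dbo.SomeLeftoverProc` is a hypothetical name.

```sql
-- Startup procedures can only be created in master, so look there.
USE master;

-- Every procedure flagged to run automatically when SQL Server starts.
SELECT s.name AS schema_name,
       p.name AS procedure_name,
       p.create_date,
       p.modify_date
FROM sys.procedures AS p
JOIN sys.schemas AS s ON s.schema_id = p.schema_id
WHERE OBJECTPROPERTY(p.object_id, 'ExecIsStartup') = 1;

-- Once you have confirmed a procedure belongs to a product you removed,
-- turn off its startup flag (dbo.SomeLeftoverProc is a hypothetical name):
-- EXEC sp_procoption @ProcName = N'dbo.SomeLeftoverProc',
--                    @OptionName = 'startup',
--                    @OptionValue = 'off';
```

Turning off the startup flag leaves the procedure in place, so you can review what it does before deciding whether to drop it.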