Over the last couple of days on the SSWUG Newsletter there has been some great discussion on SAN Solutions and SQL Server. (If do not get the newsletter, you can at no charge by signing up here.) I have included today’s editorial, but I would recommend that you check out the discussion. In my opinion this is one of the more important topics that are being talked about over the last couple of years. Technology including hardware keeps getting better and better. But I have been in environments where companies have adopted new technology but have not invested in the training for the staff that was expected to run it. The end result is, who does this responsibility fall on? If it’s a storage device and your database is on it, you cannot just pass the buck and say it’s the SAN Admin’s fault. You have to own up and find the source. Jeremy had a great comment that I thought fit in well with mine.
I think there is a great need for more DBA’s out there to adopt the approach that Brent Ozar took. He looked at the new storage that was going to host his databases and studied it, became an expert and is now trying to get the rest of us to catch up. When I first talked to Brent I was excited to see that he was talking about the subject of Databases on SAN. I have been too many conferences and have listen to a few speakers try to capture this subject. None of them did it as well as Brent.
SAN Solutions, SQL Server and Many Misconceptions…
From our own Chris Shaw: “I think the comments that you have received to this point are great. There is a huge need there for SAN Administrators to understand SQL Server, or even SQL Server DBA’s to start learning SANS. Either way in the past few years I have seen this as being a huge gap. When I was working with a SAN not too long ago, the SAN Administrator would create a RAID and then many LUN’s on that RAID. These LUNS would be owned by different servers. In some cases this would not have been a big deal if the servers and solid busy times and times when there was no activity. So you could put 2 machines on the same RAID no matter the flavor of RAID.
This company clearly did not understand that if you have 10 spindles on a server and then you upgrade it to a SAN where it may have 12 spindles, but it has to be shared with another server that used to have 10 spindles… Well now what was 20 spindles for 2 servers is now 12 spindles for 2 servers. The end result was performance problems like we had never seen before. We had the SAN Vendor come in give us the configuration recommendation and when they left all they did was move the LUNS around. The Spindles were still shared.“
Jeremy: “This is a great topic.
Steven’s comment about using the word spindle has been very effective for me as well. Also helping to educate the SAN admin’s on why I request different raid configurations for different files has been very effective. By explaining that a raid 1+0 configuration is helpful for logs due to their contiguous writes versus having data on raid 5 due to the decreased cost (less spindles needed) and the random read / write nature of data files.
David’s comment is also very interesting. I’ve been part of many SAN implementations over the years and I’ve only once seen the performance degrade when moving to a SAN. This was due to the SAN administrators creating one huge raid group (5) and putting exchange, databases and file shares all together.
At the end of the day, I believe that the responsibility for this to be effective rests on the owner / manager of the database system. Typically that’s the DBA. If a DBA doesn’t understand the topology of storage, be it NAS, SAN, SCSI, FIBER etc… it is their job to get up to speed on all of it. From a hardware performance standpoint it is nearly always the bottleneck, in my experience.
One quick lesson learned (the hard way). Even the best SAN guys don’t always understand the underpinnings of the technology and metrics provided by the SAN vendor. The experience that I’m referring to had to do with the SAN monitoring software showing performance capacity at 30 – 40 percent. Meaning that the SAN was reporting that it was only busy that percentage of time; but from the O/S / RDBMS side, I was noticing queue lengths / waits. It took over a week for me to convince them to give me a login to the monitoring software. Once I had it, I was able to find a statistic that explained everything… it was busy time per disk. On one of the raid groups; the group was reporting 40% busy, yet each disk was reporting > 80% busy. Meaning that the disk was thrashing and killing performance.
There are many other stories out there which are similar and some new ones with NAS. However, at the end of the day, SANS are simply awesome for databases.“