How many times have our sweet dreams of endless free time, spent doing whatever we wish, been interrupted by a message or a phone call saying that a server is down? I can tell you that it happens to me more often than I would like. In the past I have been pulled away from a Mother’s Day dinner and called off the top of a mountain. Heck, my cell phone, which does not appear to work in my house if I lean my head to the left, somehow managed to find me 2 miles out in the ocean when I was on a cruise (now that is a blog for a different time).
Yesterday I was reviewing one of the servers that I monitor and found that there had been a cluster failover. I scoured the SQL Server log to validate that the server was back online and that everything was OK. Once that was done, I went into discovery mode. What caused my 2:00 AM wake-up call? Well, the actual cause is still unknown, but I did stumble onto something I thought I would mention.
What makes server hardware server hardware is mostly the redundancy that is built into it: things like redundant storage, redundant fans, and even redundant power supplies. So, as my brain churned, I decided to research why servers fail. I was looking for statistics that could point me in one direction or another. What follows is not scientific at all, because I could not find exactly what I was looking for. Here are the causes I did find mentioned.
- Disk failures
- Power supply failures
- NIC failures
- Software failures
- Bad memory
In the end, I could not find any real statistics on the most common causes of server crashes or failures. I think I am going to try running a survey to see if I can get some good data on why this happens, and on how we can take back control of our lives and our sleep.