As a consultant, I often look at logs, job histories, and result sets, and I see errors. When I start to dive into those errors, I come across discussions that end with, "Yeah, we know it's like that; it's what we expect." I am not talking about warnings; I am talking about full-blown errors. I have even seen jobs on production servers that fail routinely, and some jobs have been designed to fail. That raises a question in my mind: why? It is easy for me to sit back in my chair, kick my feet up, and tell people this is really not the best way to do things. But as I dig deeper, I have to ask what reasons people have for doing it. The one reason I have heard that I like is validating the error alert system. This is when someone designs a failure point into their system. Many admins rely on notification paths that are outside their control, so they want to confirm that the notification path continues to work. For example, a company may use pager or email notifications to alert them that something is amiss with a database. To confirm that the complete notification path is working as expected, they create a job or process they know will fail, just so it sends an alert to the device.
When people use this method, is it really a failure? If a process is designed to throw an error message and it does raise that message, wouldn't that be considered a success? Instead of forcing a failure, couldn't they test the same path by alerting on success as well as on failure? That said, I am not a fan of processes that send messages on success on a regular basis. I think "completed successfully" and "failed" messages can start to look the same on a small device such as a phone or a pager. If someone expects a page on either failure or success, they will get a message either way and may not pay as close attention to it as they should. In my experience, a page or a message to a cell phone should be sent only when something is not right. I could be convinced that a page once a week, or even once a day, to validate that the system is working as expected is a good thing, as long as not getting the page raises an alarm with the person carrying the device.
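The "expected page" idea works only if someone notices when the page does not arrive. A minimal sketch of that check, assuming a hypothetical weekly test page with a two-hour grace window (the function name and both settings are illustrative, not part of any SQL Server feature), might look like this:

```python
from datetime import datetime, timedelta

# Hypothetical settings: one scheduled "all is well" page per week,
# plus a small grace window for delivery delays.
HEARTBEAT_INTERVAL = timedelta(days=7)
GRACE_PERIOD = timedelta(hours=2)


def heartbeat_is_overdue(last_received: datetime, now: datetime,
                         interval: timedelta = HEARTBEAT_INTERVAL,
                         grace: timedelta = GRACE_PERIOD) -> bool:
    """Return True when the scheduled test page has NOT arrived within
    interval + grace, meaning the notification path itself may be broken."""
    return now - last_received > interval + grace


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 8, 0)
    # 7 days 1 hour later: still inside the grace window, so nothing to raise.
    print(heartbeat_is_overdue(last, datetime(2024, 1, 8, 9, 0)))
    # 7 days 3 hours later: the page is overdue and the silence should be treated as an alert.
    print(heartbeat_is_overdue(last, datetime(2024, 1, 8, 11, 0)))
```

The design choice here is that the missing message, not a manufactured error, is what signals trouble, so no job has to be built to fail.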
There is a chance that I am looking at this from the wrong angle, but I believe processes should be designed to succeed, not fail.
Don't forget… This is the week Kalen Delaney will be in town, and it is going to be a great week. The Kalen event is free; all you need to do is sign up here.