You’re On-Call, Now What?

Posted: April 16, 2012 in Career, SQLServerPedia Syndication

I have been on call for years. At times there was a declared on-call list, and at other times there was no such list. Either way, I have always been on-call: if something is wrong with one of my databases, I am there. It’s my job, my career choice and my responsibility. I actually like having an on-call list, because it gives me a defined stretch of time when I am not the first line of defense for looking at error messages. Many people look at an on-call list and think “look at all this time I am having to work, change my plans, have my life impacted”. I look at a defined on-call schedule and think, wow, I only have to be the first level of troubleshooting outside normal hours on these days. The rest of the time I can go do whatever it is I want. If there were no defined on-call schedule, I would have to look at every single message and determine if it was a database issue, and then determine what action to take. If it was not a database issue, I would be left wondering whether the person responsible for that area was going to look at the problem. In other words, I like a clear definition of when I am the guy watching out for whatever may be threatening, and a clear definition of when I can go camping in some remote location.

On a side note…

Maybe this is an old Marine thing coming out in my personality. When I sleep, I want to know someone has got my back, so when it is my turn I will make sure others feel that same comfort. I have been spending a lot of time lately trying to determine why I have certain opinions and where I developed them.

Really what I want to accomplish with this post is a fact-finding mission of sorts. See, I have spent so many years on call, and my responsibilities while on-call have been pretty simple: I am the first line of defense for pages, alerts, and phone calls. The more I think about this, the less I think I am making the best use of my time. Sure, I may fix simple issues, or make sure that the servers are up and running. But what can I, or what should I, be doing so that the time I am already spending on-call is used to its fullest potential?

I really am curious what others are being asked to do when they are on-call.  If you have time, please leave a comment and let me know, even if it is something that I have mentioned.

I have had some very simple requirements before. For example, at one time I was asked to make sure that I was checking my email twice a day, but this was well before smartphones and email following you around everywhere you go. Now I just check email when my phone thing plays a random assortment of noises. I know this is shameful to admit, but I really do still carry a pager like the one Steve Jones (T|B) has pictured on his blog this week. In all fairness, some of the places I go when I am not working can be a little out of cell range. And, well, nothing will wake me like the screaming buzz from this pager.

Something new I am going to try next week when I start the on-call shift for a week is keeping a log.  I am going to keep track of each time someone calls, or I get a page.  My goal with this is to make sure that I am completing issues and communicating the completion of these issues to the source.  So if I get a call from employee xyz that says they cannot search something in the database, I want to know when they called, who called, what I did, and when I let them know the issue was fixed.  The other thing that I hope to accomplish with this log is having a better timeline when it comes to doing a post mortem.  Sure, when I am troubleshooting something I keep notes and copies of the logs, but what tells me when an issue was reported, or when the “all clear” was given?
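To make the idea concrete, here is a rough sketch of what one log entry could look like in Python. This is only an illustration under my own assumptions: the names `OnCallEntry`, `close`, `time_to_all_clear`, and `open_entries` are hypothetical, and a notebook or spreadsheet with the same four pieces of information (when they called, who called, what I did, when I gave the all clear) would do the job just as well.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class OnCallEntry:
    """One call or page received during an on-call shift (hypothetical layout)."""
    reported_at: datetime                    # when they called or the page arrived
    reporter: str                            # who called
    issue: str                               # what they said was wrong
    action_taken: str = ""                   # what I did about it
    all_clear_at: Optional[datetime] = None  # when I told them it was fixed

    def close(self, action_taken: str, all_clear_at: datetime) -> None:
        """Record the fix and the moment the reporter was told about it."""
        self.action_taken = action_taken
        self.all_clear_at = all_clear_at

    def time_to_all_clear(self) -> Optional[timedelta]:
        """Elapsed time from report to all clear, or None if still open."""
        if self.all_clear_at is None:
            return None
        return self.all_clear_at - self.reported_at


def open_entries(log: List[OnCallEntry]) -> List[OnCallEntry]:
    """Entries where nobody has been told the issue is resolved yet."""
    return [entry for entry in log if entry.all_clear_at is None]
```

Checking `open_entries` at the end of a shift shows which callers are still waiting to hear back, and the `reported_at` and `all_clear_at` timestamps give the post mortem its timeline.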

I have learned over the last few years that impressions are so important, and in some cases the impression is more important than the facts. Some people are really good at explaining a technical problem to non-technical people, but there are cases when it does not matter that the whole server was covered in water from the custodian in the datacenter. The customer only knows that the database was down. Who would have guessed that water could short out one of those server things?

So really I am curious as to what I am missing, even if it is not critical, just items that may make life easier.

  1. SqlAsylum says:

    I’ve had lots of different on-call situations. When I have been part of a group that rotated the on call, this is normally how it went. Whoever is on call responds to pages/texts/emails/calls. Those on call typically do the release for the week/cycle that they are on. The person on call needs to be near a computer or have access to one; for example, a laptop with a MiFi device or something similar is fine. For those drinkers out there, they need to stay sober for on-call weeks/weekends. Those not on call at the time should be free to use the time as they see fit. If you can’t meet that for your week/weekend, you should change your plans or arrange with someone else to switch with you. This has worked successfully for me in many different companies. If someone is not handling the on call and another DBA has to be called during their time, this is brought up in performance reviews/weekly status and seen as a negative on their review. I’ve always had some sort of escalation in this scenario as well, so that if you have a DBA who can’t figure it out, a senior is called in. Typically this could be the DB manager, a senior developer, or anyone who may know more about the system than you do. This person doesn’t have to adhere to the same on-call rules but should be aware that a phone consultation might come in.

    Like I said, there have been other specific situations I’ve been in, but like you I’ve been on call for the last 10+ years, whether it was for the company I worked for or the company I contracted with. Someone can always call me in the wee hours of the morning. 🙂

  2. Chris Yates says:

    I think you hit the nail on the head when you stated “when you are sleeping, you want to know someone has your back”. I’m part of a 4-man team, and I enjoy the camaraderie. I have one person who is sketchy at times, but for the most part we are all pretty good at watching out for each other.

    We have separate duties due to our traditional banking environment and a non-traditional banking environment. On a 4-week rotation, a DBA will be on call from Monday to Monday morning. We have everything set up to notify us at the point of failure, from maintenance jobs to importing/exporting data. Once a failure comes into a common DBA team email box, it is sent on to the DBA on call. The problem I see with this approach is: what if the email system goes down or is being worked on for maintenance? We are working on a separate alerting mechanism, TBD.

    We trend alerts that we receive to see if we are receiving the same ones over and over again; this gives us a good baseline to start and from that we use our change tracking system to provide a list of resolutions for all to review if needed.

    If one of us has a big weekend with upgrades or whatnot, we tend to do them as a team or in twos to help lighten the load. Another thing is documentation, which most people have a strong dislike for. If processes are documented (issues and resolutions as well), then to me it is that much easier to have the information when you need it quickly and easily. We store our DBA documents on a secured SharePoint site (ironic choice of words, I know).

    • Chris Shaw says:

      Chris, I am really glad someone saw what I was really wanting to point out. On call is so much easier in my mind as long as I remember I am helping people get their rest.
