So you are starting a new job as an SRE and expect to go on call any day now. This is your first time in the role, and you want to prepare yourself and your family for it. Being on call comes with unique challenges, and it pays to be ready for them.
Be good to yourself
When on call, you are likely to be woken up in the middle of the night to solve a complex problem. While your sleep will be broken, there is no reason you can't be comfortable working.
Fix a routine
Know yourself. For example, if you are slow/groggy for the first 2 minutes when you wake up, set a routine that will help you. Maybe power up your laptop, get a glass of water, and use the facilities before you look at a screen. Give yourself some time to boot the same way you would your laptop.
Talk with your manager about being able to sleep in or having a shorter work day after a noisy on-call day. Make sure to get adequate rest when you are not on call, so the sleepless nights are not so much of a strain on your health.
Set Your Surroundings
Whether or not you can allocate a room/desk to work late nights in, make sure to keep your laptop fully charged and plugged in at the desk/table you will be working from.
Keep water and snacks handy to help your mind and body wake up. On cold nights, keep a robe or a throw around so you aren't freezing while working.
Plan Social Commitments
Avoid travelling while you are on call as much as you can; it keeps your getaways peaceful. If you must travel, make sure you can pull over and work without a power source for a few hours. Carrying an additional battery may prove more useful than you think.
Invest in a good data plan from a reliable operator so you can create a hotspot wherever you need one. When social commitments are unavoidable, try swapping shifts with a co-worker. Planning your commitments in advance can be a real stress-buster. Buy a bag that fits all of your things (laptop, chargers, extra batteries, power strips, notebooks, etc.) without hurting your back and shoulders.
Be good to your work
Being on call comes with a responsibility not just to fix an error, but to prevent it from occurring again. While the exact process for doing good work while on call may vary, here are a few general things to keep in mind when you are the one trying to plug a hole in the system.
Do your homework
Know the infrastructure at your organisation well. Find out who does what, so you can quickly loop someone in if you know the problem is because of a code change they made. Review any alerts that have happened over the previous shift to see if there is a trend.
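Spotting a trend in the previous shift's alerts can be as simple as counting which ones fired repeatedly. Here is a minimal sketch, assuming you can export those alerts as a list of records; the field names `service` and `name` and the sample data are hypothetical, not any particular tool's format.

```python
from collections import Counter

# Hypothetical alert records from the previous shift. The field names
# ("service", "name") are assumptions, not a real tool's export format.
previous_shift_alerts = [
    {"service": "checkout", "name": "HighLatency"},
    {"service": "checkout", "name": "HighLatency"},
    {"service": "search", "name": "DiskAlmostFull"},
    {"service": "checkout", "name": "HighLatency"},
]


def recurring_alerts(alerts, min_count=2):
    """Return ((service, name), count) pairs seen at least min_count times,
    most frequent first."""
    counts = Counter((a["service"], a["name"]) for a in alerts)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


for (service, name), n in recurring_alerts(previous_shift_alerts):
    print(f"{service}: {name} fired {n} times")
```

Anything that shows up in this list is worth reading up on before your shift starts, since it is likely to page you too.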
When you are about to start your on-call, ensure you know if you have a backup, and who they are. Find out who you can contact in case you need to escalate something. Save all these numbers on your phone. If you have 24x7 support from external partners, ensure their support numbers are in your phone.
Don't be afraid to ask for help
If you have a problem during the night and you can't solve it, or you feel uneasy about the solution, call someone else on the on-call team or another team that can fix it. It's better to fix it right than to spend the next day cleaning up a mess. Remember to thank them for the help.
Rely on your teammates, tools and tuning. Make notes - if it goes long, you're going to be turning over an issue to someone, and you will be tired when you do. They will thank you for having kept notes.
Fix it for the long run
Try to prevent the problem you just fixed from recurring: improve monitoring or automation, or otherwise close the gap, so that you and others can keep on sleeping. It is really important to have a culture of holding postmortems after incidents are resolved, to prevent the system from breaking down again for the same reason.
Make sure that there are differences between actionable items and notification items in alerts. If you classify your notifications well, you and your teammates will sleep better. Zenduty allows you to customize Escalation Policies so only the right people are alerted at the right time.
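To make the distinction between actionable items and notification items concrete, here is a minimal sketch of a routing rule. The `Alert` fields, the severity levels, and the rule itself are all assumptions for illustration; they are not Zenduty's API or anyone's recommended policy, and your team should agree on its own criteria.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    severity: str          # assumed levels: "critical", "warning", "info"
    customer_facing: bool  # assumed flag: does this affect users right now?


def route(alert: Alert) -> str:
    """Return "page" for alerts that need action now, "notify" otherwise.

    Illustrative rule only: wake someone up just for critical,
    customer-facing problems; everything else waits for the morning.
    """
    if alert.severity == "critical" and alert.customer_facing:
        return "page"
    return "notify"


print(route(Alert("CheckoutDown", "critical", True)))
print(route(Alert("DiskAt70Percent", "warning", False)))
```

Writing the rule down, even this simply, makes it easy to review in a postmortem: every page that did not need action at night points to a condition to tighten.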
Do your best, and don't fret about the rest
You are going to get bad calls. Issues that should never have been paged out. Issues that aren't even remotely in your realm of responsibility. Take a deep breath before you answer the page or phone call, and be a professional. Nobody is out to get you, probably; yelling at someone for being stupid at 02:30 is counterproductive and wastes time you could spend sleeping. Turf it or fix it, then note it and move on.
Don't let it get to you that production is "down". You didn't do anything to cause it to go down; you woke up in the middle of the night, took the call, and are doing what you can to fix it. You are the good guy in that situation. Don't let worried stakeholders pressuring you to "fix it faster" get to you. Take a deep breath, use your troubleshooting skills, and work the problem. The amount of time it takes to fix is the amount of time it takes.
Be good to your family
- Set clear expectations. Don't lie about or underestimate the impact of your work on your day-to-day life.
- Try to avoid disturbing your partner as much as possible.
- Consider getting yourself a wristband that connects to your work phone, so that it vibrates instead of the phone ringing. The same works for alarms if you have a scheduled change.
- When you are not on call, enjoy your family. Eventually they will understand when you are on and off for them.
- Let your family know a week before your rotation starts. Plan dinners, kid pickups, and other routines in advance for smooth sailing.
- Ask your partner to stay in bed and call them. Walk around the house talking to them, until they can't hear your voice outside the phone- that's likely your work spot (speak a bit louder than usual).
- Invest in earplugs! If your partner is a light sleeper, it may help them sleep through your long nights.
Zenduty is a cutting-edge incident management platform designed by developers keeping the well-being of engineers in mind.