In the last post, on-call onboarding "the good", I mused about 🌈 the good 🦋 practices and processes I've experimented with for onboarding both new engineers and engineers new-to-the-org.
so keeping with the whole "the good, the bad, and the ugly" thing i've got going on in this mini-series, today's post will be about the bad wrt on-call onboarding and pager life.
the very first thing that comes to mind
if you were active on Twitter this spring you may have seen my tweet about this
it was a throwaway tweet reacting to another ops pal sharing the enormous burden and impact an understaffed rotation was having on their life, blended with my own experiences and the stories I hear from friends, colleagues, acquaintances.
based on anecdata, I consider it to be the biggest driver of burnout within tech.
not having on-call onboarding at all
i find it quite odd and stressful that so many companies out there just expect their engineers to hop into a rotation without any training.
to me this would be like handing a teenager car keys without having them pass a learner's permit test first.
but paigerduty, you say, we have a shadow rotation and a whole incident response guidebook
THAT IS NOT ENOUGH. ahem. a shadow rotation should be like releasing animals into the wild. they should have the skills to fend for themselves!
you've got to equip them with skills. you do not want any engineer looking at a page and thinking "I don't understand any of this".
so why is that not enough? for starters, every company i've been at has Lego'd together different monitoring components, let alone tooling. if you ever want to see true horror on a dev's face, take someone used to the walled garden of enterprise monitoring tools and show them OSS monitoring tools.
this is a cutesy phrase for an abhorrent reality.
alerts, which normal humans associate with, like, horrible blaring beeps, are something we in tech have decided it's super cool to just pipe to Slack and ignore.
can you imagine actually hooking up your org's alert channels to something that created sound? shudder.
at a certain point, with a mountain of alert noise, primary on-callers kinda give up. it feels impossible.
i've genuinely suggested deleting all alerts and starting fresh, kind of like filing for Chapter STFU Alert Bankruptcy. for some reason no one's taken me up on it.
why do spammy alerts make for shitty on-call onboarding?
glad you asked ;)
before you normalize to the noise - you have to develop your system spidey sense to understand what alerts indicate REAL DANGER and what alerts just "go off every deploy" or whatever.
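if you want a head start on that spidey sense, here's a rough sketch of how you could flag the "goes off every deploy" alerts from history. big caveats: the alert names, timestamps, and the 15 minute window are all made up for illustration, and `deploy_noise_ratio` is a hypothetical helper, not something from any monitoring tool. you'd have to export firing and deploy times from whatever your org actually uses.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def deploy_noise_ratio(firings, deploys, window=timedelta(minutes=15)):
    """Return {alert_name: fraction of firings that landed within `window` after a deploy}."""
    near_deploy = defaultdict(int)
    total = defaultdict(int)
    for name, fired_at in firings:
        total[name] += 1
        # count the firing if it happened at or shortly after any deploy
        if any(timedelta(0) <= fired_at - deployed_at <= window for deployed_at in deploys):
            near_deploy[name] += 1
    return {name: near_deploy[name] / total[name] for name in total}


# made-up history, purely for illustration
deploys = [datetime(2024, 1, 8, 10, 0), datetime(2024, 1, 9, 14, 0)]
firings = [
    ("HighLatency", datetime(2024, 1, 8, 10, 3)),
    ("HighLatency", datetime(2024, 1, 9, 14, 5)),
    ("DiskFull", datetime(2024, 1, 8, 23, 40)),
]

ratios = deploy_noise_ratio(firings, deploys)
for name, ratio in sorted(ratios.items()):
    print(f"{name}: {ratio:.0%} of firings were right after a deploy")
```

an alert sitting near 100% is a decent candidate for "probably deploy noise, ask someone before panicking". it's a heuristic, not proof, so a new on-caller should still sanity check it with someone who knows the system.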
no concept of a centralized info source on services + infra
yes this could be what they call a "service catalog" but to de-mystify things I see it as really a big directory of who is on-call for what (notice how I didn't say own!), links to high level docs and data signals, and crucially the zillion places you can find monitoring data and how to interpret it.
you don't need a fancy web app to do this...one of the better ones I've seen was brilliantly designed in genuinely my fave Atlassian product, Confluence! If you're not up on the Page Properties Macro then welcome to the light my friend
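to make "big directory" concrete, here's a minimal sketch of the shape of the data. everything in it is an assumption for illustration: the `ServiceEntry` fields, the service and rotation names, and the example.com URLs are all invented, and this isn't modeled on any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    name: str
    on_call_rotation: str  # who gets paged for it, not who "owns" it
    docs_url: str          # link to the high level docs
    # where the monitoring data lives -> how to interpret it
    signals: dict[str, str] = field(default_factory=dict)


# hypothetical catalog with a single made-up service
CATALOG = {
    "checkout-api": ServiceEntry(
        name="checkout-api",
        on_call_rotation="payments-primary",
        docs_url="https://wiki.example.com/checkout-api",
        signals={
            "https://dashboards.example.com/checkout": "p99 latency above 2s is bad",
            "https://logs.example.com/checkout": "search for level=error",
        },
    ),
}


def where_do_i_look(service: str) -> str:
    """Summarize who is on-call and where the data lives for one service."""
    entry = CATALOG[service]
    lines = [
        f"{entry.name}: on-call is {entry.on_call_rotation}",
        f"docs: {entry.docs_url}",
    ]
    for url, hint in entry.signals.items():
        lines.append(f"signal: {url} ({hint})")
    return "\n".join(lines)


print(where_do_i_look("checkout-api"))
```

the point is less the code and more the fields: on-call, docs, and every signal paired with a hint on how to read it. a wiki table can hold exactly the same thing.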
Want a high level table view of all of something? In this case, a PNW Fiber Seller profile. look how nice and structured the info is....I wonder if it's using a template...
clicking into one and we can see a nice lil table on each PNW Fiber Seller page, how handy. Wonder what's going on under the hood...
aha! Our new BFF, the Page Properties macro!
tying it all together on that main TOC page is the Page Properties Report.
if this blew your mind...then oh gosh I'll just have to become a Confluencer aka confluence influencer. just lmk.
no holiday compensation
so why holiday and not anything else? well for one I'm American and can only dream of getting EU worker protections. secondly it's what I've seen be achievable in practice and tbh the spot bonus did make me feel valued for my shift. it recognized that this is an additional burden to spend your holiday holding ze pager.
no training on monitoring tool
look at the difference between the airplane cockpit and monitoring "dashboard" for a smol plane vs a big jet
they're different aircraft and require different levels of visibility for the same general thing (flying in the air)
Would you expect a smol plane pilot to confidently walk in and fly that big jet?
tbh leaving the walled garden of Fancy Enterprise Monitoring TM tooling and diving into the OSS world can have as much of a UX shock as above (and vice versa).
and if there's any tool to have sharp and at the ready ... I am obvs biased but would say whatever your org uses for monitoring/observability.
so there we have it...a non-exhaustive list of "the baaaad" of on-call onboarding. in the final post i'll wrap up with "the ugly" and if you missed part 1 "the good" <- click that handy lil' link