
Meat Based Monitoring

Meat based monitoring is a hilarious name for a serious situation. I reflect on a time when I became the meat-based monitor for a major hotel chain.
illustration of neon rainbow grid swirling in on itself

Several months back the marketing department gathered for an in-person offsite in Nashville. After a long day traveling across the States, all I wanted was to put in a dinner order, check in, and loaf in my room in peace. That was the plan anyway. In the taxi ride over I attempted to look up the hotel’s dining menu and instructions. Instead of the homepage, I was greeted by this barebones error page, clearly default messaging intended only for engineers.

hideous default error page HTTP Error 500.30 - ASP.NET Core app failed to start. Common solutions to this issue: The app failed to start, the app started then stopped, the app started but threw an exception during startup. Troubleshooting steps: check system event log for error messages, enable logging the application process' stdout, attach a debugger in app process and inspect. For more guidance on diagnosing and handling these error, visit Troubleshoot ASP.NET Core on Azure App Service and IIS.

I’m not asking for Picasso-worthy designs either; even the minimalistic pink unicorn from GitHub would work. At the very least, replace the error message with something customer-friendly like “We’re aware of an issue with our site and are working on it. In the meantime, to check on reservations, call us at 555-555-5555.”

beautiful minimally styled error page for GitHub.com stating We're having a really bad day. The Unicorns have taken over. We're doing our best to get them under control and get GitHub back up and running. Links to contact support, GitHub Status page, social media link to @githubstatus



“Must be a glitch,” I thought to myself, and I did what 54% of consumers do when faced with this situation: I refreshed the page. No dice. I was still 30 minutes away and figured a total site outage must already be on their radar and would be fixed by the time I arrived. Bad assumption. When I approached the reception desk to check in, I refreshed one last time and was kind of shocked to see that the site was still unavailable. At this point I realized the hotel probably didn’t even know their site was, and had been, down, so I reported it to the concierge.

While I may not have been the first customer to notice, I was the first to report this outage, which affected the websites for the entire chain of hotels in this group. The evening ended with me rolling my suitcase upstairs, chowing down on some Southern food, and of course occasionally checking the hotel site to see when it would come back online (it took another couple of hours after check-in, fwiw). All in all, the outage had a minimal impact on my overall experience.

My inner SRE wondered when the site first went offline, how long the outage would have lasted if I hadn’t notified them, how many users also saw that default 500 error page, and what their approach to monitoring was. I will never know the answers to those questions, but I was left with a reminder that in a time when observability dominates discussions, monitoring continues to play a key, if overlooked, role.
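
For the curious, here is roughly what the non-meat version of me could look like. This is a minimal sketch of a basic HTTP uptime check, not a claim about how the hotel (or anyone) actually monitors their site. The function name `check_site`, the `opener` parameter, and the 10 second timeout are all my own assumptions for illustration; it uses only Python's standard library.

```python
import urllib.error
import urllib.request


def check_site(url, timeout=10, opener=urllib.request.urlopen):
    """Make one HTTP GET and return (is_up, detail).

    `opener` is a hypothetical hook so the check can be exercised
    without touching the network; it defaults to urllib's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses,
        # which is where a startup failure page like a 500 would land.
        return False, f"HTTP {err.code}"
    except OSError as err:
        # Covers URLError (DNS failure, refused connection) and timeouts.
        return False, f"unreachable: {err}"
    # Treat 2xx and 3xx as "up"; anything else counts as down.
    return 200 <= status < 400, f"HTTP {status}"
```

Run something like this on a schedule from outside your own network and send a page or a chat message whenever `is_up` comes back `False`, and the first report of an outage no longer has to come from a tired traveler in a taxi.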


CAT TAX

sweet Norman, my bouncy black cat, lying atop a cat tree surrounded by delicious plastic green leaves. His eyes are focused on a treat being held off camera

-- paigerduty