Zenduty Blog

The true cost of unreliability

September 13, 2019

Amrit Balraj

Every organization functions differently, with its own approach to operations management, functionality and legal structure. However, every company, big or small, faces unplanned downtime from time to time.

There are multiple examples of large companies taking major hits from unplanned outages. One significant example is the Amazon Prime Day crash of 2018, which Axios estimated cost the company $72m-$99m in lost sales. Prime members were unable to log on to the site to participate in the day's lightning deals, causing a customer service nightmare for Amazon along with a deep hole in its pockets.

If your company has an annual turnover of $1m, outages can cost anywhere from $1,000 to $10,000 per hour of downtime. A study by research firm ITIC found that, for 98% of large enterprises, a single hour of downtime costs more than $100,000. The costs multiply when downtime hits business-critical service components such as payments, support and onboarding. These issues, if not detected and resolved immediately, can go from being a minor problem to a potential public or customer relations disaster. Companies in financial services, energy and data security are often the worst hit after an outage. Always remember: reliability has a direct correlation to your company’s growth, brand image and bottom line.
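To make these figures concrete, here is a rough back-of-the-envelope sketch in Python. The function name `downtime_cost_range` is hypothetical, and the per-hour rates are simply the $1,000 to $10,000 range quoted above. Treat it as an illustration of the arithmetic, not a costing model: real outage costs also depend on lost customers, SLA penalties and engineering time.

```python
def downtime_cost_range(hours_down, low_per_hour, high_per_hour):
    """Return a (low, high) cost estimate for an outage of the given length.

    All inputs are assumptions supplied by the caller; nothing here is
    derived from real billing or revenue data.
    """
    if hours_down < 0:
        raise ValueError("hours_down must not be negative")
    if low_per_hour > high_per_hour:
        raise ValueError("low_per_hour must not exceed high_per_hour")
    return hours_down * low_per_hour, hours_down * high_per_hour


# Example: a 3-hour outage at the $1,000 to $10,000 per hour range.
low, high = downtime_cost_range(3, 1_000, 10_000)
print(f"Estimated cost: ${low:,.0f} to ${high:,.0f}")
```

Even at the low end of the range, a single three-hour incident costs a $1m company thousands of dollars, which is why detection and response time matter so much.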

What are some strategies to improve the reliability of the services offered by your company?

Everyone prefers having their services run smoothly around the clock, but outages are all part of the game. Your team might be building the best platform in the business with complex code and interdependent systems, but it pays to be cautious in the long run (literally). The most time-tested ways to ensure that incidents don't take a chunk of your income are to:

  • Formulate realistic SLA goals, define controllable SLOs for uptime over a specified time, and establish clear SLIs and error budgets.
  • Use monitoring and measuring tools to understand factors like rate of deployment, mean time to recover (MTTR) and quality assurance.
  • Build a solid CI/CD pipeline, standardize the deployment process, and automate testing as much as possible.
  • Construct an iron-clad incident response strategy with clear roles and responsibilities, incident checklists, automations and communication channels.
  • Learn from downtime: conduct blameless postmortems of critical incidents (when/what/why/how) and institutionalize best practices within your teams.
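
Two of the numbers mentioned in this list, the error budget and the mean time to recover (MTTR), are simple to compute once you have an SLO and a record of incident durations. The sketch below is a minimal Python illustration. The function names, the 99.9% SLO, the 30-day window and the incident durations are all assumptions made up for the example, not values from Zenduty or any particular monitoring tool.

```python
from datetime import timedelta


def error_budget(slo_percent, window):
    """Downtime allowed in `window` while still meeting an availability SLO."""
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in the range (0, 100]")
    return window * (1 - slo_percent / 100)


def mean_time_to_recover(recovery_times):
    """Average recovery time across incidents, given as timedelta values."""
    if not recovery_times:
        raise ValueError("at least one incident is required")
    return sum(recovery_times, timedelta()) / len(recovery_times)


# Hypothetical inputs: a 99.9% uptime SLO over 30 days and two incidents.
budget = error_budget(99.9, timedelta(days=30))
incidents = [timedelta(minutes=10), timedelta(minutes=20)]

mttr = mean_time_to_recover(incidents)
remaining = budget - sum(incidents, timedelta())

print("Error budget:", budget)
print("MTTR:", mttr)
print("Budget remaining:", remaining)
```

A 99.9% SLO over 30 days leaves roughly 43 minutes of allowed downtime, so the two example incidents use up most of the budget. Tracking the remaining budget this way gives teams a shared, numeric signal for when to slow down deployments and focus on reliability work.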

We are building Zenduty to help your company stay ahead of the curve when it comes to reliability and support. Zenduty serves as a single source of truth for all of your alerts, helping notify the right people and minimizing confusion. Working with predefined parameters, Zenduty will help your teams monitor the incident timeline, increasing communication and visibility.


Zenduty is a cutting-edge incident management platform designed by developers with the well-being of engineers in mind. Sign up for free here.

Copyright © 2019 Zenduty Created by YellowAnt