All Stories

Recommended reads on Resilience engineering and SRE

Escalating Prometheus alerts to SMS/Phone/Slack/Microsoft-Teams via AlertManager and Zenduty

Prometheus is one of the most popular open-source monitoring tools, used by engineering teams around the world and backed by a robust community that continues to drive its adoption and evolution.

Site reliability engineering - what is SRE?

As companies race to build site reliability engineering (SRE) practices within their engineering teams, site reliability engineering has become one of the hottest and highest-paying jobs in tech...

On-call compensation models

Providing customers with a world-class and seamless user experience is critical for the success of any business. It is therefore important that you have a robust on-call strategy that optimizes...

Defining your Sev-1s

One of the first things to figure out when your team is formulating its incident management process is how to describe in words what a Sev0 (your highest incident priority) looks...