All Stories

Site Reliability Engineering - Why you should adopt SRE

Site reliability engineering is a term coined by Google engineer Benjamin Treynor in 2003, when he was tasked with making sure that Google services were reliable, secure and functional. He...

Relationships between Operations and Development Teams

Modern businesses are evolving rapidly with the advent of cloud, CI/CD and microservices. However, there is still a wide and obvious divide between principal business stakeholders and development teams. Development...

ChatOps - The future of collaboration

ChatOps is the use of chatbots to unify communication and collaboration. Through ChatOps, every member of a team is aware of what the other members are working on....

Post Mortems - Bringing clarity to incident reviews

An incident post mortem is known by many names: incident review, root cause analysis (RCA), learning review. But what do they entail? A post mortem is a post-incident activity to...

The importance of Incident Roles

Modern technology organizations are required to be adaptive in their approach to incident management. A single project will have multiple teams working as different branches on integrated systems. Even if...

Fostering blamelessness at the workplace

An essential lesson every business (of any size) learns is that failure is inevitable at some point in the production cycle. There might be times when things go haywire at...