Jaya Jain, Product Manager at Zenduty

A beginner's guide to Incident Management

An incident is an event that could disrupt an organisation's services and operations or cause it to incur losses. Incident management is the process by which a team of engineers identifies, analyses and corrects such incidents and works to prevent their recurrence.

Incidents can be classified by their technical nature or by their characteristics:

Technical incidents:

  1. Compromised computing resources:
    • Operating system corruption — Caused by malware or a virus, which makes errors show up in the system.
    • User account compromises — When an individual’s account is fully or partially controlled by an unauthorised person.
  2. Exploitation via Email:
    • Unsolicited Commercial Email (UCE) — Commonly known as spam. These are generally deceptive emails sent in bulk by spambots.
    • Phishing emails — Fraudulent emails sent by scammers to obtain personal information and use it against the targeted individuals.
  3. Network and Resource Abuses:
    • Network scanning activities and denial-of-service attacks.
  4. Resource misconfiguration:
    • Vulnerable software configurations — Caused by holes in computer security; they leave systems open to cyber attacks.
    • Open proxy servers and anonymous FTP servers — As there is no encryption, data transferred through these servers is not safe.

Characteristic-based incidents:

  1. Major incidents
    Large-scale incidents are rare, but when they do occur, an incident management system should be in place to tackle them immediately so that no irrevocable damage or major losses are incurred. Speed and efficiency in dealing with these issues are vital.
  2. Repetitive incidents
    Some incidents recur despite being resolved repeatedly; they could be a sign of underlying problems in the configuration. Scripts can be created to follow a set procedure for resolving simple repetitive incidents.
  3. Severity
    Incidents are classified by severity based on safety concerns and the loss or exposure of personal data. The size of the affected community is another parameter used to gauge severity.
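
As a rough illustration of the last two categories, the sketch below flags incident types that keep coming back and assigns a severity from the three parameters mentioned above. It is only a sketch under stated assumptions: the names (IncidentReport, classify_severity, find_repetitive), the four severity levels and the numeric thresholds are hypothetical and would need to be tuned to your own organisation.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    # Hypothetical four-level scale; the article does not prescribe one.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class IncidentReport:
    safety_concern: bool         # does the incident raise a safety concern?
    personal_data_exposed: bool  # was personal data lost or exposed?
    users_affected: int          # size of the affected community


def classify_severity(report: IncidentReport) -> Severity:
    """Map the three parameters to a severity level (illustrative rules only)."""
    if report.safety_concern:
        return Severity.CRITICAL
    if report.personal_data_exposed:
        return Severity.HIGH
    if report.users_affected >= 100:  # assumed cut-off, not from the article
        return Severity.MEDIUM
    return Severity.LOW


def find_repetitive(signatures: List[str], threshold: int = 3) -> List[str]:
    """Return incident signatures seen at least `threshold` times, most frequent first."""
    counts = Counter(signatures)
    return [sig for sig, n in counts.most_common() if n >= threshold]
```

For example, find_repetitive(["disk-full", "disk-full", "cert-expired", "disk-full"]) returns ["disk-full"], which marks that incident type as a candidate for a scripted fix or a closer look at the underlying configuration.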

Functioning:
The system first makes a note of the incident. It then classifies the incident based on urgency, impact and priority, and assigns the duty of resolving it to the appropriate personnel. Finally, it controls the incident through to resolution and reports on it once the issue has been resolved. The five steps involved in resolving an issue are:

  • Incident diagnosis — The first step towards resolving an incident, in which the initial understanding and analysis of the problem take place.
  • Incident escalation — The incident is escalated for quicker resolution and is assigned to the team with the right skills to tackle it.
  • Incident investigation — The initial diagnosis and relevant information from related incidents and discoveries are put together to come up with a solution.
  • Incident resolution — The incident has been handled and is documented for future use.
  • Incident closure — The incident is filtered out of the main purview but is added to the organisation’s knowledge base in case related incidents need to be solved in the future.
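
The flow described above (log, classify, assign, then move through the five steps) can be sketched as a small state machine. This is a minimal illustration, not how any particular incident management tool works: the Incident class, the 1 to 3 scales for impact and urgency, and the impact + urgency - 1 priority formula are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Step(Enum):
    # The five steps, in the order listed above.
    DIAGNOSIS = "diagnosis"
    ESCALATION = "escalation"
    INVESTIGATION = "investigation"
    RESOLUTION = "resolution"
    CLOSURE = "closure"


STEP_ORDER = list(Step)


def priority(impact: int, urgency: int) -> int:
    """Combine impact and urgency (1 = highest, 3 = lowest) into a priority from 1 to 5.

    The scales and the formula are assumptions for this sketch.
    """
    if not (1 <= impact <= 3 and 1 <= urgency <= 3):
        raise ValueError("impact and urgency must be between 1 and 3")
    return impact + urgency - 1


@dataclass
class Incident:
    title: str
    impact: int
    urgency: int
    assignee: Optional[str] = None
    step: Step = Step.DIAGNOSIS
    history: List[Step] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Logging the incident also classifies it and records the first step.
        self.priority = priority(self.impact, self.urgency)
        self.history.append(self.step)

    def assign(self, person: str) -> None:
        """Hand the incident to the person or team responsible for resolving it."""
        self.assignee = person

    def advance(self) -> Step:
        """Move to the next step; a closed incident cannot advance further."""
        idx = STEP_ORDER.index(self.step)
        if idx == len(STEP_ORDER) - 1:
            raise ValueError("incident is already closed")
        self.step = STEP_ORDER[idx + 1]
        self.history.append(self.step)
        return self.step
```

An incident created with Incident("API down", impact=1, urgency=2) gets priority 2 and starts in the diagnosis step. Calling advance() four times takes it to closure, and the history list keeps the record that would feed the knowledge base.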

Benefits:

  1. The business is not severely affected, losses are contained and the effectiveness of the business increases.
  2. Improved monitoring and accurate assessment of service-level agreements between service providers and clients.
  3. Lost incidents and incorrectly recorded incidents are eliminated.
  4. There is a general increase in the productivity and efficiency of the organisation.
  5. End-user and customer satisfaction increases.
  6. Functional escalation of incidents becomes faster overall.
  7. It motivates the incident management team and other teams to train one another, and builds a culture of trust between them.
  8. Junior staff grow quickly, as they gain valuable knowledge of specific incident resolutions as well as the overall functioning of the system.
  9. Documentation of incidents improves in both quality and quantity, and the option of customising reports helps you accurately document specific information about the process.
  10. As the company grows, duties become more differentiated, which creates a need for new tools. The incident management process helps shed light on this need and on where to begin building those tools.
  11. Communication across various platforms is fast, which helps detect and resolve incidents quickly.
  12. Staff can access and manage the incident details from multiple devices and can link all the inter-related incidents.
  13. Procedures and the chain of command are better organised, as resources are allocated according to available staff and their skill sets.

Zenduty is a cutting-edge incident management platform designed by developers with the well-being of engineers in mind. Sign up for free here.
