Nagios is one of the most widely used open-source network monitoring tools, relied on by thousands of NOC teams worldwide to monitor the health of a vast array of hosts and services. Most teams rely on email as their primary Nagios alert notification channel, and an emailed alert can take several minutes to reach the NOC team and get a response. Adoption of Microsoft Teams within organizations has skyrocketed in the last three years, and many teams have found that sending Nagios alerts to Microsoft Teams can help cut alert response times by 70–80% and overall issue resolution times by 50–60%, because the whole team can contribute tribal knowledge and triage collaboratively.

At Zenduty, many of our customers use Nagios with Zenduty to dispatch and escalate Nagios downtime alerts not just through SMS, voice calls, email, Slack, and push notifications, but also to Microsoft Teams. By routing Nagios alerts through Zenduty to Microsoft Teams, our customers ensure that the entire NOC team channel becomes aware of an incident, not just the primary on-call engineers. If an on-call engineer cannot respond within the SLA timeframe for any reason, other team members in the Teams channel can step in and become the incident commander.

Sign up for Zenduty to get real-time alerts from Nagios, set up escalation policies, and build a solid incident response pipeline for your network operations.

How do I set up end-to-end incident response for Nagios host and service downtime?

  1. Sign up for Zenduty.
  2. Create a new team and add your NOC engineers to it.
  3. Define a service within your newly created team and add the Nagios integration to it. Then configure Nagios to send alerts to Zenduty through that integration.
  4. Define an escalation policy and map your service to it.
  5. Within the same service, add a Microsoft Teams connector integration.
  6. Also, add Microsoft Teams as a personal contact method for each on-call engineer.
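
Step 3 is the handoff point between Nagios and Zenduty. Zenduty's Nagios integration comes with its own setup instructions; the Python sketch below only illustrates the general shape of a Nagios notification command that forwards an alert as JSON over HTTP. It assumes Nagios runs with enable_environment_macros=1, so that macros such as NAGIOS_HOSTNAME and NAGIOS_SERVICESTATE are exported as environment variables. WEBHOOK_URL and the payload field names are hypothetical placeholders, not Zenduty's API.

```python
import json
import os
import urllib.request

# Hypothetical endpoint: substitute the URL from your own integration settings.
WEBHOOK_URL = "https://example.com/hypothetical/nagios-webhook"


def build_payload(env):
    """Build a JSON-serializable alert from Nagios environment macros.

    The keys of the returned dict are illustrative assumptions.
    """
    service = env.get("NAGIOS_SERVICEDESC", "")
    # Service notifications carry SERVICESTATE/SERVICEOUTPUT; host ones carry HOST*.
    state = env.get("NAGIOS_SERVICESTATE") if service else env.get("NAGIOS_HOSTSTATE")
    output_key = "NAGIOS_SERVICEOUTPUT" if service else "NAGIOS_HOSTOUTPUT"
    return {
        "host": env.get("NAGIOS_HOSTNAME", "unknown"),
        "service": service or None,  # None marks a host-level alert
        "state": state or "UNKNOWN",
        "notification_type": env.get("NAGIOS_NOTIFICATIONTYPE", "PROBLEM"),
        "output": env.get(output_key, ""),
    }


def send(payload, url=WEBHOOK_URL):
    """POST the payload as JSON; urlopen raises HTTPError on 4xx/5xx responses."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


if __name__ == "__main__" and "NAGIOS_HOSTNAME" in os.environ:
    # Only forward when actually launched by Nagios with environment macros set.
    send(build_payload(os.environ))
```

Nagios would run a script like this as the command_line of a notification command assigned to the contact that represents Zenduty; the exact command and contact definitions Zenduty recommends may differ.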

How it all comes together

How the integration with Microsoft Teams catalyzes your incident response

Whenever Nagios detects host or service downtime, it sends an alert to Zenduty, which converts it into an incident and executes the service's escalation policy. The on-call engineers receive alerts via SMS, phone (voice call), email, and push notifications. Zenduty also posts a message to the Teams channel (for example, a channel named “Support Channel”) and alerts the on-call engineer in a one-to-one direct message from the bot.

When the on-call engineer or anybody in the channel sees the alert, they can acknowledge the incident from Teams itself to prevent further alert escalations. If the on-call engineer needs any help with triaging or mitigating the incident, they can add subject matter experts and stakeholders as responders to the incident. Zenduty will then alert them on all channels including Teams. You can also set custom priorities and subscribe to SLA alerts on Teams itself.
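
The escalation and acknowledgment behavior described above can be modeled in a few lines. This is a simplified, hypothetical model, not Zenduty's implementation: an escalation policy is a list of levels, each firing a fixed number of minutes after the incident opens, and an acknowledgment stops every level that has not fired yet. The Level class, the notified function, and the example names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Level:
    delay_min: int       # minutes after the incident opens when this level fires
    targets: List[str]   # people (hypothetical names) alerted at this level


def notified(policy: List[Level], elapsed_min: int,
             acked_at_min: Optional[int] = None) -> List[str]:
    """Return everyone alerted so far under this simplified escalation model.

    A level fires once elapsed_min reaches its delay, unless the incident was
    acknowledged before that delay, in which case escalation stops there.
    """
    alerted: List[str] = []
    for level in sorted(policy, key=lambda lvl: lvl.delay_min):
        if level.delay_min > elapsed_min:
            break  # this level (and all later ones) has not fired yet
        if acked_at_min is not None and acked_at_min < level.delay_min:
            break  # acknowledged before this level fired: stop escalating
        for target in level.targets:
            if target not in alerted:
                alerted.append(target)
    return alerted
```

For example, with a policy that alerts the primary on-call engineer at minute 0, a secondary at minute 10, and a NOC lead at minute 30, an acknowledgment from Teams at minute 12 means the NOC lead is never paged.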

After the on-call engineer remediates the issue, they can resolve the incident on Zenduty. Alternatively, if the Nagios metric returns to its normal range, Zenduty automatically resolves the incident and stops all alerts and escalations for it.
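
Auto-resolution works because each Nagios notification identifies the host (and, for service checks, the service) it belongs to, so a RECOVERY notification can be matched to the open incident it closes. The sketch below is a hypothetical in-memory model of that lifecycle, assuming incidents are keyed by host and service; Zenduty's actual deduplication and resolution rules may differ. PROBLEM, ACKNOWLEDGEMENT, and RECOVERY are real Nagios notification types, while the IncidentTracker class and its state names are illustrative.

```python
from typing import Dict, Optional, Tuple

# (host, service); service is None for host-level alerts.
Key = Tuple[str, Optional[str]]
OPEN_STATES = ("triggered", "acknowledged")


class IncidentTracker:
    """Hypothetical tracker mapping Nagios notifications to incident states."""

    def __init__(self) -> None:
        self.incidents: Dict[Key, str] = {}

    def handle(self, host: str, service: Optional[str],
               notification_type: str) -> Optional[str]:
        key = (host, service)
        current = self.incidents.get(key)
        if notification_type == "PROBLEM":
            # A repeat PROBLEM for an open incident is deduplicated, not reopened.
            if current not in OPEN_STATES:
                self.incidents[key] = "triggered"
        elif notification_type == "ACKNOWLEDGEMENT" and current == "triggered":
            self.incidents[key] = "acknowledged"
        elif notification_type == "RECOVERY" and current in OPEN_STATES:
            # The check is back to normal: auto-resolve, which ends escalation.
            self.incidents[key] = "resolved"
        return self.incidents.get(key)
```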

Nagios+Zenduty+Teams = better reliability

With Zenduty’s Nagios and Microsoft Teams integrations, NOC teams can minimize their mean time to respond to and resolve critical network issues. Zenduty helps you institutionalize reliability within your operations, and rapid response to downtime is critical for business continuity and for maintaining your customers’ trust.

Sign up for Zenduty to get real-time alerts from Nagios, set up escalation policies, and build solid incident response processes for your network operations.