Incidents
Incidents represent failures or disruptions detected when a monitor breaches a predefined failure threshold, such as multiple failed pings or service checks. When this occurs, an incident is automatically created to track the issue, capturing details such as the affected monitor, the time of failure, and the current status. The incident is assigned to the appropriate on-call team member for investigation and resolution. It can be updated with status changes (e.g. acknowledged, resolved) and annotated with comments to help responders coordinate.
Configuration
Incident creation is configured per monitor: an incident is created only if the option is enabled for that monitor.
Creation
Incidents are automatically created when a monitor enters a failure state, if the monitor is configured to do so. This is the default behaviour.
An incident is assigned to the on-call person of the team that the monitor is linked to. The assignee can reassign the incident to any other person in the team.
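
The creation and assignment rules above can be sketched in a few lines of Python. This is an illustration only, not the product's implementation: the class names, the `create_incidents` flag, and the reading of "failure threshold" as a count of consecutive failed checks are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Team:
    name: str
    members: list[str]
    on_call: str  # assumed to be one of `members`


@dataclass
class Incident:
    monitor_name: str
    team: Team
    assignee: str
    status: str = "OPEN"
    failed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def reassign(self, new_assignee: str) -> None:
        # Reassignment is limited to people in the incident's team.
        if new_assignee not in self.team.members:
            raise ValueError(f"{new_assignee} is not in team {self.team.name}")
        self.assignee = new_assignee


@dataclass
class Monitor:
    name: str
    team: Team
    failure_threshold: int = 3     # assumption: consecutive failed checks
    create_incidents: bool = True  # hypothetical flag; enabled by default
    consecutive_failures: int = 0
    in_failure_state: bool = False

    def record_check(self, ok: bool) -> Optional[Incident]:
        """Record one check result; return an incident if one was created."""
        if ok:
            self.consecutive_failures = 0
            self.in_failure_state = False
            return None
        self.consecutive_failures += 1
        if self.in_failure_state or self.consecutive_failures < self.failure_threshold:
            return None
        # The monitor has just entered its failure state.
        self.in_failure_state = True
        if not self.create_incidents:
            return None
        return Incident(self.name, self.team, assignee=self.team.on_call)
```

In this sketch an incident is returned only on the check that moves the monitor into its failure state, so further failed checks during the same outage do not create duplicates.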
Updating Incidents
You can view and edit incidents from the left menu. Each incident shows its current status, the time it occurred, and the duration of the outage.
The status of an incident can be set to any of the following values:
- OPEN
- ACKNOWLEDGED
- RESOLVED
- CLOSED
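
If you handle these statuses in your own scripts or integrations, an enumeration keeps the set of values in one place. The following is a minimal sketch: `IncidentStatus` and `parse_status` are hypothetical names, and no ordering or transition rules are enforced because none are described here.

```python
from enum import Enum


class IncidentStatus(Enum):
    OPEN = "OPEN"
    ACKNOWLEDGED = "ACKNOWLEDGED"
    RESOLVED = "RESOLVED"
    CLOSED = "CLOSED"


def parse_status(value: str) -> IncidentStatus:
    """Convert free-form input to a status, rejecting unknown values."""
    try:
        return IncidentStatus(value.strip().upper())
    except ValueError:
        allowed = ", ".join(s.value for s in IncidentStatus)
        raise ValueError(
            f"unknown status {value!r}; expected one of: {allowed}"
        ) from None
```

For example, `parse_status("acknowledged")` returns `IncidentStatus.ACKNOWLEDGED`, while a value outside the list raises `ValueError`.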
Any member of the team assigned to the incident can add comments. These are intended to speed up resolution of the issue by sharing information about the potential cause and the impact.
The incident assignee can also change the team assigned to the incident.
When the monitor status changes to its success/up state, the incident is automatically updated and a comment is added by the monitor polling services.
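
The automatic update on recovery might look something like the sketch below. Exactly which fields are updated is not specified here, so the example assumes only that the recovery time is recorded (which fixes the outage duration) and that a system-authored comment is appended; the status is left untouched. All names, including the `SYSTEM_AUTHOR` label, are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

SYSTEM_AUTHOR = "monitor-polling-service"  # hypothetical author label


@dataclass
class Comment:
    author: str
    text: str
    created_at: datetime


@dataclass
class Incident:
    monitor_name: str
    failed_at: datetime
    recovered_at: Optional[datetime] = None
    comments: list[Comment] = field(default_factory=list)

    def outage_duration(self, now: Optional[datetime] = None) -> timedelta:
        # While the outage is ongoing, measure up to `now` (default: current time).
        end = self.recovered_at or now or datetime.now(timezone.utc)
        return end - self.failed_at


def on_monitor_recovered(incident: Incident, at: datetime) -> None:
    """Record the recovery time and append a system-authored comment."""
    incident.recovered_at = at
    incident.comments.append(
        Comment(SYSTEM_AUTHOR, f"Monitor {incident.monitor_name} is up again.", at)
    )
```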
Deleting Incidents
An incident can be deleted after it has reached the CLOSED state.
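
The deletion rule can be expressed as a simple guard. This is a minimal sketch under stated assumptions: incidents are held in an in-memory dictionary that maps a hypothetical incident ID to its status string, and `IncidentNotClosedError` is an illustrative name.

```python
class IncidentNotClosedError(Exception):
    """Raised when deletion is attempted before the incident is CLOSED."""


def delete_incident(incidents: dict[str, str], incident_id: str) -> None:
    """Delete an incident from the mapping, but only if its status is CLOSED."""
    status = incidents[incident_id]  # KeyError if the ID is unknown
    if status != "CLOSED":
        raise IncidentNotClosedError(
            f"incident {incident_id} is {status}; close it before deleting"
        )
    del incidents[incident_id]
```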