Since the Major Incident Management Process was formalized in April of this year, we have been collecting data from each Major Incident that occurs. The goal is to recognize, analyze, and remediate the issues that disrupt service to our user base and, ultimately, to the people of Colorado.
Collection of Metrics
Following each Major Incident, we record several KPIs (Key Performance Indicators) that give a high-level view of the issues occurring across the different agencies. Some examples include:
At the beginning of each month, we submit a report to the Executive Leadership Team that presents the above KPIs graphically. Below are some examples of what we report:
Actions Based on Trends
Once a baseline is established and trends become identifiable, we can take action to reduce the number of Major Incidents and, in turn, the downtime of the services we provide. One example is the downtime caused by vendor outages. An occasional outage may go unnoticed and be quickly forgotten until you take a high-level view of the data below.
From this data we can extract what we need to approach the vendor and start asking questions. We can compare the MTTR (mean time to restore) to the SLAs and request paybacks, ask what is being done to prevent repeat issues, or even congratulate them on a job well done!
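As a rough illustration of the MTTR-to-SLA comparison described above, the following Python sketch averages restore times per vendor and flags any vendor whose average exceeds its SLA target. The vendor names, durations, SLA targets, and function names are all hypothetical placeholders chosen for the example; they are not taken from our reports or from any vendor contract.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical incident records: (vendor, minutes to restore service).
# These values are illustrative only, not real incident data.
incidents = [
    ("Vendor A", 30),
    ("Vendor A", 120),
    ("Vendor B", 45),
]

# Hypothetical SLA restore targets, in minutes, per vendor.
sla_minutes = {"Vendor A": 60, "Vendor B": 60}


def mttr_by_vendor(records):
    """Return the mean time to restore (in minutes) for each vendor."""
    durations = defaultdict(list)
    for vendor, minutes in records:
        durations[vendor].append(minutes)
    return {vendor: mean(values) for vendor, values in durations.items()}


def sla_breaches(mttr, targets):
    """Return, per vendor, how many minutes the MTTR exceeds its SLA target."""
    return {
        vendor: value - targets[vendor]
        for vendor, value in mttr.items()
        if vendor in targets and value > targets[vendor]
    }


mttr = mttr_by_vendor(incidents)
breaches = sla_breaches(mttr, sla_minutes)

for vendor, value in mttr.items():
    status = "over SLA" if vendor in breaches else "within SLA"
    print(f"{vendor}: MTTR {value} min ({status})")
```

A vendor that appears in `breaches` is a candidate for a payback request or a conversation about repeat issues; one that stays within its target is a candidate for the congratulations.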
In time, we may identify other data points to collect during a Major Incident that will provide more granular data than what you see here. Please feel free to contact our team if you have ideas or further questions.