Roxana González May 24, 2022
- 6 min read
Incident management is the process that the IT service management area follows to respond to a service disruption and restore the service to normal as quickly as possible, minimizing the negative impact on the business. An incident is a single unplanned event that causes a service disruption, whereas a problem is a cause or potential cause of one or more incidents, as defined by ITIL incident management guidelines.
Incident management differs from problem management in that it revolves around addressing specific disruptive events in real time, whereas problem management focuses on minimizing and preventing the root causes of those events.
As this shows, incident and problem management are related but are not the same. Incident management processes are mainly used by service desk teams, from the moment the incident ticket is received. The service desk is the single point of contact for end users to report incidents.
Incident Management lifecycle
The incident management process refers to the company's guidelines or framework for identifying and responding to a service outage or other disruptive event. An incident management process starts when a user reports an issue and ends when a service desk team member solves that issue.
There are two major industry-standard incident response frameworks: NIST and SANS. A sound incident response strategy also helps to attain good problem management standards.
NIST stands for National Institute of Standards and Technology. Its incident response plan consists of four steps:
- Preparation
- Detection and Analysis
- Containment, Eradication, and Recovery
- Post-Incident Activity
SANS stands for SysAdmin, Audit, Network, and Security. It is a private organization that focuses on security incidents. Its incident response process consists of six steps:
- Preparation
- Identification
- Containment
- Eradication
- Recovery
- Lessons Learned
The two incident management processes share several similarities. In both cases the first step is preparation, which requires compiling all the assets and ranking them according to their level of importance. The next task within preparation is to create an incident response plan for each type of event and for similar incidents.
Then, it is necessary to create a communication strategy and identify whom to contact, and how, depending on the situation. Incident responders should take this into account from the onset of the incident. This is also important for problem management.
Identification/Detection and Analysis
The second step in both standards requires identifying the incident. Once this is done, the incident management team should evaluate what caused the breach, so that the unexpected issue becomes a known error that can be prevented in the future. All this data will eventually help problem management as well.
Containment, Eradication, and Recovery
NIST groups these three steps into one, whereas SANS describes them separately. Containment seeks to stop the incident or breach as soon as possible to limit the damage it may cause. Eradication applies once the breach has occurred or the threat actor is within the system: the incident management team must remove the threat so that it does not spread to other areas. Recovery seeks to restore the system to the level of performance it had before the disruption occurred.
Post-Incident Activity or Lessons Learned
This final step, although named differently, is shared by the NIST and SANS approaches.
It refers to the moment the organization analyzes the situation in order to learn from experience. The aim is to understand how to respond better to future security incidents, or any type of incident, and to record the improvements that need to be made in a document that will serve as a guideline in the future, both for incident handling and problem management. Documenting known errors helps prevent future incidents.
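The relationship between the two frameworks can be sketched in a few lines of Python. This is only an illustration, not an official artifact of NIST or SANS: the names `NistPhase`, `SANS_TO_NIST` and `sans_steps_for` are hypothetical, and the mapping simply encodes the grouping described above.

```python
from enum import Enum


class NistPhase(Enum):
    """The four NIST phases, in order."""
    PREPARATION = 1
    DETECTION_AND_ANALYSIS = 2
    CONTAINMENT_ERADICATION_RECOVERY = 3
    POST_INCIDENT_ACTIVITY = 4


# Hypothetical mapping from each of the six SANS steps to the NIST phase that covers it.
SANS_TO_NIST = {
    "Preparation": NistPhase.PREPARATION,
    "Identification": NistPhase.DETECTION_AND_ANALYSIS,
    "Containment": NistPhase.CONTAINMENT_ERADICATION_RECOVERY,
    "Eradication": NistPhase.CONTAINMENT_ERADICATION_RECOVERY,
    "Recovery": NistPhase.CONTAINMENT_ERADICATION_RECOVERY,
    "Lessons Learned": NistPhase.POST_INCIDENT_ACTIVITY,
}


def sans_steps_for(phase: NistPhase) -> list[str]:
    """Return the SANS steps that fall under a given NIST phase, in SANS order."""
    return [step for step, mapped in SANS_TO_NIST.items() if mapped is phase]


print(sans_steps_for(NistPhase.CONTAINMENT_ERADICATION_RECOVERY))
```

Keeping the mapping in one table makes it easy to report progress in either vocabulary, whichever framework the organization has adopted.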
The five most common incident management issues
- Plans are not customized to the organization
Sometimes organizations put into practice standard incident resolution plans that are not tailored to their context or needs. Many ready-made plans are just ineffective or not well adapted to the company.
Recommendation: it is necessary to determine processes and strategies adapted to the type of business, objectives, environment and culture. It is important to either create from scratch or adapt a standard taking into account all the variables mentioned. An incident manager should then be able to put all this into practice.
- Lack of prioritization
Lack of prioritization increases the risk of missing critical incidents. Because resources are limited, it is important to prioritize and to differentiate critical from non-critical incidents. This should also be taken into account when setting out a problem management strategy.
Recommendation: organizations should establish a clear prioritization scheme, so that the teams know what should be addressed first. It is also recommended that an incident manager help automate responses as much as possible.
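One way to make such a prioritization scheme concrete is a queue that always surfaces the most critical open incident first. The sketch below is a minimal Python illustration under assumed conventions: priority levels run from 1 (critical) to 4 (low), and the `Incident` and `TriageQueue` names are hypothetical rather than taken from any particular service desk tool.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Incident:
    priority: int                        # assumed scale: 1 = critical ... 4 = low
    sequence: int                        # arrival order, used to break ties
    summary: str = field(compare=False)  # not used when comparing incidents


class TriageQueue:
    """Hands out incidents by priority first, then by arrival order."""

    def __init__(self) -> None:
        self._heap: list[Incident] = []
        self._counter = itertools.count()

    def report(self, summary: str, priority: int) -> None:
        heapq.heappush(self._heap, Incident(priority, next(self._counter), summary))

    def next_incident(self) -> Incident:
        return heapq.heappop(self._heap)


queue = TriageQueue()
queue.report("Printer on floor 2 is offline", priority=4)
queue.report("Email service is down for all users", priority=1)
queue.report("Shared drive is slow", priority=3)

first = queue.next_incident()
print(first.summary)  # the priority 1 incident, although it was reported second
```

The tie-breaking counter means that two incidents with the same priority are handled in the order they were reported.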
- Poor communication strategy and ways to collaborate
It is crucial to know what should be communicated, and to whom, in order to respond to an incident. Some organizations rely on emails or spreadsheets, and the same information is sent many times, causing an overflow of messages that is ineffective and does not foster collaboration.
Recommendation: a clear communication strategy should be laid out to attain effective incident management. The first step is for service desk teams to publish relevant data in a shared portal. It might be useful to use a centralized panel where all the latest details about the incident are clearly presented. In this way, all the key actors can get the necessary information at once, without delay. This leads to better collaboration and teamwork, so that the response time is reduced.
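The idea of a centralized panel can be illustrated with a single shared record that everyone reads from, instead of copies of the same message. This is a minimal Python sketch with hypothetical names (`IncidentPage`, `StatusUpdate`); it is not the data model of any specific product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class StatusUpdate:
    author: str
    message: str
    posted_at: datetime


@dataclass
class IncidentPage:
    """One shared record per incident; updates are appended, never re-sent."""
    title: str
    updates: list[StatusUpdate] = field(default_factory=list)

    def post(self, author: str, message: str) -> None:
        self.updates.append(StatusUpdate(author, message, datetime.now(timezone.utc)))

    def latest(self) -> Optional[StatusUpdate]:
        """Return the most recent update, or None if nothing has been posted yet."""
        return self.updates[-1] if self.updates else None


page = IncidentPage("Email service outage")
page.post("service desk", "Investigating reports of undelivered mail")
page.post("service desk", "Mail queue restarted, monitoring delivery")
print(page.latest().message)
```

Because every update lands in the same place, a stakeholder who joins late can read the full history instead of asking for it to be sent again.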
- The response tools are inadequate
Some organizations have inadequate or outdated tools to solve incidents. At times, even when the tools are updated, they might not be properly used by the service desk teams and the rest of the personnel either because they lack training or because they are not suited to the business.
Recommendation: members of the company that deal with incident management should receive proper training to be able to use all the necessary tools to perform their duties. It is important to regularly evaluate the tools to see if they need to be updated or if they are suited to respond to the threats the organization is exposed to.
Additionally, it's important to have suitable response tools such as InvGate Service Desk.
- The incident response team doesn't have authority
Incident response teams must escalate issues to different areas of management to get the support they need. They need partners, executives and other upper layers of management to be informed of the issues and solutions being developed and then make sure this information leads to their support. This might change management in a positive way.
Recommendation: it is important to lay out an automated communication channel with management so that they are well informed and ensure their support to the incident response team through the whole process. They might also need to contact other areas to facilitate their work during the response process.
Companies use top-of-the-line ITSM solutions such as InvGate Service Desk to effectively manage, communicate, and prevent incidents. The software is able to identify incident types as problems and tackle potential issues early, before they become problems.
Frequently asked questions about incident management
What is an incident?
An incident is a single unplanned event that causes a service disruption.
What is the difference between an incident and a problem?
A problem is the root cause of one or many incidents. Problem management processes therefore try to find and address that cause in order to prevent incidents from happening again in the future. Incident and problem management are closely related but are not the same.
Why is it important to have a clear incident management system?
It seeks to restore the service affected to normal as soon as possible so as to reduce the negative impact on the business operation.
How are impact and urgency measured?
Impact is based on how the service provided is affected, whereas urgency measures the time for an incident to have a significant impact on the business operation.
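Impact and urgency are commonly combined into a priority through a lookup matrix. The sketch below is a minimal Python illustration; the three-level scales and the specific matrix values are assumptions chosen for the example, and each organization would define its own.

```python
LEVELS = ("high", "medium", "low")

# Hypothetical matrix: PRIORITY_MATRIX[impact][urgency] -> priority (1 = most critical).
PRIORITY_MATRIX = {
    "high":   {"high": 1, "medium": 2, "low": 3},
    "medium": {"high": 2, "medium": 3, "low": 4},
    "low":    {"high": 3, "medium": 4, "low": 4},
}


def priority(impact: str, urgency: str) -> int:
    """Look up the priority for a given impact and urgency level."""
    if impact not in LEVELS or urgency not in LEVELS:
        raise ValueError(f"impact and urgency must each be one of {LEVELS}")
    return PRIORITY_MATRIX[impact][urgency]


print(priority("high", "medium"))
```

A table lookup keeps the rule easy to audit: changing how the organization weighs impact against urgency only requires editing the matrix.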
- Incident Detection. You need to be able to detect an incident even before the customer spots it. ...
- Prioritization and Support. ...
- Investigation and Diagnosis. ...
- Resolution. ...
- Incident Closure.
Examples of major incidents include:
Natural disasters such as floods and storms. Pollution, i.e. spillages, radioactive substances, toxic gases. War or terrorism.
Examples of incidents include a printer issue, a Wi-Fi connectivity issue, an application lock issue, an email service issue, a laptop crash, an AD authentication error, a file sharing issue, etc.
What are 3 types of incidents?
- Major Incidents. Large-scale incidents may not come up too often, but when they do hit, organizations need to be prepared to deal with them quickly and efficiently. ...
- Repetitive Incidents. ...
- Complex Incidents.
In the event of a cybersecurity incident, best practice incident response guidelines follow a well-established seven-step process: Prepare; Identify; Contain; Eradicate; Restore; Learn; Test and Repeat. Preparation matters: the key word in an incident plan is not 'incident'; preparation is everything.
What are the 4 main stages of a major incident?
What is a Major Incident? enquiries likely to be generated both from the public and the news media usually made to the police. Most major incidents can be considered to have four stages: the initial response; the consolidation phase; the recovery phase; and the restoration of normality.
What is a major incident in the workplace?
A critical incident is an unexpected event that causes a period of acute stress in a workplace environment or group.
What is a simple major incident?
A simple incident describes a major incident where infrastructure remains intact; a compound incident involves damage to infrastructure, e.g. transportation, lines/methods of communication, health services, etc.
What are considered critical incidents?
A critical incident is a sudden, unexpected and overwhelming event that is out of the range of expected experiences. You may feel intense fear, helplessness, horror and completely out of control. After such an abnormal event, most people experience reactions that are disturbing and difficult to accept.
What are the types of incident?
- Worker injury incident.
- Environmental incident.
- Property damage incident.
- Vehicle incident.
- Fire incident.
Major incident management (often known here at Atlassian simply as incident management) is the process used by DevOps and IT Operations teams to respond to an unplanned event or service interruption and restore the service to its operational state.
What are the 6 stages in the incident management life cycle?
The NIST incident response lifecycle breaks incident response down into four main phases: Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Event Activity.
What is a Type 5 incident?
TYPE 5 INCIDENT: One or two single response resources with up to 6 response personnel; the incident is expected to last only a few hours, and no ICS Command and General Staff positions are activated.
What are the 6 different types of accidents?
- Accidents at Work. You may have been involved in an accident whilst at work. ...
- Slip/Trip Claims (public liability) ...
- Industrial Diseases and Illnesses. ...
- Road Traffic Accidents. ...
- Accidents Abroad. ...
- Accidents involving Animals. ...
- Sports Related Injuries.
- Clinical Negligence.
A local or regional IMT (Type 4 or 5) is a single and/or multi-agency team for expanded incidents typically formed and managed at the city or county level or by a pre-determined regional entity.
What are the main components of incident handling?
- Respond to threats.
- Triage incidents to determine severity.
- Mitigate a threat to prevent further damage.
- Eradicate the threat by eliminating the root cause.
- Restoring production systems.
- Post-mortem and action items to prevent future attacks.
Security incidents are events that may indicate that an organization's systems or data have been compromised or that measures put in place to protect them have failed. In IT, a security event is anything that has significance for system hardware or software, and an incident is an event that disrupts normal operations.
How do you do incident management?
- Identify an incident and log it. An incident can come from anywhere: an employee, a customer, a vendor, monitoring systems. ...
- Categorize. Assign a logical, intuitive category (and subcategory, as needed) to every incident. ...
- Prioritize. Every incident must be prioritized. ...
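The first steps listed above (log, categorize, prioritize) can be sketched as a small ticket record with one function per step. This is a minimal Python illustration with hypothetical names (`Ticket`, `categorize`, `prioritize`) and an assumed priority scale of 1 (critical) to 4 (low); real service desk tools expose their own fields and workflows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    description: str
    source: str                      # e.g. "employee", "customer", "vendor", "monitoring"
    category: Optional[str] = None
    subcategory: Optional[str] = None
    priority: Optional[int] = None   # assumed scale: 1 = critical ... 4 = low


def categorize(ticket: Ticket, category: str, subcategory: Optional[str] = None) -> Ticket:
    """Assign a category, and optionally a subcategory, to a logged ticket."""
    ticket.category = category
    ticket.subcategory = subcategory
    return ticket


def prioritize(ticket: Ticket, priority: int) -> Ticket:
    """Set the priority; this sketch requires the ticket to be categorized first."""
    if ticket.category is None:
        raise ValueError("categorize the ticket before prioritizing it")
    ticket.priority = priority
    return ticket


ticket = Ticket("Cannot connect to office Wi-Fi", source="employee")  # identify and log
categorize(ticket, "Network", "Wi-Fi")
prioritize(ticket, 3)
print(ticket)
```

Enforcing the order in code is a design choice for this example; it reflects the point that every incident should carry both a category and a priority before work on it begins.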
In simple terms, Priority 1 (P1) is a complete business-down situation or a single critical system down with high financial impact. The client/user is unable to operate. A real-time example: Chrome does not open on your machine, and it is the main or only browser you use or have.
What makes a good major incident manager?
Major Incident Manager
Leveraging technology to issue all communications and providing key stakeholder management. Leading, driving, facilitating and chairing all investigation activities, meetings, and conference calls.
Definition: An Incident's priority is usually determined by assessing its impact and urgency: 'Urgency' is a measure of how quickly a resolution of the Incident is required. 'Impact' is a measure of the extent of the Incident and of the potential damage caused by the Incident before it can be resolved.
What is a critical incident at work?
A critical incident is any event or series of events that is sudden, overwhelming, threatening or protracted. This may be an assault, threats, severe injury, death, fire or a bomb threat.
What is Critical Incident management?
Critical Incident Management is the process by which an organisation reacts to such an event in order to protect its operations, staff and stakeholders, the wider public and ultimately its reputation.
What is incident management interview questions and answers?
- How would you go about leading an incident investigation? ...
- How would you manage a large team of technical staff? ...
- How do you keep up to date with the changing IT industry and new software programs? ...
- Which incident management software systems do you enjoy working with?
A natural major incident is the result of earthquake, flood, fire, volcano, tsunami, drought, famine or pestilence. To some extent the natural disaster will be self-propagating: following a flood or earthquake those left homeless and starving will be vulnerable to the diseases associated with squalor.
What are the 3 categories of triage?
- Immediate category. These casualties require immediate life-saving treatment.
- Urgent category. These casualties require significant intervention as soon as possible.
- Delayed category. These patients will require medical intervention, but not with any urgency.
- Expectant category.
A major incident is one that causes a serious interruption to business activities and must be resolved with the utmost urgency.
What are Critical Incident Stress Management examples?
Examples of a critical incident:
Unanticipated poor patient outcome. Injury or sudden death of a co-worker on the job. Major incidents involving multiple deaths and/or injuries. Attempted/completed suicide.
What is a Critical Incident? A Critical Incident is any event that poses a serious risk to the life, health or safety of an individual who is receiving services from your organisation. It can include incidents where staff, clients and third parties feel unsafe and under stress.
What are the 6 steps of incident response?
- Lessons Learned.
- Stage 1: Identification. Declaring the major incident: ...
- Stage 2: Containment. Assembling the major incident team. ...
- Stage 3: Resolution. Implementing the resolution plan as a change. ...
- Stage 4: Maintenance. Performing a post-implementation review.
- P1 – Priority 1 incident tickets (Critical)
- P2 – Priority 2 incident tickets (High)
- P3 – Priority 3 incident tickets (Moderate)
- P4 – Priority 4 incident tickets (Low)
SLA success rate is given as a percentage.
What is Major incident management process?
Major incident management (often known here at Atlassian simply as incident management) is the process used by DevOps and IT Operations teams to respond to an unplanned event or service interruption and restore the service to its operational state.
What is incident management life cycle?
The NIST incident response lifecycle breaks incident response down into four main phases: Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Event Activity.
What are security threats and incidents?
Security incidents are events that may indicate that an organization's systems or data have been compromised or that measures put in place to protect them have failed. In IT, a security event is anything that has significance for system hardware or software, and an incident is an event that disrupts normal operations.
Which one is most important aspect of incident response?
Detection. One of the most important steps in the incident response process is the detection phase. Detection (also called identification) is the phase in which events are analyzed in order to determine whether these events might comprise a security incident.
What is a major incident in the workplace?
A critical incident is an unexpected event that causes a period of acute stress in a workplace environment or group.
How can I be a good incident manager?
- An eye for detail. An Incident Manager must ensure processes and policies are being adhered to and standards are being met. ...
- Be calm under pressure. ...
- A methodical mind. ...
- A good communicator. ...
- A problem solver.
An incident manager's job is to respond to incidents when they occur and take any necessary steps to restore service and return the business to normal operations as quickly as possible. Incident managers are the IT staff members with whom employees, suppliers, and customers interact when they are stuck and need help.
What does P1 P2 P3 P4 mean?
The P1, P2, P3, and P4 are the P visa types. These visas are issued to a foreign athlete, famous artist, a member of an entertaining group, coach, and their family members. In this article, about each one of them is told clearly and the requirements that must be satisfied to get the visa.
Priority 3 (P3) – The clients' core business is unaffected but the issue is affecting efficient operation by one or more people.
Priority 4 (P4) – The issue is an inconvenience or annoying but there are clear workarounds or alternates.
What is a Priority 1 issue?
Priority 1 (P1): These issues are usually business-critical. They represent an issue for which no workarounds exist, or there is a severe outage. If you're a SaaS product, this might be your product being down or something which affects a large number of your customers.
What are the main objectives of incident management?
The purpose of the Incident Management process is to restore normal service operation as quickly as possible and minimize the adverse impact on business operations, ensuring that agreed levels of service quality are maintained.
What is Critical incident management?
Critical Incident Management is the process by which an organisation reacts to such an event in order to protect its operations, staff and stakeholders, the wider public and ultimately its reputation.
What is P1 and P2 incidents?
Depending on the impact and urgency, a major incident will be categorized as a P1 or P2. Incident Coordinators utilize a priority matrix to determine the appropriate impact and urgency. All P1 tickets are considered major incidents. P2 tickets are considered major if the impact is "multiple groups" or "campus."
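The rule in the last answer is simple enough to express directly. The following is a minimal Python sketch of that rule only; the function name `is_major_incident` is hypothetical, and any impact value other than "multiple groups" or "campus" (such as "individual" below) is an assumed placeholder.

```python
# Impact values that make a P2 ticket a major incident, as quoted above.
MAJOR_P2_IMPACTS = {"multiple groups", "campus"}


def is_major_incident(priority: str, impact: str) -> bool:
    """Every P1 is major; a P2 is major only when its impact is wide enough."""
    if priority == "P1":
        return True
    if priority == "P2":
        return impact.lower() in MAJOR_P2_IMPACTS
    return False


print(is_major_incident("P2", "campus"))      # wide impact, so treated as major
print(is_major_incident("P2", "individual"))  # assumed narrow impact, not major
```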