Understanding incident response roles and responsibilities (2024)

An incident is no time to have multiple people doing duplicate work. It’s also a terrible time to have important tasks ignored because everyone thought somebody else was working on them. Incidents get worse when incident response team members can’t communicate, can’t cooperate, and don’t know what the others are working on. Work gets repeated, work gets ignored, and customers and the business suffer.

That’s why effective incident response teams designate clear roles and responsibilities. Team members know what the different roles are, what they’re responsible for, and who is in which role during an incident.

Here are a few of the most common incident management roles. Several of them, like major incident manager, are key to our own incident response strategy.

Role: Incident manager

Primary responsibility: The incident manager has the overall responsibility and authority during the incident. They coordinate and direct all facets of the incident response effort. As a rule of thumb, the incident manager holds every role and responsibility until they delegate it to someone else. At Atlassian, the incident manager can also devise and delegate ad hoc roles as required by the incident. For example, they could appoint multiple tech leads if more than one stream of work is underway, or create separate internal and external communications managers.

Secondary responsibilities: Anything not assigned to someone else.

Also known as: Incident commander, major incident manager
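The delegation rule above can be sketched as a small data structure. This is a minimal illustration, not a description of any real tool: the `Incident` class, its `delegate` and `owners` methods, and the names "alice", "bob", and "carol" are all hypothetical. It assumes a role may have several holders (for example, multiple tech leads) and that any role nobody holds falls back to the incident manager.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Incident:
    """Tracks who holds each role during one incident (hypothetical model)."""

    incident_manager: str
    # Maps a role name to the people it has been delegated to.
    assignments: Dict[str, List[str]] = field(default_factory=dict)

    def delegate(self, role: str, person: str) -> None:
        # A role can have several holders, e.g. one tech lead per work stream.
        self.assignments.setdefault(role, []).append(person)

    def owners(self, role: str) -> List[str]:
        # Any role that has not been delegated stays with the incident manager.
        return self.assignments.get(role) or [self.incident_manager]


incident = Incident(incident_manager="alice")
incident.delegate("tech lead", "bob")
incident.delegate("tech lead", "carol")

print(incident.owners("tech lead"))
print(incident.owners("communications manager"))
```

Putting the fallback in `owners` rather than pre-populating every role means ad hoc roles created mid-incident behave the same way: until someone is assigned, the incident manager is the answer.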

Role: Tech lead

Primary responsibility: The tech lead is typically a senior technical responder. They are responsible for developing theories about what's broken and why, deciding on changes, and running the technical team during the incident. This role works closely with the incident manager.

Secondary responsibilities: Communicate updates to incident manager and other team members, document key theories and actions taken during the incident for later analysis, participate in incident postmortem, page additional responders and subject matter experts.

Also known as: On-call engineer, subject matter expert

Role: Communications manager

Primary responsibility: The communications manager is someone familiar with public communications, possibly from the customer support or public relations team. They are responsible for writing and sending internal and external communications about the incident. This is usually also the person who updates the status page.

Secondary responsibilities: Collect customer responses, interface with executives and other high-level stakeholders.

Also known as: Communications officer, communications lead

Role: Customer support lead

Primary responsibility: The person in charge of making sure incoming tickets, phone calls, and tweets about the incident get a timely, appropriate response.

Secondary responsibilities: Pass customer-sourced details to the incident-response team.

Also known as: Help desk lead, customer support agent

Role: Subject matter expert

Primary responsibility: A technical responder familiar with the system or service experiencing an incident. Often responsible for suggesting and implementing fixes.

Secondary responsibilities: Providing context and updates to the incident team, paging additional subject matter experts.

Also known as: Technical lead, on-call engineer

Role: Social media lead

Primary responsibility: A social media pro in charge of communicating about the incident on social channels.

Secondary responsibilities: Updating the status page, sharing real-time customer feedback with the incident response team.

Also known as: Social media manager, communications lead

Role: Scribe

Primary responsibility: A scribe is responsible for recording key information about the incident and its response effort.

Secondary responsibilities: Maintain an incident timeline, keep a record of key people and activities throughout the incident.
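As one way to picture the scribe's record, the sketch below keeps timestamped entries and renders them in chronological order. The `TimelineEntry` and `IncidentTimeline` classes, their fields, and the sample entries are assumptions made for illustration; the source does not prescribe a format for the timeline.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class TimelineEntry:
    timestamp: datetime
    author: str
    note: str


@dataclass
class IncidentTimeline:
    """A scribe's running log of key people and activities (hypothetical)."""

    entries: List[TimelineEntry] = field(default_factory=list)

    def record(self, author: str, note: str,
               timestamp: Optional[datetime] = None) -> None:
        # Default to the current UTC time when no timestamp is supplied.
        ts = timestamp or datetime.now(timezone.utc)
        self.entries.append(TimelineEntry(ts, author, note))

    def render(self) -> str:
        # Sort by time so entries added late still appear in order.
        ordered = sorted(self.entries, key=lambda e: e.timestamp)
        return "\n".join(
            f"{e.timestamp:%H:%M} {e.author}: {e.note}" for e in ordered
        )


timeline = IncidentTimeline()
timeline.record("bob", "Rolled back deploy",
                datetime(2024, 1, 1, 10, 15, tzinfo=timezone.utc))
timeline.record("alice", "Incident declared",
                datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc))

print(timeline.render())
```

Sorting at render time lets the scribe add a note after the fact without disturbing the order used later in the postmortem.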

Role: Problem manager

Primary responsibility: The person responsible for going beyond the incident’s resolution to identify the root cause and any changes that need to be made to avoid the issue in the future.

Secondary responsibilities: Coordinate, run, and record an incident postmortem, log and track remediation tickets.

Also known as: Root cause analyst

Article information

Author: Geoffrey Lueilwitz
