Incident management for high-velocity teams
An incident is no time to have multiple people duplicating work. It’s also a terrible time to have important tasks ignored because everyone thought somebody else was handling them. Incidents get worse when incident response team members can’t communicate, can’t cooperate, and don’t know what their teammates are working on. Work gets repeated, work gets ignored, and customers and the business suffer.
That’s why effective incident response teams designate clear roles and responsibilities. Team members know what the different roles are, what they’re responsible for, and who is in which role during an incident.
Here are a few of the most common incident management roles. Several of them, like major incident manager, are key to our own incident response strategy.
Role: Incident manager
Primary responsibility: The incident manager has overall responsibility and authority during the incident. They coordinate and direct all facets of the incident response effort. As a rule of thumb, the incident manager is responsible for every role and responsibility until they delegate it to someone else. At Atlassian, the incident manager can also devise and delegate ad hoc roles as required by the incident. For example, they could appoint multiple tech leads if more than one stream of work is underway, or create separate internal and external communications managers.
Secondary responsibilities: Anything not assigned to someone else.
Also known as: Incident commander, major incident manager
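The delegation rule of thumb above (the incident manager owns every role until it is handed to someone else) can be sketched as a small lookup with a fallback. This is only an illustrative model, not a description of any Atlassian tool: the `Incident` class, its `delegate` and `owner_of` methods, and the people named in the usage lines are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical model of who holds which role during one incident."""

    incident_manager: str
    # Maps a role name to the person it has been delegated to.
    # Role names are free-form strings, so ad hoc roles are allowed.
    assignments: dict[str, str] = field(default_factory=dict)

    def delegate(self, role: str, person: str) -> None:
        """Hand a role (standard or ad hoc) to a specific person."""
        self.assignments[role] = person

    def owner_of(self, role: str) -> str:
        """Return the delegate, or fall back to the incident manager."""
        return self.assignments.get(role, self.incident_manager)


incident = Incident(incident_manager="Ana")

# Nothing has been delegated yet, so every role falls back to Ana.
print(incident.owner_of("communications manager"))  # Ana

# Two ad hoc tech lead roles for two parallel streams of work.
incident.delegate("tech lead (database)", "Ben")
incident.delegate("tech lead (network)", "Chloe")

print(incident.owner_of("tech lead (database)"))  # Ben
print(incident.owner_of("scribe"))                # Ana
```

Making the fallback explicit means no role is ever unowned, which is the point of the rule: any responsibility that has not been delegated still has a named owner.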
Role: Tech lead
Primary responsibility: The tech lead is typically a senior technical responder. They are responsible for developing theories about what's broken and why, deciding on changes, and running the technical team during the incident. This role works closely with the incident manager.
Secondary responsibilities: Communicate updates to incident manager and other team members, document key theories and actions taken during the incident for later analysis, participate in incident postmortem, page additional responders and subject matter experts.
Also known as: On-call engineer, subject matter expert
Role: Communications manager
Primary responsibility: The communications manager is someone familiar with public communications, possibly from the customer support or public relations team. They are responsible for writing and sending internal and external communications about the incident. This is usually also the person who updates the status page.
Secondary responsibilities: Collect customer responses, interface with executives and other high-level stakeholders.
Also known as: Communications officer, communications lead
Role: Customer support lead
Primary responsibility: The person in charge of making sure incoming tickets, phone calls, and tweets about the incident get a timely, appropriate response.
Secondary responsibilities: Pass customer-sourced details to the incident-response team.
Also known as: Help desk lead, customer support agent
Role: Subject matter expert
Primary responsibility: A technical responder familiar with the system or service experiencing an incident. Often responsible for suggesting and implementing fixes.
Secondary responsibilities: Providing context and updates to the incident team, paging additional subject matter experts.
Also known as: Technical lead, on-call engineer
Role: Social media lead
Primary responsibility: A social media pro in charge of communicating about the incident on social channels.
Secondary responsibilities: Updating the status page, sharing real-time customer feedback with the incident response team.
Also known as: Social media manager, communications lead
Role: Scribe
Primary responsibility: A scribe is responsible for recording key information about the incident and its response effort.
Secondary responsibilities: Maintain an incident timeline, keep a record of key people and activities throughout the incident.
Role: Problem manager
Primary responsibility: The person responsible for going beyond the incident’s resolution to identify the root cause and any changes that need to be made to avoid the issue in the future.
Secondary responsibilities: Coordinate, run, and record an incident postmortem, log and track remediation tickets.
Also known as: Root cause analyst