What Are Incident Severity Levels? (SEV1 to SEV3 explained) | Better Stack Community (2023)

Not all incidents are created equal. Prioritizing incidents based on the impact they have on your business improves collaboration and makes for faster incident resolution.

But how do you prioritize incidents?

Enter severity levels.

What are severity levels?

Severity levels are a measurement of the impact an incident has on your business. A commonly used severity ranking runs from SEV 1 (severity 1) to SEV 3 (severity 3), where SEV 1 is a critical incident and SEV 3 is a minor incident.

A SEV 1 incident could be a situation where a service is down for all users or customers, where there has been a major security breach, or where customer data is lost. A SEV 1 is defined as a critical incident with high impact on the business.

A SEV 2 incident could be a situation where a significant part of the core functionality is not working or where a service is unavailable for a subset of users or customers. A SEV 2 is defined as a major incident with significant impact on the business.

A SEV 3 incident could be a situation where a system issue causes a slight inconvenience to users or customers but doesn’t affect any major system functions. A SEV 3 is defined as a minor incident with low impact on the business.

Severity | Description | Example
SEV 1 | Critical incident with high impact | A service is down for all customers
SEV 2 | Major incident with significant impact | A service is down for a subset of customers
SEV 3 | Minor incident with low impact | A bug is creating an inconvenience to customers

The levels can go beyond SEV 3. Larger organisations often use SEV 4 and SEV 5. Each organisation can decide how many severity levels it needs, but 3 levels are generally enough. More levels can lead to confusion and to more time spent assessing which severity level an incident is instead of starting work on the resolution.

Why are severity levels used?

Severity levels aren't just fancy DevOps jargon. SEV levels put everyone on the same page when an incident happens and can significantly improve incident response time.

The main benefit of using severity levels is that a team can connect each level to a specific process or automation, so whenever such an incident occurs, no improvisation is necessary and pre-made workflows are started.

For example, a SEV 1 incident could be connected to an immediate status page update and to alerting C-level company executives. A SEV 3 incident, on the other hand, can be connected to a much lower-level workflow, for example a ticket being created in Jira.
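The connection between a level and its pre-made workflow can be sketched as a simple lookup table. This is a minimal illustration only: the step names are hypothetical placeholders rather than calls into any real tool, and the SEV 2 entry is an assumption, since the examples above only cover SEV 1 and SEV 3.

```python
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1  # critical incident
    SEV2 = 2  # major incident
    SEV3 = 3  # minor incident


# Hypothetical workflow steps; the names are placeholders, not a real API.
WORKFLOWS = {
    Severity.SEV1: ["update_status_page", "alert_executives"],
    # Assumption: the article gives no SEV 2 workflow example.
    Severity.SEV2: ["page_on_call_engineer"],
    Severity.SEV3: ["create_ticket"],
}


def workflow_for(severity: Severity) -> list[str]:
    """Return a copy of the pre-made workflow steps for a severity level."""
    return list(WORKFLOWS[severity])
```

Calling `workflow_for(Severity.SEV3)` returns only the ticket step, so a minor incident never triggers the executive alert.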

Severity vs. Priority, what’s the difference?

In most cases, severity = priority.

The more severe the incident is, the higher a priority it is for the development team. An infrastructure incident that takes down the company's whole online presence is immediately the highest priority for the DevOps team. But in some cases, you can have a high-priority incident that is not high in severity.

For example, if a recent homepage edit leaves the h1 title tag improperly formatted, it’s certainly not very severe, as the core functionality is not affected. However, it’s a high priority because it can damage the company's brand image and cause confusion among current or potential customers.

Similarly, you can have high-severity but low-priority incidents. For example, an incident that’s making your product unavailable for 0.01% of all your customers has a critical impact, because it’s making the product unusable. But it’s low-priority because it affects only a very small subset of customers.

Because these low-severity + high-priority and high-severity + low-priority incidents exist, we need to distinguish between severity and priority:

  • Severity measures the impact an incident has on the business. It answers questions about the consequences of an incident.
  • Priority measures an incident’s urgency. It answers questions about what should be fixed first.
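One way to keep the two measures apart is to store them as separate fields on each incident. The sketch below is hypothetical (the `Incident` class and its field names are not from any specific tool) and reuses the two examples above to show that sorting by priority and sorting by severity can give different orders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    title: str
    severity: int  # 1 = critical impact, 3 = minor impact
    priority: int  # 1 = fix first, 3 = can wait


incidents = [
    Incident("Product unavailable for 0.01% of customers", severity=1, priority=3),
    Incident("Homepage h1 title is badly formatted", severity=3, priority=1),
]

# Priority decides the order of work; severity only describes the impact.
by_priority = sorted(incidents, key=lambda incident: incident.priority)
by_severity = sorted(incidents, key=lambda incident: incident.severity)
```

Here the homepage issue comes first in `by_priority`, while the partial outage comes first in `by_severity`.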

Because priority tells us what should be fixed first, it’s usually better to focus on working with priorities instead of severity levels. Let’s have a look at what the priority-levels-first approach looks like.

How to use priority levels?

Priority levels work the same as severity levels when it comes to numbering. The lower the number, the higher the priority of the incident.

The main difference is that a priority level tells us which incident needs to be solved first, instead of just stating which incident is the most severe (has the most impact).

Priority | Description | Example
P1 | Critical incident that needs to be addressed immediately. | A service is down for all customers.
P2 | Major incident that needs to be addressed quickly. | A service is down for a subset of customers.
P3 | Minor incident that can be handled within working hours. | A bug is creating an inconvenience to customers.

Simplifying things: issues with P1 and SEV 3

Severity and priority levels are great in theory, but in practice they are often too complicated. The main reason for having a severity levels setup is to simplify incident communication within a team, not to complicate it. The goal is to say P1 or SEV 3 and get everyone on the same page immediately.

Sadly, this is often not the case, especially in high-stress situations like being woken up by an on-call alert in the middle of the night. Similarly, less technical people might think of SEV 3 as the highest severity level, while it’s the lowest.

To simplify this, we can switch to using only plain words. This means that instead of using code words like SEV 1, we can use regular words like critical incident for all SEV 1 or P1 incidents.

Here is what an alternative naming could look like:

Standardized code word | Alternative naming
P1 or SEV 1 | Critical incident
P2 or SEV 2 | Major incident
P3 or SEV 3 | Minor incident
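If alerts or tools still emit the code words, the table above can be applied automatically. The following is a small sketch, assuming codes arrive as strings such as "P1" or "SEV 1"; the `human_name` helper is a hypothetical name introduced only for this illustration.

```python
import re

# Plain-language names taken from the table above.
LEVEL_NAMES = {
    1: "Critical incident",
    2: "Major incident",
    3: "Minor incident",
}


def human_name(code: str) -> str:
    """Translate a code word like 'P1' or 'SEV 1' into its plain-language name."""
    match = re.fullmatch(r"\s*(?:P|SEV)\s*([1-3])\s*", code, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Unknown incident level: {code!r}")
    return LEVEL_NAMES[int(match.group(1))]
```

Rejecting unknown codes with an error, rather than guessing a level, keeps a typo from silently downgrading an incident.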

Defining incident levels with examples

Another way to make incident levels more approachable is to define them with real-life examples relevant to your specific product. For example, if you are running an Airbnb competitor, you could define incident levels as follows:

Priority: Airbnb competitor definition (example)

P1
  • At least 10% of users can’t book a new stay and/or at least 10% of current customers can’t sign in and manage their bookings.
  • Privacy of confidential customer information was breached.
  • Some customers lost data about their bookings.

P2
  • Fewer than 10% of users can’t book a new stay and/or fewer than 10% of current customers can’t sign in and manage their bookings.
  • Customers can’t reschedule or change their bookings.
  • New users can’t add more people when booking a stay.

P3
  • Some search filters are not working properly when new users pick new bookings.
  • Site is slower when loading images in listings.

This example is simplified, but the essence is that with this table any technical or non-technical team member has a very clear understanding of what kind of incident the company is facing.
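Part of such a definition can also be written down as a rule, which removes guesswork during an incident. The sketch below covers only the booking, privacy, and data-loss criteria from the example. The function and parameter names are hypothetical, and two choices are assumptions: exactly 10% counts as P1, and anything that matches no listed criterion falls back to P3.

```python
def classify(
    pct_users_cannot_book: float,
    privacy_breach: bool = False,
    booking_data_lost: bool = False,
) -> str:
    """Map a few example criteria to a priority level (simplified sketch)."""
    # P1: breach, lost booking data, or at least 10% of users unable to book.
    if privacy_breach or booking_data_lost or pct_users_cannot_book >= 10:
        return "P1"
    # P2: some users, but fewer than 10%, are unable to book.
    if pct_users_cannot_book > 0:
        return "P2"
    # Assumption: everything else is treated as a minor incident.
    return "P3"
```

For instance, `classify(2.5)` returns "P2", while `classify(0, privacy_breach=True)` returns "P1" regardless of how many users can still book.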

Final thoughts

Using severity levels, priority levels, or just plain-language incident levels is a great way to step up your incident management.

But keep in mind that any incident levels are only as good as the workflows connected to them, and that real-life definitions and examples from your business are the key to the success of any incident levels implementation.

Learn more about how to improve your incident response:

  • Explained: All Meanings of MTTR and Other Incident Metrics
  • How to Create a Developer-Friendly On-Call Schedule in 7 Steps
  • 4 Copy-Pastable Incident Templates for Status Pages

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Article information

Author: Geoffrey Lueilwitz

Last Updated: 11/04/2022
