Ending Alert Fatigue: Threat Stack and VictorOps On Modern-Day Security and Incident Management

Alert fatigue is a very real issue that security, ops, and dev teams are dealing with today. But how do you know if alerts are burning you out? And how can others on your team recognize it? More often than not, alert fatigue flies under the radar until it’s too late — critical issues start to pass by unnoticed and our adversaries get the upper hand.

We polled attendees during this week’s webinar, Ending Alert Fatigue With Modern Security & Incident Response (recorded version below), which Threat Stack co-hosted with our friends at VictorOps, and found that 82 percent of them said alert fatigue has had an impact on their well-being and productivity!


The symptoms of alert fatigue read like a prescription drug commercial:

  • Longer response times
  • Anxiety
  • Sleep deprivation
  • Negative physical effects like high blood pressure
  • Cognitive impairment
  • Team and individual dissatisfaction
  • And a whole slew of other symptoms barely audible above overly cheery music

We’re all running fast in the cloud today, building new features and making sure our platforms scale, but it’s just as important to build continuous improvement processes into the company DNA to catch and address security alert fatigue issues before they set in. These are problems that need to be fixed, not tolerated, to ensure that our security defenses stay ahead of the adversaries’ offensive maneuvers.

During Tuesday’s webinar, Jason Hand, DevOps Evangelist for VictorOps, and I discussed the best ways to combat this all-too-common problem.

The Culprit: Normalization of Deviance

It may sound like a fancy statistics term, but normalization of deviance simply means the gradual erosion of normal procedures. In other words, bad practices start to seem normal the longer they go on.

How does this happen? As a company grows, more tools are required, and with more tools come more alerts and, often, a breakdown of the processes and procedures to handle them. Soon enough, the alerts coming from each of your systems and tools sound like an obnoxiously loud cocktail party, with everyone having a different conversation about a different thing. As a result, Security and DevOps teams become so desensitized to these alerts that even when the system flags truly anomalous activity, it may get ignored due to burnout.

What you want is for alerts to sound off like a harmonious choir, all working together and only hitting the high notes when a real issue arises. So how do we get there? Rather than sit on the sidelines waiting for the next team member to hit this negative inflection point, on Tuesday’s webinar Jason and I offered seven ways teams can avoid alert fatigue.

7 Ways to Avoid Alert Fatigue with Modern Tools

1. Make All Alerts Contextual and Actionable

Sifting through alerts that have no meaning and no context from which to determine a course of action makes for a tiring workday. To be effective, alerts need two key things (see the sketch after this list):

  • Context that comes from pairing data points from across the system to paint a complete picture, including runbooks, graphs, logs, notes, and any other details relevant to resolving the issue.
  • Source details that indicate exactly where the issue originated and any other areas of your system that were impacted, so you can fix the problem at the root.
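As a quick illustration, here’s what a contextual, actionable alert might carry as a payload. This is a hypothetical sketch; the field names are made up and aren’t Threat Stack’s schema:

```python
# Hypothetical example of an alert enriched with context and source details
alert = {
    "title": "Unexpected outbound connection from a production host",
    "severity": "high",
    "source": {                        # exactly where the issue originated
        "host": "web-prod-04",
        "process": "curl",
        "user": "deploy",
    },
    "impacted": ["web-prod-04", "internal-api"],   # other areas of the system affected
    "context": {                       # everything a responder needs to act without digging
        "runbook_url": "https://wiki.example.com/runbooks/unexpected-outbound",
        "logs_url": "https://logs.example.com/search?host=web-prod-04",
        "graph_url": "https://metrics.example.com/hosts/web-prod-04",
        "notes": "No deploys were scheduled for this host at alert time.",
    },
}
```

If a responder can open that one alert and immediately know where to look and what to do, you’ve removed the most fatiguing part of the job: hunting for context.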

2. Reduce Redundant Alerts

Plain and simple, it’s inefficient to be paged on the same issue over and over, especially if it’s a non-issue. This is one of the biggest factors leading to alert fatigue. Whether the alert is triggered by routine engineering work or by a third-party app firing unnecessarily, the noise adds up. You can reduce and consolidate alerts either by fine-tuning the alerting protocol for each tool or, even better, by combining all security functions into a single platform (such as the Threat Stack Cloud Security Platform™) to unify alert configuration and origination.
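To make the idea concrete, here’s a minimal sketch of suppressing duplicate pages by fingerprinting alerts within a time window. It’s plain Python with hypothetical field names, not any vendor’s implementation:

```python
import hashlib
import time

SUPPRESSION_WINDOW = 300  # seconds to suppress repeats of the same alert
_last_seen = {}           # fingerprint -> timestamp of the last page

def fingerprint(alert):
    """Build a stable key from the fields that define 'the same issue'."""
    key = f"{alert['rule']}|{alert['host']}|{alert['severity']}"
    return hashlib.sha256(key.encode()).hexdigest()

def should_page(alert, now=None):
    """Page only the first occurrence within the suppression window."""
    now = now or time.time()
    fp = fingerprint(alert)
    last = _last_seen.get(fp)
    if last is not None and now - last < SUPPRESSION_WINDOW:
        return False  # duplicate within the window: consolidate, don't page
    _last_seen[fp] = now
    return True
```

Grouping repeats by fingerprint like this is also what lets you consolidate a burst of identical firings into a single incident instead of a stream of pages.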

3. Designate Alerts to a Single Source or Timeline

With each tool sending off its own alerts (most often directly into your email inbox), it becomes difficult to connect the dots and uncover real issues, assuming you even notice those alerts amongst the clutter of your email. Our motto at both Threat Stack and VictorOps is this: never rely on email alerts as your single source of truth. It’s far better to use an open communication channel like Slack to stream alerts, provide team-wide visibility, and allow for open discussions to resolve issues.

Streamlining security functions (threat intelligence, vulnerability management, CloudTrail, etc.) into a single place, like the Threat Stack Cloud Security Platform™, can also go a long way toward unifying security alerting. (Oh, and we also have a Slack integration.)
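If you’re wiring this up yourself, streaming an alert into a shared channel can be as simple as posting to a Slack incoming webhook. The sketch below uses a placeholder webhook URL and hypothetical alert fields; Threat Stack’s Slack integration handles this out of the box:

```python
import json
import urllib.request

# Placeholder: replace with your own Slack incoming-webhook URL.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

def post_alert_to_slack(alert):
    """Stream an alert into a shared channel so the whole team sees it."""
    message = {
        "text": (
            f"*{alert['severity'].upper()}* {alert['title']}\n"
            f"Host: {alert['host']} | Rule: {alert['rule']}\n"
            f"Runbook: {alert.get('runbook_url', 'n/a')}"
        )
    }
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps(message).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status == 200
```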

4. Adjust Anomaly Detection Thresholds

Caught up in the day-to-day hustle, many teams forget to fine-tune baselines on a regular basis. This results in more alerts about nothing, further adding to fatigue. A good place to start is by addressing your noisiest alerts, but an even better solution is using a tool that can learn your system’s baselines over time, adjusting as you scale so you don’t have to do this manually. Threat Stack and VictorOps are both built to do this.
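To illustrate what an adaptive baseline looks like in principle, here’s a rough Python sketch that flags only values far outside a rolling window of recent behavior. The metric and parameters are made up; it’s a teaching example, not how Threat Stack implements anomaly detection:

```python
from collections import deque
from statistics import mean, stdev

class AdaptiveThreshold:
    """Flag values far outside a rolling baseline instead of a fixed limit."""

    def __init__(self, window=1000, sigmas=3.0):
        self.window = deque(maxlen=window)  # recent observations
        self.sigmas = sigmas                # how far from normal before alerting

    def is_anomalous(self, value):
        anomalous = False
        if len(self.window) >= 30:  # need enough history for a stable baseline
            baseline = mean(self.window)
            spread = stdev(self.window) or 1e-9
            anomalous = abs(value - baseline) > self.sigmas * spread
        self.window.append(value)   # the baseline keeps learning as you scale
        return anomalous

# Example: alert only when failed logins spike well above recent behavior
detector = AdaptiveThreshold(window=500, sigmas=4.0)
if detector.is_anomalous(42):  # hypothetical count of failed logins this minute
    print("page the on-call")
```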

5. Ensure That the Correct Individuals/Teams Are Alerted

Another problem that crops up as teams grow is ensuring that everyone on the team has the right access to the right alerts in order to take action on them. As part of your continuous improvement processes, allow each team member to decide how, how often, and on what topics they should be alerted.
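As a simple illustration, routing can start as a small table that maps alert attributes to the team that owns them, along with that team’s delivery preferences. The tags and team names below are hypothetical:

```python
# Hypothetical routing table: alert tags -> owning team and delivery preferences
ROUTING_RULES = [
    {"tag": "cloudtrail",     "team": "security", "notify": ["slack", "page"]},
    {"tag": "vulnerability",  "team": "appsec",   "notify": ["slack"]},
    {"tag": "infrastructure", "team": "sre",      "notify": ["slack", "email"]},
]

DEFAULT_ROUTE = {"team": "security", "notify": ["slack"]}

def route(alert):
    """Send each alert to the team that can actually act on it."""
    for rule in ROUTING_RULES:
        if rule["tag"] in alert.get("tags", []):
            return rule
    return DEFAULT_ROUTE  # unowned alerts still land somewhere visible
```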

6. Customize Personal Notifications/Pages

Jason is no stranger to the countless stories of engineers and ops folks being woken up during the night by non-severe alerts. Not only will your team be sleeping less, they may even stop trusting the daytime alerts! Instead, ensure that only high-severity alerts trigger a “wake me up in the middle of the night” scenario. All others can wait until the morning.
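In practice this is configured per user in tools like VictorOps rather than written by hand, but a personal paging policy conceptually boils down to something like the following sketch (the quiet hours and severity levels are illustrative):

```python
from datetime import datetime, time

QUIET_HOURS = (time(22, 0), time(7, 0))   # illustrative: 10pm to 7am
PAGE_AT_NIGHT = {"critical"}              # only these severities wake anyone up

def in_quiet_hours(now=None):
    now = (now or datetime.now()).time()
    start, end = QUIET_HOURS
    return now >= start or now < end      # the window wraps past midnight

def delivery_for(alert):
    """High-severity alerts page immediately; everything else waits for morning."""
    if alert["severity"] in PAGE_AT_NIGHT or not in_quiet_hours():
        return "page"
    return "defer_until_morning"
```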

7. Revisit and Adjust Regularly

The preceding six recommendations shouldn’t be a one-time effort; you need to revisit them regularly to ensure the system is working as it should be. Below are several questions we recommend posing to your team during postmortem exercises and regular team-wide meetings:

  • Is alert “Signal:Noise” tuning owned by the entire team?
  • Is alert tuning part of your continuous improvement processes?
  • Are your teams empowered to prioritize work and address the factors that contribute to alert fatigue?
  • Are the escalation processes sane and effective?
  • Can more data be integrated into alerts to provide the proper context to make decisions?

So, let’s make alert fatigue a thing of the past, shall we? Check out the Threat Stack Cloud Security Platform™ and VictorOps and join the ranks of modern-day security and incident management!