In a recent study, 72% of CISOs stated that their teams are facing alert fatigue, while 82% of respondents to a Threat Stack survey indicated that alert fatigue is having a negative impact on their organization’s well-being and productivity.
Traditional approaches to managing security alerts have often driven teams into a reactive mode where they’re overwhelmed by huge volumes of noisy alerts or spend far too much time gathering information and digging around in log files. If this proliferation of data is transformed into relevant and actionable intelligence, however, teams can overcome alert fatigue, identify and respond to critical issues in real time, and reduce risk continuously over time.
In this post, we’ll take a look at some best practices on how you can move away from reactive, ad hoc tactics and adopt a structured, proactive approach by making alerts a key element of your overall information security strategy.
Stop Accepting Alert Fatigue as the Norm
Alert fatigue occurs when Security and DevOps teams become so desensitized to alerts through the “normalization of deviance” that even truly anomalous activities may be ignored. It’s difficult to justify tolerating or avoiding these problems, so this post outlines seven best practices that, based on our experience and the experiences of our customers, will transform noisy alerts into systems that provide more granular alerting and actionable data.
It may sound like a fancy statistics term, but “normalization of deviance” just means that bad practices start to seem normal the longer they go on.
We’re all running fast in the cloud today, building new features and making sure our platforms scale, but it’s just as important to build continuous improvement processes into the DNA of our organizations to catch and address security alert fatigue issues before they set in. These are problems that need to be fixed — not tolerated — to ensure that our security defenses stay ahead of the adversaries’ offensive maneuvers.
How does this happen? As a company grows, more tools are required, and with more tools come more alerts and often a breakdown of processes and procedures to handle them. Soon enough, the alerts coming from each of your systems and tools sound obnoxiously loud, and as a result, Security and DevOps teams become so desensitized to them that even when the system flags a truly anomalous activity, it may get ignored due to burnout.
What you want is for alerts to sound off like a harmonious choir, working together — and only hitting the loud notes when a real issue arises. So how can you make this happen? Take a look at the best practices below.
Seven Best Practices for Avoiding Alert Fatigue
1. Make All Alerts Contextual and Actionable
It’s a tiring day when you have to sift through alerts that have no meaning and no context from which to determine a course of action. Alerts need two key things in order to be effective:
- Context that comes from pairing data points from across the system to paint a complete picture, including runbooks, graphs, logs, notes, and any other details relevant to resolving the issue.
- Source details that indicate exactly where the issue originated and any other areas of your system that were impacted, so you can fix the root problem.
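To make the two requirements above concrete, here is a minimal sketch of what a contextual alert record might look like. The `Alert` class, its field names, and the `missing_context` helper are all hypothetical and are not part of any particular product; they only illustrate how an alert can carry its context and source with it.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    """Hypothetical alert record that carries context and source details."""
    title: str
    severity: str                      # assumed labels: "high", "medium", "low"
    source_host: str                   # where the issue originated
    affected: list[str] = field(default_factory=list)      # other impacted areas
    runbook_url: str = ""              # link to remediation steps
    related_logs: list[str] = field(default_factory=list)  # relevant log excerpts
    notes: str = ""


def missing_context(alert: Alert) -> list[str]:
    """Return the names of empty context fields so the gaps can be fixed."""
    gaps = []
    if not alert.runbook_url:
        gaps.append("runbook_url")
    if not alert.related_logs:
        gaps.append("related_logs")
    if not alert.source_host:
        gaps.append("source_host")
    return gaps
```

A check like `missing_context` could be run when a new alert rule is reviewed, so that alerts without a runbook or supporting logs are caught before they reach anyone's queue.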
2. Reduce Redundant Alerts
It’s inefficient and counterproductive to be paged on the same issue over and over, especially if it’s a non-issue, and this is one of the biggest factors leading to alert fatigue. Whether the alert is triggered by regular engineering work or by a third-party app, the effect is the same. You can reduce and consolidate alerts either by fine-tuning the alerting protocol for each tool or, even better, by combining all security functions into a single platform (such as the Threat Stack Cloud Security Platform®) in order to unify alert configurations and origination.
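As a rough illustration of consolidation, the sketch below collapses repeated alerts from the same source and rule into a single entry with a count. The tuple layout `(timestamp, source, rule)`, the function name `consolidate`, and the five-minute window are assumptions made for this example, not a description of how any specific tool works.

```python
def consolidate(alerts, window_seconds=300):
    """Collapse repeats of the same (source, rule) into one grouped alert.

    `alerts` is an iterable of (timestamp_seconds, source, rule) tuples.
    A repeat joins an existing group if it arrives within `window_seconds`
    of the last alert seen in that group; otherwise it starts a new group.
    """
    latest_group = {}  # (source, rule) -> most recent group for that key
    groups = []
    for ts, source, rule in sorted(alerts):
        key = (source, rule)
        group = latest_group.get(key)
        if group is not None and ts - group["last_seen"] <= window_seconds:
            group["count"] += 1
            group["last_seen"] = ts
        else:
            group = {"source": source, "rule": rule,
                     "first_seen": ts, "last_seen": ts, "count": 1}
            latest_group[key] = group
            groups.append(group)
    return groups
```

With this approach, a burst of identical alerts produces one notification that says how many times the event occurred, rather than one page per occurrence.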
The Threat Stack Cloud Security Platform
Threat Stack’s Cloud Security Platform takes a behavior-based approach to security alerting, governed by pre-built and/or custom rules focused on events that you consider important. These rules are powerful because they provide clarity and transparency about what is being alerted on in the ever-changing world of AWS. The Cloud Security Platform allows security teams to instantly exclude any unactionable data about the environment, or increase the visibility of data that does matter, something that is fundamentally unavailable in other systems, including those that rely on machine learning (ML) algorithms. This lets you provide immediate input into your security alerts and customize them on your time, without waiting for a system to “re-learn” your process on its own time.
While Threat Stack delivers powerful insights and operational controls out of the box through its comprehensive, ready-to-use rulesets, it also gives you the flexibility to create new rules and further refine and optimize existing rules to suit the specific requirements of your unique environment (unlike an ML approach that is difficult to adapt and doesn’t know your business and use cases). The Threat Stack platform also provides a single, unified view of multiple accounts that enhances your ability to immediately identify specific areas of risk regardless of their location.
3. Designate Alerts to a Single Source or Timeline
With each tool sending off its own alerts (most often directly into your email inbox), it becomes difficult to connect the dots and uncover real issues — that is, if you even pay attention to these alerts amongst the clutter of your email. At Threat Stack, our motto is: Never rely on email alerts as your single source of truth. It’s far better to use an open communication channel like Slack to stream alerts, provide team-wide visibility, and allow for open discussions to resolve issues.
Streamlining security functions (threat intelligence, vulnerability management, CloudTrail events, etc.) into a single place, like the Threat Stack Cloud Security Platform, can also go a long way in unifying security alerting.
4. Adjust Anomaly Detection Thresholds
Caught up in the day-to-day hustle, many teams forget to fine-tune baselines on a regular basis. This results in more alerts that signify nothing, further adding to noise and fatigue. A good place to start is by addressing your noisiest alerts, but an even better solution is using a tool that can learn from your system’s baselines over time, adjusting as you scale so you don’t have to do this manually. Threat Stack is built to do this.
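To show the general idea of a threshold that follows a baseline instead of staying fixed, here is a simple sketch that flags a value when it exceeds the rolling mean by more than `k` standard deviations. This is a generic statistical technique chosen for illustration, not Threat Stack's method; the class name `RollingBaseline` and its default parameters are assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class RollingBaseline:
    """Flag values well above the mean of the most recent observations."""

    def __init__(self, window=20, k=3.0, min_samples=5):
        self.values = deque(maxlen=window)  # only the last `window` values are kept
        self.k = k                          # how many standard deviations count as anomalous
        self.min_samples = min_samples      # stay quiet until enough data exists

    def is_anomalous(self, value):
        """Compare `value` with the current baseline, then record it."""
        anomalous = False
        if len(self.values) >= self.min_samples:
            threshold = mean(self.values) + self.k * pstdev(self.values)
            anomalous = value > threshold
        # Simplification: every value, anomalous or not, joins the baseline.
        self.values.append(value)
        return anomalous
```

Because old observations fall out of the window, the threshold drifts with normal growth in activity, which is the behavior you want as your environment scales. A production version would likely exclude confirmed anomalies from the baseline.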
5. Ensure That Correct Individuals and Teams Are Alerted
Another problem that crops up as teams grow is ensuring that everyone on the team has the right access to the right alerts in order to take action on them. As part of your continuous improvement processes, allow each team member to decide how, how often, and on what topics they should be alerted.
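One lightweight way to express per-person preferences is a subscription list that maps each team member to the topics they care about and the channel they prefer. The `Subscription` class, the topic names, and the channel labels below are hypothetical and only sketch the idea.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    """Hypothetical record of what one team member wants to be alerted on."""
    member: str
    topics: set[str]   # alert topics this person has opted into
    channel: str       # assumed labels such as "chat", "email", or "page"


def recipients(alert_topic, subscriptions):
    """Return (member, channel) pairs for everyone subscribed to the topic."""
    return [(s.member, s.channel) for s in subscriptions
            if alert_topic in s.topics]
```

Keeping the subscriptions in one reviewable place also makes it easy to spot topics that nobody is subscribed to.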
6. Customize Personal Notifications and Pages
Most engineers and ops people have had the experience of being woken up during the night by non-severe alerts. Not only does this cut into much-needed sleep, but it can also start eroding your team’s trust in daytime alerts! Instead, ensure that only high-severity alerts trigger a “wake me up in the middle of the night” scenario. All others can wait until the morning.
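The rule described above can be sketched as a small routing function. The severity labels, the channel names, and the 22:00 to 07:00 quiet hours are assumptions made for this example; adjust them to match your own on-call policy.

```python
from datetime import time


def choose_channel(severity, now, quiet_start=time(22, 0), quiet_end=time(7, 0)):
    """Decide how to deliver an alert based on severity and time of day.

    High-severity alerts always page. Everything else is held for a
    morning digest during quiet hours and sent to chat otherwise.
    """
    if quiet_start <= quiet_end:
        in_quiet_hours = quiet_start <= now < quiet_end
    else:
        # Quiet hours wrap past midnight, e.g. 22:00 to 07:00.
        in_quiet_hours = now >= quiet_start or now < quiet_end

    if severity == "high":
        return "page"
    if in_quiet_hours:
        return "morning_digest"
    return "chat"
```

For example, `choose_channel("low", time(3, 0))` holds the alert for the morning digest, while a `"high"` alert at the same hour still pages the on-call engineer.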
7. Revisit and Adjust Regularly
The six recommendations discussed above are not intended to be a one-time effort. You need to revisit them regularly to ensure that your system is working as it should be. To help guide your continuous improvement initiatives, here are several questions you can pose to your team during postmortems and regular team meetings:
- Is alert “Signal:Noise” tuning owned by the entire team?
- Is alert tuning part of your continuous improvement processes?
- Are your teams empowered to prioritize work and address the factors that contribute to alert fatigue?
- Are the escalation processes sane and effective?
- Can more data be integrated into alerts to provide the proper context to make decisions?
Looking Ahead . . .
If your organization is suffering from alert fatigue, or you’re not getting value out of your alerts because you don’t have a structured approach to managing them, it makes sense to replace your current ad hoc tactics with a more proactive approach. We’re confident that the best practices outlined above will provide the motivation and information you need to start bringing greater control and management to your alerting processes.
If you’d like to learn more about where you stand in terms of proactive security, schedule a demo of the Threat Stack Cloud Security Platform.