How to Optimize Your Incident Response Process in the Cloud

Bad guys know that the faster they move, the more they can accomplish: the more data they can steal, the more money they can extort, and the more damage they can do to your reputation. So it’s a race to see whether the bad guys can move faster than the good guys, and you don’t want to be on the wrong side of that equation.

One way to move fast is to optimize your alerting and incident response processes (which are, of course, tightly connected). What does this mean in practice? It means integrating your security tools into your operations team’s workflows so that the moment a security issue is detected, an alert goes to the people who can fix it, along with the information they need to act quickly. The result is faster security workflows and better operational support.

Here’s an effective way to optimize alerting and incident response.

1. Optimize Alerting

There are two types of integrations and workflows to optimize here: alert management and response audit.

Alert Management: Workflows to handle alerts

When your security team needs to triage an alert to determine whether additional help and clarification are needed, it’s vital that the supporting data required to make a decision is immediately at their fingertips.

To make this happen, integrate your security alerting tools with your incident management and chatops tools, such as PagerDuty and Slack. Security alerts will then flow directly into the tools and workflows that other stakeholders (your operations and development teams) already use, so all of the data about a particular security event lives in one place. When it’s time to respond, no one has to switch between tools to get everyone up to speed.
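As a rough illustration of what this integration involves, the sketch below turns one security alert into two outbound messages: a "trigger" event in the format of PagerDuty’s Events API v2, and a message for a Slack incoming webhook. The `SecurityAlert` class, its fields, and the helper names are hypothetical; the routing key and webhook URL would come from your own PagerDuty service and Slack workspace. Both messages carry the alert’s supporting details, so responders see the context without switching tools.

```python
import json
import urllib.request
from dataclasses import dataclass, field


# Hypothetical alert shape; a real detection platform defines its own fields.
@dataclass
class SecurityAlert:
    title: str
    host: str
    severity: str  # PagerDuty accepts "critical", "error", "warning", or "info"
    details: dict = field(default_factory=dict)


def pagerduty_event(alert: SecurityAlert, routing_key: str) -> dict:
    """Build a PagerDuty Events API v2 'trigger' event for the alert."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": alert.title,
            "source": alert.host,
            "severity": alert.severity,
            # Supporting data travels with the alert so responders don't hunt for it.
            "custom_details": alert.details,
        },
    }


def slack_message(alert: SecurityAlert) -> dict:
    """Build a Slack incoming-webhook message summarizing the alert."""
    lines = [f"*[{alert.severity.upper()}]* {alert.title} on `{alert.host}`"]
    lines += [f"• {key}: {value}" for key, value in alert.details.items()]
    return {"text": "\n".join(lines)}


def post_json(url: str, body: dict) -> int:
    """POST a JSON body and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

To deliver an alert, you might call `post_json("https://events.pagerduty.com/v2/enqueue", pagerduty_event(alert, routing_key))` and `post_json(webhook_url, slack_message(alert))`. If your security platform already offers these integrations, no custom code may be needed at all.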

Response Audit: Review of response actions taken

Once alert processes are automated and streamlined, it’s time to build an audit trail for event review and analysis. Audit trails capture what was done to triage and respond to an alert, providing critical visibility and accountability, as well as a framework for creating or refining processes.

Security teams must be able to audit actions taken within security tools, including:

  • Who dismissed a particular alert
  • When the alert was dismissed
  • What annotations were made

This information is helpful both while an incident is being handled (to make sure procedures are being followed) and after the fact. Your team should audit responses to alerts on a regular basis to verify that the correct actions were taken and to continuously improve your processes. That way, your team keeps refining its approach and stays ahead of threats.
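A minimal sketch of such an audit trail, assuming an in-memory store (a real deployment would persist entries somewhere tamper-resistant), records each action as an immutable entry and answers the questions listed above: who dismissed an alert, when, and what annotations were made. The `AuditEntry` and `AuditTrail` names and the action strings are illustrative, not part of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class AuditEntry:
    alert_id: str
    actor: str                  # who took the action
    action: str                 # e.g. "acknowledged", "annotated", "dismissed"
    timestamp: datetime         # when it happened (UTC)
    note: Optional[str] = None  # any annotation the responder added


class AuditTrail:
    """Append-only record of actions taken on alerts (in-memory sketch)."""

    def __init__(self) -> None:
        self._entries: List[AuditEntry] = []

    def record(self, alert_id: str, actor: str, action: str,
               note: Optional[str] = None,
               when: Optional[datetime] = None) -> AuditEntry:
        entry = AuditEntry(alert_id, actor, action,
                           when or datetime.now(timezone.utc), note)
        self._entries.append(entry)  # entries are only ever appended
        return entry

    def history(self, alert_id: str) -> List[AuditEntry]:
        """All actions on one alert, oldest first."""
        return sorted((e for e in self._entries if e.alert_id == alert_id),
                      key=lambda e: e.timestamp)

    def dismissal(self, alert_id: str) -> Optional[AuditEntry]:
        """Answer 'who dismissed this alert, and when?' (latest dismissal)."""
        for entry in reversed(self.history(alert_id)):
            if entry.action == "dismissed":
                return entry
        return None
```

Keeping entries immutable and append-only is what makes the trail useful for accountability: a review can trust that the record reflects what actually happened.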

2. Optimize Incident Response

One of the best ways to move fast is to automate the parts of your process that don’t require the “human touch,” and nowhere is this more valuable than in incident management and response. Cloud security is easier to manage when incident management tools automatically provide visibility and alerts. These tools can also prioritize alerts by severity (high, medium, and low), so you can quickly see what to focus on first and what can wait.

Specifically, security operations teams benefit when critical security alerts are integrated into high-severity operational tools like PagerDuty. This helps ensure that critical alerts are seen and acted on, making it easy for teams to gather information and respond in a timely manner. With the right tools in place, your team will be able to review any critical security incident within minutes of detection. Here’s how we recommend handling each level of alert severity:

  • High-Severity
    Critical security alerts should be routed to high-severity operational tools such as PagerDuty so the right responders are notified immediately.
  • Medium-Severity
    “Warning-level” security alerts should be integrated through a platform like Threat Stack with operational tools (such as Slack) so operations teams see them in real time, can discuss them, and can decide on a response, all in a single tool.
  • Low-Severity
    Low-severity alerts that don’t require immediate action still need to be captured through a platform like Threat Stack so they’re available for event reviews and to the Compliance team for verification and compliance audits.
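The severity policy above can be expressed as a small routing function. This is a sketch under assumptions: the destination names are hypothetical placeholders for your PagerDuty service, Slack channel, and event store, and it assumes every alert (not only low-severity ones) is also retained for event reviews and compliance audits.

```python
from enum import Enum
from typing import List


class Severity(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Hypothetical destination names; substitute your own integrations.
PAGE_ON_CALL = "pagerduty"
CHAT_CHANNEL = "slack"
EVENT_STORE = "event_store"


def route(severity: Severity) -> List[str]:
    """Return the destinations an alert of this severity should be sent to."""
    # Every alert is retained for event reviews and compliance audits.
    destinations = [EVENT_STORE]
    if severity is Severity.HIGH:
        destinations.append(PAGE_ON_CALL)  # interrupt the on-call responder
    elif severity is Severity.MEDIUM:
        destinations.append(CHAT_CHANNEL)  # visible in real time, discussed in-channel
    # LOW: no real-time notification; the alert is stored only.
    return destinations
```

Keeping the policy in one function makes it easy to review and adjust as your team learns which alerts genuinely warrant a page.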

3. Measure Success

Once you have optimized your alerting and incident management workflows, you need to test, measure, and refine these from time to time to make sure the processes work well for your specific organization. The best way to measure success is to use your audit trails to find out:

  • How quickly events were triaged
  • How accurately events were triaged
  • How many people were involved in triage
  • How quickly triaged events were resolved
  • How many security incidents escaped “unnoticed” (or weren’t caught quickly enough)
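Assuming your audit trail can be summarized into one record per event (detection, triage, and resolution times, plus who was involved), the starting metrics above reduce to a few averages and counts. The `EventRecord` fields and the 30-minute "too slow" threshold below are hypothetical. Incidents that were never detected at all won’t appear in these records, so `late_or_untriaged` is only a partial proxy for the last item in the list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional, Set


@dataclass
class EventRecord:
    """Hypothetical per-event summary derived from an audit trail."""
    detected_at: datetime
    triaged_at: Optional[datetime]
    resolved_at: Optional[datetime]
    responders: Set[str]
    triaged_correctly: Optional[bool] = None  # set during post-incident review


def minutes(delta: timedelta) -> float:
    return delta.total_seconds() / 60


def triage_metrics(events: List[EventRecord],
                   slow_after_minutes: float = 30.0) -> dict:
    """Summarize triage speed, accuracy, staffing, and resolution time."""
    triaged = [e for e in events if e.triaged_at is not None]
    resolved = [e for e in triaged if e.resolved_at is not None]
    reviewed = [e for e in triaged if e.triaged_correctly is not None]
    triage_times = [minutes(e.triaged_at - e.detected_at) for e in triaged]
    resolve_times = [minutes(e.resolved_at - e.triaged_at) for e in resolved]
    correct = sum(1 for e in reviewed if e.triaged_correctly)
    return {
        "mean_minutes_to_triage": mean(triage_times) if triage_times else None,
        "triage_accuracy": correct / len(reviewed) if reviewed else None,
        "mean_responders": mean([len(e.responders) for e in triaged]) if triaged else None,
        "mean_minutes_triage_to_resolve": mean(resolve_times) if resolve_times else None,
        # Proxy for "caught too late": never triaged, or triaged after the threshold.
        "late_or_untriaged": (len(events) - len(triaged))
        + sum(1 for t in triage_times if t > slow_after_minutes),
    }
```

Tracking these numbers over time matters more than any single value: a rising time to triage or a growing late count is a signal to revisit your routing and staffing.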

You may want to identify and track other metrics that are unique to your organization, but the ones listed above are a good place to start. The more visibility you have into what’s working, the better you can optimize your alerting and response processes to quickly and effectively address security incidents when they arise.

Final Words . . .

For use cases on optimizing your alerting and incident response processes (as well as use cases covering other critical security issues), download a copy of our recently published Cloud Security Use Cases Playbook.